Storing the value of attribute and matching it in Xquery - xpath

<a>
{
for $p in doc("x.xml")/user//district
where data($p/@population) div data($p/@area) > 20
return <b>
<c> {$p/@name/string()} </c>
<d> {$p/@population div $p/@area} </d>
<e> {$p/@city/string()} </e>
</b>
}
</a>
I am getting the id of the district, but I want the name of the country, so I want to traverse back to the district from the city. In the snippet below, I get the country id ('f0_149'), but I am unable to match it and print the country name 'Austria' instead.
I am new to XQuery, so I am unaware of many things.
XML file snippet:
<?xml version="1.0" encoding="UTF-8"?>
<user xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<country id='f0_149' name='Austria' capital='f0_1467' population='10000000' total_area='50000' >
<name>
Austria
</name>
<district id='f0_17440' name='Burgenland' country='f0_149' capital='f0_2291' population='250000' area='5000'>
<city id='f0_2291' country='f0_149' province='f0_17440'>
<name>
Eisenstadt
</name>
<population year='87'>
10102
</population>
</city>

You can use the parent:: XPath axis:
<c> { $p/parent::country/@name/string() } </c>
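If you would rather match the stored country id than walk the parent axis, the same lookup can be sketched outside XQuery. This is only an illustration in Python's standard library, assuming a trimmed-down copy of the sample document. The `countries` dictionary and the `rows` list are names made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample document (an assumption for this sketch).
XML = """<user>
  <country id="f0_149" name="Austria">
    <district id="f0_17440" name="Burgenland" country="f0_149"
              population="250000" area="5000"/>
  </country>
</user>"""

root = ET.fromstring(XML)

# Map each country id to its name so a district's country attribute can be resolved.
countries = {c.get("id"): c.get("name") for c in root.iter("country")}

rows = []
for d in root.iter("district"):
    density = int(d.get("population")) / int(d.get("area"))
    if density > 20:
        # Resolve the id stored on the district to the country name.
        rows.append((d.get("name"), density, countries[d.get("country")]))

print(rows)  # [('Burgenland', 50.0, 'Austria')]
```

The parent axis is simpler when the district is always nested inside its country. The id lookup also works when it is not.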

Related

ServiceNow : Jelly Error

I want to print the elements of an array using a foreach loop, but it gives the error "The prefix j for element j:foreach is not bound".
Below is the code I wrote:
<?xml version="1.0" encoding="utf-8" ?>
<j:jelly trim="false" xmlns:j="jelly:core" xmlns:g="glide" xmlns:j2="null" xmlns:g2="null">
<h1>Jello World </h1>
<g:evaluate>
var words=['Hello','world','bye'];
</g:evaluate>
<j:foreach var="jvar_word" items="${words}">
<p> ${jvar_word} </p>
</j:foreach>
</j:jelly>
I believe Jelly is case sensitive. Try changing j:foreach to j:forEach:
<j:forEach var="jvar_word" items="${words}">
<p> ${jvar_word} </p>
</j:forEach>

Marklogic: Xpath using removing processing instruction tag

How can I remove the processing instruction tags from XML using XQuery?
Sample XML:
<a>
<text><?test id="1" loc="start"?><b type="bold">1. </b>
Security or protection <?test id="1" loc=="end"?><?test id="1" loc="start"?><b type="bold">2.
</b> Analyse.
<?test id="1" loc="end"?></text>
</a>
Expected output :
<a>
<text><b type="bold">1. </b> Security or protection <b type="bold">2.
</b> Analyse.</text>
</a>
Kindly help with removing the PI tags.
Something like this should work:
xquery version "1.0-ml";
declare function local:suppress-pi($nodes) {
for $node in $nodes
return
typeswitch ($node)
case element() return
element { fn:node-name($node) } {
$node/@*,
local:suppress-pi($node/node())
}
case processing-instruction() return ()
default return $node
};
local:suppress-pi(<a>
<text><?test id="1" loc="start"?><b type="bold">1. </b>
Security or protection <?test id="1" loc=="end"?><?test id="1" loc="start"?><b type="bold">2.
</b> Analyse.
<?test id="1" loc="end"?></text>
</a>)
HTH!
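For comparison, the same recursive idea (copy everything, drop processing instructions) can be sketched with Python's standard `xml.dom.minidom`. This is not MarkLogic code, just an illustration of the traversal. The function name `strip_pis` and the shortened input string are assumptions for the example.

```python
from xml.dom import Node, minidom

def strip_pis(node):
    """Remove every processing-instruction node below `node`, in place."""
    # Iterate over a copy, because children are removed while looping.
    for child in list(node.childNodes):
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            node.removeChild(child)
        else:
            strip_pis(child)

# Shortened version of the sample input (an assumption for this sketch).
doc = minidom.parseString(
    '<a><text><?test id="1" loc="start"?><b type="bold">1. </b>'
    ' Security or protection<?test id="1" loc="end"?></text></a>'
)
strip_pis(doc)
result = doc.documentElement.toxml()
print(result)  # <a><text><b type="bold">1. </b> Security or protection</text></a>
```

As in the typeswitch version, elements, attributes and text are left untouched. Only the processing-instruction case returns nothing.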

How to get text between two strings with special characters in ruby?

I have a string (@description) that contains HTML code and I want to extract the content between two elements. It looks something like this:
<b>Content title</b><br/>
*All the content I want to extract*
<a href="javascript:print()">
I've managed to do something like this:
@want = @description.match(/Content title(.*?)javascript:print()/m)[1].strip
But obviously this solution is far from perfect, as I get some unwanted characters in my @want string.
Thanks for your help
Edit:
As requested in the comments, here is the full code:
I'm already parsing an HTML document, and the following code:
@description = @doc.at_css(".entry-content").to_s
puts @description
returns:
<div class="post-body entry-content">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"><br><br><div style="text-align: justify;">
Some text</div>
<b>More text</b><br><b>More text</b><br><br><ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br><b>Content Title</b><br>
Some text<br><br>
Some text(with links and images)<br>
Some text(with links and images)<br>
Some text(with links and images)<br>
<br><br><img src="http://url.com/photo.jpg">
<div style="clear: both;"></div>
</div>
The text can include more paragraphs, links, images, etc. but it always starts with the "Content Title" part and ends with the javascript reference.
This XPath expression selects all (sibling) nodes between the nodes $vStart and $vEnd:
$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
To obtain the full XPath expression to use in your specific case, simply substitute $vStart with:
/*/b[. = 'Content Title']
and substitute $vEnd with:
/*/a[@href = 'javascript:print()']
The final XPath expression after the substitutions is:
/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
Explanation:
This is a simple corollary of the Kayessian formula for the intersection of two node-sets $ns1 and $ns2:
$ns1[count(.|$ns2) = count($ns2)]
In our case, the set of all nodes between the nodes $vStart and $vEnd is the intersection of two node-sets: all following siblings of $vStart and all preceding siblings of $vEnd.
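The intersection can also be spelled out by hand, which may make the formula easier to follow. Below is a small sketch using Python's standard `xml.dom.minidom` on a shortened, well-formed stand-in for the document. The variable names (`following`, `preceding`, `between`) and the input string are assumptions for this illustration.

```python
from xml.dom import minidom

# Shortened, well-formed stand-in for the real page (an assumption).
doc = minidom.parseString(
    '<div><p>intro</p><b>Content Title</b><br/>Some text<br/>'
    '<a href="javascript:print()">print</a><p>footer</p></div>'
)
siblings = list(doc.documentElement.childNodes)

# $vStart and $vEnd from the answer above.
start = next(n for n in siblings if n.nodeName == "b")
end = next(
    n for n in siblings
    if n.nodeName == "a" and n.getAttribute("href") == "javascript:print()"
)

# $vStart/following-sibling::node() and $vEnd/preceding-sibling::node()
following = siblings[siblings.index(start) + 1:]
preceding = siblings[:siblings.index(end)]

# Intersection: keep the nodes that appear in both sets (compared by identity).
between = [n for n in following if any(n is m for m in preceding)]

result = "".join(n.toxml() for n in between)
print(result)  # <br/>Some text<br/>
```

The count(.|$ns2) = count($ns2) test in XPath 1.0 does the same membership check: adding a node to $ns2 leaves the count unchanged only if the node was already in $ns2.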
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vStart" select="/*/b[. = 'Content Title']"/>
<xsl:variable name="vEnd" select="/*/a[@href = 'javascript:print()']"/>
<xsl:template match="/">
<xsl:copy-of select=
"$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
"/>
==============
<xsl:copy-of select=
"/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (converted to a well-formed XML document):
<div class="post-body entry-content">
<a href="http://www.photourl">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"/>
</a>
<br />
<br />
<div style="text-align: justify;">
Some text</div>
<b>More text</b>
<br />
<b>More text</b>
<br />
<br />
<ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br />
<b>Content Title</b>
<br />
Some text
<br />
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
<br />
<br />
<a href="javascript:print()">
<img src="http://url.com/photo.jpg"/>
</a>
<div style="clear: both;"></div>
</div>
the two XPath expressions (with and without variable references) are evaluated and the nodes selected in each case, conveniently delimited, are copied to the output:
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
==============
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
To test your HTML, I added wrapper tags around your code and pasted it into a file:
xmllint --html --xpath '/html/body/div/text()' /tmp/l.html
Output:
Some text
Some text
Some text
Some text
Now you can use an XPath module in Ruby and reuse the XPath expression.
You will find many examples by searching Stack Overflow.

xpath syntax in Scrapy

team = hxs.select ('//table[@class="tablehead"/tbody/tr[contains[.@class, "player"]')
The structure of the web site whose table I want to select is as follows:
<html>
<body>
<table>
<tbody>
<tr>
<td>...</td>
<td>...</td>
...
</tr>
</tbody>
</table>
</body>
</html>
Since there are multiple tables in the web site, I only want to select the one whose class is defined as "tablehead". Also, for that table, I only want to select the tags whose class attributes contain the string "player". My attempt above looks a bit spotty to begin with. I tried running the crawler, and it says that the line I produced above is an invalid xpath line. Any advice would be nice.
I've come across these problems before; try omitting tbody from the XPath expression.
//table[@class="tablehead"/tbody/tr[contains[.@class, "player"]
Correcting this results in:
//table[@class='tablehead']/tbody/tr[contains(@class, 'player')]
This selects every tr whose class attribute has a string value containing "player" and that is a child of a tbody that is itself a child of any table in the XML document whose class attribute has the string value "tablehead".
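To see the selection logic in isolation, here is a rough sketch with Python's standard `xml.etree.ElementTree` rather than Scrapy. ElementTree's XPath subset has no contains(), so that part is done in plain Python. The sample markup and the names `rows` and `cells` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Small well-formed sample with two tables (an assumption for this sketch).
HTML = """<html><body>
<table class="other"><tbody><tr class="player"><td>x</td></tr></tbody></table>
<table class="tablehead"><tbody>
<tr class="major-player"><td>player1</td><td>player2</td></tr>
<tr class="header"><td>h</td></tr>
</tbody></table>
</body></html>"""

root = ET.fromstring(HTML)

# Step 1: //table[@class='tablehead']/tbody/tr (exact attribute match).
# Step 2: emulate contains(@class, 'player') with a substring test.
rows = [
    tr
    for tr in root.findall(".//table[@class='tablehead']/tbody/tr")
    if "player" in tr.get("class", "")
]

cells = [[td.text for td in tr] for tr in rows]
print(cells)  # [['player1', 'player2']]
```

Note that the row in the table with class "other" is skipped even though its own class is "player", because the table predicate is applied first.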
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"//table[@class='tablehead']
/tbody/tr[contains(@class, 'player')]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (made just a little bit more realistic):
<html>
<body>
<table class="tablehead">
<tbody>
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>
</tbody>
</table>
</body>
</html>
the XPath expression is evaluated and the selected nodes (just one in this case) are copied to the output:
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>

XPath to find attributes where the name starts with a given value

With this xml:
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
we want to find only
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
which are those nodes having an attribute where the attribute name begins with val
You can try this :
//div/@*[starts-with(name(.), 'val')]
if you know that you are looking for the first attribute of the div element.
Edit:
Sorry, I didn't realize you wanted to select the elements themselves. You could use parent::div or what you did, but the proper way of doing this would be to select the div elements directly:
//div[@*[starts-with(name(), 'val')]]
Have you tried .../@val* ?
which are those nodes having an attribute where the attribute name
begins with val
Use:
//div[@*[starts-with(name(), 'val')]]
This selects any div element in the document, that has at least one attribute, whose name starts with the string "val".
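The predicate reads as "some attribute exists whose name starts with val". A minimal sketch of that same test in Python's standard `xml.etree.ElementTree` follows, assuming the fragment is wrapped in a single root element. The names `matches` and `texts` are made up for the example.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in one root element (an assumption).
XML = """<html>
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
</html>"""

root = ET.fromstring(XML)

# Keep a div if at least one of its attribute names starts with "val",
# mirroring //div[@*[starts-with(name(), 'val')]].
matches = [
    div
    for div in root.iter("div")
    if any(name.startswith("val") for name in div.attrib)
]

texts = [div.text for div in matches]
print(texts)  # ['a', 'b', 'c']
```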
XSLT - based verification:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="//div[@*[starts-with(name(), 'val')]]"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document (produced from the provided XML fragment):
<html>
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
</html>
selects and outputs the wanted nodes:
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
