XQuery - Get nodes from parent matching given pattern - xpath

I want to filter child nodes of an XML node depending on a given pattern.
My XML
<parent>
<total1>10</total1>
<total2>15</total2>
<value1>1</value1>
<value2>2</value2>
</parent>
Filter node matching this given pattern
total*
Expected result
<parent>
<total1>10</total1>
<total2>15</total2>
</parent>
I tried this XQuery, but it doesn't work, and I don't know the best way to approach this:
for $n in //parent/*[starts-with(.,total)]
return $n
I would also like the output to look something like this:
<total1>number(10)</total1>
<total2>number(15)</total2>
that is, with the node value converted from string to number, where number(10) comes from number(text()).

Two small corrections:
You need to test the name of the element; otherwise, the predicate checks whether the string value of the element starts with the given substring.
total needs to be quoted; otherwise, it is treated as an XPath expression (a child element named total).
Here is the corrected query:
for $n in //parent/*[starts-with(name(), 'total')]
return $n
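For readers who do the filtering in a host language instead, here is a minimal sketch of the same idea in Python with the standard library's ElementTree. It is an illustration only, not part of the original answer: the variable names are made up, and float() stands in for XPath's number().

```python
import xml.etree.ElementTree as ET

xml_source = """<parent>
<total1>10</total1>
<total2>15</total2>
<value1>1</value1>
<value2>2</value2>
</parent>"""

root = ET.fromstring(xml_source)

# Same test as starts-with(name(), 'total'): compare the element name,
# not the element's text content.
matches = [child for child in root if child.tag.startswith("total")]

# Build a new <parent> that holds copies of the matching children only.
filtered = ET.Element(root.tag)
for child in matches:
    copy = ET.SubElement(filtered, child.tag)
    copy.text = child.text

# Convert each string value to a number (assumption: float() plays the
# role of number(text()) here).
totals = {child.tag: float(child.text) for child in matches}

print(ET.tostring(filtered, encoding="unicode"))
print(totals)
```

The key point is the same as in the corrected query: the comparison is made against the element name, and the pattern is a quoted string.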

Related

How to get multiple occurrences of an element with XPath when using normalize-space and substring-before

I have an element with three occurrences on the page. If I match it with the XPath expression //div[@class='col-md-9 col-xs-12'], I get all three occurrences as expected.
Now I try to rework the matching element on the fly with
substring-before(//div[@class='col-md-9 col-xs-12'], 'Bewertungen'), to get the string before the word "Bewertungen",
normalize-space(//div[@class='col-md-9 col-xs-12']), to clean up redundant whitespace,
normalize-space(substring-before(//div[@class='col-md-9 col-xs-12'], 'Bewertungen')), to do both.
The problem with these three expressions is that they extract only the first occurrence of the element. It makes no difference whether I add /text() after the matching expression.
I don't understand how adding normalize-space and/or substring-before changes the "main" expression so that it no longer recognizes multiple occurrences of the targeted element and returns only the first. Without the addition it matches everything as it should.
How can I adjust XPath expression no. 3 to get all occurrences of the element?
Example url is https://www.provenexpert.com/de-de/jazzyshirt/
The problem is that both normalize-space() and substring-before() expect a single item as their argument, so they can only process one occurrence of the element you are trying to normalize or take a substring of. Each of your path expressions returns a sequence of three nodes, which these two functions cannot process as a whole.
In light of that, try:
//div[@class='col-md-9 col-xs-12']/substring-before(normalize-space(.), 'Bewertung')
Note that in XPath 1.0, functions like substring-after(), if given a set of three nodes as input, ignore all nodes except the first. XPath 2.0 changes this: it gives you an error.
In XPath 3.1 you can apply a function to each of the nodes using the simple map operator, "!" (available since XPath 3.0): //div[condition] ! substring-before(normalize-space(), 'Bewertung'). That returns a sequence of 3 strings. There's no equivalent in XPath 1.0, because there's no data type in XPath 1.0 that can represent a sequence of strings.
In XPath 2.0 you can often achieve the same effect using "/" instead of "!", but it has restrictions.
When asking questions on StackOverflow, please always mention which version of XPath you are using. We tend to assume that if people don't say, they're probably using 1.0, because 1.0 products don't generally advertise their version number.
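If only XPath 1.0 is available, one option is to select the nodes with XPath and apply the string functions per node in the host language. The following is a rough sketch in Python, assuming a made-up document (the real page markup is not shown in the question) and hand-written helpers normalize_space and substring_before that approximate the XPath functions of the same name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page; only the class name comes from the question.
page = ET.fromstring(
    "<root>"
    "<div class='col-md-9 col-xs-12'>  12   Bewertungen  total</div>"
    "<div class='other'>ignored Bewertungen</div>"
    "<div class='col-md-9 col-xs-12'>\n  7 Bewertungen\n</div>"
    "<div class='col-md-9 col-xs-12'>no marker here</div>"
    "</root>"
)


def normalize_space(value):
    # Approximates XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(value.split())


def substring_before(value, marker):
    # Like XPath substring-before(): empty string when the marker is absent.
    head, sep, _ = value.partition(marker)
    return head if sep else ""


# Select all matching divs first, then process each one individually.
divs = [d for d in page.iter("div") if d.get("class") == "col-md-9 col-xs-12"]
results = [
    substring_before(normalize_space("".join(d.itertext())), "Bewertung")
    for d in divs
]
print(results)
```

This mirrors what the "!" operator does in XPath 3.x: the function is evaluated once per selected node, so the result has one string per node.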

Match a certain subset of nodes by (global) position number

Is there an XPath syntax to match, for instance, the occurrences numbered 2,3,5,7,11,13 of a certain kind of node? That is, the same result as the union of
//item[2]
//item[3]
//item[5]
...
but in a single expression.
(Use case: I am using a Genshi transformer to match and remove a set of nodes. I can't match and remove them in successive expressions, because their indices would change in between.)
You can use the XPath position() function, for example:
//item[position()=2 or position()=3 or position()=5 ...]
or, if by "global position number" you mean the position among all item elements in the document rather than among the item children of each parent, wrap the path in parentheses:
(//item)[position()=2 or position()=3 or position()=5 ...]
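The difference between the two forms can be illustrated outside XPath. The sketch below uses Python's ElementTree on a made-up document with two groups of three items each; the document, the set of wanted positions, and the variable names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two parents with three <item> children each.
doc = ET.fromstring(
    "<root>"
    "<group><item>a1</item><item>a2</item><item>a3</item></group>"
    "<group><item>b1</item><item>b2</item><item>b3</item></group>"
    "</root>"
)
wanted = {2, 3, 5}

# (//item)[position()=2 or ...]: number all items once, in document order.
all_items = list(doc.iter("item"))
global_hits = [
    el.text for pos, el in enumerate(all_items, start=1) if pos in wanted
]

# //item[position()=2 or ...]: positions are counted separately among
# the item children of each parent.
sibling_hits = []
for parent in doc.iter():
    children = parent.findall("item")
    sibling_hits += [
        el.text for pos, el in enumerate(children, start=1) if pos in wanted
    ]

print(global_hits)
print(sibling_hits)
```

With the parenthesized form, position 5 refers to the fifth item in the whole document; without parentheses, no parent here has a fifth item child, while positions 2 and 3 match in every group.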

Using Xpath return text that is positioned after the last comma

Using XPath, I want to return the values 000078 and 000077 from the XML below. The text of the Entity element can contain two, three, or more comma-separated values. I always want the last value.
<Parent ID="123">
<SubParent ID="1">
<Name>Modem</Name>
<Entity>000006,000069,000078</Entity>
</SubParent>
<SubParent ID="2">
<Name>Modem</Name>
<Entity>000006,000077</Entity>
</SubParent>
</Parent>
XPath is a selection language, not a string processing (or general purpose programming) language, and you can only select from the distinct nodes in your document.
The nodes that contain the values you are looking for are two text nodes, '000006,000069,000078' and '000006,000077', so //Entity/text() (or //Entity) is the closest you can get with XPath alone.
Any further string processing, like pulling out the substring after the last comma, must be done in the host language.
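As a sketch of what that host-language step could look like, here is a small Python example using the standard library's ElementTree. The choice of Python and the variable names are assumptions; the question does not say which host language is in use.

```python
import xml.etree.ElementTree as ET

xml_source = """<Parent ID="123">
<SubParent ID="1">
<Name>Modem</Name>
<Entity>000006,000069,000078</Entity>
</SubParent>
<SubParent ID="2">
<Name>Modem</Name>
<Entity>000006,000077</Entity>
</SubParent>
</Parent>"""

root = ET.fromstring(xml_source)

# Select every Entity element (the //Entity part), then split off the
# text after the last comma in the host language.
last_values = [
    entity.text.rsplit(",", 1)[-1].strip() for entity in root.iter("Entity")
]
print(last_values)
```

rsplit with a limit of 1 splits only at the last comma, so it works regardless of how many values the element contains.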
This is one of the examples that show that storing opaque strings that contain multiple data points (like comma-separated values) in XML is a bad idea.
This is what your XML should look like:
<Parent ID="123">
<SubParent ID="1">
<Name>Modem</Name>
<Entity>000006</Entity>
<Entity>000069</Entity>
<Entity>000078</Entity>
</SubParent>
<SubParent ID="2">
<Name>Modem</Name>
<Entity>000006</Entity>
<Entity>000077</Entity>
</SubParent>
</Parent>
because then you could easily select //Entity[last()]/text() and get exactly two nodes.

How to match text sequences that continue through child nodes (e.g. with sgml-style markup)?

<bits>
<thing>Match this please</thing>
<thing>Don't match this</thing>
<thing>Match <b>this</b> please</thing>
</bits>
An expression like this:
//thing[text()='Match this please']
will locate the first 'thing' but not the third, because the phrase is distributed through a child node.
Is there an expression that would match the first and the third 'thing' in my example?
Try:
//thing[string()='Match this please']
jsfiddle:
http://jsfiddle.net/ZG9n3/2/
Please check the reference to see if this is going to work for you:
http://www.w3.org/TR/xpath/#function-string
"Is there an expression that would match the first and the third 'thing' in my example?"
You mean: Is there an expression that would select the first and the third element named thing, based on their string value?
Use:
/*/thing[. = 'Match this please']
The predicate compares the string value of the context node to the string "Match this please".
By definition, the string value of an element is the concatenation (in document order) of all of its text node descendants.
Note: Always try to avoid the // abbreviation, because its use may be significantly inefficient. Whenever the structure of an XML document is known, use a chain of specific location steps.
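To make the difference between comparing text node children and comparing the string value concrete, here is a small sketch in Python's ElementTree. The helpers string_value and text_node_children are hypothetical names written for this illustration; they emulate the XPath concepts rather than evaluate the expressions themselves.

```python
import xml.etree.ElementTree as ET

bits = ET.fromstring(
    "<bits>"
    "<thing>Match this please</thing>"
    "<thing>Don't match this</thing>"
    "<thing>Match <b>this</b> please</thing>"
    "</bits>"
)
target = "Match this please"


def string_value(element):
    # Concatenation of all descendant text nodes, in document order.
    return "".join(element.itertext())


def text_node_children(element):
    # Text nodes that are direct children: leading text plus child tails.
    nodes = [element.text] if element.text else []
    nodes += [child.tail for child in element if child.tail]
    return nodes


things = bits.findall("thing")

# Emulates thing[. = '...']: compare the whole string value.
by_string_value = [
    i for i, t in enumerate(things, start=1) if string_value(t) == target
]

# Emulates thing[text() = '...']: true only if some text node child matches.
by_text_node = [
    i for i, t in enumerate(things, start=1) if target in text_node_children(t)
]

print(by_string_value)
print(by_text_node)
```

The third thing has the text node children "Match " and " please", neither of which equals the target, while its string value does.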

How do I construct an XPath to select items that do not contain a string

How do I use something similar to the example below, but with the opposite result, i.e. items that do not contain (default) in the text?
<test>
<item>Some text (default)</item>
<item>Some more text</item>
<item>Even more text</item>
</test>
Given this
//test/item[contains(text(), '(default)')]
would return the first item. Is there a not operator that I can use with contains?
Yes, there is:
//test/item[not(contains(text(), '(default)'))]
Hint: not() is a function in XPath instead of an operator.
An alternative, possibly better way to express this is:
//test/item[not(text()[contains(., '(default)')])]
There is a subtle but important difference between the two expressions (let's call them A and B, respectively).
Simple case: If all <item> only have a single text node child, both A and B behave the same.
Complex case: If <item> can have multiple text node children, expression A only looks at the first of them, so it excludes an item only when '(default)' occurs in its first text node.
This is because text() matches all text node children and produces a node-set. So far no surprise. Now, contains() accepts a node-set as its first argument, but it needs to convert it to a string to do its job. Converting a node-set to a string only produces the string value of the first node in the set; all other nodes are disregarded (try string(//item) to see what I mean). In the simple case this is exactly what happens as well, but the result is not as surprising.
Expression B deals with this by explicitly checking every text node child individually instead of only the first one. It's therefore the more robust of the two.
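The following sketch emulates both expressions in Python to show where they diverge. It assumes XPath 1.0 semantics as described above, adds a hypothetical fourth item with mixed content to the sample, and uses a made-up helper text_node_children.

```python
import xml.etree.ElementTree as ET

test = ET.fromstring(
    "<test>"
    "<item>Some text (default)</item>"
    "<item>Some more text</item>"
    "<item>Even more text</item>"
    # Hypothetical extra item: '(default)' sits in the second text node.
    "<item>Mixed <b>content</b> (default)</item>"
    "</test>"
)
needle = "(default)"


def text_node_children(element):
    # Text nodes that are direct children: leading text plus child tails.
    nodes = [element.text] if element.text else []
    nodes += [child.tail for child in element if child.tail]
    return nodes


def expression_a(item):
    # not(contains(text(), needle)): only the first text node is converted.
    nodes = text_node_children(item)
    first = nodes[0] if nodes else ""
    return needle not in first


def expression_b(item):
    # not(text()[contains(., needle)]): every text node child is checked.
    return not any(needle in node for node in text_node_children(item))


items = test.findall("item")
selected_a = [i for i, it in enumerate(items, start=1) if expression_a(it)]
selected_b = [i for i, it in enumerate(items, start=1) if expression_b(it)]
print(selected_a)
print(selected_b)
```

Under these assumptions, expression A still selects the fourth item even though it contains '(default)', because the match is in its second text node; expression B excludes it.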
