XPath: how to select nodes by IN condition? - xpath

Is it possible to select nodes with something like an SQL IN condition, for example:
'./tr[position() in (1, 3, 7)]'
I found only this solution:
'./tr[position() = 1 or position() = 3 or position() = 7]'

In XPath 2.0 you would simply do:
./tr[position() = (1,3,7)]
In XPath 1.0, the usual way to do it is the solution you already found. A slightly shorter alternative would be something like:
./tr[contains('1 3 7', position())]
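The spaces in the string are essential here: with '137' you would also get nodes 13, 37 and 137. Note that the trick is only safe for single-digit positions, because contains() is a substring test. With a list such as '3 13', position 1 would match as well.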
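If the list may contain multi-digit positions, one possible XPath 1.0 workaround is to pad both the list and the position with a delimiter, so that only whole entries match. Below is a minimal sketch using the JAXP API that ships with the JDK. The class name PositionInList, the helper methods and the generated sample table are assumptions made for illustration, not part of the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionInList {
    // Builds <table><tr>r1</tr>...<tr>rN</tr></table> as hypothetical sample input.
    static String rows(int n) {
        StringBuilder xml = new StringBuilder("<table>");
        for (int i = 1; i <= n; i++) {
            xml.append("<tr>r").append(i).append("</tr>");
        }
        return xml.append("</table>").toString();
    }

    // Evaluates an XPath 1.0 expression and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = rows(14);
        // Plain substring test: row 1 is selected too, because "1" occurs inside "13".
        System.out.println(select(xml, "/table/tr[contains('3 13', position())]"));
        // Padded test: the position must appear as a whole, space-delimited entry.
        System.out.println(select(xml,
                "/table/tr[contains(' 3 13 ', concat(' ', position(), ' '))]"));
    }
}
```

The padded form costs a few more characters than the plain contains() call, but it no longer depends on every position in the list being a single digit.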

Related

Xpath query that check whether 2 element text values exist in a list

I am trying to figure out a way to check whether 2 element text values exist in a list using Xpath query. I am limited to Xpath 1.0 at the moment.
I have a body of XML with a subtree that contains a list. It looks like this:
<tags>
<tag>A</tag>
<tag>B</tag>
<tag>C</tag>
<tag>E</tag>
</tags>
I want a query that returns a node-set if the list contains tag text values equal to both 'A' and 'C'. (The query runs against a file with many of the XML tag lists described above.)
This is my current best effort:
descendant-or-self::node()[local-name(.) = 'tag' and text() = 'A' and (preceding-sibling::text() = 'C' or following-sibling::text() = 'C')]
It does not return what I expect, probably because of my nested condition and my use of preceding-sibling and following-sibling.
Is it possible to nest conditions in this way?
Thanks in advance for any help and insights provided!
if ( //tag/text() = 'A' and //tag/text() = 'C' ) then //tag[text() = 'A' or text() = 'C'] else //empty
This is the solution that I arrived at:
//descendant-or-self::node()[local-name(.) = 'tag' and text() = 'A' and (preceding-sibling::node()[text() = 'C'] or following-sibling::node()[text() = 'C']) ]
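Another way to phrase the same check in XPath 1.0 is to put the condition on the parent list instead of on siblings: a comparison such as tag = 'A' is true when any tag child has that value. The sketch below tries this with JAXP. The class name TagListCheck, the doc wrapper element and the second list are assumptions added for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagListCheck {
    // Two hypothetical lists: only the first one contains both 'A' and 'C'.
    static final String XML = "<doc>"
            + "<tags><tag>A</tag><tag>B</tag><tag>C</tag><tag>E</tag></tags>"
            + "<tags><tag>A</tag><tag>D</tag></tags>"
            + "</doc>";

    // Evaluates an XPath 1.0 expression and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Keep only lists that hold both values, then return the matching tags.
        System.out.println(select(XML,
                "//tags[tag = 'A' and tag = 'C']/tag[. = 'A' or . = 'C']"));
    }
}
```

Lists that contain only one of the two values contribute nothing, so an empty result means no list held both 'A' and 'C'.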

How can I select the second last item in a xpath query?

I'm new to xpath and I understand how to get a range of values in xpath:
/bookstore/book[position()>=2 and position()<=10]
but in my case, I need to get everything from position 2 up to one less than the total (so if there are 10, I need up to the 9th, and if there are 5, up to the 4th). I'm applying my code to different pages and the number of entries is not always the same.
In Python, I could do something like book[2:-2], but I'm unsure if I can do this within XPath.
You can use last() which represents the last item in the context:
/bookstore/book[position()>=2 and position() <= (last() - 1)]
In my case, this worked to get the last-but-one element:
/bookstore/book[position() = (last() - 1)]
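To see both expressions in action, here is a small JAXP sketch. The class name BookRange, the books helper and the generated five-book document are assumptions made for illustration. It also uses the shorthand book[last() - 1], where a numeric predicate is compared against position().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookRange {
    // Builds <bookstore><book>b1</book>...<book>bN</book></bookstore> as sample input.
    static String books(int n) {
        StringBuilder xml = new StringBuilder("<bookstore>");
        for (int i = 1; i <= n; i++) {
            xml.append("<book>b").append(i).append("</book>");
        }
        return xml.append("</bookstore>").toString();
    }

    // Evaluates an XPath 1.0 expression and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = books(5);
        // From the second book up to the last-but-one book.
        System.out.println(select(xml,
                "/bookstore/book[position() >= 2 and position() <= last() - 1]"));
        // Only the last-but-one book; a numeric predicate is tested against position().
        System.out.println(select(xml, "/bookstore/book[last() - 1]"));
    }
}
```

Because last() is evaluated per document, the same expressions keep working when the number of book elements changes from page to page.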

xpath, how to select more than one item using indices

In this query, I select the 3rd <td>:
//tablecontainer/table/tbody/tr/td[3]
How do I select both the 3rd and 4th <td>s?
To get both the 3rd and 4th tds, you can use the expression:
//tablecontainer/table/tbody/tr/td[position() >= 3 and position() <= 4]
//tablecontainer/table/tbody/tr/td[position()=3 or position()=4]
If you can use XPath 2.0, you could use the following trick (shown here for the 1st, 2nd and 4th cells):
//tablecontainer/table/tbody/tr/td[position() = (1,2,4)]
The test position() = (1,2,4) works much like IN in SQL: it is true when position() equals any item of the sequence. Note the parentheses around (1,2,4).
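One detail worth checking when selecting by index: in a path such as tr/td[...], position() is counted within each parent row, not across the whole result. The JAXP sketch below compares that with a parenthesized node-set, which is indexed as a whole. The class name CellPositions and the simplified two-row table are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CellPositions {
    // Hypothetical table with two rows of five cells each (a1..a5 and b1..b5).
    static String table() {
        StringBuilder xml = new StringBuilder("<table>");
        for (String row : new String[] {"a", "b"}) {
            xml.append("<tr>");
            for (int i = 1; i <= 5; i++) {
                xml.append("<td>").append(row).append(i).append("</td>");
            }
            xml.append("</tr>");
        }
        return xml.append("</table>").toString();
    }

    // Evaluates an XPath 1.0 expression and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Per row: the 3rd and 4th cell of every tr.
        System.out.println(select(table(), "//tr/td[position() = 3 or position() = 4]"));
        // Whole document: the 3rd and 4th td overall, both from the first row.
        System.out.println(select(table(), "(//tr/td)[position() = 3 or position() = 4]"));
    }
}
```

For the question above, the per-row form is the one that returns the 3rd and 4th cell of each row.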

XPath :: running counter two levels

Using the count(preceding-sibling::*) XPath expression, one can obtain incrementing counters. However, can the same also be accomplished in a two-level-deep sequence?
example XML instance
<grandfather>
<father>
<child>a</child>
</father>
<father>
<child>b</child>
<child>c</child>
</father>
</grandfather>
code (with Saxon HE 9.4 jar on the CLASSPATH for XPath 2.0 features)
I am trying to get a counter sequence of 1, 2 and 3 for the three child nodes with different kinds of XPath expressions:
XPathExpression expr = xpath.compile("/grandfather/father/child");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
System.out.printf("child's index is: %s %s %s, name is: %s\n"
,xpath.compile("count(preceding-sibling::*)").evaluate(node)
,xpath.compile("count(preceding-sibling::child)").evaluate(node)
,xpath.compile("//child/position()").evaluate(doc)
,xpath.compile(".").evaluate(node));
}
The above code prints:
child's index is: 0 0 1, name is: a
child's index is: 0 0 1, name is: b
child's index is: 1 1 1, name is: c
None of the three XPaths I tried managed to produce the correct sequence: 1, 2, 3. Clearly it can trivially be done using the loop variable i, but I want to accomplish it with XPath if possible. I also need to keep the basic framework of evaluating an XPath expression to get all the nodes to visit and then iterating over that set, since that is how the real application I work on is structured. Basically, I visit each node and then need to evaluate a number of XPath expressions on it (node) or on the document (doc); one of these XPath expressions is supposed to produce this incrementing sequence.
Use the preceding axis with a name test instead.
count(preceding::child)
Using XPath 2.0, there is a much better way to do this. Fetch all <child/> nodes and use the position() function to get the index:
//child/concat("child's index is: ", position(), ", name is: ", text())
You don't say efficiency is important, but I really hate to see this done with O(n^2) code! Jens' solution shows how to avoid that if you can use the result in the form of a sequence of (position, name) pairs. You could also return an alternating sequence of strings and numbers using //child/(string(.), position()), though you would then want to use the s9api API rather than JAXP, because JAXP can only really handle the data types that arise in XPath 1.0.
If you need to compute the index of each node as part of other processing, it might still be worth computing the index for every node in a single initial pass, and then looking it up in a table. But if you're doing that, the simplest way is surely to iterate over the result of //child and build a map from nodes to the sequence number in the iteration.
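Both suggestions can be sketched with plain JAXP and XPath 1.0. The first method below evaluates count(preceding::child) + 1 on each node (the + 1 turns the zero-based count into the 1, 2, 3 sequence asked for). The second builds the node-to-index map in a single pass. The class name ChildCounter and the method names are assumptions made for illustration, and the whitespace of the sample XML is dropped for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildCounter {
    static final String XML = "<grandfather><father><child>a</child></father>"
            + "<father><child>b</child><child>c</child></father></grandfather>";

    // Selects all child elements in document order.
    static List<Node> children(XPath xpath, String xml) throws Exception {
        Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) xpath.evaluate(
                "/grandfather/father/child", doc, XPathConstants.NODESET);
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i));
        }
        return result;
    }

    // Per-node XPath: simple, but each evaluation walks all preceding nodes.
    static List<String> viaXPath(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> out = new ArrayList<>();
            for (Node node : children(xpath, xml)) {
                Double index = (Double) xpath.evaluate(
                        "count(preceding::child) + 1", node, XPathConstants.NUMBER);
                out.add(index.intValue() + ":" + node.getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Single pass: record each node's sequence number, then look it up later.
    static List<String> viaMap(String xml) {
        try {
            List<Node> nodes = children(XPathFactory.newInstance().newXPath(), xml);
            Map<Node, Integer> indexOf = new IdentityHashMap<>();
            for (int i = 0; i < nodes.size(); i++) {
                indexOf.put(nodes.get(i), i + 1);
            }
            List<String> out = new ArrayList<>();
            for (Node node : nodes) {
                out.add(indexOf.get(node) + ":" + node.getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaXPath(XML));
        System.out.println(viaMap(XML));
    }
}
```

The map variant keeps the index available to any later per-node processing without re-evaluating an XPath expression for each node.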

How to select all nodes such that their group size is higher than a given value, in XPath

I would like to select all <mynode> elements that have a value that appears a certain number of times (say, x) in all the elements.
Example:
<root>
<mynode>
<attr1>value_1</attr1>
<attr2>value_2</attr2>
</mynode>
<mynode>
<attr1>value_3</attr1>
<attr2>value_3</attr2>
</mynode>
<mynode>
<attr1>value_4</attr1>
<attr2>value_5</attr2>
</mynode>
<mynode>
<attr1>value_6</attr1>
<attr2>value_5</attr2>
</mynode>
</root>
In this case, I want all the <mynode> elements whose attr2 value occurs more than once (x = 1), that is, the last two <mynode>s.
Which query do I have to run to achieve this?
If you're using XPath 2.0 or greater, then the following will work:
for $value in distinct-values(/root/mynode/attr2)
return
if (count(/root/mynode[attr2 = $value]) > 1) then
/root/mynode[attr2 = $value]
else ()
For a more detailed discussion see: XPath/XSLT nested predicates: how to get the context of outer predicate?
This is also possible in plain XPath 1.0 (it also works in newer versions of XPath), and is probably easier to read. Think of the problem as looking for all <mynode/>s that have an <attr2/> node whose value also occurs before or after that <mynode/>:
//mynode[attr2 = preceding::attr2 or attr2 = following::attr2]
If <attr2/> nodes can also occur inside other elements and you do not want to test for those:
//mynode[attr2 = preceding::mynode/attr2 or attr2 = following::mynode/attr2]
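The preceding/following test covers the "more than once" case. For a larger threshold x, one option is to do the counting in the host language instead. The sketch below runs the XPath 1.0 expression above with JAXP and, as an assumed alternative, counts attr2 values in a map. The class name GroupSize and the method attr1WhereAttr2CountAbove are hypothetical names, and the attr1 text is returned only to identify each selected <mynode>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GroupSize {
    // The sample document from the question, without whitespace.
    static final String XML = "<root>"
            + "<mynode><attr1>value_1</attr1><attr2>value_2</attr2></mynode>"
            + "<mynode><attr1>value_3</attr1><attr2>value_3</attr2></mynode>"
            + "<mynode><attr1>value_4</attr1><attr2>value_5</attr2></mynode>"
            + "<mynode><attr1>value_6</attr1><attr2>value_5</attr2></mynode>"
            + "</root>";

    static NodeList nodes(XPath xpath, String xml, String expression) throws Exception {
        Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    // Evaluates an XPath 1.0 expression and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            NodeList found = nodes(XPathFactory.newInstance().newXPath(), xml, expression);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                texts.add(found.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns attr1 of every mynode whose attr2 value occurs more than x times.
    static List<String> attr1WhereAttr2CountAbove(String xml, int x) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attr2 = nodes(xpath, xml, "/root/mynode/attr2");
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < attr2.getLength(); i++) {
                counts.merge(attr2.item(i).getTextContent(), 1, Integer::sum);
            }
            List<String> out = new ArrayList<>();
            for (int i = 0; i < attr2.getLength(); i++) {
                Node node = attr2.item(i);
                if (counts.get(node.getTextContent()) > x) {
                    out.add(xpath.evaluate("../attr1", node));
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, "//mynode[attr2 = preceding::mynode/attr2"
                + " or attr2 = following::mynode/attr2]/attr1"));
        System.out.println(attr1WhereAttr2CountAbove(XML, 1));
    }
}
```

With x = 1 both approaches should pick the last two <mynode> elements of the sample. The map version visits each attr2 node twice rather than comparing every node against all others.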
