XPath combine predicates with common ancestor?

I want the immediate tr children of a table, optionally wrapped in tbody:
//table[complex-predictor]/tbody/tr | //table[complex-predictor]/tr
I want to combine the predicates as:
//table[complex-predictor](/tbody/tr | /tr)
But it doesn't work. What is the correct way to do this?
Btw, I don't want tr elements deep in the table
(/tbody/tr/td/table/tbody/tr)

This is one possible way:
//table//*[self::th|self::tr]
The main path returns all descendant elements of every table, and the predicate (the expression in []) then filters those descendants down to only th and tr elements.
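A minimal sketch of how this filter behaves, assuming Java's built-in javax.xml.xpath engine (XPath 1.0) and a small hypothetical table whose second row holds a nested table. The class DescendantRowFilter and its names helper are made up for illustration; the program prints the names of the matched elements in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DescendantRowFilter {
    // Hypothetical sample: the second row of the outer table holds a nested table.
    static final String XML =
        "<table>"
        + "<tr><th>h</th></tr>"
        + "<tr><td><table><tr><td>n</td></tr></table></td></tr>"
        + "</table>";

    // Evaluates an XPath 1.0 expression with the JDK's built-in engine and
    // returns the names of the selected elements in document order.
    static List<String> names(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // //table//* visits every descendant element of every table;
        // the predicate [self::th|self::tr] keeps only th and tr elements.
        System.out.println(names(XML, "//table//*[self::th|self::tr]"));
    }
}
```

On this sample it should print [tr, th, tr, tr]. The last tr belongs to the nested table, so the descendant axis on its own does not keep rows of inner tables out of the result.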
"Btw, i don't want tr deep in table
(/tbody/tr/td/table/tbody/tr)"
In XPath 2.0 or above you can do:
//table[complex-predictor]/(tbody/tr|tr)
But in XPath 1.0, I don't see a clean way to get this done without repeating the complex-predictor.
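For comparison, here is a sketch in Java (built-in javax.xml.xpath, so XPath 1.0 only) that evaluates the repeated-predicate union next to a variant that is not from the answer above. The variant starts from each tr and applies the predicate once, to the union of its parent table and its tbody's parent table. @class='data' is a hypothetical stand-in for complex-predictor, and TableRows and count are made-up names. The variant assumes the predicate does not rely on position() or last(), because inside it the context is that one-node union rather than //table.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableRows {
    // Hypothetical sample; @class='data' stands in for complex-predictor.
    static final String XML =
        "<root>"
        + "<table class='data'><tbody>"
        + "<tr><td>1</td></tr>"
        + "<tr><td><table><tbody><tr><td>nested</td></tr></tbody></table></td></tr>"
        + "</tbody></table>"
        + "<table class='data'><tr><td>3</td></tr></table>"
        + "<table class='other'><tr><td>x</td></tr></table>"
        + "</root>";

    // Plain XPath 1.0 union: the predicate is written once per branch.
    static final String REPEATED =
        "//table[@class='data']/tbody/tr | //table[@class='data']/tr";

    // XPath 1.0 variant: from each tr, apply the predicate once to the union
    // of its parent table and its grandparent table (reached via tbody).
    static final String SINGLE =
        "//tr[(parent::table | parent::tbody/parent::table)[@class='data']]";

    // Number of nodes the expression selects in XML.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList rows = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return rows.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(REPEATED));
        System.out.println(count(SINGLE));
    }
}
```

Both expressions should select the same three rows here. The tr inside the nested table is excluded in both cases because its own table fails the predicate.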

Related

XPath filtering items

I have a short question. How can I display only the elements whose value is '.'?
I have no idea how to do that. I'm a newbie in XPath.
<SalesTransaction>
<TransactionHeader>
<TransactionHeaderFields>
<WrntyID>a</WrntyID>
<ExternalID/>
<Type>.</Type>
<Status>
Submited
</Status>
<CreationDate>
2015-01-12
</CreationDate>
<Date>
2015-01-12T11:41:29Z
</Date>
<DeliveryDate>
2015-01-12
</DeliveryDate>
<Remark/>
</TransactionHeaderFields>
<CatalogFields>
<CatalogID>
saf
</CatalogID>
</CatalogFields>
</TransactionHeader>
</SalesTransaction>
Ignoring any of the structure and just looking for any element whose text() is equal to ".", you could use:
//*[text()='.']
//* will search through the entire tree structure, looking for any element at any level.
[text()='.'] is a predicate filter (kind of like a WHERE clause in SQL) that performs a test on each of those matched elements. Only the elements that have a text() node whose value is equal to . evaluate to true() and remain in the result.
It's not the most efficient XPath expression, but it may be good enough for what you need.
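To try the expression, here is a sketch using Java's built-in javax.xml.xpath API against a trimmed, hypothetical copy of the question's XML. The class name DotValues and its names helper are illustrative only. Because text()='.' compares the raw text node, a value written with surrounding whitespace (like the Status content) would not match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DotValues {
    // Trimmed, hypothetical copy of the question's document.
    static final String XML =
        "<SalesTransaction><TransactionHeader><TransactionHeaderFields>"
        + "<WrntyID>a</WrntyID><ExternalID/><Type>.</Type>"
        + "<Status>\n  Submited\n</Status><Remark/>"
        + "</TransactionHeaderFields>"
        + "<CatalogFields><CatalogID>\n  saf\n</CatalogID></CatalogFields>"
        + "</TransactionHeader></SalesTransaction>";

    // Returns the names of the elements selected by expr, in document order.
    static List<String> names(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Every element, kept only if one of its text nodes is exactly ".".
        System.out.println(names(XML, "//*[text()='.']"));
    }
}
```

On this sample only the Type element is selected.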

XPath 1.0 exclusive or node-set expression

What I need doesn't quite seem to match what other articles of a similar title are about.
I need, using XPath 1.0, to be able to get node a or node b, exclusively, in that order.
That is, node a if it exists, otherwise, node b.
An XPath expression such as:
expression | expression
will get me both if they both exist, which is not what I want.
I could go:
(expression | expression)[last()]
Which does in fact get me what I need (in my case), but seems to be a bit inefficient, because it will evaluate both sides of the expression before the last result is selected.
I was hoping for an expression that stops evaluating once the left side succeeds.
A more concrete XML example:
<one>
<two>
<three>hello</three>
<four>bye</four>
</two>
<blahfive>again</blahfive>
</one>
and the XPath that works (but is inefficient):
(/one/*[starts-with(local-name(.), 'blah')] | .)[last()]
To be clear, I would like to grab the immediate child node of 'one' which starts with 'blah'. However, if it doesn't exist, I would like only the current node.
If the 'blah' node does exist, I do not want the current node.
Is there a more efficient way to achieve this?
I need, using XPath 1.0, to be able to get node a or node b, exclusively, in that order. That is, node a if it exists, otherwise, node b.
An XPath expression such as:
expression | expression
will get me both if they both exist, which is not what I want.
I could go:
(expression | expression)[last()]
Which does in fact get me what I need (in my case),
This statement is not true in general: which node [last()] picks depends on document order, not on the order of the operands.
Here is an example. Let us have this XML document:
<one>
<a/>
<b/>
</one>
Expression1 is:
/*/a
Expression2 is:
/*/b
Your composite expression:
(Expression1 | Expression2)[last()]
when we substitute the two expressions above is:
(/*/a | /*/b)[last()]
And this expression actually selects b, not a, because b is the last of the two in document order.
Now, here is an expression that selects just a if it exists, and selects b only if a doesn't exist, regardless of document order:
/*/a | /*/b[not(/*/a)]
When this expression is evaluated on the XML document above, it selects a, regardless of its document order. Try swapping the positions of a and b in the XML document above to confirm that in both cases the selected element is a.
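The swap test can be run directly. This is a sketch with Java's built-in javax.xml.xpath (XPath 1.0) engine; FirstAvailable and select are hypothetical names, and the three documents are the a/b example above plus its swapped and a-less variants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstAvailable {
    // Picks whichever of a/b comes last in document order.
    static final String LAST_OF_UNION = "(/*/a | /*/b)[last()]";
    // Picks a if present, otherwise b, independent of document order.
    static final String A_ELSE_B = "/*/a | /*/b[not(/*/a)]";

    // Comma-joined names of the elements selected by expr.
    static String select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return String.join(",", names);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {"<one><a/><b/></one>", "<one><b/><a/></one>", "<one><b/></one>"};
        for (String xml : docs) {
            System.out.println(xml + " last: " + select(xml, LAST_OF_UNION)
                    + " fallback: " + select(xml, A_ELSE_B));
        }
    }
}
```

On <one><a/><b/></one> the [last()] form yields b while the fallback form yields a; after swapping, both yield a; with no a at all, both yield b.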
To summarize, one expression that selects the wanted node regardless of any document order is:
Expression1 | Expression2[not(Expression1)]
Let us apply this general expression in your case:
Expression1 is:
/one/*[starts-with(local-name(.), 'blah')]
Expression2 is:
self::node()
The wanted expression (after substituting Expression1 and Expression2 in the above general expression) is:
/one/*[starts-with(local-name(.), 'blah')]
|
self::node()[not(/one/*[starts-with(local-name(.), 'blah')])]
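Here is a sketch that evaluates this combined expression in Java (built-in javax.xml.xpath). The question does not say which node is current, so this hypothetically assumes the context node is /one/two. BlahOrSelf and select are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BlahOrSelf {
    // Expression1 | Expression2[not(Expression1)] with the question's parts.
    static final String EXPR =
        "/one/*[starts-with(local-name(.), 'blah')]"
        + " | self::node()[not(/one/*[starts-with(local-name(.), 'blah')])]";

    static String select(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Hypothetical choice of "current node": the two element.
            Node context = (Node) xpath.evaluate("/one/two", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(EXPR, context, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return String.join(",", names);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("<one><two><three>hello</three><four>bye</four></two>"
                + "<blahfive>again</blahfive></one>"));
        System.out.println(select("<one><two><three>hello</three><four>bye</four></two></one>"));
    }
}
```

With the question's XML it should return blahfive. With blahfive removed it falls back to the context node, two.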

XPath expression for "or" operator

Can anyone please help me? I want to use the or operator in my XPath expression to select all input or all a elements from an HTML page.
My expression is like this:
document.DocumentNode.SelectNodes("//input or //a");
But I'm having errors.
You can use the union operator:
//input | //a
Or an expression like this, which may perform somewhat better:
//*[self::input or self::a]
The or operator is boolean OR in XPath, so //input or //a is a boolean expression which will return true if either of the node sets //input and //a are non-empty (i.e. within your source document there is at least one input element or one a element or both) and false otherwise.
Instead you're looking for the | operator which is the "union" operation on node sets.
//input | //a
will give you a set containing all the input elements and all the a elements.
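The following sketches the difference in Java. The question uses C# SelectNodes, so this swaps in the JDK's javax.xml.xpath API instead. The page is a small hypothetical XHTML fragment, because the JDK parser needs well-formed XML. OrVersusUnion, bool and count are made-up names.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrVersusUnion {
    // Hypothetical well-formed page with two input and two a elements.
    static final String PAGE =
        "<html><body>"
        + "<form><input name='q'/><a href='#top'>top</a></form>"
        + "<p><a href='#end'>end</a><input name='r'/></p>"
        + "</body></html>";

    private static Object eval(String expr, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, resultType);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // "or" is boolean OR: true if either operand node-set is non-empty.
    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    // "|" is node-set union: one set holding the nodes of both operands.
    static int count(String expr) {
        return ((NodeList) eval(expr, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) {
        System.out.println(bool("//input or //a"));
        System.out.println(count("//input | //a"));
        System.out.println(count("//*[self::input or self::a]"));
    }
}
```

Here //input or //a is simply true, while both node-set forms select the same four elements.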

XPath combine node sets

Let's say I have two expressions: //div[@class="foo"] and //span[@class="foo"]. Is it possible to "combine" them, like so:
//(div | span)[@class="foo"]
Or can I only take the union of the two complete expressions?
//div[@class="foo"] | //span[@class="foo"]
A more idiomatic (and dare I say readable) way to get all of the div and span elements having class="foo" is this:
//*[(self::div or self::span) and @class="foo"]
In English:
Select all elements that are themselves a div or a span and that have a class attribute whose value is 'foo'
As for your original question, the following expressions return equivalent results:
(//div | //span)[@class="foo"]
//div[@class="foo"] | //span[@class="foo"]
The first gives you the set that is the union of all the div and span elements in the document, further filtered to include only those having class="foo" while the latter gives you the union of 1) the set of all div elements having class="foo" and 2) the set of all span elements having class="foo".
It should be fairly obvious that those two sets contain the same thing.
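A quick Java check that the three formulations agree, using the built-in javax.xml.xpath engine on a hypothetical document. CombineNodeSets and names are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CombineNodeSets {
    // Hypothetical sample: only the first div and first span should match.
    static final String XML =
        "<root><div class='foo'/><span class='foo'/><div class='bar'/>"
        + "<span/><p class='foo'/></root>";

    static final String[] EXPRESSIONS = {
        "(//div | //span)[@class='foo']",                   // filter the union
        "//div[@class='foo'] | //span[@class='foo']",       // union of filtered sets
        "//*[(self::div or self::span) and @class='foo']"   // one step, self:: tests
    };

    // Names of the selected elements, in document order.
    static List<String> names(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String expr : EXPRESSIONS) {
            System.out.println(expr + " -> " + names(expr));
        }
    }
}
```

Each expression should select [div, span]: the div and span with class="foo", but not the p that also has class="foo".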
This construct works:
(//golfer | //batter)[@ID="2" or @ID="3"]
...much to my astonishment.

XPath :: running counter two levels

Using the count(preceding-sibling::*) XPath expression, one can obtain incrementing counters. However, can the same also be accomplished in a two-level-deep sequence?
example XML instance
<grandfather>
<father>
<child>a</child>
</father>
<father>
<child>b</child>
<child>c</child>
</father>
</grandfather>
code (with Saxon HE 9.4 jar on the CLASSPATH for XPath 2.0 features)
Trying to get a counter sequence of 1, 2 and 3 for the three child nodes with different kinds of XPath expressions:
XPathExpression expr = xpath.compile("/grandfather/father/child");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
System.out.printf("child's index is: %s %s %s, name is: %s\n"
,xpath.compile("count(preceding-sibling::*)").evaluate(node)
,xpath.compile("count(preceding-sibling::child)").evaluate(node)
,xpath.compile("//child/position()").evaluate(doc)
,xpath.compile(".").evaluate(node));
}
The above code prints:
child's index is: 0 0 1, name is: a
child's index is: 0 0 1, name is: b
child's index is: 1 1 1, name is: c
None of the three XPath expressions I tried managed to produce the correct sequence: 1, 2, 3. Clearly it can trivially be done using the i loop variable, but I want to accomplish it with XPath if possible. I also need to keep the basic framework of evaluating an XPath expression to get all the nodes to visit and then iterating over that set, since that's the way the real application I work on is structured. Basically I visit each node and then need to evaluate a number of XPath expressions on it (node) or on the document (doc); one of these XPath expressions is supposed to produce this incrementing sequence.
Use the preceding axis with a name test instead.
count(preceding::child)
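Here is a sketch of that answer inside the question's own loop structure. It uses the JDK's javax.xml.xpath instead of Saxon, since XPath 1.0 is enough for this expression. On its own, count(preceding::child) gives 0, 1, 2; adding 1, as below, gives the 1, 2, 3 the question asks for. PrecedingCounter and indexes are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrecedingCounter {
    // The question's example document.
    static final String XML =
        "<grandfather>"
        + "<father><child>a</child></father>"
        + "<father><child>b</child><child>c</child></father>"
        + "</grandfather>";

    // Visits each child and counts the child elements that precede it anywhere
    // in the document (the preceding axis excludes ancestors), plus 1.
    static List<Integer> indexes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/grandfather/father/child", doc, XPathConstants.NODESET);
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                Double index = (Double) xpath.evaluate(
                        "count(preceding::child) + 1", node, XPathConstants.NUMBER);
                result.add(index.intValue());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexes(XML));
    }
}
```

Each call re-scans the preceding nodes, which is the quadratic cost discussed further down.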
Using XPath 2.0, there is a much better way to do this. Fetch all <child/> nodes and use the position() function to get the index:
//child/concat("child's index is: ", position(), ", name is: ", text())
You don't say efficiency is important, but I really hate to see this done with O(n^2) code! Jens' solution shows how to do that if you can use the result in the form of a sequence of (position, name) pairs. You could also return an alternating sequence of strings and numbers using //child/(string(.), position()), though you would then want to use the s9api API rather than JAXP, because JAXP can only really handle the data types that arise in XPath 1.0.
If you need to compute the index of each node as part of other processing, it might still be worth computing the index for every node in a single initial pass, and then looking it up in a table. But if you're doing that, the simplest way is surely to iterate over the result of //child and build a map from nodes to the sequence number in the iteration.
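The single-pass approach can be sketched in Java as follows, assuming the built-in javax.xml.xpath engine. It evaluates //child once, records each node's 1-based position, and later looks nodes up instead of re-counting. An IdentityHashMap is used because DOM nodes are not guaranteed to define value-based equals/hashCode. The sketch also assumes the XPath engine returns the document's own node objects, which is how the JDK's DOM-based implementation behaves. ChildIndexMap, buildIndex and lookupAll are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildIndexMap {
    // The question's example document.
    static final String XML =
        "<grandfather>"
        + "<father><child>a</child></father>"
        + "<father><child>b</child><child>c</child></father>"
        + "</grandfather>";

    // One pass over //child: maps each node (by identity) to its 1-based position.
    static Map<Node, Integer> buildIndex(Document doc) throws XPathExpressionException {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//child", doc, XPathConstants.NODESET);
        Map<Node, Integer> index = new IdentityHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            index.put(nodes.item(i), i + 1);
        }
        return index;
    }

    // Later processing: look each node up instead of re-running count(...).
    static List<Integer> lookupAll(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<Node, Integer> index = buildIndex(doc);
            NodeList children = doc.getElementsByTagName("child");
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                result.add(index.get(children.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupAll(XML));
    }
}
```

Building the map is a single linear pass, and each later lookup is constant time on average.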
