XPath combine node sets

Let's say I have two expressions: //div[@class="foo"] and //span[@class="foo"]. Is it possible to "combine" them, like so:
//(div | span)[@class="foo"]
Or can I only take the union of the two complete expressions?
//div[@class="foo"] | //span[@class="foo"]

A more idiomatic (and dare I say readable) way to get all of the div and span elements having class="foo" is this:
//*[(self::div or self::span) and @class="foo"]
In English:
Select all elements that are themselves a div or a span and that have a class attribute whose value is 'foo'
As for your original question, the following expressions return equivalent results:
(//div | //span)[@class="foo"]
//div[@class="foo"] | //span[@class="foo"]
The first gives you the union of all the div and span elements in the document, then filters that set down to the elements having class="foo". The second gives you the union of 1) the set of all div elements having class="foo" and 2) the set of all span elements having class="foo".
It should be fairly obvious that those two sets contain the same thing.
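To check the equivalence concretely, here is a minimal Java sketch that runs all three expressions through the JDK's built-in javax.xml.xpath engine (XPath 1.0). The sample document, the class name UnionDemo and the helper select are assumptions made up for illustration; none of them come from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnionDemo {
    // Hypothetical sample document, made up for this sketch.
    public static final String XML =
        "<root><div class='foo'>d1</div><span class='foo'>s1</span>"
        + "<div class='bar'>d2</div><p class='foo'>p1</p></root>";

    /** Evaluates an XPath 1.0 expression and returns the text of each selected node. */
    public static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "//*[(self::div or self::span) and @class='foo']",
            "(//div | //span)[@class='foo']",
            "//div[@class='foo'] | //span[@class='foo']"
        };
        for (String expr : expressions) {
            // Each expression should select the same two elements: d1 and s1.
            System.out.println(expr + " -> " + select(XML, expr));
        }
    }
}
```

The p element with class="foo" and the div with class="bar" are there to show that both conditions (element name and class) have to hold.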

This construct works:
(//golfer | //batter)[@ID="2" or @ID="3"]
...much to my astonishment.

Related

Xpath combine predicates with common ancestor?

I want the immediate tr children of a table, optionally wrapped in a tbody:
//table[complex-predictor]/tbody/tr | //table[complex-predictor]/tr
I want to combine the predicates like this:
//table[complex-predictor](/tbody/tr | /tr)
But it doesn't work. What is the correct way to do this?
By the way, I don't want tr elements nested deeper in the table, such as:
(/tbody/tr/td/table/tbody/tr)

This is one possible way:
//table//*[self::th|self::tr]
The main XPath returns all descendant elements of table, then the predicate (the expression in []) restricts the result to th and tr elements. Note that this also matches rows of nested tables.
Regarding "Btw, i don't want tr deep in table (/tbody/tr/td/table/tbody/tr)":
In XPath 2.0 or above you can do:
//table[complex-predictor]/(tbody/tr|tr)
But in XPath 1.0, I don't see a clean way to get this done without repeating the 'complex-predictor'.
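One possible XPath 1.0 workaround, which is not part of the original answer, is to start from the tr and test its parent chain, so the table predicate is written only once: //tr[(parent::table | parent::tbody/parent::table)[complex-predictor]]. The Java sketch below tries this with the JDK's javax.xml.xpath engine. The sample document, the class name TableRowsDemo, and the stand-in predicate @class='data' are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableRowsDemo {
    // Hypothetical document: two tables carrying class='data', one of them
    // wrapping a nested table that does NOT satisfy the predicate.
    public static final String XML =
        "<root>"
        + "<table class='data'><tbody><tr id='a'><td>"
        + "<table><tbody><tr id='deep'/></tbody></table>"
        + "</td></tr></tbody></table>"
        + "<table class='data'><tr id='b'/></table>"
        + "</root>";

    // [@class='data'] stands in for the question's complex predicate.
    public static final String EXPR =
        "//tr[(parent::table | parent::tbody/parent::table)[@class='data']]";

    /** Returns the id attribute of every element selected by expr. */
    public static List<String> ids(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Rows 'a' and 'b' are direct rows of a matching table; 'deep' is not.
        System.out.println(ids(XML, EXPR));
    }
}
```

The trade-off is that the expression visits every tr in the document and tests its parents, rather than walking down from the matching tables.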

Select all nodes until a specific given node/tag

Given the following markup:
<div id="about">
<dl>
<dt>Date</dt>
<dd>1872</dd>
<dt>Names</dt>
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
<dt>Status</dt>
<dd>on</dd>
<dt>Another Field</dt>
<dd>X</dd>
<dd>Y</dd>
</dl>
</div>
I'm trying to extract all the <dd> nodes following <dt>Names</dt> but only until another <dt> starts. In this case, I'm after the following nodes:
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
I'm trying the following XPath code, but it's not working as intended.
xpath("//div[@id='about']/dl/dt[contains(text(),'Names')]/following-sibling::dd[not(following-sibling::dt)]/text()")
Any thoughts on how to fix it?
Many thanks.

Update: much simpler solution
Your situation has a useful property: the anchor item is always the first preceding dt sibling of the nodes you want. Because of that, here's a much simpler way of writing the complex expression below:
/div/dl/dd[preceding-sibling::dt[1][. = 'Names']]
In other words:
select any dd
that has a first preceding sibling dt (the preceding sibling axis counts backwards)
that itself has a value of "Names"
In oXygen, this selects the nodes you wanted to select (and if you change "Names" to "Status" or "Another Field", it selects only the dd elements that follow that dt, up to the next dt).
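The simple expression can also be tried outside oXygen. The following Java sketch runs it with the JDK's javax.xml.xpath engine against the markup from the question. The class name DdUntilNextDt and the helper ddFor are hypothetical names chosen for this example, and the label is spliced into the expression on the assumption that it contains no single quote.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DdUntilNextDt {
    // The markup from the question, written without whitespace between tags.
    public static final String XML =
        "<div id='about'><dl>"
        + "<dt>Date</dt><dd>1872</dd>"
        + "<dt>Names</dt><dd>A</dd><dd>B</dd><dd>C</dd>"
        + "<dt>Status</dt><dd>on</dd>"
        + "<dt>Another Field</dt><dd>X</dd><dd>Y</dd>"
        + "</dl></div>";

    /** Returns the text of every dd whose nearest preceding dt has the given label. */
    public static List<String> ddFor(String xml, String label) {
        // Assumption: label contains no single quote.
        String expr = "/div/dl/dd[preceding-sibling::dt[1][. = '" + label + "']]";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ddFor(XML, "Names"));          // [A, B, C]
        System.out.println(ddFor(XML, "Status"));         // [on]
        System.out.println(ddFor(XML, "Another Field"));  // [X, Y]
    }
}
```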
Original complex solution (leaving in for reference)
This is far easier in XPath 2.0, but let's assume you can only use XPath 1.0. The trick is to count the number of preceding siblings from your anchor element (the one with "Names" in it), and disregard any that have the wrong count (i.e., when we cross over <dt>Status</dt>, the number of preceding siblings has increased).
For XPath 1.0, remove the comments between (: and :). In XPath, whitespace is insignificant, so you can spread the expression over multiple lines for readability, but comments are not available in 1.0.
/div/dl/dd
(: any dd having a dt before it with "Names" :)
[preceding-sibling::dt[. = 'Names']]
(: count the preceding siblings up to dt with "Names", add one to include 'self' :)
[count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1
=
(: compare with count of all preceding siblings :)
count(preceding-sibling::dt)]
As a one-liner:
/div/dl/dd[preceding-sibling::dt[. = 'Names']][count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1 = count(preceding-sibling::dt)]

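How about this (note that it also selects dd elements from later groups that are still followed by a dt, such as <dd>on</dd> in the sample):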
//dd[preceding-sibling::dt[contains(., 'Names')]][following-sibling::dt]

XPath: Select following siblings until certain class

I have the following html snippet:
<table>
<tr>
<td class="foo">a</td>
<td class="bar">1</td>
<td class="bar">2</td>
<td class="foo">b</td>
<td class="bar">3</td>
<td class="bar">4</td>
<td class="bar">5</td>
<td class="foo">c</td>
<td class="bar">6</td>
<td class="bar">7</td>
</tr>
</table>
I'm looking for a XPath 1.0 expression that starts at a .foo element and selects all following .bar elements before the next .foo element.
For example: I start at a and want to select only 1 and 2.
Or I start at b and want to select 3, 4 and 5.
Background: I have to find an XPath expression for this method (using Java and Selenium):
public List<WebElement> bar(WebElement foo) {
    return foo.findElements(By.xpath("./following-sibling::td[@class='bar']..."));
}
Is there a way to solve the problem?
The expression should work for all .foo elements without using any external variables.
Thanks for your help!
Update: There is apparently no solution for these special circumstances. But if you have fewer limitations, the provided expressions work perfectly.

Good question!
The following expression will give you 1..2, 3..5 or 6..7, depending on the index X + 1, where X is the set you want (2 gives 1-2, 3 gives 3-5, etc.). In the example, I select the third set, hence the [4]:
/table/tr[1]
  /td[not(@class = 'foo')]
  [
    generate-id(../td[@class='foo'][4])
    = generate-id(
        preceding-sibling::td[@class='foo'][1]
        /following-sibling::td[@class='foo'][1])
  ]
The beauty of this expression (imnsho) is that you can index by the given set (as opposed to by relative position) and that it has only one place where you need to update the expression. If you want the sixth set, just type [7].
This expression works for any situation where you need the siblings between any two nodes that meet the same condition (@class = 'foo'). I'll update with an explanation.
Replace the [4] in the expression with whatever set you need, plus 1. In oXygen, the above expression selects the third set (6 and 7).
Explanation
/table/tr[1]
Selects the first tr.
/td[not(@class = 'foo')]
Selects any td that is not a foo.
generate-id(../td[@class='foo'][4])
Gets the identity of the xth foo. In this case there is no fourth foo, so the node set is empty and generate-id returns an empty string. In all other cases, it returns the identity of the next foo that we are interested in.
generate-id(
preceding-sibling::td[@class='foo'][1]
/following-sibling::td[@class='foo'][1])
Gets the identity of the first previous foo (counting backward from any non-foo element) and, from there, the first following foo. In the case of node 7, there is no following foo, so this returns an empty string, which equals the left-hand side and results in true for our example case of [4]. In the case of node 3, this results in the identity of c, which is not the empty string, resulting in false.
If the example had the value [2], this last bit would return the identity of node b for nodes 1 and 2, which is equal to the identity of ../td[@class='foo'][2], returning true. For nodes 4 and 7 etc., this returns false.
Update, alternative #1
We can replace the generate-id function with a count of preceding siblings. Since the number of siblings before each foo node is different, this works as an alternative to generate-id.
By now it starts to grow just as unwieldy as GSerg's answer, though:
/table/tr[1]
  /td[not(@class = 'foo')]
  [
    count(../td[@class='foo'][4]/preceding-sibling::*)
    = count(
        preceding-sibling::td[@class='foo'][1]
        /following-sibling::td[@class='foo'][1]/preceding-sibling::*)
  ]
The same "indexing" method applies. Where I write [4] above, replace it with the nth + 1 of the intersection position you are interested in.

If the current node is one of the td[@class='foo'] elements, you can use the XPath below to get the following td[@class='bar'] elements that precede the next foo td. Note that current() is an XSLT function, so this works only inside XSLT:
following-sibling::td[@class='bar'][generate-id(preceding-sibling::td[@class='foo'][1]) = generate-id(current())]
Here, you select only those td[@class='bar'] whose first preceding td[@class='foo'] is the same as the current node you are iterating on (confirmed using generate-id()).

So you want an intersection of two sets:
following-sibling::td[@class='bar'] that follow your starting td[@class='foo'] node
preceding-sibling::td[@class='bar'] that precede the next td[@class='foo'] node
Given the formula from the linked question, it is not difficult to get:
//td[1]/following-sibling::td[@class='bar'][count(. | (//td[1]/following-sibling::td[@class='foo'])[1]/preceding-sibling::td[@class='bar']) = count((//td[1]/following-sibling::td[@class='foo'])[1]/preceding-sibling::td[@class='bar'])]
However, this will return an empty set for the last foo node, because there is no next foo node to take preceding siblings from.
So what you actually want is a difference of two sets:
following-sibling::td[@class='bar'] that follow your starting td[@class='foo'] node
following-sibling::td[@class='bar'] that follow the next td[@class='foo'] node
Given the formula from the linked question, it is not difficult to get:
//td[1]/following-sibling::td[@class='bar'][
  count(. | (//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar'])
  !=
  count((//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar'])
]
The only bit you need to change is the starting point, //td[1] (it appears three times).
Now this will properly return bar nodes even for the last foo node.
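As a rough check of the set-difference expression, the Java sketch below builds it for a given starting position and evaluates it with the JDK's javax.xml.xpath engine against the table from the question. The class name BarsUntilNextFoo and the helpers differenceExpr and select are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BarsUntilNextFoo {
    // The table from the question: foo cells sit at td positions 1, 4 and 8.
    public static final String XML =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    /** Builds the difference expression for the td at the given 1-based position. */
    public static String differenceExpr(int start) {
        String from = "//td[" + start + "]";
        // Bars that come after the NEXT foo; these are subtracted from the result.
        String barsAfterNextFoo = "(" + from + "/following-sibling::td[@class='foo'])[1]"
                + "/following-sibling::td[@class='bar']";
        return from + "/following-sibling::td[@class='bar']"
                + "[count(. | " + barsAfterNextFoo + ") != count(" + barsAfterNextFoo + ")]";
    }

    /** Evaluates expr and returns the text of each selected node. */
    public static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, differenceExpr(1)));  // [1, 2]
        System.out.println(select(XML, differenceExpr(4)));  // [3, 4, 5]
        System.out.println(select(XML, differenceExpr(8)));  // [6, 7]
    }
}
```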
The above was written under the impression that you need a single XPath query and nothing more. Now that it's clear you don't, you can easily solve your problem with more than one XPath query and some manual list filtering on referential equality, as I already mentioned in a comment.
In C# that would be:
using System.Collections.Generic;
using System.Linq;
using System.Xml;

XmlNode context = xmlDocument.SelectSingleNode("//td[8]");
XmlNode nextFoo = context.SelectSingleNode("(./following-sibling::td[@class='foo'])[1]");
IEnumerable<XmlNode> result = context.SelectNodes("./following-sibling::td[@class='bar']").Cast<XmlNode>();
if (nextFoo != null)
{
    // Intersect filters using referential equality by default
    result = result.Intersect(nextFoo.SelectNodes("./preceding-sibling::td[@class='bar']").Cast<XmlNode>());
}
I'm sure it's trivial to convert to Java.
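For reference, a Java version of the same two-query idea might look like the sketch below. It uses the JDK's javax.xml.xpath and DOM classes rather than Selenium's WebElement, so it is an illustration of the approach, not a drop-in for the method in the question. The class name BarsViaTwoQueries and the helpers barsAfter and contains are hypothetical, and the sketch assumes the given td position exists.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BarsViaTwoQueries {
    // The table from the question: foo cells sit at td positions 1, 4 and 8.
    public static final String XML =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    /** True if list holds the very same DOM node as n. */
    private static boolean contains(NodeList list, Node n) {
        for (int i = 0; i < list.getLength(); i++) {
            if (list.item(i).isSameNode(n)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the bar cells between the td at fooPosition (1-based) and the next foo. */
    public static List<String> barsAfter(String xml, int fooPosition) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node context = (Node) xp.evaluate("//td[" + fooPosition + "]", doc, XPathConstants.NODE);
            // Evaluates to null when there is no further foo cell.
            Node nextFoo = (Node) xp.evaluate(
                    "following-sibling::td[@class='foo'][1]", context, XPathConstants.NODE);
            NodeList bars = (NodeList) xp.evaluate(
                    "following-sibling::td[@class='bar']", context, XPathConstants.NODESET);
            NodeList beforeNextFoo = nextFoo == null ? null : (NodeList) xp.evaluate(
                    "preceding-sibling::td[@class='bar']", nextFoo, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < bars.getLength(); i++) {
                Node bar = bars.item(i);
                // Keep every bar if there is no next foo; otherwise intersect by node identity.
                if (beforeNextFoo == null || contains(beforeNextFoo, bar)) {
                    result.add(bar.getTextContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(barsAfter(XML, 1));  // [1, 2]
        System.out.println(barsAfter(XML, 8));  // [6, 7]
    }
}
```

The linear scan in contains keeps the example short; for large sibling lists an identity-based set would be the more usual choice.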

Pretty straightforward (example for 'a' td) but not very optimal:
//td[
  @class='bar' and
  preceding-sibling::td[@class='foo'][1][text() = 'a'] and
  (
    not(following-sibling::td[@class='foo']) or
    following-sibling::td[@class='foo'][1][preceding-sibling::td[@class='foo'][1][text() = 'a']]
  )
]

xpath Expression for "or" operator

Can anyone please help me? I want to use the or operator in my XPath expression to select all input or all a elements from an HTML page.
My expression is like this:
document.DocumentNode.SelectNodes("//input or //a");
But I'm getting errors.

You can use the union operator:
//input | //a
Or an expression like this, which may perform somewhat better:
//*[self::input or self::a]

The or operator is boolean OR in XPath, so //input or //a is a boolean expression that returns true if either of the node sets //input and //a is non-empty (i.e. your source document contains at least one input element, at least one a element, or both) and false otherwise.
Instead you're looking for the | operator, which is the "union" operation on node sets.
//input | //a
will give you a set containing all the input elements and all the a elements.
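The difference between the two operators shows up in the result type. The Java sketch below uses the JDK's javax.xml.xpath engine (not the library from the question) to evaluate //input or //a as a boolean and //input | //a as a node set. The sample page, the class name OrVersusUnion and its helper methods are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OrVersusUnion {
    // Hypothetical, well-formed sample page made up for this sketch.
    public static final String XML =
        "<html><body><input name='q'/><a href='#'>link</a><p>text</p></body></html>";

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** "or" yields a boolean: true if at least one input or a element exists. */
    public static boolean hasInputOrAnchor(String xml) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate("//input or //a", parse(xml), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** Returns the element names of the node set selected by expr. */
    public static List<String> names(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, parse(xml), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasInputOrAnchor(XML));                  // a boolean, not nodes
        System.out.println(names(XML, "//input | //a"));            // the input and a elements
        System.out.println(names(XML, "//*[self::input or self::a]"));
    }
}
```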

How to select all nodes such that their group size is higher than a given value, in XPath

I would like to select all <mynode> elements that have a value that appears a certain number of times (say, x) in all the elements.
Example:
<root>
<mynode>
<attr1>value_1</attr1>
<attr2>value_2</attr2>
</mynode>
<mynode>
<attr1>value_3</attr1>
<attr2>value_3</attr2>
</mynode>
<mynode>
<attr1>value_4</attr1>
<attr2>value_5</attr2>
</mynode>
<mynode>
<attr1>value_6</attr1>
<attr2>value_5</attr2>
</mynode>
</root>
In this case, I want all the <mynode> elements whose attr2 value occurs more than once (x = 1). So, the last two <mynode>s.
Which query do I have to run to achieve this?

If you're using XPath 2.0 or greater, then the following will work:
for $value in distinct-values(/root/mynode/attr2)
return
  if (count(/root/mynode[attr2 = $value]) > 1) then
    /root/mynode[attr2 = $value]
  else ()
For a more detailed discussion see: XPath/XSLT nested predicates: how to get the context of outer predicate?

This is also possible in plain XPath 1.0 (which also works in newer versions of XPath), and it is probably easier to read. Think of the problem as looking for all <mynode/> elements that have an <attr2/> whose value also occurs in an <attr2/> before or after that <mynode/>:
//mynode[attr2 = preceding::attr2 or attr2 = following::attr2]
If <attr2/> nodes can also occur inside other elements and you do not want to test for those:
//mynode[attr2 = preceding::mynode/attr2 or attr2 = following::mynode/attr2]
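To see the XPath 1.0 expression at work on the sample from the question, here is a minimal Java sketch using the JDK's javax.xml.xpath engine. The class name DuplicateAttr2 and the helper select are hypothetical; the expression has /attr1 appended only so that the selected <mynode> elements can be told apart in the output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DuplicateAttr2 {
    // The sample document from the question.
    public static final String XML =
        "<root>"
        + "<mynode><attr1>value_1</attr1><attr2>value_2</attr2></mynode>"
        + "<mynode><attr1>value_3</attr1><attr2>value_3</attr2></mynode>"
        + "<mynode><attr1>value_4</attr1><attr2>value_5</attr2></mynode>"
        + "<mynode><attr1>value_6</attr1><attr2>value_5</attr2></mynode>"
        + "</root>";

    // The answer's expression, followed by /attr1 to label each selected mynode.
    public static final String EXPR =
        "//mynode[attr2 = preceding::mynode/attr2 or attr2 = following::mynode/attr2]/attr1";

    /** Evaluates expr and returns the text of each selected node. */
    public static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Only the last two mynode elements share an attr2 value (value_5).
        System.out.println(select(XML, EXPR));  // [value_4, value_6]
    }
}
```

Note that this only covers the "more than once" case; a general threshold x is what the XPath 2.0 answer's count(...) comparison handles.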
