XPath: decipher this XPath?

What does this XPath expression mean? Can someone decipher it?
//h1[following-sibling::*[1][self::b]]

Select every h1 element (in the document of the context node) that is immediately followed by a b element (with no other intervening element, though there may be intervening text).
Breaking it down:
//h1
Select every h1 element that is a descendant of the root node of the document that contains the context node;
[...]
filter out any of these h1 elements that don't meet the following criteria:
[following-sibling::*[1]...]
such that the first following sibling element passes this test:
[self::b]
self is a b element. Literally, this last test means, "such that when I start from the context node and select the self (i.e. the context node) subject to the node test that filters out everything except elements named b, the result is a non-empty node set."
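To make the breakdown concrete, here is a minimal sketch in Python that emulates the expression by hand. The standard-library xml.etree.ElementTree module does not support the following-sibling axis, so the sibling check is written as a loop; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def h1_immediately_followed_by_b(root):
    """Emulate //h1[following-sibling::*[1][self::b]] by hand."""
    matches = []
    for parent in root.iter():       # every element, including the root
        children = list(parent)      # child elements only; text nodes are skipped
        for i, child in enumerate(children[:-1]):
            # following-sibling::*[1] is simply the next child element
            if child.tag == "h1" and children[i + 1].tag == "b":
                matches.append(child)
    return matches

# Hypothetical sample: text between h1 and b does not prevent a match,
# but an intervening element (here p) does.
doc = ET.fromstring(
    "<body>"
    "<h1>first</h1> some text <b>bold</b>"
    "<h1>second</h1><p>para</p><b>bold</b>"
    "</body>"
)
selected = [h1.text for h1 in h1_immediately_followed_by_b(doc)]
print(selected)
```

Only the first h1 is selected: the second one is followed by a p element before the b.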

Related

XPath: Select following siblings until certain class

I have the following html snippet:
<table>
<tr>
<td class="foo">a</td>
<td class="bar">1</td>
<td class="bar">2</td>
<td class="foo">b</td>
<td class="bar">3</td>
<td class="bar">4</td>
<td class="bar">5</td>
<td class="foo">c</td>
<td class="bar">6</td>
<td class="bar">7</td>
</tr>
</table>
I'm looking for an XPath 1.0 expression that starts at a .foo element and selects all following .bar elements before the next .foo element.
For example: I start at a and want to select only 1 and 2.
Or I start at b and want to select 3, 4 and 5.
Background: I have to find an XPath expression for this method (using Java and Selenium):
public List<WebElement> bar(WebElement foo) {
    return foo.findElements(By.xpath("./following-sibling::td[@class='bar']..."));
}
Is there a way to solve the problem?
The expression should work for all .foo elements without using any external variables.
Thanks for your help!
Update: There is apparently no solution for these special circumstances. But if you have fewer limitations, the provided expressions work perfectly.
Good question!
The following expression will give you 1..2, 3..5 or 6..7, depending on the index you supply, which is the number of the set you want plus 1 ([2] gives 1-2, [3] gives 3-5, and so on). In the example, I select the third set, hence the [4]:
/table/tr[1]
    /td[not(@class = 'foo')]
    [
        generate-id(../td[@class='foo'][4])
        = generate-id(
            preceding-sibling::td[@class='foo'][1]
            /following-sibling::td[@class='foo'][1])
    ]
The beauty of this expression (imnsho) is that you can index by the given set (as opposed to indexing by relative position) and that it has only one place where you need to update the expression. If you want the sixth set, just type [7].
This expression works for any situation where you need the siblings between any two nodes that meet the same requirement (@class = 'foo'). I'll update with an explanation.
Replace the [4] in the expression with whatever set you need, plus 1. In oXygen, the above expression selects the expected cells, 6 and 7.
Explanation
/table/tr[1]
Selects the first tr.
/td[not(@class = 'foo')]
Selects every td whose class is not foo.
generate-id(../td[@class='foo'][4])
Gets the identity of the xth foo. In this case there is no fourth foo, so the path selects nothing and generate-id returns an empty string. In all other cases, it returns the identity of the next foo that we are interested in.
generate-id(
    preceding-sibling::td[@class='foo'][1]
    /following-sibling::td[@class='foo'][1])
Finds the nearest preceding foo (counting backward from any non-foo element) and, from there, gets the identity of the first following foo. For node 7 there is no following foo, so this returns an empty string, which equals the empty result above and gives true for our example case of [4]. For node 3, this returns the identity of c, which is not empty, giving false.
If the example had used [2] instead, this last part would return the identity of node b for nodes 1 and 2, which equals the identity of ../td[@class='foo'][2], giving true. For nodes 4, 7 and so on, it gives false.
Update, alternative #1
We can replace the generate-id function with a count of preceding siblings. Since the number of siblings before each foo node is different, comparing those counts works as an alternative to generate-id.
By now it starts to grow just as unwieldy as GSerg's answer, though:
/table/tr[1]
    /td[not(@class = 'foo')]
    [
        count(../td[@class='foo'][4]/preceding-sibling::*)
        = count(
            preceding-sibling::td[@class='foo'][1]
            /following-sibling::td[@class='foo'][1]/preceding-sibling::*)
    ]
The same "indexing" method applies. Where I write [4] above, replace it with n + 1, where n is the number of the set you are interested in.
If the current node is one of the td[@class='foo'] elements, you can use the XPath below to get the following td[@class='bar'] elements that precede the next foo td:
following-sibling::td[@class='bar'][generate-id(preceding-sibling::td[@class='foo'][1]) = generate-id(current())]
Here, you select only those td[@class='bar'] elements whose first preceding td[@class='foo'] is the same as the current node you are iterating on (confirmed using generate-id()).
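Note that generate-id() and current() are defined by XSLT rather than by XPath 1.0 itself, so this expression can be expected to work inside a stylesheet but not necessarily in a bare XPath evaluator. The selection rule itself is simple to express procedurally; the following is a minimal sketch in Python that walks the siblings by hand (the function name bars_after is made up, and xml.etree.ElementTree is used only to parse the sample row from the question).

```python
import xml.etree.ElementTree as ET

ROW = ET.fromstring(
    '<tr>'
    '<td class="foo">a</td><td class="bar">1</td><td class="bar">2</td>'
    '<td class="foo">b</td><td class="bar">3</td><td class="bar">4</td>'
    '<td class="bar">5</td>'
    '<td class="foo">c</td><td class="bar">6</td><td class="bar">7</td>'
    '</tr>'
)

def bars_after(row, foo):
    """Return the bar cells that follow `foo`, stopping at the next foo cell."""
    cells = list(row)
    start = cells.index(foo) + 1     # Elements compare by identity here
    result = []
    for cell in cells[start:]:
        if cell.get("class") == "foo":
            break                    # reached the next group
        if cell.get("class") == "bar":
            result.append(cell)
    return result

foos = [td for td in ROW if td.get("class") == "foo"]
groups = {foo.text: [td.text for td in bars_after(ROW, foo)] for foo in foos}
print(groups)
```

Each bar cell ends up in the group of its nearest preceding foo, which is the same condition the predicate above tests.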
So you want an intersection of two sets:
following-sibling::td[@class='bar'] that follow your starting td[@class='foo'] node
preceding-sibling::td[@class='bar'] that precede the next td[@class='foo'] node
Given the formula from the linked question, it is not difficult to get:
//td[1]/following-sibling::td[@class='bar'][count(. | (//td[1]/following-sibling::td[@class='foo'])[1]/preceding-sibling::td[@class='bar']) = count((//td[1]/following-sibling::td[@class='foo'])[1]/preceding-sibling::td[@class='bar'])]
However, this returns an empty set for the last foo node, because there is no next foo node whose preceding siblings could be taken.
So you want a difference of two sets:
following-sibling::td[@class='bar'] that follow your starting td[@class='foo'] node
following-sibling::td[@class='bar'] that follow the next td[@class='foo'] node
Given the formula from the linked question, it is not difficult to get:
//td[1]/following-sibling::td[@class='bar'][
    count(. | (//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar'])
    !=
    count((//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar'])
]
The only amendable bit is the starting point, //td[1] (three times).
Now this will properly return bar nodes even for the last foo node.
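The count(. | S) comparison is the usual XPath 1.0 way to test whether the context node belongs to a node-set S: adding a node to a set it is already in does not change the count. Below is a minimal sketch of the set-difference version in Python, emulating the axes by hand on the sample row; the helper names following, is_member and bars_for are made up for illustration.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring(
    '<tr>'
    '<td class="foo">a</td><td class="bar">1</td><td class="bar">2</td>'
    '<td class="foo">b</td><td class="bar">3</td><td class="bar">4</td>'
    '<td class="bar">5</td>'
    '<td class="foo">c</td><td class="bar">6</td><td class="bar">7</td>'
    '</tr>'
)
cells = list(row)

def following(node, cls):
    """Emulate following-sibling::td[@class=cls] relative to node."""
    start = cells.index(node) + 1
    return [c for c in cells[start:] if c.get("class") == cls]

def is_member(node, node_set):
    """count(. | S) = count(S) is true exactly when . already belongs to S."""
    ids = {id(n) for n in node_set}
    return len(ids | {id(node)}) == len(ids)

def bars_for(foo):
    after_start = following(foo, "bar")
    # (following-sibling::td[@class='foo'])[1]; empty for the last foo
    next_foo = following(foo, "foo")[:1]
    after_next = [bar for f in next_foo for bar in following(f, "bar")]
    # Keep the bars that are NOT in the second set (set difference)
    return [c.text for c in after_start if not is_member(c, after_next)]

results = {c.text: bars_for(c) for c in cells if c.get("class") == "foo"}
print(results)
```

For the last foo the second set is empty, so no bar is a member of it and all remaining bars are kept, which is why the difference form handles that case.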
The above was written under the impression that you need a single XPath query and nothing more. Now that it's clear you don't, you can easily solve your problem with more than one XPath query and some manual list filtering on referential equality, as I already mentioned in a comment.
In C# that would be:
using System.Collections.Generic;
using System.Linq;   // Cast<T>() and Intersect()
using System.Xml;

XmlNode context = xmlDocument.SelectSingleNode("//td[8]");
XmlNode nextFoo = context.SelectSingleNode("(./following-sibling::td[@class='foo'])[1]");
IEnumerable<XmlNode> result = context.SelectNodes("./following-sibling::td[@class='bar']").Cast<XmlNode>();
if (nextFoo != null)
{
    // Intersect filters using referential equality by default
    result = result.Intersect(nextFoo.SelectNodes("./preceding-sibling::td[@class='bar']").Cast<XmlNode>());
}
I'm sure it's trivial to convert to Java.
Pretty straightforward (example for 'a' td) but not very optimal:
//td[
    @class='bar' and
    preceding-sibling::td[@class='foo'][1][text() = 'a'] and
    (
        not(following-sibling::td[@class='foo']) or
        following-sibling::td[@class='foo'][1][preceding-sibling::td[@class='foo'][1][text() = 'a']]
    )
]

XPath 1.0 exclusive or node-set expression

What I need doesn't quite seem to match what other articles of a similar title are about.
I need, using XPath 1, to be able to get node a or node b, exclusively, in that order.
That is, node a if it exists; otherwise, node b.
An XPath expression such as:
expression | expression
will get me both if they both exist. That is not what I want.
I could go:
(expression | expression)[last()]
This does in fact get me what I need (in my case), but it seems a bit inefficient, because it evaluates both sides of the expression before the last result is selected.
I was hoping for an expression that stops evaluating once the left side succeeds.
A more concrete example of XML
<one>
<two>
<three>hello</three>
<four>bye</four>
</two>
<blahfive>again</blahfive>
</one>
And the XPath that works (but is inefficient):
(/one/*[starts-with(local-name(.), 'blah')] | .)[last()]
To be clear, I would like to grab the immediate child node of 'one' which starts with 'blah'. However, if it doesn't exist, I would like only the current node.
If the 'blah' node does exist, I do not want the current node.
Is there a more efficient way to achieve this?
I need, using XPath 1, to be able to get node a or node b,
exclusively, in that order. That is, node a if it exists; otherwise,
node b.
An XPath expression such as:
expression | expression
will get me both if they both exist. That is not what I want.
I could go:
(expression | expression)[last()]
This does in fact get me what I need (in my case),
This statement is not true.
Here is an example. Let us have this XML document:
<one>
<a/>
<b/>
</one>
Expression1 is:
/*/a
Expression2 is:
/*/b
Your composite expression:
(Expression1 | Expression2)[last()]
when we substitute the two expressions above is:
(/*/a | /*/b)[last()]
And this expression actually selects b -- not a -- because b is the last of the two in document order.
Now, here is an expression that selects just a if it exists, and selects b only if a doesn't exist -- regardless of document order:
/*/a | /*/b[not(/*/a)]
When this expression is evaluated on the XML document above, it selects a, regardless of its document order -- try swapping in the XML document above the places of a and b to confirm that in both cases the element that is selected is a.
To summarize, one expression that selects the wanted node regardless of any document order is:
Expression1 | Expression2[not(Expression1)]
Let us apply this general expression in your case:
Expression1 is:
/one/*[starts-with(local-name(.), 'blah')]
Expression2 is:
self::node()
The wanted expression (after substituting Expression1 and Expression2 in the above general expression) is:
/one/*[starts-with(local-name(.), 'blah')]
|
self::node()[not(/one/*[starts-with(local-name(.), 'blah')])]
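The general pattern Expression1 | Expression2[not(Expression1)] amounts to "use the first result if it is non-empty, otherwise the second". Here is a minimal sketch of that logic in Python on the question's sample document; first_or_fallback and blah_children are made-up helper names, and the choice of the two element as the context node is an assumption for the demo.

```python
import xml.etree.ElementTree as ET

def first_or_fallback(primary, fallback):
    """Mirror Expression1 | Expression2[not(Expression1)] on Python lists."""
    return list(primary) if primary else list(fallback)

def blah_children(one):
    # Emulates /one/*[starts-with(local-name(.), 'blah')];
    # the sample has no namespaces, so tag equals the local name.
    return [child for child in one if child.tag.startswith("blah")]

doc = ET.fromstring(
    "<one><two><three>hello</three><four>bye</four></two>"
    "<blahfive>again</blahfive></one>"
)
context = doc.find("two")            # hypothetical context node

with_blah = [n.tag for n in first_or_fallback(blah_children(doc), [context])]

doc.remove(doc.find("blahfive"))     # now no 'blah' child exists
without_blah = [n.tag for n in first_or_fallback(blah_children(doc), [context])]

print(with_blah, without_blah)
```

Unlike the [last()] variant, the outcome does not depend on where the two nodes sit in document order.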

Confusing sequence of code

I am attempting to use HTML Agility Pack to parse some web pages.
Here is a line of code I came across in an example:
var div = document.DocumentNode.Descendants().Where(n => n.Name == "div");
The tooltip says "(parameter) HTMLNode n" when I hover over n in Visual Studio.
I am uncertain what n is and what this line does.
This code selects all the descendants of the root node of the document whose tag name is "div".
document.DocumentNode selects the root node.
.Descendants() selects all the nodes under the root node (not only direct children, but all of them).
.Where() keeps only those that meet some criterion.
n => n.Name == "div" is the criterion itself. It is a lambda that means: "given a node n, the criterion is true when the node's Name is equal to "div"".
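The same "walk every descendant, keep those that pass a test" pattern can be sketched in Python, where the lambda plays the role of n => n.Name == "div". This uses the standard-library xml.etree.ElementTree on a small well-formed sample rather than HTML Agility Pack, so it is only an analogy, not that library's API.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    "<html><body>"
    "<div id='a'><p>text</p><div id='b'/></div>"
    "<span>not a div</span>"
    "</body></html>"
)

# All descendants of the root, at any depth, in document order
descendants = [n for n in page.iter() if n is not page]

# The lambda receives each node as n and returns True for the ones to keep
divs = list(filter(lambda n: n.tag == "div", descendants))

div_ids = [d.get("id") for d in divs]
print(div_ids)
```

Here n is just the parameter name for "the node currently being tested"; any other name would work the same way.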

Basic prefix tree implementation question

I've implemented a basic prefix tree or "trie". The trie consists of nodes like this:
// pseudo-code
struct node {
char c;
collection<node> childnodes;
};
Say I add the following words to my trie: "Apple", "Ark" and "Cat". Now when I look-up prefixes like "Ap" and "Ca" my trie's "bool containsPrefix(string prefix)" method will correctly return true.
Now I'm implementing the method "bool containsWholeWord(string word)" that will return true for "Cat" and "Ark" but false for "App" (in the above example).
Is it common for nodes in a trie to have some sort of "endOfWord" flag? This would help determine if the string being looked-up was actually a whole word entered into the trie and not just a prefix.
Cheers!
The end of the key is usually indicated via a leaf node. Either:
the child nodes are empty; or
you have a branch, with one prefix of the key, and some child nodes.
Your design doesn't have a leaf/empty node. Try indicating it with e.g. a null.
If you need to store both "App" and "Apple", but not "Appl", then yes, you need something like an endOfWord flag.
Alternatively, you could fit it into your design by (sometimes) having two nodes with the same character. So "Ap" has two child nodes: the leaf node "p" and an internal node "p" with a child "l".
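A minimal sketch of the endOfWord-flag approach in Python follows. The class and method names are made up to mirror the question's containsPrefix and containsWholeWord, and children are kept in a dict keyed by character instead of storing the character in the node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # maps a character to the child TrieNode
        self.end_of_word = False  # True only if an inserted word ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.end_of_word = True

    def _walk(self, text):
        """Follow text from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains_prefix(self, prefix):
        return self._walk(prefix) is not None

    def contains_whole_word(self, word):
        node = self._walk(word)
        return node is not None and node.end_of_word

trie = Trie()
for w in ("Apple", "Ark", "Cat"):
    trie.add(w)
```

With the flag, "App" is a known prefix but not a whole word until it is added explicitly, and adding it does not turn "Appl" into a word.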

How can I write an XPath selector that picks only non-empty children?

I want to write an XPath expression that selects all non-empty table:table-cell children of the current element. How do I do this?
XPath expression that selects all
non-empty table:table-cell children
In XPath 1.0:
table:table-cell[node()]
Note: This element is not empty:
<table:table-cell>Something</table:table-cell>
But the expression table:table-cell[count(*) > 0] does not select it, because it means: all table:table-cell children having at least one element child.
Got it:
table:table-cell[count(*) > 0]
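The difference between the two predicates can be seen in a small sketch. The Python below approximates [node()] and [count(*) > 0] by hand with xml.etree.ElementTree; the namespace URI is a placeholder invented for the example, and because the default parser discards comments, a cell that contains only a comment would be treated as empty here even though [node()] would select it.

```python
import xml.etree.ElementTree as ET

NS = {"table": "urn:example:table"}   # placeholder namespace URI (assumption)

ROW = ET.fromstring(
    '<table:table-row xmlns:table="urn:example:table">'
    '<table:table-cell>Something</table:table-cell>'
    '<table:table-cell><p>para</p></table:table-cell>'
    '<table:table-cell/>'
    '</table:table-row>'
)
cells = ROW.findall("table:table-cell", NS)

def has_any_child_node(cell):
    # Approximates table:table-cell[node()]: leading text or any child element
    return bool(cell.text) or len(cell) > 0

def has_element_child(cell):
    # Corresponds to table:table-cell[count(*) > 0]: element children only
    return len(cell) > 0

non_empty = [has_any_child_node(c) for c in cells]
with_elements = [has_element_child(c) for c in cells]
print(non_empty, with_elements)
```

The text-only cell passes the first test but not the second, so count(*) > 0 is the right choice only if "non-empty" means "contains at least one element".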
