Xpath to extract from following siblings until the next specified node - xpath

I'm trying to extract everything from the p nodes that follow the h2 containing "Summary" until I get to the next h2.
This is what I have so far:
.//h2[contains(text(),'Summary')]/following-sibling::*
I just don't know how to get it to stop. Is this even possible?

If you select p[preceding-sibling::h2[1][contains(., 'Summary')]] you will select all p children of the context node whose immediately preceding h2 sibling contains "Summary".
If you want all such elements (e.g. the ul too), use *[not(self::h2)][preceding-sibling::h2[1][contains(., 'Summary')]].
Or you could try .//h2[contains(., 'Summary')]/following-sibling::*[preceding-sibling::h2[1][contains(., 'Summary')]].
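To make the "stop at the next h2" behaviour concrete, here is a minimal sketch in Python. It assumes the third-party lxml library is installed, and the sample document is invented for illustration; only the two XPath expressions come from the answer above.

```python
from lxml import etree

# Hypothetical sample document: three h2 sections with mixed siblings.
doc = etree.fromstring(
    "<div>"
    "<h2>Intro</h2><p>skip me</p>"
    "<h2>Summary</h2><p>first</p><ul><li>item</li></ul><p>second</p>"
    "<h2>Details</h2><p>also skipped</p>"
    "</div>"
)

# Only p elements whose nearest preceding h2 sibling contains 'Summary'.
only_p = doc.xpath("p[preceding-sibling::h2[1][contains(., 'Summary')]]")

# Any non-h2 element whose nearest preceding h2 sibling contains 'Summary'.
any_elem = doc.xpath(
    "*[not(self::h2)][preceding-sibling::h2[1][contains(., 'Summary')]]"
)

p_texts = [e.text for e in only_p]
elem_tags = [e.tag for e in any_elem]
print(p_texts)    # ['first', 'second']
print(elem_tags)  # ['p', 'ul', 'p']
```

The not(self::h2) test matters here: without it, the "Details" heading would also match, because its nearest preceding h2 is the "Summary" one.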

Related

Using XPATH previous:: more like an array

I've got XML like this
<root>
...
<a>
<a>
<a>
<c>
...
It's very flat, with LOTS of A elements and a few C elements. The A elements are sensor data, and the last reading before each C is bogus, so I need the one before it. I'd like to use the C elements as markers and select the A element two positions before each C. So I'm trying out an XPATH like:
/root/c/preceding-sibling::a
but I'm getting all previous A elements, I was hoping for something a bit more direct such as:
/root/c/preceeding-sibling[-2]
which would just grab the 2nd sibling before C (no matter the type) I guess I'm asking for array like functionality on an XPATH so what ever I match I can ask for "the second element before that"
Is this possible?
You can "just grab the 2nd sibling before C (no matter the type)" with the XPath expression
/root/c/preceding-sibling::*[2]
Positions on the preceding-sibling:: axis are counted backwards from the context node. The node with index [1] is the one immediately before c, and the node with index [2] is the one before that, which is "the second element before that".
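A small sketch of the reverse-axis numbering, assuming the third-party lxml library is available. The sensor values in the sample XML are made up for illustration.

```python
from lxml import etree

# Hypothetical flat document: runs of <a> readings separated by <c> markers.
doc = etree.fromstring(
    "<root>"
    "<a>1</a><a>2</a><a>3</a><c/>"
    "<a>4</a><a>5</a><c/>"
    "</root>"
)

# Without a positional predicate, every preceding <a> of every <c> is selected.
all_before = doc.xpath("/root/c/preceding-sibling::a")

# [2] is counted backwards from each <c>: the second sibling before it.
second_before = doc.xpath("/root/c/preceding-sibling::*[2]")

count_all = len(all_before)
readings = [e.text for e in second_before]
print(count_all)  # 5
print(readings)   # ['2', '4']
```

The result list comes back in document order even though the axis itself is numbered in reverse.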

What does Camel Splitter actually do with XML Document when splitting with xpath?

I have a document with an order and a number of lines. I need to break the order into lines, so I have a Camel splitter set to xpath with the order line as its value. This works fine.
However, what I get going forward is an element for the order line, which is what I want, but when converting it I need information from the order element - but if I try to get the parent element via xpath following the split, this doesn't work.
Does Camel create copies of the nodes returned by the xpath expression, or return a list of nodes within the parent document? If the former, can I make it the latter? If the latter, any ideas why a "../*" expression would return nothing?
Thanks!
Screwtape.
Look at the split options that are available when using a Tokenizer:
http://camel.apache.org/splitter.html
You have four different modes (i, w, u, t), and the 'w' mode keeps the ancestor context. In that case, the parent node (the thing you apparently need) will be repeated in each sub-message.
Default:
<m:order><id>123</id><date>2014-02-25</date></m:order>
'w' mode:
<m:orders>
<m:order><id>123</id><date>2014-02-25</date>...</m:order>
</m:orders>

What is the difference between xpath //a and .//a in Selenium Webdriver [duplicate]

While finding the relative XPath via Firebug, it creates something like
.//*[@id='Passwd']
What does it signify if we don't use the dot at the start?
If I just enter //* as the XPath, it highlights various page elements. What does that signify?
Below are XPaths for the Gmail password field. What is the significance of *?
.//*[@id='Passwd']
//child::input[@type='password']
There are several distinct, key XPath concepts in play here...
Absolute vs relative XPaths (/ vs .)
/ introduces an absolute location path, starting at the root of the document.
. introduces a relative location path, starting at the context node.
Named element vs any element (ename vs *)
/ename selects an ename root element
./ename selects all ename child elements of the context node.
/* selects the root element, regardless of name.
./* or * selects all child elements of the context node, regardless of name.
descendant-or-self axis (//*)
//ename selects all ename elements in a document.
.//ename selects all ename elements at or beneath the context node.
//* selects all elements in a document, regardless of name.
.//* selects all elements, regardless of name, at or beneath the context node.
With these concepts in mind, here are answers to your specific questions...
.//*[@id='Passwd'] means to select all elements at or beneath the
context node that have an id attribute value equal to
'Passwd'.
//child::input[@type='password'] can be simplified to
//input[@type='password'] and means to select all input elements
in the document that have a type attribute value equal to 'password'.
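The difference between a leading // and a leading .// only shows up when the context node is not the document root. The sketch below assumes the third-party lxml library; the form markup and the 'signup' and 'NewPasswd' ids are invented for illustration.

```python
from lxml import etree

# Hypothetical page with two forms, each holding one password input.
doc = etree.fromstring(
    "<html><body>"
    "<form id='login'><input id='Passwd' type='password'/></form>"
    "<form id='signup'><input id='NewPasswd' type='password'/></form>"
    "</body></html>"
)

# Use the second form as the context node.
signup = doc.xpath("//form[@id='signup']")[0]

# .// searches only at or beneath the context node.
relative = signup.xpath(".//input[@type='password']")

# // always searches the whole document, whatever the context node is.
absolute = signup.xpath("//input[@type='password']")

relative_ids = [e.get("id") for e in relative]
absolute_ids = [e.get("id") for e in absolute]
print(relative_ids)  # ['NewPasswd']
print(absolute_ids)  # ['Passwd', 'NewPasswd']
```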
These expressions all select different nodesets:
.//*[@id='Passwd']
The '.' at the beginning means that processing starts at the current node. The '*' selects all element nodes descending from this current node whose @id attribute value is equal to 'Passwd'.
What if we don't use dot at the start what it signifies?
Then you'd select all element nodes with an @id attribute value equal to 'Passwd' in the whole document.
Just add //* in the XPath -- it highlights --- various page elements
This would select all element nodes in the whole document.
Below mentioned : XPaths for Gmail Password field are true what is significance of * ?
.//*[@id='Passwd']
This would select all element nodes descending from the current node whose @id attribute value is equal to 'Passwd'.
//child::input[@type='password']
This would select all child element nodes named input whose @type attribute value is equal to 'password'. The child:: axis prefix may be omitted, because it is the default axis.
The syntax of choosing the appropriate expression is explained here at w3school.com.
And the Axes(current point in processing) are explained here at another w3school.com page.
The dot in XPath is called a "context item expression". If you put a dot at the beginning of the expression, it would make it context-specific. In other words, it would search the element with id="Passwd" in the context of the node on which you are calling the "find element by XPath" method.
The * in .//*[@id='Passwd'] helps to match any element with id='Passwd'.
For the first question: It's all about the context. You can see Syntax to know what '.', '..' etc means. Also, I bet you won't find any explanation better than This Link.
Simplified answer for second question: You would generally find nodes using the html tags like td, a, li, div etc. But '*' means, find any tag that match your given property. It's mostly used when you are sure about a given property but not about that tag in which the element might come with, like suppose I want a list of all elements with ID 'xyz' be it in any tag.
Hope it helps :)

Xpath to go back to sibling td

I am trying to go back to the previous td, but to no avail. Can you help?
//*[@class='ein' and contains(.,'aaaa')] gets me to the td, but I need to select the previous td. I tried the expression below, but it did not work:
//*[@class='ein' and contains(.,'aaaa')][preceding-sibling::td]
Remember /X means "select X", while [X] means "where X". If you want to select preceding siblings, rather than testing whether they exist, use /.
It's impossible to say for certain without seeing the input HTML, but I suspect that instead of
//*[@class='ein' and contains(.,'aaaa')][preceding-sibling::td]
you need something like
//*[@class='ein' and contains(.,'aaaa')]/preceding-sibling::td[1]
to navigate from each node selected by the initial expression to its nearest preceding td. Your first attempt will select exactly the same nodes as
//*[@class='ein' and contains(.,'aaaa')]
but only if they have at least one preceding-sibling element named td.
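The "select X" versus "where X" distinction can be checked on a tiny table. This is a sketch assuming the third-party lxml library; the table content and the 'label' and 'other' class names are hypothetical, since the original HTML was not shown.

```python
from lxml import etree

# Hypothetical table: one matching cell has a td before it, the other does not.
doc = etree.fromstring(
    "<table><tr>"
    "<td class='label'>Name</td><td class='ein'>aaaa-1</td>"
    "</tr><tr>"
    "<td class='ein'>aaaa-2</td><td class='other'>x</td>"
    "</tr></table>"
)

base = "//*[@class='ein' and contains(., 'aaaa')]"

# Predicate form: still returns the 'ein' cells, filtered to those
# that have some preceding td sibling.
filtered = doc.xpath(base + "[preceding-sibling::td]")

# Path-step form: moves from each 'ein' cell to its nearest preceding td.
navigated = doc.xpath(base + "/preceding-sibling::td[1]")

filtered_texts = [e.text for e in filtered]
navigated_texts = [e.text for e in navigated]
print(filtered_texts)   # ['aaaa-1']
print(navigated_texts)  # ['Name']
```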
Use // after the element you found
Instead of preceding-sibling, just use preceding
//*[@class='ein' and contains(.,'aaaa')]//preceding::td[1]

Modify XPath to return second of two values

I have an XPath that returns two items. I want to modify it so that it returns only the second, or the last if there are more than 2.
//a[@rel='next']
I tried
//a[@rel='next'][2]
but that doesn't return anything at all. How can I rewrite the xpath so I get only the 2nd link?
Found the answer in
XPATH : finding an attribute node (and only one)
In my case the right XPath would be
(//a[@rel='next'])[last()]
EDIT (by Tomalak) - Explanation:
This selects all a[@rel='next'] nodes, and takes the last of the entire set:
(//a[@rel='next'])[last()]
This selects all a[@rel='next'] nodes that are the respective last a[@rel='next'] of the parent context each of them is in:
//a[@rel='next'][last()] NOT equivalent: //a[@rel='next' and position()=last()] (inside a single predicate, position() and last() count all a children of the parent, not only those with @rel='next')
This selects all a[@rel='next'] nodes that are the second a[@rel='next'] of the parent context each of them is in (in your case, each parent context had only one a[@rel='next'], that's why you did not get anything back):
//a[@rel='next'][2] NOT equivalent: //a[@rel='next' and position()=2] (for the same reason)
For the sake of completeness: This selects all a nodes that are the last of the parent context each of them is in, and of them only those that have @rel='next' (XPath predicates are applied from left to right!):
//a[last()][@rel='next'] equivalent: //a[position()=last() and @rel='next']
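The four expressions above can be compared side by side. This is a minimal sketch assuming the third-party lxml library; the sample markup and the texts() helper are invented for illustration.

```python
from lxml import etree

# Hypothetical document: two parents, each with one rel='next' link
# in a different position among its <a> siblings.
doc = etree.fromstring(
    "<div>"
    "<p><a rel='prev'>p1</a><a rel='next'>n1</a></p>"
    "<p><a rel='next'>n2</a><a rel='prev'>p2</a></p>"
    "</div>"
)


def texts(expr):
    """Return the text of every node selected by expr (illustrative helper)."""
    return [e.text for e in doc.xpath(expr)]


# Last node of the whole result set.
whole_set_last = texts("(//a[@rel='next'])[last()]")

# Last rel='next' link within each parent.
per_parent_last = texts("//a[@rel='next'][last()]")

# Second rel='next' link within each parent: no parent has two.
per_parent_second = texts("//a[@rel='next'][2]")

# Last <a> of each parent, kept only if it has rel='next'.
last_then_rel = texts("//a[last()][@rel='next']")

print(whole_set_last)     # ['n2']
print(per_parent_last)    # ['n1', 'n2']
print(per_parent_second)  # []
print(last_then_rel)      # ['n1']
```

The parentheses are what change the scope of last(): they turn the path into one node-set before the positional predicate is applied.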

Resources