Here's a nice puzzle. Suppose we have this bit of code:
<page n="1">
<line n="3">...</line>
</page>
It is really easy to locate the line element with n="3" within the page element with n="1" using a simple XPath expression: //page[@n='1']/line[@n='3']. Great, beautiful, elegant.
Now suppose what we have is this encoding (folks familiar with the TEI will know where this is coming from).
<pb n="1"/>
(arbitrary amounts of stuff)
<lb n="3"/>
We want to find the lb element with n="3", which follows the pb element with n="1". But note -- this lb element could be almost anywhere following the pb: it may not be (and most likely is not) a sibling, but could be a child of a sibling of the pb, or of the pb's parent, etc etc etc.
So my question: how would you search for this lb element with n="3", which follows the pb element with n="1", with XPath?
Thanks in advance
Peter
Use:
//pb[@n='1']/following::lb[@n='3']
|
//pb[@n='1']/descendant::lb[@n='3']
This selects any lb element that follows the specified pb in document order -- even if the wanted lb element is a descendant of the pb element.
Do note that the following expression doesn't in general select all wanted lb elements (it fails to select any of those that are descendants of the pb element):
//pb[@n='1']/following::lb[@n='3']
Explanation:
As defined in the W3C XPath specification, the following:: and descendant:: axes are non-overlapping:
"the following axis contains all nodes in the same document as the
context node that are after the context node in document order,
excluding any descendants and excluding attribute nodes and namespace nodes"
That would be
//pb[@n=1]/following::lb[@n=3]
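To see what the following:: axis picks up, here is a minimal Python sketch. It assumes only the standard library, whose ElementTree does not implement following::, so the axis is emulated by walking the elements in document order and skipping the context node's descendants. The sample markup and the helper name following are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like sample: the lb milestones sit at different depths.
SAMPLE = """
<text>
  <div><p>intro <pb n="1"/> words <lb n="2"/> more</p></div>
  <div><p><hi>nested <lb n="3"/> text</hi></p></div>
  <div><pb n="2"/><p><lb n="3"/> next page</p></div>
</text>
"""

def following(root, context):
    """Emulate the XPath following:: axis for element nodes."""
    inside = set(context.iter())   # the context node plus its descendants
    order = list(root.iter())      # every element, in document order
    start = order.index(context)
    return [e for e in order[start + 1:] if e not in inside]

root = ET.fromstring(SAMPLE)
pb = root.find(".//pb[@n='1']")
hits = [e for e in following(root, pb)
        if e.tag == "lb" and e.get("n") == "3"]
print([e.tail.strip() for e in hits])
```

Like the XPath expression, this returns every matching lb after the pb, including one that belongs to a later page, so a caller that wants only the first hit would take hits[0].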
I have the following XML structure
<Root>
<BundleItem>
<Item>1</Item>
<Item>2</Item>
<Item>3</Item>
</BundleItem>
<Item>4</Item>
<Item>5</Item>
<Item>6</Item>
<BundleItem>
<Item>7</Item>
<Item>8</Item>
<Item>9</Item>
</BundleItem>
</Root>
And by providing the following XPath
//Item[1]
I am selecting
<Item>1</Item>
<Item>4</Item>
<Item>7</Item>
My goal is to select only <Item>1</Item> or <Item>7</Item>, regardless of the parent element where they are found, depending only on the position that I provide in the XPath.
Is it possible to do that using only the position, without providing additional criteria in the XPath?
//Item[1] selects every <Item/> element that is the first <Item/> child of its parent, whatever that parent is.
To get the two items you are looking for you could use //Item[text() = 1 or text() = 7].
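A quick way to check both behaviours is the following Python sketch. It assumes the standard-library ElementTree, which understands the [1] position predicate but not the or operator, so the text comparison is done in Python rather than in the path.

```python
import xml.etree.ElementTree as ET

XML = """<Root>
<BundleItem><Item>1</Item><Item>2</Item><Item>3</Item></BundleItem>
<Item>4</Item><Item>5</Item><Item>6</Item>
<BundleItem><Item>7</Item><Item>8</Item><Item>9</Item></BundleItem>
</Root>"""

root = ET.fromstring(XML)

# Position is evaluated per parent, so each parent contributes its first Item.
first_per_parent = [e.text for e in root.findall(".//Item[1]")]

# Stand-in for //Item[text() = 1 or text() = 7], filtered in Python.
by_text = [e.text for e in root.iter("Item") if e.text in ("1", "7")]

print(first_per_parent)
print(by_text)
```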
A good tutorial can be found at w3schools.com and you can play with XPath expressions over your XML input here. (I am not affiliated with either of these resources but find them useful.)
I've got XML like this
<root>
...
<a>
<a>
<a>
<c>
...
It's very flat, with LOTS of A elements and a few C elements. The A elements are sensor data and the last reading before each C is bogus, so I need the one before it. I'd like to use the C elements as markers and select the A element two positions before each C. So I'm trying out an XPath like:
/root/c/preceding-sibling::a
but I'm getting all previous A elements. I was hoping for something a bit more direct, such as:
/root/c/preceding-sibling[-2]
which would just grab the 2nd sibling before C (no matter the type). I guess I'm asking for array-like functionality in XPath, so that whatever I match, I can ask for "the second element before that".
Is this possible?
You can
just grab the 2nd sibling before C (no matter the type)
with the XPath expression
/root/c/preceding-sibling::*[2]
Positions on the preceding-sibling:: axis are counted backwards from the context node. The node with index [1] is the node immediately before c, and the node with index [2] is the node before that one, which is
the second element before that
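The backwards counting can be illustrated with a small Python sketch. It assumes the standard-library ElementTree, which has no preceding-sibling:: axis, so the lookup is done by index on the parent's child list. The v attributes and the helper name second_before are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat sensor data; the reading just before each c is bogus.
XML = ("<root>"
       "<a v='10'/><a v='11'/><a v='bogus'/><c/>"
       "<a v='20'/><a v='21'/><a v='bogus'/><c/>"
       "</root>")

def second_before(parent, tag):
    """Emulate parent/tag/preceding-sibling::*[2] for each matching child."""
    children = list(parent)
    return [children[i - 2]                 # two steps back, any element type
            for i, child in enumerate(children)
            if child.tag == tag and i >= 2]

root = ET.fromstring(XML)
readings = [e.get("v") for e in second_before(root, "c")]
print(readings)
```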
When Firebug generates a relative XPath, it produces something like
.//*[@id='Passwd']
What does it signify if we don't use the dot at the start?
If I just enter //* as the XPath, it highlights various page elements. What does that signify?
Below are XPaths for the Gmail password field. What is the significance of * ?
.//*[@id='Passwd']
//child::input[@type='password']
There are several distinct, key XPath concepts in play here...
Absolute vs relative XPaths (/ vs .)
/ introduces an absolute location path, starting at the root of the document.
. introduces a relative location path, starting at the context node.
Named element vs any element (ename vs *)
/ename selects an ename root element
./ename selects all ename child elements of the context node.
/* selects the root element, regardless of name.
./* or * selects all child elements of the context node, regardless of name.
descendant-or-self axis (//*)
//ename selects all ename elements in a document.
.//ename selects all ename elements at or beneath the context node.
//* selects all elements in a document, regardless of name.
.//* selects all elements, regardless of name, at or beneath the context node.
With these concepts in mind, here are answers to your specific questions...
.//*[@id='Passwd'] means to select all elements at or beneath the
context node that have an id attribute value equal to
'Passwd'.
//child::input[@type='password'] can be simplified to
//input[@type='password'] and means to select all input elements
in the document that have a type attribute value equal to 'password'.
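The effect of the context node can be tried out with a short Python sketch. It assumes the standard-library ElementTree, which only accepts relative paths on an element, so a search from the root element stands in for the absolute // form. The page markup is a made-up stand-in for a login page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, not the real Gmail markup.
HTML = """<html><body>
<div id="header"><input id="search" type="text"/></div>
<form id="login"><input id="Passwd" type="password"/></form>
</body></html>"""

root = ET.fromstring(HTML)
header = root.find(".//div[@id='header']")
form = root.find(".//form[@id='login']")

# Same relative expression, three different context nodes.
from_root = root.findall(".//*[@id='Passwd']")
from_header = header.findall(".//*[@id='Passwd']")
from_form = form.findall(".//*[@id='Passwd']")

# Naming the element instead of using * narrows the match to input elements.
by_name = root.findall(".//input[@type='password']")

print(len(from_root), len(from_header), len(from_form), len(by_name))
```

The search from header finds nothing because the password field is not beneath that node, which is the practical difference the leading dot makes.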
These expressions all select different nodesets:
.//*[@id='Passwd']
The '.' at the beginning means that processing starts at the current node. The '*' selects all element nodes descending from this current node whose @id attribute value is equal to 'Passwd'.
What if we don't use dot at the start what it signifies?
Then you'd select all element nodes with an @id attribute value equal to 'Passwd' in the whole document.
Just add //* in the XPath -- it highlights --- various page elements
This would select all element nodes in the whole document.
Below are XPaths for the Gmail password field. What is the significance of * ?
.//*[@id='Passwd']
This would select all element nodes descending from the current node whose @id attribute value is equal to 'Passwd'.
//child::input[@type='password']
This would select all child element nodes named input whose @type attribute value is equal to 'password'. The child:: axis prefix may be omitted because it is the default axis.
The syntax for choosing the appropriate expression is explained here at w3schools.com.
And the axes (the current point in processing) are explained here at another w3schools.com page.
The dot in XPath is called a "context item expression". If you put a dot at the beginning of the expression, it would make it context-specific. In other words, it would search the element with id="Passwd" in the context of the node on which you are calling the "find element by XPath" method.
The * in .//*[@id='Passwd'] helps to match any element with id='Passwd'.
For the first question: It's all about the context. You can see Syntax to know what '.', '..' etc means. Also, I bet you won't find any explanation better than This Link.
Simplified answer for the second question: You would generally find nodes using HTML tags like td, a, li, div etc. But '*' means: find any tag that matches your given property. It's mostly used when you are sure about a given property but not about the tag in which the element might appear. Suppose, for example, that I want a list of all elements with ID 'xyz', whatever tag they are in.
Hope it helps :)
I have made an info-graphic depicting the various axes in XPath. However, I am not sure as to whether they are correct.
I get confused in following, following-sibling, preceding and preceding-sibling
Is my diagram correct ?
The original image is here: http://imgur.com/4ekJxca
(Taken from Pro XML Development with Java)
Here is my understanding of the nodes I get confused in:
descendant:: selects the nodes (element and text only) which are children and grandchildren of the context node.
following:: selects any node (text only) which was not selected by descendant.
following-sibling:: all the 'brothers' of the context node. That is, text and element nodes which are children of the same parent as the context node, after the context node.
preceding-sibling:: all the 'brothers' of the context node. That is, text and element nodes which are children of the same parent as the context node, before the context node.
preceding:: all the nodes (text only) that do not appear along the ancestor:: axis and are not nested in any element node. (I am sure I screwed this up)
XML
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:journal="http://www.apress.com/catalog/journal" >
<journal:journal title="XML" publisher="IBM developerWorks">
<article journal:level="Intermediate"
date="February-2003">
<title>Design XML Schemas Using UML</title>
<author>Ayesha Malik</author>
</article>
</journal:journal>
<journal title="Java Technology" publisher="IBM developerWorks">
<article level="Advanced" date="January-2004">
<title>Design service-oriented architecture
frameworks with J2EE technology</title>
<author>Naveen Balani</author>
</article>
<article level="Advanced" date="October-2003">
<title>Advance DAO Programming</title>
<author>Sean Sullivan </author>
</article>
</journal>
</catalog>
The best way to gain accurate intuition about preceding and following axes is to imagine XML as a set of nested boxes or intervals, where each interval extends from the start tag to its matching end tag. In this picture you can see that any two distinct intervals a and b must be in exactly one of the following relationships:
a contains b (a/descendant::b);
a is contained by b (a/ancestor::b);
a is followed by b (a/following::b);
a is preceded by b (a/preceding::b).
If you keep to this model, you will never have a doubt in the semantics of the XPath axes.
Incidentally, this is why the tree model is bad for your intuition: it doesn't put the "nested boxes" paradigm to the forefront, so it's easy to get confused.
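The interval picture can be made concrete with a small Python sketch. It assumes the standard-library ElementTree and numbers each start and end tag with a counter; the helper names intervals and relation, and the trimmed-down catalog sample, are invented for the illustration.

```python
import xml.etree.ElementTree as ET

def intervals(root):
    """Map each element to a (start, end) pair, like nested boxes."""
    spans = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter            # position of the start tag
        counter += 1
        for child in elem:
            visit(child)
        spans[elem] = (start, counter)   # position of the end tag
        counter += 1

    visit(root)
    return spans

def relation(spans, a, b):
    """Name the axis on which b lies relative to a (a and b distinct)."""
    (a0, a1), (b0, b1) = spans[a], spans[b]
    if a0 < b0 and b1 < a1:
        return "descendant"        # a contains b
    if b0 < a0 and a1 < b1:
        return "ancestor"          # a is contained by b
    return "following" if a1 < b0 else "preceding"

root = ET.fromstring(
    "<catalog><journal><article><title/><author/></article></journal>"
    "<journal><article/></journal></catalog>")
spans = intervals(root)
article = root.find("journal/article")
title = article.find("title")
author = article.find("author")
second_journal = root.findall("journal")[1]

print(relation(spans, article, title))
print(relation(spans, title, author))
print(relation(spans, second_journal, title))
```

Because two intervals from a well-formed document either nest or are disjoint, the four branches cover every pair of distinct elements exactly once.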
I have a parent element (font) and I would like to select all the child elements (direct descendants) that are either text() or span elements. How would I construct such an xpath?
If the current node is the font element, then something like this:
text()|span
otherwise you always have to combine two complete XPath expressions with |, one for the text nodes and one for the span elements, e.g.:
font/text()|font/span
if the current node is just above font - or
//a[text()='View Larger Map']/../../../../div[contains(@class, 'paragraph')][3]/font/span|//a[text()='View Larger Map']/../../../../div[contains(@class, 'paragraph')][3]/font/text()
if starting from the root with some complex selection criteria.
If you have complex paths like the last one, it is probably better to store the shared part in a variable, e.g. inside an XSLT:
<xsl:variable name="font" select="//a[text()='View Larger Map']/../../../../div[contains(@class, 'paragraph')][3]/font"/>
. . .
<xsl:for-each select="$font/span|$font/text()">
. . .
</xsl:for-each>
Another possibility is to do something like this:
//a[text()='View Larger Map']/../../../../div[contains(@class, 'paragraph')][3]/font/node()[name()='span' or name()='']
that works because name() returns an empty string for text() nodes - but I am not 100% sure that it works that way for all XPath processors, and it could match by mistake comment nodes.
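For comparison, here is a rough Python sketch of what text()|span returns when font is the current node. It assumes the standard-library ElementTree, which has no text nodes of its own and stores text in the .text and .tail attributes, so the union is rebuilt by hand. The helper name text_and_spans and the sample markup are made up.

```python
import xml.etree.ElementTree as ET

def text_and_spans(font):
    """Return font's direct text chunks and span children, in document order."""
    items = []
    if font.text:
        items.append(font.text)          # text before the first child
    for child in font:
        if child.tag == "span":
            items.append(child)
        if child.tail:
            items.append(child.tail)     # text after a child belongs to font
    return items

font = ET.fromstring("<font>one<span>two</span>three<b>bold</b>four</font>")
result = [i if isinstance(i, str) else "<span>" for i in text_and_spans(font)]
print(result)
```

The b element and its content are skipped, just as they would be by the XPath union, while the text that follows it is kept because it is a direct child of font.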