I am reading these XPath examples: https://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx and I want to know the difference between these two expressions:
author
All <author> elements within the current context.
//author
All <author> elements in the document.
What is the difference between these two cases? If the "current context" is the root node, would that make the two equivalent?
For this simple XML file:
<root>
<author>
<first-name></first-name>
</author>
</root>
I tried it on this site https://www.freeformatter.com/xpath-tester.html
Why doesn't author return anything, as I expected it to (while //author works)?
The description you cite for the relative XPath expression, author,
All <author> elements within the current context.
is wrong.¹ It should instead say,
All <author> child elements of the current context node.
//author would indeed select all <author> elements in the document, because // (short for /descendant-or-self::node()/) selects along the descendant-or-self axis, starting from the root node.
The reason author doesn't select anything for your XML document is that the context node is the root node (the document node, which is the parent of <root>). You'd have to use root/author to select the <author> children of <root>, or just root to select the <root> element itself.
¹ As of today, 2018-06-24. I've submitted feedback that it should be corrected, so hopefully it will be fixed soon.
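To see the context dependence concretely, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which implements only a limited XPath subset. ElementTree has no document node, so the sketch wraps <root> in a stand-in element named "#document". That name is hypothetical and not part of any API. The wrapper plays the role of the XPath root node:

```python
import xml.etree.ElementTree as ET

XML = "<root><author><first-name></first-name></author></root>"

root = ET.fromstring(XML)  # the <root> element

# ElementTree has no document node; this hypothetical "#document" wrapper
# stands in for the XPath root node, whose only child element is <root>.
doc = ET.Element("#document")
doc.append(root)


def tags(nodes):
    """Return the tag names of the selected elements, in document order."""
    return [n.tag for n in nodes]


# Context = document node: its only child is <root>, so "author" matches nothing.
print(tags(doc.findall("author")))       # []
print(tags(doc.findall("root/author")))  # ['author']
print(tags(doc.findall(".//author")))    # ['author'], like //author
# Context = <root>: now the relative path "author" finds the child element.
print(tags(root.findall("author")))      # ['author']
```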
"element" selects all immediate children named "element" of the current node and is identical to "./element".
"//element" selects all "element" nodes at any depth, starting from the root (ignoring your current node).
And to complete the list:
".//element" would select "element" descendants of the current node, at any depth.
"/element" would search at the root level only (in your example, you would need "/root" to get anything).
And as for "author" not finding anything: you first need to be at the level of your root node. "/root/author" would get the node you wanted, or first select "/root" and from there you can select "author".
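The forms above can be compared with another small Python sketch (xml.etree.ElementTree, standard library). The sample document and the choice of <book> as the current node are assumptions for illustration. ElementTree refuses absolute paths such as //author on an element, so root.iter("author") stands in for //author here:

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <book>
    <author>A</author>
    <info><author>B</author></info>
  </book>
  <author>C</author>
</root>"""

root = ET.fromstring(XML)
ctx = root.find("book")  # assume <book> is the current (context) node


def texts(nodes):
    """Return the text of each selected <author>, in document order."""
    return [n.text for n in nodes]


print(texts(ctx.findall("author")))     # ['A']            "author": children only
print(texts(ctx.findall("./author")))   # ['A']            identical to "author"
print(texts(ctx.findall(".//author")))  # ['A', 'B']       descendants of <book>, any depth
print(texts(root.iter("author")))       # ['A', 'B', 'C']  like "//author": whole document
```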
I was selecting all id attributes and everything was going nicely; then one day the requirements changed, and now I have to select all except one!
Given the following example:
<root>
<structs id="123">
<struct>
<comp>
<data id="asd"/>
</comp>
</struct>
</structs>
</root>
I want to select all id attributes except the one at /root/structs/struct/comp/data.
Please note that the XML could be different.
Meaning, what I really want is: given any XML tree, I want to select all id attributes except the one on element /root/structs/struct/comp/data.
I tried the following:
//@id[not(ancestor::struct)] It kind of worked, but I want to provide a full XPath to the ancestor axis, which I couldn't.
//@id[not(contains(name(), 'data'))] It didn't work, because name() returns the name of the context node, which is the attribute, not its parent element.
The following should achieve what you're describing:
//@id[not(parent::data/parent::comp/parent::struct/parent::structs/parent::root)]
As you can see, it simply checks from bottom to top whether the id attribute's parent matches the path root/structs/struct/comp/data.
I think this should be sufficient for your needs, but it does not 100% ensure that the parent is at the path /root/structs/struct/comp/data because it could be, for example, at the path /someOtherHigherRoot/root/structs/struct/comp/data. I'm guessing that's not a possible scenario in your XML structure, but if you had to check for that, you could do this:
//@id[not(parent::data/parent::comp/parent::struct/parent::structs/parent::root[not(parent::*)])]
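ElementTree cannot select attribute nodes with XPath, so the following Python sketch emulates the stricter expression by hand instead. It walks the tree, records each element's absolute path, and keeps every id value except the one whose element sits exactly at /root/structs/struct/comp/data. The function name ids_except is hypothetical. Comparing the full path from the document element corresponds to the [not(parent::*)] check:

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <structs id="123">
    <struct>
      <comp>
        <data id="asd"/>
      </comp>
    </struct>
  </structs>
</root>"""

EXCLUDED_PATH = ("root", "structs", "struct", "comp", "data")


def ids_except(xml_text, excluded=EXCLUDED_PATH):
    """Collect every id attribute value except the one on the element whose
    absolute path (starting at the document element) equals `excluded`."""
    result = []

    def walk(elem, path):
        path = path + (elem.tag,)
        if "id" in elem.attrib and path != excluded:
            result.append(elem.attrib["id"])
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return result


print(ids_except(XML))  # ['123']
```

Because the whole path is compared, a <data id="..."/> under /someOtherHigherRoot/root/structs/struct/comp is kept, matching the second XPath above rather than the first.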
When finding a relative XPath via Firebug, it creates something like:
.//*[@id='Passwd']
What does it signify if we don't use the dot at the start?
If I just enter //* as the XPath, it highlights various page elements. What does that signify?
Below are XPaths for the Gmail password field. What is the significance of *?
.//*[@id='Passwd']
//child::input[@type='password']
There are several distinct, key XPath concepts in play here...
Absolute vs relative XPaths (/ vs .)
/ introduces an absolute location path, starting at the root of the document.
. introduces a relative location path, starting at the context node.
Named element vs any element (ename vs *)
/ename selects an ename root element
./ename selects all ename child elements of the context node.
/* selects the root element, regardless of name.
./* or * selects all child elements of the context node, regardless of name.
descendant-or-self axis (//*)
//ename selects all ename elements in a document.
.//ename selects all ename elements beneath the context node (its descendants, not the context node itself).
//* selects all elements in a document, regardless of name.
.//* selects all elements, regardless of name, beneath the context node.
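A small Python sketch of these forms with the standard-library xml.etree.ElementTree, using a made-up document and treating <div> as the context node. root.iter() stands in for //*, since ElementTree does not accept absolute paths on an element:

```python
import xml.etree.ElementTree as ET

DOC = "<html><body><div><p><b/></p><span/></div><p/></body></html>"
root = ET.fromstring(DOC)     # the root element <html>
ctx = root.find("body/div")   # assume <div> is the context node


def tags(nodes):
    """Tag names of the selected elements, in document order."""
    return [n.tag for n in nodes]


print(tags(ctx.findall("p")))     # ['p']               ./ename : named children
print(tags(ctx.findall("*")))     # ['p', 'span']       ./* or *: children of any name
print(tags(ctx.findall(".//*")))  # ['p', 'b', 'span']  .//*    : everything beneath ctx
print(tags(root.iter()))          # ['html', 'body', 'div', 'p', 'b', 'span', 'p']  like //*
```

Note that .//* does not return the <div> itself, only what lies beneath it.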
With these concepts in mind, here are answers to your specific questions...
.//*[@id='Passwd'] means to select all elements beneath the
context node that have an id attribute value equal to
'Passwd'.
//child::input[@type='password'] can be simplified to
//input[@type='password'] and means to select all input elements
in the document that have a type attribute value equal to 'password'.
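Both expressions can be tried out with xml.etree.ElementTree, which supports .//*, * and [@attr='value'] predicates. The markup below is a hypothetical, heavily simplified stand-in for the Gmail login page, not its real structure:

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
  <form id="login">
    <input id="Email" type="email"/>
    <div><input id="Passwd" type="password"/></div>
  </form>
  <div id="footer"/>
</body></html>"""

root = ET.fromstring(PAGE)
form = root.find(".//form")  # assume the context node is <form>

# .//*[@id='Passwd']: any element beneath the context node with id="Passwd"
hits = form.findall(".//*[@id='Passwd']")
print([(e.tag, e.get("type")) for e in hits])  # [('input', 'password')]

# //input[@type='password']: every password input in the document
# (ElementTree rejects absolute paths on elements, so search from the top element)
print(len(root.findall(".//input[@type='password']")))  # 1

# The same relative expression from a context that does not contain the field
footer = root.find(".//div[@id='footer']")
print(footer.findall(".//*[@id='Passwd']"))  # []
```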
These expressions all select different nodesets:
.//*[@id='Passwd']
The '.' at the beginning means that processing starts at the current node. The '*' selects all element nodes descending from this current node whose @id attribute value is equal to 'Passwd'.
What if we don't use the dot at the start? What does it signify?
Then you'd select all element nodes in the whole document whose @id attribute value is equal to 'Passwd'.
Just add //* in the XPath: it highlights various page elements.
This would select all element nodes in the whole document.
Below are XPaths for the Gmail password field. What is the significance of *?
.//*[@id='Passwd']
This would select all element nodes descending from the current node whose @id attribute value is equal to 'Passwd'.
//child::input[@type='password']
This would select all input elements anywhere in the document whose @type attribute value is equal to 'password'. The child:: axis prefix may be omitted, because child is the default axis.
The syntax for choosing the appropriate expression is explained at w3schools.com.
And the axes (relationships relative to the current node) are explained on another w3schools.com page.
The dot in XPath is called a "context item expression". If you put a dot at the beginning of the expression, it makes the expression context-specific. In other words, it searches for the element with id="Passwd" in the context of the node on which you are calling the "find element by XPath" method.
The * in .//*[@id='Passwd'] matches any element with id='Passwd'.
For the first question: it's all about the context. See the XPath syntax reference to learn what '.', '..', etc. mean.
Simplified answer for the second question: you would generally find nodes using HTML tags like td, a, li, div, etc. But '*' means "find any tag that matches the given property". It's mostly used when you are sure about a given property but not about the tag the element comes in; for example, when I want a list of all elements with ID 'xyz', whatever their tag.
Hope it helps :)
My document looks like the following:
<a>
whatever
</a>
If I run / or /a, the entire document is returned (at least effectively).
If I run /a/.. the entire document is returned.
But /.. returns an empty sequence
Given that / and /a return the same node, how come /a/.. and /.. are different?
The Document Node
The XML code you provided as a document is actually wrapped in another node, the "document node". The document node is one node kind; the others include elements, attributes, text nodes, comments and processing instructions. Using XQuery/XPath 2.0 notation, it would look something like this:
document{
<a>
whatever
</a>
}
Effects on Queries
/ selects the document node
/a selects the root element, which is the only child of the document node
/.. returns the empty sequence, as the document node has no parent node
/a/.. again selects the parent node of the root element, which again is the document node
/../a has no results, as we "stepped out of the tree" (compare with /..)
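Python's standard xml.dom.minidom exposes a DOM Document node that plays the same role, so the steps above can be mimicked with parentNode. The parent_step helper is hypothetical; it applies '..' to a list of nodes:

```python
from xml.dom import minidom

doc = minidom.parseString("<a>\nwhatever\n</a>")


def parent_step(nodes):
    """Mimic the XPath '..' step on a list of nodes: collect each node's parent.
    A node without a parent (the document node) contributes nothing."""
    return [n.parentNode for n in nodes if n.parentNode is not None]


slash = [doc]                    # "/"  : the document node
slash_a = [doc.documentElement]  # "/a" : the root element <a>

print(doc.nodeType == doc.DOCUMENT_NODE)  # True
print(slash_a[0].tagName)                 # a
print(parent_step(slash_a)[0] is doc)     # True: "/a/.." is the document node
print(parent_step(slash))                 # []: "/.." is empty, so "/../a" is too
```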
Why we Need a Document Node
The document node is important, as the XML specification allows other nodes to follow the root element, namely processing instructions and comments (and whitespace). From the XML grammar:
document ::= prolog element Misc*
Misc ::= Comment | PI | S
Without a document node, these nodes wouldn't be reachable from XPath, as they are not part of the root element's subtree.
So, this would also be a valid XML document (*):
document {
<a>
whatever
</a>
<!-- Just do nothing -->
<?php foo(); ?>
}
(*) This isn't valid XQuery/XPath 2.0 any more, as the content would have to be a comma-separated sequence of nodes; I omitted the commas after each node for demonstration purposes.
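A quick check with Python's xml.dom.minidom (standard library) shows the comment and the processing instruction ending up as children of the document node rather than inside <a>. The input string is a compact version of the example above, with the whitespace between nodes removed:

```python
from xml.dom import minidom

doc = minidom.parseString("<a>whatever</a><!-- Just do nothing --><?php foo(); ?>")

# Children of the document node: the root element, then the comment and the PI.
print([n.nodeName for n in doc.childNodes])                  # ['a', '#comment', 'php']
# The root element's subtree only holds the text "whatever".
print([n.nodeName for n in doc.documentElement.childNodes])  # ['#text']
```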
The expressions / and /a are not the same and don't return the entire document. / selects a node set containing the root node. The root node in XPath 1.0 (the document node in XPath 2.0) is a kind of virtual node which sits above the document element. /a selects a node set containing the document element.
The expression /a/.. selects the parent of the document element which is the root node. The expression /.. selects the parent of the root node. Since the root node has no parent, it returns the empty node set. This expression is also a common idiom to select the empty node set.
I have made an info-graphic depicting the various axes in XPath. However, I am not sure as to whether they are correct.
I get confused by following, following-sibling, preceding and preceding-sibling.
Is my diagram correct?
The original image is here: http://imgur.com/4ekJxca
(Taken from Pro XML Development with Java)
Here is my understanding of the axes I get confused by:
descendant:: selects the nodes (element and text only) which are children and grandchildren of the context node.
following:: selects any node (text only) which was not selected by descendant.
following-sibling:: all the 'brothers' of the context node. That is, text and element nodes which are children of the same parent as the context node, after the context node.
preceding-sibling:: all the 'brothers' of the context node. That is, text and element nodes which are children of the same parent as the context node, before the context node.
preceding:: all the nodes (text only) that do not appear along the ancestor:: axis and are not nested in any element node. (I am sure I screwed this up)
XML
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:journal="http://www.apress.com/catalog/journal" >
<journal:journal title="XML" publisher="IBM developerWorks">
<article journal:level="Intermediate"
date="February-2003">
<title>Design XML Schemas Using UML</title>
<author>Ayesha Malik</author>
</article>
</journal:journal>
<journal title="Java Technology" publisher="IBM developerWorks">
<article level="Advanced" date="January-2004">
<title>Design service-oriented architecture
frameworks with J2EE technology</title>
<author>Naveen Balani</author>
</article>
<article level="Advanced" date="October-2003">
<title>Advance DAO Programming</title>
<author>Sean Sullivan </author>
</article>
</journal>
</catalog>
The best way to gain accurate intuition about preceding and following axes is to imagine XML as a set of nested boxes or intervals, where each interval extends from the start tag to its matching end tag. In this picture you can see that any two distinct intervals a and b must be in exactly one of the following relationships:
a contains b (a/descendant::b);
a is contained by b (a/ancestor::b);
a is followed by b (a/following::b);
a is preceded by b (a/preceding::b).
If you keep to this model, you will never have a doubt in the semantics of the XPath axes.
Incidentally, this is why the tree model is bad for your intuition: it doesn't put the "nested boxes" paradigm to the forefront, so it's easy to get confused.
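The nested-boxes model translates fairly directly into code. The Python sketch below uses hypothetical helper names and a made-up document. It numbers elements in document order, gives each one the interval from its own position to that of its last descendant, and classifies any other element by comparing intervals. Attributes and text nodes are left out for brevity:

```python
import xml.etree.ElementTree as ET

XML = "<r><a><b/><c/></a><d><e/></d></r>"


def intervals(root):
    """Map each element to (start, end): its position in document order and the
    position of its last descendant, i.e. the box from start tag to end tag."""
    spans = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child)
        spans[elem] = (start, counter - 1)  # counter - 1 = last descendant's position

    visit(root)
    return spans


def relation(spans, ctx, other):
    """Classify `other` relative to `ctx`; two distinct boxes are nested or disjoint."""
    (s1, e1), (s2, e2) = spans[ctx], spans[other]
    if s1 < s2 <= e1:
        return "descendant"  # ctx's box contains other's box
    if s2 < s1 <= e2:
        return "ancestor"    # other's box contains ctx's box
    if s2 > e1:
        return "following"   # other starts after ctx ends
    return "preceding"       # other ends before ctx starts


def axis(root, ctx, name):
    """Tag names of the elements on the given axis from ctx, in document order."""
    spans = intervals(root)
    return [e.tag for e in root.iter() if e is not ctx and relation(spans, ctx, e) == name]


def sibling_axis(root, ctx, name):
    """following-sibling / preceding-sibling: same parent, after / before ctx."""
    parent = {c: p for p in root.iter() for c in p}.get(ctx)
    if parent is None:
        return []  # the root element has no element siblings
    kids = list(parent)
    i = next(j for j, k in enumerate(kids) if k is ctx)
    chosen = kids[i + 1:] if name == "following-sibling" else kids[:i]
    return [k.tag for k in chosen]


root = ET.fromstring(XML)
a, d = root.find("a"), root.find("d")
print(axis(root, d, "preceding"))                  # ['a', 'b', 'c'] (ancestor r excluded)
print(axis(root, a, "following"))                  # ['d', 'e']
print(sibling_axis(root, a, "following-sibling"))  # ['d'] (e follows a but is no sibling)
```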
Here's a nice puzzle. Suppose we have this bit of code:
<page n="1">
<line n="3">...</line>
</page>
It is really easy to locate the line element with n="3" within the page element with n="1" using a simple XPath expression: //page[@n='1']/line[@n='3']. Great, beautiful, elegant.
Now suppose what we have is this encoding (folks familiar with the TEI will know where this is coming from).
<pb n="1"/>
(arbitrary amounts of stuff)
<lb n="3"/>
We want to find the lb element with n="3", which follows the pb element with n="1". But note -- this lb element could be almost anywhere following the pb: it may not be (and most likely is not) a sibling, but could be a child of a sibling of the pb, or of the pb's parent, etc etc etc.
So my question: how would you search for this lb element with n="3", which follows the pb element with n="1", with XPath?
Thanks in advance
Peter
Use:
//pb[@n='1']/following::lb[@n='3']
|
//pb[@n='1']/descendant::lb[@n='3']
This selects any lb element that follows the specified pb in document order, even if the wanted lb element is a descendant of the pb element.
Do note that the following expression doesn't in general select all wanted lb elements (it fails to select any of them that are descendants of the pb element):
//pb[@n='1']/following::lb[@n='3']
Explanation:
As defined in the W3C XPath specification, the following:: and descendant:: axes are non-overlapping:
"the following axis contains all nodes in the same document as the
context node that are after the context node in document order,
excluding any descendants and excluding attribute nodes and namespace nodes"
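Because following:: and descendant:: together cover every element that comes after a node in document order, the union can be mimicked in Python by scanning root.iter(), which yields elements in document order. The fragment and the function name are hypothetical. Only the first matching pb is used as the starting point, because anything after a later match also comes after the first one:

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment: the wanted <lb n="3"/> is not a sibling of the <pb>.
XML = """<text>
  <div>
    <pb n="1"/>
    <p>...<lb n="2"/>...</p>
  </div>
  <div>
    <p>...<lb n="3"/>...</p>
  </div>
</text>"""


def after_in_document_order(root, start_tag, start_n, tag, n):
    """Elements named `tag` with @n == n that come after the first
    <start_tag n=start_n> in document order, i.e. its descendant:: plus
    following:: elements, like the XPath union above."""
    hits = []
    started = False
    for elem in root.iter():
        if not started:
            started = elem.tag == start_tag and elem.get("n") == start_n
            continue  # skip everything up to and including the start element
        if elem.tag == tag and elem.get("n") == n:
            hits.append(elem)
    return hits


root = ET.fromstring(XML)
print(len(after_in_document_order(root, "pb", "1", "lb", "3")))  # 1
```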
That would be
//pb[@n=1]/following::lb[@n=3]