XPath test to identify node type - xpath

I don't understand why this test
count(.|../@*)=count(../@*)
(from Dave Pawson's home page)
identifies an attribute node :(
Could someone give me a detailed explanation?

A few things to understand:
1. . refers to the current node (aka "context node")
2. an attribute node has a parent (the element it belongs to)
3. an XPath union operation (with |) never duplicates nodes, i.e. (.|.) results in one node, not two
4. there is the self:: axis you could use in theory (e.g. self::* works to find out if a node is an element), but self::@* does not work, so we must use something different
Knowing that, you can say:
1. ../@* fetches all attributes of the current node's parent (all "sibling attributes", if you will)
2. (.|../@*) unions the current node with them; if the current node is an attribute, the overall count does not change (as per #3 above)
3. therefore, if count(.|../@*) equals count(../@*), the current node must be an attribute node.
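The counting argument can be mimicked with Python sets, which, like XPath node-sets, hold each member only once. This is a toy sketch and not an XPath engine: nodes are modelled as hypothetical (kind, name) tuples, and `parent_attributes` plays the role of ../@*.

```python
# Toy model of the test count(.|../@*) = count(../@*).
# A node is a hypothetical (kind, name) tuple; this is NOT an XPath engine.
parent_attributes = {("attribute", "id"), ("attribute", "class")}  # stands in for ../@*


def is_attribute_of_parent(node, parent_attrs):
    # The union only grows if the node was not already among the attributes.
    return len({node} | parent_attrs) == len(parent_attrs)


attr_node = ("attribute", "id")        # already part of the parent's attributes
child_element = ("element", "span")    # a child element, not an attribute

print(is_attribute_of_parent(attr_node, parent_attributes))      # True
print(is_attribute_of_parent(child_element, parent_attributes))  # False
```

For the child element the union holds three members against two attributes, so the comparison fails, which is exactly what the XPath test relies on.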

Just for completeness, in XSLT 2.0 you can do
<xsl:if test="self::attribute()">...</xsl:if>

Here is how this works:
count( # count the nodes in this set
.|../@*) # the current node plus all attributes of its parent
# The union holds each distinct node only once. If the current node is an
# attribute, it is already among the parent's attributes, so the result is
# the number of attributes on the parent element. If it is any other kind
# of node, the union holds one node more than that.
=count( # count the nodes in this set
../@*) # all attribute nodes of the parent. The two counts are equal only
# when the current node is itself one of those attributes.

Related

What is the difference between xpath //a and .//a in Selenium Webdriver [duplicate]

When I look up a relative XPath with Firebug, it generates something like
.//*[@id='Passwd']
What does it signify if we don't use the dot at the start?
If I just enter //* as the XPath, it highlights various page elements. What does that signify?
Below are XPaths for the Gmail password field. What is the significance of * ?
.//*[@id='Passwd']
//child::input[@type='password']
There are several distinct, key XPath concepts in play here...
Absolute vs relative XPaths (/ vs .)
/ introduces an absolute location path, starting at the root of the document.
. introduces a relative location path, starting at the context node.
Named element vs any element (ename vs *)
/ename selects an ename root element
./ename selects all ename child elements of the context node.
/* selects the root element, regardless of name.
./* or * selects all child elements of the context node, regardless of name.
descendant-or-self axis (//*)
//ename selects all ename elements in a document.
.//ename selects all ename elements beneath the context node.
//* selects all elements in a document, regardless of name.
.//* selects all elements, regardless of name, beneath the context node.
With these concepts in mind, here are answers to your specific questions...
.//*[@id='Passwd'] means to select all elements beneath the context node that have an id attribute value equal to 'Passwd'.
//child::input[@type='password'] can be simplified to //input[@type='password'] and means to select all input elements in the document that have a type attribute value equal to 'password'.
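The effect of the leading dot can be tried out with Python's xml.etree.ElementTree, which supports a limited subset of XPath and evaluates paths relative to the element they are called on. A minimal sketch, assuming a made-up page fragment (the markup and the 'login' and 'footer' ids are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, made up for illustration.
page = ET.fromstring(
    "<html><body>"
    "<div id='login'><input id='Passwd' type='password'/></div>"
    "<div id='footer'><span id='note'>help</span></div>"
    "</body></html>"
)

login = page.find(".//div[@id='login']")
footer = page.find(".//div[@id='footer']")

# The same relative expression, evaluated against different context nodes.
from_page = page.findall(".//*[@id='Passwd']")
from_login = login.findall(".//*[@id='Passwd']")
from_footer = footer.findall(".//*[@id='Passwd']")

print([e.tag for e in from_page])    # ['input']
print([e.tag for e in from_login])   # ['input']
print([e.tag for e in from_footer])  # []
```

The footer search comes back empty because a relative path only looks beneath its context node.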
These expressions all select different node-sets:
.//*[@id='Passwd']
The '.' at the beginning means that processing starts at the current node. The '*' selects all element nodes descending from this current node whose @id attribute value equals 'Passwd'.
What if we don't use the dot at the start? What does it signify?
Then you'd select all element nodes with an @id attribute value equal to 'Passwd' in the whole document.
Just add //* in the XPath: it highlights various page elements
This would select all element nodes in the whole document.
Below are XPaths for the Gmail password field. What is the significance of * ?
.//*[@id='Passwd']
This would select all element nodes descending from the current node whose @id attribute value is equal to 'Passwd'.
//child::input[@type='password']
This would select all child element nodes named input whose @type attribute value is equal to 'password'. The child:: axis prefix may be omitted, because it is the default axis.
The syntax for choosing the appropriate expression is explained here at w3school.com.
And the axes (the current point in processing) are explained here at another w3school.com page.
The dot in XPath is called a "context item expression". If you put a dot at the beginning of the expression, it would make it context-specific. In other words, it would search the element with id="Passwd" in the context of the node on which you are calling the "find element by XPath" method.
The * in .//*[@id='Passwd'] matches any element with id='Passwd', whatever its tag name.
For the first question: It's all about the context. You can see Syntax to know what '.', '..' etc. mean. Also, I bet you won't find any explanation better than This Link.
Simplified answer for the second question: You would generally find nodes using HTML tags like td, a, li, div etc. But '*' means: find any tag that matches your given property. It's mostly used when you are sure about a given property but not about the tag the element might come in, for example when I want a list of all elements with ID 'xyz', whatever their tag.
Hope it helps :)

immediately preceding-sibling must contain attribute

Here is my XML file:
<w type="fruit-hard">apple</w>
<w type="fruit-soft">orange</w>
<w type="vegetable">carrot</w>
I need to find carrot's immediately preceding sibling whose type is fruit-soft. In Chrome (locally loaded XML file), when I try
$x("//w[@type='vegetable']/preceding-sibling::w[1]")
I get "orange" element node like I want, but how do I require that its type be "fruit-soft"? My attempt (below) returns "false."
$x("//w[@type='vegetable']/preceding-sibling::w[1] and preceding-sibling::w[@type='fruit-soft']")
Your original XPath ...
//w[@type='vegetable']/preceding-sibling::w[1]
... is equivalent to
//w[@type='vegetable']/preceding-sibling::w[position()=1]
You can add additional criteria to the predicate as needed:
//w[@type='vegetable']/preceding-sibling::w[position()=1 and @type='fruit-soft']
Or you can add a separate predicate:
//w[@type='vegetable']/preceding-sibling::w[1][@type='fruit-soft']
Note that this attempt:
//w[@type='vegetable']/preceding-sibling::w[1] and preceding-sibling::w[@type='fruit-soft']
returns false because the parts on either side of the and are evaluated separately, converted to type boolean, and combined to yield the final result. Supposing that the context node against which that is evaluated is the document root, there will never be a node matching preceding-sibling::w[@type='fruit-soft']. Moreover, even if there were such a node, that expression does not require nodes matching the first part to be the same ones that match the second part.
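The logic of preceding-sibling::w[1][@type='fruit-soft'] can also be spelled out by hand. The sketch below emulates it in plain Python, since xml.etree.ElementTree does not implement the preceding-sibling axis. The <root> wrapper and the helper name `immediate_preceding_w` are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The three <w> elements from the question, wrapped in a hypothetical <root>.
root = ET.fromstring(
    "<root>"
    "<w type='fruit-hard'>apple</w>"
    "<w type='fruit-soft'>orange</w>"
    "<w type='vegetable'>carrot</w>"
    "</root>"
)


def immediate_preceding_w(parent, wanted_type):
    """Emulate w[@type='vegetable']/preceding-sibling::w[1][@type=wanted_type]."""
    siblings = parent.findall("w")
    hits = []
    for i, w in enumerate(siblings):
        if w.get("type") == "vegetable" and i > 0:
            prev = siblings[i - 1]               # preceding-sibling::w[1]
            if prev.get("type") == wanted_type:  # the second predicate
                hits.append(prev)
    return hits


print([w.text for w in immediate_preceding_w(root, "fruit-soft")])  # ['orange']
print([w.text for w in immediate_preceding_w(root, "fruit-hard")])  # []
```

The second call returns nothing because only the nearest preceding sibling is tested; apple is two steps back.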

Difference between / and /root-node

My document looks like the following:
<a>
whatever
</a>
If I run / or /a, the entire document is returned (at least effectively).
If I run /a/.., the entire document is returned.
But /.. returns an empty sequence.
Considering / and /a are returning the same node, how come /a/.. and /.. are different?
The Document Node
The XML code you provided as the document is actually wrapped in another node, the "document node". The document node is a node kind of its own; the others are elements, attributes, text nodes, comments and processing instructions. Using XQuery/XPath 2.0 notation, it would look something like this:
document{
<a>
whatever
</a>
}
Effects on Queries
/ selects the document node
/a selects the root element, which is the only child of the document node
/.. returns the empty sequence, as the document node has no parent node
/a/.. again selects the parent node of the root element, which again is the document node
/../a has no results, as we "stepped out of the tree" (compare with /..)
Why we Need a Document Node
The document node is important, as the XML specification allows other nodes to follow the root element, namely processing instructions and comments (and whitespace). From the XML grammar:
document ::= prolog element Misc*
Misc ::= Comment | PI | S
Without a document node, these nodes wouldn't be reachable by XPath, as they are not part of the "root element subtree".
So, this would also be a valid XML document (*):
document {
<a>
whatever
</a>
<!-- Just do nothing -->
<?php foo(); ?>
}
(*) This isn't valid XPath 2.0 any more, as we would have to give a node sequence. I omitted the commas that XPath 2.0 requires after each node, as this is only for demonstration purposes.
The expressions / and /a are not the same and don't return the entire document. / selects a node set containing the document root. The root node in XPath (or document node in XPath 2.0) is a kind of virtual node which sits above the document element. /a selects a node set containing the document element.
The expression /a/.. selects the parent of the document element, which is the root node. The expression /.. selects the parent of the root node. Since the root node has no parent, it returns the empty node set. This expression is also a common idiom to select the empty node set.
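The same parent relationships can be observed in a DOM tree. A minimal sketch using Python's xml.dom.minidom (a DOM implementation, not an XPath engine), with the comment and processing instruction from the example placed after the root element:

```python
from xml.dom import minidom

# The example document, including the comment and PI that follow the root element.
doc = minidom.parseString("<a>whatever</a><!-- Just do nothing --><?php foo(); ?>")
root = doc.documentElement

print(doc.nodeType == doc.DOCUMENT_NODE)  # True: what / selects
print(doc.parentNode)                     # None: /.. is empty
print(root.parentNode is doc)             # True: /a/.. is the document node

# The document node has three children; only one of them is the root element.
print([n.nodeType for n in doc.childNodes])
```

The last line lists an element, a comment and a processing instruction, which shows why the trailing nodes are only reachable through the document node.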

XPath - Get child elements without a specific id

I have tried refraining from asking for help, but I have had enough! I am trying to get the child elements of a node; all except one with a particular id. This is what I have thus far:
//*[@id='a']/*[@id!='b']
It works to some extent. It gets all child elements of 'a' whose id is not 'b', but I want it to get all other child elements, regardless of whether they have an id attribute or not.
Any ideas?
Try using not(), e.g.
//*[@id="a"]/*[not(@id="b")]
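The difference is that @id!='b' is only true when an id attribute exists and differs from 'b', while not(@id='b') is also true when there is no id attribute at all. The sketch below mirrors both conditions with plain Python filters over a made-up fragment (the markup is hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one child with id 'b', one with id 'c', one without an id.
root = ET.fromstring("<div id='a'><p id='b'/><p id='c'/><p/></div>")
children = list(root)

# Like *[@id!='b']: the id attribute must exist and differ from 'b'.
ne_b = [c for c in children if c.get("id") is not None and c.get("id") != "b"]

# Like *[not(@id='b')]: everything except children whose id equals 'b'.
not_b = [c for c in children if not c.get("id") == "b"]

print([c.get("id") for c in ne_b])   # ['c']
print([c.get("id") for c in not_b])  # ['c', None]
```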

Modify XPath to return second of two values

I have an XPath that returns two items. I want to modify it so that it returns only the second, or the last if there are more than 2.
//a[@rel='next']
I tried
//a[@rel='next'][2]
but that doesn't return anything at all. How can I rewrite the XPath so I get only the 2nd link?
Found the answer in
XPATH : finding an attribute node (and only one)
In my case the right XPath would be
(//a[@rel='next'])[last()]
EDIT (by Tomalak) - Explanation:
This selects all a[@rel='next'] nodes, and takes the last of the entire set:
(//a[@rel='next'])[last()]
This selects all a[@rel='next'] nodes that are the respective last a[@rel='next'] of the parent context each of them is in:
//a[@rel='next'][last()]
This selects all a[@rel='next'] nodes that are the second a[@rel='next'] of the parent context each of them is in (in your case, each parent context had only one a[@rel='next'], that's why you did not get anything back):
//a[@rel='next'][2]
For the sake of completeness: This selects all a nodes that are the last of the parent context each of them is in, and of them only those that have @rel='next' (XPath predicates are applied from left to right!):
//a[last()][@rel='next'] equivalent: //a[position()=last() and @rel='next']
Note that //a[@rel='next'][2] is NOT equivalent to //a[@rel='next' and position()=2]: inside a single predicate, position() and last() count all a children of the parent, not only those with @rel='next'.
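The difference between indexing the whole result set and indexing per parent can be reproduced in plain Python. A minimal sketch, assuming a made-up page with one rel='next' link in each of two containers (markup and href values are hypothetical); the per-parent case is emulated with a loop rather than evaluated by an XPath engine:

```python
import xml.etree.ElementTree as ET

# Hypothetical page with one rel='next' link in each of two containers.
root = ET.fromstring(
    "<body>"
    "<div id='top'><a rel='prev' href='/p1'/><a rel='next' href='/p3'/></div>"
    "<div id='bottom'><a rel='next' href='/p3b'/></div>"
    "</body>"
)

all_next = root.findall(".//a[@rel='next']")

# Like (//a[@rel='next'])[last()]: the last node of the whole result set.
last_overall = all_next[-1]

# Like //a[@rel='next'][2]: per parent, the second matching child (none here).
second_per_parent = []
for parent in root.iter():
    matches = [c for c in parent if c.tag == "a" and c.get("rel") == "next"]
    second_per_parent.extend(matches[1:2])

print(len(all_next))             # 2
print(last_overall.get("href"))  # /p3b
print(second_per_parent)         # []
```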
