Get node text only if the node has an attribute? - xpath

XPath problem.
I have these nodes:
[...]
<videos>
<video timestamp="201204271112">myVideo.avi</video>
<video>myVideo.avi</video>
<video timestamp="201204271113">myVideo.avi</video>
<video>myVideo.avi</video>
<video>myVideo.avi</video>
</videos>
<photos>
<photo timestamp="201204271112">myphoto.avi</photo>
<photo>myphoto.avi</photo>
<photo timestamp="201204271113">aphoto.avi</photo>
<photo>myphoto.avi</photo>
<photo>myphoto.avi</photo>
</photos>
[...]
How can I get only the text of nodes that have a timestamp attribute?
I tried
//@timestamp
but it returns only the timestamp attributes. What about the text?
How can I write a query that combines both conditions (an AND condition)?
Something like this:
//@text and //@timestamps
to get only
201204271112 - myVideo.avi
201204271113 - myVideo.avi
201204271113 - aphoto.avi
excluding other ones?
thanks.

How can I get only the text of nodes that have a timestamp attribute?
Could you mean //*[@timestamp]/text()? That selects all text nodes whose parents have the timestamp attribute.
Conditions can be combined inside an XPath predicate, too (e.g. //video[@timestamp and text()] selects all video nodes that have both a timestamp attribute and at least one text node).
What you probably meant is a node-set union, written with the | symbol. To get both the timestamps and the text nodes, you'll need two queries unioned together: //@timestamp | //*[@timestamp]/text() gets all timestamps and all their text nodes. However, I don't think you can rely on getting them nicely paired: XPath 1.0 node-sets are unordered, so the order of the results depends on the implementation.
You can either iterate one by one with some kind of for loop and get both the timestamp and the text node via position, or you can just get all nodes that have a timestamp and dig their text out of them later (which is the preferred way).
The spec is a surprisingly good read on this.
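The "get the nodes first, dig the text out later" approach can be sketched in Python as follows. This is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset, written relative to an element as .//*[@timestamp]), the <media> wrapper element and the timestamped_texts helper are hypothetical, and the sample data is a shortened, well-formed version of the XML from the question.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the question; <media> is a hypothetical wrapper
# so that the fragment has a single root element.
XML = """
<media>
  <videos>
    <video timestamp="201204271112">myVideo.avi</video>
    <video>myVideo.avi</video>
    <video timestamp="201204271113">myVideo.avi</video>
  </videos>
  <photos>
    <photo timestamp="201204271113">aphoto.avi</photo>
    <photo>myphoto.avi</photo>
  </photos>
</media>
"""

def timestamped_texts(xml_source):
    """Return (timestamp, text) pairs for every element with a timestamp attribute."""
    root = ET.fromstring(xml_source)
    # Select the elements themselves, then read attribute and text from each,
    # so every timestamp stays paired with its own text.
    return [(el.get("timestamp"), el.text)
            for el in root.findall(".//*[@timestamp]")]

pairs = timestamped_texts(XML)
for ts, text in pairs:
    print(f"{ts} - {text}")
```

Selecting the elements rather than a union of attributes and text nodes avoids having to re-align two separate result lists.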

You can match on attributes:
//video[@timestamp]/text()
//video matches a node named video anywhere in the tree
[@timestamp] is a predicate, meaning the node has to have this attribute
text() selects all text node children of the current node

Related

Search/Parse XML and exclude certain nodes without removing them?

The command below allows me to parse the text in all nodes except for nodes 'wp14:sizeRelH' & 'wp14:sizeRelV'
XML.search('//wp14:sizeRelH', '//wp14:sizeRelV').remove.search('//text()')
I would like to do the same thing without removing nodes 'wp14:sizeRelH' and 'wp14:sizeRelV' from the XML.
That way I can walk the XML tree and change the text in each node without affecting nodes 'wp14:sizeRelH' and 'wp14:sizeRelV'.
EDIT: It appears that if nodes '//wp14:sizeRelH' or '//wp14:sizeRelV' are not in the XML, my command also returns nothing, which is not good :(
Looks like I found the answer. I used //text()[not...] but had to find the names of the ancestors of the text I didn't want to include:
XML.search('//text()[not(ancestor::wp14:pctHeight or ancestor::wp14:pctWidth or ancestor::wp:posOffset)]')
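For comparison, the same "skip text under certain ancestors, but leave the tree intact" idea can be sketched in Python with the standard library's xml.etree.ElementTree, which has no text() or ancestor:: support, so the filtering is done by a small recursive walk instead. This is an assumption-laden illustration rather than a port of the Nokogiri call: the visible_texts helper and the sample document are hypothetical, and the element names are used without their namespace prefixes to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-ins for the excluded element names.
EXCLUDED = {"pctHeight", "pctWidth", "posOffset"}

def visible_texts(elem, excluded=EXCLUDED):
    """Yield non-blank text that does not sit under an excluded element."""
    if elem.tag in excluded:
        return  # skip this element and everything beneath it
    if elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from visible_texts(child, excluded)
        # Text after a child belongs to the parent, so it is kept even when
        # the child itself is excluded.
        if child.tail and child.tail.strip():
            yield child.tail.strip()

DOC = "<doc><p>keep</p><pctHeight>50</pctHeight><q><posOffset>3</posOffset>also</q></doc>"
root = ET.fromstring(DOC)
texts = list(visible_texts(root))
print(texts)
```

Nothing is removed from the tree, and the function simply returns an empty list when none of the excluded elements are present.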

What does Camel Splitter actually do with XML Document when splitting with xpath?

I have a document with an order and a number of lines. I need to break the order into lines, so I have a Camel splitter set to xpath with the order line as its value. This works fine.
Going forward, I get an element for the order line, which is what I want. However, when converting it I need information from the order element, and if I try to get the parent element via xpath following the split, this doesn't work.
Does Camel create copies of the nodes returned by the xpath expression, or return a list of nodes within the parent document? If the former, can I make it the latter? If the latter, any ideas why a "../*" expression would return nothing?
Thanks!
Screwtape.
Look at the split options that are available when using a Tokenizer:
http://camel.apache.org/splitter.html
You have four different modes (i, w, u, t), and the 'w' mode keeps the ancestor context. In that case, the parent node (which is apparently what you need) will be repeated in each sub-message.
Default:
<m:order><id>123</id><date>2014-02-25</date></m:order>
'w' mode:
<m:orders>
<m:order><id>123</id><date>2014-02-25</date>...</m:order>
</m:orders>

What is the difference between xpath //a and .//a in Selenium Webdriver [duplicate]

While finding the relative XPath via Firebug, it creates something like
.//*[@id='Passwd']
What does it signify if we don't use the dot at the start?
If I just enter //* as the XPath, it highlights various page elements. What does that signify?
Below are XPaths for Gmail password fields. What is the significance of *?
.//*[@id='Passwd']
//child::input[@type='password']
There are several distinct, key XPath concepts in play here...
Absolute vs relative XPaths (/ vs .)
/ introduces an absolute location path, starting at the root of the document.
. introduces a relative location path, starting at the context node.
Named element vs any element (ename vs *)
/ename selects an ename root element
./ename selects all ename child elements of the context node.
/* selects the root element, regardless of name.
./* or * selects all child elements of the context node, regardless of name.
descendant-or-self axis (//*)
//ename selects all ename elements in a document.
.//ename selects all ename elements beneath the context node.
//* selects all elements in a document, regardless of name.
.//* selects all elements, regardless of name, beneath the context node.
With these concepts in mind, here are answers to your specific questions...
.//*[@id='Passwd'] means to select all elements beneath the
context node that have an id attribute value equal to
'Passwd'.
//child::input[@type='password'] can be simplified to
//input[@type='password'] and means to select all input elements
in the document that have a type attribute value equal to 'password'.
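The effect of the context node can be sketched in Python with the standard library's xml.etree.ElementTree. Note the assumptions: ElementTree only accepts relative paths on an element, so this shows the .// half of the comparison by running the same expression against two different context nodes, and the tiny page below is a hypothetical stand-in for a real login page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a page with two forms.
PAGE = """
<html>
  <body>
    <form id="login">
      <input id="Email" type="text"/>
      <input id="Passwd" type="password"/>
    </form>
    <form id="search">
      <input id="q" type="text"/>
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Context node = document root: every input in the page is found.
all_inputs = [el.get("id") for el in root.findall(".//input")]

# Context node = the login form: the same expression only sees its descendants.
login_form = root.find(".//form[@id='login']")
login_inputs = [el.get("id") for el in login_form.findall(".//input")]

# '*' matches any element name; the predicate narrows it down by attribute.
passwd = login_form.findall(".//*[@id='Passwd']")

print(all_inputs, login_inputs, passwd[0].get("type"))
```

The expression text is identical in both findall calls; only the node it is evaluated against changes, which is exactly what the leading dot expresses.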
These expressions all select different node-sets:
.//*[@id='Passwd']
The '.' at the beginning means that processing starts at the current node. The '*' selects all element nodes descending from this current node whose @id attribute value is equal to 'Passwd'.
What if we don't use dot at the start what it signifies?
Then you'd select all element nodes with an @id attribute value equal to 'Passwd' in the whole document.
Just add //* in the XPath -- it highlights --- various page elements
This would select all element nodes in the whole document.
Below are XPaths for Gmail password fields. What is the significance of *?
.//*[@id='Passwd']
This would select all element nodes descending from the current node whose @id attribute value is equal to 'Passwd'.
//child::input[@type='password']
This would select all child element nodes named input whose @type attribute value is equal to 'password'. The child:: axis prefix may be omitted, because it is the default behaviour.
The syntax for choosing the appropriate expression is explained here at w3schools.com.
And the axes (current point in processing) are explained here at another w3schools.com page.
The dot in XPath is called a "context item expression". If you put a dot at the beginning of the expression, it would make it context-specific. In other words, it would search the element with id="Passwd" in the context of the node on which you are calling the "find element by XPath" method.
The * in .//*[@id='Passwd'] matches any element with id='Passwd'.
For the first question: it's all about the context. You can see Syntax to learn what '.', '..' etc. mean. Also, I bet you won't find any explanation better than This Link.
Simplified answer for the second question: you would generally find nodes using HTML tags like td, a, li, div etc. But '*' means: find any tag that matches the given property. It's mostly used when you are sure about a given property but not about the tag the element might come in. For example, suppose I want a list of all elements with ID 'xyz', whatever tag they are in.
Hope it helps :)

How to select a node based on its child's text value?

I want to select a node based on the text value of a child.
My structure is as follows (sorry for the German node names):
<InspizierteAbwassertechnischeAnlage>
<Objektbezeichnung>10502002</Objektbezeichnung>
<Anlagentyp>1</Anlagentyp>
</InspizierteAbwassertechnischeAnlage>
How can I select the <InspizierteAbwassertechnischeAnlage> node where e.g. <Objektbezeichnung> = 10502002?
Why your solution didn't work
ancestor:://*[text()='10502002'] is syntactically incorrect, it's not valid XPath. I'm not sure what you tried to do with the axes here.
//*[text()='10502002'] itself would just select the Objektbezeichnung itself and not its parent. It would also select any other element with such a value, regardless of its name. In case of this document, nothing redundant would be returned but you have to be careful when using wildcards (*)
The solution
It's quite simple, you have to use a predicate to inspect the content of the child element
//InspizierteAbwassertechnischeAnlage[Objektbezeichnung = '10502002']
Note the double slash (//): it is the abbreviated syntax for a step along the descendant-or-self axis, /descendant-or-self::node()/. The above expression expands to:
/descendant-or-self::node()/InspizierteAbwassertechnischeAnlage[Objektbezeichnung = '10502002']
Or in plain English
In the set of all descendants of the document's root, find InspizierteAbwassertechnischeAnlage elements that contain at least one Objektbezeichnung element with a value of 10502002
As for German element names, at least it's not Hottentottenstottertrottelmutterbeutelrattenlattengitterkofferattentäter or Rhababerbarbarabarbarbarenbartbarbierbierbarbärbel
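The child-value predicate from the solution above can be tried out with a short Python sketch. It assumes the standard library's xml.etree.ElementTree, whose XPath subset supports the [child='text'] predicate when the path is written relative to an element; the <Anlagen> wrapper and the second record are hypothetical additions so that there is something to filter out.

```python
import xml.etree.ElementTree as ET

# <Anlagen> and the second record are hypothetical; the first record is the
# structure from the question.
XML = """
<Anlagen>
  <InspizierteAbwassertechnischeAnlage>
    <Objektbezeichnung>10502002</Objektbezeichnung>
    <Anlagentyp>1</Anlagentyp>
  </InspizierteAbwassertechnischeAnlage>
  <InspizierteAbwassertechnischeAnlage>
    <Objektbezeichnung>10502003</Objektbezeichnung>
    <Anlagentyp>2</Anlagentyp>
  </InspizierteAbwassertechnischeAnlage>
</Anlagen>
"""

root = ET.fromstring(XML)

# The predicate inspects the child element; the parent element is what gets selected.
matches = root.findall(
    ".//InspizierteAbwassertechnischeAnlage[Objektbezeichnung='10502002']"
)
anlagentypen = [m.findtext("Anlagentyp") for m in matches]
print(len(matches), anlagentypen)
```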

Retrieve an xpath text contains using text()

I've been hacking away at this one for hours and I just can't figure it out. Using XPath to find text values is tricky and this problem has too many moving parts.
I have a webpage with a large table, and a section in this table contains a list of users (assignees) that are assigned to a particular unit. There are nearly always multiple users assigned to a unit, and I need to make sure a particular user is assigned to any of the units in the table. I've used XPath for nearly all of my selectors and I'm halfway there on this one. I just can't seem to figure out how to use contains with text() in this context.
Here's what I have so far:
//td[@id='unit']/span [text()='asdfasdfasdfasdfasdf (Primary); asdfasdfasdfasdfasdf, asdfasdfasdfasdf; 456, 3456'; testuser]
The XPath Query above captures all text in the particular section I am looking at, which is great. However, I only need to know if testuser is in that section.
text() gets you a set of text nodes. I tend to use it more in a context of //span//text() or something.
If you are trying to check if the text inside an element contains something you should use contains on the element rather than the result of text() like this:
span[contains(., 'testuser')]
XPath is pretty good with context. If you know exactly what text a node should have you can do:
span[.='full text in this span']
But if you want to do something like regular expressions (using exslt for example) you'll need to use the string() function:
span[regexp:test(string(.), 'testuser')]
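To illustrate why contains(., 'testuser') is applied to the element rather than to text(), here is a rough Python sketch. It makes a few assumptions: the standard library's xml.etree.ElementTree does not implement contains(), so the check is emulated by joining all descendant text of each span (which is what the string value of . amounts to), and both the table markup and the spans_containing helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the second span has 'testuser' inside a nested <b>,
# so no single direct text node of the span holds the full string.
TABLE = """
<table>
  <tr><td id="unit"><span>alice (Primary); bob, carol</span></td></tr>
  <tr><td id="unit"><span>dave (Primary); <b>testuser</b>, erin</span></td></tr>
</table>
"""

def spans_containing(root, needle):
    """Emulate span[contains(., needle)]: test the span's full string value."""
    return [s for s in root.iter("span") if needle in "".join(s.itertext())]

root = ET.fromstring(TABLE)
hits = spans_containing(root, "testuser")
full_text = "".join(hits[0].itertext())
print(len(hits), full_text)
```

Because the whole string value is tested, the match still succeeds when the user name is wrapped in a child element, which a comparison against an individual text() node would miss.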

Resources