XPath: Get certain values

Take an XML file like this:
<games>
<game>
<place>xxx</place>
<date>2013-10-02</date>
</game>
<game>
<place>yyy </place>
<date>2013-10-03</date>
</game>
<game>
<place>zzz</place>
<date>2013-10-03</date>
</game>
<game>
<place>aaa</place>
<date>2013-10-03</date>
<status>1</status>
</game>
<game>
<place>bbb</place>
<date>2013-10-03</date>
<status>9</status>
</game>
</games>
Now I need to know not only which "game" elements have a child tag named "status", but also what value each of those tags holds (in this example: 1 and 9).
//game/status
only leads me to all the "status" nodes, but I can't figure out how to fetch just the value of each one.
Can anybody help?
Thanks

The expression below will give you the result:
/games/game[status]/status/text()
/games/game[status] selects every game node that has a status child. /games/game[status]/status then selects only the status nodes of those game nodes. Finally, the trailing text() step in the full expression extracts the text values inside each status node.
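To try the same selection from a script, here is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module and a shortened copy of the sample data. ElementTree supports only a subset of XPath and has no text() step, so the sketch selects the status elements and reads their .text instead. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened sample data: only two <game> elements carry a <status>.
XML = """
<games>
  <game><place>xxx</place><date>2013-10-02</date></game>
  <game><place>aaa</place><date>2013-10-03</date><status>1</status></game>
  <game><place>bbb</place><date>2013-10-03</date><status>9</status></game>
</games>
""".strip()

root = ET.fromstring(XML)  # root is the <games> element

# Select the <status> children of games that have one, then read their text.
status_values = [s.text for s in root.findall("./game[status]/status")]

# Pair each status with its place to see which game it belongs to.
games_with_status = [
    (g.findtext("place"), g.findtext("status"))
    for g in root.findall("./game[status]")
]

print(status_values)
print(games_with_status)
```

With a full XPath 1.0 engine, the original expression including text() can be used as written.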

Related

XPath based on node indexes only

I have an XML document:
<Section>
<Paragraph>
<Text>t1</Text>
<Text>t2</Text>
</Paragraph>
<Paragraph>
<Text>t3</Text>
<Text>t4</Text>
</Paragraph>
</Section>
and I know only the element indexes, e.g. /0/1/0, i.e. the first Section, its second Paragraph, and that Paragraph's first Text. How can I translate '0/1/0' into a valid XPath that returns the element containing t3?
Note that I don't know the element names, because they can differ; I only know the sequence of indexes, as in the example above.
Many thanks
For the example given, this will work (element() is an XPath 2.0 kind test; in XPath 1.0, use * in its place):
/element()[1]/element()[2]/element()[1]/text()
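The translation from zero-based indexes to one-based XPath positions can be sketched in Python as follows. This is only an illustration: the helper names indexes_to_xpath and select_by_indexes are made up here, the generated path uses * so that it stays valid in XPath 1.0, and the tree walk assumes the standard-library xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def indexes_to_xpath(index_path: str) -> str:
    """Turn a zero-based index path such as '0/1/0' into '/*[1]/*[2]/*[1]'."""
    parts = [p for p in index_path.split("/") if p]  # tolerate a leading '/'
    # XPath positions are one-based, so add 1 to every index.
    return "".join(f"/*[{int(p) + 1}]" for p in parts)


def select_by_indexes(root: ET.Element, index_path: str) -> ET.Element:
    """Walk the tree by child position; the first index addresses the root."""
    first, *rest = [int(p) for p in index_path.split("/") if p]
    if first != 0:
        raise IndexError("a document has exactly one root element")
    node = root
    for i in rest:
        node = node[i]  # Element supports zero-based child indexing
    return node


XML = (
    "<Section>"
    "<Paragraph><Text>t1</Text><Text>t2</Text></Paragraph>"
    "<Paragraph><Text>t3</Text><Text>t4</Text></Paragraph>"
    "</Section>"
)
root = ET.fromstring(XML)

print(indexes_to_xpath("0/1/0"))
print(select_by_indexes(root, "0/1/0").text)
```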

How to get parent element with attribute using xpath

I have posted sample XML and the expected output below. Kindly help me get this result.
Sample XML
<root>
<A id="1">
<B id="2"/>
<C id="2"/>
</A>
</root>
Expected output:
<A id="1"/>
You can formulate this query in several ways:
Find elements that have a matching attribute, only descending all the time:
//*[@id=1]
Find the attribute, then ascend a step:
//@id[.=1]/..
Use the fn:id($id) function, given the document is validated and the ID-attribute is defined as such:
/id('1')
I don't think what you're after is possible. There's no way to select a node without its children using XPath (meaning it would always return the nodes B and C along with A in your case).
You could achieve this using XQuery. I'm not sure if this is what you want, but here's an example that creates a new node based on an existing node stored in the $doc variable.
declare variable $doc := <root><A id="1"><B id="2"/><C id="2"/></A></root>;
element {fn:node-name($doc/*)} {$doc/*/@*}
The above returns <A id="1"></A>.
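The same idea, copying only the tag and attributes of the matched element, can be sketched outside XQuery as well. The following assumes Python's standard-library xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<root><A id="1"><B id="2"/><C id="2"/></A></root>')

# Locate the element by attribute value; its children still come along.
a = root.find(".//*[@id='1']")

# Build a childless copy: same tag and attributes, nothing else.
a_only = ET.Element(a.tag, dict(a.attrib))

print(ET.tostring(a_only, encoding="unicode"))
```

The selected element a still has B and C as children; only the newly built a_only is empty.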
Is that what you are looking for?
//*[@id='1']/parent::* , similar to //*[@id='1']/..
If you want to verify that the parent is root:
//*[@id='1']/parent::root
https://en.wikipedia.org/wiki/XPath
If you need not just the parent but a higher-level element with some attribute, read about axis specifiers and use the "ancestor::" axis.

How to find the parent node by matching text using XPath

I have some XML:
<sys>
<lang>
<employee>
<name>Employee 1</name>
<code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
</employee>
<employee>
<name>Employee 2</name>
<code>1d960bdc-0853-49af-bb83-18cf92493897</code>
</employee>
</lang>
</sys>
How can I search and get the employee node where name ="Employee 1"?
I tried this but it didn't work:
obj.xpath("//sys/lang[/employee/name = 'Employee 1']")
This XPath
/sys/lang/employee[name = 'Employee 1']
will select the employee element whose name is Employee 1.
Why might OP be getting an "Invalid expression" using the above XPath?
Transcription error.
Resolution: Use copy and paste.
Single quotes around single quotes.
Resolution: Use outer double quotes: "/sys/lang/employee[name = 'Employee 1']"
Smart quotes.
Resolution: Replace ‘ and ’ with single quote '.
Misinterpretation of error message.
Resolution: Carefully check any line number mentioned in error, or carve away surrounding code as much as possible, and see if error goes away.
If none of the above possibilities apply, post an MCVE (Minimal, Complete, and Verifiable Example; "complete" here means it includes both the provided XPath and the calling code) that produces the invalid expression error, and someone will likely spot the problem immediately.
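For comparison, selecting an element by the text of one of its children can be sketched in Python rather than Ruby, assuming the standard-library xml.etree.ElementTree module and a well-formed copy of the sample XML. Because the parsed root is already the sys element, the path is written relative to it.

```python
import xml.etree.ElementTree as ET

XML = """
<sys>
  <lang>
    <employee>
      <name>Employee 1</name>
      <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
    </employee>
    <employee>
      <name>Employee 2</name>
      <code>1d960bdc-0853-49af-bb83-18cf92493897</code>
    </employee>
  </lang>
</sys>
""".strip()

root = ET.fromstring(XML)  # root is the <sys> element

# Mirrors /sys/lang/employee[name = 'Employee 1'], relative to <sys>.
# Note the outer double quotes around the inner single quotes.
employee = root.find("./lang/employee[name='Employee 1']")

print(employee.findtext("code"))
```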
I'm a big fan of using CSS over XPath for readability reasons. Nokogiri implements a number of jQuery's extensions to make it easier to use CSS for things we'd usually use XPath for.
I'd do it this way:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<sys>
<lang>
<employee>
<name>Employee 1</name>
<code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
</employee>
<employee>
<name>Employee 2</name>
<code>1d960bdc-0853-49af-bb83-18cf92493897</code>
</employee>
</lang>
</sys>
EOT
emp1 = doc.at('employee name:contains("Employee 1")') # => #<Nokogiri::XML::Element:0x3ffed05285b4 name="name" children=[#<Nokogiri::XML::Text:0x3ffed05283d4 "Employee 1">]>
emp1.to_xml # => "<name>Employee 1</name>"
emp1.parent.to_xml # => "<employee>\n <name>Employee 1</name>\n <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>\n </employee>"
Also note, it's not good practice to define the full path in the selector for a node. If the structure of the HTML or XML changes, that selector will break. Instead, find useful landmarks and hop from one to the next. That way your selector is more likely to survive changes in the markup. I only care about finding the appropriate <employee>...<name> combination, not those two tags embedded under <sys> and <lang>.
Sometimes an alternate way of getting to the information you want is to use search and look at a particular index:
doc.search('employee').first.to_xml # => "<employee>\n <name>Employee 1</name>\n <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>\n </employee>"
Or:
doc.at('employee').to_xml # => "<employee>\n <name>Employee 1</name>\n <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>\n </employee>"
at('some selector') is equivalent to search('some selector').first.

XPath Expression referencing a node

I am trying to reference a node in an expression. Take this simple example:
<?xml version="1.0" encoding="UTF-8" ?>
<homelist>
<homes>
<home>
<hname>house</hname>
<location>hell</location>
<url>wee</url>
<cID>1234</cID>
</home>
</homes>
<contacts>
<contactdetails cID="1234">
<cname>John Smith</cname>
<phone>0123234</phone>
<email>test@gmail.com</email>
</contactdetails>
</contacts>
</homelist>
I basically want to select nodes if their value appears somewhere else in the tree.
For example, I want to display the url of homes that have the cID of John Smith. I tried this, but it doesn't work. What is wrong with it?
homelist/homes/home[ancestor::homelist/contacts/contactdetails[cname="John Smith"]/url
"/homelist/homes/home[cID = /homelist/contacts/contactdetails[cname='John Smith']/@cID]/url"
You want to find the <home> whose <cID> child's text content equals the cID attribute of the <contactdetails> whose <cname> is 'John Smith', then return its <url> child.
Note that I've written this as an absolute path, from the root, since you didn't tell us what the context node was going to be for this XPath.
There are certainly other ways of writing the same concept; this is just the first one that occurred to me offhand.
If you prefer to use ancestor or parent, you could say
"/homelist/homes/home[cID = ancestor::homelist/contacts/contactdetails[cname='John Smith']/@cID]/url"

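The lookup described in this thread (find the contact's cID attribute, then the homes whose cID child matches it) can also be sketched as two explicit steps. This assumes Python's standard-library xml.etree.ElementTree module, whose XPath subset cannot compare against a nested path inside a predicate; the function name urls_for_contact is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """
<homelist>
  <homes>
    <home>
      <hname>house</hname>
      <location>hell</location>
      <url>wee</url>
      <cID>1234</cID>
    </home>
  </homes>
  <contacts>
    <contactdetails cID="1234">
      <cname>John Smith</cname>
      <phone>0123234</phone>
      <email>test@gmail.com</email>
    </contactdetails>
  </contacts>
</homelist>
""".strip()


def urls_for_contact(root, contact_name):
    """Return the <url> text of every home whose cID matches the contact's."""
    urls = []
    for contact in root.findall("./contacts/contactdetails"):
        if contact.findtext("cname") != contact_name:
            continue
        cid = contact.get("cID")  # attribute on <contactdetails>
        for home in root.findall("./homes/home"):
            if home.findtext("cID") == cid:  # child element of <home>
                urls.append(home.findtext("url"))
    return urls


root = ET.fromstring(XML)  # root is the <homelist> element
print(urls_for_contact(root, "John Smith"))
```
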
XPath / XQuery: find text in a node, but ignoring content of specific descendant elements

I am trying to find a way to search for a string within nodes, but excluding the content of some subelements of those nodes. Plain and simple, I want to search for a string in paragraphs of a text, excluding the footnotes which are children elements of the paragraphs.
For example, given this document:
<document>
<p n="1">My text starts here/</p>
<p n="2">Then it goes on there<footnote>It's not a very long text!</footnote></p>
</document>
When I'm searching for "text", I would like the XPath / XQuery to retrieve the first p element, but not the second one (where "text" is contained only in the footnote subelement).
I have tried the contains() function, but it retrieves both p elements.
Any help would be much appreciated :)
"I want to search for a string in paragraphs of a text, excluding the footnotes which are children elements of the paragraphs"
An XPath 1.0 - only solution:
Use:
//p//text()[not(ancestor::footnote) and contains(.,'text')]
Against the following XML document (derived from yours, with a p element added inside a footnote to make this more interesting):
<document>
<p n="1">My text starts here/</p>
<p n="2">Then it goes on there
<footnote>It's not a very long text!
<p>text</p>
</footnote>
</p>
</document>
this XPath expression selects exactly the wanted text node:
My text starts here/
//p[(.//text() except .//footnote//text())[contains(., 'text')]]
/document/p[text()[contains(., 'text')]] should do.
For the record, as a complement to the other answers, I've found this workaround that also seems to do the job:
//p[contains(child::text()|not(descendant::footnote), "text")]
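Where only a limited XPath engine is available, the same filtering can be approximated in code. The sketch below assumes Python's standard-library xml.etree.ElementTree module and a hypothetical helper named text_outside. Unlike the per-text-node XPath tests above, it concatenates the paragraph's text outside footnotes before searching, so a match could span adjacent text nodes.

```python
import xml.etree.ElementTree as ET

XML = """
<document>
<p n="1">My text starts here/</p>
<p n="2">Then it goes on there<footnote>It's not a very long text!</footnote></p>
</document>
""".strip()


def text_outside(elem, excluded_tag):
    """Concatenate elem's text, skipping descendants named excluded_tag."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != excluded_tag:
            parts.append(text_outside(child, excluded_tag))
        # Text after a child's closing tag belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(XML)  # root is the <document> element

# Keep the n attribute of each paragraph whose non-footnote text has "text".
matches = [
    p.get("n")
    for p in root.findall("./p")
    if "text" in text_outside(p, "footnote")
]
print(matches)
```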

Resources