I need to get a node based on another node that is at the same level :(
Here is an example:
<A>
<B id="2">
<F1>aaa</F1>
<F2>bbb</F2>
<F3>ccc</F3>
</B>
<B>
....
</B>
<B>
....
</B>
<D>
<F1>aaa</F1>
<F2>bbb</F2>
<F3>ccc</F3>
<F4>yyy</F4>
</D>
<D>
...
</D>
</A>
So I need to get A/D/F4 where A/D/F1=A/B[@id=2]/F1 and A/D/F2=A/B[@id=2]/F2 and A/D/F3=A/B[@id=2]/F3
I only have the id to run the query with, and I get node B as a starting point (returned by other internal code). I was thinking about a way to get a back-reference to a node in the path, so that I could use something like:
/A/B[@id=2]/../D[F1=Bref/F1 and F2=Bref/F2 and F3=Bref/F3]/F4
I'm trying not to build a join manually and to use only XPath capabilities if possible.
Any ideas?
I'm not sure I totally understand what you mean about backreferences, but do you want something like
/A/D[F1 = ../B[@id='2']/F1 and
F2 = ../B[@id='2']/F2 and
F3 = ../B[@id='2']/F3]/F4
?
In XPath 1.0, there is no way to create a variable and then dereference it within the same expression. In XPath 2.0, you could use
for $b in A/B[@id = '2']
return A/D[F1 = $b/F1 and F2 = $b/F2 and F3 = $b/F3]/F4
but that won't help you, since your question is about XPath 1.0.
Update: per your comment that you're "on node B", I see two possibilities.
One possibility is that the context node is the B element whose @id is the id you've been given, and you want to use the context node rather than looking up B by its @id again. In that case, you might be able to use current():
/A/D[F1 = current()/F1 and
F2 = current()/F2 and
F3 = current()/F3]/F4
current() means the context node as it was outside the entire XPath expression. Note that current() is defined by XSLT rather than by XPath 1.0 itself, so it is only available if your environment provides it.
The other possibility is that the context node is some B element but it may not be the one whose @id you've been given, or you don't care whether it is or not. In that case, just use the first XPath expression I gave above. I added / on the front so that it doesn't matter where in the document the context node was originally.
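As a quick way to try the first expression, here is a minimal Ruby sketch. It uses REXML (bundled with Ruby) purely as an XPath 1.0 evaluator, which is an assumption about your environment. The sample data, including the second B/D pair, and the f4_for_b helper name are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical sample data: two B elements and two D elements, so the
# predicate has something to filter out.
JOIN_XML = <<~XML
  <A>
    <B id="2"><F1>aaa</F1><F2>bbb</F2><F3>ccc</F3></B>
    <B id="3"><F1>xxx</F1><F2>bbb</F2><F3>ccc</F3></B>
    <D><F1>aaa</F1><F2>bbb</F2><F3>ccc</F3><F4>yyy</F4></D>
    <D><F1>xxx</F1><F2>bbb</F2><F3>ccc</F3><F4>zzz</F4></D>
  </A>
XML

# Returns the text of every F4 whose sibling F1..F3 equal those of the B
# element with the given id. The id is interpolated into the expression,
# so only pass trusted values.
def f4_for_b(doc, id)
  b = "../B[@id='#{id}']"
  xpath = "/A/D[F1 = #{b}/F1 and F2 = #{b}/F2 and F3 = #{b}/F3]/F4"
  REXML::XPath.match(doc, xpath).map(&:text)
end

doc = REXML::Document.new(JOIN_XML)
p f4_for_b(doc, 2)
p f4_for_b(doc, 3)
```

The whole "join" stays inside one XPath 1.0 expression; Ruby only builds the string. With this sample data, each call should pick out the F4 of the one D whose F1 to F3 agree with the chosen B.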
Related
I have posted sample XML and the expected output below; kindly help me get this result.
Sample XML
<root>
<A id="1">
<B id="2"/>
<C id="2"/>
</A>
</root>
Expected output:
<A id="1"/>
You can formulate this query in several ways:
Find elements that have a matching attribute, only ascending all the time:
//*[@id=1]
Find the attribute, then ascend a step:
//@id[.=1]/..
Use the fn:id($id) function, given the document is validated and the ID-attribute is defined as such:
/id('1')
I don't think what you're after is possible. There's no way to select a node without its children using XPath (meaning that it would always return the B and C nodes as well in your case).
You could achieve this using XQuery. I'm not sure if this is what you want, but here's an example that creates a new node based on an existing node stored in the $doc variable.
declare variable $doc := <root><A id="1"><B id="2"/><C id="2"/></A></root>;
element {fn:node-name($doc/*)} {$doc/*/@*}
The above returns <A id="1"></A>.
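If XQuery is not available, the same "copy the element without its children" idea can be sketched in Ruby. This is only an illustration using REXML (bundled with Ruby); shallow_copy is a hypothetical helper name, not a library function.

```ruby
require 'rexml/document'

SAMPLE_XML = '<root><A id="1"><B id="2"/><C id="2"/></A></root>'

# Builds a new element with the same name and attributes as +node+,
# but none of its children (hypothetical helper, not a REXML built-in).
def shallow_copy(node)
  copy = REXML::Element.new(node.name)
  node.attributes.each { |name, value| copy.add_attribute(name, value) }
  copy
end

doc = REXML::Document.new(SAMPLE_XML)
selected = REXML::XPath.first(doc, "//*[@id='1']") # A, still with B and C inside
puts selected.elements.size       # child elements on the node XPath selected
puts shallow_copy(selected).to_s  # serialised copy without children
```

XPath only selects; the selected A still carries B and C. Dropping the children has to happen in the host language, which is what the copy step does.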
Is this what you are looking for?
//*[@id='1']/parent::* , similar to //*[@id='1']/..
If you want to verify that the parent is root:
//*[@id='1']/parent::root
https://en.wikipedia.org/wiki/XPath
If you need not just the parent but an enclosing element with some attribute, read about axis specifiers and use the "ancestor::" axis =)
Given this XML, what XPath returns all elements whose prop attribute contains Foo (the first three nodes):
<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
</bla>
//a[contains(@prop,'Foo')]
This works for me if I use the following XML to get results back:
<bla>
<a prop="Foo1">a</a>
<a prop="Foo2">b</a>
<a prop="3Foo">c</a>
<a prop="Bar">a</a>
</bla>
Edit:
Another thing to note is that while the XPath above will return the correct answer for that particular xml, if you want to guarantee you only get the "a" elements in element "bla", you should as others have mentioned also use
/bla/a[contains(@prop,'Foo')]
This one searches all "a" elements in your entire XML document, regardless of whether they are nested in a "bla" element:
//a[contains(@prop,'Foo')]
I added this for the sake of thoroughness and in the spirit of stackoverflow. :)
This XPath will give you all nodes that have attributes containing 'Foo' regardless of node name or attribute name:
//attribute::*[contains(., 'Foo')]/..
Of course, if you're more interested in the attributes themselves, and not necessarily their parent nodes, just drop the /..
//attribute::*[contains(., 'Foo')]
descendant-or-self::*[contains(@prop,'Foo')]
Or:
/bla/a[contains(@prop,'Foo')]
Or:
/bla/a[position() <= 3]
Dissected:
descendant-or-self::
The Axis - search through every node underneath and the node itself. It is often better to say this than //. I have encountered some implementations where // means anywhere (descendant-or-self of the root node). Others use the default axis.
* or /bla/a
The Tag - a wildcard match, and /bla/a is an absolute path.
[contains(@prop,'Foo')] or [position() <= 3]
The condition within [ ]. @prop is shorthand for attribute::prop, as attribute is another search axis. Alternatively you can select the first 3 by using the position() function.
Have you tried something like:
//a[contains(@prop, "Foo")]
I've never used the contains function before but suspect that it should work as advertised...
John C is the closest, but XPath is case sensitive, so the correct XPath would be:
/bla/a[contains(@prop, 'Foo')]
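To see the case sensitivity in practice, here is a small Ruby sketch. It assumes REXML (bundled with Ruby) as the XPath engine; props_containing is a hypothetical helper name introduced only for this example.

```ruby
require 'rexml/document'

BLA_XML = <<~XML
  <bla>
    <a prop="Foo1"/>
    <a prop="Foo2"/>
    <a prop="3Foo"/>
    <a prop="Bar"/>
  </bla>
XML

# Returns the prop values of the /bla/a elements whose prop contains +needle+.
# The needle is interpolated into the expression, so it must not contain quotes.
def props_containing(doc, needle)
  matches = REXML::XPath.match(doc, "/bla/a[contains(@prop, '#{needle}')]")
  matches.map { |el| el.attributes['prop'] }
end

doc = REXML::Document.new(BLA_XML)
p props_containing(doc, 'Foo') # => ["Foo1", "Foo2", "3Foo"]
p props_containing(doc, 'foo') # => []
```

contains() compares characters exactly, so the lowercase 'foo' matches none of the sample attributes.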
If you also need to match the content of the link itself, use text():
//a[contains(@href,"/some_link")][text()="Click here"]
/bla/a[contains(@prop, "foo")]
try this:
//a[contains(@prop,'foo')]
that should work for any "a" tags in the document
For the code above...
//*[contains(@prop,'foo')]
I have to do some exercises, but
I don't really understand the difference between two similar paths.
I have this tree:
<b>
<t></t>
<a>
<n></n>
<p></p>
<p></p>
</a>
<a>
<n></n>
<p></p>
</a>
<a></a>
</b>
We assume that each leaf tag contains one text node.
I have to explain the difference between //a//text() and //a/text().
I see that //a//text() returns all text nodes, which seems legitimate,
but why does //a/text() return only the text node of the last "a" node?
Another question:
why does //p[1] return the first "p" child node for each "a" node?
-> I get two results:
<b>
<t></t>
<a>
<n></n>
**<p></p>**
<p></p>
</a>
<a>
<n></n>
**<p></p>**
</a>
<a></a>
</b>
Why isn't the answer the first "p" node of the whole document?
Thanks for everything!
Difference between 1: //a//text() and 2: //a/text()
Let's break it down: //a selects all a elements, no matter where they are in the document. /a, by contrast, would select only a root element named a.
If / comes after another step in an XPath expression, it selects nodes that are direct children of the nodes selected by the preceding step.
If // comes after another step in an XPath expression, it selects all nodes that are descendants of the nodes selected by the preceding step, no matter how deep they are.
Applying this to your two XPath expressions:
//a//text(): Select all a elements no matter where they are in the document, and for those elements select every text node at any depth beneath them.
//a/text(): Select all a elements no matter where they are in the document, and for those elements select only the text nodes that are their direct children. Only the last a element in your tree has a text node of its own, which matches what you observed.
Why //p[1] returns for each "a node", the first "p" child node?
Suppose you were to write //a/p[1], this would select the first p child element of any a element anywhere in the document. By writing //p[1] you are omitting an explicit parent element, but the predicate still selects the first child element of any parent the p element has.
In this case there are two parent a elements, for which the first p child element is selected.
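Both behaviours can be tried out with a short Ruby sketch. It assumes REXML (bundled with Ruby) as the XPath engine; the leaf texts and the values helper are made up for illustration, and the tree is written without whitespace so that no extra whitespace-only text nodes appear.

```ruby
require 'rexml/document'

# The tree from the question, with made-up text in each leaf element.
TREE_XML = '<b><t>t0</t>' \
           '<a><n>n1</n><p>p1</p><p>p2</p></a>' \
           '<a><n>n2</n><p>p3</p></a>' \
           '<a>a3</a></b>'

# Evaluates +xpath+ and returns the text of each matched node
# (hypothetical helper; handles both element and text-node results).
def values(doc, xpath)
  REXML::XPath.match(doc, xpath).map do |node|
    node.is_a?(REXML::Element) ? node.text : node.value
  end
end

doc = REXML::Document.new(TREE_XML)
p values(doc, '//a/text()')           # text nodes that are children of an a
p values(doc, '//a//text()')          # text nodes anywhere below an a
p values(doc, '//p[1]')               # first p child of each parent
p REXML::XPath.first(doc, '//p').text # first p in the whole document
```

The last line takes the first match in Ruby. In XPath itself, the usual way to ask for the first p of the whole document is to parenthesise the path before applying the predicate: (//p)[1].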
It would be good to search for a good introduction to XPath on your favorite search engine. I've always found this one from w3schools.com to be a good one.
I'm using Ruby, XPath and Nokogiri and trying to retrieve d1 from the following XML:
<a>
<b1>
<c>
<d1>01/11/2001</d1>
<d2>02/02/2004</d2>
</c>
</b1>
</a>
This is my code in a loop:
rs = doc.xpath("//a/b1/c/d1").inner_text
puts rs
It returns nothing (No error).
I want to get the text in <d1>.
You don't ask for the text content in your xpath query:
rs = doc.xpath('//a/b1/c/d1/text()')
You're misusing XPath:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<a>
<b1>
<c>
<d1>01/11/2001</d1>
<d2>02/02/2004</d2>
</c>
</b1>
</a>
EOT
doc.at('/a/b1/c/d1').text # => "01/11/2001"
doc.at('//d1').text # => "01/11/2001"
// in XPath-ese means start at the top and look anywhere in your document. Instead, if you're supplying an explicit/absolute selector, start at the top of the document and drill down using '/a/b1/c/d1'. Or, do the simple thing and let the parser search through the document for that particular node using //d1. You can do that if you know there's a single instance of that node.
In my code above, I used at instead of xpath. at returns the first matching node, which is similar to using xpath('//d1').first. xpath returns a NodeSet, which is like an array of nodes, whereas at returns a Node only. Using inner_text on a NodeSet is likely to not give you the results you want, which would be the text of a particular node, so be careful there.
doc.xpath('/a/b1/c/d1/text()').class # => Nokogiri::XML::NodeSet
doc.xpath('//c').inner_text # => "\n 01/11/2001\n 02/02/2004\n "
doc.xpath('/a/b1/c/d1').first.text # => "01/11/2001"
Look at the following lines. Instead of using XPath selectors, I used CSS, which tends to be more readable. Nokogiri supports both.
doc.at('d1').text # => "01/11/2001"
doc.at('a b1 c d1').text # => "01/11/2001"
Also, notice the type of data returned from these two lines:
doc.at('/a/b1/c/d1/text()').class # => Nokogiri::XML::Text
doc.at('/a/b1/c/d1').text.class # => String
While it might seem good/smart to tell the parser to locate the text() node inside <d1>, what will be returned isn't text, and will need to be accessed further to make it usable, so consider forgoing the use of text() unless you know exactly why you need it:
doc.at('/a/b1/c/d1/text()').text # => "01/11/2001"
Finally, Nokogiri has many methods used for locating nodes. As I said above, xpath returns a NodeSet and at returns a Node. xpath is really an XPath-specific version of Nokogiri's search method. search, css and xpath all return NodeSets. at, at_css and at_xpath all return Nodes. The CSS and XPath variants are useful when you have an ambiguous selector that you need to be used as CSS or XPath specifically. Most of the time Nokogiri can figure out whether it's CSS or XPath on its own and will do the right thing, so it's OK to use the generic search and at for the majority of your coding. Use the specific versions when you have to specify one or the other.
I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name.
e.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below.
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML
<body>
<h1 blah="afadf">Three's Company</h1>
<div>A love triangle.</div>
<b blah="adfadf">test test test</b>
</body>
EOHTML
I found this suggestion (below) at this website: http://snippets.dzone.com/posts/show/7994, but it doesn't return the 2 nodes in the example above. It returns an empty array.
# get elements with attribute:
elements = @doc.xpath("//*[@*[blah]]")
Thoughts on how to do this?
Thanks!
I found this here
elements = @doc.xpath("//*[@*[blah]]")
This is not a useful XPath expression. It says to give you all elements that have attributes that have child elements named 'blah'. And since attributes can't have child elements, this XPath will never return anything.
The DZone snippet is confusing in that when they say
elements = @doc.xpath("//*[@*[attribute_name]]")
the inner square brackets are not literal... they're there to indicate that you put in the attribute name. Whereas the outer square brackets are literal. :-p
They also have an extra * in there, after the @.
What you want is
elements = @doc.xpath("//*[@blah]")
This will give you all the elements that have an attribute named 'blah'.
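To check the expression on its own, here is a minimal sketch. It swaps Nokogiri for REXML (bundled with Ruby) and parses the fragment as XML rather than HTML, so it only demonstrates the XPath, not Nokogiri's fragment handling. names_with_attribute is a hypothetical helper name.

```ruby
require 'rexml/document'

# The fragment from the question, parsed as XML rather than HTML.
BODY_XML = <<~XML
  <body>
    <h1 blah="afadf">Three's Company</h1>
    <div>A love triangle.</div>
    <b blah="adfadf">test test test</b>
  </body>
XML

# Returns the names of all elements that carry an attribute called +attr+.
def names_with_attribute(doc, attr)
  REXML::XPath.match(doc, "//*[@#{attr}]").map(&:name)
end

doc = REXML::Document.new(BODY_XML)
p names_with_attribute(doc, 'blah') # => ["h1", "b"]
p names_with_attribute(doc, 'nope') # => []
```

The predicate [@blah] is true for any element that has the attribute at all, whatever its value.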
You can use CSS selectors:
elements = @doc.css "[blah]"