xpath get first element based on multi-level condition


I have the following xml.
<root>
<h>
<seg>
<hfield1>hA</hfield1>
<hfield2>h1</hfield2>
</seg>
<seg>
<hfield1>hB</hfield1>
<hfield2>h2</hfield2>
</seg>
</h>
<i>
<iseg>
<ifield1>i1</ifield1>
</iseg>
<iseg>
<ifield1>i2</ifield1>
</iseg>
</i>
<i>
<iseg>
<ifield1>i3</ifield1>
</iseg>
<iseg>
<ifield1>i4</ifield1>
</iseg>
</i>
</root>
I need to extract the value of hfield1 if its hfield2 = 'h2' and if at least one ifield1 = 'i2'.
I'm trying XPath 1.0 with this expression. I expected 'hB' as the result, but it's not working.
//seg/hfield1/text()[..//hfield2/text() = 'h2' and //ifield1 = 'i2'][1]
How can I do this?
BR

Try this XPath-1.0 expression:
//seg/hfield1[../hfield2 = 'h2' and //ifield1 = 'i2']

In addition to zx485's solution, you can also do it with the following XPath 1.0 expression:
//seg/hfield2[text() = 'h2' and //ifield1 = 'i2']/preceding-sibling::hfield1

If your XML tree gets bigger, I suggest using a more explicit XPath, e.g.:
/root[i/iseg/ifield1='i2']/h/seg[hfield2='h2']/hfield1/text()
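As a rough way to check the logic of the explicit expression above, here is a minimal Python sketch using only the standard library. ElementTree implements only a subset of XPath 1.0 (no path expressions or `and` inside predicates), so the two conditions are tested separately; a full XPath engine could evaluate the single expression directly. The closing `</root>` tag and the variable names are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, with an assumed closing </root>.
XML = """<root>
<h>
<seg><hfield1>hA</hfield1><hfield2>h1</hfield2></seg>
<seg><hfield1>hB</hfield1><hfield2>h2</hfield2></seg>
</h>
<i>
<iseg><ifield1>i1</ifield1></iseg>
<iseg><ifield1>i2</ifield1></iseg>
</i>
<i>
<iseg><ifield1>i3</ifield1></iseg>
<iseg><ifield1>i4</ifield1></iseg>
</i>
</root>"""

root = ET.fromstring(XML)

# Condition 1: at least one ifield1 under root/i/iseg equals 'i2'.
has_i2 = root.find("./i/iseg/ifield1[.='i2']") is not None

# Condition 2: pick the <seg> whose hfield2 is 'h2' and read its hfield1.
result = root.findtext("./h/seg[hfield2='h2']/hfield1") if has_i2 else None

print(result)
```

Note that the `[.='text']` predicate form needs Python 3.7 or later.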

Related

Xpath: return all nodes that match any one of the conditions

I am trying to fetch two nodes from XML as combined result using OR condition.
Both nodes in the XML where name="John" or name="Jim" should be returned. So basically I expect the following result:
<person name="John"></person>
<person name="Jim"></person>
I have tried the XPath expression //person[@name="John"] or //person[@name="Jim"]
but it gives me only one node.
How should I construct the XPath expression in this case?
regards,
Venky

I would use the predicate person[@name = ('John', 'Jim')], assuming Saxon means a Saxon 9 version where XPath 2 or 3 is supported. Of course, the right place for your or expression would be inside the square brackets: person[@name = 'Jim' or @name = 'John'].
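To illustrate what "the condition belongs inside the predicate" means, here is a small Python sketch. The standard library's ElementTree has no `or` in its XPath subset, so the equivalent filter is written in plain Python; the sample document (including the `people` root and the third person) is hypothetical, since the question does not show the full XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the two expected nodes come from the question.
XML = """<people>
<person name="John"></person>
<person name="Jim"></person>
<person name="Ann"></person>
</people>"""

root = ET.fromstring(XML)

# Equivalent of //person[@name = 'John' or @name = 'Jim']:
# each <person> is tested individually against both names.
wanted = {"John", "Jim"}
matches = [p for p in root.iter("person") if p.get("name") in wanted]
names = [p.get("name") for p in matches]

print(names)
```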

Obtaining a partial value from XPath

I have the current HTML code:
<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
13.06.2019
</li>
<li>...</li>
<li>...</li>
</ul>
</div>
and here is my "wrong" XPath:
//div[@class='group']/ul/li[1]
and I would like to extract the date with XPath without the text in the strong tag, but I'm not sure how NOT is used in XPath or could it even be used in here?
Keep in mind that the date is dynamic.

Use substring-after() to get the date value:
substring-after(//div[@class='group']/ul/li[1],'Date')

The easiest way to get the date is by using the XPath-1.0 expression
//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1]
The result includes the surrounding whitespace.
If you want to get rid of that, too, use the following expression:
normalize-space(//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1])
Unfortunately, this only works for one result in XPath-1.0.
If you had XPath-2.0 available, you could append normalize-space() to the end of the expression, which also enables the processing of multiple results:
//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1]/normalize-space()
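For comparison, a minimal Python sketch of the same idea with the standard library: ElementTree cannot select text nodes via `text()`, but the text that follows a child element is exposed as that element's `tail`. This assumes the fragment is well-formed XML, as the sample in the question happens to be; real-world HTML would normally need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, parsed as XML (an assumption that holds for this sample).
HTML = """<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
13.06.2019
</li>
<li>...</li>
<li>...</li>
</ul>
</div>"""

root = ET.fromstring(HTML)  # root is the <div class="group"> element

# First <li> of the list, then the text that follows its <strong> child.
first_li = root.find("./ul/li[1]")
date = first_li.find("strong").tail.strip()

print(date)
```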

Here is a Python method that reads the text directly from the parent element; in your case the data is associated with ul/li.
Python:
def get_text_exclude_children(element):
    # Assumes `driver` is an existing Selenium WebDriver instance.
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        // Concatenate only the direct text nodes, skipping child elements.
        while (child) {
            if (child.nodeType === Node.TEXT_NODE)
                textValue += child.textContent;
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()
This is how to call it in your case:
# On Selenium 4.3+, use driver.find_element("xpath", ...) instead.
ulEle = driver.find_element_by_xpath("//div[@class='group']/ul/li[1]")
datePart = get_text_exclude_children(ulEle)
print(datePart)
Please feel free to convert this to the language you are using, if it's not Python.

xpath without specifying the tag? [duplicate]

Given this XML, what XPath returns all elements whose prop attribute contains Foo (the first three nodes):
<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
</bla>

//a[contains(@prop,'Foo')]
Works if I use this XML to get results back.
<bla>
<a prop="Foo1">a</a>
<a prop="Foo2">b</a>
<a prop="3Foo">c</a>
<a prop="Bar">a</a>
</bla>
Edit:
Another thing to note is that while the XPath above returns the correct answer for that particular XML, if you want to guarantee you only get the "a" elements in element "bla", you should, as others have mentioned, also use
/bla/a[contains(@prop,'Foo')]
whereas this searches all "a" elements in your entire XML document, regardless of whether they are nested in a "bla" element:
//a[contains(@prop,'Foo')]
I added this for the sake of thoroughness and in the spirit of stackoverflow. :)

This XPath will give you all nodes that have attributes containing 'Foo' regardless of node name or attribute name:
//attribute::*[contains(., 'Foo')]/..
Of course, if you're more interested in the contents of the attribute themselves, and not necessarily their parent node, just drop the /..
//attribute::*[contains(., 'Foo')]

descendant-or-self::*[contains(@prop,'Foo')]
Or:
/bla/a[contains(@prop,'Foo')]
Or:
/bla/a[position() <= 3]
Dissected:
descendant-or-self::
The axis: search through the context node itself and every node underneath it. It is often better to say this than //. I have encountered some implementations where // means anywhere (descendant or self of the root node). Others use the default axis.
* or /bla/a
The node test: * is a wildcard match, and /bla/a is an absolute path.
[contains(@prop,'Foo')] or [position() <= 3]
The condition within [ ]. @prop is shorthand for attribute::prop, as attribute is another search axis. Alternatively, you can select the first 3 by using the position() function.

Have you tried something like:
//a[contains(@prop, "Foo")]
I've never used the contains function before but suspect that it should work as advertised...

John C is the closest, but XPath is case sensitive, so the correct XPath would be:
/bla/a[contains(@prop, 'Foo')]

If you also need to match the content of the link itself, use text():
//a[contains(@href,"/some_link")][text()="Click here"]

/bla/a[contains(@prop, "foo")]

Try this:
//a[contains(@prop,'Foo')]
That should work for any "a" tags in the document.

For the code above...
//*[contains(@prop,'Foo')]
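The case-sensitivity point raised above is easy to check. Here is a minimal Python sketch using the standard library; ElementTree's XPath subset has no contains() function, so the substring test is done in Python, and the helper name `with_prop_containing` is made up for this example. A lowercase 'foo', as in one of the answers above, matches nothing in this document.

```python
import xml.etree.ElementTree as ET

XML = """<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
</bla>"""

root = ET.fromstring(XML)

def with_prop_containing(needle):
    """Rough equivalent of //*[contains(@prop, needle)]; returns the prop values."""
    return [e.get("prop") for e in root.iter() if needle in e.get("prop", "")]

upper = with_prop_containing("Foo")  # case-sensitive match on 'Foo'
lower = with_prop_containing("foo")  # lowercase needle, same comparison

print(upper)
print(lower)
```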

To get text after the tag, containing another text

For example:
<p>
<b>Member Since:</b> Aug. 07, 2010<br><b>Time Played:</b> <span class="text_tooltip" title="Actual Time: 15.09:37:06">16 days</span><br><b>Last Game:</b>
<span class="text_tooltip" title="07/16/2011 23:41">1 minute ago</span>
<br><b>Wins:</b> 1,017<br><b>Losses / Quits:</b> 883 / 247<br><b>Frags / Deaths:</b> 26,955 / 42,553<br><b>Hits / Shots:</b> 690,695 / 4,229,566<br><b>Accuracy:</b> 16%<br>
</p>
I want to get 1,017. It is a text after the tag, containing text Wins:.
If I used regex, it would be [/<b>Wins:<\/b> ([^<]+)/,1], but how to do it with Nokogiri and XPath?
Or would it be better to parse this part of the page with regex?

Here
doc = Nokogiri::HTML(html)
puts doc.at('b[text()="Wins:"]').next.text

You can use this XPath: //*[*/text() = 'Wins:']/text() It will return 1,017.
About regex: RegEx match open tags except XHTML self-contained tags

I would use pure XPath like:
"//b[.='Wins:']/following::node()[1]"
I've heard thousands of times (and from gurus) "never use regex to parse XML". Can you provide some "shocking" reference demonstrating that this sentence is not valid any more?

Use:
//*[. = 'Wins:']/following-sibling::node()[1]
In case this is ambiguous (selects more than one node), more strict expressions can be specified:
//*[. = 'Wins:']/following-sibling::node()[self::text()][1]
Or:
(//*[. = 'Wins:'])[1]/following-sibling::node()[1]
Or:
(//*[. = 'Wins:'])[1]/following-sibling::node()[self::text()][1]
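The "first following sibling node" idea can be sketched in Python with the standard library, where the text node after an element is available as its `tail`. This is only an illustration under assumptions: the markup is shortened from the question and its `<br>` tags are written as `<br/>` so that it parses as XML; the original page would need an HTML parser.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed variant of the markup in the question (an assumption).
HTML = """<p>
<b>Member Since:</b> Aug. 07, 2010<br/>
<b>Wins:</b> 1,017<br/>
<b>Losses / Quits:</b> 883 / 247<br/>
</p>"""

root = ET.fromstring(HTML)

# Find the <b> whose text is exactly 'Wins:' (predicate form needs Python 3.7+).
label = root.find("./b[.='Wins:']")

# The text node right after </b> is the element's tail.
wins = label.tail.strip()

print(wins)
```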

Combining the use of preceding and following sibling in the same xpath query

I have quite a simple problem, but I can't seem to resolve it. Let's say I have the following code:
<a>
<b property="p1">zyx</b>
<b>wvu</b>
<b>tsr</b>
<b property="p2">qpo</b>
<b>qcs</b>
</a>
I want to select the nodes between the b node that has property="p1" and the b node that has property="p2". I can do either of those with the preceding-sibling and the following-sibling axes, but I can't seem to find how to combine both.

XPath 1.0:
/a/b[preceding-sibling::b/@property='p1' and following-sibling::b/@property='p2']
XPath 2.0:
The expression above has some quirks in XSLT 2.0; it is better to use the new and safer operators << (before) and >> (after).
/a/b[. << ../b[@property='p2'] and . >> ../b[@property='p1']]

Also, this XPath 1.0:
/a/b[preceding-sibling::b/@property='p1'][following-sibling::b/@property='p2']
Note: Don't use // as the first step. Whenever you can replace an and operator with predicates, do it.
In XPath 2.0:
/a/b[. >> ../b[@property='p1']][../b[@property='p2'] >> .]

You can combine the tests in the predicate using and.
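To make the expected result concrete, here is a minimal Python sketch of the same selection. The standard library's ElementTree has no sibling axes, so the "after p1 and before p2" test is expressed as a slice over the children of `a`; it assumes each marker occurs exactly once, as in the sample.

```python
import xml.etree.ElementTree as ET

XML = """<a>
<b property="p1">zyx</b>
<b>wvu</b>
<b>tsr</b>
<b property="p2">qpo</b>
<b>qcs</b>
</a>"""

root = ET.fromstring(XML)
children = list(root)  # the <b> elements in document order

# Positions of the two marker elements (assumed to exist exactly once each).
start = next(i for i, e in enumerate(children) if e.get("property") == "p1")
end = next(i for i, e in enumerate(children) if e.get("property") == "p2")

# Everything strictly between the markers: preceded by p1 and followed by p2.
between = [e.text for e in children[start + 1:end]]

print(between)
```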
