Choosing specific element in XPath

I have two elements with the same name, "reason". When I use //*:reason/text() it returns both of them, but I need the first one (not the one inside "details"). Please help.
<xml xmlns:gob="http://osb.yes.co.il/GoblinAudit">
<fault>
<ctx:fault xmlns:ctx="http://www.bea.com/wli/sb/context">
<ctx:errorCode>BEA-382500</ctx:errorCode>
<ctx:reason>OSB Service Callout action received SOAP Fault response</ctx:reason>
<ctx:details>
<ns0:ReceivedFaultDetail xmlns:ns0="http://www.bea.com/wli/sb/stages/transform/config">
<ns0:faultcode xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">soapenv:Server</ns0:faultcode>
<ns0:faultstring>BEA-380001: Internal Server Error</ns0:faultstring>
<ns0:detail>
<con:fault xmlns:con="http://www.bea.com/wli/sb/context">
<con:errorCode>BEA-380001</con:errorCode>
<con:reason>Internal Server Error</con:reason>
<con:location>
<con:node>RouteTo_FinancialControllerBS</con:node>
<con:path>response-pipeline</con:path>
</con:location>
</con:fault>
</ns0:detail>
</ns0:ReceivedFaultDetail>
</ctx:details>
<ctx:location>
<ctx:node>PipelinePairNode2</ctx:node>
<ctx:pipeline>PipelinePairNode2_request</ctx:pipeline>
<ctx:stage>set maintain offer</ctx:stage>
<ctx:path>request-pipeline</ctx:path>
</ctx:location>
</ctx:fault>
</fault>
</xml>

You are using the // abbreviation, which descends into every subtree and finds all occurrences of reason. You can be more specific about the path:
//fault/*:fault/*:reason/text()
This matches only the outer reason, not the inner one.

"...but i need the first one"
You can use a position index to get the first matched reason element:
(//*:reason)[1]/text()
"(not the one inside "details")"
The above can be expressed as finding the reason element that has no details ancestor:
//*:reason[not(ancestor::*:details)]/text()
For a large XML document, a more specific path (that is, one that avoids // at the beginning) would result in a more efficient XPath:
/xml/fault/*:fault/*:reason/text()
But for a small XML, it's just a matter of personal preference, since the improvement is likely to be negligible.
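
As a rough illustration of the difference between searching every subtree and following a specific path, here is a minimal sketch using Python's standard xml.etree.ElementTree. It works on a trimmed copy of the document from the question. ElementTree supports only a subset of XPath (no *:name wildcard, no ancestor axis), so the sketch maps a prefix to the namespace URI instead; the ns dictionary and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fault document from the question.
DOC = """
<xml>
  <fault>
    <ctx:fault xmlns:ctx="http://www.bea.com/wli/sb/context">
      <ctx:reason>OSB Service Callout action received SOAP Fault response</ctx:reason>
      <ctx:details>
        <con:fault xmlns:con="http://www.bea.com/wli/sb/context">
          <con:reason>Internal Server Error</con:reason>
        </con:fault>
      </ctx:details>
    </ctx:fault>
  </fault>
</xml>
"""

# Both prefixes (ctx and con) are bound to the same namespace URI,
# so one mapping is enough to match either reason element.
ns = {"ctx": "http://www.bea.com/wli/sb/context"}

root = ET.fromstring(DOC)

# Descendant search: finds every reason element, outer and inner.
all_reasons = root.findall(".//ctx:reason", ns)

# Specific path: only the reason directly under the outer fault.
outer = root.find("fault/ctx:fault/ctx:reason", ns)

# Equivalent of (//reason)[1]: take the first match in document order.
first = all_reasons[0]

print(len(all_reasons))  # 2
print(outer.text)        # OSB Service Callout action received SOAP Fault response
```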

Related

Xpath expression pulling multiple items despite specifying item with [ ]

I am trying to write an XPath expression which can return the URL associated with the next page of a search.
The URL which leads to the next page of the search is always the href in the a tag following the tag span class="navCurrentPage". I have been trying to use a following-sibling term to pull the next URL. My search in the Chrome console is:
$x('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]')
I thought that by specifying @href[1] I would only get back one URL (thinking the [1] chooses the first element in the list), but instead Chrome (and Scrapy) are returning four URLs. I don't understand why. Please help me to understand how to select the one URL that I am looking for.
Here is the URL where you can find the HTML giving me trouble:
https://www.yachtworld.com/core/listing/cache/searchResults.jsp?cit=true&slim=quick&ybw=&sm=3&searchtype=advancedsearch&Ntk=boatsEN&Ntt=&is=false&man=&hmid=102&ftid=101&enid=0&type=%28Sail%29&fromLength=35&toLength=50&fromYear=1985&toYear=2010&fromPrice=&toPrice=&luom=126&currencyid=100&city=&rid=100&rid=101&rid=104&rid=105&rid=107&rid=108&rid=112&rid=114&rid=115&rid=116&rid=128&rid=130&rid=153&pbsint=&boatsAddedSelected=-1
Thank you for the help.

Operator precedence: //x[1] means /descendant-or-self::node()/child::x[1] which finds every descendant x that is the first child of its parent. You want (//x)[1] which finds the first node among all the descendants named x.
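
The precedence rule can be seen on a toy document. This is a minimal sketch using Python's standard xml.etree.ElementTree, which does not accept the parenthesized form (//x)[1], so the sketch emulates it by taking the first item of the full result list. The document and variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two parents, each with two x children.
DOC = """
<root>
  <group><x>a1</x><x>a2</x></group>
  <group><x>b1</x><x>b2</x></group>
</root>
"""

root = ET.fromstring(DOC)

# .//x[1] applies [1] per parent: every x that is the first x child
# of its own parent is returned.
first_per_parent = [e.text for e in root.findall(".//x[1]")]

# Emulating (//x)[1]: collect all x descendants, then take the first one.
first_overall = root.findall(".//x")[0].text

print(first_per_parent)  # ['a1', 'b1']
print(first_overall)     # a1
```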

The XPath index is applied to every matching record. If you want only the first item, take the first instance of the result:
$x('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]').extract_first()

Just add .extract_first() or .get() to fetch the first item.
See the Scrapy documentation.

I've found this very helpful for making sure you have the bracket in the right place:
What is the XPath expression to find only the first occurrence?
Also, note that XPath positions start at [1], whereas indexing the resulting list in Python starts at [0].

What does Camel Splitter actually do with XML Document when splitting with xpath?

I have a document with an order and a number of lines. I need to break the order into lines, so I have a Camel splitter set to xpath with the order line as its value. This works fine.
However, what I get going forward is an element for the order line, which is what I want, but when converting it I need information from the order element - but if I try to get the parent element via xpath following the split, this doesn't work.
Does Camel create copies of the nodes returned by the xpath expression, or return a list of nodes within the parent document? If the former, can I make it the latter? If the latter, any ideas why a "../*" expression would return nothing?
Thanks!
Screwtape.

Look at the split options that are available when using a Tokenizer:
http://camel.apache.org/splitter.html
You have four different modes (i, w, u, t), and the 'w' mode keeps the ancestor context. In that case, the parent node (which is apparently what you need) is repeated in each sub-message.
Default:
<m:order><id>123</id><date>2014-02-25</date></m:order>
'w' mode:
<m:orders>
<m:order><id>123</id><date>2014-02-25</date>...</m:order>
</m:orders>

xpath - matching value of child in current node with value of element in parent

Edit: I think I found the answer, but I'll leave the question open for a bit to see if someone has a correction or improvement.
I'm using xpath in Talend's etl tool. I have xml like this:
<root>
<employee>
<benefits>
<benefit>
<benefitname>CDE</benefitname>
<benefit_start>2/3/2004</benefit_start>
</benefit>
<benefit>
<benefitname>ABC</benefitname>
<benefit_start>1/1/2001</benefit_start>
</benefit>
</benefits>
<dependent>
<benefits>
<benefit>
<benefitname>ABC</benefitname>
</benefit>
</dependent>
When parsing benefits for dependents, I want to get elements present in the employee's
benefit element. So in the example above, I want to get 1/1/2001 for the dependent's
start date. I want 1/1/2001, not 2/3/2004, because the dependent's benefit has benefitname ABC, matching the employee's benefit with the same benefitname.
What xpath, relative to /root/employee/dependent/benefits/benefit, will yield the value of
benefit_start for the benefit under parent employee that has the same benefit name as the
dependent benefit name? (Note that I don't know ahead of time what the literal value will be; I can't just look for 'ABC', I have to match whatever value is in the dependent's benefitname element.)
I'm trying:
../../../benefits/benefit[benefitname=??what??]/benefit_start
I don't know how to refer to the current node's ancestor in the middle of
the xpath (since I think "." at the point where I have ??what?? will refer to
the benefit node of employee/benefits).
EDIT: I think what I want is "current()/benefitname" where the ??what?? is. Seems to work with saxon, I haven't tried it in the etl tool yet.

Your XML is malformed, and I don't think you've described your situation very well: the XPath you're trying has a bunch of ../../s at the beginning, but you haven't said what the context node is, whether you're iterating through certain nodes, or what.
Supposing the current context node were an employee element, you could select benefit_starts that match dependent benefits with
benefits/benefit[benefitname = ../../dependent/benefits/benefit/benefitname]
/benefit_start
If the current context node is a benefit element in a dependents section, and you want to get the corresponding benefit_start for just the current benefit element, you can do:
../../../benefits/benefit[benefitname = current()/benefitname]/benefit_start
Which is what I think you've already discovered.
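
For readers working outside XSLT, where current() is not available, the same lookup can be sketched in Python with the standard xml.etree.ElementTree. This is only an illustration under assumptions: the XML is a repaired, well-formed version of the sample from the question, and the benefit name is spliced into the predicate as a string, which assumes it contains no quote characters.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the sample from the question (closing tags added).
DOC = """
<root>
  <employee>
    <benefits>
      <benefit>
        <benefitname>CDE</benefitname>
        <benefit_start>2/3/2004</benefit_start>
      </benefit>
      <benefit>
        <benefitname>ABC</benefitname>
        <benefit_start>1/1/2001</benefit_start>
      </benefit>
    </benefits>
    <dependent>
      <benefits>
        <benefit>
          <benefitname>ABC</benefitname>
        </benefit>
      </benefits>
    </dependent>
  </employee>
</root>
"""

root = ET.fromstring(DOC)
employee = root.find("employee")

# For each dependent benefit, look up the employee benefit with the
# same benefitname and record its benefit_start.
dependent_starts = {}
for dep_benefit in employee.findall("dependent/benefits/benefit"):
    name = dep_benefit.findtext("benefitname")
    # Assumes the name contains no single quotes.
    path = f"benefits/benefit[benefitname='{name}']/benefit_start"
    dependent_starts[name] = employee.findtext(path)

print(dependent_starts)  # {'ABC': '1/1/2001'}
```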

using parent dot notation in xpath to find another branch in the XML tree

I'm running an xslt on an XML that at one point puts my current node at the following "item" element in the tree:
host/device/item
I've found through experimentation that the following xpath takes the xslt back up to the "host" branch of the XML tree, then successfully locates the "trigger" grand child element of the "host" element.
../.././setting/trigger
It works, but the xpath syntax seems odd to me. The parent dot notation ../../. makes sense when you read it from right to left, starting at the final dot. But the child notation setting/trigger only makes sense if you read it from left to right. The "final meaning" of the entire xpath is equivalent to saying:
host/setting/trigger
Is it always true that the middle ../. section of the xpath (or however many parent ../ levels there are) is ignored when forming the final meaning host/setting/trigger?

Since . means self, you could interpret it as being ignored. ./././././././* means the same as ./*, which means the same as *.
As for reading the XPath, here is a breakdown starting from the context node host/device/item.
In ../.././setting/trigger, the / characters separate the steps, so, LtoR:
.. you are now at host/device
.. you are now at host
. you are still at host
setting you are now at host/setting
trigger you are now at host/setting/trigger
If you read it RtoL, then you must understand that instead of following node-tests, you are reading each node test with the opposite meaning, and your context is where you want to end up, so:
from host/setting/trigger (your desired endpoint)
trigger (read as ..), you are now at host/setting
setting (again, ..), you are now at host
. (.), you are still at host
.. (read as *) you might now be at host/device
.. (read as *) you might now be at host/device/item
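
The left-to-right reading can be checked with a small sketch in Python's standard xml.etree.ElementTree. The document below is a hypothetical one that only mirrors the host/device/item layout from the question. ElementTree elements do not keep a reference to their parent, so the path is evaluated from host: it first walks down to item and then follows the same steps as in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical document matching the layout described in the question.
DOC = """
<host>
  <device><item>i1</item></device>
  <setting><trigger>t1</trigger></setting>
</host>
"""

host = ET.fromstring(DOC)

# device/item reaches the item; then ../.. climbs back to host,
# . stays at host, and setting/trigger descends again.
via_parents = host.find("device/item/../.././setting/trigger")

# The same element reached directly from host.
direct = host.find("setting/trigger")

print(via_parents is direct)  # True
print(via_parents.text)       # t1
```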

XPath : finding an attribute node (and only one)

What is the XPath to find only ONE node (whichever) having a certain attribute? (Actually I'm interested in the attribute, not the node.) For example, in my XML, I have several tags with a lang attribute. I know all of them must have the same value. I just want to get any of them.
Right now, I do this: //*[1][@lang]/@lang, but it does not seem to work properly, for an unknown reason.
My tries have led me to things ranging from a concatenation of all the @lang values ('en en en en...') to nothing, with sometimes in between what I want, but not on all XML.
EDIT:
Actually //@lang[1] cannot work, because the function position() is called before the test for the presence of a lang attribute. So it always takes the very first element found in the XML. It worked best at the time because very often the lang attribute was on the root element.

After some more tackling, here is a working solution :
(//@lang)[1]
Parentheses are needed to separate the [1] from the attribute name; otherwise the position() function is applied within the parent element of the attribute (which is useless, since there can be only one attribute of a given name within a tag: that's why //@lang[2] always selects nothing).
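
Outside a full XPath engine, the effect of (//@lang)[1] can be approximated by walking the elements in document order and stopping at the first one that carries the attribute. A minimal sketch with Python's standard xml.etree.ElementTree, which cannot select attribute nodes directly; the sample document is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first element has no lang attribute.
DOC = """
<doc>
  <title>no language here</title>
  <section lang="en"><p lang="en">text</p></section>
  <section lang="en"/>
</doc>
"""

root = ET.fromstring(DOC)

# root.iter() yields the root and all descendants in document order;
# next() stops at the first element that has a lang attribute.
first_lang = next(
    (elem.get("lang") for elem in root.iter() if "lang" in elem.attrib),
    None,  # returned when no element carries the attribute
)

print(first_lang)  # en
```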

Did you try this?
//@lang[1]
Here you can see an example.

The following XPath seems to do what you want:
//*[@lang][1]/attribute::lang
