XPath - matching value of child in current node with value of element in parent

Edit: I think I found the answer, but I'll leave the question open for a bit to see if someone has a correction or improvement.
I'm using XPath in Talend's ETL tool. I have XML like this:
<root>
  <employee>
    <benefits>
      <benefit>
        <benefitname>CDE</benefitname>
        <benefit_start>2/3/2004</benefit_start>
      </benefit>
      <benefit>
        <benefitname>ABC</benefitname>
        <benefit_start>1/1/2001</benefit_start>
      </benefit>
    </benefits>
    <dependent>
      <benefits>
        <benefit>
          <benefitname>ABC</benefitname>
        </benefit>
    </dependent>
When parsing benefits for dependents, I want to get elements present in the employee's benefit element. So in the example above, I want 1/1/2001 as the dependent's start date, not 2/3/2004, because the dependent's benefit has benefitname ABC, matching the employee's benefit with the same benefitname.
What XPath, relative to /root/employee/dependent/benefits/benefit, will yield the value of benefit_start for the benefit under the parent employee that has the same benefitname as the dependent's benefit? (Note that I don't know ahead of time what the literal value will be, so I can't just look for 'ABC'; I have to match whatever value is in the dependent's benefitname element.)
I'm trying:
../../../benefits/benefit[benefitname=??what??]/benefit_start
I don't know how to refer back to the current node in the middle of the XPath (since I think "." at the point where I have ??what?? will refer to the benefit node of the employee/benefits).
EDIT: I think what I want is "current()/benefitname" where the ??what?? is. It seems to work with Saxon; I haven't tried it in the ETL tool yet.

Your XML is malformed, and I don't think you've described your situation very well (the XPath you're trying has a bunch of ../../s at the beginning, but you haven't said what the context node is, whether you're iterating through certain nodes, or what).
Supposing the current context node were an employee element, you could select benefit_starts that match dependent benefits with
benefits/benefit[benefitname = ../../dependent/benefits/benefit/benefitname]/benefit_start
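This works because in XPath 1.0, = between two node-sets is true when any node in one has the same string value as any node in the other. Python's built-in xml.etree.ElementTree supports only a small XPath subset without that comparison, so the sketch below is an illustration under that assumption, not what Talend evaluates: it rebuilds the predicate with a set of dependent benefit names. The closing tags missing from the question's XML are added so it parses, and matching_starts is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The question's XML, with the closing tags it omits added so it parses.
XML = """<root>
  <employee>
    <benefits>
      <benefit><benefitname>CDE</benefitname><benefit_start>2/3/2004</benefit_start></benefit>
      <benefit><benefitname>ABC</benefitname><benefit_start>1/1/2001</benefit_start></benefit>
    </benefits>
    <dependent>
      <benefits>
        <benefit><benefitname>ABC</benefitname></benefit>
      </benefits>
    </dependent>
  </employee>
</root>"""


def matching_starts(employee):
    """Emulate benefits/benefit[benefitname = ../../dependent/benefits/benefit/benefitname]/benefit_start.

    XPath's node-set '=' is existential: it holds if ANY dependent benefitname
    equals this benefit's benefitname, which a set membership test reproduces.
    """
    dependent_names = {n.text for n in employee.findall("dependent/benefits/benefit/benefitname")}
    return [
        b.findtext("benefit_start")
        for b in employee.findall("benefits/benefit")
        if b.findtext("benefitname") in dependent_names
    ]


employee = ET.fromstring(XML).find("employee")
print(matching_starts(employee))  # ['1/1/2001']
```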
If the current context node is a benefit element in a dependents section, and you want to get the corresponding benefit_start for just the current benefit element, you can do:
../../../benefits/benefit[benefitname = current()/benefitname]/benefit_start
Which is what I think you've already discovered.
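current() comes from XSLT rather than the core XPath 1.0 function library, so whether a standalone XPath engine accepts it depends on the engine (the asker reports that Saxon does). The same lookup can be done in host code by reading the dependent's benefitname first and then searching the employee's benefits, which is what current()/benefitname does inside the predicate. A minimal sketch with Python's xml.etree.ElementTree, assuming well-formed input; its elements have no parent pointers, so a child-to-parent map stands in for ../../.., and employee_start_for is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

XML = """<root><employee>
  <benefits>
    <benefit><benefitname>CDE</benefitname><benefit_start>2/3/2004</benefit_start></benefit>
    <benefit><benefitname>ABC</benefitname><benefit_start>1/1/2001</benefit_start></benefit>
  </benefits>
  <dependent><benefits><benefit><benefitname>ABC</benefitname></benefit></benefits></dependent>
</employee></root>"""

root = ET.fromstring(XML)
# ElementTree elements do not know their parent, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}


def employee_start_for(dep_benefit):
    """Emulate ../../../benefits/benefit[benefitname = current()/benefitname]/benefit_start."""
    wanted = dep_benefit.findtext("benefitname")    # current()/benefitname
    employee = parent[parent[parent[dep_benefit]]]  # ../../..
    for b in employee.findall("benefits/benefit"):  # only the employee's own benefits
        if b.findtext("benefitname") == wanted:
            return b.findtext("benefit_start")
    return None


for dep_benefit in root.findall("employee/dependent/benefits/benefit"):
    print(employee_start_for(dep_benefit))  # 1/1/2001
```

Comparing in Python, rather than splicing the name into a predicate string such as benefit[benefitname='...'], avoids breaking on names that contain quotes.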

Related

Select XML Node by position

I have the following XML structure:
<Root>
  <BundleItem>
    <Item>1</Item>
    <Item>2</Item>
    <Item>3</Item>
  </BundleItem>
  <Item>4</Item>
  <Item>5</Item>
  <Item>6</Item>
  <BundleItem>
    <Item>7</Item>
    <Item>8</Item>
    <Item>9</Item>
  </BundleItem>
</Root>
Using the following XPath
//Item[1]
I select
<Item>1</Item>
<Item>4</Item>
<Item>7</Item>
My goal is to select only <Item>1</Item> or <Item>7</Item>, regardless of the parent element they are found in and depending only on the position, which I am providing in the XPath.
Is it possible to do that using only the position, without providing additional criteria in the XPath?
//Item[1] selects every Item element that is the first Item child of its parent, whatever that parent is.
To get the two items you are looking for, you could use //Item[text() = 1 or text() = 7].
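To make the difference concrete: in //Item[1] the [1] belongs to the child::Item step, so it is evaluated once per parent, while (//Item)[1] applies it to the whole result in document order. A sketch using Python's xml.etree.ElementTree, whose limited XPath subset evaluates [1] per parent in the same way; it has no parenthesised expressions and no or, so those two cases are done in plain Python:

```python
import xml.etree.ElementTree as ET

XML = """<Root>
  <BundleItem><Item>1</Item><Item>2</Item><Item>3</Item></BundleItem>
  <Item>4</Item><Item>5</Item><Item>6</Item>
  <BundleItem><Item>7</Item><Item>8</Item><Item>9</Item></BundleItem>
</Root>"""

root = ET.fromstring(XML)

# //Item[1]: the position is counted among each parent's Item children.
per_parent_first = [e.text for e in root.findall(".//Item[1]")]

# (//Item)[1]: first Item in document order; index the flat result list instead.
first_overall = root.findall(".//Item")[0].text

# //Item[text() = 1 or text() = 7]: ElementTree has no 'or', so filter in Python.
by_value = [e.text for e in root.iter("Item") if e.text in ("1", "7")]

print(per_parent_first)  # ['1', '4', '7']
print(first_overall)     # 1
print(by_value)          # ['1', '7']
```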
A good tutorial can be found at w3schools.com and you can play with XPath expressions over your XML input here. (I am not affiliated with either of these resources but find them useful.)

XPATH Select All Attributes attr Except One On Specific Element elem

I was selecting all id attributes and everything was going nicely; then one day the requirements changed, and now I have to select all except one!
Given the following example:
<root>
  <structs id="123">
    <struct>
      <comp>
        <data id="asd"/>
      </comp>
    </struct>
  </structs>
</root>
I want to select all id attributes except the one at /root/structs/struct/comp/data.
Please note that the XML could be different.
In other words, what I really want is: given any XML tree, select all id attributes except the one on the element /root/structs/struct/comp/data.
I tried the following:
//@id[not(ancestor::struct)] kind of worked, but I want to give the full path on the ancestor axis, which I couldn't do.
//@id[not(contains(name(), 'data'))] didn't work, because name() returns the name of the context node, which is the attribute itself, not its parent element.
The following should achieve what you're describing:
//@id[not(parent::data/parent::comp/parent::struct/parent::structs/parent::root)]
As you can see, it simply checks from bottom to top whether the id attribute's parent matches the path root/structs/struct/comp/data.
I think this should be sufficient for your needs, but it does not 100% ensure that the parent is at the path /root/structs/struct/comp/data because it could be, for example, at the path /someOtherHigherRoot/root/structs/struct/comp/data. I'm guessing that's not a possible scenario in your XML structure, but if you had to check for that, you could do this:
//@id[not(parent::data/parent::comp/parent::struct/parent::structs/parent::root[not(parent::*)])]
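The bottom-up parent:: chain amounts to "compute the element's absolute path and compare it". Python's xml.etree.ElementTree cannot select attribute nodes such as //@id, so the sketch below walks the elements instead; it is an illustration, not a drop-in XPath, and ids_except_path and EXCLUDED_PATH are hypothetical names. Because the path is built all the way up to the document element, it behaves like the [not(parent::*)] version:

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <structs id="123">
    <struct>
      <comp>
        <data id="asd"/>
      </comp>
    </struct>
  </structs>
</root>"""

EXCLUDED_PATH = ("root", "structs", "struct", "comp", "data")


def ids_except_path(doc_root, excluded=EXCLUDED_PATH):
    """Collect every id attribute except the one whose element sits at exactly `excluded`."""
    parent = {child: p for p in doc_root.iter() for child in p}
    result = []
    for elem in doc_root.iter():
        if "id" not in elem.attrib:
            continue
        path, node = [], elem
        while node is not None:  # climb parent by parent, like the parent:: steps
            path.append(node.tag)
            node = parent.get(node)
        if tuple(reversed(path)) != excluded:
            result.append(elem.attrib["id"])
    return result


print(ids_except_path(ET.fromstring(XML)))  # ['123']
```

If the same data element were nested under another top-level element, its path would no longer equal EXCLUDED_PATH and its id would be kept, which is the distinction the answer draws.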

Choosing specific element in XPath

I have two elements with the same name, "reason". When I use //*:reason/text() it gives me both elements, but I need the first one (not the one inside "details"). Please help.
<xml xmlns:gob="http://osb.yes.co.il/GoblinAudit">
  <fault>
    <ctx:fault xmlns:ctx="http://www.bea.com/wli/sb/context">
      <ctx:errorCode>BEA-382500</ctx:errorCode>
      <ctx:reason>OSB Service Callout action received SOAP Fault response</ctx:reason>
      <ctx:details>
        <ns0:ReceivedFaultDetail xmlns:ns0="http://www.bea.com/wli/sb/stages/transform/config">
          <ns0:faultcode xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">soapenv:Server</ns0:faultcode>
          <ns0:faultstring>BEA-380001: Internal Server Error</ns0:faultstring>
          <ns0:detail>
            <con:fault xmlns:con="http://www.bea.com/wli/sb/context">
              <con:errorCode>BEA-380001</con:errorCode>
              <con:reason>Internal Server Error</con:reason>
              <con:location>
                <con:node>RouteTo_FinancialControllerBS</con:node>
                <con:path>response-pipeline</con:path>
              </con:location>
            </con:fault>
          </ns0:detail>
        </ns0:ReceivedFaultDetail>
      </ctx:details>
      <ctx:location>
        <ctx:node>PipelinePairNode2</ctx:node>
        <ctx:pipeline>PipelinePairNode2_request</ctx:pipeline>
        <ctx:stage>set maintain offer</ctx:stage>
        <ctx:path>request-pipeline</ctx:path>
      </ctx:location>
    </ctx:fault>
  </fault>
</xml>
You are using //, which descends into every subtree and finds all occurrences of reason. You can try to be more specific about the subpath:
//fault/*:fault/*:reason/text()
This will only match the outer reason, not the inner one.
"...but I need the first one"
You can use a positional index to get the first matching reason element:
(//*:reason)[1]/text()
"(not the one inside "details")"
This can be expressed as finding the reason element that has no details ancestor:
//*:reason[not(ancestor::*:details)]/text()
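The approaches above can be compared side by side with Python's xml.etree.ElementTree, assuming Python 3.8+ for its {*} namespace wildcard, which plays the role of *: here. ElementTree has no parenthesised expressions and no ancestor axis, so find() and a child-to-parent map (hypothetical helpers parent, local_name, has_ancestor) stand in for them. The document is a trimmed copy of the question's:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document: same nesting and namespaces, fewer leaves.
XML = """<xml>
  <fault>
    <ctx:fault xmlns:ctx="http://www.bea.com/wli/sb/context">
      <ctx:reason>OSB Service Callout action received SOAP Fault response</ctx:reason>
      <ctx:details>
        <ns0:ReceivedFaultDetail xmlns:ns0="http://www.bea.com/wli/sb/stages/transform/config">
          <ns0:detail>
            <con:fault xmlns:con="http://www.bea.com/wli/sb/context">
              <con:reason>Internal Server Error</con:reason>
            </con:fault>
          </ns0:detail>
        </ns0:ReceivedFaultDetail>
      </ctx:details>
    </ctx:fault>
  </fault>
</xml>"""

root = ET.fromstring(XML)  # the <xml> document element


def local_name(tag):
    """'{uri}reason' -> 'reason', i.e. the part that *:reason matches on."""
    return tag.rsplit("}", 1)[-1]


# //*:reason -> every reason element in any namespace.
all_reasons = [e.text for e in root.findall(".//{*}reason")]

# /xml/fault/*:fault/*:reason -> spell out the path below the document element.
outer_by_path = root.findtext("fault/{*}fault/{*}reason")

# (//*:reason)[1] -> find() already returns the first match in document order.
first_in_doc = root.find(".//{*}reason").text

# //*:reason[not(ancestor::*:details)] -> climb a child -> parent map instead.
parent = {child: p for p in root.iter() for child in p}


def has_ancestor(elem, name):
    node = parent.get(elem)
    while node is not None:
        if local_name(node.tag) == name:
            return True
        node = parent.get(node)
    return False


not_in_details = [
    e.text for e in root.iter()
    if local_name(e.tag) == "reason" and not has_ancestor(e, "details")
]
print(not_in_details)  # ['OSB Service Callout action received SOAP Fault response']
```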
For a large XML document, a more specific path (i.e. avoiding // at the beginning) is likely to result in a more efficient XPath:
/xml/fault/*:fault/*:reason/text()
But for a small XML, it's just a matter of personal preference, since the improvement is likely to be negligible.

XPATH - cannot select grandparent node

I am trying to parse a live betting XML feed and need to grab each bet from within the code. In plain English, I need to use the 'EventSelections' tag for my base query and 'loop' through these tags in the XML, so that I grab all that data and it creates an entity for each one, which I can use in a CMS.
My problem is that I want to go up two levels in the tree, to a grandparent node, to gather that info. Each EventID refers to the unique name of a game, and some games have more bets than others. It's important that I grab each bet AND the EventID associated with it; the problem is that this ID is on the grandparent each time. Example:
<Sportsbet Time="2013-08-03T08:38:01.6859354+09:30">
  <Competition CompetitionID="18" CompetitionName="Baseball">
    <Round RoundID="2549" RoundName="Major League Baseball">
      <Event EventID="849849" EventName="Los Angeles Dodgers (H Ryu) At Chicago Cubs (T Wood)" Venue="" EventDate="2013-08-03T05:35:00" Group="MTCH">
        <Market Type="Match Betting - BIR" EachWayPlaces="0">
          <EventSelections BetSelectionID="75989549" EventSelectionName="Los Angeles Dodgers">
            <Bet Odds="1.00" Line=""/>
          </EventSelections>
          <EventSelections BetSelectionID="75989551" EventSelectionName="Chicago Cubs">
            <Bet Odds="17.00" Line=""/>
          </EventSelections>
Does anyone know how I can grab the grandparent tags as well?
Currently I am using:
//EventSelections (this is the context)
.//@BetSelectionID
.//@EventSelectionName
I have tried dozens of different ways to do this, including the ../.. operator, which won't work either. I'd be eternally grateful for any help on this. Thanks.
I think you just haven't gone far enough up the tree.
../* is a two-step location path with abbreviations, expanded to parent::node()/child::* ... so in effect you are going up the tree with the first step, but back down the tree with the second step.
Therefore, ../* gives you your siblings (parent's children), ../../* gives you your aunts and uncles (grandparent's children), and ../../../* gives you your grandparent and its siblings (great-grandparent's children).
For attributes, ../@* is an abbreviation for parent::node()/attribute::*. Attributes are attached to elements but are not considered children, so in the second step you are going sideways, not down the tree.
Therefore, unlike above, ../@* gives you your parent's attributes, while ../../@* gives you your grandparent's attributes.
But using // in your situation is really inappropriate. // is an abbreviation for /descendant-or-self::node()/, which walks all the way down a tree to its leaves. It should be used only on rare occasions (and I cringe when I see it abused in SO questions).
So ..//..//..//@RoundID may appear to work for you, but it is in effect addressing attributes all over the tree, not just an attribute of your great-grandparent. From an EventSelections context, ../../@EventID is all you need to get the EventID attribute of your grandparent (Event), and ../../../@RoundID reaches the RoundID of your great-grandparent (Round).
If you torture a stylesheet long enough, it will eventually work for you, but it really is more robust and likely faster executing to address things properly.
You could also go with ancestor::Event/@EventID, which does exactly what you asked for: it matches an ancestor element named Event and returns its EventID attribute.
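A sketch of both answers using Python's xml.etree.ElementTree (an assumption; the asker's feed tool isn't named). ElementTree elements have no parent or ancestor axis, so a child-to-parent map stands in for ../.. and for ancestor::Event; ancestor is a hypothetical helper that returns the nearest match. The XML is the question's excerpt with its closing tags added:

```python
import xml.etree.ElementTree as ET

XML = """<Sportsbet Time="2013-08-03T08:38:01.6859354+09:30">
  <Competition CompetitionID="18" CompetitionName="Baseball">
    <Round RoundID="2549" RoundName="Major League Baseball">
      <Event EventID="849849" EventName="Los Angeles Dodgers (H Ryu) At Chicago Cubs (T Wood)" Venue="" EventDate="2013-08-03T05:35:00" Group="MTCH">
        <Market Type="Match Betting - BIR" EachWayPlaces="0">
          <EventSelections BetSelectionID="75989549" EventSelectionName="Los Angeles Dodgers">
            <Bet Odds="1.00" Line=""/>
          </EventSelections>
          <EventSelections BetSelectionID="75989551" EventSelectionName="Chicago Cubs">
            <Bet Odds="17.00" Line=""/>
          </EventSelections>
        </Market>
      </Event>
    </Round>
  </Competition>
</Sportsbet>"""

root = ET.fromstring(XML)
# ElementTree has no parent or ancestor axes, so map each child to its parent.
parent = {child: p for p in root.iter() for child in p}


def ancestor(elem, tag):
    """Emulate ancestor::tag[1]: the nearest enclosing element with that name."""
    node = parent.get(elem)
    while node is not None and node.tag != tag:
        node = parent.get(node)
    return node


rows = []
for sel in root.iter("EventSelections"):         # //EventSelections
    grandparent = parent[parent[sel]]              # ../..
    rows.append((
        grandparent.get("EventID"),                # ../../@EventID
        ancestor(sel, "Event").get("EventID"),     # ancestor::Event/@EventID
        parent[grandparent].get("RoundID"),        # ../../../@RoundID
        sel.get("BetSelectionID"),                 # @BetSelectionID
        sel.get("EventSelectionName"),             # @EventSelectionName
    ))

for row in rows:
    print(row)
```

Each row pairs a bet with its game's EventID; the first two columns agree because Event is exactly two levels above EventSelections.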

Retrieve an xpath text contains using text()

I've been hacking away at this one for hours and I just can't figure it out. Using XPath to find text values is tricky and this problem has too many moving parts.
I have a webpage with a large table, and a section of this table contains a list of users (assignees) that are assigned to a particular unit. There are nearly always multiple users assigned to a unit, and I need to make sure a particular user is assigned to any of the units in the table. I've used XPath for nearly all of my selectors and I'm halfway there on this one. I just can't figure out how to use contains with text() in this context.
Here's what I have so far:
//td[@id='unit']/span[text()='asdfasdfasdfasdfasdf (Primary); asdfasdfasdfasdfasdf, asdfasdfasdfasdf; 456, 3456; testuser']
The XPath query above captures all text in the particular section I am looking at, which is great. However, I only need to know if testuser is in that section.
text() gets you a set of text nodes. I tend to use it more in a context of //span//text() or something.
If you are trying to check whether the text inside an element contains something, you should call contains on the element rather than on the result of text(), like this:
span[contains(., 'testuser')]
XPath is pretty good with context. If you know exactly what text a node should have you can do:
span[.='full text in this span']
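The difference between . and text() shows up as soon as the span has child elements: the string value of . concatenates all descendant text, while text() returns separate text nodes, and in XPath 1.0 contains() given a node-set only looks at the first one. A sketch with Python's xml.etree.ElementTree, using a hypothetical assignee cell with a <b> child; ElementTree has no contains(), so Python's in operator does the substring test, but it does support the [.='...'] predicate directly (Python 3.7+):

```python
import xml.etree.ElementTree as ET

# Hypothetical assignee cell; the <b> child splits the text into several text nodes.
XML = """<table><tr>
  <td id="unit"><span>alice (Primary); <b>testuser</b>; bob</span></td>
</tr></table>"""

root = ET.fromstring(XML)
span = root.find(".//td[@id='unit']/span")  # //td[@id='unit']/span

# String value of the span (what '.' means in contains(., ...)): all descendant text.
string_value = "".join(span.itertext())

# span.text is only the span's first text node, the one contains(text(), ...) would see.
first_text_node = span.text

print("testuser" in string_value)     # True  -> span[contains(., 'testuser')]
print("testuser" in first_text_node)  # False -> the first text node alone misses it

# span[.='full text in this span']: compares the complete string value.
exact = root.findall(".//span[.='alice (Primary); testuser; bob']")
print(len(exact))  # 1
```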
But if you want to do something like regular expressions (using EXSLT, for example), you'll need to use the string() function:
span[regexp:test(string(.), 'testuser')]
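Python's re module can stand in for EXSLT to show the idea: build the string value first, which is what string(.) supplies, then run the pattern over it. This is an analogy only; the \b word boundaries are an added assumption to avoid matching names such as testuser2, and Python's regex dialect may differ from the one your EXSLT implementation uses:

```python
import re
import xml.etree.ElementTree as ET

span = ET.fromstring("<span>alice (Primary); <b>testuser</b>; bob</span>")

# string(.) -> the concatenated descendant text that regexp:test() would receive.
value = "".join(span.itertext())

# Rough analogue of regexp:test(string(.), 'testuser'), with word boundaries added.
print(bool(re.search(r"\btestuser\b", value)))  # True
```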
