Ruby: How to find an XML child element from a specific element using XPath?

Given this XML:
<a>
<b key="1">
<c value="xxx" />
</b>
<b key="2">
<c value="yyy" />
</b>
</a>
Goal: get each "b" first, then use XPath to get the "c" under that "b", giving the result below.
for <b key="1">
<c value="xxx" />
for <b key="2">
<c value="yyy" />
But the code below
b_elements = XPath.match(xml, "//b[@key]")
b_elements.each do |b_element|
  puts b_element.elements["//c"]
end
ends up yielding
for <b key="1">
<c value="xxx" />
<c value="yyy" />
for <b key="2">
<c value="xxx" />
<c value="yyy" />
instead of just the "c" under each "b".
I tried the methods below with no luck. It seems that with XPath the search always starts from the root element.
b.get_elements("//c")
XPath.first(b, "//c")
My current workaround is to traverse the child elements one layer at a time and look for the desired key, which seems clumsy compared to using XPath.
Please advise, thanks.
Reference:
http://ruby-doc.org/stdlib-1.9.3/libdoc/rexml/rdoc/REXML/Element.html#method-i-each_element_with_attribute

XPath treats any path that starts with / as absolute. "//c" is shorthand for /descendant-or-self::node()/c, so it searches from the document root no matter which element you evaluate it against.
You can make the path relative by putting a . in front of the //, so that the search starts at the current element.
That is, use ".//c" instead of "//c". Hope this helps.
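A minimal sketch of the suggestion above using REXML. The XML is the sample from the question with its attribute values quoted so it parses; the constant SAMPLE_XML and the helper c_values_per_b are names made up for this illustration.

```ruby
require "rexml/document"

# Sample from the question, with attribute values quoted so it is well-formed.
SAMPLE_XML = <<~XML
  <a>
    <b key="1"><c value="xxx"/></b>
    <b key="2"><c value="yyy"/></b>
  </a>
XML

# Hypothetical helper: for every b that has a key attribute, evaluate c_path
# with that b as the context node and collect the value attributes found.
def c_values_per_b(xml, c_path)
  doc = REXML::Document.new(xml)
  result = {}
  REXML::XPath.match(doc, "//b[@key]").each do |b|
    values = REXML::XPath.match(b, c_path).map { |c| c.attributes["value"] }
    result[b.attributes["key"]] = values
  end
  result
end

# ".//c" is relative to each b, so each key maps only to its own c.
p c_values_per_b(SAMPLE_XML, ".//c")

# "//c" is absolute and searches the whole document for every b.
p c_values_per_b(SAMPLE_XML, "//c")
```

The only difference between the two calls is the leading dot, which is what switches the search from the document root to the current b element.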

Related

How to identify distinct xpath for an element that occurs in different level in XML document?

For example, I want to find all the unique XPaths of the identify element in the XML below. Can you please help with identifying them using XQuery or any other way?
<a>
<b>
<identify>Level-1</identify>
</b>
<c>
<identify>Level-2</identify>
<d>
<identify>Level-3</identify>
<e>
<identify>Level-4-1</identify>
<identify>Level-4-2</identify>
</e>
<f>
<identify>Level-4</identify>
<g>
<identify>Level-5</identify>
<identify>Level-5-2</identify>
</g>
</f
</d>
</c>
Your XML is invalid (the </f tag is not closed and the closing </a> is missing), but assuming you fix it, try this, based on the FunctX path-to-node function:
xquery version "3.1";
declare namespace functx = "http://www.functx.com";
declare function functx:path-to-node
( $nodes as node()* ) as xs:string* {
$nodes/string-join(ancestor-or-self::*/name(.), '/')
} ;
let $in-xml :=
<a>
<b>
<identify>Level-1</identify>
</b>
<c>
<identify>Level-2</identify>
<d>
<identify>Level-3</identify>
<e>
<identify>Level-4-1</identify>
<identify>Level-4-2</identify>
</e>
<f>
<identify>Level-4</identify>
<g>
<identify>Level-5</identify>
<identify>Level-5-2</identify>
</g>
</f>
</d>
</c>
</a>
return
functx:path-to-node($in-xml//*[name()="identify"])
Output:
"a/b/identify"
"a/c/identify"
"a/c/d/identify"
"a/c/d/e/identify"
"a/c/d/e/identify"
"a/c/d/f/identify"
"a/c/d/f/g/identify"
"a/c/d/f/g/identify"
There is also the path function: //identify/path() would give
/Q{}a[1]/Q{}b[1]/Q{}identify[1]
/Q{}a[1]/Q{}c[1]/Q{}identify[1]
/Q{}a[1]/Q{}c[1]/Q{}d[1]/Q{}identify[1]
/Q{}a[1]/Q{}c[1]/Q{}d[1]/Q{}e[1]/Q{}identify[1]
/Q{}a[1]/Q{}c[1]/Q{}d[1]/Q{}e[1]/Q{}identify[2]
/Q{}a[1]/Q{}c[1]/Q{}d[1]/Q{}f[1]/Q{}identify[1]
/Q{}a[1]/Q{}c[1]/Q{}d[1]/Q{}f[1]/Q{}g[1]/Q{}identify[1]
/Q{}a[1]/Q{}c[1]/Q{}d[1]/Q{}f[1]/Q{}g[1]/Q{}identify[2]
for the example in Jack's answer: https://xqueryfiddle.liberty-development.net/nc4P6ya
The format is ugly for XML without namespaces, but for namespaced documents the lengthy Q{uri}local notation has the advantage of working without setting up any bindings from prefixes to namespace URIs.
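For readers working in Ruby rather than XQuery, the same idea (join the names of each element's ancestors) can be sketched with REXML. This is only an illustration under assumptions: the helper names path_to and identify_paths are invented here, and the sample is the corrected XML from the answer above. Calling uniq on the result removes the repeated paths, which is what the question asked for.

```ruby
require "rexml/document"

# Corrected sample document from the answer above.
IDENTIFY_XML = <<~XML
  <a>
    <b><identify>Level-1</identify></b>
    <c>
      <identify>Level-2</identify>
      <d>
        <identify>Level-3</identify>
        <e><identify>Level-4-1</identify><identify>Level-4-2</identify></e>
        <f>
          <identify>Level-4</identify>
          <g><identify>Level-5</identify><identify>Level-5-2</identify></g>
        </f>
      </d>
    </c>
  </a>
XML

# Hypothetical helper: walk up the parent chain and join the element names,
# stopping when the document node is reached.
def path_to(element)
  names = []
  node = element
  until node.nil? || node.is_a?(REXML::Document)
    names.unshift(node.name)
    node = node.parent
  end
  names.join("/")
end

# Hypothetical helper: one path per identify element, duplicates included.
def identify_paths(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//identify").map { |el| path_to(el) }
end

all_paths = identify_paths(IDENTIFY_XML)
puts all_paths.uniq
```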

Get parents attribute value if child doesn't have a specific attribute value

I have an XML file on Linux that I want to process.
I need to get the ids of all parent nodes based on their children.
Here I want the id of every 'a' that has no 'c' with the key "f.g".
<a id="11111">
<b>
<c key="d.e">stuff1</c>
<c key="f.g">stuff2</c>
<c key="j.k">stuff4</c>
</b>
</a>
<a id="22222">
<b>
<c key="d.e">stuff1</c>
<c key="h.i">stuff3</c>
<c key="j.k">stuff4</c>
<c key="l.m">stuff5</c>
</b>
</a>
<a id="33333">
<b>
<c key="c.d">stuff0</c>
<c key="d.e">stuff1</c>
<c key="h.i">stuff3</c>
<c key="j.k">stuff4</c>
<c key="l.m">stuff5</c>
</b>
</a>
In this case I should be getting 22222 and 33333.
I'm not really sure how to write the xpath for this.
I think you are looking for something like:
//a[not(.//c[@key="f.g"])]/@id
which can be read as: find every <a> element that does NOT have a descendant <c> element whose key attribute has the value "f.g", and return the id attribute of each such <a>.
You can filter with not():
//a[not(.//c[@key = 'f.g'])]
It returns the 'a' elements you need; appending /@id to the expression selects their ids.
@Jack Fleeting's answer is probably the best solution. As an alternative (more expensive to evaluate):
//c[not(@key="f.g" or preceding-sibling::c[@key="f.g"] or following-sibling::c[@key="f.g"])]/ancestor::a
Look for c elements where neither the element itself nor any of its preceding or following siblings has key="f.g". Then select their a ancestors.
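As a rough check of the not(...) approach, here is a small REXML sketch. It assumes the three a elements sit under a single wrapper element (called root here, which is not in the original file) so that the document is well-formed, and it trims the sample to the c elements that matter. The helper name a_ids_without_fg is made up for the example.

```ruby
require "rexml/document"

# Trimmed sample from the question, wrapped in an assumed <root> element.
PARENTS_XML = <<~XML
  <root>
    <a id="11111"><b>
      <c key="d.e">stuff1</c><c key="f.g">stuff2</c><c key="j.k">stuff4</c>
    </b></a>
    <a id="22222"><b>
      <c key="d.e">stuff1</c><c key="h.i">stuff3</c>
    </b></a>
    <a id="33333"><b>
      <c key="c.d">stuff0</c><c key="l.m">stuff5</c>
    </b></a>
  </root>
XML

# Hypothetical helper: ids of every a that has no descendant c with key "f.g".
def a_ids_without_fg(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//a[not(.//c[@key="f.g"])]').map do |a|
    a.attributes["id"]
  end
end

puts a_ids_without_fg(PARENTS_XML)
```

The sketch selects the a elements and reads the id attribute in Ruby rather than ending the XPath with /@id; both routes should give the same ids.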

xpath: select node closest to root

I need to select the nodes with a specific name that are closest to the root (not necessarily directly under it).
Example:
<root>
<a>
<b id="1"></b>
<b id="2">
<b id="3"></b>
</b>
<c>
<b id="4"></b>
</c>
</a>
</root>
It should select b#1, b#2 and b#4, but not b#3, because it is nested inside another b node.
Currently I select all b nodes and then check whether any ancestor of each one is also a b; if so, I discard it. But that check is hardcoded. Can XPath do this on its own?
I found the solution: just combine not() with the ancestor axis, like:
//table[not(ancestor::table)]
For the example above, that would be //b[not(ancestor::b)].
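I would try the expression below:
//b[not(ancestor::b)]
It selects:
<b id="1"/>
<b id="2">...</b>
<b id="4"/>
Note that writing the predicate as not(.//ancestor::b) would also drop b#2, because the leading .// makes it test the ancestors of every descendant as well, and b#3 has b#2 as an ancestor.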
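The not(ancestor::b) predicate from the asker's own solution can be tried against the sample document with REXML. This is only a sketch; the helper name outermost_b_ids is invented for the illustration.

```ruby
require "rexml/document"

# Sample document from the question.
NESTED_XML = <<~XML
  <root>
    <a>
      <b id="1"></b>
      <b id="2">
        <b id="3"></b>
      </b>
      <c>
        <b id="4"></b>
      </c>
    </a>
  </root>
XML

# Hypothetical helper: ids of the b elements that are not nested in another b.
def outermost_b_ids(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//b[not(ancestor::b)]").map { |b| b.attributes["id"] }
end

puts outermost_b_ids(NESTED_XML)
```

Here b#3 is filtered out because b#2 is one of its ancestors, while b#4 is kept because its only ancestors are c, a and root.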

Web config transformation condition/match to select a node based on parent node attribute

I have a transform that looks like this
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
<a>
<b>
<c>
<d>
<e name="UpdateLanguageProfile">
<f xdt:Transform="Replace" xdt:Locator="Condition(/..@name='UpdateLanguageProfile')">
stuff here
</f>
</e>
</d>
</c>
</b>
</a>
So I want the xdt:Locator to select the f node only if the parent node has an attribute with the specified value.
The xdt:Locator gets translated into the following XPath expression:
/a/b/c/d/e/f[/..@name='UpdateLanguageProfile']
which is invalid.
So the question is, what could I put in the Condition, that is the XPath square brackets, in order to select the f node based on an attribute in the parent node.
The answer is that the xdt:Locator and the xdt:Transform do not need to be on the same node. They just happen to be on the same node in every example I've ever seen.
You can do this:
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
<a>
<b>
<c>
<d>
<e name="UpdateLanguageProfile" xdt:Locator="Match(name)">
<f xdt:Transform="Replace">
stuff here
</f>
</e>
</d>
</c>
</b>
</a>
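On the XPath side of the question, a predicate can reach the parent's attribute with ../@name, so a locator of the form Condition(../@name='UpdateLanguageProfile') would be expected to produce a valid expression, assuming the locator text is passed through as the predicate unchanged, as the question describes. The sketch below only checks the XPath itself with REXML, not the XDT engine. The sample document is a simplified stand-in with a second e element added for contrast, and f_texts is a hypothetical helper.

```ruby
require "rexml/document"

# Simplified stand-in for the config file: two e elements, only one of which
# has the name the transform is looking for.
CONFIG_XML = <<~XML
  <a><b><c><d>
    <e name="UpdateLanguageProfile"><f>target</f></e>
    <e name="SomethingElse"><f>other</f></e>
  </d></c></b></a>
XML

# Hypothetical helper: text of every element selected by the given XPath.
def f_texts(xml, xpath)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, xpath).map { |f| f.text }
end

# Predicate on f that looks at the parent's name attribute.
p f_texts(CONFIG_XML, "/a/b/c/d/e/f[../@name='UpdateLanguageProfile']")

# Equivalent selection with the predicate placed on e, which mirrors putting
# the locator on the e element as in the answer above.
p f_texts(CONFIG_XML, "/a/b/c/d/e[@name='UpdateLanguageProfile']/f")
```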

Limiting an XPath predicate: predicate starting with

I am navigating this office open xml file using XPath 1.0 (extract):
<sheetData ref="A1:XFD108">
<row spans="1:3" r="1">
<c t="s" r="A1">
<is>
<t>FirstCell</t>
</is>
</c>
<c t="s" r="C1">
<is>
<t>SecondCell</t>
</is>
</c>
</row>
<row spans="1:3" r="2">
<c t="s" r="A2">
<is>
<t>ThirdCell</t>
</is>
</c>
<c t="s" r="C2">
<is>
<t>[persons.ID]</t>
</is>
</c>
</row>
</sheetData>
I need to find the cell that says "[persons.ID]", which is a variable. Technically, I need to find the first <row> containing a descendant::t that starts with [ and closes with ]. I currently have:
.//row//t[starts-with(text(), '[') and
substring(text(), string-length(text())) = ']']/ancestor::row
So I filter and then go back up. It works, but I'd like to understand XPath better here: I found no way to do the filtering inside a predicate on the row. Can you point me to a valid equivalent of something like .//row[descendant::t[starts-with()...]]?
Any help is greatly appreciated.
"Technically, I need to find the first <row> containing a descendant::t that starts with [ and closes with ]."
/sheetData/row[c/is/t[starts-with(.,'[')]
[substring(.,string-length(.))=']']]
[1]
or
/sheetData/row[.//t[starts-with(.,'[') and
substring(.,string-length(.))=']']][1]
or
(//row[.//t[starts-with(.,'[') and
substring(.,string-length(.))=']']])[1]
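A short REXML sketch of the nested-predicate form, using a trimmed copy of the sample sheet. REXML::XPath.first stands in for the trailing [1] of the expressions above, and the helper name first_variable_row is made up for this example.

```ruby
require "rexml/document"

# Trimmed copy of the sample sheet from the question.
SHEET_XML = <<~XML
  <sheetData ref="A1:XFD108">
    <row spans="1:3" r="1">
      <c t="s" r="A1"><is><t>FirstCell</t></is></c>
      <c t="s" r="C1"><is><t>SecondCell</t></is></c>
    </row>
    <row spans="1:3" r="2">
      <c t="s" r="A2"><is><t>ThirdCell</t></is></c>
      <c t="s" r="C2"><is><t>[persons.ID]</t></is></c>
    </row>
  </sheetData>
XML

# Hypothetical helper: the r attribute of the first row that contains a t
# whose text starts with "[" and ends with "]", or nil if there is none.
def first_variable_row(xml)
  doc = REXML::Document.new(xml)
  xpath = "//row[.//t[starts-with(., '[') and " \
          "substring(., string-length(.)) = ']']]"
  row = REXML::XPath.first(doc, xpath)
  row && row.attributes["r"]
end

puts first_variable_row(SHEET_XML)
```

Because both conditions sit inside the inner predicate on t, they are tested against the same t element, and the outer predicate then keeps any row that has at least one such t.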
One option:
.//row[starts-with(descendant::t/text(),'[') and substring(descendant::t/text(), string-length(descendant::t/text())) = ']' ]
This will give you the row, but with a significant caveat: the conditions are not tied to one particular t. In XPath 1.0, a string function applied to a node-set uses only the first node in document order, so only the first t text in each row is tested. In the sample, the first t in row 2 is "ThirdCell", so that row would not match.
Obviously, what you have doesn't have this problem.
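Another option: use translate
.//row[translate(descendant::t/text(),"0123456789","") = "[]"]
That strips the digits and then does a simple comparison against the [] characters, so it only matches when the text between the brackets is purely numeric (for example [123]). It would not match [persons.ID].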
