XQuery ancestor axis doesn't work, but explicit XPath does - xpath

Consider the following XML snippet:
<doc>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
</doc>
In XQuery, I have a function that needs to do some things based on the ancestor chapter of a given "para" element that is passed in as a parameter, as shown in the stripped down example below:
declare function doSomething($para){
let $chapter := $para/ancestor::chapter
return "some stuff"
};
In that example, $chapter keeps coming up empty. However, if I write the function like the following (i.e., without using the ancestor axis), I get the desired "chapter" element:
declare function doSomething($para){
let $chapter := $para/../..
return "some stuff"
};
The problem is that I cannot use explicit paths as in the latter example because the XML I will be searching is not guaranteed to have the "chapter" element as a grandparent every time. It may be a great-grandparent or great-great-grandparent, and so on, as shown below:
<doc>
<chapter id="1">
<item>
<subItem>
<para>some text here</para>
</subItem>
</item>
</chapter>
</doc>
Does anyone have an explanation as to why the axis doesn't work, while the explicit XPath does? Also, does anyone have any suggestions on how to solve this problem?
Thank you.
SOLUTION:
The mystery is now solved.
The node in question was re-created in another function, which had the result of stripping it of all of its ancestor information. Unfortunately, the previous developer did not document this wonderful, little function and has cost us all a good deal of time.
So, the ancestor axis worked exactly as it should - it was just being applied to a deceptive node.
I thank all of you for your efforts in answering my questions.

The ancestor axis does work fine. I suspect your problem is namespaces. The example you showed and that I ran (below) has XML without any namespaces. If your XML has a namespace, then you need to provide it in the ancestor XPath, like this: $para/ancestor::foo:chapter, where the prefix foo is bound to the correct namespace for the chapter element.
let $doc := <doc>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
</doc>
let $para := $doc//para
return $para/ancestor::chapter
RESULT:
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>

These things almost always boil down to namespaces! As a diagnostic to confirm 100% that namespaces are not the issue, can you try:
declare function local:doSomething($para) {
let $chapter := $para/ancestor::*[local-name() = 'chapter']
return $chapter
};

This seems surprising to me; which XQuery implementation are you using? With BaseX, the following query...
declare function local:doSomething($para) {
let $chapter := $para/ancestor::chapter
return $chapter
};
let $xml :=
<doc>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
</doc>
return local:doSomething($xml//para)
...returns...
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>

I suspect namespaces too. If $para/../.. works but $para/parent::item/parent::chapter turns up empty, then you know it's a question of namespaces.
Look for an xmlns declaration at the top of your content, e.g.:
<doc xmlns="http://example.com">
...
</doc>
In your XQuery, you then need to bind that namespace to a prefix and use that prefix in your XQuery/XPath expressions, like this:
declare namespace my="http://example.com";
declare function local:doSomething($para){
let $chapter := $para/ancestor::my:chapter
return "some stuff"
};
What prefix you use doesn't matter. The important thing is that the namespace URI (http://example.com in the above example) matches up.
It makes sense that ../.. selects the element you want, because .. is short for parent::node() which selects the parent node regardless of its name (or namespace). Whereas ancestor::chapter will only select <chapter> elements that are not in a namespace (unless you have declared a default element namespace, which is usually not a good idea in XQuery because it affects both your input and your output).

Related

Getting a node value depending on an another value at the same level

For each "item" node in the following XML structure, I want to select the corresponding "title" (the text nodes are located at the same level as the item nodes, I can't modify it).
The link between those two nodes will be the "ref" node which is a kind of primary key between the "item" and "title" trees.
Is it possible in XPath ?
I think it should be something like this: //root/item/../title[ref/text()=??????]/label
An example :
<root>
<item>
<ref>ITEM001</ref>
</item>
<item>
<ref>ITEM002</ref>
</item>
<item>
<ref>ITEM003</ref>
</item>
<item>
<ref>ITEM004</ref>
</item>
<title>
<ref>ITEM002</ref>
<label>Hello world!</label>
</title>
<title>
<ref>ITEM003</ref>
<label>Goodbye world!</label>
</title>
<title>
<ref>ITEM007</ref>
<label>This is a test!</label>
</title>
<title>
<ref>ITEM0010</ref>
<label>No this a question!</label>
</title>
</root>
The result would be:
ITEM001: empty
ITEM002: Hello world!
ITEM003: Goodbye world!
ITEM004: empty
Thanks in advance for your help.
Following the steps below should give you the desired output.
Step 1: Iterate over all the item elements and collect their ref values in an array.
Step 2: Loop over that array and, for each ref value, use an XPath like the one below to find the corresponding label (shown here for ITEM002).
//title[ref='ITEM002']/label
Step 3: If a matching element is found, print the text of its label; otherwise print "empty".

nested for loop with where condition in Xquery

I need to filter tag value from the following sample XML.
<ClinicalDocument xmlns="urn:hl7-org:v3">
<id root="3930E379-5C54-477D-8DB2-F6C92BC08C691" />
<component>
<structuredBody>
<component>
<section>
<templateId root="1.3.6.1.4.1.19376.1.5.3.1.3.4"/>
<code code="10164-2" codeSystem="2.16.840.1.113883.6.1"
codeSystemName="LOINC" displayName="HISTORY OF PRESENT ILLNESS"/>
<title>HISTORY OF PRESENT ILLNESS</title>
<text>Patient slipped and fell on ice, twisting her ankle as she fell.
</text>
</section>
</component>
<component>
<section>
<templateId root="1.3.6.1.4.1.19376.1.5.3.1.3.5"/>
<code code="10164-3" codeSystem="2.16.840.1.113883.6.12"
codeSystemName="LOINC1" displayName="DEMO"/>
<title>DEMO HISTORY OF PRESENT ILLNESS</title>
<text>DEMO Patient slipped and fell on ice, twisting her ankle as she fell.
</text>
</section>
</component>
</structuredBody>
</component>
</ClinicalDocument>
There are many files like this in my collection (I am using eXist-db), and I need to filter based on the 'root' attribute of the <id> tag and the 'root' attribute of the <templateId> tag. The result I need is only the <title> text value.
Following is the query I tried, but it shows all the title values (not just the one that matches my condition).
xquery version "3.0";
declare namespace d = "urn:hl7-org:v3";
(
for $prod in collection("/db/netspectivedb/")/d:ClinicalDocument
where $prod/d:id/@root/string()='3930E379-5C54-477D-8DB2-F6C92BC08C691'
and $prod/d:component/d:structuredBody/d:component/d:section/d:templateId/@root/string()='1.3.6.1.4.1.19376.1.5.3.1.3.4'
return $prod/d:component/d:structuredBody/d:component/d:section/d:title/text()
)
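The problem is that $prod in your XQuery refers to the whole ClinicalDocument, which isn't specific enough for your purpose: the where clause is satisfied as soon as any section in the document has the matching templateId, and the return clause then emits the title of every section in that document. You want to loop through component or section inside structuredBody instead, for example: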
declare namespace d = "urn:hl7-org:v3";
(
for $section in collection("/db/netspectivedb/")/d:ClinicalDocument[d:id/@root eq '3930E379-5C54-477D-8DB2-F6C92BC08C691']/d:component/d:structuredBody/d:component/d:section
where $section/d:templateId/@root eq '1.3.6.1.4.1.19376.1.5.3.1.3.4'
return $section/d:title/text()
)
or using nested for as you specifically asked. Nested for also turns out to be more readable in this case :
declare namespace d = "urn:hl7-org:v3";
(
for $prod in collection("/db/netspectivedb/")/d:ClinicalDocument
for $section in $prod/d:component/d:structuredBody/d:component/d:section
where $prod/d:id/@root eq '3930E379-5C54-477D-8DB2-F6C92BC08C691'
and $section/d:templateId/@root eq '1.3.6.1.4.1.19376.1.5.3.1.3.4'
return $section/d:title/text()
)
I am using eq instead of = above since we mean to do value comparison (read more: https://developer.marklogic.com/blog/comparison-operators-whats-the-difference)
You could achieve the same thing with a single XPath expression:
declare namespace d = "urn:hl7-org:v3";
collection("/db/netspectivedb/")/
d:ClinicalDocument[d:id/@root eq '3930E379-5C54-477D-8DB2-F6C92BC08C691']/
d:component/d:structuredBody/d:component/
d:section[d:templateId/@root eq '1.3.6.1.4.1.19376.1.5.3.1.3.4']/d:title/text()

Get low level xpath from XML with Nokogiri

I'm trying to store in an array all the unique XPaths of the lowest-level elements in the XML below, but with the code I have, array a ends up holding all the XML nodes rather than just the XPaths. The elements sit at different depths: some have only two ancestors and others have more.
This is the code I have.
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<name>Cake</name>
<ppu>0.55</ppu>
<batters>
<batter>Regular</batter>
<batter>Chocolate</batter>
<batter>Blueberry</batter>
<batter>Devil's Food</batter>
</batters>
<topping>None</topping>
<topping>Glazed</topping>
<topping>Sugar</topping>
<topping>Powdered Sugar</topping>
<topping>Chocolate with Sprinkles</topping>
<topping>Chocolate</topping>
<topping>Maple</topping>
</item>
<item>
<name>Raised</name>
<ppu>0.55</ppu>
<batters>
<batter>Regular</batter>
</batters>
<topping>None</topping>
<topping>Glazed</topping>
<topping>Sugar</topping>
<topping>Chocolate</topping>
<topping>Maple</topping>
</item>
</items>
EOT
a = []
a = doc.xpath("//*")
puts a
I'd like to store in array "a" only the unique xpaths as below:
/items/item/name
/items/item/ppu
/items/item/batters/batter
/items/item/topping
Maybe somebody could help me in how to do this.
Thanks for the help.
What you want to select is the "leaf" nodes. You can do it like so:
doc.xpath("//*[not(*)]")
This means "select all elements that don't contain elements".
If you want the XPaths, you'll need to call .path on each node. But the paths provided by Nokogiri have explicit positions (e.g. /items/item[2]/topping[4]), so you'll have to apply a regex to remove them, then remove duplicates with uniq:
doc.xpath("//*[not(*)]").map {|leaf| leaf.path.gsub(/\[.*?\]/, '') }.uniq
Output:
/items/item/name
/items/item/ppu
/items/item/batters/batter
/items/item/topping

Ruby + Nokogiri + Xpath navigate Node_Set

<Item id="item0">
<Links>
<FirstLink id="link1" target="one"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content</String>
</Data>
</Item>
<Item id="item1">
<Links>
<FirstLink id="link1" target="two"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content</String>
</Data>
</Item>
I have created a Nokogiri NodeSet with this structure, i.e. a list of items with links and data children.
How can I filter out any items that don't match a certain value in the 'target' attribute of <FirstLink>?
Actually, what I want in the end is to extract the <Data><String> content of every <Item> that matches a certain value in its <FirstLink> 'target' attribute.
I've tried several approaches already, but I'm at a loss as to how to identify an element by an attribute of its grandchild and then extract the content of this grandchild's parent's sibling.
We can build up an XPath expression to do this. Assuming we are starting from the whole XML document, rather than the node-set you already have, something like
//Item
will select all <Item> elements (I’m guessing you already have something like that to get this node-set).
Next, to select only those <Item> elements which have <Links><FirstLink> where FirstLink has a target attribute value of one:
//Item[Links/FirstLink[@target='one']]
and finally to select the Data/String children of those nodes:
//Item[Links/FirstLink[@target='one']]/Data/String
So with Nokogiri you could use something like this (where doc is your parsed document):
doc.xpath("//Item[Links/FirstLink[@target='one']]/Data/String")
or if you want to use the node-set you already have you can use a relative expression:
nodeset.xpath("self::Item[Links/FirstLink[@target='one']]/Data/String")
I'm not sure I fully understand your goal, but here is my best guess at how to proceed in this case:
require 'nokogiri'
doc = Nokogiri::XML <<-xml
<Item id="item0">
<Links>
<FirstLink id="link1" target="one"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content1</String>
</Data>
</Item>
<Item id="item1">
<Links>
<FirstLink id="link1" target="two"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content2</String>
</Data>
</Item>
xml
The #xpath method with the expression "//Item" selects all the Item nodes. Those nodes are then passed to #select, which keeps only the Items whose Links/FirstLink child has a target attribute matching the wanted value ("one" here). The path starts with ./ so that it is evaluated relative to the current Item; a path starting with // would search the whole document and always find the first FirstLink.
node.at("./Links/FirstLink")['target'] gives the value of the target attribute of that Item's FirstLink: "one" for the first Item, "two" for the second. The trailing ['one'] in node.at("./Links/FirstLink")['target']['one'] is a call to String#[], which returns the matching substring, or nil when there is no match.
Because String#[] also accepts a regular expression, this approach gives you the flexibility of regex matching too.
nodeset = doc.xpath("//Item").select do |node|
  node.at("./Links/FirstLink")['target']['one']
end
Now nodeset contains only the required Item nodes. #map then passes each Item to its block, where #at with the relative expression ./Data/String selects that Item's String node and #text returns its content.
nodeset.map { |n| n.at('./Data/String').text } # => ["content1"]

Using ruby/nokogiri to transform xml to another xml

I've never encountered task of transforming XML from one form to another. I hear that XSLT is just for that, but I don't want to go there. So, using only ruby and nokogiri, how can I:
remove all item elements but time from initial XML and also rename element time to HammerTime?
Initial XML:
...
<item>
<time>05.04.2011 9:53:23</time>
<iddqd>42</iddqd>
<idkfa>woot</idkfa>
</item>
<item>
...
Desired result:
...
<item>
<HammerTime>05.04.2011 9:53:23</HammerTime>
</item>
<item>
...
I figured out how to put data from XML to array using nokogiri's .xpath, but is there a way to make the desired transformation into another XML without manually having to write something like puts "<HammerTime>#{array['time']}</HammerTime>"?
Here you go:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML <<-EOHTML
<html>
<body>
<item>
<time>05.04.2011 9:53:23</time>
<iddqd>42</iddqd>
<idkfa>woot</idkfa>
</item>
</body>
</html>
EOHTML
hammer = doc.at_css "time"
hammer.name = 'hammertime'
doc.css("iddqd").remove
doc.css("idkfa").remove
outfile = File.new("output.html", "w")
outfile.puts doc.to_html
outfile.close
What do you mean by
into another XML without manually having to write something like puts "<HammerTime>#{array['time']}</HammerTime>"?
If you want to transform an XML element into another in a language-independent way, you can use XSLT transformations (or stylesheet). Once you have your XSLT file you can apply it with Nokogiri's Nokogiri::XSLT::Stylesheet#apply_to.

Resources