Select multiple values and compare them using XQuery or XPath? - xpath

I have the following XML:
<items>
<item min="1" max="3"> </item>
<item min="2" max="7"> </item>
<item min="1" max="2"> </item>
</items>
And I need to check, for every item, whether min is always smaller than max. The expected output for this input would be false, as @min="2" is not smaller than @max="2".
I've tried something similar to:
every $min,$max in //item/@min, //item/@max satisfies ...
But that's obviously not working. Any ideas?

You don't need satisfies; just invert the condition and wrap it in not(...).
This query returns all @min attributes that fulfill the condition, i.e. that are smaller than every @max:
//item/@min[not(. >= //item/@max)]
If you want to use satisfies, use this query:
//item/@min[every $max in //item/@max satisfies . < $max]
If you want to know whether all elements fulfill the condition, check that no element fails it:
not(//item/@min[. >= //item/@max])
You could even do without a predicate. General comparisons in XPath/XQuery are existentially quantified over both operands, so //item/@min >= //item/@max is true if and only if at least one @min value is equal to or greater than at least one @max value:
not(//item/@min >= //item/@max)
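To make the existential semantics concrete, here is a minimal Python sketch of the same check using only the standard library. The function name all_mins_below_all_maxes is made up for illustration, and the values are assumed to be numeric.

```python
import xml.etree.ElementTree as ET

XML = """<items>
  <item min="1" max="3"/>
  <item min="2" max="7"/>
  <item min="1" max="2"/>
</items>"""


def all_mins_below_all_maxes(xml_text: str) -> bool:
    """Hypothetical helper mirroring not(//item/@min >= //item/@max)."""
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    mins = [float(item.get("min")) for item in items]
    maxes = [float(item.get("max")) for item in items]
    # The XPath comparison is true as soon as ANY (min, max) pair satisfies >=,
    # so the overall check passes only if no such pair exists.
    return not any(mn >= mx for mn in mins for mx in maxes)


result = all_mins_below_all_maxes(XML)
print(result)  # False: min=2 (second item) is not smaller than max=2 (third item)
```

Every @min is compared against every @max, not only against the @max on the same item, which is why the sample input fails.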

You haven't expressed your requirements very clearly, but it sounds to me as if you want the highest @min to be less than the lowest @max, which would be
max(item/@min) lt min(item/@max)

Related

Xpath query expression: summing up a attribute over a condition

<Cities>
<city>
<name />
<country />
<population asof = "2019" />
<total> 2918695</total>
<Average_age> 28 </Average_age>
</city>
<city>
<name />
<country />
<population asof = "2020" />
<total> 78805467 </total>
<Average_age> 32 </Average_age>
</city>
</Cities>
I want to build an XPath query which returns the total population of cities where asof is higher than 2018.
Try this XPath-1.0 expression:
sum(/Cities/city[population/@asof > 2018]/total)
Or, another, less specific, version:
sum(//city[population/@asof > 2018]/total)
The expression to select each population element whose asof attribute is greater than 2018 would be:
//population[@asof > '2018']
If you are looking for the <total> that is a sibling of <population> (which is how your XML is structured, despite the indentation), append following-sibling::total to the expression;
otherwise, if <total> were a child of <population>, use /total.
Let's follow the first approach, so the XPath continues as:
//population[@asof > '2018']/following-sibling::total
Add /text() at the end to get the text inside each matching <total> tag. Additionally, if you want the sum of the populations, put the whole expression inside the sum() function. The expression inside sum() is going to be:
//population[@asof > '2018']/following-sibling::total/text()
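As a cross-check of what the sum() expressions compute, here is a small Python sketch that does the same filtering and summing by hand with the standard library. The helper name total_population_after is an assumption made for illustration, and the sample is trimmed to the elements that matter.

```python
import xml.etree.ElementTree as ET

XML = """<Cities>
  <city>
    <population asof="2019"/>
    <total> 2918695</total>
  </city>
  <city>
    <population asof="2020"/>
    <total> 78805467 </total>
  </city>
</Cities>"""


def total_population_after(xml_text: str, year: int) -> int:
    """Hypothetical helper mirroring sum(/Cities/city[population/@asof > year]/total)."""
    root = ET.fromstring(xml_text)
    total = 0
    for city in root.findall("city"):
        population = city.find("population")
        asof = population.get("asof") if population is not None else None
        # Skip cities without a usable asof attribute.
        if asof is not None and int(asof) > year:
            # <total> is a sibling of <population>, i.e. a child of <city>.
            total += int(city.findtext("total").strip())
    return total


population_sum = total_population_after(XML, 2018)
print(population_sum)  # 81724162
```

With a threshold of 2018 both cities qualify; raising it to 2019 would leave only the second one.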

XPath 1.0 to find a sibling of an attribute node whose name is based on the attribute, but has a suffix

Given the following Xml:
<Root><Foo Bar="" Bar_Baz="12" /></Root>
Is there an XPath statement (using version 1.0 functions only) that can return Root/Foo/@Bar where there exists some sibling attribute starting with Bar (determined by context), and ending in _Baz, where that node has the value 12?
Bar should be anonymous - the XPath shouldn't care what it's called - but whatever it is called, if it is returned or not should be determined by whether X_Baz exists, and has the value of 12.
I was looking into something like:
//@*[sibling::@*[concat(local-name(), '_Baz') = '12']
But fairly obviously, this would just compare the text Bar_Baz to 12, not the value of that sibling attribute.
I'm making use of this using the .Net XmlDocument class, meaning I'm limited to Microsoft's XPath 1.0 implementation, so please don't make use of subsequent versions of the spec!
EDIT: Per the comment requesting a more diverse set of examples, see below:
<Root>
<Item Foo="" Foo_Baz="12">Yes - @Foo_Baz is 12, and @Foo exists</Item>
<Item Bar="" Bar_Baz="12">Yes - @Bar_Baz is 12, and @Bar exists</Item>
<Item Foo="" Foo_Baz="1">No - Foo_Baz != 12</Item>
<Item Baz="" Foo_Baz="12">No - No @Foo to return</Item>
<Item Foo_Baz="12">No - No @Foo to return</Item>
<Item Foo="" Foo_Haz="12">No - No @Foo_Baz node to check the value of</Item>
</Root>
Edit 2:
Looking at the first couple of answers proposed, I think there is something I haven't been clear on: the names, Foo or Bar, are unknown. The only things that are known are:
There are one or more attributes with a suffix _Baz that has the value 12
They may have siblings whose entire name is whatever came before the suffix
If they do, then that sibling is the node I want to match, provided the _Baz attribute has the value of 12
Another option :
//item[substring-after(local-name(./@*[last()]),"_")="baz" and ./@*[last()]="12"][local-name(./@*[1])=substring-before(local-name(./@*[last()]),"_")]
Shortest form :
//item[@foo or @bar][@bar_baz="12" or @foo_baz="12"]
EDIT: Massive and horrible XPath here, but it should work. It supports up to 5 attributes per item, regardless of the position of these attributes inside each item tag.
//item[contains(local-name(@*[1]),"_baz") and @*[1]=12][local-name(@*[1])=substring-before(local-name(@*[1]),"_")]|//item[contains(local-name(@*[1]),"_baz") and @*[1]=12][local-name(@*[3])=substring-before(local-name(@*[1]),"_")]|//item[contains(local-name(@*[1]),"_baz") and @*[1]=12][local-name(@*[4])=substring-before(local-name(@*[1]),"_")]|//item[contains(local-name(@*[1]),"_baz") and @*[1]=12][local-name(@*[5])=substring-before(local-name(@*[1]),"_")]|//item[contains(local-name(@*[2]),"_baz") and @*[2]=12][local-name(@*[1])=substring-before(local-name(@*[2]),"_")]|//item[contains(local-name(@*[2]),"_baz") and @*[2]=12][local-name(@*[3])=substring-before(local-name(@*[2]),"_")]|//item[contains(local-name(@*[2]),"_baz") and @*[2]=12][local-name(@*[4])=substring-before(local-name(@*[2]),"_")]|//item[contains(local-name(@*[2]),"_baz") and @*[2]=12][local-name(@*[5])=substring-before(local-name(@*[2]),"_")]|//item[contains(local-name(@*[3]),"_baz") and @*[3]=12][local-name(@*[1])=substring-before(local-name(@*[3]),"_")]|//item[contains(local-name(@*[3]),"_baz") and @*[3]=12][local-name(@*[3])=substring-before(local-name(@*[3]),"_")]|//item[contains(local-name(@*[3]),"_baz") and @*[3]=12][local-name(@*[4])=substring-before(local-name(@*[3]),"_")]|//item[contains(local-name(@*[3]),"_baz") and @*[3]=12][local-name(@*[5])=substring-before(local-name(@*[3]),"_")]|//item[contains(local-name(@*[4]),"_baz") and @*[4]=12][local-name(@*[1])=substring-before(local-name(@*[4]),"_")]|//item[contains(local-name(@*[4]),"_baz") and @*[4]=12][local-name(@*[3])=substring-before(local-name(@*[4]),"_")]|//item[contains(local-name(@*[4]),"_baz") and @*[4]=12][local-name(@*[4])=substring-before(local-name(@*[4]),"_")]|//item[contains(local-name(@*[4]),"_baz") and @*[4]=12][local-name(@*[5])=substring-before(local-name(@*[4]),"_")]|//item[contains(local-name(@*[5]),"_baz") and @*[5]=12][local-name(@*[1])=substring-before(local-name(@*[5]),"_")]|//item[contains(local-name(@*[5]),"_baz") 
and @*[5]=12][local-name(@*[3])=substring-before(local-name(@*[5]),"_")]|//item[contains(local-name(@*[5]),"_baz") and @*[5]=12][local-name(@*[4])=substring-before(local-name(@*[5]),"_")]|//item[contains(local-name(@*[5]),"_baz") and @*[5]=12][local-name(@*[5])=substring-before(local-name(@*[5]),"_")]
Working sample (4 nodes selected) :
Strictly in terms of xpath, this expression
//Item[attribute::*[contains(local-name(), '_Baz')]='12'][attribute::*[local-name()='Foo'] | attribute::*[local-name()='Bar']]
should get you your desired output.
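For comparison, the matching rule from the question (return attribute X only when a sibling attribute X_Baz exists with the value 12, with X unknown in advance) is straightforward once attribute names can be handled as strings in host code. The following Python sketch is not an XPath 1.0 answer; it only illustrates the rule, and the helper name matching_base_attributes and the trimmed sample are assumptions.

```python
import xml.etree.ElementTree as ET

XML = """<Root>
  <Item Foo="" Foo_Baz="12"/>
  <Item Bar="" Bar_Baz="12"/>
  <Item Foo="" Foo_Baz="1"/>
  <Item Baz="" Foo_Baz="12"/>
  <Item Foo_Baz="12"/>
  <Item Foo="" Foo_Haz="12"/>
</Root>"""

SUFFIX = "_Baz"


def matching_base_attributes(xml_text: str, wanted: str = "12"):
    """Hypothetical helper: return (item index, attribute name) for every
    attribute X that has a sibling attribute X_Baz equal to `wanted`."""
    root = ET.fromstring(xml_text)
    hits = []
    for index, element in enumerate(root.iter("Item")):
        for name, value in element.attrib.items():
            if name.endswith(SUFFIX) and value == wanted:
                base = name[: -len(SUFFIX)]  # the part before "_Baz"
                # Only report the base attribute if it actually exists.
                if base and base in element.attrib:
                    hits.append((index, base))
    return hits


matches = matching_base_attributes(XML)
print(matches)  # [(0, 'Foo'), (1, 'Bar')]
```

Only the first two items match, in line with the Yes/No annotations in the question's sample.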

xpath expression wild-cards

I have a requirement to specify wild card in the following xpath
Field[@name="/Root/Table[i]/FirstName"]
Basically the "i" would be a variable which can have either a GUID or a running number. I would like to pick up all elements that basically have the attribute pattern
"/Root/Table[*]/FirstName"
i.e. starting with "/Root/Table[" and ending with "]/FirstName". Any ideas as to how this can be done ?
Here is a sample payload:
<Package>
<Input>
<Data id="36e9f0fe3f8d4508ac20710e07cfddd4">
<Input>
<Field name="/Root/Table[1]/FirstName">Thomas</Field>
</Input>
</Data>
</Input>
</Package>
You should be able to do this using starts-with() and a makeshift ends-with() (since XPath 1.0 doesn't actually have an ends-with() function):
//*[starts-with(@name, '/Root/Table[') and
substring(@name, string-length(@name) - 11 + 1) = ']/FirstName']
Here, 11 is the length of ]/FirstName.
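The substring arithmetic can be checked outside XPath. This Python sketch applies the same prefix and suffix test to the name attribute; the helper name first_name_fields is made up, and the extra LastName field is a hypothetical addition to show a non-match.

```python
import xml.etree.ElementTree as ET

XML = """<Package>
  <Input>
    <Data id="36e9f0fe3f8d4508ac20710e07cfddd4">
      <Input>
        <Field name="/Root/Table[1]/FirstName">Thomas</Field>
        <!-- hypothetical extra field, not in the original payload -->
        <Field name="/Root/Table[1]/LastName">Smith</Field>
      </Input>
    </Data>
  </Input>
</Package>"""

PREFIX = "/Root/Table["
SUFFIX = "]/FirstName"


def first_name_fields(xml_text: str):
    """Hypothetical helper: text of every element whose name attribute
    starts with PREFIX and ends with SUFFIX."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.iter():
        name = element.get("name", "")
        # Equivalent of substring(@name, string-length(@name) - 11 + 1):
        # take the last len(SUFFIX) characters and compare.
        tail = name[len(name) - len(SUFFIX):] if len(name) >= len(SUFFIX) else ""
        if name.startswith(PREFIX) and tail == SUFFIX:
            result.append(element.text)
    return result


names = first_name_fields(XML)
print(len(SUFFIX))  # 11
print(names)        # ['Thomas']
```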

Ruby + Nokogiri + Xpath navigate Node_Set

<Item id="item0">
<Links>
<FirstLink id="link1" target="one"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content</String>
</Data>
</Item>
<Item id="item1">
<Links>
<FirstLink id="link1" target="two"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content</String>
</Data>
</Item>
I have created a Nokogiri-NodeSet with this structure, i.e. a list of items with links and data children.
How can I filter any items that don't match a certain value in the 'target'-attribute of <FirstLink>?
Actually, what I want in the end is to extract the <Data><String> content of every <Item> that matches a certain value in its <FirstLink> target attribute.
I've tried several approaches already, but I'm at a loss as to how to identify an element by an attribute of its grandchild and then extract the content of that grandchild's parent's sibling.
We can build up an XPath expression to do this. Assuming we are starting from the whole XML document, rather than the node-set you already have, something like
//Item
will select all <Item> elements (I’m guessing you already have something like that to get this node-set).
Next, to select only those <Item> elements which have <Links><FirstLink> where FirstLink has a target attribute value of one:
//Item[Links/FirstLink[@target='one']]
and finally to select the Data/String children of those nodes:
//Item[Links/FirstLink[@target='one']]/Data/String
So with Nokogiri you could use something like this (where doc is your parsed document):
doc.xpath("//Item[Links/FirstLink[@target='one']]/Data/String")
or if you want to use the node-set you already have you can use a relative expression:
nodeset.xpath("self::Item[Links/FirstLink[@target='one']]/Data/String")
I completely didn't understand what your goal is. But using a guess, I am trying to show you, how to proceed in this case :
require 'nokogiri'
doc = Nokogiri::XML <<-xml
<Item id="item0">
<Links>
<FirstLink id="link1" target="one"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content1</String>
</Data>
</Item>
<Item id="item1">
<Links>
<FirstLink id="link1" target="two"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content2</String>
</Data>
</Item>
xml
The #xpath method with the expression "//Item" selects all the Item nodes. Those Item nodes are then passed to #select, which keeps only the nodes whose FirstLink child (inside Links) has a target attribute matching the value you are looking for, "one" in this example.
node.at("./Links/FirstLink")['target'] gives you the value of the target attribute of that Item's FirstLink node: "one" for the first Item node, then "two" for the second. Note that the path is relative to the current node; a leading // would search from the document root and always return the first FirstLink in the document. The trailing ['one'] in node.at("./Links/FirstLink")['target']['one'] is a call to String#[], which returns the matched substring, or nil if there is no match.
Because String#[] also accepts a Regexp, this approach gives you the flexibility of using regular expressions too.
nodeset = doc.xpath("//Item").select do |node|
  # keep the Item only if its own FirstLink target contains "one"
  node.at("./Links/FirstLink")['target']['one']
end
Now nodeset contains only the required Item nodes. I then use #map, passing each Item node to the block to collect the content of its String node: #at with the relative expression ./Data/String selects the String node, and #text gives you its content.
nodeset.map { |n| n.at('./Data/String').text } # => ["content1"]

XPATH -- Result order defined by query

I have an xpath-expression like this:
element[@attr="a"] | element[@attr="b"] | element[@attr="c"] | … which is an »or« statement. So can I create an expression that guarantees the results appear in the order given in the query, even if the elements appear in a different order in the document?
For example, a document fragment in this order:
<doc>
<element attr="c" />
<element attr="b" />
<element attr="a" />
.
.
.
</doc>
and a result list ordered like this:
[0] <element attr="a" />
[1] <element attr="b" />
[2] <element attr="c" />
.
.
.
The | operator computes the union of its operands. With XPath 1.0 you simply get a set of nodes whose order is undefined, though most XPath APIs then return the result in document order, or allow you to say which order you want or whether order matters (see for instance http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html#XPathResult).
With XPath 2.0 the union operator gives you a sequence of nodes in document order. If you want the order of your subexpressions instead, you need to use the comma operator rather than the union operator, i.e. element[@attr="a"] , element[@attr="b"] , element[@attr="c"].
can I create an expression that guarantees the result to appear in the
order as in the query, even if the elements appear in a different
order in the document?
Not with any XPath 1.0 engine -- they return the resulting XmlNodeList in document order.
With XPath 2.0 one can specify that a sequence is to be returned, using the comma , operator, like this:
element[@attr="a"] , element[@attr="b"] , element[@attr="c"]
Finally, if you are limited to an XPath 1.0 implementation, one way of getting the results in the desired order is to evaluate these three XPath expressions separately:
element[@attr="a"]
element[@attr="b"]
element[@attr="c"]
Then process the results of the first expression first, those of the second expression second, and those of the third expression third.
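The "one query per value" workaround can be sketched in a few lines of Python with the standard library's ElementTree, which supports the [@attr='value'] predicate. The function name select_in_query_order is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<doc>
  <element attr="c"/>
  <element attr="b"/>
  <element attr="a"/>
</doc>"""


def select_in_query_order(xml_text: str, values):
    """Hypothetical helper: run one query per value and concatenate the
    results, so the output follows the order of `values`, not the document."""
    root = ET.fromstring(xml_text)
    ordered = []
    for value in values:
        # Each findall returns its own matches in document order.
        ordered.extend(root.findall(f"element[@attr='{value}']"))
    return ordered


ordered_attrs = [e.get("attr") for e in select_in_query_order(XML, ["a", "b", "c"])]
print(ordered_attrs)  # ['a', 'b', 'c']
```

Within a single value, matches still come back in document order; only the grouping follows the query.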
