XPath - Selecting elements that equal a value

In XPath, I want to select elements that equal a specific value.
Sample XML data:
<aaa id="11" >
<aaa id="21" >
<aaa id="31" ></aaa>
<bbb id="32" >
<aaa id="41" ></aaa>
<bbb id="42" ></bbb>
<ccc id="43" ></ccc>
<ddd id="44" >qwerty</ddd>
<ddd id="45" ></ddd>
<ddd id="46" ></ddd>
</bbb>
</aaa>
<bbb id="22" >
<aaa id="33" >qwerty</aaa>
<bbb id="34" ></bbb>
<ccc id="35" ></ccc>
<ddd id="36" ></ddd>
<ddd id="37" ></ddd>
<ddd id="38" ></ddd>
</bbb>
<ccc id="23" >qwerty</ccc>
<ccc id="24" ></ccc>
</aaa>
Now, using the XPath:
//ccc[.='qwerty']
I get the correct, expected results:
Name  Value
ccc   qwerty
Now, using the XPath:
//aaa[.='qwerty']
I get unexpected results:
Name  Value
aaa
aaa   qwerty
What I am particularly interested in is how to select any element with that value. Using the XPath:
//*[.='qwerty']
I get even stranger results:
Name  Value
aaa
bbb
ddd   qwerty
bbb   qwerty
aaa   qwerty
ccc   qwerty
Can someone explain these results, and how I can fix my XPath expressions to get the results I expect?

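The XPath spec defines the string value of an element as the concatenation (in document order) of all of its text-node descendants.
This explains the "strange results". Assuming the document has no whitespace-only text nodes between the tags, <aaa id="21"> has no text of its own, but its only text-node descendant (inside <ddd id="44">) is 'qwerty', so its string value is 'qwerty' and it matches //aaa[.='qwerty']. The same applies to <bbb id="32"> and <bbb id="22">, while the outer <aaa id="11"> does not match because its string value is 'qwertyqwertyqwerty'.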
"Better" results can be obtained using the expressions below:
//*[text() = 'qwerty']
The above selects every element in the document that has at least one text-node child with value 'qwerty'.
//*[text() = 'qwerty' and not(text()[2])]
The above selects every element in the document that has exactly one text-node child, whose value is 'qwerty'.
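To see the difference between comparing string values and testing text() children, here is a minimal Ruby sketch. It uses REXML, the XML library that ships with Ruby, rather than whatever tool produced the tables in the question, and it assumes the sample contains no whitespace between tags, which is consistent with the results reported there. The helper name ids_for is hypothetical.

```ruby
require 'rexml/document'

# The question's sample, written without whitespace between tags so that no
# whitespace-only text nodes become part of the string values (assumption).
xml = '<aaa id="11"><aaa id="21"><aaa id="31"></aaa>' \
      '<bbb id="32"><aaa id="41"></aaa><bbb id="42"></bbb><ccc id="43"></ccc>' \
      '<ddd id="44">qwerty</ddd><ddd id="45"></ddd><ddd id="46"></ddd></bbb></aaa>' \
      '<bbb id="22"><aaa id="33">qwerty</aaa><bbb id="34"></bbb><ccc id="35"></ccc>' \
      '<ddd id="36"></ddd><ddd id="37"></ddd><ddd id="38"></ddd></bbb>' \
      '<ccc id="23">qwerty</ccc><ccc id="24"></ccc></aaa>'
doc = REXML::Document.new(xml)

# Hypothetical helper: the id attributes of every element an expression selects.
def ids_for(doc, xpath)
  REXML::XPath.match(doc, xpath).map { |e| e.attributes['id'] }
end

# Compares each element's full string value (all descendant text).
string_value_ids = ids_for(doc, "//*[.='qwerty']")
# Compares only the element's own text-node children.
text_child_ids = ids_for(doc, "//*[text()='qwerty']")
# Additionally requires that there is no second text-node child.
single_text_ids = ids_for(doc, "//*[text()='qwerty' and not(text()[2])]")

puts "string value: #{string_value_ids.inspect}"
puts "text() child: #{text_child_ids.inspect}"
puts "single text(): #{single_text_ids.inspect}"
```

With this input the string-value test picks up the six elements from the question (ids 21, 32, 44, 22, 33 and 23), while both text() variants keep only 44, 33 and 23. The two text() variants agree here because no element in the sample has more than one text-node child. If the document is indented, whitespace text nodes become part of the string values, and the first expression can match fewer elements.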

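Try
//*[text()='qwerty']
because . is the current element, whose string value includes the text of all its descendants, whereas text() selects only the element's own text-node children.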

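Better still, use //*[normalize-space(text()) = 'qwerty']: leading and trailing whitespace around the text is removed (and internal runs of whitespace are collapsed to a single space) before the comparison. Note that in XPath 1.0, normalize-space(text()) looks only at the first text-node child of each element.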

Related

XPath query expression: summing up an attribute over a condition

<Cities>
<city>
<name />
<country />
<population asof = "2019" />
<total> 2918695</total>
<Average_age> 28 </Average_age>
</city>
<city>
<name />
<country />
<population asof = "2020" />
<total> 78805467 </total>
<Average_age> 32 </Average_age>
</city>
</Cities>
I want to build an XPath query that returns the total population of the cities whose asof attribute is greater than 2018.
Try this XPath-1.0 expression:
sum(/Cities/city[population/@asof > 2018]/total)
Or, another, less specific, version:
sum(//city[population/@asof > 2018]/total)
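Both expressions can be checked with a short Ruby sketch using REXML, the XML library that ships with Ruby (an assumption; the answer does not name a processor). The asof values are written without spaces around "=", which does not change the document.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <Cities>
    <city>
      <name/>
      <country/>
      <population asof="2019"/>
      <total> 2918695</total>
      <Average_age> 28 </Average_age>
    </city>
    <city>
      <name/>
      <country/>
      <population asof="2020"/>
      <total> 78805467 </total>
      <Average_age> 32 </Average_age>
    </city>
  </Cities>
XML

# For an expression such as sum(), XPath.first returns the computed number
# rather than a node.
specific = REXML::XPath.first(doc, "sum(/Cities/city[population/@asof > 2018]/total)")
general  = REXML::XPath.first(doc, "sum(//city[population/@asof > 2018]/total)")

puts specific.to_i
puts general.to_i
```

The spaces inside <total> do not matter: XPath 1.0 number conversion ignores leading and trailing whitespace, so both cities contribute to the sum.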
The expression to select <population> elements whose asof attribute is greater than 2018 would be:
//population[@asof > '2018']
If you are looking for <total>, which is a sibling of <population> despite your indentation, append following-sibling::total to the expression;
otherwise (if <total> were a child of <population>), append /total.
Following the first approach, the XPath continues as:
//population[@asof > '2018']/following-sibling::total
Add /text() at the end to get the text inside the selected <total> elements. If you want the sum of the populations, wrap the whole expression in the sum() function; the expression inside sum() would be:
//population[@asof > '2018']/following-sibling::total/text()
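Here is a minimal Ruby sketch of the following-sibling approach, again assuming REXML as the XPath processor. It uses the question's threshold of 2018 and, for comparison, a threshold of 2019, which keeps only the 2020 city.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <Cities>
    <city>
      <name/><country/>
      <population asof="2019"/>
      <total> 2918695</total>
      <Average_age> 28 </Average_age>
    </city>
    <city>
      <name/><country/>
      <population asof="2020"/>
      <total> 78805467 </total>
      <Average_age> 32 </Average_age>
    </city>
  </Cities>
XML

# The <total> siblings that follow each matching <population>.
sibling_totals = REXML::XPath.match(
  doc, "//population[@asof > '2018']/following-sibling::total"
).map { |t| t.text.strip }

# Sum of the text nodes inside those <total> elements.
total = REXML::XPath.first(
  doc, "sum(//population[@asof > '2018']/following-sibling::total/text())"
)
# A stricter threshold excludes the 2019 city.
only_2020 = REXML::XPath.first(
  doc, "sum(//population[@asof > '2019']/following-sibling::total/text())"
)

puts sibling_totals.inspect
puts total.to_i
puts only_2020.to_i
```

Comparing @asof with the string '2018' still works numerically: in XPath 1.0, the relational operators convert both operands to numbers.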

Nested xpath: How do I use the result of an XPath expression as value?

I have the following XML structure:
<xml>
<value>b</value>
<objects>
<object>
<value>a</value>
</object>
<object>
<value>b</value>
</object>
</objects>
</xml>
I want to select the second object, based on the <value> child of <xml>.
This XPath works:
//xml/objects/object[value = 'b']
This XPath does not return results:
//xml/objects/object[value = //xml/value/text()]
Are nested XPath expressions not supported?
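They are, but a relative path inside a predicate is always evaluated relative to the current context node, which here is each <object/>.
A relative path such as xml/value therefore looks for an <xml/> element that is a child of <object/>, and as there is none it yields an empty result set. (A path that starts with // is absolute in XPath 1.0 and is evaluated from the document root, even inside a predicate.)
Using .. (or parent::*) you can step up one level along the parent axis and select the required value relative to the current <object/>: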
//xml/objects/object[value = ../../value]
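A minimal Ruby sketch of the relative-path answer, using REXML (the XML library that ships with Ruby) instead of whatever processor the question used:

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <xml>
    <value>b</value>
    <objects>
      <object>
        <value>a</value>
      </object>
      <object>
        <value>b</value>
      </object>
    </objects>
  </xml>
XML

# Inside the predicate the context node is each <object>, so ../../value
# climbs object -> objects -> xml and picks the top-level <value>.
matches = REXML::XPath.match(doc, "//xml/objects/object[value = ../../value]")

matches.each { |obj| puts obj.elements['value'].text }
```

Only the second <object> is selected, because its <value> has the same string value ("b") as the top-level <value>.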

Ruby + Nokogiri + XPath navigate NodeSet

<Item id="item0">
<Links>
<FirstLink id="link1" target="one"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content</String>
</Data>
</Item>
<Item id="item1">
<Links>
<FirstLink id="link1" target="two"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content</String>
</Data>
</Item>
I have created a Nokogiri NodeSet with this structure, i.e. a list of items with Links and Data children.
How can I filter out any items that don't match a certain value in the target attribute of <FirstLink>?
What I actually want in the end is to extract the <Data><String> content of every <Item> whose <FirstLink> target attribute matches a certain value.
I've tried several approaches already, but I'm at a loss as to how to identify an element by an attribute of its grandchild and then extract the content of that grandchild's parent's sibling. X(
We can build up an XPath expression to do this. Assuming we are starting from the whole XML document, rather than the node-set you already have, something like
//Item
will select all <Item> elements (I’m guessing you already have something like that to get this node-set).
Next, to select only those <Item> elements which have <Links><FirstLink> where FirstLink has a target attribute value of one:
//Item[Links/FirstLink[@target='one']]
and finally to select the Data/String children of those nodes:
//Item[Links/FirstLink[@target='one']]/Data/String
So with Nokogiri you could use something like this (where doc is your parsed document):
doc.xpath("//Item[Links/FirstLink[@target='one']]/Data/String")
or if you want to use the node-set you already have you can use a relative expression:
nodeset.xpath("self::Item[Links/FirstLink[@target='one']]/Data/String")
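The same XPath can be tried without Nokogiri using REXML, the XML library that ships with Ruby. This sketch assumes a hypothetical <Items> root element around the two items, because a well-formed XML document needs a single root.

```ruby
require 'rexml/document'

# Hypothetical <Items> wrapper around the question's two <Item> elements.
doc = REXML::Document.new(<<~XML)
  <Items>
    <Item id="item0">
      <Links>
        <FirstLink id="link1" target="one"/>
        <SecondLink id="link2" target="two"/>
      </Links>
      <Data>
        <String>content</String>
      </Data>
    </Item>
    <Item id="item1">
      <Links>
        <FirstLink id="link1" target="two"/>
        <SecondLink id="link2" target="two"/>
      </Links>
      <Data>
        <String>content</String>
      </Data>
    </Item>
  </Items>
XML

strings = REXML::XPath.match(doc, "//Item[Links/FirstLink[@target='one']]/Data/String")

strings.each do |s|
  # s.parent is <Data>; s.parent.parent is the <Item> that matched.
  puts "#{s.parent.parent.attributes['id']}: #{s.text}"
end
```

Only item0 qualifies, since its FirstLink is the only one with target="one".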
I didn't completely understand your goal, but here is a guess at how to proceed:
require 'nokogiri'

# An XML document may have only one root element, so the items are wrapped in
# <Items>; otherwise the second <Item> is extra content after the root element
# and does not end up in the parsed document.
doc = Nokogiri::XML <<-xml
<Items>
<Item id="item0">
<Links>
<FirstLink id="link1" target="one"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content1</String>
</Data>
</Item>
<Item id="item1">
<Links>
<FirstLink id="link1" target="two"/>
<SecondLink id="link2" target="two"/>
</Links>
<Data>
<String>content2</String>
</Data>
</Item>
</Items>
xml
The #xpath method with the expression "//Item" selects all the Item nodes. Those Item nodes are then passed to the #select method, which keeps only the Items whose FirstLink has a target attribute value of "one".
node.at("./Links/FirstLink")['target'] returns the value of the target attribute of that Item's own FirstLink: "one" for the first Item and "two" for the second. The path must be relative (./); an absolute path such as //Links/FirstLink is searched from the document root and always returns the first FirstLink in the document. The trailing [/\Aone\z/] is a call to the String#[] method, which returns the matched text or nil.
This approach also gives you the flexibility of using a regular expression.
nodeset = doc.xpath("//Item").select do |node|
  # keep the Item only if its own FirstLink's target matches the pattern
  node.at("./Links/FirstLink")['target'][/\Aone\z/]
end
Now nodeset contains only the required Item nodes. #map passes each of them to the block, where #at with the relative expression ./Data/String selects that Item's String node, and #text returns its content.
nodeset.map { |n| n.at('./Data/String').text } # => ["content1"]

XPath test for ancestor attribute not equal to a string

I'm trying to test whether an attribute on an ancestor of an element does not equal a string.
Here is my XML...
<aaa att="xyz">
<bbb>
<ccc/>
</bbb>
</aaa>
<aaa att="mno">
<bbb>
<ccc/>
</bbb>
</aaa>
If I'm acting on element ccc, I'm trying to test that its grandparent aaa's @att doesn't equal "xyz".
I currently have this...
ancestor::aaa[not(contains(@att, 'xyz'))]
Thanks!
Assuming that by an ancestor of an element you mean an element with child elements, this XPath expression should do:
//*[*/ccc][@att != 'xyz']
It selects all elements that have at least one <ccc> grandchild and that have an att attribute whose value is not xyz.
Update: Restricted test to grandparents of <ccc>.
Update 2: Adapted to your revised question:
//ccc[../parent::aaa/@att != 'xyz']
This selects all <ccc> elements that have a grandparent <aaa> whose att attribute is set to a value other than xyz.
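Both expressions from this answer can be checked with a small Ruby sketch using REXML, the XML library that ships with Ruby. The <root> element is a hypothetical wrapper so that the two <aaa> elements form one well-formed document.

```ruby
require 'rexml/document'

# Hypothetical <root> wrapper around the two <aaa> elements.
doc = REXML::Document.new(<<~XML)
  <root>
    <aaa att="xyz"><bbb><ccc/></bbb></aaa>
    <aaa att="mno"><bbb><ccc/></bbb></aaa>
  </root>
XML

# Grandparents of <ccc> whose att is not 'xyz'.
grandparents = REXML::XPath.match(doc, "//*[*/ccc][@att != 'xyz']")
# <ccc> elements whose grandparent <aaa> has att not equal to 'xyz'.
cccs = REXML::XPath.match(doc, "//ccc[../parent::aaa/@att != 'xyz']")

grandparents.each { |e| puts e.attributes['att'] }
cccs.each { |c| puts c.parent.parent.attributes['att'] }
```

Note that @att != 'xyz' is false for an element with no att attribute at all, whereas not(@att = 'xyz') would be true for it; which one to use depends on how such elements should be treated.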

XPath Get first element of subset

I have XML like this:
<AAA>
<BBB aaa="111" bbb="222">
<CCC/>
<CCC xxx="555" yyy="666" zzz="777"/>
</BBB>
<BBB aaa="999">
<CCC xxx="qq"/>
<DDD xxx="ww"/>
<EEE xxx="oo"/>
</BBB>
<BBB>
<DDD xxx="oo"/>
</BBB>
</AAA>
I want to get the first <CCC> element. But with the XPath expression //*/CCC[1] I get two <CCC> elements, each of which is the first one in its <BBB></BBB> context. How do I get the first element of the whole subset?
This one should work for you:
(//*/CCC)[1]
This is a FAQ:
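The [] operator has a higher precedence (binds more tightly) than the // abbreviation, so in //*/CCC[1] the predicate applies only to the CCC step and selects every CCC that is the first CCC child of its parent.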
Use:
(//CCC)[1]
This selects the first (in document order) CCC element in the XML document.
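The precedence difference can be seen with a small Ruby sketch using REXML, the XML library that ships with Ruby. Instead of relying on REXML's handling of the parenthesized (//CCC)[1] form, the sketch takes the first node of //CCC with REXML::XPath.first, which returns the same element: the first CCC in document order.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <AAA>
    <BBB aaa="111" bbb="222">
      <CCC/>
      <CCC xxx="555" yyy="666" zzz="777"/>
    </BBB>
    <BBB aaa="999">
      <CCC xxx="qq"/>
      <DDD xxx="ww"/>
      <EEE xxx="oo"/>
    </BBB>
    <BBB>
      <DDD xxx="oo"/>
    </BBB>
  </AAA>
XML

# [1] binds to the CCC step: the first CCC child of each parent.
per_parent = REXML::XPath.match(doc, "//*/CCC[1]")

# The first CCC in the whole document, playing the role of (//CCC)[1].
first_overall = REXML::XPath.first(doc, "//CCC")

puts per_parent.size
puts first_overall.parent.attributes['aaa']
```

The first expression returns one CCC from each of the first two BBB elements, while the second yields only the attribute-less CCC inside <BBB aaa="111">.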
