Get the non-empty element using XPath

I have the following XML
<?xml version = "1.0" encoding = "UTF-8"?>
<root>
<group>
<p1></p1>
</group>
<group>
<p1>value1</p1>
</group>
<group>
<p1></p1>
</group>
</root>
Is it possible to get the last node with a value? In this case, that means getting the value of the second group/p1.

This xpath should work as well:
//group/p1[string-length(text()) > 0]

How about something like /root/group/p1[text() and not(../following-sibling::group/p1/text())]
In other words: get the p1 elements that have text and whose group parents are not followed by group nodes that have non-empty p1 elements.

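You may also use the node() test in a predicate. Note that [not(node())] selects the empty elements; drop the not() to select the non-empty ones.
Example: //group/p1[node()]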

It actually can be simplified as below:
//group/p1[string-length() > 0] => element text is non-empty
//group/p1[string-length() = 6] => element text has length 6
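
For readers who want to check the "last non-empty p1" logic outside an XPath engine, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree, which does not support string-length(), so the emptiness test is done in Python rather than in the path expression. The names XML, non_empty and last_value are illustrative only.

```python
import xml.etree.ElementTree as ET

# Same structure as the XML in the question.
XML = """<root>
<group><p1></p1></group>
<group><p1>value1</p1></group>
<group><p1></p1></group>
</root>"""

root = ET.fromstring(XML)

# ElementTree's path language has no string-length(), so filter on .text here.
non_empty = [p1 for p1 in root.findall("./group/p1") if (p1.text or "").strip()]

# The last non-empty p1, or None when every p1 is empty.
last_value = non_empty[-1].text if non_empty else None
print(last_value)
```

With a full XPath 1.0 engine, the predicate-based expressions above do the same job in a single query.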

Related

Get the position of an element with specific attribute value

I'm trying to get with xPath the position only of the first element which has the attribute value true.
<?xml version="1.0" encoding="UTF-8"?>
<elements>
<element attribute="false"/>
<element attribute="true"/>
<element attribute="true"/>
</elements>
What I have so far is:
head(/elements/element[@attribute='true']/position())
Result:
1
But it should be:
2
What am I doing wrong?
position() returns the position of the element in the node list created by the predicate, i.e. with the false element excluded. Instead of position(), you can e.g. count the number of preceding elements.
For example, this works even in XPath 1.0:
1+count(/elements/element[@attribute="true"][1]/preceding-sibling::element)
I think it's (with XPath 3):
head(index-of(/elements/element/@attribute, 'true'))
saxon-lint --xpath 'count(//element[@attribute="true"]/position())' file.xml
From Michael's answer:
saxon-lint --xpath 'head(index-of(/elements/element/@attribute, "true"))' file.xml
Output
2
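
The "count the preceding siblings" idea can be sketched in Python as well. This is only an illustration using the standard library's xml.etree.ElementTree (which has no position() or count()), so the 1-based index is computed with enumerate; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

XML = """<elements>
<element attribute="false"/>
<element attribute="true"/>
<element attribute="true"/>
</elements>"""

root = ET.fromstring(XML)
elements = root.findall("./element")

# 1-based position among ALL <element> siblings, which mirrors
# 1 + count(preceding-sibling::element) for the first match.
first_true = next(
    (i for i, el in enumerate(elements, start=1) if el.get("attribute") == "true"),
    None,  # no element has attribute="true"
)
print(first_true)
```

Indexing over the unfiltered list is the important part: filtering first and then asking for the position reproduces the original mistake.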

xpath return default value ,if value of attribute not found using text()

sample_xml='<employees>\
<person id="p1">\
<name value="Alice">ALICE</name>\
</person>\
<person id="p2">\
<name value="Alice">BOB</name>\
</person>\
<person id="p3">\
<name value="Alice"/>\
</person>\
</employees>'
data = [
[f'{sample_xml}']
]
df = spark.createDataFrame(data, ['data'])
df=df.selectExpr(
'xpath(data,"/employees/person/name[@value=\'Alice\']/text()") test'
)
This gives the expected ["ALICE", "BOB"].
Problem:
I want my result to be ["ALICE", "BOB", "NA"],
i.e. for an empty element like the one below
<name value="Alice"/>
I want to return a default NA.
Is it possible to achieve this?
Regards
With XPath itself this is not possible. It can only return you the actual values of the matching nodes or nothing if no match.
In order to get NA or any other data that is not actually contained in the XML, you should wrap the basic XPath request with some additional, external code to return the customized output in case of no match.
In XPath 2.0, use /employees/person/name[@value='Alice']/(text()/string(), 'NA')[1] (text()/string() yields an empty sequence for the empty element, so 'NA' becomes the first item).
It can't be done in XPath 1.0. In XPath 1.0 there's no such thing as a sequence of strings; you can only return a sequence of nodes, and you can only return nodes that are actually present in the input document.
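
The "wrap the XPath request in external code" suggestion could look roughly like the sketch below. It is plain Python with the standard library's xml.etree.ElementTree, not Spark code; names_with_default is a hypothetical helper invented for this example, and in a Spark job something like it would presumably have to be applied through a UDF or similar.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<employees>
<person id="p1"><name value="Alice">ALICE</name></person>
<person id="p2"><name value="Alice">BOB</name></person>
<person id="p3"><name value="Alice"/></person>
</employees>"""


def names_with_default(xml_text, default="NA"):
    """Return the text of each matching <name>, or `default` when it is empty."""
    root = ET.fromstring(xml_text)
    # Select the elements themselves (not their text nodes), so that empty
    # elements still show up in the result and can be given a default.
    names = root.findall("./person/name[@value='Alice']")
    return [(n.text or "").strip() or default for n in names]


result = names_with_default(SAMPLE_XML)
print(result)
```

Selecting the name elements rather than name/text() is what keeps the third person in the result.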

Xpath query expression: summing up a attribute over a condition

<Cities>
<city>
<name />
<country />
<population asof = "2019" />
<total> 2918695</total>
<Average_age> 28 </Average_age>
</city>
<city>
<name />
<country />
<population asof = "2020" />
<total> 78805467 </total>
<Average_age> 32 </Average_age>
</city>
</Cities>
I want to build an XPath query which returns the total population of the cities where asof is higher than 2018.
Try this XPath-1.0 expression:
sum(/Cities/city[population/@asof > 2018]/total)
Or, another, less specific, version:
sum(//city[population/@asof > 2018]/total)
The expression to select the population elements whose asof attribute is greater than 2018 would be:
//population[@asof > '2018']
If you are looking for the <total> that is a sibling of <population> (as it is here, despite your indentation), append following-sibling::total to the expression;
otherwise use /total.
Following the first approach, the XPath continues as:
//population[@asof > '2018']/following-sibling::total
Add /text() at the end to get the text inside the desired <total> element. Additionally, if you want the sum of the populations, you can put the whole expression inside the sum() function. The expression inside sum() would then be:
//population[@asof > '2018']/following-sibling::total/text()
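
To sanity-check the expected sum without an XPath engine that supports sum(), here is a small Python sketch. It assumes the standard library's xml.etree.ElementTree, so the numeric comparison and the summation are done in Python; the XML is the sample from the question with the whitespace compacted.

```python
import xml.etree.ElementTree as ET

XML = """<Cities>
<city><name/><country/><population asof="2019"/><total> 2918695</total><Average_age> 28 </Average_age></city>
<city><name/><country/><population asof="2020"/><total> 78805467 </total><Average_age> 32 </Average_age></city>
</Cities>"""

root = ET.fromstring(XML)

# Equivalent in spirit to sum(/Cities/city[population/@asof > 2018]/total):
# keep the cities whose population/@asof exceeds 2018, then add up <total>.
total = sum(
    int(city.findtext("total").strip())  # strip the padding around the number
    for city in root.findall("./city")
    if int(city.find("population").get("asof")) > 2018
)
print(total)
```

Both sample cities satisfy the condition, so the result is simply the sum of the two totals.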

reading value actual from xml path

I have the following xml structure:
<?xml version="1.0" encoding="UTF-8"?>
<sql>
<Assoc name="sql">
<RecArray name="contents">
<Record name="contents">
<String name="PackType"><value actual="P" /></String>
<String name="SerialNumber"><value actual="0002" /></String>
<String name="VersionNumber"><value actual="02" /></String>
</Record>
</RecArray>
</Assoc>
</sql>
How can I get the values of each of the String nodes? For example, I need to know the value inside the "SerialNumber" node.
Regards,
If you want to get all <value> elements inside each <String> element, you can try this XPath query:
/sql/Assoc/RecArray/Record/String/value
A precise path is better performance-wise. If you're looking for a simpler query, this will also work:
//String/value
Or, if by "values of each of the String nodes" you mean the value of the actual attribute, you can do it this way:
/sql/Assoc/RecArray/Record/String/value/@actual
Finally, if none of the above meets your requirement, please update the question and provide the expected output for the sample XML posted.
I figured it out.
As there are multiple String elements (that was clear in the question), I should use the following:
/sql/Assoc/RecArray/Record/String[2]/value/@actual
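
As an alternative to the positional String[2], the element can also be selected by its name attribute, e.g. /sql/Assoc/RecArray/Record/String[@name='SerialNumber']/value/@actual, which keeps working if the order of the String elements changes. A minimal Python sketch of that idea follows, assuming the standard library's xml.etree.ElementTree; since its path language cannot return attribute nodes, the attribute is read with .get().

```python
import xml.etree.ElementTree as ET

XML = """<sql>
<Assoc name="sql">
<RecArray name="contents">
<Record name="contents">
<String name="PackType"><value actual="P"/></String>
<String name="SerialNumber"><value actual="0002"/></String>
<String name="VersionNumber"><value actual="02"/></String>
</Record>
</RecArray>
</Assoc>
</sql>"""

root = ET.fromstring(XML)  # root is the <sql> element

# Select <value> under the <String> whose name attribute is "SerialNumber".
value = root.find("./Assoc/RecArray/Record/String[@name='SerialNumber']/value")

# Read the attribute in Python; None if the element was not found.
serial = value.get("actual") if value is not None else None
print(serial)
```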

counting elements in xml with Nokogiri

I'd like to understand why count gives me 5?
If I'm at the root element and I want to know my children, it is supposed to give me 2.
doc = Nokogiri::XML(open('link..to....element.xml'))
root = doc.root.children.count
puts root
<element>
<name>Married with Children</name>
<name>Married with Children</name>
</element>
You get 5 as the result because there are five child nodes under the root <element> node. There are two <name> nodes and three text nodes that each consist of whitespace: one between the opening <element> and the first <name>, one between the two <name> elements, and one between the second <name> and the closing </element>:
doc.root.children.each do |c|
p c
end
output:
#<Nokogiri::XML::Text:0x80544a04 "\n ">
#<Nokogiri::XML::Element:0x80544900 name="name" children=[#<Nokogiri::XML::Text:0x8054470c "Married with Children">]>
#<Nokogiri::XML::Text:0x80544554 "\n ">
#<Nokogiri::XML::Element:0x80544478 name="name" children=[#<Nokogiri::XML::Text:0x80544284 "Married with Children">]>
#<Nokogiri::XML::Text:0x805440cc "\n">
If you use the noblanks option when parsing, Nokogiri won't include these whitespace nodes:
doc = Nokogiri::XML(open('link..to....element.xml')) { |c| c.noblanks }
Now doc.root.children.count will equal 2; only the two <name> element nodes will be included.
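
The same whitespace-node effect can be reproduced outside Nokogiri. The sketch below uses Python's standard xml.dom.minidom (not Nokogiri, and not the noblanks option) purely to illustrate the difference between counting all child nodes and counting only element children; the two-space indentation in the sample is an assumption.

```python
from xml.dom import minidom

XML = """<element>
  <name>Married with Children</name>
  <name>Married with Children</name>
</element>"""

root = minidom.parseString(XML).documentElement

# All child nodes: whitespace text nodes are counted too.
all_children = root.childNodes

# Only element nodes: the whitespace-only text nodes are skipped.
element_children = [n for n in all_children if n.nodeType == n.ELEMENT_NODE]

print(len(all_children), len(element_children))
```

In Nokogiri itself, doc.root.element_children is another way to get only the element nodes without reparsing.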
