XPath query expression: summing up an attribute over a condition - xpath

<Cities>
<city>
<name />
<country />
<population asof = "2019" />
<total> 2918695</total>
<Average_age> 28 </Average_age>
</city>
<city>
<name />
<country />
<population asof = "2020" />
<total> 78805467 </total>
<Average_age> 32 </Average_age>
</city>
</Cities>
I want to build an XPath query that returns the total population of cities where asof is greater than 2018.

Try this XPath 1.0 expression:
sum(/Cities/city[population/@asof > 2018]/total)
Or another, less specific, version:
sum(//city[population/@asof > 2018]/total)
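For anyone who wants to cross-check the number outside an XPath engine, here is a minimal Python sketch of the same sum. It assumes the sample document from the question; Python's standard-library ElementTree does not support sum() or numeric comparisons in its XPath subset, so the filtering is done in plain Python. The function name total_population_after is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (empty <name/> and <country/> kept as-is).
XML = """
<Cities>
  <city>
    <name />
    <country />
    <population asof="2019" />
    <total> 2918695</total>
    <Average_age> 28 </Average_age>
  </city>
  <city>
    <name />
    <country />
    <population asof="2020" />
    <total> 78805467 </total>
    <Average_age> 32 </Average_age>
  </city>
</Cities>
"""


def total_population_after(root, year):
    """Sum <total> for every <city> whose <population asof="..."> exceeds year."""
    result = 0
    for city in root.findall("city"):
        population = city.find("population")
        if population is not None and int(population.get("asof", "0")) > year:
            # <total> is a sibling of <population>, i.e. a child of <city>.
            result += int(city.findtext("total").strip())
    return result


root = ET.fromstring(XML.strip())
print(total_population_after(root, 2018))
```

With the sample data both cities qualify, so the result is the sum of the two totals.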

The expression to select <population> elements with an asof attribute greater than 2018 would be:
//population[@asof > '2018']
If you are looking for the <total> that is a sibling of <population> (as it is in your document, despite the indentation), append following-sibling::total to the expression;
otherwise, if <total> were a child of <population>, use /total.
Let's follow the first approach, so the XPath continues as:
//population[@asof > '2018']/following-sibling::total
Add /text() at the end to get the text inside the desired <total> tag. Additionally, if you want the sum of the populations, you can put the whole expression inside the sum() function. The expression inside sum() would look like:
//population[@asof > '2018']/following-sibling::total/text()

Related

Get the position of an element with specific attribute value

I'm trying to use XPath to get the position of only the first element whose attribute has the value true.
<?xml version="1.0" encoding="UTF-8"?>
<elements>
<element attribute="false"/>
<element attribute="true"/>
<element attribute="true"/>
</elements>
What I have so far is:
head(/elements/element[@attribute='true']/position())
Result:
1
But it should be:
2
What am I doing wrong?
position() returns the position of the element in the node list created by the predicate, i.e. with the false element excluded. Instead of position(), you can, for example, count the number of preceding elements.
For example, this works even in XPath 1.0:
1+count(/elements/element[@attribute="true"][1]/preceding-sibling::element)
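To make the difference concrete, here is a rough Python sketch of what the counting approach computes: the 1-based position of the first matching element among all its <element> siblings, not among the filtered ones. It assumes the sample document from the question; first_true_position is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<elements>
<element attribute="false"/>
<element attribute="true"/>
<element attribute="true"/>
</elements>"""


def first_true_position(root):
    """1-based position of the first <element> with attribute="true", or None."""
    # Enumerate ALL <element> children, so excluded ones still count.
    for position, element in enumerate(root.findall("element"), start=1):
        if element.get("attribute") == "true":
            return position
    return None


root = ET.fromstring(XML)
print(first_true_position(root))  # 2 for the sample document
```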
I think it's (with XPath 3):
head(index-of(/elements/element/@attribute, 'true'))
saxon-lint --xpath 'count(//element[@attribute="true"]/position())' file.xml
From Michael's answer:
saxon-lint --xpath 'head(index-of(/elements/element/@attribute, "true"))' file.xml
Output
2

Self axis in xslt

<element>
<bye>do not delete me</bye>
<hello>do not delete me</hello>
<hello>delete me</hello>
<hello>delete me</hello>
</element>
Applied to the above xml, this deletes all the nodes except the first hello child of /element:
<xsl:template match="hello[not(current() = parent::element/hello[1])]" />
Why don't these work? (assuming the first node is not a text node)
<xsl:template match="hello[not(self::hello/position() = 1)]" />
<xsl:template match="hello[not(./position() = 1)]" />
Or this one?
<xsl:template match="hello[not(self::hello[1])]" />
What is the self axis selecting? Why isn't this last example equivalent to not(hello[1])?
First, you are wrong when you say that:
This deletes all the nodes except the first hello child of /element
The truth is that it deletes (if that's the correct word) any hello child of /element whose value is not the same as the value of the first one of these. For example, given:
XML
<element>
<hello>a</hello>
<hello>b</hello>
<hello>c</hello>
<hello>a</hello>
</element>
the template:
<xsl:template match="hello[not(current() = parent::element/hello[1])]" />
will match the second and the third hello nodes - but not the first or the fourth.
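If it helps, the value comparison can be mimicked in Python. This is only a sketch of the comparison logic, not of XSLT template matching; it assumes the four-element sample just shown, and positions_differing_from_first is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The four-element sample from this answer.
XML = """<element>
<hello>a</hello>
<hello>b</hello>
<hello>c</hello>
<hello>a</hello>
</element>"""


def positions_differing_from_first(root):
    """1-based positions of <hello> children whose text differs from the first one."""
    hellos = root.findall("hello")
    first_value = hellos[0].text
    # The comparison is on string value, not on node identity,
    # which is why the fourth <hello> is not selected.
    return [i for i, hello in enumerate(hellos, start=1) if hello.text != first_value]


root = ET.fromstring(XML)
print(positions_differing_from_first(root))
```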
Now, with regard to your question: in XSLT 1.0, position() is not a valid location step - so this:
<xsl:template match="hello[not(self::hello/position() = 1)]" />
should return an error.
In XSLT 2.0, the pattern hello[not(self::hello/position() = 1)] will not match any hello element - because there is only one node on the self axis, and therefore its position is always 1.
Similarly:
<xsl:template match="hello[not(./position() = 1)]" />
is invalid in XSLT 1.0.
In XSLT 2.0, ./position() will always return 1 for the same reason as before: . is short for self::node() and there is only one such node.
Finally, this template:
<xsl:template match="hello[not(self::hello[1])]" />
is looking for a node that doesn't have (the first instance of) itself. Of course, no such node can exist.
Using position() on the RHS of the "/" operator is never useful -- and in XSLT 1.0, which is the tag on your question, it's not actually permitted.
In XSLT 2.0, the result of the expression X/position() is a sequence of integers 1..count(X). If the LHS is a singleton, like self::E, then count(X) is one so the result is a single integer 1.

xpath expression wild-cards

I have a requirement to specify a wildcard in the following XPath:
Field[@name="/Root/Table[i]/FirstName"]
Basically, the "i" is a variable that can hold either a GUID or a running number. I would like to pick up all elements whose attribute matches the pattern
"/Root/Table[*]/FirstName"
i.e. starting with "/Root/Table[" and ending with "]/FirstName". Any ideas as to how this can be done?
Here is a sample payload:
<Package>
<Input>
<Data id="36e9f0fe3f8d4508ac20710e07cfddd4">
<Input>
<Field name="/Root/Table[1]/FirstName">Thomas</Field>
</Input>
</Data>
</Input>
</Package>
You should be able to do this using starts-with() and a makeshift ends-with() (since XPath 1.0 doesn't actually have an ends-with() function):
//*[starts-with(@name, '/Root/Table[') and
substring(@name, string-length(@name) - 11 + 1) = ']/FirstName']
Here, 11 is the length of ]/FirstName.
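The same prefix/suffix test can be sketched in Python for comparison. This assumes the sample payload from the question and uses the standard-library ElementTree; matching_fields, PREFIX and SUFFIX are names made up for the illustration. Note that XPath's substring() is 1-based, which is where the "+ 1" in the expression above comes from, whereas Python's endswith() hides that arithmetic.

```python
import xml.etree.ElementTree as ET

# Sample payload from the question.
XML = """<Package>
  <Input>
    <Data id="36e9f0fe3f8d4508ac20710e07cfddd4">
      <Input>
        <Field name="/Root/Table[1]/FirstName">Thomas</Field>
      </Input>
    </Data>
  </Input>
</Package>"""

PREFIX = "/Root/Table["
SUFFIX = "]/FirstName"  # 11 characters, matching the constant in the XPath


def matching_fields(root):
    """Elements whose name attribute starts with PREFIX and ends with SUFFIX."""
    matches = []
    for element in root.iter():
        # A missing attribute behaves like an empty string and never matches.
        name = element.get("name", "")
        if name.startswith(PREFIX) and name.endswith(SUFFIX):
            matches.append(element)
    return matches


root = ET.fromstring(XML)
print([field.text for field in matching_fields(root)])
```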

Does xpath support "or" function

In the case below, the two elements do not appear at the same time:
<a title='a' />
<b title='b' />
I want to check whether either of them is present.
Does XPath support an 'or' function? I just want to write it in one line:
//a[@title='a'] or .. @title='b' ??
XPath Operators
Select either matching node (your case here):
//a[@title='a'] | //b[@title='b']
Select an <a> element with either matching attribute value:
//a[@title='a' or @title='b']
If you want to match either <a/> elements with a title='a' attribute or <b/> elements with a title='b' attribute, you can also match all elements and perform a test on their name:
//*[local-name(.) = 'a' and @title='a' or local-name(.) = 'b' and @title='b']
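As a rough illustration of the "either one is present" check, here is a Python sketch using the standard-library ElementTree, whose XPath subset has no | operator, so the two queries are run separately and concatenated. The <root> wrapper and the choice to include only <b> are assumptions made for the example, and find_either is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper element; the question shows only the two elements,
# and says they do not appear together, so only <b> is present here.
XML = "<root><b title='b' /></root>"


def find_either(root):
    """Results of //a[@title='a'] followed by those of //b[@title='b']."""
    # The two queries select different tag names, so no node appears twice.
    return root.findall(".//a[@title='a']") + root.findall(".//b[@title='b']")


root = ET.fromstring(XML)
matches = find_either(root)
print(bool(matches), [match.tag for match in matches])
```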

XPATH -- Result order defined by query

I have an xpath-expression like this:
element[@attr="a"] | element[@attr="b"] | element[@attr="c"] | …
which is an »or« statement. Can I create an expression that guarantees the results appear in the order given in the query, even if the elements appear in a different order in the document?
For example, a document fragment in this order:
<doc>
<element attr="c" />
<element attr="b" />
<element attr="a" />
.
.
.
</doc>
and a result list ordered like this:
[0] <element attr="a" />
[1] <element attr="b" />
[2] <element attr="c" />
.
.
.
The | operator computes the union of its operands. With XPath 1.0 you simply get a set of nodes whose order is undefined, though most XPath APIs then return the result in document order, or let you say which order you want or whether order matters (see for instance http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html#XPathResult).
With XPath 2.0 you get a sequence of nodes in document order. If you want the order of your subexpressions instead, you need to use the comma operator rather than the union operator, i.e. element[@attr="a"] , element[@attr="b"] , element[@attr="c"].
can I create an expression that guarantees the result to appear in the
order as in the query, even if the elements appear in a different
order in the document?
Not with any XPath 1.0 engine -- they return the resulting XmlNodeList in document order.
With XPath 2.0 one can specify that a sequence is to be returned, using the comma , operator, like this:
element[@attr="a"] , element[@attr="b"] , element[@attr="c"]
Finally, if you are limited to an XPath 1.0 implementation, one way of getting the results in the desired order is to evaluate these three XPath expressions separately:
element[@attr="a"]
element[@attr="b"]
element[@attr="c"]
Then you can access the results of the first expression first, those of the second next, and those of the third last.
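The "evaluate separately, then concatenate" idea might look like this in Python. It is a sketch only, assuming the three-element fragment from the question and the standard-library ElementTree; in_query_order is a made-up name, and the attribute values are interpolated naively, so values containing quotes would need extra handling.

```python
import xml.etree.ElementTree as ET

# Document fragment from the question, with elements in reverse order.
XML = """<doc>
<element attr="c" />
<element attr="b" />
<element attr="a" />
</doc>"""


def in_query_order(root, values):
    """Run one query per value and keep the results in the order of values."""
    ordered = []
    for value in values:
        # Each query on its own returns nodes in document order;
        # the outer loop imposes the order requested by the caller.
        ordered.extend(root.findall(f"element[@attr='{value}']"))
    return ordered


root = ET.fromstring(XML)
result = in_query_order(root, ["a", "b", "c"])
print([element.get("attr") for element in result])
```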

Resources