Get the position of an element with specific attribute value - xpath

I'm trying to use XPath to get the position of only the first element whose attribute has the value true.
<?xml version="1.0" encoding="UTF-8"?>
<elements>
<element attribute="false"/>
<element attribute="true"/>
<element attribute="true"/>
</elements>
What I have so far is:
head(/elements/element[@attribute='true']/position())
Result:
1
But it should be:
2
What am I doing wrong?

position() returns the position of the element in the node list created by the predicate, i.e. with the false element excluded. Instead of position(), you can e.g. count the number of preceding elements.
For example, this works even in XPath 1.0:
1+count(/elements/element[@attribute="true"][1]/preceding-sibling::element)
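The same "count what comes before the first match" idea can also be sketched in a host language. The snippet below is only an illustration using Python's standard xml.etree.ElementTree module; the function name first_true_position is hypothetical and not part of any answer above.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (XML declaration omitted for brevity).
XML = """<elements>
  <element attribute="false"/>
  <element attribute="true"/>
  <element attribute="true"/>
</elements>"""


def first_true_position(xml_text):
    """Return the 1-based position, among all <element> siblings, of the
    first one whose attribute equals "true" (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Enumerate ALL siblings, not only the matching ones, so the position
    # is relative to the unfiltered list (unlike position() after a predicate).
    for position, node in enumerate(root.findall("element"), start=1):
        if node.get("attribute") == "true":
            return position
    return None  # no element matched


print(first_true_position(XML))
```

Enumerating the unfiltered sibling list is what mirrors 1 + count(preceding-sibling::element) in the XPath 1.0 expression.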

I think it's (with XPath 3):
head(index-of(/elements/element/@attribute, 'true'))

saxon-lint --xpath 'count(//element[@attribute="true"]/position())' file.xml
From Michael's answer:
saxon-lint --xpath 'head(index-of(/elements/element/@attribute, "true"))' file.xml
Output
2

Related

Xpath - Find specific element, print all elements of that node

Given the following Xpath to an element
/std:Batch/BatchSection/ContractPartner/Contractor/Contract/contractNumber
How can I print out all subelements of the node Contract
where sequenceNumber= 12345?
I tried
xmllint --xpath "string(/std:Batch/BatchSection/ContractPartner/Contractor/Contract/contractNumber[contractNumber='12345'])" test.xml
However, that is an invalid XPath expression. How to fix that?
Example input:
<std:Batch xmlns:std="http://www.test.com/contractBatch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<year>2020</year>
<batchType>3</batchType>
<runDate>2020-04-11</runDate>
<text>Datatest</text>
<jobInfo>Test</jobInfo>
<BatchSection>
<addedAtDate>2020-04-11</addedAtDate>
<ContractPartner>
<contractDealerAG>44444</contractDealerAG>
<contractorType/>
<isoCountry>NL</isoCountry>
<language>EN</language>
<Contractor>
<contractor>44444</contractor>
<Contract>
<contractor>44444</contractor>
<sequenceNumber>12345</sequenceNumber>
<info1>abcd</info1>
</Contract>
</Contractor>
</ContractPartner>
</BatchSection>
</std:Batch>
Desired output (where sequenceNumber=12345):
<Contract>
<contractor>44444</contractor>
<sequenceNumber>12345</sequenceNumber>
<info1>abcd</info1>
</Contract>
You have to deal with the dreaded namespaces, unfortunately... Try it like this:
xmllint --xpath "//*[local-name()='Contract'] [.//*[local-name()='sequenceNumber'][./text()='12345']]" test.xml
and see if it works.
I'm assuming you mean sequenceNumber, as in the XML example. If that's the case, you may need something like this to return the Contract node:
xmllint --xpath '//sequenceNumber[.="12345"]/..' test.xml
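For comparison, here is a rough sketch of the same lookup in Python with the standard xml.etree.ElementTree module. It assumes, as in the sample input, that only the root element carries the std prefix and that Contract and its children are in no namespace. The function name find_contracts and the trimmed-down sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample input; only the root is namespaced.
XML = """<std:Batch xmlns:std="http://www.test.com/contractBatch">
  <BatchSection>
    <ContractPartner>
      <Contractor>
        <contractor>44444</contractor>
        <Contract>
          <contractor>44444</contractor>
          <sequenceNumber>12345</sequenceNumber>
          <info1>abcd</info1>
        </Contract>
      </Contractor>
    </ContractPartner>
  </BatchSection>
</std:Batch>"""


def find_contracts(xml_text, sequence_number):
    """Return all Contract elements whose sequenceNumber child has the
    given text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree supports the [child='text'] predicate form.
    return root.findall(f".//Contract[sequenceNumber='{sequence_number}']")


for contract in find_contracts(XML, "12345"):
    # Serialize the whole Contract subtree, children included.
    print(ET.tostring(contract, encoding="unicode"))
```

Because the descendant elements are unprefixed and no default namespace is declared, the path needs no namespace handling here. A document that declared a default namespace would need a namespace mapping instead.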

XPath 1.0 lowest value regardless of ordering

I have this data, and I'm looking for the lowest bid.
<root>
<current_bid>$1.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$3.00</current_bid>
<current_bid>$4.00</current_bid>
<current_bid>$5.00</current_bid>
</root>
This is my XPath 1.0 attempt:
//current_bid[not(translate (., '$,.','') > translate(//current_bid, '$,.',''))]
And it works fine (returns only the $1.00 bid) with the data above, but if I change the ordering of the data to let's say this here:
<root>
<current_bid>$5.00</current_bid>
<current_bid>$1.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$3.00</current_bid>
<current_bid>$4.00</current_bid>
</root>
Then it gives a wrong output (returns all values).
Shouldn't the order be irrelevant when I use //current_bid, since it queries the whole document?
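Also: how would I go about getting the second lowest bid?
XPath 1.0 processes nodes in document order, so there's no way to sort them with pure XPath. It can be done with XSLT processing.
This approach works only if the minimum is in first position, because translate() converts a node-set argument such as //current_bid to the string value of its first node in document order, so each bid is compared only with the first one.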
Xpath:
'//current_bid[(position()<=last()) and not(translate (., "$,.","") > translate(//current_bid, "$,.",""))]'
Sample:
<root>
<current_bid>$1.00</current_bid>
<current_bid>$5.00</current_bid>
<current_bid>$2.00</current_bid>
<current_bid>$4.00</current_bid>
<current_bid>$3.00</current_bid>
</root>
Testing on command line with xmllint
xmllint --xpath '//current_bid[(position()<=last()) and not(translate (., "$,.","") > translate(//current_bid, "$,.",""))]' test.xml ; echo
Result:
<current_bid>$1.00</current_bid>
If the number of nodes is known in advance, perhaps it could be done with nested conditions, but that would give a very complex XPath expression.
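Since pure XPath 1.0 cannot sort, one way around it is to do the ordering in the host language. The sketch below uses Python's standard library rather than XPath or XSLT, so it is a substitute technique, not the approach from the answer above. The helper names bid_value and sorted_bids are made up for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The reordered sample from the question, where the lowest bid is not first.
XML = """<root>
  <current_bid>$5.00</current_bid>
  <current_bid>$1.00</current_bid>
  <current_bid>$2.00</current_bid>
  <current_bid>$3.00</current_bid>
  <current_bid>$4.00</current_bid>
</root>"""


def bid_value(node):
    """Parse text like '$1,234.50' into a Decimal (hypothetical helper)."""
    return Decimal(node.text.replace("$", "").replace(",", ""))


def sorted_bids(xml_text):
    """Return the current_bid elements ordered from lowest to highest."""
    root = ET.fromstring(xml_text)
    return sorted(root.iter("current_bid"), key=bid_value)


bids = sorted_bids(XML)
print(bids[0].text)  # lowest bid
print(bids[1].text)  # second lowest bid
```

Note that with duplicate amounts, bids[1] may equal bids[0]; deduplicating the values first would be needed if "second lowest" means the second distinct amount.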

Get the non-empty element using XPATH

I have the following XML
<?xml version = "1.0" encoding = "UTF-8"?>
<root>
<group>
<p1></p1>
</group>
<group>
<p1>value1</p1>
</group>
<group>
<p1></p1>
</group>
</root>
Is it possible to get the last node that has a value? In this case, that means getting the value of the second group/p1.
This xpath should work as well:
//group/p1[string-length(text()) > 0]
How about something like /root/group/p1[text() and not(../following-sibling::group/p1/text())]
In other words: get the p1 elements that have text and whose group parents are not followed by group nodes that have non-empty p1 elements.
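You may also use the node() test in a predicate. Note that [not(node())] matches elements that have no child nodes at all, i.e. the empty ones.
Example: //group/p1[not(node())] selects the empty p1 elements, while //group/p1[node()] selects the non-empty ones.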
It actually can be simplified as below:
//group/p1[string-length() > 0] => element text is non-empty
//group/p1[string-length() = 6] => element text has length 6
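To illustrate the "last p1 that has a value" requirement outside XPath, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The function name last_non_empty_p1 is hypothetical. Unlike the text() test above, this sketch also treats whitespace-only content as empty, which is an assumption about what "non-empty" should mean.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (XML declaration omitted).
XML = """<root>
  <group><p1></p1></group>
  <group><p1>value1</p1></group>
  <group><p1></p1></group>
</root>"""


def last_non_empty_p1(xml_text):
    """Return the last group/p1 element with non-blank text, or None."""
    root = ET.fromstring(xml_text)
    # findall returns matches in document order, so the last list item
    # is the last non-empty p1 in the document.
    non_empty = [p1 for p1 in root.findall("./group/p1")
                 if (p1.text or "").strip()]
    return non_empty[-1] if non_empty else None


node = last_non_empty_p1(XML)
print(node.text if node is not None else "no non-empty p1 found")
```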

Select text from a node and omit child nodes

I need to select the text in a node, but not any child nodes.
the xml looks like this
<a>
apples
<b><c/></b>
pears
</a>
If I select a/text(), all I get is "apples". How would I retrieve "apples pears" while omitting <b><c/></b>?
Well, the path a/text() selects all text child nodes of the a element, so the path is correct in my view. Only if you use that path with e.g. XSLT 1.0 and <xsl:value-of select="a/text()"/> will it output the string value of the first selected node. In XPath 2.0 and XQuery 1.0, string-join(a/text()/normalize-space(), ' ') yields the string apples pears, so maybe that helps with your problem. If not, then consider explaining in which context you use XPath or XQuery such that a/text() only returns the (string?) value of the first selected node.
To retrieve all the descendants I advise using the // notation. This will return all text descendants below an element. Below is an xquery snippet that gets all the descendant text nodes and formats it like Martin indicated.
xquery version "1.0";
let $a :=
<a>
apples
<b><c/></b>
pears
</a>
return normalize-space(string-join($a//text(), " "))
Or if you have your own formatting requirements you could start by looping through each text element in the following xquery.
xquery version "1.0";
let $a :=
<a>
apples
<b><c/></b>
pears
</a>
for $txt in $a//text()
return $txt
If I select a/text(), all I get is "apples". How would I retrieve "apples pears"
Just use:
normalize-space(/)
Explanation:
The string value of the root node (/) of the document is the concatenation of all its text-node descendants. Because this includes white-space-only text nodes and the line breaks around the words, normalize-space() is used to trim and collapse that unwanted white space.
Here is a small demonstration how this solution works and what it produces:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
'<xsl:value-of select="normalize-space()"/>'
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<a>
apples
<b><c/></b>
pears
</a>
the wanted, correct result is produced:
'apples pears'
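If only the text directly inside a is wanted (the equivalent of joining a/text(), ignoring anything nested in b), the idea can be sketched in Python's standard xml.etree.ElementTree as well. ElementTree has no text nodes as such; an element's direct text is stored in its own .text plus the .tail of each child. The function name own_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<a>
apples
<b><c/></b>
pears
</a>"""


def own_text(element):
    """Join the text directly under `element`, skipping text that lives
    inside child elements, with white space normalized."""
    # .text is the text before the first child; each child's .tail is the
    # text that follows that child but still belongs to the parent.
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    # split() with no arguments trims and collapses runs of white space,
    # similar in spirit to normalize-space().
    return " ".join(" ".join(parts).split())


print(own_text(ET.fromstring(XML)))
```

Unlike normalize-space(/) or a//text(), this would leave out any text placed inside b or c, which matters once those elements are no longer empty.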

XPath 1 query and attributes name

First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Yes:
This XPath expression (when node is the context (current) node):
name(@*[1])
produces the name of the first attribute (the ordering may be implementation-dependent)
and this XPath expression (when node is the context (current) node):
name(@*[2])
produces the name of the second attribute (the ordering may be implementation-dependent).
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
This XPath expression (when the attribute named "attribute1" is the context (current) node):
concat(name(), '=', .)
produces the string:
attribute1=value1
and this XPath expression (when the element named node is the context (current) node):
@*[. > 0]
selects all attributes of the context node whose value is a number greater than 0.
In XPath 2.0 one can combine them in a single XPath expression:
@*[number(.) > 0]/concat(name(.),'=',.)
to get (in this particular case) this result:
attribute1=10
If you are using XPath 1.0, which is less powerful, you'll need to embed the XPath expression in a hosting language, such as XSLT. The following XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:for-each select="@*[number(.) > 0]">
<xsl:value-of select="concat(name(.),'=',.)"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<node attribute1="10" attribute2="0" />
Produces exactly the same result:
attribute1=10
It depends a little bit on the context, I believe. In most cases, I expect you'd have to query "@*", enumerate over the items, and call "name()" - but it may work in some tests.
Re the edit - you can do:
@*[number(.)>0]
to find attributes matching your criteria, and:
concat(name(),'=',.)
to display the output. I don't think you can do both at once, though. What is the context here? XSLT? Something else?
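As an example of the "enumerate the attributes in a hosting language" route, here is a small Python sketch using the standard xml.etree.ElementTree module in place of XSLT. The function name positive_attribute_pairs is made up for this illustration, and skipping non-numeric values is an assumption that mirrors number(.) > 0 being false for NaN.

```python
import xml.etree.ElementTree as ET

# Sample element from the question.
XML = '<node attribute1="10" attribute2="0" />'


def positive_attribute_pairs(xml_text):
    """Return 'name=value' strings for attributes whose numeric value
    is greater than 0 (hypothetical helper)."""
    node = ET.fromstring(xml_text)
    pairs = []
    for name, value in node.attrib.items():
        try:
            number = float(value)
        except ValueError:
            continue  # non-numeric attribute values are skipped
        if number > 0:
            pairs.append(f"{name}={value}")
    return pairs


print(positive_attribute_pairs(XML))
```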

Resources