XMLStarlet: selecting nodes using less than / greater than - xpath

Does XMLStarlet let you use a less-than/greater-than operator to filter on an attribute value? For example, consider a document like this:
<xml>
<list>
<node name="a" val="x" />
<node name="b" val="y" />
<node name="c" val="z" />
etc.
</list>
{code}
Is there a way to select nodes whose value is greater than "x"? This XPath does not seem to work with XMLStarlet 1.5.0:
//node[#val > 'x']
Nor does this:
//node[#value gt 'x']

Comparing Characters like they were numbers (ASCII values/UniCode codepoints) is (unfortunately) impossible in XPath 1.0, look at this SO question if interested in more details.
So if your #val attributes are sorted in the XML, you can achieve this with a simple XPath expression selecting all nodes after an 'equal' match:
//node[#val='x']/following-sibling::node
If not, you'd have to use an XSLT-Stylesheet. Luckily, XMLStarlet has the ability to apply XSL-Stylesheets. I cite from their overview:
Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to stylesheets)
So you have the possibility to apply an xsl:stylesheet to achieve the desired result using xsl:sort, which is capable of sorting by characters.
<xsl:template match="/list">
<xsl:for-each select="//node"> <!-- all nodes sorted by 'val' attribute' -->
<xsl:sort select="#val" data-type="text" order="ascending" case-order="upper-first"/>
<xsl:value-of select="#name" /> <!-- or whatever output you desire -->
</xsl:for-each>
</xsl:template>

Related

Xpath 1.0 select first node back to ancestor

My XML as below :
<Query>
<Comp>
<Pers>
<Emp>
<Job>
<Code>Not selected</Code>
</Job>
</Emp>
<Emp>
<Job>
<Code>selected</Code>
</Job>
</Emp>
</Pers>
</Comp>
</Query>
I have an XPath : /Query/Comp/Pers/Emp/Job[Code='selected']/../../../..
The result should only have one < Emp > that meet condition
<Query>
<Comp>
<Pers>
<Emp>
<Job>
<Code>selected</Code>
</Job>
</Emp>
</Pers>
</Comp>
</Query>
How could I get the result?
The system doesn't work with ancestor::*. I have to use '/..' to populate the ancestor.
You shouldn't have to use ancestor here to get the <emp> tag, the following expath should select any <emp> tag that meets your criteria:
/Query/Comp/Pers/Emp[Job[Code='selected']]
Note: You say your result should be one, which will be correct in this case but this expression will return all nodes that match your criteria
Edit:
You've stated you're using XSLT and you've given me a bit of a snippet below, but I'm still not 100% sure of your actual structure. You can use the XPath to identify all the nodes that are not equal to selected, and then use XSLT to copy everything except those.
// Copy's all nodes in the input to the output
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()" />
</xsl:copy>
</xsl:template>
// Matches specifically the Emp records that are not equal to selected and
// applies no action to them to they do not appear in the output
<xsl:template match="/Query/Comp/Pers/Emp[Job[Code!='selected']]" />
The two templates above would transform your input to your desired output!

XSLT Function Return Type

Originally: **How to apply XPath query to a XML variable typed as element()* **
I wish to apply XPath queries to a variable passed to a function in XSLT 2.0.
Saxon returns this error:
Type error at char 6 in xsl:value-of/#select on line 13 column 50 of stackoverflow_test.xslt:
XTTE0780: Required item type of result of call to f:test is element(); supplied value has item type text()
This skeleton of a program is simplified but, by the end of its development, it is meant to pass an element tree to multiple XSLT functions. Each function will extract certain statistics and create reports from the tree.
When I say apply XPath queries, I mean I wish to have the query consider the base element in the variable... if you please... as if I could write {count(doc("My XSLT tree/element variable")/a[1])}.
Using Saxon HE 9.7.0.5.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="f:f">
<xsl:template match="/root">
<xsl:variable name="first" as="element()*">
<xsl:copy-of select="(./a[1])" />
</xsl:variable>
<html>
<xsl:copy-of select="f:test($first)" />
</html>
</xsl:template>
<xsl:function name="f:test" as="element()*">
<xsl:param name="frstElem" as="element()*" />
<xsl:value-of select="count($frstElem/a)" />
<!-- or any XPath expression -->
</xsl:function>
</xsl:stylesheet>
Some example data
<root>
<a>
<b>
<c>hi</c>
</b>
</a>
<a>
<b>
<c>hi</c>
</b>
</a>
</root>
Possibly related question: How to apply xpath in xsl:param on xml passed as input to xml
What you are doing is perfectly correct, except that you have passed an a element to the function, and the function is looking for an a child of this element, and with your sample data this will return an empty sequence.
If you want f:test() to return the number of a elements in the sequence that is the value of $frstElem, you can use something like
<xsl:value-of select="count($frstElem/self::a)" />
instead of using the (implicit) child:: axis.

XPath selection excluding element from a child node

I need to select all <next> nodes, but excluding <element4> from each and add a new element in its place (it would be a replace). I'm working with php.
<root>
<next>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<element4>text</element1>
</node>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<element4>text</element1>
</node>
</next>
</root>
so it should look like this:
<next>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<new>text</new>
</node>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<new><int xmlns="foo.bar">0</int></new>
</node>
</next>
Any tips? Thank you!
XPath is a selection language: it selects nodes or atomic items from an input sequence, it is the language of choice to make a selection over XML or hierarchical data, as SQL is (usually) the language of choice with relational databases.
As such, you can exclude elements from the selection, but you cannot update or change the original sequence. It is possible to do a limited transformation (i.e., turn a string in an integer), but this will change what is selected, it will not change the source. While XPath (namely version 2.0 and up) can "create" atomic values on the fly, it cannot create new elements.
This is possible and will return numeric values in XPath 2.0:
/next/node/number(.)
But this is not possible:
/next/node/(if (element4) then create-element(.) else .)
However, in XSLT 2.0 and up you can create a function that creates elements. As said above, XPath selects, and if you want to change the document, you can create a new document using XSLT (the T standing for Transformation).
Something like the following (partial XSLT 2.0, you need add headers):
<xsl:function name="f:create">
<xsl:param name="node" />
<xsl:param name="name" />
<xsl:choose>
<xsl:when test="name($node) = $name">
<xsl:element name="{if(number($node)) then 'int' else 'new'}">
<xsl:value-of select="$node" />
</xsl:element>
</xsl:when>
<xsl:otherwise><xsl:copy-of select="$node" /></xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:template match="node">
<!-- now XPath, with help of XSLT function, can conditionally create nodes -->
<xsl:copy-of select="child::*/create(., 'element4')" />
</xsl:template>
<!-- boilerplate code, typically used to recursively copy non-matched nodes -->
<xsl:template match="node() | #*">
<xsl:copy>
<xsl:apply-templates select="#* | node()" />
</xsl:copy>
</xsl:template>
Note that, while this shows how you can create a different element using XPath and an XSLT function, it does not change the source, it changes the output. Also, it is not a recommended practice, as in XSLT the same pattern is more easily done by simply doing:
<!-- the more specific match -->
<xsl:template match="element4[number(.)]">
<new>
<int xmlns="foo.bar">
<xsl:value-of select="number(.)" />
</int>
</new>
<xsl:template>
<!-- XSLT will automatically fallback to this one if the former fails -->
<xsl:template match="element4">
<new><xsl:copy-of select="node()" /></new>
</xsl:template>
<!-- or this one, if both the former fail -->
<xsl:template match="node() | #*">
<xsl:copy>
<xsl:apply-templates select="#* | node()" />
</xsl:copy>
</xsl:template>

Sorting XPath results in the same order as multiple select parameters

I have an XML document as follows:
<objects>
<object uid="0" />
<object uid="1" />
<object uid="2" />
</objects>
I can select multiple elements using the following query:
doc.xpath("//object[#uid=2 or #uid=0 or #uid=1]")
But this returns the elements in the same order they're declared in the XML document (uid=0, uid=1, uid=2) and I want the results in the same order as I perform the XPath query (uid=2, uid=0, uid=1).
I'm unsure if this is possible with XPath alone, and have looked into XSLT sorting, but I haven't found an example that explains how I could achieve this.
I'm working in Ruby with the Nokogiri library.
There is no way in XPath 1.0 to specify the order of the selected nodes.
XPath 2.0 allows a sequence of nodes with any specific order:
//object[#uid=2], //object[#uid=1]
evaluates to a sequence in which all object items with #uid=2 precede all object items with #uid=1
If one doesn't have anXPath 2.0 engine available, it is still possible to use XSLT in order to output nodes in any desired order.
In this specific case the sequence of the following XSLT instructions:
<xsl:copy-of select="//object[#uid=2]"/>
<xsl:copy-of select="//object[#uid=1]"/>
produces the desired output:
<object uid="2" /><object uid="1" />
I am assuming you are using XPath 1.0. The W3C spec says:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
* node-set (an unordered collection of nodes without duplicates)
* boolean (true or false)
* number (a floating-point number)
* string (a sequence of UCS characters)
So I don't think you can re-order simply using XPath. (The rest of the spec defines document order and reverse document order, so if the latter does what you want you can get it using the appropriate axis (e.g. preceding).
In XSLT you can use <xsl:sort> using the name() of the attribute. The XSLT FAQ is very good and you should find an answer there.
An XSLT example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="pSequence" select="'2 1'"/>
<xsl:template match="objects">
<xsl:for-each select="object[contains(concat(' ',$pSequence,' '),
concat(' ',#uid,' '))]">
<xsl:sort select="substring-before(concat(' ',$pSequence,' '),
concat(' ',#uid,' '))"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
<object uid="2" /><object uid="1" />
I don't think there is a way to do it in xpath but if you wish to switch to XSLT you can use the xsl:sort tag:
<xsl:for-each select="//object[#uid=1 or #uid=2]">
<xsl:sort: select="#uid" data-type="number" />
{insert new logic here}
</xsl:for-each>
more complete info here:
http://www.w3schools.com/xsl/el_sort.asp
This is how I'd do it in Nokogiri:
require 'nokogiri'
xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
doc = Nokogiri::XML(xml)
objects_by_uid = doc.search('//object[#uid="2" or #uid="1"]').sort_by { |n| n['uid'].to_i }.reverse
puts objects_by_uid
Running that outputs:
<object uid="2"/>
<object uid="1"/>
An alternative to the search would be:
objects_by_uid = doc.search('//object[#uid="2" or #uid="1"]').sort { |a,b| b['uid'].to_i <=> a['uid'].to_i }
if you don't like using sort_by with the reverse.
XPath is useful for locating and retrieving the nodes but often the filtering we want to do gets too convoluted in the accessor so I let the language do it, whether it's Ruby, Perl or Python. Where I put the filtering logic is based on how big the XML data set is and whether there are a lot of different uid values I'll want to grab. Sometimes letting the XPath engine do the heavy lifting makes sense, other times its easier to let XPath grab all the object nodes and filter in the calling language.

XPath 1 query and attributes name

First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
First question: is there any way to
get the name of a node's attributes?
<node attribute1="value1"
attribute2="value2" />
Yes:
This XPath expression (when node is the context (current) node)):
name(#*[1])
produces the name of the first attribute (the ordering may be implementation - dependent)
and this XPath expression (when node is the context (current) node)):
name(#*[2])
produces the name of the second attribute (the ordering may be implementation - dependent).
Second question: is there a way to get
attributes and values as value pairs?
The situation is the following:
<node attribute1="10" attribute2="0"
/>
I want to get all attributes where
value>0 and this way: "attribute1=10".
This XPath expression (when the attribute named "attribute1" is the context (current) node)):
concat(name(), '=', .)
produces the string:
attribute1=value1
and this XPath expression (when the node node is the context (current) node)):
#*[. > 0]
selects all attributes of the context node, whose value is a number, greater than 0.
In XPath 2.0 one can combine them in a single XPath expression:
#*[number(.) > 0]/concat(name(.),'=',.)
to get (in this particular case) this result:
attribute1=10
If you are using XPath 1.0, which is less powerful, you'll need to embed the XPath expression in a hosting language, such as XSLT. The following XSLT 1.0 thransformation :
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:for-each select="#*[number(.) > 0]">
<xsl:value-of select="concat(name(.),'=',.)"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<node attribute1="10" attribute2="0" />
Produces exactly the same result:
attribute1=10
It depends a little bit on the context, I believe. In most cases, I expect you'd have to query "#*", enumerate over the items, and call "name()" - but it may work in some tests.
Re the edit - you can do:
#*[number(.)>0]
to find attributes matching your criteria, and:
concat(name(),'=',.)
to display the output. I don't think you can do both at once, though. What is the context here? xslt? what?

Resources