How to traverse JSON in XSLT 3.0 when map vs array is unknown

I am working with JSON data in XSLT 3.0 for the first time and have run into an issue traversing a subset of the data, depending on whether it is in map or array form.
Map:
"book": { "id": 32, "name": "Good Book" }
Array:
"book": [{ "id": 32, "name": "Good Book" }]
If I know whether it will be a map or array ahead of time, I can select a value from the data like "name", but the syntax for each is different.
For map:
?book?name
For array:
?book?*?name
The problem is that in our data set there are multiple book nodes that come in a mix of map and array form, and using either of the above syntaxes results in an error when applied to the "wrong" form.
Am I missing a version of the selection syntax that would work on both forms?
Is there a way to test whether the book node is in map vs array form before selecting? (I've tried things like castable but it doesn't seem appropriate for this case.)
I have a workaround using try/catch:
<xsl:try select="?book?name">
<xsl:catch>
<xsl:value-of select="?book?*?name"/>
</xsl:catch>
</xsl:try>
But I'm wondering if there is a "better" method for solving this issue.
Thanks for any ideas.

The best way in XSLT to deal with variable structure is through template rules. Unfortunately, match patterns for maps and arrays aren't very expressive, but it's still a viable approach:
<xsl:apply-templates select="?book" mode="process-book"/>
<xsl:template match=".[. instance of array(*)]" mode="process-book">
<xsl:apply-templates select="?*" mode="process-book"/>
</xsl:template>
<xsl:template match=".[. instance of map(*)]" mode="process-book">
<xsl:value-of select="?name"/>
</xsl:template>
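For comparison outside XSLT, here is a minimal Ruby sketch (using the json standard library, not XSLT) of the same dispatch-on-type idea: one branch for arrays that recurses into each member, and one branch for maps (Ruby Hashes) that reads "name". The method name book_names and the second sample book are made up for illustration.

```ruby
require 'json'

# Collect every "name" under a "book" value that may be either a single
# JSON object (a Hash in Ruby) or a JSON array (an Array) of such objects.
# Mirrors the two template rules: one for arrays, one for maps.
def book_names(book)
  case book
  when Array
    # Array rule: process each member with the same dispatch
    book.flat_map { |member| book_names(member) }
  when Hash
    # Map rule: take the "name" entry, if present
    book.key?('name') ? [book['name']] : []
  else
    []
  end
end

# Hypothetical sample data: one book as a map, one wrapped in an array
data = JSON.parse('[{"book": {"id": 32, "name": "Good Book"}},
                    {"book": [{"id": 33, "name": "Other Book"}]}]')
names = data.flat_map { |entry| book_names(entry['book']) }
puts names.inspect
```

Because the array branch simply re-dispatches on each member, nested arrays of books are handled the same way, much like apply-templates re-entering the same mode.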

You can check the type of ?book e.g. with if (?book instance of array(*)) then ?book?*?name else ?book?name. The type check for a map is e.g. ?book instance of map(*).


Split methods on XPath 1.0

I'm using XPath; how can I simulate a split method?
I have read the documentation, and I know that XPath 1.0 does not have this method.
I have a document that contains these tags:
<TestCategoryModule>
<ItemCategories>
<![CDATA[Birthday Travel,Travel]]>
</ItemCategories>
</TestCategoryModule>
<TestCategoryModule2>
<ItemCategories>
<![CDATA[Travel]]>
</ItemCategories>
</TestCategoryModule2>
I want to filter items by 'ItemCategories', but when I filter by the word 'Travel', 2 items are returned. I use this filter: "ItemCategories[contains(text(), 'Travel')]".
I want filtering by "Travel" to return only the second item. How can I do it?
Use:
/*/*/*[contains(concat(',', ., ','), ',Travel,')]
Here is XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*/*[contains(concat(',', ., ','), ',Travel,')]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on this XML document (essentially the provided XML fragment, extended with one more test case and made a well-formed XML document):
<t>
<TestCategoryModule>
<ItemCategories>Birthday Travel,Travel</ItemCategories>
</TestCategoryModule>
<TestCategoryModule2>
<ItemCategories>Birthday Travel</ItemCategories>
</TestCategoryModule2>
<TestCategoryModule2>
<ItemCategories>Travel</ItemCategories>
</TestCategoryModule2>
</t>
The wanted, correct result is produced:
<ItemCategories>Birthday Travel,Travel</ItemCategories>
<ItemCategories>Travel</ItemCategories>
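The trick works by wrapping both the list and the search term in the delimiter, so "Travel" can only match a whole comma-separated token. A minimal Ruby sketch of the same check on the three test values above; has_category? is a hypothetical helper name, and unlike a normalize-space() based XPath it does no whitespace handling:

```ruby
# Whole-token match in a comma-separated list: wrap both the list and the
# search term in the delimiter, then do a plain substring test. Same idea
# as contains(concat(',', ., ','), ',Travel,') in XPath 1.0.
def has_category?(list, category)
  ",#{list},".include?(",#{category},")
end

values = ['Birthday Travel,Travel', 'Birthday Travel', 'Travel']
matches = values.select { |v| has_category?(v, 'Travel') }
puts matches.inspect
```

"Birthday Travel" is rejected because the only "Travel" in it is preceded by a space, not by a comma.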
I was a little wrong, or described the problem poorly. The problem is that the categories are stored as a string. I have three items: the first one contains the categories (Birthday Travel,Travel), the second (Birthday Travel), and the third (Travel). When I filter for the word "Travel", I need to get the first and third items, but I get all three items, because all items contain the word "Travel".
You actually don't need split() for the problem that you've described. If you want to match Travel but not Travel,Travel, you want = instead of contains(). To deal with the whitespace around your CDATA sections, wrap it in normalize-space().
All put together, try ItemCategories[normalize-space(text()) = 'Travel'].
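To see how = with normalize-space() differs from contains(), here is a small Ruby sketch. normalize_space is a hypothetical helper that approximates XPath 1.0 normalize-space() by trimming and collapsing runs of space, tab, CR and LF. With plain equality, "Birthday Travel,Travel" does not match; only a value that is exactly "Travel" after normalization does.

```ruby
# Approximation of XPath 1.0 normalize-space(): split on runs of XML
# whitespace (space, tab, CR, LF), drop empty pieces, join with one space.
def normalize_space(s)
  s.split(/[ \t\r\n]+/).reject(&:empty?).join(' ')
end

# Sample values, including surrounding whitespace like the CDATA example
values = ["\n  Travel\n", "Birthday Travel,Travel", "  Birthday   Travel  "]

# Equivalent of ItemCategories[normalize-space(text()) = 'Travel']
exact = values.select { |v| normalize_space(v) == 'Travel' }
puts exact.inspect
```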

using preceding-sibling with with xsl:sort

I'm trying to use preceding-sibling and following-sibling with a subset of records with a sort on them. The problem is that preceding / following brings back values from the original XML order:
<Salaries>
<Salary>
<Base>1000</Base>
<CreatedDate xmlns:d7p1="http://schemas.datacontract.org/2004/07/System">
<d7p1:DateTime>2016-01-09T14:38:54.8440764Z</d7p1:DateTime>
<d7p1:OffsetMinutes>0</d7p1:OffsetMinutes>
</CreatedDate>
</Salary>
<Salary>
<Base>2000</Base>
<CreatedDate xmlns:d7p1="http://schemas.datacontract.org/2004/07/System">
<d7p1:DateTime>2015-01-09T14:38:54.8440764Z</d7p1:DateTime>
<d7p1:OffsetMinutes>0</d7p1:OffsetMinutes>
</CreatedDate>
</Salary>
<Salary>
<Base>3000</Base>
<CreatedDate xmlns:d7p1="http://schemas.datacontract.org/2004/07/System">
<d7p1:DateTime>2017-01-09T14:38:54.8440764Z</d7p1:DateTime>
<d7p1:OffsetMinutes>0</d7p1:OffsetMinutes>
</CreatedDate>
</Salary>
</Salaries>
I use a sort under a for-each (Salaries/Salary), with a C# function that adds the offset minutes to the date and converts it to a long number, for example 201701010000 (to make manipulation in XSLT easier):
<xsl:sort select="number(cs:Convertdatetolong(cs:AddOffsetMinutes(substring(p:CreatedDate/d5p1:DateTime,1,19),p:CreatedDate/d5p1:OffsetMinutes)))" order="ascending"/>
The sort works perfectly and I get the records out in the following order:
2000
1000
3000
The problem comes if I use preceding-sibling / preceding (and following). I would expect the first record (2000) to have no preceding record and the last record (3000) to have no following.
However when I use the preceding / following I get the previous record and the next record from the original XML:
2000 (preceding - 1000 / following - 3000)
1000 (preceding - / following - 2000)
3000 (preceding - 2000 / following - )
I would like to be able to compare against the previous record (in the sorted order) and the current record (in the sorted order):
2000 (preceding - / following - 1000)
1000 (preceding - 2000 / following - 3000)
3000 (preceding - 1000 / following - )
I've tried preceding-sibling and preceding
<xsl:value-of select="preceding::p:Salary[1]/p:Base"/>
<xsl:value-of select="preceding-sibling::p:Salary[1]/p:Base"/>
<xsl:value-of select="preceding::p:Salary[position()=1]/p:Base"/>
(the Salary element is in a different namespace, with prefix p)
Is this actually possible, or do I have to use variables to save the previous record's data to compare against?
Any ideas gratefully received. I'm using XSLT 1.0.
Although XSLT/XPath often talks of a "sequence of nodes", it's actually more accurate to think of it as a "sequence of node references" - because, for example, the same node can appear more than once in the sequence. When you sort a sequence of node references, you don't change the individual nodes in any way, you only change the sequence. That means the nodes still exist in their original tree exactly where they were before, and their parents, siblings, and descendants are exactly as they were before.
What you want is not the preceding and following siblings of the node, but the nodes that come before and after it in the sorted sequence, which is a quite different thing.
One way to do this is to construct a new tree containing copies of the original nodes, which you get, for example, if you do
<xsl:variable name="x">
<xsl:for-each ...>
<xsl:sort .../>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:variable>
The sibling relationships of the copied nodes will then reflect the sorted order. There's the minor problem that in XSLT 1.0, $x is a result tree fragment so you have to convert it to a node-set using the exslt:node-set() function.
In fact in XSLT 1.0 that's probably the only way of doing it, because the XSLT 1.0 data model only has node sets, not sequences, which means there is no way of capturing and processing a sequence of nodes in anything other than document order. The 2.0 model has much more flexibility and power. Upgrade if you can - XSLT 1.0 is approaching 20 years old.
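The copy-then-navigate idea can be sketched outside XSLT as well. This Ruby version uses REXML from the standard library. It simplifies the input by dropping the namespaces and OffsetMinutes, and assumes all timestamps share one offset so the ISO 8601 strings sort correctly as text. It copies the sorted Salary elements into a new Salaries element and then reads each copy's previous and next sibling:

```ruby
require 'rexml/document'

# Simplified input (assumption): no namespaces, no OffsetMinutes, and one
# common offset, so the timestamp strings sort correctly as plain text.
xml = <<~XML
  <Salaries>
    <Salary><Base>1000</Base><CreatedDate>2016-01-09T14:38:54Z</CreatedDate></Salary>
    <Salary><Base>2000</Base><CreatedDate>2015-01-09T14:38:54Z</CreatedDate></Salary>
    <Salary><Base>3000</Base><CreatedDate>2017-01-09T14:38:54Z</CreatedDate></Salary>
  </Salaries>
XML

doc = REXML::Document.new(xml)
original = REXML::XPath.match(doc, '/Salaries/Salary')

# Sorting only reorders the references; the original nodes keep their
# original siblings.
sorted = original.sort_by { |s| s.elements['CreatedDate'].text }

# Build a new tree of copies in sorted order, so the copies' sibling
# relationships reflect the sorted order.
sorted_root = REXML::Element.new('Salaries')
sorted.each { |s| sorted_root.add_element(s.deep_clone) }

# [preceding Base, current Base, following Base] for each sorted copy
rows = sorted_root.elements.to_a('Salary').map do |s|
  prev_el = s.previous_element
  next_el = s.next_element
  [prev_el && prev_el.elements['Base'].text,
   s.elements['Base'].text,
   next_el && next_el.elements['Base'].text]
end
rows.each { |r| puts r.inspect }
```

The rows follow the "preceding / current / following" pattern the question asks for: 2000 has no preceding record and 3000 has no following one, while the original tree's siblings are unchanged.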
Thanks to Michael for the answer. I'm posting my solution here for completeness; it's complicated by the namespaces in use in the XML:
<!-- Puts the whole of the Salary Node into a variable-->
<xsl:variable name="SALARY" >
<xsl:copy-of select="p:Salaries" />
</xsl:variable>
<!-- Puts the required key data into a node-set with the correct sort applied-->
<xsl:variable name="SAL">
<xsl:for-each select="msxsl:node-set($SALARY)//p:Salary">
<xsl:sort select="number(cs:Convertdatetolong(cs:AddOffsetMinutes(substring(p:CreatedDate/d5p1:DateTime,1,19),p:CreatedDate/d5p1:OffsetMinutes)))" order="ascending"/>
<xsl:copy-of select="." />
</xsl:for-each>
</xsl:variable>
<!-- Quick Output-->
<xsl:for-each select="msxsl:node-set($SAL)//p:Salary">
<xsl:text>Sa:</xsl:text>
<xsl:value-of select="position()" />
<xsl:text>Preceding:</xsl:text>
<xsl:value-of select="preceding-sibling::p:Salary[1]/p:Base"/>
<xsl:value-of select="$newline" />
<xsl:text>Current:</xsl:text>
<xsl:value-of select="p:Base"/>
<xsl:value-of select="$newline" />
<xsl:text>Following:</xsl:text>
<xsl:value-of select="following-sibling::p:Salary[1]/p:Base"/>
<xsl:value-of select="$newline"/>
</xsl:for-each>
The preceding-sibling axis gets the preceding siblings of the context node in document order.
To refer to the preceding siblings of a node after sorting, you will need to store the sorted nodes in a variable first - and, in XSLT 1.0, convert the variable into a node-set.

Use xpath to locate a complex element with attributes and children

Given this XML
<well bulkShift="0.000000" diameter="5.000000" hidden="false" name="67-1-TpX-10" filename="67-1-TpX-10.well">
<metadata/>
<unit>ftUS</unit>
<colour blue="1.000000" green="1.000000" hue="" red="1.000000"/>
<tvd clip="false"/>
<associatedcheckshot>25-1-X-14</associatedcheckshot>
<associatedwelllog>HDRA_67-1-TpX-10</associatedwelllog>
<associatedwelllog>NPHI_67-1-TpX-10</associatedwelllog>
</well>
I can select the element with this XPath
//well[@bulkShift=0 and @diameter=5 and @hidden='false' and @name='67-1-TpX-10' and @filename='67-1-TpX-10.well']
However, I need to be much more specific: I need to find the element with these specific child nodes, given that the child elements (metadata, unit, colour, etc.) can appear in any order inside the element.
Ideally I'd like to be able to select this node with only a single XPath query.
Can anyone help?
This template also matches on child elements and on attributes of child elements:
<xsl:template match="well[@hidden='false'][./unit='ftUS' or ./tvd/@clip='false']">
well found!
</xsl:template>
or in one go:
<xsl:template match="well[@hidden='false' and (./unit='ftUS' or ./tvd/@clip='false')]">
well found!
</xsl:template>
You can add tests for the children to your predicate, just like the tests for the attributes, e.g.:
//well[@bulkShift=0 and @diameter=5 and @hidden='false' and @name='67-1-TpX-10' and @filename='67-1-TpX-10.well']
[metadata and unit and colour]
Having a list of predicates [predicate1][predicate2] is the same as having one predicate that combines them with and (as long as neither predicate is positional, such as [1]).
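One way to check that such a predicate is independent of child order is to run it against two wells whose children appear in different orders. The sketch below uses Ruby's REXML; the second well (other-well) is invented for contrast and the attribute list is trimmed to keep it short:

```ruby
require 'rexml/document'

# First well trimmed from the question; "other-well" is made up. Both have
# hidden="false", their children are in different orders, and only the
# first has unit ftUS.
xml = <<~XML
  <wells>
    <well hidden="false" name="67-1-TpX-10">
      <metadata/>
      <unit>ftUS</unit>
      <tvd clip="false"/>
      <associatedwelllog>HDRA_67-1-TpX-10</associatedwelllog>
      <associatedwelllog>NPHI_67-1-TpX-10</associatedwelllog>
    </well>
    <well hidden="false" name="other-well">
      <tvd clip="false"/>
      <unit>m</unit>
      <metadata/>
    </well>
  </wells>
XML

doc = REXML::Document.new(xml)

# Attribute test only: matches both wells.
loose = REXML::XPath.match(doc, "//well[@hidden='false']")

# Attribute test plus child tests; each child test ignores where that
# child sits inside <well>.
query = "//well[@hidden='false' and metadata and unit='ftUS'" \
        " and tvd/@clip='false' and associatedwelllog='NPHI_67-1-TpX-10']"
matched = REXML::XPath.match(doc, query).map { |w| w.attributes['name'] }

puts matched.inspect
```

The associatedwelllog test relies on XPath's rule that comparing a node-set to a string is true if any node in the set matches, so it works even though the element has two such children.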

Sorting XPath results in the same order as multiple select parameters

I have an XML document as follows:
<objects>
<object uid="0" />
<object uid="1" />
<object uid="2" />
</objects>
I can select multiple elements using the following query:
doc.xpath("//object[@uid=2 or @uid=0 or @uid=1]")
But this returns the elements in the same order they're declared in the XML document (uid=0, uid=1, uid=2) and I want the results in the same order as I perform the XPath query (uid=2, uid=0, uid=1).
I'm unsure if this is possible with XPath alone, and have looked into XSLT sorting, but I haven't found an example that explains how I could achieve this.
I'm working in Ruby with the Nokogiri library.
There is no way in XPath 1.0 to specify the order of the selected nodes.
XPath 2.0 allows constructing a sequence of nodes in any specific order:
//object[@uid=2], //object[@uid=1]
evaluates to a sequence in which all object items with @uid=2 precede all object items with @uid=1.
If one doesn't have an XPath 2.0 engine available, it is still possible to use XSLT in order to output nodes in any desired order.
In this specific case the sequence of the following XSLT instructions:
<xsl:copy-of select="//object[@uid=2]"/>
<xsl:copy-of select="//object[@uid=1]"/>
produces the desired output:
<object uid="2" /><object uid="1" />
I am assuming you are using XPath 1.0. The W3C spec says:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
* node-set (an unordered collection of nodes without duplicates)
* boolean (true or false)
* number (a floating-point number)
* string (a sequence of UCS characters)
So I don't think you can reorder the nodes using XPath alone. (The rest of the spec defines document order and reverse document order, so if the latter does what you want, you can get it using the appropriate axis, e.g. preceding.)
In XSLT you can use <xsl:sort> with a sort key computed from the attribute value. The XSLT FAQ is very good and you should find an answer there.
An XSLT example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="pSequence" select="'2 1'"/>
<xsl:template match="objects">
<xsl:for-each select="object[contains(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))]">
<xsl:sort select="substring-before(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
<object uid="2" /><object uid="1" />
I don't think there is a way to do it in XPath, but if you wish to switch to XSLT you can use the xsl:sort element:
<xsl:for-each select="//object[@uid=1 or @uid=2]">
<xsl:sort select="@uid" data-type="number" />
<!-- insert new logic here -->
</xsl:for-each>
More complete info here:
http://www.w3schools.com/xsl/el_sort.asp
This is how I'd do it in Nokogiri:
require 'nokogiri'
xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
doc = Nokogiri::XML(xml)
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort_by { |n| n['uid'].to_i }.reverse
puts objects_by_uid
Running that outputs:
<object uid="2"/>
<object uid="1"/>
An alternative to the search would be:
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort { |a,b| b['uid'].to_i <=> a['uid'].to_i }
if you don't like using sort_by with the reverse.
XPath is useful for locating and retrieving the nodes, but often the filtering we want to do gets too convoluted in the accessor, so I let the language do it, whether it's Ruby, Perl or Python. Where I put the filtering logic depends on how big the XML data set is and whether there are a lot of different uid values I'll want to grab. Sometimes letting the XPath engine do the heavy lifting makes sense; other times it's easier to let XPath grab all the object nodes and filter in the calling language.
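If you go the host-language route, an arbitrary requested order (such as the 2, 0, 1 from the question, which a descending sort would not produce) can be imposed by sorting on each uid's position in the requested list, the same idea as the pSequence XSLT example. A minimal sketch using REXML from Ruby's standard library instead of Nokogiri, so it needs no gem; the `wanted` list is an assumption about how the order is supplied:

```ruby
require 'rexml/document'

xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
doc = REXML::Document.new(xml)

# Requested order (uid=2, uid=0, uid=1, as in the question).
wanted = %w[2 0 1]

# Build the same kind of "or" predicate as the question; XPath itself
# gives no guarantee about the order we want.
predicate = wanted.map { |uid| "@uid='#{uid}'" }.join(' or ')
found = REXML::XPath.match(doc, "//object[#{predicate}]")

# Reorder in Ruby by each uid's position in the requested list.
objects = found.sort_by { |n| wanted.index(n.attributes['uid']) }

puts objects.map { |n| n.attributes['uid'] }.inspect
```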

XPath 1 query and attributes name

First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Yes:
This XPath expression (when node is the context (current) node):
name(@*[1])
produces the name of the first attribute (the ordering may be implementation-dependent),
and this XPath expression (when node is the context (current) node):
name(@*[2])
produces the name of the second attribute (the ordering may be implementation-dependent).
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
This XPath expression (when the attribute named "attribute1" is the context (current) node):
concat(name(), '=', .)
produces the string:
attribute1=10
and this XPath expression (when the node element is the context (current) node):
@*[. > 0]
selects all attributes of the context node whose value is a number greater than 0.
In XPath 2.0 one can combine them in a single XPath expression:
@*[number(.) > 0]/concat(name(.),'=',.)
to get (in this particular case) this result:
attribute1=10
If you are using XPath 1.0, which is less powerful, you'll need to embed the XPath expression in a hosting language, such as XSLT. The following XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:for-each select="@*[number(.) > 0]">
<xsl:value-of select="concat(name(.),'=',.)"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<node attribute1="10" attribute2="0" />
Produces exactly the same result:
attribute1=10
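In a host language both steps can be done in one pass. A minimal Ruby sketch using the standard library's REXML; Float(..., exception: false) (Ruby 2.6+) stands in for XPath's number() only roughly, since the two do not parse every input the same way:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<node attribute1="10" attribute2="0" />')
node = doc.root

# First question: the names of all attributes of the element.
names = []
node.attributes.each_attribute { |attr| names << attr.name }

# Second question: "name=value" for every attribute whose value is a
# number greater than 0 (non-numeric values are skipped, similar to
# number(.) yielding NaN in XPath).
pairs = []
node.attributes.each_attribute do |attr|
  value = Float(attr.value, exception: false)
  pairs << "#{attr.name}=#{attr.value}" if value && value > 0
end

puts names.inspect
puts pairs.inspect
```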
It depends a little bit on the context, I believe. In most cases, I expect you'd have to query "@*", enumerate over the items, and call "name()" - but it may work in some tests.
Re the edit - you can do:
@*[number(.)>0]
to find attributes matching your criteria, and:
concat(name(),'=',.)
to display the output. I don't think you can do both at once, though. What is the context here? XSLT? Something else?
