Why is xsl:value-of behaving completely differently depending on the xsl:stylesheet version? - xpath

Looking at this XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
root
<child>
child 1
<grandchild>
grandchild 1
</grandchild>
<yetanothergrandchild>
yetanothergrandchild 1
</yetanothergrandchild>
</child>
<child>
child 2
<grandchild>
grandchild 2
</grandchild>
<yetanothergrandchild>
yetanothergrandchild 2
</yetanothergrandchild>
</child>
</root>
and this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output media-type="text" omit-xml-declaration="yes"/>
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="simple"
page-height="29.7cm"
page-width="21cm"
margin-top="1cm"
margin-bottom="2cm"
margin-left="2.5cm"
margin-right="2.5cm">
<fo:region-body margin-top="3cm"/>
<fo:region-before extent="3cm"/>
<fo:region-after extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="simple">
<fo:flow flow-name="xsl-region-body">
<fo:block font-size="12pt"
font-family="sans-serif"
line-height="15pt"
space-after.optimum="3pt"
text-align="justify">
<xsl:value-of select="root/child/grandchild"/>
<xsl:value-of select="root/child/yetanothergrandchild"/>
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet>
If I set the xsl:stylesheet version to 1.0, the output is:
grandchild 1 yetanothergrandchild 1
If I set it to 2.0, the output is:
grandchild 1 grandchild 2 yetanothergrandchild 1 yetanothergrandchild 2
Of course, I have already read through various lists of differences between XSLT 1.0 and 2.0, but I cannot find any hint of a change that could cause this.
Can somebody tell me how and why the behavior differs?

See https://www.w3.org/TR/xslt20/#backwards and then https://www.w3.org/TR/xslt20/#incompatibilities, which says:
J.1.3 Backwards Compatibility Behavior
Some XSLT constructs behave
differently under XSLT 2.0 depending on whether backwards compatible
behavior is enabled. In these cases, the behavior may be made
compatible with XSLT 1.0 by ensuring that backwards compatible
behavior is enabled (which is done using the [xsl:]version attribute).
These constructs are as follows:
If the xsl:value-of instruction has no separator attribute, and the
value of the select expression is a sequence of more than one item,
then under XSLT 2.0 all items in the sequence will be output, space
separated, while in XSLT 1.0, all items after the first will be
discarded.
...

In XSLT 1.0 the xsl:value-of instruction returns the string-value of the first node in the selected node-set.
In XSLT 2.0 the instruction returns the value of every node in the selected sequence, separated by a space or by the string specified in the separator attribute.
These are my own formulations; the specs are more difficult to follow.
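The two rules can be mimicked outside XSLT to make the difference concrete. The following is only a sketch in Python using the standard library's ElementTree; the helper names value_of_xslt1 and value_of_xslt2 are made up for illustration, and the sample document is a whitespace-free reduction of the one in the question.

```python
import xml.etree.ElementTree as ET

# Reduced version of the question's document (whitespace removed so the
# string values are easy to compare).
XML = """<root>
  <child><grandchild>grandchild 1</grandchild></child>
  <child><grandchild>grandchild 2</grandchild></child>
</root>"""


def value_of_xslt1(nodes):
    """XSLT 1.0 rule: string value of the first selected node only."""
    return "".join(nodes[0].itertext()) if nodes else ""


def value_of_xslt2(nodes, separator=" "):
    """XSLT 2.0 rule: every selected item, joined by the separator."""
    return separator.join("".join(n.itertext()) for n in nodes)


root = ET.fromstring(XML)
# Roughly what select="root/child/grandchild" selects from the document node.
selected = root.findall("child/grandchild")

print(value_of_xslt1(selected))
print(value_of_xslt2(selected))
print(value_of_xslt2(selected, separator=", "))
```

The third call corresponds to adding separator=", " to xsl:value-of in an XSLT 2.0 stylesheet.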

Related

Find first occurrence of node without traversing all of them using XPaths and elementpath library

I use elementpath to handle some XPath queries. I have an XML with linear structure which contains a unique id attribute.
<items>
<item id="1">...</item>
<item id="2">...</item>
<item id="3">...</item>
... 500k elements
<item id="500003">...</item>
</items>
I want the parser to find the first occurrence without traversing all the nodes. For example, I want to select //items/item[@id = '3'] and stop after iterating over 3 nodes only (not over all 500k nodes). It would be a nice optimization for many cases.
An example using XSLT 3 streaming with a static parameter for the XPath, then using xsl:iterate with xsl:break to produce the "early exit" once the first item sought has been found would be
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all">
<xsl:param name="path" static="yes" as="xs:string" select="'items/item[@id = ''3'']'"/>
<xsl:output method="xml"/>
<xsl:mode on-no-match="shallow-copy" streamable="yes"/>
<xsl:template match="/" name="xsl:initial-template">
<xsl:iterate _select="{$path}">
<xsl:if test="position() = 1">
<xsl:copy-of select="."/>
<xsl:break/>
</xsl:if>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
You can run it with SaxonC EE (unfortunately streaming is only supported by EE) and Python with e.g.
import saxonc
with saxonc.PySaxonProcessor(license=True) as proc:
    print("Test SaxonC on Python")
    print(proc.version)
    xslt30proc = proc.new_xslt30_processor()
    xslt30proc.set_parameter('path', proc.make_string_value('/items/item[@id = "2"]'))
    transformer = xslt30proc.compile_stylesheet(stylesheet_file='iterate-items-early-exit1.xsl')
    xdm_result = transformer.apply_templates_returning_value(source_file='items-sample1.xml')
    if transformer.exception_occurred:
        print(transformer.error_message)
    print(xdm_result)
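If a Saxon EE license is not available, the same early-exit idea can be approximated without XPath at all. The sketch below swaps in a different technique, the standard library's xml.etree.ElementTree.iterparse, rather than elementpath or XSLT streaming; find_first_item and the generated sample data are hypothetical names and inputs used only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def find_first_item(source, wanted_id):
    """Return (element, items_seen) for the first <item> whose id matches."""
    seen = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        seen += 1
        if elem.get("id") == wanted_id:
            return elem, seen  # early exit: no further events are consumed
        elem.clear()  # discard the content of items that did not match
    return None, seen


# Generated stand-in for the large file: 1000 items with ids 1..1000.
sample = io.BytesIO(
    b"<items>"
    + b"".join(b'<item id="%d">x</item>' % i for i in range(1, 1001))
    + b"</items>"
)

item, seen = find_first_item(sample, "3")
print(item.get("id"), seen)
```

Note that iterparse reads its input in chunks, so a few items beyond the match may already have been tokenized; what is avoided is reading and building the remainder of the document.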

XSLT how to display/output duplicate values based on different element node and attributes

I am trying to output duplicate values across different nodes using XSLT. I want the element name to be dynamic so that it can track different values after the namespace prefix, for example car:ID, car:Name, car:Local_Name, or more. I know I can use the function local-name(.), but I am not sure how to apply it to my XSLT logic. Please help.
The sample XML is as follows:
<car:root xmlns:car="com.sample">
<Car_Input_Request>
<car:Car_Details>
<car:ID>Car_001</car:ID>
<car:Name>Fastmobile</car:Name>
<car:Local_Name>New York</car:Local_Name>
<car:Transmission_Reference_Type>
<car:ID car:type="Transmission_Reference_Type">Automatic</car:ID>
</car:Transmission_Reference_Type>
</car:Car_Details>
</Car_Input_Request>
<Car_Input_Request>
<car:Car_Details>
<car:ID>Car_002</car:ID>
<car:Name>Slowmobile</car:Name>
<car:Local_Name>New York</car:Local_Name>
<car:Transmission_Reference_Type>
<car:ID car:type="Transmission_Reference_Type">Manual</car:ID>
</car:Transmission_Reference_Type>
</car:Car_Details>
</Car_Input_Request>
<Car_Input_Request>
<car:Car_Details>
<car:ID>Car_001</car:ID>
<car:Name>Fastmobile</car:Name>
<car:Local_Name>New York</car:Local_Name>
<car:Transmission_Reference_Type>
<car:ID car:type="Transmission_Reference_Type">Automatic</car:ID>
</car:Transmission_Reference_Type>
</car:Car_Details>
</Car_Input_Request>
</car:root>
The XSLT used:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:car="com.sample"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="3.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="//car:ID[ let $v:=string(.),$t:=@car:type return not( preceding::car:ID[string(.) = $v and @car:type=$t]) ]/(let $v:=string(.), $t:=@car:type,$c:=1+count(following::car:ID[string(.)=$v and $t=@car:type]) ,$c:=1+count(following::car:*[string(.)=$v]) return if ($c > 1) then concat( string(.), ' occurs ', $c, ' times for type ', $t, '&#10;') else () )"/>
</xsl:template>
</xsl:stylesheet>
Output produced by the XSLT:
Car_001 occurs 2 times for type
Automatic occurs 2 times for type Transmission_Reference_Type
But I want it to show:
Car_001 occurs 2 times for type ID
Fastmobile occurs 2 times for type Name
Automatic occurs 2 times for type Transmission_Reference_Type
New York occurs 3 times for type Local_Name
If you are looking for an XSLT solution (rather than a single line XPath expression), you could make use of xsl:for-each-group with a composite key:
<xsl:stylesheet
xmlns:car="com.sample"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
expand-text="yes"
version="3.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each-group select="//car:Car_Details/*" group-by="local-name(), normalize-space()" composite="yes">
<xsl:if test="current-group()[2]">
<xsl:text>{normalize-space()} occurs {count(current-group())} times for {local-name()}
</xsl:text>
</xsl:if>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
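The composite grouping key (local name plus normalized text) is the core of this approach. As a rough illustration of what the grouping computes, here is a Python sketch using only the standard library; the sample is a trimmed version of the question's input (the Car_Input_Request wrappers and the Transmission_Reference_Type elements are left out), and local_name and normalized_text are hypothetical helpers standing in for local-name() and normalize-space().

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed sample: three Car_Details entries from the question.
XML = """<car:root xmlns:car="com.sample">
  <car:Car_Details><car:ID>Car_001</car:ID><car:Name>Fastmobile</car:Name><car:Local_Name>New York</car:Local_Name></car:Car_Details>
  <car:Car_Details><car:ID>Car_002</car:ID><car:Name>Slowmobile</car:Name><car:Local_Name>New York</car:Local_Name></car:Car_Details>
  <car:Car_Details><car:ID>Car_001</car:ID><car:Name>Fastmobile</car:Name><car:Local_Name>New York</car:Local_Name></car:Car_Details>
</car:root>"""


def local_name(elem):
    """Strip the '{namespace}' prefix ElementTree puts in front of tags."""
    return elem.tag.rsplit("}", 1)[-1]


def normalized_text(elem):
    """Collapse whitespace in the element's string value."""
    return " ".join("".join(elem.itertext()).split())


root = ET.fromstring(XML)
# Composite key: (element local name, normalized string value).
counts = Counter(
    (local_name(child), normalized_text(child))
    for details in root.iter("{com.sample}Car_Details")
    for child in details
)

for (name, value), n in counts.items():
    if n > 1:  # same filter as test="current-group()[2]"
        print(f"{value} occurs {n} times for {name}")
```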

xslt Merge children of 2 parents and Store in a variable

I receive an xml input like this:
<root>
<Tuple1>
<child11></child11>
<child12></child12>
<child13></child13>
</Tuple1>
<Tuple1>
<child11></child11>
<child12></child12>
</Tuple1>
<Tuple2>
<child21></child21>
<child22></child22>
</Tuple2>
<Tuple2>
<child21></child21>
<child22></child22>
<child23></child23>
</Tuple2>
</root>
How can I merge the children of each Tuple1 with children of Tuple2 and store them in a variable that will be used in the rest of xslt document?
The first Tuple1 will be merged with the first Tuple2, the second Tuple1 with the second Tuple2, and so on. The merged output that should be stored in the variable would look like this in memory:
<root>
<Tuple1>
<child11></child11>
<child12></child12>
<child13></child13>
<child21></child21>
<child22></child22>
</Tuple1>
<Tuple1>
<child11></child11>
<child12></child12>
<child21></child21>
<child22></child22>
<child23></child23>
</Tuple1>
</root>
Is a variable the best option? If we use a variable, is it created once, or is it created every time it is referenced?
I use XSLT 3.0, so a solution for any version can help.
Thanks, I appreciate your help.
Here is a minimal XSLT 3 approach:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="root">
<xsl:variable name="temp1">
<xsl:copy>
<xsl:apply-templates select="Tuple1"/>
</xsl:copy>
</xsl:variable>
<xsl:copy-of select="$temp1"/>
</xsl:template>
<xsl:template match="Tuple1">
<xsl:copy>
<xsl:copy-of select="*, let $pos := position() return ../Tuple2[$pos]/*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The example is online at https://xsltfiddle.liberty-development.net/bdxtqg. I have used XPath's let instead of XSLT's xsl:variable to store the position used to access the matching Tuple2.
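The positional pairing done by ../Tuple2[$pos] can also be pictured as a plain index lookup. The following Python sketch (standard library only, variable names chosen for illustration) builds the same merged tree from the question's input; like the stylesheet, it keeps a Tuple1 even when no Tuple2 exists at the same position.

```python
import copy
import xml.etree.ElementTree as ET

XML = """<root>
<Tuple1><child11/><child12/><child13/></Tuple1>
<Tuple1><child11/><child12/></Tuple1>
<Tuple2><child21/><child22/></Tuple2>
<Tuple2><child21/><child22/><child23/></Tuple2>
</root>"""

root = ET.fromstring(XML)
tuple2 = root.findall("Tuple2")

merged = ET.Element("root")
for pos, t1 in enumerate(root.findall("Tuple1")):
    # Tuple2 at the same position, or nothing if there is no partner.
    partner = tuple2[pos] if pos < len(tuple2) else []
    out = ET.SubElement(merged, "Tuple1")
    # Copy the children so the input tree is left untouched.
    out.extend([copy.deepcopy(child) for child in [*t1, *partner]])

print(ET.tostring(merged, encoding="unicode"))
```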

Ancestor-or-self issue at top level

With the following XML, the top level is returning the nodes of all levels. There are no ancestors for the top level, so why am I getting its children?
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<WBSs>
<WBS GUID="2">
<Name>work</Name>
<WBSs>
<WBS GUID="1">
<Name>Wall</Name>
<ParentWBS>2</ParentWBS>
</WBS>
<WBS GUID="2">
<Name>South Wall</Name>
<ParentWBS>2</ParentWBS>
</WBS>
<WBS GUID="3">
<Name>North Wall</Name>
<ParentWBS>2</ParentWBS>
</WBS>
</WBSs>
</WBS>
</WBSs>
XPATH
Note: apply-templates is called on .//WBS
<xsl:variable name="wbsCode" select=".//ancestor-or-self::WBS/@GUID[1]"/>
Note: I have an XSLT instruction immediately following the XPath expression to stringify the nodes and join them with '.'.
Result
2.1.2.3
2.1
2.2
2.3
Desired result
2
2.1
2.2
2.3
XSLT
<xsl:variable name="WBS_ELEMENT_TABLE">
<xsl:apply-templates select=".//WBS" mode="I_WBS_ELEMENT">
<xsl:with-param name="ProjectId" select="$ProjectId"/>
</xsl:apply-templates>
</xsl:variable>
<xsl:template match="WBS" mode="I_WBS_ELEMENT">
<xsl:param name="ProjectId"/>
<xsl:variable name="wbsCode" select=".//ancestor-or-self::WBS/@GUID[1]"/>
<xsl:variable name="temp2" select="string-join(($wbsCode), '.')"/>
<WBS_ELEMENT>
<xsl:value-of select="$temp2"/>
</WBS_ELEMENT>
</xsl:template>
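// is short for /descendant-or-self::node()/, so .//ancestor-or-self::WBS means ./descendant-or-self::node()/ancestor-or-self::WBS, which selects every WBS ancestor-or-self of every descendant of the context node. For the top-level WBS that is every WBS element in the document, hence 2.1.2.3. Use ancestor-or-self::WBS/@GUID, without the leading .//, instead.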
Get out of that habit of using // without thinking about what it means!
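To see what the ancestor-or-self axis alone yields for each WBS, the walk can be written out by hand. This is a sketch in Python with the standard library; ElementTree has no ancestor axis, so a parent map (parent_of, an illustrative name) is built first, and the sample is the question's XML without the ParentWBS elements.

```python
import xml.etree.ElementTree as ET

XML = """<WBSs>
  <WBS GUID="2"><Name>work</Name>
    <WBSs>
      <WBS GUID="1"><Name>Wall</Name></WBS>
      <WBS GUID="2"><Name>South Wall</Name></WBS>
      <WBS GUID="3"><Name>North Wall</Name></WBS>
    </WBSs>
  </WBS>
</WBSs>"""

root = ET.fromstring(XML)
# ElementTree elements do not know their parent, so record it explicitly.
parent_of = {child: parent for parent in root.iter() for child in parent}


def wbs_code(wbs):
    """Join the GUIDs found on the ancestor-or-self axis, outermost first."""
    guids = []
    node = wbs
    while node is not None:
        if node.tag == "WBS":
            guids.append(node.get("GUID"))
        node = parent_of.get(node)  # None once the root is passed
    return ".".join(reversed(guids))


codes = [wbs_code(w) for w in root.iter("WBS")]
print(codes)
```

Starting from the element itself and only moving upwards is what gives the top-level WBS the code 2 rather than the GUIDs of all its descendants.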

XSLT: poor performance due to complex XPath expression?

At some point in an XSLT program, I have the following:
<xsl:for-each select="tags/tag">
<xsl:apply-templates select="//shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id]">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
It seems that the XPath expression //shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id], which is rather complex, considerably slows down the execution of the program (compared to the execution time before adding the quoted piece of code, processing the same data, of course).
Do you think this is normal due to the relatively complex nature of the expression, and do you see how I could improve it so it performs better?
NB: in the XPath expression, film and //films/film, group and //groups/group refer to distinct elements.
See below a stripped-down sample of the XML input.
<program>
<tags>
<tag id="1">Tag1</tag>
<tag id="2">Tag2</tag>
<tag id="3">Tag3</tag>
</tags>
<films>
<film id="1">
Film1
<tag>2</tag><!-- References: /program/tags/tag/@id=2 -->
</film>
<film id="2">
Film2
<tag>1</tag><!-- References: /program/tags/tag/@id=1 -->
</film>
<film id="3">
Film3
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
<film id="4">
Film4
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
</films>
<groups>
<group id="1">
<film>3</film><!-- References: /program/films/film/@id=3 -->
<film>4</film><!-- References: /program/films/film/@id=4 -->
</group>
</groups>
<shows>
<show id="1"><!-- Show with film (=simple) -->
<film>1</film><!-- References: /program/films/film/@id=1 -->
<date>2011-12-12</date>
<time>12:00</time>
</show>
<show id="2"><!-- Show with group (=combined) -->
<group>1</group><!-- References: /program/groups/group/@id=1 -->
<date>2011-12-12</date>
<time>14:00</time>
</show>
</shows>
</program>
Explanations:
A tag is a property attached to a film (in fact, it's rather a category).
A group is an enumeration of films.
A show references either a film or a group.
What I want: for each tag, I'm looking for the shows referencing a film having the current tag and the shows referencing a group where at least one of the films has the current tag.
Double slashes in XPath are performance and CPU hogs when working with large documents, since every node in the document must be evaluated. If you can replace them with either an absolute or a relative path, you should see a noticeable improvement. If you can post the input schema and required output, we could be more specific.
e.g. With an absolute path
//shows/show[film=//films/film[tag=current()/@id]/@id]
becomes
/myroot/somepath/shows/show[film=/myroot/somepath/films/film[tag=current()/@id]/@id]
or if the shows and films are relative to the current node
./relativexpath/shows/show[film=./relativexpath/somepath/films/film[tag=current()/@id]/@id]
The answer by nonnb very likely identifies the problem, but it does not really offer an efficient solution: "cheaper" axes are better, but that alone does not give the speed-up that indexing the data does.
Note that the big problem is that the XPath expression predicate does another full traversal of the tree for each evaluation. You should use keys for stuff like this; this will (in most or even all XSLT implementations) make an indexed lookup possible, thereby reducing the runtime a lot.
Define keys for the films, groups and shows, indexed by the value they reference:
<xsl:key name="filmByTag" match="film" use="tag" />
<xsl:key name="groupsByFilm" match="group" use="film" />
<xsl:key name="showsByFilm" match="show" use="film" />
<xsl:key name="showsByGroup" match="show" use="group" />
And then use it like this (not tested, but you should get the idea):
<xsl:variable name="films" select="key('filmByTag', @id)/@id" />
<xsl:apply-templates select="key('showsByFilm', $films) | key('showsByGroup', key('groupsByFilm', $films)/@id)">
Your XPath expression seems to be doing a three-way join so unless it's optimized the performance is likely to be O(n^3) in the size of the source document. Optimization involves replacing the serial searches of the document by indexed lookups. There are two ways of achieving this: you can hand-optimize it by replacing the filter expressions with calls on the key() function (as indicated by Dimitre), or you can use an optimizing XSLT processor such as Saxon-EE, which should do the same optimizations automatically.
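What the key-based rewrite buys can be illustrated with ordinary dictionaries: each index is built in one pass, and every lookup afterwards is a hash access instead of a scan of the document. This is only a sketch in Python with the standard library, using the question's sample data without dates and times; build_key and shows_for_tag are hypothetical helpers, not part of any XSLT processor.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<program>
  <tags><tag id="1">Tag1</tag><tag id="2">Tag2</tag><tag id="3">Tag3</tag></tags>
  <films>
    <film id="1">Film1<tag>2</tag></film>
    <film id="2">Film2<tag>1</tag></film>
    <film id="3">Film3<tag>3</tag></film>
    <film id="4">Film4<tag>3</tag></film>
  </films>
  <groups><group id="1"><film>3</film><film>4</film></group></groups>
  <shows>
    <show id="1"><film>1</film></show>
    <show id="2"><group>1</group></show>
  </shows>
</program>"""


def build_key(elements, use):
    """Index elements by the text of their `use` children, like xsl:key."""
    index = defaultdict(list)
    for elem in elements:
        for ref in elem.findall(use):
            index[ref.text].append(elem)
    return index


root = ET.fromstring(XML)
film_by_tag = build_key(root.findall("films/film"), "tag")
group_by_film = build_key(root.findall("groups/group"), "film")
show_by_film = build_key(root.findall("shows/show"), "film")
show_by_group = build_key(root.findall("shows/show"), "group")


def shows_for_tag(tag_id):
    """Ids of shows whose film, or whose group's films, carry the tag."""
    shows = {}
    for film in film_by_tag.get(tag_id, []):
        film_id = film.get("id")
        for show in show_by_film.get(film_id, []):
            shows[show.get("id")] = show
        for group in group_by_film.get(film_id, []):
            for show in show_by_group.get(group.get("id"), []):
                shows[show.get("id")] = show
    return sorted(shows)  # the dict removes duplicates, like the | operator


for tag in root.findall("tags/tag"):
    print(tag.get("id"), shows_for_tag(tag.get("id")))
```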
Define a key with xsl:key and then use the key function for the cross reference instead of that comparison you currently have. Show us a sample of the XML so that we can understand its structure, then we can help with concrete code.
Here are two complete solutions that should exhibit better performance:
Do note: Better performance will be registered on sufficiently large input samples only. On small input samples it isn't worth it to optimize.
I. Not using // (but not using keys)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vFilms" select="/*/films/film"/>
<xsl:variable name="vShows" select="/*/shows/show"/>
<xsl:variable name="vGroups" select="/*/groups/group"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"$vShows
[film
=
$vFilms
[tag=current()/@id]
/@id
or
group
=
$vGroups
[film
=
$vFilms
[tag=current()/@id]
/@id
]
/@id
]
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
II. Using keys
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kShowByFilmId" match="show"
use="film"/>
<xsl:key name="kShowByGroupId" match="show"
use="group"/>
<xsl:key name="kGroupByFilmId" match="group"
use="film"/>
<xsl:key name="kFilmByTag" match="film"
use="tag"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"key('kShowByFilmId',
key('kFilmByTag', current()/@id)/@id
)
|
key('kShowByGroupId',
key('kGroupByFilmId',
key('kFilmByTag', current()/@id)/@id
)
/@id
)
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
