XPath 1.0 node list based on node names

I don't like to ask for help, but this time I'm totally stuck with an XPath query.
Please have a look at this XML:
<doc>
<car>
<property id="color">
<attribute id="black" />
<attribute id="white" />
<attribute id="green" />
</property>
<property id="size">
<attribute id="small" />
<attribute id="medium" />
<attribute id="large" />
</property>
</car>
<attributes>
<color>white</color>
<size>small</size>
</attributes>
</doc>
The car/property elements should be output according to the names of the elements under attributes. The desired output is:
<property id="color"><attribute id="white" /></property>
<property id="size"><attribute id="small" /></property>
The XPath
/doc/car/property[@id=name(/doc/attributes/*)]/attribute[@id=/doc/attributes/*/text()]
returns only the first node, because the name() function returns only the name of the first element.
Who can help me find a working XPath (XSLT 1.0)? Many thanks in advance for your help!

You can achieve this with XSLT 1.0, but not with XPath 1.0 alone: when name() is given a node-set, it returns only the name of the first node in document order. This is not a problem in XSLT 1.0, because you can use an xsl:for-each loop, like the following:
<xsl:for-each select="/doc/attributes/*">
<property id="{/doc/car/property[@id=local-name(current())]/@id}"><attribute id="{/doc/car/property[@id=local-name(current())]/attribute[@id=current()]/@id}" /></property>
</xsl:for-each>
This code emits the following XML:
<property id="color"><attribute id="white"/></property>
<property id="size"><attribute id="small"/></property>
As you can see, your requirements seem a little redundant, but I guess your larger scenario justifies the means.
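For reference, the loop can be dropped into a complete stylesheet. This is only a sketch under a few assumptions: the variable name prop is introduced here for readability, and xsl:copy-of replaces the attribute value templates so that the matching attribute element is copied as-is.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- Visit each child of /doc/attributes (color, size, ...) -->
    <xsl:for-each select="/doc/attributes/*">
      <!-- "prop" is a hypothetical helper variable: the property whose
           id equals the name of the current element -->
      <xsl:variable name="prop"
                    select="/doc/car/property[@id = local-name(current())]"/>
      <property id="{$prop/@id}">
        <!-- Copy the attribute element whose id equals the text of the
             current element (e.g. "white") -->
        <xsl:copy-of select="$prop/attribute[@id = current()]"/>
      </property>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that the result has two top-level elements, so it is a fragment rather than a well-formed document; wrap the loop in a root element if a document is needed.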

What about these options? (It's still unclear to me why you're using name(), since I don't see any namespace in your sample data.)
//property|//attribute[@id=//attributes/*]
//attribute[@id=//attributes/*]|//attribute[@id=//attributes/*]/parent::property
//property|//attribute[@id=substring-before(normalize-space(//attributes)," ") or @id=substring-after(normalize-space(//attributes)," ")]
The third option should work even if you have to deal with a namespace for the @id inside the attributes node.

My working solution:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="/doc/car/property">
<property id="{@id}">
<xsl:variable name="id" select="@id" />
<xsl:copy-of select="attribute[@id=/doc/attributes/*[name()=$id]/text()]" />
</property>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Another solution without using a loop:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates select="doc/car/property"/>
</xsl:template>
<xsl:template match="property">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:copy-of select="attribute[@id = /doc/attributes/*[name() = current()/@id]]"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
For each property it copies the element node and its attributes. Then it copies its attribute children having an id matching the respective element below /doc/attributes.

Related

Use xmlstarlet to add missing elements?

I have a number of vendor records which contain multiple addresses e.g.
<vendor>
<addresses>
<address primary="yes">
<line1 />
<city />
<state />
....
</address>
<address primary="no">
<line1 />
<city />
<state />
....
</address>
</addresses>
</vendor>
Some required elements are missing, which prevents the records from being updated. Can xmlstarlet be used to add an element with a default value if it is missing?
Here's a simple example. I'll use xmllint --auto for the xml source. Then we'll add an <add-me> element as a child of <info> if it doesn't exist using the identity transform pattern.
Source xml:
xmllint --auto
<?xml version="1.0"?>
<info>abc</info>
Add the missing element:
xmllint --auto | xsltproc add-missing.xsl -
<?xml version="1.0"?>
<info><add-me>some stuff</add-me>abc</info>
add-missing.xsl:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="info">
<xsl:copy>
<xsl:if test="not(add-me)">
<add-me>some stuff</add-me>
</xsl:if>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Another XSLT option with xmlstarlet is to use a variable that contains the required elements (with or without default values) and treat it as a node set (using the supported exsl:node-set() function).
You can then iterate over the node set to see if an element with the same name already exists. If it does, use it. Otherwise use the default.
Example...
XML Input (input.xml)
<vendor>
<addresses>
<address primary="yes">
<line1>address 1 line1</line1>
<state>address 1 state1</state>
</address>
<address primary="no">
<line1>address 2 line1</line1>
<city>address 2 city</city>
<state>address 2 state</state>
</address>
</addresses>
</vendor>
XSLT 1.0 (so.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="req_elems">
<req>
<line1/>
<city/>
<state/>
<country/>
</req>
</xsl:variable>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="address">
<xsl:variable name="ctx" select="."/>
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:for-each select="exsl:node-set($req_elems)/req/*">
<xsl:choose>
<xsl:when test="$ctx/*[local-name()=local-name(current())]">
<xsl:apply-templates select="$ctx/*[local-name()=local-name(current())]"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
XML Output
<vendor>
<addresses>
<address primary="yes">
<line1>address 1 line1</line1>
<city/>
<state>address 1 state1</state>
<country/>
</address>
<address primary="no">
<line1>address 2 line1</line1>
<city>address 2 city</city>
<state>address 2 state</state>
<country/>
</address>
</addresses>
</vendor>
Note: This only works if the only elements allowed in address are the ones listed in $req_elems. For example, if you have an element named "foo" in address, it will be dropped from the output.
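If such extra elements need to survive, one possible variation is to add a second loop that copies every child of address whose name is not in the required list. The following is a sketch rather than a tested drop-in: the global variable req is a name introduced here, and only element children are carried over.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="req_elems">
    <req><line1/><city/><state/><country/></req>
  </xsl:variable>
  <!-- "req" is a hypothetical helper: the required elements as a node set -->
  <xsl:variable name="req" select="exsl:node-set($req_elems)/req/*"/>

  <!-- Identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="address">
    <xsl:variable name="ctx" select="."/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Required elements first: existing one if present, else the default -->
      <xsl:for-each select="$req">
        <xsl:choose>
          <xsl:when test="$ctx/*[local-name()=local-name(current())]">
            <xsl:apply-templates select="$ctx/*[local-name()=local-name(current())]"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
      <!-- Then any other child element that is not in the required list -->
      <xsl:for-each select="*">
        <xsl:if test="not($req[local-name()=local-name(current())])">
          <xsl:apply-templates select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With this layout the extra elements are emitted after the required ones, so the original child order is not preserved.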

Grouping in XSLT 2.0 (grouping by text)

I have a problem figuring out this grouping in xslt:
The initial information:
<Application>
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].UniqueID" Value="bfd0b74d-2888-49d9-a986-df807f08ad8a" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].Filename" Value="Document 1 Test" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].URI" Value="https/.test.pdf" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].UniqueID" Value="bfd0b74d-2888-49d9-a986-df807f08ad8b" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].Filename" Value="Document 2 Test" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].URI" Value="google.com" />
</Application>
The expected result:
<Package>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8a"
Filename="Document 1 Test"
URI="https/.test.pdf"/>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8b"
Filename="Document 2 Test"
URI="google.com"/>
</Package>
My code:
I've done the grouping by using the id from the square brackets.
<xsl:for-each-group select="ApplicationItem[contains(@LayoutPath,'Attachments.Package.Attachment')]" group-by="substring-before(substring-after(@LayoutPath, 'Attachments.Package.Attachment['), ']')">
<Attachment>
<xsl:for-each select="current-group()">
<xsl:attribute name="UniqueID" select="current-grouping-key()"/>
<xsl:attribute name="Filename" select=".[contains(@LayoutPath,'Filename')]/@Value"/>
<xsl:attribute name="URI" select=".[contains(@LayoutPath,'URI')]/@Value"/>
</xsl:for-each>
</Attachment>
</xsl:for-each-group>
My results:
<Package>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8a"
Filename=""
URI="https/.test.pdf"/>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8b"
Filename=""
URI="google.com"/>
</Package>
What do I need to change in the code to make the grouping work? For now it only takes values from the last ApplicationItem of each unique @LayoutPath id.
I think the problem is with the grouping, but I don't know how to fix it.
Remove the <xsl:for-each select="current-group()"> wrapper (and its closing tag) and change
<xsl:attribute name="Filename" select=".[contains(@LayoutPath,'Filename')]/@Value"/>
<xsl:attribute name="URI" select=".[contains(@LayoutPath,'URI')]/@Value"/>
to
<xsl:attribute name="Filename" select="current-group()[contains(@LayoutPath,'Filename')]/@Value"/>
<xsl:attribute name="URI" select="current-group()[contains(@LayoutPath,'URI')]/@Value"/>
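Put together, a complete XSLT 2.0 stylesheet with this change applied might look like the sketch below. The match="/Application" template and the Package wrapper are assumptions based on the sample input and expected result in the question; the question does not show the surrounding template.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Assumed entry point: the Application root from the sample input -->
  <xsl:template match="/Application">
    <Package>
      <!-- Group the items by the id between the square brackets -->
      <xsl:for-each-group
          select="ApplicationItem[contains(@LayoutPath, 'Attachments.Package.Attachment')]"
          group-by="substring-before(substring-after(@LayoutPath, 'Attachments.Package.Attachment['), ']')">
        <Attachment>
          <xsl:attribute name="UniqueID" select="current-grouping-key()"/>
          <!-- Pick the matching item out of the whole group,
               not out of a single item -->
          <xsl:attribute name="Filename"
              select="current-group()[contains(@LayoutPath, 'Filename')]/@Value"/>
          <xsl:attribute name="URI"
              select="current-group()[contains(@LayoutPath, 'URI')]/@Value"/>
        </Attachment>
      </xsl:for-each-group>
    </Package>
  </xsl:template>
</xsl:stylesheet>
```

The original loop failed because each iteration wrote all three attributes again, and a later attribute with the same name replaces an earlier one, so only the values computed for the last item survived.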

XSLT 2.0 dynamic XPATH expression

I have one XML file that I need to transform based on a mapping file with XSLT 2.0. I'm using the Saxon HE processor.
My mapping file:
<element root="TEST">
<childName condition="/TEST/MyElement/CHILD[text()='B']">/TEST/MyElement/CHILD</childName>
<childBez condition="/TEST/MyElement/CHILD[text()='B']">/TEST/MyElement/CHILDBEZ</childBez>
</element>
I have to copy the elements CHILD and CHILDBEZ plus the parent and the root elements when the text of CHILD equals B.
So with this Input:
<?xml version="1.0" encoding="UTF-8"?>
<TEST>
<MyElement>
<CHILD>A</CHILD>
<CHILDBEZ>ABEZ</CHILDBEZ>
<NotInteresting></NotInteresting>
</MyElement>
<MyElement>
<CHILD>B</CHILD>
<CHILDBEZ>BBEZ</CHILDBEZ>
<NotInteresting2></NotInteresting2>
</MyElement>
</TEST>
the desired output:
<TEST>
<MyElement>
<childName>B</childName>
<childBez>BBEZ</childBez>
</MyElement>
</TEST>
What I have so far (based on the solution from "XSLT 2.0 XPATH expression with variable"):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:param name="mapping" select="document('mapping.xml')"/>
<xsl:key name="map" match="*" use="."/>
<xsl:template match="/">
<xsl:variable name="first-pass">
<xsl:apply-templates mode="first-pass"/>
</xsl:variable>
<xsl:apply-templates select="$first-pass/*"/>
</xsl:template>
<xsl:template match="*" mode="first-pass">
<xsl:param name="parent-path" tunnel="yes"/>
<xsl:variable name="path" select="concat($parent-path, '/', name())"/>
<xsl:variable name="replacement" select="key('map', $path, $mapping)"/>
<xsl:variable name="condition" select="key('map', $path, $mapping)/@condition"/>
<xsl:choose>
<xsl:when test="$condition!= ''">
<!-- if there is a condition defined in the mapping file, check for it -->
</xsl:when>
<xsl:otherwise>
<xsl:element name="{if ($replacement) then name($replacement) else name()}">
<xsl:attribute name="original" select="not($replacement)"/>
<xsl:apply-templates mode="first-pass">
<xsl:with-param name="parent-path" select="$path" tunnel="yes"/>
</xsl:apply-templates>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[@original='true' and not(descendant::*/@original='false')]"/>
</xsl:stylesheet>
But the problem is that it's impossible to evaluate dynamic XPath expressions with XSLT 2.0. Does anyone know a workaround for that? Also, I have a problem with the mapping file: when there is only one element in it, it doesn't work at all.
If dynamic XPath evaluation isn't an option in your chosen processor, then generating an XSLT stylesheet is often a good alternative. In fact, it's often a good alternative anyway.
One way of thinking about this is that your mapping file is actually a program written in a very simple transformation language. There are two ways of executing this program: you can write an interpreter (dynamic XPath evaluation), or you can write a compiler (XSLT stylesheet generation). Both work well.
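To make the "compiler" route more concrete, here is a hand-written sketch of the kind of stylesheet a generator could emit for the mapping file in the question. The generator itself is not shown, and the template layout is an assumption made for illustration, not something prescribed by the mapping format.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- From <element root="TEST">: rebuild the root element -->
  <xsl:template match="/TEST">
    <TEST>
      <!-- From the condition attribute: keep only parents whose CHILD is 'B' -->
      <xsl:apply-templates select="MyElement[CHILD = 'B']"/>
    </TEST>
  </xsl:template>

  <!-- From the two mapping entries: rename CHILD and CHILDBEZ -->
  <xsl:template match="MyElement">
    <MyElement>
      <childName><xsl:value-of select="CHILD"/></childName>
      <childBez><xsl:value-of select="CHILDBEZ"/></childBez>
    </MyElement>
  </xsl:template>
</xsl:stylesheet>
```

Because the conditions become ordinary static XPath in the generated stylesheet, nothing has to be evaluated dynamically when it runs.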

How to replace 1st node attribute value in xml using xpath

In the XML below, I need to replace the namespace using XPath.
<application xmlns="http://ns.adobe.com/air/application/4.0">
<child id="1"></child>
<child id="2"></child>
</application>
I tried with
/application/@xmlns
and
/*[local-name()='application']/@*[local-name()='xmlns']
Both failed to give the desired output. To replace the text, I have used xmltask replace.
<xmltask source="${temp.file1}" dest="${temp.file1}">
<replace path="/application/@xmlns" withText="http://ns.adobe.com/air/application/16.0" />
</xmltask>
The problem is that xmlns is not an attribute. You cannot select it with XPath.
A namespace is part of the node name in XML: <foo xmlns="urn:foo-namespace" /> and <foo xmlns="urn:bar-namespace" /> are not two nodes with the same name and different attributes, they are two nodes with different names and no attributes.
If you want to change a namespace, you must construct a completely new node.
XSLT is better-suited to this task:
<!-- update-air-ns.xsl -->
<xsl:transform
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:air4="http://ns.adobe.com/air/application/4.0"
xmlns="http://ns.adobe.com/air/application/16.0"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="air4:*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
</xsl:transform>
This XSLT transformation does two things:
the first template (identity template) copies nodes recursively, unless there is a better matching template for a given node
the second template matches elements in the air4 namespace and constructs new elements that have the same local name but a different namespace. This happens because of the default namespace declaration in the XSLT. The http://ns.adobe.com/air/application/16.0 namespace is used for all newly constructed elements.
Applied to your input XML, the result is
<application xmlns="http://ns.adobe.com/air/application/16.0">
<child id="1"/>
<child id="2"/>
</application>
You can use Ant's xslt task:
<xslt in="${temp.file1}" out="${temp.file1}" style="update-air-ns.xsl" />

XSLT: poor performance due to complex XPath expression?

At some point in an XSLT program, I have the following:
<xsl:for-each select="tags/tag">
<xsl:apply-templates select="//shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id]">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
It seems that the XPath expression //shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id], which is rather complex, considerably slows down the execution of the program (compared to the execution time before adding the quoted piece of code, processing the same data, of course).
Do you think this is normal due to the relatively complex nature of the expression, and do you see how I could improve it so it performs better?
NB: in the XPath expression, film and //films/film, group and //groups/group refer to distinct elements.
See below a stripped-down sample of the XML input.
<program>
<tags>
<tag id="1">Tag1</tag>
<tag id="2">Tag2</tag>
<tag id="3">Tag3</tag>
</tags>
<films>
<film id="1">
Film1
<tag>2</tag><!-- References: /program/tags/tag/@id=2 -->
</film>
<film id="2">
Film2
<tag>1</tag><!-- References: /program/tags/tag/@id=1 -->
</film>
<film id="3">
Film3
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
<film id="4">
Film4
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
</films>
<groups>
<group id="1">
<film>3</film><!-- References: /program/films/film/@id=3 -->
<film>4</film><!-- References: /program/films/film/@id=4 -->
</group>
</groups>
<shows>
<show id="1"><!-- Show with film (=simple) -->
<film>1</film><!-- References: /program/films/film/@id=1 -->
<date>2011-12-12</date>
<time>12:00</time>
</show>
<show id="2"><!-- Show with group (=combined) -->
<group>1</group><!-- References: /program/groups/group/@id=1 -->
<date>2011-12-12</date>
<time>14:00</time>
</show>
</shows>
</program>
Explanations:
A tag is a property attached to a film (in fact, it's rather a category).
A group is an enumeration of films.
A show references either a film or a group.
What I want: for each tag, I'm looking for the shows referencing a film having the current tag and the shows referencing a group where at least one of the films has the current tag.
Double slashes (//) in XPath are CPU hogs when working with large documents, since every node in the document must be evaluated. If you can replace them with an absolute or relative path, you should see a noticeable improvement. If you can post the input schema and required output, we could be more specific.
e.g. With an absolute path
//shows/show[film=//films/film[tag=current()/@id]/@id]
becomes
/myroot/somepath/shows/show[film=/myroot/somepath/films/film[tag=current()/@id]/@id]
or if the shows and films are relative to the current node
./relativexpath/shows/show[film=./relativexpath/somepath/films/film[tag=current()/@id]/@id]
The answer by nonnb very likely points to the problem, but not really to an efficient solution ("cheaper" axes are better, but that alone doesn't give the kind of speedup that indexing the data does).
Note that the big problem is that the XPath expression predicate does another full traversal of the tree for each evaluation. You should use keys for stuff like this; this will (in most or even all XSLT implementations) make an indexed lookup possible, thereby reducing the runtime a lot.
Define keys for the films, groups and shows:
<xsl:key name="filmByTag" match="film" use="tag" />
<xsl:key name="groupsByFilm" match="group" use="film" />
<xsl:key name="showsByFilm" match="show" use="film" />
<xsl:key name="showsByGroup" match="show" use="group" />
And then use them like this (not tested, but you should get the idea):
<xsl:variable name="films" select="key('filmByTag', @id)/@id" />
<xsl:apply-templates select="key('showsByFilm', $films)|key('showsByGroup', key('groupsByFilm', $films)/@id)">
Your XPath expression seems to be doing a three-way join so unless it's optimized the performance is likely to be O(n^3) in the size of the source document. Optimization involves replacing the serial searches of the document by indexed lookups. There are two ways of achieving this: you can hand-optimize it by replacing the filter expressions with calls on the key() function (as indicated by Dimitre), or you can use an optimizing XSLT processor such as Saxon-EE, which should do the same optimizations automatically.
Define a key with xsl:key and then use the key function for the cross reference instead of that comparison you currently have. Show us a sample of the XML so that we can understand its structure, then we can help with concrete code.
Here are two complete solutions that should exhibit better performance:
Do note: Better performance will be registered on sufficiently large input samples only. On small input samples it isn't worth it to optimize.
I. Not using // (but not using keys)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vFilms" select="/*/films/film"/>
<xsl:variable name="vShows" select="/*/shows/show"/>
<xsl:variable name="vGroups" select="/*/groups/group"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"$vShows
[film
=
$vFilms
[tag=current()/@id]
/@id
or
group
=
$vGroups
[film
=
$vFilms
[tag=current()/@id]
/@id
]
/@id
]
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
II. Using keys
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kShowByFilmId" match="show"
use="film"/>
<xsl:key name="kShowByGroupId" match="show"
use="group"/>
<xsl:key name="kGroupByFilmId" match="group"
use="film"/>
<xsl:key name="kFilmByTag" match="film"
use="tag"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"key('kShowByFilmId',
key('kFilmByTag', current()/@id)/@id
)
|
key('kShowByGroupId',
key('kGroupByFilmId',
key('kFilmByTag', current()/@id)/@id
)
/@id
)
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Resources