Sort multiple XML by element value from another XML

Basically, what I'm trying to do here is merge and sort multiple XML files by the value of an element in a reference XML file, using XSLT.
> <xsl:variable name="refXml"
> select="document(concat(replace($refXmlTemp,'^file:',''),'/ref.xml'))"/>
>
>
> <xsl:for-each select="for $x in
> collection(string-join(($inputDir,'select=*.xml;recurse=yes;on-error=fail'),'?'))
> return
> (if (matches($refXml/root/descendant-or-self::issue/id[normalize-space(.)=normalize-space($x/art/item/id)]/number,'\w+')
> and matches($x/art/item/title,'\w+')) then saxon:discard-document($x)
> else ())">
> <xsl:sort select="$refXml/root/descendant-or-self::issue/id[normalize-space(.)=/art/item/id]/following-sibling::number"/>
The snippet above merges all the input XML files, but the result is not sorted.
It seems that xsl:sort only takes effect when it points at a value inside the XML document that is currently being processed.
Please advise on how I can use ref.xml as a reference for sorting.
Here's a sample input of ref.xml:
<root>
<issue>
<id>wlu-101</id>
<number>1</number>
</issue>
<issue>
<id>wlu-143</id>
<number>2</number>
</issue>
<issue-group>
<issue>
<id>wlu-144</id>
<number>3</number>
</issue>
<issue-group>
<issue>
<id>wlu-185</id>
<number>4</number>
</issue>
</issue-group>
</issue-group>
</root>

Replace <xsl:sort select="$refXml/root/descendant-or-self::issue/id[normalize-space(.)=/art/item/id]/following-sibling::number"/> with
<xsl:sort select="key('ref', /art/item/id, $refXml)/number"/>
after defining
<xsl:key name="ref" match="issue" use="normalize-space(id)"/>
As an alternative use <xsl:sort select="$refXml//issue[normalize-space(id)=current()/art/item/id]/number"/>.
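The original sort key fails because, inside the predicate, /art/item/id is resolved against the root of the tree that holds the id element being tested, which is ref.xml and not the input document. A minimal sketch of the key-based approach in a complete stylesheet follows. It assumes XSLT 2.0 with Saxon (the collection URI syntax is Saxon-specific, as in the question), one id per input document, and at most one matching issue per id. The parameter names refUri and inputDir, the named template main, and the merged wrapper element are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameters: location of ref.xml and of the input folder -->
  <xsl:param name="refUri" select="'ref.xml'"/>
  <xsl:param name="inputDir" select="'input'"/>

  <xsl:variable name="refXml" select="document($refUri)"/>

  <!-- Index every issue, at any nesting depth, by its trimmed id -->
  <xsl:key name="ref" match="issue" use="normalize-space(id)"/>

  <!-- Entry point; the stylesheet is started from this named template -->
  <xsl:template name="main">
    <merged>
      <xsl:for-each select="collection(concat($inputDir, '?select=*.xml'))">
        <!-- The context item is one input document, so art/item/id is read
             from it; the third key() argument makes the lookup run
             against the reference document instead -->
        <xsl:sort select="key('ref', normalize-space(art/item/id), $refXml)/number"
                  data-type="number"/>
        <xsl:copy-of select="art"/>
      </xsl:for-each>
    </merged>
  </xsl:template>
</xsl:stylesheet>
```

The data-type="number" attribute is an addition to the answer above: without it the numbers are compared as text, so 10 would sort before 2.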


Getting invalid date issue while comparing dates in XSLT code

I need to validate a couple of scenarios: the offer start date should fall before the offer end date, and the offer start date should fall after the account start date. If either condition is not met, an error should be thrown.
The offer start date and offer end date values appear as space-separated lists in the offerStartDate and offerEndDate elements respectively.
Below is the sample xml code:
<Accounts>
<Account>
<AccountStartDate>2020-12-01</AccountStartDate>
<offerStartDate>2020-10-02 2020-11-02</offerStartDate>
<offerEndDate>2019-10-02 2019-11-02</offerEndDate>
</Account>
</Accounts>
Below is the sample xslt code:
<xsl:for-each select="Accounts/Account">
<xsl:variable name="offerSDate" select="offerStartDate"/>
<xsl:variable name="offerEDate" select="offerEndDate"/>
<xsl:if test="$offerSDate > xs:date(AccountStartDate)">
<Error>
<xsl:text>Error: Invalid offer Date
</xsl:text>
</Error>
</xsl:if>
<xsl:if test="$offerSDate > $offerEDate">
<Error>
<xsl:text>Error: Invalid offer Date
</xsl:text>
</Error>
</xsl:if>
</xsl:for-each>
After executing the XSLT code, I get an invalid date error for the value "2020-10-02 2020-11-02".
If you want to do a separate comparison for each date in offerStartDate, then you could do (in XSLT 2.0) either:
<xsl:for-each select="Account">
<xsl:if test="some $offerStartDate in tokenize(offerStartDate, ' ') satisfies xs:date($offerStartDate) gt xs:date(AccountStartDate)">
<Error>error message</Error>
</xsl:if>
</xsl:for-each>
or (depending on what meaning your test should have):
<xsl:for-each select="Account">
<xsl:if test="every $offerStartDate in tokenize(offerStartDate, ' ') satisfies xs:date($offerStartDate) gt xs:date(AccountStartDate)">
<Error>error message</Error>
</xsl:if>
</xsl:for-each>
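To cover both rules from the question at once, the start and end dates can be paired by position. The following is only a sketch under assumptions: XSLT 2.0, the n-th start date belongs to the n-th end date, and an error is reported when a start date is not after the account start date or not before its end date. The Errors wrapper element is hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/Accounts">
    <Errors>
      <xsl:for-each select="Account">
        <xsl:variable name="accountStart"
            select="xs:date(normalize-space(AccountStartDate))"/>
        <!-- Split each list on spaces and cast every token separately;
             casting the whole list at once is what raises the invalid date error -->
        <xsl:variable name="starts" as="xs:date*"
            select="for $d in tokenize(normalize-space(offerStartDate), ' ')
                    return xs:date($d)"/>
        <xsl:variable name="ends" as="xs:date*"
            select="for $d in tokenize(normalize-space(offerEndDate), ' ')
                    return xs:date($d)"/>
        <xsl:for-each select="1 to count($starts)">
          <xsl:variable name="i" select="."/>
          <!-- Assumed rule: start must be after the account start
               and before the end date at the same position -->
          <xsl:if test="$starts[$i] le $accountStart or $starts[$i] ge $ends[$i]">
            <Error>Error: Invalid offer date <xsl:value-of select="$starts[$i]"/></Error>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </Errors>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, both start dates fall before the account start date and after their end dates, so both would be reported.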
Probably the easiest way to do it with only XSLT is to convert your XML from:
<Accounts>
<Account>
<AccountStartDate>2020-12-01</AccountStartDate>
<offerStartDate>2020-10-02 2020-11-02</offerStartDate>
<offerEndDate>2019-10-02 2019-11-02</offerEndDate>
</Account>
</Accounts>
To something like:
<Accounts>
<Account>
<AccountStartDate>2020-12-01</AccountStartDate>
<offer>
<offerStartDate>2020-10-02</offerStartDate>
<offerEndDate>2019-10-02</offerEndDate>
</offer>
<offer>
<offerStartDate>2020-11-02</offerStartDate>
<offerEndDate>2019-11-02</offerEndDate>
</offer>
</Account>
</Accounts>
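One possible way to produce that restructured form is sketched below. It assumes XSLT 2.0 (tokenize is not available in 1.0) and that start and end dates pair up by position; neither assumption is stated in the answer above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything that has no more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Account">
    <xsl:variable name="starts"
        select="tokenize(normalize-space(offerStartDate), ' ')"/>
    <xsl:variable name="ends"
        select="tokenize(normalize-space(offerEndDate), ' ')"/>
    <xsl:copy>
      <!-- Keep all children except the two space-separated lists -->
      <xsl:apply-templates
          select="@*|node()[not(self::offerStartDate or self::offerEndDate)]"/>
      <!-- Emit one offer element per position in the lists -->
      <xsl:for-each select="1 to count($starts)">
        <xsl:variable name="i" select="."/>
        <offer>
          <offerStartDate><xsl:value-of select="$starts[$i]"/></offerStartDate>
          <offerEndDate><xsl:value-of select="$ends[$i]"/></offerEndDate>
        </offer>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Once each offer has its own element, the date checks reduce to simple comparisons between single values.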

How to convert content between two patterns into specific format using shell

How do we convert the content between two patterns into a specific format using shell code? The following sample XML, which starts with <Mapping> and ends with </Mapping>, needs to be converted to the plain-text format shown below.
Sample input code:
<Mapping name="temp1"> /*rule name will the value of Mapping name*/
<phpCode>
boolean_out = copyfunc temp /*rule content output */
</phpCode>
</Mapping>
The value of name will be the rule name and the value of boolean_out will be rule content.
Sample output code:
rule temp1 { // temp1 is the mapping value
copyfunc temp //boolean_out value is rule content
}
Given input.xml containing:
<Mapping name="temp1"> /*rule name will the value of Mapping name*/
<phpCode>
boolean_out = copyfunc temp /*rule content output */
</phpCode>
</Mapping>
And transform.xsl containing:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="utf-8" indent="no" />
<xsl:template match="Mapping">
<xsl:text>rule </xsl:text>
<xsl:value-of select="@name" />
<xsl:text> { // </xsl:text>
<xsl:value-of select="@name" />
<xsl:text> is the mapping value</xsl:text>
<xsl:apply-templates select="./phpCode" mode="php-code" />
<xsl:text>
}
</xsl:text>
</xsl:template>
<xsl:template match="*" mode="php-code">
<xsl:text>
</xsl:text>
<xsl:value-of select="substring-before(substring-after(normalize-space(text()),'= '),'/*')" />
<xsl:text>//</xsl:text>
<xsl:value-of select="substring-before(normalize-space(text()),'=')" />
<xsl:text> value is rule content</xsl:text>
</xsl:template>
</xsl:stylesheet>
An XSLT transform produces:
rule temp1 { // temp1 is the mapping value
copyfunc temp //boolean_out value is rule content
}
The method of invoking the transform is platform and tool-specific. Command line tools can be used to invoke a transformation, though there are any number of ways to run the XSL script. For example, an xsltproc command is:
xsltproc transform.xsl input.xml
One can use msxsl.exe similarly, except that the command arguments are reversed.

How to replace 1st node attribute value in xml using xpath

In the XML below, I need to replace the namespace using XPath.
<application xmlns="http://ns.adobe.com/air/application/4.0">
<child id="1"></child>
<child id="2"></child>
</application>
I tried with
/application/@xmlns
and
/*[local-name()='application']/@[local-name()='xmlns']
Both failed to give the desired output. To replace the text, I have used xmltask replace.
<xmltask source="${temp.file1}" dest="${temp.file1}">
<replace path="/application/@xmlns" withText="http://ns.adobe.com/air/application/16.0" />
</xmltask>
The problem is that xmlns is not an attribute. You cannot select it with XPath.
A namespace is part of the node name in XML: <foo xmlns="urn:foo-namespace" /> and <foo xmlns="urn:bar-namespace" /> are not two nodes with the same name and different attributes, they are two nodes with different names and no attributes.
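A small diagnostic stylesheet can make this visible. It is only an illustrative sketch, not part of the fix: it prints the local name of the document element, its namespace URI, and the number of attributes it carries.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Local part of the document element's name -->
    <xsl:value-of select="local-name(/*)"/>
    <xsl:text> | </xsl:text>
    <!-- Namespace part of the name; this is what the xmlns declaration sets -->
    <xsl:value-of select="namespace-uri(/*)"/>
    <xsl:text> | attributes: </xsl:text>
    <!-- Namespace declarations do not appear on the attribute axis -->
    <xsl:value-of select="count(/*/@*)"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input, this prints application | http://ns.adobe.com/air/application/4.0 | attributes: 0, which is why an attribute path such as /application/@xmlns selects nothing.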
If you want to change a namespace, you must construct a completely new node.
XSLT is better-suited to this task:
<!-- update-air-ns.xsl -->
<xsl:transform
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:air4="http://ns.adobe.com/air/application/4.0"
xmlns="http://ns.adobe.com/air/application/16.0"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="air4:*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
</xsl:transform>
This XSLT transformation does two things:
The first template (the identity template) copies nodes recursively, unless there is a better-matching template for a given node.
The second template matches elements in the air4 namespace and constructs new elements that have the same local name but a different namespace. This happens because of the default namespace declaration in the XSLT: the http://ns.adobe.com/air/application/16.0 namespace is used for all newly constructed elements.
Applied to your input XML, the result is
<application xmlns="http://ns.adobe.com/air/application/16.0">
<child id="1"/>
<child id="2"/>
</application>
You can use Ant's xslt task:
<xslt in="${temp.file1}" out="${temp.file1}" style="update-air-ns.xsl" />

How to get sum of an attribute value which is referenced by id multiple times with xpath in xslt 1.0?

I really do hope that my title is at least a bit clear.
Important: I can only use XSLT 1.0 because the project needs to work with the MSXML XSLT processor.
What I try to do:
I generate documents containing information about rooms. Rooms have walls, and I need the sum of the wall areas per room.
The input xml file I get is dynamically created by another program.
Changing the structure of the input xml file is not the solution, trust me, it's needed like that and is much more complex than I show you here.
My XML (the innerArea attribute in the wall element has to get summed up):
<root>
<floor id="30" name="EG">
<flat name="Wohnung" nr="1">
<Room id="49" area="93.08565">
<WallSegments>
<WallSegment id="45"/>
<WallSegment id="42"/>
<WallSegment id="39"/>
</WallSegments>
</Room>
</flat>
</floor>
<components>
<Wall id="20" innerArea="20.7654"/>
<wallSegment id="45" wall="20">[...]</wallSegment>
<Wall id="21" innerArea="12.45678"/>
<wallSegment id="42" wall="21">[...]</wallSegment>
<Wall id="22" innerArea="17.8643"/>
<wallSegment id="39" wall="22">[...]</wallSegment>
</components>
</root>
With my XSLT I was able to reach the values of the walls which belong to a room.
But I have really no idea how I could get the sum of the value out of that.
My XSLT:
<xsl:for-each select="flat/Room">
<xsl:for-each select="WallSegments/WallSegment">
<xsl:variable name="curWallSegId" select="@id"/>
<xsl:for-each select="/root/components/wallSegment[@id = $curWallSegId]">
<xsl:variable name="curWallId" select="@wall"/>
<xsl:for-each select="/root/components/Wall[@id = $curWallId]">
<!--I didn't expect that this was working, but at least I tried :D-->
<xsl:value-of select="sum(@AreaInner)"/>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
Desired Output should be something like...
[...]
<paragraph>
Room 1:
Wall area: 51.09 m²
[...]
</paragraph>
[...]
So I hope I described my problem properly. If not: I am sorry, you may beat me right into the face x)
It's best to use keys to get "related" data. Place this at the top of your stylesheet, outside of any template:
<xsl:key name="wall" match="components/Wall" use="@id" />
<xsl:key name="wallSegment" match="components/wallSegment" use="@id" />
Then:
<xsl:for-each select="flat/Room">
<paragraph>
<xsl:text>Room </xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>:
Wall area: </xsl:text>
<xsl:value-of select="format-number(sum(key('wall', key('wallSegment', WallSegments/WallSegment/@id)/@wall)/@innerArea), '0.00m²')"/>
<xsl:text>
</xsl:text>
</paragraph>
</xsl:for-each>
will return:
<paragraph>Room 1:
Wall area: 51.09m²</paragraph>
If what you need is the area of every room, this is one way of getting it:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/root/floor">
<xsl:for-each select="flat/Room">
<xsl:variable name="currentRoomSegmentsIds" select="WallSegments/WallSegment/@id"/>
<xsl:variable name="currentRoomWallsIds" select="/root/components/wallSegment[@id = $currentRoomSegmentsIds]/@wall"/>
<xsl:variable name="currentRoomWallsInnerAreas" select="/root/components/Wall[@id = $currentRoomWallsIds]/@innerArea"/>
Id of the room = <xsl:value-of select="@id"/>.
Area of the room = <xsl:value-of select="sum($currentRoomWallsInnerAreas)"/>
</xsl:for-each> <!-- End of for each room -->
</xsl:template>
</xsl:stylesheet>
This produces the following result:
Id of the room = 49.
Area of the room = 51.08648

XSLT: poor performance due to complex XPath expression?

At some point in an XSLT program, I have the following:
<xsl:for-each select="tags/tag">
<xsl:apply-templates select="//shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id]">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
It seems that the XPath expression //shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id], which is rather complex, considerably slows down the execution of the program (compared to the execution time before adding the quoted piece of code, processing the same data, of course).
Do you think this is normal due to the relatively complex nature of the expression, and do you see how I could improve it so it performs better?
NB: in the XPath expression, film and //films/film, group and //groups/group refer to distinct elements.
See below a stripped-down sample of the XML input.
<program>
<tags>
<tag id="1">Tag1</tag>
<tag id="2">Tag2</tag>
<tag id="3">Tag3</tag>
</tags>
<films>
<film id="1">
Film1
<tag>2</tag><!-- References: /program/tags/tag/@id=2 -->
</film>
<film id="2">
Film2
<tag>1</tag><!-- References: /program/tags/tag/@id=1 -->
</film>
<film id="3">
Film3
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
<film id="4">
Film4
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
</films>
<groups>
<group id="1">
<film>3</film><!-- References: /program/films/film/@id=3 -->
<film>4</film><!-- References: /program/films/film/@id=4 -->
</group>
</groups>
<shows>
<show id="1"><!-- Show with film (=simple) -->
<film>1</film><!-- References: /program/films/film/@id=1 -->
<date>2011-12-12</date>
<time>12:00</time>
</show>
<show id="2"><!-- Show with group (=combined) -->
<group>1</group><!-- References: /program/groups/group/@id=1 -->
<date>2011-12-12</date>
<time>14:00</time>
</show>
</shows>
</program>
Explanations:
A tag is a property attached to a film (in fact, it's rather a category).
A group is an enumeration of films.
A show references either a film or a group.
What I want: for each tag, I'm looking for the shows referencing a film having the current tag and the shows referencing a group where at least one of the films has the current tag.
Double slashes in XPath are performance and CPU hogs when working with large documents, since every node in the document must be evaluated. If you can replace them with either absolute or relative paths, you should see a noticeable improvement. If you can post the input schema and required output, we could be more specific.
e.g. With an absolute path
//shows/show[film=//films/film[tag=current()/@id]/@id]
becomes
/myroot/somepath/shows/show[film=/myroot/somepath/films/film[tag=current()/@id]/@id]
or if the shows and films are relative to the current node
./relativexpath/shows/show[film=./relativexpath/somepath/films/film[tag=current()/@id]/@id]
The answer by nonnb very likely points to the problem, but not really to an efficient solution: "cheaper" axes help, but on their own they do not give the speed-up that indexing the data does.
Note that the big problem is that the XPath expression predicate does another full traversal of the tree for each evaluation. You should use keys for stuff like this; this will (in most or even all XSLT implementations) make an indexed lookup possible, thereby reducing the runtime a lot.
Define keys that index films by tag, groups by film, and shows by film or by group:
<xsl:key name="filmByTag" match="film" use="tag" />
<xsl:key name="groupsByFilm" match="group" use="film" />
<xsl:key name="showsByFilm" match="show" use="film" />
<xsl:key name="showsByGroup" match="show" use="group" />
And then use it like this (not tested, but you should get the idea):
<xsl:variable name="films" select="key('filmByTag', @id)/@id" />
<xsl:apply-templates select="key('showsByFilm', $films) | key('showsByGroup', key('groupsByFilm', $films)/@id)">
<!-- xsl:sort elements as in the question -->
</xsl:apply-templates>
Your XPath expression seems to be doing a three-way join so unless it's optimized the performance is likely to be O(n^3) in the size of the source document. Optimization involves replacing the serial searches of the document by indexed lookups. There are two ways of achieving this: you can hand-optimize it by replacing the filter expressions with calls on the key() function (as indicated by Dimitre), or you can use an optimizing XSLT processor such as Saxon-EE, which should do the same optimizations automatically.
Define a key with xsl:key and then use the key function for the cross reference instead of that comparison you currently have. Show us a sample of the XML so that we can understand its structure, then we can help with concrete code.
Here are two complete solutions that should exhibit better performance:
Do note: Better performance will be registered on sufficiently large input samples only. On small input samples it isn't worth it to optimize.
I. Not using // (but not using keys)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vFilms" select="/*/films/film"/>
<xsl:variable name="vShows" select="/*/shows/show"/>
<xsl:variable name="vGroups" select="/*/groups/group"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"$vShows
[film
=
$vFilms
[tag=current()/@id]
/@id
or
group
=
$vGroups
[film
=
$vFilms
[tag=current()/@id]
/@id
]
/@id
]
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
II. Using keys
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kShowByFilmId" match="show"
use="film"/>
<xsl:key name="kShowByGroupId" match="show"
use="group"/>
<xsl:key name="kGroupByFilmId" match="group"
use="film"/>
<xsl:key name="kFilmByTag" match="film"
use="tag"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"key('kShowByFilmId',
key('kFilmByTag', current()/@id)/@id
)
|
key('kShowByGroupId',
key('kGroupByFilmId',
key('kFilmByTag', current()/@id)/@id
)
/@id
)
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
