I have an XML file with the following content:
<class>OverAll</class>
<char>
<rank> 1</rank>
<name> yyy</name>
<level> 9</level>
<experience>53842</experience>
<class>xxx</class>
</char>
<char>
<rank> 2</rank>
<name>aaa</name>
<level> 9</level>
<experience>53074</experience>
<class>zzz</class>
</char>
...and so on. I want to extract the number between the <experience> and </experience> tags and replace it with a modified version of that number. For example, after the script runs the file should look like this:
<class>OverAll</class>
<char>
<rank> 1</rank>
<name> yyy</name>
<level> 9</level>
<experience>53.842</experience>
<class>xxx</class>
</char>
<char>
<rank> 2</rank>
<name>aaa</name>
<level> 9</level>
<experience>53.074</experience>
<class>zzz</class>
</char>
(I want to add a thousands separator. Values above 1 million also occur, and those need two thousands separators :)
I am able to find and replace the number, but I don't know how to take the matched number, modify it, and write it back to the line.
Perhaps someone can help here?
Thank you very much :)
A one-liner sed can do it, assuming the last three digits are always decimal:
sed -zE 's#([[:digit:]]{7,})([[:digit:]]{1})[[:space:]]*(</experience[[:space:]]*>)#\1.\2\3#g;s#([[:digit:]]{3})[[:space:]]*(</experience[[:space:]]*>)#.\1\2#g'
sed parameters breakdown:
-zE
-z or --null-data: Separate lines by NULL characters to allow pattern matching across lines, because spaces, tabs and newlines are allowed by the XML syntax before the > bracket of a tag.
-E or --regexp-extended: Use extended regular expressions in the script (for portability use POSIX -E).
s#([[:digit:]]{7,})([[:digit:]]{1})[[:space:]]*(</experience[[:space:]]*>)#\1.\2\3#g:
Insert a decimal point before the last digit of experience numbers with eight or more digits (seven plus one: a million or more, with one extra decimal digit).
s#([[:digit:]]{3})[[:space:]]*(</experience[[:space:]]*>)#.\1\2#g:
Insert a decimal point before the last three digits of experience numbers ending in three digits (this automatically excludes the millions already processed by the previous sed command, since those no longer end in three consecutive digits).
Keep in mind that this does not parse the XML: it replaces numbers in any <experience> tag, anywhere in the XML tree.
Regular expressions are not meant to parse markup languages. There are better, more efficient, dedicated tools for manipulating XML with XSLT/XPath, such as saxon, xsltproc, xmllint...
Using proper XML processing with xsltproc:
decimal-experience.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Cosmetic sugar to have the xml declaration header and indent -->
<xsl:output omit-xml-declaration="no" indent="yes"/>
<!-- Cosmetic sugar to remove unneeded spaces in elements -->
<xsl:strip-space elements="*"/>
<!-- Copy all the nodes as-is from the source xml -->
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
<!-- Process the content of the experience tag within the char tag -->
<xsl:template match="char/experience/text()">
<!-- If the experience is not already in decimal form -->
<xsl:if test="not(contains(., '.'))">
<xsl:choose>
<!-- When the experience is less than a Million -->
<xsl:when test=". &lt; 9999999">
<!-- The last three digits are decimals -->
<xsl:value-of select="format-number(. div 1000, '0.000')"/>
</xsl:when>
<!-- Otherwise the experience is a Million or more -->
<xsl:otherwise>
<!-- The last digit is decimal -->
<xsl:value-of select="format-number(. div 10, '0.0')"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Running the XSLT transformation above:
xsltproc decimal-experience.xsl characters.xml
Example output:
I created a valid, fictitious characters.xml with a span root element, because your extract was not well-formed XML (it has no single root element).
<?xml version="1.0"?>
<span>
<class>OverAll</class>
<char>
<rank> 1</rank>
<name> yyy</name>
<level> 9</level>
<experience>53.842</experience>
<class>xxx</class>
</char>
<char>
<rank> 2</rank>
<name>aaa</name>
<level> 9</level>
<experience>53.074</experience>
<class>zzz</class>
</char>
<char>
<rank> 3</rank>
<name>Million</name>
<level>42</level>
<experience>5585307.4</experience>
<class>zzz</class>
</char>
</span>
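As a side note, the question literally asks for thousands separators (53842 becomes 53.842, with two separators above a million) rather than decimals. Here is a minimal Python sketch of that reading, assuming the data is wrapped in a single root element as above. The helper names group_thousands and format_experience are made up for illustration, and leading zeros are not preserved.

```python
import xml.etree.ElementTree as ET


def group_thousands(digits: str, sep: str = ".") -> str:
    """Insert `sep` between groups of three digits, counted from the right."""
    return f"{int(digits):,}".replace(",", sep)


def format_experience(xml_text: str) -> str:
    """Rewrite every <experience> value that is a plain run of ASCII digits."""
    root = ET.fromstring(xml_text)
    for exp in root.iter("experience"):
        value = (exp.text or "").strip()
        # Skip values that are empty or already formatted (e.g. "53.842").
        if value.isascii() and value.isdigit():
            exp.text = group_thousands(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample, shortened from the data in the question.
sample = """<span>
<char><name>yyy</name><experience>53842</experience></char>
<char><name>Million</name><experience>55853074</experience></char>
</span>"""

result = format_experience(sample)
print(result)
```

Because the value is parsed as an integer and regrouped, running the function a second time leaves already formatted values untouched.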
Looking at this XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
root
<child>
child 1
<grandchild>
grandchild 1
</grandchild>
<yetanothergrandchild>
yetanothergrandchild 1
</yetanothergrandchild>
</child>
<child>
child 2
<grandchild>
grandchild 2
</grandchild>
<yetanothergrandchild>
yetanothergrandchild 2
</yetanothergrandchild>
</child>
</root>
and that XSL
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output media-type="text" omit-xml-declaration="yes"/>
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="simple"
page-height="29.7cm"
page-width="21cm"
margin-top="1cm"
margin-bottom="2cm"
margin-left="2.5cm"
margin-right="2.5cm">
<fo:region-body margin-top="3cm"/>
<fo:region-before extent="3cm"/>
<fo:region-after extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="simple">
<fo:flow flow-name="xsl-region-body">
<fo:block font-size="12pt"
font-family="sans-serif"
line-height="15pt"
space-after.optimum="3pt"
text-align="justify">
<xsl:value-of select="root/child/grandchild"/>
<xsl:value-of select="root/child/yetanothergrandchild"/>
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet>
If I put the xsl:stylesheet version to 1.0, the output is:
grandchild 1 yetanothergrandchild 1
If I put it to 2.0, the output is:
grandchild 1 grandchild 2 yetanothergrandchild 1 yetanothergrandchild 2
Of course, I have already read through various lists of differences between XSLT 1.0 and 2.0, but I cannot find any hint of a change that could cause this.
Can somebody tell me how and why the behaviour differs?
See https://www.w3.org/TR/xslt20/#backwards and then https://www.w3.org/TR/xslt20/#incompatibilities saying
J.1.3 Backwards Compatibility Behavior
Some XSLT constructs behave
differently under XSLT 2.0 depending on whether backwards compatible
behavior is enabled. In these cases, the behavior may be made
compatible with XSLT 1.0 by ensuring that backwards compatible
behavior is enabled (which is done using the [xsl:]version attribute).
These constructs are as follows:
If the xsl:value-of instruction has no separator attribute, and the
value of the select expression is a sequence of more than one item,
then under XSLT 2.0 all items in the sequence will be output, space
separated, while in XSLT 1.0, all items after the first will be
discarded.
...
In XSLT 1.0 the xsl:value-of instruction returns the string-value of the first node in the selected node-set.
In XSLT 2.0 the instruction returns the value of every node in the selected sequence, separated by a space or by the string specified in the separator attribute.
These are my own wordings; the specs are harder to follow.
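To make the difference concrete, here is a small Python emulation of the two behaviours. It is only a sketch: value_of_v1 and value_of_v2 are hypothetical helpers that mimic the rule described above, not part of any XSLT processor, and the sample document is a shortened version of the one in the question.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root>"
    "<child><grandchild>grandchild 1</grandchild></child>"
    "<child><grandchild>grandchild 2</grandchild></child>"
    "</root>"
)


def string_value(node: ET.Element) -> str:
    """String-value of an element: all of its descendant text, concatenated."""
    return "".join(node.itertext())


def value_of_v1(nodes: list) -> str:
    """XSLT 1.0 rule: only the first selected node contributes."""
    return string_value(nodes[0]) if nodes else ""


def value_of_v2(nodes: list, separator: str = " ") -> str:
    """XSLT 2.0 rule: every selected item is output, joined by the separator."""
    return separator.join(string_value(n) for n in nodes)


# Equivalent of select="root/child/grandchild" evaluated from the document node.
selected = doc.findall("child/grandchild")
print(value_of_v1(selected))
print(value_of_v2(selected))
```

In a real 2.0 stylesheet, writing select="(root/child/grandchild)[1]" would give the 1.0 result without switching the version attribute.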
I have a requirement to validate a couple of scenarios: the offer start date should fall before the offer end date, and the offer start date should fall after the account start date. If either condition is not met, an error should be raised.
The offer start dates and offer end dates appear as space-separated values in the offerStartDate and offerEndDate elements respectively.
Below is the sample xml code:
<Accounts>
<Account>
<AccountStartDate>2020-12-01</AccountStartDate>
<offerStartDate>2020-10-02 2020-11-02</offerStartDate>
<offerEndDate>2019-10-02 2019-11-02</offerEndDate>
</Account>
</Accounts>
Below is the sample xslt code:
<xsl:for-each select="Accounts/Account">
<xsl:variable name="offerSDate" select="offerStartDate"/>
<xsl:variable name="offerEDate" select="offerEndDate"/>
<xsl:if test="$offerSDate > xs:date(AccountStartDate)">
<Error>
<xsl:text>Error: Invalid offer Date
</xsl:text>
</Error>
</xsl:if>
<xsl:if test="$offerSDate > $offerEDate">
<Error>
<xsl:text>Error: Invalid offer Date
</xsl:text>
</Error>
</xsl:if>
</xsl:for-each>
After executing the XSLT code, I get an invalid date error for "2020-10-02 2020-11-02".
If you want to do a separate comparison for each date in offerStartDate, then you could do (in XSLT 2.0) either:
<xsl:for-each select="Account">
<xsl:if test="some $offerStartDate in tokenize(offerStartDate, ' ') satisfies xs:date($offerStartDate) gt xs:date(AccountStartDate)">
<Error>error message</Error>
</xsl:if>
</xsl:for-each>
or (depending on what meaning your test should have):
<xsl:for-each select="Account">
<xsl:if test="every $offerStartDate in tokenize(offerStartDate, ' ') satisfies xs:date($offerStartDate) gt xs:date(AccountStartDate)">
<Error>error message</Error>
</xsl:if>
</xsl:for-each>
Probably the easiest way to do it with only XSLT is to convert your XML from:
<Accounts>
<Account>
<AccountStartDate>2020-12-01</AccountStartDate>
<offerStartDate>2020-10-02 2020-11-02</offerStartDate>
<offerEndDate>2019-10-02 2019-11-02</offerEndDate>
</Account>
</Accounts>
To something like:
<Accounts>
<Account>
<AccountStartDate>2020-12-01</AccountStartDate>
<offer>
<offerStartDate>2020-10-02</offerStartDate>
<offerEndDate>2019-10-02</offerEndDate>
</offer>
<offer>
<offerStartDate>2020-11-02</offerStartDate>
<offerEndDate>2019-11-02</offerEndDate>
</offer>
</Account>
</Accounts>
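Once the dates are paired up like that, both checks become simple per-pair comparisons. Here is a rough Python sketch of the same pairing logic, assuming the n-th start date belongs to the n-th end date (the question does not state this explicitly). validate_offers is a hypothetical helper, not something from the question.

```python
from datetime import date


def validate_offers(account_start: str, offer_starts: str, offer_ends: str) -> list:
    """Return one error message per violated rule, for each start/end pair."""
    account = date.fromisoformat(account_start)
    # Split the space-separated element content into individual dates.
    starts = [date.fromisoformat(tok) for tok in offer_starts.split()]
    ends = [date.fromisoformat(tok) for tok in offer_ends.split()]

    errors = []
    # Assumption: dates are paired by position.
    for start, end in zip(starts, ends):
        if not start > account:
            errors.append(f"offer start {start} is not after account start {account}")
        if not start < end:
            errors.append(f"offer start {start} is not before offer end {end}")
    return errors


# Values taken from the sample XML in the question.
errors = validate_offers("2020-12-01", "2020-10-02 2020-11-02", "2019-10-02 2019-11-02")
for message in errors:
    print(message)
```

With the sample values every pair violates both rules, so four messages are printed.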
How can we check for a value inside a node, i.e.
Hurricane's Grill Darling Harbour
Actually, I am not able to match the apostrophe with the code below:
exists(//ns1:name[text()='Hurricane's Grill Darling Harbour']).
Getting error message:
RuntimeException:net.sf.saxon.trans.XPathException: XPath syntax error at char 33 on line 2 in {...name[text()='Hurricane's Gr...}: expected "]", found name "s"
In XPath 2.0 an apostrophe or a double quote can be escaped by simply doubling that character. See rules [75] and [76] here: http://www.w3.org/TR/xpath20/#terminal-symbols
Use:
exists(//ns1:name[text()='Hurricane''s Grill Darling Harbour'])
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns1="ns1">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:sequence
select="exists(//ns1:name[text()='Hurricane''s Grill Darling Harbour'])"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<ns1:name xmlns:ns1="ns1">Hurricane's Grill Darling Harbour</ns1:name>
the wanted, correct result is produced:
true
In case you only can use an XPath 1.0 expression within an XML document and the string contains both kinds of quotes, use:
boolean(//ns1:name
[text()=concat('Hurricane "Mathew"', " strength's value")])
Here is a full XSLT 1.0 - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns1="ns1">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"boolean(//ns1:name
[text()=concat('Hurricane "Mathew"', " strength's value")])"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<ns1:name xmlns:ns1="ns1">Hurricane "Mathew" strength's value</ns1:name>
the wanted, correct result is produced:
true
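If the expression is assembled in code rather than typed by hand, the same quoting rules can be applied automatically. Below is a minimal Python sketch that turns an arbitrary string into an XPath 1.0 string literal, falling back to concat() only when both kinds of quotes occur. xpath_literal is a hypothetical helper name, not part of Saxon or any other library.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 expression that evaluates to `text`."""
    if "'" not in text:
        return f"'{text}'"          # no apostrophe: single quotes are safe
    if '"' not in text:
        return f'"{text}"'          # no double quote: double quotes are safe
    # Both kinds occur: cut at each apostrophe and glue the pieces back
    # together with concat(), quoting the apostrophe itself as "'".
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


name = "Hurricane's Grill Darling Harbour"
print(f"exists(//ns1:name[text()={xpath_literal(name)}])")
```

The concat() form is also accepted by XPath 2.0 processors, so the helper does not need to know which version it is targeting.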
Saxon started supporting some of the XPath 2.0 requirements in version 7.0.
SOAPUI version 5.1.2 uses Saxon 9.1 (the Saxon change history for newer versions can be viewed here)
Then, as @DimitreNovatchev explains perfectly, one option to escape ' in XPath 2.0 text is to double the symbol as '':
exists(//ns1:name[text()='Hurricane''s Grill Darling Harbour'])
I only want to add another possibility: since " quotes are also valid, you can wrap the whole text in " and use ' inside. So in your SOAPUI XPath match assertion you can use:
exists(//ns1:name[text()="Hurricane's Grill Darling Harbour"])
Note that I propose this because I'm assuming you only need ' inside the text you are matching. If you also need " inside text wrapped in ", you have to double the ", just as with '.
Additionally note that in SOAPUI it's possible to use * as a wildcard for namespaces so you can reduce your expression:
declare namespace ns1='http://somenamespace/';
exists(//ns1:name[text()="Hurricane's Grill Darling Harbour"])
To:
exists(//*:name[text()="Hurricane's Grill Darling Harbour"])
I really do hope that my title is at least a bit clear.
Important: I can only use XSLT 1.0 because the project needs to work with the MSXML XSLT processor.
What I try to do:
I generate documents containing information about rooms. Rooms have walls, and I need the sum of the wall areas per room.
The input xml file I get is dynamically created by another program.
Changing the structure of the input xml file is not the solution, trust me, it's needed like that and is much more complex than I show you here.
My XML (the innerArea attribute in the wall element has to get summed up):
<root>
<floor id="30" name="EG">
<flat name="Wohnung" nr="1">
<Room id="49" area="93.08565">
<WallSegments>
<WallSegment id="45"/>
<WallSegment id="42"/>
<WallSegment id="39"/>
</WallSegments>
</Room>
</flat>
</floor>
<components>
<Wall id="20" innerArea="20.7654"/>
<wallSegment id="45" wall="20">[...]</wallSegment>
<Wall id="21" innerArea="12.45678"/>
<wallSegment id="42" wall="21">[...]</wallSegment>
<Wall id="22" innerArea="17.8643"/>
<wallSegment id="39" wall="22">[...]</wallSegment>
</components>
</root>
With my XSLT I was able to reach the values of the walls which belong to a room.
But I have really no idea how I could get the sum of the value out of that.
My XSLT:
<xsl:for-each select="flat/Room">
<xsl:for-each select="WallSegments/WallSegment">
<xsl:variable name="curWallSegId" select="@id"/>
<xsl:for-each select="/root/components/wallSegment[@id = $curWallSegId]">
<xsl:variable name="curWallId" select="@wall"/>
<xsl:for-each select="/root/components/Wall[@id = $curWallId]">
<!--I didn't expect that this was working, but at least I tried :D-->
<xsl:value-of select="sum(@AreaInner)"/>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
Desired Output should be something like...
[...]
<paragraph>
Room 1:
Wall area: 51.09 m²
[...]
</paragraph>
[...]
So I hope I described my problem properly. If not: I am sorry, you may beat me right into the face x)
It's best to use keys to get "related" data. Place this at the top of your stylesheet, outside of any template:
<xsl:key name="wall" match="components/Wall" use="@id" />
<xsl:key name="wallSegment" match="components/wallSegment" use="@id" />
Then:
<xsl:for-each select="flat/Room">
<paragraph>
<xsl:text>Room </xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>:
Wall area: </xsl:text>
<xsl:value-of select="format-number(sum(key('wall', key('wallSegment', WallSegments/WallSegment/@id)/@wall)/@innerArea), '0.00m²')"/>
<xsl:text>
</xsl:text>
</paragraph>
</xsl:for-each>
will return:
<paragraph>Room 1:
Wall area: 51.09m²</paragraph>
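For readers less used to xsl:key, the two keys behave much like dictionary indexes built once over the components section. Here is a hedged Python sketch of the same lookup chain on a trimmed copy of the sample data. The names wall_area, segment_wall and room_wall_area are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the input from the question.
root = ET.fromstring("""<root>
  <floor id="30" name="EG"><flat name="Wohnung" nr="1">
    <Room id="49" area="93.08565"><WallSegments>
      <WallSegment id="45"/><WallSegment id="42"/><WallSegment id="39"/>
    </WallSegments></Room>
  </flat></floor>
  <components>
    <Wall id="20" innerArea="20.7654"/><wallSegment id="45" wall="20"/>
    <Wall id="21" innerArea="12.45678"/><wallSegment id="42" wall="21"/>
    <Wall id="22" innerArea="17.8643"/><wallSegment id="39" wall="22"/>
  </components>
</root>""")

components = root.find("components")
# Index 1 (like key 'wall'): wall id -> inner area.
wall_area = {w.get("id"): float(w.get("innerArea")) for w in components.findall("Wall")}
# Index 2 (like key 'wallSegment'): segment id -> id of the wall it belongs to.
segment_wall = {s.get("id"): s.get("wall") for s in components.findall("wallSegment")}


def room_wall_area(room: ET.Element) -> float:
    """Sum the inner areas of the distinct walls referenced by a room."""
    # A set mirrors node-set semantics: a wall shared by two segments counts once.
    wall_ids = {segment_wall[seg.get("id")] for seg in room.findall("WallSegments/WallSegment")}
    return sum(wall_area[wall_id] for wall_id in wall_ids)


for number, room in enumerate(root.iter("Room"), start=1):
    print(f"Room {number}: Wall area: {room_wall_area(room):.2f} m²")
```

Each room costs a handful of dictionary lookups instead of a rescan of the components section, which is the same trade the keys make in the stylesheet.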
If what you need is the wall area of every room, this is one way of getting it:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/root/floor">
<xsl:for-each select="flat/Room">
<xsl:variable name="currentRoomSegmentsIds" select="WallSegments/WallSegment/@id"/>
<xsl:variable name="currentRoomWallsIds" select="/root/components/wallSegment[@id = $currentRoomSegmentsIds]/@wall"/>
<xsl:variable name="currentRoomWallsInnerAreas" select="/root/components/Wall[@id = $currentRoomWallsIds]/@innerArea"/>
Id of the room = <xsl:value-of select="@id"/>.
Area of the room = <xsl:value-of select="sum($currentRoomWallsInnerAreas)"/>
</xsl:for-each> <!-- End of for each room -->
</xsl:template>
</xsl:stylesheet>
This produces the following result:
Id of the room = 49.
Area of the room = 51.08648
At some point in an XSLT program, I have the following:
<xsl:for-each select="tags/tag">
<xsl:apply-templates select="//shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id]">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
It seems that the XPath expression //shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id], which is rather complex, considerably slows down the execution of the program (compared to the execution time before adding the quoted piece of code -- processing the same data, of course).
Do you think this is normal due to the relatively complex nature of the expression, and do you see how I could improve it so it performs better?
NB: in the XPath expression, film and //films/film, group and //groups/group refer to distinct elements.
See below a stripped-down sample of the XML input.
<program>
<tags>
<tag id="1">Tag1</tag>
<tag id="2">Tag2</tag>
<tag id="3">Tag3</tag>
</tags>
<films>
<film id="1">
Film1
<tag>2</tag><!-- References: /program/tags/tag/@id=2 -->
</film>
<film id="2">
Film2
<tag>1</tag><!-- References: /program/tags/tag/@id=1 -->
</film>
<film id="3">
Film3
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
<film id="4">
Film4
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
</films>
<groups>
<group id="1">
<film>3</film><!-- References: /program/films/film/@id=3 -->
<film>4</film><!-- References: /program/films/film/@id=4 -->
</group>
</groups>
<shows>
<show id="1"><!-- Show with film (=simple) -->
<film>1</film><!-- References: /program/films/film/@id=1 -->
<date>2011-12-12</date>
<time>12:00</time>
</show>
<show id="2"><!-- Show with group (=combined) -->
<group>1</group><!-- References: /program/groups/group/@id=1 -->
<date>2011-12-12</date>
<time>14:00</time>
</show>
</shows>
</program>
Explanations:
A tag is a property attached to a film (in fact, it's rather a category).
A group is an enumeration of films.
A show references either a film or a group.
What I want: for each tag, I'm looking for the shows referencing a film having the current tag and the shows referencing a group where at least one of the films has the current tag.
Double slashes in XPath are performance and CPU hogs when working with large documents (since every node in the document must be evaluated). If you can replace them with either an absolute or a relative path, you should see a noticeable improvement. If you can post the input schema and required output, we could be more specific.
e.g. With an absolute path
//shows/show[film=//films/film[tag=current()/@id]/@id]
becomes
/myroot/somepath/shows/show[film=/myroot/somepath/films/film[tag=current()/@id]/@id]
or if the shows and films are relative to the current node
./relativexpath/shows/show[film=./relativexpath/somepath/films/film[tag=current()/@id]/@id]
The answer by nonnb very likely points to the problem, but not really to an efficient solution: "cheaper" axes are better, but that alone does not give the kind of speed-up that indexing the data does.
Note that the big problem is that the XPath expression predicate does another full traversal of the tree for each evaluation. You should use keys for stuff like this; this will (in most or even all XSLT implementations) make an indexed lookup possible, thereby reducing the runtime a lot.
Define keys for the films, groups and shows:
<xsl:key name="filmByTag" match="film" use="tag" />
<xsl:key name="groupsByFilm" match="group" use="film" />
<xsl:key name="showsByFilm" match="show" use="film" />
<xsl:key name="showsByGroup" match="show" use="group" />
And then use it like this (not tested, but you should get the idea):
<xsl:variable name="films" select="key('filmByTag', @id)/@id" />
<xsl:apply-templates select="key('showsByFilm', $films)|key('showsByGroup', key('groupsByFilm', $films)/@id)">
Your XPath expression seems to be doing a three-way join so unless it's optimized the performance is likely to be O(n^3) in the size of the source document. Optimization involves replacing the serial searches of the document by indexed lookups. There are two ways of achieving this: you can hand-optimize it by replacing the filter expressions with calls on the key() function (as indicated by Dimitre), or you can use an optimizing XSLT processor such as Saxon-EE, which should do the same optimizations automatically.
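The effect of replacing serial searches with indexed lookups can be sketched outside XSLT as well. The Python snippet below builds four dictionaries once, playing the role of xsl:key declarations, and then answers "which shows belong to this tag" with a few lookups. The data is transcribed by hand from the sample in the question, and every name (films_by_tag, shows_for_tag and so on) is hypothetical.

```python
from collections import defaultdict

# Hand-transcribed from the sample input: (film id, tag ids) and (group id, film ids).
films = [("1", ["2"]), ("2", ["1"]), ("3", ["3"]), ("4", ["3"])]
groups = [("1", ["3", "4"])]
shows = [
    {"id": "1", "film": "1", "group": None, "date": "2011-12-12", "time": "12:00"},
    {"id": "2", "film": None, "group": "1", "date": "2011-12-12", "time": "14:00"},
]

# Build each index once: this is the work an xsl:key declaration does.
films_by_tag = defaultdict(set)
for film_id, tag_ids in films:
    for tag_id in tag_ids:
        films_by_tag[tag_id].add(film_id)

groups_by_film = defaultdict(set)
for group_id, film_ids in groups:
    for film_id in film_ids:
        groups_by_film[film_id].add(group_id)

shows_by_film = defaultdict(list)
shows_by_group = defaultdict(list)
for show in shows:
    if show["film"] is not None:
        shows_by_film[show["film"]].append(show)
    if show["group"] is not None:
        shows_by_group[show["group"]].append(show)


def shows_for_tag(tag_id: str) -> list:
    """Ids of shows whose film, or one of whose group's films, carries the tag."""
    film_ids = films_by_tag.get(tag_id, set())
    group_ids = set().union(*(groups_by_film.get(f, set()) for f in film_ids))
    found = {}  # keyed by show id, so a show reached twice appears once
    for film_id in film_ids:
        for show in shows_by_film.get(film_id, []):
            found[show["id"]] = show
    for group_id in group_ids:
        for show in shows_by_group.get(group_id, []):
            found[show["id"]] = show
    ordered = sorted(found.values(), key=lambda s: (s["date"], s["time"]))
    return [show["id"] for show in ordered]


for tag in ("1", "2", "3"):
    print(tag, shows_for_tag(tag))
```

Building the indexes is a single pass over the data, after which each tag costs a few lookups rather than nested scans of films, groups and shows.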
Define a key with xsl:key and then use the key function for the cross reference instead of that comparison you currently have. Show us a sample of the XML so that we can understand its structure, then we can help with concrete code.
Here are two complete solutions that should exhibit better performance:
Do note: Better performance will be registered on sufficiently large input samples only. On small input samples it isn't worth it to optimize.
I. Not using // (but not using keys)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vFilms" select="/*/films/film"/>
<xsl:variable name="vShows" select="/*/shows/show"/>
<xsl:variable name="vGroups" select="/*/groups/group"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"$vShows
[film
=
$vFilms
[tag=current()/@id]
/@id
or
group
=
$vGroups
[film
=
$vFilms
[tag=current()/@id]
/@id
]
/@id
]
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
II. Using keys
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kShowByFilmId" match="show"
use="film"/>
<xsl:key name="kShowByGroupId" match="show"
use="group"/>
<xsl:key name="kGroupByFilmId" match="group"
use="film"/>
<xsl:key name="kFilmByTag" match="film"
use="tag"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"key('kShowByFilmId',
key('kFilmByTag', current()/@id)/@id
)
|
key('kShowByGroupId',
key('kGroupByFilmId',
key('kFilmByTag', current()/@id)/@id
)
/@id
)
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>