What causes extremely poor XSLT performance: text fragments or priorities?

I am trying to do some cleanup using XSLT. I want to change some text fragments and leave all other nodes untouched. However, my current implementation runs very slowly and consumes a lot of memory. Removing one small template changes the run time from a minute to a fraction of a second.
This is the XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:import href="../common/identity.xsl"/>
<xsl:template match="text()" priority="100">
<xsl:variable name="pass1" select="replace(., '(_|~)', ' ')"/>
<xsl:variable name="pass2" select="replace($pass1, ' , ', ', ')"/>
<xsl:variable name="final" select="$pass2"/>
<xsl:value-of select="$final"/>
</xsl:template>
<xsl:template match="body/text()[1][. = ' '] | body/text()[last()][. = ' ']"
priority="200"/>
</xsl:stylesheet>
The first template replaces some characters; the second template removes the first and last text fragments of body, but only if they consist of exactly one space (sadly, normalize-space does not fit my needs).
With both templates in place, the transformation runs very slowly and consumes a lot of memory. If I remove the last template, the same XSLT runs fast and uses a normal amount of memory.
The XSLT is run using Saxon-(HE|EE) 9.5.1.3 inside oXygen 15.2.
What is causing this big loss of performance? Is it the use of text fragments in general? The use of priorities? The use of [1] and [last()]?

Using not(following-sibling::text()) instead of last() fixed it. Could you explain why, or give some pointers to the problems with last()?
There are two ways of evaluating patterns: left-to-right, and right-to-left, corresponding to the "formal" and "informal" semantics given in section 5.5.3 of the specification. The right-to-left method is much more efficient, but it cannot be used for all patterns; in particular, patterns that use positional predicates are tricky. Saxon will handle a number of cases efficiently, including match="para[last()]", but for some others, including match="para[last()-1]" and (it seems) match="section/para[last()]", it takes the slow-but-methodical route. I'll take a look at the code and see if this can be improved.
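The workaround reported in the comment can be sketched as a complete stylesheet. This is only an illustration under stated assumptions: the identity template is inlined here because the contents of ../common/identity.xsl are not shown in the question, and the two replace() passes are folded into one expression. The only intended change is that the positional test [last()] is expressed as not(following-sibling::text()), which selects the same text node. Whether this is faster depends on the processor and version.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Identity template, inlined so the sketch is self-contained
       (assumption: the imported identity.xsl does the equivalent). -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Character clean-up on every text node, as in the question. -->
  <xsl:template match="text()" priority="100">
    <xsl:value-of select="replace(replace(., '(_|~)', ' '), ' , ', ', ')"/>
  </xsl:template>

  <!-- Drop the first and last text children of body when they are a single space.
       not(following-sibling::text()) stands in for the positional test [last()]. -->
  <xsl:template match="body/text()[1][. = ' ']
                       | body/text()[not(following-sibling::text())][. = ' ']"
                priority="200"/>

</xsl:stylesheet>
```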

Related

If condition with and without the square brackets XSLT

I am learning XSLT and I came to an aspect of XSLT/XPath which is not clear to me.
I want to check if there is at least one element namePart with a value inside.
There can be something like this in XML:
<mods:name type="personal">
<mods:namePart type="family">Salamonis</mods:namePart>
<mods:namePart type="given"/>
</mods:name>
But also this due to any reason:
<mods:name type="personal">
<mods:namePart/>
</mods:name>
I think I have found out the solution for my problem. Actually I found two similar solutions but I do not understand the difference:
first:
<xsl:for-each select="mods:name">
<xsl:if test="mods:namePart/text() != ''"> ..... </xsl:if>
</xsl:for-each>
second:
<xsl:for-each select="mods:name">
<xsl:if test="mods:namePart[text() != '']"> ..... </xsl:if>
</xsl:for-each>
Apparently, both of them work fine. But I am still wondering which one is better to use, or whether there are some subtle differences.
My solution is taken from this comment: https://stackoverflow.com/a/7660915/14163073
So do both of these solutions work in accordance with the comment? (The operator returns true if at least one item on the left side "matches" one item on the right side.)
Thanks for any explanation!
Well, take a tour through an XPath tutorial, I would say. Preferences are often a personal choice and style. For the last test I would simply use e.g. <xsl:if test="mods:namePart = 'foo'"> as that works for any contents of the mods:namePart elements. It doesn't really matter for your simple example but in the end you might end up using XPath against e.g. some mixed contents HTML paragraph element and want to check its whole content (e.g. test="p = 'This is an example text.'") and the p could be anything from a simple <p>This is an example text.</p> to <p>This is an <b>example</b> text.</p>.
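The point about comparing the whole content of an element can be illustrated with a small stylesheet. It is a sketch only: the wrapper element doc and the input document are assumptions made for the example. Comparing p itself uses its string value (all descendant text concatenated), whereas comparing p/text() tests each text node child on its own.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed input: <doc><p>This is an <b>example</b> text.</p></doc> -->
  <xsl:template match="/">
    <!-- String value of p is 'This is an example text.': true -->
    <xsl:value-of select="boolean(doc/p = 'This is an example text.')"/>
    <xsl:text> </xsl:text>
    <!-- The text node children are 'This is an ' and ' text.': false -->
    <xsl:value-of select="boolean(doc/p/text() = 'This is an example text.')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input this prints "true false".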

Numbering of figures in DocBook

I'm using the Maven docbkx plugin to generate a PDF.
I would like the figures to be numbered as usual sequentially from 1, ignoring any chapters, sections etc.
This doesn't work, as I turned on hierarchical numbering of sections with the configuration parameter sectionLabelIncludesComponentLabel in the pom.xml. Now the first section in chapter 2 is not 1 (as it is by default) but 2.1, as I want.
But as a side effect, the first figure in chapter 2.1 gets the number 2.1, too, and the next figure gets 2.2, so the chapter number is not only prepended to sections, but also to figures (which makes absolutely no sense).
How can I have hierarchical section numbers, but at the same time simple sequential figure numbering?
[Edit]
Looks like sectionLabelIncludesComponentLabel has nothing to do with it. Even if I turn it off, the figure titles are prefixed with the chapter number.
There is no parameter to switch on the wanted behaviour, but it can be done by customizing a template in common/labels.xsl (the number part of a title is called a "label" in DocBook-XSL).
You will need to create a customization layer and add the following to it:
<xsl:template match="db:figure" mode="label.markup">
<xsl:choose>
<xsl:when test="@label">
<xsl:value-of select="@label"/>
</xsl:when>
<xsl:otherwise>
<!-- Use simple sequential numbering within a book -->
<xsl:number format="1" from="db:book" level="any"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
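For context, a customization layer wrapping that template might look like the sketch below. Several details are assumptions rather than facts from the answer: that the db prefix is bound to the DocBook 5 namespace, that the namespace-aware stylesheets are in use (with the non-namespaced ones the match would be plain figure), and that the import URN is the one docbkx resolves to its stock stylesheet. Check all of these against your own setup.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    exclude-result-prefixes="db">

  <!-- Assumption: this URN is resolved by docbkx to the stock stylesheet
       for the output format being built; replace it with the path or URL
       of docbook.xsl if your setup differs. -->
  <xsl:import href="urn:docbkx:stylesheet"/>

  <!-- Overrides the imported label template for figures only. -->
  <xsl:template match="db:figure" mode="label.markup">
    <xsl:choose>
      <xsl:when test="@label">
        <xsl:value-of select="@label"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Simple sequential numbering within a book -->
        <xsl:number format="1" from="db:book" level="any"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```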

Can an XSLT 1.0 transform be used as a pure XPath 1.0 location path evaluator?

Sorry if the following questions seem silly, but given my inexperience I can't be sure about the reliability of this method.
I'm trying to build myself an XPath 1.0 location path evaluator using XSLT 1.0.
The idea is simple. The transform takes the XPath expression to evaluate as input and then applies templates to the selected nodes. A template for each kind of node is defined to copy the node (and some more information) to the output. The input document will obviously be transformed using an XSLT 1.0 compliant processor.
What I would like to know from your expertise is whether this approach is a fault-free and reliable way to test location paths and display the selected node-sets. I'm not asking anyone to debug my code. I've tested it against various input documents and it seems to work correctly. I'd just like to know whether I'm missing something from the point of view of XPath.
Will this work correctly with any XPath 1.0 location path?
Will this be limited to XPath 1.0/XSLT 1.0? I do not see any obstacle to extending the template to XPath 2.0 just by changing its version (and the XSLT processor, obviously).
Here's the transform which should be used as XPath tester. Notice:
I've omitted the templates for comment and pi nodes to make the transform not too heavy, but they are currently managed in a similar way.
It doesn't need to manage namespaces at the moment.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="path-expr" select="*"/>
<xsl:template match="/">
<xpath-tester>
<node-sets count="{count($path-expr)}">
<xsl:apply-templates select="$path-expr" mode="path-expr"/>
</node-sets>
</xpath-tester>
</xsl:template>
<xsl:template match="node()|@*" mode="path-expr">
<node-set
position="{position()}"
id="{generate-id()}"
parent-id="{name(parent::*[1])}-{generate-id(parent::*[1])}">
<xsl:apply-templates select="." mode="output"/>
</node-set>
</xsl:template>
<xsl:template match="*" mode="output">
<xsl:attribute name="type">element</xsl:attribute>
<node>
<xsl:copy-of select="."/>
</node>
</xsl:template>
<xsl:template match="@*" mode="output">
<xsl:attribute name="type">attribute</xsl:attribute>
<node>
<xsl:copy-of select="."/>
</node>
</xsl:template>
<xsl:template match="text()" mode="output">
<xsl:attribute name="type">text</xsl:attribute>
<node>
<xsl:copy-of select="."/>
</node>
</xsl:template>
</xsl:stylesheet>
Sounds similar to a pet project of mine; feel free to have a look at my code, it's a bit too big to paste here:
http://www.flynn1179.net/xml/FullDisplayXml.xslt
It transforms any XML document into an html page with collapsible nodes, and by modifying the 'match' attribute of a key near the top, you can specify an XPath to nodes, and have it produce a list of them or highlight them in the source.
I asked a very similar question to this here: How can you pass in a parameter to an xslt that can be used in a xsl:key?, although I was trying to apply the parameter to the key, which doesn't work.
NB: That code's a work in progress, it's kind of ugly in places, and I'm fairly sure there's a few things it doesn't handle properly, or could do better, but hopefully it's useful. I use a derivative of it on my XML sandbox page: http://www.flynn1179.net/xml/ (it's also a work in progress, I know there's a couple of bugs in it)
You may be interested in looking at the code of my 11-year-old XPath Visualizer.
Dynamic evaluation within XSLT itself isn't directly supported in XSLT 2.0, and although there might be such support in XSLT 3.0 / XPath 3.0, this is not necessary at all.
First, @Martin pointed out:
you would need dynamic XPath evaluation supported to be able to treat the string with the XPath as a node-set
I've extended the transform a bit, in order to handle dynamic XPath evaluation. The transform is now able to accept an input string and evaluate it to an XPath.
It is now Saxon-dependent because it uses saxon:evaluate. In a similar way, and with the help of function-available, one could support other extensions and make this more portable.
Here are the three new templates (and the two new parameters) which replace the root template in my original transform (given the namespace declaration xmlns:saxon="http://icl.com/saxon").
<xsl:param name="path-expr" select="false()"/>
<xsl:param name="xpath" select="*"/>
<xsl:template match="/">
<xsl:choose>
<xsl:when test="$path-expr">
<xsl:apply-templates select="/" mode="dyn"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="/" mode="base"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="/" mode="dyn">
<xpath-tester>
<node-sets count="{count(saxon:evaluate($path-expr))}">
<xsl:apply-templates select="saxon:evaluate($path-expr)"
mode="path-expr"/>
</node-sets>
</xpath-tester>
</xsl:template>
<xsl:template match="/" mode="base">
<xpath-tester>
<node-sets count="{count($xpath)}">
<xsl:apply-templates select="$xpath"
mode="path-expr"/>
</node-sets>
</xpath-tester>
</xsl:template>
Second, @Martin pointed out:
I suspect you will run into trouble with outputting attribute and element nodes together
I tested this and did run into the problem, but with the workaround shown above it works fine even in that situation.
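The portability idea mentioned above (guarding each extension with function-available) could be sketched roughly as follows. This is an illustration, not the author's code: the use of EXSLT dyn:evaluate as the second option is an assumption about which extensions a given processor offers, and the node-set template is reduced to a stub. In XSLT 1.0 a call to an unavailable extension function is only an error when it is actually evaluated, so the guarded branches are safe.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://icl.com/saxon"
    xmlns:dyn="http://exslt.org/dynamic"
    exclude-result-prefixes="saxon dyn">

  <!-- The XPath expression to evaluate, passed in as a string. -->
  <xsl:param name="path-expr" select="'/*'"/>

  <xsl:template match="/">
    <xpath-tester>
      <xsl:choose>
        <!-- Saxon 6.5 extension, namespace as given in the post. -->
        <xsl:when test="function-available('saxon:evaluate')">
          <xsl:apply-templates select="saxon:evaluate($path-expr)" mode="path-expr"/>
        </xsl:when>
        <!-- Assumed alternative: EXSLT dynamic module, if the processor has it. -->
        <xsl:when test="function-available('dyn:evaluate')">
          <xsl:apply-templates select="dyn:evaluate($path-expr)" mode="path-expr"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:message terminate="yes">No dynamic XPath evaluation extension is available.</xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </xpath-tester>
  </xsl:template>

  <!-- Stub: reports each selected node; the full transform copies it as well. -->
  <xsl:template match="node()|@*" mode="path-expr">
    <node-set position="{position()}" id="{generate-id()}"/>
  </xsl:template>

</xsl:stylesheet>
```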

Select only the first matching node in XPath

I have the following XML:
<parent>
<pet>
<data>
<birthday/>
</data>
</pet>
<pet>
<data>
<birthday/>
</data>
</pet>
</parent>
And now I want to select the first birthday element via parent//birthday[1], but this returns both birthday elements because both of them are the first child of their parents.
How can I select only the first birthday element of the entire document, no matter where it is located? I've tried parent//birthday[position()=1] but that doesn't work either.
You mean (note the parentheses!)
(/parent/pet/data/birthday)[1]
or, a shorter, but less specific variation:
(/*/*/*/birthday)[1]
(//birthday)[1]
or, more semantic, the "birthday of the first pet":
/parent/pet[1]/data/birthday
or, if not all pets have birthday entries, the "birthday of the first pet for which a birthday is set":
/parent/pet[data/birthday][1]/data/birthday
If you work from a context node, you can abbreviate the expression by making it relative to that context node.
Explanation:
/parent/pet/data/birthday[1] selects all <birthday> nodes that are the first in their respective parents (the <data> nodes), throughout the document
(/parent/pet/data/birthday)[1] selects all <birthday> nodes, and of those (that's what the parentheses do, they create an intermediary node-set), it takes the first one
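The difference between the two forms can be checked with a small stylesheet run against the XML from the question. It is a minimal sketch that only counts the selected nodes.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Predicate applies per parent: one birthday for each data element. -->
    <xsl:value-of select="count(/parent/pet/data/birthday[1])"/>
    <xsl:text> </xsl:text>
    <!-- Predicate applies to the whole parenthesized node-set. -->
    <xsl:value-of select="count((/parent/pet/data/birthday)[1])"/>
  </xsl:template>
</xsl:stylesheet>
```

For the two-pet document above this prints "2 1".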
FYI: you can visualize the results of the various Xpath queries with the (free) XPathVisualizer tool. Works on Windows only.
Ok, I admit this is horrendous and there must be a better way, but it appears to work.
/*/*[descendant::birthday and not(preceding-sibling::*[descendant::birthday])]
I look for all elements at the second level in the tree that have a descendant element called birthday that do not have a preceding sibling element that has a birthday element as a descendant.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="birthdays" select="//birthday"/>
<xsl:value-of select="$birthdays[1]"/>
</xsl:template>
</xsl:stylesheet>
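try
(//birthday)[position()=1]
// finds nodes no matter where they are in the hierarchy; the parentheses are needed so that the predicate applies to the whole result rather than to each parent's children
you could also do
pet[position()=1]/data/birthday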

How to export an algorithm to other languages?

A project I'm working on involves 3 distinct systems/platforms. C#, Java, and XSLT. I have some simple algorithms (just a bunch of conditionals), expressed in pseudo-code as something like:
if inputParameter1 is equal to 1
return "one"
else if inputParameter2 is equal to 5
return "five" concatenated with inputParameter1
else
return "not found"
simple stuff like that.
I'm trying to figure out a mechanism that will:
Let me write the algorithm once
Be able to execute the algorithm in the native language of each system (C#, Java, and XSL)
Have each system (C#, Java, and XSL) always use the latest version of the algorithm when the algorithm is updated.
So to elaborate on my example, the C# representation would be:
public string TheMethod(int inputParameter1, int inputParameter2)
{
if (inputParameter1 == 1)
{
return "one";
}
else if (inputParameter2 == 5)
{
return string.Concat("five", inputParameter1.ToString());
}
else
{
return "not found";
}
}
and the XSLT representation would be:
<xsl:template name="TheMethod">
<xsl:param name="inputParameter1" />
<xsl:param name="inputParameter2" />
<xsl:choose>
<xsl:when test="$inputParameter1 = 1">
<xsl:text>one</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:choose>
<xsl:when test="$inputParameter2 = 5">
<xsl:text>five</xsl:text>
<xsl:value-of select="$inputParameter1" />
</xsl:when>
<xsl:otherwise>
<xsl:text>not found</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
hopefully you get the idea.
How would I express an algorithm in a generic way and be able to automatically convert it to C#, Java, or XSL?
Thanks!
-Mike
Well, the "answer" to this is a DSL, or just some common markup that you then render (amusingly, you could do this with XSLT).
But generally, IMHO, implementing this isn't worth the trouble, depending on how complicated your algorithm is and how many of them you'll be writing.
Looking at this problem a little more generally, your aim is to have only one authoritative version of the algorithm (i.e. the "Don't Repeat Yourself" principle).
Instead of trying to automatically translate/export to different programming languages, a simpler solution would be to choose one language (probably XSL) to implement the algorithm. Then in C# and Java just use some XSL tools to execute the algorithm directly, passing in whatever parameters you like. I haven't done this before, but I assume it is possible with the right third-party tools (whereas I doubt you could do it the other way around, executing Java or C# from within XSL, which is why XSL is the best choice for the "base" language).
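As a rough sketch of that idea, the algorithm from the question could live in a standalone stylesheet whose inputs are global parameters, so that each host only has to set the parameters and run the transform. How parameters are passed is host-specific (for example Transformer.setParameter in Java's javax.xml.transform, or XsltArgumentList.AddParam in .NET); the stylesheet layout below is an assumption for illustration, not something prescribed by the answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Global parameters: set by the calling program before each run. -->
  <xsl:param name="inputParameter1"/>
  <xsl:param name="inputParameter2"/>

  <!-- The source document is ignored; any well-formed XML will do. -->
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$inputParameter1 = 1">one</xsl:when>
      <xsl:when test="$inputParameter2 = 5">
        <xsl:text>five</xsl:text>
        <xsl:value-of select="$inputParameter1"/>
      </xsl:when>
      <xsl:otherwise>not found</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With text output, the result of the transformation is the return value of the algorithm, which the host reads back as a string.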
This does not produce source code, but it makes it possible to use the logic in various languages with very little coding:
Write a simple xml like this:
...
<case>
<InputParam1>1</InputParam1>
<InputParam2>NULL</InputParam2>
<answer>one</answer>
</case>
...
Then parse it and store it in a dictionary/map/whatever-the-language-or-framework-has-for-hashtables. Store the input parameters as the key (perhaps as a struct), and you get the answer very quickly. For cases where the parameters themselves are used in the return value (like the "concat part" in your example), I'd use some special syntax for the data, like
...
<answer>five$inputparam1$</answer>
...
It is worth mentioning that the more special cases you have, like this concatenation, the less useful this solution can be.
