Find duplicate sibling with xpath - xpath

How can I find only the nodes that have at least one sibling with the same name, using XPath?
For example:
<root>
<parent>
<node>...</node>
<node_unique>...</node_unique>
<node>...</node>
<another_one>...</another_one>
<another_one>...</another_one>
</parent>
</root>
In the example, the XPath should select only <node> and <another_one>, because they appear more than once.
I have been trying to find a solution for hours without success (I now suspect it is not possible with XPath...).

These cannot be selected with a single XPath 1.0 expression (due to the lack of range variables in XPath 1.0).
One possible solution is to select all /*/*/* elements, get the name of each element using name() on that element, and then evaluate /*/*/*[name() = $currentName][2] (where $currentName should be substituted with the name just obtained). If the last expression selects an element, then currentName is a name that occurs at least twice, so you keep that element. Do this for all elements and their names. As an auxiliary step, one might dedup the names (and selected elements) by placing them in a hash table.
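The host-language procedure described above can be sketched roughly as follows. This is a minimal illustration in Python using the standard-library xml.etree.ElementTree, not the answerer's own code. It swaps the per-name second XPath evaluation for a single counting pass, and the helper name duplicated_children is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<root>
  <parent>
    <node>...</node>
    <node_unique>...</node_unique>
    <node>...</node>
    <another_one>...</another_one>
    <another_one>...</another_one>
  </parent>
</root>"""


def duplicated_children(parent):
    """Return one element per name that occurs at least twice among the
    children of `parent` (the first occurrence of each such name).

    Hypothetical helper, written only for this illustration.
    """
    counts = Counter(child.tag for child in parent)  # name -> occurrences
    seen = set()  # plays the role of the dedup hash table
    result = []
    for child in parent:
        if counts[child.tag] >= 2 and child.tag not in seen:
            seen.add(child.tag)
            result.append(child)
    return result


root = ET.fromstring(XML)
# Iterating over root yields each <parent>; check each one's children.
dupes = [el.tag for p in root for el in duplicated_children(p)]
print(dupes)  # ['node', 'another_one']
```

Counting per parent keeps the check limited to real siblings, which matters if the document contains more than one <parent> element.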
In XPath 2.0 it is trivial to select, with a single XPath expression, all children of a given parent that have at least one other sibling with the same name (the expression below keeps only the first occurrence of each such name):
/*/*/*
[name() = following-sibling::*/name()
and
not(name() = preceding-sibling::*/name())
]
A much more compact expression:
/*/*/*[index-of(/*/*/*/name(), name())[2]]
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*/*[index-of(/*/*/*/name(), name())[2]]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<root>
<parent>
<node>...</node>
<node_unique>...</node_unique>
<node>...</node>
<another_one>...</another_one>
<another_one>...</another_one>
</parent>
</root>
the above XPath expression is evaluated and the elements it selects are copied to the output:
<node>...</node>
<another_one>...</another_one>
Note: For a related question/answer, see this.

Related

What causes extremely poor XSLT performance: text fragments or priorities?

I am trying to do some cleanups using XSLT. I want to do some changes on text fragments and leave all the other nodes in peace. However my current implementation runs very slow and consumes a lot of memory. The removal of a small template changes the run time from a minute to a fraction of a second.
This is the XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:import href="../common/identity.xsl"/>
<xsl:template match="text()" priority="100">
<xsl:variable name="pass1" select="replace(., '(_|~)', ' ')"/>
<xsl:variable name="pass2" select="replace($pass1, ' , ', ', ')"/>
<xsl:variable name="final" select="$pass2"/>
<xsl:value-of select="$final"/>
</xsl:template>
<xsl:template match="body/text()[1][. = ' '] | body/text()[last()][. = ' ']"
priority="200"/>
</xsl:stylesheet>
The first template replaces some characters, the second template removes the first and last text fragments, but only if they contain exactly one space (sadly normalize-space does not fit my needs).
This XSLT runs very slowly and consumes a lot of memory. If I remove the last template, the same XSLT runs fast and uses a normal amount of memory.
The XSLT is run using Saxon-(HE|EE) 9.5.1.3 inside oXygen 15.2.
What is causing this big loss of performance? Is it the use of text fragments in general? The use of priorities? The use of [1] and [last()]?
Using not(following-sibling::text()) instead of last() fixed it. Could you explain why, or give some pointers to the problems with last()?
There are two ways of evaluating patterns: left-to-right, and right-to-left, corresponding to the "formal" and "informal" semantics given in section 5.5.3 of the specification. The right-to-left method is much more efficient, but it cannot be used for all patterns; in particular, patterns that use positional predicates are tricky. Saxon will handle a number of cases efficiently, including match="para[last()]", but for some others, including match="para[last()-1]" and (it seems) match="section/para[last()]", it takes the slow-but-methodical route. I'll take a look at the code and see if this can be improved.

Xpath Axes - how to select child node attribute

I have the following XML:
<ArrayOfStationStatus xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" autopagerMatchedRules="1">
<StationStatus ID="20" StatusDetails="To the platform due to planned maintenance work.">
<Station ID="20" Name="Bermondsey"/>
<Status ID="NS" CssClass="Closed" Description="No Step Free Access" IsActive="true">
<StatusType ID="2" Description="Station"/>
</Status>
</StationStatus>
</ArrayOfStationStatus>
I would like to select the StationStatus nodes whose Station child contains a particular phrase in its Name attribute. It's important that I select StationStatus nodes.
This is the XPath I have come up with, but it's not correct:
/ArrayOfStationStatus/StationStatus[contains(lower-case(child::Station/@Name),lower-case('phrase'))]
EDIT:
I just solved it! This is the code I needed:
/ArrayOfStationStatus/StationStatus[child::Station[contains(lower-case(attribute::Name),lower-case("Ac"))]]
Well, I managed to solve it! Here is the solution. In this case I'm looking for the phrase 'Ac', as you can see:
/ArrayOfStationStatus/StationStatus[child::Station[contains(lower-case(attribute::Name),lower-case("Ac"))]]
Also remember that lower-case() is only available in XPath 2.0 (as Dimitre Novatchev pointed out).
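Where only an XPath 1.0 engine is at hand, the case-insensitive part of the test can be moved into the host language instead. The following is a rough sketch in Python with the standard-library xml.etree.ElementTree. The function name stations_matching and the second StationStatus entry are assumptions added purely so that the phrase 'Ac' has something to match.

```python
import xml.etree.ElementTree as ET

# The first entry is taken from the question (trimmed); the second entry is
# hypothetical test data added for this illustration.
XML = """<ArrayOfStationStatus>
  <StationStatus ID="20" StatusDetails="To the platform due to planned maintenance work.">
    <Station ID="20" Name="Bermondsey"/>
  </StationStatus>
  <StationStatus ID="1" StatusDetails="(hypothetical second entry)">
    <Station ID="1" Name="Acton Town"/>
  </StationStatus>
</ArrayOfStationStatus>"""


def stations_matching(root, phrase):
    """Return the StationStatus elements that have a Station child whose
    Name attribute contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return [
        status
        for status in root.findall("StationStatus")
        if any(needle in station.get("Name", "").lower()
               for station in status.findall("Station"))
    ]


root = ET.fromstring(XML)
matches = [status.get("ID") for status in stations_matching(root, "Ac")]
print(matches)  # ['1']
```

As in the accepted XPath, the filter is applied to the Station child but the StationStatus parent is what gets returned.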

How to check for the immediate following sibling of parents in xpath

I am using XPath to test the immediate following sibling (the "uncle" or "aunt") of a node's parent node.
My XML looks like this:
<MyParent>
<A/>
<B/>
<C/>
</MyParent>
<Uncle>
..
</Uncle>
I am in the template that matches the child node B, and I want to test whether the immediate following sibling of my parent is called "Uncle".
I tried the following two XPaths:
<xsl:if test="parent::MyParent/following-sibling::*[1][self::Uncle]">
<xsl:text>we have it</xsl:text>
</xsl:if>
and
<xsl:if test="parent::MyParent[following-sibling::*[1][self::Uncle]]">
<xsl:text>we have it</xsl:text>
</xsl:if>
Neither of them works. Could someone help me find where I made a mistake? Thanks :).
Try this.
../following-sibling::*[position()=1][name()='Uncle']
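To check the input data independently of the stylesheet, the same "parent's next sibling" test can be sketched in a host language. The Python example below uses the standard-library xml.etree.ElementTree. The wrapper element <doc> and the function name parent_followed_by are assumptions made for this illustration, since the question shows only a fragment.

```python
import xml.etree.ElementTree as ET

# <doc> is a hypothetical wrapper; the question shows only the fragment inside.
XML = """<doc>
  <MyParent>
    <A/>
    <B/>
    <C/>
  </MyParent>
  <Uncle/>
</doc>"""


def parent_followed_by(root, node, expected_tag):
    """Return True if the element immediately after `node`'s parent is
    named `expected_tag`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of.get(node)
    grandparent = parent_of.get(parent) if parent is not None else None
    if grandparent is None:
        return False
    siblings = list(grandparent)  # element children only, in document order
    idx = siblings.index(parent)
    # siblings[idx + 1] corresponds to following-sibling::*[1]
    return idx + 1 < len(siblings) and siblings[idx + 1].tag == expected_tag


root = ET.fromstring(XML)
b = root.find("MyParent/B")
print(parent_followed_by(root, b, "Uncle"))  # True
```

If a check like this succeeds on the real document while the xsl:if test does not, the difference probably lies in the context node or in namespaces rather than in the axis logic.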

Can an XSLT 1.0 transform be used as a pure XPath 1.0 location path evaluator?

Apologies if the following questions seem silly, but given my inexperience I can't be sure about the reliability of this method.
I'm trying to build myself an XPath 1.0 location path evaluator using XSLT 1.0.
The idea is simple. The transform takes as input the XPath expression to evaluate and then applies templates to the selected nodes. A template is defined for each kind of node to copy the node (and some more information) to the output. The input document will obviously be transformed using an XSLT 1.0 compliant processor.
What I would like to know from your expertise is whether this approach is a fail-safe and reliable way to test location paths and display the selected node-sets. I'm not asking anyone to debug my code. I've tested it against various input documents and it seems to work correctly. I'd just like to know whether I'm missing something from the point of view of XPath.
Will this work correctly with any XPath 1.0 location path?
Will this be limited to XPath 1.0/XSLT 1.0? I do not see any contraindication to extending the template to XPath 2.0 just by changing its version (and the XSLT processor, obviously).
Here's the transform which should be used as XPath tester. Notice:
I've omitted the templates for comment and pi nodes to make the transform not too heavy, but they are currently managed in a similar way.
It doesn't need to manage namespaces at the moment.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="path-expr" select="*"/>
<xsl:template match="/">
<xpath-tester>
<node-sets count="{count($path-expr)}">
<xsl:apply-templates select="$path-expr" mode="path-expr"/>
</node-sets>
</xpath-tester>
</xsl:template>
<xsl:template match="node()|@*" mode="path-expr">
<node-set
position="{position()}"
id="{generate-id()}"
parent-id="{name(parent::*[1])}-{generate-id(parent::*[1])}">
<xsl:apply-templates select="." mode="output"/>
</node-set>
</xsl:template>
<xsl:template match="*" mode="output">
<xsl:attribute name="type">element</xsl:attribute>
<node>
<xsl:copy-of select="."/>
</node>
</xsl:template>
<xsl:template match="@*" mode="output">
<xsl:attribute name="type">attribute</xsl:attribute>
<node>
<xsl:copy-of select="."/>
</node>
</xsl:template>
<xsl:template match="text()" mode="output">
<xsl:attribute name="type">text</xsl:attribute>
<node>
<xsl:copy-of select="."/>
</node>
</xsl:template>
</xsl:stylesheet>
Sounds similar to a pet project of mine; feel free to have a look at my code, it's a bit too big to paste here:
http://www.flynn1179.net/xml/FullDisplayXml.xslt
It transforms any XML document into an html page with collapsible nodes, and by modifying the 'match' attribute of a key near the top, you can specify an XPath to nodes, and have it produce a list of them or highlight them in the source.
I asked a very similar question to this here: How can you pass in a parameter to an xslt that can be used in a xsl:key?, although I was trying to apply the parameter to the key, which doesn't work.
NB: That code's a work in progress, it's kind of ugly in places, and I'm fairly sure there's a few things it doesn't handle properly, or could do better, but hopefully it's useful. I use a derivative of it on my XML sandbox page: http://www.flynn1179.net/xml/ (it's also a work in progress, I know there's a couple of bugs in it)
You may be interested in looking at the code of my 11-year-old XPath Visualizer.
Dynamic evaluation within XSLT itself isn't directly supported in XSLT 2.0, and although there might be such support in XSLT 3.0 / XPath 3.0, this is not necessary at all.
First, @Martin pointed out:
you would need dynamic XPath evaluation supported to be able to treat the string with the XPath as a node-set
I've extended the transform a bit, in order to handle dynamic XPath evaluation. The transform is now able to accept an input string and evaluate it to an XPath.
It is now Saxon-dependent because it uses saxon:evaluate. In a similar way, and with the help of function-available, one could implement other extensions and make this more portable.
Here are the three new templates (and the two new parameters) which replace the root template in my original transform (given the namespace declaration xmlns:saxon="http://icl.com/saxon").
<xsl:param name="path-expr" select="false()"/>
<xsl:param name="xpath" select="*"/>
<xsl:template match="/">
<xsl:choose>
<xsl:when test="$path-expr">
<xsl:apply-templates select="/" mode="dyn"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="/" mode="base"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="/" mode="dyn">
<xpath-tester>
<node-sets count="{count(saxon:evaluate($path-expr))}">
<xsl:apply-templates select="saxon:evaluate($path-expr)"
mode="path-expr"/>
</node-sets>
</xpath-tester>
</xsl:template>
<xsl:template match="/" mode="base">
<xpath-tester>
<node-sets count="{count($xpath)}">
<xsl:apply-templates select="$xpath"
mode="path-expr"/>
</node-sets>
</xpath-tester>
</xsl:template>
Second, @Martin pointed out:
I suspect you will run into trouble with outputting attribute and element nodes together
I tested this and did run into the problem, but with the workaround above it works fine even in that situation.

Select only the first matching node in XPath

I have the following XML:
<parent>
<pet>
<data>
<birthday/>
</data>
</pet>
<pet>
<data>
<birthday/>
</data>
</pet>
</parent>
I want to select the first birthday element via parent//birthday[1], but this returns both birthday elements because both of them are the first child of their parents.
How can I select only the first birthday element of the entire document, no matter where it is located? I've tried parent//birthday[position()=1] but that doesn't work either.
You mean (note the parentheses!)
(/parent/pet/data/birthday)[1]
or, a shorter, but less specific variation:
(/*/*/*/birthday)[1]
(//birthday)[1]
or, more semantic, the "birthday of the first pet":
/parent/pet[1]/data/birthday
or, if not all pets have birthday entries, the "birthday of the first pet for which a birthday is set":
/parent/pet[data/birthday][1]/data/birthday
If you work from a context node, you can abbreviate the expression by making it relative to that context node.
Explanation:
/parent/pet/data/birthday[1] selects all <birthday> nodes that are the first in their respective parents (the <data> nodes), throughout the document
(/parent/pet/data/birthday)[1] selects all <birthday> nodes, and of those (that's what the parentheses do, they create an intermediary node-set), it takes the first one
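The difference between the two forms can be observed with a small sketch in Python using the standard-library xml.etree.ElementTree. One assumption to flag: ElementTree supports only a limited XPath subset and has no parenthesized expressions, so find() (first match in document order) stands in here for (//birthday)[1].

```python
import xml.etree.ElementTree as ET

XML = """<parent>
  <pet><data><birthday/></data></pet>
  <pet><data><birthday/></data></pet>
</parent>"""

root = ET.fromstring(XML)

# Positional predicate applied per parent: every <birthday> that is the
# first <birthday> child of its own <data> element.
first_per_parent = root.findall(".//birthday[1]")

# First match in document order over the whole tree, which is what
# (//birthday)[1] asks for.
first_overall = root.find(".//birthday")

print(len(first_per_parent))                 # 2
print(first_overall is first_per_parent[0])  # True
```

The per-parent query returns both elements, while the whole-document query returns only the first of them.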
FYI: you can visualize the results of the various XPath queries with the (free) XPathVisualizer tool. It works on Windows only.
Ok, I admit this is horrendous and there must be a better way, but it appears to work.
/*/*[descendant::birthday and not(preceding-sibling::*[descendant::birthday])]
I look for all elements at the second level in the tree that have a descendant element called birthday that do not have a preceding sibling element that has a birthday element as a descendant.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="birthdays" select="//birthday"/>
<xsl:value-of select="$birthdays[1]"/>
</xsl:template>
</xsl:stylesheet>
Try
//birthday[position()=1]
// finds nodes no matter where they are in the hierarchy
You could also do
pet[position()=1]/data/birthday

Resources