Is there an "if-then-else" statement in XPath?

With XPath's rich set of functions, it seems you should be able to write an "if". However, my engine keeps insisting "there is no such function", and I can hardly find any documentation on the web (I found some dubious sources, but the syntax they had didn't work).
I need to remove ':' from the end of a string (if it exists), so I wanted to do this:
if (fn:ends-with(//div [@id='head']/text(),': '))
then (fn:substring-before(//div [@id='head']/text(),': ') )
else (//div [@id='head']/text())
Any advice?

Yes, there is a way to do it in XPath 1.0:
concat(
substring($s1, 1, number($condition) * string-length($s1)),
substring($s2, 1, number(not($condition)) * string-length($s2))
)
This relies on the concatenation of two mutually exclusive strings, the first one being empty if the condition is false (0 * string-length(...)), the second one being empty if the condition is true. This is called "Becker's method", attributed to Oliver Becker (original link is now dead, the web archive has a copy).
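To see why the two halves are mutually exclusive, here is a small Python model of XPath 1.0's substring(). It is a sketch of the spec's rules (1-based positions, start and length rounded with XPath's round(), comparisons with NaN always false), not an XPath engine; the helper names xpath_round, xpath_substring and becker_if are hypothetical.

```python
import math


def xpath_round(x: float) -> float:
    """XPath 1.0 round(): nearest integer, halves go up; NaN/Infinity pass through."""
    if math.isnan(x) or math.isinf(x):
        return x
    return math.floor(x + 0.5)


def xpath_substring(s: str, start: float, length: float = math.inf) -> str:
    """Keep characters at 1-based position p with round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    end = first + xpath_round(length)  # NaN compares false, so NaN keeps nothing
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)


def becker_if(cond: bool, s1: str, s2: str) -> str:
    # concat(substring($s1, 1, number($cond) * string-length($s1)),
    #        substring($s2, 1, number(not($cond)) * string-length($s2)))
    return (xpath_substring(s1, 1, float(cond) * len(s1))
            + xpath_substring(s2, 1, float(not cond) * len(s2)))


print(becker_if(True, "yes", "no"))   # yes
print(becker_if(False, "yes", "no"))  # no
```

With a length of 0 the substring() call keeps no characters at all, which is exactly what makes the unused branch disappear.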
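In your case (ends-with() is an XPath 2.0 function, so in XPath 1.0 the "ends with ': '" test is written with substring() and string-length()):
concat(
  substring(
    substring-before(//div[@id='head']/text(), ': '),
    1,
    number(
      substring(//div[@id='head']/text(), string-length(//div[@id='head']/text()) - 1) = ': '
    )
    * string-length(substring-before(//div[@id='head']/text(), ': '))
  ),
  substring(
    //div[@id='head']/text(),
    1,
    number(not(
      substring(//div[@id='head']/text(), string-length(//div[@id='head']/text()) - 1) = ': '
    ))
    * string-length(//div[@id='head']/text())
  )
)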
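Though I would try to get rid of all the leading "//" (each one makes the processor search the whole document).
Also, there is the possibility that //div[@id='head'] returns more than one node.
Just be aware of that: using (//div[@id='head'])[1] is more defensive, whereas //div[@id='head'][1] can still return one div per parent element.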
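Applying the same kind of model to the question's string shows the XPath 1.0 form of the "ends with ': '" test. ends-with() only exists from XPath 2.0 on, so this sketch uses substring($t, string-length($t) - 1) = ': ' instead, where $t stands for the text of //div[@id='head']. The function names are hypothetical and the input strings are made up for illustration.

```python
import math


def xpath_substring(s: str, start: float, length: float = math.inf) -> str:
    """XPath 1.0 substring(): 1-based, start and length rounded (halves round up)."""
    def rnd(x: float) -> float:
        return x if math.isnan(x) or math.isinf(x) else math.floor(x + 0.5)
    first = rnd(start)
    end = first + rnd(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)


def substring_before(s: str, sep: str) -> str:
    """XPath substring-before(): empty string when sep does not occur."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def strip_trailing_colon(t: str) -> str:
    # XPath 1.0 stand-in for ends-with($t, ': ')
    cond = xpath_substring(t, len(t) - 1) == ": "
    before = substring_before(t, ": ")
    # Becker's method: exactly one of the two substrings is non-empty
    return (xpath_substring(before, 1, float(cond) * len(before))
            + xpath_substring(t, 1, float(not cond) * len(t)))


print(strip_trailing_colon("Heading: "))   # Heading
print(strip_trailing_colon("Key: value"))  # Key: value
```

Like substring-before() itself, this cuts at the first ': ' in the text, so a value such as "a: b: " comes back as "a".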

The official W3C language specification for XPath 2.0 states that the language does indeed support if expressions. See Section 3.8, Conditional Expressions, in particular. Along with the syntax and an explanation, it gives the following example:
if ($widget1/unit-cost < $widget2/unit-cost)
then $widget1
else $widget2
Apart from the brackets around the then and else branches, your syntax matches this. Those brackets are actually allowed (a parenthesized expression is valid there), so removing them is unlikely to change anything, but it costs nothing to try the form used in the spec:
if (fn:ends-with(//div[@id='head']/text(), ': '))
then fn:substring-before(//div[@id='head']/text(), ': ')
else //div[@id='head']/text()
More telling is the fact that your XPath engine seems to be trying to interpret if as a function, when it is in fact a special construct of the language: that points to an engine that does not know conditional expressions at all.
Finally, to point out the obvious, ensure that your XPath engine does in fact support XPath 2.0 (as opposed to an earlier version)! Conditional expressions were introduced in XPath 2.0; XPath 1.0 has no if expression.

How about using fn:replace(string,pattern,replace) instead?
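fn:replace() is an XPath 2.0 function, so this only helps with a 2.0 engine. A call along the lines of replace(//div[@id='head'], ':\s*$', '') would strip a colon plus any trailing whitespace. The Python sketch below tries a similar regular expression on a few made-up strings; the pattern is an assumption about what "trailing colon" should mean, Python's regex dialect is close to but not identical to XPath's, and \Z replaces $ because Python's $ can also match just before a final newline.

```python
import re

# Assumed pattern: a colon followed only by optional whitespace at the very end.
TRAILING_COLON = re.compile(r":\s*\Z")


def strip_trailing_colon(s: str) -> str:
    """Remove a trailing ':' (and any whitespace after it); leave other colons alone."""
    return TRAILING_COLON.sub("", s)


for text in ["Heading: ", "Heading:", "Time: 10:30", "Heading"]:
    print(repr(strip_trailing_colon(text)))
```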
XPath is very often used in XSLT, and if you are in that situation and do not have XPath 2.0, you could use:
<xsl:choose>
<xsl:when test="condition1">
condition1-statements
</xsl:when>
<xsl:when test="condition2">
condition2-statements
</xsl:when>
<xsl:otherwise>
otherwise-statements
</xsl:otherwise>
</xsl:choose>

According to pkarat's law, you can achieve a conditional in XPath 1.0.
For your case, follow the concept:
concat(substring-before(your-xpath[contains(.,':')],':'),your-xpath[not(contains(.,':'))])
Here is how it works, given two inputs:
praba:
karan
For the 1st input: it contains ':', so the first condition is true and the string before ':' (praba) is the output. The 2nd condition is false, so the second part contributes an empty string.
For the 2nd input: it does not contain ':', so the first condition fails and contributes an empty string. The 2nd condition is true, because the string doesn't contain ':', so karan is the output.
So the outputs are praba and karan respectively.
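The trick relies on XPath 1.0 converting an empty node-set to the empty string, so whichever predicate fails contributes nothing to concat(). A Python model of that behaviour, with node-sets as lists and hypothetical helper names:

```python
def string_value(nodes: list) -> str:
    """XPath 1.0 string() of a node-set: first node's text, or '' if the set is empty."""
    return nodes[0] if nodes else ""


def substring_before(s: str, sep: str) -> str:
    """XPath substring-before(): '' when sep does not occur."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def colon_trick(node_text: str) -> str:
    nodes = [node_text]                                  # your-xpath, one matching node
    with_colon = [n for n in nodes if ":" in n]          # your-xpath[contains(., ':')]
    without_colon = [n for n in nodes if ":" not in n]   # your-xpath[not(contains(., ':'))]
    return substring_before(string_value(with_colon), ":") + string_value(without_colon)


for text in ["praba:", "karan"]:
    print(colon_trick(text))
```

Note that a value such as "a:b" also comes back as "a": the expression tests contains(), not whether the colon is the last character.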

Personally, I would use XSLT to transform the XML and remove the trailing colons. For example, suppose I have this input:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<Paragraph>This paragraph ends in a period.</Paragraph>
<Paragraph>This one ends in a colon:</Paragraph>
<Paragraph>This one has a : in the middle.</Paragraph>
</Document>
If I wanted to strip out trailing colons in my paragraphs, I would use this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
version="2.0">
<!-- identity -->
<xsl:template match="/|@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- strip out colons at the end of paragraphs -->
<xsl:template match="Paragraph">
<xsl:choose>
<!-- if it ends with a : -->
<xsl:when test="fn:ends-with(.,':')">
<xsl:copy>
<!-- copy everything but the last character -->
<xsl:value-of select="substring(., 1, string-length(.)-1)"/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
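If you post-process the document outside XSLT, the same transformation can be sketched with Python's standard xml.etree.ElementTree. It assumes, as in the sample input, that each Paragraph contains plain text with no child elements, and like the xsl:when branch above it only strips a single ':' that is the very last character.

```python
import xml.etree.ElementTree as ET

SOURCE = """<Document>
<Paragraph>This paragraph ends in a period.</Paragraph>
<Paragraph>This one ends in a colon:</Paragraph>
<Paragraph>This one has a : in the middle.</Paragraph>
</Document>"""


def strip_trailing_colons(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    for para in root.iter("Paragraph"):
        # Mirrors the xsl:when branch: drop only the last character when it is ':'
        if para.text and para.text.endswith(":"):
            para.text = para.text[:-1]
    return ET.tostring(root, encoding="unicode")


print(strip_trailing_colons(SOURCE))
```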

Unfortunately, the previous answers were not an option for me, so I researched for a while and found this solution:
http://blog.alessio.marchetti.name/post/2011/02/12/the-Oliver-Becker-s-XPath-method
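I use it to output text if a certain node exists (strictly, if its normalized text is non-empty). 4 is one more than the length of the text foo, so that start position lies beyond the end of the string. So I guess a more elegant solution would be to use a variable.
substring('foo',number(not(normalize-space(/elements/the/element)))*4)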
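Why 4 works: with two arguments, substring() returns the characters from position round(start) to the end. A start of 0 keeps all of 'foo', while any start beyond string-length('foo') keeps nothing. A Python model, where show_if_present is a hypothetical name for the XPath expression and None stands for a missing node:

```python
import math


def xpath_substring(s: str, start: float) -> str:
    """Two-argument XPath 1.0 substring(): characters from position round(start) onward."""
    first = start if math.isnan(start) or math.isinf(start) else math.floor(start + 0.5)
    return "".join(ch for p, ch in enumerate(s, start=1) if p >= first)


def show_if_present(element_text):
    """substring('foo', number(not(normalize-space(...))) * 4)."""
    normalized = " ".join((element_text or "").split())  # roughly normalize-space()
    return xpath_substring("foo", float(not normalized) * 4)


print(repr(show_if_present("some text")))  # 'foo'
print(repr(show_if_present(None)))         # ''
print(repr(show_if_present("   ")))        # ''
```

The last case shows that the expression really tests for non-whitespace text rather than mere existence: an element that exists but is empty also yields ''.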

Somewhat simpler XPath 1.0 solution, adapted from Tomalek's (posted here) and Dimitre's (here):
concat(substring($s1, 1 div number($cond)), substring($s2, 1 div number(not($cond))))
Note: I found an explicit number() was required to convert the boolean to a number; otherwise some XPath evaluators threw a type-mismatch error. Depending on how strictly your XPath processor checks types, you may not need it.
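This variant works because XPath 1.0 division follows IEEE 754: 1 div 1 is 1, so substring() keeps the whole string, while 1 div 0 is Infinity, a start position that no character reaches. Python raises an error on division by zero, so the sketch below models XPath's div explicitly; all helper names are hypothetical.

```python
import math


def xpath_div(a: float, b: float) -> float:
    """XPath 'div' uses IEEE 754 arithmetic, so 1 div 0 is Infinity rather than an error."""
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def xpath_substring(s: str, start: float) -> str:
    """Two-argument XPath 1.0 substring(): characters from position round(start) onward."""
    first = start if math.isnan(start) or math.isinf(start) else math.floor(start + 0.5)
    return "".join(ch for p, ch in enumerate(s, start=1) if p >= first)


def if_then_else(cond: bool, s1: str, s2: str) -> str:
    # concat(substring($s1, 1 div number($cond)), substring($s2, 1 div number(not($cond))))
    return (xpath_substring(s1, xpath_div(1.0, float(cond)))
            + xpath_substring(s2, xpath_div(1.0, float(not cond))))


print(if_then_else(True, "then-part", "else-part"))   # then-part
print(if_then_else(False, "then-part", "else-part"))  # else-part
```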

Related

passing a result set from a user defined function into the max() function

First-time poster, long-time browser, so bear with me if I'm not clear. I'm quite new to XSLT.
I'm trying to write a function which passes a list of cleansed date values to the max() function. Following is my input document:
<dates>
<date>1990-09-02Z</date>
<date>1990-09-03Z</date>
<date>1990-09-04Z</date>
<date>1990-09-05Z</date>
<date>1990-09-06Z</date>
</dates>
As you can see, the string values have a trailing 'Z'. If I try to pass these directly to max() using a nested substring() function
<xsl:template match="/dates">
<xsl:value-of select="max(xs:date(substring(//date,1,10)))"/>
</xsl:template>
I get this error:
A sequence of more than one item is not allowed as the first argument of fn:substring() ("1990-09-02Z", "1990-09-03Z")
so I've included an xsl:function declaration into my stylesheet which now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:test="http://www.blah.blah/funct"
version="3.0">
<xsl:function name="test:funct" visibility="public">
<xsl:param name="input"/>
<xsl:sequence>
<xsl:for-each select="$input">
<xsl:value-of select="xs:date(substring(.,1,10))"/>
</xsl:for-each>
</xsl:sequence>
</xsl:function>
<xsl:template match="/dates">
<xsl:value-of select="max(test:funct(//date))"/>
</xsl:template>
</xsl:stylesheet>
However, now I'm getting the following error
Failure converting {1990-09-02} to a number
I thought max() could handle dates? I'm quite confused about what's being passed into the max() function and why it's not working. The output I'm looking for is 1990-09-06.
I tried to read the W3C specification docs, but the terms are too technical for me, so they're not making much sense. I'd appreciate any help you can offer.
By the way, the processing engine I'm using is Saxon-PE 9.8.0.12.
Edit: my ultimate goal is to have a stylesheet with a list of functions which I can include within other XSL stylesheets, so ultimately the solution has to be a function. In this specific case, a function which produces a list of cleansed dates which can then be passed to max().
As you have tagged this as XSLT 3, I would suggest starting with basic XPath 2/3, where you can simply write
//date/xs:date(substring(., 1, 10))
i.e. you can use function calls in the last step of your path to extract the substring and construct an xs:date: https://xsltfiddle.liberty-development.net/6rexjii
That expression //date/xs:date(substring(., 1, 10)) gives you a sequence of xs:date values, and you can then use the max function on them:
max(//date/xs:date(substring(., 1, 10)))
https://xsltfiddle.liberty-development.net/6rexjii/1
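As an illustration only, the same pattern (take the first ten characters, build a date value, then take the maximum) in Python, with datetime.date standing in for xs:date:

```python
from datetime import date

# The question's input values, each with a trailing 'Z'.
raw = ["1990-09-02Z", "1990-09-03Z", "1990-09-04Z", "1990-09-05Z", "1990-09-06Z"]

# Like //date/xs:date(substring(., 1, 10)): first 10 characters -> a date value.
dates = [date.fromisoformat(s[:10]) for s in raw]

# max() then compares date values.
latest = max(dates)
print(latest.isoformat())  # 1990-09-06
```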
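As for writing a user-defined function: your current one returns the text nodes created by xsl:value-of rather than xs:date values, and max() converts such untyped values to xs:double, hence the "Failure converting ... to a number" error. To have that last step done by a function, I would write one where the input is an xs:string and which returns an xs:date: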
<xsl:function name="mf:date" as="xs:date">
<xsl:param name="input" as="xs:string"/>
<xsl:sequence select="xs:date(substring($input, 1, 10))"/>
</xsl:function>
Then you can call it as max(//date/mf:date(.)): https://xsltfiddle.liberty-development.net/6rexjii/2
If you really want to write a function that processes a sequence of input items and returns a sequence of xs:date values, use
<xsl:function name="mf:dates" as="xs:date*">
<xsl:param name="input" as="xs:string*"/>
<xsl:sequence select="$input ! xs:date(substring(., 1, 10))"/>
</xsl:function>
and call it with
<xsl:value-of select="max(mf:dates(//date))"/>
https://xsltfiddle.liberty-development.net/6rexjii/3
As a syntax alternative, in XPath 3.1 you can use the arrow operator =>:
<xsl:value-of select="//date => mf:dates() => max()"/>
https://xsltfiddle.liberty-development.net/6rexjii/4

Breaking for-each loop on a conditional base

I'm new to XSLT 2.0. I would like to set the value of a variable in a for-each loop only once (meaning that once the value is set, I want to come out of the loop).
For now it keeps iterating over all the users. I just want to come out of the loop once the value is set (immediately after my first attempt). I'm not sure how to break once the value has been set.
Can you please help me with the code below?
XSLT Code:
<xsl:variable name="v_first_name">
<xsl:for-each select="$emailList/emails/child::*">
<xsl:variable name="mailid" select="id" />
<xsl:for-each select="$userList/users/child::*">
<xsl:if test="emailid = $mailid">
<xsl:if test="firstname eq 'Antony'">
<xsl:value-of select="firstname" />
</xsl:if>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:variable>
<xsl:if test="$v_first_name != ''">
<first_name>
<xsl:value-of select="$v_first_name" />
</first_name>
</xsl:if>
XML O/p:
<first_name>AntonyAntonyAntonyAntony</first_name>
Expected XML O/P:
<first_name>Antony</first_name>
Note1: Please note that I'm using XSLT 2.0 and my lists can have duplicates (so Antony can come up twice, but I want it only once, i.e. unique).
Note2: I also tried position(), but couldn't get it to work, as the condition can match at any position.
Thanks in advance.
Start with XPath and simply select the nodes you are looking for, instead of thinking of for-each as a "loop". If you select e.g. $userList/users/*[emailid = $emailList/emails/*/id], you select the child elements of users that have a matching emailid in $emailList/emails/*.
I am not sure what sense it makes to hard-code a first name value and then output it, but of course you can select e.g. $userList/users/*[emailid = $emailList/emails/*/id and firstname = 'Antony']/lastname. That gives you a sequence of element nodes. If you want only the first one, use a positional predicate: depending on the structure of your input, either $userList/users/*[emailid = $emailList/emails/*/id and firstname = 'Antony'][1]/lastname or, to take the first of all selected elements, ($userList/users/*[emailid = $emailList/emails/*/id and firstname = 'Antony']/lastname)[1].
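The same "select, don't loop" idea, sketched in Python with made-up data: the list comprehension plays the role of the predicate, and taking element 0 plays the role of (...)[1]. A set of e-mail ids models XPath's general = comparison, which is true if any id matches, so duplicates in the e-mail list do no harm. The dictionaries and field names are hypothetical stand-ins for the question's XML.

```python
# Hypothetical data standing in for $emailList and $userList.
emails = [{"id": "a@x.org"}, {"id": "b@x.org"}, {"id": "a@x.org"}]
users = [
    {"emailid": "a@x.org", "firstname": "Antony", "lastname": "Smith"},
    {"emailid": "b@x.org", "firstname": "Antony", "lastname": "Jones"},
    {"emailid": "c@x.org", "firstname": "Maria", "lastname": "Lopez"},
]

# $emailList/emails/*/id as a set: "emailid = ..." is true if it equals any id.
email_ids = {e["id"] for e in emails}

# $userList/users/*[emailid = $emailList/emails/*/id and firstname = 'Antony']
matches = [u for u in users if u["emailid"] in email_ids and u["firstname"] == "Antony"]

# (...)[1]: the first selected element; no "break" is needed.
first = matches[0] if matches else None
print(first["firstname"] if first else "")  # Antony
```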

Split methods on XPath 1.0

I'm using XPath; how can I simulate a split method?
I read the documentation and I know that XPath 1.0 does not have this method.
I have a document containing these tags:
<TestCategoryModule>
<ItemCategories>
<![CDATA[Birthday Travel,Travel]]>
</ItemCategories>
</TestCategoryModule>
<TestCategoryModule2>
<ItemCategories>
<![CDATA[Travel]]>
</ItemCategories>
</TestCategoryModule2>
I want to filter items by 'ItemCategories', but when I filter by the word 'Travel', 2 items are returned. I use this filter: "ItemCategories[contains(text(), 'Travel')]".
I want filtering by "Travel" to return only the second item. How can I do it?
Use:
/*/*/*[contains(concat(',', ., ','), ',Travel,')]
Here is XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*/*[contains(concat(',', ., ','), ',Travel,')]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on this XML document (essentially the provided XML fragment, extended with one more test case and made a well-formed XML document):
<t>
<TestCategoryModule>
<ItemCategories>Birthday Travel,Travel</ItemCategories>
</TestCategoryModule>
<TestCategoryModule2>
<ItemCategories>Birthday Travel</ItemCategories>
</TestCategoryModule2>
<TestCategoryModule2>
<ItemCategories>Travel</ItemCategories>
</TestCategoryModule2>
</t>
The wanted, correct result is produced:
<ItemCategories>Birthday Travel,Travel</ItemCategories>
<ItemCategories>Travel</ItemCategories>
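The comma-padding test can be modeled in Python as a plain substring check. With the whitespace that the question's original markup puts around each CDATA section, the text values are padded with newlines, so this sketch adds normalize-space() to the expression; that is an assumption on my part, not part of the answer above, and the helper name has_category is hypothetical.

```python
def has_category(categories: str, wanted: str) -> bool:
    # contains(concat(',', normalize-space(.), ','), concat(',', $wanted, ','))
    normalized = " ".join(categories.split())  # roughly normalize-space()
    return f",{wanted}," in f",{normalized},"


# Text values as they come out of the question's markup, whitespace around the CDATA included.
items = ["\nBirthday Travel,Travel\n", "\nBirthday Travel\n", "\nTravel\n"]
print([has_category(text, "Travel") for text in items])  # [True, False, True]
```

Without the normalize-space() step, a padded value would become ",\nTravel\n," and would no longer contain ",Travel,".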
I was a little wrong, or described the problem poorly. The problem is that the categories are stored as a string. I have three items: the first one contains the categories (Birthday Travel,Travel), the second (Birthday Travel), the third (Travel). When I filter for the word "Travel", I need to get the first and third items, but I get all three, because all items contain the word "Travel".
You actually don't need split() for the problem that you've described. If you want to match Travel but not Travel,Travel you want = instead of contains(). To deal with the whitespace around your CDATA sections, wrap it in normalize-space().
All put together, try ItemCategories[normalize-space(text()) = 'Travel'].

xpath with multiple predicates equivalence

I was told that the following are not the same:
a[1][@attr="foo"]
a[@attr="foo"][1]
Can someone explain why that is the case?
Think of XPath expressions as defining a result set¹: a set of nodes that fulfil all the requirements stated in the XPath expression. The predicates of XPath expressions (the parts inside []) either have no effect on the result set or they incrementally narrow it.
Put another way, in the following expression:
//xyz[@abc="yes"]
[@abc="yes"] reduces the result set defined to the left of it, by //xyz.
Note that, as Michael Kay has suggested, everything said below applies only to XPath expressions with at least one positional predicate. A positional predicate is a number such as [1], an expression that evaluates to a number, or an expression that uses position() or last().
If no positional predicate is present, the order of predicates in XPath expressions is not significant.
Consider the following simple input document:
<root>
<a attr="other"/>
<a attr="foo"/>
<a attr="other"/>
<a attr="foo"/>
</root>
As you can see, a[@attr = 'foo'] is not the first child element of root. If we apply
//a[1]
to this document, this will of course result in
<a attr="other"/>
Now, crucially, if we add another predicate to the expression, like so:
//a[1][@attr="foo"]
Then, [@attr="foo"] can only influence the result set already defined by //a[1]. In this result set, there is no a[@attr="foo"], so the final result is empty.
On the other hand, if we start out with
//a[@attr="foo"]
the result will be
<a attr="foo"/>
<a attr="foo"/>
and in this case, if we add a second predicate:
//a[@attr="foo"][1]
the second predicate [1] can narrow down the result set of //a[@attr="foo"] to only contain the first of those nodes.
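The difference comes from what each predicate is applied to: position is counted within the sequence produced by the step or predicate to its left. A Python model of the sample document (all a elements share one parent, so //a[1] is simply the first of them), with hypothetical helper names and each node written as (child position, value of @attr):

```python
# The <a> children of <root> from the example: (child position, value of @attr).
children = [(1, "other"), (2, "foo"), (3, "other"), (4, "foo")]


def nth(seq, n):
    """[n]: keep the item at 1-based position n of the sequence the predicate is applied to."""
    return seq[n - 1:n]


def attr_equals(seq, value):
    """[@attr=value]: keep the items whose attribute has that value."""
    return [item for item in seq if item[1] == value]


# //a[1][@attr='foo']: take the first <a>, then test its attribute.
print(attr_equals(nth(children, 1), "foo"))  # []
# //a[@attr='foo'][1]: filter by attribute first, then take the first survivor.
print(nth(attr_equals(children, "foo"), 1))  # [(2, 'foo')]
```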
If you know XSLT, you might find an XSLT (and XPath 2.0) proof of this helpful:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<result1>
<xsl:copy-of select="//a[1][@attr='foo']"/>
</result1>
<result2>
<xsl:copy-of select="//a[@attr='foo'][1]"/>
</result2>
</xsl:template>
</xsl:transform>
And the result will be
<result1/>
<result2>
<a attr="foo"/>
</result2>
¹ Technically speaking, only XPath 1.0 calls this result a node-set. In XPath 2.0, node-sets have been replaced by sequences, which are ordered.

Sorting XPath results in the same order as multiple select parameters

I have an XML document as follows:
<objects>
<object uid="0" />
<object uid="1" />
<object uid="2" />
</objects>
I can select multiple elements using the following query:
doc.xpath("//object[@uid=2 or @uid=0 or @uid=1]")
But this returns the elements in the same order they're declared in the XML document (uid=0, uid=1, uid=2) and I want the results in the same order as I perform the XPath query (uid=2, uid=0, uid=1).
I'm unsure if this is possible with XPath alone, and have looked into XSLT sorting, but I haven't found an example that explains how I could achieve this.
I'm working in Ruby with the Nokogiri library.
There is no way in XPath 1.0 to specify the order of the selected nodes.
XPath 2.0 allows a sequence of nodes with any specific order:
//object[@uid=2], //object[@uid=1]
evaluates to a sequence in which all object items with @uid=2 precede all object items with @uid=1
If one doesn't have an XPath 2.0 engine available, it is still possible to use XSLT to output nodes in any desired order.
In this specific case the sequence of the following XSLT instructions:
<xsl:copy-of select="//object[@uid=2]"/>
<xsl:copy-of select="//object[@uid=1]"/>
produces the desired output:
<object uid="2" /><object uid="1" />
I am assuming you are using XPath 1.0. The W3C spec says:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
* node-set (an unordered collection of nodes without duplicates)
* boolean (true or false)
* number (a floating-point number)
* string (a sequence of UCS characters)
So I don't think you can re-order nodes using XPath alone. (The rest of the spec defines document order and reverse document order, so if the latter does what you want, you can get it using the appropriate axis, e.g. preceding.)
In XSLT you can use <xsl:sort> with a sort key computed from the uid attribute. The XSLT FAQ is very good and you should find an answer there.
An XSLT example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="pSequence" select="'2 1'"/>
<xsl:template match="objects">
<xsl:for-each select="object[contains(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))]">
<xsl:sort select="substring-before(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
<object uid="2" /><object uid="1" />
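The sort key in this stylesheet is the part of ' 2 1 ' that precedes the current uid, so earlier tokens get shorter keys, and each key is a prefix of the keys for later tokens. A Python model of the for-each predicate and the xsl:sort key, with Python's string ordering standing in for the stylesheet's text sort; the function names are hypothetical.

```python
def sort_key(uid: str, p_sequence: str) -> str:
    # substring-before(concat(' ', $pSequence, ' '), concat(' ', @uid, ' '))
    haystack = f" {p_sequence} "
    i = haystack.find(f" {uid} ")
    return haystack[:i] if i >= 0 else ""


def select_in_order(uids, p_sequence):
    padded = f" {p_sequence} "
    chosen = [u for u in uids if f" {u} " in padded]               # the for-each predicate
    return sorted(chosen, key=lambda u: sort_key(u, p_sequence))  # xsl:sort on the prefix


print(select_in_order(["0", "1", "2"], "2 1"))    # ['2', '1']
print(select_in_order(["0", "1", "2"], "2 0 1"))  # ['2', '0', '1']
```

With $pSequence set to '2 0 1', the same key gives the order asked for in the question.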
I don't think there is a way to do it in XPath, but if you wish to switch to XSLT you can use the xsl:sort element:
<xsl:for-each select="//object[@uid=1 or @uid=2]">
<xsl:sort select="@uid" data-type="number" order="descending"/>
{insert new logic here}
</xsl:for-each>
more complete info here:
http://www.w3schools.com/xsl/el_sort.asp
This is how I'd do it in Nokogiri:
require 'nokogiri'
xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
doc = Nokogiri::XML(xml)
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort_by { |n| n['uid'].to_i }.reverse
puts objects_by_uid
Running that outputs:
<object uid="2"/>
<object uid="1"/>
An alternative to the search would be:
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort { |a,b| b['uid'].to_i <=> a['uid'].to_i }
if you don't like using sort_by with the reverse.
XPath is useful for locating and retrieving nodes, but often the filtering we want to do gets too convoluted in the accessor, so I let the language do it, whether it's Ruby, Perl or Python. Where I put the filtering logic depends on how big the XML data set is and whether there are a lot of different uid values I'll want to grab. Sometimes letting the XPath engine do the heavy lifting makes sense; other times it's easier to let XPath grab all the object nodes and filter in the calling language.
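The same "let XPath find, let the host language order" approach can reproduce the question's exact order (2, 0, 1), not just a reversed one. A sketch using Python's standard xml.etree.ElementTree, whose findall() supports a limited XPath subset including [@attr] predicates; the wanted list is the only input you would change.

```python
import xml.etree.ElementTree as ET

source = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
root = ET.fromstring(source)

wanted = ["2", "0", "1"]  # the order used in the question's query
rank = {uid: i for i, uid in enumerate(wanted)}

# Let ElementTree's XPath subset find the candidates, then order them in Python.
found = [obj for obj in root.findall(".//object[@uid]") if obj.get("uid") in rank]
ordered = sorted(found, key=lambda obj: rank[obj.get("uid")])
print([obj.get("uid") for obj in ordered])  # ['2', '0', '1']
```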
