Check attribute value of XML element with a regular expression pattern - xpath

I have this code, which does not work.
I want to iterate through all p elements below the body element and look for a child element named object, then use the matches() function to test the data attribute against a certain pattern.
<xsl:template match="/xhtml:html/xhtml:body//xhtml:p">
<xsl:for-each select="xhtml:object[matches(@data,'*mov$')]">
<halloMovie><xsl:value-of select="@data"/></halloMovie>
</xsl:for-each>
</xsl:template>
Error message while executing with matches()
[xslt] : Fatal Error! Could not compile stylesheet Cause: Error checking type of the expression 'funcall(matches, [step("attribute", 17), literal-expr(v$)])'.
If I use the following line with the contains() function instead, it works.
<xsl:for-each select="xhtml:object[contains(@data,'.mov')]">
What am I doing wrong?

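The matches function requires XPath 2.0 (and hence XSLT 2.0) or later; there is no regular expression support in XSLT 1.0. Note also that '*mov$' is not a valid regular expression, because * needs something to repeat; with a 2.0 processor the pattern would be '\.mov$'.
If you just want to check that the value ends with ".mov", you can do that in XPath 1.0 using the substring function:
substring(@data, string-length(@data) - 3) = '.mov'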
(XPath 1.0 has starts-with but you need 2.0 for ends-with).
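To see why the start offset is string-length(@data) - 3, here is a minimal Python sketch that emulates XPath 1.0 substring() for integer arguments. The helper names xpath_substring and ends_with_mov are made up for this illustration and are not part of any XPath library.

```python
def xpath_substring(s, start, length=None):
    """Emulate XPath 1.0 substring() for integer arguments.

    XPath positions are 1-based; a start below 1 simply selects from
    the first character, as the real function does.
    """
    end = len(s) + 1 if length is None else start + length  # exclusive bound
    return "".join(ch for pos, ch in enumerate(s, start=1) if start <= pos < end)


def ends_with_mov(value):
    # Mirrors: substring(@data, string-length(@data) - 3) = '.mov'
    return xpath_substring(value, len(value) - 3) == ".mov"


print(ends_with_mov("clip.mov"))      # True
print(ends_with_mov("clip.mov.txt"))  # False
print(ends_with_mov("mov"))           # False
```

The last four characters start at position length - 3 because positions are counted from 1, which is why the expression subtracts 3 rather than 4.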

Related

Using Variable in Node Attribute

I am writing an XSL template. I can't figure out how to use a variable in the value of a custom attribute inside the XSL file.
I am trying this code in XSL:
<xsl:variable name="var1" select="DEF"/>
<frequency myAttr="ABC"+$var1 >
<xsl:value-of select="frequency"/>
</frequency>
Expected result is
<frequency myAttr="ABCDEF" >20</frequency>
I am getting this error:
Unable to generate the XML document using the provided XML/XSL input. org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 24; Element type "sourceId" must be followed by either attribute specifications, ">" or "/>"
The issue is that the way I am concatenating is wrong. Any help in achieving this?
I think you mean:
<frequency myAttr="ABC{$var1}" >
This will concatenate the literal string "ABC" with the content of the $var1 variable - see: Attribute Value Templates.
Note that the way you populate the variable suggests there is an element named DEF in the source XML. The expected result will be obtained only if this element has a string value of "DEF".
XSL is a declarative language in XML format. An XSLT file must be well-formed XML.
What you tried looks like you have programming knowledge of an imperative language where you can write expressions like "ABC" + $var1. This violates the XML format rules in XSL however. Elements follow the pattern <elementname attribute1="value1" attributeN="valueN">...</elementname>. In your code, you put +$var1 after the end of the myAttr attribute, which ends at the second quote mark - and this is invalid: <frequency myAttr="ABC"+$var1 >
<frequency myAttr="ABC+$var1"> would result in a literal attribute value of ABC+$var1, which is not what you want. The same happens if you put an XPath expression directly into the attribute value: <frequency myAttr="concat('ABC', $var1)"> yields the literal string concat('ABC', $var1).
You can use the Attribute Value Template syntax as michael.hor257k suggested, which essentially means to wrap XPath expressions in curly braces inside of attribute value strings.
Another way would be to not write the element as literal in your code, but rather declare it:
<xsl:variable name="var1" select="'DEF'"/>
<xsl:element name="frequency">
<xsl:attribute name="myAttr">
<xsl:value-of select="concat('ABC', $var1)"/>
</xsl:attribute>
</xsl:element>
Note that I corrected the variable definition: In the select attribute, you must put DEF in single quote marks if this is supposed to be a string, like select="'DEF'". Without the single quote marks you define the variable to refer to <DEF> elements in the source XML.
In the select attribute of <xsl:value-of> I used the XPath function concat() to concatenate the string 'ABC' and the content of the variable $var1. The attribute value template syntax cannot be used here: select="'ABC{$var1}'" would result in the string ABC{$var1}.
A variation of the above example would be to use <xsl:text> for the string ABC and <xsl:value-of> to output the content of $var1:
<xsl:variable name="var1" select="'DEF'"/>
<xsl:element name="frequency">
<xsl:attribute name="myAttr">
<xsl:text>ABC</xsl:text>
<xsl:value-of select="$var1"/>
</xsl:attribute>
</xsl:element>
So there are three solutions, a concise one and two declarative ones. Which one you choose is up to you. The quickest to type is certainly the first:
<xsl:variable name="var1" select="'DEF'"/>
<frequency myAttr="ABC{$var1}">
<xsl:value-of select="frequency"/>
</frequency>
... but you should also be aware of both declarative ways to achieve the same result and understand how they work.

How to write the Regular expression for the Below to use in Webdriver?

I need WebDriver to locate elements by XPath using a regular expression. I have a list of ids with different values. How can I write an expression for the following type of values?
//*[@id="js_1"]
//*[@id="js_2"]
//*[@id="js_3"]
//*[@id="js_4"]
//*[@id="js_5"]
//*[@id="js_6"]
How can I write a regular expression for the XPath format above using WebDriver?
I have tried the following:
List<WebElement> names = box.findElements(By.xpath("//div[contains(@id, 'js_*')]"));
But it doesn't work for me. How can I write such an expression? Please help me.
Thanks & Regards,
Shiva Oleti
If you use js_* as a standard regular expression, it matches js, js_, js__, js___ ...
The correct regular expression would be js_\d+
However, the XPath contains function does not use regular expressions, so you can just use js_ (although it won't check for numbers).
Or better:
//div[starts-with(@id, 'js_')]
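The difference between the two patterns and the prefix check can be tried out on plain strings. The following is only a sketch in Python's re module, whose regex flavor is close to but not identical with the one used by XPath 2.0; the sample ids are invented for the illustration.

```python
import re

ids = ["js_1", "js_2008", "js", "js__", "js_x", "a1"]

# "js_*" as a regex means "js" followed by zero or more underscores.
loose = re.compile(r"js_*")
# "js_\d+" means "js_" followed by one or more digits.
strict = re.compile(r"js_\d+")

loose_hits = [i for i in ids if loose.fullmatch(i)]
strict_hits = [i for i in ids if strict.fullmatch(i)]
# A starts-with style check accepts any id with the js_ prefix, digits or not.
prefix_hits = [i for i in ids if i.startswith("js_")]

print(loose_hits)   # ['js', 'js__']
print(strict_hits)  # ['js_1', 'js_2008']
print(prefix_hits)  # ['js_1', 'js_2008', 'js__', 'js_x']
```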
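AFAIK, WebDriver supports XQuery (such as using XQUIB), in which case full XPath 2.0 is available. Be aware that this depends on the XPath engine behind your driver: the engines built into browsers typically implement only XPath 1.0, where matches() does not exist.
Use:
//*[matches(@id, '^js_\d+$')]
XSLT 2.0-based verification: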
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:sequence select="//*[matches(@id, '^js_\d+$')]"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the following XML document:
<t>
<x id="js_1"/>
<y id="a1"/>
<z id="js_2008"/>
</t>
the above XPath expression is evaluated and the result of this evaluation is copied to the output:
<x id="js_1"/>
<z id="js_2008"/>
Explanation:
Proper use of the XPath 2.0 function matches() and RegEx.
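Alternatively, you can use a loop to add elements to your list:
// Assumes `we` is an existing WebDriver or WebElement; needs java.util.List and java.util.ArrayList.
List<WebElement> ls = new ArrayList<>(); // start with an empty list; calling add() on null throws a NullPointerException
int i = 1;
while (i < 5)
{
    // XPath positions are 1-based, so this collects li[1] through li[4]
    ls.add(we.findElement(By.xpath("html/body/div[1]/div/div[2]/section/nav/div[2]/ul/li[" + i + "]")));
    i++;
}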

Sorting XPath results in the same order as multiple select parameters

I have an XML document as follows:
<objects>
<object uid="0" />
<object uid="1" />
<object uid="2" />
</objects>
I can select multiple elements using the following query:
doc.xpath("//object[@uid=2 or @uid=0 or @uid=1]")
But this returns the elements in the same order they're declared in the XML document (uid=0, uid=1, uid=2) and I want the results in the same order as I perform the XPath query (uid=2, uid=0, uid=1).
I'm unsure if this is possible with XPath alone, and have looked into XSLT sorting, but I haven't found an example that explains how I could achieve this.
I'm working in Ruby with the Nokogiri library.
There is no way in XPath 1.0 to specify the order of the selected nodes.
XPath 2.0 allows a sequence of nodes with any specific order:
//object[@uid=2], //object[@uid=1]
evaluates to a sequence in which all object items with @uid=2 precede all object items with @uid=1.
If one doesn't have an XPath 2.0 engine available, it is still possible to use XSLT in order to output nodes in any desired order.
In this specific case the sequence of the following XSLT instructions:
<xsl:copy-of select="//object[@uid=2]"/>
<xsl:copy-of select="//object[@uid=1]"/>
produces the desired output:
<object uid="2" /><object uid="1" />
I am assuming you are using XPath 1.0. The W3C spec says:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
* node-set (an unordered collection of nodes without duplicates)
* boolean (true or false)
* number (a floating-point number)
* string (a sequence of UCS characters)
So I don't think you can re-order simply using XPath. (The rest of the spec defines document order and reverse document order, so if the latter does what you want you can get it using the appropriate axis, e.g. preceding.)
In XSLT you can use <xsl:sort> using the name() of the attribute. The XSLT FAQ is very good and you should find an answer there.
An XSLT example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="pSequence" select="'2 1'"/>
<xsl:template match="objects">
<xsl:for-each select="object[contains(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))]">
<xsl:sort select="substring-before(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
<object uid="2" /><object uid="1" />
I don't think there is a way to do it in XPath, but if you are willing to switch to XSLT you can use the xsl:sort element:
<xsl:for-each select="//object[@uid=1 or @uid=2]">
<xsl:sort select="@uid" data-type="number" />
{insert new logic here}
</xsl:for-each>
more complete info here:
http://www.w3schools.com/xsl/el_sort.asp
This is how I'd do it in Nokogiri:
require 'nokogiri'
xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
doc = Nokogiri::XML(xml)
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort_by { |n| n['uid'].to_i }.reverse
puts objects_by_uid
Running that outputs:
<object uid="2"/>
<object uid="1"/>
An alternative to the search would be:
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort { |a,b| b['uid'].to_i <=> a['uid'].to_i }
if you don't like using sort_by with the reverse.
XPath is useful for locating and retrieving the nodes, but often the filtering we want to do gets too convoluted in the accessor, so I let the language do it, whether it's Ruby, Perl or Python. Where I put the filtering logic is based on how big the XML data set is and whether there are a lot of different uid values I'll want to grab. Sometimes letting the XPath engine do the heavy lifting makes sense; other times it's easier to let XPath grab all the object nodes and filter in the calling language.
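Sorting by uid only helps when the wanted order happens to be numeric. For an arbitrary order such as 2, 0, 1, the "filter in the calling language" idea can be sketched as follows. This uses Python's standard xml.etree.ElementTree rather than Nokogiri, and it assumes each uid occurs at most once; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
root = ET.fromstring(xml)

wanted = ["2", "0", "1"]  # the order the caller asked for

# Index the elements once by uid, then read them back in the requested order.
by_uid = {obj.get("uid"): obj for obj in root.iter("object")}
ordered = [by_uid[uid] for uid in wanted if uid in by_uid]

print([obj.get("uid") for obj in ordered])  # ['2', '0', '1']
```

The same index-then-look-up approach should carry over to Ruby with a Hash keyed by uid.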

Is there an "if -then - else " statement in XPath?

It seems that with all the rich functions in XPath you should be able to do an "if". However, my engine keeps insisting "there is no such function", and I can hardly find any documentation on the web (I found some dubious sources, but the syntax they had didn't work).
I need to remove ':' from the end of a string (if it exists), so I wanted to do this:
if (fn:ends-with(//div[@id='head']/text(),': '))
then (fn:substring-before(//div[@id='head']/text(),': ') )
else (//div[@id='head']/text())
Any advice?
Yes, there is a way to do it in XPath 1.0:
concat(
substring($s1, 1, number($condition) * string-length($s1)),
substring($s2, 1, number(not($condition)) * string-length($s2))
)
This relies on the concatenation of two mutually exclusive strings, the first one being empty if the condition is false (0 * string-length(...)), the second one being empty if the condition is true. This is called "Becker's method", attributed to Oliver Becker (original link is now dead, the web archive has a copy).
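The arithmetic behind the trick can be checked outside XPath. Below is a small Python sketch that emulates XPath 1.0 substring() for integer arguments and then applies the same concatenation; xpath_substring and becker_choose are names invented for this illustration.

```python
def xpath_substring(s, start, length):
    """Emulate XPath 1.0 substring(s, start, length) for integer arguments."""
    return "".join(
        ch for pos, ch in enumerate(s, start=1) if start <= pos < start + length
    )


def becker_choose(condition, s1, s2):
    # number(true()) is 1 and number(false()) is 0 in XPath; int() does the same here.
    keep_first = int(condition)
    keep_second = int(not condition)
    return (
        xpath_substring(s1, 1, keep_first * len(s1))
        + xpath_substring(s2, 1, keep_second * len(s2))
    )


print(becker_choose(True, "then-branch", "else-branch"))   # then-branch
print(becker_choose(False, "then-branch", "else-branch"))  # else-branch
```

Exactly one of the two substrings has a non-zero length, so the concatenation always equals the selected string.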
In your case:
concat(
  substring(
    substring-before(//div[@id='head']/text(), ': '),
    1,
    number(
      ends-with(//div[@id='head']/text(), ': ')
    )
    * string-length(substring-before(//div[@id='head']/text(), ': '))
  ),
  substring(
    //div[@id='head']/text(),
    1,
    number(not(
      ends-with(//div[@id='head']/text(), ': ')
    ))
    * string-length(//div[@id='head']/text())
  )
)
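Though I would try to get rid of all the "//" first.
Also, there is the possibility that //div[@id='head'] returns more than one node.
Just be aware of that; using //div[@id='head'][1] is more defensive.
Note that ends-with() itself is an XPath 2.0 function, so on a strict 1.0 engine it has to be replaced with a substring() comparison against the end of the string.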
The official language specification for XPath 2.0 on W3.org details that the language does indeed support if statements. See Section 3.8 Conditional Expressions, in particular. Along with the syntax format and explanation, it gives the following example:
if ($widget1/unit-cost < $widget2/unit-cost)
then $widget1
else $widget2
This would suggest that you shouldn't have brackets surrounding your expressions (otherwise the syntax looks correct). I'm not wholly confident, but it's surely worth a try. So you'll want to change your query to look like this:
if (fn:ends-with(//div[@id='head']/text(),': '))
then fn:substring-before(//div[@id='head']/text(),': ')
else //div[@id='head']/text()
I do strongly suspect this may fix it, however, since your XPath engine seems to be trying to interpret if as a function, whereas it is in fact a special construct of the language.
Finally, to point out the obvious, ensure that your XPath engine does in fact support XPath 2.0 (as opposed to an earlier version)! I don't believe conditional expressions are part of previous versions of XPath.
How about using fn:replace(string,pattern,replace) instead?
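As a rough illustration of the pattern such a call could use, here is a Python sketch built on re.sub. The pattern ':\s*$' and the function name strip_trailing_colon are assumptions made for this example; in XPath 2.0 the corresponding call would presumably look like replace(//div[@id='head'], ':\s*$', '').

```python
import re


def strip_trailing_colon(text):
    # Remove one colon at the very end, along with any whitespace after it.
    return re.sub(r":\s*$", "", text)


print(repr(strip_trailing_colon("Heading: ")))  # 'Heading'
print(repr(strip_trailing_colon("Heading")))    # 'Heading'
print(repr(strip_trailing_colon("a: b")))       # 'a: b'
```

Because the pattern is anchored at the end, a colon in the middle of the string is left alone and no if/else is needed.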
XPath is very often used in XSLT, and if you are in that situation and do not have XPath 2.0 you could use:
<xsl:choose>
<xsl:when test="condition1">
condition1-statements
</xsl:when>
<xsl:when test="condition2">
condition2-statements
</xsl:when>
<xsl:otherwise>
otherwise-statements
</xsl:otherwise>
</xsl:choose>
According to pkarat's law, you can achieve a conditional in XPath 1.0.
For your case, follow this concept:
concat(substring-before(your-xpath[contains(.,':')],':'),your-xpath[not(contains(.,':'))])
This should work. To see how, take two inputs:
praba:
karan
For the 1st input: it contains ":", so the first predicate is true and the string before ":" is the output, i.e. praba. The second predicate is false, so the second argument contributes an empty string.
For the 2nd input: it does not contain ":", so the first predicate fails and the first argument is empty. The second predicate is true, so karan is returned.
Finally, your output would be praba and karan.
Personally, I would use XSLT to transform the XML and remove the trailing colons. For example, suppose I have this input:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<Paragraph>This paragraph ends in a period.</Paragraph>
<Paragraph>This one ends in a colon:</Paragraph>
<Paragraph>This one has a : in the middle.</Paragraph>
</Document>
If I wanted to strip out trailing colons in my paragraphs, I would use this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
version="2.0">
<!-- identity -->
<xsl:template match="/|@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- strip out colons at the end of paragraphs -->
<xsl:template match="Paragraph">
<xsl:choose>
<!-- if it ends with a : -->
<xsl:when test="fn:ends-with(.,':')">
<xsl:copy>
<!-- copy everything but the last character -->
<xsl:value-of select="substring(., 1, string-length(.)-1)"></xsl:value-of>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Unfortunately the previous answers were not an option for me, so I researched for a while and found this solution:
http://blog.alessio.marchetti.name/post/2011/02/12/the-Oliver-Becker-s-XPath-method
I use it to output text if a certain node exists. 4 is one more than the length of the text foo, so a start position of 4 yields an empty string. I guess a more elegant solution would be the use of a variable.
substring('foo',number(not(normalize-space(/elements/the/element)))*4)
Somewhat simpler XPath 1.0 solution, adapted from Tomalek's (posted here) and Dimitre's (here):
concat(substring($s1, 1 div number($cond)), substring($s2, 1 div number(not($cond))))
Note: I found that an explicit number() was required to convert the boolean to a number; otherwise some XPath evaluators threw a type mismatch error. Depending on how strict your XPath processor is about type matching, you may not need it.
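This variant works because XPath 1.0 numbers are IEEE 754 doubles: 1 div 1 is 1, while 1 div 0 is Infinity, and a substring starting at Infinity is empty. A small Python sketch of that reasoning follows; the helpers xpath_div, xpath_substring_from and choose are invented for the illustration and only cover the cases used here.

```python
import math


def xpath_div(a, b):
    # In XPath 1.0 a positive number div 0 is Infinity rather than an error.
    # (Python would raise ZeroDivisionError, so the case is handled explicitly.)
    return a / b if b != 0 else math.inf


def xpath_substring_from(s, start):
    """Emulate the two-argument XPath 1.0 substring(s, start)."""
    # Characters are kept when their 1-based position is >= start,
    # so an infinite start keeps nothing.
    return "".join(ch for pos, ch in enumerate(s, start=1) if pos >= start)


def choose(cond, s1, s2):
    return (
        xpath_substring_from(s1, xpath_div(1, int(cond)))
        + xpath_substring_from(s2, xpath_div(1, int(not cond)))
    )


print(choose(True, "yes", "no"))   # yes
print(choose(False, "yes", "no"))  # no
```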

XPath 1 query and attributes name

First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
First question: is there any way to get the name of a node's attributes?
<node attribute1="value1" attribute2="value2" />
Yes:
This XPath expression (when node is the context (current) node):
name(@*[1])
produces the name of the first attribute (the ordering may be implementation-dependent),
and this XPath expression (when node is the context (current) node):
name(@*[2])
produces the name of the second attribute (the ordering may be implementation-dependent).
Second question: is there a way to get attributes and values as value pairs? The situation is the following:
<node attribute1="10" attribute2="0" />
I want to get all attributes where value>0 and this way: "attribute1=10".
This XPath expression (when the attribute named "attribute1" is the context (current) node):
concat(name(), '=', .)
produces the string:
attribute1=value1
and this XPath expression (when the node element is the context (current) node):
@*[. > 0]
selects all attributes of the context node whose value is a number greater than 0.
In XPath 2.0 one can combine them in a single XPath expression:
@*[number(.) > 0]/concat(name(.),'=',.)
to get (in this particular case) this result:
attribute1=10
If you are using XPath 1.0, which is less powerful, you'll need to embed the XPath expression in a hosting language, such as XSLT. The following XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:for-each select="@*[number(.) > 0]">
<xsl:value-of select="concat(name(.),'=',.)"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<node attribute1="10" attribute2="0" />
Produces exactly the same result:
attribute1=10
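XSLT is only one possible hosting language. As a rough sketch, the same filtering can be done in Python with the standard xml.etree.ElementTree module; positive_attribute_pairs is a hypothetical helper name, and float() only approximates XPath's number() (it accepts some spellings, such as exponents, that XPath 1.0 does not).

```python
import xml.etree.ElementTree as ET


def positive_attribute_pairs(element):
    pairs = []
    for name, value in element.attrib.items():
        try:
            number = float(value)
        except ValueError:
            continue  # XPath's number() would give NaN here, and NaN > 0 is false
        if number > 0:
            pairs.append(f"{name}={value}")
    return pairs


node = ET.fromstring('<node attribute1="10" attribute2="0" />')
print(positive_attribute_pairs(node))  # ['attribute1=10']
```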
It depends a little bit on the context, I believe. In most cases, I expect you'd have to query "@*", enumerate over the items, and call "name()" - but it may work in some tests.
Re the edit - you can do:
@*[number(.)>0]
to find attributes matching your criteria, and:
concat(name(),'=',.)
to display the output. I don't think you can do both at once, though. What is the context here? xslt? what?
