Substrings from iterable node - xpath

Please consider this sample file: http://www.w3schools.com/dom/books.xml
The XPath expression //title/text() returns:
Everyday Italian
Harry Potter
XQuery Kick Start
Learning XML
Now I want just the first word of each title, so I tried tokenize(//title/text(),' ')[1], which returns the error:
Too many items
On the other hand, tokenize((//title/text())[1],' ')[1] returns the first word, but only for the first node.
How can I get substrings with XPath while iterating nodes?

Use:
//text()/tokenize(.,' ')[1]
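This produces a sequence of the first "word" of every text node in the XML document. Your attempt fails because tokenize() takes a single string as its first argument, while //title/text() is a sequence of four text nodes. In //text()/tokenize(.,' ')[1] the function is instead called once per text node, with . bound to that node.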
XSLT 2.0-based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:sequence select="//text()/tokenize(.,' ')[1]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<t>
<a>Everyday Italian</a>
<b>Harry Potter</b>
<c>XQuery Kick Start</c>
<d>Learning XML</d>
</t>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
Everyday
Harry
XQuery
Learning
The result above also includes the first tokens of a few whitespace-only text nodes.
To ignore whitespace-only text nodes, change the XPath expression to:
//text()[normalize-space()]/tokenize(.,' ')[1]
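The following Python sketch (standard library only) emulates both expressions on the sample <t> document, to show where the whitespace-only text nodes come from. It is an approximation, not an XPath 2.0 processor: walking element text and tails stands in for //text(), str.split(' ')[0] stands in for tokenize(.,' ')[1] (both split on a literal space), and str.strip() stands in for the [normalize-space()] test, although Python treats a slightly wider set of characters as whitespace. The helper names text_nodes and first_words are made up for this example.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer above (no indentation, as shown).
XML = """<t>
<a>Everyday Italian</a>
<b>Harry Potter</b>
<c>XQuery Kick Start</c>
<d>Learning XML</d>
</t>"""


def text_nodes(elem):
    """Yield text contents in document order, roughly like //text()."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:  # text that follows the child inside elem
            yield child.tail


def first_words(root, skip_whitespace_only=False):
    words = []
    for text in text_nodes(root):
        # Emulates the [normalize-space()] predicate.
        if skip_whitespace_only and not text.strip():
            continue
        # Emulates tokenize(., ' ')[1]: split on a literal space, keep the first token.
        words.append(text.split(" ")[0])
    return words


root = ET.fromstring(XML)
print(first_words(root))  # includes '\n' items from the whitespace-only nodes
print(first_words(root, skip_whitespace_only=True))  # ['Everyday', 'Harry', 'XQuery', 'Learning']
```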

Try this
1. To get all words except the last one, use this:
//title/string-join(tokenize(.,'\s+')[position() ne last()],' ')
or
2. To get only the first word, use this:
//title/string-join(tokenize(.,'\s+')[position() eq 1],' ')
Hope this helps.
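As a rough cross-check, here is a Python emulation of the two expressions applied to the four titles of books.xml (typed in by hand rather than fetched). re.split(r'\s+', ...) plays the role of tokenize(.,'\s+'); note that Python's \s matches more Unicode whitespace than the XPath \s, which is limited to space, tab, CR and LF. The function names are illustrative only.

```python
import re

# The four titles from books.xml, copied from the question.
TITLES = ["Everyday Italian", "Harry Potter", "XQuery Kick Start", "Learning XML"]


def all_but_last(title):
    # string-join(tokenize(., '\s+')[position() ne last()], ' ')
    tokens = re.split(r"\s+", title)
    return " ".join(tokens[:-1])


def first_only(title):
    # string-join(tokenize(., '\s+')[position() eq 1], ' ')
    tokens = re.split(r"\s+", title)
    return " ".join(tokens[:1])


print([all_but_last(t) for t in TITLES])  # ['Everyday', 'Harry', 'XQuery Kick', 'Learning']
print([first_only(t) for t in TITLES])    # ['Everyday', 'Harry', 'XQuery', 'Learning']
```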

Related

XPath translate() function and (de)composed Unicode characters

Take the following XSLT code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="translate('abc', 'éabc', 'eabc')"/> <!--0x65CC81-->
<xsl:value-of select="'&#10;'"/>
<xsl:value-of select="translate('abc', 'éabc', 'eabc')"/> <!--0xC3A9-->
</xsl:template>
</xsl:stylesheet>
Running this with Saxon 10 returns:
bc
abc
The first invocation of the translate function uses the decomposed form of é (U+0065 and U+0301), while the second uses U+00E9. It seems they are not treated equally. Is this to be expected? And is this behavior specified somewhere?
See https://www.w3.org/TR/xpath-functions-31/#character-terminology §1.7.1, where it is stated:
Unless explicitly stated, the xs:string values returned by the
functions in this document are not normalized in the sense of
[Character Model for the World Wide Web 1.0: Fundamentals].
So the translate() function works on Unicode codepoints as input, and produces Unicode codepoints as output, and isn't concerned about whether those codepoints represent composed or decomposed characters. If you want normalization, you have to invoke it explicitly using the normalize-unicode() function.
(The quote above is a little bit ambiguous for my taste. By "are not normalized" it means "no action is taken to normalize the strings", it doesn't mean "the strings will not be in normalized form".)
I think the result is correct. In translate('abc', 'éabc', 'eabc') the decomposed é counts as two characters in the second argument (e followed by U+0301). So a is at the third position and is replaced by the third character of the third argument, b; b is at the fourth position and is replaced by the fourth character, c; and c, at the fifth position, has no counterpart in the third argument, so it is removed. That gives bc.
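I guess you can avoid the problem by normalizing the mapping string explicitly, e.g. <xsl:value-of select="translate('abc', normalize-unicode('éabc', 'NFC'), 'eabc')"/> (if the input string may also contain decomposed characters, normalize it the same way).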
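To see the codepoint-level behaviour outside Saxon, the Python sketch below emulates fn:translate() as the spec describes it: each character of the second argument maps to the character at the same position in the third, or is deleted if there is none, and only the first occurrence of a character counts. Python strings are sequences of codepoints, like xs:string, so the decomposed é really is two characters here. xpath_translate is a hypothetical helper, and unicodedata.normalize('NFC', ...) stands in for normalize-unicode(..., 'NFC').

```python
import unicodedata


def xpath_translate(arg, map_string, trans_string):
    """Emulate fn:translate(): a codepoint-by-codepoint mapping."""
    mapping = {}
    for i, ch in enumerate(map_string):
        if ch in mapping:
            continue  # only the first occurrence in the map string counts
        mapping[ch] = trans_string[i] if i < len(trans_string) else None
    out = []
    for ch in arg:
        if ch not in mapping:
            out.append(ch)  # not in the map string: copied unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # replaced by the character at the same position
        # otherwise there is no counterpart, so the character is dropped
    return "".join(out)


DECOMPOSED = "e\u0301abc"  # e + COMBINING ACUTE ACCENT: 5 codepoints
COMPOSED = "\u00e9abc"     # LATIN SMALL LETTER E WITH ACUTE: 4 codepoints

print(xpath_translate("abc", DECOMPOSED, "eabc"))  # bc
print(xpath_translate("abc", COMPOSED, "eabc"))    # abc
# Normalizing the map string to NFC first, like normalize-unicode(..., 'NFC'):
print(xpath_translate("abc", unicodedata.normalize("NFC", DECOMPOSED), "eabc"))  # abc
```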

Usage of a variable in an xPath expression

With the definition
<xsl:variable name="testVariable">
<xsl:value-of select="'/author/'"/>
</xsl:variable>
I was hoping that
<xsl:value-of select="concat('./book',$testVariable,'@attribute')" />
returns the same as
<xsl:value-of select="./book/author/@attribute" />
But only the latter returns the actual value of the attribute; the first one just returns the path
./book/author/@attribute
How can I make the first one also return the value of the attribute?
Thanks!
The concat() function returns a string, it doesn't magically interpret that string as the source code of an XPath expression and then evaluate that expression.
Note also that
<xsl:variable name="testVariable">
<xsl:value-of select="'/author/'"/>
</xsl:variable>
can in 99% of cases be rewritten as
<xsl:variable name="testVariable" select="'/author/'"/>
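which is not only less code, it's also a lot more efficient, because the first form builds a temporary tree (a document node containing a text node) whereas the second simply binds the string. (Sadly the other 1% of cases mean that the optimizer can't do this rewrite automatically.)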
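Usually you can achieve what you want by holding only the element name in the variable, e.g. <xsl:variable name="testVariable" select="'author'"/>, and then using
select="./book/*[name()=$testVariable]/@attribute"
Occasionally you need to go a bit beyond that, in which case you need something like xsl:evaluate in XSLT 3.0.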
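The difference between building a path string and evaluating a path can be sketched in Python with xml.etree.ElementTree. The input document is invented, since the question doesn't show one, and ElementTree's small path language is only a loose stand-in for XPath (it has no @ step for selecting attributes, so the attribute is read with .get()).

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; the question does not show its XML.
root = ET.fromstring('<root><book><author attribute="Kay"/></book></root>')

# Like concat('./book', $testVariable, '@attribute'): only string building.
path = "./book" + "/author/" + "@attribute"
print(path)  # ./book/author/@attribute

# Like ./book/*[name()=$testVariable]/@attribute: keep only the element
# name in the variable and compare it with each child's name.
test_variable = "author"
values = [
    child.get("attribute")
    for book in root.findall("./book")
    for child in book
    if child.tag == test_variable
]
print(values)  # ['Kay']

# Roughly what xsl:evaluate adds: treating a string as a path at run time.
# ElementTree paths have no @ step, so the attribute is read with .get().
author = root.find("./book/" + test_variable)
print(author.get("attribute"))  # Kay
```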

Using an XPath on current-group()

I need to select a subset of the nodes of the current-group() in an xsl:for-each-group loop. When I use an XPath of the form current-group()/foo, nothing is matched. If, however, I bind the current group to a variable like so:
<xsl:variable name="foo"><xsl:copy-of select="current-group()"/></xsl:variable>
and then use an XPath of the form $foo/foo, I get the expected matches. I suspect that the issue is related to the type of current-group() and how the $foo variable has a different type, but I can't figure it out by myself. Any clues on how I can avoid introducing a variable to make the type conversion? Or is it something different?
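If you do something like:
<xsl:for-each-group select="foo" group-by="type">
<xsl:value-of select="current-group()[self::foo]"/>
</xsl:for-each-group>
then current-group() returns the sequence of foo elements themselves. current-group()/foo therefore looks for foo children of those elements and finds nothing; to filter the group, test the members directly, as with current-group()[self::foo].
But
<xsl:variable name="foo">
<xsl:copy-of select="current-group()"/>
</xsl:variable>
creates a document node whose children are copies of the foo elements, and that is why you then need to use:
<xsl:value-of select="$foo/foo"/>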
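Here is a Python analogue using xml.etree.ElementTree, with an invented input of three foo elements grouped by a type child. It is only an analogy: ElementTree has no document nodes or for-each-group, so a plain list stands in for current-group() and a placeholder element stands in for the document node that the variable creates.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input: three foo elements, grouped by their type child.
root = ET.fromstring(
    "<root>"
    "<foo><type>x</type></foo>"
    "<foo><type>x</type></foo>"
    "<foo><type>y</type></foo>"
    "</root>"
)

# Stand-in for current-group() when the group key is 'x': the foo elements themselves.
group = [foo for foo in root.findall("foo") if foo.findtext("type") == "x"]

# current-group()/foo looks for foo *children* of each member: there are none.
child_step = [c for member in group for c in member.findall("foo")]
print(len(child_step))  # 0

# current-group()[self::foo] tests the members themselves.
self_test = [member for member in group if member.tag == "foo"]
print(len(self_test))  # 2

# <xsl:variable name="foo"><xsl:copy-of select="current-group()"/></xsl:variable>
# puts copies under a new document node, so $foo/foo finds them as children.
doc_node = ET.Element("doc-node-placeholder")
doc_node.extend([copy.deepcopy(member) for member in group])
print(len(doc_node.findall("foo")))  # 2
```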

'Select' 2 pieces of info (XSLT file)

I am trying to link our Magento website with Sage 50 using a piece of software.
We would like the customer's first name and last name to go into the company field.
Below are the 3 lines I assume I have to tweak:
<Forename><xsl:value-of select="billing_address/firstname"/></Forename>
<Surname><xsl:value-of select="billing_address/lastname"/></Surname>
<Company><xsl:value-of select="billing_address/company"/></Company>
How do I combine first name and last name in 1 line? I'm looking for something like:
<Company><xsl:value-of select="billing_address/firstname, billing_address/lastname"/></Company>
You really need to tell us which version of XSLT you are using. Your proposed code
<xsl:value-of select="billing_address/firstname, billing_address/lastname"/>
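is fine in XSLT 2.0 (by default the items are separated by a single space), and you can get a comma instead by adding the attribute separator=", ". But this won't work in 1.0: XPath 1.0 has no comma operator, and xsl:value-of only outputs the string value of the first node if you give it a node-set.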
First of all, whitespace-only text nodes in the stylesheet are ignored by the XSLT processor (except inside xsl:text), so what you tried above can be rewritten like the following:
<Company>
<xsl:value-of select="billing_address/firstname, billing_address/lastname"/>
</Company>
Second, you have to understand that xsl:value-of generates a text node. The following will generate 2 text nodes, containing the first and last names respectively:
<Company>
<xsl:value-of select="billing_address/firstname"/>
<xsl:value-of select="billing_address/lastname"/>
</Company>
Then, if I understand correctly, you want to separate both with the string ", ". You can use xsl:text to generate a fixed-content text node:
<Company>
<xsl:value-of select="billing_address/firstname"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="billing_address/lastname"/>
</Company>
In the above, you could put the ", " directly between the two xsl:value-of elements, but then you can't control the indentation. Usually, when I generate fixed text, I always use xsl:text.
You can give
<Company><xsl:value-of select="concat(billing_address/firstname,', ', billing_address/lastname)"/></Company>
a try...
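One practical difference between the approaches shows up when an element is missing. The Python sketch below emulates both under that assumption: concat() turns a missing lastname into an empty string and still emits the ", ", while an XSLT 2.0 xsl:value-of with separator only puts the separator between the items actually selected. The billing_address samples and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def concat_style(addr):
    # concat(billing_address/firstname, ', ', billing_address/lastname):
    # a missing element contributes '' but the ', ' is always there.
    return addr.findtext("firstname", "") + ", " + addr.findtext("lastname", "")


def separator_style(addr):
    # XSLT 2.0: <xsl:value-of select="firstname, lastname" separator=", "/>
    # The sequence holds only the elements that exist, in expression order,
    # and the separator goes between adjacent items.
    items = ["".join(e.itertext())
             for name in ("firstname", "lastname")
             for e in addr.findall(name)]
    return ", ".join(items)


full = ET.fromstring(
    "<billing_address><firstname>Jane</firstname>"
    "<lastname>Doe</lastname></billing_address>")
partial = ET.fromstring(
    "<billing_address><firstname>Jane</firstname></billing_address>")

print(concat_style(full))              # Jane, Doe
print(separator_style(full))           # Jane, Doe
print(repr(concat_style(partial)))     # 'Jane, '
print(repr(separator_style(partial)))  # 'Jane'
```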

XPath to get all text in element as one value, removing line breaks

I am trying to get all the text in the following node and return it as one value (not multiple nodes).
<p>
"I love eating out."
<br>
<br>
"This is my favorite restaurant."
<br>
"I will definitely be back"
</p>
I am using '/p', which gets all the text, but with line breaks. Trying '/p/text()' instead returns each piece of text between the tags as a separate value. The ideal return would be:
"I love eating out. This is my favorite restaurant. I will definitely be back"
I've tried searching other questions but couldn't find anything this close. Please note that in the current environment I am restricted to using only an XPath query and cannot parse the result afterwards or set up any HTML pre-parsing. Specifically, I'm using the importXML function inside Google Docs.
Use:
normalize-space(/)
When this XPath expression is evaluated, the string value of the document node (/) is first produced and this is provided as argument to the standard XPath function normalize-space().
By definition, normalize-space() returns its argument with leading and trailing whitespace removed and every internal run of adjacent whitespace characters replaced by a single space character.
The evaluation of the above XPath expression results in:
"I love eating out." "This is my favorite restaurant." "I will definitely be back"
To eliminate the quotes, we additionally use the translate() function:
normalize-space(translate(/,'"', ''))
The result of evaluating this expression is:
I love eating out. This is my favorite restaurant. I will definitely be back
Finally, to have this result wrapped in quotes itself, we use the concat() function:
concat('"',
normalize-space(translate(/,'"', '')),
'"'
)
The evaluation of this XPath expression produces exactly the wanted result:
"I love eating out. This is my favorite restaurant. I will definitely be back"
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"concat('&quot;',
normalize-space(translate(/,'&quot;', '')),
'&quot;'
)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (corrected to be made well-formed):
<p>
"I love eating out."
<br />
<br />
"This is my favorite restaurant."
<br />
"I will definitely be back"
</p>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
"I love eating out. This is my favorite restaurant. I will definitely be back"
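The three steps can be replayed in Python with the standard library, as a check of the reasoning rather than a replacement for importXML. Joining root.itertext() approximates the string value of /, str.replace stands in for translate(/,'"',''), and normalize_space is a small helper written here to follow the XPath definition (only space, tab, CR and LF count as whitespace).

```python
import re
import xml.etree.ElementTree as ET

# The well-formed version of the question's document.
XML = """<p>
"I love eating out."
<br />
<br />
"This is my favorite restaurant."
<br />
"I will definitely be back"
</p>"""


def normalize_space(s):
    # XPath whitespace is only space, tab, CR and LF.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


root = ET.fromstring(XML)
string_value = "".join(root.itertext())          # string value of /
no_quotes = string_value.replace('"', "")        # translate(/, '"', '')
result = '"' + normalize_space(no_quotes) + '"'  # concat('"', ..., '"')
print(result)  # "I love eating out. This is my favorite restaurant. I will definitely be back"
```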
