XPath translate() function and (de)composed Unicode characters

Take the following XSLT code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="translate('abc', 'éabc', 'eabc')"/> <!--0x65CC81-->
<xsl:value-of select="'&#xA;'"/>
<xsl:value-of select="translate('abc', 'éabc', 'eabc')"/> <!--0xC3A9-->
</xsl:template>
</xsl:stylesheet>
Running this with Saxon 10 returns:
bc
abc
The first invocation of the translate function uses the decomposed form of é (U+0065 and U+0301), while the second uses U+00E9. It seems they are not treated equally. Is this to be expected? And is this behavior specified somewhere?

See https://www.w3.org/TR/xpath-functions-31/#character-terminology §1.7.1, where it is stated:
Unless explicitly stated, the xs:string values returned by the
functions in this document are not normalized in the sense of
[Character Model for the World Wide Web 1.0: Fundamentals].
So the translate() function works on Unicode codepoints as input, and produces Unicode codepoints as output, and isn't concerned about whether those codepoints represent composed or decomposed characters. If you want normalization, you have to invoke it explicitly using the normalize-unicode() function.
(The quote above is a little bit ambiguous for my taste. By "are not normalized" it means "no action is taken to normalize the strings", it doesn't mean "the strings will not be in normalized form".)

I think the result is correct. In translate('abc', 'éabc', 'eabc') with the decomposed é, the é takes up two characters of the second argument (U+0065 followed by U+0301). The a in the second argument is therefore at the third position and is replaced by the character at the third position of the third argument, b. The b is at the fourth position and is replaced by the fourth character, c. The c is at the fifth position, where the third argument has no character, so it is removed.
I guess you can do e.g. <xsl:value-of select="translate('abc', normalize-unicode('éabc', 'NFC'), 'eabc')"/> to avoid the problem.
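The positional mapping described above can be reproduced outside XSLT. The following is a minimal Python sketch, not Saxon's implementation: xpath_translate is a hypothetical helper that mimics how translate() pairs up code points, and the escapes spell out the decomposed and precomposed forms explicitly so that no editor can silently normalize them.

```python
import unicodedata


def xpath_translate(value, map_string, trans_string):
    """Hypothetical stand-in for XPath translate(), working on code points."""
    mapping = {}
    for index, char in enumerate(map_string):
        if char in mapping:
            continue  # the first occurrence in the map string wins
        # A map character with no counterpart in trans_string means "delete".
        mapping[char] = trans_string[index] if index < len(trans_string) else None
    result = []
    for char in value:
        if char not in mapping:
            result.append(char)           # unmapped characters pass through
        elif mapping[char] is not None:
            result.append(mapping[char])  # mapped characters are replaced
    return "".join(result)


decomposed = "e\u0301abc"  # U+0065 U+0301 a b c: five code points
composed = "\u00e9abc"     # U+00E9 a b c: four code points

print(xpath_translate("abc", decomposed, "eabc"))  # bc
print(xpath_translate("abc", composed, "eabc"))    # abc

# Normalizing the map string to NFC first, as normalize-unicode() would:
normalized = unicodedata.normalize("NFC", decomposed)
print(xpath_translate("abc", normalized, "eabc"))  # abc
```

The decomposed map string is one code point longer than the replacement string, which shifts every later pairing by one and leaves the final c without a replacement.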

Related

Usage of a variable in an xPath expression

With the definition
<xsl:variable name="testVariable">
<xsl:value-of select="'/author/'"/>
</xsl:variable>
I was hoping that
<xsl:value-of select="concat('./book',$testVariable,'@attribute')" />
returns the same as
<xsl:value-of select="./book/author/@attribute" />
But only the latter returns the actual value of the attribute; the first one just returns the path
./book/author/@attribute
How can I make the first one also return the value of the attribute?
Thanks!
The concat() function returns a string, it doesn't magically interpret that string as the source code of an XPath expression and then evaluate that expression.
Note also that
<xsl:variable name="testVariable">
<xsl:value-of select="'/author/'"/>
</xsl:variable>
can in 99% of cases be rewritten as
<xsl:variable name="testVariable" select="'/author/'"/>
which is not only less code, it's also a lot more efficient. (Sadly the other 1% of cases mean that the optimizer can't do this rewrite automatically.)
Usually you can achieve what you want, with the variable holding just the element name (author, without the slashes), using
select="/book/*[name()=$testVariable]/@attribute"
Occasionally you need to go a bit beyond that in which case you need something like xsl:evaluate in XSLT 3.0.

Using an XPath on current-group()

I need to select a subset of the nodes of the current-group() in an xsl:for-each-group loop. When I use an XPath of the form current-group()/foo, nothing is matched. If, however, I bind the current group to a variable like so:
<xsl:variable name="foo"><xsl:copy-of select="current-group()"/></xsl:variable>
and then use an XPath of the form $foo/foo, I get the expected matches. I suspect that the issue is related to the type of current-group() and to the $foo variable having a different type, but I can't figure it out by myself. Any clues how I can avoid introducing a variable to make the type conversion? Or is it something different?
If you do something like:
<xsl:for-each-group select="foo" group-by="type">
<xsl:value-of select="current-group()[self::foo]"/>
</xsl:for-each-group>
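then current-group() returns a sequence of foo elements. A path like current-group()/foo looks for foo children of those elements, which is why it matches nothing; filter the elements themselves with a predicate such as [self::foo] instead.
But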
<xsl:variable name="foo">
<xsl:copy-of select="current-group()"/>
</xsl:variable>
binds $foo to a document node that contains copies of the foo elements, so there you need to use:
<xsl:value-of select="$foo/foo"/>
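The difference between a sequence of elements and a node that wraps copies of them can be illustrated outside XSLT. The sketch below is only an analogy in Python's xml.etree.ElementTree, not XSLT semantics: the sample document, the wrapper element and the variable names are all made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input: two foo elements that would end up in the same group.
root = ET.fromstring("<root><foo type='a'>1</foo><foo type='a'>2</foo></root>")

# Plays the role of current-group(): a plain list of foo elements.
group = root.findall("foo")

# Like current-group()/foo: look for foo CHILDREN of each member. There are none.
children = [child for member in group for child in member.findall("foo")]

# Like current-group()[self::foo]: filter the members themselves.
selves = [member for member in group if member.tag == "foo"]

# Like the xsl:variable with xsl:copy-of: copies placed under a new parent,
# so a child step from that parent now finds them.
wrapper = ET.Element("wrapper")
wrapper.extend(copy.deepcopy(member) for member in group)
wrapped = wrapper.findall("foo")

print(len(children), len(selves), len(wrapped))  # 0 2 2
```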

'Select' 2 pieces of info (XSLT file)

I am trying to link our Magento website with Sage 50 with a piece of software.
We would like the customer's first name and last name to go into the company field.
Below are the 3 lines I assume I have to tweak:
<Forename><xsl:value-of select="billing_address/firstname"/></Forename>
<Surname><xsl:value-of select="billing_address/lastname"/></Surname>
<Company><xsl:value-of select="billing_address/company"/></Company>
How do I combine first name and last name in 1 line? I am looking for something like:
<Company><xsl:value-of select="billing_address/firstname, billing_address/lastname"/></Company>
You really need to tell us which version of XSLT you are using. Your proposed code
<xsl:value-of select="billing_address/firstname, billing_address/lastname"/>
is fine in 2.0, and you can get the comma by adding the attribute separator=", " to the xsl:value-of element. But this won't work in 1.0, where xsl:value-of will only output the first item if you give it a sequence.
First of all, whitespace-only text nodes in the stylesheet are ignored by the XSLT engine, so what you tried above can be rewritten as follows:
<Company>
<xsl:value-of select="billing_address/firstname, billing_address/lastname"/>
</Company>
Second, you have to understand that xsl:value-of generates a text node. The following will generate 2 text nodes, containing the first and last names respectively:
<Company>
<xsl:value-of select="billing_address/firstname"/>
<xsl:value-of select="billing_address/lastname"/>
</Company>
Then, if I understand correctly, you want to separate both with the string ", ". You can use xsl:text to generate a fixed-content text node:
<Company>
<xsl:value-of select="billing_address/firstname"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="billing_address/lastname"/>
</Company>
In the above, you can put the ", " directly between both value-of elements, but then you can't control the indentation. When I generate fixed text, I always use xsl:text.
You can give
<Company><xsl:value-of select="concat(billing_address/firstname,', ', billing_address/lastname)"/></Company>
a try...

Substrings from iterable node

Please consider this sample file: http://www.w3schools.com/dom/books.xml
This XPath expression //title/text(), returns:
Everyday Italian
Harry Potter
XQuery Kick Start
Learning XML
Now I want just the first names, and try: tokenize(//title/text(),' ')[1], which returns:
Too many items
OTOH tokenize((//title/text())[1],' ')[1] returns first name for first node.
How can I get substrings with XPath while iterating nodes?
Use:
//text()/tokenize(.,' ')[1]
This produces a sequence of the first "word" of every text node in the XML document.
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:sequence select="//text()/tokenize(.,' ')[1]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<t>
<a>Everyday Italian</a>
<b>Harry Potter</b>
<c>XQuery Kick Start</c>
<d>Learning XML</d>
</t>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
Everyday
Harry
XQuery
Learning
The above includes a few white-space only text nodes.
If you want to ignore any whitespace-only text node, change the XPath expression to:
//text()[normalize-space()]/tokenize(.,' ')[1]
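The per-node evaluation that the / operator provides can be mimicked in Python. This is only a sketch using the standard library's xml.etree.ElementTree rather than an XPath 2.0 engine: str.split stands in for tokenize(), and the strip() test plays the role of the [normalize-space()] predicate.

```python
import xml.etree.ElementTree as ET

xml_text = """<t>
  <a>Everyday Italian</a>
  <b>Harry Potter</b>
  <c>XQuery Kick Start</c>
  <d>Learning XML</d>
</t>"""

root = ET.fromstring(xml_text)

# itertext() walks every text node, including the whitespace-only ones
# between elements; the condition drops those before tokenizing.
first_words = [
    text.split(" ")[0]   # first token of this one text node
    for text in root.itertext()
    if text.strip()      # skip whitespace-only text nodes
]

print(first_words)  # ['Everyday', 'Harry', 'XQuery', 'Learning']
```

The point is the same as in the XPath expression: the tokenizing step is applied once per text node, so it never receives more than one item at a time.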
Try this
1. To get all parts except last one use this:
//title/string-join(tokenize(.,'\s+')[position() ne last()],' ')
or
2. To get only first one use this:
//title/string-join(tokenize(.,'\s+')[position() eq 1],' ')
Hope this helps.

Problems inserting with PL/SQL Developer

I have the following script that I want to insert into a table, but I'm having some issues with it.
declare
v_xslt9 varchar2(32767) := '<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html" encoding="UTF-8" indent="yes"/> <xsl:template name="Template"> <xsl:text>Kære </xsl:text> <xsl:value-of select="/nameValueMap/entry[string=''FIRST_NAME'']/string[2]"/> <xsl:text>&nbsp;</xsl:text> <xsl:value-of select="/nameValueMap/entry[string=''LAST_NAME'']/string[2]"/></xsl:template> </xsl:stylesheet>';
begin
insert into XSLT values ('','Note',sysdate,v_xslt9,sysdate,'T','');
end;
The part of interest is the following
<xsl:text>&nbsp;</xsl:text>
I'm using PL/SQL Developer and when I run the script above it recognises &nbsp; as an entity and I then have to type in what value I want in it. What I want is a simple whitespace within the XSL such that the first and last name will be separated. I've tried all suggestions from the following link: orafaq - I just can't get it to work. Either it fails when I try to insert or it fails when I extract the data.
Is there some easy way of inserting a whitespace in the XSL?
Use
'<xsl:text>'||'&'||'nbsp;</xsl:text>'
This solves the problem everywhere, both for you and for every future user of the script.
Just isolate the ampersand so that it is not followed directly by a name.
Use a command window instead of a SQL window to run scripts in PL/SQL Developer. To transform one kind of window into another type, just right-click anywhere in the window and select "Change Window to" => "Command Window".
Then run your script as in SQL*Plus -- in this case with SET DEFINE OFF as the first line.
<xsl:text>&&nbsp;</xsl:text>
From the PL/SQL Developer manual:
If you wish to use an ampersand in the SQL text that should not be
interpreted as a substitution variable, use a double ampersand
instead.
Give this a try:
'<xsl:text>'||unistr('\00A0')||'</xsl:text>'
The unistr('\00A0') call returns the Unicode NO-BREAK SPACE character.
Or, if you want the entity itself, you can try this:
'<xsl:text>'||chr(38)||'nbsp;</xsl:text>'
The chr(38) returns a literal ampersand without trying to prompt for input.
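As a quick sanity check of the two character codes used in these suggestions, here is a small Python sketch. It says nothing about how PL/SQL Developer handles substitution variables; it only confirms which characters code point 38 and U+00A0 are, and what string the concatenation produces.

```python
import unicodedata

ampersand = chr(38)  # same code point that chr(38) yields in the answer above
nbsp = "\u00a0"      # the character that unistr('\00A0') refers to

print(ampersand)               # &
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# Building the fragment by concatenation, so that the source text never
# contains an ampersand immediately followed by a name.
fragment = "<xsl:text>" + ampersand + "nbsp;</xsl:text>"
print(fragment)                # <xsl:text>&nbsp;</xsl:text>
```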
