XPath: how to extract first line of multi-line attribute value? - xpath

For example:
<doc>
<elem attr="firstLine &x0a; secondLine"/>
<elem attr="1stLine"/>
</doc>
I need the first line of each attribute value when the value spans several lines.
For the example above, the expected result is { 'firstLine', '1stLine' }.
Thanks in advance

Use:
substring-before(/*/elem[1]/@attr, '&#xA;')
Here is an XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="substring-before(/*/elem[1]/@attr, '&#xA;')"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document (corrected to be well-formed !!!):
<doc>
<elem attr="firstLine&#xA;secondLine"/>
<elem attr="1stLine"/>
</doc>
it evaluates the XPath expression and outputs the result of this evaluation:
firstLine

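If you're using XPath 2.0, you can use tokenize(), for example /*/elem/tokenize(@attr, '\n')[1], which also returns values that contain no line break.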
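Note that substring-before() returns an empty string when the value contains no line break, so on its own it yields nothing for the second elem. If the host language is available, the split can be done there instead. Below is a minimal sketch in Python, assuming the standard xml.etree.ElementTree module; the function name first_lines and the XML constant are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the question. The line break is written as the
# character reference &#xA; so it survives attribute-value normalization.
XML = '<doc><elem attr="firstLine&#xA;secondLine"/><elem attr="1stLine"/></doc>'


def first_lines(xml_text):
    """Return the first line of the attr value of every elem, in document order."""
    root = ET.fromstring(xml_text)
    return [
        # split("\n", 1)[0] keeps the whole value when there is no line break
        elem.get("attr").split("\n", 1)[0]
        for elem in root.iter("elem")
        if elem.get("attr") is not None  # skip elems without the attribute
    ]


print(first_lines(XML))
```

With the sample document this should give both 'firstLine' and '1stLine', which is the set the question asks for.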

Related

Find position of a node with specific text among identical nodes using xpath

There is a set of trademark nodes <tm> in a single document. Each <tm> node contains a text node with the trademark name. Some of the tm nodes may be identical, meaning they have the same trademark name. I need to write a template that adds the trademark character (™) only to the first occurrence of each trademark.
Example:
<doc>
<a><tm>A</tm></a>
<tm>A</tm>
<tm>B</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>
Only the first occurrences of <tm>A</tm> and <tm>B</tm> should be processed.
The expected result is:
<doc>
<a><tm>A™</tm></a>
<tm>A</tm>
<tm>B™</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>
The difficulty here is that there are identical nodes. Besides, I cannot write a separate template for each trademark, one template should match all.
Here is a draft of the solution:
<xsl:template match="tm">
<xsl:variable name="text" select="text()"/>
<xsl:variable name="same_tms" select="//tm[text()=$text]"/>
<xsl:if test=" --- current tm is the first among $same_tms --- ">
<xsl:value-of select="concat(text(), '™')"/>
</xsl:if>
</xsl:template>
I don't know how to write a generic test condition that would check if the current <tm> is the first among $same_tms. Is it possible?
Use a key, as in Muenchian grouping (http://www.jenitennison.com/xslt/grouping/muenchian.xml). With XSLT 2.0 you can use the is operator instead of the generate-id() comparison you would need in XSLT 1.0:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:key name="tm" match="tm" use="."/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="tm[. is key('tm', .)[1]]">
<xsl:copy>
<xsl:value-of select="concat(., '™')"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
Online as http://xsltransform.net/ncdD7mC.
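For comparison, the same "first occurrence per value" idea can be sketched outside XSLT. This is only an illustration in Python, assuming the standard xml.etree.ElementTree module; mark_first_trademarks is a hypothetical name, and a set of already seen names stands in for the key lookup.

```python
import xml.etree.ElementTree as ET

# Sample document from the question
XML = """<doc>
<a><tm>A</tm></a>
<tm>A</tm>
<tm>B</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>"""


def mark_first_trademarks(xml_text):
    """Append the trademark sign to the first tm of each distinct name."""
    root = ET.fromstring(xml_text)
    seen = set()
    for tm in root.iter("tm"):  # iter() walks the tree in document order
        name = tm.text or ""
        if name not in seen:
            seen.add(name)
            tm.text = name + "\u2122"  # U+2122 is the trademark sign
    return root


marked = mark_first_trademarks(XML)
print([tm.text for tm in marked.iter("tm")])
```

The set plays the role of key('tm', .)[1]: a tm is marked only if no earlier tm in document order carried the same text.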

XPath joining multiple elements

I'm looking for a way to join every two elements together using XPath 2.0:
<item>
<element class='1'>el1</element>
<element class='2'>el2</element>
<break>break</break>
<element class='1'>el3</element>
<element class='2'>el4</element>
<break>break</break>
<element class='1'>el5</element>
<element class='2'>el6</element>
<break>break</break>
<element class='1'>el7</element>
<element class='2'>el8</element>
</item>
I am hoping the result will be like:
el1el2
el3el4
el5el6
el7el8
There are "breaks" between each pair of meaningful elements, and there are also classes to help out, but I still cannot get it done.
Since I'm not familiar with XPath, this is what I have come up with so far. It turned out to be wrong, since concat() needs at least two arguments:
//item/concat(element[preceding-sibling::break | following-sibling::break])
//item/element[@class='1']/concat(., following-sibling::element[1])
You want your result sequence to contain one item for each of the class='1' elements, the value of that item being the concatenation of that element and its next sibling (the matching class='2').
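The pairing described above (each class='1' element joined with its next element sibling) can be sketched in a host language as well. A minimal Python version, assuming the standard xml.etree.ElementTree module; join_pairs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document from the question
XML = """<item>
<element class='1'>el1</element>
<element class='2'>el2</element>
<break>break</break>
<element class='1'>el3</element>
<element class='2'>el4</element>
<break>break</break>
<element class='1'>el5</element>
<element class='2'>el6</element>
<break>break</break>
<element class='1'>el7</element>
<element class='2'>el8</element>
</item>"""


def join_pairs(xml_text):
    """Concatenate each class='1' element with its next element sibling."""
    item = ET.fromstring(xml_text)
    children = list(item)
    joined = []
    for i, child in enumerate(children):
        if child.tag == "element" and child.get("class") == "1":
            # Mirrors following-sibling::element[1]: first later sibling named element
            nxt = next((c for c in children[i + 1:] if c.tag == "element"), None)
            tail = (nxt.text or "") if nxt is not None else ""
            joined.append((child.text or "") + tail)
    return joined


print("\n".join(join_pairs(XML)))
```

ElementTree has no sibling axis, so the sketch indexes into the list of children instead.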
I'm not sure if you're also open for an XSLT 1.0 solution but this works for me with your input xml:
<xsl:template match="/item">
<xsl:apply-templates select="element[1]|break"/>
</xsl:template>
<xsl:template match="element[1]">
<xsl:text>
</xsl:text>
<xsl:value-of select="."/>
<xsl:value-of select="following-sibling::*[1]"/>
</xsl:template>
<xsl:template match="break">
<xsl:text>
</xsl:text>
<xsl:value-of select="following-sibling::*[1]"/>
<xsl:value-of select="following-sibling::*[2]"/>
</xsl:template>
I have two templates that match either the start element or the break element. I use the following-sibling axis to get to the next two elements. The <xsl:text> elements are there to force a linebreak.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="item">
<xsl:for-each-group select="*" group-starting-with="break">
<xsl:if test="current-group()[1][self::break]">
<xsl:text>
</xsl:text>
</xsl:if>
<xsl:value-of select="current-group()[self::element]" separator=""/>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>

XPath: Get text that contains Obama but not Romney

I am quite new to XPath, so bear with me. I have an XPath expression
'.//*[contains(.,"Obama")]/text()'
that gets me the text that contains "Obama". However, I haven't been able to figure out how to add
and [not(contains(., "Romney"))] to the expression without getting a syntax error. How is it done? Help much appreciated!
Use:
.//*[contains(.,"Obama") and not(contains(.,"Romney"))]/text()
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:copy-of select=
'.//*[contains(.,"Obama") and not(contains(.,"Romney"))]/text()'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<election>
<choice>Maybe Obama</choice>
<choice>Maybe Romney</choice>
</election>
the XPath expression is evaluated and the selected node is copied to the output:
Maybe Obama
Do note:
SomeExpression[x][y]
is not always equivalent to:
SomeExpression[x and y]
Therefore, I recommend the latter, not the former, which is what the answer by @ChrisGerken suggests.
Here is a concrete example:
Let's have this XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
and these two XPath expressions:
/*/*[. mod 3 = 0 and position() = 3]
and
/*/*[. mod 3 = 0][position() = 3]
The first expression selects:
<num>03</num>
However, the second expression selects:
<num>09</num>
And here is a complete XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*[. mod 3 = 0 and position() = 3]"/>
================
<xsl:copy-of select=
"/*/*[. mod 3 = 0][position() = 3]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the above XML document, the two XPath expressions are evaluated and the results of these evaluations are copied to the output:
<num>03</num>
================
<num>09</num>
Explanation:
position() is a context-sensitive function: each predicate is evaluated against the node list left by the predicates before it, so position() typically produces different results when used in the k-th and in the m-th predicate, where k != m.
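The difference can be mimicked with plain lists. The following Python sketch is only an analogy, not an XPath engine: combined() tests both conditions against the original positions, while chained() filters first and then counts positions within the filtered list. Both function names are made up for illustration.

```python
# The ten values from the nums document: "01" .. "10"
nums = [f"{n:02d}" for n in range(1, 11)]


def combined(values):
    """Analogue of [. mod 3 = 0 and position() = 3]: positions in the original list."""
    return [v for pos, v in enumerate(values, start=1)
            if int(v) % 3 == 0 and pos == 3]


def chained(values):
    """Analogue of [. mod 3 = 0][position() = 3]: positions in the filtered list."""
    multiples = [v for v in values if int(v) % 3 == 0]  # first predicate
    return [v for pos, v in enumerate(multiples, start=1) if pos == 3]  # second predicate


print(combined(nums))  # ['03']
print(chained(nums))   # ['09']
```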
Try this:
'.//*[contains(.,"Obama")][not(contains(.,"Romney"))]/text()'
You can put as many predicates as you like one after another:
[a][b][c]

XPath 1.0 Order of returned attributes in a UNION

<merge>
<text>
<div begin="A" end="B" />
<div begin="C" end="D" />
<div begin="E" end="F" />
<div begin="G" end="H" />
</text>
</merge>
I need a UNIONed set of attribute nodes, in the order A,B,C,D,E,F,G,H, and this will work:
/merge/text/div/@begin | /merge/text/div/@end
but only if each @begin comes before each @end, since the UNION operator is spec'd to return nodes in document order. (Yes?)
I need the nodeset to be in the same order, even if the attributes appear in a different order in the document, as here:
<merge>
<text>
<div end="B" begin="A" />
<div begin="C" end="D" />
<div end="F" begin="E" />
<div begin="G" end="H" />
</text>
</merge>
That is, I need elements to follow document order, but the attributes in each element to follow a determined order (either specified or alphabetical by attribute name).
This simply isn't possible in pure XPath. First of all, attributes in XML are unordered. From the XML 1.0 Recommendation:
Note that the order of attribute specifications in a start-tag or
empty-element tag is not significant.
An XPath engine might be reading and storing them in the order they appear in the document, but in terms of the spec, this is just a happy coincidence that cannot be relied upon.
Second, XPath has no sorting functionality. So, your best option is to sort the elements in your host language (e.g. XSLT or a general-purpose PL) after they've been selected.
Here's how to sort those attributes by value in XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates
select="/merge/text/div/@*[name()='begin' or name()='end']">
<xsl:sort select="."/>
</xsl:apply-templates>
</xsl:template>
</xsl:stylesheet>
Note that I also merged your two expressions into one.
Edit: Use the following to output begin/end pairs in document order (as described in the comments):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="div">
<xsl:value-of select="concat(@begin, @end)"/>
</xsl:template>
</xsl:stylesheet>
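When the host is a general-purpose language rather than XSLT, the same "elements in document order, attributes in a fixed order" result can be produced by reading the attributes by name. A minimal Python sketch, assuming the standard xml.etree.ElementTree module; begin_end_values and its names parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Second sample document from the question (attribute order varies per div)
XML = """<merge>
<text>
<div end="B" begin="A" />
<div begin="C" end="D" />
<div end="F" begin="E" />
<div begin="G" end="H" />
</text>
</merge>"""


def begin_end_values(xml_text, names=("begin", "end")):
    """Walk div elements in document order and read attributes in the given order."""
    root = ET.fromstring(xml_text)
    return [
        div.get(name)
        for div in root.iter("div")   # document order of the elements
        for name in names             # caller-determined order of the attributes
        if div.get(name) is not None  # skip attributes that are absent
    ]


print(begin_end_values(XML))
```

Because the attributes are looked up by name, the order in which they were written in each start-tag does not matter.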

XPath to return default value if node not present

Say I have a pair of XML documents
<Foo>
<Bar/>
<Baz>mystring</Baz>
</Foo>
and
<Foo>
<Bar/>
</Foo>
I want an XPath (Version 1.0 only) that returns "mystring" for the first document and "not-found" for the second. I tried
(string('not-found') | //Baz)[last()]
but the left-hand side of the union isn't a node-set.
In XPath 1.0, use:
concat(/Foo/Baz,
substring('not-found', 1 div not(/Foo/Baz)))
If you want to handle a possibly empty Baz element, use:
concat(/Foo/Baz,
substring('not-found', 1 div not(/Foo/Baz[node()])))
With this input:
<Foo>
<Baz/>
</Foo>
Result: the string not-found.
Special case:
If you want to get 0 when a numeric node is missing or empty, use sum(/Foo/Baz).
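The concat() expressions above work because 1 div not(...) evaluates to 1 when Baz is missing (so substring() returns the whole default) and to Infinity when Baz is present (so substring() returns an empty string). If the fallback can be applied in the host language instead, it becomes much plainer. A minimal Python sketch, assuming the standard xml.etree.ElementTree module; baz_or_default is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def baz_or_default(xml_text, default="not-found"):
    """Return the text of Foo/Baz, or the default when Baz is missing or empty."""
    root = ET.fromstring(xml_text)  # root is the Foo element
    # findtext() returns None when Baz is missing and "" when it has no text;
    # "or" maps both cases to the default
    return root.findtext("Baz") or default


print(baz_or_default("<Foo><Bar/><Baz>mystring</Baz></Foo>"))
print(baz_or_default("<Foo><Bar/></Foo>"))
print(baz_or_default("<Foo><Baz/></Foo>"))
```

This treats an empty Baz like a missing one, which matches the second concat() variant rather than the first.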
@Alejandro provided the best XPath 1.0 answer, which has been known for years, since first used by Jeni Tennison almost ten years ago.
The only problem with this expression is its shiny elegance, which makes it difficult to understand, and not only for novice programmers.
In a hosted XPath 1.0 (and every XPath is hosted!) one can use more understandable expressions:
string((/Foo/Baz | $vDefaults[not(/Foo/Baz/text())]/Foo/Baz)[last()])
Here the variable $vDefaults is a separate document that has the same structure as the primary XML document, and whose text nodes contain default values.
Or, if XSLT is the hosting language, one can use the document() function:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my">
<xsl:output method="text"/>
<my:defaults>
<Foo>
<Bar/>
<Baz>not-found</Baz>
</Foo>
</my:defaults>
<xsl:template match="/">
<xsl:value-of select=
"concat(/Foo/Baz,
document('')[not(current()/Foo/Baz/text())]
/*/my:defaults/Foo/Baz
)"/>
</xsl:template>
</xsl:stylesheet>
Or, not using concat():
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my">
<xsl:output method="text"/>
<my:defaults>
<Foo>
<Bar/>
<Baz>not-found</Baz>
</Foo>
</my:defaults>
<xsl:variable name="vDefaults" select="document('')/*/my:defaults"/>
<xsl:template match="/">
<xsl:value-of select=
"(/Foo/Baz
| $vDefaults/Foo/Baz[not(current()/Foo/Baz/text())]
)
[last()]"/>
</xsl:template>
</xsl:stylesheet>
/Foo/(Baz/string(), 'not-found')[1]
If you are okay with printing an empty string instead of 'not-found' message then use:
/Foo/concat(Baz/text(), '')
Later, you can replace the empty strings with 'not-found'.
