XPath to return string concatenation of qualifying child node values - xpath

Can anyone please suggest an XPath expression format that returns a string value containing the concatenated values of certain qualifying child nodes of an element, but ignoring others:
<div>
This text node should be returned.
<em>And the value of this element.</em>
And this.
<p>But this paragraph element should be ignored.</p>
</div>
The returned value should be a single string:
This text node should be returned. And the value of this element. And this.
Is this possible in a single XPath expression?
Thanks.

In XPath 2.0 :
string-join(/*/node()[not(self::p)], '')

In XPath 1.0:
You can use
/div//text()[not(parent::p)]
to capture the wanted text nodes. XPath 1.0 has no function that joins an arbitrary node-set into one string, so the concatenation itself cannot be done there. I recommend doing it in the host application.
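For what it's worth, here is a minimal sketch of that host-application step in Python. It uses only the standard library's xml.etree.ElementTree instead of a full XPath engine, so treat it as an illustration rather than the one right way. The helper name text_without is made up for this example, and it skips whole p subtrees, which is slightly broader than not(parent::p).

```python
import xml.etree.ElementTree as ET

XML = """<div>
This text node should be returned.
<em>And the value of this element.</em>
And this.
<p>But this paragraph element should be ignored.</p>
</div>"""

def text_without(element, skip_tags):
    """Concatenate the text under element, skipping subtrees whose tag is in skip_tags."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in skip_tags:
            parts.append(text_without(child, skip_tags))
        # Text that follows a child belongs to the parent, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)

div = ET.fromstring(XML)
# Collapse runs of whitespace so the result is a single line.
result = " ".join(text_without(div, {"p"}).split())
print(result)
```

The whitespace collapsing at the end is a choice made here to get the single-line string from the question; drop it if the original line breaks matter.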

/div//text()
The double slash selects text nodes at any depth under div, regardless of intermediate elements. Note that this also includes the text inside p.

This looks like it works:
Using /div as the context node:
text() | em/text()
Or without a context node:
/div/text() | /div/em/text()
If you want to concat the first two strings, use this:
concat(/div/text(), /div/em/text())

If you want all children except p, you can try the following...
string-join(//*[name() != 'p']/text(), "")
which returns...
This text node should be returned.
And the value of this element.
And this.

I know this comes a bit late, but I figure my answer could still be relevant. I recently ran into a similar problem, and because I use Scrapy in Python 3.6, which does not support XPath 2.0, I could not use the string-join function suggested in several online answers.
I ended up finding a simple workaround (shown below) which I did not see in any of the Stack Overflow answers, which is why I'm sharing it.
temp_selector_list = response.xpath('/div')
string_result = [''.join(x.xpath(".//text()").extract()) for x in temp_selector_list]
Hope this helps!

You could also use a for-each loop in XSLT and assemble the values in a variable like this:
<xsl:variable name="newstring">
<xsl:for-each select="/div//text()">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:variable>

Related

Find xml node whose name is a concatenation of attribute of another node and string constant

I have a bit of a tough xpath query (which I'm not entirely sure can be done).
I have the below xml
<Root>
<PersonOne Name='jon'/>
<PersonTwo Name='bob'/>
<JonDetails>some text</JonDetails>
<BobDetails>some details about Bob</BobDetails>
</Root>
I know it is a bit of a contrived example but the xml structure I am dealing with is fixed and I cannot change it.
Basically I'm trying to figure out the XPath to select the *Details node for the Name attribute in the PersonOne node.
So to do this I need to concat the attribute value of 'Name' in the PersonOne node with the constant Details to get 'JonDetails' as a node name.
I have this so far. It doesn't work, but I think it is along the right lines.
/Root/*[contains(name(), concat(/Root/PersonOne/@Name, 'Details'))]
However, just to add to the fun it has to be a case insensitive match on the node name. I know this can be done with a translate function.
Any pointers in the right direction?
Jon
Would this expression be better?
/Root/*[translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = translate(concat(/Root/PersonOne/@Name, 'details'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')]
It looks for an exact match.
Just figured it out! It's not too pretty but it works.
/Root/*[contains(translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), translate(concat(/Root/PersonOne/@Name, 'details'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'))]
If anyone can improve on this it would be good to see how.
Thanks
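If a host language is available, the case-insensitive comparison can also be done outside XPath. Below is a rough sketch in Python with the standard library's ElementTree, assuming the XML from the question. It lowercases both sides, which is what the translate() calls do for ASCII letters, and it uses an exact match rather than contains().

```python
import xml.etree.ElementTree as ET

XML = """<Root>
<PersonOne Name='jon'/>
<PersonTwo Name='bob'/>
<JonDetails>some text</JonDetails>
<BobDetails>some details about Bob</BobDetails>
</Root>"""

root = ET.fromstring(XML)

# Build the wanted element name from the attribute plus the constant suffix.
wanted = (root.find("PersonOne").get("Name") + "Details").lower()

# Compare element names case-insensitively.
matches = [child for child in root if child.tag.lower() == wanted]
print([(m.tag, m.text) for m in matches])
```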

Select element with a changing Id string using XPath

I have a textarea control with an Id that goes something like this:
<textarea id="NewTextArea~~51887~~1" rows="2"/>
And the xpath that has worked before has been
//textarea[@id, "NewTextArea~~51887~~1"]
But now the '51887' portion of the id has become variable (it changes every time), so I need to select the NewTextArea~~*~~1 element without actually specifying the number. Is there a way I can wildcard part of the string so that it will match a particular pattern? I tried using starts-with and ends-with but couldn't get it to work:
//textarea[starts-with(@id, 'NewTextArea~~') and ends-with(@name, '~~1')]
Bear in mind there are other fields that differ only in the number on the end.
Any advice or guidance would be greatly appreciated :)
I tried using starts-with and ends-with but couldn't get it to work:
//textarea[starts-with(@id, 'NewTextArea~~') and ends-with(@name, '~~1')]
ends-with() is available as a standard function only in XPath 2.0 and you seem to be using XPath 1.0.
Use:
//textarea
[starts-with(@id, 'NewTextArea~~')
and
substring(@id, string-length(@id) - 2) = '~~1'
]
Explanation:
See the answer to this question, for how to implement ends-with() in XPath 1.0:
https://stackoverflow.com/a/405507/36305
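To make the substring arithmetic a bit more concrete, here is a small Python sketch of the same test. The helper name xpath1_ends_with, the form wrapper, and the extra textarea elements are all invented for illustration. For a three-character suffix the start position works out to string-length - 2, matching the expression above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper and sibling textareas, added only for illustration.
XML = """<form>
<textarea id="NewTextArea~~51887~~1" rows="2"/>
<textarea id="NewTextArea~~90210~~1" rows="2"/>
<textarea id="NewTextArea~~51887~~2" rows="2"/>
<textarea id="OtherArea~~51887~~1" rows="2"/>
</form>"""

def xpath1_ends_with(value, suffix):
    """Mimic substring(value, string-length(value) - string-length(suffix) + 1) = suffix."""
    start = len(value) - len(suffix) + 1  # 1-based start position, as in XPath
    return value[max(start, 1) - 1:] == suffix

root = ET.fromstring(XML)
matching_ids = [
    t.get("id")
    for t in root.iter("textarea")
    if t.get("id", "").startswith("NewTextArea~~")
    and xpath1_ends_with(t.get("id", ""), "~~1")
]
print(matching_ids)
```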

Find attribute names that start with a certain pattern

I am looking to find all attributes of an element that match a certain pattern.
So for an element
<element s2="1" name="aaaa" id="1" />
<element s3="1" name="aaaa" id="2" />
I would like to be able to find all attributes that start with 's' (returning the value of s2 for the first element and s3 for the second element).
If this is outside of xpath's ability please let me know.
Use:
element/@*[starts-with(name(), 's')]
This XPath expression selects all attribute nodes whose name starts with the string 's' and that are attributes of elements named element that are children of the current node.
starts-with() is a standard function in XPath 1.0.
element/@*[substring(name(), 1,1) = "s"]
will match any attribute that starts with 's'.
The function starts-with() might look better than using substring()
I've tested the given answers from both @Dimitre-Novatchev and @Ledhund, using the lxml.html module in Python.
Both element/@*[starts-with(name(), 's')] and element/@*[substring(name(), 1,1) = "s"] return only the values of s2 and s3. You won't be able to tell which value belongs to which attribute.
I think in practice I would be more interested in finding the elements themselves that have attributes whose names start with specific characters, rather than just their values.
That is very simple to achieve: just add /.. at the end,
element/@*[starts-with(name(), "s")]/..
or
element/@*[starts-with(name(), "s")]/parent::*
or
element/@*[starts-with(name(), "s")]/parent::node()
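If the goal is to keep attribute names and values together, one option is to filter the attributes in the host language instead. This is only a sketch using Python's standard-library ElementTree (not lxml), and the root wrapper element is an assumption added so the two sample elements form one document.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is hypothetical; the two elements come from the question.
XML = """<root>
<element s2="1" name="aaaa" id="1" />
<element s3="1" name="aaaa" id="2" />
</root>"""

root = ET.fromstring(XML)

# For each element, keep only the attributes whose name starts with 's',
# paired with the element's id so the owner of each value stays known.
s_attributes = [
    (element.get("id"), {k: v for k, v in element.attrib.items() if k.startswith("s")})
    for element in root.iter("element")
]
print(s_attributes)
```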
None of the above worked for me.
So I made some changes and it worked for me. :)
/*:UserCustomField[starts-with(@name, 'purchaseDate')]

In XPath how to select the element content

Is there a way of writing an XPath expression to select the content of the element?
e.g.
<Element>xxx</Element>
Assuming I can write XPath (/Element) to get Element, how do I tweak the XPath to get xxx returned rather than the Element wrapper?
EDIT/ANSWER
To do this in dom4j world use the Element.valueOf(String xpathExpression) rather than the .selectXXX() methods.
Use the value-of element:
<xsl:value-of select="/Some/Path/To/Element"/>
If you can only specify an XPath then use the text function like this:
/Some/Path/To/Element/text()
A bit too late but...
data(Element)
...should also be fine.
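For comparison, when the element is already in hand in a host language, its content is usually a property away. A tiny sketch with Python's standard-library ElementTree, assuming the one-element document from the question:

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<Element>xxx</Element>")

# Text directly inside the element, before any child element.
content = element.text

# All text under the element, including text inside nested children.
all_text = "".join(element.itertext())

print(content, all_text)
```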

XPath concat multiple nodes

I'm not very familiar with xpath. But I was working with xpath expressions and setting them in a database. Actually it's just the BAM tool for biztalk.
Anyway, I have an xml which could look like:
<File>
<Element1>element1</Element1>
<Element2>element2</Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
</Element3>
</File>
I was wondering if there is a way to write an XPath expression that returns all the SubElements concatenated. At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if there is only one SubElement. But apparently my XML sometimes has more nodes, in which case it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][1]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML, I was wrong, it's different, it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need a one-line expression that gives a result like: "subel2a subel2b".
I need it on one line because I set this XPath expression as an XML attribute (not my choice, it's specified). I tried string-join but it's not really working:
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together the way you want. Instead, query for all elements that contain a SubElement (that is, the parent of any SubElement):
//SubElement/parent::*
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.
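As a rough illustration of that last step, here is a Python sketch using the standard library's ElementTree and the edited XML from the question (item, element4, subEl2). It shows both a flat join of the subEl2 values and a per-parent grouping. The space separator is an assumption based on the desired "subel2a subel2b" result.

```python
import xml.etree.ElementTree as ET

XML = """<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>"""

item = ET.fromstring(XML)

# Flat join: every subEl2 under any element4, in document order.
joined = " ".join(el.text or "" for el in item.findall("element4/subEl2"))

# Grouped join: loop through each parent and concatenate its own children.
grouped = [
    " ".join(child.text or "" for child in parent)
    for parent in item.findall("element4")
]

print(joined)
print(grouped)
```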

Resources