In XPath how to select the element content - xpath

Is there a way of writing an XPath expression to select the content of an element?
e.g.
<Element>xxx</Element>
Assuming I can write XPath (/Element) to get Element, how do I tweak the XPath to get xxx returned rather than the Element wrapper?
EDIT/ANSWER
To do this in the dom4j world, use Element.valueOf(String xpathExpression) rather than the .selectXXX() methods.

Use the value-of element:
<xsl:value-of select="/Some/Path/To/Element"/>
If you can only specify an XPath then use the text function like this:
/Some/Path/To/Element/text()

A bit too late but...
data(Element)
...should also be fine.
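The difference between selecting the wrapper and selecting its content can be sketched in Python. This is only an illustration with the standard library's xml.etree.ElementTree, which does not evaluate text() or data() itself, so the element's text is read through the API instead. The Root wrapper element is a hypothetical addition so that there is something to search from.

```python
import xml.etree.ElementTree as ET

# <Root> is a hypothetical wrapper added for this sketch.
root = ET.fromstring("<Root><Element>xxx</Element></Root>")

# Selecting the element returns the wrapper node...
wrapper = root.find("Element")

# ...while findtext() returns its text content, which is roughly what
# /Root/Element/text() or string(/Root/Element) would give in XPath.
content = root.findtext("Element")

print(wrapper.tag)  # Element
print(content)      # xxx
```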

Related

XPath - Get Attribute With Certain Value

My XPath expression appears to be slightly wrong. Here is a snippet of my XML..
<wd:Repository_Document_Reference wd:Descriptor="EIB_Input.zip">
<wd:ID wd:type="WID">VALUE 1</wd:ID>
<wd:ID wd:type="Document_ID">VALUE 2</wd:ID>
</wd:Repository_Document_Reference>
I am looking to extract 'VALUE 2' as a single output.
The current XPath I am using is not working:
/wd:Repository_Document_Reference/wd:ID[@wd:type='Document_ID']
Does my XPath need a slight tweak?
Thanks
Your XPath selects
<wd:ID wd:type="Document_ID">VALUE 2</wd:ID>
from the XML you've shown. In a context where this is evaluated as a string, it will indeed be
VALUE 2
If you wish to force it to be evaluated as a string, you can explicitly take the string value of your XPath:
string(/wd:Repository_Document_Reference/wd:ID[@wd:type='Document_ID'])
However, there's a chance that the rest of your document, which you've not shown, is causing other complications. Your XPath might be selecting multiple elements or no elements. You have to make sure that your XPath is specific enough to select only the element you want. You also have to make sure that you've properly defined the namespace prefix, wd. Without knowing more about your actual example, we can't say.
Try:
/wd:Repository_Document_Reference/wd:ID[contains(@wd:type,"Document_ID")]
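The point about binding the wd prefix can be illustrated with a short Python sketch using the standard library's xml.etree.ElementTree. The namespace URI below is a made-up placeholder, since the question does not show the real one; whatever URI the actual document declares for wd has to be used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the question does not show the real declaration.
WD_NS = "urn:example:wd"

xml_doc = f"""<wd:Repository_Document_Reference xmlns:wd="{WD_NS}" wd:Descriptor="EIB_Input.zip">
  <wd:ID wd:type="WID">VALUE 1</wd:ID>
  <wd:ID wd:type="Document_ID">VALUE 2</wd:ID>
</wd:Repository_Document_Reference>"""

root = ET.fromstring(xml_doc)

# The prefix map is what makes "wd:" meaningful inside the expression.
ns = {"wd": WD_NS}

# The path is relative to the root element, which is the
# Repository_Document_Reference element itself.
matches = root.findall("wd:ID[@wd:type='Document_ID']", ns)
values = [m.text for m in matches]

print(values)  # ['VALUE 2']
```

If the list comes back empty or with more than one entry against the real document, that points at a namespace mismatch or at an expression that is not specific enough.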

Select element with a changing Id string using XPath

I have a textarea control with an Id that goes something like this:
<textarea id="NewTextArea~~51887~~1" rows="2"/>
And the xpath that has worked before has been
//textarea[@id="NewTextArea~~51887~~1"]
But now the '51887' portion of the id has become variable (it changes every time), so I need to select the NewTextArea~~*~~1 element without actually specifying the number. Is there a way I can wildcard part of the string so that it will match a particular pattern? I tried using starts-with and ends-with but couldn't get it to work:
//textarea[starts-with(@id, 'NewTextArea~~') and ends-with(@name, '~~1')]
Bear in mind there are other fields, with the only difference being the number on the end.
Any advice or guidance would be greatly appreciated :)
I tried using starts-with and ends-with but couldn't get it to work:
//textarea[starts-with(@id, 'NewTextArea~~') and ends-with(@name, '~~1')]
ends-with() is available as a standard function only in XPath 2.0 and you seem to be using XPath 1.0.
Use:
//textarea
[starts-with(@id, 'NewTextArea~~')
and
substring(@id, string-length(@id) - 2) = '~~1'
]
Explanation:
See the answer to this question, for how to implement ends-with() in XPath 1.0:
https://stackoverflow.com/a/405507/36305
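The substring trick generalises to any suffix: start at string-length(value) - string-length(suffix) + 1 and compare the remainder with the suffix. The following Python sketch mirrors that logic on the host side; the surrounding form element, the two extra textarea elements, and the ends_with helper are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; only the first textarea comes from the question.
page = ET.fromstring("""<form>
  <textarea id="NewTextArea~~51887~~1" rows="2"/>
  <textarea id="NewTextArea~~51887~~2" rows="2"/>
  <textarea id="OtherArea~~51887~~1" rows="2"/>
</form>""")


def ends_with(value, suffix):
    """Mirror substring(v, string-length(v) - string-length(s) + 1) = s."""
    start = len(value) - len(suffix)
    return start >= 0 and value[start:] == suffix


matches = [
    t.get("id")
    for t in page.iter("textarea")
    if t.get("id", "").startswith("NewTextArea~~")
    and ends_with(t.get("id", ""), "~~1")
]

print(matches)  # ['NewTextArea~~51887~~1']
```

For the suffix '~~1' the start offset works out to string-length(@id) - 2 in XPath's 1-based indexing, which is the value used in the expression above.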

Nokogiri 'not' selector

Is there a way in Nokogiri to select all elements that don't match a selector? In jQuery I'd use:
:not(*[@class='someclass'])
However, the following code gives me an XPath syntax error:
dom = Nokogiri::HTML(@file)
dom.css(":not(*[@class='someclass'])")
In CSS3, :not() takes a selector like any other, so it would be:
dom.css(":not(.someclass)")
(untested, but the selector is right)
In addition to ton's answer, if you want to combine two classes, it would look like this:
.local:not(.hide)
I'm not sure about the syntax you are using, but this is basically the XPath selector you want:
dom.xpath("//wherever/*[not(@class='someclass')]")
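To show what the not(@class='someclass') predicate keeps and drops, here is a rough Python equivalent using the standard library's xml.etree.ElementTree. The sample markup is invented for the sketch, and the filtering is done in Python rather than in the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; "wherever" stands in for the real parent element.
doc = ET.fromstring("""<wherever>
  <div class="someclass">a</div>
  <div class="other">b</div>
  <div>c</div>
</wherever>""")

# Same effect as //wherever/*[not(@class='someclass')]: children are kept
# when the class attribute is missing or differs from 'someclass'.
kept = [child.text for child in doc.findall("*") if child.get("class") != "someclass"]

print(kept)  # ['b', 'c']
```

Note that this comparison, like the XPath predicate, tests the whole attribute value. The CSS form :not(.someclass) works on individual class tokens, so an element with class="someclass extra" is excluded by the CSS selector but kept by the exact-match predicate.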

XPath concat multiple nodes

I'm not very familiar with XPath, but I am working with XPath expressions and storing them in a database. Specifically, it's the BAM tool for BizTalk.
Anyway, I have an xml which could look like:
<File>
<Element1>element1</Element1>
<Element2>element2</Element2>
<Element3>
<SubElement>sub1</SubElement>
<SubElement>sub2</SubElement>
<SubElement>sub3</SubElement>
</Element3>
</File>
I was wondering if there is a way to use an XPath expression to get all the SubElements concatenated. At the moment, I am using:
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement']
This works if it only has one index. But apparently my xml sometimes has more nodes, so it gives NULL. I could just use
/*[local-name()='File']/*[local-name()='Element3']/*[local-name()='SubElement'][0]
but I need all the nodes. Is there a way to do this?
Thanks a lot!
Edit: I changed the XML, I was wrong, it's different, it should look like this:
<item>
<element1>el1</element1>
<element2>el2</element2>
<element3>el3</element3>
<element4>
<subEl1>subel1a</subEl1>
<subEl2>subel2a</subEl2>
</element4>
<element4>
<subEl1>subel1b</subEl1>
<subEl2>subel2b</subEl2>
</element4>
</item>
And I need to have a one line code to get a result like: "subel2a subel2b";
I need the one line because I set this xpath expression as an xml attribute (not my choice, it's specified). I tried string-join but it's not really working.
string-join(/file/Element3/SubElement, ',')
/File/Element3/SubElement will match all of the SubElement elements in your sample XML. What are you using to evaluate it?
If your evaluation method is subject to the "first node rule", then it will only match the first one. If you are using a method that returns a nodeset, then it will return all of them.
You can get all SubElements by using:
//SubElement
But this won't keep them grouped together how you want. You will want to do a query for all elements that contain a SubElement (basically do a search for the parent of any SubElements).
//SubElement/parent::*
Once you have that, you could (depending on your programming language) loop through the parents and concatenate the SubElements.
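As a sketch of that host-language approach, the following Python uses the standard library's xml.etree.ElementTree against the edited XML from the question. It selects every subEl2 under each element4 and joins the values with a space; it is an illustration only, not something the BAM tool itself can run.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring("""<item>
  <element1>el1</element1>
  <element2>el2</element2>
  <element3>el3</element3>
  <element4>
    <subEl1>subel1a</subEl1>
    <subEl2>subel2a</subEl2>
  </element4>
  <element4>
    <subEl1>subel1b</subEl1>
    <subEl2>subel2b</subEl2>
  </element4>
</item>""")

# findall() returns every match in document order, not only the first node.
sub_elements = item.findall("element4/subEl2")

# The concatenation happens in the host language.
joined = " ".join(e.text or "" for e in sub_elements)

print(joined)  # subel2a subel2b
```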

XPath to return string concatenation of qualifying child node values

Can anyone please suggest an XPath expression format that returns a string value containing the concatenated values of certain qualifying child nodes of an element, but ignoring others:
<div>
This text node should be returned.
<em>And the value of this element.</em>
And this.
<p>But this paragraph element should be ignored.</p>
</div>
The returned value should be a single string:
This text node should be returned. And the value of this element. And this.
Is this possible in a single XPath expression?
Thanks.
In XPath 2.0 :
string-join(/*/node()[not(self::p)], '')
In XPath 1.0:
You can use
/div//text()[not(parent::p)]
to capture the wanted text nodes. The concatenation itself cannot be done in XPath 1.0, I recommend doing it in the host application.
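A host-side concatenation might look like the following Python sketch, which uses only the standard library's xml.etree.ElementTree. That module does not evaluate text() node tests, so the sketch walks the tree instead; the helper name text_without and the whitespace normalisation at the end are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("""<div>
This text node should be returned.
<em>And the value of this element.</em>
And this.
<p>But this paragraph element should be ignored.</p>
</div>""")


def text_without(elem, skip_tag):
    """Concatenate all descendant text of elem, skipping skip_tag subtrees."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip_tag:
            parts.append(text_without(child, skip_tag))
        # The tail is text that follows the child but belongs to the parent,
        # so it is kept even when the child itself is skipped.
        parts.append(child.tail or "")
    return "".join(parts)


# Collapse runs of whitespace (including newlines) into single spaces.
result = " ".join(text_without(div, "p").split())

print(result)
# This text node should be returned. And the value of this element. And this.
```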
/div//text()
The double slash extracts text nodes regardless of intermediate nodes.
This looks like it works:
Using as context /div/:
text() | em/text()
Or without the use of context:
/div/text() | /div/em/text()
If you want to concat the first two strings, use this:
concat(/div/text(), /div/em/text())
If you want all children except p, you can try the following...
string-join(//*[name() != 'p']/text(), "")
which returns...
This text node should be returned.
And the value of this element.
And this.
I know this comes a bit late, but I figure my answer could still be relevant. I recently ran into a similar problem. Because I use Scrapy in Python 3.6, which does not support XPath 2.0, I could not use the string-join function suggested in several online answers.
I ended up finding a simple workaround (shown below) which I did not see in any of the Stack Overflow answers, which is why I'm sharing it.
temp_selector_list = response.xpath('/div')
string_result = [''.join(x.xpath(".//text()").extract()) for x in temp_selector_list]
Hope this helps!
You could use a for-each loop as well and assemble the values in a variable like this
<xsl:variable name="newstring">
<xsl:for-each select="/div//text()">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:variable>

Resources