XPath expression to select the complete document excluding one element - xpath

I have an XML document
<root>
<a>Foo</a>
<b>Bar</b>
<c>Baz</c>
</root>
and need an XPath 1.0 query to obtain the entire document excluding the <b> element, as follows:
<root>
<a>Foo</a>
<c>Baz</c>
</root>
I have tried *[not(self::b)] but this just gives me the original document, as does *[not(ancestor-or-self::b)].
The queries /root/*[not(self::b)] and /root/*[not(ancestor-or-self::b)] work as expected to exclude the element, but omit the parent root element, which we require.
<a>Foo</a>
<c>Baz</c>
Any suggestions on how to achieve this would be gratefully received.

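XPath can only select nodes that exist in the input; it cannot modify the input tree in any way. Your input does not contain a root element whose only children are a and c, so you cannot select such an element. Any expression that selects root returns it with all of its children, b included, which is presumably why *[not(self::b)] gave you back the original document: evaluated at the document node, * selects root, and root is not a b.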
For that you need XSLT or XQuery.
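Any host language that can edit the tree will do the same job as XSLT or XQuery here. As a minimal sketch in Python's standard library (a swapped-in technique, not part of the original answer; the helper name `without_children` is hypothetical), you can parse the document, drop the unwanted child and serialize what is left:

```python
import xml.etree.ElementTree as ET

# Same structure as the document in the question, written without whitespace
# so the serialized result is easy to compare.
SOURCE = "<root><a>Foo</a><b>Bar</b><c>Baz</c></root>"

def without_children(xml_text, tag):
    """Return xml_text re-serialized with every direct child named `tag` removed.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing elements while looping is safe.
    for child in root.findall(tag):
        root.remove(child)
    return ET.tostring(root, encoding="unicode")

result = without_children(SOURCE, "b")
print(result)  # <root><a>Foo</a><c>Baz</c></root>
```

The point is the same as in the answer: the new root element is built by a transformation step, not selected by the XPath query.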

Related

How to select a node that is inside a sibling of a parent node using xpath expression?

I'm trying to select a node based on the known text inside a sibling of a parent node. To be clearer my HTML has the following structure:
<k>
<l>Known</l>
</k>
<k>
<l>Desired</l>
</k>
My attempt:
//k//following-sibling::*[text()="Known"]
Returns:
Known
Why?
It's because // is short for /descendant-or-self::node()/, so your expression takes every k and every node under it, and selects any following sibling of those nodes that has the text Known.
(You're actually matching the l because it's a following sibling of the whitespace text node before it. If you remove the whitespace (including line breaks), your XPath probably won't return anything.)
Try selecting the first following sibling k...
//k[l='Known']/following-sibling::k[1]/l
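To see what that expression walks through, here is a rough emulation in plain Python. The standard library's ElementTree has no following-sibling axis, so the sibling step is written by hand. The `<doc>` wrapper and the helper names are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# <doc> is a hypothetical wrapper element; the fragment in the question has no single root.
DOC = "<doc>\n<k>\n<l>Known</l>\n</k>\n<k>\n<l>Desired</l>\n</k>\n</doc>"

def string_value(el):
    """XPath string-value of an element: all descendant text concatenated."""
    return "".join(el.itertext())

def l_after_known(root, known):
    """Emulate //k[l=known]/following-sibling::k[1]/l (hypothetical helper)."""
    found = []
    for parent in root.iter():
        siblings = list(parent)
        for i, k in enumerate(siblings):
            if k.tag != "k":
                continue
            # Predicate [l=known]: some l child whose string-value equals known.
            if not any(string_value(l) == known for l in k.findall("l")):
                continue
            # following-sibling::k[1]: the nearest later sibling named k.
            next_k = next((s for s in siblings[i + 1:] if s.tag == "k"), None)
            if next_k is not None:
                found.extend(next_k.findall("l"))
    return found

hits = [string_value(l) for l in l_after_known(ET.fromstring(DOC), "Known")]
print(hits)  # ['Desired']
```

Note that the whitespace between elements plays no role here, unlike in the original attempt, because the match is anchored on the k elements themselves.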

Using XPath how do I select a node() at a specific position that is also text()?

I want to select the previous node only if it is a text node (and contains only whitespace). I have an XPath like so: path/preceding-sibling::node()[1][normalize-space()='']. This works great but matches both text and element nodes (if the nodes contain only whitespace). Using path/preceding-sibling::text()[1][normalize-space()=''] will select the first preceding node that is a text node which is definitely not what I want if there are any elements in between.
How can I combine the two tests?
You can use self::text() to test whether the current node is a text node, like so:
path/preceding-sibling::node()[1][self::text() and normalize-space()='']
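The combined test can be mirrored outside XPath as well. The sketch below uses Python's `xml.dom.minidom`, which keeps whitespace-only text nodes, and checks the immediately preceding sibling in the same two steps. The sample document and the helper name `preceding_blank_text` are made up for illustration.

```python
from xml.dom import minidom, Node

def preceding_blank_text(node):
    """Emulate preceding-sibling::node()[1][self::text() and normalize-space()=''].

    Hypothetical helper: returns the previous sibling only if it is a text
    node made up entirely of XML whitespace, otherwise None.
    """
    prev = node.previousSibling
    if (prev is not None
            and prev.nodeType == Node.TEXT_NODE
            and prev.data.strip(" \t\r\n") == ""):
        return prev
    return None

# First <p/> follows a whitespace text node; second <p/> follows an element
# that merely contains whitespace, which the self::text() test must reject.
doc = minidom.parseString("<r><a/> <p/><e> </e><p/></r>")
first_p, second_p = doc.getElementsByTagName("p")

print(preceding_blank_text(first_p) is not None)  # True
print(preceding_blank_text(second_p) is None)     # True
```

The second case is the one the original expression got wrong: without the self::text() test, the whitespace-only element e would also match.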

A `node()` is a tag?

Perhaps my problem (and that of many others) is only a misconception about what I see in XML code as a tag: is it what XPath detects with node()? Or are text and attributes detected as well?
When can ., *, @*, text() and node() be used for "tag detection"?
PS: my guess is that only * and node() can detect tags (and . is like a * for "children of this tag")... But I think I am wrong.
Close; what you call a tag is an element in xml parlance, and an element is a type of node, as are attributes, text, comments, etc.
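In terms of XPath expressions, node() matches nodes of any type, whilst * matches only element nodes, and @* gives you the attributes. Note that attributes are not children of their element, so a child step such as node() never returns them; you need @* (the attribute axis) for that.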
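The DOM makes the same distinction, so a small Python `xml.dom.minidom` sketch can show it. The sample element is invented for illustration, and the mapping to XPath node tests in the comments is approximate.

```python
from xml.dom import minidom

# Invented sample: one attribute, then a text node, a comment and an element as children.
doc = minidom.parseString('<item id="7">text<!-- note --><child/></item>')
item = doc.documentElement

# Roughly node(): every child, whatever its type.
child_kinds = [type(n).__name__ for n in item.childNodes]

# Roughly *: element children only.
element_names = [n.tagName for n in item.childNodes
                 if n.nodeType == n.ELEMENT_NODE]

# Roughly @*: attributes live in a separate map, not among the children.
attribute_names = list(item.attributes.keys())

print(child_kinds)      # ['Text', 'Comment', 'Element']
print(element_names)    # ['child']
print(attribute_names)  # ['id']
```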

Parsing XML tags with small difference in names

I have an XML file to parse in which the element tags are of the form:
<mensa-1>
..
</mensa-1>
<mensa-2>
..
</mensa-2>
Is it possible to parse such elements via Xpath when the element names differ via a number at the end?
The following XPath expression returns all the elements whose names start with "mensa-":
//*[starts-with(name(),'mensa-')]
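If you are parsing with Python's standard library, whose ElementTree does not support starts-with(), the same filter can be written directly. This is a sketch that assumes the document uses no namespaces; the `<menu>` wrapper and the `<other>` element are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the mensa-N names come from the question.
DOC = "<menu><mensa-1>a</mensa-1><mensa-2>b</mensa-2><other>c</other></menu>"
root = ET.fromstring(DOC)

# root.iter() visits the root and all descendant elements, like //* does.
matches = [el for el in root.iter() if el.tag.startswith("mensa-")]
names = [el.tag for el in matches]

print(names)  # ['mensa-1', 'mensa-2']
```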

What is the difference between normalize-space(.) and normalize-space(text())?

I was writing an XPath expression, and I had a strange error which I fixed, but what is the difference between the following two XPath expressions?
"//td[starts-with(normalize-space(),'Posted Date:')]"
and
"//td[starts-with(normalize-space(text()),'Posted Date:')]"
Mainly, what will the first XPath expression catch? Because I was getting a lot of strange results. So what does text() do in the matching? Also, is there a difference between normalize-space() and normalize-space(.)?
Well, the real question is: what's the difference between . and text()?
. is the current node. When you use it where a string is expected (i.e. as the argument of normalize-space()), the engine automatically converts the node to its string-value, which for an element is the concatenation of all the text nodes within the element, at any depth. (I'm guessing the question is really about elements.) Calling normalize-space() with no argument defaults to the context node, so normalize-space() and normalize-space(.) are equivalent.
text() on the other hand only selects text nodes that are the direct children of the current node.
So for example given the XML:
<a>Foo
<b>Bar</b>
lish
</a>
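and assuming <a> is your current node, normalize-space(.) will return Foo Bar lish, but normalize-space(text()) will not do what you might expect, because text() returns a node-set of two text nodes (Foo and lish). In XPath 1.0 a node-set is converted to a string by taking the string-value of its first node in document order, so the result is just Foo; in XPath 2.0 and later, passing more than one node to normalize-space() raises a type error.
To cut a long story short, if you want to normalize all the text within an element, use . (a single dot). If you want to select a specific text node, use text(), but always remember that despite its name, text() returns a node-set: XPath 1.0 silently uses only its first node when a string is required, and XPath 2.0 and later accept it only if it contains a single node.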
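The difference between the element's string-value and its individual text node children can be checked with a few lines of Python. This is only a sketch: `normalize_space` is a hand-written stand-in for the XPath function, and ElementTree exposes the two text nodes as `a.text` and the `tail` of `<b>`.

```python
import re
import xml.etree.ElementTree as ET

a = ET.fromstring("<a>Foo\n<b>Bar</b>\nlish\n</a>")

def normalize_space(s):
    """Stand-in for XPath normalize-space(): trim, then collapse XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(" \t\r\n"))

# String-value of <a>: every descendant text node concatenated (what . gives).
whole = normalize_space("".join(a.itertext()))

# The two text node children of <a> (what text() selects), each normalized.
b = a.find("b")
text_children = [normalize_space(a.text), normalize_space(b.tail)]

print(whole)          # Foo Bar lish
print(text_children)  # ['Foo', 'lish']
```

Bar appears only in the string-value, since it belongs to a text node inside b rather than to a direct child of a.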
