Limiting an XPath predicate: predicate starting with - xpath

I am navigating this Office Open XML file using XPath 1.0 (extract):
<sheetData ref="A1:XFD108">
<row spans="1:3" r="1">
<c t="s" r="A1">
<is>
<t>FirstCell</t>
</is>
</c>
<c t="s" r="C1">
<is>
<t>SecondCell</t>
</is>
</c>
</row>
<row spans="1:3" r="2">
<c t="s" r="A2">
<is>
<t>ThirdCell</t>
</is>
</c>
<c t="s" r="C2">
<is>
<t>[persons.ID]</t>
</is>
</c>
</row>
</sheetData>
I need to find the cell that says "[persons.ID]", which is a variable. Technically, I need to find the first <row> containing a descendant::t that starts with [ and closes with ]. I currently have:
.//row//t[starts-with(text(), '[') and
substring(text(), string-length(text())) = ']']/ancestor::row
So I filter and then go up again. It works, but I'd like to understand XPath better here: I found no way to filter inside the predicate. Can you point me to a valid equivalent of something like .//row[descendant::t[starts-with()...]]?
Any help is greatly appreciated.

Technically, I need to find the first <row> containing a descendant::t that starts with [ and closes with ].
/sheetData/row[c/is/t[starts-with(.,'[')]
[substring(.,string-length(.))=']']]
[1]
or
/sheetData/row[.//t[starts-with(.,'[') and
substring(.,string-length(.))=']']][1]
or
(//row[.//t[starts-with(.,'[') and
substring(.,string-length(.))=']']])[1]
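To check the logic of these expressions outside an XPath 1.0 engine, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree module, whose XPath subset has no starts-with() or substring(), so the bracket test is written in plain Python. The helper names is_variable and first_variable_row are hypothetical, and the sample is the un-namespaced extract from the question.

```python
import xml.etree.ElementTree as ET

XML = """<sheetData ref="A1:XFD108">
  <row spans="1:3" r="1">
    <c t="s" r="A1"><is><t>FirstCell</t></is></c>
    <c t="s" r="C1"><is><t>SecondCell</t></is></c>
  </row>
  <row spans="1:3" r="2">
    <c t="s" r="A2"><is><t>ThirdCell</t></is></c>
    <c t="s" r="C2"><is><t>[persons.ID]</t></is></c>
  </row>
</sheetData>"""

def is_variable(text):
    # Same test as: starts-with(., '[') and substring(., string-length(.)) = ']'
    return bool(text) and text.startswith("[") and text.endswith("]")

def first_variable_row(root):
    # Mirrors (//row[.//t[...]])[1]: the first row, in document order,
    # that has at least one matching <t> descendant.
    for row in root.iter("row"):
        if any(is_variable(t.text) for t in row.iter("t")):
            return row
    return None

root = ET.fromstring(XML)
row = first_variable_row(root)
print(row.get("r"))
```

The point carried over from the XPath versions is that both conditions are tested on the same t element, which is what the nested predicate guarantees.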

One option:
.//row[starts-with(descendant::t/text(),'[') and substring(descendant::t/text(), string-length(descendant::t/text())) = ']' ]
Be careful with this one, though. In XPath 1.0 a string function that receives a node-set uses only the first node in document order, so every descendant::t/text() here refers to the first t of the row. The row is selected only if that first t is the bracketed one; in the sample above the first t of row 2 is ThirdCell, so that row is not matched.
Obviously, what you have doesn't have this problem.
Another option: use translate
.//row[translate(descendant::t/text(),"0123456789","") = "[]"]
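That strips the numeric characters and then does a simple comparison to the [] characters, so it only fits values that are purely numeric between the brackets, such as [123]; it would not match [persons.ID].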

Related

How do I choose a chunk of tags that are in between two tags?

(and including the ending tag)
For example:
<xml>
<a></a>
<a><b></b></a>
<a></a>
<a></a>
<a><c></c></a>
<a></a>
<a><b></b></a>
<a><b></b></a>
<a></a>
<a></a>
<a><b></b></a>
</xml>
I need the three <a> elements that come after the one that includes <b>, ending with the one that includes <c>.
Or rather, "start from the one with <c> and select back until you see one with <b> or the start of the document" would be even better, because there can be cases with no 'start' <b> marker.
I need it to write an element-blocking rule for the uBlock Origin Chrome extension.
I think this should do the trick:
//a[c][1] |
//a[c][1]/preceding-sibling::a
[
not(
b or following-sibling::a[b]/following-sibling::a/c
)
]
Explanation:
the first a that contains a c, and also...
the a elements that precede that a, so long as they don't themselves either:
contain a b or
have a following a that contains a b and which is followed by another a that contains a c
I came up with this:
//a[not(b)][c | following-sibling::*[./*][1][./c]]
It takes every a without a b child that either contains c (the ending) or whose next following sibling that has any child contains c.
Or:
//a[c or not(*) and following-sibling::*[*][1][./c]]
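The "start from the one with <c> and walk back" idea from the question can also be spelled out procedurally. The following is only a sketch in Python with the standard-library xml.etree.ElementTree module, not an XPath expression usable in a blocking rule; the function name chunk_ending_at_first_c is hypothetical and it returns 0-based positions among the <a> siblings.

```python
import xml.etree.ElementTree as ET

XML = """<xml>
<a></a>
<a><b></b></a>
<a></a>
<a></a>
<a><c></c></a>
<a></a>
<a><b></b></a>
<a><b></b></a>
<a></a>
<a></a>
<a><b></b></a>
</xml>"""

def chunk_ending_at_first_c(root):
    items = root.findall("a")
    # Position of the first <a> that has a <c> child (the end marker).
    end = next((i for i, a in enumerate(items) if a.find("c") is not None), None)
    if end is None:
        return []
    start = end
    # Walk backwards until an <a> with a <b> child, or the first sibling.
    while start > 0 and items[start - 1].find("b") is None:
        start -= 1
    return list(range(start, end + 1))

selected = chunk_ending_at_first_c(ET.fromstring(XML))
print(selected)
```

With the sample document this picks the two empty <a> elements after the first <b> marker plus the <a> containing <c>.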

xpath: select node closest to root

I need to select the nodes with a specific name that are closest to the root (not necessarily directly under it).
Example:
<root>
<a>
<b id="1"></b>
<b id="2">
<b id="3"></b>
</b>
<c>
<b id="4"></b>
</c>
</a>
</root>
It should select b#1, b#2 and b#4, but not b#3, because it is nested inside another b node.
Currently I'm doing this: select all b, then check whether any of its ancestors is a b and, if so, discard it. But that is hardcoded; maybe XPath can solve it alone?
I found the solution, just using not + ancestor, like:
//table[not(ancestor::table)]
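The same rule (keep a node only if none of its ancestors has the same name) can be sketched as a tree walk. This is a minimal Python illustration using the standard-library xml.etree.ElementTree module, which has no ancestor axis, so the walk simply stops descending once it meets a matching element. The function name outermost is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <a>
    <b id="1"></b>
    <b id="2">
      <b id="3"></b>
    </b>
    <c>
      <b id="4"></b>
    </c>
  </a>
</root>"""

def outermost(elem, tag):
    # Collect descendants named `tag` that have no ancestor named `tag`
    # below `elem`: once a match is found, do not look inside it.
    found = []
    for child in elem:
        if child.tag == tag:
            found.append(child)
        else:
            found.extend(outermost(child, tag))
    return found

root = ET.fromstring(XML)
ids = [b.get("id") for b in outermost(root, "b")]
print(ids)
```

For the example document this keeps b#1, b#2 and b#4 and skips b#3, which is the result //b[not(ancestor::b)] is meant to produce.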
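I would try the expression below:
//b[not(.//ancestor::b)]
It selects:
<b id="1"/>
<b id="4"/>
Note that b#2 is missing from this result: .//ancestor::b also looks at the ancestors of the node's descendants, and b#3 has b#2 as an ancestor. To keep b#2, use //b[not(ancestor::b)].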

xpath with node(), how to express `node()[.//x]` condition?

I have an XPath that must match text and tags, except the tag <aa>; so,
./node()[name()!='aa']
is the correct XPath.
But it is insufficient for cases where an aa tag is nested inside the node; I need something like
./node()[name()!='aa' and not(.//aa)]
but this XPath does not work (!).
NOTE
I used
./*[not(self::aa or .//aa)] | ./text()
but it lost the original sequence order of the nodes. This problem is more evident when working with XSLT, for example:
<xsl:for-each select="./*[not(self::aa or .//aa)] | ./text()">
<xsl:copy-of select="."/>
</xsl:for-each>
does not work as expected (the order of nodes is not ensured). When using ./node() the order is always correct.
PS: with XSLT we have a solution that combines the XPaths explained above,
<xsl:for-each select="./node()[name()!='aa']">
<xsl:if test="not(.//aa)"><xsl:copy-of select="."/></xsl:if>
</xsl:for-each>
but the ideal/simplest one does not give the same result (when processing big and complex inputs):
<xsl:copy-of select="*[not(self::aa or .//aa)] | ./text()"/>
I'm imagining your file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<aa/>
<b>
<aa/>
</b>
<c>
<b>
<aa/>
</b>
</c>
<d/>
<e>
<b/>
</e>
</root>
Then the expression
//node()[not(descendant-or-self::aa)]
returns all nodes (including the whitespace text nodes) that neither are an <aa> element themselves nor have an <aa> descendant. Children of <aa> are matched as well.
You'll probably want to do something like
<xsl:copy-of select="node()[not(descendant-or-self::aa)]"/>

Ruby: How to find XML child element from specific element using xpath?

Given this XML:
<a>
<b key="1">
<c value="xxx" />
</b>
<b key="2">
<c value="yyy" />
</b>
</a>
Goal: get each "b" first, then get the "c" under that "b", using XPath to search the children, like the result below.
for <b key="1">
<c value="xxx" />
for <b key="2">
<c value="yyy" />
but the code below
b_elements = XPath.match(xml, "//b[@key]")
b_elements.each do |b_element|
puts b_element.elements["//c"]
end
will result in yielding
for <b key="1">
<c value="xxx" />
<c value="yyy" />
for <b key="2">
<c value="xxx" />
<c value="yyy" />
instead of just getting the "c" under each "b".
I tried the methods below with no luck; it seems that when using XPath it automatically searches from the root element:
b.get_elements("//c")
XPath.first(b, "//c")
My workaround for now is to traverse the child elements one layer at a time and search for the desired key, which seems clumsy compared to using XPath.
Please advise, thanks : )
Reference:
http://ruby-doc.org/stdlib-1.9.3/libdoc/rexml/rdoc/REXML/Element.html#method-i-each_element_with_attribute
I'm not sure, but my assumption is that XPath looks at the first character, sees that it is a /, and treats the path as absolute (a path starting with / is absolute, so // searches from the document root).
You can probably force the path to be relative by putting a . before //, so that the search starts at the current element.
That is, instead of "//c" use ".//c". Hope this helps.
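The leading-dot rule is not specific to REXML. As an illustration only, here is the same pattern in Python with the standard-library xml.etree.ElementTree module (a different library from the one in the question, used so that the sketch is self-contained): each inner search starts with . and is therefore evaluated relative to the current b element.

```python
import xml.etree.ElementTree as ET

XML = """<a>
  <b key="1"><c value="xxx"/></b>
  <b key="2"><c value="yyy"/></b>
</a>"""

root = ET.fromstring(XML)

pairs = []
for b in root.findall(".//b[@key]"):
    # ".//c" searches only below this particular <b>, not the whole document.
    c = b.find(".//c")
    pairs.append((b.get("key"), c.get("value")))

print(pairs)
```

Each b is paired with its own c only, which is the result the question asks for.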

XPath - Get node with no child of specific type

XML: /A/B or /A
I want to get all A nodes that do not have any B children.
I've tried
/A[not(B)]
/A[not(exists(B))]
without success
I'd prefer a solution with the syntax /*[local-name()="A" and .... ], if possible. Any ideas that work?
Clarification. The xml looks like:
<WhatEver>
<A>
<B></B>
</A>
</WhatEver>
or
<WhatEver>
<A></A>
</WhatEver>
Maybe
*[local-name() = 'A' and not(descendant::*[local-name() = 'B'])]?
Also, there should be only one root element, so for /A[...] you're either getting all your XML back or none. Maybe //A[not(B)] or /*/A[not(B)]?
I don't really understand why /A[not(B)] doesn't work for you.
~/xml% xmllint ab.xml
<?xml version="1.0"?>
<root>
<A id="1">
<B/>
</A>
<A id="2">
</A>
<A id="3">
<B/>
<B/>
</A>
<A id="4"/>
</root>
~/xml% xpath ab.xml '/root/A[not(B)]'
Found 2 nodes:
-- NODE --
<A id="2">
</A>
-- NODE --
<A id="4" />
Try this "/A[not(.//B)]" or this "/A[not(./B)]".
The first / causes XPath to start at the root of the document, I doubt that is what you intended.
Perhaps you meant //A[not(B)] which would find all A nodes in the document at any level that do not have a direct B child.
Or perhaps you are already at a node that contains A nodes in which case you just want A[not(B)] as the XPath.
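To make the meaning of not(B) concrete, here is a small Python sketch using the standard-library xml.etree.ElementTree module. Its XPath subset does not include not(), so the "has no B child" test is written in Python. The document is the ab.xml sample shown in the transcript above.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <A id="1">
    <B/>
  </A>
  <A id="2">
  </A>
  <A id="3">
    <B/>
    <B/>
  </A>
  <A id="4"/>
</root>"""

root = ET.fromstring(XML)

# Equivalent in spirit to //A[not(B)]: every A, at any depth,
# that has no direct child element named B.
without_b = [a.get("id") for a in root.iter("A") if a.find("B") is None]

print(without_b)
```

Note that a.find("B") only looks at direct children, like B in the predicate; checking any depth, as in not(.//B), would use a.find(".//B") instead.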
If you are trying to get A anywhere in the hierarchy from the root, this works (in XSLT 1.0 as well as 2.0, in case it is used in XSLT):
//descendant-or-self::node()[local-name(.) = 'A' and not(count(B))]
Or you can also do
//descendant-or-self::node()[local-name(.) = 'A' and not(B)]
Or also
//descendant-or-self::node()[local-name(.) = 'A' and not(child::B)]
There are many ways to achieve the same thing in XSLT.
Note: XPath is case-sensitive, so if your real node names are different (I'm sure no one is actually going to use A, B), please make sure the case matches.
Use this:
/*[local-name()='A' and not(descendant::*[local-name()='B'])]
