Xpath-filtering items - xpath

I have a short question. How can I display only the elements whose value is '.'?
I have no idea how to do that. I'm a newbie in XPath.
<SalesTransaction>
<TransactionHeader>
<TransactionHeaderFields>
<WrntyID>a</WrntyID>
<ExternalID/>
<Type>.</Type>
<Status>
Submited
</Status>
<CreationDate>
2015-01-12
</CreationDate>
<Date>
2015-01-12T11:41:29Z
</Date>
<DeliveryDate>
2015-01-12
</DeliveryDate>
<Remark/>
</TransactionHeaderFields>
<CatalogFields>
<CatalogID>
saf
</CatalogID>
</CatalogFields>
</TransactionHeader>
</SalesTransaction>

Ignoring the structure and just looking for any element whose text() is equal to ".", you could use:
//*[text()='.']
//* will search through the entire tree structure, looking for any element at any level.
[text()='.'] is a predicate filter (much like a WHERE clause in SQL) that tests each of those matched elements. Only the elements that have a text() node whose value equals . evaluate to true() and are kept.
It's not the most efficient XPath expression, but it may be good enough for what you need.
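To cross-check the predicate outside an XPath engine, here is a minimal Python sketch. It assumes the standard-library ElementTree parser, which supports only a limited XPath subset, so the test is written as a plain tree walk rather than by evaluating //*[text()='.'] directly. The helper name elements_with_text and the trimmed sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
XML = """<SalesTransaction>
  <TransactionHeader>
    <TransactionHeaderFields>
      <WrntyID>a</WrntyID>
      <ExternalID/>
      <Type>.</Type>
      <Status>Submited</Status>
      <Remark/>
    </TransactionHeaderFields>
  </TransactionHeader>
</SalesTransaction>"""


def elements_with_text(root, value):
    """Mimic //*[text()=value]: keep elements with a text-node child equal to value."""
    matches = []
    for el in root.iter():  # root and all descendants, in document order
        # In ElementTree, an element's text nodes are its own .text
        # plus the .tail of each of its children.
        text_nodes = [el.text] + [child.tail for child in el]
        if value in text_nodes:
            matches.append(el)
    return matches


root = ET.fromstring(XML)
dot_elements = elements_with_text(root, ".")
print([el.tag for el in dot_elements])
```

Only the Type element should be reported, since it is the only one with a text node that is exactly ".".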

Related

XQuery: look for node with descendants in a certain order

I have an XML file that represents the syntax trees of all the sentences in a book:
<book>
<sentence>
<w class="pronoun" role="subject">
I
</w>
<wg type="verb phrase">
<w class="verb" role="verb">
like
</w>
<wg type="noun phrase" role="object">
<w class="adj">
green
</w>
<w class="noun">
eggs
</w>
</wg>
</wg>
</sentence>
<sentence>
...
</sentence>
...
</book>
This example is fake, but the point is that the actual words (the <w> elements) are nested in unpredictable ways based on syntactic relationships.
What I'm trying to do is find <sentence> nodes with <w> descendants matching particular criteria in a certain order. For example, I may be looking for a sentence with a w[@class='pronoun'] descendant followed by a w[@class='verb'] descendant.
It's easy to find sentences that just contain both descendants, without caring about ordering:
//sentence[descendant::w[criteria1] and descendant::w[criteria2]]
I did manage to figure out this query that does what I want, which looks for a <w> with a following <w> matching the criteria with the same closest <sentence> ancestor:
for $sentence in //sentence
where $sentence[descendant::w[criteria1 and
following::w[(ancestor::sentence[1] = $sentence) and criteria2]]]
return ...
...but unfortunately it's very slow, and I'm not sure why.
Is there a non-slow way to search for a node that contains descendants matching criteria in a certain order? I'm using XQuery 3.1 with BaseX. If I can't find a reasonable way to do this with XQuery, plan B is to do post-processing with Python.
The following axis is indeed expensive, as it spans all subsequent nodes of a document that are neither descendants nor ancestors.
The node comparison operators (<<, >>, is) may help you here. In the code example below, it is checked if there is at least one verb that is followed by a noun:
for $sentence in //sentence
let $words1 := $sentence//w[@class = 'verb']
let $words2 := $sentence//w[@class = 'noun']
where some $w1 in $words1 satisfies
some $w2 in $words2 satisfies $w1 << $w2
return $sentence
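Since the question mentions Python post-processing as plan B, the same ordering test can be sketched there. This is only an illustration under assumptions: it uses the standard-library ElementTree, relies on iter() visiting elements in document order, and the second sentence in the sample data and the helper name has_ordered_pair are made up for the example.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <sentence>
    <w class="pronoun" role="subject">I</w>
    <wg type="verb phrase">
      <w class="verb" role="verb">like</w>
      <wg type="noun phrase" role="object">
        <w class="adj">green</w>
        <w class="noun">eggs</w>
      </wg>
    </wg>
  </sentence>
  <sentence>
    <w class="noun">Eggs</w>
    <w class="verb">rot</w>
  </sentence>
</book>"""


def has_ordered_pair(sentence, first_class, second_class):
    """True if some <w class=first_class> precedes some <w class=second_class>."""
    seen_first = False
    for w in sentence.iter("w"):  # document order, at any nesting depth
        # Check the second criterion before updating seen_first so that
        # a single element can never play both roles.
        if seen_first and w.get("class") == second_class:
            return True
        if w.get("class") == first_class:
            seen_first = True
    return False


book = ET.fromstring(BOOK)
matching = [
    index
    for index, sentence in enumerate(book.findall("sentence"))
    if has_ordered_pair(sentence, "verb", "noun")
]
print(matching)
```

Each sentence is scanned once, which avoids re-walking the rest of the document for every candidate word the way the following axis does.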

Select all nodes until a specific given node/tag

Given the following markup:
<div id="about">
<dl>
<dt>Date</dt>
<dd>1872</dd>
<dt>Names</dt>
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
<dt>Status</dt>
<dd>on</dd>
<dt>Another Field</dt>
<dd>X</dd>
<dd>Y</dd>
</dl>
</div>
I'm trying to extract all the <dd> nodes following <dt>Names</dt> but only until another <dt> starts. In this case, I'm after the following nodes:
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
I'm trying the following XPath code, but it's not working as intended.
xpath("//div[@id='about']/dl/dt[contains(text(),'Names')]/following-sibling::dd[not(following-sibling::dt)]/text()")
Any thoughts on how to fix it?
Many thanks.
Update: much simpler solution
There is a prerequisite in your situation, that is that the anchor item always is the first preceding sibling with a certain property. Because of that, here's a much simpler way of writing the below complex expression:
/div/dl/dd[preceding-sibling::dt[1][. = 'Names']]
In other words:
select any dd
that has a first preceding sibling dt (the preceding sibling axis counts backwards)
that itself has a value of "Names"
Tested in oXygen, this selects the nodes you wanted to select (and if you change "Names" to "Status" or "Another Field", it selects only the dd elements that follow that dt, up to the next dt).
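The "first preceding sibling dt" rule can also be checked with a small Python sketch. It assumes the standard-library ElementTree and walks the children of dl in order instead of evaluating the XPath itself; the helper name dd_values_under is hypothetical.

```python
import xml.etree.ElementTree as ET

MARKUP = """<div id="about">
  <dl>
    <dt>Date</dt>
    <dd>1872</dd>
    <dt>Names</dt>
    <dd>A</dd>
    <dd>B</dd>
    <dd>C</dd>
    <dt>Status</dt>
    <dd>on</dd>
    <dt>Another Field</dt>
    <dd>X</dd>
    <dd>Y</dd>
  </dl>
</div>"""


def dd_values_under(dl, label):
    """Collect the text of each dd whose nearest preceding dt equals label."""
    values = []
    current_dt = None
    for child in dl:  # direct children only, in document order
        if child.tag == "dt":
            current_dt = child.text  # plays the role of preceding-sibling::dt[1]
        elif child.tag == "dd" and current_dt == label:
            values.append(child.text)
    return values


dl = ET.fromstring(MARKUP).find("dl")
print(dd_values_under(dl, "Names"))
```

Changing the label to "Status" or "Another Field" should return only the dd values that belong to that term.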
Original complex solution (leaving in for reference)
This is far easier in XPath 2.0, but let's assume you can only use XPath 1.0. The trick is to count the number of preceding siblings from your anchor element (the one with "Names" in it), and disregard any that have the wrong count (i.e., when we cross over <dt>Status</dt>, the number of preceding siblings has increased).
For XPath 1.0, remove the comments between (: and :). Whitespace is insignificant in XPath, so you can spread the expression over multiple lines for readability, but XPath 1.0 does not support comments.
/div/dl/dd
(: any dd having a dt before it with "Names" :)
[preceding-sibling::dt[. = 'Names']]
(: count the preceding siblings up to dt with "Names", add one to include 'self' :)
[count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1
=
(: compare with count of all preceding siblings :)
count(preceding-sibling::dt)]
As a one-liner:
/div/dl/dd[preceding-sibling::dt[. = 'Names']][count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1 = count(preceding-sibling::dt)]
How about this:
//dd[preceding-sibling::dt[contains(., 'Names')]][following-sibling::dt]

XPath 1.0 exclusive or node-set expression

What I need doesn't quite seem to match what other articles of a similar title are about.
I need, using XPath 1.0, to be able to get node a or node b, exclusively, in that order.
That is, node a if it exists, otherwise node b.
An XPath expression such as:
expression | expression
will get me both when they both exist. That is not what I want.
I could go:
(expression | expression)[last()]
which does in fact get me what I need (in my case), but seems a bit inefficient, because it will evaluate both sides of the expression before the last result is selected.
I was hoping for an expression that is going to stop working once the left side succeeds.
A more concrete example of XML
<one>
<two>
<three>hello</three>
<four>bye</four>
</two>
<blahfive>again</blahfive>
</one>
and the XPath that works (but is inefficient):
(/one/*[starts-with(local-name(.), 'blah')] | .)[last()]
To be clear, I would like to grab the immediate child node of 'one' which starts with 'blah'. However, if it doesn't exist, I would like only the current node.
If the 'blah' node does exist, I do not want the current node.
Is there a more efficient way to achieve this?
I need, using XPath 1.0, to be able to get node a or node b,
exclusively, in that order. That is, node a if it exists, otherwise
node b.
An XPath expression such as:
expression | expression
will get me both when they both exist. That is not what I want.
I could go:
(expression | expression)[last()]
which does in fact get me what I need (in my case),
This statement is not true.
Here is an example. Let us have this XML document:
<one>
<a/>
<b/>
</one>
Expression1 is:
/*/a
Expression2 is:
/*/b
Your composite expression:
(Expression1 | Expression2)[last()]
when we substitute the two expressions above is:
(/*/a | /*/b)[last()]
And this expression actually selects b -- not a -- because b is the last of the two in document order.
Now, here is an expression that selects just a if it exists, and selects b only if a doesn't exist -- regardless of document order:
/*/a | /*/b[not(/*/a)]
When this expression is evaluated on the XML document above, it selects a regardless of document order. To confirm, swap the positions of a and b in the document: in both cases the selected element is a.
To summarize, one expression that selects the wanted node regardless of any document order is:
Expression1 | Expression2[not(Expression1)]
Let us apply this general expression in your case:
Expression1 is:
/one/*[starts-with(local-name(.), 'blah')]
Expression2 is:
self::node()
The wanted expression (after substituting Expression1 and Expression2 in the above general expression) is:
/one/*[starts-with(local-name(.), 'blah')]
|
self::node()[not(/one/*[starts-with(local-name(.), 'blah')])]
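The "preferred node, else fallback" pattern is easy to sanity-check in Python. The sketch below is an illustration under assumptions: it uses the standard-library ElementTree, takes the one element as the context node, ignores namespaces, and the helper name preferred_or_context is made up.

```python
import xml.etree.ElementTree as ET


def preferred_or_context(context, prefix="blah"):
    """Return the children whose name starts with prefix, else [context]."""
    preferred = [child for child in context if child.tag.startswith(prefix)]
    # An empty list is falsy, so the fallback is used only when nothing matched.
    return preferred or [context]


with_blah = ET.fromstring(
    "<one><two><three>hello</three><four>bye</four></two>"
    "<blahfive>again</blahfive></one>"
)
without_blah = ET.fromstring(
    "<one><two><three>hello</three><four>bye</four></two></one>"
)

print([el.tag for el in preferred_or_context(with_blah)])
print([el.tag for el in preferred_or_context(without_blah)])
```

As in Expression1 | Expression2[not(Expression1)], the fallback never appears alongside the preferred node.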

XPath : Find following siblings that don't follow an order pattern

This is for C code detection. I'm trying to flag case statements that don't have a break. The hierarchy of the tree looks like this when there are multiple lines before the break statement. This is an example in C:
switch (x) {
case 1:
if (...) {...}
int y = 0;
for (...) {...}
break;
case 2:
It is somehow represented as this:
<switch>
<case>...</case>
<if>...</if>
<expression>...</expression>
<for>...</for>
<break>...</break>
<case>...</case>
</switch>
I need to find <case>s where a <break> exists after any number of lines, but before the next <case>.
This code only helps me find those where the break doesn't immediately follow the case:
//case [name(following-sibling::*[1]) != 'break']
...but when I try to use following-sibling::*, it will find a break, but not necessarily one that comes before the next case.
How can I do this?
Select any case that has a following break and either no following case or a next break positioned before the next case. The positions are determined by running count() on the preceding siblings.
//case
[
following-sibling::break and
(
not(following-sibling::case) or
(
count(following-sibling::break[1]/preceding-sibling::*) <
count(following-sibling::case[1]/preceding-sibling::*)
)
)
]
To grab the other cases, those without breaks, just throw a big old not() in there like so:
//case
[not(
following-sibling::break and
(
not(following-sibling::case) or
(
count(following-sibling::break[1]/preceding-sibling::*) <
count(following-sibling::case[1]/preceding-sibling::*)
)
)
)]
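To verify which cases either expression should select, the same rule can be sketched as a single pass in Python. This assumes the standard-library ElementTree and the flat sibling layout from the question; the sample switch with three cases and the helper name split_cases are invented for illustration.

```python
import xml.etree.ElementTree as ET

SWITCH = """<switch>
  <case>1</case>
  <if/>
  <expression/>
  <for/>
  <break/>
  <case>2</case>
  <expression/>
  <case>3</case>
  <break/>
</switch>"""


def split_cases(switch):
    """Return (cases_with_break, cases_without_break) for flat sibling markup."""
    with_break, without_break = [], []
    current, has_break = None, False
    for child in switch:  # siblings in document order
        if child.tag == "case":
            if current is not None:  # close the previous case
                (with_break if has_break else without_break).append(current)
            current, has_break = child, False
        elif child.tag == "break":
            has_break = True
    if current is not None:  # close the final case
        (with_break if has_break else without_break).append(current)
    return with_break, without_break


switch = ET.fromstring(SWITCH)
with_break, without_break = split_cases(switch)
print([c.text for c in with_break], [c.text for c in without_break])
```

Here case 2 falls through to case 3, so it lands in the second list even though a break appears later among its siblings.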
I agree with @PeterHall: it would be better to restructure the XML into something more closely representing the abstract syntax tree of the C grammar. You can do this easily enough (for this case) with XSLT grouping:
<xsl:for-each-group select="*" group-starting-with="case">
<case>
<xsl:copy-of select="current-group()[not(self::case)]"/>
</case>
</xsl:for-each-group>
You can then find cases with no break as switch/case[not(break)].
I think you are struggling because your XML format does not really model the problem very well. It would be much easier if the other statements were nested inside the <case> elements, instead of being siblings, then you could just use switch/case[break].
With your current structure, it's easiest to start by finding the <break> and then work backwards to find the matching <case>. As @LarsH pointed out, my original expression would find some additional clauses. It can't really be modified to fix that, unless you restrict it to find just the first case:
switch/break/preceding-sibling::case[1]
@derp's answer is better, and can find both cases with and without breaks.
Derp's answer is correct. But I'll just add another. This selects case elements that do have a break:
//case[generate-id(.) =
generate-id(following-sibling::break[1]/preceding-sibling::case[1])]
In other words, this selects case elements for which this is true:
The context element is identical to the first case element preceding the next break element (considering siblings only).
If you have a lot of case statements, this variant could be faster than using count(). But you never know for sure unless you test it with the relevant data using the relevant XPath processor.
BTW, the . in generate-id(.) is not required, as the argument defaults to . anyway. But I prefer to make it explicit, for readability.

xpath Expression for "or" operator

Can anyone please help me? I want to use the or operator in my XPath expression to select all input or all a elements from an HTML page.
My expression looks like this:
document.DocumentNode.SelectNodes("//input or //a");
But I'm having errors.
You can use the union operator:
//input | //a
Or an expression like this, which may perform somewhat better:
//*[self::input or self::a]
The or operator is the boolean OR in XPath, so //input or //a is a boolean expression that returns true if either of the node sets //input and //a is non-empty (i.e. your source document contains at least one input element, at least one a element, or both) and false otherwise.
Instead you're looking for the | operator which is the "union" operation on node sets.
//input | //a
will give you a set containing all the input elements and all the a elements.
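For comparison, the union semantics can be illustrated with a short Python sketch. It assumes the standard-library ElementTree and a small well-formed sample page invented for the example; it filters one document-order walk by tag name, which corresponds to //*[self::input or self::a], and does not use the HtmlAgilityPack API from the question.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
  <form><input name="q"/><a href="/help">Help</a></form>
  <p>Text <a href="/home">Home</a></p>
  <input name="email"/>
</body></html>"""

root = ET.fromstring(PAGE)
wanted = {"input", "a"}

# One pass over the tree keeps the matches in document order,
# which is also the order an XPath union returns them in.
selected = [el for el in root.iter() if el.tag in wanted]
print([(el.tag, el.get("name") or el.get("href")) for el in selected])
```

The result is a list of elements, in contrast to the single true or false value that the or operator produces.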

Resources