XPath 1.0: subset matches an attribute list?

In my XML I have elements with an attribute that contains a space-separated list of categories, for example: cat="A B C D"
Is there any way with XPath 1.0 (I'm using XSLT in Firefox) to return all the elements whose category list contains every category of a given subset?
For example:
subset(A C) cat(A B C) true
subset(D) cat(A C) false
subset(A C) cat(A B) false
Thanks for your help.

Here's some code that should do basically what you want. The XSLT wrapper is just to set the variables.
<xsl:variable name="subset" select="'A C'" />
<xsl:variable name="matches"
select="//*[translate($subset, concat(@cat, ' '), '') = '']" />
Of course, you may need to tweak //* depending on what kinds of elements you're trying to match.
Concatenating a space to @cat is only necessary if you may have subset strings like 'D' that contain no spaces.
This code also assumes that all category names are single letters. If that's not the case, let me know.
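To see why the single-letter assumption matters, here is a rough Python emulation of what translate() does in that predicate. The helper names xpath_translate and is_subset are made up for this illustration; the point is only that the test works character by character, not category by category.

```python
def xpath_translate(s, src, dst):
    """Emulate XPath 1.0 translate(): map each char of src to the char at the
    same position in dst; chars of src with no counterpart are deleted."""
    mapping = {}
    for i, ch in enumerate(src):
        if ch not in mapping:  # the first occurrence wins, as in XPath
            mapping[ch] = dst[i] if i < len(dst) else None
    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)            # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])   # replaced character
    return "".join(out)


def is_subset(subset, cat):
    # Same test as the predicate: delete every char of (cat + ' ') from subset
    # and check that nothing is left.
    return xpath_translate(subset, cat + " ", "") == ""


print(is_subset("A C", "A B C"))  # the three examples from the question
print(is_subset("D", "A C"))
print(is_subset("A C", "A B"))
print(is_subset("AB", "BA"))      # multi-letter names: matches although it should not
```

The last call shows the limitation: with multi-letter names, any category spelled from the same letters passes the test.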

Thanks for the code.
In fact, my categories normally contain more than one letter, and there may be subcategories (for example AB.CD, where the dot is the delimiter for the subcategory). It would be nice if I could also search on only one part of a category, for example:
subset(A Z.Y) cat(A.B Z.Y) true
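One way to pin down the matching rule being asked for is to write it out token by token. The sketch below is in Python, not XPath 1.0, and it rests on an assumption: a wanted token matches a category when its dot-separated parts are a leading run of the category's parts (so A matches A.B, but B does not). The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def token_matches(wanted, category):
    # Assumption: "A" matches "A.B" (leading part), "Z.Y" matches "Z.Y" or "Z.Y.X".
    w, c = wanted.split("."), category.split(".")
    return c[:len(w)] == w


def subset_matches(subset, cat):
    # Every wanted token must match at least one category in the list.
    categories = cat.split()
    return all(any(token_matches(t, c) for c in categories)
               for t in subset.split())


doc = ET.fromstring(
    '<r>'
    '<e id="1" cat="A.B Z.Y"/>'
    '<e id="2" cat="AB.CD"/>'
    '<e id="3" cat="A Z"/>'
    '</r>'
)
hits = [e.get("id") for e in doc.iter("e")
        if subset_matches("A Z.Y", e.get("cat", ""))]
print(hits)
```

Only the first element is selected here: the second has no category starting with the part A, and the third has Z but not Z.Y.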

Related

XQuery: look for node with descendants in a certain order

I have an XML file that represents the syntax trees of all the sentences in a book:
<book>
<sentence>
<w class="pronoun" role="subject">
I
</w>
<wg type="verb phrase">
<w class="verb" role="verb">
like
</w>
<wg type="noun phrase" role="object">
<w class="adj">
green
</w>
<w class="noun">
eggs
</w>
</wg>
</wg>
</sentence>
<sentence>
...
</sentence>
...
</book>
This example is fake, but the point is that the actual words (the <w> elements) are nested in unpredictable ways based on syntactic relationships.
What I'm trying to do is find <sentence> nodes with <w> descendants matching particular criteria in a certain order. For example, I may be looking for a sentence with a w[@class='pronoun'] descendant followed by a w[@class='verb'] descendant.
It's easy to find sentences that just contain both descendants, without caring about ordering:
//sentence[descendant::w[criteria1] and descendant::w[criteria2]]
I did manage to figure out this query that does what I want, which looks for a <w> with a following <w> matching the criteria with the same closest <sentence> ancestor:
for $sentence in //sentence
where $sentence[descendant::w[criteria1 and
following::w[(ancestor::sentence[1] = $sentence) and criteria2]]]
return ...
...but unfortunately it's very slow, and I'm not sure why.
Is there a non-slow way to search for a node that contains descendants matching criteria in a certain order? I'm using XQuery 3.1 with BaseX. If I can't find a reasonable way to do this with XQuery, plan B is to do post-processing with Python.
The following axis is indeed expensive, as it spans all subsequent nodes of the document that are neither descendants nor ancestors of the context node.
The node comparison operators (<<, >>, is) may help you here. The code example below checks whether there is at least one verb that is followed by a noun:
for $sentence in //sentence
let $words1 := $sentence//w[@class = 'verb']
let $words2 := $sentence//w[@class = 'noun']
where some $w1 in $words1 satisfies
some $w2 in $words2 satisfies $w1 << $w2
return $sentence
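Since post-processing in Python was mentioned as plan B, here is a minimal sketch of the same document-order check with the standard library. It assumes the simplified sample markup below (the second sentence is invented for contrast), and has_in_order is a hypothetical helper name. It relies on Element.iter() visiting descendants in document order, so one pass per sentence is enough.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<sentence><w class="pronoun">I</w><wg><w class="verb">like</w>
<wg><w class="adj">green</w><w class="noun">eggs</w></wg></wg></sentence>
<sentence><w class="noun">Eggs</w><w class="verb">matter</w></sentence>
</book>"""


def has_in_order(sentence, first_class, second_class):
    """True if some <w> of first_class precedes some <w> of second_class."""
    seen_first = False
    for w in sentence.iter("w"):  # descendants, in document order
        # Test the second criterion before updating the flag, so that a single
        # node can never count as both the first and the second match.
        if seen_first and w.get("class") == second_class:
            return True
        if w.get("class") == first_class:
            seen_first = True
    return False


sentences = list(ET.fromstring(BOOK).iter("sentence"))
matches = [i for i, s in enumerate(sentences)
           if has_in_order(s, "verb", "noun")]
print(matches)
```

Only the first sentence has a verb followed by a noun; in the second one the noun comes first.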

how to write an xpath that returns the result of function b if function a returns empty

I have two xpath functions that I want to combine
a - return the subdomain string if it's not "www"
replace(substring-before(replace(//url,"(https?://)",""),"."),"www","")
b - return the subfolder name
substring-before(replace(//url,"(https?://[^/]+/)",""),"/")
Is it possible to have an xpath that returns b if a is empty, otherwise it returns a?
Example:
http://aaaa.something.com/bbbb should return "aaaa"
http://www.something.com/bbbb should return "bbbb"
Is it possible to have an xpath that returns b if a is empty,
otherwise it returns a?
The usual XPath 2.0 idiom for that is
(a, b)[1]
Efficiency here depends on pipelined (lazy) evaluation, but I think you can assume that any half-decent processor, when given X[1], will avoid evaluating items in X beyond the first.
After digging into @Michael's suggestion, I found the answer:
(a, b)[string-length() > 0][1]
which in my example translates into:
(replace(substring-before(replace(//url,"(https?://)",""),"."),"www",""), replace(replace(//url,"(https?://[^/]+[/])",""),"[/+].*",""))[string-length() > 0][1]
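For comparison, the same "first non-empty result" logic can be sketched in Python. The helper names below are invented for this illustration, and the regular expressions mirror the ones in the XPath above. The second extractor runs only if the first one returns an empty string.

```python
import re


def substring_before(s, sep):
    # Like XPath substring-before(): empty string when sep does not occur.
    head, found, _ = s.partition(sep)
    return head if found else ""


def subdomain(url):
    # a: the subdomain, with "www" replaced by nothing
    host_and_path = re.sub(r"https?://", "", url)
    return re.sub(r"www", "", substring_before(host_and_path, "."))


def first_folder(url):
    # b: the first path segment after the host
    path = re.sub(r"https?://[^/]+/", "", url)
    return re.sub(r"[/+].*", "", path)


def first_non_empty(url, extractors=(subdomain, first_folder)):
    # Equivalent of (a, b)[string-length() > 0][1], evaluated lazily.
    for extract in extractors:
        value = extract(url)
        if value:
            return value
    return ""


print(first_non_empty("http://aaaa.something.com/bbbb"))
print(first_non_empty("http://www.something.com/bbbb"))
```

The two calls reproduce the examples from the question: the first yields the subdomain, the second falls through to the folder name.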

Select all nodes until a specific given node/tag

Given the following markup:
<div id="about">
<dl>
<dt>Date</dt>
<dd>1872</dd>
<dt>Names</dt>
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
<dt>Status</dt>
<dd>on</dd>
<dt>Another Field</dt>
<dd>X</dd>
<dd>Y</dd>
</dl>
</div>
I'm trying to extract all the <dd> nodes following <dt>Names</dt> but only until another <dt> starts. In this case, I'm after the following nodes:
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
I'm trying the following XPath code, but it's not working as intended.
xpath("//div[@id='about']/dl/dt[contains(text(),'Names')]/following-sibling::dd[not(following-sibling::dt)]/text()")
Any thoughts on how to fix it?
Many thanks.
Update: much simpler solution
Your situation has a prerequisite: the anchor item is always the first preceding sibling with a certain property. Because of that, there is a much simpler way of writing the complex expression below:
/div/dl/dd[preceding-sibling::dt[1][. = 'Names']]
In other words:
select any dd
that has a first preceding sibling dt (the preceding sibling axis counts backwards)
that itself has a value of "Names"
It selects the nodes you wanted to select (and if you change "Names" to "Status" or "Another Field", it selects only the dd elements that follow that dt, up to the next dt).
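The "nearest preceding dt" rule is easy to check outside XPath as well. Below is a minimal Python sketch with the standard library, using the markup from the question; dd_values_under is a hypothetical helper name. It walks the children of the dl once and remembers the most recent dt.

```python
import xml.etree.ElementTree as ET

HTML = """<div id="about"><dl>
<dt>Date</dt><dd>1872</dd>
<dt>Names</dt><dd>A</dd><dd>B</dd><dd>C</dd>
<dt>Status</dt><dd>on</dd>
<dt>Another Field</dt><dd>X</dd><dd>Y</dd>
</dl></div>"""


def dd_values_under(dl, label):
    """Collect the text of each <dd> whose nearest preceding <dt> equals label."""
    current = None  # text of the most recent <dt> seen so far
    values = []
    for child in dl:  # direct children, in document order
        if child.tag == "dt":
            current = child.text
        elif child.tag == "dd" and current == label:
            values.append(child.text)
    return values


dl = ET.fromstring(HTML).find("dl")
print(dd_values_under(dl, "Names"))
print(dd_values_under(dl, "Status"))
```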
Original complex solution (leaving in for reference)
This is far easier in XPath 2.0, but let's assume you can only use XPath 1.0. The trick is to count the number of preceding siblings from your anchor element (the one with "Names" in it), and disregard any that have the wrong count (i.e., when we cross over <dt>Status</dt>, the number of preceding siblings has increased).
For XPath 1.0, remove the comments between (: and :). Whitespace is insignificant in XPath, so you can spread the expression over multiple lines for readability, but comments are not possible in 1.0.
/div/dl/dd
(: any dd having a dt before it with "Names" :)
[preceding-sibling::dt[. = 'Names']]
(: count the preceding siblings up to dt with "Names", add one to include 'self' :)
[count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1
=
(: compare with count of all preceding siblings :)
count(preceding-sibling::dt)]
As a one-liner:
/div/dl/dd[preceding-sibling::dt[. = 'Names']][count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1 = count(preceding-sibling::dt)]
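To make the counting trick concrete, here is a small Python sketch that applies the same two counts to a flat list of (tag, text) pairs standing in for the children of the dl. The list and the name dd_by_count are assumptions made for this illustration.

```python
CHILDREN = [
    ("dt", "Date"), ("dd", "1872"),
    ("dt", "Names"), ("dd", "A"), ("dd", "B"), ("dd", "C"),
    ("dt", "Status"), ("dd", "on"),
    ("dt", "Another Field"), ("dd", "X"), ("dd", "Y"),
]


def dd_by_count(children, label):
    selected = []
    for i, (tag, text) in enumerate(children):
        if tag != "dd":
            continue
        # count(preceding-sibling::dt)
        preceding_dts = [j for j in range(i) if children[j][0] == "dt"]
        # preceding-sibling::dt[. = label]
        anchors = [j for j in preceding_dts if children[j][1] == label]
        if not anchors:
            continue
        # preceding-sibling::dt[. = label]/preceding-sibling::dt
        before_anchor = {k for j in anchors for k in preceding_dts if k < j}
        # Keep the dd only if no further dt sits between the anchor and it.
        if len(before_anchor) + 1 == len(preceding_dts):
            selected.append(text)
    return selected


print(dd_by_count(CHILDREN, "Names"))
```

For the dd containing "on", three dt elements precede it but only one precedes the "Names" anchor, so 1 + 1 does not equal 3 and the element is rejected.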
How about this:
//dd[preceding-sibling::dt[contains(., 'Names')]][following-sibling::dt]

XPath 1.0 exclusive or node-set expression

What I need doesn't quite seem to match what other articles of a similar title are about.
I need, using XPath 1.0, to be able to get node a or node b, exclusively, in that order.
That is, node a if it exists, otherwise node b.
An XPath expression such as:
expression | expression
will get me both in the case that they both exist. That is not what I want.
I could go:
(expression | expression)[last()]
which does in fact get me what I need (in my case), but seems a bit inefficient, because it evaluates both sides of the expression before the last result is selected.
I was hoping for an expression that stops evaluating once the left side succeeds.
A more concrete example of XML
<one>
<two>
<three>hello</three>
<four>bye</four>
</two>
<blahfive>again</blahfive>
</one>
and the xpath that works (but inefficient):
(/one/*[starts-with(local-name(.), 'blah')] | .)[last()]
To be clear, I would like to grab the immediate child node of 'one' which starts with 'blah'. However, if it doesn't exist, I would like only the current node.
If the 'blah' node does exist, I do not want the current node.
Is there a more efficient way to achieve this?
I need, using XPath 1.0, to be able to get node a or node b,
exclusively, in that order. That is, node a if it exists, otherwise
node b.
An XPath expression such as:
expression | expression
will get me both in the case that they both exist. That is not what I want.
I could go:
(expression | expression)[last()]
which does in fact get me what I need (in my case),
This statement is not true.
Here is an example. Let us have this XML document:
<one>
<a/>
<b/>
</one>
Expression1 is:
/*/a
Expression2 is:
/*/b
Your composite expression:
(Expression1 | Expression2)[last()]
when we substitute the two expressions above is:
(/*/a | /*/b)[last()]
And this expression actually selects b -- not a -- because b is the last of the two in document order.
Now, here is an expression that selects just a if it exists, and selects b only if a doesn't exist -- regardless of document order:
/*/a | /*/b[not(/*/a)]
When this expression is evaluated on the XML document above, it selects a, regardless of its document order. To confirm this, swap the places of a and b in the XML document above: in both cases the element that is selected is a.
To summarize, one expression that selects the wanted node regardless of any document order is:
Expression1 | Expression2[not(Expression1)]
Let us apply this general expression in your case:
Expression1 is:
/one/*[starts-with(local-name(.), 'blah')]
Expression2 is:
self::node()
The wanted expression (after substituting Expression1 and Expression2 in the above general expression) is:
/one/*[starts-with(local-name(.), 'blah')]
|
self::node()[not(/one/*[starts-with(local-name(.), 'blah')])]
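The difference between the two expressions can be reproduced with a short Python sketch. The functions prefer_first and union_last are hypothetical names: the first mimics Expression1 | Expression2[not(Expression1)], the second mimics (Expression1 | Expression2)[last()] by sorting the union into document order.

```python
import xml.etree.ElementTree as ET


def prefer_first(root, path1, path2):
    # Expression1 | Expression2[not(Expression1)]:
    # path2 is only consulted when path1 selects nothing.
    first = root.findall(path1)
    return first if first else root.findall(path2)


def union_last(root, path1, path2):
    # (Expression1 | Expression2)[last()]: last node of the union in document order.
    wanted = root.findall(path1) + root.findall(path2)
    in_doc_order = [e for e in root.iter() if any(e is w for w in wanted)]
    return in_doc_order[-1:]


a_then_b = ET.fromstring("<one><a/><b/></one>")
b_then_a = ET.fromstring("<one><b/><a/></one>")
only_b = ET.fromstring("<one><b/></one>")

for doc in (a_then_b, b_then_a, only_b):
    print([e.tag for e in prefer_first(doc, "a", "b")],
          [e.tag for e in union_last(doc, "a", "b")])
```

With both elements present, prefer_first returns a in either order, while union_last returns whichever element comes last in the document.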

mathematica: PadRight[] and \[PlusMinus]

Is there any way that
PadRight[a \[PlusMinus] b,2,""]
Returns
{a \[PlusMinus] b,""}
Instead of
a \[PlusMinus] b \[PlusMinus] ""
?
I believe that I need to somehow deactivate the operator properties of \[PlusMinus].
Why do I need this?
I'm creating a program to display tables with physical quantities. To me, that means tables with entries like
(value of a) \[PlusMinus] (uncertainty of a)
When I have several columns with different heights, I pad the shorter ones with "", so that I can Transpose the numeric part of the table.
If the column has more than one entry, there's no problem:
PadRight[{a \[PlusMinus] b,c \[PlusMinus] d},4,""]
gives what i want:
{a \[PlusMinus] b,c \[PlusMinus] d,"",""}
It is when the column has only one entry that my problem appears.
This is the code that constructs the body stuffed with "":
If[tested[Sbody],1,
body = PadRight[body, {Length[a], Max[Map[Length, body]]
With
tested[a__] :=
If[Length[DeleteDuplicates[Map[Dimensions, {a}]]] != 1, False,
True];
, a function that checks whether its arguments all have the same dimensions,
and
a={Quantity1,Quantity2,...}
where the quantities are the ones that I want in my table.
Thanks
First, you need to be aware that any expression in Mathematica has the form Head[Body],
where the body may be empty, a single expression, or a sequence of expressions separated by commas.
Length operates on expressions, not necessarily lists,
so
Length[PlusMinus[a,b]]
returns 2, since the body of the expression contains two expressions (atoms in this case), namely a and b.
Read the documentation on PadRight. The second argument defines the final length of the expression,
so
PadRight[{a,b},4,c] results in a list of length 4 with the last two elements equal to c
PadRight[{a,b},2,c] results in the original list, since it is already of length 2
Therefore
PadRight[PlusMinus[a,b],2,anything] just returns the same PlusMinus[a,b] unchanged since it is already of length 2
So your first example is wrong. You cannot get a result with head List using PadRight when you pad an expression with head PlusMinus.
There is no problem with executing
PadRight[PlusMinus[a,b],3,""]
but the result looks funny (at best) and is logically meaningless. If this is what you wanted in the first place, you get it, and following my explanations above you can figure out why.
HTH
best
yehuda
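The Head[Body] argument above can be illustrated with a toy model. The sketch below is Python, not Mathematica, and it is only an analogy: an expression is modelled as a (head, arguments) pair, and the hypothetical pad_right pads the arguments while keeping whatever head it was given. The last call, which wraps the single entry in a list first, is an assumption about what the question is after and is not stated in the answer.

```python
def pad_right(expr, n, padding):
    """Toy model of padding: keep the head, bring the argument list to length n."""
    head, args = expr
    if len(args) >= n:
        return (head, args[:n])
    return (head, args + [padding] * (n - len(args)))


pm = ("PlusMinus", ["a", "b"])  # stands for a \[PlusMinus] b

# Already of length 2, so nothing changes and the head stays PlusMinus.
print(pad_right(pm, 2, ""))

# Padding to 3 appends "" inside PlusMinus instead of producing a list.
print(pad_right(pm, 3, ""))

# Wrapping the entry in a list first gives a list with one entry plus padding.
print(pad_right(("List", [pm]), 2, ""))
```

In this model the head is never changed by padding, which is why the single-entry case has to be a list of one element before it is padded.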
