Xpath select multiple elements - xpath

Is there any way to make the following expression shorter?
Can I select both /span/strong[2] and /strong[2] in one expression without "|"?
//*[@id="content"]/div[2]/div[2]//following-sibling::p[1]/span/strong[2] | //*[@id="content"]/div[2]/div[2]//following-sibling::p[1]/strong[2]

Use p[1]//strong[2]
It means all strong[2] descendants of p[1]
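
To see what p[1]//strong[2] picks up, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports this small subset of XPath. The markup is a made-up stand-in for the real page. Note that the descendant form is slightly broader than the original union: it also matches a strong[2] nested at any deeper level.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one strong[2] directly under <p>, one inside <span>.
p = ET.fromstring(
    "<p>"
    "<strong>direct 1</strong><strong>direct 2</strong>"
    "<span><strong>nested 1</strong><strong>nested 2</strong></span>"
    "</p>"
)

# './/strong[2]' keeps every descendant <strong> that is the second
# <strong> child of its own parent, whatever its depth below <p>.
second_strongs = [el.text for el in p.findall(".//strong[2]")]
print(second_strongs)
```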

Why don't you just use the second path only?
//*[@id="content"]/div[2]/div[2]//following-sibling::p[1]/strong[2]
It contains both directives.

Related

Select element and its descendants

I'm trying to select a folder and its descendants from a JCR with XPath. I can select the folder easily enough:
//content/documents/folder-name
I can select its descendants too:
//content/documents/folder-name//*
However, I can't figure out how to get both. I've tried several things. These select nothing:
//content/documents/folder-name | //content/documents/folder-name//*
//content/documents/folder-name(. | *)
//content/documents/folder-name/(. | *)
//content/documents/folder-name/descendant-or-self
//content/documents/folder-name/descendant-or-self::node()
These both throw a javax.jcr.query.InvalidQueryException:
//content/documents/folder-name[. | *]
//content/documents/folder-name/[. | *]
Obviously I'm terrible at XPath. Please help.
Edit: I was using the // prefix because I didn't realize I could use /jcr:root/content instead. I have the same problem with that, however.
You can combine two XPaths using the union operator:
xpath1 | xpath2
However, your first XPath,
//content/documents/folder-name
does select the folder-name element(s), which includes the descendants of the element.
If you want the folder-name elements to be first in a list, followed by their descendants, you could combine as follows:
//content/documents/folder-name | //content/documents/folder-name//*
//content/documents/folder-name/descendant-or-self::node() looks correct to me (without seeing your XML input), though //content/documents/folder-name/descendant-or-self::* is probably better.
Certainly if //content/documents/folder-name selects something then //content/documents/folder-name/descendant-or-self::* should also select something.
In XPath 2.0 you can do //content/documents/folder-name/(.|descendant::*) but although it's shorter, it seems clumsier to me than using the descendant-or-self axis.
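
For comparison outside JCR, here is a rough Python sketch of what descendant-or-self::* returns. ElementTree does not support that axis, so the sketch uses Element.iter(), which yields the element itself followed by all of its descendants in document order. The tree is an invented stand-in for the repository, and a JCR query engine may behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the repository tree (not real JCR content).
root = ET.fromstring(
    "<content><documents>"
    "<folder-name><doc-a/><sub><doc-b/></sub></folder-name>"
    "<other/>"
    "</documents></content>"
)

folder = root.find("./documents/folder-name")

# iter() yields the folder first, then every descendant element,
# which is the same node set as folder-name/descendant-or-self::*.
selected = [el.tag for el in folder.iter()]
print(selected)
```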

xpath: nested nodes with same name

Is it possible to write an XPath expression to find the following nodes:
a/b/c
a/b/b/c
a/b/b/b/c
a/b/b/b/.../b/.../b/c
etc.
but not
a/b/b/e/b/c
?
Thanks for any advice
For the following input:
<root>
<a><b><c>1</c></b></a>
<a><b><b><c>2</c></b></b></a>
<a><b><b><b><c>3</c></b></b></b></a>
<a><b><b><e><b><c>4</c></b></e></b></b></a>
<a><b><b><e><b><b><c>5</c></b></b></e></b></b></a>
</root>
You can try the following expression:
//a[deep-equal(distinct-values(descendant::*[position()<last()]/name()), ('b'))]//c
How it works:
For each a, the names of all descendants except the last one are collected, and the distinct list of those names is compared with a list containing just b. If they match, the descendant c of that a is one you want to select.
I tested it with the 2010 version of XMLSpy, which returns the c elements containing 1, 2 and 3. I think more modern tools will also work. But you need at least XPath 2.0 for this.
The question is not totally clear, but you can try the expression below:
//a[.//b and not(.//e)]//c[not(.//*)]
This should match a c that has no children and is a descendant of an a that has a b descendant but no e descendant.
UPDATE
You can try
//a[count(.//b)=count(.//*)-count(.//c)]//c[not(.//*)]
or
//a[not(.//*[not(self::b)]//c)]//c[not(.//*)]
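
The rule behind these expressions can also be written out procedurally. The following Python sketch is not one of the XPath answers above but a plain restatement of the requirement: starting from each c, walk upwards and accept it only if every ancestor up to the enclosing a is a b. It uses the sample input from the first answer; the helper name c_under_only_b is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<root>
<a><b><c>1</c></b></a>
<a><b><b><c>2</c></b></b></a>
<a><b><b><b><c>3</c></b></b></b></a>
<a><b><b><e><b><c>4</c></b></e></b></b></a>
<a><b><b><e><b><b><c>5</c></b></b></e></b></b></a>
</root>"""


def c_under_only_b(root):
    """Return the text of every c whose path up to an a consists only of b."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    hits = []
    for c in root.iter("c"):
        node = parent.get(c)
        seen_b = False
        # Climb through the unbroken chain of b ancestors.
        while node is not None and node.tag == "b":
            seen_b = True
            node = parent.get(node)
        # Accept only if the chain had at least one b and ended at an a.
        if seen_b and node is not None and node.tag == "a":
            hits.append(c.text)
    return hits


result = c_under_only_b(ET.fromstring(XML))
print(result)
```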

Xpath expression with OR

I'd like to know if there is a way to verify multiple strings on a Xpath. This is the one I'm using now:
/td[2][text()[contains(.,'Word1')]]
I'd like to do something like this:
/td[2][text()[contains(.,'Word1' OR 'Word2' OR 'Word3')]]
Is that possible?
Updated answer:
I believe the problem you are experiencing is case sensitivity; try writing or in lower case:
//td[text()[contains(.,'Word1') or contains(.,'Word2') or contains(.,'Word3')]]
If that doesn't help, you can use the union approach:
/td[2][text()[contains(.,'Word1')]] | /td[2][text()[contains(.,'Word2')]] | /td[2][text()[contains(.,'Word3')]]
Yes, it's possible:
/td[2][text()[contains(.,'Word1') or contains(.,'Word2') or contains(.,'Word3')]]
Yes, you just need separate contains() calls (note that or must be lower case):
[contains(., 'Word1') or contains(., 'Word2') or contains(., 'Word3')]
As you have it currently, a boolean is being passed as the second argument to contains(), rather than a string.
With XPath 2.0 or 3.0 you could also use:
A Quantified Expression to loop over a sequence of words and test if any of the words are contained
//td[2][text()[some $word in ('Word1', 'Word2', 'Word3') satisfies contains(., $word)]]
The matches() function and specify your list of words in a regex:
//td[2][text()[matches(., 'Word1|Word2|Word3')]]
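
The "any of these words" test can be mirrored in Python when the XPath engine at hand has no contains(), as is the case for the standard library's ElementTree. This is only a sketch under assumptions: the table is invented, the helper name is hypothetical, and td.text covers just the leading text of the cell rather than every text node.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; only the second cell of each row is tested.
table = ET.fromstring(
    "<table>"
    "<tr><td>r1</td><td>contains Word2 here</td></tr>"
    "<tr><td>Word1</td><td>no match</td></tr>"
    "<tr><td>r3</td><td>Word3</td></tr>"
    "</table>"
)
WORDS = ("Word1", "Word2", "Word3")


def second_cells_with_any_word(root, words):
    """Return the text of each td[2] whose text contains at least one word."""
    hits = []
    for td in root.findall(".//td[2]"):
        text = td.text or ""
        # Same idea as: some $word in (...) satisfies contains(., $word)
        if any(word in text for word in words):
            hits.append(text)
    return hits


found = second_cells_with_any_word(table, WORDS)
print(found)
```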

Xpath Multiple Predicates

I am trying to quickly find a specific node using XPath but it seems my multiple predicates are not working. The div I need has a specific class, but there are 3 others that have it. I want to select the fourth one so I did the following:
//div[@class='myCLass' and 4]
However the "4" is being ignored. Any help? I am new to XPath.
Thanks.
If an XPath query returns a node set, you can always use the [OFFSET] operator to access a certain element of it.
Use the following query to access the fourth element that matches the @class='myCLass' predicate:
//div[@class='myCLass'][4]
@WilliamNarmontas's answer might be an alternative to the syntax shown above.
Alternatively,
//div[@class='myCLass' and position()=4]
The accepted answer works correctly only if all of the div elements have the same parent. Otherwise use:
(//div[@class='myCLass'])[4]
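
The difference between the two forms shows up as soon as the matching div elements are spread over several parents. The Python sketch below spells out both meanings with ordinary list indexing instead of relying on an engine's predicate handling; the document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: four matching divs, but split across two parents.
doc = ET.fromstring(
    "<body>"
    "<section>"
    "<div class='myCLass'>s1-1</div>"
    "<div class='myCLass'>s1-2</div>"
    "</section>"
    "<section>"
    "<div class='other'>skip</div>"
    "<div class='myCLass'>s2-1</div>"
    "<div class='myCLass'>s2-2</div>"
    "</section>"
    "</body>"
)

# (//div[@class='myCLass'])[4]: collect all matches in document order,
# then take the fourth one overall.
matches = doc.findall(".//div[@class='myCLass']")
fourth_overall = matches[3] if len(matches) >= 4 else None

# //div[@class='myCLass'][4]: the position is counted per parent, so a
# div qualifies only if it is the fourth matching div among its siblings.
fourth_per_parent = []
for parent in doc.iter():
    siblings = [ch for ch in parent
                if ch.tag == "div" and ch.get("class") == "myCLass"]
    if len(siblings) >= 4:
        fourth_per_parent.append(siblings[3])

print(fourth_overall.text, len(fourth_per_parent))
```

Here no single parent holds four matching div elements, so the per-parent form selects nothing while the parenthesized form still finds the fourth match.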

xpath choose first table

I have this XPath:
page.search("//table[@class='campaign']//table")
which returns two tables.
I need to choose only the first table. This line doesn't work:
page.search("//table[@class='campaign']//table[1]")
How do I choose only the first table?
This bugged me, too. I still don't exactly know why your solution does not work. However, this should:
page.search("//table[@class='campaign']/descendant::table[1]")
EDIT: As the docs say,
"The location path //para[1] does not mean the same as the location
path /descendant::para[1]. The latter selects the first descendant
para element; the former selects all descendant para elements that are
the first para children of their parents."
Thanks to your question, I finally understood why this works this way :). So, depending on your structure and needs, this should work.
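
The quoted rule can be reproduced with Python's standard-library ElementTree, whose [1] predicate follows the same per-parent meaning. This is a sketch on invented markup, not the asker's page: two nested tables sit in different cells, so each is the first table child of its own parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the campaign table holds two nested tables,
# each inside its own <td>.
page = ET.fromstring(
    "<body>"
    "<table class='campaign'><tr>"
    "<td><table id='t1'/></td>"
    "<td><table id='t2'/></td>"
    "</tr></table>"
    "</body>"
)
campaign = page.find(".//table[@class='campaign']")

# Like //table[1]: every descendant table that is the first table
# child of its parent, so both nested tables qualify.
first_per_parent = [t.get("id") for t in campaign.findall(".//table[1]")]

# Like descendant::table[1]: only the first descendant in document order.
first_descendant = campaign.find(".//table").get("id")

print(first_per_parent, first_descendant)
```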
Instead of using an XPath expression to select the first matching element, you can either find all of them and then pare it down:
first_table = page.search("//table[@class='campaign']//table").first
...or better yet, select only the first by using at:
first_table = page.at("//table[@class='campaign']//table")
Note also that your expression can be found more simply by using the CSS selector syntax:
first_table = page.at("table.campaign table")
