Extract last word using Xpath 1.0 - xpath

I need to select only the last word using xpath 1.0. I have something like this:
<Example>
<Ctry> Portugal PT </Ctry>
</Example>
I want to select only the word PT, but its position is not fixed, e.g. <Ctry> Portugal - Lisbon - PT </Ctry>. The word I want to extract is always the last one.
I've already tried:
//*[name()='Example'][substring(., string-length(.) - string-length('PT')+1) = 'PT']/text()
but it always extracts the whole string.
Can anyone help me please?

You're selecting a node using the substring as a predicate to filter out other nodes. If you want the substring to be your output, it shouldn't go inside brackets.
substring(//*[name()='Example'], string-length(//*[name()='Example']) - string-length('PT')+1)
Note that /text() can be omitted when working with string functions.
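XPath 1.0 has no tokenize function, and the expression above relies on the length of 'PT' being known in advance. If the XPath is evaluated from a host language, one option is to select the element there and split the string on whitespace. A minimal sketch, assuming Python's standard xml.etree.ElementTree (not mentioned in the question); the helper name last_word is hypothetical:

```python
import xml.etree.ElementTree as ET

def last_word(xml_text, path=".//Ctry"):
    """Return the last whitespace-separated word of the first element matched by path."""
    root = ET.fromstring(xml_text)
    value = root.findtext(path, default="")
    words = value.split()  # split() also drops leading and trailing whitespace
    return words[-1] if words else ""

print(last_word("<Example><Ctry> Portugal PT </Ctry></Example>"))
print(last_word("<Example><Ctry> Portugal - Lisbon - PT </Ctry></Example>"))
```

Both calls print PT, regardless of how many words come before it.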

Related

xpath query omit results with parent tag

I'm fairly new to XPath, so I'm seeking some help with a pattern to match the following. My current attempt isn't matching what I would expect.
//text()[1][contains(.,'wordToMatch') and not(self::a)]
As I'm sure you can see from the pattern above, I'm a noob.
Sample payload 1:
<p>Sample 1 wordToMatch some
random text
to not be matched followed by wordToMatch, this should work.</p>
Expected Result 1:
wordToMatch (not the one inside of the 'a' tags but the following one)
Sample payload 2:
<p>Sample 2 wordToMatch some
random text to not be matched followed by <b>wordToMatch</b> this
should work.</p>
Expected Result 2:
wordToMatch (the one inside of the 'b' tags)
Sample payload 3:
<p>Sample 3 wordToMatch some
random text to not be matched followed by wordToMatch followed by
further occurrences of wordToMatch which should not be matched.</p>
Expected Result 3:
wordToMatch (The second occurrence of the term)
The expected result for all 3 payloads is the first occurrence of the term wordToMatch which is NOT wrapped inside of an 'a' tag.
The end language that will implement this pattern is Java.
Please help.
It's still not clear from the question what exactly you're after; adding the exact expected output for each sample would clear things up. Based on the current information, consider the following XPath, which matches any element whose inner text is exactly 'wordToMatch' and which is not itself an <a> element:
//*[.='wordToMatch'][not(self::a)]
This returns the b element in the 2nd case and nothing in the other cases. If you want to relax the matching and return the text node (instead of the parent element), this will do:
//*[not(self::a)]/text()[contains(.,'wordToMatch')]
UPDATE:
In XPath 2.0 or above you can use a for expression:
for $t in //*[not(self::a)]/text()[contains(.,'wordToMatch')]
return 'wordToMatch'
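Since the end language only has to find the first matching text node whose parent is not an a element, the same filter can be sketched in host code. The following is an illustration in Python's standard xml.etree.ElementTree, not the Java the question asks for; the helper names and the sample payload (with the a and b tags written out) are assumptions:

```python
import xml.etree.ElementTree as ET

def text_nodes(element):
    """Yield (parent_tag, text) pairs for every text node, in document order."""
    if element.text:
        yield element.tag, element.text
    for child in element:
        yield from text_nodes(child)
        if child.tail:
            # The tail follows the child but has `element` as its parent.
            yield element.tag, child.tail

def first_match_outside_a(xml_text, word):
    """Return the first text node containing word whose parent is not an <a> element."""
    root = ET.fromstring(xml_text)
    for parent_tag, text in text_nodes(root):
        if parent_tag != "a" and word in text:
            return text
    return None

sample = "<p>Sample <a>wordToMatch</a> some text, then <b>wordToMatch</b> and wordToMatch again.</p>"
print(first_match_outside_a(sample, "wordToMatch"))
```

Like the XPath, this returns the whole text node that contains the term, so here it prints the content of the b element.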

Xpath - identify text that contains string 'AB' in 5th and 6th characters of a word

I am trying to write a matching XPath rule, but I can't seem to pinpoint words with the exact letters in the 5th and 6th position.
Example: 'ab' in 'qwerabqwert'
/location1[Variable='variable1'][item1[contains(.,'AB')] or item1[contains(.,'ab')]
Please help.
You can use the substring() function as follows (comparing against a sequence of values requires XPath 2.0):
/location1[Variable='variable1'][item1[substring(., 5, 2) = ('AB', 'ab')]]
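substring(., 5, 2) counts characters from 1, which is easy to get wrong when the same check is written in a 0-based host language. A small sketch in Python, assuming a made-up document that reuses the element names from the question; the helper name has_ab_at_5_6 is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped after the element names in the question.
doc = ET.fromstring(
    "<location1>"
    "<Variable>variable1</Variable>"
    "<item1>qwerabqwert</item1>"
    "<item1>abqwerqwert</item1>"
    "</location1>"
)

def has_ab_at_5_6(value):
    # XPath substring(., 5, 2) is 1-based; the Python slice [4:6] is 0-based.
    return value[4:6] in ("AB", "ab")

matches = [
    item.text
    for item in doc.findall("item1")
    if doc.findtext("Variable") == "variable1" and has_ab_at_5_6(item.text or "")
]
print(matches)
```

Only the first item1 is kept, because the second one has 'ab' in positions 1 and 2 rather than 5 and 6.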

Xpath with htmlagilitypack

I am trying to select the "string b" text node using XPath with the HtmlAgilityPack.
<div>
string a<br/>
string b<br/>
string c<br/>
</div>
I am not sure how to select the text.
This won't work: //div/text(1)
Does anybody have any suggestions?
There are two problems with your expression:
XPath starts counting at 1, so you want the second text node
text() is a node filter which does not accept arguments. If you want to limit to the second text node, use the predicate [position() = 2] or the short version [2].
Use this expression:
//div/text()[2]
Selecting text nodes can involve some hassles: whether leading and trailing whitespace is trimmed, and whether whitespace-only text nodes are omitted, is implementation-dependent.
Try:
//div/br[1]/following-sibling::text()[1]
This selects the text node directly following the first br.
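To see why "string b" is the second text node, it helps to look at how the div's content is split around the br elements. The sketch below uses Python's standard xml.etree.ElementTree instead of HtmlAgilityPack (an assumption made only for illustration; an HTML parser may treat whitespace differently):

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("<div>\nstring a<br/>\nstring b<br/>\nstring c<br/>\n</div>")

# ElementTree stores the text before the first child in .text and the text
# after each child in that child's .tail.
first_text = div.text.strip()                      # first text node of the div
after_first_br = div.findall("br")[0].tail.strip() # text right after the first br

print(first_text)
print(after_first_br)
```

The text after the first br is "string b", which corresponds to both //div/text()[2] and //div/br[1]/following-sibling::text()[1]. The strip() calls are needed here because each text node still carries its line break.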

What's the XPath equivalent of the SQL IN query?

I would like to know what the XPath equivalent of the SQL IN query is. Basically, in SQL I can do this:
select * from tbl1 where Id in (1,2,3,4)
so I want something similar in XPath/XSL, i.e.
//*[@id= IN('51417','1121','111')]
Please advise.
(In XPath 2,) the = operator always works like in.
I.e. you can use
//*[@id = ('51417','1121','111')]
A solution is to write out the options as separate conditions:
//*[(@id = '51417') or (@id = '1121') or (@id = '111')]
Another, slightly less verbose solution that looks a bit like a hack, though, would be to use the contains function:
//*[contains('-51417-1121-111-', concat('-', @id, '-'))]
Literally, this means you're checking whether the value of the id attribute (preceded and followed by a delimiter character) is a substring of -51417-1121-111-. Note that I am using a hyphen (-) as a delimiter of the allowable values; you can replace that with any character that will not appear in the id attribute.
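The delimiters matter because a plain substring test would also accept partial ids. A short sketch in Python (standard library only) that mirrors the exact comparison, the delimited contains check and the undelimited variant; the sample document is made up, only the three id values come from the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the allowed id values come from the question.
root = ET.fromstring(
    '<items><item id="51417"/><item id="11"/><item id="1121"/><item id="999"/></items>'
)

allowed = ("51417", "1121", "111")
haystack = "-" + "-".join(allowed) + "-"  # "-51417-1121-111-"

# Equivalent of the chained "or" conditions.
exact = [e.get("id") for e in root.iter("item") if e.get("id") in allowed]
# Equivalent of contains('-51417-1121-111-', concat('-', id, '-')).
delimited = [e.get("id") for e in root.iter("item") if "-" + e.get("id") + "-" in haystack]
# Without delimiters, "11" is wrongly accepted because it is a substring of "1121".
undelimited = [e.get("id") for e in root.iter("item") if e.get("id") in "".join(allowed)]

print(exact, delimited, undelimited)
```

The first two lists agree, while the undelimited check additionally lets the id 11 through.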

XPath 2.0: reference earlier context in another part of the XPath expression

In an XPath expression I would like to focus on certain elements and analyse them:
...
<field>aaa</field>
...
<field>bbb</field>
...
<field>aaa (1)</field>
...
<field>aaa (2)</field>
...
<field>ccc</field>
...
<field>ddd (7)</field>
I want to find the elements whose text content (apart from a possible enumeration) is unique. In the above example that would be bbb, ccc and ddd.
The following XPath gives me the unique values:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))
Now I would like to extend that and perform another XPath on all the distinct values, namely to count how many field elements start with any of them and retrieve the ones whose count is bigger than 1.
These could be a field whose content is equal to that particular value, or one that starts with that value and is followed by " (". The problem is that in the second part of that XPath I would have to refer to the context of that part itself and to the former context at the same time.
In the following XPath I will, instead of using "." as the context, use c_outer and c_inner:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))[count(//field[(c_inner = c_outer) or starts-with(c_inner, concat(c_outer, ' ('))]) > 1]
I can't use "." for both for obvious reasons. But how could I reference a particular, or the current distinct value from the outer expression within the inner expression?
Would that even be possible?
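XQuery can do it by binding each distinct value to a variable, e.g.
for $s
in distinct-values(
//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., ' ('))
where count(//field[(. = $s) or starts-with(., concat($s, ' ('))]) > 1
return $s
Note that substring-before(., ' (') is used here so that $s does not end in a space; with '(' the value would be 'aaa ' (with a trailing space) and neither comparison in the where clause would match.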
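The key idea is that the outer value gets a name, so the inner predicate can still use "." for its own context. The same two-level logic, sketched in Python on the sample values from the question (the variable names are hypothetical, and the stem is cut before " (" so that it carries no trailing space):

```python
import re

fields = ["aaa", "bbb", "aaa (1)", "aaa (2)", "ccc", "ddd (7)"]

# Outer step: distinct stems of the fields that end in " (digit)".
stems = []
for f in fields:
    normalized = " ".join(f.split())  # rough equivalent of normalize-space()
    if re.search(r" \([0-9]\)$", normalized):
        stem = f.partition(" (")[0]   # like substring-before(., ' (')
        if stem not in stems:
            stems.append(stem)

# Inner step: `stem` plays the role of $s, `f` the role of "." in the predicate.
repeated = [
    stem
    for stem in stems
    if sum(1 for f in fields if f == stem or f.startswith(stem + " (")) > 1
]
print(repeated)
```

For the sample data only aaa is returned: it occurs as aaa, aaa (1) and aaa (2), whereas ddd occurs once.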
