XPath with HtmlAgilityPack - xpath

I am trying to select the "string b" text node using XPath with the HtmlAgilityPack.
<div>
string a<br/>
string b<br/>
string c<br/>
</div>
I am not sure how to select the text.
This doesn't work: //div/text(1)
Does anybody have any suggestions?

There are two problems with your expression:
XPath starts counting at 1, so "string b" is the second text node, not the first.
text() is a node filter which does not accept arguments. If you want to limit to the second text node, use the predicate [position() = 2] or the short version [2].
Use this expression:
//div/text()[2]
Selecting text nodes can involve some hassles: whether leading and trailing whitespace is trimmed, and whether whitespace-only text nodes are omitted, is implementation-dependent.

Try:
//div/br[1]/following-sibling::text()[1]
This selects the text node directly following the first br.

Related

Extract last word using Xpath 1.0

I need to select only the last word using xpath 1.0. I have something like this:
<Example>
<Ctry> Portugal PT </Ctry>
</Example>
I want to select only the word PT, but its position is not fixed, e.g. <Ctry> Portugal - Lisbon - PT </Ctry>; the word I want to extract is always the last one.
I've already tried:
//*[name()='Example'][substring(., string-length(.) - string-length('PT')+1) = 'PT']/text() but it always extracts the whole string.
Can anyone help me please?
You're selecting a node using the substring as a predicate to filter out other nodes. If you want the substring to be your output, it shouldn't go inside brackets.
substring(//*[name()='Example'], string-length(//*[name()='Example']) - string-length('PT')+1)
Note that /text() can be omitted when working with string functions.

XPath - "get first tr where its first td has text equal to 'abcd'"

I currently have the following xpath: //tr[td//text()='AD'][1]
From my understanding, this means: "find the first tr which has some td child which has any descendant whose text is equal to 'AD'"
Is this correct? If so, I would like to change the xpath to the following definition:
"find the first tr whose second td child has text equal to 'AD'"
"find the first tr whose second td child has text equal to 'AD'"
might be implemented as
//tr[td[position()=2 and text()="AD"]]
The meaning of //tr[td//text()='AD'][1] is not quite as you say. It expands to /descendant-or-self::node()/child::tr[child::td//text()='AD'][1], which means "for each descendant node, return the first child tr element that has a td child with a descendant text node equal to 'AD'".
You should instead write (//tr[td//text()='AD'])[1].
You can do
//td[contains(text(), 'AD')]/parent::node() as well

Xpath - identify text that contains string 'AB' in 5th and 6th characters of a word

I am trying to write a matching XPath rule, but I can't seem to pinpoint words with the exact letters in the 5th and 6th positions.
For example, 'ab' in 'qwerabqwert':
/location1[Variable='variable1'][item1[contains(.,'AB')] or item1[contains(.,'ab')]]
Please help.
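You can use the substring() function as follows (comparing against the sequence ('AB', 'ab') requires XPath 2.0):
/location1[Variable='variable1'][item1[substring(., 5, 2) = ('AB', 'ab')]]
In XPath 1.0, compare against each value separately:
/location1[Variable='variable1'][item1[substring(., 5, 2) = 'AB' or substring(., 5, 2) = 'ab']]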

What XPath can I use to get all text nodes after and including the first paragraph node?

I'm new to Nokogiri, and Ruby in general.
I want to get the text of all the nodes in the document, starting from and inclusive of the first paragraph node.
I tried the following with XPath but I'm getting nowhere:
puts page.search("//p[0]/text()[next-sibling::node()]")
This doesn't work. What do I have to change?
You have to find the <p/> node and return all text() nodes, both inside and following. Depending on which XPath capabilities Nokogiri has, use one of these queries:
//p[1]/(descendant::text() | following::text())
If that doesn't work, use this instead. It has to find the first paragraph twice, so it may be slightly slower, though probably not noticeably:
(//p[1]/descendant::text() | //p[1]/following::text())
A probably unsupported XPath 2.0 alternative would be:
//text()[//p[1] << .]
which means "all text nodes preceded by the first <p/> node in document".
This works with Nokogiri (which stands on top of libxml2 and supports XPath 1.0 expressions):
//p[1]//text() | //p[1]/following::text()
Proof:
require 'nokogiri'
html = '<body><h1>A</h1><p>B <b>C</b></p><p>D <b>E</b></p></body>'
doc = Nokogiri.HTML(html)
p doc.xpath('//p[1]//text() | //p[1]/following::text()').map(&:text)
#=> ["B ", "C", "D ", "E"]
Note that just selecting the text nodes themselves returns a NodeSet of Nokogiri::XML::Text objects, and so if you want only the text contents of them you must map them via the .text (or .content) methods.

XPath - find first occurrence of string

I'm trying to select an anchor element by first containing the text "To Be Coded", then extracting a number from a string using substring, then using the greater than comparison operator (>0). This is what I have thus far:
/a[number(substring(text(),???,string-length()-1))>0]
An example of the HTML is:
<a class="" href="javascript:submitRequest('getRec','30', '63', 'Z')">
To Be Coded (23)
</a>
My issue right now is I don't know how to find the first occurrence of the open parenthesis. I'm also not sure how to combine what I have with the contains(text(),"To Be Coded") function.
So my criteria for the selection is:
Must be an anchor element
Must include the text "To Be Coded"
Must contain a number greater than 0 in the parentheses
Edit: I suppose I could just "hard code" the starting position for the substring, but I'm not sure what that would be - will XPath count the white space before the text in the element? How would it handle/count the characters?
Try this:
a[contains(., 'To Be Coded') and number(substring-before(substring-after(., '('), ')')) > 0]
