I am trying to select the "string b" text node using XPath with the HtmlAgilityPack.
<div>
string a<br/>
string b<br/>
string c<br/>
</div>
I am not sure how to select the text.
This won't work: //div/text(1)
Does anybody have any suggestions?
There are two problems with your expression:
XPath starts counting at 1, so "string b" is the second text node, not the first.
text() is a node filter which does not accept arguments. If you want to limit to the second text node, use the predicate [position() = 2] or the short version [2].
Use this expression:
//div/text()[2]
Selecting text nodes comes with some hassles: whether leading and trailing whitespace is trimmed, and whether whitespace-only text nodes are omitted, is implementation-dependent.
Try:
//div/br[1]/following-sibling::text()[1]
This selects the text node that directly follows the first br.
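To see that both expressions above land on the same node, here is a minimal sketch in Python rather than HtmlAgilityPack. It assumes the markup from the question and uses the standard-library ElementTree, whose XPath subset has no text() test, so the text nodes are collected by hand from .text and .tail. The helper name child_text_nodes is made up for illustration.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("<div>\nstring a<br/>\nstring b<br/>\nstring c<br/>\n</div>")

def child_text_nodes(elem):
    """Return the text-node children of elem in document order (hypothetical helper)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]  # drop missing or empty entries

nodes = child_text_nodes(div)

# //div/text()[2]: XPath counts from 1, so this is index 1 in 0-based Python.
second_text = nodes[1]

# //div/br[1]/following-sibling::text()[1]: the text right after the first br,
# which ElementTree stores as that br's tail.
after_first_br = div.find("br").tail

print(repr(second_text))          # this parser keeps the leading newline
print(repr(second_text.strip()))  # trimmed value
print(len(nodes))                 # includes the whitespace-only node before </div>
```

Whether your own parser keeps the leading newline and the trailing whitespace-only node is exactly the implementation-dependent part, so it is worth printing the nodes once before relying on an index.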
Related
I need to select only the last word using xpath 1.0. I have something like this:
<Example>
<Ctry> Portugal PT </Ctry>
</Example>
I want to select only the word PT, but the content is not fixed, e.g. <Ctry> Portugal - Lisbon - PT </Ctry>; the word I want to extract is always the last one.
I've already tried:
//*[name()='Example'][substring(., string-length(.) - string-length('PT')+1) = 'PT']/text() but it always extracts the whole string.
Can anyone help me please?
You're selecting a node using the substring as a predicate to filter out other nodes. If you want the substring to be your output, it shouldn't go inside brackets.
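substring(normalize-space(//*[name()='Example']), string-length(normalize-space(//*[name()='Example'])) - string-length('PT') + 1)
Note that /text() can be omitted when working with string functions. normalize-space() is needed here because the sample value has leading and trailing whitespace, which would otherwise be counted as the last characters.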
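A quick way to check the index arithmetic, and to see how the surrounding whitespace in the sample affects it, is to emulate the two XPath functions in plain Python. This is only a sketch: xpath_substring and normalize_space are hypothetical helpers that mimic the XPath 1.0 functions for simple integer arguments and ASCII whitespace, not a real XPath engine.

```python
def xpath_substring(s, start, length=None):
    """Simplified emulation of XPath substring() for integer arguments (1-based start)."""
    begin = max(start, 1) - 1
    if length is None:
        return s[begin:]
    end = max(start + length, 1) - 1
    return s[begin:end]

def normalize_space(s):
    """Emulates normalize-space(): trim the ends and collapse whitespace runs.
    Note: str.split() treats more characters as whitespace than XPath does."""
    return " ".join(s.split())

# Assumed string value of <Example>, including the whitespace around the text.
raw = "\n Portugal - Lisbon - PT \n"
suffix_len = len("PT")

# Without normalizing, the last two characters are whitespace.
naive = xpath_substring(raw, len(raw) - suffix_len + 1)

# After normalizing, the same arithmetic yields the last word.
clean = normalize_space(raw)
last = xpath_substring(clean, len(clean) - suffix_len + 1)

print(repr(naive))
print(repr(last))
```

The arithmetic only works because the length of the wanted word is known in advance; XPath 1.0 has no built-in "last token" function.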
I currently have the following xpath: //tr[td//text()='AD'][1]
From my understanding, this means: "find the first tr which has some td child which has any descendant whose text is equal to 'AD'"
Is this correct? If so, I would like to change the xpath to the following definition:
"find the first tr whose second td child has text equal to 'AD'"
"find the first tr whose second td child has text equal to 'AD'"
might be implemented as
//tr[td[position()=2 and text()="AD"]]
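For a quick sanity check of what "second td child equals 'AD'" selects, here is a rough Python sketch using the standard-library ElementTree. The table markup and the helper second_td_is_ad are invented for the example, and comparing cell.text only looks at the first text node of the cell, which is a simplification of text()="AD".

```python
import xml.etree.ElementTree as ET

# Hypothetical table, made up for this example.
table = ET.fromstring(
    "<table>"
    "<tr id='r1'><td>AD</td><td>x</td></tr>"
    "<tr id='r2'><td>y</td><td>AD</td></tr>"
    "<tr id='r3'><td>z</td><td>AD</td></tr>"
    "</table>"
)

def second_td_is_ad(tr):
    # "td[2]" is ElementTree's positional predicate: the second td child of tr.
    cell = tr.find("td[2]")
    return cell is not None and cell.text == "AD"

# Every row whose second cell is 'AD', in document order.
rows = [tr.get("id") for tr in table.iter("tr") if second_td_is_ad(tr)]

# Picking the first such row is a separate step.
first = rows[0] if rows else None

print(rows, first)
```

Row r1 is skipped even though it contains 'AD', because the match is in its first cell. Note that the predicate on its own matches every qualifying row, so "the first tr" still has to be picked from the result.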
The meaning of //tr[td//text()='AD'][1] is not quite as you say. It expands to /descendant-or-self::node()/child::tr[child::td//text()='AD'][1], which means "for each node in the document, return its first child tr element that has a td child with a descendant text node equal to 'AD'". The result can therefore contain several tr elements, one per parent.
You should instead write (//tr[td//text()='AD'])[1].
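The difference between the two forms is easiest to see on a document with more than one table. The sketch below emulates both readings by hand with Python's standard-library ElementTree (its own XPath subset handles positional predicates differently, so it is not used for the [1] step). The markup and the helper names text_nodes and has_ad_cell are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two tables.
doc = ET.fromstring(
    "<root>"
    "<table><tr id='r1'><td>x</td><td>AD</td></tr>"
    "<tr id='r2'><td>AD</td><td>y</td></tr></table>"
    "<table><tr id='r3'><td><b>AD</b></td><td>z</td></tr></table>"
    "</root>"
)

def text_nodes(elem):
    """Yield the text nodes inside elem, at any depth, in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield child.tail

def has_ad_cell(tr):
    # Emulates the predicate [td//text()='AD'].
    return any(t == "AD" for td in tr.findall("td") for t in text_nodes(td))

# //tr[td//text()='AD'][1]: the first matching tr child of EACH parent.
per_parent = []
for parent in doc.iter():
    matching = [tr for tr in parent.findall("tr") if has_ad_cell(tr)]
    if matching:
        per_parent.append(matching[0].get("id"))

# (//tr[td//text()='AD'])[1]: the first match in the whole document.
all_matches = [tr.get("id") for tr in doc.iter("tr") if has_ad_cell(tr)]
first_overall = all_matches[0]

print(per_parent, first_overall)
```

With a single table both forms return the same row, which is why the unparenthesized version often appears to work.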
You can also do
//td[contains(text(), 'AD')]/parent::node()
I am trying to write a matching XPath rule, but I can't seem to pinpoint words that have specific letters in the 5th and 6th positions.
Example: 'ab' in 'qwerabqwert'
/location1[Variable='variable1'][item1[contains(.,'AB')] or item1[contains(.,'ab')]]
Please help.
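You can use the substring() function as shown below. Comparing against the sequence ('AB', 'ab') requires XPath 2.0; in XPath 1.0, write substring(., 5, 2) = 'AB' or substring(., 5, 2) = 'ab' instead: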
/location1[Variable='variable1'][item1[substring(., 5, 2) = ('AB', 'ab')]]
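If the 1-based positions are confusing, the same check can be written in Python for comparison. This is only an illustration of the indexing; the function names are hypothetical and no XPath engine is involved.

```python
def has_ab_at_5_and_6(item_text):
    # XPath substring(., 5, 2) covers 1-based positions 5 and 6,
    # which is the 0-based slice [4:6] in Python.
    return item_text[4:6] in ("AB", "ab")

def contains_ab(item_text):
    # Equivalent of the contains() attempt: matches anywhere in the string.
    return "AB" in item_text or "ab" in item_text

print(has_ab_at_5_and_6("qwerabqwert"))
print(has_ab_at_5_and_6("abqwerqwert"), contains_ab("abqwerqwert"))
```

The second line shows the difference: contains() accepts 'ab' at the start of the string, while the positional check does not.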
I'm new to Nokogiri, and Ruby in general.
I want to get the text of all the nodes in the document, starting from and inclusive of the first paragraph node.
I tried the following with XPath but I'm getting nowhere:
puts page.search("//p[0]/text()[next-sibling::node()]")
This doesn't work. What do I have to change?
You have to find the <p/> node and return all text() nodes, both inside and following it. Depending on what XPath capabilities Nokogiri has, use one of these queries:
//p[1]/(descendant::text() | following::text())
If that doesn't work, use this instead. It has to find the first paragraph twice, so it can be slightly (but probably unnoticeably) slower:
(//p[1]/descendant::text() | //p[1]/following::text())
A probably unsupported XPath 2.0 alternative would be:
//text()[//p[1] << .]
which means "all text nodes preceded by the first <p/> node in document".
This works with Nokogiri (which is built on top of libxml2 and supports XPath 1.0 expressions):
//p[1]//text() | //p[1]/following::text()
Proof:
require 'nokogiri'
html = '<body><h1>A</h1><p>B <b>C</b></p><p>D <b>E</b></p></body>'
doc = Nokogiri.HTML(html)
p doc.xpath('//p[1]//text() | //p[1]/following::text()').map(&:text)
#=> ["B ", "C", "D ", "E"]
Note that just selecting the text nodes themselves returns a NodeSet of Nokogiri::XML::Text objects, and so if you want only the text contents of them you must map them via the .text (or .content) methods.
I'm trying to select an anchor element that contains the text "To Be Coded", then extract the number from its text using substring, and finally compare that number with the greater-than operator (>0). This is what I have thus far:
/a[number(substring(text(),???,string-length()-1))>0]
An example of the HTML is:
<a class="" href="javascript:submitRequest('getRec','30', '63', 'Z')">
To Be Coded (23)
</a>
My issue right now is I don't know how to find the first occurrence of the open parenthesis. I'm also not sure how to combine what I have with the contains(text(),"To Be Coded") function.
So my criteria for the selection is:
Must be an anchor element
Must include the text "To Be Coded"
Must contain a number greater than 0 in the parentheses
Edit: I suppose I could just "hard code" the starting position for the substring, but I'm not sure what that would be - will XPath count the white space before the text in the element? How would it handle/count the characters?
Try this:
a[contains(., 'To Be Coded') and number(substring-before(substring-after(., '('), ')')) > 0]
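To follow what the nested string functions do, here is a rough Python emulation of the predicate. The helpers mirror substring-after(), substring-before() and number() only approximately (Python's float() accepts some spellings that XPath's number() would turn into NaN), and is_match is a hypothetical name used for this sketch.

```python
import math

def substring_after(s, sep):
    """XPath substring-after(): text after the first sep, or '' if sep is absent."""
    return s.partition(sep)[2]

def substring_before(s, sep):
    """XPath substring-before(): text before the first sep, or '' if sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""

def number(s):
    """Rough stand-in for XPath number(): NaN when the string is not numeric."""
    try:
        return float(s.strip())
    except ValueError:
        return math.nan

def is_match(text):
    # Take what sits between the first '(' and the following ')'.
    inside = substring_before(substring_after(text, "("), ")")
    # Any comparison with NaN is false, so non-numeric content is rejected.
    return "To Be Coded" in text and number(inside) > 0

print(is_match("\n    To Be Coded (23)\n"))
print(is_match("To Be Coded (0)"))
```

Because the parentheses are located by searching rather than by a hard-coded position, the leading whitespace inside the element does not matter.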