Escaping Underscore with Xpath in Nokogiri - ruby

I am baffled. Given this HTML:
<div class="v-product">
<div class="v-product__inner">
<a href="https://www.xxxxx.com/">
</div>
<div class="v-product__details"> Description </div>
</div>
I want to get a node using XPath and Nokogiri.
I tried
parse_page.xpath("//v-product__details")
but it doesn't work: the result is empty.
How do I escape a double underscore in XPath?

The problem isn't the underscore, it's your XPath.
//v-product__details
is looking for a tag like <v-product__details>, not something with v-product__details in its class attribute.
I'd use CSS for this instead:
parse_page.css('.v-product__details')
But if you must use XPath:
parse_page.xpath('//div[contains(@class, "v-product__inner")]')
parse_page.xpath('//*[contains(@class, "v-product__inner")]')
parse_page.xpath('//div[@class="v-product__inner"]')
parse_page.xpath('//*[@class="v-product__inner"]')
...
And if parse_page came from Nokogiri::HTML.fragment(...) then you'll want to add a leading . to your XPath expressions:
parse_page.xpath('.//div[contains(@class, "v-product__inner")]')
...
But really, I'd go with CSS if possible.
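To see the difference between a name test and an attribute test in isolation, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Nokogiri (an assumption made so the example runs without extra gems); ElementTree supports only a subset of XPath, so contains() is not shown, and the sample markup has been tidied so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# The question's markup, tidied so it is well-formed XML (assumption).
DOC = """
<div class="v-product">
  <div class="v-product__inner">
    <a href="https://www.xxxxx.com/"></a>
  </div>
  <div class="v-product__details"> Description </div>
</div>
"""

root = ET.fromstring(DOC)

# A bare name is an element-name test: there is no <v-product__details> element.
by_tag = root.findall(".//v-product__details")

# An attribute predicate compares the class value; underscores need no escaping.
by_class = root.findall(".//div[@class='v-product__details']")

print(len(by_tag))
print([div.text.strip() for div in by_class])
```

The leading . is there because ElementTree only accepts relative paths on an element, which is similar in spirit to the fragment advice above.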

Related

Obtain an xpath element containing another element with a specific class

Hello I have this HTML:
<div class="_3Vhpd"><span>Your commerce Data</span>
<a class="n3G0C" href='http://www.webadress.......'><span>Some Text</span</a>
</div>
I tried to obtain the tag as follows:
parser.xpath('//div[contains(@class,"_3Vhpd")]//following-sibling::*[a[@class="n3G0C"]]/@href')
but I got an empty list, '[]'. Maybe that is because the <a> does not come directly after the div but after a span...
First, your sample HTML doesn't have a class="n3G0C", but assuming you fix it, this XPath expression should work:
//div[contains(@class,"_3Vhpd")]//following-sibling::a/@href
Output:
http://www.webadress.......
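Since the <a> is a child of the div (and a sibling of the span), a plain child step such as //div[contains(@class,"_3Vhpd")]/a[@class="n3G0C"]/@href should also do. Below is a rough sketch of that idea with Python's standard-library xml.etree.ElementTree, which is an assumption (the question's parser is not named) and which lacks contains() and attribute steps, so the class is matched exactly and the href is read with get(). The <body> wrapper and the example.com URL are placeholders, not part of the original markup.

```python
import xml.etree.ElementTree as ET

# Tidied sample; the wrapper element and the URL are hypothetical placeholders.
DOC = """
<body>
  <div class="_3Vhpd"><span>Your commerce Data</span>
    <a class="n3G0C" href="http://www.example.com/"><span>Some Text</span></a>
  </div>
</body>
"""

root = ET.fromstring(DOC)

# Child step from the div to the anchor, then read the attribute in Python.
links = root.findall(".//div[@class='_3Vhpd']/a[@class='n3G0C']")
hrefs = [a.get("href") for a in links]

print(hrefs)
```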

XPath: how to find a node that does not contain text?

I have HTML like:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div which does not contain text, but the XPath
//div[@class='grid' and text()='']
doesn't seem to work. And if I don't know the text that the other divs have, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly-contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer to this is
//div[@class='grid' and not(text()[normalize-space(.)])]
You need a not() statement + normalize-space():
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']
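One thing to keep in mind: in XPath 1.0, normalize-space(text()) looks only at the first text node, whereas text()[normalize-space(.)] tests every direct text node. The sketch below emulates the "every direct text node" check in plain Python, because the standard-library xml.etree.ElementTree (used here as an assumption, to stay dependency-free) has no normalize-space(). The helper name has_direct_text and the <body> wrapper are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<body>
  <div class="grid">
    "abc"
    <span class="searchMatch">def</span>
  </div>
  <div class="grid">
    <span class="searchMatch">def</span>
  </div>
</body>
"""

# The characters XPath's normalize-space() treats as whitespace.
XPATH_WHITESPACE = " \t\r\n"


def has_direct_text(elem):
    """Return True if any text node directly under elem is not just whitespace."""
    # ElementTree stores direct text as elem.text plus the tail of each child.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip(XPATH_WHITESPACE) for chunk in chunks)


root = ET.fromstring(DOC)
grids = root.findall(".//div[@class='grid']")
without_text = [div for div in grids if not has_direct_text(div)]

print(len(grids), len(without_text))
```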

Can I write a short path in XPath?

<html>
<body>
Example
SO
<div>
<div class="kekeke">JSAFK</div>
</div>
</body>
</html>
To get the JSAFK element in this doc using XPath, can I just write //*div[@class=kekeke] instead of the full XPath?
// is short for /descendant-or-self::node()/. So...
This XPath,
//div[@class='kekeke']
will select all such div elements in the document:
<div class="kekeke">JSAFK</div>
This XPath,
//div[@class='kekeke']/text()
will select all text nodes under all such div elements in the document:
JSAFK
There is something wrong with
//*div[@class=kekeke]
You can't use * and div together, and the attribute value must be quoted. If you want a shorter path, you can write it like this:
//div[@class="kekeke"]/text()
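As a quick check that the short form and the full path land on the same element, here is a small sketch using Python's standard-library xml.etree.ElementTree (an assumption; the question does not name a library). ElementTree needs relative paths, hence the leading dot, and the text is read from the element rather than with a text() step.

```python
import xml.etree.ElementTree as ET

DOC = """
<html>
  <body>
    Example
    SO
    <div>
      <div class="kekeke">JSAFK</div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Short form: search all descendants and filter on the class attribute.
short = root.findall(".//div[@class='kekeke']")

# Full path from the root <html> element, spelled out step by step.
full = root.findall("./body/div/div")

texts = [div.text for div in short]
print(texts)
```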

Xpath first occurrence of a tree

I want to find the first occurrence of a tree. Example:
<div id='post>
<p>text1</p>
<p>text2</p>
<img src="a.jpg">
<img src="b.jpg">
<p>text3</p>
<p>text4</p>
<img src="c.jpg">
<p>text5</p>
</div>
I want to find the first occurrence of "p/img/@src".
When I do the XPath search .//div/p/img[1]/@src
it gives 2 hits, a.jpg and c.jpg.
What is the XPath for only the first occurrence (a.jpg)?
I would say .//div/(p/img)[1]/@src, but of course that does not work.
The best option would be:
(//img[@src])[1]/@src
or
(//p//img[@src])[1]/@src
which ensures the img is itself within a p element.
As Martin says, img is not a child of p. Moreover, your example is missing the closing single quote of the id attribute on the div and the closing of the img tags.
Here is your XML, corrected:
<div id='post'>
<p>text1</p>
<p>text2</p>
<img src="a.jpg"/>
<img src="b.jpg"/>
<p>text3</p>
<p>text4</p>
<img src="c.jpg"/>
<p>text5</p>
</div>
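Now, to select the first image, you can simply use //img[1]/@src or //img[@src="a.jpg"]. Note that //img[1] matches every img that is the first img child of its parent; it returns only a.jpg here because all the images share one parent.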
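The difference between img[1] and (…)[1] is easier to see when the images really are nested in separate parents, as the asker assumed. The sketch below uses a hypothetical variation of the sample (images placed inside the p elements) and Python's standard-library xml.etree.ElementTree, which is an assumption on my part. ElementTree has no parenthesised (…)[1], so taking index 0 of the result list plays that role.

```python
import xml.etree.ElementTree as ET

# Hypothetical variation of the sample: the images are children of <p>.
DOC = """
<div id="post">
  <p>text1 <img src="a.jpg"/> <img src="b.jpg"/></p>
  <p>text2 <img src="c.jpg"/></p>
</div>
"""

root = ET.fromstring(DOC)

# img[1] is evaluated per parent: the first <img> inside each <p>.
per_parent = [img.get("src") for img in root.findall(".//p/img[1]")]

# Equivalent of (//p/img)[1]: collect all matches, then take the first one.
first_overall = root.findall(".//p/img")[0].get("src")

print(per_parent)
print(first_overall)
```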

C# HtmlAgilityPack XPath except content in HTML tags

<div id="Dossuuu11Plus" style="display: block; ">
Text need
<br/>
Not need
<a class="bot_link" href="http://abc.com" target="_self">http://abc.com</a>
<br/>
</div>
This is the HTML code. I use: //td[@class='textdetaildrgI
but it gets all the content; I just need "Text need". Please help me. Thanks
You could use
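//div[@id='Dossuuu11Plus']/text()[1][normalize-space()]
Explanation:
It selects the first text node of the div, which in this case is "Text need". The [normalize-space()] predicate keeps that node only if it contains non-whitespace text; it does not trim the result, so strip the surrounding whitespace from the returned value in your code.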
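For illustration only, here is the same idea sketched in Python with the standard-library xml.etree.ElementTree rather than C# and HtmlAgilityPack (an assumption, chosen so the example is self-contained). In ElementTree the first text node of an element is its .text, and the whitespace clean-up that normalize-space() would do as a function is emulated with split() and join(). The <body> wrapper is added so the div can be searched for.

```python
import xml.etree.ElementTree as ET

DOC = """
<body>
  <div id="Dossuuu11Plus" style="display: block; ">
    Text need
    <br/>
    Not need
    <a class="bot_link" href="http://abc.com" target="_self">http://abc.com</a>
    <br/>
  </div>
</body>
"""

root = ET.fromstring(DOC)
div = root.find(".//div[@id='Dossuuu11Plus']")

# .text is the text before the first child element, i.e. the first text node.
first_text_node = div.text or ""

# Collapse runs of whitespace and trim the ends, like normalize-space().
wanted = " ".join(first_text_node.split())

print(repr(wanted))
```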
