XPath: how to get specific text using XPath?

The HTML looks like this:
<div id='id'>
<a href>abc</a>
"xyz"
</div>
I want to use XPath to get the xyz (I use it in Capybara), but my XPath doesn't work:
... //div[@id='id'].text
It returns abcxyz.
How can I get only xyz?

The text is its own text node, so the correct selector would be:
.//div[@id='id']/text()
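To make the "own text node" idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. This is an assumption for illustration only: the question uses Capybara, and ElementTree has no text() step. It models the same idea differently, storing the text that follows </a> as the tail of the <a> element, separate from the link text. The <body> wrapper and the empty href value are additions made for the sample.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the question; <body> is added so the div is a descendant.
doc = ET.fromstring('<body><div id="id"><a href="">abc</a>xyz</div></body>')

div = doc.find(".//div[@id='id']")

# Every piece of text inside the div, link text included.
all_text = "".join(div.itertext())

# Only the text node that follows the <a> element.
own_text = div.find("a").tail.strip()
```

Here all_text corresponds to what .text returned in the question, while own_text is the part that /text() is meant to isolate.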

Related

XPath one of multiple elements of an attribute

In this HTML, using Scrapy, I can access the full info-car attribute with the XPath './/@info-car':
<div class="car car-root"
info-car='{brand":"BMW","Price":"&#30000"name":"X5","color":null,"}'>
</div>
What is the XPath to pick only the name from info-car?
You can obtain the name by combining XPath with a regex. See the sample code below:
response.xpath(".//@info-car").re_first(r'"name":"(.*)",')
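The regex step can be tried on its own, without Scrapy. The sketch below uses Python's re module on the attribute value copied from the question. The helper name extract_name is hypothetical, and the pattern uses [^"]* instead of .* so the match stops at the next quote; that is a defensive variation, not the pattern from the answer.

```python
import re

# Attribute value copied from the question. It is not valid JSON,
# so a regex is used rather than a JSON parser.
info_car = '{brand":"BMW","Price":"&#30000"name":"X5","color":null,"}'


def extract_name(raw):
    """Return the value that follows "name": or None if it is absent."""
    match = re.search(r'"name":"([^"]*)"', raw)
    return match.group(1) if match else None


name = extract_name(info_car)
```

If the attribute held well-formed JSON, parsing it and reading the name key would probably be more robust than a regex.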

XPath "and" Confusion

I recently started a new job that uses Cucumber/Gherkin along with Selenium. I was trying to create an XPath for a specific element. The XML looks roughly like this:
<p>
<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label">
Amp
</div>
<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label required-field-label">
Voltage
</div>
</p>
I am looking to get only the div that has required-field-label in its class and the text "Voltage". So far this kind of works:
//div[contains(text(), "Voltage")] | //*[contains(class, "required-field-label")]
However, I'm getting far too many false positives. Whenever I change the pipe to "and" I get nothing. What am I doing wrong?
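The pipe is a union: it returns every node matched by either expression, which explains the false positives. Also, an attribute needs the @ prefix (@class, not class). Try the following expression on your actual code and see if it works:
//div[contains(@class, "required-field-label")][contains(text(), "Voltage")]
You can also match the element using "and" inside a single predicate, like this:
//div[contains(@class, 'required-field-label') and contains(text(), 'Voltage')]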
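The difference between a union and a conjunction can be sketched in plain Python. ElementTree's XPath subset has no contains(), so the two conditions are written as hypothetical helper functions (has_class, has_text). The third div, "Current", is not in the question; it is added to show a node that satisfies only one condition. Note that has_class compares whole class tokens, which is stricter than the substring test that contains() performs.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<p>'
    '<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label">Amp</div>'
    '<div class="slds-text-title_bold slds-m-bottom_x-small ncc-input-label '
    'required-field-label">Voltage</div>'
    '<div class="required-field-label">Current</div>'  # hypothetical extra node
    '</p>'
)


def has_class(el, name):
    """True if name is one of the space-separated class tokens."""
    return name in el.get("class", "").split()


def has_text(el, needle):
    """True if the element's leading text contains needle."""
    return needle in (el.text or "")


divs = list(doc.iter("div"))

# Union, like "|": a node qualifies if either condition holds.
either = [d.text for d in divs
          if has_class(d, "required-field-label") or has_text(d, "Voltage")]

# Conjunction, like "and" or two chained predicates: both must hold.
both = [d.text for d in divs
        if has_class(d, "required-field-label") and has_text(d, "Voltage")]
```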

Escaping Underscore with Xpath in Nokogiri

I am baffled. Given this HTML:
<div class="v-product">
<div class="v-product__inner">
<a href="https://www.xxxxx.com/">
</div>
<div class="v-product__details"> Description </div>
</div>
I want to get a node using XPath and Nokogiri.
I tried
parse_page.xpath("//v-product__details")
but it doesn't work: the result is empty.
How do I escape a double underscore in XPath?
The problem isn't the underscore, it's your XPath.
//v-product__details
is looking for a tag like <v-product__details>, not for an element with v-product__details in its class attribute.
I'd use CSS for this instead:
parse_page.css('.v-product__details')
But if you must use XPath:
parse_page.xpath('//div[contains(@class, "v-product__inner")]')
parse_page.xpath('//*[contains(@class, "v-product__inner")]')
parse_page.xpath('//div[@class="v-product__inner"]')
parse_page.xpath('//*[@class="v-product__inner"]')
...
And if parse_page came from Nokogiri::HTML.fragment(...), then you'll want to add a leading . to your XPath expressions:
parse_page.xpath('.//div[contains(@class, "v-product__inner")]')
...
But really, I'd go with CSS if possible.
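The tag-versus-class distinction is easy to reproduce outside Nokogiri. The following is a small sketch with Python's standard-library xml.etree.ElementTree, chosen here as an assumption for illustration; it supports only a subset of XPath, without contains(), so the class is matched exactly. The <body> wrapper and the closing </a> are additions needed for the sample to parse as XML.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<body><div class="v-product">'
    '<div class="v-product__inner"><a href="https://www.xxxxx.com/"></a></div>'
    '<div class="v-product__details"> Description </div>'
    '</div></body>'
)

# Interpreted as a tag name: there is no <v-product__details> element.
by_tag = doc.findall(".//v-product__details")

# Interpreted as an attribute test: matches the div with that class value.
by_class = doc.findall(".//div[@class='v-product__details']")

details_text = by_class[0].text.strip()
```

The underscores need no escaping in either expression; only the second one asks about the class attribute.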

Obtain an xpath element containing another element with an specific class

Hello I have this HTML:
<div class="_3Vhpd"><span>Your commerce Data</span>
<a class="n3G0C" href='http://www.webadress.......'><span>Some Text</span></a>
</div>
I tried to obtain the href as follows:
parser.xpath('//div[contains(@class,"_3Vhpd")]//following-sibling::*[a[@class="n3G0C"]]/@href ')
but I got an empty list, '[]'. Maybe that is because the <a> does not come directly after the div but after a span...
First, your sample HTML doesn't have a class="n3G0C", but assuming you fix that, this XPath expression should work:
//div[contains(@class,"_3Vhpd")]//following-sibling::a/@href
Output:
http://www.webadress.......
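Because the <a> is a child of the div rather than a sibling of it, a plain child step is also enough to reach it. Here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which is an assumption for illustration; the question's parser object is not shown. ElementTree cannot select attributes with /@href, so the attribute is read with get(). The URL http://www.example.com/ is a hypothetical placeholder, since the real address is elided in the question.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the question; the href value is a hypothetical placeholder.
doc = ET.fromstring(
    '<body><div class="_3Vhpd"><span>Your commerce Data</span>'
    '<a class="n3G0C" href="http://www.example.com/"><span>Some Text</span></a>'
    '</div></body>'
)

# Child step from the div to the <a>, then read the attribute in Python.
link = doc.find(".//div[@class='_3Vhpd']/a[@class='n3G0C']")
href = link.get("href")
```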

Xpath get text of nested item not working but css does

I'm making a crawler with Scrapy and wondering why my XPath doesn't work when my CSS selector does. I want to get the number of commits from this HTML:
<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>
Xpath:
response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()
CSS:
response.css('li.commits a span.text-emphasized').css('::text').extract()
CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?
@class="text-emphasized" compares the whole attribute value, and the span's class is "num text-emphasized", so nothing matches. Use the contains function to check only that text-emphasized is present:
response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()').extract()[0].strip()
Otherwise, include num as well:
response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()').extract()[0].strip()
Here [0] retrieves the first string returned by extract(), and strip() removes the surrounding whitespace, leaving just the number.
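The exact-match behaviour of @class="..." can be checked without Scrapy. The sketch below uses Python's standard-library xml.etree.ElementTree as an assumption for illustration. Its XPath subset lacks contains(), but the [@class='...'] predicate compares the full attribute value in the same way. The <ul> wrapper is an addition for the sample.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<ul><li class="commits">'
    '<a data-pjax="" href="/samthomson/flot/commits/master">'
    '<span class="octicon octicon-history"></span>'
    '<span class="num text-emphasized">\n521\n</span>'
    '\ncommits\n'
    '</a></li></ul>'
)

# Compares the whole attribute value, so one class out of two does not match.
partial = doc.findall(".//li[@class='commits']//span[@class='text-emphasized']")

# The full attribute value matches.
full = doc.findall(".//li[@class='commits']//span[@class='num text-emphasized']")

# Strip the newlines around the number.
commits = full[0].text.strip()
```

The CSS selector span.text-emphasized works because CSS class selectors match individual class tokens rather than the whole attribute string.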
