HTML Code:
<font size="6.2em;" color="red"> $0.00</font>
I need to print the price every time I add more items.
The XPath I've tried is:
//font[starts-with(normalize-space()='$')]
but it doesn't locate the price element.
This is the URL: http://demo.guru99.com/payment-gateway/process_purchasetoy.php
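Note that starts-with() takes two string arguments, so the predicate above is malformed. The intended test would read //font[starts-with(normalize-space(), '$')]. Python's standard-library ElementTree cannot evaluate XPath functions, so the sketch below emulates that predicate by hand. It runs on a hypothetical fragment: the first <font> is the one from the question, and the second is made up so there is something that should not match. A full XPath 1.0 engine (lxml, Scrapy, a browser via Selenium) should accept the corrected expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the first <font> is from the question, the second is invented.
html = """<div>
<font size="6.2em;" color="red"> $0.00</font>
<font size="3">Toy quantity</font>
</div>"""

def string_value(el):
    # Roughly normalize-space(): join all descendant text, collapse and trim whitespace.
    return " ".join("".join(el.itertext()).split())

root = ET.fromstring(html)

# Emulates //font[starts-with(normalize-space(), '$')], which ElementTree cannot run.
prices = [string_value(f) for f in root.iter("font") if string_value(f).startswith("$")]
print(prices)  # ['$0.00']
```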
In this HTML, using Scrapy, I can access the whole info-car attribute with the XPath './/@info-car':
<div class="car car-root"
info-car='{"brand":"BMW","Price":"田","name":"X5","color":null}'>
</div>
What XPath picks out only the name from info-car?
You can get the name by combining XPath with a regular expression. See the sample code below:
response.xpath(".//@info-car").re_first(r'"name":"([^"]*)"')
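A greedy (.*) can run past the end of the name when more quoted fields follow, so a [^"]* class is safer. If the attribute holds valid JSON, decoding it avoids the regex entirely. The standard-library sketch below shows both approaches. It assumes a cleaned-up, valid-JSON version of the sample attribute (the scraped original is garbled), and it uses ElementTree in place of Scrapy's response object.

```python
import json
import re
import xml.etree.ElementTree as ET

# Assumption: a cleaned-up, valid-JSON version of the question's attribute.
html = """<div class="car car-root"
info-car='{"brand":"BMW","Price":"田","name":"X5","color":null}'></div>"""

div = ET.fromstring(html)
raw = div.get("info-car")

# Option 1: regex that stops at the closing quote instead of a greedy (.*).
by_regex = re.search(r'"name":"([^"]*)"', raw).group(1)

# Option 2: decode the JSON and index it; unaffected by key order.
by_json = json.loads(raw)["name"]

print(by_regex, by_json)  # X5 X5
```

In Scrapy, the JSON route would look something like json.loads(response.xpath(".//@info-car").get())["name"]. It is only reliable if the page really emits valid JSON in that attribute.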
I want to use Nokogiri to scrape data from some HTML:
<td data-bar="hoge" data-date="2000-01-01" class="modals"></td>
<td data-bar="fuga" data-date="2000-01-02" class="modals"></td>
I wrote:
element = page.css("td[data-bar='hoge'][data-date='2000-01-01']")
but element.length returns 0.
How do I select an element by matching two data- attributes at once?
Try using XPath selectors instead. This worked for me:
element = page.xpath "//td[@data-bar='hoge'][@data-date='2000-01-01']"
In this example, the leading // matches any td element with those attributes anywhere in the document, which may not be what you want. In that case, write a more explicit XPath to the node.
Here's the documentation for XPath: https://www.w3.org/TR/xpath/
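Stacked predicates are ANDed, so each [@...] narrows the match further. Below is a quick check of that idea using Python's standard-library ElementTree, not Nokogiri, since it accepts [@attr='value'] predicates. The two <td> elements are wrapped in a hypothetical <table><tr> so they form a single XML document. A search on an element can't start with //, so the path starts with .// instead.

```python
import xml.etree.ElementTree as ET

# The question's cells, wrapped in a hypothetical <table><tr> to form one document.
html = """<table><tr>
<td data-bar="hoge" data-date="2000-01-01" class="modals"></td>
<td data-bar="fuga" data-date="2000-01-02" class="modals"></td>
</tr></table>"""

root = ET.fromstring(html)

# Two chained predicates: a td matches only if both attribute tests hold.
matches = root.findall(".//td[@data-bar='hoge'][@data-date='2000-01-01']")
print(len(matches))  # 1
```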
Code snippet:
<td class="right odds down"><a class=" betslip" target="unibet" onmouseout="delayHideTip()" onmouseover="page.hist(this,'P-0.00-0-0','24vekxv464x0x4g25d',5,event,1,1)" href="/bookmaker/unibet/betslip//event/1002752206/coupon/single,2133228960,p,[0]">1.70</a></td>
I'm trying to extract data from a page where the class target is "Unibet".
What would be the correct formatting for this query?
I've tried:
//*[classtarget="unibet"]//td/a/@class
Well, target is an attribute of the <a> element, not a class. The XPath that finds a <td> element and then returns its child <a> whose target attribute equals "unibet" is:
//td/a[@target='unibet']
If you want to return the class attribute of the <a> element instead, simply append /@class to the XPath above:
//td/a[@target='unibet']/@class
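The same predicate can be tried with Python's standard-library ElementTree. This is a sketch, not the asker's scraper. It uses a trimmed copy of the question's <td>, with the event-handler attributes dropped and a <tr> wrapper added. ElementTree paths can't end in /@class, so the attribute is read with .get() instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's cell; onmouseover/onmouseout omitted, <tr> added.
html = """<tr><td class="right odds down"><a class=" betslip" target="unibet"
href="/bookmaker/unibet/betslip//event/1002752206/coupon/single,2133228960,p,[0]">1.70</a></td></tr>"""

row = ET.fromstring(html)

# Same shape as //td/a[@target='unibet']: target is an attribute test, not a class.
link = row.find(".//td/a[@target='unibet']")

# Equivalent of a trailing /@class: read the attribute from the matched element.
print(repr(link.get("class")), link.text)  # ' betslip' 1.70
```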
I'm making a crawler with Scrapy and wondering why my XPath doesn't work when my CSS selector does. I want to get the number of commits from this HTML:
<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>
XPath:
response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()
CSS:
response.css('li.commits a span.text-emphasized').css('::text').extract()
CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?
You're not matching the full value of the span's class attribute. @class="text-emphasized" compares against the entire string "num text-emphasized", so it never matches. Use the contains() function to check only that text-emphasized is present:
response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()')[0].get().strip()
Otherwise, include num as well:
response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()')[0].get().strip()
Also, I use [0] to take the first node returned by the XPath, .get() to turn that selector into a string, and strip() to remove the surrounding whitespace, leaving just the number.
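The CSS selector span.text-emphasized matches one token of the class list, while @class="..." in XPath compares the whole attribute string. The standard-library sketch below shows that difference on the question's markup, using ElementTree rather than Scrapy and restoring the missing > of </li>. ElementTree has no contains(), so the token test is done in Python. That test is also stricter than contains(), which would additionally match a hypothetical class such as text-emphasized-large.

```python
import xml.etree.ElementTree as ET

# The question's markup, with the closing </li> repaired.
html = """<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>"""

li = ET.fromstring(html)

# @class='...' compares the whole attribute value, so the partial value never matches.
partial = li.findall(".//a//span[@class='text-emphasized']")

# Matching the complete value works, like the second XPath above.
full = li.findall(".//a//span[@class='num text-emphasized']")

# Whitespace-token test, mirroring what the CSS class selector does.
by_token = [s for s in li.iter("span") if "text-emphasized" in s.get("class", "").split()]

commits = by_token[0].text.strip()
print(len(partial), len(full), commits)  # 0 1 521
```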
I'm stuck parsing irregularly nested HTML tags. Is there a way to remove all HTML tags from a node and keep all of its text?
I'm using the code:
rows = doc.search('//table[@id="table_1"]/tbody/tr')
details = rows.collect do |row|
detail = {}
[
[:word, 'td[1]/text()'],
[:meaning, 'td[6]/font'],
].collect do |name, xpath|
detail[name] = row.at_xpath(xpath).to_s.strip
end
detail
end
Using the XPath:
[:meaning, 'td[6]/font']
generates
:meaning: ! '<font size="3">asking for information specifying <font
color="#CC0000" size="3">what is your name?</font> /what/ as in, <font color="#CC0000" size="3">I'm not sure what you mean</font>
/what/ as in <a style="text-decoration: none;" href="http://somesecretlink.com">what</a></font>
On the other hand, using the XPath:
'td/font/text()'
generates
:meaning: asking for information specifying
thus ignoring all child elements of the node. What I want to achieve is this:
:meaning: asking for information specifying what is your name? /what/ as in, I'm not sure what you mean /what/ as in what? I can't hear you
It depends on what you need to extract. If you want all the text inside font elements, you can use the following XPath:
'td/font//text()'
It extracts all text nodes in font tags. If you want all text nodes in the cell, then:
'td//text()'
You can also call the text method on a Nokogiri node:
row.at_xpath(xpath).text
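Python's standard-library ElementTree makes the same distinction, which is a quick way to see why text() and //text() give different results. The sketch below uses the question's <font> markup, wrapped in a <td> so it parses as XML. It is an illustration of the idea, not Nokogiri code.

```python
import xml.etree.ElementTree as ET

# The question's <font> markup, wrapped in a <td> so it parses as one XML element.
html = (
    '<td><font size="3">asking for information specifying '
    '<font color="#CC0000" size="3">what is your name?</font> /what/ as in, '
    '<font color="#CC0000" size="3">I\'m not sure what you mean</font> /what/ as in '
    '<a style="text-decoration: none;" href="http://somesecretlink.com">what</a></font></td>'
)

td = ET.fromstring(html)
font = td.find("font")

# Like at_xpath('td/font/text()'): only the first text node, before any child element.
own_text = font.text.strip()

# Like td/font//text() or Nokogiri's .text: every descendant text node, in document order.
all_text = " ".join("".join(font.itertext()).split())

print(own_text)
print(all_text)
```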
I answered this same sort of question the other day; it's a very easy process.
Take a look at: Convert HTML to plain text and maintain structure/formatting, with ruby