How to simulate click in span using Mechanize Ruby? - ruby

I have a webpage with a list of pages:
<div class="pager">
<span class="current_page">1</span>
<span class="page" samo:page="2">2</span>
<span class="page" samo:page="3">3</span>
<span class="page" samo:page="4">4</span>
<span class="page" samo:page="5">5</span>
<span class="page" samo:page="6">6</span>
<span class="page" samo:page="7">7</span>
<span class="page" samo:page="8">8</span>
<span class="page" samo:page="9">9</span>
<span class="page" samo:page="10">10</span>
<span class="page" samo:page="11">11</span>
</div>
How can I click on the span using mechanize?

According to this ASCIIcast, you can search for elements on the page:
There are two methods on the page object that we can use to extract
elements from a page using Nokogiri. The first of these is called at
and will return a single element that matches a selector.
agent.page.at(".edit_item")
The second method is search. This is similar, but returns an array of
all of the elements that match.
agent.page.search(".edit_item")
http://asciicasts.com/episodes/191-mechanize
So doing something like:
agent.page.search(".page")
will return the array of spans (at would return only the first match). You can then work with them and try the #click action.
EDITED:
Because a span is a non-interactive element and click is a Link action, you will have to find a workaround:
How to click link in Mechanize and Nokogiri?
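Any workaround probably starts with reading the page numbers out of the spans, since that is what the page's own script would act on. Here is a rough sketch of that step only, written in Python with the standard-library html.parser rather than Mechanize. The trimmed PAGER_HTML and the names PagerParser and page_numbers are made up for illustration, and what request the site actually sends when a span is clicked is not known from the question.

```python
from html.parser import HTMLParser

# Trimmed copy of the pager from the question (illustrative data only).
PAGER_HTML = """
<div class="pager">
<span class="current_page">1</span>
<span class="page" samo:page="2">2</span>
<span class="page" samo:page="3">3</span>
</div>
"""


class PagerParser(HTMLParser):
    """Collects the samo:page value of every <span class="page">."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "span" and attributes.get("class") == "page":
            self.pages.append(int(attributes["samo:page"]))


def page_numbers(html):
    # Hypothetical helper: returns the page numbers the pager links to.
    parser = PagerParser()
    parser.feed(html)
    return parser.pages


print(page_numbers(PAGER_HTML))
```

With the numbers in hand, the remaining step is to find out (for example in the browser's network tab) which request the page makes for a given number and issue that request directly.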

Related

How to properly get the value contained inside a section using XPath?

having the following HTML (snippet grabbed from the web page I wanted to scrape):
<div class="ulListContainer">
<section class="stockUpdater">
<ul class="column4">
<li>
<img src="1.png" alt="">
<strong>
Buy*
</strong>
<strong>
Sell*
</strong>
</li>
<li>
<header>
$USD
</header>
<span class="">
20.90
</span>
<span class="">
23.15
</span>
</li>
</ul>
<ul>...</ul>
</section>
</div>
How do I get the value of the 1st span in the 2nd li using XPath? The result should be 20.90.
I have tried //div[@class="ulListContainer"]/section/ul[1]/li[2]/span[1] but I am not getting any values. I should say that this is being used from a Google Sheet with the IMPORTXML function (I am not sure which version of XPath it uses). Can I get some help?
Update
Apparently Google Sheets does not support such "complex" XPath expression since it seems to work fine:
Update 1
As requested I've shared the Google Sheet I am using to test this, here is the link
What you need is :
=IMPORTXML(A1;"//li[contains(text(),'USD')]/span[1]")
Removing section from your original XPath will work too :
=IMPORTXML(A1;"//div[@class='ulListContainer']/ul[1]/li[2]/span[1]")
Try this:
=IMPORTXML("URL","//span[1]")
Change URL to the actual website link/URL
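To check that the path from the question is valid outside of Google Sheets, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath. The snippet is the question's HTML tidied into well-formed XML (the img tag is closed by hand), and the wrapping body element is an assumption added so that the div can be found as a descendant.

```python
import xml.etree.ElementTree as ET

# The question's HTML, tidied so that it parses as XML.
SNIPPET = """
<div class="ulListContainer">
  <section class="stockUpdater">
    <ul class="column4">
      <li>
        <img src="1.png" alt=""/>
        <strong>Buy*</strong>
        <strong>Sell*</strong>
      </li>
      <li>
        <header>$USD</header>
        <span class="">20.90</span>
        <span class="">23.15</span>
      </li>
    </ul>
  </section>
</div>
"""

# Wrap the snippet so the div is a descendant of the parsed root.
root = ET.fromstring(f"<body>{SNIPPET}</body>")

# Same steps as the path in the question, with @class for the attribute test.
PATH = ".//div[@class='ulListContainer']/section/ul[1]/li[2]/span[1]"
buy_price = root.find(PATH).text.strip()
print(buy_price)
```

If this path works in another tool but not in IMPORTXML, the difference is more likely in how Google Sheets parses the page than in the expression itself.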

Parsing through response created with XPath

Using Scrapy, I want to extract some data from a well-formed HTML site. With XPath I am able to extract a list of items, but I am not able to extract data from the elements in the list.
All XPath's have been tested using XPather. I have tested the issue using a local file that contains the webpage, same issue.
Here goes:
# Get the webpage
fetch("https://www.someurl.com")
# The following gives me the expected items from the HTML
products = response.xpath("//*[@id='product-list-146620']/div/div")
The items are like this:
<div data-pageindex="1" data-guid="13157582" class="col ">
<div class="item item-card item-card--static">
<div class="item-card__inner">
<div class="item__image item__image--overlay">
<a href="/www.something.anywhere?ref_gr=9801" class="ratio_custom" style="padding-bottom:100%">
</a>
</div>
<div class="item__text-container">
<div class="item__name">
<a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
</div>
</div>
</div>
</div>
</div>
When using the following XPath to extract "The text I want", I don't get anything:
XPATH_PRODUCT_NAME = "/div/div/div/div/div[contains(@class,'item__name')]/a/text()"
products[0].xpath(XPATH_PRODUCT_NAME).extract()
The output is empty, why?
Try the following code.
XPATH_PRODUCT_NAME = ".//div[@class='item__name']/a[@class='item__name-link']/text()"
products[0].xpath(XPATH_PRODUCT_NAME).extract()
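The reason this works is that a path starting with / is evaluated from the document root, while a path starting with .// is evaluated relative to the selected product node. Here is a minimal sketch of the relative lookup using Python's standard-library xml.etree.ElementTree instead of Scrapy. The markup is a shortened copy of the item from the question, and the extra wrapper elements around it are assumptions made so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# Shortened version of the product markup from the question.
PRODUCT_LIST = """
<div id="product-list-146620">
  <div>
    <div data-pageindex="1" data-guid="13157582" class="col ">
      <div class="item item-card item-card--static">
        <div class="item-card__inner">
          <div class="item__text-container">
            <div class="item__name">
              <a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(f"<body>{PRODUCT_LIST}</body>")

# Step 1: select the product nodes, as in the question.
products = root.findall(".//*[@id='product-list-146620']/div/div")

# Step 2: search *inside* the first product with a relative path.
RELATIVE_NAME_PATH = ".//div[@class='item__name']/a[@class='item__name-link']"
names = [link.text for link in products[0].findall(RELATIVE_NAME_PATH)]
print(names)
```

ElementTree refuses absolute paths on an element outright, whereas Scrapy accepts them and silently searches from the document root, which is why the original expression returned an empty list rather than an error.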

Xpath grab div contents based on span class

<div class="accrd-row">
<h3 class="ui-helper-reset ui-accordion-header ui-corner-top ui-accordion-header-collapsed ui-corner-all ui-state-default ui-accordion-icons" role="tab" id="ui-id-1" aria-controls="ui-id-2" aria-selected="false" aria-expanded="false" tabindex="0"><span class="ui-accordion-header-icon ui-icon ui-icon-triangle-1-e"></span><span class="icon icon-ki-act-panda"></span>Outdoor Activities</h3>
<div class="accrd-detail ui-accordion-content ui-corner-bottom ui-helper-reset ui-widget-content" id="ui-id-2" aria-labelledby="ui-id-1" role="tabpanel" aria-hidden="true" style="display: none;">Need to grab this text here</div>
</div>
I am trying to grab the text:
Need to grab this text here
based on the fact that the span above has the word "panda" in its class. I know it is something like:
//span/@class[contains(.,'panda')]/following-sibling::a/div
But I cannot seem to get this to pick up the text.
You need to go back to the parent of span since the div you are looking for is a sibling of h3 not span.
There is probably a nicer way to do it but this is working for me to get the div element you need:
//h3//span[contains(@class, 'panda')]/parent::h3/following-sibling::div
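The same navigation (find the h3 that contains the matching span, then move to the h3's following sibling div) can be written out by hand. This is only a sketch in Python's standard-library xml.etree.ElementTree, which has no contains() or sibling axes, so the steps are done in plain code. The markup is a shortened copy of the question's HTML, and detail_after_heading is a hypothetical helper name. Unlike the XPath, it returns only the first following div.

```python
import xml.etree.ElementTree as ET

# Shortened version of the accordion row from the question.
ACCORDION = """
<div class="accrd-row">
  <h3 id="ui-id-1"><span class="ui-accordion-header-icon ui-icon"></span><span class="icon icon-ki-act-panda"></span>Outdoor Activities</h3>
  <div class="accrd-detail" id="ui-id-2">Need to grab this text here</div>
</div>
"""


def detail_after_heading(row, keyword):
    """Return the text of the first div that follows an h3 whose
    span has `keyword` in its class attribute, or None."""
    children = list(row)
    for index, child in enumerate(children):
        if child.tag != "h3":
            continue
        # Equivalent of span[contains(@class, keyword)] inside the h3.
        if not any(keyword in span.get("class", "") for span in child.iter("span")):
            continue
        # Equivalent of following-sibling::div, first match only.
        for sibling in children[index + 1:]:
            if sibling.tag == "div":
                return sibling.text
    return None


row = ET.fromstring(ACCORDION)
print(detail_after_heading(row, "panda"))
```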

Scraping text within several span tags (Ruby & Nokogiri)

I am trying to scrape "Description" from this HTML structure
<div class="menu-index-page__item-content">
<h6 class="menu-index-page__item-title">
<span> Item title </span>
</h6>
<p class="menu-index-page__item-desc">
<span>
<span>
<span>Description</span>
</span>
</span>
Each tag carries an attribute that I don't know how to handle:
data-reactid=".3wrqgx5340.3.5.0.4:$523105.2.$3959254.$menuItemContent.1.0"
Each data-reactid is different. So if I target this attribute I will scrape stuff I don't want.
I've tried .search and .xpath, using tags and classes, but nothing seems to work.
Is there a way to say: give me the p tag that has a class="menu-index-page__item-desc" and scrape the 3rd span from there?
You can get the required value via xpath
//text()[contains(.,'Description')]
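The question also asks for the p tag with class="menu-index-page__item-desc" and the third nested span inside it, which can be expressed as a path by class and nesting without touching data-reactid. Here is a minimal sketch of that path using Python's standard-library xml.etree.ElementTree rather than Nokogiri. The closing p and div tags are assumptions, since the HTML in the question is cut off.

```python
import xml.etree.ElementTree as ET

# The question's markup, with the missing closing tags assumed.
MENU_ITEM = """
<div class="menu-index-page__item-content">
  <h6 class="menu-index-page__item-title">
    <span> Item title </span>
  </h6>
  <p class="menu-index-page__item-desc">
    <span>
      <span>
        <span>Description</span>
      </span>
    </span>
  </p>
</div>
"""

root = ET.fromstring(MENU_ITEM)

# Select the p by its class, then step down three nested spans.
PATH = ".//p[@class='menu-index-page__item-desc']/span/span/span"
description = root.find(PATH).text
print(description)
```

The equivalent full XPath, //p[@class='menu-index-page__item-desc']/span/span/span, should work in any XPath 1.0 engine because it uses only child steps and an attribute test.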
Your code and XPath:

How to use XPath extract text without Html tag?

<div id="info" class="">
<span>
<span class="pl"> author</span>:
<a class="" href="/search/author">Peter</a>
</span><br/>
<span class="pl">publisher:</span> god cor<br/>
<span class="pl">year:</span> 2011-6<br/>
<span class="pl">page:</span> 360<br/>
<span class="pl">price:</span> 39.50<br/>
From the above HTML tags, I want to extract those numbers with XPath. How can I do that?
Thanks.
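The numbers are text nodes that sit directly inside the div, right after each label span, so select the label and take the first text node that follows it (trim the surrounding whitespace from the result):
//*[@id="info"]/span[@class="pl"][contains(., "year")]/following-sibling::text()[1] --> 2011-6
//*[@id="info"]/span[@class="pl"][contains(., "page")]/following-sibling::text()[1] --> 360
//*[@id="info"]/span[@class="pl"][contains(., "price")]/following-sibling::text()[1] --> 39.50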
You can know the XPath for any tag by just opening the html file in Chrome, right clicking on the view and choosing "inspect". When you find the tag you want, just right click on it and choose Copy-> Copy XPath.
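Here is a small sketch of the same extraction in Python's standard-library xml.etree.ElementTree, where the text that follows an element is exposed as its tail. The markup is the question's HTML with the broken a tag repaired and a closing div assumed. The fields dictionary is an illustrative name.

```python
import xml.etree.ElementTree as ET

# The question's markup, repaired so that it parses as XML.
INFO = """
<div id="info" class="">
<span>
<span class="pl"> author</span>:
<a class="" href="/search/author">Peter</a>
</span><br/>
<span class="pl">publisher:</span> god cor<br/>
<span class="pl">year:</span> 2011-6<br/>
<span class="pl">page:</span> 360<br/>
<span class="pl">price:</span> 39.50<br/>
</div>
"""

info = ET.fromstring(INFO)

# Each label span that is a direct child of the div is followed by a
# bare text node holding the value; ElementTree calls that text `tail`.
fields = {}
for label in info.findall("span[@class='pl']"):
    name = label.text.strip().rstrip(":")
    fields[name] = label.tail.strip()

print(fields)
```

Keying the values by their label avoids counting text nodes by position, which breaks as soon as the whitespace in the page changes.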
