Stripping out results from a website that doesn't have differing URLs - ruby

I'm trying to automate the process of searching for alternative telephone numbers using SayNoTo0870 . Every time one searches for an alternate number or name it brings up the '/companysearch.php' page.
Clearly this page has no reference, and in my mind you can't just link to this page.
What I'm hoping to do is use the code below, to automate the opening of a browser, searching of a name/number, stripping out the HTML and then providing the top 5 results. I've got the automation part down, but clearly when trying to save the webpage using Hpricot it only brings up the 'Sorry nothing can be found page' because I can't link directly to the search result page.
Here is my code thus far:
(I've removed comments to shorten it)
require 'rubygems'
require 'watir'
require 'hpricot'
require 'open-uri'
class OH870
def searchName(name)
browser = Watir::Browser.new
browser.goto 'http://www.saynoto0870.com/search.php'
browser.text_field(:name => 'search_name').set name
browser.button(:name => 'submit').click
end
def searchNumber(number)
browser = Watir::Browser.new
browser.goto 'http://www.saynoto0870.com/search.php'
browser.text_field(:name => 'number').set number
browser.button(:name => 'submit').click
end
def loadNew(website)
doc = Hpricot(open(website))
puts(doc)
end
def strip_tags
stripped = website.gsub( %r{</?[^>]+?>}, '' )
puts stripped
end
end # class
class Main < OH870
puts "What is the name of the place you want?"
website = 'http://www.saynoto0870.com/companysearch.php'
question = gets.chomp
whichNumber = OH870.new
whichNumber.searchName(question)
#result = OH870.new
#withoutTags = website.strip_tags
#result.loadNew(withoutTags)
end
Now I'm not sure whether there's a way of "asking watir to follow through to the companysearch.php page and dump the results without having to pass this page as a variable.
I wonder if anyone has any suggestions here?

With WATIR, minus the extraneous libraries, here's all it takes to accomplish what you've described (using the 'name' test case only). I've pulled it out of the function format since you already know how to do that, and this will be a clearer test case path.
require 'watir'
#browser = Watir::Browser.new :firefox #open a browser called #browser
#browser.goto "http://(your search page here)" #go to the search page
#browser.text_field(:name => 'name').value = "Awesome" #fill in the 'name' field
#browser.button(:name => 'submit').click #submit the form
If all goes well, we should now be looking at the search results. WATIR already knows it's on a new page - we don't have to specify a URL. In the case that the results are in a frame, we do need to access that frame before we can view its content. Let's pretend they're in a DIV element with an ID of "search_results":
results = #browser.div(:id => "search_results").text
resultsFrame = #browser.frame(:index => 1) #in the case of a frame
results = resultsFrame.div(id => "search_results).text
As you can see, you do not need to save the entire page to parse the results. They could be in table cells, they could be in a different div per line, or a new frame. All are easily accessible with WATIR to be stored in a variable, array, or immediately written to the console or log file.
#results = Array.new #create an Array to store our results
#browser.divs.each do |div| #for each div element on the page
if div.id == "search_results" #if the div ID equals "search_results"
#results << div.text #add it to our array named #results
end
end
Now, if you just wanted the top 5 there are many ways to access them.
#results[0] #first element
#results[0..4] #first 5 elements
I'd also suggest you look into a few programming principles like DRY (Don't Repeat Yourself). In your function definitions where you see that they share code, like opening the browser and visiting the same URL - you can consolidate those:
def search(how, what)
#browser = Watir::Browser.new :firefox
#browser.goto "(that search url again)"
#browser.text_field(:name => how).value = what
etc...
end
search("name", "Hilton")
search("number", "555555")
Since we know that the two available text_field names are "name" and "number", and those make good logical sense as a 'how', we can parameterize them and use a single function for both the Search by Name and Search by Number test cases. This is more efficient, as long as the test cases remain similar enough to be shared.

Related

Element not found in the cache - perhaps the page has changed since it was looked up (Selenium::WebDriver::Error::StaleElementReferenceError)

I am trying to click all links on stackoveflow horizontal menu (Questions, Tags, Users, Badges, Unanswered). I have this code but this clicks on first link (this link is Questions), then prints 1, and after that raises error. What could be problem with this?
require 'watir-webdriver'
class Stackoverflow
def click_all_nav_links
b = Watir::Browser.new
b.goto "http://stackoverflow.com"
counter = 0
b.div(:id => 'hmenus').div(:class => 'nav mainnavs').ul.lis.each do |li|
li.a.click
puts counter += 1
end
end
end
stackoverflow = Stackoverflow.new
stackoverflow.click_all_nav_links
Error message is:
https://gist.github.com/3242300
The StaleElementReferenceError often occurs when storing elements and then trying to access them after going to another page. In this case, the reference to the lis becomes stale after you click the links and navigate to a new page.
You have to store off attributes or the index of the lis first. This will allow you to get a fresh reference to each li after clicking a link.
Try this:
class Stackoverflow
def click_all_nav_links
b = Watir::Browser.new
b.goto "http://stackoverflow.com"
#Store the text of each locate so that it can be located later
tabs = b.div(:id => 'hmenus').div(:class => 'nav mainnavs').ul.lis.collect{ |x| x.text }
#Iterate through the tabs, using a fresh reference each time
tabs.each do |x|
b.div(:id => 'hmenus').div(:class => 'nav mainnavs').ul.li(:text, x).a.click
end
end
end
stackoverflow = Stackoverflow.new
stackoverflow.click_all_nav_links

print an array of input elements on a page using watir-webdriver

I would like to cycle threw all the input elements on a web page and print the name attribute of each. I am having trouble creating the array of elements to cycle threw. here is my code hitting the example page at bit.ly/watir-webdriver-demo
require 'watir-webdriver'
b = Watir::Browser.new
b.goto("bit.ly/watir-webdriver-demo")
listOfInputs = b.form(:method => "post")
listOfInputs.input.each do |i|
puts i.Name
end
How can I print out the name of each input on the page
looks like i just needed to not use form.
I use the body instead and this works!
require 'watir-webdriver'
browser = Watir::Browser.new
browser.goto("bit.ly/watir-webdriver-demo")
body = browser.body
body.inputs.each do |input|
puts input.name
end

Ruby Mechanize: Follow a Link

In Mechanize on Ruby, I have to assign a new variable to every new page I come to. For example:
page2 = page1.link_with(:text => "Continue").click
page3 = page2.link_with(:text => "About").click
...etc
Is there a way to run Mechanize without a variable holding every page state? like
my_only_page.link_with(:text => "Continue").click!
my_only_page.link_with(:text => "About").click!
I don't know if I understand your question correctly, but if it's a matter of looping through a lot of pages dynamically and process them, you could do it like this:
require 'mechanize'
url = "http://example.com"
agent = Mechanize.new
page = agent.get(url) #Get the starting page
loop do
# What you want to do on the page - ex. extract something...
item = page.parser.css('.some_item').text
item.save
if link = page.link_with(:text => "Continue") # As long as there is still a nextpage link...
page = link.click
else # If no link left, then break out of loop
break
end
end

Mechanize: picking right submit from multiple in same form

I use Mechanize to loop through a table, which is paginated.
I have a problem with a form that holds multiple submit inputs. The input tags are used as pagination and they are generated dynamically. When I loop through the pages I need to scrape, I need to be able to pick the right input, since only one of them will take me to the “next page”. The right tag can be identified by different attributes such as name, class, value etc. My problem is though, that I can’t find out how to tell mechanize which one to use.
I tried this:
require 'mechanize'
require 'yaml'
url = "http://www.somewhere.com"
agent = Mechanize.new
page = agent.get(url)
loop do
puts "some content from site using nokogiri"
if next_page = page.form_with(:action => /.*/)
page = next_page.submit(page.form_with(:action => /.*/).submits[3])
else
break
end
end
From this post, http://rubyforge.org/pipermail/mechanize-users/2008-November/000314.html, but as told the number of tags are changing so just picking a hardcoded number of the submits is not too good an idea.
What I would like to know is if there is a way like this:
loop do
puts "some content from site using nokogiri"
if next_page = page.form_with(:action => /.*/)
page = next_page.submit(:name => /the_right_submit_button/)
else
break
end
end
or something like that, maybe with a css or xpath selector.
I usually use form.button_with to select the right button to click:
form = results_page.forms[0]
results_page = form.submit(form.button_with(:name=>'ctl00$ContentBody$ResultsPager$NextButton'))

Ruby - Mechanize: Select link by classname and other questions

At the moment I'm having a look on Mechanize.
I am pretty new to Ruby, so please be patient.
I wrote a little test script:
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
page = agent.get('http://www.google.de')
pp page.title
google_form = page.form_with(:name => 'f')
google_form.q = 'test'
page = agent.submit(google_form)
pp page.title
page_links = Array.new
page.links.each do |ll|
page_links << ll
end
puts page_links.size
This works. But page_links includes not only the search results. It also includes the google links like Login, Pictures, ...
The result links own a styleclass "1". Is it possible to select only the links with class == 1? How do I achieve this?
Is it possible to modify the "agentalias"? If I own a website, including google analytics or something, what browserclient will I see in ga going with mechanize on my site?
Can I select elements by their ID instead of their name? I tried to use
my_form = page.form_with(:id => 'myformid')
But this does not work.
in such cases like your I am using Nokogiri DOM search.
Here is your code a little bit rewritten:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.google.de')
pp page.title
google_form = page.form_with(:name => 'f')
google_form.q = 'test'
page = agent.submit(google_form)
pp page.title
page_links = Array.new
#maybe you better use 'h3.r > a.l' here
page.parser.css("a.l").each do |ll|
#page.parser here is Nokogiri::HTML::Document
page_links << ll
puts ll.text + "=>" + ll["href"]
end
puts page_links.size
Probably this article is a good place to start:
getting-started-with-nokogiri
By the way samples in the article also deal with Google search ;)
You can build a list of just the search result links by changing your code as follows:
page.links.each do |ll|
cls = ll.attributes.attributes['class']
page_links << ll if cls && cls.value == 'l'
end
For each element ll in page.links, ll.attributes is a Nokogiri::XML::Element and ll.attributes.attributes is a Hash containing the attributes on the link, hence the need for ll.attributes.attributes to get at the actual class and the need for the nil check before comparing the value to 'l'
The problem with using :id in the criteria to find a form is that it clashes with Ruby's Object#id method for returning a Ruby object's internal id. I'm not sure what the work around for this is. You would have no problem selecting the form by some other attribute (e.g. its action.)
I believe the selector you are looking for is:
:dom_id e.g. in your case:
my_form = page.form_with(:dom_id => 'myformid')

Resources