I'm trying to get an automated Google search to click on the first link. So far I have not been successful and was wondering if someone could assist me. The search results populate, but clicking the first link fails every time.
require "selenium-webdriver"
driver = Selenium::WebDriver.for :firefox
driver.navigate.to "http://google.com"
element = driver.find_element(:name, 'q')
element.send_keys "translate"
element.submit
resultlink = driver.find_element(:link, "Google Translate")
resultlink.click
How about you try locating the first link using a CSS selector, something like this:
driver.find_element(:css, "#rso li:nth-child(1) div > h3 > a").click
where the 1 in the parentheses (after nth-child) refers to the first search result.
Also, I may be wrong, but try :link_text instead of :link, something like this:
resultlink = driver.find_element(:link_text, "Google Translate")
resultlink.click
If you watch this while it's happening, you might notice that it's failing before the results load. This is probably the single most annoying aspect of automation: timing.
I tried adding sleep(5) before defining the element and it worked. However, sleeps are generally bad, so you should instead give Selenium a little leeway to find the element before deciding it doesn't exist. You do this through implicit waits. For example:
driver.manage.timeouts.implicit_wait = 5 #time in seconds
This sets the maximum time that selenium will allow for an element to load. If it finds it sooner, it will continue right away. For this reason, it is far more efficient than sleep. More info is available in the documentation. Set this any time before you need to find your element. Once set, this will apply for the remainder of your test. It's a good idea in general to allow for slight delays and/or network hiccups.
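For instance, here is a minimal sketch putting the implicit wait in context (assuming Google's markup still matches the original snippet):

require "selenium-webdriver"

driver = Selenium::WebDriver.for :firefox
driver.manage.timeouts.implicit_wait = 5  # every find_element now retries for up to 5 seconds

driver.navigate.to "http://google.com"
element = driver.find_element(:name, 'q')
element.send_keys "translate"
element.submit

# this lookup waits for the results to render instead of failing immediately
driver.find_element(:link_text, "Google Translate").click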
First, if you are learning Selenium, don't use any of the Google pages to start with. They look simple but are extremely tricky and complex under the hood. Please find another website to automate; it is against Google's user agreement anyway.
That said, here is working code. Note that Google search results may render differently in different browsers, and you also need an explicit wait (Selenium::WebDriver::Wait) for the results to load.
require 'selenium-webdriver'

driver = Selenium::WebDriver.for :firefox
driver.navigate.to "http://google.com"

element = driver.find_element(:name, 'q')
element.send_keys "translate"
element.submit

# wait up to 10 seconds for the result links to render
wait = Selenium::WebDriver::Wait.new(:timeout => 10)
wait.until { driver.find_element(:css, 'h3 > a') }

# click first result
# driver.find_element(:xpath, '(.//h3/a)[1]').click

# or loop through all result links and click the one that reads "Google Translate"
results = driver.find_elements(:css, 'h3 > a')
results.each do |result|
  if result.attribute('textContent') == 'Google Translate'
    result.click
    break
  end
end
(.//h3/a)[1] means the first result. In Firefox, the results don't expose a unique data-href to identify them by, so you need to use the index.
Otherwise, you can loop through all the result links looking for one whose textContent attribute equals Google Translate. Note that the link's markup is actually Google <em>Translate</em>, so using text() in XPath might not work.
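If you'd rather have a single XPath match than the loop, matching on the link's full string value (the dot) instead of text() should cope with the nested <em>; this is a sketch, untested against current Google markup:

driver.find_element(:xpath, "//h3/a[contains(., 'Google Translate')]").click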
If you find the solution above is too much to take in, it proves you shouldn't start learning Selenium using Google pages in the first place. ;)
I'm using Selenium in Ruby (a language that I am currently learning) and I have a drop-down menu that I want to iterate through, selecting each option, doing some stuff, and then moving on to the next option.
I have looked at several answers that are somewhat similar. Only one Stack Overflow question had a similar idea in mind to mine, but it's in Python and I just don't know the syntax for Ruby.
I have read through the documentation for Ruby and haven't found anything that does anything similar to the Python way.
Essentially what I want to do is:
select first option
click a button
navigate to a different page
download a csv
return back to the previous page
select second option
do the same thing
etc...until all the options are done
Is this possible? I can figure out returning to the previous page and clicking the csv option but I would like some help on the syntax part.
Thank you
The Ruby bindings for selenium-webdriver have a Select class for manipulating select lists.
Here's a contrived example that locates a select-list element, passes the element to a Select object, and prints the text of each option in the list. YMMV...
require "selenium-webdriver"
driver = Selenium::WebDriver.for :firefox
driver.navigate.to "https://www.seleniumeasy.com/test/basic-select-dropdown-demo.html"
element = driver.find_element(id: 'select-demo')
select_list = Selenium::WebDriver::Support::Select.new(element)
select_list.options.each { |option| puts option.text}
#=> Please select
#=> Sunday
#=> Monday
#=> Tues
...
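To actually iterate and select each option in turn (the loop described in the question), something like the following sketch should work. The select_by call is part of the Select class; the button click and CSV download are placeholders for your own steps:

# work from the option texts so the list can be re-located after navigating away;
# holding on to the option elements across page loads would make them stale
option_texts = select_list.options.map(&:text)

option_texts.each do |text|
  element = driver.find_element(id: 'select-demo')  # re-locate after returning
  Selenium::WebDriver::Support::Select.new(element).select_by(:text, text)

  # ... click your button, download the CSV ...
  # driver.navigate.back  # then return to the previous page
end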
I have been successful scraping building data from a website (www.propertyshark.com) using a single address, but it looks like I get blocked once I use a loop to scrape multiple addresses. Is there a way around this? FYI, the information I'm trying to access is not prohibited according to their robots.txt.
The code for a single run is as follows:
require 'mechanize'

class PropShark
  def initialize(key, link_key)
    @@key = key
    @@link_key = link_key
    @result_hash = {}  # collects the scraped field name/value pairs
  end

  def crawl_propshark_single
    agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }
    agent.ignore_bad_chunking = true
    agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

    page = agent.get('https://www.google.com/')
    form = page.forms.first
    form['q'] = "#{@@key}"
    page = agent.submit(form)

    page.links.each do |link|
      if link.text.include?("#{@@link_key}")
        if link.text.include?("PropertyShark")
          property_page = link.click
        else
          next
        end

        if property_page
          data_value = property_page.css("div.cols").css("td.r_align")[4].text # <--- error points to these commands
          data_name  = property_page.css("div.cols").css("th")[4].text
          @result_hash["#{data_name}"] = data_value
        else
          next
        end
      end
    end

    @result_hash
  end
end # endof: class PropShark

# run
key = '41 coral St, Worcester, MA 01604 propertyshark'
key_link = '41 Coral Street'
spider = PropShark.new(key, key_link)
puts spider.crawl_propshark_single
I get the following error, but in an hour or two it disappears:
undefined method `text' for nil:NilClass (NoMethodError)
When I loop over multiple addresses with the above code, I delay the process with sleep 80 between addresses.
The first thing you should do, before you do anything else, is contact the website owner(s). Right now, your actions could be interpreted as anywhere between overly aggressive and illegal. As others have pointed out, the owners may not want you scraping the site. Alternatively, they may have an API or product feed available for this particular thing. Either way, if you are going to be depending on this website for your product, you may want to consider playing nice with them.
With that being said, you are moving through their website with all of the grace of an elephant in a china shop. Between the abnormal user agent, unusual usage patterns from a single IP, and a predictable delay between requests, you've completely blown your cover. Consider taking a more organic path through the site, with a more natural human-emulation delay. Also, you should either disguise your user agent, or make it super obvious (Josh's Big Bad Scraper). You may even consider using something like Selenium, which uses a real browser, instead of Mechanize, to give away fewer hints.
You may also consider adding more robust error handling. Perhaps the site is under excessive load (or something), and the page you are parsing is not the desired page, but some random error page. A simple retry may be all you need to get that data in question. When scraping, a poorly-functioning or inefficient site can be as much of an impediment as deliberate scraping protections.
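For instance, a minimal retry sketch around the fragile extraction (assuming the failure surfaces as the missing table cell from your code):

attempts = 0
begin
  attempts += 1
  property_page = link.click
  cell = property_page.css("div.cols").css("td.r_align")[4]
  raise "expected cell missing" if cell.nil?  # treat an odd page layout as retryable
  data_value = cell.text
rescue => e
  if attempts < 3
    sleep 5  # give the site a moment before retrying
    retry
  end
  raise e
end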
If none of that works, you could consider setting up elaborate arrays of proxies, but at that point you would be much better off using one of the many online web-scraping/API-creating/data-extraction services that currently exist. They are fairly inexpensive and already do everything discussed above, plus more.
It is very likely nothing is "blocking" you. As you pointed out
property_page.css("div.cols").css("td.r_align")[4].text
is the problem, so let's focus on that line of code for a second.
Say the first time around your columns are columns = [1,2,3,4,5]; then columns[4] will return 5 (the element at index 4).
Now, for fun, assume the next time around your columns are columns = ['a','b','c','d']; then columns[4] will return nil because there is nothing at index 4.
This appears to be your case: sometimes there are 5 columns and sometimes there are not. That leads to calling nil.text, which raises the error you are receiving.
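A defensive version of that line (a sketch, not tested against the live site) would check for the cell before calling .text:

cell = property_page.css("div.cols").css("td.r_align")[4]
data_value = cell ? cell.text : nil  # skip or log the address when the fifth column is absent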
For example purposes, let's use Google. Let us also assume that in the search bar on Google we enter text that reads "hello". Lastly, let's assume the search bar has a save button and we select it to store the data in the field. So now, when you look at the search bar, it reads "hello".
I want to go back into the browser, navigate to Google, and check that text field to ensure it saved my data as expected. I know how to navigate there. But can someone explain how, in RSpec/Ruby, to go into a browser, find the web element whose value you want to verify (by way of id, name, XPath, etc.), and write a command using Selenium WebDriver to do so? NO CAPYBARA.
After entering the search term, you can get the value of the text field (e.g. element['value']) and use an RSpec expectation to validate that the correct string has been entered. For example:
# foo_spec.rb
require 'selenium-webdriver'
require 'rspec'

describe "Contrived Example" do
  it "enters a search term" do
    driver = Selenium::WebDriver.for :firefox
    driver.navigate.to "http://google.com"
    element = driver.find_element(:name, 'q')
    element.send_keys "test string"
    sleep 1  # brief pause so the typed value has settled before asserting
    expect(element['value']).to eq "test string"
    driver.quit
  end
end
I'm using Capybara with Ruby 1.9.3 using the selenium driver in order to get information off a website. After clicking through a couple of pages I visit the page I want and I put:
all(:css, 'td').each { |td| a_info << td }
a_info.each {|t| puts t.text }
The error I then get after about 10 seconds of waiting:
[remote server] resource://fxdriver/modules/web_element_cache.js:5628:in `unknown': Element not found in the cache - perhaps the page has changed since it was looked up (Selenium::WebDriver::Error::StaleElementReferenceError)
This is followed by a lot more remote-server errors. I've given the page 10-30 seconds of sleep time and it's still not loading, and when I print page.html I see a JavaScript script followed by all the td's that I'm trying to get info from. I know the error means an element that was found is no longer current, but it seems like all the elements have already been loaded, so I'm not sure why they wouldn't exist anymore. I've scoured the internet for hours looking into this and would love any kind of help, from possible solutions to try to the next steps for figuring it out. I can provide any extra information needed, just let me know.
This happens when you open a scope and then change the page inside it, for example, but keep making assertions within that scope, like the following:
within 'selector' do
  expect(page).to have_link 'Link'
  visit 'page'
  expect(page).to have_link 'I do not know this within, I am stale.'
  find('I do not know this within, I am stale.').click
end
Just reopen your scope after the navigation instead of continuing to work in the old one.
within 'selector' do
  expect(page).to have_link 'Link'
end

visit 'page'

within 'selector' do
  expect(page).to have_link 'Now I am fresh'
  find('#now-i-am-fresh').click
end
This is my least favorite error. I'm going to refer you to this exact question on Stack Overflow, asked about a year earlier: Random "Element is no longer attached to the DOM" StaleElementReferenceException
In the presented code you are printing text to the console, but perhaps in your real code you are clicking on those links inside the each loop, which is wrong.
Solution
First extract the href attributes, then visit them in a loop.
Code:
a_hrefs = all(:css, 'a').map { |a| a[:href] }  # collect the hrefs first
a_hrefs.each { |href| visit(href) }
I'm not sure if it fits your situation; give more info in the comments.
Is it possible to open every link in a certain div and collect the values of the opened fields all together in one file, or at least in terminal output?
I am trying to get list of coordinates from all markers visible on google map.
all_links = b.div(:id, "kmlfolders").links
all_links.each do |link|
  b.link.click
  b.link(:text, "Norādījumi").click
  puts b.text_field(:title, "Galapunkta_adrese").value
end
Are there easier or more effective ways how to automatically collect coordinates from all markers?
Unless there is other data (alt tags? elements invoked via onhover?) already in the HTML that you could pick through, that does seem like the most practical way to iterate through the links. However, from what I can see, you are not actually making use of the 'link' object inside your loop. You'd need something more like this, I think:
all_links = b.div(:id, "kmlfolders").links
all_links.each do |thelink|
  b.link(:href => thelink.href).click
  b.link(:text, "Norādījumi").click
  puts b.text_field(:title, "Galapunkta_adrese").value
end
Using their API is probably a lot more effective, though; it's why folks make APIs, after all, and if one is available, using it is almost always best. Using a test tool as a screen scraper to gather the info is liable to be a lot harder in the long run than learning how to make some API calls and getting the data that way.
For web-based APIs in Ruby, I find the rest-client gem works great; other folks like HTTParty.
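As a flavor of how simple that can be, here is a minimal rest-client sketch; the URL and parameters are made-up placeholders, not a real Google endpoint:

require 'rest-client'
require 'json'

# hypothetical endpoint -- substitute whatever API you are actually calling
response = RestClient.get('https://example.com/api/markers', params: { map: 'speedcams' })
JSON.parse(response.body).each do |marker|
  puts marker['coordinates']  # assumes the API returns a JSON array of marker objects
end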
As I'm not already familiar with the Google API, I find it hard to dig into the API for one particular need. Therefore I made a short watir-webdriver script for collecting the coordinates of markers on a protected Google map. The resulting file is used in a Python script that creates speedcam files for navigation devices.
In this case it's a speedcam map maintained and updated by the Latvian police, but this script can probably be used with any Google map just by replacing the URL.
# encoding: utf-8
require "rubygems"
require "watir-webdriver"

@b = Watir::Browser.new :ff
#--------------------------------
@b.goto "http://maps.google.com/maps?source=s_q&f=q&hl=lv&geocode=&q=htt%2F%2Fmaps.google.com%2Fmaps%2Fms%3Fmsid%3D207561992958290099079.0004b731f1c645294488e%26msa%3D0%26output%3Dkml&aq=&sll=56.799934,24.5753&sspn=3.85093,8.64624&ie=UTF8&ll=56.799934,24.5753&spn=3.610137,9.887695&z=7&vpsrc=0&oi=map_misc&ct=api_logo"
@b.div(:id, "kmlfolders").wait_until_present

all_markers = @b.div(:id, "kmlfolders").divs(:class, "fdrlt")
@prev_coordinates = 1
puts "#{all_markers.length} speedcam markers detected"

File.open("list_of_coordinates.txt", "w") do |outfile|
  all_markers.each do |marker|
    sleep 1
    marker.click
    sleep 1
    description = @b.div(:id => "iw_kml").text
    @b.span(:class, "actbar-text").click  # open the directions pane
    sleep 2
    coordinates = @b.text_field(:name, "daddr").value
    redo if coordinates == @prev_coordinates  # pane has not updated yet, retry this marker
    puts coordinates
    outfile.puts coordinates
    @prev_coordinates = coordinates
  end
end

puts "Coordinates saved in file!"
@b.close
Works on both Mac OS X 10.7 and Windows 7.