Selenium parse elements to string - ruby

My goal is to dynamically get website content created by JavaScript. I have the following code:
browser = Selenium::WebDriver.for :firefox
browser.get "https://gls-group.eu/AT/de/paket-verfolgen?match=00000000000"
wait = Selenium::WebDriver::Wait.new(:timeout => 20)
js_code = "return document.getElementsByTagName('div')"
elements = browser.execute_script(js_code)
puts elements
browser.close
The output is:
#<Selenium::WebDriver::Element:0x4e4c920>
#<Selenium::WebDriver::Element:0x4e4c770>
#<Selenium::WebDriver::Element:0x4e4c230>
#<Selenium::WebDriver::Element:0x4e55650>
#<Selenium::WebDriver::Element:0x4e55848>
#<Selenium::WebDriver::Element:0x4e57e58>
#<Selenium::WebDriver::Element:0x4e57c00>
#<Selenium::WebDriver::Element:0x4e57a08>
and so on. How do I get the divs?

browser.execute_script(js_code) returns the matching HTML elements as instances of the Selenium::WebDriver::Element class. Use the method Selenium::WebDriver::Element#text to get the content of those div elements:
require 'selenium-webdriver'
browser = Selenium::WebDriver.for :firefox
browser.get "https://gls-group.eu/AT/de/paket-verfolgen?match=00000000000"
wait = Selenium::WebDriver::Wait.new(:timeout => 20)
js_code = "return document.getElementsByTagName('div')"
elements = browser.execute_script(js_code)
elements.each{|e| puts e.text }
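The original output looks the way it does because puts calls to_s on each object, and an element object is not its text. Below is a minimal sketch of collecting the text into plain strings. FakeElement is a hypothetical stand-in so the snippet runs without a browser, and element_texts is an illustrative helper name, not part of Selenium. It only assumes each object responds to text, as Selenium::WebDriver::Element does.

```ruby
# Hypothetical stand-in for Selenium::WebDriver::Element; only #text is needed.
FakeElement = Struct.new(:text)

# Returns the text of each element as an array of strings,
# dropping elements whose text is empty or whitespace only.
def element_texts(elements)
  elements.map { |e| e.text.to_s.strip }.reject(&:empty?)
end

elements = [
  FakeElement.new("Tracking number"),
  FakeElement.new("   "),
  FakeElement.new("Status: unknown\n")
]

puts element_texts(elements)
```

With a real browser session, the array returned by browser.execute_script(js_code) could be passed to the same helper.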

Related

Is there another way to only get last loaded data using while loop in Ruby

I'm scraping a dynamic website that has a "Load more" button. I solved the "Load more" problem with a while loop, but there is another challenge: when I scrape the data, it keeps multiplying. The first batch contains 24 items. When I scrape the second batch, the first batch is scraped again, so I get 48 items of which only 24 are new, and so on.
Here's my code:
require "selenium-webdriver"
driver = Selenium::WebDriver.for :chrome
url ="https://www.example.com/categories/car-parts"
driver.navigate.to "#{url}"
wait = Selenium::WebDriver::Wait.new(:timeout => 20)
while driver.page_source.include? "Load more"
load_more = wait.until {
load_more_element = driver.find_element(css: ".styles__loadMore___yYAF4")
}
sleep 3
load_more.click()
puts "load_more"
sleep 3
seller_url = wait.until {
element = driver.find_elements(:css, ".desktop__itemOneFourth___2t71A .styles__link___9msaS:nth-child(1)")
}
seller_url.each do |line|
seller_uri = line.attribute("href")
seller_hand = seller_uri[/https:\/\/www.example.com(.*\/([.\w+]+))/i]
seller_handle = seller_hand.gsub("https://www.example.com/", "")
seller = Seller.new
seller.seller_url = seller_uri
seller.seller_handle = seller_handle
seller.save
puts seller_handle
end
puts seller_url.size
sleep 3
end
What I want is for it to continue loading, but to scrape only the last loaded batch and skip all the previous batches.
You know how many records are loaded each time you hit the "Load more" button, so you can easily access only the new records in the seller_url array:
items_per_page = 24
pages_loaded = 0
while driver.page_source.include? "Load more"
# ...
seller_url = wait.until {
element = driver.find_elements(:css, ".desktop__itemOneFourth___2t71A .styles__link___9msaS:nth-child(1)")
}
seller_url.last(items_per_page).each do |line|
# do stuff
end
pages_loaded += 1
end
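Note that last(items_per_page) assumes every batch contains exactly 24 items. If the final batch is shorter, some older items would be processed again. An alternative sketch is to remember how many items have already been handled and skip that many. This assumes the page appends new items at the end and keeps the earlier ones in the same order. new_items and the sample data are hypothetical names used only for illustration.

```ruby
# Returns only the items that were not present on the previous pass.
# all_items is the full list currently on the page; seen_count is how many
# of them were already processed.
def new_items(all_items, seen_count)
  all_items.drop(seen_count)
end

seen_count = 0

# Simulated page contents after each "Load more" click (hypothetical data).
page_states = [
  %w[a b c],
  %w[a b c d e f],
  %w[a b c d e f g] # the final batch is smaller than the others
]

page_states.each do |links|
  batch = new_items(links, seen_count)
  puts batch.join(", ")
  seen_count = links.size
end
```

In the scraping loop, links would be the seller_url array returned by find_elements, and seen_count would be updated after each pass.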

How to input text into textfield with Selenium and Nokogiri?

I'm using Selenium Webdriver, Chromedriver, and Nokogiri. I've written a script to go to Google.com and parse the page:
require "selenium-webdriver"
require "nokogiri"
browser = Selenium::WebDriver.for :chrome
browser.get "https://google.com"
doc = Nokogiri::HTML.parse(browser.page_source)
Now, how can I input text into the search bar with my Ruby script? The search bar has an id of #lst-ib.
Nokogiri only parses a copy of the page source, so the typing has to be done through Selenium. You can do something like:
wait = Selenium::WebDriver::Wait.new(:timeout => 10)
input = wait.until {
element = browser.find_element(:id, "lst-ib")
element if element.displayed?
}
input.send_keys("Input")

Watir or Selenium webdriver - Find for IMG SRC duplicated

Is there any way to index/list and compare <img src=""> values using Watir or Selenium WebDriver?
Update #1
I've successfully made progress on the general script for finding the right <div> that contains the pictures:
require 'watir-webdriver'
require 'selenium-webdriver'
b = Watir::Browser.new :firefox
$i = 1
(1..1000).each do |i|
b.goto 'http://example.com'
b.div(:id, 'pic_container').wait_until_present
puts 'div present'
begin
if
else
end
end
b.close
There will be more code. The only thing I can't work out is how to enumerate all the available pictures, compare their sources, and output the results.
Update #2
Thanks to both JustinKo and Carldmitch for their answers. This is what I have now:
require 'watir-webdriver'
require 'selenium-webdriver'
b = Watir::Browser.new :firefox
b.goto 'https://trafficmonsoon.com'
begin
Watir::Wait.until { b.url == "http://example.com" }
b.a(:href, "http://example.com/img").wait_until_present
b.a(:href, "http://example.com/img").click
Watir::Wait.until { b.url == "http://example.com/img" }
b.driver.manage.timeouts.implicit_wait = 10
b.a(:class, "btn").click
end
$i = 1
(1..1000).each do |i|
b.driver.manage.timeouts.implicit_wait = 30
pics_set = b.div(:id, 'pics_container').images
pics_array = []
pics_set.each_with_index do |image|
pics_array.push(image.current_src)
end
puts pics_array.find_all {|e| pics_array.rindex(e) != pics_array.index(e) }.uniq
end
b.close
The only problem is that it does not show which picture is duplicated. Instead, it shows all the img src values except the duplicated one. Any hints?
Thanks in advance.
Update #3
I got it working. It prints out the duplicated img src, but I can't use the output data for browser interactions (clicks and drags).
Update #4
I've successfully managed to interact with the data. The only thing I want to know is whether there is any way to pick one or the other of the duplicated pictures. Since both have the same img src, it's impossible to click or drag based on that attribute.
Here is the code that i've got by now
require 'sub'
require 'watir-webdriver'
require 'selenium-webdriver'
b = Watir::Browser.new :firefox
b.goto 'https://example'
begin
Watir::Wait.until { b.url == "http://example.com/img" }
b.a(:href, "http://example.com/imgs").wait_until_present
b.a(:href, "http://example.com/imgs").click
Watir::Wait.until { b.url == "http://example.com/imgs" }
b.driver.manage.timeouts.implicit_wait = 10
b.a(:class, "btn btn-xs btn-danger").click
end
b.driver.manage.timeouts.implicit_wait = 30
pics_set = b.div(:id, 'site_loader').images
pics_array = []
pics_set.each_with_index do |image|
pics_array.push(image.current_src)
end
duplicated = pics_array.find_all {|e| pics_array.rindex(e) != pics_array.index(e) }.uniq
duplicated[0].sub!("http://example.com", ".")
b.img(:src, duplicated).click
Update #5
Here is an example of the div I'm digging into:
<div id="pic_container">
<img src="./images/test/3.png" style="cursor:pointer;width:64px" onclick="checkClick('7hva9f')">
<img src="./images/test/5.png" style="cursor:pointer;width:59px" onclick="checkClick('xt0nnc')">
<img src="./images/test/5.png" style="cursor:pointer;width:67px" onclick="checkClick('1tyz9b')">
<img src="./images/test/1.png" style="cursor:pointer;width:67px" onclick="checkClick('300yp7')">
<img src="./images/test/7.png" style="cursor:pointer;width:67px" onclick="checkClick('pzxgyh')">
</div>
You can get all of the images in a browser or element by retrieving an ImageCollection. To get the collection you can either use the imgs or images method.
All of the images in the "pic_container" div can be retrieved by:
b.div(:id, 'pic_container').images
The ImageCollection is enumerable, which means you can get an array of the src attributes using:
b.div(:id, 'pic_container').images.map(&:src)
#=> ['src1', 'src2', 'etc']
Or if you need to do more custom logic per image, you can iterate through each one using each or each_with_index (if you also want an index). For example:
b.div(:id, 'pic_container').images.each_with_index do |image, i|
puts image.src
puts i
end
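Because the collection yields element objects rather than just strings, two images with the same src can still be told apart by keeping the objects themselves. Here is a minimal sketch of that idea, grouping by src and keeping only the groups with more than one member. FakeImage is a hypothetical stand-in so the snippet runs without a browser, and duplicate_image_groups and the token field are illustrative names, not part of Watir.

```ruby
# Hypothetical stand-in for a Watir image element; only #src is used for grouping.
FakeImage = Struct.new(:src, :token)

# Groups images by src and keeps the groups that contain more than one image.
def duplicate_image_groups(images)
  images.group_by(&:src).select { |_src, group| group.size > 1 }
end

images = [
  FakeImage.new("./images/test/3.png", "7hva9f"),
  FakeImage.new("./images/test/5.png", "xt0nnc"),
  FakeImage.new("./images/test/5.png", "1tyz9b"),
  FakeImage.new("./images/test/1.png", "300yp7")
]

duplicate_image_groups(images).each do |src, group|
  first, second = group
  puts "#{src}: #{first.token} and #{second.token}"
end
```

With Watir, the input could be b.div(:id, 'pic_container').images.to_a, and group.first or group.last could then be clicked directly instead of looking the image up again by its src.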
When I'm doing things other than driving the browser, I like to just use Ruby.
require 'watir-webdriver'
browser = Watir::Browser.new :chrome
browser.goto 'http://example.com'
# collects all images on page
image_collection = browser.images
# creates array of the 'src' urls
image_array = []
image_collection.each do |image|
image_array.push(image.current_src)
end
# outputs urls if any duplicates are found in the array
puts image_array.find_all {|e| image_array.rindex(e) != image_array.index(e) }.uniq
browser.close
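The duplicate check itself is plain Ruby, so it can be tried without a browser. As an alternative to the index/rindex comparison, here is a minimal sketch that counts occurrences with Array#tally (this assumes Ruby 2.7 or newer). duplicate_counts is an illustrative helper name, and the sample paths are taken from the question's example div.

```ruby
# Returns a hash of each value that occurs more than once, mapped to its count.
# Array#tally requires Ruby 2.7 or newer.
def duplicate_counts(values)
  values.tally.select { |_value, count| count > 1 }
end

srcs = [
  "./images/test/3.png",
  "./images/test/5.png",
  "./images/test/5.png",
  "./images/test/1.png",
  "./images/test/7.png"
]

duplicate_counts(srcs).each do |src, count|
  puts "#{src} appears #{count} times"
end
```

Counting in a single pass also avoids rescanning the array for every element, which the index/rindex version does.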

Element is no longer attached to the DOM selenium

I have the following problem with Selenium in Ruby. It raises an error saying that the element is no longer attached to the DOM. I found some solutions that involve waiting, but I wasn't able to figure out whether I can wait for an element that has no ID. Can I wait for an element if I only have the class name?
require 'selenium-webdriver'
# requires a Firefox installation
browser = Selenium::WebDriver.for :firefox
browser.get <URL>
wait = Selenium::WebDriver::Wait.new(:timeout => 20)
js_code = "return document.getElementsByClassName('Cell ')"
rawdata = Array.new
puts rawdata.size
elements = browser.execute_script(js_code)
elements.each{|e| rawdata.push(e.text) }
puts rawdata.size
arrSize = rawdata.length
puts rawdata.at(5) + " " + rawdata.at(4) + " " + rawdata.at(9) + " " + rawdata.at(6)
This answers your question but does not necessarily resolve your exception. If it doesn't, you might want to post HTML snippets and the stack trace.
Here is how to use WebDriverWait in Ruby:
# create wait like you have already done
wait = Selenium::WebDriver::Wait.new(:timeout => 20)
# wait until something, you can use any locators you want, not just ids
# don't inject JavaScript directly, unless you have to
wait.until { browser.find_element(:class => "dojoxGridCell") }
# do stuff to your raw data
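If the page re-renders between locating an element and reading it, waiting alone may not be enough, and a common workaround is to retry the whole lookup-and-read step. Below is a minimal sketch of such a retry helper. with_retries and FakeStaleError are hypothetical names used so the snippet runs without a browser. With Selenium, the error class to pass would be Selenium::WebDriver::Error::StaleElementReferenceError, and the block would locate the elements again before reading their text.

```ruby
# Runs the block and retries it when one of the given error classes is raised,
# up to `attempts` tries in total. The last error is re-raised if all tries fail.
def with_retries(*error_classes, attempts: 3, pause: 0)
  tries = 0
  begin
    tries += 1
    yield
  rescue *error_classes
    raise if tries >= attempts
    sleep pause
    retry
  end
end

# Hypothetical error standing in for a stale element error.
class FakeStaleError < StandardError; end

calls = 0
result = with_retries(FakeStaleError, attempts: 3) do
  calls += 1
  raise FakeStaleError, "stale" if calls < 3
  "read on attempt #{calls}"
end

puts result
```

The key point is that the element lookup lives inside the block, so each retry works with a fresh reference rather than the stale one.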

print an array of input elements on a page using watir-webdriver

I would like to cycle through all the input elements on a web page and print the name attribute of each. I am having trouble creating the array of elements to cycle through. Here is my code, hitting the example page at bit.ly/watir-webdriver-demo:
require 'watir-webdriver'
b = Watir::Browser.new
b.goto("bit.ly/watir-webdriver-demo")
listOfInputs = b.form(:method => "post")
listOfInputs.input.each do |i|
puts i.Name
end
How can I print out the name of each input on the page?
It looks like I just needed to not use the form.
I use the body instead, and this works:
require 'watir-webdriver'
browser = Watir::Browser.new
browser.goto("bit.ly/watir-webdriver-demo")
body = browser.body
body.inputs.each do |input|
puts input.name
end
