I'm new to Ruby and Capybara and I'm trying to use capybara-webkit to scrape a website. All of the data I'm interested in lies in td tags with certain properties.
Where form is a particular form element I'm looking at, the following code works:
form.all('td').detect do |td|
  if td['valign'] == 'top' && td['nowrap'] != 'nowrap'
    print "#{td.text}\n"
  end
end
The contents of all of the td elements I'm interested in are printed out correctly. However, when I try to then parse the text with a regex:
form.all('td').detect do |td|
  if td['valign'] == 'top' && td['nowrap'] != 'nowrap'
    print "#{td.text}\n"
    val1, val2 = td.match(/(\d)(\d)/).captures # The real regex is more complex
  end
end
...suddenly only the first td element is read/parsed. I've even tried just pushing each td.text value into an array for later parsing, but the same thing occurs. I've also tried making a clone of the td.text string and operating on that, with no luck. There doesn't seem to be any sort of timeout on the page that would change the HTML elements. I have absolutely no clue what could be causing this.
Any thoughts?
Related
I'm not able to click on an available date in the calendar. Past dates are greyed out, so I'm trying to get today's date and click on it. I have tried execute_script, click(), and perform(), but none of them worked.
today_date = Date.today.strftime('%d')
element = @driver.find_element(:xpath, "//td[contains(@class, 'CalendarDay__default')][contains(@aria-label, '#{today_date}')]")
# @driver.execute_script("arguments[0].click;", element)
@driver.action.move_to(element).click(element).perform
I also tried a loop, but the td element is not displayed since some of the elements are greyed out. How do I select only the displayed elements?
today_date = Date.today.strftime('%d')
date_picker = @driver.find_element(:xpath, "//*[contains(@class, 'SingleDatePicker_picker')]")
columns = date_picker.find_elements(:tag_name, "td")
calendar_date = columns.map(&:text).reject(&:empty?)
columns.each do |col|
  # This returns true
  puts "include date: #{calendar_date.include?(today_date)}"
  if calendar_date.include?(today_date)
    # Element is not displayed
    puts "td displayed: #{col.displayed?}"
    # Not clickable
    col.click
  end
end
Please find the HTML below.
It's kind of hard to know what will work exactly without being able to work with the actual HTML, but my guess is that the first thing matched by that locator is not what you want. Try this:
element = browser.td(aria_label: Time.now.strftime("%A, %B %d, %Y"))
Or you can select the first non-disabled element regardless of date:
element = browser.td(aria_disabled: 'false')
Edit: I just realized your code is Selenium even though the question is tagged Watir. The XPath equivalents of the above are:
".//td[@aria-label='#{Time.now.strftime("%A, %B %d, %Y")}']"
".//td[@aria-disabled='false']"
I'm pretty sure you have to use .clear, but it doesn't seem to be working for me; maybe I'm just implementing it wrong.
Example:
browser.div(:id => "formLib1").clear.type("input", "hi")
Can anyone tell me how to simply clear a field then enter in a new string?
Assuming we are talking about a text field (i.e. you are not trying to clear/input a div tag), the .set() and .value= methods automatically clear the text field before inputting the value.
So one of the following would work:
browser.text_field(:id, 'yourid').set('hi')
browser.text_field(:id, 'yourid').value = 'hi'
Note that it is usually preferred to use .set since .value= does not fire events.
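If you do end up using .value=, one possible workaround (a sketch, not part of the original answer) is to fire the relevant event yourself afterwards; the exact event depends on what the page listens for:
field = browser.text_field(:id => 'yourid')
field.value = 'hi'
# Assumes the page reacts to the change event
field.fire_event('onchange')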
I had a similar issue, and, for some reason, .set() and .value= were not available/working for the element.
The element was a Watir::Input:
browser.input(:id => "formLib1").to_subtype.clear
After clearing the field I was able to enter text:
browser.input(:id => "formLib1").send_keys "hi"
I had a similar issue, and, for some reason, .set() and .value= were not available for the element.
The element was a Watir::HTMLElement:
[2] pry(#<Object>)> field.class
=> Watir::HTMLElement
field.methods.grep /^(set|clear)$/
=> []
I resorted to sending the backspace key until the value of the field was "":
count = 0
while field.value != "" && count < 50
  field.send_keys(:backspace)
  count += 1
end
field.send_keys "hi"
I want to extract all the HTML5 data attributes from a tag, just like this jQuery plugin.
For example, given:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
I want to get a hash like:
{ 'data-age' => '50', 'data-location' => 'London' }
I was originally hoping to use a wildcard as part of my CSS selector, e.g.
Nokogiri(html).css('span[@data-*]').size
but it seems that isn't supported.
Option 1: Grab all data elements
If all you need is to list all the page's data elements, here's a one-liner:
Hash[doc.xpath("//span/@*[starts-with(name(), 'data-')]").map{|e| [e.name, e.value]}]
Output:
{"data-age"=>"50", "data-location"=>"London"}
Option 2: Group results by tag
If you want to group your results by tag (perhaps you need to do additional processing on each tag), you can do the following:
tags = []
datasets = "@*[starts-with(name(), 'data-')]"
# If you want any element, replace "span" with "*"
doc.xpath("//span[#{datasets}]").each do |tag|
  tags << Hash[tag.xpath(datasets).map{|a| [a.name, a.value]}]
end
Then tags is an array containing key-value hash pairs, grouped by tag.
Option 3: Behavior like the jQuery datasets plugin
If you'd prefer the plugin-like approach, the following will give you a dataset method on every Nokogiri node.
module Nokogiri
  module XML
    class Node
      def dataset
        Hash[self.xpath("@*[starts-with(name(), 'data-')]").map{|a| [a.name, a.value]}]
      end
    end
  end
end
Then you can find the dataset for a single element:
doc.at_css("span").dataset
Or get the dataset for a group of elements:
doc.css("span").map(&:dataset)
Example:
The following is the behavior of the dataset method above. Given the following lines in the HTML:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
<span data-age="40" data-location="Oxford" class="highlight">Jim Foggs</span>
The output would be:
[
  {"data-location"=>"London", "data-age"=>"50"},
  {"data-location"=>"Oxford", "data-age"=>"40"}
]
You can do this with a bit of xpath:
doc = Nokogiri.HTML(html)
data_attrs = doc.xpath "//span/@*[starts-with(name(), 'data-')]"
This gets all the attributes of span elements that start with 'data-'. (You might want to do this in two steps: first get all the elements you're interested in, then extract the data attributes from each in turn.)
Continuing the example (using the span in your question):
hash = data_attrs.each_with_object({}) do |n, hsh|
  hsh[n.name] = n.value
end
puts hash
produces:
{"data-age"=>"50", "data-location"=>"London"}
Try looping through element.attributes while ignoring any attribute that does not start with data-.
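A rough sketch of that approach, assuming element is a Nokogiri::XML::Element:
# Keep only the attributes whose names start with "data-"
data_attrs = element.attributes.select { |name, _| name.start_with?('data-') }
dataset = data_attrs.each_with_object({}) { |(name, attr), h| h[name] = attr.value }
# For the span in the question: {"data-age"=>"50", "data-location"=>"London"}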
The Node#css docs mention a way to attach a custom pseudo-selector. This might look like the following for selecting nodes with attributes starting with 'data-':
Nokogiri(html).css('span:regex_attrs("^data-.*")', Class.new {
  def regex_attrs(node_set, regex)
    node_set.find_all { |node| node.attributes.keys.any? { |k| k =~ /#{regex}/ } }
  end
}.new)
I'm trying to do a simple monkey test for my web page, which gets all the active elements on the page and clicks them in random order.
While doing this I want to write a log so I know which element my test clicked and on which one it crashed.
So I want the log file to look like this:
01.01.11 11.01.01 Clicked on Element <span id='myspan' class ='myclass .....>
01.01.11 11.01.01 Clicked on Element <span id='button' class ='myclass title = 'Button'.....>
or
01.01.11 11.01.01 Clicked on Element //*[#id='myspan']
01.01.11 11.01.01 Clicked on Element //*[#id='button']
Is there any way to do this in WebDriver + Ruby?
I don't think there is a way but you could always do something like this (with watir-webdriver):
browser.divs.each do |div|
  puts '<span ' + ['id', 'class', 'title'].map { |x| "#{x}='#{div.attribute_value(x)}'" }.join(' ') + '>'
end
WebDriver does not provide this type of functionality; you would have to get the page source and do some of your own parsing. I've done this with the Html Agility Pack in C#; you would need to find a similar library for Ruby (see: Options for HTML scraping?).
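As a rough illustration of the "parse the page source yourself" idea (not part of the original answer), you could feed the page source to Nokogiri and look the element up again by a known attribute such as its id:
require 'nokogiri'
# Assumes @driver is a Selenium::WebDriver instance and the clicked element has an id
doc = Nokogiri::HTML(@driver.page_source)
node = doc.at_css('#myspan')
puts "#{Time.now} Clicked on Element #{node.to_html}" if node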
You can do this:
Get all the elements that are clickable. For example, find all links and all clickable spans, and put those candidates in a list.
Randomly pick an element from that candidate list.
Click that element and write some log output; a rough sketch follows.
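A rough sketch of that loop, assuming @driver is a Selenium::WebDriver instance and that links, buttons, and elements with an onclick handler count as "clickable" (adjust the selector to your page):
candidates = @driver.find_elements(:css, 'a, button, [onclick]').select(&:displayed?)
target = candidates.sample
# Log a timestamp plus a minimal description of the element before clicking it
puts "#{Time.now} Clicked on Element <#{target.tag_name} id='#{target.attribute('id')}'>"
target.click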
I tweaked @pguardiario's answer to come up with this method:
def get_element_dom_info(e)
  if e.class != Selenium::WebDriver::Element
    raise "No valid element passed: #{e.class}"
  end
  attrs = ['id', 'class', 'title', 'href', 'src', 'type', 'name']
  return "<" + e.tag_name + attrs.map{ |x| " #{x}='#{e.attribute(x)}'" if e.attribute(x) && e.attribute(x) != "" }.join('') + ">"
end
Of course, it expects that the single parameter you pass in is an actual Selenium element. Also, it doesn't include every possible attribute, but it covers the majority of them (and you can always add extra attributes if needed).
I suppose you can integrate this via some code like this:
def clickElement(*args)
  ... # parse vars
  e = @driver.find_element(...)
  puts get_timestamp + " Clicked on Element: " + get_element_dom_info(e)
end
UPDATE
I recently realized that I could get the full HTML of the element using native JavaScript (d'oh!). You have to use a hack to get the "outerHTML". Here is my new method:
def get_element_dom_info(how, what)
  e = @driver.find_element(how, what)
  # Use native JavaScript to return the element's DOM info by cloning the element
  # node into a wrapper div and reading the wrapper's innerHTML.
  return @driver.execute_script("var f = document.createElement('div').appendChild(arguments[0].cloneNode(true)); return f.parentNode.innerHTML", e)
end
I'm trying to access a form using Mechanize (Ruby).
On my form I have a group of radio buttons.
So I want to check one of them.
I wrote:
target_form = (page/:form).find{ |elem| elem['id'] == 'formid'}
target_form.radiobutton_with(:name => "radiobuttonname")[2].check
In this line I want to check the radiobutton with the value of 2.
But in this line, I get an error:
: undefined method `radiobutton_with' for #<Nokogiri::XML::Element:0x9b86ea> (NoMethodError)
The problem occurred because using a Mechanize page as a Nokogiri document (by calling the / method, or search, or xpath, etc.) returns Nokogiri elements, not Mechanize elements with their special methods.
As noted in the comments, you can be sure to get a Mechanize::Form by using the form_with method to find your form instead.
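For example, something like this should hand you a Mechanize::Form directly (a sketch based on the ids/names in the question; the :id criterion assumes a reasonably recent Mechanize):
target_form = page.form_with(id: 'formid')
radio = target_form.radiobuttons_with(name: 'radiobuttonname')[2]
radio.check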
Sometimes, however, you can find the element you want with Nokogiri but not with Mechanize. For example, consider a page with a <select> element that is not inside a <form>. Since there is no form, you can't use the Mechanize field_with method to find the select and get a Mechanize::Form::SelectList instance.
If you have a Nokogiri element and you want the Mechanize equivalent, you can create it by passing the Nokogiri element to the constructor. For example:
sel = Mechanize::Form::SelectList.new( page.at_xpath('//select[@name="city"]') )
In your case where you had a Nokogiri::XML::Element and wanted a Mechanize::Form:
# Find the xml element
target_form = (page/:form).find{ |elem| elem['id'] == 'formid'}
target_form = Mechanize::Form.new( target_form )
P.S. The first line above is more simply achieved by target_form = page.at_css('#formid').