I'm picking up Ruby Mechanize and getting tripped up from the start.
Why does this code:
#!/usr/bin/env ruby
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://linkedin.com/')
#pp page
form = page.form.first
#form.fields.each { |f| puts f.name }
#pp page
spit out this error?
/home/ubuntu/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize/form.rb:217:in `method_missing': undefined method `first' for #<Mechanize::Form:0x007f9f2cf1ced0> (NoMethodError)
from 1-li.rb:10:in `<main>'
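page.form with no arguments already returns a single Mechanize::Form (the first form on the page), which is why calling first on it raises NoMethodError. You want to use the forms method instead of the form method.
Per the documentation, the forms method returns "a list of all form tags", so you can chain first onto it. For example: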
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.w3schools.com/html/html_forms.asp')
forms = page.forms
forms.class #=> Array
form = forms.first
form.class #=> Mechanize::Form
To get the first form on the page, use page.form or page.forms.first.
I was following this tutorial for screen scraping with Ruby and Watir.
I tried to write a simple script to return text from Wikipedia:
require "selenium-webdriver"
browser = Selenium::WebDriver.for :chrome
browser.get "https://wikipedia.org"
require "nokogiri"
puts doc.xpath(".//*[@id='langsearch-input']/p").inner_text
But when I run the script, I get this error in my terminal:
$ ruby app/views/layouts/scraper.rb
app/views/layouts/scraper.rb:7:in `<main>': undefined local variable or method `doc' for main:Object (NameError)
I have nokogiri 1.6.7.2, watir-webdriver 0.9.1, and watir 4.0.2 installed.
What am I doing wrong?
You are missing a line that converts the browser's HTML into a Nokogiri document. In other words, you have not defined what doc is.
require "selenium-webdriver"
browser = Selenium::WebDriver.for :chrome
browser.get "https://wikipedia.org"
require "nokogiri"
doc = Nokogiri::HTML.parse(browser.page_source)
puts doc.xpath(".//*[@id='langsearch-input']/p").inner_text
#=> ""
Note that while this fixes the exception, inner_text will return an empty string (""). The element with id "langsearch-input" is an input field, which does not have a child p element or a text node.
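To see why the result is empty, here is a minimal sketch that runs the same XPath (written with @id) against a hypothetical, simplified snippet containing such an input. It uses REXML from Ruby's standard library instead of Nokogiri, purely so it runs without extra gems; these simple expressions should behave the same way in both. Selecting the element itself and reading one of its attributes is one way to get something useful back:

```ruby
require 'rexml/document'

# Hypothetical, simplified markup: an <input> with the id the XPath targets.
html = '<html><body><form>' \
       '<input id="langsearch-input" type="search" placeholder="Find language"/>' \
       '</form></body></html>'

doc = REXML::Document.new(html)

# The original expression asks for <p> children of the input; an <input> has none.
p_children = REXML::XPath.match(doc, "//*[@id='langsearch-input']/p")
puts p_children.size # prints 0

# Selecting the element itself and reading an attribute returns real data.
input = REXML::XPath.first(doc, "//*[@id='langsearch-input']")
puts input.name                      # prints input
puts input.attributes['placeholder'] # prints Find language
```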
Also note that you are not actually using Watir at all. To use Watir, it would look like:
require 'watir-webdriver'
browser = Watir::Browser.new :chrome
browser.goto "https://wikipedia.org"
require 'nokogiri'
doc = Nokogiri::HTML.parse(browser.html)
puts doc.xpath(".//*[@id='langsearch-input']/p").inner_text
#=> ""
However, unless you are doing a lot of parsing of a single large HTML chunk, using Watir without Nokogiri might be easier:
require 'watir-webdriver'
browser = Watir::Browser.new :chrome
browser.goto "https://wikipedia.org"
puts browser.text_field(id: 'langsearch-input').value
I'm trying to go through a series of links with the CSS class title, click each of those links, and then get the product title. But I keep getting the error undefined method each for #<Mechanize::Page::Link:0x007fbfe2524410> (NoMethodError). I don't understand what I'm doing wrong.
Here's my code:
require 'mechanize'
file = File.new("outputscrape.txt", 'w')
agent = Mechanize.new { |agent|
  agent.user_agent_alias = 'Windows Chrome'
}
page = agent.get('http://www.amazon.com/s/ref=sr_nr_n_0?rh=n%3A283155%2Cn%3A%211000%2Cn%3A5%2Cn%3A15377001%2Cn%3A6133979011%2Cn%3A6133980011&bbn=6133979011&ie=UTF8&qid=1412193262&rnid=6133979011')
title_link = page.link_with(:dom_class => "title")
title_link.each do |link|
  link.click
  file.write(link.at('#productTitle').text.strip)
end
From the mechanize docs:
link_with(criteria)
Find a single link matching criteria.
You need to use:
links_with(criteria)
Find all links matching criteria.
The object mentioned in your error message, Page::Link:
undefined method each for #<Mechanize::Page::Link:0x007fbfe2524410> (NoMethodError)
doesn't sound like more than one thing, does it? More than one thing would be more like Page::Links, or Page::Link::Group, or Page::LinkSet. You are doing the equivalent of:
10.each do |number|
  puts number
end
However, numbers do not have an each() method, so that produces the error:
undefined method `each' for 10:Fixnum (NoMethodError)
Compare that to your error:
undefined method each for #<Mechanize::Page::Link:0x007fbfe2524410>
On the other hand, an Array does have an each() method, so you can do this:
[10, 20, 30].each do |number|
  puts number
end
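If you are not sure whether something hands you one object or a collection, you can ask the object before iterating. A minimal, stdlib-only sketch using the same numbers as above: respond_to?(:each) tells the two apart, and Kernel#Array wraps a plain single object in an array (and returns an array unchanged), so one loop can handle both. With Mechanize the actual fix is still links_with; this only illustrates the Ruby behavior behind the error:

```ruby
single = 10
collection = [10, 20, 30]

# An Integer does not respond to each; an Array does.
puts single.respond_to?(:each)     # prints false
puts collection.respond_to?(:each) # prints true

begin
  single.each { |number| puts number }
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}" # prints NoMethodError: each
end

# Array(10) is [10]; Array([10, 20, 30]) is the same array, so both loops work.
seen = []
Array(single).each { |number| seen << number }
Array(collection).each { |number| seen << number }
p seen # prints [10, 10, 20, 30]
```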
I've been trying to write a web scraper in Ruby to scrape a corporate events data website, and I'm referring to the Flickr example on the Mechanize docs page.
When I run corp_act_scrape.rb:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
home_page = agent.get("http://www.eventsdata.com/main.php#")
mypage = home_page.form_with(:name => 'loginForm') do |form|
  form.myusrname = ARGV[0]
  form.mypasswrd = ARGV[1]
end.submit
rows = page.css('#recentEventsDiv > div.RecentEventsDisplay > table > tbody > tr')
nextLink = page.link_with(:text => 'Next')
hasNextLink = nextLink?
while page.hasNextLink do
  puts rows
  page = agent.click(page.nextLink)
end
I receive the error:
corp_act_scrape.rb:9:in `block in <main>': undefined method `myusrname=' for nil:NilClass (NoMethodError)
from (eval):23:in `form_with'
from corp_act_scrape.rb:7:in `<main>'
Copying the Flickr example, it seems that I should be able to enter my username and password as methods, but it doesn't seem to work in practice. Also, that section of the code is pretty confusing to me. If you have an alternative method of submitting the form, please also let me know.
I'm using Ruby's Mechanize gem to log in to a website. Here's my problem:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://mail.163.com/')
form = page.form('login163')
form.username = 'myaccount'
form.password = 'mypassword'
page = agent.submit(form)
When I run the script, it doesn't work. The error is: "in `fetch': 405 => Net::HTTPMethodNotAllowed for http://mail.163.com/". What should I do?
I'm using the following code to download a page through a POST request:
require 'net/http'
require 'uri'
res = Net::HTTP.post_form(URI.parse('http://example.com'),{'post'=>'1'})
puts res.split("Date")
The URL I originally used has been replaced with example.com.
It works great, but when I call split (last line), it raises an error:
in `<main>': undefined method `split' for # (NoMethodError)
I'm new to ruby, so I'm confused about this error.
The method you are calling returns an HTTPResponse object (a Net::HTTPResponse subclass), not a String, so you need to use that object's methods to get what you want. Maybe something like:
require 'net/http'
require 'uri'
res = Net::HTTP.post_form(URI.parse('http://example.com'),{'post'=>'1'})
puts res.body.split("Date")
Notice the body method.
Or, if you want to see all the data returned:
require 'net/http'
require 'uri'
res = Net::HTTP.post_form(URI.parse('http://example.com'),{'post'=>'1'})
puts res.inspect
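If you want to try this without depending on a real site, here is a sketch under some assumptions: a throwaway local TCP server (hypothetical, standing in for the example.com endpoint) returns a canned response, and Net::HTTP.post_form posts to it. It shows that post_form hands back a response object rather than a String, that body is the String you can split, and that a Date response header, if that is what you were after, can be read with res['Date'] instead of splitting text:

```ruby
require 'net/http'
require 'socket'
require 'uri'

# Hypothetical local stand-in for the real endpoint (port 0 = any free port).
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
canned_body = "before Date: 2016-01-01 after"

server_thread = Thread.new do
  client = server.accept
  # Read the request headers, noting how many body bytes follow them.
  content_length = 0
  while (line = client.gets) && line != "\r\n"
    content_length = line.split(':', 2)[1].to_i if line =~ /\AContent-Length:/i
  end
  client.read(content_length) # discard the form data ("post=1")
  response = "HTTP/1.1 200 OK\r\n" \
             "Content-Type: text/plain\r\n" \
             "Content-Length: #{canned_body.bytesize}\r\n" \
             "Date: Fri, 01 Jan 2016 00:00:00 GMT\r\n" \
             "Connection: close\r\n" \
             "\r\n" + canned_body
  client.write(response)
  client.close
end

res = Net::HTTP.post_form(URI.parse("http://127.0.0.1:#{port}/"), 'post' => '1')
server_thread.join
server.close

puts res.class      # prints Net::HTTPOK (a response object, not a String)
puts res.body.class # prints String
parts = res.body.split("Date")
p parts             # prints ["before ", ": 2016-01-01 after"]
puts res['Date']    # prints the Date response header
```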
Hope this helps!