I've been trying to write a webscraper in Ruby to scrape from a corporate events data website, and I'm referring to the Flickr example on the Mechanize docs page:
When I run corp_act_scrape.rb:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
home_page = agent.get("http://www.eventsdata.com/main.php#")
mypage = home_page.form_with(:name => 'loginForm') do |form|
form.myusrname = ARGV[0]
form.mypasswrd = ARGV[1]
end.submit
rows = page.css('#recentEventsDiv > div.RecentEventsDisplay > table > tbody > tr')
nextLink = page.link_with(:text => 'Next')
hasNextLink = nextLink?
while page.hasNextLink do
puts rows
page = agent.click(page.nextLink)
end
I receive the error:
corp_act_scrape.rb:9:in `block in <main>': undefined method `myusrname=' for nil:NilClass (NoMethodError)
from (eval):23:in `form_with'
from corp_act_scrape.rb:7:in `<main>'
Copying the Flickr example, it seems that I should be able to enter my username and password as methods, but it doesn't seem to work in practice. Also, that section of the code is pretty confusing to me. If you have an alternative method of submitting the form, please also let me know.
I'm picking up ruby mechanize & getting tripped up from the start...
Why does this code:
#!/usr/bin/env ruby
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://linkedin.com/')
#pp page
form = page.form.first
#form.fields.each { |f| puts f.name }
#pp page
spit out...
/home/ubuntu/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize/form.rb:217:in `method_missing': undefined method `first' for #<Mechanize::Form:0x007f9f2cf1ced0> (NoMethodError)
from 1-li.rb:10:in `<main>'
You want to use the forms method instead of the form method.
Per the documentation, the forms method returns "a list of all form tags", and you can then method-chain a first method. For example:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.w3schools.com/html/html_forms.asp')
forms = page.forms
forms.class #=> Array
form = forms.first
form.class #=> Mechanize::Form
To get the first form on the page, use page.form or page.forms.first
I'm trying to write Rspec tests for my Mechanize agent.
My agent is supposed to go to a website, log into the form, then scrape some data off the website. I also downloaded FakeWeb to stub the HTTP requests, and make my tests faster.
Here is my account_spec.spec file:
require 'spec_helper'
describe Account do
before(:each) { @account = Account.new('bob', '1234') }
describe '#login' do
before(:each) do
home_page = File.read('spec/html/home_page.html')
login_page = File.read('spec/html/login_page.html')
FakeWeb.register_uri(:get,
"https://www.example.com/",
body: home_page,
status: ["200", "Success"],
content_type: "text/html")
FakeWeb.register_uri(:get,
"https://www.example.com/account/login",
body: login_page,
status: ["200", "Success"],
content_type: "text/html")
@web_crawler = Mechanize.new
@home_page = @web_crawler.get("https://www.example.com/")
@login_page = @web_crawler.get("https://www.example.com/account/login")
end # -- before :each
it 'finds the login form' do
login_form = @login_page.form_with(:class => "form login")
puts login_form.class # ==> nil:NilClass
end
end # -- #login
end # -- Account
However, when I comment out the FakeWeb uri for example/account/login (it then accesses the real server), it actually returns the correct form. Basically, if I am searching for the form in my locally saved HTML file, Mechanize can not find it, but if I check the actual website, it does find it. I would like to know if there is a way around this, and why this happens.
Any help would be greatly appreciated.
I'm trying to go through a series of links with the CSS class title, click each link, and then get the product title. But I keep getting the error undefined method each for #<Mechanize::Page::Link:0x007fbfe2524410> (NoMethodError). I don't understand what I'm doing wrong.
Here's my code:
require 'mechanize'
file = File.new("outputscrape.txt", 'w')
agent = Mechanize.new { |agent|
agent.user_agent_alias = 'Windows Chrome'}
page = agent.get('http://www.amazon.com/s/ref=sr_nr_n_0?rh=n%3A283155%2Cn%3A%211000%2Cn%3A5%2Cn%3A15377001%2Cn%3A6133979011%2Cn%3A6133980011&bbn=6133979011&ie=UTF8&qid=1412193262&rnid=6133979011')
title_link = page.link_with(:dom_class => "title")
title_link.each do |link|
link.click
file.write(link.at('#productTitle').text.strip)
end
From the mechanize docs:
link_with(criteria)
Find a single link matching criteria.
You need to use:
links_with(criteria)
Find all links matching criteria.
The object mentioned in your error message, Page::Link:
undefined method each for #<Mechanize::Page::Link:0x007fbfe2524410>
(NoMethodError)
doesn't sound like more than one thing, does it? More than one thing would be more like Page::Links, or Page::Link::Group, or Page::LinkSet. You are doing the equivalent of:
10.each do |number|
puts number
end
However, numbers do not have an each() method, so that produces the error:
undefined method `each' for 10:Fixnum (NoMethodError)
Compare that to your error:
undefined method each for #<Mechanize::Page::Link:0x007fbfe2524410>
On the other hand an Array does have an each() method, so you can do this:
[10, 20, 30].each do |number|
puts number
end
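The same single-object-versus-collection split exists in core Ruby, which may make the link_with / links_with difference easier to remember. The sketch below is only an analogy in plain Ruby: the Link class and the sample data are hypothetical stand-ins, not Mechanize objects. Enumerable#find returns one element (or nil), while select always returns an Array, which responds to each.

```ruby
# Hypothetical stand-in for a link object; not Mechanize's own class.
class Link
  attr_reader :text, :dom_class

  def initialize(text, dom_class)
    @text = text
    @dom_class = dom_class
  end
end

links = [
  Link.new('Book A', 'title'),
  Link.new('Help', 'nav'),
  Link.new('Book B', 'title')
]

# Comparable to link_with: a single object, or nil when nothing matches.
one = links.find { |l| l.dom_class == 'title' }
puts "single object responds to each? #{one.respond_to?(:each)}"

# Comparable to links_with: always an Array, even when it is empty.
many = links.select { |l| l.dom_class == 'title' }
many.each { |l| puts l.text }
```

Calling one.each here would raise the same kind of NoMethodError as in the question, because a single Link has no each method.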
I wrote a script with Mechanize to scrape some links; later I will write code to put them into an Excel file.
For now I can't authenticate past the first page. I keep getting undefined method value= for nil:NilClass when attempting to set the password in the form, and I haven't been able to find any information on it.
I don't even have the method value= in my code so I don't understand what is going on. The code runs fine for the username, but once I enter the password and hit enter I get the error:
users.rb:11:in `block (2 levels) in <main>': undefined method `value=' for nil:NilClass (NoMethodError)
from (eval):23:in `form_with'
from formity_users.rb:7:in `block in <main>'
from /home/codelitt/.rvm/gems/ruby-2.0.0-p247/gems/mechanize-2.7.1/lib/mechanize.rb:433:in `get'
from formity_users.rb:5:in `<main>'
This is my users.rb script:
require 'rubygems'
require 'mechanize'
a = Mechanize.new
a.get('https://www.example.com') do |page|
#Enter information into forms
logged_in = page.form_with(:id => 'frmLogin') do |f|
puts "Username?"
f.field_with(:name => "LoginCommand.EmailAddress").value = gets.chomp
puts "Password?"
f.field_with(:name => "Login.Password").value = gets.chomp
end.click_button
#Click drop down
admin_page = logged_in.click.link_with(:text => /Admin/)
#Click Users and enter user admin section
user_admin = admin_page.click.link_with(:text => /Users/)
#Scrape and print links for now
user_admin.links.each do |link|
text = link.text.strip
next unless text.length > 0
puts text
end
end
I think your error is coming from
f.field_with(:name => "Login.Password")
which seems to be nil. For the username you specified the input name LoginCommand.EmailAddress, but for the password the input name is Login.Password.
I'd expect whoever wrote this markup to use consistent names. Look at the underlying HTML to check that you're using the correct field names in your code.
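As for why the error mentions value= when the script never defines it: Ruby treats obj.value = x as a call to a method named value= on obj, so when the lookup returns nil, that call lands on nil. Here is a small plain-Ruby sketch of the mechanism. The Field class, the find_field helper, and the field names email, password and passwd are hypothetical stand-ins, not Mechanize's API or the real site's names.

```ruby
# Hypothetical stand-in for a form field.
class Field
  attr_reader :name
  attr_accessor :value

  def initialize(name)
    @name = name
  end
end

fields = [Field.new('email'), Field.new('password')]

# Returns the matching field, or nil when no field has that name.
def find_field(fields, name)
  fields.find { |f| f.name == name }
end

error_name = nil
begin
  # 'passwd' does not exist, so find_field returns nil and
  # `.value = ...` becomes a call to value= on nil.
  find_field(fields, 'passwd').value = 'secret'
rescue NoMethodError => e
  error_name = e.name
  puts "#{e.class}: method #{error_name.inspect} called on #{e.receiver.inspect}"
end

# Guarding the lookup gives a clearer failure that lists the known names.
wanted = 'password'
field = find_field(fields, wanted)
raise "no field named #{wanted}; known: #{fields.map(&:name).inspect}" if field.nil?
field.value = 'secret'
puts field.value
```

In the Mechanize script itself, printing f.fields.map(&:name) inside the form_with block should show which names the form really contains.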
I'm using Ruby's Mechanize gem to log in to a website. Here is my problem:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://mail.163.com/')
form = page.form('login163')
form.username = 'myaccount'
form.password = 'mypassword'
page = agent.submit(form)
When I run the script, it doesn't work. The error is: "in `fetch': 405 => Net::HTTPMethodNotAllowed for http://mail.163.com/". What should I do?