I'm using Ruby's Mechanize gem to log in to a website. Here is my problem:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://mail.163.com/')
form = page.form('login163')
form.username = 'myaccount'
form.password = 'mypassword'
page = agent.submit(form)
When I run the script, it doesn't work. The error is: "in `fetch': 405 => Net::HTTPMethodNotAllowed for http://mail.163.com/". What should I do?
Related
I'm trying to fetch and parse JSON from a URL but keep getting a 500 error. Any suggestions, please?
require 'open-uri'
require 'json'
require 'csv'
url = 'https://gist.githubusercontent.com/gregclermont/ca9e8abdff5dee9ba9db/raw/
7b2318efcf8a7048f720bcaff2031d5467a4a2c8/users.json'
encoded_url = URI.encode(url)
open(encoded_url) do |stream|
quote = JSON.parse(stream.read)
puts quote
end
require 'open-uri'
require 'json'
url = 'https://gist.githubusercontent.com/gregclermont/ca9e8abdff5dee9ba9db/raw/7b2318efcf8a7048f720bcaff2031d5467a4a2c8/users.json'
open(url) { |f| JSON.parse(f.read) }
Works fine for me.
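One visible difference between the two snippets is that the question's URL literal spans two lines while the answer's is on one. Assuming that line break is really in the script and is not a paste artifact, the string contains a newline, which `URI.encode` would have turned into `%0A` in the request path. The sketch below only shows how to detect and strip the stray whitespace; `squish_url` and `fetch_json` are hypothetical helper names, and `fetch_json` uses `URI.open` because Ruby 3.0 and later no longer provide `URI.encode` or open URLs through a bare `open`.

```ruby
require 'uri'
require 'open-uri'
require 'json'

# The URL laid out as in the question: the literal spans two lines.
broken_url = 'https://gist.githubusercontent.com/gregclermont/ca9e8abdff5dee9ba9db/raw/
7b2318efcf8a7048f720bcaff2031d5467a4a2c8/users.json'

# Hypothetical helper: remove any whitespace that crept into the literal.
def squish_url(url)
  url.gsub(/\s+/, '')
end

# Hypothetical helper: fetch and parse JSON (defined here, not called).
def fetch_json(url)
  URI.open(url) { |f| JSON.parse(f.read) }
end

clean_url = squish_url(broken_url)
uri = URI.parse(clean_url)

puts broken_url.include?("\n") # the original literal carries a newline
puts uri.host
puts uri.path
```

Calling `fetch_json(clean_url)` would then perform the same request as the one-line version in the answer.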
I was following this tutorial for screen scraping with Ruby and Watir.
I tried to write a simple script to return text from Wikipedia:
require "selenium-webdriver"
browser = Selenium::WebDriver.for :chrome
browser.get "https://wikipedia.org"
require "nokogiri"
puts doc.xpath(".//*[@id='langsearch-input']/p").inner_text
But when I run the script, I get this error in my terminal:
$ ruby app/views/layouts/scraper.rb
app/views/layouts/scraper.rb:7:in `<main>': undefined local variable or method `doc' for main:Object (NameError)
I have nokogiri 1.6.7.2, watir-webdriver 0.9.1, and watir 4.0.2 installed.
What am I doing wrong?
You are missing a line that converts the browser HTML into a Nokogiri document. In other words, you have not defined what doc is.
require "selenium-webdriver"
browser = Selenium::WebDriver.for :chrome
browser.get "https://wikipedia.org"
require "nokogiri"
doc = Nokogiri::HTML.parse(browser.page_source)
puts doc.xpath(".//*[@id='langsearch-input']/p").inner_text
#=> ""
Note that while this will address the exception, inner_text will return an empty string, i.e. "". The element with id "langsearch-input" is an input field, which does not have a child p element or a text node.
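The empty result can be reproduced offline. The sketch below is an illustration under two assumptions: the markup is a simplified, hypothetical stand-in for the Wikipedia search box, and REXML is used in place of Nokogiri only so that it runs without a browser or extra gems. The XPath expressions are the same ones you would pass to `doc.xpath`.

```ruby
require 'rexml/document'

# Hypothetical, simplified stand-in for the page's search form.
markup = <<~XML
  <form>
    <input id="langsearch-input" type="search" value="ruby"/>
  </form>
XML

doc = REXML::Document.new(markup)

# The XPath from the post asks for <p> children of the input; there are none.
paragraphs = REXML::XPath.match(doc, "//*[@id='langsearch-input']/p")

# Selecting the input itself and reading its value attribute does return data.
input = REXML::XPath.first(doc, "//*[@id='langsearch-input']")
input_value = input.attributes['value']

puts paragraphs.empty?
puts input_value
```

With Nokogiri the equivalent of the last step would be reading the `value` attribute of the matched node rather than asking for its inner text.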
Also note that you are not actually using Watir at all. To use Watir, it would look like:
require 'watir-webdriver'
browser = Watir::Browser.new :chrome
browser.goto "https://wikipedia.org"
require 'nokogiri'
doc = Nokogiri::HTML.parse(browser.html)
puts doc.xpath(".//*[@id='langsearch-input']/p").inner_text
#=> ""
However, unless you are doing a lot of parsing of a single large HTML chunk, using Watir without Nokogiri might be easier:
require 'watir-webdriver'
browser = Watir::Browser.new :chrome
browser.goto "https://wikipedia.org"
puts browser.text_field(id: 'langsearch-input').value
I'm picking up ruby mechanize & getting tripped up from the start...
Why does this code:
#!/usr/bin/env ruby
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://linkedin.com/')
#pp page
form = page.form.first
#form.fields.each { |f| puts f.name }
#pp page
spit out...
/home/ubuntu/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize/form.rb:217:in `method_missing': undefined method `first' for #<Mechanize::Form:0x007f9f2cf1ced0> (NoMethodError)
from 1-li.rb:10:in `<main>'
You want to use the forms method instead of the form method.
Per the documentation, the forms method returns "a list of all form tags", and you can then method-chain a first method. For example:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.w3schools.com/html/html_forms.asp')
forms = page.forms
forms.class #=> Array
form = forms.first
form.class #=> Mechanize::Form
To get the first form on the page, use page.form or page.forms.first.
I've been trying to write a webscraper in Ruby to scrape from a corporate events data website, and I'm referring to the Flickr example on the Mechanize docs page:
When I run corp_act_scrape.rb:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
home_page = agent.get("http://www.eventsdata.com/main.php#")
mypage = home_page.form_with(:name => 'loginForm') do |form|
form.myusrname = ARGV[0]
form.mypasswrd = ARGV[1]
end.submit
rows = page.css('#recentEventsDiv > div.RecentEventsDisplay > table > tbody > tr')
nextLink = page.link_with(:text => 'Next')
hasNextLink = nextLink?
while page.hasNextLink do
puts rows
page = agent.click(page.nextLink)
end
I receive the error:
corp_act_scrape.rb:9:in `block in <main>': undefined method `myusrname=' for nil:NilClass (NoMethodError)
from (eval):23:in `form_with'
from corp_act_scrape.rb:7:in `<main>'
Copying the Flickr example, it seems that I should be able to enter my username and password as methods, but it doesn't seem to work in practice. Also, that section of the code is pretty confusing to me. If you have an alternative method of submitting the form, please also let me know.
So far as I can tell the Google::Reader API is working fine, in that it returns an sid successfully. However, lower level interactions with gmail won't run properly:
warning: peer certificate won't be verified in this SSL session
#<Google::Reader::Base:0xb76efa0c
@email="hawat.thufir",
@password="pword",
@sid=
"DQAAAL4AAACq-Wrm1V_anY1sV4r_3kA4EuRax9oTt5z7upD6NNfT0e7bsN-8WA7cQOTt7zypI5fymS9Ux8QTtyu-7xal9c6szb2ZoeBR5dwPH_m7OrBe6ICkKY-dPus0_g5DFW6tckpCZmJIyrP9zfUQKJzGYjnYKJzJEJYFEdvMu756Hl68qeD6AuGKDdFWbyBEvgQGR2oFjkxHYGqwTQ9oHJBfBkMH9hrDl2Q9C_cVE5A-_Bb9RiUy6WuwIbS-pPN56z3XtpA">
#<URI::HTTPS:0xb76e7988 URL:https://hawat.thufir:pword@gmail.com>
#<Net::HTTP gmail.com:443 open=false>
#<Net::HTTP::Get GET>
["Basic aGF3YXQudGh1ZmlyOmRldm90Y2hrYQ=="]
#<Net::HTTP::Get GET>
/usr/lib/ruby/1.8/net/http.rb:1060:in `request': undefined method `closed?' for nil:NilClass (NoMethodError)
from ./req_uri.rb:23
code:
#!/usr/bin/ruby -w
require 'rubygems'
require 'google/reader'
require 'pp'
require 'net/http'
require 'net/https'
require 'uri'
require 'yaml'
yml = YAML.load_file 'login.yml'
user = yml["user"]
pword = yml["pword"]
pp Google::Reader::Base.establish_connection(user, pword)
uri = URI.parse "https://#{user}:#{pword}@gmail.com"
pp uri
pp http = Net::HTTP.new(uri.host, uri.port)
pp request = Net::HTTP::Get.new(uri.request_uri)
pp request.basic_auth(user, pword)
pp request
response = http.request(request)
So, the question is, should the request be basically empty when printed? What's wrong with sending the request to the response? That seems to be correct so far as I can ascertain. What am I missing?
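On the first part of the question: a `Net::HTTP::Get` object prints only its class and verb when inspected, so `#<Net::HTTP::Get GET>` does not mean the request is empty. The headers set by `basic_auth` are stored inside the object and can be read back. The sketch below shows this without sending anything over the network. The credentials are the placeholders from the post. Setting `use_ssl` is an assumption added here: `Net::HTTP.new` does not turn on TLS by itself for an https URI. Whether that is what triggers the `closed?` error on Ruby 1.8 is not established by the post.

```ruby
require 'net/http'
require 'uri'

# Placeholder credentials from the post, not real ones.
user  = 'hawat.thufir'
pword = 'pword'

uri = URI.parse("https://#{user}:#{pword}@gmail.com")

http = Net::HTTP.new(uri.host, uri.port)
# Assumption: TLS has to be enabled explicitly for an https:// URI.
http.use_ssl = true

request = Net::HTTP::Get.new(uri.request_uri)
request.basic_auth(user, pword)

# inspect shows only class and verb; headers are stored inside the object.
summary = request.inspect
auth_header = request['authorization']

# Decode the Base64 part of "Basic <token>" to see what would be sent.
decoded = auth_header.split(' ', 2).last.unpack1('m')

puts summary
puts auth_header
puts decoded
```

Sending it would then be `http.request(request)`, as in the script above, ideally inside `http.start { ... }` so the connection is opened and closed explicitly.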