cURL works but mechanize doesn't - ruby

I am trying to scrape my university website using Ruby Mechanize. This is my Ruby script:
require 'mechanize'
agent = Mechanize.new
agent.get('https://kampus.izu.edu.tr')
This script doesn't return the response I expect. I need to see the login page, but the response is different. I also tried it with cURL like this:
curl https://kampus.izu.edu.tr
This works and returns the login page. What am I missing?

Make sure that you are storing the output of agent.get(). From your example, I don't see how you would be using/printing the response of this request.
Try this:
require 'mechanize'
agent = Mechanize.new
page = agent.get("https://kampus.izu.edu.tr")
puts page.body
The .get() method returns a Mechanize::Page object that you can call other methods on, such as .css(), to select elements by CSS selectors. Check out the Mechanize documentation for details.
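To see why the Mechanize result differs from what cURL prints, it can help to look at the status code, final URL, and title alongside the body. The following is only a minimal sketch: `describe_response` and `fetch` are hypothetical helper names, and the code assumes nothing more than a page object that responds to `code`, `uri`, `title`, and `body`, as a Mechanize::Page does.

```ruby
# Hypothetical helper: summarize a response so it can be compared with curl.
# Works with any object that responds to code, uri and body (title optional).
def describe_response(page)
  title = page.respond_to?(:title) ? page.title : nil
  [
    "status: #{page.code}",
    "final URL: #{page.uri}",
    "title: #{title.inspect}",
    "body bytes: #{page.body.bytesize}"
  ].join("\n")
end

# Hypothetical helper: fetch a URL with Mechanize (the gem is loaded on first use).
def fetch(url)
  require 'mechanize'
  Mechanize.new.get(url)
end

# Usage (performs a real HTTP request):
#   puts describe_response(fetch('https://kampus.izu.edu.tr'))
```

If the final URL or title is not what you expect, the server probably redirected the request or served a different page to this client.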

Related

Trouble logging in to Pinterest with ruby mechanize

I am trying to build a simple crawler that can login to Pinterest and pin a few things to my board.
The first step of this is successfully login. I read through the documentation and it seems like this should work but it doesn't.
When I run the code I expect it to print out a title like "Mary... is mary... on Pinterest"
But instead the title of the page is "Pinterest-The Visual Discovery Tool"
I think there's something wrong with my script.
require 'rubygems'
require 'mechanize'
require 'pry'
a = Mechanize.new
a.get('https://www.pinterest.com/login/') do |page|
  form = page.forms.first
  form.fields[0].value = "m...#gmail.com"
  form.fields[1].value = "some_password"
  new_page = form.submit
  puts new_page.title
end
Keep in mind that Mechanize cannot execute JavaScript, so a page that depends on JavaScript may not load correctly. I only skimmed the page source, but the Pinterest login page looks heavily dependent on JavaScript and therefore can't be crawled effectively with Mechanize.
Another option might be to drive a real or headless browser with a tool like Watir or Selenium.
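One way to check whether the login form is present in the static HTML at all is to look for a form with a password field before filling anything in. This is a hedged sketch: `find_login_form` is a hypothetical helper, and it assumes the page exposes `forms`, each form exposes `fields`, and each field exposes `type`, which is how Mechanize forms behave as far as I know.

```ruby
# Hypothetical helper: return the first form that contains a password field,
# or nil when the static HTML has no such form. A nil result hints that the
# login form is built by JavaScript, which Mechanize does not execute.
def find_login_form(page)
  page.forms.find do |form|
    form.fields.any? { |field| field.type.to_s.downcase == 'password' }
  end
end

# Usage with Mechanize (performs a real HTTP request):
#   require 'mechanize'
#   page = Mechanize.new.get('https://www.pinterest.com/login/')
#   form = find_login_form(page)
#   abort 'No login form found in the static HTML' if form.nil?
```

Selecting the form this way also avoids relying on `forms.first` and `fields[0]`, which can silently pick a search box or a hidden field.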

Difficulty Accessing Section of Website using Ruby Mechanize

I am trying to access the calendar data on an airbnb listing and so far have been unsuccessful. I am using the Mechanize gem in Ruby, and when I try to access the link to access the table, I am encountering the following error:
require 'mechanize'
agent = Mechanize.new
page1=agent.get("https://www.airbnb.com/rooms/726348")
page2=agent.get("https://www.airbnb.com/rooms/calendar_tab_inner2/726348?cal_month=11&cal_year=2013&currency=USD")
Mechanize::ResponseCodeError: 400 => Net::HTTPBadRequest for https://www.airbnb.com/rooms/calendar_tab_inner2/726348?cal_month=11&cal_year=2013&currency=USD -- unhandled response
I have also tried to click on the tab that generates the table with the following code, but doing so simply generates the html from the original url.
agent = Mechanize.new
page1=agent.get("https://www.airbnb.com/rooms/726348")
page2=agent.click(page1.link_with(:href => '#calendar'))
Any help would be greatly appreciated. Thanks!
I see the problem. The calendar tab is most likely loaded through an AJAX call, so you need to send the matching request header:
page = agent.get url, nil, nil, {'X-Requested-With' => 'XMLHttpRequest'}
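The fourth argument of Mechanize's `get` carries extra request headers. The sketch below wraps that call; `ajax_headers` and `ajax_get` are hypothetical helper names, and the only assumption about `agent` is that it accepts `get(uri, parameters, referer, headers)` the way Mechanize does.

```ruby
# Hypothetical helper: headers that mark a request as an XMLHttpRequest.
# Extra headers can be merged in by the caller.
def ajax_headers(extra = {})
  { 'X-Requested-With' => 'XMLHttpRequest' }.merge(extra)
end

# Hypothetical helper: issue a GET the way a browser's AJAX call would.
# No query parameters are added; the referer is optional.
def ajax_get(agent, url, referer = nil)
  agent.get(url, [], referer, ajax_headers)
end

# Usage with Mechanize (performs real HTTP requests):
#   require 'mechanize'
#   agent = Mechanize.new
#   page1 = agent.get('https://www.airbnb.com/rooms/726348')
#   page2 = ajax_get(agent, calendar_url, page1)  # calendar_url: the calendar tab URL
```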

Why do we need user_agent_alias with a Mechanize object?

I was looking for some information about Mechanize and found the code below on the Internet:
require 'mechanize'
require 'logger'
agent = Mechanize.new
agent.user_agent_alias = 'Windows IE 9'
agent.follow_meta_refresh = true
agent.log = Logger.new(STDOUT)
Could anyone please explain why user_agent_alias and follow_meta_refresh are needed when Mechanize itself is a browser?
Mechanize isn't a browser. It is a page parser that gives you enough methods to make it easy/convenient to navigate through a site. But, in no way is it a browser.
user_agent_alias sets the User-Agent signature that Mechanize sends when it makes page requests. In your example it's masquerading as "IE 9". That will satisfy a site that only looks at the User-Agent header, but it isn't going to fool any system that sniffs the browser in other ways, such as with JavaScript.
follow_meta_refresh tells Mechanize to follow redirects declared in "meta" tags with the "refresh" parameter. Search for those tags or read the documentation for follow_meta_refresh to see the details.
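For illustration, a meta refresh tag looks like `<meta http-equiv="refresh" content="5; url=https://example.com/next">`. The sketch below pulls the delay and target out of such a tag with a regular expression. It is not how Mechanize implements the feature; `meta_refresh_target` is a hypothetical helper, and the pattern assumes `http-equiv` appears before `content`.

```ruby
# Rough pattern for <meta http-equiv="refresh" content="N; url=...">.
# Assumes http-equiv comes before content; a real HTML parser is more robust.
META_REFRESH = /<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']\s*(\d+)\s*(?:;\s*url=\s*([^"'>]+))?/i

# Hypothetical helper: return [delay_in_seconds, target_url_or_nil],
# or nil when the HTML contains no meta refresh tag.
def meta_refresh_target(html)
  match = META_REFRESH.match(html)
  return nil unless match

  [match[1].to_i, match[2]&.strip]
end

html = '<meta http-equiv="refresh" content="5; url=https://example.com/next">'
delay, target = meta_refresh_target(html)
puts "refresh after #{delay}s to #{target}"
```

A plain HTTP client stops at the page that contains this tag, while a browser moves on to the target; setting follow_meta_refresh asks Mechanize to do the latter.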

Login to Vimeo Via Mechanize (ruby)

I am trying to login to my vimeo account using Mechanize in order to scrape hundreds of video titles and urls. Here is my code:
task :import_list => :environment do
  require 'rubygems'
  require 'mechanize'
  agent = Mechanize.new
  # "Mac Safari" is an alias name, so it belongs in user_agent_alias
  agent.user_agent_alias = "Mac Safari"
  puts "Logging in..."
  page = agent.get("http://vimeo.com/log_in")
  form = page.forms[0]
  form.fields[0].value = 'sample#email.com'
  form.fields[1].value = 'somepassword'
  page = agent.submit(form)
  pp page
end
and my error message:
401 => Net::HTTPUnauthorized
This is running through a rake task if it matters at all.
Any ideas?
I'm not sure how to do it with Mechanize, but here is code to do it with Capybara:
require 'capybara/dsl'
require 'selenium-webdriver'
Capybara.run_server = false
Capybara.default_driver = :selenium
class Vimeo
  include Capybara::DSL
  def go
    visit "https://vimeo.com/log_in"
    fill_in "email", :with => "your-email"
    fill_in "password", :with => "your-password"
    find("span.submit > input").click
  end
end
v = Vimeo.new
v.go
Also, Capybara with the Selenium driver runs a real browser that executes JavaScript, so it is better suited to scraping JavaScript-heavy sites.
My first thought was:
Vimeo login does not work without JavaScript, so it's not possible to log in with Mechanize.
To test my bold statement:
Without JavaScript:
disable JavaScript for all sites in your browser
try to log in (fill out the form in your browser like you normally do)
you'll get an unauthorized message on the resulting page
With JavaScript:
enable JavaScript
everything works as expected
Update
Vimeo.com uses the following query string when logging in.
I'm going to try posting the string manually with Mechanize.
action=login&service=vimeo&email=your-email&password=your-password&token=k7yd5du3L9aa5577bb0e8fc
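A string in this form can be built and taken apart with Ruby's standard URI module. This is a sketch under assumptions: `build_login_body` is a hypothetical helper, the field names are copied from the string above, and the token is presumably issued by the login page, which the source does not confirm.

```ruby
require 'uri'

# Hypothetical helper: build the form-encoded login string shown above.
# Values are percent-encoded, so characters such as "@" are safe to pass in.
def build_login_body(email:, password:, token:)
  URI.encode_www_form(
    action: 'login',
    service: 'vimeo',
    email: email,
    password: password,
    token: token
  )
end

body = build_login_body(email: 'user@example.com', password: 'secret', token: 'abc123')
puts body

# Decoding gives the individual fields back as a hash.
fields = URI.decode_www_form(body).to_h
puts fields['email']
```

With Mechanize, the same fields could be sent as a hash via `agent.post(login_url, fields)`, where `login_url` stands for the form's action URL, which is not given here.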
Update 2
I've got a Ruby Rake task that logs in to a Vimeo Pro account and reads the HTTP Live Streaming link from a video settings page.
Update 3
I've posted a working Ruby Rake task: https://gist.github.com/webdevotion/5635755.
Have you tried using the official Vimeo API?
It seems that the login request includes a 'token' parameter.
Part of the HTTP request:
action=login&service=vimeo&email=your_mail&password=asfsdfsdf&token=51605c24c92a4d4706ecbe9ded7e3851

Mechanize on HTTPS site

Has anyone used the Mechanize gem on a site that required SSL?
When I try to access such a website Mechanize tries to use standard HTTP which results in endless redirections between http:// and https://.
Mechanize works just fine with HTTPS. Try setting
agent.log = Logger.new(STDOUT)
to see what's going on between Mechanize and the server. If you are still having trouble, post a sample of the code and somebody will help.
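The logger can also write to a file or any other IO object, which makes a long redirect chain easier to read afterwards. A minimal sketch using only Ruby's standard Logger; `build_http_logger` is a hypothetical helper name.

```ruby
require 'logger'

# Hypothetical helper: build a logger for HTTP traffic.
# io may be $stdout, a File, or a StringIO; DEBUG is the most verbose level.
def build_http_logger(io = $stdout, level = Logger::DEBUG)
  logger = Logger.new(io)
  logger.level = level
  logger
end

# Usage with Mechanize:
#   require 'mechanize'
#   agent = Mechanize.new
#   agent.log = build_http_logger(File.open('mechanize.log', 'a'))
#   agent.get('https://example.com/')
```

In the log, a loop between http:// and https:// shows up as the same pair of URLs requested over and over.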
I just gave Mechanize a try with my company's web site. The home page is HTTP, but it contains a link, "customer login," which sends the browser to an HTTPS page. It worked fine. The code is:
#!/usr/bin/env ruby
require 'rubygems'
require 'mechanize'
# Current Mechanize releases expose the class as Mechanize (formerly WWW::Mechanize)
agent = Mechanize.new
page = agent.get("http://www.not_the_real_url.com")
# Following this link moves the session from HTTP to HTTPS
link = page.link_with(:text=>"CUSTOMER LOGIN")
page = link.click
form = page.forms.first
form['user_login'] = 'not my real login name'
form['user_password'] = 'not my real password'
page = form.submit

Resources