Mechanize on HTTPS site - ruby

Has anyone used the Mechanize gem on a site that required SSL?
When I try to access such a website, Mechanize tries to use standard HTTP, which results in endless redirections between http:// and https://.

Mechanize works just fine with HTTPS. Try setting
agent.log = Logger.new(STDOUT)
(after a require 'logger') to see what's going on between Mechanize and the server. If you are still having trouble, post a sample of the code and somebody will help.
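A minimal sketch of that logging setup (writing to an in-memory StringIO here so the output can be inspected; in practice you would pass STDOUT or a log file and assign the logger to agent.log as suggested above):

```ruby
require 'logger'
require 'stringio'

buffer = StringIO.new          # stands in for STDOUT here
log = Logger.new(buffer)
log.level = Logger::DEBUG      # Mechanize logs its traffic at DEBUG

# Mechanize would emit lines like this for each request/response:
log.debug('GET https://example.com/ -> 200 OK')

puts buffer.string
```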

I just gave Mechanize a try with my company's web site. The home page is HTTP, but it contains a link, "customer login," which sends the browser to an HTTPS page. It worked fine. The code is:
#!/usr/bin/ruby1.8
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new # with Mechanize >= 2.0, use Mechanize.new
page = agent.get("http://www.not_the_real_url.com")
link = page.link_with(:text=>"CUSTOMER LOGIN")
page = link.click
form = page.forms.first
form['user_login'] = 'not my real login name'
form['user_password'] = 'not my real password'
page = form.submit

Related

cURL works but mechanize doesn't

I am trying to scrape my university's web site using Ruby Mechanize. This is my Ruby script:
require 'mechanize'
agent = Mechanize.new
agent.get('https://kampus.izu.edu.tr')
This script doesn't return a response. I expect to see the login page, but the response is different. I also tried it with cURL, like this:
curl https://kampus.izu.edu.tr
This works and returns the login page. What am I missing?
Make sure that you are storing the output of agent.get(). From your example, I don't see how you would be using/printing the response of this request.
Try this:
require 'mechanize'
agent = Mechanize.new
page = agent.get("https://kampus.izu.edu.tr")
puts page.body
The .get() method returns a Mechanize::Page object that you can call other methods on, such as .css(), to select elements by CSS selectors. Check out the Mechanize documentation for details.

Net::HTTPNotFound for https://dashboard.stripe.com/login -- unhandled response

We need to retrieve the logs for our Stripe instance for a specific time period. There isn't an endpoint in their API (grrrr), so we are trying a quick screen scrape, because the dashboard structures them quite nicely.
At this point though I can't even log into Stripe using Mechanize. Below is the code I am using to log in
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.follow_meta_refresh = true
starting_link = 'https://dashboard.stripe.com/login'
page = agent.get(starting_link)
login_form = page.form
login_form.email = email
login_form.password = pass
new_page = agent.submit(login_form, login_form.buttons[0])
The response I get from running this is:
Mechanize::ResponseCodeError: 404 => Net::HTTPNotFound for https://dashboard.stripe.com/login -- unhandled response
from /Users/Nicholas/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:316:in `fetch'
from /Users/Nicholas/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.4/lib/mechanize.rb:1323:in `post_form'
from /Users/Nicholas/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.4/lib/mechanize.rb:584:in `submit'
from (irb):21
from /Users/Nicholas/.rvm/rubies/ruby-2.2.2/bin/irb:11:in `<main>'
I tried logging into several other sites and was successful. I also aliased the agent and handled the redirect (a strategy mentioned in other questions).
Does anyone know what tweaks could be made to Mechanize log into Stripe?
Thanks much
Short Answer:
I would suggest using a browser engine like Selenium to get the logs data as that will be much simpler.
Long Answer:
Though your Mechanize form-submission code is correct, it assumes the Stripe login form is submitted via a normal POST request, which is not the case.
The Stripe login form is being submitted using an AJAX request.
Here is the working code to take that into account:
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.follow_meta_refresh = true
starting_link = 'https://dashboard.stripe.com/login'
page = agent.get(starting_link)
login_form = page.form
login_form.action = 'https://dashboard.stripe.com/ajax/sessions'
login_form.email = email
login_form.password = password
new_page = agent.submit(login_form, login_form.buttons[0])
As you can see, simply setting the form's action property to the AJAX URL solves the problem.
However, once you have logged in successfully, navigating around the site to scrape the logs will not be possible with Mechanize, as it does not support JavaScript. You can check this by requesting the dashboard's URL; you will get an error message telling you to enable JavaScript.
Further, Stripe's dashboard is fully JavaScript-powered: it makes AJAX requests to fetch data from the server and then renders it as HTML.
This can work in your favor, as the server response is JSON. You can simply parse it and pull the required information from the logs.
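As a sketch of that last step: the JSON shape below is hypothetical (the real Stripe payload must be inspected in the browser's developer tools), but parsing it needs nothing beyond Ruby's stdlib:

```ruby
require 'json'

# Hypothetical response body -- the real Stripe payload will differ.
body = '{"data":[{"id":"req_1","method":"POST","status":200}],"total_count":1}'

logs = JSON.parse(body)
logs['data'].each do |entry|
  puts "#{entry['method']} #{entry['id']} -> #{entry['status']}"
end
```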
Upon further inspection (in Chrome Developer Tools), I found that the logs are requested from the URL https://dashboard.stripe.com/ajax/logs?count=5&include%5B%5D=total_count&limit=10&method=not_get
Again, if you try to access this URL using Mechanize, you will run into a CSRF-token problem; the token is maintained between requests by Stripe.
The CSRF-token problem can be solved using Mechanize's cookies, but it will not be worth the effort.
I would suggest using a browser engine like Selenium to get the logs data, as that will be much simpler.

Can mechanize work with browsers?

I am using Ruby's Mechanize gem to automate a file upload after logging in to a particular site.
I am able to log in using:
#!/usr/bin/ruby
require 'rubygems'
require 'mechanize'
#creating an object for Mechanize class
a = Mechanize.new { |agent|
  # site refreshes after login
  agent.follow_meta_refresh = true
}
# Getting the page
a.get('https://www.samplesite.com/') do |page|
  puts page.title
  form = page.forms.first
  form.fields.each { |f| puts f.name }
  form['username'] = "username"
  form['password'] = "password"
  # Then submitting the form and reaching the next page
  form.submit
end
Now there are two questions...
a. Can I see this happening in a browser using any agent or tool?
b. Is there any way to make Mechanize wait for the page to load?
Have you tried Selenium WebDriver?
It integrates easily with your Ruby program.

Login to Vimeo Via Mechanize (ruby)

I am trying to log in to my Vimeo account using Mechanize in order to scrape hundreds of video titles and URLs. Here is my code:
task :import_list => :environment do
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
agent.user_agent = "Mac Safari"
puts "Logging in..."
page = agent.get("http://vimeo.com/log_in")
form = page.forms[0]
form.fields[0].value = 'sample@email.com'
form.fields[1].value = 'somepassword'
page = agent.submit(form)
pp page
end
and my error message:
401 => Net::HTTPUnauthorized
This is running through a rake task if it matters at all.
Any ideas?
Not sure how to do it with Mechanize, but here is code to do it with Capybara:
require 'capybara/dsl'
require 'selenium-webdriver'
Capybara.run_server = false
Capybara.default_driver = :selenium
class Vimeo
include Capybara::DSL
def go
visit "https://vimeo.com/log_in"
fill_in "email", :with => "you@example.com"
fill_in "password", :with => "yourpassword"
find("span.submit > input").click
end
end
v = Vimeo.new
v.go
Also, Capybara is better for scraping JavaScript-heavy sites.
My first thought was:
Vimeo login does not work without JavaScript, so it's not possible to log in with Mechanize.
To test my bold statement:
Without JavaScript:
disable JavaScript for all sites in your browser
try to log in (fill out the form in your browser like you normally do)
you'll get an unauthorized message on the resulting page
With JavaScript:
enable JavaScript
everything works as expected
Update:
Vimeo.com uses the following query string when logging in. I'm going to try posting the string manually with Mechanize.
action=login&service=vimeo&email=your-email&password=your-password&token=k7yd5du3L9aa5577bb0e8fc
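For what it's worth, a query string like that can be assembled with Ruby's stdlib before posting it manually. Everything below is a placeholder (the email, password, and especially the token, which has to be scraped from the login page for the current session):

```ruby
require 'uri'

# All values are placeholders -- the token in particular must be
# scraped from the login page for the session you are using.
params = {
  'action'   => 'login',
  'service'  => 'vimeo',
  'email'    => 'you@example.com',
  'password' => 'secret',
  'token'    => 'placeholder-token'
}

body = URI.encode_www_form(params)
# body could then be POSTed with agent.post(url, body, headers)
puts body
```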
Update 2:
I've got a Ruby Rake task that logs in to a Vimeo Pro account and reads the HTTP Live Streaming link from a video settings page.
Update 3:
I've posted a working Ruby Rake task: https://gist.github.com/webdevotion/5635755.
Have you tried using the official Vimeo API?
It seems that authorization requires some 'token'.
The relevant part of the login request:
action=login&service=vimeo&email=your_mail&password=asfsdfsdf&token=51605c24c92a4d4706ecbe9ded7e3851

Screen scraping Akamai's control panel using Mechanize for Ruby - Cookies Issue

I am attempting to screen-scrape some data from Akamai's control panel, but I am having trouble logging in to the page via Mechanize for Ruby.
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
url = 'http://control.akamai.com'
page = agent.get( url )
puts page.content
Upon examining the page, I find displayed:
"Cookie support has been disabled in your browser. Please enable cookies before continuing."
The fact that the page thinks I have cookies disabled prevents me from logging in. Any thoughts?
You can specify another user agent:
agent.user_agent_alias = 'Mac Safari'
And/or create a cookie manually:
cookie = Mechanize::Cookie.new(key, value)
cookie.domain = '.akamai.com'
cookie.path = '/'
agent.cookie_jar.add(cookie)
For more info about Ruby Mechanize cookies, read these pages:
http://mechanize.rubyforge.org/Mechanize/Cookie.html
http://mechanize.rubyforge.org/Mechanize/CookieJar.html
