We need to retrieve the logs for our Stripe account for a specific time period. There isn't an endpoint for them in their API (grrrr), so we are trying a quick screen scrape, because the dashboard structures them quite nicely.
At this point, though, I can't even log into Stripe using Mechanize. Below is the code I am using to log in:
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.follow_meta_refresh = true
starting_link = 'https://dashboard.stripe.com/login'
page = agent.get(starting_link)
login_form = page.form
login_form.email = email
login_form.password = pass
new_page = agent.submit(login_form, login_form.buttons[0])
The response I get from running this is:
Mechanize::ResponseCodeError: 404 => Net::HTTPNotFound for https://dashboard.stripe.com/login -- unhandled response
from /Users/Nicholas/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:316:in `fetch'
from /Users/Nicholas/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.4/lib/mechanize.rb:1323:in `post_form'
from /Users/Nicholas/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.4/lib/mechanize.rb:584:in `submit'
from (irb):21
from /Users/Nicholas/.rvm/rubies/ruby-2.2.2/bin/irb:11:in `<main>'
I tried logging into several other sites and was successful. I also set a user-agent alias and enabled following meta-refresh redirects (strategies mentioned in other questions).
Does anyone know what tweaks could be made to get Mechanize to log into Stripe?
Short Answer:
I would suggest using a browser engine like Selenium to get the logs data as that will be much simpler.
Long Answer:
Although your Mechanize form-submission code is correct, it assumes the Stripe login form is submitted with a normal POST request, which is not the case.
The Stripe login form is submitted using an AJAX request.
Here is working code that takes that into account:
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.follow_meta_refresh = true
starting_link = 'https://dashboard.stripe.com/login'
page = agent.get(starting_link)
login_form = page.form
login_form.action = 'https://dashboard.stripe.com/ajax/sessions'
login_form.email = email
login_form.password = password
new_page = agent.submit(login_form, login_form.buttons[0])
As you can see, simply setting the form's action property to the AJAX URL solves your problem.
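To see why changing `action` matters, here is a stdlib-only sketch of roughly the request that results: a form-encoded POST to the AJAX endpoint. It assumes the form carries only the `email` and `password` fields used above; the real Stripe form may include hidden fields this sketch omits. The helper name `build_login_request` is hypothetical, and the request is only built, never sent.

```ruby
require 'net/http'
require 'uri'

# Builds (but does not send) a form-encoded POST to the AJAX login URL.
# ASSUMPTION: only email and password are submitted; any extra hidden
# fields the real form contains are not included here.
def build_login_request(email, password)
  uri = URI('https://dashboard.stripe.com/ajax/sessions')
  request = Net::HTTP::Post.new(uri)
  # set_form_data URL-encodes the pairs and sets
  # Content-Type: application/x-www-form-urlencoded
  request.set_form_data('email' => email, 'password' => password)
  request
end

request = build_login_request('user@example.com', 's3cret')
puts request.path  # => /ajax/sessions
puts request.body  # => email=user%40example.com&password=s3cret
```

Mechanize does the equivalent encoding for you when it submits the form; the point is only that the body is identical and the destination URL is what changed.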
However, once you have logged in successfully, you will not be able to navigate around the site and scrape it for logs with Mechanize, because Mechanize does not execute JavaScript. You can check this by requesting the dashboard's URL: you will get an error message asking you to enable JavaScript.
Further, Stripe's dashboard is fully JavaScript-powered. It simply makes AJAX requests to fetch data from the server and then renders it as HTML.
This can work in your favor, because the server's response is JSON. You can simply parse it and extract the information you need from the logs.
Upon further inspection (in Chrome Developer Tools), I found that the logs are requested from the URL https://dashboard.stripe.com/ajax/logs?count=5&include%5B%5D=total_count&limit=10&method=not_get
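If you do go down the JSON route, that query string can be rebuilt with `URI.encode_www_form` and the response parsed with Ruby's `json` library. The response shape used here (a top-level `data` array of entries with an `id` key) is a hypothetical stand-in, not Stripe's documented format; check the real response in Developer Tools before relying on any field names. The helper names are illustrative and no request is made.

```ruby
require 'json'
require 'uri'

# Rebuilds the logs URL observed in Chrome Developer Tools.
# "include[]" is percent-encoded to "include%5B%5D" automatically.
def logs_url(count: 5, limit: 10)
  query = URI.encode_www_form([
    ['count', count.to_s],
    ['include[]', 'total_count'],
    ['limit', limit.to_s],
    ['method', 'not_get']
  ])
  "https://dashboard.stripe.com/ajax/logs?#{query}"
end

# Extracts log IDs from a JSON body.
# ASSUMPTION: a top-level "data" array whose entries carry an "id" key.
def log_ids(json_body)
  payload = JSON.parse(json_body)
  Array(payload['data']).map { |entry| entry['id'] }
end

puts logs_url
# Made-up sample body, for illustration only.
sample = '{"data":[{"id":"req_1"},{"id":"req_2"}],"total_count":2}'
p log_ids(sample)  # => ["req_1", "req_2"]
```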
Again, if you try to access the logs URL using Mechanize, you will run into a problem with the CSRF token, which Stripe maintains between requests.
The CSRF token problem can be solved using Mechanize's cookies, but it will not be worth the effort.
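What a cookie jar does, in essence, is remember the cookies a server sets and send them back on later requests, which is how per-session state can survive from one request to the next. The `TinyCookieJar` class below is a hypothetical, deliberately minimal stdlib sketch of that idea. It keeps only name=value pairs and ignores Domain, Path, Expires and Secure, which a real jar such as Mechanize's must honor, and it does not show how Stripe actually ties its CSRF token to a session.

```ruby
require 'net/http'
require 'uri'

# Hypothetical minimal cookie jar: stores name=value pairs from
# Set-Cookie headers and replays them in a Cookie header.
# Ignores Domain, Path, Expires and Secure attributes entirely.
class TinyCookieJar
  def initialize
    @cookies = {}
  end

  # Record every Set-Cookie header on a response.
  def store(response)
    Array(response.get_fields('set-cookie')).each do |header|
      name, value = header.split(';', 2).first.split('=', 2)
      @cookies[name.strip] = value.to_s.strip
    end
  end

  # Attach the stored cookies (if any) to an outgoing request.
  def apply(request)
    return request if @cookies.empty?
    request['Cookie'] = @cookies.map { |k, v| "#{k}=#{v}" }.join('; ')
    request
  end
end

# Offline demo with a hand-built response (no network access).
response = Net::HTTPOK.new('1.1', '200', 'OK')
response.add_field('Set-Cookie', 'session=abc123; Path=/; HttpOnly')
response.add_field('Set-Cookie', 'csrf=xyz; Path=/')

jar = TinyCookieJar.new
jar.store(response)
request = jar.apply(Net::HTTP::Get.new(URI('https://example.com/next')))
puts request['Cookie']  # => session=abc123; csrf=xyz
```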
I would suggest using a browser engine like Selenium to get the logs data, as that will be much simpler.
Related
I have been trying to log in to my Airbnb account to scrape all my reservations. I was able to log in and get the status of the reservations to appear with the following code, but all of a sudden it seems like Airbnb has blocked me. The script would usually return all of the reservations' statuses nicely lined up, but after I had been calling it for about 6 hours straight trying to get it to work, I suddenly couldn't get it to return the /my_reservations page anymore.
When I called
puts page.body
it kept returning the /log_in page.
I can still access my account manually through the browser though. Does anyone know anything about this? Any advice is greatly appreciated.
require 'mechanize'
require 'nokogiri'
agent = Mechanize.new { |a| a.user_agent_alias = "Mac Firefox" }
page = agent.get('http://www.airbnb.com/my_reservations')
form = agent.page.form_with(action: "/authenticate")
form.field_with(name: "email").value = "my_email#provider.com"
form.field_with(name: "password").value = "my_password"
form.submit
puts page.search('.label').text
I am trying to access the calendar data on an Airbnb listing and so far have been unsuccessful. I am using the Mechanize gem in Ruby, and when I request the URL that loads the calendar table, I get the following error:
require 'mechanize'
agent = Mechanize.new
page1=agent.get("https://www.airbnb.com/rooms/726348")
page2=agent.get("https://www.airbnb.com/rooms/calendar_tab_inner2/73944?cal_month=11&cal_year=2013&currency=USD")
Mechanize::ResponseCodeError: 400 => Net::HTTPBadRequest for https://www.airbnb.com/rooms/calendar_tab_inner2/726348?cal_month=11&cal_year=2013&currency=USD -- unhandled response
I have also tried clicking the tab that generates the table with the following code, but doing so simply returns the HTML of the original URL.
agent = Mechanize.new
page1=agent.get("https://www.airbnb.com/rooms/726348")
page2=agent.click(page1.link_with(:href => '#calendar'))
I see the problem: you need to set the request headers so the server treats the call as an AJAX request:
page = agent.get url, nil, nil, {'X-Requested-With' => 'XMLHttpRequest'}
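`X-Requested-With: XMLHttpRequest` is the header that JavaScript libraries such as jQuery conventionally add to AJAX calls, and some servers reject requests without it. For comparison, here is a stdlib-only sketch of the same request built with Net::HTTP, encoding the query parameters with `URI.encode_www_form`. The listing ID and parameters come from the question; `calendar_request` is a hypothetical helper, and the request is only constructed, not sent.

```ruby
require 'net/http'
require 'uri'

# Builds the calendar request from the question, marked as an AJAX call.
def calendar_request(room_id, month, year, currency = 'USD')
  uri = URI("https://www.airbnb.com/rooms/calendar_tab_inner2/#{room_id}")
  uri.query = URI.encode_www_form(
    'cal_month' => month, 'cal_year' => year, 'currency' => currency
  )
  request = Net::HTTP::Get.new(uri)
  request['X-Requested-With'] = 'XMLHttpRequest'
  request
end

req = calendar_request(726348, 11, 2013)
puts req.path  # => /rooms/calendar_tab_inner2/726348?cal_month=11&cal_year=2013&currency=USD
```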
I was looking for some information about Mechanize and found the code below on the Internet:
require 'mechanize'
require 'logger'
agent = Mechanize.new
agent.user_agent_alias = 'Windows IE 9'
agent.follow_meta_refresh = true
agent.log = Logger.new(STDOUT)
Could anyone please explain why user_agent_alias and follow_meta_refresh are needed when Mechanize itself is a browser?
Mechanize isn't a browser. It is a page parser that gives you enough methods to make it easy and convenient to navigate through a site (it does not run JavaScript, for example), but it is in no way a browser.
user_agent_alias sets the User-Agent header, the "signature" Mechanize sends when it makes page requests. In your example it's trying to spoof a site by masquerading as "IE 9", but that signature isn't going to fool any system that is sniffing the User-Agent header.
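Concretely, an alias changes one request header and nothing else about the request. A stdlib-only sketch of doing the same by hand with Net::HTTP; the helper `request_with_user_agent` and the UA string are hypothetical placeholders, not the exact string Mechanize sends for any alias.

```ruby
require 'net/http'
require 'uri'

# Placeholder User-Agent string for illustration only.
FAKE_UA = 'Mozilla/5.0 (compatible; ExampleBrowser/1.0)'

# Builds a GET request that identifies itself with the given User-Agent.
def request_with_user_agent(url, user_agent)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = user_agent  # replaces Net::HTTP's default value
  request
end

req = request_with_user_agent('https://example.com/', FAKE_UA)
puts req['User-Agent']  # => Mozilla/5.0 (compatible; ExampleBrowser/1.0)
```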
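As for follow_meta_refresh, take the time to search for "meta" tags with the "refresh" value: a page can use <meta http-equiv="refresh" content="..."> to tell the browser to load another URL after a delay. With follow_meta_refresh enabled, Mechanize follows that redirect automatically instead of stopping at the page that contains the tag. The Mechanize documentation covers it too.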
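The tag's `content` attribute holds a delay in seconds, optionally followed by `url=` and a target. A minimal regex-based sketch of extracting and resolving that target; `meta_refresh_target` is a hypothetical helper, and a real HTML parser (Mechanize uses Nokogiri) handles far more markup variations than this regex does.

```ruby
require 'uri'

# Deliberately simple: expects http-equiv before content, quoted values.
META_REFRESH = /<meta\s+http-equiv=["']refresh["']\s+content=["']([^"']*)["']/i

# Returns the absolute refresh target, or nil when there is no
# refresh tag or the tag has no url= part.
def meta_refresh_target(html, base_url)
  match = META_REFRESH.match(html)
  return nil unless match
  _delay, target = match[1].split(';', 2)
  return nil unless target
  url = target.strip.sub(/\Aurl\s*=\s*/i, '')
  URI.join(base_url, url).to_s
end

html = '<html><head><meta http-equiv="Refresh" content="5; URL=/welcome"></head></html>'
puts meta_refresh_target(html, 'https://example.com/login')
# => https://example.com/welcome
```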
I am attempting to screen-scrape some data from Akamai's control panel, but I am having trouble logging in to the page via Mechanize for Ruby.
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
url = 'http://control.akamai.com'
page = agent.get( url )
puts page.content
Upon examining the page, I find this message displayed:
"Cookie support has been disabled in your browser. Please enable cookies before continuing."
The fact that the page thinks I have cookies disabled prevents me from logging in. Any thoughts?
You can specify a different user agent:
agent.user_agent_alias = 'Mac Safari'
And/or create a cookie manually:
cookie = Mechanize::Cookie.new(key, value)
cookie.domain = '.akamai.com'
cookie.path = '/'
agent.cookie_jar.add(cookie)
For more info about Ruby Mechanize cookies, read these pages:
http://mechanize.rubyforge.org/Mechanize/Cookie.html
http://mechanize.rubyforge.org/Mechanize/CookieJar.html
Has anyone used the Mechanize gem on a site that required SSL?
When I try to access such a website, Mechanize tries to use standard HTTP, which results in endless redirections between http:// and https://.
Mechanize works just fine with HTTPS. Try setting
require 'logger'
agent.log = Logger.new(STDOUT)
to see what's going on between Mechanize and the server. If you are still having trouble, post a sample of the code and somebody will help.
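If the log shows requests bouncing between http:// and https://, it can also help to walk the redirects yourself and stop as soon as a URL repeats. A stdlib sketch: `redirect_chain` is a hypothetical helper, and its `fetch` argument is any callable returning a status code and Location value, so the loop check can be exercised offline with canned responses.

```ruby
require 'uri'

# Follows redirects manually and returns the chain of URLs, raising
# as soon as a URL is seen twice (e.g. http -> https -> http ...).
# `fetch` is any callable: url -> [status_code, location_or_nil].
def redirect_chain(start_url, fetch, max_hops: 10)
  chain = [start_url]
  max_hops.times do
    status, location = fetch.call(chain.last)
    return chain unless (300..399).cover?(status) && location
    next_url = URI.join(chain.last, location).to_s
    if chain.include?(next_url)
      raise "redirect loop: #{(chain + [next_url]).join(' -> ')}"
    end
    chain << next_url
  end
  raise "gave up after #{max_hops} redirects"
end

# Canned responses standing in for a misconfigured server.
loop_site = {
  'http://example.com/login'  => [301, 'https://example.com/login'],
  'https://example.com/login' => [302, 'http://example.com/login']
}
begin
  redirect_chain('http://example.com/login', ->(url) { loop_site.fetch(url) })
rescue RuntimeError => e
  puts e.message
end
```

Against a live site you could pass something like `->(url) { res = Net::HTTP.get_response(URI(url)); [res.code.to_i, res['Location']] }` after requiring `net/http`.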
I just gave Mechanize a try with my company's website. The home page is HTTP, but it contains a link, "customer login," that sends the browser to an HTTPS page. It worked fine. The code is:
#!/usr/bin/ruby1.8
require 'rubygems'
require 'mechanize'
agent = Mechanize.new  # very old Mechanize releases used WWW::Mechanize.new
page = agent.get("http://www.not_the_real_url.com")
link = page.link_with(:text=>"CUSTOMER LOGIN")
page = link.click
form = page.forms.first
form['user_login'] = 'not my real login name'
form['user_password'] = 'not my real password'
page = form.submit