Using a Ruby script to log in to a website via HTTPS

Alright, so here's the dealio: I'm working on a Ruby app that'll take data from a website, and aggregate that data into an XML file.
The website I need to take data from does not have any APIs I can make use of, so the only thing I can think of is to log in to the website, sequentially load the pages that have the data I need (in this case, PMs; I want to archive them), and then parse the returned HTML.
The problem, though, is that I don't know of any way to programmatically simulate a login session.
Would anyone have any advice, or know of any proven methods that I could use to successfully log in to an HTTPS page, and then programmatically load pages from the site using the temporary session cookie from the login? It doesn't have to be a Ruby-only solution; I just wanna know how I can actually do this. And if it helps, the website in question uses Microsoft's .NET Passport service as its login/session mechanism.
Any input on the matter is welcome. Thanks.

Mechanize
Mechanize is a Ruby library that imitates the behaviour of a web browser. You can click links, fill out forms and submit them. It even keeps a history and remembers cookies. It seems your problem could easily be solved with the help of Mechanize.
The following example is taken from http://docs.seattlerb.org/mechanize/EXAMPLES_rdoc.html:
```ruby
require 'mechanize'

# Note: rubyforge.org has since been shut down; substitute your own site.
a = Mechanize.new
a.get('http://rubyforge.org/') do |page|
  # Click the login link
  login_page = a.click(page.link_with(:text => /Log In/))

  # Submit the login form
  my_page = login_page.form_with(:action => '/account/login.php') do |f|
    f.form_loginname = ARGV[0]
    f.form_pw        = ARGV[1]
  end.click_button

  # Print the text of every non-empty link on the resulting page
  my_page.links.each do |link|
    text = link.text.strip
    next unless text.length > 0
    puts text
  end
end
```
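For comparison, here is roughly what Mechanize does for you, written with only Ruby's standard library: POST the form fields over HTTPS, keep the cookies the server sets, and send them back on the next request. This is a minimal sketch under assumptions: the field names `form_loginname` and `form_pw` come from the example above and will differ on your site, both pages must be on the same host (one connection is reused), and redirects, hidden form fields such as anti-forgery tokens, and cookie attributes like Domain and Expires are not handled. A multi-step sign-in such as .NET Passport will very likely need more than this.

```ruby
require 'net/http'
require 'uri'

# Turn a response's Set-Cookie values into one Cookie request header,
# keeping only each cookie's leading name=value pair (attributes such as
# Path, Expires and HttpOnly are dropped).
def cookie_header(set_cookie_values)
  Array(set_cookie_values).map { |line| line.split(';', 2).first.strip }.join('; ')
end

# Build (but do not send) a form-encoded POST for the login form.
# The field names are the ones from the Mechanize example; adjust for your site.
def login_request(login_uri, username, password)
  req = Net::HTTP::Post.new(login_uri)
  req.set_form_data('form_loginname' => username, 'form_pw' => password)
  req
end

# Log in, then fetch another page on the SAME host over the same connection,
# sending back the cookies the login response set. Returns that page's body.
# Redirects and hidden form fields are not handled.
def fetch_after_login(login_url, page_url, username, password)
  login_uri = URI(login_url)
  page_uri  = URI(page_url)
  Net::HTTP.start(login_uri.host, login_uri.port,
                  use_ssl: login_uri.scheme == 'https') do |http|
    login_res = http.request(login_request(login_uri, username, password))
    cookies = cookie_header(login_res.get_fields('set-cookie'))
    page_req = Net::HTTP::Get.new(page_uri)
    page_req['Cookie'] = cookies unless cookies.empty?
    http.request(page_req).body
  end
end
```

A call would look something like `puts fetch_after_login('https://example.com/account/login.php', 'https://example.com/messages', user, pass)`, where both URLs are placeholders.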

You could try using wget to fetch the page. You can analyse the login process with this app: www.portswigger.net/proxy/.

For what it's worth, you could check out Webrat. It is meant to be used as a tool for automated acceptance tests, but I think you could use it to simulate filling out the login fields, then click through links by their names, and grab the needed HTML as a string. Haven't tried doing anything like it, though.

Related

Form handling through watir-webdriver or ruby or rspec

How can I handle the sign-up form? It appears every time I hit the URL. I want to dismiss it globally, and I also want to access its elements. How can I do that using Ruby, watir-webdriver, RSpec or Cucumber?
Check this out (it is a line from the watir-webdriver code):
```ruby
browser.goto 'http://login:password@www.yoursite.com/index.html'
```
In other words, you can send the credentials for HTTP basic authentication right in the URL, like http://login:password@www.yoursite.com/index.html.
I hope it helps.
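Rather than gluing the URL together by hand, you can let Ruby's `URI` library build it, and if you only need the protected page's content rather than a browser, `Net::HTTP` can send the same credentials as an `Authorization` header. A sketch; `basic_auth_url` and `basic_auth_get` are hypothetical helper names, and `login`, `password` and `www.yoursite.com` are the placeholders from the answer above. Some browsers restrict credentials embedded in URLs, so the header approach is the more dependable one outside a browser.

```ruby
require 'net/http'
require 'uri'

# Build http://user:password@host/path with the credentials in the URL's
# userinfo part. Raises URI::InvalidComponentError if the credentials contain
# characters such as '@' that would need percent-encoding first.
def basic_auth_url(user, password, host, path)
  URI::HTTP.build(userinfo: "#{user}:#{password}", host: host, path: path).to_s
end

# Outside a browser: build a GET that carries the same credentials as an
# HTTP basic Authorization header (the request is not sent here).
def basic_auth_get(url, user, password)
  req = Net::HTTP::Get.new(URI(url))
  req.basic_auth(user, password)
  req
end
```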

Ruby Mechanize and changing URL after a login

I have a Mechanize script that currently goes to a login form and properly logs a user in. I see plenty of documentation on following links, but I'd like to go to an ad-hoc page that isn't linked from the main page after I log in. The page requires authentication, and that's why I force the login first. Is there a way to change to another URL (that's still part of the same site) with Ruby's Mechanize gem and have it retain all of the cookies from the login? I looked up methods such as link_with, but that's for following a link on the current page. I'd like to go to a different URL within the same website.
I believe you just need to make a subsequent get call on the same agent after your initial transaction is complete; the agent keeps the cookies from the login and sends them with later requests.
```ruby
require 'mechanize'

client = Mechanize.new
client.get('http://example.com/login') do |login_page|
  # handle login
end
client.get('http://example.com/something-else') do |page|
  # another action; the login cookies are sent automatically
end
```
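The second `get` works because the Mechanize agent stores cookies in its cookie jar and sends them with every later request it makes. A toy stand-in built on `Net::HTTP` shows the idea; `TinySession` is a hypothetical class for illustration, not part of Mechanize, and unlike Mechanize's real jar it ignores cookie Domain, Path and expiry rules.

```ruby
require 'net/http'
require 'uri'

# Toy cookie-carrying session: remembers cookies by name and attaches them
# to every later request for any path on the same site.
class TinySession
  attr_reader :cookies

  def initialize(base_url)
    @base = URI(base_url)
    @cookies = {}
  end

  # Record cookies from Set-Cookie values; a newer value for the same
  # name replaces the older one.
  def store(set_cookie_values)
    Array(set_cookie_values).each do |line|
      name, value = line.split(';', 2).first.strip.split('=', 2)
      @cookies[name] = value.to_s
    end
  end

  # Build a GET for any path on the site, carrying the stored cookies.
  def get_request(path)
    req = Net::HTTP::Get.new(URI.join(@base, path))
    req['Cookie'] = @cookies.map { |k, v| "#{k}=#{v}" }.join('; ') unless @cookies.empty?
    req
  end

  # Send a GET, remember any cookies the response sets, and return it.
  def fetch(path)
    req = get_request(path)
    res = Net::HTTP.start(req.uri.host, req.uri.port,
                          use_ssl: req.uri.scheme == 'https') do |http|
      http.request(req)
    end
    store(res.get_fields('set-cookie'))
    res
  end
end
```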

Mechanize cannot load a page properly

I want to scrape some pages of this site: Marketbook.ca
So I used Mechanize for that, but it does not load the pages properly: it returns a page with an empty body, as in the following code:
```ruby
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Linux Firefox'
agent.get('http://www.marketbook.ca/list/list.aspx?ETID=1&catid=1001&LP=MAT&units=imperial')
```
What could be the issue here?
Actually, this page requires a JavaScript engine to display its content:
<noscript>Please enable JavaScript to view the page content.</noscript>
Mechanize doesn't execute JavaScript, so you'd be better off choosing another option such as Selenium or Watir. Both drive a real web browser.
Another option is to look through the included JS scripts, figure out where the data comes from, and query that web resource directly if possible.
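As a starting point for that last approach, you can list the external scripts the page pulls in and then read them (or watch your browser's network panel) for the URL the data is loaded from. A rough sketch: `script_sources` is a hypothetical helper, and the regular expression is only good enough for a quick look; Nokogiri, which Mechanize already uses for parsing, would be the robust choice.

```ruby
require 'uri'

# Collect the absolute URLs of external <script src="..."> tags in a page,
# resolving relative paths against the page's own URL. Regex-based, so it
# can be fooled by unusual markup.
def script_sources(html, page_url)
  html.scan(/<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |src| URI.join(page_url, src).to_s }
      .uniq
end
```

With the `agent` from the question, this could be called as `script_sources(agent.get(url).body, url)`.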

How to 'Like-gate' with OmniAuth Facebook, Sinatra and Datamapper

For an app I'm building, I need to be able to determine a Facebook user's relation to the page within which the app is being shown. I hope to provide the following functionality:
1) If the user likes the page, direct them to another page
2) If the user doesn't like the page, direct them to another page
3) If the user is an admin of the current page, direct them to another page
The Auth Hash schema (https://github.com/intridea/omniauth/wiki/Auth-Hash-Schema) doesn't describe how to access a user's likes, or how to tell whether they a) like or don't like the current page, or b) are an admin of the current page.
Furthermore, I've searched around the internet but cannot find any specific Ruby or Sinatra example of how to do this. The closest I've come is https://github.com/chrissloan/sinatra-book/blob/master/app.rb; however, it uses FBGraph (I'm using OmniAuth-Facebook), and the script doesn't distinguish admin users of the page.
Therefore I'm wondering whether my method of distinguishing between users on the page tab is inherently wrong, and whether there is another way of achieving these goals.
So in summary, I'm attempting to create:
A backend that is accessible by the app admin. When an app admin goes on the page, the admin panel is displayed.
A front end that displays whether the user has liked the page or not, and shows them specific content based upon this state - a 'like-gate'.
Thanks for reading and if you could help it would be very much appreciated.
Here is some code I've thought up, but from the docs I'm unsure whether it is valid. Everything from `=begin` on is the experimental piece; the code above it works and is currently being used.
```ruby
require 'sinatra'
require 'json'
# (OmniAuth middleware setup omitted)

get '/auth/:provider/callback' do
  content_type 'application/json'
  JSON.generate(request.env)
  auth = request.env['omniauth.auth']
  puts auth
=begin
  if auth['page']['admin'] == true # not sure if 'admin' is a valid key
    # check user database and move to admin side
  elsif auth['page']['liked'] == true
    # allow to download endpoint
  else
    auth['page']['liked'] = false
    # direct to wall to like
  end
=end
end
```
I solved this by authorising with OmniAuth and then reading the signed_request that Facebook generates when the user opens the app on a Facebook page.
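For page tabs, Facebook POSTs a `signed_request` parameter to the app: two base64url-encoded parts joined by a dot, where the first is an HMAC-SHA256 signature of the second computed with the app secret, and the second is a JSON payload. In page tabs that payload carried a `page` object with `id`, `liked` and `admin` fields, which covers the three cases in the question. A minimal sketch of verifying and decoding it with the standard library; `parse_signed_request` is a hypothetical helper name, `OpenSSL.secure_compare` needs Ruby 3.0 or newer, and Facebook has since deprecated like-gating, so newer API versions may not report `liked` meaningfully.

```ruby
require 'json'
require 'openssl'

# Decode base64url (URL-safe alphabet, padding stripped) as used by signed_request.
def base64url_decode(str)
  s = str.tr('-_', '+/')
  s += '=' * ((4 - s.length % 4) % 4)
  s.unpack1('m')
end

# Verify and decode a signed_request. Returns the payload hash, or nil if
# the format is wrong, the signature doesn't match, or the algorithm is unexpected.
def parse_signed_request(signed_request, app_secret)
  encoded_sig, payload = signed_request.to_s.split('.', 2)
  return nil if encoded_sig.nil? || payload.nil?

  # The signature covers the still-encoded payload string.
  expected = OpenSSL::HMAC.digest('SHA256', app_secret, payload)
  return nil unless OpenSSL.secure_compare(base64url_decode(encoded_sig), expected)

  data = JSON.parse(base64url_decode(payload))
  data['algorithm'].to_s.upcase == 'HMAC-SHA256' ? data : nil
end
```

In the Sinatra app this might be used in a `post` route (page tabs are loaded with a POST) as `data = parse_signed_request(params['signed_request'], ENV['FB_APP_SECRET'])`, then branching on `data['page']['admin']` and `data['page']['liked']`; the environment variable name is an assumption.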

Read dynamic PDF from Ruby Watir

I am using Watir to log into an application, push some buttons, etc... Basically the normal stuff that a person would use Watir for.
However, my problem is that there is one particular page that I need to test. It's actually a dynamically generated PDF, and I need to get the actual binary data from it so that I can load it using a certain gem that we're using. With static PDF files this normally works, because we can just use:
```ruby
require 'open-uri'
URI.open("http://site.com/something.pdf")
```
For a dynamically generated PDF, however, it doesn't work, because we are using Ruby to send the HTTP request and it is not aware of the headers/cookies/session that Watir is using. So instead of the actual PDF we get a login page.
Another thing we tried was to use Watir to get the PDF:
```ruby
@browser.goto "http://site.com/dynamic/thepdffile"
@browser.text
@browser.html
```
We tried getting the text or HTML from the page, but no luck: Firefox builds a DOM when it loads a PDF, so the text is an empty string and the HTML is the DOM that Firefox creates for viewing a PDF page. We need the raw HTTP response, and there doesn't seem to be a way to extract that.
So we need a solution for this, and in my opinion we have these options:
1) Figure out a way to use "open" or a similar method in Ruby, using the session from Watir.
2) Figure out how to use Watir to get the binary HTTP response from the PDF page.
3) Disable the PDF plugin (which doesn't seem possible) so that the "save as" dialog appears.
Or if you have some other idea, please share! Thanks in advance!
I figured out a solution.
In the Firefox profile you can set the `plugin.scan.Acrobat` preference to "999", which effectively disables the PDF plugin.
```ruby
require 'watir' # 'watir-webdriver' in older setups

profile = Selenium::WebDriver::Firefox::Profile.new
profile['plugin.scan.Acrobat'] = "999"
b = Watir::Browser.new :firefox, :profile => profile
```
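The question's first option, reusing Watir's session from plain Ruby, can also work and returns the raw bytes directly. A sketch, assuming the browser's cookies are available as hashes with `:name` and `:value` keys (the shape of `browser.cookies.to_a` in Watir); `cookie_header_from` and `fetch_with_browser_cookies` are hypothetical helper names. It only works if the site ties the session to cookies and not, for example, to the browser's User-Agent.

```ruby
require 'net/http'
require 'uri'

# Build a Cookie header from cookie hashes with :name and :value keys,
# e.g. as returned by Watir's browser.cookies.to_a.
def cookie_header_from(cookies)
  cookies.map { |c| "#{c[:name]}=#{c[:value]}" }.join('; ')
end

# Fetch the raw response body of a URL while presenting the browser's
# session cookies. Raises unless the response is 2xx (for example when the
# server redirects to the login page because the session was not accepted).
def fetch_with_browser_cookies(url, cookies)
  uri = URI(url)
  req = Net::HTTP::Get.new(uri)
  req['Cookie'] = cookie_header_from(cookies)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    res = http.request(req)
    res.value # raises Net::HTTPError subclasses for non-2xx responses
    res.body  # raw binary PDF data, not the viewer's DOM
  end
end
```

Usage would be along the lines of `pdf_bytes = fetch_with_browser_cookies('http://site.com/dynamic/thepdffile', @browser.cookies.to_a)`.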
