Manual POST request - ruby

Scenario: I have logged into a website, obtained cookies etc., and navigated to a particular webpage with a form and hidden fields. I now want to create my own HTTP POST with my own hidden-field data instead of what is on the webpage, and verify the response, rather than submitting the form on the page.
Reason: we are testing against pre-existing data (I know, I know) that could differ between environments, so there is no predictable way to use it. We need a workaround.
Is there any way to do this without manually editing the existing form and submitting it? That feels a little 'hacky'.
Ideally, I would like to say something like:
browser.post 'url', 'field1=test&field2=abc'

I would probably switch to Mechanize to work at the protocol level. Something like this, added to your script:
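require 'mechanize'

b = Mechanize.new  # very old releases used the WWW::Mechanize namespace
b.get('http://yoursite.com/current_page') do |page|
  # Fill in and submit the form that posts to /post/url
  result_page = page.form_with(:action => '/post/url') do |f|
    f.form_loginname = 'tim'  # fields are reachable by their name attribute
    f.form_pw = 'password'
    # hidden fields can be overwritten too, e.g. f['hidden_field'] = 'my value'
    # ('hidden_field' is a hypothetical name)
  end.click_button
  puts result_page.body  # verify the response here
end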
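Without Mechanize, Ruby's standard Net::HTTP can send something close to the browser.post call from the question. This is a minimal sketch, assuming you can read the logged-in session's Cookie header value from whatever test tool you use; build_form_post and send_request are hypothetical helper names, and the URL, field names and the 'session=xyz' cookie are placeholders.

```ruby
require 'net/http'
require 'uri'

# Build (but don't send) a form-encoded POST that reuses an existing
# session cookie and carries our own values for the hidden fields.
def build_form_post(url, fields, cookie_header = nil)
  request = Net::HTTP::Post.new(URI(url))
  request['Cookie'] = cookie_header if cookie_header
  # set_form_data encodes the hash as field1=test&field2=abc and sets
  # Content-Type: application/x-www-form-urlencoded
  request.set_form_data(fields)
  request
end

# Send the request and return the response so its code and body can be checked.
def send_request(url, request)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Usage (placeholders, makes a network request, not run here):
#   url = 'http://yoursite.com/post/url'
#   req = build_form_post(url, { 'field1' => 'test', 'field2' => 'abc' }, 'session=xyz')
#   res = send_request(url, req)
#   puts res.code, res.body
```

Because set_form_data URL-encodes each value for you, you pass plain strings in a hash rather than a pre-joined 'field1=test&field2=abc' string.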

Related

Submitting login fields during a scraping process with ruby?

I need to scrape some financial data from a system called NetTeller.
An example can be found here.
Note the initial ID field prompt:
Once you submit, you then have to enter your password:
As you can see, it is a two-step process: you first enter an ID number, and after submitting it you are presented with a password field. I'm running into some roadblocks when it comes to jumping through these two hoops before getting into the system and to the data I actually want. How would one handle a scenario like this, where you need to get through the authentication fields before reaching the data you want to scrape?
I assumed I could just jump in with httpclient and Nokogiri, but I'm curious whether there are any tricks for dealing with a two-page login like this before getting to your target.
I would use Mechanize. The first page is "tricky" because the login form is inside an iframe, so you can request the iframe's source URL directly instead. Here is how:
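require 'mechanize'

agent = Mechanize.new
# Get first page: request the iframe's source directly
iframe_url = 'https://www.banksafe.com/sfonline/'
page = agent.get(iframe_url)
login_form = page.forms.first
# 'userid' is a placeholder: use the real name attribute of the ID field
login_form.field_with(:name => 'userid').value = '12345678'
# Submit the ID to get the second (password) page; the agent keeps the cookies
response = login_form.submit
second_login_form = response.forms.first
# 'password' is likewise a placeholder field name
second_login_form.field_with(:name => 'password').value = 'xxxxx'
# Submit the password to reach the page to scrape
response = second_login_form.submit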
This is how you could handle a scenario like this. You will probably need to adapt it to exactly how those forms and fields are written, and to other page-specific details, but this is the approach I would take.
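If you would rather stay with a plain HTTP client plus Nokogiri, as the question suggests, the main trick with a two-page login is carrying the session cookie from the first POST to the second; Mechanize does this for you. Below is a stdlib-only sketch using Net::HTTP (swapped in for httpclient) with a deliberately naive cookie jar. SimpleCookieJar and post_step are hypothetical names, and the URLs and field names in the usage comment are made up. If the second page contains hidden inputs, those typically have to be extracted (for example with Nokogiri) and sent back as well.

```ruby
require 'net/http'
require 'uri'

# Deliberately naive cookie jar: keeps name=value pairs from Set-Cookie
# headers and ignores Domain, Path, Expires and Secure. Good enough for a
# quick script against a single host, not a general implementation.
class SimpleCookieJar
  def initialize
    @cookies = {}
  end

  # Accepts one raw Set-Cookie string, an array of them, or nil.
  def store(set_cookie_values)
    Array(set_cookie_values).each do |raw|
      pair = raw.to_s.split(';', 2).first.to_s.strip
      next if pair.empty?
      name, value = pair.split('=', 2)
      @cookies[name] = value.to_s
    end
  end

  # Value for the request's Cookie header, or nil if nothing is stored.
  def header
    return nil if @cookies.empty?
    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

# POST one login step, sending the cookies collected so far and keeping
# any new ones the server sets in its response.
def post_step(jar, url, fields)
  uri = URI(url)
  request = Net::HTTP::Post.new(uri)
  request['Cookie'] = jar.header if jar.header
  request.set_form_data(fields)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  jar.store(response.get_fields('set-cookie'))
  response
end

# Usage (hypothetical URLs and field names, not run here):
#   jar = SimpleCookieJar.new
#   post_step(jar, 'https://example.com/login/id', 'userid' => '12345678')
#   page = post_step(jar, 'https://example.com/login/password', 'password' => 'xxxxx')
#   # page.body is the HTML to hand to Nokogiri
```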

Using Ruby Mechanize to download a file served as an attachment

I need the ability to grab reports off of a particular website. The method below does everything I need it to do; the only catch is that the report, "report.csv", is served back with "content-disposition:filename=report.csv" in the response header when the page is posted (the page posts to itself).
require 'date'

def download_report
  page = @mechanize.click(@mechanize.current_page().link_with(:text => /Reporting/))
  page.form.field_with(:name => "rep").option_with(:value => "adperf").click
  form = page.form_with(:name => "get-report")
  form.field_with(:id => "sasReportingQuery.dateRange").option_with(:value => "Custom").click
  start_date = DateTime.parse(@start_date)
  end_date = DateTime.parse(@end_date)
  form.field_with(:name => "sd_display").value = start_date.strftime("%m/%d/%Y")
  form.field_with(:name => "ed_display").value = end_date.strftime("%m/%d/%Y")
  form.submit
end
As far as I can tell, Mechanize is not capturing the file anywhere that I can get to it. Is there a way to get Mechanize to capture and download this file?
@mechanize.current_page() does not contain the file, and @mechanize.history() does not show that the file URL was presented to Mechanize.
The server appears to be telling the browser to save the document. "Content-disposition:filename" is the clue to that. Mechanize won't know what to do with that, and will try to read and parse the content, which, if it's a CSV, will not work.
Without seeing the HTML page you're working with it's impossible to know exactly what mechanism they're using to trigger the download. Clicking an element could fire a JavaScript event, which Mechanize won't handle. Or, it could send a form to the server, which responds with the document download. In either case, you have to figure out what is being sent, why, and what specifically defines the document you want, then use that information to request the document.
Mechanize isn't the right tool to download an attachment. Use Mechanize to navigate forms, then use Mechanize's embedded Nokogiri to extract the URL for the document.
Then use something like curb or Ruby's built-in OpenURI to retrieve the attachment, or see "Using WWW:Mechanize to download a file to disk without loading it all in memory first" for more information.
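Once you know which request produces the document, you can fetch it yourself and stream it to disk. This is a stdlib-only sketch with Net::HTTP (swapped in for curb or OpenURI); filename_from_disposition and download_attachment are hypothetical names, and cookie_header stands for the logged-in session's Cookie header value, however you obtain it.

```ruby
require 'net/http'
require 'uri'

# Pull the filename out of a Content-Disposition value such as
# 'attachment; filename="report.csv"' or 'filename=report.csv'.
# Naive: does not decode the RFC 6266 filename*= (encoded) form.
def filename_from_disposition(value)
  return nil if value.nil?
  match = value.match(/filename\s*=\s*"?([^";]+)"?/i)
  match && match[1].strip
end

# Request the document and stream the body straight to disk instead of
# holding it in memory. Pass form_fields to POST (as the report page does),
# or leave it nil for a plain GET of an extracted URL.
def download_attachment(url, cookie_header: nil, form_fields: nil, dir: '.')
  uri = URI(url)
  request = form_fields ? Net::HTTP::Post.new(uri) : Net::HTTP::Get.new(uri)
  request.set_form_data(form_fields) if form_fields
  request['Cookie'] = cookie_header if cookie_header
  saved_path = nil
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request) do |response|
      raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
      name = filename_from_disposition(response['Content-Disposition']) || 'download'
      # basename drops any directory parts the server might send
      saved_path = File.join(dir, File.basename(name))
      File.open(saved_path, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  saved_path
end
```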
Check the class of the returned page with page.class. If it is Mechanize::File rather than Mechanize::Page, you can just save it.
...
page = page.form_with(:name => "get-report").submit
page.class # Mechanize::File (not Mechanize::Page) for a non-HTML response?
page.save('path/to/file')

How do I search then parse results on a webpage with Ruby?

How would you use Ruby to open a website, type into its search field, and then parse the results? For example, entering something into a search engine and then parsing the results page. I know how to use Nokogiri to find the webpage and open it, but I am lost on how to fill in the search field and move on to the results. Also, on the page I am actually searching I have to click a button; I can't simply hit Enter to move forward. Thank you so much for your help.
Use Mechanize - a library used for automating interaction with websites.
Something like mechanize will work, but interacting with the front end UI code is always going to be slower and more problematic than making requests directly against the back end.
Your best bet would be to look at the request being made to the server (probably an HTTP GET or POST request with some associated params). You can do this with Firebug, or Fiddler 2 on Windows. Then, once you know the parameters the server will accept, just make the request yourself.
For example, if you were doing this with the duckduckgo.com search engine, you could either get mechanize to go to duckduckgo.com, input text into the search box, and click submit, or you could just create a GET request to http://www.duckduckgo.com?q=search_term_here.
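Taking the direct-request route, the main thing to get right is escaping the search terms. A stdlib-only sketch; search_uri and fetch_results are hypothetical helpers, and the q parameter comes from the DuckDuckGo example (another site will use its own parameter names).

```ruby
require 'net/http'
require 'uri'

# Build the GET URL a search form would produce, escaping the terms
# (spaces become '+', '&' becomes %26, and so on).
def search_uri(base_url, params)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(params)
  uri
end

# Fetch the results page as an HTML string for Nokogiri to parse.
# Net::HTTP.get does not follow redirects: a 301/302 just returns that response's body.
def fetch_results(base_url, params)
  Net::HTTP.get(search_uri(base_url, params))
end

# Usage (makes a network request, not run here):
#   html = fetch_results('https://duckduckgo.com/', 'q' => 'search_term_here')
```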
You can use Mechanize for something like this but it might be overkill. I would take a look at RestClient, especially if you don't need to manage cookies.
Edit:
If you can determine the specific URL that the form submits to, say 'example.com/search', and you know the request is a POST (which it usually is when you submit a form), you could construct something like this with Mechanize:
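require 'mechanize'

agent = Mechanize.new
string_to_search_for = 'your search term'  # placeholder for the text you would type
# Each key is a form control's name attribute; each value is what it would send
results_page = agent.post 'http://example.com/search', {
  "_id0:Number" => string_to_search_for,
  "_id0:submitButton" => "Enter"
}
puts results_page.body  # HTML to parse for the info you need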
Notice how the 'name' attribute of each form element becomes a key in the post data and its 'value' attribute becomes the value. For the text input, the value is simply the text you would have typed. A browser turns these pairs into a request and submits it to the server when you push the submit button; here you are making that request directly. The result of the post should be some HTML that you can parse for the info you need.
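To see that name/value mapping concretely, here is a stdlib-only sketch that builds the body a browser would send for this form. post_body is a hypothetical helper, each hash is a made-up stand-in for one form control, and '12345' stands in for string_to_search_for.

```ruby
require 'uri'

# Each hash stands in for one form control: its name attribute and the value
# the browser would send (typed text for a text box, the label of the submit
# button that was clicked). Controls without a name are not submitted.
def post_body(controls)
  pairs = controls
          .select { |c| c[:name] && !c[:name].empty? }
          .map { |c| [c[:name], c[:value].to_s] }
  URI.encode_www_form(pairs)
end

controls = [
  { name: '_id0:Number', value: '12345' },        # the text box
  { name: '_id0:submitButton', value: 'Enter' },  # the clicked submit button
  { name: nil, value: 'ignored' }                 # an unnamed control
]
puts post_body(controls)
# prints _id0%3ANumber=12345&_id0%3AsubmitButton=Enter
```

The colons in the JSF-style names are percent-encoded on the wire, which is one reason to let a library build the body rather than joining strings by hand.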

Grab Facebook signed_request with Sinatra

I'm trying to figure out whether or not a user likes our brand page. Based on that, we want to show either a like button or some 'thank you' text.
I'm working with a Sinatra application hosted on Heroku.
I tried the code from this thread: Decoding Facebook's signed request in Ruby/Sinatra
However, it doesn't seem to grab the signed_request and I can't figure out why.
I have the following methods:
get "/tab" do
  @encoded_request = params[:signed_request]
  @json_request = decode_data(@encoded_request)
  @signed_request = Crack::JSON.parse(@json_request)
  erb :index
end

# used by Canvas apps - redirect the POST to be a regular GET
post "/tab" do
  @encoded_request = params[:signed_request]
  @json_request = decode_data(@encoded_request)
  @signed_request = Crack::JSON.parse(@json_request)
  redirect '/tab'
end
I also have the helper methods from that thread, as they seem to make sense to me:
require 'base64'

helpers do
  def base64_url_decode(payload)
    encoded_str = payload.gsub('-','+').gsub('_','/')
    encoded_str += '=' while !(encoded_str.size % 4).zero?
    Base64.decode64(encoded_str)
  end

  def decode_data(signed_request)
    # signed_request is "signature.payload"; split returns an Array,
    # so keep only the payload part before decoding it
    _signature, payload = signed_request.split('.', 2)
    base64_url_decode(payload)
  end
end
However, when I just do
@encoded_request = params[:signed_request]
and read that out in my view with:
<%= @encoded_request %>
I get nothing at all.
Shouldn't this return at least something? My app seems to crash because, well, there's nothing to decode.
I can't seem to find a lot of information about this around the internet so I'd be glad if someone could help me out.
Are there better ways to know whether or not a user likes our page? Or, is this the way to go and am I just overlooking something obvious?
Thanks!
The clue is in your app crashing because there's nothing to decode.
I suspect the parameters get lost when redirecting. Think about it at the HTTP level:
1. The client posts to /tab with the signed_request in the params.
2. The app parses the signed_request and stores the result in instance variables.
3. The app redirects to /tab, i.e. sends a response with code 302 (or similar) and a Location header pointing to /tab. This completes the request/response cycle, and the instance variables are discarded.
4. The client makes a new request: a GET to /tab. Because of the way redirects work, this request no longer carries the params that were sent with the original POST.
5. The app tries to parse the signed_request param but crashes because no such param was sent.
The simplest solution would be to just render the template in response to the POST instead of redirecting.
If you really need to redirect, you need to carefully pass along the signed_request as query parameters in the redirect path. At least that's a solution I've used in the past. There may be simpler ways to solve this, or libraries that handle some of this for you.
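Passing the signed_request along in the redirect mostly comes down to building the query string with proper escaping. A minimal sketch; tab_redirect_path is a hypothetical helper, and the Sinatra handler is shown only as a comment because it needs the app around it. Keep in mind that a value placed in the URL will also show up in server logs.

```ruby
require 'uri'

# Path for redirecting the POST to a GET while keeping signed_request,
# which would otherwise be lost when the browser follows the redirect.
def tab_redirect_path(signed_request)
  '/tab?' + URI.encode_www_form('signed_request' => signed_request.to_s)
end

# In the Sinatra app (not run here), the POST handler would become:
#
#   post '/tab' do
#     redirect tab_redirect_path(params[:signed_request])
#   end
#
# and the existing GET handler reads params[:signed_request] as before.
```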

Redirect from current page to a new page

I am having trouble with some Ruby CGI.
I have a home page (index.cgi) which is a mix of HTML and Ruby, and has a login form in it.
On clicking on the Submit button the POST's action is the same page (index.cgi), at which point I check to make sure the user has entered data into the correct fields.
I have a counter that increases by 1 each time a field is left empty. If this counter is 0, I want to send the user from the current page to another page, such as contents.html.
With this I have:
if ( errorCount > 0 )
  # do nothing
else
  ....
end
What do I need to put where I have the ....?
Unfortunately I cannot use any frameworks as this is for University coursework, so have to use base Ruby.
As for the CGI#header method you suggested, I have tried it, but it is not working for me.
As mentioned, my page is index.cgi. It is made of a mixture of Ruby and HTML using "here doc" statements.
At the top of my code page I have my shebang line, followed by an HTML header statement.
I then do the CGI form validation part, and within it I have tried something like:
print cgi.header( { 'Status' => '302 Moved', 'location' => 'http://localhost:10000/contents.html' } )
All that happens is that this line is printed at the top of the browser window, above my index.cgi page.
I hope this makes sense.
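To redirect the browser to another URL you must output a 30x HTTP response that contains a Location: /foo/bar header. You can do that using the CGI#header method. The header also has to be the first thing your script prints: since your script already prints an HTML header at the top, the redirect header arrives after the header block has ended, so it is treated as page content and shown as text.
Instead of dealing with these details, which you have not mastered yet, I suggest you use a simple framework such as Sinatra or, at least, write your script as a Rack-compatible application.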
If you really need to use the bare CGI class, have a look at this simple example: https://github.com/tdtds/amazon-auth-proxy/blob/master/amazon-auth-proxy.cgi.
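The ordering is what matters in a bare CGI script: decide whether to redirect before printing anything, then print exactly one header block. Here is a stdlib-only sketch that writes the headers by hand with print (CGI#header builds a similar block for you). redirect_header, html_header and respond are hypothetical names, and error_count and form_html stand in for the coursework script's own validation result and login-page markup.

```ruby
# Redirect headers: the web server turns "Status:" into the HTTP status
# line, and the blank line ends the header block.
def redirect_header(location)
  "Status: 302 Found\r\nLocation: #{location}\r\n\r\n"
end

# Header for an ordinary HTML page.
def html_header
  "Content-Type: text/html\r\n\r\n"
end

# Print one complete response. Anything printed after the blank line that
# ends the headers is page content, so the header choice must come first.
def respond(error_count, form_html, location, out = $stdout)
  if error_count > 0
    out.print html_header, form_html
  else
    out.print redirect_header(location)
  end
end

# In index.cgi, after validation and before any other print:
#   respond(errorCount, page_html, 'http://localhost:10000/contents.html')
```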
