How to manually add a cookie to Mechanize state? - ruby

I'm working in Ruby, but my question is valid for other languages as well.
I have a Mechanize-driven application. The server I'm talking to sets a cookie using JavaScript (rather than standard set-cookie), so Mechanize doesn't catch the cookie. I need to pass that cookie back on the next GET request.
The good news is that I already know the value of the cookie, but I don't know how to tell Mechanize to include it in my next GET request.

I figured it out by extrapolation (and reading sources):
agent = Mechanize.new
...
cookie = Mechanize::Cookie.new(key, value)
cookie.domain = ".oddity.com"
cookie.path = "/"
agent.cookie_jar.add(cookie)
...
page = agent.get("https://www.oddity.com/etc")
Seems to do the job just fine.
update
As #Benjamin Manns points out, Mechanize now wants a URL in the add method. Here's the amended recipe, making the assumption that you've done a GET using the agent, and that the last page visited is the domain for the cookie (saves a URI.parse()):
agent = Mechanize.new
...
cookie = Mechanize::Cookie.new(key, value)
cookie.domain = ".oddity.com"
cookie.path = "/"
agent.cookie_jar.add(agent.history.last.uri, cookie)

These answers are old, so to bring this up to date, these days it looks more like this:
cookie = Mechanize::Cookie.new :domain => '.mydomain.com', :name => name, :value => value, :path => '/', :expires => (Date.today + 1).to_s
agent.cookie_jar << cookie

I wanted to add my experience for specifically passing cookies from Selenium to Mechanize:
Get the cookies from your selenium driver
sel_driver = Selenium::WebDriver.for :firefox
sel_driver.navigate.to('https://sample.com/javascript_login')
#login
sel_cookies = sel_driver.manage.all_cookies
Value for :expires from Selenium cookie is a DateTime object or blank.
However, value for :expires Mechanize cookie (a) must be a string and (b) cannot be blank
sel_cookies.each do |c|
if c[:expires].blank?
c[:expires] = (DateTime.now + 10.years).to_s #arbitrary date in the future
else
c[:expires] = c[:expires].to_s
end
end
Now instantiate as Mechanize cookies and place them in the cookie jar
mech_agent = Mechanize.new
sel_cookies.each { |c| agent.cookie_jar << Mechanize::Cookie.new(c) }
mech_agent.get 'https://sample.com/html_pages'

Also you can try this
Mechanize::Cookie.parse(url, "SessionCookie=#{sessid}",
Logger.new(STDOUT)) { |c| agent.cookie_jar.add(url, c) }
source: http://twitter.com/#!/calebcrane/status/51683884341002240

response.to_hash.fetch("set-cookie").each do |c|
agent.cookie_jar.parse c
end
response here is a native Ruby stdlib thing, like Net::HTTPOK.

Related

How do I follow URL redirection?

I have a URL and I need to retrieve the URL it redirects to (the number of redirections is arbitrary).
One real example I'm working on is:
https://www.google.com/url?q=http://m.zynga.com/about/privacy-center/privacy-policy&sa=D&usg=AFQjCNESJyXBeZenALhKWb52N1vHouAd5Q
which will eventually redirect to:
http://company.zynga.com/privacy/policy
which is the URL I'm interested in.
I tried with open-uri as follows:
privacy_url = "https://www.google.com/url?q=http://m.zynga.com/about/privacy-center/privacy-policy&sa=D&usg=AFQjCNESJyXBeZenALhKWb52N1vHouAd5Q"
final_url = nil
open(privacy_url) do |h|
puts "Redirecting to #{h.base_uri}"
final_url = h.base_uri
end
but I keep getting the original URL back, meaning that final_url is equal to privacy_url.
Is there any way to follow this kind of redirection and programmatically access the resulting URL?
I finally made it, using the Mechanize gem. They key is to enable the follow_meta_refresh options, which is disabled by default.
Here's how
require 'mechanize'
browser = Mechanize.new
browser.follow_meta_refresh = true
start_url = "https://www.google.com/url?q=http://m.zynga.com/about/privacy-center/privacy-policy&sa=D&usg=AFQjCNESJyXBeZenALhKWb52N1vHouAd5Q"
final_url = nil
browser.get(start_url) do |page|
final_url = page.uri.to_s
end
puts final_url # => http://company.zynga.com/privacy/policy

login vk.com net::http.post_form

I want login to vk.com or m.vk.com without Ruby. But my code dosen't work.
require 'net/http'
email = "qweqweqwe#gmail.com"
pass = "qeqqweqwe"
userUri = URI('m.vk.com/index.html')
Net::HTTP.get(userUri)
res = Net::HTTP.post_form(userUri, 'email' => email, 'pass' => pass)
puts res.body
First of all, you need to change userUri to the following:
userUri = URI('https://login.vk.com/?act=login')
Which is where the vk site expects your login parameters.
I'm not very faimilar with vk, but you probably need a way to handle the session cookie. Both receiving it, and providing it for future requests. Can you elaborate on what you're doing after login?
Here is the net/http info for cookie handling:
# Headers
res['Set-Cookie'] # => String
res.get_fields('set-cookie') # => Array
res.to_hash['set-cookie'] # => Array
puts "Headers: #{res.to_hash.inspect}"
This kind of task is exactly what Mechanize is for. Mechanize handles redirects and cookies automatically. You can do something like this:
require 'mechanize'
agent = Mechanize.new
url = "http://m.vk.com/login/"
page = agent.get(url)
form = page.forms[0]
form['email'] = "qweqweqwe#gmail.com"
form['pass'] = "qeqqweqwe"
form.submit
puts agent.page.body

How to set the Referer header before loading a page with Ruby mechanize?

Is there a straightforward way to set custom headers with Mechanize 2.3?
I tried a former solution but get:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p|
p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
}
# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
The docs say:
get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }
so for example:
agent.get 'http://www.google.com/', [], agent.page.uri, {'foo' => 'bar'}
alternatively you might like:
agent.request_headers = {'foo' => 'bar'}
agent.get url
You misunderstood the code you were copying. There was a newline in the example, but it disappeared in the formatting as it wasn't tagged as code. $agent contains nil since you're trying to use it before it has been initialized. You must initialize the object and then use it. Just try this:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }
For this question I noticed people seem to use:
page = agent.get("http://www.you.com/index_login/", :referer => "http://www.you.com/")
As an aside, now that I tested this answer, it seems this was not the issue behind my actual problem: that every visit to a site I'm scraping requires going through the login sequence pages again, even seconds later after the first logged-in visit, despite that I'm always loading and saving the complete cookie jar in yaml format. But that would lead to another question of course.

Sinatra cookies vanish on certain routes?

I have a simple Sinatra app that I am playing with, and for some reason the cookies don't seem to work for certain routes, which I find quite bizarre.
require "sinatra"
set(:authenticate) do |*vars|
condition do
unless request.cookies.has_key?("TestCookie")
redirect to("/login"), 303
end
end
end
get "/login" do
return "No valid cookie"
end
get "/secret", :authenticate => [:auth_cookie] do
cookie = request.cookies["TestCookie"]
return "Secrets ahoy - #{cookie}"
end
get '/cookie/set' do
response.set_cookie("TestCookie", {
:expires => Time.now + 2400,
:value => "TestValue"
})
return "Cookie is set"
end
get '/cookie/get' do
cookie = request.cookies["TestCookie"]
return "Cookie with value #{cookie}"
end
If I go to cookies/set it correctly sets the cookie (can see it in firecookie), then if I go to cookies/get I get the correct cookie output. However if I go to /secret it always redirects to the /login. As I am still fairly new to Ruby syntax I thought it may be a problem with my condition within the authenticate extension, so I have tried removing that and just spitting out the cookie like the other one does. However still nothing, so I am at a loss as to why the cookie is there, I can see it in the browser... and /cookies/get works, but /secret doesn't...
Am I missing something here?
The problem is that the cookie is set with path /cookie. When you set a cookie your can specify a path, which is effectively a sub-part of the Website that you want the cookie to apply to. I guess Sinatra/Rack use the path of the current request by default which in /cookie/set would be /cookie.
You can make it work the way you expect by explicitly specifying the path:
response.set_cookie("TestCookie", {
:expires => Time.now + 2400,
:value => "TestValue",
:path => '/'
})
Or you could set the cookie at a route called say /cookie-set rather than /cookie/set

How to implement cookie support in ruby net/http?

I'd like to add cookie support to a ruby class utilizing net/http to browse the web. Cookies have to be stored in a file to survive after the script has ended. Of course I can read the specs and write some kind of a handler, use some cookie.txt format and so on, but it seems to mean reinventing the wheel. Is there a better way to accomplish this task? Maybe some kind of a cooie jar class to take care of cookies?
The accepted answer will not work if your server returns and expects multiple cookies. This could happen, for example, if the server returns a set of FedAuth[n] cookies. If this affects you, you might want to look into using something along the lines of the following instead:
http = Net::HTTP.new('https://example.com', 443)
http.use_ssl = true
path1 = '/index.html'
path2 = '/index2.html'
# make a request to get the server's cookies
response = http.get(path)
if (response.code == '200')
all_cookies = response.get_fields('set-cookie')
cookies_array = Array.new
all_cookies.each { | cookie |
cookies_array.push(cookie.split('; ')[0])
}
cookies = cookies_array.join('; ')
# now make a request using the cookies
response = http.get(path2, { 'Cookie' => cookies })
end
Taken from DZone Snippets
http = Net::HTTP.new('profil.wp.pl', 443)
http.use_ssl = true
path = '/login.html'
# GET request -> so the host can set his cookies
resp, data = http.get(path, nil)
cookie = resp.response['set-cookie'].split('; ')[0]
# POST request -> logging in
data = 'serwis=wp.pl&url=profil.html&tryLogin=1&countTest=1&logowaniessl=1&login_username=blah&login_password=blah'
headers = {
'Cookie' => cookie,
'Referer' => 'http://profil.wp.pl/login.html',
'Content-Type' => 'application/x-www-form-urlencoded'
}
resp, data = http.post(path, data, headers)
# Output on the screen -> we should get either a 302 redirect (after a successful login) or an error page
puts 'Code = ' + resp.code
puts 'Message = ' + resp.message
resp.each {|key, val| puts key + ' = ' + val}
puts data
update
#To save the cookies, you can use PStore
cookies = PStore.new("cookies.pstore")
# Save the cookie
cookies.transaction do
cookies[:some_identifier] = cookie
end
# Retrieve the cookie back
cookies.transaction do
cookie = cookies[:some_identifier]
end
The accepted answer does not work. You need to access the internal representation of the response header where the multiple set-cookie values are stores separately and then remove everything after the first semicolon from these string and join them together. Here is code that works
r = http.get(path)
cookie = {'Cookie'=>r.to_hash['set-cookie'].collect{|ea|ea[/^.*?;/]}.join}
r = http.get(next_path,cookie)
Use http-cookie, which implements RFC-compliant parsing and rendering, plus a jar.
A crude example that happens to follow a redirect post-login:
require 'uri'
require 'net/http'
require 'http-cookie'
uri = URI('...')
jar = HTTP::CookieJar.new
Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
req = Net::HTTP::Post.new uri
req.form_data = { ... }
res = http.request req
res.get_fields('Set-Cookie').each do |value|
jar.parse(value, req.uri)
end
fail unless res.code == '302'
req = Net::HTTP::Get.new(uri + res['Location'])
req['Cookie'] = HTTP::Cookie.cookie_value(jar.cookies(uri))
res = http.request req
end
Why do this? Because the answers above are incredibly insufficient and flat out don't work in many RFC-compliant scenarios (happened to me), so relying on the very lib implementing just what's needed is infinitely more robust if you want to handle more than one particular case.
I've used Curb and Mechanize for a similar project.
Just enable cookies support and save the cookies to a temp cookiejar...
If your using net/http or packages without cookie support built in, you will need to write your own cookie handling.
You can send receive cookies using headers.
You can store the header in any persistence framework. Whether it is some sort of database, or files.

Resources