How to parse HTTP response using Ruby

I've written a short snippet which sends a GET request, performs auth, and checks for a 200 OK response (on successful auth). One thing I've seen with this specific GET request is that the response is always 200, irrespective of whether auth succeeds or not.
The difference is in what happens next: when auth fails, the first response is 200 OK, just the same as on success, but then there is a second step and the page gets redirected back to the login page.
I am just trying to make a quick script which can check user/password combinations against my web application and tell me which ones passed auth and which didn't.
How should I check this? The sample code is like this:
def funcA(u, p)
  print_A("#{ip} - '#{u}' : '#{p}' - Pass")
end

def try_login(u, p)
  # Double quotes are needed here, or #{u} and #{p} are never interpolated.
  path = "/index.php?uuser=#{u}&ppass=#{p}"
  r = send_request_raw({
    'URI'    => path,  # the variable, not the literal string 'path'
    'method' => 'GET'
  })
  check = (r && r.code.to_i == 200)
  if check
    funcA(u, p)
  else
    print_B("#{ip} - '#{u}' - Fail")
  end
  return check, r
end
Update:
I also tried adding a check that matches a 'Success/Fail' keyword in the HTTP response. That didn't work either. But I now noticed that the response coming back is in a different form (even though the Content-Type of the response is text/html;charset=utf-8), and since I am not doing any parsing, the check fails.
A success response is of the form:
{"param1":1,"param2"="Auth Success","menu":0,"userdesc":"My User","user":"uuser","pass":"ppass","check":"success"}
A fail response is of the form:
{"param1":-1,"param2"="Auth Fail","check":"fail"}
So now I need some pointers on how to parse this response.
Many Thanks.

I do this with "net/http":
require 'net/http'

uri = URI(url)
response = Net::HTTP.get_response(uri)  # opens a connection, sends the GET, and closes it
http_status_code = response.code        # the status as a String, e.g. "200"

If there's a redirect after a 200, it must be a JavaScript or meta redirect, so just look for that in the response body.
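For the JSON-style bodies shown in the update, you don't even need the redirect check: parse the body and key off the "check" field. A minimal sketch, assuming the "param2"= in the pasted responses is a transcription slip for "param2": (i.e. the server really returns JSON), reusing r, u, p, and the print helpers from your snippet:

require 'json'

begin
  body  = JSON.parse(r.body)
  check = (body['check'] == 'success')
rescue JSON::ParserError
  # Body was HTML rather than JSON, e.g. the redirect back to the login page.
  check = false
end

check ? funcA(u, p) : print_B("#{ip} - '#{u}' - Fail")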

Related

Net::HTTP to Newrelic returns 404, but curl returns 200

I am currently using Ruby's Net::HTTP library to query a Newrelic endpoint. Recently, these queries have started returning 404. I tested my exact query through curl, in the hopes of getting a more detailed error message back, but through curl the query actually returns a 200 with the expected data. So the query does work, and I am at a loss as to why Net::HTTP is returning a 404 at this point.
Here are some code snippets of what I have so far, and if anyone can offer any suggestions of further things to try, that would be much appreciated!
Environment:
JRuby 1.7.26 (so Ruby 1.9.3p551)
Rails 3.2.21
Ruby code:
uri = URI('https://NEWRELIC_HOST/PATH/ACCOUNT_ID/query')
parameters = { :nrql => NRQL_QUERY_STRING }
uri.query = URI.encode_www_form(parameters)
request = Net::HTTP::Get.new(uri.to_s)
request['X-Query-Key'] = NEWRELIC_QUERY_KEY
Net::HTTP.start(uri.hostname, uri.port, {:use_ssl => true}) do |http|
  response = http.request(request)
end
This returns me a 404 error code every time. I have tried it against a couple of valid Newrelic endpoints/accounts, and it is always a 404.
CURL code:
Now if I take that same request and punt it to curl on the command line, there are no issues; I get a 200 with all data returned properly:
curl -H "X-Query-Key: NEWRELIC_QUERY_KEY" "https://NEWRELIC_HOST/PATH/ACCOUNT_ID/query?NRQL_QUERY_STRING"
Try changing
request = Net::HTTP::Get.new(uri.to_s)
to
request = Net::HTTP::Get.new(uri.request_uri)
With uri.to_s, the request target that Net::HTTP puts on the request line is the full absolute URL (scheme and host included), whereas uri.request_uri is just the path plus query string, which is what the server expects there; that mismatch is the most likely source of the 404.
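Putting it together, here is the snippet from the question with only the request line changed (everything else is as posted; NEWRELIC_HOST, NRQL_QUERY_STRING, and NEWRELIC_QUERY_KEY are still your placeholders):

require 'net/http'
require 'uri'

uri = URI('https://NEWRELIC_HOST/PATH/ACCOUNT_ID/query')
uri.query = URI.encode_www_form(:nrql => NRQL_QUERY_STRING)

request = Net::HTTP::Get.new(uri.request_uri)  # path + query only
request['X-Query-Key'] = NEWRELIC_QUERY_KEY

response = Net::HTTP.start(uri.hostname, uri.port, :use_ssl => true) do |http|
  http.request(request)
end
response.code  # should now be "200"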

Reading Withings API ruby

I have been trying for days to pull down activity data from the Withings API using the OAuth Ruby gem. Regardless of what method I try, I consistently get back a 503 error response (not enough params), even though I copied the example URI from the documentation, having of course swapped out the userid. Has anybody had any luck with this in the past? I hope it is just something stupid I am doing.
class Withings
  API_KEY = 'REMOVED'
  API_SECRET = 'REMOVED'
  CONFIGURATION = { site: 'https://oauth.withings.com',
                    request_token_path: '/account/request_token',
                    access_token_path: '/account/access_token',
                    authorize_path: '/account/authorize' }

  before do
    @consumer = OAuth::Consumer.new API_KEY, API_SECRET, CONFIGURATION
    @base_url ||= "#{request.env['rack.url_scheme']}://#{request.env['HTTP_HOST']}#{request.env['SCRIPT_NAME']}"
  end

  get '/' do
    @request_token = @consumer.get_request_token oauth_callback: "#{@base_url}/access_token"
    session[:token] = @request_token.token
    session[:secret] = @request_token.secret
    redirect @request_token.authorize_url
  end

  get '/access_token' do
    @request_token = OAuth::RequestToken.new @consumer, session[:token], session[:secret]
    @access_token = @request_token.get_access_token oauth_verifier: params[:oauth_verifier]
    session[:token] = @access_token.token
    session[:secret] = @access_token.secret
    session[:userid] = params[:userid]
    redirect "#{@base_url}/activity"
  end

  get '/activity' do
    @access_token = OAuth::AccessToken.new @consumer, session[:token], session[:secret]
    response = @access_token.get("http://wbsapi.withings.net/v2/measure?action=getactivity&userid=#{session[:userid]}&startdateymd=2014-01-01&enddateymd=2014-05-09")
    JSON.parse(response.body)
  end
end
For other API endpoints I get an error response of 247 - The userid provided is absent, or incorrect. This is really frustrating. Thanks
So I figured out the answer after copious amounts of Googling and gaining a better understanding of both the Withings API and the OAuth library I was using. Basically, Withings uses query strings to pass in API parameters. I thought I was passing these parameters correctly when making API calls, but apparently I needed to explicitly set the OAuth library to use the query-string scheme, like so:
http_method: :get, scheme: :query_string
I appended this to my OAuth consumer configuration and everything worked fine immediately.
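Concretely, with the CONFIGURATION hash from the question, that means something like this (a sketch; only the last two keys are new):

CONFIGURATION = { site: 'https://oauth.withings.com',
                  request_token_path: '/account/request_token',
                  access_token_path: '/account/access_token',
                  authorize_path: '/account/authorize',
                  http_method: :get,
                  scheme: :query_string }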

How can I check if a URL on Amazon S3 is valid, and not expired using ruby

I need to accept a URL like:
https://aidin.s3.amazonaws.com/appname/bucket/folder/faxattach/bXs9FerLJR1tnhs3z?AWSAccessKeyId=ACCEDD_KEY&Expires=1372360744&Signature=SIGNATURE
and check if that URL is valid, and not expired.
I've tried a few different things, but they tend to error out. For example:
url = URI.parse("https://aidin.s3.amazonaws.com/appname/bucket/folder/faxattach/bXs9FerLJR1tnhs3z?AWSAccessKeyId=ACCEDD_KEY&Expires=1372360744&Signature=SIGNATURE")
req = Net::HTTP.new(url.host, url.port)
res = req.request_head(url.path)
This gives me:
Net::HTTPBadResponse: wrong status line: "\x15\x03\x01\x00\x02\x02"
whether or not the URL is valid.
Just ran up against this myself, this is what fixed it for me:
url = URI.parse(url_param)
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true if url.scheme == 'https'
response = http.request_head(url.path)
The "wrong status line" error comes from Net::HTTP speaking plain HTTP when the URL is HTTPS (those \x15\x03\x01 bytes are the start of a TLS alert sent back by the server). The use_ssl assignment above makes SSL conditional on the scheme being https.
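Once the HEAD request goes through, interpreting the result is straightforward. A sketch, assuming the usual S3 behaviour of answering expired or badly signed presigned URLs with 403 Forbidden:

case response
when Net::HTTPSuccess   then :url_valid
when Net::HTTPForbidden then :expired_or_bad_signature  # S3 typically 403s here
when Net::HTTPNotFound  then :no_such_object
else                         :other_error
end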
You can use the aws-s3 gem (https://github.com/marcel/aws-s3):
S3Object.exists? 'headshot.jpg', 'photos'
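Note that the gem needs a connection established first, and that exists? checks via your AWS credentials rather than via the presigned URL itself. Something along these lines (the credentials are placeholders, and S3Object is AWS::S3::S3Object unless you include the namespace):

require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'ACCESS_KEY',
  :secret_access_key => 'SECRET'
)
AWS::S3::S3Object.exists? 'headshot.jpg', 'photos'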

RestClient POST doesn't display status on header-only response

I have a Rails action which responds with head :ok, rather than rendering any content. I'm calling this action using RestClient, like so:
resp = RestClient.post("#{api_server_url}/action/path", {:param_1 => thing, :param_2 => other_thing}, :authorization => auth)
The Rails server log shows that this worked as expected:
Completed 200 OK in 78ms (ActiveRecord: 21.3ms)
However, the resulting value of resp is the string " ", rather than an object I can examine (to see what its status code is, for instance).
I tried changing the action to use head :created instead, just to see if it produced a different result, but it's the same: " ".
How can I get the status code of this response?
RestClient.post returns an instance of the class RestClient::Response, which inherits from String. You can still check the status code by calling resp.code; other useful methods include resp.headers and resp.cookies.
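So the call from the question already gives you everything you need; for example:

resp = RestClient.post("#{api_server_url}/action/path",
                       {:param_1 => thing, :param_2 => other_thing},
                       :authorization => auth)
resp.code     # => 200 (the status code, as an Integer)
resp.headers  # => Hash of response headers
resp.cookies  # => Hash of cookies set by the response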

Access session cookie in scrapy spiders

I am trying to access the session cookie within a spider. I first log in to a social network in a spider:
def parse(self, response):
    return [FormRequest.from_response(response,
                                      formname='login_form',
                                      formdata={'email': '...', 'pass': '...'},
                                      callback=self.after_login)]
In after_login, I would like to access the session cookies, in order to pass them to another module (Selenium, here) to further process the page with an authenticated session.
I would like something like this:
def after_login(self, response):
    # process response
    .....
    # access the cookies of that session to access another URL in the
    # same domain with the authenticated session. Something like:
    session_cookies = XXX.get_session_cookies()
    data = another_function(url, cookies)
Unfortunately, response.cookies does not return the session cookies.
How can I get the session cookies? I was looking at the cookies middleware (scrapy.contrib.downloadermiddleware.cookies and scrapy.http.cookies), but there doesn't seem to be any straightforward way to access the session cookies.
Some more details here about my original question:
Unfortunately, I used your idea but I didn't see the cookies, although I know for sure that they exist, since the scrapy.contrib.downloadermiddleware.cookies middleware does print out the cookies! These are exactly the cookies that I want to grab.
So here is what I am doing:
The after_login(self, response) method receives the response variable after proper authentication, and then I access a URL with the session data:
def after_login(self, response):
    # testing to see if I can get the session cookies
    cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
    cookieJar.extract_cookies(response, response.request)
    cookies_test = cookieJar._cookies
    print "cookies - test:", cookies_test

    # URL access with authenticated session
    url = "http://site.org/?id=XXXX"
    request = Request(url=url, callback=self.get_pict)
    return [request]
As the output below shows, there are indeed cookies, but I fail to capture them with cookieJar:
cookies - test: {}
2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........
So I would like to get a dictionary containing the keys xxx, yyy etc with the corresponding values.
Thanks :)
A classic example is having a login server, which provides a new session id after a successful login. This new session id should be used with another request.
Here is the code, picked up from source, which seems to work for me:
def check_logged(self, response):
    tmpCookie = response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
    print 'cookie from login', tmpCookie
    cookieHolder = dict(SESSION_ID=tmpCookie)
    #print response.body
    if "my name" in response.body:
        yield Request(url="<<new url for another server>>",
                      cookies=cookieHolder,
                      callback=self."<<another function here>>")
    else:
        print "login failed"
        return
Maybe this is overkill, but I don't know how you are going to use those cookies, so it might be useful (an excerpt from real code; adapt it to your case):
from scrapy.http.cookies import CookieJar

class MySpider(BaseSpider):
    def parse(self, response):
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        request = Request(nextPageLink, callback=self.parse2,
                          meta={'dont_merge_cookies': True, 'cookie_jar': cookieJar})
        cookieJar.add_cookie_header(request)  # apply Set-Cookie ourselves
CookieJar has some useful methods.
If you still don't see the cookies - maybe they are not there?
UPDATE:
Looking at CookiesMiddleware code:
class CookiesMiddleware(object):
    def _debug_cookie(self, request, spider):
        if self.debug:
            cl = request.headers.getlist('Cookie')
            if cl:
                msg = "Sending cookies to: %s" % request + os.linesep
                msg += os.linesep.join("Cookie: %s" % c for c in cl)
                log.msg(msg, spider=spider, level=log.DEBUG)
So, try request.headers.getlist('Cookie')
This works for me:
response.request.headers.get('Cookie')
It seems to return all the cookies that were introduced by the middleware in the request, session's or otherwise.
As of 2021 (Scrapy 2.5.1), this is still not particularly straightforward. But you can access downloader middlewares (like CookiesMiddleware) from within a spider via self.crawler.engine.downloader:
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware

def after_login(self, response):
    downloader_middlewares = self.crawler.engine.downloader.middleware.middlewares
    cookies_mw = next(iter(mw for mw in downloader_middlewares
                           if isinstance(mw, CookiesMiddleware)))
    jar = cookies_mw.jars[response.meta.get('cookiejar')].jar
    cookies_list = [vars(cookie) for domain in jar._cookies.values()
                    for path in domain.values() for cookie in path.values()]
    # or
    cookies_dict = {cookie.name: cookie.value for domain in jar._cookies.values()
                    for path in domain.values() for cookie in path.values()}
    ...
Both output formats above can be passed to other requests using the cookies parameter.
