Good request from browser but bad request from ruby? - ruby

I'm using the Google Custom Search API and I'm trying to access it through some Ruby code.
Here is a snippet of the code:
req = Typhoeus::Request.new("https://www.googleapis.com/customsearch/v1?key={my_key}&cx=017576662512468239146:omuauf_lfve&q=" + keyword, followlocation: true)
res = req.run
The body of the response appears to be this:
<p>Your client has issued a malformed or illegal request. <ins>That’s all we know.</ins>
'
from /usr/local/lib/ruby/2.1.0/json/common.rb:155:in `parse'
from main.rb:20:in `initialize'
from main.rb:41:in `new'
from main.rb:41:in `<main>'
When I try to do the same thing from the browser it works like a charm. Even more confusing is that this same code worked 12 hours ago. I only changed the keyword that it should look for, however it started returning the error.
Any suggestions? I'm sure that I have enough credits for more requests

You probably have special characters in your GET parameter keyword. If you enter the URL in your browser, the browser escapes these for you. In Ruby, however, you need to escape these characters yourself, so that a string like "sky line" becomes "sky+line" and so on. There is a utility function, CGI::escape, which is used like this:
require 'cgi'
CGI::escape("sky line")
=> "sky+line"
Your fixed code would look something like this:
req = Typhoeus::Request.new("https://www.googleapis.com/customsearch/v1?key={my_key}&cx=017576662512468239146:omuauf_lfve&q=" + CGI::escape(keyword), followlocation: true)
res = req.run
However, since you're using Typhoeus anyway, you should be able to use its params parameter and let Typhoeus handle the escaping:
req = Typhoeus::Request.new(
"https://www.googleapis.com/customsearch/v1?cx=017576662512468239146:omuauf_lfve",
followlocation: true,
params: {q: keyword, key: my_key}
)
res = req.run
There are more examples on Typhoeus' GitHub page.
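For comparison, the same escaping can be sketched with only the standard library. This is a minimal illustration, not part of the original answer: build_search_url is a hypothetical helper, and 'MY_KEY' and 'MY_CX' are placeholder values.

```ruby
require 'uri'

# Hypothetical helper: builds the Custom Search URL with every
# query parameter escaped by URI.encode_www_form.
def build_search_url(keyword, api_key:, engine_id:)
  uri = URI('https://www.googleapis.com/customsearch/v1')
  uri.query = URI.encode_www_form(key: api_key, cx: engine_id, q: keyword)
  uri.to_s
end

# Placeholder credentials; spaces become "+" and "&" becomes "%26".
puts build_search_url('sky line & clouds', api_key: 'MY_KEY', engine_id: 'MY_CX')
```

The resulting string can be passed to any HTTP client, since the keyword can no longer break the query string.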

Related

Reading Withings API ruby

I have been trying for days to pull down activity data from the Withings API using the OAuth Ruby gem. Regardless of what method I try, I consistently get back a 503 error response (not enough params), even though I copied the example URI from the documentation, having of course swapped out the userid. Has anybody had any luck with this in the past? I hope it is just something stupid I am doing.
class Withings
API_KEY = 'REMOVED'
API_SECRET = 'REMOVED'
CONFIGURATION = { site: 'https://oauth.withings.com', request_token_path: '/account/request_token',
access_token_path: '/account/access_token', authorize_path: '/account/authorize' }
before do
@consumer = OAuth::Consumer.new API_KEY, API_SECRET, CONFIGURATION
@base_url ||= "#{request.env['rack.url_scheme']}://#{request.env['HTTP_HOST']}#{request.env['SCRIPT_NAME']}"
end
get '/' do
@request_token = @consumer.get_request_token oauth_callback: "#{@base_url}/access_token"
session[:token] = @request_token.token
session[:secret] = @request_token.secret
redirect @request_token.authorize_url
end
get '/access_token' do
@request_token = OAuth::RequestToken.new @consumer, session[:token], session[:secret]
@access_token = @request_token.get_access_token oauth_verifier: params[:oauth_verifier]
session[:token] = @access_token.token
session[:secret] = @access_token.secret
session[:userid] = params[:userid]
redirect "#{@base_url}/activity"
end
get '/activity' do
@access_token = OAuth::AccessToken.new @consumer, session[:token], session[:secret]
response = @access_token.get("http://wbsapi.withings.net/v2/measure?action=getactivity&userid=#{session[:userid]}&startdateymd=2014-01-01&enddateymd=2014-05-09")
JSON.parse(response.body)
end
end
For other API endpoints I get an error response of 247 - The userid provided is absent, or incorrect. This is really frustrating. Thanks
So I figured out the answer after a copious amount of Googling and gaining a better understanding of both the Withings API and the OAuth library I was using. Basically, Withings uses query strings to pass in API parameters. I thought I was passing these parameters correctly when making API calls, but apparently I needed to explicitly set the OAuth library to use the query string scheme, like so:
http_method: :get, scheme: :query_string
This is appended to my OAuth consumer configuration and all worked fine immediately.
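As a rough sketch of where those two options go, the configuration hash from the question could be extended as follows. The name QUERY_STRING_CONFIGURATION is made up for this illustration, and the consumer call is left as a comment so the snippet runs without the oauth gem.

```ruby
# Endpoints copied from the CONFIGURATION hash in the question.
CONFIGURATION = {
  site: 'https://oauth.withings.com',
  request_token_path: '/account/request_token',
  access_token_path: '/account/access_token',
  authorize_path: '/account/authorize'
}

# Hypothetical name: the same settings plus the two options from the
# answer, asking the OAuth library to sign requests via the query string.
QUERY_STRING_CONFIGURATION = CONFIGURATION.merge(
  http_method: :get,
  scheme: :query_string
)

# With the oauth gem loaded, the consumer would then be created as:
#   OAuth::Consumer.new(API_KEY, API_SECRET, QUERY_STRING_CONFIGURATION)
p QUERY_STRING_CONFIGURATION
```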

Ruby httpclient: 'create_request': undefined method 'each'

I'm green when it comes to Ruby. Right now I'm mucking about with a script which connects to the Terremark eCloud API Explorer. I'm trying to use the httpclient gem, but I'm a bit confused as to how I'm supposed to construct my client.
#!/usr/bin/ruby
require "httpclient"
require 'base64'
require 'hmac-sha1'
require 'openssl'
# Method definitions
def get_date
# Get the time and date in the necessary format
result = Time.now.strftime('%a, %d %b %Y %H:%M:%S GMT')
end
def get_signature(action,date,headers,resource,user,pass)
string_to_sign = "#{action}
#{date}
#{headers}
#{resource}\n"
return Base64.encode64(OpenSSL::HMAC.digest('sha1', "#{user}:#{pass}", "#{string_to_sign}"))
end
# Initial variables
date = get_date
domain = "https://services.enterprisecloud.terremark.com"
password = 'password'
query = {}
tmrk_headers = Hash.new
tmrk_headers['x-tmrk-date: '] = date
tmrk_headers['x-tmrk-version: '] = '2013-06-01'
uri = '/cloudapi/spec/networks/environments/1'
url = "#{domain}#{uri}"
username = 'user@terremark.com'
verb = 'GET'
signature = get_signature(verb,date,tmrk_headers,uri,username,password)
tmrk_headers['Authorization: '] = "Basic \"#{signature}\""
puts signature
client = HTTPClient.new
client.get_content(url,query,tmrk_headers)
EDIT: This is no longer valid as I've moved beyond this error with some help:
Right now I'm not concerned about seeing what is returned from the connection. I'm just looking to create an error-free run. For instance, if I run the script without the client.get_content line it will return to a prompt without issue (giving me the impression that everything ran cleanly, if not uselessly).
How am I supposed to construct this? The httpclient documentation uses the example with external headers:
extheader = [['Accept', 'image/jpeg'], ['Accept', 'image/png']]
clnt.get_content(uri, query, extheader)
I'm making the assumption that the query is the URI that I've defined.
In all reality, it isn't set up right in the first place. I need to be able to include the string in the auth_header variable in the string to be signed but the signature is actually part of the variable. I've obviously created a hole in that regard.
Any assistance with this will be more than appreciated.
EDIT2: Removed strace pastebin. Adding Ruby backtrace:
/home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:1023:in `create_request': undefined method `each' for #<String:0x0000000207d1e8> (NoMethodError)
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:884:in `do_request'
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:959:in `follow_redirect'
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:594:in `get_content'
from ./test.rb:42:in `<main>'
EDIT3: Updated script; adding further backtrace after making necessary script modifications:
/home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:975:in `success_content': unexpected response: #<HTTP::Message::Headers:0x00000001dddc58 @http_version="1.1", @body_size=0, @chunked=false, @request_method="GET", @request_uri=#<URI::HTTPS:0x00000001ddecc0 URL:https://services.enterprisecloud.terremark.com/cloudapi/spec/networks/environments/1>, @request_query={}, @request_absolute_uri=nil, @status_code=400, @reason_phrase="Bad Request", @body_type=nil, @body_charset=nil, @body_date=nil, @body_encoding=#<Encoding:US-ASCII>, @is_request=false, @header_item=[["Content-Type", "text/html; charset=us-ascii"], ["Server", "Microsoft-HTTPAPI/2.0"], ["Date", "Thu, 27 Mar 2014 23:12:53 GMT"], ["Connection", "close"], ["Content-Length", "339"]], @dumped=false> (HTTPClient::BadResponseError)
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:594:in `get_content'
from ./test.rb:52:in `<main>'
The issue that you're having as stated by your backtrace
/home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:1023:in `create_request': undefined method `each' for #<String:0x0000000207d1e8> (NoMethodError)
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:884:in `do_request'
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:959:in `follow_redirect'
from /home/msnyder/.rvm/gems/ruby-2.1.1/gems/httpclient-2.3.4.1/lib/httpclient.rb:594:in `get_content'
from ./test.rb:42:in `<main>'
is that you appear to be passing a String object as one of the arguments to get_content, where it expects an object that responds to the method each.
According to the documentation of HTTPClient#get_content (http://www.ruby-doc.org/gems/docs/h/httpclient-xaop-2.1.6/HTTPClient.html#method-i-get_content),
the second parameter is expected to be a Hash or Array of arguments.
Here are only the relevant parts of your code sample:
uri = '/cloudapi/spec/networks/environments/1'
url = "https://services.enterprisecloud.terremark.com"
tmrk_headers = "x-tmrk-date:\"#{date}\"\nx-tmrk-version:2014-01-01"
auth_header = "Authorization: CloudApi AccessKey=\"#{access_key}\" SignatureType=\"HmacSHA1\" Signature=\"#{signature}\""
full_header = "#{tmrk_headers}\n#{auth_header}"
client = HTTPClient.new
client.get_content(url,uri,full_header)
There are two things that I see wrong with your code.
You're passing in a String value for the query. Specifically, you're passing in uri, which I'm assuming holds the path that you want to hit.
For the extra headers parameter, you're also passing in a String value, full_header.
To fix this, pass the full URL as the first parameter, a Hash (or Array) as the query, and a Hash (or Array of pairs) as the headers.
This means it should look something like this:
url = "https://services.enterprisecloud.terremark.com/cloudapi/spec/networks/environments/1"
query = {} # if you have any parameters to pass in they should be here.
headers = {
"x-tmrk-date" => date, "x-tmrk-version" => "2014-01-01",
"Authorization" => "CloudApi AccessKey=#{access_key} SignatureType=HmacSHA1 Signature=#{signature}"
}
client = HTTPClient.new
client.get_content(url, query, headers)
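The NoMethodError itself can be reproduced without httpclient. The sketch below only illustrates the point made above, namely that a String does not respond to each while a Hash does; the variable names are chosen for this example.

```ruby
# utc so that the literal "GMT" label in the format string is accurate.
date = Time.now.utc.strftime('%a, %d %b %Y %H:%M:%S GMT')

# Headers as one String (what the failing call received) versus a Hash.
header_string = "x-tmrk-date: #{date}\nx-tmrk-version: 2014-01-01"
header_hash = { 'x-tmrk-date' => date, 'x-tmrk-version' => '2014-01-01' }

puts header_string.respond_to?(:each) # false on Ruby 1.9 and later
puts header_hash.respond_to?(:each)   # true

# A Hash can be walked pair by pair, which is what a client needs
# in order to emit one header line per entry.
header_hash.each { |name, value| puts "#{name}: #{value}" }
```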

I am trying to use Curl::Easy.http_put but have some issues with the data argument

I'm struggling with a ruby script to upload some pictures to moodstocks using their http interface
here is the code that I have so far
curb = Curl::Easy.new
curb.http_auth_types = :digest
curb.username = MS_API
curb.password = MS_SECRET
curb.multipart_form_post = true
Dir.foreach(images_directory) do |image|
if image.include? '.jpg'
path = images_directory + image
filename = File.basename(path, File.extname(path))
puts "Upload #{path} with id #{filename}"
raw_url = 'http://api.moodstocks.com/v2/ref/' + filename
encoded_url = URI.parse URI.encode raw_url
curb.url = encoded_url
curb.http_put(Curl::PostField.file('image_file', path))
end
end
and this is the error that I get
/Library/Ruby/Gems/2.0.0/gems/curb-0.8.5/lib/curl/easy.rb:57:in `add': no implicit conversion of nil into String (TypeError)
from /Library/Ruby/Gems/2.0.0/gems/curb-0.8.5/lib/curl/easy.rb:57:in `perform'
from upload_moodstocks.rb:37:in `http_put'
from upload_moodstocks.rb:37:in `block in <main>'
from upload_moodstocks.rb:22:in `foreach'
from upload_moodstocks.rb:22:in `<main>'
I think the problem is in how I give the argument to the http_put method, but I have tried to look for some examples of Curl::Easy.http_put and have found nothing so far.
Could anyone point me to some documentation on it, or help me out with this?
Thank you in advance
There are several problems here:
1. URI::HTTP instead of String
First, the TypeError you encounter comes from the fact that you pass a URI::HTTP instance (encoded_url) as curb.url instead of a plain Ruby string.
You may want to use encoded_url.to_s, but the question is why do you do this parse/encode here?
2. PUT w/ multipart/form-data
The second problem is related to curb. At the time of writing (v0.8.5), curb does NOT support performing an HTTP PUT request with multipart/form-data encoding.
If you refer to the source code you can see that:
the multipart_form_post setting is only used for POST requests,
the put_data setter does not support Curl::PostField-s
To solve your problem you need an HTTP client library that can combine Digest Authentication, multipart/form-data and HTTP PUT.
In Ruby you can use rufus-verbs, but you will need to use rest-client to build the multipart body.
There is also HTTParty but it has issues with Digest Auth.
That is why I strongly recommend going ahead with Python and using Requests:
import requests
from requests.auth import HTTPDigestAuth
import os
MS_API_KEY = "kEy"
MS_API_SECRET = "s3cr3t"
filename = "sample.jpg"
# open in binary mode, since the file is an image
with open(filename, "rb") as f:
    base = os.path.basename(filename)
    uid = os.path.splitext(base)[0]
    r = requests.put(
        "http://api.moodstocks.com/v2/ref/%s" % uid,
        auth = HTTPDigestAuth(MS_API_KEY, MS_API_SECRET),
        files = {"image_file": (base, f.read())}
    )
    print(r.status_code)
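Back in Ruby, the first problem (a URI::HTTP object where a String is expected) can be illustrated with the standard library alone. This is a small sketch; 'sample' is a hypothetical image id, and the snippet does not address the multipart PUT limitation.

```ruby
require 'uri'

filename = 'sample' # hypothetical image id
parsed = URI.parse('http://api.moodstocks.com/v2/ref/' + filename)

# URI.parse returns a URI object; to_s turns it back into a plain String,
# which is the type the answer says curb.url needs.
url_string = parsed.to_s

puts parsed.class     # URI::HTTP
puts url_string.class # String
```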

`open_http': 403 Forbidden (OpenURI::HTTPError) for the string "Steve_Jobs" but not for any other string

I was going through the Ruby tutorials provided at http://ruby.bastardsbook.com/ and I encountered the following code:
require "open-uri"
remote_base_url = "http://en.wikipedia.org/wiki"
r1 = "Steve_Wozniak"
r2 = "Steve_Jobs"
f1 = "my_copy_of-" + r1 + ".html"
f2 = "my_copy_of-" + r2 + ".html"
# read the first url
remote_full_url = remote_base_url + "/" + r1
rpage = open(remote_full_url).read
# write the first file to disk
file = open(f1, "w")
file.write(rpage)
file.close
# read the second url
remote_full_url = remote_base_url + "/" + r2
rpage = open(remote_full_url).read
# write the second file to disk
file = open(f2, "w")
file.write(rpage)
file.close
# open a new file:
compiled_file = open("apple-guys.html", "w")
# reopen the first and second files again
k1 = open(f1, "r")
k2 = open(f2, "r")
compiled_file.write(k1.read)
compiled_file.write(k2.read)
k1.close
k2.close
compiled_file.close
The code fails with the following trace:
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:277:in `open_http': 403 Forbidden (OpenURI::HTTPError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:518:in `open'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:30:in `open'
from /Users/arkidmitra/tweetfetch/samecode.rb:11
My problem is not that the code fails but that whenever I change r2 to anything other than Steve_Jobs, it works. What is happening here?
Your code runs fine for me (Ruby MRI 1.9.3) when I request a wiki page that exists.
When I request a wiki page that does NOT exist, I get a mediawiki 404 error code.
Steve_Jobs => success
Steve_Austin => success
Steve_Rogers => success
Steve_Foo => error
Wikipedia does a ton of caching, so if you see responses for "Steve_Jobs" that differ from those for other people whose pages exist, my best guess is that Wikipedia is caching the Steve Jobs article because he's famous, and potentially adding extra checks/verifications to protect the article from rapid changes, defacement, etc.
The solution for you: always open the url with a User Agent string.
rpage = open(remote_full_url, "User-Agent" => "Whatever you want here").read
Details from the Mediawiki docs: "When you make HTTP requests to the MediaWiki web service API, be sure to specify a User-Agent header that properly identifies your client. Don't use the default User-Agent provided by your client library, but make up a custom header that includes the name and the version number of your client: something like "MyCuteBot/0.1".
On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. See our User-Agent policy."
I think this happens for locked-down entries like "Steve Jobs", "Al Gore", etc. This is explained in the same book that you are referring to:
For some pages – such as Al Gore's locked-down entry – Wikipedia will
not respond to a web request if a User-Agent isn't specified. The
"User-Agent" typically refers to your browser, and you can see this by
inspecting the headers you send for any page request in your browser.
By providing a "User-Agent" key-value pair, (I basically use "Ruby"
and it seems to work), we can pass it as a hash (I use the constant
HEADERS_HASH in the example) as the second argument of the method
call.
It is specified later at http://ruby.bastardsbook.com/chapters/web-crawling/
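On current Ruby versions the same idea might look like the sketch below. It assumes Ruby 2.5 or later, where open-uri provides URI.open (Kernel#open no longer opens URLs as of Ruby 3.0). HEADERS_HASH follows the constant name mentioned in the book excerpt, while fetch_page and the 'MyCuteBot/0.1' value are made up for this example.

```ruby
require 'open-uri'

# Identify the client, as the MediaWiki documentation quoted above asks.
# The value is a placeholder; use your own client name and version.
HEADERS_HASH = { 'User-Agent' => 'MyCuteBot/0.1' }

# Hypothetical helper: fetches a page, sending the custom headers.
def fetch_page(url, headers = HEADERS_HASH)
  URI.open(url, headers).read
end

# Example call (performs a real HTTP request, so it is left commented out):
#   rpage = fetch_page('https://en.wikipedia.org/wiki/Steve_Jobs')
```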

How to pass cookies from one page to another using curl in Ruby?

I am writing a video crawler in Ruby. It has to log in to a page with cookies enabled and then download pages. For that I am using the CURL library in Ruby. I can log in successfully, but I can't download the pages behind the login with curl. How can I fix this, or download the pages some other way?
My code is
curl = Curl::Easy.new(1st url)
curl.follow_location = true
curl.enable_cookies = true
curl.cookiefile = "cookie.txt"
curl.cookiejar = "cookie.txt"
curl.http_post(1st url,field)
curl.perform
curl = Curl::Easy.perform(2nd url)
curl.follow_location = true
curl.enable_cookies = true
curl.cookiefile = "cookie.txt"
curl.cookiejar = "cookie.txt"
curl.http_get
code = curl.body_str
What I've seen in writing my own similar "post-then-get" script is that ruby/Curb (I'm using version 0.7.15 with ruby 1.8) seems to ignore the cookiejar/cookiefile fields of a Curl::Easy object. If I set either of those fields and the http_post completes successfully, no cookiejar or cookiefile file is created. Also, curl.cookies will still be nil after your curl.http_post, however, the cookies ARE set within the curl object. I promise :)
I think where you're going wrong is here:
curl = Curl::Easy.perform(2nd url)
The curb documentation states that this creates a new object. That new object doesn't have any of your existing cookies set. If you change your code to look like the following, I believe it should work. I've also removed the curl.perform for the first url since curl.http_post already implicitly does the "perform". You were basically http_post'ing twice before trying your http_get.
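# first_url, second_url and field stand in for your own values
curl = Curl::Easy.new(first_url)
curl.follow_location = true
curl.enable_cookies = true
curl.http_post(field) # posts to first_url; the instance method takes only the form data
curl.url = second_url
curl.http_get # same object as the post, so its cookies are kept
code = curl.body_str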
If this still doesn't seem to be working for you, you can verify if the cookie is getting set by adding
curl.verbose = true
before
curl.http_post
Your Curl::Easy object will dump all the headers that it gets in the response from the server to $stdout, and somewhere in there you should see a line stating that it added/set a cookie. I don't have any example output right now but I'll try to post a follow-up soon.
HTTPClient automatically enables cookies, as does Mechanize.
From the HTTPClient docs:
clnt = HTTPClient.new
clnt.get_content(url1) # receives Cookies.
clnt.get_content(url2) # sends Cookies if needed.
Posting a form is easy too:
body = { 'keyword' => 'ruby', 'lang' => 'en' }
res = clnt.post(uri, body)
Mechanize makes this sort of thing really simple (It will handle storing the cookies, among other things).
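To make concrete what "receives cookies" and "sends cookies" mean, here is a simplified standard-library sketch of turning Set-Cookie response values into a Cookie request header. It is not how HTTPClient or Mechanize are implemented; cookie_header is a hypothetical helper that ignores attributes such as Path, Domain and expiry.

```ruby
# Hypothetical helper: keeps only the name=value part of each Set-Cookie
# value and joins them into a single Cookie request header.
def cookie_header(set_cookie_values)
  set_cookie_values.map { |value| value.split(';').first.strip }.join('; ')
end

# With Net::HTTP, values like these come from
# response.get_fields('set-cookie') after the login request.
received = ['session=abc123; Path=/; HttpOnly', 'lang=en; Path=/']

puts cookie_header(received) # session=abc123; lang=en
```

The resulting string would be sent as the Cookie header of the second request; a real cookie jar also checks domain, path and expiry before sending anything.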

Resources