Best practices for handling binary data in Ruby?

What are the best practices for reading and writing binary data in Ruby?
In the code sample below I needed to send a binary file over HTTP (as POST data):
class SimpleHandler < Mongrel::HttpHandler
  def process(request, response)
    response.start(200) do |head, out|
      head["Content-Type"] = "application/ocsp-responder"
      f = File.new("resp.der", "r")
      begin
        while true
          out.syswrite(f.sysread(1))
        end
      rescue EOFError => err
        puts "Sent response."
      end
    end
  end
end
While this code seems to do a good job, it probably isn't very idiomatic. How can I improve it?

If you just need to copy one IO-like object to another, FileUtils.copy_stream might be of use.
require 'fileutils'

fin  = File.new('svarttag.jpg', 'rb')     # 'rb'/'wb': binary mode, no newline translation
fout = File.new('blacktrain.jpg', 'wb')
FileUtils.copy_stream(fin, fout)
fin.close
fout.close
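Applied to the original handler, the same idea might look like this (a sketch, untested; it assumes out is IO-like enough for copy_stream, which seems plausible since the original code calls syswrite on it):
require 'fileutils'

class SimpleHandler < Mongrel::HttpHandler
  def process(request, response)
    response.start(200) do |head, out|
      head["Content-Type"] = "application/ocsp-responder"
      File.open("resp.der", "rb") do |f|    # binary mode, auto-closed by the block
        FileUtils.copy_stream(f, out)       # copies in sensible chunks, not byte-by-byte
      end
    end
  end
end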
Maybe not exactly what you asked for, but if it's the whole HTTP-POSTing-files issue you want to solve, HTTPClient can do it for you:
require 'httpclient'
HTTPClient.post 'http://nl.netlog.com/test', { :file => File.new('resp.der') }
Also, I've heard that Nick Sieger's multipart-post is good, but I haven't used it.
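For reference, the usual multipart-post pattern looks roughly like this (a sketch based on the gem's documented API; the upload URL is made up):
require 'net/http/post/multipart'

url = URI.parse('http://www.example.com/upload')    # hypothetical endpoint
File.open('resp.der', 'rb') do |der|
  req = Net::HTTP::Post::Multipart.new(url.path,
    'file' => UploadIO.new(der, 'application/octet-stream', 'resp.der'))
  res = Net::HTTP.start(url.host, url.port) do |http|
    http.request(req)
  end
  puts res.code
end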


Open-Uri Alternative - Getting a response from a website

The following code works in Ruby 1.9.3p551:
require "open-uri"
res = open("http://example.com/version").read
p res # => {"buildNumber": 2496, "buildDate": "2015-09-29 11:18:02 +0200", "timestamp": 1443639212}
In any Ruby version higher than 1.9.3 I get the following error:
from /Users/imac/.rbenv/versions/2.1.0/lib/ruby/2.1.0/net/http/response.rb:357:in `finish': incorrect header check (Zlib::DataError)
I need to use a higher version as this will be used in a Rails 4 app.
Any ideas for alternatives?
Turns out the gzip encoding is not accepted by default. Or at least that's what I'm guessing.
The following works.
res = open("http://someurl.com/version", "Accept-Encoding" => "plain").read
Interesting how this changed from Ruby 2.0.0 onwards.
Another neat solution for this.
require 'rest-client'

url = "http://example.com/version"

def get_response(url)
  begin
    return RestClient.get(url, { :accept => :json })
  rescue RestClient::GatewayTimeout
    "GatewayTimeout"
  rescue RestClient::RequestTimeout
    "RequestTimeout"
  rescue SocketError
    "SocketError"
  end
end

p get_response(url)
# => "{\"buildNumber\": 2535, \"buildDate\": \"2015-09-30 17:41:42 +0200\", \"timestamp\": 1444085042 }"

how to post (http-post) content of pdf using ruby?

I am trying to post the (raw) content of a PDF in Ruby using the following block:
require 'pdf/reader'
require 'curb'

reader = PDF::Reader.new('folder/file.pdf')
raw_string = ''
reader.pages.each do |page|
  raw_string = raw_string + page.raw_content.to_s
end

c = Curl::Easy.new('http://0.0.0.0:4567/pdf_upload')
c.http_post(Curl::PostField.content('param1', 'value1'), Curl::PostField.content('param2', 'value2'), c.http_post(Curl::PostField.content('body', raw_string)))
Inside the API implementation, params[:body] seems to be empty all the time (though puts raw_string confirms that the variable has all the values).
Also, is there a better way to post pdf content?
Regarding how you're building raw_string...
Instead of:
reader.pages.each do |page|
  raw_string = raw_string + page.raw_content.to_s
end
You should be able to do something like one of these:
raw_string = reader.pages.map(&:raw_content).join
raw_string = reader.pages.map{ |p| p.raw_content.to_s }.join
I'd also recommend writing your last line spread across several lines, for clarity and readability. Note that the nested c.http_post call inside the argument list is almost certainly why params[:body] is empty: it fires off a separate POST immediately and passes its return value to the outer call instead of a 'body' field. You probably want just the PostField there:
c.http_post(
  Curl::PostField.content('param1', 'value1'),
  Curl::PostField.content('param2', 'value2'),
  Curl::PostField.content('body', raw_string)
)
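As for a better way to post PDF content: if the receiving API can accept a file upload instead of the extracted page streams, curb can send the PDF itself as a multipart file field. A hedged sketch against the same endpoint, assuming the server reads a part named 'file' (if it's Sinatra, that would show up as params[:file]):
require 'curb'

c = Curl::Easy.new('http://0.0.0.0:4567/pdf_upload')
c.multipart_form_post = true                        # send as multipart/form-data
c.http_post(
  Curl::PostField.content('param1', 'value1'),
  Curl::PostField.file('file', 'folder/file.pdf')   # streams the raw PDF bytes from disk
)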

How to get HTTP headers before downloading with Ruby's OpenUri

I am currently using OpenURI to download a file in Ruby. Unfortunately, it seems impossible to get the HTTP headers without downloading the full file:
open(base_url,
  :content_length_proc => lambda { |t|
    if t && 0 < t
      pbar = ProgressBar.create(:total => t)
    end
  },
  :progress_proc => lambda { |s|
    pbar.progress = s if pbar
  }) { |io|
  puts io.size
  puts io.meta['content-disposition']
}
Running the code above shows that it first downloads the full file and only then prints the header I need.
Is there a way to get the headers before the full file is downloaded, so I can cancel the download if the headers are not what I expect them to be?
You can use Net::HTTP for this, for example:
require 'net/http'
http = Net::HTTP.start('stackoverflow.com')
resp = http.head('/')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
Another example, this time getting the headers for the wonderful book Object-Oriented Programming with ANSI-C:
require 'net/http'
http = Net::HTTP.start('www.planetpdf.com')
resp = http.head('/codecuts/pdfs/ooc.pdf')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
It seems what I wanted is not possible to achieve using OpenURI, at least not, as I said, without loading the whole file first.
I was able to do what I wanted using Net::HTTP's request_get. Here's an example:
http.request_get('/largefile.jpg') { |response|
  if response['content-length'].to_i < max_length   # header value is a String, so convert
    response.read_body do |str|                      # read body now
      # save to file
    end
  end
}
Note that this only works when using a block. If you do it like this:
response = http.request_get('/largefile.jpg')
the body will already have been read.
Rather than use Net::HTTP, which can be like digging a pool on the beach using a sand shovel, you can use one of the many HTTP clients for Ruby and clean up the code.
Here's a sample using HTTParty:
require 'httparty'
resp = HTTParty.head('http://example.org')
resp.headers
# => {"accept-ranges"=>["bytes"], "cache-control"=>["max-age=604800"], "content-type"=>["text/html"], "date"=>["Thu, 02 Mar 2017 18:52:42 GMT"], "etag"=>["\"359670651\""], "expires"=>["Thu, 09 Mar 2017 18:52:42 GMT"], "last-modified"=>["Fri, 09 Aug 2013 23:54:35 GMT"], "server"=>["ECS (oxr/83AB)"], "x-cache"=>["HIT"], "content-length"=>["1270"], "connection"=>["close"]}
At that point it's easy to check the size of the document:
resp.headers['content-length'] # => "1270"
Unfortunately, the HTTPd you're talking to might not know how big the content will be; in order to respond quickly, servers don't necessarily calculate the size of dynamically generated output, which would take almost as long and be almost as CPU-intensive as actually sending it. So relying on the "content-length" value can be unreliable.
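A small defensive sketch for that case (the size limit is hypothetical; a missing header is treated as "unknown size"):
MAX_BYTES = 5 * 1024 * 1024
length = resp.headers['content-length']
if length && Array(length).first.to_i <= MAX_BYTES
  body = HTTParty.get('http://example.org').body   # small enough, fetch the full document
else
  # size unknown or too large; skip it or stream instead
end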
The issue with Net::HTTP is that it won't automatically handle redirects, so you have to add additional code. Granted, that code is supplied in the documentation, but the code keeps growing as you need to do more things, until you've ended up writing yet another HTTP client (YAHC). So avoid that and use an existing wheel.

Ruby library to make multiple HTTP requests simultaneously

I'm looking for an alternate Ruby HTTP library that makes multiple HTTP calls simultaneously and performs better than the core Net::HTTP library.
You are probably looking for Typhoeus.
Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic
https://github.com/typhoeus/typhoeus
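If you go with Typhoeus, the parallel part is handled by its Hydra queue. A short sketch (the URLs and concurrency limit are placeholders):
require 'typhoeus'

hydra = Typhoeus::Hydra.new(max_concurrency: 10)
requests = %w[http://example.com/a http://example.com/b].map do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  hydra.queue(request)
  request
end
hydra.run                                  # blocks until every queued request finishes
requests.each { |r| puts "#{r.url}: #{r.response.code}" }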
Why do you need a networking library to handle parallelism? That is exactly what threads are for.
require "open-uri"
fetcher = lambda do |uri|
puts "Started fetching #{uri}"
puts open(uri).read
puts "Stopped fetching #{uri}"
end
thread1 = Thread.new("http://localhost:9292", &fetcher)
thread2 = Thread.new("http://localhost:9293", &fetcher)
thread1.join
thread2.join
Also, I don't understand what you mean by "performs better". Core libraries are usually good enough to be in the core. Do you have any problems with Net::HTTP?
You can use the Parallel gem; it should work with any Ruby HTTP library, as in the sketch below.
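For example (a sketch; the thread count and URLs are arbitrary):
require 'parallel'
require 'net/http'
require 'uri'

urls = %w[http://example.com/ http://example.org/]
bodies = Parallel.map(urls, in_threads: 4) do |url|   # one thread per request, up to 4 at once
  Net::HTTP.get(URI(url))
end
puts bodies.map(&:length)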
Not sure if it performs better than Typhoeus, but you could use EventMachine + em-http-request. Here is an example of sending multiple requests:
require 'eventmachine'
require 'em-http'

EventMachine.run {
  multi = EventMachine::MultiRequest.new

  reqs = [
    'http://google.com/',
    'http://google.ca:81/'
  ]

  reqs.each_with_index do |url, idx|
    http = EventMachine::HttpRequest.new(url, :connect_timeout => 1)
    req = http.get
    multi.add idx, req
  end

  multi.callback do
    p multi.responses[:callback].size
    p multi.responses[:errback].size
    EventMachine.stop
  end
}
https://github.com/igrigorik/em-http-request

Optimizing Ruby RSS

I'm writing a very simple Ruby script to parse tweets out of a twitter RSS feed. Here's the code I have:
require 'rss'

rss = RSS::Parser.parse('statuses.xml', false)

outputfile = open("output.txt", "w")
rss.items.each do |i|
  pubdate = i.published.to_s
  if pubdate.include? '2011-05'
    tweet = i.title.to_s
    tweet = tweet.gsub(/<title>SlyFlourish: /, "")
    tweet = tweet.gsub(/<\/title>/, "\n\n")
    outputfile << tweet
  end
end
I think I'm missing something about dealing with the objects coming out of the RSS parser. Can someone tell me how I can better pull out the title and date entries from the object returned by the parser?
Is there a reason you chose RSS? Parsing XML is expensive.
I'd consider using JSON instead.
There's also a twitter Ruby gem that makes this really easy:
require "twitter"
Twitter.user_timeline("gavin_morrice").each do |tweet|
puts tweet.text
puts tweet.created_at
end
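To reproduce the original script's month filter and file output with the same gem (a sketch; it assumes tweet.created_at is a Time, so its string form contains "2011-05" for May 2011):
File.open("output.txt", "w") do |out|
  Twitter.user_timeline("gavin_morrice").each do |tweet|
    out << tweet.text << "\n\n" if tweet.created_at.to_s.include?("2011-05")
  end
end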
