Resuming file downloads in Ruby, range header issue - ruby

When setting a range header in Ruby 1.8.7, an additional "X-REMOVED: Range" header is being added, which (seemingly) prevents download resumes from working.
size = File.size(local_file)
Net::HTTP.start(domain) do |http|
  headers = {
    'Range' => "bytes=#{size}-"
  }
  resp = http.get(remote_file, headers)
  open(local_file, "wb") do |file|
    file.write(resp.body)
  end
end
Header sent:
GET /test.zip HTTP/1.1..Host: 192.168.50.1..Accept: */*..X-REMOVED: Range..Range: bytes=481-....
I've also tried using set_range with the same result.

Well, this is embarrassing. The failed resumes had nothing to do with the Range header. I was opening the local file with "wb", which truncates it, instead of "ab", which appends to it.
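A minimal sketch of the corrected approach, for illustration only. The helper name resume_download is hypothetical, and the http argument is assumed to be any object with Net::HTTP's get(path, headers) interface. The handling of a 200 response (server ignored the Range header) and of other status codes is an addition that is not in the original post.

```ruby
require 'net/http'

# Hypothetical helper: request only the bytes that are missing locally
# and append them to the partial file.
def resume_download(http, remote_path, local_file)
  offset = File.exist?(local_file) ? File.size(local_file) : 0
  resp = http.get(remote_path, 'Range' => "bytes=#{offset}-")

  case resp.code.to_s
  when '206' then mode = 'ab' # partial content: append to what we have
  when '200' then mode = 'wb' # Range was ignored: full body, so overwrite
  else return resp.code.to_s  # e.g. 416 when there is nothing left to fetch
  end

  File.open(local_file, mode) { |file| file.write(resp.body) }
  resp.code.to_s
end

# Usage, with domain, remote_file and local_file as in the question:
# Net::HTTP.start(domain) { |http| resume_download(http, remote_file, local_file) }
```

Appending is only correct when the server actually answers 206 Partial Content, which is why the sketch checks the status before choosing the file mode.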

Related

Net::HTTP get a PDF file and save with paperclip

I want to download a PDF file from a web server. I use the Net::HTTP Ruby class.
def open_file(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  # request_uri keeps the query string; uri.path alone would drop it
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(self.class.user, self.class.password)
  http.request(request)
end
It works: I retrieve my PDF file as a string like %PDF-1.3\n%\ ...
I have a method that returns the result:
def file(times = 0)
  result = open_file(self.file_url)
  # Net::HTTP reports the status code as a String, so compare with "404"
  if result.code == "404" && times < 5
    sleep(1)
    file(times + 1)
  else
    result.body
  end
end
(It's a recursive method because the file may not exist on the server yet.)
But when I try to save this file with Paperclip, I get an error: Paperclip::AdapterRegistry::NoHandlerError (No handler found for "%PDF-1.3\n% ...
I tried wrapping the file in a StringIO, without success :(.
Does anyone have an idea?
Assuming the PDF object you're getting is okay (I'm not 100% sure it is), then you could do this:
file = StringIO.new(attachment) #mimic a real upload file
file.class.class_eval { attr_accessor :original_filename, :content_type } #add attr's that paperclip needs
file.original_filename = "your_report.pdf"
file.content_type = "application/pdf"
then save the file with Paperclip.
(from "Save a Prawn PDF as a Paperclip attachment?")
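A small sketch of the same idea that avoids reopening the StringIO class for every instance. The helper name upload_like_io is hypothetical, and whether a given Paperclip version accepts such an object is an assumption to verify against your setup.

```ruby
require 'stringio'

# Hypothetical helper: wrap raw bytes so they look like an uploaded file.
# The extra methods are defined on this one object only, not on StringIO itself.
def upload_like_io(data, filename, content_type)
  io = StringIO.new(data)
  io.define_singleton_method(:original_filename) { filename }
  io.define_singleton_method(:content_type) { content_type }
  io
end

pdf = upload_like_io("%PDF-1.3\n", "your_report.pdf", "application/pdf")
puts pdf.original_filename
puts pdf.content_type
```

Defining the methods per object keeps other StringIO instances in the application unchanged, which class_eval on StringIO does not.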

Sinatra streaming response with headers

I want to proxy remote files through a Sinatra application. This requires streaming an HTTP response with headers from a remote source back to the client, but I can't figure out how to set the headers of the response while using the streaming API inside the block provided by Net::HTTP#get_response.
For example, this will not set response headers:
get '/file' do
  stream do |out|
    uri = URI("http://manuals.info.apple.com/en/ipad_user_guide.pdf")
    Net::HTTP.get_response(uri) do |file|
      headers 'Content-Type' => file.header['Content-Type']
      file.read_body { |chunk| out << chunk }
    end
  end
end
And this results in the error: Net::HTTPOK#read_body called twice (IOError):
get '/file' do
  response = nil
  uri = URI("http://manuals.info.apple.com/en/ipad_user_guide.pdf")
  Net::HTTP.get_response(uri) do |file|
    headers 'Content-Type' => file.header['Content-Type']
    response = stream do |out|
      file.read_body { |chunk| out << chunk }
    end
  end
  response
end
I could be wrong, but after thinking about this a bit, it appears that headers set from inside the stream helper block are not applied to the response because execution of that block is deferred. Most likely the route body is evaluated and the response headers are sent before the stream block begins executing.
A possible workaround for this is issuing a HEAD request before streaming back the contents of the file.
For example:
get '/file' do
  uri = URI('http://manuals.info.apple.com/en/ipad_user_guide.pdf')
  # get only header data
  head = Net::HTTP.start(uri.host, uri.port) do |http|
    http.head(uri.request_uri)
  end
  # set headers accordingly (all that apply)
  headers 'Content-Type' => head['Content-Type']
  # stream back the contents
  stream do |out|
    Net::HTTP.get_response(uri) do |f|
      f.read_body { |ch| out << ch }
    end
  end
end
It may not be ideal for your use case because of the additional request, but a HEAD response should be small enough not to add much delay, and it has the benefit that your app can react if that request fails before sending back any data.
Hope it helps.
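To illustrate the ordering problem described above, here is a toy model in plain Ruby. It is not Sinatra's implementation. ToyResponse and both demo methods are hypothetical names, and the only thing modelled is the assumption from the answer: the header section goes out before the deferred block runs.

```ruby
# Toy model only: mimics "headers first, deferred body block second".
class ToyResponse
  attr_reader :headers, :log

  def initialize
    @headers = {}
    @log = []
    @body_block = nil
  end

  # Like a streaming helper: remember the block, do not run it yet.
  def stream(&block)
    @body_block = block
  end

  # Record the headers as sent, then run the deferred block.
  def finish
    @log << [:headers_sent, @headers.dup]
    @body_block.call(self) if @body_block
    @log
  end

  def <<(chunk)
    @log << [:chunk, chunk]
    self
  end
end

# Header set inside the deferred block: too late to be sent.
def late_header_demo
  res = ToyResponse.new
  res.stream do |out|
    res.headers['Content-Type'] = 'application/pdf'
    out << 'data'
  end
  res.finish
end

# Header set before streaming starts: included in what is sent.
def early_header_demo
  res = ToyResponse.new
  res.headers['Content-Type'] = 'application/pdf'
  res.stream { |out| out << 'data' }
  res.finish
end

puts late_header_demo.first.last.key?('Content-Type')  # false
puts early_header_demo.first.last.key?('Content-Type') # true
```

In this model the HEAD request workaround corresponds to early_header_demo: the header values are known before the streaming block is registered.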

Converting python script to ruby (downloading part of a file)

I've been at this for a couple of days and am having no luck at all. Despite reading over these two posts, I can't seem to rewrite in Ruby this little Python script I put together.
clean_link = link['href'].replace(' ', '%20')
mp3file = urllib2.urlopen(clean_link)
output = open('temp.mp3','wb')
output.write(mp3file.read(2000))
output.close()
I've been looking at using open-uri and net/http to do the same in Ruby, but keep hitting a URL redirect issue. So far I have:
clean_link = link.attributes['href'].gsub(' ', '%20')
link_pieces = clean_link.scan(/http:\/\/(?:www\.)?([^\/]+?)(\/.*?\.mp3)/)
host = link_pieces[0][0]
path = link_pieces[0][1]
Net::HTTP.start(host) do |http|
  resp = http.get(path)
  open("temp.mp3", "wb") do |file|
    file.write(resp.body)
  end
end
Is there a simpler way to do this in Ruby? Also, as with the Python script, is there a way to download only part of the file?
EDIT: progress updated
see here & here
http.request_get('/index.html') {|res|
  size = 0
  res.read_body do |chunk|
    size += chunk.size
    # do some processing
    break if size >= 2000
  end
}
Note that you can't control the chunk sizes here, so the total read may overshoot 2000 bytes.
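An alternative sketch that asks the server for only the first bytes by sending a Range header, then trims locally in case the server ignores it. The helper name fetch_prefix is hypothetical, and http is assumed to be any object with Net::HTTP's get(path, headers) interface. Whether a particular server honours Range requests is not something this sketch can guarantee.

```ruby
require 'net/http'

# Hypothetical helper: return at most `limit` bytes from the start of a resource.
def fetch_prefix(http, path, limit)
  resp = http.get(path, 'Range' => "bytes=0-#{limit - 1}")
  body = resp.body.to_s
  # A server that ignores Range replies 200 with the whole body, so trim here.
  body.byteslice(0, limit)
end

# Usage, with host and path as extracted in the question:
# Net::HTTP.start(host) do |http|
#   File.binwrite('temp.mp3', fetch_prefix(http, path, 2000))
# end
```

When the server does honour the header, only the requested bytes travel over the network, which is closer to what mp3file.read(2000) does in the Python script.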

Download image with Ruby RIO gem

My code:
require 'rio'
rio('nice.jpg') < rio('http://farm4.static.flickr.com/3134/3160515898_59354c9733.jpg?v=0')
But the downloaded image is corrupted. What is wrong with this solution?
pjb3 is correct. You must call binmode on the left-hand term:
rio('nice.jpg').binmode < rio('http://...')
If this still does not work (it may happen for large JPEG files, because rio uses an intermediate temp file when retrieving from the URL you provided), apply the binmode modifier to both terms:
rio('nice.jpg').binmode < rio('http://...').binmode
2011 UPDATE
According to Luke C., the above answer no longer applies to more recent versions of the gem:
Neither of these work. On Linux having .binmode set on the destination causes a Errno::ENOENT exception. Doing: rio('nice.jpg') < rio('http://...').binmode works
It works for me. Are you on Windows? It might be because the file isn't being opened with the binary flag.
I had similar problems downloading images on Linux. I found that this worked for me:
rio(source_url).binmode > rio(filename)
Here is some simple Ruby code to download an image:
require 'net/http'
url = URI.parse("http://www.somedomain.com/image.jpg")
Net::HTTP.start(url.host, url.port) do |http|
  # get returns a single response object; the body is resp.body
  resp = http.get(url.path)
  open( File.join(File.dirname(__FILE__), "image.jpg"), "wb" ) { |file| file.write(resp.body) }
end
This can even be extended to follow redirects:
require 'net/http'
url = URI.parse("http://www.somedomain.com/image.jpg")
resp = Net::HTTP.start(url.host, url.port) { |http| http.get(url.path) }
prev_redirect = ''
while resp['location']
  raise "Recursive redirect: #{resp['location']}" if prev_redirect == resp['location']
  prev_redirect = resp['location']
  # URI#merge resolves a relative Location against the current URL,
  # so host and port are always defined for the next request
  url = url.merge(resp['location'])
  resp = Net::HTTP.start(url.host, url.port) { |http| http.get(url.path) }
end
open( File.join(File.dirname(__FILE__), "image.jpg"), "wb" ) { |file| file.write(resp.body) }
It can probably be prettied up some, but it gets the job done, and is not dependent on any 3rd party gems! :)
I guess this is a bug. On Windows, every 0x0A byte is replaced with 0x0D 0x0A when a file is written in text mode. So it makes sense that the version that works on Linux is the one that uses .binmode properly.
For downloading pictures from a web page, you can use the Ruby gem image_downloader.
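To make the corruption concrete, here is a small sketch that simulates the newline translation on a few bytes. The method name simulate_text_mode_write is hypothetical and only imitates what Windows text mode does on write; it does not call rio.

```ruby
# Imitates a Windows text-mode write: every LF (0x0A) becomes CR LF (0x0D 0x0A).
def simulate_text_mode_write(bytes)
  bytes.b.gsub("\n".b, "\r\n".b)
end

# A pretend binary payload that happens to contain 0x0A.
payload = "\xFF\xD8\x0A\x00".b
damaged = simulate_text_mode_write(payload)

puts payload.bytesize # 4
puts damaged.bytesize # 5
puts damaged.bytes.map { |b| format('%02X', b) }.join(' ') # FF D8 0D 0A 00
```

One extra byte is enough to shift everything after it, which is why an image written this way no longer decodes. Opening the destination in binary mode ("wb" or binmode) leaves the bytes untouched.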

How to make an HTTP GET with modified headers?

What is the best way to make an HTTP GET request in Ruby with modified headers?
I want to get a range of bytes from the end of a log file and have been toying with the following code, but the server is throwing back a response saying that "it is a request that the server could not understand" (the server is Apache).
require 'net/http'
require 'uri'
# with @address, @port, @path all defined elsewhere
httpcall = Net::HTTP.new(@address, @port)
headers = {
  'Range' => 'bytes=1000-'
}
resp, data = httpcall.get2(@path, headers)
Is there a better way to define headers in Ruby?
Does anyone know why this would be failing against Apache? If I do a get in a browser to http://[address]:[port]/[path] I get the data I am seeking without issue.
I created a solution that worked very well for me. This example fetches from a range offset:
require 'uri'
require 'net/http'
size = 1000 # the last offset (for the range header)
uri = URI("http://localhost:80/index.html")
http = Net::HTTP.new(uri.host, uri.port)
headers = {
  'Range' => "bytes=#{size}-"
}
path = uri.path.empty? ? "/" : uri.path
# test to ensure that the request will be valid - first get the head
code = http.head(path, headers).code.to_i
if code >= 200 && code < 300
  # the data is available...
  # use the normalized path here too, so an empty uri.path cannot slip through
  http.get(path, headers) do |chunk|
    # provided the data is good, print it...
    print chunk unless chunk =~ />416.+Range/
  end
end
If you have access to the server logs, try comparing the request from the browser with the one from Ruby and see if that tells you anything. If this isn't practical, fire up WEBrick as a mock of the file server. Don't worry about the results; just compare the requests to see what they are doing differently.
As for Ruby style, you could move the headers inline, like so:
httpcall = Net::HTTP.new(@address, @port)
resp, data = httpcall.get2(@path, 'Range' => 'bytes=1000-')
Also, note that in Ruby 1.8+, which you are almost certainly running, Net::HTTP#get2 returns a single HTTPResponse object, not a resp, data pair.
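As a sketch of the header options discussed in this thread, the following builds request objects without sending them, so the resulting Range header can be inspected. The helper names build_range_request and build_prefix_request are hypothetical, and the path '/index.html' is only a placeholder.

```ruby
require 'net/http'

# Headers can be passed as a Hash when the request object is built.
def build_range_request(path, range_value)
  Net::HTTP::Get.new(path, 'Range' => range_value)
end

# set_range(start, length) builds the header value from numbers instead.
def build_prefix_request(path, length)
  req = Net::HTTP::Get.new(path)
  req.set_range(0, length)
  req
end

open_ended = build_range_request('/index.html', 'bytes=1000-') # byte 1000 to the end
suffix     = build_range_request('/index.html', 'bytes=-500')  # only the last 500 bytes
prefix     = build_prefix_request('/index.html', 2000)

puts open_ended['Range'] # bytes=1000-
puts suffix['Range']     # bytes=-500
puts prefix['Range']     # bytes=0-1999
```

For reading the tail of a log file, the suffix form 'bytes=-500' asks for the last 500 bytes without needing to know the file size first. A request object built this way is sent with http.request(req) on a Net::HTTP instance.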

Resources