I am trying to download the latest.zip from WordPress.org using Net::HTTP. This is what I have got so far:
Net::HTTP.start("wordpress.org/") { |http|
resp = http.get("latest.zip")
open("a.zip", "wb") { |file|
file.write(resp.body)
}
puts "WordPress downloaded"
}
But this only gives me a 4-kilobyte 404 error HTML page (visible if I change the file name to a.txt). I suspect the URL is being redirected somehow, but I have no clue what I am doing. I am a newbie to Ruby.
My first question is why use Net::HTTP, or any code at all, to download something that could be fetched more easily with curl or wget, which are designed to make downloading files easy?
But, since you want to download things using code, I'd recommend looking at Open-URI if you want to follow redirects. It's a standard library for Ruby, and very useful for fast HTTP/FTP access to pages and files:
require 'open-uri'
open('latest.zip', 'wb') do |fo|
# On Ruby 3.0+ a bare open() no longer fetches URLs; URI.open (Ruby 2.5+) does.
fo.print URI.open('http://wordpress.org/latest.zip').read
end
I just ran that, waited a few seconds for it to finish, ran unzip against the downloaded file "latest.zip", and it expanded into the directory containing their content.
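For a large archive it may be preferable not to hold the whole body in memory before writing it. Here is a minimal sketch, assuming a helper named save_stream (a hypothetical name, not part of open-uri) that copies any readable IO to disk in binary mode with IO.copy_stream:

```ruby
require 'open-uri'

# Hypothetical helper: copy any readable IO-like object to `path` in binary mode.
# Returns the number of bytes written.
def save_stream(io, path)
  File.open(path, 'wb') do |file|
    IO.copy_stream(io, file)
  end
end

# Usage (needs network access):
#   URI.open('http://wordpress.org/latest.zip') { |remote| save_stream(remote, 'latest.zip') }
```

The 'wb' mode matters mostly on Windows, where text mode would translate line endings and corrupt the zip.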
Beyond Open-URI, there's HTTPClient and Typhoeus, among others, that make it easy to open an HTTP connection and send queries/receive data. They're very powerful and worth getting to know.
Net::HTTP doesn't provide a nice way of following redirects. Here is a piece of code that I've been using for a while now:
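Going back to the Net::HTTP snippet in the question: Net::HTTP.start expects a bare host name (no trailing slash) and get expects a path with a leading slash. A small sketch, using a hypothetical helper http_parts, of letting URI derive both from the full URL:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: split a URL into the pieces Net::HTTP wants,
# i.e. a bare host, a port, and an absolute request path.
def http_parts(url)
  uri = URI.parse(url)
  [uri.host, uri.port, uri.request_uri]
end

host, port, path = http_parts('http://wordpress.org/latest.zip')
puts host  # wordpress.org
puts port  # 80
puts path  # /latest.zip

# The request itself would then look like this (needs network access, and on
# its own it still does not follow a redirect):
#   Net::HTTP.start(host, port) { |http| http.get(path) }
```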
require 'net/http'
class RedirectFollower
class TooManyRedirects < StandardError; end
attr_accessor :url, :body, :redirect_limit, :response
def initialize(url, limit=5)
@url, @redirect_limit = url, limit
end
def resolve
raise TooManyRedirects if redirect_limit < 0
self.response = Net::HTTP.get_response(URI.parse(url))
if response.kind_of?(Net::HTTPRedirection)
self.url = redirect_url
self.redirect_limit -= 1
resolve
end
self.body = response.body
self
end
def redirect_url
if response['location'].nil?
response.body.match(/<a href=\"([^>]+)\">/i)[1]
else
response['location']
end
end
end
wordpress = RedirectFollower.new('http://wordpress.org/latest.zip').resolve
puts wordpress.url
File.open("latest.zip", "wb") do |file| # binary mode, so the zip is not altered on Windows
file.write wordpress.body
end
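One thing a redirect follower like this may run into: a Location header is allowed to be a relative reference such as /files/latest.zip, which cannot be fetched on its own. A hedged sketch, with a hypothetical helper absolute_redirect, of resolving it against the URL that produced it:

```ruby
require 'uri'

# Hypothetical helper: resolve a Location header value against the URL that
# returned it, so relative redirects become absolute URLs.
def absolute_redirect(current_url, location)
  URI.join(current_url, location).to_s
end

puts absolute_redirect('http://example.com/a/b.zip', '/files/b.zip')
# http://example.com/files/b.zip
puts absolute_redirect('http://example.com/a/b.zip', 'http://cdn.example.com/b.zip')
# http://cdn.example.com/b.zip
```

An already absolute Location passes through unchanged, so the helper could be applied to every redirect.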
I am attempting to prompt a download window and stream an XLSX file using Ruby Sinatra and the AXLSX gem. My Excel file serializes successfully to a local file, so I know it's a valid Excel document, but I need to transfer its content to the end user. I haven't found any docs online with examples of AXLSX and Sinatra used together, only Rails. Help is appreciated!
class Downloads < Sinatra::Base
get '/downloads/report' do
## ...
Axlsx::Package.new do |p|
p.workbook.add_worksheet(name: 'tab name') do |sheet|
## ...
end
content_type 'application/xlsx'
attachment 'cost-code-dashboard.xlsx'
p.to_stream # unsuccessful
# p.to_stream.read # unsuccessful as well
end
end
end
I have also tried the following snippet, unsuccessfully:
Axlsx::Package.new do |p|
## ...
send_file p.to_stream.read, type: "application/xlsx", filename: "cost-code-dashboard.xlsx"
end
It appears that the issue had everything to do with how Axlsx::Package.new was called: the helper functions were not available inside the Axlsx block. The following solution worked. Online documentation said that the content_type below is the better choice:
get '/downloads' do
content_type :'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
p = Axlsx::Package.new
p.workbook.add_worksheet(name: 'Test') do |sheet|
sheet.add_row ['Hello world']
end
p.to_stream
end
I'm still fairly new to server-side scripting and am trying out Ruby to write myself some little helpers and learn a few new things.
I'm currently trying to write a small Ruby app that sends a JSON file listing all images in a specific folder to my page, where I can process them further in JS.
I read quite a few introductions to Ruby and Rails and got a recommendation to look into Rack as a lightweight communicator between server and app.
While the Ruby part works fine, I have difficulty understanding how to send out the generated JSON in response to, for example, a future Ajax call. I hope someone can give me a few hints or sources to look into for further understanding. Thanks!
require 'json'
class listImages
def call(env)
imageDir = Dir.chdir("./img");
files = Dir.glob("img*")
n = 0
tempHash = {}
files.each do |i|
tempHash["img#{n}"] = i
n += 1
end
File.open("temp.json","w") do |f|
f.write(tempHash.to_json)
end
[200,{"Content-Type" => "application/javascript"}, ["temp.json"]]
end
puts "All done!"
end
run listImages.new
if $0 == __FILE__
require 'rack'
Rack::Handler::WEBrick.run MyApp.new
end
You don't have to save the JSON to a file before you can send it. Just send it directly:
[200, {"Content-Type" => "application/json"}, [tempHash.to_json]]
With your current code, you are only sending the String "temp.json".
That said, the rest of your code looks a little messy and does not conform to Ruby coding conventions:
Start your class names with an uppercase letter: class ListImages, not class listImages.
Use underscores, not camel case, for variable names: image_dir, not imageDir.
The puts "All done!" statement is outside the method definition and will be called early, when the class is loaded.
You define a class ListImages but in the last line of your code you refer to MyApp.
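Putting those points together, here is a minimal sketch of how the app might look. The constructor arguments image_dir and pattern are assumptions added for illustration; the original hard-codes ./img and img*.

```ruby
require 'json'

# A Rack-style app: any object whose #call(env) returns [status, headers, body].
class ListImages
  # image_dir and pattern are assumptions made for this sketch.
  def initialize(image_dir = './img', pattern = 'img*')
    @image_dir = image_dir
    @pattern = pattern
  end

  def call(_env)
    files = Dir.glob(File.join(@image_dir, @pattern)).sort
    images = {}
    files.each_with_index do |file, index|
      images["img#{index}"] = File.basename(file)
    end
    # Send the JSON itself as the body instead of writing it to temp.json first.
    [200, { 'Content-Type' => 'application/json' }, [images.to_json]]
  end
end

# In config.ru this would be mounted with: run ListImages.new
```

Because the class only relies on the call(env) convention, it can be exercised directly with ListImages.new.call({}) without starting a server.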
It appears the Net::HTTP library doesn't support loading a local file via file://. I'd like to configure loading of content from a file or remotely, depending on the environment.
Is there a standard Ruby way to access either type the same way, or barring that some succinct code that branches?
Do you know about open-uri?
require 'open-uri'
open("/home/me/file.txt") { |f| ... }
URI.open("http://www.google.com") { |f| ... } # on Ruby 3.0+ a bare open() no longer handles URLs
So to support either "http://" or "file://" in one statement, simply remove the "file://" from the beginning of the uri if it is present (and no need to do any processing for "http://"), like so:
uri = ...
URI.open(uri.sub(%r{^file://}, ''))
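If you would rather branch explicitly, here is a small sketch along the same lines. The helper name open_resource is hypothetical; it strips a leading file:// as described above and only hands http(s) URLs to open-uri:

```ruby
require 'open-uri'

# Hypothetical helper: open either a local path / file:// URI or an http(s) URL
# and yield the resulting IO to the block.
def open_resource(location, &block)
  if location.match?(%r{\Ahttps?://}i)
    URI.open(location, &block)                                 # remote: handled by open-uri
  else
    File.open(location.sub(%r{\Afile://}, ''), 'rb', &block)   # local file
  end
end

# Usage:
#   open_resource('file:///home/me/file.txt') { |f| puts f.read }
#   open_resource('http://www.google.com')    { |f| puts f.read }  # needs network access
```

Using File.open for the local branch is a deliberate choice in this sketch: Kernel#open treats a string starting with "|" as a command to run, which is risky if the location comes from configuration or user input.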
Here's some experimental code that teaches "open-uri" to handle "file:" URIs:
require 'open-uri'
require 'uri'
module URI
class File < Generic
def open(*args, &block)
::File.open(self.path, &block)
end
end
@@schemes['FILE'] = File
end
As Ben Lee pointed out, open-uri is the way to go here. I've also used it in combination with paperclip for storing resources associated with models, which makes everything brilliantly simple.
require 'open-uri'
class SomeModel < ActiveRecord::Base
attr_accessor :remote_url
has_attached_file :resource # etc, etc.
before_validation :get_remote_resource, :if => :remote_url_provided?
validates_presence_of :remote_url, :if => :remote_url_provided?,
:message => 'is invalid or missing'
def get_remote_resource
self.resource = SomeModel.download_remote_resource(self.remote_url)
end
def self.download_remote_resource(uri)
io = URI.parse(uri).open # open-uri adds #open to HTTP(S) and FTP URIs
def io.original_filename; base_uri.path.split('/').last; end
io.original_filename.blank? ? nil : io
rescue # returns nil if the download fails
end
end
# SomeModel.new(:remote_url => 'http://www.google.com/').save
How do I take this URL http://t.co/yjgxz5Y and get the destination URL, which is http://nickstraffictricks.com/4856_how-to-rank-1-in-google/?
require 'net/http'
require 'uri'
Net::HTTP.get_response(URI.parse('http://t.co/yjgxz5Y'))['location']
# => "http://nickstraffictricks.com/4856_how-to-rank-1-in-google/"
I've used open-uri for this, because it's nice and simple. It will retrieve the page, but will also follow multiple redirects:
require 'open-uri'
final_uri = ''
URI.open('http://t.co/yjgxz5Y') do |h| # on Ruby 3.0+ a bare open() no longer handles URLs
final_uri = h.base_uri
end
final_uri # => #<URI::HTTP:0x00000100851050 URL:http://nickstraffictricks.com/4856_how-to-rank-1-in-google/>
The docs show a nice example for using the lower-level Net::HTTP to handle redirects.
require 'net/http'
require 'uri'
def fetch(uri_str, limit = 10)
# You should choose better exception.
raise ArgumentError, 'HTTP redirect too deep' if limit == 0
response = Net::HTTP.get_response(URI.parse(uri_str))
case response
when Net::HTTPSuccess then response
when Net::HTTPRedirection then fetch(response['location'], limit - 1)
else
response.error!
end
end
puts fetch('http://www.ruby-lang.org')
Of course this all breaks down if the page isn't using an HTTP redirect. A lot of sites use meta-redirects, which you have to handle by retrieving the URL from the meta tag, but that's a different question.
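Just to illustrate what retrieving the URL from the meta tag could involve, here is a rough regex-based sketch. The constant and helper names are hypothetical, it assumes http-equiv appears before content in the tag, and a real HTML parser would be sturdier:

```ruby
# Hypothetical pattern for <meta http-equiv="refresh" content="0; url=...">.
# Assumes the http-equiv attribute comes before the content attribute.
META_REFRESH = /<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*([^"'>\s]+)/i

# Returns the refresh target as a String, or nil when no meta refresh is found.
def meta_refresh_target(html)
  match = html.match(META_REFRESH)
  match && match[1]
end

html = '<html><head><meta http-equiv="refresh" content="0; url=http://example.com/next"></head></html>'
puts meta_refresh_target(html)
# http://example.com/next
```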
For resolving redirects you should use a HEAD request to avoid downloading the whole response body (imagine resolving a URL to an audio or video file).
Working example using the Faraday gem:
require 'faraday'
require 'faraday_middleware'
def resolve_redirects(url)
response = fetch_response(url, method: :head)
if response
return response.to_hash[:url].to_s
else
return nil
end
end
def fetch_response(url, method: :get)
conn = Faraday.new do |b|
b.use FaradayMiddleware::FollowRedirects
b.adapter :net_http
end
return conn.send method, url
rescue Faraday::Error # base class, so it also covers connection failures
return nil
end
puts resolve_redirects("http://cre.fm/feed/m4a") # http://feeds.feedburner.com/cre-podcast
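If you would rather avoid the extra gems, the same HEAD-based idea can be sketched with plain Net::HTTP. The names head_request and resolve_with_head are hypothetical, and the requester: parameter exists only so the loop can be tried without a network. Some servers reject HEAD, so treat this as a starting point:

```ruby
require 'net/http'
require 'uri'

# Issue a single HEAD request; only headers come back, never the body.
def head_request(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Hypothetical helper: follow redirects with HEAD requests and return the final URL.
# `requester` is injectable so the loop can be exercised without a network.
def resolve_with_head(url, limit: 5, requester: method(:head_request))
  uri = URI.parse(url)
  limit.times do
    response = requester.call(uri)
    return uri.to_s unless response.is_a?(Net::HTTPRedirection)
    location = response['location']
    raise ArgumentError, 'redirect without a Location header' unless location
    uri = URI.join(uri.to_s, location) # also handles relative Location values
  end
  raise ArgumentError, 'HTTP redirect too deep'
end

# Usage (needs network access):
#   puts resolve_with_head('http://cre.fm/feed/m4a')
```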
You would have to follow the redirect. I think this would help:
http://shadow-file.blogspot.com/2009/03/handling-http-redirection-in-ruby.html
I only need to download the first few kilobytes of a file via HTTP.
I tried
require 'open-uri'
url = 'http://example.com/big-file.dat'
file = open(url)
content = file.read(limit)
But it actually downloads the full file.
This seems to work when using sockets:
require 'socket'
host = "download.thinkbroadband.com"
path = "/1GB.zip" # get 1gb sample file
request = "GET #{path} HTTP/1.0\r\n\r\n"
socket = TCPSocket.open(host,80)
socket.print(request)
# find beginning of response body
buffer = ""
while !buffer.match("\r\n\r\n") do
buffer += socket.read(1)
end
response = socket.read(100) #read first 100 bytes of body
puts response
I'm curious if there is a "ruby way".
This is an old thread, but it's still a question that seems mostly unanswered according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
require 'net/http'
# provide access to the actual socket
class Net::HTTPResponse
attr_reader :socket
end
uri = URI("http://www.example.com/path/to/file")
begin
Net::HTTP.start(uri.host, uri.port) do |http|
request = Net::HTTP::Get.new(uri.request_uri)
# calling request with a block prevents body from being read
http.request(request) do |response|
# do whatever limited reading you want to do with the socket
x = response.socket.read(100)
end
end
rescue IOError
# ignore
end
The rescue catches the IOError that's thrown when you call HTTP.finish prematurely.
FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:
class Net::BufferedIO
def readchar
read(1)[0].ord
end
end
Check out "OpenURI returns two different objects". You might be able to abuse the methods in there to interrupt downloading/throw away the rest of the result after a preset limit.
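Another option that avoids touching the socket at all is to ask the server for just the leading bytes with an HTTP Range header. This is a different technique from the ones above and it only helps when the server honours Range. A minimal sketch with hypothetical helper names:

```ruby
require 'net/http'
require 'uri'

# Build a GET request that asks the server for only the first `bytes` bytes.
def first_bytes_request(uri, bytes)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Range'] = "bytes=0-#{bytes - 1}"
  request
end

# Hypothetical helper: perform the request and return at most `bytes` bytes.
# A server that ignores Range answers 200 with the whole body; this sketch then
# slices the result, but it does not stop the transfer in that case.
def fetch_first_bytes(url, bytes)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(first_bytes_request(uri, bytes)).body.to_s.byteslice(0, bytes)
  end
end

# Usage (needs network access):
#   puts fetch_first_bytes('http://example.com/big-file.dat', 100).bytesize
```

A server that supports ranges replies with 206 Partial Content, so checking the response code is one way to tell whether the limit was actually applied on the wire.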