Streaming data in Ruby net/http PUT request - ruby

In the Ruby-doc Net/HTTP there is a detailed example for streaming response bodies - it applies when you try to download a large file.
I am looking for an equivalent code snippet to upload a file via PUT. I have spent quite a bit of time trying to make the code work, with no luck. I think I need to implement a particular interface and pass it to request.body_stream.
I need streaming because I want to alter the content of the file while it is being uploaded, so I need access to the buffered chunks while the upload takes place. I would gladly use a library like http.rb or rest-client as long as I can use streaming.
Thanks in advance!
For reference, the following is the working non-streamed version:
require 'net/http'

uri = URI("http://localhost:1234/upload")
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Put.new uri
  File.open("/home/files/somefile") do |f|
    request.body = f.read
  end
  # Ideally I would use request.body_stream instead of body to get streaming
  http.request request do |response|
    response.read_body do |result|
      # display the operation result
      puts result
    end
  end
end
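One possible shape for the streamed version, as a minimal sketch rather than a tested answer: in current Ruby versions Net::HTTP appears to copy body_stream to the socket with IO.copy_stream, so any object with an IO-like read(length, outbuf) method can sit between the file and the request. TransformingReader and stream_put are hypothetical names introduced here, and upcasing each chunk is only a stand-in for the real alteration. Because a transformation can change the body size, the sketch sets Transfer-Encoding: chunked instead of Content-Length, which assumes the server accepts chunked uploads.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper: reads from an underlying IO and passes every chunk
# through a block before handing it on to Net::HTTP.
class TransformingReader
  def initialize(io, &transform)
    @io = io
    @transform = transform
  end

  # IO-like read. IO.copy_stream passes a length and a buffer and writes
  # the buffer's contents, so the buffer has to be filled when it is given.
  def read(length = nil, outbuf = nil)
    chunk = @io.read(length)
    if chunk.nil?
      outbuf&.clear
      return nil # end of input
    end
    result = @transform.call(chunk)
    outbuf ? outbuf.replace(result) : result
  end
end

# Hypothetical helper: uploads the file at `path` to `uri` via PUT,
# altering each chunk on the way out.
def stream_put(uri, path)
  File.open(path, 'rb') do |file|
    request = Net::HTTP::Put.new(uri)
    # body_stream needs either Content-Length or chunked encoding
    request['Transfer-Encoding'] = 'chunked'
    request.body_stream = TransformingReader.new(file) { |chunk| chunk.upcase }
    Net::HTTP.start(uri.host, uri.port) do |http|
      http.request(request) do |response|
        response.read_body { |result| puts result }
      end
    end
  end
end

# Example call, using the URL and path from the question:
# stream_put(URI("http://localhost:1234/upload"), "/home/files/somefile")
```

The transform block sees whatever chunk size Net::HTTP asks for, so an alteration that depends on line or record boundaries would need its own buffering inside the wrapper.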

Related

Wait for selector to present

When doing web scraping with Nokogiri I occasionally get the following error message
undefined method `at_css' for nil:NilClass (NoMethodError)
I know that the selected element is present at some time, but the site is sometimes a bit slow to respond, and I guess this is the reason why I'm getting the error.
Is there some way to wait until a certain selector is present before proceeding with the script?
My current http request block looks like this
url = URL
body = BODY
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.read_timeout = 200 # default 60 seconds
http.open_timeout = 200 # default nil
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request.body = body
request["Content-Type"] = "application/x-www-form-urlencoded"
begin
  response = http.request(request)
  doc = Nokogiri::HTML(response.body)
rescue
  sleep 100
  retry
end
While you can use Net::HTTP in streaming mode, as @Stefan says in his comment, together with a handler that feeds Nokogiri, you can't parse a partial HTML document with a DOM parser, which is Nokogiri's default, because it expects the full document.
You could use Nokogiri's SAX parser, but that's an entirely different programming style.
If you're retrieving an entire page, then use OpenURI instead of the lower-level Net::HTTP. It automatically handles a number of things that Net::HTTP will not do by default, such as redirection, which makes it a lot easier to retrieve pages and will greatly simplify your code.
I suspect the problem is either that the site is timing out, or the tag you're trying to find is dynamically loaded after the real page loads.
If it's timing out you'll need to increase your wait time.
If it's dynamically loading that markup, you can request the main page, locate the appropriate URL for the dynamic content and load it separately. Once you have it, you can either insert it into the first page if you need everything, or just parse it separately.
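If the cause is a slow site, one way to wait for the selector is to re-fetch a bounded number of times instead of retrying forever. The following is only a sketch under that assumption; retry_until is a hypothetical helper introduced here, and the attempt count and delay are arbitrary values.

```ruby
# Hypothetical helper: calls the block until it returns a truthy value,
# sleeping `delay` seconds between attempts. Returns nil if every attempt fails.
def retry_until(attempts: 5, delay: 1)
  attempts.times do |i|
    result = yield
    return result if result
    sleep delay if i < attempts - 1 # no need to sleep after the last attempt
  end
  nil
end

# Possible usage with the request from the question (not run here):
#
#   node = retry_until(attempts: 5, delay: 10) do
#     response = http.request(request)
#     Nokogiri::HTML(response.body).at_css('#some-selector') # placeholder selector
#   end
#   abort 'selector never appeared' unless node
```

This does not help if the markup is inserted by JavaScript after the page loads; in that case the dynamic content has to be requested separately, as described above.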

How can I store/retrieve files in ownCloud from a webapp written in Opal/Ruby?

I have a webapp written mostly in Ruby and compiled with Opal. I would now like to store/retrieve files in my ownCloud, maybe using WebDAV. I am looking for an example of how to do this using the HTTP module.
I tried
HTTP.get("https://owncloud/foo.abc") do |req|
  req.username= "user"
  ...
end.then do |response|
  puts response
end
But that does not work: there is no method then for module HTTP.
So it seems that if I pass a block to HTTP.get, it no longer returns a promise.
When I do not pass a block, I don't know how to configure the request.
It would be best if I could find a full example of how to use HTTP from Opal.
The small example on the Opal blog did not help.
I think username/password should be passed in the options hash (see the opal-jquery README).
HTTP.get("https://owncloud/foo.abc", username: 'user').then do |response|
  puts response
end
A note about the Promise-style:
The block is used as the default form of callback. To switch to promise-style you should not pass any block, instead try assigning the result of HTTP.get to a variable to modify the request options:
req = HTTP.get("https://owncloud/foo.abc")
puts req.inspect # <= do something with the request
req.then do |response|
  puts response
end

Forward a file download in sinatra using streaming

I have built a Ruby/Sinatra website and I need to let the user download a file.
This file is not hosted locally; it is hosted on a remote API. The end user must not see the true origin of the file.
get "/files/:elementKey/masterfile" do
  content_type "application/octet-stream"
  loadMasterfile(params[:elementKey])
end
With loadMasterfile:
http = Net::HTTP.new(plainURI, 443)
http.use_ssl = true
http.start do |http|
  req = Net::HTTP::Get.new(resource, {"User-Agent" => "API downloader"})
  req.basic_auth(user.keytechUserName, user.keytechPassword)
  response = http.request(req)
  # return this as a file attachment
  attachment(response["X-Filename"]) # use the Sinatra helper to set this as the filename
  response.body # <- this makes Sinatra download the whole file and then forward the content to the browser
end
This code works, but:
The file is first downloaded completely to the Ruby/Sinatra app and only then forwarded to the browser.
The user must wait until the download starts; the browser seems to freeze.
Is there a way to start a download from a remote API and forward the contents in one flow?
I found nothing about that, only solutions for local file downloads, but I must download the file from a remote API.
I also cannot cache the file locally or on Amazon AWS.
Any ideas?
To achieve this in a streaming fashion in which your app is the proxy, you'll need to send the client chunks as you are downloading chunks. This is not the default behavior of ruby / Net::HTTP, but it is possible.
From the ruby docs:
By default Net::HTTP reads an entire response into memory. If you are handling large files or wish to implement a progress bar you can instead stream the body directly to an IO.
Streaming is possible through read_body, though.
Net::HTTP Streaming Response Bodies
Example usage from the docs:
uri = URI('http://example.com/large_file')
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri
  http.request request do |response|
    open 'large_file', 'w' do |io|
      response.read_body do |chunk|
        io.write chunk
      end
    end
  end
end
This example from the docs writes the streaming data to a file, but you could replace it with writes to your response stream. In combination with Sinatra's streaming api, the code might look like this:
get "/files/:elementKey/masterfile" do
  content_type "application/octet-stream"
  stream do |out|
    loadMasterfile(params[:elementKey]) do |chunk|
      out << chunk
    end
  end
end
def loadMasterfile(resource, &block)
  http = Net::HTTP.new(plainURI, 443)
  http.use_ssl = true
  http.start do |http|
    req = Net::HTTP::Get.new(resource, {"User-Agent" => "API downloader"})
    req.basic_auth(user.keytechUserName, user.keytechPassword)
    http.request(req) do |origin_response|
      # pass each chunk to the caller's block as it arrives
      origin_response.read_body(&block)
    end
  end
end
I'm not sure how you'd set the filename. You'd also want to handle errors appropriately in the net calls and stream close. Also note that a front-end like nginx can affect the buffering / chunking of streaming responses.

React on HTTP response before response is done?

Is it possible with any Ruby library to start doing something when a pattern is matched from a HTTP response, before the HTTP session is finished/closed and before the entire result is fetched from the server?
Pseudo code:
http.get 'http://example.org/foo.json' do |response|
  run_this_function if /\"field\":\"data\"/ =~ response.body_str
end
I want something similar to oboe.js, but in Ruby.
Normally Net::HTTP will pull the entire body into memory, but you can change that behavior into streaming. From the documentation:
Streaming Response Bodies
By default Net::HTTP reads an entire response into memory. If you are handling large files or wish to implement a progress bar you can instead stream the body directly to an IO.
uri = URI('http://example.com/large_file')
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri
  http.request request do |response|
    open 'large_file', 'w' do |io|
      response.read_body do |chunk|
        io.write chunk
      end
    end
  end
end
You'll want your code to camp out in the read_body block. See the documentation for read_body as there is additional information you should be aware of, but basically it says:
If a block is given, the body is passed to the block, and the body is provided in fragments, as it is read in from the socket.
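As a rough sketch of what "camping out" in the read_body block could look like for the pattern-matching case: fragments arrive at arbitrary boundaries, so the pattern can be split across two of them, which is why the sketch keeps a buffer. PatternWatcher is a hypothetical class introduced here, not part of Net::HTTP, and it keeps the whole body in memory until the match, which may not suit very large responses.

```ruby
# Hypothetical helper: accumulates body fragments and fires a callback
# once, the first time the pattern matches the data seen so far.
class PatternWatcher
  def initialize(pattern, &on_match)
    @pattern = pattern
    @on_match = on_match
    @buffer = String.new
    @matched = false
  end

  def matched?
    @matched
  end

  # Feed one fragment as it is read from the socket.
  def feed(chunk)
    return if @matched # only react to the first match
    @buffer << chunk
    return unless @pattern.match?(@buffer)
    @matched = true
    @on_match.call
  end
end

# Possible usage with Net::HTTP (not run here):
#
#   require 'net/http'
#   uri = URI('http://example.org/foo.json')
#   watcher = PatternWatcher.new(/"field":"data"/) { run_this_function }
#   Net::HTTP.start(uri.host, uri.port) do |http|
#     http.request(Net::HTTP::Get.new(uri)) do |response|
#       response.read_body { |chunk| watcher.feed(chunk) }
#     end
#   end
```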

How do I read only x number of bytes of the body using Net::HTTP?

It seems like the methods of Ruby's Net::HTTP are all or nothing when it comes to reading the body of a web page. How can I read, say, just the first 100 bytes of the body?
I am trying to read from a content server that returns a short error message in the body of the response if the file requested isn't available. I need to read enough of the body to determine whether the file is there. The files are huge, so I don't want to get the whole body just to check if the file is available.
This is an old thread, but the question of how to read only a portion of a file via HTTP in Ruby is still a mostly unanswered one according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
require 'net/http'
# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end
uri = URI("http://www.example.com/path/to/file")
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    # calling request with a block prevents body from being read
    http.request(request) do |response|
      # do whatever limited reading you want to do with the socket
      x = response.socket.read(100)
      # be sure to call finish before exiting the block
      http.finish
    end
  end
rescue IOError
  # ignore
end
The rescue catches the IOError that's thrown when you call http.finish prematurely.
FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:
class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end
Shouldn't you just use an HTTP HEAD request (Ruby's Net::HTTP::Head request class) to see if the resource is there, and only proceed if you get a 2xx or 3xx response? This presumes your server is configured to return a 4xx error code if the document is not available. I would argue this is the correct solution.
An alternative is to issue a HEAD request and look at the Content-Length header value in the result: if your server is correctly configured, you should easily be able to tell the difference in length between a short message and a long document. Another alternative: set the Range header field in the request (which again assumes that the server is behaving correctly WRT the HTTP spec).
I don't think that solving the problem in the client after you've sent the GET request is the way to go: by that time, the network has done the heavy lifting, and you won't really save any wasted resources.
Reference: http header definitions
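A minimal sketch of the Range alternative, assuming the server honours range requests: range_request and first_bytes are hypothetical helper names introduced here, and the URL in the usage comment is the placeholder from the earlier answer. If the server ignores the Range header it answers 200 with the full body, so this only saves bandwidth when the reply is 206 Partial Content.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request asking only for the
# first `count` bytes of the resource.
def range_request(uri, count)
  request = Net::HTTP::Get.new(uri)
  request['Range'] = "bytes=0-#{count - 1}" # byte ranges are inclusive
  request
end

# Hypothetical helper: returns at most `count` bytes of the body.
# A 206 status means the server sent only the requested range;
# a 200 status means it sent everything and the body is trimmed locally.
def first_bytes(uri, count)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(range_request(uri, count))
    response.body.to_s.byteslice(0, count)
  end
end

# Possible usage (not run here):
#   puts first_bytes(URI("http://www.example.com/path/to/file"), 100)
```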
I wanted to do this once, and the only thing that I could think of was monkey-patching the Net::HTTPResponse#read_body and Net::HTTPResponse#read_body_0 methods to accept a length parameter, and then in the former just passing the length parameter to the read_body_0 method, where you can read only as many as length bytes.
To read the body of an HTTP response in chunks, you'll need to use Net::HTTPResponse#read_body like this:
http.request_get('/large_resource') do |response|
  response.read_body do |segment|
    print segment
  end
end
Are you sure the content server only returns a short error page?
Doesn't it also set the response status to something appropriate like 404? In that case the response will be an instance of a Net::HTTPClientError subclass (most likely Net::HTTPNotFound), and calling Net::HTTPResponse#value on it raises an exception that you can trap.
If you get an error, then your file wasn't there; if you get 200, the file is starting to download and you can close the connection.
You can't. But why do you need to? Surely if the page just says that the file isn't available then it won't be a huge page (i.e. by definition, the file won't be there)?
