Is it possible to get the headers of a request using Ruby's HTTPClient gem before the request completes? - ruby

I have a requirement to proxy a request in a Rails app. I was hoping I could proxy it with chunking (so, 1 chunk received, one chunk is sent). The app is working fine without chunking (load the request into memory, and transmit).
Here is my code to proxy the chunks through to the end-client:
self.response.headers['Last-Modified'] = Time.now.ctime.to_s
self.response_body = Enumerator.new do |y|
client = HTTPClient.new
http_response = client.get(proxy_url, nil, headers) do |chunk|
y << chunk
end
end
The problem is, I can't inspect "http_response" until all the chunks have been received, thus I can't set the headers based on the headers of the client.
What I'm trying to do is transmit the headers returned from the client before the first chunk is sent. Is this possible?
If not, is this pattern possible in any other Ruby HTTP client gem?

Update
I have a solution for you.
If you call get_async instead, it will retun immediately with an HTTPClient::Connection object that is updated with the header information as soon as it is received. This code sample demonstrates.
The patch to HTTPClient::Connection is almost certainly not necessary for you, but it lets you write things like conn.queue.size? and conn.queue.empty?.
conn.pop blocks until the response (or exception) has been pushed to the queue by the async thread and then returns the normal HTTP::Message object. (Note that, if you are using the monkey patch, you can use conn.queue.empty? to see if pop is going to block.)
resp.content returns an IO object which is a pipe read endpoint, and can be called as soon as pop hs returned. The other end is written by the async thread as the data arrives, and you can read the entire content in one go or in whatever size chunks you like using read.
require 'httpclient'
class HTTPClient::Connection
attr_reader :queue
end
client = HTTPClient.new
conn = client.get_async 'http://en.wikipedia.org/wiki/Ruby_(programming_language)'
resp = conn.pop
resp.header.all.each { |name, val| puts "#{name}=#{val}" }
puts
pipe = resp.content
while chunk = pipe.read(8192)
print chunk
end
You could parse the first chunk you receive to extract the headers, but I suggest you call head first to get the header information. Then do the get as well.
(Updated - the first chunk holds the beginning of the content so this won't work.)

Related

Pushing data to websocket browser client in Lua

I want to use a NodeMCU device (Lua based top level) to act as a websocket server to 1 or more browser clients.
Luckily, there is code to do this here: NodeMCU Websocket Server
(courtesy of #creationix and/or #moononournation)
This works as described and I am able to send a message from the client to the NodeMCU server, which then responds based on the received message. Great.
My questions are:
How can I send messages to the client without it having to be sent as a response to a client request (standalone sending of data)? When I try to call socket.send() socket is not found as a variable, which I understand, but cannot work out how to do it! :(
Why does the decode() function output the extra variable? What is this for? I'm assuming it will be for packet overflow, but I can never seem to make it return anything, regardless of my message length.
In the listen method, why has the author added a queuing system? is this essential or for applications that perhaps may receive multiple simultaneous messages? Ideally, I'd like to remove it.
I have simplified the code as below:
(excluding the decode() and encode() functions - please see the link above for the full script)
net.createServer(net.TCP):listen(80, function(conn)
local buffer = false
local socket = {}
local queue = {}
local waiting = false
local function onSend()
if queue[1] then
local data = table.remove(queue, 1)
return conn:send(data, onSend)
end
waiting = false
end
function socket.send(...)
local data = encode(...)
if not waiting then
waiting = true
conn:send(data, onSend)
else
queue[#queue + 1] = data
end
end
conn:on("receive", function(_, chunk)
if buffer then
buffer = buffer .. chunk
while true do
local extra, payload, opcode = decode(buffer)
if opcode==8 then
print("Websocket client disconnected")
end
--print(type(extra), payload, opcode)
if not extra then return end
buffer = extra
socket.onmessage(payload, opcode)
end
end
local _, e, method = string.find(chunk, "([A-Z]+) /[^\r]* HTTP/%d%.%d\r\n")
local key, name, value
for name, value in string.gmatch(chunk, "([^ ]+): *([^\r]+)\r\n") do
if string.lower(name) == "sec-websocket-key" then
key = value
break
end
end
if method == "GET" and key then
acceptkey=crypto.toBase64(crypto.hash("sha1", key.."258EAFA5-E914-47DA-95CA-C5AB0DC85B11"))
conn:send(
"HTTP/1.1 101 Switching Protocols\r\n"..
"Upgrade: websocket\r\nConnection: Upgrade\r\n"..
"Sec-WebSocket-Accept: "..acceptkey.."\r\n\r\n",
function ()
print("New websocket client connected")
function socket.onmessage(payload,opcode)
socket.send("GOT YOUR DATA", 1)
print("PAYLOAD = "..payload)
--print("OPCODE = "..opcode)
end
end)
buffer = ""
else
conn:send(
"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 12\r\n\r\nHello World!",
conn.close)
end
end)
end)
I can only answer 1 question, the others may be better suited for the library author. Besides, SO is a format where you ask 1 question normally.
How can I send messages to the client without it having to be sent as a response to a client request (standalone sending of data)?
You can't. Without the client contacting the server first and establishing a socket connection the server wouldn't know where to send the messages to. Even with SSE (server-sent events) it's the client that first initiates a connection to the server.

Tornado cancel httpclient.AsyncHTTPClient fetch() from on_chunk()

Inside one of the handlers I am doing the following:
async def get(self):
client = httpclient.AsyncHTTPClient()
url = 'some url here'
request = httpclient.HTTPRequest(url=url, streaming_callback=self.on_chunk, request_timeout=120)
result = await client.fetch(request)
self.write("done")
#gen.coroutine
def on_chunk(self, chunk):
self.write(chunk)
yield self.flush()
The requests can sometimes be quite large and the client may leave while the request is still in progress of being fetched and pumped to the client. If this happens an exception will appear in the on_chunk function when self.write() is attempted. My question is how do I abort the remaining download if my client went away ?
If your streaming_callback raises an exception, the client request should be aborted. This will spam the logs with stack traces, but there's not currently a cleaner way to do it. You can override on_connection_close to detect when the client has disconnected and set an attribute on self that you can check in on_chunk.

How to synchronize data between multiple workers

I've the following problem that is begging a zmq solution. I have a time-series data:
A,B,C,D,E,...
I need to perform an operation, Func, on each point.
It makes good sense to parallelize the task using multiple workers via zmq. However, what is tripping me up is how do I synchronize the result, i.e., the results should be time-ordered exactly the way the input data came in. So the end result should look like:
Func(A), Func(B), Func(C), Func(D),...
I should also point out that time to complete,say, Func(A) will be slightly different than Func(B). This may require me to block for a while.
Any suggestions would be greatly appreciated.
You will always need to block for a while in order to synchronize things. You can actually send requests to a pool of workers, and when a response is received - to buffer it if it is not a subsequent one. One simple workflow could be described in a pseudo-language as follows:
socket receiver; # zmq.PULL
socket workers; # zmq.DEALER, the worker thread socket is started as zmq.DEALER too.
poller = poller(receiver, workers);
next_id_req = incr()
out_queue = queue;
out_queue.last_id = next_id_req
buffer = sorted_queue;
sock = poller.poll()
if sock is receiver:
packet_N = receiver.recv()
# send N for processing
worker.send(packet_N, ++next_id_req)
else if sock is workers:
# get a processed response Func(N)
func_N_response, id = workers.recv()
if out_queue.last_id != id-1:
# not subsequent id, buffer it
buffer.push(id, func_N_rseponse)
else:
# in order, push to out queue
out_queue.push(id, func_N_response)
# also consume all buffered subsequent items
while (out_queue.last_id == buffer.min_id() - 1):
id, buffered_N_resp = buffer.pop()
out_queue.push(id, buffered_N_resp)
But here comes the problem what happens if a packet is lost in the processing thread(the workers pool).. You can either skip it after a certain timeout(flush the buffer into the out queue), amd continue filling the out queue, and reorder when the packet comes later, if ever comes.

How to test deferred action - EventMachine

I have a Sinatra app that runs inside of EventMachine. Currently, I am taking a post request of JSON data, deferring storage, and returning a 200 OK status code. The deferred task simply pushes the data to a queue and increments a stats counter. The code is similar to:
class App < Sinatra::Base
...
post '/' do
json = request.body.read
operation = lambda do
push_to_queue(json)
incr_incoming_stats
end
callback = lambda {}
EM.defer(operation, callback)
end
...
end
My question is, how do I test this functionality. If I use Rack::Test::Methods, then I have to put in something like sleep 1 to make sure the deferred task has completed before checking the queue and stats such that my test may look like:
it 'should push data to queue with valid request' do
post('/', #json)
sleep 1
#redis.llen("#{#opts[:redis_prefix]}-queue").should > 0
end
Any help is appreciated!
The solution was pretty simple and once I realized it, I felt kind of silly. I created a test-helper that contained the following:
module EM
def self.defer(op, callback)
callback.call(op.call)
end
end
Then just include this into your test-files. This way the defer method will just run the operation and callback on the same thread.

How to calculate the amount of data downloaded and the total data to be downloaded in Ruby

I'm trying to build a desktop client that manages some downloads with Ruby. I would like to know how to go about trying to identify how much of the data is downloaded and the size of the entire data that is to be downloaded.
Im trying to do this with Ruby so any help would be useful.
Thanks in advance.
Like Wayne said in his comment, it depends on the protocol that is used to transfer the files. With HTTP for example, the HTTP response will include a Content-Length header which will tell you the length of the file that you are downloading. After you know that you will have to keep track of the number of bytes that you've read from the HTTP connection.
Something like this seems to work (for HTTP), but I wouldn't be surprised if it could be done more elegantly:
require 'net/http'
url = URI.parse('http://www.google.com/index.html')
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) do |http|
http.request(req) do |res|
remaining = res.content_length
puts "total length: #{remaining}"
res.read_body do |segment|
puts "read #{segment.length} bytes"
remaining = remaining - segment.length
puts "#{remaining} bytes remaining"
end
end
end
www.google.com/index.html is a bad example since the content gets returned in one segment, but try it on a larger object and you should see multiple "read..." lines.
If you're using Net::HTTP then the length of whatever you're requesting should be in the response header. Net::HTTP mixin NET::HTTPHeader, in it you'll find content_length(). Although it only works if the size is determined before the transfer happens.
Net::HTTPResponse has a method that reads the body in chunks, so you can use that to determine the progress. Start at 0 and add the length of each chunk, compare it to the total size and you're done.
http.request_get('/index.html') {|res|
res.read_body do |segment|
print segment
end
} #Example taken from Ruby-Documentation
If you're using FTP then it should be easier through NET::FTP. Connect to the server, get the size of a given file with size(filename), and then download the file with get, getbinaryfile or gettextfile.
This is the signature of the get method: get(remotefile, localfile = File.basename(remotefile), blocksize = DEFAULT_BLOCKSIZE) {|data| ...}
ftp.get('file.something', 'file.something.local', 1024){ |data|
puts "Downloaded 1024 more bytes"
}

Resources