Pipe data from HTTP GET to HTTP POST/PUT - ruby

I'd like to stream data from an HTTP GET request to an HTTP POST or PUT request. I'd prefer to use Ruby and have already made an attempt using EventMachine and EM-HTTP-Request.
Here's my attempt, to be called using:
HttpToS3Stream.new(src_url, dest_bucket, dest_key, aws_access_key_id, aws_secret_access_key)
http_to_s3_stream.rb
require 'em-http-request'
class HttpToS3Stream
  def initialize(http_url, s3_bucket, s3_key, s3_access_key_id, s3_secret_access_key)
    @http_url = http_url
    @s3_bucket = s3_bucket
    @s3_key = s3_key
    @s3_access_key_id = s3_access_key_id
    @s3_secret_access_key = s3_secret_access_key
    go
  end
  private
  def go
    EM.run {
      # initialize get stream, without listener does not start request
      @get_stream = HttpGetStream.new(@http_url)
      # initialize put stream, send content length, request starts
      @put_stream = S3PutStream.new(@s3_bucket, @s3_key, @s3_access_key_id, @s3_secret_access_key, @get_stream.content_length)
      # set listener on get stream, starts request, pipes data to put stream
      @get_stream.listener = @put_stream
    }
  end
end
http_get_stream.rb
require 'httparty'
require 'em-http-request'
class HttpGetStream
  def initialize(http_url, listener = nil)
    @http_url = http_url
    self.listener = listener
  end
  def listener=(listener)
    @listener = listener
    listen unless @listener.nil?
  end
  def content_length
    response = HTTParty.head(@http_url)
    response['Content-length']
  end
  private
  def listen
    http = EventMachine::HttpRequest.new(@http_url).get
    http.stream do |chunk|
      @listener.send_data chunk
    end
    http.callback do |chunk|
      EventMachine.stop
    end
  end
end
s3_put_stream.rb
require 'em-http-request'
class S3PutStream
  def initialize(s3_bucket, s3_key, s3_access_key_id, s3_secret_access_key, content_length = nil)
    @s3_bucket = s3_bucket
    @s3_key = s3_key
    @s3_access_key_id = s3_access_key_id
    @s3_secret_access_key = s3_secret_access_key
    @content_length = content_length
    @bytes_sent = 0
    listen
  end
  def send_data(data)
    @bytes_sent += data.length
    @http.on_body_data data
  end
  private
  def listen
    raise 'ContentLengthRequired' if @content_length.nil?
    @http = EventMachine::HttpRequest.new(put_url).put(
      :head => {
        'Content-Length' => @content_length,
        'Date' => Time.now.getutc,
        'Authorization' => auth_key
      }
    )
    @http.errback { |error| puts "error: #{error}" }
  end
  def put_url
    "http://#{@s3_bucket}.s3.amazonaws.com/#{@s3_key}"
  end
  def auth_key
    "#{@s3_access_key_id}:#{@s3_secret_access_key}"
  end
end
HttpToS3Stream.new(src_url, dest_bucket, dest_key, aws_access_key_id, aws_secret_access_key)
It seems to be working but always stops at 33468 bytes. Not sure what that's about. Now, by passing chunks directly to @listener.send_data, it is processing the entire GET body. However, the upload is not occurring successfully.
How can I get this to work? And is there a name for what I'm trying to do? I'm having trouble searching for more information.
Any help is appreciated.
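One way to approach this is sketched below with Ruby's standard net/http instead of EventMachine. It is a swapped-in technique, not a repair of the code above. It reads the GET body in chunks on one thread, writes them into an IO.pipe, and hands the read end to the PUT request as its body_stream. This is usually described as streaming or piping a response body. The helper name pipe_get_to_put and its parameters are hypothetical, S3 request signing is not covered (pass whatever headers the destination needs via put_headers), and error handling is left out.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: streams the body of a GET response into a PUT request
# without holding the whole payload in memory.
def pipe_get_to_put(src_url, dest_url, put_headers = {})
  src = URI(src_url)
  dest = URI(dest_url)
  put_resp = nil

  Net::HTTP.start(src.host, src.port, use_ssl: src.scheme == 'https') do |src_http|
    # Ask for an unencoded body so Content-Length matches the bytes we forward.
    src_http.request_get(src.request_uri, 'Accept-Encoding' => 'identity') do |get_resp|
      reader, writer = IO.pipe
      reader.binmode
      writer.binmode

      # Producer: copy each chunk of the GET body into the pipe, then signal EOF.
      producer = Thread.new do
        begin
          get_resp.read_body { |chunk| writer.write(chunk) }
        ensure
          writer.close
        end
      end

      put = Net::HTTP::Put.new(dest.request_uri, put_headers)
      put.body_stream = reader
      put['Content-Type'] = get_resp['Content-Type'] if get_resp['Content-Type']
      if get_resp['Content-Length']
        put['Content-Length'] = get_resp['Content-Length']
      else
        # net/http needs either a length or chunked encoding for body_stream.
        put['Transfer-Encoding'] = 'chunked'
      end

      # Consumer: net/http reads from the pipe while it sends the PUT.
      put_resp = Net::HTTP.start(dest.host, dest.port, use_ssl: dest.scheme == 'https') do |dest_http|
        dest_http.request(put)
      end

      producer.join
      reader.close
    end
  end

  put_resp
end
```

The pipe has a bounded buffer, so the producer thread blocks whenever the upload falls behind, which keeps memory use roughly constant regardless of the file size.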

Related

Errno::EBADF: Bad file descriptor with ruby net/http

What can cause an HTTP connection attempt to return EBADF (Bad file descriptor)?
Here is the code that makes the HTTP connection. The error is rare now, but before I add it to the rescue clause I need to understand what causes EBADF.
def make_http_request(url, headers={})
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    req = Net::HTTP::Get.new(uri, headers)
    resp = http.request(req)
    if resp.code.to_i != 200
      logger.error "Retrieve #{resp.code} with #{url} and #{headers}"
      return false
    end
    return resp.body
  end
rescue SocketError, Net::ReadTimeout, Errno::ECONNREFUSED => e
  logger.error "make_http_request #{url} with #{headers} resulted in #{e.message} \n #{e.backtrace}"
  return false
end
I have a feeling that the connect syscall is receiving an FD that is not valid at that point in time, but I still can't understand how that can happen.
If it helps, the code is used in an application that runs multiple threads.
In a nutshell, the method above is defined and used like this:
module Eval
  def make_http_request(url, headers={})
    ...
    ...
    ..
  end
  def request_local_endpoint(url, headers)
    response = make_http_request(url, headers)
    response && response.fetch('bravo',nil)
  end
  def request_external_endpoint(url, headers)
    response = make_http_request(url, headers)
    response && response.fetch('token',nil)
  end
end
class RequestBuilder
  include Eval
  attr_reader :data
  def initialize(data)
    @data = data
  end
  def start
    token = request_external_endpoint('http://external.com/endpoint1',{'Content-Type'.freeze => 'application/json', 'Authorization' => 'abcdef'})
    return unless token
    result = request_local_endpoint('http://internal.com/endpoint1',{'Content-Type'.freeze => 'application/json'})
    return result
  end
end
10.times {
  Thread.new { RequestBuilder.new('sample data').start }
}

API integration error HTTParty

I'm learning how to work with HTTParty and API and I'm having an issue with my code.
Users/admin/.rbenv/versions/2.0.0-p481/lib/ruby/2.0.0/uri/generic.rb:214:in `initialize': the scheme http does not accept registry part: :80 (or bad hostname?)
I've tried using debug_output STDOUT both as an argument to my method and after including HTTParty to have a clue but with no success. Nothing gets displayed:
require 'httparty'
class LolObserver
  include HTTParty
  default_timeout(1) #timeout after 1 second
  attr_reader :api_key, :playerid
  attr_accessor :region
  def initialize(region,playerid,apikey)
    @region = region_server(region)
    @playerid = playerid
    @api_key = apikey
  end
  def region_server(region)
    case region
    when "euw"
      self.class.base_uri "https://euw.api.pvp.net"
      self.region = "EUW1"
    when "na"
      self.class.base_uri "https://na.api.pvp.net"
      self.region = "NA1"
    end
  end
  def handle_timeouts
    begin
      yield
    #Timeout::Error, is raised if a chunk of the response cannot be read within the read_timeout.
    #Timeout::Error, is raised if a connection cannot be created within the open_timeout.
    rescue Net::OpenTimeout, Net::ReadTimeout
      #todo
    end
  end
  def base_path
    "/observer-mode/rest/consumer/getSpectatorGameInfo"
  end
  def current_game_info
    handle_timeouts do
      url = "#{ base_path }/#{region}/#{playerid}?api_key=#{api_key}"
      puts '------------------------------'
      puts url
      HTTParty.get(url,:debug_output => $stdout)
    end
  end
end
I verified my URL which is fine so I'm lost as to where the problem is coming from.
I tested with a static base_uri and it doesn't change anything.
The odd thing is when I do:
HTTParty.get("https://euw.api.pvp.net/observer-mode/rest/consumer/getSpectatorGameInfo/EUW1/randomid?api_key=myapikey")
Everything is working fine and I'm getting a response.
HTTParty doesn't seem to like the way you set your base_uri.
Unless you need it to be like that just add another attr_reader called domain and it will work.
require 'httparty'
class LolObserver
  include HTTParty
  default_timeout(1) #timeout after 1 second
  attr_reader :api_key, :playerid, :domain
  attr_accessor :region
  def initialize(region,playerid,apikey)
    @region = region_server(region)
    @playerid = playerid
    @api_key = apikey
  end
  def region_server(region)
    case region
    when "euw"
      @domain = "https://euw.api.pvp.net"
      self.region = "EUW1"
    when "na"
      @domain = "https://na.api.pvp.net"
      self.region = "NA1"
    end
  end
  def handle_timeouts
    begin
      yield
    #Timeout::Error, is raised if a chunk of the response cannot be read within the read_timeout.
    #Timeout::Error, is raised if a connection cannot be created within the open_timeout.
    rescue Net::OpenTimeout, Net::ReadTimeout
      #todo
    end
  end
  def base_path
    "/observer-mode/rest/consumer/getSpectatorGameInfo"
  end
  def current_game_info
    handle_timeouts do
      # base_path already starts with "/", so no extra slash after the domain
      url = "#{domain}#{base_path}/#{region}/#{playerid}?api_key=#{api_key}"
      puts '------------------------------'
      puts url
      HTTParty.get(url,:debug_output => $stdout)
    end
  end
end
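The error message fits a URL that reaches the HTTP layer without a host. HTTParty.get called on the module itself does not consult the base_uri set on LolObserver, so the question's code passes only a path. Calling self.class.get(url) would be the other way to have base_uri applied. The small sketch below uses only the standard URI library to show the difference between the relative and the absolute URL. The helper name build_url is hypothetical, and the player id and API key are the placeholders from the question.

```ruby
require 'uri'

# Hypothetical helper: joins a domain and a path into one absolute URL
# and appends the query parameters.
def build_url(domain, path, params = {})
  uri = URI.join(domain, path)
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri.to_s
end

# What the question's code passes to HTTParty.get: a path with no host.
relative = "/observer-mode/rest/consumer/getSpectatorGameInfo/EUW1/randomid?api_key=myapikey"
puts URI.parse(relative).host.inspect # nil

# What the answer's code passes: a full URL including scheme and host.
absolute = build_url(
  "https://euw.api.pvp.net",
  "/observer-mode/rest/consumer/getSpectatorGameInfo/EUW1/randomid",
  "api_key" => "myapikey"
)
puts URI.parse(absolute).host # euw.api.pvp.net
```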

WARN TCPServer Error: Address already in use - bind(2) in linux EC2 and Heroku servers

[2013-01-29 09:17:50] INFO WEBrick 1.3.1
[2013-01-29 09:17:50] INFO ruby 1.8.7 (2012-10-12) [i386-linux]
[2013-01-29 09:17:50] WARN TCPServer Error: Address already in use - bind(2)
[2013-01-29 09:17:50] INFO WEBrick::HTTPServer#start: pid=4107 port=8080
When I run the file below on Linux I get the error shown above. I tried every command and strategy I could find online to list the listening processes (including rogue ones) and kill them. I did this on many ports. No luck.
When I run the script on Mac OS it works. Nevertheless, I have to deploy it on a server and clients have to communicate with it. The error happens on every Amazon EC2 instance and on Heroku. I have seen this error one too many times and have spent many hours trying to fix it. I configured the security group of the EC2 instances and it still did not work. I am beyond desperate. At this point I have to think that the problem must be WEBrick itself or something in my code.
require 'webrick'
require 'uri'
require 'net/http'
$own_address = 8080
class AuctionInfo
  # The representation is a hash mapping item names to [highest_bidder, highest_bid, end_time]
  def initialize
    @data = {}
  end
  def new_item(item, endTime)
    @data[item] = ["UNKNOWN", 0, endTime]
  end
  def bid(item, bid, client)
    if @data.has_key?(item)
      endTime = @data[item][2]
      if @data[item][1].to_i < bid.to_i and Time.new.to_i < endTime.to_i
        @data[item] = [client, bid, endTime]
      end
    end
  end
  def get_status(item)
    if @data.has_key?(item)
      return @data[item][0]
    end
  end
  def winner(item)
    if @data.has_key?(item)
      if @data[item][2].to_i + 1 <= Time.new.to_i
        return @data[item][0]
      else
        return "UNKNOWN"
      end
    end
  end
  def reset
    @data = {}
  end
  def has_item(item)
    return @data.has_key?(item)
  end
  def get_data
    return {}.replace(@data)
  end
end
class StartAuctionServlet < WEBrick::HTTPServlet::AbstractServlet
  def initialize(server, data)
    @data = data
  end
  def do_POST(request, response)
    if request.query['name'] and request.query['end_time']
      @data.new_item(request.query['name'], request.query['end_time'].to_i)
    end
    response.status = 200
  end
  alias_method :do_GET, :do_POST
end
class BidServlet < WEBrick::HTTPServlet::AbstractServlet
  def initialize(server, data)
    @data = data
  end
  def do_POST(request, response)
    if request.query['name'] and request.query['client'] and request.query['bid']
      @data.bid(request.query['name'], request.query['bid'].to_i, request.query['client'])
    end
    response.status = 200
  end
  alias_method :do_GET, :do_POST
end
class StatusServlet < WEBrick::HTTPServlet::AbstractServlet
  def initialize(server, data)
    @data = data
  end
  def do_GET(request, response)
    if request.query['name']
      response.body = @data.get_status(request.query['name'])
    end
    response.status = 200
  end
  alias_method :do_POST, :do_GET
end
class WinnerServlet < WEBrick::HTTPServlet::AbstractServlet
  def initialize(server, data)
    @data = data
  end
  def do_GET(request, response)
    if request.query['name']
      response.body = @data.winner(request.query['name'])
    end
    response.status = 200
  end
  alias_method :do_POST, :do_GET
end
class ResetServlet < WEBrick::HTTPServlet::AbstractServlet
  def initialize(server, data)
    @data = data
  end
  def do_POST(request, response)
    @data.reset
    response.status = 200
  end
  alias_method :do_GET, :do_POST
end
class RandomServlet < WEBrick::HTTPServlet::AbstractServlet
  def initialize(server, data)
    @data = data
  end
  def do_GET(request, response)
    response.status = 200
    response.body = @data.get_data.to_s
  end
  alias_method :do_POST, :do_GET
end
data = AuctionInfo.new
server = WEBrick::HTTPServer.new(:Port => $own_address)
server.mount '/start_auction', StartAuctionServlet, data
server.mount '/bid', BidServlet, data
server.mount '/status', StatusServlet, data
server.mount '/winner', WinnerServlet, data
server.mount '/rst', ResetServlet, data
server.mount '/', RandomServlet, data
trap("INT") { server.shutdown }
server.start
Have you checked whether the linux server is running apache, tomcat, trinidad or any other web server? Odds are one of them is already running on port 8080 on the server.
lsof is a useful command. Try lsof | grep 8080 and see whether anything shows up
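The same check can be done from Ruby itself. The sketch below is a hypothetical helper, not part of the script above. It tries to bind the port and reports whether the bind fails with "Address already in use". The default host of 0.0.0.0 is an assumption; pass another address to test a specific interface.

```ruby
require 'socket'

# Hypothetical helper: returns true when binding the port fails with
# EADDRINUSE, i.e. something is already listening on it.
def port_in_use?(port, host = '0.0.0.0')
  TCPServer.new(host, port).close
  false
rescue Errno::EADDRINUSE
  true
end

puts port_in_use?(8080)
```

Running this on the server just before starting WEBrick shows whether another process really holds port 8080 at that moment.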

em-http-request unexpected result when using tor as proxy

I've created a gist which shows exactly what happens.
https://gist.github.com/4418148
I've tested a version that uses Ruby's 'net/http' library with 'socksify/http' and it worked perfectly, but the EventMachine version returns an unexpected result.
The response in Tor Browser is correct, but the one from EventMachine is not!
It returns a response, but not the same one you get when you send the request via a browser or via net/http, with or without a proxy.
For convenience, I will also paste it here.
require 'em-http-request'
DEL = '-'*40
@results = 0
def run_with_proxy
  connection_opts = {:proxy => {:host => '127.0.0.1', :port => 9050, :type => :socks5}}
  conn = EM::HttpRequest.new("http://www.apolista.de/tegernsee/kloster-apotheke", connection_opts)
  http = conn.get
  http.callback {
    if http.response.include? "Oops"
      puts "#{DEL}failed with proxy#{DEL}", http.response
    else
      puts "#{DEL}success with proxy#{DEL}", http.response
    end
    @results += 1
    EM.stop_event_loop if @results == 2
  }
end
def run_without_proxy
  conn = EM::HttpRequest.new("http://www.apolista.de/tegernsee/kloster-apotheke")
  http = conn.get
  http.callback {
    if http.response.include? "Oops"
      puts "#{DEL}failed without proxy#{DEL}", http.response
    else
      puts "#{DEL}success without proxy#{DEL}", http.response
    end
    @results += 1
    EM.stop_event_loop if @results == 2
  }
end
EM.run do
  run_with_proxy
  run_without_proxy
end
Appreciate any clarification.

Same request sent twice has two different responses

Please consider this test:
def test_ok_on_second_request
  bad_response = @request.get "/bad-response"
  assert_equal 404, bad_response.status
  good_response = @request.get "/test-title"
  assert_equal 200, good_response.status
  assert_equal "text/html", good_response.content_type
end
I have verified that /test-title is a valid path. The assertion that expects 200 is in fact getting 404. How is Rack behaving so that it returns two different results for the same request?
This is the code for the Server class inside the project:
module Blogrite
  class Server
    attr_accessor :status, :mimetype, :body, :provider
    def initialize *args, &block
      @status, @mimetype = 200, "text/html"
      provider = args[0][:with].nil? ? :filesystem : args[0][:with]
      @provider = Blogrite.const_get(provider.capitalize).new
      # p "Server is running with #{@provider.class}."
    end
    def call env
      begin
        article = go env['PATH_INFO'].delete("/")
      rescue Blogrite::Article::NoBodyError
        @status = 404
      end
      @status = 404 if !article
      @status = 403 if env["REQUEST_METHOD"] == 'POST'
      @mimetype = "text/css" if env["PATH_INFO"].include?("css")
      @body = if article then article.render
              elsif env.respond_to?(:to_yaml) then "<pre>#{env.to_yaml}</pre>"
              else "oops"
              end
      [@status, { "Content-Type" => @mimetype }, [@body]]
    end
    def go path
      f = @provider.fetch path
      Article.parse f unless f.nil?
    end
  end
end
The whole workflow is too big for me to paste it in but you can check the project out on Github. I appreciate your help, thank you.
The solution for the problem is as simple as initializing @status inside the call function.
  class Server
    attr_accessor :status, :mimetype, :body, :provider
    def initialize *args, &block
-     @status, @mimetype = 200, "text/html"
      provider = args[0][:with].nil? ? :filesystem : args[0][:with]
      @provider = Blogrite.const_get(provider.capitalize).new
      # p "Server is running with #{@provider.class}."
    end
    def call env
      begin
-       article = go env['PATH_INFO'].delete("/")
+       @status, @mimetype = 200, "text/html"
+       article = go env['PATH_INFO'].delete("/")
      rescue Blogrite::Article::NoBodyError
        @status = 404
      end
That way the server instance, which is created only once, stays out of the request's way. Every call should set its own defaults rather than relying on the ones set in the server's initializer.
Thanks to @rubenfonseca for helping me out.
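A minimal sketch of the same idea, taken one step further: keep the per-request values in local variables so that nothing can leak from one call to the next. TinyServer and its PAGES hash are hypothetical stand-ins for Blogrite::Server and its provider lookup, and the app is called directly with a bare env hash instead of going through Rack.

```ruby
# Hypothetical stand-in for Blogrite::Server; PAGES replaces the provider lookup.
class TinyServer
  PAGES = { 'test-title' => '<h1>Test title</h1>' }.freeze

  def call(env)
    # Locals are recreated on every call, so one request's 404
    # cannot carry over into the next request.
    status = 200
    mimetype = 'text/html'
    body = PAGES[env['PATH_INFO'].delete('/')]
    unless body
      status = 404
      body = 'oops'
    end
    [status, { 'Content-Type' => mimetype }, [body]]
  end
end

app = TinyServer.new
puts app.call('PATH_INFO' => '/bad-response').first # 404
puts app.call('PATH_INFO' => '/test-title').first   # 200
```

The same instance answers both requests, as in the failing test, yet the second response is 200 because no state survives between calls.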

Resources