How to make persistent HTTP requests using multiple threads in Ruby/Faraday?

I'm using Faraday with the net-http-persistent adapter to make HTTP requests.
I want to speed up my requests by executing them asynchronously, but since I need a persistent connection I keep getting errors such as too many connection resets, which I assume is because multiple threads are creating new connections.
I tried switching the adapter to Typhoeus, but since that connection is not persistent the final result of executing all the requests is not what I expect.
My goal is to add items to a basket by making these HTTP requests. Without the persistent connection the items are not added to the basket.
So, my question is:
Is it possible to make persistent HTTP requests reusing the connection between threads? If so, how can this be achieved?
Here is a piece of my code:
Create the connection:
Faraday.new do |c|
  c.use :cookie_jar, jar: cookie_jar
  c.options.open_timeout = 5
  c.options.timeout = 10
  c.request :url_encoded
  c.response :logger, logger
  c.adapter :net_http_persistent do |http| # yields Net::HTTP::Persistent
    http.idle_timeout = 2
  end
end
Creating the threads and collecting the result of each one:
result = []
threads = []
total_items = items.size

items.each_slice(5) do |sliced_items|
  # Create a thread for a batch of 5 items and store its result
  threads << Thread.new do
    Thread.current[:output] = browser.add_all_items(sliced_items)
  end
end

# Wait for all threads to finish their work and store their output into result
threads.each do |t|
  t.join
  result << t[:output]
end
The add_all_items and add_to_cart methods:
# Add a batch of items by the key passed (id, gtin, url)
def add_all_items(items_info, key)
  results = []
  items_info.each do |item|
    begin
      add_to_cart(item[key], item[:quantity])
      item[:message] = nil
    rescue => e
      item[:message] = e.message
      puts "---------- BACKTRACE -------------- \n #{e.backtrace}"
    end
    puts "\n--------- MESSAGE = #{item[:message]} --------- \n"
    results << item
    puts "-------- RESULTS #{results}"
  end
  results
end
def add_to_cart(url, count = 1)
  response = connection.get(url) do |req|
    req.headers["User-Agent"] = @user_agent
  end
  doc = Nokogiri::HTML(response.body)
  stoken = doc.search('form.js-oxProductForm input[name=stoken]').attr('value').value
  empty_json = '""'
  product_id = get_item_id(url)
  data = {} # payload removed for security reasons

  # Using example.com for question purposes
  response = connection.post('https://www.example.com/index.php?') do |req|
    req.headers["Origin"] = "https://www.example.com"
    req.headers["Content-Type"] = "application/x-www-form-urlencoded; charset=UTF-8"
    req.headers["Accept"] = "application/json, text/javascript, */*; q=0.01"
    req.headers["Referer"] = url
    req.headers["Pragma"] = "no-cache"
    req.headers["Accept-Language"] = "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"
    req.headers["User-Agent"] = @user_agent
    req.headers["Cache-Control"] = "no-cache"
    req.headers["Connection"] = "keep-alive"
    req.headers["DNT"] = "1"
    req.headers["Content-Length"] = data.size.to_s
    req.headers["Accept"] = "*/*"
    req.headers["X-Requested-With"] = "XMLHttpRequest"
    req.headers["Connection"] = "keep-alive"
    req.body = data
  end

  begin
    json = JSON.parse(response.body)
    raise "Could not add item: #{json['message']}" if json['success'] != 1 || json['item'] != product_id
  rescue JSON::ParserError => e
    puts "JSON Error"
  end
end
def get_item_id(url)
  response = connection.get(url) do |req|
    req.headers["User-Agent"] = @user_agent
  end
  doc = Nokogiri::HTML(response.body)
  doc.search('.js-oxProductForm input[name=aid]').attr('value').value
end
Thanks in advance.

Related

Add multithreading/concurrency in script

I created a script which checks the healthcheck and port status of the microservices listed in a .json file.
For every microservice in the .json file the script outputs the HTTP status, the healthcheck body and a few other small details, and I want to add multithreading here in order to return all the output at once. Please see the script below:
#!/usr/bin/env ruby
require 'json'
require 'net/http'
require 'uri'

# ... get the environment argument part ...

file = File.read('./services.json')
data_hash = JSON.parse(file)

threads = []
service = data_hash.keys

service.each do |microservice|
  threads << Thread.new do
    begin
      puts "Microservice: #{microservice}"
      port = data_hash[microservice]['port']
      puts "Port: #{port}"
      nodes = "knife search 'chef_environment:#{env} AND recipe:#{microservice}' -i"
      node = %x[ #{nodes} ].split
      node.each do |n|
        puts "Node: #{n}"
        uri = URI("http://#{n}:#{port}/healthcheck?count=10")
        res = Net::HTTP.get_response(uri)
        status = Net::HTTP.get(uri)
        puts res.code
        puts status
        puts res.message
      end
    rescue Net::ReadTimeout
      puts "ReadTimeout Error"
      next
    end
  end
end

threads.each do |thread|
  thread.join
end
Written this way, the script first prints the "Microservice: ..." and "Port: ..." lines, then the nodes, and only after that the status.
How can I return all the data for each iteration together?
Instead of puts, write the output to a variable (a hash).
If you want to wait for all threads to finish their job before showing the output, use the ThreadsWait class.
require 'thwait'
require 'json'
require 'net/http'
require 'uri'

file = File.read('./services.json')
data_hash = JSON.parse(file)

h = {}
threads = []
service = data_hash.keys

service.each do |microservice|
  threads << Thread.new do
    thread_id = Thread.current.object_id.to_s(36)
    begin
      h[thread_id] = "Microservice: #{microservice}"
      port = data_hash[microservice]['port']
      h[thread_id] << "Port: #{port}"
      nodes = "knife search 'chef_environment:#{env} AND recipe:#{microservice}' -i"
      node = %x[ #{nodes} ].split
      node.each do |n|
        h[thread_id] << "Node: #{n}"
        uri = URI("http://#{n}:#{port}/healthcheck?count=10")
        res = Net::HTTP.get_response(uri)
        status = Net::HTTP.get(uri)
        h[thread_id] << res.code
        h[thread_id] << status
        h[thread_id] << res.message
      end
    rescue Net::ReadTimeout
      h[thread_id] << "ReadTimeout Error"
      next
    end
  end
end

threads.each do |thread|
  thread.join
end

# wait until all threads finish their job
ThreadsWait.all_waits(*threads)

p h
[edit]
ThreadsWait.all_waits(*threads) is redundant in the above code and can be omitted, since threads.each { |thread| thread.join } does exactly the same thing.
Instead of outputting the data as you get it using puts, you can collect it all in a string and then puts it once at the end. Strings support the << operator (implemented as a method in Ruby), so you can initialize a string, append to it, and output it at the end, like this:
report = ''
report << 'first thing'
report << 'second thing'
puts report
You could even save up all of the reports and print them together once every thread has finished, if you want.
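For illustration, a small sketch of that idea (my own, building on the service and data_hash variables from the question) that collects one report string per thread and prints everything only after every thread has finished:

reports = service.map do |microservice|
  Thread.new do
    report = "Microservice: #{microservice}\n"
    report << "Port: #{data_hash[microservice]['port']}\n"
    report # the thread's return value is the finished report string
  end
end.map(&:value) # Thread#value joins each thread and returns that value

puts reports.join("\n")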

Workaround for Timeouts with Http.rb and Celluloid?

I know that timeouts are currently not supported with Http.rb and Celluloid [1], but is there an interim workaround?
Here's the code I'd like to run:
def fetch(url, options = {})
  puts "Request -> #{url}"
  begin
    options = options.merge({ socket_class: Celluloid::IO::TCPSocket,
                              timeout_class: HTTP::Timeout::Global,
                              timeout_options: {
                                connect_timeout: 1,
                                read_timeout: 1,
                                write_timeout: 1
                              } })
    HTTP.get(url, options)
  rescue HTTP::TimeoutError => e
    # [do more stuff]
  end
end
Its goal is to test whether a server is live and healthy. I'd be open to alternatives (e.g. %x(ping <server>)), but these seem less efficient and less able to get at what I'm actually looking for.
[1] https://github.com/httprb/http.rb#celluloidio-support
You can set a timeout on the future call when you fetch the result of the request.
Here is how to use a timeout with Http.rb and Celluloid::IO:
require 'celluloid/io'
require 'http'

TIMEOUT = 10 # in seconds

class HttpFetcher
  include Celluloid::IO

  def fetch(url)
    HTTP.get(url, socket_class: Celluloid::IO::TCPSocket)
  rescue Exception => e
    # error
  end
end

fetcher = HttpFetcher.new

urls = %w(http://www.ruby-lang.org/ http://www.rubygems.org/ http://celluloid.io/)

# Kick off a bunch of future calls to HttpFetcher to grab the URLs in parallel
futures = urls.map { |u| [u, fetcher.future.fetch(u)] }

# Consume the results as they come in
futures.each do |url, future|
  # Wait for HttpFetcher#fetch to complete for this request
  response = future.value(TIMEOUT)
  puts "*** Got #{url}: #{response.inspect}\n\n"
end

Ruby Mechanize Stops Working while in Each Do Loop

I am using a Mechanize Ruby script to loop through about 1,000 records in a tab-delimited file. Everything works as expected until I reach about 300 records.
Once I get to about 300 records, the script hits the rescue on every attempt and eventually stops working. I thought it was because I had not properly set max_history, but that doesn't seem to make a difference.
Here is the error message that I start getting:
getaddrinfo: nodename nor servname provided, or not known
Any ideas on what I might be doing wrong here?
require 'mechanize'

result_counter = 0
used_file = File.open(ARGV[0])
total_rows = used_file.readlines.size

mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history = 0
}

File.open(ARGV[0]).each do |line|
  item = line.split("\t").map { |field| field.strip }
  website = item[16]
  name = item[11]

  if website
    begin
      tries ||= 3
      page = mechanize.get(website)
      primary1 = page.link_with(text: 'text')
      secondary1 = page.link_with(text: 'other_text')
      contains_primary = true
      contains_secondary = true

      unless contains_primary || contains_secondary
        1.times do |count|
          result_counter += 1
          STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - No"
        end
      end

      for i in [primary1]
        if i
          page_to_visit = i.click
          page_found = page_to_visit.uri
          1.times do |count|
            result_counter += 1
            STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name}"
          end
          break
        end
      end
    rescue Timeout::Error
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Timeout"
    rescue => e
      STDERR.puts e.message
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Rescue"
    end
  end
end
You get this error because you don't close the connection after you've used it.
This should fix your problem:
mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history = 0
  agent.keep_alive = false
}

Ruby HTTP Post containing multiple parameters and a body

I need to post using three parameters and a body which consists of 512 bytes. I can get the body right but I can't seem to get the parameters to take:
require 'net/http'
require 'uri'

@ip_address = Array['cueserver.dnsalias.com']
@cueserver = 0
@playback = 'p1'

def send_cuescript(data)
  params = { 'id' => '1', 'type' => "20", 'dst' => 'RES' }
  begin
    url = URI.parse('http://' + @ip_address[@cueserver] + '/set.cgi')
    http = Net::HTTP.new(url.host, url.port)
    response, body = http.post(url.path, params, data)
  rescue Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
         Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError, Net::ProtocolError => e
  end
  response_array = []
  puts 'got this value: ' + response.to_s
  response.body.each_byte { |e| response_array.push(e.to_s(16)) }
end

data_array = Array.new(512, "\x80")
send_cuescript(data_array.join)
I am getting an error from the initialize_http_header. I know there must be a way to set the parameters and the body separately but I can't seem to find any reference to this.
Why do you have to send part of the params in the URL and part of them in the body?
If you have to do this, try
url = URI.parse('http://' + @ip_address[@cueserver] + '/set.cgi?' + params.to_param)
PS: to_param comes from Active Support. You need to write your own if you are not using Active Support.
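A minimal sketch of the same idea without Active Support (my own, using only the standard library; the Content-Type value is an assumption since the question doesn't state one):

require 'net/http'
require 'uri'

params = { 'id' => '1', 'type' => '20', 'dst' => 'RES' }
url = URI.parse('http://' + @ip_address[@cueserver] + '/set.cgi')
url.query = URI.encode_www_form(params) # stdlib stand-in for to_param

http = Net::HTTP.new(url.host, url.port)
request = Net::HTTP::Post.new(url.request_uri) # request_uri keeps the query string
request.body = data                            # the 512-byte payload goes in the body
request['Content-Type'] = 'application/octet-stream'
response = http.request(request)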

better ruby - possible to not repeat parts of a function?

I have a little Sinatra API that I'm trying to beautify. Most of my routes are simple DB operations, but a few involve calling an external service before doing the DB operations. In all cases most of the code is the same except for how I handle the service response. Is there any slick functional programming approach?
Here's an example of one of these routes:
get '/update_x' do
  validateParams(params, :x)
  xid = params[:x]
  xName = getNameFromId(xid)

  if xName
    # Make request to proxy service
    rid = generateRandomHexNumber(16) # generate requestId
    params['m'] = 'set'
    params['rid'] = rid
    json = "{}"
    begin
      response = @resource["/" + "?rid=#{rid}&id=#{xid}&json=#{json}"].get
      status = response.code
      body = response.body
      parsed_json = JSON(body)
      if parsed_json['response'] and parsed_json['response']['success'] and parsed_json['response']['success'] == 'false'
        msg = { :success => "false", :response => "unknown error" }
        if parsed_json['response']['response']
          msg = { :success => "false", :response => parsed_json['response']['response'] }
        end
        content_type :json
        msg.to_json
      else
        #### Here is stuff specific to this api call
        updateDBHelper(xid, buildUpdateOptions(params))
        params['ss_status'] = status
        content_type :json
        params.to_json
        #### End specific to api call
      end
    rescue Exception => e
      params['ss_status'] = status
      params['exception'] = e
      content_type :json
      params.to_json
    end
  else
    msg = { :success => "false", :response => "Not found" }
    content_type :json
    msg.to_json
  end
end
In general, if you have a common pattern where only some arbitrary piece of code changes every time, the simplest thing is to accept a block with that custom part.
def make_api_request(some, params)
  # do what you need to do
  yield(variables, that, your_custom_code, needs)
  # do some more, maybe cleanup
end

get '/some_route' do
  make_api_request do |variables, that, your_custom_code, needs|
    # do custom stuff here
  end
end
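A hedged sketch of how that pattern might look applied to the route above (the helper name make_proxy_request and the exact error handling are my own assumptions, not part of the answer):

def make_proxy_request(params, xid)
  rid = generateRandomHexNumber(16)
  response = @resource["/?rid=#{rid}&id=#{xid}&json={}"].get
  parsed_json = JSON(response.body)

  if parsed_json['response'] && parsed_json['response']['success'] == 'false'
    msg = { success: 'false',
            response: parsed_json['response']['response'] || 'unknown error' }
    content_type :json
    msg.to_json
  else
    yield(response.code, parsed_json) # the route-specific part happens here
  end
rescue Exception => e
  params['exception'] = e
  content_type :json
  params.to_json
end

get '/update_x' do
  validateParams(params, :x)
  xid = params[:x]
  make_proxy_request(params, xid) do |status, _parsed_json|
    updateDBHelper(xid, buildUpdateOptions(params))
    params['ss_status'] = status
    content_type :json
    params.to_json
  end
end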
