In a Ruby script I'm having a problem with socket connections.
What I am doing is the following:
I have two threads and each one creates a connection to a different web server
Any time thread 1 receives data from server 1, I want thread 1 to post this data to server 2
Any time thread 2 receives data from server 2, I want thread 2 to post this data to server 1
Basically I am kind of acting as a bridge between the 2 servers.
Code looks like this:
require 'uri'
require 'net/http'
require 'json'
@connection1 = Net::HTTP.start 'server1.com'
@connection2 = Net::HTTP.start 'server2.com'
# reads data from server 1 as it comes and sends it to server 2
Thread.new {
  while JSON.parse(@connection1.post('/receive').body) != nil
    @connection2.post '/send', JSON.parse(@connection1.post('/receive').body)
  end
}
# reads data from server 2 as it comes and sends it to server 1
while JSON.parse(@connection2.post('/receive').body) != nil
  @connection1.post '/send', JSON.parse(@connection2.post('/receive').body)
end
# Thread.join
# not actually needed because the two connections are supposed to continuously stream data
However as soon as one of the two connections receives data and tries sending it to the other connection I'm receiving the following error:
Socket operation on non-socket - Errno::ENOTSOCK
More detailed stack trace:
C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:176:in
wait_readable': socket operation on non-socket. (Errno::ENOTSOCK)
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:176:in 'rbuf_fill'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:154:in 'readuntil'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:164:in 'readline'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http/response.rb:40:in
'read_status_line'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http/response.rb:29:in 'read_new'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1446:in block in 'transport_request'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1443:in 'catch'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1443:in 'transport_request'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1416:in 'request'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1430:in 'send_entity'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1218:in 'post'
So what do you think I am doing wrong?
I should add that for reasons beyond my control the two remote servers are configured to serve data when contacted with a POST rather than with a GET.
Core problem
You lack any sort of synchronization between the two threads, and Net::HTTP is not thread-safe.
What's possibly happening here is that you call @connection1.post('/receive') in one thread, that thread gets paused, and the second thread tries to use @connection1.post('/send') while connection1 is still in use.
Another problem is that your code is inefficient: you issue two /receive requests per loop iteration to get one piece of information.
while JSON.parse(@connection1.post('/receive').body) != nil
  @connection2.post '/send', JSON.parse(@connection1.post('/receive').body)
end
This makes three requests per iteration.
Could be:
while true
  result = JSON.parse(@connection1.post('/receive').body)
  break if result.nil?
  @connection2.post '/send', result
end
This makes two requests per iteration.
Suggested Solution
Use a Mutex to make sure that while connection1 is sending/receiving a request, no other thread touches it.
require 'uri'
require 'net/http'
require 'json'
@connection1 = Net::HTTP.start 'server1.com'
@connection2 = Net::HTTP.start 'server2.com'
connection_1_lock = Mutex.new
connection_2_lock = Mutex.new
# reads data from server 1 as it comes and sends it to server 2
t1 = Thread.new do
  while true
    receive_result = nil
    connection_1_lock.synchronize do
      receive_result = JSON.parse(@connection1.post('/receive').body)
    end
    connection_2_lock.synchronize do
      @connection2.post '/send', receive_result
    end
  end
end
t2 = Thread.new do
  while true
    receive_result = nil
    connection_2_lock.synchronize do
      receive_result = JSON.parse(@connection2.post('/receive').body)
    end
    connection_1_lock.synchronize do
      @connection1.post '/send', receive_result
    end
  end
end
# unlike your original, both loops now run in spawned threads,
# so keep the main thread alive while they work
[t1, t2].each(&:join)
I believe the code above should fix your problem, although I cannot guarantee it. Concurrent programming is hard.
Further reading:
I suggest you read up on concurrent/multithreaded programming and its pitfalls. There are numerous Ruby resources online.
Since Ruby's documentation on Mutex is notoriously bad, I'll shamelessly plug my own article here and suggest you read it:
https://dev.to/enether/working-with-multithreaded-ruby-part-i-cj3 (The 'How To Protect Yourself' paragraph introduces mutexes)
Related
I am trying to ping a large number of urls and retrieve information regarding the certificate of each url. As I read in this Thoughtbot article (Thoughtbot Threads) and others, I've read that the best way to do this is by using Threads. When I implement threads, however, I keep running into Timeout errors and other problems for urls that I can retrieve successfully on their own. I've been told in another related question that I asked earlier that I should not use Timeout with Threads. However, the examples I see wrap API/Net::HTTP/TCPSocket calls in the Timeout block, and based on what I've read, that entire API/Net::HTTP/TCPSocket call will be nested within the Thread. Here is my code:
require 'socket'
require 'openssl'
require 'timeout'

class SslClient
  attr_reader :url, :port, :timeout

  def initialize(url, port = '443', timeout = 30)
    @url = url
    @port = port
    @timeout = timeout
  end

  def ping_for_certificate_info
    context = OpenSSL::SSL::SSLContext.new
    certificates = nil
    verify_result = nil
    Timeout.timeout(timeout) do
      tcp_client = TCPSocket.new(url, port)
      ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client, context
      ssl_client.hostname = url
      ssl_client.sync_close = true
      ssl_client.connect
      certificates = ssl_client.peer_cert_chain
      verify_result = ssl_client.verify_result
      tcp_client.close
    end
    { certificate: certificates.first, verify_result: verify_result }
  rescue => error
    puts url
    puts error.inspect
  end
end
[VERY LARGE LIST OF URLS].map do |url|
  Thread.new do
    ssl_client = SslClient.new(url)
    cert_info = ssl_client.ping_for_certificate_info
    puts cert_info
  end
end.map(&:value)
If you run this code in your terminal, you will see many Timeout errors and Errno::ETIMEDOUT errors for sites like fandango.com, fandom.com, mcaffee.com, google.de etc. that should return information. When I run these individually, however, I get the information I need. When I run them in the thread they tend to fail, especially for domains that have a foreign domain name. What I'm asking is whether I am using Threads correctly. This snippet of code that I've pasted is part of a larger piece of code that interacts with ActiveRecord objects in Rails depending on the results given. Am I using Timeout and Threads correctly? What do I need to do to make this work? Why would a ping work individually but not when wrapped in a thread? Help would be greatly appreciated.
There are several issues:
You should not spawn thousands of threads; use a connection pool (e.g. https://github.com/mperham/connection_pool) so you have a maximum of 20-30 concurrent requests going (this maximum should be determined by testing the point at which network performance drops and you get these timeouts); see the sketch after this list.
It's difficult to guarantee that your code is not broken when you use threads; that's why I suggest you use something where others have figured it out for you, like https://github.com/httprb/http (with examples for thread safety and concurrent requests like https://github.com/httprb/http/wiki/Thread-Safety). There are other libs out there (Typhoeus, patron), but this one is pure Ruby, so basic thread safety is easier to achieve.
You should not use Timeout (see https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying and https://medium.com/@adamhooper/in-ruby-dont-use-timeout-77d9d4e5a001). Use IO.select or something else.
Also, I suggest you learn about threading issues like deadlock, starvation and all the other gotchas. In your case you are starving the network: all the threads are fighting for bandwidth.
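For illustration, here is a minimal worker-pool sketch that caps concurrency without spawning one thread per URL. It reuses the SslClient class from the question; the pool size of 20 and the placeholder url list are assumptions to be tuned and replaced:

require 'thread'

POOL_SIZE = 20 # assumed cap; raise or lower it based on where timeouts start

urls = ['https://example.com', 'https://example.org'] # placeholder list

queue = Queue.new
urls.each { |u| queue << u }

workers = POOL_SIZE.times.map do
  Thread.new do
    loop do
      url = begin
        queue.pop(true) # non-blocking pop; raises ThreadError once drained
      rescue ThreadError
        break
      end
      puts SslClient.new(url).ping_for_certificate_info
    end
  end
end

workers.each(&:join)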
The whole threads/fibers/processes thing is confusing me a little. I have a practical problem that can be solved with some concurrency, so I thought this was a good opportunity to ask professionals and people more knowledgable than me about it.
I have a long array, let's say 3,000 items. I want to send an HTTP request for each item in the array.
Actually iterating over the array, generating requests, and sending them is very rapid. What takes time is waiting for each item to be received, processed, and acknowledged by the party I'm sending to. I'm essentially sending 100 bytes, waiting 2 seconds, sending 100 bytes, waiting 2 seconds.
What I would like to do instead is send these requests asynchronously. I want to send a request, specify what to do when I get the response, and in the meantime, send the next request.
From what I can see, there are four concurrency options I could use here.
Threads.
Fibers.
Processes; unsuitable as far as I know because multiple processes accessing the same array isn't feasible/safe.
Asynchronous functionality like JavaScript's XMLHttpRequest.
The simplest would seem to be the last one. But what is the best, simplest way to do that using Ruby?
Failing #4, which of the remaining three is the most sensible choice here?
Would any of these options also allow me to say "Have no more than 10 pending requests at any time"?
This is your classic producer/consumer problem and is nicely suited for threads in Ruby. Just create a Queue:
require "thread"

urls = [...] # array with bunches of urls

queue = SizedQueue.new(10) # this will only allow 10 items on the queue at once

producer = Thread.new do
  urls.each do |url|
    response = do_http_request(url)
    queue << response
  end
  queue << :done # sentinel so the consumer knows we are finished
end

consumer = Thread.new do
  loop do
    http_response = queue.pop # blocks until an item is available
    break if http_response == :done
    process(http_response)
  end
end
# wait for the consumer to finish
consumer.join
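Note that SizedQueue#<< blocks the producer whenever the queue already holds 10 items, which is exactly what enforces the "no more than 10 pending requests" limit asked about above.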
EventMachine as an event loop and em-synchrony as a Fiber wrapper that turns its callbacks into synchronous code.
Copy-paste from the em-synchrony README:
require "em-synchrony"
require "em-synchrony/em-http"
require "em-synchrony/fiber_iterator"
EM.synchrony do
  concurrency = 2
  urls = ['http://url.1.com', 'http://url2.com']
  results = []

  EM::Synchrony::FiberIterator.new(urls, concurrency).each do |url|
    resp = EventMachine::HttpRequest.new(url).get
    results.push resp.response
  end

  p results # all completed requests
  EventMachine.stop
end
This is an IO-bound case that fits both:
Threading model: no problem with MRI Ruby in this case because threads work well for IO-bound work; the GIL's effect is almost zero.
Asynchronous model, which proves (in practice and theory) to be far superior to threads when it comes to IO-specific problems.
For this specific case, and to make things far simpler, I would have gone with the Typhoeus HTTP client, which has parallel support that works on the evented (asynchronous) concurrency model.
Example:
require 'typhoeus'

hydra = Typhoeus::Hydra.new
%w(url1 url2 url3).each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete do |response|
    # do something with response
  end
  hydra.queue(request)
end
hydra.run # this is a blocking call that returns once all requests are complete
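If I remember the Typhoeus API correctly, Hydra can also cap how many requests run at once, which answers the "no more than 10 pending requests" part of the question; treat the option name as an assumption to verify against the Typhoeus docs:

hydra = Typhoeus::Hydra.new(max_concurrency: 10) # assumed option; check current docs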
So, I'm trying to simulate some basic HTTP persistent connections using sockets and Ruby - for a college class.
The point is to build a server - able to handle multiple clients - that receives a file path and gives back the file content - just like an HTTP GET.
The current server implementation loops listening for clients, fires a new thread when there's an incoming connection and reads the file paths from this socket. It's very dumb, but it works fine when working with non-persistent connections - one request per connection.
But they should be persistent.
Which means the client shouldn't worry about closing the connection. In the non-persistent version the server echoes the response and closes the connection - goodbye client, farewell.
But being persistent means the server thread should loop and wait for more incoming requests until... well, until there are no more requests. How does the server know that? It doesn't! Some sort of timeout is needed. I tried to do that with Ruby's Timeout, but it didn't work.
Googling for some solutions - besides being thoroughly advised to avoid using Timeout module - I've seen a lot of posts about the IO.select method, that should handle this socket waiting issue way better than using threads and stuff (which really sounds cool, considering how Ruby threads (don't) work). I'm trying to understand here how IO.select works, but still wasn't able to make it work in the current scenario.
So I ask basically two things:
how can I efficiently handle this timeout issue on the server side, either using some thread-based solution, low-level socket options or some IO.select magic?
how can the client side know that the server has closed its side of the connection?
Here's the current code for the server:
require 'socket'
require 'json'
require 'date'

module Sockettp
  class Server
    def initialize(dir, port = Sockettp::DEFAULT_PORT)
      @dir = dir
      @port = port
    end

    def start
      puts "Starting Sockettp server..."
      puts "Serving #{@dir.yellow} on port #{@port.to_s.green}"

      Socket.tcp_server_loop(@port) do |socket, client_addrinfo|
        handle socket, client_addrinfo
      end
    end

    private

    def handle(socket, addrinfo)
      Thread.new(socket) do |client|
        log "New client connected"
        begin
          loop do
            if client.eof?
              puts "#{'-' * 100} end connection"
              break
            end

            input = client.gets.chomp

            body = content_for(input)

            response = {}
            if body
              response.merge!({
                status: 200,
                body: body
              })
            else
              response.merge!({
                status: 404,
                body: Sockettp::STATUSES[404]
              })
            end

            log "#{addrinfo.ip_address} #{input} -- #{response[:status]} #{Sockettp::STATUSES[response[:status]]}".send(response[:status] == 200 ? :green : :red)
            client.puts(response.to_json)
          end
        ensure
          socket.close
        end
      end
    end

    def content_for(path)
      path = File.join(@dir, path)

      return File.read(path) if File.file?(path)
      return Dir["#{path}/*"] if File.directory?(path)
    end

    def log(msg)
      puts "#{Thread.current} -- #{DateTime.now.to_s} -- #{msg}"
    end
  end
end
Update
I was able to simulate the timeout behaviour using the IO.select method, but the implementation doesn't feel good when combined with a couple of threads for accepting new connections and another couple for handling requests. The concurrency makes the situation mad and unstable, and I'm probably not sticking with it unless I can figure out a better way of using this solution.
Update 2
Seems like Timeout is still the best way to handle this. I'm sticking with it till I find a better option.
I still don't know how to deal with zombie client connections.
Solution
I ended up using IO.select (got inspired when looking at the webrick code). You can check the final version here (lib/http/server/client_handler.rb).
You should implement something like heartbeat packets. The client side should send special packets every few seconds/minutes to ensure the server doesn't time out the connection on the client's end. The server just avoids doing anything in response to these packets.
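To illustrate the IO.select route the asker ended up taking, here is a rough per-client read-timeout sketch that would slot into the handle thread of the server above; the 30-second limit is an arbitrary assumption, and a client-side heartbeat would simply reset it on each beat:

TIMEOUT = 30 # assumed idle limit in seconds

loop do
  # wait until the client socket is readable, or give up after TIMEOUT
  ready = IO.select([client], nil, nil, TIMEOUT)
  break unless ready      # nil means the timeout expired: drop the zombie client
  break if client.eof?    # client closed its side of the connection
  input = client.gets.chomp
  # ... build and send the response exactly as in the server above ...
end
client.close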
Is there a way to find out how many bytes of data are available on a TCPSocket in Ruby? I.e. how many bytes can be read without blocking?
The standard library io/wait might be useful here. Requiring it gives stream-based I/O (sockets and pipes) some new methods, among which is ready?. According to the documentation, ready? returns non-nil if there are bytes available without blocking. It just so happens that the non-nil value it returns is the number of bytes that are available in MRI.
Here's an example which creates a dumb little socket server, and then connects to it with a client. The server just sends "foo" and then closes the connection. The client waits a little bit to give the server time to send, and then prints how many bytes are available for reading. The interesting stuff for you is in the client:
require 'socket'
require 'io/wait'
# Server
server_socket = TCPServer.new('localhost', 0)
port = server_socket.addr[1]
Thread.new do
  session = server_socket.accept
  sleep 0.5
  session.puts "foo"
  session.close
end
# Client
client_socket = TCPSocket.new('localhost', port)
puts client_socket.ready? # => nil
sleep 1
puts client_socket.ready? # => 4
Don't use that server code in anything real. It's deliberately short in order to keep the example simple.
Note: According to the Pickaxe book, io/wait is only available if "FIONREAD feature in ioctl(2)", which it is in Linux. I don't know about Windows & others.
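As a side note, io/wait also adds IO#nread (again relying on FIONREAD), which reports the available byte count directly instead of via ready?'s return value; treat its availability as version-dependent:

puts client_socket.nread # => 4, after the sleep in the client above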
I have this Ruby code that connects to a TCP server (namely, netcat). It loops 20 times, and sends "ABCD ". If I kill netcat, it takes TWO iterations of the loop for an exception to be triggered. On the first loop after netcat is killed, no exception is triggered, and write reports that 5 bytes have been correctly written... which in the end is not true, since of course the server never received them.
Is there a way to work around this issue? Right now I'm losing data: since I think it's been correctly transferred, I'm not replaying it.
#!/usr/bin/env ruby
require 'rubygems'
require 'socket'
sock = TCPSocket.new('192.168.0.10', 5443)
sock.sync = true
20.times do
  sleep 2
  begin
    count = sock.write("ABCD ")
    puts "Wrote #{count} bytes"
  rescue Exception => myException
    puts "Exception rescued : #{myException}"
  end
end
When you're sending data, your blocking call will return once the data is written to the TCP output buffer. It would only block if the buffer was full, waiting for the server to acknowledge receipt of previously sent data.
Once this data is in the buffer, the network drivers try to send the data. If the connection is lost, on the second attempt to write, your application discovers the broken state of the connection.
Also, how does the connection close? Is the server actively closing the connection? In that case the client socket would be notified at its next socket call. Or has it crashed? Or perhaps there's a network fault and you can no longer communicate.
Discovering a broken connection only occurs when you try to send or receive data over the socket. This is different from having the connection actively closed. You simply can't determine if the connection is still alive without doing something with it.
So try doing sock.recv(0) after the write - if the socket has failed, this will raise "Errno::ECONNRESET: Connection reset by peer - recvfrom(2)". You could also try sock.sendmsg "", 0 (not sock.write or sock.send), which would report "Errno::EPIPE: Broken pipe - sendmsg(2)".
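A small hedged sketch of that probe idea (note that, as the question itself demonstrates, a dead connection may still swallow one write before the error surfaces):

def connection_alive?(sock)
  # a zero-length sendmsg pushes no application data but still exercises
  # the connection, surfacing EPIPE/ECONNRESET if the peer is gone
  sock.sendmsg("", 0)
  true
rescue Errno::EPIPE, Errno::ECONNRESET
  false
end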
Even if you got your hands on the TCP packets and got acknowledgement that the data had been received at the other end, there's still no guarantee that the server will have processed this data - it might be in its input buffer but not yet processed.
All of this might help identify a broken connection earlier, but it still won't guarantee that the data was received and processed by the server. The only sure way to know that the application has processed your message is with an application-level response.
I tried without the sleep function (just to make sure it wasn't putting anything on hold) and still no luck:
#!/usr/bin/env ruby
require 'rubygems'
require 'socket'
require 'activesupport' # Fixnum.seconds
sock = TCPSocket.new('127.0.0.1', 5443)
sock.sync = true
will_restart_at = Time.now + 2.seconds
should_continue = true
while should_continue
  if will_restart_at <= Time.now
    will_restart_at = Time.now + 2.seconds
    begin
      count = sock.write("ABCD ")
      puts "Wrote #{count} bytes"
    rescue Exception => myException
      puts "Exception rescued : #{myException}"
      should_continue = false
    end
  end
end
I analyzed with Wireshark and the two solutions behave identically.
I think (though I can't be sure) that the socket won't raise any error until you actually call your_socket.write: the socket still looks open because you weren't probing for its possible destruction.
I tried to simulate this with nginx and manual TCP sockets. And look at that:
irb> sock = TCPSocket.new('127.0.0.1', 80)
=> #<TCPSocket:0xb743b824>
irb> sock.write("salut")
=> 5
irb> sock.read
=> "<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n</body>\r\n</html>\r\n"
# Here, I kill nginx
irb> sock.write("salut")
=> 5
irb> sock.read
=> ""
irb> sock.write("salut")
Errno::EPIPE: Broken pipe
So what's the conclusion from here? Unless you're actually expecting some data from the server, you're out of luck when it comes to detecting that you've lost the connection :)
To detect a graceful close, you'll have to read from the socket - in Ruby, read returning an empty string (as in the irb session above) or nil indicates the socket has closed.
If you do need to know whether data was sent successfully, though, there's no way other than implementing ACKs of the data at the application level.
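To make that concrete, here's a toy application-level ACK sketch; the "ACK" token and the newline framing are made-up conventions for this example, not anything the server in the question implements:

# client side: send a line, then block (with a timeout) for the server's ACK
def send_with_ack(sock, payload, timeout = 5)
  sock.write(payload + "\n")
  raise "no ACK within #{timeout}s" unless IO.select([sock], nil, nil, timeout)
  line = sock.gets
  raise "connection closed" if line.nil?
  raise "unexpected reply: #{line.inspect}" unless line.chomp == "ACK"
  true
end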