I have a circumstance where my server may close TCPServer and restart, saving all the users to a file, and immediately reloading them; their connections do not sever.
The problem is I can't seem to reinitialize their streams.
When we restart (and attempt to maintain connections), I reinitialize TCPServer and load my array of connected users. Since each of these has an existing socket, stored as <TCPSocket:0x00000000000000>, can I reinitialize these sockets with TCPServer?
Normally, each user connects and is accepted:
$nCS = TCPServer.new(HOST, PORT)
begin
while socket = $nCS.accept
Thread.new( socket ) do |sock|
begin
d = User.new(sock)
while sock.gets
szIn = $_.chomp
DBG( "Received '" + szIn + "' from Client " + sock.to_s )
d.parseInput( szIn )
end
rescue => e
$stdout.puts "ERROR: Caught error in Client Thread: #{e} \r\n #{e.backtrace.to_s.gsub(",", ",\r\n")}"
sock.write("Sorry, an error has occurred, and you have been disconnected."+EOL+"Please try again later."+EOL)
d.closeConnection
end
end
end
rescue => e
$stdout.puts "ERROR: Caught error in Server Thread: #{e} \r\n #{e.backtrace.to_s.gsub(",", ",\r\n")}"
exit
end
To give it a command to hot reboot, we use exec('./main --copyover') to flag that a copy over is occurring.
If $connected holds an array of all users, and each user has a socket, how do I reinitialize the socket that was open before the restart (assuming the other end is still connected)?
I suspect that using exec("./main", "--copyover", *$nCS, *$connected) is getting me closer, since this simply replaces the process, and should maintain the files (not close them).
You can't. The socket is only valid for the lifetime of the process: it is closed by the OS when the process exits. That in turn invalidates the connection, so the other end is not still connected.
How to Hot-Reboot a TCPServer in Ruby
Hot-Rebooting (aka Copyover) is a process by which an administrator can reload the application (along with any new changes made since last boot) without losing the client connections. This is useful in managing customer expectations as the application does not need to suffer severe downtime and disruption if in use.
What I propose below may not be the best practice, but it's functioning and perhaps will guide others to a similar solution.
The Command
I use a particular style of coding that makes use of command tables to find functions and their accessibility. All command functions are prefixed with cmd. I'll clean up the miscellany to improve readability:
def cmdCopyover
#$nCS is the TCPServer object
#$connected holds an array of all users sockets
#--copyover flags that this is a hot reboot.
connected_args = $connected.map do |sock|
sock.close_on_exec = false if sock.respond_to?(:close_on_exec=)
sock.fileno.to_s
end.join(",")
exec('./main.rb', '--copyover', $nCS.fileno.to_s, connected_args)
end
What we're passing are strings; $nCS.fileno.to_s provides us the file descriptor of the main TCPServer object, while connected_args is a comma-delimited list of file descriptors, one for each connected user. When we restart, ARGV will be an array holding each argument:
ARGV[0] == "--copyover"
ARGV[1] == "5" (Or whatever the file descriptor for TCPServer was)
ARGV[2] == "6,7,8,9" (Example, assuming 4 connected users)
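For illustration, the receiving side could turn those arguments back into integers roughly like this. It is only a sketch: parse_copyover_args is a hypothetical helper name of mine, not something from the code above.

```ruby
# Hypothetical helper: convert the copyover ARGV layout into integers.
# Returns nil when the process was not started with --copyover.
def parse_copyover_args(argv)
  return nil unless argv[0] == "--copyover"

  server_fd  = Integer(argv[1])
  # argv[2] is empty (or missing) when nobody was connected.
  client_fds = argv[2].to_s.split(",").map { |fd| Integer(fd) }
  { server_fd: server_fd, client_fds: client_fds }
end

parsed = parse_copyover_args(["--copyover", "5", "6,7,8,9"])
p parsed
```

Using Integer() rather than to_i makes a malformed descriptor raise instead of silently becoming 0 (which would be STDIN).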
What To Expect When You're Expecting (a Copyover)
Under normal circumstances, we may have a basic server (in main.rb) that looks something like this:
require 'socket'

# EOL is a line-ending constant defined elsewhere in my code.
puts "Starting Server"
$connected = Array.new
$nCS = TCPServer.new("127.0.0.1",9999)
begin
  while socket = $nCS.accept
    # NB: Move this loop to its own function, threadLoop()
    Thread.new( socket ) do |sock|
      begin
        while sock.gets
          szIn = $_.chomp
          #do something with input.
        end
      rescue => e
        puts "ERROR: Caught error in Client Thread: #{e}"
        puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
        sock.write("Sorry, an error has occurred, and you have been disconnected."+EOL+"Please try again later."+EOL)
        sock.close
      end
    end
  end
rescue => e
  puts "Error: Caught Error in Server Thread: #{e}"
  puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
  exit
end
We want to move that main loop to its own function to make it accessible -- our reconnecting users will need to be reinserted in the loop.
So let's get main.rb ready for accepting a hot reboot:
require 'socket'

# EOL is a line-ending constant defined elsewhere in my code.
def threadLoop( socket )
  Thread.new( socket ) do |sock|
    begin
      while sock.gets
        szIn = $_.chomp
        #do something with input.
      end
    rescue => e
      puts "ERROR: Caught error in Client Thread: #{e}"
      puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
      sock.write("Sorry, an error has occurred, and you have been disconnected."+EOL+"Please try again later."+EOL)
      sock.close
    end
  end
end

puts "Starting Server"
$connected = Array.new
if ARGV[0] == '--copyover'
  # Rebuild the listening socket from the inherited file descriptor.
  $nCS = TCPServer.for_fd( ARGV[1].to_i )
  $nCS.close_on_exec = false if $nCS.respond_to?(:close_on_exec=)
  # ARGV[2] is empty when no users were connected, hence to_s.
  ARGV[2].to_s.split(/,/).each do |sockfd|
    # Wrap each inherited descriptor in a socket object again.
    $connected << TCPSocket.for_fd( sockfd.to_i )
  end
  $connected.each {|c| threadLoop( c ) }
else
  $nCS = TCPServer.new("127.0.0.1",9999)
  $nCS.close_on_exec = false if $nCS.respond_to?(:close_on_exec=)
end
begin
  while socket = $nCS.accept
    threadLoop( socket )
  end
rescue => e
  puts "Error: Caught Error in Server Thread: #{e}"
  puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
  exit
end
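If you want to convince yourself that for_fd really gives you back a working connection, here is a small sketch that does the re-wrapping inside a single process, without exec. The port (0, so the OS picks one) and the variable names are my own choices for the demonstration, not part of the server above.

```ruby
require 'socket'

server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick a free port
port   = server.addr[1]
client = TCPSocket.new("127.0.0.1", port)
accepted = server.accept

# Remember only the descriptor number, as the copyover does via ARGV.
fd = accepted.fileno
# Stop the old wrapper from closing the descriptor when it is collected.
accepted.autoclose = false

# Build a fresh socket object around the same descriptor.
revived = TCPSocket.for_fd(fd)

client.puts "hello"
line = revived.gets            # data sent by the client arrives on the new object
revived.puts "still here"
reply = client.gets            # and the client still hears from the server

revived.close
client.close
server.close
```

After a real exec the old Ruby objects are gone anyway, so the autoclose line only matters for this in-process demonstration.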
Caveat
My actual usage was a lot more ridiculously complicated, so I did my best to strip out all the garbage; however, I realized when I got to the end here that you could probably do without $connected (it's part of a larger system for me). There may be some errors, so please comment if you find any and I'll correct them.
Hope this helps anyone who finds it.
Related
I've written a little UDP server in Ruby:
def listen
  puts "Started UDP server on #{@port}..."
  Socket.udp_server_loop(@port) do |message, message_source|
    puts "Got \"#{message}\" from #{message_source}"
    handle_incoming_message(message)
  end
end
I start it in a separate thread:
thread = Thread.new { listen }
Is there a way to gracefully stop the udp_server_loop from outside the thread without just killing it (thread.kill)? I also don't want to stop it from the inside by receiving a UDP message. Is udp_server_loop maybe not the right tool for me?
I don’t think you can do this with udp_server_loop (although you might be able to use some of the methods it uses). You are going to have to call IO::select in a loop of your own with some way of signalling it to exit, and some way of waking the thread so you don’t have to send a packet to stop it.
A simple way would be to use the timeout option to select with a variable to set to indicate you want the thread to end, something like:
@halt_loop = false
def listen
  puts "Started UDP server on #{@port}..."
  sockets = Socket.udp_server_sockets(@port)
  loop do
    readable, _, _ = IO.select(sockets, nil, nil, 1) # timeout 1 sec
    break if @halt_loop
    next unless readable # select returns nil on timeout
    Socket.udp_server_recv(readable) do |message, message_source|
      puts "Got \"#{message}\" from #{message_source}"
      handle_incoming_message(message)
    end
  end
end
You then set @halt_loop to true when you want to stop the thread.
The downside to this is that it is effectively polling. If you decrease the timeout then you potentially do more work on an empty loop, and if you increase it you have to wait longer when stopping the thread.
Another, slightly more complex solution would be to use a pipe and have the select listen on it along with the sockets. You could then signal directly to finish the select and exit the thread.
@read, @write = IO.pipe
@halt_loop = false
def listen
  puts "Started UDP server on #{@port}..."
  sockets = Socket.udp_server_sockets(@port)
  sockets << @read
  loop do
    readable, _, _ = IO.select(sockets)
    break if @halt_loop
    readable.delete @read
    Socket.udp_server_recv(readable) do |message, message_source|
      puts "Got \"#{message}\" from #{message_source}"
      handle_incoming_message(message)
    end
  end
end
def end_loop
  @halt_loop = true
  @write.puts "STOP!"
end
To exit the thread you just call end_loop, which sets the @halt_loop flag and then writes to the pipe, making the other end readable and causing the other thread to return from select.
You could have this code check the readable IOs and exit if one of them is the read end of the pipe instead of using the variable, but at least on Linux there is a potential bug where a call to select might return a file descriptor as readable when it actually isn’t. I don’t know if Ruby deals with this, so better safe than sorry.
Also be sure to remove the pipe from the readable array before passing it to udp_server_recv. It’s not a socket so will cause an exception if you don’t.
A downside to this technique is that pipes are “[n]ot available on all platforms”.
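For anyone who wants to try the pipe idea end to end, here is a self-contained sketch. To keep it short and portable it swaps in a single plain UDPSocket instead of Socket.udp_server_sockets, and the class name StoppableUdpListener, its methods and the loopback address are all my own assumptions, not part of the question's code.

```ruby
require 'socket'
require 'timeout'

# Hypothetical wrapper: a UDP listener that can be stopped from outside.
class StoppableUdpListener
  attr_reader :port, :messages

  def initialize(host = "127.0.0.1", port = 0)
    @socket = UDPSocket.new
    @socket.bind(host, port)           # port 0: the OS picks a free port
    @port = @socket.addr[1]
    @stop_r, @stop_w = IO.pipe         # the pipe is used only to wake up select
    @halt = false
    @messages = Queue.new
  end

  def listen
    loop do
      readable, = IO.select([@socket, @stop_r])
      break if @halt                   # flag set by #stop before it writes to the pipe
      next unless readable.include?(@socket)
      begin
        message, _sender = @socket.recvfrom_nonblock(1024)
        @messages << message
      rescue IO::WaitReadable
        next                           # spurious wakeup: go back to select
      end
    end
  ensure
    @socket.close
  end

  def stop
    @halt = true
    @stop_w.write("x")                 # makes @stop_r readable, so select returns
  end
end

listener = StoppableUdpListener.new
thread = Thread.new { listener.listen }

sender = UDPSocket.new
sender.send("ping", 0, "127.0.0.1", listener.port)
received = Timeout.timeout(5) { listener.messages.pop }

listener.stop                          # no packet needed to end the loop
thread.join(5)
puts "received #{received.inspect}, listener alive: #{thread.alive?}"
```

The flag is still checked first, as suggested above, so a spurious readable report alone never ends the loop.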
Although I doubt I understand what would be wrong with Thread::kill and/or Thread#exit, you might use the thread local variable for that.
def listen
  Socket.udp_server_loop(@port) do |message, message_source|
    break :interrupted if Thread.current[:break]
    handle_incoming_message(message)
  end
end
and do
thread[:break] = true
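from the outside. Note that the flag is only checked when a message arrives, so the loop ends on the next incoming datagram rather than immediately.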
I have a TCP server that I wrote in Ruby. The server seems to work: two or more clients can connect and be served, but they sometimes get stuck (they either have to wait for the other client to disconnect or just become unresponsive), usually after the "pass_ok" bit. When connecting with only one client I don't see this issue.
Here is my code:
def self.main_server
  begin
    server = TCPServer.open(@port)
  rescue Exception => e
    CoreLogging.syslog_error("Cant start server: #{e}")
  end
  @main_pid = Process.pid
  # Main Loop
  Thread.abort_on_exception = true
  while true
    Thread.fork(server.accept) do |client|
      @client = client
      sock_domain, remote_port, remote_hostname, remote_ip = @client.peeraddr # Get some info on the incoming connection
      CoreLogging.syslog_error("Got new connection from #{@client.peeraddr[3]} Handeled by Thread: #{Thread.current}") # Log incoming connection
      @client.puts "Please enter password: " # Password testing (later will be from a config file or DB)
      action = @client.gets(4096).chomp # get client password response 'chomp' is super important
      if action == @password
        # what to do when password is right
        pass_ok
        Thread.exit
      else
        # what to do when password is wrong
        pass_fail
        Thread.exit
      end
    end
    begin
      CoreLogging.syslog_error("Thread Ended (SOFT)")
    rescue Exception => e
      CoreLogging.syslog_error("Thread was killed (HARD)")
    end
  end
end
I'll leave it here for future reference and hope someone in a close situation will find it useful.
The issue was the shared @client variable, which got overwritten by every new thread and was then picked up by the methods called inside each thread.
Using a local client variable (without the '@') got it to work as intended.
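To make the difference concrete, here is a small sketch of the working pattern, where each thread only touches the block parameter it was given. The echo-style protocol, the names and the short sleep (there to force the two connections to overlap) are assumptions of mine for the demonstration.

```ruby
require 'socket'

server = TCPServer.new("127.0.0.1", 0)   # port 0: the OS picks a free port
port = server.addr[1]

acceptor = Thread.new do
  2.times do
    # `client` is a block parameter, so every thread has its own copy.
    Thread.new(server.accept) do |client|
      name = client.gets.chomp
      sleep 0.05                          # let the other connection arrive meanwhile
      client.puts "hello #{name}"
      client.close
    end
  end
end

# Two clients connect at the same time; each should get its own greeting.
replies = %w[alice bob].map do |name|
  Thread.new do
    sock = TCPSocket.new("127.0.0.1", port)
    sock.puts name
    reply = sock.gets
    sock.close
    reply
  end
end.map(&:value)

acceptor.join
server.close
p replies
```

Had the handler stored the socket in a shared variable such as @client, the second connection could replace it while the first thread was sleeping, and a reply could go to the wrong client.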
I am working on a project where I have implemented a TCP client/server for device communication. To send a command from the server to the client, I build a command that the device understands and send it, but the response is not what should be returned.
while 1
  Thread.start(@otd.accept) do |client|
    loop do
      command_to_send ="<R-2,3,4>"
      client.puts command_to_send
      puts "Command #{command_to_send}sent"
      #sleep 2
      response = client.gets # here it halts and never puts the next statement.
      puts "Reponse #{response}"
    end # end of nested loop
    client.close
  end #END OF THREAD.
end #end of while loop
Can someone tell me what I am missing?
Do not use gets, as it expects "\n" to delimit the message.
Instead use recv. Here is a method that could help you:
require 'timeout'

def read(client, timeout = 2, buffer_size = 1024)
  message = ''
  begin
    Timeout.timeout(timeout) do
      # recv returns as soon as any data is available; no "\n" is needed
      message += client.recv(buffer_size)
    end
  rescue Timeout::Error
    puts "Received nothing from client: #{client.__id__}"
    message = ''
  rescue StandardError => e
    raise "Client failed to read for reason - #{e.message}"
  end
  message
end
You no longer need sleep, since recv, like gets, blocks. The timeout makes sure you are not stuck waiting for a response that never arrives.
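If you would rather not wrap the read in Timeout, a similar effect can be had with IO.select, which waits for readability with a deadline. This is only a sketch of that alternative: read_with_timeout is a name I made up, and UNIXSocket.pair merely stands in for the real device connection (so it assumes a Unix-like system).

```ruby
require 'socket'

# Hypothetical helper: wait up to `timeout` seconds for data, then read
# whatever has arrived (up to max_bytes). Returns nil on timeout or EOF.
def read_with_timeout(sock, timeout = 2, max_bytes = 1024)
  ready, = IO.select([sock], nil, nil, timeout)
  return nil unless ready        # nothing became readable in time

  sock.readpartial(max_bytes)    # returns available bytes, no "\n" required
rescue EOFError
  nil                            # the peer closed the connection
end

device, server_side = UNIXSocket.pair
device.write("<OK>")             # a reply without a trailing newline

reply   = read_with_timeout(server_side, 1)
nothing = read_with_timeout(server_side, 0.1)   # no more data: times out
p reply, nothing
```

A single read may return only part of a long reply, so a real protocol would keep reading until it sees its own terminator (here presumably ">").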
I use the following code to check the server status of a certain game server to see if the game server is online.
begin
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
sockaddr = Socket.sockaddr_in(game_server.gameserver_port, game_server.gameserver_hostname)
sock.connect(sockaddr)
server_status.status = 1
rescue
server_status.status = 0
end
However, the code seems to hang on the line sock.connect(sockaddr) without proceeding, and does not throw an error when there are no services listening on that port. Is there a better way to do this in Ruby?
Could timeout be a good solution?
require 'timeout'
begin
  Timeout.timeout(5) do
    # socket stuff...
  end
rescue Timeout::Error
  puts "Timed out!"
end
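Another option for the status check is Socket.tcp, which accepts a connect_timeout keyword, so no Timeout block is needed. A rough sketch follows; server_online? is a hypothetical helper name, and the local listener exists only to have something to probe.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection can be opened in time.
def server_online?(host, port, timeout = 5)
  # The block form closes the socket for us and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false                          # refused, unreachable, timed out, bad host name...
end

listener = TCPServer.new("127.0.0.1", 0)   # something to probe; port picked by the OS
port = listener.addr[1]

up = server_online?("127.0.0.1", port, 1)
listener.close
down = server_online?("127.0.0.1", port, 1)
puts "before close: #{up}, after close: #{down}"
```

In the original code this would become server_status.status = server_online?(hostname, port) ? 1 : 0.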
How do you set the timeout for blocking operations on a Ruby socket?
The solution I found which appears to work is to use Timeout::timeout:
require 'timeout'
...
begin
  Timeout.timeout(5) do
    message, client_address = some_socket.recvfrom(1024)
  end
rescue Timeout::Error
  puts "Timed out!"
end
The timeout object is a good solution.
This is an example of asynchronous I/O (non-blocking in nature, occurring asynchronously to the flow of the application).
IO.select(read_array [, write_array [, error_array [, timeout]]]) => array or nil
It can be used to get the same effect.
require 'socket'
strmSock1 = TCPSocket::new( "www.dn.se", 80 )
strmSock2 = TCPSocket::new( "www.svd.se", 80 )
# Block until one or more events are received
#result = select( [strmSock1, strmSock2, STDIN], nil, nil )
timeout=5
timeout=100
result = select( [strmSock1, strmSock2], nil, nil,timeout )
puts result.inspect
if result
for inp in result[0]
if inp == strmSock1 then
# data avail on strmSock1
puts "data avail on strmSock1"
elsif inp == strmSock2 then
# data avail on strmSock2
puts "data avail on strmSock2"
elsif inp == STDIN
# data avail on STDIN
puts "data avail on STDIN"
end
end
end
I think the non-blocking approach is the way to go.
I tried the article mentioned above and could still get it to hang.
The article on non-blocking networking and jonke's approach above got me on the right path. My code was blocking on the initial connect, so I needed to go a little lower level.
The socket rdoc gives more details on connect_nonblock.
def self.open(host, port, timeout=10)
  addr = Socket.getaddrinfo(host, nil)
  sock = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(Socket.pack_sockaddr_in(port, addr[0][3]))
  rescue Errno::EINPROGRESS
    # A pending connect is reported as writable once it has finished.
    resp = IO.select(nil, [sock], nil, timeout.to_i)
    if resp.nil?
      raise Errno::ECONNREFUSED
    end
    begin
      sock.connect_nonblock(Socket.pack_sockaddr_in(port, addr[0][3]))
    rescue Errno::EISCONN
    end
  end
  sock
end
To get a good test, start up a simple socket server and then press Ctrl-Z to suspend it.
IO.select waits up to 10 seconds for the socket to become writable, which is how a finished connection attempt is signalled. Waiting for readability instead would only work if the server sends data right after accepting.
It should be a good replacement for TCPSocket's open method.