avoiding zombie processes with Ruby's Net::SSH gem when connection fails - ruby

I have an application that does something like this using Ruby's Net::SSH gem:
require 'net/ssh'
key = '/home/creede/.ssh/secret.pem'
conn = nil
begin
  conn = Net::SSH::start('example.com','creede',:timeout=>60,:keys=>[key])
rescue
  begin
    conn = Net::SSH::start('example.com','root',:timeout=>60,:keys=>[key])
  rescue Exception => e
    puts "Can't connect to example.com: #{e.to_s}"
  end
end
if not conn.nil?
  if not conn.closed?
    issue = conn.exec!('cat /etc/issue')
  end
end
This is all well and good when we connect to the server on the first try. However, if the first attempt fails and we have to log in as root instead, the first attempted ssh process turns into a zombie. So does the second one if connecting as root also fails.
These zombies disappear when the parent process finishes, but I'd like to figure out how (if possible) to get rid of the zombies as soon as we know the connection fails.
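The snippet above does not show what spawns the child process that ends up as a zombie, so the following is only a general sketch of how a Ruby parent can reap a finished child. It assumes the child's PID is available to the parent; the helper names run_and_reap and spawn_detached are hypothetical, and the child command (the current Ruby interpreter exiting immediately) is just a stand-in.

```ruby
require 'rbconfig'

# Stand-in child command (an assumption): the current Ruby interpreter exiting at once.
CHILD_CMD = [RbConfig.ruby, '-e', 'exit 0'].freeze

# Spawn a child and wait for it. Waiting collects the exit status,
# which is what removes the zombie entry from the process table.
def run_and_reap(cmd = CHILD_CMD)
  pid = Process.spawn(*cmd)
  _pid, status = Process.wait2(pid)
  status
end

# When the parent should not block, Process.detach starts a thread
# that reaps the child as soon as it exits.
def spawn_detached(cmd = CHILD_CMD)
  pid = Process.spawn(*cmd)
  Process.detach(pid)
end
```

Whether this applies here depends on where the ssh child process comes from, which the question does not say; a child stays a zombie only until its parent waits on it.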


replSetStepDown works, but an EOF error is returned to my Ruby mongo driver

I am interacting with my MongoDB ReplicaSet through my Ruby mongo driver, v. 2.6.2.
@mongo_client = Mongo::Client.new(@uri, connect: :replica_set).use("admin")
When I want to initiate a failover, I run this command
begin
  failover = @mongo_client.command(
    replSetStepDown: 60,
    secondaryCatchUpPeriodSecs: 10,
    force: false
  )
  Rails.logger.error "failover: #{failover}"
rescue Mongo::Error::SocketError => e
  Rails.logger.error "error: #{e.to_s.inspect}"
end
The command works, and the failover happens. But an exception is raised, that's why I have to rescue it:
error: EOFError: end of file reached
When I pass invalid arguments, I do get proper responses, for example:
failover: Mongo::Error::OperationFailure (stepdown period must be longer than secondaryCatchUpPeriodSecs (2)):
So the exception is raised only when the step-down is initiated successfully.
Any idea how to get a proper response, rather than an exception?
Regarding "error: EOFError: end of file reached": this is expected behavior with pre-4.2 servers, which close all connections when they step down. 4.2+ servers do not.
Regarding "Any idea how to get a proper response, rather than an exception?": use a 4.2+ server.

create a background ssh tunnel in ruby

My Oracle DB is only accessible via a jumpoff server and is load balanced. As a result, I run the following background tunnel command in bash:
ssh ${jumpoffUser}@${jumpoffIp} -L1521:ont-db01-vip:1521 -L1522:ont-db02-vip:1521 -fN
I do this before running my commands on the DB using sqlplus, like so:
sqlplus #{@sqlUsername}/#{@sqlPassword}@'#{@sqlUrl}' @scripts/populateASDB.sql
This all works fine.
Now I want to rubisize this procedure.
In looking up the documentation on ruby I could not find how to put the tunnel in the background (which would be my preference) but I found documentation on local port forwarding which I thought would emulate the above tunnel and subsequent sqlplus command.
Here is my code:
Net::SSH.start( @jumpoffIp, @jumpoffUser ) do |session|
  session.forward.local( 1521, 'ont-db01-vip', 1521 )
  session.forward.local( 1522, 'ont-db02-vip', 1521 )
  puts "About to populateDB"
  res = %x[sqlplus #{@sqlUsername}/#{@sqlPassword}@'#{@sqlUrl}' @scripts/populateASDB.sql > output.txt]
  puts "populateDb output #{res}"
end
When I run the above I get the line "About to populateDB" but it hangs on the actual running of the sqlplus command. Is there something wrong with my port forwarding code or how do I put the following:
ssh ${jumpoffUser}@${jumpoffIp} -L1521:ont-db01-vip:1521 -L1522:ont-db02-vip:1521 -fN
into ruby code?
Try to use this gem: https://github.com/net-ssh/net-ssh-gateway/
require 'net/ssh/gateway'
gateway = Net::SSH::Gateway.new(@jumpoffIp, @jumpoffUser)
gateway.open('ont-db01-vip', 1521, 1521)
# The second tunnel needs its own local port; 1522 matches the original ssh command
gateway.open('ont-db02-vip', 1521, 1522)
res = %x[sqlplus #{@sqlUsername}/#{@sqlPassword}@'#{@sqlUrl}' @scripts/populateASDB.sql > output.txt]
puts "populateDb output #{res}"
You have two problems.
1) You need to use 'session.loop { true }' so that the session actually loops
2) You don't start looping the session until your sqlplus command is done, but the sqlplus needs the session looping (the forwarding to be up).
So I suggest creating a background thread using Thread.new and then killing the thread once sqlplus is done.
Thanks to David's answer, I came up with the following:
Net::SSH.start(ip_addr, 'user') do |session|
  session.forward.local( 9090, 'localhost', 9090 )
  # Need to run the event loop in the background for SSH callbacks to work
  t = Thread.new {
    session.loop { true }
  }
  commands.each do | command |

Ruby TCPServer fails to work sometimes

I've implemented a very simple kind of server in Ruby, using TCPServer. I have a Server class with serve method:
def serve
  # Do the actual serving in a child process
  @pid = fork do
    # Trap signal sent by #stop or by pressing ^C
    Signal.trap('INT') { exit }
    # Create a new server on port 2835 (1 ounce = 28.35 grams)
    server = TCPServer.new('localhost', 2835)
    @logger.info 'Listening on http://localhost:2835...'
    loop do
      socket = server.accept
      request_line = socket.gets
      @logger.info "* #{request_line}"
      socket.print "message"
    end
  end
end
and a stop method:
def stop
  @logger.info 'Shutting down'
  Process.kill('INT', @pid)
  @pid = nil
end
I run my server from the command line, using:
if __FILE__ == $0
  server = Server.new
  server.logger = Logger.new(STDOUT)
  server.logger.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
  begin
    server.serve
    Process.wait
  rescue Interrupt
    server.stop
  end
end
The problem is that sometimes, when I run ruby server.rb from my terminal, the server starts, but a request to localhost:2835 fails. Only after several requests does it start serving pages. In other cases, I need to stop and restart the server before it serves pages properly. Why is this happening? Am I doing something wrong? I find this very weird...
The same thing applies to my specs: I have some plain specs and some Capybara specs. Before each test I create and start a server, and after each test I stop it. The problem persists: tests sometimes pass and sometimes fail because the requested page could not be found.
Is there something fishy going on with my forking?
Would appreciate any answer because I have no more place to look...
Your code is not an HTTP server. It is a TCP server that sends the string "message" over the socket after receiving a newline.
The reason your code isn't a valid HTTP server is that it doesn't conform to the HTTP protocol. One of the many requirements of the protocol is that the server's response begin with a status line of the form
HTTP/1.1 <code> <reason>
where <code> is a number and <reason> is a human-readable status, like "OK" or "Server Error" or something along those lines. The string "message" obviously does not conform to this requirement.
Here is a simple introduction to how you might build an HTTP server in Ruby: https://practicingruby.com/articles/implementing-an-http-file-server
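As a rough illustration of the point above, here is a minimal sketch of a reply that a browser or HTTP client would accept: a status line, a few headers, a blank line, then the body. The helper names http_response and serve_once are hypothetical, and the choice of headers is an assumption about what a small test server would want to send, not a full implementation of the protocol.

```ruby
require 'socket'

# Build a minimal HTTP/1.1 response for a plain-text body.
def http_response(body, code: 200, reason: 'OK')
  [
    "HTTP/1.1 #{code} #{reason}",
    "Content-Type: text/plain",
    "Content-Length: #{body.bytesize}",
    "Connection: close",
    "",
    body
  ].join("\r\n")
end

# Accept one connection, read the request head, reply, and close the socket.
def serve_once(server)
  socket = server.accept
  request_line = socket.gets
  # Drain the remaining request headers up to the blank line.
  loop do
    line = socket.gets
    break if line.nil? || line == "\r\n"
  end
  socket.print http_response("message")
  socket.close
  request_line
end
```

Calling serve_once(TCPServer.new('localhost', 2835)) inside the existing loop would answer one request per iteration. Closing the socket after each reply also tells the client that the response is complete.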

Gracefully disconnect a WebSocket connection with Celluloid

I connect to a WebSocket with Chrome's Remote Debugging Protocol, using a Rails application and a class that implements Celluloid, or more specifically, celluloid-websocket-client.
The problem is that I don't know how to disconnect the WebSocket cleanly.
When an error happens inside the actor, but the main program runs, Chrome somehow still makes the WebSocket unavailable, not allowing me to attach again.
Code Example
Here's the code, completely self-contained:
require 'json'
require 'celluloid/websocket/client'

class MeasurementConnection
  include Celluloid

  def initialize(url)
    @ws_client = Celluloid::WebSocket::Client.new url, Celluloid::Actor.current
  end

  # When WebSocket is opened, register callbacks
  def on_open
    puts "Websocket connection opened"
    # @ws_client.close to close it
  end

  # When raw WebSocket message is received
  def on_message(msg)
    puts "Received message: #{msg}"
  end

  # Send a raw WebSocket message
  def send_chrome_message(msg)
    @ws_client.text JSON.dump msg
  end

  # When WebSocket is closed
  def on_close(code, reason)
    puts "WebSocket connection closed: #{code.inspect}, #{reason.inspect}"
  end
end

MeasurementConnection.new ARGV[0].strip.gsub("\"","")
while true
  sleep 1
end
What I've tried
When I uncomment @ws_client.close, I get:
NoMethodError: undefined method `close' for #<Celluloid::CellProxy(Celluloid::WebSocket::Client::Connection:0x3f954f44edf4)
But I thought this was delegated? At least the .text method works too?
When I call terminate instead (to quit the Actor), the WebSocket is still opened in the background.
When I call terminate on the MeasurementConnection object that I create in the main code, it makes the Actor appear dead, but still does not free the connection.
How to reproduce
You can test this yourself by starting Chrome with --remote-debugging-port=9222 as command-line argument, then checking curl http://localhost:9222/json and using the webSocketDebuggerUrl from there, e.g.:
ruby chrome-test.rb $(curl http://localhost:9222/json 2>/dev/null | grep webSocket | cut -d ":" -f2-)
If no webSocketDebuggerUrl is available, then something is still connecting to it.
It used to work when I was using EventMachine, in a setup similar to this example but with em-websocket-client instead of faye/websocket-client. There, upon stopping the EM loop (with EM.stop), the WebSocket would become available again.
I figured it out. I used the 0.0.1 version of the celluloid-websocket-client gem which did not delegate the close method.
Using 0.0.2 worked, and the code would look like this:
In MeasurementConnection:
def close
  @ws_client.close
end
In the main code:
m = MeasurementConnection.new ARGV[0].strip.gsub("\"","")
while m.alive?

How do I know what ports are on my Macbook?

I'm trying to go through this particular code example from "The Well Grounded Rubyist" regarding TCPServer and threads. The code is below:
require 'socket'
server = TCPServer.new(3939)
connect = server.accept
connect.puts "Hi. Here's the date."
connect.puts `date`
How do I know which ports are available on my MacBook? The docs use 2000 in their example. However, with either number the code does not appear to execute; it hangs indefinitely.
How can I check whether these numbers are valid ports? I tried telnetting to the port number and the connection is refused every time.
server.accept waits for a client to connect to the server. If that does not happen, it just keeps waiting. Run the code, then open a terminal, start irb, and type:
require 'socket'
s = TCPSocket.new 'localhost', 3939
At this point you will have created a TCPSocket that connects to your server. This causes the rest of the server code to execute. You can check it by reading from your socket:
while line = s.gets # Read lines from socket
  puts line # and print them
end
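To see both halves at once, here is a small sketch that runs the server and the client in a single script. It is not the book's code: it binds to port 0 so the operating system picks a free port (an assumption made to avoid clashing with anything already listening on 3939), and it sends Time.now instead of shelling out to date.

```ruby
require 'socket'

# Port 0 asks the OS for any free port; the book's example uses 3939.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# accept blocks until a client connects, so run the server side in its own thread.
server_thread = Thread.new do
  conn = server.accept
  conn.puts "Hi. Here's the date."
  conn.puts Time.now.to_s
  conn.close
end

# The client connection is what unblocks accept in the server thread.
client = TCPSocket.new('127.0.0.1', port)
lines = []
while (line = client.gets) # gets returns nil once the server closes the connection
  lines << line.chomp
end
client.close
server_thread.join
server.close

puts lines
```

The script finishes instead of hanging because the client connects right after the server starts listening. Without the TCPSocket.new call, the server thread would wait in accept indefinitely.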
