How can I tell if my Ruby server script is being overloaded? - ruby

I have a daemonized ruby script running on my server that looks like this:
#server = TCPServer.open(61101)
loop do
#thr = Thread.new(#server.accept) do |sock|
Thread.current[:myArrayOfHashes] = [] # hashes containing attributes of myObject
SystemTimer.timeout_after(5) do
Thread.current[:string] = sock.gets
sock.close
# parse the string and load the data into myArrayOfHashes
Myobject.transaction do # Update the myObjects Table
Thread.current[:myArrayOfHashes].each do |h|
Thread.current[:newMyObject] = Myobject.new
# load up the new object with data
Thread.current[:newMyObject].save
end
end
end
end
#thr.join
end
This server receives and manages data for my rails application which is all running on Mac OS 10.6. The clients call the server every 15 minutes on the 15 and while I currently only have 16 or so clients calling every 15 min on the 15, I'm wondering about the following:
If two clients call at close enough to the same time, will one client's connection attempt fail?
How I can figure out how many client connections my server can accommodate at the same time?
How can I monitor how much memory my server is using?
Also, is there an article you can point me toward that discusses the best way to implement this kind of a server? I mean can I have multiple instances of the server listening on the same port? Would that even help?
I am using Bluepill to monitor my server daemons.

1 and 2
The answer is no, two clients connecting close to each other will not make the connection fail (however multiple clients connecting may fail, see below).
The reason is the operating system has a default so called listening queue built into all server sockets. So even if you are not calling accept fast enough in your program, the OS will still keep buffering incoming connections for you. It will buffer these connections for as long as the listening queue does not get filled.
Now what is the size of this queue then?
In most cases the default size typically used is 5. The size is set after you create the socket and you call listen on this socket (see man page for listen here).
For Ruby TCPSocket automatically calls listen for you, and if you look at the C-source code for TCPSocket you will find that it indeed sets the size to 5:
https://github.com/ruby/ruby/blob/trunk/ext/socket/ipsocket.c#L108
SOMAXCONN is defined as 5 here:
https://github.com/ruby/ruby/blob/trunk/ext/socket/mkconstants.rb#L693
Now what happens if you don't call accept fast enough and the queue gets filled?
The answer is found in the man page of listen:
The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow. If a connection request arrives when the queue is full, the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds.
In your code however there is one problem which can make the queue fill up if more than 5 clients try to connect at the same time: you're calling #thr.join at the end of the loop.
What effectively happens when you do this is that your server will not accept any new incoming connections until all your stuff inside your accept-thread has finished executing.
So if the database stuff and the other things you are doing inside the accept-thread takes a long time, the listening queue may fill up in the meantime. It depends on how long your processing takes, and how many clients could potentially be connecting at the exact same time.
3
You didn't say which platform you are running on, but on linux/osx the easiest way is to just run top in your console. For more advanced memory monitoring options you might want to check these out:
ruby/ruby on rails memory leak detection
track application memory usage on heroku

Related

Understanding Celluloid Pool

I guess my understanding toward Celluloid Pool is sort of broken. I will try to explain below but before that a quick note.
Note: Our system is running against a very fast client passing messages over ZeroMQ.
With the following Vanilla Celluloid app
class VanillaClient
include Celluloid::ZMQ
def read
loop { async.evaluate_response(socket.read_multipart)
end
def evaluate_response(data)
## the reason for using defer can be found over here.
Celluloid.defer do
ExternalService.execute(data)
end
end
end
Our system result in failure after some time, reason 'Can't spawn more thread' (or something like it)
So we intended to use Celluloid Pool(to avoid the above-mentioned problem ) so that we can limit the number of threads that spawned
My Understanding toward Celluloid Pool is
Celluloid Pool maintains a pool of actors for you so that you can distribute your task in parallel.
Hence, I decide to test it, but according to my test cases, it seems to behave serially(i.e thing never get distribute or happen in parallel.)
Example to replicate this.
sender-1.rb
## Send message `1` to the the_client.rb
sender-2.rb
## Send message `2` to the the_client.rb
the_client.rb
## take message from sender-1 and sender-2 and return it back to receiver.rb
## heads on, the `sleep` is introduced to test/replicate the IO block that happens in the actual code.
receiver.rb
## print the message obtained from the_client.rb
If, the sender-2.rb is run before sender-1.rb it appears that the pool gets blocked for 20 sec (sleep time in the_client.rb,can be seen over here) before consuming the data sent by sender-1.rb
It behaves the same in ruby-2.2.2 and under jRuby-9.0.5.0. What could be the possible causes for Pool to act in such manner?
Your pool call is not asynchronous.
Execution of evaluate on #pool needs to be .async still, as in your original example, not using pools. You still want asynchronous behavior, but you als want to have multiple handler actors.
Next you will likely hit the Pool.async bug.
https://github.com/celluloid/celluloid-pool/issues/6
This means after 5 hits to evaluate your pool will become unresponsive until at least one actor in the pool is finished. Worst case scenario, if you get 6+ requests in rapid succession, the 6th will then take 120 seconds, because it will take 5*20 seconds before it executes, then 20 seconds to execute itself.
Depending on what your actual operation is that's causing you delays -- you might need to adjust your pool size down the line.

How does ZeroMQ connect and bind work internally

I am experimenting with ZeroMQ. And I found it really interesting that in ZeroMQ, it does not matter whether either connect or bind happens first. I tried looking into the source code of ZeroMQ but it was too big to find anything.
The code is as follows.
# client side
import zmq
ctx = zmq.Context()
socket = ctx.socket(zmq.PAIR)
socket.connect('tcp://*:2345') # line [1]
# make it wait here
# server side
import zmq
ctx = zmq.Context()
socket = ctx.socket(zmq.PAIR)
socket.bind('tcp://localhost:2345')
# make it wait here
If I start client side first, the server has not been started yet, but magically the code is not blocked at line [1]. At this point, I checked with ss and made sure that the client is not listening on any port. Nor does it have any open connection. Then I start the server. Now the server is listening on port 2345, and magically the client is connected to it. My question is how does the client know the server is now online?
The best place to ask your question is the ZMQ mailing list, as many of the developers (and founders!) of the library are active there and can answer your question directly, but I'll give it a try. I'll admit that I'm not a C developer so my understanding of the source is limited, but here's what I gather, mostly from src/tcp_connector.cpp (other transports are covered in their respective files and may behave differently).
Line 214 starts the open() method, and here looks to be the meat of what's going on.
To answer your question about why the code is not blocked at Line [1], see line 258. It's specifically calling a method to make the socket behave asynchronously (for specifics on how unblock_socket() works you'll have to talk to someone more versed in C, it's defined here).
On line 278, it attempts to make the connection to the remote peer. If it's successful immediately, you're good, the bound socket was there and we've connected. If it wasn't, on line 294 it sets the error code to EINPROGRESS and fails.
To see what happens then, we go back to the start_connecting() method on line 161. This is where the open() method is called from, and where the EINPROGRESS error is used. My best understanding of what's happening here is that if at first it does not succeed, it tries again, asynchronously, until it finds its peer.
I think the best answer is in zeromq wiki
When should I use bind and when connect?
As a very general advice: use bind on the most stable points in your architecture and connect from the more volatile endpoints. For request/reply the service provider might be point where you bind and the client uses connect. Like plain old TCP.
If you can't figure out which parts are more stable (i.e. peer-to-peer) think about a stable device in the middle, where boths sides can connect to.
The question of bind or connect is often overemphasized. It's really just a matter of what the endpoints do and if they live long — or not. And this depends on your architecture. So build your architecture to fit your problem, not to fit the tool.
And
Why do I see different behavior when I bind a socket versus connect a socket?
ZeroMQ creates queues per underlying connection, e.g. if your socket is connected to 3 peer sockets there are 3 messages queues.
With bind, you allow peers to connect to you, thus you don't know how many peers there will be in the future and you cannot create the queues in advance. Instead, queues are created as individual peers connect to the bound socket.
With connect, ZeroMQ knows that there's going to be at least a single peer and thus it can create a single queue immediately. This applies to all socket types except ROUTER, where queues are only created after the peer we connect to has acknowledge our connection.
Consequently, when sending a message to bound socket with no peers, or a ROUTER with no live connections, there's no queue to store the message to.
When you call socket.connect('tcp://*:2345') or socket.bind('tcp://localhost:2345') you are not calling these methods directly on an underlying TCP socket. All of ZMQ's IO - including connecting/binding underlying TCP sockets - happens in threads that are abstracted away from the user.
When these methods are called on a ZMQ socket it essentially queues these events within the IO threads. Once the IO threads begin to process them they will not return an error unless the event is truly impossible, otherwise they will continually attempt to connect/reconnect.
This means that a ZMQ socket may return without an error even if socket.connect is not successful. In your example it would likely fail without error but then quickly reattempt and succeeded if you were to run the server side of script.
It may also allow you to send messages while in this state (depending on the state of the queue in this situation, rather than the state of the network) and will then attempt to transmit queued messages once the IO threads are able to successfully connect. This also includes if a working TCP connection is later lost. The queues may continue to accept messages for the unconnected socket while IO attempts to automatically resolve the lost connection in the background. If the endpoint takes a while to come back online it should still receive it's messages.
To better explain here's another example
<?php
$pid = pcntl_fork();
if($pid)
{
$context = new ZMQContext();
$client = new ZMQSocket($context, ZMQ::SOCKET_REQ);
try
{
$client->connect("tcp://0.0.0.0:9000");
}catch (ZMQSocketException $e)
{
var_dump($e);
}
$client->send("request");
$msg = $client->recv();
var_dump($msg);
}else
{
// in spawned process
echo "waiting 2 seconds\n";
sleep(2);
$context = new ZMQContext();
$server = new ZMQSocket($context, ZMQ::SOCKET_REP);
try
{
$server->bind("tcp://0.0.0.0:9000");
}catch (ZMQSocketException $e)
{
var_dump($e);
}
$msg = $server->recv();
$server->send("response");
var_dump($msg);
}
The binding process will not begin until 2 seconds later than the connecting process. But once the child process wakes and successfully binds the req/rep transaction will successfully take place without error.
jason#jason-VirtualBox:~/php-dev$ php play.php
waiting 2 seconds
string(7) "request"
string(8) "response"
If I was to replace tcp://0.0.0.0:9000 on the binding socket with tcp://0.0.0.0:2345 it will hang because the client is trying to connect to tcp://0.0.0.0:9000, yet still without error.
But if I replace both with tcp://localhost:2345 I get an error on my system because it can't bind on localhost making the call truly impossible.
object(ZMQSocketException)#3 (7) {
["message":protected]=>
string(38) "Failed to bind the ZMQ: No such device"
["string":"Exception":private]=>
string(0) ""
["code":protected]=>
int(19)
["file":protected]=>
string(28) "/home/jason/php-dev/play.php"
["line":protected]=>
int(40)
["trace":"Exception":private]=>
array(1) {
[0]=>
array(6) {
["file"]=>
string(28) "/home/jason/php-dev/play.php"
["line"]=>
int(40)
["function"]=>
string(4) "bind"
["class"]=>
string(9) "ZMQSocket"
["type"]=>
string(2) "->"
["args"]=>
array(1) {
[0]=>
string(20) "tcp://localhost:2345"
}
}
}
["previous":"Exception":private]=>
NULL
}
If your needing real-time information for the state of underlying sockets you should look into socket monitors. Using socket monitors along with the ZMQ poll allows you to poll for both socket events and queue events.
Keep in mind that polling a monitor socket using ZMQ poll is not similar to polling a ZMQ_FD resource via select, epoll, etc. The ZMQ_FD is edge triggered and therefor doesn't behave the way you would expect when polling network resources, where a monitor socket within ZMQ poll is level triggered. Also, monitor sockets are very light weight and latency between the system event and the resulting monitor event is typically sub microsecond.

Elegant way to stop socket read operation from outside

I implemented a small client server application in Ruby and I have the following problem: The server starts a new client session in a new thread for each connecting client, but it should be possible to shutdown the server and stop all the client sessions in a 'polite' way from outside without just killing the thread while I don't know which state it is in.
So I decided that the client session object gets a `stop' flag which can be set from outside and is checked before each action. The problem is that it should not wait for the client, if it is just waiting for a request. I have the following temporary solution:
def read_client
loop do
begin
timeout(1) { return #client.gets }
rescue Timeout::Error
if #stop
stop # Notifies the client and closes the connection
return nil
end
end
end
end
But that sucks, looks terrible and intuitively, this should be such a normal thing that there has to be a `normal' solution to it. I don't even know if it is safe or if it could happen that the gets operation reads part of the client request, but not all of it.
Another side question is, if setting/getting a boolean flag is an atomic operation in Ruby (or if I need an additional Mutex for the flag).
Thread-per-client approach is usually a disaster for server design. Also blocking I/O is difficult to interrupt without OS-specific tricks. Check out non-blocking sockets, see for example, answers to this question.

Handling event-stream connections in a Sinatra app

There is a great example of a chat app using Server-Sent Events by Konstantin Haase. I am trying to run it and have a problem with callbacks (I use Sinatra 1.3.2 and browse with Chrome 16). They simply do not run (e.g. after page reload), and therefore the number of connections is growing.
Also, connection is closed in 30-60 sec unless one sets a periodic timer to send empty data, as suggested by Konstantin elsewhere.
Can you replicate it? If yes, is it possible to fix these issues somehow? WebSockets work seamlessly in this respect...
# ruby
get '/stream', provides: 'text/event-stream' do
stream :keep_open do |out|
EventMachine::PeriodicTimer.new(20) { out << "data: \n\n" } # added
settings.connections << out
puts settings.connections.count # added
out.callback { puts 'closed'; settings.connections.delete(out) } # modified
end
end
# javascript
var es = new EventSource('/stream');
es.onmessage = function(e) { if (e.data != '') $('#chat').append(e.data + "\n") }; // modified
This was a bug in Sinatra https://github.com/sinatra/sinatra/issues/446
Neat bit of code. But you're right, WebSockets would address these problems. I think there are two problems here:
1) Your browser, the Web server, or a proxy in-between may shut down your connection after a period of time, idle or not. Your suggestion of a periodic timer sending empty data will help, but is no guarantee.
2) As far as I know, there's no built-in way to tell if/when one of these connections is still working. To keep your list of connections from growing, you're going to have to keep track of when each connection was last "used" (maybe the client should ping occasionally, and you would store this datetime.) Then add a periodic timer to check for and kill "stale" connections.
An easier, though perhaps uglier option, is to store the creation time of each connection, and kill it off after n minutes. The client should be smart enough to reconnect.
I know that takes some of the simplicity out of the code. As neat as the example is, I think it's a better candidate for WebSockets.

Block TCP-send till ACK returned

I am programming a client application sending TCP/IP packets to a server. Because of timeout issues I want to start a timer as soon as the ACK-Package is returned (so there can be no timeout while the package has not reached the server). I want to use the winapi.
Setting the Socket to blocking mode doesn't help, because the send command returns as soon as the data is written into the buffer (if I am not mistaken). Is there a way to block send till the ACK was returned, or is there any other way to do this without writing my own TCP-implementation?
Regards
It sounds like you want to do the minimum implementation to achieve your goal. In this case you should set your socket to blocking, and following the send which blocks until all data is sent, you call recv which in turn will block until the ACK packet is received or the server end closes or aborts the connection.
If you wanted to go further with your implementation you'd have to structure your client application in such a way that supports asynchronous communication. There are a few techniques with varying degrees of complexity; polling using select() simple, event model using WSASelectEvent/WSAWaitForMultipleEvents challenging, and the IOCompletionPort model which is very complicated.
peudocode... Will wait until ack is recevied, after which time you can call whatever functionallity you want -i chose some made up function send_data.. which would then send information over the socket after receiving the ack.
data = ''
while True
readable, writable, errors = select([socket])
if socket in readble
data += recv(socket)
if is_ack(data)
timer.start() #not sure why you want this
break
send_data(socket)

Resources