Set timeout ZMQ - with a condition

I am working on a client-server app with several clients.
A process creates jobs and initiates them. The server waits on a ZMQ socket for the answers to these jobs.
The problem:
Currently I am working with 6 jobs, and I want to receive answers from at least 4. After 4 answers have been received, I want to wait 2 more seconds; if no more results arrive by then, process the results I have and return to listening on the socket.
My thoughts:
I have seen several mechanisms (Poll, ZMQ_CONNECT_TIMEOUT etc.), but I couldn't figure out how to use any of them in my case.
I thought of spawning another process on the server once 4 jobs are done, which sleeps for 2 seconds and then sends a SIGTSTP, but I'm afraid it would interfere with the server's return to listening on the socket.
Any suggestions?

Well, no one answered, but I found a solution that works for me.
Posting it since it might help others.
What I did is as simple as it gets:
import time
import zmq

while active_clients > 0:
    # check if we already have MIN_REQ sites
    if requirement:
        try:
            time.sleep(10)  # give another chance to the last client
            # peek to see if a message was received
            message = socket.recv_pyobj(flags=zmq.NOBLOCK)
            results.append(message)
            break
        except zmq.Again:
            break
    else:
        # wait for the next response from a client
        message = socket.recv_pyobj()
        results.append(message)
        active_clients -= 1
        # send reply back to client
        socket.send_pyobj(b"ok")
Notice the if/else pattern: if requirement (which can be any condition you'd like) is fulfilled, do one non-blocking receive to peek for a last message.
You could also remove the break statement and go around the loop again.
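For reference, a variation on the same idea: instead of sleeping and then peeking, ZMQ's poll can wait up to the grace period but return as soon as a message arrives. A minimal sketch, assuming the same socket, results, MIN_REQ and reply handshake as the snippet above, and the 2-second grace period from the question:

while active_clients > 0:
    if len(results) >= MIN_REQ:
        # grace period: wait up to 2 s for one more answer, then move on
        if socket.poll(2000) & zmq.POLLIN:
            results.append(socket.recv_pyobj())
            socket.send_pyobj(b"ok")  # keep the request/reply handshake balanced
        break
    # wait for the next response from a client
    message = socket.recv_pyobj()
    results.append(message)
    active_clients -= 1
    socket.send_pyobj(b"ok")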

Related

0MQ: Can you drop a message after a timeout in a REQ/REP pattern?

In my 0MQ applications I usually do this to deal with a timeout:
import zmq
ctx = zmq.Context()
s = ctx.socket(zmq.DEALER)
s.connect("tcp://localhost:5555")
# send PING request
v = <some unique value>
s.send_multipart(["PING", v])
if s.poll(timeout * 1000) & zmq.POLLIN:
    msg = s.recv_multipart()
    ...
However if the server is not running and goes online minutes later, then 0MQ will automatically reconnect and send the message.
However if I put the send-PING command in a loop (once a second) and the server is down, then once the server goes back online the next recv_multipart() call will return the old messages that piled up in the internal 0MQ queue while the server was offline. Because I don't care about old messages, I thought I could do this:
s = ctx.socket(zmq.DEALER)
s.connect("tcp://localhost:5555")
while True:
    # send PING request
    v = <some unique value>
    s.send_multipart(["PING", v])
    if s.poll(timeout * 1000) & zmq.POLLIN:
        msg = s.recv_multipart()
        ...
    else:
        s.close()
        s = ctx.socket(zmq.DEALER)
        s.connect("tcp://localhost:5555")
    time.sleep(1)
But this is a bad idea: after a while ctx.socket raises ZMQError: Too many open files.
Setting the ZMQ_LINGER socket option to 0 seems to help here, but I don't like this strategy; it seems wrong to me.
So ideally I'd like to drop the previously sent message if a read timeout happens. Is this a) possible and b) a good idea at all? I'm not sure this would be correct though: it may be that 0MQ is able to physically send the message but the server crashes before it can send anything back, so dropping would be impossible because there wouldn't be anything to drop, would there?
So my question is: what should I do in this situation? Am I possibly looking at
this problem from the wrong angle?
Q : "Can you drop a message after a timeout in a REQ/REP pattern?"
No.
Q : Is this a) possible and b) a good idea at all?
a) Yes.
b) Yes.
Q : what should I do in this situation?
Best avoid the REQ/REP risk of falling into a mutual deadlock (you just never know when it happens; there are many posts with details on this), and set up the connection layer so that it defends itself by delivering only the last message over a healthy connection:
...
s = ctx.socket(zmq.DEALER)
s.setsockopt(zmq.LINGER, 0)     # ALWAYS, no excuse; you never know the peers' versions
s.setsockopt(zmq.IMMEDIATE, 1)  # prevent sending over not-yet-completed connections
s.setsockopt(zmq.CONFLATE, 1)   # does not support multipart, so .pack() the payload
s.connect("tcp://localhost:5555")
...
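A minimal sketch of the ping loop from the question with these options applied; the endpoint and the 1-second timeout are assumptions, and note the single-part send, since ZMQ_CONFLATE does not support multipart messages:

import time
import zmq

ctx = zmq.Context()
s = ctx.socket(zmq.DEALER)
s.setsockopt(zmq.LINGER, 0)
s.setsockopt(zmq.IMMEDIATE, 1)
s.setsockopt(zmq.CONFLATE, 1)
s.connect("tcp://localhost:5555")

timeout = 1.0  # seconds, an assumed value

while True:
    try:
        s.send(b"PING", zmq.NOBLOCK)  # single-part: CONFLATE forbids multipart
    except zmq.Again:
        pass                          # no completed connection yet (IMMEDIATE=1)
    if s.poll(timeout * 1000) & zmq.POLLIN:
        msg = s.recv()                # CONFLATE keeps only the newest reply
        ...
    time.sleep(1)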

Publisher finishes before subscriber and messages are lost - why?

Fairly new to zeromq and trying to get a basic pub/sub to work. When I run the following (sub starting before pub), the publisher finishes but the subscriber hangs, having not received all the messages - why?
I think the socket is being closed before all the messages have been sent. Is there a way of ensuring all messages are received?
Publisher:
import zmq
import random
import time
import tnetstring

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")

y = 0
for x in xrange(5000):
    st = random.randrange(1, 10)
    data = []
    data.append(random.randrange(1, 100000))
    data.append(int(time.time()))
    data.append(random.uniform(1.0, 10.0))
    s = tnetstring.dumps(data)
    print 'Sending ...%d %s' % (st, s)
    socket.send("%d %s" % (st, s))
    print "Messages sent: %d" % x
    y += 1
print '*** SERVER FINISHED. # MESSAGES SENT = ' + str(y)
Subscriber:
import sys
import zmq
import tnetstring

# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")

filter = ""  # get all messages
socket.setsockopt(zmq.SUBSCRIBE, filter)

x = 0
while True:
    topic, data = socket.recv().split()
    print "Topic: %s, Data = %s. Total # Messages = %d" % (topic, data, x)
    x += 1
In ZeroMQ, clients and servers always try to reconnect; they won't go down if the other side disconnects (because in many cases you'd want them to resume talking if the other side comes up again). So in your test code, the client will just wait until the server starts sending messages again, unless you stop recv()ing messages at some point.
In your specific instance, you may want to investigate using socket.close() and context.term(); with the default linger, terminating the context blocks until all queued messages have been sent. You also have the slow-joiner problem: you can add a sleep after the bind but before you start publishing. This works in a test case, but you will want to really understand what is a solution versus a band-aid.
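For instance, a minimal sketch of that band-aid applied to the publisher above; the 1-second figure is arbitrary:

import time
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
time.sleep(1)  # crude slow-joiner workaround: give subscribers time to connect

for x in xrange(5000):
    socket.send("%d hello" % (x % 10))

socket.close()  # returns immediately; pending messages keep flushing
context.term()  # with the default LINGER, blocks until they are all sent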
You need to think of the PUB/SUB pattern like a radio. The sender and receiver are both asynchronous. The Publisher will continue to send even if no one is listening. The subscriber will only receive data if it is listening. If the network goes down in the middle, the data will be lost.
You need to understand this in order to design your messages. For example, if you design your messages to be "idempotent", it doesn't matter if you lose data. An example of this would be a status-type message: it doesn't matter if you miss any of the previous statuses, because the latest one is correct and message loss doesn't matter. The benefit of this approach is that you end up with a more robust and performant system. The downside shows up when you can't design your messages this way.
Your example includes a type of message that tolerates no loss. Another type of message would be transactional: for example, if you just sent the deltas of what changed in your system, you would not be able to lose messages. Database replication is often managed this way, which is why DB replication is often so fragile.
To try to provide guarantees, you need to do a couple of things. One is to add a persistent cache: each message sent needs to be logged in it, and each message needs to be assigned a unique id (preferably a sequence) so that the clients can determine whether they are missing a message. A second socket (ROUTER/REQ) needs to be added for the client to request the missing messages individually. Alternatively, you could just use the secondary socket to request resending over the PUB/SUB; the clients would then all receive the messages again (which works for the multicast version) and ignore the messages they had already seen. NOTE: this follows the MAJORDOMO pattern found in the ZeroMQ guide.
An alternative approach is to create your own broker using the ROUTER/DEALER sockets. When the ROUTER socket saw each DEALER connect, it would store its ID. When the ROUTER needed to send data, it would iterate over all client IDs and publish the message. Each message should contain a sequence number so that the client can know which missing messages to request. NOTE: this is a sort of reimplementation of Kafka from LinkedIn.
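A rough sketch of the sequence-id idea on both sides; the produce_payloads and request_resend helpers are illustrative placeholders, not from any library:

# publisher side: tag every outgoing message with a monotonically increasing id
seq = 0
for payload in produce_payloads():  # illustrative source of data
    socket.send("%s %d %s" % ("topic", seq, payload))
    seq += 1

# subscriber side: detect gaps and recover them over a secondary socket
expected = None
while True:
    topic, seq, payload = socket.recv().split(" ", 2)
    seq = int(seq)
    if expected is not None and seq > expected:
        request_resend(expected, seq)  # illustrative recovery request
    expected = seq + 1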

Why is gevent.sleep(0.1) necessary in this example to prevent the app from blocking?

I'm pulling my hair out over this one. I'm trying to get the simplest of examples working with zeromq and gevent. I changed this script to use PUB/SUB sockets and when I run it the 'server' socket loops forever. If I uncomment the gevent.sleep(0.1) line then it works as expected and yields to the other green thread, which in this case is the client.
The problem is, why should I have to manually add a sleep call? I thought that when I import the zmq.green version of zmq, the send and receive calls are non-blocking and do the task switching underneath.
In other words, why should I have to add the gevent.sleep() call to get this example working? In Jeff Lindsey's original example, he's doing REQ/REP sockets and he doesn't need to add sleep calls...but when I changed this to PUB/SUB I need it there for this to yield to the client for processing.
# Notes: code taken from a slide in Jeff Lindsey's talk on gevent and zeromq:
# https://raw.github.com/strangeloop/2011-slides/master/Lindsay-DistributedGeventZmq.pdf
import gevent
from gevent import spawn
import zmq.green as zmq

context = zmq.Context()

def serve():
    print 'server online'
    socket = context.socket(zmq.PUB)
    socket.bind("ipc:///tmp/jeff")
    while True:
        print 'send'
        socket.send("World")
        #gevent.sleep(0.1)

def client():
    print 'client online'
    socket = context.socket(zmq.SUB)
    socket.connect("ipc:///tmp/jeff")
    socket.setsockopt(zmq.SUBSCRIBE, '')
    while True:
        print 'recv'
        message = socket.recv()

cl = spawn(client)
server = spawn(serve)
print 'joinall'
gevent.joinall([cl, server])
print 'end'
I thought when I import the zmq.green version of zmq that the send and receive calls are non blocking and underneath do the task switching.
zmq.green will only yield if these calls would block; it does not yield if they are ready (there's nothing to wait for). In your case the sender is always ready, so it never has a reason to yield.
Some pointers:
- A minimal explicit yield is gevent.sleep(0); it doesn't need to be finite.
- zmq.green only yields on blocking calls. That is, if a socket is always ready to send/recv when you ask it to, it will never yield.
- socket.send only blocks when the socket is not ready to send (not (socket.events & zmq.POLLOUT)), which can never actually be true of a PUB socket (you will see it at HWM for PUSH, DEALER, etc.).
- In general, don't trust send to yield, because of the way zeromq works this will rarely be the case unless you are exceeding the capacity of your configuration.
- Unlike send, recv regularly blocks in normal usage, so it yields on most calls. But if a peer is flooding your incoming buffer, repeated recv calls will not yield until there is nothing ready to receive, so you may again need to explicitly yield every so often to prevent starvation.
What zmq.green amounts to is turning send/recv into:
try:
    socket.send(msg, zmq.NOBLOCK)  # or recv
except zmq.ZMQError as e:
    if e.errno == zmq.EAGAIN:
        yield  # and wait for socket to be ready, then try again
so if send/recv with NOBLOCK are always succeeding, the socket never yields.
To put it another way: If a socket has nothing to wait for, it won't wait.
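Applied to the example above, the fix is an explicit yield in the tight send loop. A sketch (gevent.sleep(0) suffices; it need not be 0.1):

def serve():
    print 'server online'
    socket = context.socket(zmq.PUB)
    socket.bind("ipc:///tmp/jeff")
    while True:
        print 'send'
        socket.send("World")
        gevent.sleep(0)  # explicit yield so the client greenlet gets scheduled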

How can I tell if my Ruby server script is being overloaded?

I have a daemonized ruby script running on my server that looks like this:
require 'socket'

@server = TCPServer.open(61101)
loop do
  @thr = Thread.new(@server.accept) do |sock|
    Thread.current[:myArrayOfHashes] = [] # hashes containing attributes of myObject
    SystemTimer.timeout_after(5) do
      Thread.current[:string] = sock.gets
      sock.close
      # parse the string and load the data into myArrayOfHashes
      Myobject.transaction do # update the myObjects table
        Thread.current[:myArrayOfHashes].each do |h|
          Thread.current[:newMyObject] = Myobject.new
          # load up the new object with data
          Thread.current[:newMyObject].save
        end
      end
    end
  end
  @thr.join
end
This server receives and manages data for my Rails application, which is all running on Mac OS 10.6. The clients call the server every 15 minutes on the 15, and while I currently only have 16 or so clients, I'm wondering about the following:
If two clients call at close to the same time, will one client's connection attempt fail?
How can I figure out how many client connections my server can accommodate at the same time?
How can I monitor how much memory my server is using?
Also, is there an article you can point me toward that discusses the best way to implement this kind of server? I mean, can I have multiple instances of the server listening on the same port? Would that even help?
I am using Bluepill to monitor my server daemons.
1 and 2
The answer is no, two clients connecting close to each other will not make the connection fail (however, many clients connecting at the same time may; see below).
The reason is that the operating system has a default so-called listen queue built into all server sockets. So even if you are not calling accept fast enough in your program, the OS will still keep buffering incoming connections for you. It will buffer these connections for as long as the listen queue does not get filled.
Now what is the size of this queue?
In most cases the default size used is 5. The size is set when you call listen on the socket after creating it (see the man page for listen).
Ruby's TCPServer automatically calls listen for you, and if you look at the C source code it uses, you will find that it indeed sets the size to 5:
https://github.com/ruby/ruby/blob/trunk/ext/socket/ipsocket.c#L108
SOMAXCONN is defined as 5 here:
https://github.com/ruby/ruby/blob/trunk/ext/socket/mkconstants.rb#L693
Now what happens if you don't call accept fast enough and the queue gets filled?
The answer is found in the man page of listen:
The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow. If a connection request arrives when the queue is full, the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds.
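The same mechanism is easy to observe in a few lines of Python (a sketch; the principle is identical for Ruby's TCPServer):

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 61101))
server.listen(5)  # backlog of 5, as in the Ruby source linked above
# Even though accept() is never called here, the OS quietly queues up to
# roughly 5 pending connections; an extra simultaneous client may get
# ECONNREFUSED, or its SYN may be silently retried, per the man page above.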
In your code, however, there is one problem which can make the queue fill up if more than 5 clients try to connect at the same time: you're calling @thr.join at the end of the loop.
What effectively happens when you do this is that your server will not accept any new incoming connections until everything inside your accept thread has finished executing.
So if the database stuff and the other things you are doing inside the accept-thread takes a long time, the listening queue may fill up in the meantime. It depends on how long your processing takes, and how many clients could potentially be connecting at the exact same time.
3
You didn't say which platform you are running on, but on Linux/OS X the easiest way is to just run top in your console. For more advanced memory-monitoring options you might want to check these out:
ruby/ruby on rails memory leak detection
track application memory usage on heroku

Block TCP-send till ACK returned

I am programming a client application that sends TCP/IP packets to a server. Because of timeout issues I want to start a timer as soon as the ACK packet is returned (so there can be no timeout while the packet has not reached the server). I want to use the WinAPI.
Setting the socket to blocking mode doesn't help, because the send command returns as soon as the data is written into the buffer (if I am not mistaken). Is there a way to make send block until the ACK is returned, or is there any other way to do this without writing my own TCP implementation?
Regards
It sounds like you want to do the minimum implementation to achieve your goal. In this case you should set your socket to blocking, and following the send, which blocks until all data is sent, you call recv, which in turn will block until the ACK packet is received or the server end closes or aborts the connection.
If you wanted to go further with your implementation, you'd have to structure your client application in such a way that it supports asynchronous communication. There are a few techniques with varying degrees of complexity: polling using select() (simple), the event model using WSAEventSelect/WSAWaitForMultipleEvents (challenging), and the I/O completion port model, which is very complicated.
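In Python terms the minimal blocking flow looks like this; a sketch, with an assumed endpoint and request_bytes standing in for whatever you send. Note that the acknowledgment here is an application-level reply from the server, since TCP's own ACK segments are not visible to applications:

import socket

sock = socket.create_connection(("example.com", 5555))  # assumed endpoint
sock.sendall(request_bytes)  # blocks until all data is handed to the TCP stack
reply = sock.recv(4096)      # blocks until the server's application-level
                             # acknowledgment arrives, or returns b'' on close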
Pseudocode follows. It will wait until the ack is received, after which you can call whatever functionality you want; I chose a made-up function, send_data, which would then send information over the socket after receiving the ack.
import select

data = ''
while True:
    readable, writable, errors = select.select([sock], [], [])
    if sock in readable:
        data += sock.recv(4096)
        if is_ack(data):   # is_ack: the made-up check for the expected ack
            timer.start()  # not sure why you want this
            break
send_data(sock)            # made-up follow-up send, as described above
