0MQ: Can you drop a message after a timeout in a REQ/REP pattern? - zeromq

In my 0MQ applications I usually do this to deal with a timeout:
import zmq
ctx = zmq.Context()
s = ctx.socket(zmq.DEALER)
s.connect("tcp://localhost:5555")
# send PING request
v = <some unique value>
s.send_multipart([b"PING", v])  # frames must be bytes on Python 3
if s.poll(timeout * 1000) & zmq.POLLIN:
    msg = s.recv_multipart()
    ...
If the server is not running and comes online minutes later, 0MQ automatically reconnects and sends the message.
But if I put the PING send in a loop (once a second) while the server is down,
then once the server comes back online, the next recv_multipart() call returns the old messages
that remained in the internal 0MQ queue while the server was offline. Because I don't care about old
messages, I thought I could do this:
s = ctx.socket(zmq.DEALER)
s.connect("tcp://localhost:5555")
while True:
    # send PING request
    v = <some unique value>
    s.send_multipart([b"PING", v])  # frames must be bytes on Python 3
    if s.poll(timeout * 1000) & zmq.POLLIN:
        msg = s.recv_multipart()
        ...
    else:
        # timeout: throw the socket away and start over with a fresh one
        s.close()
        s = ctx.socket(zmq.DEALER)
        s.connect("tcp://localhost:5555")
    time.sleep(1)
But this is a bad idea: after a while ctx.socket raises ZMQError: Too many open files.
Setting the ZMQ_LINGER socket option to 0 seems to help
here, but I don't like this strategy; it seems wrong to me.
So ideally I'd like to drop the previously sent message if a read timeout
happens. Is this a) possible and b) a good idea at all? I'm not sure that this
would be correct though, it may be that 0MQ is able to physically send the
message but the server crashes before it can send back anything, so dropping
would be impossible because there wouldn't be anything to drop, would it?
So my question is: what should I do in this situation? Am I possibly looking at
this problem from the wrong angle?

Q : "Can you drop a message after a timeout in a REQ/REP pattern?"
No.
Q : Is this a) possible and b) a good idea at all?
a) Yes.
b) Yes.
Q : what should I do in this situation?
Best avoid REQ/REP and its certainty of falling into a mutual deadlock ( you just never know when it happens; many posts here give the details ), and set up the connection layer to provide self-defensive means for delivering only the last message, and only over a healthy connection:
...
s = ctx.socket( zmq.DEALER )
s.setsockopt( zmq.LINGER, 0 ) # ALWAYS, no excuse, you never know the peers' versions
s.setsockopt( zmq.IMMEDIATE, 1 ) # prevent sending over not-yet-completed connections
s.setsockopt( zmq.CONFLATE, 1 ) # Does not support multi-part, so .pack() payload
s.connect( "tcp://localhost:5555" )
...
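A complementary approach, not part of the answer above: since every PING already carries a unique value, the client can simply throw away any reply whose value does not match the request it is currently waiting on. The sketch below assumes the server echoes that value back as the second frame of its reply (for example `[b"PONG", v]`), which the question does not state. The helper name `wait_for_reply` is made up for illustration, and `sock` can be any object with pyzmq's `Socket.poll` and `Socket.recv_multipart` methods. Because it relies on multipart replies, it would not be combined with `zmq.CONFLATE`.

```python
import time


def wait_for_reply(sock, token, timeout_s):
    """Return the reply frames that echo `token`, or None on timeout.

    Replies that carry a different token are treated as stale answers to
    earlier PINGs and are silently dropped.
    Assumption: the server echoes the token as the second frame.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining_ms = int((deadline - time.monotonic()) * 1000)
        # poll() returns 0 when nothing arrived within the timeout
        if remaining_ms <= 0 or not sock.poll(remaining_ms):
            return None
        frames = sock.recv_multipart()
        if len(frames) >= 2 and frames[1] == token:
            return frames
        # otherwise: stale reply, drop it and keep waiting
```

With a real DEALER socket the loop body from the question would become `s.send_multipart([b"PING", v])` followed by `msg = wait_for_reply(s, v, timeout)`, so the socket never has to be closed and reopened.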

Related

Set timeout ZMQ - with a condition

I am working on a client-server app - with several clients.
A process creates jobs and initiates them. The server waits on a ZMQ socket for the answers from these jobs.
The problem:
Currently I am working with 6 jobs, and I want to receive an answer from at least 4. After 4 answers have been received, I want to wait 2 more seconds and, if no more results arrive, process the results I got and then return to listening on the socket.
My thoughts:
I have seen several ways (Poll, ZMQ_CONNECT_TIMEOUT etc.), but I couldn't figure out a way to use them in my case.
I thought of starting another process in the server once 4 jobs were done, which goes to sleep for 2 seconds and then sends a SIGSTP, but I'm afraid it will affect the server's return to listening on the socket.
Any suggestions?
Well, no one answered, but I found a solution that works for me.
Posting it because it might help others.
What I did is as simple as it gets:
while active_clients > 0:
    # check if we already have MIN_REQ sites
    if requirement:
        try:
            time.sleep(10)  # give another chance to the last client
            message = socket.recv_pyobj(flags=zmq.NOBLOCK)  # peek to see if received a msg
            results.append(message)
            break
        except zmq.Again as e:
            break
    else:
        # Wait for next response from client
        message = socket.recv_pyobj()
        results.append(message)
        active_clients -= 1
        # Send reply back to client
        socket.send_pyobj(b"ok")
Notice the if/else pattern.
If the requirement (which can be any requirement you'd like) is fulfilled, just do a non-blocking receive on the socket.
You could also remove the break statement and go around the loop one more time.
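The same idea can be written with a poll timeout instead of a fixed sleep, so the server wakes up as soon as a late answer arrives. This is only a sketch under assumptions: the function name `collect_results` and its parameters are hypothetical, the socket is taken to be a REP-style socket that must reply after every receive (as in the loop above), and the grace period restarts after each late answer. `sock` can be any object with pyzmq's `poll`, `recv_pyobj` and `send_pyobj` methods.

```python
def collect_results(sock, total, minimum, grace_s=2.0):
    """Collect up to `total` answers.

    Blocks until `minimum` answers have arrived; after that, waits at most
    `grace_s` seconds for each further answer and stops at the first timeout.
    """
    results = []
    while len(results) < total:
        if len(results) < minimum:
            timeout_ms = None  # block until the next answer arrives
        else:
            timeout_ms = int(grace_s * 1000)  # bounded wait for stragglers
        # poll() returns 0 if nothing arrived within the timeout
        if not sock.poll(timeout_ms):
            break
        results.append(sock.recv_pyobj())
        sock.send_pyobj(b"ok")  # reply to the client, as in the loop above
    return results
```

For the case in the question this would be called as `collect_results(socket, total=6, minimum=4, grace_s=2.0)`, after which the server can process the results and go back to listening.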

How to prevent buffering/latency with PUB/SUB?

I'm sending video as a sequence of images (equals zmq messages) but sometimes, perhaps when the network is slow, they are received at a slower rate than they're sent and a growing latency appears, seemingly up to about a minute of video or 100s of images or megabytes of data. It usually clears itself eventually with the subscriber receiving messages at a faster rate than the publisher sends.
Instead, I want it to discard missed messages the same way it's supposed to if the subscriber is too slow receiving them. I hoped zmq.CONFLATE=1 would do this but it doesn't. How then? I suspect they're being buffered at the publisher, which is not supposed to have any zmq buffer, or in the network stack somehow.
Simplified server code
import io
import zmq
from picamera import PiCamera

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:12345")
camera = PiCamera()
stream = io.BytesIO()
for _ in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
    stream.truncate()
    stream.seek(0)
    socket.send(stream.read())  # one JPEG frame per zmq message
    stream.seek(0)
Simplified client code
# Initialization
self.context = zmq.Context()
self.video_socket = self.context.socket(zmq.SUB)
self.video_socket.setsockopt(zmq.CONFLATE, 1)
self.video_socket.setsockopt(zmq.SUBSCRIBE, b"")
self.video_socket.connect("tcp://" + ip_address + ":12345")
def get_image(self):
    # Receive the latest image
    poll_result = self.video_socket.poll(timeout=0)
    if poll_result == zmq.POLLIN:
        return self.video_socket.recv()
    else:
        return None
The publisher is on a Raspberry Pi and the subscriber is on Windows.
I am not sure which version of Python zmq you are using, but based on the underlying C++ libzmq you need to:
Set the ZMQ_SNDHWM socket option on the server socket.
Set the ZMQ_RCVHWM socket option on the client socket.
These options limit the number of messages queued per completed connection in the case of pub/sub. If the queue grows larger than the HWM (high water mark), further messages are discarded.
Also turn off conflate, as it makes the socket ignore these options.
Also set zmq.CONFLATE=1 on the server to keep only the latest message in the send queue.
Before binding the server socket
socket.setsockopt(zmq.CONFLATE, 1)
For some reason I mistakenly thought the PUB socket didn't have a send queue but it does.
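On the subscriber side, a complementary trick (not taken from the answers above) is to drain everything that has already arrived and keep only the newest frame. The sketch below assumes `sock` is any object exposing pyzmq's `Socket.poll` and `Socket.recv`; the helper name `latest_message` is made up for illustration. It only discards frames that have already reached the subscriber's queue, so it does not replace limiting the publisher's send queue.

```python
def latest_message(sock):
    """Return the most recent queued message, or None if nothing is waiting.

    Every older message that is already queued is received and discarded.
    """
    latest = None
    # poll(0) returns immediately: non-zero while a message is ready
    while sock.poll(0):
        latest = sock.recv()
    return latest
```

In the client above, `get_image` could then return `latest_message(self.video_socket)` instead of receiving a single frame.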

How to send byte message with ZeroMQ PUB / SUB setting?

So I'm new to ZeroMQ and I am trying to send a byte message with ZeroMQ, using a PUB / SUB setting.
Choice of programming language is not important for this question since I am using ZeroMQ for communication between multiple languages.
Here is my server code in python:
import zmq
import time
port = "5556"
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:%s" % port)
while True:
    socket.send(b'\x84\xa5Title\xa2hi\xa1y\xcb\x00\x00\x00\x00\x00\x00\x00\x00\xa1x\xcb#\x1c\x00\x00\x00\x00\x00\x00\xa4Data\x08')
    time.sleep(1)
and here is my client code in python:
import zmq
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
total_value = 0
for update_nbr in range(5):
    string = socket.recv()
    print(string)
My client simply blocks at string = socket.recv().
I have done some study, so apparently, if I were to send a string using a PUB / SUB setting, I need to set some "topic filter" in order to make it work. But I am not sure how to do that if I were to send a byte message.
ZeroMQ defines protocols that guarantee cross-platform compatibility of both the behaviours and the message content.
The root cause: to start receiving messages, one must change the initial "topic-filter" state of the SUB socket ( which initially is "receive nothing", i.e. zero subscriptions ).
ZeroMQ is a lovely set of tools, created around smart principles.
One of these says: do nothing on the SUB side until .setsockopt( zmq.SUBSCRIBE, ... ) explicitly says what to subscribe to, and only then start checking the incoming messages ( older zmq fans remember the initial design, where the PUB side always distributed all messages down the road to each connected SUB "radio-broadcast receiver", and on each message received the SUB side performed the "topic-filtering" on its own. Newer versions of zmq reverse the architecture and perform PUB-side filtering ).
Anyway, the initial state of the "topic-filter" makes sense. Who knows a priori what ought to be received? Nobody. So receive nothing.
Given you need or wish to start the job, an easy move is to subscribe to anything, which lets any message get through.
Yes, that simple: .setsockopt( zmq.SUBSCRIBE, b"" ) ( a bytes value, as Python 3 requires )
If one needs some key-based processing and the messages are of a reasonable size ( no giga-BLOBs ), one may simply prefix some key ( or a byte-field if more hacky ) in front of the message-string ( or a payload byte-field ).
Sure, one may save some fraction of the transport-layer overhead in case the zmq filtering is performed on the PUB side ( not valid for the older API versions ); otherwise it is typically no big deal to subscribe to receive "anything" and check the messages for some pre-assembled context key ( a prefix substring, a byte-field etc. ) before the rest of the message payload is processed.
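To make the filtering rule concrete: a subscription is a byte prefix, and a message is delivered if it begins with any of the socket's subscribed prefixes. The following is a plain-Python model of that rule, not ZeroMQ code; the function name `sub_accepts` is made up for illustration.

```python
def sub_accepts(subscriptions, message):
    """Model of the SUB-socket topic filter.

    `subscriptions` is the list of byte prefixes set via zmq.SUBSCRIBE;
    a message passes if it starts with at least one of them.
    """
    return any(message.startswith(prefix) for prefix in subscriptions)


payload = b"\x84\xa5Title\xa2hi"

print(sub_accepts([], payload))                 # no subscription at all
print(sub_accepts([b""], payload))              # empty prefix matches everything
print(sub_accepts([b"temp"], b"temp 21.5"))     # key prefixed to the payload
print(sub_accepts([b"temp"], b"humidity 40"))   # different key
```

The first call models the client in the question, which never subscribes and therefore receives nothing; the second models subscribing with `b""`, which is what lets arbitrary binary payloads through.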
The Best Next Step:
If your code strives to go into production, not to remain just an academic example, there will be much more work to do to provide survivability measures for hostile real-world production environments.
An absolutely great perspective for doing this and a good read for realistic designs with ZeroMQ is Pieter HINTJENS' book "Code Connected, Vol.1" ( may check my posts on ZeroMQ to find the book's direct pdf-link ).
Plus another good read comes from Martin SUSTRIK, the co-father of ZeroMQ, on low-level truths about the ZeroMQ implementation details & scalability.

ZeroMQ: How to initialize a SUB and PUSH socket in same code? i.e. black box pattern but not using different machines

I have this code
context = zmq.Context()
app_worker = context.socket(zmq.PUSH)
app_worker.bind("tcp://127.0.0.1:9005")
app_sub = context.socket(zmq.SUB)
app_sub.connect("tcp://127.0.0.1:9004")
app_sub.setsockopt(zmq.SUBSCRIBE,'sometopic')
while True:
    msg = app_sub.recv()
    msg_data = msg.split(' ',1)
    app_worker.send_json(msg_data[1])
    print msg_data[1]
But when I run this, it is unable to receive any message from the publisher. When I comment out these lines
app_worker = context.socket(zmq.PUSH)
app_worker.bind("tcp://127.0.0.1:9005")
it suddenly works. The ZeroMQ guide, chapter 5 (black box pattern), states that this is possible. If so, what am I doing wrong here?
You didn't supply enough information to answer this question with 100% assurance.
But based on what you did post, the most obvious problem is that port 9005 was already bound by someone else.
It's very likely your app_worker.send_json(msg_data[1]) is blocking (the entire thread) if there are no downstream nodes to PULL the messages.
Set the send_json to non-blocking mode and check the error/exception returned:
app_worker.send_json(msg_data[1], zmq.NOBLOCK)
The reason it "works" when you comment out the bind is that the send is just failing and not blocking.

Publisher finishes before subscriber and messages are lost - why?

Fairly new to zeromq and trying to get a basic pub/sub to work. When I run the following (sub starting before pub), the publisher finishes but the subscriber hangs without having received all the messages. Why?
I think the socket is being closed but the messages have been sent? Is there a way of ensuring all messages are received?
Publisher:
import zmq
import random
import time
import tnetstring
context=zmq.Context()
socket=context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
y=0
for x in xrange(5000):
    st = random.randrange(1,10)
    data = []
    data.append(random.randrange(1,100000))
    data.append(int(time.time()))
    data.append(random.uniform(1.0,10.0))
    s = tnetstring.dumps(data)
    print 'Sending ...%d %s' % (st,s)
    socket.send("%d %s" % (st,s))
    print "Messages sent: %d" % x
    y+=1
print '*** SERVER FINISHED. # MESSAGES SENT = ' + str(y)
Subscriber:
import sys
import zmq
import tnetstring
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
filter = "" # get all messages
socket.setsockopt(zmq.SUBSCRIBE, filter)
x=0
while True:
    topic,data = socket.recv().split()
    print "Topic: %s, Data = %s. Total # Messages = %d" % (topic,data,x)
    x+=1
In ZeroMQ, clients and servers always try to reconnect; they won't go down if the other side disconnects (because in many cases you'd want them to resume talking if the other side comes up again). So in your test code, the client will just wait until the server starts sending messages again, unless you stop recv()ing messages at some point.
In your specific instance, you may want to investigate using the socket.close() and context.term(). It will block until all the messages have been sent. You also have the problem of a slow joiner. You can add a sleep after the bind, but before you start publishing. This works in a test case, but you will want to really understand what is the solution vs a band-aid.
You need to think of the PUB/SUB pattern like a radio. The sender and receiver are both asynchronous. The Publisher will continue to send even if no one is listening. The subscriber will only receive data if it is listening. If the network goes down in the middle, the data will be lost.
You need to understand this in order to design your messages. For example, if you design your messages to be "idempotent", it doesn't matter if you lose data. An example of this would be a status type message. It doesn't matter if you have any of the previous statuses. The latest one is correct and message loss doesn't matter. The benefit of this approach is that you end up with a more robust and performant system. The downside is that you can't always design your messages this way.
Your example includes a type of message that requires no loss. Another type of message would be transactional. For example, if you just sent the deltas of what changed in your system, you would not be able to lose the messages. Database replication is often managed this way which is why db replication is often so fragile. To try to provide guarantees, you need to do a couple things. One thing is to add a persistent cache. Each message sent needs to be logged in the persistent cache. Each message needs to be assigned a unique id (preferably a sequence) so that the clients can determine if they are missing a message. A second socket (ROUTER/REQ) needs to be added for the client to request the missing messages individually. Alternatively, you could just use the secondary socket to request resending over the PUB/SUB. The clients would then all receive the messages again (which works for the multicast version). The clients would ignore the messages they had already seen. NOTE: this follows the MAJORDOMO pattern found in the ZeroMQ guide.
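The client-side half of that scheme, spotting which sequence numbers went missing, can be sketched in a few lines. This is an illustration under assumptions: the class name `SequenceTracker` is hypothetical, sequence numbers are taken to be consecutive integers assigned by the publisher, and messages recovered through the secondary socket are expected to be handled by the caller rather than fed back through the tracker.

```python
class SequenceTracker:
    """Track the highest sequence number seen on the PUB/SUB stream."""

    def __init__(self):
        self.last_seen = None

    def observe(self, seq):
        """Record `seq` and return the sequence numbers skipped before it.

        Duplicates and replays (seq not greater than the last one seen)
        return an empty list and leave the tracker unchanged.
        """
        if self.last_seen is None:
            self.last_seen = seq
            return []
        if seq <= self.last_seen:
            return []
        missing = list(range(self.last_seen + 1, seq))
        self.last_seen = seq
        return missing


tracker = SequenceTracker()
for seq in (1, 2, 5, 4, 6):
    gap = tracker.observe(seq)
    if gap:
        # here the client would ask for these ids over the secondary socket
        print("missing:", gap)
```

Running the loop reports a single gap, for ids 3 and 4, when message 5 arrives; the late copy of 4 is ignored because it is not newer than the last one seen.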
An alternative approach is to create your own broker using the ROUTER/DEALER sockets. When the ROUTER socket sees a DEALER connect, it stores its ID. When the ROUTER needs to send data, it iterates over all client IDs and publishes the message to each. Each message should contain a sequence number so that the client knows which missing messages to request. NOTE: this is a sort of reimplementation of Kafka from LinkedIn.
