ZeroMQ Clone pattern and late-joining client - zeromq

The ZeroMQ guide, in the Getting an Out-of-Band Snapshot section, says:
The client first subscribes to updates and then makes a state request. This guarantees that the state is going to be newer than the oldest update it has.
How does making the subscription first guarantee that the client will receive all updates newer than the snapshot state? For example:
1. Client subscribes to state updates
2. Client requests the state snapshot
3. Client receives the state snapshot
4. State changes happen at the server
5. Client's subscription to state changes is complete
So the client would miss the state changes happening in step 4. Is this scenario possible?

Allow me to describe the process a little more fully:
1. Server publishes updates as they occur.
2. New client subscribes to updates (client SUB to server PUB).
3. New client requests the current state from the server (client DEALER to server ROUTER). (IMPORTANT: it's assumed that this request takes longer to reach the server and start the snapshot build than the SUB socket takes to finish connecting and subscribing to updates - this is generally a reasonable assumption, but note it.)
4. Server builds a snapshot of the current state to respond to the request.
5. Server continues to publish updates as they occur.
6. New client queues all of the updates it is subscribed to but does not process them yet (ZMQ does this queuing "for free").
7. Server sends back the current state. (IMPORTANT: if the state request from the client occurred after the subscription completed, then one of two scenarios is true: either (A) there were no new updates after the new client joined, so the state is just the history from before the new client joined, or (B) there were new updates that are both in the state and queued in the client's SUB socket. (A) is trivially correct, so we'll focus on (B).)
8. New client processes the state - this brings it up to current.
9. New client begins to process the messages in the SUB socket. If there are any, we check them against the state we now have: if we already have this update (from the state), we discard it; if we don't, it's a new message and we deal with it.
10. New client continues to process messages as normal, all caught up to date and processing all new messages.
... even though in the example code the SUB socket doesn't start to recv() messages until after it receives the state, it is still getting them from the publisher and queuing them until it's ready to process them. So there is no scenario where an update is missed; instead, the opposite scenario, where messages are duplicated, is planned for and handled.
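To make this concrete, here is a minimal client-side sketch in Python (pyzmq), assuming the server binds a ROUTER socket for snapshot requests and a PUB socket for updates, and that every snapshot entry and update carries a monotonically increasing sequence number. The endpoints, the "ICANHAZ?"/"KTHXBAI" strings and the three-frame message layout are simplifications loosely borrowed from the Guide's clone examples, not the exact protocol:

    import zmq

    # Hypothetical endpoints; the real ones depend on how the server is set up.
    SNAPSHOT_ENDPOINT = "tcp://localhost:5556"   # server ROUTER (state requests)
    UPDATES_ENDPOINT = "tcp://localhost:5557"    # server PUB (updates)

    ctx = zmq.Context()

    # 1. Subscribe to updates FIRST. Anything published from now on is queued
    #    inside the SUB socket even though we don't recv() from it yet.
    updates = ctx.socket(zmq.SUB)
    updates.setsockopt_string(zmq.SUBSCRIBE, "")
    updates.connect(UPDATES_ENDPOINT)

    # 2. Only then ask for the snapshot (DEALER talking to the server's ROUTER).
    snapshot = ctx.socket(zmq.DEALER)
    snapshot.connect(SNAPSHOT_ENDPOINT)
    snapshot.send_string("ICANHAZ?")

    # 3. Receive the snapshot. Assume every entry (and every update) carries a
    #    monotonically increasing sequence number, and that the server ends the
    #    snapshot with a "KTHXBAI" frame holding the last sequence it included.
    state = {}
    last_seq = 0
    while True:
        key, seq, value = snapshot.recv_multipart()
        if key == b"KTHXBAI":
            last_seq = int(seq)
            break
        state[key] = value

    # 4. Now drain the updates that were queued in the SUB socket meanwhile.
    #    Anything with a sequence number <= last_seq is already in the snapshot.
    while True:
        key, seq, value = updates.recv_multipart()
        if int(seq) <= last_seq:
            continue            # duplicate of something already in the snapshot
        state[key] = value      # genuinely new update: apply it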

Related

How to grab the latest message sent from each connection

I have a ZMQ_PULL/ZMQ_PUSH socket connection.
I have multiple ZMQ_PUSH connections pushing to a single ZMQ_PULL connection.
ZMQ_PUSH connection 1----->
ZMQ_PUSH connection 2-----> ZMQ_PULL
ZMQ_PUSH connection N----->
I do not need every message, I just need the latest message that was sent. I am doing some inference on the back end and am streaming the results to the ZMQ_PULL socket.
I have set the ZMQ_PULL socket to Conflate=true
"If set, a socket shall keep only one message in its inbound/outbound queue, this message being the last message received/the last message to be sent. Ignores ZMQ_RCVHWM and ZMQ_SNDHWM options."
But after testing I realize I actually need the last message from each connection, not just the last message overall. So, with 3 connections, it should grab from each connection in round-robin fashion, so that I constantly have the latest from each connection.
Is there an option that is like Conflate, but instead of for all messages, it is for each connection?
Docs: http://api.zeromq.org/4-0:zmq-setsockopt
Is there an option that is like Conflate, but instead of for all messages, it is for each connection?
No.
The documentation you cite explains that 0MQ does not currently offer direct support for such a single-socket use case. You could certainly code it up and submit an upstream PR so that future revs of 0MQ offer such functionality.
Given that you'll need app-level support to make this work with 0MQ 4.3, the simplest approach would be to maintain N ZMQ_PULL sockets with ZMQ_CONFLATE set, as you're already aware.
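A minimal sketch of that N-socket approach in Python (pyzmq); the ports are hypothetical, and each ZMQ_PUSH producer would need to connect to its own endpoint:

    import zmq

    ctx = zmq.Context()
    poller = zmq.Poller()
    pulls = []

    # One PULL socket per producer, each with CONFLATE set, so each socket keeps
    # only the most recent message from "its" producer. Ports are hypothetical.
    for port in (5551, 5552, 5553):
        sock = ctx.socket(zmq.PULL)
        sock.setsockopt(zmq.CONFLATE, 1)    # must be set before bind/connect
        sock.bind(f"tcp://*:{port}")
        poller.register(sock, zmq.POLLIN)
        pulls.append(sock)

    while True:
        events = dict(poller.poll(timeout=1000))
        for sock in pulls:
            if sock in events:
                latest = sock.recv()        # newest message from that producer
                # ... run inference on `latest` ...

Note that ZMQ_CONFLATE only applies to single-part messages, so each producer would have to send its payload as one frame.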
An alternate approach would be to assign a dedicated thread or process to keep draining the existing muxed socket and update a shared-memory data structure that interested clients could consult. The idea is to burn a core on keeping the queue mostly empty, while doing no processing, just focusing on communications. Then other cores can examine the "most recent message" and each one embarks on some expensive processing, while another core continues to keep the queue drained. This is essentially offering the 0MQ service proposed above, but at a different place in the stack - up a level, within your application.
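A rough sketch of that drainer idea in Python, assuming each pusher prepends a producer id to its message (that framing is my assumption, not something ZeroMQ gives you) so the drainer can keep the newest payload per connection in a shared dict:

    import threading
    import zmq

    latest = {}                 # producer_id -> newest payload seen so far
    lock = threading.Lock()

    def drainer():
        """Dedicated thread that does nothing but empty the PULL queue."""
        pull = zmq.Context.instance().socket(zmq.PULL)
        pull.bind("tcp://*:5550")           # hypothetical endpoint
        while True:
            # Assumes each pusher sends [producer_id, payload] as a two-frame
            # message, so the drainer can tell the connections apart.
            producer_id, payload = pull.recv_multipart()
            with lock:
                latest[producer_id] = payload

    threading.Thread(target=drainer, daemon=True).start()

    def take_latest(producer_id):
        """Worker side: fetch the newest payload seen from one producer."""
        with lock:
            return latest.get(producer_id)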
To do this in a distributed way, the "queue draining service" would need to know about idle workers. That is, a worker could publish a brief "I just completed an expensive task" message, which would trigger the drainer to post a fresh work item, never using shared memory at all. This lets the drainer worry about eliding duplicate messages - ones that arrived while no one was available to immediately start work on them and that have since been superseded by a more recent message.

Spring State Machine Asynchronous Processes

I am trying to solve a problem where the Spring state machine has actions which invoke external long-running processes via RabbitMQ. Here are the steps:
A state machine event is issued
The associated Action sends a message to an external microservice via RabbitMQ
The external microservice takes 1 hour to process the request and sends the response back to the state machine
The state machine picks up the message and updates the state.
The issue I am having is how to block the state machine and wait for the response from that remote service before updating the state. I would greatly appreciate any help regarding this.
You do not need to explicitly block the state machine.
For that instance of the machine, it will remain in the same state it was in before step 1 until the next event - in your case, the response from the long-running process - is received.
If you want to track this, you can add an intermediate state, "WAITINGFORMESSAGE", and transition from that state to the next one when the message is received.
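As a framework-agnostic illustration (plain Python, not Spring State Machine code), the point is simply that the machine is event-driven and sits in the intermediate state until the message listener fires the completion event; all names here are made up:

    # Transition table: (current state, event) -> next state. Nothing blocks;
    # the machine simply sits in WAITINGFORMESSAGE until the message listener
    # fires PROCESS_COMPLETED when the external service finally responds.
    TRANSITIONS = {
        ("READY", "START_PROCESS"): "WAITINGFORMESSAGE",
        ("WAITINGFORMESSAGE", "PROCESS_COMPLETED"): "DONE",
    }

    def publish_to_rabbitmq():
        pass    # placeholder for the Action that sends the request to the service

    class Machine:
        def __init__(self):
            self.state = "READY"

        def send_event(self, event):
            next_state = TRANSITIONS.get((self.state, event))
            if next_state is None:
                return                  # event not accepted in the current state
            if event == "START_PROCESS":
                publish_to_rabbitmq()   # the Action's side effect
            self.state = next_state

    machine = Machine()
    machine.send_event("START_PROCESS")      # -> WAITINGFORMESSAGE, request sent
    # ... an hour later, the RabbitMQ listener calls:
    machine.send_event("PROCESS_COMPLETED")  # -> DONE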

Front-facing REST API with an internal message queue?

I have created a REST API - in a few words, my client hits a particular URL and she gets back a JSON response.
Internally, quite a complicated process starts when the URL is hit, and there are various services involved as a microservice architecture is being used.
I was observing some performance bottlenecks and decided to switch to a message queue system. The idea is that now, once the user hits the URL, a request is published on an internal message queue, waiting to be consumed. The consumer processes it and publishes back on a queue, and this happens quite a few times until, finally, the same node servicing the user receives the processed response to be delivered to the user.
An asynchronous "fire-and-forget" pattern is now being used. But my question is: how can the node servicing a particular client remember who it was servicing once the processed result arrives back, without blocking (i.e. while still handling several other requests until that response is received)? If it makes any difference, my stack looks a little like this: Tomcat, Spring, Kubernetes and RabbitMQ.
In summary, how can the request node (whose job is to push items onto the queue) maintain an open connection with the client who requested a JSON response (i.e. the client is waiting for the JSON response) and receive back the data for the correct client?
You have a few different scenarios depending on how much control you have over the client.
If the client behaviour cannot be changed, you will have to keep the session open until the request has been fully processed. This can be achieved by employing a pool of workers (futures/coroutines, threads or processes) where each worker keeps the session open for a given request.
This method has a few drawbacks and I would keep it as a last resort. Firstly, you will only be able to serve a limited number of concurrent requests, proportional to your pool size. Secondly, as your processing is behind a queue, your front end won't be able to estimate how long it will take for a task to complete. This means you will have to deal with long-lasting sessions, which are prone to fail (what if the user gives up?).
If the client behaviour can be changed, the most common approach is to use a fully asynchronous flow. When the client initiates a request, it is placed in the queue and a Task Identifier is returned. The client can use the given TaskId to poll for status updates. Each time the client requests updates about a task, you simply check whether it has completed and respond accordingly. A common pattern when a task is still in progress is to have the front end return to the client the estimated amount of time before trying again. This allows your server to control how frequently clients poll. If your architecture supports it, you can go the extra mile and provide information about the progress as well.
Example response when task is in progress:
{"status": "in_progress",
"retry_after_seconds": 30,
"progress": "30%"}
A more complex yet elegant solution would consist of using HTTP callbacks. In short, when the client makes a request for a new task, it provides a tuple (URL, Method) the server can use to signal that processing is done. It then waits for the server to send the signal to the given URL. You can see a better explanation here. In most cases this solution is overkill, yet I think it's worth mentioning.
One option would be to use DeferredResult, provided by Spring, but that means you need to maintain a pool of threads in the request-serving node, and the maximum number of active threads will decide the throughput of your system. For more details on how to implement DeferredResult, refer to this link: https://www.baeldung.com/spring-deferred-result

How to find out all the subscribed-to filters in a PUB server?

I have a PUB server. How can it tell which filters are subscribed to, so the server knows what data it has to create? The server shouldn't need to create data that no SUB clients are interested in.
Say the set of possible filters is huge (or infinite), but at any given time subscribers are only subscribed to a few of them.
Example: say SUB clients are only subscribed to weather feed data for a few area codes in New York and Paris. Then the PUB server shouldn't have to create weather data for every other area code in every other city in the world, just to throw it all away again.
How do you find out all the subscribed-to filters in a PUB server?
If there is no easy way, how do I solve this in another way?
I'll answer my own question here in case it's of use to anyone else.
The requirements were:
The client should be able to ask the server what ids (topics) are available for subscription.
The client should choose the ids it is interested in and tell the server about them.
The server should create data for all subscribed-to ids and send that data to the clients.
The client and server should not block/hang if either one goes away.
Implementation:
Step 1. Is two-way traffic, and is done with REQ/REP sockets.
Step 2. Is one-way traffic from one client to one server, and is done with PUSH/PULL sockets.
Step 3. Is one-way traffic from one server to many clients, and is done with PUB/SUB sockets.
Step 4. The receives can block either the server or the client if the other one is not there. Therefore I followed the "lazy pirate pattern" of checking whether there is anything to receive in the queue before I try to receive. (If there is nothing in the queue, I'll check again on the next loop of the program, etc.)
Step 4+. Clients can die without unsubscribing, and the server won't know about it; it will continue to publish data for those ids. A solution is for the client to resend the subscription information (with a timestamp) every so often. This works as a heartbeat for the ids the client has subscribed to. If the client dies without unsubscribing, the server notices that some subscription ids have not been refreshed in a while (via the timestamp) and removes those ids.
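A rough sketch of that heartbeat/expiry bookkeeping on the server side in Python (pyzmq); the endpoints, the expiry timeout, the message format and create_data are my own placeholders, not part of the original design:

    import time
    import zmq

    SUBSCRIPTION_TTL = 10.0      # hypothetical: ids not refreshed in 10s expire

    def create_data(sub_id):
        return b"..."            # placeholder: build the data for this id

    ctx = zmq.Context()
    sub_updates = ctx.socket(zmq.PULL)    # clients PUSH their subscribed ids here
    sub_updates.bind("tcp://*:5560")      # hypothetical endpoints
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5561")

    active = {}                  # id -> last time any client refreshed it

    while True:
        # Non-blocking receive: only recv if something is actually queued.
        while sub_updates.poll(0):
            sub_id = sub_updates.recv_string()
            active[sub_id] = time.time()          # heartbeat refreshes the id

        # Expire ids that no client has refreshed recently (dead clients).
        now = time.time()
        for sub_id in [i for i, t in active.items() if now - t > SUBSCRIPTION_TTL]:
            del active[sub_id]

        # Only create and publish data for ids someone still wants.
        for sub_id in active:
            pub.send_multipart([sub_id.encode(), create_data(sub_id)])
        time.sleep(1.0)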
This solution seems to work fine. It was a lot of low-level work though. It would be nice if ZeroMQ were a bit higher-level and had some common, reliable architectures/frameworks ready to use out of the box.

How to push all data to late subscribers?

I would like to know if zmq already solves the following problem, or if the application sitting on top of zmq needs to take care of it.
1) A central publisher which publishes data to all subscribers. This data is static in nature, something like configuration. The data can be modified at any point in time.
2) Multiple subscribers subscribe to messages from this publisher. A subscriber can join at any point in time.
3) If the data changes, the publisher should publish just the diff to the existing subscribers.
4) If a subscriber joins later, the publisher should publish all the data (the current configuration) to the new subscriber.
The ZeroMQ guide suggests the following for the slow-joiner syndrome, but this does not solve the above problem:
http://zguide.zeromq.org/page:all#Slow-Subscriber-Detection-Suicidal-Snail-Pattern
The Clone pattern from the Guide does precisely what you want.
The problem I'm seeing with your setup is that it requires all the subscribers to have the same state. If all subscribers are at version 7 and you publish the 7-to-8 diff, then they all update to version 8. But this requires a tightly-coupled state synchronization between nodes. How would you handle the case when subscribers get out of sync?
Consider this alternative setup:
the "publisher" has a single ROUTER socket that it binds
each "subscriber" has a single DEALER socket that connects to the ROUTER
can't use a REQ socket because that would prohibit the sending of "update-hints" (details to follow)
when a subscriber i joins the network, it sends an "update" request to the publisher, so that the publisher is aware of the subscriber's identity and its current version, version[i]
the publisher responds with the diffs necessary to bring subscriber i up to date
if data changes on the publisher (i.e., a new version) it sends an "update-hint" to all of the known subscribers
when a subscriber receives an "update-hint," it performs an "update" request
(optional) subscribers periodically send an "update" request (infrequent polling)
This approach has the following benefits:
the publisher is the server; the subscribers are clients
the publisher never initiates the sending of any actual data - it only responds to requests from clients (that is, the "update-hints" don't count as sending actual data)
the subscribers are all independently keeping themselves up to date (eventual consistency) even though they may be out of sync intermittently
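A minimal subscriber-side sketch of this setup in Python (pyzmq); the JSON message shapes ("update", "update-hint", "diffs") and the endpoint are hypothetical, just to show the flow:

    import zmq

    ctx = zmq.Context()
    dealer = ctx.socket(zmq.DEALER)
    dealer.connect("tcp://localhost:5570")   # hypothetical publisher ROUTER endpoint

    version = 0
    state = {}

    def request_update():
        # The ROUTER learns our identity from the envelope; we only need to say
        # which version we currently hold.
        dealer.send_json({"type": "update", "version": version})

    request_update()                         # initial sync when we join

    while True:
        if dealer.poll(5000):                # 5s timeout doubles as the poll timer
            msg = dealer.recv_json()
            if msg.get("type") == "update-hint":
                request_update()             # a newer version exists: ask for diffs
            elif msg.get("type") == "diffs":
                for key, value in msg["diffs"].items():
                    state[key] = value       # apply the diffs we were missing
                version = msg["version"]
        else:
            request_update()                 # infrequent polling as a fallback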
