I have been reading up on ZeroMQ design patterns, but I haven't been able to find one that fits my needs.
1. Box A sends info (JSON) to Boxes B and C; B and C get different info from each other
2. Boxes B and C do some work based on the info received from Box A
3. After finishing the work, Boxes B and C send the results back to Box A
The forwarder device (http://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/devices/forwarder.html) can achieve steps 1 and 2 but not step 3, correct?
Are there any patterns I can use to achieve this?
Is it the simple request/reply pattern?
If so, is there a centralized request/reply pattern, so that Box A doesn't pick Boxes B and C itself, but instead sends info to something central that knows to forward it to Boxes B and C and to send the results back to Box A?
This looks like a pretty basic Load Balancing pattern which is in the guide. A is the controller and will be a ROUTER, while the workers, B and C, are DEALERS. The messaging is simple enough; the dealers send an initial message to the controller to say "I'm ready". The controller then hands out work to the ready workers.
This topology is the opposite of Jason's answer. Which you choose just depends on how you're wanting to extend your application. When the controller hands out work, it really ought to go to a worker that is ready to handle it. With the Load Balancing pattern that is guaranteed.
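If it helps, here's a minimal pyzmq sketch of that flow. Both workers live in one process over inproc purely for brevity, and the endpoint name, the READY convention, and the job payloads are illustrative assumptions, not anything mandated by ZeroMQ:

    import threading
    import zmq

    ctx = zmq.Context()

    def worker(name: str) -> None:
        # Box B / Box C: a DEALER that announces readiness, then loops on jobs.
        sock = ctx.socket(zmq.DEALER)
        sock.connect("inproc://broker")
        sock.send_multipart([b"", b"READY"])              # "I'm ready"
        while True:
            _, job = sock.recv_multipart()                # [empty, work]
            sock.send_multipart([b"", name.encode() + b" did " + job])

    # Box A: a ROUTER that only sends work to an identity it has just
    # heard from, i.e. a worker that is demonstrably free right now.
    controller = ctx.socket(zmq.ROUTER)
    controller.bind("inproc://broker")

    for name in ("B", "C"):
        threading.Thread(target=worker, args=(name,), daemon=True).start()

    jobs = [b'{"task": %d}' % i for i in range(4)]
    pending = len(jobs)
    while pending:
        ident, _, payload = controller.recv_multipart()   # READY or a result
        if payload != b"READY":
            print("A got:", payload.decode())
            pending -= 1
        if jobs:
            controller.send_multipart([ident, b"", jobs.pop(0)])

The point is that A never pushes work blindly; it only replies to an identity it has just heard from, which is exactly the guarantee mentioned above.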
This is a pretty basic DEALER/ROUTER pattern.
DEALER sockets are round-robin, which means the socket will send one request to box B, then the next to box C, then the next to box B, and so on. If you want to hold work back until a worker has finished, you just have to track the current count of available workers.
On box B and box C, use a ROUTER socket (or a REP socket if your use case is simple enough, but that'll limit your options). Receive the work, work on it, send it back, wait for more work.
There are many examples like this in the guide, which I recommend you read.
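As a concrete (if toy) illustration of the simpler REP variant mentioned above: one process, inproc transport, made-up payloads. The empty frame the DEALER sends mimics the envelope a REQ socket would add, which REP expects:

    import threading
    import zmq

    ctx = zmq.Context()

    def box(sock: zmq.Socket, label: bytes) -> None:
        # Box B / Box C: receive the work, do it, send it back, wait for more.
        while True:
            job = sock.recv()
            sock.send(label + b" handled " + job)

    # Bind a REP socket per box before Box A connects (inproc needs this order).
    for endpoint, label in (("inproc://boxB", b"B"), ("inproc://boxC", b"C")):
        sock = ctx.socket(zmq.REP)
        sock.bind(endpoint)
        threading.Thread(target=box, args=(sock, label), daemon=True).start()

    # Box A: one DEALER connected to both boxes; sends alternate between them.
    a = ctx.socket(zmq.DEALER)
    a.connect("inproc://boxB")
    a.connect("inproc://boxC")

    for i in range(4):
        a.send_multipart([b"", b"job-%d" % i])  # empty frame mimics a REQ envelope
    for _ in range(4):
        _, reply = a.recv_multipart()           # [empty, result]
        print(reply.decode())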
Related
I have multiple servers; at any point, one and only one will be the leader, which can respond to a request; all others just drop the request. The issue is that the client does not know which server is the leader.
I have tried using a PUB socket on the client for the parallel request out; however, I can't work out the right semantics for the response, in terms of how to get the server to respond to that specific client.
A hacky solution which I have tried is to have a sub socket on the client to pub sockets on all the servers, with the leader responding by publishing a message with a filter such that it only goes to the client.
However, I am unable to receive any responses this way: the server believes that it sent the message, and the client believes it subscribed to "", but it then doesn't receive anything...
So I am wondering whether there is a more proper way of doing this. I have thought that a DEALER/ROUTER setup that sends to a specific client might work; however, I am unsure how to do that.
Essentially I am trying to do a standard REQ/REP, but sending the request in parallel to all the nodes rather than round-robin.
UPDATE: By sending the routing ID of the DEALER in the PUB request, making the remote call idempotent (just returning pre-computed results on repeated attempts), and then sending the result back via a ROUTER, with message filtering on the receiving side, it now works.
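For anyone landing here later, a compressed sketch of that working setup: one process, inproc transport, and crude sleeps instead of real synchronisation; the endpoint and identity names are invented for illustration:

    import threading
    import time
    import zmq

    ctx = zmq.Context()

    def server(name: str, is_leader: bool) -> None:
        sub = ctx.socket(zmq.SUB)                  # receives the broadcast request
        sub.connect("inproc://requests")
        sub.setsockopt(zmq.SUBSCRIBE, b"request")
        router = ctx.socket(zmq.ROUTER)            # used only by the leader to reply
        router.bind("inproc://reply-" + name)
        _, routing_id, payload = sub.recv_multipart()
        if is_leader:                              # leader election itself not shown
            router.send_multipart([routing_id, b"result for " + payload])
        # followers simply drop the request

    pub = ctx.socket(zmq.PUB)                      # client: the parallel request out
    pub.bind("inproc://requests")

    threading.Thread(target=server, args=("s1", True), daemon=True).start()
    threading.Thread(target=server, args=("s2", False), daemon=True).start()
    time.sleep(0.2)                                # let the servers bind first

    dealer = ctx.socket(zmq.DEALER)                # client: the reply channel
    dealer.setsockopt(zmq.IDENTITY, b"client-42")  # the routing id we advertise
    dealer.connect("inproc://reply-s1")
    dealer.connect("inproc://reply-s2")
    time.sleep(0.2)                                # crude slow-joiner guard

    pub.send_multipart([b"request", b"client-42", b"do-the-thing"])
    print(dealer.recv().decode())                  # only the leader ever answers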
Q : " is (there) a more proper way of doing this? "
Yes.
Start by applying Maslow's Hammer rule:
“When the only tool you have is a hammer, every problem begins to resemble a nail.”
In other words, do not try to use (one) hammer for solving every problem. The PUB/SUB archetype was designed to serve those-and-only-those multi-party Formal-Communications-Pattern archetypes where many SUB-scribe to .recv() what some PUB-lisher(s) .send()-broadcast, but nothing else.
Similarly, the REQ/REP archetype was defined and implemented so as to serve one-and-only-one multi-party distributed Formal-Communications-Pattern ( and will obviously not meet any use-case that has any other, or even a slightly different, requirement ).
Users often require some special, non-trivial features that obviously were not a part of the said trivial Formal-Communications-Pattern archetype primitives ( those ready-made blocks, made available in the ZeroMQ toolbox ).
It is the architects' / designers' role to define, analyse and implement any more complex, user-specific distributed-behaviour definition ( a protocol ), most often using a layered combination of the ready-made ZeroMQ primitives.
If in doubt, take a sheet of paper and a pencil, draw a small crowd of kids on a playground and sketch their "shouts", their "listening", their "silence", "waiting" and "doubts", their many or few "replies", their "voting" and their "anger" at not being voted for by friends, their fight for a place in the sun and their "persistence" in not letting others take their turn on the "swing" after having enjoyed the swinging themselves.
All this is the part of finding the right mix of ( protocol-orchestrated ) levels of control and levels of freedom to act.
There we get the new, distributed-behaviour, tailor-made for your specific use-case.
The probability of finding a ready-made primitive tool that matches and fulfils any user-specific use-case is limitlessly close to Zero ( sure, unless one's own, user-specific use-case requirements match all those of the primitive archetype, but then that is not a user-specific use-case, rather a re-use of an already implemented archetype for the very same situation that was foreseen by the ZeroMQ fathers, wasn't it? )
Again, welcome to the art of Zen-of-Zero.
You may like to read this, and this, and this.
Let's say I want to set up an event-driven architecture with services A-D, where events propagate as follows:
  A
 / \
B   C
   /
  D
In other words,
(1) A publishes an event
(2) Subscribers B and C receive A's event
(3) C publishes an event
(4) Subscriber D receives C's event
One way is to have services B and C directly listen to a queue into which A posts messages. But the issue I see with this is maintenance. Once the system becomes complicated, with thousands of subscriptions, it becomes difficult to have any visibility into how the updates are propagating.
A solution I propose to this problem is to have another service X that knows the tree in the first diagram and is responsible for directing the propagation of events according to the tree. Every service publishes its events to X, and X publishes them to the listening services. So it's kind of a middleman, like
  A
  |
  X
 / \
B   C
    |
    X
    |
    D
This also makes it easier to track the event propagation.
Are there any downsides to this (other than the extra cost associated with twice as much message transfer)?
You’re thinking of events like they are implemented in a WinForms UI, where the publisher sends the event directly to the subscriber. That’s not how events work in an event-driven architecture (EDA). The word “event” has taken on a whole new meaning.
Before we start, you’re jumbling together the ideas of a message and an event when they really need to be kept separate. A message is a request for some action to happen, while an event is a notification that something has already happened. The important distinction for this discussion is that a message publisher assumes one or more other processes will receive and process the message; if the message is not processed by something, downstream errors will occur. An event carries no such assumption and can go unread without adversely affecting anything. Another difference is that once messages are processed they are typically thrown away, whereas events are kept for an extended period (days or weeks).
With that in mind, the ‘X’ service you talk about already exists (please don’t build one) and is integral to the process: it’s called the bus. There are two types of bus: a message bus (think RabbitMQ, MSMQ, ZeroMQ, etc.) or an event bus (Kafka, Kinesis, or Azure Event Hub). In either case, a publisher puts a message onto the bus and subscribers get it from the bus. You may implement the bus as multiple physical servers, but when imagining it, think of them all as the same logical bus.
The key point that’s tripping you up, and it’s a subtle difference, is thinking that the message bus has business logic indicating where messages go. The business logic of who gets what message is determined by the subscribers – the message bus is just a holding place for the messages to wait for pickup.
In your example, A publishes an event to the bus with a message type of “MT1”. B and C both tell the bus that they are interested in events of type “MT1”. When the bus receives the request from B and C to be notified of “MT1” messages, the bus creates a queue for B and a queue for C. When A publishes the message, the bus puts a copy in the “B-MT1” queue and a copy in the “C-MT1” queue. Note that the bus doesn’t know why B and C want to receive those messages, only that they’ve subscribed.
These messages sit there until processed by their respective subscribers (the processes can poll or the bus can push the messages, but the key idea is that the messages are held until processed). Once processed, the messages are thrown away.
For C to communicate with D, D will subscribe to messages of type “MT2” and C will publish them to the bus.
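To make the mechanics concrete, here's a small sketch of that flow against RabbitMQ using the pika client. The exchange, queue, and message-type names are invented for illustration, and a broker on localhost is assumed:

    import time
    import pika  # assumes a RabbitMQ broker on localhost

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # The bus side: one exchange, one queue per (subscriber, message type).
    ch.exchange_declare(exchange="events", exchange_type="direct")
    for queue, mtype in (("B-MT1", "MT1"), ("C-MT1", "MT1"), ("D-MT2", "MT2")):
        ch.queue_declare(queue=queue)
        ch.queue_bind(queue=queue, exchange="events", routing_key=mtype)

    # A publishes one MT1 event; the broker copies it into B-MT1 and C-MT1.
    ch.basic_publish(exchange="events", routing_key="MT1",
                     body=b'{"event": "A happened"}')

    # B picks up its copy; C's copy stays queued until C collects it.
    method, props, body = ch.basic_get(queue="B-MT1", auto_ack=True)
    while method is None:                # poll until the copy lands
        time.sleep(0.05)
        method, props, body = ch.basic_get(queue="B-MT1", auto_ack=True)
    print("B received:", body)

Note that the publish call names only a routing key, never a subscriber: the bindings, declared by the subscribers, decide where copies go.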
Constantin’s answer above has a point that this is a single point of failure, but it can be managed with standard network architecture like failover servers, local message persistence, message acknowledgements, etc.
One of your concerns is that with thousands of subscriptions it becomes difficult to follow the path, and you’re right. This is an inherent downside of EDA and there’s nothing you can do about it. Eventual consistency is also something the business is going to complain about, but it’s part of the beast and is actually a good thing from a technical perspective, because it enables more scalability. The biggest problem I’ve found with the term “eventual consistency” is that the business thinks it means hours or days, not seconds.
BTW, this whole discussion assumes the message publishers and subscribers are different apps. All the same ideas can be applied within the same address space, just with a different bus. If you’re a .NET shop, look at MediatR. For other tech stacks, there are similar solutions that I’m sure Google knows about.
If your main concern is visibility into the propagation of events (which is a very valid concern for debugging and long-term application maintenance of a distributed system), you can use a correlation identifier to trace the generation of messages from the initial event through the entire chain. You don't need to build another layer of orchestration -- let your messaging platform handle that for you.
Most messaging platforms/libraries have the concept built in: e.g., NServiceBus defines a ConversationId field in the message headers, and AMQP defines a correlation-id field in the basic messaging model.
Your system should have some kind of logging that allows you to audit messages -- the correlation ID will allow you to group all messages that result from a single command/request to make debugging distributed logic much simpler.
If you set a GUID in the client requests, you can even correlate actions in the UI to the backend API, right through all the events recursively generated.
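As a rough illustration of the AMQP variant, with pika and invented queue names (a local broker assumed): the whole trick is copying the incoming correlation_id onto every message you emit:

    import time
    import uuid
    import pika  # assumes a RabbitMQ broker on localhost; queue names invented

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    for q in ("commands", "events"):
        ch.queue_declare(queue=q)

    # The initial request mints one correlation id for the whole chain.
    chain_id = str(uuid.uuid4())
    ch.basic_publish(exchange="", routing_key="commands",
                     body=b'{"cmd": "do-something"}',
                     properties=pika.BasicProperties(correlation_id=chain_id))

    # A handler copies the incoming id onto everything it emits, so log lines
    # from every service in the chain can be grouped by this single value.
    method, props, body = ch.basic_get(queue="commands", auto_ack=True)
    while method is None:
        time.sleep(0.05)
        method, props, body = ch.basic_get(queue="commands", auto_ack=True)
    ch.basic_publish(exchange="", routing_key="events",
                     body=b'{"event": "something-happened"}',
                     properties=pika.BasicProperties(
                         correlation_id=props.correlation_id))
    print("grep your logs for:", props.correlation_id)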
It is OK, but the microservices shouldn't care how they get the messages in the first place. From their point of view, the input messages just arrive. You will then be tempted to design your system to depend on some global order of events, which is hard in a distributed, scalable system. Resist that temptation and design your system to rely only on local ordering of events (i.e. the ordering in an event stream emitted by an Aggregate in Event Sourcing + DDD).
One downside that I see is that availability and scalability may be hurt. You will have a single point of failure for the entire system: if this fails, everything fails. And when it needs to be scaled up you will again have problems, as you will then have a distributed messaging system.
This is more of a hypothetical question, so I can't really show any code examples. Imagine if a site like Twitter wanted to live-update stats on a Tweet via web sockets/Socket.io. In terms of performance, which of these would be the best approach?
1. Each action (like, retweet, reply) sends a message to the server, which is then emitted to all clients, and each client is responsible for updating the appropriate tweet.
2. Each tweet the client loads is connected to a different room, so that it only emits and receives messages relevant to itself.
3. Other?
Or perhaps it's dependent on the scale of the application? Maybe 1 is better if you had a Twitter clone with only a few users, whereas I would think 2 is better in Twitter's case because it's a matter of hundreds of "rooms" vs millions of signals/second? And if that's the case, at what point is one approach preferred over the other?
At scale, you do not want to be sending messages to clients that they did not ask for and do not have any use for. Imagine a twitter client that was receiving every single tweet being sent in real time. That could overwhelm that client and it would mean the server would be delivering every single tweet to every single connected client. That obviously doesn't scale on either the server side or the client side.
So option 1 is out.
The appropriate solution has the server send the client only the messages it has a particular interest in seeing. This works just fine at any scale. I can't tell whether your option 2 is that or not, since rooms are just a tool for making groups of connections that you can send the same message to; they don't decide who gets what message, because that logic must be baked into your server code.
For a twitter-like service, it seems you're going to have to have a system where your server can easily tell which users have an interest in this particular new message. That can presumably be for a number of reasons such as they are following the author, they are following a hashtag present in the message, they are mentioned in the message, etc... That is server-side logic, not just simple rooms.
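As a sketch of that interest-based routing, this is roughly what per-tweet rooms look like with the python-socketio server. The event and room names here are made up, and the same idea maps directly onto Socket.io in Node:

    import socketio  # python-socketio; same idea maps onto Socket.io in Node

    sio = socketio.Server()
    app = socketio.WSGIApp(sio)  # serve with any WSGI server, e.g. eventlet

    @sio.on("viewing")
    def viewing(sid, tweet_id):
        # The client says which tweet is on screen; join that tweet's room.
        sio.enter_room(sid, "tweet:%s" % tweet_id)

    @sio.on("stopped_viewing")
    def stopped_viewing(sid, tweet_id):
        sio.leave_room(sid, "tweet:%s" % tweet_id)

    def on_like(tweet_id, new_count):
        # Server-side logic picks the audience; the room is just the delivery
        # list for everyone currently looking at this one tweet.
        sio.emit("stats", {"tweet": tweet_id, "likes": new_count},
                 room="tweet:%s" % tweet_id)

The interest logic (followers, hashtags, mentions) would live in functions like on_like; the room membership is only the final delivery step.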
In Spring Integration, I have a chain of services, like this:
message -> A -> B -> C -> D -> ... -> output
This works fine. I want to make each of the services asynchronous and to make them pessimistic. Each of them will get a message, process it, and send it to the next service in the chain. However, it will not wait for the whole chain to finish; it will continue processing the next message, and so on. Standard async here.
However, let's say service B is slower than A, and it accumulates 10k messages in its inbound channel queue at the moment the system crashes. I want to be able to restore the system by figuring out where I left off and re-processing the messages. For that reason, I want each of the services to know which of the messages it processed was successfully consumed by the following service; the difference between sent vs. processed.
My idea is to do it similar to this (fancy ascii):
-> A --> B -> C -> ...
   ^     |
   | ack |
   \-----/
That is, A will send to B, B will process and when it is done successfully it will send an ack to A. A will then remove that particular message from the store, so that the next time it runs, it will not re-process it. I thought I would just put a splitter after B that will call a different method on service A (i.e. ackProcessed).
Is this how it should be done in SI, or is there another way I'm missing? I'm primarily asking for confirmation that I'm not missing something supported out of the box, or something that will not force me to create a splitter after each of the services.
It wouldn't be a splitter; more likely a pub-sub channel and the ack would probably want to go to a different method in A (i.e. a different service-activator that references the same bean, different method; and the methods share some state).
An easier solution would be to use a persistent message channel (e.g. JMS, RabbitMQ, or a message-store-backed QueueChannel). That way the framework will take care of everything for you.
I'm just starting to research AMQP and I'm wondering if I'd be using it for something it's not designed for. Here's something like what I want to do:
ClientA goes about its business and publishes its state to some exchange (correct me if I use the wrong terms anywhere).
ClientB connects to the same broker and says "what publishers are publishing here? I choose you, ClientA. What is going on?"
ClientA says "My foo is bar and my baz is true"
ClientB says "OK. Set your baz to false"
Edit, for a less abstract example:
ClientA talks/listens to a hardware device, say a video projector. When ClientB comes online, it wants to find any projector clients (like ClientA) that are connected, then to know the status of the projectors (is the lamp on?) and also, if it needs to, change that status (turn the lamp off). So ClientA is keeping some state (lamp is off), can send it out when requested, and can also respond to commands from the exchange, converting and passing them on to the projector (turn lamp on).
I'm finding it hard to follow your example, but it sounds like you want these A and B types to have back-and-forth conversations with each other. Is that correct?
AMQP is better suited to asynchronous message passing, and adding the kind of point-to-point style you're describing requires that you set up request and reply queues so that clients can both send and receive messages. It's certainly possible to have clients both publish and consume messages.
This is possible, and it would make sense if the different actors in your example are networked devices, because AMQP would provide a loosely coupled way of messaging.
One thing to watch out for is the last abstract line, where ClientB says "OK, set some attribute". That sounds suspiciously like a scenario where a subroutine call returns some value and then the next step takes place. AMQP can certainly simulate that kind of RPC, but it works better when processes can send a message and don't have to wait for completion.
If most of your messaging doesn't involve waiting for turnaround replies, then AMQP sounds like a fit for what you are doing. But if most of your needs are RPC, then it may not be the best choice.
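If you do end up needing the occasional turnaround reply, the usual shape is a reply_to queue plus a correlation_id. A minimal pika sketch, with both sides collapsed into one process and all names invented, assuming a local RabbitMQ:

    import time
    import uuid
    import pika  # assumes a RabbitMQ broker on localhost; names are invented

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="projector.commands")

    def get_blocking(queue: str):
        # Poll basic_get until a message shows up (fine for a one-file sketch).
        while True:
            method, props, body = ch.basic_get(queue=queue, auto_ack=True)
            if method is not None:
                return props, body
            time.sleep(0.05)

    # ClientB: send "set baz to false" and say where the reply should go.
    reply_queue = ch.queue_declare(queue="", exclusive=True).method.queue
    corr_id = str(uuid.uuid4())
    ch.basic_publish(exchange="", routing_key="projector.commands",
                     body=b'{"baz": false}',
                     properties=pika.BasicProperties(reply_to=reply_queue,
                                                     correlation_id=corr_id))

    # ClientA (the projector side): apply the command, answer via reply_to.
    props, body = get_blocking("projector.commands")
    ch.basic_publish(exchange="", routing_key=props.reply_to,
                     body=b'{"baz": false, "applied": true}',
                     properties=pika.BasicProperties(
                         correlation_id=props.correlation_id))

    # Back on ClientB: match the reply to the request before trusting it.
    props, body = get_blocking(reply_queue)
    assert props.correlation_id == corr_id
    print(body)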
AMQP really shines when there are future possibilities: for instance, in your scenario, if you needed to add a couple thousand projectors, 10,000 ClientBs, and several other device types that also need to exchange status. The loose coupling of AMQP makes it easy to add other applications to the broker, just by declaring new exchanges.