I've read the docs, and most examples cover basic use cases,
where one process simply publishes event X and another subscribes to event X.
But in my application X is variable. Let's say X identifies a user.
So one server publishes events on a topic like user-ID. If I have thousands of users connected to a server, is it OK to publish and subscribe to that many dynamic topics, and then have another 20 servers subscribe to those thousands of topics on this server?
Let's look at an example.
I have 10 servers, each with 1000 users connected, so 10k users in total.
I need to send data X from each user to another user.
So I did this:
Server X publishes user-ID data (one publish per connected user, so 1k publishes).
Server Y subscribes to user-ID data (10k subscribe requests sent to each server).
What would be the optimal way to do pub/sub with dynamic topics so that less bandwidth is used between servers?
Note: user-ID is just an example where ID is a dynamic number, and it publishes some real-time data which can't be stored anywhere.
In ZeroMQ subscription matching is implemented in the PUB socket with a prefix-matching trie. This is a very efficient data structure, and I would expect that 10K subscriptions and 10K msg/sec would be no problem at all.
The PUB socket only sends messages for matching subscriptions (so there is no "waste"). If a message doesn't match any subscription then the PUB socket will drop it. Matching messages are only sent to SUB sockets that have subscribed to them.
When you add or remove a subscription, the SUB socket will send a message to its connected PUB socket(s). Each PUB socket will then update its topic trie.
My guess is 10k subs and 10k msgs/s is no problem, but the best thing to do would be to write some test code and try it out. One nice thing about ZeroMQ is that it's not much work to test different architectures.
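For example, a rough pyzmq sketch along these lines (port, topic scheme, and counts are made up, and the slow-joiner sleep is a shortcut) lets you measure how a PUB socket handles thousands of dynamic subscriptions:

import time
import zmq

ctx = zmq.Context()

# Publisher: one dynamic topic per user, e.g. "user-42 "
pub = ctx.socket(zmq.PUB)
pub.setsockopt(zmq.SNDHWM, 0)  # unlimited, so nothing is dropped in this test
pub.bind("tcp://*:5556")

# Subscriber: subscribe to every user topic; the trailing space keeps
# prefix matching from conflating "user-1" with "user-10", "user-11", ...
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.RCVHWM, 0)
sub.connect("tcp://localhost:5556")
for user_id in range(10000):
    sub.setsockopt_string(zmq.SUBSCRIBE, "user-%d " % user_id)

time.sleep(1)  # give the subscriptions time to reach the PUB socket's trie

# Publish one message per user and time how long the whole batch takes
start = time.time()
for user_id in range(10000):
    pub.send_string("user-%d some realtime payload" % user_id)

for _ in range(10000):
    sub.recv_string()
print("10k messages over 10k topics in %.3fs" % (time.time() - start))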
As far as I know, in the pyzmq API a publisher can send messages on any topic:
socket.send_string("%d %d" % (topic, messagedata))
and subscribers set a filter on these topics for the topics of their interest with setsockopt:
topicfilter = "10001"
socket.setsockopt_string(zmq.SUBSCRIBE, topicfilter)
So I think you can fully implement your plan.
Hello all. Assume we have a pub-sub pattern in ZeroMQ with many subscribers, one publisher, and a message of 3 GB. My question: does the publisher send n x O(m), where n is the number of subscribers and m is the 3 GB size, or does it upload the 3 GB only once and the subscribers somehow download it, avoiding the n x O(m)?
According to the ZeroMQ docs, pub-sub is a multicast pattern:
"ZeroMQ’s low-level patterns have their different characters. Pub-sub
addresses an old messaging problem, which is multicast or group
messaging"
So I expect not n x O(m) but just O(m). Am I correct?
It all depends on the transport you choose rather than just the zeromq pattern (in this case pub/sub).
If you choose tcp then there will be n copies of the data sent from the host you are running on, because TCP maintains a separate connection to each subscriber. If you choose pgm (reliable multicast) there will be one copy sent from the host, and it will end up being fanned out to each subscriber by a router downstream.
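The transport is just the endpoint you bind; a brief pyzmq sketch (addresses are illustrative, and the pgm transports require libzmq built with OpenPGM support):

import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)

# TCP: one connection per subscriber, so a 3 GB message leaves
# this host once per subscriber (n copies).
pub.bind("tcp://*:5556")

# EPGM (reliable multicast encapsulated in UDP): the message leaves
# this host once and the network fans it out. Both peers must use the
# same interface;group:port endpoint.
# pub.bind("epgm://eth0;239.192.1.1:5557")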
There is also a newer radio/dish pattern that supports basic multicast but you lose the publisher side subscription filtering.
In general you only send once through the PUB socket and it gets sent to all subscribers.
See docs here: https://zeromq.org/socket-api/#publish-subscribe-pattern
PUB socket
A PUB socket is used by a publisher to distribute data. Messages sent are distributed in a fan out fashion to all connected peers. This socket type is not able to receive any messages.
When a PUB socket enters the mute state due to having reached the high water mark for a subscriber, then any messages that would be sent to the subscriber in question shall instead be dropped until the mute state ends. The send function does never block for this socket type.
I have an IoT system of around 100k devices, publishing their state every second to a backend written in Java/Spring Boot. Until now I was using gRPC, but I see excessive CPU usage, so I was planning to let the devices publish to RabbitMQ and let the backend workers process the messages.
Processing: Updating the db table.
Since data from the same device must be processed sequentially, I was planning to use RabbitMQ's consistent hashing exchange and bind n queues for n workers (sketched below). But I'm not sure how that would work with autoscaling.
I thought of creating auto-delete queues for each backend instance and binding them to the exchange, but I couldn't figure out:
How do I rebalance messages already sitting in a queue?
If a connectivity issue occurs, a queue might get deleted, so I would need to re-forward its messages to the remaining queues.
Is there an algorithm for handling the autoscaling of workers? For instance, if messages pile up, I need to spawn new workers even though CPU/memory usage is low.
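For context, the consistent-hash-exchange plan described above would look roughly like this in Python with pika (names and counts are made up, and the rabbitmq-consistent-hash-exchange plugin must be enabled):

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

ch.exchange_declare(exchange="device-state", exchange_type="x-consistent-hash")

# One queue per worker; with this exchange type the binding key is a
# weight, not a pattern.
for i in range(4):
    queue = "device-state-worker-%d" % i
    ch.queue_declare(queue=queue)
    ch.queue_bind(exchange="device-state", queue=queue, routing_key="1")

# Messages are routed by hashing the routing key, so using the device ID
# keeps every message from one device in the same queue, hence processed
# sequentially by one worker.
ch.basic_publish(
    exchange="device-state",
    routing_key="device-12345",
    body=b'{"state": "online"}',
)
conn.close()

The catch is exactly the one raised above: adding or removing a queue re-maps part of the hash space, and messages already enqueued are not rebalanced.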
I think I'll go with MQTT's shared subscriptions for this case.
https://emqx.medium.com/introduction-to-mqtt-5-0-protocol-shared-subscription-4c23e7e0e3c1
Sharing strategy
Although shared subscriptions allow subscribers to consume messages in
a load-balanced manner, the MQTT protocol does not specify what
load-balancing strategy the server should use. For reference, EMQ X
provides four strategies for users to choose: random, round_robin,
sticky, and hash.
random: randomly select one in all shared subscription sessions to publish messages
round_robin: select in turn according to subscription order
sticky: use a random strategy to randomly select a subscription session, continue to use the session until the subscription is cancelled or disconnect and repeat the process
hash: Hash the ClientID of the sender, and select a subscription session based on the hash result
Hash seems like what I'm looking for.
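A minimal sketch of a shared-subscription worker with paho-mqtt (paho-mqtt 1.x API; broker address, group, and topic names are made up), assuming an MQTT 5 broker such as EMQ X:

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Each message is delivered to exactly one subscriber in the share group
    print(msg.topic, msg.payload)

client = mqtt.Client(protocol=mqtt.MQTTv5)
client.on_message = on_message
client.connect("broker.example.com", 1883)

# "$share/<group>/<filter>": every worker subscribing with the same
# group name shares the stream according to the broker's strategy
# (random, round_robin, sticky, or hash on EMQ X).
client.subscribe("$share/state-workers/devices/+/state", qos=1)
client.loop_forever()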
According to the documentation here: https://github.com/OpenHFT/Chronicle-Engine one is able to do pub/sub using maps. This allows one to create a construct similar to the topics available in middleware such as Tibco, 29West, or Kafka, and use that as a way of sending events across processes. Is this a recommended usage of Chronicle Map? What kind of latency can I expect if both publisher and subscriber stay on the same machine?
My second question is, how can this be extended to send messages across machines? How does this work with enterprise TCP replication?
My requirement is to create thousands of topics and use them to communicate across processes running on different machines (in a LAN). Each of these topics would be written by a single source and read by multiple readers running on the same or different machines. If the source of a particular topic dies, that source's replica would start writing to the topic and listeners would continue to receive messages. These messages need not be stored for replay.
Is this a recommended usage of chronicle map?
Yes, you can use Engine to support event notification across machines. However, if you want the lowest latencies you might need to send a notification via a Queue and keep the latest value in a Map.
What kind of latency can I expect if both publisher and subscriber stay in the same machine?
It depends on your use case, especially the size of the data (in Map's case, the number of entries as well). The latency for a Map in Engine is around 30 - 100 us, whereas the latency for a Queue is around 2 - 5 us.
My second question is, how can this be extended to send messages across machines?
For this you need our licensed product but the code is the same.
Each of these topics would be written by a single source and read by multiple readers running on the same or different machines. If the source of a particular topic dies, that source's replica would start writing to the topic and listeners would continue to receive messages.
Most likely, the simplest solution is to have a Map where each topic is a different key. This will send the latest value for that topic to the consumers.
If you need to record every event, a Queue is likely to be a better choice. If you don't need to retain the data for long, you can use a very short file rotation.
We currently have a chat app whereby, when emitting messages to the appropriate users (could be one or several, depending on how many are in the conversation), we loop through all socket (Socket.io 2.0.2) connections to the server (NodeJS) to get the list of sockets a user has, based on a member ID value, since each user could be connected from multiple devices. The code looks like this in order to determine which of a user's sockets we should send the message to:
var sockets = Object.keys(socketList);
var results = [];
for (var i = 0; i < sockets.length; i++) {
    var socket = socketList[sockets[i]];
    // Only consider authenticated sockets, then match on the member ID
    if (Object.prototype.hasOwnProperty.call(socket.handshake.query, 'token')) {
        if (JSON.parse(socket.handshake.query.member).id === memberId) {
            results.push(socket);
        }
    }
}
Having to loop through the socket connections seems inefficient, and I wonder if there is a better way. My thought is to create a room for each user; most users will have only the one connection, but some will be connected via multiple devices, so they could have multiple sockets in their room. Then I would just broadcast to the appropriate rooms rather than always looping through all sockets. Given that 95% of users will only have the one socket connection, I'm not sure whether this approach is any more efficient, and I would appreciate some input on this.
Thanks.
First off, socket.io already creates a room for every single user. That room has the name of the socket.id. Rooms are very lightweight objects. They basically just consist of an object with all the ids of the sockets that are in the room. So, there should be no hesitancy to use rooms at all. If they fit the model of what you're doing, then use them.
As for looping yourself vs. emitting to a room, there's really no difference - use whichever makes your code simpler. When you emit to a room, all it does is loop through the sockets in the room and send to each one individually.
Having to loop through the socket connections seems inefficient and I wonder is there a better way.
The main advantage of rooms is that they are pre-built associations of sockets, so you don't have to dynamically figure out which sockets you want to send to - there's already a list of sockets in the right room that you can send to. So, it would likely be a small bit more efficient to send to all sockets in a room than to do what your code is doing, because your code is dynamically figuring out which sockets to send to rather than sending to an already-made list. Would this make a difference? That depends upon how long the whole list of sockets is and how expensive the computation is to figure out which ones you want to send to. My guess is that it probably wouldn't make much difference either way.
Sending a message to a room is not much more efficient on the actual sending part. Each socket has to be sent the message individually so somebody (your code or the socket.io rooms code) is going to be looping through a list of sockets either way. The underlying OS does not contain a function to send a single message to multiple sockets. Each socket has to be sent to individually.
Then I would just broadcast to the appropriate rooms rather than always looping through all sockets.
Sending to a room is a programming convenience for you, but socket.io will just be looping under the covers anyway.
I would use Socket.io rooms to accomplish what you want to do.
Server side, adding a client to a chat room:
socket.join('some room');
Then I would use socket.to('some room').emit(...) for a sender's message to be sent to all participants in the room.
I'm currently looking for the best architecture for an IM app I'm trying to build.
The app consists of channels, each having a couple of thousand subscribed users. Each user is subscribed to only one channel at a time and is able to publish to and read from that channel. Users may move rapidly between channels.
I initially considered using XMPP PubSub (via ejabberd or MongooseIM), but as far as I understand it was added as an afterthought and is not very scalable.
I also thought about using a message queue protocol like AMQP, but I'm not sure if that's what I'm looking for from the IM aspect.
Is my concern regarding the XMPP PubSub justified? And if so, do you know of a better solution?
Take a look at Redis and Kafka. Both are scalable and performant.
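For the channel model above, Redis pub/sub maps on directly; a minimal redis-py sketch (host and channel names are made up):

import redis

r = redis.Redis(host="localhost", port=6379)

# Subscriber: each user is subscribed to exactly one channel at a time
p = r.pubsub()
p.subscribe("channel:42")

# Moving a user to another channel is just an unsubscribe/subscribe
p.unsubscribe("channel:42")
p.subscribe("channel:7")

# Publisher: any user can publish to the channel they are in
r.publish("channel:7", "hello everyone")

for message in p.listen():
    if message["type"] == "message":
        print(message["channel"], message["data"])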
Based on your input, I imagine the primary use cases below for the above IM application.
**Use cases**
- Many new users keep registering with the system and subscribing to one of the channels
- Many existing users change their subscription from one channel to another
- Many existing users keep publishing messages to channels
- Many existing users keep receiving messages as subscribers
XMPP is a natural fit for the 3rd and 4th use cases. ejabberd is a proven, highly scalable platform to go with.
In the case of the 2nd use case, you will probably have logic something like this:
- a) update the user's channel info in the DB
- b) make them listen to the new channel
- c) change their publishing topic to the other channel... and so on
Whenever you need to do multiple operations like this, I strongly recommend using Kafka to perform them asynchronously.
In the case of the 1st use case, provide registration through REST APIs, so that registration can be done from any device. While registering a user, you may have several operations, as follows:
- 1) register the user in the DB
- 2) create the internal IM account
- 3) send an email or SMS for confirmation... and so on
Here too, perform the 1st operation as part of the REST API service logic, and perform the 2nd and 3rd operations asynchronously using Kafka. That means your service logic performs the 1st operation synchronously and raises an event to Kafka; a consumer then handles the 2nd and 3rd operations asynchronously, as in the sketch below.
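A rough sketch of that event flow with kafka-python (broker address, topic name, and payload are made up; the two handler functions are hypothetical):

import json
from kafka import KafkaProducer, KafkaConsumer

# REST handler side: register the user in the DB synchronously,
# then raise an event and return immediately.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("user-registered", {"user_id": 42, "email": "user@example.com"})
producer.flush()

# Worker side (separate process): create the IM account and send the
# confirmation email/SMS asynchronously.
consumer = KafkaConsumer(
    "user-registered",
    bootstrap_servers="localhost:9092",
    group_id="registration-workers",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for event in consumer:
    user = event.value
    # create_im_account(user); send_confirmation(user)  # hypothetical helpers
    print("handling registration for", user["user_id"])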
The system can scale well only if all layers/subsystems scale well. From that perspective, the tech stack below may help you scale:
REST APIs + Kafka + ejabberd (XMPP)