Horizontal scaling of Websocket Client

I am using nchan with a Redis cluster to build WebSocket-based pub-sub. We have a Java application that subscribes to an nchan channel over WebSocket (i.e. this application acts as the WebSocket client).
Everything works fine when there is one instance of the subscribing application. But when the subscribing application is horizontally scaled, each node opens its own WebSocket connection to the server and ends up receiving a copy of the same message.
What strategies are used to horizontally scale WebSocket clients so that only one node of the subscriber application gets each message? In other words, is it possible to create a cluster of WebSocket clients?
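
One possible approach, sketched here under assumptions rather than taken from any answer: let every node keep its own connection, and have each node try to claim a message before handling it, so only the first claimant does the work. The sketch assumes each message carries a unique id, and it uses an in-memory set as a stand-in for a shared store (with Redis this would be an atomic "set if absent" with an expiry). The names MessageClaims and SubscriberNode are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for a shared store that all subscriber nodes can reach. */
class MessageClaims {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that claims the given message id. */
    boolean tryClaim(String messageId) {
        return claimed.add(messageId);
    }
}

/** One subscriber node: it receives every message but handles only those it claims. */
class SubscriberNode {
    private final String name;
    private final MessageClaims claims;
    int handled = 0;

    SubscriberNode(String name, MessageClaims claims) {
        this.name = name;
        this.claims = claims;
    }

    /** Would be called from the WebSocket message callback (assumption). */
    void onMessage(String messageId, String payload) {
        if (claims.tryClaim(messageId)) {
            handled++;
            System.out.println(name + " handled " + messageId + ": " + payload);
        }
        // Otherwise another node already claimed this message, so it is dropped here.
    }
}

class ClaimDemo {
    public static void main(String[] args) {
        MessageClaims claims = new MessageClaims();
        SubscriberNode a = new SubscriberNode("node-a", claims);
        SubscriberNode b = new SubscriberNode("node-b", claims);
        // Both nodes receive the same message; only the first claimant handles it.
        a.onMessage("m1", "hello");
        b.onMessage("m1", "hello");
    }
}
```

The trade-off is that every node still receives every message, so this removes duplicate processing but not duplicate network traffic.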

Related

Updating a Web Socket from Outside the Web Socket App

If I have an app (let's say an Express.js app) with a web socket (socket.io) and I want to send a message to a client from a different server app, what is the best way to go about that?
Let's assume that both apps are on a public cloud and running in separate containers or VMs. What's the best way to ensure that the message is sent to the right web socket app instance, the one that holds the connection to the client?

You can use Redis to ensure that the client gets the message no matter which instance of the app sends it.
If your other app is a completely different app and does not start a socket server, you can still use the socket.io emitter (along with the Redis adapter) to send messages to clients without creating another socket server.

Working of websocket services in clustered deployment

Let's say I have a websocket implemented in Spring Boot. The architecture is microservice-based. I have deployed the service in a Kubernetes cluster with 2 running instances, and the socket implementation uses STOMP with Redis as the broker.
Now the first connection is created between a client and one of the service instances. Does all the data flow between the client and that connected instance? Would the other instance also have a connection? In case the current instance goes down, would the other instance open a connection?
Now let's say I am sending some data back to the client, and that data arrives through a Kafka topic. Either instance could read it. In that case, would either of them be able to send the data back to the client?
Can someone help me understand these scenarios?

A websocket is a permanent connection. After opening it, it will be routed through kubernetes to a fixed pod. No other pod will receive the connection.
If the pod goes down, the connection is terminated.
If a new connection is created, for example by a different user, it may be routed to a different pod.
What data is transmitted, for example with kafka as source, is not relevant in this context. It could be anything.

Routing messages from Kafka to web socket clients connected to application server cluster

I would like to figure out the best way to route messages from Kafka to web socket clients connected to a load balanced application server cluster. I understand that spring-kafka facilitates consuming and publishing messages to a kafka topic, but how does this work in a load balanced application server scenario when connecting to a distributed kafka topic. Here are the requirements that I would like to satisfy, with the overall goal of facilitating peer to peer messaging in an application with a very, very large volume of users:
Web clients can connect to a Tomcat application server over a web socket connection through a load balancer.
A web client can send a message/notification to another client that's connected to a different Tomcat application server.
Messages are saved in the database and published to a kafka topic/partition that can be consumed by the appropriate web clients/users.
Kafka can be scaled to many brokers with many consumers.
I can see how this can be implemented quite easily in a single application server scenario where the consumer consumes all messages from a kafka topic and re-distributes them via spring messaging/websockets. But I can't figure out how this would work in a load balanced application server scenario where there are consumers on each application server forming an overall consumer group for the kafka topic. Assuming that each of the application servers is consuming a subset of the topic's partitions, how do they know which server their intended recipients are connected to? And even if they knew which server their recipients were connected to, how would they route the message to them via websockets?
I considered that the application server load balancing could work by logging users with a particular routing key (user names starting with 'A' etc.) on to a specific application server, then only consuming messages for users starting with 'A' on that application server. But this seems like it would be difficult to maintain and would make autoscaling very difficult. This seems like it should be a common scenario to implement but I can't find any tools or approaches that fit this scenario.

Sounds like every single consumer should live in its own consumer group. This way all the available consumers are going to consume all the messages sent to the topic. Therefore all the connected websocket clients are going to be notified with those messages.
If you need more complex logic with those messages after consuming, e.g. filtering, routing, transforming, aggregating etc., you should consider involving Spring Integration in your project: https://spring.io/projects/spring-integration
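
A minimal sketch of the "own consumer group per consumer" idea, assuming plain Kafka consumer properties rather than any particular Spring configuration. The property keys are standard Kafka consumer settings; the class name, the "ws-broadcast-" prefix and the broker address are hypothetical.

```java
import java.util.Properties;
import java.util.UUID;

class BroadcastConsumerConfig {
    /** Builds consumer properties whose group.id is unique to this application instance. */
    static Properties forInstance(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A group of its own means this instance shares partitions with nobody,
        // so it receives every record published to the topic.
        props.setProperty("group.id", "ws-broadcast-" + UUID.randomUUID());
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // A brand-new group has no committed offsets; "latest" starts at new messages only.
        props.setProperty("auto.offset.reset", "latest");
        return props;
    }
}

class BroadcastConfigDemo {
    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = BroadcastConsumerConfig.forInstance("localhost:9092");
        System.out.println(props.getProperty("group.id"));
    }
}
```

Because the group id is regenerated on every start, an instance that restarts will not resume from its previous offsets, which suits live notifications but not guaranteed delivery.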

Broadcasting to all the consumers may work, but the most efficient solution routes each message precisely to the node that holds the websocket connection for the target user. As far as I know, routing in a distributed system can be done as follows:
Put the routing information in middleware such as Redis, or implement a service yourself to keep track of all the sessions. That is, solve it in a centralized way.
Let the websocket servers find the route by themselves. In this case, a peer-to-peer dissemination protocol such as gossip should be taken into consideration.
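
The centralized option can be sketched as a small registry that maps each user to the node holding that user's connection. This is an illustration only: the in-memory map stands in for Redis or a dedicated session service, and SessionRegistry and its method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Maps a user id to the id of the node that currently holds the user's websocket. */
class SessionRegistry {
    private final Map<String, String> nodeByUser = new ConcurrentHashMap<>();

    /** Called by a node when a user's websocket opens on it. */
    void register(String userId, String nodeId) {
        nodeByUser.put(userId, nodeId);
    }

    /** Removes the entry only if it still points at this node, so a reconnect elsewhere survives. */
    void unregister(String userId, String nodeId) {
        nodeByUser.remove(userId, nodeId);
    }

    /** Looks up which node a message for this user should be forwarded to. */
    Optional<String> nodeFor(String userId) {
        return Optional.ofNullable(nodeByUser.get(userId));
    }
}

class SessionRegistryDemo {
    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        registry.register("alice", "node-1");
        System.out.println(registry.nodeFor("alice").orElse("offline"));
        System.out.println(registry.nodeFor("bob").orElse("offline"));
    }
}
```

A node that consumes a message for a user would look up the owning node and forward the message there, for example over a per-node topic or queue.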

Connect/disconnect from ActiveMQ topic on camel websocket connection/disconnection

I've got the following camel route which listens for messages on an ActiveMQ topic and immediately sends them to all connected web socket clients. This is working fine, but the connection to the topic is made as soon as the route builder is initialised.
from("activemq:topic:mytopic").routeId("routeid").to("websocket://test?sendToAll=true");
What I need is to only connect to the topic when one or more clients are connected to the web socket. Once there are no more connections I want to stop listening on the topic. Is this possible?

As far as I know there is no built-in way to do this. The only way to achieve it is to override the Jetty WebSocket code. Once you do, you have the flexibility to run your own custom code when a websocket opens and closes.
Maintain a list of all websocket clients: add a client when its websocket opens and remove it when the websocket closes, so you know how many are connected. Or simply keep a counter that is updated on open and close.
Once all websocket clients have closed, suspend the route so that your messages stay in the topic or queue.
When any client connects to the websocket, resume the route so that messages reach the connected client.
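
The counter variant described above might look like the following sketch. It is deliberately independent of Camel and Jetty: the two callbacks are where the route with id "routeid" would be resumed and suspended, and the exact call for that depends on the Camel version, so it is not shown. ConnectionGate and its method names are hypothetical.

```java
/** Counts open websocket clients and fires callbacks on the 0 -> 1 and 1 -> 0 transitions. */
class ConnectionGate {
    private final Runnable onFirstClient;     // e.g. resume the route
    private final Runnable onLastClientGone;  // e.g. suspend the route
    private int open = 0;

    ConnectionGate(Runnable onFirstClient, Runnable onLastClientGone) {
        this.onFirstClient = onFirstClient;
        this.onLastClientGone = onLastClientGone;
    }

    /** Call from the websocket "open" hook. */
    synchronized void clientOpened() {
        open++;
        if (open == 1) {
            onFirstClient.run();
        }
    }

    /** Call from the websocket "close" hook. */
    synchronized void clientClosed() {
        if (open == 0) {
            return; // ignore a close without a matching open
        }
        open--;
        if (open == 0) {
            onLastClientGone.run();
        }
    }

    synchronized int openCount() {
        return open;
    }
}

class ConnectionGateDemo {
    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate(
                () -> System.out.println("resume route"),
                () -> System.out.println("suspend route"));
        gate.clientOpened();
        gate.clientOpened();
        gate.clientClosed();
        gate.clientClosed();
    }
}
```

The methods are synchronized so that an open and a close arriving at the same time cannot both see the same count and fire the wrong callback.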

Socket.IO with RabbitMQ?

I'm currently using Socket.IO with the Redis store, and I'm using the Room feature with it.
So I'm totally okay with Room join (subscribe) and leave (unsubscribe) with Socket.IO.
I just saw this page
http://www.rabbitmq.com/blog/2010/11/12/rabbitmq-nodejs-rabbitjs/
and I have found that some people are using Socket.IO with RabbitMQ.
Why is using Socket.IO alone not good enough?
Is there any good reason to use Socket.IO with RabbitMQ?

SocketIO is a browser --> server transport mechanism whereas RabbitMQ is a server --> server message bus.
The two can be implemented together to create a very responsive system in scenarios where a user journey consists of a message starting life on a browser and ending up in, say, some persistence layer (such as a database).
A message would be transported to the web server via SocketIO and then, instead of the web server being responsible for persisting the message, it would drop it on a Rabbit queue and leave some other process responsible for persisting it. This way, the web server is free to return to its web serving responsibilities and, crucially, its load is reduced.

Take a look at SockJS http://sockjs.org .
It's made by the RabbitMQ team
It's simpler than Socket.io
There's an erlang server for SockJS
Apart from that, there is an experimental project within RabbitMQ team that intends to provide a SockJS plugin for RabbitMQ.

I just used rabbitMQ with socket.io for a totally different reason than in the accepted answer. It wasn't that relevant in 2012, that's why I'm updating here.
I'm using a docker swarm deployment of a chat application with scalability and high availability. I have three replicas of the chat application (which uses socket.io) running in the cluster. The swarm cluster automatically load-balances the incoming requests and at any given time a client might get connected to any of the three replicas of the application.
In this scenario it becomes necessary to sync WebSocket messages across the replicas of the application, because two clients connected to two different instances would otherwise not get each other's messages: they are connected to different WebSocket servers.
This is where RabbitMQ comes in. It syncs all the instances of the application: whenever a message arrives on a WebSocket at one replica, it is pushed out to clients by all replicas.
Complete details of the project have been given here. This is a potential use case of socket.io and rabbitMQ use in conjunction. This goes for any application using socket.io in a distributed environment with high availability and scalability.
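
The sync step can be illustrated with a small in-memory sketch, where a bus object stands in for the broker and each replica re-delivers every published message to its own clients. This is not the linked project's code; FanoutBus and Replica are hypothetical names, and a real deployment would put a RabbitMQ exchange where the bus is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Stand-in for a broker fanout: every subscribed replica gets a copy of each message. */
class FanoutBus {
    private final List<Replica> replicas = new CopyOnWriteArrayList<>();

    void subscribe(Replica replica) {
        replicas.add(replica);
    }

    void publish(String message) {
        for (Replica replica : replicas) {
            replica.deliverToLocalClients(message);
        }
    }
}

/** One application replica holding its own set of websocket clients. */
class Replica {
    private final FanoutBus bus;
    final List<String> deliveredToClients = new ArrayList<>();

    Replica(FanoutBus bus) {
        this.bus = bus;
        bus.subscribe(this);
    }

    /** A locally connected client sent a chat message: hand it to the bus, not to local clients. */
    void onClientMessage(String message) {
        bus.publish(message);
    }

    /** The bus delivered a message: push it to the clients connected to this replica. */
    void deliverToLocalClients(String message) {
        deliveredToClients.add(message);
    }
}

class FanoutDemo {
    public static void main(String[] args) {
        FanoutBus bus = new FanoutBus();
        Replica r1 = new Replica(bus);
        Replica r2 = new Replica(bus);
        r1.onClientMessage("hi from a client on r1");
        System.out.println(r2.deliveredToClients);
    }
}
```

The point of the design is that a replica never delivers a client's message directly; it always goes through the bus, so the originating replica and all the others follow the same path.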
