Routing messages from Kafka to WebSocket clients connected to an application server cluster

I would like to figure out the best way to route messages from Kafka to WebSocket clients connected to a load-balanced application server cluster. I understand that spring-kafka facilitates consuming from and publishing to a Kafka topic, but how does this work in a load-balanced application server scenario when connecting to a distributed Kafka topic? Here are the requirements I would like to satisfy, with the overall goal of facilitating peer-to-peer messaging in an application with a very large number of users:
Web clients can connect to a Tomcat application server over a WebSocket connection through a load balancer.
A web client can send a message/notification to another client that's connected to a different Tomcat application server.
Messages are saved in the database and published to a Kafka topic/partition that can be consumed by the appropriate web clients/users.
Kafka can be scaled to many brokers with many consumers.
I can see how this can be implemented quite easily in a single application server scenario, where the consumer consumes all messages from a Kafka topic and redistributes them via Spring messaging/WebSockets. But I can't figure out how this would work in a load-balanced application server scenario where the consumers on each application server form a single consumer group for the Kafka topic. Assuming that each application server is consuming a subset of the topic's partitions, how does it know which server the intended recipients are connected to? And even if it knew which server the recipients were connected to, how would it route the message to them via WebSockets?
I considered having the load balancer log users on to a specific application server by routing key (for example, users whose names start with 'A'), and then consuming only the messages for those users on that application server. But this seems like it would be difficult to maintain and would make autoscaling very difficult. This seems like it should be a common scenario to implement, but I can't find any tools or approaches that fit it.

It sounds like every single consumer should live in its own consumer group. Kafka delivers each record to every consumer group, so this way all the available consumers are going to consume all the messages sent to the topic. Therefore all the connected WebSocket clients can be notified of those messages.
If you need more complex logic for those messages after consuming them, e.g. filtering, routing, transforming or aggregating, you should consider using Spring Integration in your project: https://spring.io/projects/spring-integration
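A minimal sketch of the one-group-per-instance idea, written in plain Java without a Kafka client on the classpath. `consumerProps` builds consumer settings whose `group.id` is unique to the running instance, and `onRecord` is a hypothetical callback that your own listener would invoke for every consumed record. The class name, the `ws-node-` prefix and the session callbacks are assumptions made for illustration; deserializer settings are left out.

```java
import java.util.Map;
import java.util.Properties;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class BroadcastNode {
    // userId -> callback that writes to that user's WebSocket session (hypothetical stand-in).
    private final Map<String, Consumer<String>> localSessions = new ConcurrentHashMap<>();

    /** Consumer settings with a group id unique to this instance, so each instance sees every record. */
    static Properties consumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "ws-node-" + UUID.randomUUID()); // assumed naming scheme
        props.put("auto.offset.reset", "latest");               // new instances skip old traffic
        return props;
    }

    void connect(String userId, Consumer<String> session) {
        localSessions.put(userId, session);
    }

    void disconnect(String userId) {
        localSessions.remove(userId);
    }

    /** Call for every consumed record; returns true only if the recipient is connected to this node. */
    boolean onRecord(String recipientId, String payload) {
        Consumer<String> session = localSessions.get(recipientId);
        if (session == null) {
            return false; // recipient lives on another node, which received the same record
        }
        session.accept(payload);
        return true;
    }
}
```

Every node does the work of reading every record and drops the ones it cannot deliver, which is the cost of this approach as the number of nodes grows.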

Broadcasting to all the consumers may work, but the most efficient solution routes each message precisely to the node that holds the WebSocket connection for the target user. As far as I know, routing in a distributed system can be done in one of two ways:
Put the routing information in middleware such as Redis, or implement a service yourself to keep track of all the sessions. That is, solve it in a centralized way.
Let the WebSocket servers find the route by themselves. In this case, a peer-to-peer protocol such as gossip should be taken into consideration.
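The centralized option can be sketched as follows. This is only an illustration: an in-memory map stands in for Redis or a dedicated session service, and the `RouteRegistry` class, its method names and the `ws-outbound.<node>` naming convention are all hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class RouteRegistry {
    // userId -> id of the node currently holding that user's WebSocket connection.
    // In a real deployment this map would live in shared middleware, not in one JVM.
    private final Map<String, String> userToNode = new ConcurrentHashMap<>();

    /** Called by a node when a user's WebSocket connection is established on it. */
    void register(String userId, String nodeId) {
        userToNode.put(userId, nodeId);
    }

    /**
     * Called by a node when a connection closes. The entry is removed only if it still
     * points at that node, so a late disconnect cannot erase a newer connection elsewhere.
     */
    void unregister(String userId, String nodeId) {
        userToNode.remove(userId, nodeId);
    }

    Optional<String> nodeFor(String userId) {
        return Optional.ofNullable(userToNode.get(userId));
    }

    /** Per-node destination name (assumed convention); each node consumes only its own. */
    static String topicFor(String nodeId) {
        return "ws-outbound." + nodeId;
    }

    /** Destination to publish to for this recipient, or empty if the user is offline. */
    Optional<String> route(String recipientId) {
        return nodeFor(recipientId).map(RouteRegistry::topicFor);
    }
}
```

A sender looks up `route(recipientId)` and publishes to that one destination, so only the node holding the connection has to handle the message.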

Related

How to funnel an API call to a specific service fabric node

I have exposed a WebSocket-enabled service endpoint through Azure Application Gateway, and the service is hosted on Azure Service Fabric. The client initiates a WebSocket connection with my endpoint and is able to exchange data. During certain message flows, my WebSocket-enabled service calls other services hosted on Service Fabric using Azure Service Bus. These calls are handled in a completely async manner. Once the other services finish processing, they post a message to the service bus, which my WebSocket service reads back.
The problem I am having is routing the messages back to the right Service Fabric node so that they can be pushed to the client at the other end of the WebSocket connection.
In the picture below, you can imagine each node containing multiple services, including the WebSocket-enabled service. Once the WebSocket service posts a message to the service bus, the downstream services start processing, and finally they post a message back to the service bus, which the WebSocket service reads back. Here a random node will pick up the message, and it might not have the relevant WebSocket connection to push the processed data back.
Sample Design
I have looked at the Redis pub/sub model, and it looks like I would have to maintain the last message processed on the nodes. It also means every node in the cluster will need to read the message and discard it if it doesn't have the WebSocket connection with the client. I am looking for any suggested design models for this kind of problem.
I ran into a similar scenario and didn't like the idea of using a new external service (Redis/SQL Server) as a backplane that would simply duplicate each message/event across all nodes.
The solution I settled on was to lean on a property of actor proxies, using actor events to call back to a specific instance of a stateless service, and to create an actor service that acts as a pub/sub backplane.
The solution is summarised in this blog post and this GitHub repo. It's worth pointing out that the documentation states actor events are best effort. This hasn't really been an issue when the application is running as normal. I presume that some events may get lost during a deployment or failover; however, this could be mitigated with additional work.
It's also worth noting that your load balancing rules should maintain sticky connections between clients and back-end instances. You could create separate rules for websockets if you only wanted this to apply to them and not your regular HTTP traffic.

How to set up a load balancer for clustered MQTT brokers

I'm working on a project to develop a real-time mobile messaging application that needs advanced message filtering based on message content and the user's balance, meaning that if the user has run out of balance or is sending content that violates the policy, the messages have to be blocked.
For this reason I need to implement some load balancing solution that scans published messages and could also determine if the message should be blocked based on the rules above, hence I can't implement a basic proxy as I need special rules applied on each message.
The difficult part:
Preferably, the mobile app needs to receive subscription messages (and the connection acknowledgement too) without passing through the load balancer (see my next point).
The problem is that the only way I could forward subscription messages to the mobile app would be by handling connections and subscriptions in the load balancer, which is disastrous. I need the connections to be transparent and the load balancer to be stateless.
How can I accomplish this? (If it's of any help, my current design involves a Java component with Spring Boot for load balancing and VerneMQ as the message broker.)
Try:
Mobile App --> MQTT Broker --> Message Scan / Block Algorithm --> MQTT Broker --> Subscriber
Your mobile app should have the intelligence to stop messages when the app runs out of balance after the first message.
So the MQTT broker should not send a message straight to its subscribers when it first arrives. It should only send out messages that have already been through processing.
I'm not sure anyone has a ready-made solution for this flow, but it is doable.
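The scan/block stage in the flow above could look roughly like this. It is a sketch under stated assumptions, not a VerneMQ or MQTT client API: the `MessageGate` class, the one-credit-per-message billing rule, the lower-case banned-word list and the `inbound/` and `outbound/` topic prefixes are all hypothetical. The idea is that publishers write to an inbound topic, this component subscribes to it, and only admitted messages are republished on the outbound topic that subscribers listen to.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class MessageGate {
    private final Map<String, Long> balances = new ConcurrentHashMap<>();
    private final List<String> bannedWords; // expected in lower case

    MessageGate(List<String> bannedWords) {
        this.bannedWords = List.copyOf(bannedWords);
    }

    void setBalance(String userId, long credits) {
        balances.put(userId, credits);
    }

    /**
     * Returns the topic to republish on, or empty if the message must be blocked.
     * An admitted message costs the sender one credit (assumed billing rule).
     */
    Optional<String> admit(String senderId, String inboundTopic, String payload) {
        String lower = payload.toLowerCase(Locale.ROOT);
        for (String word : bannedWords) {
            if (lower.contains(word)) {
                return Optional.empty(); // content violates the policy
            }
        }
        // Decrement atomically, and only while the balance is still positive.
        boolean[] charged = {false};
        balances.computeIfPresent(senderId, (id, credits) -> {
            if (credits > 0) {
                charged[0] = true;
                return credits - 1;
            }
            return credits;
        });
        if (!charged[0]) {
            return Optional.empty(); // unknown sender or balance exhausted
        }
        return Optional.of(inboundTopic.replaceFirst("^inbound/", "outbound/"));
    }
}
```

Because subscribers only ever see the outbound topics, the mobile app still talks to the broker directly and the gate itself holds no connection state.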

I need to build a Vert.x virtual host server that channels traffic to other Vert.x apps. How is this kind of inter-app communication accomplished?

As illustrated above, I need to build a Vert.x Java app that will be an HTTP server/virtual host (TLS HTTP traffic, WebSocket traffic) that will redirect/channel traffic for specific domains to other Vert.x Java apps running on the same server, each in its own JVM.
I have been reading for days but I remain uncertain as to how to approach all aspects of the task.
What I DO know or have experience with:
Creating an HTTP server, etc
Using a Vert.x VirtualHost handler to "handle" incoming traffic for a specific domain
What I DO NOT know:
How do I "re-direct" a domain's traffic to another Vert.x app (this other Vert.x app would also be running on the same server, in its own JVM)?
- Naturally this "other" Vert.x app would need to respond to HTTP requests, etc. What Vert.x mechanisms do I employ to accomplish this aspect of the task?
Are any of the following concepts part of the solution? I'm unfamiliar with them and with how they may or may not fit in:
Running each Vert.x app using -cluster option?
Vert.x Streams?
Vert.x Pumps?
There are multiple ways to let your microservices communicate with each other. The fact that all your apps are running on the same server doesn't change much, but it makes option 2.) easy to configure.
1.) REST-based client-server communication
Both the host and the apps have a web server.
When you handle the incoming requests on the host, you simply call another app with an HttpClient.
Typically all services find each other's addresses via service discovery.
E.g. each service registers its address in a central registry, and other services then use this registry to find the addresses.
Note: this may be overkill for you, and you can just configure the addresses of the other services.
2.) You start the Vert.x microservices in clustered mode
The event bus is then shared among the services.
For all incoming requests you send a broadcast on the event bus.
The responsible app replies to the message.
For further reading, check out https://vertx.io/docs/vertx-hazelcast/java/#configcluster. You start your projects with the -cluster option and define the clustering in an XML configuration. I think by default it finds the services via multicast on the local network.
3.) You use a message broker like RabbitMQ etc.
All your apps connect to a central message broker.
When a new request comes in to the host, it sends a message to the message broker.
The responsible app listens for the relevant messages and replies.
The host receives the reply from the message broker.
There are already many existing Vert.x integrations, for example for Kafka, Camel and ZeroMQ:
https://github.com/vert-x3/vertx-awesome#integration
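For option 1.) with statically configured addresses, the host's core job is a lookup from the request's Host header to the backend app that serves that domain. The sketch below uses only the JDK and deliberately avoids Vert.x calls; the `VirtualHostTable` class, the example domains and the ports are hypothetical. In the host app you would perform this lookup inside the request handler and forward the request to the returned URI with an HTTP client.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class VirtualHostTable {
    // domain name (lower case) -> base URI of the app serving it, e.g. http://127.0.0.1:8081
    private final Map<String, URI> backends;

    VirtualHostTable(Map<String, URI> backends) {
        this.backends = Map.copyOf(backends);
    }

    /**
     * Resolves the backend URI for a request.
     * hostHeader is the raw Host header, possibly with a port; this sketch assumes
     * a DNS name rather than an IPv6 literal. pathAndQuery must start with '/'.
     */
    Optional<URI> target(String hostHeader, String pathAndQuery) {
        if (hostHeader == null) {
            return Optional.empty();
        }
        String host = hostHeader.toLowerCase(Locale.ROOT);
        int colon = host.lastIndexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon); // drop the port
        }
        URI base = backends.get(host);
        if (base == null) {
            return Optional.empty(); // unknown domain: the host can answer 404 itself
        }
        return Optional.of(base.resolve(pathAndQuery));
    }
}
```

With a table mapping `app1.example.com` to `http://127.0.0.1:8081`, a request for `/api/users?id=7` on that domain resolves to `http://127.0.0.1:8081/api/users?id=7`.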

What is the best way to return data to users when using event-driven microservices?

I am using a microservice architecture for my current project, with RabbitMQ as the message broker. My issue is determining the best way to make "requests" to the microservices and return the eventual response. Currently I have a socket.io socket running: the browser client connects to it and sends events, the socket server reads the events and publishes them to RabbitMQ, and they are then consumed by the services.
So my question:
Is my current setup good enough to just keep using, or are there better ways?

Socket.IO with RabbitMQ?

I'm currently using Socket.IO with redis store.
And I'm using the Room feature with it.
So I'm totally okay with Room join (subscribe) and leave (unsubscribe) with Socket.IO.
I just saw this page:
http://www.rabbitmq.com/blog/2010/11/12/rabbitmq-nodejs-rabbitjs/
And I have found that some people are using Socket.IO with RabbitMQ.
Why is using Socket.IO alone not good enough?
Is there any good reason to use Socket.IO with rabbitMQ?
SocketIO is a browser --> server transport mechanism whereas RabbitMQ is a server --> server message bus.
The two can be implemented together to create a very responsive system in scenarios where a user journey consists of a message starting life on a browser and ending up in, say, some persistence layer (such as a database).
A message would be transported to the web server via socketIO and then, instead of the web server being responsible for persisting the message, it would drop it on a Rabbit queue and leave some other process responsible for persisting it. This way, the web server is free to return to its web-serving responsibilities and, crucially, its load is reduced.
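The hand-off described above can be sketched in a single JVM, with an in-memory `BlockingQueue` standing in for the Rabbit queue and a list standing in for the database. The `PersistenceHandoff` class and its method names are hypothetical; in a real system the worker would be a separate process consuming from RabbitMQ.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class PersistenceHandoff {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stand-in for the Rabbit queue
    private final List<String> store = new CopyOnWriteArrayList<>();         // stand-in for the database
    private final Thread worker;

    PersistenceHandoff() {
        // The worker plays the role of the separate process that persists messages.
        worker = new Thread(this::drain, "persistence-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called by the web tier for each socket message: enqueue and return immediately. */
    boolean accept(String message) {
        return queue.offer(message);
    }

    private void drain() {
        try {
            while (true) {
                store.add(queue.take()); // blocks until a message is available
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }

    /** Snapshot of what has been persisted so far. */
    List<String> persisted() {
        return List.copyOf(store);
    }
}
```

`accept` returns as soon as the message is queued, so the socket handler never waits on the slower persistence step.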
Take a look at SockJS: http://sockjs.org
It's made by the RabbitMQ team.
It's simpler than Socket.io.
There's an Erlang server for SockJS.
Apart from that, there is an experimental project within the RabbitMQ team that intends to provide a SockJS plugin for RabbitMQ.
I just used RabbitMQ with socket.io for a totally different reason than the one in the accepted answer. It wasn't that relevant in 2012, which is why I'm updating here.
I'm using a docker swarm deployment of a chat application with scalability and high availability. I have three replicas of the chat application (which uses socket.io) running in the cluster. The swarm cluster automatically load-balances the incoming requests and at any given time a client might get connected to any of the three replicas of the application.
In this scenario, it becomes necessary to sync the WebSocket responses across the replicas of the application, because otherwise two clients connected to two different instances wouldn't get each other's messages.
This is where RabbitMQ comes in. It syncs all the instances of the application: whenever a message is pushed from a WebSocket on one replica, it gets pushed by all replicas.
Complete details of the project have been given here. This is a potential use case for socket.io and RabbitMQ in conjunction. It applies to any application using socket.io in a distributed environment with high availability and scalability.
