Do I need session-clustering on a DB for load balancing a Jetty WebSockets server with HAProxy on AWS/EC2?

I am writing a chat-like application on top of WebSockets, using a Jetty 9.3.7 WebSocket server running on AWS/EC2. A description of the architecture is below:
(a) The servers are based on HTTPS (wss). I am thinking of using HAProxy with IP-hash-based load balancing for this. The system architecture will look like this:
clients (a, b, ..., z) -> {HAProxy LB} --> wss1: WebSocket server 1
                                       --> wss2: WebSocket server 2
                                       --> wss3: WebSocket server 3
I am terminating HTTPS/wss on the LB per these instructions.
(b) Clients a...z connect to the system and will connect variously to wss1, wss2 or wss3 etc.
(c) Now, my WebSocket application works as follows. When one of the clients pushes a message, it is sent to the WS server that client is connected to (say wss1), and that message is then disseminated to a few of the other clients (the set of clients being programmatically determined by my WebSocket application running on wss1). E.g., a creates the message "Hey guys!" and pushes it to wss1, which then pushes it to clients b and c so that both b and c receive "Hey guys!". b has a WebSocket connection to server wss2 and c has a WebSocket connection to wss3.
My question is: to push the message from the receiving server, as in (c) above, wss1 needs to know the WebSocket sessions/connections for b and c, which may well live on different WebSocket servers. Can I use session clustering on Jetty to look up the sessions that b and c are connected through? If not, what's the best way to provide this lookup while load balancing Jetty WebSockets?
Second, if I do use session clustering or some such method to retrieve the sessions, how can I use the sessions for b and c on wss1 to send the message to b and c? It appears that there is no way to do this except with some sort of communication between the servers. Is this correct?
If I have to use session clustering for this, is there a github example you can point me to?
Thanks!

I think session clustering is not the right tool. Message Oriented Middleware (MOM) supporting the publish-subscribe model should be enough to cluster multiple real-time applications. As the author of Cettia, a real-time application framework, I've used the publish-subscribe model to scale applications horizontally.
The basic idea is:
A message to be exchanged through the MOM represents an operation to be applied on each server locally. For example, an operation could be 'send a message to all clients'. Here, 'all clients' means the clients connected to the server executing that operation.
Every server subscribes to the same topic on the MOM. When a message arrives on that topic, each server deserializes it into an operation and executes the operation locally. This happens on every server.
When an operation originates on some server, that server serializes it into a message and publishes it to the topic.
With Cettia, all you need to do is plug your MOM into the Cettia application. If you want to build this from scratch, you need to implement the above ideas yourself (a sketch follows the example links below).
http://cettia.io/projects/cettia-java-server/1.0.0-Beta1/reference/#clustering
https://github.com/cettia/cettia-java-server/blob/1.0.0-Beta1/server/src/main/java/io/cettia/ClusteredServer.java
Here are working examples for several MOMs. Though they are written with Cettia, they might help you understand how the above idea works.
AMQP 1
Hazelcast 3
jGroups 3
JMS 2
Redis 2
Vert.x 2
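For the from-scratch route, here is a minimal sketch of the publish-subscribe idea in Jetty terms, assuming Redis as the MOM and the Jedis client (both are illustrative choices; any MOM with pub/sub works the same way, and the "recipientId|text" wire format is made up for the example). Each wss node keeps only its own local sessions and applies every operation arriving on the shared topic to those local sessions; nodes that don't hold the recipient simply do nothing.

    import org.eclipse.jetty.websocket.api.Session;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPubSub;

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ClusteredChatNode {
        private static final String TOPIC = "chat-operations";          // shared topic name (assumed)
        private final Map<String, Session> localSessions = new ConcurrentHashMap<>();

        // Called from your @OnWebSocketConnect handler on this node only.
        public void register(String clientId, Session session) {
            localSessions.put(clientId, session);
        }

        // Called when a locally connected client pushes a message: serialize the operation
        // ("deliver <text> to <recipient>") and publish it to the shared topic.
        public void publish(Jedis publisher, String recipientId, String text) {
            publisher.publish(TOPIC, recipientId + "|" + text);
        }

        // Every node runs this subscriber (on its own thread, with its own Jedis connection,
        // since subscribe() blocks). Each node applies the operation to its local sessions only.
        public void subscribe(Jedis subscriber) {
            subscriber.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    String[] parts = message.split("\\|", 2);
                    Session session = localSessions.get(parts[0]);
                    if (session != null && session.isOpen()) {
                        try {
                            session.getRemote().sendString(parts[1]);
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }, TOPIC);
        }
    }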

Related

ActiveMQ - Stomp over websockets - Same Origin Policy

I have a process that runs in California that wants to talk to a process in New York, using Stomp over Websockets.
Also note that my process is not a web app; I implemented a STOMP-over-WebSocket client in C++ in order to connect things up to my backend. Maybe this was or wasn't a good idea. So, I want my client to talk to the server and subscribe where the other clients push messages.
I was implementing my own server when I saw that ActiveMQ supported STOMP over WebSockets. So, I started reading the docs.
It says, in the last line under 'Configuration' at http://activemq.apache.org/websockets :
One thing worth noting is that web sockets (just as Ajax) implements the same origin policy, so you can access only brokers running on the same host as the web application running the client.
It says the same thing in several related pages, such as http://sensatic.net/activemq/activemq-54-stomp-over-web-sockets.html
Is this a limitation of the server or the web client?
With that limitation, if I understand right, the server is not going to accept websocket connections from a client, of any kind, that is not on the same machine?
I am not sure I see the point of that...
If that is indeed its meaning, then how do I get around it in order to implement my scenario?
I've not found the bit of documentation you are referring to, but from what I know of the STOMP implementation on the broker this seems incorrect. There shouldn't be any limit on the transport connector accepting connect requests from an outside host by default, and I don't think the browser treats WebSocket requests the same way it treats an Ajax request in terms of the same origin policy.
This is probably a case that is best checked by actually trying it to see if it works. I've connected just fine from outside the same host using AMQP over WebSockets on ActiveMQ, so I'd guess the STOMP stack should also work fine.

Routing messages from Kafka to web socket clients connected to application server cluster

I would like to figure out the best way to route messages from Kafka to web socket clients connected to a load balanced application server cluster. I understand that spring-kafka facilitates consuming and publishing messages to a kafka topic, but how does this work in a load balanced application server scenario when connecting to a distributed kafka topic. Here are the requirements that I would like to satisfy, with the overall goal of facilitating peer to peer messaging in an application with a very, very large volume of users:
Web clients can connect to a Tomcat application server via a WebSocket connection through a load balancer.
A web client can send a message/notification to another client that's connected to a different Tomcat application server.
Messages are saved in the database and published to a kafka topic/partition that can be consumed by the appropriate web clients/users.
Kafka can be scaled to many brokers with many consumers.
I can see how this can be implemented quite easily in a single application server scenario where the consumer consumes all messages from a kafka topic and re-distributes them via spring messaging/websockets. But I can't figure out how this would work in a load balanced application server scenario where there are consumers on each application server forming an overall consumer group for the kafka topic. Assuming that each of the application servers is consuming sub-sets/partitions of the kafka topic, how do they know which server their intended recipients are connected to? And even if they knew which server their recipients were connected to, how would they route the message to them via websockets?
I considered that the application server load balancing could work by logging users with a particular routing key (usernames starting with 'A', etc.) on to a specific application server, then only consuming messages for users starting with 'A' on that application server. But this seems like it would be difficult to maintain and would make autoscaling very difficult. This seems like it should be a common scenario to implement, but I can't find any tools or approaches that fit it.
Sounds like every single consumer should live in its own consumer group. This way all the available consumers are going to consume all the messages sent to the topic. Therefore all the connected websocket clients are going to be notified of those messages.
If you need more complex logic with those messages after consuming, e.g. filtering, routing, transforming, aggregating, etc., you should consider involving Spring Integration in your project: https://spring.io/projects/spring-integration
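A minimal sketch of that suggestion using the plain kafka-clients API (the topic name "chat-messages", the bootstrap address, and the group-id scheme are illustrative assumptions, not from the question):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import java.util.UUID;

    public class InstanceFanOutConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // A unique group id per application server instance means every instance
            // receives every record on the topic (broadcast), instead of the records
            // being partitioned across the instances.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "ws-node-" + UUID.randomUUID());
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("chat-messages"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Each instance forwards the record only to the recipients that hold
                        // a WebSocket session on this instance; others simply ignore it.
                        System.out.printf("key=%s value=%s%n", record.key(), record.value());
                    }
                }
            }
        }
    }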
Broadcasting to all the consumers may work, but the most efficient solution should route each message precisely to the node that holds the WebSocket connection for the target user. As far as I know, routing in a distributed system can be done as follows:
Put the route information in a middleware, such as Redis, or implement a service yourself to keep track of all the sessions. That is, solve it in a centralized way (see the sketch after this list).
Let the WebSocket servers find the route by themselves. In this case, a gossip-style protocol should be taken into consideration.
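A minimal sketch of the first, centralized option, using Redis via the Jedis client (Jedis and the key scheme "route:<userId>" are illustrative choices, not from the answer):

    import redis.clients.jedis.Jedis;

    public class RouteTable {
        private final Jedis jedis;

        public RouteTable(Jedis jedis) {
            this.jedis = jedis;
        }

        // Called when a user's WebSocket connects to this node; the entry expires so
        // stale routes disappear if a node dies without cleaning up.
        public void register(String userId, String nodeId) {
            jedis.setex("route:" + userId, 3600, nodeId);
        }

        // Called when the user disconnects.
        public void unregister(String userId) {
            jedis.del("route:" + userId);
        }

        // The consumer (or any sender) looks up which node holds the user's
        // connection and forwards the message to that node only.
        public String lookup(String userId) {
            return jedis.get("route:" + userId);
        }
    }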

When to choose a remote queue design versus local queue for get/put activities

I'm trying to figure out under what conditions I would want to implement a remote queue versus a local one for 2 endpoint applications.
Consider this scenario: App A on Server A needs to send messages to App B on Server B via MQServer1.
It seems like the simplest configuration would be to create a single local queue on MQServer1 and configure AppA to put messages to the local queue while configuring AppB to get messages from the same local queue. Both AppA and AppB would connect to the same Queue Manager but execute different commands.
What sort of circumstances would require installing another MQ server (e.g. MQServer2) and configuring a remote queue on MQServer1, which instead sends the messages from AppA over a channel to a local queue on MQServer2 to be consumed by AppB?
I believe I understand the benefit of remote queuing, but I'm not sure when it's best used over a simpler design.
Here are some problems with what you call the simpler design that you don't have with remote queuing:
Time Independence - Server 1 has to be available all the time, whereas with a remote queue, once the messages have been moved to Server B, Server A and Server 1 don't need to be online when App B wants to get its messages.
Network Efficiency - with two client applications putting or getting from a central queue, you have two inefficient network hops, instead of one efficient channel batched network connection from Server A to Server B (no need for Server 1 in the middle)
Network Problems - No network, no messages. Whereas when they are stored locally, any that have already arrived can be processed even while the network is down. Likewise, the application putting messages is also not held up by a network problem; the messages sit on the transmission queue ready to be moved, and the application can get on with the next thing.
Of course your applications should be written so that they aren't even aware of the difference, and it's just configuration changes that switch you from one design to the other.
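To illustrate that last point, here is a minimal JMS sketch using the IBM MQ classes for JMS: the application only names a queue and puts a message; whether that name resolves to a local queue or to a remote queue definition pointing at another queue manager is purely configuration. (The connection details, channel name, and queue name are illustrative assumptions.)

    import com.ibm.msg.client.jms.JmsConnectionFactory;
    import com.ibm.msg.client.jms.JmsFactoryFactory;
    import com.ibm.msg.client.wmq.WMQConstants;

    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class AppAProducer {
        public static void main(String[] args) throws Exception {
            JmsFactoryFactory ff = JmsFactoryFactory.getInstance(WMQConstants.WMQ_PROVIDER);
            JmsConnectionFactory cf = ff.createConnectionFactory();
            cf.setStringProperty(WMQConstants.WMQ_HOST_NAME, "serverA");     // assumed host
            cf.setIntProperty(WMQConstants.WMQ_PORT, 1414);
            cf.setStringProperty(WMQConstants.WMQ_CHANNEL, "APP.SVRCONN");   // assumed channel
            cf.setStringProperty(WMQConstants.WMQ_QUEUE_MANAGER, "QMA");
            cf.setIntProperty(WMQConstants.WMQ_CONNECTION_MODE, WMQConstants.WMQ_CM_CLIENT);

            try (Connection connection = cf.createConnection()) {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                // The app only knows the queue name; QMA decides whether this is a local
                // queue or a remote queue definition pointing at App B's queue manager.
                Queue queue = session.createQueue("QUEUE.TO.APPB");          // assumed queue name
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage("hello App B");
                producer.send(message);
            }
        }
    }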
Here we can have a separate queue manager for each application. Application A sends the message to the queue defined on its local queue manager, which in turn puts it on the transmission queue; the defined channels (which need to be configured on the queue managers) then move it to the local queue of Application B.

Does websocket only broadcast the data to all connected clients instead of sending to a particular client?

I am new to WebSockets. While reading about WebSockets, I have not been able to find answers to some of my doubts. I would appreciate it if someone could clarify them.
Does websocket only broadcast the data to all connected clients instead of sending to a particular client? Whatever examples (mainly chat apps) I tried, they send data to all the clients. Is it possible to alter this?
How does it work for clients located behind NAT (behind a router)?
Since the client-server connection will always remain open, how will it affect server performance for a large number of connections?
Since I want all my clients to get real-time updates, all of them need to be connected to the server, so how should I handle the client connection limit?
NOTE:- My client is not a Web browser but a desktop application.
No, WebSocket is not only for broadcasting. You send messages to specific clients; when you broadcast, you just send the same message to all connected clients, but you can send different messages to different clients, for example in a game session (see the sketch after this answer).
The clients connect to the server and initialise the connections, so NAT is not a problem.
It's good to use a scalable server, e.g. an event-driven server (such as Node.js) that doesn't use a separate thread for each connection, or an Erlang server with lightweight processes (a good choice for a game server).
This should not be a problem if you use a good server OS (e.g. Linux), but may be a limitation if your server uses a desktop version of Windows (e.g. may be limited to 200 connections).
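Since the main question is about Jetty, here is a minimal sketch in Jetty 9 WebSocket terms of sending to one particular client versus broadcasting to all connected clients. The "first message is the client's id" handshake and the "recipientId:payload" format are illustrative assumptions.

    import org.eclipse.jetty.websocket.api.Session;
    import org.eclipse.jetty.websocket.api.annotations.OnWebSocketClose;
    import org.eclipse.jetty.websocket.api.annotations.OnWebSocketMessage;
    import org.eclipse.jetty.websocket.api.annotations.WebSocket;

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    @WebSocket
    public class TargetedChatSocket {
        // Shared across all socket instances on this server.
        private static final Map<String, Session> SESSIONS = new ConcurrentHashMap<>();

        @OnWebSocketMessage
        public void onMessage(Session session, String text) throws IOException {
            if (!SESSIONS.containsValue(session)) {
                SESSIONS.put(text.trim(), session);              // first message = client id (assumed protocol)
                return;
            }
            int sep = text.indexOf(':');
            if (sep > 0) {
                // Targeted send: only the named recipient gets the payload.
                Session target = SESSIONS.get(text.substring(0, sep));
                if (target != null && target.isOpen()) {
                    target.getRemote().sendString(text.substring(sep + 1));
                }
            } else {
                // Broadcast: every connected client gets the same message.
                for (Session s : SESSIONS.values()) {
                    if (s.isOpen()) {
                        s.getRemote().sendString(text);
                    }
                }
            }
        }

        @OnWebSocketClose
        public void onClose(Session session, int statusCode, String reason) {
            SESSIONS.values().remove(session);
        }
    }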

Socket.IO with RabbitMQ?

I'm currently using Socket.IO with redis store.
And I'm using Room feature with it.
So I'm totally okay with Room join (subscribe)
and Leave (unsubscribe) with Socket.IO.
I just saw this page:
http://www.rabbitmq.com/blog/2010/11/12/rabbitmq-nodejs-rabbitjs/
And I have found that some people are using Socket.IO with RabbitMQ.
Why using Socket.IO alone is not good enough?
Is there any good reason to use Socket.IO with rabbitMQ?
SocketIO is a browser --> server transport mechanism whereas RabbitMQ is a server --> server message bus.
The two can be implemented together to create a very responsive system in scenarios where a user journey consists of a message starting life on a browser and ending up in, say, some persistence layer (such as a database).
A message would be transported to the web server via Socket.IO and then, instead of the web server being responsible for persisting the message, it would drop it on a Rabbit queue and leave some other process responsible for persisting it. This way, the web server is free to return to its web serving responsibilities and, crucially, its load is lessened.
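A minimal sketch of that hand-off using the RabbitMQ Java client (the answer's own stack would be Node/Socket.IO, but the idea is the same; the queue name "chat-persistence" is an illustrative assumption). The web tier publishes and returns immediately; a separate worker consumes from the queue and persists.

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    import java.nio.charset.StandardCharsets;

    public class PersistenceHandOff {
        private static final String QUEUE = "chat-persistence";   // assumed queue name

        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");

            try (Connection connection = factory.newConnection();
                 Channel channel = connection.createChannel()) {
                // Durable queue so messages survive a broker restart.
                channel.queueDeclare(QUEUE, true, false, false, null);

                // Web tier: drop the incoming WebSocket message on the queue and return
                // to serving requests; persistence happens elsewhere.
                String incoming = "user42: Hey guys!";
                channel.basicPublish("", QUEUE, null, incoming.getBytes(StandardCharsets.UTF_8));
            }
        }
    }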
Take a look at SockJS http://sockjs.org .
It's made by the RabbitMQ team
It's simpler than Socket.io
There's an Erlang server for SockJS
Apart from that, there is an experimental project within the RabbitMQ team that intends to provide a SockJS plugin for RabbitMQ.
I just used RabbitMQ with Socket.IO for a totally different reason than in the accepted answer. It wasn't that relevant in 2012; that's why I'm adding an update here.
I'm using a docker swarm deployment of a chat application with scalability and high availability. I have three replicas of the chat application (which uses socket.io) running in the cluster. The swarm cluster automatically load-balances the incoming requests and at any given time a client might get connected to any of the three replicas of the application.
In this scenario, it becomes necessary to sync the WebSocket messages across the replicas of the application, because two clients connected to two different instances of the application wouldn't get each other's messages, since they are connected to different WebSocket servers.
This is where RabbitMQ comes in. It syncs all the instances of the application: whenever a message is pushed to a WebSocket on one replica, it gets pushed out by all replicas.
Complete details of the project have been given here. This is a potential use case of socket.io and rabbitMQ use in conjunction. This goes for any application using socket.io in a distributed environment with high availability and scalability.
