I am quite new to WebSocket technology. I am building an application that requires real-time communication between users. The messages (chat data) are going to be persisted into Cassandra (a NoSQL DB). The application is going to have at least 800 users online at any given moment. Communication between users needs to be super fast, but persisting data into a DB takes some time (nanoseconds, microseconds, or whatever).
I was thinking that maybe it would be best to push all the messages to be persisted into a message queue, which will be consumed by a RabbitMQ consumer whose main job is to save messages into the DB.
I am still a junior developer and would be extremely grateful for any suggestions. Thanks.
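The idea above (deliver first, persist in the background) can be sketched in a few lines. This is only an illustration under assumptions: `queue.Queue` stands in for the RabbitMQ queue, `save_to_db` is a hypothetical placeholder for the Cassandra write, and `deliver` is whatever pushes the message over the WebSocket.

```python
import queue
import threading

# Hypothetical stand-in for the Cassandra write; a real consumer would
# run an INSERT through the database driver here.
saved_messages = []

def save_to_db(message):
    saved_messages.append(message)

pending = queue.Queue()  # stands in for the RabbitMQ queue

def persistence_worker():
    """Drain the queue and persist each message, off the chat hot path."""
    while True:
        message = pending.get()
        if message is None:  # sentinel: stop the worker
            pending.task_done()
            break
        save_to_db(message)
        pending.task_done()

def handle_chat_message(message, deliver):
    """Deliver to the recipient first, then enqueue for persistence."""
    deliver(message)      # fast path: push over the WebSocket
    pending.put(message)  # slow path: persisted asynchronously

worker = threading.Thread(target=persistence_worker, daemon=True)
worker.start()

delivered = []
handle_chat_message({"from": "alice", "to": "bob", "text": "hi"}, delivered.append)
handle_chat_message({"from": "bob", "to": "alice", "text": "hello"}, delivered.append)

pending.put(None)  # ask the worker to stop
pending.join()     # wait until everything queued has been processed
```

The trade-off is that a message can reach the recipient before it is durable, so a crash between delivery and persistence can lose it unless the queue itself is durable.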
Related
I am building a scalable chat application using Go and Redis w/ websockets.
I need to publish a new message using the Redis pub/sub model to the other websocket servers, to inform all the users (held in memory on those servers) about the newly joined user.
But the issue is that the publisher (also a Redis client) receives the same message. Is there a direct way to solve this?
Workaround:
On every received event, check (on the publisher's side) whether the newly joined user is already in the list of current local users.
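An alternative to the lookup above is to tag every published event with the ID of the server that sent it and drop events carrying your own ID on receipt. A minimal, language-neutral sketch (shown in Python rather than Go; `SERVER_ID`, `encode_event` and `decode_foreign_event` are hypothetical names, and the Redis publish/subscribe calls themselves are left out):

```python
import json
import uuid

SERVER_ID = str(uuid.uuid4())  # hypothetical per-process identifier

def encode_event(event_type, payload, origin=SERVER_ID):
    """Wrap the payload in an envelope that records which server sent it."""
    return json.dumps({"origin": origin, "type": event_type, "payload": payload})

def decode_foreign_event(raw, own_id=SERVER_ID):
    """Return the event, or None if this server published it itself."""
    event = json.loads(raw)
    if event["origin"] == own_id:
        return None
    return event

# An event this server published, and one from another server.
own = encode_event("user_joined", {"user": "alice"})
other = encode_event("user_joined", {"user": "bob"}, origin="server-2")
```

This keeps the check constant-time and independent of how many local users are connected.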
WHY NEGATIVE VOTES? I'm so pissed at stack-overflow these days. People have no tolerance or too much arrogance
Is such a situation even possible?
There is an application "XYZ" (in which there is no Kafka) that exposes a REST API. It is a Spring Boot application with which an Angular application communicates.
A new application (SpringBoot) is created which wants to use Kafka and needs to fetch data from "XYZ" application. And it wants to do this using Kafka.
The "XYZ" application has an example endpoint [GET] api/message/all which displays all messages.
Is there a way to "connect" Kafka directly to this endpoint and read data from it? In short, the idea is for Kafka to consume data directly from the endpoint: communication between two microservices, where one microservice does not have Kafka.
What suggestions do you have for solving this situation? I guess this option is not possible. Is it necessary to add a publisher in application XYZ which will send data to the queue, so that only then will it be available for consumption by the new application?
Getting them via the REST-Interface might not be a very good idea.
Simply put, in the messaging world, message delivery guarantees are a big topic, and the standard ways to solve that with Kafka are usually one of the following:
Producing messages from your service using the Producer-API to a Kafka topic.
Using Kafka-Connect to read from an outbox-table.
Since you most likely have a database already attached to your API service, the problem of dual writes might arise if you choose to produce the messages directly to a topic. This means that a write to the database might fail while the corresponding write to Kafka succeeds, or vice versa, so you can end up with inconsistent state. Depending on your use case this might or might not be a problem.
Nevertheless, to overcome that, the outbox pattern can come in handy.
Via the outbox pattern, you'd basically write your messages to a table, a so-called outbox table, and then you'd use Kafka Connect to poll this table of the database. Kafka Connect is basically a cluster of workers that consume this database table and forward its entries to a Kafka topic. You might want to look at Confluent Cloud, which offers a fully managed Kafka Connect service, so you don't have to manage the cluster of workers yourself. Once you have the messages in a Kafka topic, you can consume them with the standard Kafka Consumer API or Streams API.
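The point of the outbox table is that the business row and the event are written in the same database transaction. A rough sketch of that idea, assuming an SQLite database and hypothetical table names (`messages`, `outbox`); `poll_outbox` only imitates conceptually what a polling source connector would do and is not Kafka Connect itself:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)

def create_message(conn, body):
    """Insert the business row and its outbox event in one transaction."""
    with conn:  # commits on success, rolls back if either insert raises
        cur = conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        message_id = cur.lastrowid
        event = {"message_id": message_id, "body": body}
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("messages", json.dumps(event)),
        )
    return message_id

def poll_outbox(conn, after_id=0):
    """Read outbox rows newer than after_id, in insertion order."""
    return conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE id > ? ORDER BY id",
        (after_id,),
    ).fetchall()

first = create_message(conn, "hello")
second = create_message(conn, "world")
rows = poll_outbox(conn)
```

Because both inserts share one transaction, there is never a message row without its event or an event without its message row, which is exactly the dual-write inconsistency described above.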
What you're looking for is a source connector for your specific database, e.g. MongoDB: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
For now, most source connectors produce in an at-least-once fashion. This means that the topic you configure the connector to write to might contain the same message more than once. So if you need messages to be processed exactly once, make sure you think about deduplicating them.
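Deduplication on the consumer side can be as simple as remembering which message IDs have already been handled. A minimal sketch, assuming every record carries a unique `id` field (an assumption, not something the connector guarantees) and using an in-memory set where a real consumer would need a persistent store:

```python
class DeduplicatingConsumer:
    """Pass each record to the handler at most once, keyed by record ID."""

    def __init__(self, handler):
        self._handler = handler
        self._seen_ids = set()  # in production this would be a persistent store

    def consume(self, record):
        """Return True if the record was processed, False if it was a duplicate."""
        msg_id = record["id"]
        if msg_id in self._seen_ids:
            return False
        self._handler(record)
        self._seen_ids.add(msg_id)  # mark as seen only after successful handling
        return True

processed = []
consumer = DeduplicatingConsumer(processed.append)
results = [
    consumer.consume(record)
    for record in [
        {"id": 1, "body": "a"},
        {"id": 2, "body": "b"},
        {"id": 1, "body": "a"},  # redelivery of the first record
    ]
]
```

An alternative with the same effect is to make the handler itself idempotent, for example by upserting on the message ID.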
I'm building a Web Application that consumes data pushed from Server.
Each message is JSON and could be large (hundreds of kilobytes). Messages are sent a couple of times per minute, and the order doesn't matter.
The server should be able to persist not-yet-delivered messages, potentially storing a couple of megabytes per client for a couple of days, until the client comes online. There's a limit on the storage size for unsent messages, say 20 MB per client, and old undelivered messages get deleted when this limit is exceeded.
The server should be able to handle around 1,000 simultaneous connections. How could this be implemented simply?
Possible Solutions
I was thinking maybe I could store messages as files on disk and have the browser poll every 1 second to check for new messages, serving them with nginx or something like that. Are there any configs or modules for nginx for such use cases?
Or maybe it's better to use an MQTT server or some message queue like RabbitMQ with a browser adapter?
Actually, MQTT supports the concept of sessions that persist across client connections, but the client must first connect and request a "non-clean" session. After that, if the client is disconnected, the broker will hold all the QoS=1 or 2 messages destined for that client until it reconnects.
With MQTT v3.x, technically, the server is supposed to hold all the messages for all these disconnected clients forever! Each message maxes out at a 256 MB payload, but the server is supposed to hold everything you give it. This created a big problem for servers, which MQTT v5 came in to fix by adding session and message expiry intervals. And most real-world brokers have configurable settings around this.
But MQTT shines if the connections are over unreliable networks (wireless, cell modems, etc) that may drop and reconnect unexpectedly.
If the clients are connected over fairly reliable networks, AMQP with RabbitMQ is considerably more flexible, since clients can create and manage the individual queues. But the neat thing is that you can mix the two protocols using RabbitMQ, as it has an MQTT plugin. So, smaller clients on an unreliable network can connect via MQTT, and other clients can connect via AMQP, and they can all communicate with each other.
MQTT is most likely not what you are looking for. The protocol is meant to be lightweight and, as the comments pointed out, the protocol specifies that there may only exist "Control Packets of size up to 268,435,455 (256 MB)". Clearly, this is much too small for your use case.
Moreover, if a client isn't connected (and subscribed to that particular topic) at the time the message is published, the message will never be delivered. EDIT: As @Brits pointed out, this only applies to QoS 0 pubs/subs.
Like JD Allen mentioned, you need a queuing service like Rabbit MQ or AMQ. There are countless other such services/libraries/packages in existence so please investigate more.
If you want to roll your own, it might be worth considering AWS SQS and wrapping some of your own application logic around it. That'll likely be a bit hacky though, so take that suggestion with a grain of salt.
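Whichever queue ends up underneath, the per-client cap from the question (old undelivered messages are dropped once the limit is exceeded) is a small piece of application logic. A sketch under assumptions: `Mailbox` is a hypothetical in-memory class, and a real server would keep the payloads on disk or in the queue rather than in process memory.

```python
from collections import deque

class Mailbox:
    """Undelivered messages for one client, capped by total size in bytes."""

    def __init__(self, max_bytes=20 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._messages = deque()

    def store(self, payload):
        """Append a message, evicting the oldest ones if the cap is exceeded."""
        if len(payload) > self.max_bytes:
            raise ValueError("message is larger than the whole mailbox")
        self._messages.append(payload)
        self.used_bytes += len(payload)
        while self.used_bytes > self.max_bytes:
            oldest = self._messages.popleft()
            self.used_bytes -= len(oldest)

    def drain(self):
        """Return all stored messages (oldest first) and empty the mailbox."""
        delivered = list(self._messages)
        self._messages.clear()
        self.used_bytes = 0
        return delivered

# Tiny cap so the eviction is easy to see.
box = Mailbox(max_bytes=10)
box.store(b"aaaa")
box.store(b"bbbb")
box.store(b"cccc")  # total would be 12 bytes, so the oldest message is evicted
used_before_drain = box.used_bytes
pending_messages = box.drain()
```

The server would call `drain` when the client reconnects and `store` whenever a message arrives for a client that is offline.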
We have been evaluating Spring + STOMP + broker + WebSockets for a full-duplex messaging application that will run on AWS. We had hoped to use Amazon MQ. We are pushing messages to individual users, and also broadcasting, so functionally the stack did look good. We have about 40,000 - 80,000 users. We quickly found, with load testing, that neither the Spring stack nor Amazon MQ scales very well. The issues:
A Spring Cloud Gateway instance cannot handle more than about 3,000 websockets before dying.
A Spring WebSocket server instance can also only handle about 4,000 websockets on a t3.medium, when we bypass the Gateway.
AWS limits ActiveMQ connections to 100 for a small server, and then only 1,000 on a massive server. There is no in-between, which is just weird.
Yes, we have increased the file handles etc. on the machines, so TCP connections are not the limit. There is no way Spring could ever get close to the limit here. We are sending an 18 KB message for load, the maximum we expect. In our results message size has little impact; it's just the connection overhead of the Spring stack.
The StompBrokerRelayMessageHandler opens a connection to the broker for each STOMP CONNECT. There is no way to pool the connections, which makes this Spring feature completely useless for any "real" web application. To support our users, the cost of massive AWS servers for MQ makes this solution ridiculously expensive, requiring 40 of the biggest servers. In load testing, the Amazon MQ machine is doing nothing with the 1,000 users; it is not loaded. In reality a couple of medium-sized machines is all we need for all our brokers.
Has anyone ever built a real-world solution, as above, using the Spring stack? It appears no one has done this, and no one has scaled this up.
Has anyone written a pooling StompBrokerRelayMessageHandler? I assume there must be a reason this can't work, as it should be the default approach. What is the issue here?
It seems this issue makes the whole Spring WebSocket + STOMP + broker approach pretty useless, and we are now forced to use a different approach for message reliability and for messaging across servers where users are not connected (the main reason we were using a broker). We have gone back to using a Simple Broker and wrote a registry to manage the client server location. So we have now eliminated the broker, and the figures above are with that model. Then we may add in AWS SQS for reliability of messages.
What's left. We were going to use Spring Cloud Gateway to load balance across multiple small WebSocket servers, but it seems this approach will not work, as the WebSocket load a Gateway instance can handle is just way too small. The Gateway just cannot handle it. We are now removing Spring Cloud Gateway and using an AWS load balancer instead, so now we can get significantly more connections load balanced. Why does Spring Cloud Gateway not load balance?
What's left. The websocket server instances are t3.mediums; they have no business logic and just pass a message between 2 clients, so they really do not need a bigger server. We would expect considerably better than 4,000 connections. However, this is close to usable.
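For readers unfamiliar with the registry idea mentioned above, the following is a rough illustration of what such a component does, not the registry described in this post: it records which server holds each user's connection so a message can be routed there. `ConnectionRegistry` and `route` are hypothetical names, and in a cluster the mapping would have to live in a shared store rather than a local dict.

```python
class ConnectionRegistry:
    """Tracks which WebSocket server currently holds each user's connection."""

    def __init__(self):
        self._server_by_user = {}

    def register(self, user_id, server_id):
        self._server_by_user[user_id] = server_id

    def unregister(self, user_id, server_id):
        # Only remove the entry if it still points at this server; the user
        # may already have reconnected to a different one.
        if self._server_by_user.get(user_id) == server_id:
            del self._server_by_user[user_id]

    def locate(self, user_id):
        return self._server_by_user.get(user_id)

def route(registry, local_server_id, user_id):
    """Decide how to reach a user from the point of view of one server."""
    target = registry.locate(user_id)
    if target is None:
        return "offline"
    if target == local_server_id:
        return "local"
    return "forward:" + target

registry = ConnectionRegistry()
registry.register("alice", "ws-1")
registry.register("bob", "ws-2")
registry.register("alice", "ws-3")    # alice reconnects to another server
registry.unregister("alice", "ws-1")  # stale disconnect from ws-1 is ignored
```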
We are now drilling into the issues to get more details on where the performance bottlenecks are, but the lack of any tuning guides or scaling information does not suggest good things about Spring. Compare this to Node solutions that scale very well, and handle larger number of connections on small machines.
The next approach is to look at WebFlux + WebSocket, but then we lose STOMP. Maybe we'll check raw websockets?
This is just an early attempt to see if anyone has actually used Spring WebSockets in anger and can share a real working production architecture, as only toy examples are available. So any help on the above issues would be appreciated.
Most of the articles on the web dealing with WebSockets are about in-memory Chat.
I'm interested in kind of less instant Chat, that is persistent, like a blog's post's comments.
I have a cluster of two servers handling client requests.
I wonder what could be the best strategy to handle pushing of database update to corresponding clients.
As I'm using Heroku to handle this cluster (of 2 web dynos), I obviously read this tutorial aiming to build a Chat Room shared between all clients.
It uses Redis in order to centralize coming messages; each server listening for new messages to propagate to web clients through websocket connections.
My use case differs in that I've got a Neo4j database, persisting into it each message written by any client.
My goal is to notify each client from a specific room that a new message/comment has just been persisted by a client.
With an architecture similar to the tutorial linked above, how could I propagate only new messages to the user? Is there an easy and efficient way to tell Redis:
"(WebSocket server saying) When my client initiates the websocket connection, I take care to query for all persisted messages and send them to the client. However, I want you (Redis) to feed me all NEW messages that I haven't yet sent to the client, so that I can provide them."
How to prevent Redis from publishing the whole conversation each time a websocket connection is made? It would lead to duplications since the database query already provided the existing contents at the moment.
This is actually a pretty common scenario, where you have three components:
A cluster of stateless web servers that maintain open connections with all clients (load balanced across the cluster, obviously)
A persistent main data storage - Neo4j in your case
A messaging/queueing backend for broadcasting messages across channels (thus across the server cluster) - Redis
Your requirement is for new clients to receive an initial feed of the recent messages, and any subsequent messages in real time. All of this is implemented in your connection handlers.
Essentially, this is what your (pseudo-)code should look like:
class ConnectionHandler:
    def __init__(self, redis_conn):
        # add_listener below stands in for your Redis client's subscribe call
        self.redis = redis_conn

    def on_init(self):
        self.send("hello, here are all the recent messages")
        recent_msgs = fetch_msgs_from_neo4j()
        self.send(recent_msgs)
        self.redis.add_listener(self.on_msg)
        self.send("now listening on new messages")

    def on_msg(self, msg):
        self.send("new message: ")
        self.send(msg)
The exact implementation really depends on your environment, but this is the general flow of things.
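One detail worth handling in that flow: a message persisted between the database query and the subscription can be missed, while subscribing first can deliver a message twice. Redis pub/sub only delivers messages published after you subscribe, so it never replays the conversation on its own. A runnable sketch of one way to close the gap, assuming each message has a unique `id`, a single-threaded handler, and an in-memory `FakePubSub` standing in for Redis (all names here are hypothetical):

```python
class FakePubSub:
    """In-memory stand-in for a Redis pub/sub channel."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def publish(self, msg):
        for callback in list(self._listeners):
            callback(msg)

class ConnectionHandler:
    def __init__(self, pubsub, fetch_recent, send):
        self._pubsub = pubsub
        self._fetch_recent = fetch_recent
        self._send = send
        self._sent_ids = set()
        self._buffer = []
        self._backfilling = True

    def on_init(self):
        self._pubsub.add_listener(self.on_msg)  # 1. subscribe first
        for msg in self._fetch_recent():        # 2. then backfill from the DB
            self._deliver(msg)
        self._backfilling = False
        for msg in self._buffer:                # 3. flush what arrived meanwhile
            self._deliver(msg)
        self._buffer.clear()

    def on_msg(self, msg):
        if self._backfilling:
            self._buffer.append(msg)  # hold live messages until backfill is done
        else:
            self._deliver(msg)

    def _deliver(self, msg):
        if msg["id"] in self._sent_ids:
            return  # already sent as part of the backfill
        self._sent_ids.add(msg["id"])
        self._send(msg)

pubsub = FakePubSub()
database = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
sent = []

def fetch_recent():
    # Simulate a message being persisted and published while the query runs.
    new_msg = {"id": 3, "text": "third"}
    database.append(new_msg)
    pubsub.publish(new_msg)
    return list(database)

handler = ConnectionHandler(pubsub, fetch_recent, sent.append)
handler.on_init()
pubsub.publish({"id": 4, "text": "fourth"})
```

Here message 3 shows up both in the query result and on the channel, but the client receives it only once.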