Send messages to clients at scale over APIGateway Websockets

We are working on a POC to send messages to clients/browsers over WebSockets, using AWS API Gateway WebSockets. After a client requests a connection, the connection is created and its ID is stored in DynamoDB. Whenever there is an update, an AWS Lambda function fetches all the connection IDs from DynamoDB, iterates over them, and sends the message to each client over its WebSocket connection.
This solution works fine with a small number of clients but fails at scale, because the Lambda has to iterate through a large number of connections. Is there support in API Gateway for broadcasting messages to all clients about the updates? If not, what approach can we take to support a large number of clients using WebSockets?

Is there a support from APIGateway to broadcast messages to all clients about the updates
There is no way via the API Gateway API (at least not in the v3 JavaScript SDK) to send to client connections without explicitly knowing the connection ID.
what approach can we take to support large number of clients using Websockets?
Scanning DynamoDB is not ideal in terms of cost or performance. I've learned this the hard way.
I would consider either creating your own websocket server and hosting it via EC2 or switching your data storage to something outside of the traditional offerings of AWS, assuming your requirements are minimal (i.e., only needing to store connection IDs).

I am working on a similar project (WebSocket API Gateway + DynamoDB + Lambda triggered by a FIFO SQS Queue to publish messages to the connected users) and I realized that what was slowing down everything when broadcasting the messages was the postToConnection method.
At first, I tried multithreading in Python to make multiple calls in parallel, but I soon realized it didn't change anything.
At some point, I realized that the memory setting for my Lambda was still the default 128 MB. I was not hitting the memory limit at all, but on the config page of the Lambda I noticed this text:
Your function is allocated CPU proportional to the memory configured.
The Memory (MB) setting determines the amount of memory available for your Lambda function during invocation. Lambda allocates CPU power linearly in proportion to the amount of memory configured. At 1,769 MB, a function has the equivalent of one vCPU (one vCPU-second of credits per second). To increase or decrease the memory and CPU power allocated to your function, set a value between 128 MB and 10240 MB.
Upon increasing the memory setting (and with it the CPU), I immediately noticed a huge boost in performance. I can't say what the "ideal" setting is for a given number of connections, but just increasing it to 512 MB made all the difference in our case.
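For reference, here is a minimal sketch of the kind of parallel fan-out mentioned above. It is not the exact code from this project: `broadcast`, `StaleConnection`, and the `send` callable are hypothetical names for illustration, and whether the threads actually help will depend on how much CPU the function has been given.

```python
from concurrent.futures import ThreadPoolExecutor


class StaleConnection(Exception):
    """Raised by `send` when a connection ID is no longer valid (hypothetical)."""


def broadcast(connection_ids, send, payload, max_workers=32):
    """Call send(connection_id, payload) for every ID using a thread pool.

    Returns (delivered_count, stale_ids) so the caller can delete the
    stale IDs from its connection table.
    """
    def attempt(connection_id):
        try:
            send(connection_id, payload)
            return None
        except StaleConnection:
            return connection_id

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, connection_ids))

    stale = [cid for cid in results if cid is not None]
    return len(results) - len(stale), stale


# With boto3, `send` could wrap the real call (ENDPOINT is an assumption):
#   client = boto3.client("apigatewaymanagementapi", endpoint_url=ENDPOINT)
#   def send(cid, data):
#       try:
#           client.post_to_connection(ConnectionId=cid, Data=data)
#       except client.exceptions.GoneException:
#           raise StaleConnection(cid)
```

Passing `send` in as a parameter keeps the fan-out logic testable without AWS credentials; the stale IDs it returns are the ones you would remove from DynamoDB.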
Hope this helps!


Simple Server to PUSH lots of data to Browser?

I'm building a Web Application that consumes data pushed from Server.
Each message is JSON and could be large (hundreds of kilobytes). Messages are sent a couple of times per minute, and the order doesn't matter.
The Server should be able to persist not-yet-delivered messages, potentially storing a couple of megabytes per client for a couple of days, until the client comes back online. There's a limit on the storage size for unsent messages, say 20 MB per client, and old undelivered messages get deleted when this limit is exceeded.
The Server should be able to handle around 1 thousand simultaneous connections. How could it be implemented simply?
Possible Solutions
I was thinking of maybe storing messages as files on disk, having the browser poll every second to check for new messages, and serving them with Nginx or something like that. Are there any configs / modules for Nginx for such use cases?
Or maybe it's better to use an MQTT server or some message queue like RabbitMQ with some browser adapter?
Actually, MQTT supports the concept of sessions that persist across client connections, but the client must first connect and request a "non-clean" session. After that, if the client is disconnected, the broker will hold all the QoS=1 or 2 messages destined for that client until it reconnects.
With MQTT v3.x, technically, the server is supposed to hold all the messages for all these disconnected clients forever! Each message maxes out at a 256 MB payload, but the server is supposed to hold all that you give it. This created a big problem for servers, which MQTT v5 came in to fix. And most real-world brokers have configurable settings around this.
But MQTT shines if the connections are over unreliable networks (wireless, cell modems, etc) that may drop and reconnect unexpectedly.
If the clients are connected over fairly reliable networks, AMQP with RabbitMQ is considerably more flexible, since clients can create and manage the individual queues. But the neat thing is that you can mix the two protocols using RabbitMQ, as it has an MQTT plugin. So, smaller clients on an unreliable network can connect via MQTT, and other clients can connect via AMQP, and they can all communicate with each other.
MQTT is most likely not what you are looking for. The protocol is meant to be lightweight and as the comments pointed out, the protocol specifies that there may only exist "Control Packets of size up to 268,435,455 (256 MB)" source. Clearly, this is much too small for your use case.
Moreover, if a client isn't connected (and subscribed on that particular topic) at the time of the message being published, the message will never be delivered. EDIT: As @Brits pointed out, this only applies to QoS 0 pubs/subs.
Like JD Allen mentioned, you need a queuing service like RabbitMQ or AMQ. There are countless other such services/libraries/packages in existence, so please investigate more.
If you want to roll your own, it might be worth considering using AWS SQS and wrapping some of your own application logic around it. That'll likely be a bit hacky though, so take that suggestion with a grain of salt.
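If you do roll your own, the storage rule from the question (a per-client byte cap, with the oldest undelivered messages deleted first) could look roughly like the sketch below. `ClientMailbox` is a hypothetical name, and this version keeps everything in memory; holding messages for days would need the same logic on top of disk or a database.

```python
from collections import deque


class ClientMailbox:
    """Holds undelivered messages for one client, capped by total bytes."""

    def __init__(self, max_bytes=20 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._messages = deque()  # oldest message on the left

    def push(self, message: bytes) -> None:
        """Store a message, evicting the oldest ones while over the cap.

        A single message larger than the cap ends up evicting itself.
        """
        self._messages.append(message)
        self.total_bytes += len(message)
        while self.total_bytes > self.max_bytes and self._messages:
            dropped = self._messages.popleft()
            self.total_bytes -= len(dropped)

    def drain(self) -> list:
        """Return and clear everything pending, e.g. when the client reconnects."""
        pending = list(self._messages)
        self._messages.clear()
        self.total_bytes = 0
        return pending
```

One mailbox per client ID (for example in a dict) would cover the 1,000-connection case; since order doesn't matter in the question, the deque is only there to know which message is oldest.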

Spring + Websockets + STOMP + Broker + Gateway does not scale

We have been evaluating Spring + STOMP + broker + WebSockets for a full-duplex messaging application that will run on AWS. We had hoped to use Amazon MQ. We are pushing messages to individual users and also broadcasting, so functionally the stack looked good. We have about 40,000 - 80,000 users. With load testing we quickly found that neither the Spring stack nor Amazon MQ scales very well. The issues:
A Spring Cloud Gateway instance cannot handle more than about 3,000 WebSockets before dying.
A Spring WebSocket server instance can also only handle about 4,000 WebSockets on a t3.medium when we bypass the Gateway.
AWS limits ActiveMQ connections to 100 for a small server, and then only 1,000 on a massive server. There is no in-between, which is just weird.
Yes, we have increased the file handles etc. on the machines, so TCP connections are not the limit. There is no way Spring could ever get close to the limit here. We are sending an 18 KB message for load, the maximum we expect. In our results message size has little impact; it's just the connection overhead of the Spring stack.
The StompBrokerRelayMessageHandler opens a connection to the broker for each STOMP CONNECT. There is no way to pool the connections, which makes this Spring feature completely useless for any ‘real’ web application. To support our users, the cost of massive AWS servers for MQ makes this solution ridiculously expensive, requiring 40 of the biggest servers. In load testing, the Amazon MQ machine is doing nothing; with the 1,000 users it is not loaded. In reality a couple of medium-sized machines is all we need for all our brokers.
Has anyone ever built a real-world solution like the above using the Spring stack? It appears no one has done this, and no one has scaled it up.
Has anyone written a pooling StompBrokerRelayMessageHandler? I assume there must be a reason this can’t work, as it should be the default approach. What is the issue here?
It seems this issue makes the whole Spring WebSocket + STOMP + broker approach pretty useless. We are now forced to use a different approach for message reliability, and for messaging across servers where users are not connected (the main reason we were using a broker). We have gone back to using a Simple Broker and wrote a registry to manage the client server location. So we have now eliminated the broker, and the figures above are with that model. We may then add AWS SQS for message reliability.
What's left? We were going to use Spring Cloud Gateway to load balance across multiple small WebSocket servers, but it seems this approach will not work, as the WebSocket load a server can handle is just way too small. The Gateway just cannot handle it. We are now removing Spring Cloud Gateway and using an AWS load balancer instead, so we can get significantly more connections load balanced. Why does Spring Cloud Gateway not load balance?
What's left? The WebSocket server instances are t3.mediums. They have no business logic and just pass a message between 2 clients, so they really do not need a bigger server. We would expect considerably better than 4,000 connections. However, this is close to usable.
We are now drilling into the issues to get more detail on where the performance bottlenecks are, but the lack of any tuning guides or scaling information does not suggest good things about Spring. Compare this to Node solutions that scale very well and handle larger numbers of connections on small machines.
The next approach is to look at WebFlux + WebSocket, but then we lose STOMP. Maybe we’ll check raw WebSockets?
This is just an early attempt to see if anyone has actually used Spring WebSockets in anger and can share a real working production architecture, as only toy examples are available. Any help on the above issues would be appreciated.

Multiple websocket channels, single ws object?

I will be subscribing to multiple websocket channels of the same server. Writing a manager to assign the various types of updates I receive to different queues based on tags present in the JSON is possible, but it would save programming time to just create multiple websocket client objects in my app, so that each websocket object only subscribes to a single channel.
Is this a sensible idea or should I stick to a single websocket client?
The correct answer really depends on your architecture. However, as a general rule:
Stick to a single websocket client if you can.
Servers have a limit on the number of connections they can handle, meaning that with every new Websocket client, you're getting closer to your server's limits (even if the Websocket does absolutely nothing except remain open).
If each client opens two Websocket connections, the number of clients the server can handle is cut in half. Open 4 connections per client and the server's capacity drops to 25%.
This directly translates to money and costs since running another server will increase your expenses. Also, the moment you have to scale beyond a single server, you add backend costs.
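The "manager" from the question doesn't have to be much code. Here is a minimal sketch of routing messages from a single websocket into per-channel queues; it assumes every message is a JSON object carrying a tag in a field called `channel`, and both that field name and `ChannelRouter` are made up for illustration.

```python
import json
import queue
from collections import defaultdict


class ChannelRouter:
    """Routes raw JSON messages from one websocket into per-tag queues."""

    def __init__(self, tag_field="channel"):
        self.tag_field = tag_field
        # A queue is created lazily the first time a tag is seen.
        self.queues = defaultdict(queue.Queue)

    def dispatch(self, raw_message: str) -> str:
        """Parse one message, enqueue it under its tag, and return the tag."""
        message = json.loads(raw_message)
        tag = message.get(self.tag_field, "unknown")
        self.queues[tag].put(message)
        return tag
```

You would call `router.dispatch(raw)` from the websocket's on-message callback, and each consumer reads only from `router.queues["its-channel"]`.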

Cassandra throttling workload

I've been recently attempting to send a workload of read operations to a 2-node Cassandra cluster (version 2.0.9, with rf=2). My intention was to send a number of reads at a rate that is higher than the capacity of my backend servers, thereby overwhelming them and resulting in server-side queuing. To do so, I'm using the datastax java driver (cql version 2) to run my operations asynchronously (in other words, the calling thread doesn't block waiting for a response).
The problem is that I'm unable to reach a high enough sending rate to overload my backend servers. The number of requests that I'm sending is somehow being throttled by Cassandra. To confirm this, I've run clients from two different machines simultaneously, and the total number of requests sent per unit time still peaks at the same value. I'm wondering if there's a mechanism employed by Cassandra to throttle the number of requests being received? Otherwise, what else might be causing this behavior?
Each request received by Cassandra will be handled by multiple thread pools implementing a staged event-driven architecture, where requests will be queued for each stage. You can use nodetool tpstats to inspect the current status of each queue. Once too many requests are about to overwhelm the server, Cassandra will shed load by dropping requests once queues are about to reach their capacity. You'll notice this by numbers shown in the dropped section of tpstats. In case no requests are dropped, all of them will eventually complete, but you may see higher latencies using nodetool cfhistograms or WriteTimeoutExceptions on the client.
The network bandwidth from Cassandra side is throttling the amount of requests that are being received.
As far as I know there is no other mechanism employed by Cassandra to prevent itself from receiving too many requests. The timeout exception is the main mechanism that Cassandra uses to avoid crashing when it is overloaded.
Yes, Cassandra has multiple ways to throttle incoming requests. The first action on your part would be to find out which mechanism is the culprit. Then you can tune this mechanism to fit your needs.
The first step to find out where the block occurs would be to connect to JMX with jconsole or similar and look at the queues and block values.
If I had to hazard a guess, check MessagingService for timeouts and dropped messages between nodes. Then check the native transport requests for blocked tasks before the requests even get to the stages.

What is the best way to deliver real-time messages to Client that can not be requested

We need to deliver real-time messages to our clients, but their servers are behind a proxy and we cannot initiate a connection to them, so a webhook approach won't work.
What is the best way to deliver real-time messages, considering that:
the client is behind a proxy
the client can be offline for a long period of time, and all messages must be delivered
the protocol/approach must be common enough that even a PHP developer could easily use it
I have three variants in mind:
WebSocket - the client opens a WebSocket connection, and we send both the messages that were stored in the DB and the messages coming in in real time.
RabbitMQ - all messages are stored in a durable, persistent queue. What if the partner does not read from the queue for some time?
HTTP GET - the partner pulls messages in blocks. With this approach it is hard to pick an optimal polling interval.
Any suggestions would be appreciated. Thanks!
Since you seem to have to store messages when your peer is not connected, the question applies to any other solution equally: what if the peer is not connected and messages are queueing up?
RabbitMQ is great if you want loose coupling: separating the producer and the consumer sides. The broker will store messages for you if no consumer is connected. This can indeed fill up memory and/or disk space on the broker after some time - in this case RabbitMQ raises a memory or disk alarm and blocks publishers until resources are freed.
In general, RabbitMQ is a great tool for messaging-based architectures like the one you describe:
Load balancing: you can use multiple publishers and/or consumers, thus sharing load.
Flexibility: you can configure multiple exchanges/queues/bindings if your business logic needs it. You can easily change routing on the broker without reconfiguring multiple publisher/consumer applications.
Flow control: RabbitMQ also gives you some built-in methods for flow control - if a consumer is too slow to keep up with publishers, RabbitMQ will slow down publishers.
You can refactor the architecture later easily. You can set up multiple brokers and link them via shovel/federation. This is very useful if you need your app to work via multiple data centers.
You can easily spot if one side is slower than the other, since queues will start growing if your consumers can't read fast enough from a queue.
High availability and fault tolerance. RabbitMQ is very good at these (thanks to Erlang).
So I'd recommend it over the other two (which might be good for a small-scale app, but you might outgrow them quickly if requirements change and you need to scale things up).
Edit: something I missed - if it's not vital to deliver all messages, you can configure queues with a TTL (a message will be discarded after a timeout) or with a length limit (this caps the number of messages in the queue; by default the oldest messages are dropped once the limit is reached, and the queue can instead be configured to reject new messages).
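As a rough illustration of the TTL and length-limit options, the sketch below builds the optional queue arguments RabbitMQ understands (`x-message-ttl`, `x-max-length`, `x-overflow`). The helper name `bounded_queue_arguments` and the queue name in the comment are hypothetical; which end of the queue loses messages when the limit is hit is controlled by `x-overflow`.

```python
def bounded_queue_arguments(ttl_seconds=None, max_messages=None, reject_new=False):
    """Build the `arguments` dict for declaring a TTL- and/or length-limited queue."""
    arguments = {}
    if ttl_seconds is not None:
        # RabbitMQ expects the per-queue message TTL in milliseconds.
        arguments["x-message-ttl"] = int(ttl_seconds * 1000)
    if max_messages is not None:
        arguments["x-max-length"] = int(max_messages)
        # "drop-head" discards the oldest messages; "reject-publish" refuses new ones.
        arguments["x-overflow"] = "reject-publish" if reject_new else "drop-head"
    return arguments


# With the pika client this could be passed along as, for example:
#   channel.queue_declare(
#       queue="partner-events",  # hypothetical queue name
#       durable=True,
#       arguments=bounded_queue_arguments(ttl_seconds=86400, max_messages=100000),
#   )
```

Note that these arguments have to match on every declaration of the same queue, so it is easier to decide on them before the first consumer goes live.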
