WebSocket pushing database updates - Heroku

Most of the articles on the web dealing with WebSockets are about in-memory chat.
I'm interested in a less instant kind of chat, one that is persistent, like the comments on a blog post.
I have a cluster of two servers handling client requests.
I wonder what the best strategy would be for pushing database updates to the corresponding clients.
As I'm using Heroku to run this cluster (of 2 web dynos), I naturally read this tutorial, which builds a chat room shared between all clients.
It uses Redis to centralize incoming messages; each server listens for new messages and propagates them to web clients through websocket connections.
My use case differs in that I've got a Neo4j database, and each message written by any client is persisted into it.
My goal is to notify each client in a specific room that a new message/comment has just been persisted by another client.
With an architecture similar to the tutorial linked above, how could I propagate only new messages to users? Is there an easy and efficient way to tell Redis:
"(WebSocket handler speaking) When my client initiates the websocket connection, I take care of querying all persisted messages and sending them to the client; from then on, I want you (Redis) to feed me only the NEW messages that I haven't yet sent to the client, so that I can deliver them as they arrive."
How can I prevent Redis from publishing the whole conversation each time a websocket connection is made? That would lead to duplicates, since the database query already provided the existing content at that moment.

This is actually a pretty common scenario, where you have three components:
A cluster of stateless web servers that maintain open connections with all clients (load balanced across the cluster, obviously)
A persistent main data storage - Neo4j in your case
A messaging/queueing backend for broadcasting messages across channels (thus across the server cluster) - Redis
Your requirement is for new clients to receive an initial feed of the recent messages, plus any subsequent messages in real time. All of this is implemented in your connection handlers.
Essentially, this is what your (pseudo-)code should look like:
class ConnectionHandler:
    redis = redis.get_connection()

    def on_init(self):
        self.send("hello, here are all the recent messages")
        # initial feed: everything already persisted in Neo4j
        recent_msgs = fetch_msgs_from_neo4j()
        self.send(recent_msgs)
        # from here on, only new messages arrive, via Redis pub/sub
        self.redis.add_listener(self.on_msg)
        self.send("now listening for new messages")

    def on_msg(self, msg):
        self.send("new message: ")
        self.send(msg)
The exact implementation really depends on your environment, but this is the general flow of things.
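For instance, with Python and the redis-py client, a minimal sketch of such a handler could look like the following. Note that fetch_msgs_from_neo4j, the room naming scheme, and the send callback are assumptions for illustration, not part of the Heroku tutorial. Subscribing before querying Neo4j closes the gap in which a message could be persisted and published unseen; the id check then filters out the duplicates the question worries about.

import json
import threading
import redis  # redis-py

r = redis.Redis()

def handle_new_connection(room_id, send):
    # "send" is whatever pushes a payload down this client's websocket
    pubsub = r.pubsub()
    # subscribe BEFORE querying Neo4j, so nothing published in between is lost
    pubsub.subscribe(f"room:{room_id}")

    recent = fetch_msgs_from_neo4j(room_id)  # hypothetical query helper
    seen_ids = {m["id"] for m in recent}
    send(recent)

    def listen():
        for item in pubsub.listen():
            if item["type"] != "message":
                continue  # skip subscribe confirmations
            msg = json.loads(item["data"])
            if msg["id"] in seen_ids:
                continue  # already delivered in the initial feed
            send([msg])

    threading.Thread(target=listen, daemon=True).start()

With this split, Neo4j stays the source of truth for history, and Redis only ever carries the live tail.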

Related

Redis publisher also receiving the message

I am building a scalable chat application using Go and Redis w/ websockets.
I need to publish a new message using the Redis pub-sub model to the other websocket servers, to inform all the users (held in memory on those servers) about the newly joined user.
But the issue is that the publisher (also a Redis client) receives the same message. Is there a direct way to solve this?
Workaround:
On every received event, check whether the newly joined user in the event is already in the publisher's list of current local users.
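A common version of that workaround (sketched here in Python with redis-py rather than Go, and with the origin field and notify_local_users being made-up names) is to tag every published event with the publishing server's id, and have each subscriber drop events carrying its own id:

import json
import uuid
import redis

SERVER_ID = str(uuid.uuid4())  # unique per websocket-server instance
r = redis.Redis()

def publish_user_joined(username):
    r.publish("users", json.dumps({"origin": SERVER_ID, "user": username}))

def on_event(raw):
    event = json.loads(raw)
    if event["origin"] == SERVER_ID:
        return  # our own publication echoed back; local users already know
    notify_local_users(event["user"])  # hypothetical local broadcast

This avoids scanning the local user list on every event: one id comparison decides whether the event originated locally.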
WHY THE NEGATIVE VOTES? I'm so fed up with Stack Overflow these days. People have no tolerance, or too much arrogance.

Scalable Live Chat Architectures: Proxy-Managed vs. Broker-Managed

I've done a lot of thinking about how to implement scalable live-chat architectures, and I've come to the conclusion that there are two major ways to do it: I call them Proxy-Managed and Broker-Managed.
Proxy-Managed:
a proxy ensures all users in a chatroom receive messages by routing them all to the same WebSocket (WS) server in a pool of scalable WS servers (e.g. by hashing the room id; see the sketch below); one chatroom exists within a single server selected by load balancing
pro: a system with a minimal number of components
diagram: proxy-managed (on imgur since not enough rep 🙁)
Broker-Managed:
a broker ensures that all users, who may be routed to different WS servers, receive messages by delivering each message to every WS server that handles the chatroom the message belongs to; one chatroom exists across all servers
pro: load is distributed based on the routing algorithm of the proxy
diagram: broker-managed
I'm creating this thread to discuss the architectures, not for a specific answer. I would like to see more points for and against either architecture.
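To make the proxy-managed routing concrete, here is a minimal sketch (in Python; the server pool and names are assumptions) of the kind of stable hashing a proxy could use so that every user in a room lands on the same WS server:

import hashlib

WS_SERVERS = ["ws1.example.com", "ws2.example.com", "ws3.example.com"]  # assumed pool

def server_for_room(room_id: str) -> str:
    # stable hash: the same room id always maps to the same server
    digest = hashlib.sha1(room_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(WS_SERVERS)
    return WS_SERVERS[index]

One caveat worth discussing: plain modulo hashing remaps most rooms whenever the pool is resized, which is why real proxies often use consistent hashing instead.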

Should a websocket connection be general or specific?

e.g. If I was building a stock trading system, I'd likely have real-time stock prices, real-time trade information, real-time updates to the order book, perhaps real-time chat to enable traders to collude and manipulate the market. Should I have one websocket to handle all the above data flow, or is it better to have several websockets to handle different topics?
It all depends. Let's look at your options, assuming your stock trader, your chat, and your order book are built as separate servers/micro-services.
One WebSocket for each server
You can have each server running its own WebSocket server, streaming events relevant to that server.
Pros
It is a simple approach. Each server is independent.
Cons
Scales poorly. The number of open TCP connections comes at a price as the number of concurrent users increases.
Increased complexity when you need to replicate the servers for redundancy, as all replicas need to broadcast the same events.
You have to build your own fallback for recovering when client data goes stale due to a lost WebSocket connection.
You need to create event handlers on the client for each type of event.
You might have to add version handling to prevent data races if initial data is fetched over HTTP while events are sent on the separate WebSocket connection.
Publish/Subscribe event streaming
There are many publish/subscribe solutions available, such as Pusher, PubNub or SocketCluster. The idea is often that your servers publish events on a topic/subject to a message queue, which is listened to by WebSocket servers that forward the events to the connected clients (a publish-side sketch follows the cons below).
Pros
Scales more easily. The server only needs to send one message, while you can add more WebSocket servers as the number of concurrent users increases.
Cons
You most likely still have to handle recovery from events lost during disconnect. Still might require versioning to handle data races. And still need to write handlers for each type of event.
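To make the publish side concrete, here is a minimal sketch using Redis pub/sub as the broker (Pusher, PubNub, and SocketCluster have analogous publish calls; the channel name and payload shape are assumptions):

import json
import redis

r = redis.Redis()

def on_trade_executed(trade: dict):
    # the trading service publishes and forgets; the WebSocket tier,
    # subscribed to "trades", fans the event out to connected clients
    r.publish("trades", json.dumps(trade))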
Realtime API gateway
This part is a more shameless plug, as it covers Resgate, an open source project I've been involved in myself. But it also applies to solutions such as Firebase. By a "realtime API gateway", I mean an API gateway that not only handles HTTP requests, but also operates bidirectionally over WebSocket.
With web clients, you are seldom interested in events as such - you are interested in changes of state. Events are just a means to describe those changes. By fetching the data through a gateway, it can keep track of which resources each client is currently interested in, and it will then keep the client up to date for as long as the data is being used.
Pros
Scales well. The client requires no custom code for event handling, as the system updates the client's data for you. Handles recovery from lost connections. No data races. Simple to work with.
Cons
Primarily for client-rendered web sites (using React, Vue, Angular, etc.), as it works poorly with server-rendered pages. Harder to apply to existing HTTP APIs.

Why do you need a message queue for a chat with web sockets?

I have seen a lot of examples on the internet of chats using web sockets and RabbitMQ (https://github.com/videlalvaro/rabbitmq-chat), however I do not understand why a message queue is needed for a chat application.
Why is it not OK to send the message from the browser to the server via web sockets, and then have the server broadcast that message to the rest of the active browsers, again via web sockets and a broadcast method? (Maybe I am missing something.)
Pseudo code examples (using socket.io):
// client (browser)
socket.emit("message", "my great message that will be received by all");

// server (any server will do; let's say it is also written in JavaScript)
socket.on("message", function(msg) {
  // re-emit to every connected client except the sender
  socket.broadcast.emit("message", msg);
});

// the rest of the browsers
socket.on("message", function(msg) {
  // display the message on the screen
});
I don't think RabbitMQ should be used for a chat room, personally - at least, not in the "chat" or "room" part of the application.
Unless your chat rooms don't care about history at all - and I think most do care about it - a message queue like RMQ doesn't make much sense.
You would be better off storing the message in a database and keeping a marker for each user to say what message they last saw.
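A minimal sketch of that marker idea, assuming a sqlite3-style DB-API connection and made-up table/column names:

def unseen_messages(db, user_id, room_id):
    # last_seen stores, per user and room, the id of the newest message delivered
    row = db.execute(
        "SELECT last_msg_id FROM last_seen WHERE user_id = ? AND room_id = ?",
        (user_id, room_id),
    ).fetchone()
    last_msg_id = row[0] if row else 0
    return db.execute(
        "SELECT id, body FROM messages WHERE room_id = ? AND id > ? ORDER BY id",
        (room_id, last_msg_id),
    ).fetchall()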
Now, you may end up needing something like RMQ to facilitate other parts of the chat application. You can offload work from the web servers, for example, and push all messages through RMQ to a back-end service that updates the database and cache layers.
This would allow you to scale the front-end web servers much faster and support more users per web server. That sounds like a good use of RMQ, but it is not specific to chat apps; it's just good practice for scaling web apps/systems.
The key, in my experience, is that RMQ is not responsible for delivering the messages to the users/chat rooms. That happens through websockets or similar technologies that are designed to be used per user.
Simple answer ...
For a simple chat app you don't need a queue (e.g. SignalR would do exactly this without one).
Typically, though, real-world applications are not just "a simple chat app". The queue might, for example, represent the current state of the room for new users joining, so the server knows what list of messages to serve up when that happens.
It's also worth noting that message queues are often introduced when you want reliable messaging (e.g. a service bus), to ensure that all messages definitely get where they should go even if the first attempt fails. So it's likely that the queue is included in many examples as a default primer for later problem solving.
I may be late with this answer, as the messaging domain has changed rapidly in the last few years. Applications like WhatsApp do not keep messages in their database once they are delivered, and they also provide E2E encryption.
Coming to RabbitMQ: it supports the MQTT protocol, which is ideal for low-latency, highly scalable applications. Using such queuing services offloads the heavy work from your server and provides features like scalability and security.
Hmm, I didn't understand exactly what you are looking for...
but in RabbitMQ you always publish a message to an exchange and consume messages using a queue;
to "broadcast that message" you need to consume it.
Hope it helps.
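For what it's worth, the exchange/queue split looks like this with the pika client in Python (a fanout exchange named "chat" is assumed for illustration):

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.exchange_declare(exchange="chat", exchange_type="fanout")
# each consumer gets its own exclusive, auto-named queue bound to the exchange
result = ch.queue_declare(queue="", exclusive=True)
ch.queue_bind(exchange="chat", queue=result.method.queue)
# publishing goes to the exchange; every bound queue receives a copy
ch.basic_publish(exchange="chat", routing_key="", body=b"hello everyone")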

What is the best way to deliver real-time messages to a client we cannot send requests to

We need to deliver real-time messages to our clients, but their servers are behind a proxy and we cannot initiate a connection to them; the webhook variant won't work.
What is the best way to deliver real-time messages considering that:
the client is behind a proxy
the client can be offline for a long period of time, and all messages must still be delivered
the protocol/approach must be common enough that even a PHP developer could easily use it
I have in mind three variants:
WebSocket - the client opens a websocket connection, and we send the messages that were stored in the DB together with the messages coming in in real time.
RabbitMQ - all messages are stored in a durable, persistent queue. But what if the partner does not read from the queue for some time?
HTTP GET - the partner pulls messages in blocks. With this approach it is hard to pick an optimal pull interval.
Any suggestions would be appreciated. Thanks!
Since you seem to have to store messages while your peer is not connected, the same question applies to any other solution: what happens when the peer is not connected and messages keep queueing up?
RabbitMQ is great if you want loose coupling: separating the producer and the consumer sides. The broker will store messages for you if no consumer is connected. This can indeed fill up memory and/or disk space on the broker after some time - in that case RabbitMQ raises a resource alarm and blocks publishers until space is freed.
In general, RabbitMQ is a great tool for messaging-based architectures like the one you describe:
Load balancing: you can use multiple publishers and/or consumers, thus sharing load.
Flexibility: you can configure multiple exchanges/queues/bindings if your business logic needs it. You can easily change routing on the broker without reconfiguring multiple publisher/consumer applications.
Flow control: RabbitMQ also gives you some built-in methods for flow control - if a consumer is too slow to keep up with publishers, RabbitMQ will slow down publishers.
You can refactor the architecture later easily. You can set up multiple brokers and link them via shovel/federation. This is very useful if you need your app to work via multiple data centers.
You can easily spot if one side is slower than the other, since queues will start growing if your consumers can't read fast enough from a queue.
High availability and fault tolerance. RabbitMQ is very good at these (thanks to Erlang).
So I'd recommend it over the other two (which might be fine for a small-scale app, but you might outgrow them quickly if requirements change and you need to scale things up).
Edit: something I missed - if it's not vital to deliver every message, you can configure queues with a TTL (messages are discarded after a timeout) or with a length limit (this caps the number of messages in the queue; when the limit is reached, the oldest messages are dropped by default).
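Both limits are plain queue arguments. A sketch with the pika client (queue name and values are arbitrary):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(
    queue="partner-messages",
    durable=True,  # queue survives broker restarts
    arguments={
        "x-message-ttl": 86_400_000,  # discard messages older than 24 h (in ms)
        "x-max-length": 100_000,      # cap queue depth; oldest dropped first by default
    },
)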
