I have a dilemma about how to use Project Reactor in a real-world scenario.
I need to distribute events to all connected clients (browsers, using HTTP and Server-Sent Events). For simplicity, I am testing things in unit tests with console subscribers.
Sample code for testing: https://gist.github.com/luvarqpp/8ea6ad5ad32a8fcbf3264c00ebe351b2
My goal is to be able to publish broadcast events and have different browsers receive them at different speeds.
I would like to disconnect a client when it cannot keep up with the pace of events, so that it has to reconnect to receive the stream again.
PS: Sending/pushing over HTTP can be done using SSE; simple sample code: https://gist.github.com/luvarqpp/dc941ab368504a525ac66716253b04d7
PS2: A bonus would be to make "smart batching" possible, i.e. when some client has X events in its buffer on the server side, a call to custom code would shrink those events. Think of the events as updates to a key-value database: any updates to the same key can be shrunk into a single update (the latest value).
The approaches I have tried:
Sinks.many().replay().limit(0); // latest();
Sinks.many().multicast().directAllOrNothing();
Sinks.many().multicast().directBestEffort();
Sinks.many().multicast().onBackpressureBuffer(8);
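For illustration, here is roughly the direction I am experimenting with: a single shared multicast sink, with the slow-client disconnect and the "smart batching" done per subscriber. This is a sketch, not working production code (it assumes Reactor 3.4+; the Update record, the buffer sizes, and the 100 ms batching window are placeholders):

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

public class EventBroadcaster {

    /** Placeholder event type: an update to a key-value store. */
    public record Update(String key, String value) {}

    // Hot shared source; autoCancel=false keeps the sink alive when all clients drop.
    private final Sinks.Many<Update> sink =
            Sinks.many().multicast().onBackpressureBuffer(256, false);

    /** Producer side (e.g. called from a message listener). */
    public void publish(Update u) {
        // FAIL_FAST for brevity; a real app should inspect the EmitResult.
        sink.emitNext(u, Sinks.EmitFailureHandler.FAIL_FAST);
    }

    /**
     * One Flux per SSE client. The per-subscriber buffer bounds how far a client
     * may lag; overflowing it errors that subscription only, which closes the SSE
     * response and forces the client to reconnect.
     */
    public Flux<List<Update>> subscribeCompacted() {
        return sink.asFlux()
                .onBackpressureBuffer(64)                  // overflow -> error -> disconnect
                .bufferTimeout(32, Duration.ofMillis(100)) // "smart batching" window
                .map(EventBroadcaster::compact);
    }

    /** Shrink a batch so each key keeps only its latest value. */
    private static List<Update> compact(List<Update> batch) {
        Map<String, Update> latest = batch.stream()
                .collect(Collectors.toMap(Update::key, Function.identity(), (a, b) -> b));
        return List.copyOf(latest.values());
    }
}

Note that this compacts fixed time/size windows rather than exactly "X events in the server-side buffer", but the effect is similar: a lagging client receives fewer, already-merged updates.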
Currently I'm using Socket.io / SignalR to emit an event from my backend message queue system whenever new data comes in. That way I can set up an event handler in my React application and update the Relay cache from within the event handler.
It does not seem like the most GraphQL-ish way to do things, so I played around a bit with pre-RFC live query implementations, where you observe data changes in reactive data stores and push them to the GraphQL server, and further to the client using WebSockets... with some rather complex custom code. Obviously GraphQL is not ready for real live queries (as opposed to polling).
A few lines further down it says:
When building event-based subscriptions, the problem of determining what should trigger an event is easy, since the event defines that explicitly. It also proved fairly straight-forward to implement atop existing message queue systems.
Which leads me to my question: how can you (in a GraphQL way) best trigger GraphQL subscriptions when a new event arrives in your backend message queue application and you need to reflect the new data in the UI in realtime - say every second? I'm not talking about triggering the event in the frontend/client, or polling every X seconds like you usually see when subscriptions are discussed.
Not sure it's relevant, but I'm using Relay Modern as my preferred GraphQL client.
Here are some ideas that might work, if I can get a little help understanding in general how to trigger/call a subscription without a mutation.
Backend worker / message queue "A" receives a new incoming event with some device data. It uses SignalR, or another pub/sub mechanism (Redis/Socket.io/?), to notify the GraphQL server "B" (which subscribes to the event) that a new event has happened. The GraphQL server then triggers/executes the subscription, and the frontend React Relay application "C" updates automatically, since it has a Relay subscription defined. This would be ideal, right? But how do I trigger the subscription on the GraphQL server? (A sketch of this idea follows below.)
Simply use Socket.io/SignalR to emit events from backend worker / message queue "A" on incoming data, subscribe to and handle the event in the frontend "B", and then programmatically call the subscription from within the Socket.io/SignalR event handler (if such a thing, directly calling a subscription, is even possible?). But then the only improvement over pure Socket.io/SignalR would be that I have moved the updating of the Relay cache/store from the handler into the subscription. Not a big improvement, if any. But the manual update of the cache/store is really cumbersome, although not that hard :/
How do people handle real streaming live (device) data with SignalR, and why do all the realtime articles/examples just repeat the same old simple chat application, where the UI only updates after a user's click event? Is GraphQL not yet suited to dealing with a stream of frequently incoming device data in realtime? I understand why live queries were postponed, after playing with implementing them myself, but without them, how do you get REAL realtime data updates pushed from the server to the frontend?
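To make idea 1 concrete, here is a rough sketch of how I imagine the GraphQL-server side could look if it runs on the JVM with graphql-java (which, as far as I understand, lets a Subscription field's DataFetcher return a Reactive Streams Publisher). DeviceEvent, onQueueMessage, and the "deviceData" field name are made-up names:

import org.reactivestreams.Publisher;
import graphql.schema.DataFetcher;
import reactor.core.publisher.Sinks;

public class DeviceEventBridge {

    /** Made-up payload for the subscription. */
    public record DeviceEvent(String deviceId, double value) {}

    private final Sinks.Many<DeviceEvent> sink =
            Sinks.many().multicast().onBackpressureBuffer();

    /** Called by the Redis/Socket.io/SignalR consumer when worker "A" publishes an event. */
    public void onQueueMessage(DeviceEvent event) {
        sink.emitNext(event, Sinks.EmitFailureHandler.FAIL_FAST);
    }

    /** Wired as the DataFetcher for a "deviceData" field on the Subscription type. */
    public DataFetcher<Publisher<DeviceEvent>> deviceDataFetcher() {
        return environment -> sink.asFlux();
    }
}

The point being: the subscription is never "called" by anyone; its resolver hands back a stream, and pushing into that stream from the queue handler is what triggers delivery to subscribed clients.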
Should a websocket connection be general or specific?
e.g. if I were building a stock trading system, I'd likely have real-time stock prices, real-time trade information, real-time updates to the order book, and perhaps real-time chat to enable traders to collude and manipulate the market. Should I have one WebSocket to handle all of the above data flows, or is it better to have several WebSockets handling different topics?
It all depends. Let's look at your options, assuming your stock trading, your chat, and your order book are built as separate servers/microservices.
One WebSocket for each server
You can have each server run its own WebSocket server, streaming the events relevant to that server.
Pros
It is a simple approach. Each server is independent.
Cons
Scales poorly: the number of open TCP connections comes at a price as the number of concurrent users increases.
Increased complexity when you need to replicate the servers for redundancy, as all replicas need to broadcast the same events.
You also have to build your own fallback for recovering when client data goes stale due to a lost WebSocket connection.
You need to create event handlers on the client for each type of event.
You might have to add version handling to prevent data races if the initial data is fetched over HTTP while events are sent on the separate WebSocket connection.
Publish/Subscribe event streaming
There are many publish/subscribe solutions available, such as Pusher, PubNub, or SocketCluster. The idea is often that your servers publish events on a topic/subject to a message queue, which is listened to by WebSocket servers that forward the events to the connected clients.
Pros
Scales more easily. The originating server only needs to send one message, and you can add more WebSocket servers as the number of concurrent users increases.
Cons
You most likely still have to handle recovery from events lost during a disconnect.
It still might require versioning to handle data races.
You still need to write handlers for each type of event.
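As an illustration of this option, here is a minimal sketch of one of the WebSocket servers forwarding broker messages to its connected clients, assuming Redis as the message queue (via the Jedis client) and the javax.websocket API; the channel names are made up:

import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.websocket.OnClose;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

@ServerEndpoint("/stream")
public class FanOutEndpoint {

    private static final Set<Session> sessions = ConcurrentHashMap.newKeySet();

    @OnOpen
    public void onOpen(Session session) { sessions.add(session); }

    @OnClose
    public void onClose(Session session) { sessions.remove(session); }

    /** Run once per WebSocket server instance: forwards broker messages to every client. */
    public static void startRelay() {
        new Thread(() -> {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        for (Session session : sessions) {
                            try {
                                session.getBasicRemote().sendText(message);
                            } catch (IOException e) {
                                sessions.remove(session); // client is gone
                            }
                        }
                    }
                }, "trades", "prices", "orderbook"); // blocking call
            }
        }, "redis-relay").start();
    }
}

Each of the trading, chat, and order book services then only does a single publish to the broker, and you can run as many of these relays as the number of concurrent users demands.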
Realtime API gateway
This part is more shameless, as it covers Resgate, an open source project I've been involved in myself. But it also applies to solutions such as Firebase. By "realtime API gateway" I mean an API gateway that not only handles HTTP requests, but also operates bidirectionally over WebSocket.
With web clients, you are seldom interested in events as such - you are interested in changes of state. Events are just a means of describing those changes. By fetching the data through a gateway, the gateway can keep track of which resources each client is currently interested in. It will then keep the client up to date for as long as the data is being used.
Pros
Scales well.
The client requires no custom code for event handling, as the system updates the client data for you.
Handles recovery from lost connections.
No data races.
Simple to work with.
Cons
Primarily for client-rendered web sites (using React, Vue, Angular, etc.), as it works poorly with server-rendered pages. Harder to apply to already existing HTTP APIs.
I'm using Socket.io, and I was wondering which approach is better:
Emitting an event every X seconds to stay in sync with the database, or emitting the event after e.g. a POST request, so it's more efficient.
I believe emitting every X seconds would be easier, and maybe scale better, but I don't know if that's the correct way.
EDIT-1: To give more context: the application is for an accounting team. They basically want their Excel sheets converted into an app. They have a lot of data, so I don't know if emitting an event every X seconds is a good idea.
Thanks.
There is no "correct" way. It depends entirely upon the needs of your client and the capabilities of your server. If the client needs to be kept more instantly up to date, then send data from your server to the client whenever the server has new data. If the client only needs to be updated every once in a while, then only send it data every once in a while. There is no "correct" way. It depends upon your application.
It is always more efficient to only send data to the client when the data has actually changed and when the client actually cares that something has changed. So, it would be foolish to send a client update every few seconds if the data isn't actually changing that often. If you have a means of knowing when the data changes on the server, then use that event to know when to send data to the client and even then, don't send it more often than the client actually cares to know.
It is always more efficient to have the server do no more work than is actually required by the client. Things like caching and keeping track of what each client was last sent can sometimes save lots of work for the server too.
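To make that concrete: the question mentions Socket.io, but the "emit on change, and only to clients that are behind" idea is library-agnostic. A minimal sketch using the Java javax.websocket API (the version counter and the JSON payload are stand-ins for whatever your data looks like):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import javax.websocket.OnClose;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/reports")
public class ReportFeed {

    private static final AtomicLong version = new AtomicLong();   // bumped on every real change
    private static volatile String latestJson = "{}";             // current state of the data
    private static final Map<Session, Long> lastSent = new ConcurrentHashMap<>();

    @OnOpen
    public void onOpen(Session session) throws IOException {
        session.getBasicRemote().sendText(latestJson);            // send initial state once
        lastSent.put(session, version.get());
    }

    @OnClose
    public void onClose(Session session) { lastSent.remove(session); }

    /** Call this from the POST handler, after the database write succeeds. */
    public static void onDataChanged(String newJson) {
        latestJson = newJson;
        long v = version.incrementAndGet();
        lastSent.forEach((session, sentVersion) -> {
            if (sentVersion < v) {                                // skip clients already up to date
                try {
                    session.getBasicRemote().sendText(newJson);
                    lastSent.put(session, v);
                } catch (IOException e) {
                    lastSent.remove(session);                     // client is gone
                }
            }
        });
    }
}

Nothing is sent while the data is idle, which is exactly the efficiency argument above.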
Any further advice on this matter would need to know a lot more about the needs of your application and how this particular data fits into that and how often the data in question actually changes.
A summary on this topic:
Send data to the client no more often than it needs it
Sending the client data that has not changed since the last time you sent it is inefficient for the server and consumes bandwidth.
Only you can decide how often your client needs updates (it depends upon your application)
Only you can test the impact on scalability of sending data to every client every time the data changes.
Server-side caching and keeping track of what client already has what data can help you avoid sending data to a client that it already has.
Server-side scalability probably has a lot to do with how many simultaneous clients are connected and how frequently there is changed data to send them.
I have a PUB server. How can it tell which filters are subscribed to, so the server knows what data it has to create? The server shouldn't have to create data that no SUB clients are interested in.
Say the set of possible filters is huge (or infinite), but subscribers at any given time are only subscribed to a few of them.
Example: say SUB clients are only subscribed to weather feed data for a few area codes in New York and Paris. Then the PUB server shouldn't have to create weather data for every other area code in every other city in the world, just to throw it all away again.
How do you find out all the subscribed to filters in a PUB server?
If there is no easy way, how do I solve this in another way?
I'll answer my own question here in case it's of use to anyone else.
The requirements were:
The client should be able to ask the server which ids (topics) are available for subscription.
The client should choose the ids it is interested in and tell the server about them.
The server should create data for all subscribed-to ids and send that data to the clients.
The client and server should not block/hang if either one goes away.
Implementation:
Step 1 is two-way traffic and is done with REQ/REP sockets.
Step 2 is one-way traffic from one client to one server and is done with PUSH/PULL sockets.
Step 3 is one-way traffic from one server to many clients and is done with PUB/SUB sockets.
Step 4: the receives can block either the server or the client if the other one is not there. I therefore followed the "lazy pirate pattern" of checking whether there is anything in the queue before trying to receive. (If there is nothing in the queue, I check again on the next loop of the program, etc.)
Step 4+: clients can die without unsubscribing, and the server won't know about it; it will keep publishing data for those ids. The solution is for the client to resend its subscription information (with a timestamp) every so often. This works as a heartbeat for the ids the client has subscribed to. If the client dies without unsubscribing, the server notices that some subscription ids have not been refreshed in a while (via the timestamp) and removes those ids. A sketch of the server side follows below.
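Here is a rough sketch of the server side of this pattern, using JeroMQ (the Java binding for ZeroMQ). The ports, the example ids, and the 30-second expiry are arbitrary choices:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class WeatherServer {

    public static void main(String[] args) throws InterruptedException {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket rep = ctx.createSocket(SocketType.REP);   // step 1: answer "what ids exist?"
            rep.bind("tcp://*:5555");
            ZMQ.Socket pull = ctx.createSocket(SocketType.PULL); // step 2: subscriptions/heartbeats in
            pull.bind("tcp://*:5556");
            ZMQ.Socket pub = ctx.createSocket(SocketType.PUB);   // step 3: data out
            pub.bind("tcp://*:5557");

            Map<String, Long> subscribed = new ConcurrentHashMap<>(); // id -> last heartbeat millis

            while (!Thread.currentThread().isInterrupted()) {
                // Step 4: never block; check each queue with DONTWAIT instead of a blocking recv.
                String catalogueRequest = rep.recvStr(ZMQ.DONTWAIT);
                if (catalogueRequest != null) {
                    rep.send("NYC-10001,PARIS-75001"); // the available ids
                }
                String heartbeat = pull.recvStr(ZMQ.DONTWAIT);
                if (heartbeat != null) {
                    subscribed.put(heartbeat, System.currentTimeMillis()); // subscribe or refresh
                }
                // Step 4+: forget ids whose heartbeat is older than 30 seconds.
                long cutoff = System.currentTimeMillis() - 30_000;
                subscribed.entrySet().removeIf(e -> e.getValue() < cutoff);

                // Only create data for ids somebody still wants.
                for (String id : subscribed.keySet()) {
                    pub.send(id + " " + fetchWeather(id)); // SUB clients filter on the id prefix
                }
                Thread.sleep(100);
            }
        }
    }

    private static String fetchWeather(String id) { return "21C"; } // stand-in for real work
}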
This solution seems to work fine. It was a lot of low-level work, though. It would be nice if ZeroMQ were a bit higher-level and had some common, reliable architectures/frameworks ready to use out of the box.
I am planning to integrate messaging middleware into my web application. Right now I am testing different messaging middleware such as RabbitMQ, JMS, and HornetQ.
The examples provided with this software work, but they are not giving the desired results.
So, I want to know: which factors that are responsible for performance should one keep an eye on?
Which areas should a developer take care of to improve the performance of messaging middleware?
I'm the project lead for HornetQ but I will try to give you a generic answer that could be applied to any message system you choose.
A common question that I see is people asking why a single producer / single consumer setup won't give the expected performance.
When you send a message and ask for confirmation right away, you need to wait for:
The message transfer from client to server
The message being persisted on the disk
The server acknowledging receipt of the message by sending a callback to the client
Similarly, when you receive a message and ACK to the server:
The ACK is sent from client to server
The ACK is persisted
The server sends back a confirmation that the ACK was processed
And if you need confirmation for all your message sends and message ACKs, you have to wait for these steps, as there is hardware involved in persisting to disk and sending bits over the network. For example, if each confirmed send costs around 5 ms of disk sync plus network latency, a single blocking producer cannot exceed roughly 200 messages per second, no matter how fast the server is.
Message systems are designed to scale up with many producers and many consumers. That is, when many clients are producing, they all share the resources available at the server, and the same goes for the consumers.
There are ways to speed up a single producer or single consumer:
One is by using transactions. This minimizes the blocks and syncs you perform on disk while persisting at the server, and the round trips on the network. (This is actually the same for any database.)
Another is by using callbacks instead of blocking at the consumer. (JMS 2 is proposing a callback similar to the ConfirmationHandler in HornetQ; see the sketch below.)
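A minimal JMS sketch of both techniques (connection setup is provider-specific and omitted; the queue name is made up; the callback variant uses the CompletionListener API from JMS 2.0):

import javax.jms.CompletionListener;
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class FastProducer {

    /** 1) Transactions: one disk sync and one network round trip per batch, not per message. */
    static void sendBatch(Connection connection, String[] payloads) throws Exception {
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        Queue queue = session.createQueue("events");
        MessageProducer producer = session.createProducer(queue);
        for (String payload : payloads) {
            producer.send(session.createTextMessage(payload));
        }
        session.commit(); // the whole batch is persisted and confirmed in one step
        session.close();
    }

    /** 2) Callbacks: keep producing instead of blocking on each confirmation. */
    static void sendAsync(MessageProducer producer, TextMessage message) throws Exception {
        producer.send(message, new CompletionListener() {
            @Override public void onCompletion(Message m) { /* server confirmed the send */ }
            @Override public void onException(Message m, Exception e) { /* log or retry */ }
        });
    }
}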
Also: most providers I know have a performance section in their docs, with requirements and suggestions for that specific product. You should look at each product individually.