Performance improvement idea for saving a large number of objects into the database - Spring

My web application has one feature where it allows the user to send messages to all friends. The number of friends can be 100K to 200K. The application is using Spring and Hibernate.
It entails fetching the friends' info, building a message object for each friend, and saving it to the database. After all the messages are sent (actually saved to the db), a notification pops up showing how many messages were sent successfully, such as 99/100 sent or 100/100 sent.
However, when I was load testing this feature, it took an extremely long time to finish. I am trying to improve the performance. One approach I tried was to divide the friends into small batches, fetch/save each batch concurrently, and wait on all of them to finish. But that still didn't bring much improvement.
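For reference, the batching I tried looks roughly like this (a simplified sketch: Message, Friend, FriendService and MessageRepository stand in for my real classes, and hibernate.jdbc.batch_size is assumed to be set so Hibernate actually groups the inserts):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BulkMessageSender {

    private static final int BATCH_SIZE = 500;

    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final FriendService friendService;         // placeholder: loads friends' info
    private final MessageRepository messageRepository; // placeholder: Spring Data JPA repository

    public BulkMessageSender(FriendService friendService, MessageRepository messageRepository) {
        this.friendService = friendService;
        this.messageRepository = messageRepository;
    }

    public long sendToAllFriends(long userId, String text) {
        List<Long> friendIds = friendService.findFriendIds(userId);

        // Split the friend ids into fixed-size batches.
        int batchCount = (friendIds.size() + BATCH_SIZE - 1) / BATCH_SIZE;
        List<List<Long>> batches = IntStream.range(0, batchCount)
                .mapToObj(i -> friendIds.subList(i * BATCH_SIZE,
                        Math.min((i + 1) * BATCH_SIZE, friendIds.size())))
                .collect(Collectors.toList());

        // Fetch and save each batch on the pool, then wait for all of them.
        List<CompletableFuture<Integer>> futures = batches.stream()
                .map(batch -> CompletableFuture.supplyAsync(() -> saveBatch(batch, text), pool))
                .collect(Collectors.toList());

        return futures.stream().mapToLong(f -> f.join()).sum();
    }

    private int saveBatch(List<Long> friendIds, String text) {
        List<Message> messages = friendService.findByIds(friendIds).stream()
                .map(friend -> new Message(friend.getId(), text))
                .collect(Collectors.toList());
        messageRepository.saveAll(messages); // saveAll runs as its own transaction per chunk
        return messages.size();
    }
}

The returned count is what the 99/100 or 100/100 notification would be built from (per-batch error handling omitted here).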
I was wondering if there are any other ways I can try. Another approach I can think of is to use WebSockets: send each batch, update the notification after each batch, and start the next batch until all the batches are sent. But how can the user still get the notification after he navigates away from the message page? The WebSocket logic on the client side has to live somewhere global, correct?
Thank you in advance.

Related

How to design notification system that sends real-time alerts created by users

I've been thinking about how to design a system that supports user-created scheduled alerts. My problem is that once the alerts are created and inserted into a database, I don't know the best way to go about scheduling those alerts. Polling the database to see which alerts need to go out next doesn't seem entirely right to me.
What are some ways this could be handled at a scale where, say, a million users could create their own custom alerts like "change baby diaper at 3pm every day"?
This problem is very suitable for cloud platforms. For example, you could use GCP Cloud Scheduler to invoke a cloud function when the alert is supposed to be sent out. The cloud function then calls some API to alert the user.
If cloud platforms are not an option, you could have your application spawn a new thread when an alert is created, and sleep that thread for a certain duration. When it wakes up, it sends the alert. Less elegant and less scalable than the first solution, but it would still work.
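A minimal sketch of that second idea in Java, using a shared scheduler pool rather than literally one sleeping thread per alert (AlertSender is just an illustrative placeholder for whatever actually notifies the user):

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AlertScheduler {

    // A small shared pool serves every alert instead of one sleeping thread per alert.
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
    private final AlertSender sender; // placeholder for whatever actually notifies the user

    public AlertScheduler(AlertSender sender) {
        this.sender = sender;
    }

    // Call this right after the alert row has been inserted into the database.
    public void schedule(long alertId, Instant dueAt) {
        long delayMillis = Math.max(0, Duration.between(Instant.now(), dueAt).toMillis());
        scheduler.schedule(() -> sender.send(alertId), delayMillis, TimeUnit.MILLISECONDS);
    }
}

In-memory schedules are lost on restart, so on startup you would reload pending alerts from the database and re-schedule them - which is also why the cloud-scheduler option above is the more robust one.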

How to reduce long WebSocket IO pauses?

I have a tool called Tendermint, which is written in Golang. It processes transactions and creates blocks (details are intentionally omitted). Transactions can be submitted through the WebSocket server. Blocks are configured to be created ~ every second.
Now, when I open two or more WS connections and submit more transactions than the application can handle, periodically, Tendermint gets stuck.
During this time, it does not create any blocks, but instead spends a significant portion of its time handling WebSocket IO.
I still don't understand the exact nature of these pauses. Maybe someone here knows or can ask the right questions? Also, I'm wondering what the ways to limit the IO are. Throttle each connection?
NOTE: I'm using https://github.com/gorilla/websocket for WebSockets. Our WS server can be found here.
Thank you for your time!
UPD 1: I've managed to flatten the pauses by batching responses in our WS server (see https://github.com/tendermint/tendermint/issues/3905#issuecomment-684860429)

How reliable is delaying a Mail in Laravel?

I want to inform the seller that the buyer is coming soon (about 2 hours before pickup time) via mail.
I would normally do it the hard way with CRON and a database table: checking hourly whether an order's pickup time is 2 hours away, and only then sending the mail out.
Now, I would like to know if you would recommend using queued jobs for sending mails out.
With
$when = now()->addDays(10); //I would dynamically set the date
Mail::to($order->seller())
->later($when, new BuyerIsComing($order));
I can delay the delivery of a queued email message.
But how safe would this be? Especially if someone orders something but picks it up in, let us exaggerate, two months?
Is the Laravel queueing system robust enough to behave correctly after long delays (i.e. 2 months)?
Edit
I'm using Redis for Queueing
You actually have nothing to worry about. Sending mail usually increases the response time of your application, so it's a good thing you want to delay the sending.
Queues are the way to go and they're pretty easy to set up in Laravel. Laravel supports a couple of drivers out of the box. I would advise you to start with the database driver and then try Beanstalkd etc.
Lastly, and perhaps more importantly, use a process manager like Supervisor to monitor and maintain your queue workers...
Take a look at https://laravel.com/docs/5.7/queues for more insight.
Cheers.
If by safe, you mean reliable, then it would be little different than sending an email immediately. If there's ever a possibility that your server "hiccups" and doesn't send an email, that possibility would be the same now as 10 minutes from now. Once the job is in the queue, it is persisted until completion (unless you use a memory-based driver, like Redis, which could get reset if the server reboots).
If you are using a database queue driver or a remote one, the log of queued jobs will remain even if the server is unavailable for a short period of time. Your queue will be honored even if the exact timestamp at which you wanted the job to run has passed. For instance, if you schedule an email to be sent at 1:00pm but your server is down at that exact moment, when it comes back online it will still see the job: it is stored as incomplete and its time is in the past, which triggers its execution the next time your queue worker checks the job list.
Of course, this assumes that you have your queue worker set up to always check jobs and automatically restart, even after a server failure, but that's a different discussion with lots of solutions...such as those shown here.
If you're using the database driver with Laravel queues to process your email then you don't need to worry about anything.
Jobs are only removed from the jobs table once they complete successfully; otherwise their next attempt time is set (a few minutes in the future) and they are executed again (if your queue worker is online).
So it's completely safe to use Laravel queues.

socket.io - Emit an event every X seconds or just emit it after a POST event?

I'm using socket.io, and I was wondering what is better:
Emitting an event every X seconds to stay always up to date with the database, or emitting the event only after e.g. a POST request, so it's more efficient.
I believe updating every X seconds should be easier, and maybe has better scalability, but I don't know if that's the correct way.
EDIT-1: To give more context. The application is for an accounting team. They basically want their Excel sheets converted into an app. They have a lot of data, so I don't know if emitting an event every X seconds is a good idea.
Thanks.
There is no "correct" way. It depends entirely upon the needs of your client and the capabilities of your server. If the client needs to be kept more instantly up-to-date, then send data from your server to the client whenever the server has new data. If the client only needs to be updated every once in a while, then only send it data every once in a while. It depends upon your application.
It is always more efficient to only send data to the client when the data has actually changed and when the client actually cares that something has changed. So, it would be foolish to send a client update every few seconds if the data isn't actually changing that often. If you have a means of knowing when the data changes on the server, then use that event to know when to send data to the client and even then, don't send it more often than the client actually cares to know.
It is always more efficient to have the server do no more work than is actually required by the client. Things like caching and keeping track of what each client was last sent can sometimes save lots of work for the server too.
Any further advice on this matter would need to know a lot more about the needs of your application and how this particular data fits into that and how often the data in question actually changes.
A summary on this topic:
Send data to the client no more often than it needs it
Sending data to the client that has not changed since the last time you sent it is inefficient for the server and consumes bandwidth.
Only you can decide how often your client needs updates (it depends upon your application)
Only you can test the impact on scalability of sending data to every client every time the data changes.
Server-side caching and keeping track of what client already has what data can help you avoid sending data to a client that it already has.
Server-side scalability probably has a lot to do with how many simultaneous clients are connected and how frequently there is changed data to send them.
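For what it's worth, here is the "emit only after the change" idea sketched with Spring's STOMP messaging rather than socket.io (only because the main question on this page uses Spring); SheetService, SheetRow and the /topic/rows destination are made-up names, and an @EnableWebSocketMessageBroker configuration is assumed to exist:

import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SheetController {

    private final SheetService sheetService;              // placeholder persistence layer
    private final SimpMessagingTemplate messagingTemplate;

    public SheetController(SheetService sheetService, SimpMessagingTemplate messagingTemplate) {
        this.sheetService = sheetService;
        this.messagingTemplate = messagingTemplate;
    }

    @PostMapping("/rows")
    public SheetRow create(@RequestBody SheetRow row) {
        SheetRow saved = sheetService.save(row);
        // Push only because something actually changed; clients subscribed to
        // /topic/rows get the new row immediately, with no periodic polling.
        messagingTemplate.convertAndSend("/topic/rows", saved);
        return saved;
    }
}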

Progress notifications from HTTP/REST service

I'm working on a web application that submits tasks to a master/worker system that farms out the tasks to any of a series of worker instances. The work queue master runs as a separate process (on a separate machine altogether) and tasks are submitted to the master via HTTP/REST requests. Once tasks are submitted to the work queue, client applications can submit another HTTP request to get status information about tasks.
For my web application, I'd like it to provide some sort of progress bar view that gives the user some indication of how far along task processing has come. The obvious way to implement this would be an AJAX progress meter widget that periodically polls the work queue for status on the tasks that have been submitted. My question is, is there a better way to accomplish this without the frequent polling?
I've considered having the client web application open up a server socket on which it could listen for notifications from the work master. Another similar thought I've had is to use XMPP or a similar protocol for the status notifications. (Of course, the master/worker system would need to be updated to provide notifications either way but I own the code for that so can make any necessary updates myself.)
Any thoughts on the best way to set up a notification system like this? Is the extra effort involved worth it, or is the simple polling solution the way to go?
Polling
The client keeps polling the server to get the status of the response.
Pros
Being really RESTful means cacheable and scalable.
Cons
Not the best responsiveness if you do not want to poll your server too often; a minimal polling endpoint is sketched after this answer.
Persistent connection
The server does not close its HTTP connection with the client until the response is complete. The server can send intermediate status through this connection using HTTP multiparts.
Comet is the most famous framework to implement this behaviour.
Pros
Best responsiveness, almost real-time notifications from the server.
Cons
The number of connections on a web server is limited; keeping a connection open for too long might, at best, load your server, and at worst, open it to denial-of-service attacks.
Client as a server
Make the server post status updates and the response to the client as if it were another RESTful application.
Pros
Best of both worlds: no resources are wasted waiting for the response, either on the server or on the client side.
Cons
You need a full HTTP server and web application stack on the client
Firewalls and routers with their default "no incoming connections at all" will get in the way.
Feel free to edit to add your thoughts or a new method!
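To make the Polling option above concrete, here is a minimal sketch of the status endpoint an AJAX progress widget could hit every few seconds (TaskQueueClient and TaskStatus are assumed stand-ins for whatever talks to the work-queue master, not part of the original system):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TaskStatusController {

    // Placeholder for the REST client that queries the work-queue master.
    private final TaskQueueClient taskQueue;

    public TaskStatusController(TaskQueueClient taskQueue) {
        this.taskQueue = taskQueue;
    }

    // The progress widget polls GET /tasks/{id}/status every few seconds
    // and renders the returned percentage in the progress bar.
    @GetMapping("/tasks/{id}/status")
    public TaskStatus status(@PathVariable String id) {
        return taskQueue.fetchStatus(id);
    }
}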
I guess it depends on a few factors:
How accurate the feedback can be (1 percent, 5 percent, 50 percent)
Accurate feedback makes it worth pursuing some kind of progress bar and comet style push. If you can only say "Busy... hold on... almost there... done" then a simple ajax "are we there yet" poll is certainly easier to code.
How timely the Done message has to be seen by the client
How long each task takes (1 second, 10 seconds, 10 minutes)
1 second makes it a bit moot. 10 seconds makes it worth it. 10 minutes means you're better off suggesting the user goes for a coffee break :-)
How many concurrent requests there will be
Unless you've got a "special" server, live push style systems tend to eat connections and you'll be maxed out pretty quickly. Having to throw more webservers in for a fancy progress bar might hurt the budget.
I've got some sample code on 871184 that shows a hand-rolled "forever frame" which seems to work out well. The project I developed that for isn't hammered all that hard though; the operations take a few seconds and we can give a pretty accurate percentage. The code uses ASP.NET and jQuery, but the general techniques will work with any server and JavaScript framework.
Edit: As John points out, status reporting probably isn't the job of the RESTful service. But there's nothing that says you can't open an iframe on the client that hooks to a page on the server that polls the service. Theory says the server and the service will at least be closer to one another :-)
Look into Comet. You make a single request to the server and the server blocks and holds the connection open until an update in status occurs. Once that happens the response is sent and committed. The browser receives this response, handles it and immediately re-requests the same URL. The effect is that of events being pushed to the browser. There are pros and cons and it might not be appropriate for all use cases but would provide the most timely status updates.
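A rough sketch of that blocking style using Spring MVC's DeferredResult for the long-poll (TaskStatus is a placeholder, and the wiring that lets the web app hear about status changes from the work-queue master is left out):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.async.DeferredResult;

@RestController
public class TaskProgressLongPollController {

    // One pending long-poll request per task id (simplified to a single waiter).
    private final Map<String, DeferredResult<TaskStatus>> waiters = new ConcurrentHashMap<>();

    // The browser re-requests this URL as soon as each response arrives.
    @GetMapping("/tasks/{id}/status/next")
    public DeferredResult<TaskStatus> nextStatus(@PathVariable String id) {
        DeferredResult<TaskStatus> result = new DeferredResult<>(30_000L); // times out after 30s, client simply retries
        waiters.put(id, result);
        result.onCompletion(() -> waiters.remove(id, result));
        return result;
    }

    // Called by whatever component hears about progress from the work-queue master.
    public void onStatusChanged(String taskId, TaskStatus status) {
        DeferredResult<TaskStatus> waiter = waiters.remove(taskId);
        if (waiter != null) {
            waiter.setResult(status); // completes the held request, pushing the update to the browser
        }
    }
}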
My opinion is to stick with the polling solution, but you might be interested in this Wikipedia article on HTTP Push technologies.
REST depends on HTTP, which is a request/response protocol. I don't think you're going to get a pure HTTP server calling the client back with status.
Besides, status reporting isn't the job of the service. It's up to the client to decide when, or if, it wants status reported.
One approach I have used is:
When the job is posted to the server, the server responds with a pubnub channel id (one could alternatively use something like Google's Pub/Sub service).
The client on browser subscribes to that channel and starts listening for messages.
The worker/task server publishes status on that pubnub channel to update the progress.
On receiving messages on the subscribed pubnub-channel, the client updates the web UI.
You could also use a self-refreshing iframe, but an AJAX call is much better. I don't think there is any other way.
PS: If you opened a socket from the client, that wouldn't change much - the browser would show the page as still "loading", which is not very user-friendly (assuming you would push or flush the buffer to have other things displayed first).
