How do I fire scheduled events without checking the database every second for each record?

I have rows of data containing an email address, phone number, message, and a datetime. The message will be sent to the email address and phone number at the specified datetime. It might also be sent to a webhook URL.
Suppose I have 50 million such rows, with dates spanning up to a year or more into the future, and suppose I want each message sent the second it's due, or at least within the minute it's due.
Do Calendly/Google Calendar/Google Meet really check their databases every minute to send those '30 minutes before' notifications for millions of events? Does Gmail check the database every second to see whether a scheduled email is due? Or is there another way to fire events specified in the database without querying so much?
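To make this concrete, the naive approach I'm trying to avoid looks roughly like the sketch below (the table and column names are made up):

```python
import sqlite3
import time

# Illustration only: the per-second polling loop I want to avoid.
conn = sqlite3.connect("messages.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS scheduled_messages ("
    "id INTEGER PRIMARY KEY, email TEXT, phone TEXT, message TEXT, "
    "due_at INTEGER, sent INTEGER DEFAULT 0)"
)

while True:
    # Scan for every message that has come due and not yet been sent.
    rows = conn.execute(
        "SELECT id, email, phone, message FROM scheduled_messages "
        "WHERE due_at <= ? AND sent = 0",
        (int(time.time()),),
    ).fetchall()
    for msg_id, email, phone, message in rows:
        print("would send:", email, phone, message)  # stand-in for email/SMS/webhook delivery
        conn.execute("UPDATE scheduled_messages SET sent = 1 WHERE id = ?", (msg_id,))
    conn.commit()
    time.sleep(1)  # this once-per-second wake-up is the part that feels wasteful
```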

Related

How do I process a job periodically for all users at a large scale?

I have a large set of users in my project, around 50 million.
I need to create a playlist for each user every day. To do this, I'm currently using the following method:
I have a column in my users table that holds the last time a playlist was created for that user; I call it last_playlist_created_at.
I run a query on the users table that selects the top 1,000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at.
After that, I loop over the result and publish a message to my message broker for each user.
Behind the message broker, I run around 64 workers that process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
When the message broker's queue is empty, I repeat these steps (a while/do-while loop).
I think the processing method is good enough and scalable,
but the method we use to create the message for each user is not scalable!
How should I dispatch such a large set of messages, one for each of my users?
OK, so my answer is based entirely on your comment where you mentioned that you use while(true) to check whether a playlist needs to be updated, which does not seem like a sound approach.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled job, i.e. once a day.
So, use a scheduler to schedule the next job time.
Write a scheduled-job handler that pushes the job to a message queue. This part exists to let you handle multiple jobs at the same time and control the flow.
Generate the playlist for the user based on the job, then create a scheduled event for the next day.
You could persist the scheduled-job data just to avoid race conditions.
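Here's a minimal in-process sketch of that flow, using only Python's standard library; the in-memory queue.Queue stands in for your real message broker, and generate_playlist is a placeholder for your playlist logic:

```python
import queue
import sched
import threading
import time
from datetime import datetime

ONE_DAY = 24 * 60 * 60

job_queue = queue.Queue()                           # stands in for RabbitMQ/Kafka/etc.
scheduler = sched.scheduler(time.time, time.sleep)

def generate_playlist(user_id):
    """Placeholder for the real playlist-generation logic."""
    print(f"{datetime.now()} generated playlist for user {user_id}")

def enqueue_job(user_id):
    """Scheduled-job handler: push the job to the queue, then schedule tomorrow's run."""
    job_queue.put(user_id)
    scheduler.enterabs(time.time() + ONE_DAY, 1, enqueue_job, argument=(user_id,))

def worker():
    while True:
        user_id = job_queue.get()
        generate_playlist(user_id)
        job_queue.task_done()

# Mirrors the 64 workers behind the broker in the question.
for _ in range(64):
    threading.Thread(target=worker, daemon=True).start()

# Demo: schedule the first run for a handful of users one second from now.
for uid in range(5):
    scheduler.enterabs(time.time() + 1, 1, enqueue_job, argument=(uid,))

scheduler.run()  # blocks, firing each job at its scheduled time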

Publish an event to RabbitMQ when data reaches its expiration time

I am currently working with an event-driven architecture. We have some data with an expiration time in our database. What is the best way to publish an event (data-expired) at the exact moment that data expires? I would like to do this in real time, not with a scheduler process or batch job.
I think that to achieve this, you will need the DB to have some mechanism to support it, and I can't think of one right now (even less so without knowing which DB).
But off the top of my head, I can say: if that expiration date is never modified after being inserted into the DB, you could just insert a message into RabbitMQ at the same moment that you insert the data into the DB. You would set a TTL on the message and configure dead-lettering on that queue. That way, when the message expires, it gets automatically published to another exchange.
You can read more here: https://www.rabbitmq.com/dlx.html
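A minimal sketch of that setup using pika (all exchange and queue names here are made up):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Where the "data-expired" events come out.
channel.exchange_declare(exchange="expired-events", exchange_type="fanout")
channel.queue_declare(queue="data-expired")
channel.queue_bind(queue="data-expired", exchange="expired-events")

# Holding queue: messages wait here until their TTL elapses, then are
# dead-lettered to the expired-events exchange.
channel.queue_declare(
    queue="pending-expirations",
    arguments={"x-dead-letter-exchange": "expired-events"},
)

def publish_expiration(record_id: str, ttl_ms: int) -> None:
    """Call this at the same moment you insert the row into the DB."""
    channel.basic_publish(
        exchange="",                        # default exchange routes by queue name
        routing_key="pending-expirations",
        body=record_id,
        properties=pika.BasicProperties(expiration=str(ttl_ms)),  # per-message TTL
    )

publish_expiration("order:42", ttl_ms=60_000)  # fires a data-expired event in one minute
```

One caveat worth knowing: RabbitMQ only enforces per-message TTLs at the head of the queue, so if messages with different TTLs share one holding queue, an expired message can sit behind a longer-lived one. Using one queue per TTL value with a queue-level x-message-ttl argument avoids that.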

Data structure and file structure for storing append-only messages?

A message is a bundle of data of variable size with a unique message ID (integer). I'd like a design/data structure/algorithm that can:
efficiently store the messages on disk; the number of messages can be very large and their lengths vary, but stored messages are never updated or modified.
retrieve a message by its message ID, i.e. return the stored message.
exploit the fact that recently stored messages are queried more often than old ones.
handle a per-message TTL; I need a way to truncate files of old messages.
What is the proper data structure and file structure for this?
If we're talking five messages per second, then you're talking on the order of a half million messages per day.
What I've done in the past is maintain multiple files. If the TTL for messages is measured in days, I have one messages file per day. The process that reads and stores messages creates a new file for the first message of a new day. This is trivial to implement by keeping track of the date and time the last message was received.
I also maintain a paired index file with each messages file. This, too, is a simple sequential file; it contains the message ID and file position of each message. So to look up a message for a particular day, you load that day's index file, do a binary search for the message ID, and then use the corresponding position to read the message from the messages file. If the message IDs are sequential with no numbers missing, lookup within an index is essentially a direct offset computation; if numbers can be missing, binary search works well, and with only 512K messages it will still be very fast.
To handle multiple days, you have the lookup program's startup sequence scan the directory for all daily message indexes and build a meta-index that contains the IDs for the first message in each day.
To delete old messages, you have the lookup program delete old files on startup, or have it do that at midnight every day. At that time it can also get the ID for the first message in the next day's file.
Or, the message gatherer can spawn a task to delete old files when it receives the first message for a new day. You can also make it notify the lookup program of the new day so that the lookup program can update its meta index.
With only 512K messages per day (5 per second is about half a million per day), you should be able to keep 10 days' worth of index entries in memory without trouble. Your index will contain a message ID and file offset, so figure 16 bytes per entry. Times 5 million entries for 10 days, that's about 80 megabytes: pocket change. To remove old entries (once per day), just delete that day's index from memory.
If messages have varying TTL, then you keep older messages around but keep track of their TTL. When somebody looks up an expired message, you'll have to do a secondary check on the expiration date before returning it. And of course you'll have to keep track of the longest TTL for each day so that you can delete the file when all of its messages have expired.
This is a pretty low-tech solution, but you can code it up in a day and it works and performs surprisingly well. I've used it in several projects, to great effect.
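As a rough illustration of the messages-file-plus-index idea (a sketch, not the exact code I used), here's a minimal version; it assumes message IDs within a day are appended in ascending order so each index stays sorted for binary search:

```python
import bisect
import os
import struct
from datetime import date

# 16 bytes per entry, as described above: (message ID, file offset).
INDEX_ENTRY = struct.Struct("<qq")

class DailyMessageStore:
    """One append-only messages file plus a paired index file per day."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _paths(self, day: date):
        base = os.path.join(self.directory, day.isoformat())
        return base + ".msg", base + ".idx"

    def append(self, day: date, message_id: int, payload: bytes) -> None:
        """Assumes IDs within a day arrive in ascending order."""
        msg_path, idx_path = self._paths(day)
        with open(msg_path, "ab") as msg, open(idx_path, "ab") as idx:
            offset = msg.seek(0, os.SEEK_END)
            msg.write(struct.pack("<i", len(payload)))  # length prefix
            msg.write(payload)
            idx.write(INDEX_ENTRY.pack(message_id, offset))

    def lookup(self, day: date, message_id: int) -> bytes | None:
        msg_path, idx_path = self._paths(day)
        with open(idx_path, "rb") as idx:
            raw = idx.read()
        # In a real service you'd keep the index resident in memory, as described above.
        ids = [INDEX_ENTRY.unpack_from(raw, i * INDEX_ENTRY.size)[0]
               for i in range(len(raw) // INDEX_ENTRY.size)]
        pos = bisect.bisect_left(ids, message_id)   # binary search on sorted IDs
        if pos == len(ids) or ids[pos] != message_id:
            return None
        _, offset = INDEX_ENTRY.unpack_from(raw, pos * INDEX_ENTRY.size)
        with open(msg_path, "rb") as msg:
            msg.seek(offset)
            (length,) = struct.unpack("<i", msg.read(4))
            return msg.read(length)

store = DailyMessageStore("messages")
store.append(date.today(), 1001, b"hello")
print(store.lookup(date.today(), 1001))  # b'hello'
```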

Query on Oracle scheduler

I have a requirement where I need to call a process that sends a particular message to a customer every X days, until N days have passed.
Basically, the process runs every day, fetching the customers into a cursor; then for each customer it should check when the last message was sent, and if that was exactly X days ago, send the message to that customer.
I can handle this in the process by adding an extra column to track the last notification date and referring to it when sending, but that would be a performance hit.
So can anyone suggest a simpler way to handle this?
Kindly let me know if you need clarification on any part.
I don't think that would be a performance hit!
If you are adding the column to the same table, only one query is going to be executed anyway, so it's unlikely to be a performance problem.
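To illustrate the point: with an indexed last_notified_at column, the daily run amounts to one query plus an update per due customer. A rough sketch with python-oracledb, where all table, column, and connection details are hypothetical:

```python
import oracledb  # python-oracledb, the maintained successor to cx_Oracle

X_DAYS = 7   # send a message every X days...
N_DAYS = 90  # ...until N days have passed (both values are placeholders)

def send_message(customer_id):
    print("sending message to customer", customer_id)  # stand-in for real delivery

conn = oracledb.connect(user="app", password="secret", dsn="localhost/XEPDB1")
read_cur = conn.cursor()
write_cur = conn.cursor()

# Pick up every customer whose last message went out X or more days ago,
# as long as they are still within the N-day window.
read_cur.execute(
    """
    SELECT customer_id
      FROM customers
     WHERE enrolled_at > SYSDATE - :n_days
       AND (last_notified_at IS NULL
            OR last_notified_at <= SYSDATE - :x_days)
    """,
    {"n_days": N_DAYS, "x_days": X_DAYS},
)
for (customer_id,) in read_cur:
    send_message(customer_id)
    write_cur.execute(
        "UPDATE customers SET last_notified_at = SYSDATE WHERE customer_id = :id",
        {"id": customer_id},
    )
conn.commit()
```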

Scaling message queues with lots of API calls

I have an application where some of my users' actions must be retrieved via a third-party API.
For example, let's say I have a user who can receive tons of phone calls. The call records should be updated often because my users want to see their call history, so I should do this "almost in real time". The way I've managed this is to retrieve the list of all logged-in users every 10 minutes and, for each user, enqueue a task that retrieves the call record list from the timestamp of the latest saved record up to the current timestamp and saves it all to my database.
This doesn't seem to scale well: the more users I have, the more connected users I'll have, and the more tasks I'll enqueue.
Is there any other approach to achieve this?
This seems straightforward with a background queue of jobs. It is unlikely that all users use the system at the same rate, so queue jobs based on each user's activity, with a fallback to a daily sync.
You will likely need more workers taking jobs from the queue at some point, and then multiple queues, so that if you had a thousand users, the ones with a later queue slot are not always left waiting.
It also depends on how fast you need this updated, and on the limit on API calls.
There will be some sort of limit, so I suggest you start by committing to updates with a 4-hour or 1-hour delay, to always give yourself some slack, and then work on improving that to a sustainable level.
Make sure your users are seeing your stored data from cached API responses, not live API calls, in case the API goes away.
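A sketch of the "queue jobs based on use, with a daily fallback" idea; the thresholds are arbitrary placeholders you would tune against your API limits:

```python
from datetime import timedelta

# Pick each user's next polling interval from how busy they have been.
def next_poll_interval(calls_in_last_hour: int) -> timedelta:
    if calls_in_last_hour >= 10:
        return timedelta(minutes=10)   # heavy users stay near real time
    if calls_in_last_hour >= 1:
        return timedelta(hours=1)
    return timedelta(days=1)           # inactive users fall back to a daily sync

print(next_poll_interval(25))  # 0:10:00
```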
