I'm new to jobs and queues.
At the moment, I'm only really using the ->later() method on a Mail. This places each mail on the default queue.
There are instances where I need to cancel jobs on the queue related to a specific model ID. I don't really see any reference to deleting pending jobs in the queue - only deleting/clearing failed ones.
In Telescope, there are tags showing the Model IDs associated with each pending job.
There are a few things I was hoping to do:
Delete all jobs associated with a specific model ID
Listen for the execution of a job based on a specific model ID, so I may update the database table with the date/timestamp of when the job actually executed. (users can queue emails to send hours in advance and I'd like to log when their customer actually receives the email)
Remove the record associated with a job, since it should not exist if the email didn't actually get sent.
Hoping for some advice on how to solve this problem of needing to manage jobs in this fashion.
I'm using Redis if that makes any difference.
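One common workaround (a hedged sketch, not something Laravel's ->later() gives you out of the box): instead of queueing the mailable directly, dispatch your own delayed queued job that re-checks a database record right before sending. The ScheduledEmail model, the scheduled_emails columns (cancelled_at, sent_at, recipient, send_at) and the ScheduledMessage mailable below are all hypothetical names used for illustration:

```php
<?php

namespace App\Jobs;

use App\Mail\ScheduledMessage;          // hypothetical mailable
use App\Models\ScheduledEmail;          // hypothetical model over a scheduled_emails table
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Mail;

class SendScheduledEmail implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public ScheduledEmail $scheduledEmail)
    {
    }

    public function handle(): void
    {
        // Re-read the row: if the user cancelled it in the meantime, do nothing.
        $email = ScheduledEmail::find($this->scheduledEmail->id);

        if (! $email || $email->cancelled_at !== null) {
            return;
        }

        Mail::to($email->recipient)->send(new ScheduledMessage($email));

        // Record when the mail was actually handed off for delivery.
        $email->update(['sent_at' => now()]);
    }
}

// Dispatching with a delay instead of Mail::later():
// SendScheduledEmail::dispatch($scheduledEmail)->delay($scheduledEmail->send_at);
```

With this shape, "deleting the jobs for a model" becomes a matter of cancelling or deleting the related scheduled_emails rows, rather than digging pending jobs out of Redis, and the sent_at column gives you the execution timestamp you wanted to log.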
After watching this awesome talk by Martin Kleppmann about how Kafka can be used to stream events so that we can get rid of two-phase commits, I have a couple of questions related to updating a cache only when the database is updated properly.
Problem Statement
Let's say you have a Redis cache which stores the user's profile pic and a Postgres database which is used for all the user-related operations (creation, update, deletion, etc.).
I want to update my Redis cache if and only if a new user has been successfully added to my database.
How can I do that using Kafka?
If I am to take the example given in the video then the workflow would follow something like this:
User registers
Request is handled by the User Registration Microservice
The User Registration Microservice inserts a new entry into the users table.
It then generates a User Creation Event in the user_created topic.
Cache Population Microservice consumes the newly created User Creation Event
Cache Population Microservice updates the Redis cache.
The problem is: what happens if the User Registration Microservice crashes just after writing to the database, but before it manages to send the event to Kafka?
What would be the correct way of handling this?
Does the User Registration Microservice keep track of the last event it published? How can it do that reliably? Does it write to a DB? Then the problem starts all over again: what if it published the event to Kafka but failed before it could update its last known offset?
There are three broad approaches one can take for this:
There's the transactional outbox pattern, wherein, in the same transaction as inserting the new entry into the user table, a corresponding user creation event is inserted into an outbox table. Some process then eventually queries that outbox table, publishes the events in that table to Kafka, and deletes the events in the table. Since the inserts are in the same transaction, they either both occur or neither occurs; barring a bug in the process which publishes the outbox to Kafka, this guarantees that every user insert eventually has an associated event published (at least once) to Kafka.
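As a rough sketch of the "same transaction" part, assuming plain PDO against Postgres and hypothetical users and outbox tables (column names and payload shape are made up):

```php
<?php

// Transactional outbox sketch: the user row and its outbox event are written
// in one transaction, so they either both become visible or neither does.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'app', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // 1. Insert the user row.
    $stmt = $pdo->prepare('INSERT INTO users (email, name) VALUES (:email, :name) RETURNING id');
    $stmt->execute(['email' => 'jane@example.com', 'name' => 'Jane']);
    $userId = $stmt->fetchColumn();

    // 2. Insert the corresponding event into the outbox, in the SAME transaction.
    $event = json_encode(['type' => 'user_created', 'user_id' => $userId]);
    $pdo->prepare('INSERT INTO outbox (topic, payload) VALUES (:topic, :payload)')
        ->execute(['topic' => 'user_created', 'payload' => $event]);

    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack();
    throw $e;
}
```

The relay that drains the outbox table into Kafka is deliberately left out here; it can be a simple cron job or a long-running worker that publishes each row and then deletes (or marks) it, giving at-least-once delivery.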
There's a more event-sourcingish pattern, where you publish the user creation event to Kafka and then some consuming process inserts into the user table based on the event. Since this happens with a delay, this strongly suggests that the user registration service needs to keep state of which users it has published creation events for (with the combination of Kafka and Postgres being the source of truth for this). Since Kafka allows a message to be consumed by arbitrarily many consumers, a different consumer can then update Redis.
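For the "a different consumer can then update Redis" part, here is a minimal sketch of a cache-population consumer, assuming the php-rdkafka extension and the phpredis client; the topic name, Redis key layout and payload fields are assumptions:

```php
<?php

// Consume user_created events and project them into Redis.
$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'localhost:9092');
$conf->set('group.id', 'cache-population');

$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['user_created']);

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

while (true) {
    $message = $consumer->consume(1000); // timeout in ms

    if ($message->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
        $event = json_decode($message->payload, true);
        // Idempotent write: replaying the same event just overwrites the same key.
        $redis->set('user:' . $event['user_id'] . ':profile_pic', $event['profile_pic_url']);
    }
    // Other error codes (timeouts, partition EOF) are simply ignored in this sketch.
}
```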
Change data capture (e.g. Debezium) can be used to tie into Postgres' write-ahead log (as Postgres actually event sources under the hood...) and publish an event that essentially says "this row was inserted into the user table" to Kafka. A consumer of that event can then translate that into a user created event.
CDC in some sense moves the transactional outbox into the infrastructure, at the cost of requiring that the context it inherently throws away be reconstructed later (which is not always possible).
That said, I'd strongly advise against having user creation be a microservice, and I'd likewise strongly advise against a RInK store like Redis. Both of these smell like attempts to paper over architectural deficiencies by adding microservices and caches.
The one-foot-on-the-way-to-event-sourcing approach isn't one I'd recommend, but if one starts there, the requirement to make the registration service stateful suddenly opens up possibilities which may remove the need for Redis, limit the need for a Kafka-like thing, and allow you to treat the existence of a DB as an implementation detail.
I have a large set of users in my project, around 50 million.
I need to create a playlist for each user every day. To do this, I'm currently using this method:
I have a column in my users table that holds the latest time a playlist was created for that user; I call it last_playlist_created_at.
I run a query on the users table that selects the top 1000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at.
After that, I loop over the result and publish a message for each user to my message broker.
Behind the message broker, I start around 64 workers to process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
When the message broker's queue is empty, I repeat these steps (a while / do-while loop).
I think the processing method is good enough and can be scalable as well,
but the method we use to create the message for each user is not scalable!
How should I dispatch such a large set of messages, one for each of my users?
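To make the current method concrete, here is a rough sketch of the dispatch loop as described, assuming PDO and a hypothetical publish() helper for whatever broker is in use:

```php
<?php

// Sketch of the polling dispatcher described in the question.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'app', 'secret');

while (true) {
    // Oldest 1000 users whose playlist is more than a day old.
    $stmt = $pdo->query(
        'SELECT id FROM users
         WHERE last_playlist_created_at < NOW() - INTERVAL 1 DAY
         ORDER BY last_playlist_created_at ASC
         LIMIT 1000'
    );
    $userIds = $stmt->fetchAll(PDO::FETCH_COLUMN);

    if ($userIds === []) {
        sleep(60); // nothing due yet; avoid hammering the database
        continue;
    }

    foreach ($userIds as $userId) {
        publish('playlist.generate', ['user_id' => $userId]); // hypothetical broker call
    }
}
```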
OK, so my answer is based entirely on your comment where you mentioned that you use while(true) to check whether the playlist needs to be updated, which does not seem so trivial.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled job, i.e. once a day.
So, use a scheduler to schedule the next job time.
Write a Scheduled Job Handler to push this to a Message Queue. This part just lets you handle multiple jobs at the same time and control the flow.
Generate the playlist for the user based on the job, then create a schedule event for the next day.
You could persist Scheduled Job data just to avoid race conditions.
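A minimal sketch of that flow, assuming a hypothetical scheduled_jobs table (user_id, run_at, state), PDO, and hypothetical publish()/generatePlaylistFor() helpers; all names are illustrative:

```php
<?php

$pdo = new PDO('mysql:host=localhost;dbname=app', 'app', 'secret');

// 1. Scheduled Job Handler: runs frequently (e.g. every minute), claims due jobs
//    and pushes them onto the queue, so the flow can be controlled centrally.
$due = $pdo->query(
    "SELECT id, user_id FROM scheduled_jobs
     WHERE state = 'pending' AND run_at <= NOW()
     LIMIT 1000"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($due as $job) {
    // Claim the row first so two handler instances don't enqueue it twice.
    $claim = $pdo->prepare(
        "UPDATE scheduled_jobs SET state = 'queued' WHERE id = ? AND state = 'pending'"
    );
    $claim->execute([$job['id']]);

    if ($claim->rowCount() === 1) {
        publish('playlist.generate', ['job_id' => $job['id'], 'user_id' => $job['user_id']]);
    }
}

// 2. Worker (one of the ~64 consumers): generates the playlist,
//    then schedules the next run for this user, one day later.
function handleMessage(PDO $pdo, array $message): void
{
    generatePlaylistFor($message['user_id']); // hypothetical playlist builder

    $pdo->prepare("UPDATE scheduled_jobs SET state = 'done' WHERE id = ?")
        ->execute([$message['job_id']]);

    $pdo->prepare(
        "INSERT INTO scheduled_jobs (user_id, run_at, state)
         VALUES (?, NOW() + INTERVAL 1 DAY, 'pending')"
    )->execute([$message['user_id']]);
}
```

Claiming the row before publishing is the "persist Scheduled Job data to avoid race conditions" part: two handler instances can run concurrently without enqueueing the same job twice.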
I'm thinking about how to implement a database schema for private one-to-one user messaging (using Laravel). All threads are only one-to-one, so no group messages or multiple participants. The best example is a dating site, where each message thread for a user is represented by the recipient.
At first, I did the simplest one table approach. Like this:
id,sender_id,recipient_id,body
This works fine. The major problem, however, is that a user can't delete a conversation while the other user keeps the messages.
The other approach is with three tables: messages, threads, participants.
This is the same approach as with multiple participants; I am simply limiting it to two. Then, if a user deletes a thread, I can simply remove them from the participants for this thread.
Now here is the problem with this approach. Let's say user A sends a message to user B. Then user A deletes the thread (is removed from the thread participants). User B gets to keep the messages (he's still a participant). What if user A now sends another message to user B? This will actually create a new thread, since the old thread is lost for that user. But the message actually needs to go into the old thread, since there's always only one thread between two users.
I'm a bit lost on how to implement this. What database schema would you suggest for one-to-one messaging between users, while allowing thread deletion?
Thanks in advance!
My suggestion is to stick with the single-table approach and treat the deletion of a thread by a user as an operation that, instead of actually removing data, hides those messages from that user in the application from that moment on.
This could be implemented by modifying your table in this way:
id, sender_id, recipient_id, hidden_to_sender, hidden_to_recipient, body
so that, when a user removes a thread, all its messages are marked as hidden to him (and only to him). When he sends a new message to the other person, it is simply added to the previous thread, with the hidden flags set to false for both participants.
You could also consider an optimization: when a user removes a conversation with another user, if all the messages are already marked as hidden to the other user, you can actually erase them, instead of also marking them as hidden to the first user.
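A Laravel-flavoured sketch of how this could look with a hypothetical Message Eloquent model over that single table (model, column and helper names are illustrative):

```php
<?php

use App\Models\Message;

// Fetch the conversation between $me and $other, excluding messages
// that $me has hidden ("deleted") on their side.
function conversation(int $me, int $other)
{
    return Message::query()
        ->where(function ($q) use ($me, $other) {
            $q->where('sender_id', $me)
              ->where('recipient_id', $other)
              ->where('hidden_to_sender', false);
        })
        ->orWhere(function ($q) use ($me, $other) {
            $q->where('sender_id', $other)
              ->where('recipient_id', $me)
              ->where('hidden_to_recipient', false);
        })
        ->orderBy('id')
        ->get();
}

// "Delete" the thread for $me only: mark every message as hidden on my side.
function deleteThreadFor(int $me, int $other): void
{
    Message::where('sender_id', $me)->where('recipient_id', $other)
        ->update(['hidden_to_sender' => true]);

    Message::where('sender_id', $other)->where('recipient_id', $me)
        ->update(['hidden_to_recipient' => true]);
}

// Sending a new message simply inserts into the same table, so the old
// thread "reappears" for the sender without any thread bookkeeping.
Message::create([
    'sender_id'           => 1,
    'recipient_id'        => 2,
    'body'                => 'Hi again!',
    'hidden_to_sender'    => false,
    'hidden_to_recipient' => false,
]);
```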
I'm trying to initialize my data in my Azure Data Tables, but I only want this to happen once on the server at startup (i.e. via the WebRole RoleEntryPoint OnStart routine). The problem is that if I have multiple instances starting up at the same time, any one of them could add records to the same table at the same time, duplicating the data at runtime.
Is there something like an overarching routine for all instances? An application object into which I can put a value and check it in each of the instances to see whether the tables have been created or not? A singleton of some sort that Azure exposes?
Cheers
Rob
No, but you could use a Blob lease as a mutex. You could also use a table lock in SQL Azure, if you're using that.
You could also use a Queue, and drop a message in there and then just one role would pick up the message and process it.
You could create a new single instance role that does this job on role start.
To be really paranoid about this and handle a failure in the middle of writing the data, you can do something even more complex.
A queue message is a great way to ensure transactional capabilities as long as the work you are doing can be idempotent.
Each instance adds a message to a queue.
Each instance polls the queue and, on receiving a message:
  Reads the locking row from the table.
  If the 'create data state' value is 'unclaimed':
    Attempt to update the row with an 'in process' value and a timeout expiration timestamp based on the amount of time needed to create the data.
    If the update is successful, the instance owns the task of creating the data:
      so create the data,
      update the 'create data state' to 'committed',
      and delete the message.
    Else, if the update is unsuccessful, the instance does not own the task:
      so just delete the message.
  Else, if the 'create data state' value is 'in process', check whether the current time is past the expiration timestamp.
    If it is, that would imply that the 'in process' attempt failed:
      so try all over again to set the state to 'in process', delete the incompletely written rows,
      and try recreating the data, updating the state and deleting the message.
  Else, if the 'create data state' value is 'committed':
    just delete the queue message, since the work has already been done.
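Here is a sketch of the claim/commit logic, but using a single locking row in a SQL Azure table via PDO rather than an Azure Table entity; the conditional-update idea is the same, and all table, column and value names are made up:

```php
<?php

$pdo = new PDO('sqlsrv:Server=localhost;Database=app', 'app', 'secret');

function tryClaim(PDO $pdo, int $timeoutSeconds): bool
{
    // Only succeeds if the row is still 'unclaimed' (or an earlier 'in process' claim expired).
    $stmt = $pdo->prepare(
        "UPDATE init_lock
         SET state = 'in process',
             expires_at = DATEADD(second, :timeout, SYSUTCDATETIME())
         WHERE name = 'seed-data'
           AND (state = 'unclaimed'
                OR (state = 'in process' AND expires_at < SYSUTCDATETIME()))"
    );
    $stmt->execute(['timeout' => $timeoutSeconds]);

    return $stmt->rowCount() === 1; // 1 row updated => this instance owns the task
}

function markCommitted(PDO $pdo): void
{
    $pdo->exec("UPDATE init_lock SET state = 'committed' WHERE name = 'seed-data'");
}

// On receiving the queue message:
$state = $pdo->query("SELECT state FROM init_lock WHERE name = 'seed-data'")->fetchColumn();

if ($state === 'committed') {
    // Work already done - just delete the queue message.
} elseif (tryClaim($pdo, 300)) {
    // This instance owns the task: remove any partial rows, (re)create the data...
    markCommitted($pdo);
    // ...then delete the queue message.
} else {
    // Another instance is doing (or did) the work - just delete the message.
}
```

With Azure Table storage itself, the same effect comes from updating the locking entity with an If-Match condition on its ETag, so only one instance's conditional update succeeds.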
I understand that data is always stale.
What is a good way to handle a workflow task like Approve Invoice? The user is only allowed to execute this task once. When it is processed by an async service, it can take a few seconds (or longer). In the meantime, the user can approve the same invoice again, because the task has not yet been updated in the DB.
Any ideas about this are appreciated.
The domain model must enforce consistency. The model on the write side should not be considered stale, only the projections on the read side.
It doesn't matter if the approval event hasn't been projected into the read model. But if the user sends an invalid command based on stale data, the domain model needs to know that the approval had already happened.
Your domain's repository should always load the aggregate root in its latest state (no matter whether you use event sourcing or state-based persistence such as a SQL DB).
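A minimal sketch of that write-side guard, with illustrative class names (nothing framework-specific is assumed):

```php
<?php

// The Invoice aggregate itself rejects a second approval,
// regardless of how stale the read model or the UI is.
final class Invoice
{
    private ?DateTimeImmutable $approvedAt = null;

    public function approve(DateTimeImmutable $when): void
    {
        if ($this->approvedAt !== null) {
            // The command is invalid: the approval already happened,
            // even if the read model doesn't show it yet.
            throw new DomainException('Invoice has already been approved.');
        }

        $this->approvedAt = $when;
        // In an event-sourced model you would record an InvoiceApproved event here instead.
    }
}

// Handling the command: the repository loads the aggregate in its latest state,
// so a duplicate "approve" from the UI fails here, on the write side.
// $invoice = $repository->get($command->invoiceId);   // hypothetical repository
// $invoice->approve(new DateTimeImmutable());
// $repository->save($invoice);
```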