Publish event to RabbitMQ when data get expiration time - events

I am currently on an event driven architecture. We have some data which has an expiration time on our data base. what is the best solution to publish an event(data-expired) just in the exact moment when that data get expiration time? I would like to do that in real time a not doing an scheduler process or batch.

I think to achieve that you will need the DB to have some mechanism to support that. And I can't think of any right now (even less without knowing which DB).
But from the top of my head, I can say: if that expiration date is never modified after being inserted in DB, you could just insert a message in RabbitMQ at the same moment that you insert the data in DB. You would set a TTL in the message, and configure dead lettering in that queue. That way, when the message expires it gets automatically published to another exchange.
You can read more here: https://www.rabbitmq.com/dlx.html

Related

system design - How to update cache only after persisted to database?

After watching this awesome talk by Martin Klepmann about how Kafka can be used to stream events so that we can get rid of 2-phase-commits, I have a couple of questions related to updating a cache only when the database is updated properly.
Problem Statement
Lets say you have a Redis cache which stores the user's profile pic and a Postgres database which is used for all the User related operations(creating, updation, deletion, etc)
I want to update my Redis cache only and only when a new user has been successfully added to my database.
How can I do that using Kafka ?
If I am to take the example given in the video then the workflow would follow something like this:
User registers
Request is handled by User Registration Micro service
User Registration Microservice inserts a new entry into the User's table.
Then generates an User Creation Event in the user_created topic.
Cache population microservice consumes the newly created User Creation Event
Cache population microservice updates the redis cache.
The problem starts what would happen if the User Registration Microservice crashed just after writing to the database, but failed to send the event to Kafka ?
What would be the correct way of handling this ?
Does the User Registration Microservice maintain the last event it published ? How can it reliably do that ? Does it write to a DB ? Then the problem starts all over again, what if it published the event to Kafka but failed before it could update its last known offset.
There are three broad approaches one can take for this:
There's the transactional outbox pattern, wherein, in the same transaction as inserting the new entry into the user table, a corresponding user creation event is inserted into an outbox table. Some process then eventually queries that outbox table, publishes the events in that table to Kafka, and deletes the events in the table. Since the inserts are in the same transaction, they either both occur or neither occurs; barring a bug in the process which publishes the outbox to Kafka, this guarantees that every user insert eventually has an associated event published (at least once) to Kafka.
There's a more event-sourcingish pattern, where you publish the user creation event to Kafka and then some consuming process inserts into the user table based on the event. Since this happens with a delay, this strongly suggests that the user registration service needs to keep state of which users it has published creation events for (with the combination of Kafka and Postgres being the source of truth for this). Since Kafka allows a message to be consumed by arbitrarily many consumers, a different consumer can then update Redis.
Change data capture (e.g. Debezium) can be used to tie into Postgres' write-ahead log (as Postgres actually event sources under the hood...) and publish an event that essentially says "this row was inserted into the user table" to Kafka. A consumer of that event can then translate that into a user created event.
CDC in some sense moves the transactional outbox into the infrastructure, at the cost of requiring that the context it inherently throws away be reconstructed later (which is not always possible).
That said, I'd strongly advise against having ____ creation be a microservice and I'd likewise strongly advise against a RInK store like Redis. Both of these smell like attempts to paper over architectural deficiencies by adding microservices and caches.
The one-foot-on-the-way-to-event-sourcing approach isn't one I'd recommend, but if one starts there, the requirement to make the registration service stateful suddenly opens up possibilities which may remove the need for Redis, limit the need for a Kafka-like thing, and allow you to treat the existence of a DB as an implementation detail.

Scaling message queues with lots of API calls

I have an application where some of my user's actions must be retrieved via a 3rd party api.
For example, let's say I have a user that can receive tons of phone calls. This phone call record should be update often because my user want's to see the call history, so I should do this "almost in real time". The way I managed to do this is to retrieve every 10 minutes the list of all my logged users and, for each user I enqueue a task that retrieves the call record list from the timestamp of the latest saved record to the current timestamp and saves all that to my database.
This doesn't seems to scale well because the more users I have, then, the more connected users I'll have and the more tasks i'll enqueue.
Is there any other approach to achieve this?
Seems straightforward with background queue of jobs. It is unlikely that all users use the system at the same rate so queue jobs based on their use. With fall back to daily.
You will likely at some point need more workers taking jobs from the queue and then multiple queues so if you had a thousand users the ones with a later queue slot are not waiting all the time.
It also depends how fast you need this updated and limit on api calls.
There will be some sort of limit. So suggest you start with committing to updated with 4h or 1h delay to always give some time and work on improving this to sustain level.
Make sure your users are seeing your data and cached api not live call api data incase it goes away.

Spring caching solution with time based data

I'm a 'caching beginner' and I was looking at Spring's alternatives to solve the following requirements:
I have some time based data that is inserted into the database every minute. Once the data is inserted it will never be modified or deleted. Also, data will never be inserted in any days prior to the current one (no insertions 'in the past').
Users frequently request past data between a starting date and the current one. I would like their requests to be fulfilled by a mixed cache/database solution.
E.G. If an user requests last week of data once a day every day, I would like to access the cache for the first 6 days and the database for the last one. The cache would then be updated and I would have the same behavior the day after.
Is there a way to configure/implement this in a clean way using any of Spring's caching alternatives?
Thank you.
EHCache support all of this and more and it integrates with Spring nicely.
[update] - If I am reading your question right, you need to configure timeToLive and timeToIdle on your cache. All of this is documented in the main configuration page.

Can DB2 tell a web-app when a table data is updated?

I have a table of non trivial size on a DB2 database that is updated X times a day per user input in another application. This table is also read by my web-app to display some info to another set of users. I have a large number of users on my web app and they need to do lots of fuzzy string lookups with data that is up-to-the-minute accurate. So, I need a server side cache to do my fuzzy logic on and to keep the DB from getting hammered.
So, what's the best option? I would hate to pull the entire table every minute when the data changes so rarely. I could setup a trigger to update a timestamp of a smaller table and poll that to see if I need refresh my cache, but that seems hacky to.
Ideally I would like to have DB2 tell my web-app when something changes, or at least provide a very lightweight mechanism to detect data level changes.
I think if your web application is running in WebSphere, setting up MQ would be a pretty good solution.
You could write triggers that use the MQ Series routines to add things to a queue, and your web app could subscribe to the queue and listen for updates.
If your web app is not in WebSphere then you could still look at this option but it might be more difficult.
A simple solution could be to have a timestamp (somewhere) for the latest change on to table.
The timestamp could be located in a small table/view that is updated by either the application that updates the big table or by an update-trigger on the big table.
The update-triggers only task would be to update the "help"-timestamp with currenttimestamp.
Then the webapp only checks this timestamp.
If the timestamp is newer then what the webapp has then the data is reread from the big table.
A "low-tech"-solution thats fairly non intrusive to the exsisting system.
Hope this solution fits your setup.
Regards
Sigersted
Having the database push a message to your webapp is certainly doable via a variety of mechanisms (like mqseries, etc). Similar and easier is to write a java stored procedure that gets kicked off by the trigger and hands the data to your cache-maintenance interface. But both of these solutions involve a lot of versioning dependencies, etc that could be a real PITA.
Another option might be to reconsider the entire approach. Is it possible that instead of maintaining a cache on your app's side you could perform your text searching on the original table?
But my suggestion is to do as you (and the other poster) mention - and just update a timestamp in a single-row table purposed to do this, then have your web-app poll that table. Similarly you could just push the changed rows to this small table - and have your cache-maintenance program pull from this table. Either of these is very simple to implement - and should be very reliable.

Compare and Contrast Change Data Capture and Database Change Notification

Oracle has two seemingly competing technologies. CDC and DCN.
What are the strengths of each?
When would you use one and not the other?
In general, you would use DCN to notify a client application that the client application needs to clear/ update the application's cache. You would use CDC for ETL processing.
DCN would generally be preferable when you have an OLTP application that needs to be notified immediately about data changes in the database. Since the goal here is to minimize the number of network round-trips and the number of database hits, you'd generally want the application to use DCN for queries which either are mostly static. If a large fraction of the query is changing regularly, you may be better off just refreshing the application's cache on a set frequency rather than running queries constantly to get the changed data (DCN does not contain the changed data, just the ROWID of the row(s) that changed). If the application goes down, I believe DCN allows changes to be lost.
CDC would generally be preferable when you have a DSS application that needs to periodically pull over all the data that changed in a number of tables. CDC can guarantee that the subscriber has received every change to the underlying table(s) which can be important if you are trying to replicate changes to a different database . CDC allows the subscriber to pull the changes at its convenience rather than trying to notify the subscriber that there are changes, so you'd definitely want CDC if you wanted the subscriber to process new data every hour or every day rather than in near real time. (note: DCN also has a guaranteed delivery mode, see comments below. --Mark Harrison)
CDC seems to be much more complex to set up than DCN.
I mean to setup DCN I wrap a select in a start and end DCN block and then write a procedure to be called with a collect of changes. That's it.
CDC requires publishers and subscribers and anyways, seems like more work.

Resources