Persistent subscriber in Firebase - Elasticsearch

Is there built-in support or any way of implementing a persistent subscription in Firebase?
I need to set up a backend which reacts to certain events in my Firebase database. If the backend has crashed or is being restarted I need it to catch up with anything that has happened while it was down.
For example, I want to re-index certain objects in ElasticSearch when they change. If the backend is down I need to re-index any changed objects when the backend comes back up again.

Nothing is built in for that, although you can definitely build it on top of Firebase by adding an isIndexed or isDirty property to the items.
But the more common approach is to push the items that need to be re-indexed into a queue and use a worker process that removes them from the queue once they've been handled. I highly recommend using firebase-queue for that.
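For illustration, here is a minimal sketch of that queue-based approach, assuming the firebase-admin SDK, the firebase-queue worker, and the v8 @elastic/elasticsearch client; the paths (items, queue/tasks) and the items index name are just placeholders:

```typescript
// Sketch only: queue-based re-indexing that survives backend downtime.
// Tasks pushed to queue/tasks stay there until the worker resolves them,
// so anything written while the worker was down is processed on restart.
import * as admin from "firebase-admin";
import { Client } from "@elastic/elasticsearch";
const Queue = require("firebase-queue"); // no bundled typings

admin.initializeApp(); // credentials via GOOGLE_APPLICATION_CREDENTIALS
const db = admin.database();
const es = new Client({ node: "http://localhost:9200" }); // assumed ES endpoint

// Producer side (wherever items are written): update the item and enqueue
// a re-index task in a single atomic multi-location update.
async function updateItem(id: string, item: object): Promise<void> {
  const taskId = db.ref("queue/tasks").push().key;
  await db.ref().update({
    [`items/${id}`]: item,
    [`queue/tasks/${taskId}`]: { id, item },
  });
}

// Worker side: firebase-queue claims tasks and removes them only on resolve().
new Queue(db.ref("queue"), (task: any, progress: any, resolve: any, reject: any) => {
  es.index({ index: "items", id: task.id, document: task.item })
    .then(() => resolve())
    .catch((err: Error) => reject(err));
});
```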

Related

What is the best way to share events between Google Cloud Run containers

I have a service which is running on many Cloud Run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from Elasticsearch.
I would have expected ES to have a "listening" type of connection, as Firebase does, but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross container sync when using cloud run? Would pub/sub be the best solution here?
It's unusual but not impossible to achieve.
First of all, you have to understand the instance lifecycle: the CPU is allocated only while a request is being processed. Otherwise, the CPU is throttled (below 5%). That is also why you pay only while your instance is processing requests, not while the instance is kept warm (and offloaded after a while).
That being said, it is useless and inefficient to try to update instances in the background when no request is being processed.
Therefore, the idea is to perform the sync when the instance receives a request. The downside is that this increases request latency (the instance first syncs its cache and then processes the request).
So the solution is to store, somewhere central, the date of the latest cache update, and to keep that same information in each instance. When an instance receives a request, it first compares its own cache date with the central date.
If they are the same, there is no problem; continue processing.
If the central date is newer than the instance's date, update the instance data first, then process the request.
You can store the data, and the date of that data, in Firestore, in Memorystore, or in any other database.
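As a rough sketch of that comparison (assuming a Firestore document meta/cacheVersion holding an updatedAt timestamp, and a hypothetical refreshFromElasticsearch() that rebuilds the local cache):

```typescript
// Sketch only: compare the instance's cache version with a central Firestore timestamp
// at the start of each request, and refresh only when the central copy is newer.
import { Firestore, Timestamp } from "@google-cloud/firestore";

const firestore = new Firestore();
let localCacheUpdatedAt: Timestamp | null = null;

async function ensureFreshCache(): Promise<void> {
  const snap = await firestore.doc("meta/cacheVersion").get();
  const centralUpdatedAt = snap.get("updatedAt") as Timestamp | undefined;
  if (!centralUpdatedAt) return; // nothing to compare against yet

  // Refresh only when the central version is newer than what this instance holds.
  if (!localCacheUpdatedAt || centralUpdatedAt.toMillis() > localCacheUpdatedAt.toMillis()) {
    await refreshFromElasticsearch();       // hypothetical reload of the local cache
    localCacheUpdatedAt = centralUpdatedAt; // remember the version we now hold
  }
}

// Call ensureFreshCache() at the start of every request handler, and have the writer
// update meta/cacheVersion.updatedAt after each Elasticsearch change.
async function refreshFromElasticsearch(): Promise<void> {
  /* query Elasticsearch and rebuild the in-memory cache here */
}
```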
Pub/Sub can also be a solution, but it is more complex to implement. Each instance has to create a pull subscription on a topic when it starts, and you have to delete that subscription when the instance is killed.
Then, when a request comes in, the instance pulls the subscription, fetches any pending messages, and updates its local cache.
This could be faster than the previous solution, but it is harder to implement.
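A sketch of that pull-subscription variant, assuming a pre-existing topic named cache-invalidation and an INSTANCE_ID environment variable that uniquely names the instance (both are illustrative):

```typescript
// Sketch only: one pull subscription per instance, created at startup and deleted
// on shutdown, as described above. Topic and env var names are illustrative.
import { PubSub, v1 } from "@google-cloud/pubsub";

const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
const subscriptionName = `cache-invalidation-${process.env.INSTANCE_ID}`;
const pubsub = new PubSub({ projectId });
const subscriber = new v1.SubscriberClient();

// Startup: create a subscription dedicated to this instance.
export async function createInstanceSubscription(): Promise<void> {
  await pubsub.topic("cache-invalidation").createSubscription(subscriptionName);
}

// Per request: synchronously pull pending invalidation messages and ack them.
// Note: the pull call may wait briefly when no messages are pending.
export async function pullInvalidations(): Promise<string[]> {
  const subscription = subscriber.subscriptionPath(projectId, subscriptionName);
  const [response] = await subscriber.pull({ subscription, maxMessages: 10 });
  const received = response.receivedMessages ?? [];
  if (received.length > 0) {
    await subscriber.acknowledge({
      subscription,
      ackIds: received.map((m) => m.ackId!),
    });
  }
  return received.map((m) => m.message?.data?.toString() ?? "");
}

// Shutdown (SIGTERM): remove this instance's subscription.
export async function deleteInstanceSubscription(): Promise<void> {
  await pubsub.subscription(subscriptionName).delete();
}
```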

How to migrate to Event-Sourcing?

We are migrating from a legacy monolith application to a microservice architecture. We use the CQRS and event sourcing patterns and a message broker (RabbitMQ) as the communication mechanism. Now we are facing a challenge: how can we convert the old database to the new architecture, and how can we use event sourcing for it? Assuming the old database did not have events, can we do the data conversion without creating events? What is the starting point of our old database's data in the event sourcing pattern?
One important thing to remember is that many databases internally event source: every write goes to a log and that log is used to update tables, replicate etc., after which the log is truncated. It's equivalent to event sourcing with a lot of snapshots and very little retention of events and old snapshots.
In these databases (which include the likes of Postgres, MySQL, Oracle, SQL Server, Cassandra, CosmosDB, to name ones I know from experience do this), there's a technique called Change Data Capture which essentially taps into the log and exposes a stream of changes to the database which can be treated as events from the database (or by extension as commands: "one service's events are another service's commands"). Debezium can be used to write CDC records to Kafka; for RabbitMQ you may need to roll something yourself, in which case you'll want to get acquainted with how CDC is exposed in your database.
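For the RabbitMQ case, a hand-rolled relay could look roughly like the sketch below, assuming Debezium's JSON envelope, a Kafka topic named dbserver1.public.orders, and a RabbitMQ exchange named cdc.orders (all names are illustrative):

```typescript
// Sketch only: relay Debezium CDC records from Kafka into RabbitMQ.
import { Kafka } from "kafkajs";
import * as amqp from "amqplib";

async function run(): Promise<void> {
  const kafka = new Kafka({ clientId: "cdc-relay", brokers: ["localhost:9092"] });
  const consumer = kafka.consumer({ groupId: "cdc-relay" });
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();
  await channel.assertExchange("cdc.orders", "topic", { durable: true });

  await consumer.connect();
  await consumer.subscribe({ topic: "dbserver1.public.orders", fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      // Debezium envelopes carry "op" (c/u/d) plus "before"/"after" images of the row.
      const change = JSON.parse(message.value.toString());
      const routingKey = `orders.${change.payload?.op ?? "unknown"}`;
      channel.publish("cdc.orders", routingKey, Buffer.from(message.value));
    },
  });
}

run().catch(console.error);
```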
Even if the database doesn't support CDC, if the data isn't that large, you can often turn it into an ersatz event stream by periodically dumping its data (if the records are timestamped, this can even work if the data is particularly slow moving) and implementing a service to track what changed: this won't tell you about changes that netted out, but it's often better than nothing. This sort of dump is also likely to be required if you need a "genesis" event to ensure that your initial state is current to when you moved to event-sourcing or CDC.
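A minimal sketch of that snapshot-diffing idea, assuming each dumped record carries an id primary key (the event shapes are illustrative, and changes that netted out between dumps are still missed):

```typescript
// Sketch only: turn periodic dumps into an ersatz change stream by diffing
// consecutive snapshots keyed by primary key.
type Row = { id: string; [key: string]: unknown };
type ChangeEvent =
  | { type: "created"; row: Row }
  | { type: "updated"; before: Row; after: Row }
  | { type: "deleted"; row: Row };

function diffSnapshots(previous: Row[], current: Row[]): ChangeEvent[] {
  const prevById = new Map(previous.map((r) => [r.id, r]));
  const currById = new Map(current.map((r) => [r.id, r]));
  const events: ChangeEvent[] = [];

  for (const row of current) {
    const before = prevById.get(row.id);
    if (!before) {
      events.push({ type: "created", row });
    } else if (JSON.stringify(before) !== JSON.stringify(row)) {
      events.push({ type: "updated", before, after: row });
    }
  }
  for (const row of previous) {
    if (!currById.has(row.id)) events.push({ type: "deleted", row });
  }
  return events;
}
```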
This whole broad family of techniques has limitations compared to full event sourcing: reifying what changed is not as valuable as reifying what changed and why it changed. But it can be a useful middle ground in migrating to event-sourcing.
Referring to @alexey-zimarev's answer on this post: it is essential to have a starting event in your event-sourced database. You cannot rehydrate an event-sourced aggregate without replaying its events. Therefore, you need to map the legacy snapshot to an initial domain event of the relevant aggregate.
Either way, consider the definition of event sourcing by Martin Fowler:
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.
So it is not an appropriate solution to migrate legacy snapshots into the new system without extracting and storing domain events. That would turn your event-sourced project into a semi-event-sourced project, which is not a recognized design paradigm.
You have an event store, which is a database for events. You can create the event data you need from the old database and insert it into the event store. After that, replay the events to create the read models.
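A rough sketch of that migration, with an assumed EventStore interface and an illustrative OrderMigratedFromLegacy "genesis" event per legacy row:

```typescript
// Sketch only: seed an event store from legacy rows, then rebuild read models by replaying.
interface DomainEvent {
  aggregateId: string;
  type: string;
  version: number;
  payload: unknown;
  occurredAt: string;
}

interface EventStore {
  append(event: DomainEvent): Promise<void>;
  readAll(): AsyncIterable<DomainEvent>;
}

// 1. Migration: one "genesis" event per legacy row captures the snapshot.
async function migrateLegacyOrders(legacyRows: any[], store: EventStore): Promise<void> {
  for (const row of legacyRows) {
    await store.append({
      aggregateId: row.order_id,
      type: "OrderMigratedFromLegacy",
      version: 1,
      payload: row, // the legacy snapshot becomes the event payload
      occurredAt: new Date().toISOString(),
    });
  }
}

// 2. Replay: project every event (genesis and later ones) into the read model.
async function rebuildReadModel(store: EventStore, project: (e: DomainEvent) => Promise<void>) {
  for await (const event of store.readAll()) {
    await project(event);
  }
}
```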

Should we store Events in a database? (Event Driven Design)

We have several services that publish and subscribe to domain events. What we usually do is log events whenever we publish them and whenever we process them. We basically use this to apply the choreography pattern.
We are not doing event sourcing in these systems, and there is no programmatic use for the events after publishing/processing. That is the main reason we opted not to store them in a durable container, such as a database or an event store.
Question is, are we missing some fundamental thing by doing this?
Is storing Events a must?
I consider queued messages as system messages, even if they represent some domain event in an event-driven architecture (pub/sub messaging).
There is absolutely no hard-and-fast rule about their storage. If you would like to keep them around you could have your messaging mechanism forward them to some auditing endpoint for storage and then remove them after some time (if necessary).
You are not missing anything fundamental by not storing them.
You're definitely not missing out on anything (but there is a catch), especially if the business doesn't need it. An event-sourced system would definitely store all the events generated by the system in a database (or any other event store).
The main use of an event store is to be able to restore the system to its current state after a failure by replaying the messages. To make this recovery process faster, we have snapshots.
In your case, since these events are only relevant until the process is completed, it would not make sense to store them, until you have a failure (this is the catch), especially in a distributed transaction scenario.
What I would suggest:
Don't store the events themselves, but log the relevant details about them, and maybe use an ELK stack or Grafana to store those logs.
Use either the Saga pattern or the Routing Slip pattern in the case of a distributed transaction, and log those as well.
If a failure occurs while processing an event, put that event into an exception queue and handle it there. If it is part of a distributed transaction, make sure the events either all share the same TransactionId or carry a CorrelationId so you can look up the logs and recover your system.
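A small sketch of the exception-queue and correlation-id suggestion, assuming RabbitMQ via amqplib and illustrative queue names (events and events.exception); handleEvent() stands in for your actual processing:

```typescript
// Sketch only: consume events, log them with a correlation id, and dead-letter
// failures to an "events.exception" queue for later handling.
import * as amqp from "amqplib";

async function startConsumer(handleEvent: (e: any) => Promise<void>): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("events", { durable: true });
  await ch.assertQueue("events.exception", { durable: true });

  await ch.consume("events", async (msg) => {
    if (!msg) return;
    const event = JSON.parse(msg.content.toString());
    const correlationId = msg.properties.correlationId ?? event.correlationId;
    console.log(JSON.stringify({ level: "info", msg: "event received", type: event.type, correlationId }));
    try {
      await handleEvent(event);
      ch.ack(msg);
    } catch (err) {
      // Keep the correlation id so the failure can be traced across services.
      console.error(JSON.stringify({ level: "error", msg: "event failed", correlationId, error: String(err) }));
      ch.sendToQueue("events.exception", msg.content, { correlationId });
      ch.ack(msg); // acknowledge the original so it is not redelivered endlessly
    }
  });
}
```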
For reliably performing your business transactions in a distributed architecture, you somehow need to make sure that your events are published at least once.
So a service that publishes events needs to persist such an event within the same transaction that causes it to get created.
Considering you are publishing an event via infrastructure services (e.g. a messaging service), you cannot rely on it being available all the time.
Also, your own service instance could go down after persisting your newly created or changed aggregate but before it had the chance to publish the event via, for instance, a messaging service.
Question is, are we missing some fundamental thing by doing this? Is storing Events a must?
It doesn't matter that you are not doing event sourcing. Unless it is okay from the business perspective to sometimes lose an event forever, you need to temporarily persist your event with your local transaction until it has been published.
You can look into the Transactional Outbox Pattern to achieve reliable event publishing.
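A minimal sketch of the Transactional Outbox Pattern, assuming PostgreSQL via node-postgres and illustrative orders and outbox tables; the relay's publish() callback stands in for your messaging service:

```typescript
// Sketch only: write the aggregate change and the event in the same transaction,
// then have a relay publish unsent outbox rows (at-least-once delivery).
import { Pool } from "pg";

const pool = new Pool(); // connection settings via PG* environment variables

// 1. Business transaction: order row and outbox row commit or roll back together.
async function placeOrder(order: { id: string; total: number }): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("INSERT INTO orders (id, total) VALUES ($1, $2)", [order.id, order.total]);
    await client.query(
      // gen_random_uuid() assumes PostgreSQL 13+ (or the pgcrypto extension)
      "INSERT INTO outbox (id, type, payload) VALUES (gen_random_uuid(), $1, $2)",
      ["OrderPlaced", JSON.stringify(order)]
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// 2. Relay loop: publish unsent rows, then mark them as published.
async function relayOutbox(publish: (type: string, payload: unknown) => Promise<void>): Promise<void> {
  const { rows } = await pool.query(
    "SELECT id, type, payload FROM outbox WHERE published_at IS NULL ORDER BY created_at LIMIT 100"
  );
  for (const row of rows) {
    await publish(row.type, row.payload); // at-least-once: may re-publish after a crash
    await pool.query("UPDATE outbox SET published_at = now() WHERE id = $1", [row.id]);
  }
}
```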
Note: Logging/tracking your events somehow for monitoring or later analyzing/reporting purpose is a different thing and has another motivation.

Sharing events between two Laravel applications

Is it possible to have one Laravel application listen for events triggered in another?
I've built a REST API to complement an existing web app. It uses the same database but I've built it as a separate application and there are certain events which clear some cached results. At the moment the events are not being shared between the two applications so I'm getting the cached results in spite of having updated the database. Is there a way for one app to pick up on events fired by the other? I haven't found anything about this in the docs.
Redis is completely agnostic about which application is listening to it. You can set your broadcast driver to redis and fire your events in one application while listening in the other, as long as they both use the same Redis instance. Note, however, that Laravel binds listeners to a specific event class, so you still have to make sure that class exists in the listening application so you can define a listener for it.

Event Aggregator Error Handling With Rollback

I've been studying a lot of the common ways that developers design/architect an application on domain driven design (Still trying to understand the concept as a whole). Some of the examples that I saw included the use of events via an event aggregator. I liked the concept because it truly keeps the different elements/domains of an application decoupled.
A concern that I have is: how do you rollback an operation in the case of an error?
For example:
Say I have an order application that has to save an order to the database and also save a copy of the order as a PDF to a CMS. The application fires an event that a new order has been created, and the PDF service that subscribes to this event saves the PDF. Meanwhile, an exception is thrown while committing the order changes to the database. The problem is that the PDF has been saved but there isn't a matching database record.
Should I cache the previously handled events and fire a new error event that looks to the cache for "undo" operations? Use something like the command pattern for this?
Or... is the event aggregator not a good pattern for this.
Edit
I'm starting to think that maybe events should be used for less "mission critical" items, such as emailing and logging.
My initial thought was to limit dependencies by using the event aggregator pattern.
You want the event to be committed in the same transaction as the operation on your database.
In this particular scenario, you can push the event onto a queue that enlists in your transaction, so that the event will never go out unless the aggregate is persisted. This makes creating the PDF eventually consistent; if creating the PDF fails, you can fix the problem and have it automatically retried.
Maybe you can get more inspiration from one of my previous posts on eventually consistent domain events with RavenDB and IronMQ.
Handling an event before it has actually happened (committed) only works if the event handler participates in the transaction. Make the event handler transactional (for instance by storing the PDF in a database), or publish and handle events after the transaction has committed.
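A small sketch of that publish-after-commit option, with an illustrative unit of work that buffers domain events and dispatches them only once persistence has succeeded:

```typescript
// Sketch only: defer event handling until the transaction has committed, so the PDF
// is never generated for an order that was rolled back. Names and types are assumptions.
interface DomainEvent {
  type: string;
  payload: unknown;
}

class UnitOfWork {
  private pendingEvents: DomainEvent[] = [];

  raise(event: DomainEvent): void {
    this.pendingEvents.push(event); // buffered, not dispatched yet
  }

  async commit(
    persist: () => Promise<void>,
    dispatch: (e: DomainEvent) => Promise<void>
  ): Promise<void> {
    await persist(); // e.g. save the order; throws on failure, leaving events undispatched
    const events = this.pendingEvents;
    this.pendingEvents = [];
    for (const event of events) {
      await dispatch(event); // only runs after a successful commit
    }
  }
}

// Usage: the handler that renders the PDF for "OrderCreated" never sees events
// from a commit that failed.
```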
