Microservice - persisting to RDBMS & queue within a transaction

I have a REST service - all its requests are persisted to its own relational database. So far, so good. But there is also a small piece of business functionality (email notification, SMS alert) that should run on the newly received/updated data. For this process to work on the data in the background, it needs some way to learn about the persisted data - a message queue would solve that. Three common ways I see of designing this:
The REST service inserts into the database and also publishes to the queue.
The problem here is the distributed transaction - combining two different resource types (the relational database and the queue) within one transaction. Some tools may support this, some may not.
The REST service persists only to its database, as usual. Additionally, it inserts the data into another table, which a scheduled job queries and publishes to the queue (from which the background job picks up its work).
The problem I see is the scheduler - it is not reactive, it is batch processing, limited by its time slot, not real-time, slow, and so on.
The REST endpoint publishes the data directly to a topic. One consumer persists it to the database, while another processes it in the background.
Something like event sourcing. To my understanding, it is a bit complex to implement as the number of services grows. Also, if the database is down, the persisting service would fail to save the data, yet the background service (say, the emailer) would still send the email, which is functionally wrong. This may lead to inconsistency among the services, in data as well as in behaviour.
I have also thought of reading the database transaction logs, but that seems more complex, requires tools and configuration to make it work, and looks better suited to data processing systems than to our use case.
What are your thoughts on this - did I miss anything? How do you manage such scenarios? What should I look out for? Thinking reactive, say with Vert.x?
Apologies if this looks very naive, but I have to ask.

I think the best approach is 2, with a CDC (change data capture) system like Debezium.
See https://microservices.io/patterns/data/transactional-outbox.html (the Transactional Outbox pattern).
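In case it helps, here is a minimal sketch of the write side of that outbox pattern, assuming plain JDBC and hypothetical orders/outbox tables; a CDC tool like Debezium (or a polling publisher) would relay the outbox rows to the queue afterwards.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

// Sketch of the transactional-outbox write path: the business row and the
// outbox row are written in the SAME local database transaction, so either
// both exist or neither does. Table and column names are illustrative.
public class OutboxWriter {

    public void saveOrder(String orderJson) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/app", "app", "secret")) {
            con.setAutoCommit(false);
            try {
                String orderId = UUID.randomUUID().toString();

                // 1) Business write
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO orders (id, payload) VALUES (?, ?)")) {
                    ps.setString(1, orderId);
                    ps.setString(2, orderJson);
                    ps.executeUpdate();
                }

                // 2) Outbox write in the same transaction
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO outbox (id, aggregate_id, event_type, payload) VALUES (?, ?, ?, ?)")) {
                    ps.setString(1, UUID.randomUUID().toString());
                    ps.setString(2, orderId);
                    ps.setString(3, "OrderCreated");
                    ps.setString(4, orderJson);
                    ps.executeUpdate();
                }

                con.commit(); // both rows become visible atomically
            } catch (Exception e) {
                con.rollback();
                throw e;
            }
        }
    }
}
```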

I usually recommend option 3 if you don't need immediate read-after-write consistency. The background job should retry if the database record has not yet been updated by the time it processes the message.
Your post exemplifies why queues shouldn't be used for these types of scenarios. They are good for delivering analytical data or logs, but for task orchestration developers have to reinvent the wheel every time.
The much better approach is to use a task orchestration system like Cadence Workflow, which eliminates the issues you described and makes multi-service orchestration much simpler.
See this presentation that explains the Cadence programming model.
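To give a rough idea of that programming model, here is a hedged sketch using the Cadence Java client; the interface names, activity names, and timeout are illustrative, and the activity implementations and worker registration are omitted. Exact option names may vary between client versions.

```java
import com.uber.cadence.activity.ActivityMethod;
import com.uber.cadence.activity.ActivityOptions;
import com.uber.cadence.workflow.Workflow;
import com.uber.cadence.workflow.WorkflowMethod;

import java.time.Duration;

// Each step is an activity; Cadence persists the workflow's progress, so a crash
// after persistRequest() resumes at sendEmail() instead of losing the work, and
// failed activities are retried according to the configured options.

interface NotificationActivities {
    @ActivityMethod
    void persistRequest(String requestJson);   // writes to the relational DB

    @ActivityMethod
    void sendEmail(String requestJson);

    @ActivityMethod
    void sendSms(String requestJson);
}

public interface RequestWorkflow {
    @WorkflowMethod
    void handleRequest(String requestJson);
}

class RequestWorkflowImpl implements RequestWorkflow {

    private final NotificationActivities activities = Workflow.newActivityStub(
            NotificationActivities.class,
            new ActivityOptions.Builder()
                    .setScheduleToCloseTimeout(Duration.ofMinutes(5))
                    .build());

    @Override
    public void handleRequest(String requestJson) {
        activities.persistRequest(requestJson);
        activities.sendEmail(requestJson);
        activities.sendSms(requestJson);
    }
}
```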

Related

How to rehydrate state stored aggregates in Axon Framework

I have state-stored aggregates in a PostgreSQL database. I'm testing replay by deleting the state-stored tables and the token_entry table and restarting the application. All events are replayed and the aggregates are restored in memory. However, my state-stored tables stay empty. I was expecting that they would also be restored?
I'm using Spring Boot and the latest Axon. The code, at this moment, is as simple as it can get.
Whenever you're using Axon's State-Stored Aggregates, they'll only be stored as-is. Hence, throwing away the stored instances and starting the application will not trigger a replay.
When removing TrackingTokens, or initiating a replay by invoking StreamingEventProcessor#resetTokens (this is the recommended approach, by the way), you're effectively telling the Event Processors of your application to start event streaming from scratch.
This part of Axon Framework supports the so-called Query Side of CQRS. The Aggregate support in Axon Framework is specifically for the Command Side of an application.
Long story short, State-Stored Aggregates don't have replay support. If you want your Aggregates (Command Models) to be replayed, you will have to use Event Sourcing Aggregates instead.
I hope the above clears up a little of the misconception between Tokens and Aggregates from your question. And by the way, if you feel Axon's Reference Guide should be adjusted to clarify your situation, you're always free to file an issue.
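For reference, a minimal sketch of that recommended reset flow, assuming Axon 4.5+ and a hypothetical processing group named "my-projection"; exact class locations can differ slightly between Axon versions (older versions expose the same methods on TrackingEventProcessor).

```java
import org.axonframework.config.Configuration;
import org.axonframework.eventhandling.StreamingEventProcessor;

// Rewinds the tracking token of one event processor so it streams the event
// store from scratch, rebuilding the projection tables it owns.
public class ProjectionReplayer {

    public void replayMyProjection(Configuration configuration) {
        configuration.eventProcessingConfiguration()
                .eventProcessor("my-projection", StreamingEventProcessor.class)
                .ifPresent(processor -> {
                    processor.shutDown();    // processor must not be running
                    processor.resetTokens(); // rewind the tracking token
                    processor.start();       // stream events from the beginning
                });
    }
}
```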

How to migrate to Event-Sourcing?

We are migrating from a legacy monolith application to a microservice architecture. We use CQRS, the event sourcing pattern, and a message broker (RabbitMQ) as the communication mechanism. Now we are facing a challenge: how can we convert the old database to the new architecture, and how can we use event sourcing for it? Assuming the old database did not have events, can we do the data conversion without creating events? What is the starting point of our old database data in the event sourcing pattern?
One important thing to remember is that many databases internally event source: every write goes to a log and that log is used to update tables, replicate etc., after which the log is truncated. It's equivalent to event sourcing with a lot of snapshots and very little retention of events and old snapshots.
In these databases (which include the likes of Postgres, MySQL, Oracle, SQL Server, Cassandra, CosmosDB, to name ones I know from experience do this), there's a technique called Change Data Capture which essentially taps into the log and exposes a stream of changes to the database which can be treated as events from the database (or by extension as commands: "one service's events are another service's commands"). Debezium can be used to write CDC records to Kafka; for RabbitMQ you may need to roll something yourself, in which case you'll want to get acquainted with how CDC is exposed in your database.
Even if the database doesn't support CDC, if the data isn't that large, you can often turn it into an ersatz event stream by periodically dumping its data (if the records are timestamped, this can even work if the data is particularly slow moving) and implementing a service to track what changed: this won't tell you about changes that netted out, but it's often better than nothing. This sort of dump is also likely to be required if you need a "genesis" event to ensure that your initial state is current to when you moved to event-sourcing or CDC.
This whole broad family of techniques has limitations compared to full event sourcing: reifying what changed is not as valuable as reifying what changed and why it changed. But it can be a useful middle ground in migrating to event-sourcing.
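As a rough illustration of consuming such a change stream, here is a hedged sketch using the Kafka Java consumer; the topic name follows Debezium's <server>.<schema>.<table> convention and is purely illustrative, and the mapping from change record to domain event is left as a stub.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

// Reads Debezium change records from Kafka and treats them as events feeding
// the new system. Offsets are committed only after handling, so a crash causes
// re-delivery rather than data loss.
public class OrdersChangeConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-cdc-projector");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("dbserver1.public.orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // record.value() is a JSON change event (before/after images, op type);
                    // map it to a domain event and hand it to the new system here.
                    System.out.printf("change for key %s: %s%n", record.key(), record.value());
                }
                consumer.commitSync();
            }
        }
    }
}
```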
Referring to #alexey-zimarev's answer on this post, it's essential to have a starting event in your event-sourced database. You cannot rehydrate an event-sourced aggregate without replaying its events. Therefore, you need to map the legacy snapshot to an individual domain event for the relevant aggregate.
Either way, considering the event sourcing definition by Martin Fowler:
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.
So it's not an appropriate solution to migrate legacy snapshots into the new system without extracting and storing domain events. It would turn your event-sourced project into a semi-event-sourced project, which is not a recognized paradigm for design and development.
You have an event store, which is a database for events. You can create the event data you need from the old database and insert it into the event store. After that, replay the events to create the read models.
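A minimal sketch of that one-off conversion might look like the following; the EventStoreClient interface and CustomerMigrated event are hypothetical placeholders for whatever event store API and event types you actually use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Reads each legacy row and appends one "genesis" event per aggregate to the
// event store; read models are then rebuilt by replaying those events.
public class LegacyToEventStoreMigration {

    interface EventStoreClient {
        void append(String streamId, Object event);
    }

    record CustomerMigrated(String customerId, String name, String email) {}

    public void migrate(EventStoreClient eventStore) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/legacy", "app", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name, email FROM customers")) {
            while (rs.next()) {
                CustomerMigrated genesis = new CustomerMigrated(
                        rs.getString("id"), rs.getString("name"), rs.getString("email"));
                // One starting event per aggregate stream.
                eventStore.append("customer-" + genesis.customerId(), genesis);
            }
        }
    }
}
```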

Microservice failure Scenario

I am working on a microservice architecture. One of my services is exposed to a source system, which uses it to post data. This microservice publishes the data to Redis. I am using Redis pub/sub, which is in turn consumed by a couple of microservices.
Now, if one of those consuming microservices is down and unable to process the data from Redis pub/sub, I have to retry with the published data when the microservice comes back up. The source cannot push the data again, and manual intervention is not possible, so I thought of 3 approaches.
Additionally use Redis itself for storing and retrieving the data.
Use a database for storing the data before publishing. I have many source and target microservices which use Redis pub/sub. If I use this approach, every time I have to insert the request into the DB first, and then its response status. I would also have to use a shared database; this approach adds a couple more exception-handling cases and doesn't look very efficient to me.
Use Kafka in place of Redis pub/sub. As traffic is low, I chose Redis pub/sub, and it is not feasible to change.
In both of the above cases, I have to use a scheduler, and there is a duration within which I have to retry, otherwise subsequent requests will fail.
Is there any other way to handle the above cases?
For point 2:
- Store the data in a DB.
- Create a daemon process which will process the data from the table.
- This daemon process can be configured as per your needs.
- The daemon process will poll the DB and publish the data, if any. It will also delete the data once it has been published.
Not in a microservice architecture, but I have seen this approach work efficiently when communicating with 3rd-party services.
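For illustration, a hedged sketch of such a daemon, assuming JDBC, the Jedis client, and a hypothetical pending_messages table; rows are deleted only after a successful publish, so anything unpublished is retried on the next poll.

```java
import redis.clients.jedis.Jedis;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Polls the table on a fixed delay, publishes each row to its Redis channel,
// and deletes the row only after the publish succeeds.
public class RedisRepublishDaemon {

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(RedisRepublishDaemon::pollAndPublish, 0, 5, TimeUnit.SECONDS);
    }

    static void pollAndPublish() {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/app", "app", "secret");
             Jedis jedis = new Jedis("localhost", 6379);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT id, channel, payload FROM pending_messages ORDER BY id")) {

            while (rs.next()) {
                long id = rs.getLong("id");
                jedis.publish(rs.getString("channel"), rs.getString("payload"));

                try (PreparedStatement del = con.prepareStatement(
                        "DELETE FROM pending_messages WHERE id = ?")) {
                    del.setLong(1, id);
                    del.executeUpdate();
                }
            }
        } catch (Exception e) {
            e.printStackTrace(); // leave rows in place; the next run retries them
        }
    }
}
```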
At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want a handshake from the consuming service both after pushing and after processing. To accomplish that, using a middleware queuing system would be the right approach.
Although a bit more complex to accomplish, what you can do is use Kafka for streaming this. Configuring producers and consumer groups properly can help you do the job smoothly.
Using a DB for storage would be overkill, considering that in your situation the data is only "to be processed and to be persisted".
BUT, alternatively, storing the data in Redis and reading it in a cron/scheduled job would make your job much simpler. Once the job runs successfully, you may remove the data from the cache and thus save Redis memory.
If you can comment further on the architecture and the implementation, I can update my answer accordingly. :)
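For the Redis-plus-scheduled-job alternative, a rough sketch assuming the Jedis client and a hypothetical pending:emails list that the publisher RPUSHes into alongside the pub/sub publish; entries are popped only after successful processing, so a failure leaves them for the next run.

```java
import redis.clients.jedis.Jedis;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Drains a Redis list on a schedule: peek at the head, process it, and only
// then pop it, so unprocessed entries survive until the next scheduled run.
public class RedisBacklogDrainer {

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(RedisBacklogDrainer::drain, 0, 10, TimeUnit.SECONDS);
    }

    static void drain() {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String payload;
            while ((payload = jedis.lindex("pending:emails", 0)) != null) {
                process(payload);             // e.g. send the email / call the target service
                jedis.lpop("pending:emails"); // remove only after successful processing
            }
        } catch (Exception e) {
            e.printStackTrace(); // remaining entries stay in the list for the next run
        }
    }

    static void process(String payload) {
        System.out.println("processing " + payload);
    }
}
```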

Notifying golongpoll.SubscriptionManager of an event from kafka-go

I was writing a POC on long polling using Go.
The generally used package for this seems to be https://github.com/jcuga/golongpoll .
But suppose I want to publish an event to the golongpoll.SubscriptionManager from a general context, especially when there is a possibility that the long-poll API request is being served by one machine, while the Kafka event for that particular consumer group is consumed by another instance in the cluster.
The examples given in the documentation do not cover such a scenario at all, even though it seems like a common one. One way I can think of is to have a distributed cache like Redis in between and have all the services poll it for changes? But that sounds a bit dumb to me.

Micro-services architecture, need advice

We are working on a system that is supposed to 'run' jobs on distributed systems.
When jobs are accepted they need to go through a pipeline before they can be executed on the end system.
We've decided to go with a micro-services architecture, but there is one thing that bothers me, and I'm not sure what the best practice would be.
When a job is accepted it will first be persisted into a database; then each micro-service in the pipeline will do some additional work to prepare the job for execution.
I want the persisted data to be updated at each such station in the pipeline to reflect the actual state of the job, or its status in the pipeline.
In addition, while a job is being executed on the end system - its status should also get updated.
What would be the best practice, in the sense of updating the database (the job's status) at each station:
Each such station (micro-service) in the pipeline accesses the database directly and updates the job's status
There is another micro-service that exposes the data (REST) and serves as DAL, each micro-service in the pipeline updates the job's status through this service
Other?....
Help/advice would be highly appreciated.
Thanks a lot!!
To add to what was said by #Anunay and #Mohamed Abdul Jawad
I'd consider writing the state from the units of work in your pipeline to a view (a table or cache, insert-only). You can use messaging or simply insert a row into that view and have the readers of the state pick up the correct state based on some logic (a date, a state, or a composite key). As this view is not really owned by any domain service, it can be made available to any readers (read-only) to consume; a minimal sketch follows below.
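Here is a minimal sketch of that insert-only view, assuming JDBC and a hypothetical job_status_view table; each station appends a row and readers pick the latest row per job, so no service ever updates another service's data in place.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

// Append-only status view: stations insert status rows, readers query the
// most recent row for a job. Table and column names are illustrative.
public class JobStatusView {

    private final Connection con;

    public JobStatusView(Connection con) {
        this.con = con;
    }

    public void recordStatus(String jobId, String station, String status) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO job_status_view (job_id, station, status, recorded_at) VALUES (?, ?, ?, ?)")) {
            ps.setString(1, jobId);
            ps.setString(2, station);
            ps.setString(3, status);
            ps.setTimestamp(4, Timestamp.from(Instant.now()));
            ps.executeUpdate();
        }
    }

    public String latestStatus(String jobId) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT status FROM job_status_view WHERE job_id = ? ORDER BY recorded_at DESC LIMIT 1")) {
            ps.setString(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("status") : null;
            }
        }
    }
}
```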
Consider also SAGA Pattern
A Saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request corresponding to the system operation, and then each subsequent step is triggered by the completion of the previous one.
http://microservices.io/patterns/data/saga.html
https://dzone.com/articles/saga-pattern-how-to-implement-business-transaction
https://medium.com/@tomasz_96685/saga-pattern-and-microservices-architecture-d4b46071afcf
If you would like to code the workflow:
Microservice A, which accepts the Job and the commands to update the job
Microservice B, which provides the read model for the Job
Based on JobCreatedEvents, use some messaging queue to process and update the job through the queue pipelines, and keep updating the JobStatus at every node in the pipeline.
I am assuming you know about queues and consumers.
I am myself new to Camunda (a workflow engine); it might be usable here, but I am not completely sure.
Accessing a shared database between microservices is highly discouraged, as this violates a basic rule of microservices architecture.
A microservice must be autonomous and keep its own logic and data.
Also, to achieve a good microservice design, you should loosely couple your microservices.
Multiple microservices accessing the database is not recommended. Here you have a case where each of the services needs to be triggered, then it updates the data and then somehow calls the next service.
You really need a mechanism to orchestrate the services. A workflow engine might fit the bill.
I would, however, suggest an event-driven system. I might be overreaching, with only limited knowledge of the data that you have. Have one service that gives you basic CRUD on the data and other services that have the logic to change the data (at this point I would like to ask why you want different services to change the state; if it's a business requirement, that's fine). Once the data is written, just create an event to which the other services can subscribe and react.
This will allow you to easily add more states to your pipeline in future.
You will need a service to manage the event queue.
As far as recording the state is concerned, it can be done easily by logging the events.
If you opt for the workflow route, you may use Amazon SWF or Camunda; really, there are quite a few options out there.
If you go for the event route, you need to look into event-driven systems in microservices.

Resources