Microservice failure scenario - Spring Boot

I am working on a microservice architecture. One of my services is exposed to a source system, which posts data to it. This microservice publishes the data to Redis using Redis pub/sub, and the messages are then consumed by a couple of other microservices.
Now, if a consuming microservice is down and unable to process the data from Redis pub/sub, I have to retry with the published data once that microservice comes back up. The source cannot push the data again, and manual intervention is not possible, so I thought of three approaches:
1. Additionally use Redis itself to store the data and retrieve it for retries.
2. Store the data in a database before publishing. I have many source and target microservices that use Redis pub/sub, so with this approach I would have to insert every request into the DB first and then record its response status. This would also require a shared database; the approach itself adds several more exception-handling cases and does not look very efficient to me.
3. Use Kafka in place of Redis pub/sub. Since traffic is low I chose Redis pub/sub, and it is not feasible to change now.
In the first two cases I have to use a scheduler, and there is a time window within which I have to retry, otherwise subsequent requests will fail.
Is there any other way to handle these cases?

For point 2:
- Store the data in the DB.
- Create a daemon process that works through the data in the table.
- This daemon process can be configured to suit your needs.
- The daemon process polls the DB and publishes any pending data. It also deletes the data once it has been published.
This is not from a microservice architecture, but I have seen this approach work efficiently when communicating with third-party services.
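A minimal sketch of such a daemon as a Spring scheduled task, assuming a hypothetical outbox table with id, channel, and payload columns (requires @EnableScheduling on the application class):

```java
import java.util.List;
import java.util.Map;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class OutboxPoller {

    private final JdbcTemplate jdbc;
    private final StringRedisTemplate redis;

    public OutboxPoller(JdbcTemplate jdbc, StringRedisTemplate redis) {
        this.jdbc = jdbc;
        this.redis = redis;
    }

    // Poll every 5 seconds; tune the interval to your retry window.
    @Scheduled(fixedDelay = 5000)
    public void publishPending() {
        // Hypothetical outbox table: id, channel, payload.
        List<Map<String, Object>> rows =
                jdbc.queryForList("SELECT id, channel, payload FROM outbox ORDER BY id");
        for (Map<String, Object> row : rows) {
            redis.convertAndSend((String) row.get("channel"), (String) row.get("payload"));
            // Delete only after a successful publish, so failed rows are retried next run.
            jdbc.update("DELETE FROM outbox WHERE id = ?", row.get("id"));
        }
    }
}
```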

At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want a handshake from the service after publishing and after processing. To accomplish that, a middleware queuing system would be the right choice.
Although a bit more complex to set up, you could use Kafka for this. Configuring the producers and consumer groups properly can help you do the job smoothly; see the sketch below.
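For illustration, a hedged sketch of such a consumer with the plain Kafka client: because offsets are committed only after processing succeeds, a consumer that was down simply resumes from the first unprocessed message when it comes back up (topic and group names are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ResilientConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "target-service");           // one group per consuming service
        props.put("enable.auto.commit", "false");          // commit only after processing
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("source-data"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());               // your business logic
                }
                consumer.commitSync();                     // offsets advance only on success
            }
        }
    }

    private static void process(String payload) { /* ... */ }
}
```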
Using a DB for storage would be overkill here, given that this data only needs to be processed and re-delivered, not persisted long term.
BUT, alternatively, storing the data in Redis and reading it from a cron/scheduled job would make your job much simpler; a sketch follows below. Once the job has run successfully, you can remove the data from the cache and thus save Redis memory.
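A minimal sketch of that idea with the Jedis client, assuming the publisher also pushes each payload onto a hypothetical Redis list that the scheduled job later drains:

```java
import redis.clients.jedis.Jedis;

public class RedisRetryJob {

    private static final String PENDING_KEY = "pending:source-data"; // hypothetical key

    // Run this from your scheduler (cron, @Scheduled, etc.).
    public void drainPending() {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String payload;
            // LPOP removes the entry, so re-push it if processing fails.
            while ((payload = jedis.lpop(PENDING_KEY)) != null) {
                try {
                    jedis.publish("source-data", payload); // re-publish to subscribers
                } catch (Exception e) {
                    jedis.rpush(PENDING_KEY, payload);     // keep it for the next run
                    throw e;
                }
            }
        }
    }
}
```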
If you can comment further on the architecture and the implementation, I can update my answer accordingly. :)

Related

What is the best way to share events between Google cloud run containers

I have a service which is running on many cloud run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from Elasticsearch.
I would have expected ES to offer a "listening" type of connection, as Firebase does, but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross container sync when using cloud run? Would pub/sub be the best solution here?
It's unusual, but not impossible, to achieve.
First of all, you have to understand the instance lifecycle: CPU is allocated only while a request is being processed. Otherwise, the CPU is throttled (below 5%). That is also why you pay only while your instance is processing requests, not while the instance is kept warm (it is offloaded after a while).
That being said, it is useless and inefficient to update instances in the background when no request is being processed.
Therefore, the idea is to perform the sync when the instance receives a request. The downside is that this solution increases request latency (the instance first syncs its cache and then processes the request).
Finally, the solution is to store, somewhere central, the date of the latest cache update, and to keep that same piece of information in each instance. When an instance receives a request, it first compares its own cache date with the central date.
If they are the same, no problem; continue processing.
If the central date is newer than the instance's date, update the instance's data and then process the request.
You can store the data, and the date of that data, in Firestore, for instance, or in Memorystore, or in any other database.
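A minimal sketch of that check, assuming the central date lives in Memorystore (Redis) under a hypothetical key and is stored as an ISO-8601 string:

```java
import java.time.Instant;
import redis.clients.jedis.Jedis;

public class CacheGuard {

    private volatile Instant localCacheDate = Instant.EPOCH; // date of this instance's data

    // Call this at the start of every request handler.
    public void refreshIfStale(Jedis jedis) {
        String central = jedis.get("cache:last-update");     // hypothetical key
        if (central == null) {
            return;                                          // nothing published yet
        }
        Instant centralDate = Instant.parse(central);
        if (centralDate.isAfter(localCacheDate)) {
            reloadFromElasticsearch();                       // your own reload logic
            localCacheDate = centralDate;
        }
    }

    private void reloadFromElasticsearch() { /* ... */ }
}
```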
Pub/Sub can also be a solution, but it is more complex to implement. Each instance, when it starts, has to create a pull subscription on a topic; when the instance is killed, you have to delete that subscription.
Then, when a request comes in, the instance has to pull the subscription, fetch the messages, if any, and update its local cache.
This could be faster than the previous solution, but it is harder to implement.
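For reference, a hedged sketch of that pull step with the Google Cloud Pub/Sub Java client, assuming the per-instance subscription already exists (project and subscription IDs are placeholders):

```java
import com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStubSettings;
import com.google.pubsub.v1.AcknowledgeRequest;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PullRequest;
import com.google.pubsub.v1.PullResponse;
import com.google.pubsub.v1.ReceivedMessage;

public class CacheSync {

    public void pullUpdates(String projectId, String subscriptionId) throws Exception {
        String subscription = ProjectSubscriptionName.format(projectId, subscriptionId);
        SubscriberStubSettings settings = SubscriberStubSettings.newBuilder().build();
        try (SubscriberStub stub = GrpcSubscriberStub.create(settings)) {
            // Synchronous pull: fetch whatever is waiting, without blocking long.
            PullResponse response = stub.pullCallable().call(
                    PullRequest.newBuilder()
                            .setSubscription(subscription)
                            .setMaxMessages(10)
                            .build());
            AcknowledgeRequest.Builder ack = AcknowledgeRequest.newBuilder()
                    .setSubscription(subscription);
            for (ReceivedMessage msg : response.getReceivedMessagesList()) {
                applyToLocalCache(msg.getMessage().getData().toStringUtf8()); // your logic
                ack.addAckIds(msg.getAckId());
            }
            if (ack.getAckIdsCount() > 0) {
                stub.acknowledgeCallable().call(ack.build());
            }
        }
    }

    private void applyToLocalCache(String payload) { /* ... */ }
}
```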

Microservice - persisting to RDBMS & queue within a transaction

I have a REST service - all its requests are persisted to its own relational database. So far, so good. But there is also some small business functionality (email notification, SMS alert) that should run on the newly received/updated data. For this process to work on the data in the background, it needs some way to learn about the persisted data - a message queue would fix that. I see three common ways of designing this:
1. The REST service inserts into the database and also publishes to the queue.
The problem here is the distributed transaction: combining two different resource types - the relational database and the queue - within one transaction. Some tools may support this, some may not.
2. As usual, the REST service persists only to its database. Additionally, it inserts the data into another table, which a scheduled job queries and publishes to the queue (from which the background job starts its work).
The problem I see is the scheduler: not reactive, batch processing, limited by its time slot, not real-time, slow, and so on.
3. The REST endpoint publishes the data directly to a topic. One consumer persists it to the database, while another processes it in the background.
Something like event sourcing. To my understanding, it is a bit complex to implement as the number of services grows. Also, if the DB is down, the persisting service would fail to save the data, yet the background service (say, the emailer) would still send the email, which is functionally wrong. This may lead to inconsistency among the services, functional inconsistency included.
I have also thought of reading the database transaction logs, but that seems more complex, requires tooling and configuration to make it work, and also seems better suited to data-processing systems than to our use case.
What's your thought on this - did I miss anything? How do you manage such scenarios? What should I look for? Thinking reactive, say Vert.x?
Apologies if this looks very naive, but I have to ask.
I think the best approach is 2, with a CDC (change data capture) system like Debezium.
See https://microservices.io/patterns/data/transactional-outbox.html
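The essence of the transactional outbox is that the business row and the outbox row are written in the same local transaction; Debezium (or a simple poller) then relays the outbox rows to the queue. A minimal sketch with Spring, using hypothetical table names:

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    private final JdbcTemplate jdbc;

    public OrderService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Both inserts commit or roll back together: no distributed transaction needed.
    @Transactional
    public void createOrder(String orderId, String payload) {
        jdbc.update("INSERT INTO orders (id, payload) VALUES (?, ?)", orderId, payload);
        jdbc.update("INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
                orderId, "OrderCreated", payload);
        // Debezium (or a scheduled poller) reads the outbox table and publishes to the queue.
    }
}
```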
I usually recommend option 3 if you don't need read-after-write consistency. The background job should retry if the database record has not yet been updated by the message it is processing.
Your post exemplifies why queues shouldn't be used for these types of scenarios. They are good for delivering analytical data or logs, but for task orchestration developers have to reinvent the wheel every time.
A much better approach is to use a task orchestration system like Cadence Workflow, which eliminates the issues you described and makes multi-service orchestration much simpler.
See this presentation that explains the Cadence programming model.

Micro-services architecture, need advice

We are working on a system that is supposed to 'run' jobs on distributed systems.
When jobs are accepted they need to go through a pipeline before they can be executed on the end system.
We've decided to go with a micro-services architecture, but there is one thing that bothers me, and I'm not sure what the best practice would be.
When a job is accepted it will first be persisted into a database, then - each micro-service in the pipeline will do some additional work to prepare the job for execution.
I want the persisted data to be updated at each such station in the pipeline so that it reflects the actual state of the job, i.e. its status in the pipeline.
In addition, while a job is being executed on the end system - its status should also get updated.
What would be the best practice for updating the database (the job's status) at each station:
Each station (micro-service) in the pipeline accesses the database directly and updates the job's status
There is another micro-service that exposes the data (REST) and serves as a DAL; each micro-service in the pipeline updates the job's status through this service
Other?....
Help/advice would be highly appreciated.
Thanks a lot!!
To add to what was said by @Anunay and @Mohamed Abdul Jawad:
I'd consider writing the state from the units of work in your pipeline to a view (a table or an insert-only cache). You can use messaging, or simply insert a row into that view, and have the readers of the state pick up the correct state based on some logic (a date, a state, or a composite key). As this view is not really owned by any domain service, it can be made available to any reader (read-only) to consume; see the sketch after this paragraph.
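A minimal sketch of such an insert-only view, assuming a hypothetical job_status table: each station appends a row, and readers resolve the current state as the latest row per job:

```java
import java.sql.Timestamp;
import java.time.Instant;
import org.springframework.jdbc.core.JdbcTemplate;

public class JobStatusView {

    private final JdbcTemplate jdbc;

    public JobStatusView(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Each pipeline station appends; nothing is ever updated in place.
    public void record(String jobId, String station, String status) {
        jdbc.update("INSERT INTO job_status (job_id, station, status, at) VALUES (?, ?, ?, ?)",
                jobId, station, status, Timestamp.from(Instant.now()));
    }

    // Readers resolve the current state as the most recent row for the job.
    public String currentStatus(String jobId) {
        return jdbc.queryForObject(
                "SELECT status FROM job_status WHERE job_id = ? ORDER BY at DESC LIMIT 1",
                String.class, jobId);
    }
}
```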
Consider also the Saga pattern.
A saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request corresponding to the system operation, and then each subsequent step is triggered by the completion of the previous one.
http://microservices.io/patterns/data/saga.html
https://dzone.com/articles/saga-pattern-how-to-implement-business-transaction
https://medium.com/@tomasz_96685/saga-pattern-and-microservices-architecture-d4b46071afcf
If you would like to code the workflow:
Microservice A, which accepts the job and the commands to update it
Microservice B, which provides the read model for the job
Based on JobCreated events, use a message queue, process and update the job through the queue pipelines, and keep updating the job status at every node in the pipeline.
I am assuming you know about queues and consumers.
I am myself new to Camunda (a workflow engine); it might be usable here, but I am not completely sure.
Accessing a shared database between microservices is strongly discouraged, as it violates a basic rule of microservices architecture.
A microservice must be autonomous and keep its own logic and data.
Also, to achieve a good microservice design, you should loosely couple your microservices.
Multiple microservices accessing the same database is not recommended. Here you have the case where each service needs to be triggered, update the data, and then somehow call the next service.
You really need a mechanism to orchestrate the services. A workflow engine might fit the bill.
I would, however, suggest an event-driven system. (I might be going beyond what your data allows, with my limited knowledge of it.) Have one service that gives you basic CRUD on the data, and other services that hold the logic for changing it. (At this point I would ask why you want different services to change the state; if it is a business requirement, that is fine.) Once the data is written, just emit an event to which services can subscribe and react, as in the sketch at the end of this answer.
This will allow you to easily add more states to your pipeline in the future.
You will need a service to manage the event queue.
As far as logging the state of the pipeline is concerned, it can be done easily by logging the events.
If you opt for the workflow route, you may use Amazon SWF or Camunda; there are quite a few options out there.
If you go the event route, you need to look into event-driven systems in microservices.
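A minimal sketch of that event flow with spring-kafka; the topic, group, and class names are made up, and any broker would do:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

// CRUD service side: emit an event after the data is written.
@Component
class JobEventPublisher {

    private final KafkaTemplate<String, String> kafka;

    JobEventPublisher(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka;
    }

    void jobWritten(String jobId, String newState) {
        // Key by jobId so all events for one job stay ordered on one partition.
        kafka.send("job-state-changed", jobId, newState);
    }
}

// Pipeline station side: react to the event and do this station's work.
@Component
class EnrichmentStation {

    @KafkaListener(topics = "job-state-changed", groupId = "enrichment-station")
    void onJobStateChanged(String newState) {
        // Process the state change; adding a new station later is just a new listener.
    }
}
```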

Data replication in Micro Services: restoring database backup

I am currently working with a legacy system that consists of several services which (among others) communicate through some kind of Enterprise Service Bus (ESB) to synchronize data.
I would like to gradually evolve this system toward a microservices architecture. I am planning to reduce the dependency on the ESB and rely more on a message broker like RabbitMQ or Kafka. Due to some resource and existing-technology limitations, I don't think I will be able to completely avoid data replication between services, even though I should be able to clearly define a single service as the owner of each piece of data.
What I am wondering now is: how can I safely restore a database backup for a single service when necessary? Doing so will cause the service to be out of sync with the other services that hold the replicated data. Any experience/suggestions regarding this?
Have your primary database publish an event every time a database mutation occurs, and let the replicating services subscribe to these events and apply the same mutation to their replicated data.
You already use a message broker, so you can leverage your existing stack for broadcasting the events. With replication done through events, a restore applied to the primary database will be propagated to all the other services.
Depending on the scale of the backup, there will be a short period during which the data on the other services is stale. This may or may not be acceptable for your use case; think of the staleness as a form of eventual consistency.
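A minimal sketch of this with Spring AMQP, assuming RabbitMQ and made-up exchange/queue names: the owning service broadcasts every mutation, and each replica applies it idempotently:

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

// Owning service: broadcast every mutation (including those replayed by a restore).
@Component
class CustomerMutationPublisher {

    private final RabbitTemplate rabbit;

    CustomerMutationPublisher(RabbitTemplate rabbit) {
        this.rabbit = rabbit;
    }

    void publish(String customerId, String mutationJson) {
        // Fanout exchange "customer-mutations" delivers to every replica's queue.
        rabbit.convertAndSend("customer-mutations", "", mutationJson);
    }
}

// Replicating service: apply the same mutation to the local copy.
@Component
class CustomerReplica {

    @RabbitListener(queues = "billing.customer-mutations")
    void onMutation(String mutationJson) {
        // Upsert the replicated row; the handler must be idempotent so that
        // replays during a restore do not corrupt the copy.
    }
}
```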

Cache values in Java EE

I'm building a simple message-delegation application. Messages are sent on both ends via JMS. I'm using an MDB to process incoming messages, transform them, and send them to a target queue. Unfortunately, the same message can be sent to the incoming queue more than once, but forwarding duplicates is not allowed.
So what is the best way to accomplish that?
Since there can be multiple MDBs listening on the incoming queue, I need a single cache where I can store the unique message UUIDs of the incoming messages for at least an hour. How should this cache be accessed? Via a singleton/static class (I'm running Java EE 5 and thus don't have the @Singleton annotation)?
In addition, I think all operations must be synchronized, right? Does that hurt performance too much?
@Ingo: are you OK with a database solution? You can use a full-fledged DB server or a simple Apache Derby instance for this.
If so, you can have a simple table where you store each message's unique UID and check against it for uniqueness. This solution has the following benefits (see the sketch after this list):
Simple code
No need for a time-bound cache (1 hour); you can check a message's uniqueness forever.
A persistent record of which messages came in.
No need for expensive synchronization; you can rely on the DB's isolation level for consistency.
A centralized solution for your possibly many deployments of the application.
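A minimal sketch of that check with plain JDBC, assuming a hypothetical processed_messages table whose primary key is the message UUID; the unique constraint does the synchronization for you:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DuplicateChecker {

    // CREATE TABLE processed_messages (message_id VARCHAR(36) PRIMARY KEY, seen_at TIMESTAMP)

    /** Returns true if this UUID is new; false if it was already processed. */
    public boolean markIfNew(Connection conn, String messageId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO processed_messages (message_id, seen_at) "
                        + "VALUES (?, CURRENT_TIMESTAMP)")) {
            ps.setString(1, messageId);
            ps.executeUpdate();
            return true;                       // insert succeeded: first time we see it
        } catch (SQLException e) {
            // A duplicate-key violation means another MDB already handled this message.
            // (SQLState 23xxx = integrity constraint violation in standard SQL.)
            if (e.getSQLState() != null && e.getSQLState().startsWith("23")) {
                return false;
            }
            throw e;                           // anything else is a real error
        }
    }
}
```

In the MDB's onMessage, you would then forward the message only when markIfNew returns true.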
