What is the best way to share events between Google cloud run containers - elasticsearch

I have a service which is running on many cloud run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from elasticsearch.
I would have expected ES to have a "listening" type of connection such as firebase but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross container sync when using cloud run? Would pub/sub be the best solution here?

It's unusual but not impossible to achieve.
First of all, you have to understand the instance life cycle: the CPU is allocated only while a request is being processed. Otherwise, the CPU is throttled (below 5%). That is also why you pay only while your instance is processing a request, and not while the instance is kept warm (it is offloaded after a while).
That being said, it's useless and inefficient to try to update instances in the background when no request is being processed.
Therefore, the idea is to do the work when the instance receives a request. The downside is that this solution increases the request latency (the instance first syncs its cache and then processes the request).
Finally, the solution is to store the date of the latest cache update somewhere central. You keep that same information in your instance. When the instance receives a request, the first thing it does is compare its own cache date with the central data date.
If it's the same, no problem, continue the processing.
If the central data date is after the current instance date, update the instance data, and then process the request.
You can store the data, and the date of that data, in Firestore for instance, or in Memorystore, or in any other database.
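The date comparison described above can be sketched roughly as follows. This is a minimal illustration, not a Cloud Run or Firestore API: `CentralStore` is a hypothetical in-memory stand-in for Firestore or Memorystore, and `Instance`, `publish` and `handle_request` are names invented for the example.

```python
import time
from typing import Any, Dict, Optional


class CentralStore:
    """Hypothetical stand-in for Firestore/Memorystore: the data plus its update date."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.updated_at: float = 0.0

    def publish(self, data: Dict[str, Any], now: Optional[float] = None) -> None:
        # Called by the instance that handled the "do some work" request.
        self.data = dict(data)
        self.updated_at = time.time() if now is None else now


class Instance:
    """One service instance with its own local cache and cache date."""

    def __init__(self, store: CentralStore) -> None:
        self.store = store
        self.cache: Dict[str, Any] = {}
        self.cache_date: float = 0.0
        self.refreshes = 0

    def handle_request(self, key: str) -> Any:
        # First thing on every request: compare the local date with the central date.
        if self.store.updated_at > self.cache_date:
            # Assumption: a real service would re-fetch from Elasticsearch here.
            self.cache = dict(self.store.data)
            self.cache_date = self.store.updated_at
            self.refreshes += 1
        return self.cache.get(key)
```

Only the request that arrives after a central update pays the refresh cost; later requests on the same instance skip it.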
Pub/Sub can also be a solution, but it is more complex to implement. Each instance, when it starts, has to create a pull subscription on a topic. When the instance is killed, you have to delete that subscription.
Then, when a request comes in, your instance has to pull from the subscription, get the messages, if any, and update its local cache.
It could be faster than the previous solution, but it is harder to implement.

Related

Process a stream of sessions on aws

Is there a way to implement something like Flink's session window on AWS with Lambda and some way of managing messages?
We have a stream of small events with a session id. We cannot guarantee the order of the arriving events and we don't always have a session-finished event. We know that session ids are unique. We also know that when a session is finished it won't be restarted. We also know that when the session is active we will receive a message every minute or so. We need to process the entire session as a whole.
We want to wait for a silent time of X minutes, and if no messages arrive we will process the entire session as a whole.
This is exactly what Flink's session window does. Is there a way to do the same thing purely using AWS Lambda and its triggers?
There can be tens of millions of sessions at the same time.
It's not possible with an AWS Lambda.
Lambdas are stateless: they can process messages one by one, but cannot offer any processing over a sequence of messages, which would be required for the kind of windowing logic you describe.
Maybe an option for you would be Kinesis Data Analytics? Under the hood, this one is actually Flink, although it's provided as a managed service by AWS, so maybe you'll get there the "lambda-like" experience you're looking for?
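To make the requirement concrete, the "wait for X minutes of silence, then process the whole session" logic from the question can be sketched in plain Python. This only illustrates the semantics a stateful engine provides; it keeps all state in memory, so it is not a Lambda solution and would not scale to tens of millions of sessions as written. `SessionBuffer` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class SessionBuffer:
    """Buffers events per session and closes a session after a silent gap."""

    def __init__(self, gap_seconds: float) -> None:
        self.gap = gap_seconds
        self.events: DefaultDict[str, List[str]] = defaultdict(list)
        self.last_seen: Dict[str, float] = {}

    def add(self, session_id: str, arrival: float, payload: str) -> None:
        # Order of arrival does not matter; only the latest arrival time is tracked.
        self.events[session_id].append(payload)
        previous = self.last_seen.get(session_id, arrival)
        self.last_seen[session_id] = max(previous, arrival)

    def close_expired(self, now: float) -> Dict[str, List[str]]:
        # A session is complete once no event has arrived for `gap` seconds.
        expired = [sid for sid, seen in self.last_seen.items() if now - seen >= self.gap]
        closed: Dict[str, List[str]] = {}
        for sid in expired:
            closed[sid] = self.events.pop(sid)
            del self.last_seen[sid]
        return closed
```

Calling `close_expired` periodically returns each finished session as a whole, which is the per-key state that a stateless function cannot hold between invocations.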

Microservice failure Scenario

I am working on a microservice architecture. One of my services is exposed to a source system, which uses it to post data. This microservice publishes the data to Redis using Redis pub/sub, and the data is then consumed by a couple of other microservices.
Now, if one of the other microservices is down and cannot process the data from Redis pub/sub, then I have to retry with the published data when that microservice comes back up. The source cannot push the data again and manual intervention is not possible, so I thought of 3 approaches:
1. Additionally use Redis for storing and retrieving the data.
2. Use a database to store the data before publishing. I have many source and target microservices that use Redis pub/sub. With this approach, every time I have to insert the request in the DB first and then its response status. I would also have to use a shared database. This approach adds a couple more exception-handling cases and doesn't look very efficient to me.
3. Use Kafka in place of Redis pub/sub. Traffic is low, which is why I used Redis pub/sub, and it is not feasible to change.
In both of the above cases, I have to use a scheduler, and there is a time limit within which I have to retry, otherwise subsequent requests will fail.
Is there any other way to handle the above cases?
For the point 2,
- Store the data in DB.
- Create a daemon process which will process the data from the table.
- This Daemon process can be configured well as per our needs.
- Daemon process will poll the DB and publish the data, if any. Also, it will delete the data once published.
I have not used this in a microservice architecture, but I have seen this approach work efficiently when communicating with 3rd-party services.
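The daemon steps listed above could look roughly like this. It is a sketch under assumptions: SQLite stands in for the real database, the `outbox` table name is invented, and `publish` is a hypothetical callback that returns `True` only when the message was actually delivered (for example, after the consumer acknowledged it).

```python
import sqlite3
from typing import Callable


def init_outbox(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )


def store(conn: sqlite3.Connection, payload: str) -> None:
    # Store the data in the DB before any attempt to publish it.
    with conn:
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def drain_once(conn: sqlite3.Connection, publish: Callable[[str], bool]) -> int:
    """One polling pass of the daemon: publish pending rows in order, delete on success."""
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    sent = 0
    for row_id, payload in rows:
        if not publish(payload):
            break  # keep this row and all later ones for the next poll
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

A scheduler would call `drain_once` at whatever interval fits the retry deadline; rows survive until a publish attempt succeeds.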
At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want to get a handshake from the service after pushing and after processing. To accomplish that, a middleware queuing system would be the right choice.
Although a bit more complex to accomplish, what you can do is use Kafka for streaming this. Configuring producer and consumer groups properly can help you do the job smoothly.
Using a DB for storage would be overkill, considering that the situation is "this data is to be processed and to be persisted".
BUT, alternatively, storing the data in Redis and reading it in a cron job/scheduled job would make your job much simpler. Once the job has run successfully, you may remove the data from the cache and thus save Redis memory.
If you can comment further more on the architecture and the implementation, I can go ahead and update my answer accordingly. :)

Microservice State Synchronization

We are working on an application that has a WebSocket connection to every client. For high availability and load balancing purposes, we would like to scale the receiving micro service. As the WebSocket connection is used to propagate the state of a client to every other client it is important to synchronize the current state of a client with all other instances of the receiving micro service. It is also important that the state has to be reset when a client disconnects.
To give you some specs:
- We are using Docker Swarm
- It's a NodeJS backend and an Angular 9 frontend
We have looked into multiple ideas, for example:
- Redis Cache (The state would not be deleted if the instance fails.)
- Queues/Topics (This would mean every instance has to keep track of the current state of all clients.)
- WebSockets between instances (This looks promising but is not really scalable.)
What is the best practice to sync the state of a micro service between multiple instances while making sure that there are no inconsistencies? How are you solving this issue? Are we missing something obvious? Any tips and tricks?
We appreciate any suggestions.
This might not be 100% what you want to hear, but generally people advise that all microservices should be stateless.
An overall application, of course, has state, and databases, persistent event streams or key-value caches (e.g. Redis) are excellent ways of persisting this. Ideally this is bounded per service though, otherwise you risk ending up writing a distributed monolith.
It's hard to say in your particular case, but perhaps rethink how state is stored conceptually, and make that more explicit: determine what is cache (for performance) and what is genuine state that should be persisted externally (e.g. to Redis and a database). Externally persisted state can be used by many service instances instantly, which makes sure they are truly disposable processes.

socket io - Emit an event every X seconds or just emit it after a POST event?

I'm using Socket.IO, and I was wondering which is better:
Emitting an event every X seconds to always stay up to date with the database, or emitting the event after e.g. a POST event, so it's more efficient.
I believe updating every X seconds should be easier, and maybe scales better, but I don't know if that's the correct way.
EDIT-1: To give more context: the application is for an accounting team. They basically want their Excel sheets converted to an app. They have a lot of data, so I don't know if emitting an event every X seconds is a good idea.
Thanks.
There is no "correct" way. It depends entirely upon the needs of your client and the capabilities of your server. If the client needs to be kept more instantly up-to-date, then send data from your server to the client whenever the server has new data. If the client only needs to be updated every once-in-a-while, then only send it data every once-in-a-while. There is no "correct" way. It depends upon your application.
It is always more efficient to only send data to the client when the data has actually changed and when the client actually cares that something has changed. So, it would be foolish to send a client update every few seconds if the data isn't actually changing that often. If you have a means of knowing when the data changes on the server, then use that event to know when to send data to the client and even then, don't send it more often than the client actually cares to know.
It is always more efficient to have the server do no more work than is actually required by the client. Things like caching and keeping track of what each client was last sent can sometimes save lots of work for the server too.
Any further advice on this matter would need to know a lot more about the needs of your application and how this particular data fits into that and how often the data in question actually changes.
A summary on this topic:
- Send data to the client no more often than it needs it.
- Sending data to the client that has not changed since the last time you sent it is inefficient for the server and consumes bandwidth.
- Only you can decide how often your client needs updates (it depends upon your application).
- Only you can test the impact on scalability of sending data to every client every time the data changes.
- Server-side caching and keeping track of which client already has which data can help you avoid sending data to a client that it already has.
- Server-side scalability probably has a lot to do with how many simultaneous clients are connected and how frequently there is changed data to send them.
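As one possible way to keep track of what each client was last sent, the server can remember a digest of the last payload per client and skip the emit when nothing changed. This is a sketch only: `ChangeTracker`, `payload_for` and `forget` are hypothetical names, and the actual Socket.IO emit call is deliberately left out.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class ChangeTracker:
    """Remembers a digest of the last payload sent to each client."""

    def __init__(self) -> None:
        self._last_sent: Dict[str, str] = {}

    @staticmethod
    def _digest(data: Any) -> str:
        # sort_keys makes the digest independent of dict key order.
        encoded = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def payload_for(self, client_id: str, data: Any) -> Optional[Any]:
        """Return the data if this client has not received it yet, else None."""
        digest = self._digest(data)
        if self._last_sent.get(client_id) == digest:
            return None  # unchanged since the last send: skip the emit
        self._last_sent[client_id] = digest
        return data

    def forget(self, client_id: str) -> None:
        # Call on disconnect so a reconnecting client gets a full update.
        self._last_sent.pop(client_id, None)
```

The server would emit only when `payload_for` returns something other than `None`, whether the check is triggered by a POST or by a timer.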

Micro-services architecture, need advice

We are working on a system that is supposed to 'run' jobs on distributed systems.
When jobs are accepted they need to go through a pipeline before they can be executed on the end system.
We've decided to go with a micro-services architecture, but there is one thing that bothers me and I'm not sure what the best practice would be.
When a job is accepted it will first be persisted into a database, then - each micro-service in the pipeline will do some additional work to prepare the job for execution.
I want the persisted data to be updated at each station in the pipeline to reflect the actual state of the job, or its status in the pipeline.
In addition, while a job is being executed on the end system - its status should also get updated.
What would be the best practice for updating the database (the job's status) at each station?
- Each station (micro-service) in the pipeline accesses the database directly and updates the job's status.
- There is another micro-service that exposes the data (REST) and serves as a DAL; each micro-service in the pipeline updates the job's status through this service.
- Other?
Help/advice would be highly appreciated.
Thanks a lot!!
To add to what was said by @Anunay and @Mohamed Abdul Jawad:
I'd consider writing the state from the units of work in your pipeline to a view (an insert-only table or cache). You can use messaging, or simply insert a row into that view, and have the readers of the state pick up the correct state based on some logic (date, state, or a composite key). As this view is not really owned by any domain service, it can be available (read-only) to any readers that want to consume it.
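A minimal sketch of such an insert-only view, using SQLite as a stand-in for the real store. The `job_status` table, its columns, and the choice of "latest row wins" as the reader logic are assumptions made for the example.

```python
import sqlite3
from typing import Optional, Tuple


def init_view(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_status ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "job_id TEXT NOT NULL, station TEXT NOT NULL, status TEXT NOT NULL)"
    )


def record(conn: sqlite3.Connection, job_id: str, station: str, status: str) -> None:
    # Stations only ever insert; existing rows are never updated or deleted.
    with conn:
        conn.execute(
            "INSERT INTO job_status (job_id, station, status) VALUES (?, ?, ?)",
            (job_id, station, status),
        )


def current_status(conn: sqlite3.Connection, job_id: str) -> Optional[Tuple[str, str]]:
    # Reader logic: the most recently inserted row for the job is its current state.
    return conn.execute(
        "SELECT station, status FROM job_status "
        "WHERE job_id = ? ORDER BY seq DESC LIMIT 1",
        (job_id,),
    ).fetchone()
```

Because rows are never overwritten, the same table doubles as a history of the job's path through the pipeline.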
Also consider the Saga pattern.
A Saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request corresponding to the system operation, and then each subsequent step is triggered by the completion of the previous one.
http://microservices.io/patterns/data/saga.html
https://dzone.com/articles/saga-pattern-how-to-implement-business-transaction
https://medium.com/#tomasz_96685/saga-pattern-and-microservices-architecture-d4b46071afcf
If you would like to code the workflow:
- Microservice A, which accepts the job and the commands to update the job
- Microservice B, which provides the read model for the job
Based on JobCreatedEvents, use some messaging queue to process and update the job through queue pipelines, and keep updating the JobStatus at every node in the pipeline.
I am assuming you are familiar with queues and consumers.
I am new to Camunda (a workflow engine) myself; it might be usable here, but I am not completely sure.
Accessing a shared database from multiple microservices is strongly discouraged, as this violates a basic rule of microservices architecture.
A microservice must be autonomous and keep its own logic and data.
Also, to achieve a good microservice design you should loosely couple your microservices.
Multiple microservices accessing the same database is not recommended. Here you have the case where each of the services needs to be triggered, then updates the data, and then somehow calls the next service.
You really need a mechanism to orchestrate the services. A workflow engine might fit the bill.
I would, however, suggest an event-driven system. I might be going too far given my limited knowledge of your data. Have one service that gives you basic CRUD on the data and other services that hold the logic to change the data (at this point I would like to ask why you want different services to change the state; if it's a business requirement, that's fine). Once the data is written, just create an event that services can subscribe to and react to.
This will allow you to easily add more states to your pipeline in future.
You will need a service to manage the event queue.
As far as logging the state is concerned, it can be done easily by logging the events.
If you opt for the workflow route, you may use Amazon SWF or Camunda; there are really quite a few options out there.
If you go the event route, you need to look into event-driven systems in microservices.
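To illustrate the event route, here is a small in-process sketch in which stations subscribe to events and the event log doubles as the record of a job's state. It is a toy model: `EventBus`, `Event` and the event names are hypothetical, and a real system would put a message broker in place of the in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Optional


@dataclass(frozen=True)
class Event:
    name: str
    job_id: str


class EventBus:
    """In-memory stand-in for the service that manages the event queue."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.log: List[Event] = []

    def subscribe(self, name: str, handler: Callable[[Event], None]) -> None:
        self._handlers[name].append(handler)

    def publish(self, event: Event) -> None:
        # Logging every event is what records the state of the job.
        self.log.append(event)
        for handler in list(self._handlers.get(event.name, [])):
            handler(event)

    def status_of(self, job_id: str) -> Optional[str]:
        # The latest event seen for a job is its current status.
        for event in reversed(self.log):
            if event.job_id == job_id:
                return event.name
        return None
```

Adding a new station to the pipeline is then a matter of subscribing one more handler, without touching the existing ones.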

Resources