Schedule a task in an EC2 Auto Scaling group - Spring

I have multiple EC2 instances in an Auto Scaling group. They all run the same Java application. In the application, I want to trigger some functionality every month. So, I have a function that uses Spring Schedule and runs every month. But that function is run on every single EC2 instance in the Auto Scaling group, while it must run only once. How should I approach this issue? I am thinking of using services like Amazon SQS, but they would have the same problem.
To be more specific about what I have tried: in one attempt, the function puts a record with a key unique to this month in a database that is shared among all the EC2 instances. If the record for this month is already there, the put request is ignored. Now the problem transfers to the reading part: I have a function that reads the database and does the job, but that function is run by every single EC2 instance.

Interesting! You could put a configuration on one of the servers to trigger a monthly activity, but individual instances in an Auto Scaling group should be treated as identical, fragile systems that could be replaced during a month. So, there would be no guarantee that this specific server would be around in one month.
I would suggest you take a step back and look at the monthly event as something that is triggered external to the servers.
I'm going to assume that the cluster of servers is running a web application and there is a Load Balancer in front of the instances that distributes traffic amongst the instances. If so, "something" should send a request to the Load Balancer, and this would be forwarded to one of the instances for processing, just like any normal request.
This particular request would be to a URL used specifically to trigger the monthly processing.
This leaves the question of what is the "something" that sends this particular request. For that, there are many options. A simple one would be:
Configure Amazon CloudWatch Events to trigger a Lambda function based on a schedule
The AWS Lambda function would send the HTTP request to the Load Balancer
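As a minimal sketch (not a definitive implementation), such a Lambda function could look like the following in Java. It assumes the scheduled rule fires once a month (for example a cron expression like cron(0 0 1 * ? *)), and that the application exposes the monthly job behind the load balancer at a path such as /tasks/monthly; the ALB_URL environment variable and the path are placeholders for your own setup.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

// Lambda handler triggered by the scheduled CloudWatch Events rule.
// It simply forwards an HTTP request to the load balancer, which routes it
// to exactly one instance in the Auto Scaling group.
public class MonthlyTriggerHandler implements RequestHandler<Map<String, Object>, String> {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // e.g. http://my-alb-123.us-east-1.elb.amazonaws.com (placeholder)
        String albUrl = System.getenv("ALB_URL");
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(albUrl + "/tasks/monthly"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        try {
            HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            context.getLogger().log("Monthly trigger returned " + response.statusCode());
            return String.valueOf(response.statusCode());
        } catch (Exception e) {
            throw new RuntimeException("Failed to call the monthly endpoint", e);
        }
    }
}

Only the one instance that the load balancer picks receives this request and runs the monthly job; the other instances never see it.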

Related

What is the best way to share events between Google Cloud Run containers

I have a service which is running on many cloud run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from elasticsearch.
I would have expected ES to have a "listening" type of connection such as firebase but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross container sync when using cloud run? Would pub/sub be the best solution here?
It's unusual but not impossible to achieve.
First of all, you have to understand the instance life cycle: the CPU is allocated only when a request is being processed. Otherwise, the CPU is throttled (below 5%). That is also why you pay only while your instance is processing requests, and not while the instance is kept warm (it is offloaded after a while).
That being said, it's totally useless and inefficient to update instances in the background while no request is being processed.
Therefore, the idea is to perform the sync when the instance receives a request. The downside is that this solution increases request latency (the instance first syncs its cache and then processes the request).
Finally, the solution is to store, somewhere, the date of the latest cache update. You keep that same information in your instance. When the instance receives a request, it first compares its own cache date with the central data date.
If they are the same, no problem; continue the processing.
If the central data date is after the current instance date, update the instance data, and then process the request.
You can store the data, and the date of that data, in Firestore for instance, or in Memorystore, or in any other database.
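A rough sketch of that per-request check in Java, assuming a Firestore document cache/meta whose lastUpdate field holds the date of the latest update (the collection, document, and field names are made up for illustration):

import com.google.cloud.Timestamp;
import com.google.cloud.firestore.DocumentSnapshot;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;

public class CacheSync {

    private final Firestore db = FirestoreOptions.getDefaultInstance().getService();

    // Date of the data currently held by this instance; starts at the epoch.
    private volatile Timestamp localCacheDate = Timestamp.ofTimeSecondsAndNanos(0, 0);

    // Called at the start of every request, before the real processing.
    public void syncIfStale() throws Exception {
        DocumentSnapshot meta = db.collection("cache").document("meta").get().get();
        Timestamp centralDate = meta.getTimestamp("lastUpdate");
        if (centralDate != null && centralDate.compareTo(localCacheDate) > 0) {
            reloadFromElasticsearch();   // refresh the local data
            localCacheDate = centralDate;
        }
    }

    private void reloadFromElasticsearch() {
        // fetch the updated data from Elasticsearch here
    }
}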
Pub/Sub can also be a solution, but it is more complex to implement. Each instance, when it starts, has to create a pull subscription on a topic. When the instance is killed, you have to delete that subscription.
Then, when a request comes in, your instance has to pull the subscription, get the messages, if any, and update its local cache.
This could be faster than the previous solution, but it is harder to implement.
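For completeness, a sketch of that Pub/Sub variant with the Java client: each instance creates its own pull subscription at startup and drains it at the start of every request. The project, topic, and subscription names are placeholders, and the per-instance subscription would still need to be deleted when the instance goes away.

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStubSettings;
import com.google.pubsub.v1.*;

import java.util.UUID;
import java.util.stream.Collectors;

// Each container instance gets its own subscription on the shared "cache-updates" topic.
public class PubSubCacheSync {

    private final ProjectSubscriptionName subscription = ProjectSubscriptionName.of(
            "my-project", "cache-updates-" + UUID.randomUUID());

    // Run once when the instance starts.
    public void createInstanceSubscription() throws Exception {
        try (SubscriptionAdminClient admin = SubscriptionAdminClient.create()) {
            admin.createSubscription(subscription.toString(),
                    TopicName.of("my-project", "cache-updates").toString(),
                    PushConfig.getDefaultInstance(), 10);
        }
    }

    // Run at the start of every request: pull pending update messages, if any.
    public void pullUpdates() throws Exception {
        try (SubscriberStub stub = GrpcSubscriberStub.create(SubscriberStubSettings.newBuilder().build())) {
            PullResponse response = stub.pullCallable().call(PullRequest.newBuilder()
                    .setSubscription(subscription.toString())
                    .setMaxMessages(100)
                    .build());
            if (response.getReceivedMessagesCount() == 0) {
                return; // nothing published since the last request, cache is current
            }
            // ...refresh the local cache from Elasticsearch here...
            stub.acknowledgeCallable().call(AcknowledgeRequest.newBuilder()
                    .setSubscription(subscription.toString())
                    .addAllAckIds(response.getReceivedMessagesList().stream()
                            .map(ReceivedMessage::getAckId)
                            .collect(Collectors.toList()))
                    .build());
        }
    }
}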

How can I inform my SQS polling that my spot EC2 instance is going to be terminated in the next two minutes?

How can I inform my SQS polling that my EC2 instance is going to be terminated in the next two minutes?
I have used SQS and an Auto Scaling group; my EC2 instances poll the SQS queue. An EC2 instance takes a task request from SQS and executes it.
Problem scenario: as I use spot EC2 instances, AWS terminates them to manage capacity. Now I want to stop new task execution on an EC2 instance once it gets a termination request from Auto Scaling or from AWS.
How can I deal with it?
Normally you would set up a CloudWatch Events (CWE) rule:
{
  "source": [
    "aws.ec2"
  ],
  "detail-type": [
    "EC2 Spot Instance Interruption Warning"
  ]
}
The CWE rule would trigger a Lambda function, and the function could perform a number of actions on your instance, depending on what you want to do. This is use-case and application specific, and depends on how your app gets notified about such events and what it does.
It could, for example:
use SSM Run Command to execute bash/PowerShell commands on your instance to do cleanup before termination occurs (see the sketch after this list)
it could call an HTTP endpoint on your instance that is exposed by your application. This way your application can directly get notified that it is going to be terminated soon.
copy some logs or data files to S3 before it gets terminated
and more
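As a sketch of the first option only, such a Lambda function could look like this in Java. It assumes the interruption warning event above as its input, and a hypothetical cleanup script at /opt/app/drain.sh on the instance that tells your application to stop taking new tasks from SQS and finish in-flight work.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.SendCommandRequest;

import java.util.List;
import java.util.Map;

// Lambda handler for the "EC2 Spot Instance Interruption Warning" event.
// It reads the instance id from the event detail and uses SSM Run Command
// to run a cleanup script on that instance before termination.
public class SpotInterruptionHandler implements RequestHandler<Map<String, Object>, Void> {

    private final SsmClient ssm = SsmClient.create();

    @Override
    @SuppressWarnings("unchecked")
    public Void handleRequest(Map<String, Object> event, Context context) {
        Map<String, Object> detail = (Map<String, Object>) event.get("detail");
        String instanceId = (String) detail.get("instance-id");
        context.getLogger().log("Spot interruption warning for " + instanceId);

        ssm.sendCommand(SendCommandRequest.builder()
                .documentName("AWS-RunShellScript")
                .instanceIds(instanceId)
                .parameters(Map.of("commands", List.of("/opt/app/drain.sh")))
                .build());
        return null;
    }
}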

Best way to schedule one-time events in serverless environments

Example use case
Send the user a notification 2 hours after signup.
Options considered
setTimeout(() => { /* send notification */ }, 2*60*60*1000); is not an option in serverless environments since the function terminates after execution (so it has to be stateless).
CloudWatch Events can schedule Lambda invocations using cron expressions - but this was designed for repetitive invocations (there's a limit of 100 rules/region).
I have not seen scheduling options in AWS SNS/SQS or GCP Pub/Sub. Are there alternatives with scheduling?
I want to avoid (if possible) setting up a dedicated message broker (overkill) or stateful/non-serverless instance - is there a serverless way to do this?
I can queue the events in a database and invoke a lambda function every minute to poll the database for events to execute in that minute... is there a more elegant solution?
Use AWS Step Functions; they are like serverless workflows that don't have the 15-minute limit that AWS Lambda does. You can design a workflow in Step Functions that integrates with API Gateway, Lambda and SNS to send email and text notifications as follows:
Create a REST API via API Gateway that will invoke a Lambda function, passing in, for example, the destination address (email, phone number) of the SNS notification, when it should be sent, and the notification method (e.g. email, text, etc.).
The Lambda function, on invocation, will invoke the Step Function passing in the data (Lambda is needed because API Gateway currently can't invoke Step Functions directly).
The Step Function is basically a workflow; you can define states for waiting (like waiting for the specified time to send the notification, e.g. 30 seconds), and states for invoking other Lambda functions that can use SNS to send out an email and/or text notification.
A rudimentary example is provided by AWS with their Task Timer example.
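As an illustration only, a cut-down state machine definition along those lines could look like the following: a Wait state that pauses until the requested time, then a Task state that invokes a notification Lambda. The sendAt field, the Lambda ARN, and the state names are placeholders.

{
  "StartAt": "WaitUntilSendTime",
  "States": {
    "WaitUntilSendTime": {
      "Type": "Wait",
      "TimestampPath": "$.sendAt",
      "Next": "SendNotification"
    },
    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sendNotification",
      "End": true
    }
  }
}

The sendNotification Lambda (a placeholder name) would then use SNS to deliver the email or text message, using the destination details passed through the state input.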
Things are coming on GCP for doing this, but not very soon. Therefore, today, the solution is to poll a database.
You can do that with Datastore/Firestore with the execution datetime indexed (so that you don't have to read all the documents each minute). But be careful of traffic spikes; you could create hotspots.
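A minimal sketch of that polling function in Java with Firestore, invoked every minute by whatever trigger you choose. The collection and field names (scheduledEvents, executeAt, done) are made up for illustration, and the combined filter would need a composite index.

import com.google.cloud.Timestamp;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;
import com.google.cloud.firestore.QueryDocumentSnapshot;

import java.util.List;

public class EventPoller {

    private final Firestore db = FirestoreOptions.getDefaultInstance().getService();

    // Fetch the events whose scheduled time has passed and hand them off for execution.
    public void pollDueEvents() throws Exception {
        List<QueryDocumentSnapshot> due = db.collection("scheduledEvents")
                .whereEqualTo("done", false)
                .whereLessThanOrEqualTo("executeAt", Timestamp.now())
                .get().get().getDocuments();
        for (QueryDocumentSnapshot event : due) {
            // execute the one-time action here (e.g. send the notification) ...
            event.getReference().update("done", true).get();
        }
    }
}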
You can use Cloud Scheduler on Google Cloud Platform. As is stated in the official documentation:
Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.
Here you can check a quickstart for using it with Pub/Sub and Cloud Functions.

Microservices architecture, need advice

We are working on a system that is supposed to 'run' jobs on distributed systems.
When jobs are accepted they need to go through a pipeline before they can be executed on the end system.
We've decided to go with a microservices architecture, but there is one thing that bothers me and I'm not sure what would be the best practice.
When a job is accepted it will first be persisted into a database, then - each micro-service in the pipeline will do some additional work to prepare the job for execution.
I want the persisted data to be updated at each such station in the pipeline to reflect the actual state of the job, or its status in the pipeline.
In addition, while a job is being executed on the end system - its status should also get updated.
What would be the best practice in terms of updating the database (the job's status) at each station:
Each such station (micro-service) in the pipeline accesses the database directly and updates the job's status
There is another micro-service that exposes the data (REST) and serves as DAL, each micro-service in the pipeline updates the job's status through this service
Other?....
Help/advice would be highly appreciated.
Thanks a lot!!
To add to what was said by @Anunay and @Mohamed Abdul Jawad
I'd consider writing the state from the units of work in your pipeline to a view (a table or cache, insert only). You can use messaging or simply insert a row into that view and have the readers of the state pick up the correct state based on some logic (date, state, or a composite key). As this view is not really owned by any domain service, it can be available to any readers (read-only) to consume...
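For example (a small sketch only, using Spring's JdbcTemplate and made-up table and column names), each station appends a row and readers pick the most recent one:

import org.springframework.jdbc.core.JdbcTemplate;

import java.time.Instant;

public class JobStatusView {

    private final JdbcTemplate jdbc;

    public JobStatusView(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Called by each station in the pipeline; never updates, only inserts.
    public void recordStatus(String jobId, String station, String status) {
        jdbc.update(
                "INSERT INTO job_status (job_id, station, status, updated_at) VALUES (?, ?, ?, ?)",
                jobId, station, status, java.sql.Timestamp.from(Instant.now()));
    }

    // Readers pick the latest state for a job based on the insert date.
    public String currentStatus(String jobId) {
        return jdbc.queryForObject(
                "SELECT status FROM job_status WHERE job_id = ? ORDER BY updated_at DESC LIMIT 1",
                String.class, jobId);
    }
}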
Also consider the Saga pattern:
A Saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request corresponding to the system operation, and then each subsequent step is triggered by the completion of the previous one.
http://microservices.io/patterns/data/saga.html
https://dzone.com/articles/saga-pattern-how-to-implement-business-transaction
https://medium.com/#tomasz_96685/saga-pattern-and-microservices-architecture-d4b46071afcf
If you would like to code the workflow:
Microservice A, which accepts the job and commands to update the job
Microservice B, which provides a read model for the job
Based on JobCreatedEvents, use some messaging queue, process and update the job through queue pipelines, and keep updating the JobStatus at every node in the pipeline.
I am assuming you know things about queues and consumers.
I am new to Camunda (a workflow engine) myself; it might be usable here, but I'm not completely sure.
Accessing a shared database between microservices is highly discouraged, as this would violate a basic rule of microservices architecture.
A microservice must be autonomous and keep its own logic and data.
Also, to achieve a good microservice design, you should loosely couple your microservices.
Multiple microservices accessing the database is not recommended. Here you have the case where each of the services needs to be triggered, then they update the data, and then somehow call the next service.
You really need a mechanism to orchestrate the services. A workflow engine might fit the bill.
I would however suggest an event-driven system. I might be going beyond what I can say with my limited knowledge of the data that you have. Have one service that gives you basic CRUD on the data and other services that have the logic to change the data (at this point I would like to ask why you want different services to change the state; if it's a business requirement, that's fine). Once the data is written, just create an event to which services can subscribe and react.
This will allow you to easily add more states to your pipeline in future.
You will need a service to manage the event queue.
As far as logging the state of the job is concerned, it can be done easily by logging the events.
If you opt for the workflow route, you may use Amazon SWF or Camunda; there are really quite a few options out there.
If going for the event route, you need to look into event-driven systems in microservices.

Parse Server with independent workers

Imagine we want to check, two weeks after a user's registration, whether she has been active, and otherwise notify her.
To achieve this we currently use the following setup (this runs on Heroku):
The parse server puts a task into the redis queue. The worker fetches tasks from that queue. Then it performs checks on the activity of the user. For this it needs to access the parse server to fetch that information. This puts additional load on our api.
I imagine the following scenario to be better:
I wonder: is it possible to achieve this scenario using parse server? (The worker dynos don't have a HTTP interface to run a parse server...)
