I am working in a microservices architecture. Every service has a few long-running tasks (data processing, report generation) that can take up to 1-2 hours. We are using Kafka for the queue.
How do I handle cases where a pod restart or a deployment happens just before the completion of a task? The task will start again and take that much time again. Is there any way to run these tasks independently of the application pod?
You can use Kubernetes Jobs for these types of tasks; as soon as the task is done, Kubernetes will also auto-delete the pod.
Jobs are also configurable and run standalone, so if you deploy the Job again it fetches the data from Kafka and a new Job will start.
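A minimal Job manifest might look like the sketch below (the name and image are placeholders, not from your setup):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generation        # hypothetical job name
spec:
  backoffLimit: 3                # retry the task up to 3 times on failure
  template:
    spec:
      restartPolicy: Never       # let the Job controller handle retries
      containers:
        - name: worker
          image: my-registry/report-worker:latest   # placeholder image
```

Setting ttlSecondsAfterFinished on the spec additionally lets Kubernetes clean up the finished Job object itself after a delay.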
I have a .NET 6 application. This application contains a BackgroundService that is used to handle background jobs. These jobs are fired at certain times based on different schedules.
I need to implement graceful shutdown when the application is stopping, because there might be some running jobs and I want to complete them before stopping the application.
The app is deployed in Azure Kubernetes Service. When running the command kubectl delete pod {podname} to delete a certain pod, I am not able to handle the SIGTERM and run my graceful-shutdown logic.
I am using IHostApplicationLifetime and registering for the ApplicationStopping event.
IHostApplicationLifetime _applicationLifetime;

// OnAppStopping should run when the host receives SIGTERM and begins shutting down
_applicationLifetime.ApplicationStopping.Register(OnAppStopping);
In the Dockerfile I added the line below:
STOPSIGNAL SIGTERM
The OnAppStopping handler is never fired on AKS.
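One common cause worth checking (an assumption, since the Dockerfile isn't shown above): if the container's entrypoint launches the app through a shell, PID 1 is the shell and the SIGTERM never reaches the .NET process, so STOPSIGNAL alone won't help. The exec form of ENTRYPOINT avoids this (MyApp.dll is a placeholder name):

```dockerfile
# exec form: the dotnet process runs as PID 1 and receives SIGTERM directly
ENTRYPOINT ["dotnet", "MyApp.dll"]

# shell form (avoid): PID 1 is /bin/sh, which does not forward SIGTERM
# ENTRYPOINT dotnet MyApp.dll
```

Also make sure the pod's terminationGracePeriodSeconds (30s by default) is long enough for your jobs to finish.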
I am exploring different strategies for handling shutdown gracefully in case of a deployment or crash. I am using the Spring Boot framework and Kubernetes. In a few of the services, we have tasks that can take around 10-20 minutes (data processing, large report generation). How do I handle pod termination in these cases when the task is taking more time? For queuing I am using Kafka.
we have tasks that can take around 10-20 minutes (data processing, large report generation)
First, this is more of a Job/Task than a microservice. But similar "rules" apply: the node where this job is executing might terminate for an upgrade or some other reason, so your Job/Task must be idempotent and able to be re-run if it crashes or is terminated.
How to handle pod termination in these cases when the task is taking more time. For queuing I am using Kafka.
Kafka is a good technology for this, because it lets the client Job/Task be idempotent. The job receives the data to process, and after processing it can "commit" that it has processed the data. If the Task/Job is terminated before it has processed the data, a new Task/Job will spawn and continue processing from the "offset" that has not yet been committed.
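The commit-after-processing idea can be sketched in plain Java. This is an in-memory stand-in for a Kafka partition (not the real client API, where you would set enable.auto.commit=false and call commitSync after the work succeeds); the class and method names are made up for illustration:

```java
import java.util.List;

public class AtLeastOnceDemo {
    // Committed offset: the position a restarted consumer resumes from.
    private long committedOffset = 0;

    /** Processes records starting at the committed offset; commits only
     *  after each record's work succeeds. crashAt simulates the pod being
     *  terminated mid-stream; pass -1 for an uninterrupted run. */
    public int run(List<String> partition, long crashAt) {
        int processed = 0;
        for (long offset = committedOffset; offset < partition.size(); offset++) {
            if (offset == crashAt) {
                return processed;             // terminated before committing
            }
            process(partition.get((int) offset));
            committedOffset = offset + 1;     // commit only after success
            processed++;
        }
        return processed;
    }

    private void process(String record) { /* idempotent work goes here */ }

    public long committedOffset() { return committedOffset; }
}
```

A consumer that "crashes" at offset 2 and is then restarted resumes from the last committed offset, so the first two records are not reprocessed but the interrupted one is, which is exactly why the work itself must be idempotent.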
I'm starting a project in Spring Batch; my plan is the following:
create a Spring Boot app
expose an API to submit a job (without executing it), which will return the job execution id so that other clients can track the progress of the job later
create a scheduler for running jobs - I want to have logic that will decide how many jobs I can run at any moment.
The issue is that my batch service may receive many requests for starting jobs, and I want to put the job execution in a pending status first; a scheduler will later check the jobs in pending status and, depending on my logic, decide whether it should run another set of jobs.
Is that possible to do in Spring Batch, or do I need to implement it from scratch?
The common way of addressing such a use case is to decouple job submission from job execution using a queue. This is described in detail in the Launching Batch Jobs through Messages section of the reference documentation. Your controller can accept job requests and put them in a queue. The scheduler can then control how many requests to read from the queue and launch jobs accordingly.
Spring Batch provides all the building blocks (JobLaunchRequest, JobLaunchingMessageHandler, etc.) to implement this pattern.
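Stripped of the Spring Batch classes, the shape of the pattern is simple: the controller only enqueues a request and returns an id, and the scheduler drains the queue up to a concurrency limit. A framework-free sketch (class and method names are made up for illustration; in real code the drained ids would be launched via a JobLauncher):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PendingJobQueue {
    private final BlockingQueue<Long> pending = new LinkedBlockingQueue<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Controller side: accept the request and return an execution id
     *  immediately; the job stays pending until the scheduler picks it up. */
    public long submit() {
        long id = nextId.getAndIncrement();
        pending.add(id);
        return id;
    }

    /** Scheduler side: take at most maxConcurrent pending jobs per tick. */
    public List<Long> drain(int maxConcurrent) {
        List<Long> launched = new ArrayList<>();
        pending.drainTo(launched, maxConcurrent);
        return launched;
    }
}
```

With Spring Batch the queue is typically a real message channel and the ids become JobLaunchRequest messages, but the submit/drain split is the same.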
I have some code that will schedule many jobs at different date-times, so overall I will have a lot of jobs to run at specific date-times. I know that Spring Scheduler will execute a job at some fixed period, but it does not schedule jobs dynamically. I can use ActiveMQ with timed delivery or Quartz for my purpose, but I am looking for a little guidance: shall I use Quartz, ActiveMQ timed/delayed delivery, or something else?
There is another alternative as well in an ExecutorService with timed execution, but if the application restarts then the job will be gone, I believe. Any help will be appreciated.
While you can schedule message delivery in ActiveMQ, it wasn't designed to be used as a job scheduler, whereas that's exactly what Quartz was designed for.
In one of your comments you talked about wanting a "scalable solution". ActiveMQ won't scale well with a huge number of scheduled jobs, because the more messages accumulate in the queues the worse it will perform, since it will ultimately have to page those messages to disk rather than keeping them in memory. ActiveMQ, like most message brokers, was meant to hold messages for a relatively short time before they are consumed. That is much different from a database, which is better suited to this use case. Quartz should scale better than ActiveMQ for a large number of jobs for this reason.
Also, the complexity of the jobs you can configure in Quartz is greater. If you go with ActiveMQ and eventually need more functionality than it supports, that complexity will be pushed down into your application code. However, there's a fair chance you could simply do what you want with Quartz, since it was designed as a job scheduler.
Lastly, a database is more straightforward to maintain than a message broker, in my opinion, and a database is also easy to provision with most cloud providers. I'd recommend you go with Quartz.
You can start by using a cron expression in order to cover the case where your application restarts. The cron expression can be stored in a properties file. Also, while your application is running, you can restart or reschedule your job programmatically by creating a new job instance with another cron expression, for example.
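The reschedule-by-replacing idea can be sketched with the JDK's ScheduledExecutorService (class and method names here are made up for illustration). Note this is in-memory only, so unlike Quartz with a JDBC job store it does not survive a restart, which is why the cron expression also needs to live in the properties file:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ReschedulableJob {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> current;

    /** Cancels any existing schedule and installs a new one. */
    public synchronized void reschedule(Runnable task, long delayMillis) {
        if (current != null) {
            current.cancel(false);   // drop the old schedule; running work finishes
        }
        current = executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    public synchronized boolean isScheduled() {
        return current != null && !current.isDone() && !current.isCancelled();
    }

    public void shutdown() { executor.shutdownNow(); }
}
```

Quartz's equivalent is rescheduleJob with a new CronTrigger, with the advantage that a persistent job store keeps the schedule across restarts.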
I haven't actually used Resque yet. I have the following questions and assumptions that I'd like verified:
1) I understand that you can have a multiserver architecture by configuring each of your resque instances to point to a central redis server. Correct?
2) Does this mean that any resque instance can add items to a queue and any workers can work on any of those queues?
3) Can multiple workers respond to the same item in a queue? I.e. one server puts "item 2 has been updated" in a queue; can workers 1, 2, and 3, on different servers, all act on that? Or would I need to create separate queues? I kind of want pub/sub-type tasks.
4) Does the Sinatra monitor app live on each instance of Resque? Or is there just one app that knows about all the queues and workers?
5) Does Resque know when a task is completed? I.e. does the monitor app show that a task is in process? Or just that a worker took it?
6) Does the monitor app show completed tasks? I.e. if a task complete quickly will I be able to see that at some point in the recent past that task was completed?
7) Can I programmatically query whether a task has been started, is in progress, or is completed?
As I am using Resque extensively in our project, here are a few answers to your queries:
I understand that you can have a multi-server architecture by configuring each of your resque instances to point to a central redis server. Correct?
Ans: Yes, you can have multiple servers pointing to a single Redis server. I am running a similar architecture.
Does this mean that any resque instance can add items to a queue and any workers can work on any of those queues?
Ans: This depends on how you configure your servers: you have to create queues and then assign workers to them.
You can have multiple queues, and each queue can have multiple workers working on it.
Can multiple workers respond to the same item in a queue? I.e. one server puts "item 2 has been updated" in a queue, can workers 1, 2, and 3, on different servers, all act on that? Or would I need to create separate queues? I kind of want a pub/sub types tasks.
Ans: This again is based on your requirements: if you want to have a single queue with all workers working on it, that is also valid,
or if you want a separate queue for each server, you can do that as well.
Any server can put jobs in any queue, but each job is picked up and worked on by only one of the assigned workers, so for pub/sub-style fan-out you would enqueue one job per queue.
Does the Sinatra monitor app live on each instance of Resque? Or is there just one app that knows about all the queues and workers?
Ans: The Sinatra monitor app (resque-web) is a single app that points at the same Redis server, so one instance can see everything. It gives you an interface where you can see all workers/queues-related info, such as running jobs, waiting jobs, queues, failed jobs, etc.
Does Resque know when a task is completed? I.e. does the monitor app show that a task is in process? Or just that a worker took it?
Ans: It does; Resque maintains internal queues in Redis to manage all the jobs, and yes, the monitor app shows stats about the job while it is in process.
Does the monitor app show completed tasks? I.e. if a task complete quickly will I be able to see that at some point in the recent past that task was completed?
Ans: Yes, you can.
For example, to know which workers are currently working, use Resque.working; similarly, you can check the Resque code base and use anything it exposes (e.g. Resque.info or Resque.queues).
Resque is a very powerful library; we have been using it for more than a year now.
Cheers, happy coding!