Spring Batch or JMS for long running jobs - spring

I have the problem that I have to run very long running processes on my Webservice and now I'm looking for a good way to handle the result. The scenario : A user executes such a long running process via UI. Now he gets the message that his request was accepted and that he should return some time later. So there's no need to display him the status of his request or something like this. I'm just looking for a way to handle the result of the long running process properly. Since the processes are external programms, my application server is not aware of them. Therefore I have to wait for these programms to terminate. Of course I don't want to use EJBs for this because then they would block for the time no result is available. Instead I thought of using JMS or Spring Batch. Does anyone ever had the same problem or an advice which solution would be better?

It really depends on what forms of communication your external programs have available. JMS is a very good approach and immediately available in your app server but might not be the best option if your external program is a long running DB query which dumps the result in a text file...
The main advantage of Spring Batch over "just" using JMS as an aynchronous communcations channel is the transactional properties, allowing the infrastructure to retry failed jobs, group jobs together and such. Without knowing more about your specific setup, it is hard to give detailed advise.
Cheers,

I had a similar design requirement, users were sending XML files and I had to generate documents from them. Using JMS in this case is advantageous since you can always add new instances of these processes which can consume and execute the jobs in parallel.
You can use a timer task to check status or monitor these processes. Also, you can publish a message to a JMS queue once the processes are completed.

Related

How to know the the running status of a spring integration flow

I have a simple integration flow that poll data based on a cron job from database, publish on a DirectChannel, then do split and transformations, and publish on another executor service channel, do some operations and finally publish to an output channel, its written using dsl style.
Also, I have an endpoint where I might receive an http request to trigger this flow, at this point I send the messages one of the mentioned channels to trigger the flow.
I want to make sure that the manual trigger doesn’t happen if the flow is already running due to either the cron job or another request.
I have used the isRunning method of the StandardIntegrationFlow, but it seems that it’s not thread safe.
I also tried using .wireTap(myService) and .handle(myService) where this service has an atomicBoolean flag but it got set per every message, which is not a solution.
I want to know if the flow is running without much intervention from my side, and if this is not supported how can I apply the atomic boolean logic on the overall flow and not on every message.
How can I simulate the racing condition in a test in order to make sure my implementation prevent this?
The IntegrationFlow is just a logical container for configuration phase. It does have those lifecycle methods, but only for an internal framework logic. Even if they are there, they don't help because endpoints are always running if you want to do them something by some event or input message.
It is hard to control all of that since it is in an async state as you explain. Even if we can stop a SourcePollingChannelAdapter in the beginning of that flow to let your manual call do do something, it doesn't mean that messages in other threads are not in process any more. The AtomicBoolean cannot help here for the same reason: even if you set it to true in the MessageSourceMutator.beforeReceive() and reset back to false in its afterReceive() when message is null, it still doesn't mean that messages you pushed down in other thread are already processed.
You might consider to use an aggregator for AtomicBoolean resetting in the end of batch since you mention that you pull data from DB, so perhaps there is a number of records per poll you can track downstream. This way your manual call could be skipped until aggregator collects results for that batch.
You also need to think about stopping a SourcePollingChannelAdapter at the moment when manual action is permitted, so there won't be any further race conditions with the cron.

Service synchronization issue

I've created two services.
One of them (scheduler) only requests to the other (backoffice) for performing some "large" operations.
When backoffice receives a request:
first creates a mark (key on redis) in order to set that the process has started.
Each time a request is reached:
backoffice checks if the mark exist.
When it exists means that the previous process has not yet finished, and escape it.
Perform the large process.
When process is finished, the previous key in redis is removed.
It would be something like this:
if (key exists)
return;
make long process... (1);
remove key;
The problem arises when service is destroyed when the process has not already finished and then it doesn't removes the mark on redis. It means the process will never run again.
Is there any way to solve this kind of problems?
The way to solve this problem is use an existing engine as building custom scalable and robust solution for reliable service orchestration is really hard.
I recommend looking at Uber Cadence Workflow which would allow to convert your pseudocode into a real production application with minor changes.
You can fire a background job that updates timestamp under the key, e.g. every minute.
When service attempts to start the process it must verify key existence (as it does now) + timestamp under the key. If it is more than 1 minute ago then the previous attempt is stale and you can start over.
Sounds like you should be using a messaging queue to schedule tasks for the back office service. Queuing solutions like RabbitMQ allow you to manually acknowledge (or “ack”) that the process is complete. Whenever a subscriber crashes, the queue detects that the connection dropped without acknowledgement and will re-enqueue the same task which will be picked up by the next available subscriber. Here’s another thread talking about this problem specifically focused on messaging queues:
What happens to fetched messages when RabbitMQ consumer crashes?

Spring Batch Flow Job - control number of jobs running simultaneously

I have a complex long-running flow that I'm going to implement based on Spring Batch Flow Job.
My REST API will wait for the incoming request and then (based on each request) initiate a new job execution.
Right now I'm worried about the server resources because the number of incoming requests is a quite big and I'd like to control the number of jobs running simultaneously. Is there any way to tell Spring Batch to run simultaneously not more than the exact number of jobs(let's say 5) and put rest of the jobs into the queue in order to be executed later, when for example one of these previous 5 jobs will be finished?
There is not a way to accomplish this in Spring Batch. The reason for this is that the number of concurrent jobs is really an orchestration problem which Spring Batch specifically avoids solving (allowing you to integrate with whatever you want).
That being said, the ability to control what you're describing can be done in a relatively straight forward manor by implementing a work queue that stores the requests to run a job, and having a service picking up those requests at the other end. The concurrency can be controlled easily with Spring Integration components to prevent the system from being overloaded (assuming you have a mechanism to handle the queue size in question).

Golang background processing

How can one do background processing/queueing in Go?
For instance, a user signs up, and you send them a confirmation email - you want to send the confirmation email in the background as it may be slow, and the mail server may be down etc etc.
In Ruby a very nice solution is DelayedJob, which queues your job to a relational database (i.e. simple and reliable), and then uses background workers to run the tasks, and retries if the job fails.
I am looking for a simple and reliable solution, not something low level if possible.
While you could just open a goroutine and do every async task you want, this is not a great solution if you want reliability, i.e. the promise that if you trigger a task it will get done.
If you really need this to be production grade, opt for a distributed work queue. I don't know of any such queues that are specific to golang, but you can work with rabbitmq, beanstalk, redis or similar queuing engines to offload such tasks from your process and add fault tolerance and queue persistence.
A simple Goroutine can make the job:
http://golang.org/doc/effective_go.html#goroutines
Open a gorutine with the email delivery and then answer to the HTTP request or whatever
If you wish use a workqueue you can use Rabbitmq or Beanstalk client like:
https://github.com/streadway/amqp
https://github.com/kr/beanstalk
Or maybe you can create a queue in you process with a FIFO queue running in a goroutine
https://github.com/iNamik/go_container
But maybe the best solution is this job queue library, with this library you can set the concurrency limit, etc:
https://github.com/otium/queue
import "github.com/otium/queue"
q := queue.NewQueue(func(email string) {
//Your mail delivery code
}, 20)
q.Push("foo#bar.com")
I have created a library for running asynchronous tasks using a message queue (currently RabbitMQ and Memcache are supported brokers but other brokers like Redis or Cassandra could easily be added).
You can take a look. It might be good enough for your use case (and it also supports chaining and workflows).
https://github.com/RichardKnop/machinery
It is an early stage project though.
You can also use goworker library to schedule jobs.
http://www.goworker.org/
If you are coming from Ruby background and looking for something like Sidekiq, Resque, or DelayedJob, please check out the library asynq.
Queue semantics are very similar to sidekiq.
https://github.com/hibiken/asynq
If you want a library with a very simple interface, yet robust that feels Go-like, uses Redis as Backend and RabbitMQ as message broker, you can try
https://github.com/Joker666/cogman

scheduling jobs using spring batch or just Quartz scheduler

I am looking for best solution to create a java web application to generate reports in excel/PDf format. some thing similar to Google Adwords, where user can create schedule reports and download it when the report is generated at a later time.
I am thinking to develop and java application where User logs, selects a pre defined report and provides the input parameters (like report date etc), This request will be queued up or saved as Quarts Job(prefer persistent Queue). A Job will be monitoring the queue/job and execute the job, generate the report(output excel /pdf) and stored in disk.
When the user refresh the screen or logs back at a later time, the report should be available for down load.
Using Spring batch and Quartz scheduler can I do this ? I also expecting like Spring admin , where I can see number of request in Queue(jobs queued up), and stop the queue processing etc.
You would use spring-batch if you wanted to process all report requests at the same time, perhaps at night when your servers are not otherwise occupied processing real-time user requests (or even during the day during slow periods).
You would use a quartz job if you wanted to check for new jobs every few seconds/minutes/hours/etc, and process one/many of them at that specified time interval.
So, quartz is a scheduler and batch is a process. You could use quartz to schedule batch jobs to run at specific times. They aren't competing technologies, they are complimentary.
About your question:
Given that you talk about queues and their persistence however it sounds a lot like your problem would fit into a simple jms model. You would need some messaging software. If you want to make it easy on yourself I'd recommend using spring-jms as a wrapper around the basic Java EE JMS api -- the spring wrappers are simply simpler than basic jms. For a messaging service I'd look at RabbitMQ, because again it's pretty simple.
With the jms architecture you'd post user requests to the queue, which you'd configured to be persistent. You'd have a custom listener on the queue, passing requests to a report generator whenever it runs. You can assign one or more threads to the listener, meaning that you should find it easy to tune the performance of the report generator.
There is a pretty useful DZone article about using rabbitmq via spring-integration (a set of prebuilt pattern implementations that help with connecting things to each other).

Resources