I have a complex long-running flow that I'm going to implement based on Spring Batch Flow Job.
My REST API will wait for the incoming request and then (based on each request) initiate a new job execution.
Right now I'm worried about server resources because the number of incoming requests is quite large, and I'd like to control the number of jobs running simultaneously. Is there any way to tell Spring Batch to run no more than a fixed number of jobs at once (let's say 5) and put the rest of the jobs into a queue to be executed later, for example when one of the 5 running jobs finishes?
There is not a way to accomplish this in Spring Batch. The reason for this is that the number of concurrent jobs is really an orchestration problem which Spring Batch specifically avoids solving (allowing you to integrate with whatever you want).
That being said, what you're describing can be done in a relatively straightforward manner by implementing a work queue that stores the requests to run a job, with a service picking up those requests at the other end. Concurrency can be controlled easily with Spring Integration components to prevent the system from being overloaded (assuming you have a mechanism to handle the queue size in question).
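As a rough illustration of the work-queue idea, here is a minimal sketch that uses plain `java.util.concurrent` rather than Spring Integration. The class name `BoundedJobQueue` and its methods are hypothetical, and the `Runnable` stands in for whatever actually launches the Spring Batch job. A fixed-size thread pool backed by an unbounded queue runs at most `maxConcurrent` jobs and holds the rest until a slot frees up.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs at most maxConcurrent jobs, queues the rest in FIFO order. */
class BoundedJobQueue {
    private final ThreadPoolExecutor executor;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedJobQueue(int maxConcurrent) {
        // core size == max size plus an unbounded queue:
        // submissions beyond maxConcurrent wait in the queue instead of starting.
        this.executor = new ThreadPoolExecutor(
                maxConcurrent, maxConcurrent, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    /** Enqueue a job; it starts as soon as one of the worker threads is free. */
    void submit(Runnable job) {
        executor.execute(() -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // track the highest concurrency seen
            try {
                job.run(); // assumption: in a real app this would launch the batch job
            } finally {
                running.decrementAndGet();
            }
        });
    }

    /** Highest number of jobs observed running at the same time. */
    int peakConcurrency() {
        return peak.get();
    }

    /** Stop accepting work and wait for queued jobs; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutSeconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note that this in-memory queue is lost on restart; if requests must survive a crash, a persistent queue (a message broker or a database table) would be needed instead.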
Related
I have implemented a Spring Batch remote partitioning setup. I don't want my master step to wait for acknowledgment from the slave steps; I want the master step to complete as soon as it has partitioned the data.
Is there any configuration for this in Spring Batch?
If the manager should not wait for workers, what you are describing is not a manager/worker configuration anymore. In a manager/worker setup, the manager divides the work among workers and waits for them to finish (in Spring Batch, you can configure the manager to wait in two different ways: poll the job repository for worker statuses, or gather replies from workers up to a given timeout).
I don't see the rationale behind this "fire-and-forget" approach (who would monitor the status of workers and drive the process accordingly?), and remote partitioning is definitely not suitable for implementing this pattern (at least in my opinion). If you really want to (ab)use remote partitioning for that, you can register a custom PartitionHandler that does not wait for workers (i.e. remove this section).
I'm starting a project in Spring Batch. My plan is the following:
create a Spring Boot app
expose an API to submit a job (without executing it) that returns the job execution id, so that other clients can track the progress of the job later
create a scheduler for running jobs; I want logic that decides how many jobs I can run at any moment
The issue is that my batch service may receive many requests to start jobs, and I want to put each job execution in a pending status first. A scheduler will later check the jobs in pending status and, depending on my logic, decide whether it should run another set of jobs.
Is that possible in Spring Batch, or do I need to implement it from scratch?
The common way of addressing such a use case is to decouple job submission from job execution using a queue. This is described in detail in the Launching Batch Jobs through Messages section. Your controller can accept job requests and put them in a queue. The scheduler can then control how many requests to read from the queue and launch jobs accordingly.
Spring Batch provides all building blocks (JobLaunchRequest, JobLaunchingMessageHandler, etc) to implement this pattern.
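To make the split between submission and execution concrete, here is a minimal sketch in plain Java rather than with the Spring Batch building blocks named above. Everything in it is hypothetical: the `PendingJobDispatcher` class, its `JobRequest` type, and the tracking id, which is a simple counter and not a Spring Batch job execution id. The controller would call `submit`, and the scheduler would call `dispatch` with however many slots its own logic allows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical dispatcher: requests stay pending until the scheduler releases them. */
class PendingJobDispatcher {

    /** Hypothetical request: a tracking id handed back to the client plus the job name. */
    static final class JobRequest {
        final long id;
        final String jobName;

        JobRequest(long id, String jobName) {
            this.id = id;
            this.jobName = jobName;
        }
    }

    private final Queue<JobRequest> pending = new ConcurrentLinkedQueue<>();
    private final AtomicLong ids = new AtomicLong();
    private final Consumer<JobRequest> launcher;

    /** The launcher callback is where a real application would start the batch job. */
    PendingJobDispatcher(Consumer<JobRequest> launcher) {
        this.launcher = launcher;
    }

    /** Called by the REST controller: only enqueues, then returns a tracking id. */
    long submit(String jobName) {
        long id = ids.incrementAndGet();
        pending.add(new JobRequest(id, jobName));
        return id;
    }

    /** Called by the scheduler: launches at most `capacity` pending requests, oldest first. */
    List<Long> dispatch(int capacity) {
        List<Long> launched = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            JobRequest next = pending.poll();
            if (next == null) {
                break; // nothing left to launch
            }
            launcher.accept(next);
            launched.add(next.id);
        }
        return launched;
    }

    /** Number of requests still waiting to be launched. */
    int pendingCount() {
        return pending.size();
    }
}
```

A scheduled task could then compute the free capacity (for example, a configured limit minus the number of currently running executions) and pass it to `dispatch`.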
Is there an option available in Mule for specifying the time at which JDBC polling begins (not the pollingFrequency)? My scenario is that we are going to deploy the application on two nodes, so we want to specify the start time for polling so that there will be no duplicate processing.
Instead of playing with delay, I would use a Quartz endpoint poller with different Cron expressions for my 2 nodes (externally configured in a properties file). That way, the polling times will be strictly configured.
I am looking for the best solution for creating a Java web application that generates reports in Excel/PDF format, something similar to Google AdWords, where a user can schedule reports and download them once they have been generated.
I am thinking of developing a Java application where the user logs in, selects a predefined report and provides the input parameters (like report date etc.). This request will be queued up or saved as a Quartz job (I prefer a persistent queue). A job will monitor the queue, execute the request, generate the report (Excel/PDF output) and store it on disk.
When the user refreshes the screen or logs back in at a later time, the report should be available for download.
Can I do this using Spring Batch and the Quartz scheduler? I am also expecting something like Spring admin, where I can see the number of requests in the queue (jobs queued up), stop the queue processing, etc.
You would use spring-batch if you wanted to process all report requests at the same time, perhaps at night when your servers are not otherwise occupied processing real-time user requests (or even during the day during slow periods).
You would use a quartz job if you wanted to check for new jobs every few seconds/minutes/hours/etc, and process one/many of them at that specified time interval.
So, Quartz is a scheduler and batch is a process. You could use Quartz to schedule batch jobs to run at specific times. They aren't competing technologies; they are complementary.
About your question:
Given that you talk about queues and their persistence, however, it sounds a lot like your problem would fit into a simple JMS model. You would need some messaging software. If you want to make it easy on yourself, I'd recommend using spring-jms as a wrapper around the basic Java EE JMS API; the Spring wrappers are simply simpler than basic JMS. For a messaging service I'd look at RabbitMQ, because again it's pretty simple.
With the JMS architecture you'd post user requests to the queue, which you'd configure to be persistent. You'd have a custom listener on the queue, passing requests to a report generator whenever it runs. You can assign one or more threads to the listener, meaning that you should find it easy to tune the performance of the report generator.
There is a pretty useful DZone article about using rabbitmq via spring-integration (a set of prebuilt pattern implementations that help with connecting things to each other).
I have to run very long-running processes on my web service, and I'm looking for a good way to handle the result. The scenario: a user starts such a long-running process via the UI. He then gets a message that his request was accepted and that he should come back some time later, so there's no need to display the status of his request or anything like that. I'm just looking for a way to handle the result of the long-running process properly. Since the processes are external programs, my application server is not aware of them, so I have to wait for these programs to terminate. Of course I don't want to use EJBs for this, because they would block for as long as no result is available. Instead I thought of using JMS or Spring Batch. Has anyone had the same problem, or any advice on which solution would be better?
It really depends on what forms of communication your external programs have available. JMS is a very good approach and immediately available in your app server but might not be the best option if your external program is a long running DB query which dumps the result in a text file...
The main advantage of Spring Batch over "just" using JMS as an asynchronous communications channel is the transactional properties, allowing the infrastructure to retry failed jobs, group jobs together and such. Without knowing more about your specific setup, it is hard to give detailed advice.
Cheers,
I had a similar design requirement: users were sending XML files and I had to generate documents from them. Using JMS in this case is advantageous, since you can always add new instances of these processes, which can consume and execute the jobs in parallel.
You can use a timer task to check status or monitor these processes. Also, you can publish a message to a JMS queue once the processes are completed.
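As an alternative to polling with a timer task, the JDK itself can report when an external program exits. The sketch below assumes Java 9 or later (for `Process.onExit()` and `ProcessBuilder.Redirect.DISCARD`); the class name `ExternalProcessWatcher` is hypothetical. It starts the command and returns a future that completes with the exit code, so no application thread has to block while the program runs. The completion callback is the place where a message could be published to a JMS queue.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper: start an external program and get notified when it exits. */
class ExternalProcessWatcher {

    /** Starts the command and returns a future that completes with its exit code. */
    static CompletableFuture<Integer> runAsync(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)                    // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid filling pipe buffers
                    .start();
            // onExit() completes when the process terminates; nothing blocks here.
            return process.onExit().thenApply(Process::exitValue);
        } catch (IOException e) {
            // The program could not be started at all: report that through the future.
            CompletableFuture<Integer> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }
}
```

A caller might attach a callback such as `runAsync(cmd).thenAccept(code -> ...)` and publish the result from there; what to publish and where is left to the application.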