Spring Batch - how to execute a specific step - spring

I've developed a Spring Batch with 10 Step.
How can i execute a specific Step ?
My wish is to passed in JobParameter the Step to execute.
Only the Step specified in JobParameter must be executed.
I searched to use the Decider, but i'm not really satisfied.
There's a better Solution ?
Any idea ?
Thanks a lot.
cordially

The Decider is the correct option for the type of processing you're talking about. I'd be interested in why you were "not really satisfied" by that option?

I had similar use case. I wanted to pass job name and step name as the input and expectation was to execute only that particular step from the job.
I created a Rest API which accepts job name and step name as URL parameters.
Below is the code inside the Rest API. It requires few objects injected into the class.
jobRegistry - instance of MapJobRegistry
jobRepository - instance of jobRepository
jobLauncher - instance of JobLauncher
Job job = null;
Step step = null;
try{
Job job = jobRegistry.getJob(jobName);
if(job instanceof StepLocator){
Step = ((StepLocator)job).getStep(stepName);
}
}catch(NoSuchJobException ex){
throw new Exception("Invalid Job", ex);
}catch(NoSuchStepException ex){
throw new Exception("Invalid Step", ex);
}
if(null == step){
throw new Exception("invalid step");
}
JobBuilderFactory jobBuilder = new JobBuiderFactory(jobRepository);
Job newJob = jobBuilder.get("specific-step-job")
.start(step)
.build();
jobLauncher.run(newJob, jobParameters); //jobParameters relevant for this job
This will dynamically create a new job with name "specific-step-job" and add the specific step instance as the start/only step inside the new Job and executes this job.

Yes, you can test an individual step using JobLauncherTestUtils#launchStep.
Please have a look at section 10.3. Testing Individual Steps
Find a sample code here Spring Batch unit test example

Related

Spring batch won't run job twice

I developed two different batches, let's say they're named as batch1 and batch2.
In an integration test, I'm trying to prove that the execution of twice of the same batch (batch1 then batch1 for instance) isn't duplicating data saved in database in the writer.
The issue I face is that the first batch is running successfully, but the second isn't doing anything (reader, processor, writer are not called). However, the batch status is marked as COMPLETED.
Here's the code:
#Test
void testBatchWorksIfJobIsRanTwice()
throws JobInstanceAlreadyCompleteException, JobExecutionAlreadyRunningException, JobParametersInvalidException, JobRestartException {
// given
createData();
// when
JobExecution firstJobExecution = runJobAndGetExecution("batch1_id1", jobRepository, job);
assertEquals(BatchStatus.COMPLETED, firstJobExecution.getStatus());
createMoreData();
JobExecution secondJobExecution = runJobAndGetExecution("batch1_id2", jobRepository, job);
assertEquals(BatchStatus.COMPLETED, secondJobExecution.getStatus());
// then
// assertions on data
}
public static JobExecution runJobAndGetExecution(String jobId, JobRepository jobRepository, Job job)
throws JobInstanceAlreadyCompleteException, JobExecutionAlreadyRunningException, JobParametersInvalidException, JobRestartException {
JobParameters param = new JobParametersBuilder()
.addString("JobID", jobId)
.addLong("currentTime", System.currentTimeMillis())
.toJobParameters();
SimpleJobLauncher launcher = new SimpleJobLauncher();
launcher.setJobRepository(jobRepository);
launcher.setTaskExecutor(new SyncTaskExecutor());
return launcher.run(job, param);
}
Please note that I pass a timestamp param in the job parameters to ensure that params are different.
Please also note that running the two different batches one after the other (batch1 and then batch2 is working fine, but batch1 then batch1 or batch2 then batch2 is not)
Any idea why it seems to run ? (I put some breakpoints and everything seems to happen correctly, just the reader, processer and writer are not called).
I resolved it. It was a misconception of the reader I implemented that was never reinitializing data, because I missunderstood the way reader works. The reader is called only once even if we're running the job 10 times.
The comment from Mahmoud Ben Hassine on the post is complety right

Monitor the progress of a spring batch job

I am writing various jobs using spring batch with java configuration.
I need to get the current state of the job
e.g.
which steps are currently running (I may have multiple steps running at the same time)
Which steps failed (the status and exit code)
etc.
The only examples I see online are of XML based spring batch and I want to use java config only.
Thanks.
Another option is to use JobExplorer
Entry point for browsing executions of running or historical jobs and steps. Since the data may be re-hydrated from persistent storage, it may not contain volatile fields that would have been present when the execution was active.
List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
for (JobExecution jobExecution : jobExecutions) {
jobExecution.getStepExecutions();
//read step info
}
And for create jobExplorer you have to use the factory:
import org.springframework.batch.core.explore.support.JobExplorerFactoryBean;
JobExplorerFactoryBean factory = new JobExplorerFactoryBean();
factory.setDataSource(dataSource);
factory.getObject();
I use these two queries on spring batch meta data tables to know about job progress and step details.
SELECT * FROM BATCH_JOB_EXECUTION ORDER BY START_TIME DESC;
SELECT * FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID=? ORDER BY STATUS;
With First query, I first find JOB_EXECUTION_ID corresponding to my job execution then use that id in second query to find details about specific steps.
Additionally, your config choice ( Java or XML ) has nothing to do with Spring Batch meta data. If you are persisting data then it doesn't matter if its XML config or Java Config.
For Java based monitoring- you can use JobExplorer & JobRepository beans to query jobs etc.
e.g. List<JobInstance> from jobExplorer.getJobInstances & jobExplorer.getJobExecutions(jobInstance) etc.
From JobExecutions you can get StepExecutions and so on.
You might have to set JobRegistryBeanPostProcessor bean like below for JobExplorer & JobRepository to work properly.
#Bean
public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor(
JobRegistry jobRegistry) {
JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor = new JobRegistryBeanPostProcessor();
jobRegistryBeanPostProcessor.setJobRegistry(jobRegistry);
return jobRegistryBeanPostProcessor;
}

Spring batch repeat step ending up in never ending loop

I have a spring batch job that I'd like to do the following...
Step 1 -
Tasklet - Create a list of dates, store the list of dates in the job execution context.
Step 2 -
JDBC Item Reader - Get list of dates from job execution context.
Get element(0) in dates list. Use is as input for jdbc query.
Store element(0) date is job execution context
Remove element(0) date from list of dates
Store element(0) date in job execution context
Flat File Item Writer - Get element(0) date from job execution context and use for file name.
Then using a job listener repeat step 2 until no remaining dates in the list of dates.
I've created the job and it works okay for the first execution of step 2. But step 2 is not repeating as I want it to. I know this because when I debug through my code it only breaks for the initial run of step 2.
It does however continue to give me messages like below as if it is running step 2 even when I know it is not.
2016-08-10 22:20:57.842 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Duplicate step [readStgDbAndExportMasterListStep] detected in execution of job=[exportMasterListCsv]. If either step fails, both will be executed again on restart.
2016-08-10 22:20:57.846 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [readStgDbAndExportMasterListStep]
This ends up in a never ending loop.
Could someone help me figure out or give a suggestion as to why my stpe 2 is only running once?
thanks in advance
I've added two links to PasteBin for my code so as not to pollute this post.
http://pastebin.com/QhExNikm (Job Config)
http://pastebin.com/sscKKWRk (Common Job Config)
http://pastebin.com/Nn74zTpS (Step execution listener)
From your question and your code I deduct that based on the amount of dates that you retrieve (this happens before the actual job starts), you will execute a step for the amount of times you have dates.
I suggest a design change. Create a java class that will get you the dates as a list and based on that list you will dynamically create your steps. Something like this:
#EnableBatchProcessing
public class JobConfig {
#Autowired
private JobBuilderFactory jobBuilderFactory;
#Autowired
private StepBuilderFactory stepBuilderFactory;
#Autowired
private JobDatesCreator jobDatesCreator;
#Bean
public Job executeMyJob() {
List<Step> steps = new ArrayList<Step>();
for (String date : jobDatesCreator.getDates()) {
steps.add(createStep(date));
}
return jobBuilderFactory.get("executeMyJob")
.start(createParallelFlow(steps))
.end()
.build();
}
private Step createStep(String date){
return stepBuilderFactory.get("readStgDbAndExportMasterListStep" + date)
.chunk(your_chunksize)
.reader(your_reader)
.processor(your_processor)
.writer(your_writer)
.build();
}
private Flow createParallelFlow(List<Step> steps) {
SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
// max multithreading = -1, no multithreading = 1, smart size = steps.size()
taskExecutor.setConcurrencyLimit(1);
List<Flow> flows = steps.stream()
.map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
.collect(Collectors.toList());
return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
.split(taskExecutor)
.add(flows.toArray(new Flow[flows.size()]))
.build();
}
}
EDIT: added "jobParameter" input (slightly different approach also)
Somewhere on your classpath add the following example .properties file:
sql.statement="select * from awesome"
and add the following annotation to your JobDatesCreator class
#PropertySource("classpath:example.properties")
You can provide specific sql statements as a command line argument as well. From the spring documentation:
you can launch with a specific command line switch (e.g. java -jar
app.jar --name="Spring").
For more info on that see http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html
The class that gets your dates (why use a tasklet for this?):
#PropertySource("classpath:example.properties")
public class JobDatesCreator {
#Value("${sql.statement}")
private String sqlStatement;
#Autowired
private CommonExportFromStagingDbJobConfig commonJobConfig;
private List<String> dates;
#PostConstruct
private void init(){
// Execute your logic here for getting the data you need.
JdbcTemplate jdbcTemplate = new JdbcTemplate(commonJobConfig.onlineStagingDb);
// acces to your sql statement provided in a property file or as a command line argument
System.out.println("This is the sql statement I provided in my external property: " + sqlStatement);
// for now..
dates = new ArrayList<>();
dates.add("date 1");
dates.add("date 2");
}
public List<String> getDates() {
return dates;
}
public void setDates(List<String> dates) {
this.dates = dates;
}
}
I also noticed that you have alot of duplicate code that you can quite easily refactor. Now for each writer you have something like this:
#Bean
public FlatFileItemWriter<MasterList> division10MasterListFileWriter() {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(outDir, MerchHierarchyConstants.DIVISION_NO_10 )));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
Consider using something like this instead:
public FlatFileItemWriter<MasterList> divisionMasterListFileWriter(String divisionNumber) {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(outDir, divisionNumber )));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
As not all code is available to correctly replicate your issue, this answer is a suggestion/indication to solve your problem.
Based on our discussion on Spring batch execute dynamically generated steps in a tasklet I'm trying to answer the questions on how to access jobParameter before the job is actually being executed.
I assume that there is restcall which will execute the batch. In general, this will require the following steps to be taken.
1. a piece of code that receives the rest call with its parameters
2. creation of a new springcontext (there are ways to reuse an existing context and launch the job again but there are some issues when it comes to reuse of steps, readers and writers)
3. launch the job
The simplest solution would be to store the jobparameter received from the service as an system-property and then access this property when you build up the job in step 3. But this could lead to a problem if more than one user starts the job at the same moment.
There are other ways to pass parameters into the springcontext, when it is loaded. But that depends on the way you setup your context.
For instance, if you are using SpringBoot directly for step 2, you could write a method like:
private int startJob(Properties jobParamsAsProps) {
SpringApplication springApp = new SpringApplication(.. my config classes ..);
springApp.setDefaultProperties(jobParamsAsProps);
ConfigurableApplicationContext context = springApp.run();
ExitCodeGenerator exitCodeGen = context.getBean(ExitCodeGenerator.class);
int code = exitCodeGen.getExitCode();
context.close();
return cod;
}
This way, you could access the properties as normal with standard Value- or ConfigurationProperties Annotations.

How Does Spring Batch Step Scope Work

I have a requirement where I need to process files based on the rest call in which I get the name of the file, I am adding it to the job parameter and using it while creating the beans.
I am creating step scope Beans for (reader,writer) and using the job parameter.I am starting the job in a new thread as I am using asynchronus task exceutor to launch the job and my question is how will the beans be created by spring when we define #StepScope
jobParametersBuilder.addString("fileName", request.getFileName());
jobExecution = jobLauncher.run(job, jobParametersBuilder.toJobParameters());
#Bean
public JobLauncher jobLauncher() {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(jobRepository());
jobLauncher.setTaskExecutor(asyncTaskExecutor());
return jobLauncher;
}
#Bean
#StepScope
public ItemWriter<Object> writer(#Value ("#{jobParameters['fileName']}"String fileName) {
JdbcBatchItemWriter<Object> writer = new JdbcBatchItemWriter<>();
writer.setItemSqlParameterSourceProvider(
new BeanPropertyItemSqlParameterSourceProvider<Object>());
writer.setSql(queryCollection.getquery());
writer.setDataSource(dataSource(fileName));
return writer;
}
A spring batch StepScope object is one which is unique to a specific step and not a singleton. As you probably know, the default bean scope in Spring is a singleton. But by specifying a spring batch component being StepScope means that Spring Batch will use the spring container to instantiate a new instance of that component for each step execution.
This is often useful for doing parameter late binding where a parameter may be specified either at the StepContext or the JobExecutionContext level and needs to be substituted for a placeholder, much like your example with the filename requirement.
Another useful reason to use StepScope is when you decide to reuse the same component in parallel steps. If the component manages any internal state, its important that it be StepScope based so that one thread does not impair the state managed by another thread (e.g, each thread of a given step has its own instance of the StepScope component).

Rerunning a completed Job from Spring Framework?

I am developing a web application using Spring Framework that does some jobs as the application starts up, and these jobs primarily consist of loading data from CSVs and making Java objects out of them.
Currently, I am trying to build a RESTful API using Restlet and Spring framework and one of the queries is supposed to take in a job name as parameter and restart that job even if that job has been marked as COMPLETED, how do I accomplish a job restart? I have tried the spring frameworks' Joboperator interface's startNextInstance() method and have also tried to manually increment the JobParameters so that there is no jobinstancealrradyrunning exception?
Anyone has any code snippet or alternative idea on how to restart a Job in Spring Framework that has been marked as Completed?
Any help would be greatly appreciated, thanks!!
Because of the terms you're using i'm quite certain you're using Spring Batch
In Batch terms you cannot actually restart a COMPLETED instance or execution. Single job instance is identified by job parameters. If you need to run the job again with same parameters, one way would be to include some unique parameter, for example the current timestamp to the JobParameters before launching.
So restarting a completed job would mean starting a new instance of the job with similiar parameters. Here's an slightly modified snippet i've used before that uses JobLauncher and JobRegistry to launch a new job by name:
#Autowired
#Qualifier("asyncJobLauncher")
private JobLauncher asyncJobLauncher;
#Autowired
private JobRegistry jobRegistry;
...
public JobExecution startJob(String jobName) {
Job job;
try {
job = jobRegistry.getJob(jobName);
} catch (NoSuchJobException e) {
// handle invalid job name
}
JobParametersBuilder jobParams = new JobParametersBuilder();
jobParams.addLong("currentTime", System.currentTimeMillis());
// add other relevant parameters
try {
JobExecution jobExecution = asyncJobLauncher.run(job, jobParams.toJobParameters());
return jobExecution;
} catch (JobExecutionAlreadyRunningException e) {
// handle instance already running with given params
} catch (Exception e) {
// handle other errors
}
}
Hope it helps, here's some reading about the subject.

Resources