Spring Batch SystemCommandTasklet throwing NullPointerException

I am new to Spring Batch and I am trying to run a Linux sort command after the batch process, using SystemCommandTasklet as a second step. However, it throws a NullPointerException when sorting bigger files (around 250 MB, which take some time). It looks like SystemCommandTasklet's StepExecution is never initialized via beforeStep(), which causes the error. Can someone check my configuration and let me know what I am missing?
BatchConfig.java
@Bean
public Job job() throws Exception {
    return jobs.get("job")
            .incrementer(new RunIdIncrementer())
            .flow(step1()).on("FAILED").fail().on("COMPLETED").to(step2())
            .end()
            .build();
}
@Bean
public Step step1() {
    return steps.get("step1")
            .<FileEntry, FileEntry>chunk(100)
            .reader(reader()).faultTolerant().skipLimit(MAX_SKIP_LIMIT).skip(FlatFileParseException.class)
            .processor(new Processor())
            .writer(compositeWriter()).stream(outputwriter()).stream(rejectwriter())
            .listener(new CustomStepExecutionListener())
            .build();
}
@Bean
public Step step2() throws Exception {
    return steps.get("step2")
            .tasklet(sortingTasklet())
            .build();
}
@Bean
@StepScope
public Tasklet sortingTasklet() throws Exception {
    SystemCommandTasklet tasklet = new SystemCommandTasklet();
    logger.debug("Sorting File : " + getOutputFileName());
    tasklet.setCommand("sort " + getOutputFileName() + " -d -s -t \001 -k1,1 -o " + getOutputFileName() + ".sorted");
    tasklet.setTimeout(600000L);
    return tasklet;
}
Here is the link to the Spring Batch source code for SystemCommandTasklet; it's throwing the NullPointerException at line 131:
https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/java/org/springframework/batch/core/step/tasklet/SystemCommandTasklet.java

You aren't registering the SystemCommandTasklet as a StepExecutionListener, and since you aren't returning the implementing class from the @Bean method, Spring Batch doesn't know that the tasklet implements that interface. I'd recommend two things to be safe:
Change the tasklet's configuration method signature to be:
@Bean
@StepScope
public SystemCommandTasklet sortingTasklet() throws Exception {
Register the tasklet as a listener on your step as well, similar to how you're doing it with the CustomStepExecutionListener.
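Putting both recommendations together, step2 would look something like this (a sketch based on the beans above):
@Bean
public Step step2() throws Exception {
    return steps.get("step2")
            .tasklet(sortingTasklet())
            // register the tasklet as a listener too, so beforeStep()
            // is invoked and its StepExecution gets initialized
            .listener(sortingTasklet())
            .build();
}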

Related

How to run SystemCommandTasklet asynchronously

I want to run a shell script using Spring Batch and let Batch control the job id and status, but I don't want my app to wait/hang until this shell script (SystemCommandTasklet) is completed.
@Override
@Bean(name = "myJobLauncher")
public SimpleJobLauncher getJobLauncher() throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(getJobRepository());
    //jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}
@Bean
public Step myStep(Tasklet tasklet) {
    return this.stepBuilderFactory.get("myStep")
            .listener(tasklet)
            .tasklet(tasklet)
            .build();
}
@Bean
@StepScope
public SystemCommandTasklet systemCommandTasklet(@Value("#{jobParameters['dir']}") String dir,
        @Value("#{jobParameters['command']}") String command) {
    SystemCommandTasklet tasklet = new SystemCommandTasklet();
    tasklet.setWorkingDirectory(dir);
    tasklet.setCommand(command);
    tasklet.setTimeout(100000);
    return tasklet;
}
When I run the code above, the batch application waits until 'command' is completed.
If I add jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());, it fails without any error being logged.
Update: the failure came from an issue in a different part of my code. Uncommenting jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor()); in getJobLauncher() actually works and makes the launch asynchronous.

How to add tasklet to run after each partition step completion in Spring Batch

I am new to Spring Batch and am implementing a batch job that has to pull a huge data set from the DB and write it to a file. Below is the sample job config, which is working as expected for me.
@Bean
public Job customDBReaderFileWriterJob() throws Exception {
    return jobBuilderFactory.get(MY_JOB)
            .incrementer(new RunIdIncrementer())
            .flow(partitionGenerationStep())
            .next(cleanupStep())
            .end()
            .build();
}
@Bean
public Step partitionGenerationStep() throws Exception {
    return stepBuilderFactory
            .get("partitionGenerationStep")
            .partitioner("Partitioner", partitioner())
            .step(multiOperationStep())
            .gridSize(50)
            .taskExecutor(taskExecutor())
            .build();
}
@Bean
public Step multiOperationStep() throws Exception {
    return stepBuilderFactory
            .get("MultiOperationStep")
            .<Input, Output>chunk(100)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
@Bean
@StepScope
public DBPartitioner partitioner() {
    DBPartitioner dbPartitioner = new DBPartitioner();
    dbPartitioner.setColumn(ID);
    dbPartitioner.setDataSource(dataSource);
    dbPartitioner.setTable(TABLE);
    return dbPartitioner;
}
@Bean
@StepScope
public Reader reader() {
    return new Reader();
}

@Bean
@StepScope
public Processor processor() {
    return new Processor();
}

@Bean
@StepScope
public Writer writer() {
    return new Writer();
}
@Bean
public Step cleanupStep() {
    return stepBuilderFactory.get("cleanupStep")
            .tasklet(cleanupTasklet())
            .build();
}

@Bean
@StepScope
public CleanupTasklet cleanupTasklet() {
    return new CleanupTasklet();
}
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10);
    executor.setQueueCapacity(10);
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.setThreadNamePrefix("MultiThreaded-");
    return executor;
}
As the data set is huge, I have configured the task executor's thread pool size as 10 and the grid size as 50. With this setup, 10 threads write to 10 files at a time, and the reader reads in chunks, so the read-process-write flow iterates multiple times (for a group of 10 partitions before moving to the next).
Now I would like to add a tasklet that compresses a partition's file once all iterations (read, process, write) for that partition are completed, i.e. after the completion of each partition.
I do have a cleanup tasklet that runs last, but putting the compression logic there would mean first collecting all the files generated by each partition and only then compressing them. Please suggest.
You can change your worker step multiOperationStep to be a FlowStep of a chunk-oriented step followed by a simple tasklet step where you do the compression. In other words, the worker step is actually two steps combined in one FlowStep.
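A minimal sketch of that approach, assuming a hypothetical compressionTasklet() bean that compresses the current partition's output file (FlowBuilder, Flow, and SimpleFlow come from spring-batch-core's org.springframework.batch.core.job.builder and org.springframework.batch.core.job.flow packages):
@Bean
public Step multiOperationStep() throws Exception {
    // The worker step becomes a flow: the original chunk-oriented step,
    // followed by a tasklet step that compresses this partition's file.
    Flow flow = new FlowBuilder<SimpleFlow>("readWriteCompressFlow")
            .start(chunkStep())
            .next(compressionStep())
            .build();
    return stepBuilderFactory
            .get("MultiOperationStep")
            .flow(flow)
            .build();
}

@Bean
public Step chunkStep() throws Exception {
    // The former multiOperationStep body, renamed.
    return stepBuilderFactory
            .get("chunkStep")
            .<Input, Output>chunk(100)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

@Bean
public Step compressionStep() {
    return stepBuilderFactory
            .get("compressionStep")
            .tasklet(compressionTasklet()) // hypothetical tasklet doing the compression
            .build();
}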

Method annotated with @Bean is called directly. Use dependency injection instead

I'm following a tutorial for Spring Batch, and when I write the following code, IntelliJ complains about the tasklet(null) call in the job method:
Method annotated with @Bean is called directly. Use dependency injection instead.
I can get the error to go away by removing the @Bean annotation from the job, but I want to know what's going on. How can I inject the bean there? Simply writing tasklet(Tasklet tasklet(null)) gives the same error.
@Bean
@StepScope
public Tasklet tasklet(@Value("#{jobParameters['name']}") String name) {
    return ((contribution, chunkContext) -> {
        System.out.println(String.format("This is %s", name));
        return RepeatStatus.FINISHED;
    });
}
@Bean
public Job job() {
    return jobBuilderFactory.get("job")
            .start(stepBuilderFactory.get("step1")
                    .tasklet(tasklet(null)) // tasklet(null) = problem
                    .build())
            .build();
}
Instead, let Spring inject the Tasklet into the job method rather than calling the @Bean method directly:
@Bean
@StepScope
public Tasklet tasklet(@Value("#{jobParameters['name']}") String name) {
    return ((contribution, chunkContext) -> {
        System.out.println(String.format("This is %s", name));
        return RepeatStatus.FINISHED;
    });
}
@Bean
public Job job(Tasklet tasklet) {
    return jobBuilderFactory.get("job")
            .start(stepBuilderFactory.get("step1")
                    .tasklet(tasklet)
                    .build())
            .build();
}
Spring bean creation and AOP proxying are very picky, so you need to be careful with the usage.
In this case, declaring the Tasklet as a dependency of the job method solves the tasklet's name being null.

How to pass JobParameters while creating spring beans using java config

I originally asked the question "Getting "Scope 'step' is not active for the current thread" while creating spring batch beans", and based on the suggestion provided in "Spring batch scope issue while using spring boot", I tried to replace the @StepScope annotation and instead define a StepScope bean in the configuration, as below:
@Bean
@Qualifier("stepScope")
public org.springframework.batch.core.scope.StepScope stepScope() {
    final org.springframework.batch.core.scope.StepScope stepScope = new org.springframework.batch.core.scope.StepScope();
    stepScope.setAutoProxy(true);
    return stepScope;
}
With this change, I'm no longer able to pass job parameters while creating beans, as it throws
'jobParameters' cannot be found on object of type 'org.springframework.beans.factory.config.BeanExpressionContext'
My configuration looks like this:
@Configuration
@EnableBatchProcessing
public class MyConfig {

    @Bean
    @Qualifier("partitionJob")
    public Job partitionJob() throws Exception {
        return jobBuilderFactory
                .get("partitionJob")
                .incrementer(new RunIdIncrementer())
                .start(partitionStep(null))
                .build();
    }
    @Bean
    @StepScope
    public Step partitionStep(
            @Value("#{jobParameters[gridSize]}") String gridSize)
            throws Exception {
        return stepBuilderFactory
                .get("partitionStep")
                .partitioner("slaveStep", partitioner())
                .gridSize(gridSize)
                .step(slaveStep(chunkSize))
                .taskExecutor(threadPoolTaskExecutor()).build();
    }

    @Bean
    @StepScope
    public Step slaveStep(int chunkSize) throws Exception {
        return stepBuilderFactory
                .get("slaveStep")
                ......
                ........
    }
I read that the bean should be annotated with @StepScope if job parameters need to be accessed, as in my example, but I'm getting the exceptions explained above.
This is how you can pass job parameters:
GenericApplicationContext context = new AnnotationConfigApplicationContext(MyConfiguration.class);
JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
JobParameters jobParameters = new JobParametersBuilder()
        .addString("parameter1", "Test")
        .toJobParameters();
Job job = (Job) context.getBean("myJob");
JobExecution execution = jobLauncher.run(job, jobParameters);
And access the job parameters in a step-scoped bean:
@Bean
@StepScope
public Step partitionStep(
        @Value("#{jobParameters['parameter1']}") String gridSize)
        throws Exception {
    return stepBuilderFactory
            .get("partitionStep")
            .partitioner("slaveStep", partitioner())
            .gridSize(gridSize)
            .step(slaveStep(chunkSize))
            .taskExecutor(threadPoolTaskExecutor()).build();
}

Issues with Spring Batch

Hi, I have been working with Spring Batch recently and need some help.
1) I want to run my job using multiple threads, hence I have used a TaskExecutor as below:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(4);
    return taskExecutor;
}

@Bean
public Step myStep() {
    return stepBuilderFactory.get("myStep")
            .<MyEntity, AnotherEntity>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .taskExecutor(taskExecutor())
            .throttleLimit(4)
            .build();
}
but while executing, I can see the line below in the console:
o.s.b.c.l.support.SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
What does this mean? While debugging, I can see four SimpleAsyncTaskExecutor threads running. Can someone shed some light on this?
2) I don't want to run my batch application with the metadata tables that Spring Batch creates. I have tried adding spring.batch.initialize-schema=never, but it didn't work. I also saw a way to do this using ResourcelessTransactionManager and MapJobRepositoryFactoryBean, but I have to make some database transactions for my job, so will it be all right if I use those?
Also, I was able to do this by extending DefaultBatchConfigurer and overriding:
@Override
public void setDataSource(DataSource dataSource) {
    // Override to avoid setting the datasource even if one exists;
    // initialization will then use a Map-based JobRepository (instead of the database).
}
Please guide me further. Thanks.
Update:
My full configuration class is below.
@EnableBatchProcessing
@EnableScheduling
@Configuration
public class MyBatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    /* @Override
    public void setDataSource(DataSource dataSource) {
        // Override to avoid setting the datasource even if one exists;
        // initialization will then use a Map-based JobRepository (instead of the database).
    } */
    @Bean
    public Step myStep() {
        return stepBuilderFactory.get("myStep")
                .<MyEntity, AnotherEntity>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .taskExecutor(taskExecutor())
                .throttleLimit(4)
                .build();
    }
    @Bean
    public Job myJob() {
        return jobBuilderFactory.get("myJob")
                .incrementer(new RunIdIncrementer())
                .listener(myJobListener())
                .flow(myStep())
                .end()
                .build();
    }
    @Bean
    public MyJobListener myJobListener() {
        return new MyJobListener();
    }

    @Bean
    public ItemReader<MyEntity> reader() {
        return new MyReader();
    }

    @Bean
    public ItemWriter<? super AnotherEntity> writer() {
        return new MyWriter();
    }

    @Bean
    public ItemProcessor<MyEntity, AnotherEntity> processor() {
        return new MyProcessor();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(4);
        return taskExecutor;
    }
}
In the future, please break this up into two independent questions. That being said, let me shed some light on both questions.
SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
Your configuration configures myStep to use your TaskExecutor. That causes Spring Batch to execute each chunk in its own thread (based on the parameters of the TaskExecutor). The log message you are seeing has nothing to do with that behavior; it has to do with launching your job. By default, the SimpleJobLauncher launches the job on the same thread it is running on, thereby blocking that thread. You can inject a TaskExecutor into the SimpleJobLauncher, which will cause the job to be executed on a different thread from the JobLauncher itself. These are two separate uses of multiple threads by the framework.
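A minimal sketch of that injection, assuming the JobRepository is available for wiring (jobs launched through this launcher return immediately instead of blocking the caller):
@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // launch jobs on a separate thread so run() returns immediately
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}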
I don't want to run my Batch application with the metadata tables that spring batch creates
The short answer here is to just use an in-memory database like HSQLDB or H2 for your metadata tables. This provides a production-grade data store (so that concurrency is handled correctly) without actually persisting the data. If you use the ResourcelessTransactionManager, you are effectively turning transactions off (a bad idea if you're using a database in any capacity) because that TransactionManager doesn't actually do anything (it's a no-op implementation).
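For example, a minimal sketch of an embedded H2 data source initialized with the Batch metadata schema (EmbeddedDatabaseBuilder comes from spring-jdbc; the schema-h2.sql script ships inside spring-batch-core):
@Bean
public DataSource batchDataSource() {
    // in-memory H2 database holding only the Spring Batch metadata tables
    return new EmbeddedDatabaseBuilder()
            .setType(EmbeddedDatabaseType.H2)
            .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
            .build();
}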
