Limit threads with a partitioner in Spring Batch

My Spring Batch application consumes too many resources (over 4 GB of RAM).
When I look at the JVM, the application creates 10 threads.
I use a partitioner to process files one by one, without a scheduler.
A jobExecutionListener is used to stop the batch at the end of execution.
@Bean
public Job mainJob() throws IOException {
    SimpleJobBuilder mainJob = this.jobBuilderFactory.get("mainJob")
            .start(previousStep())
            .next(partitionStep())
            .next(finalStep())
            .listener(jobExecutionListener(taskExecutor()));
    return mainJob.build();
}
@Bean
public Step partitionStep() throws IOException {
    Step mainStep = stepBuilderFactory.get("mainStep")
            .<InOut, InOut>chunk(1)
            .reader(ResourceReader())
            .processor(processor())
            .writer(itemWriter())
            .build();
    return this.stepBuilderFactory.get("partitionStep")
            .partitioner(mainStep)
            .partitioner("mainStep", partitioner())
            .build();
}
@Bean(name = "taskExecutor")
public ThreadPoolTaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(1);
    taskExecutor.setMaxPoolSize(1);
    taskExecutor.setQueueCapacity(1);
    taskExecutor.setThreadNamePrefix("MyBatch-");
    taskExecutor.initialize();
    return taskExecutor;
}
// This jobExecutionListener stops the batch at the end of execution
@Bean
public JobExecutionListener jobExecutionListener(@Qualifier("taskExecutor") ThreadPoolTaskExecutor executor) {
    return new JobExecutionListener() {
        private ThreadPoolTaskExecutor taskExecutor = executor;

        @Override
        public void beforeJob(JobExecution jobExecution) {
        }

        @Override
        public void afterJob(JobExecution jobExecution) {
            taskExecutor.shutdown();
            System.exit(0);
        }
    };
}
@Bean
public Partitioner partitioner() {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
    try {
        partitioner.setResources(patternResolver.getResources(FILE + configProperties.getIn() + "/*.xml"));
    } catch (IOException e) {
        throw new RuntimeException("I/O problems when resolving the input file pattern.", e);
    }
    partitioner.setKeyName("file");
    return partitioner;
}
How can I run my application single-threaded? The task executor doesn't work.

Your app creates 10 threads, but those are not necessarily Spring Batch threads. According to your configuration, only one thread with the prefix MyBatch- should be created.
Moreover, you declared a task executor as a bean, but you did not set it on the partitioned step. Your partitionStep should look something like this:
@Bean
public Step partitionStep() throws IOException {
    Step mainStep = stepBuilderFactory.get("mainStep")
            .<InOut, InOut>chunk(1)
            .reader(ResourceReader())
            .processor(processor())
            .writer(itemWriter())
            .build();
    return this.stepBuilderFactory.get("partitionStep")
            .step(mainStep) // instead of .partitioner(mainStep)
            .partitioner("mainStep", partitioner())
            .taskExecutor(taskExecutor())
            .build();
}
How can I run my application single-threaded? The task executor doesn't work.
After setting the task executor on the partitioned step, you should see that step executed by the single thread defined in your ThreadPoolTaskExecutor. However, I don't see the benefit of using a single thread for a partitioned step, because the usual goal of such a setup is to process partitions in parallel (either locally with multiple threads or remotely with multiple worker JVMs).
As a side note, it's good that you shut down the task executor with a job listener in afterJob, but don't call System.exit: you need to let the JVM shut down gracefully.
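For example, the listener could simply shut down the executor and return (a minimal sketch based on the listener above): once the executor's threads are gone and no other non-daemon threads remain, the JVM exits on its own.
@Bean
public JobExecutionListener jobExecutionListener(@Qualifier("taskExecutor") ThreadPoolTaskExecutor executor) {
    return new JobExecutionListener() {
        @Override
        public void beforeJob(JobExecution jobExecution) {
        }

        @Override
        public void afterJob(JobExecution jobExecution) {
            // Shutting down the executor releases its threads; with no
            // non-daemon threads left, the JVM exits without System.exit.
            executor.shutdown();
        }
    };
}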
Hope this helps.

Related

How to run SystemCommandTasklet asynchronously

I want to run a shell script using Spring Batch and let the batch control the job id and status, but I don't want my app to wait/hang until this shell script (SystemCommandTasklet) completes.
@Override
@Bean(name = "myJobLauncher")
public SimpleJobLauncher getJobLauncher() throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(getJobRepository());
    //jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}
@Bean
public Step myStep(Tasklet tasklet) {
    return this.stepBuilderFactory.get("myStep")
            .listener(tasklet)
            .tasklet(tasklet)
            .build();
}

@Bean
@StepScope
public SystemCommandTasklet systemCommandTasklet(@Value("#{jobParameters['dir']}") String dir,
        @Value("#{jobParameters['command']}") String command) {
    SystemCommandTasklet tasklet = new SystemCommandTasklet();
    tasklet.setWorkingDirectory(dir);
    tasklet.setCommand(command);
    tasklet.setTimeout(100000);
    return tasklet;
}
When I run the code above, the batch/application waits until 'command' is completed.
If I add jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor()); it fails without any error being logged.
It turned out I had an issue in a different part of my code; adding new SimpleAsyncTaskExecutor() actually works.
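For reference, the working launcher is simply the original bean with the commented-out line enabled (a sketch, reusing the getJobRepository() method from the question):
@Override
@Bean(name = "myJobLauncher")
public SimpleJobLauncher getJobLauncher() throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(getJobRepository());
    // Launch the job on a new thread so the caller returns immediately
    // instead of blocking until the shell command completes.
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}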

How to add tasklet to run after each partition step completion in Spring Batch

I am new to Spring Batch and am implementing a batch job that has to pull a huge data set from the DB and write it to a file. Below is the sample job config, which is working as expected for me.
@Bean
public Job customDBReaderFileWriterJob() throws Exception {
    return jobBuilderFactory.get(MY_JOB)
            .incrementer(new RunIdIncrementer())
            .flow(partitionGenerationStep())
            .next(cleanupStep())
            .end()
            .build();
}

@Bean
public Step partitionGenerationStep() throws Exception {
    return stepBuilderFactory
            .get("partitionGenerationStep")
            .partitioner("Partitioner", partitioner())
            .step(multiOperationStep())
            .gridSize(50)
            .taskExecutor(taskExecutor())
            .build();
}

@Bean
public Step multiOperationStep() throws Exception {
    return stepBuilderFactory
            .get("MultiOperationStep")
            .<Input, Output>chunk(100)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

@Bean
@StepScope
public DBPartitioner partitioner() {
    DBPartitioner dbPartitioner = new DBPartitioner();
    dbPartitioner.setColumn(ID);
    dbPartitioner.setDataSource(dataSource);
    dbPartitioner.setTable(TABLE);
    return dbPartitioner;
}

@Bean
@StepScope
public Reader reader() {
    return new Reader();
}

@Bean
@StepScope
public Processor processor() {
    return new Processor();
}

@Bean
@StepScope
public Writer writer() {
    return new Writer();
}

@Bean
public Step cleanupStep() {
    return stepBuilderFactory.get("cleanupStep")
            .tasklet(cleanupTasklet())
            .build();
}

@Bean
@StepScope
public CleanupTasklet cleanupTasklet() {
    return new CleanupTasklet();
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10);
    executor.setQueueCapacity(10);
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.setThreadNamePrefix("MultiThreaded-");
    return executor;
}
As the data set is huge, I have configured the task executor's thread pool size as 10 and the grid size as 50. With this setup, 10 threads write to 10 files at a time, and the reader reads the data in chunks, so the reader-processor-writer flow iterates multiple times (for a group of 10, before moving to the next partition).
Now, I would like to add a tasklet that compresses the files once all iterations (read, process, write) for one thread are completed, i.e. after the completion of each partition.
I do have a cleanup tasklet to run at the end, but putting the compression logic there would mean first collecting all the files generated by each partition and only then compressing them. Please suggest.
You can change your worker step multiOperationStep to be a FlowStep of a chunk-oriented step followed by a simple tasklet step where you do the compression. In other words, the worker step is actually two steps combined in one FlowStep.
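A minimal sketch of that idea, reusing the beans from the question (compressionTasklet() is a hypothetical tasklet bean you would supply):
@Bean
public Step multiOperationStep() throws Exception {
    // Chunk-oriented step that reads, processes and writes one partition.
    Step chunkStep = stepBuilderFactory
            .get("chunkStep")
            .<Input, Output>chunk(100)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
    // Tasklet step that compresses the file written by that partition.
    Step compressionStep = stepBuilderFactory
            .get("compressionStep")
            .tasklet(compressionTasklet())
            .build();
    // Chain both steps into one flow; each worker runs the flow, so the
    // compression happens right after its own partition completes.
    Flow flow = new FlowBuilder<SimpleFlow>("multiOperationFlow")
            .start(chunkStep)
            .next(compressionStep)
            .build();
    return stepBuilderFactory
            .get("MultiOperationStep")
            .flow(flow)
            .build();
}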

How to configure graceful shutdown using DelegatingSecurityContextScheduledExecutorService with Spring

I'm trying to use the new graceful shutdown options introduced in Spring Boot 2.3, but I'm struggling to make my scheduled task behave the same way.
As I need a valid user in the context during scheduled task execution, I am using DelegatingSecurityContextScheduledExecutorService to achieve this goal.
Here is a sample of my implementation of SchedulingConfigurer:
@Configuration
@EnableScheduling
public class ContextSchedulingConfiguration implements SchedulingConfigurer {

    @Override
    public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
        taskRegistrar.setScheduler(taskExecutor());
    }

    @Bean
    public TaskSchedulerCustomizer taskSchedulerCustomizer() {
        return taskScheduler -> {
            taskScheduler.setAwaitTerminationSeconds(120);
            taskScheduler.setWaitForTasksToCompleteOnShutdown(true);
            taskScheduler.setPoolSize(2);
        };
    }

    @Bean
    public Executor taskExecutor() {
        ThreadPoolTaskScheduler threadPool = new ThreadPoolTaskScheduler();
        taskSchedulerCustomizer().customize(threadPool);
        threadPool.initialize();
        threadPool.setThreadNamePrefix("XXXXXXXXX");
        SecurityContext schedulerContext = createSchedulerSecurityContext();
        return new DelegatingSecurityContextScheduledExecutorService(threadPool.getScheduledExecutor(), schedulerContext);
    }

    private SecurityContext createSchedulerSecurityContext() {
        // This is just an example, the actual code makes several changes to the context.
        return SecurityContextHolder.createEmptyContext();
    }

    @Scheduled(initialDelay = 5000, fixedDelay = 15000)
    public void run() throws InterruptedException {
        System.out.println("Started at: " + LocalDateTime.now().toString());
        long until = System.currentTimeMillis() + TimeUnit.SECONDS.toMillis(30);
        while (System.currentTimeMillis() < until) {}
        System.out.println("Ended at: " + LocalDateTime.now().toString());
    }
}
But when I send a termination signal while the scheduled task is running, the application does not wait for the task.
If in my taskExecutor bean I replace the last two lines and return the ThreadPoolTaskScheduler without a context, everything works as expected. It only fails when I return the DelegatingSecurityContextScheduledExecutorService.
How can I set the context for the taskExecutor and at the same time configure it to wait for tasks to complete on shutdown?
I already tried several variations of this code, using other implementations of the TaskScheduler and TaskExecutor interfaces, but without success.
For starters, clean up your code: use the proper return types in the bean methods (be specific) and expose both as beans (marking one as @Primary!).
@Configuration
@EnableScheduling
public class ContextSchedulingConfiguration implements SchedulingConfigurer {

    @Override
    public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
        taskRegistrar.setScheduler(securitytaskScheduler());
    }

    @Bean
    public ThreadPoolTaskScheduler taskScheduler() {
        ThreadPoolTaskScheduler taskScheduler = new ThreadPoolTaskScheduler();
        taskScheduler.setAwaitTerminationSeconds(120);
        taskScheduler.setWaitForTasksToCompleteOnShutdown(true);
        taskScheduler.setPoolSize(2);
        taskScheduler.setThreadNamePrefix("XXXXXXXXX");
        return taskScheduler;
    }

    @Bean
    @Primary
    public DelegatingSecurityContextScheduledExecutorService securitytaskScheduler() {
        SecurityContext schedulerContext = createSchedulerSecurityContext();
        return new DelegatingSecurityContextScheduledExecutorService(taskScheduler().getScheduledExecutor(), schedulerContext);
    }

    private SecurityContext createSchedulerSecurityContext() {
        // This is just an example, the actual code makes several changes to the context.
        return SecurityContextHolder.createEmptyContext();
    }

    @Scheduled(initialDelay = 5000, fixedDelay = 15000)
    public void run() throws InterruptedException {
        System.out.println("Started at: " + LocalDateTime.now().toString());
        long until = System.currentTimeMillis() + TimeUnit.SECONDS.toMillis(30);
        while (System.currentTimeMillis() < until) {}
        System.out.println("Ended at: " + LocalDateTime.now().toString());
    }
}
It is important to be as specific in your return types as possible. Configuration classes are detected early on, and the return types are checked to determine which callbacks to make. ThreadPoolTaskScheduler is a DisposableBean; an Executor is not, and will not receive such callbacks!

Issues with Spring Batch

Hi, I have been working with Spring Batch recently and need some help.
1) I want to run my job using multiple threads, hence I have used a TaskExecutor as below:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(4);
    return taskExecutor;
}

@Bean
public Step myStep() {
    return stepBuilderFactory.get("myStep")
            .<MyEntity, AnotherEntity>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .taskExecutor(taskExecutor())
            .throttleLimit(4)
            .build();
}
but while executing I can see the line below in the console:
o.s.b.c.l.support.SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
What does this mean? However, while debugging I can see four SimpleAsyncExecutor threads running. Can someone shed some light on this?
2) I don't want to run my batch application with the metadata tables that Spring Batch creates. I have tried adding spring.batch.initialize-schema=never, but it didn't work. I also saw a way to do this by using ResourcelessTransactionManager and MapJobRepositoryFactoryBean, but I have to make some database transactions in my job, so will it be alright if I use this?
I was also able to do this by extending DefaultBatchConfigurer and overriding:
@Override
public void setDataSource(DataSource dataSource) {
    // override so the datasource is not set even if one exists;
    // initialization will then use a Map-based JobRepository (instead of the database)
}
Please guide me further. Thanks.
Update:
My full configuration class is here:
@EnableBatchProcessing
@EnableScheduling
@Configuration
public class MyBatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    /* @Override
    public void setDataSource(DataSource dataSource) {
        // override so the datasource is not set even if one exists;
        // initialization will then use a Map-based JobRepository (instead of the database)
    } */

    @Bean
    public Step myStep() {
        return stepBuilderFactory.get("myStep")
                .<MyEntity, AnotherEntity>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .taskExecutor(taskExecutor())
                .throttleLimit(4)
                .build();
    }

    @Bean
    public Job myJob() {
        return jobBuilderFactory.get("myJob")
                .incrementer(new RunIdIncrementer())
                .listener(myJobListener())
                .flow(myStep())
                .end()
                .build();
    }

    @Bean
    public MyJobListener myJobListener() {
        return new MyJobListener();
    }

    @Bean
    public ItemReader<MyEntity> reader() {
        return new MyReader();
    }

    @Bean
    public ItemWriter<? super AnotherEntity> writer() {
        return new MyWriter();
    }

    @Bean
    public ItemProcessor<MyEntity, AnotherEntity> processor() {
        return new MyProcessor();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(4);
        return taskExecutor;
    }
}
In the future, please break this up into two independent questions. That being said, let me shed some light on both questions.
SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
Your configuration sets myStep to use your TaskExecutor. That causes Spring Batch to execute each chunk in its own thread (based on the parameters of the TaskExecutor). The log message you are seeing has nothing to do with that behavior; it has to do with launching your job. By default, the SimpleJobLauncher launches the job on the same thread it is running on, thereby blocking that thread. You can inject a TaskExecutor into the SimpleJobLauncher, which will cause the job to be executed on a different thread from the JobLauncher itself. These are two separate uses of multiple threads by the framework.
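A launcher configured that way might look like this (a sketch; it assumes a JobRepository bean is available for injection):
@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // Run each launched job on its own thread instead of the caller's thread.
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}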
I don't want to run my Batch application with the metadata tables that spring batch creates
The short answer here is to just use an in-memory database like HSQLDB or H2 for your metadata tables. This provides a production-grade data store (so that concurrency is handled correctly) without actually persisting the data. If you use the ResourcelessTransactionManager, you are effectively turning transactions off (a bad idea if you're using a database in any capacity), because that TransactionManager doesn't actually do anything (it's a no-op implementation).
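As an illustration, an embedded H2 data source for the metadata tables could be declared like this (a sketch; the schema script ships inside spring-batch-core):
@Bean
public DataSource batchDataSource() {
    // In-memory H2 database initialized with the Spring Batch metadata schema.
    return new EmbeddedDatabaseBuilder()
            .setType(EmbeddedDatabaseType.H2)
            .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
            .build();
}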

Spring Boot Asynchronous Request Processing Task Executor Configuration

From this doc I have learned that we can now return Callable<T> from any action method, and Spring will execute the action in a separate thread with the help of a TaskExecutor. The blog only says that this TaskExecutor is configurable, but I did not find a way to configure it in a Spring Boot application. Can anyone help me?
My other question is: should I worry about the configuration of this TaskExecutor, like thread pool size, queue size, etc.?
As pkoli asked, here is my main class:
@SpringBootApplication
public class MyWebApiApplication extends SpringBootServletInitializer {

    public static void main(String[] args) {
        SpringApplication.run(MyWebApiApplication.class, args);
    }

    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
        return application.sources(MyWebApiApplication.class);
    }

    @Bean
    public Executor asyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(10);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("MyThread-");
        executor.initialize();
        return executor;
    }
}
I finally found the answer here.
To use another implementation of TaskExecutor, we can extend our configuration class from WebMvcConfigurerAdapter, or we can use it as a bean. For example, in a Boot application:
@SpringBootApplication
public class AsyncConfigExample {

    @Bean
    WebMvcConfigurer configurer() {
        return new WebMvcConfigurerAdapter() {
            @Override
            public void configureAsyncSupport(AsyncSupportConfigurer configurer) {
                ThreadPoolTaskExecutor t = new ThreadPoolTaskExecutor();
                t.setCorePoolSize(10);
                t.setMaxPoolSize(100);
                t.setQueueCapacity(50);
                t.setAllowCoreThreadTimeOut(true);
                t.setKeepAliveSeconds(120);
                t.initialize();
                configurer.setTaskExecutor(t);
            }
        };
    }

    public static void main(String[] args) {
        SpringApplication.run(AsyncConfigExample.class, args);
    }
}
To create a task executor, simply declare a bean as follows with the configuration that suits your requirements.
@Bean
public Executor asyncExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(1);
    executor.setMaxPoolSize(1);
    executor.setQueueCapacity(100);
    executor.setThreadNamePrefix("MyThread-");
    executor.initialize();
    return executor;
}
Regarding the second part of your question: yes, the configuration needs to be chosen with your application in mind.
Here's an explanation from the Javadoc:
When a new task is submitted in method execute(java.lang.Runnable), and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle. If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full.
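To make that concrete, with the hypothetical settings below, tasks 1-2 each get a core thread, tasks 3-12 wait in the queue, tasks 13-14 trigger extra threads up to the maximum, and any further task is rejected:
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(2);   // threads created eagerly, one per submitted task
executor.setMaxPoolSize(4);    // extra threads created only once the queue is full
executor.setQueueCapacity(10); // tasks wait here while all core threads are busy
executor.initialize();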
