Execution time of a spring boot job - performance

I am trying to calculate the total time taken by a Spring Batch job that is triggered by Spring Boot. Before Spring Boot launches the job, it configures the datasource and the other beans the job requires, which takes some time. Should I include this startup time in the total execution time of the job, given that the job uses the datasource and beans configured by Spring Boot?

"Should I consider this time also to calculate total amount of time taken by spring boot job"
The simple answer is no, unless you reconnect the datasource and refresh the configuration on each execution of the job.
When you say your application (Boot or Batch) is up and ready for execution, it means that all the components are initialised, dependencies are resolved, connections are made, and it is just waiting for a task or trigger to start execution.
This means the time taken by the datasource configuration or context setup is not part of your job execution time.

"Should I consider this time also to calculate total amount of time taken by spring boot job for execution"
It depends on what you want to measure. If you want to measure the execution time of the whole Spring Boot app (from the OS point of view, the total running time of your JVM process), then yes, you need to include everything.
If you want to measure the execution time of your Spring Batch job and only that, you can use a JobExecutionListener, for example:
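    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.core.JobExecutionListener;
    import org.springframework.util.StopWatch;

    class ExecutionTimeJobListener implements JobExecutionListener {

        private final Logger logger = LoggerFactory.getLogger(ExecutionTimeJobListener.class);
        // StopWatch accumulates time across start/stop cycles, so use a fresh listener per job run.
        private final StopWatch stopWatch = new StopWatch();

        @Override
        public void beforeJob(JobExecution jobExecution) {
            stopWatch.start();
        }

        @Override
        public void afterJob(JobExecution jobExecution) {
            stopWatch.stop();
            logger.info("Job took " + stopWatch.getTotalTimeSeconds() + "s");
        }
    }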
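To make the difference between the two measurements concrete, here is a minimal plain-Java sketch with no Spring involved. It compares the elapsed time of one task with the uptime of the whole JVM, which also covers everything that happened before the task started. The names TimingScopes, timeTask and jvmUptime are made up for this illustration, and the sleep is only a stand-in for a real job.

```java
import java.lang.management.ManagementFactory;
import java.time.Duration;

class TimingScopes {

    /** Runs the task and returns how long the task alone took. */
    static Duration timeTask(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return Duration.ofNanos(System.nanoTime() - start);
    }

    /** Time since the JVM started, which includes startup work done before any task. */
    static Duration jvmUptime() {
        return Duration.ofMillis(ManagementFactory.getRuntimeMXBean().getUptime());
    }

    public static void main(String[] args) {
        Duration taskOnly = timeTask(() -> {
            try {
                Thread.sleep(50); // stand-in for the real job
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Task only:  " + taskOnly.toMillis() + " ms");
        System.out.println("JVM uptime: " + jvmUptime().toMillis() + " ms");
    }
}
```

The first number corresponds to what a job listener would report; the second is closer to what the operating system sees for the process.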

Related

How to write a tenantaware RepositoryItemReader in Spring batch?

I have a job that is configured to run based on its job parameters. It is integrated with Spring Web and Quartz so that it can be invoked both on demand and on a cron schedule. I am using RepositoryItemReader to take advantage of Spring Data. This is running as expected.
Now I want to introduce multi-tenancy in the job. I have 3 tenants with different databases, say tenant1, tenant2 and tenant3. Basically, I want the batch job to pick the data from the database that matches the job parameter. If the job parameter is tenant1, I want to pick the data from the tenant1 database.
I have found an article on how to introduce multi-tenancy in a Spring Boot application here: https://www.baeldung.com/multitenancy-with-spring-data-jpa
The problem is that I cannot work out where to inject the tenant context into the thread, because I am using an AsyncTaskScheduler to launch the job and other jobs are also registered in the context.
    JobParameters jobParameters = new JobParametersBuilder()
            .addString("tenantId", tenantId)
            .addString("jobName", jobName)
            .addLong("time", System.currentTimeMillis()).toJobParameters();
    Job job = jobRegistry.getJob(jobName);
    JobExecution jobExecution = asyncJobLauncher.run(job, jobParameters);
My itemReader bean is defined as:
    @StepScope
    @Bean
    public ItemReader<Person> itemReader() {
        return new RepositoryItemReaderBuilder<Person>()
                .name("ItemReader")
                .repository(personRepository)
                .arguments("personName").methodName("findByPersonNameEquals")
                .maxItemCount(30).pageSize(5)
                .sorts(Collections.singletonMap("createTs", Sort.Direction.ASC)).build();
    }
I discovered a workaround for the problem:
1. Extend RepositoryItemReader with something like a TenantAwareRepositoryItemReader that takes the tenant as a constructor argument.
2. Override doPageRead() in TenantAwareRepositoryItemReader so that it:
   - sets the tenantId in the thread context,
   - calls super.doPageRead(),
   - clears the DB thread context.
3. Use the TenantAwareRepositoryItemReader as the itemReader.
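The set, delegate, clear sequence of this workaround can be sketched in plain Java. This is only an illustration of the pattern, not the actual reader: TenantContext and TenantAwarePageReader are hypothetical names, and the Supplier stands in for the call to super.doPageRead(). In a real subclass of RepositoryItemReader the same try/finally would wrap the super call.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical holder for the current tenant; not part of Spring Batch. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() { }

    static void set(String tenantId) { CURRENT.set(tenantId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

/** Wraps a page read so it always runs with the given tenant bound to the thread. */
final class TenantAwarePageReader<T> {
    private final String tenantId;
    private final Supplier<List<T>> delegate; // stand-in for super.doPageRead()

    TenantAwarePageReader(String tenantId, Supplier<List<T>> delegate) {
        this.tenantId = tenantId;
        this.delegate = delegate;
    }

    List<T> readPage() {
        TenantContext.set(tenantId);   // bind the tenant to the reading thread
        try {
            return delegate.get();     // perform the actual page read
        } finally {
            TenantContext.clear();     // always clear, even if the read fails
        }
    }
}
```

Clearing in a finally block matters when threads are reused by a task executor, because a leftover tenant id would otherwise leak into the next job that runs on the same thread.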

improve spring batch job performance

I am in the process of implementing a spring batch job for our file upload process. My requirement is to read a flat file, apply business logic then store it in DB then post a Kafka message.
I have a single chunk-based step that uses a custom reader, processor, writer. The process works fine but takes a lot of time to process a big file.
It takes 15 mins to process a file having 60K records. I need to reduce it to less than 5 mins, as we will be consuming much bigger files than this.
As per https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html I understand that making the step multithreaded would give a performance boost, at the cost of restartability. However, I am using FlatFileItemReader, ItemProcessor and ItemWriter, and none of them is thread-safe.
Any suggestions as to how to improve performance here?
Here is the writer code:
    public void write(List<? extends Message> items) {
        items.forEach(this::process);
    }

    private void process(Message message) {
        if (message == null)
            return;
        try {
            // message is a DTO that has info about success or failure.
            if (success) {
                // post kafka message using spring cloud stream
                // insert record in DB using spring jpaRepository
            } else {
                // insert record in DB using spring jpaRepository
            }
        } catch (Exception e) {
            // throw exception
        }
    }
Best regards,
Preeti
Please refer to the SO threads below, and to the GitHub source code they reference, for parallel processing:
Spring Batch multiple process for heavy load with multiple thread under every process
Spring batch to process huge data
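As a rough illustration of the parallel-processing idea, the sketch below splits a record count into contiguous ranges and hands each range to its own worker. It uses only java.util.concurrent, not Spring Batch partitioning. RangePartitioning, split and processAll are made-up names, and the "processing" is a placeholder that just counts the records in the range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RangePartitioning {

    /** Splits [0, total) into at most `parts` contiguous [start, end) ranges. */
    static List<int[]> split(int total, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int size = (total + parts - 1) / parts; // ceiling division
        for (int start = 0; start < total; start += size) {
            ranges.add(new int[] {start, Math.min(start + size, total)});
        }
        return ranges;
    }

    /** Processes each range on its own worker and returns the number of records handled. */
    static int processAll(int total, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int[] range : split(total, parts)) {
                // Each worker touches only its own range, so no reader state is shared.
                results.add(pool.submit(() -> range[1] - range[0]));
            }
            int processed = 0;
            for (Future<Integer> result : results) {
                processed += result.get();
            }
            return processed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Partition failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every worker owns a disjoint range, each one can use its own non-thread-safe reader and writer, which sidesteps the thread-safety concern raised in the question.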

Spring Batch - How to output Thread and Grid number to console or log

In my Spring Batch configuration I have this:
    @Bean
    public TaskExecutor taskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor("myJob");
        taskExecutor.setConcurrencyLimit(15);
        taskExecutor.setThreadNamePrefix("SrcToDest");
        return taskExecutor;
    }
And also I have a "master-step" where I am setting the grid-size as per below:
    @Bean
    @Qualifier("masterStep")
    public Step masterStep() {
        return stepBuilderFactory.get("masterStep").partitioner("step1", partitioner()).step(step1())
                .taskExecutor(threadpooltaskExecutor()).taskExecutor(taskExecutor())
                .gridSize(10).build();
    }
In my case, I see only "Thread-x" at the end when "myjob" finishes with "COMPLETED" status.
Questions
1. For monitoring, how can I print the thread number to the console/log throughout the execution, i.e. from the start of "myjob" to its finish?
2. Is there some way to get output in the console/log that shows the grid action too?
I could not find any example of these in the Spring Guides or anywhere else.
Still looking for how to display grid numbers in the console.
This depends on your partitioner. You can add a log statement in your partitioner that shows the grid size, so at partitioning time the logging is on your side.
At partition handling time, Spring Batch logs a statement at debug level for each execution of the worker step.
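The kind of output the question asks for can be sketched outside Spring Batch with a plain executor: log the grid size once when the partitions are created, and have every worker report which partition it handles and on which thread. PartitionLogging, namedThreads and run are hypothetical names for this illustration, and the "partitionN" labels are an assumption, not something Spring Batch is guaranteed to print.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionLogging {

    /** Creates threads named prefix-1, prefix-2, ... so log lines identify the worker. */
    static ThreadFactory namedThreads(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> new Thread(runnable, prefix + "-" + counter.incrementAndGet());
    }

    /** Runs one task per partition and returns a "partitionN on <thread>" line for each. */
    static List<String> run(int gridSize, int concurrencyLimit) {
        System.out.println("Partitioning with gridSize=" + gridSize);
        ExecutorService pool =
                Executors.newFixedThreadPool(concurrencyLimit, namedThreads("SrcToDest"));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < gridSize; i++) {
                String partition = "partition" + i;
                futures.add(pool.submit(
                        () -> partition + " on " + Thread.currentThread().getName()));
            }
            List<String> lines = new ArrayList<>();
            for (Future<String> future : futures) {
                String line = future.get();
                System.out.println(line);
                lines.add(line);
            }
            return lines;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Partition failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a Spring Batch job the equivalent log statements would go into the partitioner (for the grid size) and into the worker step's reader, writer or a step listener (for the thread name).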

Is it good to have dedicated ExecutorService for Spring Boot With Tomcat

I have seen this code many times but don't know what is the advantage/disadvantage for it. In Spring Boot applications, I saw people define this bean.
    @Bean
    @Qualifier("heavyLoadBean")
    public ExecutorService heavyLoadBean() {
        return Executors.newWorkStealingPool();
    }
Then whenever a CompletableFuture object is created in the service layer, that heavyLoadBean is used.
    public CompletionStage<T> myService() {
        return CompletableFuture.supplyAsync(() -> doingVeryBigThing(), heavyLoadBean);
    }
Then the controller will call the service.
    @GetMapping("/some/path")
    public CompletionStage<SomeModel> doIt() {
        return service.myService();
    }
I don't see the point of doing that. Tomcat in Spring Boot has x number of threads. All the threads are used to process user requests. What is the point of using a different thread pool here? Anyway the user expects to see response coming back.
CompletableFuture is used to process tasks asynchronously. If your application has two tasks that are independent of each other, you can execute them concurrently to reduce the processing time:
    public CompletionStage<Void> myService() {
        CompletableFuture<?> first = CompletableFuture.supplyAsync(() -> doingVeryBigThing(), heavyLoadBean);
        CompletableFuture<?> second = CompletableFuture.supplyAsync(() -> doingAnotherBigThing(), heavyLoadBean);
        // Completes when both independent tasks have finished.
        return CompletableFuture.allOf(first, second);
    }
In the above example, doingVeryBigThing() and doingAnotherBigThing() are two tasks that are independent of each other, so they can be executed concurrently by two different threads from the heavyLoadBean thread pool. The example below typically prints two different thread names:
    public void myService() {
        // runAsync takes a Runnable; supplyAsync needs a lambda that returns a value.
        CompletableFuture.runAsync(() -> System.out.println(Thread.currentThread().getName()), heavyLoadBean);
        CompletableFuture.runAsync(() -> System.out.println(Thread.currentThread().getName()), heavyLoadBean);
    }
If you don't provide the thread pool, the supplied Supplier is executed by ForkJoinPool.commonPool() by default:
    public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier)
Returns a new CompletableFuture that is asynchronously completed by a task running in the ForkJoinPool.commonPool() with the value obtained by calling the given Supplier.
    public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier, Executor executor)
Returns a new CompletableFuture that is asynchronously completed by a task running in the given executor with the value obtained by calling the given Supplier.
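A small self-contained sketch of the two overloads quoted above: two independent tasks run on a dedicated pool and are joined with thenCombine, and a third task is submitted without an executor. DedicatedPoolDemo and runBoth are made-up names, and a fixed pool of two threads is an assumption chosen to keep the output easy to read.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DedicatedPoolDemo {

    /** Runs two independent tasks on the given pool and joins their results. */
    static String runBoth(ExecutorService pool) {
        CompletableFuture<String> first =
                CompletableFuture.supplyAsync(() -> "A@" + Thread.currentThread().getName(), pool);
        CompletableFuture<String> second =
                CompletableFuture.supplyAsync(() -> "B@" + Thread.currentThread().getName(), pool);
        // thenCombine completes once both tasks have finished.
        return first.thenCombine(second, (a, b) -> a + ", " + b).join();
    }

    public static void main(String[] args) {
        ExecutorService heavyLoadPool = Executors.newFixedThreadPool(2);
        try {
            // Thread names come from the dedicated pool (pool-N-thread-M by default).
            System.out.println(runBoth(heavyLoadPool));
            // No executor argument: normally a ForkJoinPool.commonPool() worker runs this.
            System.out.println(CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName()).join());
        } finally {
            heavyLoadPool.shutdown();
        }
    }
}
```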
Please check the comments on the main post and the other solutions. They will give you a better understanding of Java 8 CompletableFuture. I just don't feel the right answer was given, though.
From our discussions, I can see that the purpose of using a different thread pool instead of the default one is that the default thread pool is also used by the main web server (Spring Boot with Tomcat), say with 8 threads.
If we use up all 8 threads, the server appears unresponsive. However, if you use a different thread pool and exhaust it with your long-running processes, you get a different error in your code, and the server can still respond to other user requests.
Correct me if I'm wrong.
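The isolation argument can be tried out with plain java.util.concurrent. The sketch below assumes a deliberately tiny, bounded "heavy" pool (one worker, no queue), which is not what newWorkStealingPool() gives you, plus a second pool standing in for request threads. PoolIsolation and heavyPoolFailsAlone are hypothetical names. Saturating the heavy pool produces a RejectedExecutionException there, while the other pool keeps answering.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolIsolation {

    /** Returns true if a saturated heavy pool rejects work while the request pool still answers. */
    static boolean heavyPoolFailsAlone() {
        // One worker, no queue: a second task is rejected while the first is running.
        ThreadPoolExecutor heavy = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        ExecutorService requests = Executors.newFixedThreadPool(2); // stand-in for request threads
        CountDownLatch release = new CountDownLatch(1);
        try {
            heavy.execute(() -> awaitQuietly(release)); // occupies the only heavy worker
            boolean rejected = false;
            try {
                heavy.execute(() -> { });
            } catch (RejectedExecutionException e) {
                rejected = true; // the failure is confined to the heavy pool
            }
            boolean stillResponsive = "ok".equals(requests.submit(() -> "ok").get());
            return rejected && stillResponsive;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the blocked heavy task finish
            heavy.shutdown();
            requests.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Whether overload shows up as a rejection, a growing queue or simply slower responses depends on how the dedicated pool is configured, so the bounded pool here is only one possible choice.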

Spring Scheduler stops working for my cron expression

I have a method scheduled to run periodically with Spring Scheduler. It had been working fine, but it stopped working today with no error. What could be the cause? Is there an alternative way to schedule a task periodically with Spring Scheduler that ensures the method is executed no matter what?
    @Scheduled(cron="0 0/1 * * * ?")
    public void executePollingFlows() {
        if (applicationConfig.isScheduleEnabled()) {
            for (long flowId : applicationConfig.getPollingFlowIds()) {
                flowService.executeFlow(flowId);
            }
            logger.info("Finished executing all polling flows at {}", new Date());
        }
    }
You may have hit an OutOfMemoryError if the job could not finish its tasks but kept being started again and again. If that is the case, you can create a thread pool and check it on every run; if the pool has no free capacity, skip the task for that turn.
There is an alternative way to use @Scheduled periodically. You can replace your @Scheduled annotation with this:
    @Scheduled(fixedRate=1000)
fixedRate is given in milliseconds, so this runs every second; to keep the one-minute interval of your cron expression, use fixedRate=60000. If necessary, you can add an initialDelay:
    @Scheduled(initialDelay=1000, fixedRate=1000)
You can find more details about fixedRate, initialDelay and fixedDelay here:
https://docs.spring.io/spring/docs/current/spring-framework-reference/html/scheduling.html
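The "skip the task for this turn" idea from the answer can be sketched as a small wrapper in plain Java. SkipIfBusyTask is a hypothetical name, and an AtomicBoolean flag is used here as a simpler stand-in for inspecting a thread pool. The wrapper also catches runtime exceptions, which matters with a plain ScheduledExecutorService because an exception thrown by a periodic task suppresses its later executions there.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class SkipIfBusyTask implements Runnable {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicInteger skipped = new AtomicInteger();
    private final Runnable work;

    SkipIfBusyTask(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        // Only one caller wins the flag; others skip this turn instead of piling up.
        if (!running.compareAndSet(false, true)) {
            skipped.incrementAndGet();
            return;
        }
        try {
            work.run();
        } catch (RuntimeException e) {
            // Log and swallow so one failed run does not prevent later runs.
            System.err.println("Scheduled run failed: " + e);
        } finally {
            running.set(false);
        }
    }

    /** Number of invocations that were skipped because a run was still active. */
    int skippedCount() {
        return skipped.get();
    }
}
```

A scheduled method could delegate to such a wrapper, so that a trigger arriving while the previous run is still active returns immediately. The guard is mainly useful when the same task can be triggered from more than one thread.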
