ThreadPoolTaskExecutor execution strategy - spring

I'm using Spring's ThreadPoolTaskExecutor abstraction to execute tasks on threads.
The execute method is described in the Javadoc as follows:
Execute the given task.
The call might return immediately if the implementation uses an asynchronous execution strategy, or might block in the case of synchronous execution.
I have two questions:
Where can the execution strategy be configured?
If the execution strategy is set to "synchronous", how can it serve me? It seems weird to use an executor that works synchronously.
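To make the two strategies from the Javadoc concrete, here is a minimal plain-JDK sketch, not Spring code. The class and method names are hypothetical. A synchronous executor runs the task inline on the caller's thread, which is handy in tests where deterministic ordering matters, while a pooled executor hands the task to a worker thread. In Spring terms, SyncTaskExecutor plays the first role and ThreadPoolTaskExecutor the second, so the "strategy" is mostly a matter of which TaskExecutor implementation you wire in.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; the names here are illustrative, not Spring API.
class ExecutionStrategyDemo {

    // Runs one task on the given executor and returns the thread that ran it.
    static Thread threadUsedBy(Executor executor) {
        AtomicReference<Thread> used = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            used.set(Thread.currentThread());
            done.countDown();
        });
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return used.get();
    }

    // Synchronous strategy: execute() runs the task inline and blocks the caller.
    static boolean synchronousRunsOnCaller() {
        Executor synchronous = Runnable::run;
        return threadUsedBy(synchronous) == Thread.currentThread();
    }

    // Asynchronous strategy: execute() hands the task to a pool thread and returns.
    static boolean pooledRunsOnCaller() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return threadUsedBy(pool) == Thread.currentThread();
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling synchronousRunsOnCaller() should report that the task ran on the calling thread, while pooledRunsOnCaller() should report that it did not.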

Related

Spring @Scheduled - run concurrently

I have a Spring Boot 2.1.6.RELEASE application in which I have a method annotated with
@Scheduled(cron = "*/10 * * * * *")
I want it to run with that cron, EVEN IF another execution is already in progress.
I tried increasing the executor thread number using the application.properties file:
spring.task.scheduling.pool.size=10
But it didn't seem to work as it is still waiting for an execution to finish before starting the next one.
What is the proper way to do parallel executions using a cron in the @Scheduled annotation?
It is true that the default pool size for the task scheduler is 1, but increasing the pool size only makes more threads available for other @Scheduled methods. By design, executions of the same method do not run in parallel, since otherwise threads could become exhausted.
If you want the same method to run in parallel, you need to use the @EnableAsync and @Async annotations. You might also want to change the pool size of the task executor. That said, keep in mind that you may still exhaust your threads, so be very careful when changing this intended behaviour.
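The effect described above can be sketched with plain JDK classes. This is an analogy under stated assumptions, not Spring's implementation: a single-threaded ScheduledExecutorService stands in for the task scheduler, and a separate worker pool stands in for the executor that @Async would dispatch to. The class name, timings, and pool sizes are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: a job that takes 300 ms is triggered every 100 ms.
class OverlappingScheduleDemo {

    // Returns the highest number of simultaneously running jobs observed.
    static int maxConcurrentRuns(boolean handOffToWorkers) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        ExecutorService workers = Executors.newFixedThreadPool(4);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();

        Runnable job = () -> {
            int now = running.incrementAndGet();
            max.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(300); // simulated slow work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
            }
        };

        Runnable tick;
        if (handOffToWorkers) {
            // The scheduler thread only submits the job and is free for the next tick.
            tick = () -> workers.execute(job);
        } else {
            // The job runs on the scheduler thread, so the next run waits for it.
            tick = job;
        }

        scheduler.scheduleAtFixedRate(tick, 0, 100, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(700); // observation window
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        workers.shutdownNow();
        return max.get();
    }
}
```

Without the hand-off the runs never overlap. With it, several runs are in flight at once, bounded by the worker pool size, which is also why an undersized or unbounded pool can still leave you short of threads.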

Spring integration: testing poller dependent logic

I wonder how I could write Spring tests that assert a chain of logic triggered by a 'SourcePollingChannelAdapter'.
What comes to mind:
Use Thread.sleep(), which is a really bad idea for tests.
Have a separate test version of the Spring context in which I replace all pollable channels with direct ones. This requires a lot of work.
Are there any common ways to force-trigger the poller within a test?
Typically we use a QueueChannel in our tests and wait for the messages via its receive(10000) method. This way, regardless of the source of the data, the test method's thread is blocked until data has arrived.
The SourcePollingChannelAdapter is triggered by the TaskScheduler, so the whole flow runs on a separate thread from the test method. That means your idea of replacing the channels won't help. Thread.sleep() might have some value, but QueueChannel.receive(10000) is much more reliable because we wait at most those 10 seconds.
Another way to block the test case is the standard CountDownLatch: call countDown() somewhere in the flow and wait for it in the test method.
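A minimal sketch of the latch approach, using only the JDK. The background thread here is a stand-in for the poller-driven flow, and the class and method names are hypothetical. In a real test, the countDown() call would sit in a handler or interceptor somewhere in the flow.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing the wait-with-timeout pattern.
class LatchWaitDemo {

    // Starts simulated flow work on another thread and waits for its signal.
    // Returns true if the flow signalled completion before the timeout.
    static boolean flowCompletedWithin(long workMillis, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(1);

        Thread flow = new Thread(() -> {
            try {
                Thread.sleep(workMillis); // stand-in for the polled flow
                latch.countDown();        // the flow signals that it finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        flow.setDaemon(true);
        flow.start();

        try {
            // Blocks only as long as needed, up to the timeout.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Unlike a fixed Thread.sleep(), the test proceeds as soon as the flow finishes and fails cleanly when the timeout elapses.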
There is another way to test: run a loop with a short sleep between iterations and check some condition to exit and verify. That can be useful when the flow starts with a poller and ends in a database, so we would perform a SELECT in that loop until the desired state is reached.
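The polling loop might look like the following sketch. The helper name and parameters are hypothetical, and the BooleanSupplier is where a SELECT against the database would go in a real test.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition until it holds or a deadline passes.
class PollUntilDemo {

    static boolean awaitCondition(BooleanSupplier condition,
                                  long timeoutMillis,
                                  long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true; // desired state reached, stop waiting
            }
            try {
                Thread.sleep(intervalMillis); // short pause between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last check at the deadline before giving up.
        return condition.getAsBoolean();
    }
}
```

A test would then assert on the return value, for example awaitCondition(() -> rowCount() == 1, 10000, 100), where rowCount() is an assumed helper that runs the query.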
You can find some additional info in the Reference Manual.

Issue with Spring Boot Async method

I have an asynchronous method enabled using the @Async annotation. At times I am seeing the SimpleAsyncTaskExecutor thread count increase exponentially. Any idea about this behavior?
If it increases literally exponentially it sounds like the async method is calling itself perhaps?
By default, Spring uses a SimpleAsyncTaskExecutor to run the methods asynchronously.
SimpleAsyncTaskExecutor spawns a new thread with each task and does not support thread pooling and queueing of tasks.
So, if the async method is called multiple times in a short span of time, a new thread is started for each call.
You should define your own executor. Refer to the following link:
http://www.baeldung.com/spring-async
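The difference between a thread-per-task executor and a bounded pool can be seen in a small plain-JDK sketch. This is not Spring code, and the class name, pool size, and queue capacity are illustrative assumptions. In a Spring application the pooled variant would typically be a ThreadPoolTaskExecutor bean configured with a core pool size, a max pool size, and a queue capacity.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo comparing thread-per-task with a bounded pool.
class BoundedExecutorDemo {

    // Submits `tasks` short tasks and counts how many distinct threads ran them.
    static int distinctThreads(Executor executor, int tasks) {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                seen.add(Thread.currentThread());
                try {
                    Thread.sleep(20); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    // One new thread per task: no pooling, no queueing.
    static int threadPerTask(int tasks) {
        Executor perTask = task -> new Thread(task).start();
        return distinctThreads(perTask, tasks);
    }

    // Fixed number of threads; extra tasks wait in a bounded queue.
    static int boundedPool(int tasks, int poolSize) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(100));
        try {
            return distinctThreads(pool, tasks);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 20 tasks, threadPerTask uses 20 threads, whereas boundedPool with a pool size of 4 never uses more than 4.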

Multiple Spring Batch instances for robustness and scalability

My batch use case looks like a common pattern, yet I'm not sure if Spring Batch is designed to work as I expect. Many thanks in advance for clarifications and suggestions.
There can be a number of spring-batch based applications responsible for tasks processing. Processing requests come via REST resource, hence ThreadPoolTaskExecutor is used (as discussed here). JobRegistry is based on JDBC, all instances share exactly the same configuration.
What I want to achieve is a situation where each node can process any job that has been submitted. This way I can scale my solution out as load grows - I would simply add new instances that would process queued requests. Solution is also robust: if any node dies, its tasks are automatically handled by a different node.
But it does not work this way with Spring Batch apparently. Each node seems to handle the tasks that were handled on that very node. Even if node A has 1000 items in the queue and node B does nothing, it will not take any of the A's load.
That's because of how SimpleJobLauncher works: it simply queues a task into the taskExecutor after creating the respective JobExecution:
JobExecution jobExecution = this.jobRepository.createJobExecution(job.getName(), jobParameters);
this.taskExecutor.execute(new Runnable(job, jobParameters, jobExecution) ....
I don't think I need job partitioning, as it does not seem to fit this use case. So how do I achieve robustness and scalability with Spring Batch?
Thanks
f

Parallel step execution of ItemStreamReader in SpringBatch

I have an ItemStreamReader (extends AbstractItemCountingItemStreamItemReader). The reader on its own is quite fast, but the subsequent processing takes quite some time.
From a business point of view I can process as many items in parallel as I want.
As my ItemStreamReader is reading a large JSON file with a JsonParser, it ends up being stateful. So just adding a TaskExecutor to the Step does not work; it throws parsing exceptions, and Spring Batch logs the following output:
16:51:41.023 [main] WARN o.s.b.c.s.b.FaultTolerantStepBuilder - Asynchronous TaskExecutor detected with ItemStream reader. This is probably an error, and may lead to incorrect restart data being stored.
16:52:29.790 [jobLauncherTaskExecutor-1] WARN o.s.b.core.step.item.ChunkMonitor - No ItemReader set (must be concurrent step), so ignoring offset data.
16:52:31.908 [feed-import-1] WARN o.s.b.core.step.item.ChunkMonitor - ItemStream was opened in a different thread. Restart data could be compromised.
How can I have the processing in my Step executed in parallel by multiple threads?
Spring Batch provides a number of ways to parallelize processing. In your case, since processing seems to be the bottleneck, I'd recommend looking at two options:
AsyncItemProcessor/AsyncItemWriter
The AsyncItemProcessor and AsyncItemWriter work in tandem to parallelize the processing of items within a chunk. You can think of them as a kind of fork/join concept. The items within the chunk are read by a single thread as normal. The AsyncItemProcessor wraps your normal ItemProcessor and executes that logic on a different thread, returning a Future instead of the actual item. The AsyncItemWriter then waits for the Future to return the processed item before writing it. These classes are found in the Spring Batch Integration module. You can read more about them in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#asynchronous-processors
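The fork/join idea can be illustrated with plain JDK futures. This sketch is not the Spring Batch API: the class and method names are hypothetical, and it only mirrors the shape described above, where a single thread reads the chunk, a "processor" step returns a Future per item, and a "writer" step unwraps each Future before writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of the processor/writer hand-off via Futures.
class AsyncProcessSketch {

    // "Processor" side: fork each item's processing onto the pool.
    static List<Future<String>> processAsync(List<String> chunk,
                                             Function<String, String> processor,
                                             ExecutorService pool) {
        List<Future<String>> futures = new ArrayList<>();
        for (String item : chunk) {
            Callable<String> task = () -> processor.apply(item);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    // "Writer" side: join on each Future, preserving the read order.
    static List<String> writeAll(List<Future<String>> futures) {
        List<String> written = new ArrayList<>();
        for (Future<String> future : futures) {
            try {
                written.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return written;
    }

    // Reads stay single-threaded; only the processing runs in parallel.
    static List<String> runChunk(List<String> chunk) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            return writeAll(processAsync(chunk, String::toUpperCase, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the reader is never touched by the pool threads, a stateful reader such as the JSON parser in the question stays safe, while the slow processing step is spread across threads.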
Remote Chunking
The AsyncItemProcessor/AsyncItemWriter paradigm works well in a single JVM, but if you need to scale your processing further, you may want to take a look at remote chunking. Remote chunking is designed to scale the processor piece of a step beyond a single JVM. Using a master/slave configuration, the master reads the input using a regular ItemReader. The items are then sent via Spring Integration channels to the slaves for processing. The results can either be written in the slave or returned to the master for writing. It's important to note that in this approach, each item read by the master goes over the wire, so it can be very IO intensive and should only be considered if the processing bottleneck is worse than the potential cost of sending the messages. You can read more about remote chunking in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#externalizing-batch-process-execution

Resources