Spring: use multithreading to perform a long task

In a Spring REST application, I want to save 100,000 elements in my database from an external API that returns XML responses.
I can do it using multiple threads, since I am able to retrieve a range of these elements (e.g. 1-1000, 1001-2000) with a GET request to this external API.
Each thread retrieves a range from a ConcurrentLinkedQueue. If this range is not null, the thread downloads the elements; otherwise, it ends.
If I had 8000 elements to retrieve and wanted to do this task with 4 threads, I'd have this situation:
[Image: multithreading task diagram]
I know I can do it using a class derived from Thread, but is there a proper way to use multithreading to achieve a long task in Spring Boot?
I tried with CompletableFuture, but I don't know how to configure the number of threads with it.
Many thanks in advance for your answers!

After some research, I finally found this excellent tutorial (doable with Spring): Tutorial from Calli Coder
You can therefore use a TaskExecutor (as suggested by mckszcz) along with CompletableFuture to perform a long task :)
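A minimal, framework-free sketch of that combination (all names and the body of downloadRange are placeholders, not the asker's real code): passing your own Executor as the second argument of CompletableFuture.supplyAsync is what lets you control the number of threads. In a Spring Boot application you would typically inject a configured ThreadPoolTaskExecutor instead of creating the pool by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RangeDownloader {

    // Stand-in for the real work: GET one range from the external API,
    // parse the XML, save the elements. Returns the number of elements saved.
    static int downloadRange(int start, int end) {
        return end - start + 1;
    }

    // Splits [1..total] into ranges of rangeSize and processes them on a
    // fixed pool of nThreads threads via CompletableFuture.supplyAsync.
    static int downloadAll(int total, int rangeSize, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int start = 1; start <= total; start += rangeSize) {
                int s = start;
                int e = Math.min(start + rangeSize - 1, total);
                // The pool argument bounds the concurrency to nThreads.
                futures.add(CompletableFuture.supplyAsync(() -> downloadRange(s, e), pool));
            }
            // join() waits for each range to finish; sum the saved counts.
            return futures.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }
}
```

With 8000 elements, ranges of 1000 and 4 threads, `downloadAll(8000, 1000, 4)` dispatches 8 range tasks over the 4-thread pool, matching the situation described in the question.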

Related

Schedule a simple GET batch every second or even more often - should I opt for Spring Cloud Task, Spring Batch, or springframework.scheduling?

Context: in my country a new instant-payment system is previewed for November. Basically, the Central Bank will provide two endpoints: (1) a POST endpoint to which we post a single money transfer, and (2) a GET endpoint from which we get the result of a money transfer sent before, possibly completely out of order. Each GET returns at most one money-transfer result, and a response header informs whether there is another result we must GET. It never says how many results are available; it only indicates whether the current result is the last one or whether more remain for the next GET.
Top limitation: from the moment the final user taps the Transfer button in the mobile app until the final result (success or failure) appears on screen, at most 10 seconds may elapse.
Strategy: I want a scheduler which triggers a GET to the Central Bank every second, or even more often. The scheduler will basically invoke a simple function which:
Calls the GET endpoint,
Pushes the result to Kafka or persists it in a database, and
If the response headers inform that more results are available, starts the same function again.
Issue: Since we are Spring users/followers, I thought my decision was between Spring Batch and org.springframework.scheduling.annotation.SchedulingConfigurer/TaskScheduler. I have used Spring Batch successfully for a while, but never with such a short trigger period (never a 1-second period). I stumbled on a discussion that made me wonder whether, in my case of a very simple task with a very short period, I should consider Spring Cloud Data Flow or Spring Cloud Task instead of Spring Batch.
According to this answer, "... Spring Batch is ... designed for the building of complex compute problems ... You can orchestrate Spring Batch jobs with Spring Scheduler if you want". Based on that, it seems I shouldn't use Spring Batch, because my case isn't complex. The design challenge is more about a short trigger period, and about triggering another run from the current one, than about transformation, calculation, or ETL. Nevertheless, as far as I can see, Spring Batch with its tasklets is well designed for restarting, resuming, and retrying, and fits a scenario that never finishes, while org.springframework.scheduling seems to be only a way to trigger an event based on a period configuration. Well, this is my feeling based on personal use and study.
According to an answer to someone asking about orchestration of composed tasks, "... you can achieve your design goals using Spring Cloud Data Flow along with the Spring Cloud Task/Spring Batch...". In my case, I don't see composed tasks: the second trigger doesn't depend on the result of the previous one. It sounds more like "chained" tasks than "composed" ones. I have never used Spring Cloud Data Flow, but it seems a nice candidate for managing/viewing the triggered tasks via a console or dashboard. Nevertheless, I couldn't find anything documenting limitations or rules of thumb for short-period or "chained" triggers.
So my straight question is: which Spring component is currently recommended for such a short trigger period? Assuming Spring Cloud Data Flow is used as the manager/dashboard, which Spring component is recommended as the trigger in such scenarios? Spring Cloud Task seems designed for calling complex functions, Spring Batch seems to add more than I need, and org.springframework.scheduling.* lacks integration with Spring Cloud Data Flow. As an analogy, and not as a comparison: the AWS documentation clearly says not to use CloudWatch for periods under one minute; if you want sub-minute scheduling, CloudWatch starts, each minute, another scheduler/cron that fires every second. There may be a well-known rule of thumb for a simple task that needs to be triggered every second, or even more often, while taking advantage of the Spring family's approach, concerns, and experience.
This may be a stupid answer, but why do you need a scheduler here? Wouldn't a never-ending job achieve the goal?
You start a job; it does a GET request and pushes the result to Kafka.
If the GET response indicates there are more results, it immediately does another GET and pushes that result to Kafka.
If the GET response indicates there are no more results, it sleeps for 1 second and then does the GET request again.
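The never-ending job above can be sketched like this (everything here is a stand-in: a Deque plays the GET endpoint, a List plays Kafka, and hasMore models the "more results" response header):

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PollingJob {
    // Hypothetical result type: payload plus the "more results" header flag.
    record Result(String payload, boolean hasMore) {}

    final Deque<Result> endpoint;                     // stand-in for the GET endpoint
    final List<String> published = new ArrayList<>(); // stand-in for Kafka / the database
    final long idleSleepMillis;                       // 1000 in the real job

    PollingJob(Deque<Result> endpoint, long idleSleepMillis) {
        this.endpoint = endpoint;
        this.idleSleepMillis = idleSleepMillis;
    }

    // One pass of the job: drain every available result, backing off only
    // when the header says nothing more is waiting.
    void poll() throws InterruptedException {
        Result r;
        while ((r = endpoint.poll()) != null) {
            published.add(r.payload());        // push to Kafka / persist
            if (!r.hasMore()) {
                Thread.sleep(idleSleepMillis); // no more results: wait a second
            }                                  // more results: GET again at once
        }
    }
}
```

In production the `while` would never terminate (the endpoint never returns null); it exits here only so the sketch can be exercised with a finite fake queue.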

Kotlin: Create custom CoroutineContext

I'm using Kotlin in my API backend. I don't want to run db queries in the common pool. Basically, I want to create a CoroutineContext with a number of threads that matches the database maximumPoolSize.
What's the best way to accomplish this (generally and for my specific use case)? I know Kotlin provides contexts out of the box, but what's the best approach to create my own?
Bonus question: if I have a JDBC connection pool size of 3, does it make sense to use a CoroutineContext with a thread pool size of 3? Can this guarantee the best concurrency possible?
The function newFixedThreadPoolContext is now considered obsolete as of the current version of Kotlin coroutines (1.3.0): it is annotated with @ObsoleteCoroutinesApi and will give you a warning if you try to use it. The documentation also states that it will be replaced in the future.
The recommended way to create a CoroutineContext is now through
Executors.newFixedThreadPool(3).asCoroutineDispatcher()
So a complete example with imports, which also creates a CoroutineScope, would look like this:
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.asCoroutineDispatcher
import java.util.concurrent.Executors
import kotlin.coroutines.CoroutineContext
fun coroutineScope(threads: Int): CoroutineScope {
    val context: CoroutineContext = Executors.newFixedThreadPool(threads).asCoroutineDispatcher()
    return CoroutineScope(context)
}
You can create a CoroutineContext that's backed by a thread pool with a fixed number of threads using newFixedThreadPoolContext:
val myContext = newFixedThreadPoolContext(nThreads = 3, name = "My JDBC context")
And yes, it seems like a good idea to match your thread pool's size to the connection pool's size, because that way your threads (assuming they each use one connection at a time) will always have a database connection ready for them - here's a blog post suggesting the same.
The answer by zsmb13 works perfectly, but Android Studio warns that newFixedThreadPoolContext is a delicate API and should only be used in specific cases.
The recommended alternative (as of spring 2022) for limiting parallelism is limitedParallelism:
Creates a view of the current dispatcher that limits the parallelism to the given value. The resulting view uses the original dispatcher for execution, but with the guarantee that no more than parallelism coroutines are executed at the same time.
This method does not impose restrictions on the number of views or the total sum of parallelism values, each view controls its own parallelism independently with the guarantee that the effective parallelism of all views cannot exceed the actual parallelism of the original dispatcher.
(from https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-coroutine-dispatcher/limited-parallelism.html)
So an alternative solution would be to create a view whose parallelism matches the db connection pool, like:
val dbDispatcher = Dispatchers.IO.limitedParallelism(maximumPoolSize)
and then use this as coroutine dispatcher.

My concern about Spring Batch: you can't actually multi-thread/read in chunks while reading items

I was trying to batch a simple file. I understand that I can't multi-thread it, so at least I tried to perform better by increasing the chunk parameter:
@Bean
public Step processFileStep() {
    return stepBuilderFactory.get("processSnidFileStep")
        .<MyItem, MyItem>chunk(10)
        .reader(reader())
        ....
My logic needs the processor to 'filter' out non-valid records, but then I found out that the processor cannot receive chunks, only one item at a time:
public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}
In my case I need to access the database and validate my record there, so for each item I have to query the DB (instead of doing it for a bunch of items together).
Can't I multi-thread or make my processing perform better? What am I missing here? It will take too long to process each record one by one from a file.
Thanks.
From past discussions, the CSV reader may have serious performance issues; you might be better served by writing a reader around another CSV parser.
Depending on your validation data, you might create a job-scoped filter bean that wraps a Map which can be either preloaded very quickly or lazily loaded. This way you would limit the hits on the database to either initialization or first reference (respectively), and reduce the filter time to a hash-map lookup.
In the Spring Batch chunk-oriented processing architecture, the only component where you get access to the complete chunk of records is the ItemWriter.
So if you want to do any kind of bulk processing this is where you would typically do that. Either with an ItemWriteListener#beforeWrite or by implementing your own custom ItemWriter.
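A sketch of that chunk-level validation, with Spring Batch itself factored out so only the logic remains (the class name, the id accessor, and lookupValidIds, which stands in for one bulk query such as "... WHERE id IN (...)", are all hypothetical). In a real custom ItemWriter this logic would live in the write method, which receives the whole chunk at once:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BulkValidatingWriter<T> {
    private final Function<T, Long> idOf;                        // extracts the item's id
    private final Function<Set<Long>, Set<Long>> lookupValidIds; // one bulk DB query per chunk

    public BulkValidatingWriter(Function<T, Long> idOf,
                                Function<Set<Long>, Set<Long>> lookupValidIds) {
        this.idOf = idOf;
        this.lookupValidIds = lookupValidIds;
    }

    // Validates the whole chunk with a single lookup instead of one query
    // per item, then keeps only the items whose ids came back as valid.
    public List<T> validItems(List<T> chunk) {
        Set<Long> ids = chunk.stream().map(idOf).collect(Collectors.toSet());
        Set<Long> valid = lookupValidIds.apply(ids); // one round trip per chunk
        return chunk.stream()
                    .filter(item -> valid.contains(idOf.apply(item)))
                    .collect(Collectors.toList());
    }
}
```

With a chunk size of 10, this turns ten single-row queries into one `IN`-clause query per chunk, which is usually where the time goes when validating file records against a database.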

Why not delete objects from Cloud Code?

This answer from Parse says:
You can call destroy() on any ParseObject from Cloud Code to delete them. Deleting, as well as creating or updating, multiple objects from Cloud Code is not recommended, however.
Why? The answerer doesn't say, and it seems like Cloud Code would be exactly the place to bulk update/delete objects. Is he using Cloud Code in opposition to a Cloud background job? Or am I missing some other way to delete objects in Parse?
The linked answer was from before the launch of Background Jobs, which have an increased time-limit.
Cloud Functions have a 15 second maximum run-time. This is why you need to be a little conservative about how many operations you perform in a specific cloud function.
Now, Background Jobs are the recommended path for maintenance-type processes. https://parse.com/docs/cloud_code_guide#jobs
They have a 15-minute time limit and, if you're clever about it, can be used to handle lots of work at near-real-time speeds, e.g. https://gist.github.com/gfosco/131974d200c5e9fc6c94

Spring batch JMS writer/reader example

Anybody know of a good resource for a detailed (more so than the Spring Batch docs) look at the uses of JMS Item Writer/Reader in Spring Batch?
Specifically, and because I'm being tasked with trying to reuse an existing system whose only interface is asynchronous over a queue, I'm wondering if the following is possible:
Step 1: Read some data and build a message.
Step 2: Drop the message on a queue using JmsItemWriter.
Step 3: Wait for the message to come back using JmsItemReader on the response queue.
Step 4: Do some other stuff.
...
Rinse and repeat, a few thousand times a day.
Or in other words: essentially using Spring Batch to force synchronous interaction with an asynchronous resource. Before I get further into research, I'd like to make sure that this is (a) possible, and (b) not a shameless abuse of the framework that will cause major headaches down the road.
Thanks in advance for any info.
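The request/reply round trip in steps 2-3 can be sketched without any JMS broker by letting BlockingQueues stand in for the request and response queues (all names here are placeholders). In Spring Batch terms, the put below corresponds to a JmsItemWriter on the request queue, and the timed poll corresponds to a JmsItemReader whose underlying JmsTemplate has a receive timeout configured, which is what turns the asynchronous queue into a synchronous step:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class RequestReply {
    // Writes the message, then blocks on the response queue until a reply
    // arrives or the timeout expires (returns null on timeout).
    static String roundTrip(BlockingQueue<String> requestQueue,
                            BlockingQueue<String> responseQueue,
                            String message,
                            long timeoutSeconds) throws InterruptedException {
        requestQueue.put(message);                                    // step 2: drop on queue
        return responseQueue.poll(timeoutSeconds, TimeUnit.SECONDS);  // step 3: wait for reply
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> req = new LinkedBlockingQueue<>();
        BlockingQueue<String> res = new LinkedBlockingQueue<>();

        // Stand-in for the existing asynchronous system on the other side.
        Thread legacy = new Thread(() -> {
            try {
                res.put("reply-to-" + req.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        legacy.start();

        System.out.println(roundTrip(req, res, "batch-item-1", 5));
        legacy.join();
    }
}
```

The timeout is the important design point: without one, a lost reply would hang the step forever, so a real implementation needs a timeout plus a retry or skip policy for replies that never come back.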
