How to Fan Out Inside Chained CompletableFuture? - java-8

I want to chain a CompletableFuture such that it fans out in the middle of processing. By this I mean I have a CompletableFuture whose result contains a list, and I want to apply a computation to each item in that list.
The first step is to call m_myApi.getResponse(request, executor) which issues an async call.
The result of that async call has a getCandidates method. I want to parse all of those candidates in parallel.
Currently, my code parses them all serially:
public CompletableFuture<List<DOMAIN_OBJECT>> parseAllCandidates(@Nonnull final REQUEST request, @Nonnull final Executor executor)
{
    CompletableFuture<RESPONSE> candidates = m_myApi.getResponse(request, executor);
    return candidates.thenApplyAsync(response -> response.getCandidates()
            .stream()
            .map(MyParser::ParseCandidates)
            .collect(Collectors.toList()));
}
I want something like this:
public CompletableFuture<List<DOMAIN_OBJECT>> parseAllCandidates(@Nonnull final REQUEST request, @Nonnull final Executor executor)
{
    CompletableFuture<RESPONSE> candidates = m_myApi.getResponse(request, executor);
    return candidates.thenApplyAsync(response -> response.getCandidates()
            .stream()
            .PARSE_IN_PARALLEL_USING_EXECUTOR
}

As said in this answer, if the Executor happens to be a Fork/Join pool, there is the (undocumented) feature that commencing a parallel stream in one of its worker threads will perform the parallel operation using that executor.
When you want to support arbitrary Executor implementations, things are more complicated. One solution looks like this:
public CompletableFuture<List<DOMAIN_OBJECT>> parseAllCandidates(
        @Nonnull final REQUEST request, @Nonnull final Executor executor)
{
    CompletableFuture<RESPONSE> candidates = m_myApi.getResponse(request, executor);
    return candidates.thenComposeAsync(
        response -> {
            List<CompletableFuture<DOMAIN_OBJECT>> list = response.getCandidates()
                .stream()
                .map(CompletableFuture::completedFuture)
                .map(f -> f.thenApplyAsync(MyParser::ParseCandidates, executor))
                .collect(Collectors.toList());
            return CompletableFuture.allOf(list.toArray(new CompletableFuture<?>[0]))
                .thenApplyAsync(x ->
                    list.stream().map(CompletableFuture::join).collect(Collectors.toList()),
                    executor);
        },
        executor);
}
The first crucial thing is that we have to submit all potentially asynchronous jobs before starting to wait on any of them, to enable the maximum parallelism the executor might support. Hence, we have to collect all futures in a List as a first step.
In the second step, we could just iterate over the list and join all futures. If the executor is a Fork/Join pool and a future has not completed yet, it would detect this and start a compensation thread to regain the configured parallelism. However, for arbitrary executors, we cannot assume such a feature. Most notably, if the executor is a single-thread executor, this could lead to a deadlock.
Therefore, the solution uses CompletableFuture.allOf to perform the iterating and joining of all futures only once all of them have completed. As a result, this solution never blocks an executor's thread, making it compatible with any Executor implementation.
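For illustration, a minimal usage sketch (the pool size is arbitrary, and request stands for whatever REQUEST instance you already have):
ExecutorService pool = Executors.newFixedThreadPool(4);
parseAllCandidates(request, pool)
        .thenAccept(domainObjects -> System.out.println("parsed " + domainObjects.size()))
        .join();   // joining here only for the demo; normally keep composing instead of blocking
pool.shutdown();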

There already is a version of thenApply that takes an Executor as an additional argument.
<U> CompletionStage<U> thenApplyAsync(Function<? super T, ? extends U> fn, Executor executor)
If you pass a Fork/Join pool there, then a parallel stream inside the lambda will use the passed executor instead of the common pool.
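For example, a hedged sketch reusing the names from the question (the pool size is arbitrary); this relies on the undocumented behaviour described above, so it only helps when the executor really is a Fork/Join pool:
ForkJoinPool forkJoinPool = new ForkJoinPool(8);

CompletableFuture<List<DOMAIN_OBJECT>> result =
        candidates.thenApplyAsync(response -> response.getCandidates()
                .parallelStream()                      // runs on forkJoinPool's worker threads
                .map(MyParser::ParseCandidates)
                .collect(Collectors.toList()),
                forkJoinPool);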

Related

Spring boot: Separate thread pool for specific endpoint

Given a microservice in Spring Boot, it offers 2 end-points to be consumed by 2 separate systems.
One of these systems is critical, while the other one is not.
I would like to prevent the "not critical" one from consuming (due to unexpected problems) all (or many) of the threads of the HTTP thread pool, so I would like to configure separate thread pools for each of these end-points.
Is that possible?
There are multiple ways to do this. Using DeferredResult is probably the easiest way:
@RestController
public class Controller {

    private final Executor performancePool = Executors.newFixedThreadPool(128);
    private final Executor normalPool = Executors.newFixedThreadPool(16);

    @GetMapping("/performance")
    DeferredResult<String> performanceEndPoint() {
        DeferredResult<String> result = new DeferredResult<>();
        performancePool.execute(() -> {
            try {
                Thread.sleep(5000); // a long-running task
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            result.setResult("Executed in performance pool");
        });
        return result;
    }

    @GetMapping("/normal")
    DeferredResult<String> normalEndPoint() {
        DeferredResult<String> result = new DeferredResult<>();
        normalPool.execute(() -> result.setResult("Executed in normal pool"));
        return result;
    }
}
You immediately release the Tomcat thread by returning a DeferredResult from a controller, allowing it to serve other requests. The actual response is written to the user when the .setResult method is called.
DeferredResult is one of the many ways you can perform asynchronous request processing in Spring. Check out this section of the docs to learn more about the other ways:
https://docs.spring.io/spring-framework/docs/current/reference/html/web.html#mvc-ann-async
Not sure you can prevent that, but you can surely increase the thread pool capacity. By default, Tomcat (if it is the default server) can handle 200 simultaneous requests; you can increase that number.
Check if this question helps:
https://stackoverflow.com/questions/46893237/can-spring-boot-application-handle-multiple-requests-simultaneously
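For example, in application.properties (the value 400 is arbitrary; the property name depends on the Spring Boot version):
# Spring Boot 2.3 and later
server.tomcat.threads.max=400
# earlier versions
# server.tomcat.max-threads=400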

Spring Data Redis - StreamMessageListenerContainer only spawning one thread

I am using Spring Data Redis to subscribe to the 'task' Redis stream to process tasks.
For some reason the Redis stream consumer only spawns one thread and processes one message at a time, sequentially, even though I explicitly provide a thread-pool TaskExecutor.
I expect it to delegate the creation of threads to the provided thread pool and spawn threads up to the pool's configured limits. I can see that it is using the given TaskExecutor, but it's not spawning more than one thread.
Even when I don't specify my own taskExecutor and it internally defaults to SimpleAsyncTaskExecutor, the problem persists. Tasks are processed sequentially, one at a time, one after the other, even when they are long-running tasks.
What am I missing here?
@Bean
public Subscription redisTaskStreamListenerContainer(
        RedisConnectionFactory connectionFactory,
        @Qualifier("task") RedisTemplate<String, Task<TransportEnvelope>> redisTemplate,
        @Qualifier("task") StreamListener<String, MapRecord<String, String, String>> listener,
        @Qualifier("task") Executor taskListenerExecutor) {

    StreamMessageListenerContainerOptions<String, MapRecord<String, String, String>> containerOptions =
            StreamMessageListenerContainerOptions.builder()
                    .pollTimeout(Duration.ofMillis(consumerPollTimeOutInMilli))
                    .batchSize(consumerReadBatchSize)
                    .executor(taskListenerExecutor)
                    .build();

    StreamMessageListenerContainer<String, MapRecord<String, String, String>> container =
            StreamMessageListenerContainer.create(connectionFactory, containerOptions);

    StreamMessageListenerContainer.ConsumerStreamReadRequest<String> readOptions =
            StreamMessageListenerContainer.StreamReadRequest
                    .builder(StreamOffset.create(streamName, ReadOffset.lastConsumed()))
                    // turn off auto shutdown of the stream consumer if an error occurs
                    .cancelOnError((ex) -> false)
                    .consumer(Consumer.from(groupId, consumerId))
                    .build();

    Subscription subscription = container.register(readOptions, listener);
    container.start();
    return subscription;
}
@Bean
@Qualifier("task")
public Executor redisListenerThreadPoolTaskExecutor() {
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(30);
    threadPoolTaskExecutor.setMaxPoolSize(50);
    threadPoolTaskExecutor.setQueueCapacity(Integer.MAX_VALUE);
    threadPoolTaskExecutor.setThreadNamePrefix("redis-listener-");
    threadPoolTaskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    return threadPoolTaskExecutor;
}
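I'm not certain from the code alone, but as far as I can tell the container polls and delivers records to the listener on a single task per subscription, so one common workaround (a hedged sketch, not from the post; handleTask and the "taskWorker" qualifier are hypothetical) is to hand each record off to a worker pool inside the listener itself:
@Bean
@Qualifier("task")
public StreamListener<String, MapRecord<String, String, String>> taskListener(
        @Qualifier("taskWorker") Executor workerPool) {   // a separate worker pool bean
    // the container's polling thread only enqueues; the actual work runs in parallel on workerPool
    return record -> workerPool.execute(() -> handleTask(record));
}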

Kafka Streams Processor API batching on size and time

Trying to batch the records using the Kafka Streams Processor API. Batching is based on size and time. Let's say, if the batch size reaches 10, or the last batch was processed more than 10 seconds ago (size or last processed time, whichever comes first), then call an external API to send the batch and commit using ProcessorContext.
Using punctuate to periodically check whether the batch can be cleared and sent to the external system.
Question - Can the process method of the Processor API be invoked by the Streams API while the punctuate thread is executing? Since the code calls commit in the punctuate thread, can context.commit() commit records that have not yet been processed by the process method?
Is it possible that the punctuate thread and the process method are executed at the same time in different threads? If so, the code I have commits records that are not processed yet.
public class TestProcessor extends AbstractProcessor<String, String> {

    private ProcessorContext context;
    private List<String> batchList = new LinkedList<>();
    private AtomicLong lastProcessedTime = new AtomicLong(System.currentTimeMillis());
    private static final Logger LOG = LoggerFactory.getLogger(TestProcessor.class);

    @Override
    public void init(ProcessorContext context) {
        LOG.info("Calling init method " + context.taskId());
        this.context = context;
        context.schedule(10000, PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            if (batchList.size() > 0 && System.currentTimeMillis() - lastProcessedTime.get() > 10000) {
                // call external API
                batchList.clear();
                lastProcessedTime.set(System.currentTimeMillis());
            }
            context.commit();
        });
    }

    @Override
    public void process(String key, String value) {
        batchList.add(value);
        LOG.info("Context details " + context.taskId() + " " + context.partition() + " " +
                "storeSize " + batchList.size());
        if (batchList.size() == 10) {
            // call external API to send the batch
            batchList.clear();
            lastProcessedTime.set(System.currentTimeMillis());
        }
        context.commit();
    }

    @Override
    public void close() {
        if (batchList.size() > 0) {
            // call external API to send the left over records
            batchList.clear();
        }
    }
}
Can the processor API process method be invoked by streams API when the punctuate thread is being executed?
No, it's not possible: the Processor executes the process and punctuate methods on a single thread (the same thread is used for both methods).
Is it possible that the punctuate thread and process method are being executed at the same time in different threads?
No, that is not possible either, for the reason described above.
Take into consideration that each topic partition will have its own instance of your TestProcessor class. Instead of the local variables batchList and lastProcessedTime, I recommend using a Kafka state store such as KeyValueStore, so your stream will be fault tolerant.
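A hedged sketch of that idea (not from the original answer; the store name "batch-store" is made up, and the size-based flush from the question is left out for brevity). The store has to be registered on the topology, e.g. with Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("batch-store"), Serdes.String(), Serdes.String()) attached to this processor:
public class StateStoreBatchProcessor extends AbstractProcessor<String, String> {

    private KeyValueStore<String, String> batchStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        // assumes non-null record keys
        batchStore = (KeyValueStore<String, String>) context.getStateStore("batch-store");
        context.schedule(10000, PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, String> it = batchStore.all()) {
                while (it.hasNext()) {
                    KeyValue<String, String> entry = it.next();
                    // send entry.value to the external API, then drop it from the store
                    batchStore.delete(entry.key);
                }
            }
            context.commit();
        });
    }

    @Override
    public void process(String key, String value) {
        // buffering in the store survives restarts, unlike the in-memory list
        batchStore.put(key, value);
    }
}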

Does Google Guava Cache do deduplication when refreshing value of the same key

I implemented a non-blocking cache using Google Guava; there's only one key in the cache, and the value for the key is only refreshed asynchronously (by overriding reload()).
My question is: does the Guava cache handle de-duplication if the first reload() task hasn't finished and a new get() request comes in?
// Cache is defined like below
this.cache = CacheBuilder
        .newBuilder()
        .maximumSize(1)
        .refreshAfterWrite(10, TimeUnit.MINUTES)
        .recordStats()
        .build(loader);

// reload is overridden to refresh asynchronously
@Override
public ListenableFuture<Map<String, CertificateInfo>> reload(final String key, Map<String, CertificateInfo> prevMap) throws IOException {
    LOGGER.info("Refreshing certificate cache.");
    ListenableFutureTask<Map<String, CertificateInfo>> task = ListenableFutureTask.create(new Callable<Map<String, CertificateInfo>>() {
        @Override
        public Map<String, CertificateInfo> call() throws Exception {
            return actuallyLoad();
        }
    });
    executor.execute(task);
    return task;
}
Yes, see the documentation for LoadingCache.get(K) (and its sibling, Cache.get(K, Callable)):
If another call to get(K) or getUnchecked(K) is currently loading the value for key, simply waits for that thread to finish and returns its loaded value.
So if a cache entry is currently being loaded, other threads that try to retrieve that entry simply wait for that computation to finish, and during a refresh of an existing entry, other readers keep getting the old value until the new one is ready - in neither case do they kick off their own redundant refresh.
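For what it's worth, a hedged sketch (reusing the Map<String, CertificateInfo> value type and the actuallyLoad() method from the question) of the same asynchronous reload using the built-in CacheLoader.asyncReloading helper instead of overriding reload() by hand; readers keep getting the old value while the reload runs on the given executor:
ExecutorService refreshPool = Executors.newSingleThreadExecutor();

CacheLoader<String, Map<String, CertificateInfo>> syncLoader =
        new CacheLoader<String, Map<String, CertificateInfo>>() {
            @Override
            public Map<String, CertificateInfo> load(String key) throws Exception {
                return actuallyLoad();
            }
        };

LoadingCache<String, Map<String, CertificateInfo>> cache = CacheBuilder
        .newBuilder()
        .maximumSize(1)
        .refreshAfterWrite(10, TimeUnit.MINUTES)
        .recordStats()
        // wraps the synchronous loader so reload() runs on refreshPool
        .build(CacheLoader.asyncReloading(syncLoader, refreshPool));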

Oozie custom asynchronous action

I have a problem implementing a custom asynchronous action in Oozie. My class extends ActionExecutor and overrides the methods initActionType, start, end, check, kill and isCompleted.
In the start method, I want to start a YARN job that is implemented through my BiohadoopClient class. To make the call asynchronous, I wrapped the client.run() method in a Callable:
public void start(final Context context, final WorkflowAction action) {
    ...
    Callable<String> biohadoop = new Callable<String>() {
        @Override
        public String call() throws Exception {
            BiohadoopClient client = new BiohadoopClient();
            client.run();
            return null;
        }
    };

    // submit callable to executor
    executor.submit(biohadoop);

    // set the start data, according to https://oozie.apache.org/docs/4.0.1/DG_CustomActionExecutor.html
    context.setStartData(externalId, callBackUrl, callBackUrl);
    ...
}
This works fine, and, for example, when I use my custom action in a fork/join manner, the execution of the actions runs in parallel.
Now the problem is that Oozie remains in the RUNNING state for these actions. It seems impossible to change that to a completed state. The check() method is never called by Oozie, and the same is true for the end() method. It doesn't help to call context.setExternalStatus(), context.setExecutionData() and context.setEndData() manually in the Callable (after client.run() has finished). I also tried to queue an ActionEndXCommand manually, but without luck.
When I wait in the start() method for the Callable to complete, the state gets updated correctly, but the execution in fork/join isn't parallel anymore (which seems logical, as the execution waits for the Callable to complete).
How external clients notify Oozie workflow with HTTP callback didn't help, as using the callback seems to change nothing (well, I can see in the log files that it happened, but beside that, nothing...). Also, that answer mentioned that the SSH action runs asynchronously, but I haven't found out how this is done. There is some wrapping inside a Callable, but in the end the call() method of the Callable is invoked directly (no submission to an Executor).
So far I haven't found any example of how to write an asynchronous custom action. Can anybody please help me?
Thanks
Edit
Here are the implementations of initActionType(), start(), check() and end(); the Callable implementation can be found inside the start() action.
The Callable is submitted to an executor in the start() action, after which the executor's shutdown() method is invoked - so the executor shuts down after the Callable has finished. As the next step, context.setStartData(externalId, callBackUrl, callBackUrl) is invoked.
private final AtomicBoolean finished = new AtomicBoolean(false);

public void initActionType() {
    super.initActionType();
    log.info("initActionType() invoked");
}

public void start(final Context context, final WorkflowAction action)
        throws ActionExecutorException {
    log.info("start() invoked");

    // Get parameters from Node configuration
    final String parameter = getParameters(action.getConf());

    Callable<String> biohadoop = new Callable<String>() {
        @Override
        public String call() throws Exception {
            log.info("Starting Biohadoop");
            // No difference if check() is called manually
            // or if the next line is commented out
            check(context, action);
            BiohadoopClient client = new BiohadoopClient();
            client.run(parameter);
            log.info("Biohadoop finished");
            finished.set(true);
            // No difference if check() is called manually
            // or if the next line is commented out
            check(context, action);
            return null;
        }
    };

    ExecutorService executor = Executors.newCachedThreadPool();
    biohadoopResult = executor.submit(biohadoop);
    executor.shutdown();

    String externalId = action.getId();
    String callBackUrl = context.getCallbackUrl("finished");
    context.setStartData(externalId, callBackUrl, callBackUrl);
}

public void check(final Context context, final WorkflowAction action)
        throws ActionExecutorException {
    // finished is an AtomicBoolean that is set to true
    // after Biohadoop has finished (see implementation of the Callable)
    if (finished.get()) {
        log.info("check(Context, WorkflowAction) invoked - Callable has finished");
        context.setExternalStatus(Status.OK.toString());
        context.setExecutionData(Status.OK.toString(), null);
    } else {
        log.info("check(Context, WorkflowAction) invoked");
        context.setExternalStatus(Status.RUNNING.toString());
    }
}

public void end(Context context, WorkflowAction action)
        throws ActionExecutorException {
    log.info("end(Context, WorkflowAction) invoked");
    context.setEndData(Status.OK, Status.OK.toString());
}
One thing - I can see you are shutting down the executor right after you have submitted the job - executor.shutdown();. That might be causing the issue. Could you please try moving this statement to the end() method instead?
In the end I didn't find a "real" solution to the problem. The solution that worked for me was to implement an action that invokes the Biohadoop instances in parallel using the Java Executor framework. After the invocation, I wait (still inside the action) for the threads to finish.
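For reference, a hedged sketch of that workaround (BiohadoopClient is from the question; parameters is a hypothetical list of per-instance configurations):
ExecutorService executor = Executors.newFixedThreadPool(parameters.size());
List<Callable<Void>> jobs = new ArrayList<>();
for (final String parameter : parameters) {
    jobs.add(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            new BiohadoopClient().run(parameter);
            return null;
        }
    });
}
try {
    // invokeAll blocks until every client has finished, so the action only
    // completes (and Oozie sees it as done) after all parallel runs are over
    executor.invokeAll(jobs);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} finally {
    executor.shutdown();
}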
