I am trying to make parallel calls to the getPrice method, one for each product in products. I have this piece of code and have verified that getPrice runs in separate threads, but the calls run sequentially, not in parallel. Can anyone point out what I am missing here?
Thanks a lot for your help.
ExecutorService service = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
Set<Product> decoratedProductSet = products.stream()
.map(product -> CompletableFuture
.supplyAsync(() -> getPrice(product.getId(), date, context), service))
.map(t -> t.exceptionally(throwable -> null))
.map(t -> t.join())
.collect(Collectors.<Product>toSet());
You are streaming your products, sending each one off to a CompletableFuture, but then waiting for it with join before the stream processes the next one.
Why not use:
products.parallelStream()
.map(p -> getPrice(p.getId(), date, context))
.collect(Collectors.<Product>toSet());
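Alternatively, if you want to keep the custom executor, a common fix (a minimal sketch, assuming getPrice returns a Product as in your snippet) is to submit all the futures first and only join them in a second pass, so the calls overlap instead of being joined one at a time:

// Submit everything first: collecting to a List is a terminal operation,
// so all futures are already running before any join happens.
List<CompletableFuture<Product>> futures = products.stream()
        .map(product -> CompletableFuture
                .supplyAsync(() -> getPrice(product.getId(), date, context), service)
                .exceptionally(throwable -> null))
        .collect(Collectors.toList());

// Now wait for the results; the slowest call bounds the total time.
Set<Product> decoratedProductSet = futures.stream()
        .map(CompletableFuture::join)
        .collect(Collectors.toSet());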
I want to know how to execute code that is guaranteed to run after a REST request in WebFlux has ended, no matter how the request ended (success, error, cancellation, etc.) or what exception was thrown.
Context:
We have a service that offers long-running downloads that export CSV based on a DB stream. We need to restrict the number of requests per tenant, and this is currently implemented in the service (yeah, there might be better options).
To do this, we have a thread-safe counter that is increased when a request starts and should be decreased when the request ends.
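For illustration, the counter is conceptually something like the following minimal sketch (a hypothetical implementation, not the actual parallelDownloadService):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the per-tenant counter; the real service may also enforce a limit.
public class ParallelDownloadService {

    private final ConcurrentHashMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    public void increase(String ownerId) {
        counts.computeIfAbsent(ownerId, id -> new AtomicInteger()).incrementAndGet();
    }

    public void decrease(String ownerId) {
        // Remove the entry once the count for this owner drops back to zero.
        counts.computeIfPresent(ownerId, (id, count) -> count.decrementAndGet() <= 0 ? null : count);
    }
}

The relevant part of the export method looks like this: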
Flux<String> exportMeasurements( MeasurementsExportParameter exportParameter ) {
(...)
Mono<String> header = Mono.just( csvConverter.getHeader() );
Mono<String> utf8ByteOrderMark = Mono.just( UTF8_BYTE_ORDER_MARK.toString() );
Mono<String> headerWithBom = utf8ByteOrderMark.zipWith( header, String::concat );
parallelDownloadService.increase( exportParameter.getOwnerId() );
Flux<String> data = influxTemplate
.queryChunked( queries, exportServiceProperties.getMeasurements().getQueryChunkSize() )
.publishOn( Schedulers.boundedElastic() )
.map( this::handleQueryError )
.buffer( 2, 1 )
.map( this::convertToCSV )
.doOnCancel( () -> parallelDownloadService.decrease( exportParameter.getOwnerId() ) )
.doOnTerminate( () -> parallelDownloadService.decrease( exportParameter.getOwnerId() ) )
.doOnError( exception -> logError( exception, EXPORT_TYPE_MEASUREMENT ) );
return headerWithBom.concatWith( data );
}
That works in most cases, but sometimes it doesn't. I've been able to reproduce it by starting a request via Postman, hitting a breakpoint when the query starts, then cancelling the request in Postman and then resuming the app.
In that case neither doOnCancel, doOnTerminate nor doOnError is called (I added breakpoints). I also tried doFinally, which is not called either.
I've read something about cancellation travelling back upstream, but I'm not sure whether that applies here. Is there any obvious reason why this doesn't work?
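One thing worth trying (a minimal sketch only, under the assumption that the cancel signal never reaches the operators attached to data): attach the bookkeeping to the outermost Flux that is actually returned to WebFlux, since the cancel signal travels from the subscriber back up through that chain first.

Flux<String> data = ...; // query pipeline as above, without the doOnCancel/doOnTerminate hooks

// doFinally fires for ON_COMPLETE, ON_ERROR and CANCEL on the returned sequence,
// so the counter is decreased no matter how the request ends.
return headerWithBom
        .concatWith( data )
        .doFinally( signalType -> parallelDownloadService.decrease( exportParameter.getOwnerId() ) )
        .doOnError( exception -> logError( exception, EXPORT_TYPE_MEASUREMENT ) );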
I probably hate writing noob questions as much as other people hate answering them, but here goes.
I need to split a message retrieved from a JdbcPollingChannelAdapter into multiple messages based on the operation requested in each row of the resultset in the payload.
The split operation is simple enough. What is proving to be a challenge is conditionally routing the message to one flow or the other.
After much trial and error, I believe that this flow represents my intention:
/- insertUpdateAdapter -\
Poll Table -> decorate headers -> split -> router -< >- aggregator -> cleanup
\---- deleteAdapter ----/
To that end I have constructed this Java DSL:
final JdbcOutboundGateway inboundAdapter = createInboundAdapter();
final JdbcOutboundGateway deleteAdapter = createDeleteAdapter();
final JdbcOutboundGateway insertUpdateAdapter = createInsertUpdateAdapter();
return IntegrationFlows
.from(setupAdapter,
c -> c.poller(Pollers.fixedRate(1000L, TimeUnit.MILLISECONDS).maxMessagesPerPoll(1)))
.enrichHeaders(h -> h.headerExpression("start", "payload[0].get(\"start\")")
.headerExpression("end", "payload[0].get(\"end\")"))
.handle(inboundAdapter)
.split(insertDeleteSplitter)
.enrichHeaders(h -> h.headerExpression("operation", "payload[0].get(\"operation\")"))
.channel(c -> c.executor("stepTaskExecutor"))
.routeToRecipients (r -> r
.recipientFlow("'I' == headers.operation or 'U' == headers.operation",
f -> f.handle(insertUpdateAdapter))
// This element is complaining "Syntax error on token ")", ElidedSemicolonAndRightBrace expected"
// Attempted to follow patterns from https://github.com/spring-projects/spring-integration-java-dsl/wiki/Spring-Integration-Java-DSL-Reference#routers
.recipientFlow("'D' == headers.operation",
f -> f.handle(deleteAdapter))
.defaultOutputToParentFlow())
)
.aggregate()
.handle(cleanupAdapter)
.get();
Assumptions I have made, based on prior work include:
The necessary channels are auto-created as Direct Channels
Route To Recipients is the appropriate tool for this function (I also considered an expression router, but the examples of how to add sub-flows were less clear than for Route To Recipients)
Insert an ExecutorChannel somewhere between the splitter and router if you want to run the splits in parallel. You can limit the pool size of the executor to control the concurrency.
There is an extra parenthesis after .defaultOutputToParentFlow())
The corrected code is:
return IntegrationFlows
.from(setupAdapter,
c -> c.poller(Pollers.fixedRate(1000L, TimeUnit.MILLISECONDS).maxMessagesPerPoll(1)))
.enrichHeaders(h -> h.headerExpression("ALC_startTime", "payload[0].get(\"ALC_startTime\")")
.headerExpression("ALC_endTime", "payload[0].get(\"ALC_endTime\")"))
.handle(inboundAdapter)
.split(insertDeleteSplitter)
.enrichHeaders(h -> h.headerExpression("ALC_operation", "payload[0].get(\"ALC_operation\")"))
.channel(c -> c.executor(stepTaskExecutor))
.routeToRecipients(r -> r
.recipientFlow("'I' == headers.ALC_operation or 'U' == headers.ALC_operation",
f -> f.handle(insertUpdateAdapter))
.recipientFlow("'D' == headers.ALC_operation",
f -> f.handle(deleteAdapter))
.defaultOutputToParentFlow())
.aggregate()
.handle(cleanupAdapter)
.get();
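For completeness, here is a minimal sketch of the executor bean backing that ExecutorChannel (the stepTaskExecutor referenced above); the pool sizes are assumptions you can tune to limit how many splits run in parallel:

@Bean
public ThreadPoolTaskExecutor stepTaskExecutor() {
    // org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);      // assumed value: at most 4 splits processed concurrently
    executor.setMaxPoolSize(4);
    executor.setQueueCapacity(100);   // assumed value: buffer for pending splits
    executor.setThreadNamePrefix("step-task-");
    return executor;
}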
I need to implement the following architecture:
I have data that must be sent to systems (some external applications) using JMS.
Depending on the data, it only needs to be sent to the relevant systems (for example, if there are 4 systems, a given message may go to anywhere from 1 to 4 of them).
It is necessary to wait for a response from each system a message was sent to; after all the answers have been received, the collected data must be processed (or a timeout must be handled if not all responses arrive).
The correlation id is contained in the header of both outgoing and incoming JMS messages.
Each such process can be started asynchronously and run in parallel.
Right now I have this implemented only with plain Spring JMS: I synchronize the threads manually and also manage the thread pools myself.
The correlation ids and information about which systems messages were sent to are stored as state and updated as new messages arrive, and so on.
But I want to simplify the logic and use the Spring Integration Java DSL, the Scatter-Gather pattern (which is exactly my case) and other useful Spring features.
Can you show me an example of how such an architecture can be implemented with the help of Spring Integration / IntegrationFlow?
Here is a sample from our test cases:
@Bean
public IntegrationFlow scatterGatherFlow() {
    return f -> f
            .scatterGather(scatterer -> scatterer
                            .applySequence(true)
                            .recipientFlow(m -> true, sf -> sf.handle((p, h) -> Math.random() * 10))
                            .recipientFlow(m -> true, sf -> sf.handle((p, h) -> Math.random() * 10))
                            .recipientFlow(m -> true, sf -> sf.handle((p, h) -> Math.random() * 10)),
                    gatherer -> gatherer
                            .releaseStrategy(group ->
                                    group.size() == 3 ||
                                    group.getMessages()
                                            .stream()
                                            .anyMatch(m -> (Double) m.getPayload() > 5)),
                    scatterGather -> scatterGather
                            .gatherTimeout(10_000));
}
So, these are the parts:
scatterer - sends messages to the recipients, in your case all those JMS services. It can also be a scatterChannel, typically a PublishSubscribeChannel, so the Scatter-Gather doesn't have to know its subscribers in advance.
gatherer - well, it is just an aggregator with all its possible options.
scatterGather - just a convenience for the direct properties of the ScatterGatherHandler and the common endpoint options.
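Applied to your JMS case, the recipient flows could delegate to JMS outbound gateways. A rough sketch only (the destination names, the connectionFactory, and the selection predicates sendToSystemA/sendToSystemB are assumptions, not a working configuration):

// Jms is org.springframework.integration.jms.dsl.Jms
@Bean
public IntegrationFlow jmsScatterGatherFlow(ConnectionFactory connectionFactory) {
    return f -> f
            .scatterGather(scatterer -> scatterer
                            .applySequence(true)
                            // hypothetical predicates deciding which systems receive the message
                            .recipientFlow(m -> sendToSystemA(m),
                                    sf -> sf.handle(Jms.outboundGateway(connectionFactory)
                                            .requestDestination("systemA.request")))
                            .recipientFlow(m -> sendToSystemB(m),
                                    sf -> sf.handle(Jms.outboundGateway(connectionFactory)
                                            .requestDestination("systemB.request"))),
                    gatherer -> gatherer
                            // with applySequence(true) the sequence size equals the number of
                            // systems actually addressed, so release when all replies have arrived
                            .releaseStrategy(group -> group.size() == group.getSequenceSize()),
                    scatterGather -> scatterGather
                            .gatherTimeout(30_000));   // assumed timeout for the "not all replies" case
}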
How can I make the async operators of an Observable run on the main thread instead of in another thread? Or at least get the result on the main thread once it finishes.
@Test
public void retryWhen() {
    Scheduler scheduler = Schedulers.newThread();
    Single.just("single")
            .map(word -> null)
            .map(Object::toString)
            .retryWhen(ot ->
                    ot.doOnNext(t -> System.out.println("Retry mechanism:" + t))
                            .filter(t -> t instanceof NullPointerException && cont < 5)
                            .flatMap(t -> Observable.timer(100, TimeUnit.MILLISECONDS, scheduler))
                            .doOnNext(t -> cont++)
                            .switchIfEmpty(Observable.error(new NullPointerException())))
            .subscribeOn(scheduler)
            .subscribe(System.out::println, System.out::println);
    // new TestSubscriber()
    //         .awaitTerminalEvent(1000, TimeUnit.MILLISECONDS);
}
I'm trying observeOn and subscribeOn, but both are used to set which thread you want the execution to run on. In my case I want the execution, or at least the end of it, on the same thread where I run the test.
Right now the only way to see the prints is to block and wait for the execution.
Regards.
You could use Observable.toBlocking() to get a BlockingObservable and use that to extract your results in your tests.
If you don't specify observeOn/subscribeOn and no operator or Observable changes the thread, then when you subscribe to the observable it will do all the processing during the subscription.
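For example (a minimal sketch, assuming RxJava 1.x where toBlocking() returns a BlockingObservable; the chain here is simplified from the one in the question):

Scheduler scheduler = Schedulers.newThread();

// The work still runs on the other thread, but the test thread blocks here
// until the result is available, so the print happens on the test thread.
String result = Observable.just("single")
        .subscribeOn(scheduler)
        .map(String::toUpperCase)
        .toBlocking()
        .first();

System.out.println(result);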
I create Kafka streams with the following code:
val streams = (1 to 5) map { i =>
  KafkaUtils.createStream[....](
    streamingContext,
    Map( .... ),
    Map(topic -> numOfPartitions),
    StorageLevel.MEMORY_AND_DISK_SER
  ).filter(...)
    .mapPartitions(...)
    .reduceByKey(....)
}
val unifiedStream = streamingContext.union(streams)
unifiedStream.foreachRDD(...)
streamingContext.start()
I give each stream a different group id. When I run the application, only some of the Kafka messages are received and the executor is stuck at the foreachRDD call. If I create only one stream, everything works well. There aren't any exceptions in the logs.
I don't know why the application is stuck there. Does it mean there aren't enough resources?
You may want to try setting the parameter:
new SparkConf().set("spark.streaming.concurrentJobs", "5")
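For instance (a sketch in Java rather than Scala, with a placeholder app name and batch interval), the setting goes on the SparkConf used to build the streaming context:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// spark.streaming.concurrentJobs defaults to 1, so jobs produced by different
// streams are otherwise scheduled one after another.
SparkConf conf = new SparkConf()
        .setAppName("multi-stream-app")              // placeholder name
        .set("spark.streaming.concurrentJobs", "5");

JavaStreamingContext streamingContext =
        new JavaStreamingContext(conf, Durations.seconds(5));  // placeholder batch interval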