Dealing with parallel flux in Reactor - spring-boot

I have created a parallel flux from an iterable, and for each element I have to make a REST call. But while executing, if any one of the requests fails, all the remaining requests fail too. I want all the requests to be executed irrespective of failure or success.
I am currently using Flux.fromIterable with the runOn operator:
Flux.fromIterable(actions)
    .parallel()
    .runOn(Schedulers.elastic())
    .flatMap(request -> someRemoteCall)
    .sequential()
    .subscribe();
I want every request in the iterable to be executed, irrespective of failure or success, but as of now some get executed and some fail.

There are three ways I generally use to achieve this:
Use the 3-argument version of flatMap(), the second argument of which is a mapperOnError, e.g. .flatMap(request -> someRemoteCall(), x -> Mono.empty(), null);
Use onErrorResume(x -> Mono.empty()) as a separate call to ignore any error;
Use onErrorResume(MyException.class, x -> Mono.empty()) to ignore only errors of a certain type.
The second is what I tend to use by default, as I find it the clearest (see the sketch below).
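To make every request run regardless of individual failures, the resume has to sit on the inner publisher handed to flatMap, otherwise the first error still cancels the rest. A minimal sketch, assuming someRemoteCall takes the request and returns a Mono:
Flux.fromIterable(actions)
    .parallel()
    .runOn(Schedulers.elastic())
    // resume on the inner publisher, so only the failing request is
    // dropped and the remaining requests keep running
    .flatMap(request -> someRemoteCall(request)
        .onErrorResume(x -> Mono.empty()))
    .sequential()
    .subscribe();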

Because of the .parallel().runOn(...) usage you can't use onErrorContinue like this:
.parallel()
.runOn(...)
.flatMap(request -> someRemoteCall)
.onErrorContinue(...)
but you might be able to use it like this:
.parallel().runOn(...)
.flatMap(request -> someRemoteCall
    .onErrorContinue((t, o) -> log.error("Skipped error: {}", t.getMessage()))
)
provided that someRemoteCall is a Mono or Flux that is not itself run on .parallel().runOn(...) rails.
But when you don't have a someRemoteCall publisher to hang the operator on, you can use the trick below (note the NOT_MONO_AND_NOT_FLUX placeholder) to ignore failures of unsafe processing run on .parallel().runOn(...) rails:
Optional<List<String>> foundImageNames =
        Flux.fromStream(this.fileStoreService.walk(path))
            .parallel(cpus, cpus)
            .runOn(Schedulers.newBoundedElastic(cpus, Integer.MAX_VALUE, "import"), 1)
            .flatMap(NOT_MONO_AND_NOT_FLUX -> Mono
                .just(NOT_MONO_AND_NOT_FLUX)
                .map(p -> sneak(() -> unsafeLocalSimpleProcessingReturningString(p)))
                .onErrorContinue(FileNotFoundException.class,
                        (t, o) -> log.error("File missing:\n{}", t.getMessage()))
            )
            .collectSortedList(Comparator.naturalOrder())
            .blockOptional();
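The sneak helper is not defined in the snippet above; it is assumed to rethrow checked exceptions unchanged, so the original FileNotFoundException still matches onErrorContinue's type filter. A plausible sketch of such a utility, using the classic "sneaky throw" trick:
import java.util.concurrent.Callable;

final class Sneak {

    // Runs a Callable and rethrows any checked exception as-is
    static <T> T sneak(Callable<T> call) {
        try {
            return call.call();
        } catch (Exception e) {
            return Sneak.<RuntimeException, T>rethrow(e);
        }
    }

    @SuppressWarnings("unchecked")
    private static <E extends Exception, T> T rethrow(Exception e) throws E {
        throw (E) e; // unchecked cast: the compiler sees E = RuntimeException
    }
}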

I'm still in the process of learning WebFlux and Reactor, but try an onErrorContinue directly after the flatMap (the REST call) to drop (and optionally log) errors.

There are delay error operators in Reactor. You could write your code as follows:
Flux.fromIterable(actions)
    .flatMapDelayError(request -> someRemoteCall(request)
            .subscribeOn(Schedulers.elastic()), 256, 32)
    .doOnNext(System.out::println)
    .subscribe();
Note that this will still fail your flux if any inner publisher emits an error; however, it will wait for all inner publishers to finish before doing so.
These operators also require you to specify the concurrency and prefetch parameters. In the example I've set them to their default values, which are the ones used by regular flatMap calls. If you want the sequence to complete normally instead of failing, see the sketch below.
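A minimal sketch of that, where the trailing onErrorResume is the only addition to the example above:
Flux.fromIterable(actions)
    .flatMapDelayError(request -> someRemoteCall(request)
            .subscribeOn(Schedulers.elastic()), 256, 32)
    .doOnNext(System.out::println)
    // the delayed (possibly composite) error arrives only after every
    // inner publisher has terminated, so swallowing it here does not
    // cancel the other requests
    .onErrorResume(e -> Flux.empty())
    .subscribe();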

Related

Mono - Flux switchIfEmpty and onErrorResume

In Project Reactor, is it possible to implement a stream with switchIfEmpty and onErrorResume at the same time?
infoRepository.findById(id); //returns Mono<Info>
In case of an empty result or an error, can I switch to the same backup stream?
There's no single operator that does both things at once, but you can trivially switch to an empty publisher on error and then handle both cases through switchIfEmpty, like:
infoRepository.findById(id)
    .onErrorResume(e -> Mono.empty())
    .switchIfEmpty(newPublisher);
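One caveat: the publisher passed to switchIfEmpty is built when the chain is assembled, not when it is needed. If constructing the backup is expensive or has side effects, wrap it in Mono.defer. A sketch, where backupRepository is a hypothetical fallback source, not from the question:
infoRepository.findById(id)
    .onErrorResume(e -> Mono.empty())
    // defer so the backup lookup is only assembled if actually needed
    .switchIfEmpty(Mono.defer(() -> backupRepository.findById(id)));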

How to convert a Flux<Object> into a List<Object>

I have a Flux and I want to convert it to a List. How can I do that?
Flux<Object> getInstances(String serviceId); // Current one
List<Object> getInstances(String serviceId); // Demanded one
Do Java 8 or the reactive components provide a ready-made method to map or convert it to a List?
Should I use .map(), like this?
final List<ServiceInstance> sis = convertedStringList.parallelStream()
    .map(this.reactiveDiscoveryClient::getInstances)
    // It should be converted to List<Object>
1. Make sure you want this
A fair warning before diving into anything else: converting a Flux to a List/Stream makes the whole thing non-reactive in the strict sense, because you leave the push domain and trade it for a pull domain. Depending on the use case, you may or may not want this (usually you don't). Just wanted to leave the note.
2. Converting a Flux to a List
According to the Flux documentation, the collectList method returns a Mono<List<T>>. It returns immediately, but what it returns is not the resulting list itself; it is a lazy structure, the Mono, which promises that the result will eventually be there once the sequence completes.
According to the Mono documentation, the block method will return the contents of the Mono when it completes. Keep in mind that block may return null.
Combining both, you could use someFlux.collectList().block(). Provided that someFlux is a Flux<Object>, the result would be a List<Object>.
The block method will never return if the Flux is infinite. As an example, the following will return a list with two words:
Flux.fromArray(new String[]{"foo", "bar"}).collectList().block()
But the following will never return:
Flux.interval(Duration.ofMillis(1000)).collectList().block()
To prevent blocking indefinitely or for too long, you may pass a Duration argument to block, but then block will throw an exception when the subscription does not complete in time.
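For example, a sketch bounding the wait to five seconds (someFlux as above):
// throws IllegalStateException if the Flux has not completed within 5 seconds
List<Object> items = someFlux.collectList().block(Duration.ofSeconds(5));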
3. Converting a Flux to a Stream
According to the Flux documentation, the toStream method converts a Flux<T> into a Stream<T>. This is friendlier to operators such as flatMap. Consider this simple example, for the sake of demonstration:
Stream.of("f")
.flatMap(letter ->
Flux.fromArray(new String[]{"foo", "bar"})
.filter(word -> word.startsWith(letter)).toStream())
.collect(Collectors.toList())
One could simply use .collectList().block().stream(), but not only is that less readable, it could also throw a NullPointerException if block returned null. This approach does not terminate for an infinite Flux either, but because the resulting stream is lazy and of unknown size, you can still apply short-circuiting operations to it before the Flux completes, without blocking forever.
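For instance, a sketch taking just the first three elements of an infinite Flux through its lazy Stream view (limit short-circuits, so the terminal collect finishes):
// works even though the Flux is infinite: the Stream is consumed lazily
List<Long> firstThree = Flux.interval(Duration.ofMillis(100))
        .toStream()
        .limit(3)
        .collect(Collectors.toList());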

Project Reactor: possibly misleading documentation about error handling

I am reading the Reactor reference documentation about error handling and something seems wrong. For example, this section about the fallback method:
Flux.just("key1", "key2")
.flatMap(k -> callExternalService(k))
.onErrorResume(e -> getFromCache(k));
But the onErrorResume() lambda takes only one parameter, e (the error Throwable). How is k (the previous value emitted by the flux) referenced here?
There are other similar code snippets in the docs. Am I reading this wrong?
Or, if the documentation is indeed incorrect, how can I actually handle this case: recovering from an error by executing an alternative path with the previous value?
Yes, I think you found a bug in the documentation.
If you want to use k, the call to onErrorResume must happen inside the argument to flatMap, like so:
Flux.just("key1", "key2")
.flatMap(k -> callExternalService(k)
.onErrorResume(e -> getFromCache(k))
);
Regarding your comment: it is not possible to have the value being processed available in the onErrorXXX methods, because the error in question might not have happened while a value was being processed. It might have happened, for example, while handling backpressure (i.e. requesting more elements) or while subscribing.

Publish-Subscribe Channels Both Going to Kafka Result in Duplicate KafkaProducerContexts

I am attempting to use Spring Integration to send data from one channel to two different Kafka queues, after those same data go through different transformations on the way to their respective queues. The problem is that I apparently have duplicate producer contexts, and I don't know why.
Here is my flow configuration:
flow -> flow
    .channel("firstChannel")
    .publishSubscribeChannel(Executors.newCachedThreadPool(), s -> s
        .subscribe(f -> f
            .transform(firstTransformer::transform)
            .channel(MessageChannels.queue(50))
            .handle(Kafka.outboundChannelAdapter(kafkaConfig)
                    .addProducer(firstMetadata(), brokerAddress),
                e -> e.id("firstKafkaOutboundChannelAdapter")
                    .autoStartup(true)
                    .poller(p -> p.fixedDelay(1000, TimeUnit.MILLISECONDS)
                        .receiveTimeout(0)
                        .taskExecutor(taskExecutor))
                    .get())
        )
        .subscribe(f -> f
            .transform(secondTransformer::transform)
            .channel(MessageChannels.queue(50))
            .handle(Kafka.outboundChannelAdapter(kafkaConfig)
                    .addProducer(secondMetadata(), brokerAddress),
                e -> e.id("secondKafkaOutboundChannelAdapter")
                    .autoStartup(true)
                    .poller(p -> p.fixedDelay(1000, TimeUnit.MILLISECONDS)
                        .receiveTimeout(0)
                        .taskExecutor(taskExecutor))
                    .get())
        ));
The exception is this:
Could not register object [org.springframework.integration.kafka.support.KafkaProducerContext#3163987e] under bean name 'not_specified': there is already object [org.springframework.integration.kafka.support.KafkaProducerContext#15f193b8] bound
I have tried using different kafkaConfig objects, but that hasn't helped. Meanwhile, the ProducerMetadata instances are distinct, as you can see from the different first parameters to addProducer. Those provide the names of the respective destination queues, among other metadata.
It sounds like there are some implicit bean definitions that are being created that conflict with each other.
How can I resolve this exception with the two KafkaProducerContexts?
You should not use .get() on those KafkaProducerMessageHandlerSpecs; let the Framework wire up the environment for you.
The issue is that KafkaProducerMessageHandlerSpec implements ComponentsRegistration, and nothing takes care of:
public Collection<Object> getComponentsToRegister() {
    this.kafkaProducerContext.setProducerConfigurations(this.producerConfigurations);
    return Collections.<Object>singleton(this.kafkaProducerContext);
}
after a manual .get() invocation.
I agree that this is somewhat inconvenient and we should find a better solution for end applications, but for now there is no choice other than following the Spec style for the Framework components, like Kafka.outboundChannelAdapter().
Hope I am clear.
UPDATE
OK, it's definitely an issue on our side. And we will fix it soon:
https://jira.spring.io/browse/INTEXT-216
https://jira.spring.io/browse/INTEXT-217
Meanwhile, the workaround for you is this:
KafkaProducerContext kafkaProducerContext =
        (KafkaProducerContext) kafkaProducerMessageHandlerSpec.getComponentsToRegister().iterator().next();
kafkaProducerContext.setBeanName(null);
For that, move
Kafka.outboundChannelAdapter(kafkaConfig)
        .addProducer(firstMetadata(), brokerAddress)
to a separate private method, so you can get access to that kafkaProducerContext, as sketched below.
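A sketch of what that extraction might look like (the method name is illustrative):
private KafkaProducerMessageHandlerSpec firstKafkaAdapter() {
    KafkaProducerMessageHandlerSpec spec = Kafka.outboundChannelAdapter(kafkaConfig)
            .addProducer(firstMetadata(), brokerAddress);
    // apply the workaround: clear the bean name on the producer context
    KafkaProducerContext kafkaProducerContext =
            (KafkaProducerContext) spec.getComponentsToRegister().iterator().next();
    kafkaProducerContext.setBeanName(null);
    return spec;
}
In the flow definition, .handle(firstKafkaAdapter(), e -> ...) then replaces the inline Kafka.outboundChannelAdapter(...) call.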

Debugging Erlang Webmachine resource functions

I'm trying to learn how to write Erlang Webmachine resources. One resource throws an error, but I can't track it down. The error message in the crash report does not provide enough information.
Is there a way to test these functions in the Erlang shell?
Most of the functions in the resource require request and context parameters, but I don't know how to simulate these parameters outside of a browser request.
Example code below.
Thanks,
LRP
I'm thinking specifically of functions like:
content_types_provided(RD, Ctx) ->
    Path = wrq:disp_path(RD),
    {[{webmachine_util:guess_mime(Path), generate_body}],
     RD, Ctx}.
But my current bug is in the init function.
This works...
Dispatch rule:
{["blip"], zzz_resource, []}.
Init:
init([]) -> {ok, undefined}.
to_html(ReqData, State) ->
    % {"<html><body>Hello, new world</body></html>", ReqData, State}.
    {test:test(), ReqData, State}.
But this throws an error:
Dispatch:
{["static"], static_resource,[]}.
Init:
init(_) ->
    DocRoot =
        case init:get_argument(doc_root) of
            {ok, [[DR]]} -> DR;
            error -> "doc_root path error"
        end,
    {ok, #ctx{docroot=DocRoot}}.
=ERROR REPORT==== 4-Aug-2011::10:54:56 ===
webmachine error: path="/static"
{error,function_clause,
[{filename,join,[[]]},
{static_resource,resource_exists,2},
There are a lot of layers to this answer depending on what you want to see and how deep down the rabbit hole you want to go.
Let's start with the easy stuff:
The error you are getting tells me that a call to static_resource:resource_exists/2 resulted in a call to filename:join/1 which failed because it was passed [] as its argument. That should help you track down the issue.
Recommended reading: errors-and-exceptions
A crude way to track down errors in any language is to add print statements at strategic locations. In this case you can use io:format/2 or erlang:display/1 to print whatever you want to the console. For example:
...
erlang:display("I'm inside resource_exists!"),
StuffToJoin = ["foo", "bar"],
erlang:display(StuffToJoin),
filename:join(StuffToJoin),
...
Just reload the page and you should see the value printed in the console (assuming the appropriate function was called as part of the reload).
If you want to manually test a resource (like in a unit test) you can do something like the following:
Headers = [{"Host", "mydomain.com"}, {"user-agent", "Firefox"}],
Context = [],
Path = "/static",
ReqData = wrq:create('GET', {1,1}, Path, mochiweb_headers:from_list(Headers)),
static_resource:resource_exists(ReqData, Context)
If you want a deep look at how to debug webmachine, you can read this. You can get pretty far with the above, but doing a full trace can be helpful if you need to see the decision graph.
In addition to the various techniques David has suggested, you should also learn to use the dbg module. It is incredibly powerful and lets you trace functions and modules in real time.
As an example, for your particular case, suppose you want to trace all the functions in the static_resource module:
1> dbg:tracer().
{ok,<0.36.0>}
2> dbg:p(all,[c]).
{ok,[{matched,nonode@nohost,25}]}
3> dbg:tp({static_resource, '_', '_'}, []).
{ok,[{matched,nonode@nohost,5}]}
after which you will see a printout (including all the arguments in the function call) whenever any function in the static_resource module is invoked.
A full description of dbg is beyond the scope of this small answer space. I recommend O'Reilly's Erlang Programming book; Chapter 17 has a really good write-up and tutorial on how to use dbg and its various trace features.
