peek in a parallel stream for incrementing a counter - java-8

I have a pipeline where files are processed in parallel, but I am a bit suspicious about the peek function.
File file = articles.parallelStream( )
.map( article -> {
String fileName = processer.getFriendlyName( article, locale );
currentCount.incrementAndGet();
return new ImmutablePair<>( fileName, converted );
} )
.peek( pair -> statusMessageSender.sendStatusMessage( totalCount, currentCount.get(), pair.getKey( ) ) )
.collect( new Archiver( archivePath ) );
By reading the javadocs, I am not completely sure if the counter that is supposed to send the current status of progress is doing its job (basically, looking for assurance in the docs here)
For parallel stream pipelines, the action may be called at whatever
time and in whatever thread the element is made available by the
upstream operation.
It seems to me that an observer would get the current count regardless of whether the file name matches the processing order, which is fine. But at the end of the day I find myself distrusting peek and leaning towards synchronizing on the receiver of sendStatusMessage.
In the end I am looking for a way to send status in a parallel stream, any thoughts?

Initially the discussion had a lot about the peek and why I was splitting the messaging part from the mapping expression. This was more a matter of style as I tend to favor mapping functions for mapping and nothing more.
I could see why people would defend peek or argue against it. But the bottom line is that it consumes a value and passes it along the pipe. So, as I was looking for a side effect (sending a message), the peek function seemed perfect.
In a parallel stream the issue is that one cannot predict when peek is actually called. But there were two aspects to consider: when the message is sent was irrelevant to the problem at hand, and the message itself could be sent at any time.
In the end the counter could live in the peek step as well; the message receiver was the only true factor here. The message receiver could have its own counter, or only consider the highest count received in the time frame.
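The "only consider the highest received" idea can be sketched in plain Java. This is a minimal sketch, not the code from the original project: the class name HighWaterMarkReceiver and its methods are hypothetical, and the simulate method stands in for the real pipeline by counting integers instead of converting files.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical receiver: keeps only the highest count it has seen, so status
// messages that arrive out of order never move the progress backwards.
final class HighWaterMarkReceiver {
    private final AtomicInteger highest = new AtomicInteger(0);

    // Safe to call from any thread; returns the progress value to display.
    int onStatus(int reportedCount) {
        return highest.accumulateAndGet(reportedCount, Math::max);
    }

    int current() {
        return highest.get();
    }

    // Processes `total` items in a parallel stream, reporting progress from peek.
    static int simulate(int total) {
        AtomicInteger counter = new AtomicInteger();
        HighWaterMarkReceiver receiver = new HighWaterMarkReceiver();
        IntStream.range(0, total)
                .parallel()
                .boxed()
                .peek(i -> receiver.onStatus(counter.incrementAndGet()))
                .collect(Collectors.toList());
        return receiver.current();
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000)); // prints 1000
    }
}
```

Because every element increments the counter exactly once, the highest value the receiver sees is the total, no matter which thread reports it or in what order the reports arrive.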
Bottom line: the question, which began with suggestions around peek, ended up with the following:
In terms of functionality, the peek function would do its job just fine, mainly because the sequence in the pipe was not ordered.
However, it is the message consumer that determines whether it can consume that message correctly. Given that only one consumer was using this information and the others were not, the final conclusion was that we had a problem in the protocol design and not around the peek function. We removed the counter from the standard message and the problem was gone. Could peek be used safely for this problem? Yes, it could, but...
so:
It could be:
File archive = articles.parallelStream( )
.map( article -> {
File converted = converter.getFile( ... );
String fileName = converter.getFriendlyName( ... );
return new ImmutablePair<>( fileName, converted );
} )
.peek( pair -> statusMessageSender.sendStatusMessage( pair.getKey() ) )
.collect( new Archiver( archivePath, deleteArchivedFiles ) );
or:
File archive = articles.parallelStream( )
.map( article -> {
File converted = converter.getFile( ... );
String fileName = converter.getFriendlyName( ... );
return new ImmutablePair<>( fileName, converted );
} )
.peek( pair -> statusMessageSender.sendStatusMessage( currentCount.incrementAndGet(), pair.getKey() ) )
.collect( new Archiver( archivePath, deleteArchivedFiles ) );
But in the end it was about the protocol and not peek. peek could definitely be used, and the unordered nature of the problem was the reason why it could be used. (Thanks for your help, people on SO.)

Related

How do I use multiple reactive streams in the same pipeline?

I'm using WebFlux to pull data from two different REST endpoints, and trying to correlate some data from one stream with the other. I have Flux instances called events and egvs and for each event, I want to find the EGV with the nearest timestamp.
final Flux<Tuple2<Double,Object>> data = events
.map(e -> Tuples.of(e.getValue(),
egvs.map(egv -> Tuples.of(egv.getValue(),
Math.abs(Duration.between(e.getDisplayTime(),
egv.getDisplayTime()).toSeconds())))
.sort(Comparator.comparingLong(Tuple2::getT2))
.take(1)
.map(v -> v.getT1())));
When I send data to my Thymeleaf template, the first element of the tuple renders as a number, as I'd expect, but the second element renders as a FluxMapFuseable. It appears that the egvs.map(...) portion of the pipeline isn't executing. How do I get that part of the pipeline to execute?
UPDATE
Thanks, @Toerktumlare - your answer helped me figure out that my approach was wrong. On each iteration through the map operation, the event needs the context of the entire set of EGVs to find the one it matches with. So the working code looks like this:
final Flux<Tuple2<Double, Double>> data =
Flux.zip(events, egvs.collectList().repeat())
.map(t -> Tuples.of(
// Grab the event
t.getT1().getValue(),
// Find the EGV (from the full set of EGVs) with the closest timestamp
t.getT2().stream()
.map(egv -> Tuples.of(
egv.getValue(),
Math.abs(Duration.between(
t.getT1().getDisplayTime(),
egv.getDisplayTime()).toSeconds())))
// Sort the stream of (value, time difference) tuples and
// take the smallest time difference.
.sorted(Comparator.comparingLong(Tuple2::getT2))
.map(Tuple2::getT1)
.findFirst()
.orElse(0.)));
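The per-event lookup in the working code does not depend on Reactor, so it can be checked in isolation. The following is a minimal sketch under stated assumptions: Event and Egv are hypothetical stand-ins for the real types (only getValue and getDisplayTime are known from the question), and min is swapped in for sorted(...).findFirst(), which selects the same element without sorting the whole list.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

final class NearestEgv {
    // Hypothetical stand-ins for the question's event and EGV types.
    record Event(double value, Instant displayTime) {}
    record Egv(double value, Instant displayTime) {}

    // Returns the value of the EGV closest in time to the event,
    // or 0.0 when there are no EGVs (same fallback as the code above).
    static double nearestValue(Event event, List<Egv> egvs) {
        return egvs.stream()
                .min(Comparator.comparingLong((Egv egv) -> Math.abs(
                        Duration.between(event.displayTime(), egv.displayTime()).toSeconds())))
                .map(Egv::value)
                .orElse(0.0);
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        List<Egv> egvs = List.of(
                new Egv(100.0, t0),
                new Egv(110.0, t0.plusSeconds(300)),
                new Egv(120.0, t0.plusSeconds(600)));
        Event event = new Event(1.5, t0.plusSeconds(320));
        System.out.println(nearestValue(event, egvs)); // prints 110.0
    }
}
```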
What I think you are doing is breaking the reactive chain.
At subscription time, Reactor signals each operator backwards (upstream) until it reaches a producer that can start emitting items, and I think you are breaking that chain here:
egvs.map(egv -> Tuples.of( ..., ... )
You see, egvs.map(...) returns something that you need to take care of and chain on to the return value of events.map.
I'll give you an example:
// This works because we always return from flatMap,
// so we keep the chain intact
Mono.just("foobar").flatMap(f -> {
    return Mono.just(f);
}).subscribe(s -> {
    System.out.println(s);
});
on the other hand, this behaves differently:
Mono.just("foobar").flatMap(f -> {
Mono.just("foo").doOnSuccess(s -> { System.out.println("this will never print"); });
return Mono.just(f);
});
In this example we ignore the return value of the inner Mono, thus breaking the chain.
You haven't really disclosed what egvs actually is, so I won't be able to give you a full answer, but you should most likely do something like this:
final Flux<Tuple2<Double,Object>> data = events
// chain on egv here instead
// and then return your full tuple object instead
.map(e -> egvs.map(egv -> Tuples.of(e.getValue(), Tuples.of(egv.getValue(), Math.abs(Duration.between(e.getDisplayTime(), egv.getDisplayTime()).toSeconds())))
.sort(Comparator.comparingLong(Tuple2::getT2))
.take(1)
.map(v -> v.getT1())));
I don't have a compiler to check against at the moment, but I believe that is your problem at least. Your code is a bit tricky to read.

Akka Streams efficiently fold/merge substreams (WebSocket Frames -> Messages)

tl;dr: How do I efficiently drain BinaryMessages in Akka HTTP to create a Flow of ByteStrings where each ByteString matches one WS object?
I want to build an Akka WebSocket server that streams complete WebSocket objects as ByteStrings, i.e. assembles WebSocket frames until I have a full WS object and emits that downstream. Or, more generally, I have a stream of Sources and want to merge every Source into one element before forwarding it downstream:
E1(S1(a,b,c)), E2(S2(d,e,f,g)), E3(S3(h,i)) -> E1(abc), E2(defg), E3(hi)
// E = one element in the parent stream
// S = an inner source; not all child elements might be available immediately
// a-i = the actual data elements
However I struggle a bit with the API / the best way to do it efficiently. I came up with the following code, that uses a Sink.fold to drain the sources:
def flattenSink[Mat](sink: Sink[ByteString, Mat], materializer: Materializer): Sink[BinaryMessage, Mat] = {
Flow[BinaryMessage]
.map(d => {
val graph = d.dataStream.toMat(Sink.fold(ByteString.empty)((a, b) => a ++ b))(Keep.right)
val future = graph.run()(materializer)
Source.fromFuture(future)
})
.flatMapConcat(identity)
.toMat(sink)(Keep.right)
}
// or similar with the WS API
Flow[BinaryMessage]
.map(d => d.toStrict(timeout, materializer))
...
But the added materializer looks to me as if it might become inefficient; there could be context switches to a different thread ...
Is there a better way to do it? Preferably one that obviously runs as part of the main flow, without unnecessary context switches to another thread.
(I'm not concerned about the size that the WS objects might have, the time it might take to assemble them, both will be tiny in my case, I'm not going to stream Gigabyte sized objects)
thanks!
I found a solution using the built-in functionality of flatMapConcat. Since flatMapConcat materializes each Source internally, it also lets me transform my source of WebSocket frames into a Source of a single ByteString without an external materializer:
def flattenSink[Mat](sink: Sink[ByteString, Mat]): Sink[BinaryMessage, Mat] = {
Flow[BinaryMessage]
.flatMapConcat(msg => if (msg.isStrict) {
Source.single(msg.getStrictData)
} else {
msg.dataStream
.fold(new ByteStringBuilder())((b, e) => b.append(e))
.map(x => x.result())
})
.toMat(sink)(Keep.right)
}
materializer: it should be the same that runs the Flow
bytestring concatenation: the builder should be as efficient as it gets
strict messages: wrapping them in a Source.single seems to be unnecessary but I couldn't find a way around it.
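The shape of the transformation (each inner sequence of frames folded into one element, strict messages passed through untouched) can be illustrated without Akka. The following is a plain-Java sketch, not Akka Streams code: lists stand in for the inner Sources, ByteArrayOutputStream plays the role of ByteStringBuilder, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

final class FrameFolding {
    // Folds the frames of one message into a single byte array. A growable
    // buffer is appended to in place rather than building a new array per frame.
    static byte[] foldFrames(List<byte[]> frames) {
        if (frames.size() == 1) {
            return frames.get(0); // "strict" case: nothing to assemble
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (byte[] frame : frames) {
            buffer.write(frame, 0, frame.length);
        }
        return buffer.toByteArray();
    }

    // Maps every message (a list of text frames) to exactly one assembled string.
    static List<String> assembleAsText(List<List<String>> messages) {
        return messages.stream()
                .map(frames -> frames.stream()
                        .map(s -> s.getBytes(StandardCharsets.UTF_8))
                        .collect(Collectors.toList()))
                .map(FrameFolding::foldFrames)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> messages = List.of(
                List.of("a", "b", "c"),
                List.of("d", "e", "f", "g"),
                List.of("h", "i"));
        System.out.println(assembleAsText(messages)); // prints [abc, defg, hi]
    }
}
```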

RxSwift - How to create two streams from one upstream

Background
I'm trying to observe one Int stream (actually I'm not, but to make the argument easier) and do something with it while combining that stream to multiple other streams, say a String stream and a Double stream like the following:
// RxSwift
let intStream = BehaviorSubject<Int>(value: 0) // subscribe to this later on
let sharedStream = intStream.share()
let mappedStream = sharedStream.map { ... }.share()
let combinedStream1 = Observable.combineLatest(sharedStream, stringStream).map { ... }
let combinedStream2 = Observable.combineLatest(sharedStream, doubleStream).map { ... }
The above code is just to demonstrate what I'm trying to do. The code above is part of view model code (the VM part of MVVM), and only the first map (for mappedStream) runs, while the others are not called.
Question
What is wrong with the above approach, and how do I achieve what I'm trying to do?
Also, is there a better way to achieve the same effect?
Updates
I confirmed that setting the replay count to 1 makes things work. But why?
The code above all goes in the initialization phase of the view model, and the subscription happens afterwards.
Okay, I have an answer but it's a bit complex... One problem is that you are using a Subject in the view model, but I'll ignore that for now. The real problem comes from the fact that you are using hot observables inappropriately (share() makes a stream hot) and so events are getting dropped.
It might help if you put a bunch of .debug()s on this code so you can follow along. But here's the essence...
When you subscribe to mappedStream, it subscribes to the share which in turn subscribes to the sharedStream, which subscribes to the intStream. The intStream then emits the 0, and that 0 goes down the chain and shows up in the observer.
Then you subscribe to the combinedStream1, which subscribes to the sharedStream's share(). Since this share has already been subscribed to, the subscriptions stop there, and since the share has already output its next event, the combinedStream1 doesn't get the .next(0) event.
Same for the combinedStream2.
Get rid of all the share()s and everything will work:
let intStream = BehaviorSubject<Int>(value: 0) // subscribe to this later on
let mappedStream = intStream.map { $0 }
let combinedStream1 = Observable.combineLatest(intStream, stringStream).map { $0 }
let combinedStream2 = Observable.combineLatest(intStream, doubleStream).map { $0 }
This way, each subscriber of intStream gets the 0 value.
The only time you want to share is if you need to share side effects. There aren’t any side effects in this code, so there’s no need to share.
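The dropped event, and the reason a replay count of 1 makes the original code work, can be mimicked with a small model. This is a plain-Java sketch of the idea only, not RxSwift: HotSource and its replayLast flag are hypothetical names, and the model ignores disposal and threading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of a shared (hot) source: values go only to whoever is
// subscribed at emission time, unless the last value is replayed.
final class HotSource<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    private final boolean replayLast;
    private T last;
    private boolean hasLast;

    HotSource(boolean replayLast) {
        this.replayLast = replayLast;
    }

    // A late subscriber sees the cached value only when replay is enabled.
    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
        if (replayLast && hasLast) {
            subscriber.accept(last);
        }
    }

    // Delivers the value to the current subscribers and caches it.
    void emit(T value) {
        last = value;
        hasLast = true;
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(value);
        }
    }

    // Returns what a subscriber receives when it arrives after the first emission.
    static List<Integer> lateSubscriberSees(boolean replayLast) {
        HotSource<Integer> source = new HotSource<>(replayLast);
        List<Integer> early = new ArrayList<>();
        List<Integer> late = new ArrayList<>();
        source.subscribe(early::add);
        source.emit(0);              // the initial value goes out once
        source.subscribe(late::add); // subscribes after the emission
        return late;
    }

    public static void main(String[] args) {
        System.out.println(lateSubscriberSees(false)); // prints []
        System.out.println(lateSubscriberSees(true));  // prints [0]
    }
}
```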

RunnableGraph to wait for multiple response from source

I am using Akka in a Play controller and performing ask() on an actor named publish; internally, the publish actor performs ask on multiple actors and passes along a reference to the sender. The controller needs to wait for responses from multiple actors and create a list of responses.
Please find the code below. This code only waits for one response and then terminates. Please suggest a fix.
// Performs ask to publish actor
Source<Object,NotUsed> inAsk = Source.fromFuture(ask(publishActor,service.getOfferVerifyRequest(request).getPayloadData(),1000));
final Sink<String, CompletionStage<String>> sink = Sink.head();
final Flow<Object, String, NotUsed> f3 = Flow.of(Object.class).map(elem -> {
log.info("Data in Graph is " +elem.toString());
return elem.toString();
});
RunnableGraph<CompletionStage<String>> result = RunnableGraph.fromGraph(
GraphDSL.create(
sink , (builder , out) ->{
final Outlet<Object> source = builder.add(inAsk).out();
builder
.from(source)
.via(builder.add(f3))
.to(out); // to() expects a SinkShape
return ClosedShape.getInstance();
}
));
ActorMaterializer mat = ActorMaterializer.create(aSystem);
CompletionStage<String> fin = result.run(mat);
fin.toCompletableFuture().thenApply(a->{
log.info("Data is "+a);
return true;
});
log.info("COMPLETED CONTROLLER ");
If you have several responses, ask won't cut it; it is only for a single request-response where the response ends up in a Future/CompletionStage.
There are a few different strategies to wait for all answers:
One is to create an intermediate actor whose only job is to collect all answers and then, when all partial responses have arrived, respond to the original requester. That way you could use ask to get a single aggregate response back.
Another option would be to use Source.actorRef to get an ActorRef that you could use as sender together with tell (and skip using ask). Inside the stream you would then take elements until some criteria is met (time has passed or elements have been seen). You may have to add an operator to mimic the ask response timeout to make sure the stream fails if the actor never responds.
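The first strategy (collect every partial response, then answer once) can be sketched without actors. This is a plain-Java illustration of the aggregation logic only, not Akka code: ReplyAggregator is a hypothetical class, the number of expected replies is assumed to be known up front, and orTimeout (Java 9+) stands in for the ask timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical collector: gathers a known number of partial replies and
// completes a single future with all of them.
final class ReplyAggregator {
    private final int expected;
    private final List<String> replies = new ArrayList<>();
    private final CompletableFuture<List<String>> result = new CompletableFuture<>();

    ReplyAggregator(int expected, long timeoutMillis) {
        this.expected = expected;
        // Fail the aggregate if not all replies arrive in time.
        result.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Called once per partial reply, possibly from different threads.
    synchronized void onReply(String reply) {
        replies.add(reply);
        if (replies.size() == expected) {
            result.complete(List.copyOf(replies));
        }
    }

    CompletableFuture<List<String>> result() {
        return result;
    }

    public static void main(String[] args) {
        ReplyAggregator aggregator = new ReplyAggregator(3, 1000);
        for (String name : List.of("actorA", "actorB", "actorC")) {
            CompletableFuture.runAsync(() -> aggregator.onReply("reply from " + name));
        }
        System.out.println(aggregator.result().join().size()); // prints 3
    }
}
```

In an actor-based version the synchronized method would be unnecessary, since an actor processes one message at a time; here it is what keeps the list consistent across threads.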
There are some other issues with the code shared. One is creating a materializer on each request: materializers have a lifecycle and will fill up your heap over time, so you should rather get a materializer injected by Play.
With the given logic there is no need whatsoever to use the GraphDSL, that is only needed for complex streams with multiple inputs and outputs or cycles. You should be able to compose operators using the Flow API alone (see for example https://doc.akka.io/docs/akka/current/stream/stream-flows-and-basics.html#defining-and-running-streams )

What's the use case of Notification in RxJS?

I'm somewhat familiar with basic RxJS concepts like Observables, Observers and Subjects but RxJS Notifications concept is completely new to me.
What is it for? When should I use it?
The documentation you quoted mentions:
This class is particularly useful for operators that manage notifications, like materialize, dematerialize, observeOn, and others. Besides wrapping the actual delivered value, it also annotates it with metadata of, for instance, what type of push message it is (next, error, or complete).
So the question turns out to be about use cases for materialize and the like.
Basically, you use materialize to get meta-information about the dataflow without incurring the associated side effects (for example, an error occurring in a stream propagates, and a stream that completes can lead to the completion of other streams). dematerialize lets you restore the side effects.
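The idea of treating next, error, and complete as plain data can be modelled outside RxJS. The following is a plain-Java sketch of the concept, not the RxJS API: the Notification record, Kind enum, and all method names are hypothetical, and a list of suppliers stands in for an observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class MaterializeSketch {
    enum Kind { NEXT, ERROR, COMPLETE }

    // Plain-Java stand-in for a notification: an event described as data.
    record Notification<T>(Kind kind, T value, Throwable error) {
        static <T> Notification<T> next(T value) { return new Notification<>(Kind.NEXT, value, null); }
        static <T> Notification<T> error(Throwable e) { return new Notification<>(Kind.ERROR, null, e); }
        static <T> Notification<T> complete() { return new Notification<>(Kind.COMPLETE, null, null); }
    }

    // "materialize": record what happened instead of letting the exception propagate.
    static <T> List<Notification<T>> materialize(List<Supplier<T>> steps) {
        List<Notification<T>> out = new ArrayList<>();
        for (Supplier<T> step : steps) {
            try {
                out.add(Notification.next(step.get()));
            } catch (RuntimeException e) {
                out.add(Notification.error(e));
                return out; // an error terminates the sequence
            }
        }
        out.add(Notification.complete());
        return out;
    }

    // "dematerialize": turn notifications back into real events; ERROR is rethrown.
    static <T> List<T> dematerialize(List<Notification<T>> notifications) {
        List<T> values = new ArrayList<>();
        for (Notification<T> n : notifications) {
            switch (n.kind()) {
                case NEXT: values.add(n.value()); break;
                case ERROR: throw new RuntimeException(n.error());
                case COMPLETE: return values;
            }
        }
        return values;
    }

    // Rewrites ERROR notifications into NEXT notifications carrying the message.
    static List<String> errorsAsValues(List<Notification<String>> in) {
        List<Notification<String>> mapped = new ArrayList<>();
        for (Notification<String> n : in) {
            mapped.add(n.kind() == Kind.ERROR ? Notification.next(n.error().getMessage()) : n);
        }
        return dematerialize(mapped);
    }

    public static void main(String[] args) {
        List<Supplier<String>> steps = List.of(
                () -> "normal value",
                () -> { throw new IllegalStateException("boom!"); });
        System.out.println(errorsAsValues(materialize(steps))); // prints [normal value, boom!]
    }
}
```

While the events are in notification form, nothing propagates on its own, which is what makes it possible to delay, inspect, or rewrite an error before restoring it.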
Here are use cases from former SO questions:
Receiving done notifications from observables built using switch
RxJs - parse file, group lines by topics, but I miss the end
A use case: as errors or completions are propagated immediately, you can't for example delay them. To do so, you can try this approach:
// sample stream
interval(500).pipe(
mapTo('normal value'),
// sometimes value, sometimes throw
map(v => {
if (randomInt() > 50) {
throw new Error('boom!')
} else return v;
}),
materialize(),
// turns Observable<T> into Observable<Notification<T>>
// so we can delay it or do whatever else we want
delay(500),
// and we need to do some magic and change Notification of error into
// Notification of value (error message)
map(n => n.hasValue? n : new Notification('N', n.error.message, null)),
// back to normal
dematerialize()
)
// now it never throws, so in the console we will have
// `normal value` or `boom!`, but all as normal values (next() emissions)
// and delay() works as expected
.subscribe(v => console.log(v))

Resources