tl;dr: How do I efficiently drain BinaryMessages in Akka HTTP to create a Flow of ByteStrings where each ByteString corresponds to one WS object?
I want to build an Akka WebSocket server that streams complete WebSocket objects as ByteString, i.e. assembles WebSocket frames until I have a full WS object and emits that downstream. Or, more generally, I have a stream of Sources and want to merge every Source into one element before forwarding downstream:
E1(S1(a,b,c)), E2(S2(d,e,f,g)), E3(S3(h,i)) -> E1(abc), E2(defg), E3(hi)
// E = one element in the parent stream
// S = an inner source; not all child elements might be available immediately
// a-i the actual data elements
However, I struggle a bit with the API and the best way to do this efficiently. I came up with the following code, which uses a Sink.fold to drain the sources:
def flattenSink[Mat](sink: Sink[ByteString, Mat], materializer: Materializer): Sink[BinaryMessage, Mat] = {
  Flow[BinaryMessage]
    .map(d => {
      val graph = d.dataStream.toMat(Sink.fold(ByteString.empty)((a, b) => a ++ b))(Keep.right)
      val future = graph.run()(materializer)
      Source.fromFuture(future)
    })
    .flatMapConcat(identity)
    .toMat(sink)(Keep.right)
}
// or similar with the WS API
Flow[BinaryMessage]
.map(d => d.toStrict(timeout, materializer))
...
But the added materializer looks to me as if this might become inefficient; there could be context switches to a different thread.
Is there a better way to do it? Preferably in a way that obviously runs as part of the main flow, without unnecessary context switches to another thread?
(I'm not concerned about the size the WS objects might have or the time it might take to assemble them; both will be tiny in my case. I'm not going to stream gigabyte-sized objects.)
thanks!
I found a solution using the built-in functionality of flatMapConcat. Since flatMapConcat materializes a Source internally, it also lets me transform my source of WebSocket frames into a Source of a single ByteString without an external materializer:
def flattenSink[Mat](sink: Sink[ByteString, Mat]): Sink[BinaryMessage, Mat] = {
  Flow[BinaryMessage]
    .flatMapConcat(msg => if (msg.isStrict) {
      Source.single(msg.getStrictData)
    } else {
      msg.dataStream
        .fold(new ByteStringBuilder())((b, e) => b.append(e))
        .map(x => x.result())
    })
    .toMat(sink)(Keep.right)
}
Materializer: it is now the same one that runs the Flow.
ByteString concatenation: the builder should be as efficient as it gets.
Strict messages: wrapping them in a Source.single seems unnecessary, but I couldn't find a way around it.
How can I guarantee linearizability of requests in Reactor Netty?
Theory:
Given:
Request A wants to write x=2, y=0
Request B wants to read x, y and write x=x+2, y=y+1
Request C wants to read x and write y=x
All Requests are processed asynchronously and return to the client immediately with status ACCEPTED.
Example:
Send requests A, B, C in order.
Example Log Output: (request, thread name, x, y)
Request A, nioEventLoopGroup-2-0, x=2, y=0
Request C, nioEventLoopGroup-2-2, x=2, y=2
Request B, nioEventLoopGroup-2-1, x=4, y=3
Business logic requires all reads after A to see x=2 and y=0.
And request B to see x=2, y=0 and set y=1.
And request C to see x=4 and set y=4.
In short: the business logic makes every write operation depend on the previous write operation having completed. Otherwise the operations are not reversible.
Example Code
Document:
@Document
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Event {
    @Id
    private String id;
    private int data;

    public Event withNewId() {
        setId(UUID.randomUUID().toString());
        return this;
    }
}
Repo:
public interface EventRepository extends ReactiveMongoRepository<Event, String> {}
Controller:
@RestController
@RequestMapping(value = "/api/event")
@RequiredArgsConstructor
public class EventHandler {
    private final EventRepository repo;

    @PostMapping
    public Mono<String> create(Event event) {
        return Mono.just(event.withNewId().getId())
            .doOnNext(id ->
                // do query based on some logic depending on event data
                Mono.just(someQuery)
                    .flatMap(query ->
                        repo.find(query)
                            .map(e -> event.setData(event.getData() + e.getData())))
                    .switchIfEmpty(Mono.just(event))
                    .flatMap(e -> repo.save(e))
                    .subscribeOn(Schedulers.single())
                    .subscribe());
    }
}
It does not work, but with subscribeOn I try to guarantee linearizability. Meaning that concurrent requests A and B will always write their payload to the DB in the order in which they are received by the server. Therefore, if another concurrent request C is a compound of first read then write, it will read changes from the DB that reflect those of request B, not A, and write its own changes based on B's.
Is there a way in Reactor Netty to schedule executors with an unbounded FIFO queue, so that I can process the requests asynchronously but in order?
I don't think this is specific to Netty or Reactor in particular; it relates to a broader topic - how to handle out-of-order message delivery and more-than-once message delivery. A few questions:
Does the client always send the same number of requests in the same order? There's always a chance that, due to networking issues, the requests may arrive out of order, or one or more may be lost.
Does the client make retries? What happens if the same request reaches the server twice?
If the order matters, why doesn't the client wait for the result of the (n-1)th request before issuing the nth request? In other words, why are there many concurrent requests?
I'd try to redesign the operation in such a way that there's a single request executing the operations on the backend in the required order, using concurrency there if necessary to speed up the process.
If that's not possible - for example, you don't control the client, or more generally the order in which the events (requests) arrive - you have to implement ordering in application-level logic, using per-message semantics to do the ordering. You can, for example, store or buffer the messages, wait for all of them to arrive, and only then trigger the business logic using the data from the messages in the correct order. This requires some kind of key (identity) that can attribute messages to the same entity, and a sorting key by which you know how to sort the messages into the correct order.
EDIT:
After getting the answers to the questions above, you can definitely implement it "the Reactor way".
Sinks.Many<Event> sink = Sinks.many() // you create a 'sink' where the events will go
    .multicast() // broadcasts all messages to all subscribers of the stream
    .directBestEffort(); // additional semantics - publishing will fail if there are no subscribers - doesn't really matter here

Flux<Event> eventFlux = sink.asFlux(); // the 'view' of the sink as a Flux you can subscribe to
public void run() {
    subscribeAndProcess();

    sink.tryEmitNext(new Event("A", "A", "A"));
    sink.tryEmitNext(new Event("A", "C", "C"));
    sink.tryEmitNext(new Event("A", "B", "B"));

    sink.tryEmitNext(new Event("B", "A", "A"));
    sink.tryEmitNext(new Event("B", "C", "C"));
    sink.tryEmitNext(new Event("B", "B", "B"));
}
void subscribeAndProcess() {
    eventFlux.groupBy(Event::key)
        .flatMap(
            groupedEvents -> groupedEvents.distinct(Event::type) // distinct to avoid duplicates
                .buffer(3) // there are three event types, so we buffer and wait for all to arrive
                .flatMap(events -> // once all the events are there we can do the processing the way we need
                    Mono.just(events.stream()
                        .sorted(Comparator.comparing(Event::type))
                        .map(e -> e.key + e.value)
                        .reduce(String::concat)
                        .orElse(""))
                )
        )
        .subscribe(System.out::println);
}
// prints values concatenated in order per key:
// - AAABAC
// - BABBBC
See Gist: https://gist.github.com/tarczynskitomek/d9442ea679e3eed64e5a8470217ad96a
There are a few caveats:
If all of the expected events for a given key don't arrive, you waste memory buffering - unless you set a timeout (see the bufferTimeout sketch after this list)
How will you ensure that all the events for a given key go to the same application instance?
How will you recover from failures encountered mid-processing?
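For the first caveat, here is a minimal sketch of bounding the buffering in time as well as in count by swapping buffer(3) for bufferTimeout; it reuses eventFlux and Event from the snippet above, and the 3-element / 5-second limits are illustrative assumptions, not values from the original code:

import java.time.Duration;

// Hypothetical variant of the grouped pipeline: bufferTimeout emits whatever
// has arrived for a key once 3 events are in, or after 5 seconds, so
// incomplete groups don't sit in memory forever.
eventFlux.groupBy(Event::key)
    .flatMap(groupedEvents -> groupedEvents
        .distinct(Event::type)
        .bufferTimeout(3, Duration.ofSeconds(5))
        .flatMap(events -> Mono.just(events.size() + " events buffered for this key")))
    .subscribe(System.out::println);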
Having all this in mind, I would go with persistent storage - say, saving the incoming events in a database and doing the processing in the background - and for this you don't need to use Reactor. Most of the time a simple Servlet-based Spring app will be far easier to maintain and develop, especially if you have no previous experience with Functional Reactive Programming.
Looking at the provided code, I would not try to handle it at the Reactor Netty level.
First, several comments regarding the controller implementation, because it has multiple issues that violate reactive principles. I would recommend spending some time learning the reactive API, but here are some hints:
In reactive, nothing happens until you subscribe. At the same time, calling subscribe explicitly is an anti-pattern and should be avoided unless you are creating a framework similar to WebFlux.
The parallel scheduler should be used to run non-blocking logic unless you have blocking code.
doOn... operators are so-called side-effect operators and should not be used for constructing reactive flows.
@PostMapping
public Mono<String> create(Event event) {
    // do query based on some logic depending on event data
    return repo.find(query)
        .map(e -> event.setData(event.getData() + e.getData()))
        .switchIfEmpty(Mono.just(event))
        .flatMap(e -> repo.save(e));
}
Now, processing requests in the predefined sequence could be tricky because of network failures, possible retries, etc. What if you never get Request B or Request C? Should you still persist Request A?
As @ttarczynski mentioned in his comment, the best option is to redesign the API and send a single request.
In case that's not an option, you would need to introduce some state to "postpone" request processing and then, depending on the consistency semantics, either process the requests as a "batch" when the last one is received, or just defer Request C until you get Requests A and B.
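A very rough sketch of what such per-key postponing could look like follows; every name in it (pending, keyOf, EXPECTED_COUNT, processInOrder) is hypothetical and not taken from the code above, and a real implementation would also need persistence, timeouts and duplicate handling:

// Hypothetical in-memory batching of requests per key.
private final Map<String, List<Event>> pending = new ConcurrentHashMap<>();
private static final int EXPECTED_COUNT = 3; // assume we know how many requests form a batch

@PostMapping
public Mono<String> accept(Event event) {
    return Mono.fromCallable(() -> {
        List<Event> batch = pending.computeIfAbsent(keyOf(event),
            k -> Collections.synchronizedList(new ArrayList<>()));
        batch.add(event);
        if (batch.size() == EXPECTED_COUNT) {
            pending.remove(keyOf(event));
            processInOrder(batch); // sort by a sequence number, then apply the writes one by one
        }
        return "ACCEPTED";
    });
}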
I'm using WebFlux to pull data from two different REST endpoints, and trying to correlate some data from one stream with the other. I have Flux instances called events and egvs and for each event, I want to find the EGV with the nearest timestamp.
final Flux<Tuple2<Double, Object>> data = events
    .map(e -> Tuples.of(e.getValue(),
        egvs.map(egv -> Tuples.of(egv.getValue(),
                Math.abs(Duration.between(e.getDisplayTime(),
                    egv.getDisplayTime()).toSeconds())))
            .sort(Comparator.comparingLong(Tuple2::getT2))
            .take(1)
            .map(v -> v.getT1())));
When I send data to my Thymeleaf template, the first element of the tuple renders as a number, as I'd expect, but the second element renders as a FluxMapFuseable. It appears that the egvs.map(...) portion of the pipeline isn't executing. How do I get that part of the pipeline to execute?
UPDATE
Thanks, @Toerktumlare - your answer helped me figure out that my approach was wrong. On each iteration through the map operation, the event needs the context of the entire set of EGVs to find the one it matches with. So the working code looks like this:
final Flux<Tuple2<Double, Double>> data =
    Flux.zip(events, egvs.collectList().repeat())
        .map(t -> Tuples.of(
            // Grab the event
            t.getT1().getValue(),
            // Find the EGV (from the full set of EGVs) with the closest timestamp
            t.getT2().stream()
                .map(egv -> Tuples.of(
                    egv.getValue(),
                    Math.abs(Duration.between(
                        t.getT1().getDisplayTime(),
                        egv.getDisplayTime()).toSeconds())))
                // Sort the stream of (value, time difference) tuples and
                // take the smallest time difference.
                .sorted(Comparator.comparingLong(Tuple2::getT2))
                .map(Tuple2::getT1)
                .findFirst()
                .orElse(0.)));
What I think is happening is that you are breaking the reactive chain.
During the assembly phase, Reactor will call each operator backwards until it finds a producer that can start producing items, and I think you are breaking that chain here:
egvs.map(egv -> Tuples.of( ..., ... )
You see, egvs.map returns something that you need to take care of and chain onto the return of events.map.
I'll give you an example:
// This works because we always return from flatMap,
// so we keep the chain intact
Mono.just("foobar").flatMap(f -> {
    return Mono.just(f);
}).subscribe(s -> {
    System.out.println(s);
});
on the other hand, this behaves differently:
Mono.just("foobar").flatMap(f -> {
Mono.just("foo").doOnSuccess(s -> { System.out.println("this will never print"); });
return Mono.just(f);
});
Because in this example you can see that we neglect to take care of the return value from the inner Mono, thus breaking the chain.
You haven't really disclosed what egv actually is, so I won't be able to give you a full answer, but you should most likely do something like this:
final Flux<Tuple2<Double, Object>> data = events
    // chain on egv here instead
    // and then return your full tuple object instead
    .map(e -> egvs.map(egv -> Tuples.of(e.getValue(), Tuples.of(egv.getValue(), Math.abs(Duration.between(e.getDisplayTime(), egv.getDisplayTime()).toSeconds())))
        .sort(Comparator.comparingLong(Tuple2::getT2))
        .take(1)
        .map(v -> v.getT1())));
I don't have a compiler to check against at the moment, but I believe that is your problem at least. It's a bit tricky to read your code.
I am using Akka in a Play controller and performing ask() to an actor named publish, and internally the publish actor performs ask to multiple actors and passes along the sender reference. The controller needs to wait for the responses from the multiple actors and create a list of the responses.
Please find the code below, but this code only waits for 1 response and then terminates. Please suggest a fix.
// Performs ask to publish actor
Source<Object, NotUsed> inAsk = Source.fromFuture(ask(publishActor, service.getOfferVerifyRequest(request).getPayloadData(), 1000));

final Sink<String, CompletionStage<String>> sink = Sink.head();

final Flow<Object, String, NotUsed> f3 = Flow.of(Object.class).map(elem -> {
    log.info("Data in Graph is " + elem.toString());
    return elem.toString();
});

RunnableGraph<CompletionStage<String>> result = RunnableGraph.fromGraph(
    GraphDSL.create(
        sink, (builder, out) -> {
            final Outlet<Object> source = builder.add(inAsk).out();
            builder
                .from(source)
                .via(builder.add(f3))
                .to(out); // to() expects a SinkShape
            return ClosedShape.getInstance();
        }
    ));

ActorMaterializer mat = ActorMaterializer.create(aSystem);
CompletionStage<String> fin = result.run(mat);

fin.toCompletableFuture().thenApply(a -> {
    log.info("Data is " + a);
    return true;
});

log.info("COMPLETED CONTROLLER ");
If you have several responses, ask won't cut it; that is only for a single request-response where the response ends up in a Future/CompletionStage.
There are a few different strategies to wait for all answers:
One is to create an intermediate actor whose only job is to collect all answers and then, when all partial responses have arrived, respond to the original requestor; that way you could use ask to get a single aggregate response back (a rough sketch of such an aggregator follows below).
Another option would be to use Source.actorRef to get an ActorRef that you could use as the sender together with tell (and skip using ask). Inside the stream you would then take elements until some criterion is met (time has passed or enough elements have been seen). You may have to add an operator to mimic the ask response timeout to make sure the stream fails if the actor never responds.
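A minimal sketch of the first option, an intermediate aggregator actor, might look roughly like this (classic Akka actors, Java API; the expected reply count and the String reply type are assumptions, not taken from the question):

// Hypothetical aggregator: collects `expected` replies, sends the full list
// to the original requestor, then stops itself.
public class Aggregator extends AbstractActor {
    private final int expected;
    private final ActorRef requestor;
    private final List<String> replies = new ArrayList<>();

    public Aggregator(int expected, ActorRef requestor) {
        this.expected = expected;
        this.requestor = requestor;
    }

    public static Props props(int expected, ActorRef requestor) {
        return Props.create(Aggregator.class, () -> new Aggregator(expected, requestor));
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(String.class, reply -> {
                replies.add(reply);
                if (replies.size() == expected) {
                    requestor.tell(new ArrayList<>(replies), getSelf());
                    getContext().stop(getSelf());
                }
            })
            .build();
    }
}

The controller would then ask this aggregator (or pass it as the sender of the fan-out messages) instead of asking publishActor directly, and receive the aggregated list as the single ask response.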
There are some other issues with the shared code. One is creating a materializer on each request: materializers have a lifecycle and will fill up your heap over time; you should rather get a materializer injected from Play.
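For reference, a minimal sketch of getting the Play-managed materializer injected into a Java controller instead of creating one per request (the controller name is hypothetical; this assumes Play's standard runtime dependency injection):

import akka.stream.Materializer;
import javax.inject.Inject;
import play.mvc.Controller;

public class PublishController extends Controller {
    private final Materializer materializer; // shared instance managed by Play

    @Inject
    public PublishController(Materializer materializer) {
        this.materializer = materializer;
    }

    // use this.materializer in result.run(...) instead of ActorMaterializer.create(aSystem)
}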
With the given logic there is no need whatsoever to use the GraphDSL; that is only needed for complex streams with multiple inputs and outputs or cycles. You should be able to compose the operators using the Flow API alone (see for example https://doc.akka.io/docs/akka/current/stream/stream-flows-and-basics.html#defining-and-running-streams).
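For illustration, the same linear pipeline could be written without the GraphDSL roughly as follows, reusing inAsk, f3 and log from the question and the injected materializer from the sketch above (not compiled against your project):

// source -> flow -> Sink.head(), composed directly with the Flow API
CompletionStage<String> fin =
    inAsk                 // Source<Object, NotUsed> built from the ask
        .via(f3)          // map each element to its String representation
        .runWith(Sink.head(), materializer);

fin.toCompletableFuture().thenApply(a -> {
    log.info("Data is " + a);
    return true;
});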
I have a Java 8 stream being returned by a Spring Data JPA repository. I don't think my use case is all that unusual: there are two (actually 3 in my case) collections off of the resulting stream that I would like collected.
Set<Long> ids = // initialized
try (Stream<SomeDatabaseEntity> someDatabaseEntityStream =
         someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    Set<Long> theAlphaComponentIds = someDatabaseEntityStream
        .map(v -> v.getAlphaComponentId())
        .collect(Collectors.toSet());
    // operations on 'theAlphaComponentIds' here
}
I need to pull out the 'Beta' objects and do some work on those too. So I think I have to repeat the code, which seems completely wrong:
try (Stream<SomeDatabaseEntity> someDatabaseEntityStream =
         someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    Set<BetaComponent> theBetaComponents = someDatabaseEntityStream
        .map(v -> v.getBetaComponent())
        .collect(Collectors.toSet());
    // operations on 'theBetaComponents' here
}
These two code blocks occur serially in the processing. Is there a clean way to get both Sets from processing the Stream only once? Note: I do not want some kludgy solution that makes up a wrapper class for the Alphas and Betas, as they don't really belong together.
You can always refactor code by putting the common parts into a method and turning the uncommon parts into parameters. E.g.
public <T> Set<T> getAll(Set<Long> ids, Function<SomeDatabaseEntity, T> f)
{
    try(Stream<SomeDatabaseEntity> someDatabaseEntityStream =
            someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
        return someDatabaseEntityStream.map(f).collect(Collectors.toSet());
    }
}
usable via
Set<Long> theAlphaComponentIds = getAll(ids, v -> v.getAlphaComponentId());
// operations on 'theAlphaComponentIds' here
and
Set<BetaComponent> theBetaComponents = getAll(ids, v -> v.getBetaComponent());
// operations on 'theBetaComponents' here
Note that this pulls the “operations on … here” parts out of the try block, which is a good thing, as it implies that the associated resources are released earlier. This requires that BetaComponent can be processed independently of the Stream’s underlying resources (otherwise, you shouldn’t collect it into a Set anyway). For the Longs, we know for sure that they can be processed independently.
Of course, you could process the result outside the try block even without moving the common code into a method. Whether the original code bears a duplication that requires this refactoring is debatable. Actually, the operation consists of a single statement within a try block that looks big only due to the verbose identifiers. Ask yourself whether you would still deem the refactoring necessary if the code looked like
Set<Long> alphaIDs, ids = // initialized
try(Stream<SomeDatabaseEntity> s = repo.findSomeDatabaseEntitiesStream(ids)) {
alphaIDs = s.map(v -> v.getAlphaComponentId()).collect(Collectors.toSet());
}
// operations on 'theAlphaComponentIds' here
Well, different developers may come to different conclusions…
If you want to reduce the number of repository queries, you can simply store the result of the query:
List<SomeDatabaseEntity> entities;
try(Stream<SomeDatabaseEntity> someDatabaseEntityStream =
        someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    entities = someDatabaseEntityStream.collect(Collectors.toList());
}
Set<Long> theAlphaComponentIds = entities.stream()
    .map(v -> v.getAlphaComponentId()).collect(Collectors.toSet());
// operations on 'theAlphaComponentIds' here

Set<BetaComponent> theBetaComponents = entities.stream()
    .map(v -> v.getBetaComponent()).collect(Collectors.toSet());
// operations on 'theBetaComponents' here
I am experimenting with building a WebSocket service using Akka HTTP. I need to handle Strict messages that arrive in their totality, as well as Streamed messages that arrive in multiple frames. I am using a route with handleWebSocketMessages() to pass the handling of web sockets off to a flow. The code I have looks something like this:
val route: Route =
  get {
    handleWebSocketMessages(createFlow())
  }

def createFlow(): Flow[Message, Message, Any] = Flow[Message]
  .collect {
    case TextMessage.Strict(msg) ⇒ msg
    case TextMessage.Streamed(stream) => ??? // <= What to do here??
  }
  .via(createActorFlow())
  .map {
    case msg: String ⇒ TextMessage.Strict(msg)
  }

def createActorFlow(): Flow[String, String, Any] = {
  // Set Up Actors
  // ... (this is working)
  Flow.fromSinkAndSource(in, out)
}
I am not really sure how to handle both Strict and Streamed messages. I realize I could do something like this:
.collect {
  case TextMessage.Strict(msg) ⇒ Future.successful(msg)
  case TextMessage.Streamed(stream) => stream.runFold("")(_ + _)
}
But now my stream has to handle Future[String] rather than just String, which I am then not sure how to handle, especially since obviously I need to handle the messages in order.
I did see this Akka issue, which seems to be somewhat related, but not exactly what I need (I don't think?):
https://github.com/akka/akka/issues/20096
Any help would be appreciated.
Folding sounds like a sensible option. Handling futures in your stream can be done using, e.g.,
flowOfFutures.mapAsync(parallelism = 3)(identity)
Please note that mapAsync does preserve the order of the incoming messages, as per the docs.
On a different note, other sensible precautions for handling streamed WS messages could be to use completionTimeout and limit to bound the time and space taken to fold the message, e.g.
stream.limit(x).completionTimeout(5 seconds).runFold(...)
The ultimate answer, based on the answer below (thanks to svezfaz), turned out to be something like this:
val route: Route =
  get {
    handleWebSocketMessages(createFlow())
  }

def createFlow(): Flow[Message, Message, Any] = Flow[Message]
  .collect {
    case TextMessage.Strict(msg) ⇒
      Future.successful(MyCaseClass(msg))
    case TextMessage.Streamed(stream) => stream
      .limit(100)                   // Max frames we are willing to wait for
      .completionTimeout(5 seconds) // Max time until last frame
      .runFold("")(_ + _)           // Merges the frames
      .flatMap(msg => Future.successful(MyCaseClass(msg)))
  }
  .mapAsync(parallelism = 3)(identity)
  .via(createActorFlow())
  .map {
    case msg: String ⇒ TextMessage.Strict(msg)
  }

def createActorFlow(): Flow[MyCaseClass, String, Any] = {
  // Set Up Actors as source and sink (not shown)
  Flow.fromSinkAndSource(in, out)
}