Java 9 reactive streams: why the need for a Processor? - java-9

I've been looking at the Reactive Streams concept in Java 9, which aims to standardize what RxJava and the Spring reactive libraries already do. Everything looks good except for the fact that, in order to do some transformation of the streams, you need to implement one or more Processors. My question is about the need for the Processor interface, which extends both Subscriber and Publisher.
It seems that you pass in a function where you do the transformation, couple it to both the publisher and the subscriber, and that's it! But some questions arise:
1. How do you handle backpressure from the client/subscriber? If the client asks for 10 elements, you don't know, inside the Processor, how many elements you should request further up from the publisher. I've seen examples requesting 1 or Integer.MAX_VALUE elements.
2. What's the fuss about it? From what I've observed it just carries a function where you do the transformation; the function is passed to the constructor and is called later when an item flows through it (and that's it). So couldn't we have achieved this directly in the publisher or the subscriber? (I know the point is separation of concerns, but doing it there would eliminate problem 1.)
You can see a basic example here: https://www.concretepage.com/java/java-9/java-reactive-streams .
In the processor's onNext method, you can see that the processor requests 1 element, and this bothers me: what about backpressure from the subscriber side? What if the subscriber asked for 100 elements at once, in a batch? Shouldn't the processor focus only on the processing side, rather than requesting elements itself?
@Override
public void onNext(Article item) {
    subscription.request(1);
    submit(function.apply(item));
}
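For reference, the processor in that example follows the common pattern of extending SubmissionPublisher and implementing Flow.Processor. The sketch below is my own reconstruction of that shape, not code taken from the article (the class and field names are mine):

import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

public class TransformProcessor<T, R> extends SubmissionPublisher<R>
        implements Flow.Processor<T, R> {

    private final Function<T, R> function;
    private Flow.Subscription subscription;

    public TransformProcessor(Function<T, R> function) {
        this.function = function;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        subscription.request(1); // pull the first element from the upstream publisher
    }

    @Override
    public void onNext(T item) {
        // submit() blocks while the internal buffer has no room, so slow downstream
        // subscribers indirectly slow down these request(1) calls as well
        submit(function.apply(item));
        subscription.request(1); // ask for the next element only after handling this one
    }

    @Override
    public void onError(Throwable throwable) {
        closeExceptionally(throwable);
    }

    @Override
    public void onComplete() {
        close();
    }
}

Note that in this pattern the upstream demand (request(1) per item) is not coupled one-to-one to the downstream subscriber's demand; SubmissionPublisher's bounded buffer and the blocking submit() are what loosely connect the two sides, which is exactly the concern raised above.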
Thanks !

Related

Spring Cloud Stream with a lot of different event types

I would like some advice regarding the use of Spring Cloud Stream.
Currently my service uses Spring Boot and implements some event-based approaches.
However, the events are not sent to any kind of broker; they are simply handled by handlers in separate threads.
I am interested in spring cloud stream technology.
I have implemented CustomMessageRoutingCallback as shown in this example https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/routing-samples/message-routing-callback.
But the problem is that declaring all the consumers in the configuration this way sounds like a pain:
@Bean
public Consumer<Menu> menuConsumer() {
    return menu -> log.info(menu.toString());
}
I have around 50-60 different event types. Is there any way to register consumers dynamically? Or would the better approach be to declare a consumer with some raw input type, deserialize the message inside the consumer, and manually route it to the right handler?
This really has nothing to do with Spring Cloud Stream; it is more of an architectural question. If you have 50+ different event types, having that many different consumers would be the least of your issues. The question I would be asking is: is it really feasible to trust a single application to process that many different event types? What if processing a single event results in a system failure? Are you willing to live with none of the events being processed until the problem is fixed?
This is just an example, but there are many other architectural questions that would need to be answered before you can select a technology.
A possible option is to create a common interface for your events:
@Bean
public Consumer<CommonInterfaceType> menuConsumer() {
    return commonInterfaceTypeObj -> commonInterfaceTypeObj.doSomething();
}
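To make that concrete, here is a minimal sketch of what the common interface and a couple of event types could look like; all of these names are illustrative and not taken from the original question:

public interface CommonInterfaceType {
    void doSomething();
}

public class MenuEvent implements CommonInterfaceType {
    @Override
    public void doSomething() {
        // menu-specific handling goes here
    }
}

public class OrderEvent implements CommonInterfaceType {
    @Override
    public void doSomething() {
        // order-specific handling goes here
    }
}

The trade-off is that the single consumer bean stays trivial because each event type carries its own handling logic, at the cost of losing per-type bindings and consumer configuration, and you still have to make sure each incoming message is deserialized into the right concrete type.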

How to think in a reactive programming manner and convert a traditional OOP application into a reactive application

In the traditional way of writing an application, I have divided the application into a set of tasks and executed them sequentially:
1. Get a list of rules for a given rule group from Redis.
2. Construct the facts input and fire the rules.
3. Compute a response for the request by hitting multiple rules (rule group A might depend on the rule group B result).
4. Send the response back to the caller.
If I were to implement the above steps in the Spring WebFlux reactive manner, how would I achieve it?
I have used ReactiveRedis to get the data from Redis.
ReactiveRedisOperations.opsForValue().get(ruleGroupName) does not emit anything until we subscribe() to it. But ReactiveRedisOperations.opsForValue().get(ruleGroupName).subscribe() returns immediately, and execution moves on to the next line in the application without waiting for the Subscriber to run.
As my next steps depend on the data returned by Redis, I have used the block() option to make it wait.
In the real-world how does one tackle a situation like this? Thanks in advance.
PS: New to spring web flux and reactive programming.
Instead of separating logical steps by putting them on a new line, like in imperative programming, reactive programming uses method composition and chaining (the operators).
So once you get a Flux<T> or a Mono<T> (here your rules from Redis), you need to chain operators to build up your processing steps in a declarative manner.
A step that transforms each input element <T> into a single corresponding element <R>, in memory and without latency, is typically expressed as a map(Function<T, R>) and produces a Flux<R>. In turn, chain further operators on that.
A step that either transforms 1 element <T> into N elements <R> and/or does so asynchronously (i.e. the transformation returns a Flux<R> for each T) is typically expressed as a flatMap(Function<T, Publisher<R>>).
Beyond that, there is a rich vocabulary of specialized operators in Reactor that you can explore.
In the end, your goal is to chain all these operators to describe your processing pipeline, which is going to be a Mono<RETURN_TYPE> or Flux<RETURN_TYPE>. RETURN_TYPE in webflux can either be a business type that Spring can marshall or one of the Spring response-oriented classes.
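As a rough sketch of how the steps from the question could be chained, assuming hypothetical buildFacts, fireRules and toResponse helpers and hypothetical RuleGroup, Request and RuleResponse types (only the ReactiveRedisOperations call is actual Spring Data Redis API):

public Mono<RuleResponse> evaluate(String ruleGroupName, Request request) {
    return reactiveRedisOperations.opsForValue()
            .get(ruleGroupName)                        // Mono<RuleGroup> fetched from Redis
            .map(rules -> buildFacts(request, rules))  // in-memory 1-to-1 transformation -> map
            .flatMap(facts -> fireRules(facts))        // async step returning a Mono -> flatMap
            .map(result -> toResponse(result));        // final in-memory mapping
}

The WebFlux handler or controller then returns this Mono directly instead of calling block(); the framework subscribes to it when it writes the response.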

Spring Data MongoDB Reactive - Dealing with findAll for a large number of documents?

Let's say I have a ReactiveMongoRepository defined like this:
@Repository
interface MyRepo extends ReactiveMongoRepository<MyDTO, String> {}
Given that the repository contains a lot of MyDTO documents (hundreds of thousands at least), and you do a simple findAll() followed by a deletion:
myRepo.findAll()
    .doOnNext(myDto -> System.out.println(myDto.message))
    .flatMap(myDto -> myRepo.deleteById(myDto.id));
This will be executed roughly once a month.
Is it safe to use Spring Data / MongoDB like this when streaming large sets of data? Or is it recommended to using some sort of batching or pagination to avoid cursor issues etc?
The general answer is "it depends", but for your specific case my opinion is no, at least not in the way you presented it.
First of all, I think a findAll over the whole collection rarely makes sense.
A use case that genuinely needs to handle hundreds of thousands of documents per request is hard to justify. If you have implemented a data ingestion pipeline, then yes, you have to handle a potentially infinite stream of data, but for that kind of use case I would suggest a more suitable architecture, such as streaming with Kafka via Spring Cloud Stream, for example.
The problem is not the ability to handle a lot of data: the reactive Mongo driver is very performant, and by tuning the backpressure mechanism you can protect your server. But, again, running a findAll as a stream over such a big collection has few applications. If you really have to handle a stream of data, a messaging middleware with Spring Cloud Stream may be the better option. Imagine you fire that findAll: your server and Mongo will probably be fine, but your user will wait a very long time before the request finishes. If, on the other hand, the use case is an offline process, then, as said before, for processing an infinite data stream Spring Cloud Stream may be the best option.
UPDATE
Considering the use case of, let's say, a batch job that runs once a month, I would say the picture changes a lot.
Reading the code of Spring Data Reactive MongoDB, I see that:
@NoRepositoryBean
public interface ReactiveMongoRepository<T, ID> extends ReactiveSortingRepository<T, ID>, ReactiveQueryByExampleExecutor<T> {
    ....
}
instead of
@NoRepositoryBean
public interface MongoRepository<T, ID> extends PagingAndSortingRepository<T, ID>, QueryByExampleExecutor<T> {
    ...
}
The key point of attention here is that the reactive version of the repository does not have the pagination feature: in fact, the name of its base interface does not contain the word Paging. The reason lies in the kind of technology behind each.
In blocking I/O, pagination is necessary because of the one-thread-per-request model: keeping a connection and the client busy for the whole query is dangerous in terms of timeouts, load and so on, so splitting the query into pages helps to avoid stressing the system too much. In non-blocking I/O the behavior is different: you attach to a stream of data and the driver is non-blocking. You do not use the classical Mongo driver; Spring Data uses the dedicated reactive Mongo driver, which is optimized for this job and is based on an event-loop model.
That said, the key point here is that using an I/O-intensive model for an offline process is not so much unsafe as simply not very useful. The reactive model shines for software that is mainly I/O-bound and has high traffic, because it supports high concurrency. If your use case is cleaning a collection once a month, using reactive programming is probably safe, since the driver is designed to handle a lot of data in high-throughput streaming scenarios, but a classical blocking batch model with pagination is the more suitable approach. In short: it should be safe, but this approach is not the best fit for a batch use case.
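For illustration, a blocking, paginated cleanup along those lines might look roughly like the sketch below; MyBlockingRepo stands for a hypothetical blocking MongoRepository counterpart of MyRepo, the page size is arbitrary, and Page, PageRequest and Pageable come from org.springframework.data.domain:

public void monthlyCleanup(MyBlockingRepo repo) {
    Pageable firstPage = PageRequest.of(0, 1000);  // work in bounded chunks
    Page<MyDTO> batch;
    do {
        batch = repo.findAll(firstPage);            // fetch one page of documents
        batch.forEach(dto -> System.out.println(dto.message));
        repo.deleteAll(batch.getContent());         // delete that page in one call
        // keep re-querying page 0, since the previous page's documents are now gone
    } while (batch.hasNext());
}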
I hope this helps.

Spring 5 WebFlux Mono and Flux

In Spring 5, I just know that a Spring WebFlux handler method handles the request and returns a Mono or Flux as the response.
@Component
public class HelloWorldHandler {

    public Mono<ServerResponse> helloWorld(ServerRequest request) {
        return ServerResponse.ok()
                .contentType(MediaType.TEXT_PLAIN)
                .body(BodyInserters.fromObject("Hello World"));
    }
}
But I have no idea what Mono and Flux mean and how they work with the WebFlux handler.
Can anyone simply explain:
1. What Mono and Flux mean.
2. How they work with the WebFlux handler.
Thanks in advance.
Webflux is all about reactive programming, which in summary means that business logic is only executed as soon as the data to process it is available (reactive).
This means you can no longer return simple POJOs; you have to return something else, something that can provide the result when it's available. Within the Reactive Streams initiative, this is called a Publisher. A Publisher has a subscribe() method that allows the consumer to get the POJO when it becomes available.
A Publisher (for example Publisher<Foo>) can return zero or multiple, possibly infinite, results. To make it more clear how many results you can expect, Project Reactor (the reactive streams implementation of Pivotal) introduced two implementations of Publisher:
A Mono, which will complete after emitting a single result.
A Flux, which will emit zero or multiple, possibly infinite, results and then completes.
So, basically you can see Mono<Foo> as the reactive counterpart of returning Foo and Flux<Foo> as the reactive counterpart of Collection<Foo>.
For example:
Flux.just(1, 2, 3, 4)
    .map(nr -> nr * 2)
    .subscribe(System.out::println);
Even though the numbers are already available (you can see them), you should realize that since it's a Flux, they're emitted one by one. In other cases, the numbers might come from an external API and in that case they won't be immediately available.
The next phase (the map operator) will multiply each number as soon as it receives one; this means that it also does this mapping one by one and then emits the new value.
Eventually, there's a subscriber (there should always be one, but it could be the Spring framework itself that's subscribing), and in this case it takes each value it obtains and prints it to the console, again one by one.
You should also realize that the items are not necessarily all at the same stage of processing: the first number may already have been printed to the console while the third hasn't been multiplied by two yet.
So, in your case, you have a Mono<ServerResponse>, which means that as soon as the ServerResponse is available, the WebFlux framework can utilize it. Since there is only one ServerResponse expected, it's a Mono and not a Flux.
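To connect this to the handler above, a Flux-based variant of the same style of handler might look roughly like this sketch (the method and the element type are made up, not taken from the question):

public Mono<ServerResponse> numbers(ServerRequest request) {
    Flux<Integer> doubled = Flux.just(1, 2, 3, 4).map(nr -> nr * 2);
    // the framework subscribes to the Flux and streams the elements as they are emitted
    return ServerResponse.ok()
            .contentType(MediaType.APPLICATION_JSON)
            .body(doubled, Integer.class);
}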

Cannot use 'subscribe' or 'subscribeWith' with 'ReactorNettyWebSocketClient' in Kotlin

The Kotlin code below successfully connects to a Spring WebFlux server, sends a message and prints each message sent via the stream that is returned.
fun main(args: Array<String>) {
    val uri = URI("ws://localhost:8080/myservice")
    val client = ReactorNettyWebSocketClient()
    val input = Flux.just(readMsg())

    client.execute(uri) { session ->
        session.send(input.map(session::textMessage))
            .thenMany(
                session.receive()
                    .map(WebSocketMessage::getPayloadAsText)
                    .doOnNext(::println) // want to replace this call
                    .then()
            ).then()
    }.block()
}
In previous experience with Reactive programming I have always used subscribe or subscribeWith where the call to doOnNext occurs. However it will not work in this case. I understand that this is because neither returns the reactive stream in use - subscribe returns a Disposable and subscribeWith returns the Subscriber it received as a parameter.
My question is whether invoking doOnNext is really the correct way to add a handler to process incoming messages?
Most Spring 5 tutorials show code which calls either this or log(), but some use subscribeWith(output).then() without specifying what output should be. I cannot see how the latter would even compile.
subscribe and subscribeWith should always be used right at the end of a chain of operators, not as intermediate operators.
Simon already provided the answer but I'll add some extra context.
When composing asynchronous logic with Reactor (and ReactiveX patterns) you build an end-to-end chain of processing steps, which includes not only the logic of the WebSocketHandler itself but also that of the underlying WebSocket framework code responsible for sending and receiving messages to and from the socket. It's very important for the entire chain to be connected together, so that at runtime "signals" will flow through it (onNext, onError, or onComplete) from start to end and communicate the final result, i.e. where you have the .block() at the end.
In the case of WebSocket this looks a little daunting because you're essentially combining two or more streams into one. You can't just subscribe to one of those streams (e.g. for inbound messages) because that prevents composing a unified processing stream, and signals will not flow through to the end where the final outcome is expected.
The other side of this is that subscribe() triggers consumption on a stream whereas what you really want is to keep composing asynchronous logic in deferred mode, i.e. declaring all that will happen when data materializes. This is another reason why composing a single unified chain is important. So it can be triggered after it is fully declared.
In short the main difference with the imperative WebSocketHandler for the Servlet world, is that instead of it being a handler for individual messages, this is a handler for composing the complete streams. Here the handling of an individual message is just one step of the overall processing chain. So the only place to subscribe is at the very end, where .block() is, in order to kick off processing.
BTW since this question was first posted a few months ago, the documentation has been improved to provide more guidance on how to implement a WebSocketHandler.
