Processing incoming payloads as batch not working as expected in spring-cloud-streams - spring

I say 'not working as expected' but actually is more like 'I don't really know if I'm doing the proper work in here', I feel like I'm mixing stuff from different approaches and doesn't really correlate.
Right now I've been using Spring Cloud Streams to process String-type messages from a PubSub subscription and so far so good, message in message out without much of a hassle.
What I'm trying to achieve now is to gather, let's say, 1000 messages, process them and send them altogether to another PubSub Topic. Still unsure about sending them as a List or individually like now, but all at the same time (this shouldn't be related to this question though).
Now I just discovered the following property.
spring.cloud.stream.bindings.input.consumer.batch-mode=true
Together with the following ones more specific to the GCP stuff.
spring.cloud.gcp.pubsub.publisher.batching.enabled=true
spring.cloud.gcp.pubsub.publisher.batching.delay-threshold-seconds=300
spring.cloud.gcp.pubsub.publisher.batching.element-count-threshold=100
So first question is... Are they linked by any means? Must I have the first one together with the other three?
What happened after I added the previous properties to my application.properties file is actually no change at all. Messages keep arriving and leaving the application without any issue and with no batch approach whatsoever.
Currently using the functional features the following way.
#Bean
public Function<Message<String>, String> sampleFunction() {
... // Stream processing in here
return processedString;
}
I was expecting this to crash with some message since the method only receives a String, not a list of String. Since it didn't crash, I modified the method above to receive a list of String (maybe Spring does some magic behind the scenes to still receive messages as String but collect them in a list for the method to process afterwards?).
#Bean
public Function<Message<List<String>>, String> sampleFunction() {
... // Stream processing in here
return processedString;
}
But this just crashes since it's trying to parse a single String message as a List of String.
How could I prepare the code to batch all those String messages into a List? Is there any example on this?

...batch-mode only works with binders that support it (e.g. Kafka, RabbitMQ). It doesn't look like the GCP binder supports it (I see no references to the property).
https://github.com/spring-cloud/spring-cloud-gcp/blob/master/spring-cloud-gcp-pubsub-stream-binder/src/main/java/org/springframework/cloud/gcp/stream/binder/pubsub/PubSubMessageChannelBinder.java
https://docs.spring.io/spring-cloud-stream/docs/3.1.0/reference/html/spring-cloud-stream.html#_batch_consumers
Publisher batching is not related to consumer batching.

Related

Workflow modeling problem in Spring Integration

I have a problem creating/modeling integration flow for the next global use case:
Input to the system is some kind of Message. That message goes
through Splitter and Transformer Endpoint and after that on
ServiceActivator where that transformed message is processed. This
use case is clear for me.
Confusion occurs because of the next part. After the ServiceActivator
finishes processing I need to took the base Message (message from the
beginning of first part) again and put it in other processing, for example again through Splitter and Transformer. How can
I model that use case? Can I return the message payload to that base
value? Is there some component that could help me?
Hope I describe it well.
Your use-case sounds more like a PublishSubscribeChannel: https://docs.spring.io/spring-integration/docs/current/reference/html/core.html#channel-implementations-publishsubscribechannel. So, you are going to have several subscribers (splitters) for that channel and the same input message is going to be processed in those independent sub-flows. You even can do that in parallel if you configure an Executor into that PublishSubscribeChannel.
Another way, if you can do that in parallel and you still need some result from that ServiceActivator to be available alongside with an original message for the next endpoint or so, then you can use a HeaderEnricher to store an original message in the headers ad get access to it whenever you need in your flow: https://docs.spring.io/spring-integration/docs/current/reference/html/message-transformation.html#header-enricher

Spring-integration: keep a context for a Message throught a chain

I am using spring-integration, and I have messages that goes through an int:chain with multiple elements: int:service-activator, int:transformers, etc. In the end, a message is sent to another app's Rest endpoint. There is also an errorHandler that will save any Exception in a text file.
For administration purpose, I would like to keep some information about what happened in the chain (ex: "this DB call returned this", "during this transformation, this rule was applied", etc.). This would be equivalent to a log file, but bound to a Message. Of course there is already a logger, but in the end, I need to create (either after the Rest called is made, or when an error occurs) a file for this specific Message with the data.
I was wondering if there was some kind of "context" for the Message that I could call through any part of the chain, and where I could store stuff. I didn't found anything in the official documentation, but I'm not really sure about what to look for.
I've been thinking about putting it all in the Message itself, but:
It's an immutable object, so I would need to rebuild it each time I want to add something to its header (or the payload).
I wouldn't be able to retrieve any new data from the error handler in case of Exception, because it takes the original message.
I can't really add it to the payload object because some native transformers/service-activators are directly using it (and that would also mean rewriting a lot of code ...)
I've been also thinking to some king of "thread-bound" bean that would act as a context for each Message, but I see too many problem arising from this.
Maybe I'm wrong about some of these ideas. Anyway, I just need a way to keep data though multiple element of a Spring integration chain and also be able to access it in the error handler.
Add a header, e.g. a map or list, and add to it in each stage.
The framework does something similar when message history is enabled.

Cannot use 'subscribe' or 'subscribeWith' with 'ReactorNettyWebSocketClient' in Kotlin

The Kotlin code below successfully connects to a Spring WebFlux server, sends a message and prints each message sent via the stream that is returned.
fun main(args: Array<String>) {
val uri = URI("ws://localhost:8080/myservice")
val client = ReactorNettyWebSocketClient()
val input = Flux.just(readMsg())
client.execute(uri) { session ->
session.send(input.map(session::textMessage))
.thenMany(
session.receive()
.map(WebSocketMessage::getPayloadAsText)
.doOnNext(::println) // want to replace this call
.then()
).then()
}.block()
}
In previous experience with Reactive programming I have always used subscribe or subscribeWith where the call to doOnNext occurs. However it will not work in this case. I understand that this is because neither returns the reactive stream in use - subscribe returns a Disposable and subscribeWith returns the Subscriber it received as a parameter.
My question is whether invoking doOnNext is really the correct way to add a handler to process incoming messages?
Most Spring 5 tutorials show code which either calls this or log, but some use subscribeWith(output).then() without specifying what output should be. I cannot see how the latter would even compile.
subscribe and subscribeWith should always be used right at the end of a chain of operators, not as intermediate operators.
Simon already provided the answer but I'll add some extra context.
When composing asynchronous logic with Reactor (and ReactiveX patterns) you build an end-to-end chain of processing steps, which includes not only the logic of the WebSocketHandler itself but also that of the underlying WebSocket framework code responsible for sending and receiving messages to and from the socket. It's very important for the entire chain to be connected together, so that at runtime "signals" will flow through it (onNext, onError, or onComplete) from start to end and communicate the final result, i.e where you have the .block() at the end.
In the case of WebSocket this looks a little daunting because you're essentially combining two or more streams into one. You can't just subscribe to one of those streams (e.g. for inbound messages) because that prevents composing a unified processing stream, and signals will not flow through to the end where the final outcome is expected.
The other side of this is that subscribe() triggers consumption on a stream whereas what you really want is to keep composing asynchronous logic in deferred mode, i.e. declaring all that will happen when data materializes. This is another reason why composing a single unified chain is important. So it can be triggered after it is fully declared.
In short the main difference with the imperative WebSocketHandler for the Servlet world, is that instead of it being a handler for individual messages, this is a handler for composing the complete streams. Here the handling of an individual message is just one step of the overall processing chain. So the only place to subscribe is at the very end, where .block() is, in order to kick off processing.
BTW since this question was first posted a few months ago, the documentation has been improved to provide more guidance on how to implement a WebSocketHandler.

why does spring cloud stream's #InboundChannelAdapter accept no parameters?

I'm trying to use spring cloud stream to send and receive messages on kafka. The examples for this use a simple example of using time stamps as the messages. I'm trying to go just one step further into a real world application when I ran into this blocker on the InboundChannelAdapter docs:
"A method annotated with #InboundChannelAdapter can't accept any parameters"
I was trying to use it like so:
#InboundChannelAdapter(value = ChannelManager.OUTPUT)
public EventCreated createCustomerEvent(String customerId, String thingId) {
return new EventCreated(customerId, thingId);
}
What usage am I missing? I imagine that when you want to create an event, you have some data that you want to use for that event, and so you would normally pass that data in via parameters. But "A method annotated with #InboundChannelAdapter can't accept any parameters". So how are you supposed to use this?
I understand that #InboundChannelAdapter comes from spring-integration, which spring-cloud-stream extends, and so spring-integration may have a different context in which this makes sense. But it seems un-intuitive to me (as does using an _INBOUND_ChannelAdapter for an output/producer/source)
Well, first of all the #InboundChannelAdapter is defined exactly in Spring Integration and Spring Cloud Stream doesn't extend it. That's false. Not sure where you have picked up that info...
This annotation builds something like SourcePollingChannelAdapter which provides a poller based on the scheduler and calls periodically a MessageSource.receive(). Since there is no any context and end-user can't effect that poller's behavior with his own arguments, the requirement for empty method parameters is obvious.
This #InboundChannelAdapter is a beginning of the flow and it is active. It does its logic on background without your events.
If you would like to call some method with parameters and trigger with that some flow, you should consider to use #MessagingGateway: http://docs.spring.io/spring-integration/reference/html/messaging-endpoints-chapter.html#messaging-gateway-annotation
How are you expecting to call that method? I think there was a miscommunication with your statement "stream extends integration" and Artem probably understood that we extend #InboundChannelAdatper
So, if you are actively calling this method, as it appears since you do have arguments that are passed to it, why not just using your source channel to send the data?
Usually sources do not require arguments as they are either push like the twitter stream that taps on twitter, listen for events and pushes them to the source channel, or they are polled, in which case, they are invoked on an interval defined via a poller.
As Artem pointed, if your intention is to call this method from your business flow, and deal with the return while triggering a message flow, then check his link from the docs.

Strategy for passing same payload between messages when optional outbound gateways fail

I have a workflow whose message payload (MasterObj) is being enriched several times. During the 2nd enrichment () an UnknownHostException was thrown by an outbound gateway. My error channel on the enricher is called but the message the error-channel receives is an exception, and the failed msg in that exception is no longer my MasterObj (original payload) but it is now the object gotten from request-payload-expression on the enricher.
The enricher calls an outbound-gateway and business-wise this is optional. I just want to continue my workflow with the payload that I've been enriching. The docs say that the error-channel on the enricher can be used to provide an alternate object (to what the enricher's request-channel would return) but even when I return an object from the enricher's error-channel, it still takes me to the workflow's overall error channel.
How do I trap errors from enricher's + outbound-gateways, and continue processing my workflow with the same payload I've been working on?
Is trying to maintain a single payload object for the entire workflow the right strategy? I need to be able to access it whenever I need.
I was thinking of using a bean scoped to the session where I store the payload but that seems to defeat the purpose of SI, no?
Thanks.
Well, if you worry about your MasterObj in the error-channel flow, don't use that request-payload-expression and let the original payload go to the enricher's sub-flow.
You always can use in that flow a simple <transformer expression="">.
On the other hand, you're right: it isn't good strategy to support single object through the flow. You carry messages via channel and it isn't good to be tied on each step. The Spring Integration purpose is to be able to switch from different MessageChannel types at any time with small effort for their producers and consumers. Also you can switch to the distributed mode when consumers and producers are on different machines.
If you still need to enrich the same object several times, consider to write some custom Java code. You can use a #MessagingGateway on the matter to still have a Spring Integration gain.
And right, scope is not good for integration flow, because you can simply switch there to a different channel type and lose a ThreadLocal context.

Resources