Message processing guarantees with spring-cloud-stream-binder-kafka functional binding

Given default configuration and this binding
@Bean
public Function<Flux<Message<Input>>, Flux<Message<Output>>> process() {
    return input -> input
            .map(message -> {
                // simplified
                return MessageBuilder.build();
            });
}
Is there any guarantee that the input message offset is committed after the output is written to Kafka? I don't need full transactions, and I can live with at-least-once delivery and possible duplicates, but I cannot lose an output message. I was unable to find this exact scenario in the docs. I believe the previous channel-based binding worked the way I need, since it was blocking by nature, but I am not sure about the functional style.
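No answer is recorded here, but one commonly suggested direction is sketched below. It is a hedged sketch, not an authoritative answer: it assumes the non-reactive, record-at-a-time functional style and the standard message-channel Kafka binder, and a binding named process-out-0. The idea is to process per record and force the producer to send synchronously, so the output reaches the broker before the container commits the input offset.
// Hedged sketch: per-record function instead of Flux-to-Flux. With the plain (non-reactive)
// Kafka binder the input offset is committed by the listener container after the record has
// been processed; setting
//   spring.cloud.stream.kafka.bindings.process-out-0.producer.sync=true   (assumed binding name)
// makes the output send block until Kafka acknowledges it, giving at-least-once behavior.
@Bean
public Function<Message<Input>, Message<Output>> process() {
    return message -> MessageBuilder
            .withPayload(toOutput(message.getPayload())) // toOutput(): placeholder for your mapping logic
            .copyHeaders(message.getHeaders())
            .build();
}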

Related

Multithreaded Use of Spring Pulsar

I am working on a project to read from our existing ElasticSearch instance and produce messages in Pulsar. If I do this in a highly multithreaded way without any explicit synchronization, I get many occurrences of the following log line:
Message with sequence id X might be a duplicate but cannot be determined at this time.
That is produced from this line of code in the Pulsar Java client:
https://github.com/apache/pulsar/blob/a4c3034f52f857ae0f4daf5d366ea9e578133bc2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L653
When I add a synchronized block to my method, synchronizing on the pulsar template, the error disappears, but my publish rate drops substantially.
Here is the current working implementation of my method that sends Protobuf messages to Pulsar:
public <T extends GeneratedMessageV3> CompletableFuture<MessageId> persist(T o) {
    var descriptor = o.getDescriptorForType();
    PulsarPersistTopicSettings settings = pulsarPersistConfig.getSettings(descriptor);
    MessageBuilder<T> messageBuilder = Optional.ofNullable(pulsarPersistConfig.getMessageBuilder(descriptor))
            .orElse(DefaultMessageBuilder.DEFAULT_MESSAGE_BUILDER);
    Optional<ProducerBuilderCustomizer<T>> producerBuilderCustomizerOpt =
            Optional.ofNullable(pulsarPersistConfig.getProducerBuilder(descriptor));
    PulsarOperations.SendMessageBuilder<T> sendMessageBuilder;
    sendMessageBuilder = pulsarTemplate.newMessage(o)
            .withSchema(Schema.PROTOBUF_NATIVE(o.getClass()))
            .withTopic(settings.getTopic());
    producerBuilderCustomizerOpt.ifPresent(sendMessageBuilder::withProducerCustomizer);
    sendMessageBuilder.withMessageCustomizer(mb -> messageBuilder.applyMessageBuilderKeys(o, mb));
    synchronized (pulsarTemplate) {
        try {
            return sendMessageBuilder.sendAsync();
        } catch (PulsarClientException re) {
            throw new PulsarPersistException(re);
        }
    }
}
The original version of the above method did not have the synchronized(pulsarTemplate) { ... } block. It performed faster, but generated a lot of logs about duplicate messages, which I knew to be incorrect. Adding the synchronized block got rid of the log messages, but slowed down publishing.
What are the best practices for multithreaded access to the PulsarTemplate? Is there a better way to achieve very high throughput message publishing?
Should I look at using the reactive client instead?
EDIT: I've updated the code block to show the minimum synchronization necessary to avoid the log lines, which is just synchronizing during the .sendAsync(...) call.
Your usage without the synchronized block should work. I will look into it, though, to see if anything else is going on. In the meantime, it would be great to give the Reactive client a try.
This issue was initially tracked here, and the final resolution was that it was a Pulsar issue that has been resolved in Pulsar 2.11.
Please try updating to Pulsar 2.11.
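For reference, a minimal sketch of the reactive path, assuming the spring-pulsar-reactive ReactivePulsarTemplate and a protobuf type MyProto (both the type and topic name are placeholders; verify the exact send(...) signatures against your Spring Pulsar version):
// Hedged sketch: no synchronization on the template; the reactive producer handles
// concurrency and batching. MyProto and the topic argument are assumptions.
@Autowired
private ReactivePulsarTemplate<MyProto> reactivePulsarTemplate;

public Mono<MessageId> persist(MyProto o, String topic) {
    return reactivePulsarTemplate.send(topic, o, Schema.PROTOBUF_NATIVE(MyProto.class));
}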

Timeout on replyChannel when wireTap is used

We are using wireTap to take timestamps at different parts of the flow. When introduced into the newest flow, it started causing a timeout on the replyChannel. From what I understand from the documentation, wireTap intercepts the message and sends it to a secondary channel while not affecting the main flow - so it looks like the perfect thing to use to take snapshots of said timestamps. Are we using the wrong component for the job, or is there something wrong with the configuration? If so, how would you recommend registering such information?
The exception:
o.s.integration.core.MessagingTemplate : Failed to receive message from channel 'org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel#21845b0d' within timeout: 1000
The code:
@Bean
public MarshallingWebServiceInboundGateway inboundGateway(Jaxb2Marshaller jaxb2Marshaller,
        DefaultSoapHeaderMapper defaultSoapHeaderMapper) {
    final MarshallingWebServiceInboundGateway inboundGateway =
            new MarshallingWebServiceInboundGateway(jaxb2Marshaller);
    inboundGateway.setRequestChannelName(INPUT_CHANNEL_NAME);
    inboundGateway.setHeaderMapper(defaultSoapHeaderMapper);
    return inboundGateway;
}
@Bean
public IntegrationFlow querySynchronous() {
    return IntegrationFlows.from(INPUT_CHANNEL_NAME)
            .enrichHeaders(...)
            .wireTap(performanceTimestampRegistrator.registerTimestampFlow(SYNC_REQUEST_RECEIVED_TIMESTAMP_NAME))
            .handle(outboundGateway)
            .wireTap(performanceTimestampRegistrator.registerTimestampFlow(SYNC_RESPONSE_RECEIVED_TIMESTAMP_NAME))
            //.transform( m -> m) // for tests - REMOVE
            .get();
}
And the timestamp flow:
public IntegrationFlow registerTimestampFlow(String asyncRequestReceivedTimestampName) {
    return channel -> channel.handle(
            m -> MetadataStoreConfig.registerFlowTimestamp(m, metadataStore, asyncRequestReceivedTimestampName));
}
The notable thing here is that if I uncomment the no-operation transformer, everything suddenly works fine, but it doesn't sound right and I would like to avoid such workarounds.
Another thing is that another, very similar flow works correctly, without any workarounds. The notable difference is that it puts the message on Kafka using the Kafka adapter, instead of calling a web service with the outbound gateway. It still generates a response to handle (with generateResponseFlow()), so it should behave the same way. Here is the flow, which works fine:
@Bean
public MarshallingWebServiceInboundGateway workingInboundGateway(Jaxb2Marshaller jaxb2Marshaller,
        DefaultSoapHeaderMapper defaultSoapHeaderMapper, @Qualifier("errorChannel") MessageChannel errorChannel) {
    MarshallingWebServiceInboundGateway aeoNotificationInboundGateway =
            new MarshallingWebServiceInboundGateway(jaxb2Marshaller);
    aeoNotificationInboundGateway.setRequestChannelName(WORKING_INPUT_CHANNEL_NAME);
    aeoNotificationInboundGateway.setHeaderMapper(defaultSoapHeaderMapper);
    aeoNotificationInboundGateway.setErrorChannel(errorChannel);
    return aeoNotificationInboundGateway;
}
@Bean
public IntegrationFlow workingEnqueue() {
    return IntegrationFlows.from(WORKING_INPUT_CHANNEL_NAME)
            .enrichHeaders(...)
            .wireTap(performanceTimestampRegistrator
                    .registerTimestampFlow(ASYNC_REQUEST_RECEIVED_TIMESTAMP_NAME))
            .filter(...)
            .filter(...)
            .publishSubscribeChannel(channel -> channel
                    .subscribe(sendToKafkaFlow())
                    .subscribe(generateResponseFlow()))
            .wireTap(performanceTimestampRegistrator
                    .registerTimestampFlow(ASYNC_REQUEST_ENQUEUED_TIMESTAMP_NAME))
            .get();
}
Here there is no problem with wireTap being the last component, and the response is correctly received on the replyChannel in time, without any workarounds.
The behavior is expected.
When wireTap() (or log()) is used at the end of a flow, there is no reply by default.
Since we can't assume what logic you are trying to include in the flow definition, we do our best with the default behavior: the flow becomes a one-way, send-and-forget one. Some people really asked to make it non-replyable after log()...
To make it still reply to the caller, you need to add a bridge() at the end of the flow.
See more in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/dsl.html#java-dsl-log
It works in your more complex scenario because one of the subscribers of your publishSubscribeChannel is that generateResponseFlow() with the reply. Honestly, you need to be careful with request-reply behavior and such a publishSubscribeChannel configuration. The replyChannel can accept only one reply, and if you expect a reply from several subscribers, you will be surprised how strange the behavior becomes.
The wireTap in this configuration of yours is not a subscriber; it is an interceptor injected into that publishSubscribeChannel. So your assumption about the similarity is misleading. There is an end of the flow after that wire tap, but since one of the subscribers replies, you get the expected behavior. Think of the publishSubscribeChannel as a parallel electrical circuit, where each connection gets electricity independently of the others and does its job without affecting them. Anyway, that is a different story.
To conclude: to reply from the flow after a wireTap(), you need to add a bridge(), and the reply message will be routed properly to the caller's replyChannel.
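Applied to the failing flow from the question, that looks roughly like this (a minimal sketch; only the trailing bridge() is new, and the header enrichment from the original is omitted for brevity):
@Bean
public IntegrationFlow querySynchronous() {
    return IntegrationFlows.from(INPUT_CHANNEL_NAME)
            .wireTap(performanceTimestampRegistrator.registerTimestampFlow(SYNC_REQUEST_RECEIVED_TIMESTAMP_NAME))
            .handle(outboundGateway)
            .wireTap(performanceTimestampRegistrator.registerTimestampFlow(SYNC_RESPONSE_RECEIVED_TIMESTAMP_NAME))
            // restore the reply path: route the result to the caller's replyChannel
            .bridge()
            .get();
}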

How to do Event-Driven Microservices with quarkus and smallrye correctly

Dears,
I am trying to do some kind of event-driven microservices. Currently, I am able to consume a message from Kafka and update a database record when the message is received, using the Quarkus SmallRye Reactive Messaging extension. What I want to achieve further is to send a message to another topic on success and to an error topic otherwise. I know that we can use a return value with the @Outgoing annotation to emit a new message, but I don't think it fits my use case. I need guidance here: if an error happens while consuming a message, should I return the message to the original topic (by not acknowledging it), or should I consume it and produce an error message to a different topic to roll back the original transaction?
Here is my code :
@Incoming("new-payment")
public void newMessage(String msg) {
    LOG.info("New payment has been received.");
    LOG.info("Payload is {}", msg);
    PaymentEvent pe = jsonb.fromJson(msg, PaymentEvent.class);
    mysqlPool.preparedQuery("select totalBuyers from Book where isbn = ? ",
            Tuple.of(pe.getIsbn()))
        .thenApply(rs -> {
            RowIterator<Row> iterator = rs.iterator();
            if (iterator.hasNext()) {
                return iterator.next().getInteger(0) + 1;
            } else {
                return Integer.valueOf(0);
            }
        })
        // thenCompose so we don't end up with a nested CompletionStage,
        // and a where clause so only the affected book is updated
        .thenCompose(totalCount -> mysqlPool.preparedQuery(
                "update Book set totalBuyers = ? where isbn = ?",
                Tuple.of(totalCount, pe.getIsbn())))
        .whenComplete((rs, err) -> {
            if (err != null) {
                //Emit an error to error topic.
            } else {
                //Emit a msg to other service.
            }
        });
}
Also, if you have better code, please share it; I am still a newbie in reactive programming :).
I've been doing enterprise integration for years and I think that you would want to do both.
Should I return message to the original topic (by not acknowledging
the message) or should I consume it and produce error message to
different topic to rollback the original transaction.
The event should remain on the topic for another instance to potentially pick up and process. And an error message should be logged as an event. Perhaps the same consumer could pick up and reprocess the event successfully.
An EDA (Event-Driven Architecture) may offer different ways to handle this, but on an ESB the message would be marked as tried. Generally, three failed attempts would send it to a dead-letter queue so that it can be corrected and reprocessed later.
Our enterprise is also starting to design and build applications using EDA, so I am interested to read what others have to say on this question. And kudos to you for focusing on Quarkus. I believe it is one of the best technologies to come from Red Hat that I have seen yet!
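To illustrate the "consume it and produce an error message to a different topic" option from the question, a minimal sketch with a SmallRye Reactive Messaging Emitter (the "payment-errors" channel name and the process(...) helper are assumptions, not from the question; the outgoing channel would need to be mapped to an error topic in application.properties):
// Hedged sketch: inject an Emitter bound to an outgoing channel mapped to an error topic.
@Inject
@Channel("payment-errors")
Emitter<String> errorEmitter;

@Incoming("new-payment")
public CompletionStage<Void> newMessage(Message<String> msg) {
    return process(msg.getPayload())            // process(): your existing DB logic, returning CompletionStage<Void>
            .handle((ok, err) -> {
                if (err != null) {
                    // on failure, publish the original payload to the error topic
                    errorEmitter.send(msg.getPayload());
                }
                return null;
            })
            .thenCompose(ignored -> msg.ack()); // acknowledge in both cases so the record is not redelivered
}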
Another problem with this approach is that you are doing "2 writes in 1 service", i.e. one call to the database and another to a topic. This becomes problematic when one of the two writes fails.
If you want to avoid this and use a pure event-driven approach, you need to reorder your events so that writing to the database is the last event in the whole flow; that way you prevent two writes from one service.
Thus, in your case: change the 2nd thenApply(..) from updating the database into firing a new event to another topic, and let the consumer of that new topic do the database update. The flow then becomes:
Producer -> topic1 -> consumer (select from ...) & fire event to another topic -> topic2 -> consumer (update table).
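A minimal sketch of that reordering with SmallRye Reactive Messaging (the "book-count-updates" channel name and the payload format are assumptions for illustration):
// Hedged sketch: the first consumer only reads and emits; the second consumer does the write.
@Incoming("new-payment")
@Outgoing("book-count-updates")
public CompletionStage<String> computeNewCount(String msg) {
    PaymentEvent pe = jsonb.fromJson(msg, PaymentEvent.class);
    return mysqlPool.preparedQuery("select totalBuyers from Book where isbn = ?",
                Tuple.of(pe.getIsbn()))
            .thenApply(rs -> {
                RowIterator<Row> it = rs.iterator();
                int newCount = it.hasNext() ? it.next().getInteger(0) + 1 : 0;
                // emit isbn + new count downstream instead of writing to the DB here
                return pe.getIsbn() + "," + newCount;
            });
}

@Incoming("book-count-updates")
public CompletionStage<Void> updateCount(String msg) {
    String[] parts = msg.split(",");
    return mysqlPool.preparedQuery("update Book set totalBuyers = ? where isbn = ?",
                Tuple.of(Integer.parseInt(parts[1]), parts[0]))
            .thenApply(rs -> null);
}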

Suppress triggers events only when new events are received on the stream

I am using Kafka streams 2.2.1.
I am using suppress to hold back events until a window closes. I am using event time semantics.
However, the suppressed messages are only emitted once a new message becomes available on the stream.
The following code is extracted to sample the problem:
KStream<UUID, String>[] branches = is
        .branch((key, msg) -> "a".equalsIgnoreCase(msg.split(",")[1]),
                (key, msg) -> "b".equalsIgnoreCase(msg.split(",")[1]),
                (key, value) -> true);
KStream<UUID, String> sideA = branches[0];
KStream<UUID, String> sideB = branches[1];
KStream<Windowed<UUID>, String> sideASuppressed =
        sideA.groupByKey(
                    Grouped.with(new MyUUIDSerde(),
                            Serdes.String()))
                .windowedBy(TimeWindows.of(Duration.ofMinutes(31)).grace(Duration.ofMinutes(32)))
                .reduce((v1, v2) -> v1)
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream();
Messages are only streamed from 'sideASuppressed' when a new message arrives on the 'sideA' stream (messages arriving on 'sideB' will not cause the suppression to emit anything, even if the window closed a long time ago).
Although in production the problem is unlikely to occur much, due to high volume, there are enough cases where it is essential not to wait for a new message on the 'sideA' stream.
Thanks in advance.
According to Kafka streams documentation:
Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available. If at least one partition does not have any new data available, stream-time will not be advanced and thus punctuate() will not be triggered if PunctuationType.STREAM_TIME was specified. This behavior is independent of the configured timestamp extractor, i.e., using WallclockTimestampExtractor does not enable wall-clock triggering of punctuate().
I am not sure why this is the case, but it explains why suppressed messages are only emitted when new messages arrive on the topic the stream reads from.
If anyone has an answer as to why the implementation is like this, I will be happy to learn. This behavior forces my implementation to emit messages just to get the suppressed messages emitted in time, and it makes the code much less readable.
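The workaround hinted at above is to send periodic "heartbeat" records into the input topic so that stream time keeps advancing even when no real data arrives. A minimal sketch, assuming producer is an already configured KafkaProducer<UUID, String> for the streams input topic and that "input-topic" and the one-minute interval are placeholders:
// Hedged sketch: a scheduled plain KafkaProducer publishes a dummy record to every partition
// of the input topic, so stream-time advances and suppress() can emit closed windows.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    for (PartitionInfo p : producer.partitionsFor("input-topic")) {
        // the value is shaped so the branch() predicates route it to the "everything else" branch
        producer.send(new ProducerRecord<>("input-topic", p.partition(), null, "heartbeat,ignore"));
    }
}, 0, 1, TimeUnit.MINUTES);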

Spring Integration Usage and Approach Validation

I am testing out using Spring Integration to tie together disparate modules within the same Spring Boot application (for now) and services into a unified flow starting from a single entry point.
I am looking for the following clarifications with Spring Integration, if possible:
Is the below code the right way to structure flows using the DSL?
In "C" below, can I bubble up the result to the "B" flow?
Is using the DSL vs. the XML the better approach?
I am confused as to how to correctly "terminate" a flow.
Flow Overview
In the code below, I am just publishing a page to a destination. The overall flow goes like this.
Publisher flow listens for the payload and splits it into parts.
Content flow filters out pages and splits them into parts.
AWS flow subscribes and handles the part.
File flow subscribes and handles the part.
Eventually, there may be additional and very different types of consumers of the Publisher flow which are not content-related, which is why I split the publisher flow from the content flow.
A) Publish Flow (publisher.jar):
This is my "main" flow, initiated through a gateway. The intent is that it serves as the entry point that triggers all publishing flows.
Receive the message
Preprocess the message and save it.
Split the payload into individual entries contained in it.
Enrich each of the entries with the rest of the data
Put each entry on the output channel.
Below is the code:
@Bean
IntegrationFlow flowPublish()
{
    return f -> f
            .channel(this.publishingInputChannel())
            //Prepare the payload
            .<Package>handle((p, h) -> this.save(p))
            //Split the artifact resolved items
            .split(Package.class, Package::getItems)
            //Find the artifact associated to each item (if available)
            .enrich(
                e -> e.<PackageEntry>requestPayload(
                    m ->
                    {
                        final PackageEntry item = m.getPayload();
                        final Publishable publishable = this.findPublishable(item);
                        item.setPublishable(publishable);
                        return item;
                    }))
            //Send the results to the output channel
            .channel(this.publishingOutputChannel());
}
B) Content Flow (content.jar)
This module's responsibility is to handle incoming "content" payloads (i.e. Page in this case) and split/route them to the appropriate subscriber(s).
Listen on the publisher output channel
Filter the entries by Page type only
Add the original payload to the header for later
Transform the payload into the actual type
Split the page into its individual elements (blocks)
Route each element to the appropriate PubSub channel.
At least for now, the subscribed flows do not return any response - they should just fire and forget - but I would like to know how to bubble up the result when using the pub-sub channel.
Below is the code:
@Bean
@ContentChannel("asset")
MessageChannel contentAssetChannel()
{
    return MessageChannels.publishSubscribe("assetPublisherChannel").get();
    //return MessageChannels.queue(10).get();
}

@Bean
@ContentChannel("page")
MessageChannel contentPageChannel()
{
    return MessageChannels.publishSubscribe("pagePublisherChannel").get();
    //return MessageChannels.queue(10).get();
}

@Bean
IntegrationFlow flowPublishContent()
{
    return flow -> flow
            .channel(this.publishingChannel)
            //Filter for root pages (which contain elements)
            .filter(PackageEntry.class, p -> p.getPublishable() instanceof Page)
            //Put the publishable details in the header
            .enrichHeaders(e -> e.headerFunction("item", Message::getPayload))
            //Transform the item to a Page
            .transform(PackageEntry.class, PackageEntry::getPublishable)
            //Split page into components and put the type in the header
            .split(Page.class, this::splitPageElements)
            //Route content based on type to the subscriber
            .<PageContent, String>route(PageContent::getType, mapping -> mapping
                    .resolutionRequired(false)
                    .subFlowMapping("page", sf -> sf.channel(this.contentPageChannel()))
                    .subFlowMapping("image", sf -> sf.channel(this.contentAssetChannel()))
                    .defaultOutputToParentFlow())
            .channel(IntegrationContextUtils.NULL_CHANNEL_BEAN_NAME);
}
C) AWS Content (aws-content.jar)
This module is one of many potential subscribers to the content specific flows. It handles each element individually based off of the routed channel published to above.
Subscribe to the appropriate channel.
Handle the action appropriately.
There can be multiple modules with flows that subscribe to the above routed output channels, this is just one of them.
As an example, the "contentPageChannel" could invoke the below flowPageToS3 (in the aws module) and also a flowPageToFile (in another module).
Below is the code:
@Bean
IntegrationFlow flowAssetToS3()
{
    return flow -> flow
            .channel(this.assetChannel)
            .publishSubscribeChannel(c -> c
                    .subscribe(s -> s
                            .<PageContent>handle((p, h) -> this.publishS3Asset(p))));
}

@Bean
IntegrationFlow flowPageToS3()
{
    return flow -> flow
            .channel(this.pageChannel)
            .publishSubscribeChannel(c -> c
                    .subscribe(s -> s
                            .<Page>handle((p, h) -> this.publishS3Page(p))
                            .enrichHeaders(e -> e.header("s3Command", Command.UPLOAD.name()))
                            .handle(this.s3MessageHandler())));
}
First of all, there is a lot of content in your question: it is hard to keep all the info in mind while reading. This is your project, so you are deeply familiar with the subject, but for us it is all new, and someone might give up on even reading it, let alone trying to answer.
Anyway, I'll try to answer your questions up front, although I feel like you're going to start a long discussion of "what?, how?, why?"...
Is the below code the right way to structure flows using the DSL?
It really depends on your logic. It is a good idea to separate it into logical components, but a separate jar for each might be overkill. Looking at your code, it seems you still collect everything into a single Spring Boot application and just @Autowired the appropriate channels into the @Configuration. So, yes, a separate @Configuration is a good idea, but a separate jar is overhead. IMHO.
In "C" below, can i bubble up the result to the "B" flow?
Well, since the story is about publish-subscribe, it is really unusual to wait for a reply. How many replies are you going to get from those subscribers? Right, that is the problem: we can send to many subscribers, but we can't gather replies from all of them into a single return. Coming back to Java: a method can have several arguments, but only one return value. The same applies here in messaging. Anyway, you may take a look at the Scatter-Gather pattern implementation.
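A minimal sketch of what Scatter-Gather can look like in the Java DSL (the recipient flows, the publishFilePage() handler, and the aggregation logic are assumptions for illustration; check the scatterGather() operator against your Spring Integration version):
// Hedged sketch: scatter the message to several handler flows and gather their results
// into a single reply, instead of a fire-and-forget publishSubscribeChannel.
@Bean
IntegrationFlow scatterGatherExample() {
    return flow -> flow
            .scatterGather(
                    scatterer -> scatterer
                            .applySequence(true)
                            .recipientFlow(sf -> sf.handle((p, h) -> publishS3Page((Page) p)))
                            .recipientFlow(sf -> sf.handle((p, h) -> publishFilePage((Page) p))),
                    gatherer -> gatherer.outputProcessor(group -> group.getMessages()));
}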
Is using the DSL vs. the XML the better approach?
Both are just high-level APIs. Underneath there are the same integration components. Looking at your app, you would arrive at the same distributed solution with XML configuration. I don't see a reason to step back from the Java DSL; at least it is less verbose for you.
I am confused as to how to correctly "terminate" a flow?
That is not clear from your long description. If you send to S3 or to a file, that is the termination. There is no reply from those components, so there is nowhere to go and nothing more to do; the flow simply stops there, the same as a Java method returning void. If you are worried about your entry-point gateway, just make it void and don't wait for any replies. See Messaging Gateway for more info.
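For the entry point, a one-way gateway could look like this (a minimal sketch; the interface name and the request channel are assumptions matching the flow names used above):
// Hedged sketch: a void gateway method means the caller does not wait for a reply,
// so a flow that ends in S3 or file handlers simply terminates there.
@MessagingGateway
public interface PublishGateway {

    @Gateway(requestChannel = "publishingInputChannel")
    void publish(Package packageToPublish);
}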
