Periodically polling consumer metrics with Reactor Kafka - spring-boot

We have a Spring Boot project using Reactor Kafka and a KafkaReceiver for consuming and we would like to collect and emit the underlying consumer metrics. It looks like we could leverage KafkaReceiver.doOnConsumer() with something like this:
receiver.doOnConsumer(Consumer::metrics)
.flatMapIterable(Map::entrySet)
.map(m -> Tuples.of(m.getKey(), m.getValue()))
If this is the best approach I'm not sure what the best way to do this periodically would be.
I notice there's also a version of the KafkaReceiver.create() factory method that takes a custom ConsumerFactory, maybe there's some way to use this to register the underlying Kafka consumer with Micrometer at creation time? I'm new to Spring Boot and relatively new to Kafka Reactor so I'm not totally sure.
Here's a snippet of my code so far for more context:
KafkaReceiver.create(receiverOptions(Collections.singleton(topic)).commitInterval(Duration.ZERO))
.receive()
.groupBy(m -> m.receiverOffset().topicPartition())
.flatMap(partitionFlux -> partitionFlux.publishOn(this.scheduler)
.map(r -> processEvent(partitionFlux.key(), r))
.concatMap(this::commit))
.doOnCancel(this::close)
.doOnError(error -> LOG.error("An error was encountered", error))
.blockLast();
If taking the doOnConsumer() approach makes sense we could possibly hook into doOnNext() but then we'd be collecting and emitting metrics for every event, which is too much, would be better if we could stagger and batch.
Any suggestions or tips appreciated, thanks.

Related

Spring cloud stream and backpressure using PubSubReactiveFactory

I am trying to implement a flow control similar to https://cloud.google.com/pubsub/docs/pull#flow_control using spring cloud stream reactive client.
#Bean
ApplicationRunner reactiveSubscriber(PubSubReactiveFactory reactiveFactory, PubSubMessageConverter converter) {
return (args) -> {
reactiveFactory.poll("orders-subscription", 250L)
// Convert a JSON payload into an object
.map(msg -> converter.fromPubSubMessage(msg.getPubsubMessage(), Order.class))
.doOnNext(order -> proccessOrder(order))
// Mannually acknowledge the message
.doOnNext(AcknowledgeablePubsubMessage::ack);
.subscribe();
};
}
It seems that back-pressure is not possible to limit number of data flowing through the system. Since the processing can last for few second I am afraid to cause some memory problems with high number of incoming message (millions of incoming message per day).
What I would like to achieve is a constant processing of 100 messages in the flux. Anyone had success implement it ? Maybe reactive rabbitmq ?
resolved using spring amqp and prefetch count.

Using onErrorResume to handle problematic payloads posted to Kafka using Reactor Kafka

I am using reactor kafka to send in kafka messages and receive and process them.
While receiving the kakfa payload, I do some deserialization, and if there is an exception, I want to just log that payload ( by saving to mongo ), and then continue receiving other payloads.
For this I am using the below approach -
#EventListener(ApplicationStartedEvent.class)
public void kafkaReceiving() {
for(Flux<ReceiverRecord<String, Object>> flux: kafkaService.getFluxReceives()) {
flux.delayUntil(//some function to do something)
.doOnNext(r -> r.receiverOffset().acknowledge())
.onErrorResume(this::handleException()) // here I'll just save to mongo
.subscribe();
}
}
private Publisher<? extends ReceiverRecord<String,Object>> handleException(object ex) {
// save to mongo
return Flux.empty();
}
Here I expect that whenever I encounter an exception while receiving a payload, the onErrorResume should catch it and log to mongo and then I should be good to continue receiving more messages when I send to the kafka queue. However, I see that after the exception, even though the onErrorResume method gets invoked, but I am not able to process anymore messages sent to Kakfa topic.
Anything I might be missing here?
If you need to handle the error gracefully, you can add onErrorResume inside delayUntil:
flux
.delayUntil(r -> {
return process(r)
.onErrorReturn(e -> saveToMongo(r));
});
.doOnNext(r -> r.receiverOffset().acknowledge())
.subscribe();
Reactive operators treat error as a terminal signal, and, if your inner logic (inside delayUntil) throws an error, delayUntil will terminate the sequence, and onErrorReturn after delayUntil will not make it continue processing the events from Kafka.
As mentioned by #bsideup too, I ultimately went ahead with not throwing exception from the deserializer, since the kafka is not able to commit offset for that record, and there is no clean way of ignoring that record and going ahead with further consumption of records as we dont have the offset information of the record( since it is malformed). So even if I try to ignore the record using reactive error operators, the poll fetches the same record, and the consumer is then kind of stuck

Spring Reactive events and transactional Context

I am migrating one existing app from Spring MVC traditional model to Spring Reactive. My first refactor let me with this piece of code:
return Mono.fromSupplier(() -> visitRequestDao.findById(requestId).get())
.map(request -> request.approve())
.map(request -> ResponseEntity.ok(listOfPendingVisitRequest(owner)));
After execute the endpoint I noticed that my entity did not have the state changed. As a Hibernate user, I know that when I load some object, any change applied will be reflect in the database, after the commit. My guess was that the event was executing in a different thread. I changed the code a little bit.
return Mono.fromSupplier(() -> visitRequestDao.findById(requestId).get())
.map(request -> transactionalContext.execute(() -> request.approve()))
.map(request -> ResponseEntity.ok(listOfPendingVisitRequest(owner)))
A class TransactionalContext was created and marked as transactional. So, now I know that any time its method is called, a new transactional will be started or a current one will be used. Is this the correct approach? Is there any solution?

Spring Integration Usage and Approach Validation

I am testing out using Spring Integration to tie together disperate modules within the same Spring-Boot application, for now, and services into a unified flow starting with a single-entry point.
I am looking for the following clarifications with Spring Integration if possible:
Is the below code the right way to structure flows using the DSL?
In "C" below, can i bubble up the result to the "B" flow?
Is using the DSL vs. the XML the better approach?
I am confused as to how to correctly "terminate" a flow?
Flow Overview
In the code below, I am just publishing a page to a destination. The overall flow goes like this.
Publisher flow listens for the payload and splits it into parts.
Content flow filters out pages and splits them into parts.
AWS flow subscribes and handles the part.
File flow subscribes and handles the part.
Eventually, there may be additional and very different types of consumers to the Publisher flow which are not content which is why I split the publisher from the content.
A) Publish Flow (publisher.jar):
This is my "main" flow initiated through a gateway. The intent, is that this serves as the entry point to begin trigger all publishing flows.
Receive the message
Preprocess the message and save it.
Split the payload into individual entries contained in it.
Enrich each of the entries with the rest of the data
Put each entry on the output channel.
Below is the code:
#Bean
IntegrationFlow flowPublish()
{
return f -> f
.channel(this.publishingInputChannel())
//Prepare the payload
.<Package>handle((p, h) -> this.save(p))
//Split the artifact resolved items
.split(Package.class, Package::getItems)
//Find the artifact associated to each item (if available)
.enrich(
e -> e.<PackageEntry>requestPayload(
m ->
{
final PackageEntry item = m.getPayload();
final Publishable publishable = this.findPublishable(item);
item.setPublishable(publishable);
return item;
}))
//Send the results to the output channel
.channel(this.publishingOutputChannel());
}
B) Content Flow (content.jar)
This module's responsibility is to handle incoming "content" payloads (i.e. Page in this case) and split/route them to the appropriate subscriber(s).
Listen on the publisher output channel
Filter the entries by Page type only
Add the original payload to the header for later
Transform the payload into the actual type
Split the page into its individual elements (blocks)
Route each element to the appropriate PubSub channel.
At least for now, the subscribed flows do not return any response - they should just fire and forget but i would like to know how to bubble up the result when using the pub-sub channel.
Below is the code:
#Bean
#ContentChannel("asset")
MessageChannel contentAssetChannel()
{
return MessageChannels.publishSubscribe("assetPublisherChannel").get();
//return MessageChannels.queue(10).get();
}
#Bean
#ContentChannel("page")
MessageChannel contentPageChannel()
{
return MessageChannels.publishSubscribe("pagePublisherChannel").get();
//return MessageChannels.queue(10).get();
}
#Bean
IntegrationFlow flowPublishContent()
{
return flow -> flow
.channel(this.publishingChannel)
//Filter for root pages (which contain elements)
.filter(PackageEntry.class, p -> p.getPublishable() instanceof Page)
//Put the publishable details in the header
.enrichHeaders(e -> e.headerFunction("item", Message::getPayload))
//Transform the item to a Page
.transform(PackageEntry.class, PackageEntry::getPublishable)
//Split page into components and put the type in the header
.split(Page.class, this::splitPageElements)
//Route content based on type to the subscriber
.<PageContent, String>route(PageContent::getType, mapping -> mapping
.resolutionRequired(false)
.subFlowMapping("page", sf -> sf.channel(this.contentPageChannel()))
.subFlowMapping("image", sf -> sf.channel(this.contentAssetChannel()))
.defaultOutputToParentFlow())
.channel(IntegrationContextUtils.NULL_CHANNEL_BEAN_NAME);
}
C) AWS Content (aws-content.jar)
This module is one of many potential subscribers to the content specific flows. It handles each element individually based off of the routed channel published to above.
Subscribe to the appropriate channel.
Handle the action appropriately.
There can be multiple modules with flows that subscribe to the above routed output channels, this is just one of them.
As an example, the the "contentPageChannel" could invoke the below flowPageToS3 (in aws module) and also a flowPageToFile (in another module).
Below is the code:
#Bean
IntegrationFlow flowAssetToS3()
{
return flow -> flow
.channel(this.assetChannel)
.publishSubscribeChannel(c -> c
.subscribe(s -> s
.<PageContent>handle((p, h) ->
{
return this.publishS3Asset(p);
})));
}
#Bean
IntegrationFlow flowPageToS3()
{
return flow -> flow
.channel(this.pageChannel)
.publishSubscribeChannel(c -> c
.subscribe(s -> s
.<Page>handle((p, h) -> this.publishS3Page(p))
.enrichHeaders(e -> e.header("s3Command", Command.UPLOAD.name()))
.handle(this.s3MessageHandler())));
}
First of all there are a lot of content in your question: it's to hard to keep all the info during read. That is your project, so you should be very confident in the subject. But for us that is something new and may just give up even reading not talking already with attempt to answer.
Anyway I'll try to answer to your questions in the beginning, although I feel like you're going to start a long discussion "what?, how?, why?"...
Is the below code the right way to structure flows using the DSL?
It really depends of your logic. That is good idea to distinguish it between logical component, but that might be overhead to sever separate jar on the matter. Looking to your code that seems for me like you still collect everything into single Spring Boot application and just #Autowired appropriate channels to the #Configuration. So, yes, separate #Configuration is good idea, but separate jar is an overhead. IMHO.
In "C" below, can i bubble up the result to the "B" flow?
Well, since the story is about publish-subscribe that is really unusual to wait for reply. How many replies are you going to get from those subscribers? Right, that is the problem - we can send to many subscribers, but we can't get replies from all of them to single return. Let's come back to Java code: we can have several method arguments, but we have only one return. The same is applied here in Messaging. Anyway you may take a look into Scatter-Gather pattern implementation.
Is using the DSL vs. the XML the better approach?
Both are just a high-level API. Underneath there are the same integration components. Looking to your app you'd come to the same distributed solution with the XML configuration. Don't see reason to step back from the Java DSL. At least it is less verbose, for you.
I am confused as to how to correctly "terminate" a flow?
That's absolutely unclear having your big description. If you send to S3 or to File, that is a termination. There is no reply from those components, so no where to go, nothing to do. That is just stop. The same we have with the Java method with void. If you worry about your entry point gateway, so just make it void and don't wait for any replies. See Messaging Gateway for more info.

Kafka Streams API: KStream to KTable

I have a Kafka topic where I send location events (key=user_id, value=user_location). I am able to read and process it as a KStream:
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Location> locations = builder
.stream("location_topic")
.map((k, v) -> {
// some processing here, omitted form clarity
Location location = new Location(lat, lon);
return new KeyValue<>(k, location);
});
That works well, but I'd like to have a KTable with the last known position of each user. How could I do it?
I am able to do it writing to and reading from an intermediate topic:
// write to intermediate topic
locations.to(Serdes.String(), new LocationSerde(), "location_topic_aux");
// build KTable from intermediate topic
KTable<String, Location> table = builder.table("location_topic_aux", "store");
Is there a simple way to obtain a KTable from a KStream? This is my first app using Kafka Streams, so I'm probably missing something obvious.
Update:
In Kafka 2.5, a new method KStream#toTable() will be added, that will provide a convenient way to transform a KStream into a KTable. For details see: https://cwiki.apache.org/confluence/display/KAFKA/KIP-523%3A+Add+KStream%23toTable+to+the+Streams+DSL
Original Answer:
There is not straight forward way at the moment to do this. Your approach is absolutely valid as discussed in Confluent FAQs: http://docs.confluent.io/current/streams/faq.html#how-can-i-convert-a-kstream-to-a-ktable-without-an-aggregation-step
This is the simplest approach with regard to the code. However, it has the disadvantages that (a) you need to manage an additional topic and that (b) it results in additional network traffic because data is written to and re-read from Kafka.
There is one alternative, using a "dummy-reduce":
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> stream = ...; // some computation that creates the derived KStream
KTable<String, Long> table = stream.groupByKey().reduce(
new Reducer<Long>() {
#Override
public Long apply(Long aggValue, Long newValue) {
return newValue;
}
},
"dummy-aggregation-store");
This approach is somewhat more complex with regard to the code compared to option 1 but has the advantage that (a) no manual topic management is required and (b) re-reading the data from Kafka is not necessary.
Overall, you need to decide by yourself, which approach you like better:
In option 2, Kafka Streams will create an internal changelog topic to back up the KTable for fault tolerance. Thus, both approaches require some additional storage in Kafka and result in additional network traffic. Overall, it’s a trade-off between slightly more complex code in option 2 versus manual topic management in option 1.

Resources