Kafka Transactions in Interceptors in streams with EOS - apache-kafka-streams

We are using Spring Kafka Streams for processing data streams. We are currently using the processing.guarantee: exactly_once configuration to make sure it adheres to exactly-once semantics (EOS).
We have a new requirement to emit some metadata via interceptors. The interceptor uses a KafkaTemplate to publish metadata (mostly skimmed from message headers) to a Kafka topic.
The reason to do this in an interceptor is that the pipeline spans multiple applications/topics, and it is easy to wire the interceptors into the individual apps.
The question I have is: will the KafkaTemplate used in my interceptor require the @Transactional annotation, or will it use the transaction semantics created by the processing.guarantee config? And will it be able to roll back the data and not commit when the primary streams consumer/producer fails to commit?
Any pointers towards this are highly appreciated.

The KafkaTemplate cannot participate in the stream's transaction. It would need a separate transaction, which would break EOS.
For exactly once, either everything has to be done with streams, or everything with spring-kafka.
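For illustration, here is a minimal, hypothetical sketch of the kind of interceptor being discussed (the topic name, serializers, and header-summary format are assumptions). The point it demonstrates is that the interceptor's KafkaTemplate publishes outside the streams transaction created by processing.guarantee=exactly_once, so those sends are not rolled back when the streams task aborts:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;

public class MetadataPublishingInterceptor implements ProducerInterceptor<String, String> {

    // A separate, non-transactional template: it cannot join the streams-managed
    // transactional producer, so its sends are outside the EOS guarantee.
    private KafkaTemplate<String, String> metadataTemplate;

    @Override
    public void configure(Map<String, ?> configs) {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                configs.get(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        this.metadataTemplate = new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(props));
    }

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Skim the headers and publish a summary to a metadata topic (name is illustrative).
        StringBuilder summary = new StringBuilder();
        for (Header header : record.headers()) {
            byte[] value = header.value();
            summary.append(header.key()).append('=')
                   .append(value == null ? "" : new String(value, StandardCharsets.UTF_8))
                   .append(';');
        }
        // This send is not part of the streams transaction and will not be rolled back.
        metadataTemplate.send("stream-metadata", record.key(), summary.toString());
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // No-op for this sketch.
    }

    @Override
    public void close() {
        // Producer factory cleanup omitted for brevity.
    }
}
```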

Related

How can I retrieve Kafka messages inside a controller in Spring Boot?

The messages created by the producer are all being consumed as expected.
The thing is, I need to create an endpoint to retrieve the latest messages from the consumer.
Is there a way to do it?
Like an on-demand consumer?
I found this SO post, but it only covers consuming the last N records. I want to consume the latest without caring about the offsets.
Spring Kafka Consumer, rewind consumer offset to go back 'n' records
I'm working with Kotlin but if you have the answer in Java I don't mind either.
There are several ways to create listener containers dynamically; you can then start/stop them on demand. To get the records back into the controller, you'd need to use something like a blocking queue, or make the controller itself a MessageListener.
These answers show a couple of techniques for creating containers on demand:
How to dynamically create multiple consumers in Spring Kafka
Kafka Consumer in spring can I re-assign partitions programmatically?
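For illustration, a minimal sketch of the blocking-queue approach, assuming a container factory configured for string keys/values and an illustrative topic name my-topic; with the default auto.offset.reset of latest, only records arriving after the container starts are returned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LatestMessagesController {

    private final ConcurrentKafkaListenerContainerFactory<String, String> factory;

    public LatestMessagesController(ConcurrentKafkaListenerContainerFactory<String, String> factory) {
        this.factory = factory;
    }

    @GetMapping("/latest")
    public List<String> latest() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Create a container on demand; a fresh group id avoids previously committed offsets.
        ConcurrentMessageListenerContainer<String, String> container =
                factory.createContainer("my-topic");
        container.getContainerProperties().setGroupId("on-demand-" + UUID.randomUUID());
        container.getContainerProperties().setMessageListener(
                (MessageListener<String, String>) record -> queue.add(record.value()));
        container.start();
        try {
            List<String> results = new ArrayList<>();
            String value;
            // Drain whatever arrives within a short idle window (2 seconds here).
            while ((value = queue.poll(2, TimeUnit.SECONDS)) != null) {
                results.add(value);
            }
            return results;
        }
        finally {
            container.stop();
        }
    }
}
```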

kafka streams - can I use kafka streams processing in cases where the source is not a kafka topic?

I have an application (call it smscb-router) as shown in the diagram.
It reads data from a legacy system (sms).
Based on the content (callback type), I have to put it into the corresponding outgoing topic (such as billing-n-cdr, dr-cdr, ...).
I think the Streams API is better suited in this case, as it has the map functionality to do the content-mapping check. What I am unsure about is whether I can read source data from a non-Kafka-topic source.
All the examples that I see on internet blogs explain streaming apps in the context of reading from a source topic and writing to other destination topics.
So, is this possible to read from a non-topic source, such as say a redis store, or a message queue such as RabbitMQ?
We had a recent implementation where we had to poll an .xml file from a network-attached drive and convert it into Kafka events, i.e. publish each record to an output topic. We would not even call it something developed with the Streams API; it is just a Kafka Producer component.
Java File Poller Module (Quartz time based) -> XML Schema Management -> KAFKA Producer Component -> Output Topic (KAFKA Broker).
And you get all the native features of the Kafka Producer API in terms of retries, and you can send asynchronously with a callback via producer.send(record, callback) or synchronously via producer.send(record).get(), as shown in the sketch below.
Hope this helps. The Streams API is meant for bigger, more complex processing that needs to be normalized using stateful operations.
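A rough illustration of that producer-component approach (the file path, topic name, and line-per-record parsing are assumptions for the sketch):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FileToKafkaPublisher {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.RETRIES_CONFIG, 3); // native retry support from the producer API

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Pretend each line of the polled file is one record (parsing is illustrative).
            List<String> lines = Files.readAllLines(Path.of("/mnt/share/events.xml"));
            for (String line : lines) {
                ProducerRecord<String, String> record = new ProducerRecord<>("output-topic", line);
                // Asynchronous send with a callback...
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    }
                });
                // ...or block per record for the result: producer.send(record).get();
            }
            producer.flush();
        }
    }
}
```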
Thanks,
Christopher
Kafka Streams is only about Topic to Topic Data Streaming
All external systems should be integrated by another method:
Ideally Kafka Connect, for example with this connector:
https://docs.confluent.io/kafka-connect-rabbitmq-source/current/overview.html
You may also use a manual consumer for the first step, but it is always better to reuse the availability mechanisms built into Kafka Connect (no code, just some JSON config).
In your diagram I would recommend adding one topic and one producer or one connector in front of your pink component; then it can become a fully standard Kafka Streams microservice.

Get underlying low-level Kafka consumers and Producers in Spring Cloud Stream

I have a usecase where I want to get the underlying Kafka producer (KafkaTemplate) in a Spring Cloud Stream application. While navigating the code I stumbled upon KafkaProducerMessageHandler which has a getKafkaTemplate method. However, it fails to auto-wire.
Also, if I directly auto-wire KafkaTemplate, the template is initialized with default properties and it ignores the brokers configured under the binder key of the SCSt configuration.
How can I access the underlying KafkaTemplate or a producer/consumer in a Spring Cloud Stream app?
EDIT: Actually my SCSt app has multiple Kafka binders and I want to get the KafkaTemplate or Kafka producer corresponding to each binder. Is that possible somehow?
It's not entirely clear why you would need to do that, but you can capture the KafkaTemplates by adding a ProducerMessageHandlerCustomizer @Bean to the application context.
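A minimal sketch of that approach (bean and class names are illustrative): the customizer is called for each producer binding with its destination name, so keeping the templates in a map lets you look up the one belonging to each binder/binding:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.cloud.stream.config.ProducerMessageHandlerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class TemplateCapturingConfiguration {

    // Keyed by destination name so each binder/binding's template can be found later.
    private final Map<String, KafkaTemplate<?, ?>> templates = new ConcurrentHashMap<>();

    @Bean
    public ProducerMessageHandlerCustomizer<KafkaProducerMessageHandler<?, ?>> handlerCustomizer() {
        return (handler, destinationName) ->
                this.templates.put(destinationName, handler.getKafkaTemplate());
    }

    public Map<String, KafkaTemplate<?, ?>> getTemplates() {
        return this.templates;
    }
}
```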

Why use the Spring KafkaTemplate in place of the existing Kafka Producer/Consumer API?

What benefits does spring Kafka template provide?
I have tried the existing Producer/Consumer API from Kafka. It is very simple to use, so why use KafkaTemplate?
KafkaTemplate internally uses a Kafka producer, so you can also use the Kafka APIs directly. The benefit of using KafkaTemplate is that it provides different convenience methods for sending messages to a Kafka topic; you can see the API comparison between KafkaProducer and KafkaTemplate here:
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
https://docs.spring.io/spring-kafka/api/org/springframework/kafka/core/KafkaTemplate.html
You can see that KafkaTemplate provides many additional ways of sending data to Kafka topics through its various send methods, while some calls are the same as the Kafka API and are simply forwarded from KafkaTemplate to KafkaProducer.
It's up to the developer what to use. If you feel that working with KafkaTemplate is easier because you don't have to create a ProducerRecord yourself, a simple send method will do all the work for you.
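To make the comparison concrete, a small hypothetical sketch (topic, key, and payload are illustrative):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.kafka.core.KafkaTemplate;

public class SendComparison {

    // Raw API: you build the ProducerRecord yourself.
    void sendWithProducer(KafkaProducer<String, String> producer) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "key-1", "payload");
        producer.send(record);
    }

    // KafkaTemplate: convenience overloads build the record for you.
    void sendWithTemplate(KafkaTemplate<String, String> template) {
        template.send("orders", "key-1", "payload");
    }
}
```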
At a high level, the benefit is that you can externalize your properties objects more easily and you can just focus on the record processing logic
Plus Spring is integrated with lots of other components.
Note: Other options still exist like Reactor Kafka, Alpakka, Apache Camel, Smallrye reactive messaging, Vert.x... But they all wrap the same Kafka API.
So, I'd say you're (marginally) trading efficiency for convenience.

Development compromises in using Spring Cloud Stream

The case for event-driven microservices such as Spring Cloud Stream is their asynchronous nature, which I do agree makes them more scalable.
But I have an issue regarding how to code it in a way where I don't lose certain key features that I have access to in synchronous services.
In a servlet-based MS, I make full use of servlet context variables and servlet-based Spring autowiring functions.
For example, I rely heavily on HTTP headers to carry metadata between microservices without having to impact the payload. But in Spring Cloud Stream using Kafka, Kafka doesn't support message headers of any kind! I lose that immediately if I use SCS. Putting them into the payload causes all sorts of changes in my model classes if I define the attributes explicitly. Yes, I can use a simple HashMap to simulate the HTTP header object, but it really seems like reinventing the wheel to me.
On the auto-wiring side: I maintain an audit log record per request, which I implement by declaring a request-scoped HashMap bean and autowiring it into any method in the servlet's call stack that needs to append data to the audit log. Basically it's just a global variable to hold some data within a single request. But in SCS, again, I lose that because bean scopes that rely on servlets are not available.
So far, there seem to be a lot of trade-offs that I have to make just to get Spring Cloud Stream to work for me.
I thought about an alternative approach where I use SCS just to create an entry point, but the Source method would just get the event and use a Processor to construct an HTTP request and send the request along to an HTTP endpoint. But why go through all that trouble then?
Hoping that some more experienced devs would be able to shed some light on how they leverage SCS.
@feicipet Thanks for the detailed question. Let me try to address some of your concerns in the order you have listed them:
+1
+1
I am not sure why you are referring to it as servlet-based instead of Spring-based? Those are features provided by Spring, but read on...
Spring Cloud Stream doesn't use Kafka; the end user does, while Spring Cloud Stream provides a Kafka binder that allows Spring Cloud Stream to integrate with Kafka. Furthermore, while Kafka indeed did not support headers prior to version 0.11, Spring Cloud Stream has always supported, and will continue to support, headers even with pre-0.11 Kafka, by embedding them in the Message and then extracting them on the consumer side into the proper Message headers, completely transparently to the end user. In other words, one could assume that Kafka supported headers by simply using Spring Cloud Stream. With Kafka 0.11+, headers are supported natively and we have adjusted to that with the same level of transparency.
So, you don't need to put anything in the payload. Just create an appropriate Message<payload, headers> and SCSt will take care of the rest regardless of the broker (Kafka, Rabbit, Foo etc.).
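For example, a hypothetical processor using the functional binding model (the header name and payload type are illustrative); the headers travel with the Message and the payload is untouched:

```java
import java.util.UUID;
import java.util.function.Function;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;

@Configuration
public class AuditHeaderProcessor {

    // Copies the incoming headers and adds an audit id without touching the payload.
    @Bean
    public Function<Message<String>, Message<String>> enrich() {
        return incoming -> MessageBuilder
                .withPayload(incoming.getPayload())
                .copyHeaders(incoming.getHeaders())
                .setHeader("x-audit-id", UUID.randomUUID().toString())
                .build();
    }
}
```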
Yes, you do, simply because, as you alluded to earlier, SCSt promotes an asynchronous and stateless architecture. However, I do not agree that what you are trying to accomplish is un-accomplishable. It is accomplishable, just not necessarily the way you are describing; there are other ways to maintain context, and I would be more than glad to discuss it as a separate topic.
I would not call them trade-offs, rather a difference in architecture that has its benefits, but it is not a one-size-fits-all architecture, and therefore its viability should be discussed within the context of a concrete use case.
+1. You don't have to separate it as Source and Processor. You can simply create a custom Source app with an exposed REST endpoint and custom processing logic. However, we are currently working on enhancements in the framework to ensure that you can do the same with the existing starter apps.
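As a rough sketch of such a custom source with a REST entry point (this uses StreamBridge, a newer API than the Source/Processor style discussed in the question, and the binding name is illustrative):

```java
import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class IngestController {

    private final StreamBridge streamBridge;

    public IngestController(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    // Accepts a request and forwards it to the bound destination as an event.
    @PostMapping("/events")
    public ResponseEntity<Void> ingest(@RequestBody String body) {
        streamBridge.send("events-out-0", body);
        return ResponseEntity.accepted().build();
    }
}
```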
Obviously we have touched on many points here and some of them would probably need to be debated further, but I hope this clears up some of your concerns.
Cheers

Resources