Kafka Streams - override default addSink implementation / custom producer - apache-kafka-streams

This is my first post here and I am not sure if this was covered before, but here goes: I have a Kafka Streams application, using the Processor API, with the following topology:
1. Consume data from an input topic (processor.addSource())
2. Insert the data into a DB (processor.addProcessor())
3. Produce the processing status to an output topic (processor.addSink())
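Roughly, that topology in code looks like the sketch below (the input topic name and the DbWriterProcessor class are illustrative, not the actual ones):
Topology topology = new Topology();
topology.addSource("Source", "my-input-topic");                      // 1. consume from the input topic
topology.addProcessor("DbWriter", DbWriterProcessor::new, "Source"); // 2. insert the data into the DB
topology.addSink("StatusSink", "MY-OUTPUT-TOPIC", "DbWriter");       // 3. produce the processing status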
The app works great; however, for traceability purposes, I need to log the moment Kafka Streams produces a message to the output topic, along with its RecordMetadata (topic, partition, offset).
Example below:
KEY="MY_KEY" OUTPUT_TOPIC="MY-OUTPUT-TOPIC" PARTITION="1" OFFSET="1000" STATUS="SUCCESS"
I am not sure if there is a way to override the default Kafka Streams producer to add this logging, or maybe to create my own producer and plug it into the addSink step. I partially achieved it by implementing my own exception handler (default.production.exception.handler), but that only covers the exceptions.
Thanks in advance,
Guilherme

If you configure the streams application to use a ProducerInterceptor, then you should be able to get the information you need. Specifically, implementing onAcknowledgement() will give you access to everything you listed above.
To configure interceptors in a streams application:
Properties props = new Properties();
// add this configuration in addition to your other streams configs
props.put(StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG), Collections.singletonList(MyProducerInterceptor.class));
You can provide more than one interceptor if desired, just add the class name and change the list implementation from a singleton to a regular List. Execution of the interceptors follows the order of the classes in the list.
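A minimal sketch of such an interceptor (the class name matches the config above; the log format and the use of System.out are just illustrative):
public class MyProducerInterceptor implements ProducerInterceptor<byte[], byte[]> {

    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
        return record; // pass the record through unchanged
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // called once the broker has acknowledged (or rejected) the record
        if (exception == null) {
            System.out.printf("OUTPUT_TOPIC=\"%s\" PARTITION=\"%d\" OFFSET=\"%d\" STATUS=\"SUCCESS\"%n",
                    metadata.topic(), metadata.partition(), metadata.offset());
        } else {
            System.out.printf("STATUS=\"FAILED\" ERROR=\"%s\"%n", exception.getMessage());
        }
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}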
EDIT: Just to be clear, you can override the provided producer in Kafka Streams via the KafkaClientSupplier interface, which you pass in through an overloaded KafkaStreams constructor, but IMHO using an interceptor is the cleaner approach. Which direction to go is up to you.

Related

spring cloud stream with a lot of different event types

I would like advice regarding the usage of Spring Cloud Stream technologies.
Currently my service uses Spring Boot and implements some event-based approaches.
But the events are not sent to any kind of broker; they are simply handled by handlers in separate threads.
I am interested in spring cloud stream technology.
I have implemented CustomMessageRoutingCallback as shown in this example https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/routing-samples/message-routing-callback.
But the problem is that declaring all the consumers in the configuration this way sounds like a pain:
@Bean
public Consumer<Menu> menuConsumer() {
    return menu -> log.info(menu.toString());
}
Because I have around 50-60 different event types. Is there any way to register consumers dynamically? Or would it be better to declare a consumer with some raw input type, then deserialize the message in the consumer and manually route it to the right handler?
This really has nothing to do with s-c-stream and is more of an architectural question. If you have 50+ different event types, having that many different consumers would be the least of your issues. The question I would be asking is: is it really feasible to trust a single application to process that many different event types? What if processing a single event results in a system failure? Are you willing to live with none of the events being processed until the problem is fixed?
This is just an example, but there are many other architectural questions that would need to be answered before you can select a technology.
A possible option is to create a common interface for your events:
@Bean
public Consumer<CommonInterfaceType> menuConsumer() {
    return commonInterfaceTypeObj -> commonInterfaceTypeObj.doSomething();
}
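For example, a rough sketch (the interface is hypothetical; Menu is one of your event types):
public interface CommonInterfaceType {
    void doSomething();
}

public class Menu implements CommonInterfaceType {
    @Override
    public void doSomething() {
        // menu-specific handling lives with the event itself
    }
}
The routing logic then lives in the event classes instead of in 50-60 separate consumer beans.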

bind destinations dynamically for producers and consumers (Spring)

I'm trying to send and receive messages to/from channels/topics whose destination names are stored in a database, so they can be added/modified/deleted at runtime, but I'm surprised I have found little on the web about it. I'm using Spring Cloud Stream so the underlying broker can be changed.
To send messages to dynamically bound destinations I'm going with BinderAwareChannelResolver.resolveDestination(target).send(message), but I haven't found something that works like it for receiving messages.
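For reference, that sending side, sketched out (the helper method is illustrative; the destination name comes from the database):
@Autowired
private BinderAwareChannelResolver resolver;

public void send(String destination, Object payload) {
    // the destination is resolved (and bound if necessary) at runtime
    resolver.resolveDestination(destination)
            .send(MessageBuilder.withPayload(payload).build());
}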
My questions are:
1. Is there something similar?
2. How can the message be processed periodically as @StreamListener does?
3. And not as important, but can you create a subscriber automatically in case there is none?
Thanks for any help!
This is a bit out of scope of the original design of the framework, but I would further question your architecture: if you truly desire to subscribe to an unlimited number of destinations, I wonder why? What is the underlying business requirement?
Keep in mind that even if we were to do it somehow, it would require creating a message listener container dynamically for each new destination, which would raise more questions, such as how long such a container would have to live, since eventually you would run out of resources.
If, however, you are simply asking about the possibility of mapping multiple destinations to a single channel so that all messages go to the same message handler (e.g., a StreamListener), then you can simply use the input destination property and define multiple destinations delimited by commas.
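For example (a sketch, assuming the default input binding and illustrative topic names):
spring.cloud.stream.bindings.input.destination=orders,payments,shipments
All three destinations then feed the same handler.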

why does spring cloud stream's @InboundChannelAdapter accept no parameters?

I'm trying to use Spring Cloud Stream to send and receive messages on Kafka. The examples for this use the simple case of sending timestamps as the messages. I was trying to go just one step further into a real-world application when I ran into this blocker in the InboundChannelAdapter docs:
"A method annotated with @InboundChannelAdapter can't accept any parameters"
I was trying to use it like so:
@InboundChannelAdapter(value = ChannelManager.OUTPUT)
public EventCreated createCustomerEvent(String customerId, String thingId) {
    return new EventCreated(customerId, thingId);
}
What usage am I missing? I imagine that when you want to create an event, you have some data that you want to use for that event, and so you would normally pass that data in via parameters. But "A method annotated with @InboundChannelAdapter can't accept any parameters". So how are you supposed to use this?
I understand that @InboundChannelAdapter comes from Spring Integration, which Spring Cloud Stream extends, and so Spring Integration may have a different context in which this makes sense. But it seems unintuitive to me (as does using an INBOUND ChannelAdapter for an output/producer/source).
Well, first of all, the @InboundChannelAdapter is defined in Spring Integration, and Spring Cloud Stream doesn't extend it. That's false; not sure where you picked up that info...
This annotation builds something like a SourcePollingChannelAdapter, which provides a poller based on the scheduler and periodically calls MessageSource.receive(). Since there is no call context and the end user can't affect that poller's behavior with his own arguments, the requirement for empty method parameters is obvious.
This @InboundChannelAdapter is the beginning of the flow and it is active: it does its logic in the background, without your events.
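To illustrate, the typical usage (sketched here) is a no-arg method that the poller invokes on a schedule; the queue is just an illustrative data source:
private final Queue<EventCreated> pendingEvents = new ConcurrentLinkedQueue<>();

@InboundChannelAdapter(value = ChannelManager.OUTPUT, poller = @Poller(fixedDelay = "1000"))
public EventCreated nextEvent() {
    // invoked by the framework on each poll; returning null means there is nothing to send
    return pendingEvents.poll();
}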
If you would like to call some method with parameters and trigger some flow with it, you should consider using a @MessagingGateway: http://docs.spring.io/spring-integration/reference/html/messaging-endpoints-chapter.html#messaging-gateway-annotation
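For example, a rough sketch of such a gateway (reusing the channel constant from your code):
@MessagingGateway
public interface EventGateway {

    // calling this method builds a message from the argument and sends it to the channel
    @Gateway(requestChannel = ChannelManager.OUTPUT)
    void createCustomerEvent(EventCreated event);
}
Your business code would then call eventGateway.createCustomerEvent(new EventCreated(customerId, thingId)) with its parameters, and the framework turns that call into a message.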
How are you expecting to call that method? I think there was a miscommunication with your statement "stream extends integration" and Artem probably understood that we extend @InboundChannelAdapter.
So, if you are actively calling this method, as it appears since you do have arguments passed to it, why not just use your source channel to send the data?
Usually sources do not require arguments, as they are either push-based, like the Twitter stream source that taps into Twitter, listens for events and pushes them to the source channel, or they are polled, in which case they are invoked on an interval defined via a poller.
As Artem pointed out, if your intention is to call this method from your business flow and deal with the return value while triggering a message flow, then check the link to the docs in his answer.

Strategy for passing same payload between messages when optional outbound gateways fail

I have a workflow whose message payload (MasterObj) is being enriched several times. During the second enrichment, an UnknownHostException was thrown by an outbound gateway. My error channel on the enricher is called, but the message the error channel receives is an exception, and the failed message inside that exception is no longer my MasterObj (the original payload) but is now the object produced by the request-payload-expression on the enricher.
The enricher calls an outbound gateway, and business-wise this step is optional: I just want to continue my workflow with the payload that I've been enriching. The docs say that the error-channel on the enricher can be used to provide an alternate object (to what the enricher's request-channel would return), but even when I return an object from the enricher's error-channel, it still takes me to the workflow's overall error channel.
How do I trap errors from the enricher's outbound gateways, and continue processing my workflow with the same payload I've been working on?
Is trying to maintain a single payload object for the entire workflow the right strategy? I need to be able to access it whenever I need.
I was thinking of using a bean scoped to the session where I store the payload but that seems to defeat the purpose of SI, no?
Thanks.
Well, if you worry about your MasterObj in the error-channel flow, don't use that request-payload-expression and let the original payload go to the enricher's sub-flow.
You can always use a simple <transformer expression=""/> in that flow.
On the other hand, you're right: it isn't a good strategy to carry a single object through the whole flow. You carry messages via channels, and it isn't good to be tied to that object on each step. The purpose of Spring Integration is to be able to switch between different MessageChannel types at any time, with small effort for their producers and consumers. You can also switch to a distributed mode where consumers and producers are on different machines.
If you still need to enrich the same object several times, consider writing some custom Java code. You can use a @MessagingGateway for that, to still get the Spring Integration benefits.
And right, session scope is not good for an integration flow, because you can simply switch to a different channel type there and lose the ThreadLocal context.

Using Spring Integration to split a large XML file into individual smaller messages and process each individually

I am using Spring Integration and have a large XML file containing a collection of child items. I want to split the file into a set of messages, where the payload of each message will be one of the child XML fragments.
Using a splitter is the obvious choice, but this requires returning a collection of messages, which would exhaust the memory; I need to split the file into individual messages and process them one at a time (or, more likely, with a multi-threaded task executor).
Is there a standard way to do this without writing a custom component that writes the sub-messages to a channel programmatically?
I have been looking for a similar solution and have not found any standard way of doing this either.
Here is a rather dirty fix, if anyone needs this behavior implemented:
Split the files manually using a Service Activator or a Splitter with a custom bean.
<int:splitter input-channel="rawChannel" output-channel="splitChannel" id="splitter">
    <bean class="com.a.b.c.MYSplitter" />
</int:splitter>
Your custom bean should implement ApplicationContextAware so the application context can be injected by Spring.
Manually retrieve the output channel and send each sub-message:
MessageChannel splitChannel = (MessageChannel) applicationContext.getBean("splitChannel");
Message<String> message = new GenericMessage<String>(payload);
splitChannel.send(message);
For people coming across this very old question: splitters can now handle results of type Iterable, Iterator, Stream, and Flux (Project Reactor). If any of these types are returned, messages are emitted one at a time.
Iterator/Iterable since 4.0.4; Stream/Flux since 5.0.0.
There is also now a FileSplitter, which emits file contents a line at a time via an Iterator (since 4.1.2).
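So for the original question, a memory-friendly splitter can now look like this sketch (ChildElementIterator is a hypothetical custom iterator that lazily parses child fragments from the stream):
@Splitter(inputChannel = "rawChannel", outputChannel = "splitChannel")
public Iterator<String> split(File xmlFile) throws FileNotFoundException {
    // each element returned by the iterator is emitted as its own message,
    // so the whole file is never materialized as a collection in memory
    return new ChildElementIterator(new FileInputStream(xmlFile));
}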
