Kafka Streams sub-topology for low-level Processor API - apache-kafka-streams

I have a Kafka Streams application implemented with the low-level Processor API and an intermediate topic. I expect the topology to be split into two sub-topologies: one from the source to the intermediate topic, and the other from the intermediate topic to the sink.
When I call topology.describe(), I only see one sub-topology.
I know this can be done in the DSL via the through() method, but I couldn't find an example with the low-level Processor API.
Right now I am doing it like this:
Topology topology = streamsBuilder.build();
topology.addSource("src", keyDeserializer, valueDeserializer, "topic")
        .addProcessor("processor1", MyProcessor::new, "src")
        // intermediate topic starts
        .addSink("intermediateTopicSink", "intermediateTopicName", keySerializer, valueSerializer, "processor1")
        .addSource("intermediateTopicSource", keyDeserializer, valueDeserializer, "intermediateTopicName")
        // intermediate topic ends
        .addProcessor("processor2", AnotherProcessor::new, "intermediateTopicSource")
        .addSink("finalSink", "finalSinkTopic", keySerializer, valueSerializer, "processor2");

Related

Spring Cloud Stream Kafka Streams multiple input bindings, no output

I have a simple Kafka Streams app that processes input from 3 topics, but no output is required since the final step is to save the processing outcome to a database.
I saw this example for multiple input topics, and it is exactly what I need apart from the output binding. That is, the bean should not return an output but should consume from 3 different topics as a KStream, a KTable and a GlobalKTable, and finish.
Any ideas on how I can change the example for my needs?
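One possible shape, assuming the Kafka Streams binder accepts a curried function chain whose innermost piece is a Consumer, so that no output binding is created. The bean name, the topics bound to each input, the String types and the join logic are hypothetical placeholders.

import java.util.function.Consumer;
import java.util.function.Function;

import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ThreeInputsNoOutput {

    // Curried bindings: process-in-0, process-in-1 and process-in-2 would each be
    // mapped to a topic in the binder configuration; the declared parameter types
    // tell the binder to materialize them as KStream, KTable and GlobalKTable.
    @Bean
    public Function<KStream<String, String>,
           Function<KTable<String, String>,
           Consumer<GlobalKTable<String, String>>>> process() {
        return orders -> customers -> products ->
            orders
                // enrich the stream with the co-partitioned KTable
                .leftJoin(customers, (order, customer) -> order + "|" + customer)
                // enrich with the GlobalKTable, looked up by the record key
                .leftJoin(products, (key, value) -> key, (value, product) -> value + "|" + product)
                // terminal step: persist to the database, nothing is produced back to Kafka
                .foreach((key, enriched) -> {
                    // repository.save(key, enriched); // hypothetical DB call
                });
    }
}

The idea is that because the chain ends in a Consumer, there is no -out-0 binding to configure; each input binding just points at one of the three topics.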

Handling transactions with Kafka Streams and Spring-Cloud-Stream

I am developing an app (microservices-based) relying on Kafka and Kafka Streams. I am using Spring Boot and Spring Cloud Stream for that, and I am having trouble handling transactions for Kafka Streams operations. I know that there is no problem with handling transactions purely with the Kafka consumer; however, when I try to add Kafka Streams processing in the middle, it becomes tricky for me.
The example case is:
In one of my services, an order request for a product is consumed from topic A.
Inventory info is consumed from topic B.
This service produces inventory updates to topic B, but it is also responsible for publishing events about products being ready for shipping (to topic C).
When receiving an order request from topic A, I want to check (by processing topic B) whether the inventory for the particular product is sufficient, and publish an event with either success or failure (regarding that order) to topic C.
At the same time I need to update the inventory (subtract the quantity that is, let's say, reserved for shipping) so that for the next order I have actual values from topic B. I want to post the success to topic C and update the inventory on topic B within one transaction.
Is that possible in Spring Cloud Stream with Kafka Streams? And if yes, how can I manage to do that?
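For what it's worth, in plain Kafka Streams the "all writes in one transaction" behaviour comes from the processing.guarantee setting; the same property should be passable through the Spring Cloud Stream binder's Kafka Streams configuration section. A minimal sketch below, with hypothetical topic names, String serdes and placeholder join logic.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class OrderInventoryApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-inventory");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Exactly-once: the writes to topic B and topic C plus the input offset commits
        // are committed in a single Kafka transaction, or aborted together.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("topic-A");   // order requests, keyed by product id
        KTable<String, String> inventory = builder.table("topic-B");  // current stock, keyed by product id

        // Placeholder logic: compare the ordered quantity with the current stock and emit a
        // success/failure event; a real implementation would also write the updated stock
        // back to topic B within the same topology.
        orders.join(inventory, (orderedQty, stock) ->
                   Long.parseLong(stock) >= Long.parseLong(orderedQty) ? "SUCCESS" : "FAILURE")
              .to("topic-C");

        new KafkaStreams(builder.build(), props).start();
    }
}

On older clients the constant is StreamsConfig.EXACTLY_ONCE instead of EXACTLY_ONCE_V2.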

Storm bolt following a Kafka bolt

I have a Storm topology where I have to send output to Kafka as well as update a value in Redis. For this I have a KafkaBolt as well as a RedisBolt.
Below is what my topology looks like:
tp.setSpout("kafkaSpout", kafkaSpout, 3);
tp.setBolt("EvaluatorBolt", evaluatorBolt, 6).shuffleGrouping("kafkaSpout");
tp.setBolt("ResultToRedisBolt", resultsToRedisBolt, 3).shuffleGrouping("EvaluatorBolt", "ResultStream");
tp.setBolt("ResultToKafkaBolt", resultsToKafkaBolt, 3).shuffleGrouping("EvaluatorBolt", "ResultStream");
The problem is that both of the end bolts (Redis and Kafka) listen to the same stream from the preceding bolt (ResultStream), hence both can fail independently. What I really need is that only if the result is successfully published to Kafka do I update the value in Redis. Is there a way to have an output stream from the KafkaBolt from which I can get the messages that were published successfully to Kafka? I could then listen to that stream in my RedisBolt and act accordingly.
It is not currently possible, unless you modify the bolt code. You would likely be better off changing your design slightly, since doing extra processing after the tuple is written to Kafka has some drawbacks. If you write the tuple to Kafka and you fail to write to Redis, you will get duplicates in Kafka, since the processing will start over at the spout.
It might be better, depending on your use case, to write the result to Kafka, and then have another topology read the result from Kafka and write to Redis.
If you still need to be able to emit new tuples from the bolt, it should be pretty easy to implement. The bolt recently got the ability to add a custom Producer callback, so we could extend that mechanism.
See the discussion at https://github.com/apache/storm/pull/2790#issuecomment-411709331 for context.
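A rough sketch of that separate-topology approach, assuming storm-kafka-client's KafkaSpout and a minimal Jedis-based writer bolt; the topic name, Redis endpoint and key/value mapping are hypothetical, and records are assumed to have non-null String keys.

import org.apache.storm.generated.StormTopology;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;

public class ResultsToRedisTopology {

    // Terminal bolt: copies each Kafka record into Redis. The default KafkaSpout
    // record translator emits the fields "topic", "partition", "offset", "key", "value".
    public static class RedisWriterBolt extends BaseBasicBolt {
        private transient Jedis jedis;

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            if (jedis == null) {
                jedis = new Jedis("localhost", 6379); // hypothetical Redis endpoint; a real bolt would set this up in prepare()
            }
            jedis.set(tuple.getStringByField("key"), tuple.getStringByField("value"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, no downstream consumers
        }
    }

    public static StormTopology build() {
        // "resultTopic" stands in for whatever topic the first topology's KafkaBolt writes to.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "resultTopic").build();

        TopologyBuilder tp = new TopologyBuilder();
        tp.setSpout("resultSpout", new KafkaSpout<>(spoutConfig), 3);
        tp.setBolt("redisWriter", new RedisWriterBolt(), 3).shuffleGrouping("resultSpout");
        return tp.createTopology();
    }
}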

Is there a way to pass a variable from a Bolt to a Spout in Apache Storm?

In Storm, the flow of information (tuples) is from spout to bolt.
To prevent information overload, I am filtering most of the data at the spout in the beginning, but after the data has been processed, I want to let some additional information pass through the spout based on the patterns found in the data.
In other words, I want to dynamically change, at run time, the information passed on from a spout based on the data processed by the bolts so far.
No. But you can move the filtering logic out of the spout into a new first bolt. The spout just fetches all data and forwards it to the new filter bolt. For bolts it is possible to have cycles in the graph, i.e., you can feed information back to the filter bolt. For example, something like this:
builder.setSpout("spout", ...);
builder.setBolt("filter", ...)
       .localOrShuffleGrouping("spout")  // regular forward connection
       .allGrouping("someLaterBolt");    // cyclic feedback connection
// add more bolts here
builder.setBolt("someLaterBolt", ...).someConnectionPattern("somePreviousBolt");

Dynamic topic in Kafka channel using Flume

Is it possible to have a Kafka channel with a dynamic topic - something like the Kafka sink, where you can specify the topic in a header, or the HDFS sink, where you can use a value from a header?
I know I can multiplex to use multiple channels (with a bunch of channel configurations), but that is undesirable because I'd like to have a single dynamic HDFS sink rather than an HDFS sink for each Kafka channel.
My understanding is that the Flume Kafka channel can only be mapped to a single topic because it is both producing and consuming logs on that particular topic.
Looking at the code in KafkaChannel.java from Flume 1.6.0, I can see that only one topic is ever subscribed to (with one consumer per thread).
