How to test Spring Cloud Stream Kafka binder

This is my code, and I am looking to test it. I have been looking at a few things (TopologyTestDriver, etc.) but could not get any of them to work. This example is taken directly from https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/spring-cloud-stream-binder-kafka.html#_programming_model
Is there a simple example unit test? Do I even need the Kafka infrastructure, or can I test this function directly?
@Bean
public Function<KStream<Object, String>, KStream<?, WordCount>> process() {
    return input -> input
        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
        .map((key, value) -> new KeyValue<>(value, value))
        .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
        .windowedBy(TimeWindows.of(5000))
        .count(Materialized.as("word-counts-state-store"))
        .toStream()
        .map((key, value) -> new KeyValue<>(key.key(), new WordCount(key.key(), value,
            new Date(key.window().start()), new Date(key.window().end()))));
}
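On the question of testing the function directly: the windowing and counting steps still need a running topology (for example through TopologyTestDriver), but the stateless per-record logic can be pulled out into a plain method and unit tested without any Kafka classes. A minimal sketch, assuming a hypothetical helper class WordSplitter that is not part of the original code:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not in the original post): holds the same
// expression that the lambda passes to flatMapValues.
final class WordSplitter {

    private WordSplitter() {
    }

    // Lower-cases the value and splits it on runs of non-word characters.
    static List<String> splitWords(String value) {
        return Arrays.asList(value.toLowerCase().split("\\W+"));
    }

    public static void main(String[] args) {
        // No broker, binder, or Spring context is involved here.
        System.out.println(splitWords("Hello Kafka, hello Streams"));
    }
}
```

The function could then call flatMapValues(WordSplitter::splitWords), which leaves only the grouping, windowing, and counting to be exercised through a topology-level test.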

Related

Spring Kafka Stream - No type information in headers and no default type provided when using BiFunction

I am trying to join two topics and produce the output to a third topic using a BiFunction. I am facing an issue resolving the type of the incoming message. The left-side message is deserialized successfully, but the right side throws "No type information in headers and no default type provided".
When I step through the code, I can see that it fails at this line in org.springframework.kafka.support.serializer.JsonDeserializer:
Assert.state(localReader != null, "No headers available and no default type provided");
The messages are produced by a Spring Boot application with the Kafka binder, which has the properties below.
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
##
spring.kafka.producer.properties.spring.json.type.mapping=type1:com.demo.domain.type2,type1:com.demo.domain.type2
spring.kafka.producer.properties.spring.json.trusted.packages=com.demo.domain
spring.kafka.producer.properties.spring.json.add.type.headers=true
And on the Kafka Streams binder consumer side:
# kafka stream setting
spring.cloud.stream.bindings.joinProcess-in-0.destination=local-stream-process-type1
spring.cloud.stream.bindings.joinProcess-in-1.destination=local-stream-process-type2
spring.cloud.stream.bindings.joinProcess-out-0.destination=local-stream-process-type3
spring.cloud.stream.kafka.streams.binder.functions.joinProcess.applicationId=local-stream-process
spring.cloud.stream.kafka.streams.binder.configuration.default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
spring.cloud.stream.kafka.streams.binder.configuration.default.value.serde=org.springframework.kafka.support.serializer.JsonSerde
spring.kafka.streams.properties.spring.json.trusted.packages=*
spring.kafka.properties.spring.json.type.mapping=type1:com.demo.domain.type2,type1:com.demo.domain.type2
spring.kafka.streams.properties.spring.json.use.type.headers=true
And my BiFunction looks like this:
@Configuration
public class StreamsConfig {
    @Bean
    public RecordMessageConverter converter() {
        return new StringJsonMessageConverter();
    }
    @Bean
    public BiFunction<KStream<String, type1>, KStream<String, type2>, KStream<String, type3>> joinProcess() {
        return (type1, type2) ->
            type1.join(type2, joiner(),
                JoinWindows.of(Duration.ofDays(1)));
    }
    private ValueJoiner<type1, type2, type3> joiner() {
        // A block lambda must return the joined value explicitly.
        return (type1, type2) -> {
            return new type3("test");
        };
    }
}
I have gone through pretty much all the previous questions, and none of them involved a BiFunction. The one thing I haven't tried is setting VALUE_TYPE_METHOD.
###Update###
I resolved my issue by explicitly providing the serdes and disabling automatic type conversion.
@Bean
public BiFunction<KStream<String, Type1>, KStream<String, Type2>, KStream<String, Type3>> joinStream() {
    return (type1, type2) ->
        type1.join(type2, myValueJoiner(),
            JoinWindows.of(Duration.ofMinutes(1)),
            StreamJoined.with(Serdes.String(), new Type1Serde(), new Type2Serde()));
}
And I disabled the automatic deserialization as shown below:
spring.cloud.stream.bindings.joinStream-in-0.consumer.use-native-decoding=false
spring.cloud.stream.bindings.joinStream-in-1.consumer.use-native-decoding=false

Spring Cloud Stream’s Apache Kafka

This is my application, which consumes data from a Kafka topic and sends the computed results to a topic.
@SpringBootApplication
@EnableBinding(KStreamProcessor.class)
public class WordCountProcessorApplication {
    @StreamListener("input")
    @SendTo("output")
    public KStream<?, WordCount> process(KStream<?, String> input) {
        return input
            .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
            .groupBy((key, value) -> value)
            .windowedBy(TimeWindows.of(5000))
            .count(Materialized.as("WordCounts-multi"))
            .toStream()
            .map((key, value) -> new KeyValue<>(null, new WordCount(key.key(), value,
                new Date(key.window().start()), new Date(key.window().end()))));
    }
    public static void main(String[] args) {
        SpringApplication.run(WordCountProcessorApplication.class, args);
    }
}
How can I log a message before each record is consumed from the Kafka topic?
Add a Transformer at the beginning and end of your topology.
See this discussion, where there was a request for the framework to automatically add custom transformers to the topology.
It was decided that the workaround of adding your own is sufficient.

How to create multi output stream from single input stream with Spring Cloud Kafka stream binder?

I am trying to create multiple output streams (depending on different time windows) from a single input stream.
interface AnalyticsBinding {
    String PAGE_VIEWS_IN = "pvin";
    String PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes = "pvcout_last_5_minutes";
    String PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes = "pvcout_last_30_minutes";
    @Input(PAGE_VIEWS_IN)
    KStream<String, PageViewEvent> pageViewsIn();
    @Output(PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes)
    KStream<String, Long> pageViewsCountOutLast5Minutes();
    @Output(PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes)
    KStream<String, Long> pageViewsCountOutLast30Minutes();
}
@StreamListener
@SendTo({ AnalyticsBinding.PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes })
public KStream<String, Long> processPageViewEventForLast5Mintues(
        @Input(AnalyticsBinding.PAGE_VIEWS_IN) KStream<String, PageViewEvent> stream) {
    // aggregate by Duration.ofMinutes(5)
}
@StreamListener
@SendTo({ AnalyticsBinding.PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes })
public KStream<String, Long> processPageViewEventForLast30Mintues(
        @Input(AnalyticsBinding.PAGE_VIEWS_IN) KStream<String, PageViewEvent> stream) {
    // aggregate by Duration.ofMinutes(30)
}
When I start the application, only one stream task works. Is there a way to get both processPageViewEventForLast5Mintues and processPageViewEventForLast30Mintues to work simultaneously?
You are using the same input binding in both processors, and that is why you see only one of them working. Add another input binding to the binding interface and set its destination to the same topic. Also, change one of the StreamListener methods to use this new binding name.
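A rough sketch of the configuration for that first option, assuming the second input binding is named pvin2 (a hypothetical name; it would also need its own @Input method in AnalyticsBinding, and other per-binding settings may be required depending on the binder version):

```properties
# Both input bindings read the same topic; pvin2 is a made-up binding name.
spring.cloud.stream.bindings.pvin.destination=<your Kafka topic>
spring.cloud.stream.bindings.pvin2.destination=<your Kafka topic>
```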
With that said, if you are using a recent version of Spring Cloud Stream, you should consider migrating to the functional model. For example, the following should work.
@Bean
public Function<KStream<String, PageViewEvent>, KStream<String, Long>> processPageViewEventForLast5Mintues() {
    ...
}
and
@Bean
public Function<KStream<String, PageViewEvent>, KStream<String, Long>> processPageViewEventForLast30Mintues() {
    ...
}
The binder automatically creates two distinct input bindings in this case.
You can set destinations on those bindings.
spring.cloud.stream.bindings.processPageViewEventForLast5Mintues-in-0.destination=<your Kafka topic>
spring.cloud.stream.bindings.processPageViewEventForLast30Mintues-in-0.destination=<your Kafka topic>

Delay in processing with Kafka Streams

I have created a Kafka Streams topology with one source and two sinks.
I am using Spring Boot (2.1.9) with Kafka Streams, not Spring Cloud. The Kafka version is 2.3.0.
@Configuration
@EnableKafkaStreams
public class StreamStart {
    @Bean
    public KStream<String, String> process(StreamsBuilder builder) {
        KStream<String, String> inputStream = builder.stream("streamIn", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> upperCaseStream = inputStream.mapValues(value -> value.toUpperCase());
        upperCaseStream.to("outTopic", Produced.with(Serdes.String(), Serdes.String()));
        KTable<String, Long> wordCounts = upperCaseStream
            .flatMapValues(v -> Arrays.asList(v.split(" ")))
            .selectKey((k, v) -> v)
            .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
        wordCounts.toStream().to("wordCountTopic", Produced.with(Serdes.String(), Serdes.Long()));
        return upperCaseStream;
    }
}
The data flows into outTopic instantaneously, whereas each record takes 20-25 seconds to show up in wordCountTopic.
Any suggestions?

Spring Cloud Stream Kafka Consumer Test

I am trying to set up a test as suggested in this GitHub link:
Map<String, Object> senderProps = KafkaTestUtils.producerProps(embeddedKafka);
DefaultKafkaProducerFactory<Integer, String> pf = new DefaultKafkaProducerFactory<>(senderProps);
try {
    KafkaTemplate<Integer, String> template = new KafkaTemplate<>(pf, true);
    template.setDefaultTopic("words");
    template.sendDefault("foobar");
--> ConsumerRecord<String, String> cr = KafkaTestUtils.getSingleRecord(consumer, "output");
    log.debug(cr);
}
finally {
    pf.destroy();
}
where the stream processor is defined as:
@StreamListener
@SendTo("output")
public KStream<?, WordCount> process(@Input("input") KStream<Object, String> input) {
    return input.map((key, value) -> new KeyValue<>(value, new WordCount(value, 10, new Date(), new Date())));
}
The line marked --> never consumes any messages, although to my mind they should be on the topic "output", since the stream processor has @SendTo("output").
I want to be able to test the messages processed by the stream.
You need to consume from the actual topic that your output is bound to.
Do you have spring.cloud.stream.bindings.output.destination configured? If so, that value is the topic you need to consume from. If you don't set it, the destination defaults to the binding name, which is output in this case.
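The rule above (use the configured destination, otherwise fall back to the binding name) can be mirrored in a small test helper so that the test always reads from the right topic. A minimal sketch, assuming a hypothetical BindingDestinations class and that the application's properties are available as a java.util.Properties object; it only illustrates the lookup rule and is not the binder's own code:

```java
import java.util.Properties;

// Hypothetical test helper (not part of Spring Cloud Stream).
final class BindingDestinations {

    private BindingDestinations() {
    }

    // Returns the configured destination for a binding, or the binding
    // name itself when no destination property is set.
    static String resolveDestination(Properties props, String bindingName) {
        String key = "spring.cloud.stream.bindings." + bindingName + ".destination";
        return props.getProperty(key, bindingName);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Nothing configured yet, so this falls back to the binding name.
        System.out.println(resolveDestination(props, "output"));

        // "counts" is a made-up topic name used only for illustration.
        props.setProperty("spring.cloud.stream.bindings.output.destination", "counts");
        System.out.println(resolveDestination(props, "output"));
    }
}
```

The returned name is what would be passed as the topic argument to KafkaTestUtils.getSingleRecord(consumer, ...), and the test consumer has to be subscribed to that same topic.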

Resources