Using Spring Cloud Stream Kafka Streams with Avro input/output with nativeEncoding/decoding=false

We're testing the use of Kafka Streams via Spring Cloud Stream function support with Avro input/output records, but setting nativeEncoding=false and nativeDecoding=false in order to use a custom MessageConverter where we do the Avro conversion.
The default serdes are StringSerde for keys and ByteArraySerde for values.
Everything is OK when we only use a KStream-to-KStream function, for example:
@Bean
public Function<KStream<String, DataRecordAvro>, KStream<String, DataRecordAvro>> wordsCount() {
    return input -> input
            .flatMapValues(value -> Arrays.asList(value.getName().toString().toLowerCase().split("\\W+")))
            .map((key, value) -> new KeyValue<>(value, value))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .windowedBy(TimeWindows.of(Duration.ofSeconds(5)).grace(Duration.ofMillis(0)))
            .count()
            .toStream()
            .map((key, value) -> new KeyValue<>(key.key(), new DataRecordAvro(key.key(), value)));
}
but when we try a slightly more complex example involving an input KTable, like this:
@Bean
public BiFunction<KStream<String, DataRecordAvro>, KTable<String, DataRecordAvro>, KStream<String, DataRecordAvro>> userClicksRegionKTableAvro() {
    return (userClicksStream, usersRegionKTable) -> userClicksStream
            .leftJoin(usersRegionKTable,
                    (clicks, region) -> new RegionWithClicks(region == null ? "UNKNOWN" : region.getName().toString(), clicks.getCount()))
            .map((user, regionWithClicks) -> new KeyValue<>(regionWithClicks.getRegion(), regionWithClicks.getClicks()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
            .reduce(Long::sum)
            .mapValues((key, value) -> new DataRecordAvro(key, value))
            .toStream();
}
(The DataRecordAvro class only has two members: CharSequence name; Long count;)
When the first record is received, this exception is thrown:
ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: java.lang.String, and value: com.xxxx.kstreams.fixtures.avro.DataRecordAvro.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
The processor where the exception is thrown seems to be:
KSTREAM-LEFTJOIN-0000000011:
states: [user-regions-avro-STATE-STORE-0000000008]
We have no idea why it doesn't work in this case. Maybe the leftJoin operation persists information to an internal topic where useNativeEncoding/Decoding=false is not taken into account? But then why does the KStream-to-KStream example above work? We thought the Avro conversion was only done at the start and end of the topology, so why this casting exception when using leftJoin?
Here is another example that works OK (without Avro input records, leaving the consumer useNativeDecoding at its default of true):
@Bean
public BiFunction<KStream<String, Long>, KTable<String, String>, KStream<String, DataRecordAvro>> userClicksRegionKTable() {
    return (userClicksStream, usersRegionKTable) -> userClicksStream
            .leftJoin(usersRegionKTable,
                    (clicks, region) -> new RegionWithClicks(region == null ? "UNKNOWN" : region, clicks))
            .map((user, regionWithClicks) -> new KeyValue<>(regionWithClicks.getRegion(), regionWithClicks.getClicks()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
            .reduce(Long::sum)
            .mapValues((key, value) -> new DataRecordAvro(key, value))
            .toStream();
}
Please help!

For Kafka Streams binder in Spring Cloud Stream, we recommend using native decoding/encoding with Serdes unless you have strong reasoning for relying on the message conversion approach. What is the use case that forces you to go with message converters here? In practice, using message converters for serialization purposes in Kafka Streams applications in Spring Cloud Stream adds an extra layer in your topology and makes it deeper, thus the recommendation to use native decoding/encoding.
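For reference, with native decoding left on (the default in the Kafka Streams binder), an Avro Serde can usually be wired per binding with properties roughly like the following. The binding name, the use of Confluent's SpecificAvroSerde, and the schema registry URL are assumptions for illustration, not something from your setup:
spring.cloud.stream.kafka.streams.bindings.wordsCount-in-0.consumer.valueSerde=io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
spring.cloud.stream.kafka.streams.binder.configuration.schema.registry.url=http://localhost:8081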
As you noted, for KTable, the binder always uses native decoding - at the moment, it is not possible to use message converters there. When you turn off useNativeDecoding on the KTable binding, the binder ignores it and simply uses the default byte serde. I suggest going with the default on the KTable binding and then adding the following bean in your application configuration.
@Bean
public Serde<DataRecordAvro> dataRecordAvroSerde() {
    // return Serde
}
That way the binder will detect this bean, realize that the Serde type matches the type in the function signature, and use it on those inputs.
If you have further issues with this app, feel free to share an MCRE; we can take a further look then.
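For illustration, here is a minimal sketch of such a bean using Confluent's SpecificAvroSerde; the schema registry URL is an assumption:
@Bean
public Serde<DataRecordAvro> dataRecordAvroSerde() {
    // Sketch: Confluent's SpecificAvroSerde is one possible implementation.
    // The schema registry URL is an assumption; 'false' marks this as a value (not key) serde.
    SpecificAvroSerde<DataRecordAvro> serde = new SpecificAvroSerde<>();
    serde.configure(Map.of("schema.registry.url", "http://localhost:8081"), false);
    return serde;
}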

Related

Spring Cloud Stream Kafka Streams inbound KTable predictable internal state-store topic names

We're using Kafka Streams with Spring Cloud Stream functions. We have the typical example application which joins a user-clicks KStream with a user-regions KTable.
We know we can force custom names for internal changelog or repartition topics by using the appropriate methods that accept a name for the materialized store when defining our topology:
@Bean
public BiFunction<KStream<String, Long>, KTable<String, String>, KStream<String, Long>> bifunctionktable() {
    return (userClicksStream, userRegionsTable) -> userClicksStream
            .leftJoin(userRegionsTable,
                    (clicks, region) -> new RegionWithClicks(region == null ? "UNKNOWN" : region, clicks),
                    Joined.with(Serdes.String(), Serdes.Long(), null, "bifunctionktable-leftjoin"))
            .map((user, regionWithClicks) -> new KeyValue<>(regionWithClicks.getRegion(), regionWithClicks.getClicks()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()).withName("bifunctionktable-groupbykey"))
            .reduce((firstClicks, secondClicks) -> firstClicks + secondClicks, Materialized.as("bifunctionktable-reduce"))
            .toStream();
}
But for the input KTable we cannot change its internal state-store topic name; we always get this topic name: myapp-id-user-regions-STATE-STORE-0000000001-changelog
If we were creating our topology entirely by code we would have the builder.table(final String topic, final Materialized<K, V, KeyValueStore<Bytes, byte[]>> materialized) method, but when using functions... Is there any way to customize the internal topic name for the input KTable in this case?
You can add a custom name for the incoming KTable by using the following property:
spring.cloud.stream.kafka.streams.bindings.bifunctionktable-in-1.consumer.materializedAs: <Your-custom-store-name>
This is documented in this section of the reference docs.

Spring Kafka Stream - No type information in headers and no default type provided when using BiFunction

I am trying to join two topics and produce output into a third topic using a BiFunction. I am facing an issue with resolving the type of the incoming message. My left-side message is deserialized successfully, but the right side throws "No type information in headers and no default type provided".
When I step through the code I can see it fails at this line in org.springframework.kafka.support.serializer.JsonDeserializer:
Assert.state(localReader != null, "No headers available and no default type provided");
The messages are produced by a Spring Boot application with the Kafka binder, and it has the properties below.
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
##
spring.kafka.producer.properties.spring.json.type.mapping=type1:com.demo.domain.type2,type1:com.demo.domain.type2
spring.kafka.producer.properties.spring.json.trusted.packages=com.demo.domain
spring.kafka.producer.properties.spring.json.add.type.headers=true
And on the Kafka Streams binder consumer side:
# kafka stream setting
spring.cloud.stream.bindings.joinProcess-in-0.destination=local-stream-process-type1
spring.cloud.stream.bindings.joinProcess-in-1.destination=local-stream-process-type2
spring.cloud.stream.bindings.joinProcess-out-0.destination=local-stream-process-type3
spring.cloud.stream.kafka.streams.binder.functions.joinProcess.applicationId=local-stream-process
spring.cloud.stream.kafka.streams.binder.configuration.default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
spring.cloud.stream.kafka.streams.binder.configuration.default.value.serde=org.springframework.kafka.support.serializer.JsonSerde
spring.kafka.streams.properties.spring.json.trusted.packages=*
spring.kafka.properties.spring.json.type.mapping=type1:com.demo.domain.type2,type1:com.demo.domain.type2
spring.kafka.streams.properties.spring.json.use.type.headers=true
And my BiFunction looks like:
@Configuration
public class StreamsConfig {

    @Bean
    public RecordMessageConverter converter() {
        return new StringJsonMessageConverter();
    }

    @Bean
    public BiFunction<KStream<String, type1>, KStream<String, type2>, KStream<String, type3>> joinProcess() {
        return (type1, type2) ->
                type1.join(type2, joiner(),
                        JoinWindows.of(Duration.ofDays(1)));
    }

    private ValueJoiner<type1, type2, type3> joiner() {
        return (type1, type2) -> new type3("test");
    }
}
I have pretty much gone through all the previous questions and none of them involved BiFunction. The one thing I haven't tried is setting VALUE_TYPE_METHOD.
Update:
I resolved my issue by explicitly providing the Serdes and disabling automatic type conversion.
@Bean
public BiFunction<KStream<String, Type1>, KStream<String, Type2>, KStream<String, Type3>> joinStream() {
    return (type1, type2) ->
            type1.join(type2, myValueJoiner(),
                    JoinWindows.of(Duration.ofMinutes(1)), StreamJoined.with(Serdes.String(), new Type1Serde(), new Type2Serde()));
}
And I disabled automatic deserialization as shown below:
spring.cloud.stream.bindings.joinStream-in-0.consumer.use-native-decoding=false
spring.cloud.stream.bindings.joinStream-in-1.consumer.use-native-decoding=false
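For context, a custom Serde like the Type1Serde referenced above can be a thin wrapper around spring-kafka's JsonSerde; this is a sketch of how it might look, not the actual class from the project:
// Sketch (assumption): Type1Serde implemented on top of spring-kafka's JsonSerde,
// bound to a fixed target type so no type headers are needed on the wire.
public class Type1Serde extends JsonSerde<Type1> {
    public Type1Serde() {
        super(Type1.class);
    }
}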

Spring Cloud Stream Kafka Streams Binder 3.x: No output to the second output topic in case of multiple output bindings

I have the following processor bean method signature:
@Bean
public BiFunction<KStream<String, MyClass>, KStream<String, String>, KStream<String, MyClass>[]> myStream() {
    return (inputStream1, inputStream2) -> {
        inputStream2
                .peek((k, v) -> {
                    log.debug(...);
                });
        return inputStream1
                .mapValues(...)
                .branch((k, v) -> true, (k, v) -> true);
    };
}
The relevant properties:
spring.cloud.stream.function.definition: ...;myStream
spring.cloud.stream.bindings:
  myStream-in-0:
    destination: inp0
  myStream-in-1:
    destination: inp1
  myStream-out-0:
    destination: out0
  myStream-out-1:
    destination: out1
Spring Cloud Kafka Stream version Hoxton.SR4 (spring-cloud-stream-binder-kafka-streams:jar:3.0.4.RELEASE), embedded Kafka version 2.5.0.
I am testing my topology using embedded Kafka:
@RunWith(SpringRunner.class)
@SpringBootTest(
        properties = "spring.cloud.stream.kafka.binder.brokers=${spring.embedded.kafka.brokers}"
)
@EmbeddedKafka(partitions = 1,
        topics = {
                "inp0", "inp1", "out0", "out1"
        },
        brokerPropertiesLocation = "kafka.properties"
)
@Slf4j
public class MyApplicationTests {

    @Test
    public void embeddedKafkaTest() throws IOException, InterruptedException {
        Consumer<String, MyClass> out0Consumer = createConsumer("out0ConsumerGroup");
        Consumer<String, MyClass> out1Consumer = createConsumer("out1ConsumerGroup");
        this.embeddedKafka.consumeFromAnEmbeddedTopic(out0Consumer, "out0");
        this.embeddedKafka.consumeFromAnEmbeddedTopic(out1Consumer, "out1");

        latch = new CountDownLatch(1);
        // ... publish ...
        latch.await(15, TimeUnit.SECONDS);

        ConsumerRecords<String, MyClass> out0 = KafkaTestUtils.getRecords(out0Consumer);
        assertThat(out0.count(), is(greaterThanOrEqualTo(1)));
        ConsumerRecords<String, MyClass> out1 = KafkaTestUtils.getRecords(out1Consumer);
        assertThat(out1.count(), is(greaterThanOrEqualTo(1)));
    }

    private <K, V> Consumer<K, V> createConsumer(String groupName) {
        Map<String, Object> consumerProps = KafkaTestUtils.consumerProps(groupName, "true", this.embeddedKafka);
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return new DefaultKafkaConsumerFactory<K, V>(consumerProps).createConsumer();
    }
}
My tests show that the messages from myStream reach and land in the topic "out0" as expected, but the "out1" topic remains empty and the unit test fails on the second assertion.
I've tried a couple of things, but it looks like the output to the second output topic is simply not being produced (the output to the first output topic is produced fine).
Can you see any mistakes in my setup?
And one more thing: the return statement in the myStream bean method definition shows a compiler warning:
Unchecked generics array creation for varargs parameter
But it looks like that's how the Spring Cloud Kafka Stream 3.x API requires the return type to be defined?
You are passing two predicates to the branch method and both of them always evaluate to true. The first predicate always wins and produces data to the first output binding. For each record, branch stops at the first predicate that evaluates to true. See the javadoc for more details. You should use different predicates (possibly checking certain conditions on the key/value). If the first predicate fails and the second one succeeds for a record, you will see data produced to the second output topic.
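For instance, here is a sketch of the same bean with predicates that actually partition the records; the isPriority() accessor on MyClass is a hypothetical condition, not something from your model:
@Bean
public BiFunction<KStream<String, MyClass>, KStream<String, String>, KStream<String, MyClass>[]> myStream() {
    return (inputStream1, inputStream2) -> inputStream1
            // First matching predicate wins: these records are routed to myStream-out-0.
            // Everything else falls through to the second predicate and goes to myStream-out-1.
            .branch((k, v) -> v.isPriority(),
                    (k, v) -> true);
}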
With respect to that compiler warning, I think you can safely ignore it, as the API will ensure that the predicate objects passed into the branch invocation have the proper type. Since the implementation of the method uses generic varargs, you get that warning. See this thread for details on that compiler warning.

How to create multi output stream from single input stream with Spring Cloud Kafka stream binder?

I am trying to create multiple output streams (depending on different time windows) from a single input stream.
interface AnalyticsBinding {
    String PAGE_VIEWS_IN = "pvin";
    String PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes = "pvcout_last_5_minutes";
    String PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes = "pvcout_last_30_minutes";

    @Input(PAGE_VIEWS_IN)
    KStream<String, PageViewEvent> pageViewsIn();

    @Output(PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes)
    KStream<String, Long> pageViewsCountOutLast5Minutes();

    @Output(PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes)
    KStream<String, Long> pageViewsCountOutLast30Minutes();
}
@StreamListener
@SendTo({ AnalyticsBinding.PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes })
public KStream<String, Long> processPageViewEventForLast5Mintues(
        @Input(AnalyticsBinding.PAGE_VIEWS_IN) KStream<String, PageViewEvent> stream) {
    // aggregate by Duration.ofMinutes(5)
}

@StreamListener
@SendTo({ AnalyticsBinding.PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes })
public KStream<String, Long> processPageViewEventForLast30Mintues(
        @Input(AnalyticsBinding.PAGE_VIEWS_IN) KStream<String, PageViewEvent> stream) {
    // aggregate by Duration.ofMinutes(30)
}
When I start the application, only one stream task works. Is there a way to get both processPageViewEventForLast5Mintues and processPageViewEventForLast30Mintues to work simultaneously?
You are using the same input binding in both processors, and that's why you see only one working. Add another input binding in the binding interface and set its destination to the same topic, then change one of the StreamListener methods to use this new binding name; see the sketch below.
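For example, the binding interface could declare a second input pointed at the same topic; the binding name pvin2 below is illustrative:
interface AnalyticsBinding {
    String PAGE_VIEWS_IN = "pvin";
    String PAGE_VIEWS_IN_2 = "pvin2"; // hypothetical second input binding

    @Input(PAGE_VIEWS_IN)
    KStream<String, PageViewEvent> pageViewsIn();

    @Input(PAGE_VIEWS_IN_2)
    KStream<String, PageViewEvent> pageViewsIn2();

    // ... existing @Output bindings stay unchanged
}
Then set spring.cloud.stream.bindings.pvin2.destination to the same topic as pvin and switch one of the @StreamListener methods to @Input(AnalyticsBinding.PAGE_VIEWS_IN_2).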
With that said, if you are using the latest versions of Spring Cloud Stream, you should consider migrating to the functional model. For example, the following should work.
@Bean
public Function<KStream<String, PageViewEvent>, KStream<String, Long>> processPageViewEventForLast5Mintues() {
    ...
}
and
@Bean
public Function<KStream<String, PageViewEvent>, KStream<String, Long>> processPageViewEventForLast30Mintues() {
    ...
}
The binder automatically creates two distinct input bindings in this case.
You can set destinations on those bindings.
spring.cloud.stream.bindings.processPageViewEventForLast5Mintues-in-0.destination=<your Kafka topic>
spring.cloud.stream.bindings.processPageViewEventForLast30Mintues-in-0.destination=<your Kafka topic>

KTable returns no data in Spring Boot application, however it can be queried

I have a Spring Boot application working with Kafka Streams. I have a KTable with some financial currency quotes which is created like this:
@Bean(name = "indicativeQuotes")
public KTable<String, Quote> quoteKTable(StreamsBuilder streamsBuilder) {
    return streamsBuilder.table(quoteTopicName,
            Materialized.<String, Quote, KeyValueStore<Bytes, byte[]>>as("quoteTable")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(new JsonSerde<>(Quote.class)));
}
I @Autowire this bean in another component and test it with the following code:
@Autowired
private KTable<String, Quote> indicativeQuotes;

@PostConstruct
private void postConstruct() {
    doPrint();
}

public void doPrint() {
    ReadOnlyKeyValueStore<String, Quote> store = streamsBuilderFactoryBean.getKafkaStreams().store("quoteTable", QueryableStoreTypes.keyValueStore());
    store.all().forEachRemaining(keyValue -> log.info("Key: " + keyValue.key + " Value: " + keyValue.value));
    indicativeQuotes.foreach((k, v) -> log.info(k));
}
The code logs correct values when querying through the store, but it outputs nothing in foreach(), as if the table were empty. I have also tried print() and other options; all output nothing, without any exceptions.
I'm starting to think that I can't inject KTable beans like that, but the Spring documentation on Kafka Streams is pretty scarce and I can't find good examples. Any help will be appreciated.
Update.
My use case is that I have a scheduled Quartz job which is supposed to write the current state of the KTable to a Kafka topic when triggered, like the following:
@Override
protected void executeInternal(JobExecutionContext jobExecutionContext) throws JobExecutionException {
    TriggerKey triggerKey = jobExecutionContext.getTrigger().getKey();
    log.info("Job was triggered by: {}", triggerKey.getName());

    indicativeQuotes.filter((key, value) -> key.equals(triggerKey.getName()))
            .mapValues(quoteToCourseFixedMapper)
            .toStream()
            .peek((instrument, course) -> log.info("Sending courses for instrument: {}, {}", instrument, course))
            .to(quoteEventTopicName);
}
But I think this code does not work because it is not part of the processing topology and I cannot just take data from the KTable on demand. I'm a bit puzzled here; of course I can query the data through the store when the event is triggered, but maybe there is a better pattern for such a use case? Basically, I'm interested in whether it's possible to incorporate these triggered job events as part of the processing pipeline.
If you just want to publish the updates to another topic, turn the KTable into a KStream and use the to() function.
KTable ktable = ...;
KStream kstream = ktable.toStream();
kstream.to("topic", Produced.with(keySerde, valueSerde));
The topic will contain the change log of that table.
BUT
Apparently, because of some lifecycle-related concerns, you can't just inject (@Autowired) a KStream/KTable. You should keep your Kafka Streams related code as inline as possible.
So in your specific case, where you want to do something with the current state of the table at some "random" time, you have to query the store (the table). Look into Kafka Streams interactive queries; a sketch follows below. Remember that you need to fetch data from all instances of your application (if you have more than one), or you can use a global store. It's a day or two of research.
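As a sketch of that interactive-query approach applied to the scheduled job above (kafkaTemplate, quoteToCourseFixedMapper and the store name are taken from or assumed based on the question, not verified against the actual project):
@Override
protected void executeInternal(JobExecutionContext jobExecutionContext) {
    String instrument = jobExecutionContext.getTrigger().getKey().getName();
    // Query the materialized store on demand instead of reusing the injected KTable bean.
    ReadOnlyKeyValueStore<String, Quote> store = streamsBuilderFactoryBean.getKafkaStreams()
            .store("quoteTable", QueryableStoreTypes.keyValueStore());
    Quote quote = store.get(instrument);
    if (quote != null) {
        // kafkaTemplate and quoteToCourseFixedMapper are assumptions based on the question's context.
        kafkaTemplate.send(quoteEventTopicName, instrument, quoteToCourseFixedMapper.apply(quote));
    }
}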
