Delay in processing with Kafka Streams - spring-boot

I have created a Kafka Streams topology with one source and two sinks.
I am using Spring Boot (2.1.9) with Kafka Streams, not Spring Cloud. Kafka version is 2.3.0.
@Configuration
@EnableKafkaStreams
public class StreamStart {

    @Bean
    public KStream<String, String> process(StreamsBuilder builder) {
        KStream<String, String> inputStream = builder.stream("streamIn", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> upperCaseStream = inputStream.mapValues(value -> value.toUpperCase());
        upperCaseStream.to("outTopic", Produced.with(Serdes.String(), Serdes.String()));

        KTable<String, Long> wordCounts = upperCaseStream
                .flatMapValues(v -> Arrays.asList(v.split(" ")))
                .selectKey((k, v) -> v)
                .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
                .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
        wordCounts.toStream().to("wordCountTopic", Produced.with(Serdes.String(), Serdes.Long()));

        return upperCaseStream;
    }
}
Data flows into outTopic instantaneously, whereas each record takes 20-25 seconds to appear in wordCountTopic.
Any suggestions?
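This lag is consistent with Kafka Streams' record caching: the aggregation's state store buffers updates and only forwards them downstream on commit, and commit.interval.ms defaults to 30 seconds. A minimal sketch, assuming Spring's default Streams configuration bean is in use (the application id and broker address below are placeholders, not the original values), of settings that surface the counts sooner:

@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
public KafkaStreamsConfiguration kStreamsConfig() {
    Map<String, Object> props = new HashMap<>();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-app");      // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);              // default is 30000 ms
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);          // disable the record cache
    return new KafkaStreamsConfiguration(props);
}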

Related

Meter registration fails on Spring Boot Kafka consumer with Prometheus MeterRegistry

I am investigating a bug report in our application (Spring Boot) regarding the Kafka metric kafka.consumer.fetch.manager.records.consumed.total being missing.
The application has two Kafka consumers, let's call them the query-routing and query-tracking consumers. They are configured via the @KafkaListener annotation, and each consumer has its own instance of ConcurrentKafkaListenerContainerFactory.
The query-routing consumer is configured as:
@Configuration
@EnableKafka
public class QueryRoutingConfiguration {

    @Bean(name = "queryRoutingContainerFactory")
    public ConcurrentKafkaListenerContainerFactory<String, RoutingInfo> kafkaListenerContainerFactory(MeterRegistry meterRegistry) {
        Map<String, Object> consumerConfigs = new HashMap<>();
        // For brevity I removed the configs as they are trivial configs like bootstrap servers and serializers

        DefaultKafkaConsumerFactory<String, RoutingInfo> consumerFactory =
                new DefaultKafkaConsumerFactory<>(consumerConfigs);
        consumerFactory.addListener(new MicrometerConsumerListener<>(meterRegistry));

        ConcurrentKafkaListenerContainerFactory<String, RoutingInfo> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.getContainerProperties().setIdleEventInterval(5000L);
        return factory;
    }
}
And the query-tracking consumer is configured as:
@Configuration
@EnableKafka
public class QueryTrackingConfiguration {

    private static final FixedBackOff NO_ATTEMPTS = new FixedBackOff(Duration.ofSeconds(0).toMillis(), 0L);

    @Bean(name = "queryTrackingContainerFactory")
    public ConcurrentKafkaListenerContainerFactory<String, QueryTrackingMessage> kafkaListenerContainerFactory(MeterRegistry meterRegistry) {
        Map<String, Object> consumerConfigs = new HashMap<>();
        // For brevity I removed the configs as they are trivial configs like bootstrap servers and serializers

        DefaultKafkaConsumerFactory<String, QueryTrackingMessage> consumerFactory =
                new DefaultKafkaConsumerFactory<>(consumerConfigs);
        consumerFactory.addListener(new MicrometerConsumerListener<>(meterRegistry));

        ConcurrentKafkaListenerContainerFactory<String, QueryTrackingMessage> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        factory.setBatchListener(true);

        DefaultErrorHandler deusErrorHandler = new DefaultErrorHandler(NO_ATTEMPTS);
        factory.setCommonErrorHandler(deusErrorHandler);
        return factory;
    }
}
The MeterRegistryConfigurator bean configuration is set as:
@Configuration
public class MeterRegistryConfigurator {

    private static final Logger LOG = LoggerFactory.getLogger(MeterRegistryConfigurator.class);
    private static final String PREFIX = "dps";

    @Bean
    MeterRegistryCustomizer<MeterRegistry> meterRegistryCustomizer() {
        return registry -> registry.config()
                .onMeterAdded(meter -> LOG.info("onMeterAdded: {}", meter.getId().getName()))
                .onMeterRemoved(meter -> LOG.info("onMeterRemoved: {}", meter.getId().getName()))
                .onMeterRegistrationFailed(
                        (id, s) -> LOG.info("onMeterRegistrationFailed - id '{}' value '{}'", id.getName(), s))
                .meterFilter(PrefixMetricFilter.withPrefix(PREFIX))
                .meterFilter(
                        MeterFilter.deny(id ->
                                id.getName().startsWith(PREFIX + ".jvm")
                                        || id.getName().startsWith(PREFIX + ".system")
                                        || id.getName().startsWith(PREFIX + ".process")
                                        || id.getName().startsWith(PREFIX + ".logback")
                                        || id.getName().startsWith(PREFIX + ".tomcat"))
                )
                .meterFilter(MeterFilter.ignoreTags("host", "host.name"))
                .namingConvention(NamingConvention.snakeCase);
    }
}
The @KafkaListener for each consumer is set as:
@KafkaListener(
        id = "query-routing",
        idIsGroup = true,
        topics = "${query-routing.consumer.topic}",
        groupId = "${query-routing.consumer.groupId}",
        containerFactory = "queryRoutingContainerFactory")
public void listenForMessages(ConsumerRecord<String, RoutingInfo> record) {
    // Handle each record ...
}
and
@KafkaListener(
        id = "query-tracking",
        idIsGroup = true,
        topics = "${query-tracking.consumer.topic}",
        groupId = "${query-tracking.consumer.groupId}",
        containerFactory = "queryTrackingContainerFactory")
public void listenForMessages(List<ConsumerRecord<String, QueryTrackingMessage>> consumerRecords, Acknowledgment ack) {
    // Handle each record ...
}
When the application starts up and I go to the actuator/prometheus endpoint, I can see the metric for both consumers:
# HELP dps_kafka_consumer_fetch_manager_records_consumed_total The total number of records consumed
# TYPE dps_kafka_consumer_fetch_manager_records_consumed_total counter
dps_kafka_consumer_fetch_manager_records_consumed_total{client_id="consumer-qf-query-tracking-consumer-1",kafka_version="3.1.2",spring_id="not.managed.by.Spring.consumer-qf-query-tracking-consumer-1",} 7.0
dps_kafka_consumer_fetch_manager_records_consumed_total{client_id="consumer-QF-Routing-f5d0d9f1-e261-407b-954d-5d217211dee0-2",kafka_version="3.1.2",spring_id="not.managed.by.Spring.consumer-QF-Routing-f5d0d9f1-e261-407b-954d-5d217211dee0-2",} 0.0
But a few seconds later there is a new call to io.micrometer.core.instrument.binder.kafka.KafkaMetrics#checkAndBindMetrics, which removes a set of meters (including kafka.consumer.fetch.manager.records.consumed.total):
onMeterRegistrationFailed - dps.kafka.consumer.fetch.manager.records.consumed.total string Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'dps.kafka.consumer.fetch.manager.records.consumed.total' containing tag keys [client_id, kafka_version, spring_id]. The meter you are attempting to register has keys [client_id, kafka_version, spring_id, topic].
Going again to actuator/prometheus will only show the metric for the query-routing consumer:
# HELP deus_dps_persistence_kafka_consumer_fetch_manager_records_consumed_total The total number of records consumed for a topic
# TYPE deus_dps_persistence_kafka_consumer_fetch_manager_records_consumed_total counter
deus_dps_persistence_kafka_consumer_fetch_manager_records_consumed_total{client_id="consumer-QF-Routing-0a739a21-4764-411a-9cc6-0e60293b40b4-2",kafka_version="3.1.2",spring_id="not.managed.by.Spring.consumer-QF-Routing-0a739a21-4764-411a-9cc6-0e60293b40b4-2",theKey="routing",topic="QF_query_routing_v1",} 0.0
As you can see above the metric for the query-tracking consumer is gone.
As the log says, the meter being registered has keys [client_id, kafka_version, spring_id, topic]. The issue is that I cannot find where this metric with a topic key is registered; its registration triggers io.micrometer.core.instrument.binder.kafka.KafkaMetrics#checkAndBindMetrics, which removes the metric for the query-tracking consumer.
I am using:
micrometer-registry-prometheus version 1.9.5
Spring Boot version 2.7.5
Spring Kafka (org.springframework.kafka:spring-kafka)
My question is: why does registration of the metric kafka.consumer.fetch.manager.records.consumed.total fail, causing it to be removed for the query-tracking consumer, and how can I fix it?
I believe this is internal to Micrometer's KafkaMetrics.
Periodically, it checks for new metrics; presumably, the topic-tagged one shows up after the consumer subscribes to the topic.
@Override
public void bindTo(MeterRegistry registry) {
    this.registry = registry;
    commonTags = getCommonTags(registry);
    prepareToBindMetrics(registry);
    checkAndBindMetrics(registry);
    // VVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
    scheduler.scheduleAtFixedRate(() -> checkAndBindMetrics(registry), getRefreshIntervalInMillis(),
            getRefreshIntervalInMillis(), TimeUnit.MILLISECONDS);
}
You should be able to write a filter to exclude the one with fewer tags.
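A minimal sketch of such a filter, assuming the dps prefix has already been applied to the meter name by the time this filter runs; the bean method name is illustrative:

@Bean
MeterRegistryCustomizer<MeterRegistry> dropTopicLessConsumedTotal() {
    return registry -> registry.config()
            .meterFilter(MeterFilter.deny(id ->
                    id.getName().equals("dps.kafka.consumer.fetch.manager.records.consumed.total")
                            && id.getTag("topic") == null));   // keep only the topic-tagged meter
}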

Spring Kafka Stream - No type information in headers and no default type provided when using BiFunction

I am trying to join two topics and produce output into a third topic using a BiFunction. I am facing an issue resolving the type of the incoming message: the left-side message is deserialized successfully, but the right side throws "No type information in headers and no default type provided".
When I step through the code I can see it fails at this line in org.springframework.kafka.support.serializer.JsonDeserializer:
Assert.state(localReader != null, "No headers available and no default type provided");
The messages are produced by Spring Boot with the Kafka binder, with the properties below.
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
##
spring.kafka.producer.properties.spring.json.type.mapping=type1:com.demo.domain.type2,type1:com.demo.domain.type2
spring.kafka.producer.properties.spring.json.trusted.packages=com.demo.domain
spring.kafka.producer.properties.spring.json.add.type.headers=true
And on the Kafka Streams binder consumer side:
# kafka stream setting
spring.cloud.stream.bindings.joinProcess-in-0.destination=local-stream-process-type1
spring.cloud.stream.bindings.joinProcess-in-1.destination=local-stream-process-type2
spring.cloud.stream.bindings.joinProcess-out-0.destination=local-stream-process-type3
spring.cloud.stream.kafka.streams.binder.functions.joinProcess.applicationId=local-stream-process
spring.cloud.stream.kafka.streams.binder.configuration.default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
spring.cloud.stream.kafka.streams.binder.configuration.default.value.serde=org.springframework.kafka.support.serializer.JsonSerde
spring.kafka.streams.properties.spring.json.trusted.packages=*
spring.kafka.properties.spring.json.type.mapping=type1:com.demo.domain.type2,type1:com.demo.domain.type2
spring.kafka.streams.properties.spring.json.use.type.headers=true
And my BiFunction looks like:
@Configuration
public class StreamsConfig {

    @Bean
    public RecordMessageConverter converter() {
        return new StringJsonMessageConverter();
    }

    @Bean
    public BiFunction<KStream<String, type1>, KStream<String, type2>, KStream<String, type3>> joinProcess() {
        return (type1, type2) ->
                type1.join(type2, joiner(),
                        JoinWindows.of(Duration.ofDays(1)));
    }

    private ValueJoiner<type1, type2, type3> joiner() {
        return (type1, type2) -> new type3("test");
    }
}
I have pretty much gone through all the previous questions and none of them involved a BiFunction. The one thing I haven't tried is setting VALUE_TYPE_METHOD.
Update:
I resolved my issue by explicitly providing the Serdes and disabling automatic type conversion.
@Bean
public BiFunction<KStream<String, Type1>, KStream<String, Type2>, KStream<String, Type3>> joinStream() {
    return (type1, type2) ->
            type1.join(type2, myValueJoiner(),
                    JoinWindows.of(Duration.ofMinutes(1)), StreamJoined.with(Serdes.String(), new Type1Serde(), new Type2Serde()));
}
And I disabled automatic deserialization as below:
spring.cloud.stream.bindings.joinStream-in-0.consumer.use-native-decoding=false
spring.cloud.stream.bindings.joinStream-in-1.consumer.use-native-decoding=false

How to create multiple output streams from a single input stream with the Spring Cloud Kafka Streams binder?

I am trying to create multiple output streams (depending on different time windows) from a single input stream.
interface AnalyticsBinding {

    String PAGE_VIEWS_IN = "pvin";
    String PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes = "pvcout_last_5_minutes";
    String PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes = "pvcout_last_30_minutes";

    @Input(PAGE_VIEWS_IN)
    KStream<String, PageViewEvent> pageViewsIn();

    @Output(PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes)
    KStream<String, Long> pageViewsCountOutLast5Minutes();

    @Output(PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes)
    KStream<String, Long> pageViewsCountOutLast30Minutes();
}

@StreamListener
@SendTo({ AnalyticsBinding.PAGE_VIEWS_COUNTS_OUT_Last_5_Minutes })
public KStream<String, Long> processPageViewEventForLast5Mintues(
        @Input(AnalyticsBinding.PAGE_VIEWS_IN) KStream<String, PageViewEvent> stream) {
    // aggregate by Duration.ofMinutes(5)
}

@StreamListener
@SendTo({ AnalyticsBinding.PAGE_VIEWS_COUNTS_OUT_Last_30_Minutes })
public KStream<String, Long> processPageViewEventForLast30Mintues(
        @Input(AnalyticsBinding.PAGE_VIEWS_IN) KStream<String, PageViewEvent> stream) {
    // aggregate by Duration.ofMinutes(30)
}
When I start the application, only one stream task works. Is there a way to get both processPageViewEventForLast5Mintues and processPageViewEventForLast30Mintues to work simultaneously?
You are using the same input binding in both processors, and that is why you see only one of them working. Add another input binding in the binding interface and set its destination to the same topic, then change one of the StreamListener methods to use the new binding name; a sketch follows below.
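A minimal sketch of that approach; the second binding name (pvin2) and its constant are illustrative, not from the original code:

// added to AnalyticsBinding: a second input binding over the same topic
String PAGE_VIEWS_IN_2 = "pvin2";

@Input(PAGE_VIEWS_IN_2)
KStream<String, PageViewEvent> pageViewsIn2();

Then point its destination at the same topic and have one of the StreamListener methods consume @Input(AnalyticsBinding.PAGE_VIEWS_IN_2) instead:

spring.cloud.stream.bindings.pvin2.destination=<your page-views topic>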
That said, if you are using a recent version of Spring Cloud Stream, you should consider migrating to the functional model. For example, the following should work.
@Bean
public Function<KStream<String, PageViewEvent>, KStream<String, Long>> processPageViewEventForLast5Mintues() {
    ...
}
and
@Bean
public Function<KStream<String, PageViewEvent>, KStream<String, Long>> processPageViewEventForLast30Mintues() {
    ...
}
The binder automatically creates two distinct input bindings in this case.
You can set destinations on those bindings.
spring.cloud.stream.bindings.processPageViewEventForLast5Mintues-in-0.destination=<your Kafka topic>
spring.cloud.stream.bindings.processPageViewEventForLast30Mintues-in-0.destination=<your Kafka topic>

Spring Cloud Stream Kafka Consumer Test

I am trying to set up a test as suggested here on GitHub (link):
Map<String, Object> senderProps = KafkaTestUtils.producerProps(embeddedKafka);
DefaultKafkaProducerFactory<Integer, String> pf = new DefaultKafkaProducerFactory<>(senderProps);
try {
    KafkaTemplate<Integer, String> template = new KafkaTemplate<>(pf, true);
    template.setDefaultTopic("words");
    template.sendDefault("foobar");
--> ConsumerRecord<String, String> cr = KafkaTestUtils.getSingleRecord(consumer, "output");
    log.debug(cr);
}
finally {
    pf.destroy();
}
Where the StreamProcessor is set to:
@StreamListener
@SendTo("output")
public KStream<?, WordCount> process(@Input("input") KStream<Object, String> input) {
    return input.map((key, value) -> new KeyValue<>(value, new WordCount(value, 10, new Date(), new Date())));
}
The line marked --> never consumes messages, which to my mind should be on topic "output", given that the stream processor has @SendTo("output").
I want to be able to test the stream-processed messages.
You need to consume from the actual topic that your output is bound to.
Do you have a configuration for spring.cloud.stream.bindings.output.destination? That should be the value that you need to use. If you don't set that, the default will be the same as the binding - output in this case.
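A minimal sketch of consuming that destination in the test, assuming the binding is left at its default so the destination topic is also named output (the group id is illustrative):

Map<String, Object> consumerProps = KafkaTestUtils.consumerProps("testGroup", "false", embeddedKafka);
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class); // keys on the output topic are the words
Consumer<String, String> consumer =
        new DefaultKafkaConsumerFactory<String, String>(consumerProps).createConsumer();
embeddedKafka.consumeFromAnEmbeddedTopic(consumer, "output");
ConsumerRecord<String, String> cr = KafkaTestUtils.getSingleRecord(consumer, "output");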

KStream with Testbinder - Spring Cloud Stream Kafka

I recently started looking into Spring Cloud Stream for Kafka, and have struggled to make the TestBinder work with KStreams. Is this a known limitation, or have I just overlooked something?
This works fine:
String processor:
@StreamListener(TopicBinding.INPUT)
@SendTo(TopicBinding.OUTPUT)
public String process(String message) {
    return message + " world";
}
String test:
@Test
@SuppressWarnings("unchecked")
public void testString() {
    Message<String> message = new GenericMessage<>("Hello");
    topicBinding.input().send(message);

    Message<String> received = (Message<String>) messageCollector.forChannel(topicBinding.output()).poll();
    assertThat(received.getPayload(), equalTo("Hello world"));
}
But when I try to use KStream in my processor, I can't get the TestBinder to work.
KStream processor:
@SendTo(TopicBinding.OUTPUT)
public KStream<String, String> process(
        @Input(TopicBinding.INPUT) KStream<String, String> events) {
    return events.mapValues((value) -> value + " world");
}
KStream test:
@Test
@SuppressWarnings("unchecked")
public void testKstream() {
    Message<String> message = MessageBuilder
            .withPayload("Hello")
            .setHeader(KafkaHeaders.TOPIC, "event.sirism.dev".getBytes())
            .setHeader(KafkaHeaders.MESSAGE_KEY, "Test".getBytes())
            .build();
    topicBinding.input().send(message);

    Message<String> received = (Message<String>)
            messageCollector.forChannel(topicBinding.output()).poll();
    assertThat(received.getPayload(), equalTo("Hello world"));
}
As you might have noticed, I omitted @StreamListener from the KStream processor; without it, the test binder doesn't seem to find the handler (but with it, the application fails at startup).
Is this a known bug / limitation, or am I just doing something stupid here?
The test binder is only for MessageChannel-based binders (subclasses of AbstractMessageChannelBinder). The KStreamBinder does not use MessageChannels.
You can test using the real binder and an embedded Kafka broker, provided by the spring-kafka-test module; a sketch follows below.
Also see this issue.
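A minimal sketch of that setup; the test class name and the output topic name are illustrative, and the binder must be pointed at the embedded broker (e.g. spring.cloud.stream.kafka.streams.binder.brokers=${spring.embedded.kafka.brokers}):

@SpringBootTest
@EmbeddedKafka(topics = { "event.sirism.dev", "event.sirism.dev.out" })
public class KStreamProcessorTest {

    @Autowired
    private EmbeddedKafkaBroker embeddedKafka;

    @Test
    public void processesRecords() {
        // produce to the real input topic
        DefaultKafkaProducerFactory<Integer, String> pf =
                new DefaultKafkaProducerFactory<>(KafkaTestUtils.producerProps(embeddedKafka));
        KafkaTemplate<Integer, String> template = new KafkaTemplate<>(pf, true);
        template.send("event.sirism.dev", "Hello");

        // consume from the real output topic and assert on the processed value
        Map<String, Object> consumerProps = KafkaTestUtils.consumerProps("testGroup", "false", embeddedKafka);
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        Consumer<Integer, String> consumer =
                new DefaultKafkaConsumerFactory<Integer, String>(consumerProps).createConsumer();
        embeddedKafka.consumeFromAnEmbeddedTopic(consumer, "event.sirism.dev.out");
        assertThat(KafkaTestUtils.getSingleRecord(consumer, "event.sirism.dev.out").value(), equalTo("Hello world"));
    }
}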
