I'm trying to count events from a KStream over a time window:
KStream<String, VehicleEventTO> stream = builder.stream("vehicle", Consumed.with(Serdes.String(), new JsonSerde<>(VehicleEventTO.class)));
KStream<String, VehicleEventTO> streamWithKey = stream.selectKey((key, value) -> value.getId_vehicle().toString());
KStream<String, Long> streamValueKey = streamWithKey.map((key, value) -> KeyValue.pair(key, value.getId_vehicle()));
streamValueKey.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofMinutes(10).toMillis()))
.count(Materialized.with(Serdes.String(), new JsonSerde<>(Long.class)));
I get this exception:
Exception in thread "test-app-87ce164d-c427-4dcf-aa76-aeeb6f8fc943-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=vehicle, partition=0, offset=160385
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:318)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
    at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409)
    at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:964)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.ByteArraySerializer / value: org.apache.kafka.common.serialization.ByteArraySerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: java.lang.Long). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
groupByKey() makes use of the default serializers. Its Javadoc says:
groupByKey()
Group the records by their current key into a KGroupedStream while preserving the original values and default serializers and deserializers.
You either have to use groupByKey(Serialized<K,V> serialized) or groupByKey(Grouped<K,V> grouped).
The following should do the trick:
streamValueKey.groupByKey(Serialized.with(Serdes.String(), Serdes.Long()))
.windowedBy(TimeWindows.of(Duration.ofMinutes(10).toMillis()))
.count(Materialized.with(Serdes.String(), new JsonSerde<>(Long.class)));
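If you are on Kafka 2.1 or newer, where Serialized is deprecated in favor of Grouped, the equivalent call is:
streamValueKey.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
.windowedBy(TimeWindows.of(Duration.ofMinutes(10).toMillis()))
.count(Materialized.with(Serdes.String(), new JsonSerde<>(Long.class)));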
I'm using the org.springframework.cloud:spring-cloud-stream-binder-kafka library and I'm having trouble with partitioning messages in a topic. My topic has 4 partitions, but I'm only seeing events in partition 0, i.e. the publisher is not partitioning the events correctly.
When I check the topic partition that does have messages in it, I can see that each message has a proper value in the key field, but the key doesn't seem to be used for partitioning, which has me confused.
I followed the official partitioning example and have the following code:
Producer code
@Component
class FooEventPublisher {
    private val logger = LoggerFactory.getLogger(this::class.java)

    private val mapper = jacksonObjectMapper()
        .findAndRegisterModules()
        .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)

    private val ingressChannel = Channel<FooEvent>(capacity = Channel.UNLIMITED)

    /** other component will call this to pipe in events to be published */
    suspend fun send(event: FooEvent) = ingressChannel.send(event)

    /** helper function to convert [FooEvent] into a [Message] with a JSON payload */
    private fun FooEvent.toMessage(): Message<ByteArray> {
        val payload = mapper.writeValueAsBytes(this)
        val partitionKey = this.name
        val message = MessageBuilder
            .withPayload(payload)
            .setHeader(KafkaHeaders.MESSAGE_KEY, partitionKey.toByteArray())
            .setHeader("partitionKey", partitionKey.toByteArray())
            .build()
        return message
    }

    @Bean
    fun publishFooEvents(): () -> Flux<Message<ByteArray>> = {
        ingressChannel
            .consumeAsFlow()
            .map {
                try {
                    it.toMessage()
                } catch (err: Exception) {
                    logger.error("Skipping event because of encoding failure", err)
                    logger.trace("problematic event=$it")
                    null
                }
            }
            .filterNotNull()
            .asFlux()
    }
}
Relevant Spring Configuration
spring:
  cloud:
    function:
      definition: publishFooEvents
    stream:
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        publishFooEvents-out-0:
          destination: kf-foo-events-topic
          producer:
            partition-key-expression: headers['partitionKey']
I expected the Kafka binder library to use the partitionKey header as the value to partition on, e.g. all messages with key 1234 would go to partition 1 and all messages with key 5678 would go to partition 2.
I'm not sure what I'm missing here. Why isn't the binder detecting that the target topic has 4 partitions and using that information to partition?
edit: fixed key in example above
Partitioning at the binder level is not intended for infrastructure that supports partitioning natively, such as Kafka. Just use native Kafka partitioning instead (which by default will be based on the key).
Furthermore, you are setting the header to a byte[]; it should remain as String so that the hash algorithm uses the value; the hash code of byte[] depends on its system identity, not the array contents.
e.g. all messages with key 1234 would go to partition 1 and messages with key 1234 would go to partition 2
That makes no sense, I presume you meant to specify different keys.
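For completeness, here is roughly what the configuration can look like once binder-level partitioning is dropped: the partition-key-expression is removed, and the record key set via the KafkaHeaders.MESSAGE_KEY header drives Kafka's default partitioner. This is only a sketch, assuming the rest of the original configuration stays unchanged:
spring:
  cloud:
    function:
      definition: publishFooEvents
    stream:
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        publishFooEvents-out-0:
          destination: kf-foo-events-topic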
I have a consumer defined as below. It reads an Avro message from a topic and builds a state store of aggregated data, which is also an Avro type.
@Bean
public Consumer<KStream<String, InputEvent>> avroTest() {
    Serde<OutputEvent> serdeOutEvent = new SpecificAvroSerde<>(schemaRegistryClient);
    return st -> st.groupByKey()
            .aggregate(OutputEvent::new, (key, currentEvent, outputEvent) -> {
                // aggregate here
                return outputEvent;
            }, Materialized.with(new Serdes.StringSerde(), serdeOutEvent))
            .toStream();
}
The function is able to read messages from the topic and create the first aggregated result, but when it tries to store it in the state store, it receives a 404 because the schema is not present.
Exception in thread "odoAvroTest-e4ef8e3e-ea1e-458c-b309-b2afefbeacec-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=odometer, partition=0, offset=0, stacktrace=org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema: {"type":"record","name": "" .... }
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:226)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:252)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:319)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:307)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getIdFromRegistry(CachedSchemaRegistryClient.java:165)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getId(CachedSchemaRegistryClient.java:297)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:73)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53)
at io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer.serialize(SpecificAvroSerializer.java:65)
at io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer.serialize(SpecificAvroSerializer.java:38)
at org.apache.kafka.streams.state.internals.ValueAndTimestampSerializer.serialize(ValueAndTimestampSerializer.java:59)
at org.apache.kafka.streams.state.internals.ValueAndTimestampSerializer.serialize(ValueAndTimestampSerializer.java:50)
at org.apache.kafka.streams.state.internals.ValueAndTimestampSerializer.serialize(ValueAndTimestampSerializer.java:27)
at org.apache.kafka.streams.state.StateSerdes.rawValue(StateSerdes.java:192)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.put(MeteredKeyValueStore.java:166)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl$KeyValueStoreReadWriteDecorator.put(ProcessorContextImpl.java:486)
at org.apache.kafka.streams.kstream.internals.KStreamAggregate$KStreamAggregateProcessor.process(KStreamAggregate.java:103)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:133)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:363)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:199)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:425)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:912)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:819)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:788)
Do let me know if there are additional config tweaks necessary to make this work. When I change the input to a HashMap and/or a simple POJO and use a JSON Serde, the code works and creates the aggregation.
The issue here is the Schema Registry needed by the Avro Serde. When you set the value Serde in Materialized.with(), you have to pass the Schema Registry config to that Serde.
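For example, a minimal sketch of configuring the Serde before handing it to Materialized.with(); the registry URL is only a placeholder, and the boolean flag tells the Serde whether it is used for keys or values:
Serde<OutputEvent> serdeOutEvent = new SpecificAvroSerde<>(schemaRegistryClient);
// Pass the Schema Registry settings to the serde; "schema.registry.url" is the
// standard Confluent property name, and the URL below is just a placeholder.
serdeOutEvent.configure(
        Collections.singletonMap("schema.registry.url", "http://localhost:8081"),
        false); // false = value serde, true = key serde
After that, the serde can be used in Materialized.with(new Serdes.StringSerde(), serdeOutEvent) as before.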
I have created a Serde for consuming from Kafka as follows:
import org.apache.kafka.connect.json.JsonDeserializer;
import org.apache.kafka.connect.json.JsonSerializer;
final Deserializer<JsonNode> jsonDeserializer = new JsonDeserializer();
final Serializer<JsonNode> jsonSerializer = new JsonSerializer();
final Serde<JsonNode> jsonNodeSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, JsonNode> eventStream = builder
        .stream("my-test-1",
                Consumed.with(Serdes.String(), jsonNodeSerde));
but I still receive a serialization error:
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: org.apache.kafka.common.serialization.ByteArraySerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: com.fasterxml.jackson.databind.node.ObjectNode). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
Since Consumed.with() is already provided, why is the default serde still used? According to the answer here, this should work, shouldn't it?
https://stackoverflow.com/a/48832957/3952994
Yes, the problem is that your data doesn't match the serdes.
A serializer (key: org.apache.kafka.common.serialization.StringSerializer /
value: org.apache.kafka.common.serialization.ByteArraySerializer)
is not compatible to the actual key or value type
(key type: java.lang.String /
value type: com.fasterxml.jackson.databind.node.ObjectNode).
However, the error message says the problem is caused when data is serialized, i.e. when Kafka Streams attempts to write the data somewhere.
Your code snippet with Consumed, however, is about deserializing and thus reading data. Therefore it seems that the problem is not caused by the code snippet you shared in your question, but by code that is presumably further down in your Java file, which is not shown in your question. (Btw, it would have helped if you had provided the full stack trace of the error.)
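For instance, if further down the topology the stream is written back to a topic (or grouped and aggregated), the JsonNode serde has to be supplied on that write path as well. A sketch with a hypothetical output topic name:
// Hypothetical sink: without Produced.with(...) the default value serde
// (here ByteArraySerializer) is used, which fails on JsonNode/ObjectNode values.
eventStream.to("my-test-1-output", Produced.with(Serdes.String(), jsonNodeSerde));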
I want to use the Spring Cloud Stream API to aggregate events from a topic.
Therefore I use a KStream as input:
KStream<Object, LoggerCreatedMessage>
Now I want to use an aggregator to store my new object in a key-value store, so I use the following code:
input
    .map((key, value) -> {
        return new KeyValue<>(value.logger_id, value);
    })
    /*.groupBy(
        (s, loggerEvent) -> loggerEvent.logger_id,
        Serialized.with(null, loggerEventSerde))*/
    .groupByKey()
    .aggregate(
        String::new,
        (s, loggerEvent, vr) -> {
            return vr;
        },
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(STORE_NAME)
            .withKeySerde(Serdes.String())
            .withValueSerde(Serdes.String())
    );
Why can I only use a String as the Initializer? Is it not possible to use any object?
Instead of String::new I wanted to use LoggerDomain::new, but I only get this error message:
Bad return type in method reference: cannot convert LoggerDomain to VR
Am I missing something?
You define <key, value> as <String, String> via Materialized.<String, String, KeyValueStore<Bytes, byte[]>> -- the aggregate's value type VR is inferred from this, so the Initializer must return that same type, which is why LoggerDomain::new is rejected. If your value type should be LoggerDomain, it needs to be Materialized.<KeyType, LoggerDomain, KeyValueStore<Bytes, byte[]>>as(STORE_NAME).
Note that you also need to provide a custom Serde for LoggerDomain to Materialized in this case.
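Putting both hints together, a sketch of the aggregate call for LoggerDomain; the String key and the use of Spring Kafka's JsonSerde are assumptions, so substitute whatever key type and Serde match your setup:
final Serde<LoggerDomain> loggerDomainSerde = new JsonSerde<>(LoggerDomain.class);

.aggregate(
    LoggerDomain::new,
    (key, loggerEvent, loggerDomain) -> {
        // merge loggerEvent into loggerDomain here
        return loggerDomain;
    },
    Materialized.<String, LoggerDomain, KeyValueStore<Bytes, byte[]>>as(STORE_NAME)
        .withKeySerde(Serdes.String())
        .withValueSerde(loggerDomainSerde)
);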
I'm encountering a really weird problem when trying to use the @SqsListener annotation from the Spring Cloud module.
Here's my listener method:
@SqsListener(value = "myproject-dev-au-error-queue")
public void listenPhoenix(String message) throws IOException {
    logger.info(message);
}
However, once I run the project, it starts reading messages from the queue and fails with the following error:
Exception in thread "simpleMessageListenerContainer-4" Exception in thread "simpleMessageListenerContainer-6" Exception in thread "simpleMessageListenerContainer-9" Exception in thread "simpleMessageListenerContainer-10" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1931)
at org.springframework.cloud.aws.messaging.core.QueueMessageUtils.getNumberValue(QueueMessageUtils.java:93)
at org.springframework.cloud.aws.messaging.core.QueueMessageUtils.getMessageAttributesAsMessageHeaders(QueueMessageUtils.java:80)
at org.springframework.cloud.aws.messaging.core.QueueMessageUtils.createMessage(QueueMessageUtils.java:56)
at org.springframework.cloud.aws.messaging.listener.SimpleMessageListenerContainer$MessageExecutor.getMessageForExecution(SimpleMessageListenerContainer.java:375)
at org.springframework.cloud.aws.messaging.listener.SimpleMessageListenerContainer$MessageExecutor.run(SimpleMessageListenerContainer.java:336)
at org.springframework.cloud.aws.messaging.listener.SimpleMessageListenerContainer$SignalExecutingRunnable.run(SimpleMessageListenerContainer.java:392)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The problematic part is the numberType variable assignment in the QueueMessageUtils class of the spring-cloud-aws-messaging module:
private static Object getNumberValue(MessageAttributeValue value) {
    String numberType = value.getDataType().substring("Number".length() + 1);
    try {
        Class<? extends Number> numberTypeClass = Class.forName(numberType).asSubclass(Number.class);
        return NumberUtils.parseNumber(value.getStringValue(), numberTypeClass);
    } catch (ClassNotFoundException var3) {
        throw new MessagingException(String.format("Message attribute with value '%s' and data type '%s' could not be converted into a Number because target class was not found.", value.getStringValue(), value.getDataType()), var3);
    }
}
Has anyone seen this before, and if so, is there a way to fix it?
P.S.: Since I don't really care about the message attributes, I wouldn't mind if they were completely ignored.
Thanks in advance.
The exception in the given code is thrown from the following line.
String numberType = value.getDataType().substring("Number".length() + 1);
The documentation in the AWS Java SDK explains how the getDataType() function works (see this link). According to the documentation, it will return one of the following values:
String
Number
Binary
Amazon SQS supports the following logical data types: String, Number,
and Binary. For the Number data type, you must use StringValue.
Now, when you call value.getDataType(), it will return one of the above values. Assuming it's "Number", you are trying to take a substring of it starting from index 7 (since "Number".length() + 1 = 7).
But there is no such index in the String returned by value.getDataType(), so it throws a java.lang.StringIndexOutOfBoundsException.
As a solution, you can simply use the following instead of taking the substring:
String numberType = value.getDataType();
I also faced the same issue. When publishing a message with the SQS Extended Client Library, it automatically adds an attribute named SQSLargePayloadSize with the data type Number along with the message, which is what causes the problem. The exception was resolved by updating the dependency from
implementation group: 'org.springframework.cloud', name: 'spring-cloud-aws-messaging', version: '2.2.6.RELEASE'
to a recent version of the awspring spring-cloud-aws-messaging artifact:
implementation group: 'io.awspring.cloud', name: 'spring-cloud-aws-messaging', version: '2.4.1'
Make sure all the packages are imported from io.awspring.cloud.