Spring Cloud Stream Kafka Binder incorrect partitioning

I'm using the org.springframework.cloud:spring-cloud-stream-binder-kafka library and I'm having trouble with partitioning messages in a topic. My topic has 4 partitions, but I'm only seeing events in partition 0, i.e. the publisher is not partitioning the events correctly.
When I check the topic partition that has messages in it, I do see that each message has a proper value in the key field (but it doesn't seem to be used? I'm a little confused).
I followed the official partitioning example and have the following code:
Producer code
@Component
class FooEventPublisher {
    private val logger = LoggerFactory.getLogger(this::class.java)

    private val mapper = jacksonObjectMapper()
        .findAndRegisterModules()
        .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)

    private val ingressChannel = Channel<FooEvent>(capacity = Channel.UNLIMITED)

    /** other components will call this to pipe in events to be published */
    suspend fun send(event: FooEvent) = ingressChannel.send(event)

    /** helper function to convert [FooEvent] into a [Message] with a JSON payload */
    private fun FooEvent.toMessage(): Message<ByteArray> {
        val payload = mapper.writeValueAsBytes(this)
        val partitionKey = this.name
        val message = MessageBuilder
            .withPayload(payload)
            .setHeader(KafkaHeaders.MESSAGE_KEY, partitionKey.toByteArray())
            .setHeader("partitionKey", partitionKey.toByteArray())
            .build()
        return message
    }

    @Bean
    fun publishFooEvents(): () -> Flux<Message<ByteArray>> = {
        ingressChannel
            .consumeAsFlow()
            .map {
                try {
                    it.toMessage()
                } catch (err: Exception) {
                    logger.error("Skipping event because of encoding failure", err)
                    logger.trace("problematic event=$it")
                    null
                }
            }
            .filterNotNull()
            .asFlux()
    }
}
Relevant Spring Configuration
spring:
  cloud:
    function:
      definition: publishFooEvents
    stream:
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        publishFooEvents-out-0:
          destination: kf-foo-events-topic
          producer:
            partition-key-expression: headers['partitionKey']
I expected the Kafka binder library to use the partitionKey header as the value to partition on, e.g. all messages with key 1234 would go to partition 1 and messages with key 5678 would go to partition 2.
I'm not sure what I'm missing here. Why isn't the binder detecting that the target topic has 4 partitions and using that information to partition?
edit: fixed key in example above

Partitioning at the binder level is not intended for infrastructure that supports partitioning natively, such as Kafka. Just use native Kafka partitioning instead (which, by default, will be based on the key).
Furthermore, you are setting the header to a byte[]; it should remain a String so that the hash algorithm uses the value; the hash code of a byte[] is based on its system identity, not the array contents.
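In practice that suggests two fixes. If you keep the binder's partition-key-expression, set the partitionKey header as a plain String (partitionKey rather than partitionKey.toByteArray()) so the hash uses the value. Otherwise, drop the expression entirely and let Kafka partition on the record key set via KafkaHeaders.MESSAGE_KEY. A sketch of the trimmed configuration from the question (same names, only the producer block removed):
spring:
  cloud:
    function:
      definition: publishFooEvents
    stream:
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        publishFooEvents-out-0:
          destination: kf-foo-events-topic
          # no partition-key-expression: Kafka's default partitioner hashes
          # the record key (murmur2 over the serialized key bytes), so the
          # byte[] key set via KafkaHeaders.MESSAGE_KEY partitions correctly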
e.g. all messages with key 1234 would go to partition 1 and messages with key 1234 would go to partition 2
That makes no sense, I presume you meant to specify different keys.

Related

@KafkaListener per specific header value

I have a @KafkaListener:
@KafkaListener(topicPattern = "SameTopic")
public void onMessage(Message<String> message, Acknowledgment acknowledgment) {
    String eventType = new String((byte[]) message.getHeaders().get("Event-Type"), StandardCharsets.UTF_8);
    switch (eventType) {
        case "create" -> doCreate(message);
        case "update" -> doUpdate(message);
        case "delete" -> doDelete(message);
    }
}
The producer sets a custom header Event-Type with three possible values: create, update, delete. Currently I'm reading this header value from the Message and then invoking the rest of the logic according to the header value.
Is there any way to create three @KafkaListeners where each of them consumes messages filtered by some criteria, in my case filtered by the Event-Type header value?
@KafkaListener(topicPattern = "SameTopic", ...)
public void onCreate(Message<String> message, Acknowledgment acknowledgment) {
    doCreate(message);
}

@KafkaListener(topicPattern = "SameTopic", ...)
public void onUpdate(Message<String> message, Acknowledgment acknowledgment) {
    doUpdate(message);
}

@KafkaListener(topicPattern = "SameTopic", ...)
public void onDelete(Message<String> message, Acknowledgment acknowledgment) {
    doDelete(message);
}
I'm aware of RecordFilterStrategy, but couldn't get any help from it.
Consider having those event types mapped to specific partitions on the topic.
That way you definitely can have a different @KafkaListener with a specific partition assigned:
/**
 * The topicPartitions for this listener when using manual topic/partition
 * assignment.
 * <p>
 * Mutually exclusive with {@link #topicPattern()} and {@link #topics()}.
 * @return the topic names or expressions (SpEL) to listen to.
 */
TopicPartition[] topicPartitions() default {};
The doc is here: https://docs.spring.io/spring-kafka/docs/current/reference/html/#manual-assignment
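For example, a minimal sketch of one such listener using Spring Kafka's @TopicPartition for manual assignment (the partition-per-type mapping here, create = partition 0, is a hypothetical convention your producer would have to honor):
@KafkaListener(id = "createListener",
        topicPartitions = @TopicPartition(topic = "SameTopic", partitions = "0"))
public void onCreate(Message<String> message) {
    doCreate(message); // only records produced to partition 0 arrive here
}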
It's probably not going to work well with several instances of your app, since with manual assignment there is no consumer group involved. You may consider refining the logic into 3 different topics. Or, if that is not possible from the producer side, use Kafka Streams to split() the original topic into other topics according to the record key.
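A rough sketch of the split() alternative, assuming hypothetical output topic names and that the record key carries the event type (adjust the predicates to wherever the type actually lives, e.g. a header):
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Branched;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("SameTopic");
events.split()
        .branch((key, value) -> "create".equals(key),
                Branched.withConsumer(ks -> ks.to("SameTopic-create"))) // hypothetical topic
        .branch((key, value) -> "update".equals(key),
                Branched.withConsumer(ks -> ks.to("SameTopic-update"))) // hypothetical topic
        .defaultBranch(
                Branched.withConsumer(ks -> ks.to("SameTopic-delete"))); // hypothetical topic
Each @KafkaListener can then subscribe to its own per-type topic with a normal consumer group.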

Converting StreamListener with headers to Functional Model

Because @EnableBinding and @StreamListener are deprecated, I need to migrate existing code to the new functional model; however, I could not find any information on whether the argument mapping available in Spring Cloud Stream is still supported, and/or any clean workarounds.
My original method:
#StreamListener("mysource)
public void processMessage(byte[] rawMessage, #Header(required = false, value = KafkaHeaders.RECEIVED_MESSAGE_KEY) byte[] rawKey) {
processMessage(rawMessage, rawKey);
}
I managed to convert this to work as follows:
#Bean(name = "mysource")
public Consumer<Message<?>> mySource() {
return message -> {
byte[] rawMessage = message.getPayload().toString().getBytes();
byte[] rawKey = (byte[]) message.getHeaders().get("kafka_receivedMessageKey");
processMessage(rawMessage, rawKey);
};
}
However, what I would prefer is an approach that maximizes framework support with respect to argument mapping and/or automatic type conversion.
I attempted:
#Bean(name = "mysource")
public BiConsumer<Message<byte[]>, MessageHeaders> mySource() {
return (message, headers) -> {
byte[] rawMessage = message.getPayload();
byte[] rawKey = (byte[]) headers.get("kafka_receivedMessageKey");
processMessage(rawMessage, rawKey);
};
}
But this gives an error at startup: FunctionConfiguration$FunctionBindingRegistrar.afterPropertiesSet - The function definition 'mysource' is not valid. The referenced function bean or one of its components does not exist
I'm also aware that, along with Supplier and Consumer, Function is available, but I'm not sure how to use a Function in this case instead of a BiConsumer, or whether that's even possible, so I'm looking for examples of how to do this type of migration seamlessly and elegantly with respect to consuming and producing messages plus headers from/to Kafka.
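For what it's worth, here is a minimal sketch of a middle ground: a single Consumer with a typed Message<byte[]> payload, which keeps the framework's payload handling while reading the key from the headers. (That BiConsumer is not a bindable signature is an assumption based on the startup error above.)
import java.util.function.Consumer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;

@Bean(name = "mysource")
public Consumer<Message<byte[]>> mySource() {
    return message -> {
        byte[] rawMessage = message.getPayload(); // typed payload, no toString() round-trip
        byte[] rawKey = (byte[]) message.getHeaders()
                .get(KafkaHeaders.RECEIVED_MESSAGE_KEY); // null if the record had no key
        processMessage(rawMessage, rawKey);
    };
}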

Use Micrometer timer with manual counter increment

I have a KafkaListener which receives messages containing a list of objects.
@KafkaListener(
        id = "dataConsumer",
        topics = "data.topic",
        groupId = "${spring.kafka.consumer.group-id}",
        containerFactory = "dataKafkaListenerContainerFactory")
public void consumeData(DataContainer message) {
    List<Data> data = message.getList();
    ...
}
The list of objects can vary in size so the metrics for each message may not be useful.
I can get the timer metrics for this method by going to /actuator/metrics/spring.kafka.listener?tag=name:dataConsumer-0, but the count is per message, not per element in the message's list. How can I switch this metric, or create a similar one, to get the time and count of the data elements in the message?
You can register your own Meter with the MeterRegistry - refer to the Micrometer Documentation.
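For example, a minimal sketch with hypothetical meter names that times each batch and counts the individual elements:
import java.util.List;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

public class DataMetrics {

    private final Timer batchTimer;
    private final Counter elementCounter;

    public DataMetrics(MeterRegistry registry) {
        // hypothetical meter names; use whatever fits your conventions
        this.batchTimer = Timer.builder("data.batch.time")
                .description("Time spent processing one DataContainer")
                .register(registry);
        this.elementCounter = Counter.builder("data.elements.count")
                .description("Number of Data elements consumed")
                .register(registry);
    }

    /** Times the batch work and counts its elements. */
    public void record(List<Data> data, Runnable work) {
        batchTimer.record(work);
        elementCounter.increment(data.size());
    }
}
consumeData(...) would then call something like metrics.record(data, () -> process(data)), where process is your existing handling logic (a hypothetical name).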

Spring Cloud Function - Separate routing-expression for different Consumer

I have a service which receives differently structured messages from different message queues. With @StreamListener conditions we can choose, for every message type, how that message should be handled. As an example:
We receive two different types of messages, which have different header fields and values e.g.
Incoming from "order" queue:
Order1: { Header: {catalog:groceries} }
Order2: { Header: {catalog:tools} }
Incoming from "shipment" queue:
Shipment1: { Header: {region:Europe} }
Shipment2: { Header: {region:America} }
There is a binding for each queue, and with the corresponding @StreamListener I can process the messages by catalog and region differently, e.g.
@StreamListener(target = OrderSink.ORDER_CHANNEL, condition = "headers['catalog'] == 'groceries'")
public void onGroceriesOrder(GroceryOrder order) {
    ...
}
So the question is, how to achieve this with the new Spring Cloud Function approach?
At the documentation https://cloud.spring.io/spring-cloud-static/spring-cloud-stream/3.0.2.RELEASE/reference/html/spring-cloud-stream.html#_event_routing it is mentioned:
Also, for SpEL, the root object of the evaluation context is Message, so you can do evaluation on individual headers (or the message) as well: …routing-expression=headers['type']
Is it possible to add the routing-expression to the binding in application.yml, like this?
onGroceriesOrder-in-0:
  destination: order
  routing-expression: "headers['catalog']==groceries"
EDIT after first answer
If the above expression at this location is not possible, which is what the first answer implies, then my question becomes:
As far as I understand, an expression like routing-expression: headers['catalog'] must be set globally, because the result maps to certain (consumer) functions.
How can I control that the 2 different messages on each queue are forwarded to their own consumer function, e.g.
Order1 --> MyOrderService.onGroceriesOrder()
Order2 --> MyOrderService.onToolsOrder()
Shipment1 --> MyShipmentService.onEuropeShipment()
Shipment2 --> MyShipmentService.onAmericaShipment()
That was easy with @StreamListener, because each method gets its own @StreamListener annotation with different conditions. How can this be achieved with the new routing-expression setting?
Aside from the fact that the above is not a valid expression, I think you meant headers['catalog'] == 'groceries'. If so, what would you expect to happen from evaluating it, as the only two options could be true/false? Anyway, these are rhetorical questions, but they help to understand the problem and how to fix it.
The expression must result in the name of a function to route TO. So. . .
routing-expression: headers['catalog'] - assumes that the actual value of the catalog header is the name of the function to invoke
routing-expression: headers['catalog']=='groceries' ? 'processGroceries' : 'processOther' - maps the value 'groceries' to the 'processGroceries' function.
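As an illustrative sketch, assuming the default RoutingFunction (whose input binding is named functionRouter-in-0), the expression could be configured in application.yml like this:
spring:
  cloud:
    function:
      routing-expression: "headers['catalog'] == 'groceries' ? 'processGroceries' : 'processOther'"
    stream:
      function:
        routing:
          enabled: true
      bindings:
        functionRouter-in-0:
          destination: order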
For more specific routing, you can use the MessageRoutingCallback strategy:
MessageRoutingCallback
The MessageRoutingCallback is a strategy to assist with determining the name of the route-to function definition.
public interface MessageRoutingCallback {
    FunctionRoutingResult routingResult(Message<?> message);
    . . .
}
All you need to do is implement and register it as a bean to be picked up by the RoutingFunction. For example:
@Bean
public MessageRoutingCallback customRouter() {
    return new MessageRoutingCallback() {
        @Override
        FunctionRoutingResult routingResult(Message<?> message) {
            return new FunctionRoutingResult((String) message.getHeaders().get("func_name"));
        }
    };
}

SQS Listener @Headers getting body content instead of Message Attributes

I am using Spring Cloud SQS messaging for listening to a specified queue, hence using the @SqsListener annotation as below:
@SqsListener(value = "${QUEUE}", deletionPolicy = SqsMessageDeletionPolicy.ALWAYS)
public void receive(@Headers Map<String, String> header, @Payload String message) {
    try {
        logger.logInfo("Message payload is: " + message);
        logger.logInfo("Header from SQS is: " + header);
        if (<Some condition>) {
            // Dequeue the message once it has been processed successfully
            awsSQSAsync.deleteMessage(header.get(LOOKUP_DESTINATION), header.get(RECEIPT_HANDLE));
        } else {
            logger.logInfo("Message with header: " + header + " FAILED to process");
            logger.logError(FLEX_TH_SQS001);
        }
    } catch (Exception e) {
        logger.logError(FLEX_TH_SQS001, e);
    }
}
I am able to connect to the specified queue successfully and read the message as well. I am setting a message attribute "Key1" = "Value1" along with the message in the AWS console before sending it. The following is the message body:
{
"service": "ecsservice"
}
I am expecting "header" to receive a Map of all the message attributes along with the one i.e. Key1 and Value1. But what I am receiving is:
{service=ecsservice} as the populated map.
That means payload/body of message is coming as part of header, although body is coming correctly.
I wonder what mistake I am doing due to which #Header header is not getting correct message attributes.
Seeking expert advice.
-PC
I faced the same issue in one of my Spring projects.
The issue for me was the SQS configuration of QueueMessageHandlerFactory, specifically setting setArgumentResolvers.
By default, the first argument resolver in Spring is PayloadArgumentResolver, with the following behavior:
@Override
public boolean supportsParameter(MethodParameter parameter) {
    return (parameter.hasParameterAnnotation(Payload.class) || this.useDefaultResolution);
}
Here, this.useDefaultResolution is set to true by default, which means any parameter can be converted to the payload.
Spring tries to match your method's actual parameters against the resolvers in order (the first being PayloadArgumentResolver), so indeed it will try to convert all the parameters to the payload.
Source code from Spring:
@Nullable
private HandlerMethodArgumentResolver getArgumentResolver(MethodParameter parameter) {
    HandlerMethodArgumentResolver result = this.argumentResolverCache.get(parameter);
    if (result == null) {
        for (HandlerMethodArgumentResolver resolver : this.argumentResolvers) {
            if (resolver.supportsParameter(parameter)) {
                result = resolver;
                this.argumentResolverCache.put(parameter, result);
                break;
            }
        }
    }
    return result;
}
How I solved this: overriding the default behavior of the Spring resolvers:
factory.setArgumentResolvers(
    List.of(
        new PayloadArgumentResolver(converter, null, false),
        new HeaderMethodArgumentResolver(null, null)
    )
);
Here I set the default-resolution flag to false, so Spring will try to convert a parameter to the payload only if it has the annotation.
Hope this will help.
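Spelled out as a full configuration bean, a sketch assuming the spring-cloud-aws-messaging QueueMessageHandlerFactory and an available MessageConverter bean:
import java.util.List;
import org.springframework.cloud.aws.messaging.config.QueueMessageHandlerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.converter.MessageConverter;
import org.springframework.messaging.handler.annotation.support.HeaderMethodArgumentResolver;
import org.springframework.messaging.handler.annotation.support.PayloadArgumentResolver;

@Configuration
public class SqsConfig {

    @Bean
    public QueueMessageHandlerFactory queueMessageHandlerFactory(MessageConverter converter) {
        QueueMessageHandlerFactory factory = new QueueMessageHandlerFactory();
        factory.setArgumentResolvers(List.of(
                // useDefaultResolution = false: only @Payload parameters map to the body
                new PayloadArgumentResolver(converter, null, false),
                // lets @Headers / @Header parameters resolve from message headers
                new HeaderMethodArgumentResolver(null, null)));
        return factory;
    }
}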
Apart from @SqsListener, you need to add @MessageMapping to the method. This annotation helps to resolve method arguments.
I had this issue working in a rather large codebase. It turned out that a HandlerMethodArgumentResolver was being added to the list of resolvers that are used to parse the message into the method parameters. In my case it was the PayloadArgumentResolver, which usually resolves an argument to be the payload regardless of the annotation. It seems that by default it's supposed to come last in the list, but because of code I didn't know about, it ended up being added to the front.
Anyway, if you're not sure, take a look around your code and see if you're doing anything regarding Spring's QueueMessageHandler or HandlerMethodArgumentResolver.
It helped me to use a debugger and look at the HandlerMethodArgumentResolver.resolveArgument method to start tracing what happens.
P.S. I think your @SqsListener code looks fine, except that I think @Headers is technically supposed to resolve to a Map<String, Object>, but I'm not sure that would cause the issue you're seeing.
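For reference, the signature that P.S. suggests would look like this (same listener, only the header map type changed; untested sketch):
@SqsListener(value = "${QUEUE}", deletionPolicy = SqsMessageDeletionPolicy.ALWAYS)
public void receive(@Headers Map<String, Object> headers, @Payload String message) {
    // message attributes (e.g. Key1) should appear alongside the standard headers
    logger.logInfo("Headers from SQS: " + headers);
}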
