When I turn on exactly-once processing I get the following error. NOTE: Our application is very secure; we only give Kafka users and consumers access to the resources that they explicitly need.
2019-04-22 15:28:09 INFO (kafka.authorizer.logger)233 - Principal = User:xxx is Denied Operation = Describe from host xxx.xxx.xxx.xxx on resource = TransactionalId:application_consumer-0_16
With exactly-once processing, does Kafka Streams use a consumer group per stream task instead of one consumer group across all stream tasks?
With exactly-once enabled, there is still only one consumer group, and its name is the same as the application.id. However, instead of one producer per thread, one producer per task is used.
What you need are ACLs for transactions. The TransactionalId the error reports belongs to the producer of task 0_16. Each producer uses its own transactional id, constructed as <application.id>-<taskId>.
For details, compare the docs: https://docs.confluent.io/current/kafka/authorization.html#using-acls
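For illustration, a grant like the following clears that error. This is a sketch using the Kafka AdminClient; the principal and application id are taken from the log above, the broker address is a placeholder, and the same ACL can also be added with the kafka-acls CLI. A PREFIXED pattern covers the transactional ids of all tasks at once:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantTransactionalIdAcls {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // One PREFIXED pattern matches <application.id>-0_0, <application.id>-0_16, ...
            ResourcePattern txnIds = new ResourcePattern(
                    ResourceType.TRANSACTIONAL_ID, "application_consumer-", PatternType.PREFIXED);
            AccessControlEntry write = new AccessControlEntry(
                    "User:xxx", "*", AclOperation.WRITE, AclPermissionType.ALLOW);
            AccessControlEntry describe = new AccessControlEntry(
                    "User:xxx", "*", AclOperation.DESCRIBE, AclPermissionType.ALLOW);
            admin.createAcls(Arrays.asList(
                    new AclBinding(txnIds, write),
                    new AclBinding(txnIds, describe))).all().get();
        }
    }
}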
Related
We have one topic with one partition because of message-ordering requirements. We have two consumers running on different servers with the same set of configurations (groupId, consumerId, consumerGroup), i.e.
1 Topic -> 1 Partition -> 2 Consumers
When we deploy the consumers, the same code is deployed on both servers. We noticed that when a message arrives, both consumers consume it rather than only one processing it. The reason for running consumers on two separate servers is that if one server crashes, the other can continue processing messages. But it looks like when both are up, both consume messages. The Kafka docs say that if there are more consumers than partitions, some stay idle, but we don't see that happening. Is there anything we are missing on the configuration side apart from consumerId and groupId? Thanks
As Gary Russell said, as long as the two consumer instances have their own consumer groups, they will each consume every event that is written to the topic. Put them into the same consumer group instead; you can provide a consumer group id in consumer.properties.
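For example, in consumer.properties on both servers (the group name here is a placeholder; the only requirement is that it is identical on both):

group.id=my-consumer-group

With a single partition, only one member of the group gets it assigned; the second instance stays idle and takes over if the first one dies, which is exactly the failover behavior you are after.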
I'm using spring-kafka to consume messages from two Kafka topics, which carry the same message format, as below.
@KafkaListener(topics = {"topic_country1", "topic_country2"}, groupId = KafkaUtils.MESSAGE_GROUP)
public void onCustomerMessage(String message, Acknowledgment ack) throws Exception {
    log.info("Message : {} is received", message);
    ack.acknowledge();
}
Can KafkaListener allocate the number of consumer threads according to the number of topics it listens to on its own, and process the messages of the two topics in parallel? Or does it not support parallel processing, so messages have to wait in the topic until the previous message is processed?
If the number of messages in the topics grows, I need to autoscale my microservice and start new instances (up to the number of partitions). What parameters (CPU, memory) can I rely on to detect, from the KafkaListener's point of view, that the number of messages in the topics is high? (For an HTTP API, I can auto-scale the service by monitoring request latency.)
You can set the concurrency property to run more threads, but each partition can only be processed by one thread; to increase concurrency you must increase the number of partitions in each topic. When listening to multiple topics in the same listener, if those topics have only one partition each, you may not get the concurrency you desire unless you change the Kafka consumer partition assignor.
See https://docs.spring.io/spring-kafka/docs/2.5.0.RELEASE/reference/html/#using-ConcurrentMessageListenerContainer
When listening to multiple topics, the default partition distribution may not be what you expect. For example, if you have three topics with five partitions each and you want to use concurrency=15, you see only five active consumers, each assigned one partition from each topic, with the other 10 consumers being idle. This is because the default Kafka PartitionAssignor is the RangeAssignor (see its Javadoc). For this scenario, you may want to consider using the RoundRobinAssignor instead, which distributes the partitions across all of the consumers. Then, each consumer is assigned one topic or partition. ...
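Putting both knobs together, a sketch of a Spring Kafka configuration for the listener above (broker address is a placeholder; the bean name kafkaListenerContainerFactory is the default that @KafkaListener looks up):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class KafkaConcurrencyConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Spread the partitions of both topics across all consumer threads
        // instead of the default RangeAssignor behavior described above.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                RoundRobinAssignor.class.getName());

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(props));
        // At most one thread per partition; with two single-partition topics,
        // anything above 2 sits idle.
        factory.setConcurrency(2);
        // Required because the listener method takes an Acknowledgment.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        return factory;
    }
}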
If you want to scale horizontally beyond the partition count, and dynamically, consider using something like Parallel Consumer (PC). It can be used within a Spring context.
By using PC, you can process all your keys in parallel, regardless of how long processing takes, and you can be as concurrent as you wish; this can also scale dynamically.
PC solves this directly by sub-partitioning the input partitions by key and processing each key in parallel.
It also tracks per record acknowledgement. Check out Parallel Consumer on GitHub (it's open source BTW, and I'm the author).
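A minimal sketch of what that looks like, based on the Parallel Consumer README; class and method names may differ between releases, so treat this as an outline rather than exact API:

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import java.util.Collections;

// kafkaConsumer is an ordinary KafkaConsumer built elsewhere
// (group.id set, enable.auto.commit=false).
ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
        .ordering(ParallelConsumerOptions.ProcessingOrder.KEY) // parallel across keys, ordered within a key
        .maxConcurrency(100)
        .consumer(kafkaConsumer)
        .build();

ParallelStreamProcessor<String, String> processor =
        ParallelStreamProcessor.createEosStreamProcessor(options);
processor.subscribe(Collections.singletonList("topic_country1"));
processor.poll(context -> System.out.println("Processing " + context));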
I'm using the segmentio/kafka-go client to read messages from a topic.
I'm unable to find how to start reading from the latest/newest message.
Every time I start the code, it starts reading from the beginning offset of the partition.
What you need to know about consuming messages from Kafka is that each consumer client is part of a Consumer Group. Kafka stores the already-processed offset for each Consumer Group at topic-partition level in an internal Kafka topic called __consumer_offsets. This enables a consumer of a Consumer Group to continue consumption from where it left off after a restart.
In your case this means you need to set the Consumer Group (in the KafkaConsumer API this is the configuration "group.id") and keep it constant. Only then will you be able to continue reading from the latest/newest message rather than starting from the beginning after a restart.
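In kafka-go specifically, that means setting the GroupID field of the kafka.ReaderConfig you pass to kafka.NewReader. For illustration, here is the same idea with the Java client; group name, broker, and topic are placeholders. Note that auto.offset.reset only applies the very first time, when the group has no committed offset yet:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LatestOnlyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-stable-group");         // keep constant across restarts
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");         // first start: begin at new messages
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
                }
            }
        }
    }
}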
I am developing an app (microservices-based) relying on Kafka and Kafka Streams. I am using Spring Boot and Spring Cloud Stream, and I am having trouble handling transactions for Kafka Streams operations. I know that handling transactions with a plain Kafka consumer is no problem, but when I try to add Kafka Streams processing in the middle it becomes tricky for me.
The example case is:
In one of my services, an order request for a product is consumed from topic A.
Inventory info is consumed from topic B.
This service produces inventory updates to topic B, but it is also responsible for publishing events about products being ready for shipping (to topic C).
When receiving an order request from topic A, I want to check (by processing topic B) whether the inventory for the particular product is sufficient, and publish an event with either success or failure (regarding that order) to topic C.
At the same time I need to update the inventory (subtract the quantity that is, let's say, reserved for shipping) so that for the next order I have actual values from topic B. I want to post the success to topic C and update the inventory on topic B within one transaction.
Is that possible in Spring Cloud Stream with Kafka Streams? And if yes, how can I manage to do that?
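For reference, the atomic read-process-write cycle described here is what Kafka Streams' processing.guarantee=exactly_once provides: the writes to topics B and C are committed together with the input offsets in one transaction. Below is a rough sketch of the topology in the plain Kafka Streams DSL; topic names, key/value types, and the quantity arithmetic are all assumptions, and Spring Cloud Stream's Kafka Streams binder can wire the same topology through its binding model:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class OrderServiceTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Current stock per product id, materialized from topic B.
        KTable<String, Long> inventory =
                builder.table("topic-B", Consumed.with(Serdes.String(), Serdes.Long()));

        // Order requests keyed by product id; value = requested quantity.
        KStream<String, Long> orders =
                builder.stream("topic-A", Consumed.with(Serdes.String(), Serdes.Long()));

        // Pair each order with the stock level at the moment it arrives.
        KStream<String, long[]> checked =
                orders.join(inventory, (requested, stock) -> new long[]{requested, stock});

        // Success/failure event per order to topic C.
        checked.mapValues(v -> v[1] >= v[0] ? "SUCCESS" : "FAILURE")
               .to("topic-C", Produced.with(Serdes.String(), Serdes.String()));

        // Reduced stock level back to topic B for successful orders.
        checked.filter((productId, v) -> v[1] >= v[0])
               .mapValues(v -> v[1] - v[0])
               .to("topic-B", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-service");     // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Consuming from A/B and producing to B/C become one atomic unit of work.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        new KafkaStreams(builder.build(), props).start();
    }
}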
I would like to process multiple messages at a time, e.g. get 10 messages from the channel at once and write them to a log file in one go.
Given the scenario, can I write a service activator that gets messages in a predefined batch, i.e. 5 or 10 messages, and processes them together? If this is not possible, how can I achieve this using Spring Integration?
That is exactly what you can get with the Aggregator. You can collect several messages into a group using a simple expression like size() == 10. When the group is complete, the DefaultAggregatingMessageGroupProcessor emits a single message with the list of payloads of the messages in the group. That result you can send to a service-activator for handling the batch at once.
UPDATE
Something like this:
.aggregate(aggregator -> aggregator
        .correlationStrategy(message -> 1)            // every message gets the same static key
        .releaseStrategy(group -> group.size() == 10) // release once 10 messages are buffered
        .outputProcessor(g -> new GenericMessage<Collection<Message<?>>>(g.getMessages()))
        .expireGroupsUponCompletion(true))            // drop the completed group so a new one can form
So, we correlate messages (group, or buffer, them) by the static 1 key.
The group (or buffer) size is 10, and when we reach it, we emit a single message that contains all the messages from the group. After emitting the result we clear the group from the store, allowing a new group to form for a fresh sequence of messages.
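In context, the aggregator sits between the input channel and the service activator that writes the batch; the channel name and the writeBatchToLog handler below are placeholders:

import java.util.Collection;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.GenericMessage;

@Bean
public IntegrationFlow batchingFlow() {
    return IntegrationFlows.from("inputChannel")
            .aggregate(aggregator -> aggregator
                    .correlationStrategy(message -> 1)
                    .releaseStrategy(group -> group.size() == 10)
                    .outputProcessor(g -> new GenericMessage<Collection<Message<?>>>(g.getMessages()))
                    .expireGroupsUponCompletion(true))
            // The whole batch arrives as a single payload of 10 messages.
            .handle(message -> writeBatchToLog((Collection<Message<?>>) message.getPayload()))
            .get();
}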
It depends on what is creating the messages in the first place; if it's a message-driven channel adapter, the concurrency setting in that adapter is the key.
For other message sources, you can use an ExecutorChannel as the input channel to the service activator, with an executor with a pool size of 10.
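A minimal sketch of that wiring; the channel name, pool size, and handler are illustrative:

import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.channel.ExecutorChannel;
import org.springframework.messaging.MessageChannel;

@Bean
public MessageChannel workChannel() {
    // Hands messages to the service activator on a 10-thread pool,
    // so up to 10 messages are processed concurrently.
    return new ExecutorChannel(Executors.newFixedThreadPool(10));
}

@ServiceActivator(inputChannel = "workChannel")
public void handle(String payload) {
    // process one message; up to 10 of these run in parallel
}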
Depending on what is sending messages, you need to be careful about losing messages in the event of a server failure.
It's difficult to provide a general answer without more information about your application.