Does Shopify/sarama provide an option similar to transactional.id in JVM API?
The library supports idempotence (Config.Producer.Idempotent, similar to enable.idempotence), but I don't understand how to use it without a transactional.id.
Please correct me if I'm wrong; documentation on these options in Sarama is a bit sparse. But according to the JVM docs, idempotence without the identifier is limited to a single producer session. In other words, we lose the guarantee when the producer fails and restarts.
I found relevant properties in the source code and some tests (for example), but I don't understand how to use them externally.
Shopify/sarama provides Kafka exactly-once semantics (idempotency) with an idempotent-enabled producer. For that, the configuration below needs to be in place.
From Shopify/sarama/config.go
if c.Producer.Idempotent {
    if !c.Version.IsAtLeast(V0_11_0_0) {
        return ConfigurationError("Idempotent producer requires Version >= V0_11_0_0")
    }
    if c.Producer.Retry.Max == 0 {
        return ConfigurationError("Idempotent producer requires Producer.Retry.Max >= 1")
    }
    if c.Producer.RequiredAcks != WaitForAll {
        return ConfigurationError("Idempotent producer requires Producer.RequiredAcks to be WaitForAll")
    }
    if c.Net.MaxOpenRequests > 1 {
        return ConfigurationError("Idempotent producer requires Net.MaxOpenRequests to be 1")
    }
}
In Shopify/sarama this is handled by the AsyncProducer's transactionManager (see Shopify/sarama/async_producer.go). It holds the producer ID and epoch assigned when the producer is initialised (the bumpEpoch() function in async_producer.go increments the epoch), and it maintains sequence numbers that are sent with each message and incremented on every successful publish. This state identifies the producer session with the broker and lets the broker deduplicate retried messages.
Read this example. It describes how idempotence works.
You are correct about the producer-session limitation: exactly-once is promised only for a single producer session. When the producer is restarted right after a sequence failure, there can be a duplicate.
When the producer restarts, a new PID gets assigned, so idempotency is promised only for a single producer session. Even though the producer retries requests on failures, each message is persisted in the log exactly once within that session. There can still be duplicates depending on the source the producer gets its data from; Kafka won't take care of duplicate data received by the producer, so in some cases you may need an additional de-duplication system.
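For comparison, the JVM-side guarantee the question refers to comes from combining enable.idempotence with a stable transactional.id. A minimal sketch of that producer setup (broker address, topic, and id are placeholders, not from the question) could look like this:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");         // idempotence within a producer session
        props.put("transactional.id", "my-producer-1");  // stable id lets the broker fence zombies
                                                         // and keep guarantees across restarts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            producer.commitTransaction();
        }
    }
}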
We need to delay the start of a consumer.
Here's what we need:
Start consumer A (reading topic "xyz")
When consumer A has processed all messages, start consumer B (reading topic "zyx")
After reading this:
How to find no more messages in kafka topic/partition & reading only after writing to topic is done
We set idleEventInterval on containerProperties of consumer A:
containerProperties.setIdleEventInterval(30000L);
and on consumer B:
container.setAutoStartup(false);
then we have:
@EventListener
public void handleListenerContainerIdleEvent(ListenerContainerIdleEvent event) {
    if (canStartContainer(event.getListenerId())) {
        Optional.ofNullable(containers.get("container-a"))
                .ifPresent(AbstractMessageListenerContainer::start);
    }
}
We found that it's exactly what we need and it works fine, but we faced one problem: when consumer B starts, it forces a rebalance of all other consumers.
Can we avoid it?
Request joining group due to: group is already rebalancing
Revoke previously assigned partitions
(Re-)joining group
It's not a big issue, but we use ConsumerSeekAware to reset the offset using seekToBeginning, so the topic is read twice.
You should not use the same group.id with consumers on different topics; it will cause an unnecessary rebalance, as you have found out.
Use different group.ids for consumers on different topics.
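For example, with Spring Kafka the two listeners could be declared with distinct groupId values (the group names below are placeholders; the listener ids and topics mirror the question's setup):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class DelayedConsumers {

    @KafkaListener(id = "container-a", topics = "xyz", groupId = "group-a")
    public void listenA(String message) {
        // consumer A: processes topic "xyz"
    }

    @KafkaListener(id = "container-b", topics = "zyx", groupId = "group-b", autoStartup = "false")
    public void listenB(String message) {
        // consumer B: started later via the idle-event listener shown above;
        // a different group.id means starting it does not rebalance consumer A
    }
}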
In a Spring Boot application using Artemis, we try to avoid queues containing too many messages. The intention is to only put in new messages if the number of messages currently in the queue falls below a certain limit, e.g. 100 messages. However, that does not seem to work, and we don't know why or what the "correct" way to implement that functionality would be. The number of messages as extracted by the code below is always 0, although in the GUI there are messages.
To reproduce the problem I installed apache-artemis-2.13.0 locally.
We are doing something like the following:
if (!jmsUtil.queueHasNotMoreElementsThan(QUEUE_ALMOST_EMPTY_MAX_AMOUNT, reprocessingMessagingProvider.getJmsTemplate())) {
    log.info("Queue has too many messages. Will not send more...");
    return;
}
jmsUtil is implemented like this:
public boolean queueHasNotMoreElementsThan(int max, JmsOperations jmsTemplate) {
    return Boolean.TRUE.equals(
            jmsTemplate.browse((session, queueBrowser) -> {
                Enumeration enumeration = queueBrowser.getEnumeration();
                return notMoreElemsThan(enumeration, max);
            }));
}

private Boolean notMoreElemsThan(Enumeration enumeration, int max) {
    for (int i = 0; i <= max; i++) {
        if (!enumeration.hasMoreElements()) {
            return true;
        }
        enumeration.nextElement();
    }
    return false;
}
As a check, I additionally used the following method to give me the number of messages in the queue directly.
public int countPendingMessages(String destination, JmsOperations jmsTemplate) {
    Integer totalPendingMessages = jmsTemplate.browse(destination,
            (session, browser) -> Collections.list(browser.getEnumeration()).size());
    int messageCount = totalPendingMessages == null ? 0 : totalPendingMessages;
    log.info("Queue {} message count: {}", destination, messageCount);
    return messageCount;
}
That method of extracting the queue size seems to be used by others as well and is based on the documentation of QueueBrowser: "The getEnumeration method returns a java.util.Enumeration that is used to scan the queue's messages."
Would the above be the correct way to obtain the queue size? If so, what could be the cause of the problem? If not, how should the queue size be queried? Does Spring offer any other possibility of accessing the queue?
Update: I read another post and the documentation, but I don't know how to obtain the ClientSession.
There are some caveats to using a QueueBrowser to count the number of messages in the queue. The first is noted in the QueueBrowser JavaDoc:
Messages may be arriving and expiring while the scan is done. The JMS API does not require the content of an enumeration to be a static snapshot of queue content. Whether these changes are visible or not depends on the JMS provider.
So already the count may not be 100% accurate.
Then there is the fact that there may be messages still technically in the queue which have been dispatched to a consumer but have not yet been acknowledged. These messages will not be counted by the QueueBrowser even though they may be cancelled back to the queue at any point if the related consumer closes its connection.
Simply put, the JMS API doesn't provide a truly reliable way to determine the number of messages in a queue. Furthermore, "Spring JMS" is tied to the JMS API; it doesn't have any other way to interact with a JMS broker. Given that, you'll need to use a provider-specific mechanism to determine the message count.
ActiveMQ Artemis has a rich management API that is accessible through, among other things, specially constructed JMS messages. You can see this in action in the "Management" example that ships with ActiveMQ Artemis in the examples/features/standard/management directory. It demonstrates how to use JMS resources and provider-specific helper classes to get the message count for a JMS queue. This is essentially the same solution as given in the other post you mentioned, but it uses the JMS API rather than the ActiveMQ Artemis "core" API.
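As a rough sketch of that approach (the queue name is a placeholder; the "queue." resource-name prefix and the "activemq.management" address reflect Artemis defaults and may need adjusting for your version, and the connecting user needs management permission), the broker-side message count can be queried like this:

import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueRequestor;
import javax.jms.QueueSession;
import javax.jms.Session;
import org.apache.activemq.artemis.api.jms.management.JMSManagementHelper;

public class QueueCountSketch {

    // Asks the broker itself for the message count of the given queue.
    public static Object countMessages(QueueConnectionFactory cf, String queueName) throws Exception {
        try (QueueConnection connection = cf.createQueueConnection()) {
            QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            connection.start();

            // Requests sent to the special management address are answered by the broker.
            Queue managementQueue = session.createQueue("activemq.management");
            QueueRequestor requestor = new QueueRequestor(session, managementQueue);

            Message request = session.createMessage();
            // Ask for the "messageCount" attribute of the target queue resource.
            JMSManagementHelper.putAttribute(request, "queue." + queueName, "messageCount");
            Message reply = requestor.request(request);
            return JMSManagementHelper.getResult(reply);
        }
    }
}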
I'm pretty new to Kafka. I'm using Spring Cloud Stream Kafka to produce and consume.
@StreamListener(Sink.INPUT)
public void process(Order order) {
    try {
        // my message processing
    }
    catch (Exception e) {
        // retry that record here...
    }
}
I just want to know how I can implement a retry. Any help on this is highly appreciated.
Hi,
There are multiple ways to handle retries, and it depends on the kind of errors you encounter.
For basic issues the Kafka framework will retry for you to recover from an error condition; for example, in case of a short network downtime, the consumer and producer APIs implement automatic retries.
In particular, Kafka supports built-in producer/consumer retries to correctly handle a large variety of errors without loss of messages, but as a developer you must still be able to handle other types of errors with the try-catch block you mention.
Errors in Kafka can be divided into the following categories:
(producer & consumer side) Non-retriable broker errors, such as errors regarding message size, authorization errors, etc. -> you must handle them in the "design phase" of your app.
(producer side) Errors that occur before the message was sent to the broker, for example serialization errors -> you must handle them during runtime execution of your app.
(producer & consumer side) Errors that occur when the producer has exhausted all retry attempts, or when the memory available to the producer is filled to the limit because it is all being used to store messages while retrying -> you should handle these errors.
Another point of attention regarding "how to retry" is how to correctly handle the order of commits when the auto-commit option is set to false.
A common and simple pattern to get commit order right is to use a monotonically increasing sequence number. Increase the sequence number every time you commit, and add the sequence number at the time of the commit to the commit callback. When you're getting ready to send a retry, check whether the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don't retry, because a newer commit was already sent.
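A minimal sketch of that pattern with the plain Kafka consumer API might look like the following (the class, field, and method names are illustrative, not from Spring Cloud Stream):

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitSequenceSketch {

    // Monotonically increasing number, bumped on every commit attempt.
    private final AtomicInteger commitSeq = new AtomicInteger(0);

    public void commitWithRetryCheck(KafkaConsumer<String, String> consumer,
                                     Map<TopicPartition, OffsetAndMetadata> offsets) {
        int seqAtCommit = commitSeq.incrementAndGet();
        consumer.commitAsync(offsets, (committedOffsets, exception) -> {
            if (exception != null && seqAtCommit == commitSeq.get()) {
                // No newer commit has been issued in the meantime, so it is safe to retry.
                consumer.commitAsync(offsets, null);
            }
            // If commitSeq has moved on, a newer commit was already sent: do not retry.
        });
    }
}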
I profiled my Kafka producer Spring Boot application and found many "kafka-producer-network-thread"s running (47 in total), which never stop running, even when no data is being sent.
My application looks a bit like this:
var kafkaSender = KafkaSender(kafkaTemplate, applicationProperties)
kafkaSender.sendToKafka(json, rs.getString("KEY"))
with the KafkaSender:
@Service
class KafkaSender(val kafkaTemplate: KafkaTemplate<String, String>, val applicationProperties: ApplicationProperties) {

    @Transactional(transactionManager = "kafkaTransactionManager")
    fun sendToKafka(message: String, stringKey: String) {
        kafkaTemplate.executeInTransaction { kt ->
            kt.send(applicationProperties.kafka.topic, System.currentTimeMillis().mod(10).toInt(),
                    System.currentTimeMillis().rem(10).toString(), message)
        }
    }

    companion object {
        val log = LoggerFactory.getLogger(KafkaSender::class.java)!!
    }
}
Since each time I want to send a message to Kafka I instantiate a new KafkaSender, I thought a new thread would be created which then sends the message to the kafka queue.
Currently it looks like a pool of producers is generated, but never cleaned up, even when none of them has anything to do.
Is this behaviour intended?
In my opinion the behaviour should be nearly the same as datasource pooling: keep the thread alive for some time, but clean it up when there is nothing to do.
When using transactions, the producer cache grows on demand and is not reduced.
If you are producing messages on a listener container (consumer) thread; there is a producer for each topic/partition/consumer group. This is required to solve the zombie fencing problem, so that if a rebalance occurs and the partition moves to a different instance, the transaction id will remain the same so the broker can properly handle the situation.
If you don't care about the zombie fencing problem (and you can handle duplicate deliveries), set the producerPerConsumerPartition property to false on the DefaultKafkaProducerFactory and the number of producers will be much smaller.
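A sketch of that customization (the bean setup, serializers, and transaction-id prefix below are placeholder assumptions, not taken from the question):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class ProducerFactoryConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        DefaultKafkaProducerFactory<String, String> factory = new DefaultKafkaProducerFactory<>(props);
        factory.setTransactionIdPrefix("tx-");           // still transactional
        factory.setProducerPerConsumerPartition(false);  // one cached producer per prefix instead of
                                                         // one per group/topic/partition
        return factory;
    }
}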
EDIT
Starting with version 2.8, the default EOSMode is now V2 (aka BETA), which means it is no longer necessary to have a producer per topic/partition/group, as long as the broker version is 2.5 or later.
In my application, I have a queue (HornetQ) set up on JBoss 7 AS.
I have used Spring Batch to do some work once the message is received (save values in the database etc.), and then the consumer commits the JMS session.
Sometimes, when there is an exception while processing the message, the execution of the consumer is aborted abruptly and the message remains in the "in delivery" state. There are about 30 messages in this state on my production queue.
I have tried restarting the consumer, but the state of these messages is not changed. The only way to remove these messages from the queue is to restart the queue. But before doing that I want a way to read these messages so that they can be corrected and sent to the queue again to be processed.
I have tried using a QueueBrowser to read them, but it does not work. I have searched a lot on Google but could not find any way to read these messages.
I am using a Transacted session, where once the message is processed, I am calling:
session.commit();
This sends the acknowledgement.
I am implementing Spring's org.springframework.jms.listener.SessionAwareMessageListener to receive messages and then process them.
While processing the messages, I am using Spring Batch to insert some data into the database.
For a particular case, it tries to insert data too big to fit into a column; it throws an exception, and the transaction is aborted.
Now, I have fixed my producer and consumer not to have such data, so that this case should not happen again.
But my question is what about the 30 "in delivery" state messages that are in my production queue? I want to read them so that they can be corrected and sent to the queue again to be processed. Is there any way to read these messages? Once I know their content, I can restart the queue and submit them again (after correcting them).
It all depends on the Transaction mode you are using.
For instance, if you use transactions:
// session here is a transacted (TX) session
MessageConsumer consumer = session.createConsumer(someQueue);
connection.start();
Message msg = consumer.receive(...);
session.rollback(); // this will make the messages be redelivered
If you are using non-TX:
// session here is auto-ack
MessageConsumer consumer = session.createConsumer(someQueue);
connection.start();
// this means the message is ACKed as we receive it (auto-ACK)
Message msg = consumer.receive(...);
// however the consumer could still have a buffer of messages from the server...
// if you are not using the consumer any longer, close it
consumer.close(); // this will release the messages held in the client buffer
Alternatively you could also set consumerWindowSize=0 on the connectionFactory.
This is documented for 2.2.5, but it has not changed in the following releases:
http://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/flow-control.html
I"m covering all the possibilities I could think of since you're not being specific on how you are consuming. If you provide me more detail then I will be able to tell you more:
You can indeed read the messages in the queue using JMX (with, for example, JConsole).
In JBoss AS7 you can do it the following way:
MBeans>jboss.as>messaging>default>myJmsQueue>Operations
listMessagesAsJson
[edit]
Since 2.3.0 you have a dedicated method for this specific case:
listDeliveringMessages
See https://issues.jboss.org/browse/HORNETQ-763
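A rough programmatic JMX sketch of the same operation (the service URL and ObjectName are assumptions based on a default JBoss AS7 setup and the jconsole path above; adjust them for your environment):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListQueueMessagesSketch {
    public static void main(String[] args) throws Exception {
        // Remote JMX endpoint of the JBoss AS7 instance (assumed default; needs the JBoss client libs on the classpath)
        JMXServiceURL url = new JMXServiceURL("service:jmx:remoting-jmx://localhost:9999");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // Mirrors the jconsole path: MBeans > jboss.as > messaging > default > myJmsQueue
            ObjectName queue = new ObjectName(
                    "jboss.as:subsystem=messaging,hornetq-server=default,jms-queue=myJmsQueue");
            // null filter = all messages; on 2.3.0+ the delivering-messages operation
            // mentioned above can be invoked the same way
            String json = (String) mbeans.invoke(queue, "listMessagesAsJson",
                    new Object[] { null }, new String[] { String.class.getName() });
            System.out.println(json);
        } finally {
            connector.close();
        }
    }
}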