I am new to Spring and Kafka. I have a use case where I consume from a Kafka topic and produce to another topic using a transactional producer (messages should be processed exactly once). I saw the discussion on this thread (https://github.com/spring-projects/spring-kafka/issues/645) but implemented it a little differently.
I set a manual ack mode in the listener container factory and then acknowledged after sending to the producer using kafkaTemplate.executeInTransaction() (an async send). Does that achieve the same result as the approach in that issue? Since the send is asynchronous, I am not sure it serves the purpose.
Also, in the example on issue 645, when does the actual commit to the Kafka broker happen (i.e., when do consumers see the data)? Does it happen on a commit interval or record by record? I am trying to understand whether the actual commit happens on a time interval, for every record, or is configurable.
If you are using transactions you should not commit offsets via the consumer; instead, you should send the offsets to the transaction using the producer.
If properly configured, the listener container will do it automatically for you when the listener exits. See the documentation.
By configuring the listener container with a KafkaTransactionManager, the container starts the transaction. Any sends on a transactional KafkaTemplate will participate in the transaction and the container will send the offsets to the transaction (and commit the transaction) when the listener exits normally.
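For example, a minimal sketch of that setup (bean names, topics, and types here are illustrative, not taken from the question):

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory,
        KafkaTransactionManager<String, String> ktm) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // the KafkaTransactionManager makes the container start a Kafka transaction
    // before invoking the listener (recent versions use setKafkaAwareTransactionManager)
    factory.getContainerProperties().setTransactionManager(ktm);
    return factory;
}

@KafkaListener(topics = "input-topic")
public void listen(String in) {
    // kafkaTemplate is an injected, transactional KafkaTemplate; this send participates
    // in the container-started transaction, and the container sends the consumed offset
    // to the same transaction and commits it when this method exits normally
    kafkaTemplate.send("output-topic", in.toUpperCase());
}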
See the Javadocs for executeInTransaction...
/**
 * Execute some arbitrary operation(s) on the operations and return the result.
 * The operations are invoked within a local transaction and do not participate
 * in a global transaction (if present).
 * @param callback the callback.
 * @param <T> the result type.
 * @return the result.
 * @since 1.1
 */
<T> T executeInTransaction(OperationsCallback<K, V, T> callback);
Such operations will not participate in the container's transaction.
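To illustrate the difference (a hedged sketch, not code from the issue):

// inside a @KafkaListener method, with a transactional KafkaTemplate:

// participates in the transaction started by the listener container
kafkaTemplate.send("output-topic", value);

// runs in its OWN local transaction, committed (or rolled back) independently
// of the container's transaction and of the offset send
kafkaTemplate.executeInTransaction(ops -> ops.send("output-topic", value));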
Coming to consumer offset commits, there are two ways. One is enabling auto-commit:
enable.auto.commit=true
auto.commit.interval.ms // configures the time between automatic commits
The other way is to commit the offset manually through an Acknowledgment:
@KafkaListener(topics = "${kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory", groupId = "${kafka.consumer.groupId}")
public void taskListener(Task task, Acknowledgment ack) {
    // System.out.println(task.toString());
    log.info(task.toString());
    ack.acknowledge();
}
The auto-commit check runs on every poll; if the elapsed time is greater than the configured interval, the consumer commits the offsets.
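A rough sketch of the two configurations (the property values and the props/factory variables are illustrative):

// 1) auto-commit: driven purely by consumer properties
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5000);

// 2) manual acknowledgment: disable auto-commit and set a manual ack mode on the
//    listener container factory, so ack.acknowledge() performs the commit
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);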
Related
I am using Spring Boot (version 2.7.1) with the Spring Cloud Stream Kafka binder (2.8.5) for processing Kafka messages.
I have a functional-style consumer that consumes messages in batches. Right now it retries 10 times and then commits the offsets for the errored records.
I now want to introduce a mechanism that retries a certain number of times (which works using the error handler below), then stops processing messages and fails the entire batch without auto-committing the offsets.
I read through the documentation and understand that CommonContainerStoppingErrorHandler can be used to stop the container from consuming messages.
My handler currently looks like this, and it retries with an exponential back-off.
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<String, Message>> errorHandler() {
    return (container, destinationName, group) -> {
        container.getContainerProperties().setAckMode(ContainerProperties.AckMode.BATCH);
        ExponentialBackOffWithMaxRetries backOffWithMaxRetries = new ExponentialBackOffWithMaxRetries(2);
        backOffWithMaxRetries.setInitialInterval(1);
        backOffWithMaxRetries.setMultiplier(2.0);
        backOffWithMaxRetries.setMaxInterval(5);
        container.setCommonErrorHandler(new DefaultErrorHandler(backOffWithMaxRetries));
    };
}
How do I chain a CommonContainerStoppingErrorHandler with the above error handler, so the failed batch is not committed and is replayed upon restart?
With a BatchListenerFailedException from the consumer, is it possible to fail the entire batch (including any valid records before the problematic record in that batch)?
Add a custom recoverer to the error handler - see this answer for an example: How do you exit spring boot application programmatically when retries are exhausted, to prevent kafka offset commit
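A hedged sketch of what that could look like, built on the customizer from the question (the stop-and-rethrow recoverer follows the idea in the linked answer; it is not code from it):

return (container, destinationName, group) -> {
    ExponentialBackOffWithMaxRetries backOffWithMaxRetries = new ExponentialBackOffWithMaxRetries(2);
    backOffWithMaxRetries.setInitialInterval(1);
    backOffWithMaxRetries.setMultiplier(2.0);
    backOffWithMaxRetries.setMaxInterval(5);
    container.setCommonErrorHandler(new DefaultErrorHandler((record, exception) -> {
        // the recoverer runs after the retries are exhausted: stop the container on a
        // separate thread, then re-throw so the batch is NOT treated as recovered and
        // its offsets are not committed; it will be redelivered after a restart
        new SimpleAsyncTaskExecutor().execute(container::stop);
        throw new IllegalStateException("Stopping container", exception);
    }, backOffWithMaxRetries));
};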
No; records before the failed one will have their offsets committed.
This link tells us to use @Transactional for 1PC between a DB and Kafka for consumer-initiated transactions.
An excerpt:
the container (configured with a KTM) starts the Kafka transaction
The question is: where and how do we configure a container with the KTM? I looked at the code sample here, and it seems that the producer configuration enables transactions:
spring.kafka.producer.transaction-id-prefix=tx-
However, imagine that we do not have a producer or any event generation to Kafka in the listener, as in:
@KafkaListener(id = "group1", topics = "topic1")
@Transactional("dstm")
public void listen1(String in) {
    // COMMENT THIS:
    // this.kafkaTemplate.send("topic2", in.toUpperCase());
    this.jdbcTemplate.execute("insert into mytable (data) values ('" + in + "')");
}
Now the questions are:
Is the Kafka transaction in play?
Would the offset commit (for the message in) be skipped if the DB transaction is rolled back?
Do I need to manually acknowledge the offsets?
See the documentation.
For consumer-initiated transactions, the listener container must be configured with a KafkaTransactionManager so the offset can be sent to the transaction.
If you are not sending data to Kafka, it makes no sense to use a consumer-initiated transaction. Normal Spring transaction management applies, and if the commit fails, the normal container error handling will handle the exception (normally re-seek the record and retry).
If you are consuming and sending to Kafka then the container must start the Kafka transaction, not the producer; otherwise the send won't be rolled back.
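Putting that together, a sketch of the consume-and-send case (the containerFactory name is an assumption; that factory must have a KafkaTransactionManager set on its container properties, as discussed above):

@KafkaListener(id = "group1", topics = "topic1", containerFactory = "kafkaTxContainerFactory")
@Transactional("dstm")
public void listen1(String in) {
    // the container (configured with a KafkaTransactionManager) has already started the
    // Kafka transaction; @Transactional("dstm") starts the DB transaction inside it
    this.jdbcTemplate.update("insert into mytable (data) values (?)", in);
    this.kafkaTemplate.send("topic2", in.toUpperCase());
    // on normal exit the DB transaction commits first, then the container sends the
    // offset to the Kafka transaction and commits it; if the DB commit fails, the
    // Kafka transaction is rolled back and the record is redelivered
}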
Intro:
We're currently using Spring Integration Mail to receive and send emails, which works without flaws as long as there is no exception such as a connection error to the Exchange server or the database.
These mails come in as Messages and are passed to a handler method that parses the MimeMessage into a custom mail data object. As the last step, JPA saves those entities to our database.
Question/Problem:
There's a problem if the database is down or the mail can't be processed for any other reason, as the IntegrationFlow will still mark it as /SEEN once the message gets passed to the handler.
Setting shouldMarkMessagesAsRead(false) won't fix our problem, because we still want Spring to set the /SEEN flag when the mail is processed and saved correctly.
Searching for:
Would there be a possibility to set flags AFTER successfully saving the mail to the database?
We'd like to process the failed email again after the cause of the error is fixed, which won't work as long as Spring marks it as /SEEN regardless of the result.
Reference:
The message comes in and gets passed to the handler, which parses the mail and executes the CrudRepository save(mailDAO) method. The handleMimeMessage() is more or less just a mapper.
@Bean
fun imapIdleFlow(imapProperties: ImapProperties): IntegrationFlow {
    imapProperties.username.let { URLEncoder.encode(it, charset) }
    return IntegrationFlows
        .from(
            Mail.imapIdleAdapter(
                ImapMailReceiver("imap://${imapProperties.username}:${imapProperties.password}@${imapProperties.host}/Inbox")
                    .apply {
                        setSimpleContent(true)
                        setJavaMailProperties(imapProperties.properties.toProperties())
                    })
                .autoStartup(true)
                .shouldReconnectAutomatically(true)
        )
        .handle(this::handleMimeMessage)
        .get()
}
Is it even possible to mark the messages in the same flow afterwards, since you need to access the Exchange server a second time, or would I need a second flow to fetch and flag the same mail?
I think it is possible with something like transaction synchronization: https://docs.spring.io/spring-integration/reference/html/mail.html#mail-tx-sync
So, you set transactional(TransactionManager transactionManager) on that Mail.imapIdleAdapter to the JpaTransactionManager, to start a transaction from this IMAP idle channel adapter and propagate it to your handleMimeMessage(), where you do those JPA saves.
Plus you add:
/**
 * Configure a {@link TransactionSynchronizationFactory}. Usually used to synchronize
 * message deletion with some external transaction manager.
 * @param transactionSynchronizationFactory the transactionSynchronizationFactory.
 * @return the spec.
 */
public ImapIdleChannelAdapterSpec transactionSynchronizationFactory(
        TransactionSynchronizationFactory transactionSynchronizationFactory) {
To react to the commit and rollback of the mentioned transaction.
The DefaultTransactionSynchronizationFactory with some TransactionSynchronizationProcessor implementation can give you the desired behavior: you take a Message and its payload from the provided IntegrationResourceHolder and perform something like message.setFlag(Flag.SEEN, true); on the MimeMessage.
You may also consider using the ExpressionEvaluatingTransactionSynchronizationProcessor mentioned in the docs.
To avoid reopening the folder, you may consider using the public ImapIdleChannelAdapterSpec autoCloseFolder(boolean autoCloseFolder) option with false. You then need to close the folder in that TX synchronization implementation or in some other way.
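A rough Java sketch of such a processor (the bean wiring and the cast are assumptions; the exact payload type depends on how the receiver is configured):

@Bean
public TransactionSynchronizationFactory transactionSynchronizationFactory() {
    return new DefaultTransactionSynchronizationFactory(new TransactionSynchronizationProcessor() {

        @Override
        public void processBeforeCommit(IntegrationResourceHolder holder) {
        }

        @Override
        public void processAfterCommit(IntegrationResourceHolder holder) {
            // the JPA save committed: now mark the original mail as /SEEN
            MimeMessage mail = (MimeMessage) holder.getMessage().getPayload();
            try {
                mail.setFlag(Flags.Flag.SEEN, true);
            }
            catch (MessagingException e) {
                throw new IllegalStateException(e);
            }
        }

        @Override
        public void processAfterRollback(IntegrationResourceHolder holder) {
            // leave the mail unflagged so it is picked up again after the problem is fixed
        }
    });
}

This factory would then be passed to transactionSynchronizationFactory(...) on the Mail.imapIdleAdapter spec, together with autoCloseFolder(false), so the folder is still open when the flag is set.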
Our system receives messages to fetch data from a remote service and then store it in the database. Currently, it opens multiple connections to the database to save the fetched data for each request. We want to convert it into a process with multiple producers (fetching data from the remote service) and a single consumer that persists the data in the database. Doing this, it will hold at most one connection for persisting data to the database.
We are using Spring Boot with Reactor. We want to have a publisher publishing all the data fetched from the remote service, which we can subscribe to and push this data in batches of, say, 200 records to the database.
For example, I am planning to use the following code to consume messages from an ActiveMQ queue:
public Publisher<Message<RestoreMessage>> restoreMessagesSource() {
    return IntegrationFlows
        .from(Jms.messageDrivenChannelAdapter(this.connectionFactory)
            .destination(RestoreMessage.class.getSimpleName() + "Queue"))
        .channel(MessageChannels.queue())
        .log(LoggingHandler.Level.DEBUG)
        .log()
        .toReactivePublisher();
}
In this code, messages from the ActiveMQ queue are put into a reactive Publisher. This publisher is subscribed to, and this way we are consuming the messages from the queue.
In a similar fashion, we want the responses from all the remote API calls to be pushed to a publisher, which we can process in a single subscriber in one place.
Sounds like you are going to have several Publisher<Message<?>> and you want to consume them all in a single subscriber. For this reason you can use:
/**
 * Merge data from {@link Publisher} sequences contained in an array / vararg
 * into an interleaved merged sequence. Unlike {@link #concat(Publisher) concat},
 * sources are subscribed to eagerly.
 * <p>
 * <img class="marble" src="doc-files/marbles/mergeFixedSources.svg" alt="">
 * <p>
 * Note that merge is tailored to work with asynchronous sources or finite sources. When dealing with
 * an infinite source that doesn't already publish on a dedicated Scheduler, you must isolate that source
 * in its own Scheduler, as merge would otherwise attempt to drain it before subscribing to
 * another source.
 *
 * @param sources the array of {@link Publisher} sources to merge
 * @param <I> The source type of the data sequence
 *
 * @return a merged {@link Flux}
 */
@SafeVarargs
public static <I> Flux<I> merge(Publisher<? extends I>... sources) {
So, you are going to sink all your sources to one Flux and will subscribe to this one.
Pay attention to the Note. The .toReactivePublisher() indeed produces an infinite source, although, according to the Jms.messageDrivenChannelAdapter(), it runs on its own thread from an executor in the listener container. So, try it as is, or wrap each source in a Flux with a particular publishOn().
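For example, a rough sketch along those lines (the second source, the batch size of 200 from the question, and the repository call are assumptions; both sources are assumed to emit the same payload type):

Flux.merge(
        Flux.from(restoreMessagesSource()).publishOn(Schedulers.boundedElastic()),
        Flux.from(remoteApiResponsesSource()).publishOn(Schedulers.boundedElastic()))
    .map(Message::getPayload)
    .buffer(200)                                    // batches of up to 200 records
    .subscribe(batch -> repository.saveAll(batch)); // single subscriber, single DB connection path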
Is the ConsumerSeekAware interface's onPartitionsAssigned method called during rebalancing? I want to seek to a specific offset both when initializing and when rebalancing. Could I use ConsumerSeekAware for both purposes, or should I use ConsumerRebalanceListener for the rebalancing case? Please give simple answers because I don't have in-depth knowledge of Spring Kafka yet. If you could, please provide sample code. Thank you.
The ConsumerSeekAware has this method:
/**
 * When using group management, called when partition assignments change.
 * @param assignments the new assignments and their current offsets.
 * @param callback the callback to perform an initial seek after assignment.
 */
void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback);
It is called from KafkaMessageListenerContainer.seekPartitions(Collection<TopicPartition> partitions, boolean idle), which is, in turn, called from the internal ConsumerRebalanceListener.onPartitionsAssigned() implementation. And the last one has these JavaDocs:
* A callback method the user can implement to provide handling of customized offsets on completion of a successful
* partition re-assignment. This method will be called after an offset re-assignment completes and before the
* consumer starts fetching data.
So, yes, ConsumerSeekAware.onPartitionsAssigned() is always called during rebalancing. By the way, there is no such state in Apache Kafka as initializing; it is always a rebalance - the broker is in a waiting state and starts a rebalance whenever a new consumer joins.
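A small sketch of a listener that uses it (the topic and the offset lookup are placeholders; depending on your Spring Kafka version you may also need to override the other ConsumerSeekAware methods):

public class MyListener implements ConsumerSeekAware {

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // called on the initial assignment AND after every rebalance
        assignments.keySet().forEach(tp ->
                callback.seek(tp.topic(), tp.partition(), lookUpDesiredOffset(tp)));
    }

    @KafkaListener(topics = "topic1")
    public void listen(String in) {
        // ...
    }

    private long lookUpDesiredOffset(TopicPartition tp) {
        return 0L; // placeholder for your own offset store
    }
}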