Does the ConsumerSeekAware interface's onPartitionsAssigned method get called when rebalancing? I want to seek to specific offsets (which I have stored) both when initializing and when rebalancing. Could I use ConsumerSeekAware for both purposes, or should I use a ConsumerRebalanceListener for the rebalancing part? Please give simple answers, because I don't have deep knowledge of Spring Kafka yet. If you could, please provide sample code. Thank you.
The ConsumerSeekAware interface has this method:
/**
 * When using group management, called when partition assignments change.
 * @param assignments the new assignments and their current offsets.
 * @param callback the callback to perform an initial seek after assignment.
 */
void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback);
It is called from KafkaMessageListenerContainer.seekPartitions(Collection<TopicPartition> partitions, boolean idle), which, in turn, is called from the internal ConsumerRebalanceListener.onPartitionsAssigned() implementation. And the latter has this JavaDoc:
* A callback method the user can implement to provide handling of customized offsets on completion of a successful
* partition re-assignment. This method will be called after an offset re-assignment completes and before the
* consumer starts fetching data.
So, yes, ConsumerSeekAware.onPartitionsAssigned() is always called during rebalancing. By the way, there is no such state in Apache Kafka as "initializing": it is always rebalancing; the broker waits and starts rebalancing whenever a new consumer joins.
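Since you asked for sample code, here is a minimal sketch of a listener implementing ConsumerSeekAware that seeks to offsets you have stored yourself. It assumes a recent Spring Kafka version where the other ConsumerSeekAware methods have default implementations; the savedOffsets map and the topic name are placeholders, not your actual setup:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.ConsumerSeekAware;
import org.springframework.stereotype.Component;

@Component
public class SeekingListener implements ConsumerSeekAware {

    // Hypothetical store of the offsets you want to resume from.
    private final Map<TopicPartition, Long> savedOffsets = new ConcurrentHashMap<>();

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // Runs on the initial assignment after startup and on every later rebalance.
        assignments.forEach((tp, currentOffset) ->
                callback.seek(tp.topic(), tp.partition(),
                        savedOffsets.getOrDefault(tp, currentOffset)));
    }

    @KafkaListener(topics = "my-topic")
    public void listen(String value) {
        // process the record
    }
}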
So I have an empty map referenced like:
private var labelsForGroupId: Map<GroupId, Label> = emptyMap()
to reduce the number of calls through the network API. After the first call I cache the response in the map.
However, I would love to add a TTL to that map (for example, every hour it should be emptied again). I am quite new to Kotlin, so I am wondering what the best approach would be here, with some examples?
Instead of using a Map, you could use a Guava Cache. It works like a Map (key-value) and has expiration policies.
Expiration by time example:
CacheBuilder.newBuilder()
.expireAfterAccess(200, TimeUnit.MILLISECONDS)
.build(loader);
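For the one-hour TTL you mentioned, a minimal sketch (shown in Java; the same Guava calls work unchanged from Kotlin, and GroupId, Label and fetchLabel() stand in for your own types and network call):

import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Entries are evicted one hour after they were written, so at most one
// network call per GroupId per hour.
LoadingCache<GroupId, Label> labelsForGroupId = CacheBuilder.newBuilder()
        .expireAfterWrite(1, TimeUnit.HOURS)
        .build(new CacheLoader<GroupId, Label>() {
            @Override
            public Label load(GroupId groupId) {
                return fetchLabel(groupId); // hypothetical remote call
            }
        });

// getUnchecked() returns the cached value, or loads and caches it on a miss.
Label label = labelsForGroupId.getUnchecked(someGroupId);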
If you are not interested in caches at all, then you could try to set up a coroutine with a ScheduledExecutorService as its Dispatcher. I have never done this before, but it is a way out. Take a look at the Executors documentation - Coroutine context and dispatchers:
If the given [ExecutorService] is an instance of [ScheduledExecutorService], then all time-related coroutine operations such as [delay], [withTimeout] and time-based [Flow] operators will be scheduled on this executor using the [schedule][ScheduledExecutorService.schedule] method. If the corresponding coroutine is cancelled, [ScheduledFuture.cancel] will be invoked on the corresponding future.
Case
Clients are ReplyingKafkaTemplate instances.
Server is a ConcurrentMessageListenerContainer created using @KafkaListener and @SendTo annotations on a method.
ContainerFactory uses ContainerStoppingErrorHandler.
Request topic has only 1 partition.
Group ids are static, e.g. test-consumer-group.
Requests are sent with timeouts.
Due to an exception being thrown, the server goes down, but the client keeps dispatching requests, which queue up on the request topic.
Current Behavior
When the server comes back up it continues processing old requests which would have timed out.
Desired Behavior
Instead, it would be better to continue from the latest message, skipping even unprocessed messages, since the corresponding requests would have timed out and been retried.
Questions
What is the recommended approach to achieve this?
From the little that I understand, it looks like I'll have to manually set the initial offset. What's the simplest way to implement this?
Your @KafkaListener class must extend AbstractConsumerSeekAware and do something like this:
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
    super.onPartitionsAssigned(assignments, callback);
    callback.seekToEnd(assignments.keySet());
}
So, every time your consumer joins the group, it will seek all the assigned partitions to the end, skipping all the old records.
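Put together with the listener method, a minimal sketch of the server side might look like this (the topic name, group id and processing logic are assumptions, not taken from your configuration):

import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.SendTo;
import org.springframework.kafka.listener.AbstractConsumerSeekAware;
import org.springframework.stereotype.Component;

@Component
public class SkippingServer extends AbstractConsumerSeekAware {

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        super.onPartitionsAssigned(assignments, callback);
        // Skip everything that queued up while the server was down.
        callback.seekToEnd(assignments.keySet());
    }

    @KafkaListener(topics = "requests", groupId = "test-consumer-group")
    @SendTo // the reply goes to the reply-topic header set by ReplyingKafkaTemplate
    public String handle(String request) {
        // hypothetical business logic
        return request.toUpperCase();
    }
}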
Our system receives messages to fetch data from a remote service and then store it in the database. Currently, it opens multiple connections to the database to save the fetched data for each request. We want to convert it into a process with multiple producers (fetching data from the remote service) and a single consumer persisting data in the database. This way it will hold at most one connection for persisting data in the database.
We are using Spring Boot with Reactor. We want to have a publisher publishing all the data fetched from the remote service, which we can subscribe to and push the data to the database in batches of, say, 200 records.
For example, I am planning to use the following code to consume messages from an ActiveMQ queue:
public Publisher<Message<RestoreMessage>> restoreMessagesSource() {
    return IntegrationFlows
            .from(Jms.messageDrivenChannelAdapter(this.connectionFactory)
                    .destination(RestoreMessage.class.getSimpleName() + "Queue"))
            .channel(MessageChannels.queue())
            .log(LoggingHandler.Level.DEBUG)
            .log()
            .toReactivePublisher();
}
In this code, messages from the ActiveMQ queue are put into a ReactivePublisher. This publisher has been subscribed to. This way we are consuming the messages from the queue.
In a similar fashion, we want the responses of all the remote API calls to be pushed to a publisher which we can process in a subscriber in one place.
Sounds like you are going to have several Publisher<Message<?>> and you want to consume them all in a single subscriber. For this reason you can use:
/**
 * Merge data from {@link Publisher} sequences contained in an array / vararg
 * into an interleaved merged sequence. Unlike {@link #concat(Publisher) concat},
 * sources are subscribed to eagerly.
 * <p>
 * <img class="marble" src="doc-files/marbles/mergeFixedSources.svg" alt="">
 * <p>
 * Note that merge is tailored to work with asynchronous sources or finite sources. When dealing with
 * an infinite source that doesn't already publish on a dedicated Scheduler, you must isolate that source
 * in its own Scheduler, as merge would otherwise attempt to drain it before subscribing to
 * another source.
 *
 * @param sources the array of {@link Publisher} sources to merge
 * @param <I> The source type of the data sequence
 *
 * @return a merged {@link Flux}
 */
@SafeVarargs
public static <I> Flux<I> merge(Publisher<? extends I>... sources) {
So, you are going to sink all your sources into one Flux and subscribe to that one.
Pay attention to the Note. The .toReactivePublisher() indeed produces an infinite source, although, according to the Jms.messageDrivenChannelAdapter(), the work is done on its own thread from an executor in the listener container. So, try it as is, or wrap each source in a Flux with a particular publishOn().
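For completeness, a minimal sketch of sinking several such publishers into one Flux and persisting in batches of 200, as in the question (source1..source3, the Data type and repository are placeholders, and Schedulers.boundedElastic() assumes Reactor 3.3 or later):

import org.reactivestreams.Publisher;
import org.springframework.messaging.Message;
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

// Give each source its own scheduler so merge() can interleave them safely.
Flux<Message<Data>> merged = Flux.merge(
        Flux.from(source1).publishOn(Schedulers.boundedElastic()),
        Flux.from(source2).publishOn(Schedulers.boundedElastic()),
        Flux.from(source3).publishOn(Schedulers.boundedElastic()));

merged.map(Message::getPayload)
        .buffer(200) // collect up to 200 records per batch
        .subscribe(batch -> repository.saveAll(batch)); // the single database consumer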
We are using Spring WebClient for calling web services.
However, I don't know how to create/manage a connection pool in Spring WebClient.
I got to know that we have to use ReactorClientHttpConnector, but I just can't find any sample code.
Basically, I want to have a WebClient pool with maxTotal, maxWaitMillis, etc.
Spring WebClient is a non-blocking IO HTTP client, and ReactorClientHttpConnector is a Reactor Netty based implementation of it. That said, I suggest not worrying about the connection pool and focusing instead on a completely non-blocking service call chain. The key to success with this kind of technology is exactly that: the model does not involve a thread per request; it is like browser or Node.js development, where if something blocks your code, you block everything. I know this is unusual, but the underlying event-loop model forces you to think about a completely different model.
I can reassure you that a Netty-based implementation typically has a number of event loops equal to the number of your cores. It is configurable, of course, but I think that is enough: the power of reactive, non-blocking IO programming is in embracing non-blocking IO in every piece of your code. Adding more event loops per processor only buys you some concurrency, while having one event loop per processor lets you use your processors fully in parallel.
I hope this reflection helps you.
TIP: for a timeout on your HTTP service call, you can add a timeout on your Mono, as in the test below:
@Test
@WithMockUser(username = "user")
fun `read basic personal details data`() {
    personalDetailsRepository.save("RESUME_ID", TestCase.personalDetails()).toMono().block()
    val expectedJson = TestCase.readFileAsString("personal-details.json")
    webClient.get()
        .uri("/resume/RESUME_ID/personal-details")
        .accept(MediaType.APPLICATION_JSON)
        .exchange().toMono().timeout(Duration.ofMinutes(1))
}
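Outside a test, the same idea on a plain WebClient call looks like this (a minimal sketch; the base URL and path are placeholders):

import java.time.Duration;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

WebClient client = WebClient.create("https://example.org"); // hypothetical base URL

// Nothing blocks while waiting for the remote service; the Mono completes
// (or fails after 5 seconds) and is handed back to the framework to subscribe.
Mono<String> body = client.get()
        .uri("/resource")
        .retrieve()
        .bodyToMono(String.class)
        .timeout(Duration.ofSeconds(5));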
Update
Considering the request to restrict the WebClient at the application level: it is of course possible to use the backpressure feature in order to deal with a data stream that may at times be too large to be processed reliably. In the case of a stream response like a Flux, the Flux.limitRate() operator can be useful. Taking the official documentation:
/**
 * Ensure that backpressure signals from downstream subscribers are split into batches
 * capped at the provided {@code prefetchRate} when propagated upstream, effectively
 * rate limiting the upstream {@link Publisher}.
 * <p>
 * Note that this is an upper bound, and that this operator uses a prefetch-and-replenish
 * strategy, requesting a replenishing amount when 75% of the prefetch amount has been
 * emitted.
 * <p>
 * Typically used for scenarios where consumer(s) request a large amount of data
 * (eg. {@code Long.MAX_VALUE}) but the data source behaves better or can be optimized
 * with smaller requests (eg. database paging, etc...). All data is still processed,
 * unlike with {@link #limitRequest(long)} which will cap the grand total request
 * amount.
 * <p>
 * Equivalent to {@code flux.publishOn(Schedulers.immediate(), prefetchRate).subscribe() }.
 * Note that the {@code prefetchRate} is an upper bound, and that this operator uses a
 * prefetch-and-replenish strategy, requesting a replenishing amount when 75% of the
 * prefetch amount has been emitted.
 *
 * @param prefetchRate the limit to apply to downstream's backpressure
 *
 * @return a {@link Flux} limiting downstream's backpressure
 * @see #publishOn(Scheduler, int)
 * @see #limitRequest(long)
 */
public final Flux<T> limitRate(int prefetchRate) {
    return onAssembly(this.publishOn(Schedulers.immediate(), prefetchRate));
}
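A minimal usage sketch (the numbers are only illustrative):

import reactor.core.publisher.Flux;

// No matter how much the subscriber requests, at most 100 elements are
// prefetched from upstream at a time (replenished when 75% are consumed).
Flux.range(1, 1_000)
    .limitRate(100)
    .subscribe(System.out::println);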
That said, I suggest using these features and not trying to limit the consumption of data in a forced way such as a connection limit. One of the strengths of reactive programming and non-blocking IO is its incredible efficiency in using resources, and limiting resource usage in that way goes against the spirit of the paradigm.
I am new to Spring and Kafka. I have a use case to consume from a Kafka topic and produce to another topic using a transactional producer (messages should be processed only once). I saw the discussion on this thread (https://github.com/spring-projects/spring-kafka/issues/645) but implemented it a little differently.
I set a manual ack mode in the listener container factory and then did an acknowledgment after sending to the producer using KafkaTemplate.executeInTransaction() (async send). Does that achieve the same result as the approach in that issue? Since the send is asynchronous, I am not sure it will serve the purpose.
Also, in the above example on issue 645, when does the actual commit to the Kafka broker happen (i.e., when do consumers see the data)? Does it happen on a commit interval or record by record? I am trying to understand whether the actual commit happens on a time interval, for every record, or is configurable.
If you are using transactions you should not commit offsets via the consumer; instead, you should send the offsets to the transaction using the producer.
If properly configured, the listener container will do it automatically for you when the listener exits. See the documentation.
By configuring the listener container with a KafkaTransactionManager, the container starts the transaction. Any sends on a transactional KafkaTemplate will participate in the transaction and the container will send the offsets to the transaction (and commit the transaction) when the listener exits normally.
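A minimal sketch of such a listener, assuming the producer factory has a transactionIdPrefix, the container factory is configured with a KafkaTransactionManager, and the topic names are placeholders:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class TransactionalRelay {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public TransactionalRelay(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "input-topic", groupId = "relay")
    public void listen(String value) {
        // This send joins the transaction started by the container; the container
        // sends the consumed offset to the same transaction and commits it when
        // this method returns normally, so the consume and produce commit atomically.
        kafkaTemplate.send("output-topic", value.toUpperCase());
    }
}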
See the Javadocs for executeInTransaction...
/**
 * Execute some arbitrary operation(s) on the operations and return the result.
 * The operations are invoked within a local transaction and do not participate
 * in a global transaction (if present).
 * @param callback the callback.
 * @param <T> the result type.
 * @return the result.
 * @since 1.1
 */
<T> T executeInTransaction(OperationsCallback<K, V, T> callback);
Such operations will not participate in the container's transaction.
Coming to consumer offset commits, there are two ways. One is enabling auto commit:
enable.auto.commit=true
auto.commit.interval.ms // configures the time between commits
The other way is to commit the offset manually through an Acknowledgment:
@KafkaListener(topics = "${kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory", groupId = "${kafka.consumer.groupId}")
public void taskListner(Task task, Acknowledgment ack) {
    //System.out.println(task.toString());
    log.info(task.toString());
    ack.acknowledge();
}
The auto-commit check runs on every poll; if the time elapsed since the last commit is greater than the configured interval, the consumer commits the offsets.
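For reference, a minimal sketch of the two consumer properties involved (the values are illustrative; 5000 ms is the kafka-clients default interval):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Map<String, Object> props = new HashMap<>();
// Enable periodic auto commit...
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
// ...and commit at most once every 5 seconds.
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5000);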