I have a Spring Cloud Data Flow stream deployed in PCF using RabbitMQ as the binder. I have multiple processors in the pipeline. Occasionally I see issues wherein a partitioned consumer does not consume messages from Rabbit until the consumer is restarted. For instance, within my stream, I have a processor with partitioned input. The processor, foo, has 10 partitions. All partitions consume messages without issues 99% of the time. On rare occasions, one partition is not drained. When the instance listening to that partition is terminated and recreated, all works well again. Is there a mechanism to capture these issues? Will listening to ListenerContainerConsumerFailedEvent help in detecting such issues? Is there a preferred way to recover from them?
Sample stream definition is as follows:
Source | foo | bar | Sink
Deployment properties:
app.Source.spring.cloud.stream.bindings.output.producer.partition-key-expression=headers['PARTITION_KEY']
app.Source.spring.cloud.stream.bindings.output.producer.partition-count=10
app.foo.spring.cloud.stream.bindings.input.consumer.partitioned=true
app.foo.spring.cloud.stream.instanceCount=10
deployer.foo.count=10
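Regarding the event question: ListenerContainerConsumerFailedEvent can be picked up with a plain Spring @EventListener. A minimal sketch of what that could look like in each app (the event class is from Spring AMQP; what to do in the handler is left as an assumption):

import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class RabbitConsumerFailureListener {

    // Published by Spring AMQP listener containers when a consumer fails;
    // isFatal() indicates whether the container has given up restarting it.
    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        // e.g. log, emit a metric, or trigger an alert / instance restart here
        System.err.println("Rabbit consumer failed (fatal=" + event.isFatal() + "): " + event.getReason());
    }
}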
Related
We created a streams application in which we consume data and push it to a database. But it is creating a dummy topic to produce the data and throwing an error like "Not authorized to access topic".
Is there any configuration to restrict the streams app to consuming only?
We could have used a Consumer application, but due to performance considerations we switched to Kafka Streams.
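If the app should only consume, one possible approach with the Kafka Streams binder's functional style (just a sketch; the types, bean name, and database call are assumptions) is to declare a java.util.function.Consumer<KStream<...>> bean instead of a Function, so that no output binding is created:

import java.util.function.Consumer;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ConsumeOnlyTopology {

    // Consume-only Kafka Streams binding: records are read and written to the
    // database, and nothing is produced back to Kafka.
    @Bean
    public Consumer<KStream<String, String>> process() {
        return stream -> stream.foreach((key, value) -> saveToDatabase(key, value));
    }

    private void saveToDatabase(String key, String value) {
        // persistence logic (assumed)
    }
}

The input topic would then be bound via spring.cloud.stream.bindings.process-in-0.destination=<your-topic>.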
I have the following Spring Cloud Stream Kafka Streams Binder 3.x application:
When I run X messages through this application by publishing them to topic1 from an integration test using @SpringBootTest and @EmbeddedKafka, the counts of messages at points 1 and 2 are equal, as I expect.
When I do the same using the live application connected to the Kafka broker, the counts at point 1 and point 2 remain significantly different: Count1 >> Count2.
Kafka Tool shows a big lag for the Processor2 consumer on topic2, and that lag remains constant (it doesn't change after I stop publishing messages).
Processor2 consists of:
flatTransform stateful transformer
aggregator
other downstream steps
What could be the reason for the different behaviour between test and live mode, and why does the lag not go down in live mode?
I have thoroughly compared all application property values active in the test and in the live application; they are exactly equivalent.
There is only 1 partition in all topics in both cases.
In my case the reason was the default 7-day retention setting of the topics that were automatically created by the Spring Cloud Stream application.
The messages in my input stream span 8 years, and I am using a custom TimestampExtractor.
After I manually configured the topics with a large retention time, the issue was solved:
/usr/bin/kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name topic2 --add-config retention.ms=315360000000
(retention.ms is the topic-level retention property; 315360000000 ms is roughly 87600 hours, i.e. 10 years.)
Or set log.retention.hours for the entire Kafka broker.
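For the broker-wide route, that is a setting in the broker's server.properties, e.g. (value matches the 87600 hours above):

log.retention.hours=87600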
We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer, often several days after they were first consumed and processed (successfully).
While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer is interacting with external APIs, it is a bit difficult to make it idempotent.
Any hints as to what might be the cause of the duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (version of the other services might be a bit different).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
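For example, keeping committed offsets for 30 days would be a broker-level setting in server.properties (value in minutes):

offsets.retention.minutes=43200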
I want to implement a Kafka producer with Spring that observes a cloud storage and emits meta information about newly arrived files.
Until now we did that with a Kafka connector, but for several reasons we now have to do this with a plain Kafka producer.
Now I need to persist the state of the producer (e.g. the timestamp of the last committed file) in a kind of offset topic, like the connector did, but I have not found a reasonable approach to do that.
My current idea is to hold the state by committing it to a topic that the producer also consumes, and to acknowledge the last consumed state only when committing a new one. That way, if the Kubernetes pod of the producer dies and comes back up, it consumes the last (unacknowledged) state and knows where it stopped.
But this idea seems a bit complex just to hold the state of a Kafka app. Is there a better approach for that?
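To make the idea more concrete, here is a minimal sketch of what I have in mind (topic name, key, and String serialization are just placeholders; the state topic would be compacted and single-partition, and bootstrap.servers etc. are assumed to be set in the Properties passed in):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProducerStateStore {

    private static final String STATE_TOPIC = "file-producer-state"; // assumed compacted, single-partition topic
    private final TopicPartition statePartition = new TopicPartition(STATE_TOPIC, 0);

    // On startup: read the latest record from the state topic
    // (e.g. the timestamp of the last committed file), or null if none exists yet.
    public String loadLastState(Properties consumerProps) {
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.assign(Collections.singletonList(statePartition));
            long end = consumer.endOffsets(Collections.singletonList(statePartition)).get(statePartition);
            if (end == 0) {
                return null; // no state written yet
            }
            consumer.seek(statePartition, end - 1); // position on the last record
            String lastState = null;
            while (lastState == null) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    lastState = record.value();
                }
            }
            return lastState;
        }
    }

    // After each successfully processed file: write the new state.
    public void saveState(KafkaProducer<String, String> producer, String newState) {
        producer.send(new ProducerRecord<>(STATE_TOPIC, "state", newState)); // single key, so compaction keeps only the latest
        producer.flush();
    }
}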
I have ZooKeeper and 3 Kafka brokers running locally.
I started one producer and one consumer. I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it is a Spring Boot project), but what I found is that all the consumers are now consuming (receiving) the messages. I expected the messages to be load-balanced across the consumers, i.e. not repeated on each of them. I don't know what the problem is.
Here is my property file:
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
consumer code : https://github.com/codecentric/edmp-sample-stream-sink
producer code : https://github.com/codecentric/edmp-sample-stream-source
Can you please update your dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE, the Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we are using the Kafka 0.9+ consumer client, which does load balancing the way you are expecting (we also support static partition allocation via instanceIndex/instanceCount, but I suspect this is not what you want). I can go into more detail on how to configure this with Brixton, but I guess an upgrade would be a much easier path.
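For reference, the static allocation mentioned above is driven by these Spring Cloud Stream properties (values shown are illustrative; each instance gets its own index):

spring.cloud.stream.instanceCount=3
spring.cloud.stream.instanceIndex=0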