I have a Spring Cloud Data Flow stream deployed in PCF using RabbitMQ as the binder. I have multiple processors in the pipeline. Occasionally I see issues wherein a partitioned consumer does not consume messages from Rabbit until the consumer is restarted. For instance, within my stream, I have a processor with partitioned input. The processor, foo, has 10 partitions. All partitions consume messages without issues 99% of the time. On rare occasions, one partition is not drained. When the instance listening to that partition is terminated and recreated, all works well again. Is there a mechanism to capture these issues? Will listening to ListenerContainerConsumerFailedEvent help in detecting such issues? Is there a preferred way to recover from them?
Sample stream definition is as follows:
Source | foo | bar | Sink
Deployment properties:
app.Source.spring.cloud.stream.bindings.output.producer.partition-key-expression=headers['PARTITION_KEY']
app.Source.spring.cloud.stream.bindings.output.producer.partition-count=10
app.foo.spring.cloud.stream.bindings.input.consumer.partitioned=true
app.foo.spring.cloud.stream.instanceCount=10
deployer.foo.count=10
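Regarding the event question: ListenerContainerConsumerFailedEvent can be picked up with a plain Spring @EventListener. A minimal sketch of what that could look like in each app (the event class is from Spring AMQP; what to do in the handler is left as an assumption):

import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class RabbitConsumerFailureListener {

    // Published by Spring AMQP listener containers when a consumer fails;
    // isFatal() indicates whether the container has given up restarting it.
    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        // e.g. log, emit a metric, or trigger an alert / instance restart here
        System.err.println("Rabbit consumer failed (fatal=" + event.isFatal() + "): " + event.getReason());
    }
}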
Related
We created a streams application in which we consume data and push it to a database. But it is creating a dummy topic to produce the data and throwing an error like "Not authorized to access topic".
Is there any configuration to restrict the streams app to consuming only?
We could have used a Consumer application, but due to performance considerations we switched to Kafka Streams.
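If the app should only consume, one possible approach with the Kafka Streams binder's functional style (just a sketch; the types, bean name, and database call are assumptions) is to declare a java.util.function.Consumer<KStream<...>> bean instead of a Function, so that no output binding is created:

import java.util.function.Consumer;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ConsumeOnlyTopology {

    // Consume-only Kafka Streams binding: records are read and written to the
    // database, and nothing is produced back to Kafka.
    @Bean
    public Consumer<KStream<String, String>> process() {
        return stream -> stream.foreach((key, value) -> saveToDatabase(key, value));
    }

    private void saveToDatabase(String key, String value) {
        // persistence logic (assumed)
    }
}

The input topic would then be bound via spring.cloud.stream.bindings.process-in-0.destination=<your-topic>.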
I have the following Spring Cloud Stream Kafka Streams Binder 3.x application:
When I run X messages through this application by publishing them to topic1 from an integration test using @SpringBootTest and @EmbeddedKafka, the counts of messages at points 1 and 2 are equal, as I expect.
When I do the same using the live application connected to the Kafka broker, the counts at point 1 and point 2 remain significantly different: Count1 >> Count2.
Kafka Tool shows a big lag for the Processor2 consumer on topic2, and that lag remains constant (it doesn't change after I stop publishing messages).
Processor2 consists of:
flatTransform stateful transformer
aggregator
other downstream steps
What could be the reason for the different behaviour between test and live mode, and why does the lag not go down in live mode?
I have thoroughly compared all application property values active in the test and in the live application; they are exactly equivalent.
There is only 1 partition in all topics in both cases.
In my case the reason was the default 7-day retention setting of the topics that were automatically created by the Spring Cloud Stream application.
The messages in my input stream span 8 years, and I am using a custom TimestampExtractor.
After I manually configured the topics with a large retention time, the issue was solved:
/usr/bin/kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name topic2 --add-config retention.ms=315360000000
(retention.ms is the topic-level retention property; 315360000000 ms is roughly 87600 hours, i.e. 10 years.)
Or set log.retention.hours for the entire Kafka broker.
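For the broker-wide route, that is a setting in the broker's server.properties, e.g. (value matches the 87600 hours above):

log.retention.hours=87600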
We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer, often several days after they were first consumed and processed (successfully).
While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer is interacting with external APIs, it is a bit difficult to make it idempotent.
Any hints as to what might be the cause of the duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (version of the other services might be a bit different).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
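For example, keeping committed offsets for 30 days would be a broker-level setting in server.properties (value in minutes):

offsets.retention.minutes=43200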
I want to implement a Kafka producer with Spring that observes a cloud storage and emits meta information about newly arrived files.
Until now we did that with a Kafka connector, but for several reasons we now have to do this with a plain Kafka producer.
Now I need to persist the state of the producer (e.g. the timestamp of the last committed file) in a kind of offset topic, like the connector did, but I have not found a reasonable approach to do that.
My current idea is to hold the state by committing it to a topic that the producer also consumes, and to acknowledge the last consumed state only when committing a new one. That way, if the Kubernetes pod of the producer dies and comes back up, it consumes the last (unacknowledged) state and knows where it stopped.
But this idea seems a bit complex just to hold the state of a Kafka app. Is there a better approach for that?
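To make the idea more concrete, here is a minimal sketch of what I have in mind (topic name, key, and String serialization are just placeholders; the state topic would be compacted and single-partition, and bootstrap.servers etc. are assumed to be set in the Properties passed in):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProducerStateStore {

    private static final String STATE_TOPIC = "file-producer-state"; // assumed compacted, single-partition topic
    private final TopicPartition statePartition = new TopicPartition(STATE_TOPIC, 0);

    // On startup: read the latest record from the state topic
    // (e.g. the timestamp of the last committed file), or null if none exists yet.
    public String loadLastState(Properties consumerProps) {
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.assign(Collections.singletonList(statePartition));
            long end = consumer.endOffsets(Collections.singletonList(statePartition)).get(statePartition);
            if (end == 0) {
                return null; // no state written yet
            }
            consumer.seek(statePartition, end - 1); // position on the last record
            String lastState = null;
            while (lastState == null) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    lastState = record.value();
                }
            }
            return lastState;
        }
    }

    // After each successfully processed file: write the new state.
    public void saveState(KafkaProducer<String, String> producer, String newState) {
        producer.send(new ProducerRecord<>(STATE_TOPIC, "state", newState)); // single key, so compaction keeps only the latest
        producer.flush();
    }
}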
I have ZooKeeper and 3 Kafka brokers running locally.
I started one producer and one consumer. I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it is a Spring Boot project), but what I found is that all the consumers are now consuming (receiving) the messages. I expected the messages to be load-balanced across the consumers, i.e. not repeated on each of them. I don't know what the problem is.
Here is my property file:
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
consumer code : https://github.com/codecentric/edmp-sample-stream-sink
producer code : https://github.com/codecentric/edmp-sample-stream-source
Can you please update your dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE, the Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we are using the Kafka 0.9+ consumer client, which does load balancing the way you are expecting (we also support static partition allocation via instanceIndex/instanceCount, but I suspect this is not what you want). I can go into more detail on how to configure this with Brixton, but I guess an upgrade would be a much easier path.
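For reference, the static allocation mentioned above is driven by these Spring Cloud Stream properties (values shown are illustrative; each instance gets its own index):

spring.cloud.stream.instanceCount=3
spring.cloud.stream.instanceIndex=0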