RocketMQ Consumer request offset is much bigger than max offset in broker and Consumer Diff is negative

RocketMQ version: 3.2.6
Our cluster:
2 NameServer
6 Master Broker
6 Slave Broker
We have a lot of consumers (about 100) consuming messages from the brokers.
We use the following command to monitor the consume diff:
/data/alibaba-rocketmq/bin/mqadmin consumerProgress -n XXX:XX
The diff is negative (e.g. -898232391123, -8323231872) on only one broker; the other brokers are healthy.
The broker warn log contains a lot of entries like these:
a lot of errors: Consumer request offset is much bigger than max offset
a lot of errors: connection reset frequently

Finally, I found the answer on GitHub:
https://github.com/alibaba/RocketMQ/releases
It's a bug and it was fixed in version 3.4.6.
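For reference, the diff that consumerProgress reports is essentially brokerOffset minus consumerOffset per queue, so a negative value means the committed consumer offset has run past the broker's max offset. A minimal sketch of pulling the same numbers through the admin API (the package names assume the com.alibaba.rocketmq tooling of the 3.x era, and the consumer group is a placeholder):

import java.util.Map;
import com.alibaba.rocketmq.common.admin.ConsumeStats;
import com.alibaba.rocketmq.common.admin.OffsetWrapper;
import com.alibaba.rocketmq.common.message.MessageQueue;
import com.alibaba.rocketmq.tools.admin.DefaultMQAdminExt;

public class ConsumeDiffCheck {
    public static void main(String[] args) throws Exception {
        DefaultMQAdminExt admin = new DefaultMQAdminExt();
        admin.setNamesrvAddr("XXX:XX");   // same name server address as in the command above
        admin.start();
        try {
            // Placeholder consumer group; use the group reported by consumerProgress.
            ConsumeStats stats = admin.examineConsumeStats("your_consumer_group");
            for (Map.Entry<MessageQueue, OffsetWrapper> e : stats.getOffsetTable().entrySet()) {
                long diff = e.getValue().getBrokerOffset() - e.getValue().getConsumerOffset();
                // A negative diff means the consumer offset is ahead of the broker's max offset.
                System.out.printf("%s queue %d diff=%d%n",
                        e.getKey().getBrokerName(), e.getKey().getQueueId(), diff);
            }
        } finally {
            admin.shutdown();
        }
    }
}

Per-queue negative diffs point at the broker whose queues triggered the "Consumer request offset is much bigger than max offset" warnings above.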

Related

Spring cloud stream - Kafka consumer consuming duplicate messages with StreamListener

With our Spring Boot app, we notice the Kafka consumer consuming a message twice, randomly and only once in a while, and only in the prod environment. We have 6 instances with 6 partitions deployed in PCF. We have caught messages with the same offset and partition received twice on the same topic, which causes duplicates and is business critical for us.
We haven't noticed this in non-production environments and it is hard to reproduce there. We have recently switched to Kafka and we are not able to find the root cause.
We are using spring-cloud-stream / spring-cloud-stream-binder-kafka 2.1.2
Here is the Config:
spring:
  cloud:
    stream:
      default.consumer.concurrency: 1
      default-binder: kafka
      bindings:
        channel:
          destination: topic
          content_type: application/json
          autoCreateTopics: false
          group: group
          consumer:
            maxAttempts: 1
      kafka:
        binder:
          autoCreateTopics: false
          autoAddPartitions: false
          brokers: brokers list
        bindings:
          channel:
            consumer:
              autoCommitOnError: true
              autoCommitOffset: true
              configuration:
                max.poll.interval.ms: 1000000
                max.poll.records: 1
                group.id: group
We use @StreamListener to consume the messages.
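For context, a minimal sketch of what such a listener typically looks like with the annotation-based model; only the binding name "channel" comes from the YAML above, while the binding interface and the String payload type are assumptions:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.SubscribableChannel;

@EnableBinding(EventListener.Channels.class)
public class EventListener {

    // Binding interface matching the "channel" binding configured above.
    public interface Channels {
        @Input("channel")
        SubscribableChannel channel();
    }

    @StreamListener("channel")
    public void handle(String payload) {   // payload type is an assumption
        // ... process the message ...
    }
}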
Here is an instance where we received a duplicate, along with the error message captured in the server logs.
ERROR 46 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-3, groupId=group] Offset commit failed on partition topic-0 at offset 1291358: The coordinator is not aware of this member.
ERROR 46 --- [container-0-C-1] o.s.kafka.listener.LoggingErrorHandler : Error while processing: null
OUT org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:871) ~[kafka-clients-2.0.1.jar!/:na]
There is no crash and all the instances are healthy at the time of the duplicate. There is also some confusion with the error log "Error while processing: null", since the message was successfully processed, twice. And max.poll.interval.ms is 1000000, which is about 16 minutes and is supposed to be enough time to process any message in this system; the session timeout and heartbeat configs are at their defaults. The duplicate is received within 2 seconds in most of the instances.
Are there any configs that we are missing? Any suggestion/help is highly appreciated.
Commit cannot be completed since the group has already rebalanced
A rebalance occurred because your listener took too long; you should adjust max.poll.records and max.poll.interval.ms to make sure you can always handle the records received within the time limit.
In any case, Kafka does not guarantee exactly once delivery, only at least once delivery. You need to add idempotency to your application and detect/ignore duplicates.
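A minimal per-instance sketch of such duplicate detection (the business key and the bounded in-memory cache are assumptions; with 6 instances, a shared store such as a database unique constraint or Redis is usually the better fit):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Remembers recently seen business keys so a redelivered message can be skipped.
public class DuplicateDetector {

    private final Set<String> seen;

    public DuplicateDetector(int maxEntries) {
        // LRU-style bound so the set cannot grow without limit.
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        });
    }

    /** Returns true the first time a key is seen, false for duplicates. */
    public synchronized boolean firstTime(String businessKey) {
        return seen.add(businessKey);
    }
}

The listener would call firstTime(key) before processing and skip the message when it returns false.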
Also, keep in mind that StreamListener and the annotation-based programming model have been deprecated for 3+ years and have been removed from the current main branch, which means the next release will not have them. Please migrate your solution to the functional programming model.
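For reference, a minimal sketch of the functional model the answer refers to (the function name processEvent and its binding name are assumptions):

import java.util.function.Consumer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FunctionalConsumerConfig {

    // Bound to "processEvent-in-0"; select it with spring.cloud.function.definition=processEvent
    // and point it at the topic with spring.cloud.stream.bindings.processEvent-in-0.destination=topic
    @Bean
    public Consumer<String> processEvent() {
        return payload -> {
            // ... process the message ...
        };
    }
}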

Messages are dropping because too many are queued in AlertManager

I have a single-instance cluster for AlertManager and I see this warning in the AlertManager container:
level=warn ts=2021-11-03T08:50:44.528Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4125 limit=4096
Alert Manager Version information:
Version Information
Branch: HEAD
BuildDate: 20190708-14:31:49
BuildUser: root@868685ed3ed0
GoVersion: go1.12.6
Revision: 1ace0f76b7101cccc149d7298022df36039858ca
Version: 0.18.0
AlertManager metrics
# HELP alertmanager_cluster_members Number indicating current number of members in cluster.
# TYPE alertmanager_cluster_members gauge
alertmanager_cluster_members 1
# HELP alertmanager_cluster_messages_pruned_total Total number of cluster messages pruned.
# TYPE alertmanager_cluster_messages_pruned_total counter
alertmanager_cluster_messages_pruned_total 23020
# HELP alertmanager_cluster_messages_queued Number of cluster messages which are queued.
# TYPE alertmanager_cluster_messages_queued gauge
alertmanager_cluster_messages_queued 4125
How do we see those queued messages in AlertManager?
Do we lose alerts when messages are dropped because too many are queued?
Why are messages queued even though there is logic to prune messages at a regular interval, i.e. every 15 minutes?
Do we lose alerts when AlertManager prunes messages at that regular interval?
I am new to alerting. Could you please answer the above questions?

ActiveMQ warning: Frame size of 1 GB larger than max allowed 100 MB

I'm trying to switch from a legacy JMS broker to ActiveMQ.
One thing I cannot figure out is a warning in the logs once per hour:
WARN | Transport Connection to: tcp://127.0.0.1:38542 failed: java.io.IOException:
Frame size of 1 GB larger than max allowed 100 MB | ...
It's obviously some scheduled job in ActiveMQ that outputs this warning,
because it comes at the same minute every hour,
regardless of whether any messages are sent or not.
But what exactly does "Frame size" mean here?
We are not sending any JMS messages larger than a few kilobytes or so...
I read that you can increase this "maxFrameSize" on the connector, but that doesn't help either.
When I try to set it to 1 GB (1073741824) or higher:
<transportConnector name="openwire"
uri="tcp://0.0.0.0:61616?maximumConnections=100&amp;wireFormat.maxFrameSize=1073741824"/>
I still see the (now absurd) warning message:
WARN | Transport Connection to: tcp://127.0.0.1:42256 failed: java.io.IOException:
Frame size of 1 GB larger than max allowed 1 GB
What is ActiveMQ actually complaining about?
And how can I fix it?
ActiveMQ 5 would only log this message if someone was sending your broker a message that is encoded to a size larger than the configured limit. Since it happens on a regular interval, I'd look for some external resource that is doing something silly, like trying to telnet into the broker's OpenWire port to check liveness and sending some garbage string or some such. The broker would not be logging that error unless something was inbound, so you need to start looking for the source of the errant sender.
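To illustrate why garbage on the OpenWire port shows up as an absurd frame size (the probe shown here is an assumption, not something confirmed in the question): with the default size-prefixed OpenWire wire format, the broker reads the first 4 bytes of an incoming frame as a big-endian length, so a non-OpenWire client such as an HTTP health check gets interpreted as a gigantic frame:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrameSizeDemo {
    public static void main(String[] args) {
        // A hypothetical liveness probe speaking HTTP to the OpenWire port 61616.
        byte[] probe = "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        // The broker treats the first 4 bytes ("GET ") as a big-endian frame length.
        int bogusFrameSize = ByteBuffer.wrap(probe, 0, 4).getInt();
        // Prints 1195725856, roughly 1.1 GB.
        System.out.println("Interpreted frame size: " + bogusFrameSize + " bytes");
    }
}

Raising wireFormat.maxFrameSize cannot fix that kind of traffic; finding and redirecting the errant client can.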

Why did Kafka Consumer re-processed all the records since last 2 months?

In one of the instances, when the consumer service was restarted, it led to re-processing of all the records which had been sent to Kafka.
Kafka Broker: 0.10.0.1
Kafka producer Service: Springboot version 1.4.3.Release
Kafka Consumer Springboot Service: Springboot version 2.2.0.Release
Now, to investigate this issue, I want to recreate this scenario in the dev/local environment, but it is not happening!
What can be the probable cause?
How can I check whether a record, once processed on the consumer side, is committed when we send Acknowledgement.acknowledge()?
Consumer - properties
Enable Auto commit = false;
Auto offset Reset = earliest;
max poll records = 1;
max poll interval ms config = I am calculating the value of this parameter at runtime from the formula (number_of_retries * x * 2) <= Integer.MAX_VALUE
Retry Policy - Simple
number of retries = 3;
interval between retries = x (millis)
I am creating topics at runtime on the consumer side via NewTopic(topic_name, 1, (short) 1) beans.
There are 2 Kafka clusters and 1 ZooKeeper instance running.
Any help would be appreciated
That broker version is very old; if a consumer received no records for 24 hours, the offsets were removed, and restarting the consumer would cause it to reprocess all the records.
With newer brokers, this was changed to 7 days, and the consumer has to be stopped for 7 days for the offsets to be removed.
Spring Boot 1.4.x (and even 1.5.x, 2.0.x) is no longer supported; the current version is 2.3.1.
You should upgrade to a newer broker and a more recent Spring Boot version.
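As a side note on the question's point about Acknowledgement.acknowledge() (this is not part of the answer above): with Spring for Apache Kafka, acknowledge() only commits the offset when the container ack mode is MANUAL or MANUAL_IMMEDIATE. A minimal sketch, with hypothetical topic and group names:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;

@Configuration
public class ManualAckConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Commit the offset as soon as acknowledge() is called in the listener.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        return factory;
    }

    @KafkaListener(topics = "my-topic", groupId = "my-group")   // hypothetical names
    public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
        // ... process the record ...
        ack.acknowledge();   // commits record.offset() + 1 for this partition
    }
}

The committed offsets can then be compared against the log end offsets with the kafka-consumer-groups.sh --describe tool that ships with the broker.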

KafkaProducerException on sending message to a topic

Spring boot properties for kafka producer:
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.client-id=bam
#spring.kafka.producer.acks= # Number of acknowledgments the producer requires the leader to have received before considering a request complete.
spring.kafka.producer.batch-size=0
spring.kafka.producer.bootstrap-servers=localhost:9092
#spring.kafka.producer.buffer-memory= # Total bytes of memory the producer can use to buffer records waiting to be sent to the server.
spring.kafka.producer.client-id=bam-producer
spring.kafka.consumer.auto-offset-reset=earliest
#spring.kafka.producer.compression-type= # Compression type for all data generated by the producer.
spring.kafka.producer.key-serializer= org.apache.kafka.common.serialization.StringSerializer
#spring.kafka.producer.retries= # When greater than zero, enables retrying of failed sends.
spring.kafka.producer.value-serializer= org.apache.kafka.common.serialization.StringSerializer
#spring.kafka.properties.*= # Additional properties used to configure the client.
I am getting the exception below when I am trying to send a message to a Kafka topic:
Caused by: org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for bam-0 due to 30004 ms has passed since last append
at org.springframework.kafka.core.KafkaTemplate$1.onCompletion(KafkaTemplate.java:255)
at org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:109)
at org.apache.kafka.clients.producer.internals.RecordBatch.maybeExpire(RecordBatch.java:160)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:245)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
... 1 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for bam-0 due to 30004 ms has passed since last append
I am not able to figure out why I am getting this exception. Can someone please help?
The producer is timing out trying to send messages. I notice you are using localhost in your bootstrap. Make sure a broker is available locally and listening on port 9092.
The issue was resolved by setting advertised.listeners in server.properties to PLAINTEXT://<ExternalIP>:9092.
Note: Kafka is deployed on AWS.
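For reference, the fix described above corresponds to a line like the following in the broker's server.properties (the external IP is left as a placeholder):

# server.properties on the Kafka broker running on AWS
advertised.listeners=PLAINTEXT://<ExternalIP>:9092

advertised.listeners is the address the broker hands back to clients in metadata responses, so it has to be reachable from the producer, not just from inside the broker's own host.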
