Kafka cluster on AWS crashes

I've been having a recurring issue with a Kafka cluster running on AWS EC2 instances.
Description
Kafka cluster version 0.10.1.0
3-broker cluster
Topics have 6 partitions per broker
Instance type is m4.xlarge
Symptoms
The following will happen at random intervals, on random brokers
From the logs, here is the information I could gather:
Intra-cluster replication starts shrinking on a random broker
(I suppose it could be a temporary network failure but couldn't produce evidence of it)
The system starts showing close to no activity around 02:27:20 (note that it's not load-related, as it happens at very quiet times)
From there, this Kafka broker doesn't process messages, which is expected IMO since it dropped out of the cluster replication.
Now the real issue appears: the number of connections in CLOSE_WAIT keeps increasing until it reaches the configured ulimit of the system/process, eventually crashing the Kafka process.
Now, I've been raising the limits to see if Kafka would eventually rejoin the ISR before crashing, but even with a very high limit, Kafka just seems stuck in a weird state and never recovers.
Note that between the time the faulty broker is on its own and the time it crashes, Kafka is still listening and producers can still connect to it.
For this single crash, I could see 320 errors like this from the producers:
java.util.concurrent.ExecutionException: org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
The configuration is the default one and the usage is quite standard, so I'm wondering if I missed something.
I put in place a script that checks the number of Kafka file descriptors and restarts the service when it gets abnormally high, which does the trick for now, but I still lose messages when it crashes.
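For context, the watchdog is roughly equivalent to the following sketch (shown here in Java rather than the actual script; the PID, threshold and service name are placeholders):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch of a watchdog that counts the Kafka process's open file descriptors
// via /proc and restarts the service when the count gets abnormally high.
// PID, threshold and service name are placeholders.
public class KafkaFdWatchdog {
    private static final long THRESHOLD = 90_000;    // assumed alert limit
    private static final String KAFKA_PID = "12345"; // placeholder PID

    public static void main(String[] args) throws IOException, InterruptedException {
        while (true) {
            long openFds;
            try (var fds = Files.list(Paths.get("/proc", KAFKA_PID, "fd"))) {
                openFds = fds.count(); // one entry per open descriptor
            }
            if (openFds > THRESHOLD) {
                // Restart the broker before it hits the ulimit and crashes.
                new ProcessBuilder("systemctl", "restart", "kafka")
                        .inheritIO().start().waitFor();
            }
            Thread.sleep(30_000); // check every 30 seconds
        }
    }
}
```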
Any help to get to the bottom of this would be appreciated.

Turns out there was a deadlock in the version I was using.
Upgrading fixed the issue.
See the ticket about the issue:
https://issues.apache.org/jira/browse/KAFKA-5721

Related

MassTransit consumers didn't acknowledge some messages

I have a question about some strange behaviour of consumers.
Recently we had a strange situation in our production environment. Two consumers on two different microservices were stuck on some messages. The first one was holding 20 messages from a RabbitMQ queue and the second one 2 messages, and they weren't processing them. These messages were visible as Unacked in RabbitMQ for two days, and they only went back to the Ready state when those two microservices were restarted. At the time the consumers took these messages, the whole system was processing thousands of messages per hour, so basically our Saga and all consumers were working. Once the messages went back to the Ready state they were processed within a second, so I don't think the problem is with the messages themselves.
The messages are published by the Saga to an exchange, and besides these two stuck consumers we also have an EventLogger consumer subscribed to all messages; the EventLogger processed these 22 messages normally, without any problems (from its own queue). We also have Application Insights connected to the consumers, and there is no record of these 22 messages being received by the two stuck consumers (there are records of them being received by the EventLogger).
The other day we had the same issue with one message on the test environment.
We recently updated MassTransit in our project from version 6.2.0 to 7.1.6, and before that we didn't notice any similar issues with consumers, but maybe that's just a coincidence. We also have retry, redelivery, circuit breaker and in-memory outbox mechanisms, but I don't think the problem is with them, because the consumers never even started to process these 22 messages.
Do you have any suggestions as to what could have happened to these consumers?
Usually, when a consumer doesn't even start to consume a message once it has been delivered to MassTransit by RabbitMQ, it could be an issue resolving the consumer from the container, such as a dependency on another backing service (database, log server, file, network connection, device, etc.).
The message remains unacknowledged on the broker because the transport/delivery mechanism to the consumer is waiting for a resource to become available. If there isn't anything in the logs for that time period indicating an issue with a resource, it's hard to know what could have blocked those messages from being consumed. The fact that they were ultimately consumed once the services were restarted seems to indicate the message content itself was fine.
Monitoring the lack of message consumption (and likely an associated queue depth increase) would give an indication that the situation has occurred. If it happens again, I'd increase the logging detail levels so the issue can be identified.
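As a rough starting point for that monitoring, a sketch like the one below (using the RabbitMQ Java client; the host and queue name are assumptions) can poll a queue passively and report its ready-message and consumer counts:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

// Sketch: passively declare the queue to read its ready-message and consumer
// counts without modifying it. Host and queue name are placeholders.
public class QueueDepthCheck {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker host

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            AMQP.Queue.DeclareOk ok = channel.queueDeclarePassive("my-consumer-queue"); // placeholder queue
            int ready = ok.getMessageCount();     // messages sitting in the Ready state
            int consumers = ok.getConsumerCount(); // consumers currently attached

            // If the ready count keeps growing (or stays flat) while consumers
            // are attached, raise an alert and bump the logging detail levels.
            System.out.printf("queue=%s ready=%d consumers=%d%n",
                    "my-consumer-queue", ready, consumers);
        }
    }
}
```

The Unacked count itself isn't exposed through a passive declare, so if you want to watch that specifically you'd go through the RabbitMQ management HTTP API instead.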

NiFi from Hadoop to Kafka with exactly-once guarantee

Is it possible for NiFi to read from HDFS (or Hive) and publish data rows to Kafka with an exactly-once delivery guarantee?
Publishing to Kafka from NiFi is an at-least-once guarantee, because a failure could occur after Kafka has already received the message but before NiFi receives the response; that could be due to a network issue, or NiFi crashing and restarting at that exact moment.
In any of those cases, the flow file would be put back in the original queue in front of the PublishKafka processor (i.e. the session was never committed), and so it would be tried again.
Due to the threading model, where different threads may execute the processor, it can't be guaranteed that the thread that originally did the publishing will be the one that does the retry, and therefore NiFi can't make use of the "idempotent producer" concept.
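For contrast, this is roughly what the idempotent producer looks like on a plain Kafka client (a sketch; the broker address and topic are placeholders). It only de-duplicates the retries made internally by that same producer instance, which is exactly the property NiFi can't rely on when a re-queued flow file is published again by a different thread or after a restart:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: a plain Kafka producer with idempotence enabled. The broker
// de-duplicates internal retries from this producer instance, but a brand-new
// send of the same payload (e.g. a re-queued flow file) is a separate message.
public class IdempotentProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // required for idempotence
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "value")); // placeholder topic
            producer.flush();
        }
    }
}
```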

Handling Kafka client updates in Kubernetes

I have a Kafka cluster running on AWS MSK, with Kafka producer and consumer Go clients running in Kubernetes. The producer is responsible for sending the stream of data to Kafka. I need help solving the following problems:
Let's say there is some code change in the producer and I have to redeploy it in Kubernetes. How can I do that? Since the data is continuously generated, I cannot simply stop the already-running producer and deploy the updated one; in that case, I would lose the data generated during the update.
Sometimes, due to a Go panic in the code, the client crashes, but since it is running as a pod, Kubernetes restarts it. I am not able to understand whether that's a good thing or a bad one.
Thanks
For your first question, I would suggest a rolling update of your Deployment in the cluster.
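To avoid losing records that are still buffered in the client while the old pod is terminated, the producer should flush and close when it receives SIGTERM. A minimal sketch of that pattern (shown in Java; the same flush-before-exit idea applies to the Go clients, and the broker address and topic are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: flush and close the producer when Kubernetes sends SIGTERM, so
// records buffered in the client are delivered before the old pod is replaced.
public class GracefulProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "my-msk-bootstrap:9092"); // placeholder
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // SIGTERM from Kubernetes triggers JVM shutdown hooks; flush pending
        // records and close within the pod's termination grace period.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            producer.flush();
            producer.close();
        }));

        while (true) { // stand-in for the real data source
            producer.send(new ProducerRecord<>("example-topic", "payload")); // placeholder topic
            Thread.sleep(100);
        }
    }
}
```

Combined with a rolling update that brings the new pod up before the old one is terminated, the stream keeps flowing during the deploy.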
For the second, that is the general behavior of Deployments in Kubernetes. I could imagine an external monitoring solution that un-deploys your application or stops handling requests in case of a panic.
It would be better if you could explain why exactly you need that kind of behavior.

Storm Pacemaker with upgraded KafkaSpout

I had a question regarding the usage of Pacemaker. We have a Storm cluster currently running 1.0.2 and are in the process of migrating it to 1.2.2. We also use KafkaSpout to consume data from the Kafka topics.
Now, since this release is for Kafka 0.10+, most of the load would be taken off ZK, since the offsets won't be stored in ZK anymore.
Considering this, does it make sense for us to also start looking at Pacemaker to reduce load further on ZK?
Our cluster has 70+ supervisors and around 70 workers, with a few unused slots. Also, we have around 9100+ executors/tasks running.
Another question I have is regarding the heartbeats: who sends them to whom? From what I have read, workers and supervisors send their heartbeats to ZK, which is what Pacemaker alleviates. How about the tasks? Do they also send heartbeats? If yes, is it to ZK or somewhere else? There's a config called task.heartbeat.frequency.secs which has led me to some more confusion.
The reason I ask is that if the task-level heartbeats aren't being sent to ZK, then it's pretty evident that Pacemaker won't be needed, because with no offsets being committed to ZK the load would already be reduced dramatically. Is my assessment correct, or would Pacemaker still be a feasible option? Any leads would be appreciated.
Pacemaker is an optional Storm daemon designed to process heartbeats from workers; it is implemented as an in-memory store. You could use it if ZK becomes a bottleneck because the Storm cluster has scaled up.
Supervisors report a heartbeat to Nimbus to signal that they are alive; this is used for fault tolerance, the frequency is set via supervisor.heartbeat.frequency.secs, and the heartbeats are stored in ZK.
Workers heartbeat to their supervisor; the frequency is set via worker.heartbeat.frequency.secs, and these heartbeats are stored in the local file system.
task.heartbeat.frequency.secs controls how often a task (executor) should heartbeat its status to the master (Nimbus); it never took effect in Storm and has been deprecated in favor of the RPC heartbeat reporting in Storm v2.0.
This heartbeat records which executors are assigned to which worker and is stored in ZK.

Performance issue when creating a new topic

Currently, we have a VM configured with 18 GB RAM and an 8-core CPU.
We are running both the broker and the name server on the same machine.
As of now, we have around 3563 topics, and the name server and broker are consuming 13 GB of the 18 GB.
I am facing a latency issue when creating a new topic (creating a new topic takes around 13 to 15 seconds).
I am looking to create a topic in just a fraction of a second.
What could be the reason for this latency?
Quick note: we are looking to create millions of topics in RocketMQ.
We also understand that sufficient RAM/cores are needed to manage these.
Is RocketMQ able to handle millions of topics?
I have encountered this problem too. When I created a topic, the command line timed out at 3000 ms. Older versions have this problem; the reason is explained below:
1. The client sends the new topic config to each broker, then waits for the broker to register all of its topic configs with the namesrv.
2. If you have a large number of topics, this process costs more than 3000 ms and results in a timeout.
In newer versions the process was changed:
1. The client sends the new topic config to each broker; the broker returns success at once and registers all of its topic configs with the namesrv asynchronously.
By the way, even though the timeout occurred, the topic was created successfully; you can verify this with the topicRoute command. If you want to reduce the latency of topic creation, you would need a mechanism that does not register the whole set of topic configs with the namesrv.
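For illustration, here is a rough sketch of doing the same check programmatically with the Java admin client; the namesrv address, broker address, topic name and queue counts are placeholders:

```java
import org.apache.rocketmq.common.TopicConfig;
import org.apache.rocketmq.tools.admin.DefaultMQAdminExt;

// Sketch: create a topic via the admin client, then confirm it with a
// topic-route lookup (the programmatic equivalent of the topicRoute command).
// Namesrv address, broker address and topic name are placeholders.
public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        DefaultMQAdminExt admin = new DefaultMQAdminExt();
        admin.setNamesrvAddr("127.0.0.1:9876"); // placeholder namesrv
        admin.start();
        try {
            TopicConfig config = new TopicConfig("my-new-topic"); // placeholder topic
            config.setReadQueueNums(8);
            config.setWriteQueueNums(8);

            // Sends the topic config to the broker; on older versions this call
            // also waits for the broker to re-register every topic with the
            // namesrv, which is where the multi-second latency comes from.
            admin.createAndUpdateTopicConfig("127.0.0.1:10911", config); // placeholder broker addr

            // Even if the call above timed out, the route usually shows up shortly after.
            var route = admin.examineTopicRouteInfo("my-new-topic");
            System.out.println("queueDatas: " + route.getQueueDatas());
        } finally {
            admin.shutdown();
        }
    }
}
```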
