Confluent Kafka Go client memory leak

My service consumes messages from one Kafka topic. While the consumer is idle and blocked waiting for messages, I see a continuous, linear increase in the pod's memory. Go pprof shows that the Go memory consumption is constant at around 40 MB, while the pod metrics show more than 100 MB consumed.
This leads me to the conclusion that the memory is being consumed in the C library librdkafka, as described here: https://zendesk.engineering/hunting-down-a-c-memory-leak-in-a-go-program-2d08b24b617d
The solution to the librdkafka memory growth in the link above was to consume the OffsetCommitResponse events that librdkafka produces. Here is the quote from the link:
It turned out that librdkafka was generating an event every time it
received an OffsetCommitResponse from the Kafka broker (which, with
our auto-commit interval set to 5 seconds, was pretty often), and
placing it in a queue for our app to handle. However, our application
was not actually handling events from that queue, so the size of that
queue grew without bound
Does anyone know how to consume these events in Go? Unfortunately, the link above doesn't describe the solution.

I worked around this issue by counting the number of consumed messages in my service. When the count reaches a configured value (100,000 in my case), I simply close and recreate the Kafka consumer and producer.
This solution is neither elegant nor a fix for the original issue, but hey, it stabilized my production. Now I have a flat memory consumption curve.

Related

Decrease consume rate on RabbitMq server

We are running a single production RabbitMQ (3.7) server, to which around 500 mobile applications are connected as producers (MQTT) and around 10 server applications as consumers. Those 500 publishers push messages mostly into one queue and less often into another one.
Recently we had an issue with spikes of backed-up messages in all our queues. The number of queued messages went from 1 to 1000. The spike was caused by a drop in the consume rate.
I tried to find out what happened and how to eliminate the spikes, and found that I should limit the queue length or reduce the number of connections. But we can't limit anything; we have to perform better. I took a look at RabbitMQ's memory and CPU usage, and the same for the consumers, and everything looks fine: RabbitMQ was running at around 50% of total load, and the same for memory. The consumers also don't seem to be the bottleneck, because the consume rate went even higher after the queue length had grown.
I have a couple of questions:
Is RabbitMQ designed for such a large number of consumers?
I read that each queue is single-threaded. Is it possible that RabbitMQ just can't handle 500 producers on one queue, so the throughput drops?
What else can I use to tackle the cause of the lower consume rate? The number of threads in RabbitMQ?
What do you recommend for measuring or benchmarking the performance of a RabbitMQ server?

Redis vs Kafka vs RabbitMQ for 1MB messages

I am currently researching a queueing solution to handle medium sized messages of 1MB.
Besides the feature differences between Redis, Kafka and RabbitMQ, I cannot find any good answer about their performance with messages of around 1MB.
Does anyone know how many 1MB messages each of these can handle?
Do you know any other queueing solutions which can perform better?
When you are evaluating Kafka vs Redis in your case, there are other factors which you have to take into account, besides message size. Here are some of them I can think of:
How many producers/consumers? Redis performance can be affected by a large number of producers/consumers due to the nature of Redis (push-based queue). This is because Redis delivers the message to all the consumers at once, at the moment the message is put in the queue.
Do you need speed or reliability first? If speed is of utmost importance, use Redis, since it does not persist messages and will deliver them faster. If you need reliability, use Kafka, since it persists messages even after they are delivered.
Do you want your consumers to get messages once they are ready, or do you want messages to be sent to the consumers immediately? In the first case use Kafka, because it's a pull-based mechanism (consumers have to ask for the message). In the second case use Redis, since it's a push-based mechanism (the message is pushed to the consumer once it's on the queue). RabbitMQ is also push-based (although there is a pull API with bad performance).
What is the number of messages expected? If it's not huge, use Redis, since you are limited by memory. Otherwise use Kafka. Best practice for RabbitMQ is to keep queues short. This means that you should be able to consume messages at a rate close to the one at which they appear on the queue. So if you have some long-lasting operation on the consumer side, RabbitMQ is probably not the best choice.
Scaling? Kafka scales horizontally really well (it's built with scalability in mind). RabbitMQ is usually scaled vertically. Redis also scales well horizontally if needed.
It's obvious that there is more than one criterion when you evaluate a proper queueing solution. There are best practices and recommendations for each of the queueing engines that you are looking at. Think more about your specific use case; it's definitely worth the time, since it will save you the time you would lose later by choosing an inappropriate queueing engine.
I am answering for Kafka.
Kafka itself has very good performance even for big messages.
In our tests with 2 Kafka nodes we reached p2p communication at 170 MB/s with smaller messages and 150 MB/s with bigger messages.
The only thing you need to remember is to configure the broker to accept bigger messages.
Here is a nice article: Configuring Kafka for Performance and Resource Management - Handling Large Messages
I know of another p2p solution that might be interesting when you have concrete requirements: look at YAMI4.
I was using Redis but only for very small messages, so I cannot say anything about 1MB.

mq slow persistent message reading

I am trying to track down an issue where a client can not read messages as fast as they should. Persistent messages are written to a queue. At times, the GET rate is slower than the PUT rate and we see messages backing up.
Using tcpdump, I see the following:
MQGET: Convert, Fail_If_Quiescing, Accept_Truncated_Msg, Syncpoint, Wait
Message is sent
Notification
MQCMIT
MQCMIT_REPLY
In analyzing the dump, sometimes the delta between the MQCMIT and MQCMIT_REPLY is around 0.001 seconds, and sometimes it is around 0.1 seconds. It seems like the 0.1 second delay is slowing the message transfer down. Is there anything I can do to decrease the delta between the MQCMIT and MQCMIT_REPLY? Should the client be reading multiple messages before the MQCMIT is sent?
This is MQ 8.0.0.3 on AIX 7.1.
The most straightforward way to increase message throughput on the receiving side is to batch MQGET operations. That is, do not issue MQCMIT for every MQGET, but rather after a number of MQGET operations. MQCMIT is the most expensive operation for persistent messages since it involves forcing log writes on the queue manager, and therefore suffers disk I/O latency. Experiment with the batch size: I often use 100, but some applications can go even higher. Too many outstanding MQGET operations can be problematic, since they keep the transaction running for a much longer time and prevent log switching.
And of course you can check if your system overall tuning is satisfactory. You might have too long a latency between your client and queue manager, or your logs may reside on a slow device, or the logs may share the device with the queue files or an otherwise busy filesystem.

Kafka Producer 0.9.0 performance, large number of waiting-threads

We are writing messages at a rate of about 9000 records/sec into our Kafka cluster. At times we see the producer performance degrade considerably, and then it never recovers. When this happens we see the following error: "unable to allocate buffer within timeout". Below are the JMX producer metrics taken when the process is running well and when it reaches the bad state. The "waiting-threads" metric is very high when the process degrades. Any input would be appreciated.
The producer parameters are
batch.size=1000000
linger.ms=30000
acks=-1
metadata.fetch.timeout.ms=1000
compression.type=none
max.request.size=10000000
Although the buffer is fully available, the errors are "org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time"
At some point you start sending batches of up to 1,000,000 bytes (batch.size is measured in bytes, not messages), and I think that's why your performance degrades. Try lowering that number or setting linger.ms lower.

Is it possible to declare a maximum queue size with AMQP?

As the title says — is it possible to declare a maximum queue size and broker behaviour when this maximum size is reached? Or is this a broker-specific option?
I ask because I'm trying to learn about AMQP, not because I have this specific problem with any specific broker… But broker-specific answers would still be insightful.
AFAIK you can't declare a maximum queue size with RabbitMQ.
Also, there's no such setting in the AMQP spec:
http://www.rabbitmq.com/amqp-0-9-1-quickref.html#queue.declare
Depending on why you're asking, you might not actually need a maximum queue size. Since version 2.0 RabbitMQ will seamlessly persist large queues to disk instead of storing all the messages in RAM. So if your concern is the broker crashing because it exhausts its resources, this actually isn't much of a problem in most circumstances, assuming you aren't strapped for hard disk space.
In general this persistence actually has very little performance impact, because by definition the only "hot" parts of the queue are the head and tail, which stay in RAM; the majority of the backlog is "cold" so it makes little difference that it's sitting on disk instead.
We've recently discovered that at high throughput it isn't quite that simple - under some circumstances the throughput can deteriorate as the queue grows, which can lead to unbounded queue growth. But when that happens is a function of CPU, and we went for quite some time without hitting it.
You can read about RabbitMQ's maximum queue length implementation here: http://www.rabbitmq.com/maxlength.html
It does not block incoming messages from being added; instead it drops messages from the head of the queue.
You should definitely read about Flow control here:
http://www.rabbitmq.com/memory.html
With Qpid, yes:
you can configure a maximum queue size and the policy to apply when the maximum is reached: ring, ignore messages, or break the connection.
You also have LVQ (last value) queues, which are very configurable.
There are some things that you can't do with brokers, but you can do in your app. For instance, there are two AMQP methods, basic.get and queue.declare, which return the number of messages in the queue. You can use this to periodically get a count of outstanding messages and take action (like start new consumer processes) if the message count gets too high.
