Reduce RabbitMQ memory usage

I'm trying to run RabbitMQ on a small VPS (512 MB of RAM) along with Nginx and a few other programs. I've been able to tweak the memory usage of everything else without difficulty, but I can't seem to get RabbitMQ to use any less RAM.
I think I need to reduce the number of threads Erlang uses for RabbitMQ, but I haven't been able to get that to work. I've also tried setting vm_memory_high_watermark to a few different values below the default (of 40%), even as low as 5%.
Part of the problem might be that the VPS provider (MediaTemple) allows me to go over my allocated memory, so free and top report that the server has around 900 MB.
Any suggestions to reduce memory usage by RabbitMQ, or to limit the number of threads Erlang will create? I believe Erlang is using 30 threads, based on the -A30 flag I've seen on the process's command line.
Ideally I'd like RabbitMQ's memory usage to be below 100 MB.
Edit:
With vm_memory_high_watermark set to 5% (0.05 in the config file), the RabbitMQ logs report that RabbitMQ's memory limit is set to 51 MB. I'm not sure where 51 MB comes from: the VPS's current allocated memory is 924 MB, so 5% of that should be around 46 MB.
According to htop/free, before starting RabbitMQ I'm sitting around 453 MB of used RAM, and after starting RabbitMQ I'm around 650 MB, a nearly 200 MB increase. Could it be that 200 MB is the lower limit RabbitMQ will run with?
Edit 2
Here are some screenshots of ps aux and free before and after starting RabbitMQ and a graph showing the memory spike when RabbitMQ is started.
Edit 3
I also checked with no plugins enabled, and it made very little difference. It seems the plugins I had (management and its prerequisites) only added about 8 MB of RAM usage.
Edit 4
I no longer have this server to test with; however, there is a config setting, delegate_count, that defaults to 16. As far as I know, this spawns 16 supervisor processes for RabbitMQ. Lowering this number on smaller servers may help reduce the memory footprint. I have no idea whether this actually works, or how it impacts performance, but it's something to try.

The appropriate way to limit memory usage in RabbitMQ is using the vm_memory_high_watermark. You said:
I've also tried setting the vm_memory_high_watermark to a few different values below the default (of 40%), even as low as 5%.
This should work, but it might not be behaving the way you expect. In the logs, you'll find a line that tells you what the absolute memory limit is, something like this:
=INFO REPORT==== 29-Oct-2009::15:43:27 ===
Memory limit set to 2048MB.
You need to tweak the memory limit as needed - Rabbit might be seeing your system as having a lot more RAM than you think it has if you're running on a VPS environment.
Sometimes, Rabbit can't tell what system you're on and uses 1GB as the base point (so you get a limit of 410MB by default).
Also, make sure you are running on a version of RabbitMQ that supports the vm_memory_high_watermark setting - ideally you should run with the latest stable release.
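As a sketch, a 5% watermark in the classic Erlang-term config format looks like this (the file path varies by install; /etc/rabbitmq/rabbitmq.config is typical on Linux):

```erlang
%% rabbitmq.config -- classic Erlang-term format
[
  {rabbit, [
    %% cap RabbitMQ at 5% of whatever total RAM it detects
    {vm_memory_high_watermark, 0.05}
  ]}
].
```

Newer releases also accept the ini-style rabbitmq.conf (`vm_memory_high_watermark.relative = 0.05`), and I believe recent versions support an absolute limit (`vm_memory_high_watermark.absolute`), which sidesteps the problem of Rabbit mis-detecting total RAM on a VPS.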

Make sure to set an appropriate QoS prefetch value. By default, the Rabbit server will send a connected client every message it has for that client's queue as fast as it can. This results in extensive memory usage on both the client and the server.
Drop the prefetch limit down to something reasonable, say 100, and Rabbit will keep the remaining messages on disk on the server until the client is actually ready to process them, and your memory usage will go way down on both the client and the server.
Note that the suggestion of 100 is just a reasonable place to start - it sure beats infinity. To really optimize that number, you'll want to take into consideration the messages/sec your client is able to process, the latency of your network, and also how large each of your messages is on average.
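The sizing factors above can be turned into a back-of-the-envelope calculation. This is only a sketch of one common heuristic (keep enough messages in flight to cover one network round trip, times a safety margin), not an official formula; the function name and the 2x safety factor are my own choices:

```go
package main

import (
	"fmt"
	"math"
)

// suggestPrefetch estimates a QoS prefetch count from the factors the
// answer mentions: the client's processing rate (msgs/sec) and the
// network round-trip time. The idea is to keep enough messages in
// flight to cover one round trip, with a safety factor so the client
// never sits idle waiting on the broker.
func suggestPrefetch(msgsPerSec, roundTripSec, safety float64) int {
	inFlight := msgsPerSec * roundTripSec // messages consumed during one round trip
	n := int(math.Ceil(inFlight * safety))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// A client handling 1000 msgs/sec over a 50 ms round trip, with a
	// 2x safety margin, lands right at the suggested starting point.
	fmt.Println(suggestPrefetch(1000, 0.050, 2)) // prints 100
}
```

For a slow consumer (say 10 msgs/sec on a fast LAN) the same arithmetic suggests a prefetch of 1, which is why "take your own rates into consideration" matters more than any single magic number.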

Related

Rejecting new http requests when memory usage is high in GoLang

I have a simple HTTP server written in Go that accepts big chunks of data (up to 5 MB per single request in some cases, but it can also be tens of KB, depending on the usage pattern). The server then asynchronously processes the received data by adding it to a buffer, from which one of the workers (goroutines) picks up a task. This server runs as a container in Kubernetes and has a memory limit set. Also, unfortunately, I'm not allowed to use HPA, as only one pod is allowed per client.
The problem occurs when someone tries to send a lot of big chunks of data to my server: because of the memory limit, the kubelet kills my container, and as a result all data stored in the buffer is lost.
I have tried the following ways to mitigate the problem:
Removing the memory limit in the pod spec. Unfortunately my server runs in a multitenant environment and I'm forced to set a memory limit.
Limiting the number of requests processed in flight by adding a buffered channel, with a timeout when a request can't be added to it within 10 seconds. This has partially mitigated the problem. But first, it's quite tricky to find a good balance between buffer size and timeout; second, if a client sends a lot of small requests, the server drops some of them even when it has plenty of free memory.
I have found that I can get the current memory usage of my binary by calling runtime.ReadMemStats. So my next idea is to drop requests if, for example, memory goes above some threshold (80%). Is that the only way to solve this problem?
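A minimal sketch of that threshold check, assuming the 80% figure from the question. `memLimitBytes` is a hypothetical stand-in for the container's limit (in Kubernetes you could read it from the cgroup files or inject it via the downward API); note the runtime function is `runtime.ReadMemStats`:

```go
package main

import (
	"fmt"
	"runtime"
)

// memLimitBytes is a hypothetical stand-in for the container's memory
// limit; in a real pod you would read it from the cgroup or have it
// injected through the downward API.
const memLimitBytes = 512 * 1024 * 1024

// overThreshold reports whether the Go heap is above the given fraction
// of the limit. runtime.ReadMemStats fills in current allocator stats.
func overThreshold(limit uint64, frac float64) bool {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) >= float64(limit)*frac
}

func main() {
	// In an http.Handler you would run this check before buffering the
	// request body and reply with 503 Service Unavailable when it is
	// true, so clients can back off and retry instead of the kubelet
	// OOM-killing the pod and losing the whole buffer.
	fmt.Println(overThreshold(memLimitBytes, 0.8))
}
```

One caveat: `HeapAlloc` undercounts what the kernel charges the container (goroutine stacks, OS overhead, memory the Go runtime holds but hasn't returned), so the threshold needs headroom below the cgroup limit.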

Kubernetes number of replicas vs performance

I have just gotten into Kubernetes and really like its ability to orchestrate containers. I assumed that when the app starts to grow, I could simply increase the replicas to handle the demand. However, now that I have run some benchmarks, the results confuse me.
I am running Laravel 6.2 with Apache on GKE, with a single g1-small machine as the node. I'm only using a NodePort service to expose the app, since a LoadBalancer seems expensive.
The benchmarking tools used are wrk and ab. When the replica count is increased to 2, requests/s somehow drops. I would expect requests/s to increase, since there are 2 pods available to serve requests. Is there a bottleneck occurring somewhere, or is my understanding flawed? I hope someone can point out what I'm missing.
A g1-small instance is really tiny: you get 50% utilization of a single core and 1.7 GB of RAM. You don't describe what your application does or how you've profiled it, but if it's CPU-bound, then adding more replicas of the process won't help you at all; you're still limited by the amount of CPU that GCP gives you. If you're hitting the memory limit of the instance that will dramatically reduce your performance, whether you swap or one of the replicas gets OOM-killed.
The other thing that can affect this benchmark is that, sometimes, for a limited time, you can be allowed to burst up to 100% CPU utilization. So if you got an instance and ran the first benchmark, it might have used a burst period and seen higher performance, but then re-running the second benchmark on the same instance might not get to do that.
In short, you can't just crank up the replica count on a Deployment and expect better performance. You need to identify where in the system the actual bottleneck is. Monitoring tools like Prometheus that can report high-level statistics on per-pod CPU utilization can help. In a typical database-backed Web application the database itself is the bottleneck, and there's nothing you can do about that at the Kubernetes level.
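If you do keep scaling replicas, giving each container explicit resource requests at least makes the per-pod CPU numbers in your monitoring meaningful, since utilization is reported against them. A hypothetical Deployment fragment; the names and numbers are placeholders, not tuned values:

```yaml
# Fragment of a Deployment spec: resource requests let the scheduler
# account for each replica's CPU share, and tools like metrics-server
# or Prometheus can then report utilization against them.
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: laravel-app
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

On a g1-small, two replicas each requesting 250m already exceed the node's usable half-core, which is one concrete way to see why doubling replicas on this node can't double throughput.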

WebSphere MQ Performance

I have MQ server 7.1 running on machine 1. I have a Java app running on machine 2 that uses JMS to write messages to a queue on machine 1. The Java app handles hundreds of messages per second (data coming from elsewhere). Currently it takes about 100 ms to write 200 text messages (average size 600 bytes) to the queue, i.e. about 2,000 messages per second. Is this reasonable performance? What are some of the things one can do to improve the performance further, i.e. make it faster?
There are a number of detailed recommendations available in the WebSphere MQ Performance Reports. These are published as SupportPacs. If you start at the SupportPac landing page, the ones you want are all named MPxx and are available per-platform and per-version.
As you will see from the SupportPacs, WMQ out of the box is tuned for a balance of speed and reliability across a wide variation of message sizes and types. There is considerable latitude for tuning through configuration and through design/architecture.
From the configuration perspective, there are buffers for persistent and non-persistent messages, an option to reduce disk write integrity from triple-write to single-write, tuning of log file sizes and numbers, connection multiplexing, etc., etc. You may infer from this that the more the QMgr is tuned to specific traffic characteristics, the faster you can get it to go. The flip side of this is that a QMgr tuned that tightly will tend to react badly if a new type of traffic shows up that is outside the tuning specifications.
I have also seen tremendous performance improvement allocating the WMQ filesystems to separate spindles. When a persistent message is written, it goes both to queue files and to log files. If both of those filesystems are in contention for the same disk read/write heads, this can degrade performance. This is why WMQ can sometimes run slower on a high-performance laptop than on a virtual machine or server of approximately the same size. If the laptop has physical spinning disk where the WMQ filesystems are both allocated and the server has SAN, there's no comparison.
From a design standpoint, much performance can be gained from parallelism. The Performance reports show that adding more client connections significantly improves performance, up to a point where it then levels off and eventually begins to decline. Fortunately, the top number of clients before it falls off is VERY large and the web app server typically bogs down before WMQ does, just from the number of Java threads required.
Another implementation detail that can make a big difference is the commit interval. If the app is such that many messages can be put or got at a time, doing so improves performance. A persistent message under syncpoint doesn't need to be flushed to disk until the COMMIT occurs. Writing multiple messages in a single unit of work allows WMQ to return control to the program faster, buffer the writes and then optimize them much more efficiently than writing one message at a time.
The Of Mice and Elephants article contains additional in-depth discussion of tuning options. It is part of the developerWorks Mission:Messaging series which contains some other articles which also touch on tuning.
I also recommend this: Configuring and tuning WebSphere MQ for performance on Windows and UNIX

What is the number of concurrent users support for Nodejs?

I need to scale my system to handle at least 500k users. I came across Node.js and it's quite intriguing.
Does anyone have any idea how many concurrent users it can support? Has anyone really tested it?
Do you expect all these users to have persistent TCP connections to your server concurrently?
The bottleneck is probably memory, given V8's 1 GB heap limit (1.7 GB on 64-bit).
You can load test with several hundred to a few thousand connections, log heap usage, and extrapolate to find one Node instance's connection limit.
Good question, but hard to answer. I think the number of concurrent users depends on the amount of processing done for each request and on the hardware you are using, e.g. the amount of memory and processor speed. If you want to use multiple cores, you could use multi-node, which will start multiple Node instances. I've never used it, but it looks promising.
You could do a quick test using ab, part of apache.
500k concurrent users is quite a lot, and would make me consider using multiple servers and a load-balancer.
Just my 2ct. Hope this helps.

Are Amazon's micro instances (Linux, 64bit) good for MongoDB servers?

Do you think using an EC2 instance (Micro, 64bit) would be good for MongoDB replica sets?
It seems like, if that's all they did, with 600+ MB of RAM one could use them for a nice set.
Also, would they make good primary (write) servers too?
My database is only 1-2 gigs now but I see it growing to 20-40 gigs this year (hopefully).
Thanks
They COULD be good, depending on your data set, but most likely they will not be.
For starters, you don't get much RAM with those instances. Consider that you will be running an entire operating system and all related services: 613 MB of RAM can fill up very quickly.
MongoDB tries to keep as much data in RAM as possible, and that won't be possible if your data set is 1-2 GB; it becomes even more of a problem if your data set grows to 20-40 GB.
Secondly, they are labeled as "Low I/O performance," so when your data swaps to disk (and it will, given the size of that data set), you are going to suffer from slow disk reads due to low I/O throughput.
Be aware that micro instances are designed for spiky CPU usage, and you will be throttled to the "low background level" if you exceed the allotment.
The AWS Micro Documentation has good information of what they are intended for.
Between the CPU throttling and the not-very-good I/O performance, my experience using micros for development/testing has not been good (larger instance types have been fine, though), but a micro may work for your use case.
However, there are exceptions for config server or arbiter nodes; I believe a micro should be good enough for those roles.
There is also some mongodb documentation specific to EC2 which might help.
