AWS RDS Proxy CPU consumption when initiating new connections - aws-lambda

I am using RDS Proxy in front of an Aurora MySQL writer (db.r3.large), and I started doing some performance testing.
My test consists of making 500 requests per minute and checking how the Proxy and Aurora perform.
What I observed is that the Aurora instance ran at 100% CPU utilization for 40 minutes; after that, CPU consumption dropped to almost 10%. I guess RDS is caching data, or the CPU dropped because the connections are already open by then.
To reduce the CPU consumption during the first 40 minutes, I have two alternatives: either scale the instance vertically or reduce the connection pool size in the proxy.
What is the better approach to take here?
In addition, I have a read replica that is not being used by the Proxy, but as far as I can tell, RDS Proxy does not behave as a load balancer between the writer and the reader instances.
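For reference, the second alternative would look roughly like this with boto3 (the proxy name is a placeholder), capping how much of the database's max_connections the proxy is allowed to open:

```python
import boto3

rds = boto3.client("rds")

# Cap the proxy's connection pool instead of resizing the Aurora instance.
rds.modify_db_proxy_target_group(
    DBProxyName="my-aurora-proxy",      # placeholder proxy name
    TargetGroupName="default",          # RDS Proxy currently exposes a single "default" target group
    ConnectionPoolConfig={
        "MaxConnectionsPercent": 50,        # share of max_connections the proxy may open
        "MaxIdleConnectionsPercent": 25,    # idle connections kept warm in the pool
        "ConnectionBorrowTimeout": 120,     # seconds a request may wait for a pooled connection
    },
)
```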
Thanks

To minimize cache warmup delays, in your Aurora Parameter Group:
innodb_fast_shutdown=0 # full purge at shutdown, so the next start usually avoids recovery work
innodb_buffer_pool_dump_pct=90 # percentage of most-recently-used buffer pool pages to preserve across instance stop/start
innodb_buffer_pool_dump_at_shutdown=ON # record those pages at shutdown to prepare for the next instance start
innodb_buffer_pool_load_at_startup=ON # reload them at instance start to reduce warmup delays
innodb_flush_neighbors=2 # flush all dirty pages in the same extent in one sweep
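If it helps, a rough boto3 sketch of applying these to a custom parameter group (the group name is a placeholder, and whether a given parameter belongs in the DB parameter group or the cluster parameter group depends on your Aurora setup):

```python
import boto3

rds = boto3.client("rds")

params = {
    "innodb_fast_shutdown": "0",
    "innodb_buffer_pool_dump_pct": "90",
    "innodb_buffer_pool_dump_at_shutdown": "1",   # 1 = ON
    "innodb_buffer_pool_load_at_startup": "1",    # 1 = ON
    "innodb_flush_neighbors": "2",
}

rds.modify_db_parameter_group(
    DBParameterGroupName="my-aurora-mysql-params",   # placeholder custom group
    Parameters=[
        {
            "ParameterName": name,
            "ParameterValue": value,
            "ApplyMethod": "pending-reboot",         # static parameters take effect at the next reboot
        }
        for name, value in params.items()
    ],
)
```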
The very best to you.

Related

Lambda Latency with Provisioned Concurrency and AWS Service Calls

This is a general question.
We have 15 provisioned containers running, and our Lambda makes calls to a VPC endpoint as well as to EFS. We've noticed that during deployments, Lambda duration really spikes. It's also worth noting that we might be over-scaled in terms of the number of containers we need.
I understand there are AWS connection timeouts (60 seconds) (https://code.amazon.com/packages/AWSJavaClientRuntime/blobs/mainline/--/main/com/amazonaws/ClientConfiguration.java#L111).
My general question is: could the latency be coming from the containers having to re-establish connections to the VPC endpoint / EFS?
We almost always see latency spikes for these service calls during deployments, and then they tend to smooth out. Interestingly, though, if we increase provisioned concurrency too high, we tend to see more spikes than usual. My theory is that most containers' connections to the VPC endpoint / EFS time out, and it's these timeouts that we observe as latency spikes when provisioned concurrency is too high.
Is this possible? And is it possible to have too much provisioned concurrency (i.e. we need to hit a sweet spot for the best performance such that the containers are always warm)?
Additional note:
We deploy to an alias in 10% / 3 minute increments. Deployments take 30 minutes, and we see these spikes for 30 or more minutes until they taper off. Spikes are more likely with higher provisioned concurrency, as mentioned above.
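The pattern I'm asking about, sketched in Python for brevity (our functions actually use the Java client linked above, and the service client here is just a stand-in for whatever sits behind the VPC endpoint): the client is created once at container init, so warm invocations reuse the already-open connections instead of re-establishing them.

```python
import boto3
from botocore.config import Config

# Created once per container at init time; warm invocations reuse the pooled
# TCP connections to the VPC endpoint rather than opening new ones.
s3 = boto3.client(
    "s3",
    config=Config(
        connect_timeout=5,              # fail fast if a connection does have to be rebuilt
        read_timeout=10,
        retries={"max_attempts": 3},
    ),
)

def handler(event, context):
    # On a warm container this call rides the existing connection;
    # on a cold (or timed-out) one it pays the connection setup cost.
    return s3.list_buckets()["Buckets"]
```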

Kubernetes throttling JVM application that isn't hitting CPU quota

I am running a Kotlin Spring Boot based service in a Kubernetes cluster that connects to a PostgreSQL database. Each request makes around 3-5 database calls which partially run in parallel via Kotlin coroutines (on a thread-pool-backed coroutine context).
No matter the configuration, this service gets throttled heavily when hit by real traffic right after starting up. The slowness sometimes persists for 2-3 minutes and often affects only some fresh pods, not all of them.
I am looking for new avenues to analyze the problem - here's a succinct list of circumstances / stuff I am already doing:
The usual response time of my service is around 7-20ms while serving 300-400 requests / second per pod
New / autoscaled instances warm themselves up by making 15,000 HTTP requests against themselves. The readiness probe does not report "up" before this process finishes
We currently set a CPU request and limit of 2000m; changing this to 3000m reduces the issue, but latency still spikes to around 300-400ms, which is not acceptable (at most 100ms would be great, 50ms ideal)
The memory is set to 2 GB; changing this to 3 GB has no significant impact
The pods allocate 200-300 MB/s during peak load; the GC activity does not seem abnormal to me
Switching between GCs (G1 and ZGC) has no impact
We are experiencing pod throttling of around 25-50% (calculated via Kubernetes metrics) while the pod CPU usage is around 40-50%
New pods struggle to handle 200-300 requests / sec even though we warm up; curiously enough, some pods suffer for long periods. All external factors have been analyzed, and disabling most baggage has no impact (this includes testing with tracing disabled, metric collection disabled, Kafka integration disabled, and verifying our database load is not maxing out - it sits at around 20-30% CPU usage while network and memory usage are far lower)
The throttling is observed in custom load tests which replicate the warmup requests described above
Connecting with VisualVM during the load tests and checking the CPU time spent yields no striking issues
This is all running on AWS's managed Kubernetes service
All the nodes in our cluster are of the same type (AWS c5.2xlarge)
Any tools / avenues to investigate are appreciated - thank you! I am still puzzled why my service is getting throttled although its CPU usage is way below 100%. Our nodes are also not affected by the old kernel CFS bug from before kernel 5.6 (I'm not entirely sure in which version it got fixed, but our nodes' kernel version is very recent).
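In case it helps anyone digging into the same thing, this is roughly how we cross-check the throttling from inside a pod, independent of the Kubernetes metrics pipeline (the cgroup paths below are the usual v1/v2 locations; treat them as assumptions for your nodes):

```python
# Read the CFS throttling counters the kernel exposes to the container.
# cgroup v1: /sys/fs/cgroup/cpu/cpu.stat, cgroup v2: /sys/fs/cgroup/cpu.stat
from pathlib import Path

def read_cpu_stat() -> dict:
    for candidate in ("/sys/fs/cgroup/cpu/cpu.stat", "/sys/fs/cgroup/cpu.stat"):
        path = Path(candidate)
        if path.exists():
            stats = {}
            for line in path.read_text().splitlines():
                key, value = line.split()
                stats[key] = int(value)
            return stats
    raise FileNotFoundError("no cpu.stat found; unexpected cgroup layout")

stats = read_cpu_stat()
periods = stats.get("nr_periods", 0)
throttled = stats.get("nr_throttled", 0)
print(f"throttled in {throttled}/{periods} CFS periods "
      f"({100 * throttled / max(periods, 1):.1f}%)")
```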
In the end this all boiled down to missing one part of the equation: I/O bounds.
Imagine one request takes 10 DB calls, each taking 3 milliseconds to fulfill (including network latency etc.). A single request then spends 10 * 3 = 30 milliseconds on I/O. One thread can therefore handle at most 1000 ms / 30 ms = 33.33 requests / second. Now if one service instance uses 10 threads to handle requests, we get 333.3 requests / second as our upper bound of throughput. We can't get any faster than this because we are I/O-bottlenecked with respect to our thread count.
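The same arithmetic as a tiny sketch, in case you want to plug in your own numbers:

```python
# Back-of-the-envelope I/O ceiling: how many requests/s can one instance serve
# when every request spends db_calls_per_request * call_latency_ms waiting on the DB?
def io_bound_throughput(threads: int, db_calls_per_request: int, call_latency_ms: float) -> float:
    io_ms_per_request = db_calls_per_request * call_latency_ms   # 10 * 3 = 30 ms
    per_thread = 1000.0 / io_ms_per_request                      # ~33.3 req/s per thread
    return threads * per_thread                                  # ~333 req/s for 10 threads

print(io_bound_throughput(threads=10, db_calls_per_request=10, call_latency_ms=3.0))
```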
And this leaves out multiple factors like:
thread pool size vs. db connection pool size
our service doing non-db related tasks (actual logic, json serialization when the response get fulfilled)
database capacity (was not an issue for us)
TL;DR: You can't get faster when you are I/O bottlenecked, no matter how much CPU you provide. I/O has to improve if you want a single service instance to have more throughput; this is mostly a matter of sizing the DB connection pool relative to the thread pool relative to the number of DB calls per request. We missed this basic (and well known) relation between resources!

Kubernetes number of replicas vs performance

I have just gotten into Kubernetes and am really liking its ability to orchestrate containers. I had assumed that when the app starts to grow, I could simply increase the replicas to handle the demand. However, now that I have run some benchmarks, the results confuse me.
I am running Laravel 6.2 w/ Apache on GKE with a single g1-small machine as the node. I'm only using NodePort service to expose the app since LoadBalancer seems expensive.
The benchmarking tools used are wrk and ab. When the replica count is increased to 2, requests/s somehow drop. I would expect requests/s to increase, since there are 2 pods available to serve requests. Is there a bottleneck occurring somewhere, or is my understanding flawed? I hope someone can point out what I'm missing.
A g1-small instance is really tiny: you get 50% utilization of a single core and 1.7 GB of RAM. You don't describe what your application does or how you've profiled it, but if it's CPU-bound, then adding more replicas of the process won't help you at all; you're still limited by the amount of CPU that GCP gives you. If you're hitting the memory limit of the instance that will dramatically reduce your performance, whether you swap or one of the replicas gets OOM-killed.
The other thing that can affect this benchmark is that, sometimes, for a limited time, you can be allowed to burst up to 100% CPU utilization. So if you got an instance and ran the first benchmark, it might have used a burst period and seen higher performance, but then re-running the second benchmark on the same instance might not get to do that.
In short, you can't just crank up the replica count on a Deployment and expect better performance. You need to identify where in the system the actual bottleneck is. Monitoring tools like Prometheus that can report high-level statistics on per-pod CPU utilization can help. In a typical database-backed Web application the database itself is the bottleneck, and there's nothing you can do about that at the Kubernetes level.

AWS AutoScaling not working / CPU Utilization stays sub 30%

I have setup AWS AutoScaling as following:
1) created a Load Balancer and registered one instance with it;
2) added Health Checks to the ELB;
3) added 2 Alarms (roughly what the boto3 sketch after this list sets up):
- CPU usage > 60% for 60s, spin up 1 instance;
- CPU usage < 40% for 120s, spin down 1 instance;
4) wrote a jMeter script to send traffic to the website in question: 250 threads, 200 seconds ramp up time, loop count 5.
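For reference, the scale-up alarm from step 3 looks roughly like this (sketched with boto3; the Auto Scaling group name and policy ARN are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm: average CPU > 60% over one 60-second period fires the scale-up policy.
cloudwatch.put_metric_alarm(
    AlarmName="scale-up-on-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],   # placeholder
    Period=60,
    EvaluationPeriods=1,
    Threshold=60.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn-of-the-scale-up-policy"],                        # placeholder policy ARN
)
```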
What I am seeing is very strange.
I expected the CPU usage to shoot up with the higher number of users. But instead the CPU usage stays between 20-30% (which is why the new instance never fires up), and the running instance starts throwing timeout errors once it handles anything more than 100 users.
I am at a loss to understand why CPU usage is so low when the website is in fact timing out.
Ideas?
This could be a problem with the ELB. The ELB does not scale very quickly; it takes a consistent amount of traffic to the ELB to let Amazon know you need a bigger one. Hitting it really hard all at once does not help it scale. So the ELB could be having problems handling all the connections.
Is this SSL? Are you doing SSL on the ELB? That would add overhead to an under-scaled ELB as well.
I would honestly recommend not using ELB at all. HAProxy is a much better product and much faster in most cases. I can elaborate if needed, but just look at how Amazon handles the CNAME vs what you can do with HAProxy...
It sounds like you are testing Auto Scaling to ensure it will work for your needs. As a first pass, simply to see whether Auto Scaling will launch a new instance, try reducing your scale-up CPU threshold to trigger at 25%. I realize this is a lot lower than you are hoping to use moving forward, but it will help validate that your initial configuration is working.
As a second step, you should take a look at your application and see if CPU is the best metric to have AS monitor for scaling. It is possible that you have a bottleneck somewhere else in your app that may not necessarily be CPU related (web server tuning, memory, databases, storage, etc). You didn't mention what type of content you're serving out; is it static or generated by an interpreter (like PHP or something else)? You could also send your own custom metric data into CloudWatch and use this metric to trigger the scaling.
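As a rough (hypothetical) example of the custom-metric route, you could publish whatever actually tracks your bottleneck and alarm on that instead of CPUUtilization:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom metric (names and values here are illustrative placeholders).
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "ActiveRequests",     # whatever best reflects your real bottleneck
        "Value": 137.0,
        "Unit": "Count",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
    }],
)
```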
You may also want to time how long it takes for an instance to be ready to serve traffic from a cold start. If it takes longer than 60 seconds, you may want to adjust your monitoring threshold time appropriately (or set cool down periods). As chantheman pointed out, it can take some time for the ELB to register the instance as well (and a longer amount of time if the new instance is in a different AZ).
I hope all of this helps.
What we discovered is that when you use autoscaling with t2 instances under heavy load, those instances run out of CPU credits and are then limited to 20% of CPU (from the monitoring point of view; htop inside the instance still shows 100%). Internally they are at maximum load.
This sends a misleading metric to Auto Scaling, so new instances will not fire up.
You need to change the metric, publish your own, or move to m-type instances.
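A quick way to confirm whether credits are the culprit is to graph CPUCreditBalance next to CPUUtilization; a minimal boto3 sketch (the instance ID is a placeholder):

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

# CPUCreditBalance drops toward 0 when a t2 instance is throttled to its baseline.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```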

max concurrent connection to amazon load balancer

My testing shows that the Amazon load balancer resets connections with its instances when it has about 10k concurrent connections into it. Is that a limit of the Amazon load balancer? If not, is there a setting for it? I need to support up to 1M concurrent connections for my testing.
Thanks,
Sean Nguyen
The ELB should scale way beyond that, but you need to be testing from multiple test clients that appear to come from unique source IPs. This will cause the ELB to spawn multiple instances behind the scenes (which can be detected by DNS lookups). This is explained in the whitepaper that RightScale published:
http://blog.rightscale.com/2010/04/01/benchmarking-load-balancers-in-the-cloud/
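The DNS-lookup trick looks roughly like this (the hostname is a placeholder); as the ELB scales out, the lookup starts returning more addresses:

```python
import socket

# Resolve the ELB's DNS name and count the distinct addresses it returns.
elb_hostname = "my-elb-1234567890.us-east-1.elb.amazonaws.com"  # placeholder

addresses = {
    info[4][0]
    for info in socket.getaddrinfo(elb_hostname, 80, proto=socket.IPPROTO_TCP)
}
print(f"{len(addresses)} ELB node(s): {sorted(addresses)}")
```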
Note that it takes a little while for ELB resources to scale out, so tests need to run for 20 minutes or more.
You also need to be sure that you have enough resources behind the load balancer. EC2 instances (as shown in the whitepaper mentioned above) seem to hit a throughput limit of around 100k packets per second, which limits the number of concurrent connections that can be served (bear in mind the overhead of TCP and HTTP). You will need a lot of instances to be able to cope with 1M concurrent connections, and I'm not sure at what point you will hit the limit of ELB; in RightScale's test they only hit 19k.
Also, you need to be clear about exactly what you mean by 1M concurrent connections: do you mean total keep-alive connections (assuming keep-alive is enabled), or do you mean 1M transactions per second?
