Best way to find out and set application resource limits/requests on Kubernetes

Hope you can help me with this!
What is the best approach to determine and set resource requests and limits per pod?
I was thinking of defining an expected amount of traffic and writing some load tests, then starting a single pod with some "low limits" and running the load tests until it gets OOMed, then tuning memory upward (something like overclocking) until I find a bottleneck, then attacking CPU the same way until everything is "stable", and so on. I would then use that "limit" as the "request value" and use double the "request value" as the "limit" (or a safe value based on the results). Finally, I would scale out to a fixed number of pods for average traffic and set pod autoscaling rules for peak production values.
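For illustration, here is a minimal sketch of that tuning loop with vegeta and kubectl (the app name, URL, and rate are placeholders, not real values from my setup):

```
# Hypothetical sketch: app name, URL, rate, and duration are placeholders.
echo "GET http://myapp.example.com/" \
  | vegeta attack -rate=100 -duration=60s \
  | vegeta report

# Did the pod get OOMKilled under that load?
kubectl get pods -l app=myapp \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'

# Peak consumption during the run (requires metrics-server)
kubectl top pods -l app=myapp
```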
Is this a good approach? What tools and metrics do you recommend? I'm using prometheus-operator for monitoring and vegeta for load testing.
What about vertical pod autoscaling? Have you used it? Is it production-ready?
BTW: I'm using the AWS managed solution, deployed with a Terraform module.
Thanks for reading

I usually start my pods with neither limits nor requests set. Then I leave them running for a while under normal load to collect metrics on resource consumption.
I then set memory and CPU requests to +10% of the maximum consumption I observed during the test period, and limits to +25% of the requests.
This is just an example strategy, as there is no one-size-fits-all approach for this.
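As a concrete (hypothetical) sketch of that workflow, with made-up numbers following the percentages above:

```
# Watch consumption under normal load for a while (requires metrics-server)
kubectl top pods -l app=myapp

# Suppose the max observed was ~230m CPU and ~460Mi memory:
# requests ~ max + 10%, limits ~ requests + 25%
kubectl set resources deployment/myapp \
  --requests=cpu=250m,memory=512Mi \
  --limits=cpu=312m,memory=640Mi
```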

The VerticalPodAutoscaler is more about making sure that a Pod can run: it starts the Pod low and doubles its memory each time it gets OOMKilled. This can potentially lead to a Pod hogging resources. It is also limited in that it doesn't take under-performance into account: if your app is under-resourced, it might still respond, but not within a timeframe you consider acceptable.
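If you want to experiment with it, the VPA can be run in recommendation-only mode so it never evicts pods; a minimal sketch, assuming the VPA components are installed and a Deployment named myapp exists:

```
# Recommendation-only mode ("Off") computes suggestions without evicting pods.
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"
EOF

# Read the request recommendations it produces
kubectl describe vpa myapp-vpa
```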
I think you are taking a good approach, as you are looking at the application under load and assessing what it needs to perform the way you want it to. I doubt I can suggest any tools you aren't already aware of, but if it helps, there is some more discussion in "How to set the right cpu millicores for a container?" and the threads that link from it.

Related

How to tell if I'm hitting the limits of JMeter & my hardware?

I have seen some questions about the limits of JMeter, like "What is the highest number of threads?" and "What are the physical limits of JMeter?". As some answers indicate, there's no specific limit to JMeter itself, but rather to particular JMeter configurations on specific hardware setups. However, folks do indicate there are limits and give tips on how to optimize.
My question is more basic: "How can I tell if I'm hitting the limits of my client (JMeter + hardware)?"
I'm not talking about OOM errors (like those described in this blog post), which are pretty obvious, but rather about whether JMeter is lagging. In the aggregate report I can see throughput, and I could also count the number of responses received in the CSV output and divide by time. Should I just check whether that equals my desired QPS? Achieving a desired QPS in JMeter generally seems trickier than just blasting the server with users, though, and the math from number of users to QPS seems a bit tricky.
Finally, how can I tell whether it's my server lagging or JMeter lagging? I'm wondering if I can test with some simple static webpage first to confirm JMeter's behavior, and then test my actual server. Any recommendations for a simple static page that can take a high amount of QPS?
Apologies if that's too many questions, but feel free to ask for more details or only answer the primary "how to tell if I'm hitting limits" question.
JMeter doesn't have many "limits", or at least they're too high to worry about: you can kick off as many as 2,147,483,647 threads, provided the underlying hardware/OS allows it and JMeter is properly configured.
The easiest solution is switching to JMeter's Distributed Mode of execution: if you're "hitting the limits of JMeter", then when you add another instance of JMeter as an extra load generator, the throughput should go up, given the server is capable of handling the load.
Another option is, first of all, making sure that you're following JMeter Best Practices, and setting up monitoring of baseline resource usage (CPU, RAM, network, disk, etc.) on the machine where JMeter is running. If any monitored metric exceeds, say, 90% of maximum available capacity, you're "hitting the limits" of the machine where JMeter is running.
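For reference, a distributed non-GUI run looks roughly like this (host and file names are placeholders):

```
# Non-GUI run fanned out to two remote jmeter-server load generators;
# host names and file names are hypothetical.
jmeter -n -t test-plan.jmx -R loadgen1,loadgen2 -l results.jtl

# Meanwhile, watch CPU/RAM/disk on each load generator, e.g.:
vmstat 5
```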
I was able to confirm my setup worked by checking that the Aggregate Report's throughput metric reached my desired QPS. Initially, when I did not reach the desired throughput while testing against my server, I was not able to confirm whether the problem was my server or my load-testing setup.
To confirm the load testing worked, I swapped the load test to hit a very simple 'hello world' service with an excess of resources. Here, the desired throughput was met.
For reference on the actual setup: I ran JMeter on a 5.2xlarge EC2 instance with 8 vCPUs, up to 10 Gbps of network bandwidth, and 16 GiB of RAM, and reached 1K QPS. I have yet to see how much further I can push this particular setup.

Kubernetes number of replicas vs performance

I have just gotten into Kubernetes and really like its ability to orchestrate containers. I had assumed that when the app starts to grow, I could simply increase the replicas to handle the demand. However, now that I have run some benchmarks, the results confuse me.
I am running Laravel 6.2 with Apache on GKE, with a single g1-small machine as the node. I'm only using a NodePort service to expose the app, since LoadBalancer seems expensive.
The benchmarking tools used are wrk and ab. When the replica count is increased to 2, requests/s somehow drops. I would expect requests/s to increase, since there are two pods available to serve requests. Is there a bottleneck occurring somewhere, or is my understanding flawed? I hope someone can point out what I'm missing.
A g1-small instance is really tiny: you get 50% utilization of a single core and 1.7 GB of RAM. You don't describe what your application does or how you've profiled it, but if it's CPU-bound, then adding more replicas of the process won't help you at all; you're still limited by the amount of CPU that GCP gives you. If you're hitting the memory limit of the instance that will dramatically reduce your performance, whether you swap or one of the replicas gets OOM-killed.
The other thing that can affect this benchmark is that, sometimes, you are allowed to burst up to 100% CPU utilization for a limited time. So if you got an instance and ran the first benchmark, it might have used up a burst period and seen higher performance, while re-running the second benchmark on the same instance might not get to do that.
In short, you can't just crank up the replica count on a Deployment and expect better performance. You need to identify where in the system the actual bottleneck is. Monitoring tools like Prometheus that can report high-level statistics on per-pod CPU utilization can help. In a typical database-backed Web application the database itself is the bottleneck, and there's nothing you can do about that at the Kubernetes level.
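For example, two quick ways to look at per-pod CPU (the metric below is the standard cAdvisor one; the label selector and pod-name pattern are assumptions):

```
# Snapshot of per-pod usage (requires metrics-server)
kubectl top pods

# Example PromQL for per-pod CPU over the last 5 minutes, using the
# standard cAdvisor metric; the pod-name pattern is made up:
#   sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"laravel-.*"}[5m]))
```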

Distributed calculation on Cloud Foundry with the help of auto-scaling

I have a computation-intensive, long-running task. It can easily be split into sub-tasks, and it would also be fairly easy to aggregate the results later on. For example, Map/Reduce would work well.
I have to solve this on Cloud Foundry, and there I want to take advantage of auto-scaling, that is, the creation of additional instances due to high CPU load. Normally I use Spring Boot for developing my CF apps.
Any ideas are welcome on how to divide and conquer in an elastic way on CF. It would be great to have as many instances created as CF will provide, without needing to configure the number of available application instances in the application. I also need to trigger the creation of instances by loading the CPUs to provoke auto-scaling.
I have to solve this on Cloud Foundry
It sounds like you're on the right track here. The main thing is that you need to write your app so that it can coexist with multiple instances of itself (or perhaps break it into a primary node that coordinates work and multiple worker apps). However you architect the app, being able to scale up instances is critical. You can then simply cf scale to add or remove nodes and increase capacity.
If you wanted to get clever, you could set up a pipeline to run your jobs. Step one would be to scale up the worker nodes of your app, step two would be to schedule the work to run, step three would be to clean up and scale down your nodes.
I'm suggesting this because manual scaling is going to be the simplest path forward (please read on for why).
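A sketch of that pipeline in plain cf CLI terms (the app name and instance counts are hypothetical):

```
# Step 1: scale the workers up before the job runs
cf scale myapp-worker -i 10

# Step 2: schedule/trigger the work (application-specific)

# Step 3: scale back down once the batch is done
cf scale myapp-worker -i 1
```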
and there I want to take advantage of auto-scaling, that is, the creation of additional instances due to high CPU load
As for autoscaling, I think it's possible, but I also think it's making the problem more complicated than it needs to be. Autoscaling by CPU on Cloud Foundry is not as simple as it seems: because of the way Linux reports CPU usage, you can exceed 100%, since it's 100% per CPU core. Pair this with the fact that you may not know how many CPU cores are on your Cells (for example, if you're using a public CF provider) and the fact that the number of cores could change over time (if your provider changes hardware), and it becomes difficult to know at what point you should scale your application.
If you must autoscale, I would suggest trying to autoscale on some other metric. Which metrics are available will depend on the autoscaler tool you are using. The best case would be if you could use some custom metric; then you could use work-queue length or something else relevant to your application. If custom metrics are not supported, you could always hack together your own autoscaler that works with metrics relevant to your application (you can scale up and down by adjusting the instance count of your app using the CF API).
You might also be able to hack together a solution based on the metrics that your autoscaler does provide. For example, you could artificially inflate a metric that your autoscaler does support in proportion to the workload you need to process.
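As a very rough sketch of such a hand-rolled autoscaler (get-queue-depth stands in for whatever exposes your application's metric; the thresholds are invented):

```
# Hypothetical DIY autoscaler: poll a work-queue depth and scale on it.
# get-queue-depth is a placeholder for your own metric source.
while true; do
  depth=$(get-queue-depth)
  if [ "$depth" -gt 1000 ]; then
    cf scale myapp-worker -i 10
  elif [ "$depth" -lt 100 ]; then
    cf scale myapp-worker -i 2
  fi
  sleep 60
done
```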
You could also just scale up when your work day starts and scale down at the end of the day. It's not dynamic, but it's simple, and it will get you some efficiency improvements.
Hope that helps!

What are the correct CloudWatch/Auto Scaling settings for extremely short traffic spikes on Amazon Web Services?

I have a site running on Amazon Elastic Beanstalk with the following traffic pattern:
~50 concurrent users normally.
~2000 concurrent users for 1-2 minutes when a post is made to the Facebook page.
Amazon Web Services claims to be able to scale rapidly to meet challenges like this, but the "greater than x for more than 1 minute" CloudWatch setup doesn't appear to be fast enough for this traffic pattern.
Usually within seconds all the EC2 instances crash, killing all CloudWatch metrics, and the whole site is down for 4-6 minutes. So far I've yet to find a configuration that works for this scenario.
Here is the graph of a smaller event that also killed the site: [graph omitted]
Are these links posted predictably? If so, you can use Scaling by Schedule; alternatively, you might change the DESIRED-CAPACITY value of the Auto Scaling group, or even trigger as-execute-policy to scale out right before your link is posted.
Did you know you can have multiple scaling policies in one group? You might have a special Auto Scaling policy for your case, something like SCALE_OUT_HIGH, which adds, say, 10 more instances at once. Take a look at the as-put-scaling-policy command.
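With the current AWS CLI, those two suggestions look roughly like this (the group name, policy name, and timestamp are placeholders):

```
# Scale out on a schedule, just before the post goes up
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name pre-post-scale-out \
  --start-time 2015-06-01T17:55:00Z \
  --desired-capacity 20

# A "big step" policy that adds 10 instances at once
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name SCALE_OUT_HIGH \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 10
```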
Also, you need to check your code and find the bottlenecks.
Which HTTPD do you use? Consider switching to Nginx, as it's much faster and less resource-hungry than Apache. Try using Memcached; a NoSQL store like Redis is a fine option for high reads and writes as well.
The suggestion from AWS was as follows:
We are always working to make our systems more responsive, but it is challenging to provision virtual servers automatically with a response time of a few seconds as your use case appears to require. Perhaps there is a workaround that responds more quickly or that is more resilient when requests begin to increase.
Have you observed whether the site performs better if you use a larger instance type or a larger number of instances in the steady state? That may be one method to be resilient to rapid increases in inbound requests. Although I recognize it may not be the most cost-effective, you may find this to be a quick fix.
Another approach may be to adjust your alarm to use a threshold or a metric that would reflect (or predict) your demand increase sooner. For example, you might see better performance if you set your alarm to add instances after you exceed 75 or 100 users. You may already be doing this. Aside from that, your use case may have another indicator that predicts a demand increase, for example a posting on your Facebook page may precede a significant request increase by several seconds or even a minute. Using CloudWatch custom metrics to monitor that value and then setting an alarm to Auto Scale on it may also be a potential solution.
So I think the best answer is to run more instances at lower traffic and use custom metrics to predict traffic from an external source. I am going to try, for example, monitoring Facebook and Twitter for posts with links to the site and scaling up straight away.
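A sketch of pushing such an external signal as a CloudWatch custom metric (the namespace and metric name are invented):

```
# Hypothetical: a watcher spots a new post linking to the site and
# publishes a custom metric; a CloudWatch alarm on this metric can then
# trigger the scale-out policy immediately.
aws cloudwatch put-metric-data \
  --namespace "Site/Traffic" \
  --metric-name PostSpikeExpected \
  --value 1
```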

How to do load testing using JMeter and VisualVM?

I want to do load testing with 10 million users for my site. The site is a Java-based web app. My approach is to create a JMeter test plan for all the links and then take a report for the 10 million users. Then use JVisualVM to do profiling and check whether there are any bottlenecks.
Is there any better way to do this? Is there any existing demo for doing this? I am doing this for the first time, so any assistance will be very helpful.
You are on the correct path, but your load estimate is off by a large factor.
I say this because your site will probably need more machines to handle 10 million concurrent users. A single process alone would struggle to handle even 32K concurrent TCP streams. Also, do some math on the bandwidth it would take to actually handle 10 million users.
Now, I don't know what kind of service you're thinking of providing on your site, but considering that JVisualVM slows down processing by a factor of 10 (or more with method tracing), you would not actually be measuring the "real world" if you ran JMeter and JVisualVM at the same time.
JVisualVM is more useful when you run at lower loads.
To create a good measurement, first make sure you have a good baseline.
Make a test with 10 concurrent users, connect JVisualVM, let it run for a while, and note down all the interesting values.
After you have your baseline, you can start adding more load.
Add 10 times the load (e.g. 100 users) and look at the changes in JVisualVM. Continue this until it becomes obvious that JVisualVM is slowing you down; each time you add extra load, make sure you have written down the numbers you are interested in. Plot the numbers in a graph.
Now... interpolate the graph (by hand) for the number of users you want. This works for memory usage, disk access, etc., but not for CPU time used, because JVisualVM will eat CPU and give you invalid numbers for that (especially if you have method tracing turned on).
If you really want to go as high as 10 million users, I would not trust JMeter either; I would write a little test program of my own that performs the tests you want. This would be okay, since setting up the site to handle 10 million users will also take time, so spending a little extra time on the test tools is not a waste.
Just because you have 10 million users in the database doesn't mean that you need to load test with that many users. Think about it: is your site really going to have 10 million simultaneous users? For web applications, a ratio of 1:100 concurrent to registered users is common, i.e. you are unlikely to have more than 100K users at any moment.
Can JMeter handle that kind of load? I doubt it. Please try Faban instead. It is very lightweight and can support thousands of users on a single VM. You also have much better flexibility in creating your workload, and you can automate monitoring of your entire test infrastructure.
Now to the analysis part. You didn't say which server you are using. Any Java app server will provide sufficient monitoring support. Commercial servers provide nice GUI tools, while Tomcat provides extensive monitoring via JMX. You may want to start there before getting down to the JVM level.
For the JVM, you really don't want to use VisualVM while running such a large performance test. Besides, to support such a load, I assume you are using multiple app server/JVM instances. The major performance issue is usually GC, so use the JVM options to collect and log GC information. You will have to post-process the data.
This is a non-trivial exercise - good luck!
There are two types of load testing: bottleneck identification and throughput measurement. The question leads me to believe this is about bottlenecks, so the number of users is something of a red herring; instead, the goal is, for a given configuration, to find areas that can be improved to increase concurrency.
Application bottlenecks usually fall into three categories: database, memory leak, or slow algorithm. Finding them involves putting the application in question under stress (i.e. load) for an extended period of time: at least an hour, perhaps up to several days. JMeter is a good tool for this purpose. One thing to consider is running the same test with cookie handling enabled (i.e. JMeter retains cookies and sends them with each subsequent request) and disabled; sometimes you get very different results, and this is important because the latter is effectively a simulation of what some crawlers do to your site. Details for bottleneck detection follow:
Database
Tables without indices or SQL statements involving multiple joins are frequent application bottlenecks. Every database server I've dealt with (MySQL, SQL Server, and Oracle) has some way of logging or identifying slow-running SQL statements. MySQL has the slow query log, whereas SQL Server has dynamic management views that track the slowest-running SQL. Once you've got your hands on the slow statements, use explain plan to see what the database engine is trying to do, use any features that suggest indices, and consider other strategies, such as denormalization, if those two options do not solve the bottleneck.
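For example, on MySQL the slow query log can be enabled at runtime; a sketch with illustrative thresholds (the log path varies by installation):

```
# MySQL: log every statement taking longer than 1 second
mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1;"

# Summarize the worst offenders, sorted by total time
mysqldumpslow -s t /var/log/mysql/mysql-slow.log
```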
Memory Leak
Turn on verbose garbage collection logging and open a JMX monitoring port. Then use jConsole, which provides much better graphs, to observe trends. In particular, leaks usually show up as the Old Gen or Perm Gen spaces filling up. A leak becomes a bottleneck when the JVM spends increasing amounts of time attempting garbage collection, unsuccessfully, until an OutOfMemoryError is thrown.
Growth in Perm Gen implies the need to increase that space via a command-line parameter to the JVM, while growth in Old Gen implies a leak: stop the load test, generate a heap dump, and then use the Eclipse Memory Analyzer (MAT) to identify the leak.
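On a Java 8-era JVM, the relevant startup flags look roughly like this (the port, file, and jar names are examples; the JMX settings shown are insecure and for test environments only):

```
# Verbose GC logging plus an insecure, test-only JMX port for jConsole
java -verbose:gc -Xloggc:gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar myapp.jar
```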
Slow Algorithm
This is more difficult to track down. The most frequent offenders are synchronization, inter-process communication (e.g. RMI, web services), and disk I/O. Another common issue is code using nested loops (look mom, O(n^2) performance!).
The best way I've found to locate these issues, absent some deeper knowledge, is generating stack traces. These will tell you what all the threads are doing at a given point in time. What you're looking for are BLOCKED threads or several threads all accessing the same code. This usually points at some slowness within the codebase.
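Thread dumps are easy to capture with the standard JDK tools; a sketch (&lt;pid&gt; and the file name are placeholders):

```
# Find the JVM's PID, then dump all thread stacks
jps -l
jstack -l <pid> > threads.txt

# A quick look for contention
grep -c "BLOCKED" threads.txt
```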
I blogged about the way I proceeded with the performance test:
1. Make sure that the server (hardware can be as per the staging/production requirements) has no other installations that can affect its performance.
2. For setting up the users in the DB, a procedure can be used and called as part of the JMeter test plan.
3. Install JMeter on a separate machine, so that JMeter itself won't affect the performance.
4. Create a test plan in JMeter for all the URIs, with response checking and timer-based requests.
5. Take the initial benchmark using JMeter.
6. Check for the low-performing URIs. These are the places to expect bottlenecks.
7. Try different options for performance improvement, but focus on only one bottleneck at a time.
8. Try any one fix from step 7 and then take a benchmark. If there is any improvement, commit the changes and repeat from step 6. Otherwise, revert and try another option from step 7.
9. The next step would be to use load balancing, hardware scaling, clustering, etc. This may involve some physical setup and hardware/software cost. Present the results along with these scalability options.
For detailed explanation: http://www.daemonthread.com/2011/06/site-performance-tuning-using-jmeter.html
I started using JMeter plugins.
These allow me to gather application metrics available over JMX and use them in my load test.
