Kubernetes | Multiple pods - performance problem

I'm using a Kubernetes cluster for a web app, but I'm running into a problem when the pods start to scale.
More pods -> slower app (every click takes longer).
From my point of view, there is a problem with caches. I'm trying to solve it with a volume or persistent volume that all the pods share, but the result is still the same: it seems like every pod wants to create its own cache.
Is there any solution other than redesigning the code?

For the cache issues, have you considered:
An ingress controller like nginx to cache static content and serve it directly? https://medium.com/#vdboor/using-nginx-ingress-as-a-static-cache-91bc27be04a1
Maybe a CDN, if the cached content is not private or dynamic in nature?
With an increasing number of pods, increasing response times doesn't sound to me like a cache issue, or at least not the cache alone. The web server plays a big part, or the load balancer and/or the firewall sitting in front is capping the bandwidth. The round trip from browser to pod and back should be the same whether you have 1 or 100 pods, provided there is no network latency. In your case, increased traffic is slowing the connection speed. I have had similar issues with the network capping bandwidth in front of the pods.
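To make that diagnosis concrete, here is a minimal timing sketch (not part of the original answer; it assumes Python with the `requests` package and a placeholder URL). Run it at each replica count: if the median climbs as you add pods, the bottleneck is in front of the pods (load balancer / firewall / shared backend), not in a per-pod cache.

import time
import statistics
import requests  # assumption: the app exposes a plain HTTP endpoint reachable from the test box

URL = "http://my-app.example.com/health"  # hypothetical endpoint; replace with your service/ingress URL

def sample_latency(url, n=50):
    """Time n sequential requests and return (median, approx. 95th percentile) in milliseconds."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(url, timeout=10)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return statistics.median(timings), timings[int(0.95 * len(timings)) - 1]

# Run once per replica count (e.g. after `kubectl scale deploy my-app --replicas=N`)
# and compare the numbers between runs.
median_ms, p95_ms = sample_latency(URL)
print(f"median={median_ms:.1f} ms  p95={p95_ms:.1f} ms")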

Related

Panels to have in Kibana dashboard for troubleshooting applications

What are some good panels to have in a Kibana visualisation for developers to troubleshoot issues in applications? I am trying to create a dashboard that developers could use to pinpoint where the app is having issues, so that they can resolve them. These are a few factors that I have considered:
CPU usage of the pod, memory usage of the pod, network in and out, and application logs are the ones I have in mind. Are there any other panels I could add so that developers could get an idea of where to check when something goes wrong in the app?
For example, application slowness could be caused by high CPU consumption, the app going down could be caused by an OOM kill, requests taking longer could be due to latency or cache issues, etc. Is there anything else I should take into consideration? If yes, please suggest.
So here are a few things that we could add (a short sketch of how to collect the counts for item 1 follows the list):
1. Number of pods, deployments, daemonsets and statefulsets present in the cluster
2. CPU utilised by each pod (pod-wise breakdown)
3. Memory utilised by each pod (pod-wise breakdown)
4. Network in/out
5. Top memory/CPU-consuming pods and nodes
6. Latency
7. Persistent disk details
8. Error logs as annotations in TSVB
9. Log streams to check logs within the dashboard
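As a rough illustration of where the numbers for item 1 could come from, here is an assumption-laden sketch using the official `kubernetes` Python client and a reachable kubeconfig; it is not tied to any particular Kibana setup, and how you ship the numbers into Elasticsearch is up to you.

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod

core = client.CoreV1Api()
apps = client.AppsV1Api()

# Object counts of the kind that would feed the "pods/deployments/daemonsets/statefulsets" panel.
counts = {
    "pods": len(core.list_pod_for_all_namespaces().items),
    "deployments": len(apps.list_deployment_for_all_namespaces().items),
    "daemonsets": len(apps.list_daemon_set_for_all_namespaces().items),
    "statefulsets": len(apps.list_stateful_set_for_all_namespaces().items),
}
print(counts)  # export these periodically as documents/metrics so Kibana can chart them over time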

Google Cloud Platform Load Balancer - The load across the pods is not even

In one of my projects, we have 9 pods for a single microservice, and during a load test run we noticed that the load distribution across the 9 pods is not even. Also, on the pods that received low traffic compared to the others, there are gaps between the requests. Has anyone faced this issue? Please advise on the areas that could cause this.
All 9 pods are hosted on different nodes under the same cluster, and we have 3 zones.
The load balancer algorithm used is round-robin.
Sample flow: microservice 1 (running in 3 pods; it uses Nginx, but not as a load balancer) -> microservice 2 (running in 9 pods; Node.js)
Another flow: microservice 1 (running in 6 pods) -> microservice 2 (running in 9 pods)
As far as Kubernetes is concerned, the LB distributes requests at the node level, not at the pod level, and it completely disregards the number of pods on each node. Unfortunately, this is a limitation of Kubernetes. You may also have a look at the last paragraph of this documentation about traffic not being equally load balanced across pods. [1]
Defining resources for containers [2] is important, as it allows the scheduler to make better decisions when it comes time to place pods on nodes. May I recommend having a look at the following documentation [3] on how pods with resource limits are run (a small sketch follows the references below). It mentions that a container is not allowed to exceed its CPU limit for extended periods of time; it will not be killed for excessive CPU usage, but it will be throttled, which can show up as decreased performance.
[1] https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer
[2] https://kubernetes.io/docs/concepts/configuration/manage-resources-containers
[3] https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run
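For illustration, here is a minimal sketch (an assumption-laden example, not from the original question) of declaring such requests and limits with the official `kubernetes` Python client; the deployment name "microservice-2", the container name "app", and the values are placeholders.

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()

# Strategic-merge patch that adds CPU/memory requests and limits to one container.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",  # must match the container name in the deployment spec
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "256Mi"},  # what the scheduler reserves
                            "limits": {"cpu": "500m", "memory": "512Mi"},    # CPU above this is throttled, not killed
                        },
                    }
                ]
            }
        }
    }
}
apps.patch_namespaced_deployment(name="microservice-2", namespace="default", body=patch)
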
Regards,
Anbu.

GKE and RPS - 'Usage is at capacity' - and performance issues

We have a GKE cluster with Ingress (kubernetes.io/ingress.class: "gce") where one backend is serving our production site.
The cluster is a regional one with 3 zones (autoscaling enabled).
The backend serving the production site is a Varnish server running as a Deployment with a single replica. Behind Varnish there are multiple Nginx/PHP pods running under a HorizontalPodAutoscaler.
The performance of the site is slow. Using the GCP console, we have noticed that all traffic is routed to only one backend, and there is only 1/1 healthy endpoint, in a single zone.
We are getting an exclamation mark next to the serving backend with the messages 'Usage is at capacity, max = 1' and 'Backend utilization: 0%'. The other backend in the second zone has no endpoint configured, and there is no third backend in the third zone.
Initially we were getting a lot of 5xx responses from the backend at around 80 RPS, so we turned on the CDN via a BackendConfig.
This reduced the 5xx responses and brought the RPS on the backend down to around 9, with around 83% of requests now served from the CDN.
We are trying to figure out whether it is possible to improve our backend utilization, as serving 80 RPS from one Varnish server with many pods behind it should clearly be achievable. We cannot find any underperforming pod (Varnish itself or nginx/php) in this scenario.
Is GKE/GCP throttling the backend/endpoint to support only 1 RPS?
Is there any way to increase the RPS per endpoint and increase the number of endpoints, to at least one per zone?
Is there any documentation available that explain how to scale such architecture on GKE/GCP?

How many users should an EC2 micro instance be able to handle with only an nginx server?

I have an iOS social app.
The app talks to my server to do updates and retrievals fairly often, mostly small text as JSON. Sometimes users will upload pictures, which my web server then uploads to an S3 bucket. No pictures or any other type of file will be retrieved from the web server.
The EC2 micro Ubuntu 13.04 instance runs PHP 5.5, PHP-FPM and NGINX. Caching is handled by ElastiCache using Redis, and the database is a separate m1.large MongoDB server. The content can be fairly dynamic, as the newsfeed changes often.
I am a total newbie with regard to configuring NGINX for performance, and I am trying to see whether I've configured my server properly or not.
I am using Siege to test my server load, but I can't find any statistics on how many concurrent users / page loads a system like mine should be able to handle, so I don't know whether I've done something right or wrong.
What number of concurrent users / page loads should my server be able to handle?
If I can't get hold of statistics from experience, what would count as easy, medium, and extreme load for my micro instance?
I am aware that there are several other questions asking similar things. But none provide any sort of estimates for a similar system, which is what I am looking for.
I haven't tried nginx on a micro instance, for the reasons Jonathan pointed out: if you consume your CPU burst you will be throttled very hard and your app will become unusable.
If you want to follow that path, I would recommend:
Try to cap CPU usage for nginx and php5-fpm to make sure you do not go over the threshold for CPU penalties. I have no idea what that threshold is. I believe the main problem with a micro instance is maintaining consistent CPU availability. If you go over the cap you are screwed.
Try to use fastcgi_cache, if possible. You want to hit php5-fpm only when really needed.
Keep in mind that gzipping on the fly will eat a lot of CPU. I mean a lot of CPU (for an instance that has almost no CPU power). If you can use gzip_static, do it. But I believe you cannot.
As for statistics, you will need to gather them yourself. I have statistics for m1.small but none for micro. Start by making nginx serve a static HTML file of just a few KB. Run Siege in benchmark mode with 10 concurrent users for 10 minutes and measure. Make sure you are sieging from a stronger machine.
siege -b -c10 -t600s 'http://<private-ip>/test.html'
You will probably see the effects of CPU throttling just by doing that! What you want to keep an eye on is the transactions per second and how much throughput nginx can serve. Keep in mind that the m1.small max is 35 MB/s, so the micro instance will be even less.
Then, move to a JSON response. Try gzipping. See how many concurrent requests per second you can get.
And don't forget to come back here and report your numbers.
Best regards.
Micro instances are unique in that they use a burstable profile. While you may get up to 2 ECUs of performance for a short period of time, once the instance uses its burst allotment it will be limited to around 0.1 or 0.2 ECU. Eventually the allotment resets and you can get 2 ECUs again.
Much of this is going to come down to how CPU/Memory heavy your application is. It sounds like you have it pretty well optimized already.
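A hedged sketch of one way to observe this throttling from inside the instance (not from the original answers): sample CPU steal time while the benchmark runs. It assumes the `psutil` package and a Linux guest, where CPU time withheld by the hypervisor shows up in the steal counter.

import time
import psutil

print("time      user%  steal%")
for _ in range(60):                           # sample once a second for a minute
    cpu = psutil.cpu_times_percent(interval=1)
    # On EC2, `steal` is CPU time the hypervisor withheld from the guest; it jumps
    # sharply once a micro instance exhausts its burst allotment.
    print(f"{time.strftime('%H:%M:%S')}  {cpu.user:5.1f}  {cpu.steal:6.1f}")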

EC2 for handling demand spikes

I'm writing the backend for a mobile app that does some CPU-intensive work. We anticipate the app will not have heavy usage most of the time, but will have occasional spikes of high demand. I was thinking that we should reserve a couple of 24/7 servers to handle the steady state of low-demand traffic, and then add and remove EC2 instances as needed to handle the spikes. The mobile app will first hit a simple load-balancing server that does a simple round-robin distribution of users among all the available processing servers. The load balancer will handle bringing new EC2 instances up and turning them back off as needed.
Some questions:
I've never built something like this before; does this sound like a good strategy?
What's the best way to handle bringing new EC2 instances up and back down? I was thinking I could just create X instances ahead of time, set them up as needed (install software, etc.), and then stop each instance. The load balancer would then start and stop the instances as needed (e.g. through boto). I think this should be a lot faster and easier than trying to create new instances and install everything through a script. Good idea?
One thing I'm concerned about here is the cost of turning EC2 instances off and back on again. I looked at the AWS Usage Report and had difficulty interpreting it. I could see starting a stopped instance being a potentially costly operation, but it seems like, since I'm just starting a stopped instance rather than provisioning a new one from scratch, it shouldn't be too bad. Does that sound right?
This is a very reasonable strategy. I used it successfully before.
You may want to look at Elastic Load Balancing (ELB) in combination with Auto Scaling. Conceptually the two should solve this exact problem.
Back when I did this around 2010, ELB had some problems with certain types of HTTP requests that prevented us from using it. I understand those issues are resolved.
Since ELB was not an option, we manually launched instances from EBS snapshots as needed and manually added them to an NGinX load balancer. That certainly could have been automated using the AWS APIs, but our peaks were so predictable (end of month) that we just tasked someone to spin up the new instances and didn't get around to automating the task.
When an instance is stopped, I believe the only cost that you pay is for the EBS storage backing the instance and its data. Unless your instances have a huge amount of data associated, the EBS storage charge should be minimal. Perhaps things have changed since I last used AWS, but I would be surprised if this changed much if at all.
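For reference, here is a minimal sketch of the start/stop approach described in the question, written with boto3 (the current successor to boto); the region and instance IDs are placeholders for instances you have already provisioned, configured, and stopped ahead of time.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumption: region and credentials already configured

STANDBY_INSTANCES = ["i-0123456789abcdef0", "i-0fedcba9876543210"]  # hypothetical pre-built instances

def scale_up(instance_ids):
    """Start pre-provisioned instances and wait until they are running."""
    ec2.start_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)

def scale_down(instance_ids):
    """Stop the instances again; while stopped you pay only for their EBS storage."""
    ec2.stop_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)

# e.g. called by the load-balancing layer when demand spikes:
# scale_up(STANDBY_INSTANCES)
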
First, with regard to costs: whether an instance is started from scratch or from a stopped state has no impact on cost. You are billed for the amount of compute you use over time, period.
Second, what you are looking to do is called autoscaling. You set up a launch config that specifies the AMI you are going to use (along with any user-data config), then a scaling group that uses that launch config and specifies the ELB, the availability zones, and the minimum and maximum number of instances. Then you set up scaling policies to determine what scaling actions are attached to the group, and you attach CloudWatch alarms to each of those policies to trigger the scaling actions (a boto3 sketch of these steps follows at the end of this answer).
You don't have servers in reserve that you attach to the ELB or anything like that. Everything is based on creating a single AMI that is used as the template for the servers you need.
You should read up on autoscaling at the link below:
http://aws.amazon.com/autoscaling/
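For illustration, a hedged boto3 sketch of the steps described above (launch config, scaling group, scaling policy, CloudWatch alarm); every name, the AMI ID, and the thresholds are placeholders, and newer setups would typically use launch templates rather than launch configurations.

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# 1. Launch config: the single AMI used as the template for new servers.
autoscaling.create_launch_configuration(
    LaunchConfigurationName="app-lc",
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.large",
    UserData="#!/bin/bash\n/opt/app/start.sh\n",
)

# 2. Scaling group: baseline capacity, headroom, zones, and the ELB to attach to.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="app-asg",
    LaunchConfigurationName="app-lc",
    MinSize=2,                               # the always-on baseline
    MaxSize=10,                              # headroom for spikes
    AvailabilityZones=["us-east-1a", "us-east-1b"],
    LoadBalancerNames=["app-elb"],           # new instances receive traffic via the ELB
)

# 3. Scaling policy: what to do when the alarm fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)

# 4. CloudWatch alarm that triggers the scaling policy on sustained high CPU.
cloudwatch.put_metric_alarm(
    AlarmName="app-asg-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "app-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)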
