GOMAXPROCS for Go service in Kubernetes

I'm trying to stress test our Go service in Kubernetes.
The service is just an HTTP server that accepts requests, sends requests to another service, performs some string manipulation, and returns a response to the original request.
We started with:
cpu.requests = 1
cpu.limit = 2
Note: the host VM has 6 CPUs.
We used the following test scenario:
Repeat 20 times:
1. Send 40 parallel requests
2. Sleep for 200ms
What we observed is that GOMAXPROCS is set to 6 by default (matching the host's CPU count), and we get network I/O timeouts after some iterations of the test.
In addition, CPU consumption falls to 0 after some time (any idea what might be happening here? Does the Go runtime scheduler get stuck?)
The issue is resolved by setting GOMAXPROCS explicitly to 1.
Some basic Googling led me to articles like https://github.com/uber-go/automaxprocs/issues/12, but I haven't found many other articles or documentation that warn about this GOMAXPROCS behavior on Kubernetes.
Help appreciated:
Are there any other articles that explain how a misconfigured GOMAXPROCS affects Go services in Kubernetes?
What should we do if cpu.requests is set to 500m? Is GOMAXPROCS=1 still adequate, or does it simply mean cpu.requests must be at least 1?
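A common fix is to derive GOMAXPROCS from the container's CPU quota instead of the host's CPU count. In Kubernetes the CPU *limit* becomes a CFS quota on the container's cgroup, while the CPU *request* only sets a relative weight, so the limit is what a quota-based GOMAXPROCS follows. Below is a minimal sketch assuming cgroup v2 (the `/sys/fs/cgroup/cpu.max` file); `procsFromCPUMax` is a hypothetical helper. Rounding down and clamping to a minimum of 1 is an assumption here, mirroring what uber-go/automaxprocs does by default. Setting the `GOMAXPROCS` environment variable in the pod spec achieves the same thing statically.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// procsFromCPUMax turns the contents of a cgroup v2 cpu.max file into a
// GOMAXPROCS value. The file holds "<quota> <period>" in microseconds, or
// "max <period>" when no CPU limit is set (ok is then false).
// The quota/period ratio is rounded down and clamped to at least 1.
func procsFromCPUMax(contents string) (procs int, ok bool, err error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max contents: %q", contents)
	}
	if fields[0] == "max" {
		return 0, false, nil // unlimited: keep the runtime's default
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("invalid period in cpu.max: %q", contents)
	}
	procs = int(math.Floor(quota / period))
	if procs < 1 {
		procs = 1 // e.g. a 500m limit still needs one P to run Go code
	}
	return procs, true, nil
}

func main() {
	// Assumes cgroup v2 mounted at the usual path; cgroup v1 exposes
	// cpu.cfs_quota_us and cpu.cfs_period_us instead.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("cpu.max not readable, keeping GOMAXPROCS =", runtime.GOMAXPROCS(0))
		return
	}
	procs, ok, err := procsFromCPUMax(string(data))
	if err != nil || !ok {
		fmt.Println("no CPU limit found, keeping GOMAXPROCS =", runtime.GOMAXPROCS(0))
		return
	}
	runtime.GOMAXPROCS(procs) // call early, before the server starts its work
	fmt.Println("GOMAXPROCS set to", procs)
}
```

With the question's cpu.limit = 2 this yields 2; a 500m limit ("50000 100000") yields 1, and a container with no limit keeps the default.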

Related

What can cause a Cloud Run instance to not be reused despite continuous load?

Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
To increase throughput and decrease resource contention, I'm looking to experiment with the connection pool size per app instance, as well as the concurrency and max-instances parameters for my Cloud Run app.
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a Spring-Boot app to Cloud Run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, which leads me to believe that instances get started (cold start), handle a single request, die, and then get started again. They are not being reused as described in the official docs on concurrency, as happens when I set max-instances to 1.
Instead, I expected 3 container instances to be started, each of which then handles requests according to the concurrency setting.
[Graph: billable container time at max-instances=3]
As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
The graphs for CPU and memory usage look similar.
There are no error logs. As before, at max-instances=1, there are warnings indicating that not enough instances are available to handle requests (HTTP 429).
The connection limit of the CloudSQL instance has not been exceeded.
Requests are handled at less than 10/s.
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?

A few months ago, during the private alpha, I ran tests and observed the same behavior. After discussing it with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, traffic suddenly increases, and so on.
The trade-off is that you will see more cold starts than your max-instances value. Worse, you will be charged for these over-provisioned cold starts, although in practice this is not an issue because Cloud Run's large free tier covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink that exports Cloud Run logs to BigQuery and then querying them), even if more instances are up than your max-instances value, only max-instances of them are active at the same time. To put it concretely with your parameters: if 5 instances are up at the same time, only 3 of them serve traffic at any given moment.
This part is not documented because it evolves constantly to find the best balance between over-provisioning and a lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?

When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and tries to scale to that number. If a sudden burst of requests occurs, Cloud Run starts a larger number of instances in response. This is done to absorb a possibly higher number of requests than it is currently serving, while trying to take into account how long the existing instance will take to finish loading the request. Per the documentation, the number of container instances can exceed the max-instances value during such spikes.
You mentioned that it was running fine with max-instances set to 1, but later you mentioned that it was in fact producing 429s with that setting as well. Seeing 429s together with the instance count spiking could indicate that the traffic is not being handled smoothly.
It is also worth noting that, because of the cold start time you mention, the number of concurrent requests is by design hard-set to 1 while instances are serving their first request(s). Only once an instance is fully ready is the concurrency setting you have chosen applied.
Was there a specific reason you chose 3 for both max-instances and concurrency? Also, how was concurrency set when you had max-instances set to 1? Perhaps you could try increasing the concurrency (max 80) and/or max-instances (upper limit of 1000) and see if that removes the 429s.
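A rough way to see why 3 instances with concurrency 3 produce 429s is Little's law (requests in flight = arrival rate × mean latency). The sketch below uses the figures quoted in the question (20 to 40 messages per second) and assumes steady-state averages; it ignores cold starts, during which, per the answer above, concurrency is effectively 1 and capacity is even lower. Both helper names are hypothetical.

```go
package main

import "fmt"

// maxMeanLatency returns, by Little's law (L = λW), the largest mean request
// latency in seconds that `slots` concurrent request slots can absorb at
// ratePerSec arrivals per second before requests must queue or be rejected.
func maxMeanLatency(slots int, ratePerSec float64) float64 {
	return float64(slots) / ratePerSec
}

// slotsNeeded returns the total concurrency (instances × concurrency) needed
// to sustain ratePerSec when each request takes meanLatencySec on average.
func slotsNeeded(ratePerSec, meanLatencySec float64) float64 {
	return ratePerSec * meanLatencySec
}

func main() {
	slots := 3 * 3 // --max-instances=3 × --concurrency=3
	for _, rate := range []float64{20, 40} {
		fmt.Printf("%.0f req/s: mean latency must stay under %.3f s\n", rate, maxMeanLatency(slots, rate))
	}
	fmt.Printf("20 req/s at 1 s mean latency needs %.0f slots\n", slotsNeeded(20, 1))
}
```

With latencies reported anywhere between 50 ms and 60 s, the mean easily exceeds the 0.45 s budget, which is consistent with the 429s and with the suggestion to raise concurrency and/or max-instances.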

How to get high rps with JMeter load testing https endpoint

I'm trying to test my HTTPS endpoint with JMeter. I want to make at least 10,000 requests per second, but when I set the number of threads to 10,000 I get far fewer requests per second, around 500.
I've tried setting the number of threads to 1000 and 100, and surprisingly I get the same throughput. I'm using the HTTP Sampler with "Use Keep-Alive" set to true. In the statistics I see that with 100 threads Keep-Alive is used and connect_time is around 100 ms, but with more threads connect_time grows, as if JMeter stops reusing connections.
I know this isn't a server issue, because I've tested the same endpoint with Yandex.Tank and Phantom and it easily sustains 10,000 requests per second. The problem is that Yandex.Tank can't use response data to make further requests, which is why I have to use JMeter for this task.

This can be done by using the "Stepping Thread Group". It allows you to send 10,000 requests per second for a specified time. Refer to the image below.
[Image: Stepping Thread Group]
Download the JAR from the link below:
https://jmeter-plugins.org/wiki/SteppingThreadGroup/

I assume you are trying to achieve this from a single machine. Try using multiple machines or JMeter's distributed mode.
https://jmeter.apache.org/usermanual/jmeter_distributed_testing_step_by_step.pdf
https://www.blazemeter.com/blog/how-to-perform-distributed-testing-in-jmeter/
https://blazemeter.com/blog/3-common-issues-when-running-jmeter-scripts-and-how-solve-them/
I assume the problem is that the machine is not able to generate that much load. I usually use at most 300 threads per machine, but it depends on the machine's configuration. Assuming the server has no issue, check whether the machine itself is the bottleneck and whether multiple machines can generate more load.
Hope this helps.
Update: modern machines can usually handle 200-500 threads.
Please check the links below for more info:
1. How do threads and number of iterations impact test and what is JMeter’s max. thread limit
2. https://www.blazemeter.com/blog/what%e2%80%99s-the-max-number-of-users-you-can-test-on-jmeter/

Performance tuning Tomcat for an 8-core system

I have a Tomcat application running on an 8-core system. I observed that when I changed the maxThreads count from 16 to 2, there was a dramatic improvement in performance at a throughput of 13 req/sec.
So I started printing the active thread count. It seems that with maxThreads set to 2, the number of active threads averaged 8, so basically 8 threads were running on 8 cores, the best possible outcome.
However, when I increased the throughput to 30-40 req/sec, I saw requests queueing up. What happened here is that, with maxThreads at only 2, requests started piling up.
And when I then set maxThreads to a very high value like 10k, I saw the JVM once again spending a long time on context switching.
My question is: is there any property in Tomcat with which I can specify how many requests are picked up to be processed in parallel in the JVM?
The acceptCount property won't help, because it only defines the threshold for queued requests.
There is another property called acceptorThreadCount, which is defined as the number of threads used to accept connections. Is this the property I need to tune, is there another property, or is there anything else I am missing?

According to the Connector documentation for maxThreads (I'm assuming that this is where you changed your maxThreads configuration):
The maximum number of request processing threads to be created by this
Connector, which therefore determines the maximum number of
simultaneous requests that can be handled. If not specified, this
attribute is set to 200. If an executor is associated with this
connector, this attribute is ignored as the connector will execute
tasks using the executor rather than an internal thread pool. Note
that if an executor is configured any value set for this attribute
will be recorded correctly but it will be reported (e.g. via JMX) as
-1 to make clear that it is not used.
There's no problem (quite the opposite) with setting the thread count higher than the number of available cores, because threads are not running on a core all the time (quite often they're waiting for external input, e.g. data from a database).
In case I've missed the point and you changed a different maxThreads configuration, please clarify. As for your actual question about the configuration that specifies how many requests are handled in parallel: if the maxThreads you changed was a different one, then Tomcat's Connector is still using its default of 200, which can be changed in the Connector's configuration (or, as the documentation says, with Executors).

Validate newly created server support the same load

We are creating a new hosted server for one of our APIs on managed containers (Kubernetes), and we're trying to validate that it can handle at least the same traffic load.
We've started with one of the APIs, where we would need to handle at least 140k requests per minute, all endpoints combined.
To verify this, I created a simple JMeter test as follows:
-Test Plan
---Thread Group Endpoint1
-----HTTP Request -> a GET request with query params for /path1
---Thread Group Endpoint2
-----HTTP Request -> a GET request with query params for /path2
For a local test, I used the following setup:
Thread Groups Endpoint1 and Endpoint2 are set to 200 threads (users), ramp-up period of 1s, loop count = forever and duration 60s.
Using a Summary Report listener, running the test gives me a total of ~9,300 samples.
With this approach, is it safe to just increase the number of threads (users) in the Thread Groups until I reach the desired 140k requests per minute?
Note: I have only used JMeter a little before, so I'm aware that the entire approach may be wrong; any suggestions and steering in the right direction are more than welcome.

Your approach is viable as long as it represents real-life application usage. If there are 2 endpoints with evenly distributed load, your setup is just fine. If there are more endpoints and some of them are used more than others, consider defining the workload accordingly, either using different Thread Groups or another distribution mechanism such as the Throughput Controller.
Increasing the number of threads is also fine; however, consider increasing the load gradually, i.e. increase the ramp-up time, so your test has:
Arrivals phase
Time to hold the load
Ramp-down phase
This way you will be able to correlate various metrics, like increasing response time, throughput, and number of errors, with the increasing load. You will also be able to state the number of threads/requests per second at which the system reached its saturation point/breaking point, and whether it recovers when the load goes back down.
Also make sure you're following JMeter Best Practices, as 2,300-2,500 requests per second is not something JMeter can support out of the box; you will need to do some tuning, at least increasing the JVM heap size allocated to JMeter.
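A back-of-the-envelope check of how many threads the target implies, using the numbers from the question (2 thread groups × 200 threads, ~9,300 samples in 60 s) and assuming closed-loop threads with no think time, so that throughput = threads / mean time per request (Little's law). `threadsForTarget` is a hypothetical helper, and the estimate only holds if response times stay constant as load grows, which is exactly what the gradual ramp-up is meant to check.

```go
package main

import (
	"fmt"
	"math"
)

// threadsForTarget estimates how many closed-loop threads would reach
// targetPerMin if each thread's mean response time stayed the same as in the
// observed run (throughput = threads / response time, so threads scale linearly).
func threadsForTarget(observedThreads int, observedPerMin, targetPerMin float64) int {
	return int(math.Ceil(float64(observedThreads) * targetPerMin / observedPerMin))
}

func main() {
	const target = 140000.0 // requests per minute, all endpoints combined
	fmt.Printf("target: %.0f req/s\n", target/60)

	// Local run from the question: 2 × 200 threads, ~9,300 samples in 60 s.
	observedThreads, observedPerMin := 400, 9300.0
	meanTime := float64(observedThreads) / (observedPerMin / 60) // W = L / λ
	fmt.Printf("implied mean time per request: %.2f s\n", meanTime)
	fmt.Println("threads needed at that response time:", threadsForTarget(observedThreads, observedPerMin, target))
}
```

The implied ~2.6 s per request is high, so the local machine or network may itself be the bottleneck; that would also favour the distributed approach suggested below.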

You may not be able to achieve the desired 140k requests per minute using a single JMeter machine; in that case you'll need a distributed load testing approach.
Refer to: http://jmeter.apache.org/usermanual/jmeter_distributed_testing_step_by_step.html
Also, keeping the ramp-up period at 1 second will create a spike and an unrealistic load on the system, which will not give proper results unless you've pre-warmed your server. You should increase the load gradually according to the real or estimated traffic pattern.

Difference between Jmeter load test scenarios

I am testing an ASP.NET website using JMeter, with the load test scenarios below. Scenario 1 gives me the correct result (what I expect, though I could be wrong), but Scenario 2 does not give the same result, even though I used the same number of requests within the same time. Can someone explain why?
[Image: Scenario 1]
[Image: Scenario 2]

Ramp-up time does not determine when any of your tests will complete. It only controls when your threads start.
Also, the number of threads any test can create concurrently is limited by the memory you've allocated to JMeter. Even though you've set the thread count to 60000, if you've hit the maximum memory you've allocated, the threads will either queue up or never be created (you can watch the JMeter logs for thread creation or errors).
I recommend tuning your JMeter instance so your tests have some stability; here's a good guide. LINK

The number of requests you sent might be the same, but the concurrent user load on the server is completely different.
I clarified a similar question a few weeks ago; you can check the answer here.
