AWS API Gateway WebSocket API returns 429

I am load-testing a WebSocket service that I deployed today on AWS API Gateway, using a tool called Thor. The service uses Lambda-based message handlers and is packaged and deployed with the Serverless Framework in Python 3.8. The connection IDs are stored in DynamoDB.
My AWS account-level throttle is currently 10,000 requests per second with a burst of 5,000.
I'm getting the results below for 2,000 concurrent connections with 1 message per connection:
Online 4353 milliseconds
Time taken 4353 milliseconds
Connected 1826
Disconnected 0
Failed 174
Total transferred 549.23kB
Total received 420.84kB
Durations (ms):
min mean stddev median max
Handshaking 840 1629 666 1448 3383
Latency 103 198 150 155 859
Percentile (ms):
50% 66% 75% 80% 90% 95% 98% 99% 100%
Handshaking 1448 1930 2178 2297 2497 2641 3311 3336 3383
Latency 155 163 167 172 194 665 785 798 859
Received errors:
174x unexpected server response (429)
It's clear that the requests are being throttled at this point. I want to know what my account-level throttle limit should be in order to support 20,000 concurrent WebSocket connections. Sometimes I also see 500 errors in the results, which I'm unable to debug from the logs.
Let me know if more information is needed to answer this.
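For context, a simplified sketch of the kind of $connect handler and DynamoDB write described here (the table name and environment variable are illustrative placeholders, not the actual ones):

import os
import boto3

dynamodb = boto3.resource("dynamodb")
# "CONNECTIONS_TABLE" is an illustrative environment variable name.
table = dynamodb.Table(os.environ["CONNECTIONS_TABLE"])

def connect_handler(event, context):
    # API Gateway passes the WebSocket connection id in the request context.
    connection_id = event["requestContext"]["connectionId"]
    table.put_item(Item={"connectionId": connection_id})
    return {"statusCode": 200}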

There is a limit of 500 new WebSocket connections per second on API Gateway.
Check here: https://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html
It is quite possible that you are running into this limit.
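If the 500 new-connections-per-second limit is what you are hitting, one way to confirm it is to ramp the connection rate in your load test instead of opening all connections at once. A rough sketch, assuming the Python websockets library and a placeholder endpoint URL:

import asyncio
import websockets  # third-party library: pip install websockets

URL = "wss://example.execute-api.us-east-1.amazonaws.com/dev"  # placeholder endpoint

async def ramp_connections(total, per_second):
    conns = []
    for i in range(total):
        conns.append(await websockets.connect(URL))
        # Pause after each batch so the connect rate stays roughly under per_second.
        if (i + 1) % per_second == 0:
            await asyncio.sleep(1)
    # Hold the connections open briefly, then close them cleanly.
    await asyncio.sleep(5)
    for conn in conns:
        await conn.close()

asyncio.run(ramp_connections(2000, 400))

If the 429s disappear at a paced connect rate, the per-second connection limit is the bottleneck rather than the account-level request throttle.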

Related

FastAPI sync vs async performance testing

I wrote a simple FastAPI server that I want to performance-test. I would like to compare an async endpoint against a sync endpoint, but I always get the same results. Which concept am I misunderstanding, or what am I doing wrong?
from fastapi import FastAPI
import time
import asyncio

app = FastAPI()

@app.get("/hello")
def say_hello():
    # Sync endpoint: blocks its worker thread for 5 seconds.
    time.sleep(5)

@app.get("/hello-async")
async def say_hello_async():
    # Async endpoint: suspends on the event loop for 5 seconds.
    await asyncio.sleep(5)
I run the server like this:
gunicorn main:app --workers=4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
And run the performance tests:
ab -n 80 -c 8 http://localhost:8000/hello-async
ab -n 80 -c 8 http://localhost:8000/hello
And get this kind of result:
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: uvicorn
Server Hostname: localhost
Server Port: 8000
Document Path: /hello-async
Document Length: 4 bytes
Concurrency Level: 8
Time taken for tests: 55.016 seconds
Complete requests: 80
Failed requests: 0
Total transferred: 10240 bytes
HTML transferred: 320 bytes
Requests per second: 1.45 [#/sec] (mean)
Time per request: 5501.634 [ms] (mean)
Time per request: 687.704 [ms] (mean, across all concurrent requests)
Transfer rate: 0.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 5000 5001 0.6 5001 5003
Waiting: 5000 5001 0.6 5001 5003
Total: 5000 5001 0.6 5001 5003
Percentage of the requests served within a certain time (ms)
50% 5001
66% 5002
75% 5002
80% 5002
90% 5002
95% 5002
98% 5003
99% 5003
100% 5003 (longest request)
I wanted to simulate an I/O operation; if I call an API or a database, I want to see how much faster the async approach is. What happened is that I got the same total time for both tests.
If I were doing a CPU-intensive operation, I would understand getting the same result. What am I doing wrong?
Also, with the sync approach the web server is limited by the number of threads it spawns, but then what is the theoretical limit of the async approach? And how can I test the limit on spawned threads in the sync approach, and the limit of the event loop in the async approach?
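One way to probe the thread limit asked about above: recent Starlette/AnyIO versions run sync def endpoints on a shared worker-thread limiter, so inspecting (or adjusting) that limiter shows the concurrency ceiling of the sync approach. A rough sketch under that assumption (the new limit of 100 is arbitrary):

from fastapi import FastAPI
import anyio

app = FastAPI()

@app.on_event("startup")
async def show_thread_limit():
    # Sync (def) endpoints are dispatched to AnyIO's worker-thread pool;
    # its capacity bounds how many sync requests can block at the same time.
    limiter = anyio.to_thread.current_default_thread_limiter()
    print("sync worker threads available:", limiter.total_tokens)
    # Raising the limit lets more sync handlers sleep concurrently
    # (assumes the server uses the default limiter, as recent Starlette does).
    limiter.total_tokens = 100

The async endpoints, by contrast, are bounded by the event loop rather than a thread pool, so ab's concurrency (-c) has to exceed the thread limit before the two approaches start to diverge.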

liveness probe fails with actuator endpoint during stress/load testing?

I am load-testing my application with ab, sending 3,000 concurrent requests and 10,000 requests in total. My application is a Spring Boot application with Actuator, containerized with Docker and running on Kubernetes. During the test my Actuator endpoints take much longer than expected to respond, and because of this my pods restart and requests start failing.
I have now disabled the liveness probe, and if I manually hit the Actuator endpoint during the test, I can see that it takes a long time to respond and sometimes never returns at all and just hangs.
According to the logs, each request is served by my application within 10 milliseconds, but the ab results are completely different. Below are the results from the ab test:
Concurrency Level: 3000
Time taken for tests: 874.973 seconds
Complete requests: 10000
Failed requests: 6
(Connect: 0, Receive: 0, Length: 6, Exceptions: 0)
Non-2xx responses: 4
Total transferred: 1210342 bytes
Total body sent: 4950000
HTML transferred: 20580 bytes
Requests per second: 11.43 [#/sec] (mean)
Time per request: 262491.958 [ms] (mean)
Time per request: 87.497 [ms] (mean, across all concurrent requests)
Transfer rate: 1.35 [Kbytes/sec] received
5.52 kb/s sent
6.88 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 372 772.5 0 3051
Processing: 1152 226664 145414.1 188502 867403
Waiting: 1150 226682 145404.6 188523 867402
Total: 2171 227036 145372.2 188792 868447
Percentage of the requests served within a certain time (ms)
50% 188792
66% 249585
75% 295993
80% 330934
90% 427890
95% 516809
98% 635143
99% 716399
100% 868447 (longest request)
I am not able to understand this behaviour: it shows only about 11.43 requests served per second, which is very low. What could be the reason? And how should the liveness probe be configured so that it can stay enabled?
I have the following properties set in my application.properties:
server.tomcat.max-connections=10000
server.tomcat.max-threads=2000

Scala Play 2.5 vs. Golang benchmark, and optimizing performance in Play Framework

I'm benchmarking a simple hello-world example in Scala Play Framework 2.5 and in Golang. Golang seems to be outperforming Play by a significant margin, and I would like to know how I can tune Play Framework to improve performance.
I'm using the following command to benchmark:
ab -r -k -n 100000 -c 100 http://localhost:9000/
I'm running Play 2.5 in prod mode using the default configuration everywhere in my project. Can someone help me tune the Play server to get the most performance out of it? I read up on the default-dispatcher thread pool, but I'm not sure what settings to use for my machine. Are there any other areas I could check that would help with performance?
Here are my machine specs:
Intel(R) Xeon(R) W3670 @ 3.20GHz, 12.0 GB RAM, running Windows 7 64-bit
Please note that I'm using sbt (clean and stage) to build the server for prod mode and then executing the .bat file found at target/universal/stage/bin/. Here is the Play source code:
package controllers

import play.api.mvc._

class Application extends Controller {
  def index = Action {
    Ok("Hello, world!")
  }
}
Here are the results from the ab benchmark:
ab -r -k -n 100000 -c 100 http://localhost:9000/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software:
Server Hostname: localhost
Server Port: 9000
Document Path: /
Document Length: 13 bytes
Concurrency Level: 100
Time taken for tests: 1.537 seconds
Complete requests: 100000
Failed requests: 0
Keep-Alive requests: 100000
Total transferred: 15400000 bytes
HTML transferred: 1300000 bytes
Requests per second: 65061.81 [#/sec] (mean)
Time per request: 1.537 [ms] (mean)
Time per request: 0.015 [ms] (mean, across all concurrent requests)
Transfer rate: 9784.69 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 0 2 1.9 1 72
Waiting: 0 2 1.9 1 72
Total: 0 2 1.9 1 72
Percentage of the requests served within a certain time (ms)
50% 1
66% 2
75% 2
80% 2
90% 3
95% 3
98% 5
99% 8
100% 72 (longest request)
Here is the Golang source code:
package main

import (
    "fmt"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello, world!")
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}
Here are the ab benchmark results for Golang:
ab -r -k -n 100000 -c 100 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 13 bytes
Concurrency Level: 100
Time taken for tests: 0.914 seconds
Complete requests: 100000
Failed requests: 0
Keep-Alive requests: 100000
Total transferred: 15400000 bytes
HTML transferred: 1300000 bytes
Requests per second: 109398.30 [#/sec] (mean)
Time per request: 0.914 [ms] (mean)
Time per request: 0.009 [ms] (mean, across all concurrent requests)
Transfer rate: 16452.48 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 0 1 1.5 1 52
Waiting: 0 1 1.5 1 52
Total: 0 1 1.5 1 52
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 2
98% 5
99% 7
100% 52 (longest request)
Thanks in advance,
Francis
UPDATE:
The following change results in improved performance, but I'm still interested in other ideas that could improve it further:
package controllers

import play.api.mvc._
import scala.concurrent.Future
import play.api.libs.concurrent.Execution.Implicits.defaultContext

class Application extends Controller {
  def index = Action.async {
    Future.successful(Ok("Hello, world!"))
  }
}
Benchmark results:
ab -r -k -n 100000 -c 100 http://localhost:9000/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software:
Server Hostname: localhost
Server Port: 9000
Document Path: /
Document Length: 13 bytes
Concurrency Level: 100
Time taken for tests: 1.230 seconds
Complete requests: 100000
Failed requests: 0
Keep-Alive requests: 100000
Total transferred: 15400000 bytes
HTML transferred: 1300000 bytes
Requests per second: 81292.68 [#/sec] (mean)
Time per request: 1.230 [ms] (mean)
Time per request: 0.012 [ms] (mean, across all concurrent requests)
Transfer rate: 12225.66 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 0 1 2.2 1 131
Waiting: 0 1 2.2 1 131
Total: 0 1 2.2 1 131
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 2
90% 2
95% 3
98% 5
99% 7
100% 131 (longest request)
As @marcospereira has said, Play is a relatively high-level framework that focuses on exploiting Scala's much more advanced type system to give you a lot of features and safety, which in turn helps you write code that is refactorable and scalable to your needs. Nevertheless, I've gotten great performance from it in production.
Aside from suggesting that you try running your benchmark on Linux with native socket transport, I'll repeat what @marcospereira said: run your benchmark a couple of times without stopping your Play server. The standard deviation in your Play benchmark results seems abnormally high (a mean of 1 ms with a standard deviation of 2.2), which suggests that the JIT perhaps hasn't fully finished optimising your code yet.

Node.js HTTP with Redis only reaches 6000 req/s

I tested the node_redis benchmark; it shows INCR doing more than 100,000 ops/s:
$ node multi_bench.js
Client count: 5, node version: 0.10.15, server version: 2.6.4, parser: hiredis
INCR, 1/5 min/max/avg/p95: 0/ 2/ 0.06/ 1.00 1233ms total, 16220.60 ops/sec
INCR, 50/5 min/max/avg/p95: 0/ 4/ 1.61/ 3.00 648ms total, 30864.20 ops/sec
INCR, 200/5 min/max/avg/p95: 0/ 14/ 5.28/ 9.00 529ms total, 37807.18 ops/sec
INCR, 20000/5 min/max/avg/p95: 42/ 508/ 302.22/ 467.00 519ms total, 38535.65 ops/sec
Then I added Redis to a Node.js HTTP server:
var http = require("http"), server,     
redis_client = require("redis").createClient();
server = http.createServer(function (request, response) {
    response.writeHead(200, {
        "Content-Type": "text/plain"
    });
    
    redis_client.incr("requests", function (err, reply) {
            response.write(reply+'\n');
        response.end();
    });
}).listen(6666);
server.on('error', function(err){
console.log(err);
process.exit(1);
});
Using the ab command to test, it only reaches 6,000 req/s:
$ ab -n 10000 -c 100 localhost:6666/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: localhost
Server Port: 6666
Document Path: /
Document Length: 7 bytes
Concurrency Level: 100
Time taken for tests: 1.667 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 1080000 bytes
HTML transferred: 70000 bytes
Requests per second: 6000.38 [#/sec] (mean)
Time per request: 16.666 [ms] (mean)
Time per request: 0.167 [ms] (mean, across all concurrent requests)
Transfer rate: 632.85 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 2
Processing: 12 16 3.2 15 37
Waiting: 12 16 3.1 15 37
Total: 13 17 3.2 16 37
Percentage of the requests served within a certain time (ms)
50% 16
66% 16
75% 16
80% 17
90% 20
95% 23
98% 28
99% 34
100% 37 (longest request)
Finally, I tested a plain 'hello world' handler; it reached 7K req/s:
Requests per second: 7201.18 [#/sec] (mean)
How can I profile this and figure out why the Redis call in the HTTP server loses some performance?
I think you have misinterpreted the results of the multi_bench benchmark.
First, this benchmark spreads the load over 5 connections, while you have only one in your node.js program. More connections mean more communication buffers (allocated on a per-socket basis) and better performance.
Then, while a Redis server is able to sustain 100K op/s (provided you open several connections, and/or use pipelining), node.js and node_redis are not able to reach this level. The result of your run of multi_bench shows that when pipelining is not used, only 16K op/s are achieved.
Client count: 5, node version: 0.10.15, server version: 2.6.4, parser: hiredis
INCR, 1/5 min/max/avg/p95: 0/ 2/ 0.06/ 1.00 1233ms total, 16220.60 ops/sec
This result means that with no pipelining, and with 5 concurrent connections, node_redis is able to process 16K op/s globally. Please note that measuring a throughput of 16K op/s while only sending 20K ops (default value of multi_bench) is not very accurate. You should increase num_requests for better accuracy.
The result of your second benchmark is not so surprising: you add an HTTP layer (which is more expensive to parse than the Redis protocol itself), use only one connection to Redis while ab tries to open 100 concurrent connections to node.js, and you end up with 6K op/s, i.e. a 1.2K op/s drop in throughput compared to the "Hello world" HTTP server. What did you expect?
You could try to squeeze out a bit more performance by leveraging node.js clustering capabilities, as described in this answer.

Very poor webserver performance

So I ran the command "ab -c 50 -n 5000 http://lala.la" on the server today, and I got these "amazing" results:
Document Path: /
Document Length: 26476 bytes
Concurrency Level: 50
Time taken for tests: 1800.514 seconds
Complete requests: 2427
Failed requests: 164
(Connect: 0, Receive: 0, Length: 164, Exceptions: 0)
Write errors: 0
Total transferred: 65169733 bytes
HTML transferred: 64345285 bytes
Requests per second: 1.35 [#/sec] (mean)
Time per request: 37093.408 [ms] (mean)
Time per request: 741.868 [ms] (mean, across all concurrent requests)
Transfer rate: 35.35 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 2.7 0 22
Processing: 4335 36740 9513.2 33755 102808
Waiting: 7 33050 8655.1 30407 72691
Total: 4355 36741 9512.4 33755 102808
Percentage of the requests served within a certain time (ms)
50% 33754
66% 37740
75% 40977
80% 43010
90% 47742
95% 56277
98% 62663
99% 71301
100% 102808 (longest request)
This is on a newly installed Nginx server, using Cloudflare and APC.
I don't think I've ever seen such poor performance, so what could be causing it?
Thanks.
For starters, try testing directly against the origin and take Cloudflare out of the mix (unless you have the HTML marked as cacheable and you're trying to test Cloudflare's ability to serve it). Given that one of Cloudflare's purposes is to protect sites, it's not unreasonable to think that your test is being rate-limited; at a minimum, bypassing it removes one possible factor from the investigation.
Add $request_time to your nginx access-log format; that will give you the server-side view of performance. If it still looks terrible, you may need something like New Relic or Dynatrace to get more detail on where the time is going (if you don't instrument the app yourself).
Are you using php-fpm for connecting nginx to php? If not, you should look into it.
For times this bad, the odds are the problem is in the application itself rather than in the config.
