Go HTTP server slow benchmark performance

I was trying to find the maximum throughput of a Go web server. I ran simplewebserver on an 8-core machine (Intel Xeon, 2.5 GHz) and ran the wrk tool on a different 8-core machine. iperf shows around 8-10 Gbps between these machines. Initially, I made the mistake of using the Apache ab tool, which gave only 16k requests/second; the problem was the same as in this link. When I switched to the wrk tool, I got around 90k requests per second and ~11 MB/sec.
On the first 8-core machine, I ran simplewebserver.go:
package main

import (
    "io"
    "net/http"
    "runtime"
)

func hello(w http.ResponseWriter, r *http.Request) {
    io.WriteString(w, "Hello world!")
}

func main() {
    runtime.GOMAXPROCS(8)
    http.HandleFunc("/", hello)
    http.ListenAndServe(":8000", nil)
}
On a different 8-core machine, I ran:
./wrk -t8 -c1000 -d10s http://10.0.0.6:8000/
Result:
Running 10s test @ http://10.0.0.6:8000/
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    42.65ms  114.89ms   1.96s    91.33%
    Req/Sec    11.54k     3.01k   25.10k    74.88%
  923158 requests in 10.10s, 113.57MB read
  Socket errors: connect 0, read 0, write 0, timeout 20
Requests/sec:  91415.11
Transfer/sec:     11.25MB
Can I say that I have reached the maximum throughput of a Go web server? 11 MB/sec looks quite small for such a basic program. If I run the wrk tool against www.google.com, I get around 200 MB/sec. Even larger, more complex systems like Kafka (a Java server) benchmark at more than 70 MB/sec (https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines). How can a Go server handle extreme workloads of millions of requests per second (more than 100 MB/sec minimum)? Am I wrong in any of my assumptions?
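(For scale, the numbers above are self-consistent: 113.57 MB over 923,158 requests works out to roughly 129 bytes per response, headers included, so ~91k requests/sec can only amount to ~11 MB/sec. The low transfer rate is bounded by the tiny response body, not by the server; pages like google.com return vastly larger responses per request, which is why their MB/sec figure is so much higher even at a lower request rate.)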
UPDATE:
I ran the same program on a 16-core machine with the same specs, changed GOMAXPROCS to 16, and the result rose to 120k requests/sec. I feel that I have reached the maximum of the Go http server, since I am not seeing any significant increase in requests/sec beyond that.
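As an aside, the core count does not need to be hardcoded: runtime.NumCPU() reports the number of logical CPUs, and since Go 1.5 GOMAXPROCS defaults to that value anyway. A portable main for the program above would be:

func main() {
    // Use all available logical CPUs (the default since Go 1.5).
    runtime.GOMAXPROCS(runtime.NumCPU())
    http.HandleFunc("/", hello)
    http.ListenAndServe(":8000", nil)
}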

Golang's net/http is rather slow. If the goal is to maximize the request rate, there are other implementations, for example https://github.com/valyala/fasthttp. In my tests fasthttp doubles the query rate; my test uses a short request and a short reply.
The library comes with limitations, such as no HTTP/2 support. If you need HTTP for a REST API and you are not looking for broad interoperability, fasthttp will fit nicely.
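For comparison, a minimal fasthttp version of the hello server above might look like the sketch below (modeled on the examples in the fasthttp README; the exact API may differ between versions):

package main

import (
    "log"

    "github.com/valyala/fasthttp"
)

// requestHandler plays the role of the net/http hello handler;
// fasthttp passes a single RequestCtx instead of (w, r).
func requestHandler(ctx *fasthttp.RequestCtx) {
    ctx.WriteString("Hello world!")
}

func main() {
    // fasthttp takes one handler for all routes instead of a ServeMux.
    if err := fasthttp.ListenAndServe(":8000", requestHandler); err != nil {
        log.Fatalf("error in ListenAndServe: %v", err)
    }
}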

"premature optimization is the root of all evil"1
I would suggest building something for your use case first and then attempting to optimize. Good benchmarks are difficult to build, so make sure you are testing what you actually want to measure and not something like the speed of your router.

Related

Performance issues running kafkacat over a slow-speed link

I have weird performance issues with the fetch.max.message.bytes parameter in the librdkafka consumer implementation (version 0.11). I ran some tests using kafkacat over a slow network link (4 Mbps) and received the following results:
1024 bytes = 1.740s
65536 bytes = 2.670s
131072 bytes = 7.070s
When I started debugging protocol messages, I noticed way too high RTT values:
|SEND|rdkafka| Sent FetchRequest (v4, 68 bytes @ 0, CorrId 8)
|RECV|rdkafka| Received FetchResponse (v4, 131120 bytes, CorrId 8, rtt 607.68ms)
It seems that increasing the fetch.max.message.bytes value causes very high network saturation, yet each request carries only a single message.
On the other hand, when I try kafka-console-consumer, everything runs as expected (I get a throughput of 500 messages per second over the same network link).
Any ideas or suggestions on where to look?
You are most likely hitting issue #1384, which is a bug in the new v0.11.0 consumer. The bug is particularly evident on slow links or with MessageSets/batches containing few messages.
A fix is on the way.
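Until the fix lands, you can at least vary the setting from the command line: kafkacat passes arbitrary librdkafka properties through with -X, so a test run looks roughly like the following (broker and topic are placeholders, and the property name is spelled as in the question above; check librdkafka's CONFIGURATION.md for the exact name):

kafkacat -C -b broker:9092 -t mytopic -X fetch.max.message.bytes=131072 -e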

WRK benchmark: Please explain results

I'm trying to benchmark blocking vs non-blocking IO.
As the blocking stack, I use Spring Boot.
As the non-blocking one, Play Framework.
I call an endpoint which makes 4 remote calls (sequentially).
Here are the results:
Spring boot
Running 5m test @ http://localhost:8080/remote-multiple
  4 threads and 20000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   713.90ms  429.81ms   2.00s    82.16%
    Req/Sec     33.04     22.55   340.00     68.84%
  9602 requests in 5.00m, 201.85MB read
  Socket errors: connect 15145, read 21942, write 0, timeout 2401
Requests/sec:     32.00
Transfer/sec:    688.83KB
Play framework
Running 5m test @ http://localhost:9000/remote-multiple
  4 threads and 20000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.40s   395.00ms   2.00s    54.73%
    Req/Sec     37.97     21.21   230.00     70.71%
  39792 requests in 5.00m, 846.41MB read
  Socket errors: connect 15145, read 36185, write 60, timeout 35944
Requests/sec:    132.61
Transfer/sec:      2.82MB
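(A sanity check on the headline number: 39,792 requests / 300 s ≈ 132.6, so wrk's Requests/sec is simply the count of completed requests divided by the full test duration; socket errors and timeouts are tallied separately rather than subtracted from that figure.)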
Though Play shows a higher Requests/sec, it also has more read errors, more timeouts, and higher latency.
Can anybody please explain what all of those parameters in the results mean?
For example, is Requests/sec the number of successful requests per second?
P.S.:
I ran this benchmark on a 2013 MBP, Intel Core i7, 2.3 GHz, 16 GB.
If you post benchmarks, start with a link to the actual benchmark code; results have no value without it. Second: in general, testing client and server on the same machine is considered bad practice.

Strategies for reducing network delay from 500 milliseconds to 60-100 milliseconds

I am building autocomplete functionality and realized that the amount of time taken between the client and the server is too high (in the range of 450-700 ms).
My first stop was to check whether this was the result of server delay.
But as you can see from these nginx logs, the request time (the last column) is almost always 0.001 seconds; it's hardly a cause for concern.
So it became very evident that I am losing time between the server and the client. My benchmark is Google Instant's response time, which is almost always in the range of 30-40 milliseconds, magnitudes lower.
Although it's easy to say that Google has massive infrastructural capabilities to deliver at this speed, I wanted to push myself to learn whether this is possible for someone who is not at that level. If not 60 milliseconds, I want to shave off 100-150 milliseconds.
Here are some of the strategies I've managed to learn:
Enable httpd slow start and initcwnd tuning
Ensure SPDY if you are on HTTPS
Ensure results are HTTP-compressed (see the nginx snippet below)
Etc.
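On the compression point, a typical nginx configuration block looks something like the following (the directives are standard nginx; the values shown are illustrative, not tuned recommendations):

gzip on;
gzip_comp_level 5;
gzip_min_length 256;
gzip_types text/plain text/css application/json application/javascript;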
What are the other things I can do here? For example:
Does a persistent connection help?
Should I reduce the response size dramatically?
Edit:
Here are the ping and traceroute numbers. The site is served via Cloudflare from a Fremont Linode machine.
mymachine-Mac:c name$ ping site.com
PING site.com (160.158.244.92): 56 data bytes
64 bytes from 160.158.244.92: icmp_seq=0 ttl=58 time=95.557 ms
64 bytes from 160.158.244.92: icmp_seq=1 ttl=58 time=103.569 ms
64 bytes from 160.158.244.92: icmp_seq=2 ttl=58 time=95.679 ms
^C
--- site.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 95.557/98.268/103.569/3.748 ms
mymachine-Mac:c name$ traceroute site.com
traceroute: Warning: site.com has multiple addresses; using 160.158.244.92
traceroute to site.com (160.158.244.92), 64 hops max, 52 byte packets
1 192.168.1.1 (192.168.1.1) 2.393 ms 1.159 ms 1.042 ms
2 172.16.70.1 (172.16.70.1) 22.796 ms 64.531 ms 26.093 ms
3 abts-kk-static-ilp-241.11.181.122.airtel.in (122.181.11.241) 28.483 ms 21.450 ms 25.255 ms
4 aes-static-005.99.22.125.airtel.in (125.22.99.5) 30.558 ms 30.448 ms 40.344 ms
5 182.79.245.62 (182.79.245.62) 75.568 ms 101.446 ms 68.659 ms
6 13335.sgw.equinix.com (202.79.197.132) 84.201 ms 65.092 ms 56.111 ms
7 160.158.244.92 (160.158.244.92) 66.352 ms 69.912 ms 81.458 ms
I may well be wrong, but personally I smell a rat: your times aren't justified by your setup, and I believe your requests ought to run much faster.
If at all possible, generate a short query using curl and intercept it with tcpdump on both the client and the server.
It could be a bandwidth/concurrency problem on the hosting. Check out its diagnostic panel, or try estimating the traffic.
You can try saving a response to a static file and then requesting that file (taking care not to trigger the local browser cache), to see whether the problem might be in processing the data (either server- or client-side).
Does this slowness affect every request, or only the autocomplete ones? If the latter, and no matter what nginx says, it might be some inefficiency/delay in retrieving or formatting the autocompletion data for output.
Also, you can try serving a static response bypassing nginx altogether, in case this is an issue with nginx (and for that matter: have you checked nginx's error log?).
One approach I didn't see you mention is to use SSL sessions: you can add the following to your nginx conf to make sure that an SSL handshake (a very expensive process) does not happen with every connection request:
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
See "HTTPS server optimizations" here:
http://nginx.org/en/docs/http/configuring_https_servers.html
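To see why this matters on this particular link: the ping output above shows a ~98 ms round trip, and a cold HTTPS request costs roughly one RTT for the TCP handshake, two RTTs for a full TLS handshake, and one more RTT for the request itself, i.e. about 4 x 98 ms ≈ 390 ms before the server does any work at all. Reusing SSL sessions cuts the TLS handshake to a single round trip, saving on the order of 100 ms per connection here.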
I would recommend using New Relic if you aren't already. It is possible that your server-side code is the issue; if you think that might be the case, there are quite a few free code-profiling tools.
You may want to consider preloading the autocomplete options in the background while the page is rendered, then saving a trie (or whatever structure you use) on the client in local storage. When the user starts typing in the autocomplete field, you would not need to send any requests to the server; you would query local storage instead.
Web SQL Database and IndexedDB introduce databases to the clientside.
Instead of the common pattern of posting data to the server via
XMLHttpRequest or form submission, you can leverage these clientside
databases. Decreasing HTTP requests is a primary target of all
performance engineers, so using these as a datastore can save many
trips via XHR or form posts back to the server. localStorage and
sessionStorage could be used in some cases, like capturing form
submission progress, and have seen to be noticeably faster than the
client-side database APIs.
For example, if you have a data grid component or an inbox with
hundreds of messages, storing the data locally in a database will save
you HTTP roundtrips when the user wishes to search, filter, or sort. A
list of friends or a text input autocomplete could be filtered on each
keystroke, making for a much more responsive user experience.
http://www.html5rocks.com/en/tutorials/speed/quick/#toc-databases

Why is the Lift framework so slow?

I am learning the Lift framework. I used the project template from git://github.com/lift/lift_25_sbt.git and started the server with the container:start sbt command.
This template application displays just a simple menu, yet if I use ab from Apache to measure performance, it's pretty bad. Am I missing something fundamental that would improve performance?
C:\Program Files\Apache Software Foundation\httpd-2.0.64\Apache2\bin>ab -n 30 -c 10 http://127.0.0.1:8080/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: Jetty(8.1.7.v20120910)
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /
Document Length: 2877 bytes
Concurrency Level: 10
Time taken for tests: 8.15625 seconds
Complete requests: 30
Failed requests: 0
Write errors: 0
Total transferred: 96275 bytes
HTML transferred: 86310 bytes
Requests per second: 3.74 [#/sec] (mean)
Time per request: 2671.875 [ms] (mean)
Time per request: 267.188 [ms] (mean, across all concurrent requests)
Transfer rate: 11.73 [Kbytes/sec] received
Are you running it in production mode? I found I had about 30 rps in development mode, but over 250 in production mode. (https://www.assembla.com/spaces/liftweb/wiki/Run_Modes)
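For reference, the run mode is controlled by the run.mode JVM system property (per the Run_Modes wiki page linked above); a sketch of setting it when launching from sbt on a Unix-like shell, assuming your setup picks up JAVA_OPTS, is:

JAVA_OPTS="-Drun.mode=production" sbt container:start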
As mentioned earlier, you should run Lift in production mode. This is the main key to good performance: all templates are cached that way, and other optimizations apply.
If you want to measure something that is not abstract and theoretical, you should also give the JVM time to "warm up" and apply its JIT optimizations. Send roughly a thousand requests first and ignore them entirely (this takes a couple of seconds); after that, measure the real performance of the already-started server.
There are some slight JVM optimizations as well, although they seem more like hacks to me and give a boost of no more than around 20%.
Other small tweaks include serving static content with nginx, starting the application on a dedicated server instead of from the Simple Build Tool, and such.

Node.JS Response Time

I threw Node.JS on an AWS instance and was testing the request times; I got some interesting results.
I used the following for the server:
var http = require('http');

http.createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.write('Hello World');
    res.end();
}).listen(8080);
I have an average 90 ms delay to this server, but the total request takes ~350+ ms. Obviously a lot of time is wasted on the box. I made sure the DNS was cached prior to the test.
I did an Apache bench on the server with a concurrency of 1000; it finished 10,000 requests in 4.3 seconds, which works out to an average of 0.43 milliseconds per request.
UPDATE: Just for grins, I installed Apache + PHP on the same machine and did a simple "Hello World" echo, and got a 92 ms response time on average (2 ms over ping).
Is there a setting somewhere that I am missing?
While Chrome Developer Tools is a good way to investigate front-end performance, it gives you only a very rough estimate of actual server timings / CPU load. If you have ~350 ms total request time in dev tools, subtract the DNS lookup, connecting, sending, and receiving times from that number, then subtract the round-trip time (90 ms?), and you have a first estimate. In your case I expect the actual request time to be sub-millisecond. Try running this code on the server:
var http = require('http');

// Difference between two process.hrtime() samples
// ([seconds, nanoseconds] tuples), returned in nanoseconds.
function hrdiff(t1, t2) {
    var s = t2[0] - t1[0];
    var ns = t2[1] - t1[1];
    return s * 1e9 + ns;
}

http.createServer(function(req, res) {
    var t1 = process.hrtime();
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.write('Hello World');
    res.end();
    var t2 = process.hrtime();
    console.log(hrdiff(t1, t2)); // handler time in nanoseconds
}).listen(8080);
Based on the ab result you should estimate the average send+request+receive time to be at most 0.42 ms (4200 ms / 10000 req). (Did you run it on the server itself? At what concurrency?)
I absolutely hate answering my own questions, but I want to pass along what I have discovered for future readers.
tl;dr: There is something wrong with res.write(). Use express.js or a single res.end().
I just got through conducting a bunch of tests. I set up multiple types of Node servers and mixed in things like PHP and Nginx. Here are my findings.
As stated previously, with the snippet I included above I was losing around 250 ms/request, but the Apache benchmarks did not replicate the issue. I then did a PHP test and got results ranging from 2 ms to 20 ms over ping: a big difference.
This prompted some more research. I started an Nginx server and proxied Node through it, and somehow that magically changed the response from 250 ms to 15 ms over ping. That put me on par with the PHP script, but it is a really confusing result, since usually additional hops slow things down.
Intrigued, I made an express.js server as well, and something even more interesting happened: it was only 2 ms over ping on its own. I dug around in the source for quite a while and noticed that it lacked a res.write() call; it went straight to res.end(). So I started another server, removed the "Hello World" from res.write(), passed it to res.end() instead, and amazingly the response time matched the ping exactly.
I did some searching to see whether this was a well-known issue and came across this SO question, which had the exact same problem: nodejs response speed and nginx.
Overall, interesting stuff. A plausible explanation (my reading, not confirmed in the thread): a separate res.write() flushes the headers and first chunk in their own TCP packets, which can interact badly with Nagle's algorithm and delayed ACKs, whereas handing the body to res.end() lets Node send the whole response at once. Make sure you optimize your responses and send everything in one go.
Best of luck to everyone!
