Performance Drop whilst testing Go Language

Performance Drop whilst testing Go Language - go

I've been testing out a simple web server written in Go with http_load. When running the test for 1 second with 100 parallel I've seen 16k requests completed. However, running the test for 10 seconds results in a similar number of requests being completed at around 1/10th of the rate of the 1 second test.
Additionally, if I run several 1 second test close together, the first test will complete 16k requests and the subsequent tests will complete just 100-200 requests.
package main
import "net/http"
func main() {
bytes := make([]byte, 1024)
for i := 0; i < len(bytes); i++ {
bytes[i] = 100
}
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
w.Write(bytes)
})
http.ListenAndServe(":8000", nil)
}
I'm wondering whether there is any reason as to why the performance would hit a ceiling whilst dealing with this number of requests and whether there is something I have missed in the implementation of the above web server.

This is probably a limitation on your own system rather than the go server. The same kind of degradation happens if you try hitting something like google with http_load:
$> http_load -parallel 100 -seconds 10 google.txt
1000 fetches, 100 max parallel, 219000 bytes, in 10.0006 seconds
219 mean bytes/connection
99.9944 fetches/sec, 21898.8 bytes/sec
msecs/connect: 410.409 mean, 4584.36 max, 16.949 min
msecs/first-response: 279.595 mean, 3647.74 max, 35.539 min
HTTP response codes:
code 301 -- 1000
$> http_load -parallel 100 -seconds 50 google.txt
729 fetches, 100 max parallel, 159213 bytes, in 50.0008 seconds
218.399 mean bytes/connection
14.5798 fetches/sec, 3184.21 bytes/sec
msecs/connect: 1588.57 mean, 36192.6 max, 17.944 min
msecs/first-response: 237.376 mean, 33816.7 max, 33.092 min
2 bad byte counts
HTTP response codes:
code 301 -- 727
$> http_load -parallel 100 -seconds 100 google.txt
1091 fetches, 100 max parallel, 223161 bytes, in 100 seconds
204.547 mean bytes/connection
10.91 fetches/sec, 2231.61 bytes/sec
msecs/connect: 1652.16 mean, 35860.4 max, 17.825 min
msecs/first-response: 319.259 mean, 35482.1 max, 31.892 min
HTTP response codes:
code 301 -- 1019
As you can see, the rate goes down quite a bit the longer you hit google (google.txt contains the single url "http://google.com"). This is most likely due to limitations in your system (the maximum num of open connections your programs can have, memory, cpu, etc...).

Related

Understanding Spring Boot actuator `http.server.requests` metrics MAX attribute

can someone explain what does the MAX statistic refers to in the below response. I don't see it documented anywhere.
localhost:8081/actuator/metrics/http.server.requests?tag=uri:/myControllerMethod
Response:
{
"name":"http.server.requests",
"description":null,
"baseUnit":"milliseconds",
"measurements":[
{
"statistic":"COUNT",
"value":13
},
{
"statistic":"TOTAL_TIME",
"value":57.430899
},
{
"statistic":"MAX",
"value":0
}
],
"availableTags":[
{
"tag":"exception",
"values":[
"None"
]
},
{
"tag":"method",
"values":[
"GET"
]
},
{
"tag":"outcome",
"values":[
"SUCCESS"
]
},
{
"tag":"status",
"values":[
"200"
]
},
{
"tag":"commonTag",
"values":[
"somePrefix"
]
}
]
}

You can see the individual metrics by using ?tag=url:{endpoint_tag} as defined in the response of the root /actuator/metrics/http.server.requests call. The details of the measurements values are;
COUNT: Rate per second for calls.
TOTAL_TIME: The sum of the times recorded. Reported in the monitoring system's base unit of time
MAX: The maximum amount recorded. When this represents a time, it is reported in the monitoring system's base unit of time.
As given here, also here.
The discrepancies you are seeing is due to the presence of a timer. Meaning after some time currently defined MAX value for any tagged metric can be reset back to 0. Can you add some new calls to /myControllerMethod then immediately do a call to /actuator/metrics/http.server.requests to see a non-zero MAX value for given tag?
This is due to the idea behind getting MAX metric for each smaller period. When you are seeing these metrics, you will be able to get an array of MAX values rather than a single value for a long period of time.
You can get to see this in action within Micrometer source code. There is a rotate() method focused on resetting the MAX value to create above described behaviour.
You can see this is called for every poll() call, which is triggered every some period for metric gathering.

What does MAX represent
MAX represents the maximum time taken to execute endpoint.
Analysis for /user/asset/getAllAssets
COUNT TOTAL_TIME MAX
5 115 17
6 122 17 (Execution Time = 122 - 115 = 17)
7 131 17 (Execution Time = 131 - 122 = 17)
8 187 56 (Execution Time = 187 - 131 = 56)
9 204 56 From Now MAX will be 56 (Execution Time = 204 - 187 = 17)
Will MAX be 0 if we have less number of request (or 1 request) to the particular endpoint?
No number of request for particular endPoint does not affect the MAX (see Image from Spring Boot Admin)
When MAX will be 0
There is Timer which set the value 0. When the endpoint is not being called or executed for sometime Timer sets MAX to 0. Here approximate timer value is 2 minutes (120 seconds)
DistributionStatisticConfig has .expiry(Duration.ofMinutes(2)).
which sets some measurements to 0 if there is no request has been made in between expiry time or rotate time.
How I have determined the timer value?
For that, I have taken 6 samples (executed the same endpoint for 6 times). For that, I have determined the time difference between the time of calling the endpoint - time for when MAX set back to zero
More Details
UPDATE
Document has been updated.
NOTE:
Max for basic DistributionSummary implementations such as CumulativeDistributionSummary, StepDistributionSummary is a time
window max (TimeWindowMax).
It means that its value is the maximum value during a time window.
If the time window ends, it'll be reset to 0 and a new time window starts again.
Time window size will be the step size of the meter registry unless expiry in DistributionStatisticConfig is set to other value
explicitly.

Why Locking in Go much slower than Java? Lot's of time spent in Mutex.Lock() Mutex.Unlock()

I've written a small Go library (go-patan) that collects a running min/max/avg/stddev of certain variables. I compared it to an equivalent Java implementation (patan), and to my surprise the Java implementation is much faster. I would like to understand why.
The library basically consists of a simple data store with a lock that serializes reads and writes. This is a snippet of the code:
type Store struct {
durations map[string]*Distribution
counters map[string]int64
samples map[string]*Distribution
lock *sync.Mutex
}
func (store *Store) addSample(key string, value int64) {
store.addToStore(store.samples, key, value)
}
func (store *Store) addDuration(key string, value int64) {
store.addToStore(store.durations, key, value)
}
func (store *Store) addToCounter(key string, value int64) {
store.lock.Lock()
defer store.lock.Unlock()
store.counters[key] = store.counters[key] + value
}
func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
store.lock.Lock()
defer store.lock.Unlock()
distribution, exists := destination[key]
if !exists {
distribution = NewDistribution()
destination[key] = distribution
}
distribution.addSample(value)
}
I've benchmarked the GO and Java implementations (go-benchmark-gist, java-benchmark-gist) and Java wins by far, but I don't understand why:
Go Results:
10 threads with 20000 items took 133 millis
100 threads with 20000 items took 1809 millis
1000 threads with 20000 items took 17576 millis
10 threads with 200000 items took 1228 millis
100 threads with 200000 items took 17900 millis
Java Results:
10 threads with 20000 items takes 89 millis
100 threads with 20000 items takes 265 millis
1000 threads with 20000 items takes 2888 millis
10 threads with 200000 items takes 311 millis
100 threads with 200000 items takes 3067 millis
I've profiled the program with the Go's pprof and generated a call graph call-graph. This shows that it basically spends all the time in sync.(*Mutex).Lock() and sync.(*Mutex).Unlock().
The Top20 calls according to the profiler:
(pprof) top20
59110ms of 73890ms total (80.00%)
Dropped 22 nodes (cum <= 369.45ms)
Showing top 20 nodes out of 65 (cum >= 50220ms)
flat flat% sum% cum cum%
8900ms 12.04% 12.04% 8900ms 12.04% runtime.futex
7270ms 9.84% 21.88% 7270ms 9.84% runtime/internal/atomic.Xchg
7020ms 9.50% 31.38% 7020ms 9.50% runtime.procyield
4560ms 6.17% 37.56% 4560ms 6.17% sync/atomic.CompareAndSwapUint32
4400ms 5.95% 43.51% 4400ms 5.95% runtime/internal/atomic.Xadd
4210ms 5.70% 49.21% 22040ms 29.83% runtime.lock
3650ms 4.94% 54.15% 3650ms 4.94% runtime/internal/atomic.Cas
3260ms 4.41% 58.56% 3260ms 4.41% runtime/internal/atomic.Load
2220ms 3.00% 61.56% 22810ms 30.87% sync.(*Mutex).Lock
1870ms 2.53% 64.10% 1870ms 2.53% runtime.osyield
1540ms 2.08% 66.18% 16740ms 22.66% runtime.findrunnable
1430ms 1.94% 68.11% 1430ms 1.94% runtime.freedefer
1400ms 1.89% 70.01% 1400ms 1.89% sync/atomic.AddUint32
1250ms 1.69% 71.70% 1250ms 1.69% github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
1240ms 1.68% 73.38% 3140ms 4.25% runtime.deferreturn
1070ms 1.45% 74.83% 6520ms 8.82% runtime.systemstack
1010ms 1.37% 76.19% 1010ms 1.37% runtime.newdefer
1000ms 1.35% 77.55% 1000ms 1.35% runtime.mapaccess1_faststr
950ms 1.29% 78.83% 15660ms 21.19% runtime.semacquire
860ms 1.16% 80.00% 50220ms 67.97% main.Benchmrk.func1
Can someone explain why locking in Go seems to be so much slower than in Java, what am I doing wrong? I've also written a channel based implementation in Go, but that is even slower.

It is best to avoid defer in tiny functions that need high performance since it is expensive. In most other cases, there is no need to avoid it since the cost of defer is outweighed by the code around it.
I'd also recommend using lock sync.Mutex instead of using a pointer. The pointer creates a slight amount of extra work for the programmer (initialisation, nil bugs), and a slight amount of extra work for the garbage collector.

I've also posted this question on the golang-nuts group. The reply from Jesper Louis Andersen explains quite well that Java uses synchronization optimization techniques such as lock escape analysis/lock elision and lock coarsening.
Java JIT might be taking the lock and allowing multiple updates at once within the lock to increase performance. I ran the Java benchmark with -Djava.compiler=NONE which gave dramatic performance, but is not a fair comparison.
I assume that many of these optimization techniques have less impact in a production environment.

Golang Server Timeout

I have a very simple go server:
package main
import(
"net/http"
"fmt"
"log"
)
func test(w http.ResponseWriter, r *http.Request){
fmt.Println("No bid")
http.Error(w, "NoBid", 204)
}
func main() {
http.HandleFunc("/test/bid", test)
http.ListenAndServe(":8080", nil)
log.Println("Done serving")
}
I then run the apache benchmark tool:
ab -c 50 -n 50000 -p post.txt http://127.0.0.1:8080/test/bid
The Server runs and responds to about 15000 requests and then times out. I was wondering why this happens and if there is something I can do about this.

If you running in Linux, Maybe too many open files, so it can't create connection, You need change system config to support more connections.
For example,
edit /etc/security/limits.conf add
* soft nofile 100000
* soft nofile 100000
To open more file.
edit /etc/sysctl.conf
# use more port
net.ipv4.ip_local_port_range = 1024 65000
# keep alive timeout
net.ipv4.tcp_keepalive_time = 300
# allow reuse
net.ipv4.tcp_tw_reuse = 1
# quick recovery
net.ipv4.tcp_tw_recycle = 1

I tried to replicate your problem on my linux amd64 laptop with no success - it worked fine even with
ab -c 200 -n 500000 -p post.txt http://127.0.0.1:8080/test/bid
There were about 28,000 sockets open though which may be bumping a limit on your system.
A more real world test might be to turn keepalives on which maxes out at 400 sockets
ab -k -c 200 -n 500000 -p post.txt http://127.0.0.1:8080/test/bid
The result for this was
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /test/bid
Document Length: 6 bytes
Concurrency Level: 200
Time taken for tests: 33.807 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 500000
Total transferred: 77000000 bytes
Total body sent: 221500000
HTML transferred: 3000000 bytes
Requests per second: 14790.04 [#/sec] (mean)
Time per request: 13.523 [ms] (mean)
Time per request: 0.068 [ms] (mean, across all concurrent requests)
Transfer rate: 2224.28 [Kbytes/sec] received
6398.43 kb/s sent
8622.71 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 11
Processing: 0 14 5.2 13 42
Waiting: 0 14 5.2 13 42
Total: 0 14 5.2 13 42
Percentage of the requests served within a certain time (ms)
50% 13
66% 16
75% 17
80% 18
90% 20
95% 21
98% 24
99% 27
100% 42 (longest request)
I suggest you try ab with -k and take a look at tuning your system for lots of open sockets

Jmeter Summary Report analysis

Can anyone please explain like how to analyze jmeter's Summary Report?
Example:
Label : Login Action(sampler)
Sample# : 1
average: 104 // What does this mean actually?
min : 104 // What does this mean actually?
max : 104
stddev : 0 // What does this mean actually?
error% : 0
Throughput : 9.615384615 // What does this mean actually?
Kb/Sec : 91.74053486 // What does this mean actually?
Average Bytes : 9770 // What does this mean actually?

It is pretty straightforward:
Average, min and max is the response times for the request in milliseconds. The response time os from the request is sent to the response is received. Since you have only one request they are of course all equal.
stddev is a measure of the variation of the response times: http://en.wikipedia.org/wiki/Standard_deviation.
Throughput is number of requests per second. With a average response time a little over 100ms the throughput is a little below 10.
Kb/Sec is the number of kilobytes transferred per second. It is Average Bytes (per request) * Throughput. I am not sure if the average bytes is for only the response or both the request and the response. My guess is that it is the latter. It is the latter: response headers and body.

I have just found the very nice and simple explanation here:
http://jmeterresults.blogspot.jp/2012/07/jmeterunderstanding-summary-report.html

Node.js slower than Apache

I am comparing performance of Node.js (0.5.1-pre) vs Apache (2.2.17) for a very simple scenario - serving a text file.
Here's the code I use for node server:
var http = require('http')
, fs = require('fs')
fs.readFile('/var/www/README.txt',
function(err, data) {
http.createServer(function(req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'})
res.end(data)
}).listen(8080, '127.0.0.1')
}
)
For Apache I am just using whatever default configuration which goes with Ubuntu 11.04
When running Apache Bench with the following parameters against Apache
ab -n10000 -c100 http://127.0.0.1/README.txt
I get the following runtimes:
Time taken for tests: 1.083 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 27630000 bytes
HTML transferred: 24830000 bytes
Requests per second: 9229.38 [#/sec] (mean)
Time per request: 10.835 [ms] (mean)
Time per request: 0.108 [ms] (mean, across all concurrent requests)
Transfer rate: 24903.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 9
Processing: 5 10 2.0 10 23
Waiting: 4 10 1.9 10 21
Total: 6 11 2.1 10 23
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 11
90% 14
95% 15
98% 18
99% 19
100% 23 (longest request)
When running Apache bench against node instance, these are the runtimes:
Time taken for tests: 1.712 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 25470000 bytes
HTML transferred: 24830000 bytes
Requests per second: 5840.83 [#/sec] (mean)
Time per request: 17.121 [ms] (mean)
Time per request: 0.171 [ms] (mean, across all concurrent requests)
Transfer rate: 14527.94 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.9 0 8
Processing: 0 17 8.8 16 53
Waiting: 0 17 8.6 16 48
Total: 1 17 8.7 17 53
Percentage of the requests served within a certain time (ms)
50% 17
66% 21
75% 23
80% 25
90% 28
95% 31
98% 35
99% 38
100% 53 (longest request)
Which is clearly slower than Apache. This is especially surprising if you consider the fact that Apache is doing a lot of other stuff, like logging etc.
Am I doing it wrong? Or is Node.js really slower in this scenario?
Edit 1: I do notice that node's concurrency is better - when increasing a number of simultaneous request to 1000, Apache starts dropping few of them, while node works fine with no connections dropped.

Dynamic requests
node.js is very good at handling at lot small dynamic requests(which can be hanging/long-polling). But it is not good at handling large buffers. Ryan Dahl(Author node.js) explained this one of his presentations. I recommend you to study these slides. I also watched this online somewhere.
Garbage Collector
As you can see from slide(13 from 45) it is bad at big buffers.
Slide 15 from 45:
V8 has a generational garbage
collector. Moves objects around
randomly. Node can’t get a pointer to
raw string data to write to socket.
Use Buffer
Slide 16 from 45
Using Node’s new Buffer object, the
results change.
Still not that good as for example nginx, but a lot better. Also these slides are pretty old so probably Ryan has even improved this.
CDN
Still I don't think you should be using node.js to host static files. You are probably better of hosting them on a CDN which is optimized for hosting static files. Some popular CDN's(some even free for) via WIKI.
NGinx(+Memcached)
If you don't want to use CDN to host your static files I recommend you to use Nginx with memcached instead which is very fast.

In this scenario Apache is probably doing sendfile which result in kernel sending chunk of memory data (cached by fs driver) directly to socket. In the case of node there is some overhead in copying data in userspace between v8, libeio and kernel (see this great article on using sendfile in node)
There are plenty possible scenarios where node will outperform Apache, like 'send stream of data with constant slow speed to as many tcp connections as possible'

The result of your benchmark can change in favor of node.js if you increase the concurrency and use cache in node.js
A sample code from the book "Node Cookbook":
var http = require('http');
var path = require('path');
var fs = require('fs');
var mimeTypes = {
'.js' : 'text/javascript',
'.html': 'text/html',
'.css' : 'text/css'
} ;
var cache = {};
function cacheAndDeliver(f, cb) {
if (!cache[f]) {
fs.readFile(f, function(err, data) {
if (!err) {
cache[f] = {content: data} ;
}
cb(err, data);
});
return;
}
console.log('loading ' + f + ' from cache');
cb(null, cache[f].content);
}
http.createServer(function (request, response) {
var lookup = path.basename(decodeURI(request.url)) || 'index.html';
var f = 'content/'+lookup;
fs.exists(f, function (exists) {
if (exists) {
fs.readFile(f, function(err,data) {
if (err) { response.writeHead(500);
response.end('Server Error!'); return; }
var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
response.writeHead(200, headers);
response.end(data);
});
return;
}
response.writeHead(404); //no such file found!
response.end('Page Not Found!');
});

Really all you're doing here is getting the system to copy data between buffers in memory, in different process's address spaces - the disk cache means you aren't really touching the disk, and you're using local sockets.
So the fewer copies have to be done per request, the faster it goes.
Edit: I suggested adding caching, but in fact I see now you're already doing that - you read the file once, then start the server and send back the same buffer each time.
Have you tried appending the header part to the file data once upfront, so you only have to do a single write operation for each request?

$ cat /var/www/test.php
<?php
for ($i=0; $i<10; $i++) {
echo "hello, world\n";
}
$ ab -r -n 100000 -k -c 50 http://localhost/test.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software: Apache/2.2.17
Server Hostname: localhost
Server Port: 80
Document Path: /test.php
Document Length: 130 bytes
Concurrency Level: 50
Time taken for tests: 3.656 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 100000
Total transferred: 37100000 bytes
HTML transferred: 13000000 bytes
Requests per second: 27350.70 [#/sec] (mean)
Time per request: 1.828 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 9909.29 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 0 2 2.7 0 29
Waiting: 0 2 2.7 0 29
Total: 0 2 2.7 0 29
Percentage of the requests served within a certain time (ms)
50% 0
66% 2
75% 3
80% 3
90% 5
95% 7
98% 10
99% 12
100% 29 (longest request)
$ cat node-test.js
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Hello World\n');
}).listen(1337, "127.0.0.1");
console.log('Server running at http://127.0.0.1:1337/');
$ ab -r -n 100000 -k -c 50 http://localhost:1337/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software:
Server Hostname: localhost
Server Port: 1337
Document Path: /
Document Length: 12 bytes
Concurrency Level: 50
Time taken for tests: 14.708 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 7600000 bytes
HTML transferred: 1200000 bytes
Requests per second: 6799.08 [#/sec] (mean)
Time per request: 7.354 [ms] (mean)
Time per request: 0.147 [ms] (mean, across all concurrent requests)
Transfer rate: 504.62 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 0 7 3.8 7 28
Waiting: 0 7 3.8 7 28
Total: 1 7 3.8 7 28
Percentage of the requests served within a certain time (ms)
50% 7
66% 9
75% 10
80% 11
90% 12
95% 14
98% 16
99% 17
100% 28 (longest request)
$ node --version
v0.4.8

In the below benchmarks,
Apache:
$ apache2 -version
Server version: Apache/2.2.17 (Ubuntu)
Server built: Feb 22 2011 18:35:08
PHP APC cache/accelerator is installed.
Test run on my laptop, a Sager NP9280 with Core I7 920, 12G of RAM.
$ uname -a
Linux presto 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
KUbuntu natty

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Performance Drop whilst testing Go Language - go

Related

Understanding Spring Boot actuator `http.server.requests` metrics MAX attribute

Why Locking in Go much slower than Java? Lot's of time spent in Mutex.Lock() Mutex.Unlock()

Golang Server Timeout

Jmeter Summary Report analysis

Node.js slower than Apache

Categories

Resources