How to handle high concurrency in Spring Boot - spring-boot

I have a requirement to create a product which should support 40 concurrent users per second (I am new to working on concurrency).
To achieve this, I developed a hello-world spring-boot project:
spring-boot (1.5.9)
jetty 9.4.15
a REST controller with a GET endpoint
code below:
@GetMapping
public String index() {
    return "Greetings from Spring Boot!";
}
The app runs on a DL360 Gen10 machine.
Then I benchmarked it using ApacheBench.
75 concurrent users:
ab -t 120 -n 1000000 -c 75 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 75
Time taken for tests: 37.184 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 26893.28 [#/sec] (mean)
Time per request: 2.789 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3755.61 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 23.5 0 3006
Processing: 0 2 7.8 1 404
Waiting: 0 2 7.8 1 404
Total: 0 3 24.9 2 3007
100 concurrent users:
ab -t 120 -n 1000000 -c 100 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 100
Time taken for tests: 36.708 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27241.77 [#/sec] (mean)
Time per request: 3.671 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3804.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 35.7 1 3007
Processing: 0 2 9.4 1 405
Waiting: 0 2 9.4 1 405
Total: 0 4 37.0 2 3009
500 concurrent users:
ab -t 120 -n 1000000 -c 500 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 500
Time taken for tests: 36.222 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27607.83 [#/sec] (mean)
Time per request: 18.111 [ms] (mean)
Time per request: 0.036 [ms] (mean, across all concurrent requests)
Transfer rate: 3855.39 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 126.2 1 7015
Processing: 0 4 22.3 1 811
Waiting: 0 3 22.3 1 810
Total: 0 18 129.2 2 7018
1000 concurrent users:
ab -t 120 -n 1000000 -c 1000 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 1000
Time taken for tests: 36.534 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27372.09 [#/sec] (mean)
Time per request: 36.534 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3822.47 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 30 190.8 1 7015
Processing: 0 6 31.4 2 1613
Waiting: 0 5 31.4 1 1613
Total: 0 36 195.5 2 7018
From the runs above I reach ~27K requests per second with just 75 users, but increasing the number of users also increases the latency; in particular, the connect time clearly grows.
My application must support 40k concurrent users (assume each uses their own separate browser), and each request should finish within 250 milliseconds.
Please help me with this.

I am also not a grand wizard in this topic myself, but here is some advice:
There is a hard limit to how many requests a single instance can handle, so if you want to support a lot of users you need more instances.
If you work with multiple instances, you have to distribute the requests among them somehow. One popular solution is Netflix Eureka.
If you don't want to maintain additional resources and the product will run in the cloud, use the provider's load-balancing services (e.g. Elastic Load Balancing on AWS).
You can also fine-tune your server's connection pool settings.
For sizing, a rough capacity estimate is sketched below.
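As a back-of-the-envelope check (a sketch, not a sizing guarantee: the constants are assumptions pulled from the question and the ab runs above), Little's Law relates concurrency, latency and throughput:

package main

import "fmt"

func main() {
	// Assumed inputs: 40k users each with one request in flight,
	// a 250 ms latency budget, and ~27k req/s measured per instance.
	const concurrentUsers = 40000.0
	const latencyBudgetSec = 0.250
	const perInstanceRPS = 27000.0

	// Little's Law: L = lambda * W  =>  lambda = L / W.
	requiredRPS := concurrentUsers / latencyBudgetSec // 160,000 req/s
	instances := requiredRPS / perInstanceRPS         // ~5.9

	fmt.Printf("required throughput: %.0f req/s -> ~%.0f instances\n",
		requiredRPS, instances)
}

On those assumptions you would need roughly six instances behind a load balancer before adding any headroom; no amount of single-instance tuning gets one box to 160k req/s here.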

Related

Can't add node to the cockroachdb cluster

I'm stuck trying to join a CockroachDB node to a cluster.
I created the first cluster, then tried to join a 2nd node to the first node, but the 2nd node created a new cluster, as follows.
Does anyone know what is wrong in my steps below? Any suggestions are welcome.
I've started first node as follows:
cockroach start --insecure --advertise-host=163.172.156.111
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
*
CockroachDB node starting at 2019-05-11 01:11:15.45522036 +0000 UTC (took 2.5s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://163.172.156.111:8080
sql: postgresql://root@163.172.156.111:26257?sslmode=disable
client flags: cockroach <client cmd> --host=163.172.156.111:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp449555924
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: initialized new cluster
clusterID: 3e797faa-59a1-4b0d-83b5-36143ddbdd69
nodeID: 1
Then I started the secondary node to join 163.172.156.111, but it can't join:
cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root@128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1
The cockroach.log of the joining node shows a gossip error:
cat cockroach-data/logs/cockroach.log
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] file created at: 2019/05/11 01:21:13
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] running on machine: amfortas
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] binary: CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] arguments: [cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257]
I190511 01:21:13.762309 1 util/log/clog.go:1199 line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I190511 01:21:13.762307 1 cli/start.go:1033 logging to directory /home/ueda/cockroach-data/logs
W190511 01:21:13.763373 1 cli/start.go:1068 RUNNING IN INSECURE MODE!
- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.
Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
I190511 01:21:13.763675 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.763752 1 cli/start.go:944 Using the default setting for --cache (128 MiB).
A significantly larger value is usually needed for good performance.
If you have a dedicated server a reasonable setting is --cache=.25 (248 MiB).
I190511 01:21:13.764011 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.764047 1 cli/start.go:957 Using the default setting for --max-sql-memory (128 MiB).
A significantly larger value is usually needed in production.
If you have a dedicated server a reasonable setting is --max-sql-memory=.25 (248 MiB).
I190511 01:21:13.764239 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.764272 1 cli/start.go:1082 CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.866977 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.867002 1 server/config.go:386 system total memory: 992 MiB
I190511 01:21:13.867063 1 server/config.go:388 server configuration:
max offset 500000000
cache size 128 MiB
SQL memory pool size 128 MiB
scan interval 10m0s
scan min idle time 10ms
scan max idle time 1s
event log enabled true
I190511 01:21:13.867098 1 cli/start.go:929 process identity: uid 1000 euid 1000 gid 1000 egid 1000
I190511 01:21:13.867115 1 cli/start.go:554 starting cockroach node
I190511 01:21:13.868242 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data/cockroach-temp067740997"
I190511 01:21:13.894320 21 server/server.go:876 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190511 01:21:13.894813 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data"
W190511 01:21:13.896301 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190511 01:21:13.905666 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190511 01:21:13.911380 21 server/config.go:494 [n?] 1 storage engine initialized
I190511 01:21:13.911417 21 server/config.go:497 [n?] RocksDB cache size: 128 MiB
I190511 01:21:13.911427 21 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 10000
W190511 01:21:13.912459 21 gossip/gossip.go:1496 [n?] no incoming or outgoing connections
I190511 01:21:13.913206 21 server/server.go:926 [n?] Sleeping till wall time 1557537673913178595 to catches up to 1557537674394265598 to ensure monotonicity. Delta: 481.087003ms
I190511 01:21:14.251655 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 tripped: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.251695 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 event: BreakerTripped
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.395848 21 gossip/gossip.go:392 [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"128.199.127.164:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:1 patch:0 unstable:0 > build_tag:"v19.1.0" started_at:1557537674395557548
W190511 01:21:14.458176 21 storage/replica_range_lease.go:506 can't determine lease status due to node liveness error: node not in the liveness table
I190511 01:21:14.458465 21 server/node.go:461 [n1] initialized store [n1,s1]: disk (capacity=24 GiB, available=18 GiB, used=2.2 MiB, logicalBytes=41 MiB), ranges=20, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=6467.00 p90=26940.00 pMax=43017435.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I190511 01:21:14.458775 21 storage/stores.go:244 [n1] read 0 node addresses from persistent storage
I190511 01:21:14.459095 21 server/node.go:699 [n1] connecting to gossip network to verify cluster ID...
W190511 01:21:14.469842 96 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
I190511 01:21:14.474785 21 server/node.go:719 [n1] node connected via gossip and verified as part of cluster "a14e89a7-792d-44d3-89af-7037442eacbc"
I190511 01:21:14.475033 21 server/node.go:542 [n1] node=1: started with [<no-attributes>=/home/ueda/cockroach-data] engine(s) and attributes []
I190511 01:21:14.475393 21 server/status/recorder.go:610 [n1] available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:14.475514 21 server/server.go:1582 [n1] starting http server at [::]:8080 (use: 128.199.127.164:8080)
I190511 01:21:14.475572 21 server/server.go:1584 [n1] starting grpc/postgres server at [::]:26257
I190511 01:21:14.475605 21 server/server.go:1585 [n1] advertising CockroachDB node at 128.199.127.164:26257
W190511 01:21:14.475655 21 jobs/registry.go:341 [n1] unable to get node liveness: node not in the liveness table
I190511 01:21:14.532949 21 server/server.go:1650 [n1] done ensuring all necessary migrations have run
I190511 01:21:14.533020 21 server/server.go:1653 [n1] serving sql connections
I190511 01:21:14.533209 21 cli/start.go:689 [config] clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
I190511 01:21:14.533257 21 cli/start.go:697 node startup completed:
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root@128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1
I190511 01:21:14.541205 146 server/server_update.go:67 [n1] no need to upgrade, cluster already at the newest version
I190511 01:21:14.555557 149 sql/event_log.go:135 [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:128.199.127.164:26257 Attrs: Locality: ServerVersion:19.1 BuildTag:v19.1.0 StartedAt:1557537674395557548 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:a14e89a7-792d-44d3-89af-7037442eacbc StartedAt:1557537674395557548 LastUp:1557537671113461486}
I190511 01:21:14.916458 59 gossip/gossip.go:1510 [n1] node has connected to cluster via gossip
I190511 01:21:14.916660 59 storage/stores.go:263 [n1] wrote 0 node addresses to persistent storage
I190511 01:21:24.480247 116 storage/store.go:4220 [n1,s1] sstables (read amplification = 2):
0 [ 51K 1 ]: 51K
6 [ 1M 1 ]: 1M
I190511 01:21:24.480380 116 storage/store.go:4221 [n1,s1]
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
L0 1/0 50.73 KB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
L6 1/0 1.26 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 2/0 1.31 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
Uptime(secs): 10.6 total, 10.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
estimated_pending_compaction_bytes: 0 B
I190511 01:21:24.481565 121 server/status/runtime.go:500 [n1] runtime stats: 170 MiB RSS, 114 goroutines, 0 B/0 B/0 B GO alloc/idle/total, 14 MiB/16 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 50 KiB/1.5 MiB (r/w)net
What could be blocking the join? Thank you for your suggestions!
It seems you had previously started the second node (the one running on 128.199.127.164) by itself, creating its own cluster.
This can be seen in the error message:
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
To be able to join the cluster, the data directory of the joining node must be empty. You can either delete cockroach-data or specify an alternate directory with --store=/path/to/data-dir; for example, see the commands sketched below.
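Assuming the store path shown in the logs above, and that the joining node holds nothing you need to keep, stop the second node and then (a sketch; note that rm -rf erases that node's data for good):
rm -rf /home/ueda/cockroach-data
cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257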

During load testing, how to read the report in the terminal

While load testing my golang API, a report is generated, but I don't know what it is or how to read it.
I run this command in the terminal:
echo "GET http://localhost:8080/api" | vegeta attack -rate=100/m | vegeta report
It produces the report below:
Requests [total, rate] 138, 1.68
Duration [total, attack, wait] 1m22.20931745s, 1m22.200130205s, 9.187245ms
Latencies [mean, 50, 95, 99, max] 8.956174ms, 9.06458ms, 10.682252ms, 16.007578ms, 46.439935ms
Bytes In [total, mean] 19596, 142.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:138
Error Set:
Or when I run echo "GET http://localhost:8080/api" | vegeta attack -rate=100/m | vegeta report -type=json
the report is generated in JSON format, like below:
{"latencies:
{"total":103506418,
"mean":9409674,
"50th":9484403,
"95th":11918898,
"99th":12008257,
"max":12008257},
"bytes_in":{"total":1562,"mean":142},
"bytes_out":
{"total":0,"mean":0},
"earliest":"2018-10-16T14:15:13.251091124+05:30",
"latest":"2018-10-16T14:15:19.251141502+05:30",
"end":"2018-10-16T14:15:19.260119671+05:30",
"duration":6000050378,
"wait":8978169,
"requests":11,
"rate":1.8333179401848014,
"success":1,
"status_codes":{"200":11},
"errors":[]}
How do I understand this report? Is there any documentation for it, or does anybody know about it?
Let's go through it line by line.
Requests [total, rate] 138, 1.68
This line prints the total number of requests fired in the session (138) along with the rate per second (1.68 requests per second).
Duration [total, attack, wait] 1m22.20931745s, 1m22.200130205s, 9.187245ms
Total duration of the attack, which is the sum of the time spent firing requests (attack) and the time spent waiting for the final response (wait).
Latencies [mean, 50, 95, 99, max] 8.956174ms, 9.06458ms, 10.682252ms, 16.007578ms, 46.439935ms
This is the simplest and most useful line: the mean latency, then the 50th, 95th and 99th percentile latencies, along with the maximum latency observed.
A 99th percentile latency means 99% of responses were served within this time.
Depending on your product, you should treat the 95th or 99th percentile as the real number to improve.
Bytes In [total, mean] 19596, 142.00
Total bytes received across all responses, as well as the mean bytes per response.
Bytes Out [total, mean] 0, 0.00
Total bytes sent across all requests, as well as the mean bytes per request. Since you are using GET, which carries no payload, it is 0 for you.
Success [ratio] 100.00%
Success percentage: 100% of your requests were successful.
Status Codes [code:count] 200:138
Breakdown by response code: in your case, all 138 requests responded with a 200.
Error Set:
Error breakdown: any errors (4xx/5xx) would be reported here. It is empty since you have a 100% success rate.
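Note that in the -type=json output, the latency and duration fields are plain nanosecond counts (they are Go time.Duration values), so "mean":9409674 is ~9.4 ms. If it helps, here is a small Go sketch that decodes that JSON into human units; the field subset is an assumption based on the output shown above:

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// Minimal subset of vegeta's JSON report; durations decode as nanoseconds.
type report struct {
	Latencies struct {
		Mean time.Duration `json:"mean"`
		P50  time.Duration `json:"50th"`
		P95  time.Duration `json:"95th"`
		P99  time.Duration `json:"99th"`
		Max  time.Duration `json:"max"`
	} `json:"latencies"`
	Requests uint64  `json:"requests"`
	Rate     float64 `json:"rate"`
	Success  float64 `json:"success"`
}

func main() {
	var r report
	if err := json.NewDecoder(os.Stdin).Decode(&r); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d requests at %.2f req/s, mean=%s p99=%s, success=%.0f%%\n",
		r.Requests, r.Rate, r.Latencies.Mean, r.Latencies.P99, r.Success*100)
}

Pipe the JSON report into it, e.g. echo "GET http://localhost:8080/api" | vegeta attack -rate=100/m | vegeta report -type=json | go run readreport.go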

Why hping3 rtt >> sockperf latency

I ran TCP hping3 between Linux VMs on the same network and got an average RTT of ~5 ms:
sudo hping3 -S -p 22 10.1.0.8 -c 100
...
len=44 ip=10.1.0.8 ttl=64 DF id=0 sport=22 flags=SA seq=98 win=29200 rtt=0.6 ms
len=44 ip=10.1.0.8 ttl=64 DF id=0 sport=22 flags=SA seq=99 win=29200 rtt=1.4 ms
--- 10.1.0.8 hping statistic ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 0.6/5.2/9.7 ms
If I measure latency using the sockperf tool, the average latency comes out at ~0.5 ms:
sockperf.bin ping-pong -i 10.1.0.8 -p 8302 -t 15 --pps=max
sockperf output:
sockperf: ====> avg-lat=495.943 (std-dev=484.312)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 495.943 usec
sockperf: Total 15119 observations; each percentile contains 151.19 observations
sockperf: ---> <MAX> observation = 6839.398
sockperf: ---> percentile 99.999 = 6839.398
sockperf: ---> percentile 99.990 = 5292.623
sockperf: ---> percentile 99.900 = 4023.327
sockperf: ---> percentile 99.000 = 2434.115
sockperf: ---> percentile 90.000 = 1005.612
sockperf: ---> percentile 75.000 = 638.746
sockperf: ---> percentile 50.000 = 360.516
sockperf: ---> percentile 25.000 = 178.134
sockperf: ---> <MIN> observation = 45.356
I wanted to know what could cause such a large difference between the latencies these two tools report. Do hping3's TCP RTT and sockperf's TCP latency measure the same thing internally?
Am I doing anything wrong here?
To verify, I also measured TCP latency between two Windows VMs on the same network using the psping tool:
Connecting to 10.1.0.7:8888: from 10.1.0.6:62312: 1.03ms
TCP connect statistics for 10.1.0.7:8888:
Sent = 100, Received = 100, Lost = 0 (0% loss),
Minimum = 0.59ms, Maximum = 9.82ms, Average = 1.05ms

Golang Server Timeout

I have a very simple go server:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func test(w http.ResponseWriter, r *http.Request) {
	fmt.Println("No bid")
	http.Error(w, "NoBid", 204)
}

func main() {
	http.HandleFunc("/test/bid", test)
	// ListenAndServe blocks until the server fails; log its error
	// instead of ignoring it and falling through.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
I then run the apache benchmark tool:
ab -c 50 -n 50000 -p post.txt http://127.0.0.1:8080/test/bid
The server runs and responds to about 15000 requests and then times out. I was wondering why this happens and whether there is something I can do about it.
If you are running on Linux, you may be hitting the limit on open files, which prevents new connections from being created. You need to change the system configuration to support more connections.
For example, edit /etc/security/limits.conf and add
* soft nofile 100000
* hard nofile 100000
to allow more open files.
Then edit /etc/sysctl.conf:
# use more ports
net.ipv4.ip_local_port_range = 1024 65000
# keep-alive timeout
net.ipv4.tcp_keepalive_time = 300
# allow reuse of sockets in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1
# quick recovery (note: tcp_tw_recycle is known to break clients behind NAT and was removed in Linux 4.12)
net.ipv4.tcp_tw_recycle = 1
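To confirm from inside the process that the file-descriptor limit is what you are hitting, you can inspect it (and raise the soft limit up to the hard limit) at startup. A Linux-only Go sketch; the OS-level ceilings still come from the config above:

package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	fmt.Printf("open-file limit: soft=%d hard=%d\n", rl.Cur, rl.Max)

	// Raise the soft limit to the hard limit for this process only.
	rl.Cur = rl.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
}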
I tried to replicate your problem on my linux amd64 laptop with no success - it worked fine even with
ab -c 200 -n 500000 -p post.txt http://127.0.0.1:8080/test/bid
There were about 28,000 sockets open though which may be bumping a limit on your system.
A more real-world test might be to turn keep-alives on, which maxes out at 400 sockets:
ab -k -c 200 -n 500000 -p post.txt http://127.0.0.1:8080/test/bid
The result for this was
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /test/bid
Document Length: 6 bytes
Concurrency Level: 200
Time taken for tests: 33.807 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 500000
Total transferred: 77000000 bytes
Total body sent: 221500000
HTML transferred: 3000000 bytes
Requests per second: 14790.04 [#/sec] (mean)
Time per request: 13.523 [ms] (mean)
Time per request: 0.068 [ms] (mean, across all concurrent requests)
Transfer rate: 2224.28 [Kbytes/sec] received
6398.43 kb/s sent
8622.71 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 11
Processing: 0 14 5.2 13 42
Waiting: 0 14 5.2 13 42
Total: 0 14 5.2 13 42
Percentage of the requests served within a certain time (ms)
50% 13
66% 16
75% 17
80% 18
90% 20
95% 21
98% 24
99% 27
100% 42 (longest request)
I suggest you try ab with -k, and take a look at tuning your system for lots of open sockets. A server-side timeout sketch follows below.
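On the server side you can also set explicit timeouts, so stuck or idle connections are reclaimed instead of accumulating. A sketch of the same server with timeouts (IdleTimeout exists only in Go 1.8+, so treat the exact fields as an assumption for older toolchains):

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/test/bid", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "NoBid", 204)
	})
	srv := &http.Server{
		Addr:         ":8080",
		ReadTimeout:  5 * time.Second,  // max time to read a request
		WriteTimeout: 10 * time.Second, // max time to write a response
		IdleTimeout:  60 * time.Second, // reap idle keep-alive connections
	}
	log.Fatal(srv.ListenAndServe())
}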

Node.js slower than Apache

I am comparing performance of Node.js (0.5.1-pre) vs Apache (2.2.17) for a very simple scenario - serving a text file.
Here's the code I use for node server:
var http = require('http')
  , fs = require('fs')

fs.readFile('/var/www/README.txt',
  function(err, data) {
    http.createServer(function(req, res) {
      res.writeHead(200, {'Content-Type': 'text/plain'})
      res.end(data)
    }).listen(8080, '127.0.0.1')
  }
)
For Apache I am just using the default configuration that ships with Ubuntu 11.04.
When running Apache Bench with the following parameters against Apache
ab -n10000 -c100 http://127.0.0.1/README.txt
I get the following runtimes:
Time taken for tests: 1.083 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 27630000 bytes
HTML transferred: 24830000 bytes
Requests per second: 9229.38 [#/sec] (mean)
Time per request: 10.835 [ms] (mean)
Time per request: 0.108 [ms] (mean, across all concurrent requests)
Transfer rate: 24903.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 9
Processing: 5 10 2.0 10 23
Waiting: 4 10 1.9 10 21
Total: 6 11 2.1 10 23
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 11
90% 14
95% 15
98% 18
99% 19
100% 23 (longest request)
When running Apache Bench against the node instance, these are the runtimes:
Time taken for tests: 1.712 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 25470000 bytes
HTML transferred: 24830000 bytes
Requests per second: 5840.83 [#/sec] (mean)
Time per request: 17.121 [ms] (mean)
Time per request: 0.171 [ms] (mean, across all concurrent requests)
Transfer rate: 14527.94 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.9 0 8
Processing: 0 17 8.8 16 53
Waiting: 0 17 8.6 16 48
Total: 1 17 8.7 17 53
Percentage of the requests served within a certain time (ms)
50% 17
66% 21
75% 23
80% 25
90% 28
95% 31
98% 35
99% 38
100% 53 (longest request)
Which is clearly slower than Apache. This is especially surprising if you consider the fact that Apache is doing a lot of other stuff, like logging etc.
Am I doing it wrong? Or is Node.js really slower in this scenario?
Edit 1: I do notice that node's concurrency is better - when increasing the number of simultaneous requests to 1000, Apache starts dropping a few of them, while node works fine with no connections dropped.
Dynamic requests
node.js is very good at handling a lot of small dynamic requests (which can be hanging/long-polling), but it is not good at handling large buffers. Ryan Dahl (the author of node.js) explained this in one of his presentations. I recommend you study those slides; I also watched the talk online somewhere.
Garbage Collector
As you can see from slide 13 of 45, it is bad at big buffers.
Slide 15 from 45:
V8 has a generational garbage
collector. Moves objects around
randomly. Node can’t get a pointer to
raw string data to write to socket.
Use Buffer
Slide 16 from 45
Using Node’s new Buffer object, the
results change.
Still not as good as, for example, nginx, but a lot better. Also, these slides are pretty old, so Ryan has probably improved on this since.
CDN
Still, I don't think you should be using node.js to host static files. You are probably better off hosting them on a CDN that is optimized for serving static files; Wikipedia lists some popular CDNs (some even free).
NGinx(+Memcached)
If you don't want to use a CDN to host your static files, I recommend using Nginx with memcached instead, which is very fast.
In this scenario Apache is probably using sendfile, which results in the kernel sending a chunk of memory (cached by the fs driver) directly to the socket. In the case of node there is some overhead in copying data in userspace between v8, libeio and the kernel (see this great article on using sendfile in node).
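For contrast, a server that hands the copy to the kernel avoids that userspace round trip. A minimal Go sketch, purely for illustration (Go's net/http delegates to the sendfile syscall on Linux when copying a plain file to a TCP socket):

package main

import "net/http"

func main() {
	// ServeFile streams the file to the socket; for plain-file sources
	// on Linux the copy happens in the kernel via sendfile.
	http.HandleFunc("/README.txt", func(w http.ResponseWriter, r *http.Request) {
		http.ServeFile(w, r, "/var/www/README.txt")
	})
	http.ListenAndServe(":8080", nil)
}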
There are plenty of possible scenarios where node will outperform Apache, like 'send a stream of data at a constant slow speed to as many tcp connections as possible'.
The result of your benchmark can change in favor of node.js if you increase the concurrency and use a cache in node.js.
Sample code from the book "Node Cookbook":
var http = require('http');
var path = require('path');
var fs = require('fs');

var mimeTypes = {
  '.js'  : 'text/javascript',
  '.html': 'text/html',
  '.css' : 'text/css'
};
var cache = {};

function cacheAndDeliver(f, cb) {
  if (!cache[f]) {
    fs.readFile(f, function(err, data) {
      if (!err) {
        cache[f] = {content: data};
      }
      cb(err, data);
    });
    return;
  }
  console.log('loading ' + f + ' from cache');
  cb(null, cache[f].content);
}

http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html';
  var f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    if (exists) {
      fs.readFile(f, function(err, data) {
        if (err) {
          response.writeHead(500);
          response.end('Server Error!');
          return;
        }
        var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
        response.writeHead(200, headers);
        response.end(data);
      });
      return;
    }
    response.writeHead(404); // no such file found!
    response.end('Page Not Found!');
  });
}).listen(8080);
Really all you're doing here is getting the system to copy data between buffers in memory, in different processes' address spaces - the disk cache means you aren't really touching the disk, and you're using local sockets.
So the fewer copies that have to be done per request, the faster it goes.
Edit: I suggested adding caching, but in fact I see now you're already doing that - you read the file once, then start the server and send back the same buffer each time.
Have you tried appending the header part to the file data once upfront, so you only have to do a single write operation for each request?
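That suggestion, sketched in Go purely for illustration (the port and body are made up; it ignores the incoming request and speaks bare HTTP/1.0, so it demonstrates the single-write idea rather than being a production server):

package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// Compose status line, headers and body into one buffer, once.
	body := "Greetings!\n"
	resp := []byte(fmt.Sprintf(
		"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: %d\r\n\r\n%s",
		len(body), body))

	ln, err := net.Listen("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			c.Write(resp) // a single write per request
		}(conn)
	}
}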
$ cat /var/www/test.php
<?php
for ($i=0; $i<10; $i++) {
echo "hello, world\n";
}
$ ab -r -n 100000 -k -c 50 http://localhost/test.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software: Apache/2.2.17
Server Hostname: localhost
Server Port: 80
Document Path: /test.php
Document Length: 130 bytes
Concurrency Level: 50
Time taken for tests: 3.656 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 100000
Total transferred: 37100000 bytes
HTML transferred: 13000000 bytes
Requests per second: 27350.70 [#/sec] (mean)
Time per request: 1.828 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 9909.29 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 0 2 2.7 0 29
Waiting: 0 2 2.7 0 29
Total: 0 2 2.7 0 29
Percentage of the requests served within a certain time (ms)
50% 0
66% 2
75% 3
80% 3
90% 5
95% 7
98% 10
99% 12
100% 29 (longest request)
$ cat node-test.js
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Hello World\n');
}).listen(1337, "127.0.0.1");
console.log('Server running at http://127.0.0.1:1337/');
$ ab -r -n 100000 -k -c 50 http://localhost:1337/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software:
Server Hostname: localhost
Server Port: 1337
Document Path: /
Document Length: 12 bytes
Concurrency Level: 50
Time taken for tests: 14.708 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 7600000 bytes
HTML transferred: 1200000 bytes
Requests per second: 6799.08 [#/sec] (mean)
Time per request: 7.354 [ms] (mean)
Time per request: 0.147 [ms] (mean, across all concurrent requests)
Transfer rate: 504.62 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 0 7 3.8 7 28
Waiting: 0 7 3.8 7 28
Total: 1 7 3.8 7 28
Percentage of the requests served within a certain time (ms)
50% 7
66% 9
75% 10
80% 11
90% 12
95% 14
98% 16
99% 17
100% 28 (longest request)
$ node --version
v0.4.8
Setup for the benchmarks above:
Apache:
$ apache2 -version
Server version: Apache/2.2.17 (Ubuntu)
Server built: Feb 22 2011 18:35:08
PHP APC cache/accelerator is installed.
Test run on my laptop, a Sager NP9280 with Core I7 920, 12G of RAM.
$ uname -a
Linux presto 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Kubuntu Natty
