How do I keep the JVM heap in Elasticsearch from hitting 100%, and get the garbage collector to actually clean up the heap?
In my case, when the JVM heap reaches 99% the site no longer responds. Any tips?
Here is a screenshot from Marvel.
elasticsearch.yml config
cluster.name: xxx
node.name: xxxx
node.data: true
node.master: true
bootstrap.mlockall: true
transport.tcp.compress: true
transport.tcp.port: 9300
http.port: 9200
http.max_content_length: 500mb
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
script.engine.groovy.inline.aggs: on
script.inline: on
script.indexed: on
index.max_result_window: 10000
And a heap size of 24 GB for a 9 GB index.
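One commonly suggested mitigation for a heap that keeps climbing is to cap fielddata and the circuit breakers. A sketch of the kind of elasticsearch.yml settings involved, assuming Elasticsearch 1.x/2.x (the percentages are illustrative, not tested values):
# cap the fielddata cache so old entries are evicted instead of accumulating
indices.fielddata.cache.size: 30%
# trip the breaker before a single request can exhaust the heap
indices.breaker.fielddata.limit: 40%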
EDIT -- GC LOG
: 553545K->8330K(613440K), 0.0557428 secs] 732889K->188315K(25097728K), 0.0558526 secs] [Times: user=0.40 sys=0.00, real=0.06 secs]
2017-06-05T10:55:25.416+0200: 88.445: Total time for which application threads were stopped: 0.0561829 seconds, Stopping threads took: 0.0000684 seconds
2017-06-05T10:55:26.416+0200: 89.446: Total time for which application threads were stopped: 0.0002122 seconds, Stopping threads took: 0.0000832 seconds
2017-06-05T10:55:26.833+0200: 89.863: [GC (Allocation Failure) 2017-06-05T10:55:26.834+0200: 89.863: [ParNew
Desired survivor size 34865152 bytes, new threshold 6 (max 6)
- age 1: 530096 bytes, 530096 total
- age 2: 920856 bytes, 1450952 total
- age 3: 1311064 bytes, 2762016 total
- age 4: 258584 bytes, 3020600 total
- age 5: 600504 bytes, 3621104 total
- age 6: 717384 bytes, 4338488 total
: 553674K->5918K(613440K), 0.0450758 secs] 733659K->186089K(25097728K), 0.0451704 secs] [Times: user=0.35 sys=0.00, real=0.04 secs]
Related
Today there was an entry in the indexing slow log:
[2020-02-12T15:52:37,418][WARN ][i.i.s.index ] [node-1] [company/KTngnM6ASD-_KdU0FFAWRA] took[22.7s], took_millis[22703], type[_doc], id[20080943028], routing[], source[{...}]
so I checked gc.log to find out why:
[2020-02-12T07:52:37.417+0000][22539][safepoint ] Total time for which application threads were stopped: 0.0004935 seconds, Stopping threads took: 0.0001389 seconds
[2020-02-12T07:52:37.586+0000][22539][safepoint ] Application time: 0.1682439 seconds
[2020-02-12T07:52:37.586+0000][22539][safepoint ] Entering safepoint region: GenCollectForAllocation
[2020-02-12T07:52:37.586+0000][22539][gc,start ] GC(315124) Pause Young (Allocation Failure)
[2020-02-12T07:52:37.586+0000][22539][gc,task ] GC(315124) Using 8 workers of 8 for evacuation
[2020-02-12T07:52:37.641+0000][22539][gc,age ] GC(315124) Desired survivor size 34865152 bytes, new threshold 3 (max threshold 6)
[2020-02-12T07:52:37.641+0000][22539][gc,age ] GC(315124) Age table with threshold 3 (max threshold 6)
[2020-02-12T07:52:37.641+0000][22539][gc,age ] GC(315124) - age 1: 22998672 bytes, 22998672 total
[2020-02-12T07:52:37.641+0000][22539][gc,age ] GC(315124) - age 2: 4966112 bytes, 27964784 total
[2020-02-12T07:52:37.641+0000][22539][gc,age ] GC(315124) - age 3: 10219520 bytes, 38184304 total
[2020-02-12T07:52:37.641+0000][22539][gc,age ] GC(315124) - age 4: 4875304 bytes, 43059608 total
[2020-02-12T07:52:37.641+0000][22539][gc,heap ] GC(315124) ParNew: 597611K->52614K(613440K)
[2020-02-12T07:52:37.641+0000][22539][gc,heap ] GC(315124) CMS: 4992477K->4998973K(16095680K)
[2020-02-12T07:52:37.641+0000][22539][gc,metaspace ] GC(315124) Metaspace: 103488K->103488K(1144832K)
[2020-02-12T07:52:37.641+0000][22539][gc ] GC(315124) Pause Young (Allocation Failure) 5459M->4933M(16317M) 54.724ms
[2020-02-12T07:52:37.641+0000][22539][gc,cpu ] GC(315124) User=0.35s Sys=0.00s Real=0.06s
[2020-02-12T07:52:37.641+0000][22539][safepoint ] Leaving safepoint region
It seems GC is OK, but there are some log entries I don't understand, e.g.
Entering safepoint region: Cleanup
Entering safepoint region: RevokeBias
Entering safepoint region: GenCollectForAllocation
What do Cleanup, RevokeBias, and GenCollectForAllocation mean? And what does Application time mean? Why are the values so different?
Application time: 0.1382641 seconds
Application time: 13.2106552 seconds
Application time: 106.3031188 seconds
It's funny that you mention some things but do not provide logs for them. Anyway:
RevokeBias is when biased locks are revoked, or when "fat" locks created through biased locking are deflated.
GenCollectForAllocation is the reason the safepoint was triggered; you can read it as "generational collection for allocation failure". There are many other possible reasons, FYI.
Cleanup covers the JVM's periodic safepoint cleanup tasks.
As for Application time: it is simply how long the application threads ran between two consecutive safepoints, which is why the values vary so widely.
AFAIK, if you want more details, you need to enable the trace log level.
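If it helps, here is a sketch of what I mean, assuming JDK 9+ unified logging (which the [safepoint] tags in your log suggest); on JDK 8 and earlier the flags were different:
# JDK 9+: raise safepoint logging to trace level
-Xlog:safepoint*=trace:file=safepoint.log:time,pid,tags
# JDK 8 and earlier: the rough equivalent
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1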
I'm trying to join a CockroachDB node to a cluster.
I created the first cluster, then tried to join a second node to the first node, but the second node created a new cluster, as shown below.
Can anyone spot what is wrong in the steps below? Any suggestions are welcome.
I started the first node as follows:
cockroach start --insecure --advertise-host=163.172.156.111
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
*
CockroachDB node starting at 2019-05-11 01:11:15.45522036 +0000 UTC (took 2.5s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://163.172.156.111:8080
sql: postgresql://root@163.172.156.111:26257?sslmode=disable
client flags: cockroach <client cmd> --host=163.172.156.111:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp449555924
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: initialized new cluster
clusterID: 3e797faa-59a1-4b0d-83b5-36143ddbdd69
nodeID: 1
Then I started the second node to join 163.172.156.111, but it can't join:
cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root@128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1
The cockroach.log of the joining node shows a gossip error:
cat cockroach-data/logs/cockroach.log
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] file created at: 2019/05/11 01:21:13
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] running on machine: amfortas
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] binary: CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] arguments: [cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257]
I190511 01:21:13.762309 1 util/log/clog.go:1199 line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I190511 01:21:13.762307 1 cli/start.go:1033 logging to directory /home/ueda/cockroach-data/logs
W190511 01:21:13.763373 1 cli/start.go:1068 RUNNING IN INSECURE MODE!
- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.
Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
I190511 01:21:13.763675 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.763752 1 cli/start.go:944 Using the default setting for --cache (128 MiB).
A significantly larger value is usually needed for good performance.
If you have a dedicated server a reasonable setting is --cache=.25 (248 MiB).
I190511 01:21:13.764011 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.764047 1 cli/start.go:957 Using the default setting for --max-sql-memory (128 MiB).
A significantly larger value is usually needed in production.
If you have a dedicated server a reasonable setting is --max-sql-memory=.25 (248 MiB).
I190511 01:21:13.764239 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.764272 1 cli/start.go:1082 CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.866977 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.867002 1 server/config.go:386 system total memory: 992 MiB
I190511 01:21:13.867063 1 server/config.go:388 server configuration:
max offset 500000000
cache size 128 MiB
SQL memory pool size 128 MiB
scan interval 10m0s
scan min idle time 10ms
scan max idle time 1s
event log enabled true
I190511 01:21:13.867098 1 cli/start.go:929 process identity: uid 1000 euid 1000 gid 1000 egid 1000
I190511 01:21:13.867115 1 cli/start.go:554 starting cockroach node
I190511 01:21:13.868242 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data/cockroach-temp067740997"
I190511 01:21:13.894320 21 server/server.go:876 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190511 01:21:13.894813 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data"
W190511 01:21:13.896301 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190511 01:21:13.905666 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190511 01:21:13.911380 21 server/config.go:494 [n?] 1 storage engine initialized
I190511 01:21:13.911417 21 server/config.go:497 [n?] RocksDB cache size: 128 MiB
I190511 01:21:13.911427 21 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 10000
W190511 01:21:13.912459 21 gossip/gossip.go:1496 [n?] no incoming or outgoing connections
I190511 01:21:13.913206 21 server/server.go:926 [n?] Sleeping till wall time 1557537673913178595 to catches up to 1557537674394265598 to ensure monotonicity. Delta: 481.087003ms
I190511 01:21:14.251655 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 tripped: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.251695 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 event: BreakerTripped
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.395848 21 gossip/gossip.go:392 [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"128.199.127.164:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:1 patch:0 unstable:0 > build_tag:"v19.1.0" started_at:1557537674395557548
W190511 01:21:14.458176 21 storage/replica_range_lease.go:506 can't determine lease status due to node liveness error: node not in the liveness table
I190511 01:21:14.458465 21 server/node.go:461 [n1] initialized store [n1,s1]: disk (capacity=24 GiB, available=18 GiB, used=2.2 MiB, logicalBytes=41 MiB), ranges=20, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=6467.00 p90=26940.00 pMax=43017435.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I190511 01:21:14.458775 21 storage/stores.go:244 [n1] read 0 node addresses from persistent storage
I190511 01:21:14.459095 21 server/node.go:699 [n1] connecting to gossip network to verify cluster ID...
W190511 01:21:14.469842 96 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
I190511 01:21:14.474785 21 server/node.go:719 [n1] node connected via gossip and verified as part of cluster "a14e89a7-792d-44d3-89af-7037442eacbc"
I190511 01:21:14.475033 21 server/node.go:542 [n1] node=1: started with [<no-attributes>=/home/ueda/cockroach-data] engine(s) and attributes []
I190511 01:21:14.475393 21 server/status/recorder.go:610 [n1] available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:14.475514 21 server/server.go:1582 [n1] starting http server at [::]:8080 (use: 128.199.127.164:8080)
I190511 01:21:14.475572 21 server/server.go:1584 [n1] starting grpc/postgres server at [::]:26257
I190511 01:21:14.475605 21 server/server.go:1585 [n1] advertising CockroachDB node at 128.199.127.164:26257
W190511 01:21:14.475655 21 jobs/registry.go:341 [n1] unable to get node liveness: node not in the liveness table
I190511 01:21:14.532949 21 server/server.go:1650 [n1] done ensuring all necessary migrations have run
I190511 01:21:14.533020 21 server/server.go:1653 [n1] serving sql connections
I190511 01:21:14.533209 21 cli/start.go:689 [config] clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
I190511 01:21:14.533257 21 cli/start.go:697 node startup completed:
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root@128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1
I190511 01:21:14.541205 146 server/server_update.go:67 [n1] no need to upgrade, cluster already at the newest version
I190511 01:21:14.555557 149 sql/event_log.go:135 [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:128.199.127.164:26257 Attrs: Locality: ServerVersion:19.1 BuildTag:v19.1.0 StartedAt:1557537674395557548 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:a14e89a7-792d-44d3-89af-7037442eacbc StartedAt:1557537674395557548 LastUp:1557537671113461486}
I190511 01:21:14.916458 59 gossip/gossip.go:1510 [n1] node has connected to cluster via gossip
I190511 01:21:14.916660 59 storage/stores.go:263 [n1] wrote 0 node addresses to persistent storage
I190511 01:21:24.480247 116 storage/store.go:4220 [n1,s1] sstables (read amplification = 2):
0 [ 51K 1 ]: 51K
6 [ 1M 1 ]: 1M
I190511 01:21:24.480380 116 storage/store.go:4221 [n1,s1]
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
L0 1/0 50.73 KB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
L6 1/0 1.26 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 2/0 1.31 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
Uptime(secs): 10.6 total, 10.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
estimated_pending_compaction_bytes: 0 B
I190511 01:21:24.481565 121 server/status/runtime.go:500 [n1] runtime stats: 170 MiB RSS, 114 goroutines, 0 B/0 B/0 B GO alloc/idle/total, 14 MiB/16 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 50 KiB/1.5 MiB (r/w)net
What could be blocking the join? Thank you for your suggestions!
It seems you had previously started the second node (the one running on 128.199.127.164) by itself, creating its own cluster.
This can be seen in the error message:
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
To be able to join the cluster, the data directory of the joining node must be empty. You can either delete cockroach-data or specify an alternate directory with --store=/path/to/data-dir.
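A sketch of that reset, assuming the second node's existing data is disposable (rm -rf deletes it permanently):
# on the second node (128.199.127.164), with the cockroach process stopped
rm -rf /home/ueda/cockroach-data
cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
On this fresh start the node has no cluster ID of its own, so it picks one up from the first node via --join.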
I have a requirement to create a product that should support 40K concurrent users (I am new to working on concurrency).
To test this, I developed a hello-world Spring Boot project with:
spring-boot (1.5.9)
jetty 9.4.15
a REST controller with one GET endpoint
The controller code, wrapped in a minimal class for completeness:
@RestController
public class HomeController {
    @GetMapping("/home")
    public String index() {
        return "Greetings from Spring Boot!";
    }
}
The app runs on an HPE ProLiant DL360 Gen10 machine.
Then I benchmarked it using ApacheBench.
75 concurrent users:
ab -t 120 -n 1000000 -c 75 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 75
Time taken for tests: 37.184 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 26893.28 [#/sec] (mean)
Time per request: 2.789 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3755.61 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 23.5 0 3006
Processing: 0 2 7.8 1 404
Waiting: 0 2 7.8 1 404
Total: 0 3 24.9 2 3007
100 concurrent users:
ab -t 120 -n 1000000 -c 100 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 100
Time taken for tests: 36.708 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27241.77 [#/sec] (mean)
Time per request: 3.671 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3804.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 35.7 1 3007
Processing: 0 2 9.4 1 405
Waiting: 0 2 9.4 1 405
Total: 0 4 37.0 2 3009
500 concurrent users:
ab -t 120 -n 1000000 -c 500 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 500
Time taken for tests: 36.222 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27607.83 [#/sec] (mean)
Time per request: 18.111 [ms] (mean)
Time per request: 0.036 [ms] (mean, across all concurrent requests)
Transfer rate: 3855.39 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 126.2 1 7015
Processing: 0 4 22.3 1 811
Waiting: 0 3 22.3 1 810
Total: 0 18 129.2 2 7018
1000 concurrent users:
ab -t 120 -n 1000000 -c 1000 http://10.93.243.87:9000/home/
Server Software:
Server Hostname: 10.93.243.87
Server Port: 9000
Document Path: /home/
Document Length: 27 bytes
Concurrency Level: 1000
Time taken for tests: 36.534 seconds
Complete requests: 1000000
Failed requests: 0
Write errors: 0
Total transferred: 143000000 bytes
HTML transferred: 27000000 bytes
Requests per second: 27372.09 [#/sec] (mean)
Time per request: 36.534 [ms] (mean)
Time per request: 0.037 [ms] (mean, across all concurrent requests)
Transfer rate: 3822.47 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 30 190.8 1 7015
Processing: 0 6 31.4 2 1613
Waiting: 0 5 31.4 1 1613
Total: 0 36 195.5 2 7018
From the test runs above, I achieved ~27K requests per second with just 75 users, but increasing the number of users also increases the latency. We can also clearly see the connect time growing.
My application is required to support 40K concurrent users (assume all of them use their own separate browsers) and each request should finish within 250 milliseconds.
Please help me with this.
I am not a grand wizard on the topic myself, but here is some advice:
There is a hard limit on how many requests a single instance can handle, so if you want to support many users you need more instances. As a rough sanity check via Little's law (concurrency ≈ throughput × latency), one instance doing ~27K requests/second within a 250 ms budget covers about 27,000 × 0.25 ≈ 6,750 in-flight requests, so 40K concurrent users would need roughly six such instances.
If you run multiple instances, you have to distribute the requests among them somehow. One popular solution is Netflix Eureka for service discovery, paired with a load balancer.
If you don't want to maintain additional resources and the product will run in the cloud, use the provider's load-balancing services (e.g. Elastic Load Balancing on AWS).
You can also fine-tune your server's thread and connection pool settings; see the sketch below.
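For the last point, here is a minimal sketch of resizing the embedded Jetty thread pool in Spring Boot 1.5.x; the class name, bean, and thread counts are illustrative assumptions, not recommendations:
import org.eclipse.jetty.util.thread.QueuedThreadPool;
import org.springframework.boot.context.embedded.jetty.JettyEmbeddedServletContainerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JettyTuning {

    @Bean
    public JettyEmbeddedServletContainerFactory jettyFactory() {
        JettyEmbeddedServletContainerFactory factory = new JettyEmbeddedServletContainerFactory();
        // Resize Jetty's worker pool; 50/500 are placeholder values to tune under load.
        factory.addServerCustomizers(server -> {
            QueuedThreadPool pool = server.getBean(QueuedThreadPool.class);
            pool.setMinThreads(50);
            pool.setMaxThreads(500);
        });
        return factory;
    }
}
Separately, the growing connect times in your ab runs are typical when every request opens a fresh TCP connection; re-running with ab -k (HTTP keep-alive) usually flattens them.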
I am testing Cassandra performance with a simple data model:
CREATE TABLE "NoCache" (
key ascii,
column1 ascii,
value ascii,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='ALL' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
I am fetching 100 columns of a row key using pycassa's get/xget functions, but I am seeing a read latency of about 15 ms on the server.
columns = COL_FAM.get(row_key, column_count=100)
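For completeness, the streaming xget variant looks roughly like this (a sketch; xget returns a generator of (column, value) pairs, so check the exact keyword arguments against the pycassa docs):
columns = list(COL_FAM.xget(row_key, column_count=100))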
nodetool cfstats
Column Family: NoCache
SSTable count: 1
Space used (live): 103756053
Space used (total): 103756053
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 20
Read Latency: 15.717 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 976
Compacted row minimum size: 4769
Compacted row maximum size: 557074610
Compacted row mean size: 87979499
A latency like this is astonishing, given that nodetool info shows the reads hitting directly in the row cache:
Row Cache : size 4834713 (bytes), capacity 67108864 (bytes), 35 hits, 38 requests, 1.000 recent hit rate, 0 save period in seconds
Can anyone tell me why Cassandra is taking so much time when reading from the row cache?
Enable tracing and see what it's doing. http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
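For example, from cqlsh (a sketch; 'some-row-key' stands in for one of your actual keys):
TRACING ON;
SELECT * FROM "NoCache" WHERE key = 'some-row-key' LIMIT 100;
The trace then shows step by step where the 15 ms goes.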
I don't seem to have any caching enabled when checking in OpsCenter or cfstats. I'm running Cassandra 1.1.7 with Solandra on Debian. I have set the required global options in cassandra.yaml:
key_cache_size_in_mb: 800
key_cache_save_period: 14400
row_cache_size_in_mb: 800
row_cache_save_period: 15400
row_cache_provider: SerializingCacheProvider
Column Families were created as follows:
create column family example
with column_type = 'Standard'
and comparator = 'BytesType'
and default_validation_class = 'BytesType'
and key_validation_class = 'BytesType'
and read_repair_chance = 1.0
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'ALL';
OpsCenter shows no data available in the caching graphs, and cfstats doesn't show any cache-related fields:
Column Family: charsets
SSTable count: 1
Space used (live): 5558
Space used (total): 5558
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 61381
Read Latency: 0.123 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 16
Compacted row minimum size: 1917
Compacted row maximum size: 2299
Compacted row mean size: 2299
Any help or suggestions are appreciated.
Sam
The caching stats have been moved from cfstats to info in Cassandra 1.1. If you run nodetool info you should see something like:
Key Cache : size 5552 (bytes), capacity 838860800 (bytes), 38 hits, 47 requests, 0.809 recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 838860800 (bytes), 0 hits, 0 requests, NaN recent hit rate, 15400 save period in seconds
This is because the caches are now global rather than per-CF. It seems that OpsCenter needs updating for this change; maybe there is a later version available that will work.
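If you also want to resize the global caches at runtime without a restart, nodetool should be able to do it (a sketch, assuming Cassandra 1.1's nodetool; both capacities are in MB):
nodetool setcachecapacity 800 800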