There are 3 objects stored in my map, a couple of MB each. They don't change, so it makes sense to cache them locally on the node. That's what I thought I was doing, until I realized the average get latency is huge and that slows down my computations considerably. See the Hazelcast console:
This makes me wonder where the latency comes from. Is it those 90 and 48 misses, which I think happened at the very beginning? The computations run in parallel, so I figure they could all have issued a get request before the entries were cached, and thus none of them would benefit from the Near Cache at that point. Is there then some pre-loading method I could run before I trigger all those parallel tasks? By the way, why is the entry memory 0 even though there are entries in that Near Cache data table?
Here is my map config:
<map name="commons">
    <in-memory-format>BINARY</in-memory-format>
    <backup-count>0</backup-count>
    <async-backup-count>0</async-backup-count>
    <eviction-policy>NONE</eviction-policy>
    <near-cache>
        <in-memory-format>OBJECT</in-memory-format>
        <max-size>0</max-size>
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <eviction-policy>NONE</eviction-policy>
        <invalidate-on-change>true</invalidate-on-change>
        <cache-local-entries>true</cache-local-entries>
    </near-cache>
</map>
The actual question is: why are there so many misses in the Near Cache, and is that where the huge average get latency comes from?
The latency that Management Center shows is the latency after a request hits the server. If you have a Near Cache and a request hits the Near Cache, it will not show up in Management Center, so I suspect you will not actually observe this high latency from your application. I also see that there have been 34 events, so I assume the entries have been updated. When an entry is updated, it is evicted from the Near Cache, and the subsequent read hits the server.
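On the pre-loading question: a minimal warm-up sketch (my own illustration, not from the original post), assuming the Hazelcast 3.x Java API and hypothetical key names for the three large entries. A plain get() per key, issued once before the parallel tasks start, is enough to populate the Near Cache:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import java.util.Arrays;
import java.util.List;

public class NearCacheWarmup {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Object> commons = hz.getMap("commons");

        // Hypothetical keys of the three large, immutable entries.
        List<String> keys = Arrays.asList("model-a", "model-b", "model-c");

        // Each get() pulls the value once and stores it in the local Near Cache
        // (cache-local-entries is true in the config above).
        for (String key : keys) {
            commons.get(key);
        }

        // ... now launch the parallel computations; their gets should be Near Cache hits.
    }
}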
Related
We are using Hazelcast (v4.2) as a cache solution. We use only the IMap data structure to store the data.
In Management Center, we see that Map memory is less than 5% while "Others" takes up to 90% of the heap memory. The value for "Others" fluctuates between 50% and 90%.
We are not able to understand what is taking this much memory. It sometimes brings the cluster down as well, when it reaches 100%.
Has someone faced a similar issue?
I am looking to deploy a single-node Cassandra cluster on an AWS m4.large instance. Our use case is read-heavy, i.e. there will be many more reads than writes. We have around 1 GB of data now. I am wondering about the latency of each read and write, and how many concurrent reads a single node can handle. I am also confused about when to scale, i.e. deploy another node: does this depend solely on data size, or do we have to scale once read/write requests reach a certain limit?
Cassandra can handle quite a large number of requests per node. You will want to look at cassandra-stress (https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html) and YCSB (https://github.com/brianfrankcooper/YCSB/wiki) for some testing.
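For example, a quick smoke test using the 2.1-style cassandra-stress syntax documented on the page linked above (row and thread counts are just placeholders):

cassandra-stress write n=1000000 -rate threads=50
cassandra-stress read n=1000000 -rate threads=50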
Cassandra can be scaled out to handle more data (more disk space, same replication factor), to handle more requests (more replicas), or both.
1 GB of data is so small that an m4.large instance can keep all of it in memory. If you need really low latency you can enable the row cache with a proper value of row_cache_size_in_mb and maybe row_cache_save_period (see https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSetCaching.html for caching details). Then all your data will be cached in memory all the time and you will get really low latencies. Fast disks (io1) will also give you lower latency.
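For illustration only (the values and the keyspace/table names are placeholders, not recommendations), the relevant knobs look roughly like this in Cassandra 3.0: the cache size lives in cassandra.yaml, while per-table caching is set through CQL:

cassandra.yaml:
    row_cache_size_in_mb: 1024     # big enough to hold the ~1 GB data set
    row_cache_save_period: 14400   # persist the row cache to disk every 4 hours

CQL:
    ALTER TABLE my_keyspace.my_table
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};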
But try for yourself with some testing.
Recently our cluster has seen extreme performance degradation. We had 3 nodes, 64 GB RAM and 4 CPUs (2 cores) each, for an index of 250M records, 60 GB in size. Performance was acceptable for months.
Since then we've:
1. Added a fourth server, same configuration.
2. Split the index into two indexes, queried through an alias
3. Disabled paging (Windows Server 2012)
4. Added synonym analysis on one field
Our cluster can now survive only a few hours before it's basically useless, and I have to restart Elasticsearch on each node to rectify the problem. We tried bumping each node to 8 CPUs (2 cores) with little to no gain.
One issue is that EVERY QUERY uses 100% of the CPU of whatever node it hits. Every query is faceted on 3+ fields, which hasn't changed since our cluster was healthy. Unfortunately I'm not sure whether this was happening before, but it certainly seems like an issue. We obviously need to be able to respond to more than one request every few seconds. When multiple requests come in at the same time, performance doesn't seem to get worse for those particular responses. Still, over time the performance slows to a crawl and the CPU (all cores) stays maxed out indefinitely.
I'm using Elasticsearch 1.3.4 and the elasticsearch-analysis-phonetic 2.3.0 plugin on every box, and had been even when our performance wasn't so terrible.
Any ideas?
UPDATE:
It seems like the performance issue is related to index aliasing. When I pointed the site at a single index, which holds about 80% of the data, the CPU was no longer being maxed out. There were still a few 100% spikes, but they were much shorter. When I pointed it back to the alias (which covers two indexes in total), I could literally bring the cluster down by refreshing the page a dozen times quickly: CPU usage goes to 100% on every query and gets stuck there with many in a row.
Is there a known issue with Elasticsearch aliases? Am I using the alias incorrectly?
UPDATE 2:
Found the cause in the logs: paging queries are TERRIBLE. Is this a known bug in Elasticsearch? If I run an empty query and then try to view the last page (from 100,000,000, for example), it brings the whole cluster down. That SINGLE QUERY. It gets through the first 1.5M results and then quits, all the while taking up 100% of the CPU for over a minute.
UPDATE 3:
So here's something else strange. Pointing at an old index on dev (same size, no aliases) and trying to reproduce the paging issue, the cluster doesn't get hit immediately: CPU usage sits at 1% for the first 20 seconds after the query, and the query returns with an error before CPU usage ever goes up. About 2 minutes later, CPU usage spikes to 100% and the server basically crashes (I can't do anything else because the CPU is so overtaxed). On the production index this CPU load is instantaneous; it happens immediately after the query is made.
Without checking certain metrics it is very difficult to identify the cause of slow responses or other issues. But from the data you have provided it looks like too many cache evictions are happening, which increases the amount of garbage collection on your nodes. Frequent garbage collection (mainly old-generation GC) consumes a lot of CPU, and this in turn starts to affect the whole cluster.
As you mentioned, it started giving issues only after you added another node, which surprises me. Has there been any increase in traffic?
Can you include the output of the _stats API taken at the time your cluster slows down? It contains a lot of information from which I can make a better diagnosis. Please also include a sample of the query.
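For example (host and file name are placeholders), the output can be captured while the cluster is in its slow state with:

curl -s 'http://localhost:9200/_stats?pretty' > stats.json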
I suggest you install bigdesk so that you can get a graphical view of your cluster health more easily.
I set up a 3-node Cassandra (1.2.10) cluster on 3 EC2 m1.xlarge instances.
I started from the default configuration and followed several guidelines, such as:
datastax_clustering_ami_2.4
not using EBS, RAID 0 xfs on ephemeral disks instead,
commit logs on a separate disk,
RF=3,
6 GB heap, 200 MB new size (also tested with larger new size/heap values),
enhanced limits.conf.
With 500 writes per second, the cluster works for only a couple of hours. After that it seems unable to respond because of CPU overload (mainly GC + compactions).
Nodes remain Up, but their load is huge and the logs are full of GC info and messages like:
ERROR [Native-Transport-Requests:186] 2013-12-10 18:38:12,412 ErrorMessage.java (line 210) Unexpected exception during request java.io.IOException: Broken pipe
nodetool shows many dropped mutations on each node:
Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  7
BINARY                       0
READ                         2
MUTATION               4072827
_TRACE                       0
REQUEST_RESPONSE          1769
Is 500 wps too much for a 3-node cluster of m1.xlarge instances and should I add nodes, or is it possible to tune GC further somehow? What load are you able to serve with 3 m1.xlarge nodes? What are your GC configs?
Cassandra is perfectly able to handle tens of thousands of small writes per second on a single node. I just checked on my laptop and got about 29,000 writes/second from cassandra-stress on Cassandra 1.2. So 500 writes per second is not really an impressive number, even for a single node.
However, beware that there is also a limit on how fast data can be flushed to disk, and you definitely don't want your incoming data rate to be close to the physical capabilities of your HDDs. Therefore 500 writes per second can be too much if those writes are big enough.
So first: what is the average size of a write, and what is your replication factor? Multiply the number of writes by the replication factor and by the average write size; then you'll know approximately what write throughput the cluster needs. You should also keep some safety margin for other I/O-related tasks like compaction. Various benchmarks on the Internet suggest a single m1.xlarge instance should be able to write anywhere between 20 MB/s and 100 MB/s...
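As a rough, back-of-the-envelope illustration (the write sizes below are assumptions, not numbers from the question): with RF=3 on a 3-node cluster every node receives a replica of every write, so at 500 writes/s each node ingests roughly

500 writes/s × 1 KB   ≈ 0.5 MB/s   (negligible)
500 writes/s × 100 KB ≈ 50 MB/s    (already within the quoted 20-100 MB/s range for an m1.xlarge, before compaction overhead)

which is why the average write size matters far more than the raw write count.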
If your cluster has sufficient I/O throughput (e.g. 3x more than needed), yet you observe OOM problems, you should try to (a sketch of the corresponding configuration entries follows this list):
reduce memtable_total_space_mb (this will cause C* to flush smaller memtables, more often, freeing heap earlier)
lower write_request_timeout to e.g. 2 seconds instead of 10 (if you have big writes, you don't want to keep too many of them in the incoming queues, which reside on the heap)
turn off row_cache (if you ever enabled it)
lower size of the key_cache
consider upgrading to Cassandra 2.0, which moved quite a lot of things off-heap (e.g. bloom filters and index-summaries); this is especially important if you just store lots of data per node
add more HDDs and set multiple data directories, to improve flush performance
set larger new generation size; I usually set it to about 800M for a 6 GB heap, to avoid pressure on the tenured gen.
if you're sure memtable flushing lags behind, make sure sstable compression is enabled - this will reduce the amount of data physically saved to disk, at the cost of additional CPU cycles
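A sketch of where those knobs live in Cassandra 1.2 (the values are purely illustrative, the data directories and table names are placeholders, and some yaml names differ slightly from the shorthand used above):

cassandra.yaml:
    memtable_total_space_in_mb: 1024      # flush smaller memtables, more often
    write_request_timeout_in_ms: 2000     # 2 s instead of the default 10 s
    row_cache_size_in_mb: 0               # row cache off
    key_cache_size_in_mb: 50              # smaller key cache
    data_file_directories:                # one entry per disk to spread flushes
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data

cassandra-env.sh:
    MAX_HEAP_SIZE="6G"
    HEAP_NEWSIZE="800M"                   # larger new generation, as suggested above

CQL, per table, if sstable compression was ever disabled:
    ALTER TABLE my_keyspace.my_table
    WITH compression = {'sstable_compression': 'SnappyCompressor'};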
I ran a search for the first time and it took 3-4 seconds.
I ran the same search a second time and it took less than 100 ms (as expected, since it used the cache).
Then I cleared the cache by calling "http://host:port/index/_cache/clear"
Next I ran the same search, expecting it to take 3-4 seconds again, but it took less than 100 ms.
So did clearing the cache not work?
What exactly got cleared by that URL?
How do I make ES do the raw search (i.e. no caching) every time?
I am doing this as part of some load testing.
Clearing the cache will empty:
Field data (used by facets, sorting, geo, etc.)
Filter cache
Parent/child cache
Bloom filters for posting lists
The effect you are seeing is probably due to the OS file system cache. Elasticsearch and Lucene lean heavily on the OS file system cache because Lucene segments are immutable. This means that small indices tend to be cached entirely in memory by your OS and effectively become diskless.
As an aside, it doesn't really make sense to benchmark Elasticsearch in a "cacheless" state. It is designed and built to operate in a cached environment; much of the performance that Elasticsearch is known for is due to its excellent use of caching.
To be completely accurate, your benchmark should really look at a system that has fully warmed the JVM (to properly size the new/eden space, optimize JIT output, etc.) and use real, production-like data to simulate "real world" cache filling and eviction at both the ES and OS levels.
Synthetic tests such as a "no-cache environment" make little sense.
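That said, if you really do want each load-test run to start cold, clearing the ES caches alone is not enough: you would also have to drop the OS page cache on every node between runs. On Linux that is typically done with something like the following (my suggestion, not part of the answer above):

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches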
I don't know if this is what you're experiencing, but the cache isn't cleared immediately when you call clear cache. It is scheduled to be deleted in the next 60 seconds.
source: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-clearcache.html