How to remove the key and value from the disk store in Geode?

While using the Apache Geode/GemFire implementation, my requirement is not only to have the IMDG functionality but also to commit the values to disk stores. In case all of my Geode servers go down, I would like to bring them back up and still have the persisted key/value pairs.
Now, when I remove a key from the cache, the key and value are removed from all the clustered caches (or, say, I use the destroy eviction action). However, the disk space is not reclaimed or reduced; so, if I continue to use persistence_overflow for regions, how should I clear up the disk space as well to accommodate new entries?
Again, I understand there is a compact option, but I am not looking for compaction; I want to completely remove the key/value pair and reclaim the disk space.

As far as I know, this is not configurable under the current implementation: each GemFire/Geode member configured with a disk store creates oplog files occupying whatever was configured through the max-oplog-size property, which defaults to 1GB. During regular execution, obsolete operations are removed from the oplogs only during compaction, as explained in Design Your Disk Stores.
Hope this helps, cheers.

As Juan mentioned, compaction is the way that Geode frees up disk space.
You can tune the compaction to free up disk space more quickly at the expense of performance. You can decrease the max-oplog-size and lower the compaction-threshold so that oplogs become eligible for compaction with less accumulated garbage, which reclaims disk space faster.
For most use cases I would recommend leaving the compaction-threshold alone. The defaults are tuned for maximum write throughput and no more than 50% garbage on disk.
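For reference, here is a minimal sketch of setting those two properties programmatically through the Geode Java API when the disk store is created; the store and region names, and the specific values, are placeholders rather than recommendations:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.DiskStoreFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class DiskStoreTuning {
    public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        // Smaller oplogs roll over sooner; a lower compaction threshold makes an
        // oplog eligible for compaction with less accumulated garbage.
        DiskStoreFactory dsf = cache.createDiskStoreFactory();
        dsf.setMaxOplogSize(256);       // in MB, default is 1024 (1GB)
        dsf.setCompactionThreshold(30); // percent garbage, default is 50
        dsf.setAutoCompact(true);       // default is true
        dsf.create("tunedStore");       // placeholder name

        Region<String, String> region = cache
            .<String, String>createRegionFactory(RegionShortcut.PARTITION_PERSISTENT_OVERFLOW)
            .setDiskStoreName("tunedStore")
            .create("exampleRegion");   // placeholder name

        region.put("k1", "v1");
        region.destroy("k1"); // the disk space is only reclaimed once compaction runs
        cache.close();
    }
}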

Related

Optimal RocksDB configuration for use as secondary "cache"

I am looking at using RocksDB (from Java in my case) as a secondary "cache" behind a RAM-based first-level cache. I do not expect any items in RocksDB to be dramatically more commonly accessed than others (all the really frequently used items will be in the first-level cache), and there will be no "locality" (if there is such a concept in RocksDB?), since the "next" key in sequence is no more likely to be accessed next than any other. So I would like to optimize RocksDB for "truly random access", for instance by reading as little data as possible each time, not having any "cache" in Rocks, etc.
All suggestions of configurations are appreciated!
The defaults should be more than enough for your use case, but you can increase the block size and pin the index and filter blocks.
You can also call optimizeForPointLookup if you are only going to do puts and gets, to optimize even further.
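A rough sketch of what that could look like with RocksJava; the path, block size, and cache size below are placeholder values, not tuned recommendations:

import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RandomAccessStore {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
            .setBlockSize(16 * 1024)                    // larger data blocks (placeholder value)
            .setCacheIndexAndFilterBlocks(true)         // keep index/filter blocks in the block cache
            .setPinL0FilterAndIndexBlocksInCache(true); // pin them so point lookups stay cheap

        try (Options options = new Options()
                .setCreateIfMissing(true)
                .setTableFormatConfig(tableConfig)) {
            // Alternative mentioned above: for a pure get/put workload, this installs its
            // own block-based table settings (bloom filter + block cache sized in MB),
            // replacing tableConfig, so use one approach or the other:
            // options.optimizeForPointLookup(64);

            try (RocksDB db = RocksDB.open(options, "/tmp/secondary-cache")) {
                db.put("key".getBytes(), "value".getBytes());
                byte[] value = db.get("key".getBytes());
                System.out.println(new String(value));
            }
        }
    }
}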

Cassandra client code with high read throughput with row_cache optimization

Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over? I believe row_cache_size_in_mb is supposed to cache frequently used records in memory, but setting it to say 10MB seems to make no difference.
I tried cassandra-stress of course, but the highest read throughput it achieves with 1KB records (-col size=UNIFORM\(1000..1000\)) is ~15K/s.
With low numbers like above, I can easily write an in-memory hashmap based cache that will give me at least a million reads per second for a small working set size. How do I make cassandra do this automatically for me? Or is it not supposed to achieve performance close to an in-memory map even for a tiny working set size?
Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over?
There are some solutions for this scenario.
One idea is to use the row cache, but be careful: any update/delete to a single column will invalidate the whole partition in the cache, so you lose all the benefit. The row cache is best used for small datasets that are frequently read but almost never modified.
Are you sure that your cassandra-stress scenario never updates or writes to the same partition over and over again?
Here are my findings: when I enable row_cache, counter_cache, and key_cache, all with sizable values, I can verify using "top" that Cassandra does no disk I/O at all; all three seem necessary to ensure no disk activity. Yet, despite zero disk I/O, the throughput is <20K/s even for reading a single record over and over. This likely confirms (as also alluded to in my comment) that Cassandra incurs the cost of serialization and deserialization even if its operations are completely in memory, i.e., it is not designed to compete with native hashmap performance. So, if you want native hashmap speeds for a small-working-set workload but the ability to spill to disk if the map grows big, you would need to write your own cache on top of Cassandra (or any of the other key-value stores like Mongo, Redis, etc., for that matter).
For those interested, I also verified that redis is the fastest among cassandra, mongo, and redis for a simple get/put small-working-set workload, but even redis gets at best ~35K/s read throughput (largely independent, by design, of the request size), which hardly comes anywhere close to native hashmap performance that simply returns pointers and can do so comfortably at over 2 million/s.
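As a rough illustration of the "write your own cache on top" approach mentioned above, here is a minimal read-through sketch in Java; the loader and writer lambdas are placeholders for whatever client calls you use (DataStax driver, Jedis, a Mongo driver, ...), and it does no expiry or invalidation:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hits are served from an in-memory map at hashmap speed; misses fall through to the
// backing store via the supplied loader, and writes go through to the store as well.
public class ReadThroughCache<K, V> {
    private final Map<K, V> hot = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // miss path, e.g. a SELECT via your driver
    private final BiConsumer<K, V> writer; // write path, e.g. an INSERT via your driver

    public ReadThroughCache(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    public V get(K key) {
        return hot.computeIfAbsent(key, loader); // loader only runs on a cache miss
    }

    public void put(K key, V value) {
        writer.accept(key, value); // write through to the backing store
        hot.put(key, value);       // keep the in-memory copy in sync
    }
}

// Usage sketch (the lambdas stand in for real client calls):
// ReadThroughCache<String, String> cache =
//     new ReadThroughCache<>(k -> fetchFromStore(k), (k, v) -> writeToStore(k, v));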

Understanding elasticsearch jvm heap usage

Folks,
I am trying to reduce the memory usage of my Elasticsearch deployment (single-node cluster).
I can see 3GB JVM heap space being used.
To optimize I first need to understand the bottleneck.
I have a limited understanding of how the JVM heap usage is split.
Field data looks to consume 1.5GB, and the filter cache & query cache combined consume less than 0.5GB; that adds up to 2GB at the most.
Can someone help me understand where Elasticsearch eats up the rest of the 1GB?
I can't tell for your exact setup, but in order to know what's going on in your heap, you can use the jvisualvm tool (bundled with the JDK) together with Marvel or the Bigdesk plugin (my preference) and the _cat APIs to analyze what lives there.
As you've rightly noticed, the heap hosts three main caches, namely:
the fielddata cache: unbounded by default, but can be controlled with indices.fielddata.cache.size (in your case it seems to be around 50% of the heap, probably due to the fielddata circuit breaker)
the node query/filter cache: 10% of the heap
the shard request cache: 1% of the heap but disabled by default
There is a nice mindmap available here (kudos to Igor Kupczyński) that summarizes the roles of the caches. That leaves more or less ~30% of the heap (1GB in your case) for all the other object instances that ES needs to create in order to function properly (more about this later).
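If you prefer code over curl, the same numbers can be pulled from the _cat endpoints over HTTP; here is a quick sketch using the JDK's built-in HTTP client (the host, port, and exact column names are assumptions and vary a bit across ES versions):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EsHeapPeek {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Per-node fielddata usage, plus overall heap and cache sizes per node.
        String[] endpoints = {
            "http://localhost:9200/_cat/fielddata?v",
            "http://localhost:9200/_cat/nodes?v&h=name,heap.current,heap.percent,fielddata.memory_size,query_cache.memory_size,request_cache.memory_size"
        };

        for (String url : endpoints) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }
}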
Here is how I proceeded on my local env. First, I started my node fresh (with Xmx1g) and waited for green status. Then I started jvisualvm and hooked it onto my elasticsearch process. I took a heap dump from the Sampler tab so I can compare it later on with another dump. My heap looks like this initially (only 1/3 of max heap allocated so far):
I also checked that my field data and filter caches were empty:
Just to make sure, I also ran /_cat/fielddata and as you can see there's no heap used by field data yet since the node just started.
$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id host ip node total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 0
This is the initial situation. Now, we need to warm this all up a bit, so I started my back- and front-end apps to put some pressure on the local ES node.
After a while, my heap looks like this, so its size has more or less increased by 300MB (139MB -> 452MB; not much, but I ran this experiment on a small dataset).
My caches have also grown a bit to a few megabytes:
$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id host ip node total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424
At this point I took another heap dump to gain insight into how the heap had evolved; I computed the retained sizes of the objects and compared them with the first dump I took just after starting the node. The comparison looks like this:
Among the objects that increased in retained size, the usual suspects are maps, of course, and any cache-related entities. But we can also find the following classes:
NIOFSDirectory instances that are used to read Lucene segment files from the filesystem
A lot of interned strings in the form of char arrays or byte arrays
Doc values related classes
Bit sets
etc
As you can see, the heap hosts the three main caches, but it is also the place where reside all other Java objects that the Elasticsearch process needs and that are not necessarily cache-related.
So if you want to control your heap usage, you obviously have no control over the internal objects that ES needs to function properly, but you can definitely influence the sizing of your caches. If you follow the links in the first bullet list, you'll get a precise idea of what settings you can tune.
Also, tuning caches might not be the only option; maybe you need to rewrite some of your queries to be more memory-friendly, or change your analyzers or some field types in your mapping, etc. Hard to tell in your case without more information, but this should give you some leads.
Go ahead and launch jvisualvm the same way I did here and watch how your heap grows while your app (searching + indexing) is hitting ES, and you should quickly gain some insights into what's going on in there.
Marvel only plots some of the heap occupants that need to be monitored, such as the caches in this case.
The caches represent only a portion of the total heap usage. There are many other instances that occupy heap memory, and those may not be plotted directly on the Marvel interface.
Hence, not all of the heap occupied by ES is used by the caches.
In order to clearly understand the exact usage of the heap by different instances, you should take a heap dump of the process and then analyze it using a memory analyzer tool, which can give you the exact picture.
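If you'd rather trigger that heap dump from code than from jvisualvm or jmap, a sketch like the following works, assuming the Elasticsearch JVM exposes remote JMX (the JMX port and output path are placeholders); the resulting .hprof file can then be opened in Eclipse MAT or a similar memory analyzer:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteHeapDumper {
    public static void main(String[] args) throws Exception {
        // Assumes the ES JVM was started with remote JMX enabled, e.g.
        // -Dcom.sun.management.jmxremote.port=9010 (plus auth/SSL flags as appropriate).
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                connection, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

            // 'true' dumps only live (reachable) objects; the file is written on the ES host.
            bean.dumpHeap("/tmp/es-heap.hprof", true);
            System.out.println("Heap dump written to /tmp/es-heap.hprof on the target host");
        }
    }
}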

Is paddingFactor making my updates slow?

I have a MongoDB instance, db name: "bnccdb", collection name: "AnalysedLiterture", with about 6 million documents. There is also always a lightweight background daemon process that crawls data from the internet and inserts it into this collection (the insert frequency is very low, about 1-2 documents per second, so it has little influence on db performance). I used db.AnalysedLiterture.stats() to see this collection's configuration information:
It shows that the paddingFactor is very close to 2.0.
Now I have another process whose job is to add two keys to each document in this collection. Unfortunately, the update operation is extremely slow, which really confuses me. When this update process runs, the mongostat output is:
You can see that faults and locked db are really high; it means that the database workload is really high.
I really cannot figure out the reason. My suspicion is that, since there is always a lightweight daemon process inserting data into this collection, MongoDB changed the paddingFactor from 1 to a larger value (1.9...), and since the paddingFactor is very high, every time my process performs an update (adding two keys to each document), the db has to reclaim disk space for the padding, causing a big read/write overhead.
Can anyone give me some suggestions?
Please.
The reason for your padding factor being so high is your updates. MongoDB uses this value to "over-allocate" space for documents so that they can be updated and grown in place without needing to be moved to a larger space within MongoDB's storage system. This means that your updates have been growing the documents, requiring that they be pulled out of their existing space on disk and moved to another, larger space. The old space is left behind for re-use, but often these spaces are not re-used as efficiently as they could be.
A padding factor of 2 would mean that MongoDB is allocating twice the space needed for each document, suggesting that your system has performed a very large number of updates and moves.
You should look into enabling powerOf2Sizes, which will make your space allocations uniform and thus improve space re-use. Once you have enabled this setting, you should resync or repair your database to rebuild it from scratch, as the new allocation system will only affect new documents.
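For illustration, here is a minimal sketch of issuing that collMod command with the MongoDB Java driver; the database and collection names are taken from the question, and keep in mind that usePowerOf2Sizes only applies to the MMAPv1-era storage engine this question is about (a current driver/server combination may not support it at all), so treat this as a sketch rather than a drop-in:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class EnablePowerOf2Sizes {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("bnccdb");

            // collMod with usePowerOf2Sizes makes record allocations uniform (powers of
            // two), which improves reuse of the space freed when documents are moved.
            Document result = db.runCommand(
                new Document("collMod", "AnalysedLiterture")
                    .append("usePowerOf2Sizes", true));

            System.out.println(result.toJson());
        }
    }
}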

Why does ElasticSearch Heap Size not return to normal?

I have setup elasticsearch and it works great.
I've done a few bulk inserts and did a bit of load testing. However, it's been idle for a while and I'm not sure why the heap size doesn't reduce to about 50MB, which is what it was when it started. I'm guessing GC hasn't happened?
Please note the nodes are running on different machines on AWS. They are all on small instances and each instance has 1.7GB of RAM.
Any ideas?
Probably. Hard to say; the JVM manages the memory and does what it thinks is best. It may be avoiding GC cycles because they simply aren't necessary. In fact, it's recommended to set mlockall to true, so that the heap is fully allocated at startup and will never change.
It's not really a problem that ES is using memory for heap...memory is to be used, not saved. Unless you are having memory problems, I'd just ignore it and continue on.
Elasticsearch and Lucene maintain cache data to perform fast sorts or facets.
If your queries are doing sorts, this may increase the Lucene FieldCache size, which may not be released because the objects there are not eligible for GC.
So the default threshold (CMSInitiatingOccupancyFraction) of 75% does not apply here.
You can manage FieldCache duration as explained here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
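If you just want to drop the fielddata that is already loaded without restarting the node, the clear-cache API can do that; a small sketch with the JDK HTTP client (the index name and host are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClearFielddata {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // POST /<index>/_cache/clear?fielddata=true evicts only the fielddata cache for
        // that index; drop the index name from the path to clear it across all indices.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:9200/my-index/_cache/clear?fielddata=true"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();

        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}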
