What is the maximum size of a value that can be held in EhCache across all of the storage tiers (Memory Store, Off-heap Store, Disk Store)?
My ultimate question is: is EhCache suitable for caching large file streams?
Ehcache does not have internal limits on how large a value can be. The practical limit is the heap memory available to your application, because Ehcache ALWAYS has an on-heap tier and values are put into it when they are retrieved from the cache.
Ehcache is not designed to stream values to and from the off-heap or disk tiers, which would avoid the need to have enough heap to hold them. In other words, it is not a good fit for caching large file streams.
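As a rough sketch of what this means in practice, here is a tiered configuration using the Ehcache 3 builder API (the cache name, key/value types, sizes, and persistence directory are illustrative assumptions): all three tiers can be configured together, but a heap tier is always present, and a value read from the off-heap or disk tier is still materialized on the heap.

import org.ehcache.PersistentCacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class TieredCacheExample {
    public static void main(String[] args) {
        // Illustrative sizes: 100 on-heap entries, 256 MB off-heap, 1 GB persistent disk.
        PersistentCacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
            .with(CacheManagerBuilder.persistence("ehcache-data"))
            .withCache("blobs",
                CacheConfigurationBuilder.newCacheConfigurationBuilder(String.class, byte[].class,
                    ResourcePoolsBuilder.newResourcePoolsBuilder()
                        .heap(100, EntryUnit.ENTRIES)   // a heap tier is mandatory
                        .offheap(256, MemoryUnit.MB)
                        .disk(1, MemoryUnit.GB, true)))
            .build(true);

        // A get() that hits the off-heap or disk tier still brings the whole
        // byte[] onto the heap, so each value must fit in available heap.
        byte[] value = cacheManager.getCache("blobs", String.class, byte[].class).get("some-key");

        cacheManager.close();
    }
}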
Assume a single-machine system with an in-memory indexing scheme.
I am not able to find this information in the ES docs. Does ES start swapping out the overflowing data, loading it when needed and continuing to work, or does it give an error?
In-memory indices provide better performance at the cost of limiting the index size to the amount of available physical memory.
That is from the 1.7 documentation. Memory stores are no longer available in 2.0+.
Under the hood it uses the Lucene RAMDirectory, which will just consume RAM (and eventually swap) until either you hit Java heap limits and ES crashes with out-of-memory errors, or the system gives up and OOM-kills the Elasticsearch process. Don't use in-memory indexes for large indexes, or for any situation where persistence is important.
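For context, a minimal sketch of how such an index could be created through the 1.x Java client; the host, port, and index name are illustrative assumptions, and index.store.type is the setting that selects the RAMDirectory-backed memory store.

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class MemoryIndexExample {
    public static void main(String[] args) {
        // Assumes an Elasticsearch 1.x node reachable on localhost:9300.
        Client client = new TransportClient()
            .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

        // index.store.type=memory keeps the whole index in the JVM heap (RAMDirectory);
        // it is not persisted across restarts and is removed in 2.0+.
        client.admin().indices().prepareCreate("volatile-index")
            .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.store.type", "memory")
                .build())
            .execute().actionGet();

        client.close();
    }
}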
We have an ES 1.6 cluster with 4 nodes, used mostly to store logging data (~500 documents a second).
ES is configured with 10 GB of heap, but after numerous OutOfMemoryErrors and stop-the-world GCs we limited the field data cache to 10%.
My question is: why are all of the nodes' JVMs constantly using ~9 GB of heap when field data (which I understand to be one of the primary consumers of heap) is limited to 1 GB?
Some graphs:
It's worth pointing out that our filter cache is much smaller (~200 MB), and yes, the aggressively limited field data cache size does cause a lot of field data cache evictions.
What else is using so much heap?
Thanks
I use OrientDB version 2.1.3 in embedded mode. Everything is more than fine (performance is very good compared to our legacy H2 storage) except the storage space. I have very little information to store in the database, so I don't want disk space to be wasted by temporary files.
In the database directory, I see the .wal file growing and growing (very fast). So I did some research on the internet and ended up with:
OGlobalConfiguration.DISK_CACHE_SIZE.setValue(16);
OGlobalConfiguration.WAL_CACHE_SIZE.setValue(16);
But this does nothing. The .wal file keeps growing, and even when I delete it, it grows past 16 MB again.
What can cause this file to keep growing even with this configuration in place?
Is there a way to keep cache files under a known limit?
There are no cache files in the database. Data files are cached in RAM to speed up the system: the more RAM is allocated to the disk cache, the faster your system will be. The amount of RAM allocated to the disk cache does not affect the WAL size.
The properties you have set are not related to the WAL size. Instead, you should set the OGlobalConfiguration#WAL_MAX_SIZE property.
Also, a single WAL segment (OGlobalConfiguration#WAL_MAX_SEGMENT_SIZE) is 128 megabytes by default, so the WAL cannot be smaller than 128 megabytes, or more precisely, the value of that setting.
So, to wrap up, these properties (OGlobalConfiguration#WAL_MAX_SEGMENT_SIZE and OGlobalConfiguration#WAL_MAX_SIZE) should be set before any call to OrientDB classes. Ideally, they should be set through system properties (storage.wal.maxSegmentSize and storage.wal.maxSize).
Please be aware that using such small values means the disk cache will have to be forcefully flushed after very few operations to make it possible to truncate the database journal (WAL) and keep it very small.
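For illustration, a minimal sketch of applying these limits in embedded mode with the 2.1 Java API; the database path, credentials, and chosen sizes are placeholder values, and the units are assumed to be megabytes, in line with the documented defaults.

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

public class WalSizeExample {
    public static void main(String[] args) {
        // Must be set before the first OrientDB class touches the storage.
        // Values assumed to be in megabytes (2.1 defaults: 128 per segment, 4096 total).
        System.setProperty("storage.wal.maxSegmentSize", "64");
        System.setProperty("storage.wal.maxSize", "128");
        // Programmatic equivalent, if preferred over system properties:
        // OGlobalConfiguration.WAL_MAX_SEGMENT_SIZE.setValue(64);
        // OGlobalConfiguration.WAL_MAX_SIZE.setValue(128);

        ODatabaseDocumentTx db = new ODatabaseDocumentTx("plocal:/tmp/demo-db"); // placeholder path
        if (!db.exists()) {
            db.create();
        } else {
            db.open("admin", "admin"); // default embedded credentials
        }
        // ... work with the database ...
        db.close();
    }
}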
How does the behavior of the MEMORY_ONLY and MEMORY_AND_DISK caching levels in Spark differ?
As explained in the documentation, the persistence levels compare as follows in terms of efficiency:
Level                 Space used   CPU time   In memory   On disk   Serialized
-------------------------------------------------------------------------------
MEMORY_ONLY           High         Low        Y           N         N
MEMORY_ONLY_SER       Low          High       Y           N         Y
MEMORY_AND_DISK       High         Medium     Some        Some      Some
MEMORY_AND_DISK_SER   Low          High       Some        Some      Y
DISK_ONLY             Low          High       N           Y         Y
MEMORY_AND_DISK and MEMORY_AND_DISK_SER spill to disk if there is too much data to fit in memory.
The documentation says:
MEMORY_ONLY: Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.

MEMORY_AND_DISK: Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed.

MEMORY_ONLY_SER: Store RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read.

MEMORY_AND_DISK_SER: Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.

DISK_ONLY: Store the RDD partitions only on disk.

MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.: Same as the levels above, but replicate each partition on two cluster nodes.

OFF_HEAP (experimental): Store RDD in serialized format in Tachyon. Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors to be smaller and to share a pool of memory, making it attractive in environments with large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon, the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts from memory.
In other words, with MEMORY_ONLY, Spark tries to keep partitions in memory at all times. If some partitions cannot be kept in memory, or partitions are lost from RAM because of node failure, Spark recomputes them using lineage information. With MEMORY_AND_DISK, Spark always keeps computed partitions cached: it tries to hold them in RAM, but partitions that do not fit are spilled to disk instead of being recomputed.
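As a small illustration of selecting the two levels through the Java API (the application name and input path are placeholder assumptions), only the persist call differs between the two behaviors described above.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.StorageLevels;

public class PersistLevelsExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("persist-demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("/tmp/input.txt"); // placeholder input path

        // MEMORY_ONLY: partitions that do not fit in RAM are simply not cached
        // and are recomputed from lineage on each use.
        // lines.persist(StorageLevels.MEMORY_ONLY);

        // MEMORY_AND_DISK: partitions that do not fit in RAM are written to local
        // disk and read back from there instead of being recomputed.
        lines.persist(StorageLevels.MEMORY_AND_DISK);

        System.out.println(lines.count()); // first action computes and caches the partitions
        System.out.println(lines.count()); // second action is served from the cache

        sc.stop();
    }
}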
I have come to know that Cassandra uses Bloom filters for performance, and that it stores this filter data in physical memory.
1) Where does Cassandra store these filters? (In heap memory?)
2) How much memory do these filters consume?
When running, the Bloom filters must be held in memory, since their whole purpose is to avoid disk IO.
However, each filter is saved to disk with the other files that make up each SSTable - see http://wiki.apache.org/cassandra/ArchitectureSSTable
The filters are typically a very small fraction of the data size, though the actual ratio seems to vary quite a bit. On the test node I have handy here, the biggest filter I can find is 3.3MB, which is for 1GB of data. For another 1.3GB data file, however, the filter is just 93KB...
If you are running Cassandra, you can check the size of your filters yourself by looking in the data directory for files named *-Filter.db