Cassandra key cache not working

I am using DSE 6 with the default key cache capacity of 100 MiB, but the cache never registers a hit.
[image: nodetool info output]
I checked saved_caches_directory and it contains no key cache file.
[image: cache folder contents]
My table's description has this attribute:
caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
and I have run enough read queries to populate the cache.
I don't know what the problem is.

Related

Apache Ignite Backup cache Node identification

I have started using Apache Ignite for my current project. I have set up an Ignite cluster with 3 server nodes and a backup count of 1. The Ignite client node is able to create a primary cache as well as a backup cache in the cluster. For a particular cache, I want to know which node is the primary and on which node the backup is stored. Is there any tool or Visor command that shows this, along with the size of each cache?
Thank you.
Visor CLI shows how many primary and backup partitions each node holds.
By default, a cache is split into 1024 partitions. You can change that by configuring the affinity function.
You can also use control.sh to inspect the partition distribution of a specific cache.
--cache distribution nodeId|null [cacheName1,...,cacheNameN] [--user-attributes attrName1,...,attrNameN]
This command prints the partition distribution across nodes.
Sample:
./control.sh --cache distribution null myCache
[groupId,partition,nodeId,primary,state,updateCounter,partitionSize,nodeAddresses]
[next group: id=1482644790, name=myCache]
1482644790,0,e27ad549,P,OWNING,0,0,[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 172.23.45.97, 172.25.4.211]
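To answer "which node holds what" for a whole cache, the rows printed by control.sh can be tallied per node. The following is a minimal sketch that assumes the eight-column layout shown in the sample header. The helper names are hypothetical, the second data row is invented for illustration, and treating any value other than P in the primary column as a backup is an assumption.

```python
from collections import defaultdict

# First data row is taken from the sample output above.
# The second row is a made-up backup copy on a hypothetical node.
SAMPLE = """\
[groupId,partition,nodeId,primary,state,updateCounter,partitionSize,nodeAddresses]
[next group: id=1482644790, name=myCache]
1482644790,0,e27ad549,P,OWNING,0,0,[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 172.23.45.97, 172.25.4.211]
1482644790,0,aaaa1111,B,OWNING,0,0,[127.0.0.1]
"""


def parse_distribution(text):
    """Turn the data rows of the distribution output into dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header and "next group" lines start with '['; skip them.
        if not line or line.startswith("["):
            continue
        # Split only the first seven commas: the address list contains commas too.
        parts = line.split(",", 7)
        if len(parts) != 8:
            continue
        group_id, partition, node_id, role, state, counter, size, addresses = parts
        rows.append({
            "group_id": int(group_id),
            "partition": int(partition),
            "node_id": node_id,
            "role": role,
            "state": state,
            "update_counter": int(counter),
            "partition_size": int(size),
            "addresses": addresses.strip("[]").split(", "),
        })
    return rows


def summarize(rows):
    """Count primary and backup partitions and total entries per node."""
    summary = defaultdict(lambda: {"primary": 0, "backup": 0, "size": 0})
    for row in rows:
        # Assumption: "P" marks a primary copy, anything else is a backup.
        kind = "primary" if row["role"] == "P" else "backup"
        summary[row["node_id"]][kind] += 1
        summary[row["node_id"]]["size"] += row["partition_size"]
    return dict(summary)


if __name__ == "__main__":
    for node, counts in summarize(parse_distribution(SAMPLE)).items():
        print(node, counts)
```

Feeding it the real output of `./control.sh --cache distribution null myCache` instead of SAMPLE gives one summary line per node.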

How exactly does the Cassandra read procedure work?

I have a little experience with Cassandra, but I have one question about the Cassandra read process.
Suppose we have 7 SSTables for a given table in our Cassandra database. If we run a read query whose data is not in the memtable, Cassandra will look into the SSTables. My question is:
During this process, will Cassandra load all 7 SSTables into the memtable, or will it just look into all the SSTables and load only the relevant rows into the memtable?
Thanks in advance!
Please correct me if I have misunderstood something.
It would also be great if someone could explain or point to better resources on how SSTables work.
"During this process will Cassandra load all the SSTables (7)?"
No. Cassandra does not load all 7 SSTables. Each SSTable has an in-memory Bloom filter that indicates whether the SSTable might contain the requested data.
If the Bloom filter indicates that the SSTable might contain the data, Cassandra checks the partition key cache.
If the key is found in the partition key cache, Cassandra goes to the compression offset map (in-memory) to locate the compressed block that holds the data, and that block is read (I/O).
If the key is not found, Cassandra uses the partition summary to get the location of the index entry, reads that location (I/O) into memory, and then continues with the compression offset map flow described above.
To start with, I think this Cassandra Reads link should help; it depicts the flow pictorially. The read path from that link was captured here for quick reference.
One more thing: there is also a row cache that holds hot (frequently accessed) rows. If a row is found in the row cache, the read does not hit or load the SSTable at all.
Go through this rowcache link to understand the row cache and the partition key cache.
Another great presentation, shared by Jeff Jirsa, is Understanding Cassandra Table Options. It is really worth going through.
On a different note, compaction runs periodically to reduce the number of SSTables and to delete rows based on tombstones.
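The read path described in this answer can be sketched as a toy model. This is not Cassandra code: the class and function names are hypothetical, a single dictionary stands in for the partition summary plus partition index, a list stands in for the data file, and the row cache and compression offset map are left out. It only illustrates that an SSTable is skipped when its Bloom filter says no, and that a key cache hit avoids the index lookup.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable table: Bloom filter in memory, index and data 'on disk'."""

    def __init__(self, name, rows):
        self.name = name
        self.bloom = BloomFilter()
        self.index = {}    # partition key -> offset (summary + index stand-in)
        self.blocks = []   # data file stand-in
        for key, row in rows.items():
            self.bloom.add(key)
            self.index[key] = len(self.blocks)
            self.blocks.append(row)


def read(key, sstables, key_cache, stats):
    """Read one partition, touching only SSTables that may contain it."""
    result = {}
    for table in sstables:  # oldest first, so newer values overwrite older ones
        if not table.bloom.might_contain(key):
            stats["bloom_skips"] += 1
            continue
        offset = key_cache.get((table.name, key))
        if offset is None:
            stats["index_lookups"] += 1      # would cost I/O in a real system
            offset = table.index.get(key)
            if offset is None:               # Bloom filter false positive
                continue
            key_cache[(table.name, key)] = offset
        else:
            stats["key_cache_hits"] += 1
        stats["data_reads"] += 1
        result.update(table.blocks[offset])
    return result


def new_stats():
    return {"bloom_skips": 0, "index_lookups": 0, "key_cache_hits": 0, "data_reads": 0}


if __name__ == "__main__":
    tables = [SSTable(f"sstable-{i}", {f"other-{i}": {"x": i}}) for i in range(5)]
    tables.append(SSTable("sstable-5", {"user1": {"name": "Ann"}}))
    tables.append(SSTable("sstable-6", {"user1": {"city": "Oslo"}}))
    cache, stats = {}, new_stats()
    print(read("user1", tables, cache, stats), stats)
    print(read("user1", tables, cache, stats), stats)
```

In this model only the two SSTables that actually hold the key are read, and the second read of the same key is served through the key cache without another index lookup.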

Why doesn't the Spark UI show memory usage for the Spark SQL LRU cache?

When I run a SQL query, spark-sql uses an LRU cache.
Why isn't the LRU cache usage reflected in Spark's web UI? Judging by the response time, I believe my queries are cached, but "Memory Used" says "0.0 B / 707.0 MB".
Spark version is 1.3.1
Spark does show the caching status.
It's available via Spark application UI on the "Storage" tab.
It will show the storage-level (cache type), number of cached partitions, size in memory & size on disk.
You didn't specify how you use the Spark caching mechanism.
Spark caching has to be enabled explicitly.
You can check here how to enable/disable caching for Spark tables.
You can also enable the cache for RDDs/DataFrames using:
rdd.cache() / df.cache()
rdd.persist(StorageLevel newLevel) / df.persist(StorageLevel newLevel)

Cloudera Impala performance test - Empty cache

I am trying to run a performance test on a Cloudera Hadoop cluster. Since Impala seems to use a cache to store previous queries, how can I empty that cache?
Does Impala use caching?
Impala does not cache data but it does cache some table and file metadata. Although queries might run faster on subsequent iterations because the data set was cached in the OS buffer cache, Impala does not explicitly control this.
Quoted from : http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_faq.html#faq_performance_unique_1__faq_caching_unique_1
The file metadata caching is different from "query caching". It is just caching the locations of files and blocks in HDFS, which is something that most databases already know but Impala may not because it gets table/file metadata from Hive. The file metadata should be available to Impala in your tests.
Impala never caches queries, but file data may be cached in one of two ways:
1) You've enabled HDFS caching. I assume you're not doing this.
2) Some data read by HDFS may be in the OS buffer cache. Impala has no control over this. Some googling turns up guidance about clearing the Linux buffer caches, e.g. this unix.stackexchange.com answer.

How does the Cassandra key cache work when the same key exists in more than one SSTable?

1) As per DataStax, the key cache stores the primary key index for a row key.
2) In our case we have enough memory allocated for the key cache, and the same key is present in multiple SSTables with different columns.
3) If a number of calls access this same key across multiple SSTables, how are the index entries stored in the key cache? Will it store entries for all the SSTables, or just for the last SSTable from which the key was accessed?
From the docs:
"The key cache holds the location of keys in memory on a per-column family basis."
The key cache serves as an index for a key in every SSTable in which it is present.
The key cache is maintained per SSTable, so it can save at least one disk seek per SSTable. Every key lookup hits at least the bloom filter of every SSTable. When the bloom filter reports a possible match, the key cache is checked so that the SSTable index lookup (the index holds pointers to a sample of keys, one per interval of 128 by default) can be skipped.
The read path of Cassandra goes like this:
Memtable -> Row Cache (Off heap) -> Bloom filter -> Key cache -> SSTable Index [if miss] -> Disk
The stages up to and including the key cache are maintained in memory (either on heap or off heap), so they do not add disk seeks.
Every SSTable should be maintaining its own key cache. Source from slide no 101 and Source2 from slide no 23.
In case of a key cache miss, the SSTable index is used; it indicates which sampled range of 128 keys the key may lie in. The disk seek for the key starts from there [can be 1 to many].
I'll update the answer again if I find out how Cassandra decides on the key cache size for each SSTable; it may be [key_cache_conf/no_of_sstables].
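To make the "one entry per SSTable" point concrete, here is a minimal sketch of a key cache whose entries are keyed by the pair (SSTable, partition key). It assumes a single bounded cache with LRU eviction shared by all SSTables, rather than a fixed slice per SSTable. That assumption, the KeyCache class and the offsets are illustrative only and are not taken from the Cassandra source.

```python
from collections import OrderedDict


class KeyCache:
    """Bounded LRU cache mapping (sstable, partition key) -> index offset."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, sstable, key):
        entry = (sstable, key)
        if entry not in self.entries:
            return None
        self.entries.move_to_end(entry)  # mark as most recently used
        return self.entries[entry]

    def put(self, sstable, key, offset):
        entry = (sstable, key)
        self.entries[entry] = offset
        self.entries.move_to_end(entry)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    cache = KeyCache(capacity=3)
    # The same partition key lives in three SSTables: three separate entries.
    for name, offset in [("sstable-1", 10), ("sstable-2", 20), ("sstable-3", 30)]:
        cache.put(name, "user1", offset)
    print(list(cache.entries))
```

Under this model, a key that lives in several SSTables occupies one entry per SSTable, and entries for recently read SSTables survive eviction regardless of which SSTable was read last.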
