Updates/Deletes in LevelDB (key-value store)

The sorted SSTable data structure that LevelDB uses is immutable. If so, how are records updated and deleted in LevelDB? Does this happen periodically?

http://leveldb.googlecode.com/svn/trunk/doc/impl.html is short and covers these questions. See the compaction section especially.
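In short: SSTables are never modified in place. An update is simply a newer record for the same key, and a delete writes a tombstone marker; reads consult the newest data first, and compaction periodically merges SSTables, dropping shadowed versions and tombstones. A rough Python sketch of the idea (conceptual only, not LevelDB's actual code):

    TOMBSTONE = object()  # marker written in place of a value on delete

    def put(memtable, key, value):
        memtable[key] = value        # newest version shadows older SSTable entries

    def delete(memtable, key):
        memtable[key] = TOMBSTONE    # a delete is just another write

    def get(key, memtable, sstables):
        # Search newest to oldest; the first hit wins.
        for table in [memtable] + sstables:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(sstables):
        # Merge immutable SSTables, keeping only the newest version of each key.
        # Tombstones can be dropped once no older level may still hold the key.
        merged = {}
        for table in reversed(sstables):   # oldest first, newer entries overwrite
            merged.update(table)
        return {k: v for k, v in merged.items() if v is not TOMBSTONE}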

Related

Spark RDD - are partitions always in RAM?

We all know Spark does its computation in memory. I am just curious about the following.
If I create 10 RDDs in my PySpark shell from HDFS, does it mean all of these 10 RDDs' data will reside in the Spark workers' memory?
If I do not delete an RDD, will it stay in memory forever?
If my dataset (file) size exceeds the available RAM, where will the data be stored?
If I create 10 RDDs in my PySpark shell from HDFS, does it mean all of these 10 RDDs' data will reside in Spark memory?
Yes, the data of all 10 RDDs will be spread across the Spark workers' RAM, but not every machine necessarily holds a partition of each RDD. Of course, an RDD only has data in memory once an action is performed on it, since RDDs are lazily evaluated.
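A small illustration of that laziness in the PySpark shell (where sc is predefined; the HDFS path is made up):

    # Nothing is read from HDFS here: these lines only build the lineage graph.
    rdd = sc.textFile("hdfs:///data/events.log")
    errors = rdd.filter(lambda line: "ERROR" in line)

    # Only this action makes Spark read the file, run the filter, and spread
    # the resulting partitions across the workers' memory.
    print(errors.count())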
If I do not delete an RDD, will it stay in memory forever?
Spark automatically unpersists an RDD or DataFrame when it is no longer used. To check whether an RDD or DataFrame is cached, open the Spark UI -> Storage tab and look at the memory details. You can use df.unpersist() or sqlContext.uncacheTable("sparktable") to remove the DataFrame or table from memory.
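For example, in the PySpark shell (the path and table name are just placeholders; spark.catalog is the newer equivalent of the sqlContext calls above):

    # Cache a DataFrame, use it, then explicitly free the memory.
    df = spark.read.parquet("hdfs:///data/users.parquet")
    df.cache()                                # marked for caching
    df.count()                                # first action fills the cache

    df.createOrReplaceTempView("sparktable")
    spark.catalog.cacheTable("sparktable")    # cache the registered table as well

    df.unpersist()                            # drop the cached DataFrame
    spark.catalog.uncacheTable("sparktable")  # drop the cached table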
If my dataset size exceeds the available RAM, where will the data be stored?
If the RDD does not fit in memory, some partitions will not be cached and will instead be recomputed on the fly each time they are needed.
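If that recomputation is too expensive, you can ask Spark to spill to local disk instead by choosing a storage level, for example (the path is illustrative):

    from pyspark import StorageLevel

    rdd = sc.textFile("hdfs:///data/big.log")
    # Partitions that do not fit in RAM are written to local disk instead of
    # being recomputed from the lineage graph on every access.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count()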
If we are saying the RDD is already in RAM, meaning it is in memory, why do we need persist()? (as asked in a comment)
To answer your question: when an action is triggered on an RDD and there is not enough memory for it, Spark can evict uncached/unpersisted RDDs to make room.
In general, we persist RDDs that need a lot of computation and/or shuffling (by default Spark persists shuffled RDDs to avoid costly network I/O), so that when an action is performed on a persisted RDD, only that action is executed rather than recomputing everything from the start of the lineage graph. Check the RDD persistence levels here.
If I create 10 RDDs in my PySpark shell, does it mean all of these 10 RDDs' data will reside in Spark memory?
Answer: An RDD only contains the "lineage graph" (the applied transformations), so an RDD is not data. Whenever we perform an action on an RDD, all the transformations are applied before the action. So, unless it is explicitly cached (there are also some optimisations that cache implicitly), each time an action is performed the whole chain of transformations plus the action is executed again.
E.g. if you create an RDD from HDFS, apply some transformations and perform 2 actions on the transformed RDD, the HDFS read and the transformations will be executed twice, as sketched below.
So, if you want to avoid the re-computation, you have to persist the RDD. For persisting you can choose any combination of on-heap memory, off-heap memory, and disk.
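A sketch of that double execution in the PySpark shell (the path is made up):

    rdd = sc.textFile("hdfs:///data/events.log")
    parsed = rdd.map(lambda line: line.split(","))

    # Without persist, both actions re-read HDFS and re-run the map:
    parsed.count()
    parsed.first()

    # With persist, the second action reuses the cached partitions:
    parsed.persist()        # default is MEMORY_ONLY; other levels add disk/off-heap
    parsed.count()          # reads HDFS, runs the map, fills the cache
    parsed.first()          # served from the cache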
If I do not delete an RDD, will it stay in memory forever?
Answer: Since an RDD is just a "lineage graph", it follows the same scope and lifetime rules as the hosting language. But if you have already persisted the computed result, you can unpersist it.
If my dataset size exceeds the available RAM, where will the data be stored?
Answer: Assuming you have actually persisted/cached the RDD in memory, it will be stored in memory, and LRU is used to evict data. See the Spark documentation for more information on how memory management is done in Spark.

Spark: how long does it keep an RDD in cache?

For example, I cached a number of RDDs in memory.
Then I leave the application for a few days or more.
And then I try to access cached RDDs.
Will they still be in memory?
Or will Spark clean up unused cached RDDs after some period of time?
Please help!
Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.

Fraction cached larger than 100%

I have the following Spark job, and some RDDs have a "Fraction Cached" of more than 100%. How can this be possible? What did I miss? Thanks!
I believe this is because you can have the same partition cached in multiple locations. See SPARK-4049 for more details.
EDIT:
I'm wondering if maybe you have speculative execution (see spark.speculation) set? If you have straggling tasks, they will be relaunched, which I believe will duplicate a partition. Also, another useful thing to do might be to call rdd.toDebugString, which will provide lots of info on an RDD, including its transformation history and the number of cached partitions.
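For example, in the PySpark shell (the RDD is just for illustration):

    rdd = sc.textFile("hdfs:///data/events.log").map(lambda line: line.upper())
    rdd.cache()
    rdd.count()

    # Shows the lineage, the storage level, and how many partitions are cached.
    debug = rdd.toDebugString()
    print(debug.decode("utf-8") if isinstance(debug, bytes) else debug)

    # Check whether speculative execution is enabled (relaunched straggler tasks
    # can duplicate cached partitions and push "Fraction Cached" above 100%).
    print(spark.conf.get("spark.speculation", "false"))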

How consistency works in HBase

From the CAP theorem, I read that HBase supports consistency and partition tolerance.
I would like to know how consistency is achieved in HBase. Are any locks applied?
I checked online but didn't find good material on this.
Could anybody provide blogs/articles on this topic?
Access to row data is atomic and includes any number of columns being read or written to. There is no further guarantee or transactional feature that spans multiple rows or across tables. This atomic access is a contributing factor to the architecture being strictly consistent, as each concurrent reader and writer can make safe assumptions about the state of a row.
When data is updated, it is first written to a commit log, called a write-ahead log (WAL) in HBase, and then stored in the in-memory memstore (sorted by row key). Once the data in memory exceeds a given maximum size, it is flushed as an HFile to disk. After the flush, the commit logs can be discarded up to the last unflushed modification.
Thus a lock is needed only to protect the row in RAM.
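A conceptual Python sketch of that write path (not HBase's actual code; the flush threshold is arbitrary):

    import threading

    class Store:
        """WAL + memstore + flushed HFiles for one column family (illustrative)."""

        def __init__(self, flush_threshold=128 * 1024 * 1024):
            self.wal = []                      # append-only commit log
            self.memstore = {}                 # in-memory cells, sorted on flush
            self.hfiles = []                   # immutable flushed files
            self.flush_threshold = flush_threshold
            self.size = 0
            self.lock = threading.Lock()       # only the in-RAM row state needs locking

        def put(self, row, column, value):
            with self.lock:                    # atomic per-row mutation
                self.wal.append((row, column, value))          # durability first
                self.memstore.setdefault(row, {})[column] = value
                self.size += len(row) + len(column) + len(value)
                needs_flush = self.size >= self.flush_threshold
            if needs_flush:
                self.flush()

        def flush(self):
            with self.lock:
                # Write the memstore out as one immutable, row-key-sorted HFile;
                # the WAL entries covered by the flush can then be discarded.
                self.hfiles.append(dict(sorted(self.memstore.items())))
                self.memstore.clear()
                self.wal.clear()
                self.size = 0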
The answer provided by Evgeny is correct but very incomplete.
Contrary to what you wrote, there are many resources, blog articles, and other good material on this specific aspect. The tricky part is to aggregate the separate pieces of information and make your own synthesis.
Consistency is dealt with in HBase at many levels, and you need to understand those different levels to get a good global understanding of how it is managed.
HBase is a complex beast; give it time.
You can start by reading about Read/Write Path, Timeline-consistent High Available Reads, and Region Replication.
https://hbase.apache.org/book.html#arch.timelineconsistent.reads
https://mapr.com/blog/in-depth-look-hbase-architecture/

What is HBase compaction-queue-size at all?

Does anyone know what the RegionServer compaction queue size actually means?
By the docs' definition:
9.2.5. hbase.regionserver.compactionQueueSize: Size of the compaction queue. This is the number of stores in the region that have been targeted for compaction.
So it is the number of Stores (or store files? I have heard two versions of this) in a RegionServer that need to be major compacted.
I have a job writing data in a hotspotting style, using sequential keys (not distributed),
and in the metric history I saw that at one point the compaction-queue-size reached 4.
That seems theoretically impossible, since I have only one Store to write to (sequential keys) at any time.
Then I dug into the logs and found no hint of the queue size ever being > 0.
Every major compaction says "This selection was in queue for 0sec":
2013-11-26 12:28:00,778 INFO [regionserver60020-smallCompactions-1385440028938] regionserver.HStore: Completed major compaction of 3 file(s) in f1 of myTable.key.md5.... into md5....(size=607.8 M), total size for store is 645.8 M. This selection was in queue for 0sec, and took 39sec to execute.
Even more confusing: wasn't multi-threaded compaction enabled in an earlier version, with each compaction job allocated to its own thread? If so, why does a compaction queue exist at all?
Too bad there is no detailed explanation in the HBase docs.
I don't fully understand your question. But let me attempt to answer it to the best of my abilities.
First, let's talk about some HBase terminology (source):
Table (HBase table)
Region (Regions for the table)
Store (Store per ColumnFamily for each Region for the table)
MemStore (MemStore for each Store for each Region for the table)
StoreFile (StoreFiles for each Store for each Region for the table)
Block (Blocks within a StoreFile within a Store for each Region for the table)
A Region in HBase is defined as the rows between two row keys. If you have more than one ColumnFamily in your Table, you will get one Store per ColumnFamily per Region. Every Store has a MemStore and 0 or more StoreFiles.
StoreFiles are created when the MemStore is flushed. Every so often, a background thread triggers a compaction to keep the number of files in check. There are two types of compaction: major and minor. When a Store is targeted for a minor compaction, it picks up some adjacent StoreFiles and rewrites them as one. A minor compaction does not remove deleted/expired data. If a minor compaction picks up all StoreFiles in a Store, it is promoted to a major compaction. In a major compaction, all StoreFiles of a Store are rewritten as one StoreFile.
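A rough sketch of that promotion rule (illustrative only, not HBase's real selection algorithm; StoreFiles are modelled as dicts ordered oldest to newest, and None marks a deleted cell):

    def compact(store_files, max_files=10):
        """Rewrite up to max_files adjacent StoreFiles as one."""
        selected = store_files[:max_files]
        is_major = len(selected) == len(store_files)   # picked them all -> major

        merged = {}
        for f in selected:                 # newer files overwrite older entries
            merged.update(f)

        if is_major:
            # Only a major compaction may drop deleted/expired cells for good.
            merged = {k: v for k, v in merged.items() if v is not None}

        return [merged] + store_files[max_files:]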
Ok... so what is a Compaction Queue?
It is the number of Stores in a RegionServer that have been targeted for compaction. Similarly a Flush Queue is the number of MemStores that are awaiting flush.
As to the question of why there is a queue when you can do it asynchronously, I have no idea. This would be a great question to ask on the HBase mailing list. It tends to have faster response times.
EDIT: The compaction queue exists so that compactions do not take up 100% of a RegionServer's resources.
