Will an in-memory only Aerospike cluster composed of n nodes automatically replicate data across nodes, and in this case, is there a guarantee that no data will be written to disk?
Will an in-memory only Aerospike cluster composed of n nodes automatically replicate data across nodes?
Yes, assuming you are talking about storage-engine memory and not storage-engine device with data-in-memory true.
And in this case, is there a guarantee that no data will be written to disk?
Your records will not be written to disk. Only logs and SMD (system metadata) are written to disk.
Aerospike uses its Smart Partitions algorithm, based on RIPEMD-160 hashing, to distribute data evenly across the cluster. With replication-factor set to 2 or more, every partition (and therefore every record) is automatically replicated to that many nodes.
Aerospike only expects to be given a storage device or file when a namespace is configured for persistence. A namespace configured for in-memory storage has no device or file to write to, which essentially means record data is never persisted to disk. For example, an in-memory-only namespace can be configured like this:
namespace testreplication {
    memory-size 4G             # 4 GB of memory to be used for index and data
    replication-factor 2       # for multiple nodes, keep 2 copies of the data
    high-water-memory-pct 60   # evict non-zero-TTL data if usage exceeds 60% of 4 GB
    stop-writes-pct 90         # stop writes if usage exceeds 90% of 4 GB
    default-ttl 0              # writes from clients that do not provide a TTL default to 0, i.e. never expire
    storage-engine memory      # store data in memory only
}
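From the client's point of view nothing changes for an in-memory namespace; replication to replication-factor nodes is handled entirely server-side. Here is a minimal sketch with the Aerospike Python client (the seed host and record contents are made-up values):

import aerospike

# connect to any node in the cluster; the client discovers the rest
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# (namespace, set, primary key) for the in-memory namespace above
key = ('testreplication', 'demo', 'user1')
client.put(key, {'name': 'Alice', 'visits': 1})   # with replication-factor 2, master and replica copies land in RAM on two nodes

_, meta, bins = client.get(key)                   # reads are served from memory
print(meta, bins)

client.close()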
Related
We have a 10-node HDFS cluster (Hadoop 2.6, Cloudera 5.8): 4 nodes have 10 TB disks and 6 nodes have 3 TB disks. The disks on the smaller nodes are constantly filling up, while plenty of space remains free on the larger nodes.
I am trying to understand how the NameNode writes data/blocks to nodes with different disk sizes: is the data divided equally, or does each node get some percentage?
You should look at dfs.datanode.fsdataset.volume.choosing.policy. By default this is set to round-robin, but since you have an asymmetric disk setup you should change it to available space.
You can also fine-tune disk usage with the other two volume-choosing properties; see the example settings after the link below.
For more information see:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/admin_dn_storage_balancing.html
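As a sketch, these properties go in hdfs-site.xml and control how each DataNode chooses among its own data volumes; the threshold and preference-fraction values shown are the Hadoop defaults and only illustrative:

<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- volumes whose free space differs by less than this many bytes are treated as balanced -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<!-- fraction of new block allocations sent to the volumes with more available space -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>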
I'm reading the GFS paper but am unable to understand one point: does the master maintain 64 KB of metadata for each replica of a file too? Say the master's memory is 8 GB and I store 1000 files of 1 KB each with a replication factor of 3; how much memory is that going to take?
GFS maintains less than 64 bytes of metadata for each 64 MB chunk, not for each individual file. Each replica costs the same small metadata overhead. Therefore, how much memory 1000 files take depends on how many chunks those files occupy in total.
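A rough back-of-the-envelope for the numbers in the question (assuming each 1 KB file occupies its own chunk, since chunks are not shared between files): 1000 files means 1000 chunks, so chunk metadata on the master is on the order of 1000 x 64 B, i.e. about 64 KB, plus the per-file namespace entries, which the paper says are typically under 64 bytes per file thanks to prefix compression. The three replica locations per chunk add only a few bytes each and are refreshed from chunkserver HeartBeats rather than stored persistently, so an 8 GB master is nowhere near its limit here.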
No. The main metadata for each replica lives only in the chunkservers' memory. The master stores only two types of chunk metadata:
the chunk handle, which accounts for less than 64 bytes of metadata per 64 MB chunk, and
the locations of each chunk's replicas, which are maintained via HeartBeat messages between the chunkservers and the master.
Here are the details from the paper:
The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas. All metadata is kept in the master's memory.
The master does not keep a persistent record of which chunkservers have a replica of a given chunk. It simply polls chunkservers for that information at startup. The master can keep itself up-to-date thereafter because it controls all chunk placement and monitors chunkserver status with regular HeartBeat messages.
The most important sentence:
a chunkserver has the final word over what chunks it does or does not have on its own disks.
How does the behavior of the MEMORY_ONLY and MEMORY_AND_DISK caching levels in Spark differ?
As explained in the documentation, Persistence levels in terms of efficiency:
Level Space used CPU time In memory On disk Serialized
-------------------------------------------------------------------------
MEMORY_ONLY High Low Y N N
MEMORY_ONLY_SER Low High Y N Y
MEMORY_AND_DISK High Medium Some Some Some
MEMORY_AND_DISK_SER Low High Some Some Y
DISK_ONLY Low High N Y Y
MEMORY_AND_DISK and MEMORY_AND_DISK_SER spill to disk if there is too much data to fit in memory.
The documentation says:
MEMORY_ONLY
Store RDD as deserialized Java objects in the JVM. If the RDD does not
fit in memory, some partitions will not be cached and will be
recomputed on the fly each time they're needed. This is the default
level.
MEMORY_AND_DISK
Store RDD as deserialized Java objects in the JVM. If the RDD does not
fit in memory, store the partitions that don't fit on disk, and read
them from there when they're needed.
MEMORY_ONLY_SER
Store RDD as serialized Java objects (one byte array per partition).
This is generally more space-efficient than deserialized objects,
especially when using a fast serializer, but more CPU-intensive to
read.
MEMORY_AND_DISK_SER
Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in
memory to disk instead of recomputing them on the fly each time
they're needed.
DISK_ONLY
Store the RDD partitions only on disk.
MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
Same as the levels above, but replicate each partition on two cluster
nodes.
OFF_HEAP (experimental)
Store RDD in serialized format in Tachyon. Compared to
MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and
allows executors to be smaller and to share a pool of memory, making
it attractive in environments with large heaps or multiple concurrent
applications. Furthermore, as the RDDs reside in Tachyon, the crash of
an executor does not lead to losing the in-memory cache. In this mode,
the memory in Tachyon is discardable. Thus, Tachyon does not attempt
to reconstruct a block that it evicts from memory.
In short: with MEMORY_ONLY, Spark always tries to keep partitions in memory only. If some partitions cannot be kept in memory, or are dropped from RAM when a node is lost, Spark recomputes them using lineage information. With MEMORY_AND_DISK, Spark keeps every computed partition cached: it prefers RAM, but partitions that do not fit are spilled to disk and read back from there.
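The difference is easy to see from the caller's side: the only thing that changes is the StorageLevel passed to persist(). A small PySpark sketch (the RDD contents and app name are made up):

from pyspark import SparkContext
from pyspark.storagelevel import StorageLevel

sc = SparkContext(appName="storage-level-demo")

def build_rdd():
    # some made-up work whose result we want to reuse
    return sc.parallelize(range(1000000)).map(lambda x: (x % 100, x * x))

# MEMORY_ONLY: partitions that do not fit in RAM are simply not cached
# and are recomputed from lineage each time they are needed.
mem_only = build_rdd().persist(StorageLevel.MEMORY_ONLY)

# MEMORY_AND_DISK: partitions that do not fit in RAM are spilled to disk
# and read back from there instead of being recomputed.
mem_and_disk = build_rdd().persist(StorageLevel.MEMORY_AND_DISK)

print(mem_only.count(), mem_and_disk.count())

sc.stop()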
I have been doing some reading on real-time processing with Hadoop and stumbled upon this http://www.scaleoutsoftware.com/hserver/
From what the documentation says, it looks like they implemented an in-memory data grid using the Hadoop worker/slave nodes. I have a couple of questions here:
From my understanding, if I have data of size 100 GB, I would need at least 100 GB of RAM across all nodes in my cluster just for the data, plus additional RAM for the TaskTracker and DataNode daemons, plus additional RAM for the hServer service that would run on all these nodes. Is my understanding correct?
The software claims it can do real-time data processing by improving the latency issues in Hadoop. Is that because it allows us to write data to the in-memory grid instead of HDFS?
I am new to Big Data technologies. Apologies if some of the questions are naive.
[Full disclosure: I work at ScaleOut Software, the company which created ScaleOut hServer.]
In-memory data grids create a replica for every object to ensure high availability in case of failures. The aggregate amount of memory required is the memory used to store the objects plus the memory used to store the object replicas. In your example, you will need 200 GB of total memory: 100 GB for objects and 100 GB for replicas. For example, in a four-server cluster, each server needs 50 GB of memory available to the ScaleOut hServer service.
With the current release, ScaleOut hServer takes the first step in enabling real-time analytics by speeding up data access. It does this in two ways, which are implemented using different input/output formats. The first mode of operation uses the grid as a cache for HDFS, and the second uses the grid as the primary storage for a data set, providing support for fast-changing, memory-based data. Accessing data using an in-memory data grid reduces latency by eliminating disk I/O and minimizing network overhead. Also, caching HDFS data provides an additional performance boost by storing keys and values generated by the record reader instead of raw HDFS files in the grid.
I need some help improving Cassandra read performance. I am concerned about degradation of read performance as the size of the column family increases. We have the following stats on single-node Cassandra.
Operating System: Linux - CentOS release 5.4 (Final)
Cassandra version: apache-cassandra-1.1.0
Java version: "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
Cassandra Configuration: (cassandra.yaml)
rpc_server_type: hsha
disk_access_mode: mmap
concurrent_reads: 64
concurrent_writes: 32
Platform: Amazon-ec2/Rightscale m1.Xlarge instance with 4 ephemeral disks with raid0. (15 GB Total Memory, 4 Virtual Cores, 2 ECU , Total ECU = 8)
Experiment configurations:
I have tried to do some experiments with GC
Cassandra config:
10 GB RAM is allocated to Cassandra Heap, 3500MB is Heap NEW size.
JVM Config:
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=1000"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=0"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=40"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedOops"
Result stats from OpsCenter community 2.0:
Read Requests 208 to 240 per second
Write Requests 18 to 28 per second
OS Load 24.5 to 25.85
Write Request Latency 127 to 160 micros
Read Request Latency 82202 to 94612 micros
OS Sent Network Traffic 44646 KB avg per second
OS Received Network Traffic 4338 KB avg per second
OS Disk Queue Size 13 to 15 requests
Read Requests Pending 25 to 32
OS Disk latency 48 to 56 ms
OS Disk Read Throughput 4.6 Mb per second
Disk IOPs Reads 420 per second
IOWait 80 % CPU avg
Idle 13 % CPU avg
Rowcache is disabled.
The Column Family
The column family I am only reading from was created through the CLI:
create column family XColFam
with column_type = 'Standard'
and comparator = 'CompositeType(BytesType,IntegerType)';
Column family SSTable Size = 7.10 GB, SSTable Count = 2
The XColFam column family has an estimated 59,499,904 row keys (most are UTF-8 literals of varying length, estimated through mx4jtools). The rows are thin in nature, with column values of 0 bytes.
Most rows should have a very small number of columns, maybe 1 to 10. The first component of the composite column name is roughly 20 to 30 bytes and the second is an 8-byte integer. The second component is dynamic and could repeat, though the probability is low; the first component repeats in many varieties, and the number of columns per row can differ.
I have tried Snappy compression on the column family, but there was no change in size.
I have a scheduled service that runs for hours with 20 threads, making random read requests for multiple keys (for now it is 2 keys per request) against this column family and reading full rows, with no column slices.
I think it is not performing well now because it processes too few requests per minute. It worked better before, when the column family was not that big, around 3 to 4 GB.
I am afraid read performance will degrade too quickly as the column family grows.
I have also tried to tweak some GC and memory settings, because before that I was seeing heavy GC activity and CPU usage; when the data size was smaller there was very little iowait, appearing in a wave pattern.
How can I increase Cassandra's read performance? Your suggestions will be appreciated.
Cassandra is relatively I/O dependent, and EC2 instances have insufficient I/O by design (Xen virtualization).
My first recommendation is to use Cassandra on real hardware, where you have control; for example, you can use an SSD for the commit log. Look at the Cassandra hardware recommendations.
However, switching to your own hardware is a rather radical option. To stay with Amazon, try EBS:
Amazon Elastic Block Store (EBS) provides block level storage volumes
for use with Amazon EC2 instances. Amazon EBS volumes are
network-attached, and persist independently from the life of an
instance. Amazon EBS provides highly available, highly reliable,
predictable storage volumes that can be attached to a running Amazon
EC2 instance and exposed as a device within the instance. Amazon EBS
is particularly suited for applications that require a database, file
system, or access to raw block level storage.
Amazon EBS allows you to create storage volumes from 1 GB to 1 TB that can be mounted as devices by Amazon EC2 instances. Multiple volumes can be mounted to the same instance. Amazon EBS enables you to provision a specific level of I/O performance if desired, by choosing a Provisioned IOPS volume. This allows you to predictably scale to thousands of IOPS per Amazon EC2 instance.
Also check out Cassandra Performance Testing on EC2
Short Answer: Row Cache and Key Caches.
If your data contains subsets that are frequently read, as in most systems, try using the row cache and key cache.
The row cache is an in-memory cache that stores frequently read rows completely in memory. Keep in mind that it may not have the desired effect if your reads are spread over many different rows.
The key cache is generally the better fit, as it only stores partition keys and their offsets on disk. This helps Cassandra skip a lookup (no need to consult the partition indexes and partition summaries).
Try enabling the key cache for the keyspace and table and check your performance again.
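As a sketch for the column family in the question (Cassandra 1.1-era syntax; the cache size below is an assumption, not a recommendation), the per-column-family caching mode is set through the CLI and the global cache size in cassandra.yaml:

update column family XColFam with caching = 'keys_only';

# cassandra.yaml (caches are global in 1.1; tune the size to your heap)
key_cache_size_in_mb: 512
row_cache_size_in_mb: 0

Afterwards, watch the key cache hit rate (for example in nodetool info or OpsCenter) to confirm that index lookups are being served from the cache.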