Aerospike HDD/Memory usage

I'm exploring Aerospike as a key-value DB, storing data on disk for safety. Please confirm that I understand this correctly:
If in namespace configuration I set:
storage-engine device
memory-size 4G
file /opt/aerospike/data/namespace.dat
filesize 16G
data-in-memory false
-> all data will be on disk only, "memory-size" is for indexes only (small usage), all data will be stored in multiple 16GB files (which will be created automatically), and most importantly, every read query will trigger reading data from disk?
If in namespace configuration I set:
storage-engine device
memory-size 4G
file /opt/aerospike/data/namespace.dat
filesize 16G
data-in-memory true
-> all data will be on disk and partly in memory, "memory-size" will act like a cache and contain 4GB of the most used data, all data will be stored in multiple 16GB files (which will be created automatically), and most importantly, every read query will first check memory and, if the data is missing, read it from disk and add it to memory? Which data will be in memory - the most used or the latest created?
If in namespace configuration I set:
storage-engine memory
memory-size 4G
data-in-memory true
-> all data will be in memory only, I'm limited to 4GB of data and no more?

Aerospike doesn't shuffle data in and out of disk like first-generation NoSQL databases do, the ones that have a "cache-first" architecture. Aerospike's hybrid memory architecture is such that the primary index (metadata) is always in memory. Depending on the namespace configuration, the data is stored fully on disk or in memory. You define storage for each namespace. If it is in-memory, all the data and metadata are fully in memory. If the namespace stores its data on a few devices (/dev/sdb, /dev/sdc), the primary index (metadata) is fully in memory and the data is fully on those SSDs.
(1) is data on HDD, and the configuration is correct. If you're using an SSD you probably want to use device instead of file. One thing in your question that isn't quite true is that every read goes to disk: Aerospike will first check the post-write queue on a read.
Aerospike does block writes to optimize around the high-read / low-write performance characteristics of HDDs and SSDs. The size of the block is determined by the write-block-size config parameter (it should be 1MB for an HDD). Records are first loaded into a streaming write buffer of an equivalent size. After the buffer is flushed to a block on disk, Aerospike doesn't get rid of this in-memory copy immediately; it remains part of the post-write queue (FIFO). By default, 256 of those blocks are kept in the queue per device, or per file (you can define multiple file lines as the storage device). If your usage pattern is such that reads follow closely after writes, you'll get in-memory access instead of disk access. If your cache_read_pct metric is not in the single digits and you have DRAM to spare, you can probably benefit from raising the post-write-queue value (a maximum of 2048 blocks per device).
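Putting (1) together as a sketch (the path is the one from your config; the raised post-write-queue value is only an illustration of the tuning above, not a recommendation):

namespace demo {
    memory-size 4G
    storage-engine device {
        file /opt/aerospike/data/namespace.dat
        filesize 16G
        data-in-memory false
        write-block-size 1M
        # default is 256 blocks per file/device; raise only if cache_read_pct and spare DRAM justify it
        post-write-queue 1024
    }
}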
(2) is an in-memory namespace, persisted to disk. For both (1) and (2) you can use either file (for filesystem based storage) or device (for raw device). Both the primary index (metadata) and storage (data) are in memory for (2). All reads and writes come out of memory, and a secondary write-through goes to the persistence device.
filesize reserves the size of the persistence layer on the filesystem (if you chose to use file and not device). You can have multiple file lines, each of which will be sized from the start to the number given as filesize. memory-size is the maximum amount of memory used by the namespace. This isn't pre-reserved. Aerospike will grow and shrink in memory usage over time, with the maximum for the namespace being its memory-size.
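As a sketch of (2), the same namespace with data-in-memory true and two file lines (the file names are illustrative) gets 2 x 16G of persistence capacity while all reads are served from memory:

namespace demo {
    memory-size 4G
    storage-engine device {
        file /opt/aerospike/data/namespace-0.dat
        file /opt/aerospike/data/namespace-1.dat
        filesize 16G
        data-in-memory true
    }
}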
Take a look at What's New in 3.11, specifically the section that touches on in-memory performance improvements. Tuning partition-tree-sprigs and partition-tree-locks will likely boost the performance of your in-memory namespaces.
(3) is a purely in-memory namespace, usually intended to be a cache. The 4G limit affects things such as stop-writes-pct and high-water-memory-pct, as those are defined as percentages of that limit (see evictions, expirations, stop-writes).
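A minimal sketch of (3), with those thresholds written out (the percentages shown are the usual defaults and the TTL is a placeholder):

namespace cache {
    memory-size 4G
    high-water-memory-pct 60
    stop-writes-pct 90
    default-ttl 1d
    storage-engine memory
}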
There's also a (4) special-case for counters called data-in-index. See storage engine configuration recipes.

Related

Can chronicle-map handle data larger than memory?

I'm a bit confused by how off-heap memory works. I have a server that has 32GB of RAM, and a data set of key-value mappings about 1TB in size. I'm looking for a simple and fast embedded Java database that would allow me to map a key to a value according to this 1TB dataset, which will mostly have to be read from disk. Each entry in this data set is small (<500 bytes), so I think using a file system would be inefficient.
I'd like to use Chronicle Map for this. I read that off-heap memory usage can exceed RAM size and that it interacts with the filesystem somehow, but at the same time, Chronicle Map is described as an in-memory database. Can Chronicle Map handle the 1TB data set for my server, or am I limited to data sets of 32GB or less?
The answer is that it depends on your operating system. On Windows a Chronicle Map must fit inside main memory; however, on Linux and MacOSX it doesn't have to fit in main memory (the difference is how memory mapping is implemented on these OSes). Note: Linux even allows you to map a region larger than your disk space (MacOSX and Windows don't).
So on Linux you could map 1 TB or even 100 TB on a machine with 32 GB of memory. It is important to remember that your access pattern and your choice of drive will be critical to performance. If you generally access the same data most of the time and you have an SSD, this will perform well. If you have a spinning disk and a random access pattern, you will be limited by the speed of your drive.
Note: we have tested Chronicle Map to 2.5 billion entries and it performs well as it uses 64-bit hashing of keys.
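As a rough sketch of how that could be set up (the name, sizes, and file path are assumptions rather than a tuned configuration), a persisted Chronicle Map is created through the builder; on Linux the backing file can be far larger than the 32 GB of RAM, and the OS pages the hot parts in and out:

import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class BigMapExample {
    public static void main(String[] args) throws IOException {
        // Persisted, memory-mapped map; on Linux it may exceed physical RAM.
        ChronicleMap<CharSequence, byte[]> map = ChronicleMap
                .of(CharSequence.class, byte[].class)
                .name("one-tb-map")                  // hypothetical name
                .entries(2_000_000_000L)             // expected number of entries
                .averageKeySize(32)                  // bytes; adjust to your real keys
                .averageValueSize(400)               // entries are < 500 bytes
                .createPersistedTo(new File("/data/map.dat")); // illustrative path

        map.put("some-key", new byte[]{1, 2, 3});
        byte[] value = map.get("some-key");
        System.out.println("value length = " + (value == null ? 0 : value.length));

        map.close();
    }
}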

Ignite uses more memory than expected

I am using Ignite to build a framework for data calculation. One big problem is that memory usage is a little higher than expected: data that uses 1G of memory outside Ignite uses more than 1.5G in the Ignite cache.
I have already turned off backups and copyOnRead. I don't use the query feature, so there is no extra index space. I also accounted for the extra space used by each cache and cache entry. The total memory usage still doesn't add up.
The value of each cache entry is a big map containing lists of primitive arrays. Each entry is about 120MB.
What can be the problem? The data structure or the configuration?
Ignite does introduce some overhead to your data, and half a GB doesn't sound too bad to me. I would recommend referring to this guide for more details: https://apacheignite.readme.io/docs/capacity-planning
The difference between expected and actual memory usage arises from two main points:
Each entry carries a constant overhead, consisting of the objects that support processing entries in a distributed computing environment.
For example, you can declare an int local variable; it takes 4 bytes on the stack, but it's hard to make that variable long-lived and accessible from other parts of the program. So you have to create a new Integer object, which consumes at least 16 bytes (300% overhead, isn't it?). Going further, if you want to make this object mutable and safely accessible by multiple threads, you have to create a new AtomicReference and store your object inside it. Total memory consumption will be at least 32 bytes... and so on. Every time we extend an object's functionality, we get additional overhead; there is no other way.
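If you would rather measure these numbers than take them on faith, the OpenJDK JOL (Java Object Layout) tool can print them; this is a small sketch assuming the org.openjdk.jol:jol-core dependency is on the classpath:

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

import java.util.concurrent.atomic.AtomicReference;

public class OverheadDemo {
    public static void main(String[] args) {
        Integer boxed = 1_000_000;                        // a 4-byte int boxed into a heap object
        AtomicReference<Integer> ref = new AtomicReference<>(boxed);

        // Header and field layout of a single Integer instance (typically 16 bytes).
        System.out.println(ClassLayout.parseInstance(boxed).toPrintable());

        // Total retained size of the AtomicReference plus the Integer it points to.
        System.out.println("AtomicReference graph: "
                + GraphLayout.parseInstance(ref).totalSize() + " bytes");
    }
}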
Each entry is stored inside a cache in a special serialized format, so the actual memory footprint of an entry depends on the format used. By default Ignite uses BinaryMarshaller to convert an object to a byte array, and this array is stored inside a BinaryObject.
The reason is simple: distributed computing systems continuously exchange entries between nodes, and every entry in the cache should be ready to be transferred as a byte array.
Please read the article; it was recently updated. You can estimate the overhead of small entries by hand, but for big entries you should inspect the actual entry stored in the cache as a byte array. Look at the withKeepBinary method, as sketched below.
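A minimal sketch of that inspection (the cache name and key are hypothetical):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;

public class InspectBinaryEntry {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, Object> cache = ignite.getOrCreateCache("dataCache");

            // Read the entry without deserializing it back into the original class.
            IgniteCache<Integer, BinaryObject> binaryCache = cache.withKeepBinary();
            BinaryObject entry = binaryCache.get(42);

            if (entry != null) {
                // The BinaryObject wraps the serialized form Ignite actually stores,
                // so its type and fields reflect the real per-entry representation.
                System.out.println("Type: " + entry.type().typeName());
                System.out.println(entry);
            }
        }
    }
}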

statement_mem seems to limit the node memory instead of the segment memory

According to the Greenplum documentation, GUCs such as statement_mem and gp_vmem_protect_limit should work at the segment level. The same should apply to a resource queue's memory allowance.
On our system we have 8 primary segments per node. So if I set the statement_mem of a query to 2GB I would expect the query to consume (if needed) up to 2GB x 8 = 16GB of RAM. But it seems to use only 2GB in total per node before starting to spill to disk (that is, 2GB/8 per segment). I tried different statement_mem values and saw the same thing.
The max_statement_mem and gp_vmem_protect_limit limits are never reached. RAM usage on the nodes has been monitored using various tools (from GP Command Center to top and free, as well as the Pivotal-suggested session_level_memory_consumption view).
EDITED FROM HERE
Added two documentation sources where statement_mem is defined per segment and not per host (@Jon Roberts).
In the GP best practices guide, at the beginning of page 32, it clearly says that if statement_mem is 125MB and we have 8 segments on the server, each query will get 1GB allocated per server.
http://gpdb.docs.pivotal.io/4300/pdf/GPDB43BestPractices.pdf
The article at https://support.pivotal.io/hc/en-us/articles/201947018-Pivotal-Greenplum-GPDB-Memory-Configuration also seems to treat statement_mem as segment memory and not host memory. It keeps interrelating statement_mem with the memory limit of the resource queues as well as with gp_vmem_protect_limit (both parameters defined on a per-segment basis).
This is why I'm getting confused about how to properly manage the memory resources.
Thanks
I incorrectly stated that statement_mem is per host; that is not the case. This link talks about the memory at the segment level:
http://gpdb.docs.pivotal.io/4370/guc_config-statement_mem.html#statement_mem
With the default gp_resqueue_memory_policy of "eager_free", memory gets re-used, so the aggregate amount of memory used may look low for a particular query execution. If you change it to "auto", where the memory isn't re-used, the memory usage is more noticeable.
Run an "explain analyze" of your query and look at the slices that are used. With eager_free, the memory gets re-used, so you may have only a single slice wanting more memory than is available, such as this one:
(slice18) * Executor memory: 10399K bytes avg x 2 workers, 10399K bytes max (seg0). Work_mem: 8192K bytes max, 13088K bytes wanted.
And for your question on how to manage the resources, most people don't change the default values. A query that spills to disk is usually an indication that the query needs to be revised or the data model needs some work.
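If you still want to experiment, a session-level sketch like the following shows where to look (the table and query are placeholders); the "bytes wanted" figures in the plan output tell you which slices ran short of memory:

-- Hypothetical session: raise the per-segment statement memory and inspect the plan.
SET statement_mem = '2000MB';
SHOW statement_mem;
SHOW gp_resqueue_memory_policy;    -- eager_free (default) re-uses memory across slices

EXPLAIN ANALYZE
SELECT customer_id, SUM(amount)    -- placeholder query and table
FROM sales
GROUP BY customer_id;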

Whole memory cycle in executing a program

I have been thinking about how information (data) is passed around while executing any program or query.
The diagram I drew is based on the following assumptions:
All data is stored in disk storage.
The whole platter of the disk is divided into many sectors, and sectors are divided into blocks. Blocks are divided into pages, and pages are contained in a page table with a sequence ID.
The most frequently used data are stored in cache for faster access.
If data is not found in the cache, then the program checks main memory, and if a page fault occurs, it goes to disk storage.
Virtual memory is used as an address mapping from RAM to disk storage.
Do you think I am missing anything here? Are my assumptions about how memory management works correct? I will appreciate any helpful comments. Thank you.
I think you are mixing too many things together.
All data is stored in disk storage.
In most disk-based operating systems, all user data (and sometimes kernel data) is stored on disk (somewhere) and mapped to memory.
The whole platter of the disk is divided into many sectors, and sectors are divided into blocks. Blocks are divided into pages, and pages are contained in a page table with a sequence ID.
No.
Most disks these days use logical I/O so that the software only sees blocks, not tracks, sectors, and platters (as in ye olde days).
Blocks exist only on disk. Pages exist only in memory. Blocks are not divided into pages.
The most frequently used data are stored in cache for faster access.
There are two common caches. I cannot tell which you are referring to. One is the CPU cache (hardware) and the other is software caches maintained by the operating system.
If data is not found in the cache, then the program checks main memory, and if a page fault occurs, it goes to disk storage.
No.
This sounds like you are referring to the CPU cache. Page faults are triggered when a virtual address is accessed and its page table entry shows the page is not in memory.
Virtual memory is used as an address mapping from RAM to disk storage.
Logical memory mapping is used to map logical pages to physical page frames. Virtual memory is used to map logical pages to disk storage.
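To make the mapping concrete, here is a small Java sketch (the file name and sizes are arbitrary): map() only sets up the virtual-to-disk mapping, and the first access to each page triggers a page fault that brings the corresponding block into a physical page frame.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DemandPagingDemo {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("demo.bin");                 // arbitrary file name
        long size = 256L * 1024 * 1024;                  // 256 MB region

        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {

            // Maps the file into virtual memory; no physical pages are loaded yet.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);

            // Touching a byte forces the OS to resolve the mapping: a page fault
            // brings the backing block into a physical page frame on first access.
            buffer.put(0, (byte) 1);
            byte b = buffer.get(128 * 1024 * 1024);      // faults in a different page
            System.out.println("byte at 128 MB offset = " + b);
        }
    }
}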

MongoDB process size in task manager

I have been working with MongoDB and inserted up to 1 GB of data into a collection. I noticed that the MongoDB process size shown in Task Manager is 25 MB, but the overall memory on the Performance tab of Task Manager keeps growing as I insert data. The question is: why is that 1 GB not part of the process size shown by Task Manager? I know that MongoDB stores it in files, but it still caches part of that data in memory.
MongoDB (<= 2.6) uses memory-mapped files. This means that the database asks the operating system to map the data files to a portion of virtual memory. The operating system then handles moving things in and out of physical memory according to what the database accesses. Your 1GB of data is mapped into virtual memory, but is likely not resident in physical memory since you have not accessed it recently. To see more detailed statistics about MongoDB's memory usage, run db.serverStatus() in the shell and look at the mem section. You can read a bit more about the memory-mapped storage engine in the storage FAQ.
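For example, in the mongo shell you can look at just the mem subdocument; on a memory-mapped-files (<= 2.6) deployment it reports resident, virtual, and mapped sizes (the exact fields vary by version and platform):

// mongo shell
db.serverStatus().mem
// resident - MB of physical RAM currently used by the mongod process
// virtual  - MB of virtual address space, which includes all mapped data files
// mapped   - MB of data files mapped into memory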
