MongoDB process size in Task Manager - Windows

I have been working with MongoDB and inserted up to 1 GB of data into a database collection. I noticed that the process size of MongoDB shown in Task Manager is 25 MB, but the overall memory in the Performance tab of Task Manager keeps growing as I insert data. The question is: why is that 1 GB not part of the process size shown by Task Manager? I know that MongoDB stores it in files, but it still caches a part of that data in memory.

MongoDB (<= 2.6) uses memory-mapped files. This means that the database asks the operating system to map the data files to a portion of virtual memory. The operating system then handles moving things in and out of physical memory according to what the database accesses. Your 1GB of data is mapped into virtual memory, but is likely not resident in physical memory since you have not accessed it recently. To see more detailed statistics about MongoDB's memory usage, run db.serverStatus() in the shell and look at the mem section. You can read a bit more about the memory-mapped storage engine in the storage FAQ.
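If you want to read the same numbers programmatically rather than from the shell, here is a minimal sketch using the MongoDB Java driver; it assumes a mongod on localhost:27017, and note that the mem.mapped field is only reported by the memory-mapped (MMAPv1) storage engine:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class MongoMemStats {
    public static void main(String[] args) {
        // Connect to a local mongod (adjust the connection string for your deployment).
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // serverStatus is an admin command; "mem" is the section mentioned above.
            Document status = client.getDatabase("admin")
                                    .runCommand(new Document("serverStatus", 1));
            Document mem = status.get("mem", Document.class);
            // resident = physical RAM in use by mongod; virtual/mapped = address space
            // reserved for the memory-mapped data files (this is where your 1 GB shows up).
            System.out.println("resident MB: " + mem.get("resident"));
            System.out.println("virtual  MB: " + mem.get("virtual"));
            System.out.println("mapped   MB: " + mem.get("mapped"));
        }
    }
}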

Related

Aerospike HDD/Memory usage

I'm exploring Aerospike as a key-value DB that stores data on disk for safety. Please confirm that I understand this correctly:
If in namespace configuration I set:
storage-engine device
memory-size 4G
file /opt/aerospike/data/namespace.dat
filesize 16G
data-in-memory false
-> all data will be on disk only, "memory-size" is for indexes only (small usage), all data will be stored in multiple 16GB files (which will be created automatically), and most important - every read query will trigger reading data from disk?
If in namespace configuration I set:
storage-engine device
memory-size 4G
file /opt/aerospike/data/namespace.dat
filesize 16G
data-in-memory true
-> all data will be on disk and partly in memory, "memory-size" will act like a cache and contain 4GB of the most used data, all data will be stored in multiple 16GB files (which will be created automatically), and most important - every read query will check memory first and, if the record is missing, read it from disk and add it to memory? What data will be in memory - the most used or the most recently created?
If in namespace configuration I set:
storage-engine memory
memory-size 4G
data-in-memory true
-> all data will be in memory only, I'm limited to 4GB of data and no more?
Aerospike doesn't shuffle data in and out of disk like first-generation NoSQL databases do, ones that have a "cache-first" architecture. Aerospike's hybrid memory architecture is such that the primary index (metadata) is always in memory. Depending on the namespace configuration, the data is stored fully on disk or in memory. You define storage for each namespace. If the namespace is in-memory, all the data and metadata are fully in memory. If the namespace stores its data on a few devices (/dev/sdb, /dev/sdc), the primary index (metadata) is fully in memory and the data is fully on those SSDs.
(1) is data on HDD, and the configuration is correct. If you're using an SSD you probably want to use device instead of file. One thing in your question isn't quite true: not every read goes to disk, because Aerospike will first check the post-write queue on a read.
Aerospike does block writes to optimize around the high-read / low-write performance of HDDs and SSDs. The size of the block is determined by the write-block-size config parameter (it should be 1MB for an HDD). The records are first loaded into a streaming write buffer of the same size. After the buffer is flushed to a block on disk, Aerospike doesn't get rid of this in-memory copy immediately; it remains part of the post-write queue (FIFO). By default, 256 of those blocks are in the queue per device, or per file (you can define multiple file lines as the storage device). If your usage pattern is such that reads follow closely after the writes, you'll be getting in-memory access instead of disk access. If your cache_read_pct metric is not in the single digits and you have DRAM to spare, you can probably benefit from raising the post-write-queue value (max of 2048 blocks per device).
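For illustration, here is a hedged sketch of how (1) could look as a full aerospike.conf namespace stanza, with the write-block-size and post-write-queue settings discussed above (the namespace name and the 512-block queue are just examples):

namespace on_disk {
    memory-size 4G                  # primary index only; data stays on disk
    storage-engine device {
        file /opt/aerospike/data/namespace.dat
        filesize 16G
        data-in-memory false
        write-block-size 1M         # 1MB blocks recommended for HDD
        post-write-queue 512        # default is 256, max 2048 blocks per file/device
    }
}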
(2) is an in-memory namespace, persisted to disk. For both (1) and (2) you can use either file (for filesystem based storage) or device (for raw device). Both the primary index (metadata) and storage (data) are in memory for (2). All reads and writes come out of memory, and a secondary write-through goes to the persistence device.
filesize reserves the size of the persistence layer on the filesystem (if you chose to use file and not device). You can have multiple file lines, each of which will be sized from the start to the number given as filesize. memory-size is the maximum amount of memory used by the namespace. This isn't pre-reserved. Aerospike will grow and shrink in memory usage over time, with the maximum for the namespace being its memory-size.
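A similar hedged sketch of (2), showing data-in-memory true with multiple file lines (the namespace name, paths and sizes are illustrative only):

namespace in_memory_persisted {
    memory-size 4G                  # cap on the namespace's memory use, not pre-reserved
    storage-engine device {
        file /opt/aerospike/data/ns-a.dat
        file /opt/aerospike/data/ns-b.dat
        filesize 16G                # each file is reserved at 16G up front
        data-in-memory true         # reads served from memory, writes go through to the files
    }
}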
Take a look at What's New in 3.11, specifically the section that touches on in-memory performance improvements. Tuning partition-tree-sprigs and partition-tree-locks will likely boost the performance of your in-memory namespaces.
(3) is a purely in-memory namespace, usually intended to be a cache. The 4G limit affects things such as stop-writes-pct, high-water-memory-pct as those are defined as a percentage of that limit (see evictions, expirations, stop-writes).
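And a hedged sketch of (3), a purely in-memory namespace, with the percentage thresholds that hang off memory-size spelled out (the values shown are the usual defaults, but check them against your server version):

namespace cache_only {
    memory-size 4G
    high-water-memory-pct 60        # evictions begin at 60% of memory-size
    stop-writes-pct 90              # writes are refused at 90% of memory-size
    storage-engine memory
}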
There's also a (4) special-case for counters called data-in-index. See storage engine configuration recipes.

Can chronicle-map handle data larger than memory?

I'm a bit confused by how off-heap memory works. I have a server with 32GB of RAM and a data set of key-value mappings about 1TB in size. I'm looking for a simple and fast embedded Java database that would let me map a key to a value from this 1TB dataset, which will mostly have to be read from disk. Each entry in this data set is small (<500 bytes), so I think using a file system would be inefficient.
I'd like to use Chronicle Map for this. I read that off-heap memory usage can exceed RAM size and that it interacts with the filesystem somehow, but at the same time, Chronicle Map is described as an in-memory database. Can Chronicle Map handle the 1TB data set on my server, or am I limited to data sets of 32GB or less?
The answer is that it depends on your operating system. On Windows a Chronicle Map must fit inside main memory, but on Linux and Mac OS X it doesn't have to fit in main memory (the difference is in how memory mapping is implemented on these OSes). Note: Linux even allows you to map a region larger than your disk space (Mac OS X and Windows don't).
So on Linux you could map 1 TB or even 100 TB on a machine with 32 GB of memory. It is important to remember that your access pattern and your choice of drive will be critical to performance. If you generally access the same data most of the time and you have an SSD, this will perform well. If you have a spinning disk and a random access pattern, you will be limited by the speed of your drive.
Note: we have tested Chronicle Map to 2.5 billion entries and it performs well as it uses 64-bit hashing of keys.
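As a concrete illustration, here is a minimal sketch of creating a file-backed (persisted) Chronicle Map, which is what lets the data set exceed the heap and, on Linux, exceed physical RAM; the entry count, value size and file path are assumptions taken from the question, not a definitive setup:

import net.openhft.chronicle.map.ChronicleMap;
import java.io.File;

public class BigKeyValueStore {
    public static void main(String[] args) throws Exception {
        // The map lives in a memory-mapped file, so its size is bounded by disk
        // (and address space), not by the 32 GB of RAM or the Java heap.
        ChronicleMap<Long, byte[]> map = ChronicleMap
                .of(Long.class, byte[].class)
                .entries(2_000_000_000L)       // ~1 TB / ~500 bytes per entry
                .averageValueSize(500)         // required for variable-sized values
                .createPersistedTo(new File("/data/big-kv-store.dat")); // hypothetical path

        map.put(42L, new byte[] {1, 2, 3});
        byte[] value = map.get(42L);           // may touch disk if the page isn't resident
        System.out.println(value.length);

        map.close();
    }
}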

Tachyon Doesn't Seem to be Aware of Available Memory

Just to see if Tachyon would give me an error about configured memory being more than what's available, I set:
# Some value over combined available mem and disk space.
export TACHYON_WORKER_MEMORY_SIZE=1000GB
And observed the allocation in the web UI without error.
Is some of the info going to be pushed to disk when available RAM is exceeded?
What happens when it exceeds disk space? Dropped file errors or system failure?
This is the expected (if perhaps unhelpful) behaviour, and ultimately it comes down to the fact that Tachyon uses Linux ramfs as its in-memory storage.
As this article explains:
ramfs file systems cannot be limited in size like a disk-based file system, which is limited by its capacity. ramfs will continue using memory storage until the system runs out of RAM and likely crashes or becomes unresponsive.
Note that Tachyon will enforce the size constraint based on the size you give it. However, as you've found, you can allocate more RAM than is actually available and Tachyon won't check this, so you may want to go ahead and file a bug report.
To answer your specific questions:
No, excess data will not be pushed to disk automatically.
When RAM is full, the behaviour is OS-dependent.
Note that the setting you are referring to only controls the in-memory space; if you want to use local disks in addition to RAM then you need to use Tachyon's Tiered Storage.
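For reference, a hedged sketch of what a two-tier (RAM + local disk) setup might look like as Tachyon worker properties (set via -D options in tachyon-env.sh or a site properties file). The exact property names changed between Tachyon releases and again in Alluxio, and the paths and quotas here are only examples, so verify them against the Tiered Storage documentation for your version:

tachyon.worker.tieredstore.level.max=2
tachyon.worker.tieredstore.level0.alias=MEM
tachyon.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
tachyon.worker.tieredstore.level0.dirs.quota=16GB
tachyon.worker.tieredstore.level1.alias=HDD
tachyon.worker.tieredstore.level1.dirs.path=/mnt/disk1,/mnt/disk2
tachyon.worker.tieredstore.level1.dirs.quota=500GB,500GB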

How much memory is available for database use in MemSQL

I have created a MemSQL cluster on 7 machines. One of the machines shows that out of 62.86 GB only 2.83 GB is used, so I am assuming that around 60 GB of memory is available to store data.
But my top command tells another story: about 21.84 GB of memory is in use and free memory is 41 GB.
So:
1> How much memory is exactly available for the database? Is it 60 GB as per the cluster URL, or 42 GB as per the top command?
Note that:
1> memsql-ops is consuming around 13.5 GB of virtual memory.
2> As per top, if we subtract the total buffered and cached memory from the used memory, it comes to 2.83 GB, which is the used memory as per the cluster URL.
To answer your question, you currently have about 60GB of memory free to be used by any process on your machine including the MemSQL database. Note that MemSQL has some overhead and by default reserves a small percentage of the total memory for overhead. If you visit the status page in the MemSQL Ops UI and view the "Leaf Table Memory" card, you will discover the amount of memory that can be used for data storage within the leaf nodes of your MemSQL cluster.
MemSQL Ops is written in Python which is then embedded into a "single binary" via a packaging tool. Because of this it exhibits a couple of oddities including high VM use. Note that this should not affect the amount of data you can store, as Ops is only consuming 308MB of resident memory on your machine. It should stay relatively constant based on the size of your cluster.

What pitfalls should I be wary of when memory mapping BIG files?

I have a bunch of big files; each file can be over 100GB, the total amount of data can be 1TB, and they are all read-only files (just random reads).
My program does small reads in these files on a computer with about 8GB main memory.
In order to increase performance (no seek() and no buffer copying) I thought about using memory mapping, basically memory-mapping the whole 1TB of data.
Although it sounds crazy at first, since main memory << disk, with some insight into how virtual memory works you should see that on 64-bit machines there should not be a problem.
All the pages read from disk to answer my read()s will be considered "clean" by the OS, as these pages are never overwritten. This means that all these pages can go directly to the list of pages the OS can reuse, without being written back to disk or swapped out (no page laundering). This means that the operating system could actually keep in physical memory only the recently used pages (evicting the least recently used) and would only perform reads when a page is not in main memory.
This would mean no swapping and no increase in i/o because of the huge memory mapping.
This is the theory; what I'm looking for is anyone who has ever tried or used such an approach for real in production and can share their experience: are there any practical issues with this strategy?
What you are describing is correct. With a 64-bit OS you can map 1TB of address space to a file and let the OS manage reading and writing to the file.
You didn't mention what CPU architecture you are on, but on most of them (including amd64) the CPU maintains a bit in each page table entry indicating whether the data in the page has been written to. The OS can indeed use that flag to avoid writing pages that haven't been modified back to disk.
There would be no increase in IO just because the mapping is large. The amount of data you actually access would determine that. Most OSes, including Linux and Windows, have a unified page cache model in which cached blocks use the same physical pages of memory as memory mapped pages. I wouldn't expect the OS to use more memory with memory mapping than with cached IO. You're just getting direct access to the cached pages.
One concern you may have is with flushing modified data to disk. I'm not sure what the policy is on your OS specifically, but the time between modifying a page and when the OS will actually write that data to disk may be a lot longer than you're expecting. Use a flush API to force the data to be written to disk if it's important to have it written by a certain time.
I haven't used file mappings quite that large in the past but I would expect it to work well and at the very least be worth trying.
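To make that concrete, here is a minimal sketch of the read-only mapping approach in Java (the question doesn't name a language, so this is just one possible rendering; a single MappedByteBuffer is capped at 2GB, hence the chunking, and the file path is hypothetical):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class BigMappedFile {
    private static final long CHUNK = 1L << 30; // 1 GB per mapping; one MappedByteBuffer is limited to 2 GB
    private MappedByteBuffer[] chunks;

    public BigMappedFile(String path) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            long size = ch.size();
            int n = (int) ((size + CHUNK - 1) / CHUNK);
            chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long offset = i * CHUNK;
                long length = Math.min(CHUNK, size - offset);
                // READ_ONLY mappings stay "clean", so the OS can simply drop cold pages
                // instead of writing them back or swapping.
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            }
        } // the mappings remain valid after the channel is closed
    }

    // Random read of one byte at an absolute file offset.
    public byte get(long pos) {
        return chunks[(int) (pos / CHUNK)].get((int) (pos % CHUNK));
    }

    public static void main(String[] args) throws Exception {
        BigMappedFile f = new BigMappedFile("/data/huge-readonly.bin"); // hypothetical file
        System.out.println(f.get(123_456_789L));
    }
}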
