Can chronicle-map handle data larger than memory? - chronicle

I'm a bit confused about how off-heap memory works. I have a server with 32 GB of RAM and a data set of key-value mappings about 1 TB in size. I'm looking for a simple, fast, embedded Java database that would let me map a key to a value in this 1 TB dataset, which will mostly have to be read from disk. Each entry in this data set is small (<500 bytes), so I think using the file system directly would be inefficient.
I'd like to use Chronicle Map for this. I read that off-heap memory usage can exceed RAM size and that it interacts with the filesystem somehow, but at the same time Chronicle Map is described as an in-memory database. Can Chronicle Map handle the 1 TB data set on my server, or am I limited to data sets of 32 GB or less?

The answer is that it depends on your operating system. On Windows, a Chronicle Map must fit inside main memory; on Linux and macOS it doesn't have to fit in main memory (the difference is in how memory mapping is implemented on these OSes). Note: Linux even allows you to map a region larger than your disk space (macOS and Windows don't).
So on Linux you could map 1 TB, or even 100 TB, on a machine with 32 GB of memory. It is important to remember that your access pattern and your choice of drive will be critical to performance: if you generally access the same data most of the time and you have an SSD, this will perform well; if you have a spinning disk and a random access pattern, you will be limited by the speed of the drive.
Note: we have tested Chronicle Map up to 2.5 billion entries, and it performs well because it uses 64-bit hashing of keys.

Related

Is it possible to "gracefully" use virtual memory in a program whose regular use would consume all physical RAM?

I intend to write a program that creates huge relational networks out of unstructured data; the exact implementation is irrelevant, but imagine a GPT-3-style large language model. Training such a model could require 100+ gigabytes of available random access memory as links get reinforced between new and existing nodes in the graph. Only a small portion of the entire model would likely be loaded at any given time, but potentially any region of memory may be accessed randomly.
I do not have a machine with 512 GB of physical RAM. However, I do have one with a 512 GB NVMe SSD that I can dedicate to the purpose. I see two potential options for making this program work without specialized hardware:
I can write my own memory manager that swaps pages between "hot" resident memory and "cold" storage on disk, probably using memory-mapped files or a similar construct. This would require me to route every memory access in the modeling program through this custom manager, and to code the page cache, concurrent-access handling, and all the other low-level machinery that comes with it, which would take days and very likely introduce bugs. Performance would also likely be poor. Or,
I can configure the operating system to use the entire SSD as a page file / swap partition, and then just have the program reserve as much virtual memory as it needs, like any other normal program, relying on the kernel's memory manager, which already does the page mapping, swapping, and caching for me.
The problem I foresee with #2 is getting the operating system to understand what I am trying to do in a "cooperative" way. Ideally I would like to hint to the OS that my process should keep only a specific fraction of its memory resident and swap the rest, to keep overall system RAM usage below 90% or so. Otherwise the OS will fill 99% of physical RAM and then start aggressively compacting and reclaiming memory from other background programs, which makes the whole system unresponsive. Linux will even start sacrificing entire processes (the OOM killer) if things get bad enough.
Does any operating system provide a system call or API that would let me tell the kernel to back off and proactively swap my memory to disk? I have looked through the VMM functions in kernel32.dll and the documentation for the Linux paging/swap daemon (kswapd), but nothing looks like what I need. Perhaps some way to reserve, say, 1 GB of pages and then "donate" them back to the kernel to make sure they get used for processes other than my own? Or some way to configure memory pressure or limits, or make kswapd work more aggressively for just my process?

How much virtual memory does a 30.5 GB heap (256 GB memory in total) for Elasticsearch support?

Assume I have a machine with 256 GB of memory and a 12 TB SSD. The indexed document size is 100 TB. I assign 30.5 GB to the Elasticsearch heap; the remainder is for Lucene and the OS.
My question is, how much virtual memory does Elasticsearch support? To put it another way, how many indexed documents can I put into the virtual memory on each machine?
Thanks
The amount of virtual memory ES can use is governed by the vm.max_map_count setting in /etc/sysctl.conf. By default it is set to 262144, but you can change this value using:
sysctl -w vm.max_map_count=262144
From the linux documentation:
This file contains the maximum number of memory map areas a process
may have. Memory map areas are used as a side-effect of calling
malloc, directly by mmap and mprotect, and also when loading shared
libraries.
While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation.
The default value is 65536.
So this setting doesn't impose a specific amount of memory available to ES/Lucene, but rather a number of discrete memory map areas that a given process can use. How much memory is used exactly will depend on the size of the memory chunks allocated by ES/Lucene. By default, Lucene uses
1 << 30 = 1,073,741,824 bytes = 1 GiB chunks on a 64-bit JRE, and
1 << 28 = 268,435,456 bytes = 256 MiB chunks on a 32-bit JRE.
So if you do the math, the default value of vm.max_map_count is probably good enough for your case; if not, you can raise it and monitor your virtual memory usage.

MongoDB process size in task manager

I have been working with MongoDB and inserted about 1 GB of data into a database collection. I noticed that the process size of MongoDB shown in Task Manager is 25 MB, yet the overall memory on the Performance tab keeps rising as I insert data. My question is: why is that 1 GB not part of the process size shown by Task Manager? I know MongoDB stores the data in files, but it also caches part of that data in memory.
MongoDB (<= 2.6) uses memory-mapped files. This means that the database asks the operating system to map the data files to a portion of virtual memory. The operating system then handles moving things in and out of physical memory according to what the database accesses. Your 1GB of data is mapped into virtual memory, but is likely not resident in physical memory since you have not accessed it recently. To see more detailed statistics about MongoDB's memory usage, run db.serverStatus() in the shell and look at the mem section. You can read a bit more about the memory-mapped storage engine in the storage FAQ.

Use 3GB Free Space to Access 30 GB info without Virtual Memory Paging?

I have a quick question:
How can we use 3 GB of free space to access roughly 30 GB of data without virtual memory or compression? It's more of a data-structure question.
Thanks
You should somehow mimic the paging mechanism.
One way to do it is hashing¹.
Hash all your data into bins and store these bins on disk. In main memory (RAM) you hold only an array of pointers to disk. When you need an address, you find where it is on disk by indexing the RAM table at hash(address) and following the pointer.
You can of course optimize this by keeping a portion of the data in memory, using the principle of locality and hoping for a hit, to avoid reloading a chunk from disk.
(1) The hashing does not have to be complex or uniformly distributed. I believe using the MSBs (most significant bits) of the address will be just fine, and will actually mimic the paging mechanism better.
The most obvious way would be through a typical filesystem API with read, write, and seek functions.

What pitfalls should I be wary of when memory mapping BIG files?

I have a bunch of big files, each of which can be over 100 GB; the total amount of data can be 1 TB, and they are all read-only files (just random reads).
My program does small reads in these files on a computer with about 8 GB of main memory.
In order to increase performance (no seek() and no buffer copying), I thought about using memory mapping, basically memory-mapping the whole 1 TB of data.
Although it sounds crazy at first, since main memory << disk, with some insight into how virtual memory works you can see that on 64-bit machines there should be no problem.
All the pages read from disk to satisfy my reads will be considered "clean" by the OS, as they are never overwritten. This means all of these pages can go straight onto the list of pages the OS can reuse, without being written back to disk or swapped out. In other words, the OS could keep just the LRU pages in physical memory and simply perform reads when a page is not in main memory.
This would mean no swapping and no extra I/O caused by the huge memory mapping.
This is the theory; what I'm looking for is anyone who has actually tried or used such an approach in production and can share their experience: are there any practical issues with this strategy?
What you are describing is correct. With a 64-bit OS you can map 1 TB of address space to a file and let the OS manage reading and writing to the file.
You didn't mention which CPU architecture you are on, but on most of them (including amd64) the CPU maintains a bit in each page table entry recording whether the data in the page has been written to. The OS can indeed use that flag to avoid writing pages that haven't been modified back to disk.
There would be no increase in IO just because the mapping is large. The amount of data you actually access would determine that. Most OSes, including Linux and Windows, have a unified page cache model in which cached blocks use the same physical pages of memory as memory mapped pages. I wouldn't expect the OS to use more memory with memory mapping than with cached IO. You're just getting direct access to the cached pages.
One concern you may have is flushing modified data to disk. I'm not sure what the policy is on your OS specifically, but the time between modifying a page and when the OS actually writes that data to disk may be much longer than you're expecting. Use a flush API to force the data to be written to disk if it's important to have it written by a certain time.
I haven't used file mappings quite that large in the past but I would expect it to work well and at the very least be worth trying.