Multiple hugepage sizes in Linux (x86-64)? - linux-kernel

Does Linux on x86-64 support multiple huge page sizes (e.g., both 2MB and 1GB pages on top of the 4KB base page size)? If so, is there a way to specify which huge page size to use for a given allocation? In other words: if the MAP_HUGETLB flag is passed to mmap(), the allocation is backed by huge pages of the default size. Is there any way to request that an allocation be backed by a non-default huge page size?

Not quite yet, but support is working its way through the LKML. At a guess, the feature will be available in a few releases' time.
You will then be able to use the MAP_HUGE_2MB and MAP_HUGE_1GB flags to request the size explicitly.
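Once those flags are merged, usage should look roughly like the sketch below. The MAP_HUGE_* values follow the encoding proposed on the list (log2 of the page size shifted by MAP_HUGE_SHIFT); the fallback defines and error handling here are my own assumptions.

    #include <stdio.h>
    #include <sys/mman.h>

    /* Fallback definitions until the flags show up in the system headers;
     * the encoding is log2(page size) << MAP_HUGE_SHIFT. */
    #ifndef MAP_HUGE_SHIFT
    #define MAP_HUGE_SHIFT 26
    #endif
    #ifndef MAP_HUGE_2MB
    #define MAP_HUGE_2MB   (21 << MAP_HUGE_SHIFT)
    #endif
    #ifndef MAP_HUGE_1GB
    #define MAP_HUGE_1GB   (30 << MAP_HUGE_SHIFT)
    #endif

    int main(void)
    {
        size_t len = 1UL << 30;  /* one 1GB huge page */

        /* Anonymous mapping explicitly backed by 1GB huge pages.
         * This fails with ENOMEM unless 1GB pages have been reserved
         * (e.g. hugepagesz=1G hugepages=N on the kernel command line). */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                       -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGE_1GB)");
            return 1;
        }
        munmap(p, len);
        return 0;
    }

Omitting the MAP_HUGE_* flag keeps today's behaviour of using the default huge page size.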

Related

Allocating huge pages in kernel modules

I'm looking for a way to allocate huge pages (2M or 1G) in a kernel module (I'm using kernel version 4.15.0).
In user space, I can mount the hugetlbfs file system, and then allocate huge pages using mmap (see, e.g., https://blog.kevinhu.me/2018/07/01/01-Linux-Hugepages/). Is there a similar way to do this in kernel space?
I'm aware that I could allocate them in user space first, and then pass them to the kernel using get_user_pages, as described in Sequential access to hugepages in kernel driver. However, I'm looking for a more direct way to allocate them, as I only need them in kernel space.
Something similar to
kmalloc(0x200000, GFP_KERNEL | __GFP_COMP)
should work.
As explained in this LWN article:
A compound page is simply a grouping of two or more physically contiguous pages into a unit that can, in many ways, be treated as a single, larger page. They are most commonly used to create huge pages, used within hugetlbfs or the transparent huge pages subsystem, but they show up in other contexts as well. Compound pages can serve as anonymous memory or be used as buffers within the kernel; they cannot, however, appear in the page cache, which is only prepared to deal with singleton pages.
This makes the assumption that huge pages are configured and available.
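For illustration, a minimal kernel-module sketch along those lines is shown below. The 2MB size and the use of alloc_pages() instead of kmalloc() are my own choices; note that this yields a physically contiguous compound page from the buddy allocator, not a hugetlbfs page.

    #include <linux/module.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>

    #define HUGE_ORDER 9  /* 2MB with a 4KB base page size */

    static struct page *huge_page;

    static int __init huge_demo_init(void)
    {
        /* Ask the buddy allocator for one physically contiguous
         * 2MB compound page. */
        huge_page = alloc_pages(GFP_KERNEL | __GFP_COMP, HUGE_ORDER);
        if (!huge_page)
            return -ENOMEM;

        pr_info("huge_demo: allocated 2MB at pfn %lx\n",
                page_to_pfn(huge_page));
        return 0;
    }

    static void __exit huge_demo_exit(void)
    {
        __free_pages(huge_page, HUGE_ORDER);
    }

    module_init(huge_demo_init);
    module_exit(huge_demo_exit);
    MODULE_LICENSE("GPL");

If you need to access the block directly, page_address(huge_page) gives a kernel virtual address for it.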

Resource Sizing with only static memory use

For my embedded application, we are using an STM32F411 chip. The chip has 512 KB of flash and 128 KB of RAM.
I wanted to do a resource sizing exercise so I can monitor how I am doing on resources (flash and RAM).
I only allocate memory statically, with no malloc() calls, and the GNU size tool gives me:
   text    data     bss     dec     hex filename
 230868   11236   74048  316152   4d2f8 application.elf
From the readings I have done (https://mcuoneclipse.com/2013/04/14/text-data-and-bss-code-and-data-size-explained/) I understand that because there are no dynamically allocated resources, the above information should give me a clear measure of how deep into the RAM usage I will run.
Can I expect the RAM use to ultimately be the data section plus the bss section, per the summary in the link above? So in this case 85284 bytes.
And the flash size to be the text plus data sections, in this case 242104 bytes?
Can I expect the RAM use to ultimately be the data section + the bss sections per the summary on the link above? So in this case 85284 bytes.
That depends on your linker script, especially on how the stack and heap are configured. The same goes for the text and data segments.
For more detailed information, look at the .map file.
Yes, but also consider that even though you don't explicitly use dynamic memory in your code, library functions might. If you're trying to maintain super-tight control over memory use and you have an application that uses close to your total amount of RAM, you'll need to account for that. If you don't, you may run into nasty runtime issues.
In short, yes. Because of the need to store the initializers for the initialized data section, the "data" section counts twice in the memory usage -- once for flash and once for RAM. This is also why it is important to be very diligent about declaring constant data as "const". That data is then placed in flash and only counts once in the overall memory usage.
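To make the const point concrete, here is a minimal (hypothetical) pair of declarations; the section names are the conventional GCC/ld ones.

    /* table_rw lands in .data: its initializer is stored in flash and
     * copied into RAM at startup, so it is counted in both totals. */
    int table_rw[256] = { 1, 2, 3 };

    /* table_ro lands in .rodata (reported as part of "text" by size):
     * it stays in flash and consumes no RAM. */
    const int table_ro[256] = { 1, 2, 3 };

With the numbers above, that works out to roughly text + data = 242104 bytes of flash and data + bss = 85284 bytes of RAM, plus whatever stack and heap the linker script reserves.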

What is the cost of mmaping on Mac OS X?

I have an algorithm where my life would be greatly simplified if I could reserve about 20 blocks of address space, 4GB each. In practice I never use more than 4GB, but I do not know in advance which block will fill up.
If I mmap 20 blocks of 4GB, everything seems to work fine; the OS does not seem to actually allocate anything until I write to the memory.
Is there any reason I should not use mmap to allocate 80GB of memory and then only use a small amount of it? I assume there is some cost to setting up these buffers. Can I measure it?
The only drawback of mmap-ing 80GB at once is that page tables have to be created for the full 80GB, so if the pages are 4KB these tables could consume a lot of memory (unless huge pages are used).
For sizes like that it is probably better to use one or more sliding mmap-ed views (i.e., create and remove them as needed).
On Windows, memory usage for mmap/page tables can be checked with RamMap, not sure about Mac.
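As a rough way to measure the cost yourself, you can reserve a large anonymous region, touch only a little of it, and look at the resident set size afterwards. The 80GB/16MB figures and the use of getrusage() below are illustrative choices, not the only way to measure.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    int main(void)
    {
        size_t reserve = 80UL << 30;   /* 80GB of address space */
        size_t touch   = 16UL << 20;   /* actually write to 16MB of it */

        char *p = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(p, 0xab, touch);        /* only these pages get backed by RAM */

        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        /* ru_maxrss is in bytes on Mac OS X, kilobytes on Linux. */
        printf("max RSS: %ld\n", ru.ru_maxrss);

        munmap(p, reserve);
        return 0;
    }

The resident set should stay close to the 16MB that was actually written, independent of the 80GB reservation.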

OSX Set Page size

How would I go about setting the page size in memory for OS X Yosemite?
If I enter pagesize into terminal I get 4096. Is there a way I can modify this?
Short answer: No.
The page size is specific to your architecture and cannot generally be changed by a user at run-time. Intel x86 processors all use a page size of 4 KiB.
Longer answer:
Your CPU may also support larger pages, like 2 MiB, and 1 GiB. (See Huge Pages on Wikipedia.) However, it is completely up to your OS kernel to manage how pages of memory are mapped into various address spaces.
Few userspace APIs concern themselves with the platform's page size. The ones that do (e.g. mmap), however, are written to the least common denominator of the available page sizes, because you can't guarantee that a larger page size will be used for any particular mapping. For this reason, the "page size" exposed to userspace is a single value, like 4 KiB.
On Linux, there is some "control" over this mechanism. Check out:
Huge pages part 1 (Introduction) [LWN.net]
Huge pages part 2: Interfaces [LWN.net]
hugetlbpage.txt [kernel.org]
Hugepages [Debian Wiki]
I have no idea if OS X supports this. Searches for OS X hugepages came up thin.
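For completeness, you can query (but not change) the page size from userspace on both OS X and Linux, for example:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Reports the base page size the kernel exposes to userspace
         * (4096 on x86 machines). */
        printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
        return 0;
    }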

What pitfalls should I be wary of when memory mapping BIG files?

I have a bunch of big files; each file can be over 100GB, the total amount of data can be 1TB, and they are all read-only (access is just random reads).
My program does small reads in these files on a computer with about 8GB of main memory.
In order to increase performance (no seek() and no buffer copying), I thought about using memory mapping, basically memory-mapping the whole 1TB of data.
Although it sounds crazy at first, since main memory << disk, with some insight into how virtual memory works you should see that on 64-bit machines there should not be a problem.
All the pages read from disk to satisfy my reads will be considered "clean" by the OS, as they are never overwritten. This means all these pages can go straight onto the list of pages the OS can reuse, without being written back to disk or swapped out. In other words, the operating system could keep just the recently used pages in physical memory and issue reads only when a page is not resident.
This would mean no swapping and no increase in I/O because of the huge memory mapping.
This is the theory; what I'm looking for is anyone who has actually tried or used such an approach in production and can share their experience: are there any practical issues with this strategy?
What you are describing is correct. With a 64-bit OS you can map 1TB of address space to a file and let the OS manage reading and writing to the file.
You didn't mention what CPU architecture you are on, but on most of them (including amd64) the CPU maintains a bit in each page table entry recording whether the data in the page has been written to. The OS can indeed use that flag to avoid writing pages that haven't been modified back to disk.
There would be no increase in IO just because the mapping is large. The amount of data you actually access would determine that. Most OSes, including Linux and Windows, have a unified page cache model in which cached blocks use the same physical pages of memory as memory mapped pages. I wouldn't expect the OS to use more memory with memory mapping than with cached IO. You're just getting direct access to the cached pages.
One concern you may have is with flushing modified data to disk. I'm not sure what the policy is on your OS specifically, but the time between modifying a page and when the OS actually writes that data to disk may be a lot longer than you're expecting. Use a flush API such as msync() if it's important to have the data written to disk by a certain time.
I haven't used file mappings quite that large in the past but I would expect it to work well and at the very least be worth trying.
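For reference, the read-only mapping described above is only a few lines of code. The MADV_RANDOM hint and the single demonstration read are illustrative choices.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the whole file read-only; pages are faulted in from disk on
         * demand and, being clean, can be dropped by the OS at any time. */
        const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* Tell the kernel the access pattern is random so it does not
         * waste effort on read-ahead. */
        madvise((void *)data, st.st_size, MADV_RANDOM);

        /* ... random small reads become plain pointer accesses ... */
        printf("first byte: %d\n", data[0]);

        munmap((void *)data, st.st_size);
        close(fd);
        return 0;
    }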
