Why can't 64-bit Windows allocate a lot of virtual memory?

On a system with virtual memory, it should be possible to allocate lots of address space, more than you have physical RAM, and then only write to as much of it as you need.
On a 32-bit system, of course, there are only four gigabytes of virtual address space, but that limit disappears on a 64-bit system.
Granted, Windows doesn't use the full 64-bit address space; apparently it uses 44 bits (see Behind Windows x64's 44-bit virtual memory address limit). That is still sixteen terabytes, so there should be no problem allocating, e.g., one terabyte.
So I wrote a program to test this, attempting to allocate a terabyte of address space in chunks of ten gigabytes each:
#include <new>
#include <stdio.h>
#include <stdlib.h>

int main() {
    std::set_new_handler([]() {
        perror("new");
        exit(1);
    });
    for (int i = 0; i < 100; i++) {
        auto p = new char[10ULL << 30]; // 10 GB per chunk
        printf("%p\n", p);
    }
}
Run on Windows x64 with 32 gigabytes of RAM, it gives this result (specifics differ between runs, but always qualitatively similar):
0000013C881C1040
0000013F081D0040
00000141881E2040
00000144081F1040
0000014688200040
0000014908219040
0000014B88226040
0000014E08232040
0000015088246040
0000015308252040
0000015588260040
new: Not enough space
So it only allocates 110 gigabytes before failing. That is larger than physical RAM, but much smaller than the address space that should be available.
It is definitely not trying to actually write to the allocated memory (that would require the allocation of physical memory); I tried explicitly doing that with memset immediately after allocation, and the program ran much slower, as expected.
So where is the limit on allocated virtual memory coming from?

Related

Linux Kernel Driver - physical CPU memory not updated. DMA problem

I'm using an Orange Pi3 LTS with an Allwinner H6 ARM CPU. I'm writing a UART driver with DMA for Rx and Tx. I allocated physical RAM using kmalloc() and got both the physical and the logical address for the allocated memory, so I know the physical address as seen by the processor and the corresponding logical address in kernel driver space.
The problem is that the physical memory is not updated after I write through the logical address. For example, my driver has an init() callback that runs when the driver is attached to the kernel and an exit() callback that runs when it is detached. In init() I allocate the memory with kmalloc(), fill it with some data through the logical address (since the kernel cannot access physical memory directly), and then trigger one of the DMA channels by writing to the CPU registers. The DMA engine should take the descriptor (a pointer) from physical RAM and transmit the data over the UART. But it seems the physical memory is not updated within that same init() call; only the logical view is, because the CPU registers end up with wrong data. However, if I only fill the descriptor data in init() and trigger the DMA later, for example from the exit() callback, it works: the physical RAM contains the correct data and it is sent over the UART as expected.
I don't understand this situation. Why, within a single kernel driver callback (e.g. init()), is the physical memory not updated while the logical memory space is? Why does the kernel not propagate the write to physical memory (through the MMU) immediately, but only after the init() callback returns?
As I wrote in the problem description, I studied the Linux DMA API documentation and finally found a solution.
As was noted in a comment, the problem was cache coherency.
Instead of using kmalloc() to allocate RAM for DMA, you should use dma_alloc_coherent(), which returns a logical (kernel) address and, through an output argument, the physical address of a non-cached mapping.
Here is my example/test code, which works for me: physical memory is now updated immediately together with the logical view inside kernel space. It allocates 1024 bytes of RAM.
static struct device *dev;            /* must point at the UART's platform device */
static dma_addr_t physical_address;
static unsigned int *logical_address;

static void ptr_init(void)
{
    unsigned long long dma_mask = DMA_BIT_MASK(32);

    dev->dma_mask = &dma_mask;
    if (dma_set_mask_and_coherent(dev, dma_mask) != 0)
        printk("Mask not OK\n");
    else
        printk("Mask OK\n");

    /* Coherent (non-cached) buffer: CPU writes are visible to the DMA
       engine immediately, no cache maintenance needed. */
    logical_address = (unsigned int *)dma_alloc_coherent(dev, 1024, &physical_address, GFP_KERNEL);
    if (logical_address != NULL)
        printk("allocation OK\n");
    else
        printk("allocation NOT OK\n");

    printk("logical address: %p\n", logical_address);
    printk("physical address: %pad\n", &physical_address);
}
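For completeness, a matching cleanup sketch (an assumption, not part of the tested code above): the coherent buffer should be released with dma_free_coherent() when the driver is removed.
static void ptr_exit(void)
{
    /* Free the coherent buffer allocated in ptr_init(). */
    if (logical_address)
        dma_free_coherent(dev, 1024, logical_address, physical_address);
}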

A heap manager for C/Pascal that automatically fills freed memory with zero bytes

What do you think about an option to fill freed (not actually used) pages with zero bytes? This may improve performance under Windows, and also under VMware and other virtual machine environments. For example, VMware and Hyper-V calculate a hash of memory pages and, if the contents are the same, mark the page as "shared" inside a virtual machine and between virtual machines on the same host, until the page is modified. This effectively decreases memory consumption. Windows does the same: it handles zero pages differently, treating them as free.
We could have a heap manager that automatically fills memory with zeros when we call FreeMem/ReallocMem. As an alternative, we could have a function that zeroizes empty memory on demand, i.e. only when it is explicitly called. Of course, this function has to be thread-safe.
The drawback of filling memory with zeros is touching memory that might already have been paged out, thus causing page faults. Besides that, memory store operations are slow, so our program will be slower, albeit to an unknown extent (maybe negligibly).
If we manage to fill 4 KB pages completely with zeros, the hypervisor or Windows will mark them as zero pages. But even partial zeroing may be beneficial, since the hypervisor may compress pages using LZ or similar algorithms to save physical memory.
I just want to know your opinion on whether the benefits of having the heap manager itself fill freed heap memory with zero bytes outweigh the disadvantages of such a technique.
Is zeroizing worth its price when what we buy is reduced physical memory consumption?
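For concreteness, a minimal C sketch of the kind of zero-on-free wrapper being proposed (the names zmalloc/zfree and the size header are made up for illustration; a real heap manager already knows each block's size):
#include <stdlib.h>
#include <string.h>

/* Allocate with a small header that remembers the block size. */
void *zmalloc(size_t size)
{
    size_t *p = malloc(sizeof(size_t) + size);
    if (!p)
        return NULL;
    *p = size;
    return p + 1;
}

/* Zero the contents before returning the block to the heap
   (this is exactly the extra cost discussed above). */
void zfree(void *ptr)
{
    if (!ptr)
        return;
    size_t *p = (size_t *)ptr - 1;
    memset(ptr, 0, *p);
    free(p);
}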
When you have a page whose contents you no longer care about but you still want to keep it allocated, you can call VirtualAlloc (and variants) and pass the MEM_RESET flag.
From VirtualAlloc on MSDN:
MEM_RESET
Indicates that data in the memory range specified by lpAddress and dwSize is no longer of interest. The pages should not be read from or written to the paging file. However, the memory block will be used again later, so it should not be decommitted. This value cannot be used with any other value.
Using this value does not guarantee that the range operated on with MEM_RESET will contain zeros. If you want the range to contain zeros, decommit the memory and then recommit it.
This gives the best of both worlds - you don't have the cost of zeroing the memory, and the system does not have the cost of paging it back in. You get to take advantage of the well-tuned memory manager which already has a zero-pool.
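For example, a minimal sketch of the pattern (the buffer size and usage here are illustrative):
#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    SIZE_T size = 64 * 1024 * 1024;   /* 64 MB, arbitrary */
    char *buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!buf)
        return 1;

    memset(buf, 0xAB, size);          /* pretend the buffer was used */

    /* Contents are now disposable, but keep the region committed.
       The pages will not be written to the paging file; they may or
       may not read back as zero later. */
    if (!VirtualAlloc(buf, size, MEM_RESET, PAGE_READWRITE))
        fprintf(stderr, "MEM_RESET failed: %lu\n", GetLastError());

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}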
Similar functionality also exists on Linux via the MADV_FREE (or, for POSIX, MADV_DONTNEED) flag to madvise. glibc uses this call in the implementation of its heap:
/*
 * Stack:
 *   int shrink_heap (heap_info *h, long diff)
 *   int heap_trim (heap_info *heap, size_t pad)             at arena.c:660
 *   void _int_free (mstate av, mchunkptr p, int have_lock)  at malloc.c:4097
 *   void __libc_free (void *mem)                            at malloc.c:2948
 *   void free (void *mem)
 */
static int
shrink_heap (heap_info *h, long diff)
{
  long new_size;

  new_size = (long) h->size - diff;
  /* ... snip ... */
  __madvise ((char *) h + new_size, diff, MADV_DONTNEED);
  /* ... snip ... */
  h->size = new_size;
  return 0;
}
If your heap is in user space this will never work. The kernel can only trust itself, not user space. If the kernel zeros a page, it can treat it as zero. If user space says it zeroed a page, the kernel would still have to check that, so it might just as well zero it itself. One thing user space can do is discard pages, which marks them as "don't care"; the kernel can then treat them as zero. But manually zeroing pages in user space is futile.
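A minimal sketch of that discard-from-user-space approach on Linux (the mapping size is arbitrary):
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and madvise */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t size = 16 * 1024 * 1024;   /* 16 MB anonymous mapping */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    memset(p, 0xAB, size);            /* dirty the pages */

    /* Tell the kernel we no longer care about the contents; the physical
       pages can be reclaimed without being written anywhere. */
    if (madvise(p, size, MADV_DONTNEED) != 0)
        perror("madvise");

    printf("first byte after discard: %d\n", p[0]);  /* reads back as 0 */

    munmap(p, size);
    return 0;
}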

Why do we need external sort?

The main reason for external sort is that the data may be larger than the main memory we have. However, we have virtual memory now, and virtual memory takes care of swapping between main memory and disk. Why do we need external sort, then?
An external sort algorithm makes sorting large amounts of data efficient (even when the data does not fit into physical RAM).
While using an in-memory sorting algorithm and virtual memory satisfies the functional requirements for an external sort (that is, it will sort the data), it fails to achieve the non-functional requirement of being efficient. A good external sort minimises the amount of data read and written to external storage (and historically also seek times), and a general-purpose virtual memory implementation on top of a sort algorithm not designed for this will not be competitive with an algorithm designed to minimise IO.
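For concreteness, here is a small C sketch (an illustration, not taken from the answer above) of the classic two-phase structure: sort memory-sized chunks into run files, then merge runs with purely sequential reads and writes. The run size, file names, and two-run merge are simplifications; a real external sort merges many runs per pass.
#include <stdio.h>
#include <stdlib.h>

#define RUN_INTS (1 << 20)   /* ints that fit in memory per run (assumption) */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Phase 1: read chunks that fit in RAM, sort them, write one sorted run file each. */
static int make_runs(const char *input)
{
    FILE *in = fopen(input, "rb");
    int *buf = malloc(RUN_INTS * sizeof(int));
    int run = 0;
    size_t n;
    if (!in || !buf)
        return -1;
    while ((n = fread(buf, sizeof(int), RUN_INTS, in)) > 0) {
        char name[32];
        qsort(buf, n, sizeof(int), cmp_int);
        snprintf(name, sizeof name, "run%d.bin", run++);
        FILE *out = fopen(name, "wb");
        fwrite(buf, sizeof(int), n, out);
        fclose(out);
    }
    free(buf);
    fclose(in);
    return run;
}

/* Phase 2: merge two sorted run files into one, reading each sequentially. */
static void merge_runs(const char *a, const char *b, const char *out_name)
{
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb"), *out = fopen(out_name, "wb");
    int x, y;
    int have_x = fread(&x, sizeof x, 1, fa) == 1;
    int have_y = fread(&y, sizeof y, 1, fb) == 1;
    while (have_x || have_y) {
        if (have_x && (!have_y || x <= y)) {
            fwrite(&x, sizeof x, 1, out);
            have_x = fread(&x, sizeof x, 1, fa) == 1;
        } else {
            fwrite(&y, sizeof y, 1, out);
            have_y = fread(&y, sizeof y, 1, fb) == 1;
        }
    }
    fclose(fa); fclose(fb); fclose(out);
}

int main(void)
{
    int runs = make_runs("input.bin");      /* hypothetical file of raw ints */
    if (runs >= 2)
        merge_runs("run0.bin", "run1.bin", "merged.bin");
    return 0;
}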
In addition to @Anonymous's answer that an external sort is better optimized for less disk IO, sometimes using an in-memory sort with virtual memory is simply infeasible, because the virtual address space is smaller than the file.
For example, if you have a 32-bit system (there are still a lot of these) and you want to sort a 20 GB file, a 32-bit system gives you 2^32 ≈ 4 GB of virtual address space, so the file you are trying to sort cannot fit in it.
This used to be a real issue when 64-bit systems were not yet common, and it is still an issue today for old 32-bit systems and some embedded devices.
However, even on a 64-bit system, as explained in the previous answers, an external sort algorithm is more optimized for the nature of sorting and will require significantly less disk IO than letting the OS "take care of things".
I'm using Windows. In a command-line shell you can run "systeminfo", which gives my laptop's memory usage information:
Total Physical Memory: 8,082 MB
Available Physical Memory: 2,536 MB
Virtual Memory: Max Size: 11,410 MB
Virtual Memory: Available: 2,686 MB
Virtual Memory: In Use: 8,724 MB
I just wrote an app to test the maximum array size I could initialize on my laptop.
public static void BurnMemory()
{
    for (var i = 1; i <= 1024; i++)
    {
        long size = 1L << i;             // use a long shift so i >= 31 does not overflow
        long gb = 4 * size / (1L << 30); // size of the array in GB
        try
        {
            // one int32 takes 32 bits (4 bytes) of memory
            var arr = new int[size];
            Console.WriteLine("Test passed: initialized an array with size = 2^" + i.ToString());
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Reached memory limitation when initializing an array with size = 2^{0} int32 = 4 x {1} B = {2} GB", i, size, gb);
            break;
        }
    }
}
It terminates when trying to initialize an array of size 2^29:
Reached memory limitation when initializing an array with size = 2^29 int32 = 4 x 536870912 B = 2 GB
What I get from the test:
It is not hard to reach the memory limitation.
We need to understand our server's capability, then decide whether to use in-memory sort or external sort.

Linux memory overcommit details

I am developing software for embedded Linux, and I am suffering system hangs because the OOM killer appears from time to time. Before going further, I would like to clear up some confusion about how the Linux kernel allocates dynamic memory, assuming /proc/sys/vm/overcommit_memory is 0 and /proc/sys/vm/min_free_kbytes is 712, with no swap.
Suppose the embedded Linux system currently has 5 MB of physical memory available (5 MB free, with no usable cached or buffered memory), and I write this piece of code:
.....
#define MEGABYTE (1024*1024)
.....
.....
void *ptr = NULL;

ptr = malloc(6 * MEGABYTE);   /* reserving 6 MB */
if (!ptr)
    exit(1);

memset(ptr, 1, MEGABYTE);     /* touching only the first 1 MB */
.....
I would like to know whether, when the memset call executes, the kernel will try to allocate ~6 MB or ~1 MB (or a multiple of min_free_kbytes) of physical memory.
Right now there is about 9 MB free on my embedded device, which has 32 MB of RAM. I check it by doing:
# echo 3 > /proc/sys/vm/drop_caches
# free
total used free shared buffers
Mem: 23732 14184 9548 0 220
Swap: 0 0 0
Total: 23732 14184 9548
Setting aside the last piece of C code, I would like to know whether it is possible for the OOM killer to appear when, for instance, free memory is above ~6 MB.
I want to know whether the system is really out of memory when the OOM killer appears, so I think I have two options:
Check the VmRSS entries in /proc/<pid>/status of suspicious processes.
Set /proc/sys/vm/overcommit_memory = 2 and /proc/sys/vm/overcommit_ratio = 75 and see whether any process requires more than the physical memory available.
I think you can read this document. It provides three small C programs that you can use to understand what happens with the different possible values of /proc/sys/vm/overcommit_memory.
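For example, a minimal sketch (not one of the three programs from that document) that allocates more than the free memory assumed above and then touches the allocation one page at a time, so you can see when physical memory is actually consumed and when the OOM killer strikes:
#include <stdio.h>
#include <stdlib.h>

#define MEGABYTE (1024 * 1024)
#define PAGE_SIZE 4096   /* assumption; a real program would use sysconf(_SC_PAGESIZE) */

int main(void)
{
    size_t total = 6 * MEGABYTE;      /* more than the 5 MB assumed free above */
    char *p = malloc(total);          /* with overcommit_memory=0 this usually succeeds */
    if (!p) {
        perror("malloc");
        return 1;
    }
    /* Physical pages are only allocated as each page is first written. */
    for (size_t off = 0; off < total; off += PAGE_SIZE) {
        p[off] = 1;
        if (off % MEGABYTE == 0)
            printf("touched %zu MB\n", off / MEGABYTE);
    }
    free(p);
    return 0;
}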

Hello world - what is a simple program to use 16GB of memory?

How can I allocate a large amount of memory on a machine with 16 GB of RAM? Please provide a simple C/C++ program as an example.
E.g.
#include <stdlib.h>

int main()
{
    // (10 gigabytes) / (4 bytes) = 2 684 354 560
    int *hugearray = malloc( 2684354560 * sizeof(int) );
}
...obviously that doesn't work.
malloc() does allocate the memory, but most OSes will only give you virtual address space until you actually read or write within that memory, at which point they start allocating backing physical or swap memory. You simply need to loop over the allocation, writing some garbage values into it.
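A minimal sketch of such a loop, assuming a 64-bit build and a 4 KB page size:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = 16ULL * 1024 * 1024 * 1024;   /* 16 GB of address space */
    size_t page = 4096;                          /* assumed page size */
    char *p = malloc(size);
    if (!p) {
        perror("malloc");
        return 1;
    }
    /* Touch one byte per page so physical (or swap) memory really gets used. */
    for (size_t off = 0; off < size; off += page)
        p[off] = (char)off;
    puts("done");
    free(p);
    return 0;
}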
The sample program works fine if you change the declaration from int to long.
I'm running Mint Linux on a 64-bit Intel-esque CPU with 16GB of memory.
