QueryWorkingSetEx returning invalid pages when applied to shared memory - windows

I'm creating a block of shared memory using CreateFileMapping and MapViewOfFile, which gives me a pointer. I then pass that pointer to QueryWorkingSetEx, but I keep getting invalid pages in the returned PSAPI_WORKING_SET_EX_INFORMATION structures. I'm on a NUMA architecture, but the same thing happens on other non-NUMA machines.
If I try the exact same procedure on memory allocated with malloc, I get valid results. Is it possible that QueryWorkingSetEx does not support shared memory pointers?

After talking with Microsoft support I was given the explanation: since QueryWorkingSetEx is being called immediately after MapViewOfFile, the memory has not yet been touched, so the pages are not yet backed by any physical memory.
The solution is simply to do a read loop over the mapped region before QueryWorkingSetEx is invoked; this forces the memory manager to back the pages with physical memory.
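A minimal sketch of that fix (the 64 KB mapping and the 4 KB page size are assumptions made for the example, not values from the original post):

#include <windows.h>
#include <psapi.h>     /* link with psapi.lib */

#define NPAGES 16      /* 16 pages of 4 KB = 64 KB, assumed page size */

int main(void)
{
    HANDLE hMap = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                    0, NPAGES * 4096, NULL);
    volatile char *p = (volatile char *)MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS,
                                                      0, 0, NPAGES * 4096);

    /* read loop: touch every page so the memory manager backs it with physical memory */
    for (int i = 0; i < NPAGES; i++)
        (void)p[i * 4096];

    PSAPI_WORKING_SET_EX_INFORMATION info[NPAGES] = {0};
    for (int i = 0; i < NPAGES; i++)
        info[i].VirtualAddress = (PVOID)(p + i * 4096);

    /* with the pages touched, info[i].VirtualAttributes.Valid should now be set */
    QueryWorkingSetEx(GetCurrentProcess(), info, sizeof(info));
    return 0;
}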

Related

map a buffer from Kernel to User space allocated by another module

I am developing a Linux kernel driver on 3.4. The purpose of this driver is to provide an mmap interface to userspace for a buffer allocated in another kernel module, most likely with kzalloc() (more details below). The pointer returned by mmap must point to the first address of this buffer.
I get the physical address from virt_to_phys(). I pass this address, right shifted by PAGE_SHIFT, to remap_pfn_range() in my mmap fops call.
It is working for now, but it looks to me like I am not doing things properly, because nothing ensures that my buffer starts at the beginning of a page (correct me if I am wrong). Maybe mmap()ing is not the right solution? I have already read chapter 15 of LDD3, but maybe I am missing something?
Details:
The buffer is in fact a shared memory region allocated by the remoteproc module. This region is used within an asymmetric multiprocessing design (OMAP4). I can get hold of this buffer thanks to the rproc_da_to_va() call. That is why there is no way to use something like get_free_pages().
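For reference, a minimal sketch of the mmap handler described above (shared_buf and buf_size are placeholders; in the real driver they would hold the pointer and size obtained from rproc_da_to_va()):

#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/io.h>            /* virt_to_phys() */

static void *shared_buf;       /* placeholder: set from rproc_da_to_va() */
static size_t buf_size;        /* placeholder: size of that buffer */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long len = vma->vm_end - vma->vm_start;
    unsigned long pfn = virt_to_phys(shared_buf) >> PAGE_SHIFT;

    if (len > PAGE_ALIGN(buf_size))
        return -EINVAL;

    /* map the buffer's physical pages into the calling process */
    return remap_pfn_range(vma, vma->vm_start, pfn, len, vma->vm_page_prot);
}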
Regards
Kev
Yes, you're correct: there is no guarantee that the allocated memory is at the beginning of a page. And no simple way for you to both guarantee that and to make it truly shared memory.
Obviously you could (a) copy the data from the kzalloc'd address to a newly allocated page and insert that into the mmap'ing process's virtual address space, but then it's no longer shared with the original data created by the other kernel module.
You could also (b) map the actual page being allocated by the other module into the process' memory map but it's not guaranteed to be on a page boundary and you'd also be sharing whatever other kernel data happened to reside in that page (which is both a security issue and a potential source of kernel data corruption by the user-space process into which you're sharing the page).
I suppose you could (c) modify the memory manager to return every piece of allocated data at the beginning of a page. This would work, but then every time a driver wants to allocate 12 bytes for some small structure, it will in fact be allocating 4K bytes (or whatever your page size is). That's going to waste a huge amount of memory.
There is simply no way to trick the processor into making the memory appear to be at two different offsets within a page. It's not physically possible.
Your best bet is probably to (d) modify the other driver to allocate the specific bits of data that you want to make shared in a way that ensures alignment on a page boundary (i.e. something you write to replace kzalloc).
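As a rough illustration of option (d), a hypothetical replacement allocator in the other driver could hand out zeroed, page-aligned blocks, for example:

#include <linux/gfp.h>
#include <linux/mm.h>

/* hypothetical replacement for kzalloc in the other driver: returns zeroed
   memory that always starts on a page boundary, so it can later be exported
   with remap_pfn_range() */
static void *alloc_shared_buffer(size_t size)
{
    return (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(size));
}

static void free_shared_buffer(void *buf, size_t size)
{
    free_pages((unsigned long)buf, get_order(size));
}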

Requesting memory more than available

Suppose there are only 50 MB of memory available to allocate, and the following for loop allocates memory in each iteration. What happens when it runs?
for (int i = 0; i < 20; i++)
{
    int *p = malloc(5 * 1024 * 1024);   /* 5 MB per iteration, never freed */
}
I was asked this question in an interview. Can someone guide me on this and direct me to the necessary topics which I must learn in order to understand such situations.
If this is a system using virtual memory, then the situation is more complex than malloc simply failing and returning null pointers. In that case the malloc call results in the allocation of pages in the virtual address space. When this memory is first accessed, a page fault occurs, control passes to the OS memory manager, and it maps the virtual pages to pages of physical memory. Once physical memory is full, the memory manager generally handles further page faults by writing data that is currently in physical memory out to disk backing (or simply discarding it if it is already backed by a file on disk), and then remapping the freed physical memory to the virtual page that caused the fault. Any later access to virtual pages that were previously written out to disk backing triggers a similar process.
Wikipedia has a reasonably basic overview of this process (http://en.wikipedia.org/wiki/Paging), including some implementation details for different OSes. More detailed information is available from a number of other sources, e.g. the Intel architecture software developer manuals (http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html).
According to the documentation for malloc (e.g. http://www.cplusplus.com/reference/cstdlib/malloc/), and noting that the argument to malloc should be a number of bytes, not megabytes, you will get a pointer to a 5 MB block the first 10 times (maybe only 9 times if there is overhead associated with the malloc'ed space that is taken out of the 50 MB available). After that, further 5 MB chunks of memory will not be available, malloc will fail, and a null pointer will be returned.
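A minimal way to observe this (the 5 MB size is spelled out in bytes; each block is also touched because, on a demand-paged system, physical memory is only committed when the pages are actually accessed):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    for (int i = 0; i < 20; i++) {
        char *p = malloc(5 * 1024 * 1024);          /* 5 MB per iteration */
        if (p == NULL) {                            /* out of memory (or address space) */
            printf("malloc failed at iteration %d\n", i);
            break;
        }
        memset(p, 1, 5 * 1024 * 1024);              /* touch the pages so they are really committed */
    }
    return 0;
}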

How to pin a shared memory segment into physical memory

I use boost::interprocess::managed_shared_memory to load a data structure in shared memory. I need to pin the shared memory segment into physical memory (for example similar to system call mlock for mapped files).
In Linux, sooner or later my data structure gets swapped out of physical memory. In my case this incurs a prohibitive cost for the next process that accesses the structure after it has been swapped out.
Is there any way to pin shared memory into physical memory? I am interested in any solution, even if it means that I cannot use boost::interprocess.
Using basic_managed_xsi_shared_memory (apparently available since Boost 1.46), you can obtain the underlying shmid (from the get_shmid member), which allows you to control the segment using shmctl. With shmctl you can prevent the shared memory pages from being swapped by applying the SHM_LOCK command to that shmid.
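A sketch of that call (the shmid parameter stands for whatever get_shmid() returned):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

/* shmid comes from basic_managed_xsi_shared_memory::get_shmid() */
static int lock_segment(int shmid)
{
    if (shmctl(shmid, SHM_LOCK, NULL) == -1) {
        perror("shmctl(SHM_LOCK)");   /* may need CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK */
        return -1;
    }
    return 0;
}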
Other kinds of locking (which you refer to as 'pinning'), such as locking memory-mapped files into memory, can be achieved by passing the values obtained from mapped_region's get_address and get_size member functions to the mlock system call.
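For example (addr and len standing in for what get_address() and get_size() return):

#include <sys/mman.h>
#include <stdio.h>

/* addr and len would come from mapped_region::get_address() / get_size() */
static int pin_region(void *addr, size_t len)
{
    if (mlock(addr, len) == -1) {
        perror("mlock");              /* subject to the same RLIMIT_MEMLOCK limit */
        return -1;
    }
    return 0;
}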

Is there any way to convert unshared heap memory to shared memory? Primary target *nix

I was wondering whether there is any reasonably portable way to take existing, unshared heap memory and convert it into shared memory. The use case is a block of memory which is too large for me to want to copy unnecessarily (i.e. into a new shared memory segment) and which is allocated by routines outside my control. The primary target is *nix/POSIX, but I would also be interested to know if it can be done on Windows.
Many *nixes have Plan 9's procfs, which allows reading a process's memory by opening /proc/{pid}/mem:
http://en.wikipedia.org/wiki/Procfs
You tell the other process your pid, the size, and the base address, and it can simply read the data (or mmap the region into its own address space).
EDIT: Apparently you can't open /proc/{pid}/mem without a prior ptrace(), so this is basically worthless.
Under most *nixes, ptrace(2) allows you to attach to a process and read its memory.
The ptrace method does not work on OS X; you need more magic there:
http://www.voidzone.org/readprocessmemory-and-writeprocessmemory-on-os-x/
What you need under Windows is the function ReadProcessMemory.
Googling "What is ReadProcessMemory for $OSNAME" seems to return comprehensive result sets.
You can try using mmap() with the MAP_FIXED flag and the address of your unshared memory (allocated from the heap). However, if you provide mmap() with your own pointer, it must be aligned and sized according to the memory page size (which can be queried with sysconf()), since mapping is only possible on whole pages.
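A rough sketch of that suggestion (addr and len are placeholders for a page-aligned heap pointer and a page-multiple length; note that MAP_FIXED replaces whatever mapping is already at addr, so the existing heap contents there are lost rather than converted in place):

#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static void *make_shared(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    /* mapping only works on whole pages, so the pointer and length must line up */
    if ((uintptr_t)addr % page != 0 || len % page != 0)
        return NULL;

    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        perror("mmap");
    return p;
}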

What is the purpose of allocating pages in the pagefile with CreateFileMapping?

The function CreateFileMapping can be used to allocate space in the pagefile (if the first argument is INVALID_HANDLE_VALUE). The allocated space can later be memory mapped into the process virtual address space.
Why would I want to do this instead of using just VirtualAlloc?
It seems that both functions do almost the same thing. Memory allocated by VirtualAlloc may at some point be pushed out to the pagefile. Why should I need an API that specifically requests that my pages be allocated there in the first instance? Why should I care where my private pages live?
Is it just a hint to the OS about my expected memory usage patterns? (I.e., the former is a hint to swap those pages out more aggressively.)
Or is it simply a convenience when working with very large datasets in 32-bit processes? (I.e., I can use CreateFileMapping to make >4 GB allocations, then memory map smaller chunks of the space as needed. Using the pagefile saves me the work of manually managing my own set of files to "swap" to.)
PS. This question is sparked by an article I read recently: http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
From the CreateFileMapping documentation:
A single file mapping object can be shared by multiple processes.
Can memory allocated with VirtualAlloc be shared across multiple processes?
One reason is to share memory among different processes. Different processes can communicate over the page file simply by knowing the name of the mapping object. This is preferable to creating a real file and doing the communication through it. Of course there may be other use cases. You can refer to Using a File Mapping for IPC on MSDN for more information.
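A minimal sketch of that kind of named, pagefile-backed mapping (the name Local\MySharedBlock and the 4096-byte size are made up for the example; error checking omitted):

#include <windows.h>
#include <string.h>

/* Process A: create a named, pagefile-backed section and write into it. */
void producer(void)
{
    HANDLE hMap = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                     0, 4096, "Local\\MySharedBlock");
    char *view = (char *)MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 4096);
    strcpy(view, "hello from process A");
}

/* Process B: open the same section by name and read what A wrote. */
void consumer(void)
{
    HANDLE hMap = OpenFileMappingA(FILE_MAP_READ, FALSE, "Local\\MySharedBlock");
    const char *view = (const char *)MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 4096);
    /* view now refers to the same pagefile-backed pages process A wrote to */
}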
