A quick question about 64-bit pointer values and memory boundaries - winapi

I'm learning how to work with process memory (reading and writing it), so I have to deal with pointer values and pointer math. For example, some process has a base address of 0x00007FF786F20000, and its import address table holds even higher values at runtime - one imported Kernel32.dll function sits at 0x00007FF832C2E060.
But the laptop I work on has only 8 GB of RAM and a pagefile of the same size, so that value seems far higher than anything in this system - taken at face value it would correspond to well over a hundred terabytes.
So why is it so high? Is some arbitrary value added to it? How do I correctly interpret its position in memory? I tried googling this, but I'm not sure how to word the search, so I decided to ask here.
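To make it concrete, here is a minimal sketch of how I print these values from inside a process (plain WinAPI calls; CreateFileW is just an example import I picked):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Base address the executable was loaded at - the 0x00007FF7... value. */
    HMODULE exeBase = GetModuleHandleW(NULL);

    /* Base of kernel32.dll and one of its exported functions - these are the
       higher 0x00007FF8... values that show up in the import address table. */
    HMODULE k32 = GetModuleHandleW(L"kernel32.dll");
    FARPROC fn  = k32 ? GetProcAddress(k32, "CreateFileW") : NULL;

    /* All of these are virtual addresses private to this process; they say
       nothing about how much physical RAM the machine actually has. */
    printf("exe base      : %p\n", (void *)exeBase);
    printf("kernel32 base : %p\n", (void *)k32);
    printf("CreateFileW   : %p\n", (void *)fn);
    return 0;
}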

Related

Restricting virtual address range of a process?

On Linux x86_64, I have a simple application for which I track all memory accesses with Intel's PIN. The program uses only "a bit" of memory, most of it for dynamically allocated matrices (I've bisected the right value with ulimit). However, the memory accesses span the whole range of the VM address space, low addresses for what I presume global variables in the code, high addresses for the malloc()ed arrays.
There's a huge gap in the middle, and even within the high addresses the accesses range from 0x7fea3af99000 up to 0x7fff4e1a37f4, which is much larger than what I would assume my application uses in total.
The post-processing that I need to do on the memory accesses deals very badly with these sparse accesses, so I'm looking for a way to restrict the virtual address range available to the process so that "it just fits", and accesses will show addresses between 0 and some more reasonable value for dynamically allocated memory (somewhere around the 40 MB that I've discovered through ulimit).
Q: Is there an easy way to limit the available address space (and hence implicitly, available memory) to an individual process on Linux, ideally from the command line on a per-process basis?
Further notes:
I can link my application statically.
Even if I limit the memory with ulimit, the process still uses the full VM address range (not entirely unexpected).
I know about /proc/${pid}/maps, but I would like to avoid writing wrappers around it, and it's not obvious how I would actually use the data in there.
I've heard about prelink (which may not apply to my static binary, only to libraries?) and can imagine that there are more intrusive ways to interfere with malloc(), but these solutions are too far out of my expertise to evaluate their usefulness (Limiting the heap area's Virtual address range, https://stackoverflow.com/a/29960208/60462).
If there's no simple command-line solution, then instead of going for an elaborate hack I'll probably just wing it in the post-processing and "normalize" the addresses, e.g. via a few lines of Perl (roughly the mapping sketched below).
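Roughly the kind of mapping I have in mind, sketched in C here rather than Perl; the two region bases are made-up examples, in practice they would come from /proc/<pid>/maps or from the trace itself:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* Hypothetical region bases: one for the low (text/data) range and one
       for the high (mmap/stack) range seen in the trace. No bounds checking,
       this is only a sketch. */
    const uint64_t low_base  = 0x000000400000ull;
    const uint64_t high_base = 0x7fea3af99000ull;

    uint64_t addr;
    /* Read one hex address per line and print a compact region:offset pair. */
    while (scanf("%" SCNx64, &addr) == 1) {
        if (addr >= high_base)
            printf("high:%" PRIu64 "\n", addr - high_base);
        else
            printf("low:%" PRIu64 "\n", addr - low_base);
    }
    return 0;
}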
ld.so(8) lists LD_PREFER_MAP_32BIT_EXEC as a way to restrict the address space to the lower 2 GiB, which is significantly less than the normal 64-bit address space, but possibly still not small enough for your purposes.
It may also be possible to use the prctl(2) PR_SET_MM options to control the addresses of different parts of the program to solve your problem.
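If neither of those fits, another knob worth trying is the personality(2) address-space flags (the same mechanism the setarch utility uses). Whether ADDR_LIMIT_32BIT behaves usefully for your particular binary is something you would have to verify, so treat this launcher as a sketch:

#include <sys/personality.h>
#include <unistd.h>
#include <stdio.h>

/* Tiny launcher: set address-space personality flags, then exec the target.
   ADDR_LIMIT_32BIT asks the kernel to keep mappings below 4 GiB, and
   ADDR_NO_RANDOMIZE disables ASLR so repeated runs give comparable addresses. */
int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    if (personality(ADDR_LIMIT_32BIT | ADDR_NO_RANDOMIZE) == -1) {
        perror("personality");
        return 1;
    }
    execvp(argv[1], &argv[1]);
    perror("execvp");
    return 1;
}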

What is the cost of mmaping on Mac OS X?

I have an algorithm where my life would be greatly simplified if I could reserve about 20 blocks of address space, 4 GB each. In practice I never use more than 4 GB, but I do not know in advance which block will fill up.
If I mmap 20 blocks of 4 GB, everything seems to work fine -- the OS does not seem to actually allocate anything until I write to the memory.
Is there any reason I should not use mmap to allocate 80 GB of memory and then only use a small amount of it? I assume there is some cost to setting up these buffers. Can I measure it?
The only drawback of mmap-ing 80GB at once is that a page table has to be created for the full 80GB. So if the pages are 4kB, this table could consume a lot of memory (unless huge pages are used).
For sizes like that it is probably better to use one or more sliding mmap-ed views (i.e. create and remove them when needed).
On Windows, memory usage for mmap/page tables can be checked with RamMap, not sure about Mac.
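For reference, a minimal sketch of the reserve-then-touch pattern the question describes (anonymous private mappings; the 20-block count and the 8 MiB write are just placeholders):

#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

#define NBLOCKS 20
#define BLOCK_SIZE (4ULL * 1024 * 1024 * 1024)   /* 4 GiB per block */

int main(void)
{
    void *blocks[NBLOCKS];

    /* Reserve 20 x 4 GiB of address space. With MAP_ANON the pages are
       demand-zero: nothing is physically backed until a page is touched. */
    for (int i = 0; i < NBLOCKS; i++) {
        blocks[i] = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANON, -1, 0);
        if (blocks[i] == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
    }

    /* Touch only a few megabytes of one block: only those pages get backed. */
    memset(blocks[3], 0xAB, 8 * 1024 * 1024);

    printf("reserved %d x 4 GiB, wrote 8 MiB into block 3 at %p\n",
           NBLOCKS, blocks[3]);
    return 0;
}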

How does the FreePascal memory manager work when enlarging arrays

I recently stumbled over an issue in a FreePascal project I'm developing: the application requires a look-up array which may become very large at runtime (a few million entries). Each array element is about 8 bytes in size.
I observed the following behavior in my application: if the array is already quite large (~130 MB), enlarging it again results in a peak in memory consumption and maybe also in a volatile rise of used RAM.
As far as I have read, the peak can be explained by the internal behavior of SetLength(), which allocates a block the size of the new array and then copies the old array to its new destination in memory.
But while examining the sudden increase in used memory, it appeared that there were situations where the "old" memory was not freed, resulting in doubled RAM usage.
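Roughly, the pattern I mean looks like this (sketched in C just for illustration; I assume SetLength does the moral equivalent internally, and what happens to the old block afterwards is up to the memory manager):

#include <stdlib.h>
#include <string.h>

/* Hypothetical grow-by-copy with 8-byte entries, like my look-up array. */
typedef struct { double value; } Entry;

static Entry *grow(Entry *old, size_t old_count, size_t new_count)
{
    Entry *bigger = malloc(new_count * sizeof *bigger);    /* new block       */
    if (!bigger)
        return NULL;
    memcpy(bigger, old, old_count * sizeof *old);          /* old + new alive */
    free(old);                                             /* peak ends here  */
    return bigger;
}

During the memcpy both blocks exist at once, so the footprint briefly is old size plus new size; whether the freed old block is handed back to the OS or kept around for reuse is the memory manager's decision.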
I was able to reproduce this behavior more clearly by increasing the steps in which the array was enlarged.
To get rid of this problem I changed the memory manager to CMem, and the issue was gone.
Unfortunately I did not find a clear description of the Free Pascal memory manager, and I can only guess that the space of the "old" (smaller) array is not reused because the built-in memory manager tries to keep the heap unfragmented at all times, but I could not prove that.
Does anyone have a source that describes the basic functionality of the Free Pascal memory manager and the C memory manager, and/or the differences between the two?
Thank you very much, kind regards
Alex

Need complete picture of virtual address space

This image gives a good picture of the virtual address space. But it only tells half of the story: it only gives a complete picture of the user address space, i.e. the lower 50% (or 75% in some cases).
What about the remaining 50% (or 25%) that is occupied by the kernel? I know the kernel also contains many different things - kernel modules, device drivers, the core kernel itself. There must be some kind of layout, right?
What is its layout? If you say it's operating-system dependent, I would say there are two major operating systems, Windows and Linux - please give an answer for either one of them.
[Image: virtual address space diagram - http://img690.imageshack.us/img690/2543/virtualadressspace.gif]
I've got even worse news for you: there is also a feature that explicitly randomizes the kernel address layout as a security measure. It is on by default in most recent Windows releases and in OpenBSD, and it is available as an option for Linux.
As others have said here, your picture is incomplete. It looks like something specific to a single-threaded OS; in particular, there may be hundreds of threads within a process (all sharing the same address space), each with its own stack.
Also, I believe the actual picture of the address space may vary considerably depending on the OS version and other subtle details.
It's not completely clear from your question or the image, but by 'System Address Space' you probably mean the area between 2 GB and 4 GB. This indeed takes up half of the theoretical 4 GB space, but there is a valid reason for it.
Normally with 32 bits you can address 4 GB of memory (2^32 = 4294967296), so it would seem logical to have 4 GB of address space, not 2 GB. The reason for this is the following:
Suppose you have 2 pointers, like this in C/C++:
char *ptr1;
char *ptr2;
I now want to know what the difference is between the two pointers, like this:
offset = ptr2 - ptr1;
What should be the data type of 'offset'?
If we don't know whether ptr1 comes before ptr2 or vice versa, the offset can be positive or negative. Now if both ptr1 and ptr2 lie in the range 0 - 2 GB, the offset is always between -2147483648 and +2147483647, which fits exactly in a 4-byte signed integer.
However, if ptr1 and ptr2 could access the full 4 GB address space, the offset would range between -4294967295 and +4294967295, which doesn't fit in a 4-byte signed integer anymore.
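To see the arithmetic concretely, here is a standalone check (the two constants are just the worst-case distances for a 2 GB and a 4 GB range, nothing application-specific):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
    /* On a 32-bit build a pointer difference is a 32-bit signed value. */
    printf("sizeof(ptrdiff_t) = %zu bytes\n", sizeof(ptrdiff_t));

    /* Worst-case distances, computed in 64 bits so nothing overflows here. */
    int64_t max_2gb = 0x7FFFFFFFll;      /* 2 GB - 1 */
    int64_t max_4gb = 0xFFFFFFFFll;      /* 4 GB - 1 */

    printf("2 GB worst case %lld fits a 32-bit signed int: %s\n",
           (long long)max_2gb, max_2gb <= INT32_MAX ? "yes" : "no");
    printf("4 GB worst case %lld fits a 32-bit signed int: %s\n",
           (long long)max_4gb, max_4gb <= INT32_MAX ? "yes" : "no");
    return 0;
}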
If you are sure that you never do this kind of calculation in your application, or that two pointers you subtract will never be more than 2 GB apart (or your vectors are always smaller than 2 GB), you can tell the linker (Windows, Visual Studio) that your application is LARGEADDRESSAWARE. This linker flag sets a bit in the executable, and if a 32-bit Windows is booted accordingly (on XP you had to boot with the /3GB flag), Windows gives you 3 GB instead of 2 GB (only for LARGEADDRESSAWARE executables).
The remaining 1 GB is still used for operating system data structures (but I have no details about them).
If you are running 64-bit Windows, things get even more interesting, because LARGEADDRESSAWARE executables then get a full 4 GB of address space. Apparently the operating system data structures are now stored somewhere in the 64-bit address space, outside the 4 GB used by the application.
Hope this clarifies a bit.
Memory layout of the Windows kernel (picture taken from Reversing: Secrets of Reverse Engineering):
[Image - http://img821.imageshack.us/img821/1525/windowskernelmemorylayo.jpg]

64-bits and Memory Bandwidth

Mason asked about the advantages of a 64-bit processor.
Well, an obvious disadvantage is that you have to move more bits around. And given that memory accesses are a serious issue these days[1], moving around twice as much memory for a fair number of operations can't be a good thing.
But how bad is the effect of this, really? And what makes up for it? Or should I be running all my small apps on 32-bit machines?
I should mention that I'm considering, in particular, the case where one has a choice of running 32- or 64-bit on the same machine, so in either mode the bandwidth to main memory is the same.
[1]: And even fifteen years ago, for that matter. I remember talk as far back as that about good cache behaviour, and in particular that the Alpha CPUs that won all the benchmarks had a giant (for the time) 8 MB L2 cache.
Whether your app should be 64-bit depends a lot on what kind of computation it does. If you need to process very large data sets, you obviously need 64-bit pointers. If not, you need to know whether your app spends relatively more time doing arithmetic or memory accesses. On x86-64, the general purpose registers are not only twice as wide, there are twice as many and they are more "general purpose". This means that 64-bit code can have much better integer op performance. However, if your code doesn't need the extra register space, you'll probably see better performance by using smaller pointers and data, due to increased cache effectiveness. If your app is dominated by floating point operations, there probably isn't much point in making it 32-bit, because most of the memory accesses will be for wide vectors anyways, and having the extra SSE registers will help.
Most 64-bit programming environments use the "LP64" model, meaning that only pointers and long int variables (if you're a C/C++ programmer) are 64 bits. Plain ints remain 32 bits unless you're in the "ILP64" model, which is fairly uncommon.
I only bring it up because most int variables aren't being used for size_t-like purposes--that is, they stay within ranges comfortably held by 32 bits. For variables of that nature, you'll never be able to tell the difference.
If you're doing numerical or data-heavy work with > 4GB of data, you'll need 64 bits anyways. If you're not, you won't notice the difference, unless you're in the habit of using longs where most would use ints.
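A quick way to check which model your toolchain uses (under LP64 this prints 4 / 8 / 8; under LLP64, i.e. 64-bit Windows, long stays at 4):

#include <stdio.h>

int main(void)
{
    /* LP64: int 4, long 8, pointer 8. ILP64 would make int 8 as well. */
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}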
I think you're starting off with a bad assumption here. You say:
"moving around twice as much memory for a fair number of operations can't be a good thing"
and the first question to ask is "why not?". In a true 64-bit machine, the data path is 64 bits wide, so moving 64 bits takes exactly (to a first approximation) as many cycles as moving 32 bits takes on a 32-bit machine. So if you need to move 128 bytes, it takes half as many cycles as it would on a 32-bit machine.
