What will happen if the highest bit of a pointer is used as a sign bit? Can someone give an example to explain it? - windows

I am reading the book Windows via C/C++, Chapter 13 - Windows Memory Architecture -
Getting a Larger User-Mode Partition in x86 Windows
and I came across this:
In early versions of Windows, Microsoft didn't allow applications to
access their address space above 2 GB. So some creative developers
decided to leverage this and, in their code, they would use the high
bit in a pointer as a flag that had meaning only to their
applications. Then when the application accessed the memory address,
code executed that cleared the high bit of the pointer before the
memory address was used. Well, as you can imagine, when an application
runs in a user-mode environment greater than 2 GB, the application
fails in a blaze of fire.
I can't understand that. Can someone give an example to explain it to me? Thanks.

To access ~2GB of memory, you only need a 31-bit address. However, on 32-bit systems addresses are 32 bits long and hence pointers are 32 bits long.
As the book describes, in early versions of Windows applications could only use the lower 2GB of the address space. The highest (most significant) bit of every valid 32-bit pointer was therefore ALWAYS zero and could be used for other purposes. However, before using the address, that extra bit had to be cleared again, because otherwise the program would try to access an address above 2GB and crash.
The code probably looked something like this:
#include <stdint.h>

int val = 1;
int* p = &val;
// ...
// Use the high bit of p as a flag, for some purpose
p = (int*)((uintptr_t)p | (1UL << 31));
// ...
// Then before using the address in some way, the bit has to be cleared again:
p = (int*)((uintptr_t)p & ~(1UL << 31));
*p = 3;
Now, if you can be certain that your pointers will only ever point to an address where the most significant bit (MSB) is zero, i.e. in a ~2GB address space, this is fine. However, if the address space is increased, some pointers will have a 1 in their MSB and by clearing it, you set your pointer to an incorrect address in memory. If you then try to read from or write to that address, you will have undefined behavior and your program will most likely fail in a blaze of fire.
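To make the failure concrete, here is a small illustration with a hypothetical address (my own sketch, not part of the original answer):

#include <stdint.h>

// With a user-mode partition larger than 2 GB, a perfectly valid pointer can
// have its most significant bit set.
uintptr_t real   = 0x80001000u;            // hypothetical valid address above 2 GB
uintptr_t munged = real & ~(1UL << 31);    // "clearing the flag" silently yields 0x00001000
// Dereferencing (int*)munged now touches a completely different page:
// undefined behavior, most likely an access violation.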

Related

LLDB memory read failed for 0x0. Is this a bug?

In my LLDB session, memory read 0x00000003 throws an error message.
IMHO, the message error: memory read failed for 0x0 should end with 0x3.
In case this is no bug but intended behaviour, could anybody explain where the offset/trim comes from?
Further details: x86_64
The memory address will be floored (rounded down) to the nearest multiple of 256 (0x100).
You don't say what system you are on, but it is very common for 64 bit systems to leave the first 4 GB of address space (everything reachable with a 32-bit pointer) unmapped. That was originally done to catch 32 bit -> 64 bit transition errors. A common error in 32 bit code was to pass a pointer somewhere as a 32 bit integer, which in a 64 bit world would truncate it to 32 bits. Making reads and writes through a pointer that fits in 32 bits always fail makes it much easier to trap this error.
It's also generally handy to leave the pages at 0x0 unmapped so that an access through a nullptr faults immediately, so many systems keep some pages above zero unmapped for that reason as well, even if not the full 32 bits.
So most likely lldb is right, the memory at 0x0 and some region above that is not mapped, and we can't read it.
Semnodime is right about why the access is 0x0. lldb uses a "memory cache" internally. If you read one bit of memory you're very likely to read some around it, so this speeds lldb up, particularly when doing remote debugging. So by default lldb reads some amount around the address it actually needs.
You can control the amount it reads if you want, using:
settings set target.process.memory-cache-line-size <SomeValue>
And:
settings set target.process.disable-memory-cache true
turns the cache off altogether. If you did that, lldb would then try to read starting with 0x3, but I'm guessing that's still going to fail.
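For example, using only the commands already mentioned (actual output will vary depending on what is mapped on your system):
(lldb) settings set target.process.disable-memory-cache true
(lldb) memory read 0x00000003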

Error code 487 (ERROR_INVALID_ADDRESS) when using VirtualAllocEX

I'm trying to use VirtualAllocEx(). When I set dwSize (the third parameter) to a number larger than about 63 MB, it generates error code 487 (ERROR_INVALID_ADDRESS) when I check GetLastError(). However, it works with smaller sizes such as 4MB.
Here is part of my code:
VirtualAllocEx(peProcessInformation.hProcess,
               (LPVOID)(INH.OptionalHeader.ImageBase),
               dwImageSize,
               MEM_RESERVE | MEM_COMMIT,
               PAGE_EXECUTE_READWRITE);
In the case that I used a 4MB EXE file, the LPVOID return value is 0x00400000, but in other cases (20MB or bigger file) it returns 0x00000000.
Is there a maximum value for the dwSize parameter?
Is there any other solution for my problem, such as another function?
My guess from your code is that you're trying to load a DLL or EXE into memory manually using something like this technique - is that right? I'll address this at the end (pun intended) but first a quick explanation of why VirtualAllocEx is failing.
Why is VirtualAllocEx giving this error?
The problem with allocating memory at a specific address is that there needs to be enough room at that address to allocate the memory size you request. This is why, generally, when you request memory you let the OS decide where to put it. (Plus letting the OS / malloc library decide can lead to other benefits, such as decreased fragmentation etc - out of scope for this answer.)
The problem you're getting is not that VirtualAllocEx is incapable of allocating 64MB rather than 4MB. VirtualAllocEx can allocate (nearly) as much memory as you want it to. The problem is that at the address you specify, in your process, there isn't 64MB of unallocated memory.
Consider hypothetical addresses 0-15 (0x0 - 0xF), where - marks empty memory and x marks allocated memory:
0 1 2 3 4 5 6 7 8 9 A B C D E F
x x - x - - - - x - - - - - - -
This is your process's memory space. Now, you want to allocate 4 bytes at address 0x4. Easy - 0x4 to 0x7 are free, so you allocate and get (new allocation marked with X):
0 1 2 3 4 5 6 7 8 9 A B C D E F
x x - x X X X X x - - - - - - -
Fantastic. But now suppose that instead you wanted to allocate 6 bytes. There aren't six free bytes at address 0x4: there's some memory being used at 0x8:
0 1 2 3 4 5 6 7 8 9 A B C D E F
x x - x - - - - x - - - - - - -
1 2 3 4 bang!
You can't do it. The problem isn't that the memory allocator can't handle allocating 6 bytes, but that the memory at that address isn't free for it to use. Nor, most likely, can it shuffle the memory around - in a normal non-GC program you can't move memory to make space, because you might, say, leave dangling pointers that don't know the memory they were pointing at has moved. The only thing to do is either fail and not allocate memory at all, or allocate where there is free space, say at 0x9 or 0xA.
You might wonder why VirtualAllocEx is failing with ERROR_INVALID_ADDRESS instead of NULL: most likely, it is because you specified an address it couldn't allocate at; thus, even though there is some free memory at that address (maybe) there isn't enough and the address isn't valid. This is hinted at in the documentation:
Attempting to commit a specific address range by specifying MEM_COMMIT
without MEM_RESERVE and a non-NULL lpAddress fails unless the entire
range has already been reserved. The resulting error code is
ERROR_INVALID_ADDRESS.
This isn't quite your situation: you're specifying both flags at once, but if the method can't reserve then it effectively falls into this situation. It can't reserve the entire range at that address, so it gives error code ERROR_INVALID_ADDRESS.
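If you would rather detect this up front than react to the error code, a minimal sketch (my own illustration, not part of the original answer; error handling omitted) could query the target process first and fall back to letting the OS choose the address:

#include <windows.h>

// Hypothetical helper: try the preferred base, otherwise let the system decide.
LPVOID AllocAtPreferredOrAnywhere(HANDLE hProcess, LPVOID preferred, SIZE_T size)
{
    MEMORY_BASIC_INFORMATION mbi;
    if (VirtualQueryEx(hProcess, preferred, &mbi, sizeof(mbi)) != 0 &&
        mbi.State == MEM_FREE &&
        (SIZE_T)((BYTE*)mbi.BaseAddress + mbi.RegionSize - (BYTE*)preferred) >= size)
    {
        // The containing free region is large enough to hold the whole request.
        LPVOID p = VirtualAllocEx(hProcess, preferred, size,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
        if (p != NULL)
            return p;
    }
    // Not enough free space at the preferred base: let the system pick an address
    // and plan on rebasing the image afterwards.
    return VirtualAllocEx(hProcess, NULL, size,
                          MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
}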
Loading DLL or EXE images
So, what should you do with your problem, which I am guessing from your question and code is loading a DLL or EXE image in memory?
Here you need a bit of background on image locations in an EXE file. Generally, an EXE is loaded into memory at the process's virtual address location 0x400000. It's optional: your linker can ask it be put wherever, but this value is common. Similarly, DLLs have a common default location: 0x10000000. So, for one EXE and one DLL, you're fine: the image loader can almost certainly load them at their requested locations.
What happens when you have two DLLs, both asking to be located at 0x10000000?
The answer is image rebasing. The preferred image location is just that, a preference, not a requirement. Code inside the image that depends on being loaded at a specific address can be adjusted by the image loader, so the second DLL might be loaded not at 0x10000000 but somewhere else - say, 0x10800000. That's an address difference of 0x800000, and the loader patches up a bunch of addresses and code inside the DLL so that all the bits that thought they should refer to 0x10000000 now refer to 0x10800000.
This is really, really common, and every time you load an EXE this will be done to several DLLs. It is so common that Microsoft have a little optimisation tool called rebase (for "rebasing", that is, adjusting the base address). When you distribute your EXE and your own DLLs with it, you can use this tool to give each DLL a different base address, chosen so that when Windows loads your EXE and its DLLs they already have the right addresses and it is unlikely to have to rebase - perform the above operation - on any of them. For some applications this can make a noticeable improvement in starting time. (In modern versions of Windows, DLLs are sometimes moved around anyway - this is address space layout randomization, a security technique that deliberately makes sure code is not at the same address each time it's run.)
(One other thing is that some DLL and EXE compression tools strip out the data that is used for this relocation. That's fine, because it makes the EXE smaller... right up until it needs to be relocated, and because the data is missing it can't, and so can't be loaded at all. Or you can build with a fixed base, and it will magically work right until it doesn't. Don't do this to your EXEs or DLLs.)
So, what should you do when you try to manually load a DLL into memory, and there isn't enough space for it at the address it asks to be loaded at? Easy - it's not a fatal error, just load it somewhere else, and then perform the rebasing yourself. I would suggest if you have problems to ask a new SO question, but to give you a starting point you can use the RebaseImage function, or if you can't use it or want to do it yourself, I found this code which from a quick overview seems to perform this manually. No guarantees about its correctness.
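To give a starting point for what "perform the rebasing yourself" involves, here is a rough sketch of walking the base relocation directory (my own illustration of the usual PE layout, not code from the linked article or from RebaseImage; no error checking):

#include <windows.h>

// Hypothetical sketch: patch an image that was loaded at 'newBase' instead of its
// preferred ImageBase, by walking the .reloc (base relocation) directory.
void ApplyBaseRelocations(BYTE* newBase, IMAGE_NT_HEADERS* nt)
{
    ULONGLONG delta = (ULONGLONG)(ULONG_PTR)newBase - nt->OptionalHeader.ImageBase;
    IMAGE_DATA_DIRECTORY dir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
    BYTE* cur = newBase + dir.VirtualAddress;
    BYTE* end = cur + dir.Size;

    while (cur < end)
    {
        IMAGE_BASE_RELOCATION* block = (IMAGE_BASE_RELOCATION*)cur;
        WORD* entries = (WORD*)(block + 1);
        DWORD count = (block->SizeOfBlock - sizeof(*block)) / sizeof(WORD);

        for (DWORD i = 0; i < count; i++)
        {
            WORD type   = entries[i] >> 12;
            WORD offset = entries[i] & 0x0FFF;
            BYTE* where = newBase + block->VirtualAddress + offset;

            if (type == IMAGE_REL_BASED_HIGHLOW)      // 32-bit fixup (x86 images)
                *(DWORD*)where += (DWORD)delta;
            else if (type == IMAGE_REL_BASED_DIR64)   // 64-bit fixup (x64 images)
                *(ULONGLONG*)where += delta;
            // IMAGE_REL_BASED_ABSOLUTE entries are padding and are skipped.
        }
        cur += block->SizeOfBlock;
    }
}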
TLDR
Your process address space doesn't have 64MB of empty space at the address you specify, so you can't allocate 64MB of memory there. Instead, allocate it somewhere else and patch / rebase your loaded DLL or EXE image to match the new address.
Use NULL for the address parameter:
void* pImageBase = VirtualAllocEx(peProcessInformation.hProcess,
                                  NULL,
                                  dwImageSize,
                                  MEM_RESERVE | MEM_COMMIT,
                                  PAGE_EXECUTE_READWRITE);
If lpAddress is NULL, the function determines where to allocate the
region.
Note that for the rest of the code you should use the returned pImageBase, not INH.OptionalHeader.ImageBase.

When truncating a 64 bit address to a 32 bit address in windows, why do we need to guarantee that the high 33 bits are 0, and not the high 32 bits?

I've been reading Windows via C/C++ by Jeffrey Richter and came across the following snippet in the chapter about Windows' memory architecture related to porting 32 bit applications to a 64 bit environment.
If the system could somehow guarantee that no memory allocations would ever be made above 0x00000000'7FFFFFFF, the application would work fine. Truncating a 64 bit address to a 32 bit address when the high 33 bits are 0 causes no problem whatsoever.
I'm having some trouble understanding why the system needs to guarantee that no memory allocations are made above 0x00000000'7FFFFFFF and not 0x00000000'FFFFFFFF. Shouldn't it be okay to truncate the address so long as the high 32 bits are 0? I'm probably missing something and would really appreciate it if someone with more knowledge about windows than me could explain why this is the case.
Not all 32-bit systems/languages treat memory addresses as unsigned values, so the 32nd bit may have a different meaning in some contexts: a pointer stored in a signed 32-bit integer gets sign-extended when it is widened back to 64 bits. By limiting the address space to 31 bits, you don't run into that problem. Also, Windows prevents a 32-bit app from accessing addresses above 2 GB unless it opts into special extensions, so most apps don't need the 32nd bit anyway.
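A minimal sketch of that sign-extension issue (my own illustration, not from the book): a 64-bit address with only bit 31 set still fits in 32 bits, but round-tripping it through a signed 32-bit value changes it completely.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t addr64 = 0x0000000080000000ULL;  // high 33 bits are not all zero: bit 31 is set
    int32_t  addr32 = (int32_t)addr64;        // truncate into a signed 32-bit value
    int64_t  back   = addr32;                 // widening sign-extends bit 31
    printf("0x%016llx\n", (unsigned long long)(uint64_t)back);  // prints 0xffffffff80000000
    return 0;
}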

Memory, Stack and 64 bit

On an x86 system a memory location can hold 4 bytes (32 / 8) of data; therefore a single memory address on a 64 bit system can hold 8 bytes. When examining the stack in GDB, though, this doesn't appear to be the case, for example:
0x7fff5fbffa20: 0x00007fff5fbffa48 0x0000000000000000
0x7fff5fbffa30: 0x00007fff5fbffa48 0x00007fff857917e1
If I have this right then each hexadecimal pair (48) is a byte, thus the first memory address
0x7fff5fbffa20: is actually holding 16 bytes of data and not 8.
This has had me really confused and has for a while, so absolutely any input is vastly appreciated.
Short answer: on both x86 and x64 the minimum addressable unit is a byte: each "memory location" contains one byte. What you are seeing in GDB is only formatting: it dumps 16 contiguous bytes per line, which is why the addresses on the left increase from ...20 to ...30.
Long answer: "32 bit" or "64 bit" describes several things about an architecture. Almost always it is the size of an address (how many bits are in an address determines how much memory, in bytes, you can directly address). It usually also indicates the width of the registers, and often (but not always) the native word size.
That means that, even though you can address a single byte, the machine usually works "better" with data of a larger size. What "better" means is beyond the scope of this question, but a little background helps clear up some misconceptions about word size.
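To see the byte addressability directly, here is a small illustration (my own sketch, not from the original answer). The same data can be displayed byte by byte in GDB with x/16xb or grouped into 8-byte words with x/2xg; the grouping is purely presentation.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t value = 0x1122334455667788ULL;
    unsigned char *p = (unsigned char *)&value;
    for (int i = 0; i < 8; i++)
        printf("%p : %02x\n", (void *)(p + i), p[i]);  // eight consecutive addresses, one byte each
    return 0;
}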

Unexpected page handling (also, VirtualLock = no op?)

This morning I stumbled across a surprising number of page faults where I did not expect them. Yes, I probably should not worry, but it still strikes me odd, because in my understanding they should not happen. And, I'd like better if they didn't.
The application (under WinXP Pro 32bit) reserves a larger section (1GB) of address space with VirtualAlloc(MEM_RESERVE) and later allocates moderately large blocks (20-50MB) of memory with VirtualAlloc(MEM_COMMIT). This is done in a worker ahead of time, the intent being to stall the main thread as little as possible. Obviously, you cannot ever assure that no page faults happen unless the memory region is currently locked, but a few of them are certainly tolerable (and unavoidable). Surprisingly every single page faults. Always.
The assumption was thus that the system creates pages only lazily, on first access rather than at commit time, which somehow makes sense too (although the documentation suggests something different). Fair enough, my bad.
The obvious workaround is therefore VirtualLock/VirtualUnlock, which forces the system to create those pages, as they must exist after VirtualLock returns. Surprisingly, still every single page faults.
So I wrote a little test program which did all above steps in sequence, sleeping 5 seconds in between each, to rule out something was wrong in the other code. The results were:
MEM_RESERVE 1GB ---> success, zero CPU, zero time, nothing happens
MEM_COMMIT 1 GB ---> success, zero CPU, zero time, working set increases by 2MB, 512 page faults (which works out to 8 bytes of bookkeeping in user space per committed page)
for(... += 128kB) { VirtualLock(128kB); VirtualUnlock(128kB); } ---> success, zero CPU, zero time, nothing happens
for(... += 4096) *addr = 0; ---> 262144 page faults, about 0.25 seconds (~95% kernel time). 1GB increase for both "working set" and "physical" inside Process Explorer
VirtualFree ---> zero CPU, zero time, both "working set" and "physical" instantly go * poof *.
My expectation was that since each page had been locked once, it must physically exist at least after that. It might of course still be moved in and out of the WS as the quota is exceeded (merely changing one reference as long as sufficient RAM is available). Yet, neither the execution time, nor the working set, nor the physical memory metrics seem to support this. Rather, as it looks, each single accessed page is created upon faulting, even if it had been locked previously. Of course I can touch every page manually in a worker thread, but there must be a cleaner way too?
Am I making a wrong assumption about what VirtualLock should do or am I not understanding something right about virtual memory? Any idea about how to tell the OS in a "clean, legitimate, working" way that I'll be wanting memory, and I'll be wanting it for real?
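A minimal sketch of that "touch every page manually" fallback, for reference (my own illustration, not part of the original question):

#include <windows.h>

// Hypothetical helper: write one byte per page so the kernel materializes the pages
// up front instead of on first use. Assumes base points at committed memory.
void PrefaultRegion(void* base, SIZE_T size)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);                                   // query the real page size
    volatile BYTE* p = static_cast<volatile BYTE*>(base);
    for (SIZE_T offset = 0; offset < size; offset += si.dwPageSize)
        p[offset] = 0;                                    // one write per page faults it in
}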
UPDATE:
In reaction to Harry Johnston's suggestion, I tried the somewhat problematic approach of actually calling VirtualLock on a gigabyte of memory. For this to succeed, you must first set the process's working set size accordingly, since the default quotas are 200k/1M, which means VirtualLock cannot possibly lock a region larger than 200k (or rather, it cannot lock more than 200k altogether, minus whatever is already locked for I/O or for other reasons).
After setting a minimum working set size of 1GB and a maximum of 2GB, all the page faults happen the moment VirtualAlloc(MEM_COMMIT) is called. "Virtual size" in Process Explorer jumps up by 1GB instantly. So far, it looked really, really good.
However, looking closer, "Physical" remains as it is, actual memory is really only used the moment you touch it.
VirtualLock remains a no-op (fault-wise), but raising the minimum working set size kind of got closer to the goal.
There are two problems with tampering the WS size, however. First, you're generally not meant to have a gigabyte of minimum working set in a process, because the OS tries hard to keep that amount of memory locked. This would be acceptable in my case (it's actually more or less just what I ask for).
The bigger problem is that SetProcessWorkingSetSize needs the PROCESS_SET_QUOTA access right, which is no problem as "administrator", but it fails when you run the program as a restricted user (for a good reason), and it triggers the "allow possibly harmful program?" alert of some well-known Russian antivirus software (for no good reason, but alas, you can't turn it off).
Technically VirtualLock is a hint, and so the OS is allowed to ignore it. It's backed by the NtLockVirtualMemory syscall which on Reactos/Wine is implemented as a no-op, however Windows does back the syscall with real work (MiLockVadRange).
VirtualLock isn't guaranteed to succeed. Calls to this function require the SE_LOCK_MEMORY_PRIVILEGE to work, and the addresses must fulfil security and quota restrictions. Additionally, after a VirtualUnlock the kernel is no longer obliged to keep your page in memory, so a page fault after that is a valid action.
And as Raymond Chen points out, when you unlock the memory it can formally release the page. This means that the next VirtualLock on the next page might obtain that very same page again, so when you touch the original page you'll still get a page-fault.
VirtualLock remains a no-op (fault-wise)
I tried to reproduce this, but it worked as one might expect. Running the example code shown at the bottom of this post:
start application (523 page faults)
adjust the working set size (21 page faults)
VirtualAlloc with MEM_COMMIT 2500 MB of RAM (2 page faults)
VirtualLock all of that (about 641,250 page faults)
perform writes to all of this RAM in an infinite loop (zero page faults)
This all works pretty much as expected. 2500 MB of RAM is 640,000 pages. The numbers add up. Also, as far as the OS-wide RAM counters go, commit charge goes up at VirtualAlloc, while physical memory usage goes up at VirtualLock.
So VirtualLock is most definitely not a no-op on my Win7 x64 machine. If I don't do it, the page faults, as expected, shift to where I start writing to the RAM. They still total just over 640,000. Plus, the first time the memory is written to takes longer.
Rather, as it looks, each single accessed page is created upon faulting, even if it had been locked previously.
This is not wrong. There is no guarantee that accessing a locked-then-unlocked page won't fault. You lock it, it gets mapped to physical RAM. You unlock it, and it's free to be unmapped instantly, making a fault possible. You might hope it will stay mapped, but no guarantees...
For what it's worth, on my system with a few gigabytes of physical RAM free, it works the way you were hoping for: even if I follow my VirtualLock with an immediate VirtualUnlock and set the minimum working set size back to something small, no further page faults occur.
Here's what I did. I ran the test program (below) with and without the code that immediately unlocks the memory and restores a sensible minimum working set size, and then forced physical RAM to run out in each scenario. Before forcing low RAM, neither program gets any page faults. After forcing low RAM, the program that keeps the memory locked retains its huge working set and has no further page faults. The program that unlocked the memory, however, starts getting page faults.
This is easiest to observe if you suspend the process first, since otherwise the constant memory writes keep it all in the working set even if the memory isn't locked (obviously a desirable thing). But suspend the process, force low RAM, and watch the working set shrink only for the program that has unlocked the RAM. Resume the process, and witness an avalanche of page faults.
In other words, at least in Win7 x64 everything works exactly as you expected it to, using the code supplied below.
There are two problems with tampering the WS size, however. First, you're generally not meant to have a gigabyte of minimum working set in a process
Well... if you want to VirtualLock, you are already tampering with it. The only thing that SetProcessWorkingSetSize does is allow you to tamper with it. It doesn't degrade performance by itself; it's VirtualLock that does - but only if the system actually runs low on physical RAM.
Here's the complete program:
#include <stdio.h>
#include <tchar.h>
#include <Windows.h>
#include <iostream>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    SIZE_T chunkSize = 2500LL * 1024LL * 1024LL; // 2,621,440,000 = 640,000 pages
    int sleep = 5000;
    Sleep(sleep);

    cout << "Setting working set size... ";
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), chunkSize + 5001001L, chunkSize * 2))
        return -1;
    cout << "done" << endl;
    Sleep(sleep);

    cout << "VirtualAlloc... ";
    UINT8* data = (UINT8*) VirtualAlloc(NULL, chunkSize, MEM_COMMIT, PAGE_READWRITE);
    if (data == NULL)
        return -2;
    cout << "done" << endl;
    Sleep(sleep);

    cout << "VirtualLock... ";
    if (VirtualLock(data, chunkSize) == 0)
        return -3;
    //if (VirtualUnlock(data, chunkSize) == 0) // enable or disable to experiment with unlocks
    //    return -3;
    //if (!SetProcessWorkingSetSize(GetCurrentProcess(), 5001001L, chunkSize * 2))
    //    return -1;
    cout << "done" << endl;
    Sleep(sleep);

    cout << "Writes to the memory... ";
    while (true)
    {
        int* end = (int*) (data + chunkSize);
        for (int* d = (int*) data; d < end; d++)
            *d = (int) d;
        cout << "done ";
    }
    return 0;
}
Note that this code puts the thread to sleep after VirtualLock. According to a 2007 post by Raymond Chen, the OS is free to page it all out of physical RAM at this point and until the thread wakes up again. Note also that MSDN claims otherwise, saying that this memory will not be paged out, regardless of whether all threads are sleeping or not. On my system, they certainly remain in the physical RAM while the only thread is sleeping. I suspect Raymond's advice applied in 2007, but is no longer true in Win7.
I don't have enough reputation to comment, so I'll have to add this as an answer.
Note that this code puts the thread to sleep after VirtualLock. According to a 2007 post by Raymond Chen, the OS is free to page it all out of physical RAM at this point and until the thread wakes up again [...] I suspect Raymond's advice applied in 2007, but is no longer true in Win7.
What romkyns said has been confirmed by Raymond Chen in 2014. That is, when you lock memory with VirtualLock, it will remain locked even if all your threads are blocked. He also says that the fact that pages remain locked may be just an implementation detail and not contractual.
This is probably not the case, because according to MSDN it is contractual:
Pages that a process has locked remain in physical memory until the process unlocks them or terminates. These pages are guaranteed not to be written to the pagefile while they are locked.
