VB6: Minor memory management mystery

I can't quite wrap my head around what just happened in a VB6 program. This is on Win7-64 and Win10.
I wrote a Q & D proof of concept for loading and displaying 4K (3840x2160) images. Each image takes up 24MB of memory, so I "knew", based on the 2GB memory limit for 32-bit processes, that I could load at most ~80 images.
The system has 32GB of memory, but that isn't all accessible to my program... right?
Const nPix As Long = 80
Dim Pix(1 To nPix) As StdPicture ' an OLE construct
For k = 1 To nPix
    Set Pix(k) = LoadPicture("next in folder")
Next
No problem, takes a bit of time but works and uses the expected memory.
For grins I increased nPix to 100, just to see how it failed. But it didn't. Tried nPix = 200, then 300. Still kept going, by then eating up 8GB of system memory. And no problem at all with:
PictureBox.PaintPicture Pix(300)
What the heck is going on here? Whose memory am I using, and how?

I think it is because the images are loaded by the operating system itself, which only returns some kind of handle to the VB6 process for manipulating them - the pixel data presumably lives in GDI-managed memory outside your process's 2GB user address space.
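One rough way to test that guess (an illustrative sketch, not from the original thread) is to compare the process's own memory counters with its GDI object count while the pictures are loaded; if the pixel data really lives outside the process, the GDI object count climbs with each picture while the process's pagefile-backed usage stays modest. Pass the VB6 program's process ID on the command line:

// Diagnostic sketch: report memory counters and GDI object count for a process.
// Build (MSVC): cl gdicheck.cpp psapi.lib user32.lib
#include <windows.h>
#include <psapi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();
    HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid);
    if (!hProcess) { printf("OpenProcess failed: %lu\n", GetLastError()); return 1; }

    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
    GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc));

    // Bitmaps loaded via LoadPicture are GDI objects; if the guess above is right,
    // this count climbs with each loaded picture while the counters above stay modest.
    DWORD gdiObjects = GetGuiResources(hProcess, GR_GDIOBJECTS);

    printf("Working set: %zu MB, pagefile usage: %zu MB, GDI objects: %lu\n",
           pmc.WorkingSetSize >> 20, pmc.PagefileUsage >> 20, gdiObjects);

    CloseHandle(hProcess);
    return 0;
}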

Related

Julia drawing from standard normal distribution

I need to draw 53000000 observations from a standard normal distribution. My current code takes a long time to run in Julia (in fact, it's been running for the past twenty minutes) and I'm wondering if there's anything I can do to speed it up. Here's what I tried:
using Distributions
d = Normal()
shock = rand(d, 1, 53000000)
The code runs instantaneously when I execute it in the REPL (I am working in Juno/Atom), but lags at this point (drawing from the standard normal) when I step through using the debugger. So I think the debugger may be the real culprit here.
It may be that the roughly half a gigabyte of memory used by the allocation of the variable shock (a 1×53,000,000 Matrix{Float64} is about 424 MB) is sometimes causing swapping when the debugger is loaded.
Try running this to see, in the debugger:
using Distributions, Base.Sys
println("Free memory is $(Int(Sys.free_memory()))")
d = Normal()
shock = rand(d, 1, 53000000)
println("shock uses $(sizeof(shock)) bytes.")
println("Free memory is $(Int(Sys.free_memory()))")
Are you close to running out of memory, in gigabytes?

Error code 487 (ERROR_INVALID_ADDRESS) when using VirtualAllocEx

I'm trying to use VirtualAllocEx(). When I set dwSize (the third parameter) to a number larger than about 63 MB, the call fails and GetLastError() returns error code 487. However, it works with smaller sizes such as 4MB.
Here is part of my code:
VirtualAllocEx(peProcessInformation.hProcess,
               (LPVOID)(INH.OptionalHeader.ImageBase),
               dwImageSize,
               MEM_RESERVE | MEM_COMMIT,
               PAGE_EXECUTE_READWRITE);
When I use a 4MB EXE file, the LPVOID return value is 0x00400000, but with bigger files (20MB or more) it returns NULL (0x00000000).
Is there a maximum value for the dwSize parameter?
Is there any other solution for my problem, such as another function?
My guess from your code is that you're trying to load a DLL or EXE into memory manually using something like this technique - is that right? I'll address this at the end (pun intended) but first a quick explanation of why VirtualAllocEx is failing.
Why is VirtualAllocEx giving this error?
The problem with allocating memory at a specific address is that there needs to be enough room at that address to allocate the memory size you request. This is why, generally, when you request memory you let the OS decide where to put it. (Plus letting the OS / malloc library decide can lead to other benefits, such as decreased fragmentation etc - out of scope for this answer.)
The problem you're getting is not that VirtualAllocEx is incapable of allocating 64MB rather than 4MB. VirtualAllocEx can allocate (nearly) as much memory as you want it to. The problem is that at the address you specify, in your process, there isn't 64MB of unallocated memory.
Consider hypothetical addresses 0-15 (0x0 - 0xF), where - marks empty memory and x marks allocated memory:
0 1 2 3 4 5 6 7 8 9 A B C D E F
x x - x - - - - x - - - - - - -
This is your process's memory space. Now, you want to allocate 4 bytes at address 0x4. Easy - 0x4 to 0x7 are free, so you allocate and get (new allocation marked with X):
0 1 2 3 4 5 6 7 8 9 A B C D E F
x x - x X X X X x - - - - - - -
Fantastic. But now suppose that instead you wanted to allocate 6 bytes. There aren't six free bytes at address 0x4: there's some memory being used at 0x8:
0 1 2 3 4 5 6 7 8 9 A B C D E F
x x - x - - - - x - - - - - - -
        1 2 3 4 bang!
You can't do it. The problem isn't that the memory allocator can't handle allocating 6 bytes, but that the memory at that address isn't free for it to use. Nor, most likely, can it shuffle the memory around - in a normal non-GC program you can't move memory to make space, because you might, say, leave dangling pointers which don't know that the memory they were pointing at has changed address. The only thing to do is either fail and not allocate memory at all, or allocate where there is free space, say at 0x9 or 0xA.
You might wonder why VirtualAllocEx fails with ERROR_INVALID_ADDRESS rather than an out-of-memory error: most likely, it is because you specified an address it couldn't allocate at; even though there may be some free memory at that address, there isn't enough of it, so the address isn't valid for the request. This is hinted at in the documentation:
Attempting to commit a specific address range by specifying MEM_COMMIT
without MEM_RESERVE and a non-NULL lpAddress fails unless the entire
range has already been reserved. The resulting error code is
ERROR_INVALID_ADDRESS.
This isn't quite your situation: you're specifying both flags at once, but if the method can't reserve then it effectively falls into this situation. It can't reserve the entire range at that address, so it gives error code ERROR_INVALID_ADDRESS.
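If you want to see exactly what is occupying the requested range, you can walk the target process's address space with VirtualQueryEx. A minimal diagnostic sketch (hProcess and imageBase stand in for the question's process handle and INH.OptionalHeader.ImageBase; the handle needs PROCESS_QUERY_INFORMATION access):

#include <windows.h>
#include <cstdio>

// Sketch: dump the regions covering [imageBase, imageBase + size) in the target
// process, to see what is already reserved or committed there.
void DumpRegions(HANDLE hProcess, BYTE* imageBase, SIZE_T size)
{
    BYTE* addr = imageBase;
    while (addr < imageBase + size)
    {
        MEMORY_BASIC_INFORMATION mbi = {};
        if (VirtualQueryEx(hProcess, addr, &mbi, sizeof(mbi)) == 0)
            break;
        printf("%p  size=%8zu KB  state=%s\n",
               mbi.BaseAddress, mbi.RegionSize / 1024,
               mbi.State == MEM_FREE ? "free" :
               mbi.State == MEM_RESERVE ? "reserved" : "committed");
        addr = (BYTE*)mbi.BaseAddress + mbi.RegionSize;
    }
}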
Loading DLL or EXE images
So, what should you do with your problem, which I am guessing from your question and code is loading a DLL or EXE image in memory?
Here you need a bit of background on image locations in an EXE file. Generally, an EXE is loaded into memory at the process's virtual address location 0x400000. It's optional: your linker can ask it be put wherever, but this value is common. Similarly, DLLs have a common default location: 0x10000000. So, for one EXE and one DLL, you're fine: the image loader can almost certainly load them at their requested locations.
What happens when you have two DLLs, both asking to be located at 0x10000000?
The answer is image rebasing. The image's preferred location is optional, not required. Code inside the image that depends on being loaded at a specific address can be adjusted by the image loader, and so the second DLL might be loaded not at 0x10000000 but somewhere else - say, 0x10800000. That's an address difference of 0x800000, and so the loader patches up a bunch of addresses and code inside the DLL so that all the bits which thought they should refer to 0x10000000 now refer to 0x10800000.
This is really, really common, and every time you load an EXE this will be done to several DLLs. It is so common that Microsoft have a little optimisation tool called rebase (for "rebasing", that is, adjusting the base address). When you distribute your EXE and your own DLLs with it, you can use this to make sure each DLL has a different base address, each chosen so that when Windows loads your EXE and its DLLs they already have the right addresses and it is unlikely to have to rebase - perform the above operation - on any of them. For some applications this can make a noticeable improvement in starting time. (In modern versions of Windows, DLLs are sometimes moved around anyway - this is address space layout randomization, a security technique to deliberately make sure code is not at the same address each time it's run.)
(One other thing is that some DLL and EXE compression tools strip out the data that is used for this relocation. That's fine, because it makes the EXE smaller... right up until it needs to be relocated: because the data is missing it can't be, and so it can't be loaded at all. Or you can build with a fixed base, and it will magically work right up until it doesn't. Don't do this to your EXEs or DLLs.)
So, what should you do when you try to manually load a DLL into memory and there isn't enough space for it at the address it asks to be loaded at? Easy - it's not a fatal error, just load it somewhere else, and then perform the rebasing yourself. If you run into problems I would suggest asking a new SO question, but to give you a starting point you can use the RebaseImage function, or, if you can't use it or want to do it yourself, I found this code which from a quick overview seems to perform the rebase manually. No guarantees about its correctness. A rough sketch of the fixup loop is shown below.
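For orientation only, here is a minimal sketch of what that fixup involves, assuming the image has already been mapped into local memory at newBase with its sections laid out at their virtual addresses and its relocation data not stripped (names such as newBase and ntHeaders are illustrative, not from the question's code):

#include <windows.h>

// Sketch: apply base relocations to an image mapped at newBase instead of its
// preferred ImageBase. Only the common fixup types are handled.
void ApplyRelocations(BYTE* newBase, IMAGE_NT_HEADERS* ntHeaders)
{
    ULONGLONG delta = (ULONGLONG)newBase - ntHeaders->OptionalHeader.ImageBase;
    if (delta == 0) return;  // loaded at the preferred base, nothing to do

    IMAGE_DATA_DIRECTORY dir =
        ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
    BYTE* reloc = newBase + dir.VirtualAddress;
    BYTE* relocEnd = reloc + dir.Size;

    while (reloc < relocEnd)
    {
        IMAGE_BASE_RELOCATION* block = (IMAGE_BASE_RELOCATION*)reloc;
        WORD* entries = (WORD*)(block + 1);
        DWORD count = (block->SizeOfBlock - sizeof(*block)) / sizeof(WORD);

        for (DWORD i = 0; i < count; ++i)
        {
            WORD type   = entries[i] >> 12;      // high 4 bits: fixup type
            WORD offset = entries[i] & 0x0FFF;   // low 12 bits: offset within the page
            BYTE* patch = newBase + block->VirtualAddress + offset;

            if (type == IMAGE_REL_BASED_HIGHLOW)       // 32-bit images
                *(DWORD*)patch += (DWORD)delta;
            else if (type == IMAGE_REL_BASED_DIR64)    // 64-bit images
                *(ULONGLONG*)patch += delta;
            // IMAGE_REL_BASED_ABSOLUTE entries are padding and are skipped
        }
        reloc += block->SizeOfBlock;
    }
}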
TLDR
Your process address space doesn't have 64MB of empty space at the address you specify, so you can't allocate 64MB of memory there. Instead, allocate it somewhere else and patch / rebase your loaded DLL or EXE image to match the new address.
Use NULL for the address parameter:
void* pImageBase = VirtualAllocEx(peProcessInformation.hProcess,
                                  NULL,
                                  dwImageSize,
                                  MEM_RESERVE | MEM_COMMIT,
                                  PAGE_EXECUTE_READWRITE);
If lpAddress is NULL, the function determines where to allocate the
region.
Note that for the rest of the code you should use pImageBase (the address actually returned), not INH.OptionalHeader.ImageBase.
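For instance, copying the prepared image into the remote process would target the returned address (a sketch continuing the code above; localImage is an assumed name for the image buffer prepared in your own process, not something from the question):

SIZE_T bytesWritten = 0;
if (!WriteProcessMemory(peProcessInformation.hProcess,
                        pImageBase,      // destination chosen by VirtualAllocEx
                        localImage,      // assumed: image prepared in our own process
                        dwImageSize,
                        &bytesWritten))
{
    // inspect GetLastError() and bail out
}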

Memory used by a long running process on OS X

I want to ensure that a long-running number crunching algorithm doesn't use too much memory. The algorithm is written in C++ and runs on OS X. A drastically simplified version is:
#include <vector>
using std::vector;

int main() {
    bool someCondition = true; // stand-in for the real termination condition
    while (someCondition) {
        // v itself is a local, but its buffer is heap-allocated (and freed) each iteration
        vector<int> v(10, 0);
    }
}
I've profiled the code using Instruments (allocations and leaks). I don't see any leaks, and while the "live bytes" count looks fine (it hovers around 20 MB), the "overall bytes" count keeps growing. What concerned me is that when the "overall bytes" count reached about 80 GB, I received an OS X warning about lack of hard disk space (I have a 120 GB solid state disk). I don't know much about OS/process interaction, so I thought I'd ask:
Is memory used by a long running process on a UNIX-based OS available to other processes before the first process is killed or no longer running?
Edit: Looks like I was misinterpreting the "overall bytes" number in Instruments (see Instruments ObjectAlloc: Explanation of Live Bytes & Overall Bytes). When I check the process in Activity Monitor, the "real memory" is essentially constant.
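If you want to watch the same "real memory" figure from inside the process rather than in Activity Monitor, something along these lines should work on OS X; it uses the Mach task_info call and is meant as an illustrative sketch, not part of the original question:

#include <mach/mach.h>
#include <cstdio>

// Print the calling process's resident (physical) and virtual sizes.
void printMemoryUsage() {
    mach_task_basic_info info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(), MACH_TASK_BASIC_INFO,
                                 (task_info_t)&info, &count);
    if (kr == KERN_SUCCESS) {
        printf("resident: %llu MB, virtual: %llu MB\n",
               (unsigned long long)info.resident_size >> 20,
               (unsigned long long)info.virtual_size >> 20);
    }
}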
The reason you get a disk space warning is probably related to virtual memory allocation. Every time your process (or the OS) requests memory it is usually first "allocated" in backing-store - swap.
Total virtual memory is the size of available swap plus RAM. I do not have access to OS X, and I know it plays by its own rules, but there must be a command that shows swap usage:
swap -l (Solaris)
swap -s (Solaris)
free (linux)
The only command I came up with is vm_stat, plus top - it appears top is probably the closest to what I am talking about.

Unexpected page handling (also, VirtualLock = no-op?)

This morning I stumbled across a surprising number of page faults where I did not expect them. Yes, I probably should not worry, but it still strikes me as odd, because in my understanding they should not happen. And I'd like it better if they didn't.
The application (under WinXP Pro 32bit) reserves a larger section (1GB) of address space with VirtualAlloc(MEM_RESERVE) and later commits moderately large blocks (20-50MB) of memory with VirtualAlloc(MEM_COMMIT). This is done in a worker thread ahead of time, the intent being to stall the main thread as little as possible. Obviously, you can never ensure that no page faults happen unless the memory region is currently locked, but a few of them are certainly tolerable (and unavoidable). Surprisingly, every single page faults. Always.
The assumption was thus that the system only creates pages lazily after allocating them, which somehow makes sense too (although the documentation suggests something different). Fair enough, my bad.
The obvious workaround is therefore VirtualLock/VirtualUnlock, which forces the system to create those pages, as they must exist after VirtualLock returns. Surprisingly, every single page still faults.
So I wrote a little test program which did all above steps in sequence, sleeping 5 seconds in between each, to rule out something was wrong in the other code. The results were:
MEM_RESERVE 1GB ---> success, zero CPU, zero time, nothing happens
MEM_COMMIT 1 GB ---> success, zero CPU, zero time, working set increases by 2MB, 512 page faults (i.e. 8 bytes of metadata allocated in user space per page of the 1GB region)
for(... += 128kB) { VirtualLock(128kB); VirtualUnlock(128kB); } ---> success, zero CPU, zero time, nothing happens
for(... += 4096) *addr = 0; ---> 262144 page faults, about 0.25 seconds (~95% kernel time). 1GB increase for both "working set" and "physical" inside Process Explorer
VirtualFree ---> zero CPU, zero time, both "working set" and "physical" instantly go * poof *.
My expectation was that since each page had been locked once, it must physically exist at least after that. It might of course still be moved in and out of the working set as the quota is exceeded (merely changing one reference as long as sufficient RAM is available). Yet neither the execution time, nor the working set, nor the physical memory metrics seem to support this. Rather, it looks as if each accessed page is created upon faulting, even if it had been locked previously. Of course I can touch every page manually in a worker thread, but there must be a cleaner way too?
Am I making a wrong assumption about what VirtualLock should do, or am I misunderstanding something about virtual memory? Any idea how to tell the OS in a "clean, legitimate, working" way that I'll be wanting memory, and I'll be wanting it for real?
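For reference, a minimal sketch of the allocation pattern described above (sizes are illustrative and error handling is reduced to early returns; this is not the original test program):

#include <windows.h>

// Reserve a large region up front, then commit it in moderately large blocks,
// trying to pre-fault each block with a VirtualLock/VirtualUnlock pass over
// small (quota-sized) pieces.
int main()
{
    const SIZE_T regionSize = (SIZE_T)1 << 30;   // 1 GB of address space
    const SIZE_T blockSize  = 32 * 1024 * 1024;  // commit 32 MB at a time
    const SIZE_T lockChunk  = 128 * 1024;        // stay under the default WS quota

    BYTE* base = (BYTE*)VirtualAlloc(NULL, regionSize, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) return 1;

    for (SIZE_T off = 0; off < regionSize; off += blockSize)
    {
        if (!VirtualAlloc(base + off, blockSize, MEM_COMMIT, PAGE_READWRITE))
            return 2;

        // As discussed in the answers below, this lock/unlock pass does not
        // actually guarantee that the pages stay resident afterwards.
        for (SIZE_T l = 0; l < blockSize; l += lockChunk)
        {
            VirtualLock(base + off + l, lockChunk);
            VirtualUnlock(base + off + l, lockChunk);
        }
    }

    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}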
UPDATE:
In reaction to Harry Johnston's suggestion, I tried the somewhat problematic approach of actually calling VirtualLock on a gigabyte of memory. For this to succeed, you must first set the process's working set size accordingly, since the default quotas are 200k/1M, which means VirtualLock cannot possibly lock a region larger than 200k (or rather, it cannot lock more than 200k altogether, and that is minus what is already locked for I/O or for another reason).
After setting a minimum working set size of 1GB and a maximum of 2GB, all the page faults happen the moment VirtualAlloc(MEM_COMMIT) is called. "Virtual size" in Process Explorer jumps up by 1GB instantly. So far, it looked really, really good.
However, looking closer, "Physical" remains as it is; actual memory is really only used the moment you touch it.
VirtualLock remains a no-op (fault-wise), but raising the minimum working set size kind of got closer to the goal.
There are two problems with tampering with the WS size, however. First, you're generally not meant to have a gigabyte of minimum working set in a process, because the OS tries hard to keep that amount of memory locked. That would be acceptable in my case (it's actually more or less just what I'm asking for).
The bigger problem is that SetProcessWorkingSetSize needs the PROCESS_SET_QUOTA access right, which is no problem as "administrator", but it fails when you run the program as a restricted user (for a good reason), and it triggers the "allow possibly harmful program?" alert of some well-known Russian antivirus software (for no good reason, but alas, you can't turn it off).
Technically VirtualLock is a hint, and so the OS is allowed to ignore it. It's backed by the NtLockVirtualMemory syscall, which on ReactOS/Wine is implemented as a no-op; however, Windows does back the syscall with real work (MiLockVadRange).
VirtualLock isn't guaranteed to succeed. Calls to this function are subject to the process's working-set quota, and the addresses must fulfil security restrictions. Additionally, after a VirtualUnlock the kernel is no longer obliged to keep your page in memory, so a page fault after that is a valid action.
And as Raymond Chen points out, when you unlock the memory it can formally release the page. This means that the next VirtualLock on the next page might obtain that very same page again, so when you touch the original page you'll still get a page-fault.
VirtualLock remains a no-op (fault-wise)
I tried to reproduce this, but it worked as one might expect. Running the example code shown at the bottom of this post:
start application (523 page faults)
adjust the working set size (21 page faults)
VirtualAlloc with MEM_COMMIT 2500 MB of RAM (2 page faults)
VirtualLock all of that (about 641,250 page faults)
perform writes to all of this RAM in an infinite loop (zero page faults)
This all works pretty much as expected. 2500 MB of RAM is 640,000 pages. The numbers add up. Also, as far as the OS-wide RAM counters go, commit charge goes up at VirtualAlloc, while physical memory usage goes up at VirtualLock.
So VirtualLock is most definitely not a no-op on my Win7 x64 machine. If I don't do it, the page faults, as expected, shift to where I start writing to the RAM. They still total just over 640,000. Plus, the first time the memory is written to takes longer.
Rather, as it looks, each single accessed page is created upon faulting, even if it had been locked previously.
This is not wrong. There is no guarantee that accessing a locked-then-unlocked page won't fault. You lock it, it gets mapped to physical RAM. You unlock it, and it's free to be unmapped instantly, making a fault possible. You might hope it will stay mapped, but no guarantees...
For what it's worth, on my system with a few gigabytes of physical RAM free, it works the way you were hoping for: even if I follow my VirtualLock with an immediate VirtualUnlock and set the minimum working set size back to something small, no further page faults occur.
Here's what I did. I ran the test program (below) with and without the code that immediately unlocks the memory and restores a sensible minimum working set size, and then forced physical RAM to run out in each scenario. Before forcing low RAM, neither program gets any page faults. After forcing low RAM, the program that keeps the memory locked retains its huge working set and has no further page faults. The program that unlocked the memory, however, starts getting page faults.
This is easiest to observe if you suspend the process first, since otherwise the constant memory writes keep it all in the working set even if the memory isn't locked (obviously a desirable thing). But suspend the process, force low RAM, and watch the working set shrink only for the program that has unlocked the RAM. Resume the process, and witness an avalanche of page faults.
In other words, at least in Win7 x64 everything works exactly as you expected it to, using the code supplied below.
There are two problems with tampering the WS size, however. First, you're generally not meant to have a gigabyte of minimum working set in a process
Well... if you want to VirtualLock, you are already tampering with it. The only thing that SetProcessWorkingSetSize does is allow you to tamper with it. It doesn't degrade performance by itself; it's VirtualLock that does - but only if the system actually runs low on physical RAM.
Here's the complete program:
#include <stdio.h>
#include <tchar.h>
#include <Windows.h>
#include <iostream>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    SIZE_T chunkSize = 2500LL * 1024LL * 1024LL; // 2,621,440,000 bytes = 640,000 pages
    int sleep = 5000;
    Sleep(sleep);

    cout << "Setting working set size... ";
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), chunkSize + 5001001L, chunkSize * 2))
        return -1;
    cout << "done" << endl;
    Sleep(sleep);

    cout << "VirtualAlloc... ";
    UINT8* data = (UINT8*) VirtualAlloc(NULL, chunkSize, MEM_COMMIT, PAGE_READWRITE);
    if (data == NULL)
        return -2;
    cout << "done" << endl;
    Sleep(sleep);

    cout << "VirtualLock... ";
    if (VirtualLock(data, chunkSize) == 0)
        return -3;
    //if (VirtualUnlock(data, chunkSize) == 0) // enable or disable to experiment with unlocks
    //    return -3;
    //if (!SetProcessWorkingSetSize(GetCurrentProcess(), 5001001L, chunkSize * 2))
    //    return -1;
    cout << "done" << endl;
    Sleep(sleep);

    cout << "Writes to the memory... ";
    while (true)
    {
        int* end = (int*) (data + chunkSize);
        for (int* d = (int*) data; d < end; d++)
            *d = (int) d;
        cout << "done ";
    }

    return 0;
}
Note that this code puts the thread to sleep after VirtualLock. According to a 2007 post by Raymond Chen, the OS is free to page it all out of physical RAM at this point and until the thread wakes up again. Note also that MSDN claims otherwise, saying that this memory will not be paged out, regardless of whether all threads are sleeping or not. On my system, they certainly remain in the physical RAM while the only thread is sleeping. I suspect Raymond's advice applied in 2007, but is no longer true in Win7.
I don't have enough reputation to comment, so I'll have to add this as an answer.
Note that this code puts the thread to sleep after VirtualLock. According to a 2007 post by Raymond Chen, the OS is free to page it all out of physical RAM at this point and until the thread wakes up again [...] I suspect Raymond's advice applied in 2007, but is no longer true in Win7.
What romkyns said has been confirmed by Raymond Chen in 2014. That is, when you lock memory with VirtualLock, it will remain locked even if all your threads are blocked. He also says that the fact that pages remain locked may be just an implementation detail and not contractual.
This is probably not the case, because according to MSDN it is contractual:
Pages that a process has locked remain in physical memory until the process unlocks them or terminates. These pages are guaranteed not to be written to the pagefile while they are locked.

Examining Erlang crash dumps - how to account for all memory?

I've been poring over this Erlang crash dump where the VM has run out of heap memory. The problem is that there is no obvious culprit allocating all that memory.
Using some serious black awk magic I've summed up the fields Stack+heap, OldHeap, Heap unused and OldHeap unused for each process and ranked them by memory usage. The problem is that this sum doesn't come anywhere close to processes_used, the number representing the total memory used by all processes according to the Erlang crash dump guide.
I've already tried the Crashdump Viewer and either I'm missing something or there isn't much help there for my kind of problem.
The number I get is 525 MB whereas the processes_used value is at 1348 MB. Where can I find the rest of the memory?
Edit: The problem was twofold: Heap unused and OldHeap unused shouldn't have been included, since they are sub-parts of Stack+heap and OldHeap, and the numbers displayed for Stack+heap and OldHeap are listed in words, not bytes.
There is a module called crashdump_viewer which is great for this kind of analysis.
Another thing to keep in mind is that Stack+heap is, afaik, given in words, not bytes, which means you have to multiply it by 4 on 32-bit and by 8 on 64-bit systems (so a process showing Stack+heap: 1000000 on a 64-bit VM is really using about 8 MB). Can't find a reference in the manual for this, but Processes talks about it a bit.

Resources