DLLs relocated to their preferred address - Windows

On Windows Server 2003, my application has started taking a long time to load on a fresh install. Suspecting that the DLLs are not loading at their preferred addresses and that this is what costs the time (the application has over 100 DLLs, third parties included), I ran the Sysinternals ListDLLs utility, asking it to flag every DLL that has been relocated. Oddly enough, for most of the DLLs in the list I get something like this:
  Base        Size      Path
### Relocated from base of 0x44e90000:
  0x44e90000  0x39000   validation.dll
That is: they are flagged as relocated (and the load time definitely seems to support that theory) but their load address remains the preferred address.
Some third party DLLs seem to be immune from this, but as a whole this happens to ~90% of the DLLs loaded by the application.
On Windows 7, it would seem the only flagged DLLs are the ones that actually move, and loading time is (as expected) significantly faster.
What is causing this? How can I stop it?
Edit: Since this sounds (in theory) like the effect of ASLR, I checked, and while the OS DLLs are indeed ASLR-enabled, ours are not. And even the OS DLLs are relocated "in place", so they are not taking up the preferred address of any of the other DLLs.
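(One way to do that check, assuming the Visual Studio tools are available; "validation.dll" is just the example name from the listing above. An image that opted in to ASLR shows "Dynamic base" under "DLL characteristics" in the optional header dump:)

    dumpbin /headers validation.dll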

This is very common. Setting the linker's /BASE option is often overlooked, and maintaining it as DLLs grow is an unpleasant maintenance task. The result also doesn't tend to repeat well between operating system versions: they load different DLLs ahead of yours, which can force a relocation on one version but not the other. And a single relocation can set off a train of forced relocations in all the subsequent DLLs.
Having this noticeably affect loading time is a bit of a stretch on modern machines. The relocation itself is very fast, just a memory operation. What you do pay for is the memory used by the relocated DLL getting committed. That's required because the original DLL file is no longer a suitable backing store for the relocated code once it gets swapped out; those pages are now backed by the paging file instead. If the paging file needs to grow to accommodate the extra commit size, that costs time, but that's not common.
The far more common culprit in loading time is the speed of the disk drive. It matters when you have a lot of DLLs: they all need to be located on disk on a cold start, and with 100 DLLs that can easily cost 5 seconds. You should suspect a cold start problem when you don't see the delay after you terminate the program and start it again. That's a warm start: the DLLs are already present in the file system cache, so they don't have to be found again. Solving a cold start problem requires better hardware; SSDs are nice. Or the machine learning your usage pattern, so SuperFetch will pre-fetch the DLLs for you before you start the program.
Anyhoo, if you do suspect a rebasing problem then you'll need to create your own memory map to find good base addresses that don't force relocation. You need a good starting point: knowing the load order and sizes of the DLLs. You can get that from, say, the VS debugger. The Output window shows the load order, and the Debug + Windows + Modules window shows the DLL sizes. The linker supports specifying a .txt file with the base addresses in the /BASE option; that's the best way to do this so you don't constantly have to tinker with individual /BASE values while your code keeps growing.
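As a rough sketch of what that looks like (the DLL names, addresses and sizes here are made up; check the /BASE documentation for your toolset version for the exact file format):

    ; dllbases.txt - one line per DLL: key, preferred base, optional maximum size
    core        0x60000000   0x00400000
    network     0x60400000   0x00200000
    validation  0x60600000   0x00100000

Each DLL is then linked with a switch along the lines of:

    link /DLL /BASE:@dllbases.txt,validation  (plus your usual objects and options)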

Related

Why can't running exes and loaded dlls be deleted on Windows?

I mean, what's the point? They're in system memory anyway.
I couldn't find any "official" docs that explain why Windows protects loaded objects (EXE, DLL and even OCX).
I'm guessing:
An intended measure, for security or to guard against human error
A file system limitation
On Unix we can easily delete any file unless it is explicitly locked. This only hinders the user experience in my opinion. Google "how to delete dll" if you need proof. Many people have suffered from it, and I'm one of them.
Has Microsoft said anything about this?
Is there any way to disable this "protection"? (Probably there isn't and never will be, because Windows!)
They're in system memory anyway.
No, they're not. Individual pages are loaded on demand, and discarded from RAM when the system decides that they've been unused for a while and the RAM could be put to better use for another process (or another page in this process).
Which means that, effectively, the EXE file is open for as long as the process is running, and the DLL file is open until/unless the process unloads the DLL, in both cases so pages can be loaded/reloaded as needed.
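A quick way to see this in action, if you want one - a minimal Win32/C++ sketch, where "some.dll" is a placeholder for any DLL you can load from a writable location:

    // Demonstrates that a DLL mapped into a process cannot be deleted.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HMODULE mod = LoadLibraryW(L"some.dll");     // maps the image into this process
        if (!mod) { std::printf("LoadLibrary failed: %lu\n", GetLastError()); return 1; }

        // The image is backed by the file on disk, so the delete is refused.
        if (!DeleteFileW(L"some.dll"))
            std::printf("DeleteFile failed, error %lu\n", GetLastError());

        FreeLibrary(mod);   // once the last reference is gone, the delete would succeed
        return 0;
    }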

Does ASLR mean rebasing dlls isn't required?

Am I right in thinking there is no point in rebasing our DLLs during our build if we use ASLR, as the DLLs will be rebased again anyway when the kernel comes to load them?
I am concerned that our application is often used on Terminal Services machines. So, if rebasing occurs at load time, we could end up with dlls being rebased for each process they are loaded into (there would be one process per session). And this would result in more memory usage and paging than we want to pay for. Do I need to be concerned?
I've found the following blog post that says the rebasing only happens once and it is system wide: Matt Evans - Enabling ASLR for memory savings?. I haven't seen any other references about this, so I just wanted to be sure: if I use ASLR and don't rebase during our build, will I cause memory problems on a Terminal Services box?
Based on my reading you should not have a problem. ASLR causes the DLLs to be loaded at a semi-random memory address, and should not just start rebasing them for every process. If you want to check the memory use of DLLs, there is a free tool called MassiveRebase that lets you dynamically load two DLLs and view info about their memory use. It was designed to view the effect ASLR may have on memory.
The tool and more about it can be found here: http://www.tmurgent.com/appv/index.php/en/resources/tools/137-massive-rebase
Hope this helps.
Rebasing is still helpful. When the operating system boots, it picks a random offset that is applied to the base address of each ASLR-enabled DLL.
The result is that the address a DLL is loaded at is typically fixed for a single boot, but different between machines and between boots.
This means a DLL that is loaded into lots of processes can still be shared between them, because its code pages are identical in every process: the fixups were all applied with the same value.
When a DLL has to move because its address range is already taken, the loader must apply the fixups again; less of the DLL is then shared, which increases system load.
If your DLL is not shared, then it does not affect resources.
Fixing up a DLL used to be cheaper when it was loaded at its preferred address; I'm not sure whether that is still true under ASLR, but rebasing may still save some resources and loading time.
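If you want to check whether a particular DLL has opted in to ASLR at all, here is a minimal Win32/C++ sketch that reads the DllCharacteristics field of the PE optional header of a loaded module ("mymodule.dll" is a placeholder name):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HMODULE mod = LoadLibraryW(L"mymodule.dll");
        if (!mod) { std::printf("LoadLibrary failed: %lu\n", GetLastError()); return 1; }

        // Walk from the DOS header to the NT headers of the mapped image.
        const BYTE* base = reinterpret_cast<const BYTE*>(mod);
        const IMAGE_DOS_HEADER* dos = reinterpret_cast<const IMAGE_DOS_HEADER*>(base);
        const IMAGE_NT_HEADERS* nt  = reinterpret_cast<const IMAGE_NT_HEADERS*>(base + dos->e_lfanew);

        bool dynamicBase = (nt->OptionalHeader.DllCharacteristics &
                            IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) != 0;
        std::printf("ASLR (dynamic base): %s\n", dynamicBase ? "enabled" : "disabled");

        FreeLibrary(mod);
        return 0;
    }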

Does Windows XP have an equivalent to VAX/VMS Installed Shared Images?

Back in the good old/bad old days when I developed on VAX/VMS it had a feature called 'Installed Shared Images' whereby if one expected one's executable program would be run by many users concurrently one could invoke the INSTALL utility thus:
$ INSTALL
INSTALL> ADD ONES_PROGRAM.EXE/SHARE
INSTALL> EXIT
The /SHARE flag had the effect of separating out the code from the data so that concurrent users of ONES_PROGRAM.EXE would all share the code (on a read-only basis of course) but each would have their own copy of the data (on a read-write basis). This technique/feature saved Mbytes of memory (which was necessary in those days) as only ONE copy of the program's code ever needed to be resident in VAX memory irrespective of the number of concurrent users.
Does Windows XP have something similar? I can't figure out if the Control Panel's 'Add Programs/Features' is the equivalent (I think it is, but I'm not sure)
Many thanks for any info
Richard
p.s. INSTALL would also share Libraries as well as Programs in case you were curious
The Windows virtual memory manager will do this automatically for you. So long as the module can be loaded at the same address in each process, the physical memory for the code will be shared between each process that loads that module. That is true for all modules, libraries as well as executables.
This is achieved by the linker marking code sections as read-only and shareable; writable data sections, by contrast, get a private (copy-on-write) copy in each process.
The bottom line is that you do not have to do anything explicit to make this happen.
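If you want to convince yourself, here is a minimal Win32/C++ sketch: print the base address a system DLL is mapped at, run it from two different processes in the same boot session, and the address will normally be identical - which is what lets the memory manager share the code pages behind it.

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // kernel32.dll is loaded in every Win32 process; its HMODULE is its base address.
        HMODULE k32 = GetModuleHandleW(L"kernel32.dll");
        std::printf("kernel32.dll base address in this process: %p\n",
                    static_cast<void*>(k32));
        return 0;
    }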

How to speed up Visual Studio 2008 for many-project solutions?

I am aware that there are a couple of questions that look similar to mine, e.g. here, here, here or here. Yet none of these really answer my question. Here it goes.
I am building a modified version of a Chromium browser in VS2008 (mostly written in C++). It has 500 projects in one solution with plenty of dependencies between them. There are generally two problems:
Whenever I start debugging (press F5 or the green Play button) for the first time in a session, VS freezes and it takes a couple of minutes before it recovers and actually starts debugging. Note that I have disabled building before running, because whenever I want to build my project I use F7 explicitly. I do not understand why it takes so long to "just" start a compiled binary. Probably VS is checking all the dependencies and making sure everything is up to date despite my request not to build the solution before running. Is there a way to speed this up?
Every time I perform a build it takes about 5-7 minutes, even if I have only changed one instruction in one file. Most of the time is consumed by the linking process, since most projects generate static libs that are then linked into one huge DLL. Apparently incremental linking only works in about 10% of the cases, and even then it still takes quite a long time. What can I do to speed it up?
Here is some info about my software and hardware:
MacBook Pro (Mid-2010)
8 GB RAM
dual-core Intel i7 CPU with HT (which makes it look like 4-core in Task Manager)
500GB Serial ATA; 5400 rpm (Hitachi HTS725050A9A362)
Windows 7 Professional 64-bit
Visual Assist X (with disabled code coloring)
Here are some things that I have noticed:
Linking only uses one core
When running the solution for the second time in one session it is much quicker (under 2-3 seconds)
While looking up information on the VS linker I came across this page:
http://msdn.microsoft.com/en-us/library/whs4y2dc%28v=vs.80%29.aspx
Also take a look at the two additional topics on that page:
Faster Builds and Smaller Header Files
Excluding Files When Dependency Checking
I have switched to the component build mode for the Chromium project, which reduced the number of files that need to be linked. Component build mode creates a set of smaller DLLs rather than a set of static libraries that are then linked into the huge chrome.dll. I am also using incremental linking a lot, which makes linking even faster. Finally, linking for the second and subsequent times gets faster since the necessary files are already cached in memory and disk access is unnecessary. Thus, when working incrementally and linking often, I get down to as low as 15 seconds for linking webkit.dll, which is where I mostly change the code.
Execution shows the same behavior as linking: it is slow only the first time, and with every subsequent run it gets faster until it takes less than 3-5 seconds to start the browser and load all symbols. Windows caches the most frequently accessed files in memory.
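For reference, a rough sketch of how I enabled this - the variable name below is the one the GYP-based Chromium build of that era used, so verify it against your checkout before relying on it; incremental linking itself is the linker's /INCREMENTAL switch, found under Configuration Properties > Linker > General in VS2008:

    rem Enable the Chromium component build (many small DLLs instead of static libs)
    set GYP_DEFINES=component=shared_library
    rem Regenerate the Visual Studio solution and projects
    gclient runhooks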

Are there risks associated with IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP or IMAGE_FILE_NET_RUN_FROM_SWAP?

I'm thinking of including the IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP and IMAGE_FILE_NET_RUN_FROM_SWAP PE flags to my executable.
The idea is to prevent occasional exceptions seen by clients who run the executable from the network, for example when network volumes fail to reconnect after sleep. Up to now we have always advised clients to run executables from locally connected volumes.
However, I don't know enough about virtual memory, the loader etc. to know what, if any, risks there are associated with using these PE flags.
For example, if I do this will more physical memory be consumed by my executable, especially if there are multiple instances of the executable running at the same time?
I'm sorry that I can't give more examples of potential risks, but that's the nature of my question. I have a feeling that there could be downsides to doing this but simply don't know what those downsides could be.
The PE loader works together with the virtual memory manager. Simply put, your executable isn't so much loaded as demand-paged in. And, of course, demand-paged out. Since executable images are locked and don't change, this works quite well: no swap is needed, RAM just contains the most recently used parts.
The PE flags change this. If the conditions are satisfied, the executable isn't locked and might change/disappear. This means the VMM has to keep all its pages either in RAM or swap, even at startup. That's a lot of copying and RAM use, but as a result the loss of the network no longer causes page-in faults. And when RAM is low, pages can't be discarded but have to be saved to swap.
In particular, these flags only take effect when their respective condition applies: IMAGE_FILE_NET_RUN_FROM_SWAP does not affect apps that are run locally. So the only customers that pay the price in RAM/swap are those that choose to run that way.
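For what it's worth, you normally don't patch the PE header by hand to get these flags; the Microsoft toolchain exposes them as /SWAPRUN options (a sketch - check the linker and EDITBIN documentation for your version for the exact spelling):

    At link time:
        link /SWAPRUN:NET /SWAPRUN:CD  (plus your usual objects and options)
    On an already-built binary:
        editbin /SWAPRUN:NET /SWAPRUN:CD MyApp.exe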

Resources