Why create DLLs instead of compiling everything into one big executable?

I have seen, and built myself, many small products where a single piece of software is split into one executable and several DLLs, and those DLLs are not shared libraries written by somebody else, but libraries written exclusively for this software by the same development team. (I'm not talking here about large-scale products that require hundreds of DLLs and share them extensively with other products.)
I understand that separating code into several parts, each one compiling into a separate DLL, is good from the point of view of a developer. It means that:
If a developer changes one project, he has to recompile only this one, and dependent ones, which can be much faster.
A project can be done by a single developer in a team, while other developers will just use provided interfaces, without stepping into the code.
Auto updates of the software may sometimes be faster, with lower server impact.
But what about the end user? Isn't it just bad to deliver a piece of software composed of one EXE & several DLLs, when everything could be grouped together? After all:
The user may not even understand what those files are and why they take up space on his hard disk,
The user may want to move a program, for example to save it on a USB flash drive. Having one big executable makes things easier,
Most anti-virus software will check each DLL. Checking one executable will be much faster than checking a smaller executable and dozens of libraries.
Using DLLs makes some things slower (for example, in .NET Framework, the right library must be found and, if it is signed, verified),
What happens if a DLL is removed or replaced by a bad version? Does every program handle this? Or does it crash without even explaining what's wrong with it?
Having one big executable has some other advantages.
So isn't it better, from the end user's point of view, to deliver small and medium-sized programs as one big executable? If so, why are there no tools that make this easy (for example, a magic tool integrated into common IDEs that compiles the whole solution into one executable, not every time, of course, but on demand or during deployment)?
This is somewhat similar to putting all CSS or all JavaScript files into one big file for the user. Having several files is much smarter for the developer and easier to maintain, but linking each page of a website to two files instead of dozens optimizes performance. In the same manner, CSS sprites are awful for designers, because they require much more work, but are better from the user's point of view.

It's a tradeoff
(You figured that out yourself already ;))
For most projects, the customer doesn't care about how many files get installed, but he cares how many features are completed in time. Making life easier for developers benefits the user, too.
Some more reasons for DLLs
Some libraries don't play well together in the same build, but can be made to behave in a DLL (e.g. one DLL may use WTL3, the other requires WTL8).
Some of the DLLs may contain components to be loaded into other executables (global hooks, shell extensions, browser add-ons).
Some of the DLLs might be 3rd party, only available as DLLs.
There may be reuse within the company: even if you see only one "public" product, the DLL might be used in a dozen internal projects.
Some of the DLLs might have been built with a different environment that's not available to all developers in the company.
Standalone EXE vs. Installed product
Many products won't work as standalone executable anyway. They require installation, and the user not touching things he's not supposed to touch. Having one or more binaries doesn't matter.
Build Time Impact
Maybe you underestimate the impact of build times, and of maintaining a stable build for large projects. If a build takes even 5 minutes, you could euphemistically call that "making developers think ahead, instead of tinkering until it seems to work OK". But it's a serious time eater, and creates a serious distraction.
Build time of a single project is hard to improve. Working on VC9, build parallelization within one project is shaky, as is the incremental linker. Link times are especially hard to "optimize away" by faster machines.
Developer Independence
Another thing you might underestimate.
To use a DLL, you need a .dll and a .h.
To compile and link source code, you usually need to set up include directories, output directories, install 3rd party libraries, etc. It's a pain, really.

Yes, it is better IMHO - and I always use static linking for exactly the reasons you give, wherever possible. Lots of the reasons that dynamic linkage was invented for (saving memory, for example) no longer really apply. OTOH, there are architectural reasons, for example plugin architectures, why dynamic linking may be preferable to static.

I think your general point about considering carefully the final packaging of deliverables is well made. In the case of JavaScript such packaging is indeed possible, and with compression makes a significant difference.
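As a rough illustration of that packaging step, the sketch below concatenates several JavaScript sources into one bundle and compares the raw size with the gzip-compressed size. The file names and their contents are made up for the example, and real build tools usually minify as well, which this does not attempt.

```python
import gzip


def bundle(sources):
    """Concatenate source texts (file name -> text) into one bundle, in name order."""
    parts = []
    for name in sorted(sources):
        # A JavaScript block comment marks where each original file starts.
        parts.append(f"/* {name} */\n{sources[name].rstrip()}\n")
    return "".join(parts)


def gzip_size(text):
    """Size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


# Hypothetical input files with deliberately repetitive content.
sources = {
    name: "\n".join(
        f"function handler{i}(e) {{ console.log('handler {i}', e); }}"
        for i in range(20)
    )
    for name in ("menu.js", "forms.js", "widgets.js")
}

bundled = bundle(sources)
raw_size = len(bundled.encode("utf-8"))
packed_size = gzip_size(bundled)
print(f"{len(sources)} files -> 1 bundle, {raw_size} bytes raw, {packed_size} bytes gzipped")
```

The page then references one file instead of three, and the compressed transfer is smaller than the raw bundle; how much smaller depends entirely on the actual code.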

I've done lots of projects and never met an end user who had any problem with some DLL files residing on his box.
As a developer, I would say yes, it could matter. As an end user, who cares?

Yes, it may often be better from the end user's point of view. However, the benefits to the developer (and the development process) that you mention often mean that a business will prefer the cost-effective option.
It's a feature that too few users will appreciate, and that will cost a non-trivial amount to deliver.
Remember that we on StackOverflow are "above average" users. How many (non-geek) family members and friends do you have that would really value the ability to install their software to a USB stick?

The big advantages of DLLs come from the boundaries and independence they introduce.
For example, in C/C++ only exported symbols are visible outside the DLL. Imagine a module A with a global variable "scale" and a module B with another global variable "scale": if you link everything together, the two names clash and you are heading for disaster. In this case a DLL may help you.
You can also distribute those DLLs as components to customers who do not use exactly the same compiler/linker options, and this is often a good way to do cross-language interop.

Related

How resource files in Windows work and why use them?

I'm getting frustrated at every tutorial/book using resource files but never explaining why they use them or how they work under the hood. I haven't been able to find any information whatsoever on this. I'm hoping someone could create a public answer for everyone to find later.
Some relevant information may include...
What is the rationale behind resource files?
Are they a feature of Windows or of language compilers?
Why should you not just create a GUI via code only?
Are there situations where only a resource file can be used or only code can be used?
How do resource file entries get converted into actual window objects at runtime?
What exactly does the resource compiler do with the entries and what does the compiled format contain?
Is a difference in loading times created using resource files rather than code?
What's your recommendation on using resources or code?
Any additional information would be appreciated.
Traditionally in Windows development, resource files are embedded directly in the executable binary. I don't think this particular approach to resources is much used outside Windows, but Java has a cross-platform way of doing the same thing by storing resources in a compressed archive along with the executable bits. You can see the same thing in Android development, where resources are embedded in APK files. It's pretty common (except in Windows binaries) to use XML to specify resources.
Packing resources into an executable has the potential advantage of offering a "single-file solution"; the executable contains the application logic and the supporting resources all bundled into one file. This can be helpful in some situations, but it's pretty rare to find a substantial application that can be distributed as a single file. In any event, this method of packing resources goes back to the earliest days of Windows development; Microsoft probably drew on approaches that were common in other desktop micro platforms of that time, all of which, I guess, are now long dead.
Resources allow certain things -- typically user interface elements - to be specified declaratively, rather than in code. A nice feature is to allow these declarative elements to be locale-dependent. So, for example, we can specify a main menu for English, one for French, and so on, and have the proper one picked at run-time.
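The idea of declaring one menu per locale and picking the right one at run time can be sketched as a lookup table with a fallback chain. This is only an illustration: the table layout, the `find_resource` helper, and the fallback order are assumptions made for the example, not a description of how the Windows resource loader is implemented.

```python
# Declarative resource table: (resource name, locale) -> data.
# Hypothetical contents; in a real program this would come from a resource file.
RESOURCES = {
    ("MAIN_MENU", "en"): ["File", "Edit", "Help"],
    ("MAIN_MENU", "fr"): ["Fichier", "Edition", "Aide"],
}


def find_resource(name, locale, default_locale="en"):
    """Look up a resource, trying 'fr-CA', then 'fr', then the default locale."""
    candidates = [locale, locale.split("-")[0], default_locale]
    for candidate in candidates:
        data = RESOURCES.get((name, candidate))
        if data is not None:
            return data
    raise KeyError(f"no resource {name!r} for locale {locale!r}")


print(find_resource("MAIN_MENU", "fr-CA"))  # falls back from fr-CA to fr
print(find_resource("MAIN_MENU", "de-DE"))  # falls back to the default, en
```

The application code only ever asks for "MAIN_MENU"; adding another language means adding data, not changing logic.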
It's easier to have graphical design tools write and edit declarative resource files than code, although it's not impossible to have them output code directly. In any event, it's usually considered "nicer" to have user interface elements separated from application logic. With careful design, the user interface bits and other resources can be edited after the application itself is finalized, and can be maintained separately.
Classical Windows resources are compiled by a resource compiler into a packed binary format, poked in that format into object (.obj, .o) files, and then linked into the executable by the linker. I suspect that with modern development tools, all that stuff is completely hidden, and you just see the final executable. The Windows APIs know how to unpack resources from the executable, and turn them into programmatic representations -- not code, as such, but data in memory that can be used by other API calls.
In my experience, Windows (binary) resources don't have any significant overhead over explicit coding, although processing an XML-based resource can be a bit slower.
Coding a user interface usually offers more flexibility than using resources; it's not uncommon to see both approaches used in the same application.

Does exporting everything from DLL affect performance

To do unit testing I had to export many small, internal classes which were never intended for consumption by the clients of my DLL.
I know that each exported function results in a stub in the executable image and that the Windows loader has to perform fix-ups on those stubs if the DLL is not loaded at its preferred location.
Someone suggested building the DLL as a static lib, solely for the purposes of unit testing.
I wonder if it is worth the trouble? I could find no reference on how significant a problem exporting every class from a DLL may be, or on whether there is any significant gain in loader performance and memory consumption if I am selective about it.
I think I read somewhere that GCC compiler exports everything by default.
EDIT: since the stated motivation for the question is disputable, let me rephrase it:
Should I go through my DLLs and remove DLLEXPORT on all classes that are not exposed to its clients? Let's say I am working with a bunch of legacy DLLs and I noticed they have a lot of unnecessary exports. Will that improve the speed of loading? Specifically on Windows 7 and 8 using MSVC version 9+.
"Does exporting everything from DLL affect performance?"
It probably does; however, the effect is immeasurably small. I made a Python script that creates a test DLL exporting > 50,000 symbols. It consists of 1024 exported classes that each contain 48 functions (16 member functions, 16 virtuals and 16 static functions). The compiler also generates about 4-5 exports for each class that appear to be things like the vtable.
I measured load time of the application using SysInternals ProcMon. The load time on the very ancient underpowered test machine before linking the DLL was between 15-30ms. Adding the DLL, and one call to each of the ~50,000 exported functions resulted in no measurable change.
This is not a completely conclusive test, but it is good enough to convince me that the symbol resolution and fix-ups are probably an order of magnitude or more faster than any other limiting factor.
Interestingly, creating such an insane DLL with the Microsoft tools required adding the /bigobj compiler flag, and it appears there is also a limit of 64K exported symbols in the PE format of a DLL. Furthermore, the build-time compile and link phases for the DLL and the application each took many minutes and used a lot of memory.
So you will be pushing on all kinds of other limits before you get to loader performance problems.
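The script itself is not shown, so the following is only a minimal sketch of what such a generator could look like, assuming the class layout described above (16 member, 16 virtual and 16 static functions per class). The names `make_class`, `make_source`, `Exported<N>` and `TEST_API` are invented for this sketch, and the generated C++ is not the actual test code used for the measurements.

```python
FUNCS_PER_KIND = 16  # 16 member + 16 virtual + 16 static = 48 functions per class


def make_class(index):
    """Return C++ text for one exported class plus its out-of-line definitions."""
    name = f"Exported{index}"  # hypothetical class name
    decls, defs = [], []
    for i in range(FUNCS_PER_KIND):
        decls.append(f"    int member{i}();")
        decls.append(f"    virtual int virt{i}();")
        decls.append(f"    static int stat{i}();")
        for kind in ("member", "virt", "stat"):
            defs.append(f"int {name}::{kind}{i}() {{ return {i}; }}")
    body = "\n".join(decls)
    return f"class TEST_API {name} {{\npublic:\n{body}\n}};\n" + "\n".join(defs) + "\n"


def make_source(class_count):
    """Return one C++ source text containing class_count exported classes."""
    header = "#define TEST_API __declspec(dllexport)\n\n"
    return header + "\n".join(make_class(n) for n in range(class_count))


source = make_source(1024)
# Every function definition contains exactly one '::', so this counts them.
explicit_functions = source.count("::")
print(explicit_functions)  # 1024 * 48 = 49152, before compiler-generated exports
# To try it with a real compiler, write `source` to a .cpp file and build it as a DLL.
```

With the 4-5 compiler-generated exports per class mentioned above, 49,152 explicit functions end up in the "more than 50,000 symbols" range while staying under the 64K limit.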
"Let's say I am working with a bunch of legacy DLLs and I noticed they have a lot of unnecessary exports. Will that improve the speed of loading?"
Nope.
"Should I go through my DLLs and remove DLLEXPORT on all classes that are not exposed to its clients?"
It depends.
Not simply because of load performance. If this was so critical to the application, then presumably somebody would be benchmarking the startup and would know exactly where the performance problems were. We shouldn't guess about performance impacts:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Knuth
However, there may be other reasons for not exporting those "internal" classes and functions. The point to exporting a class/function is so that client code can use it. It should match the DLL's logical external API. If that wasn't the case for a function or class, then it shouldn't have been exported. If there is a lot of functionality in internal classes that cannot be used, or tested, without going through the DLL's public API, it makes one wonder why that functionality exists? If the intent was to create generic reusable classes, perhaps they should be in a library of their own?
Test Driven Design doesn't mean you have to go around exposing everything publicly. And DLL exports are not necessarily required for even the most invasive white-box unit testing of a class. For example, the unit test fixture could be built monolithically and statically link (or even directly include the sources) to whatever internal classes are needed.
Conversely, a completely excusable explanation for having done it this way may be simply that it was easy and simple to implement. If everything else is essentially equal (modulo style and some theoretical architecture concerns), it is also poor form to needlessly change and disrupt a system that was already done a certain way and is working fine.
So this may be a design that should not be copied or extended, and maybe it is worth cleaning it up whenever maintenance or refactoring opportunities come up.
"I think I read somewhere that GCC compiler exports everything by default."
The MinGW ld documentation concurs. Note, though, that it says this auto-export behavior is disabled if you export with __declspec or .DEF files.

Embed and execute native code from memory

I want to do these two things with my application (Windows only):
Allow the user to insert (with a tool) native code into my application before starting it.
Run this user-inserted code straight from memory during runtime.
Ideally, it would have to be easy for user to specify this code.
I have two ideas how to do this that I'm considering right now:
User would embed a native dll into application's resources. Application would load this dll straight from memory using techniques from this article.
Somehow copy assembly code of .dll method specified by user into my application resources, and execute this code from heap as described in this article.
Are there any better options to do this? If not, any thoughts on what might cause problems in those solutions?
EDIT
I specifically do not want to use LoadLibrary* calls, as they require the DLL file to already be on the hard drive, which I'm trying to avoid. I'm also trying to make disassembling harder.
EDIT
Some more details:
Application code is under my control and is native. I simply want to provide user with a way to embed his own customized functions after my application is compiled and deployed.
User code can have arbitrary restrictions placed on it by me, it is not a problem.
The aim is to allow third parties to statically link code into a native application.
The obvious way to do this is to supply the third parties with the application's object files and a linker. This could be wrapped up in a tool to make it easy to use.
As ever, the devil is in the detail. In addition to object files, applications contain manifests, resources, etc. You need to find a linker that you are entitled to distribute. You need to use a compiler that is compatible with said linker. And so on. But this is certainly feasible, and likely to be more reliable than trying to roll your own solution.
Your option 2 is pretty much intractable in my view. For small amounts of self-contained code it's viable. For any serious amount of code you cannot realistically hope for success without re-inventing the wheel that is your option 1.
For example, real code is going to link to Win32 functions, and how are you going to resolve those? You'd have to invent something just like a PE import table. So why do that when DLLs already exist? If you invented your own PE-like file format for this code, how would anyone generate it? All the standard tools are in the business of making PE format DLLs.
As for option 1, loading a DLL from memory is not supported. So you have to do all the work that the loader would do for you if it were loading from file. So, if you want to load a DLL that is not present on the disk, then option 1 is your only choice.
Any half competent hacker will readily pull the DLL from the executing process though so don't kid yourself that running DLLs from memory will somehow protect your code from inspection.
This is called "application virtualization". There are 3rd party tools for that; search for them on Google.
In a simple case, you can load the DLL image into memory yourself, apply relocations, set up imports, and call the entry point.

Substituting a dll, to monitor dll usage

Let's say I have a console application that writes to a file. If I understand correctly, C++ uses some DLL to create and write to the file.
Is it possible to create a DLL with the same name and the same function signatures, and forward these calls to the real API? The application would not see any change, and it would be possible to log or restrict certain calls.
My worry is: is there any security signature that applications check in a DLL?
Would there be any conflicts with the library names?
You don't need to create a new DLL to replace the original, nor should you. That would have global repercussions on the entire OS. What you should do instead is have your app use Detours to hook the particular DLL functions you are interested in. That way, you are not modifying any DLLs at all, and the OS can do its normal work, while still allowing your custom code to run and decide whether to call the original DLL functions or not.
Yes, it's entirely possible: you can already figure out what the function signatures are and re-implement them (heh, Google already did this with the Java JRE :) ).
The problem you have is loading a different DLL with the same name, though it's entirely possible to do this explicitly with a fixed directory. You can load the DLL and then hook up all its functions.
At least that's what I think will happen - having 2 dlls of the same name in the same process might be troublesome (but I think, different path, all's ok).
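With the Microsoft linker, one common way to build such a pass-through DLL is export forwarding in a module-definition (.def) file, where an entry of the form `name=other_module.name` forwards that export to another DLL. The sketch below only generates the text of such a file. The module names `fileapi_proxy` and `real_fileapi` and the function list are placeholders, and any call you want to monitor rather than blindly forward still needs a hand-written wrapper function in the proxy.

```python
def make_forwarding_def(proxy_name, target_module, forwarded, intercepted=()):
    """Build .def file text: forwarded names go to target_module,
    intercepted names are exported from the proxy's own code."""
    lines = [f"LIBRARY {proxy_name}", "EXPORTS"]
    for name in forwarded:
        # Forwarder entry: the loader resolves it against target_module.
        lines.append(f"    {name}={target_module}.{name}")
    for name in intercepted:
        # Plain entry: the proxy must implement this function itself.
        lines.append(f"    {name}")
    return "\n".join(lines) + "\n"


def_text = make_forwarding_def(
    proxy_name="fileapi_proxy",      # placeholder name
    target_module="real_fileapi",    # placeholder for the renamed original DLL
    forwarded=["ReadFile", "CloseHandle"],
    intercepted=["CreateFileW", "WriteFile"],
)
print(def_text)
```

Listing the exports of the original DLL (for example with a tool such as dumpbin) would give the real list of names to feed in; the four used here are just examples.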
Security checks generally aren't done when loading DLLs. MS does do this with some .NET assemblies, but the cost is that they take a long time to load, as there's a significant delay caused by the signature verification required to secure the DLL. This is why a lot of .NET applications (especially those that use DLLs installed in the GAC) are perceived as slow to start: there can be a significant amount of security checking occurring.
I think, generally, if someone has enough access to your computer to install a DLL, he could do a lot worse. A skilled hacker would just replace the original DLL with a new one that does all of the above, and then you wouldn't be able to see a new, rogue DLL lying around your system.
If you are security-conscious and worried about this kind of thing, the correct way to resolve it is with an intrusion-detection system like AIDE. This scans your computer and builds a database of all the files present, with a secure hash of each. You then re-scan at regular intervals and compare the results with the original DB: any changes will be obvious and can be flagged for investigation or ignored as legitimate changes. Many Linux servers do this regularly as part of their security hardening. For more info, go to ServerFault.
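The scan-and-compare principle can be sketched in a few lines. This is not AIDE, just an illustration with hypothetical helper names `snapshot` and `compare`; a real tool also protects the database itself and tracks permissions and other metadata.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each file under root (relative path) to its SHA-256 hex digest."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(dirpath, filename)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            result[os.path.relpath(path, root)] = digest.hexdigest()
    return result


def compare(old, new):
    """Return sorted lists of (added, removed, changed) relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed


def write(root, name, data):
    with open(os.path.join(root, name), "wb") as handle:
        handle.write(data)


# Demo in a throwaway directory: take a baseline, tamper, then re-scan.
with tempfile.TemporaryDirectory() as root:
    write(root, "app.exe", b"program")
    write(root, "app.dll", b"original library")
    baseline = snapshot(root)
    write(root, "app.dll", b"replaced library")  # simulated tampering
    write(root, "extra.dll", b"rogue library")   # simulated new file
    added, removed, changed = compare(baseline, snapshot(root))

print("added:", added, "removed:", removed, "changed:", changed)
```

A replaced DLL shows up in `changed` even though its name and location are the same, which is exactly the case a directory listing would miss.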

Speeding up WIX compiles

I have a WiX 3.0 installer that is building 88 slightly different builds (the cross product of 32-bit and 64-bit, 11 locales, and four editions: Beta, Retail, Evaluation, Different Evaluation).
Each build has slightly different contents in addition to localized UI, so I can't just build one configuration with multiple locales.
The resulting MSI is about 120MB. I'm already using the CabCache.
The installer takes about 3-5 minutes per release to build, resulting in a pretty lengthy overall build time.
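To put numbers on that, here is the build matrix as a small sketch. The 11 locales are not named here, so the locale identifiers are placeholders, and the totals assume the builds run strictly one after another.

```python
from itertools import product

ARCHITECTURES = ["x86", "x64"]
# Placeholder names: the actual 11 locales are not listed.
LOCALES = [f"locale{n:02d}" for n in range(1, 12)]
EDITIONS = ["Beta", "Retail", "Evaluation", "Different Evaluation"]

builds = list(product(ARCHITECTURES, LOCALES, EDITIONS))
total_low = len(builds) * 3   # minutes, at 3 minutes per build
total_high = len(builds) * 5  # minutes, at 5 minutes per build

print(len(builds))            # 88
print(total_low, total_high)  # 264 440
```

So a full serial run is roughly four and a half to seven and a half hours, which is why even a modest per-build saving is worth chasing.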
The install appears to be heavily disk bound during linking (light.exe).
Clearly making the disks faster could help. Does anybody have advice on how to set up a machine that could crank through these installers faster? (or advice on reconfiguring my WIX project to build more efficiently?)
Get an SSD. Like one of those with internal RAID architecture from e.g. OCZ. SSD is every developer's upgrade of the decade. Plus more RAM if swapping is an issue.
If you have common parts (that are not localized) you can create a merge module with the common parts and then just add the differencing stuff to each build.
I am not sure if you have any say or communication with the developers of the application that you are installing, but if you have to create that many MSI's mainly because of languages, have you considered just offering one Language MSI that delivers all the language specific files to a resources directory and then the user can choose which language they would like to use (but only install this if they need something other than the default language). Also it might be worth looking into having the product made in such a way that the user can pick from within which language is best, then having all the languages installed from the start.
As for your question about speeding up the build, that is a tricky one. Using Merge Modules I would rule out right away, as I don't see any actual gain coming out of that. Of course, upgrading the hardware (as you said) will give some results, but again, I am not sure how much of a jump you would be making, so it is hard to tell what kind of gain that would give. I think it might be best to go over your WXS with a fine-tooth comb and see what is really going on in there. You can sometimes find things left over from the development of the package, or from a previous tool, that are really slowing you down. One example: my company recently switched to WiX from a more automated setup creation utility (leaving the name out on purpose because I am listing the problems with it :P ), and it automatically created every folder under Windows that might possibly be needed to run a Windows application, as well as the common files folder, the current user profile, and many, many more. I think I ended up erasing over 100 empty directories in all that this old technology was nice enough to add for me. That is just one example of an optimization that was done. It is amazing what can be found when you take the time to REALLY review what is going on under the hood.
In your .wixproj file, add this inside a <PropertyGroup> element, just before the end of the file:
<IncrementalGet>true</IncrementalGet>
This will tell WIX to compile only those files which are changed after the previous build.

Resources