Visual Studio: use the results of Profile Guided Optimization from one exe to a different dll? - visual-studio

I have a dll, call it core.dll, which I want to optimize using Visual Studio's excellent Profile Guided Optimization. Most of the code in the dll actually compiles into a library called core.lib, which is then wrapped by core.dll.
To unit-test this code I also have a tester executable called test_core.exe. This executable links to core.lib and exercises various functions from it. The DLL core.dll has very few exports, only enough to start its main functionality, so it cannot be fully unit-tested through those exports.
What I want is to collect the PGO data by running some of the tests in test_core.exe and then use that data to link and optimize core.dll.
It seems that the Visual Studio framework was designed so that the collecting executable and optimized executable are the same.
One option is to move the relevant tests into core.dll and run them through a special export, but that would bloat core.dll with test code that is not used in any other circumstance.

It seems that the Visual Studio framework was designed so that the collecting executable and optimized executable are the same.
That was very, very intentional. Profile guided optimization can only work properly when it uses profile data collected from a realistic simulation of the way your users are going to run your program. That requires the actual executables as deployed to the user, fed realistic data that is a reasonable match for the data the program is going to process at the user's site.
Trying to spike it with unit-test profile results will achieve the opposite: your user isn't going to run the code the same way. There are significant odds that you'd end up with a less optimized program, if that were possible. The profile data you've gathered is only good enough to optimize the unit test, and that's not useful.
Don't try to cook the profile data; it can't work. This does mean that you can't necessarily measure the effectiveness of the optimization easily if you need a unit test to see a signal. In that case you just have to assume that PGO gets the job done.

It seems that the Visual Studio framework was designed so that the collecting executable and optimized executable are the same.
This is true, but in your case you want to optimize a DLL, not an executable. You can compile the static library and the DLL using the /GL switch and link the DLL using the /LTCG:PGINSTRUMENT switch. This creates a DLL that is instrumented. The test_core.exe image doesn't have to be instrumented, so you can just compile it normally (in Debug or Release mode). Then, by running test_core.exe, a PGC file will be generated containing a profile of the behavior of core.dll only. This profile can then be used to optimize core.dll by linking it again with the /LTCG:PGOPTIMIZE switch. As long as test_core.exe exercises core.dll for common usage scenarios, you'll certainly benefit from it. See this for more information.
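Roughly, the command sequence being described looks like the following (a hedged sketch only; the file names core.cpp and core.obj are placeholders, and in a real project you would set the equivalent options in the C/C++ and Linker property pages):
rem 1. Compile the library and DLL sources with whole-program optimization.
cl /c /O2 /GL core.cpp
rem 2. Link the DLL in instrumented mode; this also creates the .pgd profile database.
link /DLL /LTCG:PGINSTRUMENT core.obj /OUT:core.dll
rem 3. Build test_core.exe normally, then run the tests; each run writes .pgc profile data.
test_core.exe
rem 4. Re-link the DLL using the collected profile.
link /DLL /LTCG:PGOPTIMIZE core.obj /OUT:core.dll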

Related

Why does the debug dll have a dependency on a certain dll while the release dll does not

This seems to be a tricky question, partly because it is difficult to present all the settings involved here. So let me give the whole picture:
I use Visual Studio 2022 to build a solution (all in x64 mode) consisting of two projects,
CapopConsole
and
CapopWrap
CapopConsole is a C# console project generating an exe, and CapopWrap is a C++/CLI project wrapping a native C++ library called Capop; it generates a dll called CapopWrap.dll. CapopConsole has CapopWrap as a dependency.
CapopWrap has three extra .lib dependencies, namely
capoplib.lib, libpredicates.lib, jsoncpp.lib
in Release mode and
capoplib.lib, libpredicates_d.lib, jsoncppd.lib
in Debug mode.
The first two in each row come from a particular project of mine; the last one is from the well-known jsoncpp project. All three libs are built with Visual Studio 2022 in the new CMake mode (no 'solutions').
Now the point is: compiling and linking CapopWrap produces a resulting dll for release and debug mode respectively, in both cases called CapopWrap.dll.
But: CapopWrap.dll from debug mode has a dependency on jsoncppd.dll whereas CapopWrap.dll from release mode has no explicit jsoncpp related dependency whatsoever (as shown with dumpbin, for example).
By symmetry one might expect that CapopWrap.dll (Release) would incur a dependency on jsoncpp.dll but this is not the case.
I would really be happy if someone could help me find an explanation for this; my own attempts so far were fruitless.
Addendum: I have not given concrete excerpts from my .csproj and cmake files at first, because they are a bit lengthy and maybe an answer is possible without them. If not, please tell me what you want to see from my configuration files (or filesystem contents) and I will edit it in.

Why would a library consist of both a .lib and .dll file? [duplicate]

I know very little about DLLs and LIBs other than that they contain vital code required for a program to run properly - libraries. But why do compilers generate them at all? Wouldn't it be easier to just include all the code in a single executable? And what's the difference between DLLs and LIBs?
There are static libraries (LIB) and dynamic libraries (DLL) - but note that .LIB files can be either static libraries (containing object files) or import libraries (containing symbols to allow the linker to link to a DLL).
Libraries are used because you may have code that you want to use in many programs. For example if you write a function that counts the number of characters in a string, that function will be useful in lots of programs. Once you get that function working correctly you don't want to have to recompile the code every time you use it, so you put the executable code for that function in a library, and the linker can extract and insert the compiled code into your program. Static libraries are sometimes called 'archives' for this reason.
Dynamic libraries take this one step further. It seems wasteful to have multiple copies of the library functions taking up space in each of the programs. Why can't they all share one copy of the function? This is what dynamic libraries are for. Rather than building the library code into your program when it is compiled, it can be run by mapping it into your program as it is loaded into memory. Multiple programs running at the same time that use the same functions can all share one copy, saving memory. In fact, you can load dynamic libraries only as needed, depending on the path through your code. No point in having the printer routines taking up memory if you aren't doing any printing. On the other hand, this means you have to have a copy of the dynamic library installed on every machine your program runs on. This creates its own set of problems.
As an example, almost every program written in C will need functions from a library called the 'C runtime library', though few programs will need all of the functions. The C runtime comes in both static and dynamic versions, so you can choose which version your program uses depending on its particular needs.
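As a small, made-up illustration of the points above (the file names and the count_chars function are invented for this sketch):
// strcount.cpp -- building with "cl /LD strcount.cpp" would produce strcount.dll
// plus strcount.lib, where the .lib is only an import library for the linker.
extern "C" __declspec(dllexport) int count_chars(const char* s)
{
    int n = 0;
    while (s && s[n] != '\0') ++n;
    return n;
}

// app.cpp -- linked with "cl app.cpp strcount.lib"; Windows then maps strcount.dll
// into the process when app.exe is loaded.
#include <cstdio>
extern "C" __declspec(dllimport) int count_chars(const char* s);

int main()
{
    std::printf("%d\n", count_chars("hello"));   // prints 5
    return 0;
}
If count_chars were put in a static library instead (cl /c plus lib.exe), the same app.exe would simply carry its own copy of the compiled function.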
Another aspect is security (obfuscation). Once a piece of code is extracted from the main application and put into a 'separate' dynamic-link library, it is easier to attack and analyse (reverse-engineer) the code, since it has been isolated. When the same piece of code is kept in a LIB library, it is part of the compiled (linked) target application, and it is thus harder to isolate (differentiate) that piece of code from the rest of the target binaries.
One important reason for creating a DLL/LIB rather than just compiling the code into an executable is reuse and relocation. The average Java or .NET application (for example) will most likely use several 3rd party (or framework) libraries. It is much easier and faster to just compile against a pre-built library, rather than having to compile all of the 3rd party code into your application. Compiling your code into libraries also encourages good design practices, e.g. designing your classes to be used in different types of applications.
A DLL is a library of functions that are shared among other executable programs. Just look in your windows/system32 directory and you will find dozens of them. When your program creates a DLL it also normally creates a lib file so that the application *.exe program can resolve symbols that are declared in the DLL.
A .lib is a library of functions that are statically linked to a program -- they are NOT shared by other programs. Each program that links with a *.lib file has all the code in that file. If you have two programs A.exe and B.exe that link with C.lib, then A and B will each contain the code from C.lib.
How you create DLLs and libs depend on the compiler you use. Each compiler does it differently.
One other difference lies in performance.
Because the DLL is loaded at runtime by the .exe(s), the .exe(s) and the DLL share one copy of the code and calls go through an extra level of indirection, so performance is somewhat lower relative to static linking.
On the other hand, a .lib is code that is linked statically at link time into every process that requests it. Hence each .exe carries the code in its own image, which can make the process slightly faster.

DLL dependencies and allowing one to fail

I am currently looking at an issue where a project is generating a DLL. Now it is built upon a chain of other projects which are all C++, so they're just .lib files being linked in.
In this case one project uses OpenCL, though I don't believe those code paths are ever run. However, it would appear that just having OpenCL linked in causes the output .DLL to have a dependency on OpenCL.dll.
Please correct me if I am wrong here (in which case I'll go over the code with a fine-toothed comb to ensure no OpenCL calls are being executed).
I am not sure how Visual Studio (or Dependency Walker?) figures out which DLLs are dependencies for a given DLL. In any case, I don't want the OpenCL.dll dependency.
What are my options?
One possible option is to take the project with the OpenCL code and refactor it so that some portion of it can be built with the OpenCL part excluded from the build. However, this would be a fair chunk of work, so I am really hoping for something simpler.
If you link against the lib, you depend on the dll. The easiest thing for you would be to add /DELAYLOAD for this dll:
The /DELAYLOAD option causes the DLL that's specified by dllname to be loaded only on the first call by the program to a function in that DLL.
This generates a 'soft' dependency which only kicks in if you call a function that actually needs the DLL. Make sure you read Constraints of Delay Loading DLLs.
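For reference, a hedged sketch of what that looks like on the link line (the object and output names are placeholders; delayimp.lib supplies the delay-load helper the linker relies on):
link /DLL mycode.obj OpenCL.lib delayimp.lib /DELAYLOAD:OpenCL.dll /OUT:mycode.dll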
The other option (which I don't recommend) is to use runtime binding via LoadLibrary and GetProcAddress, then invoke the functions through function pointers. This removes the dependency entirely, but you are left to implement all the checks and handle all the errors yourself if the DLL is missing, and it can easily go astray if you mismatch the function signature or calling convention. Ultimately, you'd be implementing /DELAYLOAD manually.
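A rough sketch of that manual approach (the DLL name compute.dll and the export compute_something are invented here; the function-pointer type has to be kept in sync with the real export by hand):
#include <windows.h>
#include <cstdio>

// Must match the exported function's signature and calling convention exactly.
typedef int (__cdecl *compute_fn)(int);

int main()
{
    HMODULE mod = LoadLibraryW(L"compute.dll");
    if (!mod) {
        std::printf("compute.dll not available, falling back\n");
        return 0;                      // the program keeps running without the DLL
    }
    compute_fn compute = reinterpret_cast<compute_fn>(
        GetProcAddress(mod, "compute_something"));
    if (compute) {
        std::printf("result: %d\n", compute(42));
    }
    FreeLibrary(mod);
    return 0;
}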

Quickly testing a function that is a part of a big DLL project

I use VS2010 for C++ development, and I often end up doing work in some dll project; after everything compiles nicely I would like to try running dummy data through some classes, but of course the fact that it is a dll and not an exe with a main makes that a no-go. So is there a simple way to do what I want, or am I cursed for eternity to copy/paste parts of a big project into a small testing one?
Of course changing the type of the project also works, but I would like to have something almost like an interactive-shell way of testing functions.
I know this isn't a library or anything, but if you want to run the dll on Windows simply, without framing it into anything or writing a script, you can use rundll32.exe, which ships with Windows. It allows you to run any of the exported functions in the dll. The syntax should be similar to:
rundll32.exe PathAndNameofDll,exportedFunctionName [ArgsToTheExportedFunction]
http://best-windows.vlaurie.com/rundll32.html -- is a good, simple, and still relevant tutorial on how to use this binary. It's got some cool tricks in there that may surprise you.
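One detail worth knowing: rundll32 expects the export it calls to follow a particular prototype. A minimal sketch of a test export written for it (RunMyTest and the dll name below are made up):
#include <windows.h>

// rundll32 passes a window handle, the DLL's instance handle, the command-line
// tail after the comma-separated entry-point name, and the show-window flag.
extern "C" __declspec(dllexport) void CALLBACK RunMyTest(
    HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow)
{
    MessageBoxA(nullptr, lpszCmdLine ? lpszCmdLine : "(no args)", "RunMyTest", MB_OK);
}
// invoked as:  rundll32.exe core_tests.dll,RunMyTest some dummy input
// (on 32-bit builds a .def file may be needed so the export keeps its undecorated name)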
If you are wondering about a 64-bit version, it has the same name (seriously, Microsoft?); check it out here:
rundll32.exe equivalent for 64-bit DLLs
Furthermore, if you want to go low level, you could in theory use OllyDbg, which comes with a DLL loader for running DLLs you want to debug (at the assembly level). You can do the same kind of thing there (call exported functions and pass arguments), but that debugger is geared more toward reverse engineering than code debugging.
I think you have basically two options.
The first is to use some sort of unit tests on the function. For C++ you can find a variety of frameworks; for one, take a look at CppUnit.
The second option is to open the DLL, get the function via the Win32 API and call it that way (this would still qualify as unit testing on some level). You could generalize this approach somewhat by creating an executable that does the above, parametrized with the required information (e.g. dll path, function name), to achieve the "interactive shell" you mentioned -- if you decide to take this path, you can check out this CodeProject article on loading DLLs from C++.
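A rough sketch of such a parametrized runner, under the simplifying (and purely illustrative) assumption that the export takes no arguments and returns void:
#include <windows.h>
#include <cstdio>

int main(int argc, char** argv)
{
    if (argc < 3) {
        std::printf("usage: runexport <dll path> <exported function>\n");
        return 1;
    }
    HMODULE mod = LoadLibraryA(argv[1]);
    if (!mod) {
        std::printf("could not load %s\n", argv[1]);
        return 1;
    }
    typedef void (*no_arg_fn)();
    no_arg_fn fn = reinterpret_cast<no_arg_fn>(GetProcAddress(mod, argv[2]));
    if (!fn) {
        std::printf("export %s not found in %s\n", argv[2], argv[1]);
        FreeLibrary(mod);
        return 1;
    }
    fn();                 // run the exported function on whatever dummy data it uses
    FreeLibrary(mod);
    return 0;
}
A real version would need some convention for passing arguments and reporting results, which is where a small test framework or CppUnit comes back into the picture.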
Besides using unit tests as provided by CppUnit, you can still write your own small testing framework. That way you can set up your Dll projects as needed, load them, link them, whatever you want, and exercise them with whatever simple data you like. This is valuable if you have many Dlls that depend on each other to do a certain job (legacy Dll projects in C++ tend to be barely testable, in my experience).
Once you have such a frame application, you can also look at the possibilities CppUnit gives you and combine it with your test frame. That way you will end up with a good set of automated tests which are still valuable unit tests. It is somewhat hard to start writing unit tests once a project has already reached a certain size; having your own framework lets you write tests whenever you make a change to a dll. Just insert it into your framework, test what you expect it to do, and extend your frame more and more.
The basic idea is to separate the test, the test runner, the test data and the asserts to be made.
I'm using Python + ctypes to build quick testing routines for my DLL applications.
If you are using the extended attribute syntax, it will be easy for you.
Google for Python + ctypes + unit test and you will find several examples.
I would recommend Windows PowerShell cmdlets.
If you look at the article here - http://msdn.microsoft.com/en-us/magazine/cc163430.aspx you can see how easy it is to set up. Of course this article is mostly about testing C# code, but you can see how they talk about also being able to load any COM enabled DLL in the same way.
Here you can see how to load a COM assembly - http://blogs.technet.com/b/heyscriptingguy/archive/2009/01/26/how-do-i-use-windows-powershell-to-work-with-junk-e-mail-in-office-outlook.aspx
EDIT: I know a very successful storage virtualization software company that uses PowerShell extensively to test both its managed and unmanaged (driver) code.

Is it possible to get Code Coverage Analysis on an Interop Assembly?

I've asked this question over on the MSDN forums also and haven't found a resolution:
http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=3686852&SiteID=1
The basic problem here as I see it is that an interop assembly doesn't actually contain any IL that can be instrumented (except for maybe a few delegates). So, although I can put together a test project that exercises the interop layer, I can't get a sense for how many of those methods and properties I'm actually calling.
Plan B is to go and write a code generator that creates a library of RCWWs (Runtime Callable Wrapper Wrappers), and instrument that for the purposes of code coverage.
Edit: @Franci Penov,
Yes that's exactly what I want to do. The COM components delivered to us constitute a library of some dozen DLLs containing approx. 3000 types. We consume that library in our application and are charged with testing that Interop layer, since the group delivering the libraries to us does minimal testing. Code coverage would allow us to ensure that all interfaces and coclasses are exercised. That's all I'm attempting to do. We have separate test projects that exercise our own managed code.
Yes, ideally the COM server team should be testing and analyzing their own code, but we don't live in an ideal world and I have to deliver a quality product based on their work. If I can produce a test report indicating that I've tested 80% of their code interfaces and 50% of those don't work as advertised, I can get fixes done where fixes need to be done, rather than working around problems.
The mock layer you mentioned would be useful, but wouldn't ultimately be achieving the goal of testing the Interop layer itself, and I certainly would not want to be maintaining it by hand -- we are at the mercy of the COM guys in terms of changes to the interfaces.
Like I mentioned above -- the next step is to generate wrappers for the wrappers and instrument those for testing purposes.
To answer your question - it's not possible to instrument interop assemblies for code coverage. They contain only metadata, and no executable code as you mention yourself.
Besides, I don't see much point in trying to code coverage the interop assembly. You should be measuring the code coverage of code you write.
From the MSDN forums thread you mention, it seems to me you actually want to measure how your code uses the COM component. Unless your code's goal is to enumerate and explicitly call all methods and properties of the COM object, you don't need to measure code coverage. You need unit/scenario testing to ensure that your code is calling the right methods/properties at the right time.
Imho, the right way to do this would be to write a mock layer for the COM object and test that you are calling all the methods/properties as expected.
Plan C:
use something like Mono.Cecil to weave simple execution counters into the interop assembly. For example, check out this section in the FAQ: "I would like to add some tracing functionality to an assembly I can't debug, is it possible using Cecil?"
