How can I compare .text and .data segments of a .dll against the same ones in a different .dll? - visual-studio

I have a .dll, more than 20 years old and written in C, that none of my colleagues want to touch. With good reason: it uses macros, macro constants and casting EVERYWHERE, which leaves the symbol table quite lean.
Unfortunately, I sometimes have to debug this code, and it drives me crazy that it doesn't use something as simple as enums. Enums would put symbols in the .pdb file and make debugging just that little bit easier.
I would love to convert some of the #defines to enums, even if I don't change the variable types yet. However, there is a genuine fear that this could hurt performance if it changed the generated code.
I need to show definitively that the compiled code does not change. Yet the .dll actually seems to change significantly in a 64-bit build, even though both .dlls are the same size. I looked at the disassembly of one function and it appears unaffected, but I need to show what is and is not changing in the binary. That would ease my colleagues' fears and some of my own trepidation. It would also resolve my bewilderment about why any changes propagate to the .dll at all.
Does anyone have any idea how I could do this? I've tried dumpbin, but I'm not that familiar with it and am getting mixed results, probably because I don't understand the output as well as I'd like.

The way I did this was as follows:
1. Turn on the /FAs switch for the project (assembly listing with source).
2. Compile the project.
3. Rename the object file directory (Release => Release-without-enums).
4. Change the #defines to enums.
5. Compile the project again.
6. Rename the object file directory (Release => Release-with-enums).
7. From a bash prompt in the parent of the Release directory, run:
for a in Release-without-enums/*.asm; do
git diff --no-index --word-diff --color -U10000 "$a" "Release-with-enums/$(basename "$a")";
done | less -R
The -U10000 option is only there so that I can see each file in its entirety. Remove it if you only want to see the changes.
This lists every modification in the generated assembly code.
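The question title asks how to compare the sections in the built .dll directly. A small script can do that by pulling the raw bytes of each section out of both files. The sketch below is my own and not part of the answer above. It assumes a well-formed PE image and follows the PE/COFF header layout: e_lfanew at 0x3C, a 20-byte COFF header, and 40-byte section headers. The function names are hypothetical. A byte-identical .text means the generated machine code did not change. A differing .text may still only reflect moved addresses, and in that case the /FAs diff above shows why.

```python
import struct
import sys


def parse_sections(data):
    """Map each PE section name (e.g. '.text', '.data') to its raw file bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # IMAGE_FILE_HEADER (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    table = coff + 20 + opt_size  # the section table follows the optional header
    sections = {}
    for i in range(num_sections):
        entry = table + 40 * i  # each IMAGE_SECTION_HEADER is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20 of the header.
        raw_size, raw_ptr = struct.unpack_from("<II", data, entry + 16)
        sections[name] = bytes(data[raw_ptr:raw_ptr + raw_size])
    return sections


def read_sections(path):
    with open(path, "rb") as f:
        return parse_sections(f.read())


def compare_sections(a, b, names=(".text", ".data")):
    """Report, per section name, whether two parsed images agree byte for byte."""
    report = {}
    for name in names:
        sa, sb = a.get(name), b.get(name)
        if sa is None or sb is None:
            report[name] = "missing"
        elif sa == sb:
            report[name] = "identical"
        else:
            diffs = sum(x != y for x, y in zip(sa, sb)) + abs(len(sa) - len(sb))
            report[name] = f"differs in {diffs} byte(s)"
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python pe_sections.py without-enums\my.dll with-enums\my.dll
    result = compare_sections(read_sections(sys.argv[1]), read_sections(sys.argv[2]))
    for section, status in result.items():
        print(f"{section}: {status}")
```

Comparing raw section bytes instead of whole files keeps header-only differences, such as the link timestamp, out of the result.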
The changes found were as follows:
1. Symbol addresses moved around for no apparent reason.
2. References to __FILE__ no longer seem to produce a full path when using enums. Why switching to enums would drop the full path is a mystery, as the compiler flags have not changed.
3. Some symbols were renamed for no apparent reason.
Edit
Items 2 and 3 seem to have been caused by a corrupted .pdb error. This might be because the files are used in multiple projects in the same solution. Rebuilding the entire solution fixed those two problems.

Related

Getting rid of empty cpp files for interfaces in Rhapsody

In my project, I want to get rid of tons of empty and pointless cpp files for interfaces in IBM Rational Rhapsody.
Setting CPP_CG::File::Generate to Specification generates only the header file for a class, which is almost what I want. But the makefile (gpj) still looks for the *Ifc.cpp file. Is there a straightforward way to exclude these cpp files from the makefile?
There is a CG::File::AddToMakefile property, but it only works for component files. I found some information suggesting that it used to work, but that it stopped working in Rhapsody 8.
You should be able to use those properties to force suppression of either the header or the implementation file of the interface. However!
Rhapsody expects to find the interface's cpp file, and suppressing it will cause problems with the roundtrip function. Roundtrip doesn't only happen explicitly. By default it is also triggered implicitly when you save the project or switch focus from the code editor to the model browser.
During this roundtrip, Rhapsody will try to "fix" the model by replacing the missing cpp file. This will be followed by roundtrip error messages. Disregarding the errors and continuing with roundtrip will probably cause duplicate elements and all sorts of mess.
In other words, what you're trying to do is not really supported and is a bad idea.

Boost 1_44 includes don't work

Sorry for what seems like a silly question, but I had never, ever worked with Boost until tonight, and getting it configured seems harder than it should be.
I wanted to experiment with it tonight, so I downloaded the zip file and unzipped it to this directory:
F:/boost_1_44_0
Then I created an empty C++ project in Visual Studio 2010 (not using precompiled headers either). All I wanted to do was include a header file, but not even something as simple as that works. I've been using Visual Studio for years, though at work we are still stuck on VS 2008 (that's another story). Usually you set an include directory and can then include files at will, right?
So I set the global include directory to include the boost root. i.e. Property Manager -> My configuration (debug|win32) -> Microsoft.Cpp.Win32.user -> Common Properties -> C++ Directories -> Include Directories. There I added my path to f:/boost_1_44_0.
I also went to the project properties and set the C++ include directory for the project to point to the boost root like in vs 2008.
I then added a simple include directive, like so:
#include <boost/lambda/lambda.hpp>
But, amazingly, it fails to compile with the following error:
Error 1 error C1083: Cannot open include file: 'boost/type_traits/transform_traits.hpp': No such file or directory f:\boost_1_44_0\boost\lambda\core.hpp 25 1 test_boost
When I double-click the error, it opens f:\boost_1_44_0\boost\lambda\core.hpp at this line:
#include "boost/type_traits/transform_traits.hpp"
So I have no idea what's happening. Is Visual Studio just ignoring the global include paths I set? It also seems that the include directive in core.hpp should use angle brackets rather than quotes.
If I'm doing something wrong, what is it?
EDIT:
!! SOLVED !!
Previously, not all of the files had been unzipped, and I don't know why. So I re-downloaded the zip file and unzipped it again. This time the extraction took much longer and produced many more files, including the missing ones.
Problem solved, my hello world app compiles just fine now.
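If you suspect a partial extraction like this, you can check it mechanically instead of re-downloading on a hunch. The sketch below uses Python's standard zipfile module. The helper name and the paths in the usage comment are hypothetical, and it assumes the archive was extracted with its folder structure intact. It flags every member that is either absent on disk or a different size from the archive entry.

```python
import os
import zipfile


def missing_or_truncated(zip_path, dest_dir):
    """List archive members that are absent under dest_dir or have the wrong size."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Zip member names always use "/" as the separator.
            target = os.path.join(dest_dir, *info.filename.split("/"))
            if not os.path.isfile(target) or os.path.getsize(target) != info.file_size:
                problems.append(info.filename)
    return problems


# Hypothetical usage, assuming the zip's top-level folder is boost_1_44_0:
# print(missing_or_truncated("boost_1_44_0.zip", "F:/"))
```

An empty list means every file in the archive is present with the expected size.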
The behaviour of compilers in locating header files is implementation defined for both the <> and "" variants.
However, based on this page for VC2010, it appears the quoted form searches a superset of the angle bracket form so I'm not sure that's the problem.
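The "superset" point can be made concrete with a simplified model of the lookup. This is a sketch under stated assumptions, not MSVC's actual implementation. The quoted form tries the including file's directory first and then falls back to the same directories as the angle-bracket form. The other steps MSVC documents (directories of parent include files and the INCLUDE environment variable) are folded into a single include_dirs list. The function name is hypothetical. Under this model, if transform_traits.hpp isn't on disk, neither form can find it.

```python
import os


def resolve_include(header, includer_dir, include_dirs, quoted):
    """Return the first existing path for header, or None (simplified model).

    quoted=True models #include "...": the including file's directory is tried
    first, then include_dirs. quoted=False models #include <...>: include_dirs
    only. The quoted form therefore searches a superset of the directories.
    """
    candidates = ([includer_dir] if quoted else []) + list(include_dirs)
    for directory in candidates:
        path = os.path.join(directory, *header.split("/"))
        if os.path.isfile(path):
            return path
    return None


# Hypothetical usage mirroring the question:
# resolve_include("boost/type_traits/transform_traits.hpp",
#                 r"f:\boost_1_44_0\boost\lambda", [r"f:\boost_1_44_0"], quoted=True)
```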
I suppose it would be a silly question to ask if the following file actually existed?
f:\boost_1_44_0\boost\type_traits\transform_traits.hpp
So, a couple of investigative jobs:
Make sure that f:\boost_1_44_0\boost\type_traits\transform_traits.hpp exists.
Try changing your top-level include to use quotes.
Try changing the include in f:\boost_1_44_0\boost\lambda\core.hpp to use angle brackets.
Make sure you try all four combinations of those last two.
Is f: a network-mounted drive? What happens if you put it all on c:?
That last one is just in case Windows is doing some shenanigans under the covers :-)
While it's a bit overkill for this, learning to use SysInternals' Process Monitor will pay off over time. It will show you what files are actually opened, and which attempts failed. Look where Visual Studio tries to read transform_traits.hpp from, and you'll probably have the answer.

How do you verify that 2 copies of a VB 6 executable came from the same code base?

I have a program under version control that has gone through multiple releases. A situation came up today where someone had somehow managed to point to an old copy of the program and thus was encountering bugs that have since been fixed. I'd like to go back and just delete all the old copies of the program (keeping them around is a company policy that dates from before version control was common and should no longer be necessary) but I need a way of verifying that I can generate the exact same executable that is better than saying "The old one came out of this commit so this one should be the same."
My initial thought was to simply MD5 hash the executable, store the hash file in source control, and be done with it but I've come up against a problem which I can't even parse.
It seems that every time the executable is generated (method: Open Project. File > Make X.exe) it hashes differently. I've noticed that Visual Basic messes with files every time the project is opened in seemingly random ways but I didn't think that would make it into the executable, nor do I have any evidence that that is indeed what's happening. To try to guard against that I tried generating the executable multiple times within the same IDE session and checking the hashes but they continued to be different every time.
So that's:
Generate Executable
Generate MD5 Checksum: md5sum X.exe > X.md5
Verify MD5 for current executable: md5sum -c X.md5
Generate New Executable
Verify MD5 for new executable: md5sum -c X.md5
Fail verification because computed checksum doesn't match.
I'm not understanding something about either MD5 or the way VB 6 is generating the executable but I'm also not married to the idea of using MD5. If there is a better way to verify that two executables are indeed the same then I'm all ears.
Thanks in advance for your help!
That's going to be nearly impossible. Read on for why.
The compiler will win this game, every time...
Compiling the same project twice in a row, even without making any changes to the source code or project settings, will always produce different executable files.
One of the reasons for this is that the PE (Portable Executable) format that Windows uses for EXE files includes a timestamp indicating the date and time the EXE was built, which is updated by the VB6 compiler whenever you build the project. Besides the "main" timestamp for the EXE as a whole, each resource directory in the EXE (where icons, bitmaps, strings, etc. are stored in the EXE) also has a timestamp, which the compiler also updates when it builds a new EXE. In addition to this, EXE files also have a checksum field that the compiler recalculates based on the EXE's raw binary content. Since the timestamps are updated to the current date/time, the checksum for the EXE will also change each time a project is recompiled.
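To see how much of the difference those header fields account for, you could zero them before hashing. This is a minimal sketch that assumes a well-formed PE file. It only neutralizes the COFF TimeDateStamp and the optional header's CheckSum, which sits at offset 64 of the optional header in both 32-bit and 64-bit images. It deliberately ignores the resource-directory timestamps and the other sources of difference described below, so a VB6 build may well still hash differently. The function name is hypothetical.

```python
import hashlib
import struct


def normalized_md5(data):
    """MD5 of a PE image with TimeDateStamp and CheckSum zeroed.

    Resource-directory timestamps, resource ordering and stray padding bytes
    are NOT normalized by this sketch.
    """
    buf = bytearray(data)
    pe_offset = struct.unpack_from("<I", buf, 0x3C)[0]  # e_lfanew
    if buf[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    struct.pack_into("<I", buf, coff + 4, 0)        # IMAGE_FILE_HEADER.TimeDateStamp
    struct.pack_into("<I", buf, coff + 20 + 64, 0)  # optional header CheckSum
    return hashlib.md5(buf).hexdigest()


# Hypothetical usage:
# with open("X.exe", "rb") as f:
#     print(normalized_md5(f.read()))
```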
But, but...I found this really cool EXE editing tool that can undo this compiler trickery!
There are EXE editing tools, such as PE Explorer, that claim to be able to adjust all the timestamps in an EXE file to a fixed time. At first glance you might think you could just set the timestamps in two copies of the EXE to the same date, and end up with equivalent files (assuming they were built from the same source code), but things are more complicated than that: the compiler is free to write out the resources (strings, icons, file version information, etc.) in a different order each time you compile the code, and you can't really prevent this from happening. Resources are stored as independent "chunks" of data that can be rearranged in the resulting EXE without affecting the run-time behavior of the program.
If that wasn't enough, the compiler might be building up the EXE file in an area of uninitialized memory, so certain parts of the EXE might contain bits and pieces of whatever was in memory at the time the compiler was running, creating even more differences.
As for MD5...
You are not misunderstanding MD5 hashing: MD5 always produces the same hash for the same input. The problem here is that the input (the EXE file) keeps changing.
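A quick illustration of that point using Python's hashlib. Identical bytes always give the same digest. Changing a single byte, standing in here for a rebuilt timestamp, gives a completely different one. The byte strings are made up for the example.

```python
import hashlib

# Two made-up "builds" that differ only in their last byte.
build_1 = b"MZ same code, timestamp=\x01"
build_2 = b"MZ same code, timestamp=\x02"

print(hashlib.md5(build_1).hexdigest() == hashlib.md5(build_1).hexdigest())  # True
print(hashlib.md5(build_1).hexdigest() == hashlib.md5(build_2).hexdigest())  # False
```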
Conclusion: Source control is your friend
As for solving your current dilemma, I'll leave you with this: associating a particular EXE with a specific version of the source code is more a matter of policy, which has to be enforced somehow, than anything else. Trying to figure out what EXE came from what version without any context is just not going to be reliable. You need to track this with the help of other tools. For example, make sure that each build produces a different version number for your EXEs. That version should be easy to pair with a specific revision/branch/tag/whatever in your version control system. To that end, a "free-for-all" situation where some developers use source control and others use "that copy of the source code from 1997 that I'm keeping in my network folder because it's my code and source control is for sissies anyway" won't make this any easier. I would get everyone drinking the source control Kool-Aid and adhering to a standard policy for creating builds right away.
Whenever we build projects, our build server (we use Hudson) ensures that the compiled EXE version is updated to include the current build number (we use the Version Number Plugin and a custom build script to do this), and when we release a build, we create a tag in Subversion using the version number as the tag name. The build server archives release builds, so we can always get the specific EXE (and setup program) that was given to a customer. For internal testing, we can choose to pull an archived EXE from the build server, or just tell the build server to rebuild the EXE from the tag we created in Subversion.
We also never, ever, ever release any binaries to QA or to customers from any machine other than the build server. This prevents "works on my machine" bugs, and ensures that we are always compiling from a "known" copy of the source code (it only pulls and builds code that is in our Subversion repository), and that we can always associate a given binary with the exact version of the code that it was created from.
I know it has been a while, but VB decompiler apps exist. You could consider bulk-decompiling the VB6 apps and then feeding the decompilation results into AI/statistical anomaly detection across the various code bases. Your problem doesn't have an exact solution, so the results are unlikely to be 100% accurate, but the detection should become more accurate as you feed it more data.

VS2010 always relinks the project

I am migrating a complex mixed C++/.NET solution from VS2008 to VS2010.
The upgraded solution works in VS2010, but the build system always refreshes one C++/CLI assembly. It doesn't recompile anything, but the linker touches the file. This causes a ripple effect downstream in the build, as a whole bunch of dependent projects then get rebuilt.
Any ideas on how to find out why it thinks it needs to relink the file? I've turned on verbose build logging, but nothing stands out.
Turns out the problem was that the PDB filename was defined under both the compiler settings and the linker settings (with the same name).
This seemed to cause a problem in VS2010 as somehow an 'old' pdb from the intermediate directory (compiler output?) was being copied over the one in the output directory (linker output?). This resulted in the pdb in the output directory being older than some of the obj files and forcing the relink next time around (rinse and repeat).
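If you want to confirm this symptom on your own build before changing settings, you can list the object files that are newer than the .pdb. This is a diagnostic sketch only, and the function name and paths are hypothetical. It compares nothing but file modification times, which assumes, based on the description above, that the timestamp mismatch is what triggers the relink.

```python
import os
from pathlib import Path


def objs_newer_than(pdb_path, obj_dir):
    """Return .obj files under obj_dir modified after pdb_path, sorted by path."""
    pdb_mtime = os.path.getmtime(pdb_path)
    return sorted(
        str(p) for p in Path(obj_dir).rglob("*.obj") if p.stat().st_mtime > pdb_mtime
    )


# Hypothetical usage:
# print(objs_newer_than(r"bin\Release\MyAssembly.pdb", r"obj\Release"))
```

A non-empty result after a "no changes" build would be consistent with the stale-.pdb explanation.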
Clearing the pdb name settings seemed to fix the problem, and the defaults were fine.
I'm just adding this for the record, in case anyone else gets this problem in the future.
We had a similar problem with a large, mixed FORTRAN/C++ project relinking whether or not anything had changed. It seems to have started when the solution was upgraded from VS2008 to 2010, although no-one could quite remember.
Eventually, I took a serious look at it (it had always been annoying, but never enough to make anyone act). By process of elimination, I found the solution:
Remove any quotes in the "Additional Library Directories" of the top-level FORTRAN project (i.e. the one that makes the executable).
Now, I wouldn't believe this myself without evidence, so if you have the urge to reproduce this error yourself:
Open a new VS2010 session.
Create a new FORTRAN project that makes an executable.
Leave it empty, but link it to a non-inbuilt lib file (i.e. one of your own), and add your library's directory to the Additional Library Directories.
Check this compiles and links correctly.
Now try adding double quotes (") around the directory, and click on "Build" several times. If your session is like mine, then it will relink every time. When you remove the quotes, it stops.
This only appears to be a problem on the top-level project, and when those projects are FORTRAN - the quotes have no effect on C++ projects or those that create libs.
The relinking does not occur if VS does not need to search for libs (for example, if all of your libs are inbuilt, like kernel32.lib), but will occur whether or not your lib is in the directory which has quotes or a different one.
If anyone can justify this "feature", please let me know!

ada95 have 3 files .ali, .adb and .o - can I compile

I've found some old college work with my final Ada95 project on it. Sadly, the disc was corrupted, and I have only managed to recover 3 files (the source and executable couldn't be recovered):
project.adb, project.ali and project.o
Are these 3 files enough to compile a new exe? I'm downloading the GNAT compiler now, but I have to admit I have forgotten almost everything Ada-related...
Frank
[EDIT]
shucks.... using GCC to compile the project.adb throws an error about a missing ads file, which I cannot recover.
Is it possible to extract this / compile just the ".o" or ".ali" files? Or, am I stuffed?
project.adb is a source file.
Since you say that gcc complains about a missing .ads file, project.adb must contain a package body. You can manually construct a corresponding package spec by putting the following into project.ads:
package Project is
end Project;
Now that's almost certainly not enough, because the project spec probably had some type and constant declarations in it, so you'd have to analyze your package body and identify what it references. Infer what those declarations should look like and add them. Oh, and if your package body "with's" any packages that are not part of the standard Ada library, you'll have to recover those as well.
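As a first step in that analysis, a short script can list the packages that project.adb "with's" and pick out those that aren't part of the predefined library. Those are the ones you would also need to recover. This is a rough sketch of my own, and the helper names are hypothetical. It uses regular expressions rather than a real Ada parser, so "--" inside string literals or unusual layouts can confuse it. It treats children of Ada, System, Interfaces and GNAT as predefined, along with the Ada 83 compatibility names such as Text_IO.

```python
import re

# Ada comments run from "--" to the end of the line.
_COMMENT = re.compile(r"--[^\n]*")
# Context clauses such as "with Ada.Text_IO;" or "with A, B.C;" at the start of a line.
_WITH = re.compile(r"^\s*with\s+([\w.\s,]+?);", re.IGNORECASE | re.MULTILINE)
# A (possibly dotted) library unit name.
_NAME = re.compile(r"[A-Za-z]\w*(?:\.[A-Za-z]\w*)*")

# GNAT.* is not standard Ada, but it ships with the GNAT compiler.
_PREDEFINED_ROOTS = {"ada", "system", "interfaces", "gnat"}
# Ada 83 names kept as library-level renamings in Ada 95 (RM Annex J.1).
_ADA83_NAMES = {
    "text_io", "sequential_io", "direct_io", "io_exceptions",
    "calendar", "unchecked_conversion", "unchecked_deallocation", "machine_code",
}


def withed_units(source):
    """Return the unit names from 'with' clauses, in order, without duplicates."""
    code = _COMMENT.sub("", source)
    units = []
    for match in _WITH.finditer(code):
        for name in match.group(1).split(","):
            name = name.strip()
            # Rejects non-names such as generic formals ("with procedure Foo;").
            if _NAME.fullmatch(name) and name not in units:
                units.append(name)
    return units


def needs_recovery(units):
    """Keep only the units that are not part of the predefined library."""
    return [
        u for u in units
        if u.split(".")[0].lower() not in _PREDEFINED_ROOTS
        and u.lower() not in _ADA83_NAMES
    ]


# Hypothetical usage:
# with open("project.adb") as f:
#     print(needs_recovery(withed_units(f.read())))
```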
If you do manage to get your reverse engineered spec and the body to compile, you'll still have to create a "driver" program that "with's" the project package, and calls whatever functions and/or procedures that carried out the function of your project (and you'll have to pull the specs of those subprograms--which match their appearance in the package body--into the spec as well.)
Frankly, if it were me, I'd spend more time on trying to use some disk recovery tools to pull whatever else I could off the disk.
In Ada95 (and 2005) one mostly works with .adb files (occasionally with .ads files); everything else is generated during the build. In your case, the .adb file is surely linked to other .ads files.
However, .ads files are usually small (unless you are attempting something exotic such as 'the dining philosophers') and describe the algorithmic/mathematical structure of the program. If you can dig out what you did in your project, it should not be impossible to restore it!
