Comparing VB6.exes - vb6

We're going through a massive migration project at the minute and trying to validate the code that is deployed to the live estate matches the code we have in source control.
Obviously the .net code is easy to compare because we can disassemble. I don't believe this is possible in vb6 exes because of the manner of compilation.
Does anyone have any ideas on how I could validate the source code and the compiled executable matches the file I have in Live.
Thanks

Visual Basic had (has) two ways of compiling, one to the interpreter ( called P-code) that would result in smaller binaries, and a second one that generates "regular" windows .exe file (called native) that was introduced because it was supposed to be fastar than p-code; although the compiled file size increased with this option.
If your compilation was using p-code, it is in theory possible to restore the sources.
Either way is pretty difficult to do, but there are tools that claim they can partially do this, one that I know of ( never tried but there is a trial version ) is VB-decompiler
http://www.vb-decompiler.org/

Unfortunately that's almost impossible. Bear in mind that VB6 code compiled on different machines will have different exe sizes and deployment requirements.
This is why the old VB'ers had a dedicated machine to compile their code.

This won't help you with already deployed items, but if you upped the revision number on every compile (there is a project setting to do this for you automatically) then you could easily compare version numbers.

My old company bought a copy of that VB-Decompiler and as noted before VB5/6 generates P-Code extra, that tool did produce some code and if not Assembly code which could be "read".

If you have all the code you compiled, you could compare the CRC's of that code to what is deployed in the field. But if you don't have the original compiled code, depending upon how you compiled the code you (if you used P-Code rather than Native Code you may be able to disassemble but the disassembly will look nothing like your source code). I doubt you would have shipped the PDB's with the exe's, but if you did, you could certainly use those to compare with the source code in your repository.

Have a trusted computer that can check out the various libraries and exes you make and compile them automatically. Keep those in a read-only but accessible location. Then do a binary comparison between the deployed site and your comparison site.
However I am not sure of the logic over disassembling the the complied units. My company and most other places I know of use a combination of a build computer and unit testing. In our company the EXE we make is a very thin shell over a bunch of libraries. For example a button click will be passed to a UI Active X DLL that does the actual processing. What we do after a build is run a special EXE that perform our list of unit test. If they all passed we know our libraries, where 90% of our code is, are good. As for the actual EXE we have a hand procedure that takes about two hours to do and then we are good. IIt is rare for any errors to happen in the EXE.

Related

Why might it be necessary to force rebuild a program?

I am following the book "Beginning STM32" by Warren Gay (excellent so far, btw) which goes over how to get started with the Blue Pill.
A part that has confused me is, while we are putting our first program on our Blue Pill, the book advises to force rebuild the program before flashing it to the device. I use:
make clobber
make
make flash
My question is: Why is this necessary? Why not just flash the program since it is already made? My guess is that it is just to learn how to make an unmade program... but I also wonder if rebuilding before flashing to the device is best practice? The book does not say why?
You'd have to ask the author, but I would suggest it is "just in case" no more. Lack of trust that the make file specifies all possible dependencies perhaps. If the makefile were hand-built without automatically generated dependencies, that is entirely possible. Also it is easier to simply advise to rebuild than it is to explain all the situations where it might be necessary or otherwise, which will not be exhaustive.
From the author's point of view, it eliminates a number of possible build consistency errors that are beyond his control so it ensures you don't end up thinking the book is wrong, when it might be something you have done that the author has no control over.
Even with automatically generated dependencies, a project may have dependencies that the makefile or dependency generator does not catch, resource files used for code generation using custom tools for example.
For large projects developed over a long time, some seldom modified modules could well have been compiled with an older version of the tool chain, a clean build ensures everything is compiled and linked with the current tool.
make builds dependencies based on file timestamp; if you have build variants controlled by command-line macros, the make will not determine which dependencies are dependent on such a macro, so when building a different variant (switching from a "debug" to "release" build for example), it is good idea to rebuild all to ensure each module is consistent and compatible.
I would suggest that during a build/debug cycle you use incremental build for development speed as intended, and perform a rebuild for release or if changing some build configuration (such as enabling/disabling floating-point hardware for example or switching to a different processor variant.
If during debug you get results that seem to make no sense, such as breakpoints and code stepping not aligning with the source code, or a crash or behaviour that seems unrelated to some small change you may have made (perhaps that code has not even been executed), sometimes it may be down to a build inconsistency (for a variety of reasons usually with complex builds) and in such cases it is wise to at least eliminate build consistency as a cause by performing a rebuild all.
Certainly if you if you are releasing code to a third-party, such as a customer or for production of some product, you would probably want to perform a clean build just to ensure build consistency. You don't want users reporting bugs you cannot reproduce because the build they have been given is not reproducible.
Rebuilding the complete software is a good practice beacuse here you will generate all dependencies and symbol files path along with your local machine paths.
If you would need to debug any application with a debugger then probably you would need symbol file and paths where your source code is present and if you directly flash the application without rebuilding , then you might not be able to debug certain paths because you dont get to know at which path the application was compiled and you might miss the symbol related information.

Recommended tool to automate complicated build procedure

I am developing an OS for embedded devices that runs bytecode. Basically, a micro JVM.
In the process of doing so, I am able to compile and run Java applications to bytecode(ish) and flash that on, for instance, an Atmega1284P.
Now I've added support for C applications: I compile and process it using several tools and with some manual editing I eventually get bytecode that runs on my OS.
The process is very cumbersome and heavy and I would like to automate it.
Currently, I am using makefiles for automatic compilation and flashing of the Java applications & OS to devices.
All steps, roughly, for a C application are as follows and consist of consecutive manual steps:
(1) Use Docker to run a Linux container with lljvm that compiles a .c file to a .class file (see also https://github.com/davidar/lljvm/tree/master)
(2) convert this c.class file to a jasmin file (https://github.com/davidar/jasmin) using the ClassFileAnalyzer tool (http://classfileanalyzer.javaseiten.de/)
(3) manually edit this jasmin file in a text editor by replacing/adjusting some strings
(4) convert the modified jasmin file to a .class file again using jasmin
(5) put this .class file in a folder where the rest of my makefiles (the ones that already make and deploy the OS and class files from Java apps) can take over.
Current options seem to be just keep using makefiles but this is a bit unwieldly (I already have 5 different makefiles and this would further extend that chain). I've also read a bit about scons. In essence, I'm wondering which are some recommended tools or a good approach for complicated builds.
Hopefully this may help a bit, but the question as such could probably be a subject for a heated discussion without much helpful results.
As pointed out in the comments by others, you really need to automate the steps starting with your .c file to the point you can integrated it with the rest of your system.
There is generally nothing wrong with make and you would not win too much by switching to SCons. You'd get more ways to express what you want to do. Among other things meaning that if you wanted to write that automation directly inside the build system and its rules, you could also use Python and not only shell (should that be of a concern though, you could just as well call that Python code from make). But the essence of target, prerequisite, recipe is still there. And with that need for writing necessary automation for those .c to integration steps.
If you really wanted to look into alternative options. bazel might be of interest to you. The downside being the initial effort to write the necessary rules to fit your needs could be costly. And depending on size of your project, might just be too much. On the other hand once done with that, it'd be very easy to use (apply those rules on growing code base) and you could also ditch the container and rely on its more lightweight sand-boxing and external rules to get the tools and bits you need for your build... all with a single system for build description.

How do you verify that 2 copies of a VB 6 executable came from the same code base?

I have a program under version control that has gone through multiple releases. A situation came up today where someone had somehow managed to point to an old copy of the program and thus was encountering bugs that have since been fixed. I'd like to go back and just delete all the old copies of the program (keeping them around is a company policy that dates from before version control was common and should no longer be necessary) but I need a way of verifying that I can generate the exact same executable that is better than saying "The old one came out of this commit so this one should be the same."
My initial thought was to simply MD5 hash the executable, store the hash file in source control, and be done with it but I've come up against a problem which I can't even parse.
It seems that every time the executable is generated (method: Open Project. File > Make X.exe) it hashes differently. I've noticed that Visual Basic messes with files every time the project is opened in seemingly random ways but I didn't think that would make it into the executable, nor do I have any evidence that that is indeed what's happening. To try to guard against that I tried generating the executable multiple times within the same IDE session and checking the hashes but they continued to be different every time.
So that's:
Generate Executable
Generate MD5 Checksum: md5sum X.exe > X.md5
Verify MD5 for current executable: md5sum -c X.md5
Generate New Executable
Verify MD5 for new executable: md5sum -c X.md5
Fail verification because computed checksum doesn't match.
I'm not understanding something about either MD5 or the way VB 6 is generating the executable but I'm also not married to the idea of using MD5. If there is a better way to verify that two executables are indeed the same then I'm all ears.
Thanks in advance for your help!
That's going to be nearly impossible. Read on for why.
The compiler will win this game, every time...
Compiling the same project twice in a row, even without making any changes to the source code or project settings, will always produce different executable files.
One of the reasons for this is that the PE (Portable Executable) format that Windows uses for EXE files includes a timestamp indicating the date and time the EXE was built, which is updated by the VB6 compiler whenever you build the project. Besides the "main" timestamp for the EXE as a whole, each resource directory in the EXE (where icons, bitmaps, strings, etc. are stored in the EXE) also has a timestamp, which the compiler also updates when it builds a new EXE. In addition to this, EXE files also have a checksum field that the compiler recalculates based on the EXE's raw binary content. Since the timestamps are updated to the current date/time, the checksum for the EXE will also change each time a project is recompiled.
But, but...I found this really cool EXE editing tool that can undo this compiler trickery!
There are EXE editing tools, such as PE Explorer, that claim to be able to adjust all the timestamps in an EXE file to a fixed time. At first glance you might think you could just set the timestamps in two copies of the EXE to the same date, and end up with equivalent files (assuming they were built from the same source code), but things are more complicated than that: the compiler is free to write out the resources (strings, icons, file version information, etc.) in a different order each time you compile the code, and you can't really prevent this from happening. Resources are stored as independent "chunks" of data that can be rearranged in the resulting EXE without affecting the run-time behavior of the program.
If that wasn't enough, the compiler might be building up the EXE file in an area of uninitialized memory, so certain parts of the EXE might contain bits and pieces of whatever was in memory at the time the compiler was running, creating even more differences.
As for MD5...
You are not misunderstanding MD5 hashing: MD5 will always produce the same hash given the same input. The problem here is that the input in this case (the EXE files) keep changing.
Conclusion: Source control is your friend
As for solving your current dilemma, I'll leave you with this: associating a particular EXE with a specific version of the source code is a more a matter of policy, which has to be enforced somehow, than anything else. Trying to figure out what EXE came from what version without any context is just not going to be reliable. You need to track this with the help of other tools. For example, ensuring that each build produces a different version number for your EXE's, and that that version can be easily paired with a specific revision/branch/tag/whatever in your version control system. To that end, a "free-for-all" situation where some developers use source control and others use "that copy of the source code from 1997 that I'm keeping in my network folder because it's my code and source control is for sissies anyway" won't help make this any easier. I would get everyone drinking the source control Kool-Aid and adhering to a standard policy for creating builds right away.
Whenever we build projects, our build server (we use Hudson) ensures that the compiled EXE version is updated to include the current build number (we use the Version Number Plugin and a custom build script to do this), and when we release a build, we create a tag in Subversion using the version number as the tag name. The build server archives release builds, so we can always get the specific EXE (and setup program) that was given to a customer. For internal testing, we can choose to pull an archived EXE from the build server, or just tell the build server to rebuild the EXE from the tag we created in Subversion.
We also never, ever, ever release any binaries to QA or to customers from any machine other than the build server. This prevents "works on my machine" bugs, and ensures that we are always compiling from a "known" copy of the source code (it only pulls and builds code that is in our Subversion repository), and that we can always associate a given binary with the exact version of the code that it was created from.
I know it has been a while, but since there is VB De-compiler app, you may consider bulk-decompiling vb6 apps, and then feeding decompilation results to an AI/statistical anomaly detection on the various code bases. Given the problem you face doesn't have an exact solution, it is unlikely the results will be 100% accurate, but as you feed more data, the detection should become more and more accurate

How to automatically detect and release DLLs that have really changed?

Whenever we recompile an exe or a DLL, its binary image is different even if the source code is the same, due to various timestamps and checksums in the image.
But, our quality system implies that each time a new DLL is published, related validation tests must be performed again (often manually, and it takes a significant amount of time.)
So, our goal is to avoid releasing DLLs that have not actually changed. I.e: having an automatic procedure (script, tool, whatever...) that detect different Dlls based only on meaningful information they contain (code and data), ignoring timestamps and checksum.
Is there a good way to achieve this ?
Base it off the version information, and only update the version information when you actually make changes.
Have your build tool build the DLL twice. Whatever differences exist between the two are guaranteed to be the result of timestamps or checksums. Now you can use that information to compare to your next build.
If you have an automated build system that syncs source before starting a build, only proceed with building and publishing if there any actual changes in source control. You should be able to detect this easily from the output of your source control client.
We have the same problem with our build system. Unfortunately it is not trivial to detect if there are any material code changes since we have numerous static libraries, so a change to one may result in a dll/exe changing. Changes to a file directly used by the dll/exe may just be fixing a bad comment, not changing the resulting object code.
I've looked previously for a tool to do what you desired and I did not see one. My thought was to compare the two files manually and skip the non meaningful differences in the two versions. The Portable File Format is well documented, so I don't expect this to be terribly difficult. Our requirements additional require that we ignore the version stamped into the dll/exe since we uniquely stamp all our files, and also to ignore the signature as we sign all our executables.
I've yet to find time to do any of this, but I'd be interested in collaborating with you if you proceed with implementing a solution. If you do find a tool that does this, please do let us know.

Should you build components every time you build a main app

We have started using Final Builder to create builds for our vb6 and .net projects. We are also using Visual Source Safe to manage our source. Some of our vb6 exe's are dependent on certain ocx's, such that a particular vb6 exe may require a particular version of an ocx.
The question is, should the final builder script for our exe project also re-build the ocx project, or is it better to simply pull a particular version of the already compiled ocx. My concern is that other developers could have broken the build (or created a bug) for the ocx which could then break the exe we are trying to build. Moreover, re-building the ocx project would result in the same version of the ocx but with a different date, resulting in confusion if dllhell(ocx hell) issues arise.
There is no difference in terms of building and maintaining your app between a ocx and a activex dll. The ocx should use binary compatibility and be part of your compile process.
This is however a general rule. You may have some components that rarely change if ever. In my own VB6 application I have a handful of components that reside at the bottomost level of my reference hierarchy that rarely get updated. They maybe get updated one or twice a year at best. Some haven't been updated for several years now.
However based on your description it sounds like the controls are still being modified. So I doubt the second case applies.
In the end use your best judgment.
There are two ways to use OCX/DLLs: code reusability vs. fragmentation of an over-large project.
Those meant for re-use would be absurd to build, build, and rebuild, and almost never should be customized to fit a new application. These are your crown jewels, and most people should have no ability to modify the source. They are the domain of your organization's "library writers" because that's what they are: libraries.
If you simply have large, monolithic, unweildy applications you may have to go the other route. Then OCXs and DLLs simply become an awkward extension of the "module" concept. This is why we have Project Groups.
Your library users should not be fiddling with libraries though. I'm sure they all fancy themselves able to "ensure they are up to date and performant" but that's a different debate entirely.

Resources