How do patches or service packs work? I don't know how to explain my question, but I will give it a try.
Take Windows, for example. Its files altogether consume hundreds of MB. Yet a single service pack (maybe a 300 MB file) updates the whole Windows OS.
Similarly, I have seen updates for software like Adobe Reader, etc. In all these cases the main exe is much larger than the update. How does the process work? If the main file refers to any dependency files, and the update changes their version or size, will it not affect the exe?
Patches and service packs usually only need to update the system's core shared libraries. These libraries are replaced or patched from a compressed archive, which is why the update is much smaller than the full installation. Once the libraries are updated, the rest of the OS's software can continue using the new versions.
Applications nowadays are designed to be modular and to use external libraries which can be updated easily. Often the main application and any media it uses do not need to be replaced at all, only the library that changed.
To complement the earlier answers: back in the day, when file size really mattered, some patches were delivered as binary diffs. The patch itself was an executable that knew which files needed to be changed and how, and it modified only certain parts of the files' zeroes and ones in place instead of replacing the files entirely.
The following URL may be of interest to you for understanding the architecture:
http://msdn.microsoft.com/en-us/library/aa387291(VS.85).aspx
Patches (also called deltas) are only the differences between two files. If only a few bytes of a 1 GB file change, the patch will only be a few bytes in size. For text files diff is used; for binary files, xdelta or similar tools. Service packs are collections of such patches.
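To make the idea concrete, here is a minimal sketch in Python of the principle (this is not how xdelta or the Windows delta formats actually encode anything; it simply records the byte ranges where the new file differs from the old one and rebuilds the new file from the old file plus that small delta):

    # Illustrative only: store (offset, new_bytes) pairs for the regions that differ.
    # Real tools (diff, xdelta, bsdiff) use far smarter encodings than this.

    def make_delta(old: bytes, new: bytes):
        """Return a list of (offset, replacement_bytes) covering every difference."""
        delta = []
        i = 0
        limit = min(len(old), len(new))
        while i < limit:
            if old[i] != new[i]:
                start = i
                while i < limit and old[i] != new[i]:
                    i += 1
                delta.append((start, new[start:i]))
            else:
                i += 1
        if len(new) != len(old):            # handle growth/shrink at the end
            delta.append((limit, new[limit:]))
        return delta

    def apply_delta(old: bytes, delta, new_len: int) -> bytes:
        out = bytearray(old[:new_len].ljust(new_len, b"\0"))
        for offset, chunk in delta:
            out[offset:offset + len(chunk)] = chunk
        return bytes(out[:new_len])

    old = b"Hello, version 1.0 of the product."
    new = b"Hello, version 2.3 of the product!"
    d = make_delta(old, new)
    assert apply_delta(old, d, len(new)) == new
    print(d)    # only the changed bytes are stored, not the whole file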
Some archive file formats, e.g. ZIP (see Section 8 in https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT), support being split into multiple parts of a limited size. ZIP files can be opened natively on recent versions of Microsoft Windows, but it seems that Windows cannot open split ZIP files natively, only with special tools like 7-Zip. I would like to use this "split archive" functionality in a web app that I'm writing in which the created archives should be opened by a large audience of "average" computer users, so my question is: Is there an archive file format (like ZIP) that supports being split in multiple parts and can be unpacked without installing additional software on recent versions of Microsoft Windows? And ideally on other widely used operating systems as well.
Background: My final goal is to export a directory structure that is split over multiple web servers to a single local directory tree. My current idea is to have each web server produce one part of the split archive, provide all of them as some sort of Javascript multi-file download and then have one archive (in multiple parts) on the user's computer that just needs to be unpacked. An alternative idea for this final goal was to use Javascript's File System Access API (https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API), but it is not supported on Firefox, which is a showstopper.
CAB archives meet this purpose to some extent (see this library's page, for example; it says that through it, archives can even be extracted directly from an HTTP(S)/FTP server). Since the library relies on .NET, it could even be used on Linux through Mono/Wine, which is crucial if your servers aren't running Windows... because the archive must be created on the server, right?
Your bigger problem is that a split archive can't be created in parallel on multiple servers, if only because of LZX's dictionary. Each server would have to create the whole set of archive parts and send only the ones it is responsible for, and you have no guarantee that these sets would be identical across servers.
The best way is probably to create the whole archive on ONE server, then distribute each part (or the whole split archive...) to your various servers through a replication-like mechanism.
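As a rough sketch of that approach (assuming Info-ZIP's zip 3.0+ is available on the master server; the paths and the 100 MB part size are made up for the example):

    import subprocess
    from pathlib import Path

    # Hypothetical paths; one "master" server builds the split archive and the
    # parts are then replicated to the download servers.
    SOURCE_DIR = Path("/srv/export/tree")
    OUTPUT = Path("/srv/export/bundle.zip")   # parts: bundle.z01, bundle.z02, ..., bundle.zip

    # -r: recurse into the tree, -s 100m: split into 100 MB parts
    subprocess.run(["zip", "-r", "-s", "100m", str(OUTPUT), "."],
                   cwd=SOURCE_DIR, check=True)

    parts = sorted(OUTPUT.parent.glob("bundle.z[0-9][0-9]")) + [OUTPUT]
    print("parts to distribute:", [p.name for p in parts])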
Otherwise, you can also make individual archives that each contain only a subset of the directory tree (you'll have to partition the files across servers), but it won't meet your requirements since it would be a collection of individual archives rather than one big split archive.
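For what it's worth, that alternative is easy to script; a minimal sketch (directory names are hypothetical), with the upside that each resulting ZIP is an ordinary archive that Explorer can open natively:

    import zipfile
    from pathlib import Path

    def zip_subtree(subtree: Path, out_zip: Path) -> None:
        """Pack one server's share of the directory tree into a plain (non-split) ZIP."""
        with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in subtree.rglob("*"):
                if path.is_file():
                    # store paths relative to the parent of the subtree so all
                    # the per-server ZIPs unpack into one consistent tree
                    zf.write(path, path.relative_to(subtree.parent))

    # hypothetical partition: this server owns the "part-a" subtree
    zip_subtree(Path("/srv/export/tree/part-a"), Path("/srv/export/part-a.zip"))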
Some clarifications may be needed:
Do you absolutely need a system without any client besides the browser? Or can you use other protocols, as long as they exist natively on Windows (like the FTP/SSH clients that are now provided by default)?
What is the real purpose behind this request? Distributing load across all servers? Avoiding overly large single-file downloads (e.g. a 30 GB archive) in case of transfer failure? Or both?
If the concern is file size, why not rely on resumable downloads?
We are building an audio plugin that can be loaded by a variety of audio production programs. To make it as compatible as possible with all common hosts, we actually build three versions of it (Steinberg VST2 format, Steinberg VST3 format, Avid AAX format), which is achieved by wrapping our core plugin code with wrappers for those three APIs. All three versions are installed in the standard location specified for each format.
Our plugin now depends on the Microsoft onnxruntime, which we want to dynamically link against. What is the right way of deploying and handling this dependency? As the plugin is loaded at runtime by the user's host software of choice, placing the DLL dependency next to the executable is not an option, since we don't know which host software the user will use and which of the three plugin formats that host will choose.
Being a macOS developer, I'm unfamiliar with Windows best practice here.
Ideally we would like to install the DLL into a custom location. But that would require us to modify the system's PATH variable to ensure the DLL is found for all users when a host loads one of our plugins, right? I'm not sure that is a clean solution.
Another option could be to install the DLL into C:\Windows\System32, but my research revealed that there is no versioning information for the DLLs located there. So if some other application installed onnxruntime there as well (or if a newer Windows installation already ships with onnxruntime), how could we ensure that its version is equal to or greater than the version our plugin needs (in which case we wouldn't overwrite it) or below our minimum required version (in which case we would replace it)? This generally seems like bad practice to me as well.
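For reference, the embedded version resource of an existing DLL can be read before deciding whether to overwrite it; here is a rough sketch using the Win32 version APIs from Python via ctypes (the System32 path is just an example, and a real installer would do this check in its own setup logic rather than in Python):

    import ctypes
    from ctypes import wintypes

    # Layout of the fixed file-version block returned by VerQueryValue.
    class VS_FIXEDFILEINFO(ctypes.Structure):
        _fields_ = [(name, wintypes.DWORD) for name in (
            "dwSignature", "dwStrucVersion",
            "dwFileVersionMS", "dwFileVersionLS",
            "dwProductVersionMS", "dwProductVersionLS",
            "dwFileFlagsMask", "dwFileFlags",
            "dwFileOS", "dwFileType", "dwFileSubtype",
            "dwFileDateMS", "dwFileDateLS")]

    def dll_file_version(path):
        """Return (major, minor, build, revision) or None if the DLL has no version resource."""
        version = ctypes.WinDLL("version")
        handle = wintypes.DWORD()
        size = version.GetFileVersionInfoSizeW(path, ctypes.byref(handle))
        if size == 0:
            return None
        data = ctypes.create_string_buffer(size)
        if not version.GetFileVersionInfoW(path, 0, size, data):
            return None
        info = ctypes.POINTER(VS_FIXEDFILEINFO)()
        length = wintypes.UINT()
        if not version.VerQueryValueW(data, "\\", ctypes.byref(info), ctypes.byref(length)):
            return None
        ffi = info.contents
        return (ffi.dwFileVersionMS >> 16, ffi.dwFileVersionMS & 0xFFFF,
                ffi.dwFileVersionLS >> 16, ffi.dwFileVersionLS & 0xFFFF)

    # Example path only; the check would target wherever the installer plans to write.
    print(dll_file_version(r"C:\Windows\System32\onnxruntime.dll"))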
So what's common best practice on Windows for such scenarios? Am I overlooking a proper solution?
I'm looking for general advice. I created a Visual Studio 2010 project that outputs an ocx file that is used on XP and Vista machines. The DLL on which it depends has been updated on our Win7 machines. I simply needed to rebuild for Win7 using the exact same code with an updated .lib file. I created a second project configuration (ReleaseW7) and it only differs from the original project config (Release) in that it points to the new .lib.
So now I have two files both named xx.ocx. Besides looking at the name of the folder each file resides in (or looking at the creation time of each), there is no way to determine which is which. I thought of using different file version numbers, but as far as I can tell (and I'm relatively new to this, so I could certainly be wrong) that would require two separate projects, each with a slightly modified resource (.rc) file, instead of simply having two configurations within the same project. If nothing else, that seems like a waste of hard drive space. It also feels like the "wrong" way of using file version numbers.
Is there a cleaner or more "standard" way of handling this? All I really want is a way for the folks who install the ocx and support the end user to know for certain that they are working with the correct file.
Long story short, I decided to use different version numbers. I was able to set up a preprocessor definition for the resource compiler and use that to handle different versions of VS_VERSION_INFO in my .rc file.
In case anyone is interested, this is the resource I found:
http://social.msdn.microsoft.com/Forums/en-US/winformssetup/thread/605275c0-3001-45d2-b6c1-652326ca5340/
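For illustration, the .rc fragment might look roughly like this (the RELEASEW7 symbol and the version numbers are made up; the symbol would be added to the ReleaseW7 configuration's resource-compiler preprocessor definitions):

    // Hypothetical sketch of the approach described above.
    #ifdef RELEASEW7
      #define VER_FILEVERSION      1,0,0,7
      #define VER_FILEVERSION_STR  "1.0.0.7\0"
    #else
      #define VER_FILEVERSION      1,0,0,1
      #define VER_FILEVERSION_STR  "1.0.0.1\0"
    #endif

    VS_VERSION_INFO VERSIONINFO
     FILEVERSION     VER_FILEVERSION
     PRODUCTVERSION  VER_FILEVERSION
     FILEFLAGSMASK   0x3fL
     FILEFLAGS       0x0L
     FILEOS          0x4L        // VOS__WINDOWS32
     FILETYPE        0x2L        // VFT_DLL (an .ocx is a DLL)
     FILESUBTYPE     0x0L
    BEGIN
        BLOCK "StringFileInfo"
        BEGIN
            BLOCK "040904b0"
            BEGIN
                VALUE "FileVersion",    VER_FILEVERSION_STR
                VALUE "ProductVersion", VER_FILEVERSION_STR
            END
        END
        BLOCK "VarFileInfo"
        BEGIN
            VALUE "Translation", 0x409, 1200
        END
    END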
I have a program under version control that has gone through multiple releases. A situation came up today where someone had somehow managed to point to an old copy of the program and thus was encountering bugs that have since been fixed. I'd like to go back and just delete all the old copies of the program (keeping them around is a company policy that dates from before version control was common and should no longer be necessary) but I need a way of verifying that I can generate the exact same executable that is better than saying "The old one came out of this commit so this one should be the same."
My initial thought was to simply MD5 hash the executable, store the hash file in source control, and be done with it but I've come up against a problem which I can't even parse.
It seems that every time the executable is generated (method: Open Project. File > Make X.exe) it hashes differently. I've noticed that Visual Basic messes with files every time the project is opened in seemingly random ways but I didn't think that would make it into the executable, nor do I have any evidence that that is indeed what's happening. To try to guard against that I tried generating the executable multiple times within the same IDE session and checking the hashes but they continued to be different every time.
So that's:
Generate Executable
Generate MD5 Checksum: md5sum X.exe > X.md5
Verify MD5 for current executable: md5sum -c X.md5
Generate New Executable
Verify MD5 for new executable: md5sum -c X.md5
Fail verification because computed checksum doesn't match.
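If it helps, the same steps can be scripted; here is a minimal Python equivalent using hashlib (file names taken from the example above):

    import hashlib

    def md5_of(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    baseline = md5_of("X.exe")     # after the first "Make X.exe"
    # ... rebuild the project in the IDE, then:
    print("match" if md5_of("X.exe") == baseline else "checksums differ")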
I'm not understanding something about either MD5 or the way VB 6 is generating the executable but I'm also not married to the idea of using MD5. If there is a better way to verify that two executables are indeed the same then I'm all ears.
Thanks in advance for your help!
That's going to be nearly impossible. Read on for why.
The compiler will win this game, every time...
Compiling the same project twice in a row, even without making any changes to the source code or project settings, will always produce different executable files.
One of the reasons for this is that the PE (Portable Executable) format that Windows uses for EXE files includes a timestamp indicating the date and time the EXE was built, which is updated by the VB6 compiler whenever you build the project. Besides the "main" timestamp for the EXE as a whole, each resource directory in the EXE (where icons, bitmaps, strings, etc. are stored in the EXE) also has a timestamp, which the compiler also updates when it builds a new EXE. In addition to this, EXE files also have a checksum field that the compiler recalculates based on the EXE's raw binary content. Since the timestamps are updated to the current date/time, the checksum for the EXE will also change each time a project is recompiled.
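If you want to see this for yourself, here is a small sketch (Python standard library only, no error handling) that reads the TimeDateStamp field from an EXE's COFF header; run it after two consecutive builds and you will see the value change:

    import datetime
    import struct

    def pe_build_timestamp(path):
        """Read the TimeDateStamp from a PE file's COFF header."""
        with open(path, "rb") as f:
            f.seek(0x3C)                          # e_lfanew: offset of the "PE\0\0" signature
            pe_offset = struct.unpack("<I", f.read(4))[0]
            f.seek(pe_offset + 8)                 # skip signature (4) + Machine (2) + NumberOfSections (2)
            stamp = struct.unpack("<I", f.read(4))[0]
        return datetime.datetime.fromtimestamp(stamp, datetime.timezone.utc)

    print(pe_build_timestamp("X.exe"))            # changes with every build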
But, but...I found this really cool EXE editing tool that can undo this compiler trickery!
There are EXE editing tools, such as PE Explorer, that claim to be able to adjust all the timestamps in an EXE file to a fixed time. At first glance you might think you could just set the timestamps in two copies of the EXE to the same date, and end up with equivalent files (assuming they were built from the same source code), but things are more complicated than that: the compiler is free to write out the resources (strings, icons, file version information, etc.) in a different order each time you compile the code, and you can't really prevent this from happening. Resources are stored as independent "chunks" of data that can be rearranged in the resulting EXE without affecting the run-time behavior of the program.
If that wasn't enough, the compiler might be building up the EXE file in an area of uninitialized memory, so certain parts of the EXE might contain bits and pieces of whatever was in memory at the time the compiler was running, creating even more differences.
As for MD5...
You are not misunderstanding MD5 hashing: MD5 will always produce the same hash given the same input. The problem here is that the inputs in this case (the EXE files) keep changing.
Conclusion: Source control is your friend
As for solving your current dilemma, I'll leave you with this: associating a particular EXE with a specific version of the source code is more a matter of policy, which has to be enforced somehow, than anything else. Trying to figure out which EXE came from which version without any context is just not going to be reliable. You need to track this with the help of other tools, for example by ensuring that each build produces a different version number for your EXEs, and that that version can be easily paired with a specific revision/branch/tag/whatever in your version control system. To that end, a "free-for-all" situation where some developers use source control and others use "that copy of the source code from 1997 that I'm keeping in my network folder because it's my code and source control is for sissies anyway" won't help make this any easier. I would get everyone drinking the source control Kool-Aid and adhering to a standard policy for creating builds right away.
Whenever we build projects, our build server (we use Hudson) ensures that the compiled EXE version is updated to include the current build number (we use the Version Number Plugin and a custom build script to do this), and when we release a build, we create a tag in Subversion using the version number as the tag name. The build server archives release builds, so we can always get the specific EXE (and setup program) that was given to a customer. For internal testing, we can choose to pull an archived EXE from the build server, or just tell the build server to rebuild the EXE from the tag we created in Subversion.
We also never, ever, ever release any binaries to QA or to customers from any machine other than the build server. This prevents "works on my machine" bugs, and ensures that we are always compiling from a "known" copy of the source code (it only pulls and builds code that is in our Subversion repository), and that we can always associate a given binary with the exact version of the code that it was created from.
I know it has been a while, but since there is a VB decompiler app, you may consider bulk-decompiling VB6 apps and then feeding the decompilation results to AI/statistical anomaly detection across the various code bases. Given that the problem you face doesn't have an exact solution, it is unlikely the results will be 100% accurate, but as you feed in more data, the detection should become more and more accurate.
I'm building an OS X Installer package for a product. When it is run, the 'Select a Destination' pane has an 'Installing this software requires X MB of space' label. But I can run the same package twice on the same machine and see the claimed usage vary, e.g. from 85 to 127 MB, neither of which is the actual ~65 MB footprint of the product.
How does Installer calculate required space?
The installer .pkg file contains several components:
the archive of files to install
a bill of materials (metadata listing all the installable files)
resources for the installation itself (images, scripts, etc)
an Info.plist containing version information and defaults
The bill of materials, or "BOM", contains information such as permissions, file sizes, checksums, and so on. When the installer runs for a package the very first time, the total of the file sizes listed in the BOM is used to estimate the required size. (If there are any shared components, this will obviously affect the total.)
After an installation is complete, the BOM is saved in the package receipts folder (/Library/Receipts/boms) as a record of what was installed. The lsbom utility can be used to inspect the contents of these files.
On subsequent installations of the same package (as determined by the package identifier), the BOM receipts are consulted to determine which files are already installed and their total size. Files that are already present and unchanged are subtracted from the estimate, while new files and updated files that must replace older ones are counted. The pkgutil tool can be used to display information about installed packages.
So this is why the installation size estimate can vary across installations: new and changed files add to the total, while existing unchanged files reduce the requirement.
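To see the raw numbers the installer is working from, you can total the sizes recorded in a receipt BOM yourself; a rough sketch (the receipt path is an example, and newer systems keep receipts under /var/db/receipts; pkgutil --pkgs lists the installed package identifiers):

    import subprocess

    def bom_total_bytes(bom_path):
        """Total the file sizes recorded in a receipt BOM (regular files only)."""
        out = subprocess.run(["lsbom", "-f", "-p", "fs", bom_path],
                             capture_output=True, text=True, check=True).stdout
        total = 0
        for line in out.splitlines():
            fields = line.split("\t")             # "path<TAB>size"
            if len(fields) == 2 and fields[1].isdigit():
                total += int(fields[1])
        return total

    # Example receipt path only.
    print(bom_total_bytes("/Library/Receipts/boms/com.example.product.bom") / 1e6, "MB")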
Could the installer be including any other files (Frameworks, StartupItems, drivers, etc.) that your program uses in the size calculation? If so, the changes in size you are experiencing may be due to those files being absent at one point and present at another.
Of course, I could be wrong =]
I may be wrong, but I'm guessing it's an approximation set by you, the developer. You would put something like "Installing this software requires 120 MB of space".
I know that when I install a product on my Mac, I see what it says it will take and I see what is currently available; however, I NEVER go in and actually check that the software used EXACTLY what it said it would, especially if it's only about 50 MB.