Fast deletion of files in a Setup project - visual-studio

I am maintaining a Setup project under VS2008. The project contains thousands of files arranged in a hierarchy of folders.
Every now and then, I want to renew part of this hierarchy, which means deleting a number of nodes and reinserting the new content. I need to do that because some of the files are obsolete and need to be removed. It is much safer to delete all than to chase the obsolete files away.
Unfortunately, this is a very tedious task because you can't delete empty folders and you have to delete every node in the hierarchy one by one. In addition, for a large project, every deletion takes seconds.
Do you know of a way to speed up or automate that task? Merely erasing lines in the .vdproj file doesn't seem to work.

If you do not feel too queasy about it.... This is worth the effort if changes are frequent.
The .vcproj file is in xml format. You could use explorer to manage your files, and write a small utility that checks and removes the (then) missing files from your project. The files are marked with tags, as in
<File
RelativePath=".\AudioPlayerPane.cpp"
>
</File>
You have to remove the entire "File" element: that is three lines, or more if the file has special compilation options, up to and including the closing "/File" tag. In addition, you will want to delete the .suo, .ncb and .cache files.
It's too bad that boost.property_tree xml does not support any encoding other than UTF-8, as I would suggest you use that to recursively walk the vcproj file. This would make the utility very easy to write, and ensure the resulting file is correct. Maybe you can use the encoding capabilities of notepad++ to manually change the encoding of the file before and after changes.
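A minimal sketch of such a utility in Python, using the standard xml.etree.ElementTree module in place of boost.property_tree. The function name prune_missing_files is hypothetical, and the sketch assumes the "File" / "RelativePath" layout shown above, with paths relative to the project directory. ElementTree does not reproduce the original formatting or the XML declaration, so compare the result with the original file before relying on it.

```python
import os
import xml.etree.ElementTree as ET


def prune_missing_files(vcproj_xml, project_dir):
    """Return (new_xml, removed) after dropping <File> nodes whose file is gone."""
    root = ET.fromstring(vcproj_xml)
    removed = []
    # ElementTree has no parent pointers, so visit every element as a
    # potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "File":
                continue
            rel = child.get("RelativePath", "")
            # Project files store Windows-style paths such as ".\Foo.cpp".
            path = os.path.normpath(os.path.join(project_dir, *rel.split("\\")))
            if not os.path.exists(path):
                # Removes the whole element, including nested options.
                parent.remove(child)
                removed.append(rel)
    return ET.tostring(root, encoding="unicode"), removed
```

Work on a copy of the project file and review the list of removed paths before writing anything back.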

Related

Checksum File Comparison Tool

So I am looking for a tool that can compare files in folders based on checksums (this is common, not hard to find). However, in my use case the files can sit in pretty deep folder paths that can change, and I am expected to compare them every few months and ONLY create a package of the files that differ. I don't care what folders the files are in: the same file can move between folders regularly, and files wouldn't change names much, only content (so checksums are a must).
My issue is that almost all of the tools I can find do care about the folder paths when they compare folders. I don't, and I actually want the tool to ignore the folder paths. I would rather not develop anything, or at least only have to develop a small part of the process, to save time.
To be clear, the order in which I want things to happen is:
Program scans directory from 1/1/2020 (A).
Program scans directory from 4/1/2020 (B).
Program finds all files in B whose checksum does not exist in A and makes a new folder with the differences (C).
Any ideas?
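If no ready-made tool fits, the three steps above are small enough to script. A rough sketch in Python, assuming SHA-256 as the checksum; the function names and the digest-prefixed file names in C are illustrative choices, not something the question specifies.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root):
    """Map checksum -> one path with that content, for every file under root."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digests.setdefault(file_digest(path), path)
    return digests


def package_differences(dir_a, dir_b, dir_c):
    """Copy into dir_c every file in dir_b whose content appears nowhere in dir_a."""
    known = scan(dir_a)
    os.makedirs(dir_c, exist_ok=True)
    copied = []
    for digest, path in sorted(scan(dir_b).items(), key=lambda kv: kv[1]):
        if digest in known:
            continue  # same content exists somewhere in A, folder is irrelevant
        # Prefix with part of the digest so same-named files do not collide.
        target = os.path.join(dir_c, digest[:12] + "_" + os.path.basename(path))
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Because the comparison is keyed on the checksum alone, a file that merely moved between folders is treated as unchanged.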

Undo a botched command prompt copy which concatenated all of my files

In a Windows 8 Command Prompt, I had a backup drive plugged in and I navigated to my User directory. I executed the command:
copy Documents G:/Seagate_backup/Documents
What I assumed was that copy would create the Documents directory on my backup drive and then copy the contents of the C: Documents directory into it. That is not what happened!
I proceeded to wipe my hard-drive and re-install the operating system, thinking I had backed up the important files, only to find out that copy seemingly concatenated all the C: Documents files of different types (.doc, .pdf, .txt, etc) into one file called "Documents." This file is of course unreadable but opening it in Notepad reveals what happened. I can see some of my documents which were plain text throughout the massively long file.
How do I undo this!!? It's terrible because I was actually helping a friend and was so sure of myself but now this has happened. The only thing I can think of doing is searching for some common separator amongst the concatenated files and write some sort of script to split the file back apart. But then I would have to guess the extensions of each of the pieces...
Merging files together in the fashion that copy uses discards important file system information such as file size and file name. While the file name may not be as important, the size is. Both parameters are used by the OS to discriminate files.
This problem might sound familiar if you have damaged your file allocation table before and all files disappeared. In both cases, you will end up with a binary blob (be it an actual disk or something like your file which might resemble a disk image) that lacks any size and filename information.
Fortunately, this is where a lot of file system recovery tools can help. They are specialized in matching patterns. Specifically, they look for giveaway clues to what type a file is, where it starts and what its size is.
This is for instance enabled by many file types having a set of magic numbers that are used to allow a program to check if a file really is of the type that the extension claims to be.
In principle it is possible to undo this process more or less well.
You will need data recovery tools, or other analysis tools such as binwalk, to extract the files from the concatenated binary blob. Essentially the same tools that are used to recover deleted files should be able to extract your documents again, without any file names of course. I recommend renaming the file to a disk image (.img) and either mounting it within the operating system as a virtual hard disk (don't worry that it has no file system; it should show up as an unformatted drive) or opening it directly in a data recovery or analysis tool that can read binary files. binwalk, for instance, can do that directly, but it may not find all file types, as it is mainly meant for unpacking firmware images, which may be assembled in a way similar to how your files ended up.
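To get a feel for what such tools look for, here is a small Python sketch that scans a blob for a handful of well-known magic numbers and reports where they occur. It only locates candidate start offsets; it does not determine file lengths or extract anything, and the SIGNATURES table and find_signatures name are illustrative, covering far fewer formats than a real recovery tool.

```python
# A few well-known file signatures; real carving tools know many more.
SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",  # also the container used by .docx/.xlsx
    "ole": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",  # legacy .doc/.xls
}


def find_signatures(blob):
    """Return a sorted list of (offset, kind) for each signature found in blob."""
    hits = []
    for kind, magic in SIGNATURES.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, kind))
            start = blob.find(magic, start + 1)
    return sorted(hits)
```

A ZIP archive contains one such header per member, so expect several "zip" hits for a single archive. Plain text files have no signature at all, which is why they are the hardest pieces to separate.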

How should I mark a folder as processed in a script?

A script shall process files in a folder on a Windows machine and mark it as done once it is finished in order to not pick it up in the next round of processing.
My tendency is to let the script rename the folder to a different name, like adding "_done".
But on Windows, renaming a folder is not possible if some process has the folder or a file within it open. In this setup, there is a minor chance that some user may have the folder open.
Alternatively I could just write a stamp-file into that folder.
Are there better alternatives?
Is there a way to force the renaming anyway, in particular when it is on a shared drive or some NAS drive?
You have several options:
Put a token file of some sort in each processed folder and skip the folders that contain said file
Keep track of the last folder processed and only process newer ones (either by time stamp or, since they're numbered sequentially, by sequence number)
Rename the folder
Since you've already stated that other users may already have the folder/files open, we can rule out #3.
In this situation, I'm in favor of option #1, even though you'll end up with extra files. If someone needs to figure out which folders have already been processed, they have a quick, easy way of telling at a glance, rather than having to find a counter somewhere in a different file. It's also a bit less code to write, so there are fewer pieces to break.
Option #2 is good in this situation as well (I've used both depending on the circumstances), but I tend to favor it for things that a human wouldn't really need to care about or need to look for very often.
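Option #1 could look roughly like this in Python. The marker name ".processed" and the function names are placeholders, and the sketch assumes the folders to process are direct children of one root folder.

```python
import os

MARKER = ".processed"  # hypothetical marker file name


def is_processed(folder):
    return os.path.exists(os.path.join(folder, MARKER))


def mark_processed(folder):
    # Adding a file should not be blocked by someone merely having the
    # folder open, which is what can defeat a rename.
    with open(os.path.join(folder, MARKER), "w") as f:
        f.write("done\n")


def pending_folders(root):
    """Return sub-folders of root that have no marker yet, in name order."""
    return sorted(
        entry.path
        for entry in os.scandir(root)
        if entry.is_dir() and not is_processed(entry.path)
    )
```

The script would call mark_processed only after a folder has been handled completely, so an interrupted run is retried on the next round.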

strategies for backing up packages on macosx

I am writing a program that synchronizes files across file systems much like rsync but I'm stuck when it comes to handling packages. These are folders that are identified by the system as containing a coherent set of files. Pages and Numbers can use packages rather than monolithic files, and applications are actually packages for example. My problem is that I want to keep the most recent version and also keep a backup copy. As far as I can see I have two options -
I can just treat the whole thing as a regular folder and handle the contents entry by entry.
I can look at all the modification dates of all the contents and keep the complete folder tree for the one that has the most recently modified contents.
I was going for (2) and then I found that the iPhoto library is actually stored as a package and that would mean I would copy the whole library (10s, or even 100s of gigabytes) even if only one photograph was altered.
My worry with (1) is that handling the content files individually might break things. I haven't really come up with a good solution that will guarantee that the package will work and won't involve unnecessarily huge backup files in some cases. If it is just iPhoto then I can probably put in a special case, or perhaps change strategy if the package is bigger than some user-specified limit.
Packages are surprisingly mysterious, and what the system treats as a package does not seem to be just a matter of setting an extended attribute on a folder.
It depends on how you treat the "backup" version. Do you keep two versions of each file (the current and first previous), or two versions of the sync snapshot (i.e. if a file hasn't changed between the last two syncs, you only store one version)?
If it's two versions of the sync, packages shouldn't be a big problem -- just provide a way to restore the "backup" version, which if necessary splices together the changed files from the "backup" with the unchanged files from the current sync. There are some things to watch out for, though: make sure you correctly handle files that are deleted or added between the two snapshots.
If you're storing two versions of each file, things are much more complicated -- you need some way to record which versions of the files within the package "go together". I think in this case I'd be tempted to only store backup versions of files within the package from the last time something within the package changed. So, for example, say you sync a package called preso.key. On the second sync, preso.key/index.apxl.gz and preso.key/splash.png are modified, so the old version of those two files get stored in the backup. On the third sync, preso.key/index.apxl.gz is modified again, so you store a new backup version of it and remove the backup version of preso.key/splash.png.
BTW, another way to save space would be hard-linking. If you want to store two "full" versions of a big package without wasting space, just store one copy of each unchanged file and hard-link it into both backups.

Write multiple files atomically

Suppose I have a folder with a few files: images, texts, whatever. It only matters that there are multiple files and that the folder is rather large (> 100 MB). Now I want to update five files in this folder, but I want to do this atomically. Normally I would just create a temporary folder, write everything into it and, if that succeeds, replace the existing folder. But because I/O is expensive, I don't really want to go this way (resaving hundreds of files just to update five seems like a huge overhead). How am I supposed to write these five files atomically? Note that I want the writing of all files to be atomic as a group, not each file separately.
You could adapt your original solution:
Create a temporary folder full of hard links to the original files.
Save the five new files into the temporary folder.
Delete the original folder and move the folder of hard links in its place.
Creating a few links should be speedy, and it avoids rewriting all the files.
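The steps above can be sketched in Python as follows. The sketch assumes a flat folder (no sub-folders), a file system that supports hard links, and hypothetical sibling names ".staging" and ".old". It renames the original aside instead of deleting it first, and the swap is still two renames, so there is a brief moment in which the folder name does not exist.

```python
import os
import shutil


def update_folder(folder, new_files):
    """Rebuild `folder` with `new_files` (name -> bytes) added or replaced."""
    staging = folder + ".staging"  # hypothetical sibling names
    retired = folder + ".old"
    os.mkdir(staging)
    for name in os.listdir(folder):  # flat folder assumed
        # Do not link files that are about to be replaced: writing through
        # a hard link would modify the original file as well.
        if name not in new_files:
            os.link(os.path.join(folder, name), os.path.join(staging, name))
    for name, data in new_files.items():
        with open(os.path.join(staging, name), "wb") as f:
            f.write(data)
    # Swap: move the original aside, move the staging folder into place.
    os.rename(folder, retired)
    os.rename(staging, folder)
    shutil.rmtree(retired)
```

If anything fails before the first rename, the original folder is untouched and the staging folder can simply be deleted.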
