Atomically delete a list of paths - bash

I need to be able to delete an arbitrary list of paths (both files and directories) and, if any of the deletions fail, I need to be able to roll back. Is there a Unix command that can accommodate this? If not, a bash script works as well.

There's unlikely to be a command that does this in its full generality. The OS does not support atomically deleting multiple paths, so it is somewhere between hard and impossible for a command to do so. Consider a SIGKILL mid-operation: the command cannot recover, and the kernel won't know it has to undo what was already done, so the atomicity is broken.
You can approximate atomicity by moving the deleted files or directories to a trash folder, and only deleting the trash folder's contents once everything else has succeeded (restoring the data from the trash folder if anything goes wrong). But this isn't guaranteed atomic. You also have to decide where to put the trash when the files live on different file systems, so you need a per-file-system trash folder. And you need to handle name collisions, say, deleting 30 files all called 'makefile' in one operation; that means the trash directory needs directory-hierarchy information (probably actual directories under the trash directory, since anything else is, ultimately, ambiguous).
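The trash-folder approximation above could be sketched like this in Python (used here for portability; the trash location, the UUID-prefixed names, and the rollback strategy are all assumptions, and a crash mid-run can still leave a partial state, exactly as discussed):

```python
import os
import shutil
import uuid

def delete_all_or_nothing(paths, trash_dir):
    """Move every path into trash_dir; restore all of them if any move fails.

    Only approximates atomicity. Assumes trash_dir is on the same file
    system as the paths, so each move is a cheap rename rather than a copy.
    """
    moved = []  # (original path, trash path) pairs, kept for rollback
    os.makedirs(trash_dir, exist_ok=True)
    try:
        for p in paths:
            # A unique prefix avoids collisions between files with the same basename.
            dest = os.path.join(trash_dir, uuid.uuid4().hex + "_" + os.path.basename(p))
            shutil.move(p, dest)
            moved.append((p, dest))
    except OSError:
        # Roll back: put everything we already moved back where it was.
        for original, dest in reversed(moved):
            shutil.move(dest, original)
        raise
    # All moves succeeded; now actually delete the trash contents.
    for _, dest in moved:
        if os.path.isdir(dest):
            shutil.rmtree(dest)
        else:
            os.remove(dest)
```

If any path in the list is missing or undeletable, the earlier moves are undone and the exception propagates; otherwise everything disappears together.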

Related

Is there a way to limit my executable's ability to delete to only files it has created?

I'm on Windows writing a C++ executable that deletes and replaces some files in a directory it creates during an earlier run session. Maybe I'm a little panicky, but since my directory and file arguments for the deletions are generated by parsing an input file's path, I worry about the parse throwing out a much higher or different directory due to an oversight and systematically deleting unrelated files unintentionally.
Is there a way to limit my executable's reign to write/delete access only on files it created during earlier run sessions, while retaining read access to everything else? Or at least gain a little extra peace of mind that, even if I badly garble the strings I pass to DeleteFileA() and RemoveDirectoryA(), I'll avoid causing catastrophic damage?
It doesn't need to be a restriction to the entire executable, it's good enough if it limits the function calls to delete and remove in some way.

Checksum File Comparison Tool

So I am looking for a tool that can compare files in folders based on checksums (this is common, not hard to find); however, in my use case the files can live in fairly deep folder paths that change over time. I'm expected to compare them every few months and create a package of ONLY the files that differ. I don't care which folders the files are in: the same file can move between folders regularly, and filenames rarely change, only content (so checksums are a must).
My issue is that almost all of the tools I can find do care about the folder paths when comparing folders; I don't, and I actually want the paths ignored. I'd rather not develop anything, or at least only develop a small part of the process, to save time.
To be clear, the order in which I'm looking for things to happen is:
Program scans the directory from 1/1/2020 (A).
Program scans the directory from 4/1/2020 (B).
Program finds all files whose checksums in B don't exist in A and makes a new folder with the differences (C).
Any ideas?
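The three steps above could be sketched roughly like this in Python (SHA-256, the 1 MiB read chunk size, and the digest-prefixed output names are assumptions, not requirements):

```python
import hashlib
import os
import shutil

def content_hashes(root):
    """Map SHA-256 digest -> one path with that content, for every file under root.

    Paths are deliberately ignored for comparison purposes; only content matters.
    """
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            hashes.setdefault(h.hexdigest(), path)
    return hashes

def package_differences(old_root, new_root, out_dir):
    """Copy files from new_root whose content hash is absent from old_root into out_dir."""
    old = content_hashes(old_root)   # scan A
    new = content_hashes(new_root)   # scan B
    os.makedirs(out_dir, exist_ok=True)
    for digest, path in new.items():
        if digest not in old:
            # Prefix with part of the digest so identically named files don't collide.
            dest = os.path.join(out_dir, digest[:12] + "_" + os.path.basename(path))
            shutil.copy2(path, dest)   # build C
```

A file that merely moved between folders hashes the same and is skipped; only genuinely new or changed content lands in the output folder.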

How to remove a directory in Windows synchronously

RemoveDirectory() is documented as only marking a directory for deletion. I have an application where I have to be sure that the directory is actually deleted (because I create a new one with the same name, or delete directories recursively).
The first idea I had was to use GetFileAttributes() to test whether the directory still exists, or to use SHFileOperation() for the deletion. But in long test runs, both solutions eventually fail: CreateDirectory() fails.
Is there a solution for this?
This video by Douglas Niall at the 2015 CppCon covers the solution in detail, starting at about 7:30.
The idea is to first rename (move) the file or directory to another place (on the same volume), which happens synchronously, and then delete it, which happens asynchronously.
Consider this tree:
C:\Users\me\
    foo\
        bar\
            obsolete.txt
If you try to remove bar after deleting obsolete.txt, it may fail because there can be a delay before obsolete.txt is really deleted.
Instead suppose you first move obsolete.txt to C:\Users\me, and give it a temporary name to ensure it doesn't collide with another obsolete.txt in the directory. Maybe you prefix it with a GUID, like 2DCD7863-456C-4B6C-AD84-C4F5E8009D81_obsolete.txt. Now you can delete the file using that temporary name, and, even if there's a delay before it's really deleted, you know bar is truly empty. You can now delete bar or create a new obsolete.txt in bar without worries of a conflict.
To remove bar (a directory) on the way to deleting foo (the root of the tree you're trying to delete), you play the same game. Move it to the parent of the root, call RemoveDirectory, and then proceed along your merry way knowing that it will eventually be deleted.
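The rename-then-delete trick described above might be sketched like this in Python (which maps onto MoveFile/RemoveDirectory on Windows; the staging location and the GUID-style unique prefix are assumptions):

```python
import os
import shutil
import uuid

def remove_synchronously(path, staging_dir):
    """Rename path out of its parent first, then delete it.

    The rename happens synchronously on the same volume, so the original
    name is free immediately, even if the final deletion is lazy. A new
    file or directory with the old name can be created right away.
    """
    os.makedirs(staging_dir, exist_ok=True)
    # A unique prefix guarantees no collision with anything already staged.
    staged = os.path.join(staging_dir, uuid.uuid4().hex + "_" + os.path.basename(path))
    os.rename(path, staged)      # synchronous; fails cleanly if path is in use
    if os.path.isdir(staged):
        shutil.rmtree(staged)    # may complete lazily; the old name is already free
    else:
        os.remove(staged)
```

Applied to the tree above: stage obsolete.txt out of bar, delete it by its staged name, then stage bar itself out of foo and remove it the same way.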
Possible options:
Delete Directory and check for its existence afterwards
If no handle was open, it is deleted. If a handle is still open, there is another problem. Optionally, you can wait a few ms after each existence check until it disappears.
Delete all files inside the directory
You mentioned you want to recreate it, so just delete its contents. Doing this also shows you which files/folders inside the directory are still open.

How should I mark a folder as processed in a script?

A script is supposed to process files in a folder on a Windows machine and mark the folder as done once it finishes, so that it isn't picked up in the next round of processing.
My tendency is to let the script rename the folder to a different name, like adding "_done".
But on Windows, renaming a folder is not possible if some process has the folder or a file within it open. In this setup, there is a minor chance that some user may have the folder open.
Alternatively I could just write a stamp-file into that folder.
Are there better alternatives?
Is there a way to force the renaming anyway, in particular when it is on a shared drive or some NAS drive?
You have several options:
Put a token file of some sort in each processed folder and skip the folders that contain said file
Keep track of the last folder processed and only process newer ones, either by timestamp or, since they're numbered sequentially, by sequence number
Rename the folder
Since you've already stated that other users may already have the folder/files open, we can rule out #3.
In this situation I'm in favor of option #1, even though you'll end up with extra files. If someone needs to figure out which folders have already been processed, they have a quick, easy way to discern that with the naked eye, rather than hunting for a counter somewhere in a different file. It's also a bit less code to write, so fewer pieces to break.
Option #2 is good in this situation as well (I've used both depending on the circumstances), but I tend to favor it for things that a human wouldn't really need to care about or need to look for very often.
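Option #1 could be sketched along these lines (the token filename ".processed" is an assumption; any name your processing step never produces will do):

```python
import os

DONE_TOKEN = ".processed"  # token filename is an assumption, not a convention

def unprocessed_folders(root):
    """Yield subfolders of root that do not yet contain the token file."""
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder) and not os.path.exists(os.path.join(folder, DONE_TOKEN)):
            yield folder

def mark_done(folder):
    """Drop the token file so the folder is skipped in the next round."""
    with open(os.path.join(folder, DONE_TOKEN), "w") as f:
        f.write("")  # the file's existence is all that matters
```

Writing a token inside the folder needs no rename, so it works even while another user has the folder open on a shared or NAS drive.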

Windows remembering lower case filename, how to force it to forget?

Here's my problem:
I've got source files I'm publishing (.dita files, publishing using Oxygen) and I need to change capitalization on a lot of them, along with folders and subfolders that they're in. Everything is in source control using SVN.
When I change only an initial cap, say, and leave everything about the filename the same otherwise, Windows "remembers" the lower case name, and that's what gets published, even though the source name is now upper case.
I can even search for the filename, for example Foobar.dita, and the search results will show me "foobar.dita". When I go to that location directly in the file explorer, the file is named Foobar.dita. It's not a duplicate, it's the same file.
What I understand from reading up on this is that Windows isn't case-sensitive, but it "remembers" the filename as one case or the other. So my question is, if I can't force Windows to be case-sensitive, can I somehow force Windows to forget the filename? I've tried deleting it from both Windows and SVN, and recreating it, but it still gets read as lower case when it's initial cap.
If I rename the file, even slightly, it solves the problem, but many of the filenames are just what they need to be, and it's a lot more work to rename them (to think of another good filename) than just to change to initial cap.
UPDATE:
Here's where I read about the "remembering" idea, in response two, the one with 7 recommendations.
To be explicit: I'm not updating from SVN and thus turning it back to lower case, it's upper case in SVN. It appears upper case in the Windows folder.
UPDATE II: This seems to be what I'm up against:
http://support.microsoft.com/kb/100625
In NTFS, you can create unique file names, stored in the same directory, that differ only in case. For example, the following filenames can coexist in one directory on an NTFS volume:
CASE.TXT
case.txt
case.TXT
However, if you attempt to open one of these files in a Win32 application, such as Notepad, you would only have access to one of the files, regardless of the case of the filename you type in the Open File dialog box.
So it sounds like the only answer is to rename the files, not just change their case.
