How should I mark a folder as processed in a script? - windows

A script shall process files in a folder on a Windows machine and mark it as done once it is finished in order to not pick it up in the next round of processing.
My tendency is to let the script rename the folder to a different name, like adding "_done".
But on Windows, renaming a folder is not possible if some process has the folder or a file within it open. In this setup, there is a minor chance that some user may have the folder open.
Alternatively I could just write a stamp-file into that folder.
Are there better alternatives?
Is there a way to force the renaming anyway, in particular when it is on a shared drive or some NAS drive?

You have several options:
Put a token file of some sort in each processed folder and skip the folders that contain said file
Keep track of the last folder processed and only process ones newer (Either by time stamp or (since they're numbered sequentially), by sequence number)
Rename the folder
Since you've already stated that other users may already have the folder/files open, we can rule out #3.
In this situation, I'm in favor of option #1 even though you'll end up with extra files, if someone needs to try and figure out which folders have already been processed, they have a quick, easy method of discerning that with the naked eye, rather than trying to find a counter somewhere in a different file. It's also a bit less code to write, so less pieces to break.
Option #2 is good in this situation as well (I've used both depending on the circumstances), but I tend to favor it for things that a human wouldn't really need to care about or need to look for very often.

Related

Find next file (but not FindNextFile)

If a user opens a file in a program (for example using GetOpenFileNameW, DragQueryFileW, command line argument, or whatever else to get the path, and a subsequent CreateFileW call), is there a way to find the next file in the parent directory of the opened file?
The obvious solution is to cycle through the results from FindNextFileW or NtQueryDirectoryFileEx until the opened file is encountered, and just open the next file.
However, this seems undesireable.
First, because these functions use paths (instead of for example a handle), the original file is decoupled from the search algorithm, so the original file might not even get encountered in that search. This is not much of an issue (as failing in this case is the expected outcome), and it probably could be resolved with (temporarly) changing the sharing mode, using LockFile or similar (though I would like to avoid that).
Second, this cycling search would have to be done every time, because the contents of the directory might have changed (retaining hFindFile does not work, because only FindFirstFileW calls NtQueryDirectoryFileEx and enumerates the contents of the directory). Which seems like unnecessary work and might even affect performance (for example if the directory contains a lot of files).
In theory any file system has some way of enumerating the files in a directory. Meaning there is some ordered data structure of the files' metadata. And getting the next file should only involve going back from the existing file handle to that file's entry, and then getting the next entry from that data structure. So there does not seem to be a fundamental reason why this cannot be done more sanely.
I thought maybe there exist a better way to do this somewhere in WinAPI...
Same question for finding the previous file.

Is there a way to limit my executable's ability to delete to only files it has created?

I'm on Windows writing a C++ executable that deletes and replaces some files in a directory it creates during an earlier run session. Maybe I'm a little panicky, but since my directory and file arguments for the deletions are generated by parsing an input file's path, I worry about the parse throwing out a much higher or different directory due to an oversight and systematically deleting unrelated files unintentionally.
Is there a way to limit my executable's reign to only include write/delete access to files it has created during earlier run sessions, while retaining read access to everything else? Or at least provide a little extra peace of mind that, even if I really mis-speak my strings to DeleteFileA() and RemoveDirectoryA() I'll avoid causing catastrophic damage?
It doesn't need to be a restriction to the entire executable, it's good enough if it limits the function calls to delete and remove in some way.

Windows remembering lower case filename, how to force it to forget?

Here's my problem:
I've got source files I'm publishing (.dita files, publishing using Oxygen) and I need to change capitalization on a lot of them, along with folders and subfolders that they're in. Everything is in source control using SVN.
When I change only an initial cap, say, and leave everything about the filename the same otherwise, Windows "remembers" the lower case name, and that's what gets published, even though the source name is now upper case.
I can even search for the filename, for example Foobar.dita, and the search results will show me "foobar.dita". When I go to that location directly in the file explorer, the file is named Foobar.dita. It's not a duplicate, it's the same file.
What I understand from reading up on this is that Windows isn't case-sensitive, but it "remembers" the filename as one case or the other. So my question is, if I can't force Windows to be case-sensitive, can I somehow force Windows to forget the filename? I've tried deleting it from both Windows and SVN, and recreating it, but it still gets read as lower case when it's initial cap.
If I rename the file, even slightly, it solves the problem, but many of the filenames are just what they need to be, and it's a lot more work to rename them (to think of another good filename) than just to change to initial cap.
UPDATE:
Here's where I read about about the "remembering" idea, in response two, the one with 7 recommendations.
To be explicit: I'm not updating from SVN and thus turning it back to lower case, it's upper case in SVN. It appears upper case in the Windows folder.
UPDATE II: This seems to be what I'm up against:
http://support.microsoft.com/kb/100625
In NTFS, you can create unique file names, stored in the same directory, that differ only in case. For example, the following filenames can coexist in one directory on an NTFS volume:
CASE.TXT
case.txt
case.TXT
However, if you attempt to open one of these files in a Win32 application, such as Notepad, you would only have access to one of the files, regardless of the case of the filename you type in the Open File dialog box.
So it sounds like the only answer is rename the files, not just change case.

strategies for backing up packages on macosx

I am writing a program that synchronizes files across file systems much like rsync but I'm stuck when it comes to handling packages. These are folders that are identified by the system as containing a coherent set of files. Pages and Numbers can use packages rather than monolithic files, and applications are actually packages for example. My problem is that I want to keep the most recent version and also keep a backup copy. As far as I can see I have two options -
I can just treat the whole thing as a regular folder and handle the contents entry by entry.
I can look at all the modification dates of all the contents and keep the complete folder tree for the one that has the most recently modified contents.
I was going for (2) and then I found that the iPhoto library is actually stored as a package and that would mean I would copy the whole library (10s, or even 100s of gigabytes) even if only one photograph was altered.
My worry with (1) is that handling the content files individually might break things. I haven't really come up with a good solution that will guarantee that the package will work and won't involved unnecessarily huge backup files in some cases. If it is just iPhoto then I can probably put in a special case, or perhaps change strategy if the package is bigger than some user specified limit.
Packages are surprisingly mysterious, and what the system treats as a package does not seem to be just a matter of setting an extended attribute on a folder.
It depends on how you treat the "backup" version. Do you keep two versions of each file (the current and first previous), or two versions of the sync snapshot (i.e. if a file hasn't changed between the last two syncs, you only store one version)?
If it's two versions of the sync, packages shouldn't be a big problem -- just provide a way to restore the "backup" version, which if necessary splices together the changed files from the "backup" with the unchanged files from the current sync. There are some things to watch out for, though: make sure you correctly handle files that're deleted or added between the two snapshots.
If you're storing two versions of each file, things are much more complicated -- you need some way to record which versions of the files within the package "go together". I think in this case I'd be tempted to only store backup versions of files within the package from the last time something within the package changed. So, for example, say you sync a package called preso.key. On the second sync, preso.key/index.apxl.gz and preso.key/splash.png are modified, so the old version of those two files get stored in the backup. On the third sync, preso.key/index.apxl.gz is modified again, so you store a new backup version of it and remove the backup version of preso.key/splash.png.
BTW, another way to save space would be hard-linking. If you want to store two "full" versions of a big package without without wasting space, just store one copy of each unchanged file and hard-link it into both backups.

command line wisdom for 2 panel file manager user

Want to upgrade my file management productivity by replacing 2 panel file manager with command line (bash or cygwin). Can commandline give same speed? Please advise a guru way of how to do e.g. copy of some file in directory A to the directory B. Is it heavy use of pushd/popd? Or creation of links to most often used directories? What are the best practices and a day-to-day routine to manage files of a command line master?
Can commandline give same speed?
My experience is that commandline copying is significantly faster (especially in the Windows environment). Of course the basic laws of physics still apply, a file that is 1000 times bigger than a file that copies in 1 second will still take 1000 seconds to copy.
..(howto) copy of some file in directory A to the directory B.
Because I often have 5-10 projects that use similar directory structures, I set up variables for each subdir using a naming convention :
project=NewMatch
NM_scripts=${project}/scripts
NM_data=${project}/data
NM_logs=${project}/logs
NM_cfg=${project}/cfg
proj2=AlternateMatch
altM_scripts=${proj2}/scripts
altM_data=${proj2}/data
altM_logs=${proj2}/logs
altM_cfg=${proj2}/cfg
You can make this sort of thing as spartan or baroque as needed to match your theory of living/programming.
Then you can easily copy the cfg from 1 project to another
cp -p $NM_cfg/*.cfg ${altM_cfg}
Is it heavy use of pushd/popd?
Some people seem to really like that. You can try it and see what you thing.
Or creation of links to most often used directories?
Links to dirs are, in my experience used more for software development where a source code is expecting a certain set of dir names, and your installation has different names. Then making links to supply the dir paths expected is helpful. For production data, is just one more thing that can get messed up, or blow up. That's not always true, maybe you'll have a really good reason to have links, but I wouldn't start out that way, just because it is possible to do.
What are the best practices and a day-to-day routine to manage files of a command line master?
( Per above, use standardized directory structure for all projects.
Have scripts save any small files to a directory your dept keeps in the /tmp dir, .
i.e /tmp/MyDeptsTmpFile (named to fit your local conventions) )
It depends. If you're talking about data and logfiles, dated fileNames can save you a lot of time. I recommend dateFmts like YYYYMMDD(_HHMMSS) if you need the extra resolution.
Dated logfiles are very handy, when a current process seems like it is taking a long time, you can look at the log file from a week ago and quantify exactly how long this process took, a week, month, 6 months (up to how much space you can afford). LogFiles should also capture all STDERR messages, so you never have to re-run a bombed program just to see what the error message was.
This is Linux/Unix you're using, right? Read the man page for the cp cmd installed on your machine. I recommend using an alias like alias CP='/bin/cp -pi' so you always copy a file with the same permissions and with the original files' time stamp. Then it is easy to use /bin/ls -ltr to see a sorted list of files with the most recent files showing up at the bottom of the list. (No need to scroll back to the top, when you sort by time,reverse). Also the '-i' option will warn you that you are going to overwrite a file, and this has saved me more than a couple of times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.

Resources