I have been tasked with restructuring the directory of files relating to employees. As it is now, each employee has their own folder and all the files are grouped into 3 subfolders, divided by year. I'd like to sort the files in each of the folders into 4 other subfolders that are organized by subject matter. Is there any way to automate the creation of folders and transferring of files into these folders?
If this is not sufficient information about my issue, please say so and I will attempt to provide a more accurate explanation.
You could use PowerShell or any number of scripting languages/tools (Perl, Python). The trick may be knowing which target folder each of the files should go into. If you can determine that from the name of the file or the file type it will be trivial, but if there is some other criterion it may be harder.
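If the subject really can be inferred from the file name or extension, a small script can handle both the folder creation and the moving. Here is a minimal Python sketch of that idea; the root path, the four subject names, and the extension-to-subject mapping are made-up placeholders that you would replace with whatever rule actually identifies a file's subject:

# Minimal sketch, not a drop-in solution: paths, subject names, and the
# extension-to-subject rule below are hypothetical placeholders.
import shutil
from pathlib import Path

ROOT = Path(r"\\server\Employees")                       # hypothetical root of the employee folders
SUBJECTS = ["Payroll", "Reviews", "Training", "Misc"]    # your 4 subject folders
RULES = {".xlsx": "Payroll", ".docx": "Reviews", ".pdf": "Training"}  # example rule by extension

for year_dir in ROOT.glob("*/*"):        # each employee folder / each year subfolder
    if not year_dir.is_dir():
        continue
    for name in SUBJECTS:                # create the 4 subject subfolders
        (year_dir / name).mkdir(exist_ok=True)
    for f in year_dir.iterdir():         # move every file into its subject folder
        if f.is_file():
            target = RULES.get(f.suffix.lower(), "Misc")
            shutil.move(str(f), str(year_dir / target / f.name))

The same structure translates directly to PowerShell if you prefer to stay native to Windows; the hard part is the rule that maps a file to a subject, not the moving itself.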
So I am looking for a tool that can compare files in folders based on checksums (this is common and not hard to find); however, my use case is that the files can live in pretty deep folder paths that can change. I am expected to compare them every few months and ONLY create a package of the files that differ. I don't care what folders the files are in: the same file can move between folders regularly, and files don't change names much, only content (so checksums are a must).
My issue is that almost all of the tools I can find do care about the folder paths when they compare folders; I don't, and I actually want the tool to ignore the folder paths. I would rather not develop anything, or at least only have to develop a small part of the process, to save time.
To be clear, the order I am looking for things to happen in is:
1. The program scans the directory from 1/1/2020 (A).
2. The program scans the directory from 4/1/2020 (B).
3. It finds all files whose checksums exist in B but not in A and creates a new folder with the differences (C).
Any ideas?
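In case a starting point helps, here is a rough Python sketch of steps 1-3 above, hashing every file with SHA-256 and ignoring folder paths entirely; the snapshot and output paths are placeholders:

# Rough sketch only: walks two snapshots, hashes every file, and copies
# files whose checksum appears in B but not in A into C.
import hashlib, shutil
from pathlib import Path

def checksums(root):
    sums = {}
    for f in Path(root).rglob("*"):
        if f.is_file():
            # fine for a sketch; hash in chunks instead for very large files
            sums[hashlib.sha256(f.read_bytes()).hexdigest()] = f
    return sums

a = checksums(r"D:\snapshot_2020-01-01")   # A
b = checksums(r"D:\snapshot_2020-04-01")   # B
out = Path(r"D:\diff_package")             # C
out.mkdir(exist_ok=True)

for digest, f in b.items():
    if digest not in a:                    # content not present anywhere in A
        shutil.copy2(f, out / f.name)      # folder structure deliberately ignored

Note that two different files with the same name would collide in C, so you may want to prefix the copied name with part of the checksum.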
A script shall process files in a folder on a Windows machine and mark the folder as done once it is finished, so that it is not picked up in the next round of processing.
My tendency is to let the script rename the folder to a different name, like adding "_done".
But on Windows, renaming a folder is not possible if some process has the folder or a file within it open. In this setup, there is a minor chance that some user may have the folder open.
Alternatively I could just write a stamp-file into that folder.
Are there better alternatives?
Is there a way to force the renaming anyway, in particular when it is on a shared drive or some NAS drive?
You have several options:
Put a token file of some sort in each processed folder and skip the folders that contain said file
Keep track of the last folder processed and only process newer ones (either by time stamp or, since they're numbered sequentially, by sequence number)
Rename the folder
Since you've already stated that other users may have the folder/files open, we can rule out #3.
In this situation, I'm in favor of option #1 even though you'll end up with extra files. If someone needs to figure out which folders have already been processed, they have a quick, easy way of discerning that with the naked eye, rather than trying to find a counter somewhere in a different file. It's also a bit less code to write, so there are fewer pieces to break.
Option #2 is good in this situation as well (I've used both depending on the circumstances), but I tend to favor it for things that a human wouldn't really need to care about or need to look for very often.
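For what it's worth, option #1 only needs a few lines. A hedged Python sketch, where the share path, the token file name, and the process() body are placeholders for your real setup:

# Sketch of option #1: skip any folder that already contains a ".done" token
# file, process the rest, and drop the token when finished.
from pathlib import Path

ROOT = Path(r"\\nas\incoming")        # hypothetical share being watched
TOKEN = ".done"                       # example token file name

def process(folder):
    ...                               # whatever your per-folder work is

for folder in ROOT.iterdir():
    if folder.is_dir() and not (folder / TOKEN).exists():
        process(folder)
        (folder / TOKEN).touch()      # mark as processed; no rename needed

Writing the token file works even while another user has the folder open, which is exactly the case that breaks the rename approach.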
How can I find a completely random folder on a user's file system, test that I have write permission, and then create a file in that folder?
I am planning to write a little "treasure hunt" puzzle application where clues are randomly distributed throughout your system and you have to find them.
I have no idea how to begin picking a random folder though.
I still say this is a bad idea... but to answer your question:
You can use the Dir class: start in /, grab a list of all directories, pick one randomly and traverse into it, check whether you can write there, and then repeat the process, making note of the directories you have write access to.
It's not going to be quick.
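If you end up doing this in Python rather than with the Dir class mentioned above, a rough sketch of the same random walk might look like the following; treat it as an illustration only, and expect permission errors and dead ends:

# Walk downward from the root, picking a random subdirectory at each step,
# and remember the writable directories seen along the way.
import os, random

current = "/"                     # or a drive root on Windows, e.g. "C:\\"
writable = []
for _ in range(20):               # cap the walk so it always terminates
    if os.access(current, os.W_OK):
        writable.append(current)
    try:
        subdirs = [d for d in os.listdir(current)
                   if os.path.isdir(os.path.join(current, d))]
    except PermissionError:
        break                     # can't even list this folder; stop here
    if not subdirs:
        break                     # dead end
    current = os.path.join(current, random.choice(subdirs))

print(random.choice(writable) if writable else "no writable folder found")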
I want to upgrade my file-management productivity by replacing a two-panel file manager with the command line (bash or Cygwin). Can the command line give the same speed? Please advise on a guru's way of doing, e.g., a copy of some file in directory A to directory B. Is it heavy use of pushd/popd? Or creation of links to most often used directories? What are the best practices and day-to-day routine for managing files as a command-line master?
Can the command line give the same speed?
My experience is that command-line copying is significantly faster (especially in the Windows environment). Of course, the basic laws of physics still apply: a file that is 1000 times bigger than a file that copies in 1 second will still take 1000 seconds to copy.
...(how to) copy some file in directory A to directory B.
Because I often have 5-10 projects that use similar directory structures, I set up variables for each subdir using a naming convention:
project=NewMatch
NM_scripts=${project}/scripts
NM_data=${project}/data
NM_logs=${project}/logs
NM_cfg=${project}/cfg
proj2=AlternateMatch
altM_scripts=${proj2}/scripts
altM_data=${proj2}/data
altM_logs=${proj2}/logs
altM_cfg=${proj2}/cfg
You can make this sort of thing as spartan or baroque as needed to match your theory of living/programming.
Then you can easily copy the cfg from one project to another:
cp -p $NM_cfg/*.cfg ${altM_cfg}
Is it heavy use of pushd/popd?
Some people seem to really like that. You can try it and see what you think.
Or creation of links to most often used directories?
Links to dirs are, in my experience, used more for software development, where source code expects a certain set of dir names and your installation has different names; making links to supply the expected dir paths is helpful there. For production data, a link is just one more thing that can get messed up or blow up. That's not always true; maybe you'll have a really good reason to have links, but I wouldn't start out that way just because it is possible to do.
What are the best practices and a day-to-day routine to manage files of a command line master?
(Per the above, use a standardized directory structure for all projects, and have scripts save any small files to a directory your dept keeps in the /tmp dir, e.g. /tmp/MyDeptsTmpFile, named to fit your local conventions.)
It depends. If you're talking about data and logfiles, dated file names can save you a lot of time. I recommend date formats like YYYYMMDD, or YYYYMMDD_HHMMSS if you need the extra resolution.
Dated logfiles are very handy: when a current process seems to be taking a long time, you can look at the log file from a week, a month, or six months ago (up to however much space you can afford) and quantify exactly how long this process took then. Logfiles should also capture all STDERR messages, so you never have to re-run a bombed program just to see what the error message was.
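If part of this ends up scripted in Python rather than pure shell, the same dated-logfile idea is a one-liner for the name plus a redirect for STDERR; the job command below is only a placeholder:

# Build a YYYYMMDD_HHMMSS logfile name and capture stdout and stderr in it,
# so error messages survive the run.
import subprocess
from datetime import datetime

stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
logfile = f"process_{stamp}.log"              # e.g. process_20200401_142659.log

with open(logfile, "w") as log:
    subprocess.run(["./nightly_job.sh"],      # placeholder for your real command
                   stdout=log, stderr=subprocess.STDOUT)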
This is Linux/Unix you're using, right? Read the man page for the cp command installed on your machine. I recommend using an alias like alias CP='/bin/cp -pi' so you always copy a file with the same permissions and the original file's time stamp. Then it is easy to use /bin/ls -ltr to see a sorted list of files with the most recent ones at the bottom (no need to scroll back to the top when you sort by time, reversed). Also, the '-i' option will warn you before you overwrite a file, and that has saved me more than a couple of times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
We have a project that consists of a large archive of image files...
We try to split them into sub-folders within the main archive folder.
Each sub-folder contains up to 2500 files in it.
For example:
C:\Archive
C:\Archive\Animals\
C:\Archive\Animals\001 - 2500 files...
C:\Archive\Animals\002 - 2300 files..
C:\Archive\Politics\
C:\Archive\Politics\001 - 2000 files...
C:\Archive\Politics\002 - 2100 files...
Etc. What would be the best way of storing files in such a way under Windows, and why exactly, please?
Later on, the files have their EXIF metadata extracted and indexed for keywords, to be added into a Lucene index... (this is done by a Windows service that lives on the server)
We have an application where we try to make sure we don't store more than around 1000 files in a directory. Under Windows at least, we noticed extreme degradation in performance beyond this number. A folder can theoretically hold up to 4,294,967,295 files in Windows 7, but because the OS scans the folder, lookups and listings degrade very quickly as you add many more files. Once we got to 100,000 files in a folder, it was almost completely unusable.
I'd recommend breaking down the animals even further, perhaps by first letter of name. Same with the other files. This will let you separate things out more so you won't have to worry about the directory performance. Best advice I can give is to perform some stress tests on your system to see where the performance starts to tail off once you have enough files in a directory. Just be aware you'll need several thousand files to test this out.
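As an illustration of the "break it down by first letter" suggestion, here is a small Python sketch that moves each file in one category folder into a subfolder named after its first character. The paths follow the example layout from the question and are not meant as a standard:

# Hedged sketch: bucket the files in C:\Archive\Animals by first character
# to keep each directory well under the performance cliff.
import shutil
from pathlib import Path

src = Path(r"C:\Archive\Animals")
for f in src.iterdir():
    if f.is_file():
        bucket = src / f.name[0].upper()   # e.g. C:\Archive\Animals\A
        bucket.mkdir(exist_ok=True)
        shutil.move(str(f), str(bucket / f.name))

You can run the same loop over each category folder, and if a single letter still collects too many files, split again on the second character.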