Bash script to move folders based on filesize changes? - bash

I have some automated downloads running in a proprietary Linux distro.
They go to a temp scratch disk. I want to move them to the main RAID array when they're finished. The best way I can see to do this is to check whether each folder's contents have changed in the last minute; if not, it has probably finished downloading and can then be moved.
There could be hundreds of folders in this location, or just one, and it's all going to the same place. What's the best way to write this?
I can get a list of folder sizes with
du -h directory/name
The folders can contain multiple files, anywhere from 1.5 MB to 10 GB.
Temp Loc: /volume2/4TBScratch/Processing
Dest Loc when complete: /volume1/S/00 Landing
EDIT:
Using this:
find /volume2/4TBScratch/Processing -mindepth 1 -type d -not -mmin +10 -exec mv "{}" "/volume1/S/00 Landing" \;
find: `/volume2/4TBScratch/Processing/test': No such file or directory
4.3#
Yet it DOES move the relevant folders and all their files. But the error worries me that something might go wrong in the future. Is it because find runs the same move command for each file or folder in the root folder, but since everything is moved on the first iteration, it can't find the path on the next ones?
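For reference, the error appears because find moves each directory with -exec mv and then still tries to descend into it at its old path. Keeping find at the top level avoids that; a minimal sketch based on the command above, adding -maxdepth 1:
# -maxdepth 1 stops find descending into a directory mv has already moved away
find /volume2/4TBScratch/Processing -mindepth 1 -maxdepth 1 -type d -not -mmin +10 -exec mv "{}" "/volume1/S/00 Landing" \;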
EDIT2:
Using Rsync
4.3# find /volume2/4TBScratch/Processing -mindepth 1 -type d -not -mmin +10 -exec rsync --remove-source-files "{}" "/volume1/S/00 Landing" \;
skipping directory newtest
skipping directory erw
RESOLVED: EDIT3
Resolved with the help in the comments below. Final script looks like this:
find /volume2/4TBScratch/Processing -mindepth 1 -type d -not -mmin +10 -exec rsync -a --remove-source-files "{}" "/volume1/S/00 Landing" \;
find /volume2/4TBScratch/Processing -depth -type d -empty -delete
rsync moves the folders and their files but leaves the (now empty) source directories behind;
the next command finds those empty folders and removes them.
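For convenience, both commands can be wrapped into one script (a sketch using the same paths as above):
#!/bin/bash
# Move finished download folders to the array, then clean up the empty source folders
src="/volume2/4TBScratch/Processing"
dest="/volume1/S/00 Landing"
find "$src" -mindepth 1 -type d -not -mmin +10 -exec rsync -a --remove-source-files "{}" "$dest" \;
find "$src" -depth -type d -empty -delete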
Thanks all!

You can use GNU find with the -size test to detect files/folders under a certain size, and the -exec action to mv them to the destination directory. The syntax is
find /volume2/4TBScratch/Processing -maxdepth 1 -type d -size -10G -exec mv "{}" "/volume1/S/00 Landing" \;
Using rsync
find /volume2/4TBScratch/Processing -maxdepth 1 -type d -size -10G -exec rsync -a --remove-source-files "{}" "/volume1/S/00 Landing" \;
The size is given with a - sign to indicate less than the mentioned size, which in this case is 10 GB. A note on each of the flags used:
-type d -> identifies only the folders in the source path.
-maxdepth 1 -> looks only at the immediate source directory, without recursing.
-exec -> executes the command following it.
Alternatively, if you want to find files last modified within a certain number of minutes, find has the -mmin option. E.g. -mmin -5 matches files modified within the last five minutes.
So I suggest adding it to the command with whatever value x you need, and checking that the expected directories are listed; then you can add the -exec option for moving the directories:
find /volume2/4TBScratch/Processing -maxdepth 1 -type d -mmin -2 -size -10G
Refer to the GNU documentation on finding files according to size for details on how this works.
Note: the double quotes ("") prevent the shell from splitting names that contain spaces.
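Putting the pieces together, here is the listing command above with the -exec move added back in (a sketch; tune the -mmin value to your downloads):
find /volume2/4TBScratch/Processing -maxdepth 1 -type d -mmin -2 -size -10G -exec mv "{}" "/volume1/S/00 Landing" \;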

Related

Delete files that are 5 days old using bash script

I currently use a command that searches a directory and deletes 5 day old files.
find /path/to/files* -mtime +5 -exec rm {} \;
I run it from the command line and it works fine. But when I put it in a .sh file it says find /path/to/files*: No such file or directory.
There are only two lines in the shell script:
#! /usr/bin/env bash
find /path/to/files* -mtime +5 -exec rm {} \;
How can I rewrite the script so that it works?
The error happens if there are currently no files matching the wildcard, presumably because none have been created since you deleted them previously.
The argument to find should be the directory containing the files, not the filenames themselves, since find will automatically search the directory. If you want to restrict the filenames, use the -name option to specify the wildcard.
And if you don't want to go into subdirectories, use the -maxdepth option.
find /path/to -maxdepth 1 -type f -name 'files*' -mtime +5 -delete
This works:
#! /usr/bin/env bash
find /home/ubuntu/directory -type f -name 'filename*' -mtime +4 -delete
Here is an example:
find /home/ubuntu/processed -type f -name 'output*' -mtime +4 -delete

Bash script for removing specific file from certain subdirectories

On a unix server, I'm trying to figure out how to remove a file, say "example.xls", from any subdirectories that start with v0 ("v0*").
I have tried something like:
find . -name "v0*" -type d -exec find . -name "example.xls" -type f -exec rm {} \;
But I get errors. I have a solution, but it works too well, i.e. it will delete the file in any subdirectory, regardless of its name:
find . -type f -name "example.xls" -exec rm -f {} \;
Any ideas?
You will probably have to do it in two steps -- i.e. first find the directories, and then the files. You can use xargs to make it a single pipeline, like
find . -name "v0*" -type d | \
xargs -l -I[] \
find [] -name "example.xls" -type f -exec rm {} \;
What it does is first generate the list of matching directory names, then let xargs call the second find once per name, locating the file within that directory.
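If the directory names can contain spaces, a null-delimited variant of the same two-step idea is safer (a sketch assuming GNU find and xargs):
# -print0 and -0 separate names with NUL bytes, so spaces in directory names survive
find . -name "v0*" -type d -print0 | \
xargs -0 -I[] \
find [] -name "example.xls" -type f -delete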
Try:
find -path '*/v0*/example.xls' -delete
This matches only files named example.xls which, somewhere in their path, have a parent directory whose name starts with v0.
Note that since find offers -delete as an action, it is not necessary to invoke the external executable rm.
Example
Consider this directory structure:
$ find .
.
./a
./a/example.xls
./a/v0
./a/v0/b
./a/v0/b/example.xls
./a/v0/example.xls
We can identify files example.xls that have one of their parent directories named v0*:
$ find -path '*/v0*/example.xls'
./a/v0/b/example.xls
./a/v0/example.xls
To delete those files:
find -path '*/v0*/example.xls' -delete
Alternative: find only those files directly under directory v0*
find -regex '.*/v0[^/]*/example.xls'
Using the above directory structure, this approach returns one file:
$ find -regex '.*/v0[^/]*/example.xls'
./a/v0/example.xls
To delete such files:
find -regex '.*/v0[^/]*/example.xls' -delete
Compatibility
Although my tests were performed with GNU find, -path is required by POSIX, and -regex, while not part of POSIX, is also supported by BSD find (as on OSX).

Deleting files older than 10 days in wildcard directory loop

I would like to delete old files from multiple directories, but there is a wildcard in one of the path components. So I'm trying to loop through each of those directories without specifying each one. I think I'm almost there, but I'm not sure how to cd into each specific directory to delete the relevant files.
#! /bin/bash
DELETE_SEARCH_DIR=/apps/super/userprojects/elasticsearch/v131/node*/elasticsearch-1.3.1/logs/
for entry in `ls $DELETE_SEARCH_DIR`; do
find $path -name "*super*" -type f -mtime +10 -print -delete
#find . -type f -name $entry -exec rm -f {} \;
done
Any ideas on how to get into the specific directory and apply the delete?
find can search in multiple directories. You can do it like this:
DELETE_SEARCH_DIR=/apps/super/userprojects/elasticsearch/v131/node*/elasticsearch-1.3.1/logs
find $DELETE_SEARCH_DIR -type f -name '*super*' -mtime +10 -print -delete
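Note that $DELETE_SEARCH_DIR must stay unquoted here so the shell expands the node* glob before find runs. A bash array makes that expansion explicit (a sketch):
#!/bin/bash
# The glob expands here, producing one array element per matching logs directory
search_dirs=(/apps/super/userprojects/elasticsearch/v131/node*/elasticsearch-1.3.1/logs)
find "${search_dirs[@]}" -type f -name '*super*' -mtime +10 -print -delete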

Trying to remove a file and its parent directories

I've got a script that finds files within folders older than 30 days:
find /my/path/*/README.txt -mtime +30
that'll then produce a result such as
/my/path/jobs1/README.txt
/my/path/job2/README.txt
/my/path/job3/README.txt
Now the part I'm stuck on: I'd like to remove the folders and their files that are older than 30 days.
find /my/path/*/README.txt -mtime +30 -exec rm -r {} \;
doesn't seem to work; it only removes the README.txt file.
So ideally I'd like to just remove /job1, /job2, /job3 and any nested files.
Can anyone point me in the right direction ?
This would be a safer way:
find /my/path/ -mindepth 2 -maxdepth 2 -type f -name 'README.txt' -mtime +30 -printf '%h\n' | xargs echo rm -r
Remove the echo once the output looks correct.
Here printf '%h\n' prints the directory containing each matched file, and xargs passes those directories on to rm -r.
You can just run the following command in order to recursively remove directories modified more than 30 days ago.
find /my/path/ -type d -mtime +30 -exec rm -rf {} \;
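Be aware that rm -rf deletes directories find still intends to descend into, which produces harmless "No such file or directory" warnings. Adding -prune avoids them, and -mindepth 1 protects /my/path/ itself (a sketch):
# -prune stops find descending into a directory rm has just removed
find /my/path/ -mindepth 1 -type d -mtime +30 -prune -exec rm -rf {} \;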

Copy changed files, create a changeset and maintain directory structure

I want to copy just the files I've created/edited today into a separate directory "changeset", whilst maintaining their directory structure.
I came up with the following script
cd ./myproject/
find ./* -mtime -1 -daystart -exec cp {} ../changeset/{} \;
The drawback of the above is that directories aren't created, so the copy throws an error.
I've manually gone into ../changeset/ and created the folder structure until the command ran without errors, but that's a little tedious.
Is there a simple solution to this?
find * -mtime -1 -daystart -print0 | cpio -pd0 ../changeset
cpio is an old, oddball archival program that is occasionally the best tool for the job. With -p it copies files named on stdin to another directory. With -d it creates directories as needed.
I've found another solution which isn't as elegant as John's, but which isn't reliant on cpio, which I don't have.
cd ./myproject/
# Create all directories
find ./* -type d -exec mkdir ../changeset/{} \;
# Copy files
find ./* -mtime -1 -daystart -exec cp {} ../changeset/{} \;
# Delete empty directories; run this several times, since a parent only becomes removable after its empty children are deleted
find ../changeset/ -type d -empty -exec rmdir {} \;
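For what it's worth, GNU cp's --parents flag can replace both the mkdir and cp steps above in a single pass (a sketch assuming GNU coreutils):
cd ./myproject/
mkdir -p ../changeset
# --parents recreates each file's directory path under ../changeset/
find . -mtime -1 -daystart -type f -exec cp --parents {} ../changeset/ \;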
