Shell script to remove and compress files from 2 paths

Shell script to remove and compress files from 2 paths - shell

Need to delete log files older than 60 days and compress it if files are greater than 30 days and lesser than 60 days. I have to remove and compress files from 2 paths as mentioned in PURGE_DIR_PATH.
Also have to take the output of find command and redirect it to log file. Basically need to create an entry in the log file whenever a file is deleted. How can i achieve this?
I have to also validate if the directory path is valid or not and put a message in log file if the directory is valid or not
I have written a shell script but doesn't cover all the scenarios. This is my first shell script and need some help. How do i keep just one variable log_retention and
use it to compress files as the condition would be >30 days and <60 days? how do I validate if directories is valid or not? is my IF condition checking that?
Please let me know.
#!/bin/bash
LOG_RETENTION=60
WEB_HOME="/web/local/artifacts"
ENG_DIR="$(dirname $0)"
PURGE_DIR_PATH="$(WEB_HOME)/backup/csvs $(WEB_HOME)/home/archives"
if[[ -d /PURGE_DIR_PATH]] then echo "/PURGE_DIR_PATH exists on your filesystem." fi
for dir_name in ${PURGE_DIR_PATH}
do
echo $PURGE_DIR_PATH
find ${dir_name} -type f -name "*.csv" -mtime +${LOG_RETENTION} -exec ls -l {} \;
find ${dir_name} -type f -name "*.csv" -mtime +${LOG_RETENTION} -exec rm {} \;
done

Off the top of my head -
#!/bin/bash
CSV_DELETE=60
CSV_COMPRESS=30
WEB_HOME="/web/local/artifacts"
PURGE_DIR_PATH=( "$(WEB_HOME)/backup/csvs" "$(WEB_HOME)/home/archives" ) # array, not single string
# eliminate the oldest
find "${PURGE_DIR_PATH[#]}" -type f -name "*.csv" -mtime +${CSV_DELETE} |
xargs -P 100 rm -f # run 100 in bg parallel
# compress the old-enough after the oldest are gone
find "${PURGE_DIR_PATH[#]}" -type f -name "*.csv" -mtime +${CSV_COMPRESS} |
xargs -P 100 gzip # run 100 in bg parallel
Shouldn't need loops.

Related

A script that iterates over all files in folder

There is a script on a server that I need to run over all the files in a folder. To run this script over one file I use this shell script:
for input in /home/arashsa/duo-bokmaal/Bokmaal/DUO_BM_28042.txt ; do
name=$(basename "$input")
/corpora/bokm/tools/The-Oslo-Bergen-Tagger/./tag-lbk.sh "$input" > "/home/arashsa/duo-bokmaal-obt/$name"
done
I'm terrible at writing shell scripts, and have not managed to found out how to iterate over files. What I want it is to make the script iterate over all files in a given folder that end with .txt and not those that end with _metadata.txt. So I'm thinking I would give it the folder path as argument, make it iterate over all the files in that folder, and run script on files ending with .txt and not _metadata.txt

Use find and the exec option.
$ find /path/to/dir -exec <command here> \;
Each file or directory can be obtained by using {}.
Example usage: $ find . -exec echo {} \;, this will echo each file name recursively or directory name in the current directory. You can use some other options to further specify the desired files and directories you wish to handle. I will briefly explain some of them. Note that the echo is redundant because the output of find will automatically print but I'll leave it there to illustrate the working of exec. This being said, following commands yield the same result: $ find . -exec echo {} \; and $ find .
maxdepth and mindepth
Specifying the maxdepth and mindepth allows you to go as deep down the directory structure as you like. Maxdepth determines how many times find will enter a directory and mindepth determines how many times a directory should be entered before selecting a file or dir.
Example usages:
(1) listing only elements from this dir, including . (= current dir).
(2) listing only elements from current dir excluding .
(3) listing elements from root dir and all dirs in this dir
(1)$ find . -maxdepth 1 -exec echo {} \;
(2)$ find . -mindepth 1 -maxdepth 1 -exec echo {} \;
# or, alternatively
(2)$ find . ! -path . -maxdepth 1 -exec echo {} \;
(3)$ find / -maxdepth 2 -exec echo {} \;
type
Specifying a type option allows you to filter files or directories only, example usage:
(1) list all files in this dir
(2) call shell script funtion func on every directory in the root dir.
(1)$ find . -maxdepth 1 -type f -exec echo {} \;
(2)$ find / -maxdepth 1 -type d -exec func {} \;
name & regex
The name option allows you to search for specific filenames, you can also look for files and dirs using a regex format.
Example usage: find all movies in a certain directory
$ find /path/to/dir -maxdepth 1 -regextype sed -regex ".*\.\(avi\|mp4\|mkv\)"
size
Another filter is the file size, any file or dir greater than this value will be returned. Example usage:
(1) find all empty files in current dir.
(2) find all non empty files in current dir.
(1)$ find . -maxdepth 1 -type f -size 0
(2)$ find . -maxdepth 1 -type f ! -size 0
Further examples
Move all files of this dir to a directory tmp present in .
$ find . -type f -maxdepth 1 -exec mv {} tmp \;
Convert all mkv files to mp4 files in a dir /path/to/dir and child directories
$ find /path/to/dir -maxdepth 2 -regextype sed -regex ".*\.mkv" -exec ffmpeg -i {} -o {}.mp4 \;
Convert all your jpeg files to png (don't do this, it will take very long to both find them and convert them).
$ find ~ -maxdepth 420 -regextype sed -regex '.*\.jpeg' -exec mogrify -format png {} \;
Note
The find command is a strong tool and it can prove to be fruitful to pipe the output to xargs. It's important to note that this method is superior to the following construction:
for file in $(ls)
do
some commands
done,
as the latter will handle files and directories containing spaces the wrong way.

In bash:
shopt -s extglob
for input in /dir/goes/here/*!(_metadata).txt
do
...
done

How to copy files recursively, rename them but keep the same extension in Bash?

I have a folder with tens of thousands of different file types. Id like to copy them all to a new folder (Copy1) but also rename them all to $RANDOM but keep the extension intact. I realize I can write a line specifying which extension to find and how to name it, but there is got to be a way to do it dynamically, because there are at least 100 file types and may be more in the future.
I have the following so far:
find ./ -name '*.*' -type f -exec bash -c 'cp "$1" "${1/\/123_//_$RANDOM}"' -- {} \;
but that puts the random number after the extension, and also it puts the all in the same folder. I cant figure out how to do the following 2 things:
1 - Keep all paths intact, but in a new root folder (Copy1)
2 - How to have name be $RANDOM.extension, instead of .extension.$RANDOM
PS - by $RANDOM i mean actual randomly generated number. I am interested in keeping folder structure, so we are dealing with a few hundred files at most per directory, but all directories/files need to be renamed to $RANDOM. Another way to look at what I need to do. Copy all contents or Folder1 with all subdirectories and files to Folder2 (where Fodler2 is a $RANDOM name), then rename all folders and files to random names but keep all extensions.
EDIT: Ok i figured out how to rename and keep extension. But I have a problem where its dumping all of the files into the root directory where script is run from. How do I keep them in their respective folders? Command Im using is:
find ./ -name '*.*' -type f -exec bash -c 'mv "$1" $RANDOM.${1##*.}' -- {} \;
Thanks!

Ok i figured out how to rename and keep extension. But I have a
problem where its dumping all of the files into the root directory
where script is run from. How do I keep them in their respective
folders? Command Im using is:
find ./ -name '*.*' -type f -exec bash -c 'mv "$1" $RANDOM.${1##*.}' -- {} \;
Change your command to:
PATH=/bin:/usr/bin find . -name '*.*' -type f -execdir bash -c 'mv "$1" $RANDOM.${1##*.}' -- {} \;
Or alternatively using uuids instead of random numbers:
PATH=/bin:/usr/bin find . -name '*.*' -type f -execdir bash -c 'mv "$1" $(uuidgen).${1##*.}' -- {} \;

Here's what I came up with :
i=1
random="whatever"
find . -name "*.*" -type f | while read f
do
newbase=${f/*./$random$i.} //added counter to filename
cp $f /Path/Name/"$newbase"
((i++))
done
I had to add a counter to random (i), otherwise, if the extensions are similar, your files would overwrite themselves when copied.
In your new folder, your files should look like this :
whatever1.txt
whatever2.txt
etc etc
I hope this is what you were looking for.

Here is the command that worked for me.
find . -name '*.pdf' -type f -exec bash -c 'echo "{}" && cp "$1" ./$RANDOM.${1##*.}' -- {} \;

Trying to remove a file and its parent directories

I've got a script that finds files within folders older than 30 days:
find /my/path/*/README.txt -mtime +30
that'll then produce a result such as
/my/path/jobs1/README.txt
/my/path/job2/README.txt
/my/path/job3/README.txt
Now the part I'm stuck at is I'd like to remove the folder + files that are older than 30 days.
find /my/path/*/README.txt -mtime +30 -exec rm -r {} \;
doesn't seem to work. It's only removing the readme.txt file
so ideally I'd like to just remove /job1, /job2, /job3 and any nested files
Can anyone point me in the right direction ?

This would be a safer way:
find /my/path/ -mindepth 2 -maxdepth 2 -type f -name 'README.txt' -mtime +30 -printf '%h\n' | xargs echo rm -r
Remove echo if you find it already correct after seeing the output.
With that you use printf '%h\n' to get the directory of the file, then use xargs to process it.

You can just run the following command in order to recursively remove directories modified more than 30 days ago.
find /my/path/ -type d -mtime +30 -exec rm -rf {} \;

help to explain scheduled script

Purpose of the script :
1.This script will delete Files older than 4 months.
2.Files older than 3 days will be compressed.
A script has been written such as :
#!/bin/bash
exec >> /dir5/dir6/cleanup-logfiles.log 2>&1
# customer list job
cd /dir1/dir2/dir3/dir4/tmp
find -type f -mtime +120 -exec rm -v '{}' \;
find -type f -mtime +3 -name '*.csv' -exec gzip -v '{}' \;
Can anyone please explain usage of both the above commands (and how do they serve the purpose ?
And this script has been placed at /etc/. what could be the reason ?

exec without a command parameter redirects all output (stdout + stderr [2>&1]) from the current shell (i.e. this script) to /dir5/dir6/cleanup-logfiles.log
cd changes directory ;)
the find commands will find all files (-type f) whose modified time (-mtime) is older than 120, respectively 3 days and: delete them (-exec rm -v '{}' \;) or gzip them (-exec gzip -v '{}' \;). gzipping only happens when the file has a csv extension (-name '*.csv')
{} is a placeholder for the currently found file
the script is probably run through cron (/etc/cron.{d,daily,hourly,weekly,monthly} or /etc/crontab)

How can I list all unique file names without their extensions in bash?

I have a task where I need to move a bunch of files from one directory to another. I need move all files with the same file name (i.e. blah.pdf, blah.txt, blah.html, etc...) at the same time, and I can move a set of these every four minutes. I had a short bash script to just move a single file at a time at these intervals, but the new name requirement is throwing me off.
My old script is:
find ./ -maxdepth 1 -type f | while read line; do mv "$line" ~/target_dir/; echo "$line"; sleep 240; done
For the new script, I basically just need to replace find ./ -maxdepth 1 -type f
with a list of unique file names without their extensions. I can then just replace do mv "$line" ~/target_dir/; with do mv "$line*" ~/target_dir/;.
So, with all of that said. What's a good way to get a unique list of files without their file names with bash script? I was thinking about using a regex to grab file names and then throwing them in a hash to get uniqueness, but I'm hoping there's an easier/better/quicker way. Ideas?

A weird-named files tolerant one-liner could be:
find . -maxdepth 1 -type f -and -iname 'blah*' -print0 | xargs -0 -I {} mv {} ~/target/dir
If the files can start with multiple prefixes, you can use logic operators in find. For example, to move blah.* and foo.*, use:
find . -maxdepth 1 -type f -and \( -iname 'blah.*' -or -iname 'foo.*' \) -print0 | xargs -0 -I {} mv {} ~/target/dir
EDIT
Updated after comment.
Here's how I'd do it:
find ./ -type f -printf '%f\n' | sed 's/\..*//' | sort | uniq | ( while read filename ; do find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \; ; sleep 240; done )
Perhaps it needs some explaination:
find ./ -type f -printf '%f\n': find all files and print just their name, followed by a newline. If you don't want to look in subdirectories, this can be substituted by a simple ls;
sed 's/\..*//': strip the file extension by removing everything after the first dot. Both foo.tar ad foo.tar.gz are transformed into foo;
sort | unique: sort the filenames just found and remove duplicates;
(: open a subshell:
while read filename: read a line and put it into the $filename variable;
find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \;: find in the current directory (find .) all the files (-type f) whose name starts with the value in filename (-iname "$filename"'*', this works also for files containing whitespaces in their name) and execute the mv command on each one (-exec mv {} /dest/dir \;)
sleep 240: sleep
): end of subshell.
Add -maxdepth 1 as argument to find as you see fit for your requirements.

Nevermind, I'm dumb. there's a uniq command. Duh. New working script is: find ./ -maxdepth 1 -type f | sed -e 's/.[a-zA-Z]*$//' | uniq | while read line; do mv "$line*" ~/target_dir/; echo "$line"; sleep 240; done
EDIT: Forgot close tag on code and a backslash.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio