Bash: Maintaining a set of files and their gzipped equivalents

I have a directory tree in which there are some files and some subdirectories.
/
/file1.txt
/file2.png
/dir1
/dir1/subfile1.gif
The objective is to have a script that generates a gzipped version of each file and saves it next to each file, with an added .gz suffix:
/
/file1.txt
/file1.txt.gz
/file2.png
/file2.png.gz
/dir1
/dir1/subfile1.gif
/dir1/subfile1.gif.gz
This would handle the creation of new .gz files.
Another part is deletion: whenever a non-gzipped file is deleted, the script would need to remove the now-orphaned .gz version the next time it runs.
The last and trickiest part is modification: Whenever some (non-gzipped) files are changed, re-running the script would update the .gz version of only those changed files, based on file timestamp (mtime) comparison between a file and its gzipped version.
Is it possible to implement such a script in bash?
Edit: The goal of this is to have precompressed copies of each file ready for nginx to serve via the gzip_static module. It is not meant to be a background service that compresses things as soon as anything changes, because nginx's gzip_static module is smart enough to serve the uncompressed version if no compressed version exists, or if the uncompressed version's timestamp is more recent than the gzipped version's. As such, this is a script that would run occasionally, whenever the server is not busy.
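To check that nginx is actually serving the precompressed copies, a quick probe with curl can help (a rough sketch; the hostname and path below are placeholders, not from the question):
# Request with gzip accepted and look for "Content-Encoding: gzip" in the reply headers.
curl -sI -H 'Accept-Encoding: gzip' http://localhost/file1.txt | grep -i '^content-encoding'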

Here is my attempt at it:
#!/bin/bash
# you need to clean up .gz files when you remove things
find . -type f -perm -o=r -not -iname '*.gz' | \
while read -r x
do
    if [ "$x" -nt "$x.gz" ]; then
        gzip -cn9 "$x" > "$x.gz"
        chown --reference="$x" "$x.gz"
        chmod --reference="$x" "$x.gz"
        touch --reference="$x" "$x.gz"
        if [ "$(stat -c %s "$x.gz")" -ge "$(stat -c %s "$x")" ]; then
            rm "$x.gz"
        fi
    fi
done
Stole most of it from here: https://superuser.com/questions/482787/gzip-all-files-without-deleting-them
Changes include:
skipping .gz files
adding -9 and -n to make the files smaller
deleting compressed files that ended up larger than the original (unfortunately this means they will be retried every time you run the script)
made sure the owner, permissions, and timestamp on the compressed file match the original
only works on files that are readable by everyone
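The cleanup mentioned in the comment at the top of the script could be handled by a companion pass along these lines (a minimally tested sketch; note it would also remove any hand-made .gz file whose uncompressed counterpart never existed):
find . -type f -name '*.gz' | \
while read -r gz
do
    # remove the .gz if the corresponding uncompressed file is gone
    if [ ! -e "${gz%.gz}" ]; then
        rm "$gz"
    fi
done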

Something like this, maybe?
#!/bin/sh
case $1 in
    *.gz )
        # If it's an orphan, remove it
        test -f "${1%.gz}" || rm "$1" ;;
    # Otherwise, it will be handled when the existing parent is handled
    * )
        make -f - <<'____HERE' "$1.gz"
%.gz: %
# Make sure the recipe line below starts with a literal tab!
	gzip -9 <$< >$@
____HERE
        ;;
esac
If you have a Makefile already, by all means use a literal file rather than a here document.
Integrating with find is left as an exercise. You might want to accept multiple target files and loop over them if you want to save processes.
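For what it's worth, that find integration might look something like this (an untested sketch; the script name gzsync.sh is hypothetical and assumes the case statement above was saved to that file):
# Feed every regular file, .gz or not, to the case statement above, which
# either prunes an orphaned .gz or (re)builds an out-of-date one via make.
find . -type f -exec sh ./gzsync.sh {} \;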

Related

Bash: scanning for filenames containing keywords and moving them

I'm looking for a way to constantly scan a folder tree for new subfolders containing MKV/MP4 files. If a file contains a keyword and ends in MP4 or MKV, it will be moved to a defined location matching that keyword. As a bonus, it would delete the folder, and all of its leftover contents, where the file resided previously. The idea is to have this run in the background, sort everything to where it belongs, and clean up after itself if possible.
example:
Media\anime\Timmy\Timmy_S1E1\Timmy_S1E1_720p.mkv #Found Keyword Timmy, allowed filetype
Move to destination:
Media\series\Timmy\
Delete subfolder:
Media\anime\Timmy\Timmy_S1E1\
I would either do separate scripts for each keyword, or, if possible, have the script match each keyword with a destination
#!/bin/bash
#!/bin/sh
#!/etc/shells/bin/bash
while true
do
    shopt -s globstar
    start_dir="//srv/MEDIA2/shows"
    for name in "$start_dir"/**/*.*; do
        # search the directory recursively
    done
    sleep 300
done
This could be done by:
creating a script that does what you want to do, once.
run the script from cron at a certain interval: say every couple of minutes, or every couple of hours, depending on the volume of files you receive (see the crontab sketch at the end of this answer).
no need for a continually running daemon.
Ex:
#!/bin/bash

start_dir="/start/directory"
if [[ ! -d "$start_dir" ]]
then
    echo "ERROR: start_dir ($start_dir) not found."
    exit 1
fi

target_dir="/target/directory"
if [[ ! -d "$target_dir" ]]
then
    echo "ERROR: target_dir ($target_dir) not found."
    exit 1
fi

# Move all MP4 and MKV files to the target directory
find "$start_dir" -type f \( -name "*keyword*.MP4" -o -name "*keyword*.MKV" \) -print0 |
while read -r -d $'\0' file
do
    # add any processing here...
    filename=$(basename "$file")
    echo "Moving $filename to $target_dir..."
    mv "$file" "$target_dir/$filename"
done

# That being done, all that is left in start_dir can be deleted
find "$start_dir" -type d ! -path "$start_dir" -exec /bin/rm -fr {} \;
Details:
scanning for files is most efficient with the find command
the -print0 with read ... method is to ensure all valid filenames are processed, even if they include spaces or other "weird" characters.
the result of the above code is that each file that matches your keyword, with extension MP4 or MKV, will be processed once.
you can then use "$file" to access the file being processed in the current loop.
make sure you ALWAYS double quote $file, otherwise any weird filename will break your code. Well, you should always double quote your variables anyway.
more complex logic can be added for your specific needs. Ex. create the target directory if it does not exist. Create a different target directory depending on your keyword. etc.
to delete all sub-directories under $start_dir, I use find. Again this will process weird directory names.
One point: some will argue that it could all be done in a single find command with the -exec option (a sketch of that variant follows below). True, but IMHO the version with the while loop is easier to code, understand, debug, and learn.
And this construct is good to have in your bash toolbox.
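For reference, that single-find variant might look something like this (a sketch; mv -t is a GNU coreutils option):
# One pass, no explicit loop: find hands mv as many matching files as fit per call.
find "$start_dir" -type f \( -name "*keyword*.MP4" -o -name "*keyword*.MKV" \) \
    -exec mv -t "$target_dir" {} +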
When you create a script, only one #! line is needed.
And I fixed the indentation in your question; it is much easier to read your code properly indented and formatted (see the edit help in the question editor).
Last point to discuss: let's say you have a LARGE number of directories and files to process, and it is possible that new files are added while the script is running. For example, you are moving many MP4 files, and while the script is doing that, new files are deposited in the directories. Then, when you do the deletion, you could potentially lose files.
If such a case is possible, you could add a check for new files just before you do the /bin/rm; that would help. To be absolutely certain, you could set up a script that processes one file at a time and have it triggered by inotify. But that is another ball game, more complicated, and out of scope for this answer.
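As for the cron scheduling mentioned at the top of this answer, a crontab entry along these lines would run the script every 10 minutes (a sketch; the path /usr/local/bin/sort_media.sh is hypothetical):
# m h dom mon dow  command
*/10 * * * * /usr/local/bin/sort_media.sh >> /var/log/sort_media.log 2>&1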

Shell script to archive & delete files older than 5 days based on created date of the files

I am trying to compress 5 days' worth of logs at a time, move the compressed files to another location, and delete the log files from the original location. I need a bash script to accomplish this. I got the files compressed using the command below, but I am not able to move them to the archive folder. I also need to compress based on the date the files were created; right now it compresses all files starting with a specific name.
#!/bin/bash
cd "C:\Users\ann\logs"
for filename in acap*.log*; do
    # this syntax emits the value in lowercase: ${var,,*} (bash version 4)
    mkdir -p archive
    gzip "$filename_.zip" "$filename"
    mv "$filename" archive
done
#!/bin/bash
mkdir -p archive
for file in $(find . -mtime +3 -type f -printf "%f ")
do
    if [[ "$file" =~ ^acap.*\.log$ ]]
    then
        tar -czf "archive/${file}.tar.gz" "$file"
        rm "$file"
    fi
done
This finds all files in the current directory that match the regex, compresses each one into its own tar archive, and then deletes the original files.
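If the log names could ever contain spaces, a variant that keeps the same logic but lets find do the matching and uses NUL-delimited output would be safer (an untested sketch):
mkdir -p archive
find . -maxdepth 1 -type f -name 'acap*.log' -mtime +3 -print0 |
while IFS= read -r -d '' file
do
    # compress, and only remove the original if tar succeeded
    tar -czf "archive/$(basename "$file").tar.gz" "$file" && rm "$file"
done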

Efficiently moving half a million files based on extension in bash

Scenario:
With the Locky virus on the rampage, the computer center I work for has found that the only method of file recovery is using tools like Recuva; the problem with that is it dumps all the recovered files into a single directory. I would like to move all those files into categories based on their file extensions: all JPGs in one, all BMPs in another, and so on. I have looked around Stack Overflow and, based on various other questions and responses, I managed to build a small bash script (sample provided) that kind of does that; however, it takes forever to finish and I think I have the extensions messed up.
Code:
#!/bin/bash
path=$2    # Starting path to the directory of the junk files
var=0      # How many records were processed
SECONDS=0  # reset the clock so we can time the event
clear
echo "Searching $2 for file types and then moving all files into grouped folders."
# Only want to move Files from first level as Directories are ok where they are
for FILE in `find $2 -maxdepth 1 -type f`
do
    # Split the EXT off for the directory name using AWK
    DIR=$(awk -F. '{print $NF}' <<<"$FILE")
    # DEBUG ONLY
    # echo "Moving file: $FILE into directory $DIR"
    # Make a directory in our path then Move that file into the directory
    mkdir -p "$DIR"
    mv "$FILE" "$DIR"
    ((var++))
done
echo "$var Files found and organized in:"
echo "$(($SECONDS / 3600)) hours, $((($SECONDS / 60) % 60)) minutes and $(($SECONDS % 60)) seconds."
Question:
How can I make this more efficient while dealing with 500,000+ files? The find takes forever to grab a list of files, and in the loop it attempts to create a directory (even if that path is already there). I would like to deal with those two particular aspects of the loop more efficiently, if at all possible.
The bottleneck of any bash script is usually the number of external processes you start. In this case, you can vastly reduce the number of calls to mv you make by recognizing that a large percentage of the files you want to move will have a common suffix like jpg, etc. Start with those.
for ext in jpg mp3; do
    mkdir -p "$ext"
    # For simplicity, I'll assume your mv command supports the -t option
    find "$2" -maxdepth 1 -name "*.$ext" -exec mv -t "$ext" {} +
done
Using -exec mv -t "$ext" {} + means find will pass as many files as possible to each call to mv. For each extension, this means one call to find and a minimum number of calls to mv.
Once those files are moved, then you can start analyzing files one at a time.
for f in "$2"/*; do
    ext=${f##*.}
    # Probably more efficient to check in-shell if the directory
    # already exists than to start a new process to make the check
    # for you.
    [[ -d $ext ]] || mkdir "$ext"
    mv "$f" "$ext"
done
The trade-off occurs in deciding how much work you want to do beforehand identifying the common extensions to minimize the number of iterations of the second for loop.
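One rough way to choose that "common" list up front is to tally the extensions before moving anything (a sketch; -printf is a GNU find extension):
# Count extensions of the top-level files, most frequent first.
find "$2" -maxdepth 1 -type f -name '*.*' -printf '%f\n' |
    awk -F. '{print $NF}' | sort | uniq -c | sort -rn | head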

Move files to the correct folder in Bash

I have a few files named in the format ReportsBackup-20140309-04-00 and I would like to send files sharing the same year/month pattern to a matching folder, e.g. 201403 for the example above.
I can already create the folders based on the filenames; I would just like to move the files into their correct folder based on the name.
I use this to create the directories
old="directory where are the files" &&
year_month=`ls ${old} | cut -c 15-20` &&
for i in ${year_month}; do
    if [ ! -d ${old}/$i ]
    then
        mkdir ${old}/$i
    fi
done
You can use find:
find /path/to/files -name "*201403*" -exec mv {} /path/to/destination/ \;
Here’s how I’d do it. It’s a little verbose, but hopefully it’s clear what the program is doing:
#!/bin/bash
SRCDIR=~/tmp
DSTDIR=~/backups

for bkfile in "$SRCDIR"/ReportsBackup*; do
    # Get just the filename, and read the year/month variable
    filename=$(basename "$bkfile")
    yearmonth=${filename:14:6}
    # Create the folder for storing this year/month combination. The '-p' flag
    # means that:
    # 1) We create $DSTDIR if it doesn't already exist (this flag actually
    #    creates all intermediate directories).
    # 2) If the folder already exists, continue silently.
    mkdir -p "$DSTDIR/$yearmonth"
    # Then we move the report backup to the directory. The '.' at the end of the
    # mv command means that we keep the original filename.
    mv "$bkfile" "$DSTDIR/$yearmonth/."
done
A few changes I’ve made to your original script:
I’m not trying to parse the output of ls. This is generally not a good idea. Parsing ls will make it difficult to get the individual files, which you need for copying them to their new directory.
I’ve simplified your if ... mkdir line: the -p flag is useful for “create this folder if it doesn’t exist, or carry on”.
I’ve slightly changed the slicing command which gets the year/month string from the filename.

Unzip ZIP file and extract unknown folder name's content

My users will be zipping up files which will look like this:
TEMPLATE1.ZIP
|--------- UnknownName
           |------- index.html
           |------- images
                    |------- image1.jpg
I want to extract this zip file as follows:
/mysite/user_uploaded_templates/myrandomname/index.html
/mysite/user_uploaded_templates/myrandomname/images/image1.jpg
My trouble is with UnknownName - I do not know what it is beforehand and extracting everything to the "base" level breaks all the relative paths in index.html
How can I extract from this ZIP file the contents of UnknownName?
Is there anything better than:
1. Extract everything
2. Detect which "new subdirectory" got created
3. mv newsubdir/* .
4. rmdir newsubdir/
If there is more than one subdirectory at UnknownName level, I can reject that user's zip file.
I think your approach is a good one. Step 2 could be improved by extracting to a newly created directory (later deleted) so that "detection" is trivial.
# Bash (minimally tested)
tempdest=$(mktemp -d)
unzip -d "$tempdest" TEMPLATE1.ZIP
dir=("$tempdest"/*)
if (( ${#dir[@]} == 1 )) && [[ -d $dir ]]
then
    # in Bash, etc., scalar $var is the same as ${var[0]}
    mv "$dir"/* /mysite/user_uploaded_templates/myrandomname
else
    echo "rejected"
fi
rm -rf "$tempdest"
The other option I can see other than the one you suggested is to use the unzip -j flag which will dump all paths and put all files into the current directory. If you know for certain that each of your TEMPLATE1.ZIP files includes an index.html and *.jpg files then you can just do something like:
destdir=/mysite/user_uploaded_templates/myrandomname
unzip -j TEMPLATE1.ZIP -d "$destdir"
mkdir "${destdir}/images"
mv "${destdir}"/*.jpg "${destdir}/images"
It's not exactly the cleanest solution but at least you don't have to do any parsing like you do in your example. I can't seem to find any option similar to patch -p# that lets you specify the path level.
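If bsdtar (libarchive) happens to be available, it can read zip archives and does offer something in the spirit of patch -p#, so stripping the unknown top-level directory becomes a one-liner (a sketch, assuming bsdtar is installed and $destdir is set as above):
# Extract TEMPLATE1.ZIP into $destdir, dropping the single unknown
# top-level directory from every path.
bsdtar -xf TEMPLATE1.ZIP --strip-components=1 -C "$destdir"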
Each zip and unzip command differs, but there's usually a way to list the file contents. From there, you can parse the output to determine the unknown directory name.
On Windows, with the 1996 Wales/Gaily/van der Linden/Rommel version, it is unzip -l.
Of course, you could simply let unzip extract the files to whatever directory it wants, then use mv to rename that directory to what you want.
tempDir=temp.$$
mkdir "$tempDir"
mv "$zipFile" "$tempDir"
cd "$tempDir"
unzip "$(basename "$zipFile")"
unknownDir=(*/)    # Should be the only directory here
mv "${unknownDir[0]}" "$whereItShouldBe"
cd ..
rm -rf "$tempDir"
It's always a good idea to create a temporary directory for these types of operations in case you end up running two instances of this command.
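On that note, mktemp -d (as used in the first answer above) produces a unique directory and avoids collisions between concurrent runs better than a temp.$$-style name:
tempDir=$(mktemp -d)   # e.g. /tmp/tmp.XXXXXXXXXX, safe against clashes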
