Shell Script to delete specific image-files recursively - bash

I do have a third-party program, which uploads files to a webserver. These files are images, in different folders and with different names. Those files get references into a database. The program imports new images and upload those to those folders. If there is an existing file, it just takes the name and add a special counter, create a new reference in the database and the old one will be removed. But instead of removing the file as well, it keeps a copy.
Lets say, we have a image-file name "109101.jpg".
There is a new version of the file and it will be uploaded with the filename: "109101_1.jpg". This goes further till "109101_103.jpg" for example.
Now, all the 103 files before this one are outdated and could be deleted.
Due to the fact, that the program is not editable and third-party, I am not able to change that behavior. Instead, I need a Shell script, which walks through those folders and deletes all the images before the latest one. So only "109101_103.jpg" will survive and all the others before this number will be deleted.
And as a side effect, there are also images, with a double underscored name (only these, no tripple ones or so).
For example: "109013_35_1.jpg" is the original one, the next one is "109013_35_1_1.jpg" and now its at "109013_35_1_24.jpg". So only "109013_35_1_24.jpg" has to survive.
Right now I am not even having an idea, how to solve this problem. Any ideas?

Here's a one line pipeline, because I felt like it. Shown with newlines inserted because I'm not evil.
for F in $(find . -iname '*.jpg' -exec basename {} .jpg \;
| sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/'
| sort -u); do
find -regex ".*${F}_[0-9]*.jpg"
| sort -t _ -k 2 -n | sort -n -t _ -k 4 -s | head -n -1;
done

The following script deletes the files in a given directory:
#! /bin/bash
cd $1
shopt -s extglob # Turn on extended patterns.
shopt -s nullglob # Non matched pattern expands to null.
delete=()
for file in +([^_])_+([0-9]).jpg \
+([^_])_+([0-9])_+([0-9])_+([0-9]).jpg ; do # Only loop over non original files.
[[ $file ]] || continue # No files in the directory.
base=${file%_*} # Delete everything after the last _.
num=${file##*_} # Delete everything before the last _.
num=${num%.jpg} # Delete the extension.
[[ -f $base.jpg ]] && rm $base.jpg # Delete the original file.
[[ -f "$base"_$((num+1)).jpg ]] && delete+=($file) # The file itself is scheduled for deletion.
done
(( ${#delete[#]} )) && rm "${delete[#]}"
The numbered files are not deleted immediately, because that could remove a "following" file for another file. They are just remembered in an array and deleted at the end.
To apply the script recursively, you can run
find /top/directory -type d -exec script.sh {} \;

Related

How to remove files from a directory if their names are not in a text file? Bash script

I am writing a bash script and want it to tell me if the names of the files in a directory appear in a text file and if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
if [file in TestDir/ not in fileNames.txt]
then
rm file
fi
((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.
You can simplify your script logic a lot :
#/bin/bash
# for loop to iterate over all files in the testdir
for file in TestDir/*
do
# if grep exit code is 1 (file not found in the text document), we delete the file
[[ ! $(grep -x "$file" fileNames.txt &> /dev/null) ]] && rm "$file"
done
It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt"
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified it return only regular files (-type f) whose names don't start with a period (-name '.*'). It then uses its own builtin utility to execute the next command:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a regular-expression matching utility that, in this case, is given two lists of files: one fed in through the pipe from find—the files of your TestDir directory; and the files listed in filenames.txt. Ordinarily, the filenames in the text file would be used to match against filenames returned by find, and those that match would be given as the output. However, the -v flag inverts the matching process, so that grep returns those filenames that do not match.
What results is a list of files that exist in the directory TestDir, but do not appear in the filenames.txt file. These are the files you wish to delete, so you can simply use this line of code inside a parameter expansion $(...) to supply rm with the files it's able to delete.
The full command chain—after you cd into TestDir—looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt")

mv Bash Shell Command (on Mac) overwriting files even with a -i?

I am flattening a directory of nested folders/picture files down to a single folder. I want to move all of the nested files up to the root level.
There are 3,381 files (no directories included in the count). I calculate this number using these two commands and subtracting the directory count (the second command):
find ./ | wc -l
find ./ -type d | wc -l
To flatten, I use this command:
find ./ -mindepth 2 -exec mv -i -v '{}' . \;
Problem is that when I get a count after running the flatten command, my count is off by 46. After going through the list of files before and after (I have a backup), I found that the mv command is overwriting files sometimes even though I'm using -i.
Here's details from the log for one of these files being overwritten...
.//Vacation/CIMG1075.JPG -> ./CIMG1075.JPG
..more log
..more log
..more log
.//dog pics/CIMG1075.JPG -> ./CIMG1075.JPG
So I can see that it is overwriting. I thought -i was supposed to stop this. I also tried a -n and got the same number. Note, I do have about 150 duplicate filenames. Was going to manually rename after I flattened everything I could.
Is it a timing issue?
Is there a way to resolve?
NOTE: it is prompting me that some of the files are overwrites. On those prompts I just press Enter so as not to overwrite. In the case above, there is no prompt. It just overwrites.
Apparently the manual entry clearly states:
The -n and -v options are non-standard and their use in scripts is not recommended.
In other words, you should mimic the -n option yourself. To do that, just check if the file exists and act accordingly. In a shell script where the file is supplied as the first argument, this could be done as follows:
[ -f "${1##*/}" ]
The file, as first argument, contains directories which can be stripped using ##*/. Now simply execute the mv using ||, since we want to execute when the file doesn't exist.
[ -f "${1##*/}" ] || mv "$1" .
Using this, you can edit your find command as follows:
find ./ -mindepth 2 -exec bash -c '[ -f "${0##*/}" ] || mv "$0" .' '{}' \;
Note that we now use $0 because of the bash -c usage. It's first argument, $0, can't be the script name because we have no script. This means the argument order is shifted with respect to a usual shell script.
Why not check if file exists, prior move? Then you can leave the file where it is or you can rename it or do something else...
Test -f or, [] should do the trick?
I am on tablet and can not easyly include the source.

Using bash I need to perform a find of 0 byte files but report on their existence before deletion

The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, pdfs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production user's need a record of which files used to exist, so when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to also be there and thus provide reason as to why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory
-empty
This tells find to only look for empty files
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
When this command is run, bash will execute the string printf "%s\n" "$#" >>deletedFiles.txt and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $#, it does not include $0. It, as is usual, expands to $1, $2, .... Thus, we add the placeholder none so that the placeholder is assigned is the $0, which we will ignore, and the complete list of file names are assigned to "$#".
-delete
This deletes each empty file.
Why not simply
find . -type f -size 0 -exec rm -v + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from #JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$#"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can pursue going back to archive CD's to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to #John1024 for the suggestion of using the empty flag rather than size.

Rename files in shell

I've folder and file structure like
Folder/1/fileNameOne.ext
Folder/2/fileNameTwo.ext
Folder/3/fileNameThree.ext
...
How can I rename the files such that the output becomes
Folder/1_fileNameOne.ext
Folder/2_fileNameTwo.ext
Folder/3_fileNameThree.ext
...
How can this be achieved in linux shell?
How many different ways do you want to do it?
If the names contain no spaces or newlines or other problematic characters, and the intermediate directories are always single digits, and if you have the list of the files to be renamed in a file file.list with one name per line, then one of many possible ways to do the renaming is:
sed 's%\(.*\)/\([0-9]\)/\(.*\)%mv \1/\2/\3 \1/\2_\3%' file.list | sh -x
You'd avoid running the command through the shell until you're sure it will do what you want; just look at the generated script until its right.
There is also a command called rename — unfortunately, there are several implementations, not all equally powerful. If you've got the one based on Perl (using a Perl regex to map the old name to the new name) you'd be able to use:
rename 's%/(\d)/%/${1}_%' $(< file.list)
Use a loop as follows:
while IFS= read -d $'\0' -r line
do
mv "$line" "${line%/*}_${line##*/}"
done < <(find Folder -type f -print0)
This method handle spaces, newlines and other special characters in the file names and the intermediate directories don't necessarily have to be single digits.
This may work if the name is always the same, ie "file":
for i in {1..3};
do
mv $i/file ${i}_file
done
If you have more dirs on a number range, change {1..3} for {x..y}.
I use ${i}_file instead of $i_file because it would consider $i_file a variable of name i_file, while we just want i to be the variable and file and text attached to it.
This solution from AskUbuntu worked for me.
Here is a bash script that does that:
Note: This script does not work if any of the file names contain spaces.
#! /bin/bash
# Only go through the directories in the current directory.
for dir in $(find ./ -type d)
do
# Remove the first two characters.
# Initially, $dir = "./directory_name".
# After this step, $dir = "directory_name".
dir="${dir:2}"
# Skip if $dir is empty. Only happens when $dir = "./" initially.
if [ ! $dir ]
then
continue
fi
# Go through all the files in the directory.
for file in $(ls -d $dir/*)
do
# Replace / with _
# For example, if $file = "dir/filename", then $new_file = "dir_filename"
# where $dir = dir
new_file="${file/\//_}"
# Move the file.
mv $file $new_file
done
# Remove the directory.
rm -rf $dir
done
Copy-paste the script in a file.
Make it executable using
chmod +x file_name
Move the script to the destination directory. In your case this should be inside Folder/.
Run the script using ./file_name.

Backup script: How to keep the last N entries?

For a backup script, I need to clean old backups. How can I keep the last N backups and delete the rest?
A backup is either a single folder or a single file and the script will either keep all backups in folders or files (no mixing).
If possible, I'd like to avoid parsing of the output of ls. Even though all the entries in the backup folder should have been created by the backup script and there should be no funny characters in the entry names, a hacker might be able to create new entries in there.
This should do it (untested!):
#!/usr/bin/env bash
set -o errexit -o noclobber -o nounset -o pipefail
i=0
max=7 # Could be anything you want
while IFS= read -r -d '' -u 9
do
let ++i
if [ "$i" -gt "$max" ]
then
rm -- "$REPLY"
fi
done 9< <(find /var/backup -type f -maxdepth 1 -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\.tar\.gz' -print0 | sort -rz)
Explained from the outside in:
Ensure that the script stops at any common errors.
Find all files in /var/backup (and not subdirectories) matching a YYYY-MM-DD.tar.gz format.
Reverse sort these, so the latest are listed first.
Send these to file descriptor 9. This avoids any problems with cat, ssh or other programs which read standard input by default.
Read files one by one from FD 9, separated by NUL.
Count files until you get past your given max.
Nuke the rest from orbit.

Resources