How do I check if all files inside directories are valid jpegs (Linux, sh script needed)? - shell

I have a directory (for instance, '/photos') containing several subdirectories
(like '/photos/wedding', '/photos/birthday', '/photos/graduation', etc.) with .jpg files in them. Unfortunately, some of the JPEG files are broken, and I need a way to determine which ones.
I found out that there is a tool named ImageMagick, which can help a lot. If you use it like this:
identify -format '%f' whatever.jpg
it prints the name of the file only if the file is valid; if it is not, it prints something like "identify: Not a JPEG file: starts with 0x69 0x75 `whatever.jpg' # jpeg.c/EmitMessage/232.".
So the correct solution should find all files ending with ".jpg", run "identify" on them, and if the result is just the name of the file, do nothing; if the result differs from the name of the file, save the name somewhere (for example in a file "errors.txt").
Any ideas how I can do that?

The short-short version:
find . -iname "*.jpg" -exec jpeginfo -c {} \; | grep -E "WARNING|ERROR"
You might not need the same find options, but jpeginfo was the solution that worked for me:
find . -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | xargs jpeginfo -c | grep -E "WARNING|ERROR" | cut -d " " -f 1
As a script (as requested in the question):
#!/bin/sh
find . -type f \
\( -iname "*.jpg" \
-o -iname "*.jpeg" \) \
-exec jpeginfo -c {} \; | \
grep -E "WARNING|ERROR" | \
cut -d " " -f 1
I was clued into jpeginfo for this by http://www.commandlinefu.com/commands/view/2352/find-corrupted-jpeg-image-files and this explained mixing find -o OR with -exec
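The effect of grouping the -o alternatives can be checked on a scratch directory. The sketch below is only an illustration: the temporary directory and the file names in it are made up, and printf stands in for jpeginfo so that nothing extra needs to be installed.

```shell
#!/bin/sh
# Scratch tree (hypothetical names): two files and one directory named like a JPEG.
tmp=$(mktemp -d)
mkdir "$tmp/dir.jpeg"
: > "$tmp/a.jpg"
: > "$tmp/b.jpeg"

# Ungrouped: parsed as (-type f AND *.jpg) OR (*.jpeg), so dir.jpeg slips through.
ungrouped=$(find "$tmp" -type f -iname '*.jpg' -o -iname '*.jpeg' | LC_ALL=C sort)

# Grouped: -type f applies to both patterns.
grouped=$(find "$tmp" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) | LC_ALL=C sort)

# Ungrouped with -exec: the action binds only to the last pattern, so a.jpg is skipped.
exec_last=$(find "$tmp" -iname '*.jpg' -o -iname '*.jpeg' -exec printf '%s\n' {} \; | LC_ALL=C sort)

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n\n-exec without grouping:\n%s\n' \
    "$ungrouped" "$grouped" "$exec_last"
rm -rf "$tmp"
```

Only the grouped form lists exactly the two regular files, which is why the script above wraps the two -iname tests in \( ... \) before -exec.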

One problem with identify -format is that it doesn't actually verify that the file is not corrupt, it just makes sure that it's really a jpeg.
To actually test it you need something to convert it. But the convert that comes with ImageMagick seems to silently ignore non-fatal errors in the jpeg (such as being truncated.)
One thing that works is this:
djpeg -fast -grayscale -onepass file.jpg > /dev/null
If it returns an error code, the file has a problem. If not, it's good.
There are other programs that could be used as well.
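A rough sketch of how an exit-code check like this could be applied to a whole tree. The function name list_bad_jpegs and the stand-in checker has_jpeg_magic are made up for this example; the stand-in only looks at the first two bytes, so that the demo runs without djpeg installed, and it is in no way a substitute for a real decoder.

```shell
#!/bin/sh
# list_bad_jpegs DIR CHECKER [ARGS...]
# Runs CHECKER [ARGS...] FILE for every .jpg/.jpeg under DIR and prints the
# files for which it exits non-zero. File names containing newlines are not handled.
list_bad_jpegs() {
    dir=$1
    shift
    find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) |
    while IFS= read -r f; do
        "$@" "$f" </dev/null >/dev/null 2>&1 || printf '%s\n' "$f"
    done
}

# Hypothetical stand-in checker for the demo: succeeds if the file starts with
# the JPEG marker bytes FF D8. A real decoder such as djpeg checks far more.
has_jpeg_magic() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "ffd8" ]
}

# Demo on made-up files.
tmp=$(mktemp -d)
printf '\377\330\377\340' > "$tmp/good.jpg"
printf 'not a jpeg' > "$tmp/bad.jpg"
bad=$(list_bad_jpegs "$tmp" has_jpeg_magic)
printf '%s\n' "$bad"
rm -rf "$tmp"
```

With djpeg installed, the same function could be called as list_bad_jpegs /photos djpeg -fast -grayscale -onepass > errors.txt.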

You can put this into a bash script file or run it directly:
find . -name "*.jpg" -type f -print0 | xargs -0 --no-run-if-empty identify -format '%f\n' 1>ok.txt 2>errors.txt
In case identify is missing, here is how to install it in Ubuntu:
sudo apt install imagemagick --no-install-recommends

This script will print out the names of the bad files:
#!/bin/bash
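find /photos -name '*.jpg' | while IFS= read -r FILE; do
    # %f expands to the file name without its directory, so compare against the basename
    if [[ $(identify -format '%f' "$FILE" 2>/dev/null) != "${FILE##*/}" ]]; then
        echo "$FILE"
    fi
done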
You could run it as is or as ./badjpegs > errors.txt to save the output to a file.
To break it down, the find command finds *.jpg files in /photos or any of its subdirectories. These file names are piped to a while loop, which reads them in one at a time into the variable $FILE. Inside the loop, we grab the output of identify using the $(...) operator and check if it matches the file name. If not, the file is bad and we print the file name.
It may be possible to simplify this. Most UNIX commands indicate success or failure in their exit code. If the identify command does this, then you could simplify the script to:
#!/bin/bash
find /photos -name '*.jpg' | while IFS= read -r FILE; do
    if ! identify "$FILE" &> /dev/null; then
        echo "$FILE"
    fi
done
Here the condition is simplified to if ! identify; then which means, "did identify fail?"

Related

Wildcard within if conditional of Bash fails to execute. Works when literal filename provided

I am trying to execute a command depending on the file type within directory. But am unable to check the content within directory using wildcard. When provided a literal filename I am able to execute.
find ./* -type d -execdir bash -c 'DIR=$(basename {}); if [[ -e {}/*.png ]]; then echo "img2pdf {}/*.png -o $DIR.pdf"; fi ' \;
Instead of going over directories, and then looking for png-s inside, find can find png-s straight away:
find . -name '*.png'
Then you can process it as you do, or using xargs:
find . -name '*.png' | xargs -I '{}' img2pdf '{}' -o '{}.pdf'
The command above will convert each png to a separate pdf.
If you want to pass all png-s at once, and call img2pdf once:
find . -name '*.png' | xargs img2pdf -o out.pdf
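As for why the original test never fires: bash does not perform pathname expansion inside [[ ... ]], so {}/*.png is checked as a literal file name. If one PDF per directory is the goal, the glob can be expanded first and then tested. A minimal sketch, assuming a made-up function name plan_pdfs that only prints the img2pdf commands (as the question does with echo):

```shell
#!/bin/sh
# plan_pdfs ROOT: for every directory under ROOT that contains .png files,
# print the img2pdf command that would combine them into <dirname>.pdf.
plan_pdfs() {
    find "$1" -type d | while IFS= read -r dir; do
        set -- "$dir"/*.png          # the glob expands here, outside any test
        [ -e "$1" ] || continue      # an unmatched glob stays literal, so skip
        printf 'img2pdf %s -o %s.pdf\n' "$*" "$(basename "$dir")"
    done
}

# Demo on a made-up tree: one directory with pngs, one without.
tmp=$(mktemp -d)
mkdir "$tmp/scans" "$tmp/empty"
: > "$tmp/scans/p1.png"
: > "$tmp/scans/p2.png"
plan=$(plan_pdfs "$tmp")
printf '%s\n' "$plan"
rm -rf "$tmp"
```

The printed line is for inspection only; file names with spaces would need quoting before it could be pasted back into a shell.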

Bash: List directories with a type of file, but missing another type of file

I'm new(ish) to using Bash and I'm trying to figure out how to combine a few different things into one script.
I'm looking for file transfers that were interrupted. These folders contain image files (either jpgs or pngs), but are missing another specific file (finished.txt).
Here is what I'm using to find folders with images (from here):
for f in */incoming/ ; do
log_f="${f//\//}"
echo "searching $f"
find "$f" -iname "*jpg*" -o -iname "*png*" > "/output/${log_f}.txt"
echo "$f finished"
done
Then, I'm running this command to find folders that are missing the finished.txt file (from here):
find -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/finished.txt" ';' -print
Is there a way to combine them so I have a list of folders which have jpg or png files, but don't have finished.txt? Also, If I want to add -mtime, where do I put that?
Alternatively, if there's a better/faster way to do this, I'm interested in that too.
Thanks!
From the first pass, where you collect the jpg/png files, you can get each file's directory with dirname. You can then iterate over that list of directories and look for finished.txt: if the file exists, skip the directory; otherwise print it.
Something like the following should work:
for i in $(find "$f" \( -iname "*jpg*" -o -iname "*png*" \) -exec dirname {} \;)
do
    # print the directory only if finished.txt is missing
    if [ ! -e "$i/finished.txt" ]; then
        echo "$i"
    fi
done
Append "| sort | uniq" to the find command to remove duplicate directories, for example:
find "$f" \( -iname "*jpg*" -o -iname "*png*" \) -exec dirname {} \; | sort | uniq
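Put together, the whole check might look like the sketch below. The function name unfinished_dirs and the demo tree are invented for illustration. On the -mtime question: a test such as -mtime -7 belongs inside find's expression, before the -exec action, as marked in the comment.

```shell
#!/bin/sh
# unfinished_dirs ROOT: print directories under ROOT that contain jpg/png
# files but no finished.txt. Directory names with newlines are not handled.
unfinished_dirs() {
    # An extra test like -mtime -7 would go right after the closing \) below.
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.png' \) -exec dirname {} \; |
    LC_ALL=C sort -u |
    while IFS= read -r dir; do
        [ -e "$dir/finished.txt" ] || printf '%s\n' "$dir"
    done
}

# Demo on a made-up tree: a/ finished its transfer, b/ did not.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/incoming" "$tmp/b/incoming"
: > "$tmp/a/incoming/x.jpg"
: > "$tmp/a/incoming/finished.txt"
: > "$tmp/b/incoming/y.PNG"
pending=$(unfinished_dirs "$tmp")
printf '%s\n' "$pending"
rm -rf "$tmp"
```

Using sort -u before the loop means each directory is tested once, however many images it holds.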

sorting output of find before running the command in -exec

I have a series of directories containing multiple mp3 files with filenames 001.mp3, 002.mp3, ..., 030.mp3.
What I want to do is to put them all together in order into a single mp3 file and add some meta data to that.
Here's what I have at the moment (removed some variable definitions, for clarity):
#!/bin/bash
for d in */; do
cd $d
find . -iname '*.mp3' -exec lame --decode '{}' - ';' | lame --tt "$title_prefix$name" --ty "${name:5}" --ta "$artist" --tl "$album" -b 64 - $final_path"${d%/}".mp3
cd ..
done
Sometimes this works and I get a single file with all the "tracks" in the correct order.
However, more often than not I get a single file with all the "tracks" in reverse order, which really isn't good.
What I can't understand is why the order varies between different runs of the script, as all the directories contain the same set of filenames. I've pored over the man page and can't find a sort option for find.
I could run find . -iname '*.mp3' | sort -n >> temp.txt to put the files in a temporary file and then try and loop through that, but I can't get that to work with lame.
Is there any way I can put a sort in before find runs the exec? I can find plenty of examples here and elsewhere of doing this with -exec ls but not where one needs to execute something more complicated with exec.
find . -iname '*.mp3' -print0 | sort -zn | xargs -0 -I '{}' lame --decode '{}' - | lame --tt "$title_prefix$name" --ty "${name:5}" --ta "$artist" --tl "$album" -b 64 - $final_path"${d%/}".mp3
Untested but might be worth a try.
Normally xargs appends the arguments to the end of the command you give it. The -I option tells it to replace the given string instead ({} in this case).
Edit: I've added -print0, -z, -0 to make sure the pipeline still works even if your filenames contain newlines.
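find lists entries in whatever order the directory happens to return them, which is why the result can differ from one directory to the next. The ordering part of the pipeline can be tried without lame; in this sketch cat stands in for lame --decode and the file names and contents are made up. It assumes a find, sort and xargs that support -print0, -z and -0 (GNU and recent BSD versions do).

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Create the parts out of order on purpose (made-up names and contents).
printf 'three\n' > "$tmp/003.txt"
printf 'one\n'   > "$tmp/001.txt"
printf 'two\n'   > "$tmp/002.txt"

# NUL-delimited names are sorted, then cat runs once per file in that order.
joined=$(find "$tmp" -name '*.txt' -print0 | LC_ALL=C sort -z | xargs -0 -I '{}' cat '{}')
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Because the names are zero-padded, a plain byte-wise sort already gives the numeric order.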

Copying list of files to a directory

I want to make a search for all .fits files that contain a certain text in their name and then copy them to a directory.
I can use a command called fetchKeys to list the files that contain say 'foo'
The command looks like this : fetchKeys -t 'foo' -F | grep .fits
This returns a list of .fits files that contain 'foo'. Great! Now I want to copy all of these to a directory /path/to/dir. There are too many files to copy individually, so I need to copy them all using one command.
I'm thinking something like:
fetchKeys -t 'foo' -F | grep .fits > /path/to/dir
or
cp fetchKeys -t 'foo' -F | grep .fits /path/to/dir
but of course neither of these works. Any other ideas?
If this is on Linux/Unix, can you use the find command? That seems very much like fetchKeys.
find . -name "*foo*.fits" -type f -print0 | while IFS= read -r -d $'\0' file
do
    basename=$(basename "$file")
    cp "$file" "$fits_dir/$basename"
done
The find command will find all files that match *foo*.fits in their name. The -type f says they have to be files and not directories. The -print0 means print out the files found, but separate them with the NUL character. Normally, the find command will simply return a file on each line, but what if the file name contains spaces, tabs, new lines, or even other strange characters?
The -print0 will separate out files with nulls (\0), and the read -d $'\0' file means to read in each file separating by these null characters. If your files don't contain whitespace or strange characters, you could do this:
find . -name "*foo*.fits" -type f | while IFS= read -r file
do
    basename=$(basename "$file")
    cp "$file" "$fits_dir/$basename"
done
Basically, you read each file found with your find command into the shell variable file. Then, you can use that to copy that file into your $fits_dir or wherever you want.
Again, maybe there's a reason to use fetchKeys, and it is possible to replace that find with fetchKeys, but I don't know that fetchKeys command.
Copy all files with the name containing foo to a certain directory:
find . -name "*foo*.fits" -type f -exec cp {} "/path/to/dir/" \;
Copy all files themselves containing foo to a certain directory (solution without xargs):
for f in `find . -type f -exec grep -l foo {} \;`; do cp "$f" /path/to/dir/; done
The find command has very useful arguments -exec, -print, -delete. They are robust and eliminate the need to process the file names manually. The syntax for -exec is: -exec (what to do) \;. The name of the file currently being processed is substituted for the placeholder {}.
Other commands that are very useful for such tasks are sed and awk.
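-exec can also be terminated with + instead of \;, in which case find passes many file names to one invocation. A small sketch of that form, using a made-up function name copy_matching and a made-up demo tree; the inline sh script is there only so the destination can come before the file list.

```shell
#!/bin/sh
# copy_matching SRC PATTERN DEST: copy every regular file under SRC whose
# name matches PATTERN into DEST. Names with spaces or newlines are safe
# because find hands them over as separate arguments.
copy_matching() {
    find "$1" -type f -name "$2" -exec sh -c '
        dest=$1
        shift
        for f in "$@"; do
            cp -- "$f" "$dest/"
        done
    ' sh "$3" {} +
}

# Demo on made-up files, two of which contain "foo" in their name.
src=$(mktemp -d)
dest=$(mktemp -d)
: > "$src/foo one.fits"
: > "$src/my foo.fits"
: > "$src/bar.fits"
copy_matching "$src" '*foo*.fits' "$dest"
copied=$(ls "$dest" | LC_ALL=C sort)
printf '%s\n' "$copied"
rm -rf "$src" "$dest"
```

Keeping the destination outside the source tree avoids find picking up the copies it has just made.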
The xargs tool builds and runs commands from the items it reads from stdin. This time, we execute a cp command:
fetchKeys -t 'foo' -F | grep .fits | xargs -n 500 cp -vfa -t /path/to/dir
xargs is a very useful tool, although its options are not really trivial. This command reads up to 500 .fits file names at a time and calls a single cp command for each group; the -t option (GNU cp) names the destination directory up front so that the file names can be appended at the end. I haven't tested it thoroughly; if it doesn't work, please leave a comment.
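How xargs groups its input depends on the option used, and a quick way to see the grouping is to put echo in front of the command. The file names below are made up, and a batch size of 2 is used only to keep the output short.

```shell
#!/bin/sh
# -n 2: up to two arguments are appended to each command line.
batched=$(printf '%s\n' a.fits b.fits c.fits | xargs -n 2 echo cp)

# -I: the command runs once per input line, with {} replaced by that line.
per_line=$(printf '%s\n' a.fits b.fits c.fits | xargs -I '{}' echo cp '{}' /path/to/dir)

printf '%s\n---\n%s\n' "$batched" "$per_line"
```

With -n the arguments always land at the end of the command line, so the destination has to be given before them; with -I the placeholder can sit anywhere, at the cost of one cp per file.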

More elegant use of find for passing files grouped by directory?

This script has taken me too long (!!) to compile, but I finally have a reasonably nice script which does what I want:
find "$@" -type d -print0 | while IFS= read -r -d $'\0' dir; do
    find "$dir" -iname '*.flac' -maxdepth 1 ! -exec bash -c '
        metaflac --list --block-type=VORBIS_COMMENT "$0" 2>/dev/null | grep -i "REPLAYGAIN_ALBUM_PEAK" &>/dev/null
        exit $?
    ' {} ';' -exec bash -c '
        echo Adding ReplayGain tags to "$0"/\*.flac...
        metaflac --add-replay-gain "${@:1}"
    ' "$dir" {} '+'
done
The purpose is to search the file tree for directories containing FLAC files, test whether any are missing the REPLAYGAIN_ALBUM_PEAK tag, and scan all the files in that directory for ReplayGain if they are missing.
The big stumbling block is that all the FLAC files for a given album must be passed to metaflac as one command, otherwise metaflac doesn't know they're all one album. As you can see, I've achieved this using find ... -exec ... +.
What I'm wondering is if there's a more elegant way to do this. In particular, how can I skip the while loop? Surely this should be unnecessary, because find is already iterating over the directories?
You can probably use xargs to achieve it.
For example, if you are looking for text foo in all your files you'll have something like
find . -type f | xargs grep foo
xargs passes each result from the left-hand command (find) to the command invoked on the right.
Then, if no existing command does what you want, you can always write a function or script and pass it to xargs (a bash function has to be exported with export -f and run through bash -c).
I can't comment on the flac commands themselves, but as for the rest:
find . -name '*.flac' \
! -exec bash -c 'metaflac --list --block-type=VORBIS_COMMENT "$1" | grep -qi "REPLAYGAIN_ALBUM_PEAK"' -- {} \; \
-execdir bash -c 'metaflac --add-replay-gain *.flac' \;
You just find the relevant files, and then treat the directory it's in.
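If each directory should be handled exactly once, with all of its FLAC files passed to a single command, one option is to reduce the find output to unique directory names first. This is a sketch only: per_album is a made-up name, echo stands in for metaflac in the demo, and the check for a missing REPLAYGAIN_ALBUM_PEAK tag is left out.

```shell
#!/bin/sh
# per_album ROOT CMD [ARGS...]: run CMD [ARGS...] once per directory under
# ROOT that holds .flac files, passing all of that directory's .flac files.
per_album() {
    root=$1
    shift
    find "$root" -type f -name '*.flac' -exec dirname {} \; | LC_ALL=C sort -u |
    while IFS= read -r dir; do
        "$@" "$dir"/*.flac </dev/null
    done
}

# Demo on a made-up tree with two albums.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"
: > "$tmp/a/1.flac"
: > "$tmp/a/2.flac"
: > "$tmp/b/1.flac"
runs=$(per_album "$tmp" echo tagging)
printf '%s\n' "$runs"
rm -rf "$tmp"
```

The real call would then be something like per_album . metaflac --add-replay-gain.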
