find, xargs: execute chain of commands for each file - bash

I am sorry if the question title is not informative enough. Please feel free to suggest a better variant.
I want to perform the following task:
In a directory I have a number of files that are photos in JPEG format. I want to extract from EXIF the dates when those photos were taken, create a new directory for each date, and move a file to the relevant directory.
(EXIF date and time have the format YYYY:MM:DD hh:mm:ss, and I want the directory names to be formatted as YYYY-MM-DD, that's why I use sed)
I kind of know how to perform each of those tasks separately, but failed to put them together. I spent some time investigating how to execute commands using find with -exec or xargs but still failed to understand how to properly chain everything.
Finally I was able to fulfil my task using two commands:
find . -name '*.jpg' -exec sh -c "identify -format %[exif:DateTimeOriginal] {} | sed 's/ [0-9:]*//; s/:/-/g' | xargs mkdir -p" \;
find . -name '*.jpg' -exec sh -c "identify -format %[exif:DateTimeOriginal] {} | sed 's/ [0-9:]*//; s/:/-/g; s/$/\//' | xargs mv {}" \;
But I do not like the duplication, and I do not like -exec sh -c. Is there a proper way to do this in one line, without using -exec sh -c?

Rather than focusing on one-liners, a better solution would be to put the logic into a script which makes it easy to execute and test. Put this in a file called movetodate.sh:
#!/usr/bin/env bash
# This script takes one or more image file paths
set -e
set -o pipefail
for path in "$@"; do
  date=$(identify -format '%[exif:DateTimeOriginal]' "$path" | sed 's/ [0-9:]*//; s/:/-/g')
  dest=$(dirname "$path")/$date
  mkdir -p "$dest"
  mv "$path" "$dest"
done
Then, to invoke it:
find . -name '*.jpg' -exec ./movetodate.sh {} +
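Note that find can only run the script if it has the execute bit set:
chmod +x movetodate.sh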

It is easily done with exiftool:
exiftool "-Directory<DateTimeOriginal" -d %Y-%m-%d *.jpg
For example, the command transforms a layout like this:
.
├── a.jpg (2013:10:17 10:01:00)
└── b.jpg (2012:08:07 16:11:15)
to this:
.
├── 2012-08-07
│   └── b.jpg
└── 2013-10-17
    └── a.jpg
If you still want to use identify, the commands can be rewritten as follows:
script=$(cat <<'SCRIPT'
d=$(
  d=$(identify -format "%[exif:DateTimeOriginal]" "$0" 2>/dev/null) || exit $?
  d=${d:0:10}
  printf '%s/%s' "$(dirname "$0")" "${d//:/-}"
) || exit $?
mkdir -p "$d" && mv -v "$0" "$d"
SCRIPT
)
find "$dir" -name '*.jpg' -exec bash -c "$script" {} \;
Note the use of the $0 variable within the script: we pass the {} placeholder to the script as its first argument.
The script can easily be adapted to accept multiple arguments (paths) with the help of a for file in "$@" loop, as sketched below; in that case the \; terminator should be replaced with +. However, if you have a number of files exceeding the $(getconf ARG_MAX) limit, you will need either xargs or one-by-one processing as shown in the script above. The same considerations apply to the exiftool command.
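For reference, a minimal sketch of that multi-argument variant (same identify-based logic as above; the extra bash word fills $0 so the file paths land in "$@"):
script=$(cat <<'SCRIPT'
for f in "$@"; do
  d=$(identify -format "%[exif:DateTimeOriginal]" "$f" 2>/dev/null) || continue
  d=${d:0:10}                      # keep the YYYY:MM:DD part
  dest=$(dirname "$f")/${d//:/-}   # turn it into YYYY-MM-DD
  mkdir -p "$dest" && mv -v "$f" "$dest"
done
SCRIPT
)
find "$dir" -name '*.jpg' -exec bash -c "$script" bash {} +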

With parallel you do not need the script but would instead do:
doit() {
  path="$1"
  date=$(identify -format '%[exif:DateTimeOriginal]' "$path" | sed 's/ [0-9:]*//; s/:/-/g')
  dest=$(dirname "$path")/$date
  mkdir -p "$dest"
  mv "$path" "$dest"
}
export -f doit
find . -name '*.jpg' | parallel doit
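If the file names may contain spaces or newlines, a null-delimited variant should be safer (assuming GNU parallel, whose -0/--null option pairs with find's -print0):
find . -name '*.jpg' -print0 | parallel -0 doit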

Related

How can I use sed to change my target dir in this shell command line?

I use this command line to find all the SVGs (thousands) in a directory and convert them to PNGs using Inkscape. Works great. Here is my issue. It outputs the PNGs in the same directory. I would like to change the target directory.
for i in `find /home/wyatt/test/svgsDIR -name "*.svg"`; do inkscape $i --export-background-opacity=0 --export-png=`echo $i | sed -e 's/svg$/png/'` -w 700 ; done
It appears $i is the file_path + file_name, and sed does a search/replace on the file extension. How do I search/replace my file_path? Or is there a better way to define a different target path within this command line?
Any help is much appreciated.
Would you please try:
destdir="DIR" # replace with your desired directory name
mkdir -p "$destdir"
find /home/wyatt/test/svgsDIR -name "*.svg" -print0 | while IFS= read -r -d "" i; do
  destfile="$destdir/$(basename -s .svg "$i").png"
  inkscape "$i" --export-background-opacity=0 --export-png="$destfile" -w 700
done
or
destdir="DIR"
mkdir -p "$destdir"
for i in /home/wyatt/test/svgsDIR/*.svg; do
  destfile="$destdir/$(basename -s .svg "$i").png"
  inkscape "$i" --export-background-opacity=0 --export-png="$destfile" -w 700
done
This may be off-topic, but it is not recommended to use a for loop that relies on word-splitting, especially when dealing with filenames. Please consider that filenames and pathnames may contain whitespace, newlines, tabs or other special characters.
Or with a one-liner (split for readability):
find /home/wyatt/test/svgsDIR -name "*.svg" |
xargs -I{} sh -c 'inkscape "{}" --export-background-opacity=0 --export-png='$destdir'/$(basename {} .svg).png -w 700'
Might work with find built-in exec:
find /home/wyatt/test/svgsDIR -name "*.svg" -exec sh -c 'inkscape "{}" --export-background-opacity=0 --export-png='$destdir'/$(basename {} .svg).png -w 700' \;
Or by passing target-dir as arguments, to simplify quoting.
find /home/wyatt/test/svgsDIR -name "*.svg" -exec sh -c 'inkscape "$1" --export-background-opacity=0 --export-png="$2/$(basename $1 .svg).png" -w 700' '{}' "$targetdir" \;
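For what it's worth, the basename call can also be replaced with plain parameter expansion; a sketch under the same assumptions as the command above (the --export-png option of older Inkscape releases, and a pre-set $targetdir):
find /home/wyatt/test/svgsDIR -name "*.svg" -exec sh -c '
  f=${1##*/}   # strip the directory part
  inkscape "$1" --export-background-opacity=0 --export-png="$2/${f%.svg}.png" -w 700
' sh {} "$targetdir" \;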

need help utilizing find command and xargs command

I'm trying to write a simple script that can move every file within a folder to a folder generated from the current date.
Here is my attempt:
#!/bin/bash
storage_folder=`date +%F` # date is generated to name the folder
mkdir "$storage_folder" # creating a folder to store data
find "$PWD" | xargs -E mv "$storage_folder" # move every file to the folder
xargs is not needed. Try:
find . -exec mv -t "$storage_folder" {} +
Notes:
Find's -exec feature eliminates most needs for xargs.
Because . refers to the current working directory, find "$PWD" is the same as the simpler find ..
The -t target option to mv tells mv to move all files to the target directory. This is handy here because it allows us to fit the mv command into the natural format for a find -exec command.
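Note that find . also visits the newly created target directory itself; a hedged refinement, assuming GNU find's depth options, keeps it out of the batch:
find . -mindepth 1 -maxdepth 1 ! -name "$storage_folder" -exec mv -t "$storage_folder" {} +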
POSIX
If you do not have GNU tools, then your mv may not have the -t option. In that case:
export storage_folder
find . -exec sh -c 'mv -- "$1" "$storage_folder"' Move {} \;
(The export is required because sh -c starts a new shell, which only sees exported variables.)
The above creates one shell process for each move. A more efficient approach, as suggested by Charles Duffy in the comments, passes in the target directory using $0:
find . -exec sh -c 'mv -- "$@" "$0"' "$storage_folder" {} +
Safety
As Gordon Davisson points out in the comments, for safety, you may want to use the -i or -n options to mv so that files at the destination are not overwritten without your explicit approval.
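For example, combining the earlier refinement with a no-clobber move (assuming GNU mv, where -n never overwrites an existing file):
find . -mindepth 1 -maxdepth 1 ! -name "$storage_folder" -exec mv -n -t "$storage_folder" {} +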

More elegant use of find for passing files grouped by directory?

This script has taken me too long (!!) to compile, but I finally have a reasonably nice script which does what I want:
find "$#" -type d -print0 | while IFS= read -r -d $'\0' dir; do
find "$dir" -iname '*.flac' -maxdepth 1 ! -exec bash -c '
metaflac --list --block-type=VORBIS_COMMENT "$0" 2>/dev/null | grep -i "REPLAYGAIN_ALBUM_PEAK" &>/dev/null
exit $?
' {} ';' -exec bash -c '
echo Adding ReplayGain tags to "$0"/\*.flac...
metaflac --add-replay-gain "${#:1}"
' "$dir" {} '+'
done
The purpose is to search the file tree for directories containing FLAC files, test whether any are missing the REPLAYGAIN_ALBUM_PEAK tag, and scan all the files in that directory for ReplayGain if they are missing.
The big stumbling block is that all the FLAC files for a given album must be passed to metaflac as one command, otherwise metaflac doesn't know they're all one album. As you can see, I've achieved this using find ... -exec ... +.
What I'm wondering is if there's a more elegant way to do this. In particular, how can I skip the while loop? Surely this should be unnecessary, because find is already iterating over the directories?
You can probably use xargs to achieve it.
For example, if you are looking for the text foo in all your files, you'll have something like:
find . -type f | xargs grep foo
xargs passes each result from the left-hand expression (find) to the right-hand invoked command.
Then, if no existing command does what you want, you can always create a function and pass it to xargs.
I can't comment on the flac commands themselves, but as for the rest:
find . -name '*.flac' \
! -exec bash -c 'metaflac --list --block-type=VORBIS_COMMENT "$1" | grep -qi "REPLAYGAIN_ALBUM_PEAK"' -- {} \; \
-execdir bash -c 'metaflac --add-replay-gain *.flac' \;
You just find the relevant files, and then treat the directory each one is in.
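This works because -execdir (a GNU/BSD extension to find, not POSIX) runs its command from the directory containing the matched file, so the *.flac glob expands to that album's tracks; that is what makes the explicit while loop over directories unnecessary.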

How to go to each directory and execute a command?

How do I write a bash script that goes through each directory inside a parent_directory and executes a command in each directory?
The directory structure is as follows:
parent_directory (name could be anything - doesn't follow a pattern)
  001 (directory names follow this pattern)
    0001.txt (filenames follow this pattern)
    0002.txt
    0003.txt
  002
    0001.txt
    0002.txt
    0003.txt
    0004.txt
  003
    0001.txt
the number of directories is unknown.
This answer posted by Todd helped me.
find . -maxdepth 1 -type d \( ! -name . \) -exec bash -c "cd '{}' && pwd" \;
The \( ! -name . \) avoids executing the command in the current directory.
You can do the following, when your current directory is parent_directory:
for d in [0-9][0-9][0-9]
do
( cd "$d" && your-command-here )
done
The ( and ) create a subshell, so the current directory isn't changed in the main script.
You can achieve this by piping to xargs. The catch is that you need the -I flag, which replaces the given placeholder in the command with each argument xargs reads.
ls -d */ | xargs -I {} bash -c "cd '{}' && pwd"
You may want to replace pwd with whatever command you want to execute in each directory.
If you're using GNU find, you can try -execdir parameter, e.g.:
find . -type d -execdir realpath "{}" ';'
or (as per @gniourf_gniourf's comment):
find . -type d -execdir sh -c 'printf "%s/%s\n" "$PWD" "$0"' {} \;
Note: You can use ${0#./} instead of $0 to strip the leading ./.
or more practical example:
find . -name .git -type d -execdir git pull -v ';'
If you want to include the current directory, it's even simpler by using -exec:
find . -type d -exec sh -c 'cd -P -- "{}" && pwd -P' \;
or using xargs:
find . -type d -print0 | xargs -0 -L1 sh -c 'cd "$0" && pwd && echo Do stuff'
Or a similar example suggested by @gniourf_gniourf:
find . -type d -print0 | while IFS= read -r -d '' file; do
  # ...
done
The above examples support directories with spaces in their name.
Or by assigning into a bash array:
dirs=($(find . -type d))
for dir in "${dirs[@]}"; do
  ( cd "$dir" && echo "$PWD" )
done
Change . to your specific folder name. If you don't need to run recursively, you can use: dirs=(*) instead. The above example doesn't support directories with spaces in the name.
So as @gniourf_gniourf suggested, the only proper way to put the output of find in an array without using an explicit loop is available as of Bash 4.4 with:
mapfile -t -d '' dirs < <(find . -type d -print0)
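Once filled, the array can be iterated safely even when names contain spaces:
for dir in "${dirs[@]}"; do
  ( cd "$dir" && pwd )   # the subshell keeps the caller's working directory unchanged
done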
Or a not-recommended way (which involves parsing ls):
ls -d */ | awk '{print $NF}' | xargs -n1 sh -c 'cd $0 && pwd && echo Do stuff'
The above example would ignore the current dir (as requested by the OP), but it'll break on names with spaces.
See also:
Bash: for each directory at SO
How to enter every directory in current path and execute script? at SE Ubuntu
If the top-level folder is known, you can just write something like this:
for dir in `ls $YOUR_TOP_LEVEL_FOLDER`;
do
  for subdir in `ls $YOUR_TOP_LEVEL_FOLDER/$dir`;
  do
    $(PLAY AS MUCH AS YOU WANT);
  done
done
In place of $(PLAY AS MUCH AS YOU WANT); you can put as much code as you want.
Note that I didn't "cd" on any directory.
for dir in PARENT/*
do
  test -d "$dir" || continue
  # Do something with $dir...
done
While one-liners are good for quick and dirty usage, I prefer the more verbose version below for writing scripts. This is the template I use; it takes care of many edge cases and allows you to write more complex code to execute on a folder. You can write your bash code in the function dir_command. As an example, dir_command below implements tagging each git repository. The rest of the script calls dir_command for each folder in the directory. An example of iterating through only a given set of folders is also included.
#!/bin/bash
# Use set -x if you want to echo each command while it gets executed
#set -x
# Save the current directory so we can restore it later
cur=$PWD
# Save command line arguments so functions can access them
args=("$@")
# Put your code in this function
# To access command line arguments use syntax ${args[0]} etc
function dir_command {
  # This example tags each git repository and pushes the tags
  cd "$1"
  echo "$(tput setaf 2)$1$(tput sgr 0)"
  git tag -a "${args[0]}" -m "${args[1]}"
  git push --tags
  cd ..
}
# This loop will go to each immediate child and execute dir_command
find . -maxdepth 1 -type d \( ! -name . \) | while read dir; do
  dir_command "$dir/"
done
# This example loop only loops through a given set of folders
declare -a dirs=("dir1" "dir2" "dir3")
for dir in "${dirs[@]}"; do
  dir_command "$dir/"
done
# Restore the folder
cd "$cur"
I don't get the point of the file formatting, since you only want to iterate through folders... Are you looking for something like this?
cd parent
find . -type d | while read d; do
  ls "$d"/
done
you can use
find .
to search all files/dirs in the current directory recursively.
Then you can pipe the output to the xargs command like so:
find . | xargs 'command here'
#!/bin/bash
for folder_to_go in $(find . -mindepth 1 -maxdepth 1 -type d \( -name "*" \)); do
  # you can add a pattern instead of *; here it goes into every folder
  # -mindepth/-maxdepth 1 means one folder deep
  cd "$folder_to_go"
  echo "$folder_to_go" "##########################################"
  # whatever you want to do goes here
  cd ../ # if maxdepth/mindepth = 2, cd ../../
done
# you can try adding many internal for loops with many patterns; this will sneak anywhere you want
You could run a sequence of commands in each folder in one line like:
for d in PARENT_FOLDER/*; do (cd "$d" && tar -cvzf "../${d##*/}.tar.gz" *.*); done
(The archive is written one level up, next to the folder being archived.)
for p in [0-9][0-9][0-9]; do
  (
    cd "$p"
    for f in [0-9][0-9][0-9][0-9]*.txt; do
      ls "$f" # Your operands
    done
  )
done

How do I check if all files inside directories are valid jpegs (Linux, sh script needed)?

Ok, I got a directory (for instance, named '/photos') in which there are different directories
(like '/photos/wedding', '/photos/birthday', '/photos/graduation', etc...) which have .jpg files in them. Unfortunately, some of jpeg files are broken. I need to find a way how to determine, which files are broken.
I found out that there is a tool named ImageMagick which can help a lot. If you use it like this:
identify -format '%f' whatever.jpg
it prints the name of the file only if the file is valid; if it is not, it prints something like "identify: Not a JPEG file: starts with 0x69 0x75 `whatever.jpg' # jpeg.c/EmitMessage/232.".
So the correct solution should be: find all files ending with ".jpg", apply "identify" to them, and if the result is just the name of the file, do nothing; if the result differs from the name of the file, save the file name somewhere (like in a file "errors.txt").
Any ideas how I can probably do that?
The short-short version:
find . -iname "*.jpg" -exec jpeginfo -c {} \; | grep -E "WARNING|ERROR"
You might not need the same find options, but jpeginfo was the solution that worked for me:
find . -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | xargs jpeginfo -c | grep -E "WARNING|ERROR" | cut -d " " -f 1
as a script (as requested in this question)
#!/bin/sh
find . -type f \
  \( -iname "*.jpg" \
  -o -iname "*.jpeg" \) \
  -exec jpeginfo -c {} \; | \
  grep -E "WARNING|ERROR" | \
  cut -d " " -f 1
I was clued into jpeginfo for this by http://www.commandlinefu.com/commands/view/2352/find-corrupted-jpeg-image-files, which also explains mixing find's -o (OR) with -exec.
One problem with identify -format is that it doesn't actually verify that the file is not corrupt, it just makes sure that it's really a jpeg.
To actually test it you need something to convert it. But the convert that comes with ImageMagick seems to silently ignore non-fatal errors in the jpeg (such as being truncated.)
One thing that works is this:
djpeg -fast -grayscale -onepass file.jpg > /dev/null
If it returns an error code, the file has a problem. If not, it's good.
There are other programs that could be used as well.
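To apply that check across a whole tree, one possible wrapper (a sketch; it assumes djpeg from the libjpeg tools is installed):
find /photos -name '*.jpg' -exec sh -c '
  djpeg -fast -grayscale -onepass "$1" > /dev/null 2>&1 || printf "%s\n" "$1"
' sh {} \; > errors.txt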
You can put this into a bash script file or run it directly:
find -name "*.jpg" -type f | xargs --no-run-if-empty identify -format '%f\n' 1>ok.txt 2>errors.txt
(The \n in the format string keeps one name per line when identify is handed several files at once.)
In case identify is missing, here is how to install it in Ubuntu:
sudo apt install imagemagick --no-install-recommends
This script will print out the names of the bad files:
#!/bin/bash
find /photos -name '*.jpg' | while read FILE; do
  if [[ $(identify -format '%f' "$FILE" 2>/dev/null) != "${FILE##*/}" ]]; then
    echo "$FILE"
  fi
done
You could run it as is or as ./badjpegs > errors.txt to save the output to a file.
To break it down: the find command finds *.jpg files in /photos or any of its subdirectories. The file names are piped to a while loop, which reads them in one at a time into the variable $FILE. Inside the loop, we grab the output of identify using the $(...) operator and check that it matches the file's base name (identify's %f prints the name without the directory part, hence the ${FILE##*/} on the right-hand side). If not, the file is bad and we print the file name.
It may be possible to simplify this. Most UNIX commands indicate success or failure in their exit code. If the identify command does this, then you could simplify the script to:
#!/bin/bash
find /photos -name '*.jpg' | while read FILE; do
  if ! identify "$FILE" &> /dev/null; then
    echo "$FILE"
  fi
done
Here the condition is simplified to if ! identify; then which means, "did identify fail?"
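Note that, as the earlier answer points out, identify may still accept files that are merely truncated, so the djpeg-based check above remains the more thorough test.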
