resizing images with imagemagick via shell script - bash

I don't really know that much about bash scripts OR imagemagick, but I am attempting to create a script in which you can give some sort of regexp matching pattern for a list of images and then process those into new files that have a given filename prefix.
For example, given the following directory listing:
allfiles01.jpg allfiles02.jpg
allfiles03.jpg
I would like to call the script like so:
./resizemany.sh allfiles*.jpg 30 newnames*.jpg
The end result would be a set of new files named newnames*.jpg whose numbers match those of the originals.
So far what I have is:
IMAGELIST=$1
RESIEZFACTOR=$2
NUMIMGS=length($IMAGELIST)
for(i=0; i<NUMIMGS; i++)
convert $IMAGELIST[i] -filter bessel -resize . RESIZEFACTOR . % myfile.JPG
Thanks for any help...
The parts that I obviously need help with are
1. how to give a bash script matching criteria that it understands
2. how to use the $2 without having it match the 2nd item in the image list
3. how to get the length of the image list
4. how to create a proper for loop in such a case
5. how to do proper text replacement for a shell command whereby you are appending items as i allude to.
jml

Probably the way a standard program would work would be to take an "in" filename pattern and an "out" filename pattern and perform the operation on each file in the current directory that matches the "in" pattern, substituting appropriate parts into the "out" pattern. This is pretty easy if you have a hard-coded pattern, when you can write one-off commands like
for infile in *.jpg; do convert "$infile" -filter bessel -resize 30% "${infile//allfiles/newnames}"; done
In order to make a script that will do this with any pattern, though, you need something more complicated because your filename transformation might be something more complicated than just replacing one part with another. Unfortunately Bash doesn't really give you a way to identify what part of the filename matched a specific part of the pattern, so you'd have to use a more capable regular expression engine, like sed for example:
#!/bin/bash
inpattern=$1
factor=$2
outpattern=$3
for infile in *; do
outfile=$(echo "$infile" | sed -n "s/$inpattern/$outpattern/p")
# skip files that do not match the input pattern
test -z "$outfile" && continue
convert "$infile" -filter bessel -resize "$factor%" "$outfile"
done
That could be invoked as
./resizemany.sh 'allfiles\(.*\)\.jpg' 30 'newnames\1.jpg'
(note the single quotes!) and it would resize allfiles1.jpg to newnames1.jpg, etc. But then you'd wind up basically having to learn sed's regular expression syntax to specify your in and out patterns. (It's not that bad, really)
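For the simple case where only a literal prefix changes, Bash's own parameter expansion may be enough and sed can be left out. The following is only a sketch under that assumption; map_name and preview_resize are hypothetical helper names, and the second one prints the convert commands rather than running them, so it can be tried without touching any images.

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap a literal filename prefix for another.
# Usage: map_name FILE OLDPREFIX NEWPREFIX
map_name() {
    local file=$1 old=$2 new=$3 out
    # "#" anchors the match at the start of the name; quoting "$old"
    # makes it a literal string rather than a glob pattern.
    out=${file/#"$old"/$new}
    printf '%s\n' "$out"
}

# Hypothetical dry run: print one convert command per input file.
# Usage: preview_resize FACTOR OLDPREFIX NEWPREFIX FILE...
preview_resize() {
    local factor=$1 old=$2 new=$3 f
    shift 3
    for f in "$@"; do
        printf 'convert %q -resize %s%% %q\n' \
            "$f" "$factor" "$(map_name "$f" "$old" "$new")"
    done
}
```

Called as preview_resize 30 allfiles newnames allfiles*.jpg, this would list the commands for every matching file. It does not cover patterns where the changing part sits in the middle of the name; that is where the sed approach is still needed.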

You could eliminate the regex problem if you make a folder of all the files to be processed, and then run something like:
for img in *.jpg
do
convert "$img" -filter bessel -resize 30% "processed-$img"
done
Then, if you need to rename them all later, you could do something like:
ls | nl -nrz -w2 | while read a b; do mv "$b" newfilename.$a.jpg; done;
Also, if you are doing a batch process of the same operation, you might see whether mogrify helps (it is ImageMagick's tool for converting multiple files). As with the example above, it's always good to make a copy of the folder before running any processing, so you don't destroy your original files.

Your script should be called using a syntax such as:
./resizemany.sh -r 30 -n newnames -o allfiles allfiles*.jpg
and use getopts to process the options. What you may not be aware of is that the shell expands the file glob before the script gets it so the way you had your arguments your script would never be able to distinguish the filenames from the other parameters.
Output files will be named using the rename script often found on systems with Perl installed. A file named "allfiles03.jpg" will be output as "newnames03.jpg".
#!/bin/bash
options=":r:n:o:"
while getopts $options option
do
case $option in
n)
newnamepattern=$OPTARG
;;
o)
oldnamepattern=$OPTARG
;;
r)
resizefactor=$OPTARG
;;
\?)
echo "Invalid option"
exit 1
esac
done
# a check to see if any options are missing should be performed (not implemented)
shift $((OPTIND - 1))
# now all that's left will be treated as filenames
for file
do
convert (input options) "$file" -resize $resizefactor (output options) "${file}.out"
rename "s/$oldnamepattern/$newnamepattern/;s/\.out$//" "${file}.out"
done
This is untested (obviously since most of the arguments to convert are missing).
Parameter validation such as range checks, missing required options and others are left as exercises for further development. Also absent are checks for successful completion of one step before continuing to the next one. Also issues such as locations of files and name collisions and others are not addressed.
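The point about the shell expanding the glob before the script sees it can be checked with a small experiment. This is only an illustrative sketch: count_args and demo_glob_expansion are hypothetical names, and the demo works in a throwaway directory created with mktemp so that no real files are involved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments were received.
count_args() {
    printf '%d\n' "$#"
}

# Create three sample files in a temporary directory and compare an
# unquoted glob with a quoted one.
demo_glob_expansion() {
    local dir
    dir=$(mktemp -d) || return 1
    touch "$dir/allfiles01.jpg" "$dir/allfiles02.jpg" "$dir/allfiles03.jpg"
    (
        cd "$dir" || exit 1
        count_args allfiles*.jpg 30     # unquoted: expanded by the shell first
        count_args 'allfiles*.jpg' 30   # quoted: the pattern stays one argument
    )
    rm -rf "$dir"
}
```

Running demo_glob_expansion prints 4 and then 2: in the unquoted call the three filenames and the 30 arrive as four separate arguments, with nothing to mark where the file list ends.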

Related

Wildcard on mv folder destination

I'm writing a small piece of code that checks for .mov files in a specific folder over 4gb and writes it to a log.txt file by name (without an extension). I'm then reading the names into a while loop line by line which signals some archiving and copying commands.
Consider a file named abcdefg.mov (new) and a corresponding folder somewhere else named abcdefg_20180525 (<-*underscore timestamp) that also contains a file named abcedfg.mov (old).
When reading in the filename from the log.txt, I strip the extension to store the variable "abcdefg" ($in1) and i'm using that variable to locate a folder elsewhere that contains that matching string at the beginning.
My problem is with how the mv command seems to support a wild card in the "source" string, but not in the "destination" string.
For example i can write;
mv -f /Volumes/Myshare/SourceVideo/$in1*/$in1.mov /Volumes/Myshare/Archive
However a wildcard on the destination doesn't work in the same way. For example;
mv -f /Volumes/Myshare/Processed/$in1.mov Volumes/Myshare/SourceVideo/$in1*/$in1.mov
Is there an easy fix here that doesn't involve using another method?
Cheers for any help.
mv accepts a single destination path. Suppose that $in1 is abcdefg, and that $in1* expands to abcdefg_20180525 and abcdefg_20180526. Then the command
mv -f /dir1/$in1.mov /dir2/$in1*/$in1.mov
will be equivalent to:
mv -f /dir1/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
mv -f /dir1/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
mv -f /dir2/abcdefg_20180525/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
Moreover, because the destination file is the same in all three cases, the first two files will be overwritten by the third.
You should create a precise list and do a precise copy instead of using wild cards.
This is what I would probably do: generate a list of results in a file with FULL path information, then read those results in another function. I could have used arrays, but I wanted to keep it simple. At the bottom of this script, the first function call scans for files with the extension in EXT (mp4, case insensitive) and writes the results to a file in /tmp. The second function then reads the results from that file and performs some operation (mv etc.). Note: if functions are confusing, you can just remove the function name { } wrappers and the calls, and it becomes a normal script again. Functions are really handy, learn to love them!
#!/usr/bin/env bash
readonly SIZE_CHECK_LIMIT_MB="10M"
readonly FOLDER="/tmp"
readonly DESTINATION_FOLDER="/tmp/archive"
readonly SAVE_LIST_FILE="/tmp/$(basename "$0")-save-list.txt"
readonly EXT="mp4"
readonly CASE="-iname" #change to -name for exact ext type upper/lower
function find_files_too_large() {
> ${SAVE_LIST_FILE}
find "${FOLDER}" -maxdepth 1 -type f "${CASE}" "*.${EXT}" -size +${SIZE_CHECK_LIMIT_MB} -print0 | while IFS= read -r -d $'\0' line ; do
echo "FOUND => $line"
echo "$line" >> ${SAVE_LIST_FILE}
done
}
function archive_large_files() {
local read_file="${SAVE_LIST_FILE}"
local write_folder="$DESTINATION_FOLDER"
if [ ! -s "${read_file}" ] || [ ! -f "${read_file}" ] ;then
echo "No work to be done ... "
return
fi
while IFS= read -r line ;do
echo "mv $line $write_folder" ;sleep 1
done < "${read_file}"
}
# MAIN (this is where the script starts) We just call two functions.
find_files_too_large
archive_large_files
it might be easier, i think, to change the filenames to the folder name initially. So abcdefg.mov would be abcdefg_timestamp.mov. I can always strip the timestamp from the filename easy enough after its copied to the right location. I was hoping i had a small syntax issue but i think there is no easy way of doing what i thought i could...
I think you have a basic misunderstanding of how wildcards work here. The mv command doesn't support wildcards at all; the shell expands all wildcards into lists of matching files before they get passed to the mv command as arguments. Furthermore, the mv command doesn't know if the list of arguments it got came from wildcards or not, and the shell doesn't know anything about what the command is going to do with them. For instance, if you run the command grep *, the grep command just gets a list of names of files in the current directory as arguments, and will treat the first of them as a regex pattern ('cause that's what the first argument to grep is) to search the rest of the files for. If you ran mv * (note: don't do this!), it will interpret all but the last filename as sources, and the last one as a destination.
I think there's another source of confusion as well: when the shell expands a string containing a wildcard, it tries to match the entire thing to existing files and/or directories. So when you use Volumes/Myshare/SourceVideo/$in1*/$in1.mov, it looks for an already-existing file in a matching directory; AIUI the file isn't there yet, there's no match. What it does in that case is pass the raw (unexpanded) wildcard-containing string to mv as an argument, which looks for that exact name, doesn't find it, and gives you an error.
(BTW, should there be a "/" at the front of that pattern? I assume so below.)
If I understand the situation correctly, you might be able to use this:
mv -f /Volumes/Myshare/Processed/$in1.mov /Volumes/Myshare/SourceVideo/$in1*/
Since the filename isn't supplied in the second string, it doesn't look for existing files by that name, just directories with the right prefix; mv will automatically retain the filename from the source.
However, I'll echo @Sergio's warning about chaos from multiple matches. In this case, it won't overwrite files (well, it might, but for other reasons), but if it gets multiple matching target directories it'll move all but the last one into the last one (along with the file you meant to move). You say you're 100% certain this won't be a problem, but in my experience that means that there's at least a 50% chance that something you'd never have thought of will go ahead and make it happen anyway. For instance, is it possible that $in1 could wind up empty, or contain a space, or...?
Speaking of spaces, I'd also recommend double-quoting all variable references. You want the variables inside double-quotes, but the wildcards outside them (or they won't be expanded), like this:
mv -f "/Volumes/Myshare/Processed/$in1.mov" "/Volumes/Myshare/SourceVideo/$in1"*/
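One way to guard against the multiple-match problem is to expand the wildcard into an array first and refuse to continue unless exactly one directory matches. This is a sketch only, not something from the question: find_single_dir is a hypothetical helper, and the commented usage at the end simply reuses the paths mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the one directory under PARENT whose name
# starts with PREFIX; fail if there are zero or several candidates.
# Usage: find_single_dir PARENT PREFIX
find_single_dir() {
    local parent=$1 prefix=$2 d
    local matches=()
    # Variables stay inside the quotes, the wildcard outside; the
    # trailing slash restricts the glob to directories.
    for d in "$parent/$prefix"*/; do
        if [ -d "$d" ]; then
            matches+=("${d%/}")
        fi
    done
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "expected exactly 1 match for '$prefix', found ${#matches[@]}" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}

# Possible use, with the paths from the question:
# dest=$(find_single_dir /Volumes/Myshare/SourceVideo "$in1") &&
#     mv -f "/Volumes/Myshare/Processed/$in1.mov" "$dest/"
```

Because an empty $in1 would match every directory under the parent, the count check also catches that case as long as more than one directory exists there.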

Bash for loop testing two boolean expressions

Below is a simple bash program. It takes file types as command line arguments and it queries the current directory and prints the files of the type specified.
I would like to be able to query two different file types and therefore need two boolean expressions to represent this.
Below is my code for querying just one file type
#!/bin/bash
for x in $(ls *$1); do
echo $x;
done
Now what I would like to be able to do is (in pseudocode)
command line args fileName .sh .c
for x in (current directory files of *.sh) OR (in current directory files of *.c) do
print .sh files
print.c files
done
I've tried using || and I get syntax errors I can not find any evidence of being able to use || for two expressions in for loop.
I've tried using two nested for loops but they do not work and yield errors.
Is there any way I can accomplish this using the same for loop system.
Thank you.
Sounds like you want something like:
for extension in "$@"; do
printf 'Files ending in %s:\n' "$extension"
printf '%s\n' *"$extension"
done
Loop through all arguments passed to the script and print all files ending in each extension + a newline character.
Note that printf is a much more useful tool than echo, as it allows you to control the format of each thing it prints.
ls doesn't do anything useful either here; it is the shell which expands the * to the list of files matching the pattern.
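One caveat with a bare glob: if no file matches, Bash by default leaves the pattern as a literal string, so the loop above would print something like *.py. A possible variation, assuming Bash and its nullglob option, collects the matches into an array first. list_by_extension is a hypothetical name, and the function body runs in a subshell so the option does not leak into the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each extension given, print a header with the
# number of matching files in the current directory, then the files.
# The ( ) body is a subshell, so nullglob stays local to this function.
list_by_extension() (
    shopt -s nullglob   # unmatched patterns expand to nothing
    for extension in "$@"; do
        files=( *"$extension" )
        printf 'Files ending in %s: %d\n' "$extension" "${#files[@]}"
        if [ "${#files[@]}" -gt 0 ]; then
            printf '%s\n' "${files[@]}"
        fi
    done
)
```

Invoked as list_by_extension .sh .c, it handles any number of extensions in one loop, which is what the question was after.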

Append to list of files in bash

so I'm trying to get a simple bash script to continuously read a directory and update a list of files to play through a command. However, I'm having some trouble thinking out the logic in it. What I need to do is put the current items in the directory into the list, have each item in the directory run through a program, and when a new item comes in, just append it to the list. I'm attempting to use inotifywait but can't seem to think of the proper logic. I may need it to run in the background, as the process that is running on these files will run before inotifywait is read again, at which point it will not pick up any new files that have been added as it only checks when it runs. Here's the code so hopefully it makes more sense.
#!/bin/bash
#Initial check to see if files are converted.
if [ ! -d "/home/pi/rpitx/converted" ]; then
echo "Converted directory does not exist, cannot play!"
exit 1
fi
CYAN='\e[36m'
NC='\e[39m'
LGREEN='\e[92m'
#iterate through directory first and act upon each item
for f in $FILES
do
echo -e "${CYAN}Now playing ${f##*/}...${NC}"
#Figure out a way to always watch directory even when it is playing
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to |
while read path action file; do
echo -e "${LGREEN}New file found: ${CYAN}${file}${NC}"
FILES+=($file)
done
# take action on each file. $f store current file name
sudo ./rpitx -m RF -i "${f}" -f 101100
done
exit 0
So for example. if rpitx is currently playing something, and a file is converted, it won't pick up the latest file and add it to the list, nor will it make it since it's always reading. Is there a way to get inotifywait to run in the background of this script somehow? Thanks.
This is actually quite a difficult problem to get 100% perfect, but it is possible to get pretty close.
It is easy to get all the files in a directory, and it is easy to use inotifywait to get iteratively informed of new files being placed into the directory. The issue is getting the two to be consistent. If inotifywait isn't started until all the files have been processed (or even just listed), then you might miss new files created between the listing and the invocation of inotifywait. If, on the other hand, you start inotifywait first, then a file created between the invocation of inotifywait and the extraction of the current file list will be listed twice.
Since it is easier to filter duplicates than notice orphans, the recommended approach is the second one.
As a first approximation, we could ignore the duplicate problem on the assumption that the window of vulnerability is pretty short and so it is probably unlikely to happen. This simplifies the code, but it's not that difficult to track and eliminate duplicates: we could, for example, store each filename as the key in an associative array, ignoring the file if the key already exists.
We need three processes: one to execute inotifywait; one to produce the list of initial files; and one to handle each file as it is identified. So the basic structure of the code will be:
list_new_files |
{ list_existing_files; pass_through; } |
while read -r action file; do
handle "$action" "$file"
done
Note that the second process first produces the existing files, and then calls pass_through, which reads from standard input and writes to standard output, thus passing through the files being discovered by list_new_files. Since pipes have a finite capacity, it is possible that the execution of list_existing_files will block a few times (if there are lots of existing files and handling them takes a long time), so when pass_through finally gets executed, it could have quite a bit of queued-up input to pass through. That doesn't matter, unless the first pipe also fills up, which will happen if a large number of new files are created. And that still won't matter as long as inotifywait doesn't lose notifications while it is blocked on a write. (This may actually be a problem, since the manpage for inotifywait on my system includes in the "BUGS" section the note, "It is assumed the inotify event queue will never overflow." We could fix the problem by inserting another process which carefully buffers inotifywait's output, but that shouldn't be necessary unless you intend to flood the directory with lots of files.)
Now, let's examine each of the functions in turn.
list_new_files could be just the call to inotifywait from your original script:
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to
Listing existing files is also easy. Here's one simple solution:
printf "%s\n" /home/pi/rpitx/converted/*
However, that will print out the full file path, which is different from the output from inotifywait. To make them the same, we cd into the directory in order to do the listing. Since we might not actually want to change the working directory, we use a subshell by surrounding the commands inside parentheses:
( cd /home/pi/rpitx/converted; printf "%s\n" *; )
The printf just prints its arguments each on a separate line. Since glob-expansions are not word-split or recursively glob-expanded, this is safe against whitespace or metacharacters in filenames, except newline characters. Filenames with newline characters are pretty rare; for now, I'll ignore the issue but I'll indicate how to handle it at the end.
Even with the change indicated above, the output from these two commands is not compatible: the first one outputs three things on each line (directory, action, filename), and the second one just one thing (the filename). In the listing below, you'll see how we modify the format to printf and introduce a format for inotifywait in order to make the outputs fully compatible, with the "action" for existing files set to EXISTING.
pass_through could, in theory, just be cat, and that's how I've coded it below. However, it is important that it operate in line-buffered mode; otherwise, nothing will happen until "enough" files have been written by list_existing_files. On my system, cat in this configuration works perfectly; if that doesn't work for you or you don't want to count on it, you could write it explicitly as a while read loop:
pass_through() {
while read -r line; do echo "$line"; done
}
Finally, handle is essentially the code from the original post, but modified a bit to take the new format into account, and to do the right thing with action EXISTING.
# Colours. Note the use of `$'...'` to actually store the code,
# thereby avoiding the need to later reinterpret backslash sequences
CYAN=$'\e[36m'
NC=$'\e[39m'
LGREEN=$'\e[92m'
converted=/home/pi/rpitx/converted
list_new_files() {
inotifywait -m "$converted" -e create -e moved_to --format "%e %f"
}
# Note the use of ( ) around the body instead of { }
# This is the same as `{( ... )}'; it makes the `cd` local to the function.
list_existing_files() (
cd "$converted"
printf "EXISTING %s\n" *
)
# Invoked as `handle action filename`
handle() {
case "$1" in
EXISTING)
echo "${CYAN}Now playing ${2}...${NC}"
;;
*)
echo "${LGREEN}New file found: ${CYAN}${2}${NC}"
;;
esac
# $2 is a bare filename, so prefix the watched directory
sudo ./rpitx -m RF -i "${converted}/${2}" -f 101100
}
# Put everything together
list_new_files |
{ list_existing_files; cat; } |
while read -r action file; do handle "$action" "$file"; done
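The duplicate filtering mentioned earlier (an associative array keyed by filename) is not part of the listing above. A minimal sketch of it, assuming Bash 4 or later and input lines of the form "ACTION filename" as produced by the two listing functions, could look like this. dedupe_files is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: pass "ACTION filename" lines from stdin to stdout,
# dropping any line whose filename has already been seen.
# Requires bash 4+ for associative arrays.
dedupe_files() {
    local -A seen=()
    local action file
    while read -r action file; do
        # The "x" prefix keeps the subscript non-empty even if the
        # filename field is empty.
        if [[ -z ${seen["x$file"]+set} ]]; then
            seen["x$file"]=1
            printf '%s %s\n' "$action" "$file"
        fi
    done
}
```

It would sit in the pipeline just before the final while loop, for example { list_existing_files; cat; } | dedupe_files | while read -r action file; ... Note that read trims leading and trailing whitespace from the filename field, a limitation this sketch shares with the loop above.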
What if we thought a filename might have a newline character in it? There are two "safe" characters which could be used to delimit the filenames, in the sense that they cannot appear inside a filename. One is /, which can obviously appear in a path, but cannot appear in a simple filename, which is what we're working with here. The other one is the NUL character, which cannot appear inside a filename at all, but can sometimes be a bit annoying to deal with.
Normally, faced with this problem, we would use a NUL, but that depends on the various utilities we're using allowing the separation of data with NUL instead of newline. That's not the case for inotifywait, which always outputs a newline after a notification line. So in this case it seems simpler to use a /. First we modify the formats:
inotifywait -m "$converted" -e create -e moved_to --format "%e %f/"
printf "EXISTING %s/\n" *
Now, when we're reading the lines, we need to read until we find a line ending with / (and remember to remove it). read doesn't allow two-character line terminators, so we need to accumulate the lines ourselves:
while read -r action file; do
# If file doesn't end with a slash, we need to read another line
while [[ $file != */ ]] && read -r line; do
file+=$'\n'"$line"
done
# Remember to remove the trailing slash
handle "$action" "${file%/}"
done

Properly handle lists of files with whitespace in filename

I want to iterate over a list of files in Bash and perform some action. The problem: the file names may contain whitespace, which creates an obvious problem with wildcards or ls:
touch a\ b
FILES=* # or $(ls)
for FILE in $FILES; do echo $FILE; done
yields
a
b
Now, the conventional way to handle this is to use find … -print0 instead. However, this only works (well) in conjunction with xargs -0, not with Bash variables / loops.
My idea was to set $IFS to the null character to make this work. However, the comp.unix.shell seems to think that this is impossible in bash.
Bummer. Well, it’s theoretically possible to use another character, such as : (after all, $PATH uses this format, too):
IFS=$':'
FILES=$(find . -print0 | xargs -0 printf "%s:")
for FILE in $FILES; do echo $FILE; done
(The output is slightly different but fair enough.)
However, I can’t help but feel that this is clumsy and that there should be a more direct way of doing this. I’m looking for a more direct way of accomplishing this, preferably using wildcards or ls.
The best way to handle this is to store the file list as an array, rather than a string (and be sure to double-quote all variable substitutions):
files=(*)
for file in "${files[@]}"; do
echo "$file"
done
If you want to generate an array from find's output (e.g. if you need to search recursively), see this previous answer.
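The linked answer is not reproduced here, but one common way to do it, shown only as a sketch, is to have find emit NUL-terminated names and read them back with read -d ''. collect_files and the global array found_files are hypothetical names chosen for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array found_files with every
# regular file below DIR, reading NUL-delimited names from find so that
# spaces (and even newlines) in names survive intact.
# Usage: collect_files DIR
collect_files() {
    local dir=$1 file
    found_files=()
    while IFS= read -r -d '' file; do
        found_files+=("$file")
    done < <(find "$dir" -type f -print0)
}

# Possible use:
# collect_files .
# for file in "${found_files[@]}"; do echo "$file"; done
```

The process substitution keeps the loop in the current shell, so the array is still populated after the loop ends; piping find into the loop would fill it in a subshell and lose it.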
Exactly what you have in the first example works fine for me in Msys Bash, Cygwin and on my Fedora box:
FILES=*
for FILE in $FILES
do
echo $FILE
done
It's very important to precede this with
IFS=""
otherwise files with two consecutive spaces in their names will not be found.

shell scripting: search/replace & check file exist

I have a perl script (or any executable) E which will take a file foo.xml and write a file foo.txt. I use a Beowulf cluster to run E for a large number of XML files, but I'd like to write a simple job server script in shell (bash) which doesn't overwrite existing txt files.
I'm currently doing something like
#!/bin/sh
PATTERN="[A-Z]*0[1-2][a-j]"; # this matches foo in all cases
todo=`ls *.xml | grep $PATTERN -o`;
isdone=`ls *.txt | grep $PATTERN -o`;
whatsleft=todo - isdone; # what's the unix magic?
#tack on the .xml prefix with sed or something
#and then call the job server;
jobserve E "$whatsleft";
and then I don't know how to get the difference between $todo and $isdone. I'd prefer using sort/uniq to something like a for loop with grep inside, but I'm not sure how to do it (pipes? temporary files?)
As a bonus question, is there a way to do lookahead search in bash grep?
To clarify/extend the problem:
I have a bunch of programs that take input from sources like (but not necessarily) data/{branch}/special/{pattern}.xml and write output to another directory results/special/{branch}-{pattern}.txt (or data/{branch}/intermediate/{pattern}.dat, e.g.). I want to check in my jobfarming shell script if that file already exists.
So E transforms data/{branch}/special/{pattern}.xml->results/special/{branch}-{pattern}.dat, for instance. I want to look at each instance of the input and check if the output exists. One (admittedly simpler) way to do this is just to touch *.done files next to each input file and check for those results, but I'd rather not manage those, and sometimes the jobs terminate improperly so I wouldn't want them marked done.
N.B. I don't need to check concurrency yet or lock any files.
So a simple, clear way to solve the above problem (in pseudocode) might be
for i in `/bin/ls *.xml`
do
replace xml suffix with txt
if [that file exists]
add to whatsleft list
end
done
but I'm looking for something more general.
#!/bin/bash
shopt -s extglob # allow extended glob syntax, for matching the filenames
LC_COLLATE=C # use a sort order comm is happy with
IFS=$'\n' # so filenames can have spaces but not newlines
# (newlines don't work so well with comm anyhow;
# shame it doesn't have an option for null-separated
# input lines).
files_todo=( **([A-Z])0[1-2][a-j]*.xml )
files_done=( **([A-Z])0[1-2][a-j]*.txt )
files_remaining=( \
$(comm -23 --nocheck-order \
<(printf "%s\n" "${files_todo[@]%.xml}") \
<(printf "%s\n" "${files_done[@]%.txt}") ))
echo jobserve E $(for f in "${files_remaining[@]%.xml}"; do printf "%s\n" "${f}.txt"; done)
This assumes that you want a single jobserve E call with all the remaining files as arguments; it's rather unclear from the specification if such is the case.
Note the use of extended globs rather than parsing ls, which is considered very poor practice.
To transform input to output names without using anything other than shell builtins, consider the following:
if [[ $in_name =~ data/([^/]+)/special/([^/]+).xml ]] ; then
out_name=results/special/${BASH_REMATCH[1]}-${BASH_REMATCH[2]}.dat
else
: # ...handle here the fact that you have a noncompliant name...
fi
The question title suggests that you might be looking for:
set -o noclobber
The question content indicates a wholly different problem!
It seems you want to run 'jobserve E' on each '.xml' file without a matching '.txt' file. You'll need to assess the TOCTOU (Time of Check, Time of Use) problems here because you're in a cluster environment. But the basic idea could be:
todo=""
for file in *.xml
do [ -f ${file%.xml}.txt ] || todo="$todo $file"
done
jobserve E $todo
This will work with Korn shell as well as Bash. In Bash you could explore making 'todo' into an array; that will deal with spaces in file names better than this will.
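The array variant suggested here might look like the following in Bash. It is a sketch under the same assumptions as the loop above (each foo.xml has its output written to foo.txt in the same directory); collect_todo and the global array todo are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array todo with each .xml file in
# DIR that has no matching .txt file next to it.
# Usage: collect_todo DIR
collect_todo() {
    local dir=$1 file
    todo=()
    for file in "$dir"/*.xml; do
        [ -e "$file" ] || continue      # the glob matched nothing
        if [ ! -f "${file%.xml}.txt" ]; then
            todo+=("$file")
        fi
    done
}

# Possible use, with jobserve and E as in the question:
# collect_todo .
# if [ "${#todo[@]}" -gt 0 ]; then jobserve E "${todo[@]}"; fi
```

Expanding "${todo[@]}" passes each filename as its own argument, so names containing spaces are no longer split.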
If you have processes still generating '.txt' files for '.xml' files while you run this check, you will get some duplicated effort (because this script cannot tell that the processing is happening). If the 'E' process creates the corresponding '.txt' file as it starts processing it, that minimizes the chance of duplicated effort. Or, maybe consider separating the processed files from the unprocessed files, so the 'E' process moves the '.xml' file from the 'to-be-done' directory to the 'done' directory (and writes the '.txt' file to the 'done' directory too). If done carefully, this can avoid most of the multi-processing problems. For example, you could link the '.xml' to the 'done' directory when processing starts, and ensure appropriate cleanup with an 'atexit()' handler (if you are moderately confident your processing program does not crash). Or other trickery of your own devising.
whatsleft=$( ls *.xml *.txt | grep $PATTERN -o | sort | uniq -u )
Note this actually gets a symmetric difference.
I am not exactly sure what you want, but you can check for the existence of the file first and, if it exists, create a new name. (Or do this check in E, your Perl script.)
if [ -f "$file" ];then
newname="...."
fi
...
jobserve E .... > $newname
If it's not what you want, describe more clearly in your question what you mean by "don't overwrite files".
For posterity's sake, this is what I found to work:
TMPA='neverwritethis.tmp'
TMPB='neverwritethat.tmp'
ls *.xml | grep $PATTERN -o > $TMPA;
ls *.txt | grep $PATTERN -o > $TMPB;
whatsleft=`sort $TMPA $TMPB | uniq -u | sed "s/$/.xml/"`;
rm $TMPA $TMPB;
