Parallelize for loop in bash - bash

I have the following snippet in my bash script
#!/bin/bash
for ((i=100; i>=70; i--))
do
convert test.png -quality "$i" -sampling-factor 1x1 test_libjpeg_q"$i".jpg
done
How can I execute the for loop in parallel, using all CPU cores? I have seen GNU parallel being used, but here I need the output filenames to follow the specific naming scheme shown above.

You can use parallel like this:
parallel \
'convert test.png -quality {} -sampling-factor 1x1 test_libjpeg_q{}.jpg' ::: {100..70}
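If GNU parallel is not installed, something similar can be sketched with `xargs -P`. This is only a minimal sketch, assuming an `xargs` that supports `-P` (GNU and BSD do) and `nproc` from coreutils. The names `convert_one`, `convert_all` and the `CONVERT` variable are hypothetical, introduced here so the real command can be swapped for `echo` in a dry run.

```bash
#!/bin/bash
# Hypothetical helper: run one conversion at quality $1.
# CONVERT defaults to ImageMagick's convert; set CONVERT=echo for a dry run.
convert_one() {
  "${CONVERT:-convert}" test.png -quality "$1" -sampling-factor 1x1 "test_libjpeg_q$1.jpg"
}
export -f convert_one   # make the function visible to the bash started by xargs

# Feed qualities 100..70 to xargs, running up to one job per CPU core.
convert_all() {
  printf '%s\n' {100..70} |
    xargs -P "$(nproc)" -I{} bash -c 'convert_one "$1"' _ {}
}
```

Calling `convert_all` starts the conversions. With `export CONVERT=echo` beforehand it only prints the arguments each job would receive, which is a cheap way to check the output naming scheme first.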

Related

convert and mogrify: The correct way to use them in modern versions of ImageMagick

To create an image thumbnail using an older version of ImageMagick, it was possible in the following ways:
(To aid in further referencing, examples are numbered.)
1. convert.exe image.jpg -thumbnail 100x100 ./converted/converted_image.jpg
2. mogrify.exe -thumbnail 100x100 -path ./converted image.png
Now I have ImageMagick 7 (downloaded just yesterday), and during installation I intentionally turned "Install legacy utilities (e.g. convert.exe)" checkbox off. That is, I have only one utility in my ImageMagick directory: magick.exe.
I'm trying to understand what is the correct and future-proof way to perform above-mentioned operations according to modern ImageMagick versions.
A quote from https://imagemagick.org/script/porting.php#cli:
animate, compare, composite, conjure, convert, display, identify, import, mogrify, montage, stream
To reduce the footprint of the command-line utilities, these utilities are symbolic links to the magick utility. You can also invoke them from the magick utility, for example, use magick convert logo: logo.png to invoke the magick utility.
In the same source:
With the IMv7 parser, activated by the magick utility, settings are applied to each image in memory in turn (if any). While an option: only need to be applied once globally. Using the other utilities directly, or as an argument to the magick CLI (e.g. magick convert) utilizes the legacy parser.
Hmm...
Works:
3. magick.exe convert image.jpg -thumbnail 100x100 ./converted/converted_image.jpg
4. magick.exe mogrify -thumbnail 100x100 -path ./converted image.png
Still works (the same way as magick.exe convert):
5. magick.exe image.jpg -thumbnail 100x100 ./converted/converted_image.jpg
However, the following one doesn't work (expected: should work the same way as magick.exe mogrify):
6. magick.exe -thumbnail 100x100 -path ./converted image.png
My question is: Which syntax should I use for convert and for mogrify? 3 and 4, or 4 and 5, or something different?
AFAIK, and I am happy to add any corrections suggested, it works like this.
The first idea is that you should use version 7 if possible, and all the old v6 commands, WITH THE EXCEPTION OF convert, should be prefixed with magick. That means you should use these:
magick ... # in place of `convert`
magick identify ... # in place of `identify`
magick mogrify ... # in place of `mogrify`
magick compare ... # in place of `compare`
magick composite ... # in place of `composite`
If you use magick convert you will get old v6 behaviour, so you want to avoid that!
Furthermore, v7 is more picky about the ordering. You must specify the image you want something done to before doing it. That means old v6 commands like:
convert -trim -resize 80% input.jpg output.jpg
must now become:
magick input.jpg -trim -resize 80% output.jpg # magick INPUT operations OUTPUT
So, looking specifically at your numbered examples:
1. Should become:
magick image.jpg -thumbnail 100x100 ./converted/converted_image.jpg
2. Should become:
magick mogrify -thumbnail 100x100 -path ./converted image.png
3. Invokes old v6 behaviour because you use magick convert instead of plain magick, and should be avoided.
4. Is correct, modern syntax.
5. Is correct, modern syntax.
6. Looks like you meant magick mogrify because you didn't give input and output filenames and because you use -path, but it looks like you accidentally omitted mogrify. If you didn't accidentally omit mogrify, then you probably meant to use the old convert-style command, and need an input and an output file and you need to specify the input file before the -thumbnail.
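For scripts that have to run on machines with either ImageMagick version, the advice above can be sketched as a small wrapper. This is a hedged illustration, not part of the original answer: `im_cmd` and `thumbnail_cmd` are hypothetical names, and the sketch assumes that "input first, then operations, then output" ordering is accepted by v6 `convert` as well. It prints the command rather than running it, so the ordering can be inspected.

```bash
#!/bin/bash
# Hypothetical helper: choose the ImageMagick entry point.
# Prefers `magick` (v7) when it is on PATH, otherwise falls back to v6 `convert`.
im_cmd() {
  if command -v magick >/dev/null 2>&1; then
    printf 'magick\n'
  else
    printf 'convert\n'
  fi
}

# Hypothetical helper: build a thumbnail command in v7 order:
# INPUT first, then the operations, then OUTPUT.
thumbnail_cmd() {
  local in=$1 out=$2 size=${3:-100x100}
  printf '%s %s -thumbnail %s %s\n' "$(im_cmd)" "$in" "$size" "$out"
}

thumbnail_cmd image.jpg ./converted/converted_image.jpg
```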
Keywords: Usage, wrong, modern, v7 syntax, prime.

sed to manipulate output of imagemagick in bash

I'm trying to write a bash script to trim the scanner white space around some old photos that were scanned in ages ago. I've got hundreds of photos so I'm not doing it manually.
Fred's imagemagick scripts don't manage to select the appropriate area.
I am no programmer, so please don't be too offended by my terrible attempts at scripting!
I've found a combination of commands using imagemagick that does it.
First I use a blurring filter to confuse imagemagick into correctly selecting the photo size:
convert input -virtual-pixel edge -blur 0x15 -fuzz 15% -trim info:
This spits out data as follows:
0001.jpeg JPEG 3439x2437 4960x6874+1521+115 8-bit DirectClass 0.070u 0:00.009
I then use the numbers to do a crop which has been very accurate on my scans. The following is an example using the numbers from above.
convert inputfile -crop 3439x2437+1521+115 +repage outputfile
My problem is in writing the bash file to go through a directory of pictures and automate the process.
Here's what I have so far:
#!/bin/bash
ls *.jpeg > list
cat list | while read line; do
convert $line -virtual-pixel edge -blur 0x15 -fuzz 15% -trim info: > blurtrim.txt
#need a line to manipulate the output of the above to spit out the crop coordinates for the next command
crop=$(<crop.txt)
convert $line -crop $crop +repage trim$line.jpeg
rm blurtext.txt
rm crop.txt
done
rm list
The key bit I can't do is changing the string output of the first imagemagick command.
The file goes along the lines of:
input fileformat 1111x2222 3333x4444+5555+666 and then a load of crap I don't care about
The numbers I need in my script are:
1111x2222+5555+666
The cherry on top is that while most of the numbers are four digits long, not all of them are, so I can't rely on that.
Any ideas on how to use sed, or preferably something less demonic, to get the above numbers into my script?
An explanation of the syntax would be nice (but I understand if the explanation is the size of a book, then it's best left out).
Thanks in advance!
You don't need to parse anything! ImageMagick can tell you the trim box directly itself, using the %@ format:
convert image.jpg -virtual-pixel edge -blur 0x15 -fuzz 15% -format "%@" info:
1111x2222+5555+666
So, you can say:
trimbox=$(convert image.jpg -virtual-pixel edge -blur 0x15 -fuzz 15% -format "%@" info:)
convert image.jpg -crop $trimbox ...
Benefits include the fact that this approach works on Windows too, where there is no sed.
So, the full solution would be something like:
#!/bin/bash
shopt -s nullglob
for f in *.jpeg; do
trimbox=$(convert "$f" -virtual-pixel edge -blur 0x15 -fuzz 15% -format "%@" info:)
convert "$f" -crop "$trimbox" +repage "trimmed-$f"
done
Solution
This will parse your file line by line, extract the desired parameters, concatenate them together, and use it as the argument value to 'crop' for the convert program:
regex='([0-9]+x[0-9]+) [0-9]+x[0-9]+\+([0-9]+\+[0-9]+)'
while read line
do
if [[ $line =~ $regex ]]
then
cropParam="${BASH_REMATCH[1]}+${BASH_REMATCH[2]}"
convert inputfile -crop $cropParam +repage outputfile
else
echo "ERROR: Line was not in the expected format ($line)"
exit 1;
fi
done < blurtrim.txt
Explanation
The regex variable holds a regular expression (brief introduction to regular expressions in bash here: http://www.tldp.org/LDP/abs/html/x17129.html) which describes the format of the numbers you describe in your question. The () around parts of the pattern denotes something called a capture group. If the pattern matches, the part that is in the first () is captured in a bash variable BASH_REMATCH[1], and the second () is captured in BASH_REMATCH[2]. BASH_REMATCH[0] contains the whole match, in case you're wondering why we start at index 1.
The line [[ $line =~ $regex ]] is what actually executes the pattern matching algorithm for us. In Bash [[ is called the extended test command, and the operator =~ is called the regular expression matching operator. This article explains the operator in more detail: http://www.linuxjournal.com/content/bash-regular-expressions.
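To see the capture groups in action without running ImageMagick, here is a minimal sketch that applies the same regex to the sample `info:` line from the question. The `line` and `crop` variable names are just for illustration.

```bash
#!/bin/bash
# Sample output line copied from the question.
line='0001.jpeg JPEG 3439x2437 4960x6874+1521+115 8-bit DirectClass 0.070u 0:00.009'

# Group 1: trimmed size (WxH). Group 2: the X+Y offset after the page size.
regex='([0-9]+x[0-9]+) [0-9]+x[0-9]+\+([0-9]+\+[0-9]+)'

if [[ $line =~ $regex ]]; then
  # BASH_REMATCH[0] is the whole match; [1] and [2] are the two groups.
  crop="${BASH_REMATCH[1]}+${BASH_REMATCH[2]}"
  echo "$crop"
else
  echo "no match" >&2
fi
```

For this sample line the result is `3439x2437+1521+115`, the same geometry used in the question's manual `-crop` example.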
I would propose a similar solution to Jonathan:
re='([0-9x]+) [0-9x]+(\+[0-9+]+)'
for file in *.jpeg; do
output=$(convert "$file" -virtual-pixel edge -blur 0x15 -fuzz 15% -trim info:)
if [[ $output =~ $re ]]; then
crop="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
convert "$file" -crop "$crop" +repage "trim$file.jpeg"
fi
done
The regular expression captures any group containing characters within the range 0-9 or x and then a + followed by numbers and + characters. It is a less strict pattern as it includes the x and + inside the bracket expressions, so technically would allow things like 0x9x9x0 but I can't imagine that this would present a problem based on the output you've shown us.
The other differences between this and your original attempt are that no temporary files are created and the loop is run over the list of files, rather than using ls, the parsing of which should generally be avoided in scripts.
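For readers who would rather avoid regular expressions altogether, the same value can be assembled with `read` and a parameter expansion. This is only a sketch under two assumptions that are not guaranteed by the question: the filename contains no whitespace, and the offsets are non-negative (so the geometry uses `+` separators).

```bash
#!/bin/bash
# Sample output line copied from the question.
line='0001.jpeg JPEG 3439x2437 4960x6874+1521+115 8-bit DirectClass 0.070u 0:00.009'

# Split on whitespace: field 3 is the trimmed size, field 4 the page geometry.
# The fields we do not need are read into the throwaway variable "_".
read -r _ _ size page _ <<< "$line"

# Drop everything up to and including the first "+", leaving "X+Y".
offset=${page#*+}

crop="${size}+${offset}"
echo "$crop"
```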

How to automate file transformation with simultaneous execution?

I am working on transforming a lot of image files (png) into text files. I have the basic code to do this one by one, which is really time consuming. My process involves converting the image files into a black and white format and then using tesseract to transform those into a text file. This process works great, but it would take days for me to accomplish my task if done file by file.
Here is my code:
for f in $1
do
echo "Processing $f file..."
convert $f -resample 200 -colorspace Gray ${f%.*}BW.png
echo "OCR'ing $f"
tesseract ${f%.*}BW.png ${f%.*} -l tla -psm 6
echo "Removing black and white for $f"
rm ${f%.*}BW.png
done
echo "Done!"
Is there a way to perform this process to each file at the same time, that is, how would I be able to run this process simultaneously instead of one by one? My goal is to significantly reduce the amount of time it would take for me to transform these images into text files.
Thanks in advance.
You could make the body of your for loop a function, then call the function multiple times, sending each call to the background so that the next one can start straight away.
my_process() {
echo "Processing $1 file..."
convert "$1" -resample 200 -colorspace Gray "${1%.*}BW.png"
echo "OCR'ing $1"
tesseract "${1%.*}BW.png" "${1%.*}" -l tla -psm 6
echo "Removing black and white for $1"
rm "${1%.*}BW.png"
}
# files is assumed to be an array of image paths, e.g. files=(*.png)
for file in "${files[@]}"
do
# & at the end sends it to the background.
my_process "$file" &
done
# Wait for all background jobs to finish.
wait
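Starting every file in the background at once can overload the machine when there are thousands of images. A minimal sketch of capping the number of concurrent jobs is shown below. It assumes bash 4.3 or newer (for `wait -n`); `max_jobs`, `outdir` and the stand-in body of `my_process` are made up for the demo, and the real function would call `convert` and `tesseract` instead.

```bash
#!/bin/bash
max_jobs=4                      # assumed limit; could be set to $(nproc)
outdir=$(mktemp -d)             # scratch directory for this demo only

# Stand-in for the real per-file work (convert + tesseract in the question).
my_process() {
  sleep 0.1
  touch "$outdir/$1.done"
}

running=0
for file in a b c d e f g h; do
  my_process "$file" &
  running=$((running + 1))
  if [ "$running" -ge "$max_jobs" ]; then
    wait -n                     # block until any one background job exits
    running=$((running - 1))
  fi
done
wait                            # wait for the remaining jobs

finished=$(find "$outdir" -name '*.done' | wc -l)
echo "finished: $finished"
```

The counter is a rough bound, not exact bookkeeping, but it keeps the loop from launching far more jobs than `max_jobs` at once.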
I want to thank contributors @Songy and @shellter.
To answer my question... I ended up using GNU Parallel to run these processes five at a time. Here is the code that I used:
parallel -j 5 convert {} "-resample 200 -colorspace Gray" {.}BW.png ::: *.png ; parallel -j 5 tesseract {} {} -l tla -psm 6 ::: *BW.png ; rm *BW.png
I am now in the process of splitting my dataset in order to run this command simultaneously with different subgroups of my (very large) pool of images.
Cheers

command does not execute as variable?

Whenever I execute the following shell command, it works properly.
convert maanavulu_GIST-TLOTKrishna.tif -alpha set -matte -virtual-pixel transparent -set option:distort:viewport 1000x1000 -distort perspective-projection '1.06,0.5,0,0,1.2,0,0,0' -trim 1.jpg
But whenever I assign the command to a variable and then execute it, it reports the following error.
convert.im6: invalid argument for option PerspectiveProjection : 'Needs 8 coefficient values' @ error/distort.c/GenerateCoefficients/873.
The short of it: it's best to store your arguments in an array (not including the command itself, for safety; this is preferable to an eval solution), then invoke the command with the array.
# Store options in array - note that the filenames are excluded here, too,
# for modularity
opts=(-alpha set -matte -virtual-pixel transparent -set option:distort:viewport \
1000x1000 -distort perspective-projection '1.06,0.5,0,0,1.2,0,0,0' -trim)
# Invoke command with filenames and saved options
convert maanavulu_GIST-TLOTKrishna.tif "${opts[@]}" 1.jpg
Afterthought: As @konsolebox and @chepner point out: using a function is probably the best choice (clear separation between fixed and variable parts, encapsulation, full range of shell commands available).
The proper way to assign-and-execute a command is to use an array:
COMMAND=(convert maanavulu_GIST-TLOTKrishna.tif -alpha set -matte -virtual-pixel transparent -set option:distort:viewport 1000x1000 -distort perspective-projection '1.06,0.5,0,0,1.2,0,0,0' -trim 1.jpg)
Then execute it:
"${COMMAND[@]}"
I like eval but it's definitely not the solution this time.
And just a tip: If you can use a function, use a function. And quote your arguments properly.
Quotes are not processed after expanding a variable. The only processing that occurs is word splitting and wildcard expansion. If you need to perform all the normal steps of command execution, you have to use eval:
eval "$variable"
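The difference between a command stored in a string and one stored in an array can be made visible without ImageMagick. In this sketch, `show_args` is a hypothetical stand-in for `convert` that prints each argument it receives inside angle brackets.

```bash
#!/bin/bash
# Print each argument on its own line, wrapped in <> so stray quotes are visible.
show_args() { printf '<%s>\n' "$@"; }

# Command stored as a string: after expansion the single quotes are NOT
# processed as quoting, so they reach the command as literal characters.
cmd_string="show_args -distort perspective-projection '1.06,0.5,0,0,1.2,0,0,0' -trim"
from_string=$($cmd_string)

# Command stored as an array: quotes are processed when the array is defined,
# and each element is passed through as one clean argument.
cmd_array=(show_args -distort perspective-projection '1.06,0.5,0,0,1.2,0,0,0' -trim)
from_array=$("${cmd_array[@]}")

printf 'string:\n%s\narray:\n%s\n' "$from_string" "$from_array"
```

In the string case the coefficient list arrives with its surrounding quote characters still attached, which is consistent with the "Needs 8 coefficient values" error in the question. In the array case it arrives as the bare comma-separated list.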

GNU parallel with nested for loops and multiple commands

I am trying to run 10 instances of a BASH function simultaneously with GNU Parallel
The BASH function downloads tiles from an image and stitches them together - first single rows, then each column - to a single image file.
function DOWNLOAD_PAGE {
for PAGE in {0041..0100}
do
for COLUMN in {0..1}
do
for ROW in {0..2}
do wget -O "$PAGE"_"$COLUMN"_"$ROW".jpg "http://www.webb$PAGE$COLUMN$ROW"
done
convert "$PAGE"_"$COLUMN"_*.jpg -append "$PAGE"__"$COLUMN".jpg
done
convert "$PAGE"__*.jpg +append "$PAGE"_done.jpg
done
}
Unfortunately, the apparently obvious solutions - the first one being
export -f DOWNLOAD_PAGE
parallel -j10 DOWNLOAD_PAGE
do not work.
Is there a way to do this using GNU Parallel?
Parts of your function can be parallelized and others cannot: e.g. you cannot append the images before you have downloaded them.
function DOWNLOAD_PAGE {
export PAGE=$1
for COLUMN in {0..1}
do
parallel wget -O "$PAGE"_"$COLUMN"_{}.jpg "http://www.webb$PAGE$COLUMN{}" ::: {0..2}
convert "$PAGE"_"$COLUMN"_*.jpg -append "$PAGE"__"$COLUMN".jpg
done
convert "$PAGE"__*.jpg +append "$PAGE"_done.jpg
}
export -f DOWNLOAD_PAGE
parallel -j10 DOWNLOAD_PAGE ::: {0041..0100}
A more parallelized version (but harder to read):
function DOWNLOAD_PAGE {
export PAGE=$1
parallel -I // --arg-sep /// parallel wget -O "$PAGE"_//_{}.jpg "http://www.webb$PAGE//{}"\; convert "$PAGE"_"//"_\*.jpg -append "$PAGE"__"//".jpg ::: {0..2} /// {0..1}
convert "$PAGE"__*.jpg +append "$PAGE"_done.jpg
}
export -f DOWNLOAD_PAGE
parallel -j10 DOWNLOAD_PAGE ::: {0041..0100}
Your understanding of what GNU Parallel does is somewhat misguided. Consider walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html and then try to understand how the examples work: http://www.gnu.org/software/parallel/man.html#example__working_as_xargs_n1_argument_appending
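The ordering constraint (tiles must be downloaded before they are stitched) can also be sketched in plain bash, using background jobs and `wait` as a barrier instead of GNU Parallel. This is only an illustration of the dependency structure: `fetch_tile` and `stitch_column` are hypothetical stand-ins for the `wget` and `convert -append` calls, and they merely write to a log file.

```bash
#!/bin/bash
log=$(mktemp)   # records the order in which the steps complete

# Stand-in for wget: pretend to download one tile.
fetch_tile() { sleep 0.1; echo "fetch $1" >> "$log"; }

# Stand-in for `convert ... -append`: stitch one column of tiles.
stitch_column() { echo "stitch $1" >> "$log"; }

page=0041
for column in 0 1; do
  for row in 0 1 2; do
    fetch_tile "${page}_${column}_${row}" &   # the three rows download concurrently
  done
  wait                                        # barrier: all rows of this column are done
  stitch_column "${page}__${column}"
done

cat "$log"
```

The three `fetch` lines for a column may appear in any order, but each `stitch` line always comes after the fetches for its column.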
