Convert image bank using parallel

I have a bank of images that I have been resizing with ImageMagick's convert, but I would like to take advantage of my multi-core CPU by using the parallel command.
The statement I use to perform the single core conversion is as follows:
find . -type f -iname "*.jpg" -exec convert {} -quality 90 -format jpg -resize 240x160 ../small/{} \;
It works perfectly, but the problem is that, by default, convert only uses a single core to perform the process. So how can I use parallel to spread the same job over any number of cores I want?
Thanks!

It should be something like this:
find . -type f -iname "*.jpg" -print0 |
parallel -0 convert {} -quality 90 -resize 240x160 ../small/{}
Test it with find ... | parallel --dry-run ...
Note that I am using find ... -print0 and parallel -0 to match it so that filenames are null-terminated and spaces in filenames don't cause issues.
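Concretely, the dry run for the command above would be:
find . -type f -iname "*.jpg" -print0 |
parallel -0 --dry-run convert {} -quality 90 -resize 240x160 ../small/{}
This prints each convert command instead of running it, so you can verify the output paths before committing.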
By default this will use all available cores. If you want to use just 2 cores, try:
parallel -j 2 ...
If you want to use all but one of your cores, try:
parallel -j -1 ...
If you want to use half your cores, try:
parallel -j 50% ...

Related

Is there a way to parallelize find without piping to xargs / parallel?

I am trying to parallelize the following command:
find . -type f -name "{file_regex}" -exec zgrep -cH "{match_regex}" {} \;
I'm hoping to run this across 100 cores. I currently run the find separately in Python and capture its stdout, then create a thread pool and run zgrep -cH ${match_regex} {file_path} to get the matches. Is there a simpler way of doing this, aside from just piping to xargs/parallel?
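In the spirit of the answer above, a minimal sketch using GNU Parallel alone (the {file_regex} and {match_regex} placeholders are the question's own) would be:
find . -type f -name "{file_regex}" -print0 |
parallel -0 -j 100 zgrep -cH "{match_regex}" {}
Here -j 100 caps the number of simultaneous zgrep processes at 100.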

Convert 5 million TIFF files to PNG

I am thinking about the best and fastest way to convert 5 million TIFF files (in folders, subfolders and sub-subfolders) into 5 million PNG files (in the same directories).
Is there any way to parallelise this job?
How could I then check whether all files have been converted?
ls *.tif | wc -l # compared to
ls *.png | wc -l
but for every folder.
Thanks.
Marco
Your question is very vague on details, but you can use GNU Parallel and ImageMagick like this:
find STARTDIRECTORY -iname "*.tif" -print0 | parallel -0 --dry-run magick {} {.}.png
If that looks correct, I would make a copy of a few files in a temporary location and try it for real by removing the --dry-run. If it works ok, you can add --bar for a progress bar too.
In general, GNU Parallel will keep N jobs running, where N is the number of CPU cores you have. You can change this with the -j parameter.
You can set up GNU Parallel to halt on failure, on success, after a given number of failures, after the currently running jobs complete, and so on. In general you will get an error message if any file fails to convert, but your jobs will continue to completion. Run man parallel and search for the --halt option.
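For example, to stop launching new jobs after three failed conversions while letting the running ones finish, something like this should work:
find STARTDIRECTORY -iname "*.tif" -print0 | parallel -0 --halt soon,fail=3 magick {} {.}.png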
Note that the above starts a new ImageMagick process for each image which is not the most efficient although it will be pretty fast on a decent machine with good CPU, disk subsystem and RAM. You could consider using different tools such as vips if you feel like experimenting - there are a few ideas and benchmarks here.
Depending on how your files are actually laid out, you might do better using ImageMagick's mogrify command, and getting GNU Parallel to pass as many files to each invocation as your maximum command line length permits. So, for example, if you had a whole directory of TIFFs that you wanted to make into PNGs, you can do that with a single mogrify like this:
magick mogrify -format PNG *.tif
You could pair that command with a find looking for directories something like this:
find STARTDIRECTORY -type d -print0 | parallel -0 'cd {} && magick mogrify -format PNG *.tif'
Or you could find TIFF files and pass as many as possible to each mogrify something like this:
find STARTDIRECTORY -iname "*.tif" -print0 | parallel -0 -X magick mogrify -format PNG {}
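As for checking whether everything was converted: rather than running ls in every folder, you could compare recursive counts from the top, something like:
find STARTDIRECTORY -iname "*.tif" | wc -l
find STARTDIRECTORY -iname "*.png" | wc -l
If the two counts match, every TIFF has a corresponding PNG (assuming no PNGs existed beforehand).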

Using the `find` command with -exec for a lot of files, how can I compare the output against a value?

I am trying to iterate over a large number of images and check their 'mean' value using ImageMagick.
The following command finds the images I want to check, and executes the correct command on them.
find `pwd` -type f -name "*.png" -exec /usr/bin/identify -ping -format "%[mean]" info: {} \;
Now I want to compare the output to see if it comes up with a certain value, 942.333
How can I check the value for each file that find returns, and print the filename of any image whose mean is 942.333?
Thanks!
Change your identify command so it outputs the filename and the mean, then use grep:
find `pwd` -type f -name "*.png" -exec identify -ping -format "%[mean] %f\n" {} \; | grep "942.333"
Or, if you really have lots of images, you could put all your lovely CPU cores to work and do them in parallel, using GNU Parallel:
find . -name \*.png -print0 | parallel -m -0 'identify -ping -format "%[mean] %f\n" {1}' | grep ...
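If you want only the filenames, a variation that compares the first field exactly with awk instead of grep might look like this (assuming the filenames contain no spaces, since awk splits on whitespace):
find . -name \*.png -print0 | parallel -0 'identify -ping -format "%[mean] %f\n" {}' | awk '$1 == "942.333" {print $2}'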

Simple way to resize large number of image files

I have a folder which contains about 45,000 JPEG images. Most of them are from 10 KB to 20 KB.
Now I want to write a script to resize all of them to a fixed size of 256x256. I wonder if there is any simple way to do that, like for a in *.jpg do .... I am using an 8-core CPU machine with 8 GB of RAM running Ubuntu 14.04, so it is fine if the process requires a lot of resources.
I would use GNU Parallel, like this to make the most of all those cores:
find . -name \*.jpg | parallel -j 16 convert {} -resize 256x256 {}
If you had fewer files, you could do it like this, but the commandline would be too long for 45,000 files:
parallel -j 16 convert {} -resize 256x256 {} ::: *.jpg
Also, note that if you want the files to become EXACTLY 256x256 regardless of input dimensions and aspect ratio, you must add ! after the geometry, like this: -resize 256x256!
As Tom says, make a backup first!
Here is a little benchmark...
# Create 1,000 files of noisy junk at 1024x1024 pixels
seq 1 1000 | parallel convert -size 1024x1024 xc:gray +noise random {}.jpg
# Resize all 1,000 files using mogrify
time mogrify -resize 256x256 *.jpg
real 1m23.324s
# Create all 1,000 input files afresh
seq 1 1000 | parallel convert -size 1024x1024 xc:gray +noise random {}.jpg
# Resize all 1,000 files using GNU Parallel
time parallel convert -resize 256x256 {} {} ::: *.jpg
real 0m22.541s
You can see that GNU Parallel is considerably faster for this example. To be fair, though, it is also wasteful of resources, because a new process has to be created for each input file, whereas mogrify uses a single process for all the files. If you knew that the files were named in a particular fashion, you might be able to optimise things further...
Finally, you may find xargs and mogrify in concert work well for you, like this:
time find . -name \*.jpg -print0 | xargs -0 -n 100 -P 8 mogrify -resize 256x256
real 0m20.840s
which allows up to 8 mogrify processes to run in parallel (-P 8), each processing up to 100 input images (-n 100), thereby amortizing the cost of starting a process over a larger number of files.
You could use the mogrify tool provided by ImageMagick:
mogrify -resize 256x256 *.jpg
This modifies all files in place, resizing them to 256x256px. Make sure to take a backup of your originals before using this command.
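For example, a simple way to take that backup might be:
mkdir ../originals && cp *.jpg ../originals/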

How can I convert JPG images recursively using find?

Essentially what I want to do is search the working directory recursively, then use the paths given to resize the images. For example, find all *.jpg files, resize them to 300x300 and rename to whatever.jpg.
Should I be doing something along the lines of $(find | grep *.jpg) to get the paths? When I do that, the output is a list of paths not enclosed in quotation marks, meaning that I would have to add the quotes myself before it would be useful, right?
I use mogrify with find.
Let's say I need every *.jpg inside my nested folder/another/folder to become a *.png.
find . -name "*.jpg" -print0 | xargs -0 -I{} mogrify -format png {}
And with a bit of explanation:
find . -name "*.jpg" -- finds all the JPEGs inside the nested folders.
-print0 -- prints each filename without any nasty surprises (e.g. filenames containing spaces), terminating it with a null byte instead.
xargs -0 -I{} -- reads the null-terminated names and processes the files one by one with mogrify.
And lastly, the {} is just a placeholder for each filename that find produces.
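Note that -I{} forces one mogrify run per file; if you drop it, xargs will append as many filenames as fit to each mogrify invocation, which should be faster:
find . -name "*.jpg" -print0 | xargs -0 mogrify -format png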
You can use something like this with GNU find:
find . -iname \*jpg -exec /your/image/conversion/script.sh {} +
This will be safer in terms of quoting, and spawn fewer processes. As long as your script can handle the length of the argument list, this solution should be the most efficient option.
If you need to handle really long file lists, you may have to pay the price and spawn more processes. You can modify find to handle each file separately. For example:
find . -iname \*jpg -exec /your/image/conversion/script.sh {} \;
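For completeness, a minimal sketch of what /your/image/conversion/script.sh might look like, assuming ImageMagick and the 300x300 resize from the question (the in-place overwrite is just an illustration):
#!/bin/sh
# Hypothetical conversion script: resize every JPG passed as an
# argument to fit within 300x300, overwriting the original in place.
for f in "$@"; do
    convert "$f" -resize 300x300 "$f"
done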
