Convert 5 Mio tiff Files in png - parallel-processing

Convert 5 Mio tiff Files in png - parallel-processing

i am thinking about the best and fastet way to convert 5 Mio tiff Files (in Folders Subfolders and SubSubfolders) in 5 Mio png Files (same directory).
Is there any way to parallelise this job?
How could i check then if all files are converted?
ls *.tif | wc -l # compared to
ls *.png | wc -l
but for every folder.
Thanks.
Marco

Your question is very vague on details, but you can use GNU Parallel and ImageMagick like this:
find STARTDIRECTORY -iname "*.tif" -print0 | parallel -0 --dry-run magick {} {.}.png
If that looks correct, I would make a copy of a few files in a temporary location and try it for real by removing the --dry-run. If it works ok, you can add --bar for a progress bar too.
In general, GNU Parallel will keep N jobs running, where N is the number of CPU cores you have. You can change this with -j parameter.
You can set up GNU Parallel to halt on fail, or on success, or a number of failures, or after currently running jobs complete and so on. In general you will get an error message if any file fails to convert but your jobs will continue till completeion. Run man parallel and search for --halt option.
Note that the above starts a new ImageMagick process for each image which is not the most efficient although it will be pretty fast on a decent machine with good CPU, disk subsystem and RAM. You could consider using different tools such as vips if you feel like experimenting - there are a few ideas and benchmarks here.
Depending on how your files are actually laid out, you might do better using ImageMagick's mogrify command, and getting GNU Parallel to pass as many files to each invocation as your maximum command line length permits. So, for example, if you had a whole directory of TIFFs that you wanted to make into PNGs, you can do that with a single mogrify like this:
magick mogrify -format PNG *.tif
You could pair that command with a find looking for directories something like this:
find STARTDIRECTORY -type d -print0 | parallel -0 'cd {} && magick mogrify -format PNG *.tif`
Or you could find TIFF files and pass as many as possible to each mogrify something like this:
find STARTDIRECTORY -iname "*.tif" -print0 | parallel -0 -X magick mogrify -format PNG {}

Related

Convert image bank using parallel

I've a bank of images and I used to use ImageMagick Convert to resize them, but I would like to take advantage of the power of multi-core by using the parallel command.
The statement I use to perform the single core conversion is as follows:
find . -type f -iname "*.jpg" -exec convert {} -quality 90 -format jpg -resize 240x160 ../small/{} \;
It works prefectly, but the problem is that, by default, convert only uses a single core to perform the process. So, how can I use parallel to use any number of cores I want to perform the same job?
Thanks!

It should be something like this:
find . -type f -iname "*.jpg" -print0 |
parallel -0 convert {} -quality 90 -resize 240x160 ../small/{}
Test it with find ... | parallel --dry-run ...
Note that I am using find ... -print0 and parallel -0 to match it so that filenames are null-terminated and spaces in filenames don't cause issues.
By default this will use all available cores. If you want to use just 2 cores, try:
parallel -j 2 ...
If you want to use all but one of your cores, try:
parallel -j -1 ...
If you want to use half your cores, try:
parallel -j 50% ...

Batch convert PNGs to individual PDFs while maintaining deep folder hierarchy in bash

I've found a solution that claims to do one folder, but I have a deep folder hierarchy of sheet music that I'd like to batch convert from png to pdf. What do my solutions look like?
I will run into a further problem down the line, which may complicate things. Maybe I should write a script? (I'm a total n00b fyi)
The "further problem" is that some of my sheet music spans more than one page, so if the script can parse filenames that include "1of2" and "2of2" to be turned into a single pdf, that'd be neat.
What are my options here?
Thank you so much.

Updated Answer
As an alternative, the following should be faster (as it does the conversions in parallel) and also able to handle larger numbers of files:
find . -name \*.png -print0 | parallel -0 convert {} {.}.pdf
It uses GNU Parallel which is readily available on Linux/Unix and which can be simply installed on OSX with homebrew using:
brew install parallel
Original Answer (as accepted)
If you have bash version 4 or better, you can use extended globbing to recurse directories and do your job very simply:
First enable extended globbing with:
shopt -s globstar
Then recursively convert PNGs to PDFs:
mogrify -format pdf **/*.png

You can loop over png files in a folder hierarchy, and process each one as follows:
find /path/to/your/files -name '*.png' |
while read -r f; do
g=$(basename "$f" .png).pdf
your_conversion_program <"$f" >"$g"
done
To merge pdf-s, you could use pdftk. You need to find all pdf files that have a 1of2 and 2of2 in their name, and run pdftk on those:
find /path/to/your/files -name '*1of2*.pdf' |
while read -r f1; do
f2=${f1/1of2/2of2} # name of second file
([ -f "$f1" ] && [ -f "$f2" ]) || continue # check both exist
g=${f1/1of2//} # name of output file
(! [ -f "$g" ]) || continue # if output exists, skip
pdftk "$f1" "$f2" output "$g"
done
See:
bash string substitution

Regarding a deep folder hierarchy you may use find with -exec option.
First you find all the PNGs in every subfolder and convert them to PDF:
find ./ -name \*\.png -exec convert {} {}.pdf \;
You'll get new PDF files with extension ".png.pdf" (image.png would be converted to image.png.pdf for example)
To correct extensions you may run find command again but this time with "rename" after -exec option.
find ./ -name \*\.png\.pdf -exec rename s/\.png\.pdf/\.pdf/ {} \;
If you want to delete source PNG files, you may use this command, which deletes all files with ".png" extension recursively in every subfolder:
find ./ -name \*\.png -exec rm {} \;

if i understand :
you want to concatenate all your png files from a deep folders structure into only one single pdf.
so...
insure you png are ordered as you want in your folders
be aware you can redirect output of a command (say a search one ;) ) to the input of convert, and tell convert to output in one pdf.
General syntax of convert :
convert 1.png 2.png ... global_png.pdf
The following command :
convert `find . -name '*'.png -print` global_png.pdf
searches for png files in folders from cur_dir
redirects the output of the command find to the input of convert, this is done by back quoting find command
converts works and output to pdf file
(this very simple command line works fine only with unspaced filenames, don't miss quoting the wild char, and back quoting the find command ;) )
[edit]Care....
be sure of what you are doing.
if you delete your png files, you will just loose your original sources...
it might be a very bad practice...
using convert without any tricky -quality output option could create an enormous pdf file... and you might have to re-convert with -quality "60" for instance...
so keep your original sources until you do not need them any more

Bash Use Origin File as Destination File

I'm working with the ImageMagick bash tool, which uses commands of the form:
<command> <input filename> <stuff> <output filename>
I'm trying to do the following command:
<command> x.png <stuff> x.png
but for every file in a directory. I tried:
<command> *.png <stuff> *.png
But that didn't work. What's the correct way to perform such a command on every file in a directory?

Many Unix command-line tools, such as awk, sed, grep are stream-based or line-oriented, which means they process files one line at a time. When using these, it is necessary to write output to an intermediate file and then rename that (with mv) back over the original input file. The reason for that is that you may be writing over the input file before you have read it all, so you will clobber your inputs. In that case, #sjsam's answer is absolutely correct - especially since he is careful to use the && in between so that the mv is not done if the command is not successful.
In the specific case of ImageMagick however, and its convert, compare, compose commands, this is NOT the case. These programs are file-oriented, not line-oriented, so they read the entire input file into memory before writing any outputs. The reason is that they often need to know if there are any transparent pixels in an image before they can start processing, or how many colours there are in an image and they cannot know this till they have read the entire file. As such, it is perfectly safe, and in fact idiomatic to use the same input filename as output filename, like this:
convert image.jpg ... -crop 100x100 -colors 16 ... image.jpg
In answer to your question, when you have multiple files to process, the preferred method is to use the mogrify tool (also part of the ImageMagick suite) which is specifically intended for processing multiple files. So, you would do this:
mogrify -crop 100x100 -colors 16 *.jpg
which would overwrite your input files with the results - which is what you asked for. If you did not want it to overwrite your input files, you would add a -path telling ImageMagick the path to an output directory like this:
mogrify -path /path/to/thumbnails -thumbnail -colors 16 *.jpg
If you had too many files for the Windows COMMAND.EXE/CMD.EXE to expand and pass to mogrify, you would let ImageMagick expand the glob (the asterisk) internally like this and be able to deal with unlimited numbers of files:
mogrify -crop 100x100 '*.jpg'
The point is that not only is the mogrify syntax concise, and portable across Windows/OSX/Linux, it also means you only need to create a single process to do all your images, rather than having to write a FOR loop and create and execute a convert process and a mv process for each and every one of potentially thousands of files - so it is much more efficient and less error-prone.

For a single file do like this :
<command> x.png <stuff> temp.png && mv temp.png x.png
For a set of files do like this :
#!/bin/bash
find `pwd` -name \*.png | while read line
do
<command> "$line" <stuff> temp.png && mv temp.png "$line"
done
Save the script in the folder containing the png files as processpng. Make it an executable and run it.
./processpng
A general version of the above script which takes the path to folder containing the above files as argument is below :
#!/bin/bash
find $1 -name \*.png | while read line
do
<command> "$line" <stuff> temp.png && mv temp.png "$line"
done
Save the script as processpng anywhere in the computer. Make it an executable and run it like :
./processpng /path/to/your/png/folder
Edit:
Incorporating #anishsane's solution, a compact way of achieving the same
results would be
#!/bin/bash
find $1 -name \*.png -exec bash -c "<command> {} <stuff> temp.png && mv temp.png {}"
In the context of find .... -exec:
{} indicates (contains) the result(s) from the find expression. Note
that empty curly braces {} have no special meaning to shell so we can
get away without escaping {}
Note: Emphasis mine.
Reference: This AskUbuntu Question.

Simple way to resize large number of image files

I have a folder which contains about 45000 jpeg images. Most of them are from 10KB - 20Kb.
Now I want to write a script to resize all of them to fixed size 256x256. I wonder if there is any simple way to do that like: for a in *.jpg do .... I am using 8-core CPU with 8GB of RAM machine running Ubuntu 14.04, so it is fine if the process requires many resources

I would use GNU Parallel, like this to make the most of all those cores:
find . -name \*.jpg | parallel -j 16 convert {} -resize 256x256 {}
If you had fewer files, you could do it like this, but the commandline would be too long for 45,000 files:
parallel -j 16 convert {} -resize 256x256 {} ::: *.jpg
Also, note that if you want the files to become EXACTLY 256x256 regardless of input dimensions and aspect ratio, you must add ! after the -resize like this -resize 256x256!
As Tom says, make a backup first!
Here is a little benchmark...
# Create 1,000 files of noisy junk #1024x1024 pixels
seq 1 1000 | parallel convert -size 1024x1024 xc:gray +noise random {}.jpg
# Resize all 1,000 files using mogrify
time mogrify -resize 256x256 *.jpg
real 1m23.324s
# Create all 1,000 input files afresh
seq 1 1000 | parallel convert -size 1024x1024 xc:gray +noise random {}.jpg
# Resize all 1,000 files using GNU Parallel
time parallel convert -resize 256x256 {} {} ::: *.jpg
real 0m22.541s
You can see that GNU Parallel is considerably faster for this example. To be fair though, it is also wasteful of resources though because a new process has to be created for each input file, whereas mogrify just uses one process that does all the files. If you knew that the files were named in a particular fashion, you may be able to optimise things better...
Finally, you may find xargs and mogrify in concert work well for you, like this:
time find . -name \*.jpg -print0 | xargs -0 -I {} -n 100 -P 8 mogrify -resize 256x256 {}
real 0m20.840s
which allows up to 8 mogrify processes to run in parallel (-P 8), and each one processes up to 100 input images (-n 100) thereby amortizing the cost of starting a process over a larger number of files.

You could use the mogrify tool provided by ImageMagick
mogrify -resize 256x256 *.jpg
This modifies all files in place, resizing them to 256x256px. Make sure to take a backup of your originals before using this command.

Can someone please explain clearly how to use "pngcrush" for multiply items

I have a stack of hundreds of pictures and i want to use pngcrush for reducing the file sizes.
I know how to crush one file with terminal, but all over the web i find parts of explanations that assume previous knowledge.
can someone please explain how to do it clearly.
Thanks
Shani

You can use following script:
#!/bin/bash
# uncomment following line for more aggressive but longer compression
# pngcrush_options=-reduce -brute -l9
find . -name '*.png' -print | while read f; do
pngcrush $pngcrush_options -e '.pngcrushed' "$f"
mv "$f" "${f/%.pngcrushed/}"
done

Current versions of pngcrush support this functionality out of the box.
( I am using pngcrush 1.7.81)
pngcrush -dir outputFolder inputFolder/*.png
will create "outputFolder" if it does not exist and process all the .png files in the "inputFolder" placing them in "outputFolder".
Obviously you can add other options e.g.
pngcrush -dir outputFolder -reduce -brute -l9 inputFolder/*.png

Being in 2023, there are better tools to optimize png images like OptiPNG
install
sudo apt-get install optipng
use for one picture
optipng imagen.png
use for all pictures in folder
find /path/to/files/ -name '*.png' -exec optipng -o7 {} \;
optionally the command -o defines the quality, being possible from 1 to 7, where 7 is the maximum compression level of the image.
-o7

The high rated fix appears dangerous to me; it started compressing all png files in my iMac; needed is a command restricted to a specified directory; I am no UNIX expert; I undid the new files by searching for all files ending in .pngcrushed and deleting them

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio