Imagemagick Batch operation conditional on Filesize - How? - bash

I'm running Imagemagick on a command line Ubuntu terminal in Windows 10 - using the built in facility in Windows 10 - the Ubuntu App.
I am a complete linux novice but have installed imagemagick in the above environment.
My task - Auto remove the black(ish) border and deskew the images of thousands of scanned 35mm slides.
I can successfully run commands such as
mogrify -fuzz 35% -deskew 80% -trim +repage *.tif
The problem is:-
The border is not crisply defined nor is completely black, hence the -fuzz. Some images are over-trimmed at a certain fuzz, while others are not trimmed enough.
So what I want to do is to have two passes at this, with different fuzz %, for these reasons:-
1st pass with a low Fuzz%. Many images will not be trimmed at all but I have found that the ones that are susceptible to over-trimming will trim Ok with low %
Since all the images start with an identical filesize, the ones that have trimmed Ok will have a lower filesize (note these are tifs not jpgs)
So what I need to do is set a file size condition for the second pass at higher fuzz% THAT IGNORES file sizes below a certain value and does not perform any operation.
In this way, with few errors, all the images will be trimmed correctly.
So the question
- How can I adjust the command line to have 2 passes and to ignore a lower file size on the second pass?
I have a horrible feeling the the answer will be a script. I have no idea how to construct or set up Ubuntu to run this so if so, please can you point me to help for that also!!

In ImageMagick, you could do something like the following:
Get the input filesize
Use convert to deskew and trim.
Then find the new file size
Then compare the new size to the old one to compute the percent difference, and test that against some percent threshold
If the percent difference is less than the threshold, then the processing did not trim enough
So reprocess with a higher fuzz value and write over the first result; otherwise keep the first result. The original input is never overwritten.
Unix syntax.
Choose two fuzz values
Choose a percent change threshold
Create a new empty directory to hold the output (results)
cd
cd desktop/Originals
fuzz1=20
fuzz2=40
threshpct=10
for img in *; do
filesize=`convert -ping "$img" -precision 16 -format "%b" info: | sed 's/[B]*$//'`
echo "filesize=$filesize"
# first pass: deskew and trim with the lower fuzz value
convert "$img" -background black -deskew 40% -fuzz $fuzz1% -trim +repage "../results/$img"
newfilesize=`convert -ping "../results/$img" -precision 16 -format "%b" info: | sed 's/[B]*$//'`
test=`convert xc: -format "%[fx:100*($filesize-$newfilesize)/$filesize<$threshpct?1:0]" info:`
echo "newfilesize=$newfilesize; test=$test;"
# second pass: the file barely shrank, so retry with the higher fuzz value
[ $test -eq 1 ] && convert "$img" -background black -deskew 40% -fuzz $fuzz2% -trim +repage "../results/$img"
done
The catch is that the output must use the same TIFF compression as the input so that the file sizes are comparable; otherwise the new file could even come out larger than the old one, as can happen with JPG.
Note that the sed is used to remove the letter B (bytes) from the file size, so they can be compared as numerals and not strings. The -precision 16 forces "%b" to report as B and not KB or MB.
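If only the byte count is needed, the size comparison can also be done with plain shell tools instead of extra ImageMagick calls. The following is a minimal sketch under that assumption, not part of the answer above: the helper names file_bytes, pct_shrink and needs_second_pass are made up for illustration, and the arithmetic is integer-only, so the percentage is truncated.

```shell
# Hypothetical helpers; the names are illustrative and not ImageMagick features.

# Size of a file in bytes, read with wc so no image decoding is needed.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Integer percentage by which the new size is smaller than the old size.
pct_shrink() {
    local old="$1" new="$2"
    echo $(( 100 * (old - new) / old ))
}

# Succeeds (exit status 0) when the file shrank by less than the threshold,
# i.e. when the first pass probably did not trim enough.
needs_second_pass() {
    local old="$1" new="$2" threshpct="$3"
    [ "$(pct_shrink "$old" "$new")" -lt "$threshpct" ]
}
```

Inside the loop, needs_second_pass "$(file_bytes "$img")" "$(file_bytes "../results/$img")" "$threshpct" could then stand in for the fx test. To run a script like this, one option is to save it in a text file (say trim.sh, a hypothetical name) and start it with: bash trim.sh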

Related

Image Conversion - RAW to png/raw for game (Pac The Man X)

So I have raw image and I am just curious If I can edit such image to save as RGB-32 Packed transparent interlaced raw and what program I could use, there is specification:
Format of RAW image
I have tried using photoshop but then game crashes. Is it even possible? I should get file without thumbnail. I also tried using gimp, free converters and Raw viewer but no luck. Any suggestions?
Edit:
Used photoshop (interleaved with transparency format), game starts but images are just bunch of pixels.
file that i try to prepare (221bits)
We are still not getting a handle on what output format you are really trying to achieve. Let's try generating a file from scratch, to see if we can get there.
So, let's just use simple commands that are available on a Mac and generate some test images from first principles. Start with exactly the same ghost.raw image you shared in your question. We will take the first 12 bytes as the header, and then generate a file full of red pixels and see if that works:
# Grab first 12 bytes from "ghost.raw" and start a new file "red.raw"
head -c 12 ghost.raw > red.raw
# Now generate 512x108 pixels, where red=ff, green=00, blue=01, alpha=fe and append to "red.raw"
perl -E 'say "ff0001fe" x (512*108)' | xxd -r -p >> red.raw
So you can try using red.raw in place of ghost.raw and tell me what happens.
Now try generating a blue file just the same:
# Grab first 12 bytes from "ghost.raw" and start a new file "blue.raw"
head -c 12 ghost.raw > blue.raw
# Now generate 512x108 pixels, where red=00, green=01, blue=ff, alpha=fe and append to "blue.raw"
perl -E 'say "0001fffe" x (512*108)' | xxd -r -p >> blue.raw
And then try blue.raw.
Original Answer
AFAIK, your image is actually 512 pixels wide by 108 pixels tall in RGBA8888 format with a 12-byte header at the start - making 12 + 4*(512 * 108) bytes.
You can convert it to PNG or JPEG with ImageMagick like this:
magick -size 512x108+12 -depth 8 RGBA:ghost.raw result.png
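The size arithmetic above (12 + 4*(512 * 108) bytes) gives a quick sanity check for any file you generate for the game. A minimal sketch, assuming the layout really is a fixed-size header followed by 4 bytes per pixel; the function names are made up for illustration:

```shell
# Expected size in bytes of a raw RGBA8888 file with a header:
# header + 4 bytes per pixel * width * height.
expected_raw_size() {
    local width="$1" height="$2" header="$3"
    echo $(( header + 4 * width * height ))
}

# Succeeds when the file on disk has exactly the expected size.
raw_size_matches() {
    local file="$1" width="$2" height="$3" header="$4"
    local actual
    actual=$(wc -c < "$file" | tr -d ' ')
    [ "$actual" -eq "$(expected_raw_size "$width" "$height" "$header")" ]
}
```

For example, raw_size_matches red.raw 512 108 12 should succeed for a file built as described, since 12 + 4*512*108 = 221196 bytes.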
I still don't understand from your question or comments what format you actually want - so if you clarify that, I am hopeful we can get you answered.
Try using online converters. They help most of the time.
Websites like these may help:
https://www.freeconvert.com/raw-to-png
https://cloudconvert.com/raw-to-png
https://www.zamzar.com/convert/raw-to-png/
Some of these sites ask you for details about the format, and some do a straightforward conversion.

Filling the gaps made in chinese character due to line removal for ocr

Hello friends,
I am having a hard time OCRing the above image because of the gaps left by line removal. Could anyone kindly guide me on how to fill the gaps in the Chinese characters using ImageMagick?
Cool question! There are many ways of approaching this but unfortunately I can't tell which ones work! So I'll give you some code and you can experiment by changing it around.
For the moment, I tried simply removing any lines that have white pixels in them, but you could look at the lines above and below, or do something else.
#!/bin/bash -xv
# Get lines containing white pixels
convert chinese.gif -colorspace gray -threshold 80% DEBUG-white-lines.png
# Develop that idea and get the line numbers in an array
wl=( $(convert chinese.gif -colorspace gray -threshold 80% -resize 1x\! -threshold 20% txt: | awk -F '[,:]' '/FFFFFF/{print $2}') )
# White lines are:
echo "${wl[@]}"
# Build a string of a whole load of "chop" commands to apply in one go, rather than applying one-at-a-time and saving/re-loading
# As we chop each line, the remaining lines move up, changing their offset by one line - UGHH. Apply a correction!
chop=""
correction=0
for line in "${wl[@]}" ; do
((y=line-correction))
chop="$chop -chop 0x1+0+$y "
((correction=correction+1))
done
echo $chop
convert chinese.gif $chop result.png
Here's the image DEBUG-white-lines.png:
The white lines are identified as:
44 74 134 164 194 254 284 314 374 404
The final command run is:
convert chinese.gif -chop 0x1+0+44 -chop 0x1+0+73 -chop 0x1+0+132 -chop 0x1+0+161 -chop 0x1+0+190 -chop 0x1+0+249 -chop 0x1+0+278 -chop 0x1+0+307 -chop 0x1+0+366 -chop 0x1+0+395 result.png
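An alternative to the running correction is to chop the rows from the bottom up: removing a lower row does not move any of the rows above it, so each line number can be used unchanged. This is a sketch of that variation rather than the method used above, and the function name is made up for illustration.

```shell
# Build -chop arguments for the given row numbers, bottom row first.
# Chopping from the bottom up leaves the rows above untouched,
# so every offset can be used as-is, with no correction.
build_chop_args_bottom_up() {
    local sorted line args
    args=""
    sorted=$(printf '%s\n' "$@" | sort -rn)
    for line in $sorted; do
        args="$args -chop 0x1+0+$line"
    done
    echo "${args# }"
}
```

With the list of white lines found by the script, the result could be passed to convert unquoted, in the same way as the chop variable, for example: convert chinese.gif $(build_chop_args_bottom_up 44 74 134) result.png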
If I understand this correctly then you want to find a way of removing the white lines and then still get it to go through an OCR?
The best way would be to do it by eye and connect the dots, so to speak, so that the ends of the strokes on either side of each gap line up.
A programmatic way would be to remove the white line and then duplicate the line above (or below) and shift it into place.
康 家 月 而 视 , 喝 道
" 你 想 做 什 么 !"
秦 微 微 一 笑 , 轻 声 道
不 知 道 看 着 些 亲 死 眼 前 ,
前 辈 会 不 会 有 痛 的 感 觉 。"
说 , 伸 手 一 指 , 一 位 少 妇
身 形 一 顿 , 小 出 现 了 一 个 血 洞
倒 地 身 广 。
康 家 相 又 惊 又 , 痛 声 道
I don't read Chinese but this is what it got machine translated as
Kang Jia month and watch, drink
"What do you want to do !"
Qin Weiwei smiled, softly
I don't know. look at some dead eyes. ,
Predecessors will not feel pain ."
And said, stretch out a finger , a young woman.
In The Shape of a meal, a small blood hole appeared
Down to the ground wide.
The Kang family was shocked and sore

How do I improve the performance of an read-write intensive imagemagick script?

I use a bash script to process a bunch of images for a timelapse movie. The method is called shutter drag, and i am creating a moving average for all images. The following script works fine:
#! /bin/bash
totnum=10000
seqnum=40
skip=1
num=$(((totnum-seqnum)/1))
i=1
j=1
while [ $i -le $num ]; do
echo $i
i1=$i
i2=$((i+1))
i3=$((i+2))
i4=$((i+3))
i5=$((i+4))
...
i37=$((i+36))
i38=$((i+37))
i39=$((i+38))
i40=$((i+39))
convert $i1.jpg $i2.jpg $i3.jpg $i4.jpg $i5.jpg ... \
$i37.jpg $i38.jpg $i39.jpg $i40.jpg \
-evaluate-sequence mean ~/timelapse/Images/Shutterdrag/$j.jpg
i=$((i+$skip))
j=$((j+1))
done
However, I noticed that this script takes a very long time to process a lot of images with a large averaging window (1 s per image). I guess this is caused by a lot of reading and writing in the background.
Is it possible to increase the speed of this script? For example, by keeping the images in memory and, with every iteration, dropping the first image and loading only the newest one.
I discovered the mpr:{label} feature of ImageMagick, but I guess this is not the right approach, as the memory is cleared when the convert command exits?
Suggestion 1 - RAMdisk
If you want to put all your files on a RAMdisk before you start, it should help the I/O speed enormously.
So, to make a 1GB RAMdisk, use:
sudo mkdir /RAMdisk
sudo mount -t tmpfs -o size=1024m tmpfs /RAMdisk
Suggestion 2 - Use MPC format
So, assuming you have done the previous step, convert all your JPEGs to MPC format files on the RAMdisk. The MPC file can be dma'ed straight into memory without your CPU needing to do costly JPEG decoding as MPC is just the same format as ImageMagick uses in memory, but on-disk.
I would do that with GNU Parallel like this:
parallel -X mogrify -path /RAMdisk -format MPC ::: *.jpg
The -X passes as many files as possible to each mogrify without creating loads of mogrify processes. The -path says where the output files must go. The -format MPC makes mogrify convert the input files to MPC format (Magick Pixel Cache) files which your subsequent convert commands in the loop can read by pure DMA rather than expensive JPEG decoding.
If you don't have, or don't like, GNU Parallel, just omit the leading parallel -X and the :::.
Suggestion 3 - Use GNU Parallel
You could also run @chepner's code in parallel...
for ...; do
echo convert ...
done | parallel
Essentially, I am echoing all the commands instead of running them and the list of echoed commands is then run by GNU Parallel. This could be especially useful if you cannot compile ImageMagick with OpenMP as Eric suggested.
You can play around with switches such as --eta after parallel to see how long it will take to finish, or --progress. Also, experiment with -j 2 or -j 4 depending on how big your machine is.
I did some benchmarks, just for fun. First, I made 250 JPEG images of random noise at 640x480, and ran chepner's code "as-is" - that took 2 minutes 27 seconds.
Then, I used the same set of images, but changed the loop to this:
for ((i=1, j=1; i <= num; i+=skip, j+=1)); do
echo convert "${files[@]:i:seqnum}" -evaluate-sequence mean ~/timelapse/Images/Shutterdrag/$j.jpg
done | parallel
The time went down to 35 seconds.
Then I put the loop back how it was and changed all the input files to MPC instead of JPEG; the time went down to 36 seconds.
Finally, I used MPC format and GNU Parallel as above and the time dropped to 19 seconds.
I didn't use a RAMdisk as I am on a different OS from you (and have extremely fast NVME disks), but that should help you enormously too. You could write your output files to RAMdisk too, and also in MPC format.
Good luck and let us know how you get on please!
There is nothing you can do in bash to speed this up; everything except the actual IO that convert has to do is pretty trivial. However, you can simplify the script greatly:
#! /bin/bash
totnum=10000
seqnum=40
skip=1
num=$(((totnum-seqnum)/1))
# Could use files=(*.jpg), but they probably won't be sorted correctly
for ((i=1; i<=totnum; i++)); do
files+=($i.jpg)
done
for ((i=1, j=1; i <= num; i+=skip, j+=1)); do
convert "${files[@]:i:seqnum}" -evaluate-sequence mean ~/timelapse/Images/Shutterdrag/$j.jpg
done
Storing the files in a RAM disk would certainly help, but that's beyond the scope of this site. (Of course, if you have enough RAM, the OS should probably be keeping a file in disk cache after it is read the first time so that subsequent reads are much faster without having to preload a RAM disk.)
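One detail worth checking in the simplified script: bash arrays are zero-based, so files[0] holds 1.jpg and a slice starting at index i begins with frame i+1, one frame later than the window in the original script. If the window should start at frame i, the slice offset would be i-1. The sketch below sidesteps the array by building the window from the frame number directly; the function name is made up for illustration.

```shell
# Print the names of the seqnum frames starting at frame number start,
# e.g. start=3, seqnum=4 gives: 3.jpg 4.jpg 5.jpg 6.jpg
window_files() {
    local start="$1" seqnum="$2" k out
    k=$start
    out=""
    while [ "$k" -lt $(( start + seqnum )) ]; do
        out="$out $k.jpg"
        k=$(( k + 1 ))
    done
    echo "${out# }"
}
```

Inside the loop this could be used as: convert $(window_files "$i" "$seqnum") -evaluate-sequence mean ~/timelapse/Images/Shutterdrag/$j.jpg (left unquoted on purpose so the names split into separate arguments, which is safe here because the names contain no spaces).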

Determine bit depth of bmp file on os x

How can I determine the bit depth of a bmp file on Mac OS X? In particular, I want to check if a bmp file is a true 24 bit file, or if it is being saved as a greyscale (i.e. 8 bit) image. I have a black-and-white image which I think I have forced to be 24 bit (using convert -type TrueColor), but Imagemagick gives conflicting results:
> identify -verbose hiBW24.bmp
...
Type: Grayscale
Base type: Grayscale
Endianess: Undefined
Colorspace: Gray
> identify -debug coder hiBW24.bmp
...
Bits per pixel: 24
A number of other command-line utilities are no help, it seems:
> file hi.bmp
hi.bmp: data
> exiv2 hiBW24.bmp
File name : hiBW24.bmp
File size : 286338 Bytes
MIME type : image/x-ms-bmp
Image size : 200 x 477
hiBW24.bmp: No Exif data found in the file
> mediainfo -f hi.bmp
...[nothing useful]
If you want a command-line utility, try sips (do not forget to read the manpage with man sips). Example:
*terminal input*
sips -g all /Users/hg/Pictures/2012/03/14/QRCodeA.bmp
*output is:*
/Users/hg/Pictures/2012/03/14/QRCodeA.bmp
pixelWidth: 150
pixelHeight: 143
typeIdentifier: com.microsoft.bmp
format: bmp
formatOptions: default
dpiWidth: 96.000
dpiHeight: 96.000
samplesPerPixel: 3
bitsPerSample: 8
hasAlpha: no
space: RGB
I think the result contains the values you are after.
Another way is to open the image in Preview.app and then open the info panel.
One of the most informative programs (though not easy to use) is exiftool by Phil Harvey http://www.sno.phy.queensu.ca/~phil/exiftool/ , which also works very well on Mac OS X for a lot of file formats but may be overkill for your purpose.
I did this to investigate:
# create a black-to-white gradient and save as a BMP, then `identify` it to a file `unlim`
convert -size 256x256 gradient:black-white a.bmp
identify -verbose a.bmp > unlim
# create another black-to-white gradient but force 256 colours, then `identify` to a second file `256`
convert -size 256x256 gradient:black-white -colors 256 a.bmp
identify -verbose a.bmp > 256
# Now look at difference
opendiff unlim 256
And the difference is that the -colors 256 image has a palette in the header and has a Class:PseudoClass whereas the other has Class:Direct
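The depth stored in the file can also be read straight from the BMP header, independently of how any tool classifies the pixel content. A minimal sketch, assuming the common BITMAPINFOHEADER layout, where the 16-bit little-endian bits-per-pixel field sits at byte offset 28 (older OS/2-style headers place it elsewhere); the function name is made up for illustration.

```shell
# Print the bits-per-pixel field of a BMP file.
# Assumes a BITMAPINFOHEADER-style header, where the 16-bit little-endian
# bit count is stored at byte offset 28.
bmp_bits_per_pixel() {
    local file="$1"
    # Read the two bytes as unsigned decimal values, low byte first.
    set -- $(od -An -tu1 -j28 -N2 "$file")
    [ $# -eq 2 ] || { echo "file too short: $file" >&2; return 1; }
    echo $(( $1 + 256 * $2 ))
}
```

For a file that is genuinely stored as 24-bit, bmp_bits_per_pixel hiBW24.bmp should print 24, and for an 8-bit palette file it should print 8.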

ImageMagick crop huge image

I am trying to create tiles from a huge image say 40000x40000
I found a script online for ImageMagick that crops the tiles. It works fine on small images, say 10000x5000.
Once I go any bigger it ends up using too much memory and the computer dies.
I have added the limit options but they don't seem to take effect.
I have the monitor in there but it does not help, as the script just slows down and locks up the machine.
It seems to just gobble up about 50 GB of swap and then kill the machine.
I think the problem is that as it crops each tile it keeps them all in memory. What I think I need is for it to write each tile to disk as it creates it, not store them all up in memory.
here is the script so far
#!/bin/bash
file=$1
function tile() {
convert -monitor -limit memory 2GiB -limit map 2GiB -limit area 2GB $file -scale ${s}%x -crop 256x256 \
-set filename:tile "%[fx:page.x/256]_%[fx:page.y/256]" \
+repage +adjoin "${file%.*}_${s}_%[filename:tile].png"
}
s=100
tile
s=50
tile
After a lot more digging and some help from the guys on the ImageMagick forum I managed to get it working.
The trick to getting it working is the .mpc format. Since this is the native image format used by ImageMagick it does not need to convert the initial image, it just cuts out the piece that it needs. This is the case with the second script I setup.
Let's say you have a 50000x50000 .tif image called myLargeImg.tif. First convert it to the native image format using the following command:
convert -monitor -limit area 2mb myLargeImg.tif myLargeImg.mpc
Then run the bash script below, which will create the tiles. Create a file named tiler.sh in the same folder as the MPC image and put the following script in it:
#!/bin/bash
src=$1
width=$(identify -format %w "$src")
limit=$((width / 256))
echo "count = $limit * $limit = "$((limit * limit))" tiles"
limit=$((limit-1))
for x in `seq 0 $limit`; do
for y in `seq 0 $limit`; do
tile=tile-$x-$y.png
echo -n $tile
w=$((x * 256))
h=$((y * 256))
convert -debug cache -monitor $src -crop 256x256+$w+$h $tile
done
done
In your console/terminal, run the command below and watch the tiles appear one at a time in your folder.
bash ./tiler.sh myLargeImg.mpc
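Note that the script derives the tile count from the width only and uses integer division, so it assumes a square image and skips any partial tiles at the right and bottom edges when the size is not a multiple of 256. A sketch of how the tile grid could be computed for arbitrary dimensions, with made-up function names:

```shell
# Number of tiles needed to cover size pixels with tiles of tile pixels,
# rounding up so that a partial tile at the edge is still counted.
tile_count() {
    local size="$1" tile="$2"
    echo $(( (size + tile - 1) / tile ))
}

# Print one crop geometry (WxH+X+Y) per tile, row by row.
tile_geometries() {
    local width="$1" height="$2" tile="$3" x y
    y=0
    while [ "$y" -lt "$height" ]; do
        x=0
        while [ "$x" -lt "$width" ]; do
            echo "${tile}x${tile}+${x}+${y}"
            x=$(( x + tile ))
        done
        y=$(( y + tile ))
    done
}
```

Each printed geometry could be handed to -crop in place of 256x256+$w+$h. As far as I know, ImageMagick clips a crop region that extends past the image edge, so the edge tiles would simply come out smaller.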
libvips has an operator that can do exactly what you want very quickly. There's a chapter in the docs introducing dzsave and explaining how it works.
It can also do it in relatively little memory: I regularly process 200,000 x 200,000 pixel slide images using less than 1GB of memory.
See this answer, but briefly:
$ time convert -crop 512x512 +repage huge.tif x/image_out_%d.tif
real 0m5.623s
user 0m2.060s
sys 0m2.148s
$ time vips dzsave huge.tif x --depth one --tile-size 512 --overlap 0 --suffix .tif
real 0m1.643s
user 0m1.668s
sys 0m1.000s
You may try to use gdal_translate utility from GDAL project. Don't get scared off by the "geospatial" in the project name. GDAL is an advanced library for access and processing of raster data from various formats. It is dedicated to geospatial users, but it can be used to process regular images as well, without any problems.
Here is simple script to generate 256x256 pixel tiles from large in.tif file of dimensions 40000x40000 pixels:
#!/bin/bash
width=40000
height=40000
y=0
while [ $y -lt $height ]
do
x=0
while [ $x -lt $width ]
do
outtif=t_${y}_$x.tif
gdal_translate -srcwin $x $y 256 256 in.tif $outtif
let x=$x+256
done
let y=$y+256
done
GDAL binaries are available for download for most Unix-like systems as well as Windows.
ImageMagick is simply not made for this kind of task. In situations like yours I recommend using the VIPS library and the associated frontend, Nip2.
VIPS has been designed specifically to deal with very large images.
http://www.vips.ecs.soton.ac.uk/index.php?title=VIPS
