Handle 150GB .jp2 image

I downloaded a 150GB satellite .jp2 image, of which I only want a small section at a time.
How would I go about tiling the image in manageable chunks? Just extracting a part of the image would also be enough.
As I'm only somewhat familiar with Python, I looked at the Pillow and OpenCV libraries, but without success as the image resolution exceeds their limits.
I also looked into Openslide for Python, but couldn't get rid of an error (Could not find module 'libopenslide-0.dll').

libvips can process huge images efficiently.
For example, with this 2.8GB test image:
$ vipsheader 9235.jp2
9235.jp2: 107568x79650 uchar, 4 bands, srgb, jp2kload
$ ls -l 9235.jp2
-rw-r--r-- 1 john john 2881486848 Mar 1 22:37 9235.jp2
I see:
$ /usr/bin/time -f %M:%e \
vips crop 9235.jp2 x.jpg 10000 10000 1000 1000
190848:0.45
So it takes a 1,000 x 1,000 pixel chunk out of a 110,000 x 80,000 pixel jp2 image in 0.5s and needs under 200MB of memory.
There are bindings for Python, Ruby, Node, etc., so you don't have to use the CLI.
In Python you could write:
import pyvips
image = pyvips.Image.new_from_file("9235.jp2")
tile = image.crop(10000, 10000, 1000, 1000)
tile.write_to_file("x.jpg")
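If your goal is to tile the whole image into manageable chunks rather than pull out a single region, you can loop over crop in the same way. A minimal sketch, assuming the example filename above and a 1024-pixel tile size (both just placeholders):
import os
import pyvips

tile_size = 1024
os.makedirs("tiles", exist_ok=True)

image = pyvips.Image.new_from_file("9235.jp2")
for top in range(0, image.height, tile_size):
    for left in range(0, image.width, tile_size):
        # clamp the final row/column of tiles to the image edge
        width = min(tile_size, image.width - left)
        height = min(tile_size, image.height - top)
        image.crop(left, top, width, height) \
             .write_to_file(f"tiles/{left}_{top}.jpg")
libvips also has a dzsave operation (see the later answers on this page) that writes a complete tile pyramid in one call, which may be simpler if you need many tiles.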
It does depend a bit on the jp2 image you are reading. Some are untiled (!!) and it can be very slow to read out a section.
There are Windows binaries as well; check the "download" page.
If you're on linux, vipsdisp can view huge images like this very quickly.

The Grok JPEG 2000 toolkit is able to decompress regions of extremely large images such as the 150 GB image you linked to.
Sample command:
grk_decompress -i FOO.jp2 -o FOO.tif -d 10000,10000,15000,15000 -v
to decompress the region (10000,10000,15000,15000) to TIFF format.

Related

fastest ways of cropping a fixed size image into 9 fixed size crops

I have thousands of images in a folder and I need to crop each image (3840x2160) into 9 crops (1280x720) as fast as possible. The cropped images should preserve all the details of the original image as it may contain small objects of interest. I found that image_slicer does the job, but I'm not sure about its speed.
from image_slicer import slice
slice('cake.png', 9)
I also looked at PIL (image_slicer uses it), ImageMagick and some other similar libraries, but was not able to evaluate their speed. Maybe ffmpeg has some command-line option to achieve my goal.
Eventually, I need a Windows executable file where a user inserts a full path to images and gets cropped images for each image in the new folder.
ImageMagick can do that with its tile mode for cropping, but you would have to write a bat script to loop over all the files that you want to process (a Python version of that loop is sketched at the end of this answer). For any given image:
convert.exe image.suffix -crop 1280x720 +repage +adjoin newimage_%d.suffix
The +repage removes any virtual canvas that might be supported by the suffix.
The +adjoin ensures that each image is written as a new file for suffix types that support multiple pages in the same file.
ImageMagick is not the fastest processor around.
See https://imagemagick.org/Usage/crop/#crop_tile
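If you'd rather avoid a bat script, a small Python wrapper around the same command also works. A rough sketch, assuming ImageMagick's convert.exe is on your PATH and that the folder names are placeholders:
import pathlib
import subprocess

src = pathlib.Path("input_images")   # hypothetical input folder
dst = pathlib.Path("cropped")        # hypothetical output folder
dst.mkdir(exist_ok=True)

for image in src.glob("*.png"):
    # same ImageMagick tile-crop command as above, run once per image
    out_pattern = dst / f"{image.stem}_%d{image.suffix}"
    subprocess.run(
        ["convert.exe", str(image), "-crop", "1280x720",
         "+repage", "+adjoin", str(out_pattern)],
        check=True,
    )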
Given that:
2GHz processors have been available for around 20 years and only around 3GHz is "mainstream" today, and
quad-core and up to 16-core processors are fairly common nowadays
it would seem processors are becoming "fatter" (more cores) rather than "taller" (more GHz). So you would probably do well to leverage that.
Since Python has a GIL, it is not great at multi-threading, so you would probably do better to use multi-processing and let each core work independently on one image, minimising the pickling and sharing of data between processes.
You didn't mention the format or dimensions of your images. If they are JPEGs, you might consider using turbo-jpeg. If they are very large, memory may be an issue with multiprocessing.
The likely candidates are probably the following:
OpenCV
vips
Pillow
ImageMagick/wand
but it will depend on many things:
CPU - GHz, cores, generation
RAM - amount, speed, channels, timings
disk subsystem - spinning, SSD, NVMe
image format, dimensions, bit-depth
So you'll need to benchmark. I did some similar benchmarking here.
If you want to replace the process_file() in John's answer with a PIL or OpenCV version, it might look like this:
import pathlib
import numpy as np
from PIL import Image
import cv2
def withPIL(filename):
    pathlib.Path(f"out/{filename}_tiles").mkdir(parents=True, exist_ok=True)
    image = Image.open(filename)
    for y in range(3):
        for x in range(3):
            top, left = y*720, x*1280
            tile = image.crop((left, top, left+1280, top+720))
            tile.save(f"out/{filename}_tiles/{x}_{y}.png")

def withOpenCV(filename):
    pathlib.Path(f"out/{filename}_tiles").mkdir(parents=True, exist_ok=True)
    image = cv2.imread(filename, cv2.IMREAD_COLOR)
    for y in range(3):
        for x in range(3):
            top, left = y*720, x*1280
            tile = image[top:top+720, left:left+1280]
            cv2.imwrite(f"out/{filename}_tiles/{x}_{y}.png", tile)
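To actually run either of those over a set of images (and time it), you can plug them into the same Pool/map pattern as John's pyvips answer below. A small harness sketch, assuming it lives in the same file as the functions above and the test files are given on the command line:
import sys
import time
from multiprocessing import Pool

if __name__ == "__main__":
    start = time.perf_counter()
    with Pool() as p:                 # swap withPIL for withOpenCV to compare
        p.map(withPIL, sys.argv[1:])
    print(f"{time.perf_counter() - start:.2f}s")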
As Mark says, you need to make a benchmark and test on your system.
I tried here with pyvips:
import sys
from multiprocessing import Pool
import pyvips
import os
def process_file(filename):
    os.mkdir(f"out/{filename}_tiles")
    image = pyvips.Image.new_from_file(filename)
    for y in range(3):
        for x in range(3):
            tile = image.crop(x * 1280, y * 720, 1280, 720)
            tile.write_to_file(f"out/{filename}_tiles/{x}_{y}.png")

with Pool(32) as p:
    p.map(process_file, sys.argv[1:])
I made 100 test PNGs:
$ vips crop ~/pics/wtc.jpg uhd.png 0 0 3840 2160
$ for i in {1..100}; do cp uhd.png test_$i.png; done
And ran the program like this:
$ mkdir out
$ /usr/bin/time -f %M:%e ../crop9.py test_*.png
206240:5.45
So about 5.5s and 200MB of RAM on my PC.
PNG is a very slow format. It'll be several times quicker if you use something like JPG.

Recent MATLAB and Octave have stronger JPEG compression and show artifacts

I wonder why the JPEG compression in recent MATLAB and Octave versions has become so strong that it causes noticeable compression artifacts:
Octave 3 jpeg image with size of 41 KB with no artifacts:
MATLAB 9 jpeg image with size of 26 KB with artifacts:
Octave 5 jpeg image with size of 23 KB with artifacts:
Here is the code to plot:
description = strcat('version-', num2str(version)); % find out MATLAB/Octave version
x = 1:2; % simple var
figure; % plot
plot(x, x);
title(description);
print(strcat("test_jpeg_size_", description, '.jpg'), '-djpeg'); % write file
Is there a way to tell MATLAB and Octave to use weaker JPEG compression? I cannot find anything like this at https://de.mathworks.com/help/matlab/ref/print.html.
I know that I could write PNG files and use ImageMagick to convert them to JPEG with a given quality, but that would be a workaround involving additional tools. Or I could use PNG files in the first place, but the real images (unlike the simple one here) gain no compression advantage from PNG, and I would have to change a lot of other stuff.
This used to be documented*, I was surprised to not find it in the documentation pages. I tested it with the latest version of MATLAB (R2019b) and it still works:
The -djpeg option can take a quality value between 0 and 100, inclusive. The device option becomes -djpeg100 or -djpeg80, or whatever value you want to use.
print(strcat("test_jpeg_size_", description, '.jpg'), '-djpeg100');
* Or at least I remember it being documented... The online documentation goes back to R13 (MATLAB 6.5), and it's not described in that version of the documentation nor in a few random versions in between that and the current version.
However, I strongly recommend that you use PNG for line drawings. JPEG is not intended for line drawings, and makes a mess of them (even at highest quality setting). PNG will produce better quality with a much smaller file size.
Here I printed a graph with -djpeg100 and -dpng, then cut out a small portion of the two files and show them side by side. JPEG, even at 100 quality, makes a mess of the lines:
Note that, in spite of not having any data loss, the PNG file is about 10 times smaller than the JPEG100 file.
You can go for
f = getframe(gcf);
imwrite(f.cdata, 'Fig1.jpg')
where imwrite takes the following options
Compression (compression scheme)
Quality (quality of JPEG-compressed file from 0 to 100)
See the doc of imwrite.

Extracting a region of interest from an image file without reading the entire image

I am searching for a library (in any language) that is capable of reading a region of an image file (any format) without having to initially read that entire image file.
I have come across a few options such as vips, which indeed does not keep the entire image in memory, but still seems to need to read the whole file to begin with.
I realize this may not be possible for compressed formats such as JPEGs, but in theory it sounds like BMPs or TIFFs should allow for this type of reading.
libvips will read just the part you need, when it can. For example, if you crop 100x100 pixels from the top-left of a large PNG, it's fast:
$ time vips crop wtc.png x.jpg 0 0 100 100
real 0m0.063s
user 0m0.041s
sys 0m0.023s
(the four numbers are left, top, width, height of the area to be cropped from wtc.png and written to x.jpg)
But a 100x100 pixel region from near the bottom is rather slow, since it has to read and decompress all of the pixels before the ones you want in order to reach the right point in the file:
$ time vips crop wtc.png x.jpg 0 9000 100 100
real 0m3.063s
user 0m2.884s
sys 0m0.181s
JPG and strip TIFF work in the same way, though it's less obvious since they are much faster formats.
Some formats support true random-access read. For example, tiled TIFF is fast everywhere, since libvips can use libtiff to read only the tiles it needs:
$ vips copy wtc.png wtc.tif[tile]
$ time vips crop wtc.tif x.jpg 0 0 100 100
real 0m0.033s
user 0m0.013s
sys 0m0.021s
$ time vips crop wtc.tif x.jpg 0 9000 100 100
real 0m0.037s
user 0m0.021s
sys 0m0.017s
OpenSlide, vips, tiled OpenEXR, FITS, binary PPM/PGM/PBM, HDR, RAW, Analyze, Matlab and probably some others all support true random access like this.
If you're interested in more detail, there's a chapter in the API docs describing how libvips opens a file:
http://libvips.github.io/libvips/API/current/How-it-opens-files.md.html
Here's crop plus save in Python using pyvips:
import pyvips
image = pyvips.Image.new_from_file(input_filename, access='sequential')
tile = image.crop(left, top, width, height)
tile.write_to_file(output_filename)
The access= is a flag that hints to libvips that it's OK to stream this image, in case the underlying file format does not support random access. You don't need this for formats that do support random access, like tiled TIFF.
You don't need to write to a file. For example, this will make a buffer object containing the file encoded as a JPG:
buffer = tile.write_to_buffer('.jpg', Q=85)
Or this will write directly to stdout:
target = pyvips.Target.new_from_descriptor(0)
tile.write_to_target(target, '.jpg', Q=85)
The Q=85 is an optional argument to set the JPG Q factor. You can set any of the file save options.
ITK can do it with some formats. There is a method CanStreamRead which returns true for formats which support streaming, such as MetaImageIO. An example can be found here. You can ask more detailed questions on ITK's forum.
If you have control over the file format, I would suggest you use tiled TIFF files. These are typically used in digital pathology whole-slide images, with average sizes of around 100k x 30k pixels.
LibTiff makes it easy to read the tiles corresponding to a selected ROI. Tiles can be compressed without making it less efficient to read a small region (no need to decode whole scan lines).
The BMP format (uncompressed) is simple enough that you can write the function yourself.
TIFF is a little less easy, as there are so many subformats. But the TIFF library (libtiff) supports a "tile-oriented" I/O mode. http://www.libtiff.org/libtiff.html#Tiles
I don't know of such a library solution.
Low level, file-read access is format specific and in particular, file mapping is OS specific.
If you have access to the raw bytes then, assuming you know the width, height, depth and number of channels etc., calculating file offsets is trivial, so just roll your own (see the sketch after this answer).
If you're transferring the extracted data over a network, you might consider compressing the ROI in memory before sending it, if it's relatively big.
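As a concrete illustration of that roll-your-own approach for an uncompressed format, here is a rough sketch that reads a rectangular region out of a plain 24-bit, uncompressed BMP by seeking to the computed offsets. It assumes the common 40-byte BITMAPINFOHEADER, bottom-up row order and no palette or compression, so treat it as a starting point rather than a complete reader:
import struct

def read_bmp_region(path, left, top, width, height):
    # returns rows of raw BGR bytes, top-to-bottom, without reading the whole file
    with open(path, "rb") as f:
        header = f.read(54)                       # file header + BITMAPINFOHEADER
        data_offset = struct.unpack_from("<I", header, 10)[0]
        img_w = struct.unpack_from("<i", header, 18)[0]
        img_h = struct.unpack_from("<i", header, 22)[0]
        bpp = struct.unpack_from("<H", header, 28)[0]
        assert bpp == 24, "sketch only handles 24-bit BMPs"

        stride = (img_w * 3 + 3) & ~3             # rows are padded to 4 bytes
        rows = []
        for y in range(top, top + height):
            file_row = img_h - 1 - y              # pixel data is stored bottom-up
            f.seek(data_offset + file_row * stride + left * 3)
            rows.append(f.read(width * 3))        # BGR pixels for this row
        return rows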

DeepZoom white images - Imagemagick Vips Cocoa

I have a little Cocoa OS X app that uses the vips dzsave operation and ImageMagick to create DeepZoom tiles from a big PSB file.
The problem is that it only works up to some undefined size. I'm handling files of about 60,000 px x 50,000 px (27 GB) correctly, but with bigger files the app generates a tile pyramid made of white images.
No data are written...
I have to manage images of around 170,000 px x 170,000 px, between 60 and 80 GB.
I have tried environment variables to increase the ImageMagick cache limits, but with no results...
Does anyone have any ideas about the white output?
I'm the vips maintainer. Try at the command-line, something like:
vips dzsave huge.psb output_name --tile-size 256 --overlap 0 --vips-progress --vips-leak
and see what happens. If you run "top" at the same time you can watch memory use.
vips uses libMagick to load the psb files and my guess would be that this is hitting a memory limit somewhere inside ImageMagick.
Do you have to use psb? If you can use a format that vips can process directly it should work much better. Big TIFF or Openslide (if these are slide images) are both good choices. I regularly process 200,000 x 200,000 images with dzsave on a very modest laptop.

Large Image Processing with ImageMagick convert - Need More Throughput

I am converting some largish images from a multi-image (pyramidal) TIFF to PNG format. The salient parts of the report from "identify -verbose" on the largest image are here:
Image:
Format: TIFF (Tagged Image File Format)
Class: DirectClass
Geometry: 72224x64080+0+0
Resolution: 72x72
Print size: 1003.11x890
Units: PixelsPerInch
Type: TrueColor
Base type: TrueColor
Endianess: MSB
Colorspace: RGB
Depth: 8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
...
Page geometry: 72224x64080+0+0
...
Scene: 2 of 12
Compression: JPEG
Orientation: TopLeft
Properties:
...
Filesize: 1.389GB
Number pixels: 4.6281GB
Pixels per second: 5.516MB
User time: 218.277u
Elapsed time: 13:60.020
Version: ImageMagick 6.7.1-0 2011-07-06 Q16 http://www.imagemagick.org
I am intending to use deepzoom composer to produce input for the Silverlight multiscaleimage control with this image. My question is how do I bring my system to its knees while processing these images with ImageMagick - it is taking too long to convert them. I have looked at a few articles, but I can't seem to get anywhere.
Some system and other related information:
OS: Windows 7 64 bit.
CPU: Intel Core2 Duo E7300 @ 2.66 GHz, 2.67 GHz
RAM: 4.0 GB
PAGEFILE: 8-12GB on non-OS disk
"MAGICK_TMPDIR": Yet another empty, non-os disk with 140GB available.
Here is the result of "identify -list resource":
File Area Memory Map Disk Thread
------------------------------------------------------------------
1536 4.1582GB 15.491GiB 30.981GiB unlimited 2
I am running this command to extract the image referenced above:
convert "myFN.tif[2]" -limit file 8192GB -limit thread 32 "myFN%d.png"
Adding the two limit values did not seem to make a difference. When I run this, I average about 10% CPU utilization and have a pagefile commit size of 3GB. I can barely tell that it is running.
Q1) Is there anything else I can do to get ImageMagick to use more system resources? Most of the "large image" links I have found are asking the opposite question.
Q2) Changing "policy.xml" values (such as files) located here:
C:\Program Files\ImageMagick-6.7.1-Q16\www\source
did not seem to affect anything - the changes did not show up in the next "identify -list resource." Is there a trick to this?
Q3) Any other hints or ideas for this task?
Thanks,
David
libvips can convert pyramidal tiff directly into deepzoom pyramids. It's free, very fast and doesn't need much memory.
For example, I see:
$ vipsheader vips-pyr.tif
vips-pyr.tif: 18008x22764 uchar, 3 bands, srgb, tiffload
$ time vips dzsave vips-pyr.tif x.zip
real 0m9.763s
user 0m19.700s
sys 0m4.644s
peak memory: 180MB
That's a 20,000 x 20,000 pyramidal tiff converted to deepzoom in under 10 seconds on a small laptop. It's writing a zip file containing the pyramid, so you can upload to a server immediately. Memory use scales with image width, so it'll do very large images --- I regularly process 250,000 x 250,000 pixel slides.
There's a chapter in the docs introducing dzsave.
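The same thing from Python with pyvips looks something like the sketch below; the filenames are just placeholders, and (as in the CLI example) the .zip suffix makes dzsave write the pyramid into a zip container:
import pyvips

image = pyvips.Image.new_from_file("vips-pyr.tif", access="sequential")
# tile_size and overlap correspond to the --tile-size / --overlap CLI flags
image.dzsave("x.zip", tile_size=256, overlap=0)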
For your (my) image, the limiting factor is the size of the pixel cache, which is limited by the setting "MAGICK_AREA_LIMIT". The default of 4GB is not large enough for 72224 x 64080 - that would require a setting of at least 4.4GB - try "MAGICK_AREA_LIMIT=8GB."
If you want to control the impact that ImageMagick has on system RAM and the system page file, then you can limit that using "MAGICK_MEMORY_LIMIT." In truth, there isn't much need to use a large limit there since the fallback location for the pixel cache is mapped memory files, which are on the same order of magnitude of efficiency as the system page file. Try "MAGICK_MEMORY_LIMIT=2GB", to keep the pixel cache out of there (not that it would go there anyways - it is way bigger than 12GB.)
You want the pixel cache to go to mapped memory, so try "MAGICK_MAP_LIMIT=100GB" to take advantage of that space you have. The memory mapped files will end up, not in the system temp directory, but in the directory specified by "MAGICK_TMPDIR".
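If you are scripting the conversion, you can set those limits just for the child process. A rough sketch from Python, reusing the command from the question and the limits suggested above (the scratch directory is a hypothetical example):
import os
import subprocess

env = dict(os.environ,
           MAGICK_AREA_LIMIT="8GB",
           MAGICK_MEMORY_LIMIT="2GB",
           MAGICK_MAP_LIMIT="100GB",
           MAGICK_TMPDIR=r"D:\magick_tmp")   # hypothetical empty, non-OS disk

subprocess.run(["convert", "myFN.tif[2]", "myFN%d.png"], env=env, check=True)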
For extra credit, you also might experiment with the Q8 version, since you don't need 16 bit color channels. You can expect roughly half the disk io with that version.
Good luck!
David
The Q8 version uses half the disk space and time to complete a conversion compared to the Q16 version! Also, if you are going to end up breaking up the image into tiles, you can do that in a single step with a command like:
convert.exe" "WRL_15_1A.tif[2]" -crop 14409x15396 +repage
-scene 0 "temp\WRL_15_1A%d.tif"
The "[2]" calls out the third image (the one with the highest
resolution.)
The -crop parameters are 1/4 of the width and height
respectively, giving us 16 tiles.
The +repage sets all of the tiles at origin (0,0)
The "%d" numbers the files, starting at the # set by
"-scene".
ImageMagick has a format for handling large files (MPC), basically trading disk space for RAM. Two files are generated on convert, .mpc and .cache, and you can run ImageMagick commands on the smaller .mpc file. These files may only work on your current build of ImageMagick, so they aren't suitable for archiving.
