Compress DICOM files to be sent to a remote server

I'm looking for a way to compress DICOM files and send them to a remote server (Node.js in my case).
I tried bz2 compression and it seems to work very well on large DICOM files (tested with a 10 MB file, which gave me a 5 MB compressed file).
With small files (like a 250 KB file), the size is reduced by only a few KB (5 to 10 KB in most cases), which isn't worth it.
Can someone please explain why bz2 works so well with large DICOM files, and is there a better way to compress DICOM files before sending them over the internet?
Thanks in advance.

If you want to compress a DICOM dataset with images, the recommendation is to use one of the compression types supported by the DICOM Standard. These include lossy and lossless JPEG, JPEG 2000, JPEG-LS, and RLE, to name a few. The Standard also supports encoding extended grayscale (12-16 bit grayscale) using these standards-based compression techniques.
The Transfer Syntax UID element (0002,0010) indicates whether the image in the DICOM dataset is already compressed. For example, recompressing an already compressed image will appear to give a lower compression ratio than compressing the uncompressed original. So the best way to measure is to compare against the original uncompressed image. If your original image is already compressed, you can calculate the uncompressed image size as (Rows x Columns x Bits Allocated / 8 x Samples per Pixel x Number of Frames). The compression ratio will also vary with the image type (color vs grayscale) and the compression technique used. Typically, you will get much better compression with a true color image than with a grayscale image such as an X-ray.
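The size formula above is easy to get wrong by a factor of 8, so here is a minimal Python sketch of it. The function and parameter names are my own (they mirror the DICOM attribute names but are not from any library), and the 512 x 512 example image is hypothetical.

```python
def uncompressed_pixel_bytes(rows, columns, bits_allocated,
                             samples_per_pixel=1, number_of_frames=1):
    """Size in bytes of the uncompressed pixel data, per the formula above."""
    bits = rows * columns * bits_allocated * samples_per_pixel * number_of_frames
    return bits // 8


def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio of uncompressed to compressed size (2.0 means half the size)."""
    return uncompressed_bytes / compressed_bytes


# Hypothetical 512 x 512, 16-bit, single-frame grayscale image.
size = uncompressed_pixel_bytes(rows=512, columns=512, bits_allocated=16)
print(size)                             # 524288
print(compression_ratio(size, 262144))  # 2.0
```

Comparing every candidate codec against this one baseline keeps the ratios comparable, whatever transfer syntax the file arrived in.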
As for using HTTP to upload the file, you can also use a service defined by the DICOM Standard, such as the DICOMweb STOW-RS REST service.
I work for LEAD Technologies and if you want to test various compressions on your DICOM files, we have a demo exe (Transfer Syntax) that ships with our free 60 day evaluation SDK that you can use for testing. Also, there is a demo for testing the DICOMWeb REST services. You can download the evaluation copy from our web site.

There is not one solution that perfectly fits all...
BZ2 (the Burrows-Wheeler transform followed by Huffman coding) relies on the principle that "colours" (or gray values, but I will use "colours" in this explanation) which occur frequently in the image are encoded with fewer bits than colours which are rare. Thus, as a rule of thumb: the bigger the image, the better the compression ratio.
JPEG is a different approach which decomposes the image into tiles and optimizes the encoding for each tile. Thus the compression ratio is less dependent on the image size than it is for BZ2. JPEG comes in different flavors (lossy, lossless, JPEG 2000 which can create different serializations of the compressed data for different purposes, e.g. progressive refinement).
Less popular compression algorithms which are valid in DICOM but not widely supported by DICOM products are:
RLE (Run Length Encoding) - the pixel data is described by pairs of colour and number of pixels, so it compresses very well when you have large homogeneous areas in the image. In all other cases it tends to increase the size of the "compressed" image.
JPEG-LS - I do not know how it works internally, but it provides a lossless algorithm and a lossy algorithm in which you can control the loss of information (the maximum difference between a pixel value after compression and the original pixel value). It is said to achieve better ratios than traditional JPEG, but as it is not widely supported, I have not used it in practice yet.
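To illustrate the RLE idea described above (pairs of colour and run length), here is a minimal Python sketch. It is a toy version of the concept only, not the byte-oriented encoding that DICOM actually specifies for its RLE transfer syntax, and the sample pixel rows are made up.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse a pixel sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the pixel sequence."""
    return [value for value, count in pairs for _ in range(count)]


flat = [0] * 12 + [255] * 4       # large homogeneous areas
noisy = [3, 7, 3, 9, 1, 4, 8, 2]  # no two neighbours are equal

print(rle_encode(flat))        # [(0, 12), (255, 4)]
print(len(rle_encode(noisy)))  # 8 pairs for 8 pixels, so larger than the input
```

The second case is the one that bites in practice: with no runs, every pixel costs a value plus a count.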
If you do not want to select the compression algorithm depending on the image type, JPEG-Lossless is probably a good compromise for you. In typical medical images, it achieves an average compression ratio of roughly 1:, a bit more with JPEG-2000.

Related

JPG vs compressed JPG vs WEBP - why WEBP isn't the smallest one?

I have this image (a photo I took on an SGS 9 Plus): an uncompressed JPG image. Its size is 4032 x 3024 and it weighs around 3 MB. I compressed it with TinyJPG Compressor and it came down to 1.3 MB. For PNG images I used Online-Convert and got WebP images much smaller even than those compressed with TinyPNG. I expected something similar here, especially since I read an article, JPG to WebP – Comparing Compression Sizes, where WebP is much smaller than compressed JPG.
But when I convert my JPG to WebP in various online image conversion tools, I get 1.5-2 MB, so the file is bigger than my compressed JPG. Am I missing something? Shouldn't WebP be much smaller than compressed JPG? Thank you in advance for every answer.
These are lossy codecs, so their file size mostly depends on quality setting used. Comparing just file sizes from various tools doesn't say anything without ensuring images have the same quality (otherwise they're incomparable).
There are a couple of possibilities:
JPEG may compress better than WebP. WebP has problems with blurring out of the details, low-resolution color, and using less than full 8 bits of the color space. In the higher end of quality range, a well-optimized JPEG can be similar or better than WebP.
However, most of file size differences in modern lossy codecs are due to difference in quality. The typical difference between JPEG and WebP at the same quality is 15%-25%, but file sizes produced by each codec can easily differ by 10× between low-quality and high-quality image. So most of the time when you see a huge difference in file sizes, it's probably because different tools have chosen different quality settings (and/or recompression has lost fine details in the image, which also greatly affects file sizes). Even visual difference too small for human eye to notice can cause noticeable difference in file size.
My experience is that lossy WebP is superior below quality 70 (in libjpeg terms) and JPEG is often better than WebP at quality 90 and above. In between these qualities it doesn't seem to matter much.
I believe WebP qualities are inflated about 7 points, i.e., to match JPEG quality 85 one needs to use WebP quality 92 (when using the cwebp tool). I didn't measure this well, this is based on rather ad hoc experiments and some butteraugli runs.
Lossy WebP has difficulty compressing complex textures such as dense tree leaves, whereas JPEG's difficulties are with thin lines against flat backgrounds, like a telephone line hanging against the sky, or computer graphics.

LZW or JBIG is better lossless compression algorithm for images?

Which lossless compression algorithm [between LZW or JBIG] is better for compressing data sets consisting of images (colored and monochrome) ?
I have implemented both and tested on smaller data sets [each containing 100 images] and have found inconclusive results.
Please note: I cannot use lossy compression like JPEG because the data after decompression has to be identical to the source. Nor can I use other lossless algorithms like PNG, as they are not supported by the firmware responsible for the decompression.
Neither LZW nor JBIG is optimal, although JBIG (JBIG2) should give you better results.
LZW is not designed for images (e.g., it does not exploit 2D correlation). JBIG (perhaps you mean JBIG2?) does exploit the 2D correlation, although it is designed for monochrome images such as fax pages.
Of course, results will depend on your particular dataset, so if results are inconclusive the best thing you can do is to test on more images (and perhaps differentiate between color and grayscale images).
If your firmware supports it, I would also test JPEG-LS (https://jpeg.org/jpegls/), which in my experience gives good overall lossless compression performance.
JPEG-LS or JPEG 2000 would give better results. You can think about WebP or JPEG XR as well.
Note: If you want to render the compressed image in a browser, you may need to take browser support into account, e.g. JPEG 2000 is supported by Safari, WebP by Chrome and Android browsers, and JPEG XR by IE11 and Edge.

What is the state-of-art in lossy image compression?

What are the state-of-art algorithms when it comes to compressing digital images (say for instance color photos, maybe 800x480 pixels)?
Some of the formats that are frequently discussed as possible JPEG successors are:
JPEG XR (aka HD Photo, Windows Media Photo). According to a study by the Graphics and Media Lab at Moscow State University (MSU), its image quality is comparable to JPEG 2000 and significantly better than JPEG, and its compression efficiency is comparable to JPEG 2000.
WebP is already tested in the wild, mainly on Google properties, where the format is served exclusively to Chrome users (if you connect with a different browser, you get PNG or JPG images instead). It's very web-oriented.
HEVC-MSP. In a study by Mozilla Corporation (Oct 2013), HEVC-MSP performed best in most tests, and in the tests where it was not best it came second to the original JPEG format (but the study only looked at image compression efficiency and not at other metrics that matter: feature sets, run-time performance, licensing...).
JPEG 2000. The most computationally intensive to encode/decode. Compared with the regular JPEG format, it offers advantages such as support for higher bit depths, more advanced compression and a lossless compression option. It is the standard point of comparison for the others, but it has been a bit "slow in acceptance".
Anyway JPEG encoders haven't really reached their full compression potential after 20+ years. Even within the constraints of strong compatibility requirements, there are projects (e.g. Mozilla mozjpeg Project or Google Guetzli) that can produce smaller JPG files without sacrificing quality.
It would depend on what you need to do with the encoded images of course. For webpages and small sizes, lossy compression systems may be suitable, but for satellite images, medical images etc. lossless compression may be required.
None of the formats mentioned above satisfy both situations. Not all of the above formats support every pixel format either, so they cannot be compared like for like.
I've been doing my own research into lossless compression for high bit depth images, and what I've found so far is that a Huffman coder with suitable reversible pre-compression filtering beats JPEG 2000 and JPEG XR in terms of file size by 56% on average (i.e., makes files less than half the size) on real-world cinematic footage, and is faster. It also beats FFV1 in the limited tests I've conducted, producing files under half the size even after FFV1 has truncated the source image pixel depths from 16 bits to 10 bits. Really most surprising.
For lossless compression ratios FLIF is ranked number one for me, but encoding times are astronomical. I've never made a file smaller than a FLIF file when compared. So good things come to those who wait. FLIF uses machine learning to achieve its compression ratios. Applying a lossy pre-compression filter to images before FLIF compression (something the encoder enables) creates visually lossless images that compete with the best lossy encoders, but with the advantage that re-encoding the output files repeatedly won't further reduce quality (as the encoder is lossless).
One thing that is obvious to me: nothing is really state of the art currently. Most formats use old technology, designed at a time when memory and processing power were at a premium. As far as lossless compression goes, FLIF is a big jump forward, but it's an area of research that is wide open. Most research seems to go into lossy compression systems.

Is there an algorithm to compress multiple images that are not much different apart?

I have a set of 100+ images that only differ a few pixels. They also don't have many different colors. Is there a way to compress them together by taking advantage of that?
There's an algorithm: calculate the difference of each image to some reference image. Is there a ready made application? Probably not. One approach could be to combine the images to a very big image (by interleaving or placing them next to each other) and using png.
If this is for archiving and you don't need random access to them, you can concatenate them into a zip / tar (with zero compression) and compress the whole thing. The .bz2 algorithm (Burrows-Wheeler Transform) is able to search for similarities over a much larger window than the deflate of PNG: bzip2 works on blocks of up to 900 kB, whereas deflate looks back at most 32 kB. If the images are large enough, the window size will limit the compression in both algorithms -- that would have to be countered by interleaving or delta compressing between each image.
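A rough way to see the effect of compressing everything as one stream is the Python sketch below, which uses only the standard bz2 module. The "images" are hypothetical stand-ins (20 byte strings sharing a small-palette base, with five bytes changed in each), not real image files, so treat the numbers as indicative only.

```python
import bz2
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in for 20 near-identical images: a shared base of
# 20,000 bytes drawn from a 4-value palette, with five changed bytes each.
base = bytes(rng.choice(b"\x00\x40\x80\xff") for _ in range(20_000))
images = []
for _ in range(20):
    img = bytearray(base)
    for _ in range(5):
        img[rng.randrange(len(img))] = rng.randrange(256)
    images.append(bytes(img))

# Compress each image on its own, then all of them as a single stream.
separate_total = sum(len(bz2.compress(img)) for img in images)
solid_total = len(bz2.compress(b"".join(images)))

print("compressed one by one:   ", separate_total)
print("compressed as one stream:", solid_total)
```

The whole set here is 400,000 bytes, so it fits inside a single bzip2 block; with larger images the block size becomes the limit, as noted above.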
The delta compression is utilized regularly in some video compressing applications and e.g. screen capture and virtual desktop applications where lossless compression is required.
String them together as a movie and use video compression. That is exactly what video compression does.
Nothing ready-made. If you want lossless compression, you could store just a reference image and the delta images (the pixel-level differences between images), each one encoded with PNG. But you must write the delta transformation yourself.
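The delta transformation itself is small. Below is a minimal Python sketch, assuming 8-bit pixel data held as raw bytes and images of identical size. To stay within the standard library it compresses with zlib (deflate, the same algorithm PNG uses internally) rather than writing real PNG files, and the 64 x 64 noise "image" is made up for the demonstration.

```python
import random
import zlib


def delta(reference, image):
    """Byte-wise difference (mod 256); equal pixels become zero bytes."""
    return bytes((p - r) % 256 for p, r in zip(image, reference))


def undelta(reference, diff):
    """Invert delta(): add the reference back, again modulo 256."""
    return bytes((d + r) % 256 for d, r in zip(diff, reference))


# Hypothetical 64 x 64 8-bit reference image and a copy that differs
# from it in exactly three pixels.
rng = random.Random(1)
reference = bytes(rng.randrange(256) for _ in range(64 * 64))
image = bytearray(reference)
for position in (10, 2000, 4000):
    image[position] ^= 0xFF
image = bytes(image)

diff = delta(reference, image)
restored = undelta(reference, diff)  # lossless: restored == image

print("image compressed directly:", len(zlib.compress(image)))
print("delta compressed:         ", len(zlib.compress(diff)))
```

Because the delta is almost entirely zero bytes, it compresses to a tiny fraction of what the image needs on its own; the reference is stored once and every other image only as its delta.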
You could also take advantage of the delta compression mode of the MNG format, an extension to PNG for animation (each of your image would be an animation frame). But the format is not widely supported.
You could also take the same approach (one image = one video frame) and use any standard video format (MPEG) but this would be lossy.

What is the difference between "JPG" / "JPEG" / "PNG" / "BMP" / "GIF" / "TIFF" Image?

I have seen many types of image extensions but have never understood the real differences between them. Are there any links out there that clearly explain their differences?
Are there standards to consider when choosing a particular type of image to use in an application? What do we use for web applications?
Yes. They are different file formats (and their file extensions).
Wikipedia entries for each of the formats will give you quite a bit of information:
JPEG (or JPG, for the file extension; Joint Photographic Experts Group)
PNG (Portable Network Graphics)
BMP (Bitmap)
GIF (Graphics Interchange Format)
TIFF (or TIF, for the file extension; Tagged Image File Format)
Image formats can be separated into three broad categories:
lossy compression,
lossless compression,
uncompressed.
Uncompressed formats take up the most space, but they are exact representations of the image. Bitmap formats such as BMP are generally uncompressed, although compressed BMP files also exist.
Lossy compression formats are generally suited for photographs. They are not suited for illustrations, drawings and text, as the compression artifacts will stand out. Lossy compression, as its name implies, does not encode all the information of the file, so when it is recovered into an image, it will not be an exact representation of the original. However, it is able to compress images very effectively compared to lossless formats, as it discards certain information. A prime example of a lossy compression format is JPEG.
Lossless compression formats are suited for illustrations, drawings, text and other material that would not look good when compressed with lossy compression. As the name implies, lossless compression will encode all the information from the original, so when the image is decompressed, it will be an exact representation of the original. As there is no loss of information in lossless compression, it is not able to achieve as high a compression as lossy compression, in most cases. Examples of lossless image compression formats are PNG and GIF. (GIF only allows 8-bit images.)
TIFF and BMP are both "wrapper" formats, as the data inside can depend upon the compression technique that is used. They can contain both compressed and uncompressed images.
When to use a certain image compression format really depends on what is being compressed.
Related question: Ruthlessly compressing large images for the web
You should be aware of a few key factors...
First, there are two types of compression: Lossless and Lossy.
Lossless means that the image is made smaller, but at no detriment to the quality. Lossy means the image is made (even) smaller, but at a detriment to the quality. If you saved an image in a Lossy format over and over, the image quality would get progressively worse and worse.
There are also different colour depths (palettes): Indexed color and Direct color.
With Indexed it means that the image can only store a limited number of colours (usually 256) that are chosen by the image author, with Direct it means that you can store many thousands of colours that have not been chosen by the author.
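To put rough numbers on the indexed vs direct distinction, here is a small Python sketch of the raw (uncompressed) storage each needs. The function names and the 800 x 600 example are my own, and it assumes 8-bit indices with a 256-entry RGB palette versus 24-bit direct colour.

```python
def indexed_size(width, height, palette_entries=256):
    """Raw bytes for an 8-bit indexed image: one byte per pixel plus an RGB palette."""
    return width * height + palette_entries * 3


def direct_size(width, height, bytes_per_pixel=3):
    """Raw bytes for a direct-colour image: every pixel stores its own R, G, B."""
    return width * height * bytes_per_pixel


print(indexed_size(800, 600))  # 480768
print(direct_size(800, 600))   # 1440000
print(2 ** 24)                 # 16777216 distinct colours with 24-bit direct colour
```

So before any compression is applied, an indexed image starts out at roughly a third of the size, at the cost of the 256-colour limit.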
BMP - Lossless / Indexed and Direct
This is an old format. It is Lossless (no image data is lost on save) but there's also little to no compression at all, meaning saving as BMP results in VERY large file sizes. It can have palettes of both Indexed and Direct, but that's a small consolation. The file sizes are so unnecessarily large that nobody ever really uses this format.
Good for: Nothing really. There isn't anything BMP excels at, or isn't done better by other formats.
GIF - Lossless / Indexed only
GIF uses lossless compression, meaning that you can save the image over and over and never lose any data. The file sizes are much smaller than BMP, because good compression is actually used, but it can only store an Indexed palette. This means that there can only be a maximum of 256 different colours in the file. That sounds like quite a small amount, and it is.
GIF images can also be animated and have transparency.
Good for: Logos, line drawings, and other simple images that need to be small. Only really used for websites.
JPEG - Lossy / Direct
JPEG images were designed to make detailed photographic images as small as possible by removing information that the human eye won't notice. As a result it's a Lossy format, and saving the same file over and over will result in more data being lost over time. It has a palette of thousands of colours and so is great for photographs, but the lossy compression means it's bad for logos and line drawings: Not only will they look fuzzy, but such images will also have a larger file-size compared to GIFs!
Good for: Photographs. Also, gradients.
PNG-8 - Lossless / Indexed
PNG is a newer format, and PNG-8 (the indexed version of PNG) is really a good replacement for GIFs. Sadly, however, it has a few drawbacks: Firstly it cannot support animation like GIF can (well it can, but only Firefox seems to support it, unlike GIF animation which is supported by every browser). Secondly it has some support issues with older browsers like IE6. Thirdly, important software like Photoshop have very poor implementation of the format. (Damn you, Adobe!) PNG-8 can only store 256 colours, like GIFs.
Good for: The main thing that PNG-8 does better than GIFs is having support for Alpha Transparency.
Important Note: Photoshop does not support Alpha Transparency for PNG-8 files. (Damn you, Photoshop!) There are ways to convert Photoshop PNG-24 to PNG-8 files while retaining their transparency, though. One method is PNGQuant, another is to save your files with Fireworks.
PNG-24 - Lossless / Direct
PNG-24 is a great format that combines Lossless encoding with Direct color (thousands of colours, just like JPEG). It's very much like BMP in that regard, except that PNG actually compresses images, so it results in much smaller files. Unfortunately PNG-24 files will still be much bigger than JPEGs, GIFs and PNG-8s, so you still need to consider if you really want to use one.
Even though PNG-24s allow thousands of colours while having compression, they are not intended to replace JPEG images. A photograph saved as a PNG-24 will likely be at least 5 times larger than an equivalent JPEG image, with very little improvement in visible quality. (Of course, this may be a desirable outcome if you're not concerned about filesize, and want to get the best quality image you can.)
Just like PNG-8, PNG-24 supports alpha-transparency, too.
Generally these are either:
Lossless compression
Lossless compression algorithms reduce file size without losing image quality, though they are not compressed into as small a file as a lossy compression file. When image quality is valued above file size, lossless algorithms are typically chosen.
Lossy compression
Lossy compression algorithms take advantage of the inherent limitations of the human eye and discard invisible information. Most lossy compression algorithms allow for variable quality levels (compression) and as these levels are increased, file size is reduced. At the highest compression levels, image deterioration becomes noticeable as "compression artifacting".
Each format is different as described below:
JPEG
JPEG (Joint Photographic Experts Group) files are (in most cases) a lossy format; the DOS filename extension is JPG (other OS might use JPEG). Nearly every digital camera can save images in the JPEG format, which supports 8 bits per color (red, green, blue) for a 24-bit total, producing relatively small files. When not too great, the compression does not noticeably detract from the image's quality, but JPEG files suffer generational degradation when repeatedly edited and saved. Photographic images may be better stored in a lossless non-JPEG format if they will be re-edited, or if small "artifacts" (blemishes caused by the JPEG's compression algorithm) are unacceptable. The JPEG format also is used as the image compression algorithm in many Adobe PDF files.
TIFF
The TIFF (Tagged Image File Format) is a flexible format that normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively, using either the TIFF or the TIF filename extension. The TIFF's flexibility is both blessing and curse, because no single reader reads every type of TIFF file. TIFFs can be lossy or lossless; some offer relatively good lossless compression for bi-level (black & white) images. Some digital cameras can save in TIFF format, using the LZW compression algorithm for lossless storage. The TIFF image format is not widely supported by web browsers. TIFF remains widely accepted as a photograph file standard in the printing business. The TIFF can handle device-specific colour spaces, such as the CMYK defined by a particular set of printing press inks.
PNG
The PNG (Portable Network Graphics) file format was created as the free, open-source successor to the GIF. The PNG file format supports truecolor (16 million colours) while the GIF supports only 256 colours. The PNG file excels when the image has large, uniformly coloured areas. The lossless PNG format is best suited for editing pictures, and the lossy formats, like JPG, are best for the final distribution of photographic images, because JPG files are smaller than PNG files. Many older browsers do not support the PNG file format; however, as of Internet Explorer 7, all contemporary web browsers fully support it. The Adam7 interlacing allows an early preview, even when only a small percentage of the image data has been transmitted.
GIF
GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors such as simple diagrams, shapes, logos and cartoon style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed images or dithered images.
BMP
The BMP file format (Windows bitmap) handles graphics files within the Microsoft Windows OS. Typically, BMP files are uncompressed, hence they are large; the advantage is their simplicity, wide acceptance, and use in Windows programs.
Use for Web Pages / Web Applications
The following is a brief summary for these image formats when using them with a web page / application.
PNG is great for IE6 and up (will require a small CSS patch to get transparency working well). Great for illustrations and photos.
JPG is great for photos online
GIF is good for illustrations when you do not wish to move to PNG
BMP shouldn't be used online within web pages - wastes bandwidth
Source: Image File Formats
Since others have covered the differences, I'll hit the uses.
TIFF is usually used by scanners. It makes huge files and is not really used in applications.
BMP is uncompressed and also makes huge files. It is also not really used in applications.
GIF used to be all over the web but has fallen out of favor since it only supports a limited number of colors and is patented.
JPG/JPEG is mainly used for anything that is photo quality, though not for text. The lossy compression used tends to mar sharp lines.
PNG isn't as small as JPEG but is lossless so it's good for images with sharp lines. It's in common use on the web now.
Personally, I usually use PNG everywhere I can. It's a good compromise between JPG and GIF.
JPG > Joint Photographic Experts Group
1 JPG images support 16 million colors and are best suited for photographs and complex graphics
2 JPGs do not support transparency.
PNG > Portable Network Graphics
1 It has been used as an alternative to the GIF file format since the GIF technology was patented and required permission to use.
2 PNGs allow for 5 to 25 percent greater compression than GIFs, and with a wider range of colors. PNGs use two-dimensional interlacing, which makes them load twice as fast as GIF images.
3 If an image has a lot of colors or requires advanced variable transparency, PNG is the preferred file type.
GIF > Graphics Interchange Format
1 Reduces the number of colors in an image to 256.
2 GIFs also support transparency.
3 GIFs have the unique ability to display a sequence of images, similar to videos, called an animated GIF.
4 If the image has few colors and does not require any advanced alpha transparency effect, GIF is the way to go.
SVG > Scalable Vector Graphics
1 SVGs are a web standard based on XML that describe both static images and animations in two dimensions.
2 SVG allows you to create very high-quality graphics and animations that do not lose detail as their size increases/decreases.
These names refer to different ways to encode pixel image data (JPG and JPEG are the same thing, and TIFF may just enclose a JPEG with some additional metadata).
These image formats may use different compression algorithms, different color representations, different capabilities for carrying additional data other than the image itself, and so on.
For web applications, I'd say JPEG or GIF is good enough. JPEG is used more often due to its higher compression ratio, and GIF is typically used for lightweight animation where Flash (or something similar) is overkill, or in places where a transparent background is desired. PNG can be used too, but I don't have much experience with that. BMP and TIFF are probably not good candidates for web applications.
What coobird and Gerald said.
Additionally, JPEG is the file format name. JPG is the commonly used abbreviated file extension for this format, as earlier Windows systems required a 3-letter file extension. Likewise with TIFF and TIF.
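Since the extension is only a naming convention, software usually recognises the format from the first few bytes of the file instead. Here is a minimal Python sketch of that idea; the helper name and the table are my own and cover only the formats discussed in this thread.

```python
# Well-known leading byte signatures ("magic numbers") of each format.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_format(header):
    """Return the format name matching the leading bytes, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


print(sniff_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # JPEG
print(sniff_format(b"GIF89a\x01\x00\x01\x00"))        # GIF
```

A file named photo.jpg and one named photo.jpeg both start with the same three bytes, which is why the two extensions are interchangeable.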
Web browsers at the moment only display JPEG, PNG and GIF files - so those are the ones that can be shown on web pages.
PNG supports alphachannel transparency.
TIFF can have extended options, e.g. georeferencing for GIS applications.
I recommend only ever using JPEG for photographs, never for images like clip art, logos, text, diagrams, line art.
Favor PNG.
The named ones are all raster graphics, but beside those don't forget the increasingly important vector graphics.
There are compressed and uncompressed types (more or less), but they're all lossless. The most important are:
SVG / SVGZ
EPS
EMF / (WMF)
The file extension tells you how the image is saved. Some of those formats just save the bits as they are, some compress the image in different ways, including lossless and lossy methods. The Web can tell you, although I know some of the patient responders will outline them here.
The web favors gif, jpg, and png, mostly. JPEG is the same (or very close) to jpg.
The specific differences between the various image formats, and their uses, have already been discussed well above.
However, I want to add something about the overall process of capturing a picture and storing it.
The capturing process
Or, you could say, the construction process (as we can now draw or make pictures with computers). If you take a photograph with a camera, you are already using sensors (CCD or CMOS) and algorithms (Bayer pattern filter, sub-sampling, quantization, etc.). There are also things like pixel format and color space. Once you have the basic pixel information, there must be a way to store it.
The basic image file structure
To store the pixel information in a file, we need a convention and related algorithms. To save space there is compression, but the basic problem is encoding the pixels to bytes and decoding the bytes to pixels for display.
A typical image file may consist of several parts, basically two: the metadata (or file header) and the pixel data section. The metadata describes the image itself, such as height and width, file format, etc. The pixel data section is the part that holds the actual picture.
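As a concrete instance of the header-plus-pixel-data layout, here is a minimal Python sketch that builds a tiny uncompressed 24-bit BMP in memory and reads the metadata back out of its headers. BMP is chosen only because its headers are simple; the helper names are my own, and the resolution value (2835 pixels per metre, roughly 72 DPI) is an arbitrary choice.

```python
import struct


def make_bmp(width, height, rgb=(255, 0, 0)):
    """Build a minimal uncompressed 24-bit BMP filled with a single colour."""
    row = bytes(rgb[::-1]) * width    # BMP stores each pixel as B, G, R
    row += b"\x00" * (-len(row) % 4)  # rows are padded to a multiple of 4 bytes
    pixels = row * height
    offset = 14 + 40                  # file header + info header
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                              0, len(pixels), 2835, 2835, 0, 0)
    return file_header + info_header + pixels


def read_bmp_metadata(data):
    """Read width, height, bit depth and pixel-data offset from the headers."""
    magic, _size, _r1, _r2, offset = struct.unpack_from("<2sIHHI", data, 0)
    _hsize, width, height, _planes, bits = struct.unpack_from("<IiiHH", data, 14)
    return {"magic": magic, "width": width, "height": height,
            "bits_per_pixel": bits, "pixel_offset": offset}


bmp = make_bmp(3, 2)
print(read_bmp_metadata(bmp))
print(len(bmp) - 54)  # 24: two rows of 9 pixel bytes, each padded to 12
```

Everything before the pixel offset is metadata; everything after it is the picture itself, which is the two-part structure described above.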
Storing and Displaying
As we said earlier, files are stored on the hard disk as bytes/bits. So image files are nothing special; they are just byte streams. For displaying them, it helps to know how a monitor works. Typical PC monitors use the RGB model for display.
Hope this helps:-)
