Good compression algorithm for low entropy image

I am currently trying to further compress a very simple image. The image stores 2 colors as well as 1 character per "pixel". Each color may be 1 of 16 options, so I have already combined both colors into 1 byte per pixel. I have also implemented MTF and BWT encoding to assist RLE. I am positive I can get some more compression out of it, but I am not sure which algorithm to use. I have tried Huffman, but the image tends to be small already and RLE compresses most of it because of the low entropy, so about half the time Huffman increases the size by adding its decoding table to the file. Please note this will also run on a slower system, so any really heavy algorithms may not work either.

First off, it sounds like you should compress the background and character color images separately. Second, you say that "the colors don't change too often from pixel to pixel". Are some colors "closer" to each other than others? I.e., when the color changes from color x, is it more likely to change to a small subset of the remaining colors? If so, you can remap the colors so that each one is numerically adjacent to those it is likely to change to, and then take differences before coding. Runs of the same color then become runs of zeros, and changes to the "next" color become ones.
Once you have a good representation as a series of bytes with lots of runs and a skewed probability of occurrence of byte values, e.g. lots of zeros and ones, apply zlib or gzip to take advantage of the apparent redundancy and skew.
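As a rough illustration of the two suggestions above (separate planes, then differences, then zlib), here is a minimal Python sketch. The nibble layout (one color in the high 4 bits, the other in the low 4 bits) and all function names are assumptions made for the example, not something stated in the question.

```python
import zlib

def split_planes(pixels: bytes):
    """Split packed bytes into two 4-bit planes (assumed layout: high nibble, low nibble)."""
    high = bytes(p >> 4 for p in pixels)
    low = bytes(p & 0x0F for p in pixels)
    return high, low

def delta_encode(plane: bytes, modulus: int = 16) -> bytes:
    """Replace each value by its difference from the previous one (mod 16)."""
    out = bytearray()
    prev = 0
    for value in plane:
        out.append((value - prev) % modulus)
        prev = value
    return bytes(out)

def delta_decode(deltas: bytes, modulus: int = 16) -> bytes:
    """Invert delta_encode by accumulating the differences."""
    out = bytearray()
    prev = 0
    for delta in deltas:
        prev = (prev + delta) % modulus
        out.append(prev)
    return bytes(out)

def compress_plane(plane: bytes) -> bytes:
    return zlib.compress(delta_encode(plane), 9)

def decompress_plane(blob: bytes) -> bytes:
    return delta_decode(zlib.decompress(blob))
```

Runs of one color turn into runs of zeros after `delta_encode`, which is the skewed input zlib handles well. Whether zlib is light enough for the slower target system would need to be measured.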

Related

Huffman compression Images

I am working on a project I wanted to do for quite a while. I wanted to make an all-round Huffman compressor, which will work, not just in theory, on various types of files, and I am writing it in Python:
text - which is, for obvious reasons, the easiest one to implement, already done, works wonderfully.
images - this is where I am struggling. I don't know how to approach images and how to read them in a simple way that it'd actually help me compress them easily.
I've tried reading them pixel by pixel, but somehow, it actually enlarges the picture instead of compressing it.
What I've tried:
Reading the image pixel by pixel using Image (PIL), getting all the pixels in a list, creating a frequency table (one entry per pixel value) and then encoding it. The problem, imo, is that I am reading each pixel and trying to make a frequency table out of that. That way I get way too many symbols, which leads to too many lengthy Huffman codes (over 8 bits).
I think I may be able to solve this problem by reading a larger set of pixels or something of that sort, because then I'd have a smaller code table and therefore shorter Huffman codes. If I leave it like that, I can, in theory, get a code table of size 256^3 (since each pixel is (0-255, 0-255, 0-255)).
Is there any way to read larger amount of pixels at a time (>1 pixel) or is there a better way to approach images when all needed is to compress?
Thank you all for reading so far, and a special thank you for anyone who tries to lend a hand.
edited: If Huffman is a really bad compression algorithm for images, are there any better ones you can think of? The project I'm working on can use different algorithms for different file types if necessary.
Encoding whole pixels like this often results in far too many unique symbols, that each are used very few times. Especially if the image is a photograph or if it contains many coloured gradients. A simple way to fix this is splitting the image into its R, G and B colour planes and encoding those either separately or concatenated, either way the actual elements that are being encoded are in the range 0..255 and not multi-dimensional.
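To make the symbol-count argument concrete, here is a small Python sketch that splits a list of (R, G, B) tuples into planes and compares the number of distinct symbols and the 0th-order entropy. The helper names and the synthetic test image are made up for illustration; with Pillow, `Image.split()` should give you the bands directly.

```python
from collections import Counter
from math import log2

def entropy_bits(symbols) -> float:
    """0th-order entropy in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def split_rgb(pixels):
    """Turn a list of (r, g, b) tuples into three byte strings, one per plane."""
    reds = bytes(p[0] for p in pixels)
    greens = bytes(p[1] for p in pixels)
    blues = bytes(p[2] for p in pixels)
    return reds, greens, blues

# Synthetic image: every pixel is unique, yet each plane has few distinct values.
pixels = [(r, g, 0) for r in range(16) for g in range(16)]
reds, greens, blues = split_rgb(pixels)

print(len(set(pixels)))                                  # distinct whole-pixel symbols
print(len(set(reds)), len(set(greens)), len(set(blues))) # distinct per-plane symbols
print(entropy_bits(pixels), entropy_bits(reds))          # bits per symbol
```

A Huffman table built over whole pixels needs one entry per distinct pixel, while each plane never needs more than 256 entries.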
But as you suspect, exploiting just 0th-order entropy is not so great for many images, especially photographs. As an example of what some existing formats do: PNG uses filters to take some advantage of spatial correlation (great for smooth gradients). JPG uses quantized discrete cosine transforms, (usually) a colour space transformation to YCbCr (to decorrelate the channels, and to crush chroma more mercilessly than luma) and (usually) chroma subsampling. JPEG2000 uses wavelets and a colour space transformation in both its lossy and lossless forms (though with different wavelets and a different colour space transformation), and also supports subsampling, though dropping a wavelet scale achieves a similar effect.

Algorithm to choose a transparent color for GIF

I have a bunch (up to 255) of colors (out of the 256^3 possible values) and for compression purposes I want to come up with another color that isn't among them.
For example, I have such a small color table: [0,0,0], [1,42,69] -- any one of the remaining 256^3-2 colors would be fine -- no matter whether it is [0,0,7] or [6,6,6].
Can someone provide me with an easy and efficient algorithm to find another color?
UPD: bad ideas are also welcome.
1. Make a hash table of all known colors, and put your colors into it.
2. Make an algorithm that takes a color, and produces its "successor" by incrementing the lowest byte, and continuing with the increment into higher-order bytes when there is a carry.
3. Start at [0,0,0] and check it against the hash table from step 1.
4. Loop until you find the first gap.
This algorithm is linear in the number of colors.
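The steps above can be sketched in Python as follows. Treating blue as the "lowest byte" and the function names are assumptions for the example; a Python `set` plays the role of the hash table.

```python
def successor(color):
    """Increment a color as a 24-bit number, carrying from blue into green into red."""
    r, g, b = color
    value = (((r << 16) | (g << 8) | b) + 1) & 0xFFFFFF
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

def first_unused_color(used_colors):
    """Scan from (0, 0, 0) upward until a color is not in the known set."""
    known = set(used_colors)
    if len(known) >= 256 ** 3:
        raise ValueError("every 24-bit color is already in use")
    candidate = (0, 0, 0)
    while candidate in known:
        candidate = successor(candidate)
    return candidate

print(first_unused_color([(0, 0, 0), (1, 42, 69)]))
```

With N known colors the loop runs at most N + 1 times, since it can step over each known color only once.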
Since now we have two answers, I would like to post my own.
We don't want any color in the existing color table to become transparent. That is why I stated that the color table can be at most 255 colors long.
Because of this, there is at least one Red (or Green or Blue, whatever) channel value left unused. So we don't need a table of 256^3 flags; 256 flags (bits to save memory, or bytes for speed) are enough.
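A minimal Python sketch of this pigeonhole idea, using the red channel and one flag per possible value. The function name and the choice of zero for green and blue are arbitrary assumptions.

```python
def color_with_unused_red(palette):
    """Return a color whose red value appears nowhere in the palette."""
    if len(palette) > 255:
        raise ValueError("palette must hold at most 255 colors")
    seen = [False] * 256            # one flag per possible red value
    for red, _green, _blue in palette:
        seen[red] = True
    unused_red = seen.index(False)  # guaranteed to exist: at most 255 flags are set
    return (unused_red, 0, 0)

print(color_with_unused_red([(0, 0, 0), (1, 42, 69)]))
```

Because no palette entry shares the chosen red value, the result cannot collide with any existing color regardless of its green and blue components.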
Walk through your image, counting the number of times each of the 256 possible pixel values occurs. Use std::min_element (for one possibility) to find the smallest count, and use that color number. If you're really talking about 256 possible color values, that's about it.
If you really have 24 bits per pixel, then you probably want to use a sparse representation for the counts, since (for any reasonable size of picture) many of them will inevitably be zero (you'd need roughly a 16-megapixel picture to even theoretically use all the possible colors). OTOH, on a modern computer, even using the few dozen megabytes (or so) necessary for a dense representation of the count may be worthwhile--it'll probably make your processing faster (no hash codes to compute) and still little enough memory usage that the reduction in processing time is worth it.

How to estimate GIF file size?

We're building an online video editing service. One of the features allows users to export a short segment from their video as an animated GIF. Imgur has a file size limit of 2 MB per uploaded animated GIF.
GIF file size depends on the number of frames, the color depth and the image contents themselves: a solid flat color results in a very lightweight GIF, while a random-color TV-noise animation would be quite heavy.
First I export each video frame as a PNG of the final GIF frame size (fixed, 384x216).
Then, to maximize gif quality I undertake several gif render attempts with slightly different parameters - varying number of frames and number of colors in the gif palette. The render that has the best quality while staying under the file size limit gets uploaded to Imgur.
Each render takes time and CPU resources — this I am looking to optimize.
Question: what could be a smart way to estimate the best render settings depending on the actual images, to fit as close as possible to the filesize limit, and at least minimize the number of render attempts to 2–3?
The GIF image format uses LZW compression, infamous because the owner of the algorithm patent, Unisys, aggressively pursued royalty payments just as the image format got popular. It turned out well: we have PNG to thank for that.
The amount by which LZW can compress the image is very hard to predict and depends greatly on the image content. At best, you can provide the user with a heuristic that estimates the final file size, displayed as, say, a success prediction with a colored bar. You can color it pretty quickly by converting just the first frame. That won't take long on a 384x216 image; it runs in human time, a fraction of a second.
Then extrapolate the effective compression rate of that first image to the subsequent frames, which ought to encode only small differences from the original frame and so ought to have comparable compression rates.
You can't truly know whether it exceeds the site's size limit until you've encoded the entire sequence. So be sure to emphasize in your UI design that your prediction is just an estimate, so your user isn't going to be disappointed too much. And of course provide him with the tools to get the size lowered, something like a nearest-neighbor interpolation that makes the pixels in the image bigger. Focusing on making the later frames smaller can pay off handsomely as well; GIF encoders don't normally do this well by themselves. YMMV.
There's no simple answer to this. Single-frame GIF size mainly depends on image entropy after quantization, and you could try using stddev as an estimator using e.g. ImageMagick:
identify -format "%[fx:standard_deviation]" imagename.png
You can very probably get better results by running a smoothing kernel on the image in order to eliminate some high-frequency noise that's unlikely to be informational, and very likely to mess up compression performance. This goes much better with JPEG than with GIF, anyway.
Then, in general, you want to run a great many samples in order to come up with something of the kind (let's say you have a single compression parameter Q)
STDDEV    SIZE W/Q=1    SIZE W/Q=2    SIZE W/Q=3    ...
value1    v1,1          v1,2          v1,3
After running several dozen tests (but you need to do this only once, not "at runtime"), you will get both an estimate of, say, the size at Q=2 given the stddev and the size at Q=1, and a measurement of its error. You'll then see that an image with stddev 0.45 that compresses to 108 Kb when Q=1 will compress to 91 Kb plus or minus 5 when Q=2, and 88 Kb plus or minus 3 when Q=3, and so on.
At that point you get an unknown image, measure its stddev and its compressed size at Q=1, and you can interpolate the probable size when Q equals, say, 4, without actually running the encoding.
While your service is active, you can store statistical data (i.e., after you really do the encoding, you store the actual results) to further improve estimation; after all you'd only store some numbers, not any potentially sensitive or personal information that might be in the video. And acquiring and storing those numbers would come nearly for free.
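One possible way to turn such a calibration table into an estimator is sketched below in Python. It stores, per Q, the ratio SIZE(Q)/SIZE(Q=1) observed at each stddev and linearly interpolates between neighbouring stddev values. The data layout, the function names and the choice of linear interpolation are all assumptions; the only numbers used are the 108/91/88 Kb figures from the example above.

```python
from bisect import bisect_left

def interpolate_ratio(points, stddev):
    """points: list of (stddev, size_ratio) pairs; clamps outside the measured range."""
    points = sorted(points)
    xs = [x for x, _ in points]
    if stddev <= xs[0]:
        return points[0][1]
    if stddev >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, stddev)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (stddev - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

def estimate_size(stddev, size_q1, calibration, q):
    """Predict the size at setting q from the measured size at Q=1."""
    return size_q1 * interpolate_ratio(calibration[q], stddev)

# Calibration built from the single worked example in the text.
calibration = {2: [(0.45, 91 / 108)], 3: [(0.45, 88 / 108)]}
print(estimate_size(0.45, 108, calibration, 2))
```

A real table would hold many measured points per Q, and you would track the spread of the ratios as well in order to report the "plus or minus" part.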
Backgrounds
It might be worthwhile to recognize images with a fixed background; in that case you can run some adaptations to make all the frames identical in some areas, and have the GIF animation algorithm not store that information. This, when and if you get such a video (e.g. a talking head), could lead to huge savings. It would, however, throw the parameter estimation completely off, unless you could also estimate the actual extent of the background area. In that case, let this area be B and let the frame area be A; the compressed "image" size for five frames would be A+(A-B)*(5-1) instead of A*5, and you could apply this correction factor to the estimate.
Compression optimization
Then there are optimization techniques that slightly modify the image and adapt it for a better compression, but we'd stray from the topic at hand. I had several algorithms that worked very well with paletted PNG, which is similar to GIF in many regards, but I'd need to check out whether and which of them may be freely used.
Some thoughts: LZW algorithm goes on in lines. So whenever a sequence of N pixels is "less than X%" different (perceptually or arithmetically) from an already encountered sequence, rewrite the sequence:
018298765676523456789876543456787654
987678656755234292837683929836567273
here the 656765234 sequence in the first row is almost matched by the 656755234 sequence in the second row. By changing the mismatched 5 to 6, the LZW algorithm is likely to pick up the whole sequence and store it with one symbol instead of three (6567,5,5234) or more.
Also, LZW works with bits, not bytes. This means, very roughly speaking, that the more the 0's and 1's are balanced, the worse the compression will be. The more unpredictable their sequence, the worse the results.
So if we can find out a way of making the distribution more **a**symmetrical, we win.
And we can do it, and we can do it losslessly (the same works with PNG). We choose the most common colour in the image, once we have quantized it. Let that color be color index 0. That's 00000000, eight fat zeroes. Now we choose the most common colour that follows that one, or the second most common colour; and we give it index 1, that is, 00000001. Another seven zeroes and a single one. The next colours will be indexed 2, 4, 8, 16, 32, 64 and 128; each of these has only a single bit 1, all others are zeroes.
Since colors will be very likely distributed following a power law, it's reasonable to assume that around 20% of the pixels will be painted with the first nine most common colours; and that 20% of the data stream can be made to be at least 87.5% zeroes. Most of them will be consecutive zeroes, which is something that LZW will appreciate no end.
Best of all, this intervention is completely lossless; the reindexed pixels will still be the same colour, it's only the palette that will be shifted accordingly. I developed such a codec for PNG some years ago, and in my use case scenario (PNG street maps) it yielded very good results, ~20% gain in compression. With more varied palettes and with LZW algorithm the results will be probably not so good, but the processing is fast and not too difficult to implement.
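A minimal Python sketch of this reindexing, assuming the image is already quantized to palette indices. The most frequent colours get indices 0, 1, 2, 4, ..., 128 as described; ordering the remaining indices by their number of set bits is my own extension, and padding the palette to 256 entries is an assumption about what the container needs.

```python
from collections import Counter

def reindex_by_frequency(pixels: bytes, palette):
    """Give the most frequent colours the indices with the fewest 1 bits."""
    counts = Counter(pixels)
    # Old indices, most frequent first (unused palette entries end up last).
    old_order = sorted(range(len(palette)), key=lambda i: (-counts[i], i))
    # New indices ordered 0, then 1, 2, 4, ..., 128, then two-bit values, and so on.
    new_order = sorted(range(256), key=lambda i: (bin(i).count("1"), i))
    mapping = dict(zip(old_order, new_order))

    new_palette = [(0, 0, 0)] * 256   # padded; unused slots stay black
    for old, new in mapping.items():
        new_palette[new] = palette[old]
    new_pixels = bytes(mapping[p] for p in pixels)
    return new_pixels, new_palette

palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (9, 9, 9)]
pixels = bytes([2, 2, 2, 2, 1, 1, 1, 0, 0, 3])
new_pixels, new_palette = reindex_by_frequency(pixels, palette)
print(list(new_pixels))
```

Every pixel still resolves to the same colour through the new palette, so the change is lossless. Note that a small palette may have to grow to 256 entries to hold index 128, which costs some bytes back.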

How do font ID algorithms work?

I was wondering how automatic font identification services (like WhatTheFont, not question-based ones like Identifont) work. The most basic variant would be a service that lets you upload an image that contains text, and the service returns the name of the font used. How is this done, and how is it done so fast as to be practical? I'm fairly new to this kind of thing, but here's my understanding so far:
Perhaps some pre-processing to reduce noise. I'm not particularly interested in this part.
First the image is run through an OCR to extract the text – simple enough.
Then you go through every font in the tens/hundreds-of-thousands in your database and render the text you have extracted in each one, seeing if it's close to the original. Adjusting for size, alignment, kerning, different weights or italics, etc. How is this possibly fast enough to be practical?
Is this correct?
Please offer some insight into how this is done, and how it's done efficiently.
Let us assume you are doing the match in the raster representation (not on vectorized outlines).
Indeed, the text should be recognized first to reduce the number of comparisons with the characters in the reference fonts; at this stage it matters to avoid any dubious recognition as this would wreak havoc.
Then a stage of normalization is needed: you can transform the character position, size (and possibly italics angle?) to a standard bounding box so that pixel-by-pixel comparison becomes possible. Then the amount of computation will be proportional to the area of the characters times the number of reference fonts.
Beware that normalization in size is not fully accurate, as a big character shrunk down will differ from a smaller character in the same font in a few details and in stroke thickness. It is probably useful to consider two or three representative sizes per font.
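The normalization and comparison stages could look roughly like the Python sketch below. It assumes each glyph is a binary bitmap (a list of rows of 0/1), crops it to its ink bounding box, resamples it to a fixed square with nearest-neighbour sampling and counts mismatching pixels. The names, the 16x16 default and the distance measure are illustrative choices, not how any particular service works.

```python
def normalize(glyph, size=16):
    """Crop a 0/1 bitmap to its ink bounding box and resample to size x size."""
    rows = [y for y, row in enumerate(glyph) if any(row)]
    cols = [x for x in range(len(glyph[0])) if any(row[x] for row in glyph)]
    if not rows:
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1
    return [
        [glyph[top + y * height // size][left + x * width // size] for x in range(size)]
        for y in range(size)
    ]

def distance(a, b):
    """Number of pixels that differ between two normalized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def best_font(sample, references, size=16):
    """references: dict of font name -> bitmap of the same (already recognized) character."""
    target = normalize(sample, size)
    return min(references, key=lambda name: distance(target, normalize(references[name], size)))
```

The cost per character is `size * size` comparisons per reference font, which matches the "area times number of fonts" estimate above.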

Expanding 8-bit color to 24-bit color

Setup
I have a couple hundred Sparkfun LED pixels (similar to https://www.sparkfun.com/products/11020) connected to an Arduino Uno and want to control the pixels from a PC using the built-in Serial-over-USB connection of the Arduino.
The pixels are individually addressable; each has 24 bits for the color (RGB). Since I want to be able to change the color of each pixel very quickly, the transmission of the data from the PC to the Arduino has to be very efficient (the further transmission of data from the Arduino to the pixels is very fast already).
Problem
I've tried simply sending the desired RGB-Values directly as is to the Arduino but this leads to a visible delay, when I want to for example turn on all LEDs at the same time. My straightforward idea to minimize the amount of data is to reduce the available colors from 24-bit to 8-bit, which is more than enough for my application.
If I do this, I have to expand the 8-bit values from the PC to 24-bit values on the Arduino to set the actual color on the pixels. The obvious solution here would be a palette that holds all available 8-bit values and the corresponding 24-bit colors. I would like to have a solution without a palette though, mostly for memory space reasons.
Question
What is an efficient way to expand an 8-bit color to a 24-bit one, preferably one that preserves the color information accurately? Are there standard algorithms for this task?
Possible solution
I was considering a format with 2 bits for each R and B and 3 bits for G. These values would be packed into a single byte that would be transmitted to the Arduino and then be unpacked using bit-shifting and interpolated using the map() function (http://arduino.cc/en/Reference/Map).
Any thoughts on that solution? What would be a better way to do this?
R2B2G3 would give you very few colors (and there's actually one more bit left). I don't know if it would be enough for your application. You can use a dithering technique to make 8-bit images look a little better.
Alternatively, if you have any preferred set of colors, you can store known palette on your device and never send it over the wire. You can also store multiple palettes for different situations and specify which one to use with small integer index.
On top of that it's possible to implement some simple compression algorithm like RLE or LZW and decompress after receiving.
And there are some very fast compression libraries with small footprint you can use: Snappy, miniLZO.
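For a sense of how little code the RLE option needs, here is a minimal byte-oriented sketch in Python. The (count, value) pair format with runs capped at 255 is just one possible encoding chosen for the example; the decoder is the part that would be ported to the Arduino.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (run_length, value) pairs, with run_length in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(blob: bytes) -> bytes:
    """Expand (run_length, value) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(blob), 2):
        out += bytes([blob[i + 1]]) * blob[i]
    return bytes(out)
```

This pays off when many neighbouring LEDs share a color (for example, turning everything on at once) and doubles the size in the worst case where no two neighbours match.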
Regarding your question “What would be a better way to do this?”, one of the first things to do (if not yet done) is increase the serial data rate. An Arduino Forum suggests using 115200 bps as a standard rate, and trying 230400 bps. At those rates you would need to write the receiving software so it quickly transfers data from the relatively small receive buffer into a larger buffer, instead of trying to work on the data out of the small receive buffer.
A second possibility is to put activation times into your data packets. Suppose F1, F2, F3... are a series of frames you will display on the LED array. Send those frames from the PC ahead of time, or during idle or wait times, and let the Arduino buffer them until they are scheduled to appear. When the activation time arrives for a given frame, have the Arduino turn it on. If you know in advance the frames but not the activation times, send and buffer the frames and send just activation codes at appropriate times.
Third, you can have multiple palettes and dynamic palettes that change on the fly and can use pixel addresses or pixel lists as well as pixel maps. That is, you might use different protocols at different times. Protocol 3 might download a whole palette, 4 might change an element of a palette, 5 might send a 24-bit value v, a time t, a count n, and a list of n pixels to be set to v at time t, 6 might send a bit map of pixel settings, and so forth. Bit maps can be simple 1-bit-per-pixel maps indicating on or off, or can be k-bits-per-pixel maps, where a k-bit entry could specify a palette number or a frame number for a pixel. This is all a bit vague because there are so many possibilities; but in short, define protocols that work well with whatever you are displaying.
Fourth, given the ATmega328P's small (2KB) RAM but larger (32KB) flash memory, consider hard-coding several palettes, frames, and macros into the program. By macros, I mean routines that generate graphic elements like arcs, lines, open or filled rectangles. Any display element that is known in advance is a candidate for flash instead of RAM storage.
Your (2, 3, 2) bit idea is used "in the wild." It should be extremely simple to try out. The quality will be pretty low, but try it out and see if it meets your needs.
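To try it out quickly, the packing and expansion can be sketched as below (Python here for illustration; the same shifts carry over to Arduino C++). The bit layout `RRGGGBB` in the low seven bits and the use of bit replication instead of `map()` are assumptions made for the example.

```python
def pack_232(r: int, g: int, b: int) -> int:
    """Keep the top 2 bits of red and blue and the top 3 bits of green."""
    return ((r >> 6) << 5) | ((g >> 5) << 2) | (b >> 6)

def unpack_232(byte: int):
    """Expand back to 8 bits per channel so that full scale maps to 255."""
    r2 = (byte >> 5) & 0x03
    g3 = (byte >> 2) & 0x07
    b2 = byte & 0x03
    r = r2 * 85                            # 0, 85, 170, 255
    g = (g3 << 5) | (g3 << 2) | (g3 >> 1)  # replicate the 3 bits across 8 bits
    b = b2 * 85
    return r, g, b

print(unpack_232(pack_232(255, 128, 0)))
```

Replicating the bits keeps black at 0 and the brightest code at 255, which a plain left shift would not do.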
It seems unlikely that any other solution could save much memory compared to a 256-color lookup table, if the lookup table stays constant over time. I think anything successful would have to exploit a pattern in the kind of images you are sending to the pixels.
Any way you look at it, what you're really going for is image compression. So, I would recommend looking at the likes of PNG and JPG compression, to see if they're fast enough for your application.
If not, then you might consider rolling your own. There's only so far you can go with per-pixel compression; size-wise, your (2,3,2) idea is about as good as you can expect to get. You could try a quadtree-type format instead: take the average of a 4-pixel block, transmit a compressed (lossy) representation of the differences, then apply the same operation to the half-resolution image of averages...
As others point out, dithering will make your images look better at (2,3,2). Perhaps the easiest way to dither for your application is to choose a different (random or quasi-random) fixed quantization threshold offset for each color of each pixel. Both the PC and the Arduino would have a copy of this threshold table; the distribution of thresholds would prevent posterization, and the Arduino-side table would help maintain accuracy.
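A rough Python sketch of the threshold-offset idea for a single 2-bit channel follows. The table construction (a shuffled repetition of the offsets 0..84, seeded so it is reproducible) and the function names are assumptions; only the PC-side quantization and the plain expansion are shown.

```python
import random

STEP = 85  # spacing between the four 2-bit output levels 0, 85, 170, 255

def make_threshold_table(n_pixels: int, seed: int = 0):
    """One fixed offset in 0..84 per pixel, spread evenly and then shuffled."""
    table = [i % STEP for i in range(n_pixels)]
    random.Random(seed).shuffle(table)
    return table

def quantize_2bit(values, table):
    """Add each pixel's offset before truncating to a 2-bit code."""
    return [min(3, (v + t) // STEP) for v, t in zip(values, table)]

def expand_2bit(codes):
    """Map the 2-bit codes back to 8-bit channel values."""
    return [c * STEP for c in codes]

table = make_threshold_table(85)
codes = quantize_2bit([42] * 85, table)
print(sum(expand_2bit(codes)) / len(codes))
```

A flat area of value 42 comes out as a mix of levels 0 and 85 whose average stays close to 42, instead of snapping every pixel to the same level.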

Resources