A bit of tiling - algorithm

I just finished writing a small script to combine a lot of png pictures into
a bigger one for CSS Sprite.
So basically, I have a list of dimensions [(w1,h2), ..., (wn,hn)]
and I need to put them into a frame with dimension (W,H) with WH as
small as possible. (Of course they cannot overlap)
The heuristic I used is obviously not optimal. I was wondering
if you had any ideas on the subject?
Do you think some smarter constraint, like grouping images with
similar histograms would make the png smaller?

It is not obvious from how you pose the question whether it is the dimensions of the picture, or the size of the PNG file, which you want to optimize, as picture histograms do not affect their dimensions (width and height).
However, finding minimal K so that, say, max(W, H) < K, so that all your pictures can be fit into a rectangular area W wide and H tall is an NP-complete problem, which means that it is difficult to solve well in general. See for example here.

How does the output from your script compare to the output from ImageMagick?

About making the resulting png smaller:
Be sure to use best compression level (level 9) and if possible, use PNGOUT to compress the PNG even better (there's a PNGOUT plugin for IrfanView, too).
Grouping images with similar histograms could be good, note that PNG uses zLib which uses a sliding window, so it work best if you group the images horizontal.
EDIT: Same for free space. It should be better to have free space at the top or the bottom of the image than at the left or right. The color of the free space shouldn't really matter, some color or black or white should be fine.
By the way, the size of the zLib sliding window is 32K, so it's not that good for detecting image duplicates if images are big. You'd better process the images yourself with an own algorithm in this case and use some duplicate removal or delta filter.

Related

Huffman compression Images

I am working on a project I wanted to do for quite a while. I wanted to make an all-round huffman compressor, which will work, not just in theory, on various types of files, and I am writing it in python:
text - which is, for obvious reasons, the easiet one to implement, already done, works wonderfully.
images - this is where I am struggling. I don't know how to approach images and how to read them in a simple way that it'd actually help me compress them easily.
I've tried reading them pixel by pixel, but somehow, it actually enlarges the picture instead of compressing it.
What I've tried:
Reading the image pixel by pixel using Image(PIL), get all the pixels in a list, create a freq table (for each pixel) and then encrypt it. Problem is, imo, that I am reading each pixel and trying to make a freq table out of that. That way, I get way too many symbols, which leads to too many lengthy huffman codes (over 8 bits).
I think I may be able to solve this problem by reading a larger set of pixels or anything of that sort because then I'd have a smaller code table and therefore less lengthy huffman codes. If I leave it like that, I can, in theory, get 255^3 sized code table (since each pixel is (0-255, 0-255, 0-255)).
Is there any way to read larger amount of pixels at a time (>1 pixel) or is there a better way to approach images when all needed is to compress?
Thank you all for reading so far, and a special thank you for anyone who tries to lend a hand.
edited: If huffman is a real bad compression algorithm for images, are there any better ones you can think off? The project I'm working on can take different algorithms for different file types if it is neccessary.
Encoding whole pixels like this often results in far too many unique symbols, that each are used very few times. Especially if the image is a photograph or if it contains many coloured gradients. A simple way to fix this is splitting the image into its R, G and B colour planes and encoding those either separately or concatenated, either way the actual elements that are being encoded are in the range 0..255 and not multi-dimensional.
But as you suspect, exploiting just 0th order entropy is not so great for many images, especially photographs. As example of what some existing formats do, PNG uses filters to take some advantage of spatial correlation (great for smooth gradients), JPG uses quantized discrete cosine transforms and (usually) a colour space transformation to YCbCr (to decorrelate the channels, and to crush Chroma more mercilessly than Luma) and (usually) Chroma subsampling, JPEG2000 uses wavelets and colour space transformation both in its lossy and lossless forms (though different wavelets, and a different colour space transformation) and also supports subsampling though dropping a wavelet scale achieves a similar effect.

Starting point for image recognition?

I have a set of 274 color images (each one is 200x150 pixels). Each image is visually distinct. I would like to build an app which accepts an up/down-scaled version of one of the base set of images and determines the closest match.
I'm a senior software engineer but am totally new to image recognition. I'd really appreciate any recommendations as to where to start.
If you're comparing extremely similar images, it's in theory sufficient to calculate the Euclidean distance between the 2 images. The images must be the same size to do so, so it is often necessary to rescale an image to do so (generally the larger image is scaled down). Note that aliasing issues can happen here, so pay some attention to your downsampling algorithm. There's also an issue if your images don't have the same aspect ratio.
However, this is almost never done in practice since it's extremely slow. For N images of size WxH and 3 color channels, it requires N x W x H x 3 comparisons, which quickly gets unworkable (consider that many users can have over 1000 images of size >1000x1000).
Generally we attempt to reduce the image to a smaller array that captures the image information much more briefly, called a visual descriptor. For example taking a 1024x1024x3 image and reducing it to a 128 length vector. This needs only be calculated once for the reference images, and then stored in an appropriate data structure. Then we can compare the descriptor for the query image against the descriptor for the reference images.
The cost of calculating the distance for our dataset of N images for a descriptor of length L is then N x L instead of the original N x W x H x 3
So the issue is to find efficient descriptors that are (a) cheap to compute and (b) capture the image accurately. This is still an active area of research, but I can suggest some:
Histograms are probably the simplest way to do this, although they do very poorly with any illumination change and incorporate only color information, no spatial information. Make sure you normalise your histogram before doing any comparison
Perceptual hashing works well with very similar images or slightly cropped images. See here
GIST descriptors are powerful, but more complex, see here

layout algorithm for tightly-packed image thumbnails

I'm working on an image gallery and I'd like to tightly pack the image thumbnails. The thumbnails are:
different aspect ratios
available at the same source resolution (longest edge 256 pixels)
I'd like to come up with an optimal solution (will probably have to be a heuristic) that allowed me to balance:
the padding between each thumbnail (preferably constant)
the consistency of thumbnail size (preferably all the same size)
the amount of each image that gets cropped for the display (preferably none)
the proximity of images consistent with their sort order (preferably sort-neighbours will be near one another in the grid)
I think this is a variant of the rectangle packing problem.
I've found some good references: Fast Optimizing Rectangle Packing Algorithm for Building CSS Sprites
But I wanted to check with the experts to see if anyone knows of:
any established algorithms that are available publicly,
any open source libraries that implement them or
any other mathematical references or guidance that might help me produce something as good as: http://labs.tineye.com/multicolr#colors=4b669e;weights=100;
I have come up with something like this (now also with code on github)
https://mendrik.github.io/diorama/
I should add that the order will be random, and the sizes try to be uniform, but for me it was more important to fill out the entire space rather than keep the sizes consistent. You can resize the browser window to see how it works.
If your height is not fixed, there are several other options, mostly knapsack or partitioning algorithms. 2d bin backing will leave you with gaps or wont find solutions that always fit all images.
my algorithm has almost no cropping and fits always all images into the given space, provided there are enough combinations to do so. the less images the more cropping obviously.

How do I locate black rectangles in a grid and extract the binary code from that

i'm working in a project to recognize a bit code from an image like this, where black rectangle represents 0 bit, and white (white space, not visible) 1 bit.
Somebody have any idea to process the image in order to extract this informations? My project is written in java, but any solution is accepted.
thanks all for support.
I'm not an expert in image processing, I try to apply Edge Detection using Canny Edge Detector Implementation, free java implementation find here. I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduce it (scale factor = 0.4) to have fast processing and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how i can decode white rectangle with 0 bit value, and no rectangle with 1?
The image have 10 line X 16 columns. I don't use python, but i can try to convert it to Java.
Many thanks to support.
This is recognising good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and vertical profile of the image. i.e. sum up values in all columns and all rows and store in arrays. for an image of MxN (width x height) you will have M cells in horizontal profile and N cells in vertical profile.
Use a thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column. So black cells will define a location of interest (where you will expect the marks).
Based on this, you can define in lozenges in the form and you get coordinates of lozenges (rectangles where you have marks) and then you just add up pixel values in each lozenge and based on the number, you can define if it has mark or not.
Case 2: Skew (slant in the image)
Use fourier (FFT) to find the slant value and then transform it.
Case 3: Stretch or shrink
Pretty much the same as 1 but noise is higher and reliability less.
Aliostad has made some good comments.
This is OMR and you will find it much easier to get good consistent results with a good image processing library. www.leptonica.com is a free open source 'C' library that would be a very good place to start. It could process the skew and thresholding tasks for you. Thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python, you can use the imaging library to open the image. Then load the height and the widths, and segment the image into a grid accordingly, by Height/Rows and Width/Cols. Then, just look for black pixels in those regions, or whatever color PIL registers that black to be. This obviously relies on the grid like nature of the data.
Edit:
Doing Edge Detection may also be Fruitful. First apply an edge detection method like something from wikipedia. I have used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 (if you want the boxes darker just increase this value) into black and otherwise make it completely white. Then create bounding boxes, lines where the pixels are all white. If data isn't terribly skewed, then this should work pretty well, otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
Edit2:
Denis, how large is your dataset and how large are the images? If you have thousands of these images, then it is not feasible to manually remove the borders (the red background and yellow bars). I think this is important to know before proceeding. Also, I think the prewitt edge detection may prove more useful in this case, since there appears to be less noise:
The previous method of segmenting may be applied, if you do preprocess to bin in the following manner, in which case you need only count the number of black or white pixels and threshold after some training samples.

Get dominant colors from image discarding the background

What is the best (result, not performance) algorithm to fetch dominant colors from an image. The algorithm should discard the background of the image.
I know I can build an array of colors and how many they appear in the image, but I need a way to determine what is the background and what is the foreground, and keep only the second (foreground) in mind while read the dominant colors.
The problem is very hard especially for gradient backgrounds or backrounds with patterns (not plain)
Isolating the foreground from the background is beyond the scope of this particular answer, but...
I've found that applying a pixelation filter to an image will draw out a really good set of 'average' colours.
Before
After
I sometimes use this approach to derive a pallete of colours with a particular mood. I first find a photograph with the general tones I'm after, pixelate and then sample from the resulting image.
(Thanks to Pietro De Grandi for the image, found on unsplash.com)
The colour summarizer is a pretty sweet spot for info on this subject, not to mention their seemingly free XML Web API that will produce descriptive colour statistics for an image of your choosing, reporting back the following formatted with swatches in HTML or as XML...
what is the average color hue, saturation and value in my image?
what is the RGB colour that is most representative of the image?
what do the RGB and HSV histograms look like?
what is the image's human readable colour description (e.g. dark pure blue)?
The purpose of this utility is to generate metadata that summarizes an
image's colour characteristics for inclusion in an image database,
such as Flickr. In particular this tool is being used to generate
metadata for Flickr's Color Fields group.
In my experience though.. this tool still misses the "human-readable" / obvious "main" color, A LOT of the time. Silly machines!
I would say this problem is closer to "impossible" than "very hard". The only approach to it that I can think of would be to make the assumption that the background of an image is likely to consist of solid blocks of similar colors, while the foreground is likely to consist of smaller blocks of dissimilar colors.
If this assumption is generally true, then you could scan through the whole image and weight pixels according to how similar or dissimilar they are to neighboring pixels. In other words, if a pixel's neighbors (within some arbitrary radius, perhaps) were all similar colors, you would not incorporate that pixel into the overall estimate. If the neighbors tend to be very different colors, you would weight the pixel heavily, perhaps in proportion to the degree of difference.
This may not work perfectly, but it would definitely at least tend to exclude large swaths of similar colors.
As far as my knowledge of image processing algorithms extends , there is no certain way to get the "foreground"; it is only possible to get the borders between objects. You'll probably have to make do with an average, or your proposed array count method. In that, you'll want to give colours with higher saturation a higher "score" as they're much more prominent.

Resources