If there is a two dimensional array of booleans, would there be an efficient way to represent this as an image and save this using minimal space? So just one bit for every pixel, having it either be black or white?
so you could put a bitmap header on a buffer and display it this way, you wont save any memory, but you will be able to view it... if you are looking to save space there are lots of lossless encoding techniques... huffman coding and lzw are some methods... some of those methods get grouped into formats like zip, bzip, gzip, deflate, etc
My guess is no, because in an image, the color attribute of each pixel is represented by at least 8 bits. So you in effect are using an 8-bit byte to store a value which can be represented by one solitary bit (0 or 1).
In addition, there are other attributes that describe each pixel in an image, including alpha channel, opacity, and so on.
So, in short, although it may be visually pleasing to use images to store binary data, it would in fact be using way more storage space.
Most programming languages have native support for biinary data, and these provide much more efficient storage for them.
Related
I am working on a project I wanted to do for quite a while. I wanted to make an all-round huffman compressor, which will work, not just in theory, on various types of files, and I am writing it in python:
text - which is, for obvious reasons, the easiet one to implement, already done, works wonderfully.
images - this is where I am struggling. I don't know how to approach images and how to read them in a simple way that it'd actually help me compress them easily.
I've tried reading them pixel by pixel, but somehow, it actually enlarges the picture instead of compressing it.
What I've tried:
Reading the image pixel by pixel using Image(PIL), get all the pixels in a list, create a freq table (for each pixel) and then encrypt it. Problem is, imo, that I am reading each pixel and trying to make a freq table out of that. That way, I get way too many symbols, which leads to too many lengthy huffman codes (over 8 bits).
I think I may be able to solve this problem by reading a larger set of pixels or anything of that sort because then I'd have a smaller code table and therefore less lengthy huffman codes. If I leave it like that, I can, in theory, get 255^3 sized code table (since each pixel is (0-255, 0-255, 0-255)).
Is there any way to read larger amount of pixels at a time (>1 pixel) or is there a better way to approach images when all needed is to compress?
Thank you all for reading so far, and a special thank you for anyone who tries to lend a hand.
edited: If huffman is a real bad compression algorithm for images, are there any better ones you can think off? The project I'm working on can take different algorithms for different file types if it is neccessary.
Encoding whole pixels like this often results in far too many unique symbols, that each are used very few times. Especially if the image is a photograph or if it contains many coloured gradients. A simple way to fix this is splitting the image into its R, G and B colour planes and encoding those either separately or concatenated, either way the actual elements that are being encoded are in the range 0..255 and not multi-dimensional.
But as you suspect, exploiting just 0th order entropy is not so great for many images, especially photographs. As example of what some existing formats do, PNG uses filters to take some advantage of spatial correlation (great for smooth gradients), JPG uses quantized discrete cosine transforms and (usually) a colour space transformation to YCbCr (to decorrelate the channels, and to crush Chroma more mercilessly than Luma) and (usually) Chroma subsampling, JPEG2000 uses wavelets and colour space transformation both in its lossy and lossless forms (though different wavelets, and a different colour space transformation) and also supports subsampling though dropping a wavelet scale achieves a similar effect.
I remember a story about someone filtering images with a spam filter which he fed with some training data.
I come to the point where I exactly need something like this.
I have a lot different types of images (mainly people, e.g. selfies, group pictures, portraits, ..) but I only want a certain type (e.g. only male) of them.
With the right algorithm and training data I think it's possible to get it to the point where I can pass an image to it and i get true or false whether it matches my type or not.
I had a look at a few Face/Gender Detection APIs, but none of them worked for me that's why I want to try the approach with the spam-filter - seems like a funny idea.
Here's what I need:
a trainable spam-filter algorithm/code sample/API
has to work offline
preferably for C# or Java
I already spent a few hours trying different things and googling, now I'm here and I'd like to get your opinion on this problem and the solution you think is appropriate.
Buddha
There is a simple image comparison algorithm that you can read about here: compareImages php class.
Basically the way it works is this:
it takes an image (a cropped image would be best), scales it down to a 8x8 pixels image, converts it to a BW / Greyscale image, and then it calculates the mean value of the pixels (which is the average value).
Then it goes over all the pixels of the scaled image (64 pixels), and in every pixel where the pixel's value >= the mean value, it puts "1", and if the pixel's value < the mean value, it puts "0", resulting in a 64bit "signature" value of 0s and 1s.
This signature value is what identifies the image, and then you can save this signature value in some kind of a database, as your "learned" filter.
Then if an email arrives with some images.. you can just crop them, and scan them, produce a signature, and see if it matches any known signature in your database.
The good things about this algorithm are:
It is very fast and scalable (scaling an image down to 8x8 is fast, and scanning the pixels as described is fast too).
Because it converts the image to greyscale & resizes it down, it means it can detect any color variations or sizes of the same image.
Because you use 64bit signatures, it doesn't take alot of space in your database as well.
Hope this helps.
I would like to know why we need to decode let's say a png to a bitmap in order to show the image.
Why not just show the png like that (encoded).
I'm asking here a moron type of question on purpose. It's clear to me it's impossible to show an encoded image just like that but I want to know why, and how an image is shown on a screen because it's easy just to do :
canvas.drawBitmap(((AndroidImage)Image).bitmap, x, y, null);
I want to understand the full of it. I'm guessing we need to show every pixels one by one, but I want more details.
It's easy to know how to do, it's a bit harder to understand why.
If someone has a course/tuto/article/explanation that explains it... I would appreciate
Thanks in advance
PS : Please don't respond "you need to decode/convert png to bitmap" I know that... And that's not my question
There are lots of reasons. There is not really a direct relation between 'a value in a file' and 'a pixel on a screen'.
You need to know the width and height of the bitmap. You cannot infer this from the image size -- it has to be stored somewhere inside the image file. (Or anywhere else. Point is, you have to know its size.)
You need to know the bit depth and color model of the bitmap. You cannot meaningfully copy an 8-bit indexed image directly onto a screen that accepts 32-bit BGR ordering with an unused byte, for example.
Your example, the PNG file format, specifies that all image data is compressed. This is for a sane reason: the PNG format was designed for use on web pages, in a time period where every byte still counted. But even the lowly simple BMP file format uses a very specific form of 'encoding': in its 24-bit format, every line consists of sets of BGR values for each pixel and is padded at the end with enough bytes to make its total length evenly divisible by 4.
JPEG uses an even more advanced encoding scheme (which is too difficult to explain in a few short words) so it can compress images even more. The advanced encoding scheme allows far more compression than regular methods (which in turn means there is only the tiniest relation between 'values in the file' and 'pixels on the screen').
I'm trying to implement JPEG Compression (or as close to it as I can), but there are some points that I need clarity on with the actual implementation. I will explain what I currently know and where I see the issues, if anyone could clear them up that would be fantastic.
The first step is to split the image into 8x8 blocks. But I am not sure on the best way to do this, for example, what dimension array would be best to use to store all of these segments considering that it will have to have chroma down sampled and then DCT applied. Would it be 3D array (two dimensions to store 2D elements of the image and then one dimension for the color channels) and then iterate through in groups of 8 or a 4D array (with an extra dimension for storing each 8x8 group) or another method entirely.
I can then potentially see issues with the chrominance down sampling because then the size of the array would have to change size once the number of chrominance values have been reduced and these would then have to be put into the DCT which cant really take all of the different sized arrays for chrominance and luminescence at the same time.
Also is the idea of the DCT that it takes all three color channels of the 8x8 group and then converts the three values to one value thus saving space or does it take each color channel one at a time (if so I don't really understand the point how converting to Fourier space makes the compression more efficient)? Also I have noticed that the values I get for the DCT are well out of bounds of 0-255 and are instead much higher. As far as I know these values for each 8x8 block would be divided by the IJG standard quantization matrix follow by different entropy encoding.
I realize this question covers a lot of areas and is quite messy, but I can provide any additional information if required, any help would be greatly appreciated.
"Digital Video Compression" by Peter Symes has a chapter on JPEG, and is a good introduction to compression in general.
The reference implementation for JPEG might be a good start.
I am wondering what the data structure is behind storing images with HDR data. I understand how regular images (rgba) and cubemaps are stored. I doubt its as simple as storing multiple images at different exposures inside the same file.
You've probably moved on long ago, but I thought it worth posting references for anyone else who happened upon this question.
Here is an old reference for the Radiance .pic (now .hdr) file format. The useful info starts at the bottom of page 29.
http://radsite.lbl.gov/radiance/refer/filefmts.pdf
excerpt:
The basic idea is to store a 1-byte mantissa for each of three
primaries, and a common 1-byte exponent. The accuracy of these values
will be on the order of 1% (+/-1 in 200) over a dynamic range from
10^-38 to 10^38.
And here is a more recent reference for JPEG HDR format: http://www.anyhere.com/gward/papers/cic05.pdf
It's generally a matter of increasing the range of values (in an HSV sense) representable, so you can use e.g. RGB[A] where each element is a 16-bit int, 32-bit int, float, double etc. instead of a JPEG-type-quality 8-bit int. There's a trade-off between increasing the range represented, retaining fine gradations within that range, and whether some particular intensity levels are given priority via some non-linearity in the mapping (e.g. storing a log of the value).
The raw file from the camera normally stores the 12-14bit values from the Bayer mask - so effectively a greeyscale. These are sometimes compressed losslessly (in Canon or Nikon) or as 16bit values (Olympus). The header also contains the white balance and gain calibrations for the red,green,blue masked pixels so you can generate a color image.
Once you have a color image you can store it however you want, normally 16bit RGB is the easiest.
Here is some information on the Radiance file format, used for HDR images. It uses 32-bit floating-point numbers.
First, I am not sure if there is a public format for storing multiple images at different exposures inside cause the usage is rare. Those multiple images are used as one sort of HDR sources, but they are not HDR, they are just normal LDR (L for low) or SDR (S for standard?) images encoded like JPEG from digital cameras.
It is more common to store resulting in HDR format and the point is just like everyone mentioned, in floating point.
There are some HDR formats:
OpenEXR
TIF
Radiance
...
You can get more info from wiki