I'm trying to remove damaged jpeg duplicates of some 27k photos. Unfortunately most of these are actually damaged duplicates showing half or less of the original image before cutting out to mess/grey.
Is there any intelligent algorithm to hash a picture that instead of hashing a reduced size version of the full image (as in aHash, pHash and dHash) does it pixel by pixel (starting top left and reading LTR)?
The thing is most algorithms just reduce the image size and then create an hash in order to compare the pictures. As these damaged files really lack most of the data it's impossible to compare the first "few lines or few pixels of an image". The only software coming close to this is AllDup, but it only does a comparison bit-by-bit and not checking the actual image data.
Would that even be possible?
Thanks in advance.
Related
I am working on a project I wanted to do for quite a while. I wanted to make an all-round huffman compressor, which will work, not just in theory, on various types of files, and I am writing it in python:
text - which is, for obvious reasons, the easiet one to implement, already done, works wonderfully.
images - this is where I am struggling. I don't know how to approach images and how to read them in a simple way that it'd actually help me compress them easily.
I've tried reading them pixel by pixel, but somehow, it actually enlarges the picture instead of compressing it.
What I've tried:
Reading the image pixel by pixel using Image(PIL), get all the pixels in a list, create a freq table (for each pixel) and then encrypt it. Problem is, imo, that I am reading each pixel and trying to make a freq table out of that. That way, I get way too many symbols, which leads to too many lengthy huffman codes (over 8 bits).
I think I may be able to solve this problem by reading a larger set of pixels or anything of that sort because then I'd have a smaller code table and therefore less lengthy huffman codes. If I leave it like that, I can, in theory, get 255^3 sized code table (since each pixel is (0-255, 0-255, 0-255)).
Is there any way to read larger amount of pixels at a time (>1 pixel) or is there a better way to approach images when all needed is to compress?
Thank you all for reading so far, and a special thank you for anyone who tries to lend a hand.
edited: If huffman is a real bad compression algorithm for images, are there any better ones you can think off? The project I'm working on can take different algorithms for different file types if it is neccessary.
Encoding whole pixels like this often results in far too many unique symbols, that each are used very few times. Especially if the image is a photograph or if it contains many coloured gradients. A simple way to fix this is splitting the image into its R, G and B colour planes and encoding those either separately or concatenated, either way the actual elements that are being encoded are in the range 0..255 and not multi-dimensional.
But as you suspect, exploiting just 0th order entropy is not so great for many images, especially photographs. As example of what some existing formats do, PNG uses filters to take some advantage of spatial correlation (great for smooth gradients), JPG uses quantized discrete cosine transforms and (usually) a colour space transformation to YCbCr (to decorrelate the channels, and to crush Chroma more mercilessly than Luma) and (usually) Chroma subsampling, JPEG2000 uses wavelets and colour space transformation both in its lossy and lossless forms (though different wavelets, and a different colour space transformation) and also supports subsampling though dropping a wavelet scale achieves a similar effect.
I'm trying to find the algorithm for IBM/Xerox algorithm: Recorded Image Data Inline Coding recording algorithm (RIDIC).
Within an IPDS print stream, the image gets wrapped in this RIDIC algorithm. I need to be able to take the stream and decode the image portion back to its original image. There is little-to-no information out there as far as I've been able to find.
Here is literally all the information I have on it so far from http://afpcinc.org/wp-content/uploads/2014/07/IPDS-Reference-10.pdf:
"The Recorded Image Data Inline Coding recording algorithm (RIDIC) formats a single image in the binary element sequence of a unidirectional raster scan with no interlaced fields and with parallel raster lines, from left
to right and from top to bottom."
"Each binary element representing an image data element after decompression, without grayscale, is 0 for an
image data element without intensity, and 1 for an image data element with intensity. More than one binary
element can represent an image data element after decompression, corresponding to a grayscale or color
algorithm. Each raster scan line is an integral multiple of 8 bits. If an image occupies an area whose width is
other than an integral multiple of 8 bits, the scan line is padded with zeros."
Any information to work with this algorithm would be greatly appreciated!
Most likely, you're making it a bigger thing than it really is. RIDIC is a recording algorithm: it is the format in which the original image data is arranged prior to compression. Only if the compression is set to "No Compression" would you have to deal with data in the recording format. And then, RIDIC is simply an ordering of bit groups that describe each pixel. E.g. if you had 16-level grayscale, RIDIC encodes each pixel in left-right,top-down order in a nibble, and pads to an even number of bytes.
We're building an online video editing service. One of the features allows users to export a short segment from their video as an animated gif. Imgur has a file size limit of 2Mb per uploaded animated gif.
Gif file size depends on number of frames, color depth and the image contents itself: a solid flat color result in a very lightweight gif, while some random colors tv-noise animation would be quite heavy.
First I export each video frame as a PNG of the final GIF frame size (fixed, 384x216).
Then, to maximize gif quality I undertake several gif render attempts with slightly different parameters - varying number of frames and number of colors in the gif palette. The render that has the best quality while staying under the file size limit gets uploaded to Imgur.
Each render takes time and CPU resources — this I am looking to optimize.
Question: what could be a smart way to estimate the best render settings depending on the actual images, to fit as close as possible to the filesize limit, and at least minimize the number of render attempts to 2–3?
The GIF image format uses LZW compression. Infamous for the owner of the algorithm patent, Unisys, aggressively pursuing royalty payments just as the image format got popular. Turned out well, we got PNG to thank for that.
The amount by which LZW can compress the image is extremely non-deterministic and greatly depends on the image content. You, at best, can provide the user with a heuristic that estimates the final image file size. Displaying, say, a success prediction with a colored bar. You'd can color it pretty quickly by converting just the first frame. That won't take long on 384x216 image, that runs in human time, a fraction of a second.
And then extrapolate the effective compression rate of that first image to the subsequent frames. Which ought to encode only small differences from the original frame so ought to have comparable compression rates.
You can't truly know whether it exceeds the site's size limit until you've encoded the entire sequence. So be sure to emphasize in your UI design that your prediction is just an estimate so your user isn't going to disappointed too much. And of course provide him with the tools to get the size lowered, something like a nearest-neighbor interpolation that makes the pixels in the image bigger. Focusing on making the later frames smaller can pay off handsomely as well, GIF encoders don't normally do this well by themselves. YMMV.
There's no simple answer to this. Single-frame GIF size mainly depends on image entropy after quantization, and you could try using stddev as an estimator using e.g. ImageMagick:
identify -format "%[fx:standard_deviation]" imagename.png
You can very probably get better results by running a smoothing kernel on the image in order to eliminate some high-frequency noise that's unlikely to be informational, and very likely to mess up compression performance. This goes much better with JPEG than with GIF, anyway.
Then, in general, you want to run a great many samples in order to come up with something of the kind (let's say you have a single compression parameter Q)
STDDEV SIZE W/Q=1 SIZE W/Q=2 SIZE W/Q=3 ...
value1 v1,1 v1,2 v1,3
After running several dozens of tests (but you need do this only once, not "at runtime"), you will get both an estimate of, say, , and a measurement of its error. You'll then see that an image with stddev 0.45 that compresses to 108 Kb when Q=1 will compress to 91 Kb plus or minus 5 when Q=2, and 88 Kb plus or minus 3 when Q=3, and so on.
At that point you get an unknown image, get its stddev and compression #Q=1, and you can interpolate the probable size when Q equals, say, 4, without actually running the encoding.
While your service is active, you can store statistical data (i.e., after you really do the encoding, you store the actual results) to further improve estimation; after all you'd only store some numbers, not any potentially sensitive or personal information that might be in the video. And acquiring and storing those numbers would come nearly for free.
Backgrounds
It might be worthwhile to recognize images with a fixed background; in that case you can run some adaptations to make all the frames identical in some areas, and have the GIF animation algorithm not store that information. This, when and if you get such a video (e.g. a talking head), could lead to huge savings (but would throw completely off the parameter estimation thing, unless you could estimate also the actual extent of the background area. In that case, let this area be B, let the frame area be A, the compressed "image" size for five frames would be A+(A-B)*(5-1) instead of A*5, and you could apply this correction factor to the estimate).
Compression optimization
Then there are optimization techniques that slightly modify the image and adapt it for a better compression, but we'd stray from the topic at hand. I had several algorithms that worked very well with paletted PNG, which is similar to GIF in many regards, but I'd need to check out whether and which of them may be freely used.
Some thoughts: LZW algorithm goes on in lines. So whenever a sequence of N pixels is "less than X%" different (perceptually or arithmetically) from an already encountered sequence, rewrite the sequence:
018298765676523456789876543456787654
987678656755234292837683929836567273
here the 656765234 sequence in the first row is almost matched by the 656755234 sequence in the second row. By changing the mismatched 5 to 6, the LZW algorithm is likely to pick up the whole sequence and store it with one symbol instead of three (6567,5,5234) or more.
Also, LZW works with bits, not bytes. This means, very roughly speaking, that the more the 0's and 1's are balanced, the worse the compression will be. The more unpredictable their sequence, the worse the results.
So if we can find out a way of making the distribution more **a**symmetrical, we win.
And we can do it, and we can do it losslessly (the same works with PNG). We choose the most common colour in the image, once we have quantized it. Let that color be color index 0. That's 00000000, eight fat zeroes. Now we choose the most common colour that follows that one, or the second most common colour; and we give it index 1, that is, 00000001. Another seven zeroes and a single one. The next colours will be indexed 2, 4, 8, 16, 32, 64 and 128; each of these has only a single bit 1, all others are zeroes.
Since colors will be very likely distributed following a power law, it's reasonable to assume that around 20% of the pixels will be painted with the first nine most common colours; and that 20% of the data stream can be made to be at least 87.5% zeroes. Most of them will be consecutive zeroes, which is something that LZW will appreciate no end.
Best of all, this intervention is completely lossless; the reindexed pixels will still be the same colour, it's only the palette that will be shifted accordingly. I developed such a codec for PNG some years ago, and in my use case scenario (PNG street maps) it yielded very good results, ~20% gain in compression. With more varied palettes and with LZW algorithm the results will be probably not so good, but the processing is fast and not too difficult to implement.
I have some JPEG images that I need scale down to about 80% of original size. Original image dimension are about 700px × 1000px. Images contain some computer generated text and possibly some graphics (similar to what you would find in corporate word documents).
How to scale image so that the text is as legible as possible? Currently we are scaling the imaeg down using bicubic interpolation, but that makes the text blurry and foggy.
Two options:
Use a different resampling algorithm. Lanczos gives you a much less blurrier result.
You ight use an advances JPEG library that resamples the 8x8 blocks to 6x6 pixels.
If you are not set on exactly 80% you can try getting and building djpeg from http://www.ijg.org/ as it can decompress your jpeg to 6/8ths (75%) or 7/8ths (87.5%) size and the text quality will still be pretty good:
Original
7/8
3/4
(SO decided to scale the images when showing them inline)
There may be a scaling algorithm out there that works similarly, but this is an easy off the shelf solution.
There is always a loss involved in scaling down, but it again depends of your trade offs.
Blurring and artifact generation is normal for jpeg images, so its recommended that you generate images is the correct size the first time.
Lanczos is a fine solution, but you have your trade offs
If its just the text and you are concerned about it, you could try dilation filter over the resampled image. This would correct some blurriness but may also affects the graphics. If you can live with it, its good. Alternatively if you can identify the areas of text, you can apply dilation just over those areas.
The JPEG compression encoding process splits a given image into blocks of 8x8 pixels, working with these blocks in future lossy and lossless compressions. [source]
It is also mentioned that if the image is a multiple 1MCU block (defined as a Minimum Coded Unit, 'usually 16 pixels in both directions') that lossless alterations to a JPEG can be performed. [source]
I am working with product images and would like to know both if, and how much benefit can be derived from using multiples of 16 in my final image size (say, using an image with size 480px by 360px) vs. a non-multiple of 16 (such as 484x362). In this example I am not interested in further alterations, editing, or recompression of the final image.
To try to get closer to a specific answer where I know there must be largely generalities: Given a 480x360 image that is 64k and saved at maximum quality in Photoshop [example]:
Can I expect any quality loss from an image that is 484x362
What amount of file size addition can I expect (for this example, the additional space would be white pixels)
Are there any other disadvantages to growing larger than the 8px grid?
I know it's arbitrary to use that specific example, but it would still be helpful (for me and potentially any others pondering an image size) to understand what level of compromise I'd be dealing with in breaking the non-8px grid.
The key issue here is a debate I've had is whether 8-pixel divisible images are higher quality than images that are not divisible by 8-pixels.
8 pixels is the cutoff. The reason is because JPEG images are simply an array of 8x8 DCT blocks; if the image resolution isn't mod8 in both directions, the encoder has to pad the sides up to the next mod8 resolution. This in practice is not very expensive bit-wise; what's much worse are the cases when an image has sharp black lines (such as a letterboxed image) that don't lie on block boundaries. This is especially problematic in video encoding. The reason for this being a problem is that the frequency transform of a sharp line is a Gaussian distribution of coefficients--resulting in an enormous number of bits to code.
For those curious, the most common method of padding edges in intra compression (such as JPEG images) is to mirror the lines of pixels before the edge. For example, if you need to pad three lines and line X is the edge, line X+1 is equal to line X, line X+2 is equal to line X-1, and line X+3 is equal to line X-2. This quite effectively minimizes the cost in transform coefficients of the extra lines.
In inter coding, however, the padding algorithms generally simply duplicate the last line, because the mirror method does not work well for inter compression, such as in video compression.
Sometimes you need to use 16 pixel boundaries rather than 8 because of subsampling; every 2nd pixel is thrown away during the encoding process, and those 8x8 DCT blocks started out as 16x16 and will decode back to 16x16. This won't be a problem at the highest quality settings.
A JPG with sizes being multiplies of 8 can also be rotated/flipped with no quality loss. For example gthumb can do this on Linux.
The image dimensions being multiples of 8 or 16 is not going to affect the size on disk very much, but you can get dramatic savings if you can line up the visual contents to the 8x8 pixel grid, such as if there is a repeating pattern or texture in the image.
What Tometzky said. If you don't have the correct multiple, the lossless flip and rotate algorithms don't work. That's because the padding on the right/bottom that can be safely ignored now ends up on the left/top, where it can't.