Creating a unique hash based on the CONTENTS of an image (PNG) in Ubuntu? - image

For the purposes of insuring images are not tampered with, I would like to create a unique hash based on the contents of an image file (a PNG specifically). I've googled, and I know it's very possible to create a hash based on a file, but it seems to take into account things other hten the contents of the image?
For instance, to test, I create a very large PNG file with random colors/lines/shapes/etc. Then I saved the file as test1.png. I then created a single pixel black dot in the corner of the image and saved as test2.png.
I ran md5sum on both images, and got different hash values (expected). I then downloaded test2.png, removed the single black pixel, and saved the file as test3.png. test3.png and test1.png contain the EXACT same image.
Now, from what I understand PNG should be a lossless compression, so that shouldn't be the issue (?). I'm a bit in the dark (as you can probably tell) about all of this, so if anybody can give me any ideas, I'd much appreciate it!

You didn't say so, but I guess you are getting different hashes for test1.png and test3.png?
PNG files can contain a fair bit of metadata in addition to the image data; it's possible that some of the metadata is different. It's also possible for the same image data to be compressed in different ways. If you really want to know, compare the files to find out what exactly is different.
If you really want to hash just the contents of the files, you'll most likely have to convert them to a raw RGB format and hash that instead.

Related

Non-deterministic* data in header/beginning of PNG files

I noticed that PNG files created by Gimp from the same RPG data are identical except for the very beginning. This image shows a diff of otherwise identical PNG files created with Gimp:
What is this data which changes each time and how is it encoded? Are there tools to decode it? Can you learn something from this information, e.g. can you find out when a PNG file was (probably) created by this information?
I was under the impression that PNG files are created deterministically* and don't store meta data which isn't necessary to decode the image. (Obviously, the last part is not true, either, as Gimp writes its own name into the files but doesn't ask the user (which is does if you export something as a JPEG file).)
 * I use the word "deterministic" here to refer to things and only such which are the same on each execution/export/whatever given the same input. I'd usually use the word "functional" (i.e. like a mathematical function) but I fear this could be misunderstood by people who don't know what "functional" means in mathematics. Obviously, this is different from the usage of this word in information theory.
See the PNG header definition.
tIME stores the time that the image was last changed, so for me it's the same as the timestamp of the file you create.
bKGD gives the default background color. Possibly the bakcgournd color you are using in Gimp, or the color of the transparent pixels.
tEXT with key Comment and value Created with Gimp is just the default comment. You can change the comment for the image in Image>Properties and you can set a default comment in Edit>Preferences>Default Image
When I export the same PNG twice, I only see a change in tIME. In fact I can't get a bKGD item, even when exporting a PNG with transparent pixels. Are you using any specific options when exporting?

VBA to read image directly from word document for Base64 Conversion

Basically I have a document with lots of small images and I need a fast way to build a list of unique images (and identify them in the document), I was thinking of using Base64 conversion to store each unique image in an xml file (or simply store the MD5 hash of each image).
If the images were saved as files somewhere (i.e outside of the document) I'd be able to do this but (unless there's good reason not to) I'd like to learn how to read the image directly from the document.
Specifically, if I have myDoc.InlineShapes(myImageIndex) how can I most efficiently PREPARE that to either convert to Base64 or create a hash?
All the examples I can find assume the image is loaded from file, I'm hoping to load the image directly from the document... (e.g. Convert image (jpg) to base64 in Excel VBA?)
Many thanks in advance,

EXIF and thumbnails

I'm working on a photo viewer. In this context, I wrote a small class to be able to read and use some EXIF data, as e.g. image orientation. This class works well for reading.
However, I would add a new option to rotate photos. I want to rotate and write the photo data itself, not just rewrite the orientation tag. I already wrote the code to rotate and save the primary JPEG image, and it works well. But I also need to rotate the thumbnail contained in the EXIF data, if any, to keep the image coherent. For this reason I need to write in the EXIF data, to replace the existing thumbnail.
But this raises some questions, that I've some trouble answering, namely:
Can the EXIF data contains more than 1 thumbnail, and if yes, what is the maximum thumbnail count that an image can contain?
What are the supported formats for thumbnails? (I found JPEG and TIFF, are there other?)
Is there any guarantee in the EXIF standards that the thumbnails are always written in the late EXIF data, just before the primary image?
If not, then each tags containing an offset that points to a location beyond the thumbnail to replace should be updated. So, is there a standard way to iterate through all tags and sub-directories, to recognize which EXIF tags contain offsets, and to update them if needed? Or the only way is to read a maximum of tags and rewrite only that are known?
Or is there a way to guarantee that the size of the newly rotated thumbnail will be smaller or equal to previous thumbnail size to replace with?
Regards
Here are some answers for your questions:
1) The EXIF data is laid out like a TIFF file with 2 pages. The first page is the camera information and the second page is the thumbnail. If you add more pages (with thumbnails), 99.99% of the applications probably won't notice since you'll be doing it differently than the "standard" way. As far as "maximum count", you have 64k of data that can be stored in any JFIF tag. You can put what you want in that 64k.
2) There is only 1 supported EXIF thumbnail format: TIFF. Inside the TIFF there can be compressed (JPEG) or uncompressed data. Again, you're welcome to stick LZW-compressed data in there, but most apps probably won't be prepared to display it properly.
3) The JFIF container format allows for tags with metadata to appear before the main image. The APPx tags contain metadata that can follow the standard or not. You're welcome to stick multiple EXIF APP1 tags into your files, but again, most apps won't be able to properly handle that situation. So the simple answer is that the EXIF data (including thumbnail) must come before the main image and if you put more than 1 thumbnail it will most likely be ignored.
4) If you are modifying a JFIF (including the metadata), you must rewrite the metadata. It's actually quite simple because each tag is independent and has a simple length value instead of relative offsets.
5) You can do anything you want with the size/orientation of your thumbnail as long as you make the EXIF APP1 tag data total size fit within 64k.
Here's what you need to do...
1) Read the source image (and thumbnail if present).
2) Prepare your rotated image (and thumbnail).
3) Write the new metadata with the new thumbnail image.
4) Write the new main image.
If you want to preserve the original metadata along with your new thumbnail, it's pretty easy. Just read the original tags and hold on to them, then write them in the new image. Each JFIF tag is just a 2 byte identifier (FFxx) followed by a 2 byte length and then the data. They can be packed in almost any order and there's no hard limit on how many total tags can appear before the main image.

Can I edit the thumbnail image inside JFIF files?

Can I edit the thumbnail image inside JPG/JFIF files?
If this is possible--how so (using what utility)?
The end result needs to be that the thumbnail image "can" be a wholly different image than the jpeg.
Thank you much,
Michael
Typically, thumbnails are uncompressed RGB data. You locate the marker, see where the thumbnail's width/height are marked, then modify the byte stream following it. the stream is of length width*height*3 bytes.
If it's indexed, you'd have to overwrite the palette and the index entries. Just look for the APP0 marker, start modifying it.
A compliant EXIF thumbnail image must fit in the 64K APP1 marker and is usually compressed as JPEG (unlike what #Karthik says). The thumbnail image is independent of the main image and can easily be changed since it is inside a marker segment that doesn't affect the main image. The JPEG marker segments are basically a linked list of independent binary blobs with 2-byte identifiers (e.g. FFE1 in this case) and 2-byte lengths. You can swap out one for another and you won't "break" the file. There is no checksum or other mechanism that verifies the entire file data integrity. I'm not familiar with libraries to edit this information, but you can do it in a small amount of code that only has to parse the marker blobs type and length without knowing their contents. You can also do it the "quick and dirty" way by ensuring that your new thumbnail is no larger than the original and then you can just write it in it's place without moving the other parts of the file around. The marker length is not checked against its contents, so unused space is ignored.

How can I store raw data in an image file?

I have some raw data in a file that I would like to store in an image file (bmp, jpg, png, or even gif (eegad)). I would like this to be a two way process: I need to be able to reliably convert the image file back later and get a file that is identical to the original file.
I am not looking for a how-to on steganography; the image file will probably be one pixel wide and millions of pixels high and look like garbage. That is fine.
I looked into the Imagemagick utility convert, but am intimidated by the large number of options and terse man page. I am guessing I could just use this to convert from a 'raw' black channel to png, but would have to specify a bunch of other stuff. Any hints? I would prefer to work within Imagemagick or using Linux utilities.
If you are wondering, there's nothing black hat or cloak and dagger about my request. I simply want to automatically backup some important data to a photo-sharing site.
I'd plow into ImageMagick if that's what you'd prefer anyway.
Specific image formats support storing text data to different degrees, and ImageMagick supports all of the formats you mentioned. I'd choose the one that lets you store what you need.

Resources