EXIF and thumbnails - image

I'm working on a photo viewer. In this context, I wrote a small class to be able to read and use some EXIF data, as e.g. image orientation. This class works well for reading.
However, I would add a new option to rotate photos. I want to rotate and write the photo data itself, not just rewrite the orientation tag. I already wrote the code to rotate and save the primary JPEG image, and it works well. But I also need to rotate the thumbnail contained in the EXIF data, if any, to keep the image coherent. For this reason I need to write in the EXIF data, to replace the existing thumbnail.
But this raises some questions, that I've some trouble answering, namely:
Can the EXIF data contains more than 1 thumbnail, and if yes, what is the maximum thumbnail count that an image can contain?
What are the supported formats for thumbnails? (I found JPEG and TIFF, are there other?)
Is there any guarantee in the EXIF standards that the thumbnails are always written in the late EXIF data, just before the primary image?
If not, then each tags containing an offset that points to a location beyond the thumbnail to replace should be updated. So, is there a standard way to iterate through all tags and sub-directories, to recognize which EXIF tags contain offsets, and to update them if needed? Or the only way is to read a maximum of tags and rewrite only that are known?
Or is there a way to guarantee that the size of the newly rotated thumbnail will be smaller or equal to previous thumbnail size to replace with?
Regards

Here are some answers for your questions:
1) The EXIF data is laid out like a TIFF file with 2 pages. The first page is the camera information and the second page is the thumbnail. If you add more pages (with thumbnails), 99.99% of the applications probably won't notice since you'll be doing it differently than the "standard" way. As far as "maximum count", you have 64k of data that can be stored in any JFIF tag. You can put what you want in that 64k.
2) There is only 1 supported EXIF thumbnail format: TIFF. Inside the TIFF there can be compressed (JPEG) or uncompressed data. Again, you're welcome to stick LZW-compressed data in there, but most apps probably won't be prepared to display it properly.
3) The JFIF container format allows for tags with metadata to appear before the main image. The APPx tags contain metadata that can follow the standard or not. You're welcome to stick multiple EXIF APP1 tags into your files, but again, most apps won't be able to properly handle that situation. So the simple answer is that the EXIF data (including thumbnail) must come before the main image and if you put more than 1 thumbnail it will most likely be ignored.
4) If you are modifying a JFIF (including the metadata), you must rewrite the metadata. It's actually quite simple because each tag is independent and has a simple length value instead of relative offsets.
5) You can do anything you want with the size/orientation of your thumbnail as long as you make the EXIF APP1 tag data total size fit within 64k.
Here's what you need to do...
1) Read the source image (and thumbnail if present).
2) Prepare your rotated image (and thumbnail).
3) Write the new metadata with the new thumbnail image.
4) Write the new main image.
If you want to preserve the original metadata along with your new thumbnail, it's pretty easy. Just read the original tags and hold on to them, then write them in the new image. Each JFIF tag is just a 2 byte identifier (FFxx) followed by a 2 byte length and then the data. They can be packed in almost any order and there's no hard limit on how many total tags can appear before the main image.

Related

Converting PDF to images of original size

I have a PDF file which is made of photographs of a book connected in a single PDF file. I'm trying to convert it back to single images in PNG format, every tool I tried asks me to set DPI which alters the size of resulting images, is there a way to get images of the exact same pixel size the original images were?
Most PDFs of books contain a single image per page and depending on the scanner these images can basically be in three different formats: JPEG, JPEG2000 or TIFF. JPEG2000 is rarely used, so your PDF probably contains JPEG and/or TIFF images.
The good thing about JPEG (and JPEG2000) images is that they can be embedded as-is into a PDF! So you can extract the images as they are stored in the PDF. With TIFF this is also sometimes possible (but I don't think always...).
As mentioned by Tim Roberts you should try using pdfimages or hexapdf images to view and extract the images stored in the PDF. This will give you the best result.

Create small high quality PDF embedding optimized PNG?

I'm trying to create a small PDF file, embedding one optimized PNG image displayed as a header and footer on a 3 page PDF (same image must appear 6x in the PDF)
My optimized PNG image is only 2.3KB. It looks very sharp.
Failed with libreoffice
When I insert just one instance of the 2.3KB PNG image into a Libreoffice Writer doc containing only text, then export as PDF I can see that the image gets re-compressed to JPG and the resulting PDF file grows by about 40KB after adding the image. It also loses quality, the PNG also gets JPG fuzzy edges.
If I right click the image and select compression, there is no way to disable recompressing the image (it's already optimized better than libreoffice could do it) I've tried setting a compression level of 0,1,9 etc. Choosing JPG, no resize, lossless, etc but there was no improvement.
Failed with wkhtmltopdf
I also tried making a test page and used wkhtml2pdf but it did the same thing. Adding the low quality flag made no difference.
PDF Spec suggests PNG is supported?
From skimming the PDF spec, it looks like PNG images are supported.
Even plain text PDF files are surprisingly large
The disappointing thing is also when I take a 7KB HTML file which is basically just <html><body><p>foo...</p><p>bar...</p> (only about 15 paragraphs) with no CSS. The resulting 2 page PDF file is 30KB. Why should a 7kb (almost plain text) file become 30kb as a PDF?
Suggestions?
Can someone please suggest how to make a small PDF file in Linux?
I need to include 7KB of text and repeat one PNG image 6 times.
Manually or programatically. I'll take whatever I can get at this point.
PDF Spec suggests PNG is supported?
PNG isn't supported per se; PDF allows embedding JPEG images as-is, but not PNG images. PDF does borrow a set of features of the PNG format, however.
rinohtype (full disclosure: I'm the author) tries to embed as much as possible from PNG images as-is into the PDF. This does involve some bit-juggling to separate the alpha channel from the color data for example, but no reencoding of the image is performed. It does not (yet) support interlaced PNGs.
rinohtype should be able to do what you want to achieve. But please note that it currently is in a beta stage, so you might encounter some bugs.
Even plain text PDF files are surprisingly large
To keep the PDF size as small as possible, make sure not to embed/subset any of the fonts. Use only the fonts from the base 14 PDF fonts which are provided by PDF readers.
What you want is certainly achievable. Regarding the image quality, I would recommend making your image twice the size that you want it to actually display at in the PDF to keep it looking sharp.
As to the size, I've just modified a test in my PDF writer module (WIP..) to include a 7.2K png, 200px x 70px, in a PDF twice and the PDF came out at 6.8K 8). There's not much text included, but more text will only add what it's worth + a small percentage.
You can see the module and original test here.. https://github.com/DoccaPDF/docca-pdf-writer/blob/master/src/tests/writer.js#L40
That test adds ~112K of images to the PDF and results in a 103K PDF.
Of course not all images are created equal so you milage may vary..
*the images are only actually added to the PDF once, but are displayed multiple time.

Convert image to Blob

I want to upload image data to a php script on the server. I have a URL for an image source (PNG, the image might be located on a different server). I load this into a Javascript image, draw this into a canvas and use the canvas.toBlob() method (or a polyfill as it is not mainly supported yet) to generate a blob holding the image data. This works fine, but I recognized that the resulting blob size is much bigger than the original image data.
In contrast if I use a HTML File input and let the user select an image on the client the resulting blob has equal size to the original image. Can I get image data from a canvas that is equal to the original image size?
I guess the reason is that I loose the PNG (or any image compression) when using the canvas.toBlob() polyfill:
value: function (callback, type, quality) {
var binStr = atob(this.toDataURL(type, quality).split(',')[1]),
len = binStr.length,
arr = new Uint8Array(len);
for (var i=0; i<len; i++ ) {
arr[i] = binStr.charCodeAt(i);
}
callback(new Blob([arr], {type: type || 'image/png'}));
}
I am confused by so many conversion steps via image, canvas, blob - so maybe there is an alternative to get the image data from a given URL and finally append it to FormData to send it to the server?
The method toDataURL when using the png format only uses a limited set of the possible formats available for PNG files. It is the 8bit per channel RGBA (32 bits) compressed format. There are no options to use any of the other formats available so you are forced to include redundant data when you save as a PNG. PNG also has a 24bit and 8 bit format. PNG also has several compression options available though I am unsure which is used but each browser.
In most cases it is best to send the original image. If you need to modify the image and do not use the alpha channel (no transparency) but still want the quality to be high send it as a jpeg with quality set to 1 (max).
You may also consider the use of a custom encoder for PNG that gives you access to more of the PNG encoding options, or even try one of the many other formats available, or make up your own format, though you will be hard pushed to improve on jpeg and webp.
You could also consider compressing the data on the server when you store it, even jpeg and webp have a little room for more compression. For transport you should not worry as most data these days is compressed as it leaves the page and most definitely compressed by the time it leaves the clients ISP

Can I edit the thumbnail image inside JFIF files?

Can I edit the thumbnail image inside JPG/JFIF files?
If this is possible--how so (using what utility)?
The end result needs to be that the thumbnail image "can" be a wholly different image than the jpeg.
Thank you much,
Michael
Typically, thumbnails are uncompressed RGB data. You locate the marker, see where the thumbnail's width/height are marked, then modify the byte stream following it. the stream is of length width*height*3 bytes.
If it's indexed, you'd have to overwrite the palette and the index entries. Just look for the APP0 marker, start modifying it.
A compliant EXIF thumbnail image must fit in the 64K APP1 marker and is usually compressed as JPEG (unlike what #Karthik says). The thumbnail image is independent of the main image and can easily be changed since it is inside a marker segment that doesn't affect the main image. The JPEG marker segments are basically a linked list of independent binary blobs with 2-byte identifiers (e.g. FFE1 in this case) and 2-byte lengths. You can swap out one for another and you won't "break" the file. There is no checksum or other mechanism that verifies the entire file data integrity. I'm not familiar with libraries to edit this information, but you can do it in a small amount of code that only has to parse the marker blobs type and length without knowing their contents. You can also do it the "quick and dirty" way by ensuring that your new thumbnail is no larger than the original and then you can just write it in it's place without moving the other parts of the file around. The marker length is not checked against its contents, so unused space is ignored.

Creating a unique hash based on the CONTENTS of an image (PNG) in Ubuntu?

For the purposes of insuring images are not tampered with, I would like to create a unique hash based on the contents of an image file (a PNG specifically). I've googled, and I know it's very possible to create a hash based on a file, but it seems to take into account things other hten the contents of the image?
For instance, to test, I create a very large PNG file with random colors/lines/shapes/etc. Then I saved the file as test1.png. I then created a single pixel black dot in the corner of the image and saved as test2.png.
I ran md5sum on both images, and got different hash values (expected). I then downloaded test2.png, removed the single black pixel, and saved the file as test3.png. test3.png and test1.png contain the EXACT same image.
Now, from what I understand PNG should be a lossless compression, so that shouldn't be the issue (?). I'm a bit in the dark (as you can probably tell) about all of this, so if anybody can give me any ideas, I'd much appreciate it!
You didn't say so, but I guess you are getting different hashes for test1.png and test3.png?
PNG files can contain a fair bit of metadata in addition to the image data; it's possible that some of the metadata is different. It's also possible for the same image data to be compressed in different ways. If you really want to know, compare the files to find out what exactly is different.
If you really want to hash just the contents of the files, you'll most likely have to convert them to a raw RGB format and hash that instead.

Resources