GIF format as png? - image

I'm seeing some images online that end in .png but appear as GIF. How is this possible?
Example:
https://www.khanacademy.org/computer-programming/loading/6267221601681408/5689792285114368.png

This is a GIF file, with an .png extension. Though the extension is "wrong", many image viewers (including browsers) can still it interpret them correctly because they don't believe blindly what the extension says (remember that the "extension" is just a hint), but they look into the image content. The first bytes of most common image formats allow to easily identify the image type. In this case, you can check (looking at the image content, say, in some hexadecimal editor/viewer) that the file content starts with the ASCII characters "GIF89a".

Related

Converting PDF to images of original size

I have a PDF file which is made of photographs of a book connected in a single PDF file. I'm trying to convert it back to single images in PNG format, every tool I tried asks me to set DPI which alters the size of resulting images, is there a way to get images of the exact same pixel size the original images were?
Most PDFs of books contain a single image per page and depending on the scanner these images can basically be in three different formats: JPEG, JPEG2000 or TIFF. JPEG2000 is rarely used, so your PDF probably contains JPEG and/or TIFF images.
The good thing about JPEG (and JPEG2000) images is that they can be embedded as-is into a PDF! So you can extract the images as they are stored in the PDF. With TIFF this is also sometimes possible (but I don't think always...).
As mentioned by Tim Roberts you should try using pdfimages or hexapdf images to view and extract the images stored in the PDF. This will give you the best result.

How to recognize an image file format using its contents?

If a Image file is of format .png then it will contain ‰PNG, at the beginning of the file. (when read in Text mode)
If a Image file is of format .bmp then it will contain BM, at the beginning of the file. (when read in Text mode)
I know that Image formats contain text (data) of certain size (bytes) in the beginning of the file, which is used as metadata of the Image file?
My Questions are:-
Is this behavior same in all image file formats (or formats in general)?
Could a image file (of no extension) be recognized just using this data?
Is there information available on how this metadata is broken down? By that I mean, data at which position in the metadata has what meaning?
Is this behavior same in all image file formats (or formats in
general)?
For most of them, yes. There are some proprietary formats (e.g. for games) that might have very short or no metadata. Also, metadata might be in another file (e.g. animations together with XML metadata).
Could a image file (of no extension) be recognized just using this
data?
Yes. In fact, most image viewers will warn you if an image file has an incorrect extension and ask you if they should fix it.
On Unix systems, there's a file command that identifies files based on their metadata. There is a better tool specific for images called identify (part of ImageMagick) that returns more detailed information on resolution, bitdepth, etc.
Is there information available on how this metadata is broken down? By
that I mean, data at which position in the metadata has what meaning?
There are books about (image) file formats and for most formats, this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compressions and what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.
Note that based on the examples you gave I suppose you opened files with a text editor which is not the ideal tool for that task. It's better to use a hex-editor for this. Text editors won't show most bytes (e.g. 255) by default and interprete others (e.g. tab or line feed). They might be good enough to see magic text strings like "BM" and "PNG", but with a hex editor, you can see both these text parts and their numerical representation - e.g. allowing you to extract image width and height. For this, some tool to convert hexademical values to decimal is useful, most calculators can do this.
As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:
You can see that the file is a PNG image in both of them, that's correct. But the marked part in the hex editor view will show the width and height of the image (matching the PNG chunk specification of the "IHDR" part) - 0x00001802 is 6146 in decimal, 0x000037D5 is 14293. There's no way to do this in the text editor.
Also note that even if you don't know an image format, you might be lucky with just guessing it's uncompressed data (this often works for some game image file formats, most notable Unity's "assets"). E.g. if you rename files to ".raw", the image viewer IrfanView will give you a dialog (see the screenshot below) where you can guess width, height and bit depth of the image and see if the result looks good. This requires some experience in interpreting the outcome though, if width and bitdepth don't match, images will look like noise, warped, or have wrong colors.
This "image geometry guessing" can be improved/automated by trying different widths and computing the correlation coefficent between two lines. The tool raw2tiff can do this. Quote from the site:
There is no magic, it is just a mathematical statistics, so it can be
wrong in some cases. But for most ordinary images guessing method will
work fine.
Using Imagemagick, you can get that information (if available) for formats that Imagemagick can read from its "magick" data in the header file as follows:
convert image -format "%m\n" info:
For example:
convert lena.png -format "%m\n" info:
PNG
convert lena.jpg -format "%m\n" info:
JPEG
convert lena.pnm -format "%m\n" info:
PPM
Even if the suffix is removed, this still works:
convert lena_copy -format "%m\n" info:
PNG

Non-deterministic* data in header/beginning of PNG files

I noticed that PNG files created by Gimp from the same RPG data are identical except for the very beginning. This image shows a diff of otherwise identical PNG files created with Gimp:
What is this data which changes each time and how is it encoded? Are there tools to decode it? Can you learn something from this information, e.g. can you find out when a PNG file was (probably) created by this information?
I was under the impression that PNG files are created deterministically* and don't store meta data which isn't necessary to decode the image. (Obviously, the last part is not true, either, as Gimp writes its own name into the files but doesn't ask the user (which is does if you export something as a JPEG file).)
 * I use the word "deterministic" here to refer to things and only such which are the same on each execution/export/whatever given the same input. I'd usually use the word "functional" (i.e. like a mathematical function) but I fear this could be misunderstood by people who don't know what "functional" means in mathematics. Obviously, this is different from the usage of this word in information theory.
See the PNG header definition.
tIME stores the time that the image was last changed, so for me it's the same as the timestamp of the file you create.
bKGD gives the default background color. Possibly the bakcgournd color you are using in Gimp, or the color of the transparent pixels.
tEXT with key Comment and value Created with Gimp is just the default comment. You can change the comment for the image in Image>Properties and you can set a default comment in Edit>Preferences>Default Image
When I export the same PNG twice, I only see a change in tIME. In fact I can't get a bKGD item, even when exporting a PNG with transparent pixels. Are you using any specific options when exporting?

Create small high quality PDF embedding optimized PNG?

I'm trying to create a small PDF file, embedding one optimized PNG image displayed as a header and footer on a 3 page PDF (same image must appear 6x in the PDF)
My optimized PNG image is only 2.3KB. It looks very sharp.
Failed with libreoffice
When I insert just one instance of the 2.3KB PNG image into a Libreoffice Writer doc containing only text, then export as PDF I can see that the image gets re-compressed to JPG and the resulting PDF file grows by about 40KB after adding the image. It also loses quality, the PNG also gets JPG fuzzy edges.
If I right click the image and select compression, there is no way to disable recompressing the image (it's already optimized better than libreoffice could do it) I've tried setting a compression level of 0,1,9 etc. Choosing JPG, no resize, lossless, etc but there was no improvement.
Failed with wkhtmltopdf
I also tried making a test page and used wkhtml2pdf but it did the same thing. Adding the low quality flag made no difference.
PDF Spec suggests PNG is supported?
From skimming the PDF spec, it looks like PNG images are supported.
Even plain text PDF files are surprisingly large
The disappointing thing is also when I take a 7KB HTML file which is basically just <html><body><p>foo...</p><p>bar...</p> (only about 15 paragraphs) with no CSS. The resulting 2 page PDF file is 30KB. Why should a 7kb (almost plain text) file become 30kb as a PDF?
Suggestions?
Can someone please suggest how to make a small PDF file in Linux?
I need to include 7KB of text and repeat one PNG image 6 times.
Manually or programatically. I'll take whatever I can get at this point.
PDF Spec suggests PNG is supported?
PNG isn't supported per se; PDF allows embedding JPEG images as-is, but not PNG images. PDF does borrow a set of features of the PNG format, however.
rinohtype (full disclosure: I'm the author) tries to embed as much as possible from PNG images as-is into the PDF. This does involve some bit-juggling to separate the alpha channel from the color data for example, but no reencoding of the image is performed. It does not (yet) support interlaced PNGs.
rinohtype should be able to do what you want to achieve. But please note that it currently is in a beta stage, so you might encounter some bugs.
Even plain text PDF files are surprisingly large
To keep the PDF size as small as possible, make sure not to embed/subset any of the fonts. Use only the fonts from the base 14 PDF fonts which are provided by PDF readers.
What you want is certainly achievable. Regarding the image quality, I would recommend making your image twice the size that you want it to actually display at in the PDF to keep it looking sharp.
As to the size, I've just modified a test in my PDF writer module (WIP..) to include a 7.2K png, 200px x 70px, in a PDF twice and the PDF came out at 6.8K 8). There's not much text included, but more text will only add what it's worth + a small percentage.
You can see the module and original test here.. https://github.com/DoccaPDF/docca-pdf-writer/blob/master/src/tests/writer.js#L40
That test adds ~112K of images to the PDF and results in a 103K PDF.
Of course not all images are created equal so you milage may vary..
*the images are only actually added to the PDF once, but are displayed multiple time.

Can a PNG image contain multiple pages?

On OSX I converted a multi-page PDF file to PNG and (somehow) it created a multi-page PNG file.
Is there an extension to the PNG format that allows this? Or is this not something I can validly create?
~~~~
To clarify, this is a PNG file, per the builtin file command and the identify command from imagemagick.
$ file algorithms-combined-print.png
algorithms-combined-print.png: PNG image data, 1275 x 1650, 8-bit/color RGBA, non-interlaced
$ identify algorithms-combined-print.png
algorithms-combined-print.png PNG 1275x1650 1275x1650+0+0 8-bit sRGB 3.537MB 0.000u 0:00.000
And here is a pastebin of the command identify -verbose algorithms-combined-print.png: http://pastebin.com/hw1yuRKa
What is notable from that output is that the pixel count is Number pixels: 2.104M which corresponds to one page. However, the file size is 3.537MB, which is clearly sufficient to hold all the pages.
Per request, here is the output of pngcheck: http://pastebin.com/aCRMEd9L
PNG does not support "multipage" images.
MNG is a PNG variant that supports multiple images - mostly for animations, but it's not a real PNG image (diffent signature/header), and has never become popular.
APNG is a similar attempt, but more focused on animations - it's more popular and alive, though it's less official - it's also PNG compatible (a standard PNG viewer, unaware of APNG, will display it as a single PNG image).
Another possible explanation is that your image is actually a TIFF image with a wrong .png extension, and the viewer ignores it.
The only way to know for sure is to look inside the image file itself (at least to the first bytes)
Update: given the pngcheck output, it seems to be a APNG file.

Resources