I have some raw data in a file that I would like to store in an image file (bmp, jpg, png, or even gif (egad)). I would like this to be a two-way process: I need to be able to reliably convert the image file back later and get a file that is identical to the original.
I am not looking for a how-to on steganography; the image file will probably be one pixel wide and millions of pixels high and look like garbage. That is fine.
I looked into the ImageMagick utility convert, but I am intimidated by its large number of options and terse man page. I am guessing I could use it to convert from a 'raw' black channel to PNG, but would have to specify a bunch of other stuff. Any hints? I would prefer to work within ImageMagick or with standard Linux utilities.
If you are wondering, there's nothing black hat or cloak-and-dagger about my request. I simply want to automatically back up some important data to a photo-sharing site.
I'd plow into ImageMagick if that's what you'd prefer anyway.
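Specific image formats support storing text data to different degrees (PNG, for instance, has dedicated text chunks), and ImageMagick supports all of the formats you mentioned. I'd choose one that stores your data losslessly; JPEG compression is lossy, so a jpg cannot give you back a byte-identical file.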
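Since the goal is a byte-identical round trip, the container must be lossless. As a minimal sketch (not an ImageMagick recipe), the Python below writes the bytes straight into a 1-pixel-wide, 8-bit grayscale PNG using only the standard library. The 8-byte length prefix is my own hypothetical framing so the exact file size survives, and the function names are illustrative. The decoder only understands the layout this encoder writes (filter type 0 on every row), so it assumes the file comes back unmodified: if a photo-sharing site recompresses or converts the upload (to JPEG in particular), the data will not survive.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def bytes_to_png(payload: bytes) -> bytes:
    """Store arbitrary bytes as a 1-pixel-wide, 8-bit grayscale PNG."""
    # Hypothetical framing: an 8-byte length prefix so the exact size survives.
    pixels = struct.pack(">Q", len(payload)) + payload
    height = len(pixels)  # one pixel (= one byte) per row
    # Each row is a filter-type byte (0 = None) followed by its single pixel.
    raw = bytearray(2 * height)
    raw[1::2] = pixels
    # width 1, height, bit depth 8, colour type 0 (gray), compression 0, filter 0, no interlace
    ihdr = struct.pack(">IIBBBBB", 1, height, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw), 9))
            + _chunk(b"IEND", b""))


def png_to_bytes(png: bytes) -> bytes:
    """Inverse of bytes_to_png; only handles the exact layout it writes."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, idat = len(PNG_SIGNATURE), bytearray()
    while pos < len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width, _, depth, colour, _, _, interlace = struct.unpack(">IIBBBBB", data)
            if (width, depth, colour, interlace) != (1, 8, 0, 0):
                raise ValueError("unexpected PNG layout")
        elif kind == b"IDAT":
            idat += data  # image data may be split across several IDAT chunks
        elif kind == b"IEND":
            break
        pos += 12 + length  # length field + type + data + CRC
    raw = zlib.decompress(bytes(idat))
    if any(raw[0::2]):
        raise ValueError("row filters other than None are not supported")
    pixels = raw[1::2]
    (size,) = struct.unpack(">Q", pixels[:8])
    return pixels[8:8 + size]
```

Usage might look like Path("backup.png").write_bytes(bytes_to_png(Path("data.bin").read_bytes())) with pathlib imported; the file names are placeholders.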
If an image file is in PNG format, it contains ‰PNG at the beginning of the file (when opened in text mode).
If an image file is in BMP format, it contains BM at the beginning of the file (when opened in text mode).
I understand that image formats store some data of a certain size (in bytes) at the beginning of the file, which is used as the image file's metadata. Is that right?
My questions are:
Is this behavior the same in all image file formats (or file formats in general)?
Could an image file (with no extension) be recognized using just this data?
Is there information available on how this metadata is broken down? By that I mean, which data at which position in the metadata has what meaning?
Is this behavior the same in all image file formats (or file formats in general)?
For most of them, yes. There are some proprietary formats (e.g. for games) that might have very short or no metadata. Also, metadata might be in another file (e.g. animations together with XML metadata).
Could an image file (with no extension) be recognized using just this data?
Yes. In fact, many image viewers will warn you if an image file has an incorrect extension and offer to fix it.
On Unix systems, the file command identifies files based on their contents, chiefly the "magic" signature bytes at the start. A better tool specifically for images is identify (part of ImageMagick), which returns more detailed information on resolution, bit depth, etc.
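To illustrate the principle behind file, here is a rough Python sketch that recognizes a handful of formats purely from their leading signature bytes. The table is a small hand-picked subset (file and identify know far more formats and check more than the first few bytes), and the two-byte "BM" check in particular can misfire on arbitrary data; sniff and sniff_file are hypothetical names.

```python
# Leading "magic" bytes for a few common formats (illustrative subset only).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"%PDF-", "PDF"),
    (b"BM", "BMP"),  # only two bytes, so false positives are possible
]


def sniff(header: bytes) -> str:
    """Name the format whose signature the header starts with, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    # The extension is ignored entirely; only the first bytes matter.
    with open(path, "rb") as f:
        return sniff(f.read(16))
```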
Is there information available on how this metadata is broken down? By that I mean, which data at which position in the metadata has what meaning?
There are books about (image) file formats and for most formats, this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compressions and what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.
Note that, based on the examples you gave, I suppose you opened the files with a text editor, which is not the ideal tool for that task; a hex editor is better. Text editors won't show most bytes (e.g. 255) by default and interpret others (e.g. tab or line feed). They might be good enough to see magic text strings like "BM" and "PNG", but a hex editor shows both these text parts and their numerical representation, which lets you, for example, extract the image width and height. For this, a tool to convert hexadecimal values to decimal is useful; most calculators can do this.
As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:
You can see that the file is a PNG image in both of them. But the marked part in the hex editor view shows the width and height of the image (matching the PNG chunk specification for the "IHDR" chunk): 0x00001802 is 6146 in decimal, and 0x000037D5 is 14293. There's no way to do this in the text editor.
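The same IHDR lookup can be done programmatically. As a sketch, assuming the file starts with the standard 8-byte PNG signature followed immediately by the IHDR chunk (which the PNG specification requires), the width and height are two big-endian 32-bit integers at byte offsets 16 and 20; png_size is a hypothetical helper name.

```python
import struct


def png_size(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    then width and height as big-endian unsigned 32-bit integers.
    """
    if header[:8] != b"\x89PNG\r\n\x1a\n" or header[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    return struct.unpack(">II", header[16:24])
```

With a real file this would be called as png_size(open("image.png", "rb").read(24)); for the example above, the bytes 00 00 18 02 and 00 00 37 D5 decode to 6146 and 14293.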
Also note that even if you don't know an image format, you might get lucky by guessing that it's uncompressed data (this often works for some game image file formats, most notably Unity's "assets"). For example, if you rename files to ".raw", the image viewer IrfanView will give you a dialog (see the screenshot below) where you can guess the width, height and bit depth of the image and see if the result looks right. Interpreting the outcome takes some experience, though: if the width and bit depth don't match, the image will look like noise, appear warped, or have the wrong colors.
This "image geometry guessing" can be improved and automated by trying different widths and computing the correlation coefficient between adjacent lines. The tool raw2tiff can do this. Quote from the site:
There is no magic, it is just a mathematical statistics, so it can be wrong in some cases. But for most ordinary images guessing method will work fine.
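The idea behind this kind of guessing fits in a few lines. The sketch below is not raw2tiff's code, just an illustration of the principle described, assuming 8-bit single-channel data: for each candidate width it cuts the bytes into rows, averages the Pearson correlation of neighbouring rows, and keeps the best-scoring width. Multiples of the true width score just as well, which is why the candidate range matters and ties go to the smaller width. The function names are hypothetical.

```python
import math


def row_correlation(a: bytes, b: bytes):
    """Pearson correlation of two equal-length rows, or None if either is flat."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


def guess_width(data: bytes, candidates):
    """Return the candidate width whose adjacent rows correlate best."""
    best_width, best_score = None, -2.0
    for width in candidates:
        rows = [data[i:i + width] for i in range(0, len(data) - width + 1, width)]
        scores = [c for c in (row_correlation(a, b) for a, b in zip(rows, rows[1:]))
                  if c is not None]
        if not scores:
            continue
        score = sum(scores) / len(scores)
        if score > best_score:  # strict ">" keeps the smaller width on ties
            best_width, best_score = width, score
    return best_width
```

For real images you would feed it one channel of the raw dump and a plausible width range; a wrong guess still looks like the sheared, noisy results described above.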
Using ImageMagick, you can get that information (if available) for formats that ImageMagick can read, from the "magick" data in the file header, as follows:
convert image -format "%m\n" info:
For example:
convert lena.png -format "%m\n" info:
PNG
convert lena.jpg -format "%m\n" info:
JPEG
convert lena.pnm -format "%m\n" info:
PPM
Even if the suffix is removed, this still works:
convert lena_copy -format "%m\n" info:
PNG
I use Ghostscript to convert text to outlines with the following command:
gswin32c.exe -sDEVICE=pdfwrite -sOutputFile=output.pdf -dQUIET -dNOPAUSE -dBATCH -dNoOutputFonts -f test_new.pdf
It works, but the output file is much smaller than the input (70 KB instead of 2.5 MB), and I found that the picture in the PDF had become blurred.
Adding -dPDFSETTINGS=/default gives the same result.
It's better with -dPDFSETTINGS=/printer or -dPDFSETTINGS=/prepress, but 300 dpi is not enough for me (or for my boss).
Is there any way to keep the original resolution of the picture?
Or how can I set a higher dpi for images in the output PDF?
The test file is here.
Thanks in advance.
The answer to your question is 'yes' (but see later). Don't use PDFSETTINGS, that sets lots of things all in one go. If you want control then you need to specify each setting individually.
Rather than use this shotgun approach you need to read the documentation, decide which controls affect areas you want to change, and alter those controls only.
However, image downsampling is not your problem. If you don't use -dPDFSETTINGS, then the PDF file written by Ghostscript contains the image at exactly the same resolution as in the original file.
Your problem is that the image is being written with JPEG compression, and JPEG is a lossy compression, so you are losing fidelity. Note that in the original file the image is stored uncompressed, which is why it's so large.
It looks like the original image was a JPEG, and the free PDF editor you are using has realised that so it saved the image uncompressed (I may be giving it too much credit here, it may save all images uncompressed). Applying JPEG to an image which has already been quantised simply amplifies the artefacts.
Instead, you need to specify that you want images compressed with Flate, which is a lossless compression. The documentation for the pdfwrite controls can be found here; you need to change AutoFilterColorImages (set it to false) and ColorImageFilter (set it to /FlateEncode).
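As a sketch of the full command, assuming the pdfwrite parameter names AutoFilterColorImages and ColorImageFilter with the usual -d and /name syntax (worth checking against the pdfwrite documentation for your Ghostscript version), the Python below only assembles the argument list; run_ghostscript is a hypothetical wrapper. It keeps -dNoOutputFonts from the question and deliberately leaves out -dPDFSETTINGS. If the file also contains grayscale images, the analogous AutoFilterGrayImages and GrayImageFilter controls would presumably need the same treatment.

```python
import subprocess


def ghostscript_outline_cmd(src: str, dst: str, gs: str = "gs") -> list:
    """Assemble a pdfwrite command: text to outlines, lossless colour images."""
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-dNoOutputFonts",                  # convert text to outlines, as in the question
        "-dAutoFilterColorImages=false",    # stop pdfwrite from choosing JPEG by itself
        "-dColorImageFilter=/FlateEncode",  # lossless Flate instead of DCT/JPEG
        f"-sOutputFile={dst}",
        "-f", src,
    ]


def run_ghostscript(src: str, dst: str, gs: str = "gs") -> None:
    """Hypothetical wrapper; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ghostscript_outline_cmd(src, dst, gs), check=True)
```

On Windows this would be run_ghostscript("test_new.pdf", "output.pdf", gs="gswin32c.exe"); on Linux the executable is normally just gs.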
Note that without JPEG quantisation (a second time) and DCT encoding, the compression is less effective than in your first attempt. For me the output file comes in at just over 600 KB (leaving the font in place, and the text as text, would make it a couple of KB smaller). However, the image is identical, as expected.
Since you are clearly using Ghostscript in a commercial environment, can I point you at the licence and ask you to check that your usage is compatible with the AGPL, bearing in mind that it also covers software-as-a-service usage?
I noticed that PNG files created by Gimp from the same RGB data are identical except for the very beginning. This image shows a diff of otherwise identical PNG files created with Gimp:
What is this data which changes each time and how is it encoded? Are there tools to decode it? Can you learn something from this information, e.g. can you find out when a PNG file was (probably) created by this information?
I was under the impression that PNG files are created deterministically* and don't store metadata that isn't necessary to decode the image. (Obviously, the last part is not true either, as Gimp writes its own name into the files without asking the user, which it does ask about when you export something as a JPEG file.)
* I use the word "deterministic" here to refer only to things that are the same on each execution/export/whatever given the same input. I'd usually use the word "functional" (i.e. like a mathematical function), but I fear this could be misunderstood by people who don't know what "functional" means in mathematics. Obviously, this is different from the usage of this word in information theory.
See the PNG header definition.
tIME stores the time that the image was last changed, so for me it's the same as the timestamp of the file you create.
bKGD gives the default background color: possibly the background color you are using in Gimp, or the color of the transparent pixels.
tEXt with key Comment and value "Created with Gimp" is just the default comment. You can change the comment for the image in Image>Properties, and you can set a default comment in Edit>Preferences>Default Image.
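These chunks can be inspected without special tools. As a minimal sketch, the Python below walks the chunk list of a PNG file and decodes the three chunks mentioned above. Per the PNG specification, tIME holds a 2-byte year followed by one byte each for month, day, hour, minute and second (in UTC), and tEXt is a Latin-1 keyword, a NUL byte, then the text. The sketch skips CRC verification and, for bKGD, only prints the raw bytes, since their meaning depends on the image's colour type; the function names are hypothetical.

```python
import struct


def list_chunks(png: bytes):
    """Yield (chunk_type, data) for each chunk after the 8-byte PNG signature."""
    pos = 8
    while pos + 8 <= len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        yield kind.decode("ascii"), png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + CRC (CRC not checked here)


def describe(kind: str, data: bytes) -> str:
    """Human-readable summary of tIME, tEXt and bKGD; byte count for the rest."""
    if kind == "tIME":
        year, month, day, hour, minute, second = struct.unpack(">HBBBBB", data)
        return (f"last modified {year:04d}-{month:02d}-{day:02d} "
                f"{hour:02d}:{minute:02d}:{second:02d} UTC")
    if kind == "tEXt":
        keyword, _, text = data.partition(b"\x00")
        return f"{keyword.decode('latin-1')}: {text.decode('latin-1')}"
    if kind == "bKGD":
        return f"background colour, raw bytes {data.hex()}"
    return f"{len(data)} bytes"
```

Running it over two exports of the same image should show which chunks differ; since tIME records the last modification time, it tells you roughly when the file was written, not necessarily when the image was first created.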
When I export the same PNG twice, I only see a change in tIME. In fact I can't get a bKGD item, even when exporting a PNG with transparent pixels. Are you using any specific options when exporting?
As you know, a computer stores images as channels, with pixels in those channels, and pixel values are like "00110101", which fills 8 bits of memory. I want to know where exactly those bits are stored in memory, and how I can perform operations on them.
Thanks!
Well, the standard book is Digital Image Processing by Gonzalez and Woods.
Another book, where you can pick up the PDF for free, is Image Processing in C by Dwayne Phillips - PDF here.
First, you need to get a decent C compiler and development system. Personally I use Mac OS X, but I guess you would want the free edition of Visual Studio on Windows.
Then you need to get started with some simple reading and writing of files and memory allocation. I would go with greyscale images in the NetPBM format (probably just PGM files), described here, as they are the easiest. You can download the NetPBM programs, run them in a Windows Command Prompt to see how they work, and try to implement them yourself in C. You can also download ImageMagick for Windows and try converting images from colour to greyscale and resizing them like this:
convert input.png -colorspace gray result.jpg
convert input.tif -resize 400x400 result.pgm
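To make "where the bits are" concrete, here is a small sketch of the binary PGM format, written in Python rather than the recommended C purely to keep it short. Once the header is parsed, the image is just a flat byte buffer in which pixel (x, y) lives at index y * width + x, and operating on a pixel means changing that byte. It assumes 8-bit P5 files without comment lines; the helper names are hypothetical.

```python
def write_pgm(width: int, height: int, pixels: bytes) -> bytes:
    """Binary PGM (P5): ASCII header, then one byte per pixel, row by row."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    return f"P5\n{width} {height}\n255\n".encode("ascii") + bytes(pixels)


def read_pgm(data: bytes):
    """Parse an 8-bit P5 PGM without comment lines into (width, height, pixels)."""
    fields, pos = [], 0
    while len(fields) < 4:  # magic, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        fields.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates the header from the pixels
    if fields[0] != b"P5" or int(fields[3]) > 255:
        raise ValueError("only 8-bit binary PGM is handled here")
    width, height = int(fields[1]), int(fields[2])
    return width, height, bytearray(data[pos:pos + width * height])


def get_pixel(pixels: bytearray, width: int, x: int, y: int) -> int:
    # Row-major layout: row y starts at offset y * width.
    return pixels[y * width + x]


def invert(pixels: bytearray) -> None:
    # XOR with 0xFF flips all 8 bits, i.e. value -> 255 - value, in place.
    for i, value in enumerate(pixels):
        pixels[i] = value ^ 0xFF
```

The same layout carries over to C almost unchanged: read the header with fscanf, then fread width * height bytes into a malloc'd unsigned char buffer.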
When you have got that, I would move on to colour PPM format and then maybe PNG and/or JPEG. Remember there are libraries for TIF/JPEG/PNG/BMP so don't be afraid to use them.
Finally, move on to displaying images yourself with Windows GDI etc.
Come back to StackOverflow if you get stuck - questions are free!
tl;dr: wildly different with different encodings/filesystems/OSes/drivers
Well, that depends on the image format. BMP is one of the easier formats; details on what these files look like can be found on, for instance, Wikipedia.
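As a sketch of what those BMP details look like in practice, assuming an uncompressed file with the common 40-byte BITMAPINFOHEADER, the Python below reads the fixed little-endian header fields and computes where a given pixel's bytes start in the file. Rows are padded to a multiple of 4 bytes and, when the stored height is positive, stored bottom-up. The function names and dictionary keys are hypothetical.

```python
import struct


def bmp_info(header: bytes) -> dict:
    """Parse BITMAPFILEHEADER + the start of BITMAPINFOHEADER (first 30 bytes)."""
    magic, file_size, _, _, pixel_offset = struct.unpack("<2sIHHI", header[:14])
    if magic != b"BM":
        raise ValueError("not a BMP file")
    dib_size, width, height, _planes, bpp = struct.unpack("<IiiHH", header[14:30])
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,  # where the pixel array starts in the file
        "dib_size": dib_size,          # 40 for BITMAPINFOHEADER
        "width": width,
        "height": abs(height),
        "bottom_up": height > 0,       # positive height: bottom row stored first
        "bpp": bpp,
        # Each row is padded to a multiple of 4 bytes.
        "stride": (width * bpp + 31) // 32 * 4,
    }


def pixel_file_offset(info: dict, x: int, y: int) -> int:
    """Byte offset of pixel (x, y), y counted from the top; assumes bpp >= 8."""
    row = info["height"] - 1 - y if info["bottom_up"] else y
    return info["pixel_offset"] + row * info["stride"] + x * info["bpp"] // 8
```

For a 24-bit image the three bytes at that offset are stored in blue, green, red order.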
And to answer "where it's stored": it is stored on permanent storage (hard drive/SSD), and exactly where depends on the filesystem (FAT/NTFS/ext etc.).
When an image is to be displayed, it's read into memory, where it can be manipulated, and through some APIs this data can be put into a memory region specifically meant for displaying the current image on your screen.
I have a small problem: I have a set of animated GIF images. I want to take individual GIF files and create multiple TIFF images capturing the individual frames.
I am looking to do it in Python/Java.
Help would be appreciated!
You can do this easily from the command line using ImageMagick. It is available for free from here. It has bindings for Perl, C/C++, Python and lots of others. It is already installed in many Linux distros.
Your command looks like this:
convert input.gif -coalesce %02d.tif
which will produce TIFF output files numbered 00.tif, 01.tif, etc. according to the frame number (counting from 0).
You can also extract an individual frame, say frame 7 (frames are numbered from 0), like this:
convert 'input.gif[7]' -coalesce my_favourite.tif
or a sequence of frames, say 3-7, like this:
convert 'input.gif[3-7]' -coalesce frames%02d.tif
Note, however, that when you extract individual frames you may get artefacts, depending on how your original GIF files were compressed: they sometimes store only the DIFFERENCES between frames, so you may be best advised to extract all frames and then discard any you don't want.
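To see why extracting single frames can go wrong, it helps to look at how a GIF lays out its frames. The rough Python sketch below (stdlib only, assuming a well-formed GIF87a/GIF89a file, and not a replacement for ImageMagick or Pillow) walks the block structure without decoding any pixels and reports each frame's rectangle on the canvas. Frames smaller than the canvas are exactly the "differences only" case that -coalesce fills in; gif_frames is a hypothetical name.

```python
import struct


def gif_frames(data: bytes):
    """Return (left, top, width, height) for every image descriptor in a GIF."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    flags = data[10]  # packed byte of the logical screen descriptor
    pos = 13          # 6-byte header + 7-byte logical screen descriptor
    if flags & 0x80:  # global colour table present
        pos += 3 * (2 ** ((flags & 0x07) + 1))

    def skip_sub_blocks(p):
        # Data sub-blocks: a size byte followed by that many bytes, ended by 0.
        while data[p] != 0:
            p += data[p] + 1
        return p + 1

    frames = []
    while pos < len(data):
        block = data[pos]
        if block == 0x3B:    # trailer
            break
        if block == 0x21:    # extension: introducer, label, then sub-blocks
            pos = skip_sub_blocks(pos + 2)
        elif block == 0x2C:  # image descriptor: one frame
            left, top, w, h, packed = struct.unpack("<HHHHB", data[pos + 1:pos + 10])
            frames.append((left, top, w, h))
            pos += 10
            if packed & 0x80:  # local colour table
                pos += 3 * (2 ** ((packed & 0x07) + 1))
            pos = skip_sub_blocks(pos + 1)  # skip LZW minimum code size, then image data
        else:
            raise ValueError(f"unexpected block 0x{block:02x} at offset {pos}")
    return frames
```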