Find the cropped area of an image inside the original image with xxd

I did the following:
I took a screenshot of my Linux desktop with gnome-screenshot, converted it to a BMP image, and dumped it as raw hex values with xxd -ps. [resolution: 969x1920]
From that BMP image I cropped a small area, exported it to another BMP image, and dumped it as raw hex values with the same method. [resolution: 47x79]
Now when I copy a row of hex values from the smaller image's dump (say, from the end of the file, to avoid any headers) and search for it in the other dump, it is not found.
I don't know much about image formats. I just want to know whether there is something fundamental that I am missing and should study before trying something similar again.
Thank you in advance!
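One thing worth checking, assuming both files are uncompressed 24-bit BMPs (a converter may well have written 32-bit data instead): every stored pixel row is padded to a multiple of 4 bytes, rows are normally stored bottom-up, and xxd -ps wraps its output at 30 bytes per text line, so a line of the dump is not a row of the image. The sketch below only computes the stored row length; the function name is made up for illustration.

```python
def bmp_row_stride(width_px: int, bits_per_pixel: int = 24) -> int:
    """Bytes per stored row in an uncompressed BMP (rows are padded to 4 bytes)."""
    return ((width_px * bits_per_pixel + 31) // 32) * 4


if __name__ == "__main__":
    # Widths taken from the question; 24 bits per pixel is an assumption.
    for width in (47, 969, 1920):
        raw = width * 3
        stride = bmp_row_stride(width)
        print(f"width {width}: {raw} pixel bytes + {stride - raw} padding = {stride} bytes per row")
```

For a 47-pixel-wide crop this gives 141 pixel bytes plus 3 padding bytes per row. In the large file the same 141 bytes are followed by the neighbouring pixels rather than by padding, so a copied chunk that crosses a row boundary of the small image cannot match anywhere in the large one.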

Related

How to recognize an image file format using its contents?

If an image file is in .png format, it contains ‰PNG at the beginning of the file (when read in text mode).
If an image file is in .bmp format, it contains BM at the beginning of the file (when read in text mode).
As I understand it, image formats contain data of a certain size (in bytes) at the beginning of the file, which is used as metadata for the image file.
My questions are:
Is this behavior the same in all image file formats (or formats in general)?
Could an image file (with no extension) be recognized just using this data?
Is there information available on how this metadata is broken down? By that I mean, which position in the metadata has what meaning?
Is this behavior the same in all image file formats (or formats in general)?
For most of them, yes. There are some proprietary formats (e.g. for games) that might have very short or no metadata. Also, metadata might be in another file (e.g. animations together with XML metadata).
Could an image file (with no extension) be recognized just using this data?
Yes. In fact, most image viewers will warn you if an image file has an incorrect extension and ask you if they should fix it.
On Unix systems, there's a file command that identifies files based on their contents. For images there is a more specific tool called identify (part of ImageMagick) that returns more detailed information such as resolution and bit depth.
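The same idea can be sketched in a few lines of Python. This is only an illustration with a hand-picked table of four well-known signatures, not how file or identify are implemented; the function names are made up for this example.

```python
from typing import Optional

# A small, deliberately incomplete table of well-known leading byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"BM", "BMP"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file, ignoring its extension."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(16))
```

Short signatures such as BM can also match files that are not images at all, so a real tool checks more of the header before deciding.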
Is there information available on how this metadata is broken down? By that I mean, which position in the metadata has what meaning?
There are books about (image) file formats and for most formats, this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compressions and what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.
Note that, based on the examples you gave, I suppose you opened the files with a text editor, which is not the ideal tool for that task. It's better to use a hex editor. Text editors won't show most bytes (e.g. 255) by default and will interpret others (e.g. tab or line feed). They might be good enough to see magic text strings like "BM" and "PNG", but with a hex editor you can see both these text parts and their numerical representation, which allows you, for example, to extract the image width and height. For this, some tool to convert hexadecimal values to decimal is useful; most calculators can do this.
As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:
You can see that the file is a PNG image in both of them, that's correct. But the marked part in the hex editor view will show the width and height of the image (matching the PNG chunk specification of the "IHDR" part) - 0x00001802 is 6146 in decimal, 0x000037D5 is 14293. There's no way to do this in the text editor.
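The manual hex-to-decimal step can also be scripted. A minimal sketch, assuming the file is a well-formed PNG whose first chunk is IHDR (as the PNG specification requires); the function name is only for illustration:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Read width and height from the IHDR chunk at the start of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file, or IHDR is not the first chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


if __name__ == "__main__":
    # Rebuild just the bytes discussed above instead of reading a real file.
    header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
              + bytes.fromhex("00001802" "000037D5"))
    print(png_dimensions(header))  # (6146, 14293)
```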
Also note that even if you don't know an image format, you might be lucky with just guessing that it's uncompressed data (this often works for some game image file formats, most notably Unity's "assets"). E.g. if you rename files to ".raw", the image viewer IrfanView will give you a dialog (see the screenshot below) where you can guess the width, height and bit depth of the image and see if the result looks good. This requires some experience in interpreting the outcome, though: if width and bit depth don't match, images will look like noise, appear warped, or have wrong colors.
This "image geometry guessing" can be improved/automated by trying different widths and computing the correlation coefficient between two lines. The tool raw2tiff can do this. Quote from the site:
There is no magic, it is just a mathematical statistics, so it can be
wrong in some cases. But for most ordinary images guessing method will
work fine.
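To make the idea concrete, here is a small pure-Python sketch of width guessing by row correlation. It is not raw2tiff's actual code: it assumes one byte per pixel, averages the Pearson correlation over all adjacent row pairs, and all names are invented for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) * (x - mx) for x in xs)
    syy = sum((y - my) * (y - my) for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant row carries no information
    return sxy / sqrt(sxx * syy)


def row_correlation(pixels, width):
    """Average correlation between adjacent rows if the data were `width` wide."""
    rows = [pixels[i:i + width] for i in range(0, len(pixels) - width + 1, width)]
    if len(rows) < 2:
        return 0.0
    scores = [pearson(a, b) for a, b in zip(rows, rows[1:])]
    return sum(scores) / len(scores)


def guess_width(pixels, candidates):
    """Pick the candidate width whose adjacent rows look most alike."""
    return max(candidates, key=lambda w: row_correlation(pixels, w))
```

Multiples of the true width tend to score well too, since each "row" then holds several real rows, so it helps to keep the candidate range narrow and to look at the result.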
Using ImageMagick, you can get the format (if available) of any file that ImageMagick can read. It is taken from the "magic" data in the file header, not from the file name:
convert image -format "%m\n" info:
For example:
convert lena.png -format "%m\n" info:
PNG
convert lena.jpg -format "%m\n" info:
JPEG
convert lena.pnm -format "%m\n" info:
PPM
Even if the suffix is removed, this still works:
convert lena_copy -format "%m\n" info:
PNG

Non-deterministic* data in header/beginning of PNG files

I noticed that PNG files created by Gimp from the same RGB data are identical except for the very beginning. This image shows a diff of otherwise identical PNG files created with Gimp:
What is this data which changes each time and how is it encoded? Are there tools to decode it? Can you learn something from this information, e.g. can you find out when a PNG file was (probably) created by this information?
I was under the impression that PNG files are created deterministically* and don't store metadata which isn't necessary to decode the image. (Obviously, the last part is not true either, as Gimp writes its own name into the files without asking the user, which it does ask if you export something as a JPEG file.)
 * I use the word "deterministic" here to refer to things and only such which are the same on each execution/export/whatever given the same input. I'd usually use the word "functional" (i.e. like a mathematical function) but I fear this could be misunderstood by people who don't know what "functional" means in mathematics. Obviously, this is different from the usage of this word in information theory.
See the PNG header definition.
tIME stores the time that the image was last changed, so for me it's the same as the timestamp of the file you create.
bKGD gives the default background color. Possibly the background color you are using in Gimp, or the color of the transparent pixels.
tEXt with key Comment and value Created with Gimp is just the default comment. You can change the comment for the image in Image>Properties and you can set a default comment in Edit>Preferences>Default Image.
When I export the same PNG twice, I only see a change in tIME. In fact I can't get a bKGD item, even when exporting a PNG with transparent pixels. Are you using any specific options when exporting?
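To answer the "are there tools to decode it" part: the chunk structure is simple enough to walk by hand. A minimal sketch, assuming a well-formed PNG and decoding only tIME and tEXt (function names are invented here, and CRCs are skipped rather than verified):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes):
    """Yield (chunk_type, payload) for every chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype.decode("ascii"), data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC


def describe_chunk(ctype: str, payload: bytes) -> str:
    """Human-readable summary for the chunk types discussed above."""
    if ctype == "tIME":
        # 2-byte year, then month, day, hour, minute, second (1 byte each);
        # the PNG specification says this should be UTC.
        y, mo, d, h, mi, s = struct.unpack(">HBBBBB", payload)
        return f"{y:04d}-{mo:02d}-{d:02d} {h:02d}:{mi:02d}:{s:02d} UTC"
    if ctype == "tEXt":
        # keyword, NUL separator, text; both parts are Latin-1.
        key, _, value = payload.partition(b"\x00")
        return f"{key.decode('latin-1')}: {value.decode('latin-1')}"
    return f"{len(payload)} bytes"


def dump_png(path: str) -> None:
    with open(path, "rb") as f:
        for ctype, payload in iter_chunks(f.read()):
            print(ctype, describe_chunk(ctype, payload))
```

Running dump_png on two exports and comparing the output shows directly which chunk differs, and a tIME chunk tells you when the image was last modified according to the program that wrote it.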

Octave: image received over a socket connection doesn't display

I have an Octave script in which I open a socket server and receive some commands from connected clients. This already works. Now I need to send data to Octave, mostly images, and process them. To test this I wanted to receive and display a grayscale test image.
bufflen = 4096;
[data,count]=recv(b,bufflen);
imshow (data)
The image window opens but it is empty. The size of data is exactly the size of the image file I am sending. I also tried saving the image with
imwrite (data, "test.jpg");
This produced a file, but all rows of the image ended up in one long line. When I open the original image with
imshow test.jpg
everything works as it should: the image window appears and shows the image.
I am sending the data via netcat with
>ncat.exe 127.0.0.1 12346 < test.jpg
This seems to work; I was able to transfer the image to another PC and view it there.
Any hint or tip is greatly appreciated, thank you.
You are sending your JPEG as a byte stream (ncat.exe 127.0.0.1 12346 < test.jpg), but you are trying to show it with imshow as if it were an uncompressed RGB, grayscale or indexed image. You can receive it, save it to a temporary file and then load it with imread. In this case GraphicsMagick/ImageMagick will do the decompression from JPEG to RGB for you.
Guessing here since you didn't provide much information, but it sounds like your data is coming in as a vector and you need to reshape it into an array for imshow
>> newdata = reshape(data, 64, 64)
You haven't shown us an example of the input data, so it is also possible that your data is a string of characters, while image arrays need to be numerical values. To verify before reshaping you could run:
>> class(data)
If so you will need to convert it to an array of numerical values. You can use str2num for that, but exactly how to do so will depend on what the string looks like, are there value separators, etc.
See:
https://www.gnu.org/software/octave/doc/interpreter/String-Conversions.html

Save uint16 tiff image as truecolor with Matlab

I am processing microscopy images (in Matlab) in the tiff format, normally uint8 or uint16. Basically I read them, put them in a cell array for processing and then export them in the tiff format either as an image sequence or a stack (using imwrite and either the 'overwrite' or 'append' writemode property of imwrite, respectively). Up to now everything works very well.
The problem I'm having is the following:
When I open the images with ImageJ, they are not in truecolor "RGB" color mode, but rather in composite mode. For example ImageJ reads the data as 8 bit, which it is, but does not open the image as a truecolor (Sorry for the bad choice of words I don't know the right terminology). Hence I have to manually combine the 3 channels together, which is bothersome for large datasets.
Here is a screenshot explaining this. On the left is what I would like, i.e. what I obtain if I open the image directly with ImageJ, and on the right is what I currently have after saving images with Matlab and opening them with ImageJ, which I don't want.
The code I'm using to export the image sequence is the following. "FinalSequenceToExport" is the cell array containing the images.
for i = 1:SliceNumber
    ExportedName = sprintf('%s%s%d.tiff',fileName,'Z',i);
    imwrite(FinalSequenceToExport{i},ExportedName,'tif','WriteMode','overwrite','Compression','none');
end
If I ask Matlab the size of FinalSequenceToExport{1}, for instance, it gives 512 x 512 x 3.
If I open a given image in the command window and then save it with the same code as above, it does what I want and the resulting image opens as I want in ImageJ. Hence my guess would be that the problem arises from the use of the cell array but I don't understand how.
I hope I've been clear enough. If not please ask for more details.
Thanks for the help!
You need to specify the 'ColorSpace' parameter.
Try this:
imwrite(FinalSequenceToExport{i},ExportedName,...
'tif','WriteMode','overwrite','Compression','none', ...
'ColorSpace', 'rgb');
After revisiting this question I found the following to work, thanks to the hint from @Ashish:
imwrite(uint8(FinalSequenceToExport{i}/255),...);
I just needed to divide by 255 and convert the result to uint8.

Creating a unique hash based on the CONTENTS of an image (PNG) in Ubuntu?

For the purposes of ensuring images are not tampered with, I would like to create a unique hash based on the contents of an image file (a PNG specifically). I've googled, and I know it's very possible to create a hash based on a file, but it seems to take into account things other than the contents of the image?
For instance, to test, I create a very large PNG file with random colors/lines/shapes/etc. Then I saved the file as test1.png. I then created a single pixel black dot in the corner of the image and saved as test2.png.
I ran md5sum on both images, and got different hash values (expected). I then downloaded test2.png, removed the single black pixel, and saved the file as test3.png. test3.png and test1.png contain the EXACT same image.
Now, from what I understand PNG should be a lossless compression, so that shouldn't be the issue (?). I'm a bit in the dark (as you can probably tell) about all of this, so if anybody can give me any ideas, I'd much appreciate it!
You didn't say so, but I guess you are getting different hashes for test1.png and test3.png?
PNG files can contain a fair bit of metadata in addition to the image data; it's possible that some of the metadata is different. It's also possible for the same image data to be compressed in different ways. If you really want to know, compare the files to find out what exactly is different.
If you really want to hash just the contents of the files, you'll most likely have to convert them to a raw RGB format and hash that instead.
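As a rough illustration using only the Python standard library, the sketch below hashes the IHDR fields plus the decompressed IDAT stream. That ignores metadata chunks and the zlib compression level, but it is a weaker stand-in for a real conversion to raw RGB: two encoders that choose different scanline filters or interlacing will still produce different hashes, and for indexed images the palette (PLTE) is not included. The function name is made up for this example.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_content_hash(data: bytes) -> str:
    """SHA-256 over IHDR plus the decompressed (still filtered) scanline data."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    ihdr = b""
    idat_parts = []
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            ihdr = payload
        elif ctype == b"IDAT":
            idat_parts.append(payload)  # IDAT chunks form one zlib stream
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    scanlines = zlib.decompress(b"".join(idat_parts))
    return hashlib.sha256(ihdr + scanlines).hexdigest()
```

If the files may come from different programs, decoding them to raw pixels with an image library and hashing that buffer, as suggested above, is the more robust route.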

Resources