Data Types and file structure - format

I am interested in working with data types and file formats.
For example, I want to open a JPEG file with PHP and work with it.
For example, get its size, or convert it to black and white, without any library.
How can I decode the bytes of a file and get information about it?
I opened a JPEG file with HxD and saw its data in hexadecimal.
Please give me a reference for learning more about file formats and their structure.
Sorry for bad English.
Thanks a lot ...

Many JPEG files carry their metadata in the Exchangeable Image File Format (Exif).
In PHP you could use this function:
http://php.net/manual/en/function.exif-read-data.php
That gives you access to the properties stored in the image headers, such as resolution, endianness, etc., which you can then use when reading the image data.
The image data is stored after the header segments. Note that in a JPEG it is compressed, so it has to be decoded before you can work with individual pixels.
Here is the spec for the Jpeg File Interchange Format (JFIF):
http://www.jpeg.org/public/jfif.pdf
Also, in PHP, if you just want to send the file's raw bytes to a browser, set the content type first and then output the file:
$file = 'picture.jpg';
header('Content-Type: image/jpeg');
readfile($file);
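To make the "get the size without any library" part concrete, here is a minimal sketch in Python (the same byte-level steps should carry over to PHP). It walks the JPEG segments until it reaches a start-of-frame segment, which holds the height and width as big-endian 16-bit integers. The names jpeg_size and SOF_MARKERS are made up for this illustration, and the sketch assumes a well-formed file:

```python
import struct

# Start-of-frame markers (SOF0..SOF15), excluding DHT (0xC4), JPG (0xC8), DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}

def jpeg_size(data):
    """Return (width, height) read from a JPEG byte string, or None if not found."""
    if data[:2] != b"\xff\xd8":            # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:              # each segment must begin with 0xFF
            return None
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte before a marker, skip it
            pos += 1
            continue
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):        # truncated frame header
                return None
            # Layout: marker(2) length(2) precision(1) height(2) width(2)
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length                  # jump to the next segment
    return None
```

To try it on a real file, read the file in binary mode, e.g. open('picture.jpg', 'rb').read(), and pass the bytes to jpeg_size. Converting to black and white is a much bigger job, because the pixel data has to be decompressed first.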

Related

Cannot open PNGs created with canvas.toDataURL

I create PNG data from a canvas with the toDataURL method.
const pngdata = canvas.toDataURL("image/png");
I open a text editor and copy the content of pngdata into a file that I call img.png
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAkMAAAiICAYAAAD...
I save this file. When I try to open it (Windows 10) I get "It looks like we don't support this file format".
Removing data:image/png;base64, from the file doesn't help.
Why does this not work?
Because your file is still encoded as Base64. You need to decode it to the actual binary data for it to be a valid PNG file.
You can find many online tools that will do this for you, but the best option is probably to export your canvas drawing directly to a binary Blob (for example with canvas.toBlob()) and download that Blob through a blob: URL.
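If you already have the data URL as text, the decoding step can also be done offline. Here is a minimal sketch in Python using the standard base64 module. The function name data_url_to_bytes is made up for this illustration, and it assumes the URL uses the ;base64 form that toDataURL produces:

```python
import base64

def data_url_to_bytes(data_url):
    """Decode a base64 data URL (e.g. from canvas.toDataURL) into raw bytes."""
    # Split "data:image/png;base64" from the encoded payload at the first comma.
    header, sep, payload = data_url.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)
```

Write the returned bytes to img.png in binary mode (open('img.png', 'wb')) and the file should open normally. A correctly decoded PNG starts with the bytes 89 50 4E 47 0D 0A 1A 0A.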

find cropped area of image inside the basic image with xxd

I did the following:
I took a screenshot of my Linux desktop with gnome-screenshot, converted it to a BMP image, and dumped it as raw hex values with xxd -ps. [resolution: 969x1920]
From that BMP image I cropped a small area, exported it to another BMP image, and dumped it as raw hex values with the same method. [resolution: 47x79]
Now when I copy a row of hex values from the smaller image (say, from the end of the file, to avoid any headers) and search for it in the other dump, it is not found.
I don't know much about image formats. I just want to know whether there is something fundamental that I am missing and should study before trying something similar again.
Thank you in advance!

VBA to read image directly from word document for Base64 Conversion

Basically I have a document with lots of small images and I need a fast way to build a list of unique images (and identify them in the document), I was thinking of using Base64 conversion to store each unique image in an xml file (or simply store the MD5 hash of each image).
If the images were saved as files somewhere (i.e outside of the document) I'd be able to do this but (unless there's good reason not to) I'd like to learn how to read the image directly from the document.
Specifically, if I have myDoc.InlineShapes(myImageIndex) how can I most efficiently PREPARE that to either convert to Base64 or create a hash?
All the examples I can find assume the image is loaded from file, I'm hoping to load the image directly from the document... (e.g. Convert image (jpg) to base64 in Excel VBA?)
Many thanks in advance,

Is the image format (suffix) always in the URL?

Is the image format (suffix) always in the URL? Most of the time it is, but when it is not, how does the browser know the URL points to an image? I mean, maybe the format is given somewhere in the packet. Is that so, or is parsing the only way?
Thanks
No. The extension, and in fact the entire URL, is completely irrelevant to the format of the image.
Image formats are determined by:
the MIME type sent by the server in the Content-Type header, such as image/png or image/jpeg.
the first few bytes of the file contents (the file signature), which also identify the image format. Your image editor (Photoshop, etc.) writes this signature when it saves the file.
Strictly speaking, the server is supposed to send the correct MIME type, but most browsers also check the first few bytes of the file.
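As a rough illustration of the second point, here is a minimal sketch in Python that maps the first bytes of a file to a MIME type. The table covers only four common formats, the names SIGNATURES and sniff_mime are made up for this example, and real browsers follow a more detailed sniffing algorithm:

```python
# Well-known file signatures and the MIME type each one indicates.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
]

def sniff_mime(first_bytes):
    """Guess an image MIME type from the leading bytes, or return None."""
    for signature, mime in SIGNATURES:
        if first_bytes.startswith(signature):
            return mime
    return None
```

Note that the URL never enters into it: only the bytes of the response body are inspected.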

How to determine if a photo is corrupted?

I have a requirement wherein I have to determine whether a photo is corrupted and tag it accordingly.
Another thing I need is to determine whether an image has the wrong extension. By wrong extension I mean that I have sometimes come across a photo with a jpg extension, but when I load it into IrfanView it reports that the photo is in a different format than the extension suggests.
How can I do this in Delphi?
I have a requirement wherein I have to determine whether a photo is corrupted and tag it accordingly.
You can try some things, but with certain file formats (example: BMP, JPEG to some extent) only a human can ultimately decide if the file is OK or corrupted. The simplest test is to simply load the file into a corresponding object (TJpegImage, TPngObject, etc). If you get an exception while loading you've surely got a corrupted file. Unfortunately if no exception is raised you can't really say the file is not corrupted. I've seen corrupted JPEG files that load just fine into a Delphi TImage and can be opened with Windows's Image Viewer, but are obviously corrupted to a human observer. With BMP images it's even clearer: open up a bitmap, overwrite some bytes in the middle of the file and then open it in a viewer. How can any automated system tell those wrongly colored bits in the middle of the bitmap are actually wrong?
Another thing I need is to determine whether an image has the wrong extension. By wrong extension I mean that I have sometimes come across a photo with a jpg extension, but when I load it into IrfanView it reports that the photo is in a different format than the extension suggests.
How about doing much the same: try to load the file into the object that corresponds to its extension, and if that fails, try opening it as some other formats? This should be easy.
Alternatively you can investigate image headers: most file formats start with a short signature, a few bytes long. You can look up the documentation of all the image file formats and find the signatures, or you can simply open a large number of files and look for a pattern in the first 4 bytes. I'd go for this second alternative, since finding proper documentation for all image file formats might be a challenge.
The only way to check whether a file is corrupted is to try reading it as its file format describes, i.e. load a BMP as a BMP by reading the BMP header, the BMP data, etc. There are many web pages that describe graphics file formats. Of course, if you transmit files and are afraid that they will be corrupted in transit, then store a checksum such as CRC32 with each file, or even a cryptographic hash such as MD5 or SHA-1. After transmitting, check that the recalculated value is the same as the original.
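The checksum idea can be sketched as follows in Python, using zlib.crc32 and hashlib.sha1 from the standard library. The function names checksums and is_intact are made up for this illustration. Delphi would need its own CRC and hash routines, but the logic is the same:

```python
import hashlib
import zlib

def checksums(data):
    """Compute a CRC32 and a SHA-1 digest for a file's bytes."""
    return {
        "crc32": zlib.crc32(data) & 0xFFFFFFFF,   # force an unsigned 32-bit value
        "sha1": hashlib.sha1(data).hexdigest(),
    }

def is_intact(data, expected):
    """Return True if the data still matches the checksums stored earlier."""
    return checksums(data) == expected
```

Compute checksums before sending, store them alongside the file, and call is_intact on the received bytes. This only detects changes made after the checksum was taken. It says nothing about whether the image was valid to begin with.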
In Delphi there is the unit jpeg and the types TJPEGImage and TBitmap. Try loading the data into one of them and check for an exception. For other formats there are many libraries; just look for the file formats you need.
To check whether the file extension is correct, read the first few bytes of the file and compare them against a dictionary of graphics file headers. For example, GIF files start with GIF, BMP files start with BM, JPEG files start with the bytes FF D8 FF, and in the header of many JPEG files you will find JFIF. I think the Unix utility file works this way.
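A minimal sketch of that extension check, written in Python for brevity. The table is deliberately small, and the names EXPECTED_PREFIXES and extension_matches are made up for this illustration:

```python
import os

# Extension -> byte prefixes a file of that type is expected to start with.
EXPECTED_PREFIXES = {
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".bmp": (b"BM",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
}

def extension_matches(filename, first_bytes):
    """True or False for known extensions, None when the extension is unknown."""
    ext = os.path.splitext(filename)[1].lower()
    prefixes = EXPECTED_PREFIXES.get(ext)
    if prefixes is None:
        return None
    # bytes.startswith accepts a tuple and matches any of its entries.
    return first_bytes.startswith(prefixes)
```

A result of False means the file should be tagged as having the wrong extension. Reading the first 8 bytes of each file is enough for the signatures listed here.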
Since you used the term "requirement", I suspect that you're doing a job for someone, possibly as a contract. So make sure that you nail the requirements before worrying about the code.
IMO, you need to get samples of test cases. As others mentioned, failure to load the file as a particular format will be one test. But what about a .jpg that loads ok, but the bottom third is missing? Or a .jpg that loads ok but has green "static" lines in the middle where an error occurred upstream somewhere (on the camera, photoshop, whatever) but then the processing recovered and resumed? In this case, the .jpg may really have green lines in it. Is that considered "corrupt" or not? This is where you need to be careful, especially if it's a contract job.
I have handled this situation by reading the suspicious image and trying to get its shape inside a try-except block. The code is as follows:
import cv2
# cv2.imread does not raise on failure; it returns None when the file
# is missing or cannot be decoded as an image.
image = cv2.imread('./image.jpg')
try:
    dummy = image.shape  # raises AttributeError when image is None
except AttributeError:
    print("[INFO] Image is not available or corrupted.")
This approach covers cases such as:
Detecting a corrupted image that cannot be decoded
Detecting a non-image file with an image-type extension
Detecting a missing image

Resources