I understand that a JPEG file starts with 0xFFD8 (SOI), followed by a number of 0xFFEn segments holding metadata, then a number of segments holding the compression-related data (DQT, DHT, etc.), of which the final one is 0xFFDA (SOS); then comes the actual image data, which ends with 0xFFD9 (EOI). Each of those segments states its length in the two bytes following the JPEG marker, so it is a trivial exercise to calculate the end of a segment/start of the next segment, and the start of the image data can be calculated from the length of the SOS segment.
Up to that point, the appearance of 0xFFD9 (EOI) is irrelevant [1], because the segments are delimited by their lengths. As far as I can see, however, there is no way of determining the length of the image data other than finding the 0xFFD9 (EOI) marker following the SOS segment. For that to be reliable, 0xFFD9 must not appear inside the actual image data itself. Is there something built into the JPEG algorithm to ensure that, or am I missing something here?
[1] A second 0xFFD8 and 0xFFD9 can appear if a thumbnail is included in the image, but that is taken care of by the length of the containing segment, usually a 0xFFE1 (APP1) segment from what I have seen. In the images I have checked so far, the start and size of the thumbnail image data are still given in the 0x0201 (JPEGInterchangeFormat - Offset to JPEG SOI) and 0x0202 (JPEGInterchangeFormatLength - Bytes of JPEG data) fields in IFD1, even though these were deprecated in Tech Note #2.
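In JPEG, a compressed data byte with the value FF is written as FF00 (a zero byte is stuffed after it), so it cannot be mistaken for a marker.
The compressed bytes FF D9 would therefore be written as FF 00 D9.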
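A minimal Python sketch of that rule, assuming the scan data is already in memory as bytes. The helper names `stuff` and `find_next_marker` are made up for illustration and are not part of any library. It shows the stuffing an encoder applies, and how a reader can skip stuffed bytes, restart markers (FFD0 to FFD7) and FF fill bytes to reach the next real marker:

```python
def stuff(entropy: bytes) -> bytes:
    """Insert a 0x00 after every 0xFF byte, as done for compressed data."""
    return entropy.replace(b"\xff", b"\xff\x00")


def find_next_marker(data: bytes, start: int) -> int:
    """Return the offset of the first real marker at or after `start`.

    Skips FF00 (a stuffed data byte), FFD0..FFD7 (restart markers, which
    are allowed inside scan data) and FF fill bytes. Returns -1 if no
    marker is found.
    """
    i = start
    while i + 1 < len(data):
        if data[i] == 0xFF:
            nxt = data[i + 1]
            if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                i += 2          # stuffed byte or restart marker: still scan data
                continue
            if nxt == 0xFF:
                i += 1          # fill byte before a marker
                continue
            return i            # a real marker such as FFD9 (EOI)
        i += 1
    return -1


# Compressed bytes that happen to contain FF D9, followed by a real EOI.
raw = bytes([0x12, 0xFF, 0xD9, 0x34])
scan = stuff(raw) + b"\xff\xd9"
print(scan.hex(" "), "-> marker at offset", find_next_marker(scan, 0))
```

Note that in a file with several scans (progressive JPEG, for example) the next marker after one scan may be another segment rather than EOI, so a full reader would keep parsing from there.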
The PNG specification here says that a PNG file contains an IDAT chunk, which holds the actual image data.
My question is: when I modify (using a hex editor) the LSB of any 1 byte in IDAT, the whole image goes bad (colors change randomly, or the image becomes transparent with some outline remaining, or it goes completely blank).
How can changing just 1 byte cause this?
It says exactly what's in there in the specification you linked. Did you consider reading the specification?
After I read in https://en.wikipedia.org/wiki/Base64 about how the word Man gets converted into TWFu by the base64 algorithm, I was wondering how an image gets converted by the same algorithm; after all, this conversion takes bytes, divides them into groups of 6 bits and then looks up the ASCII character for each group.
My question is, how does an image become a base64-encoded string?
I want an answer that describes the flow from when we save the image in our computer until it becomes a base64-string.
Terms that I hope will be explained in the answer are:
pixels/dpi/ppi/1bit/8bit/24bit/Mime.
Base64 isn't an image encoder, it's a byte encoder, important distinction. Whatever you pass it, whether it be a picture, an mp3, or the string "ilikepie" - it takes those bytes and generates a text representation of them. It has no understanding of anything in your pixels/dpi/ppi/1bit/8bit/24bit/Mime list, that would be the business of the software that reads those original bytes.
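To make the "byte encoder" point concrete, here is a small Python sketch. `encode_three_bytes` is a hand-rolled illustration (a hypothetical helper, not a library function) of the 6-bit grouping, checked against the standard library's `base64.b64encode`. It never looks at what the bytes mean:

```python
import base64

# The standard Base64 alphabet: index 0..63 -> character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters (4 x 6 bits)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    n = int.from_bytes(chunk, "big")  # the 24 bits as one integer
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


# Text bytes and image bytes go through exactly the same steps.
for chunk in (b"Man", b"\xff\xd8\xff"):
    by_hand = encode_three_bytes(chunk)
    by_library = base64.b64encode(chunk).decode("ascii")
    print(chunk, by_hand, by_library)
```

The second chunk is the first three bytes of a JPEG file; the encoder treats it no differently from the text "Man".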
Per request: "I want an answer that describes the flow from when we save the image in our computer until it becomes a base64 string."
To get to a base64 representation:
Open paint and draw a smiley face.
Save that smiley face as smile.png
Paint uses its png encoder to convert the bitmap of pixels into a stream of bytes that it compresses and appends headers to so that when it sees those bytes again it knows how to display them.
The image is written to disk as a series of bytes.
You run a base64 encoder on smile.png.
base64 reads the bytes from disk at the location smile.png refers to and converts their representation and displays the result.
To display that base64 representation in a browser:
browser is handed a resource encoded with base64, which looks something ...
Browser takes the image/png part and knows that the data following it will be the bytes of a png image.
It then sees base64, and knows that the next blob will need to be base64 decoded, before it can then be decoded by its png decoder.
It converts the base64 string to bytes.
It passes those bytes to its png decoder.
It gets a bitmap graphic that it can then display.
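Both flows can be sketched in a few lines of Python. This assumes the resource is handed over as a `data:` URI with an `image/png` media type and a `base64` marker, which matches the parts named in the steps; `make_data_uri` and `parse_data_uri` are hypothetical helper names, and the parser only handles this one simple shape:

```python
import base64


def make_data_uri(data: bytes, mime: str) -> str:
    """Encoder side: file bytes -> Base64 text wrapped in a data URI."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def parse_data_uri(uri: str) -> tuple[str, bytes]:
    """Browser side: read the media type, then Base64-decode back to bytes."""
    header, payload = uri.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# Stand-in for the contents of smile.png: just the 8-byte PNG signature.
png_bytes = b"\x89PNG\r\n\x1a\n"
uri = make_data_uri(png_bytes, "image/png")
mime, decoded = parse_data_uri(uri)
print(uri)
print(mime, decoded == png_bytes)
```

The decoded bytes are what would then be passed to the PNG decoder; Base64 itself never touches pixels.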
Every image consists of many pixels; the number of pixels is determined by the image resolution.
Image resolution: the number of pixels in a row and the number of rows.
For example, an image with a resolution of 800x600 has 800 pixels in each row and 600 rows.
Every pixel has a bit depth: the number of bits used to represent the pixel.
For example, with a bit depth of 1, every pixel is represented by one bit and has only 2 possible values (0 or 1, black or white).
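As a rough worked example of how resolution and bit depth combine, this Python sketch computes the size of the raw, uncompressed pixel data. It assumes each row is padded only up to a whole byte and ignores any file headers; real formats add their own headers and may align rows differently. `raw_size_bytes` is a name made up for this illustration:

```python
def raw_size_bytes(width: int, height: int, bit_depth: int) -> int:
    """Bytes needed for the uncompressed pixels of a width x height image."""
    row_bits = width * bit_depth
    row_bytes = (row_bits + 7) // 8  # round each row up to a whole byte
    return row_bytes * height


for depth in (1, 8, 24):
    print(f"800x600 at {depth}-bit: {raw_size_bytes(800, 600, depth)} bytes")
```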
An image can be saved in many different formats; the most common are bitmap, JPEG and GIF. Whatever format is used, an image is always displayed on a computer screen as a bitmap (uncompressed). Every format stores the data differently.
JPEG: typically a 24-bit (bit depth) format. The image is stored in compressed form, and you lose some of the image data.
GIF: a format with a bit depth of up to 8 bits. A GIF image can be optimized by removing some of the colours in its palette. Its compression is lossless.
Just throwing this in for the Bytes clarification :
"The word MAN gets converted into TWFu by using the base64 algorithm, I was wondering how an image gets converted by the same
algorithm, after all this conversion takes Bytes, divide them into
groups of 6 and then looking for their ASCII value."
"My question is, How an image becomes base64 string?"
correction : MAN becomes TUFO. It is actually Man that becomes TWFu as you've shown above.
Now onto the clarification...
The byte values (binary numbers) of the image can be written in hex notation, which makes it possible to handle those same bytes as a string. Each hex digit ranges from 0 to F: 0 to 9, then A = 10 up to F = 15, giving 16 possible values, so one byte is written as two hex digits.
Hex is also called Base16.
Converting bytes to Base64 is the same idea with a larger alphabet: the same binary data is regrouped from 4 bits per character (Base16) into 6 bits per character (Base64), so every 3 bytes become 4 Base64 characters.
Note that the hex text itself is not what gets encoded: Base64 works on the byte values that the hex digits describe.
For example :
The beginning 3 bytes of JPEG format are always ff d8 ff
The same bytes in Base64 are: /9j/ ... (see examples at these link1 and link2)
You can test by :
Open any .jpg image in a downloaded free Hex Editor. You can also try online Hex editors but most won't Copy to clipboard. This online HEX viewer will allow Select/Copy but you have to manually remove all those extra minus - in copied hex text (ie: use the Find & Replace option in some text editor), or skip/avoid selecting them before any copy/paste.
Go to this Data Converter and re-type or copy/paste as many bytes (from starting FF up to any X amount) into the [HEX] box and press decode below it. This will show you those bytes as Base64 and even tells you the decimal value of each byte.
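The same test can be done locally with a few lines of Python instead of an online converter. This is a sketch using only the standard library; `hex_to_base64` is a hypothetical helper that strips the minus signs and spaces a hex viewer may add before converting the hex text back to bytes:

```python
import base64


def hex_to_base64(hex_text: str) -> str:
    """Turn copied hex text (e.g. 'ff-d8-ff') into Base64 of the same bytes."""
    cleaned = "".join(ch for ch in hex_text if ch not in "- \t\r\n")
    raw = bytes.fromhex(cleaned)          # hex text -> actual byte values
    return base64.b64encode(raw).decode("ascii")


print(hex_to_base64("ff-d8-ff"))      # first 3 bytes of a JPEG
print(hex_to_base64("FF D8 FF E0"))   # 4 bytes, so the output is padded with '='
```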
When you upload any file in an HTML form using <input type="file">, it is transferred to the server in exactly the same form as it is stored on your computer or device. The browser doesn't check what the file format is and treats it as just a block of bytes. For transfer details see How does HTTP file upload work?
I'm trying to find documentation for the IBM/Xerox Recorded Image Data Inline Coding recording algorithm (RIDIC).
Within an IPDS print stream, the image gets wrapped in this RIDIC algorithm. I need to be able to take the stream and decode the image portion back to its original image. There is little-to-no information out there as far as I've been able to find.
Here is literally all the information I have on it so far from http://afpcinc.org/wp-content/uploads/2014/07/IPDS-Reference-10.pdf:
"The Recorded Image Data Inline Coding recording algorithm (RIDIC) formats a single image in the binary element sequence of a unidirectional raster scan with no interlaced fields and with parallel raster lines, from left to right and from top to bottom."
"Each binary element representing an image data element after decompression, without grayscale, is 0 for an image data element without intensity, and 1 for an image data element with intensity. More than one binary element can represent an image data element after decompression, corresponding to a grayscale or color algorithm. Each raster scan line is an integral multiple of 8 bits. If an image occupies an area whose width is other than an integral multiple of 8 bits, the scan line is padded with zeros."
Any information to work with this algorithm would be greatly appreciated!
Most likely, you're making it a bigger thing than it really is. RIDIC is a recording algorithm: it is the format in which the original image data is arranged prior to compression. Only if the compression is set to "No Compression" would you have to deal with data in the recording format. And then, RIDIC is simply an ordering of bit groups that describe each pixel. E.g. if you had 16-level grayscale, RIDIC encodes each pixel in left-to-right, top-down order in a nibble, and pads each scan line with zeros to a whole number of bytes.
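Here is a small Python sketch of that reading of the description: pixels are packed left to right, top to bottom, and each scan line is zero-padded to a whole number of bytes. It is only an illustration, not an implementation of the IPDS specification. In particular, the most-significant-bit-first order within each byte is my assumption, and `pack_rows` is a made-up name:

```python
def pack_rows(rows, bits_per_pixel):
    """Pack rows of pixel values into bytes, padding each row to a byte boundary.

    Assumption: the first pixel of a row goes into the high bits of the
    first byte (MSB first). Padding bits at the end of a row are zero.
    """
    out = bytearray()
    for row in rows:
        acc = 0      # bits collected so far for this row
        nbits = 0    # how many bits in acc are not yet written out
        for value in row:
            if not 0 <= value < (1 << bits_per_pixel):
                raise ValueError("pixel value does not fit in bits_per_pixel")
            acc = (acc << bits_per_pixel) | value
            nbits += bits_per_pixel
            while nbits >= 8:
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)
                acc &= (1 << nbits) - 1   # keep only the unwritten bits
        if nbits:
            out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the row
    return bytes(out)


# 1 bit per pixel, 3 pixels wide: 101 is padded to 10100000.
print(pack_rows([[1, 0, 1]], 1).hex())
# 16-level grayscale (one nibble per pixel), 3 pixels wide.
print(pack_rows([[0xF, 0x1, 0x7]], 4).hex())
```

Decoding an uncompressed stream would be the reverse: read whole bytes per scan line, split them into bit groups, and drop the padding at the end of each line.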
Ok, so I tried to use the imagemagick command:
"convert picA.png -interlace line picB.png"
to make an interlaced version of my .png images. Most of the time, the resulting image is larger than the original one, which is kinda normal. However, for certain images, the resulting image is smaller.
So I just wonder why does that happen? I really don't want my new image to lose any quality because of the command.
Also, are there any compatibility problems with interlaced .png images?
EDIT: I guess my problem is that the original image was not compressed as best as it could be.
The following only applies to the cases where the pixel size is >= 8 bits. I didn't investigate for other cases but I expect similar outcomes.
A content-identical interlaced PNG image file will almost always be larger because of the additional filter-type bytes required for the scanlines of each pass. This is what I explained in detail in this web page, based on the PNG RFC (RFC 2083).
In short, this is because the sum of the numbers of filter-type bytes per interlacing pass, listed below, is almost always greater than the image height (which is the number of filter types for a non-interlaced image):
nb_pass1_lines = CEIL(height/8)
nb_pass2_lines = (width>4?CEIL(height/8):0)
nb_pass3_lines = CEIL((height-4)/8)
nb_pass4_lines = (width>2?CEIL(height/4):0)
nb_pass5_lines = CEIL((height-2)/4)
nb_pass6_lines = (width>1?CEIL(height/2):0)
nb_pass7_lines = FLOOR(height/2)
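To check these counts for a given image size, here is a short Python sketch. Instead of the closed-form expressions it derives each count from the Adam7 starting row, starting column and row step of every pass, which should give the same numbers; `ADAM7` and `pass_line_counts` are names chosen for this illustration:

```python
# (start_row, start_col, row_step, col_step) for Adam7 passes 1..7
ADAM7 = [
    (0, 0, 8, 8),
    (0, 4, 8, 8),
    (4, 0, 8, 4),
    (0, 2, 4, 4),
    (2, 0, 4, 2),
    (0, 1, 2, 2),
    (1, 0, 2, 1),
]


def pass_line_counts(width: int, height: int) -> list[int]:
    """Number of scanlines (one filter-type byte each) in every pass."""
    counts = []
    for start_row, start_col, row_step, _col_step in ADAM7:
        if width > start_col:
            counts.append(len(range(start_row, height, row_step)))
        else:
            counts.append(0)  # the pass has no pixels, so no scanlines
    return counts


for width, height in ((8, 8), (1, 1), (1, 16)):
    counts = pass_line_counts(width, height)
    print(f"{width}x{height}: passes {counts}, "
          f"interlaced {sum(counts)} vs non-interlaced {height}")
```

For an 8x8 image this gives 15 filter-type bytes against 8 without interlacing, while for images one pixel wide the two totals come out equal.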
Theoretically, though, the Adam7 interlacing may accidentally lower the data entropy/complexity enough that, with the help of filtering, the additional space usually needed for filter types with interlacing is compensated by the deflate compression used for the PNG format. This would be a particular case to be proven, as the entropy/complexity is more likely to increase with interlacing because the interlacing deconstruction makes the image data less consistent.
I used the word "accidentally" because reducing the data entropy/complexity is not the purpose of the Adam7 interlacing. Its purpose is to allow progressive loading and display of the image through a pass mechanism, whereas reducing the entropy/complexity is the purpose of filtering in PNG.
I used the word "usually" because, as shown in the explanation web page, for example, a 1 pixel image will be described through the same length of uncompressed data whether interlaced or not. So, in this case, no additional space should be needed.
When it comes to the PNG file size, a lower size for interlaced can be due to:
Different non-pixel-encoding-related content embedded in the file, such as a palette (in the case of color type != 3) and non-critical chunks such as chromaticities, gamma, number of significant bits, default background color, histogram, transparency, physical pixel dimensions, time, text, compressed text. Note that some of this non-pixel-encoding-related content can lead to a different display of the image depending on the software used and the situation.
Different pixel-encoding-related content (which can change the image quality), such as bit depth, color type (and thus whether a palette is used, with color type = 3), image size, etc.
Different compression related content such as better filtering choices, accidental lower data entropy/complexity due to interlacing as explained above (theoretical particular case), higher compression level (as you mentioned)
If I had to check whether 2 PNG image files are equivalent pixel wise, I would use the following command in a bash prompt:
diff <( convert non-interlaced.png rgba:- ) <( convert interlaced.png rgba:- )
It should return no difference.
For the compatibility question, if the PNG encoder and PNG decoder implement the mandatory aspects of the PNG RFC, I see no reason for the interlacing to lead to a compatibility issue.
Edit 2018 11 13:
Some experiments based on auto evolved distributed genetic algorithms with niche mechanism (hosted on https://en.oga.jod.li ) are explained here:
https://jod.li/2018/11/13/can-an-interlaced-png-image-be-smaller-than-the-equivalent-non-interlaced-image/
Those experiments show that it is possible for equivalent PNG images to have a smaller size interlaced than non-interlaced. The best images for this are tall, one pixel wide, and have pixel content that appears random. However, the shape is not the only important aspect for the interlaced image to be smaller than the non-interlaced image, as random cases with the same shape lead to different size differences.
So, yes, some PNG images can be identical pixel wise and for non-pixel related content but have a smaller size interlaced than non-interlaced.
So I just wonder why does that happen?
From section Interlacing and pass extraction of the PNG spec.
Scanlines that do not completely fill an integral number of bytes are padded as defined in 7.2: Scanlines.
NOTE If the reference image contains fewer than five columns or fewer than five rows, some passes will be empty.
I would assume the behavior you're experiencing is the result of the Adam7 method requiring additional padding.
I tried to implement steganography with the following steps :
1. Converted image to buffered image
2. Converted buffered image to Bytes array
3. Made modifications in the byte array
4. Converted byte array back to buffered image
5. Saved it as a jpg file
The problem arose when I read the saved file again, converted it to a byte array and found that the byte array is different from what I obtained after step 3 (although the differences were small: 6 became 7, 9 became 8 and so on).
I really have no idea why this happened.
1. If you save as a JPEG, the RGB data gets converted to YCbCr. Those two color spaces have different gamuts so values get clamped.
2. JPEG data may be subsampled, causing data to be changed. You can avoid these changes by not subsampling.
3. The JPEG DCT may introduce small errors (limited to +/-1 if implemented correctly)
4. Quantization will make rather large changes to the data. You can avoid changes at this step by having all 1s in your quantization tables.
No matter what you do, #1 and #3 can introduce changes in the JPEG compression process.
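The color conversion effect alone can be seen without any JPEG library. This Python sketch assumes the full-range conversion formulas commonly used with JFIF files and rounds every component to an 8-bit integer, as happens when the values are stored; the function names are made up for this illustration, and it leaves out subsampling, DCT and quantization entirely:

```python
def clamp8(x: float) -> int:
    """Round to the nearest integer and clamp into the 0..255 range."""
    return max(0, min(255, round(x)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def roundtrip(rgb: tuple[int, int, int]) -> tuple[int, int, int]:
    """RGB -> YCbCr -> RGB, with 8-bit rounding at both steps."""
    return ycbcr_to_rgb(*rgb_to_ycbcr(*rgb))


for rgb in ((128, 128, 128), (255, 0, 0), (6, 9, 200)):
    print(rgb, "->", roundtrip(rgb))
```

Some pixels survive the round trip and some come back off by a small amount, which is the kind of change that destroys data hidden in the low bits.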
JPG is a lossy image format, so you can't expect it to hold the data exactly after it is saved. It is especially unsuited for steganography, as it will destroy the small details required for this use, even when using the highest quality setting.
Solution is to use a lossless format, like PNG.
A BufferedImage may already be backed by a byte array. If you create your BufferedImage with the type TYPE_BYTE_GRAY, TYPE_3BYTE_BGR or TYPE_4BYTE_ABGR, then its data is already stored in a byte array. To access the byte array, you do: byte[] buffer = ((DataBufferByte) myImage.getRaster().getDataBuffer()).getData();
And when you write an image as a JPEG, you compress it with loss. So the information you save is altered and cannot be retrieved as it was before. You should use PNG/TIFF/BMP, PNG being the most common.