Decode JPEG image stripped from inside a PDFs file - image

I have code that decompresses jpgs into bit maps which works fine for JPEG files, however when I feed the code a JPEG I have stripped directly from a PDFs XObject I get errors.
Adobe reader displays the image fine so I don't believe it's corrupted. I have read through JPEG and PDFs documentation and don't find any obvious problems.
My question is this, is there anything different in the "JPEG" embedded inside a PDFs stream and a normal JPEG? And if so what is it?
Note: I can manually open the PDFs, copy the image, paste into paint, and save...when I do this everything works....my problem is I need this automated.
When my code parses the PDFs, strips out the image stream, dumps the binary to a file, and then I try and open this file, it does not work. What am I missing?
My errors seem to be occurring in the Huffman decoding process, the cdt and Huffman tables appear to be read in fine.

Pardon my using the answer section but I overflowed the comment section:
My questions:
1. What code is failing to decode the JPEG? You say you "have code" but where did that come from? Why do you think that it is reliable?
What is the file format of the JPEG stream? JFIF, ADOBE, EXIF, none specified?
Could there be something in the file format that your decoder cannot handle? Does your encoder check for different types of APPn markers?
What is the JPEG format? What type of SOS marker?
Does this encoder source handle all the normally formats? Baseline, Extended, Sequential, progressive? If you have progressive JPEG and and encoder that only does baseline, you are going to have a problem.
How many components does the JPEG stream have?
Some Adobe files have 4 components and decoders may only be able to handle 1 or 3.

Related

Tiled Tiff decoding error by 3rd party readers

I am trying to write tiled tiff images with libTiff in c++. I have no problem in encoding and decoding the image in my program, but 3rd party reader softwares shows they have problem decoding. For example:
In windows, my tiff cannot be previewed/opened by windows photo viewer;
With QGIS, most of the time when I zoom-in/out, it warns "TIFFReadEncodedTile() failed" yet sometimes it handles the same area and level with no problem reported.
For debugging, I wrote the same image in png, let libvips convert it to tiled tiff and no reader complaints about that one. I read the tiff6.0 specification and wrote a program to check the IFDs and they are identical. The two files are Here.
What have I done wrong? Is there any generic image format checking tools that I may use to solve these issues?
Much appreciated.

images with invalid jpeg marker

I encountered a website where the images consistently throws 'invalid jpeg marker' error when downloaded. I am wondering is it possible that they are intentionally doing something which causes this error for most of the users who try to download and use their images?
I want to protect the jpeg resources of my website from unauthorised use. Is it possible to really change something in jpeg header or meta tags so that jpg images display fine on browser but if someone downloads it for their own use it throws an error 'invalid jpeg marker'?
(I don't intend to discuss alternative ways of protecting images online or the limitations of it.)
If it can display on a browser, it can display on something else. What is likely happening is that the decoder you are using to view when downloaded is more strict than your web browser. I presume you are not using your browser to view them after downloading.
You could run the "file" command to make sure the images are actually JPEGs. If so, there are a number of programs available to analyze the structure of a JPEG stream. You could use one to see what's going on with the image. You may have some odd marker ordering or possibly the JPEG image does not occur right at the start of the file stream. I've seen some weird JPEGs where there were extra bytes after the APPn marker and before the next JPEG marker. Some decoders ignored the extra bytes others puked.

Matlab image conversion jp2 to jpg

I have a weird problem in matlab. I have code that takes in a directory of jp2 files and converts all of them to either tiff, png, or a jpg file. Then it puts these files in a new directory. The user can specify how big they want the file to be in terms of how many pixels are used (EX: 1:3:end is every three pixels). This code works perfectly for the png and tiff conversions.
With the jpg conversion there is no error whatsoever but when I go to click on the jpeg file in the new folder (which it does go to at least) It says "Windows Photo Viewer can't open this picture because the file appears to be damaged, corrupted, or is too large" I tried opening the pictures in other viewers but it said the same thing. All of the png and tiff pictures opened fine.
Some help would be greatly appreciated, thanks!
Edit: I noticed when I call imshow on the location of the jpeg file it actually does show up in matlab. It still does not show up in any image viewers though

Ghostscript Stamp Image on PDF

Is there any way to stamp or overlap a tiff image on a existing PDF file and output the result using Ghostscript?
I have two PDF which i want to merge in a result PDF with one over the other using ghostscript. I want to know if this can be done and how, or if it may work with one PDF as tiff image on top of the base PDF.
Can ghostscript make this stamp using layers in the PDF?
Thank you for your answers
The pdfwrite device in Ghostscript doesn't really support layers, so you can't use that. Also its unclear why you think layers would help.
TIFF isn't part of PostScript (or PDF), so you can't directly read a TIFF file into GS. I have elsewhere posted a PostScript program which reads TIFF files and renders them for output. You could use that to read a TIFF file.
However, you would have to mess about with either the PDF interpreter or a custom EndPage procedure in order to read and render the TIFF file. And unless you take specific kinds of action, it will be opaque, which may well not be what you want.
The Ghostscript PDF interpreter doesn't really lend itself to this kind of manipulation, have you considered using pdftk instead ?

How to determine if a photo is corrupted?

I have a requirement where in I have to determine whether a photo is corrupted and accordingly tag it as such.
Another thing, I need is to determine if an Image has got wrong extension. What I mean by wrong extension is that sometimes I have come across a photo that has extension of jpg but when I load this photo into IrfanView it reports that the photo is in different format that the extension.
How can I do this in Delphi.
I have a requirement where in I have to determine whether a photo is corrupted and accordingly tag it as such.
You can try some things, but with certain file formats (example: BMP, JPEG to some extent) only a human can ultimately decide if the file is OK or corrupted. The simplest test is to simply load the file into a corresponding object (TJpegImage, TPngObject, etc). If you get an exception while loading you've surely got a corrupted file. Unfortunately if no exception is raised you can't really say the file is not corrupted. I've seen corrupted JPEG files that load just fine into a Delphi TImage and can be opened with Windows's Image Viewer, but are obviously corrupted to a human observer. With BMP images it's even clearer: open up a bitmap, overwrite some bytes in the middle of the file and then open it in a viewer. How can any automated system tell those wrongly colored bits in the middle of the bitmap are actually wrong?
Another thing, I need is to determine if an Image has got wrong extension. What I mean by wrong extension is that sometimes I have come across a photo that has extension of jpg but when I load this photo into IrfanView it reports that the photo is in different format that the extension.
How about doing some of the same, trying to load the file into the object that corresponds to it's extension, and if you fail, try opening up with some other formats? This should be easy.
Alternatively you can investigate image headers: Most file formats start with a short signature, a few bytes. You can look up the documentation of all image file formats and find the signature, or you can simply open up an large number of files and look for a pattern in the first 4 bytes. I'd go for this second alternative since finding proper documentation for all image file formats might be a challenge.
The only way to check if file is corrupted is to try reading it as it is described in file format, ie. load BMP as BMP with reading BMP header, BMP data etc. There are many web pages that describe graphics file formats. Of course if you transmit files and are afraid that it will be corrupted after transmitting then save such files with some sum like CRC32, or even cryptographic MD5 or SHA1. Then after transmitting check if calculated sum is the same as original.
In Delphi there is unit jpeg and types TJPEGImage and TBitmap. Try loading it with data and check exception. For others formats there are many libraries, just look for required file formats.
To check if file extension is good try reading some first bytes of file and check it with some dictionary of graphics file headers. For example GIF files should start with GIF, BMP files starts with BM, and in JPEG header you will find JFIF. I think unix utility file works this way.
Since you used the term "requirement", I suspect that you're doing a job for someone, possibly as a contract. So make sure that you nail the requirements before worrying about the code.
IMO, you need to get samples of test cases. As others mentioned, failure to load the file as a particular format will be one test. But what about a .jpg that loads ok, but the bottom third is missing? Or a .jpg that loads ok but has green "static" lines in the middle where an error occurred upstream somewhere (on the camera, photoshop, whatever) but then the processing recovered and resumed? In this case, the .jpg may really have green lines in it. Is that considered "corrupt" or not? This is where you need to be careful, especially if it's a contract job.
I have handled this situation by reading the suspicious image and trying to getting its shape. The task is done within try-except block. Following is the code:
import cv2
image = cv2.imread('./image.jpg')
try:
dummy = image.shape # this line will throw the exception
except:
print("[INFO] Image is not available or corrupted.")
This approach should cover all your needs like:
Detecting a corrupted image
Non-image file with an image-type extension detection
Missing image detection etc.

Resources