I'm trying to figure out the best way to "clean" images coming from an unauthorized source (app visitors) before opening them, similar to what WhatsApp does.
Scanning each image with an antivirus is probably not efficient at large scale, so I came to the assumption that rewriting each incoming image by recompressing it as JPEG could result in a clean image without malicious code inside it.
From what I've read so far, JPEG compression should destroy any hidden content and reorder the data structure of the image, which should result in a safe image.
What do you think? Am I on the right path to overcome this issue?
There is no code in a JPEG stream. In fact, I don't know of any image format that directs the decoder to execute code.
The worst thing I can think of would be a JPEG stream that, say, embedded malicious code in a thumbnail, COM, or APPn marker. Then another application would look for that image and load the code.
Even this requires something else to get onto your system to execute the JPEG "code", and it would be a lot of trouble for something that could be accomplished much more easily.
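That said, if you still want to normalise uploads the way the question suggests, re-encoding is straightforward. Here is a minimal sketch using Pillow (the library choice and file names are my own assumptions, not part of the question); because only the decoded pixels are saved, markers such as EXIF, COM, or embedded thumbnails are simply dropped:

from PIL import Image

def reencode_upload(src_path, dst_path, quality=85):
    # Decode the untrusted file, then write a brand-new JPEG from the raw pixels.
    # Metadata (EXIF, comments, thumbnails) is not copied, so anything hidden there is gone.
    with Image.open(src_path) as img:
        img.convert("RGB").save(dst_path, "JPEG", quality=quality)

reencode_upload("incoming_upload", "cleaned.jpg")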
Related
I encountered a website where the images consistently throw an 'invalid jpeg marker' error when downloaded. I am wondering: is it possible that they are intentionally doing something that causes this error for most of the users who try to download and use their images?
I want to protect the JPEG resources of my website from unauthorised use. Is it possible to change something in the JPEG header or meta tags so that the images display fine in a browser, but if someone downloads one for their own use it throws an 'invalid jpeg marker' error?
(I don't intend to discuss alternative ways of protecting images online or the limitations of it.)
If it can display in a browser, it can display in something else. What is likely happening is that the decoder you are using to view the downloaded files is stricter than your web browser. I presume you are not using your browser to view them after downloading.
You could run the "file" command to make sure the images are actually JPEGs. If so, there are a number of programs available to analyze the structure of a JPEG stream. You could use one to see what's going on with the image. You may have some odd marker ordering, or possibly the JPEG image does not occur right at the start of the file stream. I've seen some weird JPEGs where there were extra bytes after the APPn marker and before the next JPEG marker. Some decoders ignored the extra bytes; others puked.
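If you want to automate the first sanity check instead of eyeballing the output of "file", a small sketch in Python (the file name is a placeholder) can verify that the stream at least starts with the JPEG SOI marker and ends with the EOI marker; it will not catch odd marker ordering inside the stream, though:

def looks_like_jpeg(path):
    # A JPEG stream must start with SOI (FF D8) and normally ends with EOI (FF D9).
    with open(path, "rb") as f:
        data = f.read()
    return data[:2] == b"\xff\xd8" and data.rstrip(b"\x00").endswith(b"\xff\xd9")

print(looks_like_jpeg("downloaded.jpg"))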
Browsers can render progressive images progressively.
And images can only be progressively decoded if they were progressively encoded.
e.g., GIF or PNG images saved with the "interlaced" option, or JPEG images saved with the "progressive" option.
I want to render progressive images in my MFC-based application just like a browser does.
Windows Imaging Component provides the IWICProgressiveLevelControl interface to decode images progressively.
But I can't find any example showing how to stream and display an image progressively at the same time using IWICProgressiveLevelControl.
Any advice would be appreciated. Thanks.
There's a good sample here:
https://code.msdn.microsoft.com/Windows-Imaging-Component-3af3cd49
Once you've used IWICProgressiveLevelControl::SetCurrentLevel to select the scan, the decoder will behave normally but only use the scans up to and including the one you selected. So any call to CopyPixels or any IWICBitmapSource components in your chain will receive the fully decoded image at the selected scan level.
The trick, as demonstrated in the sample, is that you can't use IWICProgressiveLevelControl::GetLevelCount and select the max level immediately if you don't know the complete file is available. As the documentation for the sample states,
IWICProgressiveLevelControl allows you to control which progressive level of detail to use on the frame decode. It also allows you to query the total number of progressive levels in the file; however it is not recommended to use this method on JPEG images because the total count is not known until the entire image has been downloaded, defeating the purpose of progressive decode. Instead, this sample demonstrates the recommended practice of iteratively requesting increasing levels of detail until WIC returns WINCODEC_ERR_INVALIDPROGRESSIVELEVEL.
Sometimes I see a lower-resolution version of an image load before the full image appears. This is not done in jQuery, as it also happens when you load the image stand-alone. I have no idea how they do this, but I guess it's something server-side.
My question is, how would I do this on my own server?
The images are stored in "progressive" or "interlaced" JPEG format, meaning that the low-res image data is stored before the high-res data (in layman's terms). You can encode any JPEG as interlaced. You can even do it on the server using imagemagick.
Here's Jeff Atwood talking about this.
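If you would rather do the conversion in code than shell out to ImageMagick, a minimal sketch with Pillow (my choice of library; file names are placeholders) re-saves an existing JPEG as a progressive one:

from PIL import Image

# Re-save a baseline JPEG as a progressive JPEG so browsers can render it in passes.
with Image.open("photo.jpg") as img:
    img.save("photo_progressive.jpg", "JPEG", progressive=True, quality=85)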
I use Abbyy FineReader for ScanSnap to OCR a couple of scanned PDF files. The software claims it retains the original PDF images. The PDF file sizes pre-OCR and post-OCR are almost identical, which is good.
After the software is done, all PDF images appear anti-aliased in Acrobat X. Page navigation is much slower than before, and when I zoom in/out, the images first go to what looks like the pre-anti-aliasing version before quickly changing to anti-aliased images.
Left: Scanned PDF / Right: after OCR with Abbyy
I would like to get the original images without anti-aliasing back. Interestingly, when I open a single page from the anti-aliased PDF in Photoshop, there is no anti-aliasing and the image looks like the left one.
My limited PDF programming experience leads me to believe that Abbyy likely sets some kind of anti-alias flag for each image during OCR processing. How do I un-set this flag?
Any pointers to useful ideas would be much appreciated.
After the software is done, all PDF images appear anti-aliased in Acrobat X. Page navigation is much slower than before, and when I zoom in/out, the images first go to what looks like the pre-anti-aliasing version before quickly changing to anti-aliased images.
Actually, the original file 2013_11_15_22_51_31.pdf contains a JPEG image, while the OCR'ed file 2013_11_15_22_51_31_OCR.pdf contains a JPEG2000 image.
Comparing them in third-party viewers, it becomes clear that the image in the OCR'ed file is not inherently anti-aliased. Furthermore, there is no evident flag in the PDF instructing PDF viewers to apply anti-aliasing to the JPEG2000 image. Thus, Adobe Reader seems to automatically render JPEG and JPEG2000 images differently, applying anti-aliasing to the latter but not to the former.
Comparing both images in detail, though, it becomes clear that these images are not identical but instead the image in the OCR'ed PDF is slightly rotated.
I assume Abbyy FineReader recognized that the original scanned image is not correctly oriented. Thus, it rotated it slightly to correct this orientation.
Thus, replacing the image in the OCR'ed version with the one from the original is not an option: due to the rotation, the OCR information would be partially off.
What you might want to try is to recode the JPEG2000 image to JPEG and replace the image in the OCR'ed version with this recoded one. This will mean some loss of quality but most likely you can get rid of the anti-aliasing this way.
Be aware, though, that the JPEG2000 image is slightly larger than the JPEG image to accommodate the rotation.
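As a rough sketch of the recoding step only (assuming the JPEG2000 stream has already been extracted to a file, and that your Pillow build includes JPEG2000 support via OpenJPEG; putting the recoded image back into the PDF still requires a PDF library or tool):

from PIL import Image

# Recode the extracted JPEG2000 image as a plain JPEG; expect some generational quality loss.
with Image.open("extracted_image.jp2") as img:
    img.convert("RGB").save("recoded_image.jpg", "JPEG", quality=90)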
PS: As @VadimR pointed out, there is indeed an /Interpolate true entry in the image dictionary of the OCR'ed version, which I missed when looking at the file. This does not seem to be the major issue slowing down the rendering, though.
There is an /Interpolate true entry in the image dictionary of the OCR'ed version, and that's what causes the 'anti-aliasing'. Whether that (and not JPEG2000 compression instead of JPEG) is also the cause of the slow-down is something you can check on large enough files.
To un-set this key, the best option would be to turn it off while creating the file; if that's not possible, write and run a small program in a suitable language.
But since your file doesn't sport 'compressed objects' and the offending key is in plain view inside the file, in the spirit of 'job done quickly' you can simply process your file, e.g. like this:
perl -M-encoding -0777pe "s!/Interpolate true!' 'x17!ge" <in.pdf >out.pdf
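If you prefer not to rely on a byte-level substitution, a small sketch using the pikepdf library (my own suggestion, not something this answer depends on) should drop the key cleanly:

import pikepdf

# Remove the /Interpolate entry from every image XObject in the document.
with pikepdf.open("in.pdf") as pdf:
    for page in pdf.pages:
        for name, image in page.images.items():
            try:
                del image.Interpolate   # raises if the key is absent
            except (AttributeError, KeyError):
                pass                    # this image had no /Interpolate entry
    pdf.save("out.pdf")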
I have a requirement wherein I have to determine whether a photo is corrupted and, accordingly, tag it as such.
Another thing I need is to determine whether an image has the wrong extension. What I mean by wrong extension is that sometimes I come across a photo that has a jpg extension, but when I load it into IrfanView it reports that the photo is in a different format than the extension suggests.
How can I do this in Delphi?
I have a requirement wherein I have to determine whether a photo is corrupted and, accordingly, tag it as such.
You can try some things, but with certain file formats (example: BMP, JPEG to some extent) only a human can ultimately decide if the file is OK or corrupted. The simplest test is to simply load the file into a corresponding object (TJpegImage, TPngObject, etc). If you get an exception while loading you've surely got a corrupted file. Unfortunately if no exception is raised you can't really say the file is not corrupted. I've seen corrupted JPEG files that load just fine into a Delphi TImage and can be opened with Windows's Image Viewer, but are obviously corrupted to a human observer. With BMP images it's even clearer: open up a bitmap, overwrite some bytes in the middle of the file and then open it in a viewer. How can any automated system tell those wrongly colored bits in the middle of the bitmap are actually wrong?
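The same load-and-catch idea, sketched in Python with Pillow rather than Delphi just to illustrate (Pillow's verify() does a lightweight structural check and, for the reasons above, still won't catch every visually corrupted file):

from PIL import Image

def probably_intact(path):
    try:
        with Image.open(path) as img:
            img.verify()   # parses the file structure without fully decoding the pixels
        return True
    except Exception:
        return False       # could not even be parsed: corrupted or not an image at all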
Another thing I need is to determine whether an image has the wrong extension. What I mean by wrong extension is that sometimes I come across a photo that has a jpg extension, but when I load it into IrfanView it reports that the photo is in a different format than the extension suggests.
How about doing some of the same: try to load the file into the object that corresponds to its extension, and if that fails, try opening it as some other formats? This should be easy.
Alternatively you can investigate image headers: most file formats start with a short signature, a few bytes. You can look up the documentation of all image file formats and find the signatures, or you can simply open up a large number of files and look for a pattern in the first 4 bytes. I'd go for this second alternative, since finding proper documentation for all image file formats might be a challenge.
The only way to check whether a file is corrupted is to try reading it as described by the file format, i.e. load a BMP as a BMP by reading the BMP header, the BMP data, etc. There are many web pages that describe graphics file formats. Of course, if you transmit files and are afraid they will get corrupted in transit, then store them together with a checksum such as CRC32, or even a cryptographic hash like MD5 or SHA1. After transmission, check that the calculated sum matches the original.
In Delphi there is the jpeg unit with the types TJPEGImage and TBitmap. Try loading the data and check for an exception. For other formats there are many libraries; just look for the file formats you need.
To check whether the file extension is right, try reading the first few bytes of the file and checking them against a dictionary of graphics file headers. For example, GIF files start with GIF, BMP files start with BM, and in a JPEG header you will typically find JFIF. I think the Unix utility file works this way.
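To illustrate the signature lookup in a compact way (Python here rather than Delphi, and only the most common signatures; the file name is a placeholder):

# Map the first bytes of a file to a likely image format.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",          # JPEG SOI marker plus the start of the first segment
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}

def sniff_format(path):
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, fmt in SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    return None

print(sniff_format("photo.jpg"))      # compare the result against the file extension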
Since you used the term "requirement", I suspect that you're doing a job for someone, possibly as a contract. So make sure that you nail the requirements before worrying about the code.
IMO, you need to get samples of test cases. As others mentioned, failure to load the file as a particular format will be one test. But what about a .jpg that loads ok, but the bottom third is missing? Or a .jpg that loads ok but has green "static" lines in the middle where an error occurred upstream somewhere (on the camera, photoshop, whatever) but then the processing recovered and resumed? In this case, the .jpg may really have green lines in it. Is that considered "corrupt" or not? This is where you need to be careful, especially if it's a contract job.
I have handled this situation by reading the suspicious image and trying to get its shape. The task is done within a try-except block. The following is the code:
import cv2

# cv2.imread returns None when the file is missing or cannot be decoded as an image
image = cv2.imread('./image.jpg')
try:
    dummy = image.shape  # raises AttributeError when image is None
except AttributeError:
    print("[INFO] Image is not available or corrupted.")
This approach should cover all your needs, such as:
Detecting a corrupted image
Detecting a non-image file with an image-type extension
Detecting a missing image, etc.