I have a few JPG images that seem to be corrupt - yet the program dealing with them has no problems at all. I need to convert them to a new database - using C# or Delphi to do it.
The images are stored in a DB (which I can then save to file if I need to) - and the image has the following starting text in the header....
(screenshot of the bad image header omitted)
When it should be something like
(screenshot of a good JPG header omitted)
Note that the image has the text LEAD Technologies V1.01 in its header. I have contacted the company and they are currently on version 20.x - the format is so old that even their latest tools will not read this image properly.
Has anyone out there had to deal with this issue in the past? If so - any thoughts as to how to deal with this one?
It looks as if the image is corrupted - but as I noted, the original program can still use it as an image file...
As requested - Full Image to review
Full Image Download
I have been trying to analyse your files and see if I can work out how they are corrupted.
Normally, JPEG files have well-known markers in them, which consist of two bytes - namely 0xFF followed by a second byte that is not 0x00.
If you scan a normal JPEG file for markers, like this:
xxd -c16 -g1 -u normal.jpg | ggrep --color=always "FF [1-9A-F][1-9A-F]"
you will get a hex dump with the markers highlighted (output screenshot omitted), and you can see:
SOI (start of image) - 0xFFD8
DQT (define quantization table) - 0xFFDB
DHT (define Huffman table) - 0xFFC4
SOS (start of scan) - 0xFFDA
EOI (end of image) - 0xFFD9
If, on the other hand, you scan your images, you just get pages of junk - sadly I cannot work out what the pattern is. If anyone else can, I can probably remove it - so ping me with a comment if you can!
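For anyone who wants to run the same scan programmatically (the asker mentioned C# or Delphi; this untested Python sketch of the same idea should be straightforward to port):
import sys

# Names of the standard markers listed above.
NAMES = {0xD8: "SOI", 0xDB: "DQT", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI"}

data = open(sys.argv[1], "rb").read()
for i in range(len(data) - 1):
    # A marker is 0xFF followed by a byte that is not 0x00;
    # 0xFF 0xFF is just fill/padding, so skip that as well.
    if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
        print("%08X: FF %02X %s" % (i, data[i + 1], NAMES.get(data[i + 1], "")))
A normal JPEG should print a handful of markers starting with SOI and ending with EOI; a file like yours should show the same pages of junk as the xxd scan.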
So I am developing a kind of tamagotchi (virtual pet) for my microprocessors final. I made my own images at 128x64 pixels, since I am using a display with that resolution, so each image takes 1 KB. I am using an AT89S52 (8052) microcontroller and it doesn't have enough memory to store all the animations I want. My plan (and I kind of want to keep it that way) is to use an EPROM to store all my images in Intel HEX format (the programmer I am using is SUPERPRO and it imports that type of file). Of course the assembly code will be easy for me once the data is in the ROM.
I am not a good enough programmer to write code that does exactly what I want (convert images to Intel HEX), and all the software I have tried doesn't generate the files correctly (it inserts hex values that aren't supposed to be there; for example, in a blank area there should be only zeroes, and yet there is another value). I have tried PNG with a transparent background, PNG with a white background, and JPG. The images I have are like these:
http://imgur.com/a/yiCOb
(it seems I am not allowed to post images here)
I don't see much help elsewhere on the internet, so an answer to this question would be of great help to future MCU-based programmers. Thank you.
It's about 30 years since I last made an EPROM :-)
Anyway, you need 2 things...
Part One
Firstly, your files are in PNG format, which means they have dates, times, palettes, gamma chunks and a bunch of zlib-compressed data, and you can't just copy that to a screen buffer. So you need to convert the PNGs to a simple binary format where 0 is off and 1 is on and there is nothing else in the file. The easiest way to do that is with ImageMagick, which is installed on most Linux platforms and is available for free on macOS and Windows. Let's say one of your frames is called anim.png and we want to get it into a simple format like PGM (Portable GreyMap - see the Wikipedia description); we can use ImageMagick like this at the console:
convert anim.png -compress none anim.pgm
The first few lines will be:
P2
128 64
255
255 255 255 255 255 255 255 ...
...
...
because the image is 128x64 and the maximum brightness in the file is 255. Then all the data follows in ASCII (because I put -compress none). In there, 255 represents white and 0 represents black.
As that is too big for the screen, here is an image of how it looks (image omitted) - hopefully you can see your black box as a bunch of zeroes in the middle at the bottom.
Now, if you run that same command again, but remove the -compress none, the same header will be produced but the data will follow in binary.
convert anim.png anim.pgm
And we can also use sed to delete the 3 lines of header, sending the PGM to stdout this time (pgm: with no filename writes to stdout):
convert anim.png pgm: | sed '1,3d' > anim.bin
Now you have a binary file of just pure pixels - free of dates/times, author and copyright information, palettes and compressed data - that you can pass to the next part.
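If you would rather do this first part in code than with ImageMagick, here is an untested Python sketch using the Pillow library that goes straight from the PNG to packed 1-bit-per-pixel bytes - 1024 bytes for one 128x64 frame (the filename anim.png is just an example):
from PIL import Image  # pip install Pillow

img = Image.open("anim.png").convert("L")
# Threshold to pure black/white; mode "1" then packs 8 pixels per byte,
# MSB first, and a 128-pixel row packs to exactly 16 bytes.
bilevel = img.point(lambda p: 255 if p >= 128 else 0).convert("1")
data = bilevel.tobytes()
assert len(data) == 128 * 64 // 8  # 1024 bytes per frame
open("anim.bin", "wb").write(data)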
Part Two
Secondly, once you have got your data in a sensible binary format, you need to convert it to Intel HEX, and for that you need srec_cat, which is readily available for Linux and via homebrew on a Mac.
Now, I haven't tested this and have never used it, but I think you will want something like:
srec_cat anim.bin -binary -output -intel
:020000040000FA
:20000000323535203235352032353520323535203235352032353520323535203235352000
:200020003235352032353520323535203235352032353520323535203235352032353520E0
:200040003235352032353520323535203235352032353520323535203235352032353520C0
:200060003235352032353520323535203235352032353520323535203235352032353520A0
:20008000323535203235352032353520323535203235352032353520323535203235352080
...
:207E8000353520323535203235352032353520323535203235352032353520323535203202
:207EA0003535203235352032353520323535203235352032353520323535203235352032E2
:147EC000353520323535203235352032353520323535200A2A
:00000001FF
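If installing srec_cat is a problem, the Intel HEX record format is simple enough to emit yourself. Here is a hedged Python sketch of a minimal writer - each record is a colon, a byte count, a 16-bit address, record type 00, the data, and a two's-complement checksum, finished with the :00000001FF end-of-file record you can see above (for data below 64 KB, the :02000004... extended-address record in the sample is not needed):
def to_intel_hex(data, record_len=32):
    lines = []
    for addr in range(0, len(data), record_len):
        chunk = data[addr:addr + record_len]
        # byte count, address hi/lo, record type 00 (data), then the data
        rec = bytes([len(chunk), (addr >> 8) & 0xFF, addr & 0xFF, 0]) + chunk
        checksum = -sum(rec) & 0xFF  # two's complement of the byte sum
        lines.append(":" + rec.hex().upper() + "%02X" % checksum)
    lines.append(":00000001FF")  # end-of-file record
    return "\n".join(lines)

print(to_intel_hex(open("anim.bin", "rb").read()))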
Summary
You can abbreviate and simplify what I am suggesting above - I will leave the long version there so folks can understand it in future though!
convert YourImage.png gray: | srec_cat - -binary -output -intel
The gray: is a very simple ImageMagick format, equivalent to just the binary part of a PGM file without any header. Like PGM it uses one byte per pixel, so it will be somewhat inefficient for your pure black-and-white needs. You can see that by looking at the file size - the file is 8192 bytes for 128x64 pixels, so 1 byte per pixel. If you really, really want 1 bit per pixel, you could use PBM format like this (note that a binary PBM header has only two lines, since PBM has no maxval line, hence '1,2d'):
convert YourImage.png pbm: | sed '1,2d' | srec_cat - -binary -output -intel
Notes:
From v7 of ImageMagick onwards, you should replace convert by magick so as to avoid clashing with Windows' built-in convert command that converts filesystems to NTFS.
ImageMagick is quite a large package; you could do this equally well with just the NetPBM suite, using the tool called pngtopnm in place of convert.
A long time ago I screwed with my HDD and had to recover all my data, but I couldn't recover the files' names.
I used a tool to sort all these files by extension, and another to sort the JPGs by date, since the date when a JPG was created is stored in the file itself. I can't do this with PNGs, though, unfortunately.
So I have a lot of PNGs, but most of them are just icons or assets used formerly as data by the software I used at that time. But I know there are other, "real" pictures, that are valuable to me, and I would really love to get them back.
I'm looking for any tool, or any way, just anything you can think of, that would help me separate the trash from the good in this bunch of pictures, it would really be amazing of you.
Just so you know, I'm speaking of 230 thousand files, for ~2GB of data.
As an example, this is what I call trash (two sample images omitted here), and all that kind of image.
I'd like these to be separated from pictures of landscapes / people / screenshots - the kind of pictures you could have in your phone's gallery...
Thanks for reading, I hope you'll be able to help!
This simple ImageMagick command will tell you the:
height
width
number of colours
name
of every PNG in the current directory, separated by colons for easy parsing:
convert *.png -format "%h:%w:%k:%f\n" info:
Sample Output
600:450:5435:face.png
600:450:17067:face_sobel_magnitude.png
2074:856:2:lottery.png
450:450:1016:mask.png
450:450:7216:result.png
600:450:5435:scratches.png
800:550:471:spectrum.png
752:714:20851:z.png
If you are on macOS or Linux, you can easily run it under GNU Parallel to get 16 done at a time and you can parse the results easily with awk, but you may be on Windows.
You may want to change the \n at the end for \r\n under Windows if you are planning to parse the output.
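If you would rather have the files sorted into folders than just listed, here is an untested Python sketch using the Pillow library along the same lines - the keep/trash folder names and both thresholds are guesses of mine that you will want to tune on a sample of your own files:
import shutil
from pathlib import Path
from PIL import Image  # pip install Pillow

MIN_SIDE = 256      # guess: icons/assets are usually small
MIN_COLOURS = 1000  # guess: real photos have many distinct colours

Path("keep").mkdir(exist_ok=True)
Path("trash").mkdir(exist_ok=True)

for path in Path(".").glob("*.png"):
    with Image.open(path) as img:
        w, h = img.size
        # getcolors() returns None once the count exceeds maxcolors
        colours = img.convert("RGB").getcolors(maxcolors=MIN_COLOURS)
    photo_like = min(w, h) >= MIN_SIDE and colours is None
    dest = Path("keep" if photo_like else "trash")
    shutil.move(str(path), str(dest / path.name))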
I have approximately 600GB of photos, collected over 13 years, now stored on a FreeBSD ZFS server.
The photos come from family computers, from several partial backups to different external USB HDDs, from images reconstructed after disk disasters, and from different photo manipulation programs (iPhoto, Picasa, HP and many others :( ), in several deep subdirectories - in short: a TERRIBLE MESS with many duplicates.
So first I:
searched the tree for files of the same size (fast) and computed an MD5 checksum for each of those.
collected the duplicated images (same size + same MD5 = duplicate)
This helped a lot, but there are still MANY MANY duplicates:
photos that differ only in the EXIF/IPTC data added by some photo management software, but the image is the same (or at least "looks the same" and has the same dimensions)
or they are only resized versions of the original image
or they are "enhanced" versions of the originals, etc...
Now the questions:
how to find duplicates by checksumming only the "pure image bytes" of a JPG, ignoring the EXIF/IPTC and similar meta information? I want to filter out the photo duplicates that differ only in their EXIF tags but where the image itself is the same. (File checksumming therefore doesn't work, but image checksumming could...) This is (I hope) not very complicated - but I need some direction.
What Perl module can extract the "pure" image data from a JPG file in a form usable for comparison/checksumming?
More complex
how to find "similar" images, what are only the
resized versions of the originals
"enchanced" versions of the originals (from some photo manipulation programs)
is here already any algorithm available in a unix command form or perl module (XS?) what i can use to detect these special "duplicates"?
I'm able to write complex scripts in BASH, and "+-" :) know Perl. I can use FreeBSD/Linux utilities directly on the server, and OS X over the network (but working with 600GB over the LAN is not the fastest way)...
My rough idea:
delete images only at the end of the workflow
use an Image::ExifTool script to collect candidate duplicates based on image-creation date and camera model (maybe other EXIF data too).
make a checksum of the pure image data (or extract the histogram - identical images should have identical histograms) - not sure about this
use some similarity detection to find duplicates produced by resizing and photo enhancement - no idea how to do this...
Any idea, help, or any (software/algorithm) hint on how to bring order to the chaos?
PS:
Here is a nearly identical question: Finding Duplicate image files, but I'm already done with that answer (MD5) and am looking for more precise checksumming and image comparison algorithms.
Assuming you can work with a locally mounted FS:
rmlint: the fastest tool I've ever used to find exact duplicates
findimagedupes: automates the whole ImageMagick approach (much like Randal Schwartz's script, it seems, though I haven't tested that)
Detecting Similar and Identical Images Using Perceptual Hashes goes all the way (a great reference post)
dupeguru-pe (gui): a dedicated tool that is fast and does an excellent job
geeqie (gui): I find it fast/excellent for finishing the job, using its granular deduplication options. You can also generate an ordered collection of images such that similar images are next to each other, allowing you to 'flip' between two to see the changes.
Have you looked at this article by Randal Schwartz? He uses a Perl script with ImageMagick to create resized (4x4 RGB grid) versions of the pictures, which he then compares in order to flag "similar" pictures.
You can remove EXIF data with mogrify -strip from the ImageMagick toolset. So you could, for each image, copy it without EXIF, run md5sum on the copy, and then compare the md5sums.
When it comes to visually similar images - you can, for example, use compare (also from the ImageMagick toolset) to produce a black/white diff map, as described here, then make a histogram of the difference and check whether there is "enough" white for the images to count as different.
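Both ideas can also be done in one short script. Here is an untested Python sketch using the Pillow library (the asker mentioned Perl; the same decode-then-hash approach should port to Image::Magick or Imager). Hashing only the decoded pixels ignores EXIF/IPTC entirely, and the tiny average hash is one of the perceptual-hash ideas for catching resized or lightly "enhanced" copies:
import hashlib
from PIL import Image  # pip install Pillow

def pixel_md5(path):
    # MD5 over the decoded pixels only - EXIF/IPTC never enters the hash.
    with Image.open(path) as img:
        return hashlib.md5(img.convert("RGB").tobytes()).hexdigest()

def average_hash(path, size=8):
    # Shrink to 8x8 greyscale and threshold at the mean brightness;
    # resized or lightly "enhanced" copies usually differ by a few bits.
    with Image.open(path) as img:
        pixels = list(img.convert("L").resize((size, size)).getdata())
    mean = sum(pixels) / len(pixels)
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(a, b):
    # A distance of <= 5 or so is a reasonable "probably the same photo"
    # starting point - tune it on your own collection.
    return bin(a ^ b).count("1")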
I had a similar dilemma - several hundred gigs of photos and videos spread and duplicated over about a dozen drives. I know this may not be the exact way you are looking for, but the FSlint Janitor application (on Ubuntu 16.x, then 18.x) was a lifesaver for me. I took the project in chunks, eventually cleaning it all up and ended up with three complete sets (I wanted two off-site backups).
FSLint Janitor:
sudo apt install fslint
I have several large PDF reports (>500 pages) with grid lines and background shading overlay that I converted from postscript using GhostScript's ps2pdf in a batch process. The PDFs that get created look perfect in the Adobe Reader.
However, when I go to print the PDF from Adobe Reader, I get about 4-5 ppm from our Dell laser printer, with long, 10+ second pauses between each page. The same report PDF generated by another proprietary process (not GhostScript) yields a fast 25+ ppm on the same printer.
The PDF file sizes on both are nearly the same at around 1.5 MB each, but when I print both versions of the PDF to file (i.e. postscript), the GhostScript generated PDF postscript output is about 5 times larger than that of the other (2.7 mil lines vs 675K) or 48 MB vs 9 MB. Looking at the GhostScript output, I see that the background pattern for the grid lines/shading (referenced by "/PatternType1" tag) is defined many thousands of times throughout the file, where it is only defined once in the other PDF output. I believe this constant re-defining of the background pattern is what is bogging down the printer.
Is there a switch/setting to force GhostScript to define a pattern/image only once? I've tried using the -r and -dPDFSETTINGS=/print switches with no relief.
Patterns (and indeed images) and many other constructs should only be emitted once; you don't need to do anything to make this happen.
Forms, however, do not get reused, and it's possible that this is the source of your actual problem. As Kurt Pfeifle says above, it's not possible to tell without seeing a file which causes the problem.
You could raise a bug report at http://bugs.ghostscript.com which will give you the opportunity to attach a file. If you do this, please do NOT attach a 500+ page file; it would be appreciated if you could find the time to create a smaller file which shows the same kind of size inflation.
Without seeing the PostScript file I can't make any suggestions at all.
I've looked at the source PostScript now, and as suspected the problem is indeed the use of a form. This is a comparatively unusual area of PostScript, and it's even more unusual to see it actually being used properly.
Because of its rare usage, we haven't had any impetus to implement preservation of forms in the output PDF, and this is what results in the large PDF. The way the pattern is defined inside the form doesn't help either. You could try defining the pattern separately; at least that way pdfwrite might be able to detect the multiple pattern usage and only emit it once (the pattern contains an imagemask, so this may be worthwhile).
This construction:
GS C20 setpattern 384 151 32 1024 RF GR
GS C20 setpattern 384 1175 32 1024 RF GR
is inefficient: you keep re-instantiating the pattern, which is expensive. This:
GS C20 setpattern
384 151 32 1024 RF
384 1175 32 1024 RF
GR
is more efficient.
In any event, there's nothing you can do with pdfwrite to really reduce this problem.
'[...] when I print both versions of the PDF to file (i.e. postscript), the GhostScript generated PDF postscript output is about 5 times larger than that of the other (2.7 mil lines vs 675K) or 48 MB vs 9 MB.'
Which version of Ghostscript do you use? (Try gs -v or gswin32c.exe -v or gswin64c.exe -v to find out.)
How exactly do you 'print to file' the PDFs? (Which OS platform, which application, which kind of settings?)
Also, ps2pdf may not be your best option for the batch process. It's a small shell/batch script anyway, which internally calls a Ghostscript command.
Using Ghostscript directly will give you much more control over the result (though its commandline 'usability' is rather inconvenient and awkward -- that's why tools like ps2pdf are so popular...).
Lastly, without direct access to one of your PS input samples for testing (as well as the PDF generated by the proprietary converter) it will not be easy to come up with good suggestions.
I'm looking for a method of converting .epsf to .eps for a publication I'm submitting. The submission site requires .eps (even though my understanding is that modern renderers should be able to read .epsf as well - the site is archaic, and I have to upload all 100 images individually). My co-author sent me the zipped files to upload (and now to convert) - I didn't make them myself. Further, the programs that made these images may exist on my co-author's computer, but where is uncertain.
I've tried this in Mathematica 8 with reasonable but not full success - colored files become black and white, and files with duplicate entries (as in, Fig11a.eps and Fig11a.epsf both exist though they are different; it seems that the .eps is the background and the .epsf is the foreground layer) convert incorrectly. My approach was to import the .epsf files into Mathematica and export them as .eps.
Also, I've tried using a middle-man format - e.g. GIF/TIFF/PNG/JPG - with similar results. I haven't been able to find a program that's free (I assume Photoshop could pull this off) that I could use - and I'd like to do it as a batch. A method that requires Python/Mathematica or XP/Linux OSes would be fine. Thanks.
You do not need to convert anything. Encapsulated PostScript files can have either extension (both EPS and EPSF). If your publisher refuses to accept files with an EPSF extension, just rename them to EPS.
Any processing/conversion you do on the files (using GhostScript, Mathematica, etc.) carries the risk of corrupting the graphics in some way. But there's no need to do it. Send them as they are or rename them if you prefer.
(If you have any doubt, you can check the EPS Format Specification from 1992, which says that on the Macintosh the recommended file extension is .epsf, while on DOS it's .EPS.)
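If renaming a hundred files by hand is a chore, here is a minimal Python sketch (assuming the files sit in the current directory; it deliberately skips the Fig11a-style case where both extensions already exist, rather than overwrite anything):
from pathlib import Path

for path in Path(".").glob("*.epsf"):
    target = path.with_suffix(".eps")
    if target.exists():
        print("skipping %s: %s already exists" % (path.name, target.name))
        continue
    path.rename(target)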