Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 7 years ago.
I have a .PS file that I would like to view, but I have not been able to open it despite countless attempts.
Please download the file here https://www.dropbox.com/s/ehnmib05wdhspfc/acsii_kfsh_logo.ps?dl=0
An error that comes up is:
%%[ Error: nocurrentpoint; OffendingCommand: currentpoint ]%%
%%[ Flushing: rest of job (to end-of-file) will be ignored ]%%
%%[ Warning: PostScript error. No PDF file produced. ] %%
From my understanding, that means that there was a problem with the way the PostScript code was generated. The file was sent to me to view the image, but I am not able to view it. Could it be that the error is coming up because it was incorrectly generated? The sender says it is working just fine with them, and that is what really frustrates me because it's not working with me.
Any help will be GREATLY appreciated!
There is something wrong in your workflow, but it's hard to spot what. "The sender says it is working just fine with them" doesn't really mean anything -- what specific software are they using? And since this is clearly an export and not an original file, is it this file that "works for them" or do they mean the original does?
Anyway, the file contains two errors:
Instead of the usual slash for /name notation, this file contains tildes ~:
currentpoint ~y$pos exch def ~x$pos exch def
That is a weird error because it is invalid PostScript, and no regular software can be expected to work with this. This is the cause of the following error that I get:
%%[ Error: undefined; OffendingCommand: ~y$pos ]%%
Somehow you don't see this error, so there must be something else wrong! Perhaps the file was damaged in transferring to your Dropbox (which would be an achievement on its own).
If the tildes are also present in your own copy of the file, fix this by replacing each occurrence of the ~ character with /.
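A minimal sketch of that replacement in Python. It assumes the tildes only stand in for the slash at the start of a name token, so it leaves every other tilde alone; a blanket replace could also damage sequences such as the ~> marker that ends ASCII85-encoded data. The function name fix_name_tildes is made up for illustration.

```python
import re

# Assumption: a tilde that begins a token and is followed by a name
# character is a damaged "/" (literal name marker). Anything else,
# such as the "~>" end-of-data marker of ASCII85, is left untouched.
_TILDE_NAME = re.compile(r"(?<!\S)~(?=[A-Za-z_$])")


def fix_name_tildes(ps_source: str) -> str:
    """Return ps_source with token-initial tildes turned into slashes."""
    return _TILDE_NAME.sub("/", ps_source)
```

Applied to the line quoted above, this turns ~y$pos and ~x$pos into /y$pos and /x$pos. Names that directly follow a brace or bracket without whitespace would be missed, so check the result by eye.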
The file starts with defining x and y coordinates, based on the current point. But this needs additional information: the actual drawing coordinate is not given, and so it must rely on other software to provide the drawing coordinate. Of course neither Adobe Illustrator nor Distiller do this -- they assume the file is self-contained, a reasonable assumption.
This causes the error message
%%[ Error: nocurrentpoint; OffendingCommand: currentpoint ]%%
To fix it you can add the following line at the top:
0 0 moveto
and it will distill properly.
Proper software such as InDesign and Illustrator will still be unable to open the file as image, because it's missing something else: a proper header. The very minimum needed is this, at the very top of the file:
%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 92 87
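The header and the initial moveto can be added in one pass. The following Python sketch only prepends the lines quoted above; the function name make_self_contained is hypothetical, and the default bounding box is the one given for this particular file, so it would need adjusting for any other file.

```python
def make_self_contained(ps_source: str, bbox=(0, 0, 92, 87)) -> str:
    """Prepend an EPS header and an initial moveto to ps_source."""
    llx, lly, urx, ury = bbox
    header = (
        "%!PS-Adobe-3.0 EPSF-3.0\n"
        f"%%BoundingBox: {llx} {lly} {urx} {ury}\n"
    )
    # "0 0 moveto" establishes a current point, so the file's opening
    # "currentpoint" no longer raises nocurrentpoint.
    return header + "0 0 moveto\n" + ps_source
```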
After all this work I found the file doesn't contain any vector information at all! EPS is quite a bad choice to send out bitmaps; TIFF is the industry standard for these, but a PNG or even a lowly BMP file would have done, and then without all of the problems you encountered. Discuss this with your supplier.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a windows .NET application that manages many PDF Files. Some of the files are corrupt.
2 issues: I'll try to explain in my imperfect English...sorry
1.)
How can I detect whether a PDF file is valid?
I want to read the header of the PDF and detect whether it is correct.
var okPDF = PDFCorrect(@"C:\temp\pdfile1.pdf");
2.)
How to know if byte[] (bytearray) of file is PDF file or not.
For example, for ZIP files, you could examine the first four bytes and see if they match the local header signature, i.e. in hex
50 4b 03 04
if (buffer[0] == 0x50 && buffer[1] == 0x4b && buffer[2] == 0x03 &&
buffer[3] == 0x04)
If you are loading it into a long, this is (0x04034b50). by David Pierson
I want the same for PDF files.
byte[] dataPDF = ...
var okPDF = PDFCorrect(dataPDF);
Any sample source code in .NET?
I check Header PDF like this:
// Requires: using System; using System.IO; using System.Text;
public bool IsPDFHeader(string fileName)
{
    byte[] buffer;
    // Dispose the stream and reader as soon as the header has been read.
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var br = new BinaryReader(fs))
    {
        // "%PDF-" is 0x25 0x50 0x44 0x46 0x2D, so five bytes are enough.
        buffer = br.ReadBytes(5);
    }
    // ReadBytes returns fewer bytes when the file is shorter than requested.
    if (buffer.Length < 5)
    {
        return false;
    }
    var header = new ASCIIEncoding().GetString(buffer);
    return header.StartsWith("%PDF-", StringComparison.Ordinal);
}
a. Unfortunately, there is no easy way to determine whether a PDF file is corrupt. Problem files usually have a correct header, so the real causes of corruption lie elsewhere. A PDF file is effectively a dump of PDF objects. The file contains a cross-reference table giving the exact byte offset of each object from the start of the file. So corrupted files most probably have broken offsets, or some object may be missing.
The best way to detect the corrupted file is to use specialized PDF libraries.
There are lots of both free and commercial PDF libraries for .NET. You may simply try to load PDF file with one of such libraries. iTextSharp will be a good choice.
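The point about byte offsets can be illustrated with a small Python sketch. It only checks one thing: that the offset stored after the last startxref keyword lands on a cross-reference section. This is a rough plausibility test, not a validator, and the function name startxref_looks_sane is invented for this example.

```python
import re

_STARTXREF = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def startxref_looks_sane(data: bytes) -> bool:
    """Check that the last startxref offset points at a cross-reference section."""
    matches = list(_STARTXREF.finditer(data))
    if not matches:
        return False
    offset = int(matches[-1].group(1))
    if offset >= len(data):
        return False
    target = data[offset:offset + 32]
    # Classic files have the keyword "xref" here; files that use
    # cross-reference streams have an object header such as "12 0 obj".
    if target.startswith(b"xref"):
        return True
    return re.match(rb"\d+\s+\d+\s+obj", target) is not None
```

A file can pass this test and still be damaged elsewhere, which is why loading it with a real PDF library remains the more reliable check.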
b. According to the PDF reference, the header of a PDF file usually looks like %PDF-1.X (where X is a digit, currently from 0 to 7). 99% of PDF files have such a header. However, Acrobat Viewer accepts some other kinds of headers, and even the absence of a header isn't a real problem for PDF viewers. So you shouldn't treat a file as corrupt just because it does not contain a header.
E.g., the header may appear somewhere within the first 1024 bytes of the file, or be in the form %!PS-Adobe-N.n PDF-M.m
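A tolerant header search along those lines might look like the following Python sketch. It scans the first 1024 bytes for either header form and reports where the marker was found; the function name find_pdf_header and the return shape are assumptions made for this example.

```python
import re

# Matches "%PDF-M.m" as well as "%!PS-Adobe-N.n PDF-M.m".
_HEADER = re.compile(rb"%(?:!PS-Adobe-\d\.\d )?PDF-(\d)\.(\d)")


def find_pdf_header(data: bytes, window: int = 1024):
    """Return (offset, version) of the first header in the window, else None."""
    match = _HEADER.search(data[:window])
    if match is None:
        return None
    version = f"{match.group(1).decode()}.{match.group(2).decode()}"
    return match.start(), version
```

A non-zero offset means the file has leading junk (a BOM, for instance) that a strict first-bytes comparison would reject.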
Just for your information I am a developer of the Docotic PDF library.
Well-behaved PDFs start with the 9 bytes %PDF-1.x plus a newline (where x is in 0..8). 1.x is supposed to give you the version of the PDF file format. The second line contains some binary bytes that help applications (editors) identify the PDF as a non-ASCII-text file type.
However, you cannot trust this tag at all. There are lots of applications out there which use features from PDF-1.7 but claim to be PDF-1.4, and thus mislead some viewers into spitting out spurious error messages. (Most likely these PDFs are the result of a mismanaged conversion of the file from a higher to a lower PDF version.)
There is no such section as a "header" in PDF (maybe the initial 9 Bytes of %PDF-1.x are what you meant with "header"?). There may be embedded a structure for holding metadata inside the PDF, giving you info about Author, CreationDate, ModDate, Title and some other stuff.
My way to reliably check for PDF corruption
There is no other way to check for validity and un-corrupted-ness of a PDF than to render it.
A "cheap" and rather reliable way to check for such validity for me personally is to use Ghostscript.
However, you want this to happen fast and automatically. And you want to use the method programmatically or via a scripted approach to check many PDFs.
Here is the trick:
Don't let Ghostscript render the file to a display or to a real (image) file.
Use Ghostscript's nullpage device instead.
Here's an example commandline:
gswin32c.exe ^
-o nul ^
-sDEVICE=nullpage ^
-r36x36 ^
"c:/path to /input.pdf"
This example is for Windows; on Unix use gs instead of gswin32c.exe and -o /dev/null.
Using -o nul -sDEVICE=nullpage will not output any rendering result. But all the stderr and stdout output of Ghostscript's processing the input.pdf will still appear in your console. -r36x36 sets resolution to 36 dpi to speed up the check.
%errorlevel% (or $? on Linux) will be 0 for an uncorrupted file. It will be non-0 for corrupted files. And any warning or error messages appearing on stdout may help you to identify problems with the input.pdf.
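For checking many PDFs from a script, the same command can be wrapped in a few lines of Python. This is a sketch under the assumption that a Ghostscript executable is on the PATH (gs by default; pass gswin32c.exe or similar on Windows). The function names are invented for this example; the options are the ones from the command line above.

```python
import os
import subprocess


def build_gs_check_command(pdf_path: str, gs_executable: str = "gs") -> list:
    """Build the Ghostscript command line that renders pdf_path to nowhere."""
    return [
        gs_executable,
        "-o", os.devnull,        # "nul" on Windows, "/dev/null" on Unix
        "-sDEVICE=nullpage",     # interpret every page, write nothing
        "-r36x36",               # low resolution keeps the check fast
        pdf_path,
    ]


def pdf_renders_cleanly(pdf_path: str, gs_executable: str = "gs") -> bool:
    """Return True when Ghostscript exits with status 0 for pdf_path."""
    result = subprocess.run(
        build_gs_check_command(pdf_path, gs_executable),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

The captured result.stdout and result.stderr hold the warnings and error messages, which are worth logging for the files that fail.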
There is no other way to check for a PDF file's corruption than to somehow render it...
Update: Meanwhile not only %PDF-1.0, %PDF-1.1, %PDF-1.2, %PDF-1.3, %PDF-1.4, %PDF-1.5, %PDF-1.6, %PDF-1.7 and %PDF-1.8 are valid version indicators, but also %PDF-2.0.
The first line of a PDF file is a header identifying the version of the PDF specification
to which the file conforms %PDF-1.0, %PDF-1.1, %PDF-1.2, %PDF-1.3, %PDF-1.4 etc.
You could check this by reading some bytes from the start of the file and see if you have the header at the beginning for a match as PDF file. See the PDF reference from Adobe for more details.
Don't have a .NET example for you (haven't touched the thing in some years now) but even if I had, I'm not sure you can check for a complete valid content of the file. The header might be OK but the rest of the file might be messed up (as you said yourself, some files are corrupt).
You could use iTextSharp to open and attempt to parse the file (e.g. try and extract text from it) but that's probably overkill. You should also be aware that it's GNU Affero GPL unless you purchase a commercial licence.
Checking the header is tricky. Some of the code above simply won't work, since not all PDFs start with %PDF. Some PDFs that open correctly in a viewer start with a BOM; others start like this:
------------e56a47d13b73819f84d36ee6a94183
Content-Disposition: form-data; name="par"
...etc
So checking for "%PDF" will not work.
What I do is:
1. Validate the extension.
2. Open the PDF file, read the header (first line) and check whether it contains the string "%PDF-".
3. Check whether the file contains a string that specifies the number of pages by searching for multiple "/Page" (a PDF file should always have at least 1 page).
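The three steps above can be sketched in Python as follows. The function name quick_pdf_check is hypothetical. Note the limits of step 3: "/Pages" also contains "/Page", and in files that store their objects in compressed object streams the literal text may not appear at all, so treat this as a rough filter only.

```python
import os


def quick_pdf_check(path: str) -> bool:
    """Apply the three checks: extension, "%PDF-" in the first line, "/Page" present."""
    # Step 1: extension.
    if os.path.splitext(path)[1].lower() != ".pdf":
        return False
    with open(path, "rb") as handle:
        first_line = handle.readline()
        rest = handle.read()
    # Step 2: the first line must contain the header marker.
    if b"%PDF-" not in first_line:
        return False
    # Step 3: at least one "/Page" somewhere in the file.
    return b"/Page" in first_line + rest
```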
As suggested earlier you can also use a library to read the file:
Reading PDF File Using iTextSharp
I have a PostScript file. When I open it with Ghostscript, it shows the output with no error. But when I try to distill it with Adobe Distiller, it stops with the following error:
%%[ Error: undefined; OffendingCommand: show; ErrorInfo: MetricsCount --nostringval-- ]%%
I have shortened the file by removing text from it; now there are only two words in the output.
postscript file
The MetricsCount key is documented in Adobe Tech Note 5012 The Type 42 Font Format Specification. According to the specification it can have 3 possible values, 0, 2 or 4.
Section 5.7 on page 22 of the document:
When a key /MetricsCount is found in a CIDFont with CIDFontType 2, it
must be an integer with values 0, 2, or 4.
To me this suggests that the MetricsCount key/value pair is optional, and as I said other interpreters don't insist on its presence. I can't possibly tell you why Adobe Distiller appears to insist on it, I don't have any experience of the internals of the Distiller PostScript interpreter. I'd have to guess that all Adobe PostScript interpreters have this 'feature' though, presumably your printer is using an Adobe PostScript interpreter.
Simply adding the MetricsCount key does not work. Why didn't you try this yourself instead of asking me ? It would have been quicker....
The error is subtly different, I suspect the answer is that your CIDFont is missing something (or has something present) which is causing Distiller to look for a MetricsCount. I can't see anything obvious in the PostScript information, so perhaps there's something in the sfnts, though that would be surprising.
Interestingly I have in front of me a PostScript file containing a CIDFont which does not have a MetricsCount entry, and which Distiller processes without a complaint.
I can't let you have the file I'm using; it's a customer file. However, the fact that such a file exists indicates that other such files must exist. The one I'm looking at was created by QuarkXPress. I'd suggest that you try to find a working file to compare against. I'd also suggest that you try to make a smaller, simpler CIDFont. One with a single glyph would be favourite, I'd think.
I'm using gs 9.20 to merge some pdf documents into a single document
/usr/bin/gs9/bin/gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dRENDERTTNOTDEF=true -sOutputFile=/docs/merged.pdf
And I'm getting this error and have no idea how to resolve it. Has anyone come across these types of errors?
GPL Ghostscript 9.20: ERROR: Page 5 used undefined glyph 'g2' from
type 3 font 'PDFType3Untitled'
Without seeing the original file it's not possible to be certain, but I would guess from the error that the file calls for a particular glyph in a font (PDFType3Untitled), and that font does not contain that glyph.
The result is that you get an error message (messages from the PDF interpreter which begin with ERROR, as opposed to WARNING, mean that the output is very likely to be incorrect).
You will still get a PDF file, and it may be visually identical with the original because, obviously, the original file didn't have the glyph either.
As for 'resolving' it, you need to fix the original PDF file; that's almost certainly where the problem is.
Please note that you are not 'merging' PDF files as I keep on saying to people, the original file is torn down to graphics primitives, and then a new file built from those primitives. You cannot depend on any constructs in the original file being present in the final file. A truly 'merged' file would preserve that, Ghostscript's pdfwrite device does not.
See here for an explanation.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
I have been around the world twice and a half trying to fix a strange issue with a large sum of images I have.
What I need is to know how to read and write the first 4 and 18th bytes in the header of a JPEG file. If they match certain properties, I need to do some certain work and tweak these bytes to reflect something else. I'm basically fixing the format of these pictures to a more standard format, as Delphi doesn't recognize the format they're in.
So how do I read/write these bytes? What do I use? I know nothing about reading the raw data of an image file, let alone tweaking it.
NOTE
Deleted most of the question as I had put way too much information from the start.
The mechanics of actually changing the JPG header are very easy. That doesn't mean fixing the JPG is easy; it's just that if you know what to change, you can easily do it. Since that's what you're asking, here's how to change the JPG header in memory, load the modified JPG into a TJPEGImage and convert it to a TBitmap.
Please note I don't think this is a good approach. Fixing the resulting bitmap is so much easier! See my fresh answer to your other question for how to fix the bitmap here.
And here's the code you asked for (how to change the JPG header):
// Needs Classes, SysUtils, Graphics and jpeg in the uses clause.
function LoadJpegIntoBitmap_HeaderFix(const FileName: string): TBitmap;
var
  M: TMemoryStream;
  P: PByteArray;
  Jpg: TJPEGImage;
begin
  M := TMemoryStream.Create;
  try
    M.LoadFromFile(FileName);
    if M.Size < 18 then raise Exception.Create('File too short.');
    P := M.Memory;
    // Here you can manipulate the header of the JPG file any way you want.
    // The next line is only a placeholder showing how bytes are read and written.
    if P[0] = $07 then P[3] := P[17] - P[0]*3;
    // Now load the modified JPG into a TJPEGImage
    Jpg := TJPEGImage.Create;
    try
      Jpg.LoadFromStream(M);
      // Convert to bitmap; the caller owns (and must free) the result
      Result := TBitmap.Create;
      Result.Assign(Jpg);
    finally
      Jpg.Free;
    end;
  finally
    M.Free;
  end;
end;
Don't even think you can tweak manually the first x bytes of your jpeg files.
As it has been explained already, the jpeg format is multiform and very complex. So much that some commonly used libraries do not handle all the acceptable formats. Open a libjpeg.h and see the bunch of header structures you can find in a jpeg file.
If you want to modify your header without changing the data, you still have to open the jpeg with a library that will hand over the proper structures for you to modify the relevant flags and save it back with the wanted format.
Either find a library written in Delphi or with a Delphi interface and use it to build your batch converter or to transform the format on the fly when opening a jpeg in your application.
If you know the exact combination of properties/flags defining your problem files, you can tweak the jpeg.pas unit to suit your need as I've shown in the previous answer.
Here's some pseudo code really quick. I'll improve it later. I have to run.
Assign the Adobe JPEG image file to a TFileStream
Read first 18 bytes and use CompareMem to see if signature matches Adobe RGB JPEG
If RGB JPEG then
    We'll either:
        Load from stream and tweak the header as we load
    OR
        Load from stream, copy to TBitmap, and use ScanLine to fix RGB
Else
    Load from stream, normally
Hint, see LoadFromStream() instead of LoadFromFile().
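The compare-then-patch part of that pseudo code can be sketched in Python as below. The thread never states the actual signature or the replacement byte values, so both are parameters here and every concrete value a caller passes is the caller's assumption. The function names are invented for this example.

```python
def needs_header_fix(data: bytes, signature: bytes) -> bool:
    """True when data starts with signature (the CompareMem step)."""
    return len(data) >= len(signature) and data[:len(signature)] == signature


def patch_bytes(data: bytes, patches: dict) -> bytes:
    """Return a copy of data with {offset: new_byte_value} patches applied."""
    patched = bytearray(data)
    for offset, value in patches.items():
        patched[offset] = value
    return bytes(patched)
```

The patched bytes can then be handed to whatever decoder is in use, which mirrors loading from a stream instead of from the file.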
I don't know what you're doing with the image afterward, so there may be some more work to be done.
Right, so your options here are (since we've completely ruled out any 3rd party code whatsoever):
Some Delphi guru (which I am certainly not) coming out of the woodwork and illuminating us with the existence of standard Delphi library code which handles this, or,
You write your own JPEG metadata decoding library.
Since I can't help you with the first option, here are some references you'll need while pursuing the second:
Wikipedia's JPEG article - one of the better written articles on the wiki. Great introductory material to this hairy subject.
ITU T.81 - The original JPEG specification. The most important bit is in Annex B, but this actually describes a format known as JIF which isn't actually in use. However, both JFIF and EXIF are based on it.
ICC - This specifies ICC (International Color Consortium) profiles, which are the bit that seems to be wrong with your headers.
ICC 2010 - New version of above specification, with Errata.
JFIF - The format most JPEG files actually use.
EXIF - Another format JPEG files use, especially those originating from digital cameras.
ITU T.84 - Less useful, this describes JPEG extensions. I'm including it here for completeness.
Sadly, none of the public domain JPEG implementations that I'm aware of (in any language) actually handle the relevant bits of the standards, namely EXIF and ICC profiles, so I can't give you any code you could draw inspiration from. The best place to look would probably be libexif, but that's GPL'ed.
What you're going to want to do is to read in the JPEG file as a regular file, parse the input according to information gleaned from the above documents, make the necessary changes, and write it back out. I don't think this is going to be easy, but apparently this is the only solution you'll accept.
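As a starting point for that parsing, here is a small Python sketch that walks the marker segments at the start of a JPEG file, following the segment layout described in Annex B of T.81: a 0xFF byte, a marker byte, then a two-byte big-endian length that includes itself. It is deliberately simplified (it stops at the start-of-scan marker and does not handle fill bytes or markers without a length field), and the function name list_jpeg_segments is made up for this example.

```python
import struct


def list_jpeg_segments(data: bytes):
    """Return [(marker, payload_offset, payload_length)] for the header segments."""
    if data[:2] != b"\xff\xd8":          # SOI, start of image
        raise ValueError("not a JPEG: missing SOI marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:               # SOS: entropy-coded data follows
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, pos + 4, length - 2))
        pos += 2 + length
    return segments
```

The APPn segments (markers 0xE0 to 0xEF) are where JFIF, EXIF, ICC and Adobe data live, so the returned offsets tell you which bytes to inspect or rewrite.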
Good luck!
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I've got bunches of auxiliary files that are generated by code and LaTeX documents that I dearly wish would not be suggested by SpotLight as potential search candidates. I'm not looking for example.log, I'm looking for example.tex!
So can Spotlight be configured to ignore, say, all .log files?
(I know, I know; I should just use QuickSilver instead…)
@diciu That's an interesting answer. The problem in my case is this:
Figure out which importer handles your type of file
I'm not sure if my type of file is handled by any single importer? Since they've all got weird extensions (.aux, .glo, .out, whatever) I think it's improbable that there's an importer that's trying to index them. But because they're plain text they're being picked up as generic files. (Admittedly, I don't know much about Spotlight's indexing, so I might be completely wrong on this.)
@diciu again: TextImporterDontImportList sounds very promising; I'll head off and see if anything comes of it.
Like you say, it does seem like the whole UTI system doesn't really allow not searching for something.
@Raynet Making the files invisible is a good idea actually, albeit relatively tedious for me to set up in the general sense. If worst comes to worst, I might give that a shot (but probably after exhausting other options such as QuickSilver). (Oh, and SetFile requires the Developer Tools, but I'm guessing everyone here has them installed anyway :) )
@Will - these things that define types are called uniform type identifiers.
The problem is they are a combination of extensions (like .txt) and generic types (i.e. public.plain-text matches a txt file without the txt extension based purely on content) so it's not as simple as looking for an extension.
RichText.mdimporter is probably the importer that imports your text file.
This should be easily verified by running mdimport in debug mode on one of the files you don't want indexed:
cristi:~ diciu$ echo "All work and no play makes Jack a dull boy" > ~/input.txt
cristi:~ diciu$ mdimport -d 4 -n ~/input.txt 2>&1 | grep Imported
kMD2008-09-03 12:05:06.342 mdimport[1230:10b] Imported '/Users/diciu/input.txt' of type 'public.plain-text' with plugIn /System/Library/Spotlight/RichText.mdimporter.
The type that matches in my example is public.plain-text.
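When several files need checking, the debug line can be picked apart with a short Python sketch. The pattern is written against the single sample line shown above, so treating it as the general output format of mdimport is an assumption; the function name parse_mdimport_line is invented here.

```python
import re

# Written against the sample "Imported ..." line above; plug-in paths
# containing spaces are not handled.
_IMPORTED = re.compile(
    r"Imported '(?P<path>[^']+)' of type '(?P<uti>[^']+)' "
    r"with plugIn (?P<plugin>\S+?)\.?$"
)


def parse_mdimport_line(line: str):
    """Extract (path, UTI, importer plug-in) from an "Imported ..." debug line."""
    match = _IMPORTED.search(line.strip())
    if match is None:
        return None
    return match.group("path"), match.group("uti"), match.group("plugin")
```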
I've no idea how you actually write an extension-based exception for a UTI (like public.plain-text except anything ending in .log).
Later edit: I've also looked through the RichText mdimporter binary and found a promising string, but I can't figure out if it's actually being used (as a preference name or whatever):
cristi:FoodBrowser diciu$ strings /System/Library/Spotlight/RichText.mdimporter/Contents/MacOS/RichText |grep Text
TextImporterDontImportList
Not sure how to do it on a file type level, but you can do it on a folder level:
Source: http://lists.apple.com/archives/spotlight-dev/2008/Jul/msg00007.html
Make spotlight ignore a folder
If you absolutely can't rename the folder because other software depends on it another technique is to go ahead and rename the directory to end in ".noindex", but then create a symlink in the same location pointing to the real location using the original name.
Most software is happy to use the symlink with the original name, but Spotlight ignores symlinks and will note the "real" name ends in *.noindex and will ignore that location.
Perhaps something like:
mv OriginalName OriginalName.noindex
ln -s OriginalName.noindex OriginalName
ls -l
lrwxr-xr-x 1 andy admin 24 Jan 9 2008 OriginalName -> OriginalName.noindex
drwxr-xr-x 11 andy admin 374 Jul 11 07:03 OriginalName.noindex
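The rename-and-symlink trick is easy to script for several folders. A minimal Python sketch, with the hypothetical function name hide_from_spotlight; it does no error handling, so it fails if the .noindex name already exists.

```python
import os


def hide_from_spotlight(folder: str) -> str:
    """Rename folder to folder + ".noindex" and leave a symlink under the old name."""
    folder = os.path.abspath(folder)
    hidden = folder + ".noindex"
    os.rename(folder, hidden)
    # Relative target, so the link and the folder can be moved together.
    os.symlink(os.path.basename(hidden), folder)
    return hidden
```

The function returns the new real path; software that uses the original name keeps working through the symlink.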
Here's how it might work.
Note: this is not a very good solution, as a system update will overwrite any changes you make.
Get a list of all importers
cristi:~ diciu$ mdimport -L
2008-09-03 10:42:27.144 mdimport[727:10b] Paths: id(501) (
"/System/Library/Spotlight/Audio.mdimporter",
"/System/Library/Spotlight/Chat.mdimporter",
"/Developer/Applications/Xcode.app/Contents/Library/Spotlight/SourceCode.mdimporter",
Figure out which importer handles your type of file (example for the Audio importer):
cristi:~ diciu$ cat /System/Library/Spotlight/Audio.mdimporter/Contents/Info.plist
[..]
<key>CFBundleTypeRole</key>
<string>MDImporter</string>
<key>LSItemContentTypes</key>
<array>
<string>public.mp3</string>
<string>public.aifc-audio</string>
<string>public.aiff-audio</string>
Alter the importer's plist to delete the type you want to ignore.
Reimport the importer's types so the system picks up the change:
mdimport -r /System/Library/Spotlight/Chat.mdimporter
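Editing the plist by hand is error-prone, so the deletion step could be scripted with Python's plistlib. This sketch assumes the content types sit in LSItemContentTypes lists inside the entries of CFBundleDocumentTypes, which is what the excerpt above suggests but does not show in full. The function names are made up, and the same warning applies: a system update will undo the change.

```python
import plistlib


def remove_content_type(info: dict, uti: str) -> dict:
    """Return a copy of an importer Info.plist dict with uti removed."""
    cleaned = dict(info)
    doc_types = []
    for doc_type in info.get("CFBundleDocumentTypes", []):
        doc_type = dict(doc_type)
        if "LSItemContentTypes" in doc_type:
            doc_type["LSItemContentTypes"] = [
                t for t in doc_type["LSItemContentTypes"] if t != uti
            ]
        doc_types.append(doc_type)
    cleaned["CFBundleDocumentTypes"] = doc_types
    return cleaned


def rewrite_info_plist(path: str, uti: str) -> None:
    """Load the plist at path, drop uti, and write it back in place."""
    with open(path, "rb") as handle:
        info = plistlib.load(handle)
    with open(path, "wb") as handle:
        plistlib.dump(remove_content_type(info, uti), handle)
```

Keep a backup of the original Info.plist before rewriting it, and run the mdimport -r step afterwards so the change is picked up.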
The only option is probably to keep them from being indexed by Spotlight, since for some reason you cannot do negative searches. You can search for files with a specific file extension, but you cannot search for files that don't have it.
You could try making those files invisible to Finder; Spotlight won't index invisible files. The command for setting the kIsInvisible flag on files is (an uppercase V sets the flag, a lowercase v clears it):
SetFile -a V [filename(s)]