The latest version of ABBYY FineReader for OS X now supports AppleScript. I am trying to write a simple script that runs OCR on a given PDF document and saves the result as a PDF to make it searchable. Unfortunately, I'm a beginner in AppleScript and can't get it to work. I couldn't find any further documentation or samples for scripting ABBYY FineReader.
I managed to open FineReader's scripting dictionary, which lists this command:
export to pdf v : Converts the current document to a PDF file. If FineReader is running in a Sandbox, the file will be saved to a temporary directory.
export to pdf file : NO_DESCRIPTION
[ocr languages enum language list type] : List of recognition languages that includes language identifiers and full language names.
[saving type save settings enum] : Specifies file creation settings for saving results.
[export mode pdf layout] : Specifies export mode.
[keep page numbers headers and footers boolean] : Keeps headers, footers and page numbers.
[page size page size enum] : Specifies paper size.
[keep pictures boolean] : Keeps pictures in recognized document.
[image quality image quality enum] : Specifies quality of pictures in output file.
[keep text and background colors boolean] : Keeps background and character colors.
[use mrc boolean] : Compresses the output file significantly while retaining high quality of text and images.
[make pdfa boolean] : Creates a searchable PDF document that is well suited for archiving.
[create outline boolean] : Creates a table of contents in a PDF file based on headings.
[enable pdf tagging boolean] : Enables PDF tags.
[embed fonts boolean] : Embeds fonts from the document in the e-book.
→ file :
I tried this script:
tell application "FineReader OCR Pro"
    export to pdf "<path to pdf>"
end tell
However, I get the output "missing value". What is wrong?
It looks like you may need to supply a path to save the PDF to.
Try this:
tell application "FineReader OCR Pro"
    export to pdf ((path to desktop) as string) & "test-export.pdf"
end tell
This should save the file to your desktop.
I would like to know how to extract the metadata of an .otf file. For example:
the sequential number of each glyph (e.g. 001, 002, ...)
the associated Unicode code point in hexadecimal
etc.
Applications like "Font Examiner" can do it:
Thanks in advance
If this were my problem I'd probably want to solve it using as much public and/or open source code as possible.
For example, you can use the FreeType library (which apparently compiles for both Macintosh and iOS) to load the .otf font, and then read the metadata off the loaded face (e.g. the num_glyphs field of FT_FaceRec).
If you can load the font into an NSFont or UIFont instance, you do have access to a numberOfGlyphs and a coveredCharacterSet property (the latter shows the characters a font can render).
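To illustrate the kind of metadata mentioned above without pulling in a library, here is a minimal Python sketch that reads the glyph count directly from the font's 'maxp' table. It assumes a single-font sfnt file (.otf or .ttf, not a .ttc collection), and the function name read_num_glyphs is made up for this example. It is not how FreeType or NSFont do it internally.

```python
import struct


def read_num_glyphs(font_bytes: bytes) -> int:
    """Return numGlyphs from the 'maxp' table of a single-font sfnt file."""
    # Offset table: sfntVersion (4 bytes), numTables (uint16),
    # then three uint16 binary-search helper fields (12 bytes total).
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    # Table records follow at byte 12; each record is 16 bytes:
    # tag (4 bytes), checksum, offset, length (uint32 each).
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i
        )
        if tag == b"maxp":
            # 'maxp' starts with a 4-byte version, then uint16 numGlyphs.
            return struct.unpack_from(">H", font_bytes, offset + 4)[0]
    raise ValueError("no 'maxp' table found")
```

Usage would be something like read_num_glyphs(open("MyFont.otf", "rb").read()), where MyFont.otf is a placeholder path. Mapping glyphs to Unicode code points needs the 'cmap' table, which is more involved and is better left to a font library.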
I am using InDesign CS6 Server to change the text and images of .indd files.
I already have a script for changing the text, but I am not able to change the images in an .indd document.
Can someone please help me with a script that identifies all the images used in an .indd document and then replaces them with user-selected images?
Thanks!
Assuming you have a simple layout containing one linked graphic, the following script would relink that graphic to a different file:
set replacementFile to "Macintosh HD:Users:me:Desktop:somefile.png"
tell application "InDesignServer"
    tell document 1
        relink link 1 to alias replacementFile
    end tell
end tell
For example, say I have a business card designed in InDesign and now need to provide a print-ready PDF for the printer containing multiple copies of the card. How would you do that? Are there any specific tools?
InDesign doesn't do imposition (placing several pages on one output page in a particular order).
You have to buy or find a tool or plugin, such as the one at croptima dot com.
Or on this page, there's some interesting stuff:
http://www.adobe.com/cfusion/exchange/index.cfm?l=6&s=5&o=desc&exc=19&cat=223&event=producthome
Alternatively, do it by hand or use a PDF imposition tool.
Good luck!
Export to PDF (with any marks you need) and note the file path. Then open a text file and type in:
file
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
…
Once that is done, go to InDesign, draw a frame that will host the PDF, and run a data merge. You will get your imposition almost for free ;)
Loic
My bad, you need to indicate that you are placing image files by starting the field name with an at sign (@):
@pdfs
"/myFile.pdf"
"/myFile.pdf"
"/myFile.pdf"
…
And specify the absolute path to the file.
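If you need many copies, typing the list by hand gets tedious. Here is a small Python sketch that builds the data merge text for you. It assumes the InDesign convention that an image field name starts with @, and it quotes each path as in the example above. The function name build_data_merge_source and the field name pdfs are just placeholders.

```python
def build_data_merge_source(pdf_path: str, copies: int, field: str = "pdfs") -> str:
    """Build a one-column data merge source that repeats one PDF path."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The header row names the field; the leading @ marks it as an image field.
    lines = ["@" + field]
    # One record per copy, each holding the quoted absolute path.
    lines += ['"' + pdf_path + '"'] * copies
    return "\n".join(lines) + "\n"
```

For eight cards per sheet you would call build_data_merge_source("/myFile.pdf", 8), save the returned text to a file, and select that file as the data source in the Data Merge panel.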
How many cards do you need to lay out? If only a few, you could just place the .indd file into another document and duplicate the frames.
I haven't tested this, but maybe you could draw a grid of frames and place the InDesign file into it. In the best case, if the grid is selected, the file is placed in every frame.
Loic
When we save an MS Word .doc file as an HTML file, we get "wmz" files for the math equation objects.
I tried decompressing the wmz file and saving the content as a jpg.
I can open this jpg file in Microsoft Picture Manager properly, but trying to open it in a browser displays the error message "The image cannot be displayed, because it contains errors".
What is the procedure to decompress this wmz file and convert it to jpg?
What will the extension of the decompressed file be?
.WMZ seems to be a zipped .WMF file.
You can open the unzipped file with a picture viewer/editor (I just tried IrfanView) and save it as .jpg.
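If you want to do the decompression step in code, here is a minimal Python sketch. It assumes the .wmz file is gzip-compressed, which is how Office normally writes them, and the function names are made up for this example. It only unpacks the file to .wmf; converting the metafile to .jpg still needs an image tool.

```python
import gzip
from pathlib import Path

# First four bytes of a "placeable" WMF header (0x9AC6CDD7, little-endian).
APM_MAGIC = bytes.fromhex("d7cdc69a")


def wmz_to_wmf(wmz_path) -> Path:
    """Decompress a .wmz file and write the result next to it as .wmf."""
    src = Path(wmz_path)
    data = gzip.decompress(src.read_bytes())
    dst = src.with_suffix(".wmf")
    dst.write_bytes(data)
    return dst


def looks_like_placeable_wmf(data: bytes) -> bool:
    """True if the data starts with the placeable-WMF magic number.

    A False result does not prove the data is not a WMF, because
    metafiles without the placeable header start differently.
    """
    return data[:4] == APM_MAGIC
```

Saving the decompressed bytes under a .jpg name does not make them a JPEG, which would explain why the browser rejects the file.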
When you save your Word documents as "Web Page, filtered" you won't get these WMZ files but just PNG files.
Set the "Web Options" to target a low version of IE (e.g. 4.0) and check "allow PNG files" and "disable features not supported by these browsers".
An added advantage is that the web page will display better in different browsers.
However, you should do all of this only after you have made a copy of your document (and associated files) in another location using Explorer. Open this copy with Word and then save it as "Web Page, filtered". Keep the original for editing. (Don't save the original as "Web Page, filtered" or you will lose the ability to edit the equation objects.)
Thanks for the help.
In the end I could not remove the black background from the image file.
So I am using this roundabout approach for now:
1) Decompress the wmz file to a byte array (wmf).
2) Open a new Word document.
3) Paste the byte array into the Word document (this document should contain only this data and nothing else).
4) Save the document as an HTML file (WdSaveFormat.wdFormatFilteredHTML).
5) Open the "_files" directory created for the HTML output.
6) Find the only "gif" file created inside that directory.
I ran into this trying to throw together a simple Automator script to combine several one-page PDF files. I had 88 files to combine, each just about exactly 300KB, so I expected the final product to be about 30MB; the resulting PDF file, using the Combine PDFs Automator action, was 300+MB.
Poking around, I found that the Automator action uses a Python script, with Foundation bindings, to create the new PDF document with the CoreGraphics PDF APIs. Nothing seems out of place. Basically, it's doing this (simplified, but these are the high points):
writeContext = CGPDFContextCreateWithURL(outURL, None, None)
for url in inURLs:
    doc = CGPDFDocumentCreateWithURL(url)
    page = CGPDFDocumentGetPage(doc, 1)
    mediaBox = CGPDFPageGetBoxRect(page, kCGPDFMediaBox)
    CGContextBeginPage(writeContext, mediaBox)
    CGContextDrawPDFPage(writeContext, page)
    CGContextEndPage(writeContext)
CGPDFContextClose(writeContext)
I can't imagine that CGContextDrawPDFPage, when drawing to a PDF context, would do anything but copy the PDF data for that page (with some window-dressing).
Even when "combining" just one PDF, the output is 2.8MB, compared to the 300KB original one-page PDF.
The resulting PDFs look exactly the same page-by-page as the original pages: text is selectable in the same places, graphics look identical, the pages are exactly the same size.
Any ideas?
Do the input PDFs contain the same set of fonts, or different sets? Maybe if the originals don't contain embedded fonts, but the output does, that could account for some of the growth.
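One rough way to check this is to count the embedded font program references in an input file and in the output file. The Python sketch below is only a heuristic: it scans the raw bytes for the /FontFile, /FontFile2 and /FontFile3 keys that font descriptors use, so it will miss fonts in PDFs that store their objects in compressed object streams. The function name is made up for this example.

```python
import re

# Font descriptors point at embedded font programs through one of these keys.
FONT_FILE_KEY = re.compile(rb"/FontFile[23]?\b")


def count_font_file_refs(pdf_bytes: bytes) -> int:
    """Count /FontFile, /FontFile2 and /FontFile3 keys in the raw PDF bytes."""
    return len(FONT_FILE_KEY.findall(pdf_bytes))
```

Running it over open(path, "rb").read() for one original page and for the combined file would show whether the output carries a separate font copy per page. If both counts are zero, the growth probably comes from somewhere else, or the objects are compressed and need a real PDF parser to inspect.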