I'm trying to batch convert a bunch of assorted iWork files (Numbers, Pages, Keynote) to PDF on the command line.
I've been trying cups-filter but there's no MIME type filter for the iWork types. I then looked into using qlmanage to generate the preview image and use that, but this doesn't seem to work for multi file Keynote documents as they generate as HTML rather than PDF.
Any suggestions? I'd rather not resort to AppleScript.
I created an .applescript script that converts all .pages files within a folder to .docx. .pdf support can be easily added. In pages2docx.applescript you just need to replace Microsoft Word with PDF.
Here's what I ended up going with, since I really wanted to avoid AppleScript.
When saving an iWork document there's a "Include Preview In Document" checkbox. Checking this creates a "QuickLook/Preview.pdf" inside the iWork document bundle (which is actually a zip file). Luckily I had this checked for most of the zip files, so it was simply a case of unzipping to NSTemporaryDirectory and grabbing that file.
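A minimal sketch of that unzip-and-grab step, assuming the document is a single zip file and the preview sits at QuickLook/Preview.pdf as described above. The function name `extract_preview` and the output naming are hypothetical, not part of what I originally ran:

```python
import zipfile
from pathlib import Path
from typing import Optional

# Path of the embedded preview inside the bundle, as described above.
PREVIEW_MEMBER = "QuickLook/Preview.pdf"


def extract_preview(document: Path, out_dir: Path) -> Optional[Path]:
    """Copy the embedded preview PDF out of a zipped iWork document.

    Returns the path of the written PDF, or None when the document is not
    a zip archive or was saved without a preview.
    """
    if not zipfile.is_zipfile(document):
        return None
    with zipfile.ZipFile(document) as bundle:
        if PREVIEW_MEMBER not in bundle.namelist():
            return None
        target = out_dir / (document.stem + ".pdf")
        target.write_bytes(bundle.read(PREVIEW_MEMBER))
        return target
```

Documents for which this returns None are the ones that still need the qlmanage route.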
For those that didn't, I put together a script that runs qlmanage to create the document preview. For some documents that creates a PDF; for others it creates an HTML file. You can then use http://code.google.com/p/wkhtmltopdf/ to convert this HTML to a PDF.
Well... you need something that
understands the iWork file formats, and
can render the documents in order to create the PDF.
Unless you want to re-invent the iWork suite... Sounds simpler to just tell the iWork apps what you want from them.
You would do that via the Scripting Bridge.
I would use AppleScript, but perhaps you can use Ruby or Python with the Scripting Bridge to accomplish what you need.
With Scripting Bridge, RubyCocoa and PyObjC scripts can do what AppleScript scripts can do: control scriptable applications and exchange data with them.
I haven't used the Scripting Bridge in a while, but I believe you can tell applications to print documents. And any application that can print in OS X can send it to PDF instead.
Here are a couple of commands to help those who want to get this working without much thought. It worked for me with a ppt file.
Make sure to get wkhtmltopdf from here.
qlmanage -p -o /tmp /path/of/file.ppt
wkhtmltopdf /tmp/file.ppt.qlpreview/Preview.html /output/to/file.pdf
You may have to fiddle with sizes if you want the original pages to stay consistent. For the ppt I was using, the following parameters did the job:
wkhtmltopdf --page-width 200 --page-height 145 Preview.html file.pdf
Edit: I have written a Python script to do a batch conversion. Hopefully people can contribute to make it more robust:
https://github.com/matthewfitch23/DocToPdf
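Separately from that repository, here is a rough sketch of how the two commands above could be looped over a folder. It assumes qlmanage produces an HTML preview at `<output dir>/<file name>.qlpreview/Preview.html`, as in the ppt example; files for which qlmanage writes something else are not handled. The function names are made up for illustration:

```python
import subprocess
from pathlib import Path
from typing import List


def preview_commands(source: Path, work_dir: Path, pdf: Path) -> List[List[str]]:
    """Build the qlmanage and wkhtmltopdf commands for a single input file."""
    # Assumed location of the preview, following the ppt example above.
    html = work_dir / (source.name + ".qlpreview") / "Preview.html"
    return [
        ["qlmanage", "-p", "-o", str(work_dir), str(source)],
        ["wkhtmltopdf", str(html), str(pdf)],
    ]


def convert_all(folder: Path, work_dir: Path, out_dir: Path, pattern: str = "*.ppt") -> None:
    """Run both commands for every matching file in `folder` (macOS only)."""
    for source in sorted(folder.glob(pattern)):
        pdf = out_dir / (source.stem + ".pdf")
        for command in preview_commands(source, work_dir, pdf):
            # check=True stops the batch at the first failing command.
            subprocess.run(command, check=True)
```

Something like `convert_all(Path("slides"), Path("/tmp"), Path("pdfs"))` would then process a whole folder; the page-size options can be added to the wkhtmltopdf list if needed.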
I'm wondering how it is possible to extract images from a .swf viewer.
Note that the .swf file does not contain the images itself.
For example, I'm trying to extract images from the AVON catalogue at this link: http://avon.com.ua/PRSuite/eBrochure.page?index=1&cmpgnYrNr=201404&pageNo=0
Any ideas?
The best way is to put the .swf file in a decompiler for image extraction. Decompilers are smart enough to extract the images for you and arrange them.
JPEXS Free Flash Decompiler is one of the more popular ones:
http://www.free-decompiler.com/flash/
You can extract other useful content from it as well.
Just download the .swf file from the website
A while back (like around 1999) I wrote a set of tools for Flash animations.
One of the tools is swf_dump which can be used to extract objects (i.e. write the objects in a form of script that sswf can nearly recompile...)
The tool also extracts images that are inline, that is, not downloaded dynamically by the Flash animation. (If they are downloaded dynamically, you could just as well download them manually; you'd need the URL, though.)
The command line you can use is:
swf_dump -d my-animation.swf
Then your current folder will be littered with all the images that were found in the Flash file. It extracts JPEGs and PNGs. The source can be compressed (both uncompressed FWS and compressed CWS files are supported).
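To illustrate what "compressed" means here: an SWF file starts with an 8-byte header (3-byte signature, 1-byte version, 4-byte length), and when the signature is CWS rather than FWS everything after those 8 bytes is zlib data. The sketch below only shows that unwrapping step; it is not a description of how swf_dump is implemented, and the function name is hypothetical:

```python
import zlib


def inflate_swf(data: bytes) -> bytes:
    """Return the uncompressed (FWS) form of an SWF file held in memory."""
    signature = data[:3]
    if signature == b"FWS":
        return data  # already uncompressed
    if signature == b"CWS":
        # Keep version and length bytes, inflate the rest of the file.
        return b"FWS" + data[3:8] + zlib.decompress(data[8:])
    raise ValueError("not an FWS or CWS file")
```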
Now, you're on your own to compile that thing... The project is here and is in great need of updating (but Flash is kind of going out too...)
https://sourceforge.net/projects/sswf/
I want to write a viewer that converts InDesign output to HTML5, so that everything a user designs in Adobe InDesign can be displayed in a browser, but I do not know which output format is suitable. I think I can retrieve all the information about the InDesign document from the IDML export, but the problem is parsing that XML and displaying the tags as HTML5. Is there a simple way to convert the output format into HTML5?
Is it possible to download the Adobe InDesign SDK and use its methods for this purpose?
You can use in5 to export HTML5 (layout intact) from InDesign.
Full disclosure: I am the creator of in5.
Exporting to EPUB would result in XHTML 1.1. The EPUB file that InDesign generates is a zip file, in which you will find a number of files. At least one of them is an XHTML file.
XHTML 1.1 would surely be an easier source to use than the IDML. However, you will have to make sure that the EPUB export is good enough to start with (the pages won't come out exactly the same as in InDesign).
Would that be a solution?
EPUB export is supported from InDesign CS4 onward (as a JavaScript-based export option outside the object model, as I understand it, and as a built-in export option that is part of the object model from CS5).
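Since the EPUB is just a zip file, finding the XHTML inside it takes only a few lines. This is a sketch under the assumption that the content files carry an .xhtml, .html or .htm extension; the function name is made up for illustration:

```python
import zipfile
from pathlib import Path
from typing import List


def xhtml_members(epub: Path) -> List[str]:
    """List the XHTML/HTML files stored inside an EPUB archive."""
    with zipfile.ZipFile(epub) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        ]
```

Each returned name can then be read with `ZipFile.read` and used as the starting point for the HTML5 conversion.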
You don't mention what version of InDesign you are using. CS5, CS5.5 and CS6 all allow you to export to HTML. The problem is that the HTML is version 4 and the export creates badly written CSS. What I like to do is use XML to build my own HTML: create a set of HTML5 tags you want to use and then map the existing Paragraph and Character styles to the XML tags.
When you're done you will have a basic content structure. Then I use the Structure pane to add different elements as needed. You can add Parents or children as you need to right there and then export to XML. When you save the file, just change its name to .HTML and edit the code to remove the one reference to "xml".
It takes a little time, but it is very doable.
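The rename-and-edit step at the end can be scripted. The sketch below assumes that the "one reference to xml" is the XML declaration on the first line and that the export was saved as UTF-8; both are assumptions, and the function name is hypothetical:

```python
import re
from pathlib import Path

# Matches a leading declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def xml_export_to_html(xml_file: Path) -> Path:
    """Write a copy of `xml_file` with an .html suffix and no XML declaration."""
    # utf-8-sig also drops a byte order mark if the export wrote one.
    text = xml_file.read_text(encoding="utf-8-sig")
    html_file = xml_file.with_suffix(".html")
    html_file.write_text(XML_DECLARATION.sub("", text, count=1), encoding="utf-8")
    return html_file
```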
I have been trying to find a way to lock the PDF file, or at least disable copy and paste, after the conversion. I looked at the ConversionJobSettings properties but I wasn't able to accomplish this.
Based on what I have read, the SharePoint 2010 Word Automation Services API provides very limited control over the conversion logic, but is there any way I can lock down the content so that it cannot be copied?
Thanks for your help.
You will either need to code something up yourself or get a third party product such as this one, which allows conversion as well as PDF manipulation including security and watermarking.
Note that I worked on this product, so I am obviously biased. Having said that, it works brilliantly.
The only way to prevent copy and paste (as text) is to create image versions of the pages and save those as a PDF.
A possible solution:
1) Use Word automation to print to a PostScript (PS) printer driver to get a .ps file
2) Use GhostScript to convert the PS to tif files
3) Create a PDF using the tif files (possibly with GhostScript too)
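Steps 2 and 3 could look roughly like the sketch below. It assumes the Ghostscript executable is on the PATH as `gs` (on Windows it is usually named differently, e.g. gswin64c) and, for step 3, swaps in libtiff's tiff2pdf tool instead of Ghostscript. The function name and the 150 dpi default are illustrative only:

```python
import subprocess
from pathlib import Path
from typing import List


def rasterize_commands(ps_file: Path, tiff_file: Path, pdf_file: Path, dpi: int = 150) -> List[List[str]]:
    """Build the commands for steps 2 and 3: PS -> multi-page TIFF -> PDF."""
    return [
        # Step 2: render every page of the PostScript file into one TIFF.
        ["gs", "-dBATCH", "-dNOPAUSE", "-dSAFER", "-sDEVICE=tiff24nc",
         f"-r{dpi}", f"-sOutputFile={tiff_file}", str(ps_file)],
        # Step 3: wrap the TIFF pages in a PDF (assumes tiff2pdf is installed).
        ["tiff2pdf", "-o", str(pdf_file), str(tiff_file)],
    ]


def rasterize(ps_file: Path, tiff_file: Path, pdf_file: Path) -> None:
    """Run both commands, stopping at the first failure."""
    for command in rasterize_commands(ps_file, tiff_file, pdf_file):
        subprocess.run(command, check=True)
```

The resulting PDF contains only page images, so there is no text layer left to copy.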
I am currently writing an application working with specially prepared image data. Another tool prepares the images (basically PNGs with additional data stored in the meta-data section). Now my tool works with these files, but not with all PNGs, so "we" decided to use a different file extension. So far, so good.
Now, because I am a lazy sack I implemented some file type registration to allow double-clicking on the file and opening it in my application (no problem at all).
And here is my Question:
It would be cool if Windows Explorer could still show me the thumbnail previews for my files. Since they basically are still PNG files, it should be possible without writing my own shell extension (at least I believe so).
I quickly tried copying all registry keys and values from HKCR\.png to HKCR\.mInDat (my file name extension) and it worked. However, I would prefer to know what I am doing ;-)
Which of the registry settings are responsible for the thumbnail preview control and which can I use to get the preview for my file types?
I tried to google it, but I failed, since it seems I am unable to come up with the right buzz-words to find the info I need. Please, help me.
Thank you!
Yours,
3of4
Simple:
[HKEY_CLASSES_ROOT\.apng]
@="apng"
"Content Type"="image/png"
"PerceivedType"="image"
[HKEY_CLASSES_ROOT\apng\shellex\{BB2E617C-0920-11d1-9A0B-00C04FC2D6C1}]
@="{3F30C968-480A-4C6C-862D-EFC0897BB84B}"
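To apply the same entries to another extension, such as the .mInDat from the question, the fragment can be generated instead of typed by hand. This is only a sketch: the two GUIDs are copied from the entries above, the ProgID `mInDatFile` in the usage note is a made-up name, and in a .reg file the default value of a key is written as `@`:

```python
# GUIDs copied from the registry fragment above.
THUMBNAIL_HANDLER_KEY = "{BB2E617C-0920-11d1-9A0B-00C04FC2D6C1}"
IMAGE_EXTRACTOR_CLSID = "{3F30C968-480A-4C6C-862D-EFC0897BB84B}"


def thumbnail_reg(extension: str, prog_id: str) -> str:
    """Return .reg file text that gives `extension` PNG-style thumbnails."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[HKEY_CLASSES_ROOT\\." + extension + "]",
        '@="' + prog_id + '"',
        '"Content Type"="image/png"',
        '"PerceivedType"="image"',
        "",
        "[HKEY_CLASSES_ROOT\\" + prog_id + "\\shellex\\" + THUMBNAIL_HANDLER_KEY + "]",
        '@="' + IMAGE_EXTRACTOR_CLSID + '"',
        "",
    ]
    return "\n".join(lines)
```

Writing `thumbnail_reg("mInDat", "mInDatFile")` to a .reg file and importing it should create the same structure for the custom extension.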
I would like to convert PDF and DOC files to HTML files using Cocoa.
Please help me with this.
Thanks in advance,
You can convert Word files to HTML using NSAttributedString. You can't do this in pure Cocoa for PDF files; you'll have to use a conversion tool, such as the ones stigi suggested. To do that, use NSTask.
Cocoa's PDFKit framework can convert a PDF file to text, through PDFDocument's -string method for example. Of course this won't copy images or formatting though, and it depends on PDFKit being able to recognize text in the file.
There are a couple of tools for the Unix command line that do this kind of conversion.
Check out http://pdftohtml.sourceforge.net/ and http://rtf2html.sourceforge.net/
You may want to see if there are other tools like this.
But to get back to your question: these command-line tools can be called from within your Cocoa app (this won't work on the iPhone) and produce the HTML result.
Check out this link for a guide on how to embed such command-line tools within your app.