I came across the neat little R package slidex, which converts pptx to rmd. However, it only supports the "en-US" language encoding.
How do I change the language encoding of an existing .pptx file?
Language tags are scattered all over a PowerPoint file. There are add-ins that can do a pretty thorough job: PPTools LanguageSelector
Personally, I prefer to change the file extension to .zip, expand the file, and use a text editor like Notepad++ to find and replace all language tags (you're looking for attributes like lang="en-US"), then rezip. The default Windows zip utility is not the best for this: it adds a top-level folder that PowerPoint can't parse. WinZip and 7-Zip are better.
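The unzip, replace, rezip steps can also be scripted. The following is a minimal sketch, not a tested tool: the function name retag_pptx and the regular expression are my own assumptions, and it only rewrites lang="..." attributes in the .xml and .rels parts while copying everything else untouched. Entries are written back under their original names, so no extra top-level folder is introduced.

```python
import re
import zipfile

# Matches lang="..." (and xml:lang="..."), but not altLang="...".
LANG_ATTR = re.compile(rb'(\blang=")[^"]*(")')


def retag_pptx(src_path, dst_path, new_lang="en-US"):
    """Copy a .pptx, rewriting every lang="..." attribute in its XML parts.

    retag_pptx is a hypothetical helper name, not part of any library.
    """
    replacement = new_lang.encode("ascii")
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # Only touch XML parts; media and other binary parts are copied as-is.
            if info.filename.endswith((".xml", ".rels")):
                data = LANG_ATTR.sub(
                    lambda m: m.group(1) + replacement + m.group(2), data
                )
            # Keep the original entry name so the archive layout is unchanged.
            dst.writestr(info.filename, data)
```

Calling retag_pptx("deck.pptx", "deck-en.pptx") would write a copy with every language tag set to en-US; keep the original file until you have checked that PowerPoint opens the result.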
If you're using PowerPoint for Windows, save the file as a PowerPoint XML Presentation (*.xml). Do the find and replace on that, then resave as a normal presentation. That avoids the unzip/rezip issue.
I'm trying to batch convert a bunch of assorted iWork files (Numbers, Pages, Keynote) to PDF on the command line.
I've been trying cups-filter but there's no MIME type filter for the iWork types. I then looked into using qlmanage to generate the preview image and use that, but this doesn't seem to work for multi file Keynote documents as they generate as HTML rather than PDF.
Any suggestions? I'd rather not resort to AppleScript.
I created an .applescript script that converts all .pages files within a folder to .docx. PDF support can easily be added: in pages2docx.applescript you just need to replace Microsoft Word with PDF.
Here's what I ended up going with, since I really wanted to avoid AppleScript.
When saving an iWork document there's a "Include Preview In Document" checkbox. Checking this creates a "QuickLook/Preview.pdf" inside the iWork document bundle (which is actually a zip file). Luckily I had this checked for most of the zip files, so it was simply a case of unzipping to NSTemporaryDirectory and grabbing that file.
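As a rough illustration of that step, here is a minimal Python sketch. It assumes the document is a zip archive containing the member QuickLook/Preview.pdf, as described above; the helper name extract_preview and the output naming are my own choices.

```python
import zipfile
from pathlib import Path

# Member name taken from the description above.
PREVIEW_MEMBER = "QuickLook/Preview.pdf"


def extract_preview(doc_path, out_dir):
    """Copy the embedded preview PDF out of a zipped iWork document.

    Returns the path of the written PDF, or None when the document is not
    a zip archive or has no embedded preview. extract_preview is a
    hypothetical helper name.
    """
    doc_path = Path(doc_path)
    if not zipfile.is_zipfile(doc_path):
        return None
    with zipfile.ZipFile(doc_path) as bundle:
        if PREVIEW_MEMBER not in bundle.namelist():
            return None
        out_path = Path(out_dir) / (doc_path.stem + ".pdf")
        out_path.write_bytes(bundle.read(PREVIEW_MEMBER))
        return out_path
```

Documents for which this returns None are the ones that still need the qlmanage route.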
For those that didn't, I put together a script to run qlmanage to create the document preview. For some documents that creates a PDF; for others it creates an HTML file. You can then use http://code.google.com/p/wkhtmltopdf/ to convert this HTML to a PDF.
Well... you need something that
1) understands the iWork file formats,
2) can render the documents to then create the PDF.
Unless you want to re-invent the iWork suite... Sounds simpler to just tell the iWork apps what you want from them.
You would do that via the Scripting Bridge.
I would use AppleScript, but perhaps you can use Ruby or Python with the Scripting Bridge to accomplish what you need.
With Scripting Bridge, RubyCocoa and PyObjC scripts can do what AppleScript scripts can do: control scriptable applications and exchange data with them.
I haven't used the Scripting Bridge in a while, but I believe you can tell applications to print documents. And any application that can print in OS X can send it to PDF instead.
Here are a couple of commands to help those who want to get this working without much thought. It worked for me with a ppt file.
Make sure to get wkhtmltopdf from here.
qlmanage -p -o /tmp /path/of/file.ppt
wkhtmltopdf /tmp/file.ppt.qlpreview/Preview.html /output/to/file.pdf
You may have to fiddle with sizes if you want the original pages to stay consistent. For the ppt I was using, the following parameters did the job:
wkhtmltopdf --page-width 200 --page-height 145 Preview.html file.pdf
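For several files, the two commands above can be wrapped in a small Python helper. This is only a sketch: the function names are hypothetical, and it assumes that qlmanage writes its output to <work_dir>/<file name>.qlpreview/Preview.html, as in the example paths above. The optional page size maps to the --page-width and --page-height parameters shown.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(src, out_pdf, work_dir="/tmp", page_size=None):
    """Return the qlmanage and wkhtmltopdf argument lists for one document.

    build_commands is a hypothetical helper; page_size is an optional
    (width, height) pair passed through to wkhtmltopdf.
    """
    src = PurePosixPath(src)
    # Assumed preview location, mirroring the example paths above.
    preview_html = PurePosixPath(work_dir) / (src.name + ".qlpreview") / "Preview.html"
    qlmanage_cmd = ["qlmanage", "-p", "-o", str(work_dir), str(src)]
    wkhtmltopdf_cmd = ["wkhtmltopdf"]
    if page_size is not None:
        width, height = page_size
        wkhtmltopdf_cmd += ["--page-width", str(width), "--page-height", str(height)]
    wkhtmltopdf_cmd += [str(preview_html), str(out_pdf)]
    return qlmanage_cmd, wkhtmltopdf_cmd


def convert(src, out_pdf, **kwargs):
    """Run both steps in order, stopping at the first failing command."""
    for cmd in build_commands(src, out_pdf, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, convert("/path/of/file.ppt", "/output/to/file.pdf", page_size=(200, 145)) would run the same commands as above with the page size I used.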
Edit: I have written a Python script to do a batch conversion. Hopefully people can contribute to make it more robust:
https://github.com/matthewfitch23/DocToPdf
I have been trying to find a way to lock the PDF file or disable copy and paste on it after the conversion. I looked at the ConversionJobSettings properties but wasn't able to accomplish this.
Based on what I have read, the SharePoint 2010 Word Automation Services API provides very limited control over the conversion logic, but is there any way I can lock down the content so that it cannot be copied?
Thanks for your help.
You will either need to code something up yourself or get a third party product such as this one, which allows conversion as well as PDF manipulation including security and watermarking.
Note that I worked on this product, so I am obviously biased. Having said that, it works brilliantly.
The only way to prevent copy and paste (as text) is to create image versions of the pages and save those as a PDF.
A possible solution:
1) Use Word automation to print to a PostScript (PS) printer driver to get a .ps file
2) Use Ghostscript to convert the PS to TIFF files
3) Create a PDF using the TIFF files (possibly with Ghostscript too)
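Step 2 could be driven from a script along these lines. This is a sketch under assumptions: the gs executable name, the tiff24nc output device, and the 300 dpi resolution are my choices rather than anything prescribed above, and the helper name is hypothetical. It covers only the PS to TIFF step, not steps 1 or 3.

```python
import subprocess


def ghostscript_tiff_command(ps_path, output_pattern="page-%03d.tif", dpi=300):
    """Build a Ghostscript call that renders each PS page to its own TIFF.

    ghostscript_tiff_command is a hypothetical helper. The %03d in
    output_pattern is expanded by Ghostscript to the page number.
    """
    return [
        "gs",
        "-dNOPAUSE",            # do not pause after each page
        "-dBATCH",              # exit once the input has been processed
        "-dSAFER",              # restrict file access from the PS program
        "-sDEVICE=tiff24nc",    # assumed device: 24-bit RGB TIFF
        "-r%d" % dpi,           # assumed rendering resolution
        "-sOutputFile=" + output_pattern,
        ps_path,
    ]


def render_pages(ps_path, **kwargs):
    """Run Ghostscript and raise if it reports an error."""
    subprocess.run(ghostscript_tiff_command(ps_path, **kwargs), check=True)
```

A lower resolution gives smaller files but makes the page images harder to read.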
My C# .NET 3.5 application has an option to export text to PDF. I am using ReportingCloud (based on RDL) as the generation engine. However, Cyrillic text is shown incorrectly in the resulting PDF. What can I use to generate Cyrillic PDFs correctly? A method that generates UTF-8 would also do.
UPD: In particular, how do I embed the right fonts into the PDF?
I am not familiar with ReportingCloud, so perhaps this is not the easiest answer to your question. For really good-looking PDFs with UTF-8 and Cyrillic support, though, you could use LaTeX. It is a markup language, somewhat like HTML but aimed at typeset output such as PDF, so you have to generate LaTeX source code from your application. It is also possible to embed the desired fonts.
Where can I find the PowerPoint file format definition, like the header/XML/directory structure?
PowerPoint 2010 uses three primary markup languages: PresentationML, DrawingML and the PowerPoint 2010 Extensions. The first two are part of the ISO/IEC 29500:2008 specs; the last one isn't.
But in all cases, from PowerPoint 2007 onward, the document structure (i.e. what XML and other files go where in a .pptx and how they relate) is an implementation of the Open Packaging Conventions. For details, go to the section PresentationML document structure of the Open XML Explained e-book. For 2010-specific extensions, this document lists them: PowerPoint Extensions to the Office Open XML File Format.
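To get a feel for the package layout, you can open a .pptx as a zip archive and read its [Content_Types].xml part, which every Open Packaging Conventions package contains. The sketch below uses only the Python standard library; the function name list_parts is my own, and it reports only the parts declared through Override elements, not those covered by Default extension rules.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the content-types part defined by the packaging conventions.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def list_parts(pptx_path):
    """Map each explicitly declared part name to its content type.

    list_parts is a hypothetical helper name, not a library function.
    """
    with zipfile.ZipFile(pptx_path) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    return {
        override.get("PartName"): override.get("ContentType")
        for override in root.iter(CT_NS + "Override")
    }
```

On a typical presentation the result should include entries such as /ppt/presentation.xml and one part per slide, which shows how the XML files are spread across the package.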
I would like to convert PDF and DOC files to HTML files using Cocoa.
Please help me with this.
Thanks in advance.
You can convert Word files to HTML using NSAttributedString. You can't do this in pure Cocoa for PDF files; you'll have to use a conversion tool, such as stigi suggested. To do that, use NSTask.
Cocoa's PDFKit framework can convert a PDF file to text, through PDFDocument's -string method for example. Of course this won't copy images or formatting though, and it depends on PDFKit being able to recognize text in the file.
There are a couple of tools for the Unix command line that do this kind of conversion.
Check out http://pdftohtml.sourceforge.net/ and http://rtf2html.sourceforge.net/
You may want to see if there are other tools like these.
But to get back to your question: these command line tools can be called from within your Cocoa app (this won't work on the iPhone) and produce the HTML result.
Check out this link for a guide on how to embed such command line tools within your app.