Flattening annotations from Preview without rasterizing existing text - macos

I'm trying to flatten annotations I make to PDF files in macOS Preview (El Capitan) to ensure that they cannot be modified. I especially want to ensure that redactions cannot be deleted or unhidden to reveal the text beneath. Ideally, I would also like to preserve the machine-readability and vector quality of the text.
Currently, I achieve this by exporting to .tif, then converting back to .pdf, and then OCR'ing with ABBYY FineReader Express. That's a bit ridiculous, but the final result is almost exactly what I want: permanent annotations and searchable/copyable text. It does lose some quality, though, and the file grows in size.
I'm comfortable with the CLI, and I have MacPorts and pdftk installed. I hoped that pdftk's "flatten" option would do the trick, but it does not: it only seems to flatten form fields.
Does anything else out there do this? I swear there was a way to do this in some old built-in imaging program for Windows 2000 or something (but I'm OK not going back to that). :-)
I would settle for a command that rasterizes the file if and only if it:
did everything in one step
kept the file small
kept the file as a pdf
kept the file almost as readable and pretty as it was before

The "best practice" for redaction in PDF is to use either Acrobat's Redaction tool or Redax, the long-time industry-leading redaction plug-in for Acrobat (although, as far as I remember, it is not made for macOS).
Of course, exporting as an image and then running OCR over it does work, but you must make absolutely sure that you also clean any private data and metadata out of the file(s).
Note that the "real" redaction tools offer smart searches, even ones involving regular expressions.
With redaction, as with other safety- and security-related issues, it is up to you to decide how much it is worth to you.
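Whichever route you take, one quick sanity check after flattening is to extract whatever text is still in the output file and search it for the terms you redacted. This is a minimal sketch, assuming poppler's `pdftotext` is installed (for example via MacPorts) and on the PATH; the function names, script name, and example terms are hypothetical.

```python
import subprocess
import sys


def extract_text(pdf_path):
    """Return the text layer of a PDF via poppler's pdftotext ("-" means stdout)."""
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def find_leaks(text, redacted_terms):
    """Return the redacted terms that still occur in the text, ignoring case."""
    lowered = text.lower()
    return [term for term in redacted_terms if term.lower() in lowered]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical terms): python check_redactions.py flattened.pdf "Jane Doe" "555-0100"
    leaks = find_leaks(extract_text(sys.argv[1]), sys.argv[2:])
    if leaks:
        print("Still extractable:", ", ".join(leaks))
    else:
        print("None of the terms appear in the text layer.")
```

An empty result only covers the extractable text layer; it says nothing about metadata or about content that survives in embedded images.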

Use "Export as PDF" (in the File menu), which definitely seems to be a proper way to do it (on 10.9.5).
It seemed that printing to PDF from Preview would make these annotations permanent, but that didn't work.

Related

Windows Help Files

Back in the old days, Help was not trivial but possible: generate some funky .rtf file with special tags, run it through a compiler, and you got a WinHelp file (.hlp) that actually worked really well.
Then Microsoft decided that WinHelp was not hip and cool anymore and switched to CHM, to the point that they actually axed WinHelp from Vista.
Now, CHM may be nice, but anyone who has tried to open a .chm file over the network will know the nice "Navigation to the webpage was canceled" screen caused by security restrictions.
While there are ways to make CHM work off the network, this is hardly a good choice, because when users press the Help button, they want help, not to have to change some funky settings.
Bottom line: I find CHM absolutely unusable. But with WinHelp no longer an option either, I wonder what the alternatives are, especially when it comes to integrating with my application (e.g. for WinHelp and CHM there are functions that let you jump directly to a topic)?
PDF has the disadvantage of requiring the Adobe Reader (or one of the more lightweight readers that not many people use). I could live with that, seeing as it is more or less standard nowadays, but can you reliably tell it to jump to a given page/anchor?
HTML files seem to be the best choice; you then just have to deal with different browsers (CSS and so on).
Edit: I am looking to create my own help files. As I am a fan of the "No Setup, Just Extract and Run" philosophy, I have had this problem many times in the past, because many of my users run my applications off the network, which causes exactly this problem.
So I am looking for a more robust and future-proof way to provide help to my users without having to code a different help system for each application I make.
CHM is a really nice format, but the security restrictions make it unusable: a help system is supposed to help the user, not create even more problems.
Yep: at some point they want to add behaviour to their help files, which makes them a security issue, and guess what happens: the remedy is often worse than the threat.
Or the format is too simple or too complicated and gets replaced by something new without any regard for backward compatibility.
If you want it really simple and built for the ages, go for .TXT.
You didn't specify what your apps are written in, so it depends.
If it's a web app, plain HTML would be the best choice; a help file doesn't need special features or JavaScript, so staying browser-independent should be straightforward. HTML, online or offline, is also often used with good results for desktop apps.
PDF is the other general solution, and yes, you can jump to specific pages (see this answer). Every PC has (or should have) one PDF client or another installed, so I wouldn't worry about that. I myself never choose Acrobat Reader; faster, simpler, and often better solutions are available. My favorite is Sumatra.
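As a rough sketch of the page-jump point: both readers mentioned here accept command-line arguments for opening a document at a page or at a named destination. Acrobat/Adobe Reader uses its open parameters (`/A "page=N"` or `/A "nameddest=NAME"`), and SumatraPDF uses `-page N` or `-named-dest NAME`. The executable names below are assumptions (in practice you would look up the installed reader), and `help.pdf` and `Installation` are hypothetical.

```python
import subprocess


def pdf_open_command(viewer, pdf_path, page=None, named_dest=None):
    """Build the argv list that opens pdf_path at a page or named destination."""
    if viewer == "acrobat":
        cmd = ["AcroRd32.exe"]  # assumed executable name; usually found via the registry
        if named_dest is not None:
            cmd += ["/A", f"nameddest={named_dest}"]
        elif page is not None:
            cmd += ["/A", f"page={page}"]
    elif viewer == "sumatra":
        cmd = ["SumatraPDF.exe"]  # assumed to be on the PATH
        if named_dest is not None:
            cmd += ["-named-dest", named_dest]
        elif page is not None:
            cmd += ["-page", str(page)]
    else:
        raise ValueError(f"unknown viewer: {viewer!r}")
    return cmd + [pdf_path]


def open_help(viewer, pdf_path, **target):
    """Launch the viewer without waiting for it to exit."""
    return subprocess.Popen(pdf_open_command(viewer, pdf_path, **target))
```

For example, `pdf_open_command("sumatra", "help.pdf", page=12)` gives `["SumatraPDF.exe", "-page", "12", "help.pdf"]`. Named destinations are the closest PDF equivalent of a help "topic", provided your authoring tool emits them.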
I'm sure .NET apps have their own help system (no experience here), and many languages have options to display tooltips, windows, or pages with help, either by pressing a hotkey (F1) or by clicking a control in a dialog.

AppleScript to grab text from a text file and highlight said text in Skim. Two failed attempts?

PDF's ubiquity, especially in academia, makes being able to highlight PDFs (and save those annotations) extremely important. Some academic journals (especially in law) allow users to send journal articles to a Kindle reader, which makes reading and taking notes extremely easy. The question is how to get the underlined text from the MyClippings.txt file into the PDF. About a year ago I found that this is possible through an action in Adobe Acrobat Pro X that parses a text file fed to it and highlights the relevant sections. The action takes advantage of the Search and Redact tool, but instead of redacting, it highlights.
However, I would like to get out of the Adobe environment (for several reasons, one of which is that Adobe's software is resource-hungry and not free). Skim on Mac OS X sounds like a good alternative because of its AppleScript support. I found two projects on GitHub that attempt to do this, but I failed with both of them.
my-clippings-to-pdf
Skim-AppleScript
Could anyone with AppleScript knowledge take a look at that code and tell me whether it looks sound? This would be great functionality for integrating PDFs and ePubs in a useful and meaningful way, especially for academics.
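Whichever AppleScript ends up driving Skim, it first needs the highlighted passages out of the Kindle clippings file. A minimal Python sketch of that step, assuming the common layout in which entries are separated by a line of ten `=` signs and each entry is a title line, a metadata line such as `- Your Highlight on page 3 | Location 40-41 | ...`, a blank line, and then the passage. Kindle firmware versions word the metadata differently, so the `"Highlight"` test is an assumption. The resulting passages could then be fed to whatever script searches for and highlights them in Skim.

```python
import sys
from collections import defaultdict

SEPARATOR = "=========="  # line between entries in the Kindle clippings file


def parse_clippings(text):
    """Map each book title to its highlighted passages, in file order."""
    highlights = defaultdict(list)
    for chunk in text.split(SEPARATOR):
        lines = [line.strip() for line in chunk.strip().splitlines()]
        if len(lines) < 3:
            continue  # empty trailing chunk or malformed entry
        title = lines[0].lstrip("\ufeff")  # the file often starts with a BOM
        meta, body = lines[1], lines[2:]
        if "Highlight" not in meta:
            continue  # skip notes and bookmarks
        passage = " ".join(line for line in body if line)
        if passage:
            highlights[title].append(passage)
    return dict(highlights)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        for title, passages in parse_clippings(f.read()).items():
            print(title)
            for passage in passages:
                print("  -", passage)
```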

Diff Tool for some binary (non-executable) formats

I'm looking for a visual diff tool for Mac OS X that will allow me to see differences in Pages (from Apple's iWork suite) and Adobe Illustrator documents. I realize a visual diff may be a little much to ask, so I'd settle for some sort of XML or plain-text comparison. I'm using Pages to maintain my spec and Illustrator for my mockups, which are all version-controlled, and would love to be able to easily see the differences.
FileMerge (which ships with Xcode) just barfs up gobbledygook, so binary comparisons definitely won't work. I know about Kaleidoscope, which does support diffs on various image formats (and seems to be an all-around good solution), but it doesn't seem to support Pages or Illustrator.
Araxis Merge has a lot of different file-format viewers. Straight-up binary is one of them, if that will get the job done for you. It will do PDF and various image formats as well.
I don't know about Illustrator files, but Pages has an XML portion. If you open the .pages file with unzip, there is an index.xml file containing all of the text and style information for the file, which you can then compare with diff or FileMerge.
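A minimal sketch of the unzip-and-diff approach in Python, assuming the older zip-based Pages format described above, with `index.xml` at the top level of the archive (newer Pages versions may store documents differently, so check what your files actually contain). It pretty-prints the XML first, since the file is often one enormous line and a line-based diff would otherwise be useless. Redirect the output to a file to read it in FileMerge or any other text diff tool.

```python
import difflib
import sys
import zipfile
from xml.dom import minidom


def pages_xml_lines(path, member="index.xml"):
    """Read index.xml from a zipped .pages file, pretty-printed one element per line."""
    with zipfile.ZipFile(path) as archive:
        data = archive.read(member)
    pretty = minidom.parseString(data).toprettyxml(indent="  ")
    return pretty.splitlines(keepends=True)


def diff_pages(old_path, new_path):
    """Return a unified diff (as a list of lines) between two .pages documents."""
    return list(difflib.unified_diff(pages_xml_lines(old_path),
                                     pages_xml_lines(new_path),
                                     fromfile=old_path, tofile=new_path))


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.stdout.writelines(diff_pages(sys.argv[1], sys.argv[2]))
```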

Pretty print code to PDF

I'm searching for a tool that will take a source directory and produce a single PDF containing the source code, preferably with syntax highlighting.
I would like to read the PDF on my phone, in order to get familiar with a code-base, or just to see what I can learn by reading a lot of code. I will most often be reading Ruby.
I would prefer if the tool ran on Linux. I don't mind paying for a tool if it is particularly good.
Any suggestions?
You could whip something up yourself with Prawn and Ultraviolet.
PDF is no good for reflowing. You might like an HTML-based solution better.
And when reading existing code, a linear model is no good: you need to jump from one file to another. A hypertext model with history would probably work best on the limited screen real estate of a phone. It should borrow some features of the Smalltalk IDEs (jump to senders, implementors).
For the UI, take a look at Clamato.
GNU source-highlight supports many languages and can, in particular, output LaTeX, which can then be converted to PDF.
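One way to script that into a single PDF for a whole source tree, sketched in Python: run GNU source-highlight on each Ruby file with `--out-format=latex --doc` to get a standalone LaTeX document, compile each one with `pdflatex`, and concatenate the results with pdftk's `cat` operation. The `*.rb` pattern, the flattened file naming, and the output file name are assumptions, and all three tools must be installed.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(src_dir, out_dir, pattern="*.rb", lang="ruby"):
    """Return the argv lists that turn every matching source file into one PDF."""
    src, out = Path(src_dir), Path(out_dir)
    commands, pdfs = [], []
    for source in sorted(src.rglob(pattern)):
        # Flatten the relative path into a unique name, e.g. lib/a.rb -> lib__a.rb
        stem = "__".join(source.relative_to(src).parts)
        tex = out / f"{stem}.tex"
        commands.append(["source-highlight", f"--src-lang={lang}",
                         "--out-format=latex", "--doc",
                         f"--input={source}", f"--output={tex}"])
        commands.append(["pdflatex", "-interaction=nonstopmode",
                         f"-output-directory={out}", str(tex)])
        pdfs.append(str(out / f"{stem}.pdf"))  # pdflatex names it after the .tex file
    if pdfs:
        commands.append(["pdftk", *pdfs, "cat", "output", str(out / "all-sources.pdf")])
    return commands


def run(commands, out_dir):
    """Create the output directory and execute each command, stopping on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    run(build_commands(sys.argv[1], sys.argv[2]), sys.argv[2])
```

Building the command lists separately from running them makes it easy to print and inspect them before anything touches the disk.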
The SciTE editor can export the currently edited file (with syntax highlighting) to PDF (and HTML, RTF, LaTeX and XML).
Alas, it doesn't have batch conversion capability, but IIRC somebody made a batch tool out of this code base.
I realize this is very late, but I wanted to do the same thing, except for my tablet, a Galaxy Note 10.1 with a Wacom digitizer that I can use to annotate code. I found that one good solution is to use Doxygen to generate a PDF, which will have hyperlinks and everything else you would want in a PDF. For my use case, I would pair it with EzPDF on Android to annotate the code. This was also for the purpose of learning a new codebase. In the end I didn't use the generated PDF, but it was pretty usable.
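A minimal sketch of that Doxygen setup, written in Python so it can be scripted: it writes a small Doxyfile that turns on LaTeX output with PDF hyperlinks, after which running `doxygen` produces a `latex/` directory whose Makefile builds `refman.pdf`. The tag names are standard Doxygen settings, but the values (project name, input directory, `FILE_PATTERNS`, output directory) are placeholders, and Doxygen only parses the languages it supports, so check that your codebase's language is among them.

```python
import sys
from pathlib import Path

SETTINGS = {
    "PROJECT_NAME": '"My Codebase"',   # placeholder
    "INPUT": "src",                    # placeholder source directory
    "RECURSIVE": "YES",
    "FILE_PATTERNS": "*.c *.h *.py",   # placeholder; must be languages Doxygen parses
    "EXTRACT_ALL": "YES",              # document everything, not only commented items
    "SOURCE_BROWSER": "YES",           # list source files and cross-reference them
    "OUTPUT_DIRECTORY": "docs",
    "GENERATE_HTML": "NO",
    "GENERATE_LATEX": "YES",
    "USE_PDFLATEX": "YES",
    "PDF_HYPERLINKS": "YES",           # clickable cross-references in refman.pdf
}


def doxyfile_text(settings=SETTINGS):
    """Render the settings as Doxyfile 'TAG = value' lines."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


if __name__ == "__main__" and len(sys.argv) == 2:
    Path(sys.argv[1]).write_text(doxyfile_text())
    # Then: doxygen <that file> && make -C docs/latex   (builds docs/latex/refman.pdf)
```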

Windows Help files - what are the options?

Back in the old days, Help was not trivial but possible: generate some funky .rtf file with special tags, run it through a compiler, and you got a WinHelp file (.hlp) that actually worked really well.
Then Microsoft decided that WinHelp was not hip and cool anymore and switched to CHM, to the point that they actually axed WinHelp from Vista.
Now, CHM may be nice, but anyone who has tried to open a .chm file over the network will know the nice "Navigation to the webpage was canceled" screen caused by security restrictions.
While there are ways to make CHM work off the network, this is hardly a good choice, because when users press the Help button, they want help, not to have to change some funky settings.
Bottom line: I find CHM absolutely unusable. But with WinHelp no longer an option either, I wonder what the alternatives are, especially when it comes to integrating with my application (e.g. for WinHelp and CHM there are functions that let you jump directly to a topic)?
PDF has the disadvantage of requiring the Adobe Reader (or one of the more lightweight readers that not many people use). I could live with that, seeing as it is more or less standard nowadays, but can you reliably tell it to jump to a given page/anchor?
HTML files seem to be the best choice; you then just have to deal with different browsers (CSS and so on).
Edit: I am looking to create my own help files. As I am a fan of the "No Setup, Just Extract and Run" philosophy, I have had this problem many times in the past, because many of my users run my applications off the network, which causes exactly this problem.
So I am looking for a more robust and future-proof way to provide help to my users without having to code a different help system for each application I make.
CHM is a really nice format, but the security restrictions make it unusable: a help system is supposed to help the user, not create even more problems.
HTML would be the next best choice, but ONLY IF you serve the files from a public web server. If you tried to bundle them with your app, all the files (and images (and stylesheets (and ...))) would make CHM look like a gift from the gods.
That said, when actually bundled in the installation package (instead of being served over the network), I found the CHM files to work nicely.
OTOH, another pitfall with CHM files: even when you open a CHM file on a local disk, you may bump into the security block if you originally downloaded it from somewhere, because the file may have been marked as "came from an external source" when it was obtained.
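On NTFS, that "came from an external source" marking is typically stored in an alternate data stream named `Zone.Identifier` attached to the downloaded file; it holds a small INI-style `[ZoneTransfer]` section with a `ZoneId`, where 3 means the Internet zone. A hedged Python sketch that reports a file's zone; reading the stream through the `file:stream` path syntax only works on Windows/NTFS, and the helper names are hypothetical. The "Unblock" button in the file's Properties dialog clears the mark for that file.

```python
import configparser
import sys

ZONES = {0: "Local machine", 1: "Local intranet", 2: "Trusted sites",
         3: "Internet", 4: "Restricted sites"}


def parse_zone_identifier(text):
    """Return the ZoneId stored in a Zone.Identifier stream's text, or None."""
    parser = configparser.ConfigParser(interpolation=None)  # URLs may contain '%'
    parser.read_string(text)
    if parser.has_option("ZoneTransfer", "ZoneId"):
        return parser.getint("ZoneTransfer", "ZoneId")
    return None


def file_zone(path):
    """Read the file's Zone.Identifier stream (Windows/NTFS only); None if unmarked."""
    try:
        with open(path + ":Zone.Identifier", encoding="utf-8-sig") as stream:
            return parse_zone_identifier(stream.read())
    except (OSError, configparser.Error):
        return None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "->", ZONES.get(file_zone(name), "not marked"))
```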
I don't like the HTML option, and we actually moved from plain HTML to CHM by compressing and indexing the files. We even use them for a handful of non-Windows customers.
It simply solved the constant small breakages: people putting the files on the network (limited nesting depth, strange locking effects), antivirus software that died in directories with 30,000 HTML files, 20-minute decompression times when installing on older systems, browser security zones and features, miscalculations of the required space in the installer, etc.
And that doesn't even include the people who start "correcting" the files, third-party products with faulty "integration" attempts, complaints about slowness (browser start-up), etc.
We had all waited years for the problems to go away as OSes and hardware improved, but they kept recurring in a bewildering variety of forms, and enough was enough. We found chmlib, decided that something based on it, with a simple external reader, could always serve as an escape hatch if the OS-provided readers stopped working, and switched.
Meanwhile we also have our own compiler, so we are future-proof without depending on Microsoft. That doesn't mean we will never change (solutions with local web servers seem to be the favourite nowadays), but at least we have a choice.
Our software is both distributed locally to the clients and served from a network share. We opted for generating both a CHM file and a set of HTML files for serving from the network. Users starting the program locally use the CHM file, and users who get their program served from a network share have to use the HTML files.
We use Help and Manual and can thus easily produce both types of output from the same source project. The HTML files also include search capabilities and don't require a web server, so although it isn't an optimal solution, it works fine.
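The local-versus-network split can be decided at run time. A minimal Python sketch, assuming the help ships next to the executable as `help.chm` plus an `html/` folder (both names hypothetical): it treats a plain UNC launch path (`\\server\share\...`) as a network start and returns either an `hh.exe` command that opens a CHM topic using the `file.chm::/topic.htm` syntax, or the path of the equivalent HTML page to hand to the default browser. Mapped drive letters are not detected by this check.

```python
import ntpath


def is_network_path(path):
    r"""True for UNC paths such as \\server\share\app\app.exe."""
    drive, _ = ntpath.splitdrive(path)
    return drive.startswith("\\\\")


def help_target(exe_path, topic="index.htm"):
    """Pick CHM for local starts and plain HTML for starts from a network share."""
    app_dir = ntpath.dirname(exe_path)
    if is_network_path(exe_path):
        # HTML files on the share avoid the CHM network security block.
        return ("browser", ntpath.join(app_dir, "html", topic))
    chm = ntpath.join(app_dir, "help.chm")
    return ("command", ["hh.exe", f"{chm}::/{topic}"])
```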
So far, all the single-file formats for Windows seem broken in one way or another:
WinHelp - obsoleted
HtmlHelp (CHM) - obsoleted on Vista, doesn't work from a network share, but other than that works really nicely
Microsoft Help 2 (HXS) - this seems to work right up until it doesn't (corrupted indexes or similar); it is used by Visual Studio 2005 and above, for example
If you don't want to use an installer and you don't want the user to perform any extra steps to allow CHM files over the network, why not fall back to WinHelp? Vista does not include WinHlp32.exe out of the box, but it is freely available as a download for both Vista and Server 2008.
It depends on how important the online documentation is to your product. A good documentation infrastructure can be complex to establish, but once done it pays off. Here is how we do it:
Help source: DITA-compliant XML, stored in SCC (ClearCase).
Help editing: XMetaL.
Help compilation: customized DITA Open Toolkit, with custom Perl/Java preprocessing.
Help source cross-references application resources (.RC files, etc.) at compile time.
Help deliverables from a single source: PDF, CHM, Eclipse Help, HTML.
A single-source repository produces help for 10+ products, with thousands of shared topics.
From what you describe, I would look at Eclipse Help. It's not simple to integrate into .NET or MFC applications: you basically have to do the help mapping yourself to resolve the request to a URL, then fire that URL at an Eclipse Help wrapper or a browser.
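The help mapping described here can stay small. A hedged Python sketch, assuming context IDs come from a `resource.h`-style header of `#define NAME number` lines (in the spirit of cross-referencing the .RC resources) and that topics live under some help server URL. The base URL, the `topics/<name>.html` layout, and the `IDD_...` names are hypothetical, not Eclipse Help's actual URL scheme. The resolved URL is handed to the default browser with `webbrowser.open`.

```python
import re
import webbrowser

# Matches lines like "#define IDD_OPTIONS 101" (numeric IDs only).
DEFINE = re.compile(r"^[ \t]*#define[ \t]+(\w+)[ \t]+(\d+)\b", re.MULTILINE)


def parse_resource_ids(header_text):
    """Map numeric resource/context IDs to their symbolic names."""
    return {int(number): name for name, number in DEFINE.findall(header_text)}


def topic_url(context_id, ids, base_url):
    """Resolve a context ID to a topic URL, falling back to the help start page."""
    name = ids.get(context_id)
    if name is None:
        return f"{base_url}/index.html"
    return f"{base_url}/topics/{name.lower()}.html"


def show_help(context_id, ids, base_url="http://localhost:8080/help"):
    """Open the topic for context_id in the default browser."""
    webbrowser.open(topic_url(context_id, ids, base_url))
```

Keeping the ID-to-URL resolution as a pure function means the same table can be checked at build time against the topics the help compiler actually produced.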
Is the question how to generate your own help files, or what is the best help file format?
Personally, I find CHM to be excellent. One of the first things I do when setting up a machine is to download the PHP Manual in CHM format (http://www.php.net/download-docs.php) and add a hotkey to it in Crimson Editor. So when I press F1 it loads the CHM and performs a search for the word my cursor is on (great for quick function reference).
If you are doing "just extract and run", you are going to run into security issues. This is especially true if your users are running Vista (or later). Is there a reason why you want to avoid packaging your applications in an installer? Using an installer would alleviate the "external source" problem, and you would be able to use .chm files without any problems.
We use InstallAware to create our install packages. It's not cheap, but it is very good. If cost is your concern, WiX is open source and pretty robust. WiX does have a learning curve, but it's easy to work with.
PDF has the disadvantage of requiring the Adobe Reader
I use Foxit Reader on Windows at home and at work. A lot smaller and very quick to open. Very handy when you are wondering what exactly a80000326.pdf is and why it is clogging up your documents folder.
I think the solution we're going to end up going with for our application is hosting the help files ourselves. This gives us immediate access to the files and the ability to keep them up to date.
What I plan is to have the content loaded into a huge series of XML files, each one containing help for a specific item. This XML would contain links to other XML files. We would use XSLT to display the contents as necessary.
Depending on the licensing, we may build a client-specific XSLT file in order to tailor the look and feel to what each client needs. We may also need to show help only for particular versions of our product, and that can be done by filtering content out in the XSLT.
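A rough sketch of the version filtering, using Python's standard `xml.etree.ElementTree` as a stand-in for the XSLT step (the standard library has no XSLT processor): elements carrying a hypothetical space-separated `versions` attribute are dropped unless they list the product version being shown, while unmarked elements are always kept. The element and attribute names are illustrative only.

```python
import xml.etree.ElementTree as ET


def filter_for_version(topic_xml, version):
    """Drop elements whose space-separated 'versions' attribute lacks version."""
    root = ET.fromstring(topic_xml)
    # Collect (parent, child) pairs first: removing while iterating would skip nodes.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if "versions" in child.attrib
              and version not in child.attrib["versions"].split()]
    for parent, child in doomed:
        parent.remove(child)  # note: this also drops any text in child.tail
    return ET.tostring(root, encoding="unicode")
```

For example, `filter_for_version(xml_text, "2.0")` keeps every unmarked element plus those whose `versions` list includes `2.0`.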
I use a commercial package called AuthorIT that can generate a number of different formats, such as CHM, HTML, PDF, Word, Windows Help, XML, XHTML, and some others I have never heard of (does DITA ring a bell?).
It is a content management system oriented towards the needs of technical documentation writers.
The advantage is that you can use and re-use the same content to build a set of guides, and then generate them in different formats.
So the bottom line, regarding the choice of CHM, HTML, or whatever, is that if you use this tool you are not locked into a given format: you can provide several for the user to choose from, and you can even add more formats as you go along at no extra cost.
If you just have one guide to create it won't be worth your while, but if you have a documentation set to manage then it is the best to my knowledge. Their support is very helpful also.
