macOS: How to access the Live Text OCR functionality from AppleScript/JXA?

As of macOS Monterey it is possible to select text in images in Preview.
Is this OCR functionality available from AppleScript and/or JXA (JavaScript for Automation)?
In Script Editor.app, I chose File > Open Dictionary..., selected Preview.app, and looked at the Standard Suite and Text Suite, but there doesn't seem to be anything related to OCR. (The Text Suite apparently deals with drawing text on a picture, not with text extraction.)
I have also searched for text recognition actions in Automator.app but didn't see anything suitable.

There isn’t (yet?) a way to directly access OCR from AppleScript, but the workaround I use is OwlOCR. This is an app that can capture text from the screen and output it to PDF, or place the text on the clipboard. Crucially for our purposes, it can also be controlled from the command line, and you can wrap those shell commands in an AppleScript “do shell script” command.

Related

Read current terminal

Is it possible to read what's currently displayed in the Windows terminal programmatically using any available API?
For example, I've got an app that tails some log files. I'd like to be able to hit a key and open a text editor at the line that is currently being viewed. The problem is that the terminal also has scroll bars.
Not easy. Perhaps you could capture the screen and use OCR to identify its contents, or make a shortcut to some sort of macro that selects all the screen and copies the text. But there is no API available to perform the task you ask.
Of course, you can tee the output of the command you're running in the console to a file and open that file with an editor whenever you like; however, it will show the full output of the command, not just the visible part. If you'd like more information on that topic, it is answered in SO - Displaying Windows command prompt output and redirecting it to a file.
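The tee approach mentioned above can be sketched in Python as follows. This is only an illustration, not part of the original answer: the function name `tee_command` and the log file name in the usage comment are hypothetical, and the sketch captures the full output of the command, not just the part visible in the terminal.

```python
import subprocess
import sys


def tee_command(cmd, log_path):
    """Run cmd, echoing each output line to stdout and appending it to log_path.

    Returns the exit code of the command.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current so an editor can open it mid-run
        return proc.wait()


# Hypothetical usage: tee_command(["ping", "example.com"], "ping.log")
```

Flushing after every line means the log file can be opened in an editor while the command is still running.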

How to access Print dialog's 'Open PDF in Preview' in os x programmatically

I am building a Delphi application that opens an image and its metadata and prints it. For the Windows version I built a form to generate the print preview, but on the Mac I can use the Print dialog's 'Open PDF in Preview' instead. When I click on it, a PDF file is generated and displayed, which works fine. The problem is that I want to trigger this option directly from a button, so that clicking the button opens the PDF in Preview without the user having to open the Print dialog, click 'PDF', and then select 'Open PDF in Preview'. How can I do this?
I read about using Automator, AppleScript, etc., but I still can't find a way.
Is there a path where this generated PDF preview is stored, so that I could open it from there?
TIA
Possible duplicate of Using Automator or Applescript or both to recursively print documents to PDF, but I'll answer anyway.
To answer your question directly, see the question I linked to. Basically, you need to use System Events from AppleScript to do exactly that.
However, there's a quicker solution using /usr/sbin/cupsfilter. Check the man page for more.
You can call cupsfilter <an-image-file> and you'll get a PDF on stdout, courtesy of OSX's printing daemon. It looks quite configurable but I just learned about it a while ago.
If you want the PDF to open for the user, you can save it somewhere convenient, or do it in one shot with cupsfilter <your-image> | open -f -a "Preview", which opens the PDF right away.
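For an application that wants to trigger the same pipeline from a button handler, the two commands can be chained without a shell. The following Python sketch is only an illustration (the asker's application is written in Delphi). The function names are hypothetical, and it assumes macOS with cupsfilter at /usr/sbin/cupsfilter, as stated in the answer.

```python
import subprocess


def build_preview_pipeline(image_path):
    """Return the two argument lists for: cupsfilter <image> | open -f -a Preview."""
    convert = ["/usr/sbin/cupsfilter", image_path]
    view = ["open", "-f", "-a", "Preview"]
    return convert, view


def open_in_preview(image_path):
    """Convert image_path to PDF and pipe the result into Preview (macOS only)."""
    convert, view = build_preview_pipeline(image_path)
    producer = subprocess.Popen(convert, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(view, stdin=producer.stdout)
    producer.stdout.close()  # let cupsfilter see a closed pipe if open exits early
    consumer.wait()
    return producer.wait()  # exit code of cupsfilter
```

Passing argument lists instead of a single shell string avoids quoting problems when the image path contains spaces.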

programmatically take screenshot, crop section, and run OCR tools. quick solutions?

I will be writing code that takes a screenshot, crops it to a small, predefined section of the screen, extracts the text from the cropped image (via OCR tools), and saves the resulting text to a file. I was wondering if there is software (preferably for Windows) that can do this, or at least parts of it. I am already looking into Tesseract as an OCR tool. Does anyone know of software that can take the screenshot and possibly crop a predefined region of the image?
Thanks,
-Jason
I use Greenshot, which is a very awesome tool for screenshots and according to the FAQ it supports OCR (using MODI = Microsoft Office Document Imaging) as well. However, I never got it working on my Windows machine and used Tesseract instead (for Linux, with some scripting experience, this method should be possible as well):
Download Tesseract here for Ubuntu/Debian/Windows and install it.
Download and install Greenshot
Create a new Windows batch script called "Greenshot_Tesseract_OCR.bat" using a text editor like Notepad or Notepad++, and save it at a location of your choice, e.g. "C:\Users\MyUser\Scripts\Greenshot_Tesseract_OCR.bat", with the following content (adjust the path to match where Tesseract is installed):
@ECHO OFF
REM %~1 is the screenshot path passed by Greenshot, with surrounding quotes stripped
set "arg1=%~1"
REM Tesseract writes the recognized text to "<image path>.txt"
"C:\Program Files\Tesseract-OCR\tesseract.exe" "%arg1%" "%arg1%"
REM Copy the recognized text to the clipboard
type "%arg1%.txt" | clip
Right-click the Greenshot icon in the toolbar and click "configure external command"
Add a new command with a name like "Tesseract OCR to Clipboard", select the batch script you just created as a command and as argument, use the default "{0}". Then click OK twice.
You should now be able to copy the text of a screenshot into your clipboard, with a shortcut ("Print" key in my case) and 1-2 mouse clicks (depending on your Greenshot settings)!
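For the original goal of saving the recognized text from code, the same Tesseract call can be made from a script. The following Python sketch mirrors the batch script above: the function names are hypothetical, and the executable path is an assumption taken from the batch script's default install location. It does not take or crop the screenshot; it only runs OCR on an existing image file.

```python
import subprocess
from pathlib import Path

# Assumption: default Windows install location, as used in the batch script.
TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def tesseract_command(image_path, exe=TESSERACT):
    """Build the tesseract call; the second path is the output base name."""
    return [exe, str(image_path), str(image_path)]


def ocr_to_text(image_path, exe=TESSERACT):
    """Run tesseract on image_path and return the text written to '<image_path>.txt'."""
    subprocess.run(tesseract_command(image_path, exe), check=True)
    return Path(str(image_path) + ".txt").read_text(encoding="utf-8")
```

As in the batch script, Tesseract appends ".txt" to the output base name, so the text for "shot.png" ends up in "shot.png.txt".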
You can try the following open-source programs:
Greenshot for screenshots and VietOCR (a GUI frontend for Tesseract) for OCR on screenshots.

OSX Application or Web App for converting text to plain text (unicode)

I am looking for ways to quickly convert blocks of text created in Word, etc. into plain text (i.e. turning right and left quotation marks into "plain text" quotation marks) so that I can transfer content to code with as few headaches as possible.
I came across this:
http://www.softpedia.com/get/Office-tools/Other-Office-Tools/Keith-Fenske-Plain-Text.shtml
...but it is Windows only and I prefer to dev on a Mac. Does anyone have a suggestion for an OSX tool or better yet a web app?
If you're using Snow Leopard, it's easy to create a Service to clean text. Run /Applications/Automator, choose the Service template, set it to receive text in any application, and enable replacing the selected text. Add a Run Shell Script action to the workflow, with Pass Input set to stdin. For the actual script, paste this in place of the template (cat):
LC_CTYPE=en_US.UTF-8 tr '‘’‛❛❜“”‟❝❞‐–—­‒‑' "['*5]"'["*5][-*6]'
(note: hopefully all the various funny characters I included in the first string will pass through our various web interfaces intact... if not, edit the collections of quote marks to include whatever you need to squash in the first string, and matching numbers of their plain-text equivalents in the second string. And feel free to add other replacements as needed.)
Anyway, save this Service with some reasonable name, and then to invoke it just select some text (in any Cocoa app -- not, unfortunately, MS Word), and select your service from the application menu -> Services submenu. Also, you can use the Keyboard preference pane to assign it a keyboard shortcut if you like.
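The same character mapping can be sketched in Python for use outside Automator. This is an illustration rather than part of the Service described above: the function name `to_plain_text` is hypothetical, and the character sets simply mirror the three groups in the tr command (five single quotes, five double quotes, six dashes and hyphens).

```python
# Typographic characters to squash, written as escapes so they survive copy and paste.
SINGLE_QUOTES = "\u2018\u2019\u201b\u275b\u275c"
DOUBLE_QUOTES = "\u201c\u201d\u201f\u275d\u275e"
# hyphen, en dash, em dash, soft hyphen, figure dash, non-breaking hyphen
DASHES = "\u2010\u2013\u2014\u00ad\u2012\u2011"

# One-to-one translation table: each fancy character maps to its ASCII equivalent.
TABLE = str.maketrans(
    SINGLE_QUOTES + DOUBLE_QUOTES + DASHES,
    "'" * len(SINGLE_QUOTES) + '"' * len(DOUBLE_QUOTES) + "-" * len(DASHES),
)


def to_plain_text(text):
    """Replace typographic quotes and dashes with plain ASCII characters."""
    return text.translate(TABLE)


print(to_plain_text("\u201cHi\u201d \u2014 it\u2019s"))  # prints: "Hi" - it's
```

Other replacements can be added by extending the source and target strings in step, as the note on the tr command suggests.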
TextWrangler from Bare Bones Software. This is the free little brother of BBEdit (which will also do what you want).
The "Plain Text" Java application will run on Linux, Mac OS, and Windows.

Writing a simple app to convert files to pdf

I want to create an application on a Mac to convert multiple files (txt, pdf, doc, html, etc) to a single pdf file that can be printed. The real point is that if you have 50 texts you don't have to open every single file and click command-p.
I'm not quite sure whether the best way to do this is by creating a full-fledged app or an automator plugin (or something else). If I remember correctly there's a filter in mac os's terminal that can convert files to pdf (but I forgot what it's called).
So would an automator plugin do this well, or shall I make an app for this? Can you provide me advantages for each answer?
I've done cocoa touch programming before so I can write objective-c quite well.
Use appscript, either as an action in an Automator workflow or standalone. The advantage is that it is very simple and will take a fraction of the time it would take to write an app.
Here is something very close to what you want. It sets up a drop folder, and each file dragged onto it is printed (you can use multiple-select to get what you want). It uses AppleWorks 6, which doesn't support the file types that you want.
To modify it to use the Preview application instead you need to change the tell command in the script and then google the dictionary for Preview to check which verb to use for printing.
