Speech Recognition on OS X

How can I use OS X's speech-to-text tools programmatically? OS X has offline "Enhanced Dictation", which essentially means that somewhere on my computer is all the data required to turn speech into text. I would like to invoke these capabilities from an executable.
I have seen some AppleScript files that essentially do this, but I can't get them to work.

NSSpeechRecognizer is an API that provides access to the older "Speakable Items" functionality that's been around since before OS X (now called "Dictation Commands", and requiring Enhanced Dictation).
This is just a command interface, though — that is, you provide a list of commands, and it tells you when the user has spoken one of them. There's no public API for full speech-to-text dictation.
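For reference, a minimal sketch in Swift of what that command-list interface looks like; the phrases and the class name are placeholders, and a running app with a live run loop is assumed:

    import AppKit

    // Minimal "Dictation Commands"-style listener: you register a fixed
    // list of phrases and are told when one of them is spoken.
    final class CommandListener: NSObject, NSSpeechRecognizerDelegate {
        private let recognizer = NSSpeechRecognizer()

        func start() {
            recognizer?.commands = ["open browser", "check mail"]  // placeholder phrases
            recognizer?.delegate = self
            recognizer?.listensInForegroundOnly = false
            recognizer?.startListening()
        }

        // Called only for phrases from the registered command list.
        func speechRecognizer(_ sender: NSSpeechRecognizer, didRecognizeCommand command: String) {
            print("Recognized command: \(command)")
        }
    }

Keep a strong reference to the listener for as long as you want it to keep listening; arbitrary dictation never arrives through this API, only the phrases you registered.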

Related

Are there APIs to enable/disable Bluetooth on Windows 8.1?

In Windows 8/8.1 it's now possible to enable/disable Bluetooth via the OS itself, using the toggle switch in the PC Settings app. This is awesome because it's device/driver-agnostic.
On Android, this is possible via BluetoothAdapter.enable() and BluetoothAdapter.disable(), but I haven't been able to find anything to do this on Windows (even though it seems like it's definitely possible).
So I've tried using:
BluetoothEnableIncomingConnections() - However, this only prevents new incoming connections. It doesn't disable existing ones.
devcon.exe - The problems with this method are that (a) it is a non-redistributable binary and (b) it requires that you know the device ID ahead of time (so it's not device/driver-agnostic). Also, while it's not a dealbreaker, it'd be nice to not require elevation.
UI Automation - Simply launching the PC Settings app and toggling the switch with keyboard events is easy, but it's super ugly, both in terms of proper coding practices and in terms of user experience. That being said, this is the only way I've found to achieve the behavior I'm looking for so far.
I'm writing a native Win32 app in C++, so I'm not constrained to any Windows Store app requirements, although, it would be great if there was an approach that didn't require elevation.
TL;DR
Are there any APIs, WMI interfaces, or anything else available to achieve functionally equivalent results to flipping the Bluetooth toggle switch? If not, are there any alternative methods which yield similar results?
In Windows 8.1 you should be able to call BluetoothEnableRadio to enable/disable the local radio.
Basically, the operating system now handles radio on/off itself, so manufacturers no longer have to ship a DLL for it and you don't have to load one:
"Beginning with Windows 8.1 vendors are no longer required to implement radio on/off capability (for Bluetooth 4.0 radios) in a software DLL as described in this topic, because the operating system now handles this functionality. Windows 8.1 will ignore any such DLL, even if present."
Check out this link, which talks about it:
http://msdn.microsoft.com/en-us/library/windows/hardware/hh450832(v=vs.85).aspx

Using Mac OS X Dictation with Speech API

In OS X Mavericks, speech dictation is now included, and it is very useful. I am trying to use the dictation capability to create my own digital life assistant, but I can't find out how to use the recognition functionality to get the recognized text into an application rather than into a text box.
I have looked into NSSpeechRecognizer, but that seems to be geared toward programming speakable commands with a pre-defined grammar rather than open-ended dictation. It doesn't matter what programming language I use, but Python or Java would be nice...
Thanks for your help!
You can use SFSpeechRecognizer (requires macOS 10.15+): this is made for speech recognition. From the documentation:
"Perform speech recognition on live or prerecorded audio, receive transcriptions, alternative interpretations, and confidence levels of the results."
Whereas, as you have noted in the question, NSSpeechRecognizer indeed provides a "command and control" style of voice recognition system (the command phrases must be defined prior to listening, in contrast to a dictation system where the recognized text is unconstrained).
For more detail, see the WWDC 2019 session: https://developer.apple.com/videos/play/wwdc2019/256/
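A minimal sketch in Swift of transcribing a prerecorded file with SFSpeechRecognizer; the file path is a placeholder, and the app needs an NSSpeechRecognitionUsageDescription entry in Info.plist plus user authorization:

    import Speech

    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else {
            print("Speech recognition not authorized")
            return
        }
        guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
              recognizer.isAvailable else {
            print("Recognizer unavailable")
            return
        }
        // Placeholder path; any supported audio file works.
        let request = SFSpeechURLRecognitionRequest(url: URL(fileURLWithPath: "/path/to/recording.m4a"))
        // Keep the audio on the machine when on-device recognition is supported.
        request.requiresOnDeviceRecognition = true
        _ = recognizer.recognitionTask(with: request) { result, error in
            if let result = result, result.isFinal {
                print(result.bestTranscription.formattedString)
            } else if let error = error {
                print("Recognition error: \(error)")
            }
        }
    }
    // In a command-line tool, keep the main run loop alive until the callbacks fire,
    // e.g. with RunLoop.main.run().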
Another way is to use Mac Dictation directly, but as far as I know the only way to do that is to redirect audio feeds, which isn't very neat; see for example http://www.showcasemarketing.com/ideablog/transcribe-mp3-audio-to-text-mac-os/

Automating Excel in Python on a Mac

I'm automating Excel with Python through the COM interface, with pywin32. I'd like to port my script to the Mac. Is there any chance of this happening? I realize that I can't use COM on a Mac, and the xlutils Python modules won't work (since I need to copy graphs, etc.). Is there anything else I can use?
"Unfortunately development of appscript has been stopped"
I'm not aware that development of AppleScript has been stopped. What is your source for that?
As far as I can tell, AppleScript is still maintained, works well, and has a future.
Also, there is a distinction between "AppleScript", the scripting language, and the inter-application communication (IAC) technology called "Apple Events".
You can find the AppleScript Editor.app at the following location on your Mac:
/Applications/Utilities/AppleScript Editor.app
Open up the AppleScript Editor.app and then choose from the File menu: "Open Dictionary..."
Then select "Excel.app"
This will open the Apple Events dictionary for Excel. You will be able to see the nouns, verbs, and properties that Excel supports via Apple Events.
AppleScript can be used to send commands to Excel via Apple Events, but so can other scripting languages such as Python.
If I were you, I would consider making an abstract Excel class and two concrete subclasses: one that uses the COM code on Windows, and another that uses the Apple Events code that the Mac version of Excel might need.
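To make the Apple Events route concrete, here is a sketch that sends such a command from compiled code via NSAppleScript in Swift; it assumes Microsoft Excel is installed with a workbook open, and the terms used ("cell", "active sheet", "value") are the ones you would confirm in Excel's dictionary as described above. From Python, the appscript module mentioned below sends the same underlying events.

    import Foundation

    // Drive Excel over Apple Events by compiling a small AppleScript.
    // Assumes Microsoft Excel is running with a workbook open; the cell
    // reference and the value are placeholders.
    let source = """
    tell application "Microsoft Excel"
        set value of cell "A1" of active sheet to "Hello from Apple Events"
        return value of cell "A1" of active sheet
    end tell
    """

    var errorInfo: NSDictionary?
    if let script = NSAppleScript(source: source) {
        // On macOS 10.14+ the first run triggers an Automation consent prompt.
        let result = script.executeAndReturnError(&errorInfo)
        if let errorInfo = errorInfo {
            print("Apple Event failed: \(errorInfo)")
        } else {
            print("Excel returned: \(result.stringValue ?? "<no value>")")
        }
    }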
See the Wikipedia article on Apple Events:
http://en.wikipedia.org/wiki/Apple_events
It's really Apple Events, not AppleScript, that you may want to use, but you could look at the following article too.
See also the Wikipedia article on AppleScript:
http://en.wikipedia.org/wiki/AppleScript
--- edit ---
I believe that the misconception that Apple is lessening its support of AppleScript comes from the fact that AppleScript Studio was discontinued, but it was replaced with something just as good or better: you can now use AppleScript to develop full-fledged Mac applications in Xcode. In addition, AppleScript can still be used in Automator workflows, in Xcode to build Automator actions, and in the AppleScript Editor, all of which are delivered by Apple in the latest version of OS X (Mountain Lion 10.8.4) and in the latest version of Xcode (4.6.2).
So AppleScript is still a viable option, though the OP was asking about a Python solution. Apple Events are available from Python.
Currently the only possible solution would be appscript, a module that lets you do AppleScript-style application scripting from Python (and Ruby). Unfortunately development of appscript has been stopped, but for now it keeps working well enough.

Is there a scripting solution for determining the default application path for a file on the Mac?

For a given extension, for example ".psd", I'd like to be able to determine the default application path for opening this file, for example "/Applications/Adobe Photoshop CS4.app".
I've looked into the Launch Services API, and there are clearly programmatic ways to get this information. Unfortunately for my particular scenario, only a scripting solution (AppleScript or shell script) will do.
I've also looked at "lsregister -dump". It seems to be unwise to rely on parsing this information, since there are no guarantees as to the stability of the output format.
In the past I've solved this problem with creator codes, but since Apple seems to be phasing them out as of Snow Leopard, I'm trying to eliminate my dependence on them.
thanks
Launch Services is the one and only place to get that information. You can write a scripting addition that will expose its functionality to AppleScript, but then you have to install that on whatever machine you plan to run on.
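For reference, the Launch Services call that such a scripting addition (or a tiny helper binary invoked from a shell script) would wrap looks roughly like this in Swift on a current system; the function shown requires OS X 10.10 or later, and the .psd path is only an example:

    import CoreServices
    import Foundation

    // Ask Launch Services which application would open a given file.
    // The path is an example; any existing file with the extension works.
    let fileURL = URL(fileURLWithPath: "/Users/me/Desktop/example.psd")

    if let appURL = LSCopyDefaultApplicationURLForURL(fileURL as CFURL, .all, nil)?.takeRetainedValue() {
        print("Default application: \((appURL as URL).path)")
    } else {
        print("No default application registered for this file")
    }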
System Events does give you this in Leopard
[Screenshot: http://img.skitch.com/20091222-eessetxeqbai2mnwduygtm1cd5.png]

Change Sound (or other) System Preferences in Mac OS X

I'd like to be able to switch the sound output source in Mac OS X without any GUI interaction.
There are tools to control the sound output, such as SoundSource, and an AppleScript to open the preferences dialog.
What I am looking for is something that switches the preference instantly, like SoundSource, but it has to be scriptable. The goal is to switch between my digital and analog output with one keystroke. I have a helper application that will launch a program or AppleScript on one keypress. All I need now is the AppleScript or application that switches the sound source quickly without any user interaction.
I'm willing to write some Objective-C if that is what it takes, but I'm pretty much a newbie at Cocoa development.
Do you have a one-click solution or can point me to a good tutorial on controlling sound system preferences from a Cocoa App or command line?
I created a command-line application to do exactly this.
You may download it at http://code.google.com/p/switchaudio-osx/downloads. Source code is available on the project site as well.
UPDATE (Dec. 2014): the code is now hosted on GitHub at https://github.com/deweller/switchaudio-osx, and it works just fine in Yosemite.
Don’t think of it in terms of preferences; there’s no centralized system preference framework for this sort of thing. I believe what you need to do is use Core Audio to set the kAudioHardwarePropertyDefaultOutputDevice and kAudioHardwarePropertyDefaultSystemOutputDevice properties of the AudioSystemObject (using AudioHardwareSetProperty()).
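A sketch of that approach in Swift, using AudioObjectSetPropertyData (the modern replacement for the deprecated AudioHardwareSetProperty); picking which device ID is the analog or digital output is left to the caller, and kAudioHardwarePropertyDefaultSystemOutputDevice (system alert sounds) can be set the same way:

    import CoreAudio
    import Foundation

    // Make the given device the default sound output.
    // On SDKs older than macOS 12, use kAudioObjectPropertyElementMaster
    // instead of kAudioObjectPropertyElementMain.
    func setDefaultOutputDevice(_ deviceID: AudioDeviceID) -> OSStatus {
        var device = deviceID
        var address = AudioObjectPropertyAddress(
            mSelector: kAudioHardwarePropertyDefaultOutputDevice,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)
        return AudioObjectSetPropertyData(
            AudioObjectID(kAudioObjectSystemObject),
            &address, 0, nil,
            UInt32(MemoryLayout<AudioDeviceID>.size),
            &device)
    }

    // Enumerate all audio device IDs so the caller can find the one to switch to.
    func allAudioDeviceIDs() -> [AudioDeviceID] {
        var address = AudioObjectPropertyAddress(
            mSelector: kAudioHardwarePropertyDevices,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)
        var size: UInt32 = 0
        AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject),
                                       &address, 0, nil, &size)
        var ids = [AudioDeviceID](repeating: 0,
                                  count: Int(size) / MemoryLayout<AudioDeviceID>.size)
        AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject),
                                   &address, 0, nil, &size, &ids)
        return ids
    }

Matching an ID to a human-readable device name means reading each device's kAudioObjectPropertyName (a CFString) in the same fashion.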
