OS X Mavericks now includes speech dictation, and it is very useful. I am trying to use the dictation capability to build my own digital life assistant, but I can't find a way to use the recognition functionality to get the recognized speech into my application rather than into a text box.
I have looked into NSSpeechRecognizer, but that seems to be geared toward programming speakable commands with a pre-defined grammar rather than dictation. It doesn't matter what programming language I use, but Python or Java would be nice...
Thanks for your help!
You can use SFSpeechRecognizer (mirror) (requires macOS 10.15+): it is made for speech recognition. From the documentation:
Perform speech recognition on live or prerecorded audio, receive transcriptions, alternative interpretations, and confidence levels of the results.
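For example, here is a minimal Swift sketch of transcribing a prerecorded file with SFSpeechRecognizer; the file path "recording.wav" and the en-US locale are placeholder assumptions, not details from the question:

import Speech

// Ask for authorization first (the app's Info.plist needs an
// NSSpeechRecognitionUsageDescription entry).
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }

    let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    // "recording.wav" is a placeholder path for this sketch.
    let request = SFSpeechURLRecognitionRequest(url: URL(fileURLWithPath: "recording.wav"))
    // Keep recognition on-device where supported, like offline dictation.
    request.requiresOnDeviceRecognition = true

    recognizer?.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}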
As you have noted in the question, NSSpeechRecognizer (mirror) indeed provides a “command and control” style of voice recognition system (the command phrases must be defined prior to listening, in contrast to a dictation system where the recognized text is unconstrained).
See also the WWDC 2019 session “Advances in Speech Recognition”: https://developer.apple.com/videos/play/wwdc2019/256/ (mirror).
Another way is to use Mac Dictation directly, but as far as I know the only way is to redirect audio feeds, which isn't very neat; e.g. see http://www.showcasemarketing.com/ideablog/transcribe-mp3-audio-to-text-mac-os/ (mirror).
Related
I am having trouble finding the answer to this question on the web.
The project I am developing requires saving a recorded audio file and then transcribing the audio to text in order to find predefined keywords of interest.
I am using the Windows.Media.SpeechRecognition framework, and it works fine when transcribing speech during the recording process. What I can't find in the same framework is a function that takes an audio file as input.
Does anybody know a good approach to this problem? Or another [free] framework for Windows apps?
For online recognition, and in particular in JS projects, you can directly use Microsoft Cognitive Services, which are behind the online recognition in SpeechRecognition on Windows. It is free under some limits.
In particular, there is an open-source wrapper for JavaScript on GitHub: Oxford.Speech.JS. It can deal with both WAV files and the microphone. The sample code is designed like a website, but I'm pretty sure you can easily convert it into an HTML/JS-based UWP app.
How can I use OS X's speech-to-text tools programmatically? OS X has offline "Enhanced Dictation", which essentially means that somewhere on my computer is all the data required to turn audio into text. I would like to invoke these capabilities from an executable.
I have seen some AppleScript files that essentially do this, but I can't get them to work on OS X.
NSSpeechRecognizer is an API that provides access to the older "Speakable Items" functionality that's been around since before OS X (now called "Dictation Commands", and requiring Enhanced Dictation).
This is just a command interface, though — that is, you provide a list of commands, and it tells you when the user has spoken one of them. There's no public API for full speech-to-text dictation.
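To illustrate the command-style interface, here is a minimal Swift sketch; the command phrases are invented for the example, and an instance of the class must be kept alive while the app's run loop is running:

import Cocoa

class CommandListener: NSObject, NSSpeechRecognizerDelegate {
    let recognizer = NSSpeechRecognizer()

    override init() {
        super.init()
        recognizer?.delegate = self
        // The recognizer only ever reports one of these predefined phrases.
        recognizer?.commands = ["open mail", "play music", "what time is it"]
        recognizer?.startListening()
    }

    func speechRecognizer(_ sender: NSSpeechRecognizer,
                          didRecognizeCommand command: String) {
        print("Heard command: \(command)")
    }
}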
I am creating a native OS X application, and I was surprised at how difficult it is to find documentation on text-to-speech with native APIs. What would be the easiest way of having my application speak (using Alex's voice for example)?
Thanks!
What you call “text-to-speech” is also commonly abbreviated as TTS and alternatively called “speech synthesis”.
The Cocoa class NSSpeechSynthesizer is the API to use. The canonical sample code is CocoaSpeechSynthesisExample.
There are also a “Speech Programming Topics” guide and a “Speech Synthesis Programming Guide” available.
Finally, there are lower level APIs available if you need access to stuff that is abstracted away for you by NSSpeechSynthesizer.
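As a quick start, here is a minimal Swift sketch using the built-in Alex voice; the spoken phrase and the five-second wait are arbitrary choices for the example:

import AppKit

// Speak one phrase with the built-in Alex voice.
let alex = NSSpeechSynthesizer.VoiceName("com.apple.speech.synthesis.voice.Alex")
if let synthesizer = NSSpeechSynthesizer(voice: alex) {
    synthesizer.startSpeaking("Hello, I am Alex.")
    // In a command-line tool, keep the process alive while speech plays.
    RunLoop.current.run(until: Date(timeIntervalSinceNow: 5))
}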
Please look at this NSSpeechSynthesizer example.
It's the built-in text-to-speech library for OS X: NSSpeechSynthesizer.
Can I make Mac OS X "ping" when it recognizes the Speech Recognition keyword?
It pings when it recognizes a phrase, but that's a little different.
My speech recognition is working fine without a keyword, but fails when I use a keyword, even if it's a short keyword like "Bob" or "Hal".
If I can at least know when it's accepted the keyword, it would be helpful.
Have other people tried to use Speech Recognition on your machine and failed? The recognizer can, sometimes, have issues with certain accents. Use the OS X text-to-speech system to read out what you want to say (use Alex or one of the other normal-speech voices [i.e. not Zarvox]) and try to match its pronunciation exactly.
My primary language is Spanish, but I use all my software in English, including Windows; however, I'd like to use speech recognition in Spanish.
Do you know if there's a way to use Vista's speech recognition in a language other than the primary OS language?
A quote from the Vista speech recognition blog:
In Windows Vista, Windows Speech Recognition works in the current language of the OS. That means that in order to use another language for speech recognition, you have to have the appropriate language pack installed. Language packs are available as free downloads through Windows Update for the Ultimate and Enterprise versions of Vista. Once you have the language installed, you’ll need to change the display language of the OS to the language you want to use. Both of these are options on the “Regional and Language Options” control panel. You can look in help for “Install a display language” or “Change the display language”.
To complete aku's answer, here are different methods for multilingual use in Vista:
Installing a language pack
Switching to a different language (and back)
Creating computer users. Create a user for each language and change the display language for that user to the language of your preference. A new Speech profile will be automatically created for that user. Switch between your languages by the normal procedure of “switching to another user” (Log off → Switch users).
Note: You can create a speech recognition profile for each user with any name you prefer. Change the name, or create a new user, in the Advanced Speech panel.
COMMENTS:
The advantage of the Separate Users method is that you can switch back and forth without changing any computer defaults.
The disadvantages are that it takes more disk space, that more attention must be given to user management, and that you may not have access to files opened or saved by your other users unless you know how to give yourself such access via the new permission dialogs of Windows Vista.
You should look at System.Speech.Recognition.SpeechRecognitionEngine - it's an 'in-proc' recognizer that will let you specify the language you want.
Your next problem is that en-US Vista doesn't ship with the Spanish recognition engine. For that, you'll need the Spanish Language Pack. Once you install that, you should be able to instantiate a Spanish recognition engine like this:
using System.Speech.Recognition;
using System.Globalization;

SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("es-ES"));
At that point, you can install grammars & do recognitions, etc.
Sure, but I want to do it without changing the display language... no way then?
No, not officially, if you believe this KB article: The Windows Speech Recognition language must be the same as the operating system language in Windows Vista.
So you could try to change it automatically; there are some scripts on the internet, which I found via Yahoo by searching for Windows Speech Recognition "change language".
This one looks interesting, but I have not tested it. I don't know if it's malware or whatever, so be careful:
Vistalizator
Good luck!
You can install the language pack without applying it to your user. Then you might be able to change the language of the speech recognition, although I haven't tried it since I don't have Vista Ultimate.
It worked fine for me after changing the language support.