MS SAPI SDK equivalent on OS X

I'm looking for an SDK that would let me add speech recognition to an OS X application.
I already have working code for Windows that uses SAPI to get speech recognition results from an audio file, and I would like to know how to do the same on OS X, since SAPI is not available there.
Thanks!

The OS X equivalent is the Speech Recognition service:
http://developer.apple.com/library/mac/#documentation/cocoa/conceptual/speech/Articles/RecognizeSpeech.html#//apple_ref/doc/uid/20002081-BCIHEBFH
Note that the Speech Recognition service is limited to listening for predefined command phrases. It cannot be used for dictation.
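
To illustrate what "limited to predefined command phrases" means in practice, here is a toy sketch in Python. It does not use Apple's API and works on text rather than audio; the command list, the function name `recognize_command`, and the similarity cutoff are all assumptions made up for this example. The point is only that anything that does not resemble a known phrase is rejected, which is why this style of recognizer cannot serve for dictation.

```python
import difflib

# Hypothetical command phrases; a command-style recognizer is given
# its phrases up front, before it starts listening.
COMMANDS = ["open file", "close window", "play music"]


def recognize_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the predefined command closest to `heard`, or None.

    `heard` stands in for whatever the audio front end produced.
    Input that is not similar enough to any known phrase is rejected,
    unlike dictation, where arbitrary text is accepted.
    """
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(recognize_command("open fil"))
    print(recognize_command("what is the weather tomorrow"))
```

A near miss such as "open fil" still maps to a known command, while a free-form sentence returns `None`.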

Related

Implementing a TTS service for Windows 10

I'm working on a research project in which we create a new text-to-speech (TTS) engine, that converts text to spoken audio.
As the engine already performs well, we want to make it usable by a large number of applications, so we would like it to show up as a TTS voice on Windows 10.
In Microsoft's developer documentation, all I found was information on how to use existing, already installed voices in my application. I didn't find any information on how to implement a voice so that it shows up as a Windows voice and can be used by any application using the Speech SDK or SAPI.
Which interface do I have to implement, or which API do I have to connect to, in order to get our new TTS engine to work with Windows Speech?
I have already searched the documentation of the Microsoft Speech SDK as well as developer sites like https://learn.microsoft.com/en-us/dotnet/api/system.speech.synthesis.ttsengine
You should look at the TTS Engine Vendor Porting Guide. You need to implement ISpTTSEngine, which does all the work, and ISpObjectWithToken, which manages registration and creation.
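
The split of responsibilities between the two interfaces can be sketched in Python. This is only a conceptual mirror, not COM code and not something SAPI can load: the class `ToyVoice`, the dictionary-based token and format, and the silent "synthesis" are all hypothetical. The method names are snake_case versions of the real interface methods (`SetObjectToken`/`GetObjectToken` on ISpObjectWithToken, `GetOutputFormat`/`Speak` on ISpTTSEngine), and a plain callback stands in for the output site that the engine writes audio to.

```python
class ToyVoice:
    """Hypothetical stand-in for a SAPI voice; not a real COM object."""

    def __init__(self):
        self._token = None

    # Role of ISpObjectWithToken: receive and expose the registration token
    # that describes this voice (name, language, data paths, ...).
    def set_object_token(self, token):
        self._token = token

    def get_object_token(self):
        return self._token

    # Role of ISpTTSEngine: report the audio format and render text to audio.
    def get_output_format(self):
        return {"sample_rate_hz": 16000, "bits_per_sample": 16, "channels": 1}

    def speak(self, text, write):
        """Render `text` and pass the audio bytes to `write`; return byte count."""
        fmt = self.get_output_format()
        bytes_per_sample = fmt["bits_per_sample"] // 8
        # Placeholder synthesis: 10 ms of silence per input character.
        samples = len(text) * fmt["sample_rate_hz"] // 100
        audio = bytes(samples * bytes_per_sample * fmt["channels"])
        write(audio)
        return len(audio)


if __name__ == "__main__":
    voice = ToyVoice()
    voice.set_object_token({"name": "Demo Voice", "language": "en-US"})
    buffer = bytearray()
    written = voice.speak("hi", buffer.extend)
    print(voice.get_object_token()["name"], written)
```

A real engine would be a registered COM object implementing both interfaces in C++, as described in the porting guide.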

Voice/Speech Recognition on windows platform

Is there a way to use the voice recognition platform provided by Microsoft to build an app that uses voice to perform tasks?
Yes, that's what it's designed for. Here is some documentation with examples:
https://msdn.microsoft.com/en-us/library/office/hh378426%28v=office.14%29.aspx

Using Mac OSX Dictation with Speech API

In OSX Mavericks, speech dictation is now included, and is very useful. I am trying to use the dictation capability to create my own digital life assistant, but I can't find how to use the recognition functionality to get the speech in an application rather than a text box.
I have looked into NSSpeechRecognizer, but that seems to be geared toward programming speakable commands with a pre-defined grammar rather than dictation. It doesn't matter what programming language I use, but Python or Java would be nice...
Thanks for your help!
You can use SFSpeechRecognizer (requires macOS 10.15+), which is made for speech recognition:
Perform speech recognition on live or prerecorded audio, receive transcriptions, alternative interpretations, and confidence levels of the results.
By contrast, as you noted in the question, NSSpeechRecognizer provides a “command and control” style of voice recognition: the command phrases must be defined before listening, whereas in a dictation system the recognized text is unconstrained.
See also https://developer.apple.com/videos/play/wwdc2019/256/.
Another way is to use Mac Dictation directly, but as far as I know the only way to do that is to redirect audio feeds, which isn't very neat; see, e.g., http://www.showcasemarketing.com/ideablog/transcribe-mp3-audio-to-text-mac-os/.

Windows 8 speech to text and text to speech API

I guess the question pretty much says it all?
I would prefer not to rely on cloud services such as Microsoft Translator and Project Hawaii.
Is there any direct API I can access?
(For metro apps)
I was able to get the Microsoft Speech Platform working on my Windows 7 laptop (both voice recognition and text-to-speech). You just need to install the SDK and the runtime. You can also download additional voice and language packs. I would expect it to work on Windows 8 as well. Here is a good sample on how to set it up to recognize some basic phrases like "Find restaurants near Seattle".
There are some new APIs for Windows 8.1:
http://msdn.microsoft.com/en-us/library/windows/apps/windows.media.speechsynthesis.aspx
Text-To-Speech sample:
http://code.msdn.microsoft.com/windowsapps/Speech-synthesis-sample-6e07b218
//build video on Channel9:
http://channel9.msdn.com/Events/Build/2013/2-171
There are no text-to-speech or speech-to-text libraries available in .NET for Windows 8 apps. System.Speech and the like are not available. You will need to roll your own or find a compatible third-party library.
I used the Bing Translator service in my apps, but it has been discontinued. The text translation services were moved into Azure, but speech was not brought over, and the old website for getting Bing Translator API keys has been disabled. I hadn't heard of Project Hawaii before; I will have to check it out.
Microsoft's speech API seems available in Windows 7. Was it taken out of Windows 8?
You can now access Bing Services for Windows 8. The service has just been released and is in the Beta 1 stage.

How to use speech to text in WP7 mango Application using inbuilt Mango voice feature programmatically

Can we call the built-in Windows Phone Mango voice feature programmatically, the way we use Launchers and Choosers?
Is there a way I can use this feature for my translator application, where the application recognises speech in different languages and converts it to text in that particular language?
Does the Bing Translator service have this feature? If yes, how do I use it?
Also, how can I use the built-in voice commands of WP Mango in my application?
"Can we call the built-in Windows Phone Mango voice feature programmatically, the way we use Launchers and Choosers?"
No. There is no speech API for Windows Phone as of right now. See Microsoft TellMe for an upcoming API.
"Is there a way I can use this feature for my translator application, where the application recognises speech in different languages and converts it to text in that particular language?"
No.
"Also, how can I use the built-in voice commands of WP Mango in my application?"
You can't.
Not sure if you have found an answer for this, but Wade Wegner from the Azure team has a translation and OCR app built using the Bing translation services. I have not tried it, but it seems like something you could use.
