Is it possible to use the Google voice recognition API on a robot continuously? - google-api

By continuously I mean something like continuous.c, the default demo program of pocketsphinx. From trying the Google voice recognizer and other apps like Cortana, I know the user has to press a record button; the recognizer then records a few seconds of audio, sends it to the server, and so on. In pocketsphinx, by contrast, the program listens from the moment you run it until you terminate it.
I know that pocketsphinx works offline while Google and Cortana work online. So I would like to know: is it possible to do continuous voice recognition with online APIs, the way pocketsphinx does offline, or not?
If I want to build a robot that is driven by voice commands, is it possible to use the Google ASR API, given that the robot has no record button (or any button at all) to start capturing a voice command, unlike a smartphone or computer?
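Google's online recognizer does support this style of interaction: the Cloud Speech-to-Text API has a streaming mode over gRPC where you keep a microphone stream open and receive results as they arrive, with no record button involved. Below is a minimal sketch in Python, assuming the google-cloud-speech and pyaudio packages and application-default credentials; names and parameters are illustrative.

    # Continuous microphone recognition via Cloud Speech-to-Text streaming.
    # pip install google-cloud-speech pyaudio
    import pyaudio
    from google.cloud import speech

    RATE = 16000
    CHUNK = RATE // 10  # send 100 ms of audio per request

    def mic_requests(stream):
        # Wrap raw microphone chunks in streaming requests, indefinitely.
        while True:
            data = stream.read(CHUNK, exception_on_overflow=False)
            yield speech.StreamingRecognizeRequest(audio_content=data)

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code="en-US",
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)

    for response in client.streaming_recognize(streaming_config, mic_requests(stream)):
        for result in response.results:
            if result.is_final:
                print("Heard:", result.alternatives[0].transcript)

One caveat for a robot: a single streaming session is limited to a few minutes of audio, so an always-on device has to reopen the stream periodically, or, like most smart speakers, run an offline hotword detector and only stream to the server after the wake word.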

Related

How to detect a Webex or Google Meet meeting in progress

I would like to automate disabling breaks in stretchly while a Webex or Google Meet meeting is in progress on my PC. I'm using Windows 10.
Unfortunately, neither Webex nor Google Meet spawns a new process while a meeting is in progress, so I need an alternative detection mechanism.
A running Webex meeting can be detected via a window with a particular title. Are there other alternatives?
Google Meet runs in a browser (I use Chrome) and I have not found a simple way to get a list of open tabs. I also looked at network connections, but did not find a simple pattern for detection.
Can you suggest other ways to detect a running meeting? I don't want to check the calendar, because meetings can also be started manually.
Thanks, Vitek
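For the window-title approach on Windows, enumerating top-level windows with pywin32 is enough; a sketch is below, assuming the pywin32 package, with the title fragments as placeholders to adjust. Note that Chrome only exposes the active tab's title, so this catches a Google Meet tab only while it is focused; checking whether the microphone or camera is in use would be a more robust complement.

    # Detect a meeting by scanning top-level window titles (Windows only).
    # pip install pywin32
    import win32gui

    MEETING_FRAGMENTS = ("Webex", "Meet -")  # placeholders; adjust as needed

    def meeting_windows():
        found = []
        def _enum(hwnd, _):
            if win32gui.IsWindowVisible(hwnd):
                title = win32gui.GetWindowText(hwnd)
                if any(f.lower() in title.lower() for f in MEETING_FRAGMENTS):
                    found.append(title)
        win32gui.EnumWindows(_enum, None)
        return found

    if __name__ == "__main__":
        print(meeting_windows())  # non-empty list => meeting likely running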

Connect Google Assistant with Chromecast (like Netflix)

Is it possible to connect Google Assistant with Chromecast the way Netflix does?
Suppose I have already developed an app for Google Assistant named "test", and suppose there is something in the test app to show, like a report.
Would it be possible to say "Hey Google, show report" and have it automatically search for nearby Chromecast devices, connect to the nearest one, and then show the report on that connected device?
Yes, this is possible. Since August 2018, voice commands officially work with the following apps:
Netflix
CW
CBS All Access
HBO Now
YouTube
YouTube TV
Viki
Crackle
Red Bull
Starz
Google Play Movies & TV (for videos you’ve already rented or purchased)
To use voice commands with Chromecast, you need a device with Google Assistant (such as the Google Home or Google Home Mini smart speaker), plus a phone or tablet with the Google Assistant app installed. Voice commands also work with some smart TVs that have Chromecast built in (see Google’s website for a list of supported sets). Check this article for more details.
While casting from the sources above works, an Action is not able to provide content that can be cast to other devices.
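If the goal is just to get your own content onto a nearby Chromecast programmatically (outside of the Assistant), device discovery and casting are straightforward with the third-party pychromecast library. A rough sketch, with the media URL as a placeholder:

    # Discover nearby Chromecast devices and cast a media URL to one.
    # pip install PyChromecast
    import pychromecast

    chromecasts, browser = pychromecast.get_chromecasts()
    if not chromecasts:
        raise SystemExit("No Chromecast devices found on this network")

    cast = chromecasts[0]   # pick the first device discovered
    cast.wait()             # block until the connection is ready
    print("Connected to:", cast.name)

    mc = cast.media_controller
    mc.play_media("http://example.com/report.mp4", "video/mp4")  # placeholder URL
    mc.block_until_active()

    pychromecast.discovery.stop_discovery(browser)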

Is there a way to detect video/audio playback on Windows and Mac?

Is there a Win32 API for Windows or a Cocoa API for Mac which detects video/audio playback?
I'm developing a desktop application that needs to be aware of the user's activity on the computer. When the app is running, if the user is actively using his computer, the app should stay dormant. If the user is away from the computer for a certain period of time, then the app should run some logic.
There is a way to detect user activity via keyboard/mouse interaction. However, if the user is passively engaged (e.g. watching a video or listening to music) without any keyboard/mouse action, I currently have no way of knowing they are still there.
There is a similar question but no answer.
Video playback detection with Win32 API?
Any help would be much appreciated. Thank you!
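On Windows there is no single "is media playing" call, but WASAPI exposes a peak meter per audio session, so a workable heuristic is to treat any session with a non-zero peak level as active playback. A sketch using the third-party pycaw package; the threshold is an assumption to tune, and _ctl is pycaw's handle to the underlying session control:

    # Windows-only heuristic: any audio session with a non-zero peak
    # level counts as active playback.  pip install pycaw
    from pycaw.pycaw import AudioUtilities, IAudioMeterInformation

    def playing_process(threshold=0.01):
        for session in AudioUtilities.GetAllSessions():
            meter = session._ctl.QueryInterface(IAudioMeterInformation)
            if meter.GetPeakValue() > threshold:
                proc = session.Process  # psutil.Process or None
                return proc.name() if proc else "system"
        return None

    if __name__ == "__main__":
        print(playing_process())  # e.g. "chrome.exe", or None when silent

On macOS the equivalent signal would come from CoreAudio (e.g. polling whether the default output device is running); there does not appear to be a single Cocoa call for this.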

Google Assistant in another country - pause music

I live in Poland, where Google Assistant isn't available, and I would like to write a simple script that lets me say "Ok Google, muzyka stop" ("Ok Google, stop the music"); the device would then stop playing music. Is it possible to write this in Dialogflow or another system?
When creating apps for the Google Assistant, I'd recommend staying away from use cases the Assistant already covers. Controlling music is one of them, and it already works with most music streaming providers.
If you'd like to put the Google Assistant on your own device and add the option to control that device, a conversation-building tool like Dialogflow is probably not the right tool. Consider adding the Google Assistant and voice controls to your device using the Google Assistant SDK instead.
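For reference, the Assistant SDK shipped a Python library (since deprecated by Google) in which the device itself handles the hotword and conversation loop. A sketch based on the SDK's hotword sample, assuming an OAuth credentials.json and a registered device model id:

    # Based on the Google Assistant SDK hotword sample.
    # pip install google-assistant-library  (note: deprecated by Google)
    import json
    import google.oauth2.credentials
    from google.assistant.library import Assistant
    from google.assistant.library.event import EventType

    with open("credentials.json") as f:
        credentials = google.oauth2.credentials.Credentials(token=None, **json.load(f))

    with Assistant(credentials, "my-device-model-id") as assistant:
        for event in assistant.start():
            if event.type == EventType.ON_CONVERSATION_TURN_STARTED:
                print("Hotword detected, listening for a command...")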

Xamarin voice commands in the car

I'm looking for the best approach to implementing voice commands in a Xamarin app.
Here are my requirements:
I don't need to launch my app by voice. Instead, my users will launch the app through touch (so, when the app is not running, no voice recognition is needed by my app)
My app is a client/server app and will always be online (the backend will run on Azure)
My app will be used primarily in the car (so consider environment noise)
My app will work in many languages, such as Italian, Spanish, French and English
My app should be developed with Xamarin (and possibly MvvmCross or similar)
In my app there will be two kinds of voice commands:
to select an item from a short list: the app will show a list of items, such as "apple, kiwi, banana and strawberry", and the user will have to say one of those words.
to change the current view. Typically these voice commands will be something like "cancel", "confirm", "more" and so on
The typical interaction between user, app and server should be this:
the user says one of the commands available in the current view/activity/page
suppose here that the user knows exactly which commands he/she can use; it doesn't matter for now how he/she learned them (he/she just knows them)
the user can prefix the command with a special phrase, such as "hey 'appname'", giving a command like "hey 'appname', confirm"
Note: the "hey 'appname'" part of the voice command has the sole purpose of letting the app know when a command starts. The app can stay in listening mode all the time, but it has to avoid streaming audio to the server continuously just to spot commands (see the keyword-spotting sketch after this list)
the best case is if the app recognizes these commands locally, without involving the remote server, since the voice commands are predefined and well known in each view. Alternatively, the app can send the audio to the server, which will return a string (in this example the returned text would be "confirm", since the audio was "hey 'appname', confirm")
the app will map the recognized text to the available commands and invoke the right one
the user will receive feedback from the app. The feedback could be:
voice feedback (text-to-speech)
visual feedback (something on the screen)
both above
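A common way to get the local "hey 'appname'" trigger is offline keyword spotting, which pocketsphinx supports out of the box. The sketch below uses the pocketsphinx Python bindings purely to illustrate the flow; a Xamarin app would use the underlying C library or a platform keyword spotter, and both the keyphrase and the threshold are placeholders to tune:

    # Offline hotword detection via pocketsphinx keyword spotting.
    # pip install pocketsphinx   (illustrative; Xamarin would bind the C library)
    from pocketsphinx import LiveSpeech

    speech = LiveSpeech(lm=False, keyphrase="hey appname", kws_threshold=1e-20)
    for phrase in speech:
        print("Hotword detected:", phrase)
        # From here: record the rest of the utterance and either match it
        # against the short per-view command list locally, or send just
        # this snippet to the server and map the returned text to a command.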
I was looking at azure-cognitive-services, but in this case, as far as I've understood, there is no way to recognize the start of a command locally (everything runs server-side through REST APIs or clients). So the user would have to press a button before every voice command, and I need to avoid that kind of interaction: while the app is running, my users have their hands on the steering wheel and can't reach for the display every time.
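A note on that point: the Azure Speech SDK does offer client-side continuous recognition (and, in later versions, offline keyword recognition), so a button press per command is not strictly required. A sketch in Python for brevity; the same SDK ships as a C# NuGet package usable from Xamarin, and the key/region values are placeholders:

    # Hands-free continuous recognition with the Azure Speech SDK.
    # pip install azure-cognitiveservices-speech
    import time
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="westeurope")
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # Fires once per recognized utterance, e.g. "hey appname confirm".
    recognizer.recognized.connect(lambda evt: print("Recognized:", evt.result.text))

    recognizer.start_continuous_recognition()
    time.sleep(30)   # keep listening for 30 seconds in this demo
    recognizer.stop_continuous_recognition()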
Moreover, I was looking at the cortana-skills-kit and the botframework, but:
It seems that Cortana Skills are available in English only
Actually, I don't need to involve Cortana to launch my app
I don't have experience with these topics, so I hope my question is clear and, generally speaking, useful for other newbie users as well.
* UPDATE 1 *
Speech recognition with a Voice Command Definition (VCD) file is really close to what I need, because:
it has a way to activate the command through a command name shortcut
It works in the foreground (and in the background as well, even though in my case I don't need the background)
Unfortunately, this service works only on Windows, since it uses the local API. Maybe the right approach could be based on the following considerations:
Every platform exposes a local speech recognition api (Cortana, Siri, Google Now)
Xamarin exposes the Siri and Google Now APIs and makes them available through C#
It would be useful to create a facade component exposing the three different local speech APIs through a common interface
I'm wondering whether there is some other solution to this. Cortana, as a personal assistant, is available on Windows, iOS and Android. Since Cortana works both with the local API and with the remote service (Cortana Skills), is Cortana the right approach? Does Cortana support many languages (or is such support at least on a roadmap)?
So, just some thoughts here. If you have other ideas or suggestions, please add them here. Thanks
