How to make Sphinx4 accept more sentences - macOS

I'm using sphinx4-1.0beta6 on Mac OS X 10.9.1 through the terminal.
I'm still new to this speech recognition application. I've already run the HelloWorld example, added some new words to the .gram file, and it worked. Now I'd like to use rules, or whatever else helps, so that it accepts more sentences spoken by the user. My questions are:
How can I make my HelloWorld do that? Should I use rules? If so, are there any examples?
If I use rules, how can I print the spoken question back to the user?
Thanks in advance.

How can I make my HelloWorld do that? Should I use rules? If so, are there any examples?
It depends on the task, but generally you want to extend the grammar with more rules. You can find more information in the CMUSphinx tutorial at http://cmusphinx.sourceforge.net
If I use rules, how can I print the spoken question back to the user?
You get the recognition result as a string. You can print it with System.out.println, display it in your UI, or do whatever you want with it.

Related

BotComposer, how to iterate through the characters of a string using lg language?

We need to extract a number from a phrase. For example:
"hey, 1234" -> "1234"
"ok, 4567" -> "4567"
"b3456f" -> "3456"
But we haven't found a way to iterate through a string using only the language generation (LG) features of Bot Composer.
We tried things like:
join(foreach(createArray("ab c"), x, concat(x, '-')), '')
But with no result. Is there any prebuilt function that converts a simple string into an array of chars, so we can iterate char by char using foreach?
Thanks!
As far as I know, this currently isn't possible as there's no way to iterate over a string or split a string into a new array by character. I've opened a GitHub issue to request it as an enhancement.
For:
"hey, 1234" -> "1234"
"ok, 4567" -> "4567"
You can use split().
Unfortunately, you're out of luck for your "b3456f" -> "3456" example, unless you know it's going to come in that exact format, in which case, you could use substring().
You could maybe look into using a Regex to do this, if you know the formats will be pretty controlled, but another option is to look at the LUIS language understanding services from Microsoft, which are built exactly for understanding different parts of a text message, especially in a bot context. Here's a link to getting started with this, for C# (on the menu just below in this link, is a Node example if that's what you need).
There's also a tag here on Stack Overflow focused just on LUIS, if you run into trouble or need any more help.
Hope that helps
[Update] I re-read your question and I see now it's about BotComposer, not a custom developed bot. As a result, the sample I linked to is not valid, but LUIS certainly is. I haven't used Bot Composer myself, but LUIS is integrated as part of it - see here.
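To illustrate the regex idea mentioned above, here is a minimal sketch in plain Ruby. It is not Bot Composer LG syntax, and whether LG exposes an equivalent extraction function is not something this thread confirms. The helper name first_number is hypothetical; the three phrases are the ones from the question.

```ruby
# Illustrative only: plain Ruby, not Bot Composer's LG language.
# first_number is a hypothetical helper name.
def first_number(text)
  # String#[] with a regex returns the first match, or nil if there is none.
  text[/\d+/]
end

["hey, 1234", "ok, 4567", "b3456f"].each do |phrase|
  puts "#{phrase.inspect} -> #{first_number(phrase).inspect}"
end
```

The pattern \d+ takes the first run of consecutive digits, so it covers the "b3456f" case without needing to know the surrounding format.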

TWINE game localisation

Does anyone know if it is possible to localise a Twine game? I'd like to have my interactive stories in all the Scandinavian languages. I also plan to add mp3 spoken narration in each language for non-readers at a later stage. My thought was to maybe have one complete story file per language, but that seems hard to maintain.
Does anyone have a recommended way of doing this?
It may be possible to localize your Twine game's default UI strings, depending on your story format. For example, if you're using the SugarCube v2 story format in Twine, then there are some SugarCube localizations here.
However, for your story text it's entirely up to you how you handle displaying that based on the user's choices. Again, assuming you're using SugarCube, you might have the user select the language in the beginning like this:
<<set $lang = "EN">>
''Select your language:'' <<listbox "$lang" autoselect>>
<<option "English" "EN" >>
<<option "русский" "RU">>
<<option "українська" "UK">>
<<option "Türkçe" "TR">>
<</listbox>>
That will give you a dropdown list of language options.
Then, in each of your passages, you would have something like this:
<<switch $lang>>
<<case "RU">>
(Russian version of passage.)
<<case "UK">>
(Ukrainian version of passage.)
<<case "TR">>
(Turkish version of passage.)
<<default>>
(English version of passage.)
<</switch>>
You could put any non-language based code outside of that "switch" macro.
If you're using something other than SugarCube for your story format, then you'll likely use something similar to that.
Hope that helps! :-)

Matching users with objects based on keywords and activity in Ruby

I have users that have authenticated with a social media site. Now based on their last X (let's say 200) posts, I want to map how much that content matches up with a finite list of keywords.
What would be the best way to do this to capture associated words/concepts (maybe that's too difficult) or just get a score of how much, say, my tweet history maps to 'Walrus' or 'banana'?
Would a naive Bayes work here to separate into 'matches' and 'no match'?
In Python, I would say NLTK can easily do it. In Ruby, a gem called lda-ruby may help. The whole LDA concept is well explained here - look at the Sarah Palin's emails example. There's even an example app (not entirely in Ruby, but still) that did that -> github.com/echen/sarah-palin-lda
Or maybe I'm just saying stupid things that can't help you at all. I'm not an expert ;)
A simple naive Bayes classifier would work in this case. It is widely used to detect whether emails are spam, so for simple keyword matching it should work pretty well.
For this problem you could also apply a recommendation system where you look for the top recommended keyword for a user (or for a post).
There are a ton of ways to do this. I would recommend reading Programming Collective Intelligence. It is explained using Python, but since you know Ruby there should be no problem understanding the code.
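As a rough idea of what the naive Bayes suggestion could look like in Ruby, here is a minimal sketch of a multinomial classifier with add-one smoothing. The class name TinyBayes, the :match / :no_match labels, and the training sentences are all made up for illustration; a real setup would train on many labelled posts per keyword.

```ruby
# Minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing.
# TinyBayes, its labels, and the training texts are hypothetical.
class TinyBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => word => count
    @doc_counts  = Hash.new(0)                            # label => number of texts
    @vocab       = {}                                     # every word seen in training
  end

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end

  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocab[word] = true
    end
  end

  # Log-probability score of the text under each label.
  def scores(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.map do |label|
      total_words = @word_counts[label].values.sum
      score = Math.log(@doc_counts[label] / total_docs)
      words.each do |word|
        score += Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocab.size))
      end
      [label, score]
    end.to_h
  end

  def classify(text)
    scores(text).max_by { |_, score| score }.first
  end
end

bayes = TinyBayes.new
bayes.train(:match, "the walrus swam past the walrus colony")
bayes.train(:match, "saw a huge walrus at the zoo")
bayes.train(:no_match, "ate a banana for breakfast")
bayes.train(:no_match, "traffic was terrible this morning")

puts bayes.classify("a walrus on the beach")
puts bayes.classify("banana for breakfast again")
```

To get a score per user rather than a yes/no answer, one option is to classify each of the last X posts and report the fraction labelled :match.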

How can I detect a user's input language using Ruby without using an online service?

I'm looking for a library or technique to detect the input language of blocks of text provided by users. Online lookups (like Google translate) won't work for this task as I'm writing an app which must run offline.
Thanks.
Here are two more n-gram-based gems you might want to try. They work offline.
https://github.com/echen/unsupervised-language-identification, optimized for separating English from other languages (has a live demo)
https://github.com/feedbackmine/language_detector, less specialized, will detect more languages. Some languages may need some extra training — I found it to be not precise enough for German text.
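For a feel of how an offline n-gram approach works, here is a toy character-trigram sketch in Ruby. It is not the algorithm of either gem above. SAMPLES, trigrams, and guess_language are hypothetical names, and the one-sentence samples are far too small for real use; usable profiles need much more text per language.

```ruby
# Toy character-trigram language guesser. Not the algorithm of any gem above.
# SAMPLES, trigrams and guess_language are hypothetical names.
SAMPLES = {
  english: "the quick brown fox jumps over the lazy dog and then the cat sat on the mat",
  german:  "der schnelle braune fuchs springt auf den faulen hund und dann sitzt die katze auf der matte"
}.freeze

# Split text into overlapping three-character windows.
def trigrams(text)
  clean = text.downcase.gsub(/\s+/, " ").strip
  clean.chars.each_cons(3).map(&:join)
end

# The distinct trigrams seen in each language's sample.
PROFILES = SAMPLES.map { |lang, text| [lang, trigrams(text).uniq] }.to_h.freeze

# Pick the language whose profile contains the most of the input's trigrams.
def guess_language(text)
  grams = trigrams(text)
  PROFILES.max_by { |_, known| grams.count { |g| known.include?(g) } }.first
end

puts guess_language("the dog and the cat")
puts guess_language("der hund und die katze")
```

Everything runs locally, which is the point for an offline app: the only data needed is a trigram profile per language, built once from sample text.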
For anyone interested, I've found http://rubygems.org/gems/kenwaln-whatlanguage, which is performing excellently.
I'm using CLD, which I really like: it's succinct and easy to use. Give it a try.
A quick demo of WhatLanguage in Ruby:
http://www.youtube.com/watch?v=lNqZ2cqOReo&list=UUJ_3fstMOH-g4yBxtvgAWkw&index=0&feature=plcp

NSSpeechRecognizer delegate to be called for any word spoken

I read about NSSpeechRecognizer and found that it can recognize a set of commands associated with it, reported through the delegate method -speechRecognizer:didRecognizeCommand:
I have a simple question: can this delegate method be called for any word spoken by the user? As far as I can tell, only a finite number of words can be associated with it.
Thanks,
Miraaj
It's exactly what it says on the tin: It's for recognizing commands. So, yes, you need to tell it up front what commands it should recognize.
It's not a dictation API. I would guess that if you tried to load up the command list with an English dictionary, you'd make recognition very processor-intensive, slow, and inaccurate.
If you want dictation, you should file an enhancement request to ask for it.

Resources