Windows Speech API questions

I've got two questions for using the Windows Speech API.
First: I've got my Speech Recognizer set up to detect sentences of a specific structure, namely a verb followed by a noun, with some wildcard content in between. However, I would also like it to be able to recognize "Help" and "Exit" commands that would not fit this structure. How can I get the grammar to recognize another, fundamentally different, structure?
Second: I am using SemanticResultValue to analyze the content of my sentences. I want there to be multiple words that users can say for the same verb; for example, "Go," "Walk," and "Run" would all translate to the same action in the system. How do I assign multiple phrases to the same SemanticResultValue?

1) Multiple grammars would be the obvious solution here: one grammar for your verb/noun structure, and a separate grammar for the bare "Help" and "Exit" commands.
2) The SemanticResultValue constructor that takes a GrammarBuilder parameter, SemanticResultValue(GrammarBuilder, Object), would be appropriate here: wrap a Choices containing the synonymous verbs in a single SemanticResultValue, and they will all produce the same semantic value.
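To illustrate both points, a minimal C# sketch using System.Speech.Recognition (the phrases, the semantic value "MOVE", and the key names "action"/"target" are made-up placeholders, not anything from the question):

using System;
using System.Speech.Recognition;

class Sketch
{
    static void Main()
    {
        // One semantic value for several synonymous verbs,
        // via SemanticResultValue(GrammarBuilder, Object).
        GrammarBuilder moveVerbs =
            new GrammarBuilder(new Choices(new string[] { "go", "walk", "run" }));
        SemanticResultValue moveValue = new SemanticResultValue(moveVerbs, "MOVE");

        // Verb, wildcard, noun: the main command structure.
        GrammarBuilder command = new GrammarBuilder();
        command.Append(new SemanticResultKey("action", moveValue.ToGrammarBuilder()));
        command.AppendWildcard();
        command.Append(new SemanticResultKey("target",
            new GrammarBuilder(new Choices(new string[] { "door", "window" }))));

        // A second, structurally different grammar for the bare commands.
        GrammarBuilder control = new GrammarBuilder(new Choices(new string[] { "help", "exit" }));

        SpeechRecognitionEngine engine = new SpeechRecognitionEngine();
        engine.LoadGrammar(new Grammar(command));
        engine.LoadGrammar(new Grammar(control));
        engine.SpeechRecognized += (s, e) =>
        {
            // For the verb/noun grammar, e.Result.Semantics["action"].Value is "MOVE"
            // whether the user said "go", "walk", or "run".
            Console.WriteLine(e.Result.Text);
        };
        engine.SetInputToDefaultAudioDevice();
        engine.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine(); // keep listening until Enter is pressed
    }
}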

BotComposer, how to iterate through the characters of a string using LG language?

We need to extract a number from a phrase. For example:
"hey, 1234" -> "1234"
"ok, 4567" -> "4567"
"b3456f" -> "3456"
But we haven't found a way to iterate through a string using only the language generator of Bot Composer.
We tried things like:
join(foreach(createArray("ab c"), x, concat(x, '-')), '')
But with no result... Is there any prebuilt function that converts a simple string into an array of chars, so we can iterate char by char using foreach?
Thanks!
As far as I know, this currently isn't possible as there's no way to iterate over a string or split a string into a new array by character. I've opened a GitHub issue to request it as an enhancement.
For:
"hey, 1234" -> "1234"
"ok, 4567" -> "4567"
You can use split().
Unfortunately, you're out of luck for your "b3456f" -> "3456" example, unless you know it's going to come in that exact format, in which case you could use substring().
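For example (hypothetical inputs, assuming the number always follows a comma and a space, or sits at a known position):
split('hey, 1234', ', ')[1] -> '1234'
split('ok, 4567', ', ')[1] -> '4567'
substring('b3456f', 1, 4) -> '3456'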
You could maybe look into using a regex for this, if you know the formats will be fairly controlled, but another option is the LUIS language understanding service from Microsoft, which is built exactly for understanding different parts of a text message, especially in a bot context. Here's a link to getting started with this for C# (on the menu just below in that link is a Node example, if that's what you need).
There's also a tag here on Stack Overflow focused just on LUIS, if you run into trouble or need any more help.
Hope that helps
[Update] I re-read your question and I see now it's about BotComposer, not a custom developed bot. As a result, the sample I linked to is not valid, but LUIS certainly is. I haven't used Bot Composer myself, but LUIS is integrated as part of it - see here.

Seq2Seq Model - Chatbot

I am creating a chatbot using seq2seq. Normally we remove all punctuation and stop words while preprocessing text data and feed the result to the model.
So my question is: won't this impact the readability of the output?
For example, a user types a question into the chatbot window and presses Enter to get an answer. If the user gets the answer without punctuation and stop words, will this impact readability?
It really depends on what type of chatbot you want to create. Generally we have two types of chatbots:
Retrieval-based: You train your model with lots of question/response pairs. In the inference phase, the model finds the stored question most similar to the user's question and returns that question's response. Because the stored responses are returned verbatim, preprocessing the questions does not affect readability or anything else the user sees.
Generation-based: In a generation-based chatbot (such as the seq2seq model you mentioned), the response relies entirely on what you feed in for training. If you remove punctuation and stop words from the training data, then yes, it impacts the responses: the model cannot produce tokens it never saw, so you will not see those things in your chatbot's output.
Of course it degrades readability. Many of those words and all of the punctuation exist to guide the reader to the intended parsing of the sentence. To put it another way:
course degrades readability many words
punctuation guide reader intended parsing
sentence put another way
There are many examples of phrases, sentences, and paragraphs that require punctuation to disambiguate the intended meaning.
Removing this "syntactic sugar" is useful only for certain (most) techniques that quickly estimate likely relevance to a similarly processed document. Your bot design has to separate this process from the user interface: whatever you return to the user should be in human language, not the internal word soup you employ for information retrieval.
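To make that separation concrete, here is a minimal Python sketch of a toy retrieval-style bot; the stop-word list and the stored question/answer pair are invented for illustration. Normalization is applied only when matching, and the stored response is returned untouched:

import string

# Toy stop-word list and stored question/answer pair, invented for illustration.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "do", "i", "how", "my"}
QA_PAIRS = {
    "How do I reset my password?": "Click 'Forgot password' on the login page.",
}

def normalize(text):
    # Strip punctuation and stop words -- used ONLY for matching, never for output.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return {w for w in text.split() if w not in STOP_WORDS}

def answer(user_question):
    # Score stored questions by word overlap, then return the stored
    # response verbatim, punctuation and stop words intact.
    best = max(QA_PAIRS, key=lambda q: len(normalize(q) & normalize(user_question)))
    return QA_PAIRS[best]

print(answer("how to reset password"))  # Click 'Forgot password' on the login page.

For a generation-based model the same principle applies in reverse: whatever you strip from the training targets can never appear in the generated output, so strip only from the inputs you match or encode, not from the text you intend to show.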

Continuous Language Recognition on PocketSphinx

I am trying to implement voice recognition using PocketSphinx. Currently, I am able to recognize multiple phrases from the grammar file, using
recognizer.addKeywordSearch(KWS_SEARCH, menuGrammar);
The issue is that only the exact phrases from the grammar file get recognized. I want to be able to recognize (and print) more English-like statements. If I am able to retrieve the statements spoken by the user, I will then search for certain keywords (and then perform the required action).
Can you please guide me on how to recognize multiple statements spoken by the user, without having to force the user to say the specific phrases exactly as written?

Translation and words with a fixed number of letters

In a part of my software I need, in different languages, lists of words having a fixed number of letters.
My software is already internationalized (using gettext, and it works fine). But, as it's generally not possible to translate a four-letter word from English into a four-letter word in another language (apart from some exceptions maybe), I can't imagine a reasonable way of letting gettext deal with these word lists.
So, for now, I decided to store my words like this (shortened example with English and French words):
FOUR_LETTERS_WORDS = {'fr': ["ANGE", "ANIS", "ASIE", "AUBE", "AVEN", "AZUR"],
                      'en': ["LEFT", "LAMP", "ATOM", "GIRL", "PLUM", "NAVY", "GIFT", "DARK"]}
(This is Python syntax, but the problem doesn't have much to do with the programming language used.)
The lists do not need to have the same length; they do not need to contain the same words.
My problem is the following: if my software is to be translated into another language, say German, then all the strings that fall within the scope of gettext will be listed in the .pot file and will be available for translation. But then I also need a list of four-letter words in German, and that list won't show up in the translation file.
I would like to know whether I need to ask the translator to also provide a list of such words, or if there's a better way to deal with this situation (maybe a satisfying workaround with gettext?).
EDIT: I realized the question actually has not much to do with the programming language, so I removed the python* tags.
You can do it with gettext. It's possible to use "keys" instead of complete sentences for translation.
If you use sentences in your .po files and don't want to have to translate the main language (let's say English), you don't need to translate them; only provide translation files for these words. If gettext finds a translation file, it uses it; otherwise it displays the key (the msgid). The msgid can be a complete sentence or a key, it does not matter.
To do that, you simply need to use a specific text domain for these words and the dgettext() function. The domain allows you to separate files depending on context, or whatever criterion you choose (functionality, sub-package, etc.).
Checking these words is not as easy. You can count them with grep -c, for instance. You could also provide a specific key that contains the number of four-letter words, but that would be a dirty hack you probably could not rely on.
Maybe there's another way in Python, I don't know this language...
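Since the question happens to be in Python, here is a minimal sketch of that separate-domain idea using the standard gettext module; the domain name, the "4LW_<n>" key scheme, and the "locale" directory are all invented for illustration:

import gettext

DOMAIN = "fourletterwords"                # invented text domain for these lists
gettext.bindtextdomain(DOMAIN, "locale")  # looks for locale/<lang>/LC_MESSAGES/fourletterwords.mo

def four_letter_words(max_words=100):
    words = []
    for i in range(1, max_words + 1):
        key = "4LW_%d" % i                # the msgid is a key, not a real sentence
        word = gettext.dgettext(DOMAIN, key)
        if word == key:                   # untranslated: end of this language's list
            break
        if len(word) == 4:                # guard against a translator supplying a longer word
            words.append(word)
    return words

Each language's .po file for this domain then simply maps 4LW_1, 4LW_2, ... to its own words, so the lists can have different lengths and contents, and a missing key just ends the list.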

How to detect vulnerable/personal information in CVs programmatically (by means of syntax analysis/parsing etc...)

To make the matter more specific:
How to detect people's names (seems like a simple case of named entity extraction?)
How to detect addresses: my best guess is to find postcodes (with regexes) and country and town names, and take some text around them.
As for phones and emails: they could probably be caught by various regexes plus preprocessing.
I don't care about education/work experience at this point.
Reasoning:
In order to build a fulltext index on resumes, all vulnerable information should be stripped out of them.
P.S. any 3rd party APIs/services won't do as a solution.
The problem you're interested in is information extraction from semi-structured sources: http://en.wikipedia.org/wiki/Information_extraction
I think you should download a couple of research papers in this area to get a sense of what can be done and what can't.
I feel it can't be done by a machine.
Every other resume will have a different format and layout.
The best you can do is to design an internal format and manually copy every resume's content into it, or ask candidates to fill out your form (not many will bother).
I think that the problem should be broken up into two search domains:
Finding information relating to proper names
Finding information that is formulaic
Firstly, the information relating to proper names could probably best be found by searching for items that are grammatically important or significant; e.g. English capitalizes only the first word of a sentence and proper nouns. For the grammatical rules, you could look for all words whose first letter is capitalized and check them against a database that contains the word and its type [i.e. Bob - Name, Elon - Place, England - Place].
Secondly: information that is formulaic. This is more about email addresses, phone numbers, and physical addresses. All of these have specific formats that don't change. Use regexes, plus an algorithm to assess the quality of the matches.
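As a rough Python sketch of the formulaic part (the patterns below are simplified illustrations, not production-grade; the UK-style postcode pattern in particular is only an approximation):

import re

# Simplified illustrative patterns; real-world variants need far more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
UK_POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")

def redact(text):
    # Replace formulaic personal data before building the fulltext index.
    for pattern in (EMAIL_RE, PHONE_RE, UK_POSTCODE_RE):
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("Contact Bob at bob.smith@example.com or +1 (555) 123-4567."))
# -> "Contact Bob at [REDACTED] or [REDACTED]."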
Watch out:
The grammatical rules change based on language: German capitalizes EVERY noun, so it might be best to detect the language of the document prior to applying your rules. Another issue with this [and with my resume, sometimes] is how the document is designed: if the resume was made with something other than a text editor [designer tools], the text may not line up, or may even be in a bitmap format.
TL;DR Version: NLP techniques can help you a lot.
