How to handle misspelled LUIS entity - azure-language-understanding

Let's suppose it is a movie bot. I added a MovieName entity and a phrase list containing movies. One of the movie names is "Star Wars". If the user misspells it as "Stra Wra", how can I handle that? I'm also not sure whether the Bing Spell Check service will help with non-English movie names.

LUIS will not be able to capture misspelled entities by itself unless you provide example utterances containing the misspellings, which is not practical.
So you need to correct the utterances before feeding them to LUIS.
As for the Bing spelling correction service, you will have to try it yourself, but I guess it will handle your case.

If there are common misspellings that you expect to recur, you could add them to an interchangeable phrase list feature. That will help with the prediction of these misspelled entities.

There are multiple ways to solve this:
Use synonyms that cover the most common mistakes.
Add another step to your pipeline (before the call to LUIS) that matches the user input against the possible options and corrects it. Even a homegrown solution would do well, but you can also try Elasticsearch with fuzzy queries.
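The pre-processing step suggested above could be sketched roughly as follows. This is only an illustration using Python's standard difflib module rather than Elasticsearch; the MOVIES list, the correct_movie_name function and the 0.6 cutoff are assumptions made for the example, not part of LUIS.

```python
import difflib

# Hypothetical catalogue; in practice this would be the same list
# that feeds the MovieName phrase list.
MOVIES = ["Star Wars", "The Godfather", "Inception"]


def correct_movie_name(text, titles=MOVIES, cutoff=0.6):
    """Return the closest known title, or the text unchanged if none is close."""
    # Compare case-insensitively but return the canonical spelling.
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        text.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else text


if __name__ == "__main__":
    print(correct_movie_name("Stra Wra"))
```

The corrected text is what would then be sent to LUIS. The cutoff needs tuning: too low and unrelated input gets rewritten to a movie title, too high and real typos slip through.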

Related

Handling typos / misspellings on list entities

What is the best practice approach to handle typos / misspelling on LUIS List Entities?
I have intents on LUIS which use a list entity (specifically Company Department - HR, Finance, etc). It is common for users to misspell this when putting forward their utterance. LUIS expects an exact match, it doesn't do a "smart" match, and therefore doesn't pick up the misspelled entity.
a) Using Bing Spell Check is not necessarily a good solution. For example, certain departments are acronyms such as VRPA, and Bing won't correct a typo there.
b) When I used LUIS a year ago, I would pre-process the utterance and use a Levenshtein distance algorithm to fix typos on list entities before feeding them to LUIS.
I would imagine that by now LUIS has some better out of the box way of handling this very common use case.
I'd appreciate input on what the best practice approach is to handle this.
@acambitsis and I exchanged messages via his UserVoice ticket, but I'm going to post the answer here for others.
A combination of Bing and Simple Entities might be what you're looking for, then (they're machine-learned).
I was able to accomplish something close and attached images.
In entities, I created a Simple entity with the role, VRPA. In intents, I created the Show Me intent and added sample utterances "Show me the VRPA" and "Show me the VPRA". I clicked on V**A and selected the Simple Entity:VRPA role. After training, I tried "show me the varp" and it correctly guessed "varp" was the "Simple:VRPA" entity.
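You may also find RegEx entities useful. For acronyms, you could do something like /\b[vrpa]{4}\b/i, and then any four-letter combination of those letters (VRPA/VPRA/VARP/ARVP) would match. Note that a bare /[vrpa]/i would match any single one of those letters anywhere in the utterance.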
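To see what such an expression accepts, here is a small sketch in Python. A character class on its own matches a single letter, so this version adds a length of four and word boundaries; the ACRONYM pattern and the find_department helper are illustrative names, and Python's regex dialect may differ in details from the one LUIS uses.

```python
import re

# Any four-letter word built only from the letters v, r, p, a (any case).
ACRONYM = re.compile(r"\b[vrpa]{4}\b", re.IGNORECASE)


def find_department(utterance):
    """Return "VRPA" if the utterance contains a scrambled VRPA, else None."""
    return "VRPA" if ACRONYM.search(utterance) else None


if __name__ == "__main__":
    print(find_department("show me the varp"))
    print(find_department("show me the HR page"))
```

This is deliberately loose: it also accepts ordinary words such as "papa", so it only suits domains where that kind of collision is unlikely.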
I highly recommend reading through the Entity Types and Improve App Performance to see if anything jumps out to solve your particular issues.
This may not do exactly what you're looking for. If not, I'd recommend implementing a fuzzy-matching algo of your choice.
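As one possible fuzzy-matching approach, the Levenshtein pre-processing mentioned in the question could look something like this. It is a minimal sketch: the DEPARTMENTS list, the function names and the "half the length" threshold are assumptions chosen for illustration.

```python
# Hypothetical canonical values of the list entity.
DEPARTMENTS = ["HR", "Finance", "VRPA"]


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def closest_value(token, values=DEPARTMENTS):
    """Return the canonical value nearest to token, or None if too far."""
    best = min(values, key=lambda v: levenshtein(token.lower(), v.lower()))
    distance = levenshtein(token.lower(), best.lower())
    # Assumed tolerance: at most half the canonical value's length.
    return best if distance <= len(best) // 2 else None


def fix_utterance(utterance):
    """Replace tokens that look like misspelled list values."""
    return " ".join(closest_value(t) or t for t in utterance.split())


if __name__ == "__main__":
    print(fix_utterance("show me the vpra"))
```

Short values such as "HR" are the weak spot: with a tolerance of one edit, a word like "he" would also be rewritten, so very short acronyms may need exact matching instead.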

How do you train LUIS to recognize general nouns?

I have a bot that was initially based on the Zummer example.
I would like the Search intent to pick up practically any topic you could search for as an entity.
I tried training using several example phrases but it became apparent that although the intent is correctly detected, the ArticleTopic entity only picks up the specific nouns provided as examples.
I also tried creating a regex entity using .* but this matches every complete utterance.
Is there a general approach to tell LUIS to capture some part of an utterance regardless of its contents?
Examples of what I would like to support:
Search for *, What is *, What are *, Tell me about *, etc.
You should use patterns together with the entity type that is specific to patterns, Pattern.any. This entity returns all of the text at the position where the entity is marked in the pattern.
It should give something like this:
Search for {Entity}
What is {Entity}
What are {Entity}
This issue could be covered with the new Patterns feature (using pattern.any).
This feature helps in labeling the noun following a specific pattern.
If you add the pattern.any entities to your LUIS app, you can't label utterances with these entities. They are only valid in patterns. Here is another example which explains how pattern.any feature resolves the issue of multi-word entity handling. I have reproduced your issue and it works. Hope this helps!!
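To get a feel for what a pattern with a pattern.any slot captures, here is a purely local emulation in Python. It does not call LUIS and is not how LUIS implements patterns; it only turns brace-style templates into regular expressions. The ArticleTopic name comes from the question, and the TEMPLATES list and helper functions are assumptions.

```python
import re

# Templates in brace syntax, based on the examples in the question.
TEMPLATES = [
    "Search for {ArticleTopic}",
    "What is {ArticleTopic}",
    "What are {ArticleTopic}",
    "Tell me about {ArticleTopic}",
]


def compile_template(template):
    """Turn 'Search for {X}' into a regex with a named group X."""
    parts = re.split(r"\{(\w+)\}", template)
    # Even positions are literal text, odd positions are entity names.
    body = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    # Allow optional trailing punctuation after the captured text.
    return re.compile(rf"^{body}[?.!]*$", re.IGNORECASE)


PATTERNS = [compile_template(t) for t in TEMPLATES]


def extract(utterance):
    """Return the captured entities for the first matching template."""
    for pattern in PATTERNS:
        match = pattern.match(utterance.strip())
        if match:
            return match.groupdict()
    return None


if __name__ == "__main__":
    print(extract("Tell me about quantum computing?"))
```

The point is that the slot takes whatever text sits in that position, regardless of its contents, which is the behaviour the question asks for.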

How to detect names as entities using LUIS in Microsoft Bot Framework

I am using luis.ai which is offered as a part of Microsoft Cognitive Services, in my project. I have a requirement of detecting names using LUIS. For the same, I have been using the phrase list feature. I have added some names in the list. But as we all know, the names list is never exhaustive. So, no matter how many names I add, since they don't have a specific pattern, when I test with some new names, the entity detection fails. I want to know if there's any other way in which we can have LUIS detect names of people.
Please let me know if you have a solution to this problem.
LUIS can be used to recognize and extract intents and entities from utterances, but in my experience it cannot reliably identify a person's name, because a person's name could be anything.
As you did, adding names that are not well recognized to a phrase list can be one solution. In addition, the GitHub issue "Identifying the Names from the sentence using LUIS" discusses a similar question, and as cahann mentioned there, you can add and label more example utterances that contain poorly recognized names to help your LUIS app recognize names better.

MSBOT-LUIS: How to specify the mandatory words in utterance? Is it possible by using phrase list features?

I am using the phrase list feature of LUIS and adding my mandatory words to my phrase lists (correct me if I am wrong).
With a single mandatory word my intent works fine. But in another intent I have two mandatory words, and that one is not working as expected.
Behaviour
My phrase list- product: [moisturizer,anti wrinkle cream,laugh lines,anti aging skin treatment]
target area: [face,my face,neck,forehead]
Intent name- ste1
utterance- do you have moisturizer?
user enters- "do you have bla bla"- as expected, it goes to the None intent.
Intent name- ste2
utterance- do you have moisturizer for my face?
user input- "do you have moisturizer for my bla bla"- Here "moisturizer" is present but "my face" is not. This should also hit the None intent, but it hits the ste1 intent because "do you have moisturizer?" is completely present in ste1.
Expected Result-
I want both of these words (moisturizer, face) to be mandatory to hit the ste2 intent; otherwise I want the utterance to hit the None intent.
LUIS only provides a recognition service. If you want to validate something like "face" and "moisturizer" being present in a user's utterance, this should be done in your code.
You may train your bot to direct "incomplete" utterances to the "None" intent (by your description, utterances like "I want moisturizer" or "I want lotion"), but as you yourself noted:
But user can enter any random thing so I cant predict what should be in none intent...
Therefore what you should do in your model and code is add entities for "moisturizer" and "face". With these entities, inside of your code you can take the LUIS response and quickly see if you have the required basic information to start the dialog. If one entity is provided ("moisturizer") but another entity is missing (a part of the body), your bot would help the user disambiguate by prompting them what they're looking for specifically, e.g. face moisturizer or hand moisturizer.
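A rough sketch of that check in code is shown below. It assumes the LUIS result has already been parsed into a dict with an "entities" list whose items carry a "type" field; the entity names "product" and "target area" are taken from the phrase lists in the question, and the function names and return strings are hypothetical.

```python
# Entities the ste2 dialog needs before it can start (assumed names).
REQUIRED = {"product", "target area"}


def missing_entities(luis_result, required=REQUIRED):
    """Return the required entity types absent from the LUIS result."""
    found = {entity["type"] for entity in luis_result.get("entities", [])}
    return sorted(required - found)


def route(luis_result):
    """Decide what the bot should do with this utterance."""
    missing = missing_entities(luis_result)
    if not missing:
        return "start_dialog"
    if len(missing) < len(REQUIRED):
        # Some information is present: ask the user for the rest.
        return "prompt_for:" + ",".join(missing)
    return "none"


if __name__ == "__main__":
    result = {
        "query": "do you have moisturizer for my bla bla",
        "entities": [{"entity": "moisturizer", "type": "product"}],
    }
    print(route(result))
```

With this in place the bot no longer depends on the intent score alone; it can prompt for the missing body part instead of silently falling into ste1.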
A good way to think about phrase lists and pattern features is as augmentations: they do help the machine-learned model, but the weight they carry when determining an intent is less than an entity's weight. Phrase lists and pattern features are not replacements for entities.

LUIS inserts whitespace in utterances when punctuation present causing entity getting incorrectly parsed

I am playing around with the LUIS stock ticker example here, GitHub MicrosoftBotBuilder Example. It works well and the entity in the utterances is identified, but there are stock tickers in the world that have periods in them, such as bt.a
By default LUIS pre-processes utterances and inserts word breaks around punctuation characters, so an utterance of "what is price of bt.a" becomes "what is price of bt. a" and LUIS thinks the entity is "bt" instead of "bt.a"
Does anyone know how to get around this? Thx
This is how LUIS tokenizes utterances, and I don't think it will change in the near future.
I think you can investigate one of these two solutions:
Preprocess the utterance and normalize entities with punctuation (maybe save them in a map), and reverse the process when LUIS is called and the entities have been extracted.
Use phrase list features and add the entities that LUIS misses in their Tokenized form, label the entity tokens in the utterance, and retrain the model (I suggest you try that in a clone of your app, so you don't lose any current progress)
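The first option (normalize before the call, reverse afterwards) might be sketched like this. It is only one way to do it: the DOTTED pattern, the "dot" alias scheme and the function names are assumptions, and the alias is only useful if the LUIS model has been trained on utterances normalized the same way.

```python
import re

# Tokens with internal periods, e.g. "bt.a" (assumed shape of such tickers).
DOTTED = re.compile(r"\b[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)+\b")


def normalize(utterance):
    """Replace dotted tokens with punctuation-free aliases.

    Returns the rewritten utterance and a map from alias to original token.
    """
    aliases = {}

    def swap(match):
        original = match.group(0)
        alias = original.replace(".", "dot").lower()
        aliases[alias] = original
        return alias

    return DOTTED.sub(swap, utterance), aliases


def restore(entity_text, aliases):
    """Map an entity value returned by LUIS back to the original token."""
    return aliases.get(entity_text.lower(), entity_text)


if __name__ == "__main__":
    text, aliases = normalize("what is price of bt.a")
    print(text)                        # this is what would be sent to LUIS
    print(restore("btdota", aliases))  # applied to the entity LUIS returns
```

Aliases can collide with real words in rare cases, so a stricter scheme (for example, a lookup table of known tickers) may be safer in production.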
I need to process sentences with website addresses in them so I had to deal with a few different symbols. I found a technique that works for me, but it is not very elegant.
I am assuming here that you have an entity setup to represent the "stock symbol"
Here is what this would look like in your case.
Detect the cases when LUIS gets the "stock symbol" entity wrong. In your case this may be whenever it ends in a period.
When LUIS gets the entity wrong, tokenize the raw query using spaces as the separator. Grab the proper token by looking for a match with the wrong partial token.
So for your example....
"what is price of bt.a"
You would see the "stock symbol" entity of "bt." and know that it is wrong because it ends in a period. You would then tokenize the query and look for tokens that contain "bt.". This would identify "bt.a" as the requested symbol.
It's not pretty, but in the case of website addresses it has been reliable.
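The two steps above can be sketched in a few lines. The function name repair_symbol is hypothetical, and the sketch assumes the entity text and the raw query are both available from the LUIS response.

```python
def repair_symbol(query, entity):
    """Recover the full token when LUIS returns a partial one ending in '.'."""
    # LUIS may insert spaces around punctuation, so drop them first.
    partial = entity.replace(" ", "").lower()
    if not partial.endswith("."):
        return entity  # looks fine, keep what LUIS returned
    # Tokenize the raw query on whitespace and look for a longer token
    # that contains the partial one.
    for token in query.split():
        if partial in token.lower() and len(token) > len(partial):
            return token
    return entity


if __name__ == "__main__":
    print(repair_symbol("what is price of bt.a", "bt."))
```

If the query ends a sentence with a plain symbol followed by a period, no longer token exists and the entity is returned unchanged.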
