Pattern recognition in LUIS app : Please find {Entity1} {Entity2}{Entity3} documents - azure-language-understanding

We have created a LUIS app and defined a pattern and utterances in one of its intents.
The pattern is "Please find {Entity1} {Entity2}{Entity3} documents",
where all of the entities are simple entities.
We have tested a number of utterances such as "Please find abc bcd tst documents".
For some examples entity recognition works properly, but for others it does not.
Please suggest a suitable way to get correct entity recognition for every utterance
that starts with "Please find ..." and ends with "documents".
Swati

When matching a pattern, LUIS looks for entities first. Simple entities are machine-learned, so LUIS can only find them once it has learned what they are. It won't be able to recognize the simple entities in your pattern unless those entities also appear in example utterances, so that LUIS understands what they represent. You can read more about patterns here.
I think it's likely that you don't really need a pattern in your case. Go ahead and try adding utterances to your intent like "Please find abc bcd tst documents" and then mark the entities in the utterances.
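To make the labeling suggestion concrete, here is a minimal Python sketch that builds a labeled example utterance as a plain dictionary. The field names (text, intentName, entityLabels, entityName, startCharIndex, endCharIndex) and the inclusive end index reflect my understanding of the shape the LUIS authoring API expects, so treat them as assumptions and check them against the API reference. The intent name FindDocuments and the helper label_utterance are hypothetical, and nothing here calls LUIS.

```python
def label_utterance(text, intent, entity_values):
    """Build a labeled example; entity_values maps entity name -> substring."""
    labels = []
    for name, value in entity_values.items():
        start = text.find(value)  # first occurrence of the entity text
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        labels.append({
            "entityName": name,
            "startCharIndex": start,
            # Assumption: the end index is inclusive.
            "endCharIndex": start + len(value) - 1,
        })
    return {"text": text, "intentName": intent, "entityLabels": labels}


example = label_utterance(
    "Please find abc bcd tst documents",
    "FindDocuments",  # hypothetical intent name
    {"Entity1": "abc", "Entity2": "bcd", "Entity3": "tst"},
)
```

Adding a handful of such labeled utterances with different values in each slot gives the machine-learned entities something to generalize from.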

Related

How do you train LUIS to recognize general nouns?

I have a bot that was initially based on the Zummer example.
I would like the Search intent to pick up practically any topic you could search for as an entity.
I tried training using several example phrases but it became apparent that although the intent is correctly detected, the ArticleTopic entity only picks up the specific nouns provided as examples.
I also tried creating a regex entity using .* but this matches every complete utterance.
Is there a general approach to tell LUIS to capture some part of an utterance regardless of its contents?
Examples of what I would like to support:
Search for *, What is *, What are *, Tell me about *, etc.
You should use patterns together with Pattern.any, an entity type that is specific to patterns. This entity returns all of the text found at the position where the entity is marked in the pattern.
It should look something like this:
Search for Entity
What is Entity
What are Entity
This issue could be covered with the new Patterns feature (using pattern.any).
This feature helps in labeling the noun following a specific pattern.
If you add pattern.any entities to your LUIS app, you can't label utterances with these entities. They are only valid in patterns. Here is another example which explains how the pattern.any feature resolves the issue of multi-word entity handling. I have reproduced your issue and it works. Hope this helps!!
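As a rough local illustration of what a Pattern.any slot does, the sketch below turns templates such as "Search for {ArticleTopic}" into regular expressions and captures whatever text fills the slot. This is only a stand-in to show the idea: it is not how LUIS is implemented and it calls no LUIS API. The names TEMPLATES, template_to_regex and extract_topic are hypothetical.

```python
import re

# Templates written in the same curly-brace style as LUIS patterns.
TEMPLATES = [
    "Search for {ArticleTopic}",
    "What is {ArticleTopic}",
    "What are {ArticleTopic}",
    "Tell me about {ArticleTopic}",
]


def template_to_regex(template):
    """Turn 'Search for {X}' into a regex with a named group X for the slot."""
    # re.split with a capturing group alternates: literal, slot name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)          # literal text
        else:
            pattern += f"(?P<{part}>.+?)"       # free-text slot
    # Allow optional trailing punctuation after the slot.
    return re.compile(rf"^{pattern}[?.!]*$", re.IGNORECASE)


COMPILED = [template_to_regex(t) for t in TEMPLATES]


def extract_topic(utterance):
    """Return the text captured by the first matching template, else None."""
    for regex in COMPILED:
        match = regex.match(utterance.strip())
        if match:
            return match.group("ArticleTopic")
    return None
```

The slot captures multi-word text regardless of its content, which is the behaviour the question asks for; the surrounding fixed words are what anchor the match.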

How to detect names as entities using LUIS in Microsoft Bot Framework

I am using luis.ai, which is offered as part of Microsoft Cognitive Services, in my project. I need to detect names using LUIS. For this, I have been using the phrase list feature and have added some names to the list. But as we all know, a list of names is never exhaustive. So, no matter how many names I add, since they don't follow a specific pattern, entity detection fails when I test with new names. I want to know if there's any other way to have LUIS detect names of people.
Please let me know if you have a solution to this problem.
LUIS can be used to recognize and extract intents and entities from utterances, but in my experience it cannot be expected to identify a person's name 100% of the time, because a person's name could be anything.
As you did, adding names that are not well recognized to a phrase list is one solution. Besides, this GitHub issue, "Identifying the Names from the sentence using LUIS", discussed a similar question, and as cahann mentioned, you can add and label more example utterances that contain the poorly recognized names to make your LUIS app recognize names better.

LUIS entity not recognised

I trained my LUIS model to recognize an intent called "getDefinition" with example utterances such as: "What does BLANK mean" or "Can you explain BLANK to me?". It recognizes the intent correctly. I also added an entity called "topic" and trained it to recognize what topic the user is asking about. The problem is that LUIS only recognizes the exact topic the user is asking about if I used that specific term in one of the utterances before.
Does this mean I have to train it with all the possible terms a user can ask about or is there some way to have it recognize it anyway?
For example when I ask "What does blockchain mean" it correctly identifies the entity (topic) as blockchain because the word blockchain is in the utterance. But if I ask the same version of the question about another topic such as "what does mining mean", it doesn't recognize that as the entity.
Using a list or phrase list doesn't seem to solve the problem. I want to eventually have thousands of topics the bot responds to, and entering each topic in a list is tedious and inconvenient. Is there a way LUIS can recognize that it's a topic just from the context?
What is the best way to go about this?
Same Doubt, Bit Modified. Sorry for Reposting this here.
At the moment LUIS cannot extract an entity based on the intent alone. Phrase lists will help LUIS extract tokens that don't have explicit training data. For example, training LUIS with the utterance "What does blockchain mean?" does not mean that it will extract "mining" from "What does mining mean?" unless "mining" was included in either a phrase list or a list entity. In addition to what Nicolas R said about tagging different values, another thing to consider is that words not commonly found (or not found at all) in the corpora that LUIS uses for each culture will likely not be extracted without assistance (either via a phrase list or a list entity).
For example, if you created a LUIS application that dealt with units of measurement, while you might not be required to train it with units such as inch, meter, kilometer or ounce; you would probably have to train it with words like milliradian, parsec, and even other cultural spellings like kilometre. Otherwise these words would most likely not be extracted by LUIS. If a user provided the tokens "Planck unit", LUIS might provide a faulty extraction where it returns "unit" as the measurement entity instead of "Planck unit".

How to handle misspelled LUIS entity

Let's suppose it is a movie bot. I added an entity MovieName and a phrase list containing movies. One of the movie names is "Star Wars"; if the user misspells it as "Stra Wra", how can I tackle this issue? Will the Bing spell check service help for non-English movie names? I'm not sure.
LUIS will not be able to capture misspelled entities by itself unless you provide examples with misspelled entities, which is not practical.
So you need to feed corrected utterances to LUIS.
As for the Bing spelling correction service, you will have to try it yourself, but I guess it will handle your case.
If there are common misspellings that you expect to be repeated, you could add them to an interchangeable phrase list feature. That will help with the prediction of these misspelled entities.
There are multiple ways to solve this:
1. Use synonyms with the most common mistakes.
2. Have another step in your pipeline (before going to LUIS) which matches user input to possible options and corrects them (even a self-made solution would do great, but you can also try to add ElasticSearch with fuzzy queries).
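A minimal sketch of such a pre-LUIS correction step, using Python's standard difflib module as a stand-in for a self-made fuzzy matcher (it is not ElasticSearch, and a real catalogue would be far larger). The list KNOWN_MOVIES, the function name correct_movie_name and the 0.6 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical catalogue; in practice this would come from your own data.
KNOWN_MOVIES = ["Star Wars", "Inception", "The Matrix"]


def correct_movie_name(raw, known=KNOWN_MOVIES, cutoff=0.6):
    """Return the closest known title, or None if nothing is similar enough."""
    # Compare in lower case, but return the title with its original casing.
    lowered = {title.lower(): title for title in known}
    matches = difflib.get_close_matches(
        raw.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

This works on a candidate span such as "Stra Wra"; finding that span inside a longer utterance is left out here. A lower cutoff catches more typos at the cost of more false matches, so it is worth tuning against real user input.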

LUIS inserts whitespace in utterances when punctuation present causing entity getting incorrectly parsed

I am playing around with the LUIS stock ticker example here, GitHub MicrosoftBotBuilder Example. It works well and the entity in the utterances is identified, but there are stock tickers in the world that have periods in them, such as bt.a
By default LUIS pre-processes utterances by inserting word breaks around punctuation characters, so an utterance of "what is price of bt.a" becomes "what is price of bt. a" and LUIS thinks the entity is "bt" instead of "bt.a"
Does anyone know how to get around this? Thx
This is how LUIS tokenizes utterances and I don't think it'll change in the near future.
I think you can investigate one of the 2 solutions:
1. Preprocess the utterance and normalize entities with punctuation (maybe save them in a map), and reverse the process once LUIS has been called and the entities have been extracted.
2. Use phrase list features and add the entities that LUIS misses in their tokenized form, label the entity tokens in the utterance, and retrain the model (I suggest you try that in a clone of your app, so you don't lose any current progress).
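The first option could be sketched in Python roughly as follows. The regular expression for what counts as a dotted ticker, the "dot" placeholder scheme and the function names mask_tickers and unmask_entity are all assumptions made for illustration; the call to LUIS itself is omitted.

```python
import re

# Hypothetical rule: a ticker is letters, a period, then letters (e.g. "bt.a").
TICKER_WITH_DOT = re.compile(r"\b[A-Za-z]+\.[A-Za-z]+\b")


def mask_tickers(utterance):
    """Replace dotted tickers with punctuation-free placeholders.

    Returns the masked utterance and a map from placeholder to original text.
    """
    mapping = {}

    def substitute(match):
        original = match.group(0)
        placeholder = original.replace(".", "dot").lower()
        mapping[placeholder] = original
        return placeholder

    masked = TICKER_WITH_DOT.sub(substitute, utterance)
    return masked, mapping


def unmask_entity(entity_text, mapping):
    """Map an entity extracted from the masked utterance back to its original form."""
    return mapping.get(entity_text.lower(), entity_text)
```

The masked utterance is what would be sent to LUIS, and unmask_entity would be applied to each entity value that comes back. A placeholder scheme like this can collide with real words, so a production version would need a safer encoding.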
I need to process sentences with website addresses in them so I had to deal with a few different symbols. I found a technique that works for me, but it is not very elegant.
I am assuming here that you have an entity setup to represent the "stock symbol"
Here is what this would look like in your case.
1. Detect the cases when LUIS gets the "stock symbol" entity wrong. In your case this may be whenever it ends in a period.
2. When LUIS gets the entity wrong, tokenize the raw query using spaces as the separator. Grab the proper token by looking for a match with the wrong partial token.
So for your example....
"what is price of bt.a"
You would see the "stock symbol" entity of "bt." and know that it is wrong because it ends in a period. You would then tokenize the query and look for tokens that contain "bt.". This would identify "bt.a" as the requested symbol.
It's not pretty, but in the case of website addresses it has been reliable.
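A minimal Python sketch of the repair step described above, assuming the "wrong" case can be detected by a trailing period. The function name repair_entity is hypothetical, and the entity value is assumed to be available as a plain string from the LUIS response.

```python
def repair_entity(raw_query, entity_text):
    """If the extracted entity looks truncated, return the full whitespace token."""
    # Heuristic from the answer: a trailing period signals a bad extraction.
    if not entity_text.endswith("."):
        return entity_text
    needle = entity_text.lower()
    for token in raw_query.split():
        # Pick the first token that contains the partial entity but is longer.
        if needle in token.lower() and token.lower() != needle:
            return token
    return entity_text
```

For the query "what is price of bt.a" and an extracted entity of "bt.", this returns "bt.a"; entities that do not end in a period pass through unchanged.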

Resources