LUIS doesn't recognize Japanese intent - azure-language-understanding

The English models recognize the sample utterances correctly, but some Japanese models do not.
In particular, once we add entities to the intent, the Japanese models score very poorly against the sample utterances.

Japanese isn't fully supported yet; Named Entity Recognition is currently in Preview.
It's also important to note:
Because LUIS does not provide syntactic analysis and will not understand the difference between Keigo and informal Japanese, you need to incorporate the different levels of formality as training examples for your applications.
でございます is not the same as です.
です is not the same as だ.
and that:
LUIS breaks an utterance into tokens based on culture.
Several cultures return the entity object with the entity value tokenized. The startIndex and endIndex returned by LUIS in the entity object do not map to the new, tokenized value but instead to the original query in order for you to extract the raw entity programmatically.
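As a rough illustration of the index behavior described above, the sketch below recovers the raw entity text from the original query. It assumes the entity object carries startIndex and endIndex fields and that endIndex is inclusive; the sample query, the tokenized value, and the helper name raw_entity_text are made up for illustration.

```python
def raw_entity_text(query, entity):
    """Slice the original query using the entity's indexes.

    Assumption: endIndex is inclusive, so 1 is added for Python slicing.
    """
    return query[entity["startIndex"]:entity["endIndex"] + 1]


# Hypothetical response fragment: the entity value comes back tokenized,
# but the indexes still point into the original query string.
query = "東京の天気を教えて"
entity = {"entity": "東 京", "type": "Location", "startIndex": 0, "endIndex": 1}

print(raw_entity_text(query, entity))
```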
So unfortunately, LUIS doesn't have great support for Japanese at this time. The LUIS team is hoping to have major improvements to tokenization by August, 2019, which will greatly improve Japanese recognition.
Here are a few things you can try to improve your app:
Ensure that when you create the LUIS app, you select Japanese for the Culture
Ensure that you provide training examples for both Keigo and informal Japanese
Follow the Best Practices Docs - this is VERY helpful for generally improving your LUIS app
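One way to keep the formality levels covered is to store every variant of an utterance under the same intent before adding them as training examples. The sketch below is a minimal Python illustration; the intent name, the sample sentences, and the dictionary shape (text, intentName, entityLabels) are assumptions, so check them against the format your LUIS version expects.

```python
# Hypothetical intent with keigo, polite, and informal variants of one request.
FORMALITY_VARIANTS = {
    "CheckWeather": [
        "今日の天気を教えていただけますか",  # keigo
        "今日の天気を教えてください",  # polite
        "今日の天気教えて",  # informal
    ],
}


def build_examples(variants_by_intent):
    """Flatten {intent: [utterances]} into one labeled example per utterance."""
    examples = []
    for intent, utterances in variants_by_intent.items():
        for text in utterances:
            examples.append(
                {"text": text, "intentName": intent, "entityLabels": []}
            )
    return examples


examples = build_examples(FORMALITY_VARIANTS)
```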

Related

Handling typos / misspellings on list entities

What is the best practice approach to handle typos / misspelling on LUIS List Entities?
I have intents on LUIS which use a list entity (specifically Company Department - HR, Finance, etc). It is common for users to misspell this when putting forward their utterance. LUIS expects an exact match; it doesn't do a "smart" match, and therefore doesn't pick up the misspelled entity.
a) Using Bing Spell Check is not necessarily a good solution. For example, certain departments are acronyms such as VRPA, and Bing won't correct a typo there.
b) When I used LUIS a year ago, I would pre-process the utterance and use a Levenshtein distance algorithm to fix typos on list entities before feeding them to LUIS.
I would imagine that by now LUIS has some better out of the box way of handling this very common use case.
I'd appreciate input on what the best practice approach is to handle this.
@acambitsis and I exchanged messages via his UserVoice ticket, but I'm going to post the answer here for others.
A combination of Bing and Simple Entities might be what you're looking for, then (they're machine-learned).
I was able to accomplish something close and attached images.
In entities, I created a Simple entity with the role, VRPA. In intents, I created the Show Me intent and added sample utterances "Show me the VRPA" and "Show me the VPRA". I clicked on V**A and selected the Simple Entity:VRPA role. After training, I tried "show me the varp" and it correctly guessed "varp" was the "Simple:VRPA" entity.
You may also find RegEx entities useful. For acronyms, you could do something like: /[vrpa]{4}/i and then any four-letter combination such as VRPA/VPRA/VARP/ARVP would match. (A bare /[vrpa]/i only matches a single character.)
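Before putting a pattern like this into a RegEx entity, it can help to try it locally. The sketch below uses Python's re module, which may not behave identically to the regex engine LUIS uses; the {4} quantifier and the word boundaries are my additions so that only whole four-letter tokens match, and find_acronym_variants is a made-up helper name.

```python
import re

# Four letters drawn from v, r, p, a, as a whole word, case-insensitive.
ACRONYM = re.compile(r"\b[vrpa]{4}\b", re.IGNORECASE)


def find_acronym_variants(utterance):
    """Return every token that looks like a scrambled VRPA."""
    return ACRONYM.findall(utterance)


print(find_acronym_variants("show me the VPRA and varp"))
print(find_acronym_variants("ask HR about parking"))
```

Note that this also accepts strings such as "vvvv", so it is a loose filter rather than a spell checker.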
I highly recommend reading through the Entity Types and Improve App Performance to see if anything jumps out to solve your particular issues.
This may not do exactly what you're looking for. If not, I'd recommend implementing a fuzzy-matching algo of your choice.
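For the fuzzy-matching route, here is a minimal pre-processing sketch that rewrites near-misses to the canonical list entity value before the utterance is sent to LUIS. It uses difflib from the Python standard library (a similarity ratio, not Levenshtein distance); the function name, the department list, and the 0.7 cutoff are assumptions you would tune for your own data.

```python
import difflib


def correct_list_entities(utterance, canonical_values, cutoff=0.7):
    """Replace tokens that closely resemble a known list entity value."""
    lookup = {value.lower(): value for value in canonical_values}
    corrected = []
    for token in utterance.split():
        matches = difflib.get_close_matches(
            token.lower(), list(lookup), n=1, cutoff=cutoff
        )
        # Keep the original token unless something is similar enough.
        corrected.append(lookup[matches[0]] if matches else token)
    return " ".join(corrected)


departments = ["VRPA", "HR", "Finance"]
print(correct_list_entities("show me the VPRA budget", departments))
```

Short values are the weak spot: with this cutoff an ordinary word like "her" is close enough to "HR" to be rewritten, so you may want a minimum length or an exact-match rule for two-letter entries.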

How to split/extract specific entities automatically to MS Luis?

I am currently working with MS LUIS.ai.
My string/utterance contains both English and Chinese.
Here is the problem:
When a sentence is entirely in English, it works fine in LUIS, probably because the sentence is composed of separate words that are split by spaces.
However, in Chinese (both Traditional and Simplified), a sentence is composed of words that are concatenated and difficult to split.
For example, in English I can write:
I love you so much: There are 5 words here. In LUIS I can select I love you and turn it into an entity. Later on, when more utterances containing I love you go into LUIS, it can identify the related intent easily.
However, in Chinese if I write:
我很喜歡你: which has the same meaning as in English above. Under LUIS it will be counted as 1 word. If I want to extract the word 喜歡 (which means "Love/Like"), I cannot do this in LUIS.
Only if I put space around 喜歡 like this: 我很 喜歡 你 will I be able to select 喜歡 as a particular entity.
My Question:
Are there any ways/methods/tricks I can use so that, when someone enters a joined string like the Chinese example above, LUIS will identify specific words as entities automatically, without any manual change?
Thank you very much in advance for all your help.
To perform machine learning, LUIS breaks an utterance into tokens based on culture. We cannot suppress tokenization. LUIS tokenizes Chinese at the character level and returns the tokenized entity, whereas English is tokenized at every space or special character. In the zh-cn culture, LUIS expects the simplified Chinese character set instead of the traditional character set.
Hope this helps!!
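To make the character-level behavior concrete, the sketch below mimics it in plain Python and shows how an entity span could be located by character index in an unspaced utterance. This is only an illustration and not the LUIS tokenizer; the helper names and the inclusive-end index convention are assumptions.

```python
def char_tokens(utterance):
    """Approximate character-level tokenization: one token per character."""
    return [ch for ch in utterance if not ch.isspace()]


def label_span(utterance, entity_text):
    """Find an entity inside an unspaced utterance by character index.

    Returns None when the entity text does not occur.
    Assumption: the end index is inclusive.
    """
    start = utterance.find(entity_text)
    if start == -1:
        return None
    return {"startCharIndex": start, "endCharIndex": start + len(entity_text) - 1}


print(char_tokens("我很喜歡你"))
print(label_span("我很喜歡你", "喜歡"))
```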

How to detect names as entities using LUIS in Microsoft Bot Framework

I am using luis.ai which is offered as a part of Microsoft Cognitive Services, in my project. I have a requirement of detecting names using LUIS. For the same, I have been using the phrase list feature. I have added some names in the list. But as we all know, the names list is never exhaustive. So, no matter how many names I add, since they don't have a specific pattern, when I test with some new names, the entity detection fails. I want to know if there's any other way in which we can have LUIS detect names of people.
Please let me know if you have a solution to this problem.
LUIS can recognize and extract intents and entities from utterances, but in my experience it may not reliably identify a person's name, because a person's name could be anything.
As you did, adding poorly recognized names to a phrase list can be one solution. In addition, the GitHub issue "Identifying the Names from the sentence using LUIS" discusses a similar question, and as cahann mentioned there, you can add and label more example utterances that contain poorly recognized names to help your LUIS app recognize names better.

LUIS entity not recognised

I trained my luis model to recognize an intent called "getDefinition" with example utterances such as: "What does BLANK mean" or "Can you explain BLANK to me?". It recognizes the intent correctly. I also added an entity called "topic" and trained it to recognize what topic the user is asking about. The problem is that luis only recognizes the exact topic the user is asking about if I used that specific term in one of the utterances before.
Does this mean I have to train it with all the possible terms a user can ask about or is there some way to have it recognize it anyway?
For example when I ask "What does blockchain mean" it correctly identifies the entity (topic) as blockchain because the word blockchain is in the utterance. But if I ask the same version of the question about another topic such as "what does mining mean", it doesn't recognize that as the entity.
Using a list or phrase list doesn't seem to solve the problem. I eventually want to have thousands of topics the bot responds to, and entering each topic in a list is tedious and inconvenient. Is there a way LUIS can recognize that it's a topic just from the context?
What is the best way to go about this?
Same Doubt, Bit Modified. Sorry for Reposting this here.
At the moment LUIS cannot extract an entity based only on the intent. Phrase lists will help LUIS extract tokens that don't have explicit training data. For example, training LUIS with the utterance "What does blockchain mean?" does not mean that it will extract "mining" from "What does mining mean?" unless "mining" was included in either a phrase list or a list entity. In addition to what Nicolas R said about tagging different values, another thing to consider is that words not commonly found (or not found at all) in the corpora that LUIS uses for each culture will likely not be extracted without assistance (either via a phrase list or a list entity).
For example, if you created a LUIS application that dealt with units of measurement, while you might not be required to train it with units such as inch, meter, kilometer or ounce; you would probably have to train it with words like milliradian, parsec, and even other cultural spellings like kilometre. Otherwise these words would most likely not be extracted by LUIS. If a user provided the tokens "Planck unit", LUIS might provide a faulty extraction where it returns "unit" as the measurement entity instead of "Planck unit".
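If you do end up labeling many topic values, the examples can be generated from templates instead of typed by hand. The sketch below is one way to do that in Python; the templates come from the question, while the output shape (text, intentName, entityLabels with inclusive character indexes) and the function name are assumptions to verify against the format you import into LUIS.

```python
TEMPLATES = ["What does {topic} mean", "Can you explain {topic} to me?"]


def labeled_examples(topics, intent="getDefinition", entity="topic"):
    """Build one labeled utterance per (template, topic) pair."""
    examples = []
    for template in TEMPLATES:
        # The entity starts where the placeholder starts in the template.
        start = len(template.split("{topic}")[0])
        for topic in topics:
            examples.append({
                "text": template.format(topic=topic),
                "intentName": intent,
                "entityLabels": [{
                    "entityName": entity,
                    "startCharIndex": start,
                    "endCharIndex": start + len(topic) - 1,  # inclusive
                }],
            })
    return examples


for example in labeled_examples(["blockchain", "mining"]):
    print(example["text"], example["entityLabels"][0])
```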

Entity not recognized

I trained my luis model to recognize an intent called "requestDefintion" with example utterances such as: "What does BLANK mean" or "Can you explain BLANK to me?".
It recognizes the intent correctly. I also added an entity called "topic" and trained it to recognize what topic the user is asking about.
The problem is that luis only recognizes the exact topic the user is asking about if I used that specific term in one of the utterances before.
Does this mean I have to train it with all the possible terms a user can ask about or is there some way to have it recognize it anyway?
For example when I ask "What does blockchain mean" it correctly identifies the entity (topic) as blockchain because the word blockchain is in the utterance. But if I ask the same version of the question about another topic such as "what does mining mean", it doesn't recognize that as the entity.
What is the best way to go about this?
Does this mean I have to train it with all the possible terms a user can ask about or is there some way to have it recognize it anyway?
You can try to use phrase list features, which can help LUIS recognize intents and entities. For example, you can create a phrase list named "topic" that contains values such as BLANK, blockchain, and mining.
My test with the utterance "what does mining mean":
Using phrase list, the score is 0.94
Not using phrase list, the score is 0.77
Note: If you define too many intents, it becomes harder for LUIS to classify utterances correctly, so avoid defining more intents than you need.
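If the phrase list has to be built from a long list of topics rather than typed in the portal, the comma-separated value can be assembled in code. This is a sketch only: the field names name, phrases, and isExchangeable reflect my understanding of the phrase list body in the LUIS authoring API and should be treated as assumptions, and no request is actually sent here.

```python
import json


def phrase_list_payload(name, phrases, exchangeable=True):
    """Assemble a phrase list body with a comma-separated phrases string."""
    cleaned = [p.strip() for p in phrases if p.strip()]
    return {
        "name": name,
        "phrases": ",".join(cleaned),
        "isExchangeable": exchangeable,  # assumed field name
    }


payload = phrase_list_payload("topic", ["blockchain", " mining ", ""])
print(json.dumps(payload))
```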
