LUIS built-in geography type sometimes recognizes a city, but other times doesn't - azure-language-understanding

I'm a bit confused. I'm using LUIS's built-in geographyV2 type.
My utterances are things like "are there any part time cashier positions near houston?" (recognized) or "do you have any part time cashier jobs within 10 miles of houston?" (not recognized).
If I hover over the unrecognized instance of "houston," I don't have the option to tag it as a geographyV2 instance (if I try "browse pre-built entities", it doesn't show geographyV2, I guess since that is already one of my types).
Is there any way I can train it better to recognize houston in the 2nd case?
Seems like some cities don't get picked up at all:
While others are detected without a problem:
If you have any tips, please let me know. This is the first time I've used LUIS. Overall, I'm very impressed!
Thanks
Updates based on Steven's suggestions:
Now I'm able to get Anchorage and Houston recognized. But this introduces a problem with Los Angeles. It is getting extracted as two entities:
Similar issue for St. Louis (it wants to tokenize "St" and "Louis" separately).
Sorry for being such a n00b :-)

I have a somewhat similar issue. I have the utterance:
"what is the price of diesel in Latin America"
Latin America is not tagged as geographyV2!
I tried Asia and same thing!
I tried North America, South America, South Africa, Middle East and those worked and were tagged!
I wondered - why the inconsistency?
I looked over the docs and here is a suggestion:
The behavior of prebuilt entities can't be modified but you can improve resolution by adding the prebuilt entity as a feature to a machine-learning entity or sub-entity.
Here is the link : LUIS DOC
Here is what I have come up with to resolve it:

What is the difference between a concept and a label in XBRL, and do all listed companies share the same US GAAP labels?

Here is Tesla's company facts data from the SEC's RESTful API:
https://data.sec.gov/api/xbrl/companyfacts/CIK0001318605.json
You can see all the labels under 'facts' > 'us-gaap', such as:
AccountsAndNotesReceivableNet
AccountsPayableCurrent
AccountsReceivableNetCurrent
AccretionAmortizationOfDiscountsAndPremiumsInvestments
Do all listed companies share the same us-gaap label names?
Can every company create its own customized us-gaap label names?
According to the official definition, a concept in XBRL is "a taxonomy element that provides the meaning for a fact":
https://www.xbrl.org/guidance/xbrl-glossary/
What is the difference between a concept in XBRL and a us-gaap label?
The short answer is yes.
First, a small detail:
AccountsAndNotesReceivableNet
AccountsPayableCurrent
AccountsReceivableNetCurrent
AccretionAmortizationOfDiscountsAndPremiumsInvestments
These are not labels, these are local names of concepts. Labels are something different, human readable, for example "Accounts and notes receivable, net" would be a label. Labels are attached with the label linkbase.
The more complete names (called QNames) of these concepts are:
us-gaap:AccountsAndNotesReceivableNet
us-gaap:AccountsPayableCurrent
us-gaap:AccountsReceivableNetCurrent
us-gaap:AccretionAmortizationOfDiscountsAndPremiumsInvestments
where the us-gaap prefix is bound with the US GAAP namespace, which changes every year and is, for 2021:
http://fasb.org/us-gaap-std/2021-01-31
This makes explicit that these concepts are not maintained by companies, but by the Financial Accounting Standards Board. Thus, all companies filing their reports into the EDGAR system share these concepts.
Two important points:
Companies are allowed to create their own concepts. These are called extension concepts. You will recognize them because they are in a company namespace, not in the US GAAP namespace. Their prefix will not be us-gaap, but some company-specific prefix. These concepts are unique to each company.
An example for Tesla is:
tsla:AccruedAndOtherCurrentLiabilities
Concepts in the US GAAP taxonomy are updated every year, i.e., some get added, some get deprecated, some are removed. However, the FASB tries to maintain consistency across years, i.e., a concept will not suddenly change its semantics from one year to the next.
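To make the distinction between local names, QNames and labels concrete, here is a minimal sketch. It assumes that each concept entry under 'facts' > 'us-gaap' in the company facts JSON carries a "label" field; the SAMPLE dictionary is hand-written to mirror that shape and is not real data. The function names and the list of standard prefixes are hypothetical. Note that a prefix is only a shorthand bound to a namespace, so checking the prefix is a heuristic; a strict check would compare namespaces.

```python
# Hand-written sample mirroring the assumed shape of a companyfacts response.
# The label text is illustrative only, not copied from a real filing.
SAMPLE = {
    "facts": {
        "us-gaap": {
            "AccountsPayableCurrent": {"label": "Accounts Payable, Current"},
            "AccountsAndNotesReceivableNet": {"label": "Accounts and notes receivable, net"},
        }
    }
}


def concept_labels(company_facts, taxonomy="us-gaap"):
    """Map each concept QName (prefix:localName) to its human-readable label."""
    concepts = company_facts.get("facts", {}).get(taxonomy, {})
    return {f"{taxonomy}:{name}": entry.get("label") for name, entry in concepts.items()}


def is_extension(qname, standard_prefixes=("us-gaap", "dei")):
    """Heuristic: treat any prefix outside the standard ones as a company extension."""
    prefix, _, _ = qname.partition(":")
    return prefix not in standard_prefixes


labels = concept_labels(SAMPLE)
print(labels["us-gaap:AccountsPayableCurrent"])                # the label, not the local name
print(is_extension("tsla:AccruedAndOtherCurrentLiabilities"))  # company-specific prefix
print(is_extension("us-gaap:AccountsPayableCurrent"))          # shared FASB concept
```

The keys of the returned dictionary are what the question called "labels" (really concept names), and the values are the actual labels.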

How does the "The values are interchangeable" option in Phrase List work in LUIS?

I've gone through the documentation and tried to understand the Phrase List feature. Although I'm sure of the purpose of the Phrase List feature, I couldn't quite grasp the purpose of the "interchangeable" option intuitively.
Any thorough explanation would be appreciated.
@Srichakradhar, at your suggestion, I'm posting the answer to your Gitter question here on Stack Overflow as well, to benefit the community as a whole:
"...regarding your question on phrase lists, happy to speak at a high level about what the feature does :)
@srichakradhar
So ultimately the goal with LUIS is to understand the meaning of the user’s input (utterance), and through calculations, it returns to you the value of how confident it is about the meaning of the input. Using phrase lists is one of the ways to improve the accuracy of determining the meaning of the user’s utterance. More specifically, adding features to a phrase list can put more weight on the score of an intent or entity.
Using a couple of examples to illustrate the high-level concept of how features help determine intent/entity score, and in turn predict the user’s utterance’s meaning:
For example, if I wanted to describe a class called Tablet, features I could use to describe it could include screen, size, battery, color, etc. If an utterance mentions any of the features, it’ll add points/weight to the score of predicting that the utterance’s meaning is describing Tablet. However, features that would be good to include in a phrase list are words that are maybe foreign, proprietary, or perhaps just rare. For example, maybe I would add, “SurfacePro”, “iPad”, or “Wugz” (a made-up tablet brand) to the phrase list of Tablet. Then if a user’s utterance includes “Wugz”, more points/weight would be put onto predicting that Tablet is the right entity to an utterance.
Or maybe the intent is Book.Flight and features include “Book”, “Flight”, “Cairo”, “Seattle”, etc. And the utterance is “Book me a flight to Cairo”, points/weight towards the score of Book.Flight intent would be added for “Book”, “flight”, “Cairo”.
Now, regarding interchangeable vs. non-interchangeable phrase lists.
Maybe I had a Cities phrase list that included “Seattle”, “Cairo”, “L.A.”, etc. I would make sure that the phrase list is non-interchangeable, because it would indicate that yes “Seattle” and “Cairo” are somehow similar to one-another, however they are not synonyms—I can’t use them interchangeably or rather one in place of the other. (“book flight to Cairo” is different from “book flight to Seattle”)
But if I had a phrase list of Coffee that included features “Coffee”, “Starbucks”, “Joe”, and marked the list as interchangeable, I’m specifying that the features in the list are interchangeable. (“I’d like a cup of coffee” means the same as “I’d like a cup of Joe”)
For more on Phrase Lists - Phrase List features in LUIS
For more on improving prediction - Tutorial: Add phrase list to improve predictions"
Taken from documentation (here):
A phrase list may be interchangeable or non-interchangeable. An
interchangeable phrase list is for values that are synonyms, and a
non-interchangeable phrase list is intended for values that aren't
synonyms but are similar in another way.
There is also a great reply here on MSDN:
Choose "Exchangeable" when the list of words or phrases in your feature
form a class or group -- for example, months like "January",
"February", "March"; or names like "John", "Mary", "Frank". These
features are "exchangeable" in the sense that an utterance where one
word/phrase appears would be labeled similarly if the word/phrase were
exchanged with another. For example, if "show the calendar for January" has the same intent as "show the calendar for February", this
suggests choosing "exchangeable".
Choose "Not exchangeable" for words/phrases that are useful in your
domain, but which do not form a class or group. For example, the
words "calendar", "email", "show", and "send" might be relevant to
your domain, but might all be associated with different intents, like
"show my calendar" or "send an email".
If you're not sure, you can try either and see if there's any
difference in performance.
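In practice the option is a single boolean on the phrase list. The sketch below only builds a request body and sends nothing. The field names (`name`, `phrases`, `isExchangeable`) are my recollection of the LUIS v2.0 authoring API's create-phrase-list request and should be treated as an assumption to verify against the API reference; the helper name is hypothetical.

```python
import json


def phrase_list_payload(name, phrases, interchangeable):
    """Build a phrase-list request body (field names are assumed, see above)."""
    return {
        "name": name,
        # The phrases are sent as one comma-separated string,
        # so an individual phrase must not contain a comma.
        "phrases": ",".join(phrases),
        "isExchangeable": interchangeable,
    }


# Synonyms: "a cup of coffee" means the same as "a cup of Joe".
coffee = phrase_list_payload("Coffee", ["Coffee", "Starbucks", "Joe"], True)

# Useful domain words that do not form a class and may belong to different intents.
domain_words = phrase_list_payload("DomainWords", ["calendar", "email", "show", "send"], False)

print(json.dumps(coffee, indent=2))
print(json.dumps(domain_words, indent=2))
```

As the last quoted reply says, if you're not sure which value to use, flipping the flag and comparing prediction scores is a reasonable experiment.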

LUIS entity not recognised

I trained my LUIS model to recognize an intent called "getDefinition" with example utterances such as: "What does BLANK mean" or "Can you explain BLANK to me?". It recognizes the intent correctly. I also added an entity called "topic" and trained it to recognize what topic the user is asking about. The problem is that LUIS only recognizes the exact topic the user is asking about if I used that specific term in one of the utterances before.
Does this mean I have to train it with all the possible terms a user can ask about or is there some way to have it recognize it anyway?
For example when I ask "What does blockchain mean" it correctly identifies the entity (topic) as blockchain because the word blockchain is in the utterance. But if I ask the same version of the question about another topic such as "what does mining mean", it doesn't recognize that as the entity.
Using a list or phrase list doesn't seem to solve the problem. I eventually want to have thousands of topics the bot responds to, and entering each topic in a list is tedious and inconvenient. Is there a way LUIS can recognize that it's a topic just from the context?
What is the best way to go about this?
Same Doubt, Bit Modified. Sorry for Reposting this here.
At the moment LUIS cannot extract an entity just based on the intent. Phrase lists will help LUIS extract tokens that don't have explicit training data. For example, training LUIS with the utterance "What does blockchain mean?" does not mean that it will extract "mining" from "What does mining mean?" unless "mining" was included in either a phrase list or a list entity. In addition to what Nicolas R said about tagging different values, another thing to consider is that using words not commonly found (or not found at all) in the corpora that LUIS uses for each culture will likely result in LUIS not extracting the words without assistance (either via a phrase list or a list entity).
For example, if you created a LUIS application that dealt with units of measurement, while you might not be required to train it with units such as inch, meter, kilometer or ounce; you would probably have to train it with words like milliradian, parsec, and even other cultural spellings like kilometre. Otherwise these words would most likely not be extracted by LUIS. If a user provided the tokens "Planck unit", LUIS might provide a faulty extraction where it returns "unit" as the measurement entity instead of "Planck unit".

LUIS sometimes doesn't understand the intent

We are developing a library bot using the Microsoft Bot Framework.
We have created one intent, BookSearch, and two entities, BookName and BookAuthor.
We trained LUIS with simple questions, but it only works for exactly matching questions.
For example, I trained LUIS with "I need book", and that works properly.
But when we write almost the same question, "I need a book", it doesn't match the book intent.
Can anyone help us here? There are many scenarios like this where we found that LUIS only works with exactly matching questions.
One more problem: we have a book name with three words, and we are unable to tag all three words as a BookName entity.
It sounds like your model just needs more training with a variety of sentence structures.
LUIS will match the exact intent when it's been trained but needs more examples to get better with novel utterances. So "I need book" vs "I need a book" should be pretty easy for it to learn with more properly labeled utterances.
As for the title with three words, highlighting them all by clicking and dragging across all three is possible.
You have to add more of the possible questions that a user might ask about books.
Let me give an example to explain this...
I need book
I need a book
What latest book you have?
Can you recommend a book to me
I am looking for book
I am looking for a book
In the above examples, the intent should be FindBook.
I am looking for C# book?
In the above example, the intent should be FindBook. Here the user has mentioned the subject (C#) as well. Subject will be an entity.
I am looking for C# book written by Joseph Albahari, Ben Albahari
In the above example, the intent should again be FindBook. Here the user has mentioned the subject and the writer.
Subject and Writer will be entities.
You have to train your model and feed it more of the possible questions; only then will LUIS work properly.
You can highlight a full sentence as well.

What's needed for NLP?

Assuming that I know nothing about anything and that I'm starting programming TODAY, what would you say I need to learn in order to start working with Natural Language Processing?
I've been struggling with some string parsing methods, but so far they are just annoying me and making me write ugly code. I'm looking for fresh ideas on how to build something like the Remember The Milk API that parses the user's input, to provide fast data entry based not on form fields but on simple one-line phrases.
EDIT: RTM is a to-do list system. To enter a task you don't need to fill in each field (task name, due date, location, etc.). You can simply type a phrase like "Dentist appointment monday at 2PM in WhateverPlace" and it will parse it and fill in all the fields for you.
I don't have any technical constraints, since it's going to be a personal project, but I'm more familiar with the .NET world. Actually, I'm not sure this is a matter of language, but if necessary I'm more than willing to learn a new one to do it.
My project is related to personal finances so the phrases are more like "Spent 10USD on Coffee last night with my girlfriend" and it would fill location, amount of $$$, tags and other stuff.
Thanks a lot for any kind of directions that you might give me!
This does not appear to require full NLP. Simple pattern-based information extraction will probably suffice. The basic idea is to tokenize the text, then recognize/classify certain keywords, and finally recognize patterns/phrases.
In your example, tokenizing gives you "Dentist", "appointment", "monday", "at", "2PM", "in", "WhateverPlace". Your tool will recognize that "monday" is a day of the week, "2PM" is a time, etc. Finally, you can find patterns like [at] [TIME] and [in] [Place] and use those to fill in the fields.
A framework like GATE may help, but even that may be a larger hammer than you really need.
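The tokenize, classify, then match-patterns idea above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: tokens are split on whitespace, and the names `classify` and `parse_task`, the tag set, and the field names are all made up for the example. It will misread task names that themselves contain "in" or "at" followed by a time.

```python
import re

DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}
# Matches times such as "2PM" or "10:30am".
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?(am|pm)$", re.IGNORECASE)


def classify(token):
    """Assign a coarse tag to a single token."""
    if token.lower() in DAYS:
        return "DAY"
    if TIME_RE.match(token):
        return "TIME"
    return "WORD"


def parse_task(text):
    """Fill task fields using the patterns DAY, [at] TIME and [in] PLACE."""
    tokens = text.split()
    tags = [classify(t) for t in tokens]
    fields = {"name": None, "day": None, "time": None, "place": None}
    name_end = len(tokens)  # the task name is everything before the first match
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag == "DAY" and fields["day"] is None:
            fields["day"] = tok
            name_end = min(name_end, i)
        elif tok.lower() == "at" and i + 1 < len(tokens) and tags[i + 1] == "TIME":
            fields["time"] = tokens[i + 1]
            name_end = min(name_end, i)
        elif tok.lower() == "in" and i + 1 < len(tokens):
            # Everything after "in" is taken as the place.
            fields["place"] = " ".join(tokens[i + 1:])
            name_end = min(name_end, i)
            break
    fields["name"] = " ".join(tokens[:name_end]) or None
    return fields


print(parse_task("Dentist appointment monday at 2PM in WhateverPlace"))
```

Adding a new field then means adding one classifier rule and one pattern, which is usually easier to maintain than ad hoc string slicing.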
Have a look at NLTK; it's a good resource for beginner programmers interested in NLP.
http://www.nltk.org/
It is written in Python, which is one of the easier programming languages.
Now that I understand your problem, here is my solution:
You can develop a kind of restricted vocabulary in which all amounts must end with a $ sign and any time must be in the form 00:00 and/or end with AM/PM. For detecting items, you can use a list of objects from an ontology such as OpenCyc, which can provide you with a list of objects such as beer, coffee, bread, and milk. This will help you detect objects in the short phrase. Still, it would be a very fuzzy approach.
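A rough sketch of such a restricted vocabulary, using regular expressions for the amount and time rules. The small KNOWN_ITEMS set merely stands in for a list exported from an ontology, and the function and field names are hypothetical.

```python
import re

# Amounts must end with a $ sign, e.g. "10$" or "4.50$".
AMOUNT_RE = re.compile(r"(\d+(?:\.\d{1,2})?)\$")
# Times are either 00:00 (optionally followed by AM/PM) or a bare hour with AM/PM.
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2}(?:\s?[AP]M)?|\d{1,2}\s?[AP]M)\b", re.IGNORECASE)
# Stand-in for a list of objects taken from an ontology.
KNOWN_ITEMS = {"beer", "coffee", "bread", "milk"}


def parse_expense(text):
    """Extract the amount, the time and any known items from a one-line phrase."""
    amount = AMOUNT_RE.search(text)
    time = TIME_RE.search(text)
    words = re.findall(r"[A-Za-z]+", text.lower())
    return {
        "amount": float(amount.group(1)) if amount else None,
        "time": time.group(1) if time else None,
        "items": [w for w in words if w in KNOWN_ITEMS],
    }


print(parse_expense("Spent 10$ on Coffee at 9:30 PM with my girlfriend"))
```

Anything outside the vocabulary, such as "last night" or "10USD", is simply ignored here, which is where the fuzziness mentioned above comes in.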
