How to find the words in an utterance which were not identified by LUIS - azure-language-understanding

(New to LUIS)
Let us say the intent is "getSports", and LUIS should identify all the sports present in the phrase provided by the user:
"I play table-tennis, baseball, rugby, squash."
LUIS identifies the following entities:
sport : table-tennis
sport : baseball
sport : squash
But for some reason (perhaps it has not been trained for it), it is not able to identify "rugby". Is there a way to get this unidentified text?
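One possible approach, sketched below, is to subtract the character ranges of the recognized entities from the original utterance and look at what is left. This is a minimal sketch, not a LUIS feature: it assumes each entity in the response carries `startIndex` and `endIndex` character offsets into the query, with `endIndex` inclusive. The helper name `uncovered_text` and the hand-written entity list are hypothetical.

```python
def uncovered_text(query, entities, strip_chars=" ,.;:!?"):
    """Return the pieces of `query` that no entity span covers.

    Assumes each entity is a dict with `startIndex` and `endIndex`
    character offsets into `query`, and that `endIndex` is inclusive.
    """
    covered = [False] * len(query)
    for ent in entities:
        start = max(ent["startIndex"], 0)
        end = min(ent["endIndex"], len(query) - 1)
        for i in range(start, end + 1):
            covered[i] = True

    pieces, current = [], []
    for ch, is_covered in zip(query, covered):
        if is_covered:
            # An entity starts here, so close the current uncovered run.
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))

    # Drop runs that are only punctuation or whitespace.
    cleaned = [p.strip(strip_chars) for p in pieces]
    return [p for p in cleaned if p]


# Hand-written stand-in for the entities LUIS returned in the example.
query = "I play table-tennis, baseball, rugby, squash."
entities = [
    {"type": "sport", "startIndex": 7, "endIndex": 18},   # table-tennis
    {"type": "sport", "startIndex": 21, "endIndex": 28},  # baseball
    {"type": "sport", "startIndex": 38, "endIndex": 43},  # squash
]
print(uncovered_text(query, entities))  # ['I play', 'rugby']
```

The leftover pieces still contain ordinary words such as "I play", so deciding which of them is a missed sport needs a further step, for example looking only at the items of the comma-separated list.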

Related

Problems in LUIS

1) In a pattern, LUIS does not let you have more than 3 arguments with the OR operator, e.g. (a|b|c|d) is illegal (why?)
2) In a pattern, is there any way to specify optional free text, something like "I want to [text] {entity}", so that the user can type anything between "to" and {entity}?
3) In a pattern, I cannot make a word optionally plural, e.g. "How to contact the supplier[s]" does not match when the user types "How to contact the suppliers". I had to add "suppliers" to my entity list, which I find inconvenient.
4) When you delete an intent, all its utterances automatically go to the None intent. I think this should be optional, with a prompt such as "Do you want to move the utterances to the None intent?"

geotext library is not picking up the correct names of cities in Python

Hi, I am a newbie in Python. We are trying to find country and city names using the geotext library, but it is not picking up every name correctly. Could anyone please suggest what might be wrong?
While reading data from an email, it picks up "Mobile" as a city, although it is only a label in the signature of the email.
from geotext import GeoText

# Triple quotes are needed because the email text spans several lines.
places = GeoText("""Hi , We need to book a flight from Mumbai to London on 13 Aug throuigh shivaji terminal.
Regards,
xyz
Mobile : 5368536
""")
print(places.cities)
Output : ['Mumbai' ,'Mobile']
Please help.
There are three cities named 'Mobile' in various states of the US. You cannot avoid picking it up, unless you decide to block that specific word as a city; but there could easily be other cities whose names match common words.
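Blocking specific words, as suggested above, could look roughly like the following. This is only a sketch: the blocklist contents and the helper name `filter_cities` are made up for illustration, and the input list stands in for whatever the city detection returned.

```python
# Hypothetical blocklist of words that are real city names but, in this
# email corpus, almost always appear as labels or ordinary words.
BLOCKED_CITY_WORDS = {"mobile"}


def filter_cities(cities, blocked=BLOCKED_CITY_WORDS):
    """Drop detected city names that are on the blocklist (case-insensitive)."""
    return [city for city in cities if city.lower() not in blocked]


# Stand-in for the list of cities detected in the email text.
detected = ["Mumbai", "Mobile"]
print(filter_cities(detected))  # ['Mumbai']
```

The trade-off is the one the answer points out: an email that really is about Mobile, Alabama would lose that city, and every other city whose name is also a common word needs its own blocklist entry.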

LUIS - proper nouns mixed with adjective(s) and partial entity matches

I have two hierarchical entities like this (simplified): "Order::OpenOrder, Order::AnyOrder, Job::OpenJob, Job::AnyJob" for a search application. I'm trying to train LUIS to correctly understand inputs like (a)"acme open orders", (b)"open acme orders", (c)"acme open jobs", (d)"open acme jobs" using utterances.
If I just use the two simplest utterances, "open orders" -> Order::OpenOrder and "open jobs" -> Job::OpenJob, then inputs (a) and (c) work fine. Example (b) finds Order::OpenOrder, but the string "acme" is included in the entity character range. Example (d) does not resolve any entities.
Complicating things, it's also legal to input just "acme orders" or "acme jobs", for which I've trained LUIS with utterances like "blah orders" and "blah jobs", where "orders" and "jobs" are mapped to Order::AnyOrder and Job::AnyJob, respectively. You can also input things like "orders", "open Orders", etc.
Anyway, none of this is working consistently, and I'm wondering if I'm taking the wrong approach to training LUIS to understand adjective-noun pairs where proper nouns can appear between them. Has anyone else had a model like this and could share some advice?
Thanks,
-Erik

Strange results from Google places autocomplete for sequence of repeating letters

This call, https://maps.googleapis.com/maps/api/place/autocomplete/xml?input=qqqqqqq (plus your key), returns addresses like 'qqqqqqqqqq, Florida, USA' and 'qqqqqqqqqqqqqqqqqqqqqqqq - Luizote de Freitas, Uberlândia - State of Minas Gerais, Brazil'. I understand that QQQ might be a valid name, but qqqqqqqqqqqqqqqqqqqqqqqq? It works the same way for any sequence of repeating letters or numbers.
OK, let's say this is Google having bad data. But how do you explain the results for 'www': 'Best Buy, Middlesex Turnpike, Burlington, MA, USA', 'Acton Toyota of Littleton, Great Road, Littleton, MA, USA'? I do not see any sane correlation between 'www' and the results.
You can see similar behaviour in Google Maps, so it's not just the autocomplete API.
Any theories?
When I execute the request https://maps.googleapis.com/maps/api/place/autocomplete/json?input=www&key=MY_API_KEY from my location, I get really weird predictions as well:
Montpellier, France (place ID ChIJsZ3dJQevthIRAuiUKHRWh60, type locality)
Berlin, Germany (place ID ChIJAVkDPzdOqEcRcDteW0YgIQQ, type locality)
Hamburg, Germany (place ID ChIJuRMYfoNhsUcRoDrWe_I9JgQ, type locality)
Munich, Germany (place ID ChIJ2V-Mo_l1nkcRfZixfUq4DAE, type locality)
Vienna, Austria (place ID ChIJn8o2UZ4HbUcRRluiUYrlwv0, type locality)
Note that all of them have the locality type. It does smell like a bug, because I cannot see how the text 'www' could match these predictions. Apparently something is broken on the Google backend and leads to this strange behavior in Places autocomplete.
I can confirm that I see this problem on the Google Maps web site as well.
At this point I believe the best option for us is to send feedback to the Google Maps team and hope they will fix it soon.

Google Place API street type list

I am using Google Places to retrieve addresses, and we want the street ("route" in Google terminology) to be separated into street name and street type. We also want the street type to match an existing column in our database.
But things get difficult because Google Places sometimes uses "XXXX Street" and sometimes "XXXX St".
For instance, this is a typical Google address:
{
administrative_area_level_1: ['short_name', 'VIC'],
locality: ['long_name', 'Carlton'],
postal_code: ['long_name', '3053'],
route: ['long_name', 'Canada Ln'],
street_number: ['short_name', '12'],
subpremise: ['short_name', '13']
}
But it always shows "Canada Lane" in the suggestion box.
It gets even worse when the abbreviation does not match my local data model. For instance, we use "la" instead of "ln" as the abbreviation for lane.
I would appreciate it if anyone could tell me where to find a list of street types (and abbreviations) used by the Google API. Or is there a way to disable the abbreviation option?
Sounds like you're after "street suffixes". These are complicated.
Not only do they change across countries and languages; even within the same country and language they can be used in different ways. Abbreviations can have multiple meanings: "St" can be "Street" or "Saint". Whether abbreviations are used at all depends on subtle rules that also change from place to place.
The same goes for cardinal points (North, South, East, West) that are part of road or street names: "North St" or "N 11th Street"? It's complicated.
If you already have a good number of addresses, and you only care about addresses in English, you could take the last word of each street name as the suffix. When matching against your own data, accept abbreviations rather than trying to expand them.
For instance, don't try to expand "Canada La" into "Canada Lane" so that it matches "Lane". Instead, expand "Lane" into ["Lane", "La", "Ln"] and match suffixes to all values.
Then you'd need a strategy for "collisions", that is, abbreviations that can mean two or more suffixes. These seem to be rare: I can't remember any ("St" isn't one, because "Saint" isn't a suffix), and the USPS list at http://pe.usps.gov/text/pub28/28apc_002.htm doesn't seem to have any.
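The matching idea above, expanding each of your own suffixes into all accepted spellings and comparing the last word of the route against them, could be sketched like this. The variant table is a small made-up sample rather than a list used by Google or USPS, and `split_route` is a hypothetical helper name.

```python
# Hypothetical sample: canonical suffix in the local data model mapped to
# every spelling that should be accepted for it.
SUFFIX_VARIANTS = {
    "Lane": ["Lane", "La", "Ln"],
    "Street": ["Street", "St"],
    "Road": ["Road", "Rd"],
}


def build_lookup(variants):
    """Build a lowercase variant -> canonical suffix table, rejecting collisions."""
    lookup = {}
    for canonical, spellings in variants.items():
        for spelling in spellings:
            key = spelling.lower()
            if key in lookup and lookup[key] != canonical:
                raise ValueError(f"{spelling!r} maps to more than one suffix")
            lookup[key] = canonical
    return lookup


VARIANT_TO_SUFFIX = build_lookup(SUFFIX_VARIANTS)


def split_route(route):
    """Split a route into (street name, canonical suffix).

    The suffix is None when the last word is not a known variant.
    """
    words = route.strip().split()
    if len(words) < 2:
        return route.strip(), None
    last_word = words[-1].rstrip(".").lower()
    canonical = VARIANT_TO_SUFFIX.get(last_word)
    if canonical is None:
        return route.strip(), None
    return " ".join(words[:-1]), canonical


print(split_route("Canada Ln"))    # ('Canada', 'Lane')
print(split_route("Canada Lane"))  # ('Canada', 'Lane')
print(split_route("Broadway"))     # ('Broadway', None)
```

Raising an error on a collision is only one possible strategy; it at least makes an ambiguous abbreviation visible when the table is built instead of silently picking one meaning.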
