How to remove LUIS entity marker from utterance - azure-language-understanding

I am using LUIS to determine which state a customer lives in. I have set up a list entity called "state" that has the 50 states with their two-letter abbreviations as synonyms as described in the documentation. LUIS is returning certain two letter words, such as "hi" or "in" as state entities.
I have set up an intent with phrases such as "My state is Oregon", "I am from WA", etc. Inside the intent, if the word "in" is included in the utterance, for example in the utterance "I live in Kentucky", the word "in" is marked automatically by LUIS as a state entity and I am unable to remove that marker.
Below is a snip of the LUIS json response to the utterance "I live in Kentucky". As you can see, the response includes both Indiana and Kentucky as entities when there should only be Kentucky.
"query": "I live in Kentucky",
"topScoringIntent": {
"intent": "STATE_INQUIRY",
"score": 0.9338141
},
....
"entities": [
....
{
"entity": "in",
"type": "state",
"startIndex": 7,
"endIndex": 8,
"resolution": {
"values": [
"indiana"
]
}
},
{
"entity": "kentucky",
"type": "state",
"startIndex": 10,
"endIndex": 17,
"resolution": {
"values": [
"kentucky"
]
}
}
], ....
How do I train LUIS not to mark the words "in" and "hi" in this context as states if I can't remove the entity marker from the utterance?

In this particular case (populating a list entity with state abbreviations/names), you would be better served by the geographyV2 prebuilt entity or the Places.AbsoluteLocation prebuilt domain entity. (Please note that at the time of this writing, the geographyV2 prebuilt entity has a slight bug, so the prebuilt domain entity is the better option.)
The reason for this is two-fold:
One, geographic locations are already baked into LUIS and they don't collide with regular syntactic words like "in", "hi", or "me". I tested this in reverse by creating a [Medical] list that contained "ct" as the normalized value and "ct scan" as a synonym. When I typed "get me a ct in CT" it resulted in "get me a [Medical] in [Medical]". To fix, I selected the second "CT" value and re-assigned it to the Places.AbsoluteLocation entity. After retraining, I tested "when in CT show me ct options" which correctly resulted in "when in [Places.AbsoluteLocation] show me [Medical] options". Further examples and training will refine the results.
Two, lists work well when several disparate words can all refer to the same item. This tutorial shows a simple example where loosely associated words are assigned as synonyms to a canonical name (normalized value).
Hope of help!

@StevenKanberg's answer was very helpful but unfortunately not complete for my situation. I tried to implement both geographyV2 and Places.AbsoluteLocation (separately). Neither one works entirely in the way I need it to (recognizing states and their two-letter abbrevs in a way that can be queried from the entities in the response).
So my choices are:
1. Create my own list of states, using the state name and the two-letter abbrev as synonyms, as described in the list description itself. This works except for two-letter abbrevs that are also words, such as "in", "hi" and "me".
2. Use the geographyV2 prebuilt, which does not allow synonyms and does not recognize two-letter abbrevs at all, or
3. Use Places.AbsoluteLocation, which does recognize two-letter abbrevs for states and does not confuse them with words, but also grabs all locations including cities, countries and addresses and does not differentiate between them, so I have no way of parsing which entity is the state in an utterance like "I live in Lake Stevens, Snohomish County, WA".
Solution: If I combine 1 with 3, I can query for entities that have both of those types. If LUIS marks the word "in" as a state (Indiana), I can then check to see if that word has also been flagged as an AbsoluteLocation. If it has not, then I can safely discard that entity. It's not ideal but is a workaround that solves the problem.
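A minimal sketch of that check in Python, assuming the entities come back in the response shape shown in the question (type, startIndex, endIndex). The function names are hypothetical, and the Places.AbsoluteLocation entry in the sample data is an assumed response, not one captured from LUIS. It compares character ranges by overlap rather than by exact match, on the assumption that a location entity may cover a longer phrase that merely contains the state.

```python
from typing import Any

Entity = dict[str, Any]


def spans_overlap(a: Entity, b: Entity) -> bool:
    """True when two entities share at least one character (indices are inclusive)."""
    return a["startIndex"] <= b["endIndex"] and b["startIndex"] <= a["endIndex"]


def confirmed_states(
    entities: list[Entity],
    state_type: str = "state",
    location_type: str = "Places.AbsoluteLocation",
) -> list[Entity]:
    """Keep only state entities that a location entity also covers."""
    locations = [e for e in entities if e["type"] == location_type]
    return [
        e
        for e in entities
        if e["type"] == state_type
        and any(spans_overlap(e, loc) for loc in locations)
    ]


# Assumed entities for "I live in Kentucky" (the last entry is hypothetical).
sample = [
    {"entity": "in", "type": "state", "startIndex": 7, "endIndex": 8},
    {"entity": "kentucky", "type": "state", "startIndex": 10, "endIndex": 17},
    {"entity": "kentucky", "type": "Places.AbsoluteLocation", "startIndex": 10, "endIndex": 17},
]
states = confirmed_states(sample)
```

With the sample above, the "in" entity is dropped because no location entity covers characters 7 to 8, while "kentucky" is kept.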


Riot Games API: Requests return same identifiers for same player name but different region

I have these two URLs:
https://euw1.api.riotgames.com/lol/summoner/v4/summoners/by-name/okusen
https://eun1.api.riotgames.com/lol/summoner/v4/summoners/by-name/okusen
They share the same player name, but they belong to two different players from two different regions (Europe West and Europe Nordic & East).
Then, the two JSON responses respectively:
{
"profileIconId": 4275,
"name": "Okusen",
"puuid": "KFM4xJBwzy7T-rytrj9J8lGx0QduGLsBJ-WY9xdx4Q9cZNvxXCSNv_k4YQdfPgQjS52ppwlO_f9vhA",
"summonerLevel": 121,
"accountId": "PsopchdPCOnlQJB4AjXZ6TCrHuEZ9JlMqZMrDP6iAtznGQ",
"id": "zYkVlVUGHDuDmbfo1lmU0neHdpQdqxBNJ-hHMunqC__2K-4",
"revisionDate": 1583882906000
}
{
"profileIconId": 25,
"name": "Okusen",
"puuid": "KFM4xJBwzy7T-rytrj9J8lGx0QduGLsBJ-WY9xdx4Q9cZNvxXCSNv_k4YQdfPgQjS52ppwlO_f9vhA",
"summonerLevel": 30,
"accountId": "PsopchdPCOnlQJB4AjXZ6TCrHuEZ9JlMqZMrDP6iAtznGQ",
"id": "zYkVlVUGHDuDmbfo1lmU0neHdpQdqxBNJ-hHMunqC__2K-4",
"revisionDate": 1495766289000
}
They have the same identifiers so this is incorrect. I need puuid, accountId or id as parameter in other requests in order to get data for a specific player but I can't do that correctly if I don't have the correct identifier.
LoLCHESS.GG does not seem to have this problem as they display different data for these two players so I probably miss something but I really don't know what.
None of those IDs is guaranteed to be unique.
summonerId and accountId are guaranteed to be unique on a per region basis (so we won't find two summoners with the same ID on EUW).
puuid is guaranteed to be unique globally but if a user transfers regions, the two accounts will have the same puuid.
Thanks to thomasmarton on GitHub; more details are in this thread.
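Given that uniqueness is only guaranteed per region, one way to keep the two players apart on the client side is to store the region together with the ID. The sketch below is only an illustration: platform_from_url and player_key are hypothetical helper names, and it assumes the platform is the first label of the host name, as in the two URLs in the question.

```python
from urllib.parse import urlparse


def platform_from_url(url: str) -> str:
    """Return the first host label, e.g. 'euw1' for euw1.api.riotgames.com."""
    host = urlparse(url).hostname or ""
    return host.split(".")[0]


def player_key(url: str, summoner: dict) -> tuple[str, str]:
    """Compose a key that stays distinct across regions: (platform, summoner id)."""
    return (platform_from_url(url), summoner["id"])


euw_url = "https://euw1.api.riotgames.com/lol/summoner/v4/summoners/by-name/okusen"
eune_url = "https://eun1.api.riotgames.com/lol/summoner/v4/summoners/by-name/okusen"
same_id = {"id": "zYkVlVUGHDuDmbfo1lmU0neHdpQdqxBNJ-hHMunqC__2K-4"}

# Identical ids, different platforms: the composite keys differ.
euw_key = player_key(euw_url, same_id)
eune_key = player_key(eune_url, same_id)
```

Follow-up requests would then be sent to the host of the same platform the ID was read from.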

How to break down search results with Elasticsearch?

I have documents in my Elasticsearch index that represent suppliers. Each document is a supplier, and each supplier has branches as well. It looks like this:
{
"id": 1,
"supplierName": "John Flower Shop",
"supplierAddress": "107 main st, Los Angeles",
"branches": [
{
"branchId": 11,
"branchName": "John Flower Shop New York",
"branchAddress": "34 5th Ave, New York"
},
{
"branchId": 12,
"branchName": "John Flower Shop Miami",
"branchAddress": "56 ragnar st, Miami"
}
]
}
Currently I expose an API that allows searching in the fields supplierName, supplierAddress, branchName and branchAddress.
The use case is a search box on my website that performs a call to the backend and puts the results in a dropdown for the user to choose the supplier.
My issue is that, given the example document above, if you search for "John Flower Shop Miami", the answer will be the whole document, and what will be presented is the top-level supplier name.
What I want is to present "John Flower Shop Miami", and I'm not sure how to tell which part of the result is what matched the search.
Has anyone had to do something like this before?
Handling relationships in Elasticsearch takes a bit of work, but it can be done. I recommend reading the ES guide's chapter on handling relationships to get the big picture.
My advice is then to index your branches as nested documents, so that each branch is stored as a distinct document in your index.
This requires changing your query syntax to use nested queries, which can be a pain in the a..., but in exchange you get the inner_hits functionality.
It lets you know which subdocument (nested document) matched your query.
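As a rough sketch of what that could look like, the Python below builds a request body that searches the top-level supplier fields and, through a nested query with inner_hits, the branch fields. It assumes the branches field is mapped with "type": "nested" and that the default inner_hits name (the nested path) is used. build_branch_query and display_name are hypothetical helpers, and the body is only constructed here, not sent to a cluster.

```python
from typing import Any


def build_branch_query(text: str) -> dict[str, Any]:
    """Search supplier fields and nested branch fields in one request body."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["supplierName", "supplierAddress"],
                        }
                    },
                    {
                        "nested": {
                            "path": "branches",
                            "query": {
                                "multi_match": {
                                    "query": text,
                                    "fields": [
                                        "branches.branchName",
                                        "branches.branchAddress",
                                    ],
                                }
                            },
                            # Ask for the matching nested documents per hit.
                            "inner_hits": {},
                        }
                    },
                ]
            }
        }
    }


def display_name(hit: dict[str, Any]) -> str:
    """Prefer the best matching branch name, else fall back to the supplier name."""
    inner = hit.get("inner_hits", {}).get("branches", {}).get("hits", {}).get("hits", [])
    if inner:
        return inner[0]["_source"]["branchName"]
    return hit["_source"]["supplierName"]
```

For the dropdown, display_name would be called on each entry of the hits.hits array in the search response, so a search that matched a branch shows the branch name instead of the supplier name.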

Dynamic Achievement System algorithm / design

I'm developing an Achievement System that must have a CRUD interface where admins create new achievements and their rules. I need some help with the design and algorithm so that it can easily evolve with new rules as admins ask for them.
Rules sample
Medal one: must complete any 5 courses with a score of at least 90
Medal two: must complete two specific courses with a score of at least 85
Medal three: must be top 5 in general ranking at least once
Medal four: must have more than 5000 points
I'll basically store that as metadata in a relational database, probably with these columns below:
action
action quantity
course quantity
score
id course
ranking
position
points
I want to know if there is any known algorithm / design to this kind of problem? Or perhaps I should store them differently to make it easier? Don't know, I want suggestions.
Your doubts may be right. In my opinion, a database is the wrong way to organize this data. Every new kind of achievement you want to create would add extra columns to your database, and most achievements wouldn't use most of the columns. A more flexible data structure, one that doesn't expect every entry to use all of the possible achievement criteria at once by default, would probably be more useful. Most languages support JSON, so I suggest you use that. The structure could be something like this:
[
{
"name": "Medal One",
"requirements": {
"coursesCompleted": 5,
"scoreMin": 90
}
},
{
"name": "Medal Two",
"requirements": {
"specificCoursesCompleted": [
"Course 1",
"Course 2"
],
"scoreMin": 85
}
},
{
"name": "Medal Three",
"requirements": {
"generalRankingMin": 5
}
},
{
"name": "Medal Four",
"requirements": {
"pointsMin": 5000
}
}
]
You can see here how the criteria types are sometimes reused, but they can be omitted when not needed and new ones can be added to a few achievements without bloating the rest of the dataset as well.
PS: I made the criteria names very verbose for demonstration purposes; shortening them or not in actual use is up to preference.
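To show how such a structure could be evaluated, here is a small sketch with one checker function per criterion name, so a new rule type only needs a new entry in the table. Everything in it is an assumption for illustration: the player record shape (courses, bestRanking, points), the treatment of scoreMin as a filter on the course criteria, and the pointsMin key, which is used here to keep total points separate from per-course scores.

```python
from typing import Any, Callable

Player = dict[str, Any]
Requirements = dict[str, Any]


def passing_courses(player: Player, reqs: Requirements) -> set[str]:
    """Names of completed courses whose score meets the optional scoreMin."""
    min_score = reqs.get("scoreMin", 0)
    return {c["name"] for c in player["courses"] if c["score"] >= min_score}


# One checker per criterion; each receives (player, criterion value, all requirements).
CHECKS: dict[str, Callable[[Player, Any, Requirements], bool]] = {
    "coursesCompleted": lambda p, v, r: len(passing_courses(p, r)) >= v,
    "specificCoursesCompleted": lambda p, v, r: set(v) <= passing_courses(p, r),
    "generalRankingMin": lambda p, v, r: p["bestRanking"] <= v,
    "pointsMin": lambda p, v, r: p["points"] > v,
    "scoreMin": lambda p, v, r: True,  # only a modifier for the course criteria
}


def earned(player: Player, achievement: dict[str, Any]) -> bool:
    """An achievement is earned when every listed criterion passes."""
    reqs = achievement["requirements"]
    return all(CHECKS[name](player, value, reqs) for name, value in reqs.items())


medal_one = {"name": "Medal One", "requirements": {"coursesCompleted": 5, "scoreMin": 90}}
medal_four = {"name": "Medal Four", "requirements": {"pointsMin": 5000}}
player = {
    "courses": [{"name": f"Course {i}", "score": 92} for i in range(1, 6)],
    "bestRanking": 12,
    "points": 4200,
}
```

An unknown criterion name raises a KeyError in this sketch, which may be preferable to silently awarding a medal whose rule cannot be checked.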

Is it possible to use different locations with Schema.org JobPosting?

I would like to use Schema.org for JobPosting, but the offer is for different cities (jobLocation).
Can I mark 2-3 cities in this schema (with JSON-LD)? In that case, how?
According to Google: https://developers.google.com/search/docs/data-types/job-postings#definitions
If the job has multiple locations, add multiple jobLocation properties
in an array. Google will choose the best location to display based on
the job seeker's query.
In JSON-LD it would look something like:
"jobLocation":[
{
"@type":"Place",
"address":{
"@type":"PostalAddress",
"streetAddress": "555 Clancy St",
"addressLocality":"Chicago",
"addressRegion":"IL",
"postalCode": "48201"
}
},
{
"@type":"Place",
"address":{
"@type":"PostalAddress",
"streetAddress": "5 Main St",
"addressLocality":"San Francisco",
"addressRegion":"CA",
"postalCode": "48212"
}
}
]
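If the markup is generated rather than written by hand, the array can be built from a plain list of addresses and serialized with a JSON library, which also avoids syntax slips such as trailing commas. The sketch below reuses the two addresses from the example and uses the JSON-LD keyword "@type"; job_locations is a hypothetical helper and the title is a placeholder, not a complete JobPosting.

```python
import json


def job_locations(addresses: list[dict[str, str]]) -> list[dict]:
    """Wrap each postal address in a Place, one array entry per location."""
    return [
        {"@type": "Place", "address": {"@type": "PostalAddress", **address}}
        for address in addresses
    ]


posting = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    "title": "Example role",  # placeholder value
    "jobLocation": job_locations(
        [
            {
                "streetAddress": "555 Clancy St",
                "addressLocality": "Chicago",
                "addressRegion": "IL",
                "postalCode": "48201",
            },
            {
                "streetAddress": "5 Main St",
                "addressLocality": "San Francisco",
                "addressRegion": "CA",
                "postalCode": "48212",
            },
        ]
    ),
}

# The serialized string is what would go inside the JSON-LD script element.
markup = json.dumps(posting, indent=2)
```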
The jobLocation property, like any property, can have multiple values. In JSON-LD, you have to use an array (see example).
But the question is what multiple values mean for this jobLocation property: do these represent all locations the person has to work in (AND), or do these represent alternatives and the person can choose (OR)?
Neither Schema.org nor JSON-LD offers a way for the author to disambiguate which one is meant.
In my opinion, multiple values should convey that the person has to work in all these places (AND). Why? Because otherwise there would be no way to convey this. If multiple locations represented alternatives (OR), you could simply provide multiple JobPosting items (one for each location).

How to find related songs or artists using Freebase MQL?

I have a Freebase mid such as /m/0mgcr, which is The Offspring.
What's the best way to use MQL to find related artists?
Or if I have a song mid such as /m/0l_f7f, which is Original Prankster by The Offspring.
What's the best way to use MQL to find related songs?
So, the revised question is, given a musical artist, find all other musical artists who share all of the same genres assigned to the first artist.
MQL doesn't have any operators which can work across parts of the query tree, so this can't be done in a single query, but given that you're likely doing this from a programming language, it can be done pretty simply in two steps.
First, we'll get all genres for our subject artist, sorted by the number of artists that they contain using this query (although the last part isn't strictly necessary):
[{
"id": "/m/0mgcr",
"name": null,
"/music/artist/genre": [{
"name": null,
"id": null,
"artists": {
"return": "count"
},
"sort": "artists.count"
}]
}]
Then, using the genre with the smallest number of artists for maximum selectivity, we'll add in the other genres to make it even more specific. Here's a version of the query with the artists that match on the three most specific genres (the base genre plus two more):
[{
"id": "/m/0mgcr",
"name": null,
"/music/artist/genre": [{
"name": null,
"id": null,
"artists": {
"return": "count"
},
"sort": "artists.count",
"limit": 1,
"a:artists": [{
"name": null,
"id": null,
"a:genre": {
"id": "/en/ska_punk"
},
"b:genre": {
"id": "/en/melodic_hardcore"
}
}]
}]
}]
Which gives us: Authority Zero, Millencolin, Michael John Burkett, NOFX, Bigwig, Huelga de Hambre, Freygolo, The Vandals
The things to note about this query are that this fragment:
"sort": "artists.count",
"limit": 1,
limits our initial genre selection to the single genre with the fewest artists (ie Skate Punk), while the prefix notation:
"a:genre": {"id": "/en/ska_punk"},
"b:genre": {"id": "/en/melodic_hardcore"}
is to get around the JSON limitation on not having more than one key with the same name. The prefixes are ignored and just need to be unique (this is the same reason for the a:artists elsewhere in the query).
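Since the second query depends on the genres returned by the first, it is convenient to generate it in code. The following sketch builds the same query shape as above from a list of extra genre ids, giving each genre constraint its own letter prefix. related_artists_query is a hypothetical helper; it only constructs the query object and does not call any Freebase endpoint.

```python
import json
import string


def related_artists_query(artist_mid: str, extra_genre_ids: list[str]) -> list[dict]:
    """Build the step-two query: rarest genre first, then one prefixed clause per extra genre."""
    artist_clause: dict = {"name": None, "id": None}
    # zip stops at 26 genres, which is more than enough prefixes for this purpose.
    for prefix, genre_id in zip(string.ascii_lowercase, extra_genre_ids):
        artist_clause[f"{prefix}:genre"] = {"id": genre_id}
    return [
        {
            "id": artist_mid,
            "name": None,
            "/music/artist/genre": [
                {
                    "name": None,
                    "id": None,
                    "artists": {"return": "count"},
                    "sort": "artists.count",
                    "limit": 1,
                    "a:artists": [artist_clause],
                }
            ],
        }
    ]


query = related_artists_query("/m/0mgcr", ["/en/ska_punk", "/en/melodic_hardcore"])
query_json = json.dumps(query, indent=2)  # None is serialized as null
```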
So, having worked through that whole little exercise, I'll close by saying that there are probably better ways of doing this. Instead of an absolute match, you may get better results with a scoring function that looks at % overlap for the most specific genres or some other metric. Things like common band members, collaborations, contemporaneous recording history, etc, etc, could also be factored into your scoring. Of course this is all beyond the capabilities of raw MQL and you'd probably want to load the Freebase data for the music domain (or some subset) into a graph database to run these scoring algorithms.
In point of fact, both last.fm and Google think a better list would include bands like Sum 41, blink-182, Bad Religion, Green Day, etc.
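As a rough illustration of the overlap scoring idea mentioned above, the sketch below ranks candidate artists by the Jaccard overlap of their genre sets with the subject's. It swaps the absolute match for a simple set-overlap score and is not something MQL can compute by itself; the genre sets would have to be fetched first. The artist names and genre assignments in it are placeholders, not Freebase data.

```python
def genre_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: shared genres divided by all genres of either artist."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_related(subject: set[str], candidates: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Score every candidate against the subject and sort by descending overlap."""
    scored = [(name, genre_overlap(subject, genres)) for name, genres in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder data for illustration only.
subject_genres = {"skate punk", "ska punk", "melodic hardcore"}
candidates = {
    "Artist A": {"skate punk", "ska punk", "melodic hardcore"},
    "Artist B": {"skate punk", "pop punk"},
    "Artist C": {"jazz"},
}
ranking = rank_related(subject_genres, candidates)
```

Other signals such as shared band members or collaborations could be added as further terms in the score.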
