LUIS Dispatch Model returning inconsistent intent scores - azure-language-understanding

I have multiple LUIS Language Models within a Dispatch Model. This was published over 7 days ago.
When performing a query via the API I find that:
V3 of the API is consistent
Doesn't matter how frequently I call the V3 endpoint with a specific query, I always get the same list of Intents (as in, in the same sequence of confidence), and each Intent's score remains consistent.
V2 of the API is not consistent
I call the V2 endpoint with the same query as for the V3 query and it returns the list of Intents and their scores, but these are very different to those returned by V3. So, my top intent in V3 which had a score of 0.8977665 now comes in 3rd with a score of 0.00916386.
And occasionally, the results flip to be identical to the V3 results, but maybe only for a single query. It's similar in many ways to hitting a web farm where one of the web servers has the wrong code.
I'm sure both the V2 and V3 APIs must be hitting the same instance of the language models, but I'm not clear on why the V2 would behave in this manner.
It would be interesting to know if anyone else had experienced this with the V2 endpoint and, if so, what was the solution. It would be even more interesting to know if anyone had experienced this with the V3 endpoint.

LUIS responses are non-deterministic, which is to say there is no guarantee that you will always get the same response. This behavior can be disabled, as described in this doc, by passing the following values in the request body of the 'Update application version settings' API:
[
  {
    "name": "UseAllTrainingData",
    "value": "true"
  },
  {
    "name": "NormalizeDiacritics",
    "value": "false"
  },
  {
    "name": "NormalizePunctuation",
    "value": "false"
  }
]
Unfortunately, the docs do not state whether this is an option for the v3 API. At a minimum, it is available for v2.
In truth, I find it much more interesting that the v3 API is always returning the same intents in the same order than the v2 API not doing so. Perhaps the v2 API is more sensitive to your scoring thresholds than the v3 API is.
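One way to quantify the inconsistency described in the question is to collect several responses for the same query and compare their intent rankings and scores. The following is a minimal Python sketch, assuming each response has already been reduced to a dict of intent name to score. The helper names and the intent names are hypothetical, and no LUIS call is made:

```python
from typing import Dict, List, Tuple


def ranking(scores: Dict[str, float]) -> Tuple[str, ...]:
    """Return intent names ordered from highest to lowest score."""
    return tuple(sorted(scores, key=lambda name: scores[name], reverse=True))


def is_consistent(responses: List[Dict[str, float]], tolerance: float = 1e-6) -> bool:
    """True if every response has the same ranking and near-identical scores."""
    if not responses:
        return True
    first = responses[0]
    for other in responses[1:]:
        # A different order (or a different set of intents) is an inconsistency.
        if ranking(other) != ranking(first):
            return False
        # Same order, but the scores themselves may still have drifted.
        if any(abs(other[name] - first[name]) > tolerance for name in first):
            return False
    return True


# Hypothetical intents; the scores echo the ones quoted in the question.
stable = [{"BookFlight": 0.8977665, "None": 0.05}] * 3
flipped = stable + [{"BookFlight": 0.00916386, "None": 0.4}]

if __name__ == "__main__":
    print(is_consistent(stable), is_consistent(flipped))
```

Running the same check against batches of V2 and V3 responses would show whether the flip happens on every call or only occasionally.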

Related

Should I use only one API to get all data of one screen in GraphQL?

I'm new to GraphQL and come from REST APIs. Now that I've started working with GraphQL, I'm unsure whether I should split an API into smaller APIs, as I did with REST. Assume I have a screen that needs to display the following components:
List of books
List of readers.
List of top 5 latest books.
Should I wrap all this information into one API or split them into 3 smaller APIs?
The fundamental aspect of GraphQL is that there is typically a single API endpoint, often defined at /graphql/ or /gql/, that is usually accessed via POST. You then define many queries (effectively GETs) and mutations (effectively POSTs) that can be called at this endpoint to perform CRUD actions.
Therefore, you should define a single API with three separate queries that return the lists of books, readers, and top 5 latest books.
However, the resolvers behind these queries could in turn make requests to other RESTful APIs, although that depends on your tech stack and the degree to which it is decomposed into microservices.
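To illustrate the "one endpoint, three queries" idea, here is a minimal Python sketch that builds the JSON body for a single POST request covering the whole screen. The field names (books, readers, latestBooks) and the limit argument are assumptions for illustration; the real names depend on your schema:

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema: one operation that selects three root fields at once.
SCREEN_QUERY = """
query ScreenData {
  books { id title }
  readers { id name }
  latestBooks(limit: 5) { id title }
}
"""


def build_request_body(query: str, variables: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a GraphQL operation into the JSON body sent to the endpoint."""
    return json.dumps({"query": query, "variables": variables or {}})


if __name__ == "__main__":
    # This string would be POSTed to the single endpoint, e.g. /graphql/.
    print(build_request_body(SCREEN_QUERY))
```

The client makes one round trip, while the server is still free to resolve each of the three fields independently.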

Child service data access to other services with Apollo Federation configuration

We've been using Apollo Federation for about 1.5 years as our main API. Behind the federation gateway are 6 child GraphQL services, which are all combined at the gateway. This configuration works really well when you have a result set of data that spans the different services, e.g. a list of tickets that references the user who purchased each ticket, the event it is associated with, etc.
One place where we have seen this break down is when a resolver needs a preliminary set of data that is already defined in another child service (or across other child services). We have not found a way to query the federation from a child service to get a federated set of data for a resolver to work with.
For example, say we have a GraphQL query defined which queries all tickets for an event and, through federation, returns the purchaser's data, the event's data and the product's data. If I need this data set from a resolver, I would need to make all those queries again myself, duplicating dataSource logic and having to match up the data in code.
One crazy thought which came up is to set up an apollo-datasource-rest dataSource to make queries against our gateway endpoint as a dataSource for our resolvers. This way we can request the data we need and let Apollo Federation stitch all the data together as it is designed to do. So instead of the resolver querying the database for all the different pieces of data and then matching them up, we would request the data from our GraphQL gateway, where this query is already defined.
What we are trying to avoid by doing this is having a repeated set of queries in child services to get the details which are already available in (or across) other services.
The question
Is this a really bad idea?
Is it a plausible idea?
Has anyone tried something like this before?
Yes, we would have to ensure that there aren't circular dependencies on the resolvers. In our case I see the "dataSource accessing the gateway" being used to gather initial data in mutations.
Example of a federated query. In this query, event, allocatedTo, purchasedBy, and product are all types in other services. event is an Event type, allocatedTo and purchasedBy are a Profile type, and product is a Product type. This query provides me with all the data I would use to, say, send an email notification to the people in the result set. However, getting this data from a resolver in a mutation to queue up those emails means I need to make many queries and align all the data through code myself, instead of using the gateway/federation, which already does this with the established query. The thought behind using apollo-datasource-rest to query our own gateway is to get at this data in this form, not through separate queries and code to align IDs, etc.
query getRegisteredUsers($eventId: ID!) {
  communications {
    event(eventId: $eventId) {
      registered {
        event {
          name
        }
        isAllocated
        hasCheckedIn
        lastUpdatedAt
        allocatedTo {
          firstName
          lastName
          email
        }
        purchasedBy {
          id
          firstName
          lastName
        }
        product {
          __typename
          ... on Ticket {
            id
            name
          }
        }
      }
    }
  }
}
FYI, I didn't quite understand the question until I looked at your edits, which had some examples.
Is this a really bad idea?
In my experience, yes. Not as an idea, as you're in good company with other very smart people who have done this.
Is it a plausible idea?
Absolutely it's plausible, but I don't recommend it.
Has anyone tried something like this before?
Yes, but I hope you don't.
Your Question
Having resolvers make requests back to the Gateway:
I do not recommend this. I've seen this happen, and I've personally worked to help companies out of the mess this takes you into. Circular dependencies are going to happen. Latency is just going to skyrocket as you have more and more hops, TLS handshakes, etc. Do orchestration instead. It feels weird to introduce non-GraphQL, but IMO in the end it's way simpler, faster, and more maintainable than where "just talk to the gateway" takes you.
What then?
When you're dealing with some mutations which require data from across multiple data sources to be able to process a single thing (like sending a transaction email to a person), you have some choices. Something that helped me figure this out was the question "how would I have done this before GraphQL?"
Orchestration: you have a single "orchestration service", which takes the mutation and makes calls (preferably non-GraphQL, so REST, gRPC, Lambda?) to the owner services to collect the data. The orchestration layer does NOT own data, but it can speak with the other services. It's like Federation, but for sending the data into the request, instead of into the response.
Choreography: you trigger roughly the same thing, but via an event stream. (doesn't work as well with the request / response model of GraphQL)
CQRS (projections): Copies of database data, used for things like reporting. CQRS is basically "the way you read data doesn't have to be the same as the way you write it", and it allows for things like event-sourced data. If all of your data sources actually share the same database, you don't even need "projections" as much as you would just want a read replica. If you're not at enough scale to do replicas, just skip it and promise never to write data that your current domain doesn't own.
What I Do
Where I work, I have gotten us to:
Queries
1. queries always start with "one database call".
2. if the "one database call" goes to one domain of data (most often true), that query goes into one service, and Federation fills in the leaves of your tree. If you really follow CQRS, this could go the same way as #3, but we don't.
3. if your "one database call" needs data from across domains (e.g. get all orders with Product X in it, but sorted by the customer's first name), you need a database projection. Preferably this can be handled by a "reporting service": it doesn't OWN any data, but it READS all data.
Mutations
1. if your top-level mutation acts only within one domain, the mutation goes in a service, it can use database transactions, and Federation fills in the leaves.
2. if your mutation is required to write across multiple domains and requires immediate consistency (placing an order with inventory, payments, etc), we chose orchestration to write across multiple services (and roll back when necessary, since we don't have database transactions to do it for us).
3. if your mutation requires data from many places to send further into the request (like sending an email), we chose orchestration to pull from the multiple services and to push that data down. This feels very much like Federation, but in reverse.
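The "orchestration to pull from multiple services" case can be sketched roughly as follows. This is a minimal Python illustration, not the setup described above: the function, the EmailJob type, and the allocatedToId field are hypothetical, and the three callables stand in for whatever non-GraphQL clients (REST, gRPC, etc.) the owning services expose:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EmailJob:
    to: str
    subject: str


def queue_registration_emails(
    event_id: str,
    fetch_event: Callable[[str], Dict],
    fetch_tickets: Callable[[str], List[Dict]],
    fetch_profile: Callable[[str], Dict],
) -> List[EmailJob]:
    """Pull data from each owning service and combine it into email jobs.

    The orchestrator owns no data itself; it only calls the services that do.
    """
    event = fetch_event(event_id)
    jobs: List[EmailJob] = []
    for ticket in fetch_tickets(event_id):
        # Hypothetical field: the ticket stores the ID of the profile it is allocated to.
        profile = fetch_profile(ticket["allocatedToId"])
        jobs.append(
            EmailJob(to=profile["email"], subject=f"Your ticket for {event['name']}")
        )
    return jobs
```

Because the service clients are passed in, the mutation resolver never calls back into the gateway, and the orchestration logic can be tested with plain stubs.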

Bot framework - Use LUIS in Scorables

I've read that to handle messages globally, I have to use Scorables and set a score based on the user's input. I am wondering if I can use LUIS to parse the user input and set a score based on LUIS intent score.
Is there any way that I can use LUIS inside my Scorable class?
Or do I have to manually call LUIS, get the response, and process it myself?
Yes, you can call LUIS yourself, pass the message to it and see what it returns.
You will receive a list of intents with a score back and you typically take the one with the highest score.
LUIS is just an API with one endpoint, so you can call it from wherever you like; it's actually very easy. Have a look here for more details: https://github.com/Microsoft/Cognitive-LUIS-Windows
The response from LUIS will give you the intent and the parameters it identified, assuming you had any. It's probably a good idea to set a threshold: if the score you get back is not high enough, that means you need to train LUIS more, but that's another story. My own threshold is set at 88 (that is, 0.88 on LUIS's 0 to 1 score scale); I don't really trust anything below that.
If you do it like this, you basically eliminate any need to do any processing yourself, and you use LUIS for what it's meant to be used for, which is understanding the user's query. You can do something with the result after that.
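Picking the highest-scoring intent and applying a threshold could look like the minimal Python sketch below. It assumes the response has already been parsed into a list of dicts with "intent" and "score" keys, with scores between 0 and 1; the function name and the sample intents are hypothetical, and the 0.88 default mirrors the threshold mentioned above:

```python
from typing import Dict, List, Optional, Tuple


def top_intent(intents: List[Dict], threshold: float = 0.88) -> Optional[Tuple[str, float]]:
    """Return (intent, score) for the best intent, or None if it is below the threshold."""
    if not intents:
        return None
    best = max(intents, key=lambda item: item["score"])
    if best["score"] < threshold:
        return None  # Not confident enough; let another handler deal with the message.
    return best["intent"], best["score"]


if __name__ == "__main__":
    sample = [
        {"intent": "Help", "score": 0.93},
        {"intent": "Cancel", "score": 0.04},
    ]
    print(top_intent(sample))
```

The returned score could then be used directly as the score a Scorable reports, with None meaning the Scorable should not handle the message.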

How to detect relationships using Microsoft Cognitive services?

Microsoft Cognitive Services offers a wide variety of capabilities to extract information from natural language. However I am not able to find how to use them in order to detect "relationships" where e.g. two (or more) specific "entities" are involved.
For example, detecting company acquisitions / mergers.
These could be expressed in news articles as
"Company1" has announced that it will acquire "Company2".
Certainly, there are several approaches to address that need, some that include entity detection first (e.g. Company1 and Company2 being companies) and then the relation (e.g. acquire ...).
Other approaches involve first identifying the "action" (acquire) and then, through grammatical analysis, finding which is the "actor" and which is the "object" of the action.
Machine learning approaches for semantic relation extraction have also been developed, to avoid the need for humans to craft formal relation rules.
I would like to know if / how this use case can be performed with the Microsoft Cognitive Services.
Thank you
It depends on the tech you use to examine the response from the API: https://dev.projectoxford.ai/docs/services
I use jQuery to parse the JSON response (fetched with a WebClient in ASP.NET code-behind) from the LUIS/Cognitive Services API (I am not using the Bot Framework). I have a rules engine that I can configure for clients and save, so that when the page loads, functions fire based on the parsed JSON response. The rules engine includes various condition functions like contains, begins with, is, etc., so I can test the user's query for specific entities or virtually anything else in it. It really comes down to combining conditions with && or || in JavaScript...
For example, if intent=product in the JSON response, I then show a shopping cart widget. Or if entity=coffee black OR entity=double double, then it triggers a widget to inject into the chat window (SHOW Shopping Cart). In short, you either handle the AND/OR via the Bot Framework or via your tech of choice.
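The AND/OR rule idea can be sketched as follows. This is a minimal Python illustration rather than the jQuery implementation described above; the rule format, the field names, and the condition names are all assumptions:

```python
from typing import Callable, Dict, List

# Condition functions comparable to "contains", "begins with" and "is".
CONDITIONS: Dict[str, Callable[[str, str], bool]] = {
    "contains": lambda text, value: value in text,
    "begins_with": lambda text, value: text.startswith(value),
    "is": lambda text, value: text == value,
}


def rule_matches(rule: Dict, parsed: Dict[str, str]) -> bool:
    """Evaluate a rule's conditions against a parsed response, combined with AND or OR."""
    results: List[bool] = [
        CONDITIONS[cond["op"]](parsed.get(cond["field"], ""), cond["value"])
        for cond in rule["conditions"]
    ]
    return all(results) if rule["combine"] == "and" else any(results)


# Hypothetical rule: show the cart if the entity is "coffee black" OR "double double".
SHOW_CART = {
    "combine": "or",
    "conditions": [
        {"field": "entity", "op": "is", "value": "coffee black"},
        {"field": "entity", "op": "is", "value": "double double"},
    ],
}

if __name__ == "__main__":
    print(rule_matches(SHOW_CART, {"intent": "product", "entity": "double double"}))
```

A relationship such as an acquisition could be approximated the same way, for example with an AND rule that requires an "acquire" intent together with two company entities, although that remains rule matching rather than true relation extraction.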

Exhaustive Search on Google Places

I'm trying to use the Google Places API for a business locator app, but am having trouble creating an exhaustive database of businesses.
1. The API call only returns 20 results.
2. The "type" restriction (e.g. type=restaurant) does not pick up all businesses by type in a given zip. I could use "keyword", but not all restaurants have "restaurant" in their name, and not all spas have "spa" in their name.
3. Each call produces the same set of results from day to day, and with only 20 results per call, how am I to get a more exhaustive database of businesses?
I can try to get around the above three constraints by looping through a finely subdivided search of businesses: say, by zip code, some list of keywords, and category type. But I still won't get close to picking up the 50 million or so businesses in Google Places.
In fact, even when I make a call for restaurants and bars in my own neighborhood, I don't pick up popular places down the block from me.
How is the API usable for an app that locates places then?
Any suggestions on how to create a more exhaustive search?
Thanks,
Nad
I'm not able to answer your question regarding Google Places API.
But for your requirements ('business locator app', 'I don't pick up popular places down the block from me') I suggest you try Yelp Search API:
Yelp's API program enables you to access trusted Yelp information in real time, such as business listing info, overall business ratings and review counts, deals and recent review excerpts.
Yelp is a popular review website with a capable API, and you can judge the quality of its database and its devoted user base on the Yelp homepage.
Note:
They keep some data for themselves and do not return everything in response.
The (free) dev account has a limit of 100 calls per 24 hours.
I know I'm late but maybe it helps someone these days.
By default, each Nearby Search or Text Search returns up to 20
establishment results per query; however, each search can return as
many as 60 results, split across three pages.
You need to use the field nextPageToken that you will receive on the first search to get the next page.
https://developers.google.com/places/web-service/search
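Following the page token across up to three pages could look like the minimal Python sketch below. The HTTP call is deliberately left out: fetch_page is a hypothetical callable that you supply, which takes the previous token (or None for the first page) and returns the parsed JSON response. The "results" key and the default token field name "next_page_token" are assumptions about the response shape, so adjust token_field to whatever your API version actually returns:

```python
from typing import Callable, Dict, List, Optional


def collect_all_pages(
    fetch_page: Callable[[Optional[str]], Dict],
    token_field: str = "next_page_token",
    max_pages: int = 3,
) -> List[Dict]:
    """Collect results page by page until no token is returned or max_pages is reached."""
    results: List[Dict] = []
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        results.extend(page.get("results", []))
        token = page.get(token_field)
        if not token:
            break  # No further pages available.
    return results
```

With 20 results per page and three pages, this tops out at the 60 results quoted above. A real fetch_page may also need a short pause before requesting the next page, since a freshly issued token may not be usable immediately.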
A Stack Overflow question says:
There is no way to get more than 60 results in Places API. Some people
tried to file a feature request in Google issue tracker, but Google
rejected it with the following comment: "Unfortunately Places API is not
in a position to return more than 60 results. Besides technical
reasons (latency, among others) returning more than 60 results would
make the API be more like a database or general-purpose search engine.
We'd rather improve search quality so that users don't need to go so
far down a long list of results."
google places api more than 60 results
I faced the same difficulties that you did and decided to use the Yelp API instead. It is free, very complete and returns up to 1000 results. You should however check the terms of service before doing anything. It does not provide the website of the business (only the Yelp website link).
https://www.yelp.com/developers/documentation/v3/business_search
Other options I investigated at that time:
Foursquare Venues (It was very expensive, and only returned up to around 100 results)
HERE Places API
Factual Places (I don't think this one is an API)
Sygic Travel API (Specific to tourist spots)
Planet.osm (OpenStreetMap)

Resources