Dialogflow ES intent matching performance

I have created a Q&A chatbot that supports a large number of intents (just over 400). Honestly, the intent matching performance is sometimes very poor. Is there any way to improve Dialogflow's performance?
Does the region where the Dialogflow (global) instance is created affect performance, since the users are from the APAC region?
Best regards

Related

What is the difference between Elasticsearch, Apache Druid, and Rockset? [closed]

I'm working on a game application where I need real-time data for the leaderboards I'm building. I've read a bunch of Stack Overflow posts and company blogs, but honestly I'm not sure which one best fits my use case. I am using DynamoDB to record players' recent moves, and the history of moves is in Kafka. I am looking to stream data from these two sources into a database that my leaderboard service can then query to render the contents of each leaderboard. My data velocity is modest (about 1K game events/sec). I have found three different databases I could use; has anybody used any of them for game leaderboarding? If so, can you share the advantages or pains you encountered while doing so? According to all three companies, they can handle real-time data.
You would have to evaluate the scale and performance that you require, and it is difficult for me to estimate those from the data you provided, but I can do a feature comparison of some of these systems.
The first option is to serve your leaderboards by querying DynamoDB itself, so you do not need any additional systems. The advantage, obviously, is that there is one less component to manage. But I am assuming that your leaderboards need complex logic to render, and because the DynamoDB API deals with key/value lookups, you would have to fetch a lot of data from DynamoDB to execute every query that renders the leaderboard.
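As a rough illustration of that first option, here is a minimal sketch of querying DynamoDB directly for a top-N leaderboard. It assumes a hypothetical global secondary index whose partition key is leaderboard_id and whose sort key is score; the table, index, and attribute names are all made up:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("player_moves")  # hypothetical table name

    def top_players(leaderboard_id, limit=10):
        # Query the (assumed) GSI sorted by score, highest first.
        resp = table.query(
            IndexName="leaderboard-score-index",
            KeyConditionExpression=Key("leaderboard_id").eq(leaderboard_id),
            ScanIndexForward=False,
            Limit=limit,
        )
        return resp["Items"]

Anything more involved than this top-N read (say, combining recent moves with the Kafka history, or ranking by a derived metric) forces you to pull many items and compute on the client, which is the drawback described above.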
The second option you specified is Elasticsearch. It is a great system that returns query results really fast because it stores data in an inverted index. However, you won't be able to do JOINs between your DynamoDB data and the Kafka stream. You can, though, run a lot of concurrent queries on Elasticsearch, and I am assuming you need concurrent queries because you are powering an online game where multiple players access the leaderboard at the same time.
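As an illustration of the Elasticsearch route, a top-N leaderboard can be expressed as a terms aggregation ordered by a max sub-aggregation. This sketch assumes the events have been indexed into an index called game-events with player_id (keyword-mapped) and score fields, and uses the 8.x Python client; all names are placeholders:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Top 10 players by their best score, computed inside Elasticsearch.
    resp = es.search(
        index="game-events",
        size=0,
        aggs={
            "top_players": {
                "terms": {
                    "field": "player_id",
                    "size": 10,
                    "order": {"best_score": "desc"},
                },
                "aggs": {"best_score": {"max": {"field": "score"}}},
            }
        },
    )
    for bucket in resp["aggregations"]["top_players"]["buckets"]:
        print(bucket["key"], bucket["best_score"]["value"])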
The third option, Druid, is a hybrid between a data lake and a data warehouse. You can store large volumes of semi-structured data, but unlike Elasticsearch, you need to flatten nested JSON data at ingest time. I have used Druid for large-scale analytical processing to power dashboards, and it does not support as high a query concurrency as Elasticsearch.
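To make the "flatten at ingest time" point concrete, this is roughly what the flattenSpec portion of a Druid ingestion spec does: it promotes nested JSON fields to top-level columns. Shown here as a Python dict purely for illustration; the JSON paths and column names are assumptions:

    import json

    # Sketch of a Druid flattenSpec: pull nested fields up to flat columns at ingest.
    flatten_spec = {
        "useFieldDiscovery": True,
        "fields": [
            {"type": "path", "name": "player_id", "expr": "$.player.id"},
            {"type": "path", "name": "score", "expr": "$.move.score"},
        ],
    }
    print(json.dumps(flatten_spec, indent=2))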
Rockset seems to be a much newer product and is a hosted cloud service. It says it builds an inverted index like Elasticsearch and also supports JOINs. It can auto-tail data from DynamoDB (using change streams) and from Kafka. I do not see any performance numbers on the website, but the functionality looks very compatible with what I would need for building a game leaderboard.
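To show why the JOIN support matters here, the kind of query a leaderboard service might run could look like the SQL below, joining a collection synced from DynamoDB with one synced from Kafka. The collection names, fields, and the one-hour window are invented, and in practice the string would be submitted through Rockset's query API or console rather than just printed:

    # Hypothetical leaderboard query joining the two synced collections.
    leaderboard_sql = """
    SELECT m.player_id,
           MAX(m.score)      AS best_score,
           COUNT(h.move_id)  AS moves_last_hour
    FROM   recent_moves m            -- synced from DynamoDB change streams
    JOIN   move_history h            -- synced from the Kafka topic
           ON h.player_id = m.player_id
    WHERE  h._event_time > CURRENT_TIMESTAMP() - INTERVAL 1 HOUR
    GROUP BY m.player_id
    ORDER BY best_score DESC
    LIMIT  10
    """
    print(leaderboard_sql)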

MS LUIS: Number of Intents / Data Imbalance

I see on the LUIS documentation page here that it is strongly recommended to treat data imbalance (i.e. a differing number of total utterances across intents) as a first priority. We currently see a mean of 19 utterances per intent on our dashboard, so in my understanding I should optimize all intents towards having about 20 utterances each, for example.
Now my question: when I use active learning by adding endpoint utterances, utterances will be added to whichever intent we see them fitting (Active Learning documentation). How can I ensure that the number of utterances per intent always remains roughly equal (e.g. around 20 in our example)? It seems to me that attributing endpoint utterances to intents will naturally create a data imbalance again.
Thanks a lot!
Best,
Mark
Once your initial model is satisfactory, there no longer needs to be equality between intents. Active learning specifically tries to correct for cases that were unseen before, so if your existing examples already cover all your cases, you don't need to actively correct the imbalance.
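If you still want to keep an eye on how far the distribution drifts while endpoint utterances are being added, one option is to periodically count utterances per intent in the exported app JSON. A minimal sketch, assuming the export contains an "utterances" array whose items carry an "intent" field (check this against your export version); the file name and the below-half-the-mean threshold are arbitrary:

    import json
    from collections import Counter

    # Count labelled utterances per intent in a LUIS app export.
    with open("luis_app_export.json", encoding="utf-8") as f:
        app = json.load(f)

    counts = Counter(u["intent"] for u in app.get("utterances", []))
    mean = sum(counts.values()) / max(len(counts), 1)

    # Flag intents that have drifted well below the mean.
    for intent, n in sorted(counts.items(), key=lambda kv: kv[1]):
        flag = "  <-- well below the mean" if n < 0.5 * mean else ""
        print(f"{intent}: {n}{flag}")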

Does Rasa Core train on actual dialog data behind the scenes?

Since the core trains on domain.yml and stories.yml, without depending on the users' actual words (nlu.yml), I understand that Rasa Core training has nothing to do with the NLU part. It trains solely on 'intent-action' pairs, not the actual dialog data:
* greet
- utter_greet
Is this correct? In that case, I think the training data for dialog policy training is always going to be small, because it trains on the abstract intent-action pairs, not the actual data. In other words, dialog policy training is totally independent from NLU.
Is this understanding correct? I just want to confirm it.
In other words, dialog policy training is totally independent from NLU.
This is right for training. However, in production Rasa Core uses the entities extracted by Rasa NLU and, of course, the classified intents.
abstract intent-action pairs
It is only "pairs" if you are building an FAQ chatbot. If you actually want to handle more complex conversations, then you have to write more stories. As you can see in this Rasa demo, the required training data can get quite large for more complex chatbots.
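For illustration, a multi-turn story goes beyond a single pair, in the same notation as the snippet above; the intent and action names here are made up and would come from your own domain.yml:
* greet
- utter_greet
* ask_order_status
- action_check_order
- utter_order_status
* thank
- utter_noworries
Each additional turn like this is extra training data for the dialogue policy, which is why non-FAQ bots end up with much larger story files.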
how to create those intent-action pairs?
You have to design your training stories manually. There is currently no way to do so automatically. See this blog post, which gives some recommendations on how to write better training stories for Rasa Core.

Is there a way to influence AlchemyAPI sentiment analysis?

I was using AlchemyAPI for text analysis. I want to know if there is a way to influence the API results or fine-tune them to my requirements.
I was trying to analyse different call center conversations available on the internet, to understand the sentiment, i.e. whether the customer was unsatisfied or angry and hence the conversation is negative.
For 9 out of 10 conversations it gave the sentiment as positive, and for 1 it was negative. That conversation was about an emergency response system (911 in the US). It seems that words like shooting, fear, panic, police, and siren could have caused this result.
But actually the whole conversation was fruitful: the caller was not angry with the service; instead, the call center person solved the caller's problem and the caller was relaxed. So logically this should not be treated as negative.
What is the way forward to customize AlchemyAPI's behavior?
We are currently looking at the tools that would be required to allow customization of the AlchemyAPI services. Our current service is entirely pre-trained on billions of web pages, but customization is on the road map. I can't give you any timelines this early, but keep checking back!
Zach, Dev Evangelist AlchemyAPI

How can we *set* deadlines, to allow us to work to them effectively, in an agile way? [closed]

I'm working in a team that has been consistently and fairly successfully working in an agile approach, and this has worked great for the current project so far, as we incrementally build the product.
We're now moving into the next phase, though, and management is keen for us to set some specific deadlines ourselves, on the order of months, for when we'll be in a position to demo and sell this to real customers.
We have a fairly well organised large backlog for each of the elements of functionality we'd like to include, and a good sense of the prioritisation of these individual bits of functionality.
The naive solution is to get the minimum list of stories that would provide a demo-able product, estimate all of those individually, and add them up and combine with our velocity to get a date, and announce we'll be demoing from then. That leaves no leeway though, and seems likely to result in a mad crunch as we get up to deadline time, which I desperately want to avoid.
As an improvement, I'd like to add in some ratio of more optional stories to act as either contingency or bonus improvements, depending on how we progress, but we don't have any idea what ratio would be sensible, or whether this is the standard approach.
I'm also concerned by having to estimate the whole of our backlog all in one go up-front, as that seems very time consuming, and it seems likely that we'll discover more information in the months before we get to that story, which will affect our estimates.
Are there recommended approaches to dealing with setting deadlines to allow for an agile development process? Most of the information I've seen seems to be around handling the situation once you've got a fixed deadline to hit instead. I'd also be interested in any relevant literature or interesting blog posts that cover this issue.
Regarding literature: the best book I know on estimation in software is "Software Estimation: Demystifying the Black Art" by Steve McConnell. It covers your case. Plus, it describes the difference between estimation and commitment (a set deadline, in other words) and explains how to derive the second from the first reliably.
The naive solution is to get the minimum list of stories that would provide a demo-able product, estimate all of those individually, and add them up and combine with our velocity to get a date, and announce we'll be demoing from then. That leaves no leeway though, and seems likely to result in a mad crunch as we get up to deadline time, which I desperately want to avoid.
This is the solution I have used in the past. Your initial estimate is going to be off a bit, so add some slack via a couple of additional sprints before setting your release date. If you get behind, you can make it up in the slack. If not, your product backlog gives you additional features that you can include in the release if you so choose. This depends on your velocity metric for the team, though: adjust your slack based on how accurate you feel this metric is for the current team. Once you have a target release, you can circle back to see whether any known resource constraints might affect it.
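A minimal sketch of the arithmetic behind this suggestion: convert the demo-able backlog into sprints via velocity, then pad with slack sprints before committing to a date. Every number here is a placeholder:

    import math
    from datetime import date, timedelta

    demo_backlog_points = 180   # sum of estimates for the minimum demo-able stories
    velocity_per_sprint = 30    # points the team typically completes per sprint
    sprint_length_days = 14
    slack_sprints = 2           # contingency sprints on top of the raw estimate

    sprints_needed = math.ceil(demo_backlog_points / velocity_per_sprint) + slack_sprints
    release_date = date.today() + timedelta(days=sprints_needed * sprint_length_days)
    print(f"{sprints_needed} sprints -> target demo date {release_date}")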
The approach you describe is likely the right one. You may want to estimate all desirable features and prioritise UI elements (because investors and customers largely judge the shiny UI), and then your deadline will be the estimated completion date; then add some slack by scaling your estimates. Use the ratio between your current productivity and your worst period to create a pessimistic estimate. You can use that same ratio to scale shorter estimates as well (e.g. the estimate for the minimum feature set).
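And a sketch of the scaling idea in the answer above: inflate the optimistic figure by the ratio of typical velocity to worst-period velocity to get a pessimistic bound. The numbers are placeholders:

    typical_velocity = 30    # points per sprint in a normal period
    worst_velocity = 20      # points per sprint in the worst observed period
    optimistic_sprints = 6   # from the raw estimate of the minimum feature set

    pessimistic_sprints = optimistic_sprints * (typical_velocity / worst_velocity)
    print(f"pessimistic estimate: about {pessimistic_sprints:.1f} sprints")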
