Azure QnA Maker Update Confidence - azure-bot-service

Is it possible to change the confidence of Azure QnA Maker answers programmatically?
Suppose the QnA Maker API returns 4 possible answers for a question, with confidences:
Answer 1=> 90%
Answer 2=> 85%
Answer 3=> 80%
Answer 4=> 70%
Now we want to set the confidence of Answer 3 to 95%, so that this answer comes out on top the next time the same question is asked.
How can this be done with the QnA API? Is there any sample code or a reference manual?
Note: adding an alternative question is not a solution to our problem, because admins may change the confidence of answers based on their business logic in the future.
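As far as I know, QnA Maker's ranker computes the confidence score itself and does not expose an API call to overwrite it. A common workaround is to re-rank the returned answers client-side with an admin-maintained boost table. A minimal sketch, assuming a GenerateAnswer-style response shape (an `answers` list with an `id` and a 0-100 `score`); the boost store and its values are hypothetical:

```python
# Sketch: re-rank QnA Maker answers with admin-defined boosts.
# The "answers" shape mirrors a GenerateAnswer-style response;
# ADMIN_BOOSTS is a hypothetical store the admin edits per business logic.

ADMIN_BOOSTS = {3: 25.0}  # e.g. push the answer with id 3 up by 25 points

def rerank(answers, boosts=ADMIN_BOOSTS):
    """Return answers sorted by (QnA score + admin boost), highest first."""
    def boosted(a):
        # Cap at 100 so boosts cannot exceed the score scale.
        return min(a["score"] + boosts.get(a["id"], 0.0), 100.0)
    return sorted(answers, key=boosted, reverse=True)

answers = [
    {"id": 1, "score": 90.0, "answer": "Answer 1"},
    {"id": 2, "score": 85.0, "answer": "Answer 2"},
    {"id": 3, "score": 80.0, "answer": "Answer 3"},
    {"id": 4, "score": 70.0, "answer": "Answer 4"},
]
top = rerank(answers)[0]
print(top["answer"])  # Answer 3 now ranks first (80 + 25 caps at 100 > 90)
```

This keeps the knowledge base untouched while letting admins adjust rankings at any time.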

Related

MS LUIS: Number of Intents / Data Imbalance

I see on the LUIS documentation page here that you strongly recommend treating data imbalance (i.e. a differing number of total utterances across intents) as a first priority. We currently see a mean of 19 utterances per intent on our dashboard, so in my opinion I should optimize all intents toward having about 20 utterances each, for example.
Now my question: when I use active learning by adding endpoint utterances, utterances are added to whichever intent we see them fitting (Active Learning documentation). How can I ensure that the number of utterances per intent always remains roughly equal (e.g. around 20 in our example)? It seems to me that attributing endpoint utterances to intents will naturally create data imbalance again.
Thanks a lot!
Best,
Mark
Once your initial model is satisfactory, there no longer needs to be equality between intents. Active learning specifically tries to correct for cases that were previously unseen, so if your existing examples already cover all your cases, you don't need to actively correct the balance.
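While the initial model is still being built, per-intent utterance counts can be checked directly from an exported LUIS app. The JSON shape below (an `utterances` array whose items carry an `intent` field) follows the LUIS app export format; the 0.5x/2x thresholds are arbitrary illustrative cut-offs, not LUIS recommendations:

```python
from collections import Counter

def intent_balance(app_json):
    """Count labeled utterances per intent in an exported LUIS app
    and flag intents far from the mean (a rough imbalance check)."""
    counts = Counter(u["intent"] for u in app_json["utterances"])
    mean = sum(counts.values()) / len(counts)
    # Flag intents with fewer than half or more than double the mean.
    flagged = {i: n for i, n in counts.items()
               if n < 0.5 * mean or n > 2 * mean}
    return counts, flagged

# Toy exported-app fragment (shape assumed from the LUIS export format).
app = {"utterances": (
    [{"text": f"find book {i}", "intent": "FindBook"} for i in range(8)]
    + [{"text": "hello", "intent": "Greeting"}]
)}
counts, flagged = intent_balance(app)
print(flagged)  # {'Greeting': 1} — underrepresented relative to the mean
```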

i want to train luis ai with sufficient utterances uploaded through luis api

I am new to LUIS AI.
I would like to train LUIS for my bot's users, who want to buy books online. They might enter "I want XYZ", where XYZ is a book title, or "I want ABC", where ABC is an author.
They can write "find", "find out", "search", "searching", "looking", "would like to see", "would like to find", or anything else they want to write.
My requirement is to start with a spreadsheet of utterances and entities; after I upload it and click Train, the application should be trained well enough to handle all such user input, at least 90% of it.
The problem is how to write utterances that cover the huge variety of possible user input. I already have approximately 65 relevant and diverse utterances, but the model still does not handle all user input.
Please suggest how I should proceed with the utterances to meet this requirement.
Researchers often take 30 minutes of conversation or 200 utterances as a good enough sample to conduct research [1]. That is an order-of-magnitude estimate that is good to know and compare ourselves against.
Now, to get the most variability in incoming utterances, you must find a good source of similar requests. In my case, sites like Yahoo Answers are great for finding the usual structure of requests on the topic I work on. I would suggest finding a place where people query with a similar objective: Google AdWords' keyword helper is a general but solid start.
[1] http://www.scielo.br/scielo.php?pid=S1516-18462015000401143&script=sci_arttext&tlng=en
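If the spreadsheet route is taken, each row has to be converted into the batch-label JSON that the LUIS authoring API accepts (`text`, `intentName`, `entityLabels` with character offsets). A rough sketch, where the column names and the inclusive `endCharIndex` convention are assumptions to verify against the API documentation:

```python
import csv, io, json

def rows_to_luis_examples(csv_text):
    """Convert rows of (utterance, intent, entity, entity_value) into the
    batch-label shape expected by the LUIS authoring API. Column names
    are assumptions; endCharIndex is assumed to be inclusive."""
    examples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels = []
        value = row.get("entity_value", "")
        if value:
            start = row["utterance"].lower().find(value.lower())
            if start >= 0:
                labels.append({
                    "entityName": row["entity"],
                    "startCharIndex": start,
                    "endCharIndex": start + len(value) - 1,
                })
        examples.append({
            "text": row["utterance"],
            "intentName": row["intent"],
            "entityLabels": labels,
        })
    return examples

# Toy spreadsheet exported as CSV.
sheet = """utterance,intent,entity,entity_value
I want Moby Dick,FindBook,BookTitle,Moby Dick
find books by Melville,FindAuthor,AuthorName,Melville
"""
print(json.dumps(rows_to_luis_examples(sheet), indent=2))
```

The resulting list can then be posted to the batch example-upload endpoint of the authoring API.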

Algorithm for most popular posts based on likes, shares and views [duplicate]

This question already has answers here:
Popularity Algorithm
(5 answers)
Closed 8 years ago.
I am working on a website which will have gazillions of stories, in all formats: text, videos, photos and other multimedia elements. Stories can be filtered on various bases, some of which are "new" (which obviously lists the latest stories first), "featured" (stories marked featured manually) and "popular", for which I need to come up with an algorithm.
So far I have been taking the average of Facebook likes, number of shares (Facebook, Twitter or any other) and number of views. But this doesn't look right to me, because giving equal weight to all three metrics doesn't seem genuine, for reasons like social spamming.
Looking forward to some really good algorithms to rank popularity of stories.
----Addition-----
Popularity Algorithm discusses an algorithm based only on "likes", and it categorizes results by timestamp: popular by day, week and month. This has an answer which nearly answers my query, but not exactly, because the metrics there are assumed. I am looking for an exact metric with a genuine explanation, e.g. "Facebook x 2", with an explanation of why x 2 for Facebook. I hope I am not duplicating now!
I'd suggest trying to use a regression algorithm. The most widely used is linear regression, but if that model does not fit - feel free to explore others.
1. Determine the features of each story. Your features are likes, tweets, shares, views, and so on. I'd also add a boolean indicator (a variable that can only take the values 0 or 1) for each of the types (video/photo/...).
2. Create a training set: a set of stories that you (or other human experts) have given a score to.
3. Using these features and the training set, use some regression algorithm to create a model that best fits the features to the examples you already scored.¹
4. Once you have a model, use it to give a score to all other documents.
Regarding spammer detection, you could try anomaly detection algorithms.
¹ Actually, steps 2 and 3 can be done together using active regression techniques: in active regression, the learner (algorithm) asks you for the examples that will make it learn as fast as possible. In my experiments, PAlice is a very well-performing active regression algorithm.
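The steps above can be sketched end-to-end in plain Python. The feature scaling, the expert scores and the tiny training set are all made-up illustrations; a real system would use a library such as scikit-learn and far more labeled data:

```python
def fit_linear(X, y, lr=0.1, iters=20000):
    """Step 3: ordinary least squares fit by gradient descent (pure Python).
    X: feature rows [likes, shares, views], scaled to 0..1.
    y: human-assigned popularity scores (the training set of step 2)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xij for wj, xij in zip(w, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(features, w, b):
    """Step 4: apply the fitted model to an unscored story."""
    return b + sum(wj * xj for wj, xj in zip(w, features))

# Hypothetical training set: [likes, shares, views] scaled to 0..1,
# each story scored 0-10 by a human expert (step 2).
train_X = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.9], [0.5, 0.5, 0.5], [0.1, 0.05, 0.2]]
train_y = [9.0, 3.0, 5.0, 1.0]
w, b = fit_linear(train_X, train_y)
print(score([0.8, 0.7, 0.6], w, b))  # high-engagement story scores high
```

The learned weights replace guesses like "Facebook x 2": the multiplier for each metric comes out of the fit against human-scored examples rather than being chosen by hand.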

Distribution among users for collaborative voting algorithm

Users of my application (it's a game actually) answer questions to get points. Questions are supplied by other users. Due to volume, I cannot check everything myself, so I decided to crowd-source the filtering process to the users (players). The rules are simple:
each user is shown a question to rate as good/bad/unsure
when a question is rated "bad" 5 times, it is removed from the pool
when a question is rated "good" 5 times, it is removed from the pool and flagged to be played by other players who have not seen it
If everyone could see everything, this would be easy. However, later in the game phase, users shouldn't get questions they have already seen. This means users should not see all questions during rating; exactly those they don't see are the ones they will get to play (answer) later in the game.
Total number of questions is much larger than number of players, new questions are added daily and new players come all the time, so I cannot just distribute in advance.
I'm looking for some algorithm that would maximize the number of rated playable (i.e. unseen) questions for all players.
I tried to google, but I'm not even sure which terms to put in the search box, and using stuff like "distribution", "voting", "collaborative filtering" gives very interesting but unusable results.
The ratio of good to bad questions is 1:3, i.e. 25% of questions are rated good. The number of already-submitted unrated questions is over 10,000, and the number of active users with voting privileges is around 150.
I'm currently considering splitting the question pool and the user base into two parts: one half of the user base would check questions for the other half, and vice versa. Splitting the questions is easy (odd vs. even IDs, for example), but I'm still not sure how to divide the user base. I thought about using odd/even positions in the "top question checkers" list, but positions on that list change daily as new questions are checked.
Update: I just asked a sequel to this question - I need to periodically remove a fixed number of questions from the pool.
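One way around the "positions change daily" problem in the odd/even split idea is to partition users by a stable hash of their immutable user id rather than by leaderboard rank. A minimal sketch, where the two-way split and the question-id parity rule are illustrative assumptions:

```python
import hashlib

def user_partition(user_id, parts=2):
    """Stable partition assignment: hash the immutable user id, so the
    split never shifts as leaderboard positions change."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % parts

def can_rate(user_id, question_id, parts=2):
    """Users in partition p rate only questions from the other partition,
    so the questions they later play were never shown to them for rating."""
    return user_partition(user_id, parts) != question_id % parts
```

The same assignment can be recomputed on any server at any time, with no list to store or keep in sync.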
I'm unaware if there is a specific, well known algorithm for this. However this would be my line of thinking:
"maximize the number of rated playable (i.e. unseen) questions for all players" means both maximising the number of questions with +5 and the number of not-seen questions from each player.
Whatever the algorithm will be, its effectiveness is tied to both the quality of the questions submitted by the contributors and the willingness to rate by other players (unless you force them to rate questions).
The goal of your system should not be making all players have the same number of "unseen questions" [this is in fact irrelevant], but rather always having, for each player, a "reserve" of unseen questions that allows him/her to play at his/her normal game rate. For example: say you have two users, A and B, who play regularly on your site. A normally answers 80 quizzes per day, while B answers only 40. If your system on average gets 100 new approved questions daily, in principle you would like player A to never see more than 20 of those each day, while player B could safely see 60 of them.
The ratio between submitted and approved questions is also important: if every second submitted question is no good, then users A and B above could rate 40 and 120 questions daily, respectively.
So my approach to the final algorithm would be:
Keep track of the following:
Number of newly submitted questions per day (F = Flow)
Ratio of good to total submitted questions (Q = Quality)
Number of questions used (for playing, not for rating) by each player per day (GR = Game Rate)
Number of questions rated by each player on a given day (RC = Review Counter)
Establish a priority queue of questions to be rated. The goal here is to get questions approved as fast as possible. Give a bonus priority to both:
questions that have already collected upvotes
questions submitted by users whose other questions have already been accepted.
When a player is involved in rating, show him/her the first question in the queue.
Repeat step 3 as much as you want, making sure this condition is never met: Q * (F - RC) < GR
[The above can be further tweaked, considering the fact that when the user first register, there will be already a pool of approved but unseen questions on the site]
Of course you can heavily influence the behaviour of users by giving incentives for meritorious activity (badges and reputation points on SO are a self-explanatory example).
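The priority queue of step 2 and the stopping condition of step 4 can be sketched as follows; the bonus weights and the sample data are illustrative assumptions, not values from the answer:

```python
import heapq

def priority(question, accepted_by_submitter):
    """Lower value = rated sooner. Bonus for existing upvotes and for
    submitters with a track record of accepted questions; the weight of
    2 per upvote is an arbitrary illustrative choice."""
    return -(2 * question["upvotes"]
             + accepted_by_submitter.get(question["submitter"], 0))

def may_keep_rating(Q, F, RC, GR):
    """Step 4's guard: keep rating only while Q * (F - RC) >= GR, i.e.
    while enough unrated flow remains to feed the player's game rate."""
    return Q * (F - RC) >= GR

accepted = {"alice": 7, "bob": 0}  # accepted-question history per submitter
pool = [
    {"id": 1, "upvotes": 0, "submitter": "bob"},
    {"id": 2, "upvotes": 3, "submitter": "bob"},
    {"id": 3, "upvotes": 1, "submitter": "alice"},
]
heap = [(priority(q, accepted), q["id"]) for q in pool]
heapq.heapify(heap)
_, first = heapq.heappop(heap)
print(first)  # 3: one upvote plus alice's 7 accepted outweighs 3 raw upvotes
```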
EDIT/ADDENDUM: The discussion in the comments clarifies that GR is fixed at one question per day. Furthermore, the OP states that there will be at least 1 new approved question in the system every 24 hours. This means the above algorithm can be simplified into one of two forms:
If the user can vote only AFTER he answered his daily question:
If there is at least one approved, unseen question in the system, let the user vote at will.
If the user can vote even BEFORE answering his daily question:
If there are at least two approved, unseen questions in the system, let the user vote at will.
This ensures that if a user votes on every votable question in the system and then answers his daily question at 23:59, there will still be a question available to answer at 00:00, plus 24 hours for the system to acquire a new question for the following day.
HTH!

What's the prediction algorithm behind websites like farecast.com (bing travel)?

I think everything is in the title of the question: What's the prediction algorithm behind farecast.com (bing travel) ?
The website http://www.bing.com/travel/, originally named http://farecast.com before it was bought by Microsoft (Bing), predicts airfares to help you purchase tickets when they are cheapest.
I know Farecast's algorithm is based on historical prices: they used a huge database of airfare observations to build the predictions.
But as with options in finance (calls/puts), there are formulas to calculate plane ticket prices, so there must be more than simple data mining behind their algorithm (for example, fitting historical data to find the parameters of a generic ticket-pricing formula, like backing out implied volatility from historical option prices).
Can someone tell me what is the theory behind these kind of prediction?
I believe the theory is pretty new since the idea came up in 2003, only 8 years ago.
Hope my question is clear,
Thanks in advance
EDIT
A very quick edit to answer yi_H comment:
I'm looking for recent papers on forecasting algorithms based on historical prices and pricing calculations.
Such algorithms may exist in financial engineering, and Farecast might have used quantitative-finance techniques for pricing options to help predict airfares.
if by chance someone knows the algorithm farecast uses, it would be great.
Thanks again
Actually, the prices are not based on historic data.
A few years ago I heard a talk from someone who works for ITA Software (which built the system used by Orbitz and was recently bought by Google).
Here are some slides by a founder:
http://www.demarcken.org/carl/papers/ITA-software-travel-complexity/img9.html
The airlines maintain a database with the air fares that is propagated to those airfare optimizers.
However, the airfare system is overly complicated, and it is very hard to find optimal prices.
In the talk I heard, the speaker said they were working with a Canadian airline to replace the old database infrastructure with something more efficient.
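For a flavour of what a purely history-based buy/wait signal could look like (an illustrative toy, not Farecast's or ITA's actual method; the fares and the threshold rule are made up), compare the current fare against trailing statistics of past observations:

```python
from statistics import mean, stdev

def buy_or_wait(history, current, k=0.5):
    """Toy history-based signal (NOT the Farecast algorithm): buy when
    the current fare sits more than k standard deviations below the
    trailing mean of observed fares, otherwise wait."""
    m, s = mean(history), stdev(history)
    return "buy" if current < m - k * s else "wait"

# Hypothetical observed fares for one route.
fares = [320, 340, 310, 360, 330, 350]
print(buy_or_wait(fares, 295))  # buy: well below the trailing mean
print(buy_or_wait(fares, 345))  # wait
```

A production predictor would model seasonality, days-to-departure and route-level effects on top of such raw statistics.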
