Rasa story with slots - entities

Can someone clarify something for me about Rasa stories involving slots:
## story with email
* intent_request_email
  - utter_request_email
* inform_email{"email": "example@example.com"}
  - slot{"email": "example@example.com"}
  - utter_thanks
In the above, does example@example.com act as a placeholder for any email address (i.e., will it work equally for john.smith@somedomain.com), or is this effectively limiting the story to the case when the email provided is exactly example@example.com?
Now consider the case below, for entities that are floats:
## story with numeric
* intent_want_to_buy
  - utter_request_budget
* inform_budget{"amount": 100}
  - slot{"amount": 100}
  - utter_thanks
Does the 100 act as a placeholder for any amount that is provided (i.e., 200, 300, 65.95), or is it actually saying that this story applies if and only if the user states their budget is exactly $100.00?
With the above in mind, how does one arrange for one story to be executed when the slot has NOT been set, and a different path to be taken when the slot has been filled?
The documentation is rather lacking in these kinds of basics, which are obvious once known, but are not so obvious for someone new to Rasa.

The specific entity values in the stories are placeholders only and do not affect the story line.
Only in the NLU training data do the annotated entity values have an effect, where they help intent classification and entity extraction.
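To address the last part of the question directly: in this Markdown story format, whether a slot influences the dialogue depends on its type; for a text slot, only the presence or absence of a value is featurized, never the value itself. So you typically write one story per path. A minimal sketch, assuming email is a text slot auto-filled from the email entity:
## email given up front
* intent_request_email{"email": "example@example.com"}
  - slot{"email": "example@example.com"}
  - utter_thanks

## email not given yet
* intent_request_email
  - utter_request_email
* inform_email{"email": "example@example.com"}
  - slot{"email": "example@example.com"}
  - utter_thanks
Since the value itself is just a placeholder, both stories generalize to any email address; what distinguishes the two paths is whether the slot{...} event occurs.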

Related

How to handle spelling mistakes (typos) in entity extraction in Rasa NLU?

I have a few intents in my training set (nlu_data.md file) with a sufficient number of training examples under each intent.
The following is an example:
## intent:SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
I have added multiple sentences like this.
At the time of testing, all sentences in the training file work fine. But if any input query has a spelling mistake, e.g. hotol/hetel/hotele for the hotel keyword, then Rasa NLU is unable to extract it as an entity.
I want to resolve this issue.
I am allowed to change only the training data, and I am also restricted from writing any custom component for this.
To handle spelling mistakes like this in entities, you should add these examples to your training data. So something like this:
## intent:SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
- looking for a [hotol](place) in Chennai
- [hetel](place) in Berlin please
Once you've added enough examples, the model should be able to generalise from the sentence structure.
If you're not using it already, it also makes sense to use the character-level CountVectorsFeaturizer. That should already be part of the default pipeline described on this page.
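For reference, a character-level CountVectorsFeaturizer can be configured alongside the word-level one. A minimal config.yml sketch, assuming a Rasa 1.x-era pipeline (component names may differ in your version):
language: "en"
pipeline:
- name: "WhitespaceTokenizer"
- name: "CRFEntityExtractor"
- name: "CountVectorsFeaturizer"    # word-level features
- name: "CountVectorsFeaturizer"    # character-level features, more robust to typos
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: "EmbeddingIntentClassifier"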
One thing I would highly suggest is using look-up tables with fuzzywuzzy matching. If you have a limited number of entities (like country names), look-up tables are quite fast, and fuzzy matching catches typos when the entity exists in your look-up table (by searching for typo variations of those entities). There's a whole blog post about it on the Rasa blog.
There's a working implementation of fuzzywuzzy as a custom component:
import os
import json

from fuzzywuzzy import process  # fuzzy string matching
from nltk.corpus import stopwords
from rasa_nlu.components import Component  # rasa.nlu.components in Rasa 1.x

# STOP_WORDS is just a set of stop words from NLTK
STOP_WORDS = set(stopwords.words("english"))


class FuzzyExtractor(Component):
    name = "FuzzyExtractor"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]
    threshold = 90

    def __init__(self, component_config=None, *args):
        super(FuzzyExtractor, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        pass

    def process(self, message, **kwargs):
        entities = list(message.get('entities') or [])

        # Get file path of lookup table in json format
        cur_path = os.path.dirname(__file__)
        if os.name == 'nt':
            partial_lookup_file_path = '..\\data\\lookup_master.json'
        else:
            partial_lookup_file_path = '../data/lookup_master.json'
        lookup_file_path = os.path.join(cur_path, partial_lookup_file_path)

        with open(lookup_file_path, 'r') as file:
            lookup_data = json.load(file)['data']

        tokens = message.get('tokens')
        for token in tokens:
            # Skip stop words; fuzzy-match every other token against the lookup table
            if token.text not in STOP_WORDS:
                fuzzy_results = process.extract(
                    token.text,
                    lookup_data,
                    processor=lambda a: a['value'] if isinstance(a, dict) else a,
                    limit=10)
                for result, confidence in fuzzy_results:
                    if confidence >= self.threshold:
                        entities.append({
                            "start": token.offset,
                            "end": token.end,
                            "value": token.text,
                            "fuzzy_value": result["value"],
                            "confidence": confidence,
                            "entity": result["entity"]
                        })

        message.set("entities", entities, add_to_output=True)
I didn't implement it myself; it was implemented and validated on the Rasa forum.
Then you just add it to your NLU pipeline in the config.yml file.
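A custom component is referenced in the pipeline by its module path. A minimal sketch, assuming the class above is saved in a (hypothetical) components/fuzzy_extractor.py next to config.yml:
pipeline:
- name: "WhitespaceTokenizer"    # FuzzyExtractor requires tokens
- name: "components.fuzzy_extractor.FuzzyExtractor"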
It's a strange request that they ask you not to change the code or write custom components.
The approach you would have to take would be to use entity synonyms. A slight edit of a previous answer:
## intent:SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
- looking for a [hotol](place:hotel) in Chennai
- [hetel](place:hotel) in Berlin please
This way, even if the user enters a typo, the correct entity will be extracted. If you want this to be foolproof, I do not recommend hand-editing the intents. Use some kind of automated tool for generating the training data, e.g. a tool that generates misspelled words (typos).
First of all, add samples for the most common typos for your entities, as advised above.
Beyond this, you need a spellchecker.
I am not sure whether there is a single library that can be used in the pipeline, but if not, you need to create a custom component. Dealing with training data alone is not feasible; you can't create samples for every typo.
Using fuzzywuzzy is one way, but it is generally slow and it doesn't solve all the issues.
Universal Encoder is another solution.
There are more options for spell correction, but you will need to write code either way.
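As one illustration, here is a minimal sketch using the pyspellchecker package (pip install pyspellchecker; not part of Rasa), which could be wrapped in such a custom component or run over messages before they reach the pipeline:
from spellchecker import SpellChecker

spell = SpellChecker()

def correct_message(text):
    # Correct each token independently; correction() returns the most likely
    # candidate (note: it works on lowercased words, and newer versions return
    # None for unknown words, hence the "or word" fallback)
    return " ".join(spell.correction(word) or word for word in text.split())

print(correct_message("find good hotol for me in mumbai"))
# expected (not guaranteed): "find good hotel for me in mumbai"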

Firebase many-to-many performance

I'm wondering about the performance of Firebase when making n + 1 queries. Let's consider the example in this article https://www.firebase.com/blog/2013-04-12-denormalizing-is-normal.html where a link has many comments. If I want to get all of the comments for a link I have to:
1. Make one query to get the index of comments under the link.
2. For each comment ID, make a query to get that comment.
Here's the sample code from that article that fetches all comments belonging to a link:
var commentsRef = new Firebase("https://awesome.firebaseio-demo.com/comments");
var linkRef = new Firebase("https://awesome.firebaseio-demo.com/links");
var linkCommentsRef = linkRef.child(LINK_ID).child("comments");

linkCommentsRef.on("child_added", function(snap) {
  commentsRef.child(snap.key()).once("value", function(commentSnap) {
    // Render the comment on the link page.
  });
});
I'm wondering if this is a performance concern compared to the equivalent query in a SQL database, where I could fetch all comments with a single query: SELECT * FROM comments WHERE link_id = LINK_ID.
Imagine I have a link with 1000 comments. In SQL this would be a single query, but in Firebase this would be 1001 queries. Should I be worried about the performance of this?
One thing to keep in mind is that Firebase works over web sockets (where available), so while there may be 1001 round trips, there is only one connection that needs to be established. Also, a lot of the round trips will be happening in parallel. So you might be surprised at how little time this takes.
Should I worry about this?
In general people over-estimate the amount of use they'll get. So (again: in general) I recommend that you don't worry about it until you actually have that many comments. But from day 1, ensure that nothing you do today precludes optimizing later.
One way to optimize is to further denormalize your data. If you already know that you need all comments every time you render an article, you can also consider duplicating the comments into the article.
A fairly common scenario:
/users
  twitter:4784
    name: "Frank van Puffelen"
    otherData: ....
/messages
  -J4377684
    text: "Hello world"
    uid: "twitter:4784"
    name: "Frank van Puffelen"
  -J4377964
    text: "Welcome to StackOverflow"
    uid: "twitter:4784"
    name: "Frank van Puffelen"
So in the above data snippet I store both the user's uid and their name for every message. While I could look up the name from the uid, having the name in the messages means I can display them without the lookup. I'm also keeping the uid so that I can link to the user's profile page (or other messages).
We recently had a good question about this, where I wrote more about the approaches I consider for keeping the derived data up to date: How to write denormalized data in Firebase
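One way to keep such duplicated fields consistent is a multi-location update, which writes every copy in a single atomic request. A minimal sketch against the Realtime Database REST API, reusing the demo URL and keys from the snippet above (Python's requests library here, purely for illustration):
import requests

# Update every stored copy of the user's name in one atomic PATCH;
# the keys of the payload are paths relative to the database root
new_name = "Frank van Puffelen"
update = {
    "users/twitter:4784/name": new_name,
    "messages/-J4377684/name": new_name,
    "messages/-J4377964/name": new_name,
}
requests.patch("https://awesome.firebaseio-demo.com/.json", json=update)
In practice you would track which messages belong to the user (e.g. via an index) rather than hard-coding the message IDs.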

Kendo UI grid (web) row height, fixed cell height

One of the columns in my Kendo grid is the "Notes" column (which can hold at least 3000 characters).
The problem I'm facing is that the grid cell (along with the row) expands to fit the text in the cell, which makes the cell huge.
I would like to make the grid cell a single line with a fixed number of characters, and have a tooltip on the cell.
I'm not sure whether I can achieve this.
Please let me know a possible solution for the above case.
I have tried some CSS changes:
.k-grid tbody tr { height: 38px; } /* Not working */
Sample data in a cell in the Column (Notes):
17/11/2010 - Not received many enquiries, uses Extra Sure and Holmans. Finds our Medical Screening too lengthy. Took her through system and printed off Mes Screen questions, will use us more than N J Heritage. She will aim to use us for new enquiries.16/02/11 - L/M - R/C.08/03/2011 - JCO - Spoke to Matthew Salmon, not finding us competitive been using ExtraSure who are a lot cheaper. Advised our USP's and our benefits. Will use us for next travel enquiry.16/06/11 - Jane spoke to Richard and will be taking him through the system, SunWorld Plus.12/10/11-I Have spoken to Richard, I have emailed across details of Sunworld Extra & medical screening along with Username & Password. MP01/11/2011 - JCO spoke to Richard, Richard issuing a quotation today, likes that we don't have any age limits and restricted to 85 on AMT policies. First time to use us, usually use Citybond, likes the look of our product. JCO advised to contact me if requires detailed explanation of system. 25/05/12 - MW - Spoke to Richard, he is very nice, I thanked them for their continued support in using SunWorld and I am sending him the email with the Special Features and the SunWorld Extra info.. He said he rates SunWorld 8 and a half out of ten because he would like to see the option to increase the single article limit and also the rates have gone up quite a bit recently.. He said they don't really do that much travel but they are going to be pushing it over the next year because he said he thinks people are getting fed up of going on the internet to get insurance and realising they aren't actually covered for anything... He is generally happy with everything.08/08/12 - MW - Spoke to Luke, I asked why they hadn't used us since June and he said it was just because of a slow down in enquiries. Travel is not something they push, they just offer it to accommodate their existing clients. He said the only use us and one other provider so any enquiries they do get they always quote with us, he has done some quotes this week but they haven't come back. They are very happy with everything. No problems etc. I am sending him the Special Features for 2012 and also SunWorld Extra info,16/08/12 - MW - Spoke to Richard, asked if they would be interested in having a poster, he said it would not really be of any use to them as they are not a high street broker, they are in an office and not customer facing, he said they don't really do much travel, they are mainly a commercial broker but they are happy with any travel business they can do, what they would like is a flyer as opposed to a poster so they can email it out to their customers. He said he thinks the product is great, likes the age limits and limits etc,14/11/12 - MW - Spoke to Luke, told him about Snowman Cover and Broker Survey.04/12/12 - MW - Spoke to Luke, he said they are really quiet at the moment. Only using SunWorld but just not getting the enquiries. he is happy with SunWorld though and I have told him about the Changes for 2013.08/02/13 - MW - I can see that they said back in August that they would like some leaflets so I am sending them some out.28/02/13 - MW - Luke has sent this email - "Sorry, not sure if you are still doing this but can we havesome leaflets to send with our renewals to try and offer your services J thanks" , So I am sending them out some more leaflets.01/05/13 - MW - Spoke to Richard, he said the main person who does the travel, Luke, is on holiday in Turkey for the rest of the week and will be back on Tuesday. 
I have made a note in my diary to give him a call back on Wednesday.08/05/13 - MW - Spoke to Luke, he said SunWorld are their main travel provider, they have not really had many enquiries for travel lately. He said our rates are competitive for annual but people can get travel insurance so cheap online now that he thinks they have just been doing that. I said we will reduce the rates for him and he said that would be good. They are also set up to use us via the AXA route which he said is fine to be deactivated and they will carry on using this one as they have been. He said he would like some leaflets as he never received the last lot so I have checked his address and I am sending out 20 more. I told him about the new product and I am sending him the email with the Underwriting Changes and Special Features for 2013. RATES REDUCED01/07/13 - MW - Spoke to Luke, he said they are only using SunWorld so they must have not had any enquiries and that's why they haven't used us in the last month. He said they only really offer travel insurance to accommodate their existing clients, they don't really push for it. He said they have got the leaflets and they will be sending them out with renewals etc. As soon as they get the enquiries, we will be getting the business.13/09/13 - MW - Spoke to Richard, told him about the new product going live on the 1st October. I am sending him the email with the Underwriting Changes and Special Features for 2013. He said they would like some leaflets so I have confirmed their address and I am sending out 20.01/11/13 - MW - Spoke to Richard, he said they are very quiet at the moment, their customers are not going away and that's why they haven't issued anything. Luke is the main person who deals with this and he will be our contact going forward because he is the one who deals with it most of the time. I told Richard about the video tutorial and he is going to tell Luke and he will let us know if he has any queries. I am sending the email with the New Special Features and further information on the changes we have made to the website. luke.robson#aifltd.co.uk02/01/14 - MW - Spoke to Luke, I have confirmed all the contact information is correct and I have added his email address to the spreadsheet for the 2014 mailer and then he will forward it around to all the others. He said they only use SunWorld and it is the easiest to use, they just haven't had any enquiries for travel. I am sending him the email with the New Special Features and information about the changes we have made to the system. He said he has asked for some leaflets before but he has not received them. He really wants some to send out with all his renewals so I am sending him out 60 leaflets.
In addition to limiting the height of the row, you have to specify that the excess text should be hidden.
Try adding the following style:
.k-grid tbody tr td {
  overflow: hidden;
  text-overflow: ellipsis;
  white-space: nowrap;
}
See it here: http://jsfiddle.net/OnaBai/8U6rg/1/
To show a tooltip, you need to use a template that includes the value of the cell both as the title and as the content. Example of the column definition for a field called name:
{
  field: "name",
  title: "Name",
  template: "<span title='${name}'>${name}</span>"
}
See it here: http://jsfiddle.net/OnaBai/8U6rg/3
EDIT: If you want to show HTML-formatted text in the tooltip, you cannot use standard browser tooltips; you will have to use Kendo UI tooltips.
To do so:
First, in order to show the content of the cell as HTML, add the column option encoded set to false, something like:
{
  field: "name",
  title: "Name",
  encoded: false
}
Next, to use a Kendo UI tooltip for this, we are going to create a Kendo UI Tooltip widget for each cell. You should do this once the grid is rendered, so I do it in the Grid's dataBound handler:
dataBound: function() {
  $("#grid").kendoTooltip({
    ...
  });
}
To limit which cells get a tooltip, I'm going to mark the cells with the CSS class onabai, so now my column definition is:
{
  field: "name",
  title: "Name",
  encoded: false,
  template: "<span class='onabai'>#=name#</span>"
}
And the Tooltip in dataBound is:
dataBound: function() {
  $("#grid").kendoTooltip({
    filter: ".onabai",
    position: "left",
    width: 200,
    ...
  });
}
But we still have to say what goes into the content of the tooltip. To do so, we define a content property as a function that returns the content of the cell in the Grid, using e.target.html():
dataBound: function() {
  $("#grid").kendoTooltip({
    filter: ".onabai",
    position: "left",
    width: 200,
    content: function(e) {
      return e.target.html();
    }
  });
}
You can see this running here: http://jsfiddle.net/OnaBai/8U6rg/8/

Naive Bayesian and zero-frequency issue

I think I've implemented most of it correctly. One part confused me:
The zero-frequency problem:
Add 1 to the count for every attribute value-class combination (Laplace estimator) when an attribute value doesn’t occur with every class value.
Here's some of my client code:
// Classify
string text = "Claim your free Macbook now!";
double posteriorProbSpam = classifier.Classify(text, "spam");
Console.WriteLine("-------------------------");
double posteriorProbHam = classifier.Classify(text, "ham");
Now say the word 'free' is present in the training data somewhere:
// Training
classifier.Train("ham", "Attention: Collect your Macbook from store.");
// ... lots more training calls here ...
classifier.Train("spam", "Free macbook offer expiring.");
But the word is present in my training data for the category 'spam' only, not for 'ham'. So when I go to calculate posteriorProbHam, what do I do when I come across the word 'free'?
Still add one. The reason: Naive Bayes models P("free" | spam) and P("free" | ham) as completely independent, so you want to estimate the probability of each completely independently. The Laplace estimator you're using for P("free" | spam) is (count("free" in spam) + 1) / (count(spam) + V), where V is the size of the vocabulary; the estimator for P("free" | ham) is built the same way from the ham counts.
If you think about what it would mean not to add one, it wouldn't really make sense: seeing "free" once in ham would make it less likely to see "free" in spam.
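To make the arithmetic concrete, here is a minimal sketch of add-one smoothing for a multinomial Naive Bayes word model (in Python rather than the C# above; the counts and vocabulary are hypothetical):
# Hypothetical per-class word counts from training
spam_counts = {"free": 4, "macbook": 2, "offer": 3}
ham_counts = {"macbook": 5, "store": 2, "collect": 1}

vocabulary = set(spam_counts) | set(ham_counts)
V = len(vocabulary)  # vocabulary size, here 5

def p_word_given_class(word, class_counts):
    # Add-one (Laplace) smoothing: every word gets a pseudo-count of 1,
    # so a word never seen in a class still gets a small nonzero probability
    total = sum(class_counts.values())
    return (class_counts.get(word, 0) + 1) / (total + V)

print(p_word_given_class("free", spam_counts))  # (4 + 1) / (9 + 5)
print(p_word_given_class("free", ham_counts))   # (0 + 1) / (8 + 5)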

iPhone's phone number splitting algorithm?

iPhone has a pretty good telephone number splitting function, for example:
Singapore mobile: +65 9852 4135
Singapore resident line: +65 6325 6524
China mobile: +86 135-6952-3685
China resident line: +86 10-65236528
Hong Kong: +886 956-238-82
USA: +1 (732) 865-3286
Notice the nice features here:
- the splitting of country code, area code, and the rest is automatic;
- the delimiters are also nicely adapted to different countries, e.g. "()", "-" and space.
The parsing logic is doable for me; however, I don't know where to get the knowledge of most countries' telephone number formats.
Where could I find such knowledge, or open-source code that implements it?
You can get similar functionality with the libphonenumber code library.
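For example, with the Python port of libphonenumber (pip install phonenumbers); the number below is just an illustrative US number:
import phonenumbers

# Parse an E.164 number (region can be None when the +country-code is present)
n = phonenumbers.parse("+16502530000", None)
print(phonenumbers.format_number(n, phonenumbers.PhoneNumberFormat.INTERNATIONAL))
# -> "+1 650-253-0000"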
Interestingly enough, you cannot use an NSNumberFormatter for this, but you can write your own custom class for it. Just create a new class, set properties such as countryCode, areaCode and number, and then create a method that formats the number based on the countryCode.
Here's a great example: http://the-lost-beauty.blogspot.com/2010/01/locale-sensitive-phone-number.html
As an aside: a friend told me about a gigantic regular expression he had to maintain that could pick telephone numbers out of intercepted communications from hundreds of countries around the world. It was very non-trivial.
Thankfully your problem is easier, as you can just have a table with the per-country formats:
format[usa] = "+d (ddd) ddd-dddd";
format[hk] = "+ddd ddd-ddd-dd";
format[china_mobile] = "+dd ddd-dddd-dddd";
...
Then when you're printing, you simply output one digit from the phone number string in each d spot, as needed (see the sketch below). This assumes you know the country, which is a safe enough assumption for telephone devices -- pick "default" formats for the few surrounding countries.
Since some countries have different formats with different lengths you might need to store your table with additional information:
format[germany][10] = "..."
format[germany][11] = "....."
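Here is a minimal sketch of that fill-in-the-digits idea in Python; the pattern and number come from the examples above, and real code would add the per-country, per-length lookup just described:
def apply_format(digits, pattern):
    # Replace each 'd' in the pattern with the next digit;
    # assumes the digit count matches the number of 'd's in the pattern
    out = []
    i = 0
    for ch in pattern:
        if ch == "d":
            out.append(digits[i])
            i += 1
        else:
            out.append(ch)
    return "".join(out)

print(apply_format("17328653286", "+d (ddd) ddd-dddd"))
# -> "+1 (732) 865-3286"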
