How to make LUIS respond with the matched entity - azure-language-understanding

I am setting up a LUIS service for Dutch.
I have this sentence:
Hi, ik ben igor (meaning "Hi, I'm Igor")
Here Hi is a simple entity called Hey, which can take several different values (hey, hello, ...) that I specified as a phrase list.
And Igor is a simple entity called Name.
In the dashboard I can see that Igor has been correctly mapped as a Name entity, but the retrieved result is the following:
{
  "query": "Hi, ik ben igor",
  "topScoringIntent": {
    "intent": "Greeting",
    "score": 0.462906122
  },
  "intents": [
    {
      "intent": "Greeting",
      "score": 0.462906122
    },
    {
      "intent": "None",
      "score": 0.41605103
    }
  ],
  "entities": [
    {
      "entity": "hi",
      "type": "Hey",
      "startIndex": 0,
      "endIndex": 1,
      "score": 0.9947428
    }
  ]
}
Is it possible to solve this? I do not want to make a phrase list of all the names that exist.

I managed to train LUIS to recognize even asdaasdasd:
{
  "query": "Heey, ik ben asdaasdasd",
  "topScoringIntent": {
    "intent": "Greeting",
    "score": 0.5320666
  },
  "intents": [
    {
      "intent": "Greeting",
      "score": 0.5320666
    },
    {
      "intent": "None",
      "score": 0.236944184
    }
  ],
  "entities": [
    {
      "entity": "asdaasdasd",
      "type": "Name",
      "startIndex": 13,
      "endIndex": 22,
      "score": 0.8811139
    }
  ]
}
To be honest, I did not have a great guide on how to do this:
Add multiple example utterances with the entity position labeled.
I did this for about 5 utterances.
No phrase list was necessary.
I'm going to accept this as an answer, but once someone explains in-depth and technically what is happening behind the covers, I will accept that answer.

Related

How do I handle the special character '/' in Rasa?

For example, take the sentence "hi good afternoon how are you doing". Its intent is 'greet'. When I call the 'model/parse' API, it returns the correct intent and entity. But when I add the special character '/' in front of the sentence, as in "/ hi good afternoon how are you doing", the intent returned by 'model/parse' is '/ hi good afternoon how are you doing' instead of 'greet'. I read the Rasa source code around the RegexInterpreter.
How do I deal with the special character '/', given the RegexInterpreter in the source code and the example I cite? Ideally the solution should not modify the source code. Please help me, thanks.
To clarify, I want to support the following three cases at the same time:
1. When message.text = '/ greet {"people": "tom"}':
The actual result returned by 'model/parse' is as follows:
{
  "text": "/ greet {\"people\": \"tom\"}",
  "intent": {
    "name": "greet",
    "confidence": 1.0
  },
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 1.0
    }
  ],
  "entities": [
    {
      "entity": "people",
      "start": 6,
      "end": 22,
      "value": "tom"
    }
  ]
}
2. When message.text = 'Hi Tom, good afternoon':
The actual result returned by 'model/parse' is as follows:
{
  "text": "Hi Tom, good afternoon",
  "intent": {
    "name": "greet",
    "confidence": 0.923
  },
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 0.923
    }
  ],
  "entities": [
    {
      "entity": "people",
      "start": 2,
      "end": 5,
      "value": "tom",
      "confidence": 0.8433478958,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
3. When message.text = '/ Hi Tom, good afternoon':
The actual result returned by 'model/parse' is as follows (this is not what I want):
{
  "text": "/ Hi Tom, good afternoon",
  "intent": {
    "name": "Hi Tom, good afternoon",
    "confidence": 1.0
  },
  "intent_ranking": [
    {
      "name": "Hi Tom, good afternoon",
      "confidence": 1.0
    }
  ],
  "entities": []
}
But the result I expect is as follows:
{
  "text": "Hi Tom, good afternoon",
  "intent": {
    "name": "greet",
    "confidence": 0.923
  },
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 0.923
    }
  ],
  "entities": [
    {
      "entity": "people",
      "start": 2,
      "end": 5,
      "value": "tom",
      "confidence": 0.8433478958,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
Note that the only difference between the third and second cases is that the third message.text adds '/' at the beginning.
Is there a method that solves this problem well and satisfies all three situations at the same time?
The presence of / at the beginning of a user message is the established way to trigger an intent directly. From the example you've given, it doesn't look like something that would occur frequently (if at all) in a real user situation. That said, if you do want to be sure that only actual intents can be triggered with the /, you could create a custom component which strips off any leading / if it is not followed by an intent already in your training data.
However, before going to that effort, I'd recommend checking how often this is actually happening.
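The core check of such a custom component might look like the sketch below (plain Python only; the Rasa component boilerplate and registration are omitted, and the function name and intent set are illustrative):

```python
def normalize_message(text, known_intents):
    """Strip a leading '/' unless it prefixes a known intent trigger
    (e.g. '/greet'), so stray slashes fall through to normal NLU.

    Sketch of the preprocessing idea from the answer above; wiring it
    into a Rasa pipeline component is left out.
    """
    stripped = text.lstrip()
    if not stripped.startswith("/"):
        return text
    candidate = stripped[1:].lstrip()
    # Take the first token and drop any trailing entity payload like {"people": "tom"}
    intent_name = candidate.split(maxsplit=1)[0] if candidate else ""
    intent_name = intent_name.split("{", 1)[0]
    if intent_name in known_intents:
        return stripped  # keep genuine intent triggers intact
    return candidate     # drop the stray slash; let the NLU model classify it

intents = {"greet", "goodbye"}
normalize_message("/ Hi Tom, good afternoon", intents)   # -> "Hi Tom, good afternoon"
normalize_message('/greet{"people": "tom"}', intents)    # kept as an intent trigger
```

With the slash removed, case 3 degrades gracefully into case 2, while real `/intent` triggers still reach the RegexInterpreter untouched.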

Entity Extraction fails for Sinhala Language

I am trying to develop a chatbot for the Sinhala language using Rasa NLU.
My config.yml
pipeline:
- name: "WhitespaceTokenizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"
And in data.json I have added sample data as below.
When I train the NLU model and try a sample input to extract "සිංහලෙන්" as medium, it only outputs the intent and the entity value, not the entity name.
What am I doing wrong?
{
  "text": "සිංහලෙන් දේශන පවත්වන්නේ නැද්ද?",
  "intent": "ask_medium",
  "entities": [{
    "start": 0,
    "end": 8,
    "value": "සිංහලෙන්",
    "entity": "medium"
  }]
},
{
  "text": "සිංහලෙන් lectures කරන්නේ නැද්ද?",
  "intent": "ask_medium",
  "entities": [{
    "start": 0,
    "end": 8,
    "value": "සිංහලෙන්",
    "entity": "medium"
  }]
}
The response I get when testing the NLU model is:
{'intent': {'name': 'ask_langmedium', 'confidence': 0.9747527837753296},
 'entities': [{'start': 10,
               'end': 18,
               'value': 'සිංහලෙන්',
               'entity': '-',
               'confidence': 0.5970129041418675,
               'extractor': 'CRFEntityExtractor'}],
 'intent_ranking': [{'name': 'ask_langmedium', 'confidence': 0.9747527837753296},
                    {'name': 'ask_langmedium_request_possibility', 'confidence': 0.07433460652828217}],
 'text': 'උගන්නන්නේ සිංහලෙන් ද ?'}
If this is your complete dataset, then I am not sure how you were able to generate the model, because Rasa requires at least two intents. I added another intent (hello), replicated the rest of your data in my own code, and it worked out well. This is the output I got:
Enter a message: උගන්නන්නේ සිංහලෙන් ද?
{
  "intent": {
    "name": "ask_medium",
    "confidence": 0.9638749361038208
  },
  "entities": [
    {
      "start": 10,
      "end": 18,
      "value": "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd\u0dd9\u0db1\u0dca",
      "entity": "medium",
      "confidence": 0.7177257810884379,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
This is my full code.
DataSet.json:
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "හෙලෝ",
        "intent": "hello",
        "entities": []
      },
      {
        "text": "සිංහලෙන් දේශන පවත්වන්නේ නැද්ද?",
        "intent": "ask_medium",
        "entities": [{
          "start": 0,
          "end": 8,
          "value": "සිංහලෙන්",
          "entity": "medium"
        }]
      },
      {
        "text": "සිංහලෙන් lectures කරන්නේ නැද්ද?",
        "intent": "ask_medium",
        "entities": [{
          "start": 0,
          "end": 8,
          "value": "සිංහලෙන්",
          "entity": "medium"
        }]
      }
    ],
    "regex_features": [],
    "lookup_tables": [],
    "entity_synonyms": []
  }
}
nlu_config.yml
pipeline: "supervised_embeddings"
Training Command
python -m rasa_nlu.train -c ./config/nlu_config.yml --data ./data/sh_data.json -o models --fixed_model_name nlu --project current --verbose
and testing.py:
from rasa_nlu.model import Interpreter
import json

interpreter = Interpreter.load('./models/current/nlu')

def predict_intent(text):
    results = interpreter.parse(text)
    print(json.dumps({
        "intent": results["intent"],
        "entities": results["entities"]
    }, indent=2))

keep_asking = True
while keep_asking:
    text = input('Enter a message: ')
    if text == 'exit':
        keep_asking = False
        break
    else:
        predict_intent(text)

Best approach for an Elasticsearch time-based feeds module?

I am new to Elasticsearch and looking for the best way to create a feed module that has time-based feeds along with their groups and comments.
I learned a little and came up with the following.
PUT /group
{
  "mappings": {
    "groupDetail": {},
    "content": {
      "_parent": {
        "type": "groupDetail"
      }
    },
    "comment": {
      "_parent": {
        "type": "content"
      }
    }
  }
}
so that each type is stored separately in the index.
But then I found a post saying that parent/child is a more costly operation for search than nested objects.
Something like the following: two groups (feeds) with their details, having content and comments as nested elements.
{
  "_index": "group",
  "_type": "groupDetail",
  "_id": 6829,
  "_score": 1,
  "_source": {
    "groupid": 6829,
    "name": "Jignesh Public",
    "insdate": "2016-10-01T04:09:33.916Z",
    "upddate": "2017-04-19T05:19:40.281Z",
    "isVerified": true,
    "tags": ["spotrs", "surat"],
    "content": [
      {
        "contentid": 1,
        "type": "1",
        "byUser": 5858,
        "insdate": "2016-10-01 11:20",
        "info": [
          {"t": 1, "v": "lorem ipsum long text 1"},
          {"t": 2, "v": "http://www.imageurl.com/1"}
        ],
        "comments": [
          {"byuser": 5859, "comment": "Comment 1", "upddate": "2016-10-01T04:09:33.916Z"},
          {"byuser": 5860, "comment": "Comment 2", "upddate": "2016-10-01T04:09:33.916Z"}
        ]
      },
      {
        "contentid": 2,
        "type": "2",
        "byUser": 5859,
        "insdate": "2016-10-01 11:20",
        "info": [
          {"t": 4, "v": "http://www.videoURL.com/1"}
        ],
        "comments": [
          {"byuser": 5859, "comment": "Comment 1", "upddate": "2016-10-01T04:09:33.916Z"},
          {"byuser": 5860, "comment": "Comment 2", "upddate": "2016-10-01T04:09:33.916Z"}
        ]
      }
    ]
  }
}
{
  "_index": "group",
  "_type": "groupDetail",
  "_id": 6849,
  "_score": 1,
  "_source": {
    "groupid": 6849,
    "name": "Xyz Group Public",
    "insdate": "2016-10-01T04:09:33.916Z",
    "upddate": "2017-04-19T05:19:40.281Z",
    "isVerified": false,
    "tags": ["spotrs", "food"],
    "content": [
      {
        "contentid": 3,
        "type": "1",
        "byUser": 5858,
        "insdate": "2016-10-01 11:20",
        "info": [
          {"t": 1, "v": "lorem ipsum long text 3"},
          {"t": 2, "v": "http://www.imageurl.com/1"}
        ],
        "comments": [
          {"byuser": 5859, "comment": "Comment 1", "upddate": "2016-10-01T04:09:33.916Z"},
          {"byuser": 5860, "comment": "Comment 2", "upddate": "2016-10-01T04:09:33.916Z"}
        ]
      },
      {
        "contentid": 4,
        "type": "2",
        "byUser": 5859,
        "insdate": "2016-10-01 11:20",
        "info": [
          {"t": 4, "v": "http://www.videoURL.com/1"}
        ],
        "comments": [
          {"byuser": 5859, "comment": "Comment 1", "upddate": "2016-10-01T04:09:33.916Z"},
          {"byuser": 5860, "comment": "Comment 2", "upddate": "2016-10-01T04:09:33.916Z"}
        ]
      }
    ]
  }
}
Now if I think in terms of nested objects, I am confused: if users add comments very frequently, will the reindexing overhead become a problem?
So the main thing I want to ask is: which approach lets me add comments frequently while also keeping content searches fast?
Performance
Parent/child stores related data in the same shard, as separate documents, which avoids network hops;
Parent/child needs a joining step when retrieving data;
Nested objects store the inner and outer objects together, as a single document;
So, we can infer:
Updating a nested object re-indexes the whole document, which can be very expensive if your document is large;
Updating a parent or child alone does not affect the other one;
Searching nested objects is a little faster, since it saves the joining step;
Suggestions
As far as I understand your problem, you should use parent/child.
As your group's comments grow, adding a new comment to a nested document still re-indexes the whole content, which can be very time-consuming;
On the other hand, searching a comment with parent/child just needs one more lookup after finding the child, which is relatively acceptable.
Furthermore, you should also take into account the rate of searching comments compared to adding comments:
If you search a lot but get few new comments, maybe you can choose nested objects;
Otherwise, choose parent/child.
By the way, you may combine both of them:
While a feed is active, use parent/child to store it;
When it is closed, i.e., no more comments can be added, move it to a new index with nested objects.
If you do not give more detail than "very frequently", it is hard to come up with a recommendation. You also have not mentioned what your data looks like. A comment on a blog post might happen rarely, even in heated discussions. A comment/reply in a forum post (which will result in a huge document) might be something very different. I'd personally start with nested and see how it goes, but I also do not know all the requirements, so this might be a very wrong answer.
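For completeness, here is a rough sketch of what the parent/child route looks like with the `join` field that newer Elasticsearch versions use instead of the `_parent` mapping shown above. The relation and field names reuse the question's types but are otherwise assumptions; the dicts are request bodies you would send with whatever client you use:

```python
# Mapping: one join field declares the groupDetail -> content -> comment chain.
group_mapping = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                "relations": {
                    "groupDetail": "content",
                    "content": "comment",
                },
            }
        }
    }
}

# A new comment is indexed as its own document, routed to its parent
# content doc, so adding it never re-indexes the content or the group.
new_comment = {
    "byuser": 5859,
    "comment": "Comment 1",
    "relation": {"name": "comment", "parent": "content-1"},  # illustrative id
}

# Searching content by its comments joins at query time via has_child.
content_with_comment = {
    "query": {
        "has_child": {
            "type": "comment",
            "query": {"match": {"comment": "Comment 1"}},
        }
    }
}
```

This makes the trade-off above concrete: writes stay cheap because each comment is its own document, while reads pay for the `has_child` join.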

I'm using Sentiment on NLU, getting this error: "warnings": [ "sentiment: cannot locate keyphrase"

When I enter this request:
{
  "text": "Il sindaco pensa solo a far realizzare rotonde...non lo disturbate per le cavolate! ,Che schifo!",
  "features": {
    "sentiment": {
      "targets": [
        "aggressione", "aggressioni", "agguati", "agguato", "furto", "furti", "lavoro nero",
        "omicidi", "omicidio", "rapina", "rapine", "ricettazione", "ricettazioni", "rom", "zingari", "zingaro",
        "scippo", "scippi", "spaccio", "scommesse"
      ]
    },
    "categories": {},
    "entities": {
      "emotion": true,
      "sentiment": true,
      "limit": 5
    },
    "keywords": {
      "emotion": true,
      "sentiment": true,
      "limit": 5
    }
  }
}
I get this response:
{
  "language": "it",
  "keywords": [
    {
      "text": ",Che schifo",
      "relevance": 0.768142
    }
  ],
  "entities": [],
  "categories": [
    {"score": 0.190673, "label": "/law, govt and politics/law enforcement/police"},
    {"score": 0.180499, "label": "/style and fashion/clothing/pants"},
    {"score": 0.160763, "label": "/society/crime"}
  ],
  "warnings": [
    "sentiment: cannot locate keyphrase"
  ]
}
Why don't I receive output for the document sentiment? If NLU does not find the keyphrase, it gives back this warning without any sentiment for the text. Is this an NLU error that should be fixed?
If NLU does not find any of the keyphrases you passed, it throws the warning "cannot locate keyphrase". It does return the document sentiment as long as at least one of the targets is present in the text.
If you are not sure about the presence of the target phrases in your text, make a separate API call just for sentiment, without any targets, to retrieve the document sentiment.
I would not call it a bug on the NLU side, but the service could be lenient instead of strict when it does not find any target phrase in a given text.
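In practice that means splitting the work into two requests: one with an empty sentiment feature that always yields the document score, and one with targets that may warn. A sketch of building the two request bodies (the helper is hypothetical; send the dicts however you normally call the NLU API):

```python
def sentiment_request(text, targets=None):
    """Build an NLU request body for sentiment analysis.

    Omit targets to get the document-level sentiment unconditionally;
    pass targets for targeted sentiment, which may warn if none match.
    """
    sentiment = {}
    if targets:
        sentiment["targets"] = targets
    return {"text": text, "features": {"sentiment": sentiment}}

text = "Il sindaco pensa solo a far realizzare rotonde... Che schifo!"
doc_only = sentiment_request(text)                       # never warns
targeted = sentiment_request(text, ["furto", "rapina"])  # may warn if targets are absent
```

Merging the two responses client-side gives you the document score plus whatever targeted scores were found.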

Use a many-to-many relation in Elasticsearch

Currently we have a problem performing a query (or more precisely, designing a mapping) in Elasticsearch for a relational problem that we haven't solved with our non-document-oriented SQL thinking.
We want to create a many-to-many relation between different Elasticsearch entries. We need this so we can edit an entry once and keep everything that uses it up to date.
To describe the problem, we'll use the following simple data model:
Broadcast        Content
------------     -----------
Id               Id
Day              Title
Contents []      Description
So we have two different types to index, broadcasts and contents.
A broadcast can have many contents and single contents could also be part of different broadcasts (e.g. repetition).
JSON like:
index/broadcasts
{
  "Id": "B1",
  "Day": "2014-10-15",
  "Contents": ["C1", "C2"]
}
{
  "Id": "B2",
  "Day": "2014-10-16",
  "Contents": ["C1", "C3"]
}
index/contents
{
  "Id": "C1",
  "Title": "Wake up",
  "Description": "Morning show with Jeff Bridges"
}
{
  "Id": "C2",
  "Title": "Have a break!",
  "Description": "Everything about Android"
}
{
  "Id": "C3",
  "Title": "Late Night Disaster",
  "Description": "Comedy show"
}
Now we want to rename "Late Night Disaster" to something more precise and keep all references up to date.
How could we approach this? Are there further options in ES, like includes in RavenDB?
Nested objects and parent/child relations haven't helped us so far.
What about denormalizing? It seems difficult coming from the SQL mindset, but give it a try: even with millions of documents, Lucene indexing can help, and renaming can be a batch job.
[
  {
    "Id": "B1",
    "Day": "2014-10-15",
    "Contents": [
      {"Title": "Wake up", "Description": "Morning show with Jeff Bridges"},
      {"Title": "Have a break!", "Description": "Everything about Android"}
    ]
  },
  {
    "Id": "B2",
    "Day": "2014-10-16",
    "Contents": [
      {"Title": "Wake up", "Description": "Morning show with Jeff Bridges"},
      {"Title": "Late Night Disaster", "Description": "Comedy show"}
    ]
  }
]
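The rename then becomes a sweep over every broadcast that embeds the old title. A sketch of that batch job in plain Python (in a real cluster you would express the same thing as an `update_by_query` or a bulk request; the function and sample data are illustrative):

```python
def rename_content(broadcasts, old_title, new_title):
    """Rewrite a denormalized content title wherever it is embedded,
    returning the ids of the broadcast documents that changed."""
    changed = []
    for doc in broadcasts:
        hit = False
        for content in doc.get("Contents", []):
            if content.get("Title") == old_title:
                content["Title"] = new_title
                hit = True
        if hit:
            changed.append(doc["Id"])
    return changed

broadcasts = [
    {"Id": "B1", "Contents": [{"Title": "Wake up"}, {"Title": "Have a break!"}]},
    {"Id": "B2", "Contents": [{"Title": "Wake up"}, {"Title": "Late Night Disaster"}]},
]
renamed = rename_content(broadcasts, "Late Night Disaster", "Late Night Show")  # -> ["B2"]
```

Only the touched broadcasts need re-indexing, which is the cost of denormalization: writes fan out, but every read stays a single, join-free document fetch.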
