RASA NLU: Can't extract entity - rasa-nlu

I've trained my Rasa NLU model so that it recognizes the content between square brackets as a pst entity. For the training part, I covered both scenarios with more than 50 examples.
There are two scenarios (the only difference is spacing):
When I pass http://www.google.comm, 1283923, [9283911,9309212,9283238], it considers only the [ bracket as the pst entity.
When I pass http://www.google.comm, 1283923, [9283911, 9309212, 9283238], it works fine and recognizes [9283911, 9309212, 9283238] as the pst entity, as expected.
For scenario 1, I've tried all the possible pipelines, but it only recognizes the first square bracket [ as the pst entity.
In the response, I am getting this output:
{
'intent': {
'name': None,
'confidence': 0.0
},
'entities': [
{
'start': 0,
'end': 22,
'value': 'http://www.google.comm',
'entity': 'url',
'confidence': 0.8052099168500071,
'extractor': 'ner_crf'
},
{
'start': 24,
'end': 31,
'value': '1283923',
'entity': 'defect_id',
'confidence': 0.8334249141074151,
'extractor': 'ner_crf'
},
{
'start': 33,
'end': 34,
'value': '[',
'entity': 'pst',
'confidence': 0.5615805162522188,
'extractor': 'ner_crf'
}
],
'intent_ranking': [],
'text': 'http://www.google.comm, 1283923, [9283911,9309212,9283238]'
}
So, can anyone tell me what I am missing in the configuration? The problem happens only because of the spacing, and my model should know about the spacing since I provided training data for both scenarios.

It is a good idea to use a regex for your purpose. Rasa NLU supports extracting entities with regular expressions. Normal NLU training data looks something like this:
{
"rasa_nlu_data": {
"common_examples": [
{
"text": "Hi",
"intent": "greet",
"entities": []
}]
}
}
You can provide regex features for training in the NLU JSON file as below:
{
"rasa_nlu_data": {
"regex_features": [
{
"name": "pst",
"pattern": "\[..*\]"
},
]
}
}
Reference: Regular Expressions in Rasa NLU
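A quick way to sanity-check that the pattern covers both spacing variants from the question is a small Python script (nothing Rasa-specific here, just the standard re module; note that inside the JSON training file the backslashes have to be doubled, as above):
import re

# Same pattern as in the regex_features entry above; in Python source a
# single backslash is enough, in the JSON file it is written as "\\[..*\\]".
pattern = re.compile(r"\[..*\]")

samples = [
    "http://www.google.comm, 1283923, [9283911,9309212,9283238]",
    "http://www.google.comm, 1283923, [9283911, 9309212, 9283238]",
]

for text in samples:
    match = pattern.search(text)
    print(match.group(0) if match else "no match")
# Both variants print the full bracketed list, with or without spaces after the commas.
Depending on your Rasa NLU version, the regex featurizer component (e.g. intent_entity_featurizer_regex) may also need to be in the pipeline for these features to reach ner_crf.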

Related

How to filter string prefixes with Vega-lite

Is it possible to filter records with Vega-lite by strings?
Example:
record: "ABCD"
record: "AMFK"
record: "AMRK"
I would like to process only records where the string starts with "AM".
I studied the documentation and found solutions only for comparing the entire string. Is it possible to truncate the string? Or use something like "LEFT()" in Excel? Or something completely different?
Edit:
Possibly of importance, I'm using the Vega-lite app in Airtable.
You can do this using a filter transform along with an appropriate Vega expression. For example:
{
"data": {
"values": [
{"key": "ABCD", "value": 1},
{"key": "AMFK", "value": 2},
{"key": "AMRK", "value": 3}
]
},
"transform": [{"filter": "slice(datum.key, 0, 2) == 'AM'"}],
"mark": "bar",
"encoding": {
"x": {"type": "quantitative", "field": "value"},
"y": {"type": "nominal", "field": "key"}
}
}
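If you happen to be building the spec from Python rather than by hand, the same filter can be written with Altair; a minimal sketch, assuming the altair and pandas packages and the same toy data:
import altair as alt
import pandas as pd

# Same toy data as in the spec above.
df = pd.DataFrame([
    {"key": "ABCD", "value": 1},
    {"key": "AMFK", "value": 2},
    {"key": "AMRK", "value": 3},
])

# transform_filter takes the same Vega expression used in the JSON spec.
chart = (
    alt.Chart(df)
    .transform_filter("slice(datum.key, 0, 2) == 'AM'")
    .mark_bar()
    .encode(x="value:Q", y="key:N")
)
chart.save("filtered_chart.html")  # or chart.to_json() to get the spec back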

Microsoft LUIS builtin.number

I used builtin.number in my LUIS app to try to collect a 4-digit PIN. The following is what LUIS returns when my input is "one two three four".
"entities": [
{
"entity": "one",
"type": "builtin.number",
"startIndex": 0,
"endIndex": 2,
"resolution": {
"value": "1"
}
},
{
"entity": "two",
"type": "builtin.number",
"startIndex": 4,
"endIndex": 6,
"resolution": {
"value": "2"
}
},
{
"entity": "three",
"type": "builtin.number",
"startIndex": 8,
"endIndex": 12,
"resolution": {
"value": "3"
}
},
{
"entity": "four",
"type": "builtin.number",
"startIndex": 14,
"endIndex": 17,
"resolution": {
"value": "4"
}
},
As you can see, it returns the individual digits in both text and numeric form. It seems to me that returning the whole number is more important than the individual digits. Is there a way to get '1234' as the result for builtin.number?
Thanks!
It's not possible to do what you're asking for by only using LUIS. The way LUIS does its tokenization is that it recognizes each word/number individually due to the whitespace. It goes without saying that 'onetwothreefour' will also not return 1234.
Additionally, users are unable to modify the recognition of the prebuilt entities on an individual model level. The recognizers for certain languages are open-source, and contributions from the community are welcome.
All of that said, a way you could achieve what you're asking for is by concatenating the numbers. A JavaScript example might be something like the following:
var pin = '';
entities.forEach(entity => {
if (entity.type == 'builtin.number') {
pin += entity.resolution.value;
}
});
console.log(pin); // '1234'
After that you would need to perform your own handling/regex validation, but I'll leave that to you. (After all, what if someone provides "seven eight nine ten"? Or "twenty seventeen"?)
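For illustration, the concatenation plus that validation step could look roughly like this in Python (the entities list is a hand-written stand-in shaped like the LUIS response above):
import re

# Hand-written stand-in for the LUIS entities shown above.
entities = [
    {"entity": "one", "type": "builtin.number", "resolution": {"value": "1"}},
    {"entity": "two", "type": "builtin.number", "resolution": {"value": "2"}},
    {"entity": "three", "type": "builtin.number", "resolution": {"value": "3"}},
    {"entity": "four", "type": "builtin.number", "resolution": {"value": "4"}},
]

# Concatenate the resolved values, then accept only exactly four digits;
# this rejects inputs like "seven eight nine ten", which concatenates to "78910".
pin = "".join(e["resolution"]["value"] for e in entities if e["type"] == "builtin.number")

if re.fullmatch(r"\d{4}", pin):
    print("PIN:", pin)  # PIN: 1234
else:
    print("Not a valid 4-digit PIN:", pin)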

How to index thousands of sub-objects?

I have something like this:
MainObject
~3000x SubObjects
Each SubObject has ~2 SubSubObjects
Likewise, each of those has ~1 SubSubSubObject
For each SubObject I need some information from the MainObject (an array of integers). At the moment, when I add the MainObject to the database with all its SubObjects (via a console command), I duplicate that array onto every object (thousands of duplications...), and when I need to edit the array I re-index everything again... I'm sure I can do this better.
In the documentation I've seen that there are several possibilities: object, nested, parent/child... but I don't really know which one is best...
In another post, someone explained how to do it with nested documents and aggregations... but I can't get it to work, and the more I read, the more doubts I have about the nested approach...
Thank you for your help.
Edit: a simplified tree of my entities in JSON (they are Doctrine entities):
{
"public": false,
"authorized_users": [1, 23, 51],
"chromosomes": [
{
"name": "C1",
"locus": [
{
"name": "locus1",
"features": [
{
"name": "feature1",
"products": [
{
"name": "product1"
//...
}
]
}
]
}
]
}
]
}
I only search on name for locus, features and products, but with a filter on public and authorized_users; that's why I create objects like these (in Elasticsearch):
{
"_type": "locus",
"name": "locus1",
"public": false,
"authorized_users": [1, 23, 51],
}
{
"_type": "locus",
"name": "locus2",
"public": false,
"authorized_users": [1, 23, 51],
}
{
"_type": "feature",
"name": "feature1",
"public": false,
"authorized_users": [1, 23, 51],
}
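This doesn't settle which mapping type is best, but for reference, querying those flattened documents from Python could look roughly like this (a sketch using an older elasticsearch-py client; the index name "genome" and the user id 42 are made-up placeholders):
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Match on name, but only return documents the user is allowed to see:
# either the document is public or the user id appears in authorized_users.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "locus1"}}],
            "should": [
                {"term": {"public": True}},
                {"term": {"authorized_users": 42}},
            ],
            "minimum_should_match": 1,
        }
    }
}

result = es.search(index="genome", body=query)
for hit in result["hits"]["hits"]:
    print(hit["_type"], hit["_source"]["name"])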

Elasticsearch: does not return results when searching for a simple 'a' character

I want to store tags for messages in ElasticSearch. I've defined the tags field like this:
{
'tags': {
'type': 'string',
'index_name': 'tag'
}
}
For a message I've stored the following list in the tags field:
['a','b','c']
Now if I try to search for tag 'b' with the following query, it gives back the message and the tags:
{
'filter': {
'limit': {
'value': 100
}
},
'query': {
'bool': {
'should': [
{
'text': {
'tags': 'b'
}
}
],
'minimum_number_should_match': 1
}
}
}
The same goes for tag 'c'.
But if I search for tag 'a' with this:
{
'filter': {
'limit': {
'value': 100
}
},
'query': {
'bool': {
'should': [
{
'text': {
'tags': 'a'
}
}
],
'minimum_number_should_match': 1
}
}
}
It gives back no results at all!
The response is:
{
'hits': {
'hits': [],
'total': 0,
'max_score': None
},
'_shards': {
'successful': 5,
'failed': 0,
'total': 5
},
'took': 1,
'timed_out': False
}
What am I doing wrong? (It doesn't matter that 'a' is the first element of the list; the same happens with ['b','a','c'].) It seems to have problems only with the single 'a' character.
If you didn't set any analyzer or mapping for your index, Elasticsearch uses its own analyzer by default. Elasticsearch's default analyzer has a stop-words filter that by default ignores English stop words such as:
"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"
Before going further, check the Elasticsearch mapping and analyzer guides:
Analyzer Guide
Mapping Guide
There might be some stemming or stop word lists involved. Try making sure the field is not analyzed.
'tags': {'type': 'string', 'index_name': 'tag', 'index': 'not_analyzed'}
Similar: matching whole string with dashes in elasticsearch
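A sketch of putting that mapping in place with the Python client, using the same older string/not_analyzed syntax as the snippet above (the index and type names "messages"/"message" are made-up placeholders):
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Older "string" / "not_analyzed" mapping syntax, matching the snippet above;
# "messages" and "message" are placeholder index/type names.
es.indices.create(
    index="messages",
    body={
        "mappings": {
            "message": {
                "properties": {
                    "tags": {
                        "type": "string",
                        "index_name": "tag",
                        "index": "not_analyzed",
                    }
                }
            }
        }
    },
)
# With the field not analyzed, no stop-word filter runs on it,
# so a search for the tag 'a' matches again.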

Error Parsing JSON from YELP API

I am having trouble parsing JSON from the Yelp API. The JSON data looks like this:
{
region: {
span: {
latitude_delta: 0,
longitude_delta: 0
},
center: {
latitude: 38.054117,
longitude: -84.439002
}
},
total: 23,
businesses: [
{
is_claimed: false,
rating: 5,
mobile_url: "http://m.yelp.com/biz/vineyard-community-church-lexington",
rating_img_url: "http://s3-media1.ak.yelpcdn.com/assets/2/www/img/f1def11e4e79/ico/stars/v1/stars_5.png",
review_count: 2,
name: "Vineyard Community Church",
snippet_image_url: "http://s3-media4.ak.yelpcdn.com/photo/VoeMtbk7NRFi6diksSUtOQ/ms.jpg",
rating_img_url_small: "http://s3-media1.ak.yelpcdn.com/assets/2/www/img/c7623205d5cd/ico/stars/v1/stars_small_5.png",
url: "http://www.yelp.com/biz/vineyard-community-church-lexington",
phone: "8592582300",
snippet_text: "I have been a member of Vineyard Community Church since 2004. Here you will find a modern worship service with a full band, witty speakers who teach...",
image_url: "http://s3-media3.ak.yelpcdn.com/bphoto/D71eikniuaHjdOC8DB6ziA/ms.jpg",
categories: [
[
"Churches",
"churches"
]
],
display_phone: "+1-859-258-2300",
rating_img_url_large: "http://s3-media3.ak.yelpcdn.com/assets/2/www/img/22affc4e6c38/ico/stars/v1/stars_large_5.png",
id: "vineyard-community-church-lexington",
is_closed: false,
location: {
city: "Lexington",
display_address: [
"1881 Eastland Pwky",
"Lexington, KY 40505"
],
geo_accuracy: 8,
postal_code: "40505",
country_code: "US",
address: [
"1881 Eastland Pwky"
],
coordinate: {
latitude: 38.054117,
longitude: -84.439002
},
state_code: "KY"
}
}
]
}
The JSON is stored in a Ruby string called #stuff.
Here is the code I use to try to parse it:
#parsed_stuff = JSON::parse(#stuff)
When I do this and try to display the contents of #parsed_stuff, I get the following error in the browser:
Parse error on line 2: { "region"=>{ "span -------------^ Expecting '}', ':', ',', ']'
Any help given on this issue will be highly appreciated.
Use jsonlint to validate JSON. Here you have to give all the keys as strings.
Try this:
{
"region": {
"span": {
"latitude_delta": 0,
"longitude_delta": 0
},
"center": {
"latitude": 38.054117,
"longitude": -84.439002
}
},
"total": 23,
"businesses": [
{
"is_claimed": false,
"rating": 5,
"mobile_url": "http://m.yelp.com/biz/vineyard-community-church-lexington",
"rating_img_url": "http://s3-media1.ak.yelpcdn.com/assets/2/www/img/f1def11e4e79/ico/stars/v1/stars_5.png",
"review_count": 2,
"name": "Vineyard Community Church",
"snippet_image_url": "http://s3-media4.ak.yelpcdn.com/photo/VoeMtbk7NRFi6diksSUtOQ/ms.jpg",
"rating_img_url_small": "http://s3-media1.ak.yelpcdn.com/assets/2/www/img/c7623205d5cd/ico/stars/v1/stars_small_5.png",
"url": "http://www.yelp.com/biz/vineyard-community-church-lexington",
"phone": "8592582300",
"snippet_text": "I have been a member of Vineyard Community Church since 2004. Here you will find a modern worship service with a full band, witty speakers who teach...",
"image_url": "http://s3-media3.ak.yelpcdn.com/bphoto/D71eikniuaHjdOC8DB6ziA/ms.jpg",
"categories": [
[
"Churches",
"churches"
]
],
"display_phone": "+1-859-258-2300",
"rating_img_url_large": "http://s3-media3.ak.yelpcdn.com/assets/2/www/img/22affc4e6c38/ico/stars/v1/stars_large_5.png",
"id": "vineyard-community-church-lexington",
"is_closed": false,
"location": {
"city": "Lexington",
"display_address": [
"1881 Eastland Pwky",
"Lexington, KY 40505"
],
"geo_accuracy": 8,
"postal_code": "40505",
"country_code": "US",
"address": [
"1881 Eastland Pwky"
],
"coordinate": {
"latitude": 38.054117,
"longitude": -84.439002
},
"state_code": "KY"
}
}
]
}
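To see the difference concretely, here is a quick check with Python's json module (the same rule applies to Ruby's JSON.parse): keys without quotes are rejected, quoted keys parse fine.
import json

# Unquoted keys, as in the original Yelp-style snippet: not valid JSON.
bad = '{ region: { span: { latitude_delta: 0 } } }'
# Quoted keys: valid JSON.
good = '{ "region": { "span": { "latitude_delta": 0 } } }'

try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    print("rejected:", exc)

data = json.loads(good)
print(data["region"]["span"]["latitude_delta"])  # prints 0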
