Latest Rasa core model does not detect any entities - rasa-nlu

I followed the Rasa masterclass and have the following setup:
data/nlu.md:
## intent:search_provider
- I want to go to a [hospital] (facilitytype)
- I am sick need to go to a [hospital] (facilitytype)
- Can you tell me how to get to a [hospital] (facilitytype)
rasa train:
2020-01-17 19:06:36 INFO rasa.model - Data (nlu-config) for NLU model changed.
Core stories/configuration did not change. No need to retrain Core model.
Training NLU model...
2020-01-17 19:06:36 INFO rasa.nlu.training_data.loading - Training data format of /var/folders/2p/c7zvhbtj4dz0p053y49fmr_h0000gp/T/tmphtg4ok5r/68c2a3993fe141b199b22b8cac047519_nlu.md is md
2020-01-17 19:06:36 INFO rasa.nlu.training_data.training_data - Training data stats:
- intent examples: 50 (8 distinct intents)
- Found intents: 'goodbye', 'affirm', 'inform', 'search_provider', 'mood_unhappy', 'greet', 'mood_great', 'deny'
- entity examples: 0 (0 distinct entities)
- found entities:
When I run the NLU model, it detects the right intent but cannot extract any entities. I am not sure what I am missing:
Next message:
I want to go to a hospital
{
  "intent": {
    "name": "search_provider",
    "confidence": 0.9632793664932251
  },
  "entities": [],
My pipeline has the following line in the config.yml:
pipeline: supervised_embeddings

@user12735193 Can you specify the exact Rasa version you are using?
Also, please don't add a space between the [entity value] and (entity type). It should be like this - [entity value](entity type) and not [entity value] (entity type). I think that should fix it for you.

As mentioned by @dragster, there shouldn't be a space between the entity value and type:
## intent:inform
- [NEW YORK](city)
- I am going home to [Detroit](city)
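Applied to the original search_provider training data, the corrected examples would look like this:
## intent:search_provider
- I want to go to a [hospital](facilitytype)
- I am sick need to go to a [hospital](facilitytype)
- Can you tell me how to get to a [hospital](facilitytype)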


Could not convert parameter `tx` between node and runtime: No such variant in enum MultiSignature

Hi, I am getting the below error in polkadot-js when trying to transfer a balance from Alice to Dave (or any other transfer).
Error:
balances.transferKeepAlive
1002: Verification Error: Execution: Could not convert parameter tx between node and runtime: No such variant in enum MultiSignature: RuntimeApi, Execution: Could not convert parameter tx between node and runtime: No such variant in enum MultiSignature
You are missing some data types in your UI; adding this in the developer settings will do the job.
{
  "Address": "MultiAddress",
  "LookupSource": "MultiAddress"
}
https://polkadot.js.org/docs/api/FAQ#the-node-returns-a-could-not-convert-error-on-send
{
  "Address": "MultiAddress",
  "LookupSource": "MultiAddress"
}
This worked for me on the Substrate "Forkless Upgrade a Chain" tutorial: https://substrate.dev/docs/en/tutorials/forkless-upgrade/sudo-upgrade
Open https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A9944#/settings/developer -> Settings -> Developer -> change the JSON to match the above -> Save. That's it.
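If you are connecting programmatically rather than through the Apps UI, the same custom types can be passed when creating the API instance. A minimal sketch using @polkadot/api (the endpoint is an assumption):

const { ApiPromise, WsProvider } = require('@polkadot/api');

async function main() {
  // Register the same custom types the Apps UI needs under Settings -> Developer,
  // so extrinsic parameters decode correctly between node and runtime.
  const api = await ApiPromise.create({
    provider: new WsProvider('ws://127.0.0.1:9944'),
    types: {
      Address: 'MultiAddress',
      LookupSource: 'MultiAddress',
    },
  });
  console.log((await api.rpc.system.chain()).toString());
}

main().catch(console.error);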

Dumping Beam-scores to an HDF File

I have a translation model (TM) which synthesizes its hypotheses using beam search. For analysis purposes, I would like to study all hypotheses in each beam emitted by the TM's ChoiceLayer. I am able to fetch the hypotheses for each input sequence from the TM's ChoiceLayer and write them to my file system, using the HDFDumpLayer:
'__SEARCH_dump_beam__': {
    'class': 'hdf_dump',
    'from': ['output'],
    'filename': "<my-path>/beams.hdf",
    'is_output_layer': True
}
But besides the hypotheses, I would also like to store the score of each hypothesis. I am able to fetch the beam scores from the ChoiceLayer using a ChoiceGetBeamScoresLayer, but I was not able to dump the scores using an HDFDumpLayer:
'get_scores': {'class': 'choice_get_beam_scores', 'from': ['output']},
'__SEARCH_dump_scores__': {
    'class': 'hdf_dump',
    'from': ['get_scores'],
    'filename': "<my-path>/beam_scores.hdf",
    'is_output_layer': True
}
Running the config like this makes RETURNN complain about the ChoiceGetBeamScoresLayer output not having a time axis:
Exception creating layer root/'__SEARCH_dump_scores__' of class HDFDumpLayer with opts:
{'filename': '<my-path>/beam_scores.hdf',
'is_output_layer': True,
'name': '__SEARCH_dump_scores__',
'network': <TFNetwork 'root' train=False search>,
'output': Data(name='__SEARCH_dump_scores___output', shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B]),
'sources': [<ChoiceGetBeamScoresLayer 'get_scores' out_type=Data(shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B])>]}
Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 139964674299648)>, proc 31228.
...
File "<...>/returnn/repository/returnn/tf/layers/basic.py", line 6226, in __init__
line: assert self.sources[0].output.have_time_axis()
locals:
self = <local> <HDFDumpLayer '__SEARCH_dump_scores__' out_type=Data(shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B])>
self.sources = <local> [<ChoiceGetBeamScoresLayer 'get_scores' out_type=Data(shape=(), time_dim_axis=None, beam=SearchBeam(name='output/output', beam_size=12, dependency=SearchBeam(name='output/prev:output', beam_size=12)), batch_shape_meta=[B])>]
output = <not found>
output.have_time_axis = <not found>
AssertionError
I tried to alter the shape of the score data using ExpandDimsLayer and EvalLayer, with several different configurations, but those all lead to different errors.
I am sure I am not the first person trying to dump beam scores. Can anybody tell me how to do that properly?
To answer the question:
The HDFDumpLayer assert you are hitting simply means that this is not implemented yet (support for dumping data without a time axis).
You can create a pull request and add this support for HDFDumpLayer.
If you just want to dump this information in any way, not necessarily via HDFDumpLayer, there are a couple of other options:
With task="search", as the search_output_layer, just select the layer which still includes the beam information, i.e. not after a DecideLayer.
That will simply dump all hypotheses including their beam scores.
Use a custom EvalLayer and dump it in whatever way you want.
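For the first option, a minimal config sketch, assuming the ChoiceLayer is named 'output' and that your RETURNN version supports the search_output_layer setting:

task = "search"
# Point search at the raw ChoiceLayer, which still carries the beam and its
# scores, rather than at a layer placed after a DecideLayer.
search_output_layer = "output"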

Sinhala entity classifications

UserWarning: Misaligned entity annotation in message ‘කමල්’ with intent ‘username’. Make sure the start and end values of entities in the training data match the token boundaries (e.g. entities don’t include trailing whitespaces or punctuation).
I am using the Sinhala language with Rasa Open Source.
This is my NLU part:
{
  "text": "කමල්",
  "intent": "username",
  "entities": [
    {
      "start": 1,
      "end": 8,
      "value": "කමල්",
      "entity": "uname"
    }
  ]
},
Config.yml
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
You're using the WhitespaceTokenizer, which splits text into tokens wherever there is a space. It seems that the text you provide (apologies, I do not recognize the language) does not separate tokens with spaces. That is why the entire text is seen as a single token.
It looks like you may need to find another tokenizer for your language. I don't know which tokenizer might apply though. Feel free to contribute to the discussion here.
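Separately, the warning also points at the annotation offsets themselves: the example text "කමල්" is only 4 characters long, so start 1 / end 8 cannot match any token boundary. If the whole text is the entity, a corrected annotation (a sketch, keeping the original field names) would be:
{
  "text": "කමල්",
  "intent": "username",
  "entities": [
    {
      "start": 0,
      "end": 4,
      "value": "කමල්",
      "entity": "uname"
    }
  ]
},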

Cloudwatch to Elasticsearch parse/tokenize log event before push to ES

Appreciate your help in advance.
In my scenario, CloudWatch multiline logs need to be shipped to the Elasticsearch service.
ECS--awslog->Cloudwatch---using lambda--> ES Domain
(Basic flow, though I am very open to changing how data is shipped from CW to ES.)
I was able to solve the multi-line issue using multi_line_start_pattern, BUT
the main issue I am experiencing now is that my logs are in ODL format (the following format):
[yyyy-mm-ddThh:mm:ss.SSS-Z][ProductName-Version][Log Level]
[Message ID][LoggerName][Key Value Pairs][[
Message]]
AND I would like to parse and tokenize the log events before storing them in ES (vs. the complete log line).
For example:
[2018-05-31T11:08:49.148-0400] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=43 _ThreadName=Thread-8] [timeMillis: 1527692929148] [levelValue: 800] [[
[] INFO : (DummyApplicationFunctionJPADAO) EntityManagerFactory located under resource lookup name [null], resource name=AuthorizationPU]]
This needs to be parsed and tokenized into the format:
timestamp 2018-05-31T11:08:49.148-0400
ProductName-Version glassfish 4.1
LogLevel INFO
MessageID
LoggerName
KeyValuePairs tid: _ThreadID=43 _ThreadName=Thread-8
Message [] INFO : (DummyApplicationFunctionJPADAO)
EntityManagerFactory located under resource lookup name
[null], resource name=AuthorizationPU
In the above, the key-value pairs repeat and are variable; for simplicity I can store them all as one long string.
From what I have gathered about CloudWatch, the subscription filter pattern's regex support is very limited, and I am really not sure how to fit the above pattern. As for a Lambda function that pushes the data to ES, I have not seen AWS docs or examples that use Lambda to parse and then push to ES.
I would appreciate it if someone can guide me on the best option for parsing CW logs before they get into ES => subscription filter pattern vs. in the Lambda function, or any other way.
Thank you.
From what I can see, your best bet is what you're suggesting: a CloudWatch-log-triggered Lambda that reformats the logged data into your preferred ES format and then posts it into ES.
You'll need to subscribe this lambda to your CloudWatch logs. You can do this on the lambda console, or the cloudwatch console (https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html).
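For example, creating the subscription via the CLI might look something like this (a sketch; the log group name, filter name, and Lambda ARN are placeholders, and the Lambda also needs permission to be invoked by CloudWatch Logs):

aws logs put-subscription-filter \
  --log-group-name "/ecs/my-service" \
  --filter-name "ship-to-es" \
  --filter-pattern "" \
  --destination-arn "arn:aws:lambda:us-east-1:123456789123:function:cw-to-es"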
The Lambda's event payload will be { "awslogs": { "data": "encoded-logs" } }, where encoded-logs is a Base64 encoding of gzipped JSON.
For example, the sample event (https://docs.aws.amazon.com/lambda/latest/dg/eventsources.html#eventsources-cloudwatch-logs) can be decoded in node, for example, using:
const zlib = require('zlib');

// event.awslogs.data is Base64-encoded, gzipped JSON
const data = event.awslogs.data;
const gzipped = Buffer.from(data, 'base64');
const json = zlib.gunzipSync(gzipped);
const logs = JSON.parse(json);
console.log(logs);
/*
{ messageType: 'DATA_MESSAGE',
owner: '123456789123',
logGroup: 'testLogGroup',
logStream: 'testLogStream',
subscriptionFilters: [ 'testFilter' ],
logEvents:
[ { id: 'eventId1',
timestamp: 1440442987000,
message: '[ERROR] First test message' },
{ id: 'eventId2',
timestamp: 1440442987001,
message: '[ERROR] Second test message' } ] }
*/
From what you've outlined, you'll want to extract the logEvents array and parse it into an array of records. I'm happy to give some help on this too if you need it (but I'll need to know what language you're writing your lambda in; there are libraries for tokenizing ODL, so hopefully it's not too hard).
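As a starting point, here is a rough Node sketch of that parsing step, using the ODL field names from the question; the regex is an assumption and would need hardening for real logs:

// logs is the decoded payload from the snippet above.
// Five fixed bracketed fields, then the repeating [k: v] groups, then [[Message]].
const ODL_RE = /^\[([^\]]*)\]\s*\[([^\]]*)\]\s*\[([^\]]*)\]\s*\[([^\]]*)\]\s*\[([^\]]*)\]\s*([\s\S]*?)\s*\[\[([\s\S]*)\]\]$/;

function parseOdl(message) {
  const m = message.match(ODL_RE);
  if (!m) return { raw: message }; // keep unparsable events whole
  return {
    timestamp: m[1],
    productNameVersion: m[2],
    logLevel: m[3],
    messageId: m[4],
    loggerName: m[5],
    keyValuePairs: m[6], // the variable [k: v] groups, stored as one string
    message: m[7].trim(),
  };
}

const records = logs.logEvents.map((e) => ({ id: e.id, ...parseOdl(e.message) }));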
At this point you can then POST these new records directly into your AWS ES domain. Somewhat cryptically, the S3-to-ES guide gives a good outline of how to do this in Python: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-aws-integrations.html#es-aws-integrations-s3-lambda-es
You can find a full example for a lambda that does all this (by someone else) here: https://github.com/blueimp/aws-lambda/tree/master/cloudwatch-logs-to-elastic-cloud

API Blueprint and Dredd - Required field missing from response, but tests still pass

I am using a combination of API Blueprint and Dredd to test an API my application depends on. I am using attributes in API Blueprint to define the structure of the response's body.
Apparently I'm missing something, though, because the tests always pass even though I've purposely defined a fake "required" parameter that I know is missing from the API's response. It seems that Dredd is only testing the type of the response body (array) rather than the type and the parameters within it.
My API Blueprint file:
FORMAT: 1A
HOST: http://somehost.net

# API Title

## Endpoints [GET /endpoint/{date}]

+ Parameters
    + date: `2016-09-01` (string, required) - Date

+ Response 200 (application/json; charset=utf-8)
    + Attributes (array[Data])

## Data Structures

### Data

- realParameter: 2432432 (number)
- realParameter2: `some string` (string, required)
- realParameter3: `Something else` (string, required)
- realParameter4: 1 (number, required)
- fakeParam: 1 (number, required)
The response body:
[
  {
    "realParameter": 31,
    "realParameter2": "some value",
    "realParameter3": "another value",
    "realParameter4": 8908
  },
  {
    "realParameter": 54,
    "realParameter2": "something here",
    "realParameter3": "and here too",
    "realParameter4": 6589
  }
]
And my Dredd config file:
reporter: apiary
custom:
  apiaryApiKey: somekey
  apiaryApiName: somename
dry-run: null
hookfiles: null
language: nodejs
sandbox: false
server: null
server-wait: 3
init: false
names: false
only: []
output: []
header: []
sorted: false
user: null
inline-errors: false
details: false
method: []
color: true
level: info
timestamp: false
silent: false
path: []
blueprint: myApiBlueprintFile.apib
endpoint: 'http://ahost.com'
Does anyone have any idea why Dredd ignores the fact that "fakeParam" doesn't actually show up in the response body and still allows the test to pass?
You've run into a limitation of MSON, the language API Blueprint uses for describing attributes. In many cases, MSON describes what MAY be present in the data structure rather than what MUST exactly be present.
The most prominent case is arrays, where basically any content of the array is optional, and thus the underlying generated JSON Schema doesn't put any constraints on the array contents. Dredd just respects that, so indirectly it becomes a Dredd issue too; however, there's not much Dredd can do about it.
There's an issue for the problem: apiaryio/mson#66. You can follow and comment under the issue to stay updated. Dredd is usually very prompt in adopting the latest API Blueprint parser, so once it's implemented in the language itself, it won't take long to appear in Dredd.
An obvious (but tedious) workaround is to specify your own JSON Schema with stricter rules, using a + Schema section alongside the + Attributes section, as sketched below.
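For the endpoint above, a stricter hand-written schema could look like this (a sketch; with fakeParam listed as required, Dredd should then fail when the response omits it):

+ Response 200 (application/json; charset=utf-8)
    + Attributes (array[Data])
    + Schema

            {
                "$schema": "http://json-schema.org/draft-04/schema#",
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["realParameter2", "realParameter3", "realParameter4", "fakeParam"],
                    "properties": {
                        "realParameter": {"type": "number"},
                        "realParameter2": {"type": "string"},
                        "realParameter3": {"type": "string"},
                        "realParameter4": {"type": "number"},
                        "fakeParam": {"type": "number"}
                    }
                }
            }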
