Sinhala entity classifications - rasa-nlu

UserWarning: Misaligned entity annotation in message 'කමල්' with intent 'username'. Make sure the start and end values of entities in the training data match the token boundaries (e.g. entities don't include trailing whitespaces or punctuation).
I am using the Sinhala language with Rasa Open Source.
This is my NLU data:
{
  "text": "කමල්",
  "intent": "username",
  "entities": [
    {
      "start": 1,
      "end": 8,
      "value": "කමල්",
      "entity": "uname"
    }
  ]
},
Config.yml
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy

You're using the WhitespaceTokenizer, which splits text into tokens wherever there is a space. It seems that the text you provide (apologies, I do not recognize the language) does not separate tokens with spaces, which is why the entire text is seen as a single token.
It looks like you may need to find another tokenizer for your language; I don't know which tokenizer might apply, though. Feel free to contribute to the discussion here.
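Independent of the tokenizer, the warning itself points at the character offsets: "කමල්" is only four code points, so start: 1, end: 8 cannot line up with any token boundary. A minimal corrected annotation, assuming Rasa's usual zero-based, end-exclusive character offsets:
{
  "text": "කමල්",
  "intent": "username",
  "entities": [
    {
      "start": 0,
      "end": 4,
      "value": "කමල්",
      "entity": "uname"
    }
  ]
}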

Related

Sending messages to multiple Elasticsearch indices

We are running an ELK stack to aggregate all our logs and we have multiple systems. Currently, we have Filebeat configured to log to specific indices based on the system (SystemA, SystemB, SystemC).
I would like to, additionally, send all logs with level ERROR to another index where I would like to collect all errors across systems, but somehow I can't figure out how to get Filebeat to send one message to multiple indices.
According to the documentation, the first condition that matches will define the index to be used, which sounds to me as if it's not possible to send a message that would match multiple patterns to multiple indices?
What I want to do:
output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS}'
  username: '${ELASTICSEARCH_USERNAME}'
  password: '${ELASTICSEARCH_PASSWORD}'
  index: "filebeat-external-%{+yyyy.MM.dd}"
  indices:
    - index: "filebeat-error-logs-%{+yyyy.MM.dd}"
      when:
        or:
          - equals:
              level: "ERROR"
          - equals:
              level: "error"
    - index: "filebeat-service-a-%{+yyyy.MM.dd}"
      when:
        regexp:
          container.name: "^service-a-"
    - index: "filebeat-service-b-%{+yyyy.MM.dd}"
      when:
        regexp:
          container.name: "^service-b-"
The only way I currently see is to have multiple indices per system and aggregate them in Kibana:
output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS}'
  username: '${ELASTICSEARCH_USERNAME}'
  password: '${ELASTICSEARCH_PASSWORD}'
  index: "filebeat-external-%{+yyyy.MM.dd}"
  indices:
    - index: "error-log-service-a-%{+yyyy.MM.dd}"
      when:
        and:
          - equals:
              level: "ERROR"
          - regexp:
              container.name: "^service-a-"
    - index: "service-log-service-a-%{+yyyy.MM.dd}"
      when:
        and:
          - not:
              equals:
                level: "ERROR"
          - regexp:
              container.name: "^service-a-"
But this would double our number of indices and duplicates configuration. Am I missing something here? Is there an easier way to have a general error index but still have errors go to the service-specific indices as well?

YAMLSyntaxError: Failed to resolve SEQ_ITEM node here at line X, column Y:

I'm getting this error when trying to run Netlify CMS:
Error loading the CMS configuration
Config Errors:
YAMLSyntaxError: Failed to resolve SEQ_ITEM node here at line 10, column 1:
- name: Posts
^^^^^^^^^^^^^^…
Check your config.yml file.
I have checked the syntax and tried different variations, but I still get the same error somewhere in the config.yml document.
This is the troubled config.yml document:
backend:
  name: git-gateway
  branch: master
media_folder: src/assets/images
media_library:
  name: uploads
collections:
  - name: Posts
    label: Posts
    create: true
    folder: "/articles"
    slug: articles/{{slug}}
    fields:
      - {label: Title, name: title, widget: string}
      - {label: Publish Date, name: date, widget: datetime}
      - {label: Featured Image, name: cover_image, widget: image}
      - {label: Body, name: body, widget: markdown}
Here is a link to the files that I am getting the errors from https://drive.google.com/file/d/1OJPKJRgCljxAG5UuUxXkBPPNoUcyJe48/view?usp=sharing
The file you linked uses tabs for indentation. YAML uses spaces; see the spec:
In general, indentation is defined as a zero or more space characters at the start of a line.
To maintain portability, tab characters must not be used in indentation, since different systems treat tabs differently.
You need to convert the tabs to spaces.
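For example, one quick way to do the conversion is the standard expand utility (this assumes two-space indentation is what you want):
expand -t 2 config.yml > config.fixed.yml && mv config.fixed.yml config.yml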

Latest Rasa core model does not detect any entities

I followed the Rasa masterclass and have the following setup:
data/nlu.md:
## intent:search_provider
- I want to go to a [hospital] (facilitytype)
- I am sick need to go to a [hospital] (facilitytype)
- Can you tell me how to get to a [hospital] (facilitytype)
rasa train:
2020-01-17 19:06:36 INFO rasa.model - Data (nlu-config) for NLU model changed.
Core stories/configuration did not change. No need to retrain Core model.
Training NLU model...
2020-01-17 19:06:36 INFO rasa.nlu.training_data.loading - Training data format of /var/folders/2p/c7zvhbtj4dz0p053y49fmr_h0000gp/T/tmphtg4ok5r/68c2a3993fe141b199b22b8cac047519_nlu.md is md
2020-01-17 19:06:36 INFO rasa.nlu.training_data.training_data - Training data stats:
- intent examples: 50 (8 distinct intents)
- Found intents: 'goodbye', 'affirm', 'inform', 'search_provider', 'mood_unhappy', 'greet', 'mood_great', 'deny'
- entity examples: 0 (0 distinct entities)
- found entities:
When I run the NLU model, it detects the right intent but cannot extract any entities. Not sure what I am missing:
Next message:
I want to go to a hospital
{
  "intent": {
    "name": "search_provider",
    "confidence": 0.9632793664932251
  },
  "entities": [],
My pipeline has the following line in the config.yml:
pipeline: supervised_embeddings
@user12735193 Can you specify the exact Rasa version you are using?
Also, please don't add a space between the [entity value] and (entity type). It should be like this: [entity value](entity type), not [entity value] (entity type). I think that should fix it for you.
As mentioned by @dragster, there shouldn't be a space between the entity value and type:
## intent:inform
- [NEW YORK](city)
- I am going home to [Detroit](city)
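Applied to the training data above, the annotations become:
## intent:search_provider
- I want to go to a [hospital](facilitytype)
- I am sick need to go to a [hospital](facilitytype)
- Can you tell me how to get to a [hospital](facilitytype)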

Can Filebeat convert log lines to JSON without Logstash in the pipeline?

We have standard log lines in our Spring Boot web applications (non-JSON).
We need to centralize our logging and ship them to Elasticsearch as JSON.
(I've heard the later versions can do some transformation.)
Can Filebeat read the log lines and wrap them as JSON? I guess it could append some metadata as well; there is no need to parse the log line.
Expected output:
{timestamp : "", beat: "", message: "the log line..."}
I have no code to show, unfortunately.
Filebeat supports several outputs, including Elasticsearch.
The config file filebeat.yml can look like this:
# filebeat options: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-reference-yml.html
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/../file.err.log
    processors:
      - drop_fields:
          # Prevent fail of Logstash (https://www.elastic.co/guide/en/beats/libbeat/current/breaking-changes-6.3.html#custom-template-non-versioned-indices)
          fields: ["host"]
      - dissect:
          # tokenizer syntax: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html
          tokenizer: "%{} %{} [%{}] {%{}} <%{level}> %{message}"
          field: "message"
          target_prefix: "spring boot"
    fields:
      log_type: spring_boot

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
Well, it seems to do it by default. This is my result when I tried it locally reading log lines; it wraps the line exactly like I wanted:
{
  "@timestamp": "2019-06-12T11:11:49.094Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.2.4"
  },
  "message": "the log line...",
  "source": "/Users/myusername/tmp/hej.log",
  "offset": 721,
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "my-macbook.local",
    "hostname": "my-macbook.local",
    "version": "6.2.4"
  }
}
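If you want to reproduce this locally, Filebeat can be run in the foreground against a specific config using its standard flags (-e logs to stderr, -c selects the config file):
filebeat -e -c filebeat.yml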

What do `<<` and `&` mean in YAML?

When reviewing the cryptogen (a Fabric command) config file, I saw these symbols:
Profiles:
  SampleInsecureSolo:
    Orderer:
      <<: *OrdererDefaults     ## what is the `<<`
      Organizations:
        - *ExampleCom          ## what is the `*`
    Consortiums:
      SampleConsortium:
        Organizations:
          - *Org1ExampleCom
          - *Org2ExampleCom
Above there are two symbols, `<<` and `*`.
Application: &ApplicationDefaults  # what does the `&` mean?
  Organizations:
As you can see, there is another symbol, `&`.
I don't know what they mean. I didn't get any information even by reviewing the source code (fabric/common/configtx/tool/configtxgen/main.go).
Well, those are elements of the YAML file format, which is used here to provide a configuration file for configtxgen. The `&` sign marks an anchor, `*` is a reference (alias) to an anchor, and `<<` is the merge key, which merges the anchored mapping into the current one. This is basically used to avoid duplication. For example:
person: &person
  name: "John Doe"

employee: &employee
  <<: *person
  salary: 5000
will reuse the fields of person and has the same meaning as:
employee: &employee
  name: "John Doe"
  salary: 5000
Another example is simply reusing a value:
key1: &key some very common value
key2: *key
equivalent to:
key1: some very common value
key2: some very common value
Since fabric/common/configtx/tool/configtxgen/main.go uses an off-the-shelf YAML parser, you won't find any reference to these symbols in configtxgen-related code. I would suggest reading a bit more about the YAML file format.
In YAML, if the data is:
user: &userId '123'
username: *userId
the equivalent YAML is:
user: '123'
username: '123'
or the equivalent JSON is:
{
  "user": "123",
  "username": "123"
}
So it basically allows you to reuse data. You can also try it with an array instead of a single value like 123.
Try converting the YAML below to JSON using any online YAML-to-JSON converter:
users: &users
  k1: v1
  k2: v2
usernames: *users
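Any converter should expand the alias in place and produce:
{
  "users": {
    "k1": "v1",
    "k2": "v2"
  },
  "usernames": {
    "k1": "v1",
    "k2": "v2"
  }
}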
