I am new to Rasa. I am training a model to recognize certain entities using lookup tables. I have multiple entities in a single sentence and I am trying to extract them.
nlu.yml
version: "2.0"
nlu:
- intent: intent_1
examples : |
- how many deaths were there last year in [Ohio](Filter-State)?
- death count of [Florida](Filter-State) this year
- death count of [Texas](Filter-State) this year
- what's the death count for this quarter in [CA](Filter-State)?
- lookup: Filter-State
examples: |
- Alabama
- AL
- Alaska
- AK
- Arizona
- AZ
- Arkansas
- AR
- California
- CA
- Colorado
- CO
- Connecticut
- CT
- Delaware
- DE
- District of Columbia
- DC
- Florida
- FL
- Georgia
- GA
config.yml
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 150
    random_seed: 1
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
      - email
      - time
  - name: EntitySynonymMapper
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 4
  - name: TEDPolicy
    max_history: 4
    epochs: 100
  - name: RulePolicy
    core_fallback_threshold: 0.4
    core_fallback_action_name: "action_default_fallback"
    enable_fallback_prediction: True
When I train the model and try it via the API, it doesn't recognize the states from the lookup table and, as a result, can't assign the value to the slot filter_state.
Can anyone advise me on what I am doing wrong here that keeps the lookup table from working?
I'm new to Rasa and was searching for another issue, but I ran into and solved this exact problem last night.
For lookup tables to work, you need to add "RegexEntityExtractor" to your pipeline and possibly remove RegexFeaturizer. You also need to enable lookup tables in the RegexEntityExtractor config.
config.yml
pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: RegexEntityExtractor
    case_sensitive: False
    use_lookup_tables: True
    use_regexes: True
...
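After retraining, you can sanity-check that the lookup table is actually being applied by asking the running server to parse a test message. This is only a minimal sketch: it assumes a local Rasa Open Source 2.x server started with rasa run --enable-api on the default port 5005, and the test sentence is made up.

import requests

# Ask the Rasa NLU pipeline to parse a test message and print any entities it found.
resp = requests.post(
    "http://localhost:5005/model/parse",
    json={"text": "death count of Florida this year"},
)
resp.raise_for_status()
for entity in resp.json().get("entities", []):
    # With RegexEntityExtractor enabled, lookup-table matches should appear here
    # with "entity": "Filter-State" and "extractor": "RegexEntityExtractor".
    print(entity["entity"], entity["value"], entity.get("extractor"))

One more thing worth checking: in Rasa 2.x a slot is only auto-filled when its name exactly matches the entity name, so an entity annotated as Filter-State won't automatically land in a slot called filter_state unless you rename one of them or map it yourself in a custom action or form.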
Could you please post on the Rasa forum and include more details of your setup? In particular, which version of Rasa Open Source are you using? Is the above your complete NLU data? (I think you need at least two intents for an intent classifier to train.) I also recommend testing the system using rasa interactive --debug and sharing e.g. a screenshot; this will help everyone see the exact input message and how it gets processed by Rasa. I'm sure we'll trace the issue down to its roots :-)
What's the best way to represent a hierarchical list of name-value pairs like the following in YAML:
name_1: value_1
    subName1_1: subValue1_1
    subName1_2: subValue1_2
name_2: value_2
    subName2_1: subValue2_1
    subName2_2: subValue2_2
name_3: value_3
    subName3_1: subValue3_1
    subName3_2: subValue3_2
name_4: value_4
    subName4_1: subValue4_1
    subName4_2: subValue4_2
I am thinking of the following but not sure if this is the best way or not:
- name_1:
    ID: 1
    subNames:
      - subName1_1:
          ID: 1
      - subName1_2:
          ID: 2
- name_2:
    ID: 2
    subNames:
      - subName2_1:
          ID: 1
      - subName2_2:
          ID: 2
or I could also do:
- Name: Name_1
  ID: 1
  SubNames:
    - SubName: subName1_1
      ID: 1
    - SubName: subName1_2
      ID: 2
- Name: Name_2
  ID: 2
  SubNames:
    - SubName: subName2_1
      ID: 1
    - SubName: subName2_2
      ID: 2
I need the name_* keys to be unique, and their corresponding values as well, so I'd prefer something that Python can easily consume to validate that there are no duplicates.
Well, there's the value key type. It's not part of the core standard and was defined for YAML 1.1, but it was designed to solve exactly this problem. The idea is that your mapping contains a key named = which holds the default value:
name_1:
  =: value_1
  subName1_1: subValue1_1
  subName1_2: subValue1_2
name_2:
  =: value_2
  subName2_1: subValue2_1
  subName2_2: subValue2_2
name_3:
  =: value_3
  subName3_1: subValue3_1
  subName3_2: subValue3_2
name_4:
  =: value_4
  subName4_1: subValue4_1
  subName4_2: subValue4_2
Alternatively, you could make each value a list that starts with the plain value and is followed by single key-value pairs:
name_1:
  - value_1
  - subName1_1: subValue1_1
  - subName1_2: subValue1_2
name_2:
  - value_2
  - subName2_1: subValue2_1
  - subName2_2: subValue2_2
name_3:
  - value_3
  - subName3_1: subValue3_1
  - subName3_2: subValue3_2
name_4:
  - value_4
  - subName4_1: subValue4_1
  - subName4_2: subValue4_2
You can write this with flow sequences, since YAML allows flow sequences to contain single key-value pairs, which are interpreted as implicit mappings:
name_1: [value_1,
         subName1_1: subValue1_1,
         subName1_2: subValue1_2]
name_2: [value_2,
         subName2_1: subValue2_1,
         subName2_2: subValue2_2]
name_3: [value_3,
         subName3_1: subValue3_1,
         subName3_2: subValue3_2]
name_4: [value_4,
         subName4_1: subValue4_1,
         subName4_2: subValue4_2]
Be aware that when you do this, you can't have any kind of block-style nodes in the subnames, but other flow nodes will be fine.
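Since you want Python to consume this and check for duplicates, here's a minimal sketch of how the list-based variant above could be validated with PyYAML. The file name data.yaml and the exact uniqueness rules are assumptions on my part:

import yaml

# Load the "plain value followed by single-pair mappings" variant shown above.
with open("data.yaml") as fh:
    data = yaml.safe_load(fh)

# PyYAML silently keeps the last occurrence of a duplicate top-level key, so the
# loaded dict's keys are always unique; the values and sub-names still need checking.
default_values = []
for name, items in data.items():
    default_values.append(items[0])  # the plain value_* entry
    sub_names = [next(iter(d)) for d in items[1:]]  # the single key of each sub-pair mapping
    assert len(sub_names) == len(set(sub_names)), f"duplicate sub-name under {name}"

assert len(default_values) == len(set(default_values)), "duplicate default values"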
I'm training DialoGPT on my own dataset, following this tutorial.
When I follow the tutorial exactly with the provided dataset, I have no issues. I then swapped in my own dataset. The only difference between the example and my code is that my dataset is 256,397 lines long, compared to the tutorial's 1,906 lines.
I am not sure whether the error relates to the column labels in my dataset, to one of the text values in a particular row, or to the size of my data.
06/12/2020 09:23:08 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
06/12/2020 09:23:10 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/config.json from cache at cached/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5
06/12/2020 09:23:10 - INFO - transformers.configuration_utils - Model config GPT2Config {
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_layer": 12,
"n_positions": 1024,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"vocab_size": 50257
}
06/12/2020 09:23:11 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/config.json from cache at cached/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5
06/12/2020 09:23:11 - INFO - transformers.configuration_utils - Model config GPT2Config {
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_layer": 12,
"n_positions": 1024,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"vocab_size": 50257
}
06/12/2020 09:23:11 - INFO - transformers.tokenization_utils - Model name 'microsoft/DialoGPT-small' not found in model shortcut name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). Assuming 'microsoft/DialoGPT-small' is a path, a model identifier, or url to a directory containing tokenizer files.
06/12/2020 09:23:15 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/vocab.json from cache at cached/78725a31b87003f46d5bffc3157ebd6993290e4cfb7002b5f0e52bb0f0d9c2dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
06/12/2020 09:23:15 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/merges.txt from cache at cached/570e31eddfc57062e4d0c5b078d44f97c0e5ac48f83a2958142849b59df6bbe6.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
06/12/2020 09:23:15 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/added_tokens.json from cache at None
06/12/2020 09:23:15 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/special_tokens_map.json from cache at None
06/12/2020 09:23:15 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/tokenizer_config.json from cache at None
06/12/2020 09:23:19 - INFO - filelock - Lock 140392381680496 acquired on cached/9eab12d0b721ee394e9fe577f35d9b8b22de89e1d4f6a89b8a76d6e1a82bceae.906a78bee3add2ff536ac7ef16753bb3afb3a1cf8c26470f335b7c0e46a21483.lock
06/12/2020 09:23:19 - INFO - transformers.file_utils - https://cdn.huggingface.co/microsoft/DialoGPT-small/pytorch_model.bin not found in cache or force_download set to True, downloading to /content/drive/My Drive/Colab Notebooks/cached/tmpj1dveq14
Downloading: 100%
351M/351M [00:34<00:00, 10.2MB/s]
06/12/2020 09:23:32 - INFO - transformers.file_utils - storing https://cdn.huggingface.co/microsoft/DialoGPT-small/pytorch_model.bin in cache at cached/9eab12d0b721ee394e9fe577f35d9b8b22de89e1d4f6a89b8a76d6e1a82bceae.906a78bee3add2ff536ac7ef16753bb3afb3a1cf8c26470f335b7c0e46a21483
06/12/2020 09:23:32 - INFO - transformers.file_utils - creating metadata file for cached/9eab12d0b721ee394e9fe577f35d9b8b22de89e1d4f6a89b8a76d6e1a82bceae.906a78bee3add2ff536ac7ef16753bb3afb3a1cf8c26470f335b7c0e46a21483
06/12/2020 09:23:33 - INFO - filelock - Lock 140392381680496 released on cached/9eab12d0b721ee394e9fe577f35d9b8b22de89e1d4f6a89b8a76d6e1a82bceae.906a78bee3add2ff536ac7ef16753bb3afb3a1cf8c26470f335b7c0e46a21483.lock
06/12/2020 09:23:33 - INFO - transformers.modeling_utils - loading weights file https://cdn.huggingface.co/microsoft/DialoGPT-small/pytorch_model.bin from cache at cached/9eab12d0b721ee394e9fe577f35d9b8b22de89e1d4f6a89b8a76d6e1a82bceae.906a78bee3add2ff536ac7ef16753bb3afb3a1cf8c26470f335b7c0e46a21483
06/12/2020 09:23:39 - INFO - transformers.modeling_utils - Weights of GPT2LMHeadModel not initialized from pretrained model: ['transformer.h.0.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.11.attn.masked_bias']
06/12/2020 09:23:54 - INFO - __main__ - Training/evaluation parameters <__main__.Args object at 0x7fafa60a00f0>
06/12/2020 09:23:54 - INFO - __main__ - Creating features from dataset file at cached
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-523c0d2a27d3> in <module>()
----> 1 main(trn_df, val_df)
7 frames
<ipython-input-11-d6dfa312b1f5> in main(df_trn, df_val)
59 # Training
60 if args.do_train:
---> 61 train_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False)
62
63 global_step, tr_loss = train(args, train_dataset, model, tokenizer)
<ipython-input-9-3c4f1599e14e> in load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate)
40
41 def load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False):
---> 42 return ConversationDataset(tokenizer, args, df_val if evaluate else df_trn)
43
44 def set_seed(args):
<ipython-input-9-3c4f1599e14e> in __init__(self, tokenizer, args, df, block_size)
24 self.examples = []
25 for _, row in df.iterrows():
---> 26 conv = construct_conv(row, tokenizer)
27 self.examples.append(conv)
28
<ipython-input-9-3c4f1599e14e> in construct_conv(row, tokenizer, eos)
1 def construct_conv(row, tokenizer, eos = True):
2 flatten = lambda l: [item for sublist in l for item in sublist]
----> 3 conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
4 conv = flatten(conv)
5 return conv
<ipython-input-9-3c4f1599e14e> in <listcomp>(.0)
1 def construct_conv(row, tokenizer, eos = True):
2 flatten = lambda l: [item for sublist in l for item in sublist]
----> 3 conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
4 conv = flatten(conv)
5 return conv
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in encode(self, text, text_pair, add_special_tokens, max_length, stride, truncation_strategy, pad_to_max_length, return_tensors, **kwargs)
1432 pad_to_max_length=pad_to_max_length,
1433 return_tensors=return_tensors,
-> 1434 **kwargs,
1435 )
1436
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in encode_plus(self, text, text_pair, add_special_tokens, max_length, stride, truncation_strategy, pad_to_max_length, is_pretokenized, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, **kwargs)
1574 )
1575
-> 1576 first_ids = get_input_ids(text)
1577 second_ids = get_input_ids(text_pair) if text_pair is not None else None
1578
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in get_input_ids(text)
1554 else:
1555 raise ValueError(
-> 1556 "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
1557 )
1558
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
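The traceback ends with tokenizer.encode() rejecting an input that isn't a string, so one plausible culprit is a non-string cell somewhere in the larger dataset (for example a NaN that pandas loads as a float) rather than its size. Here is a quick, hedged way to check, assuming the training dataframe is called trn_df as in the notebook:

# trn_df is assumed to be the pandas DataFrame passed to main() in the notebook.
# Flag every cell that is not a plain string; NaN cells load as floats and would
# trigger exactly this ValueError inside tokenizer.encode().
bad_cells = trn_df.applymap(lambda x: not isinstance(x, str))
bad_rows = trn_df[bad_cells.any(axis=1)]
print(f"{len(bad_rows)} rows contain non-string cells; first indexes: {bad_rows.index[:10].tolist()}")

# If the offenders are just missing values, dropping them or coercing to str is one option:
# trn_df = trn_df.dropna()
# trn_df = trn_df.astype(str)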
So I'm moving to writing my Sass in the BEM convention. I've used the sass-lint configuration generator to create my config and only edited class-name-format's convention to strictbem; however, I'm still having some issues with it.
Maybe I'm misinterpreting BEM?
Error:
[sass-lint] Class '.bus__tyre--front' should be written in BEM (Block
Element Modifier) format (class-name-format)
<element class="bus__tyre--front">
Sass:
.bus {
  position: relative;

  &__tyre {
    position: absolute;

    &--front {
      bottom: -22px;
      right: 3%;
      width: 17%;
    }
  }
}
sass-lint.yml:
# sass-lint config generated by make-sass-lint-config v0.1.2
#
# The following scss-lint Linters are not yet supported by sass-lint:
# DisableLinterReason, ElsePlacement, PropertyCount, SelectorDepth
# SpaceAroundOperator, TrailingWhitespace, UnnecessaryParentReference, Compass::*
#
# The following settings/values are unsupported by sass-lint:
# Linter Indentation, option "allow_non_nested_indentation"
# Linter Indentation, option "character"
# Linter NestingDepth, option "ignore_parent_selectors"
# Linter PropertySortOrder, option "min_properties"
# Linter PropertySortOrder, option "separate_groups"
# Linter SpaceBeforeBrace, option "allow_single_line_padding"
# Linter VendorPrefix, option "identifier_list"
files:
  include: '**/*.scss'
options:
  formatter: stylish
  merge-default-rules: false
rules:
  bem-depth:
    - 0
    - max-depth: 1
  border-zero:
    - 1
    - convention: zero
  brace-style:
    - 1
    - allow-single-line: true
  class-name-format:
    - 1
    - convention: strictbem
  clean-import-paths:
    - 1
    - filename-extension: false
      leading-underscore: false
  empty-line-between-blocks:
    - 1
    - ignore-single-line-rulesets: true
  extends-before-declarations: 1
  extends-before-mixins: 1
  final-newline:
    - 1
    - include: true
  force-attribute-nesting: 1
  force-element-nesting: 1
  force-pseudo-nesting: 1
  function-name-format:
    - 1
    - allow-leading-underscore: true
      convention: hyphenatedlowercase
  hex-length:
    - 1
    - style: short
  hex-notation:
    - 1
    - style: lowercase
  id-name-format:
    - 1
    - convention: hyphenatedlowercase
  indentation:
    - 1
    - size: 2
  leading-zero:
    - 1
    - include: false
  mixin-name-format:
    - 1
    - allow-leading-underscore: true
      convention: hyphenatedlowercase
  mixins-before-declarations: 1
  nesting-depth:
    - 1
    - max-depth: 3
  no-color-keywords: 1
  no-color-literals: 1
  no-css-comments: 1
  no-debug: 1
  no-duplicate-properties: 1
  no-empty-rulesets: 1
  no-extends: 0
  no-ids: 1
  no-important: 1
  no-invalid-hex: 1
  no-mergeable-selectors: 1
  no-misspelled-properties:
    - 1
    - extra-properties: []
  no-qualifying-elements:
    - 1
    - allow-element-with-attribute: false
      allow-element-with-class: false
      allow-element-with-id: false
  no-trailing-zero: 1
  no-transition-all: 0
  no-url-protocols: 1
  no-vendor-prefixes:
    - 1
    - additional-identifiers: []
      excluded-identifiers: []
  placeholder-in-extend: 1
  placeholder-name-format:
    - 1
    - convention: hyphenatedlowercase
  property-sort-order:
    - 1
    - ignore-custom-properties: false
  property-units:
    - 1
    - global:
        - ch
        - em
        - ex
        - rem
        - cm
        - in
        - mm
        - pc
        - pt
        - px
        - q
        - vh
        - vw
        - vmin
        - vmax
        - deg
        - grad
        - rad
        - turn
        - ms
        - s
        - Hz
        - kHz
        - dpi
        - dpcm
        - dppx
        - '%'
      per-property: {}
  quotes:
    - 1
    - style: single
  shorthand-values:
    - 1
    - allowed-shorthands:
        - 1
        - 2
        - 3
  single-line-per-selector: 1
  space-after-bang:
    - 1
    - include: false
  space-after-colon:
    - 1
    - include: true
  space-after-comma:
    - 1
    - include: true
  space-before-bang:
    - 1
    - include: true
  space-before-brace:
    - 1
    - include: true
  space-before-colon: 1
  space-between-parens:
    - 1
    - include: false
  trailing-semicolon: 1
  url-quotes: 1
  variable-for-property:
    - 0
    - properties: []
  variable-name-format:
    - 1
    - allow-leading-underscore: true
      convention: hyphenatedlowercase
  zero-unit: 1
Judging by #335 and #319 in the scss-lint repo, it looks like you need to change:
class-name-format:
  - 1
  - convention: strictbem
to:
class-name-format:
  - 1
  - convention: hyphenatedbem
I hope that helps!
Instead of hyphenated_BEM it should be hyphenatedbem.
Example from the sass-lint docs
Okay, I have the following YAML file that was generated using yaml_db for Rails, so it is basically an autogenerated export of my Rails database:
---
admins:
  columns:
    - id
    - username
    - email
    - encrypted_password
    - password_salt
    - sign_in_count
    - current_sign_in_at
    - last_sign_in_at
    - current_sign_in_ip
    - last_sign_in_ip
    - failed_attempts
    - unlock_token
    - locked_at
    - created_at
    - updated_at
  records:
    - - 1
      -
      - lalala#lalala.at
      - $2a$10$dZU50HD6paWS7EjKuWAruOFdwt9eqxiNTRh/D4sj8cqSzy5gjYd2i
      - $2a$10$dZU50HD6paWS7EjKuWAruO
      - 86
      - 2011-01-27 07:37:45 Z
      - 2011-01-26 13:27:13 Z
      - 12.34.56.78
      - 12.34.56.78
      - 0
      -
      -
      - 2010-12-23 09:20:46 Z
      - 2011-01-27 07:37:45 Z
    - - 2
      - admin
      - lalalalala#lalalala.com
      - $2a$10$3DML64hdCCvG90bnhIpN/unEEm6C.a9FqGrAFlFHU0.2D54DSQ1Ni
      - $2a$10$3DML64hdCCvG90bnhIpN/u
      - 1
      - 2011-01-21 09:52:14 Z
      - 2011-01-21 09:52:14 Z
      - 12.34.56.78
      - 12.34.56.78
      - 0
      -
      -
      - 2011-01-05 14:29:49 Z
      - 2011-01-21 09:52:14 Z
---
experiments:
  columns:
    - id
    - description
    - startdate
    - enddate
    - maps_base_URI
    - maps_count
    - queries_count
    - proposals_count
    - created_at
    - updated_at
.......
Now when I try to load this YAML file in Ruby with:
file = YAML.load(File.open("data-2011-01-27.yml"))
It loads no more than the first admin record, and the experiments don't show up at all:
ruby-1.9.2-p0 > file.keys
=> ["admins"]
ruby-1.9.2-p0 > file["admins"]["records"].count
=> 1
Why is that? I would assume the autogenerated .yml file is syntactically correct. When I run rake db:dump and rake db:load it works just fine.
Three hyphens (---) separate multiple documents. See YAML.load_stream to load them all:
documents = YAML.load_stream(open("data-2011-01-27.yml")).documents
documents.map(&:keys)
#=> [["admins"], ["experiments"]]
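If you're on a Ruby where Psych is the YAML engine (the default from 1.9.3 onwards), note that YAML.load_stream already returns the array of documents, so you can drop the trailing .documents call.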