rasa_nlu: how to train an entity to capture dynamic values

I am new to rasa_nlu. I am building a bot that takes dynamic values, and I want each dynamic value to be captured by an entity.
Examples:
with notes please { bring your college marksheet }
with notes please { come prepared }
The strings inside the braces are the dynamic values; I need to capture them and make use of them. Please suggest a way.
Thank you in advance

The Rasa component CRFEntityExtractor is able to generalize to entity values it has not seen before. Just make sure you add enough training data so that the conditional random field can pick up the pattern.
The following example learns to pick up person names (you have to add more than these three examples, but this should give you an idea).
## intent: greet_with_name
- Hi, I'm [Donald](name)
- Hello, it's [Steve](name)
- Hi, Im [Sam](name)
- ...
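You can turn the question's brace-delimited examples into annotated Rasa markdown training lines with a small script. The sketch below does that in Python. The intent name `add_note` and the entity name `note` are hypothetical, so use your own. The script only prepares data: the CRFEntityExtractor still needs many varied examples before it can generalise to unseen text.

```python
import re

# Hypothetical entity name for the dynamic text; rename it to suit your bot.
ENTITY = "note"

# Matches "{ ... }" and captures the text between the braces, trimmed of spaces.
BRACES = re.compile(r"\{\s*(.*?)\s*\}")


def to_rasa_example(raw: str, entity: str = ENTITY) -> str:
    """Turn 'text { value }' into a Rasa markdown line '- text [value](entity)'."""
    annotated = BRACES.sub(lambda m: f"[{m.group(1)}]({entity})", raw)
    return f"- {annotated.strip()}"


raw_examples = [
    "with notes please { bring your college marksheet }",
    "with notes please { come prepared }",
]

print("## intent: add_note")  # hypothetical intent name
for line in raw_examples:
    print(to_rasa_example(line))
```

This prints `- with notes please [bring your college marksheet](note)` and `- with notes please [come prepared](note)` under the intent header.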

Related

How to handle spelling mistakes (typos) in entity extraction in Rasa NLU?

I have a few intents in my training set (nlu_data.md file), with a sufficient number of training examples under each intent.
Here is an example:
##intent: SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
I have added multiple sentences like this.
At test time, all sentences from the training file work fine. But if an input query contains a spelling mistake, e.g. hotol/hetel/hotele for the keyword hotel, Rasa NLU is unable to extract it as an entity.
I want to resolve this issue.
I am only allowed to change the training data, and I am not allowed to write a custom component for this.
To handle spelling mistakes like this in entities, you should add such misspelled examples to your training data, something like this:
##intent: SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
- looking for a [hotol](place) in Chennai
- [hetel](place) in Berlin please
Once you've added enough examples, the model should be able to generalise from the sentence structure.
If you're not using it already, it also makes sense to use the CountVectorsFeaturizer with character-level n-grams. That should already be part of the default pipeline described in the Rasa documentation.
One thing I would highly recommend is using lookup tables with fuzzywuzzy matching. If you have a limited number of entity values (like country names), lookup tables are quite fast, and fuzzy matching catches typos as long as the entity exists in your lookup table (it searches for typo variations of those entities). There's a whole blog post about it on the Rasa site.
There's a working implementation of fuzzywuzzy matching as a custom component:
import json
import os

from fuzzywuzzy import process as fuzzy_process
from nltk.corpus import stopwords
from rasa_nlu.components import Component  # rasa_nlu-era import path

# STOP_WORDS: English stop words from NLTK (run nltk.download("stopwords") once).
STOP_WORDS = set(stopwords.words("english"))


class FuzzyExtractor(Component):
    name = "FuzzyExtractor"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]
    threshold = 90  # fuzzywuzzy scores range from 0 to 100

    def __init__(self, component_config=None, *args):
        super(FuzzyExtractor, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        # Nothing to train: tokens are matched against the lookup table at runtime.
        pass

    def process(self, message, **kwargs):
        entities = list(message.get("entities", []))

        # Lookup table in JSON format, stored in ../data relative to this file;
        # os.path.join picks the right separator on Windows and POSIX alike.
        cur_path = os.path.dirname(__file__)
        lookup_file_path = os.path.join(cur_path, "..", "data", "lookup_master.json")
        with open(lookup_file_path, "r") as file:
            lookup_data = json.load(file)["data"]

        for token in message.get("tokens"):
            if token.text in STOP_WORDS:
                continue
            # Compare the token with each lookup entry's "value" field; the
            # processor leaves the plain-string query itself unchanged.
            fuzzy_results = fuzzy_process.extract(
                token.text,
                lookup_data,
                processor=lambda a: a["value"] if isinstance(a, dict) else a,
                limit=10,
            )
            for result, confidence in fuzzy_results:
                if confidence >= self.threshold:
                    entities.append({
                        "start": token.offset,
                        "end": token.end,
                        "value": token.text,
                        "fuzzy_value": result["value"],
                        "confidence": confidence,
                        "entity": result["entity"],
                    })

        message.set("entities", entities, add_to_output=True)
I didn't implement it myself; it was implemented and validated in a thread on the Rasa forum.
Then just add it to the NLU pipeline in your config.yml file.
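You can try the matching idea without installing fuzzywuzzy. Below is a rough stand-in built on Python's standard `difflib.SequenceMatcher`. Its similarity score runs from 0 to 1 rather than fuzzywuzzy's 0 to 100, so thresholds don't carry over directly. The lookup table and threshold are hypothetical, and this is a sketch of the matching step only, not a Rasa component.

```python
from difflib import SequenceMatcher

# Hypothetical lookup table: entity name -> known values.
LOOKUP = {"place": ["hotel", "hostel", "restaurant"]}


def fuzzy_entities(text, lookup=LOOKUP, threshold=0.75):
    """Return entity dicts for whitespace tokens that closely match a lookup value."""
    entities = []
    offset = 0
    for token in text.split():
        start = text.index(token, offset)  # character offset of this token
        end = start + len(token)
        offset = end
        best_value, best_entity, best_score = None, None, 0.0
        for entity, values in lookup.items():
            for value in values:
                score = SequenceMatcher(None, token.lower(), value).ratio()
                if score > best_score:
                    best_value, best_entity, best_score = value, entity, score
        if best_score >= threshold:
            entities.append({
                "start": start,
                "end": end,
                "value": token,
                "fuzzy_value": best_value,
                "entity": best_entity,
                "confidence": round(best_score, 2),
            })
    return entities


print(fuzzy_entities("find a good hotol in Mumbai"))
```

Here "hotol" matches "hotel" with a ratio of 0.8. The other words in the sentence score well below the threshold.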
It's a strange request that you've been asked not to change the code or write custom components.
The approach you would have to take would be to use entity synonyms. A slight edit on a previous answer:
##intent: SEARCH_HOTEL
- find good [hotel](place) for me in Mumbai
- looking for a [hotol](place:hotel) in Chennai
- [hetel](place:hotel) in Berlin please
This way, even if the user enters a typo, the correct entity value (hotel) will be extracted. If you want this to be robust, I don't recommend hand-editing the intents; use some kind of automated tool to generate the training data, e.g. one that generates misspelled words (typos).
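Here is a minimal sketch of such a generator. It assumes you only want single-character deletions and swaps of neighbouring letters. Substitutions such as "hotol" would need a third loop. The function names and the sentence template are hypothetical. Each typo is written with the `[typo](place:hotel)` synonym syntax shown above, so it maps back to the canonical value.

```python
def typo_variants(word):
    """Return single-edit typos of `word`: one deleted character or two swapped neighbours."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])  # drop character i
    for i in range(len(word) - 1):
        # swap characters i and i+1
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)  # swapping identical neighbours reproduces the word
    return sorted(variants)


def synonym_examples(template, word, entity):
    """Fill `template` with each typo, annotated as a synonym of `word`."""
    lines = []
    for typo in typo_variants(word):
        annotated = f"[{typo}]({entity}:{word})"
        lines.append("- " + template.format(annotated))
    return lines


for line in synonym_examples("find good {} for me in Mumbai", "hotel", "place"):
    print(line)
```

Each generated line looks like `- find good [hotle](place:hotel) for me in Mumbai`.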
First of all, add samples for the most common typos of your entities, as advised in the other answers.
Beyond this, you need a spellchecker.
I am not sure whether there is a single library that can be plugged into the pipeline; if not, you need to create a custom component. Working with training data alone is not feasible, because you can't create samples for every possible typo.
Using fuzzywuzzy is one option, but it is generally slow and doesn't solve every case.
The Universal Sentence Encoder is another option.
There are probably more options for spell correction, but either way you will need to write some code.
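As an example of the "write some code" route, here is a tiny spell corrector based on Levenshtein edit distance. This is a swapped-in technique, not one of the tools named above. It replaces each word with the nearest word in a hypothetical domain vocabulary when that word is at most one edit away. It could run on the user's text before the text reaches the NLU model. It will also "correct" legitimate words that happen to be close to a vocabulary word; for example, "hotels" becomes "hotel".

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical domain vocabulary.
VOCAB = ["hotel", "hostel", "restaurant"]


def correct(text, vocabulary=VOCAB, max_distance=1):
    """Snap each word to the closest vocabulary word within max_distance edits."""
    corrected = []
    for word in text.split():
        lower = word.lower()
        if lower in vocabulary:
            corrected.append(word)
            continue
        best = min(vocabulary, key=lambda v: edit_distance(lower, v))
        corrected.append(best if edit_distance(lower, best) <= max_distance else word)
    return " ".join(corrected)


print(correct("find good hotol in Mumbai"))
```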

How to perform Sorting using flow steps in WebMethods (Software AG)?

I am new to this middleware. I tried my level best to perform sorting using flow steps in Designer but couldn't make it work. Can anybody point me in the right direction (e.g. which flow steps to use in which order, and where to put the conditions)?
Thanks.
No need to over-complicate it: use the built-in utilities. pub.document:sortDocuments is what you are looking for.
If you receive a stringList as input, convert it into a documentList first. This can be done with pub.list:stringListToDocumentList (set the key to 'value').
Then use pub.document:sortDocuments to sort the documentList. Remember to specify the key as 'value' once again and to set compareStringAs to 'numeric'. The order (ascending/descending) can also be set.
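The sketch below is a Python analogue of those two services, showing why compareStringAs matters. The function names only mirror the webMethods services for readability. They are hypothetical and are not the webMethods API. When the same values are compared as strings, "10" sorts before "2".

```python
def string_list_to_document_list(strings, key="value"):
    """Wrap each string in a one-field 'document' (a dict), like setting the key to 'value'."""
    return [{key: s} for s in strings]


def sort_documents(documents, key="value", numeric=True, descending=False):
    """Sort documents by one field, comparing as numbers or as plain strings."""
    if numeric:
        sort_key = lambda d: float(d[key])
    else:
        sort_key = lambda d: d[key]
    return sorted(documents, key=sort_key, reverse=descending)


docs = string_list_to_document_list(["10", "9", "100", "2"])
print([d["value"] for d in sort_documents(docs)])                 # ['2', '9', '10', '100']
print([d["value"] for d in sort_documents(docs, numeric=False)])  # ['10', '100', '2', '9']
```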
What do you want to sort? For document lists you will find built-in services in the WmPublic package.
For string lists I would use a Java service for sorting.
The logic behind sorting in webMethods is the same as in any other language. You need a LOOP to iterate over every string in the stringList, a BRANCH to compare two numbers, and then a MAP to write the comparison result into a new stringList.
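The loop-and-compare approach described above amounts to an insertion sort. Here is a Python sketch. The comments mark where the LOOP, BRANCH and MAP steps would sit in a flow service; that mapping to flow steps is an interpretation, not a documented recipe.

```python
def loop_and_branch_sort(values):
    """Insertion sort of numeric strings into a new list, ascending."""
    result = []
    for item in values:                   # LOOP over the input stringList
        number = float(item)
        position = len(result)            # default: append at the end
        for i, existing in enumerate(result):
            if number < float(existing):  # BRANCH on the comparison
                position = i
                break
        result.insert(position, item)     # MAP the item into the new stringList
    return result


print(loop_and_branch_sort(["10", "9", "100", "2"]))  # ['2', '9', '10', '100']
```

This is O(n²), which is fine for small lists. For large lists the built-in sortDocuments service is the better choice.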
What format are the numbers in? Are they in a flat file, in a string list, etc.?

How to group observations in Stata

I'm a beginner with Stata, so this question might be easy for some of you.
I have a dataset with firm-specific data. One variable, Branche, contains the following lines of business: Consumer, Utilities, Food/Beverage, Technology, Logistics/Transportation, Retail, Insurance, etc.
Now I want to form groups. For example, the group Consumer should contain Retail, Food/Beverages and Consumer, but the command generate Consumer = Consumer Retail Food/Beverages doesn't work. Does anyone know what the right command would be? Thanks!
You can use user-written string recode command strrec:
ssc install strrec
strrec Branche ("Consumer" "Retail" "Food/Beverage" = 1 "Consumer"), gen(trunk)
You will need to add additional categories as you see fit. This creates a new variable, trunk, that has labeled integer(s).
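For comparison, here is the same recode idea in plain Python (this is not Stata). A table maps each Branche string to an integer code, and a separate dictionary holds the value labels. Only the Consumer group from the question is defined. The rule-table format is hypothetical. In this sketch, values not covered by any rule come out as None (missing).

```python
def recode(values, rules):
    """Map strings to integer codes; return (codes, labels). Unmatched -> None."""
    lookup = {}
    labels = {}
    for code, (label, members) in rules.items():
        labels[code] = label
        for member in members:
            lookup[member] = code
    return [lookup.get(v) for v in values], labels


# code -> (value label, Branche values that belong to the group)
RULES = {1: ("Consumer", ["Consumer", "Retail", "Food/Beverage"])}

branche = ["Retail", "Utilities", "Food/Beverage", "Technology", "Consumer"]
trunk, labels = recode(branche, RULES)
print(trunk)   # [1, None, 1, None, 1]
print(labels)  # {1: 'Consumer'}
```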
You can refer to particular trunks like this:
list if trunk == 1
list if trunk == "Consumer":trunk
The reason I used an integer with value labels rather than a string is that some of the panel data commands do not like string IDs. I am guessing you are headed that route.

Report Builder Expressions

I'm new to Report Builder and am having issues with some expressions I'm trying to implement in a report. I got the standard ones to work; however, as soon as I try any distinctions, I get error messages. Over the last couple of weeks, I've tried many combinations, read the expression help, searched Google, and looked at other questions on the internet. To reduce my frustration, I would even jump to other expressions and walk away, hoping to gain a different insight when I came back.
It's probably something simple, or something I don't know about writing expressions.
I'm hoping someone can help with these expressions. They are the versions that give me the fewest errors (usually just "expression expected") and show what I'm trying to accomplish.
=IIF((Fields!RECORDFLAG.Value)='D',COUNTDISTINCT(Fields!TICKETNUM.Value),0)
=IIF((Fields!TRANSTYPE.Value)='1' and (Fields!RECORDFLAG.VALUE)='A' or 'B',SUM(Fields!DOLLARS.Value),0)
=IIF((Fields!TRANSTYPE.Value)='1' and (Fields!RECORDFLAG.VALUE)='P',SUM(Fields!DOLLARS.Value),0)
=Sum([DOLLARS] case when [RECORDFLAG]='P' then -1*[DOLLARS])
Thank You.
=IIF((Fields!RECORDFLAG.Value)="D",COUNTDISTINCT(Fields!TICKETNUM.Value))
The error message gives you the answer here: no false part of the iif() has been specified. Use =IIF((Fields!RECORDFLAG.Value)="D",COUNTDISTINCT(Fields!TICKETNUM.Value), 0)
=IIF((Fields!TRANSTYPE.Value)="1" and (Fields!RECORDFLAG.VALUE)="A" or "B",SUM(Fields!DOLLARS.Value),0)
This is not how Or works in SSRS: each side of the Or must be a complete comparison. Use:
=IIF((Fields!TRANSTYPE.Value)="1" and (Fields!RECORDFLAG.VALUE="A" or Fields!RECORDFLAG.Value = "B"),SUM(Fields!DOLLARS.Value),0)
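The same precedence trap exists in Python, which makes it easy to demonstrate outside SSRS. This is only an analogy. In Python, `and` binds tighter than `or`, so the original condition reads as `(type == "1" and flag == "A") or "B"`, and the non-empty string "B" always counts as true. SSRS may surface the mistake as an error rather than silently passing, but the fix is the same: repeat the field in each comparison and group them with parentheses.

```python
def original_condition(trans_type, flag):
    # Parsed as (trans_type == "1" and flag == "A") or "B";
    # the non-empty string "B" is truthy, so this is always True.
    return bool(trans_type == "1" and flag == "A" or "B")


def fixed_condition(trans_type, flag):
    # Each side of the `or` is a complete comparison, grouped with parentheses.
    return trans_type == "1" and (flag == "A" or flag == "B")


print(original_condition("2", "Z"))  # True: wrong
print(fixed_condition("2", "Z"))     # False
print(fixed_condition("1", "B"))     # True
```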
The 0s are returned due to your report design. countdistinct() is an aggregate function - it's meant to be used on a set of data. However, your iif() is only testing on a per row basis - you're basically saying "if the current row is thing, count all the distinct values" which doesn't make sense. There are a couple of ways forward:
You can count the number of times a certain value occurs in a given condition using a sum(). This is not the same as the countdistinct(), but if you use =sum(iif(Fields!RECORDFLAG.Value = "D", 1, 0)) then you will get the number of times RECORDFLAG is D in that set. Note: this requires the data to be aggregated (so in SSRS, grouped in a tablix).
You can use custom code to count distinct values in a set. See https://itsalocke.com/aggregate-on-a-lookup-in-ssrs/. You can apply this even if you have only one dataset - just reference the same one twice.
You can change the way your report works. You can group on Fields!RECORDFLAG.Value and filter the group to where Fields!RECORDFLAG.Value = "D". Then in your textbox, use =countdistinct(Fields!TICKETNUM.Value) to get the distinct values for TICKETNUM when RECORDFLAG is D.
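The example below uses made-up rows and is Python, not SSRS code. It shows why the first option above is not a distinct count. The sum counts every row whose flag is D. The distinct count over the D rows counts each ticket number only once.

```python
rows = [
    {"TICKETNUM": "T1", "RECORDFLAG": "D"},
    {"TICKETNUM": "T1", "RECORDFLAG": "D"},
    {"TICKETNUM": "T2", "RECORDFLAG": "D"},
    {"TICKETNUM": "T3", "RECORDFLAG": "A"},
]

# Like =Sum(IIf(Fields!RECORDFLAG.Value = "D", 1, 0)): counts rows, duplicates included.
d_rows = sum(1 if r["RECORDFLAG"] == "D" else 0 for r in rows)

# Like a group filtered to RECORDFLAG = "D" with =CountDistinct(Fields!TICKETNUM.Value).
d_tickets = len({r["TICKETNUM"] for r in rows if r["RECORDFLAG"] == "D"})

print(d_rows, d_tickets)  # 3 2
```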

SPMETAL / LINQ to SharePoint Decimal Types

I've hit a pretty major snag with the entities generated by SPMetal / LINQ to SharePoint. I'm hoping someone has dealt with this before, or maybe I am missing something obvious.
Let's say we have a list with a number field. The field is expected to hold reasonably precise values, for example 0.0000451. Once the value is in the list, SharePoint is fine with it: it displays correctly in the list and in the display/edit views.
Now if we generate entities based on this list with SPMetal, we get:
//...
private System.Nullable<double> _number;
//...
[Microsoft.SharePoint.Linq.ColumnAttribute(Name="Number", Storage="_number", Required=true, FieldType="Number")]
public System.Nullable<double> Number {
    get {
        return this._number;
    }
    set {
        if ((value != this._number))
        {
            this.OnPropertyChanging("Number", this._number);
            this._number = value;
            this.OnPropertyChanged("Number");
        }
    }
}
//...
Since the type determined by SPMetal is double, we get scientific notation when retrieving the value, for example:
var number = (from x in myDc.MyList select x.Number).First();
number would actually result in a double of 4.51E-05, not 0.0000451.
I am assuming this can be fixed by using a decimal. If I change the types throughout the generated entities to System.Nullable<decimal> I get type conversion failures.
How should I fix this?
EDIT: Perhaps the better question is "how should I deal with this?" For example, I could simply convert my double values to decimal later on, e.g. in my LINQ query. If I do that, the example case returns the expected result. That seems clunky, though, and I'd like to correct this at the source.
There are several cases like this where SPMetal will give you clunky code. You can, and sometimes have to, fix that. And I admit, it definitely feels better to do it at the source.
But there is a downside.
When your data model changes, you will have to re-run SPMetal to incorporate your new entities. Any changes you made to the generated file will have to be carefully documented and re-applied, or your code will break. I would therefore advise leaving the generated code alone if you can work with it.
If you can write a wrapper around the objects/methods, that is of course preferable to converting the types at every end point, but that's general good programming practice.
4.51E-05 actually equals 0.0000451, so there is nothing wrong with your code; the value is only displayed in a different notation.
In other words, 4.51E-05 means 4.51 times ten to the power of minus five, i.e. 0.0000451.
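The quick Python check below (an analogue, not SharePoint code) shows that only the display differs. The same double prints in exponent form by default, and in plain decimal form when you ask for a fixed number of places. In C#, a fixed-point format string such as `ToString("F7")` should play the same role. To get a decimal, going through the shortest string form with `Decimal(repr(...))` avoids carrying the binary rounding error of the double into the decimal.

```python
from decimal import Decimal

number = 0.0000451

print(number == 4.51e-05)     # True: one value, two notations
print(repr(number))           # 4.51e-05
print(f"{number:.7f}")        # 0.0000451
print(Decimal(repr(number)))  # 0.0000451
```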
