How to fix different intent getting identified when input contains special characters - azure-language-understanding

In my LUIS application I have a 'Greeting' intent. The intent identified for 'hi' is 'Greeting' but for 'hi.......' some other intent is identified.
After training 'hi.......' as 'Greeting', it is identified as 'Greeting' correctly. There are other variants with special characters that would also need to be trained to make this work.
How can I get these variants identified as Greeting without training on every special-character variation?
This is being used in Microsoft Bot Framework v3 in C#.

You can either train your LUIS model with all possible variations that include special characters or you can strip out all of the special characters before you send it to LUIS. I would recommend the latter. Here is an example of how you would do that in Node.
turnContext.activity.text = turnContext.activity.text.replace(/[^a-zA-Z ]/g, "");
Hope this helps!
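A slightly fuller sketch of the stripping approach, assuming a Node bot where the text is cleaned just before it is sent to LUIS. The helper name `normalizeUtterance` is hypothetical, and keeping digits and collapsing repeated spaces are assumptions about what the model needs, not LUIS requirements.

```javascript
// Hypothetical helper: strips special characters before the text goes to LUIS.
function normalizeUtterance(text) {
    return text
        .replace(/[^a-zA-Z0-9 ]/g, '') // drop everything except ASCII letters, digits and spaces
        .replace(/\s+/g, ' ')          // collapse runs of spaces left behind by the removal
        .trim();
}

// Example use inside a turn handler (assumed shape):
// turnContext.activity.text = normalizeUtterance(turnContext.activity.text);
```

Note that the character class is ASCII-only, so accented letters would be removed as well; widen it if the bot handles other languages.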

Related

Rasa RegexFeaturizer is it based on token or whole sentence?

- regex: regex features for intent classification
  examples: |
    - \bon road pric/i
    - \bonroad pric/i
I have tested the regexes above and they work fine, so I am sure the expressions themselves are not the problem.
Example:
training-row-1] Please tell me on road price now.
training-row-2] Please tell me price now.
Based on the regex patterns above, the regex features that should be added are:
training-row-1] Please tell me on road price now. ==> TRUE (the regex matches)
training-row-2] Please tell me price now. ==> FALSE (the regex doesn't match)
My question is: in RegexFeaturizer, does the regex match happen on the whole sentence or on each token?
It makes sense to match on the whole sentence.
Is the featurization I have assumed above correct or not?
I've found the following docstring in the code for the RegexFeaturizer.
"""
Given a sentence, returns a vector of {1,0} values indicating which
regexes did match. Furthermore, if the message is tokenized, the
function will mark all tokens with a dict relating the name of the
regex to whether it was matched.
"""
So I think it's taking the entire sentence as input. It's hard to see inside of the feature space in Rasa but I've confirmed that the correct entity is picked up across tokens when using the RegexEntityExtractor. This is easily verified by temporarily adding entity examples in your NLU data (make sure it appears at least twice in intents) and running rasa interactive.
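To make the distinction concrete, here is an illustrative sketch in JavaScript. It is not Rasa's implementation; it only shows why a multi-word pattern can match the whole sentence while never matching any single token. The pattern names and both helper functions are hypothetical.

```javascript
// Illustrative only: this is not Rasa's code.
const patterns = [
    { name: 'on_road_price', regex: /\bon road pric/i },
    { name: 'onroad_price', regex: /\bonroad pric/i },
];

// Sentence-level features: one 1/0 entry per pattern, tested against the full text.
function sentenceFeatures(text) {
    return patterns.map(p => (p.regex.test(text) ? 1 : 0));
}

// For comparison: test each whitespace-separated token on its own.
function anyTokenMatches(text) {
    return text.split(/\s+/).some(tok => patterns.some(p => p.regex.test(tok)));
}
```

With "Please tell me on road price now.", `sentenceFeatures` flags the first pattern, while `anyTokenMatches` finds nothing because no single token contains "on road pric".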

Pre-process user utterances in bot before forwarding them to LUIS

I am building a German-language bot that should understand Swiss number formats:
English format for 1Mio: 1,000,000
German format for 1Mio: 1.000.000
Swiss format for 1Mio: 1'000'000
Unfortunately, LUIS has no Swiss culture and will therefore not correctly understand 1'000'000 with the builtin number entity. So my idea is to pre-process the user utterances before forwarding them to LUIS as follows: if I see a Swiss thousand separator (i.e. ') with at least one digit on the left and 3 digits on the right, then remove the separator from the utterance before forwarding it to LUIS. LUIS will then recognize the number correctly because the thousand separators are gone.
Does anyone have an idea how to do this in the bot, or better, in the middleware? I am new to BotFramework and pretty much lost.
Thanks!
Yes, you can modify the activity before you pass it to LUIS. You just need to come up with the appropriate regex to find and replace the '. For example, here's a bot where I'm updating this as part of the onTurn function, updated with a regex replace that I think will work for you (in nodejs):
async onTurn(context) {
    if (context.activity.type === ActivityTypes.Message) {
        // Remove ' when it sits between a digit and a group of three digits
        context.activity.text = context.activity.text.replace(/(?<=\d{1})'(?=\d{3})/g, '');
        const dc = await this.dialogs.createContext(context);
        const results = await this.luisRecognizer.recognize(context);
        // ... rest of the turn handling
    }
}
The regex here is looking for the ' character preceded by one digit (it's ok if there is more than one, as in the middle of the number) and followed by 3 digits. You'd probably be ok with just /'(?=\d{3})/g, which is a ' followed by three digits.
Same applies if you are using C# or a different turn handler, you just need to modify the activity.text before you pass it to LUIS.
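Since the question asks about middleware, the same replacement can be sketched as a middleware-shaped object with an `onTurn(context, next)` method. This assumes a Bot Framework v4 Node bot; the names `stripSwissSeparators` and `swissNumberMiddleware` are hypothetical, and also accepting the typographic apostrophe (’) is an assumption about what users might type.

```javascript
// Remove Swiss thousand separators, e.g. 1'000'000 -> 1000000.
function stripSwissSeparators(text) {
    // An apostrophe (straight or typographic) with a digit before it and three digits after it.
    return text.replace(/(?<=\d)['’](?=\d{3})/g, '');
}

// Middleware-shaped object: cleans message text, then hands over to the next handler.
const swissNumberMiddleware = {
    async onTurn(context, next) {
        const activity = context.activity;
        if (activity.type === 'message' && typeof activity.text === 'string') {
            activity.text = stripSwissSeparators(activity.text);
        }
        await next();
    },
};

// Registration (assumes an existing v4 adapter instance):
// adapter.use(swissNumberMiddleware);
```

Doing this in middleware keeps the cleanup out of the dialog code, so every recognizer call sees the already cleaned text.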

Not able to add new intent in LUIS with ":"

All the intents in my application have names containing ":". But now, when I try to add a new intent, I get the error "BadArgument: Intent and entity name cannot contain the character ":" or "$"".
Welcome to Stack Overflow. Unfortunately, these special characters should not be used in intent names because they are reserved for other uses. An example of how the colon is used is in entity roles in patterns. My recommendation is to rename your intents. I also recommend storing the intent names in your application as constant string resources so that the values can be easily changed.
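A minimal sketch of the constants idea, written in JavaScript for illustration. The intent names, the `Intents` object and the `toValidIntentName` helper are all hypothetical, and using an underscore as the replacement is an assumption about what naming scheme suits your app.

```javascript
// Hypothetical intent names, kept in one place so a rename touches a single file.
const Intents = Object.freeze({
    ORDER_CREATE: 'Order_Create',
    ORDER_CANCEL: 'Order_Cancel',
});

// Maps a legacy name such as "Order:Create" to one without ":" or "$".
function toValidIntentName(legacyName, separator = '_') {
    return legacyName.replace(/[:$]/g, separator);
}

// Application code compares against the constant, never a string literal:
// if (topIntent === Intents.ORDER_CREATE) { ... }
```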

How do I create a multiline bot response in Rasa Core?

Can anyone help me get bot responses on multiple lines?
Also, how do I get bullets in the bot responses? I tried >, *, and the Enter key, but nothing seems to work. Do Rasa response templates support HTML tags?
The visualization of the message depends on the output channel which you are using.
Hence, it should be possible to provide HTML tags in your bot's answers as long as your output channel can render them correctly. For a simple newline, please try adding a "\n" in your messages, e.g.:
utter_message:
  - text: "First line\nSecond line\nThird line"
You can also have a multiline string in your yaml file which then results in a string containing newlines (see here for examples). Use the literal block style (|) for this; the folded style (>) would join the lines with spaces. The block below is the same as the example above:
utter_message:
  - text: |
      First line
      Second line
      Third line
To include bullets, you could simply add the unicode bullet character, e.g.:
utter_message:
  - text: |
      • First line
      • Second line
      • Third line
I don't think newlines are the same thing as "multiple bot responses", which I take to mean multiple message boxes on an instant messaging/chat channel (this is how Telegram shows them, for example). So I fear @Tobias's solution isn't definitive.
A solution to have separate box messages could be to split the original single utterance into a sequence of utterances, to be inserted afterward in a "story" as described in this RASA forum reply: https://forum.rasa.com/t/split-utterances-templates-into-multiple-answers/1204/2?u=solyarisoftware
That's more of a workaround, though that's debatable from a conversational design perspective. I may want separate boxes not just to pretty-print text with newlines, but to convey different semantics.
For example, if the user says:
Hello
The bot could reply by returning the greeting and also asking a new question/prompt to keep the dialog going.
That could deserve a new box, giving a sequence of 2 boxes.
So the bot's reply would be better as:
Hello!
How are you?

LUIS issues with special characters

(TEXT) is converted to ( TEXT ) in LUIS when we identify an entity name.
Issues with special characters.
Refer to the image below:
Here "monthly iq dashboard (hospitalists)" is converted to reportname --> "monthly iq dashboard ( hospitalists )" in Entities. So when we use this entity in Bot Framework, we have trouble comparing it to the actual report name stored in metadata (database).
(TEXT) is converted to ( TEXT ) in LUIS when we identify an entity name. Issues with special characters.
The issue you reported seems to be that whitespace is added when some special characters are used. I reproduced the issue on my side, and I found that similar issues have been reported by others:
LUIS inserts whitespace in utterances when punctuation present causing entity getting incorrectly parsed
LUIS cannot take care of special characters
when we use this entity in bot framework we are facing issues while comparing to actual report name stored in Metadata (database)
To solve it, as Nicolas R and NiteLordz mentioned in comments, you can try to handle that in your code. And to remove whitespace from ( hospitalists ), the following regex would be helpful.
Regex regex = new Regex(@"\(\s\w*\s\)");
input = regex.Replace(input, m => m.Value.Replace(" ", ""));
Note: I can reproduce the issue, and the same issue appears when processing something like a URL that contains / and . characters.
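For a Node bot, the same cleanup could be sketched as below. The helper name `collapsePunctuationSpacing` is hypothetical, and treating / and . the same way as parentheses is an assumption based on the URL case mentioned above; apply it only to extracted entity values, not to whole sentences.

```javascript
// Hypothetical helper: undo the spaces inserted around punctuation in an entity value.
function collapsePunctuationSpacing(entityValue) {
    return entityValue
        .replace(/\(\s+/g, '(')             // "( text" -> "(text"
        .replace(/\s+\)/g, ')')             // "text )" -> "text)"
        .replace(/\s+([\/.])\s+/g, '$1');   // "a . b / c" -> "a.b/c"
}
```

The cleaned value can then be compared with the report name stored in the database.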

Resources