How do I create a multiline bot response in Rasa Core? - rasa-nlu

Can anyone help me get bot responses onto multiple lines?
Also, how do I get bullets in the bot responses? I tried >, *, the Enter key and also … Nothing seems to work. Do Rasa response templates support HTML tags?

The visualization of the message depends on the output channel which you are using.
Hence, it should be possible to put HTML tags in your bot's answers, as long as your output channel can render them correctly. For a simple newline, try adding a "\n" in your messages, e.g.:
utter_message:
- text: "First line\nSecond line\nThird line"
You can also use a multiline string in your YAML file, which results in a string containing newlines (see here for examples). Use the literal block style "|" for this; the folded style ">" joins the lines with spaces instead. The block below produces the same lines as the example above (the literal style just adds a trailing newline):
utter_message:
- text: |
    First line
    Second line
    Third line
To include bullets, you can simply add the Unicode bullet character (•) to each line, e.g.:
utter_message:
- text: |
    • First line
    • Second line
    • Third line
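If the response text is assembled in Python code rather than written directly in the YAML file, the same two ideas apply: join the lines with "\n" and prefix each one with the Unicode bullet U+2022. A minimal sketch; the helper name bulleted is hypothetical and not part of Rasa:

```python
BULLET = "\u2022"  # Unicode bullet character, rendered as •


def bulleted(items: list[str]) -> str:
    """Join items into one string, one bulleted line per item."""
    return "\n".join(f"{BULLET} {item}" for item in items)


print(bulleted(["First line", "Second line", "Third line"]))
```

Whether the newlines actually show up as separate lines still depends on the output channel, as explained above.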

I think newlines don't correspond to "multiple bot responses", which I interpret as multiple message boxes on an instant-messaging/chat channel (that's how it works in Telegram, for example). So I fear @Tobias's solution isn't definitive.
A way to get separate message boxes could be to split the original single utterance into a sequence of utterances and then insert them into a "story", as described in this Rasa forum reply: https://forum.rasa.com/t/split-utterances-templates-into-multiple-answers/1204/2?u=solyarisoftware
That's more of a workaround, and it's debatable from a conversational-design perspective: I may want separate boxes not just to pretty-print text with newlines, but to communicate different semantics.
For example, if the user says:
Hello
The bot could reply by answering the greeting and also introducing a new question/prompt to keep the dialog going.
That could deserve a new box, giving a sequence of two boxes.
So the bot's reply could better be:
Hello!
How are you?
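If the reply is produced as one text with blank lines between its parts, a small helper can split it into separate messages, one per chat box, before sending. This is a sketch under the assumption that your own code sends each returned string as its own message; split_into_messages is a hypothetical name, and the real send call depends on your channel and Rasa version:

```python
import re


def split_into_messages(text: str) -> list[str]:
    """Split text on blank lines; each chunk is meant to be sent as its own message."""
    chunks = re.split(r"\n\s*\n", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


for message in split_into_messages("Hello!\n\nHow are you?"):
    print(message)  # stand-in for the channel's real "send message" call
```

Single newlines stay inside one message, so a bulleted list remains a single box while the greeting and the follow-up question become two.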

Related

Pre-process user utterances in bot before forwarding them to LUIS

I'm building a bot in German that should understand Swiss number formats:
English format for 1 million: 1,000,000
German format for 1 million: 1.000.000
Swiss format for 1 million: 1'000'000
Unfortunately, LUIS has no Swiss culture and therefore won't correctly understand 1'000'000 with the built-in number entity. So my idea is to pre-process the user utterances before forwarding them to LUIS as follows: if I see a Swiss thousands separator (i.e. ') with at least one digit on the left and three digits on the right, remove it from the utterance before forwarding the utterance to LUIS. LUIS should then recognize the number correctly, because it no longer contains thousands separators.
Does anyone have an idea how to do this in the bot, or better, in middleware? I'm new to Bot Framework and pretty much lost.
Thanks!
Yes, you can modify the activity before you pass it to LUIS. You just need to come up with the appropriate regex to find and remove the '. For example, here's a bot where I do this in the onTurn function, with a regex replace that I think will work for you (in Node.js):
async onTurn(context) {
    if (context.activity.type === ActivityTypes.Message) {
        // Remove a Swiss thousands separator (') that sits between digits
        context.activity.text = context.activity.text.replace(/(?<=\d{1})'(?=\d{3})/g, '');
        const dc = await this.dialogs.createContext(context);
        const results = await this.luisRecognizer.recognize(context);
        // ... continue handling results with dc as before ...
    }
}
The regex looks for a ' character preceded by one digit (it's fine if there are more digits before it, as in the middle of a number) and followed by three digits. You'd probably be fine with just /'(?=\d{3})/g, which matches a ' followed by three digits.
The same applies if you're using C# or a different turn handler: just modify activity.text before you pass it to LUIS.
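The same preprocessing can be sketched in Python, for example to unit-test the pattern before wiring it into a bot. The lookbehind/lookahead pair mirrors the JavaScript regex above; the extra (?!\d) is an assumption of this sketch, added so that a group must have exactly three digits (so "1'0000" is left alone). strip_swiss_separators is a hypothetical helper name:

```python
import re

# An apostrophe with a digit on its left and exactly three digits on its right.
# (?<=\d) and (?=\d{3}) mirror the JavaScript answer; (?!\d) is an added assumption.
SWISS_THOUSANDS_SEP = re.compile(r"(?<=\d)'(?=\d{3}(?!\d))")


def strip_swiss_separators(text: str) -> str:
    """Remove Swiss thousands separators so the recognizer sees plain digits."""
    return SWISS_THOUSANDS_SEP.sub("", text)


print(strip_swiss_separators("Transfer 1'000'000 CHF"))  # Transfer 1000000 CHF
```

Unlike the shorter /'(?=\d{3})/g, the lookbehind keeps an apostrophe that opens a quotation such as '123' intact.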

Microsoft Botframework: How to use Telegram Parameters or sending NewLine?

I'm trying the new Botframework from Microsoft. When sending a message with \n there is no linebreak in the message. How can I solve that?
In the Telegram API there is a parameter called parse_mode (https://core.telegram.org/bots/api#formatting-options) to activate HTML. Then I could use "<br />" for that, but I don't know how to set this parameter. Can someone help me send a line break or set Telegram parameters?
Greetings
Max
BotFramework uses Markdown. To represent a paragraph break you need to have a blank line (i.e. "\n\n")
Markdown like this:
This is
paragraph one

This is
paragraph two

will be rendered as:
This is paragraph one

This is paragraph two
See documentation at: http://docs.botframework.com/connector/message-content/#markdown-paragraphs
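If your bot's text comes with single newlines (for example, lines read from a file or template), a minimal Python sketch like the following puts a blank line between them so that a Markdown-rendering channel keeps each line as its own paragraph. It assumes the channel renders Markdown as described above; to_markdown_paragraphs is a hypothetical helper:

```python
def to_markdown_paragraphs(text: str) -> str:
    """Put a blank line between non-empty lines so Markdown keeps them apart."""
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n\n".join(lines)


print(to_markdown_paragraphs("This is paragraph one\nThis is paragraph two"))
```

Existing blank lines are dropped before joining, so text that already uses "\n\n" is not padded further.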

Yahoo pipes: how can I add an additional nodes/elements to RSS/feed items

I am merging two feeds using Yahoo Pipes and using the output feed on a website. However, I would like to identify the "feed source" for each item in the output feed. Is it possible to manipulate the original feeds so I can add another node/element to the feed items?
Thanks
One way to do that is using the Regex operator. Let's say you want to add a new field called source. You could use Regex with parameters:
In: item.source
replace: .*
with: (the text you want)
See it in action here:
http://pipes.yahoo.com/janos/7a3b9993cfc143d414fe7b637b1bd95a
That is, I have two feeds: I added a source attribute with the value "Question 1" to the first and with the value "Question 2" to the second.
As an added bonus, here's an interesting undocumented Yahoo Pipes hack: I used one more Regex after the Union to make the source appear in the title.
However, this only adds the attribute to the node in the pipe debugger. You can use it for further processing (as I did here by adding it to the title), but it won't create a <source> tag in the output. That's because the RSS output of Yahoo Pipes removes all fields that are not part of the RSS standard. You can still see it in the JSON output, though.
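Outside of Yahoo Pipes, the same data flow (tag each feed's items with a source, union them, then copy the source into the title) can be sketched in Python on items represented as dicts, roughly like the JSON output mentioned above. This is not the Pipes Regex operator itself; the function name and the "[source] title" format are illustrative assumptions:

```python
def tag_and_merge(feeds: dict[str, list[dict]]) -> list[dict]:
    """Union several feeds, recording where each item came from."""
    merged = []
    for source, items in feeds.items():
        for item in items:
            tagged = dict(item, source=source)  # like setting item.source per feed
            tagged["title"] = f"[{source}] {tagged.get('title', '')}"  # like the Regex after the Union
            merged.append(tagged)
    return merged


feeds = {
    "Question 1": [{"title": "First item"}],
    "Question 2": [{"title": "Second item"}],
}
for item in tag_and_merge(feeds):
    print(item["title"], "->", item["source"])
```

The input items are copied rather than modified, so the original feeds stay untouched.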

Strip signatures and replies from emails

I'm currently working on a system that allows users to reply to notification emails that are sent out (sigh).
I need to strip out the replies and signatures, so that I'm left with the actual content of the reply, without all the noise.
Does anyone have any suggestions about the best way to do this?
If your system is in-house and/or you have a limited number of reply formats, it's possible to do a pretty good job. Here are the filters we have set up for email responses to trac tickets:
Drop all text after and including:
1. Lines that equal '-- \n' (standard email sig delimiter)
2. Lines that equal '--\n' (people often forget the space in the sig delimiter, and this is not that common outside sigs)
3. Lines that begin with '-----Original Message-----' (MS Outlook default)
4. Lines that begin with '________________________________' (32 underscores, Outlook again)
5. Lines that begin with 'On ' and end with ' wrote:\n' (OS X Mail.app default)
6. Lines that begin with 'From: ' (failsafe for Outlook and some other reply formats)
7. Lines that begin with 'Sent from my iPhone'
8. Lines that begin with 'Sent from my BlackBerry'
Numbers 3 and 4 are 'begin with' instead of 'equals' because users sometimes squash lines together by accident.
We try to be liberal about stripping out replies, since having reply garbage is much more of an annoyance (to us) than correcting missing text.
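The filter list above translates fairly directly into code. A minimal Python sketch, assuming the message body is already plain text; the patterns follow the eight rules in order, and strip_reply is a hypothetical name:

```python
import re

# One pattern per rule; each is matched against a single line without its newline.
CUTOFF_PATTERNS = [
    re.compile(r"^-- $"),                        # 1. standard sig delimiter
    re.compile(r"^--$"),                         # 2. delimiter missing its space
    re.compile(r"^-----Original Message-----"),  # 3. Outlook
    re.compile(r"^_{32}"),                       # 4. 32 underscores, Outlook
    re.compile(r"^On .* wrote:$"),               # 5. Mail.app
    re.compile(r"^From: "),                      # 6. failsafe for other formats
    re.compile(r"^Sent from my iPhone"),         # 7.
    re.compile(r"^Sent from my BlackBerry"),     # 8.
]


def strip_reply(body: str) -> str:
    """Return the text above the first line that matches any cutoff pattern."""
    kept = []
    for line in body.splitlines():
        if any(pattern.match(line) for pattern in CUTOFF_PATTERNS):
            break  # drop this line and everything after it
        kept.append(line)
    return "\n".join(kept).rstrip()
```

Because everything after the first match is dropped, a false positive (say, a body line that happens to start with "From: ") loses text; that trade-off matches the preference above for stripping too much rather than too little.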
Anybody have other formats from the wild that they want to share?
Check out the email_reply_parser gem - https://github.com/github/email_reply_parser . It does a nice job handling this problem.
I don't believe you can do this reliably (signatures used to begin with '--', but I don't see that anymore). Perhaps you're better off asking people to reply between two marker lines and then simply extracting the reply from between them? It's not elegant, but perhaps more reliable.
e.g.
REPLY BETWEEN HERE -->
AND HERE -->
so you'd simply look for the required markers above and take what's in between.
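A minimal sketch of the marker approach in Python. The marker strings are the ones from the example above; matching them as substrings rather than whole lines is an assumption that tolerates the "> " quote prefix many clients add. extract_between_markers is a hypothetical name:

```python
from typing import Optional

START_MARKER = "REPLY BETWEEN HERE -->"
END_MARKER = "AND HERE -->"


def extract_between_markers(body: str, start: str = START_MARKER,
                            end: str = END_MARKER) -> Optional[str]:
    """Return the text between the marker lines, or None if they are missing."""
    collected = []
    inside = False
    for line in body.splitlines():
        if not inside:
            inside = start in line  # substring match tolerates "> " quote prefixes
            continue
        if end in line:
            return "\n".join(collected).strip()
        collected.append(line)
    return None  # start marker missing, or end marker never found
```

Returning None instead of an empty string lets the caller tell "the user ignored the markers" apart from "the user sent an empty reply".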
If you want something powerful & robust, and don't mind reading academic publications, you might check out this:
Learning to Extract Signature and Reply Lines from Email
Here's the homepage for one of the authors, with more info & some downloads:
Vitor R. Carvalho - Software and Datasets - (Vitor Carvalho)
An approach that can be used for signatures only (in addition to detecting __ or --) is to test whether the sender's first name and/or family name appears on a short line (roughly 3 to 4 words at most).
The sender's name is in the raw email header, most of the time next to the email address, as in:
From: John Doe <jdoe@provider.com>
This is based on the assumption that you rarely write your own name in an email, and if you do, it's probably in a long sentence.
Of course there will be some false positives, but that may not be a big problem depending on what you do (we use it to fold quoted text and signatures into a "..." Gmail-style button, so over-detection doesn't lose any content; it is just misplaced).
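A sketch of the sender-name heuristic in Python. It uses the standard library's email.utils.parseaddr to pull the display name out of the From header value; the 4-word limit comes from the "3 to 4 words" rule of thumb above, and split_signature is a hypothetical helper:

```python
import re
from email.utils import parseaddr


def split_signature(body: str, from_header: str, max_words: int = 4) -> tuple[str, str]:
    """Split body into (content, signature) using the sender-name heuristic."""
    display_name, _address = parseaddr(from_header)
    # Name parts longer than one character, e.g. {"john", "doe"}; initials are skipped.
    names = {part.lower() for part in display_name.split() if len(part) > 1}
    lines = body.splitlines()
    for index, line in enumerate(lines):
        words = re.findall(r"[\w'-]+", line.lower())
        # A short line containing the sender's first or last name starts the signature.
        if 0 < len(words) <= max_words and names & set(words):
            return "\n".join(lines[:index]).rstrip(), "\n".join(lines[index:])
    return body, ""  # no likely signature found
```

A long sentence that mentions the sender's name is ignored because of the word limit, which is exactly the assumption the heuristic relies on.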
If you can assume that these emails are in plain text, just strip lines that begin with ">" as replies, and a "-- " line should delimit the signature. But those assumptions might not hold, since not everyone on the internet uses software that follows the rules.
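Under those two assumptions the whole job is a single pass over the lines. A minimal Python sketch; strip_plain_text_reply is a hypothetical name, and unlike the filter list further up it removes quoted lines wherever they appear, including quotes interleaved with the reply:

```python
def strip_plain_text_reply(body: str) -> str:
    """Drop ">"-quoted lines and everything from a "-- " signature delimiter on."""
    kept = []
    for line in body.splitlines():
        if line == "-- ":
            break  # conventional signature delimiter: the rest is the signature
        if line.startswith(">"):
            continue  # quoted text from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()
```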
There's a really nice PHP library dedicated to parsing email replies:
http://williamdurand.fr/EmailReplyParser/
https://github.com/willdurand/EmailReplyParser
I made one for Go (https://github.com/web-ridge/email-reply-parser); it detects signatures like:
Karen The Green
Graphic Designer
Office
Tel: +44423423423423
Fax: +44234234234234
karen@webby.com
Street 2, City, Zeeland, 4694EG, NL
www.thing.com
The content of this email is confidential and intended for the recipient specified in message only. It is strictly forbidden to share any part of this message with any third party, without a written consent of the sender. If you received this message by mistake, please reply to this message and follow with its deletion, so that we can ensure such a mistake does not occur in the future.
Kind regards,
Richard Lindhout
The recommended signature delimiter is "-- \n". If people follow this recommendation, stripping signatures should be easy.

Algorithm for re-wrapping hard-wrapped text?

Let's say that I have written a custom e-mail management application for the company that I work for. It reads e-mails from the company's support account and stores cleaned-up, plain text versions of them in a database, doing other neat things like associating them with customer accounts and orders in the process. When an employee replies to a message, my program generates an e-mail that is sent to the customer with a formatted version of the discussion thread. If the customer responds, the app looks for a unique number in the subject line to read the incoming message, strip out the previous discussion, and add it as a new item in the thread. For example:
This is a message from Contoso customer service.
Recently, you requested customer support. Below is a summary of your
request and our reply.
--------------------------------------------------------------------
Contoso (Fred) on Tuesday, December 30, 2008 at 9:04 a.m.
--------------------------------------------------------------------
John:
I've modified your address. You can confirm my work by logging into
"Your Account" on our Web site. Your order should ship out today.
Thanks for shopping at Contoso.
--------------------------------------------------------------------
You on Tuesday, December 30, 2008 at 8:03 a.m.
--------------------------------------------------------------------
Oops, I entered my address incorrectly. Can you change it to
Fred Smith
123 Main St
Anytown, VA 12345
Thanks!
--
Fred Smith
Contoso Product Lover
Generally, this all works great, but there's one area that I've kind of been putting off cleaning up for a while now, and it deals with text wrapping. In order to generate the pretty e-mail format like the one above, I need to re-wrap the text that the customer originally sent.
I've written an algorithm that does this (though looking at the code, I'm not entirely sure how it works anymore; it could use some refactoring). But it can't distinguish between a hard-wrap newline, an "end of paragraph" newline, and a "semantic" newline. For example, a hard-wrap newline is one that the e-mail client inserted within a paragraph to wrap a long line of text, say, at 79 columns. An end-of-paragraph newline is one that the user added after the last sentence in a paragraph. And a semantic newline would be something like the br tag, such as in the address that Fred typed above.
My algorithm instead only sees two newlines in a row as indicating a new paragraph, so it would make the customer's e-mail be formatted something like the following:
Oops, I entered my address incorrectly. Can you change it to
Fred Smith 123 Main St Anytown, VA 12345
Thanks!
-- Fred Smith Contoso Product Lover
Whenever I try to write a version that would re-wrap this text as intended, I basically hit a wall in that I need to know the semantics of the text, the difference between a "hard-wrap" newline and a "I really meant it like a br"-type newline, such as in the customer's address. (I use two newlines in a row to determine when to start a new paragraph, which coincides with how the majority of people seem to actually type e-mails.)
Anyone have an algorithm that can re-wrap the text as intended? Or is this implementation "good enough" when weighing the complexity of any given solution?
Thanks.
You could try to check whether a newline was inserted to keep the line length below a maximum (aka a hard wrap): find the longest line in the text and take its length as the maximum. Then, for any given line, append the first word of the following line to it. If the resulting line exceeds the maximum length, the line break was probably a hard wrap.
Even simpler, you could treat every break on a line whose length satisfies (maxlength - 15) <= length <= maxlength as a hard wrap (15 is just an educated guess). This should filter out intentional breaks such as those in addresses, and an intentional break that happens to fall in this range wouldn't hurt the result much.
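Both checks can be sketched in a few lines of Python; the function names are hypothetical, and the margin of 15 is the answer's own educated guess:

```python
def longest_line(lines: list[str]) -> int:
    """The maximum line length, used as the estimated wrap width."""
    return max((len(line) for line in lines), default=0)


def next_word_overflows(line: str, next_line: str, max_len: int) -> bool:
    """First check: if the next line's first word would not have fit, it was a hard wrap."""
    words = next_line.split()
    if not words:
        return False  # a blank next line marks a paragraph break, not a wrap
    return len(line) + 1 + len(words[0]) > max_len  # +1 for the joining space


def in_wrap_window(line: str, max_len: int, margin: int = 15) -> bool:
    """Simpler check: a line ending within `margin` columns of max_len was hard-wrapped."""
    return max_len - margin <= len(line) <= max_len


lines = ["The quick brown fox jumps over", "the lazy dog.", "Fred Smith", "123 Main St"]
width = longest_line(lines)
for line, nxt in zip(lines, lines[1:]):
    print(f"{line!r}: overflow={next_word_overflows(line, nxt, width)}, "
          f"window={in_wrap_window(line, width)}")
```

With a single long line followed by short ones (as in Fred's message), the longest line is also the estimated wrap width, so both checks flag its break as a hard wrap; they work best on messages with several wrapped lines.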
I have two suggestions, as follows.
Pay attention to punctuation: this will help you distinguish between a "hard-wrap" newline and an "end of paragraph" newline (if a line ends with a full stop, it's more likely that the user intended it as the end of a paragraph).
Pay attention to whether a line is much shorter than the maximum line length: in the example above, you might have text that's being "hard-wrapped" at 79 characters, plus you have address lines which are only 30 characters long; because 30 is much less than 79, you know that the address lines were broken by the user and not by the user's text-wrap algorithm.
Also, pay attention to indents: lines indented with whitespace on the left are probably meant to start new paragraphs, separate from the previous lines, as they are on this forum.
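The three signals can be combined into a rough classifier for each line break. A minimal Python sketch; the 0.5 "much shorter" ratio and treating "!" and "?" like a full stop are assumptions, and classify_break is a hypothetical name:

```python
def classify_break(line: str, next_line: str, max_len: int, short_ratio: float = 0.5) -> str:
    """Guess what kind of newline separates `line` from `next_line`."""
    if not next_line.strip():
        return "paragraph"  # a blank line follows
    if next_line[:1].isspace():
        return "paragraph"  # an indented next line starts a new paragraph
    if line.rstrip().endswith((".", "!", "?")):
        return "paragraph"  # sentence-final punctuation
    if len(line) < short_ratio * max_len:
        return "semantic"   # far shorter than the wrap width: the user broke it
    return "hard_wrap"      # long line with no other signal


print(classify_break("Fred Smith", "123 Main St", 79))  # semantic
```

The order of the checks matters: a short line such as "Thanks!" is classified by its punctuation before the length test can call it semantic.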
Following Ole's advice above, I reworked my implementation to look at a threshold. It seems to handle most scenarios I throw at it well enough, without my having to go nuts and write code that actually understands the English language.
Basically, I first scan through the input string and record the longest line length in the variable inputMaxLineLength. Then, as I'm rewrapping, if I encounter a newline whose position in its line falls between 85% of inputMaxLineLength and inputMaxLineLength, I replace that newline with a space because I think it's a hard-wrap newline, unless it's immediately followed by another newline, in which case I assume it's just a one-line paragraph that happens to fall within that range. This can happen if someone types out a short bulleted list, for example.
Certainly not perfect, but "good enough" for my scenario, considering the text is usually half-mangled by a previous e-mail client to begin with.
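The rule described above can be sketched in Python. This is not the author's C# code (which isn't reproduced here), only a reading of the description: a newline whose column falls between 85% of the longest line length and that length becomes a space, unless a blank line follows. The function name rewrap is hypothetical:

```python
def rewrap(text: str, threshold: float = 0.85) -> str:
    """Join hard-wrapped lines using the longest-line threshold rule."""
    lines = text.split("\n")
    max_len = max((len(line) for line in lines), default=0)
    lower = threshold * max_len
    pieces = []
    for i, line in enumerate(lines):
        pieces.append(line)
        if i == len(lines) - 1:
            break  # no newline after the last line
        next_line = lines[i + 1]
        # The newline sits at column len(line); inside the window it is treated as
        # a hard wrap, unless another newline follows immediately (blank next line).
        if lower <= len(line) <= max_len and next_line != "":
            pieces.append(" ")
        else:
            pieces.append("\n")
    return "".join(pieces)


print(rewrap("The quick brown fox jumps over\nthe lazy dog.\n\nFred Smith\n123 Main St"))
```

A lone long line followed by a short one, such as Fred's "Can you change it to" line before his address, is still joined to the next line, so the rule is not perfect.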
Here's some code (C#): my few-hours-old implementation, which probably still under-wraps in a few edge cases. It's a lot less complicated than my previous solution, which is nice.
Source Code
And here are some unit tests that exercise that code (using MSTest):
Test Code
If anyone has a better implementation (and no doubt a better implementation exists), I'll be happy to read your thoughts! Thanks.
