rfc2047 multiple encoded-word in email subject - oracle

I need to send an email whose Subject contains Cyrillic letters, but my recipients sometimes see garbled characters because of problems with the mail server and/or client. I always send emails in windows-1251 encoding, but sometimes a mail client shows the message's Subject and Sender in another encoding (KOI8-R), and our users can't read the message.
I tried using encoded-words as described in RFC 2047. For example, the Subject field in my email now looks like:
Subject: =?WINDOWS-1251?B?wiDt5eTw4PUg8vPt5PD7IOL75PD7IOIg4+Xy8OD1IPL78P/yIOIg4uXk8OAg/+Tw?=
=?WINDOWS-1251?B?4CDq5eTw4C4gwvvw4uDiIPEg4vvk8Psg4iDy8+3k8OUg4+Xy?=
=?WINDOWS-1251?B?8PssIOL78vDzIOL75PDu6SD/5PDgIOrl5PDgLCDi+/Lw8yDj?=
=?WINDOWS-1251?B?5fLw7ukg4vvk8OUg7O7w5PMsIP/k8OAg4iDi5eTw4Cwg4vvk?=
=?WINDOWS-1251?B?8PMg4iDy8+3k8PMu?=
These lines were generated by the Oracle function UTL_ENCODE.MIMEHEADER_ENCODE.
All mail clients (Lotus Notes, gmail.com) show only the first line of such a subject (only the first 48 characters).
What is the problem with my mail subject?

The problem is that the header is not folded correctly according to RFC 2822. In a multi-line header field, every continuation line has to start with whitespace (a space or a tab).
What you need to do is:
replace(UTL_ENCODE.MIMEHEADER_ENCODE(subject, 'UTF8', UTL_ENCODE.BASE64), UTL_TCP.CRLF, UTL_TCP.CRLF || ' ')
This should solve your problem.
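For readers who want to see the folding rule outside Oracle, here is a minimal Python sketch. The encode_words helper is a hypothetical stand-in for the encoder (it is not UTL_ENCODE.MIMEHEADER_ENCODE and makes no claim about how Oracle splits words), and the sample subject is made up. It only illustrates that prefixing each continuation line with a space gives a value a standard decoder can reassemble.

```python
import base64
from email.header import decode_header

CRLF = "\r\n"


def encode_words(text: str, chunk_chars: int = 20) -> str:
    """Hypothetical stand-in for the encoder: one base64 encoded-word per
    chunk of characters, joined by a bare CRLF (no leading whitespace)."""
    words = []
    for start in range(0, len(text), chunk_chars):
        raw = text[start:start + chunk_chars].encode("utf-8")
        words.append("=?UTF-8?B?" + base64.b64encode(raw).decode("ascii") + "?=")
    return CRLF.join(words)


def fold(encoded: str) -> str:
    """Same idea as the REPLACE call: start every continuation line with a space."""
    return encoded.replace(CRLF, CRLF + " ")


def decode(header_value: str) -> str:
    """Decode a (folded) header value back to text."""
    parts = decode_header(header_value)
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in parts
    )


# Made-up sample subject, long enough to need several encoded-words.
subject = "Пример темы письма на русском языке, достаточно длинной для переноса"
folded = fold(encode_words(subject))
```

Every line of folded after the first begins with a space, so a mail client treats the lines as one Subject field rather than stopping at the first line.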

Related

What is SCC_BODY_URI_ONLY rule in spam assassin?

My email triggers the SCC_BODY_URI_ONLY rule when I check it with SpamAssassin.
Does anybody know about this rule? There is not a great deal of documentation around it.
You are right about the documentation. I checked out https://www.futurequest.net/docs/SA/. A very long list. But still no description.
But I did see it had to do with the Meta.
So I looked at the source of the email and saw that the title brackets were empty. So I just added a title and bam.... email passed ! Helpful I hope. Rock on..
As of 2022-06-23, the rule works as follows, as defined under 72_active.cf:
meta SCC_BODY_URI_ONLY T_SCC_BODY_TEXT_LINE < 2 && __HAS_ANY_URI && !__SMIME_MESSAGE
meta T_SCC_BODY_TEXT_LINE __SCC_BODY_TEXT_LINE_FULL - __SCC_SUBJECT_HAS_NON_SPACE
body __SCC_BODY_TEXT_LINE_FULL /^\s*\S/
tflags __SCC_BODY_TEXT_LINE_FULL multiple maxhits=3
header __SCC_SUBJECT_HAS_NON_SPACE Subject =~ /\S/
To summarise, the rule SCC_BODY_URI_ONLY will trigger if all of the following hold:
1. T_SCC_BODY_TEXT_LINE returns a number less than 2. T_SCC_BODY_TEXT_LINE counts the lines of the email body that contain a non-whitespace character (preceded by any amount of whitespace), counting at most 3 hits, and then subtracts 1 if the Subject contains any non-whitespace character.
2. The email contains any URI.
3. The email does not have a Content-Type header indicating it is an S/MIME email.
So, pretty much any email that contains:
At least 1 URI, at most 1 non-blank line in the body and a blank subject, OR
At least 1 URI, at most 2 non-blank lines and a subject with content
The above may go out of date in future, so check the current state of the rules in your SpamAssassin definitions. More information about writing and interpreting rules can be found here: https://cwiki.apache.org/confluence/display/SPAMASSASSIN/writingrules
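The arithmetic of the meta rule can be sketched in Python as follows. This is only a model of the definitions quoted above, not SpamAssassin code. The function name and its parameters are hypothetical, and body_lines is assumed to be whatever list of lines the body rule is run against.

```python
import re

BODY_TEXT_LINE = re.compile(r"^\s*\S")   # __SCC_BODY_TEXT_LINE_FULL pattern
SUBJECT_NON_SPACE = re.compile(r"\S")    # __SCC_SUBJECT_HAS_NON_SPACE pattern
MAXHITS = 3                              # tflags ... multiple maxhits=3


def scc_body_uri_only(subject, body_lines, has_any_uri, is_smime=False):
    """Model of: T_SCC_BODY_TEXT_LINE < 2 && __HAS_ANY_URI && !__SMIME_MESSAGE."""
    # Count matching lines, capped at MAXHITS as the tflags line specifies.
    full = min(sum(1 for line in body_lines if BODY_TEXT_LINE.search(line)), MAXHITS)
    # 1 if the Subject has any non-whitespace character, else 0.
    subject_hit = 1 if SUBJECT_NON_SPACE.search(subject) else 0
    text_lines = full - subject_hit      # T_SCC_BODY_TEXT_LINE
    return text_lines < 2 and has_any_uri and not is_smime
```

With a non-blank subject, two matching lines still trigger the rule (2 - 1 = 1), while three do not (3 - 1 = 2).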

"=?utf-8?Q??=" in To: field with Outlook and MailChimp

I can't find much information on this problem aside from issues with Code Igniter and long subjects (my subject is < 20 chars). I sent a campaign with MailChimp, and found that when using Outlook (Gmail web is fine), the To: field says "=?utf-8?Q??=" instead of the recipient name.
What could cause this?
The To header below encodes an empty string (there is nothing between the last two ?'s in =?utf-8?Q??=):
To: =?utf-8?Q??= <MyName#MyCompanyName.com>
Either get rid of the utf-8 encoded-word or actually provide a display name (inside a Q-encoded word a space is written as an underscore):
To: =?utf-8?Q?Some_Name?= <MyName#MyCompanyName.com>
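As an illustration of "don't emit an encoded-word for an empty name", here is a small Python sketch using the standard library's email.utils.formataddr. The to_header helper and the example.com address are made up for this example. This says nothing about how MailChimp builds its headers.

```python
from email.utils import formataddr


def to_header(display_name: str, address: str) -> str:
    """Build a To: line; an empty display name yields the bare address."""
    # formataddr leaves out the name part entirely when the name is empty and
    # RFC 2047-encodes it only when it contains non-ASCII characters.
    return "To: " + formataddr((display_name.strip(), address))


plain = to_header("Some Name", "user@example.com")
empty = to_header("", "user@example.com")
encoded = to_header("Имя", "user@example.com")
```

Here empty contains just the address, so no client can end up displaying an empty =?utf-8?Q??= token as the recipient name.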

What quirks are there in making large strings with the || operator?

Specifically, I am building an email message body and getting odd behavior in two places: when inserting CRLF (defined as CRLF CONSTANT VARCHAR2(2) := CHR(13) || CHR(10);), and when formatting dollar values, where the construct TRIM(TO_CHAR(foo.mydollars, '$99,999,999,990.00')) is used twice with the same database field value and gives different results in the output.
In the CRLF case, sometimes I get a newline and sometimes not.
In the number formatting I see:
1. $1,66942.
2. $1,669.42
I am running Oracle 10g.
So I output the message body to the spool log file, and the dollar formatting looks perfect there in all cases. Newlines are not an issue; that was the client messing with me. I still have the problem of a decimal point being dropped by the time the message lands in my inbox.
The program passes the message body to the Oracle mail package through a wrapper that sets the character set to iso-8859-1, and the message body is processed like this:
UTL_SMTP.write_raw_data(
    c,
    UTL_ENCODE.quoted_printable_encode(
        UTL_RAW.cast_to_raw(p_msgBody || UTL_TCP.crlf)
    )
);
My guess would be that the email client or mail reader is causing the problem rather than Oracle. Email clients do all sorts of funny formatting to the emails they receive. Outlook in particular adds and removes line breaks as it sees fit.
I suggest writing the email body into a temp/logging table so that you can compare it with the email that arrives.
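One way to do that comparison, sketched in Python rather than PL/SQL: decode the quoted-printable body taken from the raw received message and look for the first position where it differs from the logged text. Both helper names are hypothetical, and the charset default simply mirrors the iso-8859-1 setting mentioned in the question.

```python
import quopri


def decode_qp(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Decode a quoted-printable body as taken from the raw received message."""
    return quopri.decodestring(raw).decode(charset)


def first_difference(sent: str, received: str):
    """Return the index of the first differing character, or None if equal."""
    for index, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return index
    if len(sent) != len(received):
        return min(len(sent), len(received))
    return None


logged = "Total due: $1,669.42"       # what was written to the logging table
received = "Total due: $1,66942."     # what showed up in the inbox
position = first_difference(logged, received)
```

Knowing the exact offset of the first difference makes it easier to tell whether the change happened before sending (the logged text is already wrong) or somewhere between the database and the inbox.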

UTL_SMTP.Write_Data not sending any text if a colon is included

I have created an Oracle function to send an email using the UTL_SMTP package.
When I use the Write_Data or Data methods and my text does not contain a colon, the sent email contains the text in its body.
However, if the text contains a colon, the text is not included in the email.
Every example of this I've seen online seems to indicate this is no problem. Any idea what could be the cause?
This works: UTL_SMTP.write_data(l_mail_conn, 'test');
This does not get sent: UTL_SMTP.write_data(l_mail_conn, 'test:');
nor does: UTL_SMTP.write_data(l_mail_conn, 'test' || ':');
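It may be getting interpreted as a header. In a mail message the headers end at the first empty line, so if no blank line has been written before the body text, a line of the form 'test:' looks like a header field.
Rather than write your own, look at the mail code included in PLCODEBREW.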
I had this problem too.
Appreciating that you have upgraded to UTL_MAIL, my findings below are for those who would prefer, or have, to stay with UTL_SMTP.
If you ensure your SMS body does not match the pattern 'aaa:...', then utl_smtp.write_data will not interpret it as a header. If your SMS body does match this pattern, prefix your message with a space, or simply replace the colon with a semicolon or similar. Your choice.
You can use the following to intercept and work around the problem.
.....
/* 999999999 is just an indicative integer beyond the maximum length of an SMS */
/* NULLIF turns INSTR's "not found" result (0) into NULL so that NVL applies */
IF (INSTR(p_message,':') < NVL(NULLIF(INSTR(p_message,' '),0),999999999)
    AND INSTR(p_message,':') != 0)
THEN
    p_message := ' '||p_message;
END IF;
utl_smtp.write_data(l_mail_conn, p_message);
.....
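The same check, sketched in Python so it can be tried outside the database. The function name is made up. It treats a message as header-like when a colon occurs before the first space, or when there is a colon and no space at all.

```python
def guard_against_header(message: str) -> str:
    """Prefix a space when the text could be mistaken for a 'name:' header."""
    colon = message.find(":")
    space = message.find(" ")
    # Header-like: a colon exists and no space comes before it.
    looks_like_header = colon != -1 and (space == -1 or colon < space)
    return " " + message if looks_like_header else message
```

For example, guard_against_header('test:') returns ' test:', while 'hello world: ok' is returned unchanged because the first space comes before the colon.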
I wasn't able to get UTL_SMTP working. It certainly looks like any colons in the UTL_SMTP body were being interpreted as headers, and I could not find a way to escape them.
I was able to use the UTL_MAIL package introduced by Oracle in 10g, and it worked very well. The only configuration necessary was setting the smtp_out_server variable in Oracle to the mail server (including its port number). It requires only a single method call, which is much cleaner to implement in PL/SQL, gives you more control over specific email fields (Subject, for example), and can handle attachments as well.

Strip signatures and replies from emails

I'm currently working on a system that allows users to reply to notification emails that are sent out (sigh).
I need to strip out the replies and signatures, so that I'm left with the actual content of the reply, without all the noise.
Does anyone have any suggestions about the best way to do this?
If your system is in-house and/or you have a limited number of reply formats, it's possible to do a pretty good job. Here are the filters we have set up for email responses to trac tickets:
Drop all text after and including:
1. Lines that equal '-- \n' (standard email sig delimiter)
2. Lines that equal '--\n' (people often forget the space in sig delimiter; and this is not that common outside sigs)
3. Lines that begin with '-----Original Message-----' (MS Outlook default)
4. Lines that begin with '________________________________' (32 underscores, Outlook again)
5. Lines that begin with 'On ' and end with ' wrote:\n' (OS X Mail.app default)
6. Lines that begin with 'From: ' (failsafe for Outlook and some other reply formats)
7. Lines that begin with 'Sent from my iPhone'
8. Lines that begin with 'Sent from my BlackBerry'
Numbers 3 and 4 are 'begin with' instead of 'equals' because sometimes users will squash lines together by accident.
We try to be more liberal about stripping out replies, since it's much more of an annoyance (to us) to have reply garbage than it is to correct missing text.
Anybody have other formats from the wild that they want to share?
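A rough Python sketch of the filters listed above, for anyone who wants a starting point. The function names are invented here, and this is not the actual trac filter code, just the listed rules applied line by line.

```python
# Prefix rules from the list above ("begins with" checks).
CUTOFF_PREFIXES = (
    "-----Original Message-----",   # MS Outlook default
    "_" * 32,                       # 32 underscores, Outlook again
    "From: ",                       # failsafe for other reply formats
    "Sent from my iPhone",
    "Sent from my BlackBerry",
)


def is_cutoff(line: str) -> bool:
    """True if this line marks the start of a signature or quoted reply."""
    if line in ("-- ", "--"):       # sig delimiter, with or without the space
        return True
    if line.startswith(CUTOFF_PREFIXES):
        return True
    return line.startswith("On ") and line.endswith(" wrote:")


def strip_reply(text: str) -> str:
    """Keep everything before the first cutoff line."""
    kept = []
    for line in text.splitlines():
        if is_cutoff(line):
            break
        kept.append(line)
    return "\n".join(kept).rstrip()
```

Calling strip_reply on a message whose third line is "On Mon, Jan 1, 2024, Bob wrote:" keeps only the lines before it.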
Check out the email_reply_parser gem - https://github.com/github/email_reply_parser . It does a nice job handling this problem.
I don't believe you can do this reliably (signatures used to begin with '--', but I don't see that anymore). Perhaps you're better off asking people to reply in between text headers and then simply extracting the reply from between them? It's not elegant, but perhaps more reliable.
e.g.
REPLY BETWEEN HERE -->
AND HERE -->
so you'd simply look for the required headers above and take what's in between.
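A minimal sketch of that extraction in Python, assuming the two marker strings appear verbatim in the reply. The function name is made up, and returning None when a marker is missing is just one possible choice.

```python
START_MARKER = "REPLY BETWEEN HERE -->"
END_MARKER = "AND HERE -->"


def extract_between(text: str):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)
    end = text.find(END_MARKER, start)   # only look after the start marker
    if end == -1:
        return None
    return text[start:end].strip()
```

Returning None lets the caller fall back to some other strategy when a user has deleted or edited the markers.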
If you want something powerful & robust, and don't mind reading academic publications, you might check out this:
Learning to Extract Signature and Reply Lines from Email
Here's the homepage for one of the authors, with more info & some downloads:
Vitor R. Carvalho - Software and Datasets - (Vitor Carvalho)
An approach that can be used for signatures only (in addition to detecting __ or --) is to test whether the first name and/or family name of the sender appears on a short line (containing roughly 3 to 4 words at most).
The sender name is on the raw email header, most of the time next to the email address, like in:
From: John Doe <jdoe#provider.com>
This is based on the assumption that you rarely write your own name in an email, and if you do, it is probably in a long sentence.
Of course there will be some false positives, but that may not be a big problem depending on what you do (we use it to fold quoted text and signatures into a "..." gmail-style button, so over-detection does not lose any content; it is just misplaced).
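Here is a small Python sketch of this heuristic. The function name, the 4-word threshold as a parameter, and the punctuation stripping are choices made for the example, not part of the description above. The sender name is taken from the From header with the standard email.utils.parseaddr.

```python
from email.utils import parseaddr


def find_signature_start(from_header: str, body: str, max_words: int = 4):
    """Return the index of the first short line that contains the sender's
    first or family name, or None if no such line exists."""
    name, _address = parseaddr(from_header)
    name_tokens = {token.lower() for token in name.split()}
    if not name_tokens:
        return None
    for index, line in enumerate(body.splitlines()):
        words = [word.strip(".,;:!?").lower() for word in line.split()]
        # Only short lines count; a name inside a long sentence is ignored.
        if 0 < len(words) <= max_words and name_tokens & set(words):
            return index
    return None
```

Everything from the returned line index onward can then be folded away rather than deleted, which keeps false positives harmless.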
If you can assume that these emails are in plain text, just strip lines that begin with ">" as replies, and treat a "-- " line as the signature delimiter. But those assumptions might not hold, as not everyone on the internet uses software that complies with the rules.
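Under exactly those assumptions (plain text, ">" quoting, "-- " delimiter), the stripping could look like this in Python. The function name is hypothetical.

```python
def strip_plain_text_reply(text: str) -> str:
    """Drop '>'-quoted lines and everything from the '-- ' delimiter onward."""
    kept = []
    for line in text.splitlines():
        if line == "-- ":            # signature delimiter: stop here
            break
        if line.startswith(">"):     # quoted reply line: skip it
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```

Unlike a cutoff-based filter, this keeps text that the user typed below or between quoted lines.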
There's a really nice PHP library dedicated to email reply parsing:
http://williamdurand.fr/EmailReplyParser/
https://github.com/willdurand/EmailReplyParser
I made one for golang, https://github.com/web-ridge/email-reply-parser, which detects signatures like:
Karen The Green
Graphic Designer
Office
Tel: +44423423423423
Fax: +44234234234234
karen#webby.com
Street 2, City, Zeeland, 4694EG, NL
www.thing.com
The content of this email is confidential and intended for the recipient specified in message only. It is strictly forbidden to share any part of this message with any third party, without a written consent of the sender. If you received this message by mistake, please reply to this message and follow with its deletion, so that we can ensure such a mistake does not occur in the future.
Met vriendelijke groeten,
Richard Lindhout
The recommended signature delimiter is "-- \n". If people follow this recommendation, stripping signatures should be easy.
