What quirks are there in making large strings with the || operator? - oracle

Specifically, I am building out an email message body and getting some odd behavior with inserting CRLF (defined as CRLF CONSTANT VARCHAR2(2) := CHR(13) || CHR(10);) and with formatting dollar values (using this construct in two places with the same database field value and getting different results in the output TRIM(TO_CHAR(foo.mydollars, '$99,999,999,990.00'))).
In the CRLF case sometimes I get a newline and sometimes not.
In the number formatting I see:
1. $1,66942.
2. $1,669.42
I am running Oracle 10g.
So I output the message body to the spool log file and dollar value formatting looks perfect there for all cases. Newlines are not an issue as it was the client messing with me. Still have the problem of dropping a decimal in the message by the time it lands in my inbox.
The program is passing the message body to the Oracle mail package through a wrapper that sets the character set to iso-8859-1 and the message body is processed like the following:
UTL_SMTP.write_raw_data
  (c,
   UTL_ENCODE.quoted_printable_encode
     (UTL_RAW.cast_to_raw (p_msgBody || UTL_TCP.crlf))
  );

My guess would be that the email client / mail reader is causing the problem rather than Oracle. Email clients apply all sorts of odd formatting to the emails they receive. Outlook in particular adds and removes line breaks as it sees fit.
I suggest writing the email body into a temp/logging table so that you can compare it with the email that arrives.
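One mechanism that could make a decimal point vanish between the spool log and the inbox is the SMTP transparency rule (RFC 5321, section 4.5.2): the receiving side removes one leading "." from any line that starts with one, so the sender is expected to double such dots first. Quoted-printable encoding inserts soft line breaks, so a "." in the middle of a dollar amount can end up at the start of a line. Whether this is what happened in the thread is an assumption; the source does not confirm it. The Python sketch below only illustrates the mechanism. The helper names and the sample body are hypothetical.

```python
import quopri

CRLF = "\r\n"


def dot_stuff(body):
    """Double a leading '.' on every line, as an SMTP sender is expected to do."""
    return CRLF.join("." + line if line.startswith(".") else line
                     for line in body.split(CRLF))


def server_unstuff(body):
    """Remove one leading '.' from every line, as the receiving side does."""
    return CRLF.join(line[1:] if line.startswith(".") else line
                     for line in body.split(CRLF))


def decode_qp(body):
    """Decode a quoted-printable body (soft line breaks are joined back)."""
    return quopri.decodestring(body.encode("ascii")).decode("ascii")


# Hypothetical encoded body: the soft line break ("=" + CRLF) fell right before ".42".
encoded = "Total: $1,669=" + CRLF + ".42" + CRLF

# Sent as-is, the leading dot is stripped in transit and the amount is damaged.
without_stuffing = decode_qp(server_unstuff(encoded))

# Sent with dot-stuffing, the amount survives.
with_stuffing = decode_qp(server_unstuff(dot_stuff(encoded)))
```

If the logged body and the received body differ only where a "." begins an encoded line, that points at the transport rather than at the || concatenation.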

Related

PLSQL - convert UTF-8 NVARCHAR2 to VARCHAR2

I have a table with a column configured as NVARCHAR2, and I'm able to save the string in UTF-8 without any issues.
But the application that reads the value does not fully support UTF-8.
This means that the string is passed to the database and back only after it has been converted into HTML character codes. Each letter in the string is converted to such an HTML code.
I'm looking for an easier solution.
I've considered converting it to BASE64, but it contains various characters which are considered illegal in the application.
I also tried using HEXTORAW & RAWTOHEX.
None of the above helped.
If the column contains 'κόσμε' I need to find a way to convert/encode it to something else, but the decode should be possible to do from the HTML running the application.
Try using the ASCIISTR function. It converts the string into something similar to how JSON encodes Unicode strings (it's actually the same, except that "\" is used instead of "\u"). When you receive the value back from the front end, use UNISTR to convert it back to Unicode.
ASCIISTR: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions006.htm
UNISTR: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
SELECT ASCIISTR(N'κόσμε') FROM DUAL;
SELECT UNISTR('\03BA\1F79\03C3\03BC\03B5') FROM DUAL;
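For the application side, the same \XXXX notation can be produced and consumed outside the database. The Python sketch below imitates the round trip; the function names asciistr and unistr are hypothetical helpers, not Oracle APIs. It assumes each escape is one UTF-16 code unit written as four hex digits, that a literal backslash is written as \005C, and that the escaped string itself is pure ASCII.

```python
def asciistr(text):
    """Escape every non-ASCII character as \\XXXX (one escape per UTF-16 code unit)."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\005C")  # assumption: literal backslashes are escaped too
        elif ord(ch) < 128:
            out.append(ch)
        else:
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def unistr(escaped):
    """Reverse asciistr(): turn \\XXXX escapes back into characters."""
    units = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\":
            units.append(int(escaped[i + 1:i + 5], 16))
            i += 5
        else:
            units.append(ord(escaped[i]))  # assumes plain ASCII outside escapes
            i += 1
    raw = b"".join(unit.to_bytes(2, "big") for unit in units)
    return raw.decode("utf-16-be")


# Code points written out explicitly so the example does not depend on
# how the Greek text was typed.
greek = "\u03ba\u1f79\u03c3\u03bc\u03b5"
escaped = asciistr(greek)
restored = unistr(escaped)
```

Because the escaped form contains only ASCII letters, digits and backslashes, it avoids the characters that made BASE64 unusable in the application.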

rfc2047 multiple encoded-word in email subject

I need to send an email with the Subject containing cyrillic letters. But my recipients sometimes receive incorrect letters due to some problems with mail server and/or client. I always send emails in windows-1251 encoding, but sometimes a mail client shows letter's Subject and Sender in another encoding (KOI-8R) and our users can't understand the message.
I tried to use an encoded-word tag as described in RFC 2047 Standard. For example, my Subject field in the email now looks like:
Subject: =?WINDOWS-1251?B?wiDt5eTw4PUg8vPt5PD7IOL75PD7IOIg4+Xy8OD1IPL78P/yIOIg4uXk8OAg/+Tw?=
=?WINDOWS-1251?B?4CDq5eTw4C4gwvvw4uDiIPEg4vvk8Psg4iDy8+3k8OUg4+Xy?=
=?WINDOWS-1251?B?8PssIOL78vDzIOL75PDu6SD/5PDgIOrl5PDgLCDi+/Lw8yDj?=
=?WINDOWS-1251?B?5fLw7ukg4vvk8OUg7O7w5PMsIP/k8OAg4iDi5eTw4Cwg4vvk?=
=?WINDOWS-1251?B?8PMg4iDy8+3k8PMu?=
These lines were generated by the Oracle function UTL_ENCODE.MIMEHEADER_ENCODE.
All mail clients (Lotus Notes, gmail.com) show only the first line of such email subject (only first 48 symbols).
What is the problem with my mail subject?
The problem is that you do not fold the header correctly according to RFC 2822. In a multi-line header field, each continuation line has to start with white space.
What you need to do is:
replace(UTL_ENCODE.MIMEHEADER_ENCODE(subject, 'UTF8', UTL_ENCODE.BASE64), UTL_TCP.CRLF, UTL_TCP.CRLF || ' ')
This should solve your problem.
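The effect of the missing white space can be reproduced outside Oracle. The Python sketch below builds two encoded-words, joins them with a bare CRLF (as in the question) and with CRLF plus a space (as in the fix), and parses both with the standard email package. The helper names and the sample text are hypothetical, and Python's parser is only a stand-in for a real mail client.

```python
import base64
from email import message_from_string
from email.header import decode_header, make_header

CRLF = "\r\n"


def encoded_word(text):
    """Build one RFC 2047 base64 encoded-word in UTF-8."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


def fold_encoded_words(value):
    """Make every continuation line start with a space."""
    return value.replace(CRLF, CRLF + " ")


def parsed_subject(subject_value):
    """Parse a message with this Subject and return the decoded subject text."""
    message = message_from_string(
        "Subject: " + subject_value + CRLF + CRLF + "body" + CRLF)
    return str(make_header(decode_header(message["Subject"])))


unfolded = encoded_word("Привет, ") + CRLF + encoded_word("мир")
folded = fold_encoded_words(unfolded)

broken = parsed_subject(unfolded)  # second line is not seen as part of the header
fixed = parsed_subject(folded)     # both encoded-words belong to the Subject
```

In the unfolded case the parser stops reading the Subject at the first line, which matches the symptom of only the first line being displayed.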

Not sure why the output of my PHP scripts contains random embedded spaces within character strings

I have written several PHP scripts to read the contents of a database and output those contents in an email message. Every once in a while, I will see a SPACE (0x20) character embedded in the output where there shouldn't be any. For example, in one script, I reference a PHP global variable containing exactly "n" non-space characters, and sometimes (not always), when that variable is dumped to an email message, the string will appear with an embedded blank (making the total length of the string "n+1"). Other times, an HTML tag (such as <BR>) will appear as < BR> (note the SPACE before the "B").
Because the behavior of the script is not consistent (some emails are affected, and others aren't), I can't seem to find the problem.
I am enclosing a link to the PHP script that is occasionally embedding a space into the BREAK tag. I have removed the lines that provide specific login information to the databases. Otherwise, everything else is intact. In the code file you can find at the link below, line 281 is the one that contained the BREAK command with the embedded SPACE (as described above). This has happened only once!
http://jem-software.com/temptest.txt
I guess the only other potentially relevant information is that this script file is taken from code entered into a JUMI code block contained within a Joomla! based website.
Edit 1:
Thank you, Riccardo, for your suggestions. Here is some more clarification:
I am not reading an email and parsing the results in order to insert into a database. Just the opposite, I am reading from a database and using the results to create an email. I will check the database to see what character set was used, and explicitly pass the character set to see if that makes a difference.
I don't use Joomla functions to access the database because the database I am referencing is external to the Joomla! environment. It is a pre-existing database created from PHP scripts written several years prior. When my old website was re-written using Joomla, I wanted to "port" the PHP database access code intact, so I installed the JUMI plugin to make this possible.
I will check out the character coding in the database and synchronize it to the character code of the email message.
I don't understand how an issue with character coding would result in the insertion of a SPACE into the hard-coded HTML tag - this tag did not come from any database, but was typed into the email as a literal string.
This is a strange issue, but here are my two cents:
The first is you're not using Joomla functions to access the db and the mail subsystem. While this could work, it's not really nice.
The second is, this smells like a character set / codepage issue.
Here are a few considerations on the character set issue:
I read your code quickly, and I didn't notice anything wrong. But Joomla uses UTF-8, and your queries don't specify it (mysql_set_charset() is missing!) which could be a first issue.
The second is that the emails you read will have different character sets, depending on the senders' settings. Make sure you handle the codepage issues properly: the following is a snippet of a function I use for parsing email:
$mime = imap_fetchmime($this->connection, $this->messageNumber, $partNumber);
// $data holds the raw (QUOTED_PRINTABLE) body of the same part; it is fetched elsewhere in the class
return $this->decodeMailBody($data, $mime);

function decodeMailBody($string, $mime) {
    $str = quoted_printable_decode($string);
    // Example $mime values:
    //mime: Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
    //mime: Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252
    // Keep whatever follows the last "charset=" as the charset name
    $mimes = explode('charset=', $mime);
    foreach ($mimes as $mimepiece) {
        $charset = $mimepiece;
    }
    $charset = strtolower(trim($charset));
    // Debug output, printed only once $charset is known
    echo "<h3>mime: $mime; charset $charset</h3>";
    if ($charset == 'utf-8') {
        return $str;
    } else {
        return iconv($charset, 'UTF-8', $str);
    }
}
Last, make sure you use utf-8 when you insert the mail into the db after parsing it.

UTL_SMTP.Write_Data not sending any text if a colon is included

I have created an Oracle function to send an email using the UTL_SMTP package.
When using the methods Write_Data or Data, if my text does not contain a colon, the sent email will contain the inputted text in the email's body.
However, if the text contains a colon, the text is not included in the email.
Every example of this I've seen online seems to indicate this is no problem. Any idea what could be the cause of this?
This works: UTL_SMTP.write_data(l_mail_conn, 'test');
This does not get sent: UTL_SMTP.write_data(l_mail_conn, 'test:');
nor does: UTL_SMTP.write_data(l_mail_conn, 'test' || ':');
It may be getting interpreted as a header.
Rather than write your own, look at the mail code included in PLCODEBREW.
I had this problem too.
I appreciate that you have upgraded to UTL_MAIL; my findings below are for those who would prefer, or have, to stay with UTL_SMTP.
If you ensure your SMS body does not match the pattern 'aaa:...', then utl_smtp.write_data will not interpret it as a header. If your SMS body does match this pattern, prefix your message with a space, or simply replace the colon with a semicolon. Your choice.
You can use the following to intercept and workaround the problem.
.....
/* 999999999 is just an indicative integer above and beyond the max length of an sms */
/* INSTR returns 0 (not NULL) when there is no space, so map 0 to NULL before applying NVL */
IF (INSTR(p_message,':') < NVL(NULLIF(INSTR(p_message,' '),0),999999999)
AND INSTR(p_message,':') != 0)
THEN p_message := ' '||p_message;
END IF;
utl_smtp.write_data(l_mail_conn, p_message);
.....
I wasn't able to get UTL_SMTP working. It certainly looks like any colons in the UTL_SMTP body were being interpreted as headers, and I could not find a way to escape them.
I was able to use the UTL_MAIL package introduced by Oracle in 10g, and it worked very well. The only configuration necessary was setting the smtp_out_server variable in Oracle to the mail server (including its port number). It requires only a single method call, which is much cleaner to implement in PL/SQL. It also gives you more control over specific email fields (Subject, for example) and can handle attachments.
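The header interpretation described in the answers above follows from how a mail message is laid out: everything before the first empty line is the header section, and the body starts after it. The Python sketch below uses the standard email parser as a stand-in for whatever reads the message. That the Oracle side or the mail server behaves the same way is an assumption, and the addresses and helper name are hypothetical. It shows that 'test' written straight after the headers still ends up in the body, 'test:' is swallowed as a header, and an empty line before the text fixes both.

```python
from email import message_from_string

CRLF = "\r\n"

# Header lines as they might be written before the message text.
HEADERS = "From: sender@example.com" + CRLF + "Subject: demo" + CRLF


def body_seen_by_parser(data_section):
    """Return what a mail parser treats as the body of the DATA section."""
    return message_from_string(data_section).get_payload()


# No empty line between headers and text:
no_separator_plain = body_seen_by_parser(HEADERS + "test" + CRLF)
no_separator_colon = body_seen_by_parser(HEADERS + "test:" + CRLF)

# Empty line (a bare CRLF) written before the text:
with_separator = body_seen_by_parser(HEADERS + CRLF + "test:" + CRLF)
```

Under that assumption, writing an extra CRLF after the last header and before the first body line would let the body contain colons without prefixing or replacing them.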

Strip signatures and replies from emails

I'm currently working on a system that allows users to reply to notification emails that are sent out (sigh).
I need to strip out the replies and signatures, so that I'm left with the actual content of the reply, without all the noise.
Does anyone have any suggestions about the best way to do this?
If your system is in-house and/or you have a limited number of reply formats, it's possible to do a pretty good job. Here are the filters we have set up for email responses to trac tickets:
Drop all text after and including:
Lines that equal '-- \n' (standard email sig delimiter)
Lines that equal '--\n' (people often forget the space in sig delimiter; and this is not that common outside sigs)
Lines that begin with '-----Original Message-----' (MS Outlook default)
Lines that begin with '________________________________' (32 underscores, Outlook again)
Lines that begin with 'On ' and end with ' wrote:\n' (OS X Mail.app default)
Lines that begin with 'From: ' (failsafe for Outlook and some other reply formats)
Lines that begin with 'Sent from my iPhone'
Lines that begin with 'Sent from my BlackBerry'
Numbers 3 and 4 are 'begin with' instead of 'equals' because sometimes users will squash lines together by accident.
We try to be more liberal about stripping out replies, since it's much more of an annoyance (to us) to have reply garbage than it is to correct missing text.
Anybody have other formats from the wild that they want to share?
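The filter list above can be sketched in a few lines of Python. This is a minimal illustration of those rules as written, not the code used for the trac setup; the function and constant names are hypothetical.

```python
import re

# Lines that must match exactly (after the newline is removed).
EXACT_DELIMITERS = {"-- ", "--"}

# Lines that only need to begin with one of these markers.
PREFIX_DELIMITERS = (
    "-----Original Message-----",
    "_" * 32,
    "From: ",
    "Sent from my iPhone",
    "Sent from my BlackBerry",
)

# "On <anything> wrote:" reply header.
ON_WROTE = re.compile(r"^On .* wrote:$")


def strip_reply(body):
    """Drop the first delimiter line and everything after it."""
    kept = []
    for line in body.splitlines():
        if (line in EXACT_DELIMITERS
                or line.startswith(PREFIX_DELIMITERS)
                or ON_WROTE.match(line)):
            break
        kept.append(line)
    return "\n".join(kept).rstrip()


reply = (
    "Thanks, that fixed it.\n"
    "\n"
    "On Mon, Jan 1, 2024 at 10:00 AM, Bob wrote:\n"
    "> Did the patch help?\n"
)
cleaned = strip_reply(reply)
```

New formats can be added to PREFIX_DELIMITERS without touching the loop, which keeps the liberal stripping policy easy to extend.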
Check out the email_reply_parser gem - https://github.com/github/email_reply_parser . It does a nice job handling this problem.
I don't believe you can do this reliably (signatures used to begin with '--' but I don't see that anymore). Perhaps you're better off asking people to reply in between text markers and then simply extracting the reply from there? It's not elegant, but perhaps more reliable.
e.g.
REPLY BETWEEN HERE -->
AND HERE -->
so you'd simply look for the required markers above and take what's in between.
If you want something powerful & robust, and don't mind reading academic publications, you might check out this:
Learning to Extract Signature and Reply Lines from Email
Here's the homepage for one of the authors, with more info & some downloads:
Vitor R. Carvalho - Software and Datasets - (Vitor Carvalho)
An approach that works for signatures only (in addition to detecting __ or --) is to test whether the sender's first name and/or family name appears on a short line (roughly 3 to 4 words at most).
The sender name is on the raw email header, most of the time next to the email address, like in:
From: John Doe <jdoe@provider.com>
This is based on the assumption that you rarely write your own name in an email, and if you do, it is probably in a long sentence.
Of course there will be some false positives, but that may not be a big problem depending on what you do (we use it to fold quoted text and signatures into a ... gmail-style button, so over-detection does not lose any content; it is just misplaced).
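A minimal Python sketch of this name heuristic, assuming the display name is available from the From header and that "short" means at most four words. The function name, the word limit parameter and the sample message are hypothetical.

```python
from email.utils import parseaddr


def find_signature_start(from_value, body, max_words=4):
    """Return the index of the first short line naming the sender, or None."""
    display_name, _address = parseaddr(from_value)
    name_parts = {part.lower() for part in display_name.split()}
    if not name_parts:
        return None
    for index, line in enumerate(body.splitlines()):
        # Ignore trailing punctuation such as "John," or "Doe."
        words = [word.strip(".,;:!").lower() for word in line.split()]
        if 0 < len(words) <= max_words and name_parts & set(words):
            return index
    return None


body = (
    "Hi,\n"
    "I talked to John about the release schedule yesterday.\n"
    "Thanks,\n"
    "John\n"
    "Acme Corp"
)
start = find_signature_start("John Doe <jdoe@provider.com>", body)
```

The long sentence that mentions the name is skipped because it exceeds the word limit, while the short line holding only the name is flagged as the start of the signature.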
If you can assume that these emails are in plain text, just strip lines that begin with ">" as replies, and treat the "-- " line as the signature delimiter. But those assumptions might not hold, as not everyone on the internet uses software that follows these rules.
There's a really nice PHP library dedicated to email parsing:
http://williamdurand.fr/EmailReplyParser/
https://github.com/willdurand/EmailReplyParser
I made one for golang: https://github.com/web-ridge/email-reply-parser it detects signatures like
Karen The Green
Graphic Designer
Office
Tel: +44423423423423
Fax: +44234234234234
karen@webby.com
Street 2, City, Zeeland, 4694EG, NL
www.thing.com
The content of this email is confidential and intended for the recipient specified in message only. It is strictly forbidden to share any part of this message with any third party, without a written consent of the sender. If you received this message by mistake, please reply to this message and follow with its deletion, so that we can ensure such a mistake does not occur in the future.
Met vriendelijke groeten,
Richard Lindhout
The recommended signature delimiter is "-- \n". If people follow this recommendation, stripping signatures should be easy.

Resources