Outlook handling of quoted url parameters in mailto link - outlook

I'm attempting to populate the body of a mailto link with an html link. The target browser is IE 7+ and mail client is Outlook 2007+. Before I ask my question, i'll acknowledge the fact that the body parameter is intended for short text messages as called out here:
https://stackoverflow.com/a/4304779/573083
and detailed here:
The special "body" indicates that the associated is the body of the
message. The "body" field value is intended to contain the content for
the first text/plain body part of the message. The "body" pseudo
header field is primarily intended for the generation of short text
messages for automatic processing (such as "subscribe" messages for
mailing lists), not for general MIME bodies. Except for the encoding
of characters based on UTF-8 and percent-encoding, no additional
encoding (such as e.g., base64 or quoted-printable; see [RFC2045]) is
used for the "body" field value. As a consequence, header fields
related to message encoding (e.g., Content-Transfer-Encoding) in a
'mailto' URI are irrelevant and MUST be ignored. The "body" pseudo
header field name has been registered with IANA for this special
purpose (see Section 8.2).
That being said, there have been a number of threads on SO with varying levels of success with inserting links in the body tag. for example: https://stackoverflow.com/a/1455881/573083 and https://stackoverflow.com/a/9138245/573083
My issue is similiar, but it is specifically with outlook rendering quoted parameters of embedded links. I currently have the following that is almost working:
A link
A partial link appears correctly in the outlook body, however outlook is not including the final quoted url parameter ("somevalue") in the link; the ="somevalue" is just appearing as plain text. Viewing the source of the email message shows that outlook is closing the enclosing <a> tag as it is interpreting the %22 as the end of the link. I've attempted to escape the %22 with %2f, /, ' - to no avail. I believe that I need the correct sequence for outlook to understand that the %22 should be included in the link, and not as the closure of the enclosing link.
Any help would be appreciated.

Judging by the ?, you haven't encoded the body component.
> encodeURIComponent("http://someserver.somedomain/somepage.aspx?id=1234%26somekey=%22somevalue%22")
"http%3A%2F%2Fsomeserver.somedomain%2Fsomepage.aspx%3Fid%3D1234%2526somekey%3D%2522somevalue%2522"
So the code should be:
A link
Or more likely:
A link

I would put the link inside "<" & ">".
%20 = space
%0D = new line
%3C = "<"
%3E = ">"
<html>
<body>hi
A link</body>
</html>

Related

Mac computers aren't processing mailto: links correctly when they have // in them (mailto://)

Sorry for the question title, it's a little difficult to phrase in my opinion. Here is the full question:
The WYSIWYG HTML editor we use on our websites includes a // in the mailto: link when inserted into the text editor box (mailto://). We are a webfirm and use this editor on many, many websites. For example, all the mail links inserted appear like this:
Text Here
We just noticed this morning that Windows computers do not include the // in the To: field when clicked regardless of the email client it's opened with. It will include the email as normal (email#domain.com).
However, Mac computers are including the // though, so whenever someone tries to send an email using these links, it's trying to email //email#domain.com - which isn't delivering, because obviously it's an invalid format with the //s.
Does anyone have any knowledge to why this is happening? The WYSIWYG editor we are using is obout. If we have to go back and remove these // from every single website we've built, it would be a tremendous task. I'm just wondering why Macs seem to not process the link correctly, while Windows computers do.
The Macs are processing the link correctly. Windows is incorrectly removing data and your editor is incorrectly encoding the data.
The mailto: URL scheme is defined by RFC 2368. It defines it as:
mailtoURL = "mailto:" [ to ] [ headers ]
to = #mailbox
headers = "?" header *( "&" header )
header = hname "=" hvalue
hname = *urlc
hvalue = *urlc
"#mailbox" is as specified in RFC 822 [RFC822]. This means that it
consists of zero or more comma-separated mail addresses, possibly
including "phrase" and "comment" components. Note that all URL
reserved characters in "to" must be encoded: in particular,
parentheses, commas, and the percent sign ("%"), which commonly occur
in the "mailbox" syntax.
There is no provision for removing characters such as /.

How Do Validators Differentiate Between '&' and '&amp'?

Knowing that & is the html entity value of & - how do validators like w3c know this? Even when I look at my source code it's already been parsed into the correct value.
Your question is based on a false premise -- as Co_42 noted, & is not the "ASCII value" of '&'. It's a HTML character reference representing the character '&'. The ASCII value of '&' is 38 (or 0x26).
Your source code almost certainly consists of ASCII or Unicode text files. Those don't use HTML entities. If you have a string with an ampersand stored in the source code, it'll probably be stored with a bare "&". If there's a string literal somewhere containing actual HTML data, it may contain "&".
When you use some sort of tool or function to convert strings to text ready to put into for an HTML or XML document, any "&" will be (should be!) converted into "&".
When a program that reads HTML documents encounters an ASCII "&", it can assume that that's the beginning of a HTML character reference. This is okay because all ampersands in the actual text should have been converted into "&".
As a somewhat perverse example, if you open your source code in a word processor and save it as an HTML document, you'll find that in the actual file, "&" has been converted into "&" (and "&" has been converted into "&amp;"). If you then open that document in a browser, you'll find that the ampersands are displayed the same way they are when you view your source code in a text editor. The encoding step that happened when you saved the HTML document corresponds to the decoding step that happens when the browser displays it.
If you put something like "Fish & chips" directly into an actual HTML document, your HTML document will be invalid. Complicating the matter is the fact that programs such as browsers tend to try to recover from errors in document and display the documents anyway. As such, your browser may still display "Fish & chips" on the screen when you open your invalid document. However, a program such as the W3C validator, which is specifically meant to discover errors in HTML documents, will notify you that your document is invalid.

How can i send a parameter with space to .net web api

I would like to receive a long string the contains spaces to my method in my web api
To my understanding i can't send a parameter with white spaces, does it have to be encoded in some way?
EDIT:
My content type is:
Content-Type: application/x-www-form-urlencoded
I've changed it to several other types but none of them allows me to receive a parameter with + instead of spaces
my post method signature is
public HttpResponseMessage EditCommentForExtension(string did, string extention, string comment)
Usually, parameters to an HTTP GET request are URL encoded. This means (among other) that spaces are replaced by "+".
Using + to mean "space" in a URL is an internal convention used by some web sites, but it's not part of the URL encoding standard. If you want to use + to means spaces, you are going to have to convert them yourself.
As you discovered, spaces (like everything else that needs encoding) should be encoded with %XX where X standards for a hex digit.
http://www.w3.org/Addressing/rfc1738.txt
The only thing that work for me is to add %20 instead of the spaces

Not sure why the output of my PHP scripts contains random embedded spaces within character strings

I have written several PHP scripts to read the contents of a database and output those contents in an email message. Every once in a while, I will see a SPACE (0x20) character embedded in the output where there shouldn't be any. For example, in one script, I reference a PHP global variable containing exactly "n" non-space characters, and sometimes (not always), when that variable is dumped to an email message, the string will appear with an embedded blank (making the total length of the string "n+1"). Other times, an HTML tag (such as <BR>) will appear as < BR> (note the SPACE before the "B").
Because the behavior of the script is not consistent (some emails are affected, and others aren't), I can't seem to find the problem.
I am enclosing a link to the PHP script that is occasionally embedding a space into the BREAK tag. I have removed the lines that provide specific login information to the databases. Otherwise, everything else is intact. In the code file you can find at the link below, line 281 is the one that contained the BREAK command with the embedded SPACE (as described above). This has happened only once!
http://jem-software.com/temptest.txt
I guess the only other potentially relevant information is that this script file is taken from code entered into a JUMI code block contained within a Joomla! based website.
Edit 1:
Thank you, Riccardo, for your suggestions. Here is some more clarification:
I am not reading an email and parsing the results in order to insert into a database. Just the opposite, I am reading from a database and using the results to create an email. I will check the database to see what character set was used, and explicitly pass the character set to see if that makes a difference.
I don't use Joomla functions to access the database because the database I am referencing is external to the Joomla! environment. It is a pre-existing database created from PHP scripts written several years prior. When my old website was re-written using Joomla, I wanted to "port" the PHP database access code intact, so I installed the JUMI plugin to make this possible.
I will check out the character coding in the database and synchronize it to the character code of the email message.
I don't understand how an issue with character coding would result in the insertion of a SPACE into the hard-coded HTML tag - this tag did not come from any database, but was typed into the email as a literal string.
This is a strange issue, but here are my two cents:
The first is you're not using Joomla functions to access the db and the mail subsystem. While this could work, it's not really nice.
The second is, this smells like a character set / codepage issue.
Here are a few considerations on the character set issue:
I read your code quickly, and I didn't notice anything wrong. But Joomla uses UTF-8, and your queries don't specify it (mysql_set_charset() is missing!) which could be a first issue.
The second is that the emails you read will have different character sets, depending on the senders' settings. Make sure you handle the codepage issues properly: the following is a snippet of a function I use for parsing email:
$mime = imap_fetchmime($this->connection, $this->messageNumber, $partNumber);
return $this->decodeMailBody($data,$mime); // QUOTED_PRINTABLE
function decodeMailBody($string,$mime) {
$str = quoted_printable_decode($string);
echo "<h3>mime: $mime; charset $charset</h3>";
//mime: Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
//mime: Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252
$mimes = explode('charset=',$mime);
foreach($mimes as $mimepiece) {
$charset = $mimepiece;
}
$charset = strtolower(trim($charset));
if ($charset == 'utf-8') {
return $str;
} else {
return iconv($charset, 'UTF-8', $str);
}
}
Last, make sure you use utf-8 when you insert the mail into the db after parsing it.

Bug in Chrome, or Stupidity in User? Sanitising inputs on forms?

I've written a more detailed post about this on my blog at:
http://idisposable.co.uk/2010/07/chrome-are-you-sanitising-my-inputs-without-my-permission/
but basically, I have a string which is:
||abcdefg
hijklmn
opqrstu
vwxyz
||
the pipes I've added to give an indiciation of where the string starts and ends, in particular note the final carriage return on the last line.
I need to put this into a hidden form variable to post off to a supplier.
In basically, any browser except chrome, I get the following:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz
" />
but in chrome, it seems to apply a .Trim() or something else that gives me:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz" />
Notice it's cut off the last carriage return. These carriage returns (when Encoded) come up as %0A if that helps.
Basically, in any browser except chrome, the whole thing just works and I get the desired response from the third party. In Chrome, I get an 'invalid pareq' message (which suggests to me that those last carriage returns are important to the supplier).
Chrome version is 5.0.375.99
Am I going mad, or is this a bug?
Cheers,
Terry
You can't rely on form submission to preserve the exact character data you include in the value of a hidden field. I've had issues in the past with Firefox converting CRLF (\r\n) sequences into bare LFs, and your experience shows that Chrome's behaviour is similarly confusing.
And it turns out, it's not really a bug.
Remember that what you're supplying here is an HTML attribute value - strictly, the HTML 4 DTD defines the value attribute of the <input> element as of type CDATA. The HTML spec has this to say about CDATA attribute values:
User agents should interpret attribute values as follows:
Replace character entities with characters,
Ignore line feeds,
Replace each carriage return or tab with a single space.
User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
So whitespace within the attribute value is subject to a number of user agent transformations - conforming browsers should apparently be discarding all your linefeeds, not only the trailing one - so Chrome's behaviour is indeed buggy, but in the opposite direction to the one you want.
However, note that the browser is also expected to replace character entities with characters - which suggests you ought to be able to encode your CRs and LFs as 
 and
, and even spaces as , eliminating any actual whitespace characters from your value field altogether.
However, browser compliance with these SGML parsing rules is, as you've found, patchy, so your mileage may certainly vary.
Confirmed it here. It trims trailing CRLFs, they don't get parsed into the browser's DOM (I assume for all HTML attributes).
If you append CRLF with script, e.g.
var pareqMsg = document.forms[0]['pareqMsg']
if (/\r\n$/.test(pareqMsg.value) == false)
pareqMsg.value += '\r\n';
...they do get maintained and POSTed back to the server. Although the hidden <textarea> idea suggested by Gaby might be easier!
Normally in an input box you cannot enter (by keyboard) a newline.. so perhaps chrome enforces this even for embedded, through the attributes, values ..
try using a textarea (with display:none)..

Resources