Include filename in MIME base64 encoding - mime

When encoding a picture, say, into a MIME base64 string, is there a standard way of also including its filename, or at least a suggested filename?

Content-Disposition: attachment; filename="picture.jpg". The Content-Type header can also contain a name= attribute although it is not recommended.
I am assuming email, but IIRC the same goes for HTTP.
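As a rough illustration of the answer above, here is a minimal sketch using Python's standard `email` package, which writes the `Content-Disposition` filename and applies base64 to binary data. The helper name `build_message_with_picture`, the subject line, and the placeholder bytes are assumptions made for the example, not part of the question.

```python
from email.message import EmailMessage


def build_message_with_picture(data: bytes, filename: str) -> EmailMessage:
    """Build a message whose attachment carries a suggested filename."""
    msg = EmailMessage()
    msg["Subject"] = "Picture attached"  # hypothetical subject
    msg.set_content("See the attached picture.")
    # Passing filename= makes the library emit a Content-Disposition header
    # with a filename parameter; bytes payloads are base64-encoded.
    msg.add_attachment(data, maintype="image", subtype="jpeg", filename=filename)
    return msg


# Placeholder bytes stand in for real JPEG data.
msg = build_message_with_picture(b"\xff\xd8\xff\xe0 placeholder", "picture.jpg")
attachment = next(msg.iter_attachments())
print(attachment["Content-Disposition"])
print(attachment["Content-Transfer-Encoding"])
```

The filename lives in the part's headers, not inside the base64 text itself, so a receiver reads it before decoding the body.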

Related

Are these email headers RFC-2047 compliant?

I have several clients using a mail client that I wrote myself. They have recently stumbled upon emails where attachment file names arrive as gibberish.
When I examined these emails, I discovered that there is apparently a local webmail service that sends attachment names as follows:
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
    name*="UTF-8''%D7%A2%D7%A8%D7%9B%D7%AA%20%D7%94%D7%A8%D7%A9%D7%9E%D7%94%20TCMP.docx"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename*=UTF-8''%D7%A2%D7%A8%D7%9B%D7%AA%20%D7%94%D7%A8%D7%A9%D7%9E%D7%94%20TCMP.docx
This is a totally invalid mime header according to RFC 2047. It has no quoted-printable identifier (?Q?), the different bytes are encoded with % instead of =, and the entire encoded-word should begin with =? and end with ?=, which it doesn't.
When I fix it to the correct format like so:
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
    name="=?UTF-8?Q?=D7=A2=D7=A8=D7=9B=D7=AA=20=D7=94=D7=A8=D7=A9=D7=9E=D7=94=20TCMP.docx?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename=?UTF-8?Q?=D7=A2=D7=A8=D7=9B=D7=AA=20=D7=94=D7=A8=D7=A9=D7=9E=D7=94=20TCMP.docx?=
then the header gets decoded correctly.
Can anyone tell me if I'm missing something here? Is there a new extension to RFC2047 that allows for these headers, or are they just completely wrong?
As mentioned by @alex-k, the name*= syntax is defined in RFC2231, which was written after RFC2047.
But to answer the question as asked, no. Neither set of headers is RFC2047 compliant.
The *= syntax was not in existence when RFC2047 was written, so the original ones do not conform.
The second set, with MIME encoded words, is invalid because it breaks the rules about where MIME encoded words are allowed according to section 5 of RFC2047, specifically both of these rules:
+ An 'encoded-word' MUST NOT appear within a 'quoted-string'.
+ An 'encoded-word' MUST NOT be used in parameter of a MIME
Content-Type or Content-Disposition field, or in any structured
field body except within a 'comment' or 'phrase'.
(Those rules are not consecutive in the RFC.)
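For a client that has to cope with the `name*=` / `filename*=` form, the extended value can be decoded by hand. The sketch below assumes a single, non-continued parameter value of the shape `charset'language'percent-encoded-octets`; the function name `decode_ext_value` is made up for this example, and RFC 2231 continuations (`filename*0*=`, `filename*1*=`) are deliberately not handled.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode one RFC 2231 extended parameter value.

    Expected shape: charset'language'percent-encoded-octets.
    Surrounding double quotes (seen in the wild) are tolerated.
    """
    ext_value = ext_value.strip().strip('"')
    # Split on the first two apostrophes only; the language tag may be empty.
    charset, _language, encoded = ext_value.split("'", 2)
    # An empty charset is treated as US-ASCII in this sketch.
    return unquote(encoded, encoding=charset or "ascii", errors="strict")


raw = "UTF-8''%D7%A2%D7%A8%D7%9B%D7%AA%20%D7%94%D7%A8%D7%A9%D7%9E%D7%94%20TCMP.docx"
print(decode_ext_value(raw))
```

Python's own `email` package also understands this syntax when parsing whole messages, so hand-decoding is mainly useful inside a custom parser.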

Can a multipart 7bit MIME message contain submessages of type 8bit or binary?

I am new to MIME, and I don't know if the following situation is valid:
Consider two nested MIME messages: the top-level message has Content-Transfer-Encoding: 7bit
The body of the top-level message is a nested MIME message that has Content-Transfer-Encoding: binary. The body of the internal message has lines that end in LF only, rather than CRLF.
I think this message is invalid, because the rules for 7bit say that LF by itself is not valid. However, a colleague is arguing that this message is valid, because the Content-Transfer-Encoding of the inner message is binary, which doesn't have any restrictions around CR LF.
My argument is that the entire body of the top-level message needs to conform to its encoding (7bit), regardless of the Content-Transfer-Encoding of any nested messages.
I've searched the web and tried to find the answer in the MIME spec, but was not able to find anything that seemed to address this particular situation.
Found an answer in section 6.4 of RFC 2045:
It should also be noted that, by definition, if a composite entity has
a transfer-encoding value such as "7bit", but one of the enclosed
entities has a less restrictive value such as "8bit", then either the
outer "7bit" labelling is in error, because 8bit data are included, or
the inner "8bit" labelling placed an unnecessarily high demand on the
transport system because the actual included data were actually
7bit-safe.
So the message in my example is invalid.
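The 7bit constraints that the outer body must satisfy can be checked mechanically. This is a minimal sketch based on the definition of 7bit data in RFC 2045 (no octets above 127, no NULs, CR and LF only as CRLF pairs, at most 998 octets per line); the function name `is_7bit_clean` is an assumption for the example.

```python
def is_7bit_clean(body: bytes) -> bool:
    """Return True if the body satisfies the 7bit rules of RFC 2045."""
    # No NUL octets and no octets with the high bit set.
    if any(octet == 0 or octet > 127 for octet in body):
        return False
    for line in body.split(b"\r\n"):
        # After splitting on CRLF, any remaining CR or LF is a bare one.
        if b"\r" in line or b"\n" in line:
            return False
        # Lines are limited to 998 octets, excluding the CRLF.
        if len(line) > 998:
            return False
    return True


print(is_7bit_clean(b"line one\r\nline two\r\n"))  # CRLF line endings
print(is_7bit_clean(b"line one\nline two\n"))      # bare LF, as in the nested part
```

Run against the whole body of the top-level entity, a check like this would reject the LF-only nested part regardless of how that part labels itself.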

Charset in data URI

Over the years from reading the evolving specs I had assumed that RFC 3986 had finally settled on UTF-8 encoding for escape octet sequences. That is, if my URI has %XX%YY%ZZ I can take that sequence of decoded octets (for any URI in the scheme-specific part) and interpret the resulting bytes as UTF-8 to find out what decoded information was intended. In practical terms, I can call JavaScript decodeURIComponent() which does this decoding automatically for me.
Then I read the spec for data: URIs, RFC 2397, which includes a charset argument, which (naturally) indicates the charset of the encoded data. But how does that work? If I have a two-octet encoded sequence %XX%YY in my data: URI, does a charset=iso-8859-1 indicate that the two decoded octets should not be interpreted as a UTF-8 sequence, but as two separate Latin characters (as each byte in ISO-8859-1 represents a character)? RFC 2397 seems to indicate this, as it gives an example of "greek [sic] characters":
data:text/plain;charset=iso-8859-7,%be%fg%be
But this means that JavaScript decodeURIComponent() (which assumes UTF-8 encoded octets) can't be used to extract a string from a data URI, correct? Does this mean I have to create my own decoding for data URIs if the charset is something besides UTF-8?
Furthermore, does this mean that RFC 2397 is now in conflict with RFC 3986, which seems to indicate that UTF-8 is assumed? Or does RFC 3986 only refer to "new URI scheme[s]", meaning that the data: URI scheme gets grandfathered in and has its own technique for specifying what the encoded octets mean?
My best guess at the moment is that data: plays by its own rules and if it indicates a charset other than UTF-8, I'll have to use something other than decodeURIComponent() in JavaScript. Any recommendations on a replacement method would be welcome, too.
Remember that the data: URI scheme describes a resource that can be thought of as a file which consists of an opaque bytestream just as though it were a http: URI (the same bytestream, but stored on an HTTP server) or an ftp: URI (the same bytestream, but stored on an FTP server) or a file: URI (the same bytestream, but stored on your local filesystem). Only the metadata attached to the file gives the bytestream meaning.
RFC 2397 gives a clear specification on how this bytestream is to be embedded in the URI itself (in contrast to other URI schemes, where the URI gives instructions on where to fetch the bytestream, not what it contains). It might be base64 or it might be the percent-encoding method given in the RFC. Base64 is going to be more compact if the bytestream contains many non-ASCII bytes.
The data: URI also describes its own Content-Type, which gives the intended interpretation of the bytestream. In this case, since you have used text/plain;charset=iso-8859-7, the bytes must be correctly encoded ISO-8859-7 text. The bytes will definitely not be decoded as UTF-8 or any other character encoding. They will be unambiguously decoded using the character encoding you have specified.
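The two-step idea (first recover the raw bytes, then interpret them with the declared charset) can be sketched as follows. The question asks about JavaScript; this example uses Python purely to illustrate the order of operations. The function name `decode_data_uri` is hypothetical, the parsing is simplified (no quoted parameter values), and the sample URI uses `%d3` in place of the RFC's `%fg`, which is not valid hex.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> str:
    """Decode a textual data: URI using the charset it declares."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URI has no comma separating header and data")
    params = header.split(";")
    is_base64 = any(p.lower() == "base64" for p in params[1:])
    charset = "ascii"  # RFC 2397 default is US-ASCII
    for param in params[1:]:
        name, _, value = param.partition("=")
        if name.lower() == "charset" and value:
            charset = value
    # Step 1: percent-decoding yields bytes, with no charset assumed.
    raw = unquote_to_bytes(payload)
    if is_base64:
        raw = base64.b64decode(raw)
    # Step 2: interpret the bytes with the declared charset.
    return raw.decode(charset)


print(decode_data_uri("data:text/plain;charset=iso-8859-7,%be%d3%be"))
```

The same structure applies in any language: a percent-decoder that returns bytes, followed by a charset-aware text decoder, rather than a single call that assumes UTF-8.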

How can I send a parameter with spaces to a .NET Web API

I would like to receive a long string that contains spaces in a method of my Web API.
To my understanding I can't send a parameter with white space as-is. Does it have to be encoded in some way?
EDIT:
My content type is:
Content-Type: application/x-www-form-urlencoded
I've changed it to several other types, but none of them lets me receive a parameter with + in place of the spaces.
My POST method signature is:
public HttpResponseMessage EditCommentForExtension(string did, string extention, string comment)
Usually, parameters to an HTTP GET request are URL encoded. This means (among other things) that spaces are replaced by "+".
Using + to mean a space is a convention of HTML form encoding (application/x-www-form-urlencoded); it is not part of generic URL percent-encoding. If you want + to mean a space anywhere else, you are going to have to convert it yourself.
As you discovered, spaces (like everything else that needs encoding) should be encoded as %XX, where each X stands for a hex digit; a space is %20.
http://www.w3.org/Addressing/rfc1738.txt
The only thing that worked for me was to use %20 instead of the spaces.
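The difference between the two conventions discussed above is easy to see side by side. This sketch uses Python's `urllib.parse` only as an illustration of the encodings themselves; it says nothing about how a particular .NET Web API binds parameters, and the sample string is made up.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

comment = "a long comment"

# Generic percent-encoding: a space becomes %20.
print(quote(comment))        # a%20long%20comment

# Form encoding (application/x-www-form-urlencoded): a space becomes +.
print(quote_plus(comment))   # a+long+comment

# A decoder only turns + back into a space if it applies form-decoding rules.
print(unquote("a+long+comment"))       # a+long+comment
print(unquote_plus("a+long+comment"))  # a long comment
```

%20 is understood by both kinds of decoder, which is consistent with it being the form that worked here.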

MIME RFC "Content-Type" parameter confusion? Unclear RFC specification

I'm trying to implement a basic MIME parser for multipart/related in C++/Qt.
So far I've been writing some basic parser code for headers, and I'm reading the RFCs to get an idea how to do everything as close to the specification as possible. Unfortunately there is a part in the RFC that confuses me a bit:
From RFC822 Section 3.1.1:
Each header field can be viewed as a single, logical line of
ASCII characters, comprising a field-name and a field-body.
For convenience, the field-body portion of this conceptual
entity can be split into a multiple-line representation; this
is called "folding". The general rule is that wherever there
may be linear-white-space (NOT simply LWSP-chars), a CRLF
immediately followed by AT LEAST one LWSP-char may instead be
inserted. Thus, the single line
Alright, so I simply parse a header field and if a CRLF follows with linear whitespace, I simply concat those in a useful manner to result in a single header line. Let's proceed...
From RFC2045 Section 5.1:
In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:
content := "Content-Type" ":" type "/" subtype
*(";" parameter)
; Matching of media type and subtype
; is ALWAYS case-insensitive.
[...]
parameter := attribute "=" value
attribute := token
; Matching of attributes
; is ALWAYS case-insensitive.
value := token / quoted-string
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
or tspecials>
Okay. So it seems if you want to specify a Content-Type header with parameters, simply do it like this:
Content-Type: multipart/related; foo=bar; something=else
... and a folded version of the same header would look like this:
Content-Type: multipart/related;
    foo=bar;
    something=else
Correct? Good. As I kept reading the RFCs, I came across the following in RFC2387 Section 5.1 (Examples):
Content-Type: Multipart/Related; boundary=example-1
    start="<950120.aaCC@XIson.com>";
    type="Application/X-FixedRecord"
    start-info="-o ps"
--example-1
Content-Type: Application/X-FixedRecord
Content-ID: <950120.aaCC@XIson.com>
[data]
--example-1
Content-Type: Application/octet-stream
Content-Description: The fixed length records
Content-Transfer-Encoding: base64
Content-ID: <950120.aaCB@XIson.com>
[data]
--example-1--
Hmm, this is odd. Do you see the Content-Type header? It has a number of parameters, but not all have a ";" as parameter delimiter.
Maybe I just didn't read the RFCs correctly, but if my parser works strictly like the specification defines, the type and start-info parameters would result in a single string or worse, a parser error.
Guys, what's your thought on this? Just a typo in the RFCs? Or did I miss something?
Thanks!
It is a typo in the examples. Parameters must always be delimited with semicolons correctly, even when folded. The folding is not meant to change the semantics of a header, only to allow for readability and to account for systems that have line length restrictions.
Quite possibly a typo, but in general (and from experience) you should be able to handle this kind of thing "in the wild" as well. In particular, mail clients vary wildly in their ability to generate valid messages and follow all of the relevant specifications (if anything, it's even worse in the email/SMTP world than it is in the WWW world!)
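To make the strict reading concrete, here is a small sketch of unfolding followed by a parser that follows the RFC 2045 grammar quoted in the question, where every parameter must be introduced by ";". The question targets C++/Qt; Python is used here only to keep the example short. The names `unfold` and `parse_content_type` are hypothetical, and the token pattern is a simplification of the RFC's `tspecials` rule.

```python
import re

# One "; name=value" parameter, where value is a token or a quoted-string.
_PARAM = re.compile(r'''
    \s*;\s*
    (?P<name>[^\s;="]+)
    \s*=\s*
    (?:"(?P<quoted>(?:[^"\\]|\\.)*)"|(?P<token>[^\s;"]+))
''', re.VERBOSE)


def unfold(field_body: str) -> str:
    """Undo folding: drop each CRLF that is followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", field_body)


def parse_content_type(field_body: str):
    """Return (media_type, params) or raise ValueError on a grammar violation."""
    match = re.match(r"\s*([^\s;/]+/[^\s;]+)", field_body)
    if not match:
        raise ValueError("missing type/subtype")
    media_type = match.group(1).lower()  # matching is case-insensitive
    params = {}
    pos = match.end()
    while field_body[pos:].strip():
        param = _PARAM.match(field_body, pos)
        if not param:
            raise ValueError(f"expected ';' and a parameter at offset {pos}")
        value = param.group("quoted")
        if value is not None:
            value = re.sub(r"\\(.)", r"\1", value)  # undo quoted-pair escapes
        else:
            value = param.group("token")
        params[param.group("name").lower()] = value
        pos = param.end()
    return media_type, params


folded = "multipart/related;\r\n foo=bar;\r\n something=else"
print(parse_content_type(unfold(folded)))
```

Fed the RFC 2387 example header, this strict parser raises `ValueError` at the point where the semicolon is missing. A parser meant for real-world mail might instead choose to accept whitespace as a separator there, which is a leniency decision rather than something the grammar allows.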
