URN for MIME Type - mime

Does an official URN for the MIME type exist?
Mozilla Firefox and other applications use notations like "urn:mimetype:text/plain" or "urn:mimetype:handler:text/plain".
There are two problems with this approach:
1. No "mimetype" namespace exists according to the IANA's official registry (http://www.iana.org/assignments/urn-namespaces/). Thus, only "urn:x-mimetype" would be a valid namespace (according to RFC 3406 section 4.1).
2. The slash "/" may not be used in URNs according to RFC 2141 section 2.2, but it could be encoded as "%2F".
All things considered, is there another way to represent the MIME type "text/plain" as a URN than "urn:x-mimetype:text%2Fplain"?
[UPDATE: Thinking about it, a URI would be OK too, but I can't find a URI for MIME types either.]
Thanks

FYI, I read RFC 2141 as saying that "/" SHOULD NOT appear unencoded, rather than MUST NOT.
For this approach I would just use the URI of the assignment, e.g. <http://www.iana.org/assignments/media-types/application/zip>. The only caveat is that not all of them dereference. If you can live with that though, you should be OK.

Related

When making SNMPv3 connection, is it necessary to specify "Context Name"

When we make an SNMPv3 connection, these are the main parameters:
SNMPV3UserName
SNMPV3ContextName
SNMPV3SecurityLevel
SNMPV3AuthProtocol
SNMPV3AuthPassword
SNMPV3PrivacyControl
SNMPV3PrivacyPassword
I want to understand whether it is necessary to specify "SNMPV3ContextName" when connecting. In the SNMP RFCs and other links I did not find any clear mention of this.
I have an application that asks for a context name if the user does not enter one. I suspect it should not ask for one, since it seems to be an optional parameter.
RFC I referred to: https://www.rfc-editor.org/rfc/rfc5343
tl;dr: Probably not.
RFC 5343 says:
The contextName is a character string (following the SnmpAdminString textual convention of the SNMP-FRAMEWORK-MIB [RFC3411])
and RFC 3411 defines SnmpAdminString as an OCTET STRING (SIZE (0..255)).
So, it can be empty. I can't find anything to constrain this more, so an empty string is permitted. Per these RFCs (and also RFC 3412) it seems to be a way to add multiple contexts on top of the contextEngineID, if your engine needs this disambiguating functionality (to treat it as multiple engines, in a sense).
However, as with anything SNMP, some implementations may impose their own constraints, or just flat-out not follow the spec properly. So you should consult the documentation for the technology that you're using.

If Content-disposition is not safe to use, what can we use instead?

I've read here that using Content-Disposition has security issues and that it is not part of the HTTP standard. If Content-Disposition is not safe, what can we use instead?
I've also searched a list of all response header fields, categorized by whether they are part of the standard, and did not find one that could replace Content-Disposition.
Well, the information about not being a standard is incorrect - see https://greenbytes.de/tech/webdav/rfc6266.html and http://www.iana.org/assignments/message-headers/message-headers.xhtml (note that Wikipedia is entirely irrelevant with respect to this).

What does UTypeAnnotation in CoreAnnotations do?

I checked the documentation, but it doesn't explain this.
The links I referred to are:
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html
and
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.UTypeAnnotation.html
The only thing I understood from the links is that it returns the Unicode type class. They don't mention what "class" means here or what the classification is based on.
This is used by the Chinese segmenter. See, for example, edu/stanford/nlp/wordseg/Sighan2005DocumentReaderAndWriter.java. It is then used in the Chinese Segmenter feature factory (edu/stanford/nlp/wordseg/Gale2007ChineseSegmenterFeatureFactory.java).

MIME subject decoding, when RFC are not respected

The Subject header field is in ASCII: every character outside the ASCII table has to be Q-encoded or Base64-encoded. The Content-Type header field also has nothing to do with the way the subject is encoded. Am I correct?
However (and unfortunately), some clients (Microsoft Outlook 6, for example) insert a string in some other encoding (Big5, for example) into the header, without using Q or Base64 encoding to declare that the string is Big5. How can I handle these wrongly encoded emails? Is there a standard way to parse them?
My goal is the broadest compatibility possible, even if that means using third-party paid programs. How can I do that?
The Subject header's encoding has nothing to do with the Content-Type header. There is no "perfect" way to handle Subject. I implemented this with a hack that checks whether all characters of the text fit in Big5 and, if not, tries the next encoding in order:
Big5, UTF-8, Latin-1, Q/Base64 and finally ASCII.

How can I use unicode in "mailto" protocol?

I want to launch the default e-mail client via the ShellExecute function.
That is, I write something like this:
ShellExecute(0, 'mailto:example@example.com?subject=example&body=example', ...);
How can I encode non-US characters in subject and body?
I can't use the default ANSI code page, because the characters can be anything: Chinese, Cyrillic or something else.
P.S. Notes:
I'm using ShellExecuteW function.
Leaving subject and body "as is" will not work (tested with Windows Live Mail client on Win7 and Outlook Express on WinXP).
Encoding subject as URLEncode(UTF8Encode(Subject)) will work for Windows Live Mail, but won't work for Outlook Express.
URLEncode(UTF8Encode(Body)) does not work with either client.
mailto:example@example.com?subject=example&body=%e5%85%ad
The short answer is no. Characters must be percent-encoded as defined by RFC 3986 and its predecessors. RFC 2368 defines the structure of the mailto URI.
#include <windows.h>

int main(void) {
    /* body=%e5%85%ad is U+516D as percent-encoded UTF-8 */
    ShellExecute(NULL, TEXT("open"),
                 TEXT("mailto:example@example.com?subject=example&body=%e5%85%ad"),
                 NULL, NULL, SW_SHOWNORMAL);
    return 0;
}
The body in this case is the CJK character U+516D (六) encoded as UTF-8 (E5 85 AD). This works correctly with Mozilla Thunderbird (you may need to install additional fonts if it does not).
The rest is up to how your user-agent (mail client) interprets the URI. RFC 3986 mandates UTF-8, but prior specifications did not. A user-agent may fail to interpret the data correctly if it pre-dates RFC 3986, has not been updated or is maintaining backwards compatibility with prior implementations.
Note: URLEncode functions generally implement the HTML application/x-www-form-urlencoded encoding, which will probably replace space characters with plus characters.
Note 2: I'm not current on the state of IRI support in the Windows shell, but it's probably worth looking into. However, some characters in the query part will still need to be percent-encoded.
The interpretation of the command line is up to the launched program. Depending on the installed e-mail client, you may or may not get Unicode support (in one form or another). So there's no single recipe: some clients may use an ANSI command line (because why not?), some may respect URL-encoded characters, etc.
Your best bet is to detect 3-4 popular mailers by reading the registry and customize your command line accordingly. Very inelegant, and incomplete by design, but nothing else you can do.
