For example, take this post:
https://alvinalexander.com/blog/post/servlets/how-put-object-request-httpservletrequest-servlet
request.setAttribute("YOUR_KEY", yourVariable);
How can I make yourVariable a UTF-8 encoded string?
Thanks !
In Java Servlets, request-scoped variables are internal to the JVM, so you don't have to worry about encoding them. They're just regular Java strings, which are internally stored as a sequence of 16-bit characters. You only have to worry about encoding strings as UTF-8 (or decoding them from UTF-8) when sending them outside of the JVM (or receiving them from outside of it).

You could encode a Java string into a byte buffer using UTF-8, but then it would just be a byte buffer, not a string. You're best off treating strings within the JVM as regular String instances and only UTF-8 encoding them when sending them to a destination that expects UTF-8. If you're using the string in a JSP, then (assuming that the JSP uses UTF-8) the string will be encoded as UTF-8 during the rendering of the JSP.
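To make the boundary concrete, here is a minimal sketch (plain Java rather than servlet code; the string and its contents are made up for illustration). The value stays a regular String inside the JVM and only becomes UTF-8 bytes at the moment it is written out:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Boundary {
    public static void main(String[] args) throws IOException {
        // Inside the JVM this is just a String (stored as 16-bit chars).
        String yourVariable = "héllo wörld";

        // Only when the value leaves the JVM do we pick an encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(yourVariable);
        }
        byte[] utf8Bytes = out.toByteArray();

        // Decoding the bytes back with UTF-8 recovers the identical String.
        String roundTripped = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(roundTripped.equals(yourVariable)); // prints "true"
    }
}
```

In an actual servlet the equivalent boundary step is calling response.setCharacterEncoding("UTF-8") before writing, so the container performs this same conversion for you.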
I have an external system written in Ruby which sends data over the wire encoded with ASCII_8BIT. How should I decode and encode it in Scala?
I couldn't find a library for decoding and encoding ASCII_8BIT strings in Scala.
As I understand it, ASCII_8BIT is something similar to Base64. However, there is more than one Base64 encoding. Which type of encoding should I use to be sure I cover all corner cases?
What is ASCII-8BIT?
ASCII-8BIT is Ruby's binary encoding (the name "BINARY" is accepted as an alias for "ASCII-8BIT" when specifying the name of an encoding). It is used both for binary data and for text whose real encoding you don't know.
Any sequence of bytes is a valid string in the ASCII-8BIT encoding, but unlike other 8-bit encodings, only the bytes in the ASCII range are considered printable characters (and of course only those that are printable in ASCII). The bytes in the 128-255 range are considered special characters that don't have a representation in other encodings. So trying to convert an ASCII-8BIT string to any other encoding will fail (or replace the non-ASCII characters with question marks, depending on the options you give to encode) unless it only contains ASCII characters.
What's its equivalent in the Scala/JVM world?
There is no strict equivalent. If you're dealing with binary data, you should be using binary streams that don't have an encoding and aren't treated as containing text.
If you're dealing with text, you'll either need to know (or somehow figure out) its encoding or just arbitrarily pick an 8-bit ASCII-superset encoding. That way non-ASCII characters may come out as the wrong character (if the text was actually encoded with a different encoding), but you won't get any errors because any byte is a valid character. You can then replace the non-ASCII characters with question marks if you want.
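On the JVM that "arbitrarily pick an 8-bit ASCII-superset encoding" approach looks like the sketch below (Java shown; the Scala calls are identical since both use java.nio.charset). ISO-8859-1 maps every byte value to a character, so the decode can never fail; the example bytes are made up:

```java
import java.nio.charset.StandardCharsets;

public class AsciiSuperset {
    public static void main(String[] args) {
        // Bytes of unknown origin; 0xE9 is outside the ASCII range.
        byte[] wireBytes = { 'h', 'i', (byte) 0xE9 };

        // ISO-8859-1 defines a character for every byte value 0-255,
        // so this decode always succeeds -- though non-ASCII bytes may
        // come out as the "wrong" character if the real encoding differed.
        String text = new String(wireBytes, StandardCharsets.ISO_8859_1);
        System.out.println(text.length()); // 3: one char per byte

        // Optionally replace anything outside ASCII with question marks.
        String asciiOnly = text.replaceAll("[^\\x00-\\x7F]", "?");
        System.out.println(asciiOnly); // hi?
    }
}
```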
What does this have to do with Base64?
Nothing. Base64 is a way to represent binary data as ASCII text. It is not itself a character encoding. Knowing that a string has the character encoding ASCII or ASCII-8BIT or any other encoding, doesn't tell you whether it contains Base64 data or not.
But do note that a Base64 string will consist entirely of ASCII characters (and not just any ASCII characters, but only letters, numbers, +, / and =). So if your string contains any non-ASCII character or any character except the aforementioned, it's not Base64.
Therefore any Base64 string can be represented as ASCII. So if you have an ASCII-8BIT string containing Base64 data in Ruby, you should be able to convert it to ASCII without any problems. If you can't, it's not Base64.
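A quick JVM-side sketch of that last point (the sample string is illustrative): if the data really is Base64, it passes a simple character check and java.util.Base64 decodes it; anything else fails fast.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Check {
    public static void main(String[] args) {
        // "hello" Base64-encoded is "aGVsbG8=" -- pure ASCII by construction.
        String candidate = "aGVsbG8=";

        // Base64 uses only A-Z, a-z, 0-9, '+', '/' and trailing '=' padding.
        boolean looksLikeBase64 = candidate.matches("[A-Za-z0-9+/]*={0,2}");
        System.out.println(looksLikeBase64); // true

        // The decoder throws IllegalArgumentException on invalid input.
        byte[] decoded = Base64.getDecoder().decode(candidate);
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // hello
    }
}
```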
In almost all examples, a UUID is encoded to UTF-8, for example:
"aa4aaa2c-c6ca-d5f5-b8b2-0b5c78ee2cb7".getBytes(StandardCharsets.UTF_8)
Isn't a UUID ASCII? Why does everyone encode to UTF-8?
A UUID is a 128-bit object (see RFC 4122). Your example is the textual representation in hexadecimal of a UUID value.
There is no particular encoding required for UUIDs. I guess UTF-8 is probably used because it is the default encoding for various exchange formats, such as JSON.
What do you mean by "UUID is not ascii format?" UUID is a 128-bit number, and this is one (ambiguous) way to encode it into a string. Do you mean "why do people use UTF-8 when ASCII is equivalent?" Because it's good habit to use UTF-8 for most things unless you have a reason not to. When it's equivalent to ASCII, it's the same, so it doesn't matter. When it's not equivalent to ASCII, you usually wanted UTF-8.
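Since the textual form of a UUID uses only hex digits and hyphens, its ASCII and UTF-8 byte encodings are byte-for-byte identical. A quick Java check, using the UUID string from the question:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

public class UuidEncoding {
    public static void main(String[] args) {
        String uuid = UUID.fromString("aa4aaa2c-c6ca-d5f5-b8b2-0b5c78ee2cb7").toString();

        byte[] ascii = uuid.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = uuid.getBytes(StandardCharsets.UTF_8);

        // Hex digits and '-' are all ASCII, and UTF-8 encodes ASCII bytes
        // unchanged, so the two byte arrays are identical.
        System.out.println(Arrays.equals(ascii, utf8)); // true
        System.out.println(ascii.length); // 36: 32 hex digits + 4 hyphens
    }
}
```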
I'm writing a Go package for communicating with a 3rd-party vendor's API. Their documentation states roughly this:
Our API uses the ISO-8859-1 encoding. If you fail to use ISO-8859-1 for encoding special characters, this will result in unexpected errors or malformed strings.
I've been doing research on the subject of charsets and encodings, trying to figure out how to "encode special characters" in ISO-8859-1, but based on what I've found this seems to be a red herring.
From StackOverflow, emphasis mine:
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
ISO-8859-1 is a binary encoding format where each possible value of a single byte maps to a specific character. It's certainly within my power to have my HTTP POST body encoded in this way, but not any characters beyond the 256 defined in the spec.
I gather that, to encode a special character (such as the Euro symbol) in ISO-8859-1, it would first need to be escaped in some way.
Is there some kind of standard ISO-8859-1 escaping? Would it suffice to URL-encode any special characters and then encode my POST body in ISO-8859-1?
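To see why the Euro sign is the sticking point, here is a sketch in Java (the behavior is language-agnostic, so the same applies to Go's golang.org/x/text encoders): ISO-8859-1 simply has no byte for '€' (U+20AC), so an encoder must either refuse it or substitute a replacement.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Iso88591Limits {
    public static void main(String[] args) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();

        // Characters with Unicode code points below 256 encode fine...
        System.out.println(encoder.canEncode("café")); // true

        // ...but the Euro sign (U+20AC) has no ISO-8859-1 byte at all.
        System.out.println(encoder.canEncode("€"));    // false

        // String.getBytes silently substitutes '?' instead of failing.
        byte[] bytes = "€5".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // ?5
    }
}
```

So "encoding special characters" in ISO-8859-1 cannot mean encoding arbitrary Unicode directly; characters outside the 256-character repertoire need some application-level escape (such as percent-encoding or HTML entities) agreed with the vendor.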
I am trying to convert a string to UTF-8 in a FreeMarker template.
Is there a way to encode a string in FreeMarker?
If by UTF-8 encoding you mean percentage escaping, like példa to p%C3%A9lda, then it's done as myString?url (or if it's more familiar this way: ${myString?url}). However, the charset used by ?url depends on the url_encoding_charset FreeMarker configuration setting, which should be set to UTF-8 in your application. (It's also possible to specify the charset directly, like in myString?url('UTF-8').)
Documentation: http://freemarker.org/docs/ref_builtins_string.html#ref_builtin_url
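For reference, the escaping that ?url performs (with url_encoding_charset set to UTF-8) can be reproduced with plain Java's URLEncoder, which is an easy way to sanity-check the template output. The example string is the one from the answer above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEscaping {
    public static void main(String[] args) {
        // 'é' (U+00E9) becomes the UTF-8 bytes C3 A9, percent-escaped.
        String escaped = URLEncoder.encode("példa", StandardCharsets.UTF_8);
        System.out.println(escaped); // p%C3%A9lda
    }
}
```

One caveat: URLEncoder follows the application/x-www-form-urlencoded convention and turns spaces into '+', so for strings containing spaces the two may differ slightly.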
I'm using V8 to run JavaScript in native (C++) code. To call a JavaScript function I need to convert all the parameters to V8 data types.
For example, code to convert a char* to a V8 data type:
char* value;
...
v8::String::New(value);
Now, I need to pass Unicode chars (wchar_t) to JavaScript.
First of all, does V8 support Unicode chars? If yes, how do I convert a wchar_t/std::wstring to a V8 data type?
I'm not sure if this was the case at the time this question was asked, but at the moment the V8 API has a number of functions which support UTF-8, UTF-16 and Latin-1 encoded text:
https://github.com/v8/v8/blob/master/include/v8.h
The relevant functions to create new string objects are:
String::NewFromUtf8 (UTF-8 encoded, obviously)
String::NewFromOneByte (Latin-1 encoded)
String::NewFromTwoByte (UTF-16 encoded)
Alternatively, you can avoid copying the string data and construct a V8 string object that refers to existing data (whose lifecycle you control):
String::NewExternalOneByte (Latin-1 encoded)
String::NewExternalTwoByte (UTF-16 encoded)
Unicode just maps characters to numbers. What you need is a proper encoding, like UTF-8 or UTF-16.
V8 seems to support UTF-8 (v8::String::WriteUtf8) and an otherwise undescribed 16-bit type (Write). I would give it a try and write some UTF-16 into it.
In Unicode applications, Windows stores UTF-16 in std::wstring. Maybe try something like
std::wstring yourString;
v8::String::New((uint16_t*) yourString.c_str());
(On Windows wchar_t is 16 bits wide, so the cast to uint16_t* hands V8 UTF-16 code units; there is no String::New overload taking wchar_t* directly.)
No, it doesn't have Unicode support; the above solution is fine.
The following code did the trick:
wchar_t path[1024] = L"gokulestás";
v8::String::New((uint16_t*)path, wcslen(path))