Does V8 have Unicode support? - v8

I'm using V8 to run JavaScript from native (C++) code. To call a JavaScript function, I need to convert all the parameters to V8 data types.
For example, this code converts a char* to a V8 data type:
char* value;
...
v8::String::New(value);
Now I need to pass Unicode characters (wchar_t) to JavaScript.
First of all, does V8 support Unicode characters? If yes, how do I convert a wchar_t/std::wstring to a V8 data type?

I'm not sure if this was the case at the time this question was asked, but at the moment the V8 API has a number of functions which support UTF-8, UTF-16 and Latin-1 encoded text:
https://github.com/v8/v8/blob/master/include/v8.h
The relevant functions to create new string objects are:
String::NewFromUtf8 (UTF-8 encoded, obviously)
String::NewFromOneByte (Latin-1 encoded)
String::NewFromTwoByte (UTF-16 encoded)
Alternatively, you can avoid copying the string data and construct a V8 string object that refers to existing data (whose lifecycle you control):
String::NewExternalOneByte (Latin-1 encoded)
String::NewExternalTwoByte (UTF-16 encoded)

Unicode only maps characters to numbers (code points). What you need is a concrete encoding, such as UTF-8 or UTF-16.
V8 seems to support UTF-8 (v8::String::WriteUtf8) and a 16-bit type that is not described further (v8::String::Write). I would give it a try and pass some UTF-16 into it.
In Unicode applications, Windows stores UTF-16 in std::wstring. Maybe try something like:
std::wstring yourString;
// Assumes wchar_t is 16 bits wide, as on Windows.
v8::String::New(reinterpret_cast<const uint16_t*>(yourString.c_str()), yourString.length());

No it doesn't have unicode support, the above solution is fine.

The following code did the trick:
wchar_t path[1024] = L"gokulestás";
// The cast assumes wchar_t is 16 bits wide, as on Windows.
v8::String::New((uint16_t*)path, wcslen(path));
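On platforms where wchar_t is 32 bits (most Unix-like systems), casting the buffer to uint16_t* would reinterpret it incorrectly. A minimal sketch of a portable conversion to UTF-16 follows. The helper name WideToUtf16 is hypothetical, and the code assumes the wide string holds UTF-16 code units when wchar_t is 16 bits and Unicode code points when it is 32 bits.

```cpp
#include <cstdint>
#include <string>

// Convert a std::wstring to UTF-16 regardless of the width of wchar_t.
// Hypothetical helper, not part of V8.
std::u16string WideToUtf16(const std::wstring& wide) {
    std::u16string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if (cp < 0x10000) {
            // BMP code point, or an existing UTF-16 code unit when
            // wchar_t is 16 bits: copy it through unchanged.
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            // Supplementary code point: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            // Out of Unicode range: substitute U+FFFD.
            out.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return out;
}
```

The result's data() (cast to const uint16_t*) and size() could then be handed to whichever two-byte string constructor your V8 version provides, such as the String::NewFromTwoByte function mentioned in the earlier answer. Its exact parameters differ between V8 versions, so check your v8.h.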

Related

Universal encoding detector in golang?

There is some sample code, http.DetectContentType(buffer[:n]), but it detects only a limited set of charsets; in cases like ANSI it recognizes the content as UTF-8.
Is there a universal solution for this problem?
To check that a byte slice is a valid UTF-8 string, you can use utf8.Valid.

Can you use UTF-8 code for HttpServletRequest.setAttribute?

For example, take this:
https://alvinalexander.com/blog/post/servlets/how-put-object-request-httpservletrequest-servlet
request.setAttribute("YOUR_KEY", yourVariable);
How do I make yourVariable a UTF-8 encoded string?
Thanks!
In Java Servlets, request-scoped variables are internal to the JVM, so you don't have to worry about encoding them. They're just regular Java strings, which are internally stored as a series of 16-bit characters. You only have to worry about encoding strings as UTF-8 (or decoding them from UTF-8) when sending them outside of the JVM (or receiving them from outside of the JVM). You could encode a Java string into a byte buffer using UTF-8, but then it would just be a byte buffer, not a string. You're best off treating strings within the JVM as regular String instances and only UTF-8 encoding them when sending them to a destination that expects UTF-8. If you're using the string in a JSP, then (assuming that the JSP is using UTF-8) the string will be encoded as UTF-8 during the rendering of the JSP.

Decoding unicode code point into utf8 using ICU

I have a Unicode code point stored as a string.
std::string code = "0663";
I need to convert it to UTF-8 and get the result as a std::string using the ICU library.
I decided to use ICU to get a cross-platform, bitness-independent solution.
Untested:
Convert the string into an int32_t.
Treat the int32_t as a UChar32.
Create a UnicodeString with UnicodeString::setTo from the UChar32.
Create a string object with UnicodeString::toUTF8String from the UnicodeString.
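To show what the steps above compute, here is a minimal sketch that does the same conversion by hand in standard C++, without ICU. It assumes the string holds the code point in hexadecimal (so "0663" means U+0663); the function names are hypothetical. With ICU, the encoding step would be done by UnicodeString and toUTF8String instead.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Encode one Unicode scalar value as UTF-8 (hand-rolled, not ICU).
std::string EncodeUtf8(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Parse a hexadecimal code point such as "0663" and return its UTF-8 bytes.
// Assumption: the input is hexadecimal, with no "U+" prefix.
std::string HexCodePointToUtf8(const std::string& hex) {
    std::size_t used = 0;
    unsigned long value = std::stoul(hex, &used, 16);
    if (used != hex.size() || value > 0x10FFFF) {
        throw std::invalid_argument("invalid code point string");
    }
    return EncodeUtf8(static_cast<std::uint32_t>(value));
}
```

For the input "0663" this yields the two bytes 0xD9 0xA3, the UTF-8 encoding of U+0663.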

iconv is not working properly in linux (C++)

I want to convert a string from the Windows-1252 character set to UTF-8. For this I used the iconv library in my C++ application, which runs on Linux.
I used the iconv() API and converted my string.
There is a character è in my input. UTF-8 also supports this character, so when the conversion is done, my output should contain the same character è.
But when I look at the output, the character è has been converted to Ã¨, which I don't want.
One more point: if the converter finds an unknown character, it should automatically be replaced with the default REPLACEMENT CHARACTER �(U+FFFD), which is not happening.
How can I achieve these two points with the iconv library?
I used the following APIs to convert the string:
1)iconv_open("UTF-8","CP1252")
2)iconv() - Pass the parameters required
3)iconv_close(cd)
Can anybody help me sort out this issue, please?
Use //IGNORE so that characters that cannot be converted are skipped (they are discarded rather than substituted with a replacement character):
iconv_open("UTF-8//IGNORE","CP1252")
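For reference, a minimal sketch of the whole conversion with POSIX iconv on Linux, using the same iconv_open("UTF-8","CP1252") call as the question. The function name Cp1252ToUtf8 is hypothetical, and substituting U+FFFD on EILSEQ is one possible policy implemented here by hand, not something iconv does itself. Which input bytes are rejected depends on the iconv implementation.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Convert CP1252 bytes to UTF-8. Bytes that iconv rejects are replaced
// with U+FFFD by this function (hypothetical helper, policy is ours).
std::string Cp1252ToUtf8(const std::string& input) {
    iconv_t cd = iconv_open("UTF-8", "CP1252");
    if (cd == (iconv_t)(-1)) {
        throw std::runtime_error("iconv_open failed");
    }

    std::string in = input;  // iconv takes a non-const char**
    // Each CP1252 byte becomes at most 3 UTF-8 bytes (U+FFFD is also 3).
    std::string out(in.size() * 3 + 1, '\0');

    char* inPtr = &in[0];
    size_t inLeft = in.size();
    char* outPtr = &out[0];
    size_t outLeft = out.size();

    while (inLeft > 0) {
        size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        if (rc != (size_t)(-1)) {
            break;  // everything converted
        }
        if (errno == EILSEQ && outLeft >= 3) {
            // Invalid input byte: emit U+FFFD and skip it.
            std::memcpy(outPtr, "\xEF\xBF\xBD", 3);
            outPtr += 3;
            outLeft -= 3;
            ++inPtr;
            --inLeft;
        } else {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }

    iconv_close(cd);
    out.resize(out.size() - outLeft);
    return out;
}
```

Note that è (byte 0xE8 in CP1252) correctly becomes the two bytes 0xC3 0xA8 in UTF-8. If those two bytes are then displayed by a tool that assumes CP1252 or Latin-1, they show up as Ã¨, so that symptom usually points at how the output is being viewed rather than at the conversion.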

libxml2 questions about xmlChar*

I'm using libxml2. All of its functions work with xmlChar*. I found that xmlChar is an unsigned char.
So I have some questions about how to work with it.
1) For example, if I'm working with a UTF-16 or UTF-32 file, how does libxml2 process it and return xmlChar from its functions? Will I lose some characters?
2) If I want to do something with this string, should I cast it to char* or wchar_t*, and how?
Will I lose some characters?
xmlChar is for handling UTF-8 encoding only.
So, to answer your questions:
1) No, you won't lose any characters if using UTF-16 or UTF-32. Just use iconv or any other library to convert your UTF-16 or UTF-32 data to UTF-8 before passing it to the API.
2) Do not just "cast" the string. Convert it to some other encoding if needed.
