Decoding a Unicode code point into UTF-8 using ICU - C++11

I have a Unicode character code point stored as a string.
std::string code = "0663";
I need to decode it into UTF-8 and get it as a standard std::string using the ICU library.
I decided to use ICU to get a cross-platform, bit-independent solution.

Untested outline (a code sketch follows these steps):
Convert the string into an int32_t.
Treat the int32_t as a UChar32.
Create a UnicodeString with UnicodeString::setTo from the UChar32.
Create a std::string with UnicodeString::toUTF8String from the UnicodeString.
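Putting those steps together, a minimal sketch using ICU's C++ API might look like the following; the helper name codePointToUtf8 is made up for illustration, and error handling for malformed input is omitted:

#include <string>
#include <unicode/unistr.h>   // icu::UnicodeString, UChar32

// Hypothetical helper: turn a hex code-point string such as "0663" into UTF-8.
std::string codePointToUtf8(const std::string& code) {
    // Parse the hex digits into a 32-bit code point.
    UChar32 cp = static_cast<UChar32>(std::stoul(code, nullptr, 16));

    // Build a UnicodeString holding just that code point, then convert to UTF-8.
    icu::UnicodeString ustr;
    ustr.setTo(cp);

    std::string utf8;
    ustr.toUTF8String(utf8);
    return utf8;
}

For code = "0663" this should yield the two-byte UTF-8 sequence for U+0663 (ARABIC-INDIC DIGIT THREE).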

Related

Universal encoding detector in Go?

There is some sample code: http.DetectContentType(buffer[:n]) detects only a limited set of charsets, and in cases like ANSI text it is recognized as UTF-8.
Is there any universal solution for this problem?
To check that a byte array is a valid UTF-8 string, you can use utf8.Valid.

How do I convert encrypted raw binary data to/from a hexadecimal string in C?

I am encrypting a string using the crypto libraries in C (using gcc on Linux). Here is what I would do:
ien = encrypt ((unsigned char*)passwd, strlen(passwd), (unsigned char*)cypher_key, (unsigned char*)cypher_salt, cypher_text);
Now if I were to print out the contents of cypher_text, it would be all garbled. I need to save it to a text file (it will eventually be hashed with other things so as to be obfuscated). Right now this is an experiment. I need to convert this raw binary data into a hexadecimal string that is readable in any text editor.
Then, I need to convert that hexadecimal string back into raw binary so I can do this and get the original text:
decrypt(cypher_text, ien, (unsigned char*)cypher_key, (unsigned char*)cypher_salt, (unsigned char*)passwd);
Can someone please point me in the right direction on how I could accomplish this?
UPDATE: A suggested link about printing hex in C isn't exactly what I am after. I am taking the unprintable data put out directly by the encrypt function (not hexadecimal values in an array), trying to convert it into a string of hexadecimal digits, and then converting that hex string back into its original binary form. I hope that helps clear up what I'm trying to do.
I'm going to look into the idea of using base64 encoding, but I can't have headers, etc. in the encoding; I need just a straight string.
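A common approach (not from the original question) is to print each byte as two hex digits and parse pairs of digits back. Below is a C++ sketch of the idea; the same pattern works in plain C with malloc'd buffers. The helper names are made up, and bounds/error checks are kept minimal:

#include <cstdio>
#include <string>
#include <vector>

// Encode raw bytes as a lowercase hex string, two characters per byte.
std::string toHex(const unsigned char* data, size_t len) {
    std::string out;
    out.reserve(len * 2);
    char buf[3];
    for (size_t i = 0; i < len; ++i) {
        std::snprintf(buf, sizeof(buf), "%02x", data[i]);
        out += buf;
    }
    return out;
}

// Decode a hex string (assumed well-formed, even length) back into raw bytes.
std::vector<unsigned char> fromHex(const std::string& hex) {
    std::vector<unsigned char> out(hex.size() / 2);
    for (size_t i = 0; i < out.size(); ++i) {
        unsigned int byte = 0;
        std::sscanf(hex.c_str() + 2 * i, "%2x", &byte);
        out[i] = static_cast<unsigned char>(byte);
    }
    return out;
}

toHex(cypher_text, ien) could then be written to the text file, and calling fromHex on the stored string gives back the exact bytes to feed into decrypt.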

Does V8 have Unicode support?

I'm using V8 to run JavaScript from native (C++) code. To call a JavaScript function I need to convert all the parameters to V8 data types.
For example, code to convert a char* to a V8 data type:
char* value;
...
v8::String::New(value);
Now I need to pass Unicode characters (wchar_t) to JavaScript.
First of all, does V8 support Unicode characters? If yes, how do I convert a wchar_t/std::wstring to a V8 data type?
I'm not sure if this was the case at the time this question was asked, but at the moment the V8 API has a number of functions which support UTF-8, UTF-16 and Latin-1 encoded text:
https://github.com/v8/v8/blob/master/include/v8.h
The relevant functions to create new string objects are:
String::NewFromUtf8 (UTF-8 encoded, obviously)
String::NewFromOneByte (Latin-1 encoded)
String::NewFromTwoByte (UTF-16 encoded)
Alternatively, you can avoid copying the string data and construct a V8 string object that refers to existing data (whose lifecycle you control):
String::NewExternalOneByte (Latin-1 encoded)
String::NewExternalTwoByte (UTF-16 encoded)
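For the original question's wchar_t/std::wstring data, the UTF-16 path is the relevant one. A minimal sketch, assuming a recent V8 API where String::NewFromTwoByte takes an isolate and returns a MaybeLocal (signatures have changed across V8 versions, so check the header you build against):

#include <cstdint>
#include <string>
#include <v8.h>

// Build a v8::String from UTF-16 code units.
v8::Local<v8::String> fromUtf16(v8::Isolate* isolate, const std::u16string& s) {
    return v8::String::NewFromTwoByte(
               isolate,
               reinterpret_cast<const uint16_t*>(s.data()),
               v8::NewStringType::kNormal,
               static_cast<int>(s.size()))
        .ToLocalChecked();
}

On Windows, where wchar_t is 16 bits, a std::wstring can be passed the same way after a reinterpret_cast to const uint16_t*.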
Unicode just maps characters to numbers. What you need is a proper encoding, like UTF-8 or UTF-16.
V8 seems to support UTF-8 (v8::String::WriteUtf8) and a 16-bit type that isn't described further (Write). I would give it a try and write some UTF-16 into it.
In Unicode applications, Windows stores UTF-16 in std::wstring. Maybe try something like:
std::wstring yourString;
v8::String::New (yourString.c_str());
No, it doesn't have Unicode support; the solution above is fine. The following code did the trick:
wchar_t path[1024] = L"gokulestás";
v8::String::New((uint16_t*)path, wcslen(path))

iconv is not working properly on Linux (C++)

I want to convert a string from the CP1252 (Windows-1252) character set to UTF-8. For this I used the iconv library in my C++ application, which runs on Linux.
I used the iconv() API and converted my string.
There is a character è in my input. UTF-8 also supports this character, so when my conversion is done the output should contain the same character è.
But when I look at the output, the character è is converted to something else, which I don't want.
One more point: if the converter finds an unknown character, it should automatically be replaced with the Unicode REPLACEMENT CHARACTER � (U+FFFD), which is not happening.
How can I achieve the above two points with the iconv library?
I used the calls below to convert the string:
1) iconv_open("UTF-8", "CP1252")
2) iconv() - pass the required parameters
3) iconv_close(cd)
Can anybody help me sort out this issue?
Use the //IGNORE suffix so that characters that cannot be converted are dropped instead of aborting the conversion:
iconv_open("UTF-8//IGNORE", "CP1252")
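For reference, a minimal sketch of the whole open/convert/close sequence (hypothetical helper name, simplified error handling):

#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert a CP1252 string to UTF-8; //IGNORE drops unconvertible characters.
std::string cp1252ToUtf8(const std::string& in) {
    iconv_t cd = iconv_open("UTF-8//IGNORE", "CP1252");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string out(in.size() * 4, '\0');        // CP1252 chars need at most a few UTF-8 bytes
    char* inbuf = const_cast<char*>(in.data());
    size_t inleft = in.size();
    char* outbuf = &out[0];
    size_t outleft = out.size();

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    // With //IGNORE, glibc can still return (size_t)-1 after dropping characters;
    // a production version should inspect errno here.
    (void)rc;
    iconv_close(cd);

    out.resize(out.size() - outleft);            // keep only the bytes actually written
    return out;
}

As for the second point: iconv itself does not offer a mode that substitutes U+FFFD; if you want the replacement character instead of silently dropped input, you would have to handle the failing byte (EILSEQ) yourself and append U+FFFD manually.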

libxml2 questions about xmlChar*

I'm using libxml2. All functions work with xmlChar*. I found that xmlChar is an unsigned char.
So I have some questions about how to work with it.
1) For example, if I am working with a UTF-16 or UTF-32 file, how does libxml2 process it and return xmlChar from its functions? Will I lose some characters?
2) If I want to do something with this string, should I cast it to char* or wchar_t*, and how?
Will I lose some characters?
xmlChar is for handling UTF-8 encoding only.
So, to answer your questions:
No, you won't lose any characters when using UTF-16 or UTF-32. Just use iconv or any other library to convert your UTF-16 or UTF-32 data to UTF-8 before passing it to the API.
Do not just "cast" the string; convert it to the other encoding when needed.
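As an illustration (not part of the original answer), a node's content, which libxml2 returns as UTF-8, can go straight into a std::string; converting to wchar_t would need a real UTF-8 decode rather than a cast. xmlNodeGetContent and xmlFree are standard libxml2 calls; the helper name is made up:

#include <libxml/tree.h>
#include <string>

// Copy a node's text content (UTF-8) into a std::string.
std::string nodeContentUtf8(xmlNodePtr node) {
    xmlChar* content = xmlNodeGetContent(node);   // UTF-8 encoded, caller must free
    if (!content)
        return "";
    std::string utf8 = reinterpret_cast<const char*>(content);
    xmlFree(content);
    return utf8;
}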

Resources