I have a Unicode character code point stored as a string.
std::string code = "0663";
I need to decode it into UTF-8 and get it as a standard std::string using the ICU library.
I decided to use ICU to get a cross-platform, bit-independent solution.
Untested:
Convert the string into an int32_t (parsing it as hexadecimal).
Treat the int32_t as a UChar32.
Create a UnicodeString with UnicodeString::setTo from the UChar32.
Create a string object with UnicodeString::toUTF8String from the UnicodeString.
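Putting those steps together, a minimal untested sketch (assuming ICU's C++ API and that the string holds the code point in hex):

#include <string>
#include <unicode/unistr.h>

std::string code = "0663";
// Parse the hex code point and treat it as a UChar32.
UChar32 cp = static_cast<UChar32>(std::stoi(code, nullptr, 16));
// Build a UnicodeString from the single code point.
icu::UnicodeString ustr;
ustr.setTo(cp);
// Convert to UTF-8 and store in a std::string.
std::string utf8;
ustr.toUTF8String(utf8); // utf8 now holds "٣" (U+0663, ARABIC-INDIC DIGIT THREE)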
There is some sample code using http.DetectContentType(buffer[:n]), but it detects only a limited set of charsets; in a case like ANSI it recognizes the data as UTF-8. Is there any universal solution for this problem?
To check that a byte array is a valid UTF-8 string, you can use Go's utf8.Valid.
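For the C++ side of this page, a structural check can be hand-rolled; this hedged sketch only verifies lead/continuation byte patterns and sequence lengths (a full validator, like Go's utf8.Valid, would also reject overlong forms, UTF-16 surrogates, and code points above U+10FFFF):

#include <cstddef>
#include <cstdint>

bool looks_like_utf8(const uint8_t *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        uint8_t b = s[i];
        size_t len;
        if      (b < 0x80)           len = 1; // ASCII
        else if ((b & 0xE0) == 0xC0) len = 2;
        else if ((b & 0xF0) == 0xE0) len = 3;
        else if ((b & 0xF8) == 0xF0) len = 4;
        else return false;                    // stray continuation or invalid lead byte
        if (i + len > n) return false;        // truncated sequence at end of buffer
        for (size_t j = 1; j < len; ++j)
            if ((s[i + j] & 0xC0) != 0x80) return false; // bad continuation byte
        i += len;
    }
    return true;
}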
I am encrypting a string using the crypto libraries in C (using gcc in Linux). Here is what I would do:
ien = encrypt ((unsigned char*)passwd, strlen(passwd), (unsigned char*)cypher_key, (unsigned char*)cypher_salt, cypher_text);
Now if I were to print out the contents of cypher_text, it would be all garbled. I need this to be saved to a text file (it will be hashed with other things so as to be obfuscated eventually). This right now is an experiment. I need to convert this raw binary data into a hexadecimal string that is readable in any text editor.
Then, I need to convert that hexadecimal string back into raw binary so I can do this and get the original text:
decrypt(cypher_text, ien, (unsigned char*)cypher_key, (unsigned char*)cypher_salt, (unsigned char*)passwd);
Can someone please point me in the right direction on how I could accomplish this?
UPDATE: A suggested link about printing hex in C isn't exactly what I am after. I am taking the raw, unprintable data put out by the encrypt function (not hexadecimal values in an array) and trying to convert it into a string of hexadecimal digits, and then convert that hex string back into its original binary form. I hope that clears up what I'm trying to do.
I'm going to look into the idea of using Base64 encoding, but I can't have headers, etc. in the encoding; I need just a straight string.
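A minimal sketch of the round trip (shown in C++ to match the rest of this page; the helper names are made up, and the decoder assumes an even-length, well-formed hex input):

#include <cstdio>
#include <string>
#include <vector>

// Encode raw bytes as hex: each byte becomes two ASCII characters.
std::string to_hex(const unsigned char *data, size_t len) {
    std::string out;
    char buf[3];
    for (size_t i = 0; i < len; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", data[i]);
        out += buf;
    }
    return out;
}

// Decode a hex string back into the original raw bytes.
std::vector<unsigned char> from_hex(const std::string &hex) {
    std::vector<unsigned char> out(hex.size() / 2);
    for (size_t i = 0; i < out.size(); ++i) {
        unsigned int byte = 0;
        std::sscanf(hex.c_str() + 2 * i, "%2x", &byte);
        out[i] = static_cast<unsigned char>(byte);
    }
    return out;
}

With that, to_hex(cypher_text, ien) yields a string safe to write to any text file, and from_hex gives back the exact bytes to feed to decrypt.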
I'm using v8 to run JavaScript in native (C++) code. To call a JavaScript function I need to convert all the parameters to v8 data types.
For example, code to convert a char* to a v8 data type:
char* value;
...
v8::String::New(value);
Now I need to pass Unicode chars (wchar_t) to JavaScript.
First of all, does v8 support Unicode chars? If yes, how do I convert wchar_t/std::wstring to a v8 data type?
I'm not sure if this was the case at the time this question was asked, but at the moment the V8 API has a number of functions which support UTF-8, UTF-16 and Latin-1 encoded text:
https://github.com/v8/v8/blob/master/include/v8.h
The relevant functions to create new string objects are:
String::NewFromUtf8 (UTF-8 encoded, obviously)
String::NewFromOneByte (Latin-1 encoded)
String::NewFromTwoByte (UTF-16 encoded)
Alternatively, you can avoid copying the string data and construct a V8 string object that refers to existing data (whose lifecycle you control):
String::NewExternalOneByte (Latin-1 encoded)
String::NewExternalTwoByte (UTF-16 encoded)
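As a hedged illustration of the UTF-16 path (assuming a reasonably recent V8 where these factories take an isolate and return a MaybeLocal; the function name is made up), wrapping a UTF-16 buffer might look like this:

#include <v8.h>
#include <string>

// Sketch: build a V8 string from UTF-16 code units via NewFromTwoByte.
// Assumes `isolate` is a valid, entered v8::Isolate.
v8::Local<v8::String> MakeV8String(v8::Isolate *isolate,
                                   const std::u16string &text) {
    return v8::String::NewFromTwoByte(
               isolate,
               reinterpret_cast<const uint16_t *>(text.data()),
               v8::NewStringType::kNormal,
               static_cast<int>(text.size()))
        .ToLocalChecked();
}

On Windows, where wchar_t is 16 bits, a std::wstring's data can be passed the same way after a reinterpret_cast.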
Unicode just maps characters to numbers. What you need is a proper encoding, like UTF-8 or UTF-16.
V8 seems to support UTF-8 (v8::String::WriteUtf8) and an otherwise undescribed 16-bit type (Write). I would give it a try and write some UTF-16 into it.
In Unicode applications, Windows stores UTF-16 in std::wstring. Maybe try something like
std::wstring yourString;
v8::String::New(reinterpret_cast<const uint16_t*>(yourString.c_str()), yourString.length());
No, it doesn't have direct wchar_t support; the above solution is fine.
The following code did the trick (note the cast assumes a 16-bit wchar_t, as on Windows):
wchar_t path[1024] = L"gokulestás";
v8::String::New(reinterpret_cast<uint16_t*>(path), wcslen(path));
I want to convert a string from the Windows-1252 (CP1252) character set to UTF-8. For this I used the iconv library in my C++ application, which is developed on the Linux platform.
I used the API iconv() and converted my string.
There is a character è in my input. UTF-8 also supports this character, so when my conversion is done, my output should contain the same character è.
But when I see the output, the character è has been converted to è (the two bytes of its UTF-8 encoding shown as separate characters), which I don't want.
One more point: if the converter finds any unknown character, it should automatically be replaced with UTF-8's REPLACEMENT CHARACTER � (U+FFFD), which is not happening.
How can I achieve the above two points with the iconv library?
I used the below APIs to convert the string
1) iconv_open("UTF-8", "CP1252")
2) iconv() - pass the required parameters
3) iconv_close(cd)
Can anybody help me sort out this issue, please?
Use //IGNORE to handle characters that cannot be converted (note that iconv then drops them silently rather than substituting U+FFFD):
iconv_open("UTF-8//IGNORE", "CP1252")
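For reference, a minimal sketch of the whole conversion (error handling trimmed; the function name is made up, and note that iconv() takes char** in/out pointers and byte counts that it advances as it converts):

#include <iconv.h>
#include <stdexcept>
#include <string>

std::string cp1252_to_utf8(const std::string &in) {
    iconv_t cd = iconv_open("UTF-8", "CP1252");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // CP1252 is single-byte; each input byte expands to at most 3 UTF-8 bytes.
    std::string out(in.size() * 3, '\0');
    char *inbuf = const_cast<char *>(in.data());
    size_t inleft = in.size();
    char *outbuf = &out[0];
    size_t outleft = out.size();

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed");

    out.resize(out.size() - outleft);
    return out;
}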
I'm using libxml2. All functions work with xmlChar*. I found that xmlChar is an unsigned char.
So I have some questions about how to work with it.
1) For example, if I am working with a UTF-16 or UTF-32 file, how does libxml2 process it and return xmlChar from its functions? Will I lose some characters?
2) If I want to do something with such a string, should I cast it to char* or wchar_t*, and how?
Will I lose some characters?
xmlChar is for handling UTF-8 encoding only.
So, to answer your questions:
No, you won't lose any characters when using UTF-16 or UTF-32. Just use iconv or any other library to encode your UTF-16 or UTF-32 data before passing it to the API.
Do not just "cast" the string; convert it if you need some other encoding.
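To make the boundary concrete, a hedged sketch (the function name is made up): since xmlChar* holds UTF-8 bytes, copying it into a std::string that you treat as UTF-8 is just a byte-level reinterpretation, whereas obtaining wchar_t would require a real conversion (with iconv, ICU, etc.):

#include <libxml/tree.h>
#include <string>

// Read a node's text content as a UTF-8 std::string.
std::string node_content_utf8(xmlNodePtr node) {
    xmlChar *content = xmlNodeGetContent(node); // UTF-8 bytes, heap-allocated
    std::string result = content ? reinterpret_cast<const char *>(content) : "";
    xmlFree(content); // xmlFree(nullptr) is safe
    return result;
}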