Universal encoding detector in Go? - go

There is some sample code using http.DetectContentType(buffer[:n]), but it detects only a limited set of charsets; in cases like ANSI it recognizes the input as UTF-8.
Is there any universal solution for this problem?

To check that a byte slice is a valid UTF-8 string, you can use utf8.Valid.

Related

Does V8 have Unicode support?

I'm using V8 to run JavaScript from native (C++) code. To call a JavaScript function I need to convert all the parameters to V8 data types.
For example, code to convert a char* to a V8 data type:
char* value;
...
v8::String::New(value);
Now I need to pass Unicode characters (wchar_t) to JavaScript.
First of all, does V8 support Unicode characters? If yes, how do I convert a wchar_t/std::wstring to a V8 data type?
I'm not sure if this was the case at the time this question was asked, but at the moment the V8 API has a number of functions which support UTF-8, UTF-16 and Latin-1 encoded text:
https://github.com/v8/v8/blob/master/include/v8.h
The relevant functions to create new string objects are:
String::NewFromUtf8 (UTF-8 encoded, obviously)
String::NewFromOneByte (Latin-1 encoded)
String::NewFromTwoByte (UTF-16 encoded)
Alternatively, you can avoid copying the string data and construct a V8 string object that refers to existing data (whose lifecycle you control):
String::NewExternalOneByte (Latin-1 encoded)
String::NewExternalTwoByte (UTF-16 encoded)
Unicode just maps characters to numbers. What you need is a proper encoding, like UTF-8 or UTF-16.
V8 seems to support UTF-8 (v8::String::WriteUtf8) and a not-further-described 16-bit type (Write). I would give it a try and write some UTF-16 into it.
In Unicode applications, Windows stores UTF-16 in std::wstring. Maybe try something like:
std::wstring yourString;
v8::String::New (yourString.c_str());
No, it doesn't have Unicode support; the above solution is fine.
The following code did the trick
wchar_t path[1024] = L"gokulestás";
// Casting wchar_t* to uint16_t* assumes a 16-bit wchar_t, as on Windows.
v8::String::New((uint16_t*)path, wcslen(path));

iconv is not working properly in Linux (C++)

I want to convert a string from the 1252 code page to UTF-8. For this I used the iconv library in my C++ application, which runs on Linux.
I used the iconv() API and converted my string.
There is a character è in my input. UTF-8 also supports this character, so when my conversion is done, my output should contain the same character è.
But when I look at the output, the character è has been converted to è, which I don't want.
One more point: if the converter finds an unknown character, it should automatically be replaced with the default UTF-8 REPLACEMENT CHARACTER � (U+FFFD), which is not happening.
How can I achieve these two points with the iconv library?
I used the APIs below to convert the string:
1) iconv_open("UTF-8", "CP1252")
2) iconv() - pass the required parameters
3) iconv_close(cd)
Can anybody help me sort out this issue?
Use //IGNORE to make iconv skip characters it cannot convert (note that this drops them rather than substituting U+FFFD):
iconv_open("UTF-8//IGNORE","CP1252")

Base64 calculation for extended ASCII?

As far as I know, Base64 can represent any character (including binary data):
Base64 encoding schemes are commonly used when there is a need to
encode binary(!) data that need to be stored and transferred over
media that are designed to deal with textual data
So I tried to apply it to an extended ASCII character (beyond 127): the character ▓, entered with Alt+178.
After following the simple algorithm, I arrived at the value Fy.
So why, when I use an online encoder and enter the character with Alt+178, do I get a different result?
What is going on here?
Your browser sent the encoding website the UTF-8 encoding of the character. That encoding is not the single byte 178.
What you got back is the Base64 of the UTF-8 encoding of the Unicode character U+2593, which corresponds to extended ASCII character 178.
Thanks to Damien_The_Unbeliever, zmbq and Markus Jarderot.

Make UUID shorter (Hex to ASCII conversion)

In my web application, one model uses an identifier generated by a UUID tool. As I want that identifier to be part of the URL, I am investigating methods to shorten the UUID string. Since it is currently in hexadecimal format, I thought about somehow converting it to ASCII. As it should afterwards contain only normal characters and numbers ([\d\w]+), the plain hex-to-ASCII conversion doesn't seem to work (ugly characters).
Do you know of some nice algorithm or tool (Ruby) to do that?
A UUID is a 128-bit binary number, in the end. If you represent it as 16 unencoded bytes, there's no way to avoid "ugly characters". What you probably want to do is decode it from hex and then encode it using Base64. Note that Base64 encoding uses the characters + / = as well as A-Za-z0-9, so you'll want to do a little postprocessing (I suggest s/+/-/g; s/\//_/g; s/==$// -- a base64ed UUID will always end with two equals signs).
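A sketch of that approach in Ruby, using the standard library's URL-safe Base64 variant (which handles the + and / substitutions for you; the helper name short_uuid is hypothetical):

```ruby
require 'base64'

# Hypothetical helper: decode the hex UUID to its 16 raw bytes, then
# Base64-encode them URL-safely and drop the trailing "==" padding.
def short_uuid(uuid)
  raw = [uuid.delete('-')].pack('H*')        # 32 hex digits -> 16 bytes
  Base64.urlsafe_encode64(raw).delete('=')   # 22 URL-safe characters
end

puts short_uuid('de305d54-75b4-431b-adb2-eb6b9e546014')
# => "3jBdVHW0QxutsutrnlRgFA"
```

This shortens the identifier from 36 characters to 22, and the result only contains A-Za-z0-9, - and _, all of which are safe in a URL path segment.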

Converting ANSI to UTF8 with Ruby

I have a Ruby script that generates an ANSI file.
I want to convert the file to UTF8.
What's the easiest way to do it?
If your data is within the ASCII range (0 to 0x7F), it's already valid UTF-8, so you don't need to do anything.
If there are characters above 0x7F, you could use Iconv:
text = Iconv.iconv('UTF-8', 'ascii', text)
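Note that Iconv was removed from Ruby's standard library in 2.0; on modern Ruby you can use String#encode instead. A sketch, assuming "ANSI" means the Windows-1252 code page:

```ruby
# "café" as Windows-1252 bytes: é is the single byte 0xE9 there.
ansi = "caf\xE9".force_encoding('Windows-1252')

utf8 = ansi.encode('UTF-8')
puts utf8                                          # => "café"
puts utf8.bytes.map { |b| b.to_s(16) }.join(' ')   # => "63 61 66 c3 a9"
```

String#encode also accepts options such as invalid: :replace and undef: :replace if the input may contain bytes that cannot be converted.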
The 8-bit Unicode Transformation Format (UTF-8) was designed to be backwards compatible with the American Standard Code for Information Interchange (ASCII). Therefore, by definition, any valid ASCII sequence is also a valid UTF-8 sequence. For more information, read the UTF FAQ and Unicode FAQ.
Any ASCII file is a valid UTF-8 file, going by your question's title, so no conversion is needed.
