VS 2017 shows incorrect C++ Unicode literals during debugging

When I try to define Unicode string literals in C++(17) I see some very odd results during debugging, which I would like to discuss. Look at the following variable definitions:
std::string u8 { u8"⬀⬁" };
std::string u8_1 { u8"\u2B00\u2B01" };
std::u16string u16 { u"⬀⬁" };
std::u16string u16_1 { 0x2B00, 0x2B01 };
std::u32string u32 { U"⬀⬁" };
std::u32string u32_1 { 0x2B00, 0x2B01 };
std::string u8_2 { u8"\u2B00⬀⬁" };
std::u16string u16_2 { u"\u2B00⬀⬁" };
std::u32string u32_2 { U"\u2B00⬀⬁" };
During debugging I now get the following strings:
As you can see, the values are pretty surprising. Strings defined with an initializer list or with escape codes appear correct, while those written as literal characters look as if they had been UTF-8 encoded and the raw bytes written into the strings. That is correct for a UTF-8 string (u8_1 here), but not for the UTF-16 and UTF-32 strings (u16 and u32 here). What's even more odd is that the u8 variable contains values that are doubly UTF-8 encoded, which you can easily verify with an online UTF converter. If the debugger were the problem I wouldn't see any correct values, but since it displays the strings specified with escape sequences correctly, I assume something else is mangling the strings.
What would somewhat explain the results is if the file's UTF-8 bytes were taken over into the string variables directly, without any conversion. However, I would expect the original string in the C++ file to be converted from the source encoding (which is indeed UTF-8) to the correct target encoding instead. Needless to say, this works as expected in Xcode.
Is this a known problem? Can it somehow be worked around, to avoid having to use numeric escape values instead?
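A likely workaround, sketched under the assumption that the compiler is misreading a BOM-less UTF-8 file as the system code page: MSVC (since VS 2015 Update 2) accepts the /utf-8 switch, shorthand for /source-charset:utf-8 plus /execution-charset:utf-8, which pins down both how the source is decoded and how narrow literals are encoded:
// Hypothetical command line; test.cpp saved as UTF-8 (with or without BOM):
//   cl /utf-8 /std:c++17 test.cpp
#include <string>
int main() {
    std::u16string u16 { u"⬀⬁" };  // now holds the expected UTF-16 code units 0x2B00 0x2B01
}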

Related

How to convert a v8::Local<v8::Value> into a uint32_t

Given the following code, how can I convert the v8::Local<v8::Value> into a uint32_t? Or into other types, based on the Is* method?
v8::Local<v8::Value> value;
v8::Local<v8::Context> context = v8::Context::New(v8::Isolate::GetCurrent());
if(value->IsUint32()) {
v8::MaybeLocal<Int32> maybeLocal = value->Uint32Value(context);
uint32_t i = maybeLocal;
}
Your posted code doesn't work because value->Uint32Value(context) doesn't return a v8::MaybeLocal<Int32>. C++ types are your friend (just like TypeScript)!
You have two possibilities:
(1) You can use Value::Uint32Value(...) which returns a Maybe<uint32_t>. Since you already checked that value->IsUint32(), this conversion cannot fail, so you can extract the uint32_t wrapped in the Maybe using Maybe::ToChecked().
(2) You can use Value::ToUint32(...) which returns a MaybeLocal<Uint32>. Again, since you already checked that value->IsUint32(), that cannot fail, so you can get a Local<Uint32> via MaybeLocal::ToLocalChecked(), and then simply use -> syntax to call the wrapped Uint32's Value() method, which gives a uint32_t.
If you're only interested in the final uint32_t (and not in the intermediate Local<Uint32>, which you could pass back to JavaScript), then option (1) will be slightly more efficient. Both options are sketched below.
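A minimal sketch of both options (assuming an isolate in scope and that value->IsUint32() has already returned true):
v8::Local<v8::Context> context = isolate->GetCurrentContext();
// Option (1): Maybe<uint32_t>, extracted with ToChecked().
uint32_t a = value->Uint32Value(context).ToChecked();
// Option (2): MaybeLocal<Uint32>, unwrapped with ToLocalChecked(), then Value().
uint32_t b = value->ToUint32(context).ToLocalChecked()->Value();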
Note that IsUint32() will say false for objects like {valueOf: () => 42}. If you want to handle such objects, then attempt the conversion and handle failures, e.g.:
Maybe<uint32_t> maybe_uint = value->Uint32Value(context);
if (maybe_uint.IsJust()) {
uint32_t i = maybe_uint.FromJust();
} else {
// Conversion failed. Maybe it threw an exception (use a `v8::TryCatch` to catch it), or maybe the object wasn't convertible to a uint32.
// Handle that somehow.
}
Also, note that most of these concepts are illustrated in V8's samples and API tests. Reading comments and implementations in the API headers themselves also provides a lot of insight.
Final note: you'll probably want to track the current context you're using, rather than creating a fresh context every time you need one.

Can wstring_convert just replace invalid characters?

I am currently working on a tool to extract archives from a game for the purpose of data mining. I currently extract metadata from the archives (number of files per archive, filenames, packed/unpacked sizes, etc.) and write it to a std::wstring for further analysis. I have stumbled over an issue with converting filenames to wide characters using std::wstring_convert.
My code looks something like this now:
struct IndexEntry {
int32_t file_id;
std::array<char, 260> filename;
// more fields
};
std::wstring foo(IndexEntry entry) {
std::wstringstream buffer;
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
buffer << entry.file_id << L'\n';
buffer << converter.from_bytes(entry.filename.data()) << L'\n';
// add rest of the IndexEntry fields to the stream
return buffer.str();
}
The IndexEntry struct is filled by reading from files with a std::ifstream in binary mode. The error happens with converter.from_bytes(). Some of the filenames contain 0x81 as a character and when the converter encounters these, it throws a std::range_error exception.
Is there a way to tell wstring_convert to replace characters it can not convert with something else? Or is there a generally better way to handle this conversion?
This whole project is mostly a learning exercise. I wanted to do all internal string handling with wstring so I can get some experience dealing with strings in different encodings. Unfortunately, I have no idea what exact encoding was used to generate these archive files.
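In case it helps, a sketch using wstring_convert's two-argument constructor, which supplies fallback strings that are returned instead of throwing std::range_error (note the fallback replaces the whole result of a failed call, not just the offending character):
#include <codecvt>
#include <locale>
#include <string>

std::wstring convert_or_fallback(const char* bytes) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter(
        "<to_bytes failed>",  // returned by a failing to_bytes()
        L"\uFFFD");           // returned by a failing from_bytes()
    return converter.from_bytes(bytes);
}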

C++(Visual Studio 2012): Copying a function's parameter char* to a dynamically allocated one

I have this structure and a class defined in my project. The class holds id numbers generated by GetIdUsingThisString(char*), a function that loads a texture file into the GPU and returns an OpenGL id.
The problem is that when I try to read one specific file, the program crashes. When I run the program in VS with debugging it works fine, but running the .exe (or running without debugging from MSVS) crashes it. Using the just-in-time debugger I found out that, for the num of that specific file, Master[num].name actually contains "\x5" appended at the end of the file path, and this happens only for this one file. Nothing in this method should be able to do that, and I also use forward slashes / in paths, not \.
struct WIndex{
char* name;
int id;
};
class Test_Class
{
public:
Test_Class(void);
int AddTex(char* path);
struct WIndex* Master;
TextureClass* tex;
int num; // number of stored textures, used by the methods below
//some other stuff...
};
Constructor:
Test_Class::Test_Class(void)
{
num=0;
Master=(WIndex*)malloc(1*sizeof(WIndex));
Master[0].name=(char*)malloc(strlen("Default")*sizeof(char));
strcpy(Master[0].name,"Default");
Master[0].id=GetIdUsingThisString(Master[0].name);
}
Adding a new texture (the bug):
int Test_Class::AddTex(char* path)
{
num++;
Master=(WIndex*)realloc(Master,(num+1)*sizeof(WIndex));
Master[num].name=(char*)malloc(strlen(path)*sizeof(char));
strcpy(Master[num].name,path);<---HERE
Master[num].id=GetIdUsingThisString(path);
return Master[num].id;
}
At runtime, calling AddTex with this file shows path holding the right value, while Master[num].name shows the modified value (with "\x5" appended) after strcpy.
Question:
Is there something wrong with copying (strcpy) to a dynamically allocated string? If I use char name[255] as part of the WIndex structure, everything works fine.
More info:
This exact file is called "flat blanc.tga". If I put it in the folder where I intended it to be, fread in GetIdUsingThisString throws corrupted-heap errors. If I put it in a different folder it is OK. If I change its name to anything else, it's OK again. If I put a different file there and give it that same name, it is OK too(!!!). I need the program to be free of this kind of bug, because I won't know in advance which textures will be loaded (if I knew, I could simply replace them).
Master[num].name=(char*)malloc(strlen(path)*sizeof(char));
Should be
Master[num].name=(char*)malloc( (strlen(path)+1) * sizeof(char));
There was no place for the terminating null character.
From http://www.cplusplus.com/reference/cstring/strcpy/:
Copies the C string pointed by source into the array pointed by destination, including the terminating null character (and stopping at that point).
The same happens here:
Master[0].name=(char*)malloc(strlen("Default")*sizeof(char));
strcpy(Master[0].name,"Default");
Based on the definitions below, you should use strlen(string)+1 for malloc.
A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
The strcpy() function shall copy the string pointed to by s2 (including the terminating null byte)
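Putting both fixes together, a corrected sketch of the two allocations from the question:
Master[0].name = (char*)malloc(strlen("Default") + 1);  // +1 for the '\0'
strcpy(Master[0].name, "Default");

Master[num].name = (char*)malloc(strlen(path) + 1);     // +1 for the '\0'
strcpy(Master[num].name, path);
// Or, on MSVC, _strdup(path) performs the allocation and the copy in one step.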
Also see discussions in How to allocate the array before calling strcpy?

How to pass to CreateWindow win32 api the title bar string as function param

I know I need to use L"my title" for the second parameter of the Win32 API CreateWindow.
But what if I want to make this parameter dynamic and get the string from a function parameter?
For example, this is not working; the title bar shows Chinese characters all the time:
GLboolean esUtil_win32::WinCreate ( ESContext *esContext, const char *title )
{
...
...
esContext->hWnd = CreateWindow(
L"opengles2.0",
(LPCTSTR)title,
wStyle,
0,
0,
windowRect.right - windowRect.left,
windowRect.bottom - windowRect.top,
NULL,
NULL,
hInstance,
NULL);
}
Fundamentally the problem is that title is an ANSI (or multi-byte) string and the CreateWindowW function expects Unicode strings. There are three ways you can solve this:
Change the definition of the WinCreate function to take a const wchar_t* title parameter instead. This may have repercussions elsewhere in your code, although if the strings passed to this function are always string literals then it's as simple as prefixing them all with L to make them wide.
Change the CreateWindow call to CreateWindowA, to explicitly call the ANSI version of the function. This would let you pass title to the function without conversion. You would need to remove the L from L"opengles2.0" if you did this.
Convert the title string to Unicode before passing it to the function. You can do this using code similar to this:
wchar_t wchTitle[256]; // pick a sensible maximum
MultiByteToWideChar(CP_ACP, 0, title, -1, wchTitle, 256);
You would then pass wchTitle to the CreateWindow function instead of title. If title is in some other encoding (e.g. UTF-8) you would change the CP_ACP value appropriately.
Windows uses UTF-16 character encoding, but you are passing a string with some other encoding (title). To use this string you need to convert it to UTF-16 first. Call MultiByteToWideChar to convert from your source encoding to UTF-16.
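A slightly more defensive sketch of option (3), wrapped in a helper (hypothetical name widen; it assumes the input is in the system ANSI code page, so swap CP_ACP for CP_UTF8 if it is UTF-8):
#include <windows.h>
#include <string>

std::wstring widen(const char* s)
{
    // First call computes the required buffer length, including the terminator.
    int len = MultiByteToWideChar(CP_ACP, 0, s, -1, nullptr, 0);
    if (len <= 0)
        return L"";                    // conversion failed; handle as needed
    std::wstring out(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, s, -1, &out[0], len);
    out.resize(len - 1);               // drop the embedded terminator
    return out;
}
// Usage: CreateWindow(L"opengles2.0", widen(title).c_str(), wStyle, ...);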

UnicodeString storage type

My application needs to manage a few Unicode strings (<10). The content of these strings is dynamic and can change while the application runs. To store the strings I am using objects of type UnicodeString.
One approach to solving this problem is to create as many member variables as there are Unicode strings, for example:
UnicodeString str1;
UnicodeString str2;
...
UnicodeString strN;
This solution is pretty simple, at least at first glance, but there is a problem with scalability. If the number of strings grows in the future, we risk creating big, hard-to-read code. So I thought of creating something like this to manage the strings:
std::map<HWND, UnicodeString> file_names; ///< member variable of form TForm1
Every string is connected with some edit box, so I can use the window handle as the key to the dictionary.
What I don't understand is who should be responsible for allocating and deallocating space for storing the Unicode string in this case. Let's say I create a UnicodeString variable on the local stack:
void TForm1::ProcessFile(TEdit *edit_box)
{
UnicodeString str = "C:\\Temp\\ws.gdb";
file_names[edit_box->Handle] = str;
}
Will the content of the str variable survive the end of the member function ProcessFile()?
The memory storage of a UnicodeString is reference counted and managed by the RTL for you. You do not need to worry about deallocating it yourself, unless you allocate the UnicodeString itself with the new operator. In your code snippet, the str variable will be destroyed when ProcessFile() exits, but its contents will survive because file_names still holds an active reference to them.
Do not use an HWND as the key for your std::map. The window handle managed by the TWinControl::Handle property is dynamic and can change value during the lifetime of the app. You can, however, use the TEdit* pointer instead:
std::map<TEdit*, UnicodeString> file_names;
void TForm1::ProcessFile(TEdit *edit_box)
{
UnicodeString str = "C:\\Temp\\ws.gdb";
file_names[edit_box] = str;
}
