Problems with unsigned char vs. char

I have been using Aaron Liddiment's excellent LEDText library.
It includes this code:
unsigned char TxtDemo[] = { EFFECT_HSV "\x10\xff\xff" "TEST" };
ScrollingMsg.SetText((unsigned char *)TxtDemo, sizeof(TxtDemo) - 1);
I have a socket connection which outputs a string. I want to dynamically replace the word "TEST" with this string. I have tried:
unsigned char TxtDemo[] = { EFFECT_HSV "\x00\xff\xff" };
String txt="my text";
strcat( TxtDemo, txt.c_str() );
no good... help please
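For what it's worth, two things go wrong in that attempt: TxtDemo[] is sized exactly to its initializer, so strcat() writes past the end of the array, and the "\x00" byte ends the C string early, so strcat() appends in the wrong place. A sketch of one workaround (not tested against the library; it assumes EFFECT_HSV expands to a single control byte, as in the library's examples, giving a 4-byte prefix):
// Sketch only: build the message in a buffer big enough for prefix + text.
const size_t kPrefixLen = 4; // EFFECT_HSV + 3 value bytes (assumption)
unsigned char TxtBuffer[64] = { EFFECT_HSV "\x10\xff\xff" };
String txt = "my text"; // e.g. read from the socket
memcpy(TxtBuffer + kPrefixLen, txt.c_str(), txt.length());
ScrollingMsg.SetText(TxtBuffer, kPrefixLen + txt.length());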

Related

How to widen a standard string while maintaining the characters?

I have a function to take a std::string and change it into a wchar_t*. My current widen function looks like this:
wchar_t* widen(const std::string& str) {
    wchar_t* dest = new wchar_t[str.size() + 1];
    for (size_t i = 0; i < str.size(); i++)
        dest[i] = str[i]; // copies the raw byte value into the wide slot
    dest[str.size()] = '\0';
    return dest;
}
This works just fine for standard characters. However (and I cannot believe this hasn't been an issue before now), when I have characters like á, é, í, ó, ú, ñ, or ü, it breaks and the results are vastly different.
Ex: my str comes in as "Database Function: áFákéFúnctíóñü"
But dest ends up as: "Database Function: £F£k←Fnct■￳￱"
How can I change from a std::string to a wchar_t* while maintaining international characters?
Short answer: You can't.
Longer answer: std::string contains char elements, which typically hold ASCII in the first 127 values, while everything else ("international characters") lives in the values above (or the negative ones, if char is signed). To determine the corresponding representation in a wchar_t string, you first need to know the encoding of the source string (could be ISO-8859-15 or even UTF-8) and the one of the target string (often UTF-16, UCS-2 or UTF-32), and then transcode accordingly.
It depends on whether the source is using an old ANSI code page or UTF-8. For an ANSI code page, you have to know the locale and use mbstowcs. For UTF-8 you can convert to UTF-16 using codecvt_utf8_utf16. However, codecvt_utf8_utf16 is deprecated and has no replacement as of yet. On Windows you can use WinAPI functions to make the conversions more reliably.
#include <clocale>
#include <codecvt>
#include <cstdlib>
#include <iostream>
#include <locale>
#include <string>

std::wstring widen(const std::string& src)
{
    std::wstring dst(src.size() + 1, L'\0');
    std::size_t written = mbstowcs(&dst[0], src.c_str(), src.size());
    if (written == static_cast<std::size_t>(-1))
        return L""; // invalid multibyte sequence in src
    dst.resize(written); // trim the unused slack
    return dst;
}

int main()
{
    // ANSI code page?
    std::string src = "áFákéFúnctíóñü";
    setlocale(LC_ALL, "en"); // English locale assumed; exact name is platform-dependent
    std::wstring dst = widen(src);
    std::wcout << dst << "\n";

    // UTF-8? (codecvt_utf8_utf16 is deprecated since C++17 but still available)
    src = u8"áFákéFúnctíóñü";
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    dst = convert.from_bytes(src);
    std::wcout << dst << "\n";
    return 0;
}
For a Windows solution, here are some utility functions I use, based on the wisdom of http://utf8everywhere.org/:
#include <windows.h>
#include <string>
#include <string_view>

/// Convert a Windows UTF-16 string to a UTF-8 string
///
/// \param wstr[in] the UTF-16 string
/// \return std::string UTF-8 string
inline std::string Narrow(std::wstring_view wstr) {
  if (wstr.empty()) return {};
  int len = ::WideCharToMultiByte(CP_UTF8, 0, &wstr[0], static_cast<int>(wstr.size()),
                                  nullptr, 0, nullptr, nullptr);
  std::string out(len, 0);
  ::WideCharToMultiByte(CP_UTF8, 0, &wstr[0], static_cast<int>(wstr.size()),
                        &out[0], len, nullptr, nullptr);
  return out;
}

/// Convert a UTF-8 string to a Windows UTF-16 string
///
/// \param str[in] the UTF-8 string
/// \return std::wstring UTF-16 string
inline std::wstring Widen(std::string_view str) {
  if (str.empty()) return {};
  int len = ::MultiByteToWideChar(CP_UTF8, 0, &str[0], static_cast<int>(str.size()),
                                  nullptr, 0);
  std::wstring out(len, 0);
  ::MultiByteToWideChar(CP_UTF8, 0, &str[0], static_cast<int>(str.size()), &out[0], len);
  return out;
}
Usually used inline in Windows API calls like:
std::string message = "Hello world!";
::MessageBoxW(NULL, Widen(message).c_str(), L"Title", MB_OK);
A cross-platform and possibly faster solution could be found by exploring Boost.Nowide's conversion functions: https://github.com/boostorg/nowide/blob/develop/include/boost/nowide/utf/convert.hpp
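For example, a minimal sketch assuming Boost.Nowide is installed (its widen()/narrow() helpers in <boost/nowide/convert.hpp> convert between UTF-8 and the platform's wide encoding):
#include <boost/nowide/convert.hpp>
#include <string>

int main()
{
    // UTF-8 -> wide (UTF-16 or UTF-32, depending on wchar_t) and back
    std::wstring w = boost::nowide::widen("áFákéFúnctíóñü");
    std::string s = boost::nowide::narrow(w);
    return 0;
}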

Storing/Retrieving Pointer to std::wstring in char Array

I am trying to learn more about C++ memory management and type casting. How can I store and retrieve a std::wstring* in a char array? Code or suggestions about what to read would be appreciated.
Here is what I have tried so far. My definition of "Works" is that the correct string appears in the GUI control. "Does not work" means it compiles and runs but gives me a blank in the GUI control.
I'm really curious about why "Wrapper 2" works but not "Wrapper 1".
Integer example (Works):
int* lInt(new int);
*lInt = 1500000;
char lBuffer[sizeof(void*)];
memcpy(lBuffer, lInt, sizeof(void*));
int* lInt2 = (int*)lBuffer;
Memo->Lines->Append(IntToStr(*lInt2)); //C++Builder GUI control
delete lInt;
std::wstring Example (Does not work):
std::wstring* lMyString = new std::wstring();
*lMyString = L"My Name";
char lBuffer[sizeof(void*)];
memcpy(lBuffer, lMyString, sizeof(void*));
std::wstring* lMyString2 = (std::wstring*)lBuffer;
Memo->Lines->Append(lMyString2->c_str()); //C++Builder GUI control
delete lMyString;
Wrapper 1 (Does not work):
struct MyString
{
    std::wstring Text;
};
MyString* lMyString = new MyString();
lMyString->Text = L"My Name";
char lBuffer[sizeof(void*)];
memcpy(lBuffer, lMyString, sizeof(void*));
MyString* lMyString2 = (MyString*)lBuffer;
Memo->Lines->Append(lMyString2->Text.c_str()); //C++Builder GUI control
delete lMyString;
Wrapper 2 (Works):
struct MyString
{
    MyString() : Text(new std::wstring()) {}
    ~MyString() { delete Text; }
    std::wstring* Text;
};
MyString* lMyString = new MyString();
*lMyString->Text = L"My Name";
char lBuffer[sizeof(void*)];
memcpy(lBuffer, lMyString, sizeof(void*));
MyString* lMyString2 = (MyString*)lBuffer;
Memo->Lines->Append(lMyString2->Text->c_str()); //C++Builder GUI control
delete lMyString;
How can I store and retrieve a std::wstring* in a char array?
A better way is to store it in a void*:
std::wstring ws;
void *addr = &ws;
// ...
std::wstring retrieved = *static_cast<std::wstring*>(addr);
***** std::wstring Example (Does not work) *****
std::wstring* lMyString = new std::wstring();
// ...
char lBuffer[sizeof(void*)];
memcpy(lBuffer, lMyString, sizeof(void*));
std::wstring* lMyString2 = (std::wstring*)lBuffer;
// ...
memcpy used this way actually copies the bytes of the stored std::wstring object itself:
void* memcpy( void* dest, const void* src, std::size_t count );
lMyString is a pointer to the allocated std::wstring, and memcpy copies bytes from the memory src points to. What you actually wanted is to copy the pointer itself, so you need to take the address of the pointer:
memcpy(lBuffer, &lMyString, sizeof(void*));
std::wstring* lMyString2 = (std::wstring*)lBuffer;
This is also wrong. It reinterprets the address of the first character of the char[] as the pointer, instead of reading the pointer value you stored in the array. You should memcpy it back:
std::wstring* lMyString2 = nullptr;
std::memcpy(&lMyString2, lBuffer, sizeof(void*));
As I said, use void* instead.
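Putting the two fixes together, the corrected round trip looks like this (a sketch; the Append call is the asker's C++Builder code):
std::wstring* lMyString = new std::wstring(L"My Name");
char lBuffer[sizeof(void*)];
std::memcpy(lBuffer, &lMyString, sizeof(void*)); // copy the pointer's bytes in
std::wstring* lMyString2 = nullptr;
std::memcpy(&lMyString2, lBuffer, sizeof(void*)); // and copy them back out
Memo->Lines->Append(lMyString2->c_str()); // now points at the same string
delete lMyString;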

cannot convert const wchar_t* to const char*

ALL,
Can someone explain to me why this code:
std::wstring query1 = L"SELECT....";
res = mysql_query( m_db, m_pimpl->m_myconv.from_bytes( query1.c_str() ).c_str() );
gives me an error from the subject?
I do have -DUNICODE defined inside C++ options
I guess I just need a pair of fresh eyes.
Thank you.
It is on Gentoo Linux with gcc5.4.
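For the record, the likely culprit is the direction of the conversion: from_bytes goes narrow-to-wide and therefore takes a const char*, while wide-to-narrow is the job of to_bytes. Assuming m_pimpl->m_myconv is a std::wstring_convert, a sketch of the fix:
// from_bytes: const char* -> std::wstring (narrow to wide)
// to_bytes:   std::wstring / const wchar_t* -> std::string (wide to narrow)
res = mysql_query( m_db, m_pimpl->m_myconv.to_bytes( query1 ).c_str() );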
This is a way to convert a Unicode wide-character string to a const char*:
// Note: wcstombs_s is a Microsoft "secure CRT" function; the four-argument
// call relies on MSVC's template overload that deduces the array size.
char query_cstr[100];
size_t charsConverted;
const wchar_t* unicode_query = L"SELECT * FROM table;";
wcstombs_s(&charsConverted, query_cstr, unicode_query, wcslen(unicode_query));
const char* query_const = query_cstr;
// Use query_const inside of mysql_query now that it's been converted to a const char*
I've run into trouble using the locale functions for various reasons. wcstombs_s() makes things a bit easier when converting Unicode. Note that calling c_str() on a std::wstring object yields a const wchar_t*, which is not what you want.
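Since the asker is on Gentoo/gcc, where the Annex K _s functions are generally unavailable, plain wcstombs from <cstdlib> does the same job; a sketch:
#include <clocale>
#include <cstdlib>
#include <string>

std::string narrow(const std::wstring& ws)
{
    // Uses the current C locale; call setlocale(LC_ALL, "") once at startup.
    std::size_t needed = std::wcstombs(nullptr, ws.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        return {}; // ws contains a character the locale cannot represent
    std::string out(needed, '\0');
    std::wcstombs(&out[0], ws.c_str(), needed);
    return out;
}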

Outputting UTF-8 with qInstallMsgHandler

I would like to make my debug handler (installed with qInstallMsgHandler) handle UTF-8; however, it seems it can only be defined as void myMessageOutput(QtMsgType type, const char *msg), and const char* doesn't handle UTF-8 (once displayed, it's just random characters).
Is there some way to define this function as void myMessageOutput(QtMsgType type, QString msg), or maybe some other way to make it work?
This is my current code:
void myMessageOutput(QtMsgType type, const char *msg) {
    QString message = "";
    QString test = QString::fromUtf8(msg);
    // If I break into the debugger here, both "test" and "msg" contain a question mark.
    switch (type) {
    case QtDebugMsg:
        message = QString("[Debug] %1").arg(msg);
        break;
    case QtWarningMsg:
        message = QString("[Warning] %1").arg(msg);
        break;
    case QtCriticalMsg:
        message = QString("[Critical] %1").arg(msg);
        break;
    case QtFatalMsg:
        message = QString("[Fatal] %1").arg(msg);
        abort();
    }
    Application::instance()->debugDialog()->displayMessage(message);
}

Application::Application(int argc, char *argv[]) : QApplication(argc, argv) {
    debugDialog_ = new DebugDialog();
    debugDialog_->show();
    qInstallMsgHandler(myMessageOutput);
    qDebug() << QString::fromUtf8("我");
}
If you step through the code in the debugger, you will find that QDebug and qt_message first construct a QString from the const char* and then use toLocal8Bit on this string.
The only way I can think of to circumvent this: use your own encoding (something like "[E68891]") or another encoding such as uuencode or base64 that uses only ASCII characters, and decode the string in your message handler.
You should also consider using the qDebug("%s", "string") form to avoid quotes and additional whitespace (see this question).
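For instance, a sketch of the base64 route using Qt's own helpers:
// Emitting side: encode the UTF-8 bytes as ASCII-safe base64 first
qDebug() << QString::fromUtf8("我").toUtf8().toBase64().constData();

// Handler side: decode the ASCII payload back to UTF-8
QString message = QString::fromUtf8(QByteArray::fromBase64(msg));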
Edit: the toLocal8Bit happens in the destructor of QDebug, which is called at the end of a qDebug statement (qdebug.h line 85). At least on the Windows platform this calls toLatin1, thus misinterpreting the string. You can prevent this by running the following lines at the start of your program:
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForLocale(codec);
On some platforms UTF-8 seems to be the default text codec.
Try to pass the data in UTF-8 and extract it in your function with something like QString::fromUtf8, which takes a const char* as input.
The problem is that the operator<<(const char *) method expects a Latin1-encoded string, so you should pass a proper UTF-8 QString to QDebug like this:
qDebug() << QString::fromUtf8("我");
... and from inside the message handler expect a UTF-8 string:
QString message = QString::fromUtf8(msg);
And that should work like a charm. ;)
For more information please read the QDebug reference manual.
You could also do the wrong thing: keep passing UTF-8 encoded strings via << and convert the strings with the horrible QString::fromUtf8(QString::fromUtf8(msg).toAscii().constData()) call.
Edit: This is the final example that works:
#include <QString>
#include <QDebug>
#include <QMessageBox>
#include <QApplication>

void myMessageOutput(QtMsgType type, const char *msg)
{
    QMessageBox::information(NULL, NULL, QString::fromUtf8(msg), QMessageBox::Ok);
}

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    qInstallMsgHandler(myMessageOutput);
    qDebug() << QString::fromUtf8("我");
    return 0;
}
Please note that QDebug doesn't do any charset conversion if you don't instantiate QApplication. This way you wouldn't need to do anything special to msg inside the message handler, but I STRONGLY recommend that you instantiate it.
One thing you must be sure of is that your source file is encoded in UTF-8. To check, you can use a proper tool (file on Linux, for example) or just call QMessageBox::information(NULL, NULL, QString::fromUtf8("我"), QMessageBox::Ok) and see if the proper message appears.
#include <QtCore/QCoreApplication>
#include <QDebug>
#include <stdio.h>

void myMessageOutput(QtMsgType type, const char *msg)
{
    fprintf(stderr, "Msg: %s\n", msg);
}

int main(int argc, char *argv[])
{
    qInstallMsgHandler(myMessageOutput);
    QCoreApplication a(argc, argv);
    qDebug() << QString::fromUtf8("我");
}
The code above works perfectly here, but I must stress that my console supports UTF-8; if it did not, it would show a different character at that location.

Is there a format specifier that always means char string with _tprintf?

When you build an app on Windows using TCHAR support, %s in _tprintf() means char * string for ANSI builds and wchar_t * for Unicode builds, while %S means the reverse.
But are there any format specifiers that always mean char * string, no matter whether it's an ANSI or Unicode build? Since even on Windows UTF-16 is not really used for files or networking, you'll still fairly often want to deal with byte-based strings regardless of the native character type your app is compiled with.
The h modifier forces both %s and %S to char*, and the l modifier forces both to wchar_t*, i.e. %hs, %hS, %ls, and %lS.
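For example (a quick sketch; the h length modifier on %s is a Microsoft extension, while %ls is also standard C):
#include <stdio.h>
#include <tchar.h>

int main()
{
    // %hs always takes char*, %ls always takes wchar_t*,
    // in both ANSI and Unicode builds
    _tprintf(_T("narrow: %hs, wide: %ls\n"), "bytes", L"wide");
    return 0;
}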
This might also solve your problem:
_TCHAR *message = _T("hello"); // was uninitialized in the original
_tprintf(_T("\n>>>>>> %d") TEXT(" message is: %s\n"), 4, message);
You can easily write something like this:
#ifdef _UNICODE
#define PF_ASCIISTR L"%S"
#define PF_UNICODESTR L"%s"
#else
#define PF_ASCIISTR "%s"
#define PF_UNICODESTR "%S"
#endif
and then you use the PF_ASCIISTR or the PF_UNICODESTR macros in your format string, exploiting C's automatic string-literal concatenation:
_tprintf(_T("There are %d ") PF_ASCIISTR _T(" over the table"), 10, "pens");
I found that _vsntprintf_s uses %s for type TCHAR and works with both GCC and MSVC.
So you could wrap it like:
#include <cstdarg>
#include <cstdio>
#include <tchar.h>

int myprintf(const TCHAR* lpszFormat, va_list argptr) {
    va_list args2;
    va_copy(args2, argptr);                     // argptr is consumed by _vsctprintf
    int len = _vsctprintf(lpszFormat, argptr);  // -1 on error
    if (len <= 0) { va_end(args2); return len; }
    auto* pT = new TCHAR[2 + size_t(len)];
    // Note: the buffer-size argument is in TCHARs, not bytes.
    _vsntprintf_s(pT, 2 + size_t(len), 1 + size_t(len), lpszFormat, args2);
    va_end(args2);
    int rv = _tprintf(_T("%s"), pT);            // %s matches TCHAR in both builds
    delete[] pT;
    return rv;
}

int myprintf(const TCHAR* lpszFormat, ...) {
    va_list argptr;
    va_start(argptr, lpszFormat);
    int rv = myprintf(lpszFormat, argptr);
    va_end(argptr);
    return rv;
}

int main(int, char**) { return myprintf(_T("%s"), _T("Test")); }
