Copying string to clipboard, only one character written when pasted - winapi

I was basing some code off this code, also mentioned in my other question. That version forces the character type to char*, which breaks compilation in my Unicode project. So I made the following tweaks:
void SetClipboardText(CString& szData)
{
    HGLOBAL h;
    LPTSTR arr;
    size_t bytes = (szData.GetLength() + 1) * sizeof(TCHAR);
    h = GlobalAlloc(GMEM_MOVEABLE, bytes);
    arr = (LPTSTR)GlobalLock(h);
    ZeroMemory(arr, bytes);
    _tcscpy_s(arr, szData.GetLength() + 1, szData);
    szData.ReleaseBuffer();
    GlobalUnlock(h);
    ::OpenClipboard(NULL);
    EmptyClipboard();
    SetClipboardData(CF_TEXT, h);
    CloseClipboard();
}
The copying looks fine - running under the Visual Studio debugger shows that arr contains the copied string as expected.
But when I then paste into any application, only the first character is pasted.
What's going wrong?

Your Unicode comment on the prior question is telling. If you have a wide-character string containing low-ASCII characters, in UTF-16 each one is encoded as the low-ASCII byte followed by a NUL byte. CF_TEXT consumers interpret the data as ANSI text, so they stop at that first NUL, which is why only one character pastes. Use CF_UNICODETEXT instead of CF_TEXT.
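A minimal sketch of the corrected function, assuming a Unicode build (so the buffer really holds UTF-16):

void SetClipboardText(const CString& szData)
{
    const size_t bytes = (szData.GetLength() + 1) * sizeof(wchar_t);
    HGLOBAL h = GlobalAlloc(GMEM_MOVEABLE, bytes);
    if (h == NULL)
        return;
    wchar_t* arr = static_cast<wchar_t*>(GlobalLock(h));
    wcscpy_s(arr, szData.GetLength() + 1, szData);
    GlobalUnlock(h);
    if (::OpenClipboard(NULL))
    {
        EmptyClipboard();
        SetClipboardData(CF_UNICODETEXT, h); // wide text, not CF_TEXT
        CloseClipboard();
    }
    else
    {
        GlobalFree(h); // the clipboard never took ownership
    }
}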

Related

Qt5 C++ UTF-8 conversion to Windows-1250 of Romanian ș and ț characters

My application is developed in C++11 and uses Qt5. In this application, I need to store UTF-8 text as a Windows-1250 encoded file.
I tried the two following ways and both work except for the Romanian 'ș' and 'ț' characters :(
1.
auto data = QStringList() << ... <some texts here>;
QTextStream outStream(&destFile);
outStream.setCodec(QTextCodec::codecForName("Windows-1250"));
foreach (auto qstr, data)
{
    outStream << qstr << EOL_CODE;
}
2.
auto data = QStringList() << ... <some texts here>;
auto *codec = QTextCodec::codecForName("Windows-1250");
foreach (auto qstr, data)
{
    const QByteArray encodedString = codec->fromUnicode(qstr);
    destFile.write(encodedString);
}
In the case of the 'ț' character (UTF-8 bytes 0xC8 0x9B), instead of the expected 0xFE value the character is encoded and stored as 0x3F ('?'), which is unexpected.
So I am looking for any help or experience/examples regarding text recoding.
Do not confuse ț with ţ. The former is what is in your post, the latter is what's actually supported by Windows-1250.
The character ț from your post is T-comma, U+021B, LATIN SMALL LETTER T WITH COMMA BELOW, however:
This letter was not part of the early Unicode versions, which is why Ţ (T-cedilla, available from version 1.1.0, June 1993) is often used in digital texts in Romanian.
The character referred to is ţ, U+0163, LATIN SMALL LETTER T WITH CEDILLA (emphasis mine):
In early versions of Unicode, the Romanian letter Ț (T-comma) was considered a glyph variant of Ţ, and therefore was not present in the Unicode Standard. It is also not present in the Windows-1250 (Central Europe) code page.
The story of ş and ș (S-cedilla and S-comma, respectively) is analogous.
If you must encode to this archaic Windows-1250 code page, I'd suggest replacing the comma-below variants with the cedilla variants (both lowercase and uppercase) before encoding, as sketched below. I think Romanians will understand :)
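A minimal sketch of that substitution in Qt (the helper name is hypothetical):

#include <QString>

// Map the comma-below variants to their cedilla counterparts so the
// text becomes representable in Windows-1250.
QString toWindows1250Compatible(QString s)
{
    s.replace(QChar(0x021A), QChar(0x0162)); // Ț -> Ţ
    s.replace(QChar(0x021B), QChar(0x0163)); // ț -> ţ
    s.replace(QChar(0x0218), QChar(0x015E)); // Ș -> Ş
    s.replace(QChar(0x0219), QChar(0x015F)); // ș -> ş
    return s;
}

Each qstr can then be run through this helper right before codec->fromUnicode(qstr) (or before streaming it out), after which 'ţ' encodes to the expected 0xFE instead of '?'.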

String size limit for input with cin.get() and getline()

In this project the user can type in a text (maximum 140 characters).
So for this limitation I first used getline():
string text;
getline(cin, text);
text = text.substr(1, 140);
but in this case the result of cout << text << endl; is an empty string.
so I used cin.get() like:
cin.get(text, 140);
this time I get this error: no matching function for call to ‘std::basic_istream::get(std::__cxx11::string&, int)’
Note that I have included <iostream>.
So the question is: how can I fix this, and why is it happening?
Your first approach is sound with one correction - you need to use
text = text.substr(0, 140);
instead of text = text.substr(1, 140);. Containers in C++ (including std::string) are indexed from 0, and you are asking for the substring starting at position 1, so the first character is silently dropped; if the string happens to be only one character long, the call doesn't crash, but it doesn't produce the desired output either.
According to this source, substr throws an out_of_range exception if the starting position is larger than the string length. For a one-character string, position 1 is equal to the string length; that case is well-defined (it is not undefined behavior) and simply returns an empty string. You can test it yourself in the interactive coding section following the link above.
Your second approach tried to pass a std::string to a function that expects a C-style character array: the istream::get overloads take a char* buffer and a length. Again, more can be found here. Like the error said, the compiler couldn't find a matching function because the argument was a std::string and not a char array. You could convert the string to a char array yourself, as described for instance in this post, but the first approach is much more in line with C++ practices.
Last note - currently you're only reading a single line of input; I assume you will want to change that.
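Putting it together, a minimal sketch of the corrected first approach:

#include <iostream>
#include <string>

int main()
{
    std::string text;
    std::getline(std::cin, text); // read one full line
    text = text.substr(0, 140);   // keep at most the first 140 characters
    std::cout << text << std::endl;
}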

Compare Unicode std::string with usual "" literal or u8"" declaration

On Windows with Visual Studio 2015
// Ü
// UTF-8 (hex) 0xC3 0x9C
// UTF-16 (hex) 0x00DC
// UTF-32 (hex) 0x000000DC
using namespace std::string_literals;
const auto narrow_multibyte_string_s = "\u00dc"s;
const auto wide_string_s = L"\u00dc"s;
const auto utf8_encoded_string_s = u8"\u00dc"s;
const auto utf16_encoded_string_s = u"\u00dc"s;
const auto utf32_encoded_string_s = U"\u00dc"s;
assert(utf8_encoded_string_s == "\xC3\x9C");
assert(narrow_multibyte_string_s == "Ü");
assert(utf8_encoded_string_s == u8"Ü");
// here is the question
assert(utf8_encoded_string_s != narrow_multibyte_string_s);
"\u00dc"s is not the same as u8"\u00dc"s or "Ü"s is not the same as u8"Ü"s
Apparently the default encoding for usual string literal is not UTF-8 (Probably UTF-16) and I cannot just compare two std::string without knowing its encoding even they have the same semantic.
What is the practice to perform such string comparison in unicode-enable c++ application development??
For example an API like this:
class MyDatabase
{
    bool isAvailable(const std::string& key)
    {
        // *compare* key in database
        if (key == "Ü")
            return true;
        else
            return false;
    }
};
Other programs may call isAvailable with a std::string in UTF-8 or in the default (UTF-16?) encoding. How can I guarantee to do the proper comparison?
Can I detect an encoding mismatch at compile time?
Note: I prefer C++11/14 stuff, and std::string over std::wstring.
"\u00dc" is a char[] encoded in whatever the compiler/OS's default 8-bit encoding happens to be, so it can be different on different machines. On Windows, that tends to be the OS's default Ansi encoding, or it could be the encoding that the source file is saved as.
L"\u00dc" is a wchar_t[] encoded with either UTF-16 or UTF-32, depending on the compiler's definition of wchar_t (which is 16-bit on Windows, so UTF-16).
u8"\u00dc" is a char[] encoded in UTF-8.
u"\u00dc" is a char16_t[] encoded in UTF-16.
U"\u00dc" is a char32_t[] encoded in UTF-32.
The ""s suffix simply returns a std::string, std::wstring, std::u16string, or std::u32string, depending on whether a char[], wchar_t[], char16_t[], or char32_t[] is passed to it.
When comparing two strings, make sure they are in the same encoding first. This is especially important for your char[]/std::string data, as it could be in any number of 8-bit encodings, depending on the systems involved. This is not so much a problem if the app is generating the strings itself, but it is important if one or more of the strings is coming from an external source (file, user input, network protocol, etc).
In your example, "\u00dc" and "Ü" are not necessarily guaranteed to produce the same char[] sequence, depending on how the compiler interprets those different literals. But even if they did (which seems to be the case in your example), neither of them will likely produce UTF-8 (you have to go to extra measures to force that), which is why your comparison to utf8_encoded_string_s fails.
So, if you are expecting a string literal to be UTF-8, use u8"" to ensure that. If you are getting string data from an external source and need it to be in UTF-8, convert it to UTF-8 in code as soon as possible, if it is not already (which means you have to know the encoding used by the external source).
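Under C++11/14, one way to pin that down at the boundary looks like this (a minimal sketch; utf16_to_utf8 is a hypothetical helper name, and note that std::wstring_convert was later deprecated in C++17):

#include <codecvt>
#include <locale>
#include <string>

// Convert incoming UTF-16 data to UTF-8 once, at the boundary, so that
// all internal comparisons are UTF-8 against u8"" literals.
std::string utf16_to_utf8(const std::u16string& s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}

bool isAvailable(const std::string& utf8Key)
{
    return utf8Key == u8"\u00dc"; // both sides are now UTF-8
}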

Converting Characters to ASCII Code & Vice Versa In C++/CLI

I am currently learning C++/CLI and I want to convert a character to its decimal ASCII code and vice versa (for example, 'A' = 65).
In Java, this can be achieved by a simple type cast:
char ascii = 'A';
char retrieveASCII = ' ';
int decimalValue;
decimalValue = (int)ascii;
retrieveASCII = (char)decimalValue;
Apparently this method does not work in C++/CLI; here is my code:
String^ words = "ABCDEFG";
String^ getChars;
String^ retrieveASCII;
int decimalValue;
getChars = words->Substring(0, 1);
decimalValue = Int32::Parse(getChars);
retrieveASCII = decimalValue.ToString();
I am getting this error:
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Input string was not in a correct format.
Any Idea on how to solve this problem?
Characters in a TextBox::Text property are in a System::String type. Therefore, they are Unicode characters. By design, the Unicode character set includes all of the ASCII characters. So, if the string only has those characters, you can convert to an ASCII encoding without losing any of them. Otherwise, you'd have to have a strategy of omitting or substituting characters or throwing an exception.
The ASCII character set has one encoding in current use. It represents all of its characters in one byte each.
// using ::System::Text;
const auto asciiBytes = Encoding::ASCII->GetBytes(words->Substring(0,1));
const auto decimalValue = asciiBytes[0]; // the length is 1 as explained above
const auto retrieveASCII = Encoding::ASCII->GetString(asciiBytes);
Decimal is, of course, a representation of a number. I don't see where you are using decimal except in your explanation. If you did want to use it in code, it could be like this:
const auto explanation = String::Concat(
    "The encoding (in decimal) ",
    "for the first character in ASCII is ",
    decimalValue);
Note the use of auto. I have omitted the types of the variables because the compiler can figure them out. It allows the code to be more focused on concepts rather than boilerplate. Also, I used const because I don't believe the value of "variables" should be varied. Neither of these is required.
BTW, all of this applies to Java, too. If your Java code works, it is only by coincidence. If it had been written properly, it would have been easy to translate to .NET. Java's String and Charset classes have very similar functionality to the .NET String and Encoding classes. (Encoding is the proper term, though.) They both use the Unicode character set and the UTF-16 encoding for strings.
More like Java than you think
String^ words = "ABCDEFG";
Char first = words[0];
String^ retrieveASCII;
int decimalValue = (int)first;
retrieveASCII = decimalValue.ToString();
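For the reverse direction (a sketch with hypothetical variable names), the same cast works the other way, since a Char is just a UTF-16 code unit:

int decimalValue = 65;
Char retrieved = (Char)decimalValue;      // 'A'
String^ asString = retrieved.ToString();  // "A"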

Xcode: preprocessor concatenation broken?

We have a piece of cross-platform code that uses wide strings. All our string constants are wide strings and we need to use CFSTR() on some of them. We use these macros to get rid of L from wide strings:
// strip leading L"..." from wide string macros
// expand macro, e.g. turn WIDE_STRING (#define WIDE_STRING L"...") into L"..."
# define WIDE2NARROW(WideMacro) REMOVE_L(WideMacro)
// L"..." -> REM_L"..."
# define REMOVE_L(WideString) REM_##WideString
// REM_L"..." -> "..."
# define REM_L
This works on both Windows and Linux, but not on Mac, where we get the following error:
“error: pasting "REM_" and "L"qm"" does not give a valid preprocessing token”
Mac example:
#define TRANSLATIONS_DIR_BASE_NAME L"Translations"
#define TRANSLATIONS_FILE_NAME_EXTENSION L"qm"
CFURLRef appUrlRef = CFBundleCopyResourceURL( CFBundleGetMainBundle()
, macTranslationFileName
, CFSTR(WIDE2NARROW(TRANSLATIONS_FILE_NAME_EXTENSION))
, CFSTR(WIDE2NARROW(TRANSLATIONS_DIR_BASE_NAME))
);
Any ideas?
String literals are formed during tokenization, which happens before macro expansion, so L"qm" is already a single wide-string-literal preprocessing token. Your REM_##WideString paste therefore tries to glue the identifier REM_ onto that string literal (and not onto the lone letter L), and the result REM_L"qm" is not a single valid preprocessing token, which C99 forbids. GCC on the Mac enforces this; compilers where the trick works are accepting an invalid paste.
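A common portable alternative (a sketch; the macro names are hypothetical) is to keep the narrow literal as the source of truth and build the wide one by pasting the L prefix on, which does form a single valid token:

// Define the narrow literal once...
#define TRANSLATIONS_FILE_NAME_EXTENSION_A "qm"
// ...and derive the wide literal by prefixing L (two levels so arguments expand).
#define WIDEN2(x) L##x
#define WIDEN(x) WIDEN2(x)
#define TRANSLATIONS_FILE_NAME_EXTENSION WIDEN(TRANSLATIONS_FILE_NAME_EXTENSION_A)

// CFSTR can then take the narrow literal directly:
// CFSTR(TRANSLATIONS_FILE_NAME_EXTENSION_A)

Pasting L onto "qm" yields L"qm", a single valid preprocessing token, so this compiles everywhere.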
