wcout problems on Mac OSX

I am trying to do some simple box drawing in the terminal using Unicode characters. However, I noticed that wcout wouldn't output anything for the box-drawing characters, not even a placeholder. So I decided to write the program below to find out which Unicode characters were supported, and found that wcout refused to output anything above 255. Is there something I have to do to make wcout work properly? Why can't I access any of the extended Unicode characters?
#include <wchar.h>
#include <locale>
#include <iostream>
using namespace std;
int main()
{
    for (wchar_t c = 0; c < 0xFFFF; c++)
    {
        cout << "Iteration " << (int)c << endl;
        wcout << c << endl << endl;
    }
    return 0;
}

I don't recommend using wcout because it is non-portable, inefficient (it always performs transcoding), and doesn't support all of Unicode (e.g. surrogate pairs).
Instead, you can use the open-source {fmt} library to portably print Unicode text, including box-drawing characters, for example:
#include <fmt/core.h>
int main() {
  fmt::print("┌────────────────────┐\n"
             "│ Hello, world!      │\n"
             "└────────────────────┘\n");
}
prints (https://godbolt.org/z/4EP6Yo):
┌────────────────────┐
│ Hello, world!      │
└────────────────────┘
Disclaimer: I'm the author of {fmt}.
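If you do need wcout itself, the usual workaround is to replace the default "C" locale, whose conversion facet cannot encode characters above the basic range and leaves the stream in a failed state once it hits one. A minimal sketch, assuming the terminal and the LANG/LC_ALL environment variables point at a UTF-8 locale such as en_US.UTF-8:
#include <iostream>
#include <locale>
int main()
{
    // Switch from the default "C" locale to the user's (assumed UTF-8) locale
    // so wcout's wide-to-narrow conversion can encode code points above U+00FF.
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());
    std::wcout << L"\u250C\u2500\u2500\u2500\u2500\u2510\n"
                  L"\u2502 Hi \u2502\n"
                  L"\u2514\u2500\u2500\u2500\u2500\u2518\n";
}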

GetUserPreferredUILanguages() never returns more than two languages

I'm trying to retrieve the complete list of the user's preferred languages from a C++/Qt application, as configured in the "Region & language" page in the user's preferences.
For that, I am trying with the WinAPI function GetUserPreferredUILanguages(), on an up-to-date Windows 10 Pro system.
However, the function always only returns the first entry (the main Windows display language), and "en-US". If English is configured as the main language, then only "en-US" is returned. E.g., if I have (German, French, English) configured, ["de-de", "en-US"] is returned, French is omitted. If I add more languages to the list, they are omitted as well.
I also looked at User Interface Language Management, but to no avail. GetSystemPreferredUILanguages() for example only returns "en-US". GetUILanguageFallbackList() returns ["de-de", "de", "en-US", "en"].
The code I use:
// calling GetUserPreferredUILanguages() twice, once to get number of
// languages and required buffer size, then to get the actual data
ULONG numberOfLanguages = 0;
DWORD bufferLength = 0;
const auto result1 = GetUserPreferredUILanguages(MUI_LANGUAGE_NAME,
                                                 &numberOfLanguages,
                                                 nullptr,
                                                 &bufferLength);
// result1 is true, numberOfLanguages=2
QVector<wchar_t> languagesBuffer(static_cast<int>(bufferLength));
const auto result2 = GetUserPreferredUILanguages(MUI_LANGUAGE_NAME,
                                                 &numberOfLanguages,
                                                 languagesBuffer.data(),
                                                 &bufferLength);
// result2 is true, languageBuffer contains "de-de", "en-US"
Is this not the right function to use, or am I misunderstanding something about the language configuration in Windows 10? How can I get the complete list of preferred languages? I see UWP APIs that might do the job, but if possible I'd like to use a C API, as it integrates more easily with the C++ codebase at hand (unmanaged C++, that is).
GlobalizationPreferences.Languages is usable from unmanaged C++ because GlobalizationPreferences has DualApiPartitionAttribute.
Here is a C++/WinRT example of using GlobalizationPreferences.Languages:
#pragma once
#include <winrt/Windows.Foundation.Collections.h>
#include <winrt/Windows.System.UserProfile.h>
#include <iostream>
#pragma comment(lib, "windowsapp")
using namespace winrt;
using namespace Windows::Foundation;
using namespace Windows::System::UserProfile;
int main()
{
    winrt::init_apartment();
    for (const auto& lang : GlobalizationPreferences::Languages()) {
        std::wcout << lang.c_str() << std::endl;
    }
}
And a WRL example for those who cannot migrate to C++17:
#include <roapi.h>
#include <wrl.h>
#include <Windows.System.UserProfile.h>
#include <iostream>
#include <stdint.h>
#pragma comment(lib, "runtimeobject.lib")
using namespace Microsoft::WRL;
using namespace Microsoft::WRL::Wrappers;
using namespace ABI::Windows::Foundation::Collections;
using namespace ABI::Windows::System::UserProfile;
int main()
{
    RoInitializeWrapper initialize(RO_INIT_MULTITHREADED);
    if (FAILED(initialize)) {
        std::cerr << "RoInitialize failed" << std::endl;
        return 1;
    }
    ComPtr<IGlobalizationPreferencesStatics> gps;
    HRESULT hr = RoGetActivationFactory(
        HStringReference(
            RuntimeClass_Windows_System_UserProfile_GlobalizationPreferences)
            .Get(),
        IID_PPV_ARGS(&gps));
    if (FAILED(hr)) {
        std::cerr << "RoGetActivationFactory failed" << std::endl;
        return 1;
    }
    ComPtr<IVectorView<HSTRING>> langs;
    hr = gps->get_Languages(&langs);
    if (FAILED(hr)) {
        std::cerr << "Could not get Languages" << std::endl;
        return 1;
    }
    uint32_t size;
    hr = langs->get_Size(&size);
    if (FAILED(hr)) {
        std::cerr << "Could not get Size" << std::endl;
        return 1;
    }
    for (uint32_t i = 0; i < size; ++i) {
        HString lang;
        hr = langs->GetAt(i, lang.GetAddressOf());
        if (FAILED(hr)) {
            std::cerr << "Could not get Languages[" << i << "]" << std::endl;
            continue;
        }
        std::wcout << lang.GetRawBuffer(nullptr) << std::endl;
    }
}
I found out that the language list returned by GetUserPreferredUILanguages() depends on your "Windows UI language" setting and has nothing to do with the input-method list order.
For example, on Win10 21H2, I can see GetUserPreferredUILanguages() return a list of three langtags:
fr-CA\0fr-FR\0en-US\0\0
In summary, for GetUserPreferredUILanguages() and GetUILanguageFallbackList(), the returned langtag list is determined solely by the current user's "Windows display language" selection. It is a user-wide, single-selection setting. For a specific display-language selection, the items in the list and their order are hard-coded by Windows itself. It is even unrelated to which input methods (IMEs) you have added in the control panel -- for example, if you add "fr-CA" but not "fr-FR", the fallback list will still be fr-CA\0fr-FR\0en-US\0\0.
The difference between the two APIs, according to my experiment, is that GetUILanguageFallbackList() also returns neutral langtags ("fr", "en", etc.), so it produces a superset of GetUserPreferredUILanguages().
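For reference, the buffer filled by GetUserPreferredUILanguages() is a sequence of null-terminated language tags ending with an extra null, as in the fr-CA\0fr-FR\0en-US\0\0 example above. A minimal sketch of splitting it into individual tags (splitLangTags is a hypothetical helper, not part of any API):
#include <cwchar>
#include <string>
#include <vector>

// Split a double-null-terminated buffer such as L"fr-CA\0fr-FR\0en-US\0\0"
// into individual language tags.
std::vector<std::wstring> splitLangTags(const wchar_t* buffer)
{
    std::vector<std::wstring> tags;
    for (const wchar_t* p = buffer; *p != L'\0'; p += wcslen(p) + 1)
        tags.emplace_back(p);   // "fr-CA", then "fr-FR", then "en-US"
    return tags;
}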

How to Convert Custom string to ptime using boost

I have a string "2018Jan23T181138.65498648" which I need to convert to ptime. I have used the code below, but it does not seem to work. Any idea what I am doing wrong here?
boost::posix_time::ptime pt;
std::istringstream is("2018Jan23T181138.65498648");
is.imbue(std::locale(std::locale::classic(), new boost::posix_time::time_input_facet("%Y%m%dT%H%M%S.%f")));
is >> pt;
std::cout << pt;
You need to at least make the format string reflect the input format.
"Jan" is not a valid match for %Y%m%d (which would expect 20180123 instead). Likewise, %S.%f is a format string that might work for formatting¹, but to parse seconds with fractions, the docs say to use %s.
Live On Coliru
#include <boost/date_time.hpp>
#include <boost/date_time/posix_time/posix_time_io.hpp>
#include <sstream>
#include <iostream>
int main() {
    boost::posix_time::ptime pt;
    std::istringstream is("2018Jan23T181138.65498648");
    is.imbue(std::locale(std::locale::classic(),
                         new boost::posix_time::time_input_facet("%Y%b%dT%H%M%s")));
    if (is >> pt) {
        std::cout << pt << "\n";
    } else {
        std::cout << "unparsed\n";
    }
}
Prints
2018-Jan-23 18:11:38.654986
¹ haven't tested it for output formatting
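For the output direction that footnote leaves open, boost's time_facet accepts the same flag set, so a quick way to check it is the sketch below; the exact number of fractional digits depends on the library's configured time resolution, so treat the shown output as an assumption:
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
#include <sstream>

int main() {
    boost::posix_time::ptime pt(
        boost::gregorian::date(2018, 1, 23),
        boost::posix_time::time_duration(18, 11, 38) +
            boost::posix_time::microseconds(654986));

    std::ostringstream os;
    // time_facet is the output counterpart of time_input_facet;
    // %s emits seconds together with the fractional part.
    os.imbue(std::locale(std::locale::classic(),
                         new boost::posix_time::time_facet("%Y%b%dT%H%M%s")));
    os << pt;
    std::cout << os.str() << "\n";   // e.g. 2018Jan23T181138.654986
}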

using stl to run length encode a string using std::adjacent_find

I am trying to perform run-length compression on a string for a special protocol that I am using. Runs are considered efficient when the run size of a particular character in the string is >= 3. Can someone help me achieve this? I have a live demo on Coliru. I am pretty sure this is possible with the standard library's std::adjacent_find, with std::not_equal_to<> as the binary predicate to search for run boundaries, and probably std::equal_to<> once I find a boundary. Here is what I have so far, but I am having trouble with the results:
Given the following input text string containing runs of spaces and other characters (in this case runs of the letter 's'):
"---thisssss---is-a---tesst--"
I am trying to convert the above text string into a vector whose elements are either pure runs of more than 2 characters or mixed characters. The results are almost correct, but not quite, and I cannot spot the error.
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
expected the following
======================
---,thi,sssss,---,is-a,---,tesst--,
actual results
==============
---,thi,sssss,---,is-a,---,te,ss,--,
EDIT: I fixed up the previous code to make this version closer to the final solution. Specifically, I added explicit tests for the run size to be > 2 for a run to be included. I seem to be having boundary-case problems though: the all-spaces case, and the case where the string ends in several spaces:
#include <iterator>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <algorithm>
#include <functional>
int main()
{
    // I want to convert this string containing adjacent runs of characters
    std::string testString("---thisssss---is-a---tesst--");
    // to the following
    std::vector<std::string> idealResults = {
        "---", "thi", "sssss",
        "---", "is-a",
        "---", "tesst--"
    };
    std::vector<std::string> tokenizedStrings;
    auto adjIter = testString.begin();
    auto lastIter = adjIter;
    // temporary string used to accumulate characters that
    // are not part of a run.
    std::unique_ptr<std::string> stringWithoutRun;
    while ((adjIter = std::adjacent_find(
                adjIter, testString.end(), std::not_equal_to<>())) !=
           testString.end()) {
        auto next = std::string(lastIter, adjIter + 1);
        // append to foo if < run threshold
        if (next.length() < 2) {
            if (!stringWithoutRun) {
                stringWithoutRun = std::make_unique<std::string>();
            }
            *stringWithoutRun += next;
        } else {
            // if we have encountered non run characters, save them first
            if (stringWithoutRun) {
                tokenizedStrings.push_back(*stringWithoutRun);
                stringWithoutRun.reset();
            }
            tokenizedStrings.push_back(next);
        }
        lastIter = adjIter + 1;
        adjIter = adjIter + 1;
    }
    tokenizedStrings.push_back(std::string(lastIter, adjIter));
    std::cout << "expected the following" << std::endl;
    std::cout << "======================" << std::endl;
    std::copy(idealResults.begin(), idealResults.end(),
              std::ostream_iterator<std::string>(std::cout, ","));
    std::cout << std::endl;
    std::cout << "actual results" << std::endl;
    std::cout << "==============" << std::endl;
    std::copy(tokenizedStrings.begin(), tokenizedStrings.end(),
              std::ostream_iterator<std::string>(std::cout, ","));
    std::cout << std::endl;
}
if (next.length() < 2) {
    if (!stringWithoutRun) {
        stringWithoutRun = std::make_unique<std::string>();
    }
    *stringWithoutRun += next;
}
This should be if (next.length() <= 2). You need to add a run of identical characters to the current token if its length is either 1 or 2.
I seem to be having boundary case problems though - the all spaces
case and the case where the end of the strings ends in several spaces
When stringWithoutRun is not empty after the loop finishes, the characters accumulated in it are not added to the array of tokens. You can fix it like this:
// The loop has finished
if (stringWithoutRun)
    tokenizedStrings.push_back(*stringWithoutRun);
tokenizedStrings.push_back(std::string(lastIter, adjIter));
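Putting both fixes together, plus one more assumption about the intended behaviour (a short trailing run such as the final "--" should be merged with any pending non-run characters so the last token comes out as "tesst--", matching idealResults), a compact sketch of the whole thing:
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    const std::string testString("---thisssss---is-a---tesst--");
    std::vector<std::string> tokens;
    std::string pending;            // non-run characters accumulated so far

    auto lastIter = testString.begin();
    auto adjIter  = lastIter;
    while ((adjIter = std::adjacent_find(adjIter, testString.end(),
                                         std::not_equal_to<>())) != testString.end()) {
        std::string run(lastIter, adjIter + 1);
        if (run.length() <= 2) {    // below run threshold: keep accumulating
            pending += run;
        } else {                    // real run: flush pending, then emit the run
            if (!pending.empty()) { tokens.push_back(pending); pending.clear(); }
            tokens.push_back(run);
        }
        lastIter = ++adjIter;
    }

    // Final token: merge a short tail with pending characters, emit long runs alone.
    std::string tail(lastIter, testString.end());
    if (tail.length() <= 2) {
        pending += tail;
        if (!pending.empty()) tokens.push_back(pending);
    } else {
        if (!pending.empty()) tokens.push_back(pending);
        tokens.push_back(tail);
    }

    std::copy(tokens.begin(), tokens.end(),
              std::ostream_iterator<std::string>(std::cout, ","));
    std::cout << "\n";              // ---,thi,sssss,---,is-a,---,tesst--,
}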

String length changes suddenly

Here in this code, the character length changes suddenly. Before introducing char *file, strlen(str) was correct. As soon as I introduced the new char *file, the strlen value of the variable str changed.
#include <unistd.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;
int main(){
    char buf[BUFSIZ];
    if(!getcwd(buf,BUFSIZ)){
        perror("ERROR!");
    }
    cout << buf << endl;
    char *str;
    str = new char[strlen(buf)];
    strcpy(str,buf);
    strcat(str,"/");
    strcat(str,"input/abcdefghijklmnop");
    cout << str << endl;
    cout << strlen(str) << endl;
    char *file;
    file = new char[strlen(str)];
    cout << strlen(file) << endl;
    strcpy(file,str);
    cout << file << endl;
}
Your code has undefined behavior because of buffer overflow. You should be scared.
You should consider using std::string.
std::string sbuf;
{
    char cwdbuf[BUFSIZ];
    if (getcwd(cwdbuf, sizeof(cwdbuf)))
        sbuf = cwdbuf;
    else {
        perror("getcwd");
        exit(EXIT_FAILURE);
    }
}
sbuf += "/input/abcdefghijklmnop";
You should compile with all warnings and debug info (e.g. g++ -Wall -Wextra -g), then use the debugger gdb. Don't forget that C strings are zero-byte terminated. Your str is much too short. If you insist on avoiding std::string (which IMHO you should not), you need to allocate more space (and remember the extra zero byte).
str = new char[strlen(buf)+sizeof("/input/abcdefghijklmnop")];
strcpy(str, buf);
strcat(str, "/input/abcdefghijklmnop");
Remember that the sizeof of a string literal is one byte more than its length (as measured by strlen). For instance, sizeof("abc") is 4.
Likewise, your file variable is one byte too short (missing space for the terminating zero byte).
file = new char[strlen(str)+1];
BTW, on GNU systems (such as Linux) you could use asprintf(3) or strdup(3) (and use free, not delete, to release the memory), and consider using valgrind.
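A minimal C-style sketch of that suggestion (glibc-specific; asprintf allocates a buffer of exactly the right size, so no manual length arithmetic is needed):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    char cwd[BUFSIZ];
    if (!getcwd(cwd, sizeof cwd)) {
        perror("getcwd");
        return EXIT_FAILURE;
    }
    char *path = NULL;
    // asprintf allocates and fills the result buffer in one step
    if (asprintf(&path, "%s/input/abcdefghijklmnop", cwd) < 0) {
        perror("asprintf");
        return EXIT_FAILURE;
    }
    puts(path);
    free(path);   // allocated with malloc, so release with free, not delete
    return 0;
}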

Wrong endian with wstring_convert

I recently discovered the <codecvt> header, so I wanted to convert between UTF-8 and UTF-16.
I use the codecvt_utf8_utf16 facet with wstring_convert from C++11.
The issue I have is that when I try to convert a UTF-16 string to UTF-8 and then back to UTF-16, the endianness changes.
For this code:
#include <codecvt>
#include <string>
#include <locale>
#include <iostream>
using namespace std;
int main(int argc, char const *argv[])
{
    wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t> convert;
    u16string utf16 = u"\ub098\ub294\ud0dc\uc624";
    cout << hex << "UTF-16\n\n";
    for (char16_t c : utf16)
        cout << "[" << c << "] ";
    string utf8 = convert.to_bytes(utf16);
    cout << "\n\nUTF-16 to UTF-8\n\n";
    for (unsigned char c : utf8)
        cout << "[" << int(c) << "] ";
    cout << "\n\nConverting back to UTF-16\n\n";
    utf16 = convert.from_bytes(utf8);
    for (char16_t c : utf16)
        cout << "[" << c << "] ";
    cout << endl;
}
I get this output:
UTF-16
[b098] [b294] [d0dc] [c624]
UTF-16 to UTF-8
[eb] [82] [98] [eb] [8a] [94] [ed] [83] [9c] [ec] [98] [a4]
Converting back to UTF-16
[98b0] [94b2] [dcd0] [24c6]
When I change the third template argument of codecvt_utf8_utf16 to std::little_endian, the bytes are reversed.
What did I miss?
It was indeed a bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66855
It will be fixed in GCC 5.3.
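For reference, the experiment the question describes (passing std::little_endian as the third template argument of codecvt_utf8_utf16) looks like the sketch below. On an affected libstdc++ it flips the byte order of from_bytes' output, so it may serve as a stop-gap on little-endian hosts, but verify both conversion directions on the compiler you actually ship with; this is not a verified fix:
#include <codecvt>
#include <locale>
#include <string>

int main()
{
    // std::little_endian is a codecvt_mode flag; here it is passed as the
    // third template argument of codecvt_utf8_utf16, as mentioned in the question.
    std::wstring_convert<
        std::codecvt_utf8_utf16<char16_t, 0x10ffff, std::little_endian>,
        char16_t> convert;

    std::u16string utf16 = u"\ub098\ub294\ud0dc\uc624";
    std::string utf8 = convert.to_bytes(utf16);
    std::u16string roundTrip = convert.from_bytes(utf8);
    // Compare roundTrip against utf16 to check the workaround on your toolchain.
}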
