Casting a non-ASCII character to int (UTF-8)

I would like to cast a non-ASCII character (for example 'ą') to int to get its number in UTF-8. When I do something like this:
#include <iostream>
using namespace std;
int main()
{
cout << static_cast<int>('ą')<<endl;
return 0;
}
I get -71, which is not its proper number in UTF-8. I heard that this might be because 'ą' is stored in 2 bytes and one of them is cut off when the variable is initialized. Is there a solution for this?

Related

C++: No viable overloaded '=' data.end() -1 = '\0'

I'm trying to create a program that filters through speech text, removes any unwanted characters (",", "?", etc.) and then produces a new speech where the words are jumbled based on what words follow or precede them. So for example, if you had the Gettysburg Address:
Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
my program would take that text and put it into a set of strings, i.e. ["Four","score","and","seven",...."continent,"..."Liberty,"..."equal."]. Then it would remove any unwanted characters from each string, like "," or ".", and capitals, using C++ .erase and remove. After that, you'd have a filtered set like ["four","score","and","seven",...."continent"..."liberty"..."equal"]
After that then the words would be rearranged into a new coherent, funnier speech, like:
"Seven years ago our fathers conceived on men...", etc.
That was just so you know the scope of this project. My trouble at the moment has to do with either using my iterator properly or null terminators.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>
#include <set>
#include <iterator> //iterates through sets
#include <algorithm>
using namespace std;
int main() {
set <string> speechSet;
set <string> ::iterator itr; //forgot what :: means. Declares iterator as set
int sum = 0;
int x;
string data;
ofstream out;
string setString;
ifstream speechFile; //declare output file stream object. Unknown type name
speechFile.open("./MySpeech");
if (!speechFile) {
cerr << "Unable to open file " << endl;
exit(1);
}
char unwantedCharacters[] = ".";
while (!speechFile.eof()) {
speechFile >> data; //speechFile input into data
for (unsigned int i = 0; i < strlen(unwantedCharacters); ++i) {
data.erase((remove(data.begin(), data.end(),
unwantedCharacters[i]), data.end())); //remove doesn't delete.
data.end() - 1 = '\0'; //Reorganizes
cout << data << endl;
}
speechSet.insert(string(data));
}
//Go through each string (word) one at a time and remove "",?, etc.
/*for(itr = speechSet.begin(); itr != speechSet.end(); ++itr){
if(*itr == ".")//if value pointed to by *itr is equal to '.'
itr = speechSet.erase(itr);//erase the value in the set and leave blank
cout << " " << *itr;//print out the blank
else{
cout << " " << *itr;
}
}*/
speechFile.close();
return (0);
}
I keep getting an error that says error: no viable overloaded '='. At first I thought it might be due to .end() not being a command for a C++ string, but I checked the documentation and it shouldn't be an issue of mismatched data types. Then I thought I might have to set the iterator itr equal to the end of the data:
iterator itr = data.end() - 1;
and then dereference that pointer and set it equal to the null terminator
itr* = '\0';
That removed the overload error, but I still had another error: use of class template 'iterator' requires template arguments. Let me know if any more clarification is needed.
In the for loop, use auto for iterator so you don't have to specify its type like:
for(auto itr = speechSet.begin(); itr != speechSet.end(); ++itr){

Accepting and printing a string

Can we accept and print a string like this in c++?
This code is not working properly.
#include<iostream>
#include<string>
using namespace std;
main()
{
string a;char ch;
for(int i=0;i<5;i++)
{cin>>ch;
a[i]=ch;
}
a[5]='\0';
cout<<a;
}
I am able to print individual elements like a[1], a[2], etc., but I am unable to print the entire string. Why?
If you want to take a string, you could do the following.
#include <iostream>
#include <string> // std::string, std::getline
int main() {
std::string str;
std::getline(std::cin, str); // reads a whole line, spaces included
std::cout << str;
}
Also, C++ automatically null terminates any string literal you use.
Well, it's not really anywhere near best practice, but to fix your immediate issue you need to actually resize the string.
#include<iostream>
#include<string>
int main()
{
std::string a;char ch;
a.resize(5); // <--- sets the size to 5, so a[0]..a[4] are valid
for(int i=0;i<5;i++)
{
std::cin>>ch;
a[i]=ch;
}
a[5]='\0'; //<-- unnecessary: std::string keeps its own terminator
std::cout<<a;
}
alternatively you can append the characters
#include<iostream>
#include<string>
int main()
{
std::string a;char ch;
for(int i=0;i<5;i++)
{
std::cin>>ch;
a+=ch;
}
std::cout<<a;
}
The real problem here is not that you can't read or print the string; it is that you are writing to unallocated memory. operator[], which is what you are using when you write a[i]=ch, does not do any bounds checking, so you are causing undefined behavior. On my machine, for instance, nothing is printed.
In short, you need to make sure that you have space to write your characters. If you are certain that you are going to read 5 characters, you could size the string up front (std::string maintains its own terminator, so there is no need to add a sixth element for \0):
std::string a(5, '\0');
If you are uncertain how many characters you are going to read, std::string will allocate space as needed, but you need to use its push_back member function to give it a chance to do so. Your loop contents would be something like:
cin >> ch;
a.push_back(ch);
If you are uncertain where the std::string object is coming from (as in, this is library code that accepts a std::string as an argument), you could use at(i) (e.g., a.at(i) = ch instead of a[i] = ch), which throws an exception if the index is out of range.
You can print the string like this
#include<iostream>
#include<string>
using namespace std;
int main()
{
string a;char ch;
for(int i=0;i<5;i++)
{
cin>>ch;
a.push_back(ch);
}
// no need to append '\0': std::string tracks its own length
cout << a;
return 0;
}

using MultiByteToWideChar

The following code prints the desired output but it prints garbage at the end of the string. There is something wrong with the last call to MultiByteToWideChar but I can't figure out what. Please help??
#include "stdafx.h"
#include<Windows.h>
#include <iostream>
using namespace std;
#include<tchar.h>
int main( int, char *[] )
{
TCHAR szPath[MAX_PATH];
if(!GetModuleFileName(NULL,szPath,MAX_PATH))
{cout<<"Unable to get module path"; exit(0);}
char ansiStr[MAX_PATH];
if(!WideCharToMultiByte(CP_ACP,WC_COMPOSITECHECK,szPath,-1,
ansiStr,MAX_PATH,NULL,NULL))
{cout<<"Unicode to ANSI failed\n";
cout<<GetLastError();exit(1);}
string s(ansiStr);
size_t pos = 0;
while(1)
{
pos = s.find('\\',pos);
if(pos == string::npos)
break;
s.insert(pos,1,'\\');
pos+=2;
}
if(!MultiByteToWideChar(CP_ACP,MB_PRECOMPOSED,s.c_str(),s.size(),szPath,MAX_PATH))
{cout<<"ANSI to Unicode failed"; exit(2);}
wprintf(L"%s",szPath);
}
MSDN has this to say about the cbMultiByte parameter:
If this parameter is -1, the function processes the entire input
string, including the terminating null character. Therefore, the
resulting Unicode string has a terminating null character, and the
length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes
exactly the specified number of bytes. If the provided size does not
include a terminating null character, the resulting Unicode string is
not null-terminated, and the returned length does not include this
character.
So if you want the output string to be null-terminated, either include the terminator in the length you pass in (for example s.size() + 1, or -1 to process the whole string) or terminate the output yourself based on the return value.

Converting an extended ASCII character string to hex format

I am trying to convert an input ASCII string to hex format using the format specifier "%02x". I have pasted the program that does this below. However, when this program runs on a Linux server, it gives improper hex output.
#include <iostream>
#include <cstdio>  // sprintf
#include <cstring> // strlen
#include <string>
using namespace std;
int main ()
{
const char *buffer = "<9e>¥/gÿbbbbABCD"; // string literals are const
char *newBuffer = new char[strlen(buffer)*2 + 1];
for(int i = 0; i< strlen(buffer); ++i)
{
sprintf(newBuffer+(2*i), "%02x", buffer[i]);
//cout<<"\t"<<newBuffer;
}
cout<<"\t"<<newBuffer;
return 0;
}
Output: 3c39653effff2f67ffff6262626241424344
For the extended ASCII characters ¥ and ÿ it gives ffff and ffff.
Please tell me how to convert them properly.
If we cast each character of the source string to unsigned char, the problem is solved:
sprintf(newBuffer+(2*i), "%02x", (unsigned char)buffer[i]);

GetUserDefaultLocaleName() API is crashing

I have an application that reads the user default locale on Windows Vista and above. When I call the API to get the user default locale, it crashes. Below is the code. It would be helpful if anyone could point out the reason.
#include <iostream>
#include <WinNls.h>
#include <Windows.h>
int main()
{
LPWSTR lpLocaleName=NULL;
cout << "Calling GetUserDefaultLocaleName";
int ret = GetUserDefaultLocaleName(lpLocaleName, LOCALE_NAME_MAX_LENGTH);
cout << lpLocaleName<<endl;
}
You need to have lpLocaleName point to a buffer prior to calling the API. As a general rule, if an API has an LPWSTR output parameter, allocate a buffer for it first (with malloc or new) of the required length, in this case LOCALE_NAME_MAX_LENGTH. Setting it to NULL and passing it to the API function is a guaranteed way to crash!
Hope this helps,
Best regards,
Tom.
In addition to the previous answers, you should also be aware that you can't print a wide string with cout; instead, you should use wcout.
So:
#include <iostream>
#include <Windows.h> // include first; the other Windows headers depend on it
#include <WinNls.h>
#define ARRSIZE(arr) (sizeof(arr)/sizeof(*(arr)))
using namespace std;
int main()
{
WCHAR localeName[LOCALE_NAME_MAX_LENGTH]={0}; // buffer the API writes into
cout<<"Calling GetUserDefaultLocaleName";
int ret = GetUserDefaultLocaleName(localeName,ARRSIZE(localeName));
if(ret==0) // 0 means the call failed
cout<<"Cannot retrieve the default locale name."<<endl;
else
wcout<<localeName<<endl; // wide string, so use wcout
return 0;
}
I believe you need to initialise lpLocaleName to an empty string of 256 chars (for example), then pass the length (256) where you have LOCALE_NAME_MAX_LENGTH.

Resources