I am currently learning C++/CLI, and I want to convert a character to its decimal ASCII code and vice versa (for example, 'A' = 65).
In Java, this can be achieved with a simple type cast:
char ascii = 'A';
char retrieveASCII = ' ';
int decimalValue;
decimalValue = (int)ascii;
retrieveASCII = (char)decimalValue;
Apparently this approach does not work in C++/CLI. Here is my code:
String^ words = "ABCDEFG";
String^ getChars;
String^ retrieveASCII;
int decimalValue;
getChars = words->Substring(0, 1);
decimalValue = Int32::Parse(getChars);
retrieveASCII = decimalValue.ToString();
I am getting this error:
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Input string was not in a correct format.
Any idea how to solve this problem?
Characters in a TextBox::Text property are stored in a System::String, so they are Unicode characters. By design, the Unicode character set includes all of the ASCII characters. So, if the string contains only those characters, you can convert it to the ASCII encoding without losing any of them. Otherwise, you need a strategy for omitting or substituting characters, or for throwing an exception.
The ASCII character set has one encoding in current use, which represents each character in a single byte.
using namespace System::Text; // for Encoding
const auto asciiBytes = Encoding::ASCII->GetBytes(words->Substring(0,1));
const auto decimalValue = asciiBytes[0]; // the length is 1 as explained above
const auto retrieveASCII = Encoding::ASCII->GetString(asciiBytes);
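For readers outside C++/CLI, here is a minimal standard C++ sketch of the same idea. It encodes UTF-16 text as one ASCII byte per code unit and uses the "substituting" strategy mentioned above, replacing anything outside U+0000 to U+007F with '?'. The helper names are hypothetical, and std::u16string only stands in for System::String.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: encode UTF-16 text as ASCII, one byte per code unit.
// Anything outside the 7-bit ASCII range (including each half of a
// surrogate pair) is replaced by '?'.
std::vector<unsigned char> toAsciiBytes(const std::u16string& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size());
    for (char16_t unit : text) {
        bytes.push_back(unit <= 0x7F ? static_cast<unsigned char>(unit) : '?');
    }
    return bytes;
}

// Hypothetical helper: decode ASCII bytes back to UTF-16 text.
std::u16string fromAsciiBytes(const std::vector<unsigned char>& bytes)
{
    std::u16string text;
    text.reserve(bytes.size());
    for (unsigned char b : bytes) {
        text.push_back(b <= 0x7F ? static_cast<char16_t>(b) : u'?');
    }
    return text;
}
```

For the question's input, toAsciiBytes(u"A")[0] is 65, and fromAsciiBytes turns that byte back into u"A".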
Decimal is, of course, just one way of writing a number as text. I don't see where you are using decimal except in your explanation. If you did want to use it in code, it could look like this:
const auto explanation = String::Format(
"The encoding (in decimal) for the first character in ASCII is {0}",
decimalValue);
Note the use of auto: I omitted the variable types because the compiler can infer them, which keeps the code focused on concepts rather than boilerplate. I also used const because I don't believe the values of "variables" should vary. Neither of these is required.
BTW, all of this applies to Java, too. If your Java code works, it is only by coincidence. Had it been written properly, it would have been easy to translate to .NET. Java's String and Charset classes have functionality very similar to .NET's String and Encoding classes (Encoding is the proper term, though). Both use the Unicode character set and the UTF-16 encoding for strings.
More like Java than you think
String^ words = "ABCDEFG";
Char first = words[0];                   // 'A'
String^ retrieveASCII;
int decimalValue = (int)first;           // 65
retrieveASCII = decimalValue.ToString(); // "65"
Char back = (Char)decimalValue;          // 'A' again
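The same cast-based round trip works in standard C++, where char is an integer type. A minimal sketch, with hypothetical helper names, assuming an ASCII-compatible execution character set (true on common platforms, but not guaranteed by the standard):

```cpp
// Hypothetical helpers mirroring the Java and C++/CLI casts above.
// Assumes an ASCII-compatible execution character set.
int codeOf(char c)
{
    // Go through unsigned char so bytes above 0x7F don't come out negative.
    return static_cast<unsigned char>(c);
}

char charOf(int code)
{
    return static_cast<char>(code);
}
```

So codeOf('A') gives 65 and charOf(65) gives 'A'.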
How can I add an angle symbol to a string to put in a TMemo?
I can add a degree symbol easily enough using its octal value from the extended ASCII table:
String deg = "\272"; // 272 is octal value in ascii code table for degree symbol
Form1->Memo1->Lines->Add("My angle = 90" + deg);
But if I try to use the escape sequence for the angle symbol (\u2220), I get compiler error W8114, Character represented by universal-character-name \u2220 cannot be represented in the current ansi locale:
UnicodeString deg = "\u2220";
Form1->Memo1->Lines->Add("My angle = 90" + deg);
Just for clarity, the symbol I'm after is ∠ (U+2220). I can just use # if I have to; I'm just wondering whether this is possible without gnashing of teeth. My target for this test was Win32, but I'll want it to work on iOS and Android too.
P.S. This table is handy for looking up the codes.
After following Rob's answer I've got it working, but on iOS the angle symbol sits lower than the rest of the text, and on Win32 it is tiny. It looks good on Android. I'll report it to Embarcadero as a bug, albeit a minor one.
Here is the code I used, based on Rob's comments:
UnicodeString szDeg;
UnicodeString szAng;
szAng.SetLength(1);
szDeg.SetLength(1);
*(szAng.c_str()) = 0x2220;
*(szDeg.c_str()) = 0x00BA;
Form1->Memo1->Lines->Add("1: " + FormatFloat("##,###0.0",myPhasors.M1)+ szAng + FormatFloat("###0.0",myPhasors.A1) + szDeg);
Here is how it looks when I explicitly set the TMemo font to Courier New:
Here is the final code I'm using after Remy's replies:
UnicodeString szAng = _D("\u2220");
UnicodeString szDeg = _D("\u00BA");
Form1->Memo1->Lines->Add("1: " + FormatFloat("##,###0.0",myPhasors.M1)+ szAng + FormatFloat("###0.0",myPhasors.A1) + szDeg);
The compiler error is because you are using a narrow ANSI string literal, and \u2220 does not fit in a char. Use a Unicode string literal instead:
UnicodeString deg = _D("\u2220");
The RTL's _D() macro prefixes the literal with either the L or u prefix depending on whether UnicodeString uses wchar_t (Windows only) or char16_t (other platforms) for its character data.
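In standard C++11 and later, the u prefix (what _D() expands to on non-Windows platforms) produces char16_t data, so U+2220 fits in exactly one UTF-16 code unit. A minimal sketch that uses std::u16string as a stand-in for UnicodeString (an assumption; UnicodeString itself is Embarcadero-specific) and a hypothetical phasorLabel helper modeled on the question's output:

```cpp
#include <string>

// Builds text like the question's "<magnitude>∠<angle>º" label.
// std::u16string stands in for UnicodeString: both store UTF-16 code units.
std::u16string phasorLabel(const std::u16string& magnitude, const std::u16string& angle)
{
    const std::u16string angleSign = u"\u2220"; // U+2220 ANGLE
    const std::u16string ordinal   = u"\u00BA"; // U+00BA, used as "degree" in the question
    return magnitude + angleSign + angle + ordinal;
}
```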
The error indicates that the character code falls outside the range the literal can represent, which you ought to be able to avoid. Try setting the character code directly:
UnicodeString szDeg;
UnicodeString szMessage;
szDeg.SetLength(1);
*(szDeg.c_str())=0x2220; // U+2220 ANGLE
szMessage=UnicodeString(L"My angle = 90 ")+szDeg;
Form1->Memo1->Lines->Add(szMessage);
On Windows with Visual Studio 2015
// Ü
// UTF-8 (hex) 0xC3 0x9C
// UTF-16 (hex) 0x00DC
// UTF-32 (hex) 0x000000DC
#include <cassert>
#include <string>
using namespace std::string_literals;
const auto narrow_multibyte_string_s = "\u00dc"s;
const auto wide_string_s = L"\u00dc"s;
const auto utf8_encoded_string_s = u8"\u00dc"s;
const auto utf16_encoded_string_s = u"\u00dc"s;
const auto utf32_encoded_string_s = U"\u00dc"s;
assert(utf8_encoded_string_s == "\xC3\x9C");
assert(narrow_multibyte_string_s == "Ü");
assert(utf8_encoded_string_s == u8"Ü");
// here is the question
assert(utf8_encoded_string_s != narrow_multibyte_string_s);
"\u00dc"s is not the same as u8"\u00dc"s, and "Ü"s is not the same as u8"Ü"s.
Apparently the default encoding for ordinary string literals is not UTF-8 (probably UTF-16), and I cannot just compare two std::string objects without knowing their encoding, even if they mean the same thing.
What is the recommended practice for performing such string comparisons when developing Unicode-enabled C++ applications?
For example, an API like this:
class MyDatabase
{
public:
bool isAvailable(const std::string& key)
{
// *compare* key in database
if (key == "Ü")
return true;
else
return false;
}
};
Other programs may call isAvailable with a std::string in UTF-8 or in the default (UTF-16?) encoding. How can I guarantee a correct comparison?
Can I detect an encoding mismatch at compile time?
Note: I prefer C++11/14 facilities, and std::string over std::wstring.
"\u00dc" is a char[] encoded in whatever the compiler/OS's default 8-bit encoding happens to be, so it can be different on different machines. On Windows, that tends to be the OS's default Ansi encoding, or it could be the encoding that the source file is saved as.
L"\u00dc" is a wchar_t[] encoded with either UTF-16 or UTF-32, depending on the compiler's definition of wchar_t (which is 16-bit on Windows, so UTF-16).
u8"\u00dc" is a char[] encoded in UTF-8.
u"\u00dc" is a char16_t[] encoded in UTF-16.
U"\u00dc" is a char32_t[] encoded in UTF-32.
The ""s suffix simply returns a std::string, std::wstring, std::u16string, or std::u32string, depending on whether a char[], wchar_t[], char16_t[], or char32_t[] is passed to it.
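Because the u8, u and U prefixes fix the encoding at compile time, you can check the resulting code units with static_assert. Here is a minimal sketch for U+00DC (Ü) that should compile under C++11 and later, including C++20, where u8 literals become char8_t. The narrow "\u00dc" literal is deliberately left out because its bytes depend on the compiler's execution character set.

```cpp
// Each check runs at compile time; a wrong assumption is a build error.

// u8: UTF-8, two code units plus the terminating NUL.
static_assert(sizeof(u8"\u00dc") == 3, "UTF-8 uses two bytes for U+00DC");
static_assert(static_cast<unsigned char>(u8"\u00dc"[0]) == 0xC3, "first UTF-8 byte");
static_assert(static_cast<unsigned char>(u8"\u00dc"[1]) == 0x9C, "second UTF-8 byte");

// u: UTF-16, one code unit plus NUL.
static_assert(sizeof(u"\u00dc") / sizeof(char16_t) == 2, "one UTF-16 code unit");
static_assert(u"\u00dc"[0] == 0x00DC, "UTF-16 value");

// U: UTF-32, one code unit plus NUL.
static_assert(sizeof(U"\u00dc") / sizeof(char32_t) == 2, "one UTF-32 code unit");
static_assert(U"\u00dc"[0] == 0x000000DC, "UTF-32 value");
```

This only verifies literals you write yourself. It cannot tell which encoding a runtime std::string arrived in.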
When comparing two strings, make sure they are in the same encoding first. This is especially important for your char[]/std::string data, as it could be in any number of 8-bit encodings, depending on the systems involved. This is not so much a problem if the app is generating the strings itself, but it is important if one or more of the strings is coming from an external source (file, user input, network protocol, etc).
In your example, "\u00dc" and "Ü" are not guaranteed to produce the same char[] sequence, depending on how the compiler interprets those different literals. But even if they did (which seems to be the case in your example), neither of them will likely produce UTF-8 (you have to go to extra measures to force that), which is why your comparison to utf8_encoded_string_s fails.
So, if you are expecting a string literal to be UTF-8, use u8"" to ensure that. If you are getting string data from an external source and need it to be in UTF-8, convert it to UTF-8 in code as soon as possible, if it is not already (which means you have to know the encoding used by the external source).
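Here is a minimal sketch of "convert to UTF-8 as soon as possible." It assumes the external data arrives as UTF-16 in a std::u16string; other sources would need their own decoder. The converter is hand-written because std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17. The single stored key and the UTF-8 contract on isAvailable are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert UTF-16 to UTF-8. Unpaired surrogates become U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate -> supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

class MyDatabase
{
public:
    // Assumed contract: callers must pass UTF-8.
    bool isAvailable(const std::string& utf8Key) const
    {
        // The stored key is spelled as explicit UTF-8 bytes, so the comparison
        // doesn't depend on the narrow literal's encoding (or on C++20 char8_t).
        return utf8Key == "\xC3\x9C"; // U+00DC (Ü)
    }
};
```

A caller holding UTF-16 would then write MyDatabase().isAvailable(utf16ToUtf8(text)).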
I am not able to use the Java 7 "underscores in numeric literals" feature to read input from the user and print it back in the same format as declared. Please help me do that. Or is this feature incomplete?
Scanner input = new Scanner( System.in );
int x = 1_00_000;
System.out.print( "Enter numeric literals with underscores: " ); //2_00_000
x = input.nextInt(); //java.util.InputMismatchException
System.out.println(x); // Prints in normal format, but want to be in 2_00_000.
NOTE: In Eclipse, I am able to change the value of a numeric literal to an underscored numeric literal at runtime. This may be a hack, but being able to input underscored numeric literals at runtime is a needed feature, right?
http://www.eclipse.org/jdt/ui/r3_8/Java7news/whats-new-java-7.html#miscellaneous
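Underscores in numeric literals are source-code syntax only. The compiler discards them, an int has no memory of how it was written, and Scanner.nextInt() does not accept them. If you want to keep the underscores, read the input as a String: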
Scanner input = new Scanner( System.in );
System.out.print( "Enter numeric literals with underscores: " ); //2_00_000
String stringLiterals = input.nextLine();
System.out.println(stringLiterals); // Prints 2_00_000.
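If you also need the numeric value, one option is to keep the user's text for display and strip the underscores only when parsing. Here is a minimal sketch of that idea in C++ (the helper name is hypothetical):

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: parse "2_00_000"-style input by removing the
// underscores first. std::stoi still throws std::invalid_argument if
// nothing numeric remains.
int parseWithUnderscores(std::string text)
{
    text.erase(std::remove(text.begin(), text.end(), '_'), text.end());
    return std::stoi(text);
}
```

Unlike the Java compiler's rules for literals, this sketch accepts underscores anywhere, for example "_1__0".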
I have been playing around with some String -> byte -> binary code, and I want my code to work for any byte[] array. Currently it only works for ASCII, I think.
Chinese doesn't work:
String message =" 汉语";
playingWithFire(message.getBytes());
while String wow = "WOW..."; works. :( I want it to work for all UTF-8 text. Any pointers on how I can do it?
public static byte[] playingWithFire(byte[] bytes){
byte[] newbytes = null;
newbytes = new byte[bytes.length];
for(int i = 0; i < bytes.length; i++){
String tempStringByte = String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');
StringBuffer newByteBrf = null;
newByteBrf = new StringBuffer();
for(int x = 0; x < tempStringByte.length(); x++){
newByteBrf.append(tempStringByte.charAt(x));
}
/*short a = Short.parseShort(newByteBrf.toString(), 2);
ByteBuffer bytesads = ByteBuffer.allocate(2).putShort(a);
newbytes[i] = bytesads.get();
cause: java.nio.BufferUnderflowException
*/
//cause: java.lang.NumberFormatException: Value out of range.
newbytes[i] = Byte.parseByte(newByteBrf.toString(), 2);
}
return newbytes;
}
In your case, message.getBytes() converts the Chinese characters to bytes using your computer's default character set. If that is a Western charset, the result will be wrong.
Notice that String.getBytes() has another form, String.getBytes(String), where the argument is the name of the character encoding used to convert the string's chars to bytes.
The char type will hold Unicode. The byte type only holds raw bits in groups of 8.
So, to convert a Unicode string to bytes encoded as UTF-16 you would use this code:
String message =" 汉语";
byte[] utf16Bytes = message.getBytes("utf-16");
Substitute the name of any encoding that you want to use.
Similarly, the new String(byte[], String) constructor takes an array of bytes encoded in some fashion and, given the name of that encoding, converts those bytes back to Unicode characters.
For example, if you want to convert the bytes encoded as UTF-16 above back to a String (which holds Unicode chars):
String newMessage = new String(utf16Bytes, "utf-16");
Since I don't know what you mean by "binary code" above, I can't go much further. As I see it, the Unicode chars have a binary code inside them that represents the characters one by one. The byte array also has a binary code in it, which may use several bytes to represent one character. If you want to encrypt the byte array somehow, use a standard, proven encryption method and proven, time-tested procedures to secure the contents.
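Separately, the NumberFormatException in the question comes from Byte.parseByte, which only accepts -128 to 127, while every UTF-8 byte of Chinese text has its top bit set (0x80 or above when read as unsigned). In Java, (byte) Integer.parseInt(s, 2) avoids that. Here is a minimal C++ sketch of the same byte -> 8-bit binary string -> byte round trip that treats bytes as unsigned (the helper names are hypothetical):

```cpp
#include <bitset>
#include <string>
#include <vector>

// Hypothetical helper: each byte becomes an 8-character "0"/"1" string.
std::vector<std::string> toBinary(const std::string& bytes)
{
    std::vector<std::string> out;
    for (unsigned char b : bytes) {
        out.push_back(std::bitset<8>(b).to_string());
    }
    return out;
}

// Hypothetical helper: the inverse conversion.
std::string fromBinary(const std::vector<std::string>& bits)
{
    std::string bytes;
    for (const std::string& s : bits) {
        // to_ulong() yields 0..255, so there is no signed-range problem.
        bytes += static_cast<char>(std::bitset<8>(s).to_ulong());
    }
    return bytes;
}
```

With the UTF-8 bytes of 汉语 (E6 B1 89 E8 AF AD), the first entry is "11100110", and fromBinary restores the original bytes.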
I want to add a bunch of Emoji icons to an array. From my earlier question, I found out how to write the Emoji icons in an NSString.
Now I want to make a loop and add these icons to an array. This should be fairly easy, as the code points are in contiguous ranges, so something like the following should do it:
for (int i = 0; i < 10; i++)
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%i", i]];
Problem is, when doing so I get an error saying:
Incomplete universal character name.
Does anyone know of a way to do this?
That's because the escape sequence \Uxxxxxxxx is evaluated by the compiler, which replaces it with the corresponding Unicode code point. Then, at runtime, stringWithFormat: replaces the format specifier %i with the decimal representation of i. The final string is the concatenation of the character corresponding to \Uxxxxxxxx and the characters representing i. stringWithFormat: replaces format specifiers with characters; it doesn't alter existing characters.
But here the compiler sees an incomplete escape sequence, because you only wrote 7 hexadecimal digits (\U requires exactly 8). So it can't generate the string and raises an error.
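Because escapes are resolved at compile time, a character computed at runtime has to be built from an integer instead. To illustrate the principle in C++ (not Objective-C): a char32_t holds any code point directly, so no surrogate handling is needed. The helper name is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: `count` consecutive code points starting at `first`,
// each as its own one-character UTF-32 string, built at runtime from integers.
std::vector<std::u32string> codePointRun(char32_t first, int count)
{
    std::vector<std::u32string> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(std::u32string(1, static_cast<char32_t>(first + i)));
    }
    return out;
}
```

codePointRun(0x1F430, 10) yields U+1F430 through U+1F439, which is what the question's loop was trying to produce.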
The solution is to generate the character (a simple integer value) at runtime and create a string from it using +[NSString stringWithCharacters:length:].
But if you look in the headers, you'll see that NSString stores its characters as unichar, which is defined as an unsigned short, i.e. a 16-bit value, whereas the Unicode code point U+1F430 (🐰) requires at least 17 bits.
So you cannot use a single unichar character to represent that code point. But don't worry: you can use two characters to represent it.
Lost? Here's the explanation. Unicode doesn't define a storage unit for characters; it defines code points, which are integer values in the range U+0000–U+10FFFF. The implementation then decides how to represent those code points using characters (code units). It may use any data type it wants, as long as it can represent all valid code points. The simplest solution would be 32-bit integers, but that would take too much memory, since most of the code points you use are in the first Unicode plane (U+0000–U+FFFF). So NSString stores text in the UTF-16 encoding, which uses 16-bit characters.
In UTF-16, every code point beyond U+FFFF is stored using a pair of characters (known as a surrogate pair) in the range 0xD800 – 0xDFFF (the corresponding code points are explicitly reserved in the Unicode standard).
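To make the surrogate-pair rule concrete, here is the standard UTF-16 arithmetic in a small C++ sketch (the helper name is hypothetical). For U+1F430: 0x1F430 − 0x10000 = 0xF430; 0xF430 >> 10 = 0x3D, so the high surrogate is 0xD83D; 0xF430 & 0x3FF = 0x30, so the low surrogate is 0xDC30.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates. The caller must check the range.
std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    const std::uint32_t offset = codePoint - 0x10000;                     // 20-bit value
    const auto high = static_cast<char16_t>(0xD800 + (offset >> 10));    // top 10 bits
    const auto low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
    return {high, low};
}
```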
In conclusion, any valid Unicode code point may be represented using one or two unichar characters. The method to do so is described there. And here is a simple implementation:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
// NOTE: As I edited the answer, you'll find a simpler implementation of
// this function below
unichar characters[2];
NSUInteger length;
if ( codePoint <= 0xD7FF || (codePoint >= 0xE000 && codePoint <= 0xFFFF) ) {
characters[0] = codePoint;
length = 1;
}
else if ( codePoint >= 0x10000 && codePoint <= 0x10ffff ) {
codePoint -= 0x10000;
characters[0] = 0xD800 + (codePoint >> 10);
characters[1] = 0xDC00 + (codePoint & 0x3ff);
length = 2;
}
else {
length = 0; // invalid code point
}
return [NSString stringWithCharacters:characters length:length];
}
Now that we can generate a string from any valid code point, we just need to update the code to use the function we wrote before:
for (int i = 0; i < 10; i++)
[someArray addObject:stringWithCodePoint(0x0001F430 + i)];
EDIT: I just figured out a simpler method to get an NSString from a code point. It works by using -[NSString initWithBytes:length:encoding:] with a UTF-32 encoding:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
// Pass the bytes in an explicit byte order so the result doesn't depend
// on how a BOM-less NSUTF32StringEncoding buffer is interpreted.
uint32_t littleEndian = NSSwapHostIntToLittle(codePoint);
NSString *string = [[NSString alloc] initWithBytes:&littleEndian length:4 encoding:NSUTF32LittleEndianStringEncoding];
// You may remove the next 3 lines if you use ARC
#if ! __has_feature(objc_arc)
[string autorelease];
#endif
return string;
}
Note this similar question. As one of its answers explains, backslash escapes in a string literal are evaluated at compile time. If you want to make a Unicode character using a \Uxxxx escape, the xxxx must all be literal hex digits in the string literal.
What you can do instead, as per another answer is use the format specifier %C -- not together with the \Uxxxx escape, but on its own -- and pass in the full character code as an integer. (Actually, a wchar_t, which is a 32-bit integer on Mac OS X now, which you'll need since the character code you're looking for is more than 16 bits long.) To put this together with a base, you can just add the integers:
wchar_t base = 0x0001F430; // unfamiliar? we start with 0x for hexadecimal integers
for (int i = 0; i < 10; i++)
[someArray addObject:[NSString stringWithFormat:@"%C", base + i]];
There's also stringWithCharacters:length:, but that explicitly takes (16-bit) unichar values, so you'd need a surrogate pair to encode your emoji in UTF-16.
Use %C instead of %i, like so:
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%C", i]];