Unicode with format - macos

I want to add a bunch of Emoji icons to an array. From my earlier question I found out how to write the Emoji icons in an NSString.
Now I want to make a loop and add these icons to an array. This should be fairly easy as the unicodes are in certain ranges so something like the following should do it:
for (int i = 0; i < 10; i++)
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%i", i]];
Problem is, when doing so I get an error saying:
Incomplete universal character name.
Does anyone know of a way to do this?

That's because the escape sequence \Uxxxxxxxx is evaluated by the compiler, which replaces it with the corresponding Unicode code point. Only afterwards, at runtime, does stringWithFormat: replace the format specifier %i with the decimal representation of i. The final string is the concatenation of the character corresponding to \Uxxxxxxxx and the characters representing i: stringWithFormat: substitutes format specifiers with other characters; it doesn't alter characters that are already in the format string.
But here the compiler sees an incomplete escape sequence, because you only wrote 7 hexadecimal digits where \U expects 8. So it's not able to generate the string literal and raises an error.
The solution is to generate the characters (simple integer values) at runtime and create a string from them using +[NSString stringWithCharacters:length:].
But if you look in the headers, you'll see that NSString stores its characters as unichar, which is defined as an unsigned short, i.e. a 16-bit value, whereas the Unicode code point U+1F430 (🐰) requires at least 17 bits.
So you cannot use a single unichar character to represent that code point. But don't worry: you can use two characters to represent it.
You're lost? Here's the explanation! Unicode doesn't define characters, it defines code points, which are arbitrary integer values in the range U+0000 – U+10FFFF. It is then up to the implementation to decide how to represent those code points using characters. The implementation may use any data type it wants for its characters, as long as it manages to represent all valid code points. The simplest solution would be to use 32-bit integers, but that would require too much memory, as most of the code points you use are in the first Unicode plane (U+0000 – U+FFFF). So NSString stores code points in the UTF-16 encoding, which uses 16-bit characters.
In UTF-16, every code point beyond U+FFFF is stored using a pair of characters (known as a surrogate pair) in the range 0xD800 – 0xDFFF (the corresponding code points are explicitly reserved in the Unicode standard).
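To make the encoding concrete, here is the arithmetic for U+1F430, written as a small Java sketch purely for illustration (the math is identical in Objective-C):
int codePoint = 0x1F430;                          // 🐰, outside the BMP
int offset = codePoint - 0x10000;                 // 0xF430
char high = (char) (0xD800 + (offset >> 10));     // 0xD83D, the leading (high) surrogate
char low  = (char) (0xDC00 + (offset & 0x3FF));   // 0xDC30, the trailing (low) surrogate
String bunny = new String(new char[] { high, low });  // same result as String.valueOf(Character.toChars(codePoint))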
In conclusion, any valid Unicode code point may be represented using one or two unichar characters. The method to do so is described there. And here is a simple implementation:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
    // NOTE: As I edited the answer, you'll find a simpler implementation of
    // this function below
    unichar characters[2];
    NSUInteger length;
    if ( codePoint <= 0xD7FF || (codePoint >= 0xE000 && codePoint <= 0xFFFF) ) {
        // The code point fits in a single UTF-16 code unit
        characters[0] = codePoint;
        length = 1;
    }
    else if ( codePoint >= 0x10000 && codePoint <= 0x10FFFF ) {
        // Encode the code point as a surrogate pair
        codePoint -= 0x10000;
        characters[0] = 0xD800 + (codePoint >> 10);   // high (leading) surrogate
        characters[1] = 0xDC00 + (codePoint & 0x3FF); // low (trailing) surrogate
        length = 2;
    }
    else {
        length = 0; // invalid code point (surrogate range or out of range)
    }
    return [NSString stringWithCharacters:characters length:length];
}
Now that we can generate a string from any valid code point, we just need to update the code to use the function we wrote before:
for (int i = 0; i < 10; i++)
[someArray addObject:stringWithCodePoint(0x0001F430 + i)];
EDIT: I just figured out a simpler method to get an NSString from a code point. It works by using -[NSString initWithBytes:length:encoding:] and the NSUTF32StringEncoding encoding:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
    NSString *string = [[NSString alloc] initWithBytes:&codePoint length:4 encoding:NSUTF32StringEncoding];
    // You may remove the next 3 lines if you use ARC
#if ! __has_feature(objc_arc)
    [string autorelease];
#endif
    return string;
}

Note this similar question. As one of its answers explains, backslash escapes in a string literal are evaluated at compile time. If you want to make a Unicode character using a \Uxxxxxxxx escape, all eight hexadecimal digits have to appear literally in the string; you can't supply some of them through a format specifier.
What you can do instead, as per another answer, is use the format specifier %C (not combined with the \U escape, but on its own) and pass the full character code in as an integer. (Actually a wchar_t, which is a 32-bit integer on Mac OS X, and which you'll need since the character code you're looking for is more than 16 bits long.) To combine this with a base value, you can just add the integers:
wchar_t base = 0x0001F430; // unfamiliar? we start with 0x for hexadecimal integers
for (int i = 0; i < 10; i++)
[someArray addObject:[NSString stringWithFormat:#"%C", base + i]];
There's also stringWithCharacters:length:, but that explicitly takes 16-bit unichar values, so you'd need to encode your emoji as a UTF-16 surrogate pair to use it.

Use %C instead of %i
so:
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%C", i]];

Related

Add angle symbol to string

How can I add an angle symbol to a string to put in a TMemo?
I can add a degree symbol easily enough based on its octal value from the extended ASCII table:
String deg = "\272"; // 272 is octal value in ascii code table for degree symbol
Form1->Memo1->Lines->Add("My angle = 90" + deg);
But, if I try to use the escape sequence for the angle symbol (\u2220), I get a compiler error, W8114 Character represented by universal-character-name \u2220 cannot be represented in the current ansi locale:
UnicodeString deg = "\u2220";
Form1->Memo1->Lines->Add("My angle = 90" + deg);
Just for clarity, below is the symbol I'm after. I can just use the # if I have to, just wondering if this is possible without gnashing of teeth. My target for this test was Win32, but I'll want it to work on iOS and Android too.
p.s. This table is handy to see the codes.
After following Rob's answer I've got it working, but on iOS the angle is offset down below the horizontal of the other text. On Win32 it is tiny. It looks good on Android. I'll report it to Embarcadero as a bug, albeit a minor one.
Here is the code I used, based on Rob's comments:
UnicodeString szDeg;
UnicodeString szAng;
szAng.SetLength(1);
szDeg.SetLength(1);
*(szAng.c_str()) = 0x2220;
*(szDeg.c_str()) = 0x00BA;
Form1->Memo1->Lines->Add("1: " + FormatFloat("##,###0.0",myPhasors.M1)+ szAng + FormatFloat("###0.0",myPhasors.A1) + szDeg);
Here is how it looks when the TMemo font is explicitly set to Courier New:
Here is the final code I'm using after Remy's replies:
UnicodeString szAng = _D("\u2220");
UnicodeString szDeg = _D("\u00BA");
Form1->Memo1->Lines->Add("1: " + FormatFloat("##,###0.0",myPhasors.M1)+ szAng + FormatFloat("###0.0",myPhasors.A1) + szDeg);
The compiler error is because you are using a narrow ANSI string literal, and \u2220 does not fit in a char. Use a Unicode string literal instead:
UnicodeString deg = _D("\u2220");
The RTL's _D() macro prefixes the literal with either the L or u prefix depending on whether UnicodeString uses wchar_t (Windows only) or char16_t (other platforms) for its character data.
The error indicates some kind of code range failure, which you ought to be able to avoid. Try setting the character code directly:
UnicodeString szDeg;
UnicodeString szMessage;
szDeg.SetLength(1);
*(szDeg.c_str()) = 0x2220; // U+2220 ANGLE
szMessage=UnicodeString(L"My angle = 90 ")+szDeg;
Form1->Memo1->Lines->Add(szMessage);

What is meant by returning half of a supplementary character?

You use the .charAt method to get a particular character of a String as a value of char type, i.e. a single code unit.
For example:
String greeting = "Hello";
char character = greeting.charAt(0);
The char variable character is set to 'H'.
But if you use a single index number with a supplementary character, only the first or second half of it is returned.
So, for example, if you invoke the same method on a string starting with $\mathcal{P}(\mathbb{Z})$ (the set-of-integers symbol), only the first half is returned.
That is what the book says. But what is meant by "first half"? The first half of the two code units?
The language is Java.
Let me clarify:
(The LaTeX code for the set-of-integers symbol doesn't seem to render here.)
String greeting = "\uD835\uDD6B is important"; // starts with the set-of-integers symbol
char ch = greeting.charAt(0);
Index zero refers to the math symbol for the set of integers, but because it is a supplementary character (encoded as "\uD835\uDD6B"), only the first half of this character is returned and stored in ch.
But what is the first half?
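To make "first half" concrete, here is a minimal Java sketch: charAt returns a single UTF-16 code unit, so for a supplementary character index 0 gives the high (leading) surrogate and index 1 the low (trailing) surrogate, while codePointAt recombines them:
String s = "\uD835\uDD6B is important";   // starts with the supplementary character U+1D56B
char firstHalf  = s.charAt(0);            // '\uD835' -- the high surrogate, the "first half"
char secondHalf = s.charAt(1);            // '\uDD6B' -- the low surrogate, the "second half"
int codePoint   = s.codePointAt(0);       // 0x1D56B  -- the whole supplementary character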

Converting Characters to ASCII Code & Vice Versa In C++/CLI

I am currently learning C++/CLI and I want to convert a character to its decimal ASCII code and vice versa (for example, 'A' = 65).
In Java, this can be achieved with a simple type cast:
char ascci = 'A';
char retrieveASCII =' ';
int decimalValue;
decimalValue = (int)ascci;
retrieveASCII = (char)decimalValue;
Apparently this method does not work in C++/CLI; here is my code:
String^ words = "ABCDEFG";
String^ getChars;
String^ retrieveASCII;
int decimalValue;
getChars = words->Substring(0, 1);
decimalValue = Int32:: Parse(getChars);
retrieveASCII = decimalValue.ToString();
I am getting this error:
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Input string was not in a correct format.
Any Idea on how to solve this problem?
Characters in a TextBox::Text property are in a System::String type. Therefore, they are Unicode characters. By design, the Unicode character set includes all of the ASCII characters. So, if the string only has those characters, you can convert to an ASCII encoding without losing any of them. Otherwise, you'd have to have a strategy of omitting or substituting characters or throwing an exception.
The ASCII character set has one encoding in current use. It represents all of its characters in one byte each.
// using ::System::Text;
const auto asciiBytes = Encoding::ASCII->GetBytes(words->Substring(0,1));
const auto decimalValue = asciiBytes[0]; // the length is 1 as explained above
const auto retrieveASCII = Encoding::ASCII->GetString(asciiBytes);
Decimal is, of course, a representation of a number. I don't see where you are using decimal except in your explanation. If you did want to use it in code, it could be like this:
const auto explanation = "The encoding (in decimal) "
+ "for the first character in ASCII is "
+ decimalValue;
Note the use of auto. I have omitted the types of the variables because the compiler can figure them out. It allows the code to be more focused on concepts rather than boilerplate. Also, I used const because I don't believe the value of "variables" should be varied. Neither of these is required.
BTW, all of this applies to Java, too. If your Java code works, it is just out of coincidence. If it had been written properly, it would have been easy to translate to .NET. Java's String and Charset classes have very similar functionality to .NET's String and Encoding classes. (Encoding is the proper term, though.) They both use the Unicode character set and UTF-16 encoding for strings.
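Since the same applies to Java, a rough Java equivalent of the Encoding::ASCII snippet above might look like this (a sketch, assuming the string contains only ASCII characters):
import java.nio.charset.StandardCharsets;

String words = "ABCDEFG";
byte[] asciiBytes = words.substring(0, 1).getBytes(StandardCharsets.US_ASCII);
int decimalValue = asciiBytes[0];                                          // 65 for 'A'
String retrieveASCII = new String(asciiBytes, StandardCharsets.US_ASCII);  // "A"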
More like Java than you think
String^ words = "ABCDEFG";
Char first = words[0];
String^ retrieveASCII;
int decimalValue = (int)first;
retrieveASCII = decimalValue.ToString();

java.lang.NumberFormatException or java.nio.BufferUnderflowException when transforming bytes

I played around with some String -> byte -> binary code and I want my code to work for any byte[] array; currently it only works for (I am not sure) ASCII.
Chinese doesn't work:
String message =" 汉语";
playingWithFire(message.getBytes());
while String wow = "WOW..."; works :( I want it to work for all UTF-8 formats. Any pointers on how I can do it?
//thanks
public static byte[] playingWithFire(byte[] bytes) {
    byte[] newbytes = new byte[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
        String tempStringByte = String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');
        StringBuffer newByteBrf = new StringBuffer();
        for (int x = 0; x < tempStringByte.length(); x++) {
            newByteBrf.append(tempStringByte.charAt(x));
        }
        /*short a = Short.parseShort(newByteBrf.toString(), 2);
        ByteBuffer bytesads = ByteBuffer.allocate(2).putShort(a);
        newbytes[i] = bytesads.get();
        cause: java.nio.BufferUnderflowException
        */
        // cause: java.lang.NumberFormatException: Value out of range.
        newbytes[i] = Byte.parseByte(newByteBrf.toString(), 2);
    }
    return newbytes;
}
message.getBytes() in your case is trying to convert the Chinese Unicode characters to bytes using the default character set on your computer. If it's a western charset, it's going to be wrong.
Notice that String.getBytes() has another form, String.getBytes(String), where the argument is the name of a character encoding that is used to convert the chars of the string to bytes.
The char type will hold Unicode. The byte type only holds raw bits in groups of 8.
So, to convert a Unicode string to bytes encoded as UTF-16 you would use this code:
String message =" 汉语";
byte[] utf16Bytes = message.getBytes("utf-16");
Substitute the name of any encoding that you want to use.
Similarly, the String(byte[], String) constructor can take an array of bytes encoded in some fashion and, given the encoding name, convert those bytes to Unicode characters.
For example: If you want to convert those bytes, which were encoded as utf-16 above, back to a String (which has Unicode chars in it):
String newMessage = new String(utf16Bytes, "utf-16");
Since I don't know what you mean by "binary code" above, I can't go much farther. As I see it, the Unicode chars have a binary code inside them that represents the characters one-by-one. Also the byte array has a binary code in it that represents the characters with a many-bytes-to-one-character representation. If you want to encrypt the byte array somehow, use a standard, proven encryption method and proven, time-tested procedures to secure the contents.
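As for the exceptions in the title: Byte.parseByte rejects binary strings above 127 because byte is signed, so if the goal is simply to round-trip each byte through its binary-string form, one way (not from the original answer, just a sketch) is to parse with Integer.parseInt and narrow back to a byte:
public static byte[] playingWithFire(byte[] bytes) {
    byte[] newBytes = new byte[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
        // 8-character binary string for this byte, e.g. "11100110"
        String bits = String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');
        // parse as an int in the range 0..255, then narrow back to a signed byte
        newBytes[i] = (byte) Integer.parseInt(bits, 2);
    }
    return newBytes;
}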

Extended ASCII codes incomplete in Dart? No characters from 128 to 160

I created a small piece of code to print the extended ASCII characters in Dart, but it seems the ones between 128 and 160 are blank.
PrintExtendedASCII() {
    var listCodes = new List();
    for (var i = 128; i < 256; i++) {
        listCodes.add(i);
    }
    var list = new String.fromCharCodes(listCodes);
    print(list);
}
It only prints :  ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Is there something different about the extended ASCII characters in Dart?
There is no "extended ASCII" in Dart. The character codes you are using in the code example are not ASCII - they are Unicode code points. For code points 0-127, the character codes match ASCII exactly. The block you are missing, from 128 to 160 (0x80 to 0x9F), is all non-printable control characters.
Here is a table of Unicode code points for the 0x000-0xFFF block. If you look carefully, the order of characters exactly matches the string printed on your machine.
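That claim is easy to verify; for example, in Java (used here only for illustration), Character.isISOControl reports every code point in that block as a control character:
for (int cp = 0x80; cp <= 0x9F; cp++) {
    System.out.printf("U+%04X isISOControl=%b%n", cp, Character.isISOControl(cp));
}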
