Extended ASCII codes incomplete in Dart? No characters from 128 to 160

I created a small piece of code to print the extended ASCII characters in Dart, but it seems the ones between 128 and 160 are blank.
PrintExtendedASCII() {
  var listCodes = new List();
  for (var i = 128; i < 256; i++) {
    listCodes.add(i);
  }
  var list = new String.fromCharCodes(listCodes);
  print(list);
}
It only prints :  ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Is there something different about the extended ASCII characters in Dart?

There is no "extended ASCII" in Dart. The character codes you are using in the code example are not ASCII - they are Unicode code points. For code points 0-127, the character codes match ASCII exactly. The block you are missing, 128 to 159 (0x80 to 0x9F), consists entirely of non-printable control characters (the C1 controls), and 160 (0xA0) is a no-break space, which also prints nothing visible.
Here is a table of Unicode code points for the 0x000-0xFFF block. If you look carefully, the order of characters exactly matches the string printed on your machine.
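For what it's worth, you can confirm this outside Dart; here is a minimal check sketched in Python (rather than Dart) because it is a property of Unicode itself, not of the language:
import unicodedata

# Code points 0x80-0x9F are category 'Cc' (control); 0xA0 is 'Zs' (no-break space).
for i in range(128, 161):
    print(hex(i), unicodedata.category(chr(i)))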

Related

Calculating checksum or XOR operations

I'm using HyperTerminal and trying to send strings to a 6-digit scoreboard. I was sent a sample string from the manufacturer to test with and it worked, but to be able to change the displayed message I was told to calculate a new checksum value.
The sample string is: &AHELLO N-12345\71
Characters A and N are addresses for the scoreboards (allowing two displays to be used through one RS232 connection). HELLO and -12345 are the characters to be shown on the display. The "71" is where I am getting stuck.
How can you obtain 71 from "AHELLO N-12345"?
In the literature supplied with the scoreboard, the "71" from the sample string is described as a character-by-character logical XOR operation on the characters "AHELLO N-12345". The manufacturer, however, called it a checksum. I'm not trained in this kind of terminology, and although I did try to research it, I can't put it together on my own.
The text below is copied from the supplied literature and describes the "71" (ckck) in question...
- ckck = 2 ASCII control characters: corresponds to the two hexadecimal digits obtained by
performing the character by character logical XOR operation on characters
"AxxxxxxByyyyyy". If there is an error in these characters, the string is ignored
Example: if the byte by byte logical XOR operation carried out on the ASCII codes of the
characters of the "AxxxxxxByyyyyy" string returns the hexadecimal value 0x2A,
the control characters ckck are "2" and "A".
You don't specify a language, but here's the algorithm in C#. Basically, XOR the character values of the string together and you'll end up with 113, which is 0x71 in hex. Hence "71" is appended to the end of the input string.
string input = "AHELLO N-12345";
UInt16 chk = 0;
foreach (char ch in input) {
    chk ^= ch;
}
MessageBox.Show("value is " + chk);
Outputs "value is 113"
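For reference, here is the same checksum sketched in Python (the question doesn't require any particular language; it is just a character-by-character XOR formatted as two hexadecimal digits):
def xor_checksum(payload):
    chk = 0
    for ch in payload:
        chk ^= ord(ch)          # XOR in the ASCII code of each character
    return format(chk, "02X")   # render as two hex digits

print(xor_checksum("AHELLO N-12345"))  # prints 71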

How can I convert ASCII codes to characters in Verilog?

I've been looking into this but searching seems to lead to nothing.
It might be too simple to be described, but here I am, scratching my head...
Any help would be appreciated.
Verilog knows about "strings".
A single ASCII character requires 8 bits. Thus to store 8 characters you need 64 bits:
wire [63:0] string8;
assign string8 = "12345678";
There are some gotchas:
There is no End-Of-String character (like the C null-character)
The right-most (RHS) character is in bits 7:0.
Thus string8[7:0] will hold 8'h38 ("8").
To walk through a string you have to use an indexed part-select, e.g. string8[index +: 8];
As with all Verilog vector assignments, unused bits are set to zero, thus
assign string8 = "ABCD"; // MS bits 63:32 are zero
You cannot use two-dimensional arrays:
wire [7:0] string5 [0:4]; assign string5 = "Wrong";
You are probably misled by a misconception about characters. There is no such thing as a character in hardware; there are only sets of bits, or codes. The only thing that converts binary codes to characters is your terminal: it interprets codes in a certain way, forming letters for you to see. So all the printfs in C and $display calls in Verilog only send the codes to the terminal (or to a file).
The thing that converts characters to codes is your keyboard, which you also use to type in the program. The compiler then interprets your program. The Verilog compiler (like the C compiler) represents the double-quoted strings you typed in directly as a set of bytes. Verilog, like C, uses 8-bit ASCII encoding for such character strings, meaning that the code for 'a' is decimal 97, 'b' is 98, and so on. Every character is 8 bits wide, and the quoted string forms a concatenation of the bytes of the ASCII codes.
So, answering your question: you can convert ASCII codes to characters by sending them to the terminal via $display (or a similar system task), using the %s format specifier.
So, an example:
module A;
  reg [8*5-1:0] hello;
  reg [8*3-1:0] bye;
  initial begin
    hello = "hello";                 // 5 bytes of characters
    bye = {8'd98, 8'd121, 8'd101};   // 3 bytes: 'b' 'y' 'e'
    $display("hello=%s bye=%s", hello, bye);
  end
endmodule
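To make the byte packing concrete, here is a rough illustration sketched in Python (not Verilog): the quoted string is just a concatenation of ASCII bytes, with the right-most character ending up in bits 7:0.
packed = int.from_bytes(b"hello", "big")  # the 40-bit value stored in reg hello
print(hex(packed))                        # 0x68656c6c6f
print(chr(packed & 0xFF))                 # 'o' -- the right-most character, bits 7:0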

New line character in serialized messages

Some protobuf messages, when serialized to a string, contain the newline character \n. Usually, when the first field of the message is a string, the newline character appears at the start of the serialized message, but we also found messages with a newline character somewhere in the middle.
The problem with the newline character arises when you want to save the messages into one file line by line: the newline character breaks the line and makes the message invalid.
example.proto
syntax = "proto3";
package data_sources;
message StringFirst {
  string key = 1;
  bool valid = 2;
}
message StringSecond {
  bool valid = 1;
  string key = 2;
}
example.py
from protocol_buffers.data_sources.example_pb2 import StringFirst, StringSecond
print(StringFirst(key='some key').SerializeToString())
print(StringSecond(key='some key').SerializeToString())
output
b'\n\x08some key'
b'\x12\x08some key'
Is this expected / desired behaviour? How can one prevent the new line character?
protobuf is a binary protocol (unless you're talking about the optional json thing). So: any time you're treating it as text-like in any way, you're using it wrong and the behaviour will be undefined. This includes worrying about whether there are CR/LF characters, but it also includes things like the nul-character (0x00), which is often interpreted as end-of-string in text-based APIs in many frameworks (in particular, C-strings).
Specifically:
LF (0x0A) is identical to the field header for "field 1, length-prefixed"
CR (0x0D) is identical to the field header for "field 1, fixed 32-bit"
any of 0x00, 0x0A or 0x0D could occur as a length prefix (to signify a length of 0, 10, or 13)
any of 0x00, 0x0A or 0x0D could occur naturally in binary data (bytes)
any of 0x00, 0x0A or 0x0D could occur naturally in any numeric type
0x0A or 0x0D could occur naturally in text data (as could 0x00 if your originating framework allows nul-characters arbitrarily in strings, so... not C-strings)
and probably a range of other things
So: again - if the inclusion of "special" text characters is problematic: you're using it wrong.
The most common way to handle binary data as text is to use a base-N encode; base-16 (hex) is convenient to display and read, but base-64 is more efficient in terms of the number of characters required to convey the same number of bytes. So if possible: convert to/from base-64 as required. Base-64 never includes any of the non-printable characters, so you will never encounter CR/LF/nul.
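As a sketch of that approach in Python, using the generated classes from example.proto above (the import path is the one from the question):
import base64
from protocol_buffers.data_sources.example_pb2 import StringFirst

# Encode: one base-64 token per message, safe to write one per line.
line = base64.b64encode(StringFirst(key='some key').SerializeToString()).decode('ascii')
print(line)

# Decode: read the line back and parse the original bytes.
msg = StringFirst()
msg.ParseFromString(base64.b64decode(line))
print(msg.key)  # some key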

Converting Characters to ASCII Code & Vice Versa In C++/CLI

I am currently learning C++/CLI and I want to convert a character to its decimal ASCII code and vice versa (for example, 'A' = 65).
In Java, this can be achieved by simple type casting:
char ascii = 'A';
char retrieveASCII = ' ';
int decimalValue;
decimalValue = (int) ascii;
retrieveASCII = (char) decimalValue;
Apparently this method does not work in C++/CLI; here is my code:
String^ words = "ABCDEFG";
String^ getChars;
String^ retrieveASCII;
int decimalValue;
getChars = words->Substring(0, 1);
decimalValue = Int32::Parse(getChars);
retrieveASCII = decimalValue.ToString();
I am getting this error:
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Input string was not in a correct format.
Any Idea on how to solve this problem?
Characters in a TextBox::Text property are in a System::String type. Therefore, they are Unicode characters. By design, the Unicode character set includes all of the ASCII characters. So, if the string only has those characters, you can convert to an ASCII encoding without losing any of them. Otherwise, you'd have to have a strategy of omitting or substituting characters or throwing an exception.
The ASCII character set has one encoding in current use. It represents all of its characters in one byte each.
// using ::System::Text;
const auto asciiBytes = Encoding::ASCII->GetBytes(words->Substring(0,1));
const auto decimalValue = asciiBytes[0]; // the length is 1 as explained above
const auto retrieveASCII = Encoding::ASCII->GetString(asciiBytes);
Decimal is, of course, a representation of a number. I don't see where you are using decimal except in your explanation. If you did want to use it in code, it could be like this:
const auto explanation = "The encoding (in decimal) "
+ "for the first character in ASCII is "
+ decimalValue;
Note the use of auto. I have omitted the types of the variables because the compiler can figure them out. It allows the code to be more focused on concepts rather than boilerplate. Also, I used const because I don't believe the value of "variables" should be varied. Neither of these is required.
BTW, all of this applies to Java, too. If your Java code works, it is just a coincidence. If it had been written properly, it would have been easy to translate to .NET. Java's String and Charset classes have very similar functionality to .NET's String and Encoding classes. (Encoding is the proper term, though.) They both use the Unicode character set and UTF-16 encoding for strings.
More like Java than you think
String^ words = "ABCDEFG";
Char first = words[0];
String^ retrieveASCII;
int decimalValue = (int) first;
retrieveASCII = decimalValue.ToString();
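For comparison, here is the same round trip sketched in Python (not part of either answer): ord and chr work on Unicode code points, and an explicit ASCII encode mirrors the Encoding::ASCII approach above.
code = ord('A')                       # 65
char = chr(code)                      # 'A'
ascii_byte = 'A'.encode('ascii')[0]   # 65; raises UnicodeEncodeError for non-ASCII input
print(code, char, ascii_byte)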

Unicode with format

I want to add a bunch of Emoji icons to an array. From my earlier question I found out how to write the Emoji icons in an NSString.
Now I want to make a loop and add these icons to an array. This should be fairly easy, as the Unicode code points are in certain ranges, so something like the following should do it:
for (int i = 0; i < 10; i++)
    [someArray addObject:[NSString stringWithFormat:@"\U0001F43%i", i]];
Problem is, when doing so I get an error saying:
Incomplete universal character name.
Does anyone know of a way to do this?
That's because the escape sequence \Uxxxxxxxx is evaluated by the compiler, which replaces it with the corresponding Unicode code point. Then, at runtime, the method stringWithFormat: replaces the format specifier %i with the decimal representation of i. The final string is the concatenation of the characters corresponding to \Uxxxxxxxx and the characters representing i: stringWithFormat: replaces format specifiers with other characters; it doesn't alter the existing characters.
But the problem is that here the compiler sees an incomplete escape sequence, as you only wrote 7 hexadecimal digits. So it's not able to generate the string and raises an error.
The solution is to generate the character (a simple integer value) at runtime and create a string from it using +[NSString stringWithCharacters:length:].
But if you look in the headers, you'll see that NSString stores its characters as unichar, which is defined as an unsigned short, i.e. a 16-bit value, whereas the Unicode code point U+1F430 (🐰) requires at least 17 bits.
So you cannot use a single unichar character to represent that code point. But don't worry: you can use two characters to represent it.
Are you lost? Here's the explanation! Unicode doesn't define characters, it defines code points, which are arbitrary integer values in the range U+0000 – U+10FFFF. The implementation then decides how to represent those code points using characters. The implementation may use any data type it wants as characters, as long as it manages to represent all valid code points. The simplest solution would be to use 32-bit integers, but that would require too much memory, as most of the code points you use are in the first Unicode plane (U+0000 – U+FFFF). So NSString stores the code points with the UTF-16 encoding, which uses 16-bit characters.
In UTF-16, every code point beyond U+FFFF is stored using a pair of characters (known as a surrogate pair) in the range 0xD800 – 0xDFFF (the corresponding code points are explicitly reserved in the Unicode standard).
In conclusion, any valid Unicode code point may be represented using one or two unichar characters. The method to do so is described there. And here is a simple implementation:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
    // NOTE: As I edited the answer, you'll find a simpler implementation of
    // this function below
    unichar characters[2];
    NSUInteger length;
    if ( codePoint <= 0xD7FF || (codePoint >= 0xE000 && codePoint <= 0xFFFF) ) {
        characters[0] = codePoint;
        length = 1;
    }
    else if ( codePoint >= 0x10000 && codePoint <= 0x10FFFF ) {
        codePoint -= 0x10000;
        characters[0] = 0xD800 + (codePoint >> 10);
        characters[1] = 0xDC00 + (codePoint & 0x3FF);
        length = 2;
    }
    else {
        length = 0; // invalid code point
    }
    return [NSString stringWithCharacters:characters length:length];
}
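If it helps, here is the same surrogate-pair arithmetic sketched in Python for illustration (Python's chr() would normally do all of this for you; the helper name is made up):
def utf16_units(code_point):
    if code_point <= 0xD7FF or 0xE000 <= code_point <= 0xFFFF:
        return [code_point]                     # fits in a single 16-bit unit
    if 0x10000 <= code_point <= 0x10FFFF:
        code_point -= 0x10000
        return [0xD800 + (code_point >> 10),    # high (lead) surrogate
                0xDC00 + (code_point & 0x3FF)]  # low (trail) surrogate
    return []                                   # invalid code point

print([hex(u) for u in utf16_units(0x1F430)])   # ['0xd83d', '0xdc30']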
Now that we can generate a string from any valid code point, we just need to update the code to use the function we wrote before:
for (int i = 0; i < 10; i++)
    [someArray addObject:stringWithCodePoint(0x0001F430 + i)];
EDIT: I just figured out a simpler method to get an NSString from a code point. It works by using -[NSString initWithBytes:length:encoding:] and the NSUTF32StringEncoding encoding:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
    NSString *string = [[NSString alloc] initWithBytes:&codePoint length:4 encoding:NSUTF32StringEncoding];
    // You may remove the next 3 lines if you use ARC
#if ! __has_feature(objc_arc)
    [string autorelease];
#endif
    return string;
}
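The same UTF-32 trick can be sketched in Python for comparison: decoding the raw 32-bit code point as UTF-32 yields the same string as the surrogate-pair route.
code_point = 0x1F430
s = code_point.to_bytes(4, 'little').decode('utf-32-le')
print(s == chr(code_point))  # True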
Note this similar question. As one of its answers explains, backslash escapes in a string literal are evaluated at compile time. If you want to make a Unicode character using a \Uxxxxxxxx escape, all of the hexadecimal digits need to be written literally in the string; you can't substitute some of them at runtime.
What you can do instead, as per another answer, is use the format specifier %C -- not together with the \U escape, but on its own -- and pass in the full character code as an integer. (Actually, a wchar_t, which is a 32-bit integer on Mac OS X now, which you'll need since the character code you're looking for is more than 16 bits long.) To put this together with a base, you can just add the integers:
wchar_t base = 0x0001F430; // unfamiliar? we start with 0x for hexadecimal integers
for (int i = 0; i < 10; i++)
    [someArray addObject:[NSString stringWithFormat:@"%C", base + i]];
There's also stringWithCharacters: but that explicitly takes a (16-bit) unichar, so you'd need to use a character sequence to encode your emoji in UTF-16.
Use %C instead of %i
so:
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%C", i]];
