Are byte slices of utf8 also utf8?

Given a slice of bytes that is valid utf8, is it true that any sub-slice of such slice is also valid utf8?
In other words, given b1: [u8] that is valid utf8, can I assume that
b2 = b1[i..j] is valid utf8 for any i,j : i<j?
If not, what would be the counter-example?

what would be the counter-example?
Any code point that encodes as more than 1 byte. For example, π (U+03C0) is encoded in UTF-8 as the two bytes cf 80, and slicing it in the middle produces two separate byte strings, each of which is invalid UTF-8.
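The counter-example can be checked with Go's unicode/utf8 package. This is only a small sketch; the helper name splitValidity is made up for the illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitValidity (hypothetical helper) reports whether s is valid UTF-8 as a
// whole, and whether the two byte slices obtained by cutting it at byte
// offset i are each valid UTF-8.
func splitValidity(s string, i int) (whole, left, right bool) {
	b := []byte(s)
	return utf8.Valid(b), utf8.Valid(b[:i]), utf8.Valid(b[i:])
}

func main() {
	// "π" is U+03C0, encoded as the two bytes 0xCF 0x80.
	whole, left, right := splitValidity("π", 1)
	fmt.Println(whole, left, right) // true false false
}
```

The left half is a lead byte with no continuation byte, and the right half is a continuation byte with no lead byte, so neither is valid on its own.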

Related

Calculating checksum or XOR operations

I'm using HyperTerminal and trying to send strings to a 6-digit scoreboard. I was sent a sample string from the manufacturer to test with and it worked, but to be able to change the displayed message I was told to calculate a new checksum value.
The sample string is: &AHELLO N-12345\71
Characters A and N are addresses for the scoreboards (allowing two displays to be used through one RS232 connection). HELLO and -12345 are the characters to be shown on the display. The "71" is where I am getting stuck.
How can you obtain 71 from "AHELLO N-12345"?
In the literature supplied with the scoreboard, the "71" from the sample string is described as a character by character logical XOR operation on characters "AHELLO N-12345". The manufacturer however called it a checksum. I'm not trained in this type of language and I did try to research but I can't put it together on my own.
The text below is copied from the supplied literature and describes the "71" (ckck) in question...
- ckck = 2 ASCII control characters: corresponds to the two hexadecimal digits obtained by
performing the character by character logical XOR operation on characters
"AxxxxxxByyyyyy". If there is an error in these characters, the string is ignored
Example: if the byte by byte logical XOR operation carried out on the ASCII codes of the
characters of the "AxxxxxxByyyyyy" string returns the hexadecimal value 0x2A,
the control characters ckck are "2" and "A".
You don't specify a language, but here's the algorithm in C#. Basically, XOR the values of all the characters in the string together and you end up with 113, which is 71 in hex. Hence the 71 at the end of the input string.
string input = "AHELLO N-12345";
UInt16 chk = 0;
// XOR the character codes together, one character at a time
foreach (char ch in input) {
    chk ^= ch;
}
MessageBox.Show("value is " + chk);
Outputs "value is 113"
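For comparison, here is the same calculation sketched in Go, this time producing the two hex digits directly. The function name xorChecksum is made up, and the uppercase output is an assumption based on the "2" and "A" example in the literature quoted above.

```go
package main

import "fmt"

// xorChecksum (hypothetical helper) XORs the bytes of s together and
// returns the result as two hexadecimal digits. Uppercase digits are an
// assumption based on the "2A" example in the quoted literature.
func xorChecksum(s string) string {
	var chk byte
	for i := 0; i < len(s); i++ {
		chk ^= s[i]
	}
	return fmt.Sprintf("%02X", chk)
}

func main() {
	payload := "AHELLO N-12345"
	fmt.Println(xorChecksum(payload)) // 71
	// Frame layout copied from the sample string in the question.
	fmt.Println("&" + payload + "\\" + xorChecksum(payload)) // &AHELLO N-12345\71
}
```

To display a different message, change payload and send the newly computed frame.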

String formatting has precision in Go?

I came across this line in the book 'The Go Programming Language' on page 112.
fmt.Printf("#%-5d %9.9s %.55s\n", item.Number, item.User.Login, item.Title)
What do %9.9s and %.55s mean?
From go doc fmt:
Width is specified by an optional decimal number immediately preceding the verb. If absent, the width is whatever is necessary to represent the value. ....
For strings, byte slices and byte arrays, however, precision limits the length of the input to be formatted (not the size of the output), truncating if necessary.
Thus, %9.9s means a minimum width of 9 runes with the input truncated at 9, so the output is exactly 9 runes long. Similarly, %.55s means no minimum width but the input truncated at 55, so the output is at most 55 runes.
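A short sketch that shows both effects. The helper names and the sample strings are made up for the illustration, and a precision of 5 stands in for the 55 used in the book so the output stays short.

```go
package main

import "fmt"

// padTrunc9 (hypothetical helper) formats s with width 9 and precision 9.
func padTrunc9(s string) string { return fmt.Sprintf("%9.9s", s) }

// trunc5 (hypothetical helper) formats s with precision 5 and no minimum width.
func trunc5(s string) string { return fmt.Sprintf("%.5s", s) }

func main() {
	fmt.Printf("[%s]\n", padTrunc9("abc"))          // [      abc]
	fmt.Printf("[%s]\n", padTrunc9("abcdefghijkl")) // [abcdefghi]
	fmt.Printf("[%s]\n", trunc5("abc"))             // [abc]
	fmt.Printf("[%s]\n", trunc5("abcdefgh"))        // [abcde]
}
```

Short input is padded on the left up to the width, and long input is cut off at the precision.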

How can I convert ASCII code to characters in Verilog language

I've been looking into this but searching seems to lead to nothing.
It might be too simple to be described, but here I am, scratching my head...
Any help would be appreciated.
Verilog knows about "strings".
A single ASCII character requires 8 bits. Thus to store 8 characters you need 64 bits:
wire [63:0] string8;
assign string8 = "12345678";
There are some gotchas:
There is no End-Of-String character (like the C null-character)
The most RHS character is in bits 7:0.
Thus string8[7:0] will hold 8'h38 ("8").
To walk through a string you have to use e.g.: string[ index +: 8];
As with all Verilog vector assignments: unused bits are set to zero thus
assign string8 = "ABCD"; // MS bits 63:32 are zero
You can not use two dimensional arrays:
wire [7:0] string5 [0:4]; assign string5 = "Wrong";
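If it helps to see the bit layout outside of a simulator, the packing described above can be modelled in software. This is only a Go model of the behaviour, not Verilog, and the names pack and byteAt are made up.

```go
package main

import "fmt"

// pack (hypothetical helper) mimics assigning a string literal to a 64-bit
// vector: the rightmost character ends up in bits 7:0 and unused upper bits
// stay zero. Characters beyond the eighth from the right are shifted out.
func pack(s string) uint64 {
	var v uint64
	for i := 0; i < len(s); i++ {
		v = v<<8 | uint64(s[i])
	}
	return v
}

// byteAt (hypothetical helper) mimics the indexed part-select v[index +: 8].
func byteAt(v uint64, index uint) byte {
	return byte(v >> index)
}

func main() {
	s8 := pack("12345678")
	fmt.Printf("%#x\n", byteAt(s8, 0)) // 0x38, the character '8'

	abcd := pack("ABCD")
	fmt.Printf("%#x\n", abcd>>32)        // 0x0, the upper 32 bits are zero
	fmt.Printf("%c\n", byteAt(abcd, 24)) // A
}
```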
You are probably misled by a misconception about characters. There is no such thing as a character in hardware. There are only sets of bits, or codes. The only thing that converts binary codes to characters is your terminal. It interprets the codes in a certain way and forms letters for you to see. So all the printfs in C and $display calls in Verilog only send the codes to the terminal (or to a file).
The thing that converts characters to codes is your keyboard, which you also use to type in the program. The compiler then interprets your program. The Verilog compiler (like the C compiler) represents a double-quoted string directly as a sequence of bytes. Both Verilog and C use 8-bit ASCII codes for such strings, meaning that the code for 'a' is decimal 97, 'b' is 98, and so on. Every character is 8 bits wide, and a quoted string is a concatenation of the bytes of the ASCII codes.
So, answering your question, you can convert ASCII codes to characters by sending them to the terminal via $display (or a similar function), using the %s modifier.
So, an example:
module A;
  reg [8*5-1:0] hello;
  reg [8*3-1:0] bye;
  initial begin
    hello = "hello"; // 5 bytes of characters
    bye = {8'd98, 8'd121, 8'd101}; // 3 bytes: 'b' 'y' 'e'
    $display("hello=%s bye=%s", hello, bye);
  end
endmodule

How do I convert hex to binary (and vice versa) in Ruby, WHILE maintaining leading zeroes?

I have a data structure that I'd like to convert back and forth from hex to binary in Ruby. The simplest approach for a binary to hex is '0010'.to_i(2).to_s(16) - unfortunately this does not preserve leading zeroes (due to the to_i call), as one may need with data structures like cryptographic keys (which also vary with the number of leading zeroes).
Is there an easy built-in way to do this?
I think you should have a firm idea of how many bits are in your cryptographic key. That should be stored in some constant or variable in your program, not inside individual strings representing the key:
KEY_BITS = 16
The most natural way to represent a key is as an integer, so if you receive a key in a hex format you can convert it like this (leading zeros in the string do not matter):
key = 'a0a0'.to_i(16)
If you receive a key in a (ASCII) binary format, you can convert it like this (leading zeros in the string do not matter):
key = '101011'.to_i(2)
If you need to output a key in hex with the right number of leading zeros:
key.to_s(16).rjust((KEY_BITS+3)/4, '0')
If you need to output a key in binary with the right number of leading zeros:
key.to_s(2).rjust(KEY_BITS, '0')
If you really do want to figure out how many bits might be in a key based on a (ASCII) binary or hex string, you can do:
key_bits = binary_str.length
key_bits = hex_str.length * 4
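The same idea (convert to an integer, then pad the output to a known width) can be sketched in Go for comparison. The function names are made up, math/big is used so that keys longer than 64 bits also work, and the width is derived from the input length as in the last two lines above.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// convert (hypothetical helper) parses digits in base `from` and renders
// them in base `to`, left-padded with zeros to at least `width` digits.
func convert(digits string, from, to, width int) (string, error) {
	n, ok := new(big.Int).SetString(digits, from)
	if !ok {
		return "", fmt.Errorf("invalid base-%d input %q", from, digits)
	}
	out := n.Text(to)
	if len(out) < width {
		out = strings.Repeat("0", width-len(out)) + out
	}
	return out, nil
}

// hexToBin keeps four binary digits per hex digit.
func hexToBin(h string) (string, error) { return convert(h, 16, 2, len(h)*4) }

// binToHex rounds the bit count up to a whole number of hex digits.
func binToHex(b string) (string, error) { return convert(b, 2, 16, (len(b)+3)/4) }

func main() {
	b, _ := hexToBin("0a0f")
	fmt.Println(b) // 0000101000001111
	h, _ := binToHex("0010")
	fmt.Println(h) // 2
	h, _ = binToHex("0000101000001111")
	fmt.Println(h) // 0a0f
}
```

In a real program the width would more likely come from a constant such as KEY_BITS than from the length of the input string.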
The truth is, leading zeros are not part of the integer value. I mean, it's a little detail related to representation of this value, not the value itself. So if you want to preserve properties of representation, it may be best not to get to underlying values at all.
Luckily, hex<->binary conversion has one neat property: each hexadecimal digit corresponds to exactly 4 binary digits. So, assuming you only get binary numbers whose number of digits is divisible by 4, you can just construct two dictionaries for converting back and forth:
# Hexadecimal part is easy
hex = [*'0'..'9', *'A'..'F']
# Binary... not much longer, but a bit trickier
bin = (0..15).map { |i| '%04b' % i }
Note the use of String#% operator, that formats the given value interpreting the string as printf-style format string.
Okay, so these are lists of "digits", 16 each. Now for the dictionaries:
hex2bin = hex.zip(bin).to_h
bin2hex = bin.zip(hex).to_h
Converting hex to bin with these is straightforward:
"DEADBEEF".each_char.map { |d| hex2bin[d] }.join
Converting back is not that trivial. I assume we have a "good number" that can be split into groups of 4 binary digits each. I haven't found a cleaner way than using String#scan with a "match every 4 characters" regex:
"10111110".scan(/.{4}/).map { |d| bin2hex[d] }.join
The procedure is mostly similar.
Bonus task: implement the same conversion without my assumption of having only "good binary numbers", e.g. for "110101".
"I-should-have-read-the-docs" remark: there is Hash#invert that returns a hash with all key-value pairs inverted.
This is the most straightforward solution I found that preserves leading zeros. To convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1

Random byte being added when using string()

I am attempting to XOR two values. If I do, I get the right result; however, using string() on it results in a random byte being added to it!
Can anyone explain this?
Here's a playground: http://play.golang.org/p/tIOOjqo_Fe
So, you have:
z := 175 // 0xaf
That is the unicode code point for the character: ¯
The following line of code will then take the value and treat it as a unicode code point (rune) and turn it into a utf-8 encoded string:
out := string(z)
In utf-8 encoding, that character is represented by two bytes: []byte{0xc2, 0xaf}
So, the bytes you see are the Go string's utf-8 encoding.
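A small sketch of the difference, assuming the goal is to keep the XOR result as a single raw byte. The helper names are made up, and z is the value from the answer above.

```go
package main

import "fmt"

// asCodePoint (hypothetical helper) treats z as a Unicode code point and
// returns its UTF-8 encoding, which is what string(z) does.
func asCodePoint(z int) string { return string(rune(z)) }

// asRawByte (hypothetical helper) keeps z as a single byte.
// It assumes 0 <= z <= 255.
func asRawByte(z int) string { return string([]byte{byte(z)}) }

func main() {
	z := 175 // 0xaf
	fmt.Printf("% x\n", asCodePoint(z)) // c2 af
	fmt.Printf("% x\n", asRawByte(z))   // af
}
```

Converting through a []byte keeps the value as one byte, so no extra 0xc2 appears. The resulting string is then not valid utf-8 on its own, which is fine if it is only ever treated as raw bytes.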
