I'm trying to understand padding in the SHA family.
padded message abc is 61626380000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000018
a=61, b=62 and c=63, I got that. 18 is 24 in hex, which is the length of abc in bits (3 * 8).
What are the 8 and all the zeros for?
Thanks for the answers
To extend someWhereElse's comment. SHA padding extends the message to the next block boundary. At the end of the message add a 1 bit (not byte) followed by zero bits until 64 bits from the end of the block. Use the last 64 bits of the block to hold the length of the message in binary.
Your padded block is made up of:
616263 : "abc" - the message text.
8 : "1000" - the 1 bit to start the padding, followed by padding zeros.
0...0 : The rest of the padding zero bits.
0000000000000018 : 64 bit message length, 24 bits in this case.
If the message ends within 65 bits of the block boundary, then the padding needs to include a whole extra block.
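The steps above can be sketched in Python for the 512-bit block size used by SHA-1 and SHA-256 (SHA-384 and SHA-512 use 1024-bit blocks with a 128-bit length field, which this sketch does not cover). The name sha256_pad is made up for illustration and is not a library function; the sketch also assumes the message is a whole number of bytes.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte message the way SHA-1/SHA-256 do (512-bit blocks)."""
    bit_length = len(message) * 8
    # The single 1 bit followed by seven 0 bits is the byte 0x80.
    padded = message + b"\x80"
    # Add zero bytes until exactly 8 bytes (64 bits) are left in the block.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # The last 64 bits hold the message length in bits, big-endian.
    padded += bit_length.to_bytes(8, "big")
    return padded


print(sha256_pad(b"abc").hex())
```

For "abc" this prints 61626380, then zeros, then 0000000000000018, which is the padded block from the question. A 56-byte message leaves no room for the length field, so its padding spills into a second block.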
In Oracle SQL:
When we pass a non-ASCII character to the ASCII function, it returns some number. How can we interpret that number? If the character set is AL32UTF8, why does it not return the Unicode code point of the given character?
select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET';--AL32UTF8
select ascii('Á') from dual;--50049
What is the meaning of this value, 50049? I was expecting 193.
Here - mostly for my own education - is an explanation of the value 50049 for the accented A (code point: 193 in the Unicode coded character set). I just learned how this works from reading https://www.fileformat.info/info/unicode/utf8.htm There is another pretty clear explanation, with an example of a three-byte encoding, on Wikipedia: https://en.wikipedia.org/wiki/UTF-8#Encoding
The computation of the encoding is derived from the code point 193 and has nothing to do with which specific character is associated with that code point.
UTF-8 uses a relatively simple scheme to encode code points up to 1,114,111 (U+10FFFF, the highest code point Unicode defines).
Code points from 0 to 127 (decimal) are encoded as a single byte, equal to the code point (so the byte always has a leading zero when represented in binary).
Code points from 128 to 2047 are encoded as two bytes. The first byte, in binary representation, always begins with 110, and the second with 10. A byte that begins with 110 is always the first byte of a two-byte encoding, and a byte that begins with 10 is always a "continuation" byte (second, third or fourth) in a multi-byte encoding. These mandatory prefixes are part of the encoding scheme (the "rules" of UTF8 encoding); they are hard-coded values in the rules.
So: for code points from 128 to 2047, the encoding is in two bytes, of the exact format 110xxxxx and 10xxxxxx in binary notation. The last five digits (bits) from the first byte, plus the last six digits from the second (total: 11 bits) are the binary representation of the code point (the value from 128 to 2047 that must be encoded).
2047 = 2^11 - 1 (this is why 2047 is relevant here). The code point can be represented as an 11-bit binary number (possibly with leading zeros). Take the first five bits (after left-padding with 0 to a length of 11 bits) and attach that to the mandatory 110 prefix of the first byte, and take the last six bits of the code point and attach them to the mandatory prefix 10 of the second byte. That gives the UTF8 encoding (in two bytes) of the given code point.
Let's do that for code point 193 (decimal). In binary, and padding with 0 to the left to 11 bits, that is 00011000001. So far, nothing fancy.
Split this into five bits || six bits: 00011 and 000001.
Attach the mandatory prefixes: 11000011 and 10000001.
Rewrite these in hex: \xC3 and \x81. Put them together; this is hex C381, or decimal 50049.
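The same computation can be written as a short Python sketch. The helper name utf8_two_byte is hypothetical and only handles the two-byte range described above; it is meant to mirror the manual steps, not to replace a real encoder.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range 128..2047 as two UTF-8 bytes."""
    if not 128 <= code_point <= 2047:
        raise ValueError("two-byte UTF-8 only covers code points 128..2047")
    first = 0b11000000 | (code_point >> 6)         # prefix 110 + top five bits
    second = 0b10000000 | (code_point & 0b111111)  # prefix 10 + low six bits
    return bytes([first, second])


encoded = utf8_two_byte(193)
print(encoded.hex(), int.from_bytes(encoded, "big"))  # c381 50049
```

Reading the two bytes as one big-endian number gives 50049, the value Oracle's ASCII function returned.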
See documentation: ASCII
ASCII returns the decimal representation in the database character set of the first character of char.
Binary value of character Á (U+00C1) in UTF-8 is xC381 which is decimal 50049.
193 is the code point. For UTF-8 the code point is equal to the binary representation only for characters U+0000 - U+007F (0-127). For UTF-16BE the code point is equal to the binary representation only for characters U+0000 - U+FFFF (0-65535).
Maybe you are looking for
ASCIISTR('Á')
which returns \00C1, you only have to convert it to a decimal value.
Some time ago I developed this function, which is more advanced than ASCIISTR; it also works with characters made up of several code points.
CREATE OR REPLACE TYPE VARCHAR_TABLE_TYPE AS TABLE OF VARCHAR2(100);
CREATE OR REPLACE FUNCTION UNICODECHAR(uchar VARCHAR2) RETURN VARCHAR_TABLE_TYPE IS
    UTF16 VARCHAR2(32000) := ASCIISTR(uchar);
    UTF16_Table VARCHAR_TABLE_TYPE := VARCHAR_TABLE_TYPE();
    sg1 VARCHAR2(4);
    sg2 VARCHAR2(4);
    codepoint INTEGER;
    res VARCHAR_TABLE_TYPE := VARCHAR_TABLE_TYPE();
    i INTEGER;
BEGIN
    IF uchar IS NULL THEN
        RETURN VARCHAR_TABLE_TYPE();
    END IF;
    -- Split the ASCIISTR output into \XXXX escapes and plain ASCII characters
    SELECT REGEXP_SUBSTR(UTF16, '(\\[[:xdigit:]]{4})|.', 1, LEVEL)
    BULK COLLECT INTO UTF16_Table
    FROM dual
    CONNECT BY REGEXP_SUBSTR(UTF16, '\\[[:xdigit:]]{4}|.', 1, LEVEL) IS NOT NULL;
    i := UTF16_Table.FIRST;
    WHILE i IS NOT NULL LOOP
        res.EXTEND;
        IF REGEXP_LIKE(UTF16_Table(i), '^\\') THEN
            IF REGEXP_LIKE(UTF16_Table(i), '^\\D(8|9|A|B)') THEN
                -- High surrogate: combine it with the low surrogate that follows
                sg1 := REGEXP_SUBSTR(UTF16_Table(i), '[[:xdigit:]]{4}');
                i := UTF16_Table.NEXT(i);
                sg2 := REGEXP_SUBSTR(UTF16_Table(i), '[[:xdigit:]]{4}');
                codepoint := 2**10 * (TO_NUMBER(sg1, 'XXXX') - TO_NUMBER('D800', 'XXXX')) + TO_NUMBER(sg2, 'XXXX') - TO_NUMBER('DC00', 'XXXX') + 2**16;
                res(res.LAST) := 'U+'||TO_CHAR(codepoint, 'fmXXXXXX');
            ELSE
                -- Single \XXXX escape in the Basic Multilingual Plane
                res(res.LAST) := 'U+'||REGEXP_REPLACE(UTF16_Table(i), '^\\');
            END IF;
        ELSE
            -- Plain ASCII character
            res(res.LAST) := 'U+'||LPAD(TO_CHAR(ASCII(UTF16_Table(i)), 'fmXX'), 4, '0');
        END IF;
        i := UTF16_Table.NEXT(i);
    END LOOP;
    RETURN res;
END UNICODECHAR;
Try some examples from https://unicode.org/emoji/charts/full-emoji-list.html#1f3f3_fe0f_200d_1f308
UNICODECHAR('🏴☠️')
should return
U+1F3F4
U+200D
U+2620
U+FE0F
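The surrogate-pair arithmetic used in the function above can be checked in isolation with a small Python sketch. The helper name combine_surrogates is made up for this illustration; the formula is the same one as in the PL/SQL code.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    # Ten bits come from each surrogate; 2**16 is then added back.
    return 2**10 * (high - 0xD800) + (low - 0xDC00) + 2**16


# \D83C\DFF4 is the UTF-16 form of the black flag.
print(f"U+{combine_surrogates(0xD83C, 0xDFF4):X}")  # U+1F3F4
```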
I am trying to solve the following problem using Ruby:
I have a requirement to generate strings of variable bit length which contain only alphanumeric characters.
Here is what I have already found:
Digest::SHA2.new(bitlen = 256).to_s
# => "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
It does exactly what I need, but it accepts only 256, 384, and 512 as bitlen.
Is anybody aware of any alternatives?
Thanks in advance.
Update
One byte = collection of 8 bits.
Every alphanumeric character occupies 1 byte according to String#bytesize.
('a'..'z').chain('A'..'Z').chain('0'..'9').map(&:bytesize).uniq
# => [1]
Based on the facts mentioned above, we can suppose that
SecureRandom.alphanumeric(1) generates an alphanumeric string with 8 bits length.
SecureRandom.alphanumeric(2) generates an alphanumeric string with 16 bits length.
SecureRandom.alphanumeric(3) generates an alphanumeric string with 24 bits length.
And so on...
As a result, @anothermh's answer can be considered an acceptable solution.
Use SecureRandom.
First, make sure you require it:
require 'securerandom'
Then you can generate values:
SecureRandom.alphanumeric(10)
=> "hxYolwzk0P"
Change 10 to whatever length you require.
It's worth pointing out that the example you used was returning not alphanumeric but hexadecimal values. If you specifically require hex then you can use:
SecureRandom.hex(10)
=> "470eb1d8daebacd20920"
I'm using HyperTerminal and trying to send strings to a 6-digit scoreboard. I was sent a sample string from the manufacturer to test with and it worked, but to be able to change the displayed message I was told to calculate a new checksum value.
The sample string is: &AHELLO N-12345\71
Characters A and N are addresses for the scoreboards (allowing two displays to be used through one RS232 connection). HELLO and -12345 are the characters to be shown on the display. The "71" is where I am getting stuck.
How can you obtain 71 from "AHELLO N-12345"?
In the literature supplied with the scoreboard, the "71" from the sample string is described as a character by character logical XOR operation on characters "AHELLO N-12345". The manufacturer however called it a checksum. I'm not trained in this type of language and I did try to research but I can't put it together on my own.
The text below is copied from the supplied literature and describes the "71" (ckck) in question...
- ckck = 2 ASCII control characters: corresponds to the two hexadecimal digits obtained by
performing the character by character logical XOR operation on characters
"AxxxxxxByyyyyy". If there is an error in these characters, the string is ignored
Example: if the byte by byte logical XOR operation carried out on the ASCII codes of the
characters of the "AxxxxxxByyyyyy" string returns the hexadecimal value 0x2A,
the control characters ckck are "2" and "A".
You don't specify a language, but here's the algorithm in C#. Basically, XOR the values of all the characters in the string together and you'll end up with a value of 113, which is 71 in hex. Hence the 71 at the end of the input string.
string input = "AHELLO N-12345";
UInt16 chk = 0;
foreach (char ch in input) {
    chk ^= ch;
}
MessageBox.Show("value is " + chk);
Outputs "value is 113"
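Since the scoreboard expects the result as two hexadecimal digits rather than as the decimal 113, here is a minimal Python sketch of the same XOR that formats the value that way. The function name xor_checksum is an invention for this example, and it assumes the payload contains only ASCII characters.

```python
from functools import reduce
from operator import xor


def xor_checksum(payload: str) -> str:
    """XOR the ASCII codes of all characters; return two uppercase hex digits."""
    value = reduce(xor, payload.encode("ascii"), 0)
    return f"{value:02X}"


print(xor_checksum("AHELLO N-12345"))  # 71
```

To display a different message, run the new "AxxxxxxByyyyyy" part through the same function and append the two digits it returns.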
I came across this line in the book 'The Go Programming Language' on page 112.
fmt.Printf("#%-5d %9.9s %.55s\n", item.Number, item.User.Login, item.Title)
What do %9.9s and %.55s mean?
From go doc fmt:
Width is specified by an optional decimal number immediately preceding the verb. If absent, the width is whatever is necessary to represent the value. ....
For strings, byte slices and byte arrays, however, precision limits the length of the input to be formatted (not the size of the output), truncating if necessary.
Thus, %9.9s means a minimum width of 9 runes with the input truncated at 9, so the output is exactly 9 runes long. Similarly, %.55s means no minimum width but the input truncated at 55, so the output is at most 55 runes long.
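Python's printf-style % operator happens to treat width and precision for %s in the same way, so the pad-and-truncate behaviour can be tried out there. This is only an analogy in another language, not Go code, and the helper names login_column and title_column are made up.

```python
def login_column(text: str) -> str:
    """Width 9 and precision 9: pad short input, truncate long input."""
    return "%9.9s" % text


def title_column(text: str) -> str:
    """Precision 55 only: never padded, cut off after 55 characters."""
    return "%.55s" % text


print(repr(login_column("bob")))                # '      bob'
print(repr(login_column("averylongusername")))  # 'averylong'
print(len(title_column("x" * 80)))              # 55
```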
I've been looking into this but searching seems to lead to nothing.
It might be too simple to be described, but here I am, scratching my head...
Any help would be appreciated.
Verilog knows about "strings".
A single ASCII character requires 8 bits. Thus to store 8 characters you need 64 bits:
wire [63:0] string8;
assign string8 = "12345678";
There are some gotchas:
There is no End-Of-String character (like the C null-character)
The most RHS character is in bits 7:0.
Thus string8[7:0] will hold 8'h38 ("8").
To walk through a string you have to use an indexed part-select, e.g.: string8[index +: 8];
As with all Verilog vector assignments: unused bits are set to zero thus
assign string8 = "ABCD"; // MS bit63:32 are zero
You can not use two dimensional arrays:
wire [7:0] string5 [0:4];
assign string5 = "Wrong";
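The packing rules above can be modelled outside a simulator. The following Python sketch treats a Verilog vector as a plain integer; the helper pack_string is hypothetical, and raising an error for an over-long string is a choice made for this sketch, not a description of what a Verilog tool does.

```python
def pack_string(text: str, width_bits: int) -> int:
    """Model assigning a quoted string to a width_bits-wide vector."""
    data = text.encode("ascii")
    if len(data) * 8 > width_bits:
        raise ValueError("string does not fit in the vector")
    # The rightmost character lands in bits 7:0; unused upper bits stay zero.
    return int.from_bytes(data, "big")


string8 = pack_string("12345678", 64)
print(hex(string8 & 0xFF))          # 0x38, the character "8"
print(hex((string8 >> 56) & 0xFF))  # 0x31, the character "1"
print(hex(pack_string("ABCD", 64) >> 32))  # 0x0, bits 63:32 are zero
```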
You are probably misled by a misconception about characters. There is no such thing as a character in hardware. There are only sets of bits, or codes. The only thing which converts binary codes to characters is your terminal. It interprets codes in a certain way and forms letters for you to see. So, all the printfs in C and $display in Verilog only send the codes to the terminal (or to a file).
The thing which converts characters to codes is your keyboard, which you also use to type in the program. The compiler then interprets your program. The Verilog compiler (as well as the C compiler) represents strings in double quotes (which you typed in) directly as a set of bytes. Verilog, as well as C, uses 8-bit ASCII encoding for such character strings, meaning that the code for 'a' is decimal 97, 'b' is 98, and so on. Every character is 8 bits wide and the quoted string forms a concatenation of the bytes of ASCII codes.
So, answering your question, you can convert ASCII codes to characters by sending them to the terminal via $display (or another such function), using the %s modifier.
So, an example:
module A;
  reg [8*5-1:0] hello;
  reg [8*3-1:0] bye;
  initial begin
    hello = "hello"; // 5 bytes of characters
    bye = {8'd98, 8'd121, 8'd101}; // 3 bytes 'b' 'y' 'e'
    $display("hello=%s bye=%s", hello, bye);
  end
endmodule