Understanding BSON Notation

I was trying to understand BSON notation from the BSON Site, but I was unable to understand the reasoning behind the encodings.
I also referred to the following questions, but I am not convinced, for the following reasons:
Question 1: I am not familiar with the Ruby implementations.
Question 2: I understood the byte allocation, but I am unsure about the notation.
I would like to know how the BSON object is formed for the following examples from the BSON Site:
1. {"hello": "world"}
2. {"BSON": ["awesome", 5.05, 1986]}

{"hello": "world"}
\x16\x00\x00\x00
\x02 hello\x00 \x06\x00\x00\x00 world\x00
\x00
(overall: 22 bytes)
The first four bytes contain the overall length as a 32-bit little-endian integer.
\x16\x00\x00\x00 => That's 22 in decimal.
Now comes the first element. The first byte gives the type of data.
\x02 => That's a UTF-8 string.
Then comes the name of the first element, as a null-terminated string.
hello\x00
Next comes the data of the element in the previously given type, in this case a string.
For scannability (so you can quickly skip them when you don't need them), strings start with their length, and are null-terminated.
\x06\x00\x00\x00 => That's length 6 (five characters plus the trailing null).
world\x00
Now would come subsequent elements, if there were any. The whole thing is terminated with a null byte.
\x00
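To double-check this layout, here is a minimal Ruby sketch (the variable names are mine; it assumes only Array#pack and String#unpack1) that assembles the same document byte for byte:
name  = "hello\x00"
value = "world\x00"
# type byte (\x02 = UTF-8 string) + key + length-prefixed, null-terminated value
element = "\x02" + name + [value.bytesize].pack('V') + value
# 4-byte little-endian length prefix + element + terminating null byte
doc = [4 + element.bytesize + 1].pack('V') + element + "\x00"
doc.bytesize      # => 22
doc.unpack1('V')  # => 22, the length prefix decodes back to the overall size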
{"BSON": ["awesome", 5.05, 1986]}
\x31\x00\x00\x00
\x04 BSON\x00 \x26\x00\x00\x00
\x02 0\x00 \x08\x00\x00\x00 awesome\x00
\x01 1\x00 \x33\x33\x33\x33\x33\x33\x14\x40
\x10 2\x00 \xc2\x07\x00\x00
\x00
\x00
(overall: 49 bytes, array: 38 bytes)
The first four bytes contain the overall length as a 32-bit little-endian integer.
\x31\x00\x00\x00 => That's 49 in decimal.
Now comes the first element. The first byte gives the type of data.
\x04 => That's an array.
Then comes the name of the first element, as a null-terminated string.
BSON\x00
Next comes the data of the element in the previously given type, in this case an array.
[Quote: "The document for an array is a normal BSON document with
integers for the keys, starting with 0 (..)"]
For scannability, and because they form a document in their own right, arrays start with their length, and are null-terminated.
\x26\x00\x00\x00 => That's 38 in decimal.
Now comes the first element of the array. The first byte gives the type of data.
\x02 => That's a UTF-8 string.
Then comes the name of the first element of the array, null-terminated.
0\x00 => That's key 0.
Next comes the data of the element in the previously given type, in this case a string.
Strings start with their length, and are null-terminated.
\x08\x00\x00\x00 => length 8
awesome\x00
Now comes the second element of the array. The first byte gives the type of data.
\x01 => That's a double floating point number.
Then comes the name of the second element of the array, null-terminated.
1\x00 => That's key 1.
Next comes the data of the element in the previously given type, in this case a double floating point number.
\x33\x33\x33\x33\x33\x33\x14\x40 => That's 5.05.
Now comes the third element of the array. The first byte gives the type of data.
\x10 => That's a 32-bit integer.
Then comes the name of the third element of the array, null-terminated.
2\x00 => That's key 2.
Next comes the data of the element in the previously given type, in this case a 32-bit integer.
\xc2\x07\x00\x00 => That's 1986.
The array is terminated with a null byte.
\x00
The whole thing is terminated with a null byte.
\x00
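The scalar values above can be verified directly in Ruby with String#unpack1, a quick sketch ('E' is a little-endian double, 'l<' a signed little-endian 32-bit integer):
"\x33\x33\x33\x33\x33\x33\x14\x40".unpack1('E')  # => 5.05
"\xC2\x07\x00\x00".b.unpack1('l<')               # => 1986 (.b forces binary encoding, since \xC2\x07 is not valid UTF-8)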

Related

Ruby: How to generate strings of variable bits length with only alphanumeric characters?

I am trying to solve the following problem using Ruby:
I have a requirement to generate strings with variable bits length which contain only alphanumeric characters.
Here is what I have already found:
require 'digest'
Digest::SHA2.new(bitlen = 256).to_s
# => "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
It does exactly what I need, but it accepts only 256, 384, and 512 as bitlen.
Is anybody aware of any alternatives?
Thanks in advance.
Update
One byte = collection of 8 bits.
Every alphanumeric character occupies 1 byte according to String#bytesize.
('a'..'z').chain('A'..'Z').chain('0'..'9').map(&:bytesize).uniq
# => [1]
Based on the facts mentioned above, we can suppose that
SecureRandom.alphanumeric(1) generates an alphanumeric string with 8 bits length.
SecureRandom.alphanumeric(2) generates an alphanumeric string with 16 bits length.
SecureRandom.alphanumeric(3) generates an alphanumeric string with 24 bits length.
And so on...
As a result, #anothermh's answer can be considered an acceptable solution.
Use SecureRandom.
First, make sure you require it:
require 'securerandom'
Then you can generate values:
SecureRandom.alphanumeric(10)
=> "hxYolwzk0P"
Change 10 to whatever length you require.
It's worth pointing out that the example you used was returning not alphanumeric but hexadecimal values. If you specifically require hex then you can use:
SecureRandom.hex(10)
=> "470eb1d8daebacd20920"

String formatting has precision in Go?

I came across this line in the book 'The Go Programming Language' on page 112.
fmt.Printf("#%-5d %9.9s %.55s\n", item.Number, item.User.Login, item.Title)
What do %9.9s and %.55s mean?
From go doc fmt:
Width is specified by an optional decimal number immediately preceding the verb. If absent, the width is whatever is necessary to represent the value. ....
For strings, byte slices and byte arrays, however, precision limits the length of the input to be formatted (not the size of the output), truncating if necessary.
Thus, %9.9s means a minimum width of 9 runes with the input truncated at 9 runes, so the output is exactly 9 runes long. Similarly, %.55s means no minimum width, but the input is truncated at 55 runes, so the output is at most 55 runes.
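Ruby's printf-style formatting uses the same width/precision semantics for %s, so the behaviour can be sketched without a Go toolchain (a Ruby analogue, not the book's Go code):
format('%9.9s', 'golang')            # => "   golang"   (6 chars, padded out to width 9)
format('%9.9s', 'a very long login') # => "a very lo"   (truncated at 9)
format('%.55s', 'short title')       # => "short title" (under 55 chars, unchanged)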

How do I convert hex to binary (and vice versa) in Ruby, WHILE maintaining leading zeroes?

I have a data structure that I'd like to convert back and forth between hex and binary in Ruby. The simplest approach for binary to hex is '0010'.to_i(2).to_s(16); unfortunately, this does not preserve leading zeroes (due to the to_i call), as one may need with data structures like cryptographic keys (which also vary with the number of leading zeroes).
Is there an easy built in way to do this?
I think you should have a firm idea of how many bits are in your cryptographic key. That should be stored in some constant or variable in your program, not inside individual strings representing the key:
KEY_BITS = 16
The most natural way to represent a key is as an integer, so if you receive a key in a hex format you can convert it like this (leading zeros in the string do not matter):
key = 'a0a0'.to_i(16)
If you receive a key in an (ASCII) binary format, you can convert it like this (leading zeros in the string do not matter):
key = '101011'.to_i(2)
If you need to output a key in hex with the right number of leading zeros:
key.to_s(16).rjust((KEY_BITS+3)/4, '0')
If you need to output a key in binary with the right number of leading zeros:
key.to_s(2).rjust(KEY_BITS, '0')
If you really do want to figure out how many bits might be in a key based on an (ASCII) binary or hex string, you can do:
key_bits = binary_str.length
key_bits = hex_str.length * 4
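Putting those pieces together for a hypothetical 16-bit key (the value is mine, chosen so that it has a leading zero), the round trip preserves the zeros:
KEY_BITS = 16
key = '0a0a'.to_i(16)                        # => 2570 (the leading zero is gone from the integer)
key.to_s(16).rjust((KEY_BITS + 3) / 4, '0')  # => "0a0a" (restored on output)
key.to_s(2).rjust(KEY_BITS, '0')             # => "0000101000001010"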
The truth is, leading zeros are not part of the integer value. I mean, it's a little detail related to representation of this value, not the value itself. So if you want to preserve properties of representation, it may be best not to get to underlying values at all.
Luckily, hex<->binary conversion has one neat property: each hexadecimal digit exactly corresponds to 4 binary digits. So assuming you only get binary numbers that have number of digits divisible by 4 you can just construct two dictionaries for constructing back and forth:
# Hexadecimal part is easy
hex = [*'0'..'9', *'A'..'F']
# Binary... not much longer, but a bit trickier
bin = (0..15).map { |i| '%04b' % i }
Note the use of String#% operator, that formats the given value interpreting the string as printf-style format string.
Okay, so these are lists of "digits", 16 each. Now for the dictionaries:
hex2bin = hex.zip(bin).to_h
bin2hex = bin.zip(hex).to_h
Converting hex to bin with these is straightforward:
"DEADBEEF".each_char.map { |d| hex2bin[d] }.join
Converting back is not that trivial. I assume we have a "good number" that can be split into groups of 4 binary digits each. I haven't found a cleaner way than using String#scan with a "match every 4 characters" regex:
"10111110".scan(/.{4}/).map { |d| bin2hex[d] }.join
The procedure is mostly similar.
Bonus task: implement the same conversion disregarding my assumption of having only "good binary numbers", e.g. "110101".
"I-should-have-read-the-docs" remark: there is Hash#invert that returns a hash with all key-value pairs inverted.
This is the most straightforward solution I found that preserves leading zeros. To convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1

Array of multiple strings possible? Or just an array of characters for one string? XGetWindowProperty on atoms where we know a string is returned

When I do XGetWindowProperty on atoms that return strings, such as _NET_WM_NAME, WM_CLASS, and WM_NAME, they return an array of characters. However, when I ask for long or short types like _NET_WM_PID, it returns an array of longs; for the PID it is just a one-element array. The nitems_return argument is populated with the number of elements returned in the array, so for the PID it is 1. But for the string atoms, nitems_return holds the number of characters returned.
I tested the convenience function XGetWMName, and it set the nitems field of XTextProperty to the number of characters returned, which matches XGetWindowProperty. So I was wondering: is there ever a chance that XGetWindowProperty can return the number of elements in an array, as it does for long and short? Or is there only ever just one string associated with those string atoms?
docs on XGetWindowProperty
Thanks

How to deal with Unicode strings in Ruby?

I've seen the following construction in a Ruby tutorial:
irb(main):001:0> "abc".each_byte { |c| printf "<%c>", c }
<a><b><c>=> "abc"
However, if I use the string Здравствуйте! instead of abc, I get
irb(main):003:0> "Здравствуйте!".each_byte { |c| printf "<%c>", c }
<Ð><><Ð><´><Ñ><><Ð><°><Ð><²><Ñ><><Ñ><><Ð><²><Ñ><><Ð><¹><Ñ><><Ð><µ><!>=> "Здравствуйте!"
How to deal with Unicode strings?
irb(main):005:0> RUBY_VERSION
=> "1.9.3"
▶ "Здравствуйте!".each_char { |c| printf "<%c>", c }
# ⇒ <З><д><р><а><в><с><т><в><у><й><т><е><!>=> "Здравствуйте!"
A byte is a byte, while a char is a character, which may consist of several bytes.
A byte is 8 bits, but Unicode characters can take up multiple bytes when stored on your computer. So, for example, let's say the integer code for some Unicode character is 8,000, which is what is actually stored on your computer. When Ruby reads in 8,000, it knows that it represents some Unicode character. However, 8,000 cannot be stored in one byte on your computer (the largest number that can be stored in one byte is 1111 1111, which is 255). If you tell Ruby that each byte of the several bytes stored on your computer for 8,000 represents one character, i.e. by calling each_byte(), then Ruby will never see the 8,000. Instead, Ruby will read in a piece of 8,000 and think that represents one character, then read in another piece of 8,000 and think that represents another character.
each_byte() tells Ruby to ignore the clusters of bytes, and just read in one byte at a time and then determine what character is represented by the integer stored in that byte.
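You can see those clusters of bytes directly in irb ('З' is the first character of the string above):
'З'.bytesize # => 2 (this character takes two bytes in UTF-8)
'З'.bytes    # => [208, 151]
'З'.chars    # => ["З"]
# each_char walks characters, each_byte walks the raw bytes:
'Здравствуйте!'.each_char.count # => 13
'Здравствуйте!'.each_byte.count # => 25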
