Can someone explain why how the result for the following unpack is computed?
"aaa".unpack('h2H2') #=> ["16", "61"]
In binary, 'a' = 0110 0001. I'm not sure how the 'h2' can become 16 (0001 0000) or 'H2' can become 61 (0011 1101).
Not 16 - it is showing 1 and then 6. h is giving the hex value of each nibble, so you get 0110 (6), then 0001 (1), depending on whether its the high or low bit you're looking at. Use the high nibble first and you get 61, which is hex for 97 - the value of 'a'
Check out the Programming Ruby reference on unpack. Here's a snippet:
Decodes str (which may contain binary
data) according to the format string,
returning an array of each value
extracted. The format string consists
of a sequence of single-character
directives, summarized in Table 22.8
on page 379. Each directive may be
followed by a number, indicating the
number of times to repeat with this
directive. An asterisk ("*") will
use up all remaining elements. The
directives sSiIlL may each be followed
by an underscore ("_") to use the
underlying platform's native size for
the specified type; otherwise, it uses
a platform-independent consistent
size. Spaces are ignored in the format
string. See also Array#pack on page
286.
And the relevant characters from your example:
H Extract hex nibbles from each character (most significant first).
h Extract hex nibbles from each character (least significant first).
The hex code of char a is 61.
Template h2 is a hex string (low nybble first), H2 is the same with high nibble first.
Also see the perl documentation.
Related
How can I extract raw []uint8 from an image?
The size on an 8-bit integer is, by definition, 8 bits (or 1 byte).
(Editing to remove erroneous information, which I'll repost for posterity below.)
The string representation being output is not one character per number - it's several (for example, the int you list first - 65 - is being represented by three characters - a 6, a 5, and a space. That would increase the expected size threefold, from 300k to 900k.
As for the rest, I would think that (as icza said in a comment) image compression may be the culprit.
(The irrelevant information I'd initially posted as part of my answer was:
Go has two character types, byte and rune. A byte being used to store a character is the same size as a byte being used to store an integer - both are 8 bits. But a character being stored as a rune is going to be 32 bits (see https://www.callicoder.com/golang-basic-types-operators-type-conversion/). So, if the characters being written out are runes, that would explain a four-fold increase in size (because 32 / 8 = 4)...
I want to know how many characters or numbers can I store in 1 bit only. It will be more helpful if you tell it in octal, hexadecimal.
I want to know how many characters or numbers can I store in 1 bit only.
It is not practical to use a single bit to store numbers or characters. However, you could say:
One integer provided that the integer is in the range 0 to 1.
One ASCII character provided that the character is either NUL (0x00) or SOH (0x01).
The bottom line is that a single bit has two states: 0 and 1. Any value domain with more that two values in the domain cannot be represented using a single bit.
It will be more helpful if you tell it in octal, hexadecimal.
That is not relevant to the problem. Octal and hexadecimal are different textual representations for numeric data. They make no difference to the meaning of the numbers, or (in most cases1) the way that you represent the numbers in a computer.
1 - The exception is when you are representing numbers as text; e.g. when you represent the number 42 in a text document as the character '4' followed by the character '2'.
A bit is a "binary digit", or a value from a set of size two. If you have one or more bits, you raise 2 to the power of the number of bits. So, 2ยน gives 2. The field in Mathematics is called combinatorics.
My program is a decoder for a binary protocol. One of the fields in that binary protocol is an encoded String. Each character in the String is printable, and represents an integral value. According to the spec of the protocol I'm decoding, the integral value it represents is taken from the following table, where all possible characters are listed:
Character Value
========= =====
0 0
1 1
2 2
3 3
[...]
: 10
; 11
< 12
= 13
[...]
B 18
So for example, the character = represents an integral 13.
My code was originally using ord to get the ASCII code for the character, and then subtracting 48 from that, like this:
def Decode(val)
val[0].ord - 48
end
...which works perfectly, assuming that val consists only of characters listed in that table (this is verified elsewhere).
However, in another question, I was told that:
You are asking for a Ruby way to use ord, where using it is against
the Ruby way.
It seems to me that ord is exactly what I need here, so I don't understand why using ord here is not a Rubyist way to do what I'm trying to do.
So my questions are:
First and foremost, what is the Rubyist way to write my function above?
Secondary, why is using ord here a non-Rubyist practice?
A note on encoding: This protocol which I'm decoding specifies precisely that these strings are ASCII encoded. No other encoding is possible here. Protocols like this are extremely common in my industry (stock & commodity markets).
I guess the Rubyistic way, and faster, to decode the string into an array of integers is the unpack method:
"=01:".unpack("C*").map {|v| v - 48}
>> [13, 0, 1, 10]
The unpack method, with "C*" param, converts each character to an 8-bit unsigned integer.
Probably ord is entirely safe and appropriate in your case, as the source data should always be encoded the same way. Especially if when reading the data you set the encoding to 'US-ASCII' (although the format used looks safe for 'ASCII-8BIT', 'UTF-8' and 'ISO-8859', which may be the point of it - it seems resilient to many conversions, and does not use all possible byte values). However, ord is intended to be used with character semantics, and technically you want byte semantics. With basic ASCII and variants there is no practical difference, all byte values below 128 are the same character code.
I would suggest using String#unpack as a general method for converting binary input to Ruby data types, but there is not an unpack code for "use this byte with an offset", so that becomes a two-part process.
For eg. if we have two strings 2 and 10, 10 will come first if we order lexicographically.
The very trivial sol will be to repeat a character n number of time.
eg. 2 can be encoded as aa
10 as aaaaaaaaaa
This way the lex order is same as the numeric one.
But, is there a more elegant way to do this?
When converting the numbers to strings make sure that all the strings have the same length, by appending 0s in the front if necessary. So 2 and 10 would be encoded as "02" and "10".
While kjampani's solution is probably the best and easiest in normal applications, another way which is more space-efficient is to prepend every string with its own length. Of course, you need to encode the length in a way which is also consistently sorted.
If you know all the strings are fairly short, you can just encode their length as a fixed-length base-X sequence, where X is the number of character codes you're willing to use (popular values are 64, 96, 255 and 256.) Note that you have to use the character codes in lexicographical order, so normal base64 won't work.
One variable-length order-preserving encoding is the one used by UTF-8. (Not UTF-8 directly, which has a couple of corner cases which will get in the way, but the same encoding technique. The order-preserving property of UTF-8 is occasionally really useful.) The full range of such compressed codes can encode values up to 42 bits long, with an average of five payload bits per byte. That's sufficient for pretty long strings; four terabyte long strings are pretty rare in the wild; but if you need longer, it's possible, too, by extending the size prefix over more than one byte.
Break the string into successive sub strings of letters and numbers and then sort by comparing each substring as an integer if it's an numeric string
"aaa2" ---> aaa + 2
"aaa1000" ---> aaa + 1000
aaa == aaa
Since they're equal, we continue:
1000 > 2
Hence, aaa1000 > aaa2.
I have a hex value (0x0020004E0000 ... which is the base address to a hardware address). I need to add 0x04 to the base for each register. I have been doing this by first converting the base address to a base 10 number, then adding 4 to that value. The sum I then take and convert back to hex all via the string class .to_s and .to_i.
Is there a better way to do this so I'm not converting back-and-forth between base 10 and base 16 all the time? (FYI, in my previous AppleScript script, I punted hex math to the OS and let bc take care of the addition for me).
0x0020004E0000 + 0x04
or simply
0x0020004E0000 + 4
You have four ways of representing integer values in Ruby
64 # integer
0x40 # hexadecimal
0100 # octal
0b1000000 # binary
# These are all 64.
A number is a number is a number, no matter how it is represented internally or displayed to the user. Just add them as you would any other number. If you want to view them as Hex later then fine; format them for output.
You are confusing representation with value. Ruby can parse a number represented in Hex just as well as it can parse a decimal, binary, or octal value.
0x04 (Hex) == 4 (decimal) == 100 (binary)
All the same thing.