I am trying to use unpack to decode a binary file. The binary file has the following structure:
ABCDEF\tFFFABCDEF\tFFF....
where
ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times
I know how to do it when types are all the same or with only numbers and fixed length arrays, but I am struggling in this situation. For example, if I had a list of floats I would do
s.unpack('F*')
Or if I had integers and floats like
[1, 3.4, 5.2, 4, 2.3, 7.8]
I would do
s.unpack('CF2CF2')
But in this case I am a bit lost. I was hoping to use a format string such as '(CF2)*' with brackets, but it does not work.
I need to use Ruby 2.0.0-p247 if that matters
Example
ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')
then
s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 @ff\x0EAff"]
Finally
s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'
You could read the file in small chunks of 19 bytes and use 'A7fff' to pack and unpack. Do not use pointers to structure ('p' and 'P'), as they need more than 19 bytes to encode your information.
You could also use 'A6xfff' to ignore the 7th byte and get a string with 6 chars.
Here's an example, which is similar to the documentation of IO.read:
data = [["ABCDEF\t", 3.4, 5.6, 9.1],
        ["FEDCBA\t", 2.5, 8.9, 3.1]]
binary_file = 'data.bin'
chunk_size = 19
pattern = 'A7fff'
File.open(binary_file, 'wb') do |o|
  data.each do |row|
    o.write row.pack(pattern)
  end
end
raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size
File.open(binary_file, 'rb') do |f|
  while record = f.read(chunk_size)
    puts '%s %g %g %g' % record.unpack(pattern)
  end
end
# =>
# ABCDEF 3.4 5.6 9.1
# FEDCBA 2.5 8.9 3.1
You could use a multiple of 19 to speed up the process if your file is large.
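A minimal sketch of that idea, assuming the same 19-byte 'A7fff' records as above. The method name each_record and the records_per_read parameter are made up for illustration:

```ruby
RECORD_SIZE = 19        # 7 bytes of text + 3 * 4 bytes of single-precision floats
PATTERN     = 'A7fff'

# Reads many records per IO call, then slices the block into single records.
# each_record and records_per_read are illustrative names, not a library API.
def each_record(path, records_per_read = 1024)
  File.open(path, 'rb') do |f|
    while (block = f.read(RECORD_SIZE * records_per_read))
      offset = 0
      # A trailing partial record (fewer than RECORD_SIZE bytes) is skipped.
      while offset + RECORD_SIZE <= block.bytesize
        yield block.byteslice(offset, RECORD_SIZE).unpack(PATTERN)
        offset += RECORD_SIZE
      end
    end
  end
end
```

It could be used as each_record('data.bin') { |name, x, y, z| puts '%s %g %g %g' % [name, x, y, z] }.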
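When dealing with mixed formats that repeat and have a known fixed size, it is often easier to split the string into records first and then unpack each record (use the m flag so that . also matches newline bytes, which binary data may contain).
A quick example would be:
binary.scan(/.{#{LENGTH_OF_DATA}}/m).map { |item| item.unpack(FORMAT) }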
Considering your example above, take the length of the string including the tab character (in bytes), plus the size of the 3 floats. If your strings are literally 'ABCDEF\t', you would use a size of 19 (7 for the string, 12 for the 3 floats).
Your final product would look like this:
str.scan(/.{19}/m).map { |item| item.unpack('A7fff') }
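Since unpack has no grouping syntax such as '(CF2)*', another option is to repeat the record format once per record and regroup the flat result. This is only a sketch, assuming 19-byte records whose text field is stored inline and read with 'A7'; unpack_records is a hypothetical helper name:

```ruby
RECORD_FORMAT = 'A7fff'  # 7 text bytes followed by 3 single-precision floats
RECORD_SIZE   = 19       # bytes per record
FIELDS        = 4        # values produced per record

# Builds one long format string ('A7fffA7fff...') and regroups the flat
# array that unpack returns into one sub-array per record.
def unpack_records(binary)
  count = binary.bytesize / RECORD_SIZE
  binary.unpack(RECORD_FORMAT * count).each_slice(FIELDS).to_a
end
```

For very large inputs the format string grows with the number of records, so reading the file in chunks may be the better fit there.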
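For example (this round trip uses 'p', which stores a pointer to the string rather than its bytes, so it only works within the same process; for data read from a file, store the text inline with 'A7'):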
irb(main):001:0> ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
=> ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
irb(main):002:0> s = ary.pack('pfffpfff')
=> "\xE8Pd\xE4eU\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11A\x98Pd\xE4eU\x00\x00\x00\x00 @ff\x0EAffF@"
irb(main):003:0> s.unpack('pfffpfff')
=> ["ABCDEF\t", 3.4000000953674316, 5.599999904632568, 9.100000381469727, "FEDCBA\t", 2.5, 8.899999618530273, 3.0999999046325684]
The minor differences in precision are unavoidable, but do not worry about them: they come from the difference between a 32-bit float and the 64-bit double that Ruby uses internally, and the error is smaller than the precision a 32-bit float can represent.
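The effect can be reproduced with a single value. The following sketch compares a round trip through 'f' (32-bit) with one through 'd' (64-bit); the variable names are arbitrary:

```ruby
value = 3.4

as_float  = [value].pack('f').unpack('f').first  # stored in 4 bytes
as_double = [value].pack('d').unpack('d').first  # stored in 8 bytes

error = (as_float - value).abs

puts as_float    # slightly off, e.g. 3.4000000953674316
puts as_double   # 3.4
puts error < 1e-6
```

If the file format is under your control and exact round trips matter, 'd' avoids the loss at the cost of 8 bytes per value.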
I encrypted numerical values as follows.
> secret = SecureRandom::hex(128)
> encryptor = ::ActiveSupport::MessageEncryptor.new(secret, cipher: 'aes-256-cbc')
> message1 = 1
> message1.size
=> 8
> message1.class
=> Fixnum
> encrypt_message1 = encryptor.encrypt_and_sign(message1)
> encrypt_message1.length
=> 110
> message2 = 10000
> message2.size
=> 8
> message2.class
=> Fixnum
> encrypt_message2 = encryptor.encrypt_and_sign(message2)
> encrypt_message2.length
=> 110
The result above is what I expected.
This is because any number less than 4611686018427387903 is a Fixnum, and the size of a Fixnum is 8 bytes.
In addition, the block size of AES is 128 bits (16 bytes).
8 bytes < 16 bytes.
So the encrypted values of 1 and 10000 have the same length.
But in the following case, the length of the encrypted value is different.
> message3 = 1000000000000000000000000000
> message3.size
=> 12
> message3.class
=> Bignum
> encrypt_message3 = encryptor.encrypt_and_sign(message3)
> encrypt_message3.size
=> 138
1000000000000000000000000000 is a Bignum, but its size is 12 bytes, which is still less than 16 (the AES block size).
So I expected the encrypted value to have the same length as for a Fixnum.
But they are different...
Why are they different?
There are multiple layers to what is happening here, and you cannot explain it solely by the data size and the cipher used (i.e. you also have to factor in the transformations that happen along the way).
Look at: https://github.com/rails/rails/blob/29be3f5d8386fc9a8a67844fa9b7d6860574e715/activesupport/lib/active_support/message_encryptor.rb
and after that look at:
https://github.com/rails/rails/blob/29be3f5d8386fc9a8a67844fa9b7d6860574e715/activesupport/lib/active_support/message_verifier.rb which is used in the encryptor.
There are a few stages:
serializing the data you pass in (this is done using Marshal.dump if you don't specify a serializer)
base64 encoding the data.
generating a digest (i.e. signature) for the data.
encrypting the data+digest and storing the result + iv from the cipher used in encrypted form.
If you want to understand the generated encrypted data you basically need to trace through the code above, but:
::Base64.strict_encode64(Marshal.dump(1)).size is 8
::Base64.strict_encode64(Marshal.dump(10000)).size is 8
::Base64.strict_encode64(Marshal.dump(1000000000000000000000000000)).size is 24
But:
Marshal.dump(1).size is 4
Marshal.dump(10000).size is 6
Marshal.dump(1000000000000000000000000000).size is 17
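These sizes can be checked with a short script. This sketch uses [data].pack('m0'), which produces the same unbroken base64 output as Base64.strict_encode64; the helper name sizes is made up for illustration:

```ruby
# Returns [marshalled size, base64-encoded size] in bytes for a value.
def sizes(value)
  dumped  = Marshal.dump(value)
  encoded = [dumped].pack('m0')   # strict base64, no line breaks
  [dumped.bytesize, encoded.bytesize]
end

[1, 10_000, 10**27].each do |n|
  marshalled, base64 = sizes(n)
  puts "#{n}: marshal=#{marshalled} base64=#{base64}"
end
```

Base64 emits 4 characters for every 3 input bytes, rounded up, so 4 and 6 bytes both become 8 characters while 17 bytes become 24. That difference is already present before any encryption takes place.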
Here is how Marshal.dump works internally: http://jakegoulding.com/blog/2013/01/15/a-little-dip-into-rubys-marshal-format/
Here is how base64 encoding works: https://blogs.oracle.com/rammenon/entry/base64_explained Look at the rules for padding.
How do I convert a 32-bit integer to network byte order?
What is the right way to do that?
[1024].pack("N")
OR
[1,0,2,4].pack("N")
Thanks
To start, look at the output of each:
>> [1024].pack("N")
=> "\000\000\004\000"
>> [1,0,2,4].pack("N")
=> "\000\000\000\001"
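Note what the second is missing: each N directive consumes a single array element, so pack("N") encodes only the first element, 1, and ignores the rest. Packing all four elements takes four directives: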
>> [1,0,2,4].pack("NNNN")
=> "\000\000\000\001\000\000\000\000\000\000\000\002\000\000\000\004"
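A small sketch that makes the difference visible by looking at the individual bytes. The 'C4' line is an assumption about what the second form may have been aiming for, namely four separate byte values:

```ruby
packed     = [1024].pack('N')             # one 32-bit big-endian integer
bytes      = packed.bytes.to_a            # the four bytes in network order
round_trip = packed.unpack('N').first     # back to the original integer

four_ints  = [1, 0, 2, 4].pack('N*')      # four integers, 4 bytes each
four_bytes = [1, 0, 2, 4].pack('C4')      # four single unsigned bytes

puts bytes.inspect                 # [0, 0, 4, 0]
puts round_trip                    # 1024
puts four_ints.bytesize            # 16
puts four_bytes.bytes.to_a.inspect # [1, 0, 2, 4]
```

So for a single 32-bit integer, [1024].pack("N") is the form to use.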
I need to round up to the nearest tenth. What I need is ceil but with precision to the first decimal place.
Examples:
10.38 would be 10.4
10.31 would be 10.4
10.4 would be 10.4
So if it is any amount past a full tenth, it should be rounded up.
I'm running Ruby 1.8.7.
This works in general:
ceil(number*10)/10
So in Ruby it should be like:
(number*10).ceil/10.0
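Wrapped in a method with a configurable number of decimal places, this could look like the following sketch. The name ceil_to is hypothetical, and the usual floating-point caveats apply to values that cannot be represented exactly:

```ruby
# Rounds number up to the given number of decimal places.
# ceil_to is an illustrative helper, not a built-in method.
def ceil_to(number, precision = 1)
  factor = 10**precision
  (number * factor).ceil / factor.to_f
end

puts ceil_to(10.38)   # 10.4
puts ceil_to(10.31)   # 10.4
puts ceil_to(10.4)    # 10.4
```

It relies only on Float#ceil without arguments, so it should also work on Ruby 1.8.7.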
Ruby's round method can consume precisions:
10.38.round(1) # => 10.4
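In this case 1 gets you rounding to the nearest tenth. Note that this rounds to the nearest tenth rather than always up (10.31.round(1) gives 10.3), and that Float#round only accepts a precision argument in Ruby 1.9 and later.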
If you have ActiveSupport available, it adds a round method:
3.14.round(1) # => 3.1
3.14159.round(3) # => 3.142
The source is as follows:
def round_with_precision(precision = nil)
  precision.nil? ? round_without_precision : (self * (10 ** precision)).round / (10 ** precision).to_f
end
To round up to the nearest ten (rather than the nearest tenth) in Ruby you could do
(number/10.0).ceil*10
(12345/10.0).ceil*10 # => 12350
(10.33 + 0.05).round(1) # => 10.4
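This rounds up like ceil for values that are not already on a tenth, is concise, and supports precision, without the goofy /10 *10.0 thing. Be aware that a value that is already on a tenth may be pushed to the next one: 10.4 + 0.05 can round to 10.5.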
Eg. round up to nearest hundredth:
(10.333 + 0.005).round(2) # => 10.34
To nearest thousandth:
(10.3333 + 0.0005).round(3) # => 10.334
etc.
I was hoping someone with better math skills could help me figure out the total number of possibilities for a string, given its length and character set.
i.e. [a-f0-9]{6}
What are the possibilities for this pattern of random characters?
It is equal to the number of characters in the set raised to 6th power.
In Python (3.x) interpreter:
>>> len("0123456789abcdef")
16
>>> 16**6
16777216
>>>
EDIT 1:
Why 16.7 million? Well, 000000 ... 999999 = 10^6 = 1M, 16/10 = 1.6 and
>>> 1.6**6
16.77721600000000
EDIT 2:
To create a list in Python, do: print(['{0:06x}'.format(i) for i in range(16**6)])
However, this is too huge. Here is a simpler, shorter example:
>>> ['{0:06x}'.format(i) for i in range(100)]
['000000', '000001', '000002', '000003', '000004', '000005', '000006', '000007', '000008', '000009', '00000a', '00000b', '00000c', '00000d', '00000e', '00000f', '000010', '000011', '000012', '000013', '000014', '000015', '000016', '000017', '000018', '000019', '00001a', '00001b', '00001c', '00001d', '00001e', '00001f', '000020', '000021', '000022', '000023', '000024', '000025', '000026', '000027', '000028', '000029', '00002a', '00002b', '00002c', '00002d', '00002e', '00002f', '000030', '000031', '000032', '000033', '000034', '000035', '000036', '000037', '000038', '000039', '00003a', '00003b', '00003c', '00003d', '00003e', '00003f', '000040', '000041', '000042', '000043', '000044', '000045', '000046', '000047', '000048', '000049', '00004a', '00004b', '00004c', '00004d', '00004e', '00004f', '000050', '000051', '000052', '000053', '000054', '000055', '000056', '000057', '000058', '000059', '00005a', '00005b', '00005c', '00005d', '00005e', '00005f', '000060', '000061', '000062', '000063']
>>>
EDIT 3:
As a function:
def generateAllHex(numDigits):
    assert(numDigits > 0)
    ceiling = 16**numDigits
    for i in range(ceiling):
        formatStr = '{0:0' + str(numDigits) + 'x}'
        print(formatStr.format(i))
This will take a while to print at numDigits = 6.
I recommend dumping this to file instead like so:
def generateAllHex(numDigits, fileName):
    assert(numDigits > 0)
    ceiling = 16**numDigits
    with open(fileName, 'w') as fout:
        for i in range(ceiling):
            formatStr = '{0:0' + str(numDigits) + 'x}'
            # One value per line; without the newline all values run together.
            fout.write(formatStr.format(i) + '\n')
If you are just looking for the number of possibilities, the answer is (charset.length)^(length). If you need to actually generate a list of the possibilities, just loop through each character, recursively generating the remainder of the string.
e.g.
void generate(char[] charset, int length)
{
    generate("", charset, length);
}
void generate(String prefix, char[] charset, int length)
{
    for (int i = 0; i < charset.length; i++)
    {
        if (length == 1)
            System.out.println(prefix + charset[i]);
        else
            // Append the character itself, not its index, to the prefix.
            generate(prefix + charset[i], charset, length - 1);
    }
}
The number of possibilities is the size of your alphabet raised to the power of the length of your string (in the general case, of course).
Assuming your string size is 4: _ _ _ _ and your alphabet = { 0 , 1 }:
there are 2 possibilities (0 or 1) for the first place, 2 for the second place, and so on.
So it all multiplies out to: alphabet_size^String_size
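The small case is easy to enumerate and check against the formula. A sketch in Ruby, where all_strings is a hypothetical helper built on Array#repeated_permutation:

```ruby
# Every string of the given length over the given alphabet.
# all_strings is an illustrative helper name.
def all_strings(alphabet, length)
  alphabet.repeated_permutation(length).map(&:join)
end

binary = all_strings(%w[0 1], 4)
puts binary.size      # 16, which is 2**4
puts binary.first     # 0000
puts binary.last      # 1111

hex_count = 16**6     # count for [a-f0-9]{6} without building the list
puts hex_count        # 16777216
```

For the six-character hex case, computing the count directly is preferable to building all 16777216 strings in memory.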
first: 000000
last: ffffff
This matches hexadecimal numbers.
For any given set of possible values, the number of possible strings is the number of values raised to the power of the number of positions.
In this case, that would be 16 to the 6th power, or 16777216 possibilities.