Ruby: Split binary data - ruby

I want to split data into chunks of, let's say, 8154 bytes:
data = Zlib::Deflate.deflate(some_very_long_string)
What would be the best way to do that?
I tried to use this:
chunks = data.scan /.{1,8154}/
...but data was lost! data had a size of 11682, but when looping through every chunk and summing up the size I ended up with a total size of 11677. 5 bytes were lost! Why?

Regexps are not a good way to parse binary data. Use bytes and each_slice to operate on bytes, and use pack 'C*' to convert them back into strings for output or debugging:
irb> data = File.open("sample.gif", "rb", &:read)
=> "GIF89a\r\x00\r........."
irb> data.bytes.each_slice(10){ |slice| p slice, slice.pack("C*") }
[71, 73, 70, 56, 57, 97, 13, 0, 13, 0]
"GIF89a\r\x00\r\x00"
[247, 0, 0, 0, 0, 0, 0, 0, 51, 0]
"\xF7\x00\x00\x00\x00\x00\x00\x003\x00"
...........
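As for why bytes went missing in the question: the most likely cause is that "." in a Ruby regexp does not match a newline unless the m flag is given, so every "\n" byte in the compressed data is skipped by scan. A minimal sketch with made-up sample data (not the asker's actual input):

```ruby
# Made-up 8-byte binary string containing two newline bytes.
data = "ab\ncd\nef".b

lossy    = data.scan(/.{1,4}/)   # "." skips "\n", so those bytes are dropped
lossless = data.scan(/.{1,4}/m)  # with /m, "." matches "\n" as well

puts lossy.sum(&:bytesize)       # fewer bytes than data.bytesize
puts lossless.sum(&:bytesize)    # same as data.bytesize
```

Even with the m flag, a regexp is an awkward tool for this job, which is why the byte-oriented approaches in the answers are preferable.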

The accepted answer works, but creates unneeded arrays and is extremely slow for big files.
This alternative works fine and is much faster (500x for a 1MB file and 10kB chunks!):
def get_binary_chunks(string, size)
  # bytesize (not length) counts bytes, which is what byteslice indexes by
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end
For the given example, you'd use it this way:
chunks = get_binary_chunks(data, 8154)
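A quick way to convince yourself that no bytes are lost is a round-trip check. The sketch below is self-contained; the pseudo-random payload is a made-up stand-in for some_very_long_string, and the helper counts with bytesize so that it agrees with byteslice:

```ruby
require 'zlib'

def get_binary_chunks(string, size)
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end

# Hypothetical input: pseudo-random bytes, which barely compress.
payload = Random.new(42).bytes(20_000)
data = Zlib::Deflate.deflate(payload)

chunks = get_binary_chunks(data, 8154)
puts chunks.map(&:bytesize).inspect  # every chunk is at most 8154 bytes
puts chunks.join == data             # true when nothing was lost
```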

Related

Clear every other bit in Ruby

How do I clear every other bit of a string in Ruby and convert the result to a byte array? I understand that I need to do an AND operation with the 0x01010101 value for every byte, but the difficulty is converting the string to binary correctly. Ideally it should be fast and allocate as little as possible.
Later I will need to pass this value to Digest::MD5.hexdigest.
Firstly, note that 0x is for base 16, 0b is for base 2:
0b11111111.to_s(2) #=> "11111111"
0x11111111.to_s(2) #=> "10001000100010001000100010001"
As you are converting bits within bytes you want to use 0b... for your mask.
Next,
0b01010101.to_s(2) #=> "1010101"
showing that, as with all integers, leading zeroes are dropped, meaning you can include them or not. Consider,
0b11111111 & 0 #=> 0
This shows that, as a mask, zero behaves as if it were padded with leading zero bits. We see that
(0b11111111 &
0b1010101).to_s(2) #=> "1010101"
So, we can define your bitwise mask as
MASK = 0b1010101
We can now use String#unpack with format string "C*" to convert the string to an array of 8-bit unsigned integers, which we then bitwise and with MASK (using &):
str = "Let's party, now!"
str.unpack("C*").map { |u| u & MASK }
#=> [68, 69, 84, 5, 81, 0, 80, 65, 80, 84, 81, 4, 0, 68, 69, 85, 1]
The "C" in "C*" is the format directive for one 8-bit unsigned integer, applied to the first byte; "*" means to repeat "C" for all remaining bytes.
See also Integer#&.
I see from @DavidKling's answer that one could alternatively write
str.bytes.map { |u| u & MASK }
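You can use String#bytes to get an array of the string's byte values as integers. For ASCII-only strings these coincide with the characters' code points; for multi-byte characters you get one entry per byte.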
'Roman'.bytes # [82, 111, 109, 97, 110]
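Since the question mentions passing the result to Digest::MD5.hexdigest, here is a minimal sketch of the whole path. It assumes the digest should be taken over the masked bytes packed back into a binary string; the question does not spell that out, so treat it as one possible reading:

```ruby
require 'digest/md5'

MASK = 0b01010101  # keeps bits 0, 2, 4, 6 of each byte and clears the others

str = "Let's party, now!"

# Mask every byte, then pack the integers back into a binary string.
masked = str.bytes.map { |b| b & MASK }.pack("C*")

puts masked.bytes.inspect
puts Digest::MD5.hexdigest(masked)
```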

Find incremental x amount of numbers in range

I don't even know how to explain this... I've been looking for algos but no luck.
I need a function that would return an array of incrementally bigger numbers (not sure what kind of curve) from two numbers that I'd pass as parameters.
Ex.:
$length = 20;
get_numbers(1, 1000, $length);
> 1, 2, 3, 5, 10, 20, 30, 50, 100, 200, 500... // let's say that these are 20 numbers that add up to 1000
Any idea how I could do this..? I guess I'm not smart enough to figure it out.
How about an exponential curve? Sample Python implementation:
begin = 1
end = 1000
diff = end - begin
length = 10
# growth factor chosen so that X**(length-1) == diff
X = diff**(1.0/(length-1))
seq = []
for i in range(length):
    seq.append(int(begin+X**i))
print(seq)
(note: ** is the Python operator for exponentiation. Other languages may or may not use ^ instead)
Result:
[2, 3, 5, 10, 22, 47, 100, 216, 464, 999]
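A closely related variant, sketched here in Ruby, multiplies the start value by a constant ratio instead of adding powers to it, so the sequence begins exactly at the first bound and ends at the second. The helper name geometric_steps is made up for illustration, and it assumes first > 0 and count >= 2:

```ruby
# Hypothetical helper: count values spaced geometrically from first to last.
# Assumes first > 0 and count >= 2.
def geometric_steps(first, last, count)
  ratio = (last.to_f / first)**(1.0 / (count - 1))  # constant growth factor
  Array.new(count) { |i| (first * ratio**i).round }
end

p geometric_steps(1, 1000, 10)
```

Because the values are rounded, neighbouring entries can coincide when the range is small relative to the count.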

Making sense of Pebble's Accelerometer byte[ ] array

I'm using DataLogging service to log the raw accelerometer reading from pebble and retrieve these as byte array on my android. Just not sure how to interpret it based on the AccelData struct (x, y, z, did_vibrate boolean, time stamp). Here is a byte array string sample:
[-112, -1, 32, -1, 88, -4, 0, 95, -73, -62, -106, 68, 1, 0, 0]
(sampling with 10Hz and 10 samples per update)
Thanks.
isVibrate is 1 byte (bool)
timestamp is 8 bytes (uint64)
x, y, z are each 2 bytes (int16)
It looks like the data in your array is in reverse order of the struct definition. Depending on what language you are coding in, the routine to convert these bytes into usable numbers will be different.
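As an illustration of such a routine, here is a sketch in Ruby. It is one possible reading of the sample, not a confirmed description of the Pebble format: it assumes the 15 bytes hold x, y, z as little-endian int16 values, then the 1-byte vibrate flag, then a little-endian uint64 timestamp.

```ruby
# Sample from the question: signed bytes as seen on the Android side.
raw = [-112, -1, 32, -1, 88, -4, 0, 95, -73, -62, -106, 68, 1, 0, 0]

# Assumed layout (little-endian): int16 x, y, z; 1-byte did_vibrate; uint64 timestamp.
# "c*" packs signed bytes; "s<3" reads three int16, "C" one byte, "Q<" one uint64.
x, y, z, did_vibrate, timestamp = raw.pack("c*").unpack("s<3CQ<")

puts "x=#{x} y=#{y} z=#{z} did_vibrate=#{did_vibrate} timestamp=#{timestamp}"
```

Under this assumption the timestamp falls in 2014 when read as milliseconds since the Unix epoch, which makes the reading plausible but does not prove it.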

Splitting a string of numbers with different digit sizes in Ruby

I'm trying to figure out if there's a way to split a string that contains numbers with different digit sizes without having to use if/else statements. Is there an outright method for doing so? Here is an example string:
"123456789101112131415161718192021222324252627282930"
So that it would be split into an array containing 1-9 and 10-30 without having to first split the array into single digits, separate it, find the 9, and iterate through combining every 2 elements after the 9.
Here is the current way I would go about doing this to clarify:
string = "123456789101112131415161718192021222324252627282930".split('')
# slice! removes and returns the first nine characters
single_digits = string.slice!(0, 9).map { |e| e.to_i }
# the rest is read two characters at a time
double_digits = string.each_slice(2).map { |num| num.join.to_i }
This would give me:
single_digits = [1,2,3,4,5,6,7,8,9]
double_digits = [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
As long as you can be sure that every number is greater than its predecessor and greater than zero, and every length of number from a single digit to the maximum is represented at least once, you could write this
def split_numbers(str)
  numbers = []
  current = 0
  str.each_char do |ch|
    current = current * 10 + ch.to_i
    if numbers.empty? || current > numbers.last
      numbers << current
      current = 0
    end
  end
  numbers << current if current > 0
  numbers
end
p split_numbers('123456789101112131415161718192021222324252627282930')
output
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
For Anon's example of 192837453572 we get
[1, 9, 28, 37, 45, 357, 2]
Go through each character of the string, collecting single 'digits', until you find a 9 (set a controlling value and increment it by 1), then continue on collecting two digits, until you find 2 consecutive 9's, and continue on.
This can then be written to handle any sequence of numbers such as your example string.
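The idea just described can be sketched roughly as follows. The method name split_by_width is made up, and the sketch assumes the string is a run of consecutive integers that starts with single-digit numbers and passes through 9, 99, and so on:

```ruby
# Hypothetical helper: read fixed-width tokens, widening after 9, 99, 999, ...
def split_by_width(str)
  numbers = []
  width = 1
  pos = 0
  while pos < str.length
    token = str[pos, width]
    numbers << token.to_i
    pos += width
    # An all-nines token is the last number with this many digits.
    width += 1 if token == "9" * width
  end
  numbers
end

p split_by_width("123456789101112131415161718192021222324252627282930")
```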
You could do this:
str = "123456789101112131415161718192021222324252627282930"
result = str[0..8].split('').map {|e| e.to_i }
result += str[9..-1].scan(/../).map {|e| e.to_i }
It's essentially the same solution as yours, but slightly cleaner (no need to combine the pairs of digits). But yeah, if you want a generalizable solution to an arbitrary length string (including more than just 2 digits), that's a different question than what you seem to be asking.
UPDATE:
Well, I haven't been able to get this question out of my mind, because it seems like there could be a simple, generalizable solution. So here's my attempt. The basic idea is to keep a counter so that you know how many digits the number you want to slice out of the string is.
str = "123456789101112131415161718192021222324252627282930"
result = []
i = 1
remaining = str.dup  # copy, so that slice! does not modify str itself
until remaining.empty?
  # the i-th number has as many digits as i itself
  result << remaining.slice!(0, i.to_s.size).to_i
  i += 1
end
puts result
This generates the desired output, and generalizes to any string of consecutive positive integers starting with 1. I'd be very interested to see other people's improvements to this -- it's not super succinct.

Compare all elements inside a 2D array with each other

I have a perfectly square 64x64 2D array of integers that will never have a value greater than 64. I was wondering if there is a really fast way to compare all of the elements with each other and display the ones that are the same, in a unique way.
At the current moment I have this
2D int array named array
loop from i = 0 to 64
    loop from j = 0 to 64
        loop from k = (j+1) to 64
            loop from z = 0 to 64
                if(array[i][j] == array[k][z])
                    print "element [i][j] is same as [k][z]"
As you see having 4 nested loops is quite a stupid thing that I would like not to use. Language does not matter at all whatsoever, I am just simply curious to see what kind of cool solutions it is possible to use. Since value inside any integer will not be greater than 64, I guess you can only use 6 bits and transform array into something fancier. And that therefore would require less memory and would allow for some really fancy bitwise operations. Alas I am not quite knowledgeable enough to think in that format, and therefore would like to see what you guys can come up with.
Thanks to anyone in advance for a really unique solution.
There's no need to sort the array via an O(m log m) algorithm; you can use an O(m) bucket sort. (Letting m = n*n = 64*64).
An easy O(m) method using lists is to set up an array H of n+1 integers, initialized to -1; also allocate an array L of m integers, to use as list elements. For the i'th array element, with value A[i], set k=A[i] and L[i]=H[k] and H[k]=i. When that's done, each H[k] is the head of a list of the entries that hold the value k, and following L from there walks the rest of that list until -1 is reached. For 2D arrays, treat array element A[i,j] as A[i+n*(j-1)].
Here's a python example using python lists, with n=7 for ease of viewing results:
import random
n = 7
m = n*n
a = [random.randint(1, n) for i in range(m)]
# h[k] collects the indices of all entries with value k
h = [[] for i in range(n+1)]
for i in range(m):
    k = a[i]
    h[k].append(i)
for i in range(1, n+1):
    print('With value %2d: %s' % (i, h[i]))
Its output looks like:
With value 1: [1, 19, 24, 28, 44, 45]
With value 2: [3, 6, 8, 16, 27, 29, 30, 34, 42]
With value 3: [12, 17, 21, 23, 32, 41, 47]
With value 4: [9, 15, 36]
With value 5: [0, 4, 7, 10, 14, 18, 26, 33, 38]
With value 6: [5, 11, 20, 22, 35, 37, 39, 43, 46, 48]
With value 7: [2, 13, 25, 31, 40]
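The head/next array formulation (H and L in the description above) can be sketched in Ruby as well. The method name group_equal_indices and the small 3x3 grid are made up for illustration, and indices here are 0-based:

```ruby
# Hypothetical helper: groups the indices of equal values in O(m) time
# using a head array (H) and a next array (L), as described above.
def group_equal_indices(a, max_value)
  head = Array.new(max_value + 1, -1)  # head[k]: latest index holding value k
  nxt  = Array.new(a.length, -1)       # nxt[i]: earlier index with the same value
  a.each_with_index do |k, i|
    nxt[i] = head[k]
    head[k] = i
  end

  groups = {}
  head.each_with_index do |i, k|
    next if i == -1                    # value k never occurs
    list = []
    while i != -1                      # walk the chain for value k
      list << i
      i = nxt[i]
    end
    groups[k] = list.reverse           # restore ascending index order
  end
  groups
end

grid = [[1, 2, 1], [3, 2, 1], [3, 3, 2]]
n = grid.length
groups = group_equal_indices(grid.flatten, 3)
groups.each do |value, idxs|
  # divmod(n) turns a flat index back into [row, column]
  puts "value #{value}: #{idxs.map { |i| i.divmod(n) }.inspect}"
end
```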
class temp {
    int i, j;
    int value;
}
Fill your array with objects of this class (temp array[64][64]), then sort it by value (in Java you can do this by implementing the Comparable interface). Equal elements will then sit next to each other, and you can read off i, j for each of them.
This solution is reasonably efficient: sorting costs O(m log m) on average for m = 64*64 elements, rather than the quadratic cost of comparing every pair.
Use quicksort on the array, then iterate through it, keeping the previous value in a temporary variable (the "cursor"), and check whether the next value is the same as it.
array[64][64];
flat = flatten(array);   // all 4096 values in one list
quicksort(flat);
// note: sorting plain values loses the original x,y positions
temp = flat[0];
for i from 1 to 4095 {
    if(temp == flat[i]) {
        print "duplicate value found: flat[i]";
    }
    temp = flat[i];
}
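To keep the coordinates while using the sort-and-scan idea from the answers above, the value can be sorted together with its position. A rough Ruby sketch, with a made-up method name and a tiny grid for illustration:

```ruby
# Hypothetical helper: sort [value, i, j] triples, then compare neighbours.
def duplicates_by_sorting(grid)
  cells = []
  grid.each_with_index do |row, i|
    row.each_with_index { |value, j| cells << [value, i, j] }
  end
  cells.sort!  # arrays compare element-wise: by value, then i, then j

  pairs = []
  cells.each_cons(2) do |(v1, i1, j1), (v2, i2, j2)|
    pairs << [[i1, j1], [i2, j2]] if v1 == v2
  end
  pairs
end

grid = [[1, 2], [2, 1]]
duplicates_by_sorting(grid).each do |a, b|
  puts "element #{a.inspect} is same as #{b.inspect}"
end
```

This reports each run of equal values as a chain of neighbouring pairs rather than every possible pairing, which is usually enough to list the groups.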

Resources