Packing sets of non power of 2 integers - algorithm

I have a set of integers, each with a specific range:
foo = [1, 5]
bar = [1, 10]
baz = [1, 200]
I can calculate how many bits are required to store each number separately based on the number of different states that they can have:
foo = 5 possible states ~ 3 bits
bar = 10 possible states ~ 4 bits
baz = 200 possible states ~ 8 bits
Which gives me a total of 15 bits. But every number has a range that is unused, resulting in wasted space. I can instead calculate the required bits for the whole set by calculating all the possible states of all the numbers combined:
5 * 10 * 200 = 10000 possible states ~ 14 bits
This could save me a whole bit!
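The arithmetic above can be checked with a short Python sketch. The helper name `bits_needed` and the `ranges` dictionary are illustrative assumptions, not part of the original question.

```python
from math import prod

def bits_needed(states):
    """Smallest number of bits that can distinguish `states` different values."""
    # (states - 1).bit_length() equals ceil(log2(states)) for states >= 1.
    return max(states - 1, 0).bit_length()

# Inclusive ranges from the question (hypothetical container for illustration).
ranges = {"foo": (1, 5), "bar": (1, 10), "baz": (1, 200)}
states = {name: hi - lo + 1 for name, (lo, hi) in ranges.items()}

separate_bits = sum(bits_needed(s) for s in states.values())
combined_bits = bits_needed(prod(states.values()))
print(separate_bits, combined_bits)  # 15 14
```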
And this is where my question comes in: what is the best way to load and store numbers using this type of layout?

A list of variables with different ranges like this:
foo = [1, 5]
bar = [1, 10]
baz = [1, 200]
Can (almost?) be interpreted as a mixed-radix number representation. If they started at zero the correspondence would be immediate, but since these start at one (or in general: if they are any finite set of possibilities) they must be remapped a little first, here just by subtracting one for conversion to the "packed" state and adding one back when decoding it again.
The encoding is nice and easy, involving only cheap operations:
packed = (foo - 1) + 5 * (bar - 1) + (5 * 10) * (baz - 1)
The scale factors come from the number of possible states of course. Every element needs to be remapped into a contiguous range starting at zero, and then scaled by the product of the #states of the preceding elements, with the first being scaled by 1 (the empty product). By the way note that [1 .. 5] has 5 states, not 4.
Decoding involves remainders and divisions, the simplest (but not in general the fastest) way is extracting digit-by-digit:
// extract foo
foo = packed % 5 + 1
// drop foo from packed representation
packed /= 5
// extract bar (which is now the lowest digit in 'packed')
bar = packed % 10 + 1
// drop bar
packed /= 10
// top digit is left over
baz = packed + 1
For larger examples it would be more efficient to first "chop" the packed number into a few separate parts, and then decode those independently. This prevents having a long chain of dependent operations, which the digit-by-digit method naturally results in.
Working directly with the packed representation is generally tricky, except to add and subtract from the elements if you know that would not overflow.
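As a rough generalisation of the encoding and digit-by-digit decoding shown above, here is a minimal Python sketch. The function names `pack` and `unpack` and the list-of-ranges layout are assumptions made for illustration; it handles any inclusive ranges, not only ones starting at one.

```python
def pack(values, ranges):
    """Encode values into one mixed-radix integer; ranges are inclusive (lo, hi)."""
    packed = 0
    scale = 1  # product of the state counts of all preceding elements
    for value, (lo, hi) in zip(values, ranges):
        if not lo <= value <= hi:
            raise ValueError(f"{value} is outside [{lo}, {hi}]")
        packed += (value - lo) * scale
        scale *= hi - lo + 1
    return packed

def unpack(packed, ranges):
    """Decode digit by digit, lowest digit first (the simple, dependent chain)."""
    values = []
    for lo, hi in ranges:
        packed, digit = divmod(packed, hi - lo + 1)
        values.append(digit + lo)
    return values

RANGES = [(1, 5), (1, 10), (1, 200)]  # foo, bar, baz
packed = pack([3, 7, 150], RANGES)
print(packed, unpack(packed, RANGES))  # 7482 [3, 7, 150]
```

The largest packed value here is 9999, which fits in the 14 bits computed in the question.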


Faster way to find the size of the intersection of any two corresponding multisets from two 3D arrays of multisets

I have two uint16 3D (GPU) arrays A and B in MATLAB, which have the same 2nd and 3rd dimension. For instance, size(A,1) = 300 000, size(B,1) = 2000, size(A,2) = size(B,2) = 20, and size(A,3) = size(B,3) = 100, to give an idea about the orders of magnitude. Actually, size(A,3) = size(B,3) is very big, say ~ 1 000 000, but the arrays are stored externally in small pieces cut along the 3rd dimension. The point is that there is a very long loop along the 3rd dimension (cf. the MWE below), so the code inside of it needs to be optimized further (if possible). Furthermore, the values of A and B can be assumed to be bounded way below 65535, but there are still hundreds of different values.
For each i,j, and d, the rows A(i,:,d) and B(j,:,d) represent multisets of the same size, and I need to find the size of the largest common submultiset (multisubset?) of the two, i.e. the size of their intersection as multisets. Moreover, the rows of B can be assumed sorted.
For example, if [2 3 2 1 4 5 5 5 6 7] and [1 2 2 3 5 5 7 8 9 11] are two such multisets, respectively, then their multiset intersection is [1 2 2 3 5 5 7], which has the size 7 (7 elements as a multiset).
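For reference, the definition can be checked on this example with a small Python sketch using `collections.Counter`. This is only meant to pin down what "size of the multiset intersection" means; it is not a GPU or MATLAB solution, and the function name is an assumption.

```python
from collections import Counter

def multiset_intersection_size(a, b):
    """Size of the largest common submultiset of a and b."""
    # Counter & Counter keeps, for every value, the minimum of the two counts.
    common = Counter(a) & Counter(b)
    return sum(common.values())

a = [2, 3, 2, 1, 4, 5, 5, 5, 6, 7]
b = [1, 2, 2, 3, 5, 5, 7, 8, 9, 11]
print(multiset_intersection_size(a, b))  # 7
```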
I am currently using the following routine to do this:
s = 300000; % 1st dim. of A
n = 2000; % 1st dim. of B
c = 10; % 2nd dim. of A and B
depth = 10; % 3rd dim. of A and B (corresponds to a batch of size 10 of A and B along the 3rd dim.)
N = 100; % upper bound on the possible values of A and B
A = randi(N,s,c,depth,'uint16','gpuArray');
B = randi(N,n,c,depth,'uint16','gpuArray');
Sizes_of_multiset_intersections = zeros(s,n,depth,'uint8'); % too big to fit in GPU memory together with A and B
for d=1:depth
    A_slice = A(:,:,d);
    B_slice = B(:,:,d);
    unique_B_values = permute(unique(B_slice),[3 2 1]); % B is smaller than A
    % compute counts of the unique B-values for each multiset:
    A_values_counts = permute(sum(uint8(A_slice==unique_B_values),2,'native'),[1 3 2]);
    B_values_counts = permute(sum(uint8(B_slice==unique_B_values),2,'native'),[1 3 2]);
    % compute the count of each unique B-value in the intersection:
    Sizes_of_multiset_intersections_tmp = gpuArray.zeros(s,n,'uint8');
    for i=1:n
        Sizes_of_multiset_intersections_tmp(:,i) = sum(min(A_values_counts,B_values_counts(i,:)),2,'native');
    end
    Sizes_of_multiset_intersections(:,:,d) = gather(Sizes_of_multiset_intersections_tmp);
end
One can also easily adapt the above code to compute the result in batches along dimension 3 rather than d=1:depth (i.e. batches of size 1), though at the expense of an even bigger unique_B_values vector.
Since the depth dimension is large (even when working in batches along it), I am interested in faster alternatives to the code inside the outer loop. So my question is this: is there a faster (e.g. better vectorized) way to compute sizes of intersections of multisets of equal size?
Disclaimer : This is not a GPU based solution (Don't have a good GPU). I find the results interesting and want to share, but I can delete this answer if you think it should be.
Below is a vectorized version of your code, that makes it possible to get rid of the inner loop, at the cost of having to deal with a bigger array, that might be too big to fit in the memory.
The idea is to have the matrices A_values_counts and B_values_counts be 3D matrices shaped in such a way that calling min(A_values_counts,B_values_counts) will calculate everything in one go due to implicit expansion. In the background it will create a big array of size s x n x length(unique_B_values) (Probably most of the time too big)
In order to go around the constraint on the size, the results are calculated in batches along the n dimension, i.e. the first dimension of B:
nBatches_B = 2000;
sBatches_B = n/nBatches_B;
Sizes_of_multiset_intersections_new = zeros(s,n,depth,'uint8');
for d=1:depth
    A_slice = A(:,:,d);
    B_slice = B(:,:,d);
    % compute counts of the unique B-values for each multiset:
    unique_B_values = reshape(unique(B_slice),1,1,[]);
    A_values_counts = sum(uint8(A_slice==unique_B_values),2,'native'); % s x 1 x length(uniqueB) array
    B_values_counts = reshape(sum(uint8(B_slice==unique_B_values),2,'native'),1,n,[]); % 1 x n x length(uniqueB) array
    % Not possible to do it all in one go, must split in batches along B
    for ii = 1:nBatches_B
        Sizes_of_multiset_intersections_new(:,((ii-1)*sBatches_B+1):ii*sBatches_B,d) = sum(min(A_values_counts,B_values_counts(:,((ii-1)*sBatches_B+1):ii*sBatches_B,:)),3,'native'); % Vectorized
    end
end
Here is a little benchmark with different values of the number of batches. You can see that a minimum is found around a number of 400 (batch size 50), with a decrease of around 10% in processing time (each point is an average over 3 runs). (EDIT : x axis is amount of batches, not batches size)
I'd be interested in knowing how it behaves for GPU arrays as well!
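To make the counting idea behind both MATLAB versions explicit, here is a plain-Python sketch of the same technique: build per-row counts of every unique B value, then sum the elementwise minimum for each (row of A, row of B) pair. It uses nested lists instead of GPU arrays and is only meant to illustrate the logic, not to be fast; the function names are assumptions.

```python
def count_matrix(rows, values):
    """counts[r][k] = how often values[k] occurs in rows[r]."""
    return [[row.count(v) for v in values] for row in rows]

def intersection_sizes(A, B):
    """result[i][j] = size of the multiset intersection of A[i] and B[j]."""
    # Values absent from B cannot contribute, so only B's unique values are counted.
    values = sorted({v for row in B for v in row})
    a_counts = count_matrix(A, values)
    b_counts = count_matrix(B, values)
    return [[sum(min(x, y) for x, y in zip(ca, cb)) for cb in b_counts]
            for ca in a_counts]

A = [[2, 3, 2, 1, 4, 5, 5, 5, 6, 7]]
B = [[1, 2, 2, 3, 5, 5, 7, 8, 9, 11], [5] * 10]
print(intersection_sizes(A, B))  # [[7, 3]]
```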

Create a 'perfect hash function' for contiguous ranges

I'm looking for a way to 'project' a series of 'ranges' into a series of values. Applications would be histograms with uneven bins or creation of lookup tables.
An example:
0 to 14 => 0
15 to 234 => 1
235 => 2
236 to 255 => 3
The actual order of the 'result' values (0, 1, 2, 3) on the right doesn't really matter, as long as they are between 0 and 3, so that I can use a small lookup table afterwards. It would be even better if I could make this work on floating point values (on the left).
I know I could use an 8-bit lookup table here and repeat values but I'd like to find a way to 'perfect hash' this: through a series of systematic operations (as small as possible, no branches) compute the right from the left values, to have the smallest possible result space.
I can't seem to find the correct series of magic Google incantations for this kind of algorithm.
The computing duration of this 'hash' or 'projection' function can be in the days if necessary.
If you have n ranges without overlap or gaps you can generate some simple code using O(log2(n)) instructions to do this lookup. I will demonstrate with some Python code.
# Length must be a power of two; pad on the right with a value larger
# than any possible input (e.g. float('inf')) if needed.
lookup = [0, 15, 235, 236]

def index(x):
    i = 0
    # ceil(log2(len(lookup))) iterations in the following pattern.
    # We only need 2 iterations here.
    # ...
    # i += (x >= lookup[i+8]) << 3
    # i += (x >= lookup[i+4]) << 2
    i += (x >= lookup[i+2]) << 1
    i += (x >= lookup[i+1]) << 0
    return i
This is known as a branchless binary search. E.g. even if you have 2^16 ranges, this would only require 32 additions and 16 table lookups, comparisons, and bitshifts to compute the exact index.
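For an arbitrary number of ranges, the same pattern can be written as a loop with a fixed trip count, as in the sketch below. The names `build_lookup` and `range_index` are assumptions for illustration, and the padding value (infinity) is chosen so that the comparison is always false for padded slots. A real implementation would unroll the loop, as in the hand-written version, to stay branch-free.

```python
import math

def build_lookup(starts):
    """Pad the sorted range starts to a power-of-two length with +infinity."""
    size = 1
    while size < len(starts):
        size *= 2
    return list(starts) + [math.inf] * (size - len(starts))

def range_index(lookup, x):
    """Index of the last range whose start is <= x (x must be >= lookup[0])."""
    i = 0
    step = len(lookup) // 2
    while step:
        # The comparison yields 0 or 1, so this is an add, not a branch on x.
        i += (x >= lookup[i + step]) * step
        step //= 2
    return i

lookup = build_lookup([0, 15, 235, 236])
print([range_index(lookup, x) for x in (0, 14, 15, 234, 235, 236, 255)])
# [0, 0, 1, 1, 2, 3, 3]
```

Because only `>=` comparisons are used, the same sketch also works for floating point inputs.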

Split array into four boxes such that sum of XOR's of the boxes is maximum

Given an array of integers that needs to be split into four boxes such that the sum of the XORs of the boxes is maximum.
I/P -- [1,2,1,2,1,2]
O/P -- 9
Explanation: Box1--[1,2]
I've tried using recursion, but it fails for larger test cases because the time complexity is exponential. I'm expecting a solution using dynamic programming.
def max_Xor(b1,b2,b3,b4,A,index,size):
if index == size:
return b1+b2+b3+b4
return m
def main():
Thanks in Advance!!
There are several things to speed up your algorithm:
Build in some start-up logic: it doesn't make sense to put anything into box 3 until boxes 1 & 2 are differentiated. In fact, you should generally have an order of precedence to keep you from repeating configurations in a different order.
Memoize your logic; this avoids repeating computations.
For large cases, take advantage of what value algebra exists.
This last item may turn out to be the biggest saving. For instance, if your longest numbers include several 5-bit and 4-bit numbers, it makes no sense to consider shorter numbers until you've placed those decently in the boxes, gaining maximum advantage for the leading bits. With only four boxes, you cannot have a sum from 3-bit numbers that dominates a single misplaced 5-bit number.
Your goal is to place an odd number of 5-bit numbers into 3 or all 4 boxes; against this, check only whether this "pessimizes" bit 4 of the remaining numbers. For instance, given six 5-bit numbers (range 16-31) and a handful of small ones (0-7), your first consideration is to handle only combinations that partition the 5-bit numbers by (3, 1, 1, 1), as this leaves that valuable fifth bit turned on in each box.
With a more even mixture of values in your input, you'll also need to consider how to distribute the 4-bits for a similar "keep it odd" heuristic. Note that, as you work from largest to smallest, you need worry only about keeping it odd, and watching the following bit.
These techniques should let you prune your recursion enough to finish in time.
We can use Dynamic programming here to break the problem into smaller sets then store their result in a table. Then use already stored result to calculate answer for bigger set.
For example:
Input -- [1,2,1,2,1,2]
We need to divide the array consecutively into 4 boxes such that the sum of the XORs of all boxes is maximised.
Let's take your test case, break the problem into smaller sets and start solving for the smaller sets.
box = 1, num = [1,2,1,2,1,2]
ans = 1 3 2 0 1 3
Since we only have one box, all numbers go into it. We store these answers in a table; let's call the matrix DP.
DP[0] = [1 3 2 0 1 3]
DP[i][j] stores the answer for distributing the numbers at indices 0 to j into i+1 boxes (rows are 0-indexed, so DP[0] is the one-box row).
Now let's take the case where we have two boxes, taking the numbers one by one.
num = [1] since we only have one number it will go into the first box.
DP[1][0] = 1
Lets add another number.
num = [1 2]
now there can be two ways to put this new number into the box.
case 1: 2 will go to the First box. Since we already have answer
for both numbers in one box. we will just use that.
answer = DP[0][1] + 0 (Second box is empty)
case 2: 2 will go to second box.
answer = DP[0][0] + 2 (only 2 is present in the second box)
Maximum of the two cases will be stored in DP[1][1].
DP[1][1] = max(3+0, 1+2) = 3.
Now for num = [1 2 1].
Again for new number we have three cases.
box1 = [1 2 1], box2 = [], DP[0][2] + 0
box1 = [1 2], box2 = [1], DP[0][1] + 1
box1 = [1 ], box2 = [2 1], DP[0][0] + (2 XOR 1)
Maximum of these three will be answer for DP[1][2].
Similarly we can find answer of num = [1 2 1 2 1 2] box = 4
1 3 2 0 1 3
1 3 4 6 5 3
1 3 4 6 7 9
1 3 4 6 7 9
Also note that a xor b xor a = b. You can use this property to get the XOR of a segment of an array in constant time, as suggested in the comments.
This way you can break the problem into smaller subsets and use the answers for the smaller sets to compute the answers for the bigger ones. Hope this helps. After understanding the concept, you can go ahead and implement it with better than exponential time.
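A minimal Python sketch of the table described above, under the same assumptions this answer makes: boxes are consecutive segments of the array and trailing boxes may stay empty. The function name `xor_dp_table` is hypothetical. Prefix XORs give the XOR of any segment in constant time, so the whole table costs O(boxes * n^2).

```python
def xor_dp_table(nums, boxes=4):
    """rows[b][j] = best XOR sum for nums[0..j] split into at most b+1 consecutive boxes."""
    n = len(nums)
    prefix = [0] * (n + 1)  # prefix[k] = XOR of nums[0..k-1]
    for k, value in enumerate(nums):
        prefix[k + 1] = prefix[k] ^ value

    rows = [[prefix[j + 1] for j in range(n)]]  # one box: everything goes in it
    for _ in range(boxes - 1):
        prev = rows[-1]
        cur = prev[:]  # case where the newest box stays empty
        for j in range(n):
            for k in range(j):
                # nums[0..k] fill the earlier boxes, nums[k+1..j] the newest box
                candidate = prev[k] + (prefix[j + 1] ^ prefix[k + 1])
                if candidate > cur[j]:
                    cur[j] = candidate
        rows.append(cur)
    return rows

table = xor_dp_table([1, 2, 1, 2, 1, 2], boxes=4)
for row in table:
    print(row)
print(table[-1][-1])  # 9
```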
I would go bit by bit from the highest bit to the lowest bit. For every bit, try all combinations that distribute the still unused numbers that have that bit set so that an odd number of them is in each box, nothing else matters. Pick the best path overall. One issue that complicates this greedy method is that two boxes with a lower bit set can equal one box with the next higher bit set.
Alternatively, memoize the boxes state in your recursion as an ordered tuple.

Ugly Number - Mathematical intuition for dp

I am trying to find the "ugly" numbers, which is a series of numbers whose only prime factors are [2,3,5].
I found dynamic programming solution and wanted to understand how it works and what is the mathematical intuition behind the logic.
The algorithm is to keep three different counter variable for a multiple of 2, 3 and 5. Let's assume i2,i3, and i5.
Declare ugly array and initialize 0 index to 1 as the first ugly number is 1.
Initialize i2 = i3 = i5 = 0.
ugly[i] = min(ugly[i2]*2, ugly[i3]*3, ugly[i5]*5), then increment whichever of i2, i3 or i5 was chosen.
Dry run:
ugly = |1|
ugly[1] = min(ugly[0]*2, ugly[0]*3, ugly[0]*5) = 2
ugly = |1|2|
ugly[2] = min(ugly[1]*2, ugly[0]*3, ugly[0]*5) = 3
ugly = |1|2|3|
ugly[3] = min(ugly[1]*2, ugly[1]*3, ugly[0]*5) = 4
ugly = |1|2|3|4|
ugly[4] = min(ugly[2]*2, ugly[1]*3, ugly[0]*5) = 5
ugly = |1|2|3|4|5|
ugly[5] = min(ugly[2]*2, ugly[1]*3, ugly[1]*5) = 6
ugly = |1|2|3|4|5|6|
I am getting lost how six is getting formed from 2's index. Can someone explain in an easy way?
Every "ugly" number (except 1) can be formed by multiplying a smaller ugly number by 2, 3, or 5.
So let's say that the ugly numbers found so far are [1,2,3,4,5]. Based on that list we can generate three sequences of ugly numbers:
Multiplying by 2, the possible ugly numbers are [2,4,6,8,10]
Multiplying by 3, the possible ugly numbers are [3,6,9,12,15]
Multiplying by 5, the possible ugly numbers are [5,10,15,20,25]
But we already have 2,3,4, and 5 in the list, so we don't care about values less than or equal to 5. Let's mark those entries with a - to indicate that we don't care about them
Multiplying by 2, the possible ugly numbers are [-,-,6,8,10]
Multiplying by 3, the possible ugly numbers are [-,6,9,12,15]
Multiplying by 5, the possible ugly numbers are [-,10,15,20,25]
And in fact, all we really care about is the smallest number in each sequence
Multiplying by 2, the smallest number greater than 5 is 6
Multiplying by 3, the smallest number greater than 5 is 6
Multiplying by 5, the smallest number greater than 5 is 10
After adding 6 to the list of ugly numbers, each sequence has one additional element:
Multiplying by 2, the possible ugly numbers are [-,-,-,8,10,12]
Multiplying by 3, the possible ugly numbers are [-,-,9,12,15,18]
Multiplying by 5, the possible ugly numbers are [-,10,15,20,25,30]
But the elements from each sequence that are useful are:
Multiplying by 2, the smallest number greater than 6 is 8
Multiplying by 3, the smallest number greater than 6 is 9
Multiplying by 5, the smallest number greater than 6 is 10
So you can see that what the algorithm is doing is creating three sequences of ugly numbers. Each sequence is formed by multiplying all of the existing ugly numbers by one of the three factors.
But all we care about is the smallest number in each sequence (larger than the largest ugly number found so far).
So the indexes i2, i3, and i5 are the indexes into the corresponding sequences. When you use a number from a sequence, you update the index to point to the next number in that sequence.
The intuition is the following:
any ugly number (other than 1) can be written as the product of 2, 3 or 5 and another (smaller) ugly number.
With that in mind, the solution that is mentioned in the question keeps track of i2, i3 and i5, the indices of the smallest ugly numbers generated so far, which multiplied by 2, 3, respectively 5 lead to a number that was not already generated. The smallest of these products is the smallest ugly number that was not already generated.
To state this differently, I believe that the following statement from the question might be the source of some confusion:
The algorithm is to keep three different counter variable for a
multiple of 2, 3 and 5. Let's assume i2,i3, and i5.
Note, for example, that ugly[i2] is not necessarily a multiple of 2. It is simply the smallest ugly number for which 2 * ugly[i2] is greater than ugly[i] (the largest ugly number known so far).
Regarding how the number 6 is generated in the next step, the procedure is shown below:
ugly = |1|2|3|4|5
i2 = 2;
i3 = 1;
i5 = 1;
ugly[5] = min(ugly[2]*2, ugly[1]*3, ugly[1]*5) = min(3*2, 2*3, 2*5) = 6
ugly = |1|2|3|4|5|6
i2 = 3
i3 = 2
i5 = 1
Note that here both i2 and i3 need to be incremented after generating the number 6, because both ugly[i2]*2 and ugly[i3]*3 produced the same next smallest ugly number.
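Putting the pieces together, the three-index procedure could look like the following Python sketch. The function name `ugly_numbers` is an assumption, and `count` is assumed to be at least 1. Note that all three `if` checks are independent, so ties such as 6 = 3*2 = 2*3 advance both indices and no duplicate is appended.

```python
def ugly_numbers(count):
    """Return the first `count` ugly numbers (count >= 1 assumed)."""
    ugly = [1]
    i2 = i3 = i5 = 0  # positions of the next ugly number to multiply by 2, 3, 5
    while len(ugly) < count:
        c2, c3, c5 = ugly[i2] * 2, ugly[i3] * 3, ugly[i5] * 5
        nxt = min(c2, c3, c5)
        ugly.append(nxt)
        # Advance every index that produced nxt, so ties do not create duplicates.
        if nxt == c2:
            i2 += 1
        if nxt == c3:
            i3 += 1
        if nxt == c5:
            i5 += 1
    return ugly

print(ugly_numbers(10))  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]
```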

Creating hash function to map 6 numbers to a short string

I have 6 variables 0 ≤ n₁,...,n₆ ≤ 12 and I'd like to build a hash function to do the direct mapping D(n₁,n₂,n₃,n₄,n₅,n₆) = S and another function to do the inverse mapping I(S) = (n₁,n₂,n₃,n₄,n₅,n₆), where S is a string (a-z, A-Z, 0-9).
My goal is to reduce the length of S to 3 or less.
I thought as the variables have 13 possible values, a single letter (a-z) should be able to represent 2 of them, but I realized that 1 + 12 = m and 2 + 11 = m, so I still don't know how to write a function.
Is there any approach to build a function that does this mapping and returns a small string?
Using the whole ASCII to represent S is an option if it's necessary.
You can convert a set of numbers in any given range to numbers in any other range using base conversion.
Binary is base 2 (0-1), decimal is base 10 (0-9). Your 6 numbers are base 13 (0-12).
Checking whether a conversion would be possible involves counting the number of possible combinations of values for each set. With each number in the range [0,n] (thus base n+1), we can go from all 0's to all n's, thus each number can take on n+1 values and the total number of possibilities is (n+1)^numberCount. For 6 decimal digits, for example, it would be 10^6 = 1000000, which checks out, since there are 1000000 possible numbers with (at most) 6 digits, i.e. numbers < 1000000.
Lower- and uppercase letters and numbers (26+26+10) would be base 62 (0-61), but, following from the above, 3 such values would be insufficient to represent your 6 numbers (13^6 > 62^3). To do conversion from/to these, you can do the conversion to a set of base 62 numbers, then have appropriate if-statements to convert 0-9 <=> 0-9, a-z <=> 10-35, A-Z <=> 36-61.
You can represent your data in 3 bytes (since 256^3 >= 13^6), although these wouldn't necessarily be printable characters: 32-126 is considered the standard printable range (which is still too small a range), and 128-255 is the extended range, which may not be displayed properly in any given environment (to give the best chance of displaying it properly, you should at least avoid 0-31 and 127, which are control characters; you can convert 0-... to the above ranges by adding 32 and then adding another 1 if the value is >= 127).
Many / most languages should allow you to give a numeric value to represent a character, so it should be fairly simple to output it once you do the base conversion. Although some may use Unicode to represent characters, which could make it a bit less trivial to work with ASCII.
If the numbers had specific constraints, that would reduce the number of possible combinations, thus possibly making it fit into a smaller set or range of numbers.
To do the actual base conversion:
It might be simplest to first convert it to a regular integral type (typically binary or decimal), where we don't have to worry about the base, and then convert it to the target base (although first make sure your value will fit in whichever data type you're using).
Consider how binary works:
1101 is 13 = 2^3 + 2^2 + 2^0
13 % 2 = 1 13 / 2 = 6
6 % 2 = 0 6 / 2 = 3
3 % 2 = 1 3 / 2 = 1
1 % 2 = 1
The remainders above, read from bottom to top: 1101 = our number
Using the same idea, we can convert to/from any base as follows: (pseudo-code)
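// array holds the digits, most significant first
int convertFromBase(array, base):
    output = 0
    for each i in array
        output = base*output + i
    return output

// returns the digits, most significant first
int[] convertToBase(num, base):
    output = []
    while num > 0
        output.prepend(num % base)
        num /= base   // integer division
    return output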
You can also extend this logic to situations where each number is in a different range by changing what you divide or multiply by at each step (a detailed explanation of that is perhaps a bit beyond the scope of the question).
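As a concrete illustration of the base conversion described above, here is a Python sketch that maps six base-13 numbers to a base-62 string and back. The names `ALPHABET`, `encode` and `decode` are assumptions, and the digit order (0-9, a-z, A-Z) follows the mapping suggested earlier. Since 62^3 < 13^6 <= 62^4, the sketch uses 4 characters rather than the 3 the question hoped for.

```python
import string

# 0-9 <=> 0-9, a-z <=> 10-35, A-Z <=> 36-61
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(values, base=13, length=4):
    """Pack the values (each in 0..base-1) into a fixed-length base-62 string."""
    num = 0
    for v in values:
        if not 0 <= v < base:
            raise ValueError(f"{v} is outside 0..{base - 1}")
        num = num * base + v
    chars = []
    for _ in range(length):
        num, digit = divmod(num, len(ALPHABET))
        chars.append(ALPHABET[digit])
    if num:
        raise ValueError("values do not fit in the requested length")
    return "".join(reversed(chars))

def decode(text, base=13, count=6):
    """Inverse of encode: recover `count` values from the string."""
    num = 0
    for ch in text:
        num = num * len(ALPHABET) + ALPHABET.index(ch)
    values = []
    for _ in range(count):
        num, digit = divmod(num, base)
        values.append(digit)
    return values[::-1]

s = encode([3, 0, 12, 7, 1, 9])
print(s, decode(s))
```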
I thought as the variables have 13 possible values, a single letter
(a-z) should be able to represent 2 of them
This reasoning is wrong. In fact, to represent two variables (i.e. any combination of values they might take) you will need 13x13 = 169 symbols.
For your example the 6 variables can take 13^6 (=4826809) different combinations. In order to represent all possible combinations you will need 5 letters (a-z), since 26^5 (=11881376) is the smallest power of 26 that exceeds 13^6.
For ASCII characters 3 symbols should suffice since 256^3 > 13^6.
If you are still interested in code that does the conversion, I will be happy to help.
