How to balance the number of items across multiple columns - algorithm

I need a method to determine how many items should appear per column in a multi-column list to achieve the best visual balance. Here are my criteria:
The list should only be split into multiple columns if the item count is greater than 10.
If multiple columns are required, they should contain no less than 5 (except for the last column in case of a remainder) and no more than 10 items.
If all columns cannot contain an equal number of items, all but the last column should be equal in number.
The number of items in each column should be optimized to achieve the smallest difference between the last column and the other column(s).

Well, your requirements and your examples appear a bit contradictory. For instance, your second example could be divided into two columns with 11 items in each, and satisfy your criteria. Let's assume that for rule #2 you meant that there should be <= 10 items / column.
In addition, I think you need to add another rule to make the requirements sensible:
The number of columns must not be greater than what is required to accommodate the overflow.
Otherwise, you will often end up with degenerate solutions where you have far more columns than you need. For example, in the case of 26 items you probably don't want 13 columns of 2 items each.
If that's the case, here's a simple calculation that should work well and is easy to understand:
int numberOfColumns = CEILING(numberOfItems / 10.0);
int numberOfItemsPerColumn = CEILING(numberOfItems / (double) numberOfColumns);
Now you'll create numberOfColumns - 1 full columns (with `numberOfItemsPerColumn` items each) and the overflow will go in the last column. By this construction, the shortfall of the last column relative to the others is kept small.
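For concreteness, here is a short Python sketch of that calculation (math.ceil stands in for CEILING; the function name layout and the tuple it returns are my own, not from the answer):

import math

MAX_PER_COLUMN = 10

def layout(number_of_items):
    """Return (columns, items_per_full_column, items_in_last_column)."""
    if number_of_items <= MAX_PER_COLUMN:
        return 1, number_of_items, number_of_items
    columns = math.ceil(number_of_items / MAX_PER_COLUMN)
    per_column = math.ceil(number_of_items / columns)
    last = number_of_items - per_column * (columns - 1)
    return columns, per_column, last

print(layout(26))  # (3, 9, 8): two full columns of 9 and a last column of 8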

If you want to automatically determine the appropriate number of columns, and have no restrictions on its limits, I would suggest the following:
Calculate the square root of the total number of items. That would give a square layout.
Divide that number by 1.618, and assign that to the total number of rows.
Multiply that same number by 1.618, and assign that to the total number of columns.
All columns but the right most one will have the same number of items.
By the way, the constant 1.618 is the Golden Ratio. That will achieve a more pleasant layout than a square one.
Divide and multiply the other way round for vertical displays.
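Roughly, in Python (golden_layout is a made-up name for this sketch; rounding is done upwards so everything fits):

import math

GOLDEN_RATIO = 1.618

def golden_layout(number_of_items, vertical=False):
    side = math.sqrt(number_of_items)        # a square layout would be side x side
    rows = math.ceil(side / GOLDEN_RATIO)
    columns = math.ceil(side * GOLDEN_RATIO)
    if vertical:
        rows, columns = columns, rows        # divide/multiply the other way round
    return rows, columns

print(golden_layout(40))  # (4, 11): 4 rows by 11 columns, roughly golden-ratio shaped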
Hope this algorithm helps anyone with a similar problem.

Here's what you're trying to solve:
minimize y - z where n = xy + z and 5 <= y <= 10 and 0 <= z <= y
where you have n items split into x full columns of y items and one remainder column of z items.
There is almost certainly a smart way of doing this, but given these constraints a brute force implementation exploring all 6 + 7 + 8 + 9 + 10 + 11 = 51 possible combinations for y and z would take no time at all (only assignments where (n - z) mod y = 0 are solutions).
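For illustration, a direct brute force over that formulation could look like the following Python sketch (best_split is a made-up name; ties between equally good values of y are broken by whichever is tried first, whereas the answer below prefers fewer columns):

def best_split(n):
    best = None
    for y in range(5, 11):          # items per full column
        for z in range(0, y + 1):   # items in the remainder column (z == y means it is full too)
            if (n - z) % y == 0:    # n = x*y + z for a whole number of full columns x
                diff = y - z
                if best is None or diff < best[0]:
                    best = (diff, y, z)
    return best

print(best_split(77))  # (0, 7, 7): columns of 7, with the last column full as well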

I think a brute force solution is easy, given the constraint on the number of items per column: let v be the number of items per column (except the last one), then v belongs to [5,10] and can thus take a whopping 6 different values.
Evaluating 6 values is easy enough. A Python one-liner (or nearly so) to prove it:
# compute the difference between the number of items for the normal columns
# and for the last column; smaller is better
def helper(n, v):
    modulo = n % v
    if modulo == 0:
        return 0
    else:
        return v - modulo

# values can only be in [5,10]
# we compute the difference with the last column for each
# build a list of tuples (difference, -number of items)
# (because the greater the value the better: it means fewer columns)
# extract the min automatically (in case of a tie, the larger v, i.e. fewer columns, wins)
# and then pick the number of items from the tuple and negate it back
def compute(n): return -min([(helper(n, v), -v) for v in [5, 6, 7, 8, 9, 10]])[1]
For 77 this yields 7, meaning 7 items per column.
For 22 this yields 8, meaning 8 items per column.


Replace two elements with their absolute difference and generate the minimum possible element in array

I have an array of size n and I can apply any number of operations (zero included) to it. In an operation, I can take any two elements and replace them with the absolute difference of the two elements. We have to find the minimum possible element that can be generated using the operation. (n < 1000)
Here's an example of how operation works. Let the array be [1,3,4]. Applying operation on 1,3 gives [2,4] as the new array.
Ex: 2 6 11 3 => ans = 0
This is because 11-6 = 5 and 5-3 = 2 and 2-2 = 0
Ex: 20 6 4 => ans = 2
Ex: 2 6 10 14 => ans = 0
Ex: 2 6 10 => ans = 2
Can anyone tell me how can I approach this problem?
Edit:
We can use recursion to generate all possible cases and pick the minimum element from them. This would have complexity of O(n^2 !).
Another approach I tried is sorting the array and then making a recursive call where, starting from either index 0 or 1, I apply the operation to consecutive pairs of elements. This continues till there is only one element left in the array, and we can return the minimum found at any point in the recursion. This has a complexity of O(n^2) but doesn't necessarily give the right answer.
Ex: 2 6 10 15 => (4 5) & (2 4 15) => (1) & (2 15) & (2 11) => (13) & (9). The minimum of this will be 1 which is the answer.
When you choose two elements for the operation, you subtract the smaller one from the bigger one. So if you choose 1 and 7, the result is 7 - 1 = 6.
Now having 2 6 and 8 you can do:
8 - 2 -> 6 and then 6 - 6 = 0
You may also write it like this: 8 - 2 - 6 = 0
Let's consider a different operation: you can take two elements and replace them by their sum or their difference.
Even though you can obtain completely different values using the new operation, the absolute value of the element closest to 0 will be exactly the same as using the old one.
First, let's try to solve this problem using the new operations, then we'll make sure that the answer is indeed the same as using the old ones.
What you are trying to do is to choose two non-intersecting subsets of the initial array, then subtract the sum of all the elements of the second set from the sum of all the elements of the first set. You want to find two such subsets for which the result is as close as possible to 0. That is an NP-hard problem in general, but one can solve it with a pseudo-polynomial algorithm, similar to the knapsack problem, in O(n * sum of all elements).
Each element of the initial array can either belong to the positive set (the set whose sum you subtract from), the negative set (the set whose sum you subtract), or neither of them. In other words: each element you can either add to the result, subtract from the result, or leave untouched. Let's say we have already calculated all obtainable values using the elements from the first one to the i-th one. Now we consider the (i+1)-th element: we can take any of the obtainable values and increase it by the value of the (i+1)-th element, decrease it by that value, or leave it unchanged. After doing that with all the elements we get all possible values obtainable from that array. Then we choose the one which is closest to 0.
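A compact Python sketch of that pass (a set of reachable signed sums stands in for the DP table; the bound is the same pseudo-polynomial one, since every sum is bounded by the total; the function name is mine):

def min_obtainable(arr):
    # signed sums of all non-empty selections seen so far
    reachable = set()
    for a in arr:
        reachable = {a, -a} | {r + d for r in reachable for d in (a, -a, 0)}
    return min(abs(r) for r in reachable)

print(min_obtainable([2, 6, 11, 3]))   # 0
print(min_obtainable([20, 6, 4]))      # 2
print(min_obtainable([2, 6, 10, 14]))  # 0
print(min_obtainable([2, 6, 10]))      # 2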
Now the harder part, why is it always a correct answer?
Let's consider the positive and negative sets from which we obtain the minimal result. We want to achieve it using the initial operations. Let's say that there are more elements in the negative set than in the positive set (otherwise swap them).
What if we have only one element in the positive set and only one element in the negative set? Then absolute value of their difference is equal to the value obtained by using our operation on it.
What if we have one element in the positive set and two in the negative one?
1) One of the negative elements is smaller than the positive element - then we just take them and use the operation on them. The result of it is a new element in the positive set. Then we have the previous case.
2) Both negative elements are bigger than the positive one. Then if we remove the bigger element from the negative set we get a result closer to 0, so this case cannot occur in an optimal solution.
Let's say we have n elements in the positive set and m elements in the negative set (n <= m) and we are able to obtain the absolute value of the difference of their sums (let's call it x) by using some operations. Now let's add an element to the negative set. If the difference before adding the new element was negative, decreasing it by any other number only moves it farther from 0, so that is impossible. So the difference must have been positive. Then we can use our operation on x and the new element to get the result.
Now the second case: let's say we have n elements in the positive set and m elements in the negative set (n < m) and we are able to obtain the absolute value of the difference of their sums (again, let's call it x) by using some operations. Now we add a new element to the positive set. Similarly, the difference must have been negative, so x is in the negative set. Then we obtain the result by doing the operation on x and the new element.
Using induction we can prove that the answer is always correct.

What is the fast way to calculate this summation in MATLAB?

So I have the following constraints:
How to write this in MATLAB in an efficient way? The inputs are x_mn, M, and N. The set B={1,...,N} and the set U={1,...,M}
I did it like this (because I write x as the following vector):
x = [x_11, x_12, ..., x_1N, x_21, x_22, ..., x_M1, x_M2, ..., x_MN]
%# first constraint
function R1 = constraint_1(M, N)
    ee = eye(N);
    R1 = zeros(N, N*M);
    for m = 1:M
        R1(:, (m-1)*N+1:m*N) = ee;
    end
end
%# second constraint
function R2 = constraint_2(M, N)
    ee = ones(1, N);
    R2 = zeros(M, N*M);
    for m = 1:M
        R2(m, (m-1)*N+1:m*N) = ee;
    end
end
With the above code I get a 0-1 matrix A = [R1; R2], and the constraints become A*x <= 1.
For example, for M = N = 2, I will have something like this:
And I will create a function test(x) which returns true or false according to whether x satisfies the constraints.
I would like to get some help from you and optimize my code.
You should place your x_mn values in a matrix. After that, you can sum along each dimension to get what you want. Looking at your constraints, you will place these values in an M x N matrix, where M is the number of rows and N is the number of columns.
You can certainly place your values in a vector and construct your summations the way you intended earlier, but you would have to write for loops to pick out the proper elements in each iteration, which is very inefficient. Instead, use a matrix, and use sum to sum over the dimensions you want.
For example, let's say your values of x_mn range from 1 to 20, with B being the set from 1 to 5 and U the set from 1 to 4. As such:
X = vec2mat(1:20, 5)
X =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
vec2mat takes a vector and reshapes it into a matrix. You specify the number of columns you want as the second argument, and it will create the right number of rows to ensure that a proper matrix is built. In this case, I want 5 columns, so this creates a 4 x 5 matrix.
The first constraint can be achieved by doing:
first = sum(X,1)
first =
34 38 42 46 50
sum works for vectors as well as matrices. If you have a matrix supplied to sum, you can specify a second parameter that tells you in what direction you wish to sum. In this case, specifying 1 will sum over all of the rows for each column. It works in the first dimension, which is the rows.
What this does is sum over all values of U (the rows) for each value in B (the columns), which is what the first constraint requires: you are simply summing every single column individually.
The second constraint can be achieved by doing:
second = sum(X,2)
second =
15
40
65
90
Here we specify 2 as the second parameter so that we sum over all of the columns for each row. The second dimension goes over the columns. What this does is sum over all values of B (the columns) for each value in U (the rows), which is what the second constraint requires: you are simply summing every single row individually.
BTW, your code is not achieving what you think it's achieving. All you're doing is simply replicating the identity matrix a set number of times over groups of columns in your matrix. You are actually not performing any summations as per the constraint. What you are doing is you are simply ensuring that this matrix will have the conditions you specified at the beginning of your post to be enforced. These are the ideal matrices that are required to satisfy the constraints.
Now, if you want to check to see if the first condition or second condition is satisfied, you can do:
%// First condition satisfied?
firstSatisfied = all(first <= 1);
%// Second condition satisfied
secondSatisfied = all(second <= 1);
This will check every element of first or second and see if the resulting sums after you do the above code that I just showed are all <= 1. If they all satisfy this constraint, we will have true. Else, we have false.
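For readers who want to reproduce the same checks outside MATLAB, here is a rough Python/NumPy equivalent (using the same 4 x 5 example matrix; for this example both checks are of course false):

import numpy as np

X = np.arange(1, 21).reshape(4, 5)   # same values 1..20, laid out row by row

first = X.sum(axis=0)    # sum over the rows for each column -> [34 38 42 46 50]
second = X.sum(axis=1)   # sum over the columns for each row -> [15 40 65 90]

first_satisfied = bool(np.all(first <= 1))
second_satisfied = bool(np.all(second <= 1))
print(first_satisfied, second_satisfied)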
Please let me know if you need anything further.

How to get histogram data object from matlab

Let's say I have a matrix x = [1 2 1 2 1 2 1 2 3 4 5]. To look at its histogram, I can do h = hist(x).
Now, h will contain only the counts of occurrences and does not store the original values they correspond to.
What I want is something like a function which takes a value from x and returns the number of occurrences of that value. Having said that, one thing histeq does that we should admire is that it automatically scales nearby values accordingly!
How should I solve this issue? How exactly do people do it?
My interest is in images:
Let's say I have an image. I want to find the number of occurrences of a given chrominance value in the image.
I'm not really sure what you are looking for, but if you want to use hist to count the number of occurrences, use:
[h,c]=hist(x,sort(unique(x)))
Otherwise hist uses ranges defined by bin centers. The second output argument returns those centers (here, the unique values themselves).
hist returns a second output that contains the bin centers xc corresponding to the counts n returned as the first output: [n, xc] = hist(x). You should have a careful look at the reference, which describes a large number of optional arguments that control the behavior of hist. However, hist is way too mighty for your specific problem.
To simply count the number of occurrences of a specific value, you could simply use something like sum(x(:) == 42). The colon operator will linearize your image matrix, the equals operator will yield a list of boolean values with 1 for each element of x that was 42, and thus sum will yield the total number of these occurrences.
An alternative to hist / histc is to use bsxfun:
n = unique(x(:)).'; %// values contained in x. x can have any number of dims
y = sum(bsxfun(@eq, x(:), n)); %// count for each value
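Purely for comparison, the same counting idea in Python/NumPy (a sketch, using the example vector from the question):

import numpy as np

x = np.array([1, 2, 1, 2, 1, 2, 1, 2, 3, 4, 5])

count_of_2 = np.count_nonzero(x == 2)              # analogous to sum(x(:) == 2) in MATLAB
values, counts = np.unique(x, return_counts=True)  # counts for every distinct value at once
print(count_of_2, dict(zip(values.tolist(), counts.tolist())))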

Efficient database lookup based on input where not all digits are significant

I would like to do a database lookup based on a 10 digit numeric value where only the first n digits are significant. Assume that there is no way in advance to determine n by looking at the value.
For example, I receive the value 5432154321. The corresponding entry (if it exists) might have key 54 or 543215 or any value based on n being somewhere between 1 and 10 inclusive.
Is there any efficient approach to matching on such a string short of simply trying all 10 possibilities?
Some background
The value is from a barcode scan. The barcodes are EAN13 restricted circulation numbers so they have the following structure:
02[1234567890]C
where C is a check sum. The 10 digits in between the 02 and the check sum consist of an item identifier followed by an item measure. There might be a check digit after the item identifier.
Since I can't depend on the data to adhere to any single standard, I would like to be able to define on an ad-hoc basis, how particular barcodes are structured which means that the portion of the 10 digit number that I extract, can be any length between 1 and 10.
Just a few ideas here:
1)
Maybe store these numbers in reversed form in your DB.
If you have N = 54321 you store it as N = 12345 in the DB.
Say N is the name of the column you stored it in.
When you read K = 5432154321, reverse this one too,
you get K1 = 1234512345, now check the DB column N
(whose value is, let's say, P), and check if K1 % 10^s == P,
where s = floor(log10(P)) + 1.
Note: floor(log10(P)) + 1 is the count of digits of the number P > 0.
You may also store that digit count in the DB precomputed,
so that you don't need to compute it each time.
(A Python sketch of this check appears after this list; a SQL Server
example follows further below.)
2) As idea 1) is kind of hacky (but maybe the best of the 3 ideas here),
you could instead just store the numbers in a string column and check them
with the LIKE operator. But this is trivial; you probably considered it
already.
3) Or ... you store the numbers reversed, but you also
store all their residues mod 10^k for k=1...10.
col1, col2,..., col10
Then you can compare numbers almost directly,
the check will be something like
N % 10 == col1
or
N % 100 == col2
or
...
(N % 10^10) == col10.
Still not very elegant though (and not quite sure
if applicable to your case).
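Here is a quick application-side sketch of the check from idea 1), in Python (the function names are made up for illustration):

def reverse_digits(n):
    # reverse the decimal digits of a non-negative integer, e.g. 54321 -> 12345
    return int(str(n)[::-1])

def matches(scanned, stored_reversed, digit_count):
    # stored_reversed is the key as stored in the DB (already reversed),
    # digit_count is its precomputed number of digits
    k1 = reverse_digits(scanned)
    return k1 % 10 ** digit_count == stored_reversed

# same data as the SQL example below: key 4332187 stored as 7812334 (7 digits)
print(matches(433218799, 7812334, 7))  # True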
I decided to check my idea 1).
So here is an example
(I did it in SQL Server).
insert into numbers
(number, cnt_dig)
values
(1234, 1 + floor(log10(1234)))
insert into numbers
(number, cnt_dig)
values
(51234, 1 + floor(log10(51234)))
insert into numbers
(number, cnt_dig)
values
(7812334, 1 + floor(log10(7812334)))
select * From numbers
/*
Now we have this in our table:
id number cnt_dig
4 1234 4
5 51234 5
6 7812334 7
*/
-- Note that the actual numbers stored here
-- are the reversed ones: 4321, 43215, 4332187.
-- So far so good.
-- Now we read say K = 433218799 on the input
-- We reverse it and we get K1 = 997812334
declare @K1 bigint
set @K1 = 997812334
select * From numbers
where
@K1 % power(10, cnt_dig) = number
-- So from the last 3 queries,
-- we get this row:
-- id number cnt_dig
-- 6 7812334 7
--
-- meaning we have a match
-- i.e. the actual number 433218799
-- was matched successfully with the
-- actual number (from the DB) 4332187.
So this idea 1) doesn't seem that bad after all.

Random number generator that fills an interval

How would you implement a random number generator that, given an interval, (randomly) generates all numbers in that interval, without any repetition?
It should consume as little time and memory as possible.
Example in a just-invented C#-ruby-ish pseudocode:
interval = new Interval(0,9)
rg = new RandomGenerator(interval);
count = interval.Count // equals 10
count.times.do{
print rg.GetNext() + " "
}
This should output something like :
1 4 3 2 7 5 0 9 8 6
Fill an array with the interval, and then shuffle it.
The standard way to shuffle an array of N elements is to pick a random index R between 0 and N-1 and swap item[R] with item[N-1]. Then subtract one from N, and repeat until you reach N = 1.
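A minimal Python sketch of that shuffle (the usual Fisher-Yates form, 0-based indices; the function name is mine):

import random

def shuffled_interval(low, high):
    items = list(range(low, high + 1))
    # walk from the end, swapping each slot with a randomly chosen earlier (or same) slot
    for i in range(len(items) - 1, 0, -1):
        r = random.randint(0, i)   # 0 <= r <= i
        items[i], items[r] = items[r], items[i]
    return items

print(shuffled_interval(0, 9))  # e.g. [1, 4, 3, 2, 7, 5, 0, 9, 8, 6]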
This has come up before. Try using a linear feedback shift register.
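For example, a maximal-length 4-bit Galois LFSR visits every value from 1 to 15 exactly once per period; values outside the wanted interval are simply skipped, and 0 has to be handled separately. A rough sketch (the tap mask 0b1100 corresponds to the primitive polynomial x^4 + x^3 + 1):

def lfsr4(seed=1):
    state = seed & 0xF
    for _ in range(15):
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0b1100   # feedback taps
        yield state

print(sorted(lfsr4()))  # [1, 2, ..., 15]: every non-zero 4-bit value exactly once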
One suggestion, but it's memory intensive:
The generator builds a list of all numbers in the interval, then shuffles it.
A very efficient way to shuffle an array of numbers where each index is unique comes from image processing and is used when applying techniques like pixel-dissolve.
Basically you start with an ordered 2D array and then shift columns and rows. Those permutations are, by the way, easy to implement; you can even have an exact method that yields the resulting value at (x, y) after n permutations.
The basic technique, described on a 3x3 grid:
1) Start with an ordered list, each number may exist only once
0 1 2
3 4 5
6 7 8
2) Pick a row/column you want to shuffle and advance it one step. In this case, I am shifting the second row one to the right.
0 1 2
5 3 4
6 7 8
3) Pick a row/column you want to shuffle... I shift the second column one down.
0 7 2
5 1 4
6 3 8
4) Pick ... For instance, the first row, one to the right.
2 0 7
5 1 4
6 3 8
You can repeat those steps as often as you want. You can always do this kind of transformation also on a 1D array. So your result would be now [2, 0, 7, 5, 1, 4, 6, 3, 8].
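A rough Python sketch of those row/column rotations on an n x n grid (the function name and the number of steps are arbitrary):

import random

def grid_shuffle(n, steps=50):
    grid = [[r * n + c for c in range(n)] for r in range(n)]   # ordered n x n grid
    for _ in range(steps):
        if random.random() < 0.5:
            r = random.randrange(n)                    # rotate one row to the right
            grid[r] = [grid[r][-1]] + grid[r][:-1]
        else:
            c = random.randrange(n)                    # rotate one column down
            col = [grid[r][c] for r in range(n)]
            col = [col[-1]] + col[:-1]
            for r in range(n):
                grid[r][c] = col[r]
    return [v for row in grid for v in row]            # flatten back to a 1D list

print(grid_shuffle(3))  # e.g. [2, 0, 7, 5, 1, 4, 6, 3, 8]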
An occasionally useful alternative to the shuffle approach is to use a subscriptable set container. At each step, choose a random number 0 <= n < count. Extract the nth item from the set.
The main problem is that typical containers can't handle this efficiently. I have used it with bit-vectors, but it only works well if the largest possible member is reasonably small, due to the linear scanning of the bitvector needed to find the nth set bit.
99% of the time, the best approach is to shuffle as others have suggested.
EDIT
I missed the fact that a simple array is a good "set" data structure - don't ask me why, I've used it before. The "trick" is that you don't care whether the items in the array are sorted or not. At each step, you choose one randomly and extract it. To fill the empty slot (without having to shift an average half of your items one step down) you just move the current end item into the empty slot in constant time, then reduce the size of the array by one.
For example...
#include <vector>

class remaining_items_queue
{
private:
    std::vector<int> m_Items;
public:
    ...
    bool Extract (int &p_Item);  // return false if items already exhausted
};

bool remaining_items_queue::Extract (int &p_Item)
{
    if (m_Items.size () == 0) return false;
    int l_Random = Random_Num (m_Items.size ());
    // Random_Num written to give 0 <= result < parameter
    p_Item = m_Items [l_Random];
    m_Items [l_Random] = m_Items.back ();
    m_Items.pop_back ();
    return true;  // report that an item was extracted
}
The trick is to get a random number generator that gives (with a reasonably even distribution) numbers in the range 0 to n-1 where n is potentially different each time. Most standard random generators give a fixed range. Although the following DOESN'T give an even distribution, it is often good enough...
int Random_Num (int p)
{
    return (std::rand () % p);
}
std::rand returns random values in the range 0 <= x <= RAND_MAX, where RAND_MAX is implementation defined.
Take all the numbers in the interval and put them into a list/array
Shuffle the list/array
Loop over the list/array
One way is to generate an ordered list (0-9 in your example).
Then use the random function to select an item from the list. Remove the item from the original list and add it to the tail of the new one.
The process is finished when the original list is empty.
Output the new list.
You can use a linear congruential generator with parameters chosen (even randomly) so that it generates the full period. You need to be careful, because the quality of the random numbers may be bad, depending on the parameters.
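A minimal sketch of that idea: with m equal to the interval length, the Hull-Dobell conditions (c coprime to m; a - 1 divisible by every prime factor of m, and by 4 if 4 divides m) guarantee a full period, so every value 0..m-1 appears exactly once before the sequence repeats. The parameters below are just one valid choice for m = 16:

def full_period_lcg(m, a, c, seed=0):
    x = seed
    for _ in range(m):
        yield x
        x = (a * x + c) % m

# m = 16: a - 1 = 4 is divisible by 2 and by 4, and c = 3 is coprime to 16
print(sorted(full_period_lcg(16, a=5, c=3)))  # [0, 1, 2, ..., 15]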
