Group numbers into closest groups - algorithm

For example, I have the numbers 46, 47, 54, 58, 60, and 66. I want to group them in such a way as to make the largest possible group sizes. Numbers get grouped if their values are within plus or minus 10 (inclusive). So, depending on which number you start with, for this example there can be three possible outcomes (shown in the image).
What I want is the second possible outcome, which would occur if you started with 54, as the numbers within 44 to 64 would be grouped, leaving 66 by itself, and creating the largest group (5 items).
I realize I could easily brute force this example, but I really have a long list and it needs to do this across thousands of numbers. Can anyone tell me about algorithms I should be reading about, or give me suggestions?

You can simply sort the array first. Then, for every i-th number, do a binary search to find the rightmost number that is within the i-th number + 20; let the position of that rightmost number be X. The answer is the largest (X - i + 1) over all i, and we are done :)
Runtime analysis: the runtime of this algorithm will be O(N lg N), where N is the number of items in the original array.
A better solution: Let's assume we have the array ar[] and ar[] has N items.
sort ar[] in non decreasing order
set max_result = 0, set cur_index = 0, i=0
increase i while i < N-1 and ar[i+1] <= ar[cur_index] + 20
set max_result to max(max_result,i-cur_index+1)
set cur_index=cur_index+1
if cur_index < N, go back to the "increase i" step above
Runtime Analysis: O(N) after the sort, where N is the number of items in ar[], since cur_index iterates through the array exactly once and i does too.
Correctness: as the array is sorted in non-decreasing order, if i < j < k and ar[i] + 20 >= ar[k], then ar[j] + 20 >= ar[k] too (since ar[j] >= ar[i]). So when the window's starting index moves forward, its end never has to move backward, and items already covered for a previous starting index are never re-checked.
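A minimal Python sketch of the two-pointer version described above (function and variable names are illustrative):

def largest_group_size(ar):
    # Sorted sweep: for each starting index cur, extend i while the window
    # [ar[cur], ar[i]] spans at most 20, and record the best window size.
    ar = sorted(ar)
    best, i = 0, 0
    for cur in range(len(ar)):
        i = max(i, cur)
        while i + 1 < len(ar) and ar[i + 1] <= ar[cur] + 20:
            i += 1
        best = max(best, i - cur + 1)
    return best

Note that, like the answer above, this counts any window of spread at most 20 as a group, so for the example it returns 6 (46 and 66 are exactly 20 apart); if the group must be centred on one of the input numbers, as in the question, count the elements in [v-10, v+10] for each value v instead.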

This is what I wanted to do. Sorry I didn't explain myself very well. Each iteration finds the largest possible group, using the numbers that are left after removing the previous largest group. Matlab code:
function out = groupNums(y)
% Repeatedly pull out the largest group of values within +/- d of some member of y.
d = 10;
out = [];
if isempty(y)
    return
end
if length(y) == 1
    out = {y};
    return
end
group = cell(1, length(y));
for i = 1:length(y)
    group{i} = find(y <= y(i)+d & y >= y(i)-d);   % indices within +/- d of y(i)
end
[~, idx] = max(cellfun(@length, group));          % pivot that yields the largest group
out = [out, {y(group{idx})}];                     % record that group
y(group{idx}) = [];                               % remove it and recurse on the rest
out = [out, groupNums(y)];
end


Replace two elements with their absolute difference and generate the minimum possible element in array

I have an array of size n and I can apply any number of operations (zero included) on it. In an operation, I can take any two elements and replace them with the absolute difference of the two elements. We have to find the minimum possible element that can be generated using the operation. (n < 1000)
Here's an example of how operation works. Let the array be [1,3,4]. Applying operation on 1,3 gives [2,4] as the new array.
Ex: 2 6 11 3 => ans = 0
This is because 11-6 = 5 and 5-3 = 2 and 2-2 = 0
Ex: 20 6 4 => ans = 2
Ex: 2 6 10 14 => ans = 0
Ex: 2 6 10 => ans = 2
Can anyone tell me how can I approach this problem?
Edit:
We can use recursion to generate all possible cases and pick the minimum element from them. This would have complexity of O(n^2 !).
Another approach I tried is sorting the array and then making a recursive call where, starting from either index 0 or 1, I apply the operation on consecutive pairs of elements. This continues until there is only one element left in the array, and we can return the minimum seen at any point in the recursion. This has a complexity of O(n^2) but doesn't necessarily give the right answer.
Ex: 2 6 10 15 => (4 5) & (2 4 15) => (1) & (2 15) & (2 11) => (13) & (9). The minimum of this will be 1 which is the answer.
When you choose two elements for the operation, you subtract the smaller one from the bigger one. So if you choose 1 and 7, the result is 7 - 1 = 6.
Now having 2 6 and 8 you can do:
8 - 2 -> 6 and then 6 - 6 = 0
You may also write it like this: 8 - 2 - 6 = 0
Let's consider a different operation: you can take two elements and replace them by their sum or their difference.
Even though you can obtain completely different values using the new operation, the absolute value of the element closest to 0 will be exactly the same as using the old one.
First, let's try to solve this problem using the new operations, then we'll make sure that the answer is indeed the same as using the old ones.
What you are trying to do is choose two non-intersecting subsets of the initial array, then subtract the sum of all elements in the second set from the sum of all elements in the first. You want to find two such subsets for which the result is as close as possible to 0. That is an NP-hard problem, and one can solve it efficiently with a pseudopolynomial algorithm, similar to the knapsack one, in O(n * sum of all elements).
Each element of the initial array can either belong to the positive set (the set whose sum you subtract from), the negative set (the set whose sum you subtract), or neither of them. In other words: each element can be added to the result, subtracted from the result, or left untouched. Let's say we have already calculated all values obtainable using the first i elements. Now we consider the (i+1)-th element: we can take any of the obtainable values and increase it or decrease it by the value of the (i+1)-th element, or leave it unchanged. After doing that with all the elements we get all values obtainable from the array, and then we choose the one closest to 0.
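A minimal Python sketch of this value-set DP (names are illustrative; the set of reachable values plays the role of the knapsack table):

def min_reachable(arr):
    # reachable holds every value obtainable from the elements seen so far
    # by adding, subtracting, or skipping each one.
    reachable = {0}
    for a in arr:
        reachable = reachable | {v + a for v in reachable} | {v - a for v in reachable}
    return min(abs(v) for v in reachable)

print(min_reachable([2, 6, 11, 3]))   # 0
print(min_reachable([20, 6, 4]))      # 2

Replacing the set with a boolean array indexed by the shifted sum gives the O(n * sum of all elements) bound mentioned above.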
Now the harder part: why is this always the correct answer?
Let's consider positive and negative sets from which we obtain minimal result. We want to achieve it using initial operations. Let's say that there are more elements in the negative set than in the positive set (otherwise swap them).
What if we have only one element in the positive set and only one element in the negative set? Then absolute value of their difference is equal to the value obtained by using our operation on it.
What if we have one element in the positive set and two in the negative one?
1) One of the negative elements is smaller than the positive element - then we just take them and use the operation on them. The result of it is a new element in the positive set. Then we have the previous case.
2) Both negative elements are larger than (or equal to) the positive one. Then removing the bigger element from the negative set gives a result closer to 0, so this case cannot occur in an optimal solution.
Let's say we have n elements in the positive set and m elements in the negative set (n <= m), and we are able to obtain the absolute value of the difference of their sums (let's call it x) by using some operations. Now let's add an element to the negative set. If the difference before adding the new element was negative, decreasing it by any other number makes it smaller, that is, farther from 0, so that configuration is impossible. So the difference must have been positive. Then we can use our operation on x and the new element to get the result.
Now the second case: let's say we have n elements in the positive set and m elements in the negative set (n < m), and we are able to obtain the absolute value of the difference of their sums (again let's call it x) by using some operations. Now we add a new element to the positive set. Similarly, the difference must have been negative, so x is in the negative set. Then we obtain the result by doing the operation on x and the new element.
Using induction we can prove that the answer is always correct.

Algorithm for iterating over random permutation

I have a bag that has the following:
6 red marbles
5 green marbles
2 blue marbles
I want to remove a random marble from the bag, record its color, and repeat until no more marbles are left in the bag:
sort the counts
bag = {2:blue, 5:green, 6:red}
compute the cumulative counts
cumulative = {2:blue, 7:green, 13:red}
pick a random integer in [1, max cumulative count]
rand(1, 13) = 3
find insertion point index of this integer using binary search
i = 1
record the color corresponding to this index
green
reduce that count by 1
bag = {2:blue, 4:green, 6:red}
repeat until no more marbles in bag
Is this a good way to do this or are there more efficient ways in terms of time complexity?
Your algorithm is pretty good, but it could be optimized further:
You don't need to sort the colors! You can skip the first step.
Instead of calculating the cumulative counts each time you can do it iteratively by decreasing all values right of the selected one (including the selected color itself).
You also don't need the binary search, you can just start decreasing the cumulative counts from the end until you reach the correct number.
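A minimal Python sketch of the optimized version (names are illustrative): keep one cumulative-count array and, after each draw, walk from the end decrementing every cumulative count affected by the removal.

import random

def draw_all(counts):
    # counts: list of (color, count) pairs, e.g. [("blue", 2), ("green", 5), ("red", 6)]
    colors = [c for c, _ in counts]
    cum, total = [], 0
    for _, n in counts:
        total += n
        cum.append(total)                    # cumulative counts, e.g. [2, 7, 13]
    result = []
    while cum and cum[-1] > 0:
        r = random.randint(1, cum[-1])       # position of the marble being drawn
        i = len(cum) - 1
        while i >= 0 and cum[i] >= r:        # walk from the end, decrementing every
            cum[i] -= 1                      # cumulative count the removal affects
            i -= 1
        result.append(colors[i + 1])         # first color whose cumulative count >= r
    return result

print(draw_all([("blue", 2), ("green", 5), ("red", 6)]))

Each draw costs O(number of colors), so the whole run is O(total marbles * number of colors).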
There is also another algorithm based on lists:
Create a list with all the items (0=red, 1=green, 2=blue): [0,0,0,0,0,0,1,1,1,1,1,2,2].
Get a random integer i between 0 and the size of the list - 1.
Remove the ith item from the list and add it to the result.
Repeat 2. and 3. until the list is empty.
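A sketch of this list-based variant (illustrative, not from the original answer):

import random

def draw_all_list(counts):
    # Expand the bag into one flat list of items, then repeatedly pop a random one.
    items = [color for color, n in counts for _ in range(n)]
    result = []
    while items:
        result.append(items.pop(random.randrange(len(items))))
    return result

print(draw_all_list([("red", 6), ("green", 5), ("blue", 2)]))

Note that list.pop(i) is O(n), so this simple form is quadratic overall; it is fine for small bags.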
Instead of relying on extraction, you can shuffle the array in-place.
like in maraca's answer, you store the items individually in the array (citing it here: "Create a list with all the items (0=red, 1=green, 2=blue): [0,0,0,0,0,0,1,1,1,1,1,2,2].")
iterate through the array and, for each element i, pick a random index j of an element to swap place with
at the end, just iterate over the array to get a shuffled order.
Something like
for(i=0..len-1) {
    j=random(i..len-1);   // Fisher-Yates: pick j from the not-yet-fixed tail so every permutation is equally likely
    // swap them
    aux=a[i]; a[i]=a[j]; a[j]=aux;
}
// now consume the array - it is as random as it can be
// without extracting from it on the way
Note: many programming languages will have libraries providing already implemented array/list shuffling functions
C++ - std::shuffle (std::random_shuffle is deprecated and was removed in C++17)
Java - Collections.shuffle
Python - random.shuffle
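For example, in Python the whole bag can be shuffled in one call:

import random

items = [0]*6 + [1]*5 + [2]*2   # 0=red, 1=green, 2=blue
random.shuffle(items)            # in-place Fisher-Yates shuffle
print(items)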

Generate a number in range (1,n) but not in a list (i,j)

How can I generate a random number that is in the range (1,n) but not in a certain list (i,j)?
Example: range is (1,500), list is [1,3,4,45,199,212,344].
Note: The list may not be sorted
Rejection Sampling
One method is rejection sampling:
Generate a number x in the range (1, 500)
Is x in your list of disallowed values? (Can use a hash-set for this check.)
If yes, return to step 1
If no, x is your random value, done
This will work fine if your set of allowed values is significantly larger than your set of disallowed values: if there are G possible good values and B possible bad values, then the expected number of times you'll have to sample x from the G + B values until you get a good value is (G + B) / G (the expectation of the associated geometric distribution). (You can sanity-check this: as G goes to infinity, the expectation goes to 1; as B goes to infinity, the expectation goes to infinity.)
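A minimal Python sketch of rejection sampling for the example (names are illustrative):

import random

def sample_allowed(n, disallowed):
    # Redraw until the value falls outside the disallowed set.
    bad = set(disallowed)               # hash-set for O(1) membership tests
    while True:
        x = random.randint(1, n)
        if x not in bad:
            return x

print(sample_allowed(500, [1, 3, 4, 45, 199, 212, 344]))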
Sampling a List
Another method is to make a list L of all of your allowed values, then sample L[rand(L.count)].
The technique I usually use when the list is length 1 is to generate a random integer r in [1, n-1], and if r is greater or equal to that single illegal value then increment r.
This can be generalised for a list of length k for small k, but it requires sorting that list (you can't do your compare-and-increment in random order). If the list is moderately long, then after the sort you can start with a bsearch, add the number of values skipped to r, and then recurse into the remainder of the list.
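A minimal Python sketch of the sorted compare-and-increment idea (plain loop rather than the bsearch refinement; names are illustrative):

import random

def sample_excluding(n, illegal):
    # Draw r from one of the n-k legal "slots", then shift it upward past
    # every illegal value that is <= r, keeping the distribution uniform.
    k = len(illegal)
    r = random.randint(1, n - k)
    for v in sorted(illegal):
        if r >= v:
            r += 1
    return r

print(sample_excluding(500, [1, 3, 4, 45, 199, 212, 344]))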
For a list of length k, containing no value greater than or equal to n-k, you can do a more direct substitution: generate random r in [1, n-k], then iterate through the list testing whether r is equal to list[i]. If it is, set r to n-k+i+1 (this assumes the list is zero-based) and quit.
That second approach fails if some of the list elements are in [n-k, n].
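A sketch of that substitution, assuming (as above) that every illegal value lies below n-k so the top k values are all legal remap targets (names are illustrative):

import random

def sample_by_substitution(n, illegal):
    # Draw r from [1, n-k]; if r collides with the i-th illegal value (zero-based),
    # remap it to the distinct legal value n-k+i+1.
    k = len(illegal)
    r = random.randint(1, n - k)
    for i, v in enumerate(illegal):
        if r == v:
            return n - k + i + 1
    return r

print(sample_by_substitution(20, [2, 5, 9]))   # 18, 19, 20 serve as remap targets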
I could try to invent something clever at this point, but what I have so far seems sufficient for uniform distributions with values of k much less than n...
Create two lists -- one of illegal values below n-k, and the other the rest (this can be done in place).
Generate random r in [1,n-k]
Apply the direct substitution approach for the first list (if r is list[i] then set r to n-k+i+1 and go to step 5).
If r was not altered in step 3 then we're finished.
Sort the list of larger values and use the compare-and-increment method.
Observations:
If all values are in the lower list, there will be no sort because there is nothing to sort.
If all values are in the upper list, there will be no sort because there is no occasion on which r is moved into the hazardous area.
As k approaches n, the maximum size of the upper (sorted) list grows.
For a given k, if more values appear in the upper list (the bigger the sort), the chance of getting a hit in the lower list shrinks, reducing the likelihood of needing to do the sort.
Refinement:
Obviously things get very sorty for large k, but in such cases the list has comparatively few holes into which r is allowed to settle. This could surely be exploited.
I might suggest something different if many random values with the same list and limits were needed. I hope that the list of illegal values is not the list of results of previous calls to this function, because if it is then you wouldn't want any of this -- instead you would want a Fisher-Yates shuffle.
Rejection sampling would be the simplest, if possible, as described already. However, if you didn't want to use that, you could convert the range and the disallowed values to sets and take the difference. Then you could choose a random value out of that.
Assuming you wanted the range to be in [1,n] but not in [i,j] and that you wanted them uniformly distributed.
In Python:

import random

def pick_allowed(n, i, j):
    total = range(1, n + 1)
    disallowed = range(i, j + 1)
    allowed = list(set(total) - set(disallowed))
    return allowed[random.randrange(len(allowed))]
(Note: with Python's random.randrange the result is uniform; the usual caveat about modulo bias, i.e. that mapping a raw random integer with max_rand % len(allowed) is only exactly uniform when max_rand % len(allowed) == 0, applies if you do that mapping yourself, and even then the bias is negligible in most practical applications.)
I assume that you know how to generate a random number in [1, n) and also your list is ordered like in the example above.
Let's say that you have a list with k elements. Make a map structure (O(log n) operations), which keeps things fast as k grows. Put all the elements of the list into the map, where the element value is the key and a "good" value is the value. Later on I'll explain the "good" value. So when we have the map, just find a random number in [1, n - k - p) (later on I'll explain what p is), and if this number is in the map, replace it with its "good" value.
"GOOD" value -> Let's start from the k-th element. Its good value is its own value + 1, because the very next element is "good" for us. Now let's look at the (k-1)-th element. We assume that its good value is again its own value + 1. If this value is equal to the k-th element, then the "good" value for the (k-1)-th element is the k-th element's "good" value + 1. You will also have to store the largest "good" value. If the largest value exceeds n, then p (from above) will be p = largest - n.
Of course, I recommend this only if k is a big number; otherwise @Timothy Shields' method is perfect.

Sorting Linked list : random or nearly sorted?

I have a linked list and I want to check whether it is nearly sorted or random. Can anyone suggest how to do that?
Right now what I am trying to do is run up to half of the list and compare adjacent elements to check whether the given list is nearly sorted or otherwise. But the difficulty is that this method is not foolproof and I want something concrete.
For example, if you have 100 items then the scale will be out of 100 (a score of how sorted the list is). If the list is fully sorted you get a score of 100; if the list is sorted backwards you get a score of 0. You check each adjacent pair and decide whether it is in order (0th and 1st, 1st and 2nd, 2nd and 3rd, and so on). That gives you a scale between 0 and 100 (or the linked-list size in your case). There are a lot of possible heuristics for such a "sortedness scale", but this is one of them.
If you want to involve the amplitude of your data you can do (Python3):
import random

l = [random.random() for x in range(100)]
s = 0
for i, x in enumerate(l[0:50]):      # walk the first half, comparing each element to its successor
    s += l[i+1] - x                  # positive contributions correspond to ascending pairs
print(s)
if you'd rather just look at how many values are sorted, replace the s+= line with
s += 1 if l[i+1] > x else 0

Bin Packing - Brute force recursive solution - How to make it faster

I have an array which contains a list of different sizes of materials: {4,3,4,1,7,8}. However, a bin can accommodate materials up to size 10. I need to find the minimum number of bins needed to pack all the elements of the array.
For the above array, you can pack into 3 bins and divide them as follows: {4,4,1}, {3,7}, {8}. There are other possible arrangements that also fit into three stock pipes, but it cannot be done with fewer.
I am trying to solve this problem through recursion in order to understand it better.
I am using this DP formulation (page 20 of the pdf file)
Consider an input (n1, ..., nk) with n = ∑ nj items.
Determine the set of k-tuples (subsets of the input) that can be packed into a single bin, that is, all tuples (q1, ..., qk) for which OPT(q1, ..., qk) = 1. Denote this set by Q. For each k-tuple q in Q, we have OPT(q) = 1.
Calculate the remaining values by using the recurrence: OPT(i1, ..., ik) = 1 + min over q in Q of OPT(i1 - q1, ..., ik - qk).
I have made the code, and it works fine for small data sets. But if I increase the size of my array to more than 6 elements, it becomes extremely slow -- it takes about 25 seconds to solve an array containing 8 elements. Can you tell me if there's anything wrong with the algorithm? I don't need an alternative solution -- I just need to know why this is so slow, and how it can be improved.
Here is the code I have written in C++ :
void recCutStock(Vector<int> & requests, int numStocks)
{
if (requests.size() == 0)
{
if(numStocks <= minSize)
{
minSize = numStocks;
}
// cout<<"GOT A RESULT : "<<numStocks<<endl;
return ;
}
else
{
if(numStocks+1 < minSize) //minSize is a global variable initialized with a big val
{
Vector<int> temp ; Vector<Vector<int> > posBins;
getBins(requests, temp, 0 , posBins); // 2-d array(stored in posBins) containing all possible single bin formations
for(int i =0; i < posBins.size(); i++)
{
Vector<int> subResult;
reqMinusPos(requests, subResult, posBins[i]); // subtracts the initial request array from the subArray
//displayArr(subResult);
recCutStock(subResult, numStocks+1);
}
}
else return;
}
}
The getBins function is as follows:
void getBins(Vector<int> & requests, Vector<int> current, int index, Vector<Vector<int> > & bins)
{
if (index == requests.size())
{
if(sum(current,requests) <= stockLength && sum(current, requests)>0 )
{
bins.add(current);
// printBins(current,requests);
}
return ;
}
else
{
getBins(requests, current, index+1 , bins);
current.add(index);
getBins(requests, current, index+1 , bins);
}
}
The dynamic programming algorithm is O(n^{2k}) where k is the number of distinct items and n is the total number of items. This can be very slow irrespective of the implementation. Typically, when solving an NP-hard problem, heuristics are required for speed.
I suggest you consider Next Fit Decreasing Height (NFDH) and First Fit Decreasing Height (FFDH) from Coffman et al. They are 2-optimal and 17/10-optimal, respectively, and they run in O(n log n) time.
I recommend you first try NFDH: sort in decreasing order, store the result in a linked list, then repeatedly try to insert the items starting from the beginning (largest values first) until you have filled the bin or there are no more items that can be inserted. Then go to the next bin, and so on.
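A minimal Python sketch of that procedure (a plain list instead of a linked list; names are illustrative):

def pack_decreasing(items, capacity):
    # Sort once in decreasing order; for each bin, sweep the remaining items
    # and insert every one that still fits, then open a new bin for the rest.
    remaining = sorted(items, reverse=True)
    bins = []
    while remaining:
        load, contents, leftover = 0, [], []
        for item in remaining:
            if load + item <= capacity:
                load += item
                contents.append(item)
            else:
                leftover.append(item)
        bins.append(contents)
        remaining = leftover
    return bins

print(pack_decreasing([4, 3, 4, 1, 7, 8], 10))   # [[8, 1], [7, 3], [4, 4]] -> 3 bins

First fit decreasing works the same way except that it also revisits earlier, partially filled bins before opening a new one.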
References:
Owen Kaser, Daniel Lemire, Tag-Cloud Drawing: Algorithms for Cloud Visualization, Tagging and Metadata for Social Information Organization (WWW 2007), 2007. (See Section 5.1 for a related discussion.)
E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, and R. E. Tarjan. Performance bounds for level-oriented two-dimensional packing algorithms. SIAM J. Comput., 9(4):808–826, 1980.
But if I increase the size of my array to more than 6 elements, it becomes extremely slow -- takes about 25 seconds to solve an array containing 8 elements. Can you tell me if there's anything wrong with the algorithm?
That's normal with brute force. Brute force does not scale at all.
In your case: bin size = 10 and total item size = 27, so at least 3 bins are needed (27/10 rounded up).
You could try first fit decreasing, and it works!
More ways to improve: With 3 bins and 27 size units, you will have 3 units of space left over. Which means you can ignore the item of size 1; if you fit the others into 3 bins, it will fit somewhere. That leaves you with 26 size units. That means you will have at least two units empty in one bin. If you had items of size 2, you could ignore them as well because they would fit. If you had two items of size 2, you could ignore items of size 3 as well.
You have two items of sizes 7 and 3, which together are exactly the bin size. There is always an optimal solution in which these two are in the same bin: if the 7 were packed with other items, their total size would be 3 or less, so you could swap them with the 3 if it is in another bin.
Another method: If you have k items >= bin size / 2 (you can't have two items equal to bin size / 2 at this point), then you need k bins. This might increase the minimum number of bins that you estimated initially which in turn increases the guaranteed empty space in all bins which increases the minimum size of leftover space in one bin. If for j = 1, 2, ..., k you can fit all items with them into j bins that could possibly fit into the same bin, then this is optimal. For example, if you had sizes 8, 1, 1 but no size 2, then 8+1+1 in a bin would be optimal. Since you have 8 + 4 + 4 left, and nothing fits with the 8, "8" alone in its bin is optimal. (If you had items of sizes 8, 8, 8, 2, 1, 1, 1 and nothing else of size 2, packing them into three bins would be optimal).
More things to try if you have large items: If you have a large item, and the largest item that fits with it is as large or larger than any combination of items that would fit, then combining them is optimal. If there is more space, then this can be repeated.
So all in all, a bit of thinking reduced the problem to fitting two items of sizes 4, 4 into one or more bins. With larger problems, every little bit helps.
I've written a bin-packing solution and I can recommend best-fit with random order.
After doing what you can to reduce the problem, you are left with the problem to fit n items into k bins if possible, into k + 1 bins otherwise, or into k + 2 bins etc. If k bins fail, then you know that you will have more empty space in an optimal solution of k + 1 bins, which may make it possible to remove more small items, so that's the first thing to do.
Then you try some simple methods: First fit descending, next fit descending. I tried a reasonably fast variation of first fit descending: As long as the two largest items fit, add the largest item. Otherwise, find the single item or the largest combination of two items that fit, and add the single item or the larger of that combination. If any of these algorithms fits your items into k bins, you solved the problem.
And eventually you need brute force. You can decide: Do you attempt to fit everything into k bins, or do you attempt to prove it isn't possible? You will have some empty space to play with. Let's say 10 bins of size 100 and items of total size 936, that would leave you 64 units of empty space. If you put only items of size 80 into your first bin, then 20 of your 64 units are already gone, making it much harder to find a solution from there. So you don't try things in random order. You first try combinations for the first bin that fill it completely or close to completely. Since small items make it easier to fill containers completely you try not to use them in the first bins but leave them for later, when you have less choice of items. And when you've found items to put into a bin, try one of the simple algorithms to see if they can finish it. For example, if first fit descending put 90 units into the first bin, and you just managed to put 99 units in there, it is quite possible that this is enough improvement to fit everything.
On the other hand, if there is very little space (10 bins, total item size 995 for example) you may want to prove that fitting the items is not possible. You don't need to care about optimising the algorithm to find a solution quickly, because you need to try all combinations to see they don't work. Obviously you need with these numbers to fit at least 95 units into the first bin and so on; that might make it easy to rule out solutions quickly. If you proved that k bins are not achievable, then k+1 bins should be a much easier target.
