Algorithm for iterating over random permutation

I have a bag that has the following:
6 red marbles
5 green marbles
2 blue marbles
I want to remove a random marble from the bag, record its color, and repeat until no more marbles are left in the bag:
sort the counts
bag = {2:blue, 5:green, 6:red}
compute the cumulative counts
cumulative = {2:blue, 7:green, 13:red}
pick a random integer in [0, max cumulative count)
rand(0, 13) = 3
find insertion point index of this integer using binary search
i = 1
record the color corresponding to this index
green
reduce that count by 1
bag = {2:blue, 4:green, 6:red}
repeat until no more marbles in bag
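In Python, the above looks roughly like this (bisect_right serves as the binary search):
import random
from bisect import bisect_right

counts = {"red": 6, "green": 5, "blue": 2}
result = []
while any(counts.values()):
    colors = sorted(counts, key=counts.get)      # sort colors by count
    cumulative, total = [], 0
    for c in colors:                             # compute cumulative counts
        total += counts[c]
        cumulative.append(total)
    r = random.randrange(total)                  # random integer in [0, total)
    i = bisect_right(cumulative, r)              # insertion point via binary search
    result.append(colors[i])                     # record the color
    counts[colors[i]] -= 1                       # reduce that count by 1
print(result)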
Is this a good way to do this or are there more efficient ways in terms of time complexity?

Your algorithm is pretty good, but it could be optimized further:
You don't need to sort the colors! You can skip the first step.
Instead of recalculating the cumulative counts from scratch each time, you can update them incrementally by decreasing all values at and to the right of the selected one (i.e. including the selected color itself).
You also don't need the binary search: you can just walk the cumulative counts from the end, decreasing them as you go, until you reach the correct entry.
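A rough Python sketch of that optimized version (no sorting, no binary search; the cumulative counts are kept up to date by decrementing from the end):
import random

colors = ["blue", "green", "red"]
cum = [2, 7, 13]                       # cumulative counts, one entry per color

result = []
while cum[-1] > 0:
    r = random.randrange(cum[-1])      # random integer in [0, remaining total)
    i = len(cum) - 1
    while i > 0 and cum[i - 1] > r:    # walk from the end, decrementing every
        cum[i] -= 1                    # cumulative count right of the pick
        i -= 1
    cum[i] -= 1                        # ... and the picked color itself
    result.append(colors[i])
print(result)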
There is also another algorithm based on lists:
1. Create a list with all the items (0=red, 1=green, 2=blue): [0,0,0,0,0,0,1,1,1,1,1,2,2].
2. Get a random integer i between 0 and the size of the list - 1.
3. Remove the ith item from the list and add it to the result.
4. Repeat 2. and 3. until the list is empty.
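For example, in Python (a direct, unoptimized sketch of those steps):
import random

bag = [0] * 6 + [1] * 5 + [2] * 2      # 0=red, 1=green, 2=blue
result = []
while bag:
    i = random.randrange(len(bag))     # random index into the remaining items
    result.append(bag.pop(i))          # remove the i-th item and record it
print(result)
Note that list.pop(i) shifts everything after position i, so each extraction is O(n); the in-place shuffle in the next answer avoids that.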

Instead of relying on extraction, you can shuffle the array in-place.
like in maraca's answer, you store the items individually in the array (citing it here: "Create a list with all the items (0=red, 1=green, 2=blue): [0,0,0,0,0,0,1,1,1,1,1,2,2].")
iterate through the array and, for each position i, pick a random index j from the not-yet-visited positions (i through len-1) to swap places with
at the end, just iterate over the array to get a shuffled order.
Something like
for (i = 0 .. len-1) {
    j = random(i .. len-1);   // pick only from the not-yet-fixed tail (Fisher-Yates)
    // swap them
    aux = a[i]; a[i] = a[j]; a[j] = aux;
}
// now consume the array - it is as random as it can be
// without extracting from it on the way
Note: many programming languages will have libraries providing already implemented array/list shuffling functions
C++ - std::shuffle (std::random_shuffle was deprecated in C++14 and removed in C++17)
Java - Collections.shuffle
Python - random.shuffle
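For example, in Python:
import random

bag = [0] * 6 + [1] * 5 + [2] * 2      # 0=red, 1=green, 2=blue
random.shuffle(bag)                    # in-place Fisher-Yates shuffle
print(bag)                             # consume the marbles in this order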

Related

Finding the combination from an array of numbers that gives the required coefficient

Please recommend an optimal algorithm or solution for the following task:
There are several arrays with fractional numbers
a = [1.5, 2, 3, 4.5, 7, 10, ...(up to 100 numbers)]
b = [5, 6, 8, 14, ...]
c = [1, 2, 4, 6.25, 8.15 ...] (up to 7 arrays)
Arrays can be of arbitrary length and contain a different count of numbers.
It is required to select one number from each array in such a way that their product falls within a given range.
For example, the required product should be between 40 and 50.
Solution can be:
a[2] * b[2] * c[1] = 3 * 8 * 2 = 48
a[0] * b[3] * c[1] = 1.5 * 14 * 2 = 42
If there can be several solutions (different combinations), then how can you find them all in the optimal way?
This is doable, but barely. This will require combining pairs of things over and over again using a variety of strategies.
First of all, if you have 2 arrays of no more than 100 things, you can create an array of all pairs, sorted by product either ascending or descending, and it only has 10,000 things in it.
Next, we can use a heap to implement a priority queue.
With a priority queue, we can combine 2 ordered arrays of size at most 10,000 to stream out the products in either ascending or descending order while not keeping track of more than 10,000 things. How? First we create a data structure like this:
Create priority queue
For every entry a of array A:
    Put (a, B[0], 0) into our queue using the product as a priority
return a data structure which contains B and the priority queue
And now we can get values out like this:
If the priority queue is empty:
    We're done
else:
    Take the first element (a, b, index) off the queue
    if index is not at the end of B:
        insert (a, B[index + 1], index + 1) into the queue
    return that first element
And we can peek at them by just looking at the first element of the queue without touching the data structure.
This strategy can stream through 2 arrays of size 10,000 with total work just a few billion operations.
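A rough Python sketch of this generator, assuming both input arrays are sorted ascending and contain only positive numbers (for the descending direction you would negate the priorities):
import heapq

def products_ascending(A, B):
    # one heap entry per element of A, each pointing at its current partner in B
    heap = [(a * B[0], ai, 0) for ai, a in enumerate(A)]
    heapq.heapify(heap)
    while heap:
        prod, ai, bi = heapq.heappop(heap)
        if bi + 1 < len(B):                        # advance this row to its next B value
            heapq.heappush(heap, (A[ai] * B[bi + 1], ai, bi + 1))
        yield prod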
OK, so now we can arrange to always have 7 arrays. (Some may simply be a trivial [1].) We can start as follows with the brute force strategy.
Combine the first 2 ascending.
Combine the second 2 ascending.
Combine the third 2 descending.
Arrange the last descending.
Next we can use the priority queue merge strategy as follows:
Combine (first 2) with (second 2) ascending
Combine (third 2) with last descending
We only need these as lazy generators at this point; nothing has to be materialized yet.
Now our strategy will look like this:
For each combination (in ascending order) from first 4:
    For each combination that lands in window from last 3:
        emit final combination
But how do we do the window? Well, as the combination from the first 4 goes up, the window that the last 3 has to fall in goes down. So adjusting the window looks like this:
while there is a next value and next value is large enough to fit in the window:
    Extract next value
    Add next value to end of window
while first value is too large for the window:
    remove first value from the window
(A double-ended queue, such as Python's collections.deque, can do both of these operations in amortized O(1) each.)
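A minimal sketch of that adjustment step (the names are hypothetical: values holds the last-3 products in descending order, pos counts how many of them have been consumed, window is a collections.deque, and [lo, hi] is the current target range for the last-3 product, i.e. 40/p to 50/p for the current first-4 product p):
def adjust_window(window, values, pos, lo, hi):
    while pos < len(values) and values[pos] >= lo:   # next value still large enough
        window.append(values[pos])                   # smaller values go to the end
        pos += 1
    while window and window[0] > hi:                 # front values no longer fit
        window.popleft()
    return pos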
So our actual way to finish is:
For each combination (in ascending order) from first 4:
    adjust entries in window from last 3
    For each entry in window from last 3:
        emit final combination
This has a fixed overhead of a few billion operations plus O(number of answers) to actually emit the combinations. This includes a number of data structures with around 10k items, plus a window whose maximum size is 1 million items for a maximum memory usage of a few hundred MB.

fastest algorithm for sum queries in a range

Assume we have the following data, which consists of consecutive runs of 0's and 1's (the nature of the data is that there are very, very few 1s).
data =
[0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0]
so a huge number of zeros, and then possibly some ones (which indicate that some sort of an event is happening).
You want to query this data many times. The query is that given two indices i and j what is sum(data[i:j]). For example, sum_query(i=12, j=25) = 2 in above example.
Note that you have all these queries in advance.
What sort of a data structure can help me evaluate all the queries as fast as possible?
My initial thoughts:
preprocess the data and obtain two shorter arrays: data_change and data_cumsum. The data_change array holds the indices at which a run of 1s starts and at which the next run of 0s starts, and so on. The data_cumsum array contains the corresponding cumulative sums up to the indices stored in data_change, i.e. data_cumsum[k] = sum(data[0:data_change[k]]).
In above example, the preprocessing results in: data_change=[8,11,18,20,31,35] and data_cumsum=[0,3,3,5,5,9]
Then if query comes for i=12 and j=25, I will do a binary search in this sorted data_change array to find the corresponding index for 12 and then for 25, which will result in the 0-based indices: bin_search(data_change, 12)=2 and bin_search(data_change, 25)=4.
Then I simply output the corresponding difference from the cumsum array: data_cumsum[4] - data_cumsum[2]. (I won't go into the details of handling the situation where an endpoint of the query range falls in the middle of a sequence of 1's, but those cases can be handled easily with an if-statement.)
With linear space, linear preprocessing, constant query time, you can store an array of sums. The i'th position gets the sum of the first i elements. To get query(i,j) you take the difference of the sums (sums[j] - sums[i-1]).
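A sketch in Python, using the half-open sum(data[i:j]) convention from the question (so the answer is simply sums[j] - sums[i]):
def build_prefix(data):
    sums = [0]
    for x in data:
        sums.append(sums[-1] + x)      # sums[k] == sum(data[0:k])
    return sums

def query(sums, i, j):
    return sums[j] - sums[i]           # == sum(data[i:j])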
I already gave an O(1) time, O(n) space answer. Here are some alternates that trade time for space.
1. Assuming that the number of 1s is O(log n) or better (say O(log n) for argument):
Store an array of ints representing the positions of the ones in the original array. so if the input is [1,0,0,0,1,0,1,1] then A = [0,4,6,7].
Given a query, use binary search on A for the start and end of the query in O(log(|A|)) = O(log(log(n))). If the element you're looking for isn't in A, find the smallest bigger index and the largest smaller index. E.g., for query (2,6) you'd return the indices for the 4 and the 6, which are (1,2). Then the answer is one more than the difference.
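Sketched in Python with the half-open convention from the question, the two binary searches collapse into a simple difference:
import bisect

def preprocess(data):
    return [i for i, x in enumerate(data) if x == 1]   # positions of the 1s

def count_ones(ones, i, j):
    # number of 1s with index in [i, j)
    return bisect.bisect_left(ones, j) - bisect.bisect_left(ones, i)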
2. Take advantage of knowing all the queries up front (as mentioned by the OP in a comment to my other answer). Say Q = (Q1, Q2, ..., Qm) is the set of queries.
Process the queries, storing a map of start and end indices to the query. E.g., if Q1 = (12,92) then our map would include {92 => Q1, 12 => Q1}. This takes O(m) time and O(m) space. Take note of the smallest start index and the largest end index.
Process the input data, starting with the smallest start index. Keep track of the running sum. For each index, check your map of queries. If the index is in the map, associate the current running sum with the appropriate query.
At the end, each query will have two sums associated with it. Add one to the difference to get the answer.
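A rough Python sketch of this offline pass (again with the half-open sum(data[i:j]) convention, so no +1 adjustment is needed):
def answer_offline(data, queries):
    boundaries = set()
    for i, j in queries:                  # every index at which we must snapshot the sum
        boundaries.update((i, j))
    running, sum_at = 0, {}
    for idx, x in enumerate(data):
        if idx in boundaries:
            sum_at[idx] = running         # running == sum(data[0:idx])
        running += x
    sum_at[len(data)] = running           # allow queries that end at len(data)
    return [sum_at[j] - sum_at[i] for i, j in queries]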
Worst case analysis:
O(n) + O(m) time, O(m) space. However, this is across all queries. The amortized time cost per query is O(n/m). This is the same as my constant time solution (which required O(n) preprocessing).
I would probably go with something like this:
# boilerplate testdata
from itertools import chain, permutations
data = [0,0,0,0,0,0,0,1,1,1]
chained = list(chain(*permutations(data,5))) # increase 5 to 10 if you dare
Preprocessing:
frSet = frozenset([i for i in range(len(chained)) if chained[i]==1])
"Counting":
# O(min(len(frSet), len(frozenset(range(200,500)))))
summa = frSet.intersection(frozenset(range(200,500))) # use two sets for faster intersect
counted=len(summa)
"Sanity-Check"
print(sum([1 for x in frSet if x >= 200 and x<500]))
print(summa)
print(len(summa))
No edge cases needed; the intersection does all you need. Memory use is slightly higher because you store each index rather than ranges of ones. Performance depends on the intersection implementation.
This might be helpful: https://wiki.python.org/moin/TimeComplexity#set

Group numbers into closest groups

For example, I have the numbers 46, 47, 54, 58, 60, and 66. I want to group them in such a way as to make the largest possible group sizes. Numbers get grouped if their values are within plus or minus 10 (inclusive). So, depending on which number you start with, for this example there can be three possible outcomes (shown in the image).
What I want is the second possible outcome, which would occur if you started with 54, as the numbers within 44 to 64 would be grouped, leaving 66 by itself, and creating the largest group (5 items).
I realize I could easily brute force this example, but I really have a long list of numbers, so it needs to work across thousands of numbers. Can anyone tell me about algorithms I should be reading about or give me suggestions?
You can simply sort the array first. Then for every i-th number you can do a binary search to find the rightmost number that is within the i-th number + 20; let the position of that rightmost number be X. The answer is the largest (X - i + 1) over all i, and we are done :)
Runtime analysis: Runtime for this algorithm will be O(NlgN), where N is the number of items in the original array.
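In Python, that idea could look something like this (bisect_right finds the first value greater than nums[i] + 20):
import bisect

def largest_group_size(nums):
    nums = sorted(nums)
    best = 0
    for i, v in enumerate(nums):
        x = bisect.bisect_right(nums, v + 20) - 1   # rightmost index still within v + 20
        best = max(best, x - i + 1)
    return best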
A better solution: Let's assume we have the array ar[] and ar[] has N items.
sort ar[] in non-decreasing order
set max_result = 0, cur_index = 0, i = 0
increase i while i+1 < N and ar[i+1] <= ar[cur_index] + 20
set max_result to max(max_result, i - cur_index + 1)
set cur_index = cur_index + 1
if cur_index < N, repeat from the "increase i" step
Runtime Analysis: O(N), where N is the number of items in the array ar[] as cur_index will iterate through the array exactly once and i will iterate just once too.
Correctness: as the array is sorted in non-decreasing order, if i < j and j < k and ar[i] + 20 >= ar[k], then ar[j] + 20 >= ar[k] too. So we don't need to re-check items that have already been counted for a previous starting item.
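The linear scan itself (after sorting), sketched in Python:
def largest_group_size_linear(ar):
    ar = sorted(ar)
    max_result, cur_index, i = 0, 0, 0
    while cur_index < len(ar):
        while i + 1 < len(ar) and ar[i + 1] <= ar[cur_index] + 20:
            i += 1                                  # i only ever moves forward
        max_result = max(max_result, i - cur_index + 1)
        cur_index += 1
    return max_result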
This is what I wanted to do. Sorry I didn't explain myself very well. Each iteration finds the largest possible group, using the numbers that are left after removing the previous largest group. Matlab code:
function out=groupNums(y)
    d=10;
    out=[];
    if length(y)<=1   % also stops the recursion once no numbers are left
        out=y;
        return
    end
    group=[];
    for i=1:length(y)
        group{i}=find(y<=y(i)+d & y>=y(i)-d);   % indices within +/- d of y(i)
    end
    [~,idx]=max(cellfun(@length,group));        % largest candidate group
    out=[out,{y(group{idx})}];
    y(group{idx})=[];                           % remove it and recurse on the rest
    out=[out,groupNums(y)];

Sorting Linked list : random or nearly sorted?

I have a linked list and I want to check whether it is nearly sorted or random. Can anyone suggest how to do that?
Right now what I am trying to do is run over half of the list and compare adjacent elements to check whether the given list is nearly sorted or not. But the difficulty is that this method is not foolproof, and I want something concrete.
You can compute a score of how sorted the list is. For example, if you have 100 items, the scale runs from 0 to 100: if the whole list is sorted you get a score of 100, and if it is sorted backwards you get 0. Check each pair of adjacent elements (0th and 1st, 1st and 2nd, 2nd and 3rd, and so on) and decide whether the pair is in order; the number of in-order pairs gives you a scale between 0 and 100 (or the linked-list size, in your case). There are many possible heuristics for such a "sortedness scale", but this is one of them.
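As a small Python sketch of that score:
def sortedness_score(xs):
    # percentage of adjacent pairs that are already in order
    pairs = list(zip(xs, xs[1:]))
    if not pairs:
        return 100                    # a 0- or 1-element list is trivially sorted
    in_order = sum(1 for a, b in pairs if a <= b)
    return 100 * in_order / len(pairs)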
If you want to involve the amplitude of your data you can do (Python3):
import random
l = [random.random() for x in range(100)]
s = 0
for i, x in enumerate(l[0:50]):
    s += l[i + 1] - x          # accumulate the difference to the next element
print(s)
if you'd rather just look at how many values are sorted, replace the s+= line with
s += 1 if l[i+1] > x else 0

Random number generator that fills an interval

How would you implement a random number generator that, given an interval, (randomly) generates all numbers in that interval, without any repetition?
It should consume as little time and memory as possible.
Example in a just-invented C#-ruby-ish pseudocode:
interval = new Interval(0,9)
rg = new RandomGenerator(interval);
count = interval.Count // equals 10
count.times.do {
    print rg.GetNext() + " "
}
This should output something like:
1 4 3 2 7 5 0 9 8 6
Fill an array with the interval, and then shuffle it.
The standard way to shuffle an array of N elements is to pick a random index R between 0 and N-1 inclusive and swap item[R] with item[N-1]. Then decrease N by one and repeat until N = 1.
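A sketch of that in Python:
import random

def shuffled_interval(lo, hi):
    items = list(range(lo, hi + 1))
    n = len(items)
    while n > 1:
        r = random.randrange(n)                    # 0 <= r <= n-1
        items[r], items[n - 1] = items[n - 1], items[r]
        n -= 1
    return items

print(shuffled_interval(0, 9))                     # some permutation of 0..9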
This has come up before. Try using a linear feedback shift register.
One suggestion, but it's memory intensive:
The generator builds a list of all numbers in the interval, then shuffles it.
A very efficient way to shuffle an array of numbers where each index is unique comes from image processing and is used when applying techniques like pixel-dissolve.
Basically you start with an ordered 2D array and then shift columns and rows. These permutations are, by the way, easy to implement; you can even have one exact method that will yield the resulting value at (x, y) after n permutations.
The basic technique, described on a 3x3 grid:
1) Start with an ordered list, each number may exist only once
0 1 2
3 4 5
6 7 8
2) Pick a row/column you want to shuffle, advance it one step. In this case, I am shifting the second row one to the right.
0 1 2
5 3 4
6 7 8
3) Pick a row/column you want to shuffle... I shuffle the second column one down.
0 7 2
5 1 4
6 3 8
4) Pick ... For instance, first row, one to the right.
2 0 7
5 1 4
6 3 8
You can repeat those steps as often as you want. You can always do this kind of transformation also on a 1D array. So your result would be now [2, 0, 7, 5, 1, 4, 6, 3, 8].
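A small Python sketch of the two shift operations on a grid stored as a list of rows (the helper names are mine, not from the answer):
def shift_row_right(grid, r):
    grid[r] = grid[r][-1:] + grid[r][:-1]          # last element wraps to the front

def shift_col_down(grid, c):
    col = [row[c] for row in grid]
    col = col[-1:] + col[:-1]                      # bottom element wraps to the top
    for row, v in zip(grid, col):
        row[c] = v

grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
shift_row_right(grid, 1)                           # step 2 above
shift_col_down(grid, 1)                            # step 3 above
shift_row_right(grid, 0)                           # step 4 above
print([v for row in grid for v in row])            # [2, 0, 7, 5, 1, 4, 6, 3, 8]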
An occasionally useful alternative to the shuffle approach is to use a subscriptable set container. At each step, choose a random number 0 <= n < count. Extract the nth item from the set.
The main problem is that typical containers can't handle this efficiently. I have used it with bit-vectors, but it only works well if the largest possible member is reasonably small, due to the linear scanning of the bitvector needed to find the nth set bit.
99% of the time, the best approach is to shuffle as others have suggested.
EDIT
I missed the fact that a simple array is a good "set" data structure - don't ask me why, I've used it before. The "trick" is that you don't care whether the items in the array are sorted or not. At each step, you choose one randomly and extract it. To fill the empty slot (without having to shift an average half of your items one step down) you just move the current end item into the empty slot in constant time, then reduce the size of the array by one.
For example...
#include <vector>

class remaining_items_queue
{
private:
    std::vector<int> m_Items;
public:
    ...
    bool Extract (int &p_Item);  // return false if items already exhausted
};

bool remaining_items_queue::Extract (int &p_Item)
{
    if (m_Items.size () == 0)  return false;
    int l_Random = Random_Num (m_Items.size ());
    // Random_Num written to give 0 <= result < parameter
    p_Item = m_Items [l_Random];
    m_Items [l_Random] = m_Items.back ();  // move the last item into the gap
    m_Items.pop_back ();
    return true;
}
The trick is to get a random number generator that gives (with a reasonably even distribution) numbers in the range 0 to n-1 where n is potentially different each time. Most standard random generators give a fixed range. Although the following DOESN'T give an even distribution, it is often good enough...
int Random_Num (int p)
{
    return (std::rand () % p);
}
std::rand returns random values in the range 0 <= x <= RAND_MAX, where RAND_MAX is implementation-defined.
Take all numbers in the interval, put them to list/array
Shuffle the list/array
Loop over the list/array
One way is to generate an ordered list (0-9) in your example.
Then use the random function to select an item from the list. Remove the item from the original list and add it to the tail of the new one.
The process is finished when the original list is empty.
Output the new list.
You can use a linear congruential generator with parameters chosen randomly but so that it generates the full period. You need to be careful, because the quality of the random numbers may be bad, depending on the parameters.
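For example, a minimal sketch in Python: the modulus is rounded up to a power of two, out-of-range values are skipped, and the parameters are chosen to satisfy the Hull-Dobell full-period conditions for a power-of-two modulus (odd increment, multiplier congruent to 1 mod 4):
import random

def lcg_permutation(n):
    # visit 0 .. n-1 exactly once, in a scrambled (but not very random) order
    m = 1
    while m < n:                                   # smallest power of two >= n
        m *= 2
    a = (4 * random.randrange(m) + 1) % m if m >= 4 else 1
    c = random.randrange(m) | 1                    # any odd increment
    x = random.randrange(m)                        # arbitrary starting state
    for _ in range(m):
        x = (a * x + c) % m
        if x < n:                                  # skip values outside [0, n)
            yield x

print(list(lcg_permutation(10)))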
