Detect duplicate elements in subset of given list of numbers - algorithm

I have a set of numbers say : 1 1 2 8 5 6 6 7 8 8 4 2...
I want to detect the duplicate element in sub-array (of given size say k) of the above numbers... For example : Consider the increasing sub arrays for k=3`
Sub array 1 :{1,1,2}
Sub array 2 :{1,2,8}
Sub array 3 :{2,8,5}
Sub array 4 :{8,5,6}
Sub array 5 :{5,6,6}
Sub array 6 :{6,6,7}
....
So my algorithm should detect that sub-arrays 1, 5, and 6 contain duplicates..
My approach :
1)Copy the 1st k elements to a temporary array(vector)
2) using #include file in C++ STL...using unique() I would determine if there's any change in size of vector..
Any other clue how to approach this problem....because my method would consume lot of time and space if the list of the given number is large..

O(n) average time and O(k) space solution could be to build a hash-based histogram, and iterate the array while maintaining #occurances for elements in each sublist.
In each iteration, kick the eldest element out (by decreasing the relevant entrance in the histogram) and add a new element.
Also maintain a numDupes variable, which counts how many dupes you currently have and is maintained when adding/removing elements from the current candidate.
Pseudo code (sorry if I have off by 1 error or something, but this is the idea):
numDupes = 0
histogram = new map<int,int>;
//first set:
for each i form 0 to k:
if histogram.contains(arr[i]):
histogram.put(arr[i],histogram.get(arr[i]) + 1)
numDupes += 1
else:
histogram.put(arr[i],1)
//each iteration is for a new set
if (numDupes > 0) print 1 //first sub array has dupes
for each i from k to n:
if (histogram.get(arr[i-k]) > 1) numDupes -= 1 //we just removed a dupe
histogram.put(arr[i-k],histogram.get(arr[i-k] - 1)) //take off "eldest" element.
if (histogram.contains(arr[i]) && histogram.get(arr[i]) > 0):
histogram.put(arr[i],histogram,get(arr[i]) + 1))
numDupes += 1 //we just added a dupe
else:
histogram.put(arr[i],1)
if (numDupes > 0) print i-k+1 // the current sub array contains a dupe
The initial answer had a small mistake: It failקג to catch cases when the last element added did not cause the dupe, but there is still one (like sub array 6 in the example).
It can be solved by maintaining an extra integer that counts the number of current dupes found, and print the sub-array when the counter is greater then 0. (updated the pseudo code).
Also note: To achieve O(k) space, your histogram needs to delete elements when their value is 0.

Related

Replace two elements with their absolute difference and generate the minimum possible element in array

I have an array of size n and I can apply any number of operations(zero included) on it. In an operation, I can take any two elements and replace them with the absolute difference of the two elements. We have to find the minimum possible element that can be generated using the operation. (n<1000)
Here's an example of how operation works. Let the array be [1,3,4]. Applying operation on 1,3 gives [2,4] as the new array.
Ex: 2 6 11 3 => ans = 0
This is because 11-6 = 5 and 5-3 = 2 and 2-2 = 0
Ex: 20 6 4 => ans = 2
Ex: 2 6 10 14 => ans = 0
Ex: 2 6 10 => ans = 2
Can anyone tell me how can I approach this problem?
Edit:
We can use recursion to generate all possible cases and pick the minimum element from them. This would have complexity of O(n^2 !).
Another approach I tried is Sorting the array and then making a recursion call where the either starting from 0 or 1, I apply the operations on all consecutive elements. This will continue till their is only one element left in the array and we can return the minimum at any point in the recursion. This will have a complexity of O(n^2) but doesn't necessarily give the right answer.
Ex: 2 6 10 15 => (4 5) & (2 4 15) => (1) & (2 15) & (2 11) => (13) & (9). The minimum of this will be 1 which is the answer.
When you choose two elements for the operation, you subtract the smaller one from the bigger one. So if you choose 1 and 7, the result is 7 - 1 = 6.
Now having 2 6 and 8 you can do:
8 - 2 -> 6 and then 6 - 6 = 0
You may also write it like this: 8 - 2 - 6 = 0
Let"s consider different operation: you can take two elements and replace them by their sum or their difference.
Even though you can obtain completely different values using the new operation, the absolute value of the element closest to 0 will be exactly the same as using the old one.
First, let's try to solve this problem using the new operations, then we'll make sure that the answer is indeed the same as using the old ones.
What you are trying to do is to choose two nonintersecting subsets of initial array, then from sum of all the elements from the first set subtract sum of all the elements from the second one. You want to find two such subsets that the result is closest possible to 0. That is an NP problem and one can efficiently solve it using pseudopolynomial algorithm similar to the knapsack problem in O(n * sum of all elements)
Each element of initial array can either belong to the positive set (set which sum you subtract from), negative set (set which sum you subtract) or none of them. In different words: each element you can either add to the result, subtract from the result or leave untouched. Let's say we already calculated all obtainable values using elements from the first one to the i-th one. Now we consider i+1-th element. We can take any of the obtainable values and increase it or decrease it by the value of i+1-th element. After doing that with all the elements we get all possible values obtainable from that array. Then we choose one which is closest to 0.
Now the harder part, why is it always a correct answer?
Let's consider positive and negative sets from which we obtain minimal result. We want to achieve it using initial operations. Let's say that there are more elements in the negative set than in the positive set (otherwise swap them).
What if we have only one element in the positive set and only one element in the negative set? Then absolute value of their difference is equal to the value obtained by using our operation on it.
What if we have one element in the positive set and two in the negative one?
1) One of the negative elements is smaller than the positive element - then we just take them and use the operation on them. The result of it is a new element in the positive set. Then we have the previous case.
2) Both negative elements are smaller than the positive one. Then if we remove bigger element from the negative set we get the result closer to 0, so this case is impossible to happen.
Let's say we have n elements in the positive set and m elements in the negative set (n <= m) and we are able to obtain the absolute value of difference of their sums (let's call it x) by using some operations. Now let's add an element to the negative set. If the difference before adding new element was negative, decreasing it by any other number makes it smaller, that is farther from 0, so it is impossible. So the difference must have been positive. Then we can use our operation on x and the new element to get the result.
Now second case: let's say we have n elements in the positive set and m elements in the negative set (n < m) and we are able to obtain the absolute value of difference of their sums (again let's call it x) by using some operations. Now we add new element to the positive set. Similarly, the difference must have been negative, so x is in the negative set. Then we obtain the result by doing the operation on x and the new element.
Using induction we can prove that the answer is always correct.

Disperse Duplicates in an Array

Source : Google Interview Question
Write a routine to ensure that identical elements in the input are maximally spread in the output?
Basically, we need to place the same elements,in such a way , that the TOTAL spreading is as maximal as possible.
Example:
Input: {1,1,2,3,2,3}
Possible Output: {1,2,3,1,2,3}
Total dispersion = Difference between position of 1's + 2's + 3's = 4-1 + 5-2 + 6-3 = 9 .
I am NOT AT ALL sure, if there's an optimal polynomial time algorithm available for this.Also,no other detail is provided for the question other than this .
What i thought is,calculate the frequency of each element in the input,then arrange them in the output,each distinct element at a time,until all the frequencies are exhausted.
I am not sure of my approach .
Any approaches/ideas people .
I believe this simple algorithm would work:
count the number of occurrences of each distinct element.
make a new list
add one instance of all elements that occur more than once to the list (order within each group does not matter)
add one instance of all unique elements to the list
add one instance of all elements that occur more than once to the list
add one instance of all elements that occur more than twice to the list
add one instance of all elements that occur more than trice to the list
...
Now, this will intuitively not give a good spread:
for {1, 1, 1, 1, 2, 3, 4} ==> {1, 2, 3, 4, 1, 1, 1}
for {1, 1, 1, 2, 2, 2, 3, 4} ==> {1, 2, 3, 4, 1, 2, 1, 2}
However, i think this is the best spread you can get given the scoring function provided.
Since the dispersion score counts the sum of the distances instead of the squared sum of the distances, you can have several duplicates close together, as long as you have a large gap somewhere else to compensate.
for a sum-of-squared-distances score, the problem becomes harder.
Perhaps the interview question hinged on the candidate recognizing this weakness in the scoring function?
In perl
#a=(9,9,9,2,2,2,1,1,1);
then make a hash table of the counts of different numbers in the list, like a frequency table
map { $x{$_}++ } #a;
then repeatedly walk through all the keys found, with the keys in a known order and add the appropriate number of individual numbers to an output list until all the keys are exhausted
#r=();
$g=1;
while( $g == 1 ) {
$g=0;
for my $n (sort keys %x)
{
if ($x{$n}>1) {
push #r, $n;
$x{$n}--;
$g=1
}
}
}
I'm sure that this could be adapted to any programming language that supports hash tables
python code for algorithm suggested by Vorsprung and HugoRune:
from collections import Counter, defaultdict
def max_spread(data):
cnt = Counter()
for i in data: cnt[i] += 1
res, num = [], list(cnt)
while len(cnt) > 0:
for i in num:
if num[i] > 0:
res.append(i)
cnt[i] -= 1
if cnt[i] == 0: del cnt[i]
return res
def calc_spread(data):
d = defaultdict()
for i, v in enumerate(data):
d.setdefault(v, []).append(i)
return sum([max(x) - min(x) for _, x in d.items()])
HugoRune's answer takes some advantage of the unusual scoring function but we can actually do even better: suppose there are d distinct non-unique values, then the only thing that is required for a solution to be optimal is that the first d values in the output must consist of these in any order, and likewise the last d values in the output must consist of these values in any (i.e. possibly a different) order. (This implies that all unique numbers appear between the first and last instance of every non-unique number.)
The relative order of the first copies of non-unique numbers doesn't matter, and likewise nor does the relative order of their last copies. Suppose the values 1 and 2 both appear multiple times in the input, and that we have built a candidate solution obeying the condition I gave in the first paragraph that has the first copy of 1 at position i and the first copy of 2 at position j > i. Now suppose we swap these two elements. Element 1 has been pushed j - i positions to the right, so its score contribution will drop by j - i. But element 2 has been pushed j - i positions to the left, so its score contribution will increase by j - i. These cancel out, leaving the total score unchanged.
Now, any permutation of elements can be achieved by swapping elements in the following way: swap the element in position 1 with the element that should be at position 1, then do the same for position 2, and so on. After the ith step, the first i elements of the permutation are correct. We know that every swap leaves the scoring function unchanged, and a permutation is just a sequence of swaps, so every permutation also leaves the scoring function unchanged! This is true at for the d elements at both ends of the output array.
When 3 or more copies of a number exist, only the position of the first and last copy contribute to the distance for that number. It doesn't matter where the middle ones go. I'll call the elements between the 2 blocks of d elements at either end the "central" elements. They consist of the unique elements, as well as some number of copies of all those non-unique elements that appear at least 3 times. As before, it's easy to see that any permutation of these "central" elements corresponds to a sequence of swaps, and that any such swap will leave the overall score unchanged (in fact it's even simpler than before, since swapping two central elements does not even change the score contribution of either of these elements).
This leads to a simple O(nlog n) algorithm (or O(n) if you use bucket sort for the first step) to generate a solution array Y from a length-n input array X:
Sort the input array X.
Use a single pass through X to count the number of distinct non-unique elements. Call this d.
Set i, j and k to 0.
While i < n:
If X[i+1] == X[i], we have a non-unique element:
Set Y[j] = Y[n-j-1] = X[i].
Increment i twice, and increment j once.
While X[i] == X[i-1]:
Set Y[d+k] = X[i].
Increment i and k.
Otherwise we have a unique element:
Set Y[d+k] = X[i].
Increment i and k.

Anyone have a sort algorithm optimized for very slow write operations?

I need a sorting algorithm which operates on a single, pre-populated array, and which is limited to perform only one type of write operation:
O=Move an item X to index Y. The elements on subsequent positions are shifted 1 position.
The algorithm must be optimized for the least possible number of operations O. Read operations are infinitely cheaper than write operations. Temporary helper lists are also cheap.
Edit: It might be more correct to call it a linked list, because of its behaviour, although the implementation is hidden for me.
Background:
The thing is I'm working against a Google API which only allows me to perform this operation on their lists. The operation is a web service call. I want to minimize the number of calls. You can assume the sorting program (the client) has a copy of the list in memory before starting, so there is no need to perform read operations against the service - only write. You can of course also do any amount of temporary list actions locally before performing the service calls, including duplicating the list or using existing .NET sort functions.
How do I proceed here? Is there a known algorithm I can use here?
Failed attempt:
I have already implemented a dumb algorithm, but it is not optimal for all cases. It works well when the list is like this:
List A = [2,3,4,5,6,7,8,9,1]
It goes like this:
Is list sorted? No
Find element that belongs at position 0: "1"
Move Element "1" to position 0
(New list state A1: [1,2,3,4,5,6,7,8,9])
Is list sorted? Yes. End
...But when the list is like this, I get into trouble:
List B = [9,1,2,3,4,5,6,7,8]
Is list sorted? No
Find element that belongs at position 0: "1"
Move Element "1" to position 0
(New list state B1: [1,9,2,3,4,5,6,7,8])
Is list sorted? No
Find element that belongs at position 1: "2"
Move Element "2" to position 1
(New list state B2: [1,2,9,3,4,5,6,7,8])
...you can see where I'm going here...
Compute the longest increasing subsequence of the array. Perform a write operation for each element that is not present in the sequence.
EDIT: Adding an example
Let the numbers in the input array be 1 3 2 7 4 8 6 5 9. A longest increasing sequence is 1 2 4 6 9. When computing this sequence store the indexes of the elements that occur in the sequence. Then it is straightforward to travel through the original array and find the elements not present in the sequence. In this case they are 3 7 8 5. For each of these elements perform a write operation that places them in the appropriate position. So the sequence of modifications of the array would be:
1 2 3 7 4 8 6 5 9 (after writing 3 to appropriate position)
1 2 3 4 8 6 7 5 9 (after writing 7 to appropriate position)
1 2 3 4 6 7 5 8 9 (after writing 8 to appropriate position)
1 2 3 4 5 6 7 8 9 (after writing 5 to appropriate position)
Sort the array locally. Keep a copy of the original.
Compute the optimal edit sequence between the original array and the sorted version using LCS distance; that's the variant of Levenshtein distance where substitutions are not allowed. A simplified version of the dynamic programming algorithm for Levenshtein can be used to compute LCS distance. Look at the source code for programs such as diff to see how the edit sequence can be obtained from the dynamic programming table.
You now have an edit sequence, meaning a list of insertions and deletions to be performed to transform the original array into the sorted version. Perform the insertions. (You can skip the deletions since they'll be performed by your operation O, but be aware that indices into the arrays change because of that, so you'll have to compensate for that.)
Search for the longest sorted subsequence, and shift each unsorted element into it's correct position.
For your examples:
start: 2,3,4,5,6,7,8,9,1 (O = 0)
LSS: 2,3,4,5,6,7,8,9
step: 1,2,3,4,5,6,7,8,9 (O = 1)
start: 9,1,2,3,4,5,6,7,8 (O = 0)
LSS: 1,2,3,4,5,6,7,8
step: 1,2,3,4,5,6,7,8,9 (O = 1)
One of mine:
start: 9,3,1,7,2,8,5,6,4 (O = 0)
LSS: 1,2,5,6
step: 3,1,7,2,8,5,6,9,4 (O = 1)
LSS: 1,2,5,6,9
step: 1,7,2,3,8,5,6,9,4 (O = 2)
LSS: 1,2,3,5,6,9
step: 1,2,3,8,5,6,7,9,4 (O = 3)
LSS: 1,2,3,5,6,7,9
step: 1,2,3,5,6,7,8,9,4 (O = 4)
LSS: 1,2,3,5,6,7,8,9
step: 1,2,3,4,5,6,7,8,9 (O = 5)
You'll need an algorithm to identify the LSS. You only need to use it once, after you have it, you can just insert elements into it as you sort.
Pseudocode:
function O(oldindex, newindex):
# removes oldindex from list, shifts elements, inserts at newindex
function lss(list):
# identifies the LSS of a list and returns it in a cheap temporary list
function insert(index, element, list):
# inserts specified specified element into specified index in specified list
# elements at and after specified index are shifted down to make room
function sort(input):
lss_temp_list = lss(input) # get lss of input list
do until lss == input:
old = any(index where (input[index] not in lss)# item in input; not in lss
# getting new index is uglier
nl = min(X where (X > input[old] and X in lss))# next lowest element in lss
nh = max(X where (X < input[old] and X in lss))# next highest element in lss
new = any(index # index of next lowest/highest
where ((input[index + 1] == nl and nl exists)
or (input[index + 1] == nh and nh exists))
O(old, new) # list shift
il = min(index where (lss[index] > input[new]))# index of next lowest in lss
ih = max(index where (lss[index] < input[new]))# index of next highest in lss
i = any(X where (X == il or X == (ih + 1))) # index to insert element
insert(i, input[new], lss) # add new element to lss
repeat
return input
Apologies for the wonky pseudocode style, I was trying to make it narrow enough to not make the code block need a scroll bar.

Algorithm to count the number of valid blocks in a permutation [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Finding sorted sub-sequences in a permutation
Given an array A which holds a permutation of 1,2,...,n. A sub-block A[i..j]
of an array A is called a valid block if all the numbers appearing in A[i..j]
are consecutive numbers (may not be in order).
Given an array A= [ 7 3 4 1 2 6 5 8] the valid blocks are [3 4], [1,2], [6,5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], [7 3 4 1 2 6 5 8]
So the count for above permutation is 7.
Give an O( n log n) algorithm to count the number of valid blocks.
Ok, I am down to 1 rep because I put 200 bounty on a related question: Finding sorted sub-sequences in a permutation
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent matters. So, a group (78) can be represented as a structure (7, 8, false), while (34) would be (3,4,true). I am using Python's notation for tuples, but it is actually might be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if (max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contigous(gp1) and contiguos(gp2). This is not the only condition, for union(gp1, gp2) to be contiguous, because (14) and (23) combine into (14) nicely. This is a great question for algo class homework, but a terrible one for interview. I suspect this is homework.
Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n2) valid sub-blocks.
So, you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like (1 5 2 6 3 7 4 8), which seems to produce O(n2) behaviour. (1 4 7 2 5 8 3 6 9) would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worse case, but it seems that it requires other splitting techniques.
This question does involve a bit of a "math trick" but it's fairly straight forward once you get it. However, the rest of my solution won't fit the O(n log n) criteria.
The math portion:
For any two consecutive numbers their sum is 2k+1 where k is the smallest element. For three it is 3k+3, 4 : 4k+6 and for N such numbers it is Nk + sum(1,N-1). Hence, you need two steps which can be done simultaneously:
Create the sum of all the sub-arrays.
Determine the smallest element of a sub-array.
The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
My proposition
STEP = 2 // amount of examed number
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
CON <- 0
STEP <- 2
i <- 0
j <- 0
WHILE(STEP <= LEN(A)) DO
j <- STEP
WHILE(STEP <= LEN(A) - j) DO
IF(VALID(A,i,j)) DO
CON <- CON + 1
i <- j + 1
j <- j + STEP
ELSE
i <- i + 1
j <- j + 1
END
END
STEP <- STEP + 1
END
The valid method check that all elements are consecutive
Never tested but, might be ok
The original array doesn't contain duplicates so must itself be a consecutive block. Lets call this block (1 ~ n). We can test to see whether block (2 ~ n) is consecutive by checking if the first element is 1 or n which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that iX<i<iY<i. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases left as an exercise for the reader, as are those edge cases.) Set Count = Count - Eliminated.
Return Count.
The total cost appears to be N * (cost of step 2) = O(N).
WRINKLE: In step 2, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
struct
c_subblock
{
int index ; /* index into original array, head of subblock */
int width ; /* width of subblock > 0 */
int lo_value;
c_subblock * p_above ; /* null or subblock above with same index */
};
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subbblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
loop {
c_subblock * const p_left = Pop subblock from queue.
int const right_index = p_left.index + p_left.width;
if ( right_index < length original array ) {
// Find adjacent subblock on the right.
// To do this you'll need the original array of length-1 subblocks.
c_subblock const * p_right = array_basic_subblocks[ right_index ];
do {
Check the left/right subblocks to see if the two merged are also a subblock.
If they are add a new merged subblock to the end of the queue.
p_right = p_right.p_above;
}
while ( p_right );
}
}
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See note below.]

Random number generator that fills an interval

How would you implement a random number generator that, given an interval, (randomly) generates all numbers in that interval, without any repetition?
It should consume as little time and memory as possible.
Example in a just-invented C#-ruby-ish pseudocode:
interval = new Interval(0,9)
rg = new RandomGenerator(interval);
count = interval.Count // equals 10
count.times.do{
print rg.GetNext() + " "
}
This should output something like :
1 4 3 2 7 5 0 9 8 6
Fill an array with the interval, and then shuffle it.
The standard way to shuffle an array of N elements is to pick a random number between 0 and N-1 (say R), and swap item[R] with item[N]. Then subtract one from N, and repeat until you reach N =1.
This has come up before. Try using a linear feedback shift register.
One suggestion, but it's memory intensive:
The generator builds a list of all numbers in the interval, then shuffles it.
A very efficient way to shuffle an array of numbers where each index is unique comes from image processing and is used when applying techniques like pixel-dissolve.
Basically you start with an ordered 2D array and then shift columns and rows. Those permutations are by the way easy to implement, you can even have one exact method that will yield the resulting value at x,y after n permutations.
The basic technique, described on a 3x3 grid:
1) Start with an ordered list, each number may exist only once
0 1 2
3 4 5
6 7 8
2) Pick a row/column you want to shuffle, advance it one step. In this case, i am shifting the second row one to the right.
0 1 2
5 3 4
6 7 8
3) Pick a row/column you want to shuffle... I suffle the second column one down.
0 7 2
5 1 4
6 3 8
4) Pick ... For instance, first row, one to the left.
2 0 7
5 1 4
6 3 8
You can repeat those steps as often as you want. You can always do this kind of transformation also on a 1D array. So your result would be now [2, 0, 7, 5, 1, 4, 6, 3, 8].
An occasionally useful alternative to the shuffle approach is to use a subscriptable set container. At each step, choose a random number 0 <= n < count. Extract the nth item from the set.
The main problem is that typical containers can't handle this efficiently. I have used it with bit-vectors, but it only works well if the largest possible member is reasonably small, due to the linear scanning of the bitvector needed to find the nth set bit.
99% of the time, the best approach is to shuffle as others have suggested.
EDIT
I missed the fact that a simple array is a good "set" data structure - don't ask me why, I've used it before. The "trick" is that you don't care whether the items in the array are sorted or not. At each step, you choose one randomly and extract it. To fill the empty slot (without having to shift an average half of your items one step down) you just move the current end item into the empty slot in constant time, then reduce the size of the array by one.
For example...
class remaining_items_queue
{
private:
std::vector<int> m_Items;
public:
...
bool Extract (int &p_Item); // return false if items already exhausted
};
bool remaining_items_queue::Extract (int &p_Item)
{
if (m_Items.size () == 0) return false;
int l_Random = Random_Num (m_Items.size ());
// Random_Num written to give 0 <= result < parameter
p_Item = m_Items [l_Random];
m_Items [l_Random] = m_Items.back ();
m_Items.pop_back ();
}
The trick is to get a random number generator that gives (with a reasonably even distribution) numbers in the range 0 to n-1 where n is potentially different each time. Most standard random generators give a fixed range. Although the following DOESN'T give an even distribution, it is often good enough...
int Random_Num (int p)
{
return (std::rand () % p);
}
std::rand returns random values in the range 0 <= x < RAND_MAX, where RAND_MAX is implementation defined.
Take all numbers in the interval, put them to list/array
Shuffle the list/array
Loop over the list/array
One way is to generate an ordered list (0-9) in your example.
Then use the random function to select an item from the list. Remove the item from the original list and add it to the tail of new one.
The process is finished when the original list is empty.
Output the new list.
You can use a linear congruential generator with parameters chosen randomly but so that it generates the full period. You need to be careful, because the quality of the random numbers may be bad, depending on the parameters.

Resources