In random draw: how to ensure that a value is not re-drawn too soon - algorithm

When drawing at random from a set of values in succession, where a drawn value is allowed to
be drawn again, a given value naturally has a small chance of being drawn twice (or more) in immediate succession. That causes an issue for the purposes of a given application, and we would like to eliminate this chance. Any algorithmic ideas on how to do so (simple/efficient)?
Ideally we would like to set a threshold, say, as a percentage of the size of the data set:
Say the size of the set of values is N=100 and the threshold is T=10%; then if a given value is drawn in the current draw, it is guaranteed not to show up again in the next N*T=10 draws.
Obviously this restriction introduces bias into the random selection. We don't mind if a
proposed algorithm introduces further bias into the randomness of the selection; what really
matters for this application is that the selection is just random enough to appear so
to a human observer.
As an implementation detail, the values are stored as database records, so database table flags/values can be used, or maybe external memory structures. Answers about the abstract case are welcome too.
Edit:
I just hit this other SO question here, which has good overlap with my own. Going through the good points there.

Here's an implementation that does the whole process in O(1) per draw, without adding any bias beyond the suspension itself:
The idea is to treat the last K elements in the array A (which contains all the values) like a queue: we draw a random value from the first N-K values in A and swap it with the element at position N-Pointer, where Pointer marks the head of the queue; Pointer decrements after each draw and resets to K once it drops below 1.
To eliminate any bias in the first K draws, the random index is drawn between 1 and N-Pointer instead of 1 and N-K, so this virtual queue grows at each draw until reaching size K (e.g. after 3 draws the eligible values appear in A between indexes 1 and N-3, and the suspended values occupy indexes N-2 to N).
All operations are O(1) for drawing a single element, and no further bias is introduced throughout the process.
void DrawNumbers(val[] A, int K)
{
    N = A.size;
    random Rnd = new random;
    int Drawn_Index;
    int Count_To_K = 1;
    int Pointer = K;
    while (stop_drawing_condition)
    {
        if (Count_To_K <= K)
        {
            Drawn_Index = Rnd.NextInteger(1, N - Pointer);
            Count_To_K++;
        }
        else
        {
            Drawn_Index = Rnd.NextInteger(1, N - K);
        }
        Print("drawn value is: " + A[Drawn_Index]);
        Swap(A[Drawn_Index], A[N - Pointer]);
        Pointer--;
        if (Pointer < 1) Pointer = K;
    }
}
My previous suggestion, using a list and an actual queue, depends on the list's remove method, which I believe is at best O(log N) (e.g. by using an array to implement a self-balancing binary tree), since the list has to allow direct access by index.
void DrawNumbers(list N, int K)
{
    queue Suspended_Values = new queue;
    random Rnd = new random;
    int Drawn_Index;
    while (stop_drawing_condition)
    {
        if (Suspended_Values.count == K)
            N.add(Suspended_Values.Dequeue());
        Drawn_Index = Rnd.NextInteger(1, N.size); // random integer between 1 and the number of values in N
        Print("drawn value is: " + N[Drawn_Index]);
        Suspended_Values.Enqueue(N[Drawn_Index]);
        N.Remove(Drawn_Index);
    }
}

I assume you have an array, A, that contains the items you want to draw. At each time period you randomly select an item from A.
You want to prevent any given item, i, from being drawn again within some k iterations.
Let's say that your threshold is 10% of A.
So create a queue, call it drawn, that can hold threshold items. Also create a hash table that contains the drawn items. Call the hash table hash.
Then:
do
{
    i = Get random item from A
    if (i in hash)
    {
        // we have drawn this item recently. Don't draw it.
        continue;
    }
    draw(i);
    if (drawn.count == k)
    {
        // remove oldest item from queue
        temp = drawn.dequeue();
        // and from the hash table
        hash.remove(temp);
    }
    // add new item to queue and hash table
    drawn.enqueue(i);
    hash.add(i);
} while (forever);
The hash table exists solely to increase lookup speed. You could do without the hash table if you're willing to do a sequential search of the queue to determine if an item has been drawn recently.
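For concreteness, here's a minimal runnable Python sketch of this queue-plus-hash-set rejection scheme (draw_stream and the variable names are mine, not from the answer; it assumes k is smaller than the number of distinct items, otherwise the rejection loop cannot terminate):

import random
from collections import deque

def draw_stream(A, k, n_draws):
    recent_q = deque()   # FIFO of the k most recently drawn items
    recent = set()       # same items, for O(1) membership tests
    out = []
    while len(out) < n_draws:
        i = random.choice(A)
        if i in recent:
            continue                 # drawn too recently: reject and retry
        out.append(i)
        if len(recent_q) == k:
            recent.discard(recent_q.popleft())   # retire the oldest item
        recent_q.append(i)
        recent.add(i)
    return out

print(draw_stream(list(range(100)), 10, 20))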

Say you have n items in your list, and you don't want any of the k last items to be selected.
Select at random from an array of size n-k, and use a queue of size k to stick the items you don't want to draw (adding to the front and removing from the back).
All operations are O(1).
---- clarification ----
Give n items, and a goal of not redrawing any of the last k draws, create an array and queue as follows.
Create an array A of size n-k, and put n-k of your items in the list (chosen at random, or seeded however you like).
Create a queue (linked list) Q and populate it with the remaining k items, again in random order or whatever order you like.
Now, each time you want to select an item at random:
Choose a random index from your array, call this i.
Give A[i] to whoever is asking for it, and add it to the front of Q.
Remove the element from the back of Q, and store it in A[i].
Everything is O(1) after the array and linked list are created, which is a one-time O(n) operation.
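Under the stated assumptions (k < n, a one-time shuffle to seed A and Q), a possible Python rendering of this scheme looks like the following; make_drawer and the variable names are illustrative, not from the answer:

import random
from collections import deque

def make_drawer(items, k):
    items = list(items)
    random.shuffle(items)               # one-time O(n) seeding
    A = items[:len(items) - k]          # eligible pool, size n-k
    Q = deque(items[len(items) - k:])   # the k suspended items

    def draw():
        i = random.randrange(len(A))    # choose a random index in A
        drawn = A[i]
        Q.appendleft(drawn)             # suspend the drawn item at the front
        A[i] = Q.pop()                  # the oldest suspended item re-enters A
        return drawn

    return draw

draw = make_drawer(range(100), 10)
print([draw() for _ in range(15)])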
Now, you might wonder, what do we do if we want to change n (i.e. add or remove an element).
Each time we add an element, we either want to grow the size of A or of Q, depending on our logic for deciding what k is (i.e. fixed value, fixed fraction of n, whatever...).
If Q increases then the result is trivial, we just append the new element to Q. In this case I'd probably append it to the end of Q so that it gets in play ASAP. You could also put it in A, kicking some element out of A and appending it to the end of Q.
If A increases, you can use a standard technique for increasing arrays in amortized constant time. E.g., each time A fills up, we double it in size, and keep track of the number of cells of A that are live. (look up 'Dynamic Arrays' in Wikipedia if this is unfamiliar).

Set-based approach:
If the threshold is low (say below 40%), the suggested approach is:
Have a set and a queue of the last N*T generated values.
When generating a value, keep regenerating it until it's not contained in the set.
When pushing to the queue, pop the oldest value and remove it from the set.
Pseudo-code:
generateNextValue:
    // once we've generated more than N*T elements,
    // we need to start removing old elements
    if queue.size >= N*T
        element = queue.pop
        set.remove(element)
    // keep trying to generate random values until one is not contained in the set
    do
        value = getRandomValue()
    while set.contains(value)
    set.add(value)
    queue.push(value)
    return value
If the threshold is high, you can just turn the above on its head:
Have the set represent all values not in the last N*T generated values.
Invert all set operations (replace all set adds with removes and vice versa and replace the contains with !contains).
Pseudo-code:
generateNextValue:
    if queue.size >= N*T
        element = queue.pop
        set.add(element)
    // we can now just get a random value from the set, as it contains all candidates,
    // rather than generating random values until we find one that works
    value = getRandomValueFromSet()
    //do
    //    value = getRandomValue()
    //while !set.contains(value)
    set.remove(value)
    queue.push(value)
    return value
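One practical note: getRandomValueFromSet needs a set that supports uniform random sampling, which plain hash sets don't give you in O(1). A standard trick is a list plus an index map; here's a hedged Python sketch (SampleSet and its method names are mine):

import random

class SampleSet:
    """Set with O(1) add, remove, and uniform random sampling."""
    def __init__(self, values=()):
        self.items = list(values)
        self.pos = {v: i for i, v in enumerate(self.items)}

    def add(self, v):
        if v not in self.pos:
            self.pos[v] = len(self.items)
            self.items.append(v)

    def remove(self, v):
        i = self.pos.pop(v)
        last = self.items.pop()
        if i < len(self.items):
            self.items[i] = last    # fill the hole with the last element
            self.pos[last] = i

    def sample(self):
        return random.choice(self.items)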
Shuffle-based approach: (somewhat more complicated than the above)
If the threshold is high, the above may take long, as it could keep generating values that already exist.
In this case, some shuffle-based approach may be a better idea.
Shuffle the data.
Repeatedly process the first element.
When doing so, remove it and insert it back at a random position in the range [N*T, N].
Example:
Let's say N*T = 5 and all possible values are [1,2,3,4,5,6,7,8,9,10].
Then we first shuffle, giving us, let's say, [4,3,8,9,2,6,7,1,10,5].
Then we remove 4 and insert it back in some index in the range [5,10] (say at index 5).
Then we have [3,8,9,2,4,6,7,1,10,5].
And continue removing the next element and insert it back, as required.
Implementation:
An array is fine if we don't care much about efficiency; getting one element will cost O(n) time.
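As a hedged illustration of that naive array version (the generator name and parameters are mine; nt is the N*T cutoff):

import random

def shuffled_drawer(values, nt):
    A = list(values)
    random.shuffle(A)
    while True:
        drawn = A.pop(0)     # take the first element: O(n)
        # reinsert at a random position in [N*T, N] (1-based), as in the example
        A.insert(random.randint(nt - 1, len(A)), drawn)
        yield drawn

gen = shuffled_drawer(range(1, 11), 5)   # N = 10, N*T = 5
print([next(gen) for _ in range(12)])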
To make this efficient we need to use an ordered data structure that supports efficient random position inserts and first position removals. The first thing that comes to mind is a (self-balancing) binary search tree, ordered by index.
We won't store the actual index; the index will be implicitly defined by the structure of the tree.
At each node we will have a count of children (+ 1 for itself) (which needs to be updated on insert / remove).
An insert can be done as follows: (ignoring the self-balancing part for the moment)
// calling function
insert(node, value):
    insert(node, N*T, value)

insert(node, offset, value):
    // node.left / node.right can be defined as 0 if the child doesn't exist
    leftCount = node.left.count - offset
    rightCount = node.right.count
    // Since we're here, it means we're inserting in this subtree,
    // thus update the count
    node.count++
    // Nodes to the left are within N*T, so simply go right
    // leftCount is the difference between N*T and the number of nodes on the left,
    // so this needs to be the new offset (and +1 for the current node)
    if leftCount < 0
        insert(node.right, -leftCount+1, value)
    else
        // generate a random number,
        // on [0, leftCount), insert to the left
        // on [leftCount, leftCount], insert at the current node
        // on (leftCount, leftCount + rightCount], insert to the right
        sum = leftCount + rightCount + 1
        random = getRandomNumberInRange(0, sum)
        if random < leftCount
            insert(node.left, offset, value)
        else if random == leftCount
            // we don't actually want to update the count here
            node.count--
            newNode = new Node(value)
            newNode.count = node.count + 1
            // TODO: swap node and newNode's data so that node's parent will now point to newNode
            newNode.right = node
            newNode.left = null
        else
            insert(node.right, -leftCount+1, value)
To visualize inserting at the current node:
If we have something like:
      4
     /
    1
   / \
  2   3
If we want to insert 5 where 1 is now, it will do this:
      4
     /
    5
     \
      1
     / \
    2   3
Note that when a red-black tree, for example, performs operations to keep itself balanced, none of these involve comparisons, so it doesn't need to know the order (i.e. index) of any already-inserted elements. But it will have to update the counts appropriately.
The overall efficiency will be O(log n) to get one element.

I'd put all "values" into a "list" of size N, then shuffle the list and retrieve values from the top of the list. Then you "insert" the retrieved value at a random position with any index >= N*T.
Unfortunately I'm not truly a math-guy :( So I simply tried it (in VB, so please take it as pseudocode ;) )
Public Class BiasedRandom
    Private prng As New Random
    Private offset As Integer
    Private l As New List(Of Integer)

    Public Sub New(ByVal size As Integer, ByVal threshold As Double)
        If threshold <= 0 OrElse threshold >= 1 OrElse size < 1 Then Throw New System.ArgumentException("Check your params!")
        offset = size * threshold
        ' initial fill
        For i = 0 To size - 1
            l.Add(i)
        Next
        ' shuffle "Algorithm P"
        For i = size - 1 To 1 Step -1
            Dim j = prng.Next(0, i + 1)
            Dim tmp = l(i)
            l(i) = l(j)
            l(j) = tmp
        Next
    End Sub

    Public Function NextValue() As Integer
        Dim tmp = l(0)
        l.RemoveAt(0)
        l.Insert(prng.Next(offset, l.Count + 1), tmp)
        Return tmp
    End Function
End Class
Then a simple check:
Public Class Form1
    Dim z As Integer = 10
    Dim k As BiasedRandom

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        k = New BiasedRandom(z, 0.5)
    End Sub

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim j(z - 1) As Integer
        For i = 1 To 10 * 1000 * 1000
            j(k.NextValue) += 1
        Next
        Stop
    End Sub
End Class
And when I check the distribution, it looks okay enough to the naked eye ;)
EDIT:
After thinking about RonTeller's argumentation, I have to admit that he is right. I don't think there is a performance-friendly way to achieve the goal while keeping the random order no more biased than required.
I came to the following idea:
Given a list (array whatever) like this:
0123456789 ' not shuffled to make clear what I mean
We return the first element, which is 0. This one must not come up again for 4 more draws (as an example), but we also want to avoid a strong bias. Why not simply put it at the end of the list and then shuffle the "tail" of the list, i.e. the last 6 elements?
1234695807
We now return the 1 and repeat the above steps.
2340519786
And so on. Since removing and inserting is unnecessary work, one could use a simple array and a "pointer" to the current element. I have changed the code from above to give a sample. It's slower than the first one, but should avoid the mentioned bias.
Public Function NextValue() As Integer
    Static current As Integer = 0
    ' only shuffling a part of the list
    For i = current + l.Count - 1 To current + 1 + offset Step -1
        Dim j = prng.Next(current + offset, i + 1)
        Dim tmp = l(i Mod l.Count)
        l(i Mod l.Count) = l(j Mod l.Count)
        l(j Mod l.Count) = tmp
    Next
    current += 1
    Return l((current - 1) Mod l.Count)
End Function
EDIT 2:
Finally (hopefully), I think the solution is quite simple. The below code assumes that there is an array of N elements called TheArray which contains the elements in random order (could be rewritten to work with sorted array). The value DelaySize determines how long a value should be suspended after it has been drawn.
Public Function NextValue() As Integer
    Static current As Integer = 0
    Dim SelectIndex As Integer = prng.Next(0, TheArray.Count - DelaySize)
    Dim ReturnValue = TheArray(SelectIndex)
    TheArray(SelectIndex) = TheArray(TheArray.Count - 1 - current Mod DelaySize)
    TheArray(TheArray.Count - 1 - current Mod DelaySize) = ReturnValue
    current += 1
    Return ReturnValue
End Function

Related

Divide an odd size array into two equal sets of same size and same sum after deleting any one element from the array

Given an array of odd size, you have to delete any one element from the array and then find whether it is possible to divide the remaining even-size array into two sets of equal size having the same sum of their elements. It is mandatory to remove one element from the array.
So here I am assuming that it is necessary to remove 1 element from the array.
Please look at the code snippet below.
// n: number of elements, arr: the input array, sum: total of arr (all globals)
int solve(int idx, int s, int cntr, int val) {
    if (idx == n)
        return (cntr != 1) ? INT_MAX : abs((sum - val) - 2 * s);
    int ans = INT_MAX;
    // option 1: delete this element (only allowed if none was deleted yet)
    if (cntr == 0)
        ans = min(ans, solve(idx + 1, s, cntr + 1, arr[idx]));
    // option 2: put this element into bucket s
    // option 3: leave it for the other half
    ans = min(ans, min(solve(idx + 1, s + arr[idx], cntr, val),
                       solve(idx + 1, s, cntr, val)));
    return ans;
}
Here sum is the total sum of the original array, val is the value of the element you delete, and cntr keeps track of whether any value has been removed from the array yet.
So the algorithm goes like this: forget that you need to delete any value; the problem then becomes whether it is possible to divide the array into 2 equal-sum halves. We can think of this as dividing the array into 2 parts such that abs(sum - 2*sum_of_any_half_part) is minimized. With this idea, say we initially have a bucket s which holds the part of the array we are concerned about; at each step we can either put an element into this part or leave it for the other part.
Now if we introduce the deletion into this problem, only one small change is required: at each step, instead of 2 options you have 3.
1) Delete this particular element, then set cntr to 1 and val to the value of the element at that index in the array.
2) Do nothing with this element; this is equal to putting it into the other bucket/half.
3) Put this element into bucket s, i.e. increase the value of s by arr[idx].
Now recursively check which gives the best result.
P.S. Look at the base case in the code snippet to get a better idea.
In the end, if the above solve function returns ans = 0, then that means yes, we can divide the array into 2 equal-sum parts after deleting any one element.
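To make the three choices concrete, here is a hedged Python re-statement of the same recursion (the function names are mine; INT_MAX becomes math.inf). For [1, 2, 3, 4, 5], deleting 3 leaves {1, 5} and {2, 4}, so the function returns 0:

from math import inf

def min_diff_after_delete(arr):
    total, n = sum(arr), len(arr)

    def solve(idx, s, deleted, val):
        if idx == n:
            return abs((total - val) - 2 * s) if deleted == 1 else inf
        best = inf
        if deleted == 0:                                              # option 1: delete it
            best = min(best, solve(idx + 1, s, 1, arr[idx]))
        best = min(best, solve(idx + 1, s + arr[idx], deleted, val))  # option 2: into bucket s
        best = min(best, solve(idx + 1, s, deleted, val))             # option 3: other half
        return best

    return solve(0, 0, 0, 0)

print(min_diff_after_delete([1, 2, 3, 4, 5]))  # 0 -> a valid division exists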
Hope this helps.

Algorithm to find matching real values in a list

I have a complex algorithm which calculates the result of a function f(x). In the real world f(x) is a continuous function. However due to rounding errors in the algorithm this is not the case in the computer program. The following diagram gives an example:
Furthermore I have a list of several thousand values Fi.
I am looking for all the x values that match an Fi value, i.e. f(xi)=Fi.
I can solve this problem by simply iterating through the x values, as in the following pseudo code:
for i=0 to NumberOfChecks-1 do
begin
    //calculate the function result with the algorithm
    x=i*(xmax-xmin)/NumberOfChecks;
    FunctionResult=CalculateFunctionResultWithAlgorithm(x);
    //loop through the value list to see if the function result matches a value in the list
    for j=0 to NumberOfValuesInTheList-1 do
    begin
        if Abs(FunctionResult-ListValues[j])<Epsilon then
        begin
            //mark that element j of the list matches
            //and store the corresponding x value in the list
        end
    end
end
Of course it is necessary to use a high number of checks. Otherwise I will miss some x values. The higher the number of checks the more complete and accurate is the result. It is acceptable that the list is 90% or 95% complete.
The problem is that this brute force approach takes too much time. As I mentioned before the algorithm for f(x) is quite complex and with a high number of checks it takes too much time.
What would be a better solution for this problem?
Another way to do this is in two parts: generate all of the results, sort them, and then merge with the sorted list of existing results.
First step is to compute all of the results and save them along with the x value that generated them. That is:
results = list of <x, result>
for i = 0 to numberOfChecks
    //calculate the function result with the algorithm
    x = i*(xmax-xmin)/NumberOfChecks;
    FunctionResult = CalculateFunctionResultWithAlgorithm(x);
    results.Add(x, FunctionResult)
end for
Now, sort the results list by FunctionResult, and also sort the ListValues array.
You now have two sorted lists that you can move through linearly:
i = 0, j = 0;
while (i < results.length && j < ListValues.length)
{
    diff = ListValues[j] - results[i];
    if (Abs(diff) < Epsilon)
    {
        // mark this one with the x value
        // and move to the next result
        i = i + 1
    }
    else if (diff > 0)
    {
        // list value is much larger than result. Move to next result.
        i = i + 1
    }
    else
    {
        // list value is much smaller than result. Move to next list value.
        j = j + 1
    }
}
Sort the list, producing an array SortedListValues that contains
the sorted ListValues and an array SortedListValueIndices that
contains the index in the original array of each entry in
SortedListValues. You only actually need the second of these and
you can create both of them with a single sort by sorting an array
of tuples of (value, index) using value as the sort key.
Iterate over your range in 0..NumberOfChecks-1 and compute the
value of the function at each step, and then use a binary chop
method to search for it in the sorted list.
Pseudo-code:
// sort as described above
SortedListValueIndices = sortIndices(ListValues);
for i=0 to NumberOfChecks-1 do
begin
    //calculate the function result with the algorithm
    x=i*(xmax-xmin)/NumberOfChecks;
    FunctionResult=CalculateFunctionResultWithAlgorithm(x);
    // do a binary chop to find the closest element in the list
    highIndex = NumberOfValuesInTheList-1;
    lowIndex = 0;
    while true do
    begin
        if Abs(FunctionResult-ListValues[SortedListValueIndices[lowIndex]])<Epsilon then
        begin
            // find all elements in the range that match, breaking out
            // of the loop as soon as one doesn't
            for j=lowIndex to NumberOfValuesInTheList-1 do
            begin
                if Abs(FunctionResult-ListValues[SortedListValueIndices[j]])>=Epsilon then
                    break
                //mark that element SortedListValueIndices[j] of the list matches
                //and store the corresponding x value in the list
            end
            // break out of the binary chop loop
            break
        end
        // break out of the loop once the indices match
        if highIndex <= lowIndex then
            break
        // do the binary chop searching, adjusting the indices:
        middleIndex = (lowIndex + 1 + highIndex) / 2;
        if ListValues[SortedListValueIndices[middleIndex]] < FunctionResult then
            lowIndex = middleIndex;
        else
        begin
            highIndex = middleIndex;
            lowIndex = lowIndex + 1;
        end
    end
end
Possible complications:
The binary chop isn't taking the epsilon into account. Depending on
your data this may or may not be an issue. If it is acceptable that
the list is only 90 or 95% complete this might be ok. If not then
you'll need to widen the range to take it into account.
I've assumed you want to be able to match multiple x values for each FunctionResult. If that's not necessary you can simplify the code.
Naturally this depends very much on the data, and especially on the numeric distribution of Fi. Another problem is that f(x) looks very jumpy, which eliminates any "assumption of nearby values".
But one could optimise the search.
Picture below.
Walking through F(x) at sufficient granularity, define a rough min (red line) and max (green line), using a suitable tolerance (the "air" or "gap" in between). The area between min and max is "AREA".
See where each Fi value hits AREA, and make a stacked marking ("MARKING") on the X axis accordingly (this can be multiple segments of X).
Where many MARKINGs sit on top of each other (a higher sum; the vertical black "sum" arrows), do dense hit tests, increasing the overall chance of getting as many hits as possible. Elsewhere do sparser tests.
Tighten this scheme (decrease the tolerance) as much as you dare.
EDIT: Fi is a bit confusing. Is it an ordered array or does it have random order (as I assumed)?
Jim Mischel's solution runs in O(i+j), instead of the O(i*j) of the solution you currently have. But there is a (very) minor bug in his code. The correct code would be:
diff = ListValues[j] - results[i]; //no abs() here
if (abs(diff) < Epsilon) //add abs() here
{
    // mark this one with the x value
    // and move to the next result
    i = i + 1
}
The best methods will rely on the nature of your function f(x).
The best solution is if you can invert F(x) and use the inverse directly.
As you said, F(x) is continuous:
therefore you can start by evaluating a small number of widely spaced points, then find ranges that make sense, and refine your "assumption" for the x with f(x)=Fi (see the sketch below).
It is not bulletproof, but it is an option.
e.g. Fi=5.7; f(1)=1.4, f(4)=4, f(16)=12.6, f(10)=10.1, f(7)=6.5, f(5)=5.1, f(6)=5.8; you can take 5 < x < 7.
Along the same lines as #1: if F(x) is hard to calculate, you can use interpolation, and then evaluate F(x) only at the values that are probable.
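For point 1, the range-refinement step could be a plain bisection once a bracket like 5 < x < 7 is found; a minimal Python sketch, assuming f crosses Fi exactly once on the bracket (the names are mine):

def refine(f, target, lo, hi, eps=1e-9):
    # classic bisection on f(x) - target; needs a sign change on [lo, hi]
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if (f(lo) - target) * (f(mid) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(refine(lambda x: x * x, 2.0, 0.0, 2.0))  # ~1.41421, the square root of 2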

Number of distinct sequences of fixed length which can be generated using a given set of numbers

I am trying to find the different sequences of fixed length which can be generated using the numbers from a given set (distinct elements) such that each element from the set appears in the sequence. Below is my logic:
e.g. let the set consist of S elements, and we have to generate sequences of length K (K >= S).
1) First we have to choose S places out of K and place each element from the set in random order. So, C(K,S)*S!
2) After that, the remaining places can be filled with any values from the set. So, the factor
(K-S)^S should be multiplied.
So, overall result is
C(K,S)S!((K-S)^S)
But, I am getting wrong answer. Please help.
PS: C(K,S) : No. of ways selecting S elements out of K elements (K>=S) irrespective of order. Also, ^ : power symbol i.e 2^3 = 8.
Here is my code in python:
# m is the no. of elements to select from a set of n elements
# fact is a list containing factorial values, i.e. fact[0] = 1, fact[3] = 6 and so on.
def ways(m, n):
    res = fact[n] / fact[n-m+1] * ((n-m)**m)
    return res
What you are looking for is the number of surjective functions whose domain is a set of K elements (the K positions that we are filling out in the output sequence) and whose image is a set of S elements (your input set). I think this should work:
static int Count(int K, int S)
{
    int sum = 0;
    for (int i = 1; i <= S; i++)
    {
        sum += Pow(-1, (S-i)) * Fact(S) / (Fact(i) * Fact(S - i)) * Pow(i, K);
    }
    return sum;
}
...where Pow and Fact are what you would expect.
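For reference, the quantity this loop computes is the standard inclusion-exclusion count of surjections; in LaTeX notation (my addition, not from the answer):

\sum_{i=0}^{S} (-1)^{S-i} \binom{S}{i} i^K = S! \, {K \brace S}

where {K \brace S} is a Stirling number of the second kind. The i = 0 term vanishes for K >= 1, which is why the loop can start at i = 1.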
Check out this math.se question.
Here's why your approach won't work. I didn't check the code, just your explanation of the logic behind it, but I'm pretty sure I understand what you're trying to do. Let's take for example K = 4, S = {7,8,9}. Let's examine the sequence 7,8,9,7. It is a unique sequence, but you can get to it by:
Randomly choosing positions 1,2,3, filling them randomly with 7,8,9 (your step 1), then randomly choosing 7 for the remaining position 4 (your step 2).
Randomly choosing positions 2,3,4, filling them randomly with 8,9,7 (your step 1), then randomly choosing 7 for the remaining position 1 (your step 2).
By your logic, you will count it both ways, even though it should be counted only once as the end result is the same. And so on...

Data structure for set of (non-disjoint) sets

I'm looking for a data structure that roughly corresponds to (in Java terms) Map<Set<int>, double>. Essentially a set of sets of labeled marbles, where each set of marbles is associated with a scalar. I want it to be able to efficiently handle the following operations:
Add a given integer to every set.
Remove every set that contains (or does not contain) a given integer, or at least set the associated double to 0.
Union two of the maps, adding together the doubles for sets that appear in both.
Multiply all of the doubles by a given double.
Rarely, iterate over the entire map.
under the following conditions:
The integers will fall within a constrained range (between 1 and 10,000 or so); the exact range will be known at compile-time.
Most of the integers within the range (80-90%) will never be used, but which ones cannot be easily determined until the end of the calculation.
The number of integers used will almost always still be over 100.
Many of the sets will be very similar, differing only by a few elements.
It may be possible to identify certain groups of integers that frequently appear only in sequential order: for example, if a set contains the integers 27 and 29 then it (almost?) certainly contains 28 as well.
It may be possible to identify these groups prior to running the calculation.
These groups would typically have 100 or so integers.
I've considered tries, but I don't see a good way to handle the "remove every set that contains a given integer" operation.
The purpose of this data structure would be to represent discrete random variables and permit addition, multiplication, and scalar multiplication operations on them. Each of these discrete random variables would ultimately have been created by applying these operations to a fixed (at compile-time) set of independent Bernoulli random variables (i.e. each takes the value 1 or 0 with some probability).
The systems being modeled are close to being representable as time-inhomogeneous Markov chains (which would of course simplify this immensely) but, unfortunately, it is essential to track the duration since various transitions.
Here's a data structure that can do all of your operations pretty efficiently:
I'm going to refer to it as a BitmapArray for this explanation.
Thinking about it, for just the operations you have described, a sorted array with bitmaps as keys and weights (your doubles) as values will be pretty efficient.
The bitmaps are what maintain membership in your set. Since you said the range of integers in the set are between 1-10,000, we can maintain information about any set with a bitmap of length 10,000.
It's gonna be tough sorting an array where the keys can be as big as 2^10000, but you can be smart about implementing the comparison function in the following way:
Iterate from left to right on the two bitmaps
XOR the bits on each index
Say you get a 1 at ith position
Whichever bitmap has 1 at ith position is greater
If you never get a 1 they're equal
I know this is still a slow comparison.
But not too slow, Here's a benchmark fiddle I did on bitmaps with length 10000.
This is in Javascript, if you're going to write in Java, it's going to perform even better.
function runTest() {
    var num = document.getElementById("txtValue").value;
    num = isNaN(num * 1) ? 0 : num * 1;
    /* For integers in the range 1-10,000 the worst case for comparison is any
       two equal integers, which causes the comparison to iterate over the whole BitArray */
    var bitmap1 = convertToBitmap(10000, num);
    var bitmap2 = convertToBitmap(10000, num);
    var before = Date.now(); // Date.now(), not getMilliseconds(), for elapsed-time measurement
    var result = firstIsGreater(bitmap1, bitmap2, 10000);
    var after = Date.now();
    alert(result + " in time: " + (after - before) + " ms");
}

function convertToBitmap(size, number) {
    var bits = new Array();
    var q = number;
    do {
        bits.push(q % 2);
        q = Math.floor(q / 2);
    } while (q > 0);
    var xbitArray = new Array();
    for (var i = 0; i < size; i++) {
        xbitArray.push(0);
    }
    var j = xbitArray.length - 1;
    for (var i = bits.length - 1; i >= 0; i--) {
        xbitArray[j] = bits[i];
        j--;
    }
    return xbitArray;
}

function firstIsGreater(bitArray1, bitArray2, lengthOfArrays) {
    for (var i = 0; i < lengthOfArrays; i++) {
        if (bitArray1[i] ^ bitArray2[i]) {
            if (bitArray1[i]) return true;
            else return false;
        }
    }
    return false;
}

document.getElementById("btnTest").onclick = function (e) {
    runTest();
};
Also, remember that you only have to do this once, when building your BitmapArray (or while taking unions) and then it's going to become pretty efficient for the operations you'd do most often:
Note: N is the length of the BitmapArray.
Add integer to every set: Worst/best case O(N) time. Flip a 0 to 1 in each bitmap.
Remove every set that contains a given integer: Worst case O(N) time.
For each bitmap, check the bit that represents the given integer; if it is 1, mark its index.
Compress the array by deleting all marked indices.
If you're okay with just setting the weights to 0, it'll be even more efficient. This also makes it very easy to remove all sets that share any element with a given set.
Union of two maps: Worst case O(N1+N2) time. Just like merging two sorted arrays, except you have to be smart about comparisons once more.
Multiply all of the doubles by a given double: Worst/best case O(N) time. Iterate and multiply each value by the input double.
Iterate over the BitmapArray: Worst/best case O(1) time for next element.
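As a rough illustration of the encoding side of this (not the sorted-array bookkeeping), here is a Python sketch, with names of my choosing, that stores each set as an arbitrary-precision int bitmap; comparing two such ints then comes for free:

def to_bitmap(labels):
    bm = 0
    for i in labels:
        bm |= 1 << i        # bit i set <=> label i is in the set
    return bm

entries = {to_bitmap({27, 28, 29}): 0.25, to_bitmap({27, 28}): 0.75}

def add_label_to_all(entries, i):
    # "Add a given integer to every set": set bit i in every key; sets that
    # become identical are merged (adding their weights is my assumption)
    out = {}
    for bm, w in entries.items():
        key = bm | (1 << i)
        out[key] = out.get(key, 0.0) + w
    return out

def zero_sets_containing(entries, i):
    # "Remove every set that contains a given integer" (weights set to 0)
    return {bm: (0.0 if bm & (1 << i) else w) for bm, w in entries.items()}

def union(a, b):
    # union of two maps, adding the doubles for sets present in both
    out = dict(a)
    for bm, w in b.items():
        out[bm] = out.get(bm, 0.0) + w
    return out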

Algorithm to keep collection sorted while inserting in the middle

Let's say I have a large collection of elements.
Each element has a "position" field, which is a positive integer.
No two elements have the same value for the field "position".
The only supported operation of the collection is: addElement(newElement, positionAfterElement), where:
- newElement is the new element to be added (its position is unknown for now)
- positionAfterElement is an existing element of the collection.
The function will guarantee that:
- position(positionAfterElement) < position(newElement)
- no other element in the collection has a position between position(positionAfterElement) and position(newElement)
I can change the value of all the element positions as I wish but I want to minimize the number of changes (on average).
How should I implement the addElement function?
I could just push all the elements with higher positions up by 1, but I am pretty sure there must be a better way to do this.
Thanks all for your help.
Use a balanced tree. At every node of the tree, keep a count of the number of items below it (left.count + right.count + 1). Then, you can compute the index of an item easily while traversing to it. Each operation takes O(log n), so n operations take O(n log n) time in total.
Here's a basic idea:
expected_number_of_elements = 10^6
spread_factor = 100

first element gets position = spread_factor * expected_number_of_elements

each following element inserted:
    if it's inserted in the last position, give it the last element's position + spread_factor
    if it's inserted in the first position, give it the first element's position - spread_factor
    otherwise, put it in the middle between its 2 closest neighbors
    if you don't have any space left: expand_the_array

expand_the_array:
    spread_factor = spread_factor * 10
    iterate over all the elements, and multiply each position by 10
Expanding the array is an expensive operation, but since it multiplies the available space, on average (assuming your input is random, and not crafted by an adversary) you'll have to do it very rarely.
The major drawback of this solution is that you'll have to watch out for int overflow.
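Here is a hedged Python sketch of the insert-after case (the question's addElement); insert_after and SPREAD are my names, and a full version would also multiply the spread factor by 10 on each expansion, as the pseudocode above does:

SPREAD = 100

def insert_after(positions, i):
    """positions: sorted list of ints; inserts a new element right after
    index i and returns its position."""
    if i == len(positions) - 1:
        p = positions[-1] + SPREAD           # appending at the end
    else:
        lo, hi = positions[i], positions[i + 1]
        if hi - lo < 2:                      # no space left: expand
            positions[:] = [p * 10 for p in positions]
            lo, hi = positions[i], positions[i + 1]
        p = (lo + hi) // 2                   # middle of the two neighbors
    positions.insert(i + 1, p)
    return p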
OK, so here is what I implemented, in pseudo-code:
addElement(newElement, positionAfterElement):
    p0 <- positionOf(positionAfterElement)
    c <- 5
    finished <- false
    while (not finished):
        search for all elements whose positions are in the range [p0 + 1, p0 + c]
        if there are fewer than c elements in that range: // the range is not full
            adjust the positions of the elements in the range, and the position of newElement,
            so that
                - elements are correctly ordered
                - element positions are "evenly spread out" within the range
            finished <- true
        else: // the range is full
            c <- c * 2
    end while
end addElement
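And a possible Python rendering of this addElement, using a sorted list plus bisect for the range search; the even-spread policy here, with newElement taking the first slot after p0, is my reading of the guarantee that no element sits between positionAfterElement and newElement:

import bisect

def add_element(positions, p0):
    """positions: sorted list of occupied positions; p0: the position of
    positionAfterElement. Returns the new element's position."""
    c = 5
    while True:
        lo = bisect.bisect_right(positions, p0)
        hi = bisect.bisect_right(positions, p0 + c)
        inside = positions[lo:hi]            # elements in [p0+1, p0+c]
        if len(inside) < c:                  # the range is not full
            k = len(inside) + 1              # elements to spread, k <= c
            # evenly spaced, strictly increasing slots within [p0+1, p0+c]
            slots = [p0 + 1 + (i * (c - 1)) // max(k - 1, 1) for i in range(k)]
            positions[lo:hi] = slots[1:]     # respread the crowded elements
            positions.insert(lo, slots[0])   # newElement gets the first slot
            return slots[0]
        c *= 2                               # the range is full: double it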
