How to solve MaxCounters - Coditility with Golang - algorithm

You are given an array of counters N initiated with zero.
You have a list of actions A to perform on N array.
each action is an int x
ie A = [1,5,3]
for each k in A as x actions
if x <= len(N) then increase N[i-1] by one
else set all N items with max of max value of N
you should return the counters array after the last action
Exercise Link

The first and simpler solution
wont pass 100% because of time complexity
create counters array with all 0 in size of len(A)
for each idx,action in A
if action <= len(N)
counters[idx-1]++
else
maxVal = max(counters)
now set max maxVal to all counters items
return counters
small improvement, store maxVal variable in the top and update it with each increase of the counter items. Then when you need to set all items, you dont need to find max value.
Better solution
that passes the tests 100%
We will still go through actions, and we store the max value and another new variable forcedVal. we will not update the whole counters array each time we need but only update the forcedVal with the max. Then when ever we need to ++ an item we will first check if it is smaller then forcedVal and if so we will give it forcedVal value before the ++.
create counters array with all 0 in size of len(A)
create max var with 0
create forcedVal var with 0
for each idx,action in A
if action <= len(N)
cur = counters[idx-1]
if cur < forceVal ? cur = forceVal
cur++
if cur > max
max = cur
counters[idx-1] = cur
else
forcedVal = max
here we do a single loop to set our counters
for each cnt in counters
if cnt < forcedVal ? cnt = forcedVal
return counters
this is how it will look with this example
Solution(5, [3, 4, 4, 6, 1, 4, 4])
| c[0] | c[1] | c[2] | c[3] | c[4] | action | max | forcedVal |
|------|------|------|------|------|--------|-----|-----------|
| 0 | 0 | 0 | 0 | 0 | null | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 3 | 1 | 0 |
| 0 | 0 | 1 | 1 | 0 | 4 | 1 | 0 |
| 0 | 0 | 1 | 2 | 0 | 4 | 2 | 0 |
| 0 | 0 | 1 | 2 | 0 | 6 | 2 | 2 |
| 3 | 0 | 1 | 2 | 0 | 1 | 3 | 2 |
| 3 | 0 | 1 | 3 | 0 | 4 | 3 | 2 |
| 3 | 0 | 1 | 4 | 0 | 4 | 4 | 2 |
as you can see above after all we have left with:
3, 0, 1, 4, 0
we will our final cleanup loop to set all items to at least the value of forcedVal and we get
3, 2, 2, 4, 2
func Solution(N int, A []int) []int {
counters := make([]int, N)
var max int
var forcedVal int
for _, v := range A {
if v > N {
forcedVal = max
} else {
cur := counters[v-1]
if cur < forcedVal {
cur = forcedVal
}
cur++
if cur > max {
max = cur
}
counters[v-1] = cur
}
}
for i := range counters {
if counters[i] < forcedVal {
counters[i] = forcedVal
}
}
return counters
}

Related

noMethodError on a < symbol ruby

enter image description here
The link for the error is above; I keep getting a noMethodError for a symbol and I am very confused
numList = [4,3,78,2,0,2]
def bubbleSort(list)
again = true
while again
copy = list
i = 0
until i == list.length do
list.each_with_index do |num, index|
if list[index+1] < num
tempNum = list[index+1]
list[index+1] = num
list[index] = tempNum
end
end
i = i+1
end
if copy != list
again = true
else
again = false
end
p copy
end
end
bubbleSort(numList)
As noted in comments, you are iterating with an index:
list.each_with_index do |num, index|
But this includes the last index, which you're trying to access one past.
if list[index+1] < num
list[index+1] return nil, which will give the error you're seeing.
irb(main):083:0> nil < 42
Traceback (most recent call last):
5: from C:/Ruby26-x64/bin/irb.cmd:31:in `<main>'
4: from C:/Ruby26-x64/bin/irb.cmd:31:in `load'
3: from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
2: from (irb):83
1: from (irb):83:in `rescue in irb_binding'
NoMethodError (undefined method `<' for nil:NilClass)
It'd be important to avoid this access. I would also suggest avoiding the again boolean flag. Rather use loop and break out of the loop at the appropriate time.
def bubbleSort(list)
loop do
copy = list
i = 0
until i == list.length
list.each_with_index do |num, index|
next if index == list.length - 1
if list[index+1] < num
tempNum = list[index+1]
list[index+1] = num
list[index] = tempNum
end
end
i = i+1
end
break if copy == list
p copy
end
end
Taking a step back and looking at how bubble sort works though, your implementation can be improved.
Each cycle through the array a bubble sort will advance the largest element to the end of the array. You don't need to cycle over the whole array each times, but can decrease the inner loop by one iteration each time.
You can also track the number of swaps. If no swaps are performed, the array is clearly sorted already, and the outer loop can be exited.
+---+---+---+---+
| 4 | 2 | 3 | 1 | (starting state for loop)
+---+---+---+---+
>---<
+---+---+---+---+
| 2 | 4 | 3 | 1 | (swap 4 and 2)
+---+---+---+---+
>---<
+---+---+---+---+
| 2 | 3 | 4 | 1 | (swap 4 and 3)
+---+---+---+---+
>---<
+---+---+---+---+
| 2 | 3 | 1 | 4 | (swap 4 and 1)
+---+---+---+---+
Three swaps are performed. The last position in the array is sorted.
+---+---+---+---+
| 2 | 3 | 1 | 4 | (starting state for loop)
+---+---+---+---+
>---<
+---+---+---+---+
| 2 | 1 | 3 | 4 | (swap 3 and 1)
+---+---+---+---+
+---+---+---+---+
| 2 | 1 | 3 | 4 | (no swap needed for 1 and 3)
+---+---+---+---+
One swap is performed. Now the last two positions are sorted.
+---+---+---+---+
| 2 | 1 | 3 | 4 | (starting state for loop)
+---+---+---+---+
>---<
+---+---+---+---+
| 1 | 2 | 3 | 4 | (swap 2 and 1)
+---+---+---+---+
One swap is performed. We know the last three positions are sorted. The first position therefore is sorted.
If the array is already sorted and we keep track of swaps:
+---+---+---+---+
| 1 | 2 | 3 | 4 | (starting state for loop)
+---+---+---+---+
+---+---+---+---+
| 1 | 2 | 3 | 4 | (no swap needed for 1 and 2)
+---+---+---+---+
+---+---+---+---+
| 1 | 2 | 3 | 4 | (no swap needed for 2 and 3)
+---+---+---+---+
+---+---+---+---+
| 1 | 2 | 3 | 4 | (no swap needed for 3 and 4)
+---+---+---+---+
Since zero swaps are performed, the array is known to be sorted. While bubble sort is worst case O(n^2), this technique can make it best case O(n).
def bubble_sort(arr)
(arr.length-1).times do |x|
swaps = 0
0.upto(arr.length-x-2) do |i|
if arr[i] > arr[i+1]
temp = arr[i+1]
arr[i+1] = arr[i]
arr[i] = temp
swaps += 1
end
end
break if swaps == 0
end
arr
end

Binary tree from unsorted array

I have an array of records from which I am trying to construct a binary tree.
The end result should be as follow. -1 represents null and if both the left and right pointer are -1, it is a leaf node.
| Index | Left Pointer | Data | Right Pointer |
|:-----:|:------------:|:----:|:-------------:|
| 0 | 1 | 17 | 4 |
| 1 | 2 | 8 | 3 |
| 2 | -1 | 4 | 7 |
| 3 | -1 | 12 | 6 |
| 4 | 5 | 22 | 8 |
| 5 | -1 | 19 | -1 |
| 6 | -1 | 14 | -1 |
| 7 | -1 | 5 | -1 |
| 8 | 9 | 30 | -1 |
| 9 | -1 | 25 | -1 |
This is my current code:
type node = record
data : string;
left : integer;
right : integer;
end;
var
binaryTree : array[0..9] of node;
procedure output;
var
i : integer;
begin
for i := 0 to 9 do
begin
writeln('Data: ' , binaryTree[i].data);
writeln('Left: ' , binaryTree[i].left);
writeln('Right: ' , binaryTree[i].right);
end;
end;
procedure takeInput;
var
i : integer;
begin
for i := 0 to 9 do
readln(binaryTree[i].data);
end;
procedure initialisePointers;
var
i : integer;
begin
// initialise to -1 for null
for i := 0 to 9 do
begin
binaryTree[i].left := -1;
binaryTree[i].right := -1;
end;
end;
procedure setPointers;
var
i : integer;
root, currentNode : node;
begin
// first value becomes root
root := binaryTree[0];
for i := 1 to 9 do
begin
currentNode := root;
while true do
begin
if binaryTree[i].data < currentNode.data then
begin
if currentNode.left = -1 then
begin
currentNode.left := i;
break;
end
else
currentNode := binaryTree[currentNode.left]
end
else
begin
if binaryTree[i].data >= currentNode.data then
begin
if currentNode.right = -1 then
begin
currentNode.right := i;
break;
end
else
currentNode := binaryTree[currentNode.right]
end;
end;
end;
end;
end;
begin
takeInput;
initialisePointers;
setPointers;
output;
end.
With the input shown in the table, I get an output of all pointers remaining at -1 as initialised. Any idea why this may be?
In Pascal, records are value types, so when you do e.g.
root := binaryTree[0];
then root is a copy of binaryTree[0].
When you set fields in root you are setting fields in the copy.
If you want chose changes reflected back in the binaryTree array, you have to assign the value back after use:
binaryTree[0] := root;
Or else, use pointers, but I am not sure you have learned about those yet.

Emit a group only when condition meets

I have requirement to emit all records corresponds to a group, only when a condition is met. Below is the sample data set with alias name as "SAMPLE_DATA".
Col-1 | Col-2 | Col-3
-------------------------
2 | 4 | 1
2 | 5 | 2
3 | 3 | 1
3 | 2 | 2
4 | 5 | 1
4 | 6 | 2
SAMPLE_DATA_GRP = GROUP SAMPLE_DATA BY Col-1;
RESULT = FOREACH SAMPLE_DATA_GRP {
max_value = MAX(SAMPLE_DATA.Col-2);
IF(max_value >= 5)
GENERATE ALL RECORDS IN THAT GROUP;
}
RESULT should be:
Col-1 | Col-2 | Col-3
-------------------------
2 | 4 | 1
2 | 5 | 2
---- ---- ---
4 | 5 | 1
4 | 6 | 2
Two groups got generated. First group is generate because max value of 4,5 is "5"(which meets our condition >=5). Same for second group (6 >= 5).
As I would be performing this operation on big dataset operations like distinct and join would be overkill. For this reason I have come up with pseudo code with one grouping to perform this operation.
Hope I have provided enough information. Thanks in advance.
I would be performing this operation on a huge data set. Doing operation like distinct and join would be overkill on the system. For this reason I have come up with this grouping approach.
Please try the below code and see..
This solution is little lengthy ,but it will work
numbers = LOAD '/home/user/inputfiles/c1.txt' USING PigStorage(',') AS(c1:int,c2:int,c3:int);
num_grp = GROUP numbers by c1;
num_each = FOREACH num_grp
{
max_each = MAX(numbers.c2);
generate flatten(group) as temp_c1, (max_each >= 5 ?1 :0) as indicator;
};
num_each_filtered = filter num_each BY indicator == 1;
num_joined = join numbers BY c1,num_each_filtered by tem_c1;
num_output = FOREACH num_joined GENERATE c1,c2,c3;
dump num_output;
O/p:
Col-1 | Col-2 | Col-3
-------------------------
2 | 4 | 1
2 | 5 | 2
---- ---- ---
4 | 5 | 1
4 | 6 | 2

Vbscript basic functions

I am new to programming and computer science. HTML is all I know and I have been facing problems with vbscript.
This program (my first in vbscript) was given by my teacher. But I really do not understand anything. I referred to my book but in vain.
I am not even sure if this is the right SE to post the question.
Please help.
What you have there is a loop with another nested loop, both of which print some text to the screen (document.write("...")).
The outer loop
For i = 1 To 5 Step 1
...
Next
iterates from 1 to 5 in steps of 1 (which is redundant, since 1 is the default step size, so you could just omit the Step 1). If you printed the value of i inside the loop
For i = 1 To 5 Step 1
document.Write(i & "<br>")
Next
You'd get the following output:
1
2
3
4
5
In your code sample you just print <br>, though, so each cycle of the outer loop just prints a line break.
In addition to printing line breaks in the outer loop you also have a nested loop, which for each cycle of the outer loop iterates from 1 to the current value of i, again in steps of 1.
For j = 1 To i Step 1
...
Next
So in the first cycle of the outer loop (i=1) the inner loop iterates from 1 to 1, in the second cycle of the outer loop (i=2) it iterates from 1 to 2, and so on.
For i = 1 To 5 Step 1
document.Write(i & "<br>")
For j = 1 To i Step 1
document.Write("*")
Next
Next
Since the inner loop prints an asterisk with each cycle you get i asterisks per line before the inner loop ends, the outer loop then goes into the next cycle and prints a line break, thus ending the current output line.
A good (although somewhat tedious) way to get an understanding of how the loops work is to note the current value of each variable as well as the current output line in a table on a sheet of paper, e.g. like this:
code line | instruction | i | j | output line
----------+------------------------+-------+-------+------------
1 | For i = 1 To 5 Step 1 | 1 | Empty |
2 | document.Write("<br>") | 1 | Empty | <br>
3 | For j = 1 To i Step 1 | 1 | 1 |
4 | document.Write("*") | 1 | 1 | *
5 | Next | 1 | 1 | *
6 | Next | 1 | 1 | *
1 | For i = 1 To 5 Step 1 | 2 | 1 | *
2 | document.Write("<br>") | 2 | 1 | *<br>
3 | For j = 1 To i Step 1 | 2 | 1 |
4 | document.Write("*") | 2 | 1 | *
5 | Next | 2 | 1 | *
3 | For j = 1 To i Step 1 | 2 | 2 | *
4 | document.Write("*") | 2 | 2 | **
5 | Next | 2 | 2 | **
6 | Next | 2 | 2 | **
1 | For i = 1 To 5 Step 1 | 3 | 2 | **
2 | document.Write("<br>") | 3 | 2 | **<br>
3 | For j = 1 To i Step 1 | 3 | 1 |
4 | document.Write("*") | 3 | 1 | *
... | ... | ... | ... | ...

Weighted random selection with maximum consecutive repetitions

Let's say I have a list of items in which each item can be repeated once or more than once (1 to n). I'm trying to find an algorithm that extracts random items from the list until it's empty, but with the constraint that an item cannot be repeated consecutively more than a fixed number of times (which could be different for each item). I would like the algorithm to have "correct probabilities" (I try to explain that later).
For example, say I have this list:
Item | Count | Max. consecutive
-----+-------+-----------------
A | 2 | 1
B | 4 | 2
Some results could be:
B A B A B B
B B A B A B
But the following would be incorrect, since "B" has 3 consecutive repetitions, when the maximum was 2:
B B B A B A
I've managed to create an algorithm that works, but it has a problem with the probabilities. I will put the code first, and then talk about the problem. It's a C# class; sorry if you don't like the format or the GOTOs, let's be friends ;P
To use it, you instantiate an object providing the number of unique items, use SetIndexData() to add the information for each item (number of elements and max. repetitions), and call GetNextRandomIndex() to get the next random item (I have implemented it using int as if the items were indexes, you would have to create an indexed collection to store the real items' values).
You can use the method CheckIndexMaxConsecutiveCorrectness() to check if it's possible to have an item with the specified maximum repetitions in a bag with the specified total elements. For example, if a bag has only 1 item with 2 elements, but we say it can only be repeated once, it's obviously impossible, so CheckIndexMaxConsecutiveCorrectness() would return false.
The method GetIndexMinMaxGroups() is used internally by the class to retrieve the minimum and maximum number of groups of consecutive elements for an item (for example, in the sequence B B B A B A we have 2 groups for "A" and 2 for "B"). Actually, the numbers calculated in this method are not exactly true, since there are some variables that it doesn't take into account, but the values it returns are useful for two things: doing the check from CheckIndexMaxConsecutiveCorrectness() (if the minimum number of groups is greater than the maximum, then the maximum number of repetitions is not correct), and knowing when it's obligatory to return a determined item from GetNextRandomIndex() (when the minimum is equal to the maximum). I haven't used any mathematical algorithm to deduce those properties, so something could be wrong, though so far it has "magically" worked perfectly in my tests, millions of times...
class ShuffleBag
{
/*** INFORMATION FOR THE INDEXES ***/
// Class for the data:
class IndexData
{
public int Count = 0; // The current number of instances of the index in the bag.
public int MaxConsecutive = int.MaxValue; // The maximum consecutive repetitions allowed for the index.
}
// List of indexes data for this bag:
IndexData[] IndexesDataList;
/*** MEMBERS USED FOR CALCULATIONS ***/
// Random number generator:
Random RandGenerator;
// Remaining elements in the bag:
int _RemainingElementsCount = 0;
public int RemainingElementsCount
{
get { return _RemainingElementsCount; }
}
// Last retrieved index (-1 if no index has been retrieved yet),
// and the last consecutive repetitions of that index:
int LastIndex = -1;
int LastRepetitions = 0;
///==============///
/// Constructor. ///
///==============///
public ShuffleBag(int uniqueIndexesCount)
{
IndexesDataList = new IndexData[uniqueIndexesCount];
for (int i = 0; i < uniqueIndexesCount; i++)
IndexesDataList[i] = new IndexData();
RandGenerator = new Random();
}
///===========================================================///
/// Resets the shuffle bag; must be called before reusing it. ///
/// The number of unique indexes won't be reset. ///
/// Doesn't need to be called just after creating the bag. ///
///===========================================================///
public void Reset()
{
for (int i = 0; i < IndexesDataList.Length; i++)
{
IndexesDataList[i].Count = 0;
IndexesDataList[i].MaxConsecutive = int.MaxValue;
}
_RemainingElementsCount = 0;
LastIndex = -1;
LastRepetitions = 0;
}
///==================================================================================================///
/// Checks if it's possible to honor the max repetitions of an index with the provided data. ///
/// If it was not possible, the behaviour of a shuffle bag with those parameters would be undefined. ///
///==================================================================================================///
public static bool CheckIndexMaxConsecutiveCorrectness(int maxConsecutive, int indexElements, int bagTotalElements)
{
int min, max;
GetIndexMinMaxGroups(indexElements, maxConsecutive, bagTotalElements, out min, out max);
return min <= max;
}
///=====================================================================///
/// Sets the data for the specified index. ///
/// Can be called after starting to use the bag, ///
/// but if any parameters make the max consecutive repetitions invalid, ///
/// the behaviour of the bag will be undefined. ///
///=====================================================================///
public void SetIndexData(int index, int count, int maxConsecutive)
{
IndexData data = IndexesDataList[index];
_RemainingElementsCount += count - data.Count;
data.Count = count;
data.MaxConsecutive = maxConsecutive;
}
///====================================================================================================================///
/// Retrieves the next random index. The caller must check if there are remaining elements in the bag to be retrieved. ///
///====================================================================================================================///
public int GetNextRandomIndex()
{
/*** GET THE INDEX ***/
int index;
// If, for any index, the minimum possible groups equals the maximum, it must be the returned index:
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
int minGroups, maxGroups;
GetIndexMinMaxGroups(data.Count, data.MaxConsecutive, _RemainingElementsCount, out minGroups, out maxGroups);
if (minGroups == maxGroups)
goto _INDEX_FOUND_;
}
// Get a random number to choose the index:
int rand = RandGenerator.Next(_RemainingElementsCount);
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
// This index corresponds with the random number:
if (rand < data.Count)
{
// Check if the index has reached the maximum consecutive repetitions;
// in that case, get the next available one:
if (index == LastIndex && data.MaxConsecutive == LastRepetitions)
{
for (int k = 1; k <= IndexesDataList.Length - 1; k++)
{
int m = WrapIndexSimple(index + k, IndexesDataList.Length);
if (IndexesDataList[m].Count > 0)
{
index = m;
goto _INDEX_FOUND_;
}
}
}
goto _INDEX_FOUND_;
}
// This index doesn't correspond with the random number; update it to check the next index:
else
{
rand -= data.Count;
}
}
/*** INDEX FOUND, UPDATE AND RETURN ***/
_INDEX_FOUND_:
IndexData resultData = IndexesDataList[index];
resultData.Count--;
_RemainingElementsCount--;
if (LastIndex == index)
{
LastRepetitions++;
}
else
{
LastIndex = index;
LastRepetitions = 1;
}
return index;
}
///===============================================================================///
/// Calculates the minimum and maximum possible groups of consecutive repetitions ///
/// for an index with the specified data. ///
/// If any provided data is invalid, the behaviour and results are undefined. ///
///===============================================================================///
static void GetIndexMinMaxGroups(int indexRemainingElements, int indexMaxConsecutive, int bagRemainingElements, out int min, out int max)
{
int rem;
int div = Math.DivRem(indexRemainingElements, indexMaxConsecutive, out rem);
min = rem == 0 ? div : div + 1;
max = bagRemainingElements - indexRemainingElements + 1;
}
///=======================================================///
/// Converts an index out of bounds to a valid value. ///
/// "length" is the number of elements of the collection. ///
/// Only works for indexes that are less than 2*length. ///
///=======================================================///
static int WrapIndexSimple(int index, int length)
{
if (index >= length)
return index - length;
else if (index < 0)
return length + index;
else
return index;
}
}
The problem:
As I said, so far the class has worked as I wanted, extracting items without ever exceeding the number of repetitions. The problem is that the probabilities of extracting an item at each call of GetNextRandomIndex() are not what they should be. For example, if I have this:
Item | Count | Max. consecutive
-----+-------+-----------------
a | 30 | 3
X | 10 | 2
If I'm right, there would be 13 different possible sequences. The method should return "a" 12 times for each time it returns "X" in the first 3 calls; for the fourth, it should return "a" 10 times for every 3 times it returns "X"; etc. These are the different sequences:
Seq. 1 | Seq. 2 | Seq. 3 | Seq. 4 | Seq. 5 | Seq. 6 | Seq. 7 | Seq. 8 | Seq. 9 | Seq. 10 | Seq. 11 | Seq. 12 | Seq. 13 | Number of “a” | Number of “X”
-------+--------+--------+--------+--------+--------+--------+--------+--------+---------+---------+---------+---------+---------------+--------------
a | a | a | X | a | a | a | a | a | a | a | a | a | 12 | 1
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | a | X | X | a | a | a | a | a | a | a | a | 11 | 2
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | X | X | X | X | X | X | X | X | 4 | 9
a | a | a | X | X | X | a | a | a | a | a | a | a | 10 | 3
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | X | X | X | X | X | X | X | 5 | 8
a | a | a | X | X | X | X | a | a | a | a | a | a | 9 | 4
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | X | X | X | X | X | X | 6 | 7
a | a | a | X | X | X | X | X | a | a | a | a | a | 8 | 5
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | X | X | X | X | X | 7 | 6
a | a | a | X | X | X | X | X | X | a | a | a | a | 7 | 6
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | X | X | X | X | 8 | 5
a | a | a | X | X | X | X | X | X | X | a | a | a | 6 | 7
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | X | X | X | 9 | 4
a | a | a | X | X | X | X | X | X | X | X | a | a | 5 | 8
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | X | X | 10 | 3
a | a | a | X | X | X | X | X | X | X | X | X | a | 4 | 9
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | X | 11 | 2
a | a | a | X | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
This little console application shows the results of the use of the class. For the specified number of iterations, it fills the bag and extracts all the random items. At the end, it shows the maximum number of consecutive repetitions for each item, and the accumulated number of occurrences of each item and each call to GetNextRandomIndex() for all the iterations. As you can see, after 1 million iterations, the accumulated elements for the first call are around 750,000 for "a" and 250,000 for "X", and for the last call are around 999,950 for "a" and 50 for "X":
static void Main(string[] args)
{
int MaxIterations = 1000000;
bool ShowResults = true;
string[] items = new string[] { "a", "X" };
int[] count = new int[] { 30, 10 };
int[] maxRepet = new int[] { 3, 2 };
int elementCount = count.Sum();
bool maxRepetOK = true;
for (int i = 0; i < items.Length; i++)
maxRepetOK = maxRepetOK && ShuffleBag.CheckIndexMaxConsecutiveCorrectness(maxRepet[i], count[i], elementCount);
if (! maxRepetOK)
{
Console.WriteLine("*** Bad number of repetitions!! ***\n");
goto _END_;
}
Dictionary<string, Tuple<int,int>> results = new Dictionary<string,Tuple<int,int>>();
for (int i = 0; i < items.Length; i++)
results.Add(items[i], new Tuple<int,int>(0, 0));
int[,] resultsPerCall = new int[elementCount, items.Length];
ShuffleBag bag = new ShuffleBag(items.Length);
int iterations = 0;
for (int x = 0; x < MaxIterations; x++)
{
iterations++;
bag.Reset();
for (int i = 0; i < items.Length; i++)
bag.SetIndexData(i, count[i], maxRepet[i]);
string prevItem = "";
int prevRepetitions = 0;
int row = 0;
while (bag.RemainingElementsCount > 0)
{
int newIndex = bag.GetNextRandomIndex();
if (prevItem == items[newIndex])
{
prevRepetitions++;
}
else
{
prevItem = items[newIndex];
prevRepetitions = 1;
}
var resultAnt = results[items[newIndex]];
results[items[newIndex]] = new Tuple<int,int>(resultAnt.Item1 + 1, Math.Max(prevRepetitions, resultAnt.Item2));
resultsPerCall[row, newIndex]++;
if (ShowResults)
{
Console.WriteLine(items[newIndex]);
}
row++;
}
if (ShowResults && MaxIterations > 1)
{
Console.WriteLine("\nESC:\tEnd\nENTER:\tNext iteration\nTAB:\tAll iterations");
while (true)
{
switch (Console.ReadKey(true).Key)
{
case ConsoleKey.Enter:
goto _CONTINUE_;
case ConsoleKey.Escape:
goto _RESULTS_;
case ConsoleKey.Tab:
ShowResults = false;
goto _CONTINUE_;
}
}
_CONTINUE_:
Console.Clear();
Console.WriteLine("Iterating ...");
}
}
_RESULTS_:
Console.Clear();
Console.WriteLine("RESULTS\n------------------------------------------");
for (int i = 0; i < items.Length; i++)
{
var data = results[items[i]];
double average = (double)data.Item1 / iterations;
Console.WriteLine(items[i] +
": Average = " + average.ToString() + (average != count[i] ? "(!)" : "") +
"\t\tMax. repetitions = " + data.Item2.ToString() + (data.Item2 > maxRepet[i] ? "(!)" : ""));
}
Console.WriteLine();
for (int i = 0; i < elementCount; i++)
{
Console.Write((i+1).ToString("00") + ")");
for (int k = 0; k < items.Length; k++)
{
Console.Write(" \t" + items[k] + ": " + resultsPerCall[i, k].ToString());
}
Console.WriteLine();
}
_END_:
Console.WriteLine("\n\nESC to exit");
while (Console.ReadKey(true).Key != ConsoleKey.Escape);
}
The problem is that, from GetNextRandomIndex(), I only use the number of remaining elements of an item as the probability for selecting it, so for the first call the probability of getting "a" is 3 times the probability of getting "X" (since there are 30 elements of "a" and 10 of "X"). Does anybody have any idea of how can I change my algorithm (or use a different one) to get the correct probabilities?
As promised, here’s a Markov chain Monte Carlo sampler. I can’t seem to
get an exact version using the Propp–Wilson technique; this one merely
converges to a distribution that is biased toward and uniform on the
desired outcomes, possibly rather slowly. Define the score of a
permutation to be the number of invalid letters. Specifically, if A
should appear at most once consecutively and B should appear at most
twice consecutively, then the score of
AAABBAABBBB
^^ ^ ^^
is 5, where the contributing letters are marked with ^.
Starting with a random permutation, repeatedly choose two positions with
replacement independently and uniformly at random. Score the permutation
with the letters at those positions swapped; this is the proposed
permutation. Compute two to the power of (the current score - the
proposed score). If a uniform random float between zero and one is less
than this number, then keep this proposed swap; otherwise, undo it. Do
this as long as you can stand, then take the next valid permutation.
Here's an approach for exact sampling that did not work as well as I had hoped. I'm posting it in case it proves useful anyway. Given the maximum run lengths, prepare an automaton that recognizes the valid strings.
def make_automaton(max_consecutive):
states = {letter * j for letter, k in max_consecutive.items() for j in range(1, k + 1)}
states.add('')
automaton = {}
for state in states:
transitions = {}
for letter in max_consecutive.keys():
new_state = state + letter if letter == state[-1:] else letter
if new_state in states:
transitions[letter] = new_state
automaton[state] = transitions
return automaton
Here's the code in action.
>>> from pprint import pprint
>>> pprint(make_automaton({'a': 3, 'X': 2}))
{'': {'X': 'X', 'a': 'a'},
'X': {'X': 'XX', 'a': 'a'},
'XX': {'a': 'a'},
'a': {'X': 'X', 'a': 'aa'},
'aa': {'X': 'X', 'a': 'aaa'},
'aaa': {'X': 'X'}}
Now, there's an approach to exact sampling that seems to use too much space. Prepare a table indexed by pairs of an automaton state and a multiset of letters. The table values give the number of words with specified letter counts that put the automaton in the specified state. Then we can use this table by sampling the final state and using conditional probabilities to derive the string that got the automaton there.
My idea was, instead of trying to remember the counts, sample from the following distribution until we get the letter counts right. Each letter is distributed independently according to the counts. We condition on the resulting word being valid.
Here's the rest of the code. It seems to work OK as long as the counts aren't too large and the runs aren't too short.
from collections import defaultdict
def make_probabilities(count, automaton):
probabilities = [{min(automaton): 1.0}]
total = sum(count.values())
for i in range(1, total + 1):
distribution = defaultdict(float)
for state, p in probabilities[-1].items():
transitions = automaton[state]
for letter, k in count.items():
if letter in transitions:
distribution[transitions[letter]] += p * (k / total)
probabilities.append(dict(distribution))
return probabilities
pprint(make_probabilities({'a': 30, 'X': 10}, make_automaton({'a': 3, 'X': 2})))
from random import random
def weighted_sample(distribution):
while True:
sample = random() * sum(distribution.values())
for letter, k in distribution.items():
sample -= k
if sample < 0.0:
return letter
from collections import Counter
print(Counter(weighted_sample({'a': 30, 'X': 10}) for i in range(10000)))
def unbiased_sample(count, max_consecutive):
automaton = make_automaton(max_consecutive)
probabilities = make_probabilities(count, automaton)
total = sum(count.values())
while True:
sample = []
state = weighted_sample(probabilities[-1])
for i in range(len(probabilities) - 2, -1, -1):
conditional_distribution = {}
for old_state, p in probabilities[i].items():
transitions = automaton[old_state]
for letter, k in count.items():
if letter in transitions and transitions[letter] == state:
conditional_distribution[(old_state, letter)] = p * (k / total)
state, letter = weighted_sample(conditional_distribution)
sample.append(letter)
if Counter(sample) == count:
return ''.join(sample)
for r in range(1000):
print(unbiased_sample({'A': 10, 'B': 10, 'C': 10, 'D': 10}, {'A': 5, 'B': 4, 'C': 3, 'D': 2}))

Resources