Related
I need to draw a square grid of size n to the console. The grid uses - for horizontal cell boundaries, | for vertical cell boundaries, and + for corners of each cell.
As an example, a size 3 grid should look like the following:
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
I was thinking of using a double for loop with outer loop iterating through rows and inner loop iterating through cols. each iteration of inner loop would be processing an individual cell. drawing | characters doesn't seem to hard but I'm not sure how I'd go about printing the - chars above and below a cell.
You could use Integer#times and String#*:
def print_grid(n)
n.times { print "+-"*n, "+\n", "| "*n, "|\n" }
print "+-"*n, "+\n"
end
print_grid(3)
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
=> nil
Alternatively:
def print_grid(n)
puts n.times.map{ "+-"*n + "+\n" + "| "*n + "|\n" }.join + "+-"*n + "+\n"
end
For a width of 3, you have a separator-row like this:
+-+-+-+
This can be sees as either:
3 - connected by and surrounded by +
4 + connected by -
The latter is a bit easier to express in Ruby:
width = 3
Array.new(width + 1, '+').join('-')
#=> "+-+-+-+"
The same works for the cell-row:
Array.new(width + 1, '|').join(' ')
#=> "| | | |"
Vertically, you have 3 cell-rows connected by and surrounded by separator-rows. (that should ring a bell) Just like before, this can also be expressed as 4 separator-rows connected by cell-rows.
Let's store our separator-row and cell-row in variables: (we also have to append newlines)
width = 3
separator_row = Array.new(width + 1, '+').join('-') << "\n"
cell_row = Array.new(width + 1, '|').join(' ') << "\n"
And define the grid:
height = 3
grid = Array.new(height + 1, separator_row).join(cell_row)
#=> "+-+-+-+\n| | | |\n+-+-+-+\n| | | |\n+-+-+-+\n| | | |\n+-+-+-+\n"
put grid
Output:
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
Here are two ways to do that.
Determine each character depending on whether the row and column indices are even or odd
def grid(n)
sz = 2*n+1
Array.new(sz) do |i|
Array.new(sz) do |j|
if i.even?
j.even? ? '+' : '-'
else # i is odd
j.even? ? '|' : ' '
end
end.join
end
end
puts grid(3)
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
puts grid(4)
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
Use an enumerator
def pr_grid(n)
enum = [*['+','-']*n, "+", "\n", *['|',' ']*n, "|", "\n"].cycle
((2+2*n)*(1+2*n)).times { print enum.next }
end
pr_grid(3)
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
pr_grid(4)
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
For n = 3 the steps are as follows.
a = [*['+','-']*n, "+", "\n", *['|',' ']*n, "|", "\n"]
#=> ["+", "-", "+", "-", "+", "-", "+", "\n",
# "|", " ", "|", " ", "|", " ", "|", "\n"]
enum = a.cycle
#=> #<Enumerator: ["+", "-", "+", "-", "+", "-", "+", "\n",
# "|", " ", "|", " ", "|", " ", "|", "\n"]:cycle
enum.next #=> "+"
enum.next #=> "-"
enum.next #=> "+"
enum.next #=> "-"
enum.next #=> "+"
enum.next #=> "-"
enum.next #=> "+"
enum.next #=> "\n"
enum.next #=> "|"
enum.next #=> " "
and so on.
Just for fun and just as an example, here is a list of methods (to be optimized and debugged and maybe used within a class) that you can use also for filling a table.
#cell = '|'
#line = '-'
#cross = '+'
def build_row(content)
(content.zip [#cell]* content.size).flatten.prepend(#cell).join
end
def separator_from_content(content)
content.map { |e| #line * e.size + #cross }.prepend(#cross).join
end
def adjust_content_in(lines)
width = lines.flatten.max_by(&:size).size
lines.map { |line| line.map { |e| e << ' ' * (width - e.size) } }
end
def build_table(lines)
lines = adjust_content_in(lines)
separator = separator_from_content(lines.first)
mapped = lines.map { |line| build_row(line) }.zip([separator] * (lines.size) )
return mapped.flatten.prepend(separator).join("\n")
end
def empty_squared_table(n)
lines = n.times.map { n.times.map { ' ' } }
build_table(lines)
end
So, if you want to draw an empty table, can call:
n = 3
puts empty_squared_table(n)
# +-+-+-+
# | | | |
# +-+-+-+
# | | | |
# +-+-+-+
# | | | |
# +-+-+-+
Or you can fill in some text (more lines inside a cell not supported):
lines = [['so', 'you can', 'fill'], ['a table', 'given the content', '']]
puts build_table(lines)
# +-----------------+-----------------+-----------------+
# |so |you can |fill |
# +-----------------+-----------------+-----------------+
# |a table |given the content| |
# +-----------------+-----------------+-----------------+
I have:
hash = {"1"=>["A", "B", "C", ... "Z"], "2"=>["B", "C"], "3"=>["A", "C"]
My goal is to use hash as a source for creating a CSV with columns whose names are a letter of the alphabet and with rows hash(key) = 1,2,3 etc.
I created an array of all hash.values.unshift("")values that serve as row 1 (columns labels).
desired output:
| A | B | C | ... | Z |
1| A | B | C | ... | Z |
2| | B | C | ....... |
3| A | | C | ....... |
Creating CSV:
CSV.open("groups.csv", 'w') do |csv|
csv << row1
hash.each do |v|
csv << v.flatten
end
end
This makes the CSV look almost what I want but There is no spacing to get columns to align.
Any advice on how to make a method for modifying my hash that compares my all [A-Z] against each subsequent hash key (rows) to insert empty strings to provide spacing?
Can Class CSV do it better?
Something like this?
require 'csv'
ALPHA = ('A'..'Z').to_a.freeze
hash={"1"=>ALPHA, "2"=>["B", "C"], "3"=>["A", "C"]}
csv = CSV.generate("", col_sep: "|") do |csv|
csv << [" "] + ALPHA # header
hash.each do |k, v|
alphabet = ALPHA.map { |el| [el, 0] }.to_h
v.each { |el| alphabet[el] += 1 }
csv << [k, *alphabet.map { |k, val| val == 1 ? k : " " }]
end
end
csv.split("\n").each { |row| puts row }
output:
|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
1|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
2| |B|C| | | | | | | | | | | | | | | | | | | | | | |
3|A| |C| | | | | | | | | | | | | | | | | | | | | | |
If your values are truly single characters and don't need the CSV escaping, then I recommend bypassing CSV altogether and building the string in plain Ruby.
Assuming you want to align your lines correctly regardless of the number of digits in the row number (e.g. 1, 10, and 100), you can use printf style formatting to guarantee horizontal aligment (assuming your row number width never exceeds the value of ROWNUM_WIDTH).
By the way, I changed the hash's keys to integers, hope that's ok.
#!/usr/bin/env ruby
FIELDS = ('A'..'Z').to_a
DATA = { 1 => FIELDS, 2 => %w(B C), 3 => %w(A C) }
ROWNUM_WIDTH = 3
output = ' ' * ROWNUM_WIDTH + " | #{FIELDS.join(' | ')} |\n"
DATA.each do |rownum, values|
line = "%*d | " % [ROWNUM_WIDTH, rownum]
FIELDS.each do |field|
char = values.include?(field) ? field : ' '
line << "#{char} | "
end
output << line << "\n"
end
puts output
=begin
Outputs:
| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
1 | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
2 | | B | C | | | | | | | | | | | | | | | | | | | | | | | |
3 | A | | C | | | | | | | | | | | | | | | | | | | | | | | |
=end
all = [*?A..?Z]
hash = {"1"=>[*?A..?Z], "2"=>["B", "C"], "3"=>["A", "C"]}
hash.map do |k, v|
[k, *all.map { |k| v.include?(k) ? k : ' ' }]
end.unshift([' ', *all]).
map { |row| row.join('|') }
#⇒ [
# [0] " |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z",
# [1] "1|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z",
# [2] "2| |B|C| | | | | | | | | | | | | | | | | | | | | | | ",
# [3] "3|A| |C| | | | | | | | | | | | | | | | | | | | | | | "
# ]
Is there a way to output an array of arrays as a table in ruby? I'm looking for something that is useful from the console, like powershell's format-table command.
An array that would be passed could be something like:
a = [[1.23, 5, :bagels], [3.14, 7, :gravel], [8.33, 11, :saturn]]
And the output could be something like:
-----------------------
| 1.23 | 5 | bagels |
| 3.14 | 7 | gravel |
| 8.33 | 11 | saturn |
-----------------------
I accepted sagarpandya82's answer, but here's my better implementation:
class Array
def to_table
column_sizes = self.reduce([]) do |lengths, row|
row.each_with_index.map{|iterand, index| [lengths[index] || 0, iterand.to_s.length].max}
end
puts head = '-' * (column_sizes.inject(&:+) + (3 * column_sizes.count) + 1)
self.each do |row|
row = row.fill(nil, row.size..(column_sizes.size - 1))
row = row.each_with_index.map{|v, i| v = v.to_s + ' ' * (column_sizes[i] - v.to_s.length)}
puts '| ' + row.join(' | ') + ' |'
end
puts head
end
end
You could try something like this:
a = [[1.23, 5, :bagels], [3.14, 7, :gravel], [8.33, 11, :saturn]]
bar = '-'*(a[0][0].to_s.length + 4 + a[0][1].to_s.length + 3 + a[0][2].to_s.length + 4)
puts bar
a.each do |i|
i.each.with_index do |j,k|
if k == 1 && j < 10
print "| #{j} "
else
print "| #{j} "
end
end
print '|'
puts
end
puts bar
returns:
----------------------
| 1.23 | 5 | bagels |
| 3.14 | 7 | gravel |
| 8.33 | 11 | saturn |
----------------------
bar is just an estimate of how long the top and bottom dash-bar will be. In general this code checks each sub-array and prints out its elements with | in the appropriate places. The only tricky bit is the second element of each sub-array. Here we check to see if it is double digit or single digit. The print-format is different for each.
Bear in mind that this code works specifically for your example and makes a lot of assumptions. Having said that it can be easily modified to taste.
I've made a small improvement on the #dart-egregious snippet
class Array
def to_table(header: true)
column_sizes = self.reduce([]) do |lengths, row|
row.each_with_index.map{|iterand, index| [lengths[index] || 0, iterand.to_s.length].max}
end
head = '+' + column_sizes.map{|column_size| '-' * (column_size + 2) }.join('+') + '+'
puts head
to_print = self.clone
if (header == true)
header = to_print.shift
print_table_data_row(column_sizes, header)
puts head
end
to_print.each{ |row| print_table_data_row(column_sizes, row) }
puts head
end
private
def print_table_data_row(column_sizes, row)
row = row.fill(nil, row.size..(column_sizes.size - 1))
row = row.each_with_index.map{|v, i| v = v.to_s + ' ' * (column_sizes[i] - v.to_s.length)}
puts '| ' + row.join(' | ') + ' |'
end
end
Using:
data = [
['column 1', 'column 2', 'ciolumn 3'],
['row 1 col 2', 'row 1 col 2', 'row 1 col 3'],
['row 1 col 2', 'row 1 col 2', 'row 1 col 3']
]
data.to_table
+-------------+-------------+-------------+
| column 1 | column 2 | ciolumn 3 |
+-------------+-------------+-------------+
| row 1 col 2 | row 1 col 2 | row 1 col 3 |
| row 1 col 2 | row 1 col 2 | row 1 col 3 |
+-------------+-------------+-------------+
data.to_table(header: false)
+-------------+-------------+-------------+
| column 1 | column 2 | ciolumn 3 |
| row 1 col 2 | row 1 col 2 | row 1 col 3 |
| row 1 col 2 | row 1 col 2 | row 1 col 3 |
+-------------+-------------+-------------+
Let's say I have a list of items in which each item can be repeated once or more than once (1 to n). I'm trying to find an algorithm that extracts random items from the list until it's empty, but with the constraint that an item cannot be repeated consecutively more than a fixed number of times (which could be different for each item). I would like the algorithm to have "correct probabilities" (I try to explain that later).
For example, say I have this list:
Item | Count | Max. consecutive
-----+-------+-----------------
A | 2 | 1
B | 4 | 2
Some results could be:
B A B A B B
B B A B A B
But the following would be incorrect, since "B" has 3 consecutive repetitions, when the maximum was 2:
B B B A B A
I've managed to create an algorithm that works, but it has a problem with the probabilities. I will put the code first, and then talk about the problem. It's a C# class; sorry if you don't like the format or the GOTOs, let's be friends ;P
To use it, you instantiate an object providing the number of unique items, use SetIndexData() to add the information for each item (number of elements and max. repetitions), and call GetNextRandomIndex() to get the next random item (I have implemented it using int as if the items were indexes, you would have to create an indexed collection to store the real items' values).
You can use the method CheckIndexMaxConsecutiveCorrectness() to check if it's possible to have an item with the specified maximum repetitions in a bag with the specified total elements. For example, if a bag has only 1 item with 2 elements, but we say it can only be repeated once, it's obviously impossible, so CheckIndexMaxConsecutiveCorrectness() would return false.
The method GetIndexMinMaxGroups() is used internally by the class to retrieve the minimum and maximum number of groups of consecutive elements for an item (for example, in the sequence B B B A B A we have 2 groups for "A" and 2 for "B"). Actually, the numbers calculated in this method are not exactly true, since there are some variables that it doesn't take into account, but the values it returns are useful for two things: doing the check from CheckIndexMaxConsecutiveCorrectness() (if the minimum number of groups is greater than the maximum, then the maximum number of repetitions is not correct), and knowing when it's obligatory to return a determined item from GetNextRandomIndex() (when the minimum is equal to the maximum). I haven't used any mathematical algorithm to deduce those properties, so something could be wrong, though so far it has "magically" worked perfectly in my tests, millions of times...
class ShuffleBag
{
/*** INFORMATION FOR THE INDEXES ***/
// Class for the data:
class IndexData
{
public int Count = 0; // The current number of instances of the index in the bag.
public int MaxConsecutive = int.MaxValue; // The maximum consecutive repetitions allowed for the index.
}
// List of indexes data for this bag:
IndexData[] IndexesDataList;
/*** MEMBERS USED FOR CALCULATIONS ***/
// Random number generator:
Random RandGenerator;
// Remaining elements in the bag:
int _RemainingElementsCount = 0;
public int RemainingElementsCount
{
get { return _RemainingElementsCount; }
}
// Last retrieved index (-1 if no index has been retrieved yet),
// and the last consecutive repetitions of that index:
int LastIndex = -1;
int LastRepetitions = 0;
///==============///
/// Constructor. ///
///==============///
public ShuffleBag(int uniqueIndexesCount)
{
IndexesDataList = new IndexData[uniqueIndexesCount];
for (int i = 0; i < uniqueIndexesCount; i++)
IndexesDataList[i] = new IndexData();
RandGenerator = new Random();
}
///===========================================================///
/// Resets the shuffle bag; must be called before reusing it. ///
/// The number of unique indexes won't be reset. ///
/// Doesn't need to be called just after creating the bag. ///
///===========================================================///
public void Reset()
{
for (int i = 0; i < IndexesDataList.Length; i++)
{
IndexesDataList[i].Count = 0;
IndexesDataList[i].MaxConsecutive = int.MaxValue;
}
_RemainingElementsCount = 0;
LastIndex = -1;
LastRepetitions = 0;
}
///==================================================================================================///
/// Checks if it's possible to honor the max repetitions of an index with the provided data. ///
/// If it was not possible, the behaviour of a shuffle bag with those parameters would be undefined. ///
///==================================================================================================///
public static bool CheckIndexMaxConsecutiveCorrectness(int maxConsecutive, int indexElements, int bagTotalElements)
{
int min, max;
GetIndexMinMaxGroups(indexElements, maxConsecutive, bagTotalElements, out min, out max);
return min <= max;
}
///=====================================================================///
/// Sets the data for the specified index. ///
/// Can be called after starting to use the bag, ///
/// but if any parameters make the max consecutive repetitions invalid, ///
/// the behaviour of the bag will be undefined. ///
///=====================================================================///
public void SetIndexData(int index, int count, int maxConsecutive)
{
IndexData data = IndexesDataList[index];
_RemainingElementsCount += count - data.Count;
data.Count = count;
data.MaxConsecutive = maxConsecutive;
}
///====================================================================================================================///
/// Retrieves the next random index. The caller must check if there are remaining elements in the bag to be retrieved. ///
///====================================================================================================================///
public int GetNextRandomIndex()
{
/*** GET THE INDEX ***/
int index;
// If, for any index, the minimum possible groups equals the maximum, it must be the returned index:
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
int minGroups, maxGroups;
GetIndexMinMaxGroups(data.Count, data.MaxConsecutive, _RemainingElementsCount, out minGroups, out maxGroups);
if (minGroups == maxGroups)
goto _INDEX_FOUND_;
}
// Get a random number to choose the index:
int rand = RandGenerator.Next(_RemainingElementsCount);
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
// This index corresponds with the random number:
if (rand < data.Count)
{
// Check if the index has reached the maximum consecutive repetitions;
// in that case, get the next available one:
if (index == LastIndex && data.MaxConsecutive == LastRepetitions)
{
for (int k = 1; k <= IndexesDataList.Length - 1; k++)
{
int m = WrapIndexSimple(index + k, IndexesDataList.Length);
if (IndexesDataList[m].Count > 0)
{
index = m;
goto _INDEX_FOUND_;
}
}
}
goto _INDEX_FOUND_;
}
// This index doesn't correspond with the random number; update it to check the next index:
else
{
rand -= data.Count;
}
}
/*** INDEX FOUND, UPDATE AND RETURN ***/
_INDEX_FOUND_:
IndexData resultData = IndexesDataList[index];
resultData.Count--;
_RemainingElementsCount--;
if (LastIndex == index)
{
LastRepetitions++;
}
else
{
LastIndex = index;
LastRepetitions = 1;
}
return index;
}
///===============================================================================///
/// Calculates the minimum and maximum possible groups of consecutive repetitions ///
/// for an index with the specified data. ///
/// If any provided data is invalid, the behaviour and results are undefined. ///
///===============================================================================///
static void GetIndexMinMaxGroups(int indexRemainingElements, int indexMaxConsecutive, int bagRemainingElements, out int min, out int max)
{
int rem;
int div = Math.DivRem(indexRemainingElements, indexMaxConsecutive, out rem);
min = rem == 0 ? div : div + 1;
max = bagRemainingElements - indexRemainingElements + 1;
}
///=======================================================///
/// Converts an index out of bounds to a valid value. ///
/// "length" is the number of elements of the collection. ///
/// Only works for indexes that are less than 2*length. ///
///=======================================================///
static int WrapIndexSimple(int index, int length)
{
if (index >= length)
return index - length;
else if (index < 0)
return length + index;
else
return index;
}
}
The problem:
As I said, so far the class has worked as I wanted, extracting items without ever exceeding the number of repetitions. The problem is that the probabilities of extracting an item at each call of GetNextRandomIndex() are not what they should be. For example, if I have this:
Item | Count | Max. consecutive
-----+-------+-----------------
a | 30 | 3
X | 10 | 2
If I'm right, there would be 13 different possible sequences. The method should return "a" 12 times for each time it returns "X" in the first 3 calls; for the fourth, it should return "a" 10 times for every 3 times it returns "X"; etc. These are the different sequences:
Seq. 1 | Seq. 2 | Seq. 3 | Seq. 4 | Seq. 5 | Seq. 6 | Seq. 7 | Seq. 8 | Seq. 9 | Seq. 10 | Seq. 11 | Seq. 12 | Seq. 13 | Number of “a” | Number of “X”
-------+--------+--------+--------+--------+--------+--------+--------+--------+---------+---------+---------+---------+---------------+--------------
a | a | a | X | a | a | a | a | a | a | a | a | a | 12 | 1
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | a | X | X | a | a | a | a | a | a | a | a | 11 | 2
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | X | X | X | X | X | X | X | X | 4 | 9
a | a | a | X | X | X | a | a | a | a | a | a | a | 10 | 3
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | X | X | X | X | X | X | X | 5 | 8
a | a | a | X | X | X | X | a | a | a | a | a | a | 9 | 4
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | X | X | X | X | X | X | 6 | 7
a | a | a | X | X | X | X | X | a | a | a | a | a | 8 | 5
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | X | X | X | X | X | 7 | 6
a | a | a | X | X | X | X | X | X | a | a | a | a | 7 | 6
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | X | X | X | X | 8 | 5
a | a | a | X | X | X | X | X | X | X | a | a | a | 6 | 7
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | X | X | X | 9 | 4
a | a | a | X | X | X | X | X | X | X | X | a | a | 5 | 8
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | X | X | 10 | 3
a | a | a | X | X | X | X | X | X | X | X | X | a | 4 | 9
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | X | 11 | 2
a | a | a | X | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
This little console application shows the results of the use of the class. For the specified number of iterations, it fills the bag and extracts all the random items. At the end, it shows the maximum number of consecutive repetitions for each item, and the accumulated number of occurrences of each item and each call to GetNextRandomIndex() for all the iterations. As you can see, after 1 million iterations, the accumulated elements for the first call are around 750,000 for "a" and 250,000 for "X", and for the last call are around 999,950 for "a" and 50 for "X":
static void Main(string[] args)
{
int MaxIterations = 1000000;
bool ShowResults = true;
string[] items = new string[] { "a", "X" };
int[] count = new int[] { 30, 10 };
int[] maxRepet = new int[] { 3, 2 };
int elementCount = count.Sum();
bool maxRepetOK = true;
for (int i = 0; i < items.Length; i++)
maxRepetOK = maxRepetOK && ShuffleBag.CheckIndexMaxConsecutiveCorrectness(maxRepet[i], count[i], elementCount);
if (! maxRepetOK)
{
Console.WriteLine("*** Bad number of repetitions!! ***\n");
goto _END_;
}
Dictionary<string, Tuple<int,int>> results = new Dictionary<string,Tuple<int,int>>();
for (int i = 0; i < items.Length; i++)
results.Add(items[i], new Tuple<int,int>(0, 0));
int[,] resultsPerCall = new int[elementCount, items.Length];
ShuffleBag bag = new ShuffleBag(items.Length);
int iterations = 0;
for (int x = 0; x < MaxIterations; x++)
{
iterations++;
bag.Reset();
for (int i = 0; i < items.Length; i++)
bag.SetIndexData(i, count[i], maxRepet[i]);
string prevItem = "";
int prevRepetitions = 0;
int row = 0;
while (bag.RemainingElementsCount > 0)
{
int newIndex = bag.GetNextRandomIndex();
if (prevItem == items[newIndex])
{
prevRepetitions++;
}
else
{
prevItem = items[newIndex];
prevRepetitions = 1;
}
var resultAnt = results[items[newIndex]];
results[items[newIndex]] = new Tuple<int,int>(resultAnt.Item1 + 1, Math.Max(prevRepetitions, resultAnt.Item2));
resultsPerCall[row, newIndex]++;
if (ShowResults)
{
Console.WriteLine(items[newIndex]);
}
row++;
}
if (ShowResults && MaxIterations > 1)
{
Console.WriteLine("\nESC:\tEnd\nENTER:\tNext iteration\nTAB:\tAll iterations");
while (true)
{
switch (Console.ReadKey(true).Key)
{
case ConsoleKey.Enter:
goto _CONTINUE_;
case ConsoleKey.Escape:
goto _RESULTS_;
case ConsoleKey.Tab:
ShowResults = false;
goto _CONTINUE_;
}
}
_CONTINUE_:
Console.Clear();
Console.WriteLine("Iterating ...");
}
}
_RESULTS_:
Console.Clear();
Console.WriteLine("RESULTS\n------------------------------------------");
for (int i = 0; i < items.Length; i++)
{
var data = results[items[i]];
double average = (double)data.Item1 / iterations;
Console.WriteLine(items[i] +
": Average = " + average.ToString() + (average != count[i] ? "(!)" : "") +
"\t\tMax. repetitions = " + data.Item2.ToString() + (data.Item2 > maxRepet[i] ? "(!)" : ""));
}
Console.WriteLine();
for (int i = 0; i < elementCount; i++)
{
Console.Write((i+1).ToString("00") + ")");
for (int k = 0; k < items.Length; k++)
{
Console.Write(" \t" + items[k] + ": " + resultsPerCall[i, k].ToString());
}
Console.WriteLine();
}
_END_:
Console.WriteLine("\n\nESC to exit");
while (Console.ReadKey(true).Key != ConsoleKey.Escape);
}
The problem is that, from GetNextRandomIndex(), I only use the number of remaining elements of an item as the probability for selecting it, so for the first call the probability of getting "a" is 3 times the probability of getting "X" (since there are 30 elements of "a" and 10 of "X"). Does anybody have any idea of how can I change my algorithm (or use a different one) to get the correct probabilities?
As promised, here’s a Markov chain Monte Carlo sampler. I can’t seem to
get an exact version using the Propp–Wilson technique; this one merely
converges to a distribution that is biased toward and uniform on the
desired outcomes, possibly rather slowly. Define the score of a
permutation to be the number of invalid letters. Specifically, if A
should appear at most once consecutively and B should appear at most
twice consecutively, then the score of
AAABBAABBBB
^^ ^ ^^
is 5, where the contributing letters are marked with ^.
Starting with a random permutation, repeatedly choose two positions with
replacement independently and uniformly at random. Score the permutation
with the letters at those positions swapped; this is the proposed
permutation. Compute two to the power of (the current score - the
proposed score). If a uniform random float between zero and one is less
than this number, then keep this proposed swap; otherwise, undo it. Do
this as long as you can stand, then take the next valid permutation.
Here's an approach for exact sampling that did not work as well as I had hoped. I'm posting it in case it proves useful anyway. Given the maximum run lengths, prepare an automaton that recognizes the valid strings.
def make_automaton(max_consecutive):
states = {letter * j for letter, k in max_consecutive.items() for j in range(1, k + 1)}
states.add('')
automaton = {}
for state in states:
transitions = {}
for letter in max_consecutive.keys():
new_state = state + letter if letter == state[-1:] else letter
if new_state in states:
transitions[letter] = new_state
automaton[state] = transitions
return automaton
Here's the code in action.
>>> from pprint import pprint
>>> pprint(make_automaton({'a': 3, 'X': 2}))
{'': {'X': 'X', 'a': 'a'},
'X': {'X': 'XX', 'a': 'a'},
'XX': {'a': 'a'},
'a': {'X': 'X', 'a': 'aa'},
'aa': {'X': 'X', 'a': 'aaa'},
'aaa': {'X': 'X'}}
Now, there's an approach to exact sampling that seems to use too much space. Prepare a table indexed by pairs of an automaton state and a multiset of letters. The table values give the number of words with specified letter counts that put the automaton in the specified state. Then we can use this table by sampling the final state and using conditional probabilities to derive the string that got the automaton there.
My idea was, instead of trying to remember the counts, sample from the following distribution until we get the letter counts right. Each letter is distributed independently according to the counts. We condition on the resulting word being valid.
Here's the rest of the code. It seems to work OK as long as the counts aren't too large and the runs aren't too short.
from collections import defaultdict
def make_probabilities(count, automaton):
probabilities = [{min(automaton): 1.0}]
total = sum(count.values())
for i in range(1, total + 1):
distribution = defaultdict(float)
for state, p in probabilities[-1].items():
transitions = automaton[state]
for letter, k in count.items():
if letter in transitions:
distribution[transitions[letter]] += p * (k / total)
probabilities.append(dict(distribution))
return probabilities
pprint(make_probabilities({'a': 30, 'X': 10}, make_automaton({'a': 3, 'X': 2})))
from random import random
def weighted_sample(distribution):
while True:
sample = random() * sum(distribution.values())
for letter, k in distribution.items():
sample -= k
if sample < 0.0:
return letter
from collections import Counter
print(Counter(weighted_sample({'a': 30, 'X': 10}) for i in range(10000)))
def unbiased_sample(count, max_consecutive):
automaton = make_automaton(max_consecutive)
probabilities = make_probabilities(count, automaton)
total = sum(count.values())
while True:
sample = []
state = weighted_sample(probabilities[-1])
for i in range(len(probabilities) - 2, -1, -1):
conditional_distribution = {}
for old_state, p in probabilities[i].items():
transitions = automaton[old_state]
for letter, k in count.items():
if letter in transitions and transitions[letter] == state:
conditional_distribution[(old_state, letter)] = p * (k / total)
state, letter = weighted_sample(conditional_distribution)
sample.append(letter)
if Counter(sample) == count:
return ''.join(sample)
for r in range(1000):
print(unbiased_sample({'A': 10, 'B': 10, 'C': 10, 'D': 10}, {'A': 5, 'B': 4, 'C': 3, 'D': 2}))
Here's what I'm trying to do. I'm iterating through an array of strings, each of which may contain [] multiple times. I want to use String#match as many times as necessary to process every occurrence. So I have an Array#each block, and nested within that is an infinite loop that should break only when I run out of matches for a given string.
def follow(c, dir)
case c
when "\\"
dir.rotate!
when "/"
dir.rotate!.map! {|x| -x}
end
return dir
end
def parse(ar)
ar.each_with_index { |s, i|
idx = 0
while true do
t = s.match(/\[[ ]*([a-zA-Z]+)[ ]*\]/) { |m|
idx = m.end 0
r = m[1]
s[m.begin(0)...idx] = " "*m[0].size
a = [i, idx]
dir = [0, 1]
c = ar[a[0]][a[1]]
while !c.nil? do
dir = follow c, dir
ar[a[0]][a[1]] = " "
a[0] += dir[0]; a[1] += dir[1]
c = ar[a[0]][a[1]]
if c == ">" then
ar[a[0]][a[1]+1] = r; c=nil
elsif c == "<" then
ar[a[0]][a[1]-r.size] = r; c=nil
end
end
ar[a[0]][a[1]] = " "
puts ar
}
if t == nil then break; end
end
}
parse File.new("test", "r").to_a
Contents of test:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
/--> /----> | |
| |[kale]/ | hot dogs | Brazil |
| | <----------------------\ |
| | orange |[yellow]\ | [green]/ |
| +--------+--------|-+-------------+
\-------------------/
Goal:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| green |
| orange | | |
+--------+-------- -+-------------+
Actual output of program:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| <----------------------\ |
| orange | | [green]/ |
+--------+-------- -+-------------+
(Since the array is modified in place, I figure there is no need to update the match index.) The inner loop should run again for each pair of [] I find, but instead I only get one turn per array entry. (It seems that the t=match... bit isn't the problem, but I could be wrong.) How can I fix this?