Print all possible path using haskell - algorithm

I need to write a program to draw all possible paths in a given matrix that can be had by moving in only left, right and up direction.
One should not cross the same location more than once. Note also that on a particular path, we may or may not use motion in all possible directions.
Path will start in the bottom-left corner in the matrix and will reach the top-right corner.
Following symbols are used to denote the direction of the motion in the current position:
+---+
| > | right
+---+
+---+
| ^ | up
+---+
+---+
| < | left
+---+
The symbol * is used in the final location to indicate end of path.
Example:
For a 5x8 matrix, using left, right and up directions, 2 different paths are shown below.
Path 1:
+---+---+---+---+---+---+---+---+
| | | | | | | | * |
+---+---+---+---+---+---+---+---+
| | | > | > | > | > | > | ^ |
+---+---+---+---+---+---+---+---+
| | | ^ | < | < | | | |
+---+---+---+---+---+---+---+---+
| | > | > | > | ^ | | | |
+---+---+---+---+---+---+---+---+
| > | ^ | | | | | | |
+---+---+---+---+---+---+---+---+
Path 2
+---+---+---+---+---+---+---+---+
| | | | > | > | > | > | * |
+---+---+---+---+---+---+---+---+
| | | | ^ | < | < | | |
+---+---+---+---+---+---+---+---+
| | | | | | ^ | | |
+---+---+---+---+---+---+---+---+
| | | > | > | > | ^ | | |
+---+---+---+---+---+---+---+---+
| > | > | ^ | | | | | |
+---+---+---+---+---+---+---+---+
Can anyone help me with this?
I tried to solve using lists. It i soon realized that i am making a disaster. Here is the code i tried with.
solution x y = travel (1,1) (x,y)
travelRight (x,y) = zip [1..x] [1,1..] ++ [(x,y)]
travelUp (x,y) = zip [1,1..] [1..y] ++ [(x,y)]
minPaths = [[(1,1),(2,1),(2,2)],[(1,1),(1,2),(2,2)]]
travel startpos (x,y) = rt (x,y) ++ up (x,y)
rt (x,y) | odd y = map (++[(x,y)]) (furtherRight (3,2) (x,2) minPaths)
| otherwise = furtherRight (3,2) (x,2) minPaths
up (x,y) | odd x = map (++[(x,y)]) (furtherUp (2,3) (2,y) minPaths)
| otherwise = furtherUp (2,3) (2,y) minPaths
furtherRight currpos endpos paths | currpos == endpos = (travelRight currpos) : map (++[currpos]) paths
| otherwise = furtherRight (nextRight currpos) endpos ((travelRight currpos) : (map (++[currpos]) paths))
nextRight (x,y) = (x+1,y)
furtherUp currpos endpos paths | currpos == endpos = (travelUp currpos) : map (++[currpos]) paths
| otherwise = furtherUp (nextUp currpos) endpos ((travelUp currpos) : (map(++[currpos]) paths))
nextUp (x,y) = (x,y+1)
identify lst = map (map iden) lst
iden (x,y) = (x,y,1)
arrows lst = map mydir lst
mydir (ele:[]) = "*"
mydir ((x1,y1):(x2,y2):lst) | x1==x2 = '>' : mydir ((x2,y2):lst)
| otherwise = '^' : mydir ((x2,y2):lst)
surroundBox lst = map (map createBox) lst
bar = "+ -+"
mid x = "| "++ [x] ++" |"
createBox chr = bar ++ "\n" ++ mid chr ++ "\n" ++ bar ++ "\n"

This ASCII grids are much more confusing than enlightening. Let me describe a better way to represent each possible path.
Each non-top row will have exactly one cell with UP. I claim that once each of the UP cells has been chosen that the LEFT and RIGHT and EMPTY cells can be determined. I claim that all possible cells in each of the non-top rows can be UP in all combination.
Each path is thus isomorphic to a (rows-1) length list of numbers in the range (1..columns) that determine the UP cells. The number of allowed paths is thus columns^(rows-1) and enumerating the possible paths in this format should be easy.
Then you could make a printer that converts this format to the ASCII art. This may be annoying, depending on skill level.

Looks like a homework so I will try to give enough hints
Try first filling number of paths from a cell to your goal.
So
+---+---+---+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | * |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
The thing to note here is from the cell in the top level there will always be one path to the *.
Number of possible path from cells in the same row will be same. You can realize this as all the paths will ultimately have to move up as there is no down action so in any path the cell above the current row can be reached by any cell in the current row.
You can feel the all possible paths from the current cell has its relation with the possible paths from the cell left,right and above. But as we know we can find all possible paths from only one cell in a row and rest of cells' possible paths will be some movements in the same row followed by a suffix of possible paths from that cell.
Maybe I will give you a example
+---+---+---+
| 1 | 1 | * |
+---+---+---+
| | | |
+---+---+---+
| | | |
+---+---+---+
You know all possible paths from cells in the first row. You need to find the same in the second row. So a good strategy would be to do it for the right most cell
+---+---+---+
| > | > | * |
+---+---+---+
| ^ | < | < |
+---+---+---+
| | | |
+---+---+---+
+---+---+---+
| | > | * |
+---+---+---+
| | ^ | < |
+---+---+---+
| | | |
+---+---+---+
+---+---+---+
| | | * |
+---+---+---+
| | | ^ |
+---+---+---+
| | | |
+---+---+---+
Now finding this for rest of the cells in the same row is trivial using these as I have told before.
In the end if you have m X n matrix the number of paths from bottom-left corner to top-right corner will be n^(m-1).
Another way
This way is not very optimal but easy to implement. Consider m X n grid
Find the path of longest length. You dont need the exact path just the number of <,>,^.
You can find the direct formula in terms of m and n.
Like
^ = m - 1
< = (n-1) * floor((m-1)/2)
> = (n-1) * (floor((m-1)/2) + 1)
Any valid path will be a prefix of the permutations of this which you can search exhaustively. Use permutations from Data.List to get all possible permutations. Then make a function which given a path strips a valid path from this. map this over the list of permutations and remove duplicates. The thing to note is path will be a prefix of what you get from permutation, so there can be several permutations for the same path.

Can you create that matrix and define the "fields"? Even if you can't (a specific matrix is given), you can map an [(Int, Int)] matrix (which sounds reasonable for this kind of task) to your own representation.
Since you didn't specify what your skill level was, I hope you don't mind that I suggest that you first try to create some kind of a grid in order to have something to work on:
data Status = Free | Left | Right | Up
deriving (Read, Show, Eq)
type Position = (Int, Int)
type Field = (Position, Status)
type Grid = [Field]
grid :: Grid
grid = [((x, y), stat) | x <- [1..10], y <- [1..10], let stat = Free]
Of course there are other ways to achieve this. Afterwards you can define some movement, map Position to Grid index and Statuses to printable characters... Try to fiddle with it and you might get some ideas.

Related

Any algorithm to fill a space with smallest number of boxes

Let a 3D grid, just like a checkerboard, with an extra dimension. Now let's say that I have a certain amount of cubes into that grid, each cube occupying 1x1x1 cells. Let's say that each of these cubes is an item.
What I would like to do is replace/combine these cubes into larger boxes occupying any number of cells on the X, Y and Z axes, so that the resulting number of boxes is as small as possible while preserving the overall "appearance".
It's probably unclear so I'll give a 2D example. Say I have a 2D grid containing several squares occupying 1x1 cells. A letter represents the cells occupied by a given item, each item having a different letter from the other ones. In the first example we have 10 different items, each of them occupying 1x1x1 cells.
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | B | C | D | |
+---+---+---+---+---+---+
| | E | F | G | H | |
+---+---+---+---+---+---+
| | | K | L | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
That's my input data. I could now optimize it, i.e reduce the number of items while still occupying the same cells, by multiple possible ways, one of which could be :
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | B | B | C | |
+---+---+---+---+---+---+
| | A | B | B | C | |
+---+---+---+---+---+---+
| | | B | B | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
Here, instead of 10 items, I only have 3 (i.e A, B and C). However it can be optimized even more :
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | A | A | A | |
+---+---+---+---+---+---+
| | A | A | A | A | |
+---+---+---+---+---+---+
| | | B | B | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
Here I only have two items, A and B. This is as optimized as this can be.
What I am looking for is an algorithm capable of finding the best item sizes and arrangement, or at least a reasonably good one, so that I have as few items as possible while occupying the same cells, and in 3D !
Is there such an algorithm ? I'm sure there are some domains where that kind of algorithm would be useful, and I need it for a video game. Thanks !!
Perhaps a simpler algorithm is possible, but a set partition should suffice.
Min x1 + x2 + x3 + ... //where x1 is 1 if the 1th partition is chosen, 0 otherwise
such that x1 + + x3 = 1// if 1st and 3rd partition contain 1st item
x2 + x3 = 1//if 2nd and 3rd partition contain 2nd item and so on.
x1, x2, x3,... are binary
You have 1 constraint for each item. Each constraint stipulates that each item can be part of exactly one box. The objective minimizes the total number of boxes.
This is an NP Hard integer programming however.
The number of variables in this problem could be exponential. You need to have an efficient way of enumerating them -- that is figuring out when a contiguous box can be found that is capable of including all points in it. It is here that you have to take into account information such as whether the grid is 2d or 3d, how you define a contiguous "box", etc.
Such problems are usually solved by column-generation, where these columns of the integer program are dynamically generated on the fly.
If I understand David Eppstein's1 explanation (see section 3) then a solution can be found in a maximal independent set in the bipartite intersection graph of axis-aligned diagonals connecting one concave vertex to another. (This would be 2d. I'm not sure about 3d, although perhaps it involves evaluating hyperplanes instead of lines?)
In your example, there is only one such diagonal:
________
| |
|_x....x_|
|____|
The two xs represent connected concave vertices. The maximal independent set of edges here contains only one edge, splitting the polygon in two.
Here's another with only one axis-parallel edge connecting two concave vertices, x and x. This polygon, though, also has two concave vertices, a and b, that do not have an opposite, axis-parallel match. In that case, it seems to me, each concave vertex without a partner would split the polygon it's on in two (either vertically or horizontally):
____________
| |
| |x
| . |
| . |a
|___ . |
b| . |
| .___|
|________|x
results in 4 rectangles:
____________
| |
| |x
| . |
| ..|a
|___.......... |
b| . |
| .___|
|________|x
Here's one with two intersecting axis-parallel diagonals, each connecting two concave vertices, (x,x) and (y,y):
____________
| |
| |x_
| . |
| . |
|___ . . . .z. .|y
y| . |
| .____|
|________|x
In this case, as I understand, the intersection graph again contains only one independent set:
(y,z) (z,y) (x,z) (z,x)
yielding 4 rectangles as a solution.
Since I'm not completely sure how the "intersection graph" in the paper is defined, I would welcome any clarifying comments.
1. Graph-Theoretic Solutions to Computational Geometry Problems, David Eppstein (Submitted on 26 Aug 2009)

How many leafref is possible inside a leaf in Yang modelling?

According to RFC - RFC 6020 - LeafRef I can understand that the leaf can contain a leafref which inturn have the path pointing to the instance which is referenced but question is how many leafrefs are possible for one leaf? Only one or many?
Ex.
leaf mgmt-interface {
type leafref {
path "../interface/name";
}
type leafref {
path "../interface/ip";
}
}
Is the above possible?
A leafref may only target a single leaf or leaf-list node via path. There may only be one type substatement to a leaf (also applies to leaf-list, typedef) and there may only be a single path substatement to type.
7.6.2. The leaf's Substatements
+--------------+---------+-------------+
| substatement | section | cardinality |
+--------------+---------+-------------+
| config | 7.19.1 | 0..1 |
| default | 7.6.4 | 0..1 |
| description | 7.19.3 | 0..1 |
| if-feature | 7.18.2 | 0..n |
| mandatory | 7.6.5 | 0..1 |
| must | 7.5.3 | 0..n |
| reference | 7.19.4 | 0..1 |
| status | 7.19.2 | 0..1 |
| type | 7.6.3 | 1 | <--
| units | 7.3.3 | 0..1 |
| when | 7.19.5 | 0..1 |
+--------------+---------+-------------+
12. YANG ABNF Grammar
type-stmt = type-keyword sep identifier-ref-arg-str optsep
(";" /
"{" stmtsep
type-body-stmts
"}")
type-body-stmts = numerical-restrictions /
decimal64-specification /
string-restrictions /
enum-specification /
leafref-specification /
identityref-specification /
instance-identifier-specification /
bits-specification /
union-specification
leafref-specification =
;; these stmts can appear in any order
path-stmt stmtsep
[require-instance-stmt stmtsep]
path-stmt = path-keyword sep path-arg-str stmtend
Note: it is not possible to use union for leafref types in YANG 1.0. This has changed in YANG 1.1 however, where any built-in YANG type may appear inside a union.
9.12. The union Built-In Type
A member type can be of any built-in or derived type, except it MUST
NOT be one of the built-in types "empty" or "leafref".

Weighted random selection with maximum consecutive repetitions

Let's say I have a list of items in which each item can be repeated once or more than once (1 to n). I'm trying to find an algorithm that extracts random items from the list until it's empty, but with the constraint that an item cannot be repeated consecutively more than a fixed number of times (which could be different for each item). I would like the algorithm to have "correct probabilities" (I try to explain that later).
For example, say I have this list:
Item | Count | Max. consecutive
-----+-------+-----------------
A | 2 | 1
B | 4 | 2
Some results could be:
B A B A B B
B B A B A B
But the following would be incorrect, since "B" has 3 consecutive repetitions, when the maximum was 2:
B B B A B A
I've managed to create an algorithm that works, but it has a problem with the probabilities. I will put the code first, and then talk about the problem. It's a C# class; sorry if you don't like the format or the GOTOs, let's be friends ;P
To use it, you instantiate an object providing the number of unique items, use SetIndexData() to add the information for each item (number of elements and max. repetitions), and call GetNextRandomIndex() to get the next random item (I have implemented it using int as if the items were indexes, you would have to create an indexed collection to store the real items' values).
You can use the method CheckIndexMaxConsecutiveCorrectness() to check if it's possible to have an item with the specified maximum repetitions in a bag with the specified total elements. For example, if a bag has only 1 item with 2 elements, but we say it can only be repeated once, it's obviously impossible, so CheckIndexMaxConsecutiveCorrectness() would return false.
The method GetIndexMinMaxGroups() is used internally by the class to retrieve the minimum and maximum number of groups of consecutive elements for an item (for example, in the sequence B B B A B A we have 2 groups for "A" and 2 for "B"). Actually, the numbers calculated in this method are not exactly true, since there are some variables that it doesn't take into account, but the values it returns are useful for two things: doing the check from CheckIndexMaxConsecutiveCorrectness() (if the minimum number of groups is greater than the maximum, then the maximum number of repetitions is not correct), and knowing when it's obligatory to return a determined item from GetNextRandomIndex() (when the minimum is equal to the maximum). I haven't used any mathematical algorithm to deduce those properties, so something could be wrong, though so far it has "magically" worked perfectly in my tests, millions of times...
class ShuffleBag
{
/*** INFORMATION FOR THE INDEXES ***/
// Class for the data:
class IndexData
{
public int Count = 0; // The current number of instances of the index in the bag.
public int MaxConsecutive = int.MaxValue; // The maximum consecutive repetitions allowed for the index.
}
// List of indexes data for this bag:
IndexData[] IndexesDataList;
/*** MEMBERS USED FOR CALCULATIONS ***/
// Random number generator:
Random RandGenerator;
// Remaining elements in the bag:
int _RemainingElementsCount = 0;
public int RemainingElementsCount
{
get { return _RemainingElementsCount; }
}
// Last retrieved index (-1 if no index has been retrieved yet),
// and the last consecutive repetitions of that index:
int LastIndex = -1;
int LastRepetitions = 0;
///==============///
/// Constructor. ///
///==============///
public ShuffleBag(int uniqueIndexesCount)
{
IndexesDataList = new IndexData[uniqueIndexesCount];
for (int i = 0; i < uniqueIndexesCount; i++)
IndexesDataList[i] = new IndexData();
RandGenerator = new Random();
}
///===========================================================///
/// Resets the shuffle bag; must be called before reusing it. ///
/// The number of unique indexes won't be reset. ///
/// Doesn't need to be called just after creating the bag. ///
///===========================================================///
public void Reset()
{
for (int i = 0; i < IndexesDataList.Length; i++)
{
IndexesDataList[i].Count = 0;
IndexesDataList[i].MaxConsecutive = int.MaxValue;
}
_RemainingElementsCount = 0;
LastIndex = -1;
LastRepetitions = 0;
}
///==================================================================================================///
/// Checks if it's possible to honor the max repetitions of an index with the provided data. ///
/// If it was not possible, the behaviour of a shuffle bag with those parameters would be undefined. ///
///==================================================================================================///
public static bool CheckIndexMaxConsecutiveCorrectness(int maxConsecutive, int indexElements, int bagTotalElements)
{
int min, max;
GetIndexMinMaxGroups(indexElements, maxConsecutive, bagTotalElements, out min, out max);
return min <= max;
}
///=====================================================================///
/// Sets the data for the specified index. ///
/// Can be called after starting to use the bag, ///
/// but if any parameters make the max consecutive repetitions invalid, ///
/// the behaviour of the bag will be undefined. ///
///=====================================================================///
public void SetIndexData(int index, int count, int maxConsecutive)
{
IndexData data = IndexesDataList[index];
_RemainingElementsCount += count - data.Count;
data.Count = count;
data.MaxConsecutive = maxConsecutive;
}
///====================================================================================================================///
/// Retrieves the next random index. The caller must check if there are remaining elements in the bag to be retrieved. ///
///====================================================================================================================///
public int GetNextRandomIndex()
{
/*** GET THE INDEX ***/
int index;
// If, for any index, the minimum possible groups equals the maximum, it must be the returned index:
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
int minGroups, maxGroups;
GetIndexMinMaxGroups(data.Count, data.MaxConsecutive, _RemainingElementsCount, out minGroups, out maxGroups);
if (minGroups == maxGroups)
goto _INDEX_FOUND_;
}
// Get a random number to choose the index:
int rand = RandGenerator.Next(_RemainingElementsCount);
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
// This index corresponds with the random number:
if (rand < data.Count)
{
// Check if the index has reached the maximum consecutive repetitions;
// in that case, get the next available one:
if (index == LastIndex && data.MaxConsecutive == LastRepetitions)
{
for (int k = 1; k <= IndexesDataList.Length - 1; k++)
{
int m = WrapIndexSimple(index + k, IndexesDataList.Length);
if (IndexesDataList[m].Count > 0)
{
index = m;
goto _INDEX_FOUND_;
}
}
}
goto _INDEX_FOUND_;
}
// This index doesn't correspond with the random number; update it to check the next index:
else
{
rand -= data.Count;
}
}
/*** INDEX FOUND, UPDATE AND RETURN ***/
_INDEX_FOUND_:
IndexData resultData = IndexesDataList[index];
resultData.Count--;
_RemainingElementsCount--;
if (LastIndex == index)
{
LastRepetitions++;
}
else
{
LastIndex = index;
LastRepetitions = 1;
}
return index;
}
///===============================================================================///
/// Calculates the minimum and maximum possible groups of consecutive repetitions ///
/// for an index with the specified data. ///
/// If any provided data is invalid, the behaviour and results are undefined. ///
///===============================================================================///
static void GetIndexMinMaxGroups(int indexRemainingElements, int indexMaxConsecutive, int bagRemainingElements, out int min, out int max)
{
int rem;
int div = Math.DivRem(indexRemainingElements, indexMaxConsecutive, out rem);
min = rem == 0 ? div : div + 1;
max = bagRemainingElements - indexRemainingElements + 1;
}
///=======================================================///
/// Converts an index out of bounds to a valid value. ///
/// "length" is the number of elements of the collection. ///
/// Only works for indexes that are less than 2*length. ///
///=======================================================///
static int WrapIndexSimple(int index, int length)
{
if (index >= length)
return index - length;
else if (index < 0)
return length + index;
else
return index;
}
}
The problem:
As I said, so far the class has worked as I wanted, extracting items without ever exceeding the number of repetitions. The problem is that the probabilities of extracting an item at each call of GetNextRandomIndex() are not what they should be. For example, if I have this:
Item | Count | Max. consecutive
-----+-------+-----------------
a | 30 | 3
X | 10 | 2
If I'm right, there would be 13 different possible sequences. The method should return "a" 12 times for each time it returns "X" in the first 3 calls; for the fourth, it should return "a" 10 times for every 3 times it returns "X"; etc. These are the different sequences:
Seq. 1 | Seq. 2 | Seq. 3 | Seq. 4 | Seq. 5 | Seq. 6 | Seq. 7 | Seq. 8 | Seq. 9 | Seq. 10 | Seq. 11 | Seq. 12 | Seq. 13 | Number of “a” | Number of “X”
-------+--------+--------+--------+--------+--------+--------+--------+--------+---------+---------+---------+---------+---------------+--------------
a | a | a | X | a | a | a | a | a | a | a | a | a | 12 | 1
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | a | X | X | a | a | a | a | a | a | a | a | 11 | 2
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | X | X | X | X | X | X | X | X | 4 | 9
a | a | a | X | X | X | a | a | a | a | a | a | a | 10 | 3
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | X | X | X | X | X | X | X | 5 | 8
a | a | a | X | X | X | X | a | a | a | a | a | a | 9 | 4
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | X | X | X | X | X | X | 6 | 7
a | a | a | X | X | X | X | X | a | a | a | a | a | 8 | 5
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | X | X | X | X | X | 7 | 6
a | a | a | X | X | X | X | X | X | a | a | a | a | 7 | 6
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | X | X | X | X | 8 | 5
a | a | a | X | X | X | X | X | X | X | a | a | a | 6 | 7
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | X | X | X | 9 | 4
a | a | a | X | X | X | X | X | X | X | X | a | a | 5 | 8
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | X | X | 10 | 3
a | a | a | X | X | X | X | X | X | X | X | X | a | 4 | 9
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | X | 11 | 2
a | a | a | X | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
This little console application shows the results of the use of the class. For the specified number of iterations, it fills the bag and extracts all the random items. At the end, it shows the maximum number of consecutive repetitions for each item, and the accumulated number of occurrences of each item and each call to GetNextRandomIndex() for all the iterations. As you can see, after 1 million iterations, the accumulated elements for the first call are around 750,000 for "a" and 250,000 for "X", and for the last call are around 999,950 for "a" and 50 for "X":
static void Main(string[] args)
{
int MaxIterations = 1000000;
bool ShowResults = true;
string[] items = new string[] { "a", "X" };
int[] count = new int[] { 30, 10 };
int[] maxRepet = new int[] { 3, 2 };
int elementCount = count.Sum();
bool maxRepetOK = true;
for (int i = 0; i < items.Length; i++)
maxRepetOK = maxRepetOK && ShuffleBag.CheckIndexMaxConsecutiveCorrectness(maxRepet[i], count[i], elementCount);
if (! maxRepetOK)
{
Console.WriteLine("*** Bad number of repetitions!! ***\n");
goto _END_;
}
Dictionary<string, Tuple<int,int>> results = new Dictionary<string,Tuple<int,int>>();
for (int i = 0; i < items.Length; i++)
results.Add(items[i], new Tuple<int,int>(0, 0));
int[,] resultsPerCall = new int[elementCount, items.Length];
ShuffleBag bag = new ShuffleBag(items.Length);
int iterations = 0;
for (int x = 0; x < MaxIterations; x++)
{
iterations++;
bag.Reset();
for (int i = 0; i < items.Length; i++)
bag.SetIndexData(i, count[i], maxRepet[i]);
string prevItem = "";
int prevRepetitions = 0;
int row = 0;
while (bag.RemainingElementsCount > 0)
{
int newIndex = bag.GetNextRandomIndex();
if (prevItem == items[newIndex])
{
prevRepetitions++;
}
else
{
prevItem = items[newIndex];
prevRepetitions = 1;
}
var resultAnt = results[items[newIndex]];
results[items[newIndex]] = new Tuple<int,int>(resultAnt.Item1 + 1, Math.Max(prevRepetitions, resultAnt.Item2));
resultsPerCall[row, newIndex]++;
if (ShowResults)
{
Console.WriteLine(items[newIndex]);
}
row++;
}
if (ShowResults && MaxIterations > 1)
{
Console.WriteLine("\nESC:\tEnd\nENTER:\tNext iteration\nTAB:\tAll iterations");
while (true)
{
switch (Console.ReadKey(true).Key)
{
case ConsoleKey.Enter:
goto _CONTINUE_;
case ConsoleKey.Escape:
goto _RESULTS_;
case ConsoleKey.Tab:
ShowResults = false;
goto _CONTINUE_;
}
}
_CONTINUE_:
Console.Clear();
Console.WriteLine("Iterating ...");
}
}
_RESULTS_:
Console.Clear();
Console.WriteLine("RESULTS\n------------------------------------------");
for (int i = 0; i < items.Length; i++)
{
var data = results[items[i]];
double average = (double)data.Item1 / iterations;
Console.WriteLine(items[i] +
": Average = " + average.ToString() + (average != count[i] ? "(!)" : "") +
"\t\tMax. repetitions = " + data.Item2.ToString() + (data.Item2 > maxRepet[i] ? "(!)" : ""));
}
Console.WriteLine();
for (int i = 0; i < elementCount; i++)
{
Console.Write((i+1).ToString("00") + ")");
for (int k = 0; k < items.Length; k++)
{
Console.Write(" \t" + items[k] + ": " + resultsPerCall[i, k].ToString());
}
Console.WriteLine();
}
_END_:
Console.WriteLine("\n\nESC to exit");
while (Console.ReadKey(true).Key != ConsoleKey.Escape);
}
The problem is that, from GetNextRandomIndex(), I only use the number of remaining elements of an item as the probability for selecting it, so for the first call the probability of getting "a" is 3 times the probability of getting "X" (since there are 30 elements of "a" and 10 of "X"). Does anybody have any idea of how can I change my algorithm (or use a different one) to get the correct probabilities?
As promised, here’s a Markov chain Monte Carlo sampler. I can’t seem to
get an exact version using the Propp–Wilson technique; this one merely
converges to a distribution that is biased toward and uniform on the
desired outcomes, possibly rather slowly. Define the score of a
permutation to be the number of invalid letters. Specifically, if A
should appear at most once consecutively and B should appear at most
twice consecutively, then the score of
AAABBAABBBB
^^ ^ ^^
is 5, where the contributing letters are marked with ^.
Starting with a random permutation, repeatedly choose two positions with
replacement independently and uniformly at random. Score the permutation
with the letters at those positions swapped; this is the proposed
permutation. Compute two to the power of (the current score - the
proposed score). If a uniform random float between zero and one is less
than this number, then keep this proposed swap; otherwise, undo it. Do
this as long as you can stand, then take the next valid permutation.
Here's an approach for exact sampling that did not work as well as I had hoped. I'm posting it in case it proves useful anyway. Given the maximum run lengths, prepare an automaton that recognizes the valid strings.
def make_automaton(max_consecutive):
states = {letter * j for letter, k in max_consecutive.items() for j in range(1, k + 1)}
states.add('')
automaton = {}
for state in states:
transitions = {}
for letter in max_consecutive.keys():
new_state = state + letter if letter == state[-1:] else letter
if new_state in states:
transitions[letter] = new_state
automaton[state] = transitions
return automaton
Here's the code in action.
>>> from pprint import pprint
>>> pprint(make_automaton({'a': 3, 'X': 2}))
{'': {'X': 'X', 'a': 'a'},
'X': {'X': 'XX', 'a': 'a'},
'XX': {'a': 'a'},
'a': {'X': 'X', 'a': 'aa'},
'aa': {'X': 'X', 'a': 'aaa'},
'aaa': {'X': 'X'}}
Now, there's an approach to exact sampling that seems to use too much space. Prepare a table indexed by pairs of an automaton state and a multiset of letters. The table values give the number of words with specified letter counts that put the automaton in the specified state. Then we can use this table by sampling the final state and using conditional probabilities to derive the string that got the automaton there.
My idea was, instead of trying to remember the counts, sample from the following distribution until we get the letter counts right. Each letter is distributed independently according to the counts. We condition on the resulting word being valid.
Here's the rest of the code. It seems to work OK as long as the counts aren't too large and the runs aren't too short.
from collections import defaultdict
def make_probabilities(count, automaton):
probabilities = [{min(automaton): 1.0}]
total = sum(count.values())
for i in range(1, total + 1):
distribution = defaultdict(float)
for state, p in probabilities[-1].items():
transitions = automaton[state]
for letter, k in count.items():
if letter in transitions:
distribution[transitions[letter]] += p * (k / total)
probabilities.append(dict(distribution))
return probabilities
pprint(make_probabilities({'a': 30, 'X': 10}, make_automaton({'a': 3, 'X': 2})))
from random import random
def weighted_sample(distribution):
while True:
sample = random() * sum(distribution.values())
for letter, k in distribution.items():
sample -= k
if sample < 0.0:
return letter
from collections import Counter
print(Counter(weighted_sample({'a': 30, 'X': 10}) for i in range(10000)))
def unbiased_sample(count, max_consecutive):
automaton = make_automaton(max_consecutive)
probabilities = make_probabilities(count, automaton)
total = sum(count.values())
while True:
sample = []
state = weighted_sample(probabilities[-1])
for i in range(len(probabilities) - 2, -1, -1):
conditional_distribution = {}
for old_state, p in probabilities[i].items():
transitions = automaton[old_state]
for letter, k in count.items():
if letter in transitions and transitions[letter] == state:
conditional_distribution[(old_state, letter)] = p * (k / total)
state, letter = weighted_sample(conditional_distribution)
sample.append(letter)
if Counter(sample) == count:
return ''.join(sample)
for r in range(1000):
print(unbiased_sample({'A': 10, 'B': 10, 'C': 10, 'D': 10}, {'A': 5, 'B': 4, 'C': 3, 'D': 2}))

MapReduce matrix multiplication complexity

Assume, that we have large file, which contains descriptions of the cells of two matrices (A and B):
+---------------------------------+
| i | j | value | matrix |
+---------------------------------+
| 1 | 1 | 10 | A |
| 1 | 2 | 20 | A |
| | | | |
| ... | ... | ... | ... |
| | | | |
| 1 | 1 | 5 | B |
| 1 | 2 | 7 | B |
| | | | |
| ... | ... | ... | ... |
| | | | |
+---------------------------------+
And we want to calculate the product of this matrixes: C = A x B
By definition: C_i_j = sum( A_i_k * B_k_j )
And here is a two-step MapReduce algorithm, for calculation of this product (I will provide a pseudocode):
First step:
function Map (input is a single row of the file from above):
i = row[0]
j = row[1]
value = row[2]
matrix = row[3]
if(matrix == 'A')
emit(i, {j, value, 'A'})
else
emit(j, {i, value, 'B'})
Complexity of this Map function is O(1)
function Reduce(Key, List of tuples from the Map function):
Matrix_A_tuples =
filter( List of tuples from the Map function, where matrix == 'A' )
Matrix_B_tuples =
filter( List of tuples from the Map function, where matrix == 'B' )
for each tuple_A from Matrix_A_tuples
i = tuple_A[0]
value_A = tuple_A[1]
for each tuple_B from Matrix_B_tuples
j = tuple_B[0]
value_B = tuple_B[1]
emit({i, j}, {value_A * value_b, 'C'})
Complexity of this Reduce function is O(N^2)
After the first step we will get something like the following file (which contains O(N^3) lines):
+---------------------------------+
| i | j | value | matrix |
+---------------------------------+
| 1 | 1 | 50 | C |
| 1 | 1 | 45 | C |
| | | | |
| ... | ... | ... | ... |
| | | | |
| 2 | 2 | 70 | C |
| 2 | 2 | 17 | C |
| | | | |
| ... | ... | ... | ... |
| | | | |
+---------------------------------+
So, all we have to do - just sum the values, from lines, which contains the same values i and j.
Second step:
function Map (input is a single row of the file, which produced in first step):
i = row[0]
j = row[1]
value = row[2]
emit({i, j}, value)
function Reduce(Key, List of values from the Map function)
i = Key[0]
j = Key[1]
result = 0;
for each Value from List of values from the Map function
result += Value
emit({i, j}, result)
After the second step we will get the file, which contains cells of the matrix C.
So the question is:
Taking into account, that there are multiple number of instances in MapReduce cluster - which is the most correct way to estimate complexity of the provided algorithm?
The first one, which comes to mind is such:
When we assume that number of instances in the MapReduce cluster is K.
And, because of the number of lines - from file, which produced after the first step is O(N^3) - the overall complexity can be estimated as O((N^3)/K).
But this estimation doesn't take into account many details: such as network bandwidth between instances of MapReduce cluster, ability to distribute data between distances - and perform most of the calculations locally etc.
So, I would like to know which is the best approach for estimation of efficiency of the provided MapReduce algorithm, and does it make sense to use Big-O notation to estimate efficiency of MapReduce algorithms at all?
as you said the Big-O estimates the computation complexity, and does not take into consideration the networking issues such(bandwidth, congestion, delay...)
If you want to calculate how much efficient the communication between instances, in this case you need other networking metrics...
However, I want to tell you something, if your file is not big enough, you will not see an improvement in term of execution speed. This is because the MapReduce works efficiently only with BIG data. Moreover, your code has two steps, that means two jobs. MapReduce, from one job to another, takes time to upload the file and start the job again. This can affect slightly the performance.
I think you can calculate the efficiently in term of speed and time as the MapReduce approach is for sure faster when it comes to big data. This is if we compared it to the sequential algorithms.
Moreover, efficiency can be with regards to the fault-tolerance. This is because MapReduce will manage to handle failures by itself. So, no need for the programmers to handle instance failure or networking failures..

Regex that matches valid Ruby local variable names

Does anyone know the rules for valid Ruby variable names? Can it be matched using a RegEx?
UPDATE: This is what I could come up with so far:
^[_a-z][a-zA-Z0-9_]+$
Does this seem right?
Identifiers are pretty straightforward. They begin with letters or an underscore, and contain letters, underscore and numbers. Local variables can't (or shouldn't?) begin with an uppercase letter, so you could just use a regex like this.
/^[a-z_][a-zA-Z_0-9]*$/
It's possible for variable names to be unicode letters, in which case most of the existing regexes don't match.
varname = "\u2211" # => "∑"
eval(varname + '= "Tony the Pony"') => "Tony the Pony"
puts varname # => ∑
local_variable_identifier = /Insert large regular expression here/
varname =~ local_variable_identifier # => nil
See also "Fun with Unicode" in either the Ruby 1.9 Pickaxe or at Fun with Unicode.
According to http://rubylearning.com/satishtalim/ruby_names.html a Ruby variable consists of:
A name is an uppercase letter,
lowercase letter, or an underscore
("_"), followed by Name characters
(this is any combination of upper- and
lowercase letters, underscore and
digits).
In addition, global variables begin with a dollar sign, instance variables with a single at-sign, and class variables with two at-signs.
A regular expression to match all that would be:
%r{
(\$|#{1,2})? # optional leading punctuation
[A-Za-z_] # at least one upper case, lower case, or underscore
[A-Za-z0-9_]* # optional characters (including digits)
}x
Hope that helps.
I like #aboutruby's answer, but just to complete it, here's the equivalent using POSIX bracket expressions.
/^[_[:lower:]][_[:alnum:]]*$/
Or, since a-z is actually shorter than [:lower:]:
/^[_a-z][_[:alnum:]]*$/
I think /^(\$){0,1}[_a-zA-Z][a-zA-Z0-9_]*([?!]){0,1}$/ is a bit closer to what you will need...
It depends on whether you want to match method names as well.
If you are trying to match a name that might be encountered in an expression, then it might start with $ and it might end with ? or !. If you know for sure that it is just a local variable then the rule will be much simpler.
i was trying to figure one out for a rails patch, and Matthew Draper wrote this one, using the ruby parser as a reference:
/\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/
And here it is, straight from the horse's mouth. (The horse in this case is the Draft ISO Ruby Specification):
local-variable-identifier → ( lowercase-character | _ ) identifier-character *
identifier-character → lowercase-character | uppercase-character | decimal-digit | _
uppercase-character → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
lowercase-character → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
decimal-digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
In Ruby 1.9, using named groups, you can translate this literally:
local_variable_identifier = %r{
(?<uppercase_character> A | B | C | D | E | F | G | H | I | J | K | L | M
| N | O | P | Q | R | S | T | U | V | W | X | Y | Z
){0}
(?<lowercase_character> a | b | c | d | e | f | g | h | i | j | k | l | m
| n | o | p | q | r | s | t | u | v | w | x | y | z
){0}
(?<decimal_digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9){0}
(?<identifier_character> \g<lowercase_character>
| \g<uppercase_character>
| \g<decimal_digit>
| _
){0}
( \g<lowercase_character> | _ ) \g<identifier_character>*
}x
Of course, this is not how you would really write it.

Resources