Weighted random selection with maximum consecutive repetitions - algorithm

Let's say I have a list of items in which each item can be repeated once or more than once (1 to n). I'm trying to find an algorithm that extracts random items from the list until it's empty, but with the constraint that an item cannot be repeated consecutively more than a fixed number of times (which could be different for each item). I would like the algorithm to have "correct probabilities" (I try to explain that later).
For example, say I have this list:
Item | Count | Max. consecutive
-----+-------+-----------------
A | 2 | 1
B | 4 | 2
Some results could be:
B A B A B B
B B A B A B
But the following would be incorrect, since "B" has 3 consecutive repetitions, when the maximum was 2:
B B B A B A
I've managed to create an algorithm that works, but it has a problem with the probabilities. I will put the code first, and then talk about the problem. It's a C# class; sorry if you don't like the format or the GOTOs, let's be friends ;P
To use it, you instantiate an object providing the number of unique items, use SetIndexData() to add the information for each item (number of elements and max. repetitions), and call GetNextRandomIndex() to get the next random item (I have implemented it using int as if the items were indexes, you would have to create an indexed collection to store the real items' values).
You can use the method CheckIndexMaxConsecutiveCorrectness() to check if it's possible to have an item with the specified maximum repetitions in a bag with the specified total elements. For example, if a bag has only 1 item with 2 elements, but we say it can only be repeated once, it's obviously impossible, so CheckIndexMaxConsecutiveCorrectness() would return false.
The method GetIndexMinMaxGroups() is used internally by the class to retrieve the minimum and maximum number of groups of consecutive elements for an item (for example, in the sequence B B B A B A we have 2 groups for "A" and 2 for "B"). Actually, the numbers calculated in this method are not exactly true, since there are some variables that it doesn't take into account, but the values it returns are useful for two things: doing the check from CheckIndexMaxConsecutiveCorrectness() (if the minimum number of groups is greater than the maximum, then the maximum number of repetitions is not correct), and knowing when it's obligatory to return a determined item from GetNextRandomIndex() (when the minimum is equal to the maximum). I haven't used any mathematical algorithm to deduce those properties, so something could be wrong, though so far it has "magically" worked perfectly in my tests, millions of times...
class ShuffleBag
{
/*** INFORMATION FOR THE INDEXES ***/
// Class for the data:
class IndexData
{
public int Count = 0; // The current number of instances of the index in the bag.
public int MaxConsecutive = int.MaxValue; // The maximum consecutive repetitions allowed for the index.
}
// List of indexes data for this bag:
IndexData[] IndexesDataList;
/*** MEMBERS USED FOR CALCULATIONS ***/
// Random number generator:
Random RandGenerator;
// Remaining elements in the bag:
int _RemainingElementsCount = 0;
public int RemainingElementsCount
{
get { return _RemainingElementsCount; }
}
// Last retrieved index (-1 if no index has been retrieved yet),
// and the last consecutive repetitions of that index:
int LastIndex = -1;
int LastRepetitions = 0;
///==============///
/// Constructor. ///
///==============///
public ShuffleBag(int uniqueIndexesCount)
{
IndexesDataList = new IndexData[uniqueIndexesCount];
for (int i = 0; i < uniqueIndexesCount; i++)
IndexesDataList[i] = new IndexData();
RandGenerator = new Random();
}
///===========================================================///
/// Resets the shuffle bag; must be called before reusing it. ///
/// The number of unique indexes won't be reset. ///
/// Doesn't need to be called just after creating the bag. ///
///===========================================================///
public void Reset()
{
for (int i = 0; i < IndexesDataList.Length; i++)
{
IndexesDataList[i].Count = 0;
IndexesDataList[i].MaxConsecutive = int.MaxValue;
}
_RemainingElementsCount = 0;
LastIndex = -1;
LastRepetitions = 0;
}
///==================================================================================================///
/// Checks if it's possible to honor the max repetitions of an index with the provided data. ///
/// If it was not possible, the behaviour of a shuffle bag with those parameters would be undefined. ///
///==================================================================================================///
public static bool CheckIndexMaxConsecutiveCorrectness(int maxConsecutive, int indexElements, int bagTotalElements)
{
int min, max;
GetIndexMinMaxGroups(indexElements, maxConsecutive, bagTotalElements, out min, out max);
return min <= max;
}
///=====================================================================///
/// Sets the data for the specified index. ///
/// Can be called after starting to use the bag, ///
/// but if any parameters make the max consecutive repetitions invalid, ///
/// the behaviour of the bag will be undefined. ///
///=====================================================================///
public void SetIndexData(int index, int count, int maxConsecutive)
{
IndexData data = IndexesDataList[index];
_RemainingElementsCount += count - data.Count;
data.Count = count;
data.MaxConsecutive = maxConsecutive;
}
///====================================================================================================================///
/// Retrieves the next random index. The caller must check if there are remaining elements in the bag to be retrieved. ///
///====================================================================================================================///
public int GetNextRandomIndex()
{
/*** GET THE INDEX ***/
int index;
// If, for any index, the minimum possible groups equals the maximum, it must be the returned index:
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
int minGroups, maxGroups;
GetIndexMinMaxGroups(data.Count, data.MaxConsecutive, _RemainingElementsCount, out minGroups, out maxGroups);
if (minGroups == maxGroups)
goto _INDEX_FOUND_;
}
// Get a random number to choose the index:
int rand = RandGenerator.Next(_RemainingElementsCount);
for (index = 0; index < IndexesDataList.Length; index++)
{
IndexData data = IndexesDataList[index];
// This index corresponds with the random number:
if (rand < data.Count)
{
// Check if the index has reached the maximum consecutive repetitions;
// in that case, get the next available one:
if (index == LastIndex && data.MaxConsecutive == LastRepetitions)
{
for (int k = 1; k <= IndexesDataList.Length - 1; k++)
{
int m = WrapIndexSimple(index + k, IndexesDataList.Length);
if (IndexesDataList[m].Count > 0)
{
index = m;
goto _INDEX_FOUND_;
}
}
}
goto _INDEX_FOUND_;
}
// This index doesn't correspond with the random number; update it to check the next index:
else
{
rand -= data.Count;
}
}
/*** INDEX FOUND, UPDATE AND RETURN ***/
_INDEX_FOUND_:
IndexData resultData = IndexesDataList[index];
resultData.Count--;
_RemainingElementsCount--;
if (LastIndex == index)
{
LastRepetitions++;
}
else
{
LastIndex = index;
LastRepetitions = 1;
}
return index;
}
///===============================================================================///
/// Calculates the minimum and maximum possible groups of consecutive repetitions ///
/// for an index with the specified data. ///
/// If any provided data is invalid, the behaviour and results are undefined. ///
///===============================================================================///
static void GetIndexMinMaxGroups(int indexRemainingElements, int indexMaxConsecutive, int bagRemainingElements, out int min, out int max)
{
int rem;
int div = Math.DivRem(indexRemainingElements, indexMaxConsecutive, out rem);
min = rem == 0 ? div : div + 1;
max = bagRemainingElements - indexRemainingElements + 1;
}
///=======================================================///
/// Converts an index out of bounds to a valid value. ///
/// "length" is the number of elements of the collection. ///
/// Only works for indexes that are less than 2*length. ///
///=======================================================///
static int WrapIndexSimple(int index, int length)
{
if (index >= length)
return index - length;
else if (index < 0)
return length + index;
else
return index;
}
}
The problem:
As I said, so far the class has worked as I wanted, extracting items without ever exceeding the number of repetitions. The problem is that the probabilities of extracting an item at each call of GetNextRandomIndex() are not what they should be. For example, if I have this:
Item | Count | Max. consecutive
-----+-------+-----------------
a | 30 | 3
X | 10 | 2
If I'm right, there would be 13 different possible sequences. The method should return "a" 12 times for each time it returns "X" in the first 3 calls; for the fourth, it should return "a" 10 times for every 3 times it returns "X"; etc. These are the different sequences:
Seq. 1 | Seq. 2 | Seq. 3 | Seq. 4 | Seq. 5 | Seq. 6 | Seq. 7 | Seq. 8 | Seq. 9 | Seq. 10 | Seq. 11 | Seq. 12 | Seq. 13 | Number of “a” | Number of “X”
-------+--------+--------+--------+--------+--------+--------+--------+--------+---------+---------+---------+---------+---------------+--------------
a | a | a | X | a | a | a | a | a | a | a | a | a | 12 | 1
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | a | X | X | a | a | a | a | a | a | a | a | 11 | 2
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | X | X | X | X | X | X | X | X | 4 | 9
a | a | a | X | X | X | a | a | a | a | a | a | a | 10 | 3
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | X | X | X | X | X | X | X | 5 | 8
a | a | a | X | X | X | X | a | a | a | a | a | a | 9 | 4
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | X | X | X | X | X | X | 6 | 7
a | a | a | X | X | X | X | X | a | a | a | a | a | 8 | 5
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | X | X | X | X | X | 7 | 6
a | a | a | X | X | X | X | X | X | a | a | a | a | 7 | 6
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | X | X | X | X | 8 | 5
a | a | a | X | X | X | X | X | X | X | a | a | a | 6 | 7
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | X | X | X | 9 | 4
a | a | a | X | X | X | X | X | X | X | X | a | a | 5 | 8
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | X | X | 10 | 3
a | a | a | X | X | X | X | X | X | X | X | X | a | 4 | 9
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | X | 11 | 2
a | a | a | X | X | X | X | X | X | X | X | X | X | 3 | 10
a | a | X | a | a | a | a | a | a | a | a | a | a | 12 | 1
a | X | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
X | a | a | a | a | a | a | a | a | a | a | a | a | 12 | 1
This little console application shows the results of the use of the class. For the specified number of iterations, it fills the bag and extracts all the random items. At the end, it shows the maximum number of consecutive repetitions for each item, and the accumulated number of occurrences of each item and each call to GetNextRandomIndex() for all the iterations. As you can see, after 1 million iterations, the accumulated elements for the first call are around 750,000 for "a" and 250,000 for "X", and for the last call are around 999,950 for "a" and 50 for "X":
static void Main(string[] args)
{
int MaxIterations = 1000000;
bool ShowResults = true;
string[] items = new string[] { "a", "X" };
int[] count = new int[] { 30, 10 };
int[] maxRepet = new int[] { 3, 2 };
int elementCount = count.Sum();
bool maxRepetOK = true;
for (int i = 0; i < items.Length; i++)
maxRepetOK = maxRepetOK && ShuffleBag.CheckIndexMaxConsecutiveCorrectness(maxRepet[i], count[i], elementCount);
if (! maxRepetOK)
{
Console.WriteLine("*** Bad number of repetitions!! ***\n");
goto _END_;
}
Dictionary<string, Tuple<int,int>> results = new Dictionary<string,Tuple<int,int>>();
for (int i = 0; i < items.Length; i++)
results.Add(items[i], new Tuple<int,int>(0, 0));
int[,] resultsPerCall = new int[elementCount, items.Length];
ShuffleBag bag = new ShuffleBag(items.Length);
int iterations = 0;
for (int x = 0; x < MaxIterations; x++)
{
iterations++;
bag.Reset();
for (int i = 0; i < items.Length; i++)
bag.SetIndexData(i, count[i], maxRepet[i]);
string prevItem = "";
int prevRepetitions = 0;
int row = 0;
while (bag.RemainingElementsCount > 0)
{
int newIndex = bag.GetNextRandomIndex();
if (prevItem == items[newIndex])
{
prevRepetitions++;
}
else
{
prevItem = items[newIndex];
prevRepetitions = 1;
}
var resultAnt = results[items[newIndex]];
results[items[newIndex]] = new Tuple<int,int>(resultAnt.Item1 + 1, Math.Max(prevRepetitions, resultAnt.Item2));
resultsPerCall[row, newIndex]++;
if (ShowResults)
{
Console.WriteLine(items[newIndex]);
}
row++;
}
if (ShowResults && MaxIterations > 1)
{
Console.WriteLine("\nESC:\tEnd\nENTER:\tNext iteration\nTAB:\tAll iterations");
while (true)
{
switch (Console.ReadKey(true).Key)
{
case ConsoleKey.Enter:
goto _CONTINUE_;
case ConsoleKey.Escape:
goto _RESULTS_;
case ConsoleKey.Tab:
ShowResults = false;
goto _CONTINUE_;
}
}
_CONTINUE_:
Console.Clear();
Console.WriteLine("Iterating ...");
}
}
_RESULTS_:
Console.Clear();
Console.WriteLine("RESULTS\n------------------------------------------");
for (int i = 0; i < items.Length; i++)
{
var data = results[items[i]];
double average = (double)data.Item1 / iterations;
Console.WriteLine(items[i] +
": Average = " + average.ToString() + (average != count[i] ? "(!)" : "") +
"\t\tMax. repetitions = " + data.Item2.ToString() + (data.Item2 > maxRepet[i] ? "(!)" : ""));
}
Console.WriteLine();
for (int i = 0; i < elementCount; i++)
{
Console.Write((i+1).ToString("00") + ")");
for (int k = 0; k < items.Length; k++)
{
Console.Write(" \t" + items[k] + ": " + resultsPerCall[i, k].ToString());
}
Console.WriteLine();
}
_END_:
Console.WriteLine("\n\nESC to exit");
while (Console.ReadKey(true).Key != ConsoleKey.Escape);
}
The problem is that, from GetNextRandomIndex(), I only use the number of remaining elements of an item as the probability for selecting it, so for the first call the probability of getting "a" is 3 times the probability of getting "X" (since there are 30 elements of "a" and 10 of "X"). Does anybody have any idea of how can I change my algorithm (or use a different one) to get the correct probabilities?

As promised, here’s a Markov chain Monte Carlo sampler. I can’t seem to
get an exact version using the Propp–Wilson technique; this one merely
converges to a distribution that is biased toward and uniform on the
desired outcomes, possibly rather slowly. Define the score of a
permutation to be the number of invalid letters. Specifically, if A
should appear at most once consecutively and B should appear at most
twice consecutively, then the score of
AAABBAABBBB
^^ ^ ^^
is 5, where the contributing letters are marked with ^.
Starting with a random permutation, repeatedly choose two positions with
replacement independently and uniformly at random. Score the permutation
with the letters at those positions swapped; this is the proposed
permutation. Compute two to the power of (the current score - the
proposed score). If a uniform random float between zero and one is less
than this number, then keep this proposed swap; otherwise, undo it. Do
this as long as you can stand, then take the next valid permutation.

Here's an approach for exact sampling that did not work as well as I had hoped. I'm posting it in case it proves useful anyway. Given the maximum run lengths, prepare an automaton that recognizes the valid strings.
def make_automaton(max_consecutive):
states = {letter * j for letter, k in max_consecutive.items() for j in range(1, k + 1)}
states.add('')
automaton = {}
for state in states:
transitions = {}
for letter in max_consecutive.keys():
new_state = state + letter if letter == state[-1:] else letter
if new_state in states:
transitions[letter] = new_state
automaton[state] = transitions
return automaton
Here's the code in action.
>>> from pprint import pprint
>>> pprint(make_automaton({'a': 3, 'X': 2}))
{'': {'X': 'X', 'a': 'a'},
'X': {'X': 'XX', 'a': 'a'},
'XX': {'a': 'a'},
'a': {'X': 'X', 'a': 'aa'},
'aa': {'X': 'X', 'a': 'aaa'},
'aaa': {'X': 'X'}}
Now, there's an approach to exact sampling that seems to use too much space. Prepare a table indexed by pairs of an automaton state and a multiset of letters. The table values give the number of words with specified letter counts that put the automaton in the specified state. Then we can use this table by sampling the final state and using conditional probabilities to derive the string that got the automaton there.
My idea was, instead of trying to remember the counts, sample from the following distribution until we get the letter counts right. Each letter is distributed independently according to the counts. We condition on the resulting word being valid.
Here's the rest of the code. It seems to work OK as long as the counts aren't too large and the runs aren't too short.
from collections import defaultdict
def make_probabilities(count, automaton):
probabilities = [{min(automaton): 1.0}]
total = sum(count.values())
for i in range(1, total + 1):
distribution = defaultdict(float)
for state, p in probabilities[-1].items():
transitions = automaton[state]
for letter, k in count.items():
if letter in transitions:
distribution[transitions[letter]] += p * (k / total)
probabilities.append(dict(distribution))
return probabilities
pprint(make_probabilities({'a': 30, 'X': 10}, make_automaton({'a': 3, 'X': 2})))
from random import random
def weighted_sample(distribution):
while True:
sample = random() * sum(distribution.values())
for letter, k in distribution.items():
sample -= k
if sample < 0.0:
return letter
from collections import Counter
print(Counter(weighted_sample({'a': 30, 'X': 10}) for i in range(10000)))
def unbiased_sample(count, max_consecutive):
automaton = make_automaton(max_consecutive)
probabilities = make_probabilities(count, automaton)
total = sum(count.values())
while True:
sample = []
state = weighted_sample(probabilities[-1])
for i in range(len(probabilities) - 2, -1, -1):
conditional_distribution = {}
for old_state, p in probabilities[i].items():
transitions = automaton[old_state]
for letter, k in count.items():
if letter in transitions and transitions[letter] == state:
conditional_distribution[(old_state, letter)] = p * (k / total)
state, letter = weighted_sample(conditional_distribution)
sample.append(letter)
if Counter(sample) == count:
return ''.join(sample)
for r in range(1000):
print(unbiased_sample({'A': 10, 'B': 10, 'C': 10, 'D': 10}, {'A': 5, 'B': 4, 'C': 3, 'D': 2}))

Related

how to delete few rows of data from a text file using shell scripting based on some conditions

I have a text file with more than 100k rows. Below mentioned data is a sample for the text file I have. I want to use some conditions on this data and delete some rows. The text file does not have headers (ID,NAME,Code-1,code,2-code-3). I mentioned for reference. How can I achieve this with shell scripting?
Input test file:
| ID | NAME | Code-1 | code-2 | code-3 |
| $$ | 5HF | 1E | N | Y |
| $$ | 2MU | 3C | N | Y |
| $$ | 32E | 3C | N | N |
| AB | 3CH | 3C | N | N |
| MK | A1M | AS | P | N |
| $$ | Y01 | 01 | F | Y |
| $$ | BG0 | 0G | F | N |
Conditions:
if code-2 = 'N' and code-1 not equal to ( '3C' , '3B' , '32' , '31' , '3D' ) then ID='$$'
if code-2 ='N' and code-1 equal to ( '3C' , '3B' , '32' , '31' , '3D') then accept any ID and (accept ID='$$' only if code-3='Y')'
if code-2 != 'N' then accept (ID='$$' only if code-3='Y') and all other IDs
Output:
| ID | NAME | Code-1 | code-2 | code-3 |
| $$ | 5HF | 1E | N | Y |
| $$ | 2MU | 3C | N | Y |
| AB | 3CH | 3C | N | N |
| MK | A1M | AS | P | N |
| $$ | Y01 | 01 | F | Y |
It's encouraged you demonstrate own efforts when ask questions. But I do understand this question could be complicated if you are new to Bash. Here is my solution using awk. Spent 0.545s processed 137k lines on my computer (with moderate specs).
awk '{
ID=$2; NAME=$4; CODE1=$6; CODE2=$8; CODE3=$10;
if (CODE2 == "N") {
if (CODE1 ~ /(3C|3B|32|31|3D)/) {
if (ID == "$$") {
if (CODE3 == "Y") {
print;
}
}
else {
print;
}
}
else {
if (ID == "$$") {
print;
}
}
}
else {
if (ID == "$$") {
if (CODE3 == "Y") {
print;
}
}
else {
print;
}
}}' file
Note it has certain restrictions:
a) It delimits values by spaces not |. It will work with your exact input format, but won't work with input rows without additional spaces, e.g.
|$$|32E|3C|N|N|
|AB|3CH|3C|N|N|
b) For the same reason, the command will generate incorrect result, if col value has extra spaces, e.g.
| $$ | 32E FOO | 3C | N | N |
| AB | 3CH BBT | 3C | N | N |

Optimal algorithm to determine relationship tree

Suppose that you have a list of formulas like the following one:
KPI1 = somevalue1 + somevalue2
KPI2 = somevalue1 + somevalue3
KPI3 = KPI1 + somevalue4
KPI4 = KPI2 + KPI3
etc.
Which is the optimal algorithm that can be used to obtain a relationship tree for each of the elements referenced in the formulas?
i.e., using the example above:
+------------------------------------------------------------+
| somevalue3 somevalue1 somevalue2 somevalue4 |
| | | | | | |
| --------------- -------------- | |
| | | | |
| KPI2 KPI1 | |
| | | | |
| | --------------------- |
| | | |
| | KPI3 |
| | | |
| --------------------------- |
| | |
| KPI4 |
+------------------------------------------------------------+
You can use a hashmap where a key corresponds to a defined name (e.g. "KPI3"), and the corresponding value is a list of names/values on which that name depends (e.g. ["KPI1", somevalue4]).
Here is an implementation in JavaScript, where we can use the native Map constructor. Once the map is populated with all the dependencies, a recursive function can for example print the tree:
function printTree(map, name, indent="") {
console.log(indent + name);
let children = map.get(name);
if (children !== undefined) {
for (let childName of children) {
printTree(map, childName, indent+" ");
}
}
}
let map = new Map();
map.set("KPI1", ["value1", "value2"]);
map.set("KPI2", ["value1", "value3"]);
map.set("KPI3", ["KPI1", "value4"]);
map.set("KPI4", ["KPI2", "KPI3"]);
printTree(map, "KPI4");
In Python you would use a dictionary:
def print_tree(d, name, indent=""):
print(indent + name)
children = d.get(name, None)
if children is not None:
for childName in children:
print_tree(d, childName, indent+" ")
d = dict()
d["KPI1"] = ["value1", "value2"]
d["KPI2"] = ["value1", "value3"]
d["KPI3"] = ["KPI1", "value4"]
d["KPI4"] = ["KPI2", "KPI3"]
print_tree(d, "KPI4")

Multiple Tables Group and substract sum of columns using linq sql

Here i have two tables
Table One
+---------------+----------+------------+
| Raw Material | Size | Qty |
+---------------+----------+------------+
| A | 1 | 5 |
| A | 2 | 2 |
| A | 1 | 2 |
| B | 0 | 5 |
| B | 0 | 1 |
+---------------+----------+------------+
Table Two
+---------------+----------+------------+
| Raw Material | Size | Qty |
+---------------+----------+------------+
| A | 1 | 2 |
| A | 2 | 1 |
| A | 1 | 1 |
+---------------+----------+------------+
I want out put like
+---------------+----------+------------+
| Raw Material | Size | Qty |
+---------------+----------+------------+
| A | 1 | 4 |
| A | 2 | 1 |
| B | 0 | 6 |
+---------------+----------+------------+
Want to get substract first two tables sum of qty by grouping Rawmaterial and Size
Something like this should do the job
var result = tableA.Select(e => new { Item = e, Factor = 1 })
.Concat(tableB.Select(e => new { Item = e, Factor = -1 }))
.GroupBy(e => new { e.Item.RawMaterial, e.Item.Size }, (key, elements) => new
{
RawMaterial = key.RawMaterial,
Size = key.Size,
Qty = elements.Sum(e => e.Item.Qty * e.Factor)
}).ToList();
First we create a union of the two tables using Concat, including the information which one is additive (in Factor field), and then just do the normal grouping.
If you want the result to be List<YourTableElementType>, just replace the final anonymous type projection (new { ... }) with new YourTableElementType { ... }.

Infinite loop nested within a block in Ruby

Here's what I'm trying to do. I'm iterating through an array of strings, each of which may contain [] multiple times. I want to use String#match as many times as necessary to process every occurrence. So I have an Array#each block, and nested within that is an infinite loop that should break only when I run out of matches for a given string.
def follow(c, dir)
case c
when "\\"
dir.rotate!
when "/"
dir.rotate!.map! {|x| -x}
end
return dir
end
def parse(ar)
ar.each_with_index { |s, i|
idx = 0
while true do
t = s.match(/\[[ ]*([a-zA-Z]+)[ ]*\]/) { |m|
idx = m.end 0
r = m[1]
s[m.begin(0)...idx] = " "*m[0].size
a = [i, idx]
dir = [0, 1]
c = ar[a[0]][a[1]]
while !c.nil? do
dir = follow c, dir
ar[a[0]][a[1]] = " "
a[0] += dir[0]; a[1] += dir[1]
c = ar[a[0]][a[1]]
if c == ">" then
ar[a[0]][a[1]+1] = r; c=nil
elsif c == "<" then
ar[a[0]][a[1]-r.size] = r; c=nil
end
end
ar[a[0]][a[1]] = " "
puts ar
}
if t == nil then break; end
end
}
parse File.new("test", "r").to_a
Contents of test:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
/--> /----> | |
| |[kale]/ | hot dogs | Brazil |
| | <----------------------\ |
| | orange |[yellow]\ | [green]/ |
| +--------+--------|-+-------------+
\-------------------/
Goal:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| green |
| orange | | |
+--------+-------- -+-------------+
Actual output of program:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| <----------------------\ |
| orange | | [green]/ |
+--------+-------- -+-------------+
(Since the array is modified in place, I figure there is no need to update the match index.) The inner loop should run again for each pair of [] I find, but instead I only get one turn per array entry. (It seems that the t=match... bit isn't the problem, but I could be wrong.) How can I fix this?

Print all possible path using haskell

I need to write a program to draw all possible paths in a given matrix that can be had by moving in only left, right and up direction.
One should not cross the same location more than once. Note also that on a particular path, we may or may not use motion in all possible directions.
Path will start in the bottom-left corner in the matrix and will reach the top-right corner.
Following symbols are used to denote the direction of the motion in the current position:
+---+
| > | right
+---+
+---+
| ^ | up
+---+
+---+
| < | left
+---+
The symbol * is used in the final location to indicate end of path.
Example:
For a 5x8 matrix, using left, right and up directions, 2 different paths are shown below.
Path 1:
+---+---+---+---+---+---+---+---+
| | | | | | | | * |
+---+---+---+---+---+---+---+---+
| | | > | > | > | > | > | ^ |
+---+---+---+---+---+---+---+---+
| | | ^ | < | < | | | |
+---+---+---+---+---+---+---+---+
| | > | > | > | ^ | | | |
+---+---+---+---+---+---+---+---+
| > | ^ | | | | | | |
+---+---+---+---+---+---+---+---+
Path 2
+---+---+---+---+---+---+---+---+
| | | | > | > | > | > | * |
+---+---+---+---+---+---+---+---+
| | | | ^ | < | < | | |
+---+---+---+---+---+---+---+---+
| | | | | | ^ | | |
+---+---+---+---+---+---+---+---+
| | | > | > | > | ^ | | |
+---+---+---+---+---+---+---+---+
| > | > | ^ | | | | | |
+---+---+---+---+---+---+---+---+
Can anyone help me with this?
I tried to solve using lists. It i soon realized that i am making a disaster. Here is the code i tried with.
solution x y = travel (1,1) (x,y)
travelRight (x,y) = zip [1..x] [1,1..] ++ [(x,y)]
travelUp (x,y) = zip [1,1..] [1..y] ++ [(x,y)]
minPaths = [[(1,1),(2,1),(2,2)],[(1,1),(1,2),(2,2)]]
travel startpos (x,y) = rt (x,y) ++ up (x,y)
rt (x,y) | odd y = map (++[(x,y)]) (furtherRight (3,2) (x,2) minPaths)
| otherwise = furtherRight (3,2) (x,2) minPaths
up (x,y) | odd x = map (++[(x,y)]) (furtherUp (2,3) (2,y) minPaths)
| otherwise = furtherUp (2,3) (2,y) minPaths
furtherRight currpos endpos paths | currpos == endpos = (travelRight currpos) : map (++[currpos]) paths
| otherwise = furtherRight (nextRight currpos) endpos ((travelRight currpos) : (map (++[currpos]) paths))
nextRight (x,y) = (x+1,y)
furtherUp currpos endpos paths | currpos == endpos = (travelUp currpos) : map (++[currpos]) paths
| otherwise = furtherUp (nextUp currpos) endpos ((travelUp currpos) : (map(++[currpos]) paths))
nextUp (x,y) = (x,y+1)
identify lst = map (map iden) lst
iden (x,y) = (x,y,1)
arrows lst = map mydir lst
mydir (ele:[]) = "*"
mydir ((x1,y1):(x2,y2):lst) | x1==x2 = '>' : mydir ((x2,y2):lst)
| otherwise = '^' : mydir ((x2,y2):lst)
surroundBox lst = map (map createBox) lst
bar = "+ -+"
mid x = "| "++ [x] ++" |"
createBox chr = bar ++ "\n" ++ mid chr ++ "\n" ++ bar ++ "\n"
This ASCII grids are much more confusing than enlightening. Let me describe a better way to represent each possible path.
Each non-top row will have exactly one cell with UP. I claim that once each of the UP cells has been chosen that the LEFT and RIGHT and EMPTY cells can be determined. I claim that all possible cells in each of the non-top rows can be UP in all combination.
Each path is thus isomorphic to a (rows-1) length list of numbers in the range (1..columns) that determine the UP cells. The number of allowed paths is thus columns^(rows-1) and enumerating the possible paths in this format should be easy.
Then you could make a printer that converts this format to the ASCII art. This may be annoying, depending on skill level.
Looks like a homework so I will try to give enough hints
Try first filling number of paths from a cell to your goal.
So
+---+---+---+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | * |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
The thing to note here is from the cell in the top level there will always be one path to the *.
Number of possible path from cells in the same row will be same. You can realize this as all the paths will ultimately have to move up as there is no down action so in any path the cell above the current row can be reached by any cell in the current row.
You can feel the all possible paths from the current cell has its relation with the possible paths from the cell left,right and above. But as we know we can find all possible paths from only one cell in a row and rest of cells' possible paths will be some movements in the same row followed by a suffix of possible paths from that cell.
Maybe I will give you a example
+---+---+---+
| 1 | 1 | * |
+---+---+---+
| | | |
+---+---+---+
| | | |
+---+---+---+
You know all possible paths from cells in the first row. You need to find the same in the second row. So a good strategy would be to do it for the right most cell
+---+---+---+
| > | > | * |
+---+---+---+
| ^ | < | < |
+---+---+---+
| | | |
+---+---+---+
+---+---+---+
| | > | * |
+---+---+---+
| | ^ | < |
+---+---+---+
| | | |
+---+---+---+
+---+---+---+
| | | * |
+---+---+---+
| | | ^ |
+---+---+---+
| | | |
+---+---+---+
Now finding this for rest of the cells in the same row is trivial using these as I have told before.
In the end if you have m X n matrix the number of paths from bottom-left corner to top-right corner will be n^(m-1).
Another way
This way is not very optimal but easy to implement. Consider m X n grid
Find the path of longest length. You dont need the exact path just the number of <,>,^.
You can find the direct formula in terms of m and n.
Like
^ = m - 1
< = (n-1) * floor((m-1)/2)
> = (n-1) * (floor((m-1)/2) + 1)
Any valid path will be a prefix of the permutations of this which you can search exhaustively. Use permutations from Data.List to get all possible permutations. Then make a function which given a path strips a valid path from this. map this over the list of permutations and remove duplicates. The thing to note is path will be a prefix of what you get from permutation, so there can be several permutations for the same path.
Can you create that matrix and define the "fields"? Even if you can't (a specific matrix is given), you can map an [(Int, Int)] matrix (which sounds reasonable for this kind of task) to your own representation.
Since you didn't specify what your skill level was, I hope you don't mind that I suggest that you first try to create some kind of a grid in order to have something to work on:
data Status = Free | Left | Right | Up
deriving (Read, Show, Eq)
type Position = (Int, Int)
type Field = (Position, Status)
type Grid = [Field]
grid :: Grid
grid = [((x, y), stat) | x <- [1..10], y <- [1..10], let stat = Free]
Of course there are other ways to achieve this. Afterwards you can define some movement, map Position to Grid index and Statuses to printable characters... Try to fiddle with it and you might get some ideas.

Resources