How to walk binary abstract syntax tree to generate infix notation with minimally correct parentheses - algorithm

I am being passed a binary AST representing a math formula. Each internal node is an operator and leaf nodes are the operands. I need to walk the tree and output the formula in infix notation. This is pretty easy to do by walking the tree with a recursive algorithm such as the Print() method shows below. The problem with the Print() method is that the order of operations is lost when converting to infix because no parentheses are generated.
I wrote the PrintWithParens() method which outputs a correct infix formula, however it adds extraneous parentheses. You can see in three of the four cases of my main method it adds parenthesis when none are necessary.
I have been racking my brain trying to figure out what the correct algorithm for PrintWithMinimalParens() should be. I'm sure there must be an algorithm that can output only parentheses when necessary to group terms, however I have been unable to implement it correctly. I think I must need to look at the precedence of the operators in the tree below the current node, but the algorithm I have there now does't work (see the last 2 cases in my main method. No parentheses are needed, but my logic adds them).
public class Test {
static abstract class Node {
Node left;
Node right;
String text;
abstract void Print();
abstract void PrintWithParens();
abstract void PrintWithMinimalParens();
int precedence()
{
return 0;
}
}
enum Operator {
PLUS(1,"+"),
MINUS(1, "-"),
MULTIPLY(2, "*"),
DIVIDE(2, "/"),
POW(3, "^")
;
private final int precedence;
private final String text;
private Operator(int precedence, String text)
{
this.precedence = precedence;
this.text = text;
}
#Override
public String toString() {
return text;
}
public int getPrecedence() {
return precedence;
}
}
static class OperatorNode extends Node {
private final Operator op;
OperatorNode(Operator op)
{
this.op = op;
}
#Override
void Print() {
left.Print();
System.out.print(op);
right.Print();
}
#Override
void PrintWithParens() {
System.out.print("(");
left.PrintWithParens();
System.out.print(op);
right.PrintWithParens();
System.out.print(")");
}
#Override
void PrintWithMinimalParens() {
boolean needParens =
(left.precedence() != 0 && left.precedence() < this.op.precedence)
||
(right.precedence() != 0 && right.precedence() < this.op.precedence);
if(needParens)
System.out.print("(");
left.PrintWithMinimalParens();
System.out.print(op);
right.PrintWithMinimalParens();
if(needParens)
System.out.print(")");
}
#Override
int precedence() {
return op.getPrecedence();
}
}
static class TextNode extends Node {
TextNode(String text)
{
this.text = text;
}
#Override
void Print() {
System.out.print(text);
}
#Override
void PrintWithParens() {
System.out.print(text);
}
#Override
void PrintWithMinimalParens() {
System.out.print(text);
}
}
private static void printExpressions(Node rootNode) {
System.out.print("Print() : ");
rootNode.Print();
System.out.println();
System.out.print("PrintWithParens() : ");
rootNode.PrintWithParens();
System.out.println();
System.out.print("PrintWithMinimalParens() : ");
rootNode.PrintWithMinimalParens();
System.out.println();
System.out.println();
}
public static void main(String[] args)
{
System.out.println("Desired: 1+2+3+4");
Node rootNode = new OperatorNode(Operator.PLUS);
rootNode.left = new TextNode("1");
rootNode.right = new OperatorNode(Operator.PLUS);
rootNode.right.left = new TextNode("2");
rootNode.right.right = new OperatorNode(Operator.PLUS);
rootNode.right.right.left = new TextNode("3");
rootNode.right.right.right = new TextNode("4");
printExpressions(rootNode);
System.out.println("Desired: 1+2*3+4");
rootNode = new OperatorNode(Operator.PLUS);
rootNode.left = new TextNode("1");
rootNode.right = new OperatorNode(Operator.PLUS);
rootNode.right.left = new OperatorNode(Operator.MULTIPLY);
rootNode.right.left.left = new TextNode("2");
rootNode.right.left.right = new TextNode("3");
rootNode.right.right = new TextNode("4");
printExpressions(rootNode);
System.out.println("Desired: 1+2*(3+4)");
rootNode = new OperatorNode(Operator.PLUS);
rootNode.left = new TextNode("1");
rootNode.right = new OperatorNode(Operator.MULTIPLY);
rootNode.right.left = new TextNode("2");
rootNode.right.right = new OperatorNode(Operator.PLUS);
rootNode.right.right.left = new TextNode("3");
rootNode.right.right.right = new TextNode("4");
printExpressions(rootNode);
System.out.println("Desired: 1+2^8*3+4");
rootNode = new OperatorNode(Operator.PLUS);
rootNode.left = new TextNode("1");
rootNode.right = new OperatorNode(Operator.MULTIPLY);
rootNode.right.left = new OperatorNode(Operator.POW);
rootNode.right.left.left = new TextNode("2");
rootNode.right.left.right = new TextNode("8");
rootNode.right.right = new OperatorNode(Operator.PLUS);
rootNode.right.right.left = new TextNode("3");
rootNode.right.right.right = new TextNode("4");
printExpressions(rootNode);
}
}
Output:
Desired: 1+2+3+4
Print() : 1+2+3+4
PrintWithParens() : (1+(2+(3+4)))
PrintWithMinimalParens() : 1+2+3+4
Desired: 1+2*3+4
Print() : 1+2*3+4
PrintWithParens() : (1+((2*3)+4))
PrintWithMinimalParens() : 1+2*3+4
Desired: 1+2*(3+4)
Print() : 1+2*3+4
PrintWithParens() : (1+(2*(3+4)))
PrintWithMinimalParens() : 1+(2*3+4)
Desired: 1+2^8*3+4
Print() : 1+2^8*3+4
PrintWithParens() : (1+((2^8)*(3+4)))
PrintWithMinimalParens() : 1+(2^8*3+4)
Is is possible to implement the PrintWithMinimalParens() that I want? Does the fact that order is implicit in the tree make doing what I want impossible?

In your code you are comparing each operator with its children to see if you need parentheses around it. But you should actually be comparing it with its parent. Here are some rules that can determine if parentheses can be omitted:
You never need parentheses around the operator at the root of the AST.
If operator A is the child of operator B, and A has a higher precedence than B, the parentheses around A can be omitted.
If a left-associative operator A is the left child of a left-associative operator B with the same precedence, the parentheses around A can be omitted. A left-associative operator is one for which x A y A z is parsed as (x A y) A z.
If a right-associative operator A is the right child of a right-associative operator B with the same precedence, the parentheses around A can be omitted. A right-associative operator is one for which x A y A z is parsed as x A (y A z).
If you can assume that an operator A is associative, i.e. that (x A y) A z = x A (y A z) for all x,y,z, and A is the child of the same operator A, you can choose to omit parentheses around the child A. In this case, reparsing the expression will yield a different AST that gives the same result when evaluated.
Note that for your first example, the desired result is only correct if you can assume that + is associative (which is true when dealing with normal numbers) and implement rule #5. This is because your input tree is built in a right-associative fashion, while operator + is normally left-associative.

You're enclosing an entire expression in parentheses if either the left or the right child has a lower-precedence operator even if one of them is a higher- or equal-precedence operator.
I think you need to separate your boolean needParens into distinct cases for the left and right children. Something like this (untested):
void PrintWithMinimalParens() {
boolean needLeftChildParens =
(left.precedence() != 0 && left.precedence() < this.op.precedence);
boolean needRightChildParens =
(right.precedence() != 0 && right.precedence() < this.op.precedence);
if(needLeftChildParens)
System.out.print("(");
left.PrintWithMinimalParens();
if(needLeftChildParens)
System.out.print(")");
System.out.print(op);
if(needRightChildParens)
System.out.print("(");
right.PrintWithMinimalParens();
if(needRightChildParens)
System.out.print(")");
}
Also, I don't think your last example is correct. Looking at your tree I think it should be:
1+2^8*(3+4)

Related

BASH brace expansion algorithm

I am stuck on this algorithmic question :
design an algorithm that parse an expression like this :
"((a,b,cy)n,m)" should give :
an - bn - cyn - m
The expression can nest, therefore :
"((a,b)o(m,n)p,b)" parses to ;
aomp - aonp - bomp - bonp - b.
I thought of using stacks, but it is too complicated.
thanks.
You can parse it with a Recursive Descent Parser.
Let's say the comma separated strings are components, so for an expression ((a, b, cy)n, m), (a, b, cy)n and m are two components. a, b and cy are also components. So this is a recursive definition.
For a component (a, b, cy)n, let's say (a, b, cy) and n are two component parts of the component. Component parts will later be combined to produce final result (i.e., an - bn - cyn).
Let's say an expression is comma separated components, for example, (a, cy)n, m is an expression. It has two components (a, cy)n and m, and the component (a, cy)n has two component parts (a, cy) and n, and component part (a, cy) is a brace expression containing a nested expression: a, cy, which also has two components a and cy.
With these definitions (you might use other terms), we can write down the grammar for your expression:
expression = component, component, ...
component = component_part component_part ...
component_part = letters | (expression)
One line is one grammar rule. The first line means an expression is a list of comma separated components. The second line means a component can be constructed with one or more component parts. The third line means a component part can be either a continuous sequence of letters or a nested expression inside a pair of braces.
Then you can use a Recursive Descent Parser to solve your problem with the above grammar.
We will define one method/function for each grammar rule. So basically we will have three methods ParseExpression, ParseComponent, ParseComponentPart.
Algorithm
As I stated above, an expression is comma separated components, so in our ParseExpression method, it simply calls ParseComponent, and then check if the next char is comma or not, like this (I'm using C#, I think you can easily convert it to other languages):
private List<string> ParseExpression()
{
var result = new List<string>();
while (!Eof())
{
// Parsing a component will produce a list of strings,
// they are added to the final string list
var items = ParseComponent();
result.AddRange(items);
// If next char is ',' simply skip it and parse next component
if (Peek() == ',')
{
// Skip comma
ReadNextChar();
}
else
{
break;
}
}
return result;
}
You can see that, when we are parsing an expression, we recursively call ParseComponent (it will then recursively call ParseComponentPart). It's a top-down approach, that's why it's called Recursive Descent Parsing.
ParseComponent is similar, like this:
private List<string> ParseComponent()
{
List<string> leftItems = null;
while (!Eof())
{
// Parse a component part will produce a list of strings (rightItems)
// We need to combine already parsed string list (leftItems) in this component
// with the newly parsed 'rightItems'
var rightItems = ParseComponentPart();
if (rightItems == null)
{
// No more parts, return current result (leftItems) to the caller
break;
}
if (leftItems == null)
{
leftItems = rightItems;
}
else
{
leftItems = Combine(leftItems, rightItems);
}
}
return leftItems;
}
The combine method simply combines two string list:
// Combine two lists of strings and return the combined string list
private List<string> Combine(List<string> leftItems, List<string> rightItems)
{
var result = new List<string>();
foreach (var leftItem in leftItems)
{
foreach (var rightItem in rightItems)
{
result.Add(leftItem + rightItem);
}
}
return result;
}
Then is the ParseComponentPart:
private List<string> ParseComponentPart()
{
var nextChar = Peek();
if (nextChar == '(')
{
// Skip '('
ReadNextChar();
// Recursively parse the inner expression
var items = ParseExpression();
// Skip ')'
ReadNextChar();
return items;
}
else if (char.IsLetter(nextChar))
{
var letters = ReadLetters();
return new List<string> { letters };
}
else
{
// Fail to parse a part, it means a component is ended
return null;
}
}
Full Source Code (C#)
The other parts are mostly helper methods, full C# source code is listed below:
using System;
using System.Collections.Generic;
using System.Text;
namespace Examples
{
public class BashBraceParser
{
private string _expression;
private int _nextCharIndex;
/// <summary>
/// Parse the specified BASH brace expression and return the result string list.
/// </summary>
public IList<string> Parse(string expression)
{
_expression = expression;
_nextCharIndex = 0;
return ParseExpression();
}
private List<string> ParseExpression()
{
// ** This part is already posted above **
}
private List<string> ParseComponent()
{
// ** This part is already posted above **
}
private List<string> ParseComponentPart()
{
// ** This part is already posted above **
}
// Combine two lists of strings and return the combined string list
private List<string> Combine(List<string> leftItems, List<string> rightItems)
{
// ** This part is already posted above **
}
// Peek next char without moving the cursor
private char Peek()
{
if (Eof())
{
return '\0';
}
return _expression[_nextCharIndex];
}
// Read next char and move the cursor to next char
private char ReadNextChar()
{
return _expression[_nextCharIndex++];
}
private void UnreadChar()
{
_nextCharIndex--;
}
// Check if the whole expression string is scanned.
private bool Eof()
{
return _nextCharIndex == _expression.Length;
}
// Read a continuous sequence of letters.
private string ReadLetters()
{
if (!char.IsLetter(Peek()))
{
return null;
}
var str = new StringBuilder();
while (!Eof())
{
var ch = ReadNextChar();
if (char.IsLetter(ch))
{
str.Append(ch);
}
else
{
UnreadChar();
break;
}
}
return str.ToString();
}
}
}
Use The Code
var parser = new BashBraceParser();
var result = parser.Parse("((a,b)o(m,n)p,b)");
var output = String.Join(" - ", result);
// Result: aomp - aonp - bomp - bonp - b
Console.WriteLine(output);
public class BASHBraceExpansion {
public static ArrayList<StringBuilder> parse_bash(String expression, WrapperInt p) {
ArrayList<StringBuilder> elements = new ArrayList<StringBuilder>();
ArrayList<StringBuilder> result = new ArrayList<StringBuilder>();
elements.add(new StringBuilder(""));
while(p.index < expression.length())
{
if (expression.charAt(p.index) == '(')
{
p.advance();
ArrayList<StringBuilder> temp = parse_bash(expression, p);
ArrayList<StringBuilder> newElements = new ArrayList<StringBuilder>();
for(StringBuilder e : elements)
{
for(StringBuilder t : temp)
{
StringBuilder s = new StringBuilder(e);
newElements.add(s.append(t));
}
}
System.out.println("elements :");
elements = newElements;
}
else if (expression.charAt(p.index) == ',')
{
result.addAll(elements);
elements.clear();
elements.add(new StringBuilder(""));
p.advance();
}
else if (expression.charAt(p.index) == ')')
{
p.advance();
result.addAll(elements);
return result;
}
else
{
for(StringBuilder sb : elements)
{
sb.append(expression.charAt(p.index));
}
p.advance();
}
}
return elements;
}
public static void print(ArrayList<StringBuilder> list)
{
for(StringBuilder s : list)
{
System.out.print(s + " * ");
}
System.out.println();
}
public static void main(String[] args) {
WrapperInt p = new WrapperInt();
ArrayList<StringBuilder> list = parse_bash("((a,b)o(m,n)p,b)", p);
//ArrayList<StringBuilder> list = parse_bash("(a,b)", p);
WrapperInt q = new WrapperInt();
ArrayList<StringBuilder> list1 = parse_bash("((a,b,cy)n,m)", q);
ArrayList<StringBuilder> list2 = parse_bash("((a,b)dr(f,g)(k,m),L(p,q))", new WrapperInt());
System.out.println("*****RESULT : ******");
print(list);
print(list1);
print(list2);
}
}
public class WrapperInt {
public WrapperInt() {
index = 0;
}
public int advance()
{
index ++;
return index;
}
public int index;
}
// aomp - aonp - bomp - bonp - b.

Subset sum with positive and negative integers

I've to implement a variation of the subset sum problem, my input will be positive and negative decimal, also I will need to know the subset, knowing that exists unfortunately it's not enough.
I've tried the algorithms found on wikipedia, but I can't make them work with negative numbers, and also I can't find the way to obtain the subset if it exists.
Could anyone point me where I could find some pseudo-code, documentation or implementation, for this algorithm.
I wrote the code in Java
it checks all the possibilities
import java.util.*;
public class StackOverFlow {
public static <T> Set<Set<T>> powerSet(Set<T> originalSet) {
Set<Set<T>> sets = new HashSet<Set<T>>();
if (originalSet.isEmpty()) {
sets.add(new HashSet<T>());
return sets;
}
List<T> list = new ArrayList<T>(originalSet);
T head = list.get(0);
Set<T> rest = new HashSet<T>(list.subList(1, list.size()));
for (Set<T> set : powerSet(rest)) {
Set<T> newSet = new HashSet<T>();
newSet.add(head);
newSet.addAll(set);
sets.add(newSet);
sets.add(set);
}
return sets;
}
public static int sumSet(Set<Integer> set){
int sum =0;
for (Integer s : set) {
sum += s;
}
return sum;
}
public static void main(String[] args) {
Set<Integer> mySet = new HashSet<Integer>();
mySet.add(-1);
mySet.add(2);
mySet.add(3);
int mySum = 4;
for (Set<Integer> s : powerSet(mySet)) {
if(mySum == sumSet(s))
System.out.println(s + " = " + sumSet(s));
}
}
}
I hope it helps

Longest string that can be made out of a list of strings

I'm looking for an efficient algorithm which will give me the longest string which can be made out of a list of strings. More precisely:
Given a file containing large number of strings, find the longest string from the list of strings presented in the file, which is a concatenation of other one or more strings.
Note: The answer string should also belong to the list of strings in file.
Example input:
The
he
There
After
ThereAfter
Output:
ThereAfter
Sort list in descending length order for strings in the list (first in the list is longest string). Quicksort is sorting with average time complexity O(nlogn).
Then, iterate on strings in the list starting left.
From string S, iterate on elements s to its right. If s is a substring of S, remove s from S. Continue iterating to the right until S is empty, meaning that it is made of items of the list.
public static class ListCompare implements Comparator<String> {
public int compare(String s1, String s2) {
if (s1.length() < s2.length())
return 1;
else if (s1.length() > s2.length())
return -1;
else
return 0;
}
}
public static String longestSurString(String[] ss) {
Arrays.sort(ss, new ListCompare ());
for (String S: ss) {
String b = new String(s);
for (String s: ss) {
if (!s.equals(b) && S.contains(s)) {
S = S.replace(s, "");
}
}
if (S.length() == 0)
return b;
}
return null;
}
Let us number the strings from S1, S2, ..., Sn
If I understand the problem statement correctly, than for Sito be a candidate for an answer, it must be equal to the concatenation of some
Sj_1, Sj_2, ..., Sj_k where forall x in 1..k: i != j_x. That is Si must be the concatenation of a subset of the strings that it is not a member of.
Given that, add all the strings to a trie. This will find all the prefix pairs, that is all (Si, Sj_1) from the above. Removing the Sj_1 prefix from Si renders a new string T, that must either be equal to Sj_k, or can be similarly reduced by searching for Sj_2 in the trie.
Best Solution using Trie & HashMap data structure:
We need first store all string and then just check prefix which words length is more.
public class TrieNode {
private HashMap<Character, TrieNode> children;
private boolean isWord;
public TrieNode() {
children = new HashMap<>();
isWord = false;
}
public void add(String s) {
HashMap<Character,TrieNode> temp_child = children;
for(int i=0; i<s.length(); i++){
char c = s.charAt(i);
TrieNode trieNode;
if(temp_child.containsKey(c)){
trieNode = temp_child.get(c);
}else{
trieNode = new TrieNode();
temp_child.put(c, trieNode);
}
temp_child = trieNode.children;
//set leaf node
if(i==s.length()-1)
trieNode.isWord = true;
}
}
public boolean isWord(String s) {
TrieNode trieNode = searchNode(s);
if(trieNode != null && trieNode.isWord)
return true;
else
return false;
}
public TrieNode searchNode(String s){
HashMap<Character, TrieNode> temp_child = children;
TrieNode trieNode = null;
for(int i=0; i<s.length(); i++){
char c = s.charAt(i);
if(temp_child.containsKey(c)){
trieNode = temp_child.get(c);
temp_child = trieNode.children;
}else{
return null;
}
}
return trieNode;
}

Lossless hierarchical run length encoding

I want to summarize rather than compress in a similar manner to run length encoding but in a nested sense.
For instance, I want : ABCBCABCBCDEEF to become: (2A(2BC))D(2E)F
I am not concerned that an option is picked between two identical possible nestings E.g.
ABBABBABBABA could be (3ABB)ABA or A(3BBA)BA which are of the same compressed length, despite having different structures.
However I do want the choice to be MOST greedy. For instance:
ABCDABCDCDCDCD would pick (2ABCD)(3CD) - of length six in original symbols which is less than ABCDAB(4CD) which is length 8 in original symbols.
In terms of background I have some repeating patterns that I want to summarize. So that the data is more digestible. I don't want to disrupt the logical order of the data as it is important. but I do want to summarize it , by saying, symbol A times 3 occurrences, followed by symbols XYZ for 20 occurrences etc. and this can be displayed in a nested sense visually.
Welcome ideas.
I'm pretty sure this isn't the best approach, and depending on the length of the patterns, might have a running time and memory usage that won't work, but here's some code.
You can paste the following code into LINQPad and run it, and it should produce the following output:
ABCBCABCBCDEEF = (2A(2BC))D(2E)F
ABBABBABBABA = (3A(2B))ABA
ABCDABCDCDCDCD = (2ABCD)(3CD)
As you can see, the middle example encoded ABB as A(2B) instead of ABB, you would have to make that judgment yourself, if single-symbol sequences like that should be encoded as a repeated symbol or not, or if a specific threshold (like 3 or more) should be used.
Basically, the code runs like this:
For each position in the sequence, try to find the longest match (actually, it doesn't, it takes the first 2+ match it finds, I left the rest as an exercise for you since I have to leave my computer for a few hours now)
It then tries to encode that sequence, the one that repeats, recursively, and spits out a X*seq type of object
If it can't find a repeating sequence, it spits out the single symbol at that location
It then skips what it encoded, and continues from #1
Anyway, here's the code:
void Main()
{
string[] examples = new[]
{
"ABCBCABCBCDEEF",
"ABBABBABBABA",
"ABCDABCDCDCDCD",
};
foreach (string example in examples)
{
StringBuilder sb = new StringBuilder();
foreach (var r in Encode(example))
sb.Append(r.ToString());
Debug.WriteLine(example + " = " + sb.ToString());
}
}
public static IEnumerable<Repeat<T>> Encode<T>(IEnumerable<T> values)
{
return Encode<T>(values, EqualityComparer<T>.Default);
}
public static IEnumerable<Repeat<T>> Encode<T>(IEnumerable<T> values, IEqualityComparer<T> comparer)
{
List<T> sequence = new List<T>(values);
int index = 0;
while (index < sequence.Count)
{
var bestSequence = FindBestSequence<T>(sequence, index, comparer);
if (bestSequence == null || bestSequence.Length < 1)
throw new InvalidOperationException("Unable to find sequence at position " + index);
yield return bestSequence;
index += bestSequence.Length;
}
}
private static Repeat<T> FindBestSequence<T>(IList<T> sequence, int startIndex, IEqualityComparer<T> comparer)
{
int sequenceLength = 1;
while (startIndex + sequenceLength * 2 <= sequence.Count)
{
if (comparer.Equals(sequence[startIndex], sequence[startIndex + sequenceLength]))
{
bool atLeast2Repeats = true;
for (int index = 0; index < sequenceLength; index++)
{
if (!comparer.Equals(sequence[startIndex + index], sequence[startIndex + sequenceLength + index]))
{
atLeast2Repeats = false;
break;
}
}
if (atLeast2Repeats)
{
int count = 2;
while (startIndex + sequenceLength * (count + 1) <= sequence.Count)
{
bool anotherRepeat = true;
for (int index = 0; index < sequenceLength; index++)
{
if (!comparer.Equals(sequence[startIndex + index], sequence[startIndex + sequenceLength * count + index]))
{
anotherRepeat = false;
break;
}
}
if (anotherRepeat)
count++;
else
break;
}
List<T> oneSequence = Enumerable.Range(0, sequenceLength).Select(i => sequence[startIndex + i]).ToList();
var repeatedSequence = Encode<T>(oneSequence, comparer).ToArray();
return new SequenceRepeat<T>(count, repeatedSequence);
}
}
sequenceLength++;
}
// fall back, we could not find anything that repeated at all
return new SingleSymbol<T>(sequence[startIndex]);
}
public abstract class Repeat<T>
{
public int Count { get; private set; }
protected Repeat(int count)
{
Count = count;
}
public abstract int Length
{
get;
}
}
public class SingleSymbol<T> : Repeat<T>
{
public T Value { get; private set; }
public SingleSymbol(T value)
: base(1)
{
Value = value;
}
public override string ToString()
{
return string.Format("{0}", Value);
}
public override int Length
{
get
{
return Count;
}
}
}
public class SequenceRepeat<T> : Repeat<T>
{
public Repeat<T>[] Values { get; private set; }
public SequenceRepeat(int count, Repeat<T>[] values)
: base(count)
{
Values = values;
}
public override string ToString()
{
return string.Format("({0}{1})", Count, string.Join("", Values.Select(v => v.ToString())));
}
public override int Length
{
get
{
int oneLength = 0;
foreach (var value in Values)
oneLength += value.Length;
return Count * oneLength;
}
}
}
public class GroupRepeat<T> : Repeat<T>
{
public Repeat<T> Group { get; private set; }
public GroupRepeat(int count, Repeat<T> group)
: base(count)
{
Group = group;
}
public override string ToString()
{
return string.Format("({0}{1})", Count, Group);
}
public override int Length
{
get
{
return Count * Group.Length;
}
}
}
Looking at the problem theoretically, it seems similar to the problem of finding the smallest context free grammar which generates (only) the string, except in this case the non-terminals can only be used in direct sequence after each other, so e.g.
ABCBCABCBCDEEF
s->ttDuuF
t->Avv
v->BC
u->E
ABABCDABABCD
s->ABtt
t->ABCD
Of course, this depends on how you define "smallest", but if you count terminals on the right side of rules, it should be the same as the "length in original symbols" after doing the nested run-length encoding.
The problem of the smallest grammar is known to be hard, and is a well-studied problem. I don't know how much the "direct sequence" part adds to or subtracts from the complexity.

Partition/split/section IEnumerable<T> into IEnumerable<IEnumerable<T>> based on a function using LINQ?

I'd like to split a sequence in C# to a sequence of sequences using LINQ. I've done some investigation, and the closest SO article I've found that is slightly related is this.
However, this question only asks how to partition the original sequence based upon a constant value. I would like to partition my sequence based on an operation.
Specifically, I have a list of objects which contain a decimal property.
public class ExampleClass
{
public decimal TheValue { get; set; }
}
Let's say I have a sequence of ExampleClass, and the corresponding sequence of values of TheValue is:
{0,1,2,3,1,1,4,6,7,0,1,0,2,3,5,7,6,5,4,3,2,1}
I'd like to partition the original sequence into an IEnumerable<IEnumerable<ExampleClass>> with values of TheValue resembling:
{{0,1,2,3}, {1,1,4,6,7}, {0,1}, {0,2,3,5,7}, {6,5,4,3,2,1}}
I'm just lost on how this would be implemented. SO, can you help?
I have a seriously ugly solution right now, but have a "feeling" that LINQ will increase the elegance of my code.
Okay, I think we can do this...
public static IEnumerable<IEnumerable<TElement>>
PartitionMontonically<TElement, TKey>
(this IEnumerable<TElement> source,
Func<TElement, TKey> selector)
{
// TODO: Argument validation and custom comparisons
Comparer<TKey> keyComparer = Comparer<TKey>.Default;
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
TKey currentKey = selector(iterator.Current);
List<TElement> currentList = new List<TElement> { iterator.Current };
int sign = 0;
while (iterator.MoveNext())
{
TElement element = iterator.Current;
TKey key = selector(element);
int nextSign = Math.Sign(keyComparer.Compare(currentKey, key));
// Haven't decided a direction yet
if (sign == 0)
{
sign = nextSign;
currentList.Add(element);
}
// Same direction or no change
else if (sign == nextSign || nextSign == 0)
{
currentList.Add(element);
}
else // Change in direction: yield current list and start a new one
{
yield return currentList;
currentList = new List<TElement> { element };
sign = 0;
}
currentKey = key;
}
yield return currentList;
}
}
Completely untested, but I think it might work...
alternatively with linq operators and some abuse of .net closures by reference.
public static IEnumerable<IEnumerable<T>> Monotonic<T>(this IEnumerable<T> enumerable)
{
var comparator = Comparer<T>.Default;
int i = 0;
T last = default(T);
return enumerable.GroupBy((value) => { i = comparator.Compare(value, last) > 0 ? i : i+1; last = value; return i; }).Select((group) => group.Select((_) => _));
}
Taken from some random utility code for partitioning IEnumerable's into a makeshift table for logging. If I recall properly, the odd ending Select is to prevent ambiguity when the input is an enumeration of strings.
Here's a custom LINQ operator which splits a sequence according to just about any criteria. Its parameters are:
xs: the input element sequence.
func: a function which accepts the "current" input element and a state object, and returns as a tuple:
a bool stating whether the input sequence should be split before the "current" element; and
a state object which will be passed to the next invocation of func.
initialState: the state object that gets passed to func on its first invocation.
Here it is, along with a helper class (required because yield return apparently cannot be nested):
public static IEnumerable<IEnumerable<T>> Split<T, TState>(
this IEnumerable<T> xs,
Func<T, TState, Tuple<bool, TState>> func,
TState initialState)
{
using (var splitter = new Splitter<T, TState>(xs, func, initialState))
{
while (splitter.HasNext)
{
yield return splitter.GetNext();
}
}
}
internal sealed class Splitter<T, TState> : IDisposable
{
public Splitter(IEnumerable<T> xs,
Func<T, TState, Tuple<bool, TState>> func,
TState initialState)
{
this.xs = xs.GetEnumerator();
this.func = func;
this.state = initialState;
this.hasNext = this.xs.MoveNext();
}
private readonly IEnumerator<T> xs;
private readonly Func<T, TState, Tuple<bool, TState>> func;
private bool hasNext;
private TState state;
public bool HasNext { get { return hasNext; } }
public IEnumerable<T> GetNext()
{
while (hasNext)
{
Tuple<bool, TState> decision = func(xs.Current, state);
state = decision.Item2;
if (decision.Item1) yield break;
yield return xs.Current;
hasNext = xs.MoveNext();
}
}
public void Dispose() { xs.Dispose(); }
}
Note: Here are some of the design decisions that went into the Split method:
It should make only a single pass over the sequence.
State is made explicit so that it's possible to keep side effects out of func.

Resources