What does ExpressionVisitor.Visit<T> Do? - linq

Before someone shouts out the answer, please read the question through.
What is the purpose of the method in .NET 4.0's ExpressionVisitor:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
My first guess as to the purpose of this method was that it would visit each node in each tree specified by the nodes parameter and rewrite the tree using the result of the elementVisitor function.
This does not appear to be the case. Actually this method appears to do a little more than nothing, unless I'm missing something here, which I strongly suspect I am...
I tried to use this method in my code and when things didn't work out as expected, I reflectored the method and found:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
{
T[] list = null;
int index = 0;
int count = nodes.Count;
while (index < count)
{
T objA = elementVisitor(nodes[index]);
if (list != null)
{
list[index] = objA;
}
else if (!object.ReferenceEquals(objA, nodes[index]))
{
list = new T[count];
for (int i = 0; i < index; i++)
{
list[i] = nodes[i];
}
list[index] = objA;
}
index++;
}
if (list == null)
{
return nodes;
}
return new TrueReadOnlyCollection<T>(list);
}
So where would someone actually go about using this method? What am I missing here?
Thanks.

It looks to me like a convenience method to apply an aribitrary transform function to an expression tree, and return the resulting transformed tree, or the original tree if there is no change.
I can't see how this is any different of a pattern that a standard expression visitor, other than except for using a visitor type, it uses a function.
As for usage:
Expression<Func<int, int, int>> addLambdaExpression= (a, b) => a + b;
// Change add to subtract
Func<Expression, Expression> changeToSubtract = e =>
{
if (e is BinaryExpression)
{
return Expression.Subtract((e as BinaryExpression).Left,
(e as BinaryExpression).Right);
}
else
{
return e;
}
};
var nodes = new Expression[] { addLambdaExpression.Body }.ToList().AsReadOnly();
var subtractExpression = ExpressionVisitor.Visit(nodes, changeToSubtract);
You don't explain how you expected it to behave and why therefore you think it does little more than nothing.

Related

Check if two mathematical expressions are equivalent

I came across a question in an interview. I tried solving it but could not come up with a solution. Question is:
[Edited]
First Part: You are given two expressions with only "+" operator, check if given two expressions are mathematically equivalent.
For eg "A+B+C" is equivalent to "A+(B+C)".
Second Part : You are given two expressions with only "+" and "-" operators, check if given two expressions are mathematically equivalent.
For eg "A+B-C" is equivalent to "A-(-B+C)".
My thought process : I was thinking in terms of building an expression tree out of the given expressions and look for some kind of similarity. But I am unable to come up with a good way of checking if two expression trees are some way same or not.
Can some one help me on this :) Thanks in advance !
As long as the operations are commutative, the solution I'd propose is distribute parenthetic operations and then sort terms by 'variable', then run an aggregator across them and you should get a string of factors and symbols. Then just check the set of factors.
Aggregate variable counts until encountering an opening brace, treating subtraction as addition of the negated variable. Handle sub-expressions recursively.
The content of sub-expressions can be directly aggregated into the counts, you just need to take the sign into account properly -- there is no need to create an actual expression tree for this task. The TreeMap used in the code is just a sorted map implementation in the JDK.
The code takes advantage of the fact that the current position is part of the Reader state, so we can easily continue parsing after the closing bracket of the recursive call without needing to hand this information back to the caller explicitly somehow.
Implementation in Java (untested):
class Expression {
// Count for each variable name
Map<String, Integer> counts = new TreeMap<>();
Expression(Srring s) throws IOException {
this(new StringReader(s));
}
Expression(Reader reader) throws IOException {
int sign = 1;
while (true) {
int token = reader.read();
switch (token) {
case -1: // Eof
case ')':
return;
case '(':
add(sign, new Expression(reader));
sign = 1;
break;
case '+':
break;
case '-':
sign = -sign;
break;
default:
add(sign, String.valueOf((char) token));
sign = 1;
break;
}
}
}
void add(int factor, String variable) {
int count = counts.containsKey(variable) ? counts.get(variable) : 0;
counts.put(count + factor, variable);
}
void add(int sign, Expression expr) {
for (Map.Entry<String,Integer> entry : expr.counts.entrySet()) {
add(sign * entry.getVaue(), entry.getKey());
}
}
void equals(Object o) {
return (o instanceof Expression)
&& ((Expression) o).counts.equals(counts);
}
// Not needed for the task, just added for illustration purposes.
String toString() {
StringBuilder sb = new StringBuilder();
for (Map.Entry<String,Integer> entry : expr.counts.entrySet()) {
if (sb.length() > 0) {
sb.append(" + ");
}
sb.append(entry.getValue()); // count
sb.append(entry.getKey()); // variable name
}
return sb.toString();
}
}
Compare with
new Expression("A+B-C").equals(new Expression("A-(-B+C)"))
P.S: Added a toString() method to illustrate the data structure better.
Should print 1A + 1B + -1C for the example.
P.P.P.P.S.: Fixes, simplification, better explanation.
You can parse the expressions from left to right and reduce them to a canonical form for comparison in a straightforward way; the only complication is that when you encounter a closing bracket, you need to know whether its associated opening bracket had a plus or minus in front of it; you can use a stack for that; e.g.:
function Dictionary() {
this.d = [];
}
Dictionary.prototype.add = function(key, value) {
if (!this.d.hasOwnProperty(key)) this.d[key] = value;
else this.d[key] += value;
}
Dictionary.prototype.compare = function(other) {
for (var key in this.d) {
if (!other.d.hasOwnProperty(key) || other.d[key] != this.d[key]) return false;
}
return this.d.length == other.d.length;
}
function canonize(expression) {
var tokens = expression.split('');
var variables = new Dictionary();
var sign_stack = [];
var total_sign = 1;
var current_sign = 1;
for (var i in tokens) {
switch(tokens[i]) {
case '(' : {
sign_stack.push(current_sign);
total_sign *= current_sign;
current_sign = 1;
break;
}
case ')' : {
total_sign *= sign_stack.pop();
break;
}
case '+' : {
current_sign = 1;
break;
}
case '-' : {
current_sign = -1;
break;
}
case ' ' : {
break;
}
default : {
variables.add(tokens[i], current_sign * total_sign);
}
}
}
return variables;
}
var a = canonize("A + B + (A - (A + C - B) - B) - C");
var b = canonize("-C - (-A - (B + (-C)))");
document.write(a.compare(b));

Two sum data structure problems

I built a data structure for two sum question. In this data structure I built add and find method.
add - Add the number to an internal data structure.
find - Find if there exists any pair of numbers which sum is equal to the value.
For example:
add(1); add(3); add(5);
find(4) // return true
find(7) // return false
the following is my code, so what is wrong with this code?
http://www.lintcode.com/en/problem/two-sum-data-structure-design/
this is the test website, some cases could not be passed
public class TwoSum {
private List<Integer> sets;
TwoSum() {
this.sets = new ArrayList<Integer>();
}
// Add the number to an internal data structure.
public void add(int number) {
// Write your code here
this.sets.add(number);
}
// Find if there exists any pair of numbers which sum is equal to the value.
public boolean find(int value) {
// Write your code here
Collections.sort(sets);
for (int i = 0; i < sets.size(); i++) {
if (sets.get(i) > value) break;
for (int j = i + 1; j < sets.size(); j++) {
if (sets.get(i) + sets.get(j) == value) {
return true;
}
}
}
return false;
}
}
There does not seem to be anything wrong with your code.
However a coding challenge could possibly require a more performant solution. (You check every item against every item, which would take O(N^2)).
The best solution to implement find, is using a HashMap, which would take O(N). It's explained more in detail here.

Returning null if multiple instances are found

I have an IEnumerable and a predicate (Func) and I am writing a method that shall return a value if only one instance in the list matches the predicate. If the criteria is matched by none, then none was found. If the criteria is matched by many instances, then the predicate was insufficient to successfully identify the desired record. Both cases should return null.
What is the recommended way to express this in LINQ that does not result in multiple enumerations of the list?
The LINQ operator SingleOrDefault will throw an exception if multiple instances are found.
The LINQ operator FirstOrDefault will return the first even when multiple was found.
MyList.Where(predicate).Skip(1).Any()
...will check for ambiguity, but will not retain the desired record.
It seems that my best move is to grab the Enumerator from MyList.Where(predicate) and retain the first instance if accessing the next item fails, but it seems slightly verbose.
Am I missing something obvious?
The "slightly verbose" option seems reasonable to me, and can easily be isolated into a single extension method:
// TODO: Come up with a better name :)
public static T SingleOrDefaultOnMultiple<T>(this IEnumerable<T> source)
{
// TODO: Validate source is non-null
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
return default(T);
}
T first = iterator.Current;
return iterator.MoveNext() ? default(T) : first;
}
}
Update: Here is a more general approach which might be more reusable.
public static IEnumerable<TSource> TakeIfCountBetween<TSource>(this IEnumerable<TSource> source, int minCount, int maxCount, int? maxTake = null)
{
if (source == null)
throw new ArgumentNullException("source");
if (minCount <= 0 || minCount > maxCount)
throw new ArgumentException("minCount must be greater 0 and less than or equal maxCount", "minCount");
if (maxCount <= 0)
throw new ArgumentException("maxCount must be greater 0", "maxCount");
int take = maxTake ?? maxCount;
if (take > maxCount)
throw new ArgumentException("maxTake must be lower or equal maxCount", "maxTake");
if (take < minCount)
throw new ArgumentException("maxTake must be greater or equal minCount", "maxTake");
int count = 0;
ICollection objCol;
ICollection<TSource> genCol = source as ICollection<TSource>;
if (genCol != null)
{
count = genCol.Count;
}
else if ((objCol = source as ICollection) != null)
{
count = objCol.Count;
}
else
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext() && ++count < maxCount);
}
}
bool valid = count >= minCount && count <= maxCount;
if (valid)
return source.Take(take);
else
return Enumerable.Empty<TSource>();
}
Usage:
var list = new List<string> { "A", "B", "C", "E", "E", "F" };
IEnumerable<string> result = list
.Where(s => s == "A")
.TakeIfCountBetween(1, 1);
Console.Write(string.Join(",", result)); // or result.First()

Compare each string in datatable with that of list takes longer time.poor performance

I have a datatable of 200,000 rows and want to validate each row with that of list and return that string codesList..
It is taking very long time..I want to improve the performance.
for (int i = 0; i < dataTable.Rows.Count; i++)
{
bool isCodeValid = CheckIfValidCode(codevar, codesList,out CodesCount);
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
List<Codes> tempcodes= codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
if (tempcodes.Count == 0)
{
RetVal = false;
for (int i = 0; i < dataTable.Rows.Count; i++)
{
bool isCodeValid = CheckIfValidCode(codevar, codesList,out CodesCount);
}
}
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
List<Codes> tempcodes= codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
if (tempcodes.Count == 0)
{
RetVal = false;
}
else
{
RetVal=true;
}
return bRetVal;
}
codelist is a list which also contains 200000 records. Please suggest. I used findAll which takes same time and also used LINQ query which also takes same time.
A few optimizations come to mind:
You could start by removing the Tolist() altogether
replace the Count() with .Any(), which returns true if there are items in the result
It's probably also a lot faster when you replace the List with a HashSet<Codes> (this requires your Codes class to implement HashCode and Equals properly. Alternatively you could populate a HashSet<string> with the contents of Codes.StdCode
It looks like you're not using the out count at all. Removing it would make this method a lot faster. Computing a count requires you to check all codes.
You could also split the List into a Dictionary> which you populate with by taking the first character of the code. That would reduce the number of codes to check drastically, since you can exclude 95% of the codes by their first character.
Tell string.Equals to use a StringComparison of type Ordinal or OrdinalIgnoreCase to speed up the comparison.
It looks like you can stop processing a lot earlier as well, the use of .Any takes care of that in the second method. A similar construct can be used in the first, instead of using for and looping through each row, you could short-circuit after the first failure is found (unless this code is incomplete and you mark each row as invalid individually).
Something like:
private bool CheckIfValidCode(string codevar, List<Codes> codesList)
{
Hashset<string> codes = new Hashset(codesList.Select(c ==> code.StdCode));
return codes.Contains(codevar);
// or: return codes.Any(c => string.Equals(codevar, c, StringComparison.Ordinal);
}
If you're adamant about the count:
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
Hashset<string> codes = new Hashset(codesList.Select(c ==> code.StdCode));
count = codes.Count(codevar);
// or: count = codes.Count(c => string.Equals(codevar, c, StringComparison.Ordinal);
return count > 0;
}
You can optimize further by creating the HashSet outside of the call and re-use the instance:
InCallingCode
{
...
Hashset<string> codes = new Hashset(codesList.Select(c ==> code.StdCode));
for (/*loop*/) {
bool isValid = CheckIfValidCode(codevar, codes, out int count)
}
....
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
count = codes.Count(codevar);
// or: count = codes.Count(c => string.Equals(codevar, c, StringComparison.Ordinal);
return count > 0;
}

How to efficiently add the entire English dictionary to a trie data structure

Simply put I want to check if a specified word exists or not.
The lookup needs to be very fast which is why I decided to store the dictionary in a trie. So far so good! My trie works without issues. The problem is filling the trie with a dictionary. What I'm currently doing is looping through every line of a plain text file that is the dictionary and adding each word to my trie.
This is understandably so an extremely slow process. The file contains just about 120 000 lines. If anyone could point me in the right direction for what I could do it would be much appreciated!
This is how I add words to the trie (in Boo):
trie = Trie()
saol = Resources.Load("saol") as TextAsset
text = saol.text.Split(char('\n'))
for new_word in text:
trie.Add(new_word)
And this is my trie (in C#):
using System.Collections.Generic;
public class TrieNode {
public char letter;
public bool word;
public Dictionary<char, TrieNode> child;
public TrieNode(char letter) {
this.letter = letter;
this.word = false;
this.child = new Dictionary<char, TrieNode>();
}
}
public class Trie {
private TrieNode root;
public Trie() {
root = new TrieNode(' ');
}
public void Add(string word) {
TrieNode node = root;
bool found_letter;
int c = 1;
foreach (char letter in word) {
found_letter = false;
// if current letter is in child list, set current node and break loop
foreach (var child in node.child) {
if (letter == child.Key) {
node = child.Value;
found_letter = true;
break;
}
}
// if current letter is not in child list, add child node and set it as current node
if (!found_letter) {
TrieNode new_node = new TrieNode(letter);
if (c == word.Length) new_node.word = true;
node.child.Add(letter, new_node);
node = node.child[letter];
}
c ++;
}
}
public bool Find(string word) {
TrieNode node = root;
bool found_letter;
int c = 1;
foreach (char letter in word) {
found_letter = false;
// check if current letter is in child list
foreach (var child in node.child) {
if (letter == child.Key) {
node = child.Value;
found_letter = true;
break;
}
}
if (found_letter && node.word && c == word.Length) return true;
else if (!found_letter) return false;
c ++;
}
return false;
}
}
Assuming that you don't have any serious implementation problems, pay the price for populating the trie. After you've populated the trie serialize it to a file. For future needs, just load the serialized version. That should be faster that reconstructing the trie.
-- ADDED --
Looking closely at your TrieNode class, you may want to replacing the Dictionary you used for child with an array. You may consume more space, but have a faster lookup time.
Anything you do with CLI yourself will be slower then using the built-in functions.
120k is not that much for a dictionary.
First thing I would do is fire up the code performance tool.
But just some wild guesses: You have a lot of function calls. Just starting with the Boo C# binding in a for loop. Try to pass the whole text block and tare it apart with C#.
Second, do not use a Dictionary. You waste just about as much resources with your code now as you would just using a Dictionary.
Third, sort the text before you go inserting - you can probably make some optimizations that way. Maybe just construct a suffix table.

Resources