LINQ recursion function? - linq

Let's take this n-tier deep structure for example:
public class SomeItem
{
public Guid ID { get;set; }
public string Name { get; set; }
public bool HasChildren { get;set; }
public IEnumerable<SomeItem> Children { get; set; }
}
If I want to get a particular item by ID (anywhere in the structure), is there some LINQ goodness I can use to get it in a single statement, or do I have to use a recursive function like the one below:
private SomeItem GetSomeItem(IEnumerable<SomeItem> items, Guid ID)
{
foreach (var item in items)
{
if (item.ID == ID)
{
return item;
}
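else if (item.HasChildren)
{
// Only return if the item was found in this subtree; otherwise keep checking the siblings.
var found = GetSomeItem(item.Children, ID);
if (found != null)
{
return found;
}
}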
}
return null;
}

LINQ doesn't really "do" recursion nicely. Your solution seems appropriate - although I'm not sure HasChildren is really required... why not just use an empty list for an item with no children?
An alternative is to write a DescendantsAndSelf method that returns all of the descendants (including the item itself), something like this:
// Warning: potentially expensive!
public IEnumerable<SomeItem> DescendantsAndSelf()
{
yield return this;
foreach (var item in Children.SelectMany(x => x.DescendantsAndSelf()))
{
yield return item;
}
}
However, if the tree is deep, that ends up being very inefficient, because each item has to "pass through" the iterators of all of its ancestors. Wes Dyer has blogged about this, showing a more efficient implementation.
Anyway, if you have a method like this (however it's implemented) you can just use a normal "where" clause to find an item (or First/FirstOrDefault etc).
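A minimal sketch of such a method, plus a lookup built on FirstOrDefault. It assumes Children is never null (leaf nodes get an empty list, as suggested above) and drops HasChildren. Instead of nested iterators it walks the tree with an explicit Stack<T>, which is one common way to avoid the pass-through cost; it is not necessarily the implementation from Wes Dyer's post, and FindById is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SomeItem
{
    public Guid ID { get; set; }
    public string Name { get; set; }
    // Assumption: leaf nodes get an empty list instead of null.
    public IEnumerable<SomeItem> Children { get; set; } = new List<SomeItem>();

    // Depth-first, pre-order walk with an explicit stack, so every item is
    // yielded by a single iterator no matter how deep the tree is.
    public IEnumerable<SomeItem> DescendantsAndSelf()
    {
        var stack = new Stack<SomeItem>();
        stack.Push(this);
        while (stack.Count > 0)
        {
            var current = stack.Pop();
            yield return current;
            // Push children in reverse so they are visited in their original order.
            foreach (var child in current.Children.Reverse())
                stack.Push(child);
        }
    }
}

public static class SomeItemExtensions
{
    // Hypothetical helper: searches every root and all of its descendants.
    public static SomeItem FindById(this IEnumerable<SomeItem> roots, Guid id) =>
        roots.SelectMany(r => r.DescendantsAndSelf()).FirstOrDefault(x => x.ID == id);
}
```

Because DescendantsAndSelf is lazy, FirstOrDefault stops walking the tree as soon as it finds a match.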

Here's one without recursion. This avoids the cost of passing through several layers of iterators, so I think it's about as efficient as they come.
public static IEnumerable<T> IterateTree<T>(this T root, Func<T, IEnumerable<T>> childrenF)
{
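// A Queue gives O(1) removal from the front, whereas List.RemoveAt(0) shifts every remaining element.
var q = new Queue<T>();
q.Enqueue(root);
while (q.Count > 0)
{
var c = q.Dequeue();
foreach (var child in childrenF(c) ?? Enumerable.Empty<T>())
{
q.Enqueue(child);
}
yield return c;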
}
}
Invoke like so:
var subtree = root.IterateTree(x => x.Children).ToList();

Hope this helps:
public static IEnumerable<Control> GetAllChildControls(this Control parent)
{
foreach (Control child in parent.Controls)
{
yield return child;
if (child.HasChildren)
{
foreach (Control grandChild in child.GetAllChildControls())
yield return grandChild;
}
}
}

It is important to remember you don't need to do everything with LINQ, or default to recursion. There are interesting options when you use data structures. The following is a simple flattening function in case anyone is interested.
public static IEnumerable<SomeItem> Flatten(IEnumerable<SomeItem> items)
{
if (items == null || items.Count() == 0) return new List<SomeItem>();
var result = new List<SomeItem>();
var q = new Queue<SomeItem>(collection: items);
while (q.Count > 0)
{
var item = q.Dequeue();
result.Add(item);
if (item?.Children?.Count() > 0)
foreach (var child in item.Children)
q.Enqueue(child);
}
return result;
}

While there are extension methods that enable recursion in LINQ (and probably look like your function), none are provided out of the box.
Examples of these extension methods can be found here or here.
I'd say your function is fine.

Related

Better version of Compare Extension for Linq

I need to get the differences between two IEnumerables, so I wrote an extension method for it. But as you can see, it has a performance penalty. Can anyone write a better version of it?
EDIT
After the first response, I realized I did not explain this well. I'm visiting both arrays three times, and that is the performance penalty. It must be done in a single pass.
PS: Both is optional :)
public static class LinqExtensions
{
public static ComparisonResult<T> Compare<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
// Looping three times is a performance penalty!
var res = new ComparisonResult<T>
{
OnlySource = source.Except(target),
OnlyTarget = target.Except(source),
Both = source.Intersect(target)
};
return res;
}
}
public class ComparisonResult<T>
{
public IEnumerable<T> OnlySource { get; set; }
public IEnumerable<T> OnlyTarget { get; set; }
public IEnumerable<T> Both { get; set; }
}
Depending on the use case, this might be more efficient:
public static ComparisonResult<T> Compare<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
var both = source.Intersect(target).ToArray();
if (both.Any())
{
return new ComparisonResult<T>
{
OnlySource = source.Except(both),
OnlyTarget = target.Except(both),
Both = both
};
}
else
{
return new ComparisonResult<T>
{
OnlySource = source,
OnlyTarget = target,
Both = both
};
}
}
You're looking for an efficient full outer join.
Insert all items into a Dictionary<TKey, Tuple<TLeft, TRight>>. If a given key is not present, add it to the dictionary. If it is present, update the value. If the "left member" is set, this means that the item is present in the left source collection (you call it source). The opposite is true for the right member. You can do that using a single pass over both collections.
After that, you iterate over all values of this dictionary and output the respective items into one of three collections, or you just return it as an IEnumerable<Tuple<TLeft, TRight>> which saves the need for result collections.
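A minimal sketch of that single-pass idea for the Compare method above. Since source and target elements have the same type here, it stores a small bit flag per distinct item instead of a Tuple<TLeft, TRight>. It assumes elements implement Equals/GetHashCode meaningfully and are never null (Dictionary<TKey, TValue> rejects null keys); CompareSinglePass is a hypothetical name.

```csharp
using System.Collections.Generic;

public class ComparisonResult<T>
{
    public IEnumerable<T> OnlySource { get; set; }
    public IEnumerable<T> OnlyTarget { get; set; }
    public IEnumerable<T> Both { get; set; }
}

public static class SinglePassCompareExtensions
{
    private const int InSource = 1;
    private const int InTarget = 2;

    // Hypothetical name. Each input is enumerated exactly once; the dictionary
    // records, per distinct item, which of the two inputs contained it.
    public static ComparisonResult<T> CompareSinglePass<T>(
        this IEnumerable<T> source, IEnumerable<T> target)
    {
        var membership = new Dictionary<T, int>();
        foreach (var item in source)
        {
            membership.TryGetValue(item, out var flags); // flags is 0 if absent
            membership[item] = flags | InSource;
        }
        foreach (var item in target)
        {
            membership.TryGetValue(item, out var flags);
            membership[item] = flags | InTarget;
        }

        // One pass over the distinct items sorts them into the three buckets.
        var onlySource = new List<T>();
        var onlyTarget = new List<T>();
        var both = new List<T>();
        foreach (var pair in membership)
        {
            if (pair.Value == InSource) onlySource.Add(pair.Key);
            else if (pair.Value == InTarget) onlyTarget.Add(pair.Key);
            else both.Add(pair.Key); // InSource | InTarget
        }

        return new ComparisonResult<T> { OnlySource = onlySource, OnlyTarget = onlyTarget, Both = both };
    }
}
```

Like Except and Intersect, each result contains distinct items only. The results follow the dictionary's enumeration order, which is not guaranteed, so sort them if order matters.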

Partition/split/section IEnumerable<T> into IEnumerable<IEnumerable<T>> based on a function using LINQ?

I'd like to split a sequence in C# to a sequence of sequences using LINQ. I've done some investigation, and the closest SO article I've found that is slightly related is this.
However, this question only asks how to partition the original sequence based upon a constant value. I would like to partition my sequence based on an operation.
Specifically, I have a list of objects which contain a decimal property.
public class ExampleClass
{
public decimal TheValue { get; set; }
}
Let's say I have a sequence of ExampleClass, and the corresponding sequence of values of TheValue is:
{0,1,2,3,1,1,4,6,7,0,1,0,2,3,5,7,6,5,4,3,2,1}
I'd like to partition the original sequence into an IEnumerable<IEnumerable<ExampleClass>> with values of TheValue resembling:
{{0,1,2,3}, {1,1,4,6,7}, {0,1}, {0,2,3,5,7}, {6,5,4,3,2,1}}
I'm just lost on how this would be implemented. SO, can you help?
I have a seriously ugly solution right now, but have a "feeling" that LINQ will increase the elegance of my code.
Okay, I think we can do this...
public static IEnumerable<IEnumerable<TElement>>
PartitionMonotonically<TElement, TKey>
(this IEnumerable<TElement> source,
Func<TElement, TKey> selector)
{
// TODO: Argument validation and custom comparisons
Comparer<TKey> keyComparer = Comparer<TKey>.Default;
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
TKey currentKey = selector(iterator.Current);
List<TElement> currentList = new List<TElement> { iterator.Current };
int sign = 0;
while (iterator.MoveNext())
{
TElement element = iterator.Current;
TKey key = selector(element);
int nextSign = Math.Sign(keyComparer.Compare(currentKey, key));
// Haven't decided a direction yet
if (sign == 0)
{
sign = nextSign;
currentList.Add(element);
}
// Same direction or no change
else if (sign == nextSign || nextSign == 0)
{
currentList.Add(element);
}
else // Change in direction: yield current list and start a new one
{
yield return currentList;
currentList = new List<TElement> { element };
sign = 0;
}
currentKey = key;
}
yield return currentList;
}
}
Completely untested, but I think it might work...
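Alternatively, with LINQ operators and some abuse of .NET closures capturing variables by reference. Note that this version starts a new group whenever a value is not strictly greater than the previous one, so it splits the input into strictly ascending runs: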
public static IEnumerable<IEnumerable<T>> Monotonic<T>(this IEnumerable<T> enumerable)
{
var comparator = Comparer<T>.Default;
int i = 0;
T last = default(T);
return enumerable
.GroupBy(value =>
{
// Start a new group whenever the value does not increase.
i = comparator.Compare(value, last) > 0 ? i : i + 1;
last = value;
return i;
})
.Select(group => group.Select(_ => _));
}
Taken from some random utility code for partitioning IEnumerables into a makeshift table for logging. If I recall correctly, the odd trailing Select is there to prevent ambiguity when the input is an enumeration of strings.
Here's a custom LINQ operator which splits a sequence according to just about any criteria. Its parameters are:
xs: the input element sequence.
func: a function which accepts the "current" input element and a state object, and returns as a tuple:
a bool stating whether the input sequence should be split before the "current" element; and
a state object which will be passed to the next invocation of func.
initialState: the state object that gets passed to func on its first invocation.
Here it is, along with a helper class (required because yield return apparently cannot be nested):
public static IEnumerable<IEnumerable<T>> Split<T, TState>(
this IEnumerable<T> xs,
Func<T, TState, Tuple<bool, TState>> func,
TState initialState)
{
using (var splitter = new Splitter<T, TState>(xs, func, initialState))
{
while (splitter.HasNext)
{
yield return splitter.GetNext();
}
}
}
internal sealed class Splitter<T, TState> : IDisposable
{
public Splitter(IEnumerable<T> xs,
Func<T, TState, Tuple<bool, TState>> func,
TState initialState)
{
this.xs = xs.GetEnumerator();
this.func = func;
this.state = initialState;
this.hasNext = this.xs.MoveNext();
}
private readonly IEnumerator<T> xs;
private readonly Func<T, TState, Tuple<bool, TState>> func;
private bool hasNext;
private TState state;
public bool HasNext { get { return hasNext; } }
public IEnumerable<T> GetNext()
{
while (hasNext)
{
Tuple<bool, TState> decision = func(xs.Current, state);
state = decision.Item2;
if (decision.Item1) yield break;
yield return xs.Current;
hasNext = xs.MoveNext();
}
}
public void Dispose() { xs.Dispose(); }
}
Note: Here are some of the design decisions that went into the Split method:
It should make only a single pass over the sequence.
State is made explicit so that it's possible to keep side effects out of func.

Creating linq expression with a subtype restriction

I have a list of type IEnumerable<MyBaseType> for which I am trying to create an extra where clause to retrieve a specific item. The specific value only exists on the subtypes MyFirstType and MySecondType, not on MyBaseType.
Is it possible to create an expression something like this?
MyList.Where(b => (b is MyFirstType || (b is MySecondType)) && b.SpecificValue == message.SpecificValue);
The above does not work, since b is of type MyBaseType and SpecificValue does not exist there. Also note that I have another subtype, MyThirdType, that does not have SpecificValue either.
What does do what I want is this:
foreach (dynamic u in MyList)
{
if (u is MyFirstType || u is MySecondType)
{
if (u.SpecificValue == message.SpecificValue)
{
//Extracted code goes here
break;
}
}
}
Does anyone have an idea how to create a LINQ expression for the above scenario?
Maybe there is a better solution, but as I see it this could work well enough, if you don't mind the performance cost.
Start by declaring an interface:
public interface IMySpecialType
{
object SpecificValue {get; set;} //you didn't specify what type this is
//all your other relevant properties which first and second types have in common
}
Then make MyFirstType and MySecondType implement this interface:
public class MyFirstType : MyBaseType, IMySpecialType
{
//snippet
}
public class MySecondType : MyBaseType, IMySpecialType
{
//snippet
}
Then, filter and cast:
MyList
.Where(b => (b is MyFirstType) || (b is MySecondType))
.Cast<IMySpecialType>()
.Where(b => b.SpecificValue == message.SpecificValue);
//do something
The direct translation of your code to a Linq where clause is
string messageValue = "foo";
var result = baseList.Where(item =>
{
dynamic c = item;
if(item is MyFirstType || item is MySecondType)
{
if( c.SpecificValue == messageValue)
return true;
}
return false;
});
This still requires testing the type of each item and using dynamic, so you might as well cast the item to MyFirstType or MySecondType directly.
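A sketch of that direct cast, using C# 7 type patterns instead of dynamic. It assumes SpecificValue is a string on both subtypes (the question does not say), and the class definitions below are stand-ins for the real ones; FindBySpecificValue is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in types: only MyFirstType and MySecondType carry SpecificValue.
public class MyBaseType { }
public class MyFirstType : MyBaseType { public string SpecificValue { get; set; } }
public class MySecondType : MyBaseType { public string SpecificValue { get; set; } }
public class MyThirdType : MyBaseType { }

public static class SpecificValueQueries
{
    // Returns the first item of either subtype whose SpecificValue matches,
    // or null if there is none. Items of any other type are skipped.
    public static MyBaseType FindBySpecificValue(IEnumerable<MyBaseType> items, string value) =>
        items.FirstOrDefault(b =>
            (b is MyFirstType first && first.SpecificValue == value) ||
            (b is MySecondType second && second.SpecificValue == value));
}
```

The pattern variable (first or second) is only in scope on the right of its own &&, so each branch reads the property from the correctly typed reference without dynamic or reflection.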
An alternative would be using reflection to check if the property exists, using this approach you are not dependent on the actual types of your items as long as they do have the property you are interested in:
string messageValue = "foo";
var result = baseList.Where( item =>
{
var prop = item.GetType().GetProperty("SpecificValue");
// object.Equals compares the values; == on an object-typed operand would compare references
if (prop != null && Equals(prop.GetValue(item, null), messageValue))
return true;
else return false;
});
If modifying the class hierarchy is an option, you can have your MyFirstType and MySecondType implement an interface that holds the property; then you can use OfType() in your LINQ query:
interface ISpecific
{
string SpecificValue { get; set; }
}
class MyFirstType : MyBase, ISpecific
{
public string SpecificValue { get; set; }
}
...
string messageValue = "foo";
var result = baseList.OfType<ISpecific>()
.Where(item => item.SpecificValue == messageValue);
A far easier way to do that would be to create an interface that marks all your classes having the SpecificValue property. Then it's child's play:
class Program
{
static void Main(string[] args)
{
List<MyBaseType> MyList = new List<MyBaseType>();
ISpecificValue message = new MyFirstType();
var matches = MyList.OfType<ISpecificValue>().Where(b => b.SpecificValue == message.SpecificValue);
}
}
class MyBaseType { }
interface ISpecificValue { string SpecificValue { get; set; } }
class MyFirstType : MyBaseType, ISpecificValue
{
// Must be a property (not a field) to implement the interface member
public string SpecificValue { get; set; }
}
class MySecondType : MyBaseType, ISpecificValue
{
public string SpecificValue { get; set; }
}

Using Distinct with LINQ and Objects [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 months ago.
The community reviewed whether to reopen this question 9 months ago and left it closed:
Original close reason(s) were not resolved
Until recently, I was using a Distinct in LINQ to select a distinct category (an enum) from a table. This was working fine.
I now need to have it distinct on a class containing a category and country (both enums). The Distinct isn't working now.
What am I doing wrong?
I believe this post explains your problem:
http://blog.jordanterrell.com/post/LINQ-Distinct()-does-not-work-as-expected.aspx
The content of the above link can be summed up by saying that the Distinct() method can be replaced by doing the following.
var distinctItems = items
.GroupBy(x => x.PropertyToCompare)
.Select(x => x.First());
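Applied to the question (distinct on both category and country), the same trick works with an anonymous-type key, because anonymous types implement Equals and GetHashCode over all of their properties. A sketch, with Category, Country and MyObj as hypothetical stand-ins for the question's types:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's enums and class.
public enum Category { Books, Music }
public enum Country { UK, US }

public class MyObj
{
    public Category Category { get; set; }
    public Country Country { get; set; }
}

public static class DistinctExamples
{
    // Keeps the first object for each (Category, Country) combination.
    // The anonymous-type key compares by value, so equal pairs share a group.
    public static List<MyObj> DistinctByCategoryAndCountry(IEnumerable<MyObj> items) =>
        items.GroupBy(x => new { x.Category, x.Country })
             .Select(g => g.First())
             .ToList();
}
```

GroupBy yields groups in the order their keys first appear, so the first object of each combination is the one kept.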
Try an IEqualityComparer:
public class MyObjEqualityComparer : IEqualityComparer<MyObj>
{
public bool Equals(MyObj x, MyObj y)
{
return x.Category.Equals(y.Category) &&
x.Country.Equals(y.Country);
}
public int GetHashCode(MyObj obj)
{
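// Hash the same fields that Equals compares; obj.GetHashCode() would be per-instance for a class
return new { obj.Category, obj.Country }.GetHashCode();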
}
}
then use here
var comparer = new MyObjEqualityComparer();
myObjs.Where(m => m.SomeProperty == "whatever").Distinct(comparer);
You're not doing it wrong; it is just the bad implementation of .Distinct() in the .NET Framework.
One way to fix it is already shown in the other answers, but there is also a shorter solution available, which has the advantage that you can use it as an extension method easily everywhere without having to tweak the object's hash values.
Take a look at this:
Usage:
var myQuery=(from x in Customers select x).MyDistinct(d => d.CustomerID);
Note: This example uses a database query, but it does also work with an enumerable object list.
Declaration of MyDistinct:
public static class Extensions
{
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
Func<T, V> f)
{
return query.GroupBy(f).Select(x=>x.First());
}
}
Or if you want it shorter, this is the same as above, but as "one-liner":
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query, Func<T, V> f)
=> query.GroupBy(f).Select(x => x.First());
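And it works for everything, objects as well as entities. If required, you can create a second overloaded extension method for IQueryable<T> by replacing the return type and the first parameter type with IQueryable<T>, and the selector type with Expression<Func<T, V>>, so that the query provider receives an expression rather than a compiled delegate.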
Test data:
You can try it out with this test data:
List<A> GetData()
=> new List<A>()
{
new A() { X="1", Y="2" }, new A() { X="1", Y="2" },
new A() { X="2", Y="3" }, new A() { X="2", Y="3" },
new A() { X="1", Y="3" }, new A() { X="1", Y="3" },
};
class A
{
public string X;
public string Y;
}
Example:
void Main()
{
// returns duplicate rows:
GetData().Distinct().Dump();
// Gets distinct rows by i.X
GetData().MyDistinct(i => i.X).Dump();
}
For explanation, take a look at other answers. I'm just providing one way to handle this issue.
You might like this:
public class LambdaComparer<T>:IEqualityComparer<T>{
private readonly Func<T,T,bool> _comparer;
private readonly Func<T,int> _hash;
public LambdaComparer(Func<T,T,bool> comparer):
this(comparer,o=>0) {}
public LambdaComparer(Func<T,T,bool> comparer,Func<T,int> hash){
if(comparer==null) throw new ArgumentNullException("comparer");
if(hash==null) throw new ArgumentNullException("hash");
_comparer=comparer;
_hash=hash;
}
public bool Equals(T x,T y){
return _comparer(x,y);
}
public int GetHashCode(T obj){
return _hash(obj);
}
}
Usage:
public class Foo{
public string Fizz{get;set;}
public BarEnum Bar{get;set;}
}
public enum BarEnum {One,Two,Three}
var lst=new List<Foo>();
lst.Distinct(new LambdaComparer<Foo>(
(x1,x2)=>x1.Fizz==x2.Fizz&&
x1.Bar==x2.Bar));
You can even wrap it in an extension method to avoid writing the noisy new LambdaComparer<T>(...) every time:
public static class EnumerableExtensions{
public static IEnumerable<T> SmartDistinct<T>
(this IEnumerable<T> lst, Func<T, T, bool> pred){
return lst.Distinct(new LambdaComparer<T>(pred));
}
}
Usage:
lst.SmartDistinct((x1,x2)=>x1.Fizz==x2.Fizz&&x1.Bar==x2.Bar);
NB: works reliably only for Linq2Objects
I know this is an old question, but I am not satisfied with any of the answers. I took time to figure this out for myself and I wanted to share my findings.
First it is important to read and understand these two things:
IEqualityComparer
EqualityComparer
Long story short: in order for the .Distinct() extension to know how to determine equality of your objects, you must define an EqualityComparer for your type T. The Microsoft docs literally state:
We recommend that you derive from the EqualityComparer class
instead of implementing the IEqualityComparer interface...
So the choice of which one to use has effectively been made for you.
For the .Distinct() extension to work, your objects must be compared accurately. In the case of .Distinct(), GetHashCode() matters just as much as Equals(): items whose hash codes differ are never passed to Equals(), so a poor GetHashCode() makes equal objects look distinct.
You can test this yourself by writing a GetHashCode() implementation that just returns the object's own default hash code. The results are bad, because for a class that does not override GetHashCode() each instance gets its own hash code, so even objects with identical values look different. That makes your objects too unique, which is why it is important to write a proper implementation of this method.
Below is a copy of the code sample from the IEqualityComparer<T> documentation page, with test data, a small modification to the GetHashCode() method, and comments to demonstrate the point.
//Did this in LinqPad
void Main()
{
var lst = new List<Box>
{
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1)
};
//Demonstration that the hash code for each object is fairly
//random and won't help you for getting a distinct list
lst.ForEach(x => Console.WriteLine(x.GetHashCode()));
//Demonstration that if your EqualityComparer is setup correctly
//then you will get a distinct list
lst = lst
.Distinct(new BoxEqualityComparer())
.ToList();
lst.Dump();
}
public class Box
{
public Box(int h, int l, int w)
{
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public override String ToString()
{
return String.Format("({0}, {1}, {2})", Height, Length, Width);
}
}
public class BoxEqualityComparer
: EqualityComparer<Box>
{
public override bool Equals(Box b1, Box b2)
{
if (b2 == null && b1 == null)
return true;
else if (b1 == null || b2 == null)
return false;
else if (b1.Height == b2.Height && b1.Length == b2.Length
&& b1.Width == b2.Width)
return true;
else
return false;
}
public override int GetHashCode(Box bx)
{
#region This works
//In this example each component of the box object is XOR'd together
int hCode = bx.Height ^ bx.Length ^ bx.Width;
//The hash code of an int is the int itself
return hCode.GetHashCode();
#endregion
#region This won't work
//Comment the above lines and uncomment this line below if you want to see Distinct() not work
//return bx.GetHashCode();
#endregion
}
}

Building an external list while filtering in LINQ

I have an array of input strings that contains either email addresses or account names in the form domain\account. I would like to build a List<string> that contains only email addresses. If an element in the input array is of the form domain\account, I will perform a lookup in the dictionary: if the key is found, its value is the email address; if not, nothing is added to the result list. The code below should make this clearer:
private bool where(string input, Dictionary<string, string> dict)
{
if (input.Contains("#"))
{
return true;
}
else
{
try
{
string value = dict[input];
return true;
}
catch (KeyNotFoundException)
{
return false;
}
}
}
private string select(string input, Dictionary<string, string> dict)
{
if (input.Contains("#"))
{
return input;
}
else
{
try
{
string value = dict[input];
return value;
}
catch (KeyNotFoundException)
{
return null;
}
}
}
public void run()
{
Dictionary<string, string> dict = new Dictionary<string, string>()
{
{ "gmail\\nameless", "nameless#gmail.com"}
};
string[] s = { "anonymous#gmail.com", "gmail\\nameless", "gmail\\unknown" };
var q = s.Where(p => where(p, dict)).Select(p => select(p, dict));
List<string> resultList = q.ToList<string>();
}
While the above code works (I hope there are no typos here), there are two things I don't like about it:
The code in where() and select() seems to be redundant/repeating.
It takes 2 passes. The second pass converts from the query expression to List.
So I would like to add to the List resultList directly in the where() method. It seems like I should be able to do so. Here's the code:
private bool where(string input, Dictionary<string, string> dict, List<string> resultList)
{
if (input.Contains("#"))
{
resultList.Add(input); //note the difference from above
return true;
}
else
{
try
{
string value = dict[input];
resultList.Add(value); //note the difference from above
return true;
}
catch (KeyNotFoundException)
{
return false;
}
}
}
Then my LINQ expression can nicely be a single statement:
List<string> resultList = new List<string>();
s.Where(p => where(p, dict, resultList));
Or
var q = s.Where(p => where(p, dict, resultList)); //do nothing with q afterward
This seems like perfectly legal C# LINQ. The result: sometimes it works and sometimes it doesn't. Why doesn't my code work reliably, and how can I make it do so?
If you reverse the where and the select you can convert unknown domain accounts to null first, then just filter them out.
private string select(string input, Dictionary<string, string> dict)
{
if (input.Contains("#"))
{
return input;
}
else
{
if (dict.ContainsKey(input))
return dict[input];
}
return null;
}
var resultList = s
.Select(p => select(p, dict))
.Where(p => p != null)
.ToList();
This takes care of your duplicate code.
It takes 2 passes. The second pass converts from the query expression to List.
Actually, this is only one pass, because LINQ queries are lazily evaluated. This is also why your last statements only work sometimes: the filter is applied, and your list populated, only when the LINQ query is enumerated. Otherwise the Where predicate never runs.
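A small sketch that makes the lazy evaluation visible: the Where predicate, and so any side effect inside it, runs only when the query is enumerated. The counter, sample strings and class name are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many times the predicate ran before and after enumeration.
    public static (int before, int after) CountPredicateCalls()
    {
        int calls = 0;
        var input = new[] { "a", "bb", "ccc" };

        // Building the query does not run the lambda yet.
        IEnumerable<string> query = input.Where(s => { calls++; return s.Length > 1; });
        int before = calls;

        // ToList() enumerates the query, so the lambda runs once per element.
        query.ToList();
        int after = calls;

        return (before, after);
    }
}
```

CountPredicateCalls() returns (0, 3): nothing runs when the query is built, and the predicate runs once per element during ToList(). A side effect such as resultList.Add inside the predicate behaves the same way, which is why a query that is never enumerated never fills the list.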
It sounds like what you want is an iterator. By making your own iterator you can filter the list and produce output at the same time.
public static IEnumerable<string> EmailAddresses(IEnumerable<string> inputList,
Dictionary<string, string> dict)
{
foreach (string input in inputList)
{
string dictValue;
if (input.Contains("#"))
yield return input;
else if (dict.TryGetValue(input, out dictValue))
yield return dictValue;
// else do nothing
}
}
List<string> resultList = EmailAddresses(s, dict).ToList();
You don't generally want to have side-effects on an unrelated object like your list. It makes it difficult to understand, debug, and refactor. I wouldn't worry about optimizing the query until you know it's not performing well.
So, what's wrong with your original expression? You don't need both the select and the where. You only need the Where() call. This will return a list of email addresses, which you can stick into a HashSet. The HashSet will provide the uniqueness you seem to desire. This will add execution time, so if you don't need it, don't use it.
You should only really need something like:
var s = new[] {"me#me.com", "me_not_at_me.com", "not_me"};
var emailAddrs = s.Where( a => a.Contains("#")); // This is a bad email address validator; find a better one.
var uniqueAddrs = new HashSet<string>(emailAddrs);
(Note: HashSet<T> has a constructor that takes an IEnumerable<T>, so the line above compiles as written.)
Here is one way you could approach it with LINQ. It groups the values by whether or not they are email addresses, resulting in 2 groups of strings. If a group is the email address group, we select directly from it, otherwise we look up the emails and select from those:
public static IEnumerable<string> SelectEmails(
this IEnumerable<string> values,
IDictionary<string, string> accountEmails)
{
return
from value in values
group value by value.Contains("#") into valueGroup
from email in (valueGroup.Key ? valueGroup : GetEmails(valueGroup, accountEmails))
select email;
}
private static IEnumerable<string> GetEmails(
IEnumerable<string> accounts,
IDictionary<string, string> accountEmails)
{
return
from account in accounts
where accountEmails.ContainsKey(account)
select accountEmails[account];
}
You would use it like this:
var values = new string[] { ... };
var accountEmails = new Dictionary<string, string> { ... };
var emails = values.SelectEmails(accountEmails).ToList();
Of course, the most straightforward way to implement this extension method would be @gabe's approach.
