Using Distinct with LINQ and Objects [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 months ago.
The community reviewed whether to reopen this question 9 months ago and left it closed:
Original close reason(s) were not resolved
Until recently, I was using Distinct in LINQ to select a distinct category (an enum) from a table. This was working fine.
I now need the results to be distinct on a class containing a category and a country (both enums), and Distinct isn't working anymore.
What am I doing wrong?

I believe this post explains your problem:
http://blog.jordanterrell.com/post/LINQ-Distinct()-does-not-work-as-expected.aspx
The content of the above link can be summed up by saying that the Distinct() method can be replaced by doing the following.
var distinctItems = items
.GroupBy(x => x.PropertyToCompare)
.Select(x => x.First());
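Applied to the question, the key can be an anonymous type that holds both properties, because anonymous types implement Equals and GetHashCode over their property values. A minimal sketch, assuming a hypothetical MyObj class whose Category and Country properties are enums (the enum members and the GroupByDistinctDemo class are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types standing in for the question's class and enums.
public enum Category { Books, Music }
public enum Country { US, UK }

public class MyObj
{
    public Category Category { get; set; }
    public Country Country { get; set; }
}

public static class GroupByDistinctDemo
{
    // Keeps the first object for each (Category, Country) pair.
    // The anonymous-type key compares by value, so equal pairs share one group.
    public static List<MyObj> DistinctByCategoryAndCountry(IEnumerable<MyObj> items) =>
        items
            .GroupBy(x => new { x.Category, x.Country })
            .Select(g => g.First())
            .ToList();

    public static List<MyObj> Run()
    {
        var items = new List<MyObj>
        {
            new MyObj { Category = Category.Books, Country = Country.US },
            new MyObj { Category = Category.Books, Country = Country.US },
            new MyObj { Category = Category.Music, Country = Country.US },
            new MyObj { Category = Category.Books, Country = Country.UK },
        };
        return DistinctByCategoryAndCountry(items); // 3 objects: Books/US, Music/US, Books/UK
    }
}
```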

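Try an IEqualityComparer:
public class MyObjEqualityComparer : IEqualityComparer<MyObj>
{
public bool Equals(MyObj x, MyObj y)
{
if (ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.Category.Equals(y.Category) &&
x.Country.Equals(y.Country);
}
public int GetHashCode(MyObj obj)
{
// Build the hash from the same fields Equals compares. Returning obj.GetHashCode()
// (assuming MyObj doesn't override it) gives equal objects different hash codes,
// so Distinct would treat them as different.
return unchecked((obj.Category.GetHashCode() * 397) ^ obj.Country.GetHashCode());
}
}
Then use it like this: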
var comparer = new MyObjEqualityComparer();
myObjs.Where(m => m.SomeProperty == "whatever").Distinct(comparer);
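The comparer only works if objects that are equal also return equal hash codes. Another option, sketched here with the same hypothetical MyObj, Category and Country types, is to put that logic on the class itself by implementing IEquatable<T> and overriding Equals(object) and GetHashCode(). The parameterless Distinct() then picks it up through EqualityComparer<T>.Default, so no comparer has to be passed (EquatableDemo is an illustrative name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enums standing in for the question's Category and Country.
public enum Category { Books, Music }
public enum Country { US, UK }

// Value-based equality on the class itself, so Distinct() needs no comparer.
public sealed class MyObj : IEquatable<MyObj>
{
    public Category Category { get; set; }
    public Country Country { get; set; }

    public bool Equals(MyObj other) =>
        other != null && Category == other.Category && Country == other.Country;

    public override bool Equals(object obj) => Equals(obj as MyObj);

    // Must agree with Equals: objects that are equal return the same hash code.
    public override int GetHashCode()
    {
        unchecked
        {
            return ((int)Category * 397) ^ (int)Country;
        }
    }
}

public static class EquatableDemo
{
    public static int CountDistinct()
    {
        var items = new[]
        {
            new MyObj { Category = Category.Books, Country = Country.US },
            new MyObj { Category = Category.Books, Country = Country.US },
            new MyObj { Category = Category.Music, Country = Country.UK },
        };
        // Distinct() uses EqualityComparer<MyObj>.Default, which calls IEquatable<MyObj>.Equals.
        return items.Distinct().Count(); // 2
    }
}
```

Because the hash is computed from mutable properties, avoid changing them while an object is stored in a hash-based collection such as a HashSet<T>.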

You're not doing it wrong; it's just a limitation of how .Distinct() is implemented in the .NET Framework.
One way to fix it is already shown in the other answers, but there is also a shorter solution available, which has the advantage that you can use it as an extension method easily everywhere without having to tweak the object's hash values.
Take a look at this:
**Usage:**
var myQuery=(from x in Customers select x).MyDistinct(d => d.CustomerID);
Note: This example uses a database query, but it does also work with an enumerable object list.
Declaration of MyDistinct:
public static class Extensions
{
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
Func<T, V> f)
{
return query.GroupBy(f).Select(x=>x.First());
}
}
Or, if you want it shorter, here is the same method as a one-liner (an expression-bodied member, which needs C# 6 or later):
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query, Func<T, V> f)
=> query.GroupBy(f).Select(x => x.First());
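And it works for everything, objects as well as entities. If required, you can create a second overloaded extension method for IQueryable<T>: replace the return type and the first parameter type with IQueryable<T>, and change the selector to Expression<Func<T, V>>. Otherwise GroupBy binds to the IEnumerable version and the grouping runs in memory instead of in the database.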
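A sketch of that IQueryable<T> overload next to the original. The selector becomes an Expression<Func<T, V>> so that Queryable.GroupBy and Queryable.Select are chosen; whether a particular provider can translate GroupBy followed by First() depends on the provider and its version. The Row class and QueryableDemo are illustrative only and use AsQueryable() over an in-memory list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableExtensions
{
    // LINQ to Objects version, as in the answer.
    public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query, Func<T, V> f)
        => query.GroupBy(f).Select(x => x.First());

    // IQueryable version: the key selector is an expression tree, so the
    // Queryable operators are used and the provider sees the whole query.
    public static IQueryable<T> MyDistinct<T, V>(this IQueryable<T> query, Expression<Func<T, V>> f)
        => query.GroupBy(f).Select(x => x.First());
}

public class Row
{
    public string X;
    public string Y;
}

public static class QueryableDemo
{
    public static int Run()
    {
        var rows = new List<Row>
        {
            new Row { X = "1", Y = "2" },
            new Row { X = "1", Y = "3" },
            new Row { X = "2", Y = "3" },
        }.AsQueryable();

        // Assigning to IQueryable<Row> only compiles if the IQueryable overload was chosen.
        IQueryable<Row> distinct = rows.MyDistinct(r => r.X);
        return distinct.Count(); // 2
    }
}
```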
Test data:
You can try it out with this test data:
List<A> GetData()
=> new List<A>()
{
new A() { X="1", Y="2" }, new A() { X="1", Y="2" },
new A() { X="2", Y="3" }, new A() { X="2", Y="3" },
new A() { X="1", Y="3" }, new A() { X="1", Y="3" },
};
class A
{
public string X;
public string Y;
}
Example (LINQPad, where Dump() displays the results):
void Main()
{
// returns duplicate rows:
GetData().Distinct().Dump();
// Gets distinct rows by i.X
GetData().MyDistinct(i => i.X).Dump();
}
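Outside LINQPad, where Dump() isn't available, the same test can be written with plain counts. It also shows a composite key: passing an anonymous type groups on X and Y together, which is the Category/Country situation from the question. A sketch reusing the answer's MyDistinct and A (CompositeKeyDemo is an illustrative name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query, Func<T, V> f)
        => query.GroupBy(f).Select(x => x.First());
}

public class A
{
    public string X;
    public string Y;
}

public static class CompositeKeyDemo
{
    static List<A> GetData() => new List<A>
    {
        new A { X = "1", Y = "2" }, new A { X = "1", Y = "2" },
        new A { X = "2", Y = "3" }, new A { X = "2", Y = "3" },
        new A { X = "1", Y = "3" }, new A { X = "1", Y = "3" },
    };

    public static int[] Counts() => new[]
    {
        GetData().Distinct().Count(),                        // 6: A uses reference equality
        GetData().MyDistinct(i => i.X).Count(),              // 2: keys "1" and "2"
        GetData().MyDistinct(i => new { i.X, i.Y }).Count(), // 3: (1,2), (2,3), (1,3)
    };
}
```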

For explanation, take a look at other answers. I'm just providing one way to handle this issue.
You might like this:
public class LambdaComparer<T>:IEqualityComparer<T>{
private readonly Func<T,T,bool> _comparer;
private readonly Func<T,int> _hash;
public LambdaComparer(Func<T,T,bool> comparer):
this(comparer,o=>0) {}
public LambdaComparer(Func<T,T,bool> comparer,Func<T,int> hash){
if(comparer==null) throw new ArgumentNullException("comparer");
if(hash==null) throw new ArgumentNullException("hash");
_comparer=comparer;
_hash=hash;
}
public bool Equals(T x,T y){
return _comparer(x,y);
}
public int GetHashCode(T obj){
return _hash(obj);
}
}
Usage:
public class Foo{
public string Fizz{get;set;}
public BarEnum Bar{get;set;}
}
public enum BarEnum {One,Two,Three}
var lst=new List<Foo>();
lst.Distinct(new LambdaComparer<Foo>(
(x1,x2)=>x1.Fizz==x2.Fizz&&
x1.Bar==x2.Bar));
You can even wrap it in an extension method to avoid writing the noisy new LambdaComparer<T>(...):
public static class EnumerableExtensions{
public static IEnumerable<T> SmartDistinct<T>
(this IEnumerable<T> lst, Func<T, T, bool> pred){
return lst.Distinct(new LambdaComparer<T>(pred));
}
}
Usage:
lst.SmartDistinct((x1,x2)=>x1.Fizz==x2.Fizz&&x1.Bar==x2.Bar);
NB: this works reliably only with LINQ to Objects; a query provider such as Entity Framework can't translate a custom IEqualityComparer.
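With the one-argument constructor every element hashes to 0, so Distinct still returns the right elements, but every new element lands in the same hash bucket and is compared with everything kept so far. Passing a hash function that agrees with the predicate (equal items, equal hashes) avoids that. A small sketch that counts predicate calls; the LambdaComparer below is a compact copy of the one above, and HashCostDemo is illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact copy of the answer's LambdaComparer<T>.
public class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _comparer;
    private readonly Func<T, int> _hash;

    public LambdaComparer(Func<T, T, bool> comparer) : this(comparer, o => 0) { }

    public LambdaComparer(Func<T, T, bool> comparer, Func<T, int> hash)
    {
        _comparer = comparer ?? throw new ArgumentNullException(nameof(comparer));
        _hash = hash ?? throw new ArgumentNullException(nameof(hash));
    }

    public bool Equals(T x, T y) => _comparer(x, y);
    public int GetHashCode(T obj) => _hash(obj);
}

public static class HashCostDemo
{
    // Counts how often the equality predicate runs while Distinct processes 0..n-1.
    public static int CountEqualsCalls(int n, bool withHash)
    {
        int calls = 0;
        Func<int, int, bool> eq = (a, b) => { calls++; return a == b; };
        var comparer = withHash
            ? new LambdaComparer<int>(eq, x => x.GetHashCode()) // hash consistent with eq
            : new LambdaComparer<int>(eq);                      // every hash is 0
        // ToList() forces the lazy Distinct to run before the counter is read.
        Enumerable.Range(0, n).Distinct(comparer).ToList();
        return calls;
    }
}
```

For the Foo example above, a matching hash would combine the hashes of Fizz and Bar, since those are the fields the predicate compares.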

I know this is an old question, but I am not satisfied with any of the answers. I took time to figure this out for myself and I wanted to share my findings.
First it is important to read and understand these two things:
IEqualityComparer
EqualityComparer
Long story short: to make the .Distinct() extension understand how to determine equality of your objects, you must define an EqualityComparer for your type T. The Microsoft docs literally state:
We recommend that you derive from the EqualityComparer class
instead of implementing the IEqualityComparer interface...
So the choice of which one to use has already been made for you.
For the .Distinct() extension to work, your objects must be compared accurately, and in the case of .Distinct() the GetHashCode() method is what really matters: Distinct uses a hash set internally and only calls Equals on objects whose hash codes match, so objects you consider equal must return the same hash code.
You can test this yourself by writing a GetHashCode() implementation that just returns the object's own hash code. The results are bad because that value is tied to the object instance rather than its contents: separate but equal objects get different hash codes (and the values can change from run to run). That makes your objects too unique, which is why it is important to write a proper implementation of this method.
Below is a copy of the code sample from the IEqualityComparer<T> documentation page, with test data, a small modification to the GetHashCode() method, and comments to demonstrate the point.
//Did this in LinqPad
void Main()
{
var lst = new List<Box>
{
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1)
};
//Demonstration that the hash code for each object is fairly
//random and won't help you for getting a distinct list
lst.ForEach(x => Console.WriteLine(x.GetHashCode()));
//Demonstration that if your EqualityComparer is setup correctly
//then you will get a distinct list
lst = lst
.Distinct(new BoxEqualityComparer())
.ToList();
lst.Dump();
}
public class Box
{
public Box(int h, int l, int w)
{
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public override String ToString()
{
return String.Format("({0}, {1}, {2})", Height, Length, Width);
}
}
public class BoxEqualityComparer
: EqualityComparer<Box>
{
public override bool Equals(Box b1, Box b2)
{
if (b2 == null && b1 == null)
return true;
else if (b1 == null || b2 == null)
return false;
else if (b1.Height == b2.Height && b1.Length == b2.Length
&& b1.Width == b2.Width)
return true;
else
return false;
}
public override int GetHashCode(Box bx)
{
#region This works
//In this example the components of the box object are XOR'd together
int hCode = bx.Height ^ bx.Length ^ bx.Width;
//The hash code of an int is the int itself
return hCode.GetHashCode();
#endregion
#region This won't work
//Comment the above lines and uncomment this line below if you want to see Distinct() not work
//return bx.GetHashCode();
#endregion
}
}
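XOR-ing the components is correct, since equal boxes always get equal hash codes, but it is order-insensitive: (1, 2, 3) and (3, 2, 1) get the same value, and two equal sides cancel out, so (2, 2, 5) and (7, 7, 5) collide too. Collisions don't break Distinct, they only cost extra Equals calls. A common alternative, sketched here with the usual 17/31 constants (a convention, not something the answer uses), mixes the components in order; on .NET Core 2.1 and later, System.HashCode.Combine(h, l, w) does a similar job:

```csharp
using System;

public static class BoxHash
{
    // The answer's approach: order-insensitive, so permutations collide.
    public static int Xor(int h, int l, int w) => h ^ l ^ w;

    // Order-sensitive combination; unchecked lets the multiplication wrap on overflow.
    public static int Combined(int h, int l, int w)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + h;
            hash = hash * 31 + l;
            hash = hash * 31 + w;
            return hash;
        }
    }
}
```

Inside BoxEqualityComparer.GetHashCode that would be return BoxHash.Combined(bx.Height, bx.Length, bx.Width);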

Related

Better version of Compare Extension for Linq

I need to get the differences between two IEnumerables. I wrote an extension method for it, but as you can see, it has a performance penalty. Can anyone write a better version?
EDIT
After the first response, I realize I didn't explain it well. I'm visiting both sequences three times, which is the performance penalty. It must be done in a single pass.
PS: Both is optional :)
public static class LinqExtensions
{
public static ComparisonResult<T> Compare<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
// Looping three times is a performance penalty!
var res = new ComparisonResult<T>
{
OnlySource = source.Except(target),
OnlyTarget = target.Except(source),
Both = source.Intersect(target)
};
return res;
}
}
public class ComparisonResult<T>
{
public IEnumerable<T> OnlySource { get; set; }
public IEnumerable<T> OnlyTarget { get; set; }
public IEnumerable<T> Both { get; set; }
}
Depending on the use case, this might be more efficient:
public static ComparisonResult<T> Compare<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
var both = source.Intersect(target).ToArray();
if (both.Any())
{
return new ComparisonResult<T>
{
OnlySource = source.Except(both),
OnlyTarget = target.Except(both),
Both = both
};
}
else
{
return new ComparisonResult<T>
{
OnlySource = source,
OnlyTarget = target,
Both = both
};
}
}
You're looking for an efficient full outer join.
Insert all items into a Dictionary<TKey, Tuple<TLeft, TRight>>. If a given key is not present, add it to the dictionary; if it is present, update the value. If the "left member" is set, the item is present in the left collection (you call it source); the same goes for the right member and the right collection (target). You can do all of that in a single pass over both collections.
After that, you iterate over all values of this dictionary and output the respective items into one of three collections, or you just return it as an IEnumerable<Tuple<TLeft, TRight>> which saves the need for result collections.
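A sketch of that idea for the question's single-type case, where the element itself is the key, so a Dictionary<T, (bool, bool)> of flags stands in for Dictionary<TKey, Tuple<TLeft, TRight>>. It keeps the question's ComparisonResult<T>, adds an optional comparer parameter (an assumption, not part of the question), and assumes the sequences contain no nulls, because Dictionary keys cannot be null:

```csharp
using System.Collections.Generic;

public class ComparisonResult<T>
{
    public IEnumerable<T> OnlySource { get; set; }
    public IEnumerable<T> OnlyTarget { get; set; }
    public IEnumerable<T> Both { get; set; }
}

public static class LinqExtensions
{
    public static ComparisonResult<T> Compare<T>(
        this IEnumerable<T> source, IEnumerable<T> target, IEqualityComparer<T> comparer = null)
    {
        // One entry per distinct element, flagging which side(s) it was seen on.
        var seen = new Dictionary<T, (bool InSource, bool InTarget)>(
            comparer ?? EqualityComparer<T>.Default);

        foreach (var item in source)          // single pass over source
            seen[item] = (true, false);

        foreach (var item in target)          // single pass over target
            seen[item] = seen.TryGetValue(item, out var flags)
                ? (flags.InSource, true)
                : (false, true);

        var onlySource = new List<T>();
        var onlyTarget = new List<T>();
        var both = new List<T>();
        foreach (var pair in seen)            // one pass over the distinct elements
        {
            if (pair.Value.InSource && pair.Value.InTarget) both.Add(pair.Key);
            else if (pair.Value.InSource) onlySource.Add(pair.Key);
            else onlyTarget.Add(pair.Key);
        }

        return new ComparisonResult<T> { OnlySource = onlySource, OnlyTarget = onlyTarget, Both = both };
    }
}
```

Like Except and Intersect in the question, this treats the inputs as sets (duplicates collapse), and the output order follows the dictionary's enumeration order, which isn't guaranteed.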

Linq Query Need - Looking for a pattern of data

Say I have a collection of the following simple class:
public class MyEntity
{
public string SubId { get; set; }
public System.DateTime ApplicationTime { get; set; }
public double? ThicknessMicrons { get; set; }
}
I need to search through the entire collection looking for 5 consecutive (not 5 total, but 5 consecutive) entities that have a null ThicknessMicrons value. Consecutiveness will be based on the ApplicationTime property. The collection will be sorted on that property.
How can I do this in a Linq query?
You can write your own extension method pretty easily:
public static IEnumerable<IEnumerable<T>> FindSequences<T>(this IEnumerable<T> sequence, Predicate<T> selector, int size)
{
List<T> curSequence = new List<T>();
foreach (T item in sequence)
{
// Check if this item matches the condition
if (selector(item))
{
// It does, so store it
curSequence.Add(item);
// Check if the list size has met the desired size
if (curSequence.Count == size)
{
// It did, so yield that list, and reset
yield return curSequence;
curSequence = new List<T>();
}
}
else
{
// No match, so reset the list
curSequence = new List<T>();
}
}
}
Now you can just say:
var groupsOfFive = entities.OrderBy(x => x.ApplicationTime)
.FindSequences(x => x.ThicknessMicrons == null, 5);
Note that this returns all non-overlapping sub-sequences of length 5. You can test for the existence of one like so:
bool isFiveSubsequence = groupsOfFive.Any();
Another important note is that if you have 9 consecutive matches, only one sub-sequence will be located.
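If you only need to know whether such a run exists (the isFiveSubsequence case), a running counter does it without building lists and stops at the first match. A sketch; HasConsecutiveRun is a hypothetical name, and Func<T, bool> is used instead of Predicate<T> to match the rest of LINQ:

```csharp
using System;
using System.Collections.Generic;

public static class RunExtensions
{
    // Returns true as soon as `size` consecutive items satisfy the predicate.
    public static bool HasConsecutiveRun<T>(this IEnumerable<T> sequence, Func<T, bool> predicate, int size)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        int run = 0;
        foreach (var item in sequence)
        {
            run = predicate(item) ? run + 1 : 0; // extend the current run or reset it
            if (run == size) return true;        // stop enumerating at the first full run
        }
        return false;
    }
}
```

With the question's types: bool found = entities.OrderBy(x => x.ApplicationTime).HasConsecutiveRun(x => x.ThicknessMicrons == null, 5);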

Partition/split/section IEnumerable<T> into IEnumerable<IEnumerable<T>> based on a function using LINQ?

I'd like to split a sequence in C# to a sequence of sequences using LINQ. I've done some investigation, and the closest SO article I've found that is slightly related is this.
However, this question only asks how to partition the original sequence based upon a constant value. I would like to partition my sequence based on an operation.
Specifically, I have a list of objects which contain a decimal property.
public class ExampleClass
{
public decimal TheValue { get; set; }
}
Let's say I have a sequence of ExampleClass, and the corresponding sequence of values of TheValue is:
{0,1,2,3,1,1,4,6,7,0,1,0,2,3,5,7,6,5,4,3,2,1}
I'd like to partition the original sequence into an IEnumerable<IEnumerable<ExampleClass>> with values of TheValue resembling:
{{0,1,2,3}, {1,1,4,6,7}, {0,1}, {0,2,3,5,7}, {6,5,4,3,2,1}}
I'm just lost on how this would be implemented. SO, can you help?
I have a seriously ugly solution right now, but have a "feeling" that LINQ will increase the elegance of my code.
Okay, I think we can do this...
public static IEnumerable<IEnumerable<TElement>>
PartitionMontonically<TElement, TKey>
(this IEnumerable<TElement> source,
Func<TElement, TKey> selector)
{
// TODO: Argument validation and custom comparisons
Comparer<TKey> keyComparer = Comparer<TKey>.Default;
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
TKey currentKey = selector(iterator.Current);
List<TElement> currentList = new List<TElement> { iterator.Current };
int sign = 0;
while (iterator.MoveNext())
{
TElement element = iterator.Current;
TKey key = selector(element);
int nextSign = Math.Sign(keyComparer.Compare(currentKey, key));
// Haven't decided a direction yet
if (sign == 0)
{
sign = nextSign;
currentList.Add(element);
}
// Same direction or no change
else if (sign == nextSign || nextSign == 0)
{
currentList.Add(element);
}
else // Change in direction: yield current list and start a new one
{
yield return currentList;
currentList = new List<TElement> { element };
sign = 0;
}
currentKey = key;
}
yield return currentList;
}
}
Completely untested, but I think it might work...
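Alternatively, with LINQ operators and some abuse of .NET closures (the lambda mutates captured variables):
public static IEnumerable<IEnumerable<T>> Monotonic<T>(this IEnumerable<T> enumerable)
{
var comparator = Comparer<T>.Default;
int i = 0;           // current group key, mutated by the lambda below
T last = default(T); // previous element, also captured by the lambda
return enumerable.GroupBy(value =>
{
// Start a new group unless the value is greater than the previous one
i = comparator.Compare(value, last) > 0 ? i : i + 1;
last = value;
return i;
}).Select(group => group.Select(_ => _));
}
Taken from some random utility code for partitioning IEnumerables into a makeshift table for logging. If I recall correctly, the odd Select at the end is there to prevent ambiguity when the input is an enumeration of strings. Note that this version only produces ascending runs: a new group starts whenever an element is less than or equal to the previous one, so a descending stretch such as 6,5,4,3,2,1 comes out as six single-element groups.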
Here's a custom LINQ operator which splits a sequence according to just about any criteria. Its parameters are:
xs: the input element sequence.
func: a function which accepts the "current" input element and a state object, and returns as a tuple:
a bool stating whether the input sequence should be split before the "current" element; and
a state object which will be passed to the next invocation of func.
initialState: the state object that gets passed to func on its first invocation.
Here it is, along with a helper class (required because yield return apparently cannot be nested):
public static IEnumerable<IEnumerable<T>> Split<T, TState>(
this IEnumerable<T> xs,
Func<T, TState, Tuple<bool, TState>> func,
TState initialState)
{
using (var splitter = new Splitter<T, TState>(xs, func, initialState))
{
while (splitter.HasNext)
{
yield return splitter.GetNext();
}
}
}
internal sealed class Splitter<T, TState> : IDisposable
{
public Splitter(IEnumerable<T> xs,
Func<T, TState, Tuple<bool, TState>> func,
TState initialState)
{
this.xs = xs.GetEnumerator();
this.func = func;
this.state = initialState;
this.hasNext = this.xs.MoveNext();
}
private readonly IEnumerator<T> xs;
private readonly Func<T, TState, Tuple<bool, TState>> func;
private bool hasNext;
private TState state;
public bool HasNext { get { return hasNext; } }
public IEnumerable<T> GetNext()
{
while (hasNext)
{
Tuple<bool, TState> decision = func(xs.Current, state);
state = decision.Item2;
if (decision.Item1) yield break;
yield return xs.Current;
hasNext = xs.MoveNext();
}
}
public void Dispose() { xs.Dispose(); }
}
Note: Here are some of the design decisions that went into the Split method:
It should make only a single pass over the sequence.
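State is made explicit so that it's possible to keep side effects out of func.
One consequence to be aware of: each inner sequence must be fully enumerated (for example with ToList()) before the next one is requested. If an inner sequence is never enumerated, HasNext never changes and the outer sequence never ends.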
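As a usage sketch (not from the answer), here is Split applied to the question's example. It works on the decimal values directly to stay short; for ExampleClass you would read x.TheValue inside Step. The state is a hypothetical (previous value, direction) pair, and returning true resets it, so the element at the split point starts the next run when Split calls func on it again. The operator is repeated so the block compiles on its own (inside a static class, as extension methods require), and each run is materialized with ToList() before the next is requested:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitExtensions
{
    public static IEnumerable<IEnumerable<T>> Split<T, TState>(
        this IEnumerable<T> xs, Func<T, TState, Tuple<bool, TState>> func, TState initialState)
    {
        using (var splitter = new Splitter<T, TState>(xs, func, initialState))
        {
            while (splitter.HasNext)
                yield return splitter.GetNext();
        }
    }
}

internal sealed class Splitter<T, TState> : IDisposable
{
    private readonly IEnumerator<T> xs;
    private readonly Func<T, TState, Tuple<bool, TState>> func;
    private bool hasNext;
    private TState state;

    public Splitter(IEnumerable<T> xs, Func<T, TState, Tuple<bool, TState>> func, TState initialState)
    {
        this.xs = xs.GetEnumerator();
        this.func = func;
        this.state = initialState;
        this.hasNext = this.xs.MoveNext();
    }

    public bool HasNext => hasNext;

    public IEnumerable<T> GetNext()
    {
        while (hasNext)
        {
            var decision = func(xs.Current, state);
            state = decision.Item2;
            if (decision.Item1) yield break;   // split before the current element
            yield return xs.Current;
            hasNext = xs.MoveNext();
        }
    }

    public void Dispose() => xs.Dispose();
}

public static class MonotonicDemo
{
    // Hypothetical state: (previous value, or null at the start of a run; direction -1, 0 or +1).
    private static Tuple<bool, Tuple<decimal?, int>> Step(decimal x, Tuple<decimal?, int> s)
    {
        decimal? prev = s.Item1;
        int dir = s.Item2;
        if (prev == null)                      // first element of a run
            return Tuple.Create(false, Tuple.Create((decimal?)x, 0));
        int d = Math.Sign(x - prev.Value);
        if (dir == 0)                          // direction not decided yet
            return Tuple.Create(false, Tuple.Create((decimal?)x, d));
        if (d == 0 || d == dir)                // same direction or no change
            return Tuple.Create(false, Tuple.Create((decimal?)x, dir));
        // Direction changed: split here and reset, so x starts the next run
        return Tuple.Create(true, Tuple.Create((decimal?)null, 0));
    }

    public static List<List<decimal>> Partition(IEnumerable<decimal> values) =>
        values.Split<decimal, Tuple<decimal?, int>>(Step, Tuple.Create((decimal?)null, 0))
              .Select(run => run.ToList())     // enumerate each run before requesting the next
              .ToList();
}
```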

Pass a lambda expression in place of IComparer or IEqualityComparer or any single-method interface?

I happened to see some code where this guy passed a lambda expression to an ArrayList.Sort(IComparer here) or an IEnumerable.SequenceEqual(IEnumerable list, IEqualityComparer here) where an IComparer or an IEqualityComparer was expected.
I can't be sure whether I actually saw it or was just dreaming. And I can't seem to find an extension on any of these collections that accepts a Func<> or a delegate in its method signature.
Is there such an overload/extension method? Or, if not, is it possible to muck around like this and pass an algorithm (read delegate) where a single-method interface is expected?
Update
Thanks, everyone. That's what I thought. I must've been dreaming. I know how to write a conversion. I just wasn't sure if I'd seen something like that or just thought I'd seen it.
Yet another update
Look, here, I found one such instance. I wasn't dreaming after all. Look at what this guy is doing here. What gives?
And here's another update:
Ok, I get it. The guy's using the Comparison<T> overload. Nice. Nice, but totally prone to mislead you. Nice, though. Thanks.
I'm not sure how useful this really is, since I think most places in the Base Class Library that expect an IComparer also have an overload that takes a Comparison... but just for the record:
.NET 4.5 added a method that obtains an IComparer from a Comparison:
Comparer<T>.Create
so you can pass your lambda to it and obtain an IComparer<T>.
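A short sketch of Comparer<T>.Create turning a lambda into an IComparer<T>, passed to the IComparer<T> overload of Array.Sort. The sort-by-length example and the ComparerCreateDemo name are illustrative; Array.Sort is not a stable sort, so strings of equal length may come out in either order:

```csharp
using System;
using System.Collections.Generic;

public static class ComparerCreateDemo
{
    public static string[] SortByLength(string[] words)
    {
        // Comparer<T>.Create (.NET Framework 4.5+) wraps a Comparison<T> lambda in an IComparer<T>.
        IComparer<string> byLength = Comparer<string>.Create((a, b) => a.Length.CompareTo(b.Length));

        var copy = (string[])words.Clone(); // leave the caller's array untouched
        Array.Sort(copy, byLength);         // the IComparer<T> overload
        return copy;
    }
}
```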
I was also searching the web for a solution, but I didn't find a satisfying one, so I created a generic EqualityComparerFactory:
using System;
using System.Collections.Generic;
/// <summary>
/// Utility class for creating <see cref="IEqualityComparer{T}"/> instances
/// from Lambda expressions.
/// </summary>
public static class EqualityComparerFactory
{
/// <summary>Creates the specified <see cref="IEqualityComparer{T}" />.</summary>
/// <typeparam name="T">The type to compare.</typeparam>
/// <param name="getHashCode">The get hash code delegate.</param>
/// <param name="equals">The equals delegate.</param>
/// <returns>An instance of <see cref="IEqualityComparer{T}" />.</returns>
public static IEqualityComparer<T> Create<T>(
Func<T, int> getHashCode,
Func<T, T, bool> equals)
{
if (getHashCode == null)
{
throw new ArgumentNullException(nameof(getHashCode));
}
if (equals == null)
{
throw new ArgumentNullException(nameof(equals));
}
return new Comparer<T>(getHashCode, equals);
}
private class Comparer<T> : IEqualityComparer<T>
{
private readonly Func<T, int> _getHashCode;
private readonly Func<T, T, bool> _equals;
public Comparer(Func<T, int> getHashCode, Func<T, T, bool> equals)
{
_getHashCode = getHashCode;
_equals = equals;
}
public bool Equals(T x, T y) => _equals(x, y);
public int GetHashCode(T obj) => _getHashCode(obj);
}
}
The idea is that the Create method takes two arguments: a delegate for GetHashCode(T) and a delegate for Equals(T, T).
Example:
class Person
{
public int Id { get; set; }
public string LastName { get; set; }
public string FirstName { get; set; }
}
class Program
{
static void Main(string[] args)
{
var list1 = new List<Person>(new[]{
new Person { Id = 1, FirstName = "Walter", LastName = "White" },
new Person { Id = 2, FirstName = "Jesse", LastName = "Pinkman" },
new Person { Id = 3, FirstName = "Skyler", LastName = "White" },
new Person { Id = 4, FirstName = "Hank", LastName = "Schrader" },
});
var list2 = new List<Person>(new[]{
new Person { Id = 1, FirstName = "Walter", LastName = "White" },
new Person { Id = 4, FirstName = "Hank", LastName = "Schrader" },
});
// We're comparing based on the Id property
var comparer = EqualityComparerFactory.Create<Person>(
a => a.Id.GetHashCode(),
(a, b) => a.Id==b.Id);
var intersection = list1.Intersect(list2, comparer).ToList();
}
}
You can provide a lambda to the Array.Sort method, because it has an overload that takes a Comparison<T>: a method that accepts two objects of type T and returns an integer. So you could provide a lambda such as (a, b) => a.CompareTo(b). An example that sorts an integer array in descending order:
int[] array = { 1, 8, 19, 4 };
// descending sort
Array.Sort(array, (a, b) => -1 * a.CompareTo(b));
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
public class Comparer2<T, TKey> : IComparer<T>, IEqualityComparer<T>
{
private readonly Expression<Func<T, TKey>> _KeyExpr;
private readonly Func<T, TKey> _CompiledFunc;
// Constructor
public Comparer2(Expression<Func<T, TKey>> getKey)
{
_KeyExpr = getKey;
_CompiledFunc = _KeyExpr.Compile();
}
public int Compare(T obj1, T obj2)
{
return Comparer<TKey>.Default.Compare(_CompiledFunc(obj1), _CompiledFunc(obj2));
}
public bool Equals(T obj1, T obj2)
{
return EqualityComparer<TKey>.Default.Equals(_CompiledFunc(obj1), _CompiledFunc(obj2));
}
public int GetHashCode(T obj)
{
return EqualityComparer<TKey>.Default.GetHashCode(_CompiledFunc(obj));
}
}
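Use it like this (products being a List<Product>; ArrayList.Sort would not accept it, because it expects the non-generic IComparer, which Comparer2 doesn't implement):
products.Sort(new Comparer2<Product, string>(p => p.Name));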
You can't pass it directly; however, you could define a LambdaComparer class that accepts a Func<T,T,int> and calls it from its Compare method.
It is not quite as concise, but you could make it shorter with some creative extension methods on Func.
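A sketch of that idea: LambdaComparer<T> wraps a Func<T, T, int>, and a hypothetical ToComparer extension method on Func makes it shorter to use. One catch is that a lambda has no type of its own, so it has to be stored in a Func variable before an extension method can be called on it (FuncComparerDemo is illustrative):

```csharp
using System;
using System.Collections.Generic;

// IComparer<T> backed by a Func<T, T, int>.
public sealed class LambdaComparer<T> : IComparer<T>
{
    private readonly Func<T, T, int> _compare;

    public LambdaComparer(Func<T, T, int> compare) =>
        _compare = compare ?? throw new ArgumentNullException(nameof(compare));

    public int Compare(T x, T y) => _compare(x, y);
}

public static class FuncComparerExtensions
{
    // Hypothetical extension method name: turns a comparison Func into an IComparer<T>.
    public static IComparer<T> ToComparer<T>(this Func<T, T, int> compare) =>
        new LambdaComparer<T>(compare);
}

public static class FuncComparerDemo
{
    public static List<int> SortDescending(IEnumerable<int> values)
    {
        // The lambda must be stored in a Func variable before ToComparer() can be called on it.
        Func<int, int, int> descending = (a, b) => b.CompareTo(a);
        var list = new List<int>(values);
        list.Sort(descending.ToComparer()); // List<T>.Sort(IComparer<T>)
        return list;
    }
}
```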
These methods don't have overloads that accept a delegate instead of an interface, but:
You can normally return a simpler sort key through the delegate you pass to Enumerable.OrderBy
Likewise, you could call Enumerable.Select before calling Enumerable.SequenceEqual
It should be straightforward to write a wrapper that implements IEqualityComparer<T> in terms of Func<T, T, bool>
F# lets you implement this sort of interface in terms of a lambda :)
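The first two suggestions in code, as a sketch with a hypothetical Product class: compare two lists by projecting to the key with Select before calling SequenceEqual, and sort by returning the key from OrderBy instead of writing an IComparer<Product>:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class KeyProjectionDemo
{
    // Equal when the names match in order; prices are ignored.
    public static bool SameNames(IEnumerable<Product> a, IEnumerable<Product> b) =>
        a.Select(p => p.Name).SequenceEqual(b.Select(p => p.Name));

    // The key selector replaces a custom IComparer<Product>.
    public static List<Product> ByPrice(IEnumerable<Product> products) =>
        products.OrderBy(p => p.Price).ToList();
}
```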
I vote for the dreaming theory.
You can't pass a lambda where one of these interfaces is expected: a lambda converts to a delegate (a type derived from System.Delegate), and delegates don't implement those interfaces.
What you probably saw is a use of the Converter<TInput, TOutput> delegate, which a lambda can be converted to. Array.ConvertAll takes an instance of this delegate.
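For reference, a minimal example of that pattern: Array.ConvertAll takes a Converter<TInput, TOutput>, and a lambda converts to that delegate type directly (ConvertAllDemo is an illustrative name):

```csharp
using System;

public static class ConvertAllDemo
{
    public static string[] ToStrings(int[] numbers) =>
        // The lambda is converted to a Converter<int, string> delegate.
        Array.ConvertAll(numbers, n => n.ToString());
}
```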
In case you need this function for use with a lambda, and possibly with two different element types:
static class IEnumerableExtensions
{
public static bool SequenceEqual<T1, T2>(this IEnumerable<T1> first, IEnumerable<T2> second, Func<T1, T2, bool> comparer)
{
if (first == null)
throw new ArgumentNullException(nameof(first));
if (second == null)
throw new ArgumentNullException(nameof(second));
if (comparer == null)
throw new ArgumentNullException(nameof(comparer));
using (IEnumerator<T1> e1 = first.GetEnumerator())
using (IEnumerator<T2> e2 = second.GetEnumerator())
{
while (e1.MoveNext())
{
if (!(e2.MoveNext() && comparer(e1.Current, e2.Current)))
return false;
}
if (e2.MoveNext())
return false;
}
return true;
}
}
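A usage sketch comparing an int sequence with a string sequence element by element. The extension is repeated (with ArgumentNullException checks) so the block compiles on its own; the ordinary Enumerable.SequenceEqual overloads don't apply here because they need both sequences to have the same element type (MixedSequenceDemo is illustrative):

```csharp
using System;
using System.Collections.Generic;

static class IEnumerableExtensions
{
    public static bool SequenceEqual<T1, T2>(this IEnumerable<T1> first, IEnumerable<T2> second, Func<T1, T2, bool> comparer)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        if (comparer == null) throw new ArgumentNullException(nameof(comparer));
        using (IEnumerator<T1> e1 = first.GetEnumerator())
        using (IEnumerator<T2> e2 = second.GetEnumerator())
        {
            while (e1.MoveNext())
            {
                // Fail if second is shorter or the pair doesn't match.
                if (!(e2.MoveNext() && comparer(e1.Current, e2.Current)))
                    return false;
            }
            return !e2.MoveNext(); // second must not have extra elements
        }
    }
}

public static class MixedSequenceDemo
{
    public static bool Matches(int[] numbers, string[] texts) =>
        numbers.SequenceEqual(texts, (n, s) => n.ToString() == s);
}
```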

IEqualityComparer for anonymous type

I have this
var n = ItemList.Select(s => new { s.Vchr, s.Id, s.Ctr, s.Vendor, s.Description, s.Invoice }).ToList();
n.AddRange(OtherList.Select(s => new { s.Vchr, s.Id, s.Ctr, s.Vendor, s.Description, s.Invoice }).ToList());
I would like to do this if it were allowed:
n = n.Distinct((x, y) => x.Vchr == y.Vchr).ToList();
I tried using the generic LambdaComparer, but since I'm using anonymous types there is no type to associate it with.
"Help me Obi Wan Kenobi, you're my only hope"
The trick is to create a comparer that only works on inferred types. For instance:
public class Comparer<T> : IComparer<T> {
private Func<T,T,int> _func;
public Comparer(Func<T,T,int> func) {
_func = func;
}
public int Compare(T x, T y ) {
return _func(x,y);
}
}
public static class Comparer {
public static Comparer<T> Create<T>(Func<T,T,int> func){
return new Comparer<T>(func);
}
public static Comparer<T> CreateComparerForElements<T>(this IEnumerable<T> enumerable, Func<T,T,int> func) {
return new Comparer<T>(func);
}
}
Now I can do the following ... hacky solution (the lambda must return an int, so this assumes Vchr is comparable, e.g. a string):
var comp = n.CreateComparerForElements((x, y) => x.Vchr.CompareTo(y.Vchr));
Most of the time when you compare (for equality or sorting) you're interested in choosing the keys to compare by, not the equality or comparison method itself (this is the idea behind Python's list sort API).
There's an example key equality comparer here.
I note that JaredPar's answer does not quite answer the question since the set methods like Distinct and Except require an IEqualityComparer<T> not an IComparer<T>. The following assumes that an IEquatable will have a suitable GetHashCode, and it certainly has a suitable Equals method.
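public class GeneralComparer<T, TEquatable> : IEqualityComparer<T>
where TEquatable : IEquatable<TEquatable>
{
private readonly Func<T, TEquatable> equatableSelector;
public GeneralComparer(Func<T, TEquatable> equatableSelector)
{
this.equatableSelector = equatableSelector;
}
public bool Equals(T x, T y)
{
// EqualityComparer<TEquatable>.Default uses IEquatable<TEquatable>.Equals and handles nulls
return EqualityComparer<TEquatable>.Default.Equals(equatableSelector(x), equatableSelector(y));
}
public int GetHashCode(T x)
{
return EqualityComparer<TEquatable>.Default.GetHashCode(equatableSelector(x));
}
}
public static class GeneralComparer
{
// Taking the sequence as "this" lets the compiler infer T, even for anonymous types
public static GeneralComparer<T, TEquatable> Create<T, TEquatable>(
this IEnumerable<T> elements, Func<T, TEquatable> equatableSelector)
where TEquatable : IEquatable<TEquatable>
{
return new GeneralComparer<T, TEquatable>(equatableSelector);
}
}
// Usage: n = n.Distinct(n.Create(x => x.Vchr)).ToList();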
Where the same inference from a static class trick is used as in JaredPar's answer.
To be more general you could provide two Funcs: a Func<T, T, bool> to check equality and a Func<T, int> to compute a hash code.
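A sketch of that two-delegate version, combined with the inference trick so it works on the question's anonymous types. DelegateEqualityComparer and CreateFor are hypothetical names, and the sample data only mimics the Vchr and Id properties from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Comparer built from an equality delegate and a hash delegate.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _hash;

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> hash)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _hash = hash ?? throw new ArgumentNullException(nameof(hash));
    }

    public bool Equals(T x, T y) => _equals(x, y);
    public int GetHashCode(T obj) => _hash(obj);
}

public static class DelegateEqualityComparer
{
    // The sequence parameter is unused; it only lets the compiler infer T,
    // which is what makes this usable with anonymous types.
    public static DelegateEqualityComparer<T> CreateFor<T>(
        this IEnumerable<T> elements, Func<T, T, bool> equals, Func<T, int> hash) =>
        new DelegateEqualityComparer<T>(equals, hash);
}

public static class AnonymousDistinctDemo
{
    public static int CountDistinctVchr()
    {
        var n = new[]
        {
            new { Vchr = "A1", Id = 1 },
            new { Vchr = "A1", Id = 2 },
            new { Vchr = "B2", Id = 3 },
        }.ToList();

        var comparer = n.CreateFor(
            (x, y) => x.Vchr == y.Vchr,
            x => x.Vchr == null ? 0 : x.Vchr.GetHashCode()); // hash must agree with the equality check
        return n.Distinct(comparer).Count(); // 2
    }
}
```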
