LINQ GroupBy Anonymous Type - linq

I am wondering why GroupBy works with anonymous types.
List<string> values = new List<string>();
values.GroupBy(s => new { Length = s.Length, Value = s })
Anonymous types do not implement any interfaces, so I am confused how this is working.
I assume that the algorithm is working by creating an instance of the anonymous type for each item in the source and using hashing to group the items together. However, no IEqualityComparer is provided to define how to generate a hash or whether two instances are equal. I would assume, then, that the Object.Equals and Object.GetHashCode methods would be the fallback, which rely on object identity.
So, how is it that this is working as expected? And yet it doesn't work in an OrderBy. Do anonymous types override Equals and GetHashCode? or does the underlying GroupBy algorithm do some magic I haven't thought of?

As per the documentation, an anonymous type is a reference type:
From the perspective of the common language runtime, an anonymous type is no different from any other reference type.
Therefore, it will be using the default implementation for those functions as implemented by System.Object (which at least for equality is based on referential equality).
EDIT: Actually, as per that same first doco link it says:
Because the Equals and GetHashCode methods on anonymous types are defined in terms of the Equals and GetHashcode methods of the properties, two instances of the same anonymous type are equal only if all their properties are equal.

http://msdn.microsoft.com/en-us/library/bb397696.aspx
This link explains that GetHashCode and Equals are overridden.

It doesn't work on OrderBy because the new object does not implement IComparable.

Related

IEnumerable<T>.Count() vs List<T>.Count with Entity Framework

I am retrieving a list of items using Entity Framework and if there are some items retrieved I do something with them.
var items = db.MyTable.Where(t => t.Expiration < DateTime.Now).ToList();
if(items.Count != 0)
{
// Do something...
}
The if statement could also be written as
if(items.Count() != 0)
{
// Do something...
}
In the first case, the .Count is a List<T>.Count property. In the second case, the .Count() is IEnumerable<T>.Count() extension method.
Although both approaches achieve the same result, however, is one more preferred than the other? (Possibly some difference in performance?)
Enumerable.Count<T> (the extension method for IEnumerable<T>) just calls Count if the underlying type is an ICollection<T>, so for List<T> there is no difference.
Queryable.Count<T> (the extension method for IQueryable<T>) will use the underlying query provider, which in many cases will push the count down to the actual SQL, which will perform faster than counting the objects in memory.
If a filter is applied (e.g. Count(i => i.Name = "John")) or if the underlying type is not an ICollection<T>, the collection is enumerated to compute the count.
is one more preferred than the other?
I generally prefer to use Count() since 1) it's more portable (the underlying type can be anything that implements IEnumerable<T> or IQueryable<T>) and 2) it's easier to add a filter later if necessary.
As Tim states in his comment, I also prefer using Any() to Count() > 0 since it doesn't have to actually count the items - it will just check for the existence of one item. Conversely I use !Any() instead of Count() == 0.
It depends on the underlying collection and where Linq will be pulling from. For example if it's SQL then using .ToList() will cause the query to pull back the entire list, and then count it. However, the .Count() extension method will translate it into a SQL COUNT statement on the database side. In which case there will be an obvious performance difference.
For just a standard List or Collection it's as stated in D. Stanley's answer.
I would say that it depends on what's going on inside the if block. If you're simply doing the check to determine whether to perform a sequence of operations on the underlying enumeration, then it's probably not needed in any event. Simply iterate over the enumeration (omitting ToList as well). If you're not using the collection inside the if block, then you should avoid using ToList and definitely use Any over any Count/Count() method.
Once you've performed the ToList then you're no longer using Entity Framework and I expect that Count() is only marginally slower than Count since, if the underlying collection is ICollection<T> it defers to that implementation. The only overhead would be determining whether it implements that interface.
http://msdn.microsoft.com/en-us/library/bb338038.aspx
Remarks:
If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

Linq To Sql Where does not call overridden Equals

I'm currently working on a project where I'm going to do a lot of comparison of similar non-database (service layer objects for this discuss) objects and objects retrieved from the database via LinqToSql. For the sake of this discussion, assume I have a service layer Product object with a string field that is represented in the database. However, in the database, there is also a primary key Id that is not represented in the service layer.
Accordingly (as I often do for unit testing etc), I overrode Equals(Object), Equals(Product), and GetHashCode and implemented IEquatable with the expectation that I would be able to write code like this:
myContext.Products.Where(p => p.Equals(passedInProduct).SingleOrDefault();
And so forth.
The Equals override is tested and works. The objects are mutable so the usual caveats apply to the GetHashCode override. However, for the purposes of this example, the objects are not modified except by LtS and could be made readonly.
Here's the simple test:
Create a test object in memory and commit to the LtS context. By committing, the test object is populated with a few auto-generated fields.
Create another identical test object in memory (separate reference)
Attempt to retrieve the first object from the database using the second object as the criteria. (see code line above).
// Setup
string productDesc = "12A";
Product testProduct1 = _CreateTestProductInDatabase(productDesc);
Product testProduct2 = _CreateTestProduct(productDesc);
// check setup
Product retrievedProduct1 = ProductRepo.Retrieve(testProduct1);
//Assert.IsNotNull(retrievedProduct1);
// execute - try to retrieve the 'equivalent' product object
Product retrievedProduct2 = ProductRepo.Retrieve(testProduct2);
A simplified version of Retrieve (cruft removed is just parameter checks etc):
using (var dbContext = new ProductDataContext()) {
Product retrievedProduct = dbContext.Products
.Where(p => p.Equals(product)).SingleOrDefault();
NB: The overridden Equals method knows not to care about the auto-generated fields from the database and only looks at the string that is represented in the service layer.
Here's what I observed:
Retrieve on testProduct1 succeeds (no surprise, equal by reference)
Retrieve on testProduct2 fails (null)
The overridden Equals method called in the Retrieve method is never hit during either Retrieve calls
However, the overridden Equals method is called multiple times by the context on SubmitChanges (called when creating the first test object in the database) (works as expected).
Statically, the compiler knows that the type of the objects being emitted and is able to resolve the type.
So my specific questions:
Am I trying to do something ill-advised? Seems like a straightforward use of Equals.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
I'm as much interested in understanding as making my Equals calls work. But I also would love to learn how to make this 'pattern' work rather than just understand why it appears to be an 'anti-pattern' in the contest of LtS and C#.
Please don't suggest I just filter directly on the context with Where statements. Obviously, I can remove the Equals call and do that. However, some of the other objects (not presented here) are large and a bit complicated. For the sake of maintenance and clarity, I want to keep knowledge of how to compare itself to another of its own type in one place and ideally as part of the object in question.
Some other things I tried that didn't change the behavior:
Overloaded and used == instead
Casting the lambda variable to the type p => (Product)p
Getting an IQueryable object first and calling Equals in the Where clause
Some other things I tried that didn't work:
Creating a static ProductEquals(Product first, Product second) method: System.NotSupportedException:has no supported translation to SQL.
Thanks StackOverflow contributors!
Re Possible dups: I've read ~10 other questions. I'd love a pointer to an exact duplicate but most don't seem to directly address what seems to be an oddity of LinqToSql.
Am I trying to do something ill-advised?
Absolutely. Consider what LINQ to SQL does: it creates a SQL representation of your query. It doesn't know what your overridden Equals method does, so can't translate that logic into SQL.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
You'd need to do something with expression trees to represent the equality that way - and then build those expression trees up into a full query. It won't be fun, but it should be possible. It will affect how you build all your queries though.
I would have expected most database representations to be ID-based though, so you should be able to just compare IDs for equality. Usually when I've seen attempts to really model data in an OO fashion but store it in a database, the leakiness of the abstraction has caused a lot of pain.
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
Presumably SubmitChanges is working against a set of in-memory objects to work out what's changed - it doesn't have to do any conversion to SQL to do that part.

Consuming anonymous type in an Expression

The Entity Framework has a function with this signature:
public EntityTypeConfiguration<TEntityType> HasKey<TKey>(Expression<Func<TEntityType, TKey>> keyExpression);
If your table has a clustered primary key you can represent that like this:
this.HasKey(t => new { t.Field1, t.Field2 });
My question is, how are they consuming this anonymous type? I would like to build some similar functionality in methods of my own, that allow a lambda expression that returns multiple properties.
Is there some special way to peek into an anonymous type?
They just use reflection.
For additional performance, you can use expression trees to store pre-compiled delegates in a generic type.

How to detect if a LINQ enumeration is materialized?

Is there some way of detecting whether an enumerable built using LINQ (to Objects in this case) have been materialized or not? Other than trying to inspect the type of the underlying collection?
Specifically, since enumerable.ToArray() will build a new array even if the underlying collection already is an array I'm looking for a way of avoiding ToArray() being called twice on the same collection.
The enumerable won't have an "underlying collection", it will be a collection. Try casting it to one and use the resulting reference:
var coll = enumerable as ICollection<T>;
if (coll != null) {
// We have a collection!
}
Checking "is IEnumerable" can do this, there isn't another way. You could, however, not use the "var" declaration for the return type - because it is "hiding" from you the type. If you declare an explicit IEnumerable, then the compiler will let you know if that is what is being returned.

How Does Queryable.OfType Work?

Important The question is not "What does Queryable.OfType do, it's "how does the code I see there accomplish that?"
Reflecting on Queryable.OfType, I see (after some cleanup):
public static IQueryable<TResult> OfType<TResult>(this IQueryable source)
{
return (IQueryable<TResult>)source.Provider.CreateQuery(
Expression.Call(
null,
((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(
new Type[] { typeof(TResult) }) ,
new Expression[] { source.Expression }));
}
So let me see if I've got this straight:
Use reflection to grab a reference to the current method (OfType).
Make a new method, which is exactly the same, by using MakeGenericMethod to change the type parameter of the current method to, er, exactly the same thing.
The argument to that new method will be not source but source.Expression. Which isn't an IQueryable, but we'll be passing the whole thing to Expression.Call, so that's OK.
Call Expression.Call, passing null as method (weird?) instance and the cloned method as its arguments.
Pass that result to CreateQuery and cast the result, which seems like the sanest part of the whole thing.
Now the effect of this method is to return an expression which tells the provider to omit returning any values where the type is not equal to TResult or one of its subtypes. But I can't see how the steps above actually accomplish this. It seems to be creating an expression representing a method which returns IQueryable<TResult>, and making the body of that method simply the entire source expression, without ever looking at the type. Is it simply expected that an IQueryable provider will just silently not return any records not of the selected type?
So are the steps above incorrect in some way, or am I just not seeing how they result in the behavior observed at runtime?
It's not passing in null as the method - it's passing it in as the "target expression", i.e. what it's calling the method on. This is null because OfType is a static method, so it doesn't need a target.
The point of calling MakeGenericMethod is that GetCurrentMethod() returns the open version, i.e. OfType<> instead of OfType<YourType>.
Queryable.OfType itself isn't meant to contain any of the logic for omitting returning any values. That's up to the LINQ provider. The point of Queryable.OfType is to build up the expression tree to include the call to OfType, so that when the LINQ provider eventually has to convert it into its native format (e.g. SQL) it knows that OfType was called.
This is how Queryable works in general - basically it lets the provider see the whole query expression as an expression tree. That's all it's meant to do - when the provider is asked to translate this into real code, that's where the magic happens.
Queryable couldn't possibly do the work itself - it has no idea what sort of data store the provider represents. How could it come up with the semantics of OfType without knowing whether the data store was SQL, LDAP or something else? I agree it takes a while to get your head round though :)

Resources