IEnumerable<T>.Count() vs List<T>.Count with Entity Framework - linq

I am retrieving a list of items using Entity Framework and if there are some items retrieved I do something with them.
var items = db.MyTable.Where(t => t.Expiration < DateTime.Now).ToList();
if(items.Count != 0)
{
// Do something...
}
The if statement could also be written as
if(items.Count() != 0)
{
// Do something...
}
In the first case, the .Count is a List<T>.Count property. In the second case, the .Count() is IEnumerable<T>.Count() extension method.
Although both approaches achieve the same result, however, is one more preferred than the other? (Possibly some difference in performance?)

Enumerable.Count<T> (the extension method for IEnumerable<T>) just calls Count if the underlying type is an ICollection<T>, so for List<T> there is no difference.
Queryable.Count<T> (the extension method for IQueryable<T>) will use the underlying query provider, which in many cases will push the count down to the actual SQL, which will perform faster than counting the objects in memory.
If a filter is applied (e.g. Count(i => i.Name = "John")) or if the underlying type is not an ICollection<T>, the collection is enumerated to compute the count.
is one more preferred than the other?
I generally prefer to use Count() since 1) it's more portable (the underlying type can be anything that implements IEnumerable<T> or IQueryable<T>) and 2) it's easier to add a filter later if necessary.
As Tim states in his comment, I also prefer using Any() to Count() > 0 since it doesn't have to actually count the items - it will just check for the existence of one item. Conversely I use !Any() instead of Count() == 0.

It depends on the underlying collection and where Linq will be pulling from. For example if it's SQL then using .ToList() will cause the query to pull back the entire list, and then count it. However, the .Count() extension method will translate it into a SQL COUNT statement on the database side. In which case there will be an obvious performance difference.
For just a standard List or Collection it's as stated in D. Stanley's answer.

I would say that it depends on what's going on inside the if block. If you're simply doing the check to determine whether to perform a sequence of operations on the underlying enumeration, then it's probably not needed in any event. Simply iterate over the enumeration (omitting ToList as well). If you're not using the collection inside the if block, then you should avoid using ToList and definitely use Any over any Count/Count() method.
Once you've performed the ToList then you're no longer using Entity Framework and I expect that Count() is only marginally slower than Count since, if the underlying collection is ICollection<T> it defers to that implementation. The only overhead would be determining whether it implements that interface.
http://msdn.microsoft.com/en-us/library/bb338038.aspx
Remarks:
If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

Related

What is the benefit of using IQueryable and LINQ queries?

I have a project where was realized own configuration classes:
IconSizesConfigSection: ConfigurationSection
IconSizesCollection: ConfigurationElementCollection
IconSize: ConfigurationElement
In Config class exists this property:
public IQueryable<IconSize> IconSizes
{
get
{
IconSizesConfigSection configInfo = (IconSizesConfigSection)ConfigurationManager.GetSection("iconConfig");
return configInfo.IconSizes.OfType<IconSize>().AsQueryable<IconSize>();
}
}
IconSizes property returns IconSizesCollection which derives from ConfigurationElementCollection. In turn ConfigurationElementCollection derives from ICollection, IEnumerable.
In some another class I have such code:
var previewIconSize = Config.IconSizes.FirstOrDefault(c => c.Name == "AvatarSize");
Why in such case uses Deffered Execution?
Why initially it uses AsQueryable<IconSize>() for collection and then uses LINQ and Deffered Execution?
Is there any benefits compared with using simple List?
In these case, there is no practical benefit. Using IQueryable is helpful for cases when query rewriting/translation will optimize performance. You will actually incur decreased performance in the provided example.
One example of using IQueryable in a helpful way is the significant performance increase gained when lazily translating and evaluating queries against a database or web service. This will perform significantly better than the alternative of pulling massive result sets and applying query logic in active memory with a "simple List".
The way you can tell that using the IQueryable in your case is detrimental is that the collection is already loaded into memory, when you begin the query.
Both IEnumerable and IQueryable use deferred execution. The difference is that IQueryable is used to cross boundaries like database queries, entity framework queries or OData queries.
When an IQueryable is iterated over, the query is translated to the remote provider's idiom and executed there. When the response is received from the remote provider, it is translated to a local object representation.
Deferred Execution is good because your user may never use the result set and hence there would have been no point querying the data source.
There may be some LINQ methods your user can't use unless they cast the result to IQueryable which means you might restrict what they can do, or force them to cast/copy the list into something more useful.
If you use a List, then you're hard coding your solution to a List, do you care what the implementation of the collection is, does your user ... probably not as long as it supports the necessary interfaces.

Linq To Sql Where does not call overridden Equals

I'm currently working on a project where I'm going to do a lot of comparison of similar non-database (service layer objects for this discuss) objects and objects retrieved from the database via LinqToSql. For the sake of this discussion, assume I have a service layer Product object with a string field that is represented in the database. However, in the database, there is also a primary key Id that is not represented in the service layer.
Accordingly (as I often do for unit testing etc), I overrode Equals(Object), Equals(Product), and GetHashCode and implemented IEquatable with the expectation that I would be able to write code like this:
myContext.Products.Where(p => p.Equals(passedInProduct).SingleOrDefault();
And so forth.
The Equals override is tested and works. The objects are mutable so the usual caveats apply to the GetHashCode override. However, for the purposes of this example, the objects are not modified except by LtS and could be made readonly.
Here's the simple test:
Create a test object in memory and commit to the LtS context. By committing, the test object is populated with a few auto-generated fields.
Create another identical test object in memory (separate reference)
Attempt to retrieve the first object from the database using the second object as the criteria. (see code line above).
// Setup
string productDesc = "12A";
Product testProduct1 = _CreateTestProductInDatabase(productDesc);
Product testProduct2 = _CreateTestProduct(productDesc);
// check setup
Product retrievedProduct1 = ProductRepo.Retrieve(testProduct1);
//Assert.IsNotNull(retrievedProduct1);
// execute - try to retrieve the 'equivalent' product object
Product retrievedProduct2 = ProductRepo.Retrieve(testProduct2);
A simplified version of Retrieve (cruft removed is just parameter checks etc):
using (var dbContext = new ProductDataContext()) {
Product retrievedProduct = dbContext.Products
.Where(p => p.Equals(product)).SingleOrDefault();
NB: The overridden Equals method knows not to care about the auto-generated fields from the database and only looks at the string that is represented in the service layer.
Here's what I observed:
Retrieve on testProduct1 succeeds (no surprise, equal by reference)
Retrieve on testProduct2 fails (null)
The overridden Equals method called in the Retrieve method is never hit during either Retrieve calls
However, the overridden Equals method is called multiple times by the context on SubmitChanges (called when creating the first test object in the database) (works as expected).
Statically, the compiler knows that the type of the objects being emitted and is able to resolve the type.
So my specific questions:
Am I trying to do something ill-advised? Seems like a straightforward use of Equals.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
I'm as much interested in understanding as making my Equals calls work. But I also would love to learn how to make this 'pattern' work rather than just understand why it appears to be an 'anti-pattern' in the contest of LtS and C#.
Please don't suggest I just filter directly on the context with Where statements. Obviously, I can remove the Equals call and do that. However, some of the other objects (not presented here) are large and a bit complicated. For the sake of maintenance and clarity, I want to keep knowledge of how to compare itself to another of its own type in one place and ideally as part of the object in question.
Some other things I tried that didn't change the behavior:
Overloaded and used == instead
Casting the lambda variable to the type p => (Product)p
Getting an IQueryable object first and calling Equals in the Where clause
Some other things I tried that didn't work:
Creating a static ProductEquals(Product first, Product second) method: System.NotSupportedException:has no supported translation to SQL.
Thanks StackOverflow contributors!
Re Possible dups: I've read ~10 other questions. I'd love a pointer to an exact duplicate but most don't seem to directly address what seems to be an oddity of LinqToSql.
Am I trying to do something ill-advised?
Absolutely. Consider what LINQ to SQL does: it creates a SQL representation of your query. It doesn't know what your overridden Equals method does, so can't translate that logic into SQL.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
You'd need to do something with expression trees to represent the equality that way - and then build those expression trees up into a full query. It won't be fun, but it should be possible. It will affect how you build all your queries though.
I would have expected most database representations to be ID-based though, so you should be able to just compare IDs for equality. Usually when I've seen attempts to really model data in an OO fashion but store it in a database, the leakiness of the abstraction has caused a lot of pain.
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
Presumably SubmitChanges is working against a set of in-memory objects to work out what's changed - it doesn't have to do any conversion to SQL to do that part.

Can it be advantageous for a method to return IOrderedEnumerable<T> instead of IEnumerable<T>?

Can it be advantageous for a method to return IOrderedEnumerable instead of IEnumerable?
Only if you expect people to order that enumerable every time and would find it hard to figure out how to do this OR if you can provide a collection that implements that interface that can efficiently order its contents and is paired with an extension method that is aware of your collection.
Best option is to return a specific collection type (see Richter for details on that). 99 times out of 100 whoever gets even a simple enumerable can use the standard linq extension methods to order it if they want.
Specifically, it would be worth doing if it makes sense for further .ThenBy calls, and not otherwise.

Executing a certain action for all elements in an Enumerable<T>

I have an Enumerable<T> and am looking for a method that allows me to execute an action for each element, kind of like Select but then for side-effects. Something like:
string[] Names = ...;
Names.each(s => Console.Writeline(s));
or
Names.each(s => GenHTMLOutput(s));
// (where GenHTMLOutput cannot for some reason receive the enumerable itself as a parameter)
I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.
A quick-and-easy way to get this is:
Names.ToList().ForEach(e => ...);
You are looking for the ever-elusive ForEach that currently only exists on the List generic collection. There are many discussions online about whether Microsoft should or should not add this as a LINQ method. Currently, you have to roll your own:
public static void ForEach<T>(this IEnumerable<T> value, Action<T> action)
{
foreach (T item in value)
{
action(item);
}
}
While the All() method provides similar abilities, it's use-case is for performing a predicate test on every item rather than an action. Of course, it can be persuaded to perform other tasks but this somewhat changes the semantics and would make it harder for others to interpret your code (i.e. is this use of All() for a predicate test or an action?).
Disclaimer: This post no longer resembles my original answer, but rather incorporates the some seven years experience I've gained since. I made the edit because this is a highly-viewed question and none of the existing answers really covered all the angles. If you want to see my original answer, it's available in the revision history for this post.
The first thing to understand here is C# linq operations like Select(), All(), Where(), etc, have their roots in functional programming. The idea was to bring some of the more useful and approachable parts of functional programming to the .Net world. This is important, because a key tenet of functional programming is for operations to be free of side effects. It's hard to understate this. However, in the case of ForEach()/each(), side effects are the entire purpose of the operation. Adding each() or ForEach() is not just outside the functional programming scope of the other linq operators, but in direct opposition to them.
But I understand this feels unsatisfying. It may help explain why ForEach() was omitted from the framework, but fails to address the real issue at hand. You have a real problem you need to solve. Why should all this ivory tower philosophy get in the way of something that might be genuinely useful?
Eric Lippert, who was on the C# design team at the time, can help us out here. He recommends using a traditional foreach loop:
[ForEach()] adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach(foo=>{ statement involving foo; });
His point is, when you look closely at your syntax options, you don't gain anything new from a ForEach() extension versus a traditional foreach loop. I partially disagree. Imagine you have this:
foreach(var item in Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate))
{
// now do something
}
This code obfuscates meaning, because it separates the foreach keyword from the operations in the loop. It also lists the loop command prior to the operations that define the sequence on which the loop operates. It feels much more natural to want to have those operations come first, and then have the the loop command at the end of the query definition. Also, the code is just ugly. It seems like it would be much nicer to be able to write this:
Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate)
.ForEach(item =>
{
// now do something
});
However, even here, I eventually came around to Eric's point of view. I realized code like you see above is calling out for an additional variable. If you have a complicated set of LINQ expressions like that, you can add valuable information to your code by first assigning the result of the LINQ expression to a new variable:
var queryForSomeThing = Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expressions(to => evaluate);
foreach(var item in queryForSomeThing)
{
// now do something
}
This code feels more natural. It puts the foreach keyword back next to the rest of the loop, and after the query definition. Most of all, the variable name can add new information that will be helpful to future programmers trying to understand the purpose of the LINQ query. Again, we see the desired ForEach() operator really added no new expressive power to the language.
However, we are still missing two features of a hypothetical ForEach() extension method:
It's not composable. I can't add a further .Where() or GroupBy() or OrderBy() after a foreach loop inline with the rest of the code, without creating a new statement.
It's not lazy. These operations happen immediately. It doesn't allow me to, say, have a form where a user chooses an operation as one field in a larger screen that is not acted on until the user presses a command button. This form might allow the user to change their mind before executing the command. This is perfectly normal (easy even) with a LINQ query, but not as simple with a foreach.
(FWIW, most naive .ForEach() implementations also have these issues. But it's possible to craft one without them.)
You could, of course, make your own ForEach() extension method. Several other answers have implementations of this method already; it's not all that complicated. However, I feel like it's unnecessary. There's already an existing method that fits what we want to do from both semantic and operational standpoints. Both of the missing features above can be addressed by use of the existing Select() operation.
Select() fits the kind of transformation or projection described by both of the examples above. Keep in mind, though, that I would still avoid creating side effects. The call to Select() should return either new objects or projections from the originals. This can sometimes be aided through the use of an anonymous type or dynamic object (if and only if necessary). If you need the results to persist in, say, an original list variable, you can always call .ToList() and assign it back to your original variable. I'll add here that I prefer working with IEnumerable<T> variables as much as possible over more concrete types.
myList = myList.Select(item => new SomeType(item.value1, item.value2 *4)).ToList();
In summary:
Just stick with foreach most of the time.
When foreach really won't do (which probably isn't as often as you think), use Select()
When you need to use Select(), you can still generally avoid (program-visible) side effects, possibly by projecting to an anonymous type.
Avoid the crutch of calling ToList(). You don't need it as much as you might think, and it can have significant negative consequence for performance and memory use.
Unfortunately there is no built-in way to do this in the current version of LINQ. The framework team neglected to add a .ForEach extension method. There's a good discussion about this going on right now on the following blog.
http://blogs.msdn.com/kirillosenkov/archive/2009/01/31/foreach.aspx
It's rather easy to add one though.
public static void ForEach<T>(this IEnumerable<T> enumerable, Action<T> action) {
foreach ( var cur in enumerable ) {
action(cur);
}
}
You cannot do this right away with LINQ and IEnumerable - you need to either implement your own extension method, or cast your enumeration to an array with LINQ and then call Array.ForEach():
Array.ForEach(MyCollection.ToArray(), x => x.YourMethod());
Please note that because of the way value types and structs work, if the collection is of a value type and you modify the elements of the collection this way, it will have no effect on the elements of the original collection.
Because LINQ is designed to be a query feature and not an update feature you will not find an extension which executes methods on IEnumerable<T> because that would allow you to execute a method (potentially with side effects). In this case you may as well just stick with
foreach(string name in Names)
Console.WriteLine(name);
Using Parallel Linq:
Names.AsParallel().ForAll(name => ...)
Well, you can also use the standard foreach keyword, just format it into a oneliner:
foreach(var n in Names.Where(blahblah)) DoStuff(n);
Sorry, thought this option deserves to be here :)
There is a ForEach method off of List. You could convert the Enumerable to List by calling the .ToList() method, and then call the ForEach method off of that.
Alternatively, I've heard of people defining their own ForEach method off of IEnumerable. This can be accomplished by essentially calling the ForEach method, but instead wrapping it in an extension method:
public static class IEnumerableExtensions
{
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> _this, Action<T> del)
{
List<T> list = _this.ToList();
list.ForEach(del);
return list;
}
}
As mentioned before ForEach extension will do the fix.
My tip for the current question is how to execute the iterator
[I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.]
Check this
_= Names.Select(s=> { Console.WriteLine(s); return 0; }).Count();
Try it!

Do you ToList()?

Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as it's default (which is not IEnumerable, but rather IQueryable). This way, if you want to filter even more based on some criteria, the filtering is STILL on the SQL side until evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsEnumerable();
}
// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
var clients = GetClients().Where(client=> client.Owner == currentUser);
return clients.ToArray();
}
Do you see what's happening there? The "GetClients" method is going to force a download of ALL 'clients' from the database... THEN the Where clause will happen in the GetClientsForLoogedInUser method to filter it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends if you need to modify the collection. I like to use an Array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But, usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<> ?!?!?! Lowest common denominator is the best solution (especially when you see Timothy's answer).

Resources