How to detect if a LINQ enumeration is materialized?

How to detect if a LINQ enumeration is materialized? - linq

Is there some way of detecting whether an enumerable built using LINQ (to Objects in this case) have been materialized or not? Other than trying to inspect the type of the underlying collection?
Specifically, since enumerable.ToArray() will build a new array even if the underlying collection already is an array I'm looking for a way of avoiding ToArray() being called twice on the same collection.

The enumerable won't have an "underlying collection", it will be a collection. Try casting it to one and use the resulting reference:
var coll = enumerable as ICollection<T>;
if (coll != null) {
// We have a collection!
}

Checking "is IEnumerable" can do this, there isn't another way. You could, however, not use the "var" declaration for the return type - because it is "hiding" from you the type. If you declare an explicit IEnumerable, then the compiler will let you know if that is what is being returned.

Related

IEnumerable<T>.Count() vs List<T>.Count with Entity Framework

I am retrieving a list of items using Entity Framework and if there are some items retrieved I do something with them.
var items = db.MyTable.Where(t => t.Expiration < DateTime.Now).ToList();
if(items.Count != 0)
{
// Do something...
}
The if statement could also be written as
if(items.Count() != 0)
{
// Do something...
}
In the first case, the .Count is a List<T>.Count property. In the second case, the .Count() is IEnumerable<T>.Count() extension method.
Although both approaches achieve the same result, however, is one more preferred than the other? (Possibly some difference in performance?)

Enumerable.Count<T> (the extension method for IEnumerable<T>) just calls Count if the underlying type is an ICollection<T>, so for List<T> there is no difference.
Queryable.Count<T> (the extension method for IQueryable<T>) will use the underlying query provider, which in many cases will push the count down to the actual SQL, which will perform faster than counting the objects in memory.
If a filter is applied (e.g. Count(i => i.Name = "John")) or if the underlying type is not an ICollection<T>, the collection is enumerated to compute the count.
is one more preferred than the other?
I generally prefer to use Count() since 1) it's more portable (the underlying type can be anything that implements IEnumerable<T> or IQueryable<T>) and 2) it's easier to add a filter later if necessary.
As Tim states in his comment, I also prefer using Any() to Count() > 0 since it doesn't have to actually count the items - it will just check for the existence of one item. Conversely I use !Any() instead of Count() == 0.

It depends on the underlying collection and where Linq will be pulling from. For example if it's SQL then using .ToList() will cause the query to pull back the entire list, and then count it. However, the .Count() extension method will translate it into a SQL COUNT statement on the database side. In which case there will be an obvious performance difference.
For just a standard List or Collection it's as stated in D. Stanley's answer.

I would say that it depends on what's going on inside the if block. If you're simply doing the check to determine whether to perform a sequence of operations on the underlying enumeration, then it's probably not needed in any event. Simply iterate over the enumeration (omitting ToList as well). If you're not using the collection inside the if block, then you should avoid using ToList and definitely use Any over any Count/Count() method.
Once you've performed the ToList then you're no longer using Entity Framework and I expect that Count() is only marginally slower than Count since, if the underlying collection is ICollection<T> it defers to that implementation. The only overhead would be determining whether it implements that interface.
http://msdn.microsoft.com/en-us/library/bb338038.aspx
Remarks:
If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

LINQ GroupBy Anonymous Type

I am wondering why GroupBy works with anonymous types.
List<string> values = new List<string>();
values.GroupBy(s => new { Length = s.Length, Value = s })
Anonymous types do not implement any interfaces, so I am confused how this is working.
I assume that the algorithm is working by creating an instance of the anonymous type for each item in the source and using hashing to group the items together. However, no IEqualityComparer is provided to define how to generate a hash or whether two instances are equal. I would assume, then, that the Object.Equals and Object.GetHashCode methods would be the fallback, which rely on object identity.
So, how is it that this is working as expected? And yet it doesn't work in an OrderBy. Do anonymous types override Equals and GetHashCode? or does the underlying GroupBy algorithm do some magic I haven't thought of?

As per the documentation, an anonymous type is a reference type:
From the perspective of the common language runtime, an anonymous type is no different from any other reference type.
Therefore, it will be using the default implementation for those functions as implemented by System.Object (which at least for equality is based on referential equality).
EDIT: Actually, as per that same first doco link it says:
Because the Equals and GetHashCode methods on anonymous types are defined in terms of the Equals and GetHashcode methods of the properties, two instances of the same anonymous type are equal only if all their properties are equal.

http://msdn.microsoft.com/en-us/library/bb397696.aspx
This link explains that GetHashCode and Equals are overridden.

It doesn't work on OrderBy because the new object does not implement IComparable.

Do collection initializers relate to LINQ in anyway like object initializers?

I am reading about LINQ and seeing Collection Initialzers come up, but does it really directly relate to LINQ like Object Initializers do?
List<string> stringsNew = new List<string> { "string 1", "string 2" };

Collection initializers can be said to be related to Linq in so far as you can thank Linq that the feature made it into C# 3. In case something ever happens to the linked question, the useful text is
The by-design goal motivated by typical usage scenarios for collection
initializers was to make initialization of existing collection types
possible in an expression syntax so that collection initializers could
be embedded in query comprehensions or converted to expression trees.
Every other scenario was lower priority; the feature exists at all
because it helps make LINQ work.
-Eric Lippert
They are useful in the context of Linq because they give you the ability to write stuff like the below example
var query = from foo in foos
select new Bar
{
ValueList = new List<string> { foo.A, foo.B, foo.C }
};
Where you can construct a query that projects a given foo into a Bar with a list of foo's property values. Without such initializer support, you couldn't create such a query (although you could turn ValueList into a static-length array and achieve something similar).
But, like object initializers and other features that are new with C# 3+, many features inspired or added expressly to make Linq work are no doubt useful in code that has nothing at all to do with Linq, and they do not require Linq to work (either via a using directive or DLL reference). Initializers are ultimately nothing more than syntactic sugar that the compiler will turn into the longer code you would have had to write yourself in earlier language versions.

Object and Collection Initializers have nothing to do with LINQ - they're a completely unrelated language feature.
Any class with an Add method that implements IEnumerable can use a collection initializer. The items in the braces will be added, one at a time, instead of having to repeatedly call inst.Add(item1).

How Does Queryable.OfType Work?

Important The question is not "What does Queryable.OfType do, it's "how does the code I see there accomplish that?"
Reflecting on Queryable.OfType, I see (after some cleanup):
public static IQueryable<TResult> OfType<TResult>(this IQueryable source)
{
return (IQueryable<TResult>)source.Provider.CreateQuery(
Expression.Call(
null,
((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(
new Type[] { typeof(TResult) }) ,
new Expression[] { source.Expression }));
}
So let me see if I've got this straight:
Use reflection to grab a reference to the current method (OfType).
Make a new method, which is exactly the same, by using MakeGenericMethod to change the type parameter of the current method to, er, exactly the same thing.
The argument to that new method will be not source but source.Expression. Which isn't an IQueryable, but we'll be passing the whole thing to Expression.Call, so that's OK.
Call Expression.Call, passing null as method (weird?) instance and the cloned method as its arguments.
Pass that result to CreateQuery and cast the result, which seems like the sanest part of the whole thing.
Now the effect of this method is to return an expression which tells the provider to omit returning any values where the type is not equal to TResult or one of its subtypes. But I can't see how the steps above actually accomplish this. It seems to be creating an expression representing a method which returns IQueryable<TResult>, and making the body of that method simply the entire source expression, without ever looking at the type. Is it simply expected that an IQueryable provider will just silently not return any records not of the selected type?
So are the steps above incorrect in some way, or am I just not seeing how they result in the behavior observed at runtime?

It's not passing in null as the method - it's passing it in as the "target expression", i.e. what it's calling the method on. This is null because OfType is a static method, so it doesn't need a target.
The point of calling MakeGenericMethod is that GetCurrentMethod() returns the open version, i.e. OfType<> instead of OfType<YourType>.
Queryable.OfType itself isn't meant to contain any of the logic for omitting returning any values. That's up to the LINQ provider. The point of Queryable.OfType is to build up the expression tree to include the call to OfType, so that when the LINQ provider eventually has to convert it into its native format (e.g. SQL) it knows that OfType was called.
This is how Queryable works in general - basically it lets the provider see the whole query expression as an expression tree. That's all it's meant to do - when the provider is asked to translate this into real code, that's where the magic happens.
Queryable couldn't possibly do the work itself - it has no idea what sort of data store the provider represents. How could it come up with the semantics of OfType without knowing whether the data store was SQL, LDAP or something else? I agree it takes a while to get your head round though :)

Executing a certain action for all elements in an Enumerable<T>

I have an Enumerable<T> and am looking for a method that allows me to execute an action for each element, kind of like Select but then for side-effects. Something like:
string[] Names = ...;
Names.each(s => Console.Writeline(s));
or
Names.each(s => GenHTMLOutput(s));
// (where GenHTMLOutput cannot for some reason receive the enumerable itself as a parameter)
I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.

A quick-and-easy way to get this is:
Names.ToList().ForEach(e => ...);

You are looking for the ever-elusive ForEach that currently only exists on the List generic collection. There are many discussions online about whether Microsoft should or should not add this as a LINQ method. Currently, you have to roll your own:
public static void ForEach<T>(this IEnumerable<T> value, Action<T> action)
{
foreach (T item in value)
{
action(item);
}
}
While the All() method provides similar abilities, it's use-case is for performing a predicate test on every item rather than an action. Of course, it can be persuaded to perform other tasks but this somewhat changes the semantics and would make it harder for others to interpret your code (i.e. is this use of All() for a predicate test or an action?).

Disclaimer: This post no longer resembles my original answer, but rather incorporates the some seven years experience I've gained since. I made the edit because this is a highly-viewed question and none of the existing answers really covered all the angles. If you want to see my original answer, it's available in the revision history for this post.
The first thing to understand here is C# linq operations like Select(), All(), Where(), etc, have their roots in functional programming. The idea was to bring some of the more useful and approachable parts of functional programming to the .Net world. This is important, because a key tenet of functional programming is for operations to be free of side effects. It's hard to understate this. However, in the case of ForEach()/each(), side effects are the entire purpose of the operation. Adding each() or ForEach() is not just outside the functional programming scope of the other linq operators, but in direct opposition to them.
But I understand this feels unsatisfying. It may help explain why ForEach() was omitted from the framework, but fails to address the real issue at hand. You have a real problem you need to solve. Why should all this ivory tower philosophy get in the way of something that might be genuinely useful?
Eric Lippert, who was on the C# design team at the time, can help us out here. He recommends using a traditional foreach loop:
[ForEach()] adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach(foo=>{ statement involving foo; });
His point is, when you look closely at your syntax options, you don't gain anything new from a ForEach() extension versus a traditional foreach loop. I partially disagree. Imagine you have this:
foreach(var item in Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate))
{
// now do something
}
This code obfuscates meaning, because it separates the foreach keyword from the operations in the loop. It also lists the loop command prior to the operations that define the sequence on which the loop operates. It feels much more natural to want to have those operations come first, and then have the the loop command at the end of the query definition. Also, the code is just ugly. It seems like it would be much nicer to be able to write this:
Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate)
.ForEach(item =>
{
// now do something
});
However, even here, I eventually came around to Eric's point of view. I realized code like you see above is calling out for an additional variable. If you have a complicated set of LINQ expressions like that, you can add valuable information to your code by first assigning the result of the LINQ expression to a new variable:
var queryForSomeThing = Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expressions(to => evaluate);
foreach(var item in queryForSomeThing)
{
// now do something
}
This code feels more natural. It puts the foreach keyword back next to the rest of the loop, and after the query definition. Most of all, the variable name can add new information that will be helpful to future programmers trying to understand the purpose of the LINQ query. Again, we see the desired ForEach() operator really added no new expressive power to the language.
However, we are still missing two features of a hypothetical ForEach() extension method:
It's not composable. I can't add a further .Where() or GroupBy() or OrderBy() after a foreach loop inline with the rest of the code, without creating a new statement.
It's not lazy. These operations happen immediately. It doesn't allow me to, say, have a form where a user chooses an operation as one field in a larger screen that is not acted on until the user presses a command button. This form might allow the user to change their mind before executing the command. This is perfectly normal (easy even) with a LINQ query, but not as simple with a foreach.
(FWIW, most naive .ForEach() implementations also have these issues. But it's possible to craft one without them.)
You could, of course, make your own ForEach() extension method. Several other answers have implementations of this method already; it's not all that complicated. However, I feel like it's unnecessary. There's already an existing method that fits what we want to do from both semantic and operational standpoints. Both of the missing features above can be addressed by use of the existing Select() operation.
Select() fits the kind of transformation or projection described by both of the examples above. Keep in mind, though, that I would still avoid creating side effects. The call to Select() should return either new objects or projections from the originals. This can sometimes be aided through the use of an anonymous type or dynamic object (if and only if necessary). If you need the results to persist in, say, an original list variable, you can always call .ToList() and assign it back to your original variable. I'll add here that I prefer working with IEnumerable<T> variables as much as possible over more concrete types.
myList = myList.Select(item => new SomeType(item.value1, item.value2 *4)).ToList();
In summary:
Just stick with foreach most of the time.
When foreach really won't do (which probably isn't as often as you think), use Select()
When you need to use Select(), you can still generally avoid (program-visible) side effects, possibly by projecting to an anonymous type.
Avoid the crutch of calling ToList(). You don't need it as much as you might think, and it can have significant negative consequence for performance and memory use.

Unfortunately there is no built-in way to do this in the current version of LINQ. The framework team neglected to add a .ForEach extension method. There's a good discussion about this going on right now on the following blog.
http://blogs.msdn.com/kirillosenkov/archive/2009/01/31/foreach.aspx
It's rather easy to add one though.
public static void ForEach<T>(this IEnumerable<T> enumerable, Action<T> action) {
foreach ( var cur in enumerable ) {
action(cur);
}
}

You cannot do this right away with LINQ and IEnumerable - you need to either implement your own extension method, or cast your enumeration to an array with LINQ and then call Array.ForEach():
Array.ForEach(MyCollection.ToArray(), x => x.YourMethod());
Please note that because of the way value types and structs work, if the collection is of a value type and you modify the elements of the collection this way, it will have no effect on the elements of the original collection.

Because LINQ is designed to be a query feature and not an update feature you will not find an extension which executes methods on IEnumerable<T> because that would allow you to execute a method (potentially with side effects). In this case you may as well just stick with
foreach(string name in Names)
Console.WriteLine(name);

Using Parallel Linq:
Names.AsParallel().ForAll(name => ...)

Well, you can also use the standard foreach keyword, just format it into a oneliner:
foreach(var n in Names.Where(blahblah)) DoStuff(n);
Sorry, thought this option deserves to be here :)

There is a ForEach method off of List. You could convert the Enumerable to List by calling the .ToList() method, and then call the ForEach method off of that.
Alternatively, I've heard of people defining their own ForEach method off of IEnumerable. This can be accomplished by essentially calling the ForEach method, but instead wrapping it in an extension method:
public static class IEnumerableExtensions
{
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> _this, Action<T> del)
{
List<T> list = _this.ToList();
list.ForEach(del);
return list;
}
}

As mentioned before ForEach extension will do the fix.
My tip for the current question is how to execute the iterator
[I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.]
Check this
_= Names.Select(s=> { Console.WriteLine(s); return 0; }).Count();
Try it!

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to detect if a LINQ enumeration is materialized? - linq

The enumerable won't have an "underlying collection", it will be a collection. Try casting it to one and use the resulting reference: var coll = enumerable as ICollection<T>; if (coll != null) { // We have a collection! }

Checking "is IEnumerable" can do this, there isn't another way. You could, however, not use the "var" declaration for the return type - because it is "hiding" from you the type. If you declare an explicit IEnumerable, then the compiler will let you know if that is what is being returned.

Related

IEnumerable<T>.Count() vs List<T>.Count with Entity Framework

LINQ GroupBy Anonymous Type

Do collection initializers relate to LINQ in anyway like object initializers?

How Does Queryable.OfType Work?

Executing a certain action for all elements in an Enumerable<T>

Categories

Resources