Optimize IEnumerable to HashSet conversion in LINQ - performance

public HashSet<Student> GetStudents(int studentId)
{
IEnumerable<Student> studentTypes = this.studentTypes.Where(x => x.studentID == studentId);
if (studentTypes.FirstOrDefault() != null)
{
//return new HashSet<Student>(studentTypes);
return studentTypes.ToHashSet();
}
else
{
return new HashSet<Student>();
}
}
public static class LinqUtilities
{
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> enumerable)
{
HashSet<T> hashSet = new HashSet<T>();
foreach (var en in enumerable)
{
hashSet.Add(en);
}
return hashSet;
}
}
This function is called a lot of times, say 1000 times, and there are 5000 students in the result set.
How can I optimize this function? I know that the conversion from IEnumerable to HashSet is causing a lot of overhead.
ToHashSet is my extension method.
This function is too slow and is eating a lot of time.

First, you don't need to enumerate the values yourself in your utilities function.
You could improve efficiency by using the nice static extension class written by @Jon in
Converting linq result to hashset
and I think you don't need the FirstOrDefault check, since the extension returns a new (possibly empty) HashSet<T> either way,
so you could change it to a cleaner, tidier form:
IEnumerable<Student> studentTypes = this.studentTypes.Where(x => x.studentID == studentId);
return studentTypes.ToHashSet();
The other option is to pass your IEnumerable into the HashSet constructor, like
HashSet<Student> studentTypes = new HashSet<Student>(this.studentTypes.Where(x => x.studentID == studentId));
so you only have one statement in your GetStudents function.
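For example, a minimal sketch of that single-statement version (this.studentTypes is the same field used in the question):
public HashSet<Student> GetStudents(int studentId)
{
    // one pass over the data; an empty match simply yields an empty set
    return new HashSet<Student>(this.studentTypes.Where(x => x.studentID == studentId));
}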

Don't run the query twice per call.
//sets up a deferred query. This query will be "executed" when enumerated.
IEnumerable<Student> studentTypes = this.studentTypes
.Where(x => x.studentID == studentId);
//enumeration #1 (stops on first hit)
if (studentTypes.FirstOrDefault() != null)
{
//enumeration #2
return studentTypes.ToHashSet();
Your condition is unnecessary:
//sets up a deferred query. This query will be "executed" when enumerated.
IEnumerable<Student> studentTypes = this.studentTypes
.Where(x => x.studentID == studentId);
//enumeration #1
return studentTypes.ToHashSet();
I know that the conversion from IEnumerable to HashSet is causing a lot of overhead
That's bull. You've measured nothing and are misleading yourself into optimizing the wrong part of the code.
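If you want to verify where the time actually goes before changing anything, a rough Stopwatch sketch (hypothetical call site and id) is enough to start:
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    var set = repository.GetStudents(42); // hypothetical repository instance and student id
}
sw.Stop();
Console.WriteLine($"1000 calls took {sw.ElapsedMilliseconds} ms");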

Related

Linq query where there's a certain desired relationship between items in the result

A linq query Where clause can apply a func to an item in the original set and return a bool to include or not include the item based on the item's characteristics. Great stuff:
var q = myColl.Where(o => o.EffectiveDate == LastThursday);
But what if I want to find a set of items where each item is related to the last item in some way? Like:
var q = myColl.Where(o => o.EffectiveDate == thePreviousItem.ExpirationDate);
How do you make a Where (or other linq function) "jump out" of the current item?
Here's what I tried, trying to be clever. I made every item an array just so I can use the Aggregate function:
public IQueryable<T> CurrentVersions
{
get => AllVersions
.Select(vo => new T[] { vo })
.Aggregate((voa1, voa2) => voa1[0].BusinessExpirationDate.Value == voa2[0].BusinessEffectiveDate.Value ? voa1.Concat(voa2).ToArray() : voa1)
.SelectMany(vo => vo);
}
but that doesn't compile on the SelectMany:
The type arguments for method Enumerable.SelectMany<TSource,
TResult>(IEnumerable<TSource>, Func<TSource, IEnumerable<TResult>>)
cannot be inferred from the usage. Try specifying the type arguments
explicitly.
EDIT (SOLUTION)
As it turns out, I was on the right track, but was just confused about what SelectMany does. I didn't need it. I also needed to change IQueryable to IEnumerable because I'm using EF and you can't query after you let go of the DbContext. So, here is the actual solution.
public IEnumerable<T> CurrentVersions
{
get => AllVersions
.Select(vo => new T[] { vo })
.Aggregate((voa1, voa2) => voa1[0].BusinessExpirationDate.Value == voa2[0].BusinessEffectiveDate.Value ? voa1.Concat(voa2).ToArray() : voa1);
}
Linq queries are most effective when each item is processed in isolation. They don't work well when you need to relate items within the same collection using only the standard linq operators, short of processing the same collection multiple times.
The MoreLINQ library helps provide additional operators to fill in some of those gaps. I'm not sure what operators it provides that could be used in this instance, but I know it has a Pairwise() method that combines the current and previous items in the iteration.
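For illustration, a sketch of how Pairwise could be applied here (versions is a placeholder sequence, the property names follow the question, and, if I remember the argument order correctly, the selector receives the previous and the current item):
using MoreLinq;

var chained = versions
    .Pairwise((prev, cur) => new { prev, cur })
    .Where(p => p.cur.EffectiveDate == p.prev.ExpirationDate)
    .Select(p => p.cur);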
In general, for situations like this, if you need to roll your own, it is far easier to write it as a generator (an iterator method) that produces your sequence. Either as a general-purpose extension method:
public static IEnumerable<TSource> WhereWithPrevious<TSource>(
this IEnumerable<TSource> source,
Func<TSource, TSource, bool> predicate)
{
using (var iter = source.GetEnumerator())
{
if (!iter.MoveNext())
yield break;
var previous = iter.Current;
while (iter.MoveNext())
{
var current = iter.Current;
if (predicate(current, previous))
yield return current;
previous = current; // remember this item so the next iteration compares against it
}
}
}
or one specifically for the problem you're trying to solve.
public static IEnumerable<MyType> GetVersions(IEnumerable<MyType> source)
{
using (var iter = source.GetEnumerator())
{
if (!iter.MoveNext())
yield break;
var previous = iter.Current;
while (iter.MoveNext())
{
var current = iter.Current;
if (current.EffectiveDate == previous.ExpirationDate)
yield return current;
previous = current; // advance the "previous" pointer
}
}
}
An alternative approach, which is standard practice in other languages but terribly inefficient here, would be to zip the collection with itself offset by one.
var query = Collection.Skip(1).Zip(Collection, (c, p) => (current:c,previous:p))
.Where(x => x.current.EffectiveDate == x.previous.ExpirationDate)
...;
And with all of that said, using any of these options will most likely make your query incompatible with query providers. It's not something you would want expressed as a single query anyway.

Scalable Contains method for LINQ against a SQL backend

I'm looking for an elegant way to execute a Contains() statement in a scalable way. Please allow me to give some background before I come to the actual question.
The IN statement
In Entity Framework and LINQ to SQL the Contains statement is translated as a SQL IN statement. For instance, from this statement:
var ids = Enumerable.Range(1,10);
var courses = Courses.Where(c => ids.Contains(c.CourseID)).ToList();
Entity Framework will generate
SELECT
[Extent1].[CourseID] AS [CourseID],
[Extent1].[Title] AS [Title],
[Extent1].[Credits] AS [Credits],
[Extent1].[DepartmentID] AS [DepartmentID]
FROM [dbo].[Course] AS [Extent1]
WHERE [Extent1].[CourseID] IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Unfortunately, the In statement is not scalable. As per MSDN:
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632
which has to do with running out of resources or exceeding expression limits.
But before these errors occur, the IN statement becomes increasingly slow as the number of items grows. I can't find documentation about its growth rate, but it performs well up to a few thousand items; beyond that it gets dramatically slower. (Based on SQL Server experience.)
Scalable
We can't always avoid this statement. A JOIN with the source data instead would generally perform much better, but that's only possible when the source data is in the same context. Here I'm dealing with data coming from a client in a disconnected scenario. So I have been looking for a scalable solution. A satisfactory approach turned out to be cutting the operation into chunks:
var courses = ids.ToChunks(1000)
.Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
.SelectMany(x => x).ToList();
(where ToChunks is this little extension method).
This executes the query in chunks of 1000 that all perform well enough. With e.g. 5000 items, 5 queries will run that together are likely to be faster than one query with 5000 items.
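The linked ToChunks isn't reproduced here; one straightforward way to write such an extension (a sketch, not necessarily the implementation behind the link):
public static IEnumerable<List<T>> ToChunks<T>(this IEnumerable<T> source, int chunkSize)
{
    var chunk = new List<T>(chunkSize);
    foreach (var item in source)
    {
        chunk.Add(item);
        if (chunk.Count == chunkSize)
        {
            yield return chunk;
            chunk = new List<T>(chunkSize);
        }
    }
    if (chunk.Count > 0)
        yield return chunk; // emit the final, partially filled chunk
}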
But not DRY
But of course I don't want to scatter this construct all over my code. I am looking for an extension method by which any IQueryable<T> can be transformed into a chunky executing statement. Ideally something like this:
var courses = Courses.Where(c => ids.Contains(c.CourseID))
.AsChunky(1000)
.ToList();
But maybe this
var courses = Courses.ChunkyContains(c => c.CourseID, ids, 1000)
.ToList();
I've given the latter solution a first shot:
public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
this IQueryable<TEntity> query,
Expression<Func<TEntity,TContains>> match,
IEnumerable<TContains> containList,
int chunkSize = 500)
{
return containList.ToChunks(chunkSize)
.Select (chunk => query.Where(x => chunk.Contains(match)))
.SelectMany(x => x);
}
Obviously, the part x => chunk.Contains(match) doesn't compile. But I don't know how to manipulate the match expression into a Contains expression.
Maybe someone can help me make this solution work. And of course I'm open to other approaches to make this statement scalable.
I solved this problem with a slightly different approach a few months ago. Maybe it's a good solution for you too.
I didn't want my solution to change the query itself, so an ids.ChunkContains(p.Id) or a special WhereContains method was unfeasible. The solution should also be able to combine a Contains with another filter, as well as use the same collection multiple times.
db.TestEntities.Where(p => (ids.Contains(p.Id) || ids.Contains(p.ParentId)) && p.Name.StartsWith("Test"))
So I tried to encapsulate the logic in a special ToList method that could rewrite the Expression for a specified collection to be queried in chunks.
var ids = Enumerable.Range(1, 11);
var result = db.TestEntities.Where(p => ids.Contains(p.Id) && p.Name.StartsWith("Test"))
.ToChunkedList(ids,4);
To rewrite the expression tree, I discovered all Contains method calls on local collections in the query with a few helper classes.
private class ContainsExpression
{
public ContainsExpression(MethodCallExpression methodCall)
{
this.MethodCall = methodCall;
}
public MethodCallExpression MethodCall { get; private set; }
public object GetValue()
{
var parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
return Expression.Lambda<Func<object>>(parent).Compile()();
}
public bool IsLocalList()
{
Expression parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
while (parent != null) {
if (parent is ConstantExpression)
return true;
var member = parent as MemberExpression;
if (member != null) {
parent = member.Expression;
} else {
parent = null;
}
}
return false;
}
}
private class FindExpressionVisitor<T> : ExpressionVisitor where T : Expression
{
public List<T> FoundItems { get; private set; }
public FindExpressionVisitor()
{
this.FoundItems = new List<T>();
}
public override Expression Visit(Expression node)
{
var found = node as T;
if (found != null) {
this.FoundItems.Add(found);
}
return base.Visit(node);
}
}
public static List<T> ToChunkedList<T, TValue>(this IQueryable<T> query, IEnumerable<TValue> list, int chunkSize)
{
var finder = new FindExpressionVisitor<MethodCallExpression>();
finder.Visit(query.Expression);
var methodCalls = finder.FoundItems.Where(p => p.Method.Name == "Contains").Select(p => new ContainsExpression(p)).Where(p => p.IsLocalList()).ToList();
var localLists = methodCalls.Where(p => p.GetValue() == list).ToList();
If the local collection passed to the ToChunkedList method was found in the query expression, I replace the Contains call on the original list with a new call on a temporary list containing the ids for one batch.
if (localLists.Any()) {
var result = new List<T>();
var valueList = new List<TValue>();
var containsMethod = typeof(Enumerable).GetMethods(BindingFlags.Static | BindingFlags.Public)
.Single(p => p.Name == "Contains" && p.GetParameters().Count() == 2)
.MakeGenericMethod(typeof(TValue));
var queryExpression = query.Expression;
foreach (var item in localLists) {
var parameter = new List<Expression>();
parameter.Add(Expression.Constant(valueList));
if (item.MethodCall.Object == null) {
parameter.AddRange(item.MethodCall.Arguments.Skip(1));
} else {
parameter.AddRange(item.MethodCall.Arguments);
}
var call = Expression.Call(containsMethod, parameter.ToArray());
var replacer = new ExpressionReplacer(item.MethodCall,call);
queryExpression = replacer.Visit(queryExpression);
}
var chunkQuery = query.Provider.CreateQuery<T>(queryExpression);
for (int i = 0; i < Math.Ceiling((decimal)list.Count() / chunkSize); i++) {
valueList.Clear();
valueList.AddRange(list.Skip(i * chunkSize).Take(chunkSize));
result.AddRange(chunkQuery.ToList());
}
return result;
}
// if the collection was not found return query.ToList()
return query.ToList();
Expression Replacer:
private class ExpressionReplacer : ExpressionVisitor {
private Expression find, replace;
public ExpressionReplacer(Expression find, Expression replace)
{
this.find = find;
this.replace = replace;
}
public override Expression Visit(Expression node)
{
if (node == this.find)
return this.replace;
return base.Visit(node);
}
}
Please allow me to provide an alternative to the Chunky approach.
The technique involving Contains in your predicate works well for:
A constant list of values (no volatile).
A small list of values.
Contains will do great if your local data has those two characteristics because these small set of values will be hardcoded in the final SQL query.
The problem begins when your list of values has entropy (non-constant). As of this writing, Entity Framework (Classic and Core) do not try to parameterize these values in any way, this forces SQL Server to generate a query plan every time it sees a new combination of values in your query. This operation is expensive and gets aggravated by the overall complexity of your query (e.g. many tables, a lot of values in the list, etc.).
The Chunky approach still suffers from this SQL Server query plan cache pollution problem because it does not parameterize the query; it just splits the cost of creating one big execution plan into smaller ones that are easier for SQL Server to compute (and discard). Furthermore, every chunk adds an additional round-trip to the database, which increases the time needed to resolve the query.
An Efficient Solution for EF Core
🎉 NEW! QueryableValues EF6 Edition has arrived!
For EF Core keep reading below.
Wouldn't it be nice to have a way of composing local data in your query in a way that's SQL Server friendly? Enter QueryableValues.
I designed this library with these two main goals:
It MUST solve the SQL Server's query plan cache pollution problem ✅
It MUST be fast! ⚡
It has a flexible API that allows you to compose local data provided by an IEnumerable<T> and you get back an IQueryable<T>; just use it as if it were another entity of your DbContext (really), e.g.:
// Sample values.
IEnumerable<int> values = Enumerable.Range(1, 1000);
// Using a Join (query syntax).
var query1 =
from e in dbContext.MyEntities
join v in dbContext.AsQueryableValues(values) on e.Id equals v
select new
{
e.Id,
e.Name
};
// Using Contains (method syntax)
var query2 = dbContext.MyEntities
.Where(e => dbContext.AsQueryableValues(values).Contains(e.Id))
.Select(e => new
{
e.Id,
e.Name
});
You can also compose complex types!
It goes without saying that the provided IEnumerable<T> is only enumerated at the time that your query is materialized (not before), preserving the same behavior of EF Core in this regard.
How Does It Work?
Internally QueryableValues creates a parameterized query and provides your values in a serialized format that is natively understood by SQL Server. This allows your query to be resolved with a single round-trip to the database and avoids creating a new query plan on subsequent executions due to the parameterized nature of it.
Useful Links
Nuget Package
GitHub Repository
Benchmarks
SQL Server Cache Pollution Problem
QueryableValues is distributed under the MIT license
Linqkit to the rescue! There might be a better way that does this directly, but this seems to work fine and makes it pretty clear what's being done. The addition is AsExpandable(), which lets you use the Invoke extension.
using LinqKit;
public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
this IQueryable<TEntity> query,
Expression<Func<TEntity,TContains>> match,
IEnumerable<TContains> containList,
int chunkSize = 500)
{
return containList
.ToChunks(chunkSize)
.Select (chunk => query.AsExpandable()
.Where(x => chunk.Contains(match.Invoke(x))))
.SelectMany(x => x);
}
You might also want to do this:
containList.Distinct()
.ToChunks(chunkSize)
...or something similar so you don't get duplicate results if something like this occurs:
query.ChunkyContains(x => x.Id, new List<int> { 1, 1 }, 1);
Another way would be to build the predicate this way (of course, some parts should be improved, just giving the idea).
public static Expression<Func<TEntity, bool>> ContainsPredicate<TEntity, TContains>(this IEnumerable<TContains> chunk, Expression<Func<TEntity, TContains>> match)
{
return Expression.Lambda<Func<TEntity, bool>>(Expression.Call(
typeof (Enumerable),
"Contains",
new[]
{
typeof (TContains)
},
Expression.Constant(chunk, typeof(IEnumerable<TContains>)), match.Body),
match.Parameters);
}
which you could call in your ChunkContains method
return containList.ToChunks(chunkSize)
.Select(chunk => query.Where(ContainsPredicate(chunk, match)))
.SelectMany(x => x);
Using a stored procedure with a table-valued parameter could also work well. In effect you write a join in the stored procedure between your table/view and the table-valued parameter.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/table-valued-parameters
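A rough sketch of the client side (the table type dbo.CourseIdList, its Id column, the procedure name, and the open connection are assumptions for illustration):
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in ids)
    table.Rows.Add(id);

using (var cmd = new SqlCommand("dbo.GetCoursesByIds", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.Add("@Ids", SqlDbType.Structured);
    p.TypeName = "dbo.CourseIdList"; // user-defined table type on the SQL Server side
    p.Value = table;
    using (var reader = cmd.ExecuteReader())
    {
        // map rows to Course objects here
    }
}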

Regarding the Efficiency of the LINQ Any() Method

Is there any performance difference between these two approaches?
// First approach, iterating until a match
public bool Find(IEnumerable<Object> allObjects, Object testObj)
{
foreach (Object obj in allObjects)
{
if (obj.Equals(testObj))
{ return true; }
}
return false;
}
// Second approach, using LINQ and Any()
public bool Find(IEnumerable<Object> allObjects, Object testObj)
{
var query = from Object obj in allObjects where obj.Equals(testObj) select obj;
return query.Any();
}
My question is whether the LINQ version compares testObj to all objects in the collection and then the Any() method checks if the resulting collection is empty. This would be generally less efficient than the first case where the iteration stops after the first match.
No, the performance should be equivalent - Any() will stop iterating over the source enumeration after the first match.
Also you could do this more concise (and easier to read and understand, but that's a matter of opinion) using method syntax:
return allObjects.Any(obj => obj.Equals(testObj));
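If you want to see the short-circuiting for yourself, a throwaway sketch like this makes it visible:
int checks = 0;
var numbers = Enumerable.Range(0, 1000);
bool found = numbers.Any(n => { checks++; return n == 3; });
Console.WriteLine($"found={found}, predicate ran {checks} times"); // prints 4, not 1000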

Linq Parsing Error when trying to create separation of concerns

I am in the middle of a refactoring cycle where I converted some extension methods that used to look like this:
public static IQueryable<Family> FilterOnRoute(this IQueryable<Family> families, WicRoute route)
{
return families.Where(fam => fam.PODs
.Any(pod => pod.Route.RouteID == route.RouteID));
}
to a more fluent implementation like this:
public class SimplifiedFamilyLinqBuilder
{
private IQueryable<Family> _families;
public SimplifiedFamilyLinqBuilder Load(IQueryable<Family> families)
{
_families = families;
return this;
}
public SimplifiedFamilyLinqBuilder OnRoute(WicRoute route)
{
_families = _families.Where(fam => fam.PODs
.Any(pod => pod.Route.RouteID == route.RouteID));
return this;
}
public IQueryable<Family> AsQueryable()
{
return _families;
}
}
which I can call like this: (note this is using Linq-to-Nhibernate)
var families =
new SimplifiedFamilyLinqBuilder()
.Load(session.Query<Family>())
.OnRoute(new WicRoute() {RouteID = 1})
.AsQueryable()
.ToList();
this produces the following SQL which is fine with me at the moment: (of note is that the above Linq is being translated to a SQL Query)
select ... from "Family" family0_
where exists (select pods1_.PODID from "POD" pods1_
inner join Route wicroute2_ on pods1_.RouteID=wicroute2_.RouteID
where family0_.FamilyID=pods1_.FamilyID
and wicroute2_.RouteID=#p0);
#p0 = 1
my next effort in refactoring is to move the query part that deals with the child to another class like this:
public class SimplifiedPODLinqBuilder
{
private IQueryable<POD> _pods;
public SimplifiedPODLinqBuilder Load(IQueryable<POD> pods)
{
_pods = pods;
return this;
}
public SimplifiedPODLinqBuilder OnRoute(WicRoute route)
{
_pods = _pods.Where(pod => pod.Route.RouteID == route.RouteID);
return this;
}
public IQueryable<POD> AsQueryable()
{
return _pods;
}
}
with SimplifiedFamilyLinqBuilder changing to this:
public SimplifiedFamilyLinqBuilder OnRoute(WicRoute route)
{
_families = _families.Where(fam =>
_podLinqBuilder.Load(fam.PODs.AsQueryable())
.OnRoute(route)
.AsQueryable()
.Any()
);
return this;
}
only I now get this error:
Remotion.Linq.Parsing.ParserException : Cannot parse expression 'value(Wic.DataTests.LinqBuilders.SimplifiedPODLinqBuilder)' as it has an unsupported type. Only query sources (that is, expressions that implement IEnumerable) and query operators can be parsed.
I started to implement IQueryable on SimplifiedPODLinqBuilder (as that seemed more logical than implementing IEnumerable) and thought I would be clever by doing this:
public class SimplifiedPODLinqBuilder : IQueryable
{
private IQueryable<POD> _pods;
...
public IEnumerator GetEnumerator()
{
return _pods.GetEnumerator();
}
public Expression Expression
{
get { return _pods.Expression; }
}
public Type ElementType
{
get { return _pods.ElementType; }
}
public IQueryProvider Provider
{
get { return _pods.Provider; }
}
}
only to get this exception (apparently Load is not being called and _pods is null):
System.NullReferenceException : Object reference not set to an instance of an object.
is there a way for me to refactor this code out that will parse properly into an expression that will go to SQL?
The part fam => _podLinqBuilder.Load(fam.PODs.AsQueryable()) is never going to work, because the linq provider will try to parse this into SQL, and for that it needs mapped members of Family after the =>, or maybe a mapped user-defined function, but I don't know whether Linq-to-Nhibernate supports that (I never really worked with it, because I still doubt whether it is production-ready).
So, what can you do?
To be honest, I like the extension methods much better. You switched to a stateful approach, which doesn't mix well with the stateless paradigm of linq. So you may want to retrace your steps.
Another option: the expression in .Any(pod => pod.Route.RouteID == route.RouteID) could be parameterized as .Any(podExpression), with
OnRoute(WicRoute route, Expression<Func<POD,bool>> podExpression)
(pseudocode).
Hope this makes any sense.
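A sketch of what that parameterized variant might look like (here the route filter travels entirely inside podExpression, so the route parameter from the pseudocode is dropped; whether the provider can translate the AsQueryable().Any(...) pattern depends on the provider):
public SimplifiedFamilyLinqBuilder OnRoute(Expression<Func<POD, bool>> podExpression)
{
    // the POD filter arrives as an expression, so no builder call ends up inside the tree
    _families = _families.Where(fam => fam.PODs.AsQueryable().Any(podExpression));
    return this;
}
// usage: builder.OnRoute(pod => pod.Route.RouteID == routeId);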
You need to separate methods you intend to call from expressions you intend to translate.
This first call chain is fine - you want each of those methods to run. They return an instance that wraps an IQueryable<Family> and operate on that instance.
var families = new SimplifiedFamilyLinqBuilder()
.Load(session.Query<Family>())
.OnRoute(new WicRoute() {RouteID = 1})
.AsQueryable()
.ToList();
This, however, is no good. The lambda you pass to Where is not meant to be called; it should become an expression tree which can be translated to SQL. But _podLinqBuilder.Load is a node in that expression tree which can't be translated to SQL!
families = _families
.Where(fam => _podLinqBuilder.Load(fam.PODs.AsQueryable())
.OnRoute(route)
.AsQueryable()
.Any());
You can't call .Load inside the Where expression (it won't translate to sql).
You can't call .Load outside the Where expression (you don't have the fam parameter).
In the name of "separation of concerns", you are mixing query construction methods with query definition expressions. LINQ, by its Integrated nature, encourages you to attempt this thing which will not work.
Consider making expression construction methods instead of query construction methods.
public static Expression<Func<Pod, bool>> GetOnRouteExpr(WicRoute route)
{
int routeId = route.RouteID;
Expression<Func<Pod, bool>> result = pod => pod.Route.RouteID == routeId;
return result;
}
called by:
Expression<Func<Pod, bool>> onRoute = GetOnRouteExpr(route);
families = _families.Where(fam => fam.PODs.AsQueryable().Any(onRoute));
With this approach, the question is now - how do I fluidly hang my ornaments from the expression tree?

LINQ equivalent of foreach for IEnumerable<T>

I'd like to do the equivalent of the following in LINQ, but I can't figure out how:
IEnumerable<Item> items = GetItems();
items.ForEach(i => i.DoStuff());
What is the real syntax?
There is no ForEach extension for IEnumerable; only for List<T>. So you could do
items.ToList().ForEach(i => i.DoStuff());
Alternatively, write your own ForEach extension method:
public static void ForEach<T>(this IEnumerable<T> enumeration, Action<T> action)
{
foreach(T item in enumeration)
{
action(item);
}
}
Fredrik has provided the fix, but it may be worth considering why this isn't in the framework to start with. I believe the idea is that the LINQ query operators should be side-effect-free, fitting in with a reasonably functional way of looking at the world. Clearly ForEach is exactly the opposite - a purely side-effect-based construct.
That's not to say this is a bad thing to do - just thinking about the philosophical reasons behind the decision.
Update 7/17/2012: Apparently as of C# 5.0, the behavior of foreach described below has been changed and "the use of a foreach iteration variable in a nested lambda expression no longer produces unexpected results." This answer does not apply to C# ≥ 5.0.
@Jon Skeet and everyone who prefers the foreach keyword.
The problem with "foreach" in C# prior to 5.0, is that it is inconsistent with how the equivalent "for comprehension" works in other languages, and with how I would expect it to work (personal opinion stated here only because others have mentioned their opinion regarding readability). See all of the questions concerning "Access to modified closure"
as well as "Closing over the loop variable considered harmful". This is only "harmful" because of the way "foreach" is implemented in C#.
Take the following examples, using the functionally equivalent extension method to that in @Fredrik Kalseth's answer.
public static class Enumerables
{
public static void ForEach<T>(this IEnumerable<T> @this, Action<T> action)
{
foreach (T item in @this)
{
action(item);
}
}
}
Apologies for the overly contrived example. I'm only using Observable because it's not entirely far fetched to do something like this. Obviously there are better ways to create this observable, I am only attempting to demonstrate a point. Typically the code subscribed to the observable is executed asynchronously and potentially in another thread. If using "foreach", this could produce very strange and potentially non-deterministic results.
The following test using "ForEach" extension method passes:
[Test]
public void ForEachExtensionWin()
{
//Yes, I know there is an Observable.Range.
var values = Enumerable.Range(0, 10);
var observable = Observable.Create<Func<int>>(source =>
{
values.ForEach(value =>
source.OnNext(() => value));
source.OnCompleted();
return () => { };
});
//Simulate subscribing and evaluating Funcs
var evaluatedObservable = observable.ToEnumerable().Select(func => func()).ToList();
//Win
Assert.That(evaluatedObservable,
Is.EquivalentTo(values.ToList()));
}
The following fails with the error:
Expected: equivalent to < 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 >
But was: < 9, 9, 9, 9, 9, 9, 9, 9, 9, 9 >
[Test]
public void ForEachKeywordFail()
{
//Yes, I know there is an Observable.Range.
var values = Enumerable.Range(0, 10);
var observable = Observable.Create<Func<int>>(source =>
{
foreach (var value in values)
{
//If you have resharper, notice the warning
source.OnNext(() => value);
}
source.OnCompleted();
return () => { };
});
//Simulate subscribing and evaluating Funcs
var evaluatedObservable = observable.ToEnumerable().Select(func => func()).ToList();
//Fail
Assert.That(evaluatedObservable,
Is.EquivalentTo(values.ToList()));
}
You could use the FirstOrDefault() extension, which is available for IEnumerable<T>. By returning false from the predicate, it will be run for each element but will not care that it doesn't actually find a match. This will avoid the ToList() overhead.
IEnumerable<Item> items = GetItems();
items.FirstOrDefault(i => { i.DoStuff(); return false; });
Keep your Side Effects out of my IEnumerable
I'd like to do the equivalent of the following in LINQ, but I can't figure out how:
As others have pointed out here and abroad LINQ and IEnumerable methods are expected to be side-effect free.
Do you really want to "do something" to each item in the IEnumerable? Then foreach is the best choice. People aren't surprised when side-effects happen here.
foreach (var i in items) i.DoStuff();
I bet you don't want a side-effect
However in my experience side-effects are usually not required. More often than not there is a simple LINQ query waiting to be discovered accompanied by a StackOverflow.com answer by either Jon Skeet, Eric Lippert, or Marc Gravell explaining how to do what you want!
Some examples
If you are actually just aggregating (accumulating) some value then you should consider the Aggregate extension method.
items.Aggregate(initial, (acc, x) => ComputeAccumulatedValue(acc, x));
Perhaps you want to create a new IEnumerable from the existing values.
items.Select(x => Transform(x));
Or maybe you want to create a look-up table:
items.ToLookup(x => GetTheKey(x))
The list (pun not entirely intended) of possibilities goes on and on.
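For instance, a side-effecting running total is usually better expressed as a fold (Price is a hypothetical property here):
// instead of: decimal total = 0; items.ForEach(i => total += i.Price);
decimal total = items.Aggregate(0m, (acc, i) => acc + i.Price);
// or simply: items.Sum(i => i.Price)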
I took Fredrik's method and modified the return type.
This way, the method supports deferred execution like other LINQ methods.
EDIT: If this wasn't clear, any usage of this method must end with ToList() or any other way to force the method to work on the complete enumerable. Otherwise, the action would not be performed!
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> enumeration, Action<T> action)
{
foreach (T item in enumeration)
{
action(item);
yield return item;
}
}
And here's the test to help see it:
[Test]
public void TestDefferedExecutionOfIEnumerableForEach()
{
IEnumerable<char> enumerable = new[] {'a', 'b', 'c'};
var sb = new StringBuilder();
enumerable
.ForEach(c => sb.Append("1"))
.ForEach(c => sb.Append("2"))
.ToList();
Assert.That(sb.ToString(), Is.EqualTo("121212"));
}
If you remove the ToList() in the end, you will see the test failing since the StringBuilder contains an empty string. This is because no method forced the ForEach to enumerate.
So many answers, yet ALL fail to pinpoint one very significant problem with a custom generic ForEach extension: Performance! And more specifically, memory usage and GC.
Consider the sample below. Targeting .NET Framework 4.7.2 or .NET Core 3.1.401, configuration is Release and platform is Any CPU.
public static class Enumerables
{
public static void ForEach<T>(this IEnumerable<T> @this, Action<T> action)
{
foreach (T item in @this)
{
action(item);
}
}
}
class Program
{
private static void NoOp(int value) {}
static void Main(string[] args)
{
var list = Enumerable.Range(0, 10).ToList();
for (int i = 0; i < 1000000; i++)
{
// WithLinq(list);
// WithoutLinqNoGood(list);
WithoutLinq(list);
}
}
private static void WithoutLinq(List<int> list)
{
foreach (var item in list)
{
NoOp(item);
}
}
private static void WithLinq(IEnumerable<int> list) => list.ForEach(NoOp);
private static void WithoutLinqNoGood(IEnumerable<int> enumerable)
{
foreach (var item in enumerable)
{
NoOp(item);
}
}
}
At a first glance, all three variants should perform equally well. However, when the ForEach extension method is called many, many times, you will end up with garbage that implies a costly GC. In fact, having this ForEach extension method on a hot path has been proven to totally kill performance in our loop-intensive application.
Similarly, the weakly typed foreach loop will also produce garbage, but it will still be faster and less memory-intensive than the ForEach extension (which also suffers from a delegate allocation).
(Memory usage graphs from the original answer are omitted here; they compare the strongly typed foreach, the weakly typed foreach, and the ForEach extension.)
Analysis
For a strongly typed foreach the compiler is able to use any optimized enumerator (e.g. value based) of a class, whereas a generic ForEach extension must fall back to a generic enumerator which will be allocated on each run. Furthermore, the actual delegate will also imply an additional allocation.
You would get similar bad results with the WithoutLinqNoGood method. There, the argument is of type IEnumerable<int> instead of List<int> implying the same type of enumerator allocation.
Below are the relevant differences in IL. A value based enumerator is certainly preferable!
IL_0001: callvirt instance class
[mscorlib]System.Collections.Generic.IEnumerator`1<!0>
class [mscorlib]System.Collections.Generic.IEnumerable`1<!!T>::GetEnumerator()
vs
IL_0001: callvirt instance valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<!0>
class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
Conclusion
The OP asked how to call ForEach() on an IEnumerable<T>. The original answer clearly shows how it can be done. Sure you can do it, but then again; my answer clearly shows that you shouldn't.
Verified the same behavior when targeting .NET Core 3.1.401 (compiling with Visual Studio 16.7.2).
If you want to act as the enumeration rolls you should yield each item.
public static class EnumerableExtensions
{
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> enumeration, Action<T> action)
{
foreach (var item in enumeration)
{
action(item);
yield return item;
}
}
}
There is an experimental release by Microsoft of Interactive Extensions to LINQ (also on NuGet, see RxTeams's profile for more links). The Channel 9 video explains it well.
Its docs are only provided in XML format. I have run this documentation in Sandcastle to allow it to be in a more readable format. Unzip the docs archive and look for index.html.
Among many other goodies, it provides the expected ForEach implementation. It allows you to write code like this:
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8 };
numbers.ForEach(x => Console.WriteLine(x*x));
With PLINQ (available since .NET 4.0), you can use
IEnumerable<T>.AsParallel().ForAll()
to do a parallel foreach loop on an IEnumerable.
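For example (execution order is not guaranteed and the body must be thread-safe):
IEnumerable<Item> items = GetItems();
items.AsParallel().ForAll(i => i.DoStuff());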
The purpose of ForEach is to cause side effects.
IEnumerable is for lazy enumeration of a set.
This conceptual difference is quite visible when you consider it.
SomeEnumerable.ForEach(item=>DataStore.Synchronize(item));
This won't execute until you do a "count" or a "ToList()" or something on it.
It clearly is not what is expressed.
You should use the IEnumerable extensions for setting up chains of iteration, defining content by their respective sources and conditions. Expression trees are powerful and efficient, but you should learn to appreciate their nature, and not just program around them to save a few characters by overriding lazy evaluation.
Many people mentioned it, but I had to write it down. Isn't this the clearest and most readable option?
IEnumerable<Item> items = GetItems();
foreach (var item in items) item.DoStuff();
Short and simple(st).
Now we have the option of...
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 4;
#if DEBUG
parallelOptions.MaxDegreeOfParallelism = 1;
#endif
Parallel.ForEach(bookIdList, parallelOptions, bookID => UpdateStockCount(bookID));
Of course, this opens up a whole new can of threadworms.
As numerous answers already point out, you can easily add such an extension method yourself. However, if you don't want to do that, although I'm not aware of anything like this in the BCL, there's still an option in the System namespace, if you already have a reference to Reactive Extensions (and if you don't, you should):
using System.Reactive.Linq;
items.ToObservable().Subscribe(i => i.DoStuff());
Although the method names are a bit different, the end result is exactly what you're looking for.
ForEach can also be chained: just put the item back into the pipeline after the action to remain fluent.
Employees.ForEach(e=>e.Act_A)
.ForEach(e=>e.Act_B)
.ForEach(e=>e.Act_C);
Orders //just for demo
.ForEach(o=> o.EmailBuyer() )
.ForEach(o=> o.ProcessBilling() )
.ForEach(o=> o.ProcessShipping());
//conditional
Employees
.ForEach(e=> { if(e.Salary<1000) e.Raise(0.10);})
.ForEach(e=> { if(e.Age >70 ) e.Retire();});
An eager version of the implementation:
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> enu, Action<T> action)
{
foreach (T item in enu) action(item);
return enu; // make action Chainable/Fluent
}
Edit: a lazy version uses yield return, like this:
public static IEnumerable<T> ForEachLazy<T>(this IEnumerable<T> enu, Action<T> action)
{
foreach (var item in enu)
{
action(item);
yield return item;
}
}
The lazy version NEEDS to be materialized, with ToList() for example; otherwise nothing happens. See the great comments from ToolmakerSteve below.
IQueryable<Product> query = Products.Where(...);
query.ForEachLazy(t => t.Price = t.Price + 1.00)
.ToList(); //without this line, below SubmitChanges() does nothing.
SubmitChanges();
I keep both ForEach() and ForEachLazy() in my library.
Inspired by Jon Skeet, I have extended his solution with the following:
Extension Method:
public static void Execute<TSource, TKey>(this IEnumerable<TSource> source, Action<TKey> applyBehavior, Func<TSource, TKey> keySelector)
{
foreach (var item in source)
{
var target = keySelector(item);
applyBehavior(target);
}
}
Client:
var jobs = new List<Job>()
{
new Job { Id = "XAML Developer" },
new Job { Id = "Assassin" },
new Job { Id = "Narco Trafficker" }
};
jobs.Execute(ApplyFilter, j => j.Id);
...
public void ApplyFilter(string filterId)
{
Debug.WriteLine(filterId);
}
This "functional approach" abstraction leaks big time. Nothing on the language level prevents side effects. As long as you can make it call your lambda/delegate for every element in the container - you will get the "ForEach" behavior.
Here, for example, is one way of merging srcDictionary into destDictionary (if a key already exists, it is overwritten).
This is a hack and should not be used in any production code:
var b = srcDictionary.Select(
x=>
{
destDictionary[x.Key] = x.Value;
return true;
}
).Count();
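For comparison, the non-hack version of the same merge is just a plain loop:
foreach (var pair in srcDictionary)
    destDictionary[pair.Key] = pair.Value;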
MoreLinq has IEnumerable<T>.ForEach and a ton of other useful extensions. It's probably not worth taking the dependency just for ForEach, but there's a lot of useful stuff in there.
https://www.nuget.org/packages/morelinq/
https://github.com/morelinq/MoreLINQ
I respectfully disagree with the notion that linq extension methods should be side-effect free (not least because they aren't; any delegate can perform side effects).
Consider the following:
public class Element {}
public enum ProcessType
{
This = 0, That = 1, SomethingElse = 2
}
public class Class1
{
private Dictionary<ProcessType, Action<Element>> actions =
new Dictionary<ProcessType,Action<Element>>();
public Class1()
{
actions.Add( ProcessType.This, DoThis );
actions.Add( ProcessType.That, DoThat );
actions.Add( ProcessType.SomethingElse, DoSomethingElse );
}
// Element actions:
// This example defines 3 distinct actions
// that can be applied to individual elements.
// For the sake of the argument, make no
// assumption about how many distinct actions
// there may be; there could possibly be many more.
public void DoThis( Element element )
{
// Do something to element
}
public void DoThat( Element element )
{
// Do something to element
}
public void DoSomethingElse( Element element )
{
// Do something to element
}
public void Apply( ProcessType processType, IEnumerable<Element> elements )
{
Action<Element> action = null;
if( ! actions.TryGetValue( processType, out action ) )
throw new ArgumentException("processType");
foreach( var element in elements )
action(element);
}
}
What the example shows is really just a kind of late binding that allows one to invoke one of many possible actions having side effects on a sequence of elements, without having to write a big switch construct to decode the value that defines the action and translate it into its corresponding method.
To stay fluent, one can use a trick like this:
GetItems()
.Select(i => new Action(i.DoStuff))
.Aggregate((a, b) => a + b)
.Invoke();
For VB.NET you should use:
listVariable.ForEach(Sub(i) i.Property = "Value")
Yet another ForEach Example
public static IList<AddressEntry> MapToDomain(IList<AddressModel> addresses)
{
var workingAddresses = new List<AddressEntry>();
addresses.Select(a => a).ToList().ForEach(a => workingAddresses.Add(AddressModelMapper.MapToDomain(a)));
return workingAddresses;
}
