Linq query where there's a certain desired relationship between items in the result - linq

A linq query Where clause can apply a func to an item in the original set and return a bool to include or not include the item based on the item's characteristics. Great stuff:
var q = myColl.Where(o => o.EffectiveDate == LastThursday);
But what if I want to find a set of items where each item is related to the last item in some way? Like:
var q = myColl.Where(o => o.EffectiveDate == thePreviousItem.ExpirationDate);
How do you make a Where (or other linq function) "jump out" of the current item?
Here's what I tried, trying to be clever. I made every item an array just so I can use the Aggregate function:
public IQueryable<T> CurrentVersions
{
get => AllVersions
.Select(vo => new T[] { vo })
.Aggregate((voa1, voa2) => voa1[0].BusinessExpirationDate.Value == voa2[0].BusinessEffectiveDate.Value ? voa1.Concat(voa2).ToArray() : voa1)
.SelectMany(vo => vo);
}
but that doesn't compile on the SelectMany:
The type arguments for method Enumerable.SelectMany<TSource,
TResult>(IEnumerable<TSource>, Func<TSource, IEnumerable<TResult>>)
cannot be inferred from the usage. Try specifying the type arguments
explicitly.
EDIT (SOLUTION)
As it turns out, I was on the right track, but was just confused about what SelectMany does. I didn't need it. I also needed to change IQueryable to IEnumerable because I'm using EF and you can't query after you let go of the DbContext. So, here is the actual solution.
public IEnumerable<T> CurrentVersions
{
get => AllVersions
.Select(vo => new T[] { vo })
.Aggregate((voa1, voa2) => voa1[0].BusinessExpirationDate.Value == voa2[0].BusinessEffectiveDate.Value ? voa1.Concat(voa2).ToArray() : voa1);
}

LINQ queries are most effective when each item can be processed in isolation. They are a poor fit for relating items within the same collection to each other: with only the standard LINQ operators, you usually end up processing the same collection multiple times.
The MoreLINQ library helps provide additional operators to fill in some of those gaps. I'm not sure what operators it provides that could be used in this instance, but I know it has a Pairwise() method that combines the current and previous items in the iteration.
In general, for situations like this, if you needed to roll out your own, it would be far easier to write it using a generator to generate your sequence. Either as a general purpose extension method:
public static IEnumerable<TSource> WhereWithPrevious<TSource>(
this IEnumerable<TSource> source,
Func<TSource, TSource, bool> predicate)
{
using (var iter = source.GetEnumerator())
{
if (!iter.MoveNext())
yield break;
var previous = iter.Current;
while (iter.MoveNext())
{
var current = iter.Current;
if (predicate(current, previous))
yield return current;
previous = current; // advance, so the next item is compared with this one
}
}
}
or one specifically for the problem you're trying to solve.
public static IEnumerable<MyType> GetVersions(IEnumerable<MyType> source)
{
using (var iter = source.GetEnumerator())
{
if (!iter.MoveNext())
yield break;
var previous = iter.Current;
while (iter.MoveNext())
{
var current = iter.Current;
if (current.EffectiveDate == previous.ExpirationDate)
yield return current;
previous = current; // advance, so the next item is compared with this one
}
}
}
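To make the generator approach concrete, here is a minimal, self-contained sketch. The VersionInfo record, the sample dates, and the Demo method are made up for illustration; the extension method follows the shape shown above and moves previous forward on every iteration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, used only for this illustration.
public sealed record VersionInfo(string Name, DateTime EffectiveDate, DateTime ExpirationDate);

public static class PreviousItemExtensions
{
    // Yields each item for which predicate(current, previous) is true.
    // The first item has no predecessor, so it is never yielded.
    public static IEnumerable<T> WhereWithPrevious<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using var iter = source.GetEnumerator();
        if (!iter.MoveNext())
            yield break;
        var previous = iter.Current;
        while (iter.MoveNext())
        {
            var current = iter.Current;
            if (predicate(current, previous))
                yield return current;
            previous = current; // slide the window forward
        }
    }

    // Returns the names of the versions that start exactly when their predecessor ends.
    public static List<string> Demo()
    {
        var versions = new[]
        {
            new VersionInfo("v1", new DateTime(2020, 1, 1), new DateTime(2020, 6, 1)),
            new VersionInfo("v2", new DateTime(2020, 6, 1), new DateTime(2021, 1, 1)),
            new VersionInfo("v3", new DateTime(2021, 3, 1), new DateTime(2021, 9, 1)), // gap before v3
            new VersionInfo("v4", new DateTime(2021, 9, 1), new DateTime(2022, 1, 1)),
        };
        return versions
            .WhereWithPrevious((cur, prev) => cur.EffectiveDate == prev.ExpirationDate)
            .Select(v => v.Name)
            .ToList();
    }
}
```

With this sample data, Demo() returns v2 and v4. Whether the first item should be included as well depends on what "current versions" means in your model, so treat that as a design decision rather than a given.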
An alternative approach, which is standard practice in other languages but terribly inefficient here, would be to zip the collection with itself offset by one.
var query = Collection.Skip(1).Zip(Collection, (c, p) => (current:c,previous:p))
.Where(x => x.current.EffectiveDate == x.previous.ExpirationDate)
...;
And with all of that said, using any of these options will most likely make your query incompatible with query providers. It's not something you would want expressed as a single query anyway.

Related

C#7: How to use tuples in generic methods (LINQ select example)

I have some heavily repeating code, which has always the same structure, just using different columns in a database for accessing it and doing similar stuff
A typical query looks like:
var portfolioIds = context.PortSelMotorSeries
.Select(x => new { x.Id, x.InstallationAltitudeMax })
.ToList();
Now I want to use a generic function for dependency inversion and to pass the selector function as a delegate to the query:
private void ForEachIterate<T1>(Func<MotorSeriesDb, T1> selectorFunc) where T1 : (int Id, double Value)
{
...
var portfolioIds = context.PortSelMotorSeries
.Select(selectorFunc)
.ToList();
...
}
So that I can call the query with my own selector:
ForEachIterate(x => new { Id = x.Id, Value = x.InstallationAltitudeMax });
ForEachIterate(x => new { Id = x.Id, Value = x.TemperatureMax });
Specifying the constraint with "where T1 : (int Id, double Value)" leads to a compiler error CS0701.
Leaving it away leads to other compiler errors.
Is there any way to use tuples in generic functions?
For one thing you're confusing tuples (the (int Id, double Value) thing) with anonymous classes (the new { Id = x.Id, Value = x.TemperatureMax }). They aren't even related, so your code would never work as is.
For another, if all you want is to force the user to output a tuple of some specific type, you can do something like this:
private void ForEachIterate(Func<MotorSeriesDb, (int, double)> selectorFunc)
{
...
var portfolioIds = context.PortSelMotorSeries
.Select(selectorFunc)
.ToList();
...
}
// call like:
ForEachIterate(x => (x.Id, x.InstallationAltitudeMax));
Note that there's nothing generic about your function at all. Which leads me to my third point: you're missing the entire point of Linq. You talk about inversion of control, but you're the one who's inverting it in the wrong direction to begin with.
You already have a construct that allows arbitrary selection: context.PortSelMotorSeries. Simply use Linq to select what you want out of it in the call site and you're done.
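The points above can be sketched as follows. This is an in-memory illustration only: MotorSeriesDb here is a hypothetical stand-in for the entity in the question, and Sample and SelectValues are names invented for the example. It shows a non-generic method whose delegate return type pins the tuple shape, so no generic constraint is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity from the question.
public sealed class MotorSeriesDb
{
    public int Id { get; set; }
    public double InstallationAltitudeMax { get; set; }
    public double TemperatureMax { get; set; }
}

public static class TupleSelectorDemo
{
    // In-memory sample data standing in for context.PortSelMotorSeries.
    public static readonly MotorSeriesDb[] Sample =
    {
        new MotorSeriesDb { Id = 1, InstallationAltitudeMax = 1000, TemperatureMax = 40 },
        new MotorSeriesDb { Id = 2, InstallationAltitudeMax = 2000, TemperatureMax = 55 },
    };

    // The selector must return a value tuple (a struct), not an anonymous type (a class).
    public static List<(int Id, double Value)> SelectValues(
        IEnumerable<MotorSeriesDb> source,
        Func<MotorSeriesDb, (int Id, double Value)> selector)
        => source.Select(selector).ToList();
}
```

A call would look like TupleSelectorDemo.SelectValues(TupleSelectorDemo.Sample, x => (x.Id, x.TemperatureMax)). One caveat, as far as I know: tuple literals are not allowed inside expression trees (compiler error CS8143), so against an IQueryable the tuple has to be built after the data has been brought into memory.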
If you try this, it could work:
private static void ForEachIterate<T1>(Func<MotorSeriesDb, T1> selectorFunc) where T1 : Tuple<int, double>
{
var portfolioIds = context.PortSelMotorSeries
.Select(selectorFunc)
.ToList();
}

Scalable Contains method for LINQ against a SQL backend

I'm looking for an elegant way to execute a Contains() statement in a scalable way. Please allow me to give some background before I come to the actual question.
The IN statement
In Entity Framework and LINQ to SQL the Contains statement is translated as a SQL IN statement. For instance, from this statement:
var ids = Enumerable.Range(1,10);
var courses = Courses.Where(c => ids.Contains(c.CourseID)).ToList();
Entity Framework will generate
SELECT
[Extent1].[CourseID] AS [CourseID],
[Extent1].[Title] AS [Title],
[Extent1].[Credits] AS [Credits],
[Extent1].[DepartmentID] AS [DepartmentID]
FROM [dbo].[Course] AS [Extent1]
WHERE [Extent1].[CourseID] IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Unfortunately, the IN statement is not scalable. As per MSDN:
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632
which has to do with running out of resources or exceeding expression limits.
But before these errors occur, the IN statement becomes increasingly slow as the number of items grows. I can't find documentation about its growth rate, but in my experience with SQL Server it performs well up to a few thousand items and gets dramatically slower beyond that.
Scalable
We can't always avoid this statement. A JOIN with the source data instead would generally perform much better, but that's only possible when the source data is in the same context. Here I'm dealing with data coming from a client in a disconnected scenario. So I have been looking for a scalable solution. A satisfactory approach turned out to be cutting the operation into chunks:
var courses = ids.ToChunks(1000)
.Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
.SelectMany(x => x).ToList();
(where ToChunks is this little extension method).
This executes the query in chunks of 1000 that all perform well enough. With e.g. 5000 items, 5 queries will run that together are likely to be faster than one query with 5000 items.
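The ToChunks helper itself is not shown in the text, so here is one possible sketch of such a method. The class name and the exact signature are assumptions, not the linked implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkExtensions
{
    // Splits a sequence into consecutive chunks of at most chunkSize items.
    // Hypothetical helper; the extension method linked above may differ.
    public static IEnumerable<IReadOnlyList<T>> ToChunks<T>(this IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterator();

        // Local iterator, so the argument checks above run eagerly.
        IEnumerable<IReadOnlyList<T>> Iterator()
        {
            var chunk = new List<T>(chunkSize);
            foreach (var item in source)
            {
                chunk.Add(item);
                if (chunk.Count == chunkSize)
                {
                    yield return chunk;
                    chunk = new List<T>(chunkSize); // start a fresh list; never reuse a yielded one
                }
            }
            if (chunk.Count > 0)
                yield return chunk; // trailing partial chunk
        }
    }
}
```

On .NET 6 and later, the built-in Enumerable.Chunk(size) does the same job and returns arrays, so a custom helper is only needed on older targets.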
But not DRY
But of course I don't want to scatter this construct all over my code. I am looking for an extension method by which any IQueryable<T> can be transformed into a chunky executing statement. Ideally something like this:
var courses = Courses.Where(c => ids.Contains(c.CourseID))
.AsChunky(1000)
.ToList();
But maybe this
var courses = Courses.ChunkyContains(c => c.CourseID, ids, 1000)
.ToList();
I've given the latter solution a first shot:
public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
this IQueryable<TEntity> query,
Expression<Func<TEntity,TContains>> match,
IEnumerable<TContains> containList,
int chunkSize = 500)
{
return containList.ToChunks(chunkSize)
.Select (chunk => query.Where(x => chunk.Contains(match)))
.SelectMany(x => x);
}
Obviously, the part x => chunk.Contains(match) doesn't compile. But I don't know how to manipulate the match expression into a Contains expression.
Maybe someone can help me make this solution work. And of course I'm open to other approaches to make this statement scalable.
I solved this problem with a slightly different approach a few months ago. Maybe it's a good solution for you too.
I didn't want my solution to change the query itself, so an ids.ChunkContains(p.Id) or a special WhereContains method was not an option. The solution should also be able to combine a Contains with another filter, and to use the same collection multiple times:
db.TestEntities.Where(p => (ids.Contains(p.Id) || ids.Contains(p.ParentId)) && p.Name.StartsWith("Test"))
So I tried to encapsulate the logic in a special ToList method that could rewrite the Expression for a specified collection to be queried in chunks.
var ids = Enumerable.Range(1, 11);
var result = db.TestEntities.Where(p => ids.Contains(p.Id) && p.Name.StartsWith("Test"))
.ToChunkedList(ids,4);
To rewrite the expression tree, I find all Contains method calls on local collections in the query with a few helper classes.
private class ContainsExpression
{
public ContainsExpression(MethodCallExpression methodCall)
{
this.MethodCall = methodCall;
}
public MethodCallExpression MethodCall { get; private set; }
public object GetValue()
{
var parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
return Expression.Lambda<Func<object>>(parent).Compile()();
}
public bool IsLocalList()
{
Expression parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
while (parent != null) {
if (parent is ConstantExpression)
return true;
var member = parent as MemberExpression;
if (member != null) {
parent = member.Expression;
} else {
parent = null;
}
}
return false;
}
}
private class FindExpressionVisitor<T> : ExpressionVisitor where T : Expression
{
public List<T> FoundItems { get; private set; }
public FindExpressionVisitor()
{
this.FoundItems = new List<T>();
}
public override Expression Visit(Expression node)
{
var found = node as T;
if (found != null) {
this.FoundItems.Add(found);
}
return base.Visit(node);
}
}
public static List<T> ToChunkedList<T, TValue>(this IQueryable<T> query, IEnumerable<TValue> list, int chunkSize)
{
var finder = new FindExpressionVisitor<MethodCallExpression>();
finder.Visit(query.Expression);
var methodCalls = finder.FoundItems.Where(p => p.Method.Name == "Contains").Select(p => new ContainsExpression(p)).Where(p => p.IsLocalList()).ToList();
var localLists = methodCalls.Where(p => p.GetValue() == list).ToList();
If the local collection passed in the ToChunkedList method was found in the query expression, I replace the Contains call to the original list with a new call to a temporary list containing the ids for one batch.
if (localLists.Any()) {
var result = new List<T>();
var valueList = new List<TValue>();
var containsMethod = typeof(Enumerable).GetMethods(BindingFlags.Static | BindingFlags.Public)
.Single(p => p.Name == "Contains" && p.GetParameters().Count() == 2)
.MakeGenericMethod(typeof(TValue));
var queryExpression = query.Expression;
foreach (var item in localLists) {
var parameter = new List<Expression>();
parameter.Add(Expression.Constant(valueList));
if (item.MethodCall.Object == null) {
parameter.AddRange(item.MethodCall.Arguments.Skip(1));
} else {
parameter.AddRange(item.MethodCall.Arguments);
}
var call = Expression.Call(containsMethod, parameter.ToArray());
var replacer = new ExpressionReplacer(item.MethodCall,call);
queryExpression = replacer.Visit(queryExpression);
}
var chunkQuery = query.Provider.CreateQuery<T>(queryExpression);
for (int i = 0; i < Math.Ceiling((decimal)list.Count() / chunkSize); i++) {
valueList.Clear();
valueList.AddRange(list.Skip(i * chunkSize).Take(chunkSize));
result.AddRange(chunkQuery.ToList());
}
return result;
}
// if the collection was not found return query.ToList()
return query.ToList();
}
Expression Replacer:
private class ExpressionReplacer : ExpressionVisitor {
private Expression find, replace;
public ExpressionReplacer(Expression find, Expression replace)
{
this.find = find;
this.replace = replace;
}
public override Expression Visit(Expression node)
{
if (node == this.find)
return this.replace;
return base.Visit(node);
}
}
Please allow me to provide an alternative to the Chunky approach.
The technique involving Contains in your predicate works well for:
A constant list of values (non-volatile).
A small list of values.
Contains will do great if your local data has those two characteristics, because this small set of values will be hardcoded in the final SQL query.
The problem begins when your list of values has entropy (is non-constant). As of this writing, Entity Framework (Classic and Core) does not try to parameterize these values in any way, which forces SQL Server to generate a query plan every time it sees a new combination of values in your query. This operation is expensive, and it is aggravated by the overall complexity of your query (e.g. many tables, a lot of values in the list, etc.).
The Chunky approach still suffers from this SQL Server query plan cache pollution problem because it does not parameterize the query. It just splits the cost of creating one big execution plan into smaller ones that are easier for SQL Server to compute (and discard). Furthermore, every chunk adds a round-trip to the database, which increases the time needed to resolve the query.
An Efficient Solution for EF Core
🎉 NEW! QueryableValues EF6 Edition has arrived!
For EF Core keep reading below.
Wouldn't it be nice to have a way of composing local data in your query in a way that's SQL Server friendly? Enter QueryableValues.
I designed this library with these two main goals:
It MUST solve the SQL Server's query plan cache pollution problem ✅
It MUST be fast! ⚡
It has a flexible API that allows you to compose local data provided by an IEnumerable<T> and you get back an IQueryable<T>; just use it as if it were another entity of your DbContext (really), e.g.:
// Sample values.
IEnumerable<int> values = Enumerable.Range(1, 1000);
// Using a Join (query syntax).
var query1 =
from e in dbContext.MyEntities
join v in dbContext.AsQueryableValues(values) on e.Id equals v
select new
{
e.Id,
e.Name
};
// Using Contains (method syntax)
var query2 = dbContext.MyEntities
.Where(e => dbContext.AsQueryableValues(values).Contains(e.Id))
.Select(e => new
{
e.Id,
e.Name
});
You can also compose complex types!
It goes without saying that the provided IEnumerable<T> is only enumerated at the time that your query is materialized (not before), preserving the same behavior of EF Core in this regard.
How Does It Work?
Internally QueryableValues creates a parameterized query and provides your values in a serialized format that is natively understood by SQL Server. This allows your query to be resolved with a single round-trip to the database and avoids creating a new query plan on subsequent executions due to the parameterized nature of it.
Useful Links
Nuget Package
GitHub Repository
Benchmarks
SQL Server Cache Pollution Problem
QueryableValues is distributed under the MIT license
Linqkit to the rescue! Might be a better way that does it directly, but this seems to work fine and makes it pretty clear what's being done. The addition being AsExpandable(), which lets you use the Invoke extension.
using LinqKit;
public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
this IQueryable<TEntity> query,
Expression<Func<TEntity,TContains>> match,
IEnumerable<TContains> containList,
int chunkSize = 500)
{
return containList
.ToChunks(chunkSize)
.Select (chunk => query.AsExpandable()
.Where(x => chunk.Contains(match.Invoke(x))))
.SelectMany(x => x);
}
You might also want to do this:
containList.Distinct()
.ToChunks(chunkSize)
...or something similar so you don't get duplicate results if something like this occurs:
query.ChunkyContains(x => x.Id, new List<int> { 1, 1 }, 1);
Another way would be to build the predicate this way (of course, some parts should be improved, just giving the idea).
public static Expression<Func<TEntity, bool>> ContainsPredicate<TEntity, TContains>(this IEnumerable<TContains> chunk, Expression<Func<TEntity, TContains>> match)
{
return Expression.Lambda<Func<TEntity, bool>>(Expression.Call(
typeof (Enumerable),
"Contains",
new[]
{
typeof (TContains)
},
Expression.Constant(chunk, typeof(IEnumerable<TContains>)), match.Body),
match.Parameters);
}
which you could call in your ChunkyContains method
return containList.ToChunks(chunkSize)
.Select(chunk => query.Where(ContainsPredicate(chunk, match)))
.SelectMany(x => x);
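For reference, here is a self-contained sketch that puts the predicate builder and the chunked query together without LinqKit. It assumes .NET 6 or later (for Enumerable.Chunk, used here in place of ToChunks), and the class name is made up. It has only been reasoned through against an in-memory AsQueryable() source, so whether a given EF provider translates the generated Contains call is something to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ChunkyContainsSketch
{
    // Builds x => chunk.Contains(<body of match>) by reusing match's body and parameter.
    public static Expression<Func<TEntity, bool>> ContainsPredicate<TEntity, TValue>(
        IEnumerable<TValue> chunk, Expression<Func<TEntity, TValue>> match)
    {
        var call = Expression.Call(
            typeof(Enumerable), "Contains", new[] { typeof(TValue) },
            Expression.Constant(chunk, typeof(IEnumerable<TValue>)),
            match.Body);
        return Expression.Lambda<Func<TEntity, bool>>(call, match.Parameters);
    }

    // Runs one query per chunk of values and concatenates the results lazily.
    public static IEnumerable<TEntity> ChunkyContains<TEntity, TValue>(
        this IQueryable<TEntity> query,
        Expression<Func<TEntity, TValue>> match,
        IEnumerable<TValue> values,
        int chunkSize = 500)
    {
        return values
            .Distinct()        // avoid returning the same row once per duplicate value
            .Chunk(chunkSize)  // .NET 6+; swap in a custom chunking helper on older targets
            .SelectMany(chunk => query.Where(ContainsPredicate(chunk, match)));
    }
}
```

Usage would be along the lines of Courses.ChunkyContains(c => c.CourseID, ids, 1000).ToList(). Note that rows come back grouped by chunk, not in the order a single query would return them.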
Using a stored procedure with a table-valued parameter could also work well. In effect, you write a join in the stored procedure between your table / view and the table-valued parameter.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/table-valued-parameters

How to easily convert linq result to Business Object Collection <T>

I have a Business Object Collection (BOC).
I'd like to filter rows using LINQ, but I noticed it returns an IEnumerable, which cannot then be cast to my BOC.
E.g. I cannot do this:
BOC <Client> bocCLients = (BOC <Client>)
from C in ClientsColl where C.ClientId == 100 select C
I've worked around that by looping over the LINQ results and adding the returned objects to my original collection.
I wonder if there is a simpler way?
var bocCLients = ClientsColl.Where(c => c.ClientId == 100).ToList();
Or
var bocCLients = new BOC<Client>(ClientsColl.Where(c => c.ClientId == 100));
Edit
Or maybe an AddRange extension
public static void AddRange<T>(this ICollection<T> colSource, IEnumerable<T> collection)
{
if (colSource is List<T>)
((List<T>)colSource).AddRange(collection); //If List use build in optimized AddRange function
else
{
foreach (var item in collection)
colSource.Add(item);
}
}
This looks like a perfect opportunity to create an extension method. From looking at your question, it appears that ClientsColl already contains objects of type Client. In this case, your solution of a foreach loop is ideal. However, you can encapsulate that solution into an extension method and make it reusable and easy to read.
Here's an example of what it would look like:
public static BOC<T> ToBOC<T>(this IEnumerable<T> sourceCollection)
{
var boc = new BOC<T>();
foreach (T item in sourceCollection)
{
boc.Add(item);
}
return boc;
}
Using this extension method, you would just write your query as follows:
BOC<Client> bocClients =
(
from C in ClientsColl
where C.ClientID == 100
select C
).ToBOC();
EDIT:
To follow up on the idea of the more generic extension method to ICollection, but keeping in line the original question which was to perform a sort of Cast to a specific type of collection, and now having the new information that BOC implements ICollection, here is a more generic extension method and usage to perform the job:
public static TCollection ToICollection<T, TCollection>(this IEnumerable<T> sourceCollection)
where TCollection : ICollection<T>, new()
{
TCollection col = new TCollection();
foreach (T item in sourceCollection)
{
col.Add(item);
}
return col;
}
And usage:
BOC<Client> bocClients2 =
(
from C in ClientsColl
where C.ClientID == 100
select C
).ToICollection<Client, BOC<Client>>();
Does this look more useful? Let me know what you think.

Can I use Action<T> to perform a recursive search and return an IEnumerable of a specified property?

I've written the following code for retrieving the StructureIds from an IEnumerable<Structure>:
Action<Structure> recurse = null;
List<int> structureIds = new List<int>();
recurse = (r) =>
{
structureIds.Add(r.StructureId);
r.Children.ForEach(recurse);
};
IEnumerable<Structure> structures = GetStructures();
structures.ForEach(recurse);
I'd really like to make this generic so I can use it with any IEnumerable, i.e. something like:
public static IEnumerable<TType> GetPropertyValues<TType, TPropertyType>(
this IEnumerable<TType> this, <Property Declaration>)
{
// Generic version of the above code?
}
Can this be done?
Action isn't very Linq'ish. How about Func instead? (Untested code)
public static IEnumerable<TProp> RecurseSelect<TSource, TProp>(
this IEnumerable<TSource> source,
Func<TSource, TProp> propertySelector,
Func<TSource, IEnumerable<TSource>> childrenSelector
)
{
foreach(TSource x in source)
{
yield return propertySelector(x);
IEnumerable<TSource> children = childrenSelector(x);
IEnumerable<TProp> values = children.RecurseSelect(propertySelector, childrenSelector);
foreach(TProp y in values)
{
yield return y;
}
}
}
And then
IEnumerable<Structure> structures = GetStructures();
IEnumerable<int> structureIds = structures.RecurseSelect(
s => s.StructureId,
s => s.Children);
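Since the code above is marked as untested, here is a small self-contained variant for comparison. It swaps the recursion for an explicit stack of enumerators (a different technique from the answer's nested iterators, chosen to avoid one iterator per tree level), and it keeps the same parent-before-children order. The Structure class is a hypothetical stand-in for the type in the question, and the method name is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Structure type in the question.
public sealed class Structure
{
    public int StructureId { get; set; }
    public List<Structure> Children { get; } = new List<Structure>();
}

public static class TreeExtensions
{
    // Depth-first, parent-before-children traversal using an explicit stack.
    public static IEnumerable<TProp> RecurseSelectIterative<TSource, TProp>(
        this IEnumerable<TSource> roots,
        Func<TSource, TProp> propertySelector,
        Func<TSource, IEnumerable<TSource>> childrenSelector)
    {
        var stack = new Stack<IEnumerator<TSource>>();
        stack.Push(roots.GetEnumerator());
        try
        {
            while (stack.Count > 0)
            {
                var top = stack.Peek();
                if (top.MoveNext())
                {
                    var item = top.Current;
                    yield return propertySelector(item);
                    // Descend into the children before continuing with siblings.
                    stack.Push(childrenSelector(item).GetEnumerator());
                }
                else
                {
                    stack.Pop().Dispose(); // this level is exhausted
                }
            }
        }
        finally
        {
            // Dispose whatever is left if the caller stops enumerating early.
            while (stack.Count > 0)
                stack.Pop().Dispose();
        }
    }
}
```

It is called the same way as RecurseSelect: structures.RecurseSelectIterative(s => s.StructureId, s => s.Children).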
Your problem is that you're not adding each item to a list, you're adding a property of each item. That property will only be available for a Structure, and not any other type you might reuse the code with.
You also don't have a mechanism for getting the children of your other classes. (the r.Children property you use).
Your two solutions would be to use interfaces (that is, define IHasChildren and IGetProperty) that could be used as base types for a simple algorithm, or you could pass in functions to your method that allow this to be more freely calculated. For example, your method signature might need to be this:
public static IEnumerable<TPropertyType> GetPropertyValues<TType, TPropertyType>
(this IEnumerable<TType> rootItem, Func<TType, IEnumerable<TType>> getChildren, Func<TType, TPropertyType> getIdValue)
... but that's not going to be very pretty!

LINQ equivalent of foreach for IEnumerable<T>

I'd like to do the equivalent of the following in LINQ, but I can't figure out how:
IEnumerable<Item> items = GetItems();
items.ForEach(i => i.DoStuff());
What is the real syntax?
There is no ForEach extension for IEnumerable; only for List<T>. So you could do
items.ToList().ForEach(i => i.DoStuff());
Alternatively, write your own ForEach extension method:
public static void ForEach<T>(this IEnumerable<T> enumeration, Action<T> action)
{
foreach(T item in enumeration)
{
action(item);
}
}
Fredrik has provided the fix, but it may be worth considering why this isn't in the framework to start with. I believe the idea is that the LINQ query operators should be side-effect-free, fitting in with a reasonably functional way of looking at the world. Clearly ForEach is exactly the opposite - a purely side-effect-based construct.
That's not to say this is a bad thing to do - just thinking about the philosophical reasons behind the decision.
Update 7/17/2012: Apparently as of C# 5.0, the behavior of foreach described below has been changed and "the use of a foreach iteration variable in a nested lambda expression no longer produces unexpected results." This answer does not apply to C# ≥ 5.0.
@Jon Skeet and everyone who prefers the foreach keyword.
The problem with "foreach" in C# prior to 5.0, is that it is inconsistent with how the equivalent "for comprehension" works in other languages, and with how I would expect it to work (personal opinion stated here only because others have mentioned their opinion regarding readability). See all of the questions concerning "Access to modified closure"
as well as "Closing over the loop variable considered harmful". This is only "harmful" because of the way "foreach" is implemented in C#.
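A minimal sketch of the capture behavior being discussed, assuming a C# 5.0 or later compiler (the class and method names are made up). The for loop still shares one variable across all iterations, which is how foreach also behaved before 5.0; the foreach loop now gets a fresh variable per iteration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureCaptureDemo
{
    // Every lambda captures the same loop variable i,
    // so they all observe its final value once the loop has ended.
    public static List<int> CaptureWithFor()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);
        return funcs.Select(f => f()).ToList(); // 3, 3, 3
    }

    // With C# 5.0 and later, each iteration has its own copy of value.
    public static List<int> CaptureWithForeach()
    {
        var funcs = new List<Func<int>>();
        foreach (var value in Enumerable.Range(0, 3))
            funcs.Add(() => value);
        return funcs.Select(f => f()).ToList(); // 0, 1, 2
    }
}
```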
Take the following examples using the functionally equivalent extension method to that in @Fredrik Kalseth's answer.
public static class Enumerables
{
public static void ForEach<T>(this IEnumerable<T> @this, Action<T> action)
{
foreach (T item in @this)
{
action(item);
}
}
}
Apologies for the overly contrived example. I'm only using Observable because it's not entirely far fetched to do something like this. Obviously there are better ways to create this observable, I am only attempting to demonstrate a point. Typically the code subscribed to the observable is executed asynchronously and potentially in another thread. If using "foreach", this could produce very strange and potentially non-deterministic results.
The following test using "ForEach" extension method passes:
[Test]
public void ForEachExtensionWin()
{
//Yes, I know there is an Observable.Range.
var values = Enumerable.Range(0, 10);
var observable = Observable.Create<Func<int>>(source =>
{
values.ForEach(value =>
source.OnNext(() => value));
source.OnCompleted();
return () => { };
});
//Simulate subscribing and evaluating Funcs
var evaluatedObservable = observable.ToEnumerable().Select(func => func()).ToList();
//Win
Assert.That(evaluatedObservable,
Is.EquivalentTo(values.ToList()));
}
The following fails with the error:
Expected: equivalent to < 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 >
But was: < 9, 9, 9, 9, 9, 9, 9, 9, 9, 9 >
[Test]
public void ForEachKeywordFail()
{
//Yes, I know there is an Observable.Range.
var values = Enumerable.Range(0, 10);
var observable = Observable.Create<Func<int>>(source =>
{
foreach (var value in values)
{
//If you have resharper, notice the warning
source.OnNext(() => value);
}
source.OnCompleted();
return () => { };
});
//Simulate subscribing and evaluating Funcs
var evaluatedObservable = observable.ToEnumerable().Select(func => func()).ToList();
//Fail
Assert.That(evaluatedObservable,
Is.EquivalentTo(values.ToList()));
}
You could use the FirstOrDefault() extension, which is available for IEnumerable<T>. By returning false from the predicate, it will be run for each element but will not care that it doesn't actually find a match. This will avoid the ToList() overhead.
IEnumerable<Item> items = GetItems();
items.FirstOrDefault(i => { i.DoStuff(); return false; });
Keep your Side Effects out of my IEnumerable
I'd like to do the equivalent of the following in LINQ, but I can't figure out how:
As others have pointed out here and abroad LINQ and IEnumerable methods are expected to be side-effect free.
Do you really want to "do something" to each item in the IEnumerable? Then foreach is the best choice. People aren't surprised when side-effects happen here.
foreach (var i in items) i.DoStuff();
I bet you don't want a side-effect
However in my experience side-effects are usually not required. More often than not there is a simple LINQ query waiting to be discovered accompanied by a StackOverflow.com answer by either Jon Skeet, Eric Lippert, or Marc Gravell explaining how to do what you want!
Some examples
If you are actually just aggregating (accumulating) some value then you should consider the Aggregate extension method.
items.Aggregate(initial, (acc, x) => ComputeAccumulatedValue(acc, x));
Perhaps you want to create a new IEnumerable from the existing values.
items.Select(x => Transform(x));
Or maybe you want to create a look-up table:
items.ToLookup(x => GetTheKey(x))
The list (pun not entirely intended) of possibilities goes on and on.
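As a concrete illustration of the three side-effect-free alternatives above, here is a small sketch over a list of strings. The Summarize method, its return shape, and the choice of string data are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectFreeDemo
{
    // Computes three results from the same input without mutating anything outside the queries.
    // Assumes every string is non-empty, because the lookup keys on the first character.
    public static (int Total, List<int> Lengths, ILookup<char, string> ByInitial) Summarize(
        IEnumerable<string> items)
    {
        var list = items.ToList();

        // Aggregate: fold the sequence into a single accumulated value.
        int total = list.Aggregate(0, (acc, s) => acc + s.Length);

        // Select: project each item into a new sequence.
        List<int> lengths = list.Select(s => s.Length).ToList();

        // ToLookup: group items under a computed key.
        ILookup<char, string> byInitial = list.ToLookup(s => s[0]);

        return (total, lengths, byInitial);
    }
}
```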
I took Fredrik's method and modified the return type.
This way, the method supports deferred execution like other LINQ methods.
EDIT: If this wasn't clear, any usage of this method must end with ToList() or any other way to force the method to work on the complete enumerable. Otherwise, the action would not be performed!
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> enumeration, Action<T> action)
{
foreach (T item in enumeration)
{
action(item);
yield return item;
}
}
And here's the test to help see it:
[Test]
public void TestDefferedExecutionOfIEnumerableForEach()
{
IEnumerable<char> enumerable = new[] {'a', 'b', 'c'};
var sb = new StringBuilder();
enumerable
.ForEach(c => sb.Append("1"))
.ForEach(c => sb.Append("2"))
.ToList();
Assert.That(sb.ToString(), Is.EqualTo("121212"));
}
If you remove the ToList() in the end, you will see the test failing since the StringBuilder contains an empty string. This is because no method forced the ForEach to enumerate.
So many answers, yet ALL fail to pinpoint one very significant problem with a custom generic ForEach extension: Performance! And more specifically, memory usage and GC.
Consider the sample below. Targeting .NET Framework 4.7.2 or .NET Core 3.1.401, configuration is Release and platform is Any CPU.
public static class Enumerables
{
public static void ForEach<T>(this IEnumerable<T> @this, Action<T> action)
{
foreach (T item in @this)
{
action(item);
}
}
}
class Program
{
private static void NoOp(int value) {}
static void Main(string[] args)
{
var list = Enumerable.Range(0, 10).ToList();
for (int i = 0; i < 1000000; i++)
{
// WithLinq(list);
// WithoutLinqNoGood(list);
WithoutLinq(list);
}
}
private static void WithoutLinq(List<int> list)
{
foreach (var item in list)
{
NoOp(item);
}
}
private static void WithLinq(IEnumerable<int> list) => list.ForEach(NoOp);
private static void WithoutLinqNoGood(IEnumerable<int> enumerable)
{
foreach (var item in enumerable)
{
NoOp(item);
}
}
}
At a first glance, all three variants should perform equally well. However, when the ForEach extension method is called many, many times, you will end up with garbage that implies a costly GC. In fact, having this ForEach extension method on a hot path has been proven to totally kill performance in our loop-intensive application.
Similarly, the weakly typed foreach loop will also produce garbage, but it will still be faster and less memory-intensive than the ForEach extension (which also suffers from a delegate allocation).
[Figures omitted: memory usage of the strongly typed foreach, the weakly typed foreach, and the ForEach extension]
Analysis
For a strongly typed foreach the compiler is able to use any optimized enumerator (e.g. value based) of a class, whereas a generic ForEach extension must fall back to a generic enumerator which will be allocated on each run. Furthermore, the actual delegate will also imply an additional allocation.
You would get similar bad results with the WithoutLinqNoGood method. There, the argument is of type IEnumerable<int> instead of List<int> implying the same type of enumerator allocation.
Below are the relevant differences in IL. A value based enumerator is certainly preferable!
IL_0001: callvirt instance class
[mscorlib]System.Collections.Generic.IEnumerator`1<!0>
class [mscorlib]System.Collections.Generic.IEnumerable`1<!!T>::GetEnumerator()
vs
IL_0001: callvirt instance valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<!0>
class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
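The same difference can be observed from C# without reading IL. The following sketch (class and method names are made up) only checks the types involved: List<T> exposes a struct enumerator, and asking for the enumerator through IEnumerable<T> hands back that same struct as an interface reference, which means it has been boxed. Whether the box actually costs a heap allocation can depend on the runtime and JIT version, so this shows the mechanism, not a measurement.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorKindDemo
{
    // List<T>.GetEnumerator() returns List<T>.Enumerator, which is a struct.
    public static bool ListEnumeratorIsStruct()
        => typeof(List<int>.Enumerator).IsValueType;

    // Through the interface, the enumerator is only visible as IEnumerator<int>,
    // a reference type, so a struct enumerator has to be boxed to be returned.
    public static Type RuntimeEnumeratorType(IEnumerable<int> source)
    {
        using IEnumerator<int> e = source.GetEnumerator();
        return e.GetType();
    }
}
```

For a non-empty List<int>, RuntimeEnumeratorType reports List<int>.Enumerator, the same struct type now living behind an interface reference. A non-empty list is used deliberately, since some runtime versions may return a different shared enumerator for empty lists.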
Conclusion
The OP asked how to call ForEach() on an IEnumerable<T>. The original answer clearly shows how it can be done. Sure you can do it, but then again; my answer clearly shows that you shouldn't.
Verified the same behavior when targeting .NET Core 3.1.401 (compiling with Visual Studio 16.7.2).
If you want to act as the enumeration rolls you should yield each item.
public static class EnumerableExtensions
{
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> enumeration, Action<T> action)
{
foreach (var item in enumeration)
{
action(item);
yield return item;
}
}
}
There is an experimental release by Microsoft of Interactive Extensions to LINQ (also on NuGet, see RxTeams's profile for more links). The Channel 9 video explains it well.
Its docs are only provided in XML format. I have run this documentation in Sandcastle to allow it to be in a more readable format. Unzip the docs archive and look for index.html.
Among many other goodies, it provides the expected ForEach implementation. It allows you to write code like this:
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8 };
numbers.ForEach(x => Console.WriteLine(x*x));
With PLINQ (available since .NET 4.0), you can call
IEnumerable<T>.AsParallel().ForAll()
to do a parallel foreach loop on an IEnumerable.
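A minimal sketch of that call, assuming the action only touches thread-safe state. The ParallelSquares class is a hypothetical name used for illustration. ForAll runs the action on multiple threads and in no guaranteed order, so the results go into a ConcurrentBag rather than a List.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class ParallelSquares
{
    public static int SumOfSquares(IEnumerable<int> numbers)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var squares = new ConcurrentBag<int>();

        // ForAll is defined on ParallelQuery<T>, hence the AsParallel() call.
        numbers.AsParallel().ForAll(x => squares.Add(x * x));

        // Order is not preserved, so only order-independent results are reliable.
        return squares.Sum();
    }
}
```

Calling ParallelSquares.SumOfSquares(new[] { 1, 2, 3, 4 }) returns 30 regardless of the order in which the items were processed.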
The purpose of ForEach is to cause side effects.
IEnumerable is for lazy enumeration of a set.
This conceptual difference is quite visible when you consider it.
SomeEnumerable.ForEach(item=>DataStore.Synchronize(item));
This won't execute until you call Count(), ToList(), or something else that enumerates it.
It clearly is not what is expressed.
You should use the IEnumerable extensions for setting up chains of iteration, defining content by their respective sources and conditions. Expression Trees are powerful and efficient, but you should learn to appreciate their nature, rather than programming around them and overriding lazy evaluation just to save a few characters.
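To make the point concrete, here is a small sketch of the deferred behavior, assuming a ForEach that is written as an iterator. LazyForEach and DeferredDemo are hypothetical names, not part of any library. The side effect is counted before and after the sequence is actually enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical lazy ForEach: the action runs only as items are pulled.
    public static IEnumerable<T> LazyForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
            yield return item;
        }
    }

    // Returns how many times the action ran before and after enumeration.
    public static (int before, int after) CountSideEffects()
    {
        int calls = 0;
        var query = new[] { 1, 2, 3 }.LazyForEach(_ => calls++);

        int before = calls; // nothing has been enumerated yet
        query.ToList();     // forces enumeration, so the action runs per item

        return (before, calls);
    }
}
```

CountSideEffects() returns (0, 3): building the query causes no side effects at all, which is exactly why a statement that reads like "synchronize every item" is misleading when nothing enumerates it.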
Many people mentioned it, but I had to write it down. Isn't this most clear/most readable?
IEnumerable<Item> items = GetItems();
foreach (var item in items) item.DoStuff();
Short and simple(st).
Now we have the option of...
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 4;
#if DEBUG
parallelOptions.MaxDegreeOfParallelism = 1;
#endif
Parallel.ForEach(bookIdList, parallelOptions, bookID => UpdateStockCount(bookID));
Of course, this opens up a whole new can of threadworms.
As numerous answers already point out, you can easily add such an extension method yourself. If you don't want to do that, I'm not aware of anything like this in the BCL, but there is still an option in the System namespace if you already have a reference to Reactive Extensions (and if you don't, you should):
using System.Reactive.Linq;
items.ToObservable().Subscribe(i => i.DoStuff());
Although the method names are a bit different, the end result is exactly what you're looking for.
ForEach can also be chained: just return the sequence to the pipeline after the action, so the call stays fluent.
Employees.ForEach(e=>e.Act_A())
.ForEach(e=>e.Act_B())
.ForEach(e=>e.Act_C());
Orders //just for demo
.ForEach(o=> o.EmailBuyer() )
.ForEach(o=> o.ProcessBilling() )
.ForEach(o=> o.ProcessShipping());
//conditional
Employees
.ForEach(e=> { if(e.Salary<1000) e.Raise(0.10);})
.ForEach(e=> { if(e.Age >70 ) e.Retire();});
An eager version of the implementation:
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> enu, Action<T> action)
{
foreach (T item in enu) action(item);
return enu; // make action Chainable/Fluent
}
Edit: a lazy version uses yield return, like this:
public static IEnumerable<T> ForEachLazy<T>(this IEnumerable<T> enu, Action<T> action)
{
foreach (var item in enu)
{
action(item);
yield return item;
}
}
The lazy version NEEDS to be materialized, with ToList() for example; otherwise, nothing happens. See the great comments from ToolmakerSteve below.
IQueryable<Product> query = Products.Where(...);
query.ForEachLazy(t => t.Price = t.Price + 1.00)
.ToList(); //without this line, below SubmitChanges() does nothing.
SubmitChanges();
I keep both ForEach() and ForEachLazy() in my library.
Inspired by Jon Skeet, I have extended his solution with the following:
Extension Method:
public static void Execute<TSource, TKey>(this IEnumerable<TSource> source, Action<TKey> applyBehavior, Func<TSource, TKey> keySelector)
{
foreach (var item in source)
{
var target = keySelector(item);
applyBehavior(target);
}
}
Client:
var jobs = new List<Job>()
{
new Job { Id = "XAML Developer" },
new Job { Id = "Assassin" },
new Job { Id = "Narco Trafficker" }
};
jobs.Execute(ApplyFilter, j => j.Id);
.
.
.
public void ApplyFilter(string filterId)
{
Debug.WriteLine(filterId);
}
This "functional approach" abstraction leaks big time. Nothing on the language level prevents side effects. As long as you can make it call your lambda/delegate for every element in the container - you will get the "ForEach" behavior.
Here, for example, is one way of merging srcDictionary into destDictionary (if the key already exists, its value is overwritten).
This is a hack and should not be used in any production code.
var b = srcDictionary.Select(
x=>
{
destDictionary[x.Key] = x.Value;
return true;
}
).Count();
MoreLinq has IEnumerable<T>.ForEach and a ton of other useful extensions. It's probably not worth taking the dependency just for ForEach, but there's a lot of useful stuff in there.
https://www.nuget.org/packages/morelinq/
https://github.com/morelinq/MoreLINQ
I respectfully disagree with the notion that LINQ extension methods should be side-effect free (not least because they aren't: any delegate can perform side effects).
Consider the following:
public class Element {}
public enum ProcessType
{
This = 0, That = 1, SomethingElse = 2
}
public class Class1
{
private Dictionary<ProcessType, Action<Element>> actions =
new Dictionary<ProcessType,Action<Element>>();
public Class1()
{
actions.Add( ProcessType.This, DoThis );
actions.Add( ProcessType.That, DoThat );
actions.Add( ProcessType.SomethingElse, DoSomethingElse );
}
// Element actions:
// This example defines 3 distinct actions
// that can be applied to individual elements,
// but for the sake of the argument, make
// no assumption about how many distinct
// actions there may be; there could
// possibly be many more.
public void DoThis( Element element )
{
// Do something to element
}
public void DoThat( Element element )
{
// Do something to element
}
public void DoSomethingElse( Element element )
{
// Do something to element
}
public void Apply( ProcessType processType, IEnumerable<Element> elements )
{
Action<Element> action = null;
if( ! actions.TryGetValue( processType, out action ) )
throw new ArgumentException("processType");
foreach( var element in elements )
action(element);
}
}
What the example shows is really just a kind of late binding that allows one to invoke one of many possible actions having side effects on a sequence of elements, without having to write a big switch construct to decode the value that defines the action and translate it into its corresponding method.
To stay fluent one can use such a trick:
GetItems()
.Select(i => new Action(i.DoStuff))
.Aggregate((a, b) => a + b)
.Invoke();
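The trick relies on delegate combination: a + b on two Action values yields a multicast delegate that invokes a and then b. A self-contained sketch of the same idea follows; CombinedActions and its log list are hypothetical names for illustration. It uses the seeded overload of Aggregate with a no-op action, because the unseeded overload throws InvalidOperationException on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinedActions
{
    public static List<string> Run(IEnumerable<string> names)
    {
        var log = new List<string>();

        // Build one multicast delegate out of one Action per item.
        // The no-op seed keeps this safe when 'names' is empty.
        Action combined = names
            .Select(n => new Action(() => log.Add(n)))
            .Aggregate((Action)(() => { }), (a, b) => a + b);

        // Invokes the combined actions in the order they were added.
        combined();
        return log;
    }
}
```

Note that the whole sequence is enumerated to build the delegate before any action runs, so this is an eager approach even though it reads like a pipeline.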
For VB.NET you should use:
listVariable.ForEach(Sub(i) i.Property = "Value")
Yet another ForEach Example
public static IList<AddressEntry> MapToDomain(IList<AddressModel> addresses)
{
var workingAddresses = new List<AddressEntry>();
addresses.Select(a => a).ToList().ForEach(a => workingAddresses.Add(AddressModelMapper.MapToDomain(a)));
return workingAddresses;
}

Resources