Executing a certain action for all elements in an Enumerable<T> - linq

I have an Enumerable<T> and am looking for a method that allows me to execute an action for each element, kind of like Select but then for side-effects. Something like:
string[] Names = ...;
Names.each(s => Console.Writeline(s));
or
Names.each(s => GenHTMLOutput(s));
// (where GenHTMLOutput cannot for some reason receive the enumerable itself as a parameter)
I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.

A quick-and-easy way to get this is:
Names.ToList().ForEach(e => ...);

You are looking for the ever-elusive ForEach that currently only exists on the List generic collection. There are many discussions online about whether Microsoft should or should not add this as a LINQ method. Currently, you have to roll your own:
public static void ForEach<T>(this IEnumerable<T> value, Action<T> action)
{
foreach (T item in value)
{
action(item);
}
}
While the All() method provides similar abilities, it's use-case is for performing a predicate test on every item rather than an action. Of course, it can be persuaded to perform other tasks but this somewhat changes the semantics and would make it harder for others to interpret your code (i.e. is this use of All() for a predicate test or an action?).

Disclaimer: This post no longer resembles my original answer, but rather incorporates the some seven years experience I've gained since. I made the edit because this is a highly-viewed question and none of the existing answers really covered all the angles. If you want to see my original answer, it's available in the revision history for this post.
The first thing to understand here is C# linq operations like Select(), All(), Where(), etc, have their roots in functional programming. The idea was to bring some of the more useful and approachable parts of functional programming to the .Net world. This is important, because a key tenet of functional programming is for operations to be free of side effects. It's hard to understate this. However, in the case of ForEach()/each(), side effects are the entire purpose of the operation. Adding each() or ForEach() is not just outside the functional programming scope of the other linq operators, but in direct opposition to them.
But I understand this feels unsatisfying. It may help explain why ForEach() was omitted from the framework, but fails to address the real issue at hand. You have a real problem you need to solve. Why should all this ivory tower philosophy get in the way of something that might be genuinely useful?
Eric Lippert, who was on the C# design team at the time, can help us out here. He recommends using a traditional foreach loop:
[ForEach()] adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach(foo=>{ statement involving foo; });
His point is, when you look closely at your syntax options, you don't gain anything new from a ForEach() extension versus a traditional foreach loop. I partially disagree. Imagine you have this:
foreach(var item in Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate))
{
// now do something
}
This code obfuscates meaning, because it separates the foreach keyword from the operations in the loop. It also lists the loop command prior to the operations that define the sequence on which the loop operates. It feels much more natural to want to have those operations come first, and then have the the loop command at the end of the query definition. Also, the code is just ugly. It seems like it would be much nicer to be able to write this:
Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate)
.ForEach(item =>
{
// now do something
});
However, even here, I eventually came around to Eric's point of view. I realized code like you see above is calling out for an additional variable. If you have a complicated set of LINQ expressions like that, you can add valuable information to your code by first assigning the result of the LINQ expression to a new variable:
var queryForSomeThing = Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expressions(to => evaluate);
foreach(var item in queryForSomeThing)
{
// now do something
}
This code feels more natural. It puts the foreach keyword back next to the rest of the loop, and after the query definition. Most of all, the variable name can add new information that will be helpful to future programmers trying to understand the purpose of the LINQ query. Again, we see the desired ForEach() operator really added no new expressive power to the language.
However, we are still missing two features of a hypothetical ForEach() extension method:
It's not composable. I can't add a further .Where() or GroupBy() or OrderBy() after a foreach loop inline with the rest of the code, without creating a new statement.
It's not lazy. These operations happen immediately. It doesn't allow me to, say, have a form where a user chooses an operation as one field in a larger screen that is not acted on until the user presses a command button. This form might allow the user to change their mind before executing the command. This is perfectly normal (easy even) with a LINQ query, but not as simple with a foreach.
(FWIW, most naive .ForEach() implementations also have these issues. But it's possible to craft one without them.)
You could, of course, make your own ForEach() extension method. Several other answers have implementations of this method already; it's not all that complicated. However, I feel like it's unnecessary. There's already an existing method that fits what we want to do from both semantic and operational standpoints. Both of the missing features above can be addressed by use of the existing Select() operation.
Select() fits the kind of transformation or projection described by both of the examples above. Keep in mind, though, that I would still avoid creating side effects. The call to Select() should return either new objects or projections from the originals. This can sometimes be aided through the use of an anonymous type or dynamic object (if and only if necessary). If you need the results to persist in, say, an original list variable, you can always call .ToList() and assign it back to your original variable. I'll add here that I prefer working with IEnumerable<T> variables as much as possible over more concrete types.
myList = myList.Select(item => new SomeType(item.value1, item.value2 *4)).ToList();
In summary:
Just stick with foreach most of the time.
When foreach really won't do (which probably isn't as often as you think), use Select()
When you need to use Select(), you can still generally avoid (program-visible) side effects, possibly by projecting to an anonymous type.
Avoid the crutch of calling ToList(). You don't need it as much as you might think, and it can have significant negative consequence for performance and memory use.

Unfortunately there is no built-in way to do this in the current version of LINQ. The framework team neglected to add a .ForEach extension method. There's a good discussion about this going on right now on the following blog.
http://blogs.msdn.com/kirillosenkov/archive/2009/01/31/foreach.aspx
It's rather easy to add one though.
public static void ForEach<T>(this IEnumerable<T> enumerable, Action<T> action) {
foreach ( var cur in enumerable ) {
action(cur);
}
}

You cannot do this right away with LINQ and IEnumerable - you need to either implement your own extension method, or cast your enumeration to an array with LINQ and then call Array.ForEach():
Array.ForEach(MyCollection.ToArray(), x => x.YourMethod());
Please note that because of the way value types and structs work, if the collection is of a value type and you modify the elements of the collection this way, it will have no effect on the elements of the original collection.

Because LINQ is designed to be a query feature and not an update feature you will not find an extension which executes methods on IEnumerable<T> because that would allow you to execute a method (potentially with side effects). In this case you may as well just stick with
foreach(string name in Names)
Console.WriteLine(name);

Using Parallel Linq:
Names.AsParallel().ForAll(name => ...)

Well, you can also use the standard foreach keyword, just format it into a oneliner:
foreach(var n in Names.Where(blahblah)) DoStuff(n);
Sorry, thought this option deserves to be here :)

There is a ForEach method off of List. You could convert the Enumerable to List by calling the .ToList() method, and then call the ForEach method off of that.
Alternatively, I've heard of people defining their own ForEach method off of IEnumerable. This can be accomplished by essentially calling the ForEach method, but instead wrapping it in an extension method:
public static class IEnumerableExtensions
{
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> _this, Action<T> del)
{
List<T> list = _this.ToList();
list.ForEach(del);
return list;
}
}

As mentioned before ForEach extension will do the fix.
My tip for the current question is how to execute the iterator
[I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.]
Check this
_= Names.Select(s=> { Console.WriteLine(s); return 0; }).Count();
Try it!

Related

Is there a DSL or declarative system for TPL Dataflow?

Is there any DSL or other fully- or partially-declarative mechanism for declaring TPL Dataflow flows? Or is the best (only) practice to just wire them up in code?
Failing that, is there any DSL or other fully- or partially-declarative mechanism for using any dataflow library that I could use as a model and/or source of ideas?
(I've searched without success so maybe one doesn't exist ... or maybe I didn't find it.)
Update: To answer #svick below as to why I want this and what do I gain by it:
First, I just like a sparser syntax that more clearly shows the flow rather than the details. I think
downloadString => createWordList => filterWordList => findPalindromes;
is preferable to
downloadString.LinkTo(createWordList);
createWordList.LinkTo(filterWordList);
filterWordList.LinkTo(findPalindromes);
findPalindromes.LinkTo(printPalindrome);
with its repeated names and extra punctuation. Similar to the way you'd rather use the dot DSL to describe a DAG than a bunch of calls to the Visio DOM API. You can imagine a syntax for network flows, as well as pipelines, such that network flows in particular would be very clear. That may not seem compelling, of course, but I like it.
Second, I think that with a DSL you might be able to persist the DSL description, e.g., as a field in a row in a database, and then instantiate it later. Though perhaps that's a different capability entirely.
Let's start with the relevant facts and work from there:
There isn't anything like this for TPL Dataflow yet.
There isn't a good way of embedding a DSL into C#. The common compilers are not extensible and it would be hard to access local variables from a string-based DSL.
The are several limitations to operators in C#, but the most significant here is that operators can't be generic. This means that the sparser syntax either wouldn't be type-safe (which is unacceptable to me), or it can't use overloaded operators.
The IDisposable returned from LinkTo() that can be used to break the created link isn't used that often, so it doesn't have to be supported. (Or maybe the expression that sets up the flow could return a single IDisposable that breaks the whole flow?)
Because of this, I think the best that can be done is something like:
downloadString.Link(createWordList).Link(filterWordList).Link(findPalindromes);
This avoids the repetition of LinkTo(), but is not much better.
The implementation of the simple form of this is mostly trivial:
public static class DataflowLinkExtensions
{
public static ISourceBlock<TTarget> Link<TSource, TTarget>(
this ISourceBlock<TSource> source,
IPropagatorBlock<TSource, TTarget> target)
{
source.LinkTo(
target,
new DataflowLinkOptions { PropagateCompletion = true });
return target;
}
public static void Link<TSource>(
this ISourceBlock<TSource> source, ITargetBlock<TSource> target)
{
source.LinkTo(
target,
new DataflowLinkOptions { PropagateCompletion = true });
}
}
I chose to set PropagateCompletion to true, because I think that makes the most sense here. But it could also be an option of Link().
I think most of the alternative linking operators of Axum are not relevant to TPL Dataflow, but linking multiple blocks to or from the same block could be done by taking a collection or array as one of the parameters of Link():
new[] { source1, source2 }.Link(target);
source.Link(target1, target2);
If Link() actually returned something that represents the whole flow (similar to Encapsulate()), you could combine this to create more complicated flows, like:
source.Link(propagator1.Link(target1), target2);

IEnumerable<T>.Count() vs List<T>.Count with Entity Framework

I am retrieving a list of items using Entity Framework and if there are some items retrieved I do something with them.
var items = db.MyTable.Where(t => t.Expiration < DateTime.Now).ToList();
if(items.Count != 0)
{
// Do something...
}
The if statement could also be written as
if(items.Count() != 0)
{
// Do something...
}
In the first case, the .Count is a List<T>.Count property. In the second case, the .Count() is IEnumerable<T>.Count() extension method.
Although both approaches achieve the same result, however, is one more preferred than the other? (Possibly some difference in performance?)
Enumerable.Count<T> (the extension method for IEnumerable<T>) just calls Count if the underlying type is an ICollection<T>, so for List<T> there is no difference.
Queryable.Count<T> (the extension method for IQueryable<T>) will use the underlying query provider, which in many cases will push the count down to the actual SQL, which will perform faster than counting the objects in memory.
If a filter is applied (e.g. Count(i => i.Name = "John")) or if the underlying type is not an ICollection<T>, the collection is enumerated to compute the count.
is one more preferred than the other?
I generally prefer to use Count() since 1) it's more portable (the underlying type can be anything that implements IEnumerable<T> or IQueryable<T>) and 2) it's easier to add a filter later if necessary.
As Tim states in his comment, I also prefer using Any() to Count() > 0 since it doesn't have to actually count the items - it will just check for the existence of one item. Conversely I use !Any() instead of Count() == 0.
It depends on the underlying collection and where Linq will be pulling from. For example if it's SQL then using .ToList() will cause the query to pull back the entire list, and then count it. However, the .Count() extension method will translate it into a SQL COUNT statement on the database side. In which case there will be an obvious performance difference.
For just a standard List or Collection it's as stated in D. Stanley's answer.
I would say that it depends on what's going on inside the if block. If you're simply doing the check to determine whether to perform a sequence of operations on the underlying enumeration, then it's probably not needed in any event. Simply iterate over the enumeration (omitting ToList as well). If you're not using the collection inside the if block, then you should avoid using ToList and definitely use Any over any Count/Count() method.
Once you've performed the ToList then you're no longer using Entity Framework and I expect that Count() is only marginally slower than Count since, if the underlying collection is ICollection<T> it defers to that implementation. The only overhead would be determining whether it implements that interface.
http://msdn.microsoft.com/en-us/library/bb338038.aspx
Remarks:
If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

Workarounds for using custom methods/extension methods in LINQ to Entities

I have defined a GenericRepository class which does the db interaction.
protected GenericRepository rep = new GenericRepository();
And in my BLL classes, I can query the db like:
public List<Album> GetVisibleAlbums(int accessLevel)
{
return rep.Find<Album>(a => a.AccessLevel.BinaryAnd(accessLevel)).ToList();
}
BinaryAnd is an extension method which checks two int values bit by bit. e.g. AccessLevel=5 => AccessLevel.BinaryAnd(5) and AccessLevel.binaryAnd(1) both return true.
However I cannot use this extension method in my LINQ queries. I get a runtime error as follows:
LINQ to Entities does not recognize the method 'Boolean BinaryAnd(System.Object, System.Object)' method, and this method cannot be translated into a store expression.
Also tried changing it to a custom method but no luck. What are the workarounds?
Should I get all the albums and then iterate them through a foreach loop and pick those which match the AccessLevels?
I realize this already has an accepted answer, I just thought I'd post this in case someone wanted to try writing a LINQ expression interceptor.
So... here is what I did to make translatable custom extension methods: Code Sample
I don't believe this to be a finished solution, but it should hopefully provide a good starting point for anyone brave enough to see it through to completion.
You can only use the core extension methods and CLR methods defined for your EF provider when using Entity Framework and queries on IQueryable<T>. This is because the query is translated directly to SQL code and run on the server.
You can stream the entire collection (using .ToEnumerable()) then query this locally, or convert this to a method that is translatable directly to SQL by your provider.
That being said, basic bitwise operations are supported:
The bitwise AND, OR, NOT, and XOR operators are also mapped to canonical functions when the operand is a numeric type.
So, if you rewrite this to not use a method, and just do the bitwise operation on the value directly, it should work as needed. Try something like the following:
public List<Album> GetVisibleAlbums(int accessLevel)
{
return rep.Find<Album>(a => (a.AccessLevel & accessLevel > 0)).ToList();
}
(I'm not sure exactly how your current extension method works - the above would check to see if any of the flags come back true, which seems to match your statement...)
There are ways to change the linq query just before EF translates it to SQL, at that moment you'd have to translate your ''foreign'' method into a construct translatable by EF.
See an previous question of mine How to wrap Entity Framework to intercept the LINQ expression just before execution? and mine EFWrappableFields extension which does just this for wrapped fields.

Do you ToList()?

Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as it's default (which is not IEnumerable, but rather IQueryable). This way, if you want to filter even more based on some criteria, the filtering is STILL on the SQL side until evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsEnumerable();
}
// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
var clients = GetClients().Where(client=> client.Owner == currentUser);
return clients.ToArray();
}
Do you see what's happening there? The "GetClients" method is going to force a download of ALL 'clients' from the database... THEN the Where clause will happen in the GetClientsForLoogedInUser method to filter it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends if you need to modify the collection. I like to use an Array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But, usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<> ?!?!?! Lowest common denominator is the best solution (especially when you see Timothy's answer).

Using return statements to great effect!

When I am making methods with return values, I usually try and set things up so that there is never a case when the method is called in such a way that it would have to return some default value. When I started I would often write methods that did something, and would either return what they did or, if they failed to do anything, would return null. But I hate having ugly if(!null) statements all over my code,
I'm reading a re-guide to ruby that I read many moons ago, by the pragmatic programmers, and I notice that they often return self (ruby's this) when they wouldn't normally return anything. This is, they say, in order to be able to chain method calls, as in this example using setters that return the object whose attributes they set.
tree.setColor(green).setDecor(gaudy).setPractical(false)
Initially I find this sort of thing attractive. There have been a couple of times when I have rejoiced at being able to chain method calls, like Player.getHand().getSize() but this is somewhat different in that the object of the method call changes from step to step.
What does Stack Overflow think about return values? Are there any patterns or idioms that come to mind warmly when you think of return values? Any great ways to avoid frustration and increase beauty?
In my humble opinion, there are three kinds of return-cases that you should take into consideration:
Object property manipulation
The first is the manipulation of object properties. The pattern you describe here is very often used when manipulating objects. A very typical scenario is using it together with a factory. Consider this hypothetical creation call:
// When the object has manipulative methods:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes();
// When the factory has manipulative methods working on the
// object, IMHO more elegant from a semantic point of view:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes().getPizza();
It allows for a quick grasp at what exactly is being created or how an object is manipulated, because the methods form one human-readable expression. It's definitely nice, but don't overuse. A rule of thumb is that this might be used with methods whose return value you could also declare as void.
Evaluating object properties
The second might be when a method evaluates something on an object. Consider, for example, the method car.getCurrentSpeed(), that could be interpreted as a message to an object asking for the current speed and returning that. It would simply return the value, not too complicated. :)
Make object do this or that
The third might be when a method makes an perform an operation, returning some sort of value indicating how well the caller's intention was fulfilled - but laying out such a method could be difficult:
int new_gear = 20;
if (car.gears.changeGear(new_gear)) // does that mean success or fail?
This is where you can see a difficulty in designing the method. Should it return 0 upon success or failure? How about -1 if the gear could not be set, because the car only has 5 gears? Does that mean the current gear is at -1 now, too? The method could return the gear it changed to, meaning you would have to compare the argument supplied to the method to the return code. That would work. On the other hand, you could simply return either true or false for failure or false or true for failure. Which one to use could be decided by estimating if you'd expect those method calls to rather fail or succeed.
In my humble opinion, there is a way to better express the semantics of such return values, by giving them a semantic description. Future developers interacting with your objects will love you for not having to look up the comments or documentation for your methods:
class GearSystem {
// (...)
public:
enum GearChangeResult
{ GearChangeSuccess, NonExistingGear, MechanicalGearProblem };
GearChangeResult changeGear (int gear);
};
That way, it becomes perfectly obvious for any programmer looking at your code, what the return value means (consider: if (gears.changeGear(20) == GearSystem::GearChangeSuccess) - much clearer what that means than the example above)
Antipattern: Failures as return codes.
The fourth possibility for a return value I actually omitted, because in my opinion it isn't any: when there's an error in your program, like a logic error or a failure that needs to be dealt with - you could theoretically return a value indicating so. But today, that's not done so often anymore (or should not be), because for that, there are exceptions.
I don't agree that methods should never return null. The most obvious examples are from systems programming. For instance, if someone asks to open a file, you simply have to give them null if the open fails. There is no sane alternative. There are other cases where null is appropriate, such as a getNextNode(node) method, when called on the last node of a linked list. So I guess what these cases have in common is that null represents "no object" (either no file handle or no list node), which makes sense.
In other cases, the method should never fail, and there is an appropriate exception facility. Then, I think method chaining like your example can be used to great effect. I think it's a bit funny that you seem to believe this is an innovation of the "Pragmatic Programmers". In fact, it dates to Lisp if not before.
Returning this is also used in the "builder pattern", another case where method chaining can enhance readability as well as writing convenience.
A null is often returned as an out-of-band value to indicate that no result could be produced. I believe that this is perfectly reasonable when getting no result is a normal event; examples would include a null return from readLine() at end-of-file, or a null returned when providing a non-existent key to the get(...) method of a Map. Reading to the end of the file is normal behavior (as opposed to an IOException, which indicates that something went abnormally wrong while trying to read). Similarly, looking up a key and being told that it has no value is a normal case.
A good alternative to null for some cases is a "null object", which is a full-fledged instance of the result class, but which has appropriate state and behavior for a "nobody's home" case. For instance, the result of looking up a non-existent user ID might well be a NullUser object which has a zero-length name and no permissions to do anything in the system.
It's confusing to me. OO programming languages need Smalltalk's semicolon:
tree color: green;
decor: gaudy;
practical: false.
obj method1; method2. means "call method1 on obj then method2 on obj". This kind of object setup is very common.

Resources