Which method is faster - foreach or LINQ selection? - linq

I am working with a List collection. Which of the approaches below will execute faster?
Assume the result contains 1000 items.
Method 1:
var linqResult = from item in databaseObject
                 select item;
var list = new List<customobject>();
foreach (customobject item in linqResult)
{
    list.Add(item);
}
Method 2:
var data = from item in databaseObject
           select new customobject()
           {
               // initialize properties
           };
var list = data.ToList();

I would suggest you unit/performance test your code to see which is faster for your particular scenario.
It depends on what exactly you are doing. My gut feeling is that a LINQ query (properly performance tested and optimized) will be faster than your run of the mill foreach loop.
LINQ to SQL (which you have tagged), from what I understand, generates SQL commands based on your LINQ query and runs them against the data source. This will definitely be faster than a foreach loop, as the data source will return ONLY what you are expecting, as per the query, whereas the foreach loop would likely need to get the entire collection from SQL first and then iterate over it.
If you are using In-Memory collections, you might want to investigate the speed of LINQ vs foreach vs Parallel.ForEach. Parallelism "could" perform faster than the equivalent LINQ, but again, you need to performance test all scenarios.
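A minimal sketch of such a performance test for in-memory data, assuming a hypothetical CustomObject type with a single Id property (the question does not show the real class) and a plain Stopwatch harness rather than a proper benchmarking library:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the question's customobject type.
public class CustomObject
{
    public int Id { get; set; }
}

public static class CopyBenchmark
{
    // Method 1: enumerate the query and add each item to a list by hand.
    public static List<CustomObject> CopyWithForeach(IEnumerable<int> source)
    {
        var query = from id in source select new CustomObject { Id = id };
        var result = new List<CustomObject>();
        foreach (CustomObject item in query)
        {
            result.Add(item);
        }
        return result;
    }

    // Method 2: let ToList() materialize the query.
    public static List<CustomObject> CopyWithToList(IEnumerable<int> source)
    {
        var query = from id in source select new CustomObject { Id = id };
        return query.ToList();
    }

    // Runs the action repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up run so JIT compilation is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

To compare, build a source such as Enumerable.Range(0, 1000).ToArray() and pass each method to CopyBenchmark.Time with the same iteration count. The numbers will vary by machine and runtime, so treat them as a rough indication only.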

I think the benefit of a LINQ selection is that it gives you deferred execution, while a simple foreach that fills a list does not. If you use
yield return
you can achieve what LINQ provides by default.
I don't know whether there is any performance difference, but deferred execution can save you a lot of memory.
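A small sketch of deferred execution with yield return. The DeferredDemo class, its Squares method and the Produced counter are made up for illustration; the counter exists only to show how many elements were actually generated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many elements have actually been produced so far.
    public static int Produced;

    // Iterator method: the body runs only as the caller enumerates.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i * i;
        }
    }
}
```

Calling DeferredDemo.Squares(1000000) by itself produces nothing. Following it with Take(3).ToList() generates only the first three squares, so the remaining elements are never computed or held in memory.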

Related

How can Linq be so incredibly fast? C#

Let's say I have 100 000 objects of type Person which have a date property with their birthday in them.
I place all the objects in a List<Person> (or an array) and also in a dictionary where the date is the key and every value is an array/list of the persons that share the same birthday.
Then I do this:
DateTime date = new DateTime(); // Just some date
var personsFromList = personList.Where(person => person.Birthday == date);
var personsFromDictionary = dictionary[date];
If I run that 1000 times, the Linq .Where lookup ends up significantly faster than the dictionary. Why is that? It does not seem logical to me. Are the results being cached (and reused) behind the scenes?
From Introduction to LINQ Queries (C#) (The Query)
... the important point is that in LINQ, the query variable itself takes no action and returns no data. It just stores the information that is required to produce the results when the query is executed at some later point.
This is known as deferred execution. Further down the same page:
As stated previously, the query variable itself only stores the query commands. The actual execution of the query is deferred until you iterate over the query variable in a foreach statement. This concept is referred to as deferred execution...
Some LINQ methods must iterate the IEnumerable and therefore execute immediately: the aggregation methods such as Count, Max and Average.
Another way to force immediate execution is to use ToArray or ToList, which execute the query and store its results in an array or list.
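Applied to the question as posted, this means the Where call only builds the query and never runs the filter, so the timing measures query construction rather than a lookup. A minimal sketch that makes this visible by counting predicate calls; the Person class here is a hypothetical stand-in with only a Birthday property, and WhereTimingDemo is an invented name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Person type.
public class Person
{
    public DateTime Birthday { get; set; }
}

public static class WhereTimingDemo
{
    // Returns how many times the predicate ran (a) after only building
    // the query and (b) after forcing execution with ToList().
    public static (int afterBuild, int afterToList) CountPredicateCalls(
        List<Person> people, DateTime date)
    {
        int calls = 0;
        var query = people.Where(p => { calls++; return p.Birthday == date; });
        int afterBuild = calls;   // still 0: nothing has been enumerated

        query.ToList();           // forces the filter to run over every element
        return (afterBuild, calls);
    }
}
```

To compare the two lookups fairly, the Where query would need to be enumerated (for example with ToList or Count) inside the timed loop.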

Clean way to write this query

I'm looking for a clean way to write this Linq query.
Basically I have a collection of objects with ids. Using NHibernate and LINQ, I need to check whether the NHibernate entity has a subclass collection in which all ids from the object collection exist.
If there was just one item this would work:
var objectImCheckingAgainst = ... //irrelevant
where Obj.SubObj.Any(a => a.id == objectImCheckingAgainst.Id)
Now I want to instead somehow pass a list of objectImCheckingAgainst and return true only if the Obj.SubObj collection contains all items in list of objectImCheckingAgainst based on Id.
I like to use GroupJoin for this.
return objectImCheckingAgainst.GroupJoin(Obj.SubObj,
        a => a.Id,
        b => b.id,
        (a, b) => b.Any())
    .All(c => c);
I believe this query should be more or less self-explanatory, but essentially, this joins the two collections using their respective ids as keys, then groups those results. Then for each of those groupings, it determines whether any matches exist. Finally, it ensures that all groupings had matches.
A useful alternative that I sometimes use is .Count() == 1 instead of the .Any(). Obviously, the difference there is whether you want to support multiple elements with the same id matching. From your description, it sounded like that either doesn't matter or is enforced by another means. But that's an easy swap, either way.
An important concept in GroupJoin that I know is relevant, but may or may not be obvious, is that the first enumerable (which is to say, the first argument to the extension method, or objectImCheckingAgainst in this example) will have all its elements included in the result, but the second one may or may not. It's not like Join, where the ordering is irrelevant. If you're used to SQL, these are the elementary beginnings of a LEFT OUTER JOIN.
Another way you could accomplish this, somewhat more simply but not as efficiently, would be to simply nest the queries:
return objectImCheckingAgainst.All(c => Obj.SubObj.Any(x => x.id == c.Id));
I say this because it's pretty similar to the example you provided.
I don't have any experience with NHibernate, but I know many ORMs (I believe EF included) will map this to SQL, so efficiency may or may not be a concern. But in general, I like to write LINQ as close to par as I can so it works as well in memory as against a database, so I'd go with the first one I mentioned.
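For in-memory collections, the two variants can be put side by side as below. This is only a sketch: the Wanted and SubObj classes are hypothetical stand-ins for the question's types (with the Id/id casing kept from the snippets above), and nothing here says how NHibernate would translate either query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
public class Wanted { public int Id { get; set; } }
public class SubObj { public int id { get; set; } }

public static class ContainsAllDemo
{
    // GroupJoin keeps every element of the first sequence and pairs it
    // with its (possibly empty) group of matches from the second.
    public static bool ViaGroupJoin(IEnumerable<Wanted> wanted, IEnumerable<SubObj> subObjs)
    {
        return wanted.GroupJoin(subObjs,
                a => a.Id,
                b => b.id,
                (a, matches) => matches.Any())
            .All(found => found);
    }

    // Nested version: rescans subObjs for every wanted element.
    public static bool ViaNested(IEnumerable<Wanted> wanted, IEnumerable<SubObj> subObjs)
    {
        return wanted.All(c => subObjs.Any(x => x.id == c.Id));
    }
}
```

Both methods return true when every wanted id appears among the sub-objects and false otherwise. In memory, GroupJoin builds a lookup over the second sequence once, which is why it scales better than the nested scan.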
I'm not well versed in LINQ-to-NHibernate, but when using LINQ against any SQL backend it's always important to keep an eye on the generated SQL. I think this where clause...
where Obj.SubObj.All(a => idList.Contains(a.id))
...will produce the best SQL (having an IN statement).
idList is a list of Ids extracted from the list of objectImCheckingAgainst objects.
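One thing worth checking with this clause is the direction of the test. Evaluated in memory, All over the sub-objects asks whether every sub-object id is in idList, which is the reverse of asking whether every id in idList appears among the sub-objects. The sketch below contrasts the two on plain integers. IdListDemo and both method names are invented for illustration, and whether the count-based form translates to good SQL in NHibernate is an assumption I have not verified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdListDemo
{
    // Shape of the clause above: every sub-object id is contained in idList.
    public static bool SubObjsWithinList(IEnumerable<int> subIds, List<int> idList)
    {
        return subIds.All(id => idList.Contains(id));
    }

    // Count-based alternative that still uses Contains:
    // every id in idList is matched by at least one sub-object.
    public static bool ListWithinSubObjs(IEnumerable<int> subIds, List<int> idList)
    {
        int matched = subIds.Where(id => idList.Contains(id)).Distinct().Count();
        return matched == idList.Distinct().Count();
    }
}
```

With sub-object ids 1, 2, 3 and an idList of 1, 2, the first method returns false and the second returns true, so the two checks are not interchangeable.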

Linq - Is operations order relevant?

I have some slow linq queries and need to optimize them. I have read about compiled queries and setting the merge option in NoTracking in my readonly operations.
But I think my problem is that I have too many Includes so the number of joins done in the DB is huge.
context.ExampleEntity
    .Include("A")
    .Include("B")
    .Include("D.E.F")
    .Include("G.H")
    .Include("I.J")
    .Include("K.M")
    .Include("K.N")
    .Include("O.P")
    .Include("Q.R")
    .Where(a => condition1 || complexCondition2)
My question is: if I put the Where before the Includes, would this filter the ExampleEntity objects before making all the joins? I'm not sure how LINQ queries are translated to SQL.
"Yes".
Each sub-query passes its results to the next. Moving the Where first will filter, then perform the includes against a potentially smaller set.
Whether that makes sense in the context of your specific query is up to you to decide.

Data Entity Framework and LINQ - Get giant data set and execute a command one at a time on each object

We are pulling in a giant dataset of records (in the hundreds of thousands) and then need to update a field on each one, one at a time, in an atomic transaction. The records are unrelated to each other, and we don't want to do a blind update of all couple hundred thousand (there are views and indexes on this table that make that prohibitive). The ONLY way that I could get this to work without doing a giant transaction was as follows (container is a reference to a custom ObjectContext):
var expiredWorkflows = from iw in container.InitiatedWorkflows
                       where iw.InitiationStatusID != 1 && iw.ExpirationDate < DateTime.Now
                       select iw.ID;
foreach (int expiredWorkflow in expiredWorkflows)
    container.ExecuteStoreCommand("UPDATE dbo.InitiatedWorkflow SET InitiationStatusID = 7 WHERE ID = @ID", new SqlParameter() { ParameterName = "@ID", Value = expiredWorkflow.ToString() });
We tried looping through each one, updating the field via the container and then calling SaveChanges(), but that runs everything as one transaction. We tried calling SaveChanges() inside the foreach loop, but that threw transaction exceptions. Is there any way to do what we are trying to do using the ObjectContext, so that it would do something like the following (the select above would be changed to return the full object, not just the ID):
foreach (var expiredWorkflow in expiredWorkflows)
    expiredWorkflow.InitiationStatusID = 7;
container.SaveChanges(SaveOptions.OneAtATime);
Speaking generally, if the operation you need to carry out is as simple as the sort of UPDATE your code above suggests, this is the sort of operation that will run far better on the back end database--assuming, of course, there's some clear way to select only the rows that need to be changed. Entity Framework is intended more for manipulating small to medium sets of objects that can easily be loaded into memory and twiddled there, not large bulk-processing operations for which stored procedures are often best. EF can certainly perform those big operations, but it will take a lot longer to execute one SQL statement per row.

LINQ and Generated sql

Suppose my LINQ query looks like this:
var qry = from c in nwEntitiesContext.CategorySet.AsEnumerable()
          let products = this.GetProducts().WithCategoryID(c.CategoryID)
          select new Model.Category
          {
              ID = c.CategoryID,
              Name = c.CategoryName,
              Products = new Model.LazyList<Core.Model.Product>(products)
          };
return qry.AsQueryable();
I just want to know what query it will generate at runtime. How can I see the generated query from the VS2010 IDE when running the code in debug mode? Please guide me step by step.
There is not much to see here: it will just select all fields from the Category table, since you call AsEnumerable and thus fetch all the data from the Category table into memory. After that you are in object space. What happens next depends on what this.GetProducts() does, and my guess is that it makes another EF query and fetches the results into memory. If that's the case, I would strongly recommend posting another question with this code and the code of your GetProducts method so that we can take a look and rewrite it in a more optimal way. (Apart from this, you are projecting onto a mapped entity, Model.Category, which again won't (and should not) work with Linq-to-Entities.)
Before reading into your query I was going to recommend doing something like this:
string sqlQueryString = ((ObjectQuery)qry).ToTraceString();
But that won't work since you are mixing Linq-to-Entities with Linq-to-objects and you will actually have several queries executed in case GetProducts queries EF. You can separate the part with your EF query and see the SQL like this though:
string sqlString = nwEntitiesContext.CategorySet.ToTraceString();
but as I mentioned earlier - that would just select everything from the Categories table.
In your case (unless you rewrite your code in a drastic way), you actually want to see what queries are run against the DB when you execute the code and enumerate the results of the queries. See this question:
exact sql query executed by Entity Framework
Your choices are SQL Server Profiler and Entity Framework Profiler. You can also try out LinqPad, but in general I still recommend you to describe what your queries are doing in more detail (and most probably rewrite them in a more optimal way before proceeding).
Try Linqpad
This will produce SELECT * FROM Categories. Nothing more. Once you call AsEnumerable you are in Linq-to-objects and there is no way to get back to Linq-to-entities (AsQueryable doesn't do that).
If you want to see what query is generated use SQL Profiler or any method described in this article.
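The point about AsQueryable can be checked without a database. In this sketch (ProviderDemo and its method are invented names, and a plain in-memory sequence stands in for CategorySet), the query provider after AsEnumerable().Select(...).AsQueryable() is the in-memory EnumerableQuery wrapper and not a database provider:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProviderDemo
{
    // Returns the type name of the provider behind the final IQueryable.
    public static string ProviderNameAfterAsQueryable(IEnumerable<int> source)
    {
        // AsEnumerable switches to LINQ-to-objects; AsQueryable then only
        // wraps the in-memory sequence, it does not restore the old provider.
        IQueryable<int> query = source.AsEnumerable()
            .Select(x => x * 2)
            .AsQueryable();
        return query.Provider.GetType().Name;
    }
}
```

Calling ProviderDemo.ProviderNameAfterAsQueryable(new[] { 1, 2, 3 }) returns a name starting with EnumerableQuery, which is the LINQ-to-objects wrapper. Everything after AsEnumerable therefore runs in memory and never shows up in the generated SQL.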

Resources