Is better Linq or SQL query for complex calculations and aggregations? - performance

We must create and show at runtime (asp.net mvc) some complex reports from Oracle tables data with millions of records. The reports data must be obtained from groupings and little complex calculations.
So is it better for performance and maintainability of code that do these groupings and calculations via sql query (pl/sql) or via linq?
Thanks for your kindle reply

So is it better for performance and maintainability of code that do
these groupings and calculations via sql query (pl/sql) or via linq?
It depends on what you mean by via linq. If you mean that you fetch the complete table to local memory and then use linq statements to extract the result that you want, then of course SQL statements are faster.
However, if you mean that you use Entity Framework, or something similar, then the answer is not a easy to give.
If you use Entity Framework (or some clone), your tables will be represented by IQueryable<...> instead of IEnumerable<...>. An IQueryable has an Expression and a Provider. The Expression represents the query that must be performed. The Provider knows which system must execute the query (usually a Database Management System) and how to communicate with this system. When the query must be executed, it is the task of the Provider to translate the Expression into the language that the system knows (usually something SQL-like) and to execute the SQL-query.
There are two kinds of IQueryable LINQ statements: those that return an IQueryable<...> of something, and those that return a TResult. The ones that return IQueryable only change the Expression. They are functions that use deferred execution.
Function that do not return an IQueryable, are ToList(), FirstOrDefault(), Any(), Max(), etc. Internally they will call functions that will GetEnumerator() (usually via a foreach), which orders the Provider to translate the Expression and execute the query.
Back to your question
So which one is more efficient, entity framework or SQL? Efficiency is not only the time to perform the queries, it is also the development/testing time, for the first version and for future changes in the software.
If you use an entity-framework (-clone), the SQL-queries created from the Expressions are pretty efficient, depending on the framework manufacturer. If you look at the code, then sometimes the SQL query is not the optimal one, although you'll have to be a pretty good SQL-programmer to improve most queries.
The big advantage above using Entity Framework and LINQ queries above SQL statements is that development times will be shorter. The syntax of the LINQ statements is checked at compile time, SQL statements at run-time. Development and test periods will be shorter.
It is easy to reuse LINQ statements, while SQL statements almost always have to be written especially for the query you want to execute. LINQ statements can be tested without a database on any sequence of items that represent your tables.
My Advice
For most queries you won't notice any difference in execution time between the entity framework query or the SQL query.
If you expect complicated queries and future changes, I'd go for entity framework. With main argument the shorter development time, the better testing possibilities, and the better maintainability.
If you detect some queries where you notice that the execution time is too long, you can always decide to bypass entity framework by executing a SQL query instead of using LINQ.
If you've wrapped your DbContext in a proper repository, where you hide the use cases from their implementations, the users of your repository won't notice the difference.

Related

What are the best practices for determining when (or not) to pre-compile a Linq Query? [duplicate]

I have a table:
-- Tag
ID | Name
-----------
1 | c#
2 | linq
3 | entity-framework
I have a class that will have the following methods:
IEnumerable<Tag> GetAll();
IEnumerable<Tag> GetByName();
Should I use a compiled query in this case?
static readonly Func<Entities, IEnumerable<Tag>> AllTags =
CompiledQuery.Compile<Entities, IEnumerable<Tag>>
(
e => e.Tags
);
Then my GetByName method would be:
IEnumerable<Tag> GetByName(string name)
{
using (var db = new Entities())
{
return AllTags(db).Where(t => t.Name.Contains(name)).ToList();
}
}
Which generates a SELECT ID, Name FROM Tag and execute Where on the code. Or should I avoid CompiledQuery in this case?
Basically I want to know when I should use compiled queries. Also, on a website they are compiled only once for the entire application?
You should use a CompiledQuery when all of the following are true:
The query will be executed more than once, varying only by parameter values.
The query is complex enough that the cost of expression evaluation and view generation is "significant" (trial and error)
You are not using a LINQ feature like IEnumerable<T>.Contains() which won't work with CompiledQuery.
You have already simplified the query, which gives a bigger performance benefit, when possible.
You do not intend to further compose the query results (e.g., restrict or project), which has the effect of "decompiling" it.
CompiledQuery does its work the first time a query is executed. It gives no benefit for the first execution. Like any performance tuning, generally avoid it until you're sure you're fixing an actual performance hotspot.
2012 Update: EF 5 will do this automatically (see "Entity Framework 5: Controlling automatic query compilation") . So add "You're not using EF 5" to the above list.
Compiled queries save you time, which would be spent generating expression trees. If the query is used often and you'll save the compiled query, you should definitely use it. I had many cases when the query parsing took more time than the actual round trip to the database.
In your case, if you are sure that it would generate SELECT ID, Name FROM Tag without the WHERE case (which I doubt, as your AllQueries function should return IQueryable and the actual query should be made only after calling ToList) - you shouldn't use it.
As someone already mentioned, on bigger tables SELECT * FROM [someBigTable] would take very long and you'll spend even more time filtering that on the client side. So you should make sure that your filtering is made on the database side, no matter if you are using compiled queries or not.
compiled queries are more helpfull with linq queries with large expression trees say complex queries to gain performance over building expression tree again and again while reusing query. in your case i guess it will save a very little time.
Compiled queries are compiled when the application is compiled and every time you reuse a query often or it is complex you should definitely try compiled queries to make execution faster.
But I would not go for it on all queries as it is a little more code to write and for simple queries it might not be worthwhile.
But for maximum performance you should also evaluate Stored Procedures where you do all the processing on the database server, even if Linq tries to push as much of the work to the db as possible you will have situations where a stored procedure will be faster.
Compiled queries offer a performance improvement, but it's not huge. If you have complex queries, I'd rather go with a stored procedure or a view, if possible; letting the database do it's thing might be a better approach.

Deciding on LINQ to SQL vs StoredProcs

While developing applications, I usually go for Stored Procedures to contain CRUD logic, so as improve performance and maintainability. But after experimenting with LINQ to SQL, I was wondering whether, using compiled LINQ-to-SQL queries over stored procedures will that help improve performance?
LINQ to SQL will not improve your performance, because you will be sending each CRUD operation as a string over the wire.
Performance will still be better with Stored Procedures, but ORM's like Linq to SQL usually make development time faster.
From my experience, I can rank performance as following:
Stored procedures
Native queries (using DBCommand)
Linq to entity (compiled query, EF4)
Linq to SQL (compiled)
Linq to entity (not compiled EF4)
Linq to SQL
ESQL
2,3,4 may change their order depends on the nature of the queries, but in general raw sql query is executed fater.
Based on your comments to both DevSlick and a1ex07, it seems you have a fundamental misunderstanding of what LINQ is. In order for LINQ queries to allow chaining, like
var activePeople = peopleList.Where(o => o.Active).OrderBy(o => o.Ordering).Select(o => o.Name);
the execution of the LINQ query must be delayed until it is enumerated:
foreach(var person in activePeople)
{
//If this is LINQ-to-SQL, the query to peopleList has waited until now to request anything from the database
}
This means that the query .Where(o => o.Active).OrderBy(o => o.Ordering).Select(o => o.Name) is not actually interpreted by your computer until that point as well. If you run the same query 100 times, that means the computer has to reinterpret that query 100 times. For LINQ-to-SQL, that means translating the query to SQL 100 times before that SQL is sent to the database each time, even if the SQL is exactly the same every time.
Compiling the query ahead of time causes it to generate the SQL only once, and use that SQL every time the query is called. This has nothing to do with stored procedures - you would compile a query-to-a-stored-procedure in the same way that you would compile any other query. Asking "which gives better performance" is meaningless, as they are not mutually exclusive.
Though compiling a query sounds like a good thing, in practice interpreting a LINQ query (usually called "evaluating the expression tree") takes very very little time compared to actually executing the SQL against the database, so you get very little benefit for compiling the query. In the meanwhile, the syntax for compiling a query is atrocious:
static readonly Func<AdventureWorksEntities, Decimal, IQueryable<SalesOrderHeader>> s_compiledQuery2 =
CompiledQuery.Compile<AdventureWorksEntities, Decimal, IQueryable<SalesOrderHeader>>(
(ctx, total) => from order in ctx.SalesOrderHeaders
where order.TotalDue >= total
select order);
var orders = s_compiledQuery2.Invoke(context, totalDue);
For this reason, it is usually recommended to simply not compile your LINQ-to-SQL queries, because the ratio of code-noise-to-benefit is terrible.

What is the big deal with IQueryable?

I've seen a lot of people talking about IQueryable and I haven't quite picked up on what all the buzz is about. I always work with generic List's and find they are very rich in the way you can "query" them and work with them, even run LINQ queries against them.
I'm wondering if there is a good reason to start considering a different default collection in my projects.
The IQueryable interface allows you to define parts of a query against a remote LINQ provider (typically against a database, but doesn't have to be) in multiple steps, and with deferred execution.
E.g. your database layer could define some restriction (e.g. based on permissions, security - whatever) by adding a .Where(x => x.......) clause to your query. But this doesn't get executed just yet - e.g. you're not retrieving 150'000 rows that match that criteria.
Instead, you pass up the IQueryable interface to the next level, the business layer, where you might be adding additional requirements and where clauses to your query - again, nothing gets executed just yet, you're also not tossing out 80'000 of your 150'000 rows you retrieved - you're just defining additional query criteria.
And the UI layer might do the same thing, e.g. based on user input in a form or something.
The magic is that you're passing the IQueryable interface through all the layers, adding additional critieria to it - but it doesn't get executed / evaluated until you actually force it. This also means you're not needlessly selecting and retrieving tons of data which you end up discarding afterwards.
You can't really do that with a classic static list - you have to pick the data, possibly discarding a lot of it again later on in the process - you have a static list, after all.
IQueryable allows you to make queries using LINQ, just like the LINQ to Object queries, where the queries are actually "compiled" and run elsewhere.
The most common implementations work against databases. If you use List<T> and LINQ to Objects, you load the entire "table" of data into memory, then run your query against it.
By using IQueryable<T>, the LINQ provide can "translate" your LINQ statement into actual SQL code, and run it on the database. The results can be returned to you and enumerated.
This is much, much more efficient, especially if you're working in N-Tiered systems.
LINQ queries against IEnumerable<T> produce delegates (methods) which, when invoked, perform the described query.
LINQ queries against IQueryable<T> produce expression trees, a data structure which represents the code that produced the query. LINQ providers such as LINQ to SQL interpret these data structures, generating the same query on the target platform (T-SQL in this case).
For an example of how the compiler interprets the query syntax against IQueryable<T>, see my answer to this question:
Building Dynamic LINQ Queries based on Combobox Value

What are the advantages of LINQ to SQL?

I've just started using LINQ to SQL on a mid-sized project, and would like to increase my understanding of what advantages L2S offers.
One disadvantage I see is that it adds another layer of code, and my understanding is that it has slower performance than using stored procedures and ADO.Net. It also seems that debugging could be a challenge, especially for more complex queries, and that these might end up being moved to a stored proc anyway.
I've always wanted a way to write queries in a better development environment, are L2S queries the solution I've been looking for? Or have we just created another layer on top of SQL, and now have twice as much to worry about?
Advantages L2S offers:
No magic strings, like you have in SQL queries
Intellisense
Compile check when database changes
Faster development
Unit of work pattern (context)
Auto-generated domain objects that are usable small projects
Lazy loading.
Learning to write linq queries/lambdas is a must learn for .NET developers.
Regarding performance:
Most likely the performance is not going to be a problem in most solutions. To pre-optimize is an anti-pattern. If you later see that some areas of the application are to slow, you can analyze these parts, and in some cases even swap some linq queries with stored procedures or ADO.NET.
In many cases the lazy loading feature can speed up performance, or at least simplify the code a lot.
Regarding debuging:
In my opinion debuging Linq2Sql is much easier than both stored procedures and ADO.NET. I recommend that you take a look at Linq2Sql Debug Visualizer, which enables you to see the query, and even trigger an execute to see the result when debugging.
You can also configure the context to write all sql queries to the console window, more information here
Regarding another layer:
Linq2Sql can be seen as another layer, but it is a purely data access layer. Stored procedures is also another layer of code, and I have seen many cases where part of the business logic has been implemented into stored procedures. This is much worse in my opinion because you are then splitting the business layer into two places, and it will be harder for developers to get a clear view of the business domain.
Just a few quick thoughts.
LINQ in general
Query in-memory collections and out-of-process data stores with the same syntax and operators
A declarative style works very well for queries - it's easier to both read and write in very many cases
Neat language integration allows new providers (both in and out of process) to be written and take advantage of the same query expression syntax
LINQ to SQL (or other database LINQ)
Writing queries where you need them rather than as stored procs makes development a lot faster: there are far fewer steps involved just to get the data you want
Far fewer strings (stored procs, parameter names or just plain SQL) involved where typos can be irritating; the other side of this coin is that you get Intellisense for your query
Unless you're going to work with the "raw" data from ADO.NET, you're going to have an object model somewhere anyway. Why not let LINQ to SQL handle it for you? I rather like being able to just do a query and get back the objects, ready to use.
I'd expect the performance to be fine - and where it isn't, you can tune it yourself or fall back to straight SQL. Using an ORM certainly doesn't remove the need for creating the right indexes etc, and you should usually check the SQL being generated for non-trivial queries.
It's not a panacea by any means, but I vastly prefer it to either making SQL queries directly or using stored procs.
I must say they are what you have been looking for. It takes some time getting used to it, but once you do you can't think of going back (at least for me).
Regarding linq vs. stored procedures, you can have poor performance on either if you build it wrong. I moved to linq to sql some stored procedures of a client that were awfully coded, so the time dropped from 20secs (totally unaceptable for a web app) to < 1 sec. And much much less code then the stored procedure solution.
Update 1: Also you get a lot of flexibility, as you can limit the columns of what you are getting and it will actually only retrieve that. On the stored procedure solution you have to define a procedure for each column set you are getting, even if the underlying queries are the same.
Just as an update, here are some links on the future of LINQ to SQL:
What is the Future of Linq to SQL
Has Microsoft confirmed their stance on LINQ to SQL end-of-life?
Is LINQ to SQL Dead or Alive?
As a comment in the last link states, LINQ to SQL isn't going to go away, just not "improved upon" at least by Microsoft. Take these comments and posts as you will, just be cautious in your development plans.
We switched over to LINQ2Entity over the Entity Framework environment recently. Before, we had basic SQLadapters. Since the database we are working with is rather small, I cannot comment on the performance of LINQ.
I must admit though, writing queries have become a lot easier, and the addition of Entities, does enable strong typing.

Write stored procedures in Linq/Lambda (Unit-testable but performant stored procs). Possible?

I am creating a data source for reporting model (SQL Server Reporting Services).
The reports requires a lot of joins and calculations (let's say, calculating financial parameters like money spent on this, that, amount A vs amount B)...all this involves subobjects.
It makes a lot of sense to me to write unit tests for this code (i.e. walking through order collection, aggregating info based on business rules and subobjects etc.)
To do this properly, I would expect my code to look approx. like this
foreach (IOrder in Orders)
foreach (IOrderLine in IOrder.Orderlines)
...
return ...
and then test the return value.
But this code is not the SQL which is going to be used in the reporting view...of course...
So I am thinking, I could plug-in a .NET assembly in the database.
The issue here is, of course, performance...I don't want to loop all these objects in C#...too slow.
So, naturally, Linq/Lambda/Expression trees seem to be the answer to me.
As we know, when you are doing Linq to SQL, expression trees are built, and then proper SQL is generated based on them.
So, I could write my code in Linq to Objects, using lambda expressions, unit test this code on sample collections (having expressions compiled to .net), and reuse the same code as Linq to SQL in the DB stored procedure, so that inside SQL Server it would generate proper SQL for me (as Linq to SQL already does)...
Then I could get benefits of both unit-tests and writing domain logic code in C# and high-performing stored procedures for reports.
Possible? Can I use Linq/Lambda in SQL Server CLR Stored procedures? Anyone did it or knows how to make it work?
Am I crazy? Do you know a better way of doing it?
Thanks
P.S. I think now I figured out how this should be done properly. According to Udi Dahan, if I understand him right. Database should be denormalized, and all the calculated fields should be on the objects in the table.
When something is happening on the subobject (OrderLine added), my Customer object should receive an event and recalculate the smart value (cache it and persist).
Then reports go straight-forward, without logic and work fast...
No, you cannot use LINQ/Lambda in SQL CRL Procs - it is based on a different version of .NET and does not support those namespaces.
So, I could write my code in Linq to
Objects, using lambda expressions,
unit test this code on sample
collections (having expressions
compiled to .net), and reuse the same
code as Linq to SQL in the DB stored
procedure, so that inside SQL Server
it would generate proper SQL for me
(as Linq to SQL already does)...
This plan was fine until you suggested the CLR code be called from your stored procs. Running CLR code from the database process itself creates a lot of problems with regards to versioning, configuration and database stability... Too many problems if you do that.
Your motivation was to have the benefit of using stored procs, which are faster in general. If those stored procs are in turn running CLR code, they're not going to be faster than the CLR code running in the local process.
Using the LINQ generated expressions technically consumes more CPU cycles than stored procs. This is because the database engine has to regenerate the execution plan each time a query is ran. Typically your database server is on a separate machine though that is not CPU bound (it will be limited instead by disk or network capacity), so this is not a real performance issue. It could be if you run the database server on the same machine as everything else, but don't try to fix this with something so convoluted until its a real issue.
Udi's suggestion may be appropriate, if you want to decrease the overhead of generating the reports. There are two important side effects though to consider first. First, can afford to increase the performance overhead of the operations that pregenerate the reported fields? A bigger problem is that it couples your reporting logic with the code that runs the target system. This prevents you from being able to update the reporting code without also updating the business code, and presumes the reporting code being running as soon as the reported code is put into production.

Resources