DbContext ChangeTracking kills performance? - performance

I am in the process of upgrading an application from EF1 to EF4.1.
I created a DbContext and a set of POCOs using the "ADO.NET DbContext Generator" templates.
When I query the generated DbContext the database part of the query takes 4ms to execute (validated with EF Profiler). And then it takes the context about 40 seconds (in words: FORTY!) to do whatever it does before it returns the result to the application.
EF1 handles the same query in less than 2 seconds.
Turning off AutoDetectChanges, LazyLoading and ProxyGeneration wins me 2-3 seconds.
When I use the AsNoTracking() extension method I am able to reduce the total execution time to about 3 seconds.
That indicates that ChangeTracking is the culprit.
But ChangeTracking is what I need. I must be able to eventually persist all changes without having to handpick which entities were modified.
Any ideas how I could solve that performance issue?

Is the technique at the end of this documentation useful? Alternatively, I've avoided many of the performance pitfalls by using a fluent interface to declaratively state which entities in a given transaction definitely won't change and which might change (immutable vs. mutable). For example, if the entities I am saving are aggregate roots in which the root or its entities refer to "refdata" items, then this heuristic prevents many writes because the immutable items don't need to be tracked. The mutable items all get written without a check (a weakness, which may or may not be acceptable).
I'm using this with a generic repository pattern precisely because I don't want to track changes or implement a specific strategy for each case. If that's not enough, perhaps rolling your own change tracking outside of the context and adding entities in as needed will work.
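To make the "roll your own change tracking" idea concrete, here is a minimal sketch, assuming a snapshot comparison is good enough for your entities. The names SnapshotTracker and Customer are made up for illustration and are not part of Entity Framework; the tracker only compares public property values and does not look into collections or navigation properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical entity used only for this illustration.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Minimal snapshot-based tracker (an assumption, not an EF type): it remembers
// the property values of each registered entity and later reports which
// entities differ from their snapshot.
public class SnapshotTracker<T> where T : class
{
    // Cache the readable, non-indexed properties once so that every snapshot
    // lists the values in the same order.
    private static readonly PropertyInfo[] Props = typeof(T)
        .GetProperties()
        .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
        .ToArray();

    private readonly List<KeyValuePair<T, object[]>> _snapshots =
        new List<KeyValuePair<T, object[]>>();

    private static object[] Capture(T entity)
    {
        return Props.Select(p => p.GetValue(entity, null)).ToArray();
    }

    // Start tracking an entity by recording its current values.
    public void Track(T entity)
    {
        _snapshots.Add(new KeyValuePair<T, object[]>(entity, Capture(entity)));
    }

    // Return every tracked entity whose current values differ from its snapshot.
    public List<T> GetModified()
    {
        return _snapshots
            .Where(pair => !Capture(pair.Key).SequenceEqual(pair.Value))
            .Select(pair => pair.Key)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var tracker = new SnapshotTracker<Customer>();
        var ann = new Customer { Id = 1, Name = "Ann" };
        var bob = new Customer { Id = 2, Name = "Bob" };
        tracker.Track(ann);
        tracker.Track(bob);

        bob.Name = "Robert";

        foreach (var changed in tracker.GetModified())
        {
            Console.WriteLine(changed.Id); // prints 2
        }
    }
}
```

With something like this you could load the entities with AsNoTracking(), then attach only the entities returned by GetModified() to the context and set context.Entry(entity).State to EntityState.Modified before calling SaveChanges(). Whether that beats the built-in tracking depends on your model, so measure it.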

Without seeing the query, I can't say for sure what the problem might be. Could this be related?
Why does the Contains() operator degrade Entity Framework's performance so dramatically?
Depending on the LINQ operators being used, it appears that EF has a tough time converting some queries to SQL. Maybe you're running up against a similar situation here.

Related

How to decrease the startup time of this LINQ-EF query? Precompiled EF Queries?

I am using MVC3, .NET4.5, C#, EF5.0, MSSQL2008 R2
My web application can take between 30 and 60 seconds to warm up, i.e. to get through the first page load. Subsequent page loads are very quick.
I have done a little more analysis, using DotTrace.
I have discovered that some of my LINQ queries, particularly the .Count() and .Any() ones, take a long time to execute the first time, for example:
if (!this.db.Class.OfType<RPRC_Product>().Any(c => c.ReportId == myReportId))
takes around 12 secs on first use.
Can you provide pointers on how I can get these times down? I have heard about precompilers for EF queries.
I have a feeling the answer lies in using precompilation rather than altering this specific query.
Many thanks in advance
EDIT
Just read up on EF5's auto-compile feature, so the second time round compiled queries are in the cache. So the first time round, compilation to the intermediate EF language is still required. Also read up on pregeneration of views, which may help generally as well?
It is exactly as you said: you do not have to worry about compiling queries, because they are cached automatically by EF. Pregenerated views may or may not help you. The automatic tools for scaffolding them do not generate all the required views, and the missing ones still need to be built at run time. When I developed my solutions and ran into the same problem as you, the only thing that helped me was to simplify the query. It turns out that if the first query to run is very complex and involves complicated joins, then EF needs to generate many views to execute it. So I simplified the queries: instead of loading whole joined entities I loaded only IDs (grouped, filtered or whatever), and then when I needed to load single entities I loaded them by ID one by one. This allowed me to avoid the long execution time of my first query.

SearchScope fetchRows vs fetchObjects (IBM FileNet CE API)

I've been using the SearchScope.fetchObjects() method until now, and then it just occurred to me that fetchRows() might be the better choice in some cases (when you don't need metadata like class names, object stores, etc.). Something tells me it might be faster, but I haven't found any arguments about which method to use in which case, and why.
Here is SearchScope documentation.
The difference in performance between fetchRows() and fetchObjects() is negligible in most cases. If you process a significant volume of data and are still concerned about performance, I suggest running a simple test.
The only reason for existence of fetchRows() is the possibility to query disparate object classes using JOIN.

Entity Framework startup time

I'm wondering if it is possible to speed up the first query made with EF code first.
I've made a small test program with one entity containing 2 fields, and the first query takes 2.2 seconds, while the second query (which is exactly the same) takes 0.006 seconds.
I am already precompiling the views, so that won't help here.
I think the problem is that it takes some time to construct the model in memory, but should it take that long? And is there a way to precompile this model like there is with the views?
This article: Squash Entity Framework startup time with pre-compiled views describes a solution in detail.
It involves using the Optimize Entity Data Model option in Entity Framework Power Tools to generate a pre-compiled .Views class file.
When you make your first query, EF initializes itself and that takes some time. I don't think there's much you can do to speed up EF's infrastructure initialization, but if what you are really looking for is to speed up the first query you make and not EF's initialization itself, you can try to force EF to initialize before running your first query.
using (var db = new MyContext())
{
    db.Database.Initialize(force: true);
}

Is ORM (Linq, Hibernate...) really that useful?

I have been playing with some LINQ ORM (LINQ directly to SQL) and I have to admit I like its expressive power. For small utility-like apps, it also works quite fast: drop a SQL Server on some surface and you're set to LINQ away.
For larger apps, however, the DAL was never that big an issue for me to set up or maintain, and more often than not, once it was set up, the programming was not happening there anyway...
My honest question (I am an ORM newbie): what is the big advantage of ORM over writing a decent DAL by hand?
(seems like a double, couldn't find it though)
UPDATE: OK, it's a double :-) I found it myself eventually:
ORM vs Handcoded Data Access Layer
Strong-typing
No need to write the DAL yourself => time savings
No need to write SQL code yourself => less error-prone
I've used Hibernate in the past to dynamically create quite complex queries. The logic involved to create the appropriate SQL would have been very time-consuming to implement, compared with the logic to build the appropriate Criteria. Additionally, Hibernate knew how to work with various different databases, so I didn't need to put any of that logic in our code. We had to test against different databases of course, and I needed to write an extension to handle "like" queries appropriately, but then it ran against SQL Server, Oracle and HSqldb (for testing) with no issues.
There's also the fact that it's more code you don't have to write, which is always a nice thing :) I can't say I've used LINQ to SQL in anything big, but where I've used it for a "quick and dirty" web-site (very small, rarely updated, little benefit from full layer abstraction) it was lovely.
I used JPA in a project, and at first I was extremely impressed. Gosh it saved me all that time writing SQL! Gradually, however, I became a bit disenchanted.
Difficulty defining tables without surrogate keys. Sometimes we need tables that don't have surrogate keys. Sometimes we want a multicolumn primary key. TopLink had difficulties with that.
Forced data structure relationships. JPA uses annotations to describe the relationship between a field and the container or referencing class. While this may seem great at first sight, what do you do when you reference the objects differently in the application? Say, for example, you need just specific objects that reference specific records based on some specific criteria (and it needs to be high-performance with no unnecessary object allocation or record retrieval). The effort to modify Entity classes will almost always exceed the effort that would have existed had you never used JPA in the first place (assuming you are at all successful getting JPA to do what you want).
Caching. JPA defines the notion of caches for your objects. It must be remembered that the database has its own cache, typically optimized around minimizing disk reads. Now you're caching your data twice (ignoring the uncollected GC heap). How this can be an advantage is beyond me.
Data != Objects. For high-performance applications, the retrieval of data from the DB must be done very efficiently. Forcing object creation is not always a good thing. For example, sometimes you may want arrays of primitives. This is about 30 minutes of work for an experienced programmer working with straight JDBC.
Performance, debugging.
It is much more difficult to gauge the performance of an application with complex things going on in the (sub-optimal, autogenerated) caching subsystem, further straining project resources and budgets.
Most developers don't really understand the impedance mismatch problem that has always existed when mapping objects to tables. This fact ensures that JPA and friends will probably enjoy considerable (cough cough) success for the foreseeable future.
Well, for me it is a lot about not having to reinvent/recreate the wheel each time I need to implement a new domain model. It is simply a lot more efficient to use for instance nHibernate (my ORM of choice) for creating, using and maintaining the data access layer.
You don't specify exactly how you build your DAL, but for me I used to spend quite some time doing the same stuff over and over again. I used to start with the database model and work my way up from there, creating stored procedures etc. Even if I sometimes used little tools to generate parts of the setup, it was a lot of repetitive coding.
Nowadays I start with the domain. I model it in UML, and most of the time I'm able to generate everything from that model, including the database schema. It needs a few tweaks here and there, but with my current setup I get 95% of the data access work done in no time at all. The time I save I can use to fine-tune the parts that need tuning. I seldom need to write any SQL statements.
That's my two cents. :-)
Portability between different db vendors.
My, honest - I am an ORM newbie - question : what is the big advantage of ORM over writing a decent DAL by hand?
Not all programmers are willing to write, or even capable of writing, "a decent DAL". Those who can't, or who get scared at the mere thought of it, find LINQ or any other ORM a blessing.
I personally use LINQ to manipulate collections in the code because of its expressiveness. It offers a very compact and transparent way to perform some common tasks on collections directly in code.
LINQ will stop being useful to you when you want to create very specific and optimized queries by hand. Then you are likely to end up with a mixture of LINQ queries intermingled with custom stored procedures wired into it. Because of these considerations, I decided against LINQ to SQL in my current project (since I have a decent (imho) DAL layer). But I'm sure LINQ will do just fine for simple sites like maybe your blog (or SO for that matter).
With LINQ/ORM there may also be a concern about lag on high-traffic sites (since each incoming query will have to be compiled all over again). Though I have to admit I do not see any performance issues on SO.
You can also consider waiting for the Entity Framework v2. It should be more powerful than LINQ (and hopefully not as bad as v1, according to some people).
Transparent persistence - changes get saved (and cascaded) without you having to call Save(). At first glance this seems like a nightmare, but once you get used to working with it rather than against it, your domain code can be freed of persistence concerns almost completely. I don't know of any ORM other than Hibernate / NHibernate that does this, though there might be some...
The best way to answer the question is to understand exactly what libraries like Hibernate are actually accomplishing on your behalf. Most of the time abstractions exist for a reason, often to make certain problems less complex; in this case, Hibernate is almost a DSL for expressing certain persistence concepts in a simple, terse manner.
One can easily change the fetch strategy for collections by changing an annotation rather than writing up lots of code.
Hibernate and Linq are proven and tested by many, there is little chance you can achieve this quality without lots of work.
Hibernate addresses many features that would take you months and years to code.
Also, while the JPA documentation says that composite keys are supported, it can get very (very) tricky quickly. You can easily spend hours (days?) trying to get something quite simple working. If JPA really makes things simpler then developers should be freed from thinking too much about these details. It doesn't, and we are left with having to understand two levels of abstraction, the ORM (JPA) and JDBC. For my current project I'm using a very simple implementation that uses a package protected static get "constructor" that takes a ResultSet and returns an Object. This is about 4 lines of code per class, plus one line of code for each field. It's simple, high-performance, and quite effective, and I retain total control. If I need to access objects differently I can add another method that reads different fields (leaving the others null, for example). I don't require a spec that tells me "how ORMs must (!) be done". If I require caching for that class, I can implement it precisely as required.
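The answer above describes this for Java and JDBC (a static factory that takes a ResultSet). For readers coming from the .NET threads on this page, here is a rough C# analogue, assuming ADO.NET's IDataRecord plays the role of the ResultSet. The Product class and its columns are hypothetical, and the in-memory DataTable only stands in for a real query result.

```csharp
using System;
using System.Data;

// Hypothetical class used only for this illustration.
public sealed class Product
{
    public int Id { get; private set; }
    public string Name { get; private set; }

    private Product() { }

    // Static "constructor": reads the current row, one line per field.
    // internal is the closest C# equivalent of Java's package-protected access.
    internal static Product FromRecord(IDataRecord row)
    {
        return new Product
        {
            Id = row.GetInt32(row.GetOrdinal("Id")),
            Name = row.GetString(row.GetOrdinal("Name"))
        };
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory table standing in for the result of a real query.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Widget");

        using (IDataReader reader = table.CreateDataReader())
        {
            while (reader.Read())
            {
                Product p = Product.FromRecord(reader);
                Console.WriteLine(p.Id + " " + p.Name); // prints "1 Widget"
            }
        }
    }
}
```

A second factory method that reads only some of the columns would cover the "access objects differently" case mentioned above without touching the first one.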
I have used Linq and found it very useful. It saves you a lot of time writing data access code. For large applications you need more than a DAL, but you can easily extend the classes it creates. Believe me, it really improves your productivity.

Switching to LINQ

I'm considering spending time learning and using LINQ to SQL but after years of best practices advising NOT to embed SQL I'm having a hard time changing paradigms.
Why does it seem accepted now to embed queries in compiled code? It seems almost a step backwards to me in some ways.
Has anyone had issues with the fix query / compile / deploy cycle after switching to LINQ?
I think I still might wait for the finished Entity Framework.
What do you think?
The advantage of Linq to Sql is that it doesn't really embed queries in compiled code - not really. The Linq statement means that your .Net code actually has the logic required to build the Sql statement embedded, not the raw Sql.
It really makes a lot of sense to have .Net code that converts directly to the Sql to execute, rather than a long list of sprocs with associated documentation. The Linq way is much easier to maintain and improve.
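To illustrate the point that a Linq statement carries the logic for building a query rather than raw Sql, here is a small sketch using only LINQ to Objects and expression trees, so it runs without a database. The Order class and the QueryShape helper are made up for this example; a real provider such as Linq to Sql walks the same kind of tree to produce its Sql.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used only for this illustration.
public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

public static class QueryShape
{
    // Walks the chain of query operator calls stored in the query's
    // expression tree and returns their names in the order they were applied.
    public static List<string> OperatorNames(IQueryable query)
    {
        var names = new List<string>();
        Expression node = query.Expression;
        while (node is MethodCallExpression)
        {
            var call = (MethodCallExpression)node;
            names.Add(call.Method.Name);
            if (call.Arguments.Count == 0) break;
            node = call.Arguments[0]; // the source the operator was applied to
        }
        names.Reverse();
        return names;
    }
}

public static class Program
{
    public static void Main()
    {
        // An empty in-memory source exposed as IQueryable, so the query is
        // recorded as an expression tree instead of being executed.
        IQueryable<Order> orders = Enumerable.Empty<Order>().AsQueryable();

        IQueryable<int> query = orders
            .Where(o => o.Total > 100m)
            .Select(o => o.Id);

        // Nothing has run yet; the query is just a description of operations.
        Console.WriteLine(string.Join(" -> ", QueryShape.OperatorNames(query)));
        // prints "Where -> Select"
    }
}
```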
I don't think I'd switch an existing project to Linq - really it's a replacement for the entire data-layer and it can change the way all access to that layer is done. Unless you're switching from a very similar model the cost is going to be far too high for any potential gains.
Linq to Sql's real power is in quickly creating new applications - it allows you to very rapidly create the data-layer code.
I understand your point; this does indeed seem like a bit of a backward step...
Actually, I would probably steer away from LINQ to SQL and look more at LINQ to Entities. Your entities model your conceptual data model, and I personally feel more comfortable embedding queries against a conceptual model in my code. The actual physical model is abstracted away from you by the Entity Framework.
This link (excuse the pun) discusses LINQ to Entities and the Entity Framework: http://msdn.microsoft.com/en-us/library/bb386992.aspx
This is an interesting article discussing the pros and cons of both approaches: http://dotnetaddict.dotnetdevelopersjournal.com/adoef_vs_linqsql.htm
Edit: Another thought: if you don't want to wait for EF, have a look at NHibernate; you can LINQ to that too. See http://www.hookedonlinq.com/LINQToNHibernate.ashx
You need to think of LINQ to SQL as an abstraction above writing SQL directly yourself. If you can get your head around this then you’ve made a step in the right direction. You also need to let go of some long held beliefs such as compiled sprocs are always faster and SQL accounts shouldn’t have data reader / writer privileges.
I’ve found that it’s possible to begin gradually moving existing solutions towards LINQ to SQL so long as there is a clear DAL in place and you’re just changing the implementation without affecting the contract it may have with consuming code. Reference lists are an easy candidate as they’re low impact, read only sets of data. The main thing you need to remain conscious of if retrofitting is potential ambiguous class names if you’ve already hand coded them to model the database.
With the value of hindsight in bringing LINQ to SQL into a large enterprise (since CTP days), I’d do it again in a heartbeat. It’s not perfect and there are issues but there are enormous benefits particularly when it comes to development speed and maintainability. It’s a new paradigm and is definitely, definitely a step forward.
There is an implementation of LINQ to SQL not only for SQL Server databases, so the non-SQL Server developers can also take advantage of using this efficient ORM.
We have already added support for query-level LoadWith() and extended the error processing.
We also plan to support all three inheritance models (TPH, TPT, TPC) and key field generation.
You can find the list of supported databases here
I don't think of it as embedding SQL in your code any more than embedding a Stored Proc name in your code is. More often than not a change to your Proc involves change to your code anyway. For example, you usually need to add a new in/out parameter or update a getter/setter method to reference a new column.
What it does is remove a lot of the leg work of writing twice as much code to align properties and methods in your code with procs and columns in your DB.

Resources