Entity Framework associations killing performance

Here is the performance test I am looking at. I have 8 different entities that are mapped table-per-type, and some of the entities contain over 100 thousand rows.
This particular application does several recursive calculations on the client, so I think it may be best to preload the data instead of lazy loading it.
If there are no associations, I can load the entire database in about 3 seconds. As soon as I add associations in any way, performance starts to decline drastically.
I am loading all the data the same way (just calling ToList() on the entity set attached to the context). I ran the test with EDMX-generated classes and with self-tracking entities and had similar results.
I am sure that if I dealt with the associations myself, similar to how I would in a DataSet, the performance problem would go away. On the other hand, I am pretty sure this is not how the Entity Framework was intended to be used. Any thoughts or ideas?
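For reference, the preloading pattern described above looks roughly like the sketch below. This is only a sketch: the context and entity set names (MyEntities, Parents, Children) are hypothetical stand-ins for the real eight table-per-type entities.

// Minimal sketch of the "preload everything" approach.
using (var context = new MyEntities())
{
    // Each ToList() pulls an entire table into memory up front.
    var parents = context.Parents.ToList();
    var children = context.Children.ToList();
    // ... repeated for the remaining table-per-type entity sets.

    // With navigation properties defined, EF also has to fix up the
    // relationships between all of these objects as they materialize,
    // which is where the slowdown shows up.
}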

Loading entities with relationships is going to be much slower than loading entities without them, even if the related entities are not fetched at load time, since EF has to create the complex object used to track each relationship in one case versus perhaps a simple value type like an int in the other. How much slower are you seeing it?
But ...
Preloading 100 thousand rows sounds like a really bad idea. When you call ToList() you eliminate any chance for EF and SQL Server to run an optimized query against your data. Are your calculations such that you always need to examine all the data? Have you tried it without preloading and examined the queries it generates? Have you tried using .Include to eagerly load just the related objects you know you will need?
EF will be smart about caching if you give it the chance.
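As an illustration of that last point, a targeted query with .Include lets SQL Server do the filtering and the join instead of materializing every table. This is only a sketch under assumed names: MyEntities, Customers, Orders, and LastOrderDate are all hypothetical.

// Sketch: let EF and SQL Server do the work instead of preloading every table.
var cutoffDate = DateTime.UtcNow.AddMonths(-1);

using (var context = new MyEntities())
{
    var recentCustomers = context.Customers
        .Include("Orders")                          // eager-load only the association you need
        .Where(c => c.LastOrderDate >= cutoffDate)  // filter on the server, not in memory
        .ToList();
}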

Related

Ditching ActiveRecord and NHibernate -- how to rearchitect?

I have an MVC3 NHibernate/ActiveRecord project. The project is going okay, and I'm getting a bit of use out of my model objects (mostly one giant hierarchy of three or four classes).
My application is analytics-based; I store hierarchical data and later slice it up, display it in graphs, etc., so the actual relationships are not that complicated.
So far, I haven't benefited much from the ORM; it makes querying easy (ActiveRecord), but I frequently need less information than full objects, and I end up writing "hard" queries as multiple complex selects and iterations over collections -- raw SQL would be much faster and cleaner.
So I'm thinking about ditching ORM in this case, and going back to raw SQL. But I'm not sure how to rearchitect my solution. How should I handle the database tier?
Should I still have one class per model, with static methods to query for objects? Or should I have one class representing the DB?
Should I write my own layer under ActiveRecord (or my own ActiveRecord-like implementation) to keep the existing code more or less sound?
Should I combine ORM methods (like Save/Delete) into my model classes or not?
Should I change my table structure (one table per class with all of the fields)?
Any advice would be appreciated. I'm trying to figure out the best architecture and design to go with.
Many, including myself, think the ActiveRecord pattern is an anti-pattern mainly because it breaks the SRP and doesn't allow POCO objects (tightly coupling your domain to a particular ORM).
That said, you can't beat an ORM for simple CRUD work, so I would keep some kind of ORM around for that. Just re-architect your application to use POCO objects and some kind of repository pattern, with your ORM implementation specifics in another project.
As for your "hard" queries, I would consider creating one class per view and using a micro-ORM (like Dapper, PetaPoco, or Massive) to populate those classes with your own raw SQL.

Best strategy for retrieving large dynamically-specified tables on an ASP.NET page

Looking for a bit of advice on how to optimise one of our projects. We have an ASP.NET/C# system that retrieves data from a SQL Server 2008 database and presents it in a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases - all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create an XPO data source on the fly. However, I can't find much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler); the results are often surprising.
If you are constructing the SQL string on the fly, are you following .NET string-handling best practices, for example using a StringBuilder (or a single String.Concat call) rather than repeated "str1" + "str2" concatenation? Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say every 15 minutes; you might need to run a service to update this data automatically) so that you are only hitting one database? New connections to databases are quite expensive.
Have you looked at the query plans for the SQL that you are running? Today I moved a dynamically created SQL string into a stored procedure (only one parameter changed) and shaved 5-10 seconds off the running time (it was being called 100-10,000 times depending on some conditions).
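On that stored-procedure point, a minimal sketch of what the calling code looks like afterwards (the procedure, parameter, and connection-string names are hypothetical; requires System.Data and System.Data.SqlClient):

// Sketch: calling a parameterized stored procedure instead of building the SELECT string on the fly.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.GetCompanyOrders", connection))   // hypothetical proc name
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@CompanyId", companyId);   // the one parameter that changes

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // ... read each row into your grid's data source
        }
    }
}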
Just a warning if you do use LINQ: I have seen developers who decided to use LINQ write far less efficient code because they did not know what they were doing (pulling 36,000 records when they only needed to check for one, for example). These things are very easily overlooked.
Just something to get you started; hopefully there is something there that you haven't thought of.
Cheers,
Stu
As far as I understand, you are talking about the so-called server mode, where all data manipulation is done on the database server instead of passing the records to the web server and processing them there. In this mode the grid works very fast with data sources that contain hundreds of thousands of records. If you want to use this mode, you should create either the corresponding LINQ classes or XPO classes. If you decide to use LINQ-based server mode, the LINQServerModeDataSource provides the Selecting event, which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope this information will be helpful to you.
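A hedged sketch of wiring that Selecting event, assuming the DevExpress LINQ server-mode data source API (the event-argument type and property names may differ by version, and the context, entity, and key names are hypothetical):

// Sketch only: handler for the data source's Selecting event.
protected void LinqServerModeDataSource1_Selecting(
    object sender, LinqServerModeDataSourceSelectEventArgs e)
{
    var context = new MyDataContext();           // hypothetical LINQ to SQL context
    e.KeyExpression = "OrderId";                 // key column of the queried set
    e.QueryableSource = context.Orders
        .Where(o => o.CompanyId == currentCompanyId);   // dynamic per-company filter
}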
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once but are retrieving only subsets at a time, then you should be OK.
But if you're only doing an ExecuteQuery() against a SQL connection to get a dataset back, I don't see how LINQ or anything else will help you. I'd say the problem is clearly on the database side.
So to solve the problem with the database, you need to profile the different SELECT statements you're running against it, examine the query plans, and identify the places where things are slowing down. You might want to start with SQL Server Profiler, but if you have a good DBA, just looking at the query plan (which you can get from Management Studio) is often enough.
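To make the earlier paging point concrete, here is a sketch of returning one page at a time instead of the whole result set (the context, entity, and column names are hypothetical; on SQL Server 2008, LINQ's Skip/Take is translated into a ROW_NUMBER() query):

// Sketch: fetch a single page of rows for the grid.
public List<Order> GetOrderPage(int pageIndex, int pageSize)
{
    using (var context = new MyDataContext())
    {
        return context.Orders
            .OrderBy(o => o.OrderDate)     // paging needs a stable sort order
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}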

How to access data in Dynamics CRM?

What is the best way in terms of speed of the platform and maintainability to access data (read only) on Dynamics CRM 4? I've done all three, but interested in the opinions of the crowd.
Via the API
Via the webservices directly
Via DB calls to the views
...and why?
My thoughts normally center around DB calls to the views but I know there are purists out there.
Given both requirements I'd say you want to call the views. Properly crafted SQL queries will fly.
Going through the API is required if you plan to modify data, but it isn't the fastest approach because it doesn't allow deep loading of entities. For instance, if you want to look at customers and their orders, you'll have to load both individually and then join them manually, whereas a SQL query returns the data already joined.
Never mind that the TDS stream is a lot more efficient than the SOAP messages used by the API and web services.
UPDATE
I should point out, in regard to the views and the CRM database in general: CRM does not optimize the indexes on the tables or views for custom entities (how could it?). So if you have a Truckload entity that you look up by destination all the time, you'll need to add an index for that attribute yourself. Depending upon your application, it could make a huge difference in performance.
I'll add to Jake's comment by saying that querying the tables (*base and *extensionbase) directly instead of the views will be even faster.
In order of speed it'd be:
direct table query
view query
filtered view query
api call
Direct table updates:
I disagree with Jake that all updates must go through the API. The correct statement is that going through the API is the only supported way to do updates. There are in fact several instances where directly modifying the tables is the most reasonable option:
One time imports of large volumes of data while the system is not in operation.
Modification of specific fields across large volumes of data.
I agree that this sort of direct modification should only be a last resort when the performance of the API is unacceptable. However, if you want to modify a boolean field on thousands of records, doing a direct SQL update to the table is a great option.
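A minimal sketch of that kind of direct update (the table and column names below are hypothetical stand-ins; in a real CRM database you would target the appropriate base or extension table, and since this is unsupported you should back up and test first):

// Sketch: flipping a boolean attribute directly in SQL for a large batch of rows.
using (var connection = new SqlConnection(crmConnectionString))
using (var command = new SqlCommand(
    @"UPDATE dbo.new_truckloadExtensionBase      -- hypothetical extension table
      SET new_IsArchived = 1
      WHERE new_DeliveredOn < @cutoff", connection))
{
    command.Parameters.AddWithValue("@cutoff", new DateTime(2011, 1, 1));
    connection.Open();
    int rowsAffected = command.ExecuteNonQuery();   // one set-based statement vs. thousands of API calls
}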
Relative Speed
I agree with XVargas as far as relative speed.
Unfiltered Views vs Tables: I have not found the performance advantage to be worth the hassle of manually joining the base and extension tables.
Unfiltered views vs Filtered views: I recently was working with a complicated query which took about 15 minutes to run using the filtered views. After switching to the unfiltered views this query ran in about 10 seconds. Looking at the respective query plans, the raw query had 8 operations while the query against the filtered views had over 80 operations.
Unfiltered Views vs API: I have never compared querying through the API against querying views, but I have compared the cost of writing data through the API vs inserting directly through SQL. Importing millions of records through the API can take several days, while the same operation using insert statements might take several minutes. I assume the difference isn't as great during reads but it is probably still large.
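For the bulk-import side of that comparison, the direct-SQL approach could use SqlBulkCopy rather than row-by-row INSERT statements. This is a sketch only: the destination table and the importTable DataTable are hypothetical, and the same unsupported-modification caveats apply.

// Sketch: bulk-loading rows straight into a table instead of calling the CRM API per record.
// importTable is a DataTable whose columns match the destination table.
using (var bulkCopy = new SqlBulkCopy(crmConnectionString))
{
    bulkCopy.DestinationTableName = "dbo.new_truckloadBase";   // hypothetical target table
    bulkCopy.BatchSize = 5000;                                 // commit in batches
    bulkCopy.WriteToServer(importTable);
}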

When using LINQ, should we use 3 layers?

When using LINQ to SQL or Entity Framework, do we need to separate the application into 3 layers: BLL, DAL, and interface?
Do what works for you. Building a wedding website with a handful of links and getting 5 content pages out of the database? More than 1 layer seems like tremendous overkill. On the flip side, for a very complex or large project, I think you'd want at least some degree of separation because it saves time, confusion, and sanity.
It matters what you're working on and how much division it requires. Ultimately it's what you and your team prefer. There's no right answer; it's whatever fits the situation.
In projects I've been developing, I find value in creating a data layer (DL) even when using LINQ to SQL for data access.
My main reason is that many of the calls to the DL to retrieve one or more business objects from the DB actually require more than one call to the database, especially when implementing an eager-loading strategy. And when saving a business object whose data is stored in multiple tables, a transaction can be established across the multiple calls to the database.
The business layer doesn't need to know that; it should be able to make a single call to the DL and leave it to the DL to do all the tedious querying and collation of data into business objects.
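A minimal sketch of such a DL save method, assuming LINQ to SQL; MyDataContext, Order, and OrderLine are hypothetical names, and the business layer only ever calls SaveOrder:

// Sketch: the data layer hides the fact that saving one business object
// requires multiple database calls inside a single transaction.
public void SaveOrder(Order order)
{
    using (var scope = new TransactionScope())   // from System.Transactions
    {
        using (var context = new MyDataContext())
        {
            context.Orders.InsertOnSubmit(order);
            context.SubmitChanges();                           // first call: order header
        }

        using (var context = new MyDataContext())
        {
            context.OrderLines.InsertAllOnSubmit(order.Lines);
            context.SubmitChanges();                           // second call: detail rows
        }

        scope.Complete();   // both calls commit together or not at all
    }
}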
I'm with @MikeJacobs.
I've actually written a LINQ to SQL library which abstracts ALL the DataContext stuff, and all the .Insert(), .Execute() and .SubmitChanges() calls.
It's really nice to just abstract that away. In LINQ to SQL you're still dependent on all your layers knowing about the LINQ to SQL entities, but my top layer very rarely sends complex lambdas to the DAL; most of that is done in the DAL.
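A rough sketch of the general idea behind that kind of abstraction (hypothetical code; this is not the author's actual library):

// Thin wrapper that hides the DataContext plumbing from the layers above.
public class LinqRepository<T> where T : class
{
    private readonly DataContext _context;   // System.Data.Linq.DataContext

    public LinqRepository(DataContext context)
    {
        _context = context;
    }

    public IQueryable<T> Query()
    {
        return _context.GetTable<T>();
    }

    public void Insert(T entity)
    {
        _context.GetTable<T>().InsertOnSubmit(entity);
    }

    public void Save()
    {
        _context.SubmitChanges();   // callers never touch the DataContext directly
    }
}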

Loading a huge entity tree with EF

I need to load a model consisting of +/- 20 tables from the database with Entity Framework.
So there are probably a few ways of doing this:
Use one huge Include call
Use many Include calls while manually iterating the model
Use many IsLoaded and Load calls
Here's what happens with these three options:
With the first, EF creates a HUGE query, which puts a very heavy load on the DB, and then again on mapping the results back into the model. So not really an option.
With the second, the database gets called a lot, again with pretty big queries.
With the third, the database gets called even more, but this time with small loads.
All of these options weigh heavily on performance. I do need to load all of that data (it is used for drawing calculations).
So what can I do?
a) Heavy operation => heavy load => do nothing :)
b) Review design => but how?
c) A magical option that will make all these problems go away
When you need to load a lot of data from a lot of different tables, there is no "magic" solution that makes all the problems go away. But in addition to what you have already discussed, you should consider projection. If you don't need every single property of an entity, it is often cheaper to project just the information you do need, e.g.:
from parent in MyEntities.Parents
select new
{
    ParentName = parent.Name,
    Children = from child in parent.Children
               select new
               {
                   ChildName = child.Name
               }
}
One other thing to keep in mind is that for very large queries, the cost of compiling the query can often exceed the cost of executing it. Only profiling can tell you if this is the problem. If this turns out to be the problem, consider using CompiledQuery.
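A minimal CompiledQuery sketch (the context, entity, and parameter names are hypothetical; for EF 4 this lives in the System.Data.Objects namespace):

// Compile the query once, then reuse it so EF skips re-translating the
// expression tree on every execution.
static readonly Func<MyEntities, int, IQueryable<Parent>> ParentsByGroup =
    CompiledQuery.Compile((MyEntities context, int groupId) =>
        context.Parents.Where(p => p.GroupId == groupId));

// Usage:
using (var context = new MyEntities())
{
    var parents = ParentsByGroup(context, 42).ToList();
}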
You might analyze the ratio of queries to updates. If you mostly upload the model once and everything else is a query, then maybe you should store an XML representation of the model in the database as a "shadow" of the model. You should be able to either read the entire XML column in at once fairly quickly, or else do your calculations (or at least the fetch of the values needed for the calculations) using XQuery.
This assumes SQL Server 2005 or above.
You could consider caching your data in memory instead of getting it from the database each time.
I would recommend Enterprise Library Caching Application block: http://msdn.microsoft.com/en-us/library/dd203099.aspx
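As a simple illustration of that idea (shown here with the built-in MemoryCache from System.Runtime.Caching, purely to keep the sketch self-contained, rather than the Enterprise Library block; LoadModelFromDatabase and ModelData are hypothetical):

// Sketch: cache the loaded model in memory and only hit the database on a miss.
private static readonly ObjectCache Cache = MemoryCache.Default;

public ModelData GetModel(int modelId)
{
    string key = "model:" + modelId;

    var cached = Cache.Get(key) as ModelData;
    if (cached != null)
        return cached;                                // served from memory

    var model = LoadModelFromDatabase(modelId);       // the expensive EF load
    Cache.Add(key, model, DateTimeOffset.Now.AddMinutes(15));
    return model;
}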
