Entity Framework with a very large and complex dataset - performance

I would appreciate it very much if you could help me with my questions:
Is EF5 reliable and efficient enough to deal with a very large and complex dataset in the real world?
Compared with ADO.NET, does EF5 require significantly more resources, such as memory?
For those who have tried EF5 on a real-world project with a very large and complex dataset, are you happy with the performance so far?

EF creates an abstraction over data access, so using it introduces a number of additional steps to execute any query. There are workarounds to reduce that cost, though. As MS is promoting this ORM, I believe they are doing a lot to improve performance as well. The EF 6 beta is already out.
There is a good article on EF5 performance on MSDN:
http://msdn.microsoft.com/en-us/data/hh949853
I would not use EF if a lot of complex manipulation and iteration is required on the DBSets after they have been populated from the EF query.
Hope this helps.

EF is more than capable of handling large amounts of data. Just as with plain ADO.NET, you need to understand how to use it correctly. It's just as easy to write code in ADO.NET that performs poorly. It's also important to remember that EF is built on top of ADO.NET.
DBSets will be much slower with large amounts of data than a Code First EF approach. Plain DataReaders could be faster if done correctly.
The correct answer is 'profile'. Code some large objects and profile the differences.
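One way to act on the 'profile' advice is a small timing harness. The sketch below is not EF or ADO.NET. It uses Python's built-in sqlite3 module to compare reading plain rows with materializing one object per row, which is roughly the kind of extra work a mapping layer adds. The `items` table, the `Item` class, and the row count are assumptions made up for illustration, and the timings it prints will vary by machine.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical entity standing in for an ORM-mapped class."""
    id: int
    name: str
    price: float


def build_db(rows: int) -> sqlite3.Connection:
    """Create an in-memory database holding `rows` sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items (id, name, price) VALUES (?, ?, ?)",
        ((i, f"item-{i}", i * 0.5) for i in range(rows)),
    )
    conn.commit()
    return conn


def read_raw(conn):
    """Data-reader style: return plain tuples."""
    return conn.execute("SELECT id, name, price FROM items ORDER BY id").fetchall()


def read_mapped(conn):
    """Mapping style: build one Item object per row."""
    cursor = conn.execute("SELECT id, name, price FROM items ORDER BY id")
    return [Item(*row) for row in cursor]


def timed(fn, conn):
    """Run fn(conn) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(conn)
    return result, time.perf_counter() - start


conn = build_db(50_000)
raw_rows, raw_secs = timed(read_raw, conn)
mapped_rows, mapped_secs = timed(read_mapped, conn)
print(f"raw: {raw_secs:.4f}s  mapped: {mapped_secs:.4f}s")
```

A single run like this is only a starting point. Repeating the measurement and using data shaped like your own matters more than the absolute numbers.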

I did some research into this on EF 4.1. Some details might still apply, though keep in mind that EF5 brought performance upgrades.
ADO vs EF performance
My conclusions:
-You won't match ADO performance with a framework that has to generate the actual SQL statement dynamically from C# and turn the result of that SQL statement into strongly typed objects. There is simply too much going on, even with precompiled and 'warmed' statements (and performance tests confirm this). This is a happy trade-off for many who find it much easier to write and maintain LINQ in C# than stored procs and mapping code.
-You can still use stored procedures with performance equal to ADO, which to me is the perfect situation. You can write LINQ in situations where it works, and as soon as you would like more performance, write the sproc and import it into EF to enjoy the best performance possible.
-There is no technical reason to avoid EF except for the requirements of your project and possibly your team's knowledge and comfort level. Be sure it fits into your selected design pattern. See also: EF Anti Patterns to avoid
I hope this helps.

Related

Best approach to designing DAL with ADO.NET for MVC 3 application?

I see a ton of examples for MVC DAL with entity framework, but nothing for ADO.NET and stored procedures?
There seems to be a trend toward the "Repository" and "UnitOfWork" patterns for creating a DAL, similar to this:
http://www.codeproject.com/Articles/207820/The-Repository-Pattern-with-EF-code-first-Dependen
How would I migrate this codebase away from EF to ADO.NET stored procedures?
You have gotten very few answers as most of us are moving away from stored procedures.
The two biggest reasons for that are:
Control over the business logic
Having all the business logic in one place makes the code easier to read, and therefore makes the application easier to maintain, i.e. you get a much better flow when programming.
If you spread the business logic between SPs and .NET code, you have to mentally shift (store state) each time you switch between code and SPs.
Easier to test
Testing is important. Especially for applications which have a maintenance plan.
For .NET there are several tools for testing your code. Everything can be tested in isolation (without external dependencies) with little effort, and there are several articles describing different test techniques.
Testing stored procedures in isolation is hard.
Myth: Stored procedures are faster than SQL queries.
Today stored procedures do not have the performance advantage over parameterized queries (i.e. queries that use arguments such as @userName) that they had a couple of years ago (SQL Server 2000 and below). They should in fact have similar performance, as the execution plan is now saved for parameterized queries too.
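As a minimal sketch of what a parameterized query looks like, the example below uses Python's built-in sqlite3 module rather than SQL Server, so the placeholder is written `:user_name` instead of `@userName`. The `users` table and the `find_user` helper are hypothetical names used only for illustration. The point is that the statement text stays fixed and only the bound value changes, which is what lets a database server reuse a saved plan for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.executemany(
    "INSERT INTO users (user_name) VALUES (:user_name)",
    [{"user_name": "alice"}, {"user_name": "bob"}],
)
conn.commit()


def find_user(conn, user_name):
    """Look up one user; the SQL text is constant, the value is bound separately."""
    return conn.execute(
        "SELECT id, user_name FROM users WHERE user_name = :user_name",
        {"user_name": user_name},
    ).fetchone()


alice = find_user(conn, "alice")
# A value that would break a string-concatenated query is treated as plain data here.
nobody = find_user(conn, "x' OR '1'='1")
print(alice, nobody)
```

Binding values this way also keeps user input out of the SQL text, which is a separate benefit from performance.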
However, if you have logic in your SPs that processes the results of multiple queries, they DO get better performance, as no round trip between your application and the database server is required. But that can easily be compensated for by a different application architecture.
Conclusion
Think twice before going down that path. It's usually not worth it. What you gain (money) from fewer CPU cycles is typically a lot less than the cost of the hours spent on creating and maintaining the application.
That said, stored procedures can be used as instructed here: http://msdn.microsoft.com/en-us/data/gg699321.aspx

Linq, entity framework and their usage

I had to develop a system for my university that tracks almost all of its data (lectures, lecturers, teaching assistants, students, etc.).
My database has about 30 tables, and it's pretty complex.
I used EF and LINQ to connect to the database and query it.
But the further I go, the harder my queries become to write, not to mention to maintain.
Here is an example of one query: http://pastebin.com/Za1cYMPa
It is pretty much chaos, as you can see.
So, am I misusing LINQ (LINQ can solve this, but in a different way), or is LINQ just not meant for more complex systems like this one?
This is a general question, not one about solving a particular problem.
Do you think that query would look better or be more maintainable if written in native SQL? In that case you can hide the query in a stored procedure. Once you get into advanced queries, it is always a mess. You can reduce some of the complexity by hiding subqueries in database views or in EF query views. But if you have a highly normalized OLTP database and you need to run a complex reporting / analysis / data mining query, it will always be big and hard to maintain. That is the reason why OLAP systems exist (I didn't check the content of your query, just its length, so don't take this as a reason to build an OLAP cube).
More important is performance of the query ...
In general, complexity can be reduced by abstracting your repetitive code into reusable, easily maintainable components.
In your particular case, the complexity can be reduced by implementing the Repository and Specification patterns to make your code more consistent, easier to read, and easier to maintain.
There is a very helpful sequence of articles by Huy Nguyen explaining the repository pattern, the specification pattern and how to effectively combine them with the Entity Framework for better code. These are available at:
Entity Framework 4 POCO, Repository and Specification Pattern
Specification Pattern In Entity Framework 4 Revisited
Entity Framework 4 POCO, Repository and Specification Pattern [Upgraded to CTP5]
Entity Framework 4 POCO, Repository and Specification Pattern [Upgraded to EF 4.1]
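The idea behind combining the two patterns can be sketched in a few lines. This is not the code from the articles above and it does not use EF. It is an in-memory Python illustration in which `Student`, `Specification`, `StudentRepository`, `in_department`, and `in_year` are all hypothetical names. In an EF implementation the predicate would normally be an expression that the provider can translate to SQL, whereas here it is an ordinary function applied to a list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Student:
    name: str
    year: int
    department: str


class Specification:
    """Wraps a predicate so that query rules can be named and combined."""

    def __init__(self, predicate: Callable[[Student], bool]):
        self._predicate = predicate

    def is_satisfied_by(self, candidate: Student) -> bool:
        return self._predicate(candidate)

    def __and__(self, other: "Specification") -> "Specification":
        return Specification(
            lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c)
        )

    def __or__(self, other: "Specification") -> "Specification":
        return Specification(
            lambda c: self.is_satisfied_by(c) or other.is_satisfied_by(c)
        )


def in_department(department: str) -> Specification:
    return Specification(lambda s: s.department == department)


def in_year(year: int) -> Specification:
    return Specification(lambda s: s.year == year)


class StudentRepository:
    """Single place that knows how students are stored and filtered."""

    def __init__(self, students: Iterable[Student]):
        self._students = list(students)

    def find(self, spec: Specification) -> List[Student]:
        return [s for s in self._students if spec.is_satisfied_by(s)]


repo = StudentRepository([
    Student("Ana", 2, "Math"),
    Student("Ben", 1, "Math"),
    Student("Cleo", 2, "Physics"),
])
second_year_math = repo.find(in_department("Math") & in_year(2))
print([s.name for s in second_year_math])
```

The benefit is that each rule is written once under a readable name, and a long query becomes a composition of small pieces instead of one large expression.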
Ooof, that is pretty nasty.
You should think about your database design. I believe having so many tables is causing your queries to be overly complicated. A well-placed set of views (or stored procedures) can simplify the querying logic. However, this will not simplify your overall system, because at some point you have to connect all these tables.

Hibernate vs. stored procedures or functions: performance

I am analyzing the options for the database layer in my application. I found that Hibernate is a very popular choice; however, a few friends told me it is better to use stored procedures / functions rather than Hibernate, because Hibernate has performance issues compared to these database objects. Is there any other option? My application may have a very high volume of transactions, so I need to select an option that gives better performance. Can someone shed some light on this and help me choose the best option? I am using the Spring framework as the core and RichFaces for the web layer.
"My application may have very high volume of transactions so need to select a option which gives a better performance"
Well, if performance is your only (or primary) benchmark, then it's hard to beat Oracle packages on the db server. However, your company should consider the strengths of its developers. Is this a shop with mostly Java devs and 1 or 2 lonely Oracle devs and 1 DBA? If so, don't develop your middleware system in Oracle packages; you'll probably have some XML service written in Java using Hibernate. It won't be as fast under load, but it will be easier to maintain and grow for YOUR company.
Note: I'm biased towards using Oracle technologies, but that's where my strengths are.
This is pretty late, but I would like to add a few points on using Hibernate vs. using stored procedures.
From a performance perspective, I believe that since writing a stored procedure means you are closer to the database, it will invariably produce faster results if written efficiently. Consequently Hibernate, since it works on top of the database, cannot really be faster than the database itself. The freedom to optimize queries is something Hibernate takes away from you, and while Hibernate may come up with many optimal queries, there may still be room for better optimization.
Even pro-Hibernate developers admit that if you are updating a large dataset, it's better to use a procedure rather than make multiple calls with Hibernate over the network.
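The large-update case can be illustrated without Hibernate or a stored procedure. The sketch below uses Python's built-in sqlite3 module and a hypothetical `accounts` table to contrast the load-modify-save loop that an ORM session typically produces with a single set-based statement of the kind a procedure would run inside the database. It counts statements rather than measuring time, and with an in-memory database there is no real network, so treat the count as a stand-in for round trips.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create 1000 hypothetical accounts; every odd id is active."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance, active) VALUES (?, ?, ?)",
        [(i, 100.0, i % 2) for i in range(1, 1001)],
    )
    conn.commit()
    return conn


def update_row_by_row(conn) -> int:
    """Load each row, change it in application code, write it back.

    Returns the number of statements sent to the database.
    """
    rows = conn.execute(
        "SELECT id, balance FROM accounts WHERE active = 1"
    ).fetchall()
    statements = 1
    for account_id, balance in rows:
        conn.execute(
            "UPDATE accounts SET balance = ? WHERE id = ?",
            (balance * 1.05, account_id),
        )
        statements += 1
    conn.commit()
    return statements


def update_set_based(conn) -> int:
    """Let the database apply the change to all matching rows in one statement."""
    conn.execute("UPDATE accounts SET balance = balance * 1.05 WHERE active = 1")
    conn.commit()
    return 1


def total_balance(conn) -> float:
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


db_a, db_b = build_db(), build_db()
loop_statements = update_row_by_row(db_a)
set_statements = update_set_based(db_b)
print(loop_statements, set_statements)
```

Both approaches leave the data in the same state. The difference is how many statements cross the boundary between application and database to get there.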
So to summarize, I suggest using procedures and functions for good performance.
The best option is the one you'll figure out for yourself. Sometimes using any ORM is perfectly fine and will support you; in other instances it isn't the best option. I think the real answer is that it depends on what you're doing, how you're doing it, and the quality of the product design. All of those make a difference and can largely determine failure or success.
Bottom line, absolutes are a horrible policy -- Use the tech that works and fixes a problem. If it starts being a problem, re-evaluate.

Performance gains using straight ado.net vs an ORM?

Would I get any performance gains if I replaced the data access part of my application, moving from NHibernate to straight ADO.NET?
I know that NHibernate uses ADO.NET at its core!
Short answer:
It depends on what kind of operations you perform. You probably would get a performance improvement if you write good SQL, but in some cases you might get worse performance since you lose the NHibernate caching etc.
Long answer:
As you mention, NHibernate sits on top of ADO.NET and provides a layer of abstraction. This makes your life easier in many ways, but like all layers of abstraction it has a certain performance cost.
The main case where you would probably see a performance benefit is when you are operating on many objects at once, such as updating many entities or fetching a large number of entities. This is because of the work that the NHibernate session does to keep track of which objects are modified etc. My experience is that the performance of NHibernate degrades significantly as the number of entities in the session grows.
NHibernate has a lot of ways to improve performance, and if you really know it well, you can get it to perform quite close to ADO.NET. However, if you are not that familiar with it, you can easily shoot yourself in the foot, performance-wise. (Select N+1 problem, etc.)
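For readers who have not met the Select N+1 problem, the sketch below reproduces its shape with Python's built-in sqlite3 module instead of NHibernate. The `authors` and `books` tables and the `QueryCounter` wrapper are hypothetical and exist only to count queries. Lazy loading a child collection inside a loop issues one query for the parents plus one per parent, while a join fetches the same data in a single query.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
        INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1'), (4, 3, 'C1');
        """
    )
    return conn


class QueryCounter:
    """Thin wrapper that counts how many queries are sent."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def load_lazy(db: QueryCounter) -> dict:
    """N+1 shape: one query for the authors, then one more per author."""
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        titles = db.query(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in titles]
    return result


def load_joined(db: QueryCounter) -> dict:
    """Eager shape: a single join returns authors and books together."""
    result = {}
    rows = db.query(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


lazy_db = QueryCounter(build_db())
joined_db = QueryCounter(build_db())
lazy_result = load_lazy(lazy_db)
joined_result = load_joined(joined_db)
print(lazy_db.count, joined_db.count)
```

With three authors the difference is 4 queries versus 1. With thousands of parent rows the per-parent queries dominate, which is why this pattern is usually the first thing to look for when an ORM feels slow.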
There are some situations where you could actually get worse performance when switching from NHibernate to straight ADO.NET. This is because the NHibernate abstraction layer introduces some features that can improve performance, such as caching. NHibernate also includes functionality for optimizing the generated SQL for the current database management system. For example, if you are using SQL Server it might generate slightly different SQL than if you are using Oracle.
It is worth mentioning that it does not have to be an all or nothing kind of situation. You could use NHibernate for the 90% of your database access for which it works great, and then use straight SQL for the 10% where you do complex queries, batch inserts/updates etc.

LINQ2SQL performance vs. custom DAL vs. NHibernate

Given a straightforward user-driven, high traffic web application (no fancy reporting/BI):
If my utmost goal is performance (not ease of maintainability, ease of queryability, etc.), I would surmise that in most cases a roll-your-own DAL would be the best choice.
However, if I were to choose Linq2SQL or NHibernate, roughly what kind of performance hit would we be talking about? 10%? 20%? 200%? Of the two, which would be faster?
Does anyone have any real-world numbers that could shed some light on this? (And yes, I know Stack Overflow runs on Linq2SQL.)
If you know your stuff (esp. in SQL and ADO.NET), then yes - most likely, you'll be able to create a highly tweaked, highly optimized custom DAL for your particular needs that is faster overall than a general-purpose ORM like Linq-to-SQL or NHibernate.
As to how much - that's really, really hard to say without knowing your concrete table structure, data and usage patterns. I remember Rico Mariani did some Linq-to-SQL vs. raw SQL comparisons, and his end result was that Linq-to-SQL achieves over 90% of the performance of a highly skilled SQL programmer.
See: http://blogs.msdn.com/ricom/archive/2007/07/05/dlinq-linq-to-sql-performance-part-4.aspx
Not too shabby in my book, especially if you factor in the productivity gains you get - but that's the big trade-off always: productivity vs. raw performance.
Here's another blog post on Entity Framework and Linq-to-SQL compared to DataReader and DataTable performance.
I don't have any such numbers for NHibernate, unfortunately.
In two high-traffic web apps, refactoring an ORM call to use a stored procedure from ADO.NET only got us about a 1-2% change in CPU and time.
Going from an ORM to a custom DAL is an exercise in micro optimization.

Resources