Hibernate vs. stored procedures or functions: performance

I am analyzing options for the database layer in my application. I found Hibernate to be a very popular choice; however, a few friends told me it is better to use stored procedures/functions instead, because Hibernate has performance issues compared to these database objects. Is there any other option? My application may have a very high volume of transactions, so I need to select an option that gives better performance. Can someone shed some light on this and help me choose the best option? I am using the Spring framework as the core and RichFaces for the web layer.

"My application may have a very high volume of transactions, so I need to select an option that gives better performance."
Well, if performance is your only (or primary) benchmark, then it's hard to beat Oracle packages on the DB server. However, your company should consider the strengths of its developers. Is this a shop with mostly Java devs and one or two lonely Oracle devs and one DBA? If so, don't develop your middleware system in Oracle packages; you'll probably end up with some XML service written in Java using Hibernate. It won't be as fast under load, but it will be easier to maintain and grow for YOUR company.
Note: I'm biased towards using Oracle technologies, but that's where my strengths are.

This is pretty late, but I would like to add a few points on using Hibernate vs. using stored procedures.
From a performance perspective, I believe that since writing a stored procedure means you are closer to the database, it will invariably produce faster output if written efficiently. Conversely, Hibernate, since it works on top of the database, cannot really be faster than the database itself. The freedom to optimize queries is something Hibernate takes away from you, and while Hibernate may come up with many optimal queries, there may still be chances for better optimization.
Even pro-Hibernate developers concede that if you are updating a large dataset, it is better to use a procedure rather than make multiple calls with Hibernate over the network.
So, to summarize, I suggest using procedures and functions where performance matters most.
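Note that the two approaches can be combined: JPA 2.1 lets you invoke a stored procedure through the same EntityManager you use everywhere else, so the bulk work stays in the database. A minimal sketch; the procedure name update_order_totals and its parameter are hypothetical:

    import javax.persistence.EntityManager;
    import javax.persistence.ParameterMode;
    import javax.persistence.StoredProcedureQuery;

    public class OrderBatchDao {
        private final EntityManager em;

        public OrderBatchDao(EntityManager em) {
            this.em = em;
        }

        // Delegates a large bulk update to a stored procedure instead of
        // loading and saving thousands of entities through the session.
        public void recalculateTotals(int year) {
            StoredProcedureQuery query =
                em.createStoredProcedureQuery("update_order_totals"); // hypothetical proc
            query.registerStoredProcedureParameter("p_year", Integer.class, ParameterMode.IN);
            query.setParameter("p_year", year);
            query.execute();
        }
    }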

The best option is the one you'll figure out for yourself. Sometimes using an ORM is perfectly fine and will support you; in other instances it isn't the best option. I think the real answer is that it depends on what you're doing, how you're doing it, and the quality of the product design. All of those make a difference and can greatly dictate failure or success.
Bottom line: absolutes are a horrible policy. Use the tech that works and fixes the problem. If it starts being a problem, re-evaluate.

Related

Entity Framework with a very large and complex dataset

I would appreciate it very much if you can help me with my questions:
Is EF5 reliable and efficient enough to deal with very large and complex dataset in the real world?
Comparing EF5 with ADO.NET, does EF5 require significantly more resources, such as memory?
For those who have tried EF5 on a real world project with very large and complex dataset, are you happy with the performance so far?
EF creates an abstraction over data access, so using it introduces a number of additional steps to execute any query. But there are workarounds to reduce the cost. As MS is promoting this ORM, I believe they are doing a lot for performance improvement as well; the EF 6 beta is already out.
There is a good article on the performance of EF5 available on MSDN:
http://msdn.microsoft.com/en-us/data/hh949853
I would not use EF if a lot of complex manipulations and iterations are required on the DbSets after populating them from the EF query.
Hope this helps.
EF is more than capable of handling large amounts of data. Just as in plain ADO.NET, you need to understand how to use it correctly. It's just as easy to write code in ADO.NET that performs poorly. It's also important to remember that EF is built on top of ADO.NET.
DbSets will be much slower with large amounts of data than a code-first EF approach. Plain DataReaders could be faster if done correctly.
The correct answer is 'profile'. Code some large objects and profile the differences.
I did some research into this on EF 4.1, and some details might still apply, though there have been performance upgrades in EF5 to keep in mind.
ADO vs EF performance
My conclusions:
- You won't match ADO performance with a framework that has to generate the actual SQL statement dynamically from C# and turn that SQL statement into a strongly typed object; there is simply too much going on, even with precompiled and 'warmed' statements (and performance tests confirm this). This is a happy trade-off for many who find it much easier to write and maintain LINQ in C# than stored procs and mapping code.
- You can still use stored procedures with performance equal to ADO, which to me is the perfect situation. You can write LINQ to SQL in situations where it works, and as soon as you would like more performance, write the sproc and import it into EF to enjoy the best performance possible.
- There is no technical reason to avoid EF except for the requirements of your project and possibly your team's knowledge and comfort level. Make sure it fits into your selected design pattern: see EF Anti Patterns to avoid.
I hope this helps.

My Database Design skills stink. Where to seek remedy?

I have a web site that's been progressively expanding in both traffic and complexity of database design. I've always worked as a developer first & foremost, and have never really been much of a DB administrator beyond what I need to do to get my code running. This needs to change: I need to improve efficiency on the database side of things.
To give a vague example, I'm looking for how to go about learning:
Optimising complex tables/relationships for performance/scaling
How to index efficiently. (At the moment I throw indexes on foreign keys, and that's about it)
General design principles for complex databases
Most of the resources I've found are either directed more towards the basics of SQL ("this is a SELECT query, a JOIN, etc") or focus primarily on performance issues outside the DB.
So, I know this is a little vague, but where should I look to ensure my database is designed in the most efficient and robust manner possible?
Learn about data modeling. Choosing the right data structure is always a crucial first step, for programming in general and databases in particular. Performance cannot be "bolted" on top of a bad data structure! The ERwin Methods Guide is probably not a bad way to start learning about data modeling.
Learn how DBMSes organize data at the physical level. This will help you immensely in understanding how to "shape" your data for performance and how to effectively leverage many of the performance mechanisms modern DBMSes put at your disposal. Use The Index, Luke! is an excellent tutorial on the topic.
Learn how to efficiently access the database and make sure you really understand the client API that will be called from your code. Different APIs have their own idiosyncrasies, but they all share some common themes, such as parameter binding, query preparation and fetching. Even if you are "shielded" by an ORM from ever having to, say, bind parameters manually, this is still taking place "under the covers" and understanding it raises your ability to write performant code.
Measure, measure, measure. Modern information systems are immensely complex and even experts find themselves making incorrect assumptions, so don't rely on assumptions!
I would suggest some reading on performance tuning. It is very specialized depending on the database backend you use, but here are some books to consider:
SQL Server
http://www.amazon.com/Server-Query-Performance-Tuning-Distilled/dp/1590594215/ref=sr_1_2?s=books&ie=UTF8&qid=1334154710&sr=1-2
http://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476/ref=sr_1_12?s=books&ie=UTF8&qid=1334154710&sr=1-12
MySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-ebook/dp/B0028N4W7Y/ref=sr_1_3?ie=UTF8&qid=1334154504&sr=8-3
Oracle
http://www.amazon.com/Oracle-Database-Release-Performance-Techniques/dp/0071780262/ref=sr_1_2?s=books&ie=UTF8&qid=1334154909&sr=1-2
General Performance Tuning
http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692/ref=sr_1_18?s=books&ie=UTF8&qid=1334154964&sr=1-18
First and foremost, I'd recommend learning how to use EXPLAIN and what its output means. Run it on your most common queries and study the output. Are the queries using sensible indexes? Are they using indexes at all? Queries that look very simple at a glance might end up being quite costly.
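As a small sketch of that first step (assuming PostgreSQL and a hypothetical orders table), you can run EXPLAIN straight from JDBC and read the plan row by row:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExplainDemo {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "secret");
                 Statement st = con.createStatement();
                 // EXPLAIN returns the planner's chosen plan, one step per row
                 ResultSet rs = st.executeQuery(
                     "EXPLAIN SELECT * FROM orders WHERE customer_id = 42")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // look for Seq Scan vs Index Scan
                }
            }
        }
    }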
Next, I'd suggest finding your slowest queries. Postgres (for example) has a feature that allows you to log the SQL source for all queries that take longer than N seconds to run. Are they slow because they're unindexed, very complex, or operating on a huge amount of data?
Third, I'd look at the number of times a particular query is run. Are you using the database to store static data, and hitting a table over and over again to grab a record that never changes? You could probably cache the result somewhere.
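For that last case, even a trivial in-process cache can absorb most of the repeated hits. A minimal sketch; CountryDao and its method are made-up names:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    interface CountryDao {
        String findNameByCode(String code); // hits the database
    }

    public class CountryCache {
        private final Map<String, String> namesByCode = new ConcurrentHashMap<>();
        private final CountryDao dao;

        public CountryCache(CountryDao dao) {
            this.dao = dao;
        }

        // First call per code queries the database; later calls are served from memory.
        public String countryName(String code) {
            return namesByCode.computeIfAbsent(code, dao::findNameByCode);
        }
    }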

Tracing ORM performance

This isn't a question of "which is the fastest ORM", nor is it a question of "how to write good code with ORMs". This is the other side: the code's been written, it's gone live, several thousand users are hitting the application, but there's a perceived overall performance problem. A SQL Profiler trace can only be run for a short amount of time: five minutes gives several hundred thousand results.
The question is simply this: having used SQL Profiler to narrow down a number of slow queries (duration greater than a given amount of time), what techniques and solutions exist for tracing these SQL queries back to the problematic component? A related question is: if a specific area is slow, how can we identify the SQL that this area is executing, so it can be suitably filtered in SQL Profiler?
The background to this is that we have a rather large application with a fairly complex table structure, currently based around data access via stored procedures. If a SQL performance problem arises, it's usually a case of pulling out SQL Profiler, finding out if anything is slow (filter by duration) or if the area being complained about is slow (filter by stored procedure), and tuning the stored procedures (or the schema, through indexing).
Now there's a push to move our code over from a mostly-sproc solution to a mostly-ORM solution; however, the big argument against the move is how performance problems, if they arise, can be traced back to problematic code. I've read around, and it seems that more often than not it involves third-party tools (ORM tracing utilities like NHProf, or .NET tracing utilities like dotTrace) that we'd need to install on the server. Whether additional tools can be installed on a live environment is another question, so if things like this can be done without additional tools, that would be a bonus.
I'm mostly interested in solutions for SQL Server 2008, but it's probably generic enough for any RDBMS. As for the ORM technology, I have no specific focus, as nothing is currently in use, so I'd be interested to hear how techniques differ (or are common) between NHibernate, Fluent NHibernate and Entity Framework. Other ORMs are welcome though, if they offer something else :-)
I've read through How to find and fix performance problems (...), and I think the issue is simply the section there that says "isolate". A problem that is easily reproducible only on a live system is going to be difficult to isolate. The figures I quoted in paragraph 2 are the types of volumes that we can get from a profiler trace as well...
If you have real-world experience of ORM tracing on live, so much the better :-)
Update, 2016-10-21: Just for completeness, we eventually solved this for NHibernate by writing code, and overriding NHibernate methods. Full details in this other SO question I asked: NHibernate and Interceptors - measuring SQL round trip times. I expect this will be a similar approach for many different ORMs.
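For what it's worth, on the Java side plain Hibernate can surface similar per-query timings without third-party tools through its built-in Statistics API. A rough sketch, assuming you can get hold of the SessionFactory:

    import org.hibernate.SessionFactory;
    import org.hibernate.stat.Statistics;

    public class OrmTracing {
        // Prints per-query execution counts and average times collected by Hibernate.
        // Statistics must be enabled before the queries you want to measure run.
        public static void dumpQueryStats(SessionFactory sessionFactory) {
            Statistics stats = sessionFactory.getStatistics();
            stats.setStatisticsEnabled(true); // or set hibernate.generate_statistics=true
            for (String query : stats.getQueries()) {
                System.out.printf("%s -> runs=%d, avg=%d ms%n",
                    query,
                    stats.getQueryStatistics(query).getExecutionCount(),
                    stats.getQueryStatistics(query).getExecutionAvgTime());
            }
        }
    }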
There exist profilers for ORM tools, like UberProf, which find out which SQL statements generated by the ORM can be problematic.
Take the SELECT N+1 problem, for instance. These kinds of tools might give you an indication of which ORM query statements result in poor SQL, and perhaps even how you could improve them.
We had a Java/Hibernate app with issues, so we used SET CONTEXT_INFO with a different value per module. If we saw, say, 0x14 on the same SPID just before a WTF query, we could narrow it down to module x.
Not being a Java guy, I don't know exactly what they did, and of course it may not apply to .NET. IIRC, you have to be careful about when connections are opened and closed.
We could also control the client load at this time so we didn't have too much superfluous traffic.
YMMV of course, but it may be useful
I just found these, which could be useful too:
Temporary tables, sessions and logging in SQL Server?
Why is my CONTEXT_INFO() empty?
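A rough sketch of the tagging trick from plain JDBC (SQL Server; the module code is made up, and keep in mind the caveat above about when connections are opened and closed):

    import java.sql.Connection;
    import java.sql.Statement;

    public class ContextTagger {
        // Tags the current connection so a SQL Profiler trace can be tied back
        // to an application module via CONTEXT_INFO().
        public static void tagModule(Connection con, int moduleCode) throws Exception {
            try (Statement st = con.createStatement()) {
                // SET CONTEXT_INFO takes a binary constant, so build the literal
                // directly, e.g. 0x14 for "module x"
                st.execute(String.format("SET CONTEXT_INFO 0x%02X", moduleCode));
            }
        }
    }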

Optimizations employed by ORM's

I'm teaching Java EE, especially JPA, Spring and Spring MVC. As I don't have much experience with large projects, it is difficult to know what to present to students about ORM optimisation.
At the present time, I present some classic optimisation tricks:
prepared statements (most ORMs implicitly use them by default)
first- and second-level caches
"write first, optimize later"
the possibility of switching off the ORM and sending SQL commands directly to the database for very frequent, specialized and costly requests
Are there any other points the community sees about ways to optimize an ORM? I'm especially interested in DAO patterns...
From the developer's point of view, there are the following optimization cases to deal with:
Reduce chattiness between the ORM and the DB. Low chattiness is important, since each roundtrip between the ORM and the database implies a network interaction, and thus costs at least 0.1 to 1 ms, independently of query complexity (note that perhaps 90% of queries are fairly simple). A particular case is the SELECT N+1 problem: if processing each row of some query result requires an additional query to be executed (so 1 + count(...) queries are executed in total), the developer must try to rewrite the code in such a way that a nearly constant count of queries is executed. CRUD sequence batching and future queries are other examples of optimizations reducing the chattiness (described below).
Reduce query complexity. Usually the ORM is helpless here, so this is solely the developer's headache. But APIs allowing SQL commands to be executed directly are frequently intended for this case as well.
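As an illustration of that escape hatch, JPA's native-query API lets you hand-tune the one painful statement while everything else stays on the ORM. A sketch; the tables and columns are made up:

    import java.util.List;
    import javax.persistence.EntityManager;

    public class ReportDao {
        // Bypasses the ORM's query generator for one hand-optimized statement.
        @SuppressWarnings("unchecked")
        public static List<Object[]> topCustomers(EntityManager em, int limit) {
            return em.createNativeQuery(
                    "SELECT c.id, c.name, SUM(o.total) AS revenue " +
                    "FROM customers c JOIN orders o ON o.customer_id = c.id " +
                    "GROUP BY c.id, c.name ORDER BY revenue DESC")
                .setMaxResults(limit) // row limiting stays portable across databases
                .getResultList();
        }
    }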
So I can list a few more optimizations:
Future queries: an API allowing the execution of a query to be delayed until the moment its result is actually needed. If there are several future queries scheduled at that moment, they are executed all together as a single batch. The main benefit is a reduction in the number of roundtrips to the database (= reduction of chattiness between the ORM and the DB). Many ORMs implement this, e.g. NHibernate.
CRUD sequence batching: nearly the same, but INSERT, UPDATE and DELETE statements are batched together to reduce the chattiness (see the sketch after this list). Again, implemented by many ORM tools.
Combination of the above two cases, so-called "generalized batching". AFAIK, so far this is implemented only by DataObjects.Net (the ORM my team works on).
Asynchronous generalized batching: if a batch requires no immediate reply, it is executed asynchronously (but certainly in sync with other batches sent by the same session, i.e. the underlying connection is still used synchronously). This brings noticeable benefits when there are lots of CRUD statements: the code modifying persistent entities executes in parallel with the DB-side operation. AFAIK, no ORM implements this optimization so far.
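Here is roughly what CRUD sequence batching looks like in Hibernate: set hibernate.jdbc.batch_size (say, to 50) and flush/clear the session periodically, so the INSERTs go out in batches instead of one roundtrip each. A sketch with a made-up Customer entity:

    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.hibernate.Session;
    import org.hibernate.Transaction;

    @Entity
    class Customer {
        @Id @GeneratedValue Long id;
        String name;
    }

    public class BulkInsert {
        static final int BATCH_SIZE = 50; // keep equal to hibernate.jdbc.batch_size

        public static void insertAll(Session session, List<Customer> customers) {
            Transaction tx = session.beginTransaction();
            for (int i = 0; i < customers.size(); i++) {
                session.save(customers.get(i));
                if ((i + 1) % BATCH_SIZE == 0) {
                    session.flush(); // send the pending INSERTs as one JDBC batch
                    session.clear(); // evict flushed entities so the session stays small
                }
            }
            tx.commit();
        }
    }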
All these cases fit under the "write first, optimize later" rule (or "express intention first, optimize later").
Another well-known optimization-related API is the prefetch API ("prefetch paths"). The idea behind it is to fetch a graph of objects that is expected to be processed further with a minimal count of queries (or, better, in minimal time). So this API addresses the SELECT N+1 problem. Again, this part is normally expected in any serious ORM product.
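To make the SELECT N+1 point concrete, here is the classic JPA fix with a fetch join; the Invoice/InvoiceLine entities are hypothetical:

    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;
    import javax.persistence.OneToMany;

    @Entity
    class Invoice {
        @Id Long id;
        @OneToMany(mappedBy = "invoice") // lazy by default
        List<InvoiceLine> lines;
    }

    @Entity
    class InvoiceLine {
        @Id Long id;
        @ManyToOne Invoice invoice;
    }

    public class InvoiceQueries {
        // Naive version: 1 query for invoices + 1 more per invoice when lines load lazily.
        public static List<Invoice> nPlusOne(EntityManager em) {
            return em.createQuery("SELECT i FROM Invoice i", Invoice.class).getResultList();
        }

        // Prefetched version: a single SQL join brings invoices and lines back together.
        public static List<Invoice> prefetched(EntityManager em) {
            return em.createQuery(
                    "SELECT DISTINCT i FROM Invoice i JOIN FETCH i.lines", Invoice.class)
                .getResultList();
        }
    }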
All the above optimizations are safe from the point of view of transaction isolation, i.e. they don't break it. Caching-related optimizations normally aren't safe in this respect: you must carefully configure caching to ensure you won't get stale objects when getting the actual content is important (e.g. on security checks or in some real-time interaction). There are lots of techniques here, starting from the usage of built-in caches and finishing with integration with distributed caches (memcached, etc.). Any approach solving the problem is good here; personally, I would expect an open API allowing me to integrate any cache I prefer.
P.S. I'm a .NET fanboy, as well as one of the DataObjects.Net and ORMeter.NET developers. So I don't know exactly how similar features are implemented in Java, but I'm familiar with the range of available solutions.
How about N+1 queries for collections? For example, see here: ORM Select n + 1 performance; join or no join
Regarding Spring and Spring MVC, you might find this interesting.
It's in C++, not Java, but it shows how to reduce UI source code w.r.t. Spring by an order of magnitude.
Lazy loading via proxies is probably one of the killer features of ORMs.
Additionally, Hibernate is also able to proxy requests like object.collection.count and optimize them, so that instead of the whole collection being retrieved, only a SELECT COUNT(*) is issued.
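In annotation-mapped Hibernate, that behaviour is the "extra lazy" fetch mode. A sketch with made-up Forum/Topic entities:

    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;
    import javax.persistence.OneToMany;
    import org.hibernate.annotations.LazyCollection;
    import org.hibernate.annotations.LazyCollectionOption;

    @Entity
    class Topic {
        @Id Long id;
        @ManyToOne Forum forum;
    }

    @Entity
    public class Forum {
        @Id Long id;

        // EXTRA-lazy: topics.size() issues SELECT COUNT(*) instead of
        // pulling every Topic row into memory.
        @OneToMany(mappedBy = "forum")
        @LazyCollection(LazyCollectionOption.EXTRA)
        List<Topic> topics;

        public int topicCount() {
            return topics.size(); // translated to a COUNT query by Hibernate
        }
    }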
You mentioned the DAO pattern, but many in the JPA camp are saying the pattern is dead (I think the Hibernate guys have blogged about this, but I can't remember the link). Have a look at Spring Roo to see how they add ORM-related functionality directly to the domain model via static methods.
For a set of JPA optimizations and their resulting improvements, see:
http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html

Performance gains using straight ado.net vs an ORM?

Would I get any performance gains if I replaced the data access part of my application, moving from NHibernate to straight ADO.NET?
I know that NHibernate uses ADO.NET at its core!
Short answer:
It depends on what kind of operations you perform. You probably would get a performance improvement if you write good SQL, but in some cases you might get worse performance since you lose the NHibernate caching etc.
Long answer:
As you mention, NHibernate sits on top of ADO.NET and provides a layer of abstraction. This makes your life easier in many ways, but as all layers of abstraction it has a certain performance cost.
The main case where you would probably see a performance benefit is when you are operating on many objects at once, such as updating or fetching a large number of entities. This is because of the work the NHibernate session does to keep track of which objects are modified, etc. My experience is that the performance of NHibernate degrades significantly as the number of entities in the session grows.
NHibernate has a lot of ways to improve performance, and if you really know it well, you can get it to perform quite close to ADO.NET. However, if you are not that familiar with it, you can easily shoot yourself in the foot, performance-wise (the SELECT N+1 problem, etc.).
There are some situations where you could actually get worse performance when switching from NHibernate to straight ADO.NET. This is because the NHibernate abstraction layer introduces some features that can improve performance, such as caching. NHibernate also includes functionality for optimizing the generated SQL for the current database management system. For example, if you are using SQL Server it might generate slightly different SQL than if you are using Oracle.
It is worth mentioning that it does not have to be an all-or-nothing situation. You could use NHibernate for the 90% of your database access for which it works great, and then use straight SQL for the 10% where you do complex queries, batch inserts/updates, etc.
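The same 90/10 split works in Java's Hibernate too (and the idea carries straight over to NHibernate): drop down to the session's raw JDBC connection only for the statements the ORM handles poorly. A sketch; the orders table is made up:

    import java.sql.PreparedStatement;
    import org.hibernate.Session;

    public class MixedAccess {
        // The 10% case: reuse the ORM session's own JDBC connection for a bulk
        // statement, while the other 90% of data access stays on the ORM.
        public static void archiveOldOrders(Session session) {
            session.doWork(connection -> {
                try (PreparedStatement ps = connection.prepareStatement(
                        "UPDATE orders SET archived = 1 WHERE order_date < ?")) {
                    ps.setDate(1, java.sql.Date.valueOf("2010-01-01"));
                    ps.executeUpdate();
                }
            });
        }
    }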
