What are the advantages of using Spring JPA Specifications over direct queries - spring

I am currently working on a project where I have to retrieve some rows from the database based on some filters (I also have to paginate them).
My solution was to make a function that generates the queries and to query the database directly (it works and it's fast)
When I presented this solution to the senior programmer he told me this is going to work but it's not a long-term solution and I should rather use Spring Specifications.
Now here comes my questions :
Why is Spring Specifications better than generating a query?
Is a query generated by Spring Specifications faster than a normal query?
Is it that big of a deal to use hard-coded queries ?
Is there a better approach to this problem ?
I have to mention that the tables in the database don't store a lot of data, the biggest one (which will be queried the least) has around 134.000 rows after 1 year since the application was launched.
The tables have indexes on the rows that we will use to filter.

A "function that generates the queries" sounds like building query strings by concatenating smaller parts based on conditions. Even presuming this is a JPQL query string and not a native SQL string that would be DB dependent, there are several problems:
you lose the IDEs help if you ever refactor your entities
not easy to modularize and reuse parts of the query generation logic (eg. if you want to extract a method that adds the same conditions to a bunch of different queries with different joins and aliases for the tables)
easy to break the syntax of the query by a typo (eg. "a=b" + "and c=d")
more difficult to debug
if your queries are native SQL then you also become dependent on a database (eg. maybe you want your integration tests to run on an in-memory DB while the production code is on a regular DB)
if in your project all the queries are generated in a way but yours is generated in a different way (without a good reason) then maintenance of the will be more difficult
JPA frameworks generate optimized queries for most common use cases, so generally speaking you'll get at least the same speed from a Specification query as you do from a native one. There are times when you need to write native SQL to further optimize a query but these are exceptional cases.
Yes, it's bad practice that makes maintenance a nightmare

Related

ActiveRecord (CDbCriteria) vs QueryBuilder?

I have to make some filters, such as get persons who are in a given department, and I was wondering about the best way to do it.
Some of them are going to require the join of multiple tables.
Does anyone know about the main differences between CDbCriteria and Query Builder? I would particularly like to know about the compatibility with databases.
I found this in the Yii documentation about Query Builder:
It offers certain degree of DB abstraction, which simplifies migration to different DB platforms.
Is it the same for the CDbCriteria objects? Is it better?
The concept of CDbCriteria is used when working with Yii's active record (AR) abstraction (which is usually all of the time). AR requires that you have created models for the various tables in your database.
Query builder a very different way to access the database; in effect it is a structured wrapper that allows you to programmatically construct an SQL query instead of just writing it out as a string (as an added bonus it also offers a degree of database abstraction as you mention).
In a typical application there would be little to no need to use query builder because AR already provides a great deal of functionality and it also offers the same degree of database abstraction.
In some cases you might want to run a very specific type of query that is not convenient or performant to issue through AR. You then have two options:
If the query is fixed or almost fixed then you can simply issue it through DAO; in fact the query builder documentation mentions that "if your queries are simple, it is easier and faster to directly write SQL statements".
If the query needs to be dynamically constructed then query builder becomes a good fit for the job.
So as you can see, query builder is not all that useful most of the time. Only if you want to write very customized and at the same time dynamically constructed queries does it make sense to use it.
The example feature that you mention can and should be implemented using AR.

Best strategy for retrieving large dynamically-specified tables on an ASP.NET page

Looking for a bit of advice on how to optimise one of our projects. We have a ASP.NET/C# system that retrieves data from a SQL2008 data and presents it on a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases - all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create a XPO Data Source on the fly. However, I can't find too much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler), the results are often surprising.
If you are constructing the SQL string on the fly, are you using .Net best practices such as String.Concat("str1", "str2") instead of "str1" + "str2". Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say every 15 mins, you might need to run a service to update this data automatically.) so that you are only hitting one database. New connections to databases are quiet expensive.
Have you looked at the query plans for the SQL that you are running. Today, I moved a dynamically created SQL string to a sproc (only 1 param changed) and shaved 5-10 seconds off the running time (it was being called 100-10000 times depending on some conditions).
Just a warning if you do use LINQ. I have seen some developers who have decided to use LINQ write more inefficient code because they did not know what they are doing (pulling 36,000 records when they needed to check for 1 for example). This things are very easily overlooked.
Just something to get you started on and hopefully there is something there that you haven't thought of.
Cheers,
Stu
As far as I understand you are talking about so called server mode when all data manipulations are done on the DB server instead of them to the web server and processing them there. In this mode grid works very fast with data sources that can contain hundreds thousands records. If you want to use this mode, you should either create the corresponding LINQ classes or XPO classes. If you decide to use LINQ based server mode, the LINQServerModeDataSource provides the Selecting event which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope, this information will be helpful to you.
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once and retreiving only subsets at a time, then you should be OK.
But, if you're only doing an ExecuteQuery() against an SQL connection to get a dataset back I don't see how Linq or anything else will help you. I'd say that the problem is obviously on the DB side.
So to solve the problem with the database you need to profile the different SELECT statements you're running against it, examine the query plan and identify the places where things are slowing down. You might want to start by using the SQL Server Profiler, but if you have a good DBA, sometimes just looking at the query plan (which you can get from Management Studio) is usually enough.

Hibernate Criteria vs HQL: which is faster? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I have been reading some anwers, but i'm still confused. ¿Why? because the differences that you have mentioned do not relate with the performance. they are related with easy use.(Objetc(criteria) and SQL(hql)). But I would like to know if "criteria" is slower than hql for some reason.
I read this in another anwers
"There is a difference in terms of performance between HQL and criteriaQuery, everytime you fire a query using criteriaQuery, it creates a new alias for the table name which does not reflect in the last queried cache for any DB. This leads to an overhead of compiling the generated SQL, taking more time to execute." by Varun Mehta.
This is very close BUT! i read in another website(http://gary-rowe.com/agilestack/tag/hibernate/) This is no longer the case with Hibernate 3.3 and above(please read this: 9) Hibernate is slow because the SQL generated by the Criteria interface is not consistent)
I have done some test trying to find out the differences but both generate qry's and it doesn't change the alias to the table.
I'm very confused. If somebody knows the main reason please, could you help us. Thanks
I'm the guy who wrote the Hibernate 3 query translator back in 2004, so I know something about how it works.
Criteria, in theory should have less overhead than an HQL query (except for named queries, which I'll get to). This is because Criteria doesn't need to parse anything. HQL queries are parsed with an ANTLR-based parser and then the resulting AST is turned into SQL. However, with HQL/JPAQL you can define named queries, where the SQL is generated when the
SessionFactory starts up. In theory, named queries have less overhead than Criteria.
So, in terms of SQL-generation overhead we have:
Named HQL/JPAQL Query - SQL generation happens only once.
Criteria - No need to parse before generating.
(non-named) HQL/JPAQL Query - Parse, then generate.
That said, choosing a query technique based on the overhead of parsing and SQL generation is probably a mistake in my opinion. This overhead is typically very small when compared to performing a real query on a real database server with real data. If this overhead does actually show up when profiling the app then maybe you should switch to a named query.
Here are the things I consider when deciding between Criteria and HQL/JPAQL:
First, you have to decide if you're OK with having a dependency on Hibernate-proprietary API in your code. JPA doesn't have Criteria.
Criteria is really good at handling many optional search parameters such as you might find on a typical web page with a multi-parameter 'search form'. With HQL, developers tend to tack on where clause expressions with StringBuilder (avoid this!). With Criteria, you don't need to do that. Hardik posted similar opinions.
HQL/JPAQL can be used for most other things, because the code tends to be smaller and easier for developers to understand.
Really frequent queries can be turned into named queries if you use HQL. I prefer to do this later, after some profiling.
I am working on millions of record. I found HQL is much more faster than Criteria. Criteria lags a lot in performance.
If you are dealing with lot of data then go for HQL.
Other Advantages I think for Criteria over HQL:
Writing HQL makes the code messy.
It doesn't look like writing clean Object Oriented Code anymore.
While writing Criteria Queries, IDE suggests us with Intellisense, so there is less chance of committing mistakes except when we write something like variable names in double quotes.
I mostly prefer Criteria Queries for dynamic queries. For example it is much easier to add some ordering dynamically or leave some parts (e.g. restrictions) out depending on some parameter.
On the other hand I'm using HQL for static and complex queries, because it's much easier to understand/read HQL. Also, HQL is a bit more powerful, I think, e.g. for different join types.
JPA and Hibernate - Criteria vs. JPQL or HQL
You are right and not.
Result list is retrieved from database or cache with the org.hibernate.loader.Loader class. When cache is not enabled, prepared statement is build using the Dialect object which is created in SessionFactoryImp. So, statements either initialized per list call.
Besides, low-level query is generated automatically. Basically, it is an aid, but there may be a case when a specific query will be more effective when manually written.

What are the advantages of LINQ to SQL?

I've just started using LINQ to SQL on a mid-sized project, and would like to increase my understanding of what advantages L2S offers.
One disadvantage I see is that it adds another layer of code, and my understanding is that it has slower performance than using stored procedures and ADO.Net. It also seems that debugging could be a challenge, especially for more complex queries, and that these might end up being moved to a stored proc anyway.
I've always wanted a way to write queries in a better development environment, are L2S queries the solution I've been looking for? Or have we just created another layer on top of SQL, and now have twice as much to worry about?
Advantages L2S offers:
No magic strings, like you have in SQL queries
Intellisense
Compile check when database changes
Faster development
Unit of work pattern (context)
Auto-generated domain objects that are usable small projects
Lazy loading.
Learning to write linq queries/lambdas is a must learn for .NET developers.
Regarding performance:
Most likely the performance is not going to be a problem in most solutions. To pre-optimize is an anti-pattern. If you later see that some areas of the application are to slow, you can analyze these parts, and in some cases even swap some linq queries with stored procedures or ADO.NET.
In many cases the lazy loading feature can speed up performance, or at least simplify the code a lot.
Regarding debuging:
In my opinion debuging Linq2Sql is much easier than both stored procedures and ADO.NET. I recommend that you take a look at Linq2Sql Debug Visualizer, which enables you to see the query, and even trigger an execute to see the result when debugging.
You can also configure the context to write all sql queries to the console window, more information here
Regarding another layer:
Linq2Sql can be seen as another layer, but it is a purely data access layer. Stored procedures is also another layer of code, and I have seen many cases where part of the business logic has been implemented into stored procedures. This is much worse in my opinion because you are then splitting the business layer into two places, and it will be harder for developers to get a clear view of the business domain.
Just a few quick thoughts.
LINQ in general
Query in-memory collections and out-of-process data stores with the same syntax and operators
A declarative style works very well for queries - it's easier to both read and write in very many cases
Neat language integration allows new providers (both in and out of process) to be written and take advantage of the same query expression syntax
LINQ to SQL (or other database LINQ)
Writing queries where you need them rather than as stored procs makes development a lot faster: there are far fewer steps involved just to get the data you want
Far fewer strings (stored procs, parameter names or just plain SQL) involved where typos can be irritating; the other side of this coin is that you get Intellisense for your query
Unless you're going to work with the "raw" data from ADO.NET, you're going to have an object model somewhere anyway. Why not let LINQ to SQL handle it for you? I rather like being able to just do a query and get back the objects, ready to use.
I'd expect the performance to be fine - and where it isn't, you can tune it yourself or fall back to straight SQL. Using an ORM certainly doesn't remove the need for creating the right indexes etc, and you should usually check the SQL being generated for non-trivial queries.
It's not a panacea by any means, but I vastly prefer it to either making SQL queries directly or using stored procs.
I must say they are what you have been looking for. It takes some time getting used to it, but once you do you can't think of going back (at least for me).
Regarding linq vs. stored procedures, you can have poor performance on either if you build it wrong. I moved to linq to sql some stored procedures of a client that were awfully coded, so the time dropped from 20secs (totally unaceptable for a web app) to < 1 sec. And much much less code then the stored procedure solution.
Update 1: Also you get a lot of flexibility, as you can limit the columns of what you are getting and it will actually only retrieve that. On the stored procedure solution you have to define a procedure for each column set you are getting, even if the underlying queries are the same.
Just as an update, here are some links on the future of LINQ to SQL:
What is the Future of Linq to SQL
Has Microsoft confirmed their stance on LINQ to SQL end-of-life?
Is LINQ to SQL Dead or Alive?
As a comment in the last link states, LINQ to SQL isn't going to go away, just not "improved upon" at least by Microsoft. Take these comments and posts as you will, just be cautious in your development plans.
We switched over to LINQ2Entity over the Entity Framework environment recently. Before, we had basic SQLadapters. Since the database we are working with is rather small, I cannot comment on the performance of LINQ.
I must admit though, writing queries have become a lot easier, and the addition of Entities, does enable strong typing.

Is Hibernate good for batch processing? What about memory usage?

I have a daily batch process that involves selecting out a large number of records and formatting up a file to send to an external system. I also need to mark these records as sent so they are not transmitted again tomorrow.
In my naive JDBC way, I would prepare and execute a statement and then begin to loop through the recordset. As I only go forwards through the recordset there is no need for my application server to hold the whole result set in memory at one time. Groups of records can be feed across from the database server.
Now, lets say I'm using hibernate. Won't I endup with a bunch of objects representing the whole result set in memory at once?
Hibernate does also iterate over the result set so only one row is kept in memory. This is the default. If it to load greedily, you must tell it so.
Reasons to use Hibernate:
"Someone" was "creative" with the column names (PRXFC0315.XXFZZCC12)
The DB design is still in flux and/or you want one place where column names are mapped to Java.
You're using Hibernate anyway
You have complex queries and you're not fluent in SQL
Reasons not to use Hibernate:
The rest of your app is pure JDBC
You don't need any of the power of Hibernate
You have complex queries and you're fluent in SQL
You need a specific feature of your DB to make the SQL perform
Hibernate offers some possibilities to keep the session small.
You can use Query.scroll(), Criteria.scroll() for JDBC-like scrolling. You can use Session.evict(Object entity) to remove entities from the session. You can use a StatelessSession to suppress dirty-checking. And there are some more performance optimizations, see the Hibernate documentation.
Hibernate as any ORM framework is intended for developing and maintaining systems based on object oriented programming principal. But most of the databases are relational and not object oriented, so in any case ORM is always a trade off between convenient OOP programming and optimized/most effective DB access.
I wouldn't use ORM for specific isolated tasks, but rather as an overall architectural choice for application persistence layer.
In my opinion I would NOT use Hibernate, since it makes your application a whole lot bigger and less maintainable and you do not really have a chance of optimizing the generated sql-scripts in a quick way.
Furthermore you could use all the SQL functionality the JDBC-bridge supports and are not limited to the hibernate functionality. Another thing is that you have the limitations too that come along with each layer of legacy code.
But in the end it is a philosophical question and you should do it the way it fits you're way of thinking best.
If there are possible performance issues then stick with the JDBC code.
There are a number of well known pure SQL optimisations which
which would be very difficult to do in Hibernate.
Only select the columns you use! (No "select *" stuff ).
Keep the SQl as simple as possible. e.g. Dont include small reference tables like currency codes in the join. Instead load the currency table into memory and resolve currency descriptions with a program lookup.
Depending on the DBMS minor re-ordering of the SQL where predicates can have a major effect on performance.
If you are updateing/inserting only commit every 100 to 1000 updates. i.e. Do not commit every unit of work but keep some counter so you commit less often.
Take advantage of the aggregate functions of your database. If you want totals by DEPT code then do it in the SQL with " SUM(amount) ... GROUP BY DEPT ".

Resources