JPA getResultList much slower than SQL query - performance

I have an Oracle table with about 5 million records and a quite complex query that returns about 5,000 of them in less than 5 seconds in a database tool like TOAD.
However, when I run the same query via the EntityManager (EclipseLink), it runs for minutes...
I'm probably being too naive in my implementation.
I do:
Query query = em.createNativeQuery(complexQueryString, Myspecific.class);
... setParameter...
List result = query.getResultList();
The complexQueryString starts with a "SELECT *".
What options do I have for optimization?
Maybe one is to select only the fields I really need later. Some explanation would be great.

I had a similar problem (I tried to read 800,000 records with 8 columns in less than one second) and the best solution was to fall back to JDBC. The ResultSet was created and read about 10 times faster than with JPA, even when using a native query.
How to use JDBC: in Java EE servers, a JDBC DataSource can normally be injected with @Resource.
An explanation: I think the OR mappers try to create and cache objects so that changes can easily be detected later. This is very substantial overhead that you don't notice as long as you are only working with single entities.

Query.setFetchSize(...) may help a bit. It tells the JDBC driver how many rows to fetch per round trip. (This method is on Hibernate's org.hibernate.Query; with a plain JPA query you would set the provider-specific fetch-size query hint instead.) Just call it before getResultList():
query.setFetchSize(5000);
query.getResultList();

Related

Hibernate pagination or batch processing

Question: How can I process (read in) batches of records 1000 at a time and ensure that only the current batch of 1000 records is in memory? Assume my primary key is called 'ID' and my table is called Customer.
Background: This is not for user pagination, it is for compiling statistics about my table. I have limited memory available, therefore I want to read my records in batches of 1000 records at a time. I am only reading in records, they will not be modified. I have read that StatelessSession is good for this kind of thing and I've heard about people using ScrollableResults.
What I have tried: Currently I am working on a custom made solution where I implemented Iterable and basically did the pagination by using setFirstResult and setMaxResults. This seems to be very slow for me but it allows me to get 1000 records at a time. I would like to know how I can do this more efficiently, perhaps with something like ScrollableResults. I'm not yet sure why my current method is so slow; I'm ordering by ID but ID is the primary key so the table should already be indexed that way.
As you might be able to tell, I keep reading bits and pieces about how to do this. If anyone can provide me a complete way to do this it would be greatly appreciated. I do know that you have to set FORWARD_ONLY on ScrollableResults and that calling evict(entity) will take an entity out of memory (unless you're doing second level caching, which I do not yet know how to check if I am or not). However I don't see any methods in the JavaDoc to read in say, 1000 records at a time. I want a balance between my lack of available memory and my slow network performance, so sending records over the network one at a time really isn't an option here. I am using Criteria API where possible. Thanks for any detailed replies.
Using Oracle's ROWNUM feature may help you.
Let's say we need to fetch pages of 1000 rows (pageSize) from the Customer table, and we want the second page (pageNumber). Creating and calling a query like this may be the answer (note that the ordering has to happen in the innermost query, before ROWNUM is assigned):
select * from
  (select q.*, rownum row_number
     from (select * from Customer order by ID) q
    where rownum <= pageSize * pageNumber)
 where row_number > pageSize * (pageNumber - 1)
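To make the bound arithmetic explicit, here is a small sketch that generates a query of the shape above for a given page. The buildPageQuery helper and its parameter names are my own, not part of any real API; it is only meant to show which ROWNUM bounds belong to which page.

```java
// Sketch: builds the ROWNUM-based pagination query discussed above.
// buildPageQuery is an illustrative helper, not a real library method.
public class RownumPager {

    static String buildPageQuery(String table, String orderColumn,
                                 int pageSize, int pageNumber) {
        int upper = pageSize * pageNumber;        // last row of the requested page
        int lower = pageSize * (pageNumber - 1);  // rows to skip before the page
        return "select * from"
             + " (select q.*, rownum row_number"
             + " from (select * from " + table + " order by " + orderColumn + ") q"
             + " where rownum <= " + upper + ")"
             + " where row_number > " + lower;
    }

    public static void main(String[] args) {
        // Page 2 of 1000-row pages should cover rows 1001..2000.
        System.out.println(buildPageQuery("Customer", "ID", 1000, 2));
    }
}
```

Page 2 with a page size of 1000 yields `rownum <= 2000` as the upper bound and `row_number > 1000` as the lower bound, i.e. rows 1001 through 2000.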
Load entities as read-only.
For HQL
Query.setReadOnly( true );
For Criteria
Criteria.setReadOnly( true );
http://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/readonly.html#readonly-api-querycriteria
A stateless session is quite different from a regular (stateful) session.
Operations performed using a stateless session never cascade to associated instances, and collections are ignored by a stateless session.
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-statelesssession
Use flush() and clear() to clean up the session cache.
session.flush();
session.clear();
Question about Hibernate session.flush()
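As a sketch of the flush/clear cadence (the helper and its names are mine, not Hibernate API; the session interaction is abstracted into a Runnable so the rhythm itself can be shown without a database): process the records in one loop and flush/clear every batchSize items, so at most one batch of entities is ever cached in the session.

```java
// Sketch of the periodic flush()/clear() rhythm. The actual Hibernate calls
// would go inside the Runnable; here it is a stand-in so the batching logic
// is self-contained.
public class BatchFlusher {

    /**
     * Processes `total` records, invoking flushAndClear after every
     * `batchSize` records and once more for a trailing partial batch.
     * Returns how many times the session would have been flushed/cleared.
     */
    static int processInBatches(int total, int batchSize, Runnable flushAndClear) {
        int flushes = 0;
        for (int i = 1; i <= total; i++) {
            // ... load and process record i here ...
            if (i % batchSize == 0) {       // batch boundary: release cached entities
                flushAndClear.run();
                flushes++;
            }
        }
        if (total % batchSize != 0) {       // final partial batch
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        // With 2500 records and a batch size of 1000, the session is
        // flushed and cleared after 1000, 2000, and the final 500 records.
        int flushes = processInBatches(2500, 1000,
                () -> { /* session.flush(); session.clear(); */ });
        System.out.println(flushes); // 3
    }
}
```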
ScrollableResults should work the way you expect.
Do not forget that each item you load takes up memory unless you evict or clear, so you need to check that it really works well.
ScrollableResults in MySQL Connector/J is effectively fake: it loads the entire result set anyway. I think the Oracle connector works fine, though.
Using Hibernate's ScrollableResults to slowly read 90 million records
If that doesn't work out, you may consider this approach:
1. Select the primary key of every row you will process.
2. Chop the keys into PK chunks.
3. Iterate: select the rows for each PK chunk (using an IN query) and process them however you want.
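The chunking in step 2 can be sketched in plain Java (the class and method names here are mine, chosen for illustration); each resulting chunk would then be bound to one "WHERE ID IN (:chunk)" query in step 3:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 2: chop a list of primary keys into fixed-size chunks,
// each small enough to feed into a single IN (...) query.
public class PkChunker {

    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(i, Math.min(i + chunkSize, ids.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 10; i++) ids.add(i);
        List<List<Long>> chunks = chunk(ids, 3);
        System.out.println(chunks.size()); // 4 chunks: 3 + 3 + 3 + 1
        // For each chunk, run e.g. "select * from Customer where ID in (:chunk)"
        // and process the returned rows, keeping only one chunk in memory.
    }
}
```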

Spring JDBCTemplate vs Plain JDBC for inserting large numbers of records

We have to insert 2 million records across multiple tables; right now we write them into a CSV file and use db2 import to load them into the database.
We want to change this logic to some kind of JDBC. While looking into the options, I got confused between the Spring JDBC template and plain JDBC.
Let us say I wanted to insert 1 million records into 10 tables, each of which will have 100 thousand, and all these are simple JDBC statements (not prepared statements because I don't know which table I am dealing with at runtime).
Whatever system we choose will need to handle inserting up to 15 million records for a peak request.
Which framework will be better?
If you want to move a lot of data, then using JDBC (or any library building on top of JDBC) may be a bad choice, compared to using bulk copy tools (like db2import). JDBC is going to be orders of magnitude slower, because
JDBC is a very chatty protocol, and
usually bulk copy tools relax constraints during the copy process.
The difference in time can be extreme: what takes the bulk copy tool 10 minutes can take hours using JDBC. You'll want to create a prototype and do some timings and be certain about what kind of performance you'll get before you commit to something like this.
If you're already using Spring, then you may as well use JdbcTemplate. It makes things a bit easier, and in some simple cases means you need not use the JDBC API directly yourself. Essentially, JdbcTemplate is a very thin wrapper around JDBC that removes some of your annoying boiler-plate code.
As skaffman said, if you are already using Spring then your choice is probably JdbcTemplate. Specifically you may want to look at the batchUpdate() method. Here is a pretty good example of how it works. I've used it to insert a couple hundred thousand rows quickly with great success.
Consider JdbcSession from jcabi-jdbc. It's as simple as JDBC should be, for example (inserting a million records):
JdbcSession session = new JdbcSession(source);
for (int i = 0; i < 1000000; ++i) {
    session.sql("INSERT INTO foo (number) VALUES (?)")
        .set(i)
        .insert(new VoidHandler());
}
That's it.

Hibernate with Oracle JDBC issue

I have a select query that takes 10 minutes to complete as it runs through 10M records. When I run it through TOAD or a program using a plain JDBC connection, I get the results back, but a job that uses Hibernate as the ORM does not return any results. It just hangs, even after 45 minutes. Please help.
Are you saying you are trying to retrieve 10M records using an ORM like Hibernate?
If that is the case, you have one big problem: you need to redesign your application, because this is not going to work. As for why it hangs, well, I bet it runs out of memory.
Have you enabled SQL output for Hibernate? You need to set hibernate.show_sql to true in order to do that.
Once that's done, compare the generated SQL with the one you've been running through TOAD. Are they exactly the same or not?
I'm going to venture a guess here and say they're not because once SQL is generated Hibernate does nothing fancy - connection is taken from a pool; prepared statement is created and executed - so it should be no different from JDBC.
Thus the question most likely is how can your HQL be optimized. If you need any help with that you'll have to post the HQL in question as well as appropriate mappings / table schemas. Running explain on query would help as well.

LinqToSQL DateTime filters?

I've got a LINQ to SQL query filtering and ordering by a date column that takes 20 seconds to run. When I run the generated SQL query directly against the DB, it returns in 0 seconds.
var myObjs = DB.Table
    .Where(obj => obj.DateCreated >= DateTime.Today)
    .OrderByDescending(obj => obj.DateCreated);
The table has only 100,000 records and the DateTime column is indexed.
Just another in a long line of LINQ to SQL performance grievances. But this one is SO bad that I'm sure I must be doing something wrong.
I suspect the difference is that although running the generated query only takes 0 seconds, that's because it's not actually showing you all the results if you're using something like Enterprise Manager. Just fetching (and deserializing) all the data for 100,000 results could well take a significant amount of time, but your manual query is probably only showing you the first 20 hits or something similar.
If you run the same SQL in .NET and use a DataReader to fetch all the data, how long does it take then?
If you run server with profiling turned on, how long does it say the query took to execute from LINQ to SQL?
Thanks guys...
The problem was mine, not LINQ's. For brevity I shortened the query in the question, but there was actually another filter applied to a NON-indexed column. Adding the index solved the problem.
What threw me for a loop, though, was that, as Jon Skeet suggested, running the query in SQL Management Studio gave a false sense of confidence: because the query was paged, it very quickly returned the top 20 rows, leading me to believe LINQ was to blame. So the index problem only showed up in LINQ and not in SQL Management Studio.
I can't see anything wrong in your query. It would be great to see the T-SQL generated by Linq. Did you try that?

Entity framework and performance

I am trying to develop my first web project using Entity Framework. While I love the way you can use LINQ instead of writing SQL, I have some severe performance issues. I have a lot of unhandled data in a table that I would like to transform and insert into another table. I run through all the objects and insert them into my new table. I need to do some small comparisons (which is why I need to insert the data into another table), but for the performance tests I removed them. The following code (with approximately 12-15 properties being set) took 21 seconds, which is quite a long time. Is it usually this slow, or what might I be doing wrong?
DataLayer.MotorExtractionEntities mee = new DataLayer.MotorExtractionEntities();
List<DataLayer.CarsBulk> carsBulkAll = ((from c in mee.CarsBulk select c).Take(100)).ToList();
foreach (DataLayer.CarsBulk carBulk in carsBulkAll)
{
    DataLayer.Car car = new DataLayer.Car();
    car.URL = carBulk.URL;
    car.color = carBulk.SellerCity.ToString();
    car.year = // ... more properties are set this way
    mee.AddToCar(car);
}
mee.SaveChanges();
You cannot create batch updates using Entity Framework.
Imagine you need to update rows in a table with a SQL statement like this:
UPDATE table SET col1 = @a WHERE col2 = @b
Using SQL this is just one roundtrip to the server. Using Entity Framework, you have (at least) one roundtrip to the server to load all the data, then you modify the rows on the client, and then it sends them back row by row.
This will slow things down especially if your network connection is limited, and if you have more than just a couple of rows.
So for this kind of updates a stored procedure is still a lot more efficient.
I have been experimenting with the entity framework quite a lot and I haven't seen any real performance issues.
Which row of your code is causing the big delay, have you tried debugging it and just measuring which method takes the most time?
Also, the complexity of your database structure could slow down the entity framework a bit, but not to the speed you are saying. Are there some 'infinite loops' in your DB structure? Without the DB structure it is really hard to say what's wrong.
Can you try the same in straight SQL?
The problem might be related to your database and not the Entity Framework. For example, if you have massive indexes and lots of check constraints, inserting can become slow.
I've also seen problems at insert with databases which had never been backed-up. The transaction log could not be reclaimed and was growing insanely, causing a single insert to take a few seconds.
Trying this in SQL directly would tell you if the problem is indeed with EF.
I think I solved the problem. I have been running the app locally, while the database is in another country (a neighboring one, but nevertheless). I tried deploying the application to the server and running it from there, and it then took only 2 seconds to run instead of 20. I then tried transferring 1000 records, which took 26 seconds; quite an improvement, though I don't know if this is the "regular" speed for saving 1000 records to the database?
