Is there a way around Hibernate calling OracleStatement.getColumnIndex for each row and column? - performance

I am puzzled by Hibernate’s behavior when loading ResultSets with many columns via HQL: it seems like OracleStatement.getColumnIndex(String) is called over and over again, not just once for every column at the beginning of the load but once for every column when retrieving each and every row of data.
In the case of a ResultSet with many columns this seems to take a significant amount of time (in our case about 43% of the total time; see the attached screenshot of a VisualVM profile). Our HQL loads a table with many columns, join fetching two other tables with to-many relations (both of which also have many columns). Unfortunately we cannot restrict the columns to be loaded, because the task is to preload all objects into a Coherence cache on startup of the system, so the objects have to be complete.
As far as I can tell, the problem arises because hydrating the mapped result objects of an HQL query from the ResultSet uses nullSafeGet() for each column, which takes String arguments to identify the column and therefore has to call getColumnIndex().
(When loading the data from a ResultSet of an SQL query, one can use getString(int), getTimestamp(int), etc. instead of the String-based versions to avoid this issue.)
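To illustrate in plain JDBC terms, here are the two alternative access patterns (a rough sketch, not Hibernate's actual code; the column name is made up):

// rs is a java.sql.ResultSet positioned before the first row.

// Variant 1: by-name access. The driver must resolve "SOME_COLUMN" to a
// column index on every call, i.e. once per column per row.
while (rs.next()) {
    String value = rs.getString("SOME_COLUMN");
}

// Variant 2: resolve the index once up front, then read by position.
int columnIndex = rs.findColumn("SOME_COLUMN");
while (rs.next()) {
    String value = rs.getString(columnIndex);
}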
We are still using an old version of Hibernate (3.6), but the source on GitHub indicates that the same behavior is still present: nullSafeGet() is still String-based instead of taking an index (or an object containing the index) that could be precomputed once at the beginning of the load.
Is there something that I am missing?
Is there a reason for calling getColumnIndex() for each column of each row of data over and over again?
Is there a way around this which does not involve rewriting the query into SQL and using the index based accessors to build up the mapped objects manually?
The only similar issue I was able to find on the internet was this question, which has no answer.
The query there had many columns, too.
Thanks for any help!
Thorsten

This problem is addressed in Hibernate 6, which switches from reading the JDBC ResultSet by name to reading it by position.

Related

Persisting data between benchmarks using BenchmarkDotNet

I'm trying to benchmark two databases (different types, different locations).
My select benchmarks are working fine, but I'm having trouble with my inserts, updates and deletes.
I tried saving the key (GUID) I use for the insert in a class field of type Queue<string>, but when my update benchmark runs this field has been reset and is therefore empty; the same happens in my delete benchmark.
I don't want to call a delete statement after the insert statement in my insert benchmark, or an insert statement in my delete benchmark, because then the timing results are off.
How to handle this situation?
I thought of creating a list of GUIDs in the [GlobalSetup], but when I change the number of iterations I need to grow or shrink this list.
Any advice will be much appreciated.
I've fixed this myself by saving the keys to a text file in GlobalCleanup and reading this file back in GlobalSetup.

Spring read query concurrent execution in multiple threads

I have a Spring Boot project where I would like to execute a specific query in a database from x different threads while preventing different threads from reading the same database entries. So far I have been able to run the query in multiple threads, but I have had no luck finding a way to "split" the read load. My code so far is as follows:
@Async
@Transactional
public CompletableFuture<List<Book>> scanDatabase() {
    final List<Book> books = booksRepository.findAllBooks();
    return CompletableFuture.completedFuture(books);
}
Any ideas on how should I approach this?
There are plenty of ways to do that.
If you have a numeric field in the data that is somewhat random, you can add a condition to your WHERE clause like ... and some_value % :N = :i, with :N being a parameter for the number of threads and :i being the index of the specific thread (0-based); a sketch of this variant appears below.
If you don't have a numeric field, you can create one by applying a hash function to some other field in order to turn it into something numeric. See your database's documentation for available hash functions.
You could use an analytic function like ROW_NUMBER() to create a numeric value to be used in the condition.
You could query the number of rows in a first query and then fetch the right Slice using Spring Data's pagination feature.
And many more variants.
They all have in common that the complete set of rows must not change during processing; otherwise you may get rows queried multiple times or not at all.
If you can't guarantee that, you need to mark the records to be processed by a thread before actually selecting them, for example by flagging them in an extra field or by using a FOR UPDATE clause in your query.
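For example, the modulo variant might look roughly like this as a Spring Data JPA repository method (the Book entity and the table/column names are assumptions for illustration):

import java.util.List;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface BooksRepository extends JpaRepository<Book, Long> {

    // Each worker passes the total thread count :n and its own 0-based
    // index :i, so every row is read by exactly one thread.
    @Query(value = "select * from book where mod(id, :n) = :i", nativeQuery = true)
    List<Book> findSlice(@Param("n") int threadCount, @Param("i") int threadIndex);
}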
And finally, there is the question whether this is really what you need.
Querying the data in multiple threads probably doesn't make the querying part faster, since it makes the query more complex and doesn't speed up the parts that typically limit throughput: the network between application and database, and I/O in the database.
So it might be a better approach to select the data with one query and iterate through it, passing it on to a pool of threads for processing.
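A minimal sketch of that approach (process(...) stands in for your per-row logic; the pool size is arbitrary):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

void fanOut(BooksRepository booksRepository) throws InterruptedException {
    List<Book> books = booksRepository.findAllBooks(); // single read query
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (Book book : books) {
        pool.submit(() -> process(book)); // hand each row to a worker
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS); // wait for the workers to finish
}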
You also might want to take a look at Spring Batch which might be helpful with processing large amounts of data.

Hibernate Mapping for Oracle RAW column

We're using Hibernate and are not sure how to map properties to RAW columns in an Oracle table (specifically ones that have indexes on them).
It's a known fact that String can't be used as the entity property value: Hibernate isn't able to prepend the HEXTORAW Oracle function call needed to make the index on the column usable (without it, Oracle implicitly applies RAWTOHEX to the column value itself).
However, it's not clear whether using byte[] as the entity property value solves this issue. Since the JDBC driver sends binary data directly, it's logical to assume that the index would be used, because there is no need to execute either HEXTORAW or RAWTOHEX.
However, I'm not sure how to prove it (except by putting in millions of records and performing some benchmarks).
I tried to search for similar questions, but without success.
Does anyone have knowledge about this?
Thanks in advance,
Final answer: yes, mapping byte[] works.
I tested this on a table with millions of records and a primary key of RAW type.
It took ~2 minutes to look up a record by PK when using String.
With byte[] the record was found immediately.
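For reference, the mapping that worked is roughly this (entity, table, and column names are made up; the point is the byte[] type):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "SOME_TABLE")
public class SomeEntity {

    // Mapped as byte[], so the JDBC driver binds the value as raw binary
    // and Oracle can use the index on the RAW column without any
    // HEXTORAW/RAWTOHEX conversion.
    @Id
    @Column(name = "ID", columnDefinition = "RAW(16)")
    private byte[] id;

    public byte[] getId() { return id; }
    public void setId(byte[] id) { this.id = id; }
}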

Hibernate fetching quickly if setMaxResults is set

There is a simple SQL query which fetches only ONE record. The database is Oracle.
Following is the simple query:
select *
from APPEALCASE appealcase0_
where appealcase0_.caseNumber='BAXXXXX00' and appealcase0_.DELETED_FLAG='N'
When I fetch this row using Hibernate, the response time is 500 ms, which is slow since it has to be really quick, within 10 ms. But when I set MaxResults on the Hibernate query object to 1 (one), the response time improved to 15 ms.
Though my issue is fixed, I'm still puzzled how setting MaxResults to 1 improved the response time so drastically. Can anyone explain this to me?
Well, that's quite logical to me. Since you tell Oracle to retrieve at most one record, it stops searching for more as soon as it finds one, whereas if you don't, it scans the whole table (or index) to find all the records matching the search criteria.
What you should check, though, is whether you have an index defined on the caseNumber column.
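In Hibernate 3.x terms, the asker's fix looks roughly like this (entity and property names are guesses based on the generated SQL; session is an open org.hibernate.Session):

Query query = session.createQuery(
        "from AppealCase a where a.caseNumber = :caseNumber and a.deletedFlag = 'N'");
query.setParameter("caseNumber", "BAXXXXX00");
query.setMaxResults(1); // lets Oracle stop scanning after the first match
AppealCase result = (AppealCase) query.uniqueResult();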

How can I query two databases and combine the results using LINQ?

I need to pull values in similar tables from two different databases, combine them and then write the output to a CSV file. Can I just create a second connection string in the Properties file and explicitly pass the DataContext the second connection string for the other LINQ query? Or do I need to do something else? The tables are nearly identical except for an ID used for some criteria.
I've never used LINQ before, but it seems the easier way to handle this instead of having to write SQL by hand.
If the schema matches in both databases, then you should be able to just create a second DataContext instance (giving it the second connection string as an argument). LINQ to SQL doesn't check in any way whether you use "the right" database; if it has the right columns and tables, it will work.
However, LINQ doesn't automatically work with multiple databases in any "smart" way, so it will need to download the content into memory before doing any operations that involve multiple data sources. You can still use a single LINQ query to do this, but you have to be careful about which part of it runs on in-memory data. (By the way, you can use extension methods like ToList to say explicitly: get the data from the database at this point.)
You also mention that the tables are nearly identical except for an ID used for some criteria; does that mean that the primary/foreign keys are different? In that case, some autogenerated relations may not work. If it means that a column name differs, you could manually edit the generated schema to contain both columns and then use only the right one. However, this feels a bit odd; unless you're planning to do some manual edits to the schema, you might as well just generate two very similar schemas.
