Hibernate generated query is slow in application but fast when executed manually - spring

I'm using Spring Data JPA with Hibernate and PostgreSQL, and I generate a query like this:
SELECT A.field1 AS field1_10_,
A.field2 AS field2_10_,
A.field3 AS field3_10_,
...
A.field10 as field10_10_
FROM mytable A
WHERE (A.field1 BETWEEN 1 AND 2)
AND ((cast(A.field1 AS varchar(255))||cast(A.field2 AS varchar(255))) IN
(SELECT (cast(B.field1 AS varchar(255))||cast(max(B.field2) AS varchar(255)))
FROM mytable B
WHERE B.field1 BETWEEN 1 AND 2
GROUP BY B.field2))
LIMIT 20
All A and B fields are numeric.
If I copy that query and execute it directly in Postgres (using pgAdmin), the result comes back in less than a second. But in my application, it takes about a minute for Hibernate to retrieve the result. I have Hibernate statistics enabled, and the bulk of the time is spent executing JDBC statements.
I'm using Spring pagination, which generates a count query using the same specification. This query runs fast (less than a second):
SELECT count(*) AS col_0_0_
FROM mytable A
WHERE (A.field1 BETWEEN 1 AND 2)
AND ((cast(A.field1 AS varchar(255))||cast(A.field2 AS varchar(255))) IN
(SELECT (cast(B.field1 AS varchar(255))||cast(max(B.field2) AS varchar(255)))
FROM mytable B
WHERE B.field1 BETWEEN 1 AND 2
GROUP BY B.field2))
Any idea?
Update: I've just found out that the generated statement executed by PostgreSQL is the following:
SELECT A.field1 AS field1_10_,
A.field2 AS field2_10_,
A.field3 AS field3_10_,
...
A.field10 as field10_10_
FROM mytable A
WHERE (A.field1 BETWEEN ? AND ?)
AND ((cast(A.field1 AS varchar(255))||cast(A.field2 AS varchar(255))) IN
(SELECT (cast(B.field1 AS varchar(255))||cast(max(B.field2) AS varchar(255)))
FROM mytable B
WHERE B.field1 BETWEEN ? AND ?
GROUP BY B.field2))
LIMIT $1
Please note that the placeholder in the BETWEEN conditions is the question mark ?, but in the LIMIT it is a dollar sign followed by the parameter position, $1. The limit is automatically added by Spring pagination; if I remove it, the query is fast.
Update 2: I've just found out that the problem happens only when specifying a limit of 1; with bigger values everything works as expected. Weird.

There are multiple possible causes.
JPA will fetch all rows before returning the result to you. Make sure pgAdmin does the same by scrolling to the very end of the result before stopping your timer.
JPA (and in some cases Spring) will convert the ResultSet you get from the database into entities or some other Java objects. If you do non-trivial stuff in constructors or getters, this might cause a performance penalty. Also, if your session is really big, this might cause performance problems as well. These should become obvious once you attach a profiler.
The query has to be created from a Specification (in your case); this might take some time as well. Again: attach a profiler to identify this.
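Given the update in the question (the server receives a parameterized statement, not literals), it may also be worth comparing plans for the parameterized form. The following is only a sketch under assumptions: the statement name paged_q is made up, the column list is shortened, the parameter types are guessed from "all fields are numeric", the subquery is grouped by B.field1 so that it is valid standalone SQL, and plan_cache_mode requires PostgreSQL 12 or later.

```sql
-- Hypothetical prepared statement mirroring the generated query.
-- Assumption: numeric columns, bigint limit, subquery grouped by B.field1.
PREPARE paged_q (numeric, numeric, numeric, numeric, bigint) AS
SELECT A.field1, A.field2
FROM mytable A
WHERE (A.field1 BETWEEN $1 AND $2)
  AND ((cast(A.field1 AS varchar(255)) || cast(A.field2 AS varchar(255))) IN
       (SELECT (cast(B.field1 AS varchar(255)) || cast(max(B.field2) AS varchar(255)))
        FROM mytable B
        WHERE B.field1 BETWEEN $3 AND $4
        GROUP BY B.field1))
LIMIT $5;

-- Plan chosen with the concrete parameter values (custom plan).
EXPLAIN (ANALYZE, BUFFERS) EXECUTE paged_q (1, 2, 1, 2, 20);
EXPLAIN (ANALYZE, BUFFERS) EXECUTE paged_q (1, 2, 1, 2, 1);

-- Plan chosen without looking at the parameter values (generic plan).
SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE paged_q (1, 2, 1, 2, 20);
RESET plan_cache_mode;

DEALLOCATE paged_q;
```

If the plans differ noticeably between the two LIMIT values, or between the custom and the generic plan, that would point at the planner rather than at Hibernate's result handling.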

Related

Oracle11g select query with pagination

I am facing a big performance problem when trying to get a list of objects with pagination from an oracle11g database.
As far as I know, and from what I have found online, the only way to achieve pagination in Oracle 11g is the following:
Example : [page=1, size=100]
SELECT * FROM
(
SELECT pagination.*, rownum r__ FROM
(
select * from "TABLE_NAME" t
inner join X on X.id = t.id
inner join .....
where ......
order
) pagination
WHERE rownum <= 200
)
WHERE r__ > 100
The problem with this query is that the innermost query, which fetches data from the table "TABLE_NAME", returns a huge amount of data and causes the overall query to take 8 seconds (around 2 million records are returned after applying the where clause, and it contains 9 or 10 join clauses).
The reason is that the innermost query fetches all the data that satisfies the where clause, then the second query takes the first 200 rows, and the third excludes the first 100 to get the second page's data that we need.
Isn't there a way to do this in one query, fetching only the second page's data without all these steps and the performance issues they cause?
Thank you!!
It depends on your sorting options (order by ...): because of your order by clause, the database has to read and order the whole dataset before it can apply the outer where rownum <= 200.
It will fetch only 200 rows if you remove your order by clause. In some cases Oracle can avoid sort operations (for example, if Oracle can use an index to get the requested data in the required order). Btw, Oracle uses an optimized sort for rownum <= N predicates: it doesn't fully sort the dataset, it just keeps the top N records while reading it.
You can investigate sort operations deeper using sort trace event: alter session set events '10032 trace name context forever, level 10';
Furthermore, sometimes it's better to use analytic functions like
select *
from (
select
t1.*
,t2.*
,row_number()over([partition by ...] order by ...) rn
from t1
,t2
where ...
)
where rn <= 200
and rn > 100
because in some specific cases Oracle can transform your query to push sorting and sort filter predicates to the earliest possible steps.
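As a concrete illustration of the analytic-function form, here is a minimal sketch for the second page of 100 rows. The tables orders and customers and all of their columns are hypothetical, not taken from the question; the extra o.id in the ordering is there only to make the page boundaries deterministic when created_at has ties.

```sql
-- Hypothetical schema: orders(id, customer_id, created_at), customers(id, name)
SELECT id, name, created_at
FROM (
  SELECT o.id,
         c.name,
         o.created_at,
         -- number the rows in the order the pages should follow
         ROW_NUMBER() OVER (ORDER BY o.created_at DESC, o.id) AS rn
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  WHERE o.created_at >= DATE '2020-01-01'
)
WHERE rn > 100     -- skip page 1
  AND rn <= 200    -- keep page 2
ORDER BY rn;
```

Whether this is faster than the nested ROWNUM form depends on the plan Oracle picks, so it is worth comparing both on the real data.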

Different results in Parallel Execution - Oracle

In my company's application there is a query in Oracle that uses parallel execution (configured for 4 servers). I didn't build it; the developer set it up that way for performance.
The query joins views and tables, and the weirdest thing is: sometimes it returns 11k results (incorrect), sometimes 27k results (correct).
After much research I found out that if I remove the parallel hint, it always returns the correct number: 27k. And if I increase the number of servers to 6 or 7, it always returns the incorrect number: 11k.
The layout of the query is like this:
SELECT /*+ PARALLEL(NAME, 4) */ * FROM(
SELECT DISTINCT COLUMNS
FROM VIEW
JOIN TABLE1 ON (....)
JOIN TABLE2 ON (....)
JOIN TABLE3 ON (....)
ORDER BY 3
) NAME
Does anyone have any idea why? I don't know much about this subject.

Forcing Oracle to do distinct last

I have a quite complicated view (using several layers of views across several database links) which takes a second to return all of its rows. But when I ask for distinct rows, it takes considerably more time. I stopped waiting after 4 minutes.
To make myself as clear as possible:
select a, b from complicated_view; -- takes 1 sec (returns 6 rows)
select distinct a, b from complicated_view; -- takes at least 4 minutes
I find that pretty weird, but hey, that's how it is. I guess Oracle messed something up when planning that query. Now, is there a way to force Oracle to first finish the select without distinct, and then do a "select distinct *" on the results? I looked into optimizer hints, but I can't find anything about hinting the order in which distinct is applied (this is the first time I'm optimizing a query, obviously :-/).
I'm using Oracle SQL Developer on Oracle 10g EE.
Try:
SELECT DISTINCT A,B FROM (
SELECT A,B FROM COMPLICATED_VIEW
WHERE rownum > 0 );
This forces Oracle to materialize the subquery and prevents view merging and predicate pushing, which likely keeps the optimizer from changing the original plan of the view.
You may also try NO_MERGE hint:
SELECT /*+ NO_MERGE(alias) */
DISTINCT a,b
FROM (
SELECT a,b FROM COMPLICATED_VIEW
) alias
Since you haven't posted details... try the following:
SELECT DISTINCT A,B
FROM
(SELECT A,B FROM COMPLICATED_VIEW);

Write a nested select statement with a where clause in Hive

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows:
select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")
Is this possible, or am I doing something wrong here? I am getting an error while running the above script.
To further elaborate on what I am trying to do: there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly, for example) these stats are summarized using Hive, and once summarized, that data is stored separately with the corresponding hour. So when the query runs for the second time (and on subsequent runs), it should only run on the new data (i.e. timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate Hive table and then using that value to filter the raw stats.
Can this be achieved using Hive?
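Subqueries inside a WHERE clause are not supported in Hive, at least in older releases (IN, NOT IN, EXISTS and NOT EXISTS subqueries in WHERE were added in Hive 0.13):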
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
However, often you can use a JOIN statement instead to get to the same result:
https://karmasphere.com/hive-queries-on-table-data#join_syntax
For example, this query:
SELECT a.KEY, a.value
FROM a
WHERE a.KEY IN
(SELECT b.KEY FROM B);
can be rewritten to:
SELECT a.KEY, a.value
FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)
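Applied to the query from the question, the same join idea could look roughly like the sketch below. It assumes that the latest stored timestamp is the one wanted (hence the max(), which also guarantees a single row), and the alias last_run is made up. The comparison sits in WHERE rather than in ON because older Hive versions only accept equality conditions in ON; a cross join like this may also be rejected when Hive runs in strict mode.

```sql
-- Sketch only: filter TableA by the single timestamp stored in TableB.
SELECT a.*
FROM TableA a
CROSS JOIN (
  -- one-row derived table holding the previous execution timestamp
  SELECT max(timestmp) AS last_run
  FROM TableB
  WHERE id = 'hourDim'
) b
WHERE a.TA_timestamp > b.last_run;
```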
Looking at the business requirements underlying your question, it occurs to me that you might get more efficient results by partitioning your Hive table by hour. If the data can be written with the hour as the partition key, then your query to update the summary will be much faster and require fewer resources.
Partitions can get out of hand when they reach the scale of millions, but this seems like a case that will not come close to that limit.
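To make the partitioning suggestion a bit more tangible, here is a minimal sketch. The table stats_by_hour, its columns, and the hour format are all hypothetical, and it assumes the raw stats can be landed in a native Hive table rather than being read directly from Cassandra.

```sql
-- Hypothetical raw-stats table, partitioned by hour.
CREATE TABLE stats_by_hour (
  stat_key   STRING,
  stat_value DOUBLE,
  event_ts   BIGINT
)
PARTITIONED BY (event_hour STRING);

-- Summarizing one hour only reads that hour's partition.
SELECT stat_key, sum(stat_value) AS total
FROM stats_by_hour
WHERE event_hour = '2013-06-01-14'
GROUP BY stat_key;
```

With this layout the "only new data" requirement becomes a filter on the partition column, so no timestamp lookup against a second table is needed.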
It will work if you use IN:
select *
from TableA
where TA_timestamp in (select timestmp from TableB where id="hourDim")
Explanation: the operators >, < and = need exactly one value on the right-hand side, while here the subquery may return multiple values, which can only be handled with an IN clause.

hibernate oracle rownum issue

SELECT * FROM (
select *
from tableA
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
On execution of this query, due to some Oracle optimization, the query takes about 14 minutes to execute. If I remove the where clause, the query executes in seconds. Most of the columns of the table have indexes on them, including the ones mentioned above. I do not have much flexibility in the structure of the query as I use Hibernate.
This query returns results instantly too, with the correct result:
SELECT *
FROM (
select *
from tableA,
dual
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
Is there something I can do, using Hibernate?
UPDATE: I use EntityManager.createQuery(), and I use setMaxResults(25) and setFirstResult() too. The query above is what Hibernate's query looks like, judging from the logs.
I can't exactly match the explain plans to your queries, but it seems Oracle is using a different index for the two queries.
Can you create an index containing columnA and columnL?
If you have an index only containing columnA, you MIGHT be able to drop that without a large effect on performance of other queries.
An alternative would be to add a hint to use the index used in the faster query. But this would require you to use native sql.
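A sketch of the composite index suggestion, with a way to check whether Oracle picks it up. The index name is made up; the table and column names are the ones from the question. With equality on ColumnA and ordering on columnL, such an index may let Oracle read the rows already in order and stop after 25, instead of sorting all matches.

```sql
-- Hypothetical index name; columns as in the question.
CREATE INDEX tablea_cola_coll_ix ON tableA (ColumnA, columnL);

-- Check which plan the original query gets afterwards.
EXPLAIN PLAN FOR
SELECT * FROM (
  select *
  from tableA
  where ColumnA = 'randomText'
  ORDER BY columnL ASC
) WHERE ROWNUM <= 25;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Because this changes nothing in the query text, it also works when the SQL is generated by Hibernate.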
Does this mean you are using Hibernate/JPA? If so, I guess you are using EntityManager.createNativeQuery() to create the query? Try removing your where-restriction (the ROWNUM <= 25 filter) and use .setMaxResults(25) on the Query instead.
Anyways, why do you need the outer-select? Wouldn't
select *
from tableA
where ColumnA = 'randomText'
AND ROWNUM <= 25
ORDER BY columnL ASC
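produce the desired results? (Keep in mind that Oracle assigns ROWNUM before it applies ORDER BY, so this form picks an arbitrary 25 matching rows and then sorts them; it is only equivalent if you don't need the first 25 rows in columnL order.)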

Resources