Why does Hibernate ignore setMaxResults? - oracle

I am using a server side pagination for one of my tables using a CriteriaQuery and a TypedQuery and set following values:
typedQuery.setFirstResult(0);
typedQuery.setMaxResults(100);
Unfortunately, in the generated SQL query which is executed on Oracle DB, i never see the ROWNUM condition. I added also an ORDER BY in my TypedQuery, but still, the query does a simple select withouut limiting on the DB the results.
As a result i am getting following warning HHH000104: firstResult/maxResults specified with collection fetch; applying in memory! . In other words, Hibernate does the pagination on memory as it is not performed on the DB. For this warning i read following article https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/ but before spliting my query into two queries (retrieve id and then retrieve data for those id), i thought of giving setMaxResults . Still i wonder why isn't the generated query as expected with a ROWNUM.
Furtehr information:
DB: Oracle 18
Dialog: org.hibernate.dialect.Oracle12cDialect
Hibernate: 5.3.15
JDK: 11

You have to understand that the first/max results apply on entity level if you select an entity. If you fetch join collection attributes or use an entity graph for collection attributes you change the cardinality of the rows returned by JDBC for each entity i.e. every row for the main entity row is duplicated for every collection element. The effect of that is, that Hibernate can't do pagination with ROWNUM anymore which is why you are not seeing it in the query. If you remove the fetch join you will see the use of ROWNUM.
Having said that, this is a perfect use case for Blaze-Persistence.
Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. The pagination support it comes with handles all of the issues you might encounter.
It also has a Spring Data integration, so you can use the same code like you do now, you only have to add the dependency and do the setup: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-setup
Blaze-Persistence has many different strategies for pagination which you can configure. The default strategy is to inline the query for ids into the main query. Something like this:
select u
from User u
left join fetch u.notes
where u.id IN (
select u2.id
from User u2
order by ...
limit ...
)
order by ...

When joining data, parent will be duplicated n times. For example:
select p from Post p join p.comments
If post have 20 comments under one post, then this one post will be returned 20 times with 20 different comments.
Limiting rows in this case doesn't make sense because actual number of returned post won't be equal to page size. In other words limiting page to 20 records will return only one post.

Related

Get only one row from jdbcTemplate query for performance optimization

jdbcTemplate.query(getQuery(id),
rs -> {
if(rs.next()) {
mainDTO.setSim(rs.getString("sim"));
mainDTO.setImei(rs.getString("imei"));
}
});
I use above code fragment to retrieve data from database and getting more than 100 records. But for all the records, sim and imei numbers are same. other fields are different. When executing above code I can get sim and imei number from the first record itself. but query run on all over the records and hence it take more than 3 seconds to complete. here is the problem.
How I can stop retrieving other records after I got the value for sim and imei from the first record. I cant change the sql query as the documentation and need to do the optimization in java code itself.
how can I optimize this to perform within below 100 mills.
You have two choices, either limit using a SQL query or use JdbcTemplate#setMaxRows:
SQL
You need to edit the query including what columns are about to be selected and the table name:
SELECT * FROM table LIMIT 1
JDBC
Use JdbcTemplate#setMaxRows to configure the JdbcTemplate to return up to one row:
jdbcTemplate.setMaxRows(1);
I guess it mimics Statement#setMaxRows.

Pagination with JPA and Oracle database

This week i was looking into a sorting issue in a WebApp. Sorting a table in the browser by a selected column did not work properly. It turned out that in the application, we used JPAs CriteriaQuery to create the query and then create a TypedQuery for the pagination as follows:
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<SomeEntity> q = cb.createQuery(SomeEntity.class);
Root<SomeEntity> c = q.from(SomeEntity.class);
q.select(c);
...
q.orderBy(cb.asc(c.get("SomeColumn")));
TypedQuery<> query = em.createQuery(q);
query.setFirstResult(pageIx * pageSize);
query.setMaxResults(pageSize);
...
This is pretty much how the documentation suggests to create queries (see here).
In the logs i saw that this generates an SQL query like this:
select * from (
select lots_of_columns from some_view order by selected_column
) where rownum <= 50
Since Oralce 10 the ordering of the enclosed select has no effect according to the documentation and, if i remember correctly, this also makes sense according to relational algebra. We use Oracle 12c.
So my question is, how am i supposed to takle this correctly?
I have found that offset and fetch should be used but i couldn't find how to tell JPA to generate the SQL accordingly. Also i have found a post that suggested to add the id to the order by clause, however this did not solve the problem either.
Thank you in advance for any thoughts and hints on the topic.

mybatis nested select vs. nested results

I'm a junior user of mybatis, I wonder the difference of nested select and nested results whether it's just simply like the difference between sub-query vs. join, especially in performance. Or it will do some optimization?
I used mybatis 3.4.7 version and oracle DB.
here is an example for reference:
private List<Post> posts;
<resultMap id="blogResult" type="Blog">
<collection property="posts" javaType="ArrayList" column="id"
ofType="Post" select="selectPostsForBlog"/>
</resultMap>
<select id="selectBlog" resultMap="blogResult">
SELECT * FROM BLOG WHERE ID = #{id}
</select>
<select id="selectPostsForBlog" resultType="Post">
SELECT * FROM POST WHERE BLOG_ID = #{id}
</select>
or
<select id="selectBlog" resultMap="blogResult">
select
B.id as blog_id,
B.title as blog_title,
B.author_id as blog_author_id,
P.id as post_id,
P.subject as post_subject,
P.body as post_body,
from Blog B
left outer join Post P on B.id = P.blog_id
where B.id = #{id}
</select>
<resultMap id="blogResult" type="Blog">
<id property="id" column="blog_id" />
<result property="title" column="blog_title"/>
<collection property="posts" ofType="Post">
<id property="id" column="post_id"/>
<result property="subject" column="post_subject"/>
<result property="body" column="post_body"/>
</collection>
</resultMap>
if there is still N+1 problem in nested select like sub-query?
do you have any advice or experience of which one performs better in a certain environment or condition? thanks a lot :).
First of all a slight terminology note. Subquery in SQL is a part of the query that is a query by itself, for example:
SELECT ProductName
FROM Product
WHERE Id IN (SELECT ProductId
FROM OrderItem
WHERE Quantity > 100)
In this case the following piece of the query is the subquery:
SELECT ProductId
FROM OrderItem
WHERE Quantity > 100
So you are using term "subquery" here incorrectly. In mybatis documentation the term nested select is used.
There are two ways to fetch associated entities/collections in mybatis. Here's relevant part of the documentation:
Nested Select: By executing another mapped SQL statement that returns
the complex type desired. Nested Results: By using nested result
mappings to deal with repeating subsets of joined results.
When nested select is used mybatis executes the main query first (in your case selectBlog) and then for every record it executes another select (hence the name nested select) to fetch associated Post entities.
When Nested results are used only one query is executed but it already has the association data joined. So mybatis maps the result to the object structure.
In your example single Blog entity is returned so when nested select is used two queries are executed, but in general case (if you would get the list of Blogs) you would hit N+1 problem.
Now let's deal with performance. All the following assumes that the queries are tuned (as in there are no missing indices), you are using connection pool, the database is collocated, basically speaking your system is tuned in all other regards.
Speaking of the performance there is no single correct answer and you milage may differ. You always need to test your particular workflows in your setup. Given so many factors affect performance like data distribution (think of max/min/arg posts each blog have), the size of the record in DB (think of number and size of the data fields in blog and post), the DB parameters (like disk type and speed, amount of memory available for dataset caching etc) there may no be a single answer only some generic observations that follow.
But we can understand the performance difference if we look at the cases on the ends of the performance spectrum. Like to see cases when nested select significantly outperforms join and vice versa.
For collection fetching join should be better in most cases because network latency to do N+1 request counts.
One case when nested select may be better is for one-to-many association when the record in the main table reference some other table and the cardinality of the other table is not large and the size of the record in the other table is large.
For example, let's consider Blog has a category property that references categories table and it may have one of these values Science, Fashion, News. And let's imagine the list of blogs is selected by some filter like keywords in the blog title. If the result contains let's say 500 items then most of the associated categories would be duplicates.
If we select them with join every record in the result set would contain Category data fields (and as a reminder most of them are duplicates and we have a lot of data in Category record).
If we select them using nested select we would do the query for fetch the category by category id for every record and here mybatis session cache comes to play. For the duration of the SqlSession every time mybatis executes the query it stores its result in the session cache so it does not execute repeating requests to the database but takes them from the cache. It means that after mybatis has retrieved some category by id for the first record it would reuse it for all other records in the recordset it processes.
In the above example we would do up to 4 requests to the database but the reduced amount of the data passed over the network may overweight the need to do 4 requests.

Is it a good idea to store and access an active query resultset in Coldfusion vs re-quering the database?

I have a product search engine using Coldfusion8 and MySQL 5.0.88
The product search has two display modes: Multiple View and Single View.
Multiple displays basic record info, Single requires additional data to be polled from the database.
Right now a user does a search and I'm polling the database for
(a) total records and
(b) records FROM to TO.
The user always goes to Single view from his current resultset, so my idea was to store the current resultset for each user and not have to query the database again to get (waste a) overall number of records and (waste b) a the single record I already queried before AND then getting the detail information I still need for the Single view.
However, I'm getting nowhere with this.
I cannot cache the current resultset-query, because it's unique to each user(session).
The queries are running inside a CFINVOKED method inside a CFC I'm calling through AJAX, so the whole query runs and afterwards the CFC and CFINVOKE method are discarded, so I can't use query of query or variables.cfc_storage.
So my idea was to store the current resultset in the Session scope, which will be updated with every new search, the user runs (either pagination or completely new search). The maximum results stored will be the number of results displayed.
I can store the query allright, using:
<cfset Session.resultset = query_name>
This stores the whole query with results, like so:
query
CACHED: false
EXECUTIONTIME: 2031
SQL: SELECT a.*, p.ek, p.vk, p.x, p.y
FROM arts a
LEFT JOIN p ON
...
LEFT JOIN f ON
...
WHERE a.aktiv = "ja"
AND
... 20 conditions ...
SQLPARAMETERS: [array]
1) ... 20+ parameters
RESULTSET:
[Record # 1]
a: true
style: 402
price: 2.3
currency: CHF
...
[Record # 2]
a: true
style: 402abc
...
This would be overwritten every time a user does a new search. However, if a user wants to see the details of one of these items, I don't need to query (total number of records & get one record) if I can access the record I need from my temp storage. This way I would save two database trips worth 2031 execution time each to get data which I already pulled before.
The tradeoff would be every user having a resultset of up to 48 results (max number of items per page) in Session.scope.
My questions:
1. Is this feasable or should I requery the database?
2. If I have a struture/array/object like a the above, how do I pick the record I need out of it by style number = how do I access the resultset? I can't just loop over the stored query (tried this for a while now...).
Thanks for help!
KISS rule. Just re-query the database unless you find the performance is really an issue. With the correct index, it should scales pretty well. When the it is an issue, you can simply add query cache there.
QoQ would introduce overhead (on the CF side, memory & computation), and might return stale data (where the query in session is older than the one on DB). I only use QoQ when the same query is used on the same view, but not throughout a Session time span.
Feasible? Yes, depending on how many users and how much data this stores in memory, it's probably much better than going to the DB again.
It seems like the best way to get the single record you want is a query of query. In CF you can create another query that uses an existing query as it's data source. It would look like this:
<cfquery name="subQuery" dbtype="query">
SELECT *
FROM Session.resultset
WHERE style = #SelectedStyleVariable#
</cfquery>
note that if you are using CFBuilder, it will probably scream Error at you for not having a datasource, this is a bug in CFBuilder, you are not required to have a datasource if your DBType is "query"
Depending on how many records, what I would do is have the detail data stored in application scope as a structure where the ID is the key. Something like:
APPLICATION.products[product_id].product_name
.product_price
.product_attribute
Then you would really only need to query for the ID of the item on demand.
And to improve the "on demand" query, you have at least two "in code" options:
1. A query of query, where you query the entire collection of items once, and then query from that for the data you need.
2. Verity or SOLR to index everything and then you'd only have to query for everything when refreshing your search collection. That would be tons faster than doing all the joins for every single query.

How to optimize the running of Oracle orderby native query running in EJB (Oc4j)?

I've the following problem with EJB and Oracle database.
I've some native SQL query deployed in Oc4j that returns more than 21k rows from Oracle DB. when I run the query against Oracle DB I get JOOM (out of memory) exception.
And because the requirements was to include pagination for the result set, so we decided to use em.setMaxResult, em.setFirstResult to return only 10 rows a time.
Using the EntityManager to implement the pagination put us in some problem As Later, it was required to sort the result returned, but the whole result not just the 10 rows returned by setMaxResult()! We found that, to put the clause ORDER BY xxxx in the native query makes the query performance became too bad.
So, we are considering doing the pagination in the Database layer (using Oracle rownum or any other technique).
Later, we recognized that, If we use em.clear() we might be able to avoid the JOOM exception by making something like:
define the result list
while database has more records
{
use entityManager get next 10 records and add them to the result list
entityManager.clear();
}
return result list
So, we could implement the paging on the Servlet side (using session.getAttribute("all_result").sublist(from, to)) and thus we can do the sort using Java as opposite to SQL sort.
Provided code for pagination, might help you.
em.createQuery("Select o from Entity o where o.id > :lastId order by o.id");
query.setParameter("lastId", previousResult.get(previousResult.size()-1).getId());
query.setMaxResults(10);

Resources