Pagination with JPA and Oracle database

This week I was looking into a sorting issue in a web app: sorting a table in the browser by a selected column did not work properly. It turned out that the application used JPA's CriteriaQuery to build the query and then created a TypedQuery for the pagination as follows:
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<SomeEntity> q = cb.createQuery(SomeEntity.class);
Root<SomeEntity> c = q.from(SomeEntity.class);
q.select(c);
...
q.orderBy(cb.asc(c.get("SomeColumn")));
TypedQuery<SomeEntity> query = em.createQuery(q);
query.setFirstResult(pageIx * pageSize);
query.setMaxResults(pageSize);
...
This is pretty much how the documentation suggests creating queries (see here).
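To make the arithmetic concrete, here is a small in-memory sketch of what setFirstResult(pageIx * pageSize) and setMaxResults(pageSize) ask for: skip that many rows of the already ordered result and keep at most pageSize rows. The class PageMath and its methods are hypothetical names used only for illustration; the sketch says nothing about the SQL a particular JPA provider generates.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for setFirstResult/setMaxResults.
// It only illustrates the paging arithmetic, not the SQL sent to Oracle.
final class PageMath {
    private PageMath() {}

    // Same value that is passed to query.setFirstResult(pageIx * pageSize)
    static int firstResult(int pageIx, int pageSize) {
        return pageIx * pageSize;
    }

    // Skip/limit on rows that are already sorted, i.e. ORDER BY first, then paging
    static <T> List<T> page(List<T> orderedRows, int pageIx, int pageSize) {
        return orderedRows.stream()
                .skip(firstResult(pageIx, pageSize))
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

The order of operations is the whole point: the rows must be sorted before the skip/limit is applied, otherwise each page is an arbitrary slice.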
In the logs I saw that this generates an SQL query like this:
select * from (
select lots_of_columns from some_view order by selected_column
) where rownum <= 50
Since Oracle 10, according to the documentation, the ordering of the enclosed select has no effect, and, if I remember correctly, this also makes sense according to relational algebra. We use Oracle 12c.
So my question is: how am I supposed to tackle this correctly?
I have found that OFFSET and FETCH should be used, but I couldn't find how to tell JPA to generate the SQL accordingly. I also found a post that suggested adding the id to the ORDER BY clause; however, this did not solve the problem either.
Thank you in advance for any thoughts and hints on the topic.
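For reference, a sketch of the Oracle 12c row-limiting form (OFFSET ... ROWS FETCH NEXT ... ROWS ONLY) with the id appended to the ORDER BY, assuming you run it as a native query instead of relying on the SQL your JPA provider generates. The helper class, the table and column placeholders, and the use of id as tie-breaker are assumptions for illustration. The tie-breaker makes the ordering total, so rows with equal sort values cannot move between pages from one execution to the next.

```java
// Hypothetical helper that builds an Oracle 12c page query.
// Identifiers are concatenated into the SQL text, so they must come from a
// fixed whitelist of column names, never directly from user input.
final class OffsetFetchSql {
    private OffsetFetchSql() {}

    static String pageQuery(String table, String sortColumn, String idColumn) {
        // idColumn as a secondary sort key gives a deterministic order
        return "select * from " + table
                + " order by " + sortColumn + ", " + idColumn
                + " offset ? rows fetch next ? rows only";
    }

    // Values for the two placeholders, in order: rows to skip, rows to return
    static long[] binds(int pageIx, int pageSize) {
        return new long[] { (long) pageIx * pageSize, pageSize };
    }
}
```

The two values from binds() go to the two ? placeholders in order, for example via positional parameters on a native query.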

Related

Filtering query with R2DBC

I need to write a filtering query with a query parameter. If the query parameter is null, the condition (= or LIKE, for example) should not be evaluated and the query should return everything. I'm using R2DBC and can't find a way to solve it.
If you are using Spring Data R2DBC, besides the raw SQL query above, you can use R2dbcOperations to compose Criteria freely, depending on your conditions.
The following is an example.
template
.select(Post.class)
.matching(
Query
.query(where("title").like("%" + name + "%")) // extract the where clause; here you can assemble the query condition freely
.limit(10)
.offset(0)
)
.all();
Additionally, when using R2dbcRepository with convention-based query methods, try substituting a default value for the LIKE parameter (e.g. replace null with an empty string "") so you don't have to check for a null value in SQL.
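A minimal sketch of that default-value idea for a LIKE pattern you build yourself; the class and method names are hypothetical. Note that LIKE '%%' matches every non-null value, so rows where the column itself is NULL are still excluded, and that % or _ characters inside the user input are not escaped here.

```java
// Hypothetical helper: null filter value becomes "", giving the pattern "%%",
// which matches every non-null column value.
final class LikePattern {
    private LikePattern() {}

    static String contains(String name) {
        String value = (name == null) ? "" : name;
        return "%" + value + "%";
    }
}
```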
A general prepared statement which would work might be:
SELECT *
FROM yourTable
WHERE col = ? or ? IS NULL;
If you bind a NULL value to both ? placeholders from your obfuscation layer, the second predicate is always true, so the WHERE clause returns all records in the table.
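The predicate can be modelled in plain Java to see which rows survive. This is an illustrative sketch (class name hypothetical): SQL's UNKNOWN result of col = NULL is treated as false, which has the same effect inside a WHERE clause because UNKNOWN rows are filtered out. With plain JDBC positional parameters the same value has to be bound twice, once per ?.

```java
import java.util.List;
import java.util.stream.Collectors;

// In-memory model of:  WHERE col = ? OR ? IS NULL
// (both placeholders receive the same bind value)
final class OptionalFilter {
    private OptionalFilter() {}

    static boolean matches(String col, String bind) {
        boolean bindIsNull = (bind == null);              // ? IS NULL
        boolean equal = col != null && col.equals(bind);  // col = ?, UNKNOWN treated as false
        return equal || bindIsNull;
    }

    static List<String> filter(List<String> rows, String bind) {
        return rows.stream().filter(c -> matches(c, bind)).collect(Collectors.toList());
    }
}
```

With a null bind every row is returned, including rows whose column is NULL; with a non-null bind only equal, non-null values match.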
If you prefer doing this with a "static SQL statement" (meaning you use a single SQL string for all possible bind values), then, in Oracle, it's probably optimal to use NVL() to profit from an Oracle optimiser feature, as explained in this article, irrespective of whether you're using R2DBC:
SELECT *
FROM t
WHERE col = nvl(:bind, col)
However, a query like yours is often best implemented using dynamic SQL, such as supported by a third party library like jOOQ:
Flux<TRecord> result =
Flux.from(ctx
.selectFrom(T)
.where(bind == null ? noCondition() : T.COL.eq(bind))
);
You can obviously also do this yourself, directly with R2DBC and your own dynamic SQL library, or any other such library.
Disclaimer: I work for the company behind jOOQ.
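As a rough illustration of the do-it-yourself option with no library at all, the sketch below appends a predicate only when its value is non-null and collects the bind values alongside. The class and method names are hypothetical, and column names are concatenated into the SQL, so they must be trusted constants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal dynamic-SQL builder: a condition is added only when
// its value is present, so no "OR ? IS NULL" trick is needed.
final class DynamicWhere {
    private final StringBuilder where = new StringBuilder();
    private final List<Object> binds = new ArrayList<>();

    DynamicWhere eqIfPresent(String column, Object value) {
        if (value != null) {
            where.append(where.length() == 0 ? " where " : " and ")
                 .append(column).append(" = ?");
            binds.add(value);
        }
        return this;
    }

    // Full statement text; placeholders line up with binds() in order
    String sql(String baseSelect) {
        return baseSelect + where;
    }

    List<Object> binds() {
        return binds;
    }
}
```

The resulting SQL text and bind list can then be handed to whatever executes the statement, R2DBC included.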

Why does Hibernate ignore setMaxResults?

I am using server-side pagination for one of my tables with a CriteriaQuery and a TypedQuery, and I set the following values:
typedQuery.setFirstResult(0);
typedQuery.setMaxResults(100);
Unfortunately, in the generated SQL query executed on the Oracle DB, I never see the ROWNUM condition. I also added an ORDER BY to my TypedQuery, but the query still does a simple select without limiting the results on the DB.
As a result, I get the following warning: HHH000104: firstResult/maxResults specified with collection fetch; applying in memory! In other words, Hibernate does the pagination in memory because it is not performed on the DB. About this warning I read the following article https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/ but before splitting my query into two queries (retrieve the ids, then retrieve the data for those ids), I wanted to give setMaxResults a try. Still, I wonder why the generated query does not contain the expected ROWNUM condition.
Further information:
DB: Oracle 18
Dialect: org.hibernate.dialect.Oracle12cDialect
Hibernate: 5.3.15
JDK: 11
You have to understand that first/max results apply at the entity level if you select an entity. If you fetch join collection attributes or use an entity graph for collection attributes, you change the cardinality of the rows returned by JDBC for each entity, i.e. the main entity row is duplicated for every collection element. As a result, Hibernate can't do the pagination with ROWNUM anymore, which is why you are not seeing it in the query. If you remove the fetch join, you will see ROWNUM being used.
Having said that, this is a perfect use case for Blaze-Persistence.
Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. The pagination support it comes with handles all of the issues you might encounter.
It also has a Spring Data integration, so you can use the same code as you do now; you only have to add the dependency and do the setup: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-setup
Blaze-Persistence has many different strategies for pagination which you can configure. The default strategy is to inline the query for ids into the main query. Something like this:
select u
from User u
left join fetch u.notes
where u.id IN (
select u2.id
from User u2
order by ...
limit ...
)
order by ...
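The same "ids first" idea can be sketched in memory to show why it pages correctly: step 1 mirrors the subquery (ordered, distinct parent ids, then skip/limit) and step 2 mirrors the outer query with the IN restriction, which may return many rows per parent without affecting the page size. The class name and the long[] row layout are assumptions for illustration; this is not Blaze-Persistence's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory sketch of "page the ids first, then fetch the collection for them"
final class IdsFirstPaging {
    private IdsFirstPaging() {}

    // Step 1: like the subquery; duplicates are dropped, encounter order is kept
    static List<Long> pageIds(List<Long> orderedParentIds, int first, int max) {
        Set<Long> distinct = new LinkedHashSet<>(orderedParentIds);
        return distinct.stream().skip(first).limit(max).collect(Collectors.toList());
    }

    // Step 2: like "where u.id IN (...)"; row[0] = parent id, row[1] = child id
    static List<long[]> fetchRowsFor(List<long[]> joinRows, List<Long> pageIds) {
        List<long[]> result = new ArrayList<>();
        for (long[] row : joinRows) {
            if (pageIds.contains(row[0])) {
                result.add(row);
            }
        }
        return result;
    }
}
```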
When joining data, the parent is duplicated n times. For example:
select p from Post p join p.comments
If a post has 20 comments, then this one post will be returned 20 times, each time with a different comment.
Limiting rows in this case doesn't make sense, because the actual number of returned posts won't equal the page size. In other words, limiting the page to 20 records will return only one post.
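A small in-memory model of the 20-comments case makes the difference visible. Each JoinRow stands for one JDBC row of the join (the parent id repeats once per comment). A row-level limit counts rows, so it yields a single post, whereas entity-level paging has to group the rows by parent first, which is roughly what applying first/max "in memory" means. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FetchJoinPaging {
    private FetchJoinPaging() {}

    // One JDBC row of a collection join: parent id repeated per child
    static final class JoinRow {
        final long parentId;
        final long childId;
        JoinRow(long parentId, long childId) {
            this.parentId = parentId;
            this.childId = childId;
        }
    }

    // Row-level limit (like ROWNUM): distinct parents found in the first maxRows rows
    static List<Long> parentsAfterRowLimit(List<JoinRow> rows, int maxRows) {
        List<Long> parents = new ArrayList<>();
        for (JoinRow r : rows.subList(0, Math.min(maxRows, rows.size()))) {
            if (!parents.contains(r.parentId)) {
                parents.add(r.parentId);
            }
        }
        return parents;
    }

    // Entity-level paging in memory: read all rows, group by parent, then page parents
    static Map<Long, List<Long>> pageInMemory(List<JoinRow> rows, int first, int max) {
        Map<Long, List<Long>> byParent = new LinkedHashMap<>();
        for (JoinRow r : rows) {
            byParent.computeIfAbsent(r.parentId, k -> new ArrayList<>()).add(r.childId);
        }
        Map<Long, List<Long>> page = new LinkedHashMap<>();
        byParent.entrySet().stream().skip(first).limit(max)
                .forEach(e -> page.put(e.getKey(), e.getValue()));
        return page;
    }
}
```

The in-memory variant is correct but has to read every joined row first, which is why splitting into an id query plus a fetch query scales better.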

LINQ and Generated sql

Suppose my LINQ query looks like this:
var qry = from c in nwEntitiesContext.CategorySet.AsEnumerable()
let products = this.GetProducts().WithCategoryID(c.CategoryID)
select new Model.Category
{
ID = c.CategoryID,
Name = c.CategoryName,
Products = new Model.LazyList<Core.Model.Product>(products)
};
return qry.AsQueryable();
I just want to know what query it will generate at runtime. How can I see the generated query from the VS2010 IDE when running the code in debug mode? Please guide me step by step.
There is not much to see here: it will just select all fields from the Category table, since you call AsEnumerable, which fetches all the data from the Category table into memory. After that you are in object space. Well, that depends on what this.GetProducts() does; my guess is that it makes another EF query that fetches the results into memory. If that's the case, I would strongly recommend posting another question with this code and the code of your GetProducts method so that we can take a look and rewrite this in a more optimal way. (Apart from this, you are projecting onto a mapped entity Model.Category, which again won't (and should not) work with Linq-to-Entities.)
Before reading into your query I was going to recommend doing something like this:
string sqlQueryString = ((ObjectQuery)qry).ToTraceString();
But that won't work since you are mixing Linq-to-Entities with Linq-to-objects and you will actually have several queries executed in case GetProducts queries EF. You can separate the part with your EF query and see the SQL like this though:
string sqlString = nwEntitiesContext.CategorySet.ToTraceString();
but as I mentioned earlier - that would just select everything from the Categories table.
In your case (unless you rewrite your code in a drastic way), you actually want to see what queries are run against the DB when you execute the code and enumerate the results of the queries. See this question:
exact sql query executed by Entity Framework
Your choices are SQL Server Profiler and Entity Framework Profiler. You can also try out LinqPad, but in general I still recommend you to describe what your queries are doing in more detail (and most probably rewrite them in a more optimal way before proceeding).
Try LINQPad.
This will produce SELECT * FROM Categories. Nothing more. Once you call AsEnumerable you are in Linq-to-objects and there is no way to get back to Linq-to-entities (AsQueryable doesn't do that).
If you want to see what query is generated use SQL Profiler or any method described in this article.

How to optimize the running of Oracle orderby native query running in EJB (Oc4j)?

I have the following problem with EJB and an Oracle database.
I have a native SQL query deployed in Oc4j that returns more than 21k rows from the Oracle DB. When I run the query against the Oracle DB, I get a JOOM (out of memory) exception.
Because the requirements also included pagination of the result set, we decided to use setMaxResults() and setFirstResult() to return only 10 rows at a time.
Using the EntityManager to implement the pagination caused a problem later, when it was required to sort the result: the whole result, not just the 10 rows returned by setMaxResults()! We found that putting an ORDER BY xxxx clause in the native query made its performance far too bad.
So we are considering doing the pagination in the database layer (using Oracle ROWNUM or another technique).
Later, we realized that if we use em.clear(), we might be able to avoid the JOOM exception by doing something like:
define the result list
while database has more records
{
use entityManager get next 10 records and add them to the result list
entityManager.clear();
}
return result list
So we could implement the paging on the Servlet side (using session.getAttribute("all_result").subList(from, to)) and do the sort in Java as opposed to in SQL.
The following pagination code might help you:
TypedQuery<Entity> query = em.createQuery("Select o from Entity o where o.id > :lastId order by o.id", Entity.class);
query.setParameter("lastId", previousResult.get(previousResult.size() - 1).getId());
query.setMaxResults(10);
List<Entity> nextPage = query.getResultList();
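That answer uses keyset ("seek") pagination: instead of an offset, each page starts after the last id of the previous page. The in-memory sketch below models the same query and also handles the first page, where there is no previous result yet (a null lastId). The class name is hypothetical, and it assumes ids are unique, as primary keys are.

```java
import java.util.List;
import java.util.stream.Collectors;

// In-memory model of:  where o.id > :lastId order by o.id  +  setMaxResults(pageSize)
final class KeysetPaging {
    private KeysetPaging() {}

    static List<Long> nextPage(List<Long> ids, Long lastId, int pageSize) {
        return ids.stream()
                .filter(id -> lastId == null || id > lastId) // first page: no lower bound
                .sorted()
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

A caller loops, passing the last id of each page as lastId for the next call, until an empty page comes back; the database only has to read one page worth of rows per step when o.id is indexed.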

What is the best way to integrate Solr as an index with Oracle as a storage DB?

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:
select * from data_table where id in ([solr query results for 'search string']);
However, one key issue arises:
Oracle WILL NOT allow more than 1000 items in the list of an "in" clause (BIG DEAL, as the list of objects I find is very often > 1000 and will usually be around 50-200k items)
I have tried to work around this using a "split" function that will take a string of comma-separated values, and break them down into array items, but then I hit the 4000 char limit on the function parameter using SQL (PL/SQL is 32k chars, but it's still WAY too limiting for 80,000+ results in some cases)
I am also hitting performance issues with WHERE IN (....). I am told that this causes a very slow query, even when the referenced field is indexed. Is that true?
I've tried chaining "OR"s to get around the 1000-item limit (i.e. id in (1...1000) or id in (1001...2000) or id in (2001...3000)), and this works, but it is very slow.
I am thinking that I should load the Solr Client JARs into Oracle, and write an Oracle Function in Java that will call solr and pipeline back the results as a list, so that I can do something like:
select * from data_table where id in (select * from table(runSolrQuery('my query text')));
This is proving quite hard, and I am not sure it's even possible.
Things that I can't do:
Store the full data in Solr (security + storage limits)
Use Solr as the controller of pagination and ordering (this is why I am fetching data from the DB)
So I have to cook up a hybrid approach where Solr really acts as the full-text search provider for Oracle. Help! Has anyone faced this?
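For clarity, here is a sketch of the OR-of-IN-lists workaround the question describes: the ids are split into chunks of at most 1000 (Oracle's limit on the number of expressions in a list) and one "id in (?, ...)" group is emitted per chunk, using placeholders rather than literals. The class name and signature are hypothetical. It only illustrates how such a clause is assembled; the poster reports this approach as very slow for large id sets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds "(col in (?, ?) or col in (?))" for idCount ids,
// with no more than chunkSize placeholders per IN list.
final class ChunkedInClause {
    private ChunkedInClause() {}

    static String build(String column, int idCount, int chunkSize) {
        if (idCount <= 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("idCount and chunkSize must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < idCount; start += chunkSize) {
            int n = Math.min(chunkSize, idCount - start);
            parts.add(column + " in (" + String.join(", ", Collections.nCopies(n, "?")) + ")");
        }
        return "(" + String.join(" or ", parts) + ")";
    }
}
```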
Check this out:
http://demo.scotas.com/search-sqlconsole.php
This product seems to do exactly what you need.
cheers
I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.
Dominic Brooks has an example of using object collections with JDBC. You would do something like this:
Create a couple of types in Oracle:
CREATE TYPE data_table_id_typ AS OBJECT (
id NUMBER
);
CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;
In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement
SELECT *
FROM data_table
WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
Instead of using a long BooleanQuery, you can use a TermsFilter (it works like RangeFilter, but the items don't have to be in sequence).
Like this (first fill your TermsFilter with terms):
TermsFilter termsFilter = new TermsFilter();
// Loop through terms and add them to filter
Term term = new Term("<field-name>", "<query>");
termsFilter.addTerm(term);
then search the index like this:
DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000);
Where searcher is SolrIndexSearcher (see java doc for more info on getDocList method):
http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
Two solutions come to mind.
First, look into using the Oracle-specific Java extensions to JDBC. They allow you to pass in an actual array/list as an argument. You may need to create a stored proc (it has been a while since I had to do this), but if this is a focused use case, it shouldn't be overly burdensome.
Second, if you are still running into a boundary like the 1000-object limit, consider using the "rows" setting when querying Solr and leveraging its inherent pagination feature.
I've used this bulk fetching method with stored procs to fetch large quantities of data which needed to be put into Solr. Involve your DBA. If you have a good one, and use the Oracle specific extensions, I think you should attain very reasonable performance.
