How to optimize an Oracle native query with ORDER BY running in EJB (OC4J)? - performance

I have the following problem with EJB and an Oracle database.
I have a native SQL query deployed in OC4J that returns more than 21k rows from the Oracle DB. When I run the query against the Oracle DB, I get a Java out-of-memory (OOM) exception.
Because the requirement was to paginate the result set, we decided to use the query's setMaxResults() and setFirstResult() to return only 10 rows at a time.
Paginating this way caused a problem: later we were required to sort the result, and the sort had to cover the whole result set, not just the 10 rows returned by setMaxResults(). We found that adding an ORDER BY xxxx clause to the native query made its performance very poor.
So we are considering doing the pagination in the database layer (using Oracle ROWNUM or some other technique).
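For reference, the ROWNUM technique mentioned above is usually written as a two-level wrapper around the ordered query. The sketch below only builds the SQL text; the class name, method name, and the aliases q and rn are hypothetical, and it assumes the inner query already ends with its own ORDER BY.

```java
// Sketch only: wraps an already-ordered query in the classic Oracle ROWNUM
// pagination pattern. Class, method and alias names are hypothetical.
final class RownumPaging {

    /**
     * @param orderedSql a complete SELECT that already ends with its ORDER BY
     * @param firstRow   zero-based offset of the first row to return
     * @param pageSize   number of rows per page
     */
    static String wrap(String orderedSql, int firstRow, int pageSize) {
        if (firstRow < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("firstRow must be >= 0 and pageSize > 0");
        }
        int lastRow = firstRow + pageSize;
        // The inner ROWNUM filter keeps only the first lastRow sorted rows;
        // the outer filter then discards the rows before the requested page.
        return "SELECT * FROM ("
             + " SELECT q.*, ROWNUM rn FROM (" + orderedSql + ") q"
             + " WHERE ROWNUM <= " + lastRow
             + ") WHERE rn > " + firstRow;
    }
}
```

Note that the wrapped result carries an extra rn column, and that the database still has to sort the full result before it can return a page, so this mainly reduces what is sent to the application.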
Later, we realized that if we call em.clear() we might be able to avoid the OOM exception by doing something like:
define the result list
while database has more records
{
use entityManager get next 10 records and add them to the result list
entityManager.clear();
}
return result list
So we could implement the paging on the servlet side (using ((List) session.getAttribute("all_result")).subList(from, to)) and do the sort in Java instead of in SQL.
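A minimal sketch of that servlet-side idea, using only the JDK: sort the cached list once, then cut pages out of it with subList. The class and method names are hypothetical, and it assumes the full result really does fit in memory for each session, which is exactly the risk with 21k+ rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch only: in-memory sort plus subList paging over a cached result list.
final class InMemoryPager {

    /** Returns a sorted copy so the cached list can be paged repeatedly. */
    static <T> List<T> sortedCopy(List<T> rows, Comparator<? super T> order) {
        List<T> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    /** Returns the zero-based page pageIndex, or an empty list past the end. */
    static <T> List<T> page(List<T> sortedRows, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize;   // long avoids int overflow
        if (from >= sortedRows.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, sortedRows.size());
        // subList is only a view; copy it so the page is independent of the cache.
        return new ArrayList<>(sortedRows.subList((int) from, to));
    }
}
```

In the servlet, sortedCopy would be called once when the sort order changes and the result stored in the session; each request then calls page with the requested index.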

The following pagination code might help you.
// Fetch the 10 rows that follow the last id of the previous page
Query query = em.createQuery("select o from Entity o where o.id > :lastId order by o.id");
query.setParameter("lastId", previousResult.get(previousResult.size() - 1).getId());
query.setMaxResults(10);
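To make the paging loop around that query concrete, here is a sketch that runs without a database. fetchAfter is a hypothetical stand-in for the JPA query above (ids greater than lastId, ascending, at most limit rows), and the in-memory list plays the role of the table. It assumes the sort key is unique; paging by a non-unique column would need the id as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: pages by "last id seen" instead of by row offset.
final class KeysetDemo {

    /** Hypothetical stand-in for the query: ids > lastId, ascending, at most limit. */
    static List<Long> fetchAfter(List<Long> sortedIds, long lastId, int limit) {
        return sortedIds.stream()
                .filter(id -> id > lastId)
                .limit(limit)
                .collect(Collectors.toList());
    }

    /** Walks every page, carrying the last id of the previous page forward. */
    static List<List<Long>> allPages(List<Long> sortedIds, int pageSize) {
        List<List<Long>> pages = new ArrayList<>();
        long lastId = Long.MIN_VALUE;               // smaller than any real id
        while (true) {
            List<Long> page = fetchAfter(sortedIds, lastId, pageSize);
            if (page.isEmpty()) {
                break;                              // no more rows
            }
            pages.add(page);
            lastId = page.get(page.size() - 1);     // key for the next page
        }
        return pages;
    }
}
```

With real entities, each page would be processed and the persistence context cleared before the next fetch, rather than collecting all pages in one list as the demo does.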

Related

Why does Hibernate ignore setMaxResults?

I am using server-side pagination for one of my tables with a CriteriaQuery and a TypedQuery, and I set the following values:
typedQuery.setFirstResult(0);
typedQuery.setMaxResults(100);
Unfortunately, in the generated SQL query that is executed on the Oracle DB, I never see the ROWNUM condition. I also added an ORDER BY to my TypedQuery, but the query still does a simple select without limiting the results on the DB.
As a result I am getting the following warning: HHH000104: firstResult/maxResults specified with collection fetch; applying in memory! In other words, Hibernate does the pagination in memory because it is not performed on the DB. For this warning I read the following article: https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/ but before splitting my query into two queries (retrieve the ids, then retrieve the data for those ids), I thought I would give setMaxResults a try. Still, I wonder why the generated query does not contain ROWNUM as expected.
Further information:
DB: Oracle 18
Dialect: org.hibernate.dialect.Oracle12cDialect
Hibernate: 5.3.15
JDK: 11
You have to understand that first/max results apply at the entity level if you select an entity. If you fetch join collection attributes, or use an entity graph for collection attributes, you change the cardinality of the rows returned by JDBC for each entity, i.e. the row for the main entity is duplicated for every collection element. The effect is that Hibernate can't do pagination with ROWNUM anymore, which is why you are not seeing it in the query. If you remove the fetch join, you will see the use of ROWNUM.
Having said that, this is a perfect use case for Blaze-Persistence.
Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. The pagination support it comes with handles all of the issues you might encounter.
It also has a Spring Data integration, so you can keep using the same code as you do now; you only have to add the dependency and do the setup: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-setup
Blaze-Persistence has many different strategies for pagination which you can configure. The default strategy is to inline the query for ids into the main query. Something like this:
select u
from User u
left join fetch u.notes
where u.id IN (
select u2.id
from User u2
order by ...
limit ...
)
order by ...
When you join data, the parent is duplicated n times. For example:
select p from Post p join p.comments
If a post has 20 comments, that one post is returned 20 times, once per comment.
Limiting rows in this case doesn't make sense, because the actual number of posts returned won't equal the page size. In other words, limiting the page to 20 records will return only one post.
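The arithmetic in that example can be checked without a database. The sketch below simulates the joined result set (one row per post and comment pair) and counts how many distinct posts survive a row limit. All class and method names are hypothetical, and the row limit is modelled as simply taking the first rows of the joined list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: shows why a row limit on a joined result does not page by parent.
final class JoinLimitDemo {

    /** One row of the joined result: a post id paired with one comment id. */
    static final class Row {
        final long postId;
        final long commentId;

        Row(long postId, long commentId) {
            this.postId = postId;
            this.commentId = commentId;
        }
    }

    /** Simulates "select p from Post p join p.comments": one row per (post, comment). */
    static List<Row> joinedRows(int posts, int commentsPerPost) {
        List<Row> rows = new ArrayList<>();
        for (long p = 1; p <= posts; p++) {
            for (long c = 1; c <= commentsPerPost; c++) {
                rows.add(new Row(p, c));
            }
        }
        return rows;
    }

    /** Applies a row limit to the joined rows, then counts the distinct posts left. */
    static int distinctPostsInFirstRows(List<Row> rows, int maxRows) {
        Set<Long> postIds = new LinkedHashSet<>();
        for (Row r : rows.subList(0, Math.min(maxRows, rows.size()))) {
            postIds.add(r.postId);
        }
        return postIds.size();
    }
}
```

With 5 posts of 20 comments each, a limit of 20 rows covers exactly one post, which is why the limit has to be applied to the parent ids first and the collections fetched afterwards.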

How to tell my @Query in a Spring JPA repository NOT to use a prepared statement (leads to very slow queries)?

In my Spring Repository Class, I have the following query (kind of analytics query) running on a Postgresql 9.6 server :
@Query("SELECT d.id as departement_id, COUNT(m.id) as nbMateriel FROM Departement d LEFT JOIN d.sites s LEFT JOIN s.materiels m WHERE "
+ "(s.metier.id IN (:metier_id) OR :metier_id IS NULL) AND (s.entite.id IN (:entite_id) OR :entite_id IS NULL) "
+ "AND (m.materielType.id IN (:materielType_id) OR :materielType_id IS NULL) AND "
+ "(d.id= :departement_id OR :departement_id IS NULL) "
+ "AND m.dateLivraison is not null and (EXTRACT(YEAR FROM m.dateLivraison) < :date_id OR :date_id IS NULL) "
+ "AND ( m.estHISM =:estHISM OR :estHISM IS NULL OR m.estHISM IS NULL) "
+ "GROUP BY d.id")
List<Map<Long, Long>> countByDepartementWithFilter(@Param("metier_id") List<Long> metier_id,
@Param("entite_id") List<Long> entite_id, @Param("materielType_id") List<Long> materielType_id,
@Param("departement_id") Long departement_id, @Param("date_id") Integer date_id,
@Param("estHISM") Boolean estHISM);
The problem is that this query is called several times with different combinations of parameters, and after 5-6 calls the execution time goes from 20 ms to 10,000 ms.
From what I have read, the cause is the use of prepared statements, which are not suited to analytics queries where a number of parameters have values that can change a lot. And indeed, running the above query directly is always fast (20 ms).
Question 1: How can I tell Spring JPA not to use prepared statements for this specific query?
Question 2: If Question 1 is not possible, what workaround is there?
Here are some general tips for improving query performance from both the JPA and the DB point of view:
1- use @NamedQuery instead of @Query
2- for reporting queries, don't run them inside a transaction
3- you can set the flush mode to COMMIT if you don't need to flush the persistence context before the query runs
4- check the generated query: take it and run it in SQL Developer or TOAD, and check its cost and execution plan. You can also ask your DBA whether it can be improved with native DB functions or procedures, in which case use a native query instead of a JPQL query
5- if the returned data is large, consider turning this query into a DB view or materialized view and querying that directly
6- make use of query hints, for example to activate a certain index; note that indexes may be ignored in the case of JPQL
7- you can use a native query if the query hint doesn't work with JPQL
8- when comparing the query in SQL Developer with the one run from the code, make sure you are comparing like with like: the query might return its first rows very quickly when run directly on the DB but take a long time to fetch all the data, and you might be comparing this short initial time with the application's full fetch time
9- use a fetch size hint appropriate to your provider
10- as far as I know, you can avoid a prepared statement by using a native, non-parameterized query (that is, using manual placeholders and substituting the values yourself), but this should be used with care and avoided as much as possible: it opens the door to SQL injection and prevents both the DB query engine and the Hibernate engine from precompiling the query
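If tip 10 is tried at all, one way to keep the injection risk down is to inline only values that are already typed as numbers, never strings. The sketch below renders one of the optional id filters from the question as literal SQL. The class and method names are hypothetical, and treating a null or empty list as "no filter" is an assumption modelled on the ":param IS NULL" checks in the original query.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: renders a numeric IN filter as literal SQL text.
final class NumericFilterSql {

    /**
     * Returns "column IN (1, 2, 3)" for typed Long values, or "1 = 1" when no
     * filter was requested (null or empty list, an assumption of this sketch).
     * The column name must be a constant from the code, never user input.
     * A null element in the list causes a NullPointerException.
     */
    static String inClause(String column, List<Long> ids) {
        if (ids == null || ids.isEmpty()) {
            return "1 = 1";
        }
        String literals = ids.stream()
                .map(id -> Long.toString(id))   // Long values cannot carry SQL text
                .collect(Collectors.joining(", "));
        return column + " IN (" + literals + ")";
    }
}
```

Every distinct combination of values then produces a different SQL string, so the database plans each one separately. That is the trade-off the tip warns about, and it is worth measuring before adopting.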

LINQ Count() but don't return all rows

I need to get a count of a DB table.
When I do the below I can see that the table enumerates (so looks like the rows come back then get counted on the app server side) but I'd like the SQL server to do the counting then return just the count.
db.Places.Where(x => x.City == "San Jose").Count()
This totally depends on the Linq to SQL provider that you're using, but most of them should translate your LINQ statement into an actual server side count.
You will need to enable logging on your provider or turn on profiling on your database in order to see what actual SQL was sent to it.
PS: Please tag with the appropriate provider and SQL server (for example: linq-to-sql sql-server)

LINQ and Generated sql

Suppose my LINQ query is like this:
var qry = from c in nwEntitiesContext.CategorySet.AsEnumerable()
let products = this.GetProducts().WithCategoryID(c.CategoryID)
select new Model.Category
{
ID = c.CategoryID,
Name = c.CategoryName,
Products = new Model.LazyList<Core.Model.Product>(products)
};
return qry.AsQueryable();
I just want to know what query it will generate at runtime, and how to see the generated query from the VS2010 IDE when running the code in debug mode. Please guide me step by step.
There is not much to see here: it will just select all fields from the Category table, since you call AsEnumerable and thereby fetch all the data from the Category table into memory. After that you are in object space. Well, depending on what this.GetProducts() does, and my guess is that it makes another EF query that fetches its results into memory. If that's the case, I would strongly recommend that you post another question with this code and the code of your GetProducts method so that we can take a look and rewrite this in a more optimal way. (Apart from this, you are projecting onto a mapped entity Model.Category, which again won't (and should not) work with Linq-to-Entities.)
Before reading into your query I was going to recommend doing something like this:
string sqlQueryString = ((ObjectQuery)qry).ToTraceString();
But that won't work since you are mixing Linq-to-Entities with Linq-to-objects and you will actually have several queries executed in case GetProducts queries EF. You can separate the part with your EF query and see the SQL like this though:
string sqlString = nwEntitiesContext.CategorySet.ToTraceString();
but as I mentioned earlier - that would just select everything from the Categories table.
In your case (unless you rewrite your code in a drastic way), you actually want to see what queries are run against the DB when you execute the code and enumerate the results of the queries. See this question:
exact sql query executed by Entity Framework
Your choices are SQL Server Profiler and Entity Framework Profiler. You can also try out LinqPad, but in general I still recommend you to describe what your queries are doing in more detail (and most probably rewrite them in a more optimal way before proceeding).
Try Linqpad
This will produce SELECT * FROM Categories. Nothing more. Once you call AsEnumerable you are in Linq-to-objects and there is no way to get back to Linq-to-entities (AsQueryable doesn't do that).
If you want to see what query is generated use SQL Profiler or any method described in this article.

What is the best way to integrate Solr as an index with Oracle as a storage DB?

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:
select * from data_table where id in ([solr query results for 'search string']);
However, one key issue arises:
Oracle WILL NOT allow more than 1000 items in the list of an "in" clause (a BIG DEAL, as the list of objects I find is very often > 1000 and will usually be around 50-200k items).
I have tried to work around this using a "split" function that takes a string of comma-separated values and breaks it down into array items, but then I hit the 4000-char limit on the function parameter in SQL (in PL/SQL it is 32k chars, but that is still WAY too limiting for 80,000+ results in some cases).
I am also hitting performance issues using WHERE IN (....); I am told that this causes a very slow query, even when the referenced field is indexed?
I've tried chaining "OR"s to get around the 1000-item limit (aka: id in (1...1000) or id in (1001...2000) or id in (2001...3000)), and this works, but it is very slow.
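For readers who want to see that OR workaround spelled out, here is a sketch that splits the id list into chunks and joins one IN list per chunk. The class and method names are hypothetical, and returning "1 = 0" for an empty list is an assumption. As noted above, this works but is slow, so it is shown only to make the approach concrete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: builds "(id IN (...) OR id IN (...))" with a bounded list size.
final class ChunkedIn {

    /** Maximum number of literals Oracle accepts in a single IN list. */
    static final int ORACLE_IN_LIMIT = 1000;

    /** Splits items into consecutive sublists of at most size elements. */
    static <T> List<List<T>> chunks(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be > 0");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return out;
    }

    /** Renders the OR-ed predicate; an empty id list matches nothing. */
    static String predicate(String column, List<Long> ids, int chunkSize) {
        if (ids.isEmpty()) {
            return "1 = 0";
        }
        return chunks(ids, chunkSize).stream()
                .map(chunk -> column + " IN ("
                        + chunk.stream()
                               .map(id -> Long.toString(id))
                               .collect(Collectors.joining(", "))
                        + ")")
                .collect(Collectors.joining(" OR ", "(", ")"));
    }
}
```

A call such as ChunkedIn.predicate("id", solrIds, ChunkedIn.ORACLE_IN_LIMIT) would produce the predicate for the WHERE clause, where solrIds is a hypothetical list of ids returned by Solr.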
I am thinking that I should load the Solr Client JARs into Oracle, and write an Oracle Function in Java that will call solr and pipeline back the results as a list, so that I can do something like:
select * from data_table where id in (select * from table(runSolrQuery('my query text')));
This is proving quite hard, and I am not sure it's even possible.
Things that I can't do:
Store full data in Solr (security + storage limits)
Use Solr as controller of pagination and ordering (this is why I am fetching data from the DB)
So I have to cook up a hybrid approach where Solr really acts as the full-text search provider for Oracle. Help! Has anyone faced this?
Check this out:
http://demo.scotas.com/search-sqlconsole.php
This product seems to do exactly what you need.
cheers
I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.
Dominic Brooks has an example of using object collections with JDBC. You would do something like
Create a couple of types in Oracle
CREATE TYPE data_table_id_typ AS OBJECT (
id NUMBER
);
CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;
In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement
SELECT *
FROM data_table
WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
Instead of using a long BooleanQuery, you can use TermsFilter (it works like RangeFilter, but the items don't have to be in sequence).
Like this (first fill your TermsFilter with terms):
TermsFilter termsFilter = new TermsFilter();
// Loop through terms and add them to filter
Term term = new Term("<field-name>", "<query>");
termsFilter.addTerm(term);
then search the index like this:
DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000);
Where searcher is SolrIndexSearcher (see java doc for more info on getDocList method):
http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
Two solutions come to mind.
First, look into using Oracle-specific Java extensions to JDBC. They allow you to pass in an actual array/list as an argument. You may need to create a stored proc (it has been a while since I had to do this), but if this is a focused use case, it shouldn't be overly burdensome.
Second, if you are still running into a boundary like the 1000-object limit, consider using the "rows" setting when querying Solr and leveraging its inherent pagination feature.
I've used this bulk fetching method with stored procs to fetch large quantities of data which needed to be put into Solr. Involve your DBA. If you have a good one, and use the Oracle specific extensions, I think you should attain very reasonable performance.
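As a rough illustration of the second suggestion, the sketch below builds the request parameters for fetching Solr results in fixed-size pages. q, fl, start and rows are standard Solr query parameters; the class and method names are hypothetical, the field name id is an assumption, and the encoding call needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: computes the paging parameters for successive Solr requests.
final class SolrPaging {

    /** Builds the query string for one page, returning only the id field. */
    static String pageParams(String query, int start, int rows) {
        return "q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
             + "&fl=id&start=" + start + "&rows=" + rows;
    }

    /** Lists the start offsets needed to cover totalHits results. */
    static List<Integer> pageStarts(int totalHits, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be > 0");
        }
        List<Integer> starts = new ArrayList<>();
        for (long s = 0; s < totalHits; s += rows) {   // long avoids int overflow
            starts.add((int) s);
        }
        return starts;
    }
}
```

Each page of ids could then be passed to the database in one bound array, keeping every batch at or below 1000 ids.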

Resources