How to tell my @Query in Spring JPA repository NOT to use prepared statements (leads to very slow queries)?

In my Spring repository class, I have the following query (a kind of analytics query) running on a PostgreSQL 9.6 server:
@Query("SELECT d.id as departement_id, COUNT(m.id) as nbMateriel FROM Departement d LEFT JOIN d.sites s LEFT JOIN s.materiels m WHERE "
+ "(s.metier.id IN (:metier_id) OR :metier_id IS NULL) AND (s.entite.id IN (:entite_id) OR :entite_id IS NULL) "
+ "AND (m.materielType.id IN (:materielType_id) OR :materielType_id IS NULL) AND "
+ "(d.id= :departement_id OR :departement_id IS NULL) "
+ "AND m.dateLivraison is not null and (EXTRACT(YEAR FROM m.dateLivraison) < :date_id OR :date_id IS NULL) "
+ "AND ( m.estHISM =:estHISM OR :estHISM IS NULL OR m.estHISM IS NULL) "
+ "GROUP BY d.id")
List<Map<Long, Long>> countByDepartementWithFilter(@Param("metier_id") List<Long> metier_id, @Param("entite_id") List<Long> entite_id,
@Param("materielType_id") List<Long> materielType_id, @Param("departement_id") Long departement_id,
@Param("date_id") Integer date_id, @Param("estHISM") Boolean estHISM);
The problem is: this query is called several times with different combinations of parameters, and after 5-6 calls the execution time goes from 20 ms to 10,000 ms.
From what I have read, the cause is the use of prepared statements, which are not suited to analytics queries with a number of parameters whose values can change a lot. And indeed, running the above query directly is always fast (20 ms).
Question 1: How can I tell Spring JPA not to use prepared statements for this specific query?
Question 2: If Question 1 is not possible, what workaround can I use?

There are some general tips to enhance query performance, both from the JPA and the DB point of view:
1- Use @NamedQuery instead of @Query
2- For reporting queries, don't run them inside a transaction
3- You can set the flush mode to COMMIT if you don't need to flush the persistence context before the query runs
4- Check the generated query: take it and run it in SQL Developer or TOAD, and check its cost and execution plan. You can also ask your DBA whether it can be improved with native DB functions/procedures, and hence use a native query instead of a JPQL query
5- If the returned data is large, consider turning this query into a DB view or materialized view and querying it directly
6- Make use of query hints, for example to activate a certain index; note that indexes may be ignored in the case of JPQL
7- You can use a native query if the query hint doesn't work with JPQL
8- When comparing the query in SQL Developer with the one from the code, make sure you are comparing like with like: the query may return its first rows very quickly when run directly on the DB but take a long time to fetch all the data, and you might be comparing that short initial time with the application's full fetch time
9- Use the fetch size hint appropriate to your provider
10- As far as I know, you might avoid prepared statements if you use a native, non-parametrized query (i.e. using your own placeholders and replacing them with the values manually), but this should generally be used with care and avoided as much as possible, because of SQL injection vulnerabilities and because it prevents both the DB query engine and the Hibernate engine from precompiling the queries
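A minimal sketch of tip 10, assuming the filter values are numeric (like the `List<Long>` and `Integer` parameters in the question): `Long.toString` can only produce digits and a minus sign, so numeric values can be written into the SQL as literals without an injection risk, whereas strings should never be inlined this way. The class, its helpers and the native column names (`s.metier_id`, `s.entite_id`, `m.date_livraison`) are hypothetical; the returned fragment would be appended to a native query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a native SQL WHERE clause with numeric values
// inlined as literals instead of bind parameters. Only numbers are inlined;
// never do this with user-supplied strings.
final class InlinedFilterSql {

    // Renders "column IN (1, 2, 3)", or null when the filter is not requested.
    static String inList(String column, List<Long> ids) {
        if (ids == null || ids.isEmpty()) {
            return null; // omit the condition instead of using "OR :x IS NULL"
        }
        String literals = ids.stream()
                .map(id -> Long.toString(id)) // digits and '-' only, so no injection
                .collect(Collectors.joining(", "));
        return column + " IN (" + literals + ")";
    }

    // Combines the requested conditions with AND; returns "" when none apply.
    static String where(List<Long> metierIds, List<Long> entiteIds, Integer year) {
        List<String> conditions = new ArrayList<>();
        String metier = inList("s.metier_id", metierIds);
        if (metier != null) conditions.add(metier);
        String entite = inList("s.entite_id", entiteIds);
        if (entite != null) conditions.add(entite);
        if (year != null) {
            conditions.add("EXTRACT(YEAR FROM m.date_livraison) < " + year.intValue());
        }
        return conditions.isEmpty() ? "" : " WHERE " + String.join(" AND ", conditions);
    }

    public static void main(String[] args) {
        System.out.println(where(List.of(1L, 2L), null, 2018));
        // prints " WHERE s.metier_id IN (1, 2) AND EXTRACT(YEAR FROM m.date_livraison) < 2018"
    }
}
```

With the values written as literals, the planner always sees the actual values, and each filter combination yields different SQL text that is planned on its own; the cost is the loss of statement reuse and the injection care described in tip 10.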

Related

Filtering query with R2DBC

I need to do a filtering query with some query parameters. If a query parameter is null, the corresponding condition (= or LIKE, for example) should not be evaluated, and the query should return everything. I'm using R2DBC and I can't find a way to solve it.
If you are using Spring Data R2DBC, besides the raw SQL approach, you can use R2dbcOperations to compose Criteria conditionally.
The following is an example.
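import org.springframework.data.relational.core.query.Criteria;
import org.springframework.data.relational.core.query.Query;
import static org.springframework.data.relational.core.query.Criteria.where;

// assemble the condition freely here: skip it entirely when name is null
Criteria criteria = (name == null) ? Criteria.empty() : where("title").like("%" + name + "%");
template
.select(Post.class)
.matching(
Query
.query(criteria)
.limit(10)
.offset(0)
)
.all();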
Additionally, when using R2dbcRepository with convention-based query methods, try using a default value in the LIKE (e.g. replace null with an empty string "") to save the work of checking for a null value in SQL.
A general prepared statement which would work might be:
SELECT *
FROM yourTable
WHERE col = ? or ? IS NULL;
If you bind NULL to both ? placeholders from your obfuscation layer, the WHERE clause is always true, returning all records in the table.
If you prefer doing this with a "static SQL statement" (meaning you use a single SQL string for all possible bind values), then, in Oracle, it's probably optimal to use NVL() to profit from an Oracle optimiser feature, as explained in this article, irrespective of whether you're using R2DBC:
SELECT *
FROM t
WHERE col = nvl(:bind, col)
However, a query like yours is often best implemented using dynamic SQL, such as that supported by a third-party library like jOOQ:
import static org.jooq.impl.DSL.noCondition;
Flux<TRecord> result =
Flux.from(ctx
.selectFrom(T)
.where(bind == null ? noCondition() : T.COL.eq(bind))
);
You can obviously also do this yourself, directly with R2DBC and your own dynamic SQL library, or any other such library.
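If you would rather not pull in a library, the idea can be sketched in plain Java: append a condition and collect its bind value only when the filter is non-null, so a null filter disappears from the SQL instead of relying on `? IS NULL`. The `DynamicWhere` class and the `post`/`title`/`author_id` names are hypothetical, and the `$1`, `$2` markers assume the PostgreSQL R2DBC driver's indexed placeholder style; adapt them to your driver before binding the collected values in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dynamic-SQL builder: conditions are added only for non-null
// filters, and their values are collected for binding in order.
// Column names and operators must come from code, never from user input.
final class DynamicWhere {
    private final StringBuilder buffer;
    private final List<Object> binds = new ArrayList<>();
    private boolean hasWhere = false;

    DynamicWhere(String baseSelect) {
        this.buffer = new StringBuilder(baseSelect);
    }

    // Appends "column op $n" when value is non-null; otherwise does nothing.
    DynamicWhere and(String column, String op, Object value) {
        if (value == null) {
            return this; // null filter: no condition, so every row matches
        }
        binds.add(value);
        buffer.append(hasWhere ? " AND " : " WHERE ")
              .append(column).append(' ').append(op)
              .append(" $").append(binds.size()); // 1-based marker for this value
        hasWhere = true;
        return this;
    }

    String sql() { return buffer.toString(); }

    List<Object> binds() { return binds; }

    public static void main(String[] args) {
        String name = "spring";
        DynamicWhere q = new DynamicWhere("SELECT * FROM post")
                .and("title", "LIKE", name == null ? null : "%" + name + "%")
                .and("author_id", "=", null);
        System.out.println(q.sql());   // SELECT * FROM post WHERE title LIKE $1
        System.out.println(q.binds()); // [%spring%]
    }
}
```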
Disclaimer: I work for the company behind jOOQ.

Pagination with JPA and Oracle database

This week I was looking into a sorting issue in a web app. Sorting a table in the browser by a selected column did not work properly. It turned out that in the application we used JPA's CriteriaQuery to create the query and then created a TypedQuery for the pagination as follows:
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<SomeEntity> q = cb.createQuery(SomeEntity.class);
Root<SomeEntity> c = q.from(SomeEntity.class);
q.select(c);
...
q.orderBy(cb.asc(c.get("SomeColumn")));
TypedQuery<SomeEntity> query = em.createQuery(q);
query.setFirstResult(pageIx * pageSize);
query.setMaxResults(pageSize);
...
This is pretty much how the documentation suggests to create queries (see here).
In the logs I saw that this generates an SQL query like this:
select * from (
select lots_of_columns from some_view order by selected_column
) where rownum <= 50
Since Oracle 10, the ordering of the enclosed select has no effect according to the documentation and, if I remember correctly, this also makes sense according to relational algebra. We use Oracle 12c.
So my question is: how am I supposed to tackle this correctly?
I have found that OFFSET and FETCH should be used, but I couldn't find how to tell JPA to generate the SQL accordingly. I have also found a post that suggested adding the id to the ORDER BY clause; however, this did not solve the problem either.
Thank you in advance for any thoughts and hints on the topic.
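For reference, the Oracle 12c row-limiting clause mentioned above keeps ORDER BY and the limit in the same query block. A minimal sketch of the SQL shape and the offset arithmetic (`pageIx * pageSize`, as in the `setFirstResult` call); `some_view`, `selected_column` and the `id` tie-breaker are placeholders, and whether your JPA provider emits this syntax depends on the provider and its configured dialect.

```java
// Hypothetical helper: an Oracle 12c paged query plus its two bind values.
final class OraclePage {
    static final String SQL =
            "SELECT lots_of_columns FROM some_view "
          + "ORDER BY selected_column, id "          // id assumed unique, as a tie-breaker
          + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";  // Oracle 12c+ row-limiting clause

    // Returns {offset, pageSize}; the product is computed in long to avoid int overflow.
    static long[] binds(int pageIx, int pageSize) {
        if (pageIx < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIx must be >= 0 and pageSize > 0");
        }
        return new long[] { (long) pageIx * pageSize, pageSize };
    }

    public static void main(String[] args) {
        long[] b = binds(3, 50);
        System.out.println(b[0] + ", " + b[1]); // 150, 50
    }
}
```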

Is it a good idea to store and access an active query result set in ColdFusion vs re-querying the database?

I have a product search engine using ColdFusion 8 and MySQL 5.0.88.
The product search has two display modes: Multiple View and Single View.
Multiple displays basic record info, Single requires additional data to be polled from the database.
Right now a user does a search and I'm polling the database for
(a) total records and
(b) records FROM to TO.
The user always goes to Single view from his current result set, so my idea was to store the current result set for each user and avoid querying the database again to get (waste a) the overall number of records and (waste b) the single record I already queried before, before getting the detail information I still need for the Single view.
However, I'm getting nowhere with this.
I cannot cache the current resultset-query, because it's unique to each user(session).
The queries are running inside a CFINVOKED method inside a CFC I'm calling through AJAX, so the whole query runs and afterwards the CFC and CFINVOKE method are discarded, so I can't use query of query or variables.cfc_storage.
So my idea was to store the current result set in the Session scope, which will be updated with every new search the user runs (either pagination or a completely new search). The maximum number of results stored will be the number of results displayed.
I can store the query all right, using:
<cfset Session.resultset = query_name>
This stores the whole query with results, like so:
query
CACHED: false
EXECUTIONTIME: 2031
SQL: SELECT a.*, p.ek, p.vk, p.x, p.y
FROM arts a
LEFT JOIN p ON
...
LEFT JOIN f ON
...
WHERE a.aktiv = "ja"
AND
... 20 conditions ...
SQLPARAMETERS: [array]
1) ... 20+ parameters
RESULTSET:
[Record # 1]
a: true
style: 402
price: 2.3
currency: CHF
...
[Record # 2]
a: true
style: 402abc
...
This would be overwritten every time a user does a new search. However, if a user wants to see the details of one of these items, I don't need to query (total number of records & get one record) if I can access the record I need from my temp storage. This way I would save two database trips of about 2031 ms execution time each to get data which I already pulled before.
The tradeoff would be every user having a result set of up to 48 results (the maximum number of items per page) in the Session scope.
My questions:
1. Is this feasible, or should I re-query the database?
2. If I have a structure/array/object like the above, how do I pick the record I need out of it by style number, i.e. how do I access the result set? I can't just loop over the stored query (I've tried this for a while now...).
Thanks for help!
KISS rule. Just re-query the database unless you find that performance is really an issue. With the correct index, it should scale pretty well. When it is an issue, you can simply add a query cache.
QoQ would introduce overhead (on the CF side, memory & computation), and might return stale data (when the query in the session is older than the data in the DB). I only use QoQ when the same query is used on the same view, not throughout a session's time span.
Feasible? Yes, depending on how many users and how much data this stores in memory, it's probably much better than going to the DB again.
It seems like the best way to get the single record you want is a query of queries. In CF you can create another query that uses an existing query as its data source. It would look like this:
<cfquery name="subQuery" dbtype="query">
SELECT *
FROM Session.resultset
WHERE style = <cfqueryparam value="#SelectedStyleVariable#" cfsqltype="cf_sql_varchar">
</cfquery>
Note that if you are using CFBuilder, it will probably flag an error for the missing datasource; this is a bug in CFBuilder, and you are not required to have a datasource if your dbtype is "query".
Depending on how many records, what I would do is have the detail data stored in application scope as a structure where the ID is the key. Something like:
APPLICATION.products[product_id].product_name
.product_price
.product_attribute
Then you would really only need to query for the ID of the item on demand.
And to improve the "on demand" query, you have at least two "in code" options:
1. A query of query, where you query the entire collection of items once, and then query from that for the data you need.
2. Verity or SOLR to index everything and then you'd only have to query for everything when refreshing your search collection. That would be tons faster than doing all the joins for every single query.

How to optimize an Oracle ORDER BY native query running in EJB (OC4J)?

I have the following problem with EJB and an Oracle database.
I have a native SQL query deployed in OC4J that returns more than 21k rows from the Oracle DB. When I run the query against the Oracle DB I get a JOOM (out of memory) exception.
Because the requirements included pagination of the result set, we decided to use query.setMaxResults() and query.setFirstResult() to return only 10 rows at a time.
Using the EntityManager to implement the pagination caused a problem: later, it was required to sort the result, but the whole result, not just the 10 rows returned by setMaxResults()! We found that putting an ORDER BY xxxx clause in the native query made the query performance far too bad.
So, we are considering doing the pagination in the database layer (using Oracle rownum or another technique).
Later, we realized that if we use em.clear() we might be able to avoid the JOOM exception by doing something like:
define the result list
while database has more records
{
use entityManager get next 10 records and add them to the result list
entityManager.clear();
}
return result list
So we could implement the paging on the servlet side (using session.getAttribute("all_result").subList(from, to)) and thus do the sorting in Java instead of in SQL.
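A minimal sketch of that servlet-side approach, assuming the full result list already fits in memory (which is what the `em.clear()` loop aims for): sort with a `Comparator`, then cut one page with `subList`, clamping the bounds so the last page does not throw `IndexOutOfBoundsException`. The `Row` class and its fields are hypothetical stand-ins for the query's result type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sorts the whole in-memory result in Java, then returns one page of it.
final class InMemoryPaging {

    // Hypothetical row type standing in for the native query's result.
    static final class Row {
        final long id;
        final String name;
        Row(long id, String name) { this.id = id; this.name = name; }
        @Override public String toString() { return id + ":" + name; }
    }

    static <T> List<T> page(List<T> all, Comparator<? super T> order, int pageIx, int pageSize) {
        if (pageIx < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIx must be >= 0 and pageSize > 0");
        }
        List<T> sorted = new ArrayList<>(all); // copy so the cached list is not reordered
        sorted.sort(order);
        int from = Math.min(pageIx * pageSize, sorted.size()); // clamp past-the-end pages
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to)); // detach the page from the copy
    }

    public static void main(String[] args) {
        List<Row> all = List.of(new Row(3, "c"), new Row(1, "a"), new Row(2, "b"));
        Comparator<Row> byName = Comparator.comparing(r -> r.name);
        System.out.println(page(all, byName, 0, 2)); // [1:a, 2:b]
        System.out.println(page(all, byName, 1, 2)); // [3:c]
    }
}
```

Sorting on every page request keeps the sketch short; if the sort order rarely changes, sorting once when the list is stored in the session would avoid repeating that work.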
The following pagination code might help you:
TypedQuery<Entity> query = em.createQuery("Select o from Entity o where o.id > :lastId order by o.id", Entity.class);
query.setParameter("lastId", previousResult.get(previousResult.size()-1).getId());
query.setMaxResults(10);
List<Entity> nextPage = query.getResultList();
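This answer uses keyset (seek) pagination: instead of an offset, each page asks for rows whose `id` is greater than the last `id` already returned. A minimal in-memory sketch of the same loop, assuming ids are unique; `fetchAfter` is a hypothetical stand-in for the JPQL query with `setMaxResults`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// In-memory stand-in for "where o.id > :lastId order by o.id" plus setMaxResults.
final class KeysetPaging {

    // Hypothetical replacement for the query: ids above lastId, ascending, at most max.
    static List<Long> fetchAfter(List<Long> table, long lastId, int max) {
        return table.stream()
                .filter(id -> id > lastId)
                .sorted()
                .limit(max)
                .collect(Collectors.toList());
    }

    // Reads everything page by page, seeding lastId from the previous page's last id.
    static List<List<Long>> readAll(List<Long> table, int pageSize) {
        List<List<Long>> pages = new ArrayList<>();
        long lastId = Long.MIN_VALUE; // below any real id, so the first page starts at the beginning
        while (true) {
            List<Long> page = fetchAfter(table, lastId, pageSize);
            if (page.isEmpty()) {
                return pages;
            }
            pages.add(page);
            lastId = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(List.of(5L, 1L, 9L, 3L, 7L), 2)); // [[1, 3], [5, 7], [9]]
    }
}
```

Against a real database, the `id > :lastId` predicate lets an index on `id` jump straight to the next page instead of skipping all previous rows, and each page only holds `pageSize` rows in memory.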

What is the best way to integrate Solr as an index with Oracle as a storage DB?

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:
select * from data_table where id in ([solr query results for 'search string']);
However, one key issue arises:
Oracle WILL NOT allow more than 1000 items in the list of an IN clause (a BIG DEAL, as the list of objects I find is very often > 1000 and will usually be around 50-200k items)
I have tried to work around this using a "split" function that takes a string of comma-separated values and breaks it down into array items, but then I hit the 4000-char limit on the function parameter in SQL (PL/SQL allows 32k chars, but that's still WAY too limiting for 80,000+ results in some cases)
I am also hitting performance issues using WHERE IN (....); I am told that this causes a very slow query, even when the referenced field is indexed?
I've tried chaining "OR"s to get around the 1000-item limit (aka: id in (1...1000) or id in (1001...2000) or id in (2001...3000)), and this works, but is very slow.
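For illustration, the OR-chaining described above can be generated rather than written by hand. A minimal sketch, assuming the ids are bound as parameters in order (one `?` per id); `id` comes from the question and the helper is hypothetical. As the question notes, huge lists built this way are slow, so this mainly shows where Oracle's 1000-expression boundary (ORA-01795) falls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Builds "(id IN (?, ...) OR id IN (?, ...))" with at most 1000 placeholders
// per IN list, the per-list limit Oracle enforces (ORA-01795).
final class ChunkedInClause {
    static final int ORACLE_IN_LIMIT = 1000;

    static String inClause(String column, int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("need at least one id");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < idCount; start += ORACLE_IN_LIMIT) {
            int size = Math.min(ORACLE_IN_LIMIT, idCount - start); // last chunk may be shorter
            parts.add(column + " IN (" + String.join(", ", Collections.nCopies(size, "?")) + ")");
        }
        return "(" + String.join(" OR ", parts) + ")";
    }

    public static void main(String[] args) {
        String where = inClause("id", 2500);
        System.out.println(where.split(" OR ").length); // 3 IN lists: 1000 + 1000 + 500
    }
}
```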
I am thinking that I should load the Solr client JARs into Oracle and write an Oracle function in Java that calls Solr and pipelines back the results as a list, so that I can do something like:
select * from data_table where id in (select * from table(runSolrQuery('my query text')));
This is proving quite hard, and I am not sure it's even possible.
Things that I can't do:
1. Store full data in Solr (security + storage limits)
2. Use Solr as controller of pagination and ordering (this is why I am fetching data from the DB)
So I have to cook up a hybrid approach where Solr really acts as the full-text search provider for Oracle. Help! Has anyone faced this?
Check this out:
http://demo.scotas.com/search-sqlconsole.php
This product seems to do exactly what you need.
cheers
I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.
Dominic Brooks has an example of using object collections with JDBC. You would do something like
Create a couple of types in Oracle
CREATE TYPE data_table_id_typ AS OBJECT (
id NUMBER
);
CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;
In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement
SELECT *
FROM data_table
WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
Instead of using a long BooleanQuery, you can use TermsFilter (it works like RangeFilter, but the items don't have to be in sequence).
Like this (first fill your TermsFilter with terms):
TermsFilter termsFilter = new TermsFilter();
// Loop through terms and add them to filter
Term term = new Term("<field-name>", "<query>");
termsFilter.addTerm(term);
then search the index like this:
DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000);
where searcher is a SolrIndexSearcher (see the Javadoc for more info on the getDocList method):
http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
Two solutions come to mind.
First, look into using the Oracle-specific Java extensions to JDBC. They allow you to pass in an actual array/list as an argument. You may need to create a stored proc (it has been a while since I had to do this), but if this is a focused use case, it shouldn't be overly burdensome.
Second, if you are still running into a boundary like the 1000-object limit, consider using the "rows" setting when querying Solr and leveraging its inherent pagination feature.
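A minimal sketch of paging through Solr results with the `start` and `rows` request parameters, so that no single request returns more ids than you can handle downstream. `fetchPage` is a hypothetical stand-in for the actual Solr request, simulated here with an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Pages through a result set using Solr-style start/rows offsets.
final class SolrPaging {

    // fetchPage.apply(start, rows) is a hypothetical stand-in for one Solr request.
    static List<Long> fetchAll(BiFunction<Integer, Integer, List<Long>> fetchPage, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be > 0");
        }
        List<Long> all = new ArrayList<>();
        for (int start = 0; ; start += rows) {
            List<Long> page = fetchPage.apply(start, rows);
            all.addAll(page);
            if (page.size() < rows) {
                return all; // a short (or empty) page means nothing more to fetch
            }
        }
    }

    public static void main(String[] args) {
        List<Long> index = List.of(10L, 20L, 30L, 40L, 50L); // simulated matching ids
        BiFunction<Integer, Integer, List<Long>> fake = (start, rows) ->
                index.subList(Math.min(start, index.size()), Math.min(start + rows, index.size()));
        System.out.println(fetchAll(fake, 2)); // [10, 20, 30, 40, 50]
    }
}
```

Instead of accumulating everything, each page of at most `rows` ids could be sent to Oracle as its own query, keeping every IN list below the 1000-item limit when `rows` is 1000 or less.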
I've used this bulk fetching method with stored procs to fetch large quantities of data which needed to be put into Solr. Involve your DBA. If you have a good one, and use the Oracle specific extensions, I think you should attain very reasonable performance.
