EclipseLink with Oracle: "limit by rownum" does not use index

We're facing performance issues with EclipseLink 2.7.7 when accessing Oracle 12.1 tables with paging. Investigation showed that Oracle does not use its indexes for the paged queries EclipseLink generates.
I've extracted the sql sent to the database and was able to reproduce the issue using a database tool (DataGrip).
Example:
-- #1: without paging
SELECT col1 AS a1, col2 AS a2, col3 AS a3, ...
FROM <TABLE>
WHERE colN > to_timestamp('2021-12-08', 'yyyy-mm-dd')
ORDER BY col1 DESC;
Explain plan shows that the index on colN is used. Fine.
When the same query is executed with paging, the original query is wrapped in two subselects:
-- #2 with EclipseLink paging
SELECT * FROM (
SELECT a.*, ROWNUM rnum FROM (
SELECT col1 AS a1, col2 AS a2, col3 AS a3, ...
FROM <TABLE>
WHERE colN > to_timestamp('2021-12-08', 'yyyy-mm-dd')
ORDER BY col1 DESC
) a WHERE ROWNUM <= 100
) WHERE rnum > 0;
For this query, the explain plan shows that the index on colN is not used.
As a result, querying a table with millions of rows takes 50-90 seconds (depending on the hardware).
Side note: on my test database, this query returns 0 records since colN values are before 2021-12-08.
Oracle 12c introduced the OFFSET/FETCH syntax:
-- #3
SELECT col1 AS a1, col2 AS a2, col3 AS a3, ...
FROM <TABLE>
WHERE colN > to_timestamp('2021-12-08', 'yyyy-mm-dd')
ORDER BY col1 DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY;
Using this syntax, indexes are at least sometimes used as expected. When they are used, execution time is below 1s which is acceptable.
However, I could not figure out how to convince EclipseLink to use this syntax.
If ORDER BY col1 DESC is removed from the original paged query (#2), the index is used and the query returns fast enough. However, it no longer returns the desired records, so that does not help.
How can I implement performant paged queries using EclipseLink and Oracle 12?
How can I force oracle to use the index on colN when using paging and order by?

The OraclePlatform printSQLSelectStatement method is responsible for building the query used, nesting the queries to use rownum for the query you've seen. To use a new form, you would extend one of the OraclePlatform classes you are using (maybe Oracle12Platform) and override that method to append the syntax you want instead. Something like:
@Override
public void printSQLSelectStatement(DatabaseCall call, ExpressionSQLPrinter printer, SQLSelectStatement statement) {
int max = 0;
int firstRow = 0;
ReadQuery query = statement.getQuery();
if (query != null) {
max = query.getMaxRows();
firstRow = query.getFirstResult();
}
if (!(this.shouldUseRownumFiltering()) || (!(max > 0) && !(firstRow > 0))) {
super.printSQLSelectStatement(call, printer, statement);
return;
}
call.setFields(statement.printSQL(printer));
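// OFFSET takes the first row to skip; FETCH NEXT takes the number of rows to return.
// Note: MAXROW_FIELD must then hold the row count, not the last row number.
printer.printString(" OFFSET ");
printer.printParameter(DatabaseCall.FIRSTRESULT_FIELD);
printer.printString(" ROWS FETCH NEXT ");
printer.printParameter(DatabaseCall.MAXROW_FIELD);
printer.printString(" ROWS ONLY");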
call.setIgnoreFirstRowSetting(true);
call.setIgnoreMaxResultsSetting(true);
}
You would then specify your custom OraclePlatform class using a persistence unit property:
<property name="eclipselink.target-database" value="my.package.MyOracle12Platform"/>
If something like that works for you, please submit it as an enhancement request - though you might want to work some way to use the old behaviour into it, as the performance differences you've experienced might depend on the query/data involved.
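One way to keep the old behaviour available, as suggested above, is to make the paging form switchable. The following is a minimal, standalone sketch in plain Java that only builds the two SQL shapes from the question (#2 and #3) as strings. It is not EclipseLink API: the class name PagingSql, its method names, and the boolean switch are all hypothetical. In a real platform subclass the same decision would presumably live inside printSQLSelectStatement, delegating to super.printSQLSelectStatement for the ROWNUM form.

```java
// Hypothetical helper, not part of EclipseLink: builds both paging shapes as plain strings.
final class PagingSql {
    private PagingSql() {}

    /** Legacy form: nested subselects filtered by ROWNUM (query #2 in the question). */
    public static String rownumForm(String innerSql, int firstRow, int pageSize) {
        // The inner ROWNUM bound is the last row number, not the page size.
        int lastRow = firstRow + pageSize;
        return "SELECT * FROM (SELECT a.*, ROWNUM rnum FROM (" + innerSql
                + ") a WHERE ROWNUM <= " + lastRow + ") WHERE rnum > " + firstRow;
    }

    /** Oracle 12c form: row-limiting clause appended to the ordered query (query #3). */
    public static String offsetFetchForm(String innerSql, int firstRow, int pageSize) {
        return innerSql + " OFFSET " + firstRow + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";
    }

    /** Picks one of the two forms; the switch could come from configuration. */
    public static String page(String innerSql, int firstRow, int pageSize, boolean useOffsetFetch) {
        return useOffsetFetch
                ? offsetFetchForm(innerSql, firstRow, pageSize)
                : rownumForm(innerSql, firstRow, pageSize);
    }

    public static void main(String[] args) {
        String inner = "SELECT col1 AS a1 FROM t WHERE colN > :since ORDER BY col1 DESC";
        System.out.println(page(inner, 0, 100, false));
        System.out.println(page(inner, 0, 100, true));
    }
}
```

Running both generated statements through EXPLAIN PLAN for the queries that matter would show which form the optimizer handles better for a given table and data distribution.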

Thanks to @Chris, I came up with the following Oracle12Platform. This solution currently ignores "Bug #453208 - Pessimistic locking with query row limits does not work on Oracle DB" (see OraclePlatform.printSQLSelectStatement for details):
public class Oracle12Platform extends org.eclipse.persistence.platform.database.Oracle12Platform {
/**
* the oracle 12c `OFFSET x ROWS FETCH NEXT y ROWS ONLY` requires `maxRows` to return the row count
*/
@Override
public int computeMaxRowsForSQL(final int firstResultIndex, final int maxResults) {
return maxResults - Math.max(firstResultIndex, 0);
}
@Override
public void printSQLSelectStatement(final DatabaseCall call, final ExpressionSQLPrinter printer, final SQLSelectStatement statement) {
int max = 0;
int firstRow = 0;
final ReadQuery query = statement.getQuery();
if (query != null) {
max = query.getMaxRows();
firstRow = query.getFirstResult();
}
if (!(this.shouldUseRownumFiltering()) || (!(max > 0) && !(firstRow > 0))) {
super.printSQLSelectStatement(call, printer, statement);
} else {
statement.setUseUniqueFieldAliases(true);
call.setFields(statement.printSQL(printer));
if (firstRow > 0) {
printer.printString(" OFFSET ");
printer.printParameter(DatabaseCall.FIRSTRESULT_FIELD);
printer.printString(" ROWS");
call.setIgnoreFirstRowSetting(true);
}
if (max > 0) {
printer.printString(" FETCH NEXT ");
printer.printParameter(DatabaseCall.MAXROW_FIELD); //see #computeMaxRowsForSQL
printer.printString(" ROWS ONLY");
call.setIgnoreMaxResultsSetting(true);
}
}
}
}
I had to override computeMaxRowsForSQL so that printer.printParameter(DatabaseCall.MAXROW_FIELD) emits the row count instead of the last row number ("lastRowNum").
The method also tries to handle the case where only one of firstRow and maxResults is set.
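The arithmetic behind that override can be checked in isolation. The sketch below is plain Java with no EclipseLink dependency; the class RowLimitMath and its methods are hypothetical names used only for illustration. It assumes, as described above, that the incoming maximum is the last row number (first row plus page size) and shows how it maps to the OFFSET and FETCH NEXT values, including the cases where only one of the two limits is set.

```java
// Hypothetical, standalone illustration of the row-limit arithmetic; not EclipseLink code.
final class RowLimitMath {
    private RowLimitMath() {}

    /** Same formula as the overridden computeMaxRowsForSQL: last row number -> row count. */
    public static int rowCount(int firstResultIndex, int lastRowNum) {
        return lastRowNum - Math.max(firstResultIndex, 0);
    }

    /** Builds the row-limiting clause, omitting whichever part is not set. */
    public static String limitClause(int firstRow, int lastRowNum) {
        StringBuilder sql = new StringBuilder();
        if (firstRow > 0) {
            sql.append(" OFFSET ").append(firstRow).append(" ROWS");
        }
        if (lastRowNum > 0) {
            sql.append(" FETCH NEXT ").append(rowCount(firstRow, lastRowNum)).append(" ROWS ONLY");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Third page of 100 rows: skip 200, last row number is 300.
        System.out.println(limitClause(200, 300));
        // First page only: no OFFSET part is emitted.
        System.out.println(limitClause(0, 100));
    }
}
```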

Related

Why isn't it using the index?

Hello kind people of the internet.
I am wrecking my head trying to figure out why the optimiser isn't using my index for my query on Amazon Aurora. The query is dynamically created based on a report users have created through an applications UI, so I can't change the query per se.
The query uses these qualifiers
WHERE
table_in_question.deleted = 0
ORDER BY
table_in_question.date_modified DESC,
table_in_question.id DESC
I have an index, "my_index", which indexes these specific fields (table_in_question fields deleted, date_modified, ID) but MySQL doesn't use it.
The query takes approx 1200 ms to run. If I add FORCE INDEX (my_index) it takes about 120ms. Arguably about 10x faster - but unless I use force index, it doesn't use it.
Around 1 million rows are returned according to EXPLAIN, so I don't think the index is being skipped because too few rows are returned.
The full query is
SELECT
case when some_table.id IS NOT NULL then some_table.id else "" end my_favorite,
table_in_question.date_entered,
table_in_question.name,
table_in_question.description,
table_in_question.pr_is_read,
table_in_question.pr_is_approved,
table_in_question.parent_type,
table_in_question.parent_id,
table_in_question.id,
table_in_question.date_modified,
table_in_question.assigned_user_id,
table_in_question.created_by
FROM
table_in_question
INNER JOIN (
SELECT
tst.team_user_is_member_of
FROM
team_sets_teams tst
INNER JOIN team_memberships team_membershipstable_in_question ON (
team_membershipstable_in_question.team_id = tst.team_id
)
AND (team_membershipstable_in_question.user_id = 'UUID')
AND (team_membershipstable_in_question.deleted = 0)
GROUP BY
tst.team_user_is_member_of
) table_in_question_tf ON table_in_question_tf.team_user_is_member_of = table_in_question.team_user_is_member_of
LEFT JOIN systemfavourites sf_table_in_question ON (sf_table_in_question.module = 'table_in_question')
AND (sf_table_in_question.record_id = table_in_question.id)
AND (sf_table_in_question.assigned_user_id = 'UUID')
AND (sf_table_in_question.deleted = '0')
INNER JOIN opportunities jt1_table_in_question ON (table_in_question.opportunity_id = jt1_table_in_question.id)
AND (jt1_table_in_question.deleted = 0)
LEFT JOIN another_table jt1_table_in_question_cstm ON jt1_table_in_question_cstm.id_c = jt1_table_in_question.id
LEFT JOIN systemfavourites table_in_question_favorite ON (table_in_question.id = table_in_question_favorite.record_id)
AND (table_in_question_favorite.deleted = '0')
AND (table_in_question_favorite.module = 'table_in_question')
AND (table_in_question_favorite.created_by = 'UUID')
LEFT JOIN users some_table ON (
some_table.id = table_in_question_favorite.modified_user_id
)
AND (some_table.deleted = 0)
WHERE
table_in_question.deleted = 0
ORDER BY
table_in_question.date_modified DESC,
table_in_question.id DESC
;
EXPLAIN shows this
id: 1
select_type: PRIMARY
table: table_in_question
partitions: (empty)
type: ALL
possible_keys: idx_table_in_question_tmst_id
key: (empty)
key_len: (empty)
ref: (empty)
rows: 968234
filtered: 10.0
Extra: Using where; Using temporary; Using filesort
Can anyone help explain how I make an index it will actually use by default?
Thanks.

Hibernate returning two columns instead of one when using setFirstResult & setMaxResults with Oracle

I have a query that selects a single column, and I am executing the query in batches using the setFirstResult & setMaxResults methods of SQLQuery.
Snippet:
SQLQuery query = <query object with query projecting a single column>;
int maxResults = 50;
int batchSize = 50;
for (int i = 0; ; i++) {
query.setFirstResult(batchSize*i);
query.setMaxResults(maxResults);
List resultSet = query.list();
if(resultSet.isEmpty())
break;
//process result set
}
I set to true the showSQL parameter in hibernate config to see the query string that hibernate produces. For the first batch, i.e. when i=0 below is the query that hibernate generates:
select * from (/* query selecting single column here */) where rownum <= ?;
which makes sense, since it's the first batch: we want results starting from the first row, and rownum is used to restrict the number of results to maxResults.
Now for the second and subsequent batch reads, the query hibernate generates is:
select * from ( select row_.*, rownum rownum_ from (/*query selecting single column here */) row_ where rownum <= ?) where rownum_ > ?;
and you can clearly see, the above query is selecting two columns, one being the row number itself.
So when my query only selects one column, hibernate's version of the query is selecting two.
Is this a known issue? Can I do something different, or am I doing something wrong?
I don't want to cast the result set into two different types before using/processing it.
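If the extra rownum_ column cannot be avoided, one workaround is to normalize each row before processing it. The sketch below is plain Java and does not call Hibernate; the class FirstColumn and its methods are hypothetical names. It assumes what the generated SQL above implies: on the first batch each row is a single value, and on later batches each row is an Object[] whose first element is the selected column and whose second is the row number. Declaring the expected column explicitly with SQLQuery.addScalar may be another option worth checking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: hides the difference between scalar rows and Object[] rows.
final class FirstColumn {
    private FirstColumn() {}

    /** Returns the row itself if it is a scalar, or its first element if it is an array. */
    public static Object of(Object row) {
        if (row instanceof Object[]) {
            Object[] columns = (Object[]) row;
            return columns.length > 0 ? columns[0] : null;
        }
        return row;
    }

    /** Applies of() to every row of a result list such as the one returned by query.list(). */
    public static List<Object> all(List<?> rows) {
        List<Object> values = new ArrayList<>(rows.size());
        for (Object row : rows) {
            values.add(of(row));
        }
        return values;
    }
}
```

Inside the batch loop, calling FirstColumn.all(query.list()) would then yield the same shape for every batch.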

Optimizing database trips for data retrieval

For multiple data insertion we have an efficient way: RecordSortedList
RecordSortedList rsl;
MyTable myTable;
;
rsl = new RecordSortedList(myTable.tableid);
rsl.sortOrder(fieldname2id(myTable.tableId,'RecId'));
myTable.field1 = 'Value1';
rsl.ins(myTable);
myTable.field1 = 'Value2';
rsl.ins(myTable);
rsl.insertDatabase();
Is the same possible for multiple records retrieval from db in one go? Something like
int i =1;
while(i<10000)
{
//enter records from db into a buffer in db
i++;
}
//now bring the buffer from db in a single trip
//and do the data manipulation in AX
My intention is to optimize the db trip to the least.
Please Suggest.
Yes, it's called RecordLinkList - http://msdn.microsoft.com/en-us/library/aa643250(v=ax.50).aspx
A recordLinkList is a double linked list that can hold records of
different types at the same time. It is not keyed or sorted.
The recordLinkList is particularly useful for passing records from
different tables as a parameter instead of retrieving the same records
again.
There is no limit to the size of a recordSortedList; it is the
responsibility of the programmer to control its size and, therefore,
memory consumption.
You can also add different types of records.
static void RecordLinkList(Args _args)
{
RecordLinkList rll = new RecordLinkList();
SalesTable salesTable;
CustTable custTable;
InventTrans inventTrans;
Address address;
boolean iterate;
;
select firstonly salesTable;
select firstonly custTable;
select firstonly inventTrans;
select firstonly address;
rll.ins(salesTable);
rll.ins(custTable);
rll.ins(inventTrans);
rll.ins(address);
iterate = rll.first();
while (iterate)
{
switch (rll.fileId()) // FileId == TableId
{
case tablenum(SalesTable):
salesTable = rll.peek();
info("SalesTable");
break;
case tablenum(CustTable):
custTable = rll.peek();
info("CustTable");
break;
case tablenum(InventTrans):
inventTrans = rll.peek();
info("InventTrans");
break;
default:
error(strfmt("Table %1 (%2) not expected", tableid2name(rll.fileId()), rll.fileId()));
}
iterate = rll.next();
}
info("Done");
}
As documented, the insertDatabase method "inserts multiple records on a single trip to the database" (use the RecordInsertList class instead of RecordSortedList if you do not need the sorted order).
However, this is mostly from the programmer's perspective. On the SQL side, the operation looks like this:
INSERT INTO MyTable ( Column1, Column2 )
VALUES ( Value1, Value2 ),
( Value1, Value2 ), ...
There are limits to the number of records inserted this way, so the AX kernel may split the list to make several calls to the SQL server.
The other way from DB to AX is easy:
while select myTable where ...
Which is translated to SQL as:
SELECT T1.Column1, T1.Column2 FROM MyTable T1 WHERE...
This transports the data from the table to AX as efficiently as possible.
You may choose to use a QueryRun object instead, but the call to SQL stays the same.
If you do simple updates on the table, consider using update_recordset, as this may move the updates to the SQL server and eliminate the round trip.

LINQ execute SQL query with output parameter

I need to execute SQL query with output parameter.
For example,
SELECT @Count = COUNT(*) FROM dbo.SomeTable
SELECT * FROM SomeTable WHERE Id BETWEEN 1 AND 10
After querying, I need to know the @Count value.
How can I do it with LINQ without using a stored procedure?
Thank you.
int value = yourDB.SomeTable.Count(q=>q.id >=1 && q.id <= 10);
linq is pretty easy :)
edit: so you want 2 items, the count, and then a limited part of the array.
List<SomeTable> li = yourDB.SomeTable.ToList();
int number = li.Count;
List<SomeTable> partial = li.GetRange(0, 10);
or
int value = yourDB.SomeTable.Count();
List<SomeTable> partial = yourDB.SomeTable.ToList().GetRange(0, 10);
so the best LINQ thing for paging is:
List<SomeTable> partial = yourDB.SomeTable.OrderBy(q=>q.id).Skip(0).Take(10).ToList();

Optimizing a LINQ to SQL query

I have a query that looks like this:
public IList<Post> FetchLatestOrders(int pageIndex, int recordCount)
{
DatabaseDataContext db = new DatabaseDataContext();
return (from o in db.Orders
orderby o.CreatedDate descending
select o)
.Skip(pageIndex * recordCount)
.Take(recordCount)
.ToList();
}
I need to print the information of the order and the user who created it:
foreach (var o in FetchLatestOrders(0, 10))
{
Console.WriteLine("{0} {1}", o.Code, o.Customer.Name);
}
This produces one SQL query to fetch the orders and one query per order to fetch the customer. Is it possible to optimize the query so that it fetches the orders and their customers in one SQL query?
Thanks
UPDATE: At sirrocco's suggestion I changed the query like this, and it works. Only one select query is generated:
public IList<Post> FetchLatestOrders(int pageIndex, int recordCount)
{
var options = new DataLoadOptions();
options.LoadWith<Post>(o => o.Customer);
using (var db = new DatabaseDataContext())
{
db.LoadOptions = options;
return (from o in db.Orders
orderby o.CreatedDate descending
select o)
.Skip(pageIndex * recordCount)
.Take(recordCount)
.ToList();
}
}
Thanks sirrocco.
Something else you can do is eager loading. In LINQ to SQL you can use LoadOptions.
One VERY weird thing about L2S is that you can set LoadOptions only before the first query is sent to the Database.
you might want to look into using compiled queries
have a look at http://www.3devs.com/?p=3
Given a LINQ statement like:
context.Cars
.OrderBy(x => x.Id)
.Skip(50000)
.Take(1000)
.ToList();
This roughly gets translated into:
select * from [Cars] order by [Cars].[Id] asc offset 50000 rows fetch next 1000 rows only
Because offset and fetch are extensions of order by, they are not executed until after the select portion runs. This means an expensive select with lots of join statements is executed on the whole dataset ([Cars]) before the fetched results are obtained.
Optimize the statement
All that is needed is taking the OrderBy, Skip, and Take statements and putting them into a Where-clause:
context.Cars
.Where(x => context.Cars.OrderBy(y => y.Id).Select(y => y.Id).Skip(50000).Take(1000).Contains(x.Id))
.ToList();
This roughly gets translated into:
exec sp_executesql N'
select * from [Cars]
where exists
(select 1 from
(select [Cars].[Id] from [Cars] order by [Cars].[Id] asc offset @p__linq__0 rows fetch next @p__linq__1 rows only
) as [Limit1]
where [Limit1].[Id] = [Cars].[Id]
)
order by [Cars].[Id] asc',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=50000,@p__linq__1=1000
So now, the outer select-statement only executes on the filtered dataset based on the where exists-clause!
Again, your mileage may vary on how much query time is saved by making the change. General rule of thumb is the more complex your select-statement and the deeper into the dataset you want to go, the more this optimization will help.
