Does "OFFSET" in Oracle read all the data from the start? - oracle

I know that LIMIT 20 OFFSET 10000 in MySQL reads all rows from the first up to row 10020 and then discards the first 10000.
Does OFFSET 10000 ROWS FETCH NEXT 20 ROWS ONLY in Oracle work the same way?

No, but of course it must read as many rows as are needed to satisfy the ordering,
e.g.
select * from emp
order by empno
reads every row so they can be sorted, and
select * from emp
where deptno = 10
order by empno
reads every row for department 10 so that they can be sorted.
(There are a few special cases where we can skip that with indexes etc., but that's a separate discussion.)
Coming back to FETCH FIRST / OFFSET, in effect we are amending the query to satisfy that need:
select ...
from mytable
order by col
offset 100 rows fetch next 20 rows only
becomes
select ..., row_number() over ( order by col) as r
from mytable
because then we can do
select *
from ( select ..., row_number() over ( order by col) as r
from mytable
)
where r > 100 and r <= 120
to match your FETCH/OFFSET needs.
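The equivalence described above can be checked on a small scale. The following is only a sketch: it uses Python's bundled sqlite3 module as a stand-in for Oracle (SQLite spells the row-limiting clause LIMIT/OFFSET and needs version 3.25 or later for window functions), and the 500-row contents of mytable are made up for illustration. An offset of 100 with a fetch of 20 corresponds to row numbers 101 through 120.

```python
import sqlite3

# Hypothetical data: 500 rows in a table named like the one in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1, 501)])

# Row-limiting form (SQLite syntax for "offset 100, fetch 20").
paged = conn.execute(
    "SELECT col FROM mytable ORDER BY col LIMIT 20 OFFSET 100"
).fetchall()

# ROW_NUMBER() form: skip row numbers 1..100, keep 101..120.
numbered = conn.execute(
    """
    SELECT col
    FROM (SELECT col, ROW_NUMBER() OVER (ORDER BY col) AS r FROM mytable)
    WHERE r > 100 AND r <= 120
    ORDER BY r
    """
).fetchall()

print(paged == numbered)
```

Both queries return the same 20 rows here; this says nothing about how Oracle's optimizer actually executes either form.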

Related

I want to get the last available value in a previous row in MySQL8 in a calculated column

I have a table in a MySQL database with this data:
id  date        close  previous_close
1   07-10-2020  200    300
2   06-10-2020  300    1000
3   05-10-2020  0      1000
4   04-10-2020  1000   15
I've had a look at using the lag() function but can't get my head round it. How can I craft a query such that the calculated column previous_close obtains the most recently available value in the close column where it's not zero?
Here is one way:
WITH cte AS (
SELECT *,
MAX(CASE WHEN close > 0 THEN date END) OVER
(ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_date
FROM yourTable
)
SELECT
t1.id,
t1.date,
t1.close,
t2.close AS previous_close
FROM cte t1
LEFT JOIN yourTable t2
ON t2.date = t1.last_date
ORDER BY t1.date DESC;
The strategy here is that the CTE finds, for each row, the latest date with a non-zero close occurring strictly before that row, as sorted by date. Then all we have to do is join this CTE back to your original table to find the close value on that date.
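To see the strategy at work, here is a rough sketch that runs the same query shape through Python's sqlite3 module (SQLite 3.25+ assumed, standing in for MySQL 8). The column names trade_date and close_price are renamed stand-ins for date and close, and the dates are written in ISO format so that text ordering matches chronological ordering; both are assumptions made for this demo only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER, trade_date TEXT, close_price INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "2020-10-07", 200),
        (2, "2020-10-06", 300),
        (3, "2020-10-05", 0),
        (4, "2020-10-04", 1000),
    ],
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT id, trade_date, close_price,
               -- latest earlier date whose close is non-zero
               MAX(CASE WHEN close_price > 0 THEN trade_date END) OVER (
                   ORDER BY trade_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS last_date
        FROM yourTable
    )
    SELECT t1.id, t1.trade_date, t1.close_price, t2.close_price AS previous_close
    FROM cte t1
    LEFT JOIN yourTable t2 ON t2.trade_date = t1.last_date
    ORDER BY t1.trade_date DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The oldest row has no earlier non-zero close in this four-row sample, so its previous_close comes back as NULL (None in Python); the value 15 in the question presumably comes from older rows that are not shown. The join also assumes one row per date.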

Oracle query to obtain batches of rows

So here is my problem: I need to get batches of rows (select statements) for a migration to another database (other than Oracle).
Suggested solution: I take batches of rows (using rowid maybe?), for example:
batch1: 0-10000,
batch2: 10000 - 20000,
batchn: 10000(n) - 10000(n+1)
So what should my query be?
batch1: select * from table_name where rownum >= 0 and rownum < 10000,
batch2: select * from table_name where rownum >= 10000 and rownum < 20000,
batch n: select * from table_name where rownum >= 10000*n and rownum < 10000*(n+1)
This does not work (only the first select returns any rows).
PS, I am pulling this data from a nodejs app, and thus I am sending in these batch queries in a for loop.
To illustrate my comment:
-- Between rows --
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num
FROM scott.emp
)
WHERE Row_Num BETWEEN 5 and 10
/
You may replace the BETWEEN operator with >= and <= if necessary.
Here's what I see in output:
DEPTNO  ENAME    SAL   ROW_NUM
    20  FORD     3000        5
    30  JAMES     950        6
    20  JONES    2975        7
    10  KING     5000        8
    30  MARTIN   1250        9
    10  MILLER   1300       10
Using rownum is not a great idea, because there's no guarantee that the same rows will be assigned the same rownum values in different queries.
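As for why the rownum predicates in the question return nothing after the first batch: ROWNUM is assigned to a candidate row before the predicate is checked, and the counter only advances when the row is accepted. The following is a simplified Python model of that behaviour (the function name and sample data are made up for illustration; it is not how Oracle is implemented internally).

```python
def select_with_rownum(rows, predicate):
    """Simplified model of ROWNUM: each candidate row is offered the next
    number, and the counter advances only if the row passes the predicate."""
    accepted = []
    rownum = 1
    for row in rows:
        if predicate(rownum):
            accepted.append(row)
            rownum += 1
    return accepted


rows = list(range(100))

# rownum >= 0 and rownum < 10: rows are accepted until the counter reaches 10.
first_batch = select_with_rownum(rows, lambda rn: 0 <= rn < 10)

# rownum >= 10 and rownum < 20: the first candidate is offered 1, fails,
# and so does every later candidate, because the counter never moves.
second_batch = select_with_rownum(rows, lambda rn: 10 <= rn < 20)

print(len(first_batch), len(second_batch))
```

In this model the first predicate keeps nine rows (numbers 1 to 9) and the second keeps none, which matches the "only the first select works" symptom.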
If the table has any combination of columns that uniquely identify a row, it is better to generate a ranking based on that and use that ranking to identify batches of rows. For example:
SELECT * FROM (
SELECT t.*, RANK() OVER (ORDER BY column1, column2) AS my_rank
FROM table_name t
)
WHERE my_rank >= 10000 AND my_rank < 20000
This will work with any range, and will be reproducible as long as the values in the columns used do not change and uniquely identify a row. (Actually, I think this would be usable even if they do not uniquely identify a row, as long as they work to break the rows into small enough batches.)
The downside is that MY_RANK will be included in the output. You can avoid that by explicitly listing the columns you do want to select; or it may be easier to filter it out when you are loading the data into the other database.
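Since the question mentions sending the batch queries from application code in a loop, the rank ranges can be generated there. A minimal sketch in Python, assuming ranks start at 1 and using half-open ranges to match the `my_rank >= low AND my_rank < high` form above; the function name is hypothetical.

```python
def rank_batches(total_rows, batch_size):
    """Yield half-open (low, high) rank ranges covering ranks 1..total_rows."""
    low = 1
    while low <= total_rows:
        yield low, low + batch_size
        low += batch_size


# Each pair would be bound into: WHERE my_rank >= :low AND my_rank < :high
for low, high in rank_batches(total_rows=25, batch_size=10):
    print(low, high)
```

Starting at 1 rather than 0 gives every batch except possibly the last exactly batch_size ranks.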
If you want to preserve the rowids, use the following SQL. This SQL took 4 minutes, 20 seconds to run against a 218 million row table on a 2 CPU server with 18 GB devoted to the DB.
CREATE TABLE rowids
AS
WITH
aset
AS
(SELECT ROWID AS row_id, row_number () OVER (ORDER BY ROWID) r
FROM amiadm.big_table)
SELECT *
FROM aset
WHERE MOD (r, 10000) = 0;
After creating this table, loop through it with the following:
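BEGIN
  FOR recs
    IN ( SELECT row_id
              , LAG (row_id) OVER (ORDER BY row_id) prev_row_id
              , LEAD (row_id) OVER (ORDER BY row_id) next_row_id
           FROM rowids
          ORDER BY row_id)
  LOOP
    -- Batch ending at this boundary: (prev_row_id, row_id],
    -- or everything up to row_id for the first boundary.
    FOR r IN (SELECT *
                FROM big_table
               WHERE (recs.prev_row_id IS NULL OR ROWID > recs.prev_row_id)
                 AND ROWID <= recs.row_id)
    LOOP
      NULL; -- process the row here
    END LOOP;
    -- After the last boundary, also pick up the remaining tail of rows.
    IF recs.next_row_id IS NULL
    THEN
      FOR r IN (SELECT *
                  FROM big_table
                 WHERE ROWID > recs.row_id)
      LOOP
        NULL; -- process the row here
      END LOOP;
    END IF;
  END LOOP;
END;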
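The boundary idea behind this loop (keep every Nth key, then read the ranges between consecutive boundaries plus whatever follows the last one) can be sketched in Python. Plain integers stand in for rowids here and the function name is made up; the sketch only shows which ranges together cover every row exactly once.

```python
def boundary_batches(sorted_keys, every):
    """Split sorted keys into batches ending at every `every`-th key,
    plus a final tail batch for the keys after the last boundary."""
    boundaries = [k for i, k in enumerate(sorted_keys, start=1) if i % every == 0]
    batches = []
    prev = None
    for boundary in boundaries:
        # (prev, boundary], or everything up to boundary for the first one
        batches.append(
            [k for k in sorted_keys if (prev is None or k > prev) and k <= boundary]
        )
        prev = boundary
    # keys after the last boundary (or all keys if there is no boundary)
    tail = [k for k in sorted_keys if prev is None or k > prev]
    if tail:
        batches.append(tail)
    return batches


batches = boundary_batches(list(range(1, 11)), every=3)
print(batches)
```

With ten keys and a boundary at every third key, this yields three full batches and a one-element tail.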

Use a sub-select in the PIVOT's FOR clause?

The standard PIVOT syntax uses a static FOR list:
SELECT *
FROM (
SELECT log_id, event_id, event_time
FROM patient_events
WHERE event_id IN (10,20,30,40,50)
) v
PIVOT (
max(event_time) event_time
FOR event_id IN( 10,20,30,40,50 )
)
Is there a way to make this dynamic?
I know the sub-select in the WHERE clause will work, but can I use one in the FOR?
SELECT *
FROM (
SELECT log_id, event_id, event_time
FROM patient_events
WHERE event_id IN ( sub-select to generate list of IDs )
) v
PIVOT (
max(event_time) event_time
FOR event_id IN( sub-select to generate list of IDs )
)
You can't in pure SQL, though not quite for the reason suggested: it's not that the IN clause needs to be ordered, it's that it has to be constant.
When given a query, the database needs to know the shape of the result set and the shape needs to be consistent across queries (assuming no other DDL operations have taken place that might affect it). For a PIVOT query, the shape of the result is defined by the IN clause - each entry becomes a column, with a data type corresponding to the aggregation clause.
Hypothetically if you were to allow a sub-select for the IN clause then you could alter the shape of the result set just by performing DML operations. Imagine your sub-select worked and got you a list of all event_ids known to the system - by inserting a new record into whatever drives that sub-select, your query returns a different number of columns even though no DDL has occurred.
Now we're stuck - any view built on that query is invalid because its shape wouldn't match that of the query, but Oracle couldn't know that it's invalid because none of the objects it depends on have been changed by DDL.
Depending on where you're consuming the result, dynamic SQL's your only option - either at the application level (build the IN list yourself) or via a ref cursor in a database function or procedure.
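For the application-level route, a rough sketch of building the IN list yourself might look like this in Python. The function name is hypothetical, and it assumes the ids have already been fetched by a separate query; each id is forced through int() so that nothing other than integer literals ends up concatenated into the SQL text.

```python
def build_pivot_sql(event_ids):
    """Build the PIVOT statement with a literal IN list from integer ids."""
    ids = sorted({int(e) for e in event_ids})  # int() rejects non-numeric input
    if not ids:
        raise ValueError("at least one event id is required")
    in_list = ", ".join(str(i) for i in ids)
    return (
        "SELECT * FROM ("
        " SELECT log_id, event_id, event_time"
        " FROM patient_events"
        f" WHERE event_id IN ({in_list})"
        " ) v"
        f" PIVOT (MAX(event_time) event_time FOR event_id IN ({in_list}))"
    )


sql = build_pivot_sql([30, 10, 20, 10])
print(sql)
```

The number of result columns still changes with the id list, so whatever consumes the result has to cope with a variable shape.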
Interesting question.
On the face of it, it shouldn't work, since the list of values (which will become column names) must be ordered. This is not the case for an "IN" list in the WHERE clause. But perhaps it would work with an ORDER BY condition in the sub-SELECT?
Unfortunately, no. This is easy to test. Got the same error message with or without ORDER BY. (And the query works fine if the IN list is just 10, 20, 30, 40 - the actual department numbers from the DEPT table.) Using tables from the standard SCOTT schema.
SQL> select deptno from scott.dept;
DEPTNO
----------
10
20
30
40
4 rows selected.
SQL> select * from (
2 select sal, deptno
3 from scott.emp
4 )
5 pivot (sum(sal) as total_sal
6 for deptno in (10, 20, 30, 40))
7 ;
10_TOTAL_SAL 20_TOTAL_SAL 30_TOTAL_SAL 40_TOTAL_SAL
------------ ------------ ------------ ------------
        8750        10875         9400
1 row selected.
SQL> select * from (
2 select sal, deptno
3 from scott.emp
4 )
5 pivot (sum(sal) as total_sal
6 for deptno in (select deptno from scott.dept order by deptno))
7 ;
for deptno in (select deptno from scott.dept order by deptno))
*
ERROR at line 6:
ORA-00936: missing expression

Best practice for pagination in Oracle?

Problem: I need to write stored procedure(s) that will return a result set containing a single page of rows, plus the total number of rows.
Solution A: I create two stored procedures, one that returns a results set of a single page and another that returns a scalar -- total rows. The Explain Plan says the first sproc has a cost of 9 and the second has a cost of 3.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) AS RowNum, ...
) AS PageResult
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum
SELECT COUNT(*)
FROM ...
Solution B: I put everything in a single sproc, by adding the same TotalRows number to every row in the result set. This solution feels hackish, but it has a cost of 9 and needs only one sproc, so I'm inclined to use it.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) RowNum, COUNT(*) OVER () TotalRows, ...
)
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum;
Is there a best-practice for pagination in Oracle? Which of the aforementioned solutions is most used in practice? Is any of them considered just plain wrong? Note that my DB is and will stay relatively small (less than 10GB).
I'm using Oracle 11g and the latest ODP.NET with VS2010 SP1 and Entity Framework 4.4. I need the final solution to work within the EF 4.4. I'm sure there are probably better methods out there for pagination in general, but I need them working with EF.
If you're already using analytics (ROW_NUMBER() OVER ...) then adding another analytic function on the same partitioning will add a negligible cost to the query.
On the other hand, there are many other ways to do pagination, one of them using rownum:
SELECT *
FROM (SELECT A.*, rownum rn
FROM (SELECT *
FROM your_table
ORDER BY col) A
WHERE rownum <= :Y)
WHERE rn >= :X
This method will be superior if you have an appropriate index on the ordering column. In this case, it might be more efficient to use two queries (one for the total number of rows, one for the result).
Both methods are appropriate but in general if you want both the number of rows and a pagination set then using analytics is more efficient because you only query the rows once.
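The single-query approach (page rows plus total count via an extra analytic function) can be tried out on a toy table. This sketch uses Python's sqlite3 module (SQLite 3.25+ assumed) in place of Oracle, with a made-up 47-row table d; it only shows the result shape, not Oracle's cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO d VALUES (?)", [(i,) for i in range(1, 48)])  # 47 rows

# One pass over the table yields both the row number and the total row count.
page = conn.execute(
    """
    SELECT id, rn, total_rows
    FROM (SELECT id,
                 ROW_NUMBER() OVER (ORDER BY id DESC) AS rn,
                 COUNT(*) OVER () AS total_rows
          FROM d)
    WHERE rn >= ? AND rn < ?
    ORDER BY rn
    """,
    (11, 21),  # second page of 10 rows
).fetchall()

print(page[0], len(page))
```

Every returned row carries the same total_rows value, which is the redundancy the question calls hackish but which avoids a second query.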
In Oracle 12c you can use OFFSET and FETCH for pagination.
Example:
Suppose you have a table tab whose rows need to be fetched in descending order of a DATE column dt, one page at a time.
page_size := 5
select * from tab
order by dt desc
OFFSET (nvl(page_no, 1) - 1) * page_size ROWS FETCH NEXT page_size ROWS ONLY;
Explanation:
page_no=1
page_size=5
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY - Fetch 1st 5 rows only
page_no=2
page_size=5
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY - Fetch next 5 rows
and so on.
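The offset arithmetic in the explanation above is easy to get off by one, so here is a small sketch of it in Python. The function name is hypothetical, and treating a missing page number as page 1 is an assumption about the intended default.

```python
def page_offset(page_no, page_size):
    """Number of rows to skip for a 1-based page number."""
    if page_no is None:
        page_no = 1  # assumption: a missing page number means the first page
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be positive")
    return (page_no - 1) * page_size


for page_no in (1, 2, 3):
    print(page_no, page_offset(page_no, 5))
```

The result would be bound into the OFFSET clause, with page_size bound into FETCH NEXT.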
Reference pages:
https://dba-presents.com/index.php/databases/oracle/31-new-pagination-method-in-oracle-12c-offset-fetch
https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1#paging
This may help:
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num FROM emp)
WHERE Row_Num BETWEEN 5 and 10;
A clean way to organize your SQL code is through the WITH clause.
The reduced version also returns the total number of results and the total page count.
For example:
WITH SELECTION AS (
SELECT FIELDA, FIELDB, FIELDC FROM TABLE),
NUMBERED AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
SELECTION.*
FROM SELECTION)
SELECT
(SELECT COUNT(*) FROM NUMBERED) TOTAL_ROWS,
NUMBERED.*
FROM NUMBERED
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
This code gives you a paged result set with two more fields:
TOTAL_ROWS, the total number of rows in your full SELECTION
RN, the row number of the record
It requires two parameters, :page_size and :page_number, to slice your SELECTION.
Reduced Version
Here SELECTION already includes the ROW_NUMBER() field:
WITH SELECTION AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
FIELDA,
FIELDB,
FIELDC
FROM TABLE)
SELECT
:page_number PAGE_NUMBER,
CEIL((SELECT COUNT(*) FROM SELECTION ) / :page_size) TOTAL_PAGES,
:page_size PAGE_SIZE,
(SELECT COUNT(*) FROM SELECTION ) TOTAL_ROWS,
SELECTION.*
FROM SELECTION
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
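The two expressions in these queries, the inclusive RN range for a page and the CEIL-based page count, can be written out as a small Python sketch. The function names are made up; integer arithmetic is used for the ceiling to avoid floating-point rounding.

```python
def page_bounds(page_number, page_size):
    """Inclusive (first, last) row numbers for a 1-based page,
    mirroring (:page_size*:page_number)-:page_size+1 and :page_size*:page_number."""
    first = page_size * page_number - page_size + 1
    last = page_size * page_number
    return first, last


def total_pages(total_rows, page_size):
    """Ceiling of total_rows / page_size using integer arithmetic."""
    return (total_rows + page_size - 1) // page_size


print(page_bounds(3, 20), total_pages(47, 20))
```

For a page size of 20, page 3 covers row numbers 41 through 60, and 47 rows need 3 pages.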
Try this:
select * from ( select * from "table" order by "column" desc ) where ROWNUM > 0 and ROWNUM <= 5;
I also faced a similar issue. I tried all the above solutions and none of them performed well enough. I have a table with millions of records and I need to display them on screen in pages of 20. I did the following to solve the issue:
1. Add a new column ROW_NUMBER to the table.
2. Make the column the primary key or add a unique index on it.
3. Use the population program (in my case, Informatica) to populate the column with rownum.
4. Fetch records from the table using a BETWEEN predicate (SELECT * FROM TABLE WHERE ROW_NUMBER BETWEEN LOWER_RANGE AND UPPER_RANGE).
This method is effective if we need to do an unconditional pagination fetch on a huge table.
Sorry, this one works with sorting:
SELECT * FROM (SELECT ROWNUM rnum,a.* FROM (SELECT * FROM "tabla" order by "column" asc) a) WHERE rnum BETWEEN "firstrange" AND "lastrange";

Divide my Oracle table into 5 parts randomly

I want to separate my Oracle table into 5 parts, with the records of each part selected randomly from the original table. The parts can contain the same records; that is not a problem.
How can I do that?
You could use ORDER BY dbms_random.value, then work out the total number of records, divide by 5, and use that to limit the number of rows returned:
SELECT * FROM
( SELECT * FROM mytable
ORDER BY dbms_random.value
)
WHERE rownum <= (SELECT count(*)/5 from mytable)
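The query above returns one random fifth per execution, so running it five times gives five parts that may overlap. A Python sketch of that behaviour, together with a non-overlapping variant, is shown below; the function names and the seed parameter are illustrative assumptions. In Oracle itself, the non-overlapping split could probably be expressed with NTILE(5) OVER (ORDER BY dbms_random.value), though that is not what the answer above does.

```python
import random


def overlapping_random_parts(rows, parts=5, seed=None):
    """Draw `parts` independent random samples of len(rows)//parts rows each,
    like running the random-order query once per part. Samples may overlap."""
    rng = random.Random(seed)
    size = len(rows) // parts
    return [rng.sample(rows, size) for _ in range(parts)]


def disjoint_random_parts(rows, parts=5, seed=None):
    """Shuffle once and deal the rows into `parts` non-overlapping groups."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::parts] for i in range(parts)]


rows = list(range(102))
overlapping = overlapping_random_parts(rows, seed=1)
disjoint = disjoint_random_parts(rows, seed=1)
print([len(p) for p in overlapping], [len(p) for p in disjoint])
```

The overlapping variant drops the remainder when the row count is not divisible by five, as the rownum filter does; the disjoint variant spreads the remainder over the first groups.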

Resources