I am trying to download a huge table (billions of records) from an Oracle DB.
The database can only keep a session open for a few hours (no idea why).
So my idea is to split the table into many pieces and download them using dynamic SQL.
split query:
SELECT
data_object_id,
file_id,
relative_fno,
file_batch,
subobject_name,
MIN (start_block_id) start_block_id,
MAX (end_block_id) end_block_id,
SUM (blocks) blocks
FROM
(SELECT
o.data_object_id,
e.file_id,
e.relative_fno,
e.block_id start_block_id,
e.block_id + e.blocks - 1 end_block_id,
e.blocks,
CEIL (SUM(e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id ORDER BY e.block_id ASC) /
(SUM (e.blocks)OVER (PARTITION BY o.data_object_id,e.file_id) / 1)) file_batch
FROM
dba_extents e,
dba_objects o,
dba_tab_subpartitions tsp
WHERE
o.owner = :owner
AND o.object_name = :object_name
AND e.owner = :owner
AND e.segment_name = :object_name
AND o.owner = e.owner
AND o.object_name = e.segment_name
AND (o.subobject_name = e.partition_name
OR (o.subobject_name IS NULL
AND e.partition_name IS NULL))
AND o.owner = tsp.table_owner(+)
AND o.object_name = tsp.table_name(+)
AND o.subobject_name = tsp.subpartition_name(+))
GROUP BY
data_object_id,
file_id,
relative_fno,
file_batch
ORDER BY
data_object_id,
file_id,
relative_fno,
file_batch;
It splits an ordinary table correctly, but it doesn't work with partitioned or subpartitioned tables (when I download the chunks I get more or fewer rows than are in the DB).
The queries I used for the download:
SELECT /*+ NO_INDEX(t) */ COLUMN_NAMES,'63_17' data_chunk_id
FROM OWNER.OBJECT_NAME t
WHERE ((rowid >= dbms_rowid.rowid_create(1, 846313, 63, 3057792, 0)
AND rowid <= dbms_rowid.rowid_create(1, 846313, 63, 4056447, 32767)));
Check out DBMS_PARALLEL_EXECUTE. Even if you don't use it "completely", i.e., to run the extract itself, you can use the DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID routine to generate a list of rowid ranges that you can then plug into the rowid-range queries you've already written.
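A minimal sketch of that approach (the task name, owner/table, and chunk size below are placeholders):

BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'chunk_task');
  -- let Oracle compute rowid chunk boundaries for the table
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
      task_name   => 'chunk_task',
      table_owner => 'OWNER',
      table_name  => 'OBJECT_NAME',
      by_row      => TRUE,      -- chunk_size counts rows; FALSE would count blocks
      chunk_size  => 1000000);
END;
/

-- each row here is a ready-made rowid range for the extract queries shown above
SELECT chunk_id, start_rowid, end_rowid
FROM user_parallel_execute_chunks
WHERE task_name = 'chunk_task'
ORDER BY chunk_id;

-- clean up when finished
BEGIN
  DBMS_PARALLEL_EXECUTE.DROP_TASK('chunk_task');
END;
/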
I am taking further reference from: Apply OFFSET and LIMIT in ORACLE for complex Join Queries?.
How do I find the MIN and MAX values of rn here?
SELECT q.*
FROM (SELECT DEPT.ID rowobjid,
DEPT.CREATOR createdby,
DEPT.CREATE_DATE createddate,
DEPT.UPDATED_BY updatedby,
DEPT.LAST_UPDATE_DATE updateddate,
DEPT.NAME name,
DEPT.STATUS status,
statusT.DESCR statusdesc,
REL.ROWID_DEPT1 rowidDEPT1,
REL.ROWID_DEPT2 rowidDEPT2,
DEPT2.DEPT_FROM_VAL parentcid,
DEPT2.NAME parentname,
ROW_NUMBER() OVER (PARTITION BY DEPT.CREATE_DATE ORDER BY DEPT.ID) AS rn
FROM TEST.DEPT_TABLE DEPT
LEFT JOIN TEST.STATUS_TABLE statusT
ON DEPT.STATUS = statusT.STATUS
LEFT JOIN TEST.C_REL_DEPT rel
ON DEPT.ID = REL.ROWID_DEPT2
LEFT JOIN TEST.DEPT_TABLE DEPT2
ON REL.ROWID_DEPT1 = DEPT2.ID) q
The min value of rn will always be 1. For the max, add a MAX analytic function after * as follows:
SELECT q.*, max(rn) over () as max_rn, 1 as min_rn
FROM (SELECT
.......
.......
If you want the max rn within the inner query for each partition, you can use the COUNT analytic function instead: it returns the total number of records in the defined partition, which is exactly the max rn for that partition:
COUNT(1) OVER (PARTITION BY DEPT.CREATE_DATE) AS MAX_RN, 1 AS MIN_RN
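In context, the inner query would then carry these columns alongside rn (column list abbreviated):

SELECT DEPT.ID rowobjid,
       ...,
       ROW_NUMBER() OVER (PARTITION BY DEPT.CREATE_DATE ORDER BY DEPT.ID) AS rn,
       COUNT(1) OVER (PARTITION BY DEPT.CREATE_DATE) AS max_rn,
       1 AS min_rn
FROM TEST.DEPT_TABLE DEPT
...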
I'm using Oracle 12C and I have the following code:
SELECT *
FROM
(
SELECT
*
FROM
t.tablea
WHERE
name = 'FIS'
) A
LEFT JOIN (
SELECT
*
FROM
t.tableb
WHERE
enabled = 1
) B ON b.id = a.id
AND TO_CHAR(b.createdate, 'dd-mm-yy hh24:mi') = TO_CHAR(a.createdate, 'dd-mm-yy hh24:mi')
Both a.createdate and b.createdate are of the TIMESTAMP datatype.
The optimizer shows an INTERNAL_FUNCTION around TO_CHAR(b.createdate, 'dd-mm-yy hh24:mi') = TO_CHAR(a.createdate, 'dd-mm-yy hh24:mi') in the execution plan.
If I compare with AND b.createdate = a.createdate instead, I lose about 1000 rows with values like '11-JUN-18 04.48.34.269928000 PM'; if I change 269928000 to 269000000 it works.
I don't want to use TO_CHAR, and I want to avoid the INTERNAL_FUNCTION (which would force me to create a function-based index).
Can anyone help?
Your values appear to have a fractional seconds component and would have the TIMESTAMP data type. If so, you can use TRUNC( timestamp_value, 'MI' ) to truncate to the nearest minute.
SELECT *
FROM t.tablea a
LEFT OUTER JOIN t.tableb b
ON ( a.createdate >= TRUNC( b.createdate, 'MI' )
AND a.createdate < TRUNC( b.createdate, 'MI' ) + INTERVAL '1' MINUTE
AND a.id = b.id
AND b.enabled = 1
)
WHERE a.name = 'FIS'
This will remove the need to apply a function to one of the two tables (a.createdate in this case but you could swap them).
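As a quick illustration of TRUNC on a timestamp (note that TRUNC returns a DATE, so the fractional seconds disappear; the timestamp literal here is just an example value):

SELECT TRUNC(TIMESTAMP '2018-06-11 16:48:34.269928', 'MI') AS to_the_minute
FROM dual;
-- returns the DATE 2018-06-11 16:48:00 (display depends on NLS settings)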
I don't even see the need for the subqueries:
SELECT a.*, b.*
FROM t.tablea a
LEFT JOIN t.tableb b
ON a.id = b.id AND
TRUNC(b.createdate, 'MI') = TRUNC(a.createdate, 'MI') AND
b.enabled = 1
WHERE
a.name = 'FIS'
I am trying to modify a query someone else wrote that is taking a really long time to run. The problem has to do with the EXISTS portion of the query. Any idea how this can be changed to run faster?
SELECT m.level4 center, cc.description, m.employeename, m.empno,
TO_DATE (ct.tsdate, 'dd-mon-yyyy') tsdate, ct.starttime, ct.endtime,
ct.DURATION,
NVL (DECODE (ct.paycode, ' ', 'REG', ct.paycode), 'REG') paycode,
ct.titlecode, ct.costcenter, m.tsgroup
FROM clairvia_text ct, MASTER m, costcenteroutbound cc
WHERE ct.recordtype = '1'
AND ct.empno = m.empno
AND m.level4 = cc.center
AND EXISTS (
SELECT ct1.recordtype,ct1.empno,ct1.tsdate,ct1.processdate
FROM clairvia_text ct1
WHERE ct.recordtype = ct1.recordtype
AND ct.empno = ct1.empno
AND ct.tsdate = ct1.tsdate
AND ct.processdate = ct1.processdate
group by ct1.recordtype,ct1.empno,ct1.tsdate,ct1.processdate
having count(*) < 2)
Oracle can be finicky with EXISTS statements and subqueries. A couple of things to try:
Change the EXISTS to an IN.
Change the EXISTS to a GROUP BY statement with the same "having count(*) < 2" filter. This could even be changed into a join, sketched below.
I'm assuming indexes are not an issue.
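A sketch of that join rewrite, reusing the question's tables and its having count(*) < 2 condition (so only non-duplicated rows survive):

SELECT m.level4 center, cc.description, m.employeename, m.empno,
       TO_DATE (ct.tsdate, 'dd-mon-yyyy') tsdate, ct.starttime, ct.endtime,
       ct.DURATION,
       NVL (DECODE (ct.paycode, ' ', 'REG', ct.paycode), 'REG') paycode,
       ct.titlecode, ct.costcenter, m.tsgroup
FROM clairvia_text ct
     -- keep only combinations that occur exactly once
     JOIN (SELECT recordtype, empno, tsdate, processdate
           FROM clairvia_text
           GROUP BY recordtype, empno, tsdate, processdate
           HAVING COUNT(*) < 2) singles
       ON ct.recordtype = singles.recordtype
      AND ct.empno = singles.empno
      AND ct.tsdate = singles.tsdate
      AND ct.processdate = singles.processdate
     JOIN MASTER m ON ct.empno = m.empno
     JOIN costcenteroutbound cc ON m.level4 = cc.center
WHERE ct.recordtype = '1'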
You can use the analytic function COUNT here to eliminate duplicated rows.
select * from (
SELECT m.level4 center, cc.description, m.employeename, m.empno,
TO_DATE (ct.tsdate, 'dd-mon-yyyy') tsdate, ct.starttime, ct.endtime,
ct.DURATION,
NVL (DECODE (ct.paycode, ' ', 'REG', ct.paycode), 'REG') paycode,
ct.titlecode, ct.costcenter, m.tsgroup,
count(1) over (partition by ct.recordtype,ct.empno,ct.tsdate,ct.processdate
order by null) cnt
FROM clairvia_text ct, MASTER m, costcenteroutbound cc
WHERE ct.recordtype = '1'
AND ct.empno = m.empno
AND m.level4 = cc.center
) where cnt=1
I do not have your structures and data, so I ran similar queries against all_tab_cols: the first query took ~500 s on my laptop and the second ~2 s.
-- slow test
select count(1)
from all_tab_cols atc
where exists (
select 1
from all_tab_cols atc1
where atc1.column_name = atc.column_name
group by column_name
having count(1) = 1)
-- fast test
select count(1)
from (
select column_name,
count(1) over (partition by atc.column_name order by null) cnt
from all_tab_cols atc
)
where cnt = 1
Please see my code below as it is running too slowly with the CROSS APPLY.
How can I remove the CROSS APPLY and add something else that will run faster?
Please note I am using SQL Server 2008 R2.
;WITH MyCTE AS
(
SELECT
R.NetWinCURRENCYValue AS NetWin
,dD.[Date] AS TheDay
FROM
dimPlayer AS P
JOIN
dbo.factRevenue AS R ON P.playerKey = R.playerKey
JOIN
dbo.vw_Date AS dD ON Dd.dateKey = R.dateKey
WHERE
P.CustomerID = 12345)
SELECT
A.TheDay AS [Date]
,ISNULL(A.NetWin, 0) AS NetWin
,rt.runningTotal AS CumulativeNetWin
FROM MyCTE AS A
CROSS APPLY (SELECT SUM(NetWin) AS runningTotal
FROM MyCTE WHERE TheDay <= A.TheDay) AS rt
ORDER BY A.TheDay
CREATE TABLE #temp (NetWin money, TheDay datetime)
insert into #temp
SELECT
R.NetWinCURRENCYValue AS NetWin
,dD.[Date] AS TheDay
FROM
dimPlayer AS P
JOIN
dbo.factRevenue AS R ON P.playerKey = R.playerKey
JOIN
dbo.vw_Date AS dD ON Dd.dateKey = R.dateKey
WHERE
P.CustomerID = 12345;
SELECT
A.TheDay AS [Date]
,ISNULL(A.NetWin, 0) AS NetWin
,SUM(B.NetWin) AS CumulativeNetWin
FROM #temp AS A
JOIN #temp AS B
ON A.TheDay >= B.TheDay
GROUP BY A.TheDay, ISNULL(A.NetWin, 0);
Here (https://stackoverflow.com/a/13744550/613130) it is suggested to use a recursive CTE.
;WITH MyCTE AS
(
SELECT
R.NetWinCURRENCYValue AS NetWin
,dD.[Date] AS TheDay
,ROW_NUMBER() OVER (ORDER BY dD.[Date]) AS RN
FROM dimPlayer AS P
JOIN dbo.factRevenue AS R ON P.playerKey = R.playerKey
JOIN dbo.vw_Date AS dD ON Dd.dateKey = R.dateKey
WHERE P.CustomerID = 12345
)
, MyCTERec AS
(
SELECT C.TheDay AS [Date]
,ISNULL(C.NetWin, 0) AS NetWin
,ISNULL(C.NetWin, 0) AS CumulativeNetWin
,C.RN
FROM MyCTE AS C
WHERE C.RN = 1
UNION ALL
SELECT C.TheDay AS [Date]
,ISNULL(C.NetWin, 0) AS NetWin
,P.CumulativeNetWin + ISNULL(C.NetWin, 0) AS CumulativeNetWin
,C.RN
FROM MyCTERec P
INNER JOIN MyCTE AS C ON C.RN = P.RN + 1
)
SELECT *
FROM MyCTERec
ORDER BY RN
OPTION (MAXRECURSION 0)
Note that this query only works if you have one record per day! If you have multiple records in a day, the results will differ from the other query.
As I said here, if you want really fast calculation, put it into temporary table with sequential primary key and then calculate rolling total:
create table #Temp (
ID bigint identity(1, 1) primary key,
[Date] date,
NetWin decimal(29, 10)
)
insert into #Temp ([Date], NetWin)
select
dD.[Date],
sum(R.NetWinCURRENCYValue) as NetWin
from dbo.dimPlayer as P
inner join dbo.factRevenue as R on P.playerKey = R.playerKey
inner join dbo.vw_Date as dD on Dd.dateKey = R.dateKey
where P.CustomerID = 12345
group by dD.[Date]
order by dD.[Date]
;with cte as (
select T.ID, T.[Date], T.NetWin, T.NetWin as CumulativeNetWin
from #Temp as T
where T.ID = 1
union all
select T.ID, T.[Date], T.NetWin, T.NetWin + C.CumulativeNetWin as CumulativeNetWin
from cte as C
inner join #Temp as T on T.ID = C.ID + 1
)
select C.[Date], C.NetWin, C.CumulativeNetWin
from cte as C
order by C.[Date]
I assume that you could have duplicate dates in the input but don't want duplicates in the output, so I grouped the data before putting it into the table.
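For reference only, since the question specifies SQL Server 2008 R2: from SQL Server 2012 onward neither CROSS APPLY nor recursion is needed, because SUM accepts an ORDER BY in its OVER clause. Reusing the MyCTE from the question:

SELECT TheDay AS [Date],
       ISNULL(NetWin, 0) AS NetWin,
       SUM(NetWin) OVER (ORDER BY TheDay
                         ROWS UNBOUNDED PRECEDING) AS CumulativeNetWin
FROM MyCTE
ORDER BY TheDay;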
Is there a way to make selecting random rows faster in Oracle on a table that has millions of rows? I tried sample(x) and dbms_random.value and it's taking a long time to run.
Thanks!
Using appropriate values of sample(x) is the fastest way. It's block-random, and row-random within blocks, so if you only want one random row:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum = 1
I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum <= 5
FILENO BLOCKNO OFFSET
---------- ---------- ----------
152 2454936 11
152 2463140 32
152 2335208 2
152 2429207 23
152 2746125 28
I suspect you should probably tune your SAMPLE clause to use an appropriate sample size for what you're fetching.
Start with Adam's answer first, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block samples:
....FROM [table] SAMPLE BLOCK (0.01)
This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.
Here's the same question on AskTom:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.
Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861
create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
for exp_v in -6..2 loop
exit when (urowid_t.count > 0);
if (exp_v < 2) then
sql_v := 'select rowid from ' || table_name
|| ' sample block (' || power(10, exp_v) || ')';
else
sql_v := 'select rowid from ' || table_name;
end if;
cursor_v := dbms_sql.open_cursor;
dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
status_v := dbms_sql.execute(cursor_v);
loop
rows_v := dbms_sql.fetch_rows(cursor_v);
dbms_sql.column_value(cursor_v, 1, urowid_t);
exit when rows_v != 100;
end loop;
dbms_sql.close_cursor(cursor_v);
end loop;
if (urowid_t.count > 0) then
return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
end if;
return null;
exception when others then
if (dbms_sql.is_open(cursor_v)) then
dbms_sql.close_cursor(cursor_v);
end if;
raise;
end;
/
show errors
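A possible call site (assuming a heap table; the function returns NULL when even the full scan finds nothing, in which case this returns no row):

SELECT *
FROM my_big_table
WHERE rowid = get_random_rowid('MY_BIG_TABLE');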
The solution below is not an exact answer to this question, but it fits the many scenarios where you select a row, use it for some purpose, and then update its status to "used" or "done" so that you do not select it again.
Solution:
The query below is the obvious approach, but if your table is large you will definitely run into a performance problem with it; I tried it and saw this myself.
SELECT * FROM
( SELECT * FROM table
ORDER BY dbms_random.value )
WHERE rownum = 1
So if you cap rownum as below, you can work around the performance problem. Increasing the rownum cap reduces the chance of collisions, but in this form you will always draw from the same 1000 rows. Still, if you take a row from those 1000 and update its status to "USED", you will almost always get a different row every time you query for "ACTIVE" ones:
SELECT * FROM
( SELECT * FROM table
where rownum < 1000
and status = 'ACTIVE'
ORDER BY dbms_random.value )
WHERE rownum = 1
Update the row's status after selecting it. If you cannot update it, that means another transaction has already used it; then you should get a new row and update its status. By the way, the probability of two different transactions getting the same row is about 0.001, since the rownum cap is 1000.
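A sketch of that claim step, assuming the table has an ID primary key and the STATUS column described above (:picked_id is a hypothetical bind holding the id you just selected):

UPDATE my_table
   SET status = 'USED'
 WHERE id = :picked_id
   AND status = 'ACTIVE';
-- SQL%ROWCOUNT = 0 means another transaction claimed the row first:
-- select a new random row and try again.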
Someone said sample(x) is the fastest way.
But for me this method works slightly faster than the sample(x) method.
It takes a fraction of a second (0.2 s in my case) regardless of the table's size. If it takes longer, the hints --+ leading(e) use_nl(e t) rowid(t) can help:
SELECT *
FROM My_User.My_Table
WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN My_User.My_Table t
ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
A version that retries when no rows are returned:
WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
FROM Dual
LEFT JOIN gen ON 1=1
UNION ALL
SELECT Cnt + 1, gen.Row_Id
FROM Retries
LEFT JOIN gen ON 1=1
WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
FROM MY_USER.MY_TABLE
WHERE ROWID = (SELECT Row_Id
FROM Retries
WHERE Row_Id IS NOT NULL)
Can you use pseudorandom rows?
select * from (
  select * from ... where ... order by ora_hash(rowid)
) where rownum < 100
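One caveat: ORA_HASH is deterministic, so ORDER BY ora_hash(rowid) gives the same "random" rows on every run. Its optional third argument is a seed (the second is the max bucket), so supplying a per-run seed varies the result. A sketch, with :seed as a value you pick per execution:

select * from (
  select * from ... where ...
  order by ora_hash(rowid, 4294967295, :seed)
) where rownum < 100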