Partitioning a related set of Oracle tables by day when they don't all have Time information - oracle

I have a set of tables that look similar to this:
Time_Table (relatively small):
Time (TIMESTAMP)
timeId (NUMBER)
Data... (NUMBER)
Table2 (large, about 30 rows per time_table row):
timeId (NUMBER)
table2Id (NUMBER)
Data... (NUMBER)
Table3 (very large, around 10 rows per table2 row, currently 1.4 billion rows after a couple of hundred days):
timeId (NUMBER)
table2Id (NUMBER)
table3Id (NUMBER)
Data... (NUMBER)
My queries ALWAYS join on timeId at the very least, and each query is broken up into days (10 day read will result in 10 smaller queries). New data is written to all tables every day. We need to store (and query) years of data from these tables.
How do I partition these tables into daily chunks when the Time information is only known through a JOIN? Should I be looking at partitioning in ways not reliant on Time? Can this be done automatically, or does it have to be a manual process?
Oracle version 11.2

Reference partitioning may help here. It allows a child table's partitioning scheme to be determined by the parent table.
Schema
--drop table table3;
--drop table table2;
--drop table time_table;
drop table time_table;
create table Time_Table
(
time TIMESTAMP,
timeId NUMBER,
Data01 NUMBER,
constraint time_table_pk primary key (timeId)
)
partition by range (time)
(
partition p1 values less than (date '2000-01-02'),
partition p2 values less than (date '2000-01-03'),
partition p3 values less than (date '2000-01-04')
);
create table table2
(
timeId number,
table2Id number,
Data01 number,
constraint table2_pk primary key (table2ID),
constraint table2_fk foreign key (timeId) references time_table(timeId)
);
create table table3
(
timeId number not null,
table2Id number,
table3Id number,
Data01 number,
constraint table3_pk primary key (table3ID),
constraint table3_fk1 foreign key (timeId) references time_table(timeId),
constraint table3_fk2 foreign key (table2ID) references table2(table2ID)
) partition by reference (table3_fk1);
Execution Plans
The Pstart and Pstop show that the huge child table is correctly pruned even though the partition predicate is only set on the small parent table.
explain plan for
select *
from table3
join time_table using (timeId)
where time = date '2000-01-02';
select * from table(dbms_xplan.display);
Plan hash value: 832465087
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 91 | 3 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1 | 91 | 3 (0)| 00:00:01 | 2 | 2 |
| 2 | NESTED LOOPS | | 1 | 91 | 3 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | TIME_TABLE | 1 | 39 | 2 (0)| 00:00:01 | 2 | 2 |
|* 4 | TABLE ACCESS FULL | TABLE3 | 1 | 52 | 1 (0)| 00:00:01 | 2 | 2 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("TIME_TABLE"."TIME"=TIMESTAMP' 2000-01-02 00:00:00')
4 - filter("TABLE3"."TIMEID"="TIME_TABLE"."TIMEID")
Note
-----
- dynamic sampling used for this statement (level=2)
- automatic DOP: skipped because of IO calibrate statistics are missing
Warnings
Reference partitioning has a few quirks. It doesn't work with interval partitioning in 11g, so you have to manually define every partition for the parent table. The foreign keys are also impossible to disable which may require modifying some scripts. And like any rarely used feature it has a few bugs.
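As a minimal sketch of what "manually define every partition" means in practice, assuming the schema above: the partition name p4 and its upper bound are illustrative and not part of the original answer. Only the parent table is altered; with reference partitioning the child table should pick up the matching partition, which the dictionary query lets you check.

```sql
-- Add the next daily partition to the parent table only.
-- (p4 and the bound 2000-01-05 are hypothetical; continue your own naming scheme.)
alter table time_table
  add partition p4 values less than (date '2000-01-05');

-- table3 is partitioned by reference (table3_fk1), so it should now
-- list a partition corresponding to the new parent partition.
select table_name, partition_name, partition_position
from user_tab_partitions
where table_name in ('TIME_TABLE', 'TABLE3')
order by table_name, partition_position;
```

A scheduled job could issue the same ALTER TABLE once per day, which is one way to work around the lack of interval partitioning mentioned above.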

drop table time_table;
create table Time_Table
(
time TIMESTAMP,
-- timeId NUMBER, (why keep a separate ID when the timestamp itself can be the key?)
Data01 NUMBER,
constraint time_table_pk primary key (time) -- not timeID!!!
)
partition by range (time)
(
partition p1 values less than (date '2000-01-02'),
partition p2 values less than (date '2000-01-03'),
partition p3 values less than (date '2000-01-04')
);
create table table2
(
time timestamp not null,
table2ID number,
Data01 number
)
partition by range (time)
(
partition p1 values less than (date '2000-01-02'),
partition p2 values less than (date '2000-01-03'),
partition p3 values less than (date '2000-01-04')
);
create table table3
(
time timestamp not null,
table2Id number,
table3Id number,
Data01 number
)
partition by range (time)
(
partition p1 values less than (date '2000-01-02'),
partition p2 values less than (date '2000-01-03'),
partition p3 values less than (date '2000-01-04')
);

Related

Materialized view contains wrong result

I'm having trouble with the content of a materialized view in Oracle 19c (version 19.15). I've managed to distill the issues into a reproducible test with this script:
create table b(
tsn varchar2(16) not null primary key,
fid varchar2(256) not null
);
create table bs(
tsn varchar2(16) not null constraint bet_stakes_fk references b,
leg number(1) not null,
amount number(10) not null,
primary key (tsn, leg) using index compress 1
);
create materialized view log on b
with primary key, rowid, sequence, commit scn (fid)
including new values;
create materialized view log on bs
with primary key, rowid, sequence, commit scn (amount)
including new values;
create materialized view bsd_mv
refresh fast start with (sysdate - 1) next (sysdate + 1/14400)
as
select fid, leg, sum(amount), count(*)
from bs inner join b using (tsn)
group by fid, leg;
insert into b values ('a', 'o');
insert into bs values ('a', 1, 10);
commit;
delete from bs where tsn = 'a';
delete from b where tsn = 'a';
insert into b values ('a', 'o');
insert into bs values ('a', 1, 5);
commit;
Wait 10 seconds or so before selecting
select * from bsd_mv;
The result will vary somewhat with different runs of the script, but usually the result will be
| Fid | Leg | Sum | Count |
| --- | --- | --- | ----- |
| o | 1 | 15 | 3 |
... where I would expect it to be...
| Fid | Leg | Sum | Count |
| --- | --- | --- | ----- |
| o | 1 | 5 | 1 |
If I run the query the view is based on, I always get the expected result.
Am I missing something in the setup, or do I have the wrong expectations, or have I triggered a bug in Oracle?
It took months with Oracle support, but eventually this was accepted as a bug...

Table scan on internal tables

Assume I have two tables, TABLE-1 and TABLE-2, each with 1 million rows, 10 columns, and an index on col1.
Now I build an internal table on these two tables (1 + 1 = 2 million rows):
select * from
(select col1, col2,....col10 from table-1
union all
select col1, col2,....col10 from table-2) x
Questions:
How will the internal table be treated in Oracle, given that it is an internal table?
1. Will the internal table be treated as a table with an index on col1?
2. Will this be captured in the explain plan?
Yes and yes.
Oracle will effectively treat this inline view as a table. It can use predicate pushing to apply a filter on the inline view to the base tables, and potentially use an index. The explain plan will show this.
Tables, indexes, sample data, and statistics
create table table1(col1 number, col2 number, col3 number, col4 number);
create table table2(col1 number, col2 number, col3 number, col4 number);
create index table1_idx on table1(col1);
create index table2_idx on table2(col1);
insert into table1 select level, level, level, level
from dual connect by level <= 100000;
insert into table2 select level, level, level, level
from dual connect by level <= 100000;
commit;
begin
dbms_stats.gather_table_stats(user, 'TABLE1');
dbms_stats.gather_table_stats(user, 'TABLE2');
end;
/
Explain plan showing predicate pushing and index access
explain plan for
select * from
(
select col1, col2, col3, col4 from table1
union all
select col1, col2, col3, col4 from table2
)
where col1 = 1;
select * from table(dbms_xplan.display);
Plan hash value: 400235428
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 40 | 2 (0)| 00:00:01 |
| 1 | VIEW | | 2 | 40 | 2 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE1 | 1 | 20 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TABLE1_IDX | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE2 | 1 | 20 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TABLE2_IDX | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("COL1"=1)
6 - access("COL1"=1)
Notice how the predicates happen before the VIEW, and both indexes are used. By default everything should work as well as can be expected.
Notes
This type of query structure is called an inline view. Although a physical table is not built, the phrase "internal tables" is a good way of thinking about how the query logically works. Ideally, an inline view would work exactly like a pre-built table with the same data. In reality there are some cases where things don't quite work that way. But in general you are definitely on the right path: build a large query by assembling small inline views, and assume that Oracle will optimize it correctly.
For your particular query as written, no index will be used, because it has no filter. I suppose you do some filtering, e.g. where x.col1 = ###. I'm not sure that Oracle will be able to use the table-1/table-2 indexes for a filter applied outside the inline view, so I suggest putting the where clauses inside each branch of the "union query".
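Written out, the suggestion to filter inside each branch could look like the sketch below. It reuses the table1/table2 test tables created in the previous answer, and the predicate col1 = 1 is only an example value. The plan in that answer suggests Oracle pushes a simple predicate like this down by itself, so the manual form is mainly a fallback for cases where the optimizer does not.

```sql
-- Filter each branch directly instead of filtering the inline view.
explain plan for
select col1, col2, col3, col4 from table1 where col1 = 1
union all
select col1, col2, col3, col4 from table2 where col1 = 1;

-- Compare this plan with the plan of the outer-filter version.
select * from table(dbms_xplan.display);
```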

Oracle primary key vs. index NOT IN performance

I have the following use case:
A table stores the changed as well as the original data from a person. My query is designed to get only one row for each person: The changed data if there is some, else the original data.
I populated the table with 100k rows of data and 2k of changed data. When using a primary key on my table the query runs in less than a half second. If I put an index on the table instead of a primary key the query runs really slow. So I'll use the primary key, no doubt about that.
My question is: Why is the PK approach so much faster than the one with an index?
Code here:
drop table up_data cascade constraints purge;
/
create table up_data(
pk integer,
hp_nr integer,
up_nr integer,
ps_flag varchar2(1),
ps_name varchar2(100)
-- comment this out and uncomment the index below.
, constraint pk_up_data primary key (pk,up_nr)
);
/
-- insert some data
insert into up_data
select rownum, 1, 0, 'A', 'tester_' || to_char(rownum)
from dual
connect by rownum < 100000;
/
-- insert some changed data
-- change ps_flag = 'B' and mark it with a change number in up_nr
insert into up_data
select rownum, 1, 1, 'B', 'tester_' || to_char(rownum)
from dual
connect by rownum < 2000;
/
-- alternative(?) to the primary key
-- CREATE INDEX idx_up_data ON up_data(pk, up_nr);
/
The select statement looks like this:
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
The statement might be target of optimization but for the moment it will stay like this.
When you create a primary key constraint, Oracle also creates an index to support it at the same time. A primary key index differs from a basic index in a couple of important ways, namely:
All the values in it are guaranteed to be unique
There are no nulls in the table rows (for the columns forming the PK)
These reasons are the key to the performance differences you see. Using your setup, I get the following query plans:
--fast version with PK
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
-----------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| PK_UP_DATA | 103K|
| 4 | INDEX UNIQUE SCAN | PK_UP_DATA | 1 |
-----------------------------------------------------
alter table up_data drop constraint pk_up_data;
CREATE INDEX idx_up_data ON up_data(pk, up_nr);
/
--slow version with normal index
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| IDX_UP_DATA | 103K|
| 4 | INDEX FAST FULL SCAN| IDX_UP_DATA | 1870 |
------------------------------------------------------
The big difference is that the fast version employs an INDEX UNIQUE SCAN, rather than an INDEX FAST FULL SCAN, in the second access of the table data.
From the Oracle docs (emphasis mine):
In contrast to an index range scan, an index unique scan must have
either 0 or 1 rowid associated with an index key. The database
performs a unique scan when a predicate references all of the columns
in a UNIQUE index key using an equality operator. An index unique scan
stops processing as soon as it finds the first record because no
second record is possible.
This optimization to stop processing proves to be a significant factor in this example. The fast version of your query:
Full scans ~103,000 index entries
For each of these, finds the one matching row in the PK index and stops processing the index any further
The slow version:
Full scans ~103,000 index entries
For each of these, performs another scan of the 103,000 index entries to find out whether there are any matches.
So to compare the work done:
With the PK, we have one fast full scan, then 103,000 lookups of one index value
With normal index, we have one fast full scan then 103,000 scans of 103,000 index entries - several orders of magnitude more work!
In this example, both the uniqueness of the primary key and the not null-ness of the index values are necessary to get the performance benefit:
-- create index as unique - we still get two fast full scans
drop index idx_up_data;
create unique index idx_up_data ON up_data(pk, up_nr);
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| IDX_UP_DATA | 103K|
| 4 | INDEX FAST FULL SCAN| IDX_UP_DATA | 1870 |
------------------------------------------------------
-- now the columns are not null, we see the index unique scan
alter table up_data modify (pk not null, up_nr not null);
explain plan for
select count(*)
from
(
select *
from up_data u1
where up_nr = 1
or (up_nr = 0
and pk not in (select pk from up_data where up_nr = 1)
)
) u
/
select * from table(dbms_xplan.display(NULL, NULL,'BASIC +ROWS'));
------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | FILTER | | |
| 3 | INDEX FAST FULL SCAN| IDX_UP_DATA | 103K|
| 4 | INDEX UNIQUE SCAN | IDX_UP_DATA | 1 |
------------------------------------------------------
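The asker notes that the statement itself might be a target for optimization. One commonly used alternative, shown here only as a sketch and not as part of the original answer, is to express the exclusion with NOT EXISTS. Unlike NOT IN, NOT EXISTS does not change its result when the subquery column can contain NULLs, so the two forms are only equivalent under the assumption that pk is never NULL. Whether the plan improves on a given setup is something to verify with EXPLAIN PLAN rather than assume.

```sql
-- Same logic as the original query, with NOT IN replaced by a correlated NOT EXISTS.
-- Assumption: up_data.pk contains no NULLs, so both forms return the same count.
explain plan for
select count(*)
from up_data u1
where u1.up_nr = 1
   or (u1.up_nr = 0
       and not exists (select null
                       from up_data u2
                       where u2.pk = u1.pk
                         and u2.up_nr = 1));

select * from table(dbms_xplan.display(NULL, NULL, 'BASIC +ROWS'));
```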

Oracle Index with multiple Columns querying on single column

In our Oracle installation we have a table with an index on two of its columns (X and Y). If I query the table with a where clause that only touches column X, will Oracle be able to use the index?
For example:
Table Y:
Col_A,
Col_B,
Col_C,
Index exists on (Col_A, Col_B)
SELECT * FROM Table_Y WHERE Col_A = 'STACKOVERFLOW';
Will the index be used, or will a table scan be done?
It depends.
You could check it by letting Oracle explain the execution plan:
EXPLAIN PLAN FOR
SELECT * FROM Table_Y WHERE Col_A = 'STACKOVERFLOW';
and then
select * from table(dbms_xplan.display);
So, for example with
create table table_y (
col_a varchar2(30),
col_b varchar2(30),
col_c varchar2(30)
);
create unique index table_y_ix on table_y (col_a, col_b);
and then a
explain plan for
select * from table_y
where col_a = 'STACKOVERFLOW';
select * from table(dbms_xplan.display);
The plan (on my installation) looks like:
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TABLE_Y | 1 | 51 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TABLE_Y_IX | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COL_A"='STACKOVERFLOW')
ID 2 shows you that the index TABLE_Y_IX is indeed used for an index range scan.
Whether Oracle chooses to use the index on another installation depends on many things. It's Oracle's query optimizer that makes this decision.
Update: If you feel you'd be better off (performance-wise, that is) if Oracle used the index, you might want to try the index_asc(...) hint (see index hint).
So in your case that would be something like
SELECT /*+ index_asc(TABLE_Y TABLE_Y_IX) */ *
FROM Table_Y
WHERE Col_A = 'STACKOVERFLOW';
Additionally, I would ensure that you have gathered statistics on the table and its columns. You can check the date of the last gathering of statistics with a
select last_analyzed from dba_tables where table_name = 'TABLE_Y';
and
select column_name, last_analyzed from dba_tab_columns where table_name = 'TABLE_Y';
If there are no statistics or if they're stale, make yourself familiar with the dbms_stats package to gather such statistics.
These statistics are the data that the query optimizer relies on heavily to make its decisions.
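For reference, a minimal sketch of gathering those statistics with dbms_stats, assuming the table lives in the current schema (hence ownname => user) and is named TABLE_Y as in the example above:

```sql
begin
  dbms_stats.gather_table_stats(
    ownname => user,        -- assumption: the table belongs to the current schema
    tabname => 'TABLE_Y',
    cascade => true         -- also gather statistics for the table's indexes
  );
end;
/
```

Re-running the last_analyzed queries above afterwards should show the new gathering time.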

How can I get a COUNT(col) ... GROUP BY to use an index?

I've got a table (col1, col2, ...) with an index on (col1, col2, ...). The table has got millions of rows in it, and I want to run a query:
SELECT col1, COUNT(col2) FROM <table> WHERE col1 NOT IN (<couple of exclusions>) GROUP BY col1
Unfortunately, this is resulting in a full table scan of the table, which takes upwards of a minute. Is there any way of getting oracle to use the index on the columns to return the results much faster?
EDIT:
more specifically, I'm running the following query:
SELECT owner, COUNT(object_name) FROM all_objects GROUP BY owner
and there is an index on SYS.OBJ$ (SYS.I_OBJ2) which indexes the owner# and name columns; I believe I should be able to use this index in the query, rather than a full table scan of SYS.OBJ$
I have had the chance to play around with this, and my previous comments regarding the NOT IN are a red herring in this case. The key thing is the presence of NULLs, or rather whether the indexed columns have NOT NULL constraints enforced.
This is going to depend on the version of the database you're using, because the optimizer gets smarter with each release. I'm using 11gR1 and the optimizer used the index in all cases except one: when both columns were nullable and I didn't include the NOT IN clause:
SQL> desc big_table
Name Null? Type
----------------------------------- ------ -------------------
ID NUMBER
COL1 NUMBER
COL2 VARCHAR2(30 CHAR)
COL3 DATE
COL4 NUMBER
Without the NOT IN clause...
SQL> explain plan for
2 select col4, count(col1) from big_table
3 group by col4
4 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1753714399
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31964 | 280K| | 7574 (2)| 00:01:31 |
| 1 | HASH GROUP BY | | 31964 | 280K| 45M| 7574 (2)| 00:01:31 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 2340K| 20M| | 4284 (1)| 00:00:52 |
----------------------------------------------------------------------------------------
9 rows selected.
SQL>
When I added the NOT IN clause back in, the optimizer opted to use the index. Weird.
SQL> explain plan for
2 select col4, count(col1) from big_table
3 where col1 not in (12, 19)
4 group by col4
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 343952376
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31964 | 280K| | 5057 (3)| 00:01:01 |
| 1 | HASH GROUP BY | | 31964 | 280K| 45M| 5057 (3)| 00:01:01 |
|* 2 | INDEX FAST FULL SCAN| BIG_I2 | 2340K| 20M| | 1767 (2)| 00:00:22 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
2 - filter("COL1"<>12 AND "COL1"<>19)
14 rows selected.
SQL>
Just to repeat, in all other cases, as long as one of the indexed columns was declared NOT NULL, the index was used to satisfy the query. This may not be true on earlier versions of Oracle, but it probably points the way forward.
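The likely reason is that a B-tree index holds no entry for a row whose indexed columns are all NULL, so the optimizer can only substitute the index for the table when it knows every relevant row is in the index. A NOT NULL constraint gives that guarantee, and so does a predicate such as col1 NOT IN (12, 19), which can never be true for a NULL col1. The sketch below assumes that BIG_I2 covers col4 and col1 (its definition is not shown above) and that col4 contains no NULLs.

```sql
-- Assumption: col4 has no NULL values; otherwise this statement fails.
alter table big_table modify (col4 not null);

-- Re-check the plan for the query that previously did a full table scan.
explain plan for
select col4, count(col1) from big_table
group by col4;

select * from table(dbms_xplan.display);
```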
You could use a hint (http://download.oracle.com/docs/cd/B10501_01/server.920/a96533/hintsref.htm),
but remember that using an index might not always result in faster execution.
(Just in case, are you sure it's doing a table scan and not an index scan?)
Try using COUNT(*) instead of COUNT(col2) (assuming this is appropriate for your problem, of course). Also, maybe try an index on just col1.
You are querying against Oracle's fixed tables. Since you've not stated which DB version this is, I'll assume a recent one. Have the fixed tables been analyzed, and do they have up-to-date statistics? Have you tried your query with the rule-based optimizer, using the /*+ rule */ hint? I've often seen queries against Oracle's own fixed tables perform better when the rule-based optimizer is used.

Resources