Union Query - force hash unique for each query - oracle

I need the list of unique values for group_name across tables T1 and T2. Both tables are partitioned and contain hundreds of millions of records. The column that I'm interested in has a local bitmap index with up-to-date statistics. There are roughly 500 unique values for this column (see the test case).
Given this information, the most efficient way to find the unique values would appear to be: Find the 500 or so unique values from T1, then find the 500 or so unique values from T2, and then deduplicate the list. So that translates into this query:
select distinct group_name from t1 union
select distinct group_name from t2;
However, the actual execution generated by Oracle is something like this:
SELECT
  SORT UNIQUE                <-- 500 records
    UNION-ALL                <-- 1,430,000,000 records
      BITMAP INDEX FFS T1    <-- 1,300,000,000 records
      BITMAP INDEX FFS T2    <-- 130,000,000 records
So the optimizer seems to have re-written the query to something like the below query, effectively skipping the unique operation from the intermediate steps:
select distinct group_name
from (select group_name from t1 union all -- No Unique here
select group_name from t2)
Here is my actual question:
Without rewriting the query, can I force the following execution plan using hints only, i.e., the way my original query was actually written?
SELECT
  SORT UNIQUE                  <-- 500
    UNION-ALL                  <-- 1000
      HASH UNIQUE              <-- 500 <-- Reduce early
        BITMAP INDEX FFS T1    <-- 1,300,000,000
      HASH UNIQUE              <-- 500 <-- Reduce early
        BITMAP INDEX FFS T2    <-- 130,000,000
Here is a test case that creates the two tables above.
create table t1(id number, group_name number, payload varchar2(100)) nologging partition by hash(id) partitions 4;
create table t2(id number, group_name number, payload varchar2(100)) nologging partition by hash(id) partitions 4;
insert /*+ append */ all
into t1
into t2
select rownum as id
,mod(rownum, 500) as group_name
,lpad('x', 100, 'x') as payload
from dual connect by level <= 1e6;
create bitmap index t1_bx on t1(group_name) nologging local;
create bitmap index t2_bx on t2(group_name) nologging local;
I'm using Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
Edit:
Incidentally, while preparing the test case I found two ways around the problem, but I would still like to know if there is a way to /*+ hint */ my way out of the problem.
select group_name from t1 group by group_name union
select group_name from t2 group by group_name;
with v1 as(select /*+ materialize */distinct group_name from t1)
,v2 as(select /*+ materialize */distinct group_name from t2)
select group_name from v1 union
select group_name from v2;
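The row counts behind the "reduce early" plan can be checked on a small scale. The sketch below is not Oracle and says nothing about which plan the Oracle optimizer will choose. It uses Python's built-in sqlite3 module as a stand-in, with a scaled-down version of the test case (10,000 rows per table instead of 1e6, no partitions, no bitmap indexes), only to show how many rows reach the final deduplication in each formulation. The helper name scalar is made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only the row counts are illustrated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(id integer, group_name integer);
    create table t2(id integer, group_name integer);
""")
rows = [(i, i % 500) for i in range(1, 10001)]  # 500 distinct group_name values
conn.executemany("insert into t1 values (?, ?)", rows)
conn.executemany("insert into t2 values (?, ?)", rows)

def scalar(sql):
    """Hypothetical helper: run a query and return its single value."""
    return conn.execute(sql).fetchone()[0]

# Rows fed to the final dedup when nothing is reduced early.
late = scalar("""
    select count(*) from (select group_name from t1 union all
                          select group_name from t2)""")
# Rows fed to the final dedup when each branch is reduced first.
early = scalar("""
    select count(*) from (select distinct group_name from t1 union all
                          select distinct group_name from t2)""")
# Size of the final result, identical for both formulations.
final = scalar("""
    select count(*) from (select distinct group_name from t1 union
                          select distinct group_name from t2)""")
print(late, early, final)  # 20000 1000 500
```

Both formulations end with the same 500 values; the difference is only in how many rows the last deduplication step has to process.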

Related

PLSQL code requirement for partition comparison in oracle

Is there any way to compare the values in one partition with those in another partition of the same table? The requirement is this: I have a table with, say, 5 partitions and two NOT NULL columns. col1 holds only distinct values, while col2 can contain duplicates. I want to compare one partition with another (or with the remaining 4 partitions) on the distinct col2 values, selecting by partition name. If a value matches between two partitions, a new table should be created from the union of the two partitions.
If there is no match between the col2 values of one partition and the rest of the partitions, a new table with the same structure should be created (without any union).
Note:
I want to automate this process through PL/SQL code.
Currently what I am doing manually:
I have one table with five partitions. For example, the table structure is:
create table PART_TEST1
(col1 int not null,
col2 int not null)
partition by range (col2)
(partition part1 values less than (10),
partition part2 values less than (20),
partition part3 values less than (30),
partition part4 values less than (40),
partition part5 values less than (maxvalue));
Data distribution:
col1 has distinct values like 1, 2, 3, and so on.
col2 has values like 1, 2, -1, 1, 2, 3, 4, 1, and so on.
col2 has duplicate values, and my goal is to find the distinct values by partition name, like:
select distinct col2 from PART_TEST1 partition (part1);
For example, the output of the above query is:
Col2
1
2
Then I query another partition to find matching values:
select distinct col2 from PART_TEST1 partition (part2);
For example, the output of the above query is:
Col2
2
3
So now part1 and part2 share one value, 2, and have two non-common values, 1 and 3.
So my final queries are:
create table 'TABLE_NAME' as select * from part_test1 where col2 = 1;
create table 'TABLE_NAME' as select * from part_test1 where col2 = 3;
create table 'TABLE_NAME' as
(select * from part_test1 where col2 = 2
union
select * from part_test1 where col2 = 2);
Hopefully this makes my problem clearer. I am new to PL/SQL and I am not able to compare the partition values. Also, if I am able to compare the values, how can I store the output of the comparison query and then finally create the table? I also think I need to compare one partition with the rest of the partitions using some kind of loop.

perform a select statement on two or three partition at the same time

I have a set of partitions from 1 to 20.
I have this Query below
select * from table1 partition (1);
I would like to do the same statement but on two or three partitions at the same time but not the whole table.
What would be the correct query to do it?
select * from table1 partition (1)
UNION ALL
select * from table1 partition (2)
UNION ALL
select * from table1 partition (3)
--etc.
;

Create partition in an indexed table

I have a table which holds data for 12 hours. Every 5 minutes, a job deletes data that is more than 12 hours old and adds new data. The table has roughly 15-20 million rows. I want to partition it by hour and also index it on the time_stamp column to make row fetching faster.
I will obviously use interval or range partitioning, but I found that interval partitioning doesn't work on an indexed table. So please help me with the syntax so that Oracle creates 12 partitions and automatically adds a new one when data arrives with a time_stamp beyond the first 12 hours. I already have a procedure to delete the oldest partition, which I will use so that there are always 12 hours of data.
I am stating the columns below.
CustomerId,ApplicationId,Time_Stamp,Service
I have tried to come up with this, but don't know how it will create new partitions
CREATE TABLE local_table
(customerid VARCHAR2(30),
applicationid VARCHAR2(30),
time_stamp TIMESTAMP,
service VARCHAR2(30))
PARTITION BY RANGE(time_stamp)
(
PARTITION t1 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 00:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t2 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 01:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t3 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 02:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t4 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 03:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t5 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 04:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t6 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 05:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t7 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 06:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t8 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 07:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t9 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 08:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t10 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 09:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t11 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 10:00:00.0','YYYY-MM-DD HH24:MI:SS.ff')),
PARTITION t12 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 11:00:00.0','YYYY-MM-DD HH24:MI:SS.ff'))
);
create index index_time_stamp on local_table(TIME_STAMP);
I am using Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit
Create the table with automatic (interval) partitioning and a LOCAL (partitioned) index.
The local_partitioned_index clauses let you specify that the index is partitioned on the same columns, with the same number of partitions and the same partition bounds as the table. Oracle Database automatically maintains local index partitioning as the underlying table is repartitioned.
CREATE TABLE local_table
(customerid VARCHAR2(30),
applicationid VARCHAR2(30),
time_stamp TIMESTAMP,
service VARCHAR2(30))
PARTITION BY RANGE(time_stamp)
INTERVAL(NUMTODSINTERVAL(1, 'HOUR'))
(PARTITION t1 VALUES LESS THAN(TO_TIMESTAMP('2015-02-25 00:00:00.0','YYYY-MM-DD HH24:MI:SS.ff'))
);
CREATE INDEX index_time_stamp on local_table(TIME_STAMP) LOCAL;
SELECT *
FROM user_tab_partitions;
INSERT INTO local_table VALUES('1', 'a', sysdate, 'b');
SELECT *
FROM user_tab_partitions;
INSERT INTO local_table VALUES('2', 'c', sysdate + 1/1440, 'd');
SELECT *
FROM user_tab_partitions;
INSERT INTO local_table VALUES('3', 'e', sysdate + 1/24, 'f');
SELECT *
FROM user_tab_partitions;
The INTERVAL clause of the CREATE TABLE statement establishes interval
partitioning for the table. You must specify at least one range
partition using the PARTITION clause. The range partitioning key value
determines the high value of the range partitions, which is called the
transition point, and the database automatically creates interval
partitions for data beyond that transition point. The lower boundary
of every interval partition is the non-inclusive upper boundary of the
previous range or interval partition.
For example, if you create an interval partitioned table with monthly
intervals and the transition point at January 1, 2010, then the lower
boundary for the January 2010 interval is January 1, 2010. The lower
boundary for the July 2010 interval is July 1, 2010, regardless of
whether the June 2010 partition was previously created. Note, however,
that using a date where the high or low bound of the partition would
be out of the range set for storage causes an error. For example,
TO_DATE('9999-12-01', 'YYYY-MM-DD') causes the high bound to be
10000-01-01, which would not be storable if 10000 is out of the legal
range.
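To make the boundary rule concrete for the hourly table above, here is a minimal sketch of the arithmetic in Python. It is only an illustration of the rule as quoted, not Oracle code, and it handles fixed-length intervals such as one hour only (not calendar months). The function name interval_partition_bounds is hypothetical.

```python
from datetime import datetime, timedelta

def interval_partition_bounds(ts, transition, interval):
    """Return (lower_inclusive, upper_exclusive) of the interval partition holding ts.

    Hypothetical helper: partitions are assumed to be aligned to the
    transition point in whole multiples of a fixed-length interval.
    """
    if ts < transition:
        raise ValueError("ts falls in the range section, below the transition point")
    n = (ts - transition) // interval  # whole intervals past the transition point
    lower = transition + n * interval
    return lower, lower + interval

# Transition point taken from partition t1 in the CREATE TABLE above.
transition = datetime(2015, 2, 25, 0, 0, 0)
one_hour = timedelta(hours=1)
lower, upper = interval_partition_bounds(
    datetime(2015, 2, 25, 13, 37), transition, one_hour)
print(lower, upper)  # 2015-02-25 13:00:00 2015-02-25 14:00:00
```

A row stamped 13:37 therefore lands in the partition covering 13:00 up to, but not including, 14:00, whether or not the partitions for the earlier hours already exist.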
Here is a quick DRAFT for your second question, about DROP PARTITION. Examine and debug it before uncommenting the ALTER TABLE line. You can create a scheduler job to run this block of code every hour.
DECLARE
  l_pt_cnt   NUMBER;
  l_pt_name  VARCHAR2(100);
  l_minrowid ROWID;
  l_mindate  TIMESTAMP;
BEGIN
  -- get partition count
  SELECT count(*)
    INTO l_pt_cnt
    FROM user_tab_partitions
   WHERE table_name = 'LOCAL_TABLE';
  IF l_pt_cnt > 12 THEN
    -- get the oldest time_stamp in the table
    SELECT min(time_stamp)
      INTO l_mindate
      FROM LOCAL_TABLE;
    -- get ROWID with min date
    SELECT min(rowid)
      INTO l_minrowid
      FROM LOCAL_TABLE
     WHERE time_stamp = l_mindate;
    -- get name of partition with row with min date
    -- (dbms_rowid.rowid_object returns the data object number,
    --  so it is matched against data_object_id)
    SELECT subobject_name
      INTO l_pt_name
      FROM LOCAL_TABLE
      JOIN user_objects
        ON dbms_rowid.rowid_object(LOCAL_TABLE.rowid) = user_objects.data_object_id
     WHERE LOCAL_TABLE.rowid = l_minrowid;
    DBMS_OUTPUT.put_line('ALTER TABLE LOCAL_TABLE DROP PARTITION ' || l_pt_name);
    --EXECUTE IMMEDIATE 'ALTER TABLE LOCAL_TABLE DROP PARTITION ' || l_pt_name;
  END IF;
END;
/

Convert rownum from Oracle in Postgres, in "having" clause

I need to convert a query from Oracle SQL to Postgres.
select count(*) from table1 group by column1 having max(rownum) = 4
If I replace "rownum" with "row_number() over()", I have an error message: "window functions are not allowed in HAVING".
Could you help me to get the same result in Postgres, as in Oracle?
The query below will do what your Oracle query is doing.
select count(*) from
(select column1, row_number() over () as x from table1) as t
group by column1 having max(t.x) = 6;
However
Neither Oracle nor Postgres guarantees the order in which records are read unless you specify an ORDER BY clause. So running the query multiple times can give inconsistent results, depending on how the database decides to process the query. Certainly in Postgres, any updates will change the underlying row order.
In the example below I've added an extra column, seq, which is used to provide a consistent sort.
CREATE TABLE table1 (column1 int, seq int);
insert into table1 values (0,1),(0,2),(0,3),(1,4),(0,5),(1,6);
And a revised query which forces the order to be consistent:
select count(*) from
(select column1, row_number() over (order by seq) as x from table1) as t
group by column1 having max(t.x) = 6;
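The revised query can be tried against the sample data without a Postgres server. The sketch below is an assumption-laden stand-in: it runs the same SQL through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), so it only confirms the logic of the query, not Postgres-specific behaviour.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Postgres; the SQL itself is unchanged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (column1 int, seq int);
    insert into table1 values (0,1),(0,2),(0,3),(1,4),(0,5),(1,6);
""")
query = """
    select count(*) from
      (select column1, row_number() over (order by seq) as x from table1) as t
    group by column1 having max(t.x) = 6
"""
result = conn.execute(query).fetchall()
print(result)  # [(2,)]
```

Only the column1 = 1 group contains row number 6, and that group has two rows, so the query returns a single count of 2.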

Counting rows by a condition in another table

I need to count the rows in one table that link to a second table by (name.sample), where the matching (name.sample) row in the second table was created before (or after) a certain date.
select count(*) from table1 t1
inner join table2 t2 on t1.my_foreign_key_column = t2.my_primary_key_column
where t2.creation_date >= 'my_date_literal'
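As a small check of the join-and-filter shape, the sketch below runs the same query through Python's built-in sqlite3 module with invented sample rows. The table contents, the cutoff value, and the use of ISO-formatted text dates are all assumptions for illustration; the date is passed as a bound parameter instead of a literal.

```python
import sqlite3

# Assumption: SQLite with made-up rows; dates are ISO text so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table2 (my_primary_key_column integer primary key,
                         creation_date text);
    create table table1 (id integer primary key,
                         my_foreign_key_column integer
                             references table2(my_primary_key_column));
    insert into table2 values (1, '2015-01-10'), (2, '2015-03-05');
    insert into table1 values (10, 1), (11, 2), (12, 2), (13, 2);
""")
cutoff = "2015-02-01"  # hypothetical cutoff date
count = conn.execute("""
    select count(*) from table1 t1
    inner join table2 t2 on t1.my_foreign_key_column = t2.my_primary_key_column
    where t2.creation_date >= ?
""", (cutoff,)).fetchone()[0]
print(count)  # 3
```

Three table1 rows point at the table2 row created on or after the cutoff, so the count is 3. Switching the comparison to < gives the "before" variant.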
