How to force push predicate through UNION ALL inside a view? - oracle

I have a performance problem with a UNION ALL view. It can be solved by splitting the view into two separate views, but that rather defeats the purpose of creating a view.
Here is a simple test case (Oracle 11.2.0.3.0). The real queries use about 10 different tables instead of just 3.
CREATE TABLE t0 (id number, ref_id number);
CREATE INDEX i0 on t0(id);
CREATE TABLE t1 (id number, amount number);
CREATE INDEX i1 on t1(id);
CREATE TABLE t2 (id number, amount number);
CREATE INDEX i2 on t2(id);
insert into t0 select rownum, rownum * 10 from dual connect by rownum <= 100000;
insert into t1 select rownum, rownum * 10 from dual connect by rownum <= 100000;
insert into t2 select rownum, rownum * 10 from dual connect by rownum <= 100000;
CREATE OR REPLACE VIEW v2
AS
SELECT id, sum(amount) AS total_amount
FROM t1
GROUP BY id;
CREATE OR REPLACE VIEW v3
AS
SELECT id
,sum(amount) as total_amount
FROM (SELECT id, amount
FROM t1
UNION ALL
SELECT id, amount
FROM t2)
GROUP BY id
HAVING sum(amount) <> 0
;
CREATE OR REPLACE view v1
AS
SELECT *
FROM v2
UNION ALL
SELECT *
FROM v3;
The following query uses 766 gets. Adding push_pred(a) does nothing for it.
select --+ first_rows
*
from t0, v1 a
where t0.ref_id = a.id
and t0.id = 1;
The next query needs only 16 gets, although it does the same thing as the first one; it just scans t0 twice instead of once.
select --+ first_rows
*
from t0, v2
where t0.ref_id = v2.id
and t0.id = 1
union all
select *
from t0, v3
where t0.ref_id = v3.id
and t0.id = 1;
What am I missing?
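One way to check whether the join predicate actually reaches the branches of the view is to look at the execution plan. The sketch below reuses the first query, with the hints written in `/*+ ... */` form. In plans where join predicate pushdown took place, the view step typically appears as `VIEW PUSHED PREDICATE` or `UNION ALL PUSHED PREDICATE`. Whether that happens for this particular view is exactly what is in question, so treat this as a diagnostic, not a fix.

```sql
-- Store the plan for the slow query (same query as above, hints combined).
EXPLAIN PLAN FOR
select /*+ first_rows push_pred(a) */
*
from t0, v1 a
where t0.ref_id = a.id
and t0.id = 1;

-- Display the stored plan; look for a "PUSHED PREDICATE" operation on the view.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```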

Related

Oracle adding a subquery in a CTE

I have the following setup, which works fine and generates output as expected.
I'm trying to add the locations subquery into the CTE so my output will have a random location_id for each row.
The subquery is straightforward and should work, but I get syntax errors when I try to place it in the 'data' CTE. I was hoping someone could help me out.
CREATE TABLE employees(
employee_id NUMBER(6),
emp_name VARCHAR2(30)
);
INSERT INTO employees(
employee_id,
emp_name
) VALUES
(1, 'John Doe');
INSERT INTO employees(
employee_id,
emp_name
) VALUES
(2, 'Jane Smith');
INSERT INTO employees(
employee_id,
emp_name
) VALUES
(3, 'Mike Jones');
CREATE TABLE locations AS
SELECT level AS location_id,
'Door ' || level AS location_name
FROM dual
CONNECT BY level <=
with rws as (
select level rn from dual connect by level <= 5 ),
data as ( select e.*,round (dbms_random.value(1,5)
) n from employees e)
select employee_id,
emp_name,
trunc (sysdate) + dbms_random.value (0, 5) AS random_date
from rws
join data d on rn <= n
order by employee_id;
-- trying to make this work
with rws as ( select level rn from dual connect by level <= 5 ),
data as ( select e.*, loc.location_id = (
select location_id
from locations order by dbms_random.value()
fetch first 1 row only
),
round (dbms_random.value(1,5)
) n from employees e )
select employee_id,
emp_name,
trunc (sysdate) + dbms_random.value (0, 5) AS random_date
from rws
join data d on rn <= n
order by employee_id;
You need to alias the subquery column expression, rather than trying to assign it to a [variable] name. So instead of this:
with rws as ( select level rn from dual connect by level <= 5 ),
data as ( select e.*, loc.location_id = (
select location_id
from locations order by dbms_random.value()
fetch first 1 row only
),
round (dbms_random.value(1,5)
) n from employees e )
you would do this:
with rws as (
select level rn
from dual
connect by level <= 5
),
data as (
select e.*,
(
select location_id
from locations
order by dbms_random.value()
fetch first 1 row only
) as location_id,
round (dbms_random.value(1,5)) as n
from employees e
)
But yes, you'll get the same location_id for each row, which probably isn't what you want.
There are probably better ways to avoid it (or to approach whatever you're actually trying to achieve) but one option is to force the subquery to be correlated by adding something like:
where location_id != -1 * e.employee_id
although that might be expensive. It's probably worth asking a new question about that specific aspect.
I am getting the same location_id for every employee_id, which I don't want either.
The subquery is in the wrong place then; move it to the main query, and correlate against both ID and n:
with rws as (
select level rn
from dual
connect by level <= 5
),
data as (
select e.*,
round (dbms_random.value(1,5)) as n
from employees e
)
select d.employee_id,
d.emp_name,
(
select location_id
from locations
where location_id != -1 * d.employee_id * d.n
order by dbms_random.value()
fetch first 1 row only
) as location_id,
trunc (sysdate) + dbms_random.value (0, 5) AS random_date
from rws r
join data d on r.rn <= d.n
order by d.employee_id;
Or move the location part to a new CTE, I suppose, with its own row number; and join that on one of your other generated values.
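A minimal sketch of that last idea, assuming the `locations` table has at least five rows (the join is an inner join, so rows whose `rn` has no matching location would be dropped). The CTE name `locs` and the column `loc_rn` are made up for illustration. The shuffled numbering would normally be generated once per query, so the k-th row of every employee tends to get the same location; if that is not acceptable, the correlated subquery above is the safer choice.

```sql
with rws as (
select level rn
from dual
connect by level <= 5
),
data as (
select e.*,
round (dbms_random.value(1,5)) as n
from employees e
),
-- hypothetical CTE: number the locations in random order
locs as (
select location_id,
row_number() over (order by dbms_random.value()) as loc_rn
from locations
)
select d.employee_id,
d.emp_name,
l.location_id,
trunc (sysdate) + dbms_random.value (0, 5) AS random_date
from rws r
join data d on r.rn <= d.n
-- join the shuffled locations on the generated row number
join locs l on l.loc_rn = r.rn
order by d.employee_id;
```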

ORACLE 12.2.01 selecting columns from different tables with similar names --> internal column identifier used

I wrote a SELECT that performs a UNION, with some JOINs in each branch of the UNION. The joined tables share some column names.
When "SELECT *" is used, Oracle displays internal column names instead of the real column names.
To show the effect I created two tables (with partly similar column identifiers, "TID" and "TNAME") and filled them with some data:
create table table_one (tid number(10), tname varchar2(10), t2id number(10));
create table table_two (tid number(10), tname varchar2(10));
insert into table_two values (1,'one');
insert into table_two values (2,'two');
insert into table_two values (3,'three');
insert into table_one values (1,'eins',1);
insert into table_one values (2,'zwei',2);
insert into table_one values (3,'drei',3);
Then I selected the columns with the following statement:
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 2;
And got this confusing result:
QCSJ_C000000000300000 QCSJ_C000000000300002 T2ID QCSJ_C000000000300001 QCSJ_C000000000300004
1 eins 1 1 one
2 zwei 2 2 two
When the statement is written with tablenames to specify the columns, everything works as I expected:
select table_one.* , table_two.*
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 2;
TID TNAME T2ID TID TNAME
1 eins 1 1 one
2 zwei 2 2 two
Can anybody explain that?
I expanded my tests with two more tables so that no table is used twice in the statement:
create table table_3 (tid number(10), tname varchar2(10), t4id number(10));
create table table_4 (tid number(10), tname varchar2(10));
insert into table_4 values (1,'one');
insert into table_4 values (2,'two');
insert into table_4 values (3,'three');
insert into table_3 values (1,'eins',1);
insert into table_3 values (2,'zwei',2);
insert into table_3 values (3,'drei',3);
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_3
inner join table_4 on table_4.tid = table_3.t4id
where table_3.tid = 2;
The result is the same. Oracle uses internal identifiers.
According to Oracle (DocId 2658003.1), this happens when three conditions are met:
ANSI join
UNION / UNION ALL
the same table appears more than once in the query
Apparently, "QCSJ_C" is used internally when Oracle transforms ANSI-style joins.
EDIT:
Found a minimal example:
SELECT * FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy
UNION
SELECT * FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy;
QCSJ_C000000000300000 QCSJ_C000000000300001
X X
It can be fixed by either using non-ANSI join syntax:
SELECT * FROM dual d1, dual d2 WHERE d1.dummy=d2.dummy
UNION
SELECT * FROM dual d1, dual d2 WHERE d1.dummy=d2.dummy;
DUMMY DUMMY_1
X X
Or, preferably by using column names instead of *:
SELECT d1.dummy, d2.dummy FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy
UNION
SELECT d1.dummy, d2.dummy FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy;
DUMMY DUMMY_1
X X
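Applied to the tables from the question, the same idea could look like the sketch below. The table aliases `o` and `t` and the column aliases (`one_tid`, `two_tid`, and so on) are made up for illustration. The result columns take their names from the first branch of the UNION, so only that branch needs aliases.

```sql
-- Explicit column list with distinct aliases instead of SELECT *
SELECT o.tid   AS one_tid,
       o.tname AS one_tname,
       o.t2id,
       t.tid   AS two_tid,
       t.tname AS two_tname
FROM table_one o
JOIN table_two t ON t.tid = o.t2id
WHERE o.tid = 1
UNION
SELECT o.tid, o.tname, o.t2id, t.tid, t.tname
FROM table_one o
JOIN table_two t ON t.tid = o.t2id
WHERE o.tid = 2;
```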
Interesting!
However, I would never use a set operator (UNION, UNION ALL, INTERSECT, MINUS) together with an asterisk (*).
The order of columns can change, maybe not by you but by somebody doing maintenance on the database, or by migrating your database to a new system using export/import, etc. Simple example:
CREATE TABLE t (a INT, b INT, c INT);
SELECT * FROM t;
A B C
ALTER TABLE t MODIFY b INVISIBLE;
ALTER TABLE t MODIFY b VISIBLE;
SELECT * FROM t;
A C B

Hive create table not insert data

I am running the below hive query. After the mapreduce is complete I see that no data is inserted.
create table t_123 as
select * from
(
select * from t1 union all
select * from t2 union all
select * from t3
) X
But if I run just the SELECT query below, I get results. The data types of t1, t2 and t3 are the same. Towards the end I get the following message:
"numFiles = 27 , numRows = 0 and totalSize = 34567...."
select * from t1 union all
select * from t2 union all
select * from t3
Any thoughts on what the issue could be? I'm running this on Tez.

Oracle select random rows matching a join condition

My objective is simple: I have to create a temporary table with random values from an employee table whenever the department is a particular one (say 2). For the other departments I don't care about the value; it can be NULL.
Currently I have the following :
create table test
as
select s.DEPTNAME,
cast (
(case when s.DEPTID in (2) then
(SELECT a.ENAME FROM
(SELECT b.ENAME, b.DEPTID FROM EMPLOYEE b
WHERE b.DEPTID IS NOT NULL
ORDER BY DBMS_RANDOM.VALUE) a
WHERE a.DEPTID = s.DEPTID AND ROWNUM = 1
)
END)
AS VARCHAR2(30)) "ENAME" from DEPARTMENT s;
The main issue is performance: for every department row with DEPTID 2, we sort the EMPLOYEE table to get a single random ENAME.
Is there a better way to do this? I know SAMPLE might work, but I want more randomness.
First idea - join randomly numbered enames:
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
from employee where deptid = 2),
c as (select count(1) cnt from e),
d as (select deptname, deptid, round(dbms_random.value(1, c.cnt)) rn from department, c)
select d.deptname, e.ename from d left join e using (rn, deptid)
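One detail worth noting about the query above: `dbms_random.value(low, high)` returns a value greater than or equal to `low` and less than `high`, so rounding it makes the first and last row numbers come up about half as often as the ones in between. A variant using `trunc` over a range one wider, sketched below, should give each row number an equal chance. Everything else is unchanged from the query above.

```sql
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
      from employee where deptid = 2),
c as (select count(*) cnt from e),
-- trunc over [1, cnt + 1) yields 1..cnt with equal probability
d as (select deptname, deptid, trunc(dbms_random.value(1, c.cnt + 1)) rn
      from department cross join c)
select d.deptname, e.ename
from d left join e using (rn, deptid);
```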
A second possible solution, which worked for me, is to create a function that returns a random ename from the employee table and use it in your query, but it would probably be slower.
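The answer does not show that function, so the following is only a sketch of what it might look like. The function name `random_ename` and its parameter are made up for illustration. It still sorts the matching employees on every call, which is consistent with the expectation that it would be slower.

```sql
-- Hypothetical helper: returns one random ENAME for the given department,
-- or NULL when the department has no employees.
create or replace function random_ename(p_deptid in number)
  return varchar2
is
  v_ename employee.ename%type;
begin
  select ename
  into v_ename
  from (select ename
        from employee
        where deptid = p_deptid
        order by dbms_random.value)
  where rownum = 1;
  return v_ename;
exception
  when no_data_found then
    return null;
end random_ename;
/

-- Usage: only department 2 gets a random name, the rest stay NULL.
select s.deptname,
       case when s.deptid = 2 then random_ename(s.deptid) end as ename
from department s;
```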
Edit - according to comment:
If, for some reason, the first part of your statement is "fixed", then you could use this syntax:
create table test as
select deptname, ename from (
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
from employee where deptid = 2),
c as (select count(1) cnt from e),
d as (select deptname, deptid, round(dbms_random.value(1, c.cnt)) rn
from department cross join c)
select d.deptname, e.ename from d left join e using (rn, deptid));

How to write a table literal in Oracle?

A table with one column and one row can be created with:
select 'create' as col from dual;
This can be used to build table joins:
with
a as (select 'create' as ac from dual),
b as (select 'delete' as bc from dual)
select * from a left outer join b on (ac = bc);
Now I would like to have two rows. I did it in this way:
select 'create' as col from dual
union
select 'delete' as col from dual;
But is there a more compact notation for this? I tried
select ('create', 'delete') as col from dual;
but it does not work.
You can use a collection type and the TABLE operator, for example (works in Oracle 10g):
SQL> SELECT column_value FROM TABLE(SYS.ODCIVARCHAR2LIST('abc', 'def', 'ghi'));
COLUMN_VALUE
--------------------------------------------------------------------------------
abc
def
ghi
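As a sketch of how this fits the join from the question, the collection can take the place of each `select ... from dual` inside the WITH clause. The collection exposes its values through the `COLUMN_VALUE` pseudocolumn, which is aliased here to the column names used in the question.

```sql
with
a as (select column_value as ac
      from table(sys.odcivarchar2list('create', 'delete'))),
b as (select column_value as bc
      from table(sys.odcivarchar2list('delete')))
select * from a left outer join b on (ac = bc);
```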
A couple of ways to generate rows. You could use rownum against a table with a larger number of rows:
SELECT rownum AS a
FROM user_objects
WHERE rownum <= 3
You could use a hierarchical query:
SELECT level AS a
FROM dual
CONNECT BY LEVEL <= 3
EDIT: change int sequence to alpha sequence:
SELECT CHR( ASCII('a') + level - 1 )
FROM dual
CONNECT BY LEVEL <= 3
