Hive create table not insert data - hadoop

I am running the below hive query. After the mapreduce is complete I see that no data is inserted.
create table t_123 as
select * from
(
select * from t1 union all
select * from t2 union all
select * from t3
) X
But if i just run the select query as below i get results. Data type of t1, t2 and t3 are same. Towards the end I get the below statement:
"numFiles = 27 , numRows = 0 and totalSize = 34567...."
select * from t1 union all
select * from t2 union all
select * from t3
Any thoughts what could be the issue. I'm running this using TEZ.

Related

ORACLE 12.2.01 selecting columns from different tables with similar names --> internal column identifier used

I wrote a SELECT performing a UNION and in each UNION part using some JOINs. The tables, which are joined have partly the same column identifiers.
And if a "SELECT *" is performed, ORACLE decides to display the internal column names instead of the "real" column names.
To show the effect I created two tables (with partly similar column identifiers, "TID" and "TNAME") and filled them with some data:
create table table_one (tid number(10), tname varchar2(10), t2id number(10));
create table table_two (tid number(10), tname varchar2(10));
insert into table_two values (1,'one');
insert into table_two values (2,'two');
insert into table_two values (3,'three');
insert into table_one values (1,'eins',1);
insert into table_one values (2,'zwei',2);
insert into table_one values (3,'drei',3);
The I SELECTED the columns afterwards with the following statement:
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 2;
And got this confusing result:
QCSJ_C000000000300000 QCSJ_C000000000300002 T2ID QCSJ_C000000000300001 QCSJ_C000000000300004
1 eins 1 1 one
2 zwei 2 2 two
When the statement is written with tablenames to specify the columns, everything works as I expected:
select table_one.* , table_two.*
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
minus
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 2;
TID TNAME T2ID TID TNAME
1 eins 1 1 one
2 zwei 2 2 two
Can anybody explain that?
I expanded my tests with two more tables to prevent double usage of table in the statement:
create table table_3 (tid number(10), tname varchar2(10), t4id number(10));
create table table_4 (tid number(10), tname varchar2(10));
insert into table_4 values (1,'one');
insert into table_4 values (2,'two');
insert into table_4 values (3,'three');
insert into table_3 values (1,'eins',1);
insert into table_3 values (2,'zwei',2);
insert into table_3 values (3,'drei',3);
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_3
inner join table_4 on table_4.tid = table_3.t4id
where table_3.tid = 2;
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_3
inner join table_4 on table_4.tid = table_3.t4id
where table_3.tid = 2;
The result is the same. Oracle uses internal identifiers.
According to Oracle (DocId 2658003.1), this happens when three conditions are met:
ANSI join
UNION / UNION ALL
the same table appears more than once in the query
Aparently, "QCSJ_C" is used internally when Oracle transforms ANSI style joins.
EDIT:
Found a minimal example:
SELECT * FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy
UNION
SELECT * FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy;
QCSJ_C000000000300000 QCSJ_C000000000300001
X X
It can be fixed by either using non-ANSI join syntax:
SELECT * FROM dual d1, dual d2 WHERE d1.dummy=d2.dummy
UNION
SELECT * FROM dual d1, dual d2 WHERE d1.dummy=d2.dummy;
DUMMY DUMMY_1
X X
Or, preferably by using column names instead of *:
SELECT d1.dummy, d2.dummy FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy
UNION
SELECT d1.dummy, d2.dummy FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy;
DUMMY DUMMY_1
X X
Interesting!
However, I would never use a set operator (UNION, UNION ALL, INTERSECT, MINUS) together with an asterisk (*).
The order of columns can change, maybe not by you but by somebody doing maintenance on the database, or by migrating your database to a new system using export/import, etc. Simple example:
CREATE TABLE t (a INT, b INT, c INT);
SELECT * FROM t;
A B C
ALTER TABLE t MODIFY b INVISIBLE;
ALTER TABLE t MODIFY b VISIBLE;
SELECT * FROM t;
A C B

Create Oracle database query

I have the following table (tb1):
I need to create a query that consist of:
Select the oldest Date_created having Status 001.
Should not select a PCR if the same PCR having status 002.
For the table above, this query should return the following table:
Can anyone help me how to create it?
Final query:
select q2.id,q2.PCR,q2.status, q2.date_created from (select pcr, min(date_created) date_created from table1 t1 where not exists (select * from table1 t2 where t1.pcr = t2.pcr and t2.status = '002') group by pcr) q1 inner join (select * from table1) q2 on q1.PCR = q2.PCR and q1.date_created = q2.date_created

Perfomance improvement for hive query

I am using multiple union all and then doing the sum of each column, but this query runs like forever. I have 96GB memory cluster. Please tell me what should i do for performance improvement. Following is my query in hive.
total as
(
select * from
(
select * from table1
union all
select * from table2
union all
select * from table3
union all
select * from table4
union all
select * from table5
union all
select * from table6
union all
select * from table7
union all
select * from table8
union all
select * from table9
)p
)
Select * from
(
select
sum(col_1),
sum(col_2),
sum(col_3),
sum(col_4),
sum(col_5),
sum(col_6),
sum(col_7),
sum(col_8),
sum(col_9),
sum(col_10)
from total
)q;

Old SQL to the New. table by table joins

With our oracle Database/queries that are currently running i have come across some SQL where they have done a table by table join. Now I want to be able to understand this so could someone explain? I am a newbie to this.
SELECT *
FROM ra_customer_trx_all
WHERE customer_trx_id IN
(SELECT customer_trx_id
FROM AR_PAYMENT_SCHEDULES_ALL
WHERE payment_schedule_ID IN
(SELECT payment_schedule_ID
FROM AR_RECEIVABLE_APPLICATIONS_ALL
WHERE applied_customer_trx_id =
SELECT customer_trx_id FROM ra_customer_trx_all WHERE trx_number = '34054'));
1st:
select all TRX records from table ra_customer_trx_all where number = 34054
we are looking for customer_trx_id
select * from ra_customer_trx_all t4 where t4.trx_number = '34054'
2nd: select all records from payment_schedule table that have the IDs from step1
select * from AR_RECEIVABLE_APPLICATIONS_ALL t3 where t3.payment_schedule_ID = (prev select)
3rd: select all records from customer_trx_all table that have the IDs from step2
select * from AR_PAYMENT_SCHEDULES_ALL t2 where t3.customer_trx_id = (prev select)
4th
select * from ra_customer_trx_all t1 where t2.customer_trx_id = (prev select)
5:
summary:
if trx is transation
the logic is:
select all customer transaction records that have been scheduled to be paid via the RECEIVABLE_APPLICATIONS and transaction number is 34054
SELECT t1.*
FROM ra_customer_trx_all t1
inner join AR_PAYMENT_SCHEDULES_ALL t2 on t2.customer_trx_id = t1.customer_trx_id
inner join AR_RECEIVABLE_APPLICATIONS_ALL t3 on t3.payment_schedule_ID = t2.payment_schedule_ID
inner join ra_customer_trx_all t4 on t4.customer_trx_id = t3.applied_customer_trx_id
where t4.trx_number = '34054'
You can replace
select *
from tableA
where columnA in (select columnB
from tableB
where columnB1 in (select ...))
with
select *
from tableA, tableB
where tableA.columnA = tableB.columnB
and tableB.columnB1 in (select ...)
Apply this pattern sequentially to each subquery.
Short explanation: you open outer brackets after IN keyword, move table from inner FROM clause to outer, and add condition to WHERE clause: column before IN have to be equal to column in SELECT clause in subquery.

How to force push predicate through UNION ALL inside a view?

I have a performance problem on a UNION ALL view. The problem can be solved by rewriting the view in two separate views but that kind of defeats the purpose of creating a view.
Here is a simple test case (Oracle 11.2.0.3.0). The real queries use about 10 different tables instead of just 3.
CREATE TABLE t0 (id number, ref_id number);
CREATE INDEX i0 on t0(id);
CREATE TABLE t1 (id number, amount number);
CREATE INDEX i1 on t1(id);
CREATE TABLE t2 (id number, amount number);
CREATE INDEX i2 on t2(id);
insert into t0 select rownum, rownum * 10 from dual connect by rownum <= 100000;
insert into t1 select rownum, rownum * 10 from dual connect by rownum <= 100000;
insert into t2 select rownum, rownum * 10 from dual connect by rownum <= 100000;
CREATE OR REPLACE VIEW v2
AS
SELECT id, sum(amount) AS total_amount
FROM t1
GROUP BY id;
CREATE OR REPLACE VIEW v3
AS
SELECT id
,sum(amount) as total_amount
FROM (SELECT id, amount
FROM t1
UNION ALL
SELECT id, amount
FROM t2)
GROUP BY id
HAVING sum(amount) <> 0
;
CREATE OR REPLACE view v1
AS
SELECT *
FROM v2
UNION ALL
SELECT *
FROM v3;
The following query uses 766 gets. Adding push_pred(a) does nothing for it.
select --+ first_rows
*
from t0, v1 a
where t0.ref_id = a.id
and t0.id = 1;
Next query bottoms out at 16 gets although it does the same thing as the first one, only scans t0 two times instead of one.
select --+ first_rows
*
from t0, v2
where t0.ref_id = v2.id
and t0.id = 1
union all
select *
from t0, v3
where t0.ref_id = v3.id
and t0.id = 1;
What am I missing?

Resources