Hive create table not insert data

I am running the Hive query below. After the job completes, I see that no data has been inserted.
create table t_123 as
select * from
(
select * from t1 union all
select * from t2 union all
select * from t3
) X
But if I run just the SELECT query below, I get results:
select * from t1 union all
select * from t2 union all
select * from t3
The data types of t1, t2 and t3 are the same. Towards the end of the CREATE TABLE run, I get the following statement:
"numFiles = 27 , numRows = 0 and totalSize = 34567...."
Any thoughts on what the issue could be? I'm running this on Tez.

Related

ORACLE 12.2.01 selecting columns from different tables with similar names --> internal column identifier used

I wrote a SELECT that performs a UNION, with some JOINs in each UNION branch. The joined tables partly share the same column identifiers.
If a "SELECT *" is performed, Oracle displays the internal column names instead of the "real" column names.
To show the effect, I created two tables (with the partly identical column identifiers "TID" and "TNAME") and filled them with some data:
create table table_one (tid number(10), tname varchar2(10), t2id number(10));
create table table_two (tid number(10), tname varchar2(10));
insert into table_two values (1,'one');
insert into table_two values (2,'two');
insert into table_two values (3,'three');
insert into table_one values (1,'eins',1);
insert into table_one values (2,'zwei',2);
insert into table_one values (3,'drei',3);
Then I selected the columns with the following statement:
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 2;
I got this confusing result:
QCSJ_C000000000300000 QCSJ_C000000000300002 T2ID QCSJ_C000000000300001 QCSJ_C000000000300004
1 eins 1 1 one
2 zwei 2 2 two
When the statement uses table names to qualify the columns, everything works as I expected:
select table_one.* , table_two.*
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 2;
TID TNAME T2ID TID TNAME
1 eins 1 1 one
2 zwei 2 2 two
Can anybody explain that?
I extended my tests with two more tables so that no table is used twice in the statement:
create table table_3 (tid number(10), tname varchar2(10), t4id number(10));
create table table_4 (tid number(10), tname varchar2(10));
insert into table_4 values (1,'one');
insert into table_4 values (2,'two');
insert into table_4 values (3,'three');
insert into table_3 values (1,'eins',1);
insert into table_3 values (2,'zwei',2);
insert into table_3 values (3,'drei',3);
select *
from table_one
inner join table_two on table_two.tid = table_one.t2id
where table_one.tid = 1
union
select *
from table_3
inner join table_4 on table_4.tid = table_3.t4id
where table_3.tid = 2;
The result is the same. Oracle uses internal identifiers.

According to Oracle (DocId 2658003.1), this happens when three conditions are met:
ANSI join
UNION / UNION ALL
the same table appears more than once in the query
Apparently, "QCSJ_C" is used internally when Oracle transforms ANSI-style joins.
EDIT:
Found a minimal example:
SELECT * FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy
UNION
SELECT * FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy;
QCSJ_C000000000300000 QCSJ_C000000000300001
X X
It can be fixed either by using non-ANSI join syntax:
SELECT * FROM dual d1, dual d2 WHERE d1.dummy=d2.dummy
UNION
SELECT * FROM dual d1, dual d2 WHERE d1.dummy=d2.dummy;
DUMMY DUMMY_1
X X
or, preferably, by using column names instead of *:
SELECT d1.dummy, d2.dummy FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy
UNION
SELECT d1.dummy, d2.dummy FROM dual d1 JOIN dual d2 ON d1.dummy=d2.dummy;
DUMMY DUMMY_1
X X
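Here is a minimal sketch of the "use column names" fix. It uses Python's built-in sqlite3 module as a stand-in for Oracle. SQLite does not reproduce the QCSJ_C naming at all, so the sketch only shows one thing: an explicit column list with aliases gives the result columns predictable names. The `AS dummy_1` alias is an addition, not part of the answer above, and the `dual` table is a hand-made stand-in for Oracle's DUAL.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption: dialect differences ignored).
conn = sqlite3.connect(":memory:")

# Hand-made stand-in for Oracle's one-row DUAL table.
conn.execute("CREATE TABLE dual (dummy TEXT)")
conn.execute("INSERT INTO dual VALUES ('X')")

# Explicit, aliased column list in every branch of the UNION.
query = """
SELECT d1.dummy AS dummy, d2.dummy AS dummy_1
FROM dual d1 JOIN dual d2 ON d1.dummy = d2.dummy
UNION
SELECT d1.dummy AS dummy, d2.dummy AS dummy_1
FROM dual d1 JOIN dual d2 ON d1.dummy = d2.dummy
"""
cur = conn.execute(query)
rows = cur.fetchall()

# cursor.description holds one 7-tuple per result column; element 0 is the name.
names = [col[0] for col in cur.description]
print(names, rows)  # ['dummy', 'dummy_1'] [('X', 'X')]
```

UNION removes the duplicate row, so only one row is returned.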

Interesting!
However, I would never use a set operator (UNION, UNION ALL, INTERSECT, MINUS) together with an asterisk (*).
The order of columns can change, perhaps not through you but through somebody doing maintenance on the database, or through migrating your database to a new system using export/import, etc. A simple example:
CREATE TABLE t (a INT, b INT, c INT);
SELECT * FROM t;
A B C
ALTER TABLE t MODIFY b INVISIBLE;
ALTER TABLE t MODIFY b VISIBLE;
SELECT * FROM t;
A C B
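The underlying problem is that set operators match columns by position, not by name. A small sketch makes this visible, again using Python's sqlite3 module as a stand-in. SQLite has no INVISIBLE columns, so two hypothetical tables, `t_old` and `t_new`, hold the same columns in a different physical order. They stand in for one table before and after its column order changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: same columns, different physical order.
conn.execute("CREATE TABLE t_old (a INT, b INT, c INT)")
conn.execute("CREATE TABLE t_new (a INT, c INT, b INT)")
conn.execute("INSERT INTO t_old (a, b, c) VALUES (1, 2, 3)")
conn.execute("INSERT INTO t_new (a, b, c) VALUES (1, 2, 3)")

# "*" expands in physical order, so UNION ALL silently pairs t_old.b with t_new.c.
star_rows = conn.execute(
    "SELECT * FROM t_old UNION ALL SELECT * FROM t_new"
).fetchall()

# An explicit column list keeps the pairing correct regardless of physical order.
explicit_rows = conn.execute(
    "SELECT a, b, c FROM t_old UNION ALL SELECT a, b, c FROM t_new"
).fetchall()

print(sorted(star_rows), sorted(explicit_rows))
```

The `*` version returns the t_new row as (1, 3, 2). No error is raised; the values simply land in the wrong columns.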

Create Oracle database query

I have the following table (tb1):
I need to create a query that:
selects the oldest Date_created having Status 001;
does not select a PCR if the same PCR also has status 002.
For the table above, this query should return the following table:
Can anyone help me create it?

Final query:
select q2.id, q2.PCR, q2.status, q2.date_created
from (select pcr, min(date_created) date_created
      from table1 t1
      where not exists (select * from table1 t2
                        where t1.pcr = t2.pcr and t2.status = '002')
      group by pcr) q1
inner join (select * from table1) q2
  on q1.PCR = q2.PCR and q1.date_created = q2.date_created
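The final query can be sanity-checked with a sketch in Python's sqlite3 module. The question's table is not shown, so the rows below are hypothetical. Dates are ISO strings so that MIN() picks the oldest one, and the ORDER BY is added only to make the output deterministic.

Note that, as written, the query takes the minimum over every non-002 row of a PCR. It therefore returns the oldest 001 row only if that PCR's remaining rows all have status 001.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, pcr TEXT, status TEXT, date_created TEXT)")

# Hypothetical sample data (the question's original table is not available).
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "A", "001", "2020-01-05"),
        (2, "A", "001", "2020-01-01"),  # oldest row for PCR A
        (3, "B", "001", "2020-01-02"),
        (4, "B", "002", "2020-01-03"),  # PCR B has a 002 row, so it is excluded
        (5, "C", "001", "2020-02-01"),
    ],
)

# The answer's query, plus ORDER BY for a stable result.
query = """
SELECT q2.id, q2.pcr, q2.status, q2.date_created
FROM (SELECT pcr, MIN(date_created) AS date_created
      FROM table1 t1
      WHERE NOT EXISTS (SELECT * FROM table1 t2
                        WHERE t1.pcr = t2.pcr AND t2.status = '002')
      GROUP BY pcr) q1
INNER JOIN (SELECT * FROM table1) q2
  ON q1.pcr = q2.pcr AND q1.date_created = q2.date_created
ORDER BY q2.pcr
"""
result = conn.execute(query).fetchall()
print(result)  # [(2, 'A', '001', '2020-01-01'), (5, 'C', '001', '2020-02-01')]
```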

Performance improvement for Hive query

I am using multiple UNION ALLs and then summing each column, but this query runs forever. I have a cluster with 96 GB of memory. What should I do to improve performance? Following is my query in Hive:
with total as
(
select * from
(
select * from table1
union all
select * from table2
union all
select * from table3
union all
select * from table4
union all
select * from table5
union all
select * from table6
union all
select * from table7
union all
select * from table8
union all
select * from table9
)p
)
Select * from
(
select
sum(col_1),
sum(col_2),
sum(col_3),
sum(col_4),
sum(col_5),
sum(col_6),
sum(col_7),
sum(col_8),
sum(col_9),
sum(col_10)
from total
)q;
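One rewrite sometimes tried for this shape of query is to aggregate each table first and then sum the per-table partial sums. The UNION ALL then carries one row per table instead of every row. This rewrite is not from the question, and its effect on Hive performance is not shown here; it would need to be measured.

The sketch below uses Python's sqlite3 module with three small hypothetical tables and two columns, instead of nine tables and ten columns. It only checks that both shapes return the same sums, including when some values are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables with two numeric columns each; None becomes SQL NULL.
tables = {
    "table1": [(1, 10), (2, None)],
    "table2": [(3, 30)],
    "table3": [(None, None)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} (col_1 INT, col_2 INT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

# Original shape: UNION ALL every row, then aggregate once.
union_then_sum = conn.execute("""
WITH total AS (
  SELECT * FROM table1 UNION ALL SELECT * FROM table2 UNION ALL SELECT * FROM table3
)
SELECT SUM(col_1), SUM(col_2) FROM total
""").fetchone()

# Alternative shape: aggregate each table first, then sum the partial sums.
# SUM ignores NULLs, so an all-NULL partial sum does not change the total.
sum_then_sum = conn.execute("""
WITH partial AS (
  SELECT SUM(col_1) AS s1, SUM(col_2) AS s2 FROM table1
  UNION ALL SELECT SUM(col_1), SUM(col_2) FROM table2
  UNION ALL SELECT SUM(col_1), SUM(col_2) FROM table3
)
SELECT SUM(s1), SUM(s2) FROM partial
""").fetchone()

print(union_then_sum, sum_then_sum)  # (6, 40) (6, 40)
```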

Old SQL to the New. table by table joins

In the Oracle queries we currently run, I have come across some SQL that does a table-by-table join through nested subqueries. I want to understand it, so could someone explain it? I am new to this.
SELECT *
FROM ra_customer_trx_all
WHERE customer_trx_id IN
(SELECT customer_trx_id
FROM AR_PAYMENT_SCHEDULES_ALL
WHERE payment_schedule_ID IN
(SELECT payment_schedule_ID
FROM AR_RECEIVABLE_APPLICATIONS_ALL
WHERE applied_customer_trx_id =
(SELECT customer_trx_id FROM ra_customer_trx_all WHERE trx_number = '34054')));

1st: select the TRX record from ra_customer_trx_all where trx_number = 34054; we are looking for its customer_trx_id:
select t4.customer_trx_id from ra_customer_trx_all t4 where t4.trx_number = '34054'
2nd: select all records from AR_RECEIVABLE_APPLICATIONS_ALL that were applied to the ID from step 1; we need their payment_schedule_ID:
select t3.payment_schedule_ID from AR_RECEIVABLE_APPLICATIONS_ALL t3 where t3.applied_customer_trx_id = (prev select)
3rd: select all records from AR_PAYMENT_SCHEDULES_ALL that have the payment_schedule_IDs from step 2; we need their customer_trx_id:
select t2.customer_trx_id from AR_PAYMENT_SCHEDULES_ALL t2 where t2.payment_schedule_ID in (prev select)
4th: select all records from ra_customer_trx_all that have the customer_trx_ids from step 3:
select * from ra_customer_trx_all t1 where t1.customer_trx_id in (prev select)
5th: summary (trx = transaction):
the logic is: select all customer transaction records whose payment schedules have receivable applications applied to transaction number 34054. Written as joins:
SELECT t1.*
FROM ra_customer_trx_all t1
inner join AR_PAYMENT_SCHEDULES_ALL t2 on t2.customer_trx_id = t1.customer_trx_id
inner join AR_RECEIVABLE_APPLICATIONS_ALL t3 on t3.payment_schedule_ID = t2.payment_schedule_ID
inner join ra_customer_trx_all t4 on t4.customer_trx_id = t3.applied_customer_trx_id
where t4.trx_number = '34054'
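The nested-IN form and the join form can be checked against each other in Python's sqlite3 module. The tables are cut-down, hypothetical versions that keep only the columns the query uses, filled with made-up rows. Names are lowercased, which SQLite treats case-insensitively.

One caveat: IN never returns an outer row twice, but the join returns a t1 row once per matching combination of t2, t3 and t4 rows. With real data, SELECT DISTINCT t1.* may therefore be needed to get identical results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily reduced tables with made-up data.
conn.executescript("""
CREATE TABLE ra_customer_trx_all (customer_trx_id INT, trx_number TEXT);
CREATE TABLE ar_payment_schedules_all (payment_schedule_id INT, customer_trx_id INT);
CREATE TABLE ar_receivable_applications_all (payment_schedule_id INT, applied_customer_trx_id INT);
INSERT INTO ra_customer_trx_all VALUES (100, '34054'), (200, 'INV-1'), (300, 'INV-2');
INSERT INTO ar_payment_schedules_all VALUES (10, 200), (20, 300);
-- schedule 10 is applied to transaction 100 (trx_number 34054); schedule 20 to another one
INSERT INTO ar_receivable_applications_all VALUES (10, 100), (20, 999);
""")

# The question's nested subqueries (scalar subquery wrapped in parentheses).
nested = conn.execute("""
SELECT * FROM ra_customer_trx_all
WHERE customer_trx_id IN
  (SELECT customer_trx_id FROM ar_payment_schedules_all
   WHERE payment_schedule_id IN
     (SELECT payment_schedule_id FROM ar_receivable_applications_all
      WHERE applied_customer_trx_id =
        (SELECT customer_trx_id FROM ra_customer_trx_all WHERE trx_number = '34054')))
""").fetchall()

# The answer's join version.
joined = conn.execute("""
SELECT t1.* FROM ra_customer_trx_all t1
INNER JOIN ar_payment_schedules_all t2 ON t2.customer_trx_id = t1.customer_trx_id
INNER JOIN ar_receivable_applications_all t3 ON t3.payment_schedule_id = t2.payment_schedule_id
INNER JOIN ra_customer_trx_all t4 ON t4.customer_trx_id = t3.applied_customer_trx_id
WHERE t4.trx_number = '34054'
""").fetchall()

print(nested, joined)  # [(200, 'INV-1')] [(200, 'INV-1')]
```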

You can replace
select *
from tableA
where columnA in (select columnB
from tableB
where columnB1 in (select ...))
with
select *
from tableA, tableB
where tableA.columnA = tableB.columnB
and tableB.columnB1 in (select ...)
Apply this pattern sequentially to each subquery.
Short explanation: remove the brackets that follow the IN keyword, move the table from the inner FROM clause to the outer one, and add a condition to the WHERE clause: the column before IN must equal the column in the subquery's SELECT clause.
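The pattern can be tried out in Python's sqlite3 module. tableA, tableB and their columns are the answer's placeholders, filled with made-up rows. The innermost (select ...) is replaced by the literal list (5, 6) for brevity, and the join selects tableA.* so that both forms return the same columns.

Because columnB = 1 appears twice in tableB, the sketch also shows a difference to watch for: the join repeats tableA rows. SELECT DISTINCT removes the repeats here, but it would also collapse rows that were already duplicated in tableA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder tables from the answer, with hypothetical data.
conn.execute("CREATE TABLE tableA (columnA INT, label TEXT)")
conn.execute("CREATE TABLE tableB (columnB INT, columnB1 INT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", [(1, "one"), (2, "two")])
# columnB = 1 matches two tableB rows that both pass the columnB1 filter.
conn.executemany("INSERT INTO tableB VALUES (?, ?)", [(1, 5), (1, 6), (2, 7)])

# Original form: IN with a subquery.
subquery_form = conn.execute("""
SELECT * FROM tableA
WHERE columnA IN (SELECT columnB FROM tableB WHERE columnB1 IN (5, 6))
ORDER BY columnA
""").fetchall()

# Rewritten form: tableB moved to the outer FROM, equality added to WHERE.
join_form = conn.execute("""
SELECT tableA.* FROM tableA, tableB
WHERE tableA.columnA = tableB.columnB
  AND tableB.columnB1 IN (5, 6)
ORDER BY columnA
""").fetchall()

# DISTINCT restores the IN semantics for this data.
distinct_join_form = conn.execute("""
SELECT DISTINCT tableA.* FROM tableA, tableB
WHERE tableA.columnA = tableB.columnB
  AND tableB.columnB1 IN (5, 6)
ORDER BY columnA
""").fetchall()

print(subquery_form, join_form, distinct_join_form)
```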

How to force push predicate through UNION ALL inside a view?

I have a performance problem with a UNION ALL view. The problem can be solved by splitting the view into two separate views, but that rather defeats the purpose of creating a view.
Here is a simple test case (Oracle 11.2.0.3.0). The real queries use about 10 different tables instead of just 3.
CREATE TABLE t0 (id number, ref_id number);
CREATE INDEX i0 on t0(id);
CREATE TABLE t1 (id number, amount number);
CREATE INDEX i1 on t1(id);
CREATE TABLE t2 (id number, amount number);
CREATE INDEX i2 on t2(id);
insert into t0 select rownum, rownum * 10 from dual connect by rownum <= 100000;
insert into t1 select rownum, rownum * 10 from dual connect by rownum <= 100000;
insert into t2 select rownum, rownum * 10 from dual connect by rownum <= 100000;
CREATE OR REPLACE VIEW v2
AS
SELECT id, sum(amount) AS total_amount
FROM t1
GROUP BY id;
CREATE OR REPLACE VIEW v3
AS
SELECT id
,sum(amount) as total_amount
FROM (SELECT id, amount
FROM t1
UNION ALL
SELECT id, amount
FROM t2)
GROUP BY id
HAVING sum(amount) <> 0
;
CREATE OR REPLACE view v1
AS
SELECT *
FROM v2
UNION ALL
SELECT *
FROM v3;
The following query needs 766 gets. Adding a push_pred(a) hint does not change that.
select --+ first_rows
*
from t0, v1 a
where t0.ref_id = a.id
and t0.id = 1;
The next query bottoms out at 16 gets, although it does the same thing as the first one; the only difference is that it scans t0 twice instead of once.
select --+ first_rows
*
from t0, v2
where t0.ref_id = v2.id
and t0.id = 1
union all
select *
from t0, v3
where t0.ref_id = v3.id
and t0.id = 1;
What am I missing?
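The second query is the first one with the join to t0 distributed over the two UNION ALL branches of v1, which is what pushing the predicate into the view would have to achieve. The sketch below uses Python's sqlite3 module with 10-row versions of the question's tables. It only checks that the two forms return the same rows; it says nothing about Oracle's optimizer or the gets figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The question's tables and views (SQLite has no CREATE OR REPLACE VIEW).
conn.executescript("""
CREATE TABLE t0 (id INT, ref_id INT);
CREATE TABLE t1 (id INT, amount INT);
CREATE TABLE t2 (id INT, amount INT);
CREATE VIEW v2 AS
  SELECT id, SUM(amount) AS total_amount FROM t1 GROUP BY id;
CREATE VIEW v3 AS
  SELECT id, SUM(amount) AS total_amount
  FROM (SELECT id, amount FROM t1 UNION ALL SELECT id, amount FROM t2)
  GROUP BY id
  HAVING SUM(amount) <> 0;
CREATE VIEW v1 AS
  SELECT * FROM v2 UNION ALL SELECT * FROM v3;
""")

# 10 rows per table instead of 100,000: (rownum, rownum * 10).
rows = [(i, i * 10) for i in range(1, 11)]
for table in ("t0", "t1", "t2"):
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

# Join against the UNION ALL view as a whole.
through_view = conn.execute("""
SELECT * FROM t0, v1 a
WHERE t0.ref_id = a.id AND t0.id = 1
""").fetchall()

# The same join written once per branch of the view.
per_branch = conn.execute("""
SELECT * FROM t0, v2 WHERE t0.ref_id = v2.id AND t0.id = 1
UNION ALL
SELECT * FROM t0, v3 WHERE t0.ref_id = v3.id AND t0.id = 1
""").fetchall()

print(sorted(through_view) == sorted(per_branch))  # True
```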
