What is more efficient: several inserts vs a single insert with union - performance

I have a large table (~6M rows, 41 cols) in PostgreSQL as follows:
id | answer1 | answer2 | answer3 | ... | answer40
1 | xxx | yyy | null | ... | null
2 | xxx | null | null | ... | null
3 | xxx | null | zzz | ... | aaa
Note that every row has many empty columns, and I only want the ones with data.
I want to normalize it to get this:
id | answers
1 | xxx
1 | yyy
2 | xxx
3 | xxx
3 | zzz
...
3 | aaa
The question is: which is more efficient, several inserts or a single insert with many unions?
Option 1
create table new_table as
select id, answer1 from my_table where answer1 is not null
union
select id, answer2 from my_table where answer2 is not null
union
select id, answer3 from my_table where answer3 is not null
union ...
Option 2
create table new_table as select id, answer1 from my_table where answer1 is not null;
insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
...
Option 3: is there a better way to do this?

Option 2 should be faster.
Wrap all the statements in a begin-commit block to save the time spent on individual commits.
For faster selects, make sure that the columns being filtered (e.g. where answer1 is not null) have indexes.
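The answer above (several inserts inside one begin-commit block) can be sketched as follows. This is only a minimal illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a three-column subset of the 40 answer columns, and a hypothetical helper name (normalize). Note also that plain union removes duplicate rows while separate inserts keep them, so Option 1 and Option 2 are only equivalent when no (id, answer) pair repeats.

```python
import sqlite3

# Hypothetical subset of the 40 answer columns from the question.
ANSWER_COLUMNS = ["answer1", "answer2", "answer3"]


def normalize(conn, columns):
    """Run one INSERT ... SELECT per column inside a single transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("CREATE TABLE new_table (id INTEGER, answers TEXT)")
        for col in columns:
            # Column names come from a fixed list, never from user input.
            conn.execute(
                f"INSERT INTO new_table "
                f"SELECT id, {col} FROM my_table WHERE {col} IS NOT NULL"
            )
        conn.execute("COMMIT")  # one commit for all the inserts
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE my_table (id INTEGER, answer1 TEXT, answer2 TEXT, answer3 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, "xxx", "yyy", None), (2, "xxx", None, None), (3, "xxx", None, "zzz")],
)

normalize(conn, ANSWER_COLUMNS)
rows = conn.execute(
    "SELECT id, answers FROM new_table ORDER BY id, answers"
).fetchall()
print(rows)
```

Timing on the real 6M-row table would still have to be measured in PostgreSQL itself; the sketch only shows the shape of the statements.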

Related

When the select clause returns an empty result, "INSERT OVERWRITE TABLE" does not clear the table

create table
create external table test_overwrite(
t_id bigint, t_name string
) STORED AS TEXTFILE
location '/user/hive/test_overwrite';
insert one record
insert into test_overwrite select 1, "test" from (select count(*) from test_overwrite ) a;
+----------------------+------------------------+
| test_overwrite.t_id | test_overwrite.t_name |
+----------------------+------------------------+
| 1 | test |
+----------------------+------------------------+
1 row selected (0.538 seconds)
overwrite with new record
insert overwrite table test_overwrite select 2,"good";
+----------------------+------------------------+
| test_overwrite.t_id | test_overwrite.t_name |
+----------------------+------------------------+
| 2 | good |
+----------------------+------------------------+
1 row selected (0.619 seconds)
Failed to overwrite table with empty result.
insert overwrite table test_overwrite select 2,"good" from test_overwrite where t_id > 5;
+----------------------+------------------------+
| test_overwrite.t_id | test_overwrite.t_name |
+----------------------+------------------------+
| 2 | good |
+----------------------+------------------------+
1 row selected (0.619 seconds)
Question: does anyone know how to fix this problem?

Insert data into a table from another table containing null values, keeping the original table 1 values where table 2 is null

I want to match the first column of both tables and insert the table 2 values into table 1. But if the table 2 values are null, leave the table 1 values as they are. I am using Hive to do this. Please help.
You need to use coalesce to pick the non-null value for the b column, and a case statement to decide which value populates the c column.
Example:
hive> select t1.a,
coalesce(t2.y,t1.b)b,
case when t2.y is null then t1.c
else t2.z
end as c
from table1 t1 left join table2 t2 on t1.a=t2.x;
+----+-----+----+--+
| a | b | c |
+----+-----+----+--+
| a | xx | 5 |
| b | bb | 2 |
| c | zz | 7 |
| d | dd | 4 |
+----+-----+----+--+
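To make the coalesce/case logic above easier to try out, here is a small runnable sketch. It uses Python's sqlite3 module instead of Hive, and the input rows for table1 and table2 are invented for illustration (the original post does not show them); they were chosen only so that one key has a null in table 2 and one key has no match at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (a TEXT, b TEXT, c INTEGER);
    CREATE TABLE table2 (x TEXT, y TEXT, z INTEGER);
    -- Hypothetical sample data, not taken from the original question.
    INSERT INTO table1 VALUES ('a', 'aa', 1), ('b', 'bb', 2), ('c', 'cc', 3), ('d', 'dd', 4);
    INSERT INTO table2 VALUES ('a', 'xx', 5), ('b', NULL, 9), ('c', 'zz', 7);
    """
)

rows = conn.execute(
    """
    SELECT t1.a,
           COALESCE(t2.y, t1.b) AS b,               -- table2 value unless it is null
           CASE WHEN t2.y IS NULL THEN t1.c
                ELSE t2.z
           END AS c                                 -- keep table1.c when table2.y is null
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.a = t2.x
    ORDER BY t1.a
    """
).fetchall()
print(rows)
```

Key 'b' keeps its table1 values because y is null in table2, and key 'd' keeps them because the left join finds no matching row.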

Update between tables

I have two tables (ORACLE11) :
TABLE1: (ID_T1 is TABLE1 primary key)
| ID_T1 | NAME | DATEBEGIN | DATEEND |
| 10 | test | 01/01/2017 | 01/06/2017 |
| 11 | test | 01/01/2017 | null |
| 12 | test1 | 01/01/2017 | 01/06/2017 |
| 13 | test1 | 01/01/2017 | null |
TABLE2: (ID_T2 is TABLE2 primary key and ID_T1 is TABLE2 foreign key on TABLE1)
| ID_T2 | ID_T1 |
| 1 | 10 |
| 2 | 11 |
| 3 | 11 |
| 4 | 12 |
| 5 | 13 |
I need to delete all rows from TABLE1 where TABLE1.DATEEND is null.
But first I must update TABLE2 so that TABLE2.ID_T1 points to the remaining record in TABLE1 with the same NAME:
TABLE2:
| ID_T2 | ID_T1 |
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |
| 4 | 12 |
| 5 | 12 |
I tried this:
UPDATE TABLE2
SET TABLE2.ID_T1 = (
SELECT TABLE1.ID_T1
FROM TABLE1
WHERE TABLE1.DATEBEGIN = '01/01/2017'
AND TABLE1.DATEEND IS NOT NULL
)
WHERE TABLE2.ID_T1 = (
SELECT TABLE1.ID_T1
FROM TABLE1
WHERE TABLE1.DATEBEGIN = '01/01/2017'
AND TABLE1.DATEEND IS NULL
);
But I don't know how to join on TABLE1.NAME and do it for all rows of TABLE2. Thanks in advance for your help.
First, create a temp table to find out which id is to be retained for each name. In the case of multiple possible values, I have selected one by ascending order of id_t1.
create table table_2_update as
select id_t1, name from (select id_t1, name, row_number() over(partition by
name order by id_t1) rn from table1 where name is not null) where rn=1;
Create next table to know which id of table2 connects to which name of table1.
create table which_to_what as
select t2.id_t2, t2.id_t1, t1.name from table1 t1 inner join table2 t2 on
t1.id_t1 = t2.id_t1 group by t2.id_t2, t2.id_t1, t1.name;
Since this newly created table now contains the id and name from table1 and the id from table2, merge into it so that each name maps to exactly one id of table1.
merge into which_to_what a
using table_2_update b
on (a.name=b.name)
when matched then update set
a.id_t1=b.id_t1;
Now we finally have a table that contains the correct final values. You can either rename this table to table2 or merge it into the original table2 on the basis of id_t2.
merge into table2 a
using which_to_what b
on (a.id_t2=b.id_t2)
when matched then update set
a.id_t1=b.id_t1;
Finally, delete the rows with a null dateend from table1.
delete from table1 where dateend is null;
You can do this by joining table1 to itself on the name column. Use the first copy (a) to link to table2.id_t1 and the second copy (b) to get the id_t1 where dateend is not null.
UPDATE table2
SET table2.id_t1 = (
select b.id_t1
from table1 a, table1 b
where a.name = b.name
and b.dateend is not null
and a.id_t1 = table2.id_t1
)
WHERE EXISTS (
select b.id_t1
from table1 a, table1 b
where a.name = b.name
and b.dateend is not null
and a.id_t1 = table2.id_t1
);
This assumes that there will be only one table1 record per name where dateend is not null.
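The self-join update above can be checked against the sample data from the question with a short script. This is a sketch using Python's sqlite3 module rather than Oracle 11, so the join is written with JOIN ... ON and the repeated subquery is held in a hypothetical constant (KEEPER_SUBQUERY); the logic is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id_t1 INTEGER PRIMARY KEY, name TEXT, datebegin TEXT, dateend TEXT);
    CREATE TABLE table2 (id_t2 INTEGER PRIMARY KEY, id_t1 INTEGER);
    INSERT INTO table1 VALUES
        (10, 'test',  '01/01/2017', '01/06/2017'),
        (11, 'test',  '01/01/2017', NULL),
        (12, 'test1', '01/01/2017', '01/06/2017'),
        (13, 'test1', '01/01/2017', NULL);
    INSERT INTO table2 VALUES (1, 10), (2, 11), (3, 11), (4, 12), (5, 13);
    """
)

# For the table1 row currently referenced by table2, find the row with the
# same name whose dateend is not null (assumed to be unique per name).
KEEPER_SUBQUERY = """
    SELECT b.id_t1
    FROM table1 a
    JOIN table1 b ON a.name = b.name
    WHERE b.dateend IS NOT NULL
      AND a.id_t1 = table2.id_t1
"""

with conn:  # commit both statements together
    conn.execute(
        f"UPDATE table2 SET id_t1 = ({KEEPER_SUBQUERY}) WHERE EXISTS ({KEEPER_SUBQUERY})"
    )
    conn.execute("DELETE FROM table1 WHERE dateend IS NULL")

links = conn.execute("SELECT id_t2, id_t1 FROM table2 ORDER BY id_t2").fetchall()
remaining = [r[0] for r in conn.execute("SELECT id_t1 FROM table1 ORDER BY id_t1")]
print(links, remaining)
```

If a name had two rows with a non-null dateend, the scalar subquery would raise an error in Oracle (SQLite would silently pick one), which is why the uniqueness assumption matters.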

Unable to extend temp segment by 128 in tablespace TEMP. Another option to execute that query?

I am new to Oracle SQL and I am trying to update a table in the following situation:
I have a table A:
+---------+---------+---------+----------+
| ColumnA | name | ColumnC | Column H |
+---------+---------+---------+----------+
| 1 | Harry | null | null |
| 2 | Harry | null | null |
| 3 | Harry | null | null |
+---------+---------+---------+----------+
And a table B:
+---------+---------+---------+
| name | ColumnE | ColumnF |
+---------+---------+---------+
| Harry | a | d |
| Ron | b | e |
| Hermione| c | f |
+---------+---------+---------+
And I want to update table A so that the result is as follows:
+---------+---------+---------+----------+
| ColumnA | name | ColumnC | Column H |
+---------+---------+---------+----------+
| 1 | Harry | a | d |
| 2 | Harry | a | d |
| 3 | Harry | a | d |
+---------+---------+---------+----------+
I have an issue with the following Oracle SQL statement:
merge into tableA a
using tableB b
on (a.name=b.name)
when matched then update set
columnC = b.columnE,
columnH = b.columnF
create table tableA (columnC varchar2(20), columnH varchar2(20), name varchar2(20), columnA number);
create table tableB (columnE varchar2(20), columnF varchar2(20), name varchar2(20));
insert into tableA values (null, null,'Harry',1);
insert into tableA values (null, null,'Harry',2);
insert into tableA values (null, null,'Harry',3);
insert into tableB values ('a', 'd','Harry');
insert into tableB values ('b', 'e','Ron');
insert into tableB values ('c', 'f','Hermione');
select * from tableA;
merge into tableA a
using tableB b
on (a.name=b.name)
when matched then update set
columnC = b.columnE,
columnH = b.columnF;
select * from tableA;
The problem is that I get the following error when I execute that command:
Error: ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
I cannot give more space to TEMP tablespace. So, my question is: Is there any option to use another SQL query that doesn't use TEMP tablespace?
You can try the following query; it may consume less TEMP tablespace:
update tableA
set (columnC, columnH ) = (select ColumnE, ColumnF from tableB where tableB.name = tableA.name)
where
tableA.name in (select tableB.name from tableB)
;
Or you can try to perform the update in small chunks in a loop. It's less performant, but if you have no other way ...
begin
FOR rec in
(select name, ColumnE, ColumnF from tableB)
LOOP
update tableA
set
columnC = rec.columnE
, columnH = rec.columnF
where name = rec.name
;
end loop;
end;
/
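The chunked loop above can also be driven from client code. The following is a minimal sketch of the same idea using Python's sqlite3 module as a stand-in for Oracle, with the sample rows from the question; committing after every name is an assumption added here to keep each transaction small, and whether any variant actually avoids ORA-01652 depends on the execution plan Oracle picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (columnA INTEGER, name TEXT, columnC TEXT, columnH TEXT);
    CREATE TABLE tableB (name TEXT, columnE TEXT, columnF TEXT);
    INSERT INTO tableA VALUES (1, 'Harry', NULL, NULL), (2, 'Harry', NULL, NULL), (3, 'Harry', NULL, NULL);
    INSERT INTO tableB VALUES ('Harry', 'a', 'd'), ('Ron', 'b', 'e'), ('Hermione', 'c', 'f');
    """
)

# Read the (small) source table once, then update tableA one name at a time.
source_rows = conn.execute("SELECT name, columnE, columnF FROM tableB").fetchall()
for name, col_e, col_f in source_rows:
    conn.execute(
        "UPDATE tableA SET columnC = ?, columnH = ? WHERE name = ?",
        (col_e, col_f, name),
    )
    conn.commit()  # assumption: one small transaction per name

result = conn.execute(
    "SELECT columnA, name, columnC, columnH FROM tableA ORDER BY columnA"
).fetchall()
print(result)
```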

Identifying duplicate row sets

I have a table that looks like the following.
create table Testing(
inv_num varchar2(100),
po_num varchar2(100),
line_num varchar2(100)
)
It contains the following data.
Insert into Testing (INV_NUM,PO_num,line_num) values ('19782594','P0254836',1);
Insert into Testing (INV_NUM,PO_num,line_num) values ('19782594','P0254836',1);
Insert into Testing (INV_NUM,PO_num,line_num) values ('19968276','P0254836',1);
Insert into Testing (INV_NUM,PO_num,line_num) values ('19968276','P0254836',1);
What I'm trying to do is identify the multiple items within the table that have the same PO_num but a different inv_num.
I have tried this:
SELECT
T1.inv_num,
T1.Po_num,
T1.LINE_num ,
count(*) over( partition by
T1.inv_num)myRecords
FROM testing T1
where T1.Po_num = 'P0254836'
group by
T1.inv_num,
T1.Po_num,
T1.LINE_num
order by t1.inv_num
but this does not give me the desired end result.
I would like to end up with the following.
INV_NUM PO_NUM LINE_NUM Myrecords
19782594 P0254836 1 1
19782594 P0254836 1 1
19968276 P0254836 1 2
19968276 P0254836 1 2
Where am I going wrong? I would really like to identify the change in INV_NUM for that PO.
Please be aware this is part of a much larger project and I have only picked a small subset to show here.
Updated:
SELECT
inv_num
, po_num
, line_num
, DENSE_RANK() OVER (ORDER BY inv_num) "MyRecords"
FROM (
SELECT
po_num
, inv_num
, line_num
, COUNT(line_num) OVER (PARTITION BY po_num, inv_num ORDER BY NULL) cnt
FROM testing
)
WHERE cnt > 1;
returns
| INV_NUM | PO_NUM | LINE_NUM | MYRECORDS |
|----------|----------|----------|-----------|
| 19782594 | P0254836 | 1 | 1 |
| 19782594 | P0254836 | 1 | 1 |
| 19968276 | P0254836 | 1 | 2 |
| 19968276 | P0254836 | 1 | 2 |
Maybe this helps:
SELECT inv_num,
po_num,
line_num,
DENSE_RANK() OVER (ORDER BY inv_num) AS rn
FROM testing
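The DENSE_RANK approach from the answers above can be tried locally with the sketch below. It uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) instead of Oracle. Two things are assumptions added here and are not in the original answers: the PARTITION BY po_num clause, which restarts the numbering for each PO, and the extra row for a second, hypothetical PO that makes the restart visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE testing (inv_num TEXT, po_num TEXT, line_num TEXT);
    INSERT INTO testing VALUES
        ('19782594', 'P0254836', '1'),
        ('19782594', 'P0254836', '1'),
        ('19968276', 'P0254836', '1'),
        ('19968276', 'P0254836', '1'),
        ('20000001', 'P0999999', '1');  -- hypothetical second PO, not in the question
    """
)

rows = conn.execute(
    """
    SELECT inv_num,
           po_num,
           line_num,
           -- Same invoice -> same number; next invoice of the same PO -> next number.
           DENSE_RANK() OVER (PARTITION BY po_num ORDER BY inv_num) AS my_records
    FROM testing
    ORDER BY po_num, inv_num
    """
).fetchall()
for row in rows:
    print(row)
```

Without the PARTITION BY clause, as in the original answers, the numbering would simply continue across POs.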
