Load data into newly added columns in Impala

Let's say I have a table1 in schema1 like this:
Stu_ID  Math
1       A
2       B
3       B+
Now, I want to add a new column, for instance, Literature, into table1 in schema1.
ALTER TABLE schema1.table1
ADD COLUMNS (Literature STRING);
Table1 now looks like:
Stu_ID  Math  Literature
1       A     NULL
2       B     NULL
3       B+    NULL
I want to load data from table2 in schema2, matched on the respective Stu_ID. Is there a way to do so? I have thought of UPDATE, but to my understanding Impala only supports UPDATE on Kudu tables. Please correct me if I'm wrong.

Instead of UPDATE you can use INSERT OVERWRITE:
INSERT OVERWRITE schema1.table1
SELECT t1.stu_id, t1.Math, t2.Literature
FROM schema1.table1 t1
JOIN schema2.table2 t2 ON t1.stu_id = t2.stu_id;
This replaces the entire contents of table1 with the old data plus the new column.
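Note that the inner join above keeps only students that have a match in table2. If every row of table1 must survive, a LEFT JOIN variant does that, leaving Literature as NULL where no match exists; a minimal sketch, assuming the same tables and columns as above:
INSERT OVERWRITE schema1.table1
SELECT
  t1.stu_id,
  t1.Math,
  t2.Literature   -- stays NULL for students with no row in table2
FROM schema1.table1 t1
LEFT JOIN schema2.table2 t2 ON t1.stu_id = t2.stu_id;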

Related

Deleting data from one table using data from a second table

I have a table table1 with a million rows and a completely identical table table2 with only 106 rows. How can I delete these 106 rows from table1?
In these two tables I have fields like id, date, param0, param1, param2.
Presuming that uniqueness is enforced through the ID column in both tables, then:
delete from table1 a
where exists (select null
              from table2 b
              where b.id = a.id);
Otherwise, add more columns to the WHERE clause so that only the rows you really want get deleted, as sketched below.
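A sketch of that widened predicate, assuming id alone does not identify a row and using the column names from the question (if date is a reserved word in your database, quote it accordingly):
delete from table1 a
where exists (select null
              from table2 b
              where b.id     = a.id
                and b.date   = a.date
                and b.param0 = a.param0
                and b.param1 = a.param1
                and b.param2 = a.param2);
Note that plain equality never matches NULLs, so columns that can be NULL need an explicit NULL check (or IS NOT DISTINCT FROM, where the database supports it).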

Update a column in base table 2 from a staging-table column, matching on details from base table 1

There are three tables involved in total: one header base table, one material base table, and one staging table.
I have created the staging table with 4 columns whose values are loaded from an uploaded CSV; column 1 is batch_no, column 2 is for the attribute.
>header base table (he) has batch_no and batch_id
>material base table (me) has batch_id and attr_m (a6 in the DDL below; empty, to be updated)
>staging table (s) has batch_no and attr_s (att in the DDL)
create table he (batch_id number, batch_no varchar2(30));
create table me (a6 varchar2(30), batch_id number);
create table s (batch_no varchar2(30), att varchar2(30));
I want to take values from attr_s and update attr_m against batch_no. How do I do that?
Here's my code; please help me fix it, it doesn't work:
update me
set a6 = (select att
          from s
          where batch_no = (select he.batch_no
                            from he, s
                            where he.batch_no = s.batch_no))
error received:
single-row subquery returns more than one row
The UPDATE statement is applied to each individual row in ME, so the assignment requires the subquery to return a single scalar value. Your subquery returns multiple values, hence the error.
To fix this you need to further restrict the subquery so that it returns one row for each row in ME. From your data model, the only way to do this is with the BATCH_ID, like so:
update me
set a6 = (select att
          from s
          where batch_no = (select he.batch_no
                            from he, s
                            where he.batch_no = s.batch_no
                            and he.batch_id = me.batch_id))
Such a solution will work provided that there is only one record in S matching a given permutation of (batch_no, batch_id). As you haven't provided any sample data, I can't verify that the above statement will actually solve your problem.
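The same correlated update can also be written as a single MERGE, which is often easier to read. A sketch using the tables above, assuming each batch_id resolves to at most one staging row (otherwise Oracle raises ORA-30926):
merge into me m
using (select he.batch_id, s.att
       from he
       join s on s.batch_no = he.batch_no) src
on (m.batch_id = src.batch_id)
when matched then update set m.a6 = src.att;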

Using multiple select statement inside insert statement in Hive

I'm new to Hive. I have three tables like this:
table1:
id;value
1;val1
2;val2
3;val3
table2
num;desc;refVal
1;desc;0
2;descd;0
3;desc;0
I want to create a new table3 that contains:
num;desc;refVal
1;desc;3
2;descd;3
3;desc;3
Where num and desc are columns from table2, and refVal is the max value of the id column in table1.
Can someone guide me to solve this?
First, you have to create a table to hold the result; a bare CREATE TABLE without a column list is not valid HiveQL, and desc needs backquoting because it clashes with a keyword. For example:
CREATE TABLE my_new_table (num INT, `desc` STRING, refVal INT);
After that, you have to insert into this table, as shown here:
INSERT INTO TABLE my_new_table
[PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement1;
In select_statement1 you can use the same SELECT you would normally write to join the tables and pick the columns you need.
For more information, check the Hive language manual on INSERT.
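Putting it together for the sample data: a sketch of the full insert, computing refVal once in a subquery and cross-joining it onto table2 (my_new_table as created above; `desc` backquoted because it clashes with a Hive keyword):
INSERT INTO TABLE my_new_table
SELECT
  t2.num,
  t2.`desc`,
  m.max_id AS refVal   -- the max of table1.id, identical for every row
FROM table2 t2
CROSS JOIN (SELECT MAX(id) AS max_id FROM table1) m;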

How to copy all constraints and data from one schema to another in Oracle

I am using Toad for Oracle 12c. I need to copy a table and its data (40M rows) from one schema to another (prod to test). However, there is a unique key (not the PK of this table) on a record_id column whose values look like 3.000*******19E15. About 2M rows come out with the same number (I believe because the values are very large), even though they are unique in prod. When I try to copy, this violates the unique key on that column. I am using Toad's "export data to another schema" function to copy the data.
When I execute these queries in prod:
select count(*) from table_name
OR
select count(distinct record_id) from table_name
Both queries return exactly the same count.
I don't have DBA permissions. How do I copy all the data without violating the unique key of the table?
Thanks in advance!
You can use an upsert-style statement (MERGE in Oracle) to decide between INSERT and UPDATE, or you can write a small procedure for this.
You may also consider NOT EXISTS, but your data set is big and it might not be resource-efficient:
insert into prod_tab
select *
from other_tab t1
where not exists (select 1
                  from prod_tab t2
                  where t1.id = t2.id);
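Since the duplicates only appear after export, the root cause may be the client rendering large NUMBER values in scientific notation and losing digits. Copying inside the database sidesteps client-side formatting entirely; a sketch, assuming the target schema can read the prod table (prod_link below is a hypothetical database link name):
-- copy without the data ever leaving the database;
-- prod_link is a hypothetical database link to the prod instance
insert into test_schema.table_name
select * from prod_schema.table_name@prod_link;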
In Oracle you can use a MERGE statement for that. For each source row, the following query proceeds as follows:
- if the source record_id does not yet exist in the target table, a new record is inserted;
- otherwise, the existing record is updated with the source values.
For the sake of the example, I assumed that there are two other columns in the table: column1 and column2.
MERGE INTO target_table t1
USING source_table t2
ON (t1.record_id = t2.record_id)
WHEN MATCHED THEN UPDATE SET
  t1.column1 = t2.column1,
  t1.column2 = t2.column2
WHEN NOT MATCHED THEN INSERT
  (record_id, column1, column2)
  VALUES (t2.record_id, t2.column1, t2.column2);

Using "contains" as a way to join tables?

The primary key of table 1 is reused in table 2, but modified like so:
Primary key column in table 1: 123abc
Column in table 2: 123abc_1
I.e. the key is used, but then _1 is appended to create a unique value in the table 2 column.
Is there any way I can join the two tables? The data in the two columns is not identical, but it is very similar. Could I do something like:
SELECT *
FROM TABLE1
INNER JOIN TABLE2
ON TABLE1.COLUMN1 contains TABLE2.COLUMN2;
I.e. checking that the value in Table 1 is within the value in Table 2?
You can check that COLUMN2 starts with COLUMN1; for example
SELECT *
FROM TABLE1
INNER JOIN TABLE2
ON INSTR(COLUMN2, COLUMN1) = 1
or
ON COLUMN2 LIKE COLUMN1 || '%'
However, keeping a foreign key in such a form is really dangerous, not to mention the performance cost on large databases.
You'd be better off using a separate column in TABLE2 to store the key of TABLE1, ideally with a foreign-key constraint.
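If the suffix format is stable, another option is to strip it and join on plain equality; a sketch assuming the appended part always begins at the first underscore:
SELECT *
FROM TABLE1 t1
INNER JOIN TABLE2 t2
  -- keep everything before the first '_' of COLUMN2, then compare exactly
  ON t1.COLUMN1 = SUBSTR(t2.COLUMN2, 1, INSTR(t2.COLUMN2, '_') - 1);
With a function-based index on that SUBSTR expression, the join can even stay indexable.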
