How to update some rows in a partitioned table in hive? - hadoop

I need to update some rows in a partitioned table by date, with a ranges of dates and i don't know how to do it?

Using dynamic partitioning you can overwrite partitions which is necessary to update. Use case statement to check for rows to be modified and to set values, like in this template:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table table_name partition (partition_column)
select col1,
col2,
case when col1='record to be updated' then 'new value' else col3 end as col3,
...
colN,
partition_column --partition_column should be the last
from table_name
where ...--partition predicate here

Related

Convert MERGE to UPDATE

I have the following MERGE statement.Is there a way to convert this into an update statement without using MERGE?
MERGE INTO tab1
USING (SELECT tab1.col1, tab2.col2
FROM tab1, tab2
WHERE tab1.col1 = tab2.col1) tab3
ON (tab1.col1 = tab3.col1)
WHEN MATCHED THEN UPDATE SET col2 = tab3.col2
What you are asking about is called "update through join", and contrary to a widely held belief, it is possible in Oracle. But there is a catch.
Obviously, the update - no matter how you attempt to perform it - is not well defined unless column col1 is unique in table tab2. That column is used for lookup in the update process; if its values are not unique, the update will be ambiguous. I ignore here idiotic retorts such as "uniqueness is needed only for those values also found in tab1.col1", or "there is no ambiguity as long as all values in tab2.col2 are equal when the corresponding values in tab2.col1 are equal".
The "catch" is this. The uniqueness of tab2.col1 may be a matter of data (you know it when you inspect the data), or a matter of metadata (there is a unique constraint, or a unique index, or a PK constraint, etc., on tab2.col1, which the parser can inspect without ever looking at the actual data).
merge will work even when uniqueness is only known by inspecting the data. It will still throw an error if uniqueness is violated - but that will be a runtime error (only after the data in tab2 is accessed from disk). By contrast, updating through a join requires the same uniqueness to be known ahead of time, through the metadata (or in other ways: for example if the second rowset - not a table but the table-like result of a query - is the result of an aggregation grouping on the join column; then the uniqueness is guaranteed by the definition of "aggregation").
Here is a brief example to show the difference.
Test data:
create table tab1 (col1 number, col2 number);
insert into tab1 (col1, col2) values (1, 3);
create table tab2 (col1 number, col2 number);
insert into tab2 (col1, col2) values (1, 6);
commit;
merge statement (with check at the end):
merge into tab1
using(
select tab1.col1,
tab2.col2
from tab1,tab2
where tab1.col1 = tab2.col1) tab3
on(tab1.col1 = tab3.col1)
when matched then
update
set col2 = tab3.col2;
1 row merged.
select * from tab1;
COL1 COL2
---------- ----------
1 6
Now let's restore table tab1 to its original data for the next test(s):
rollback;
select * from tab1;
COL1 COL2
---------- ----------
1 3
Update through join - with no uniqueness guaranteed in the metadata (will result in error):
update
( select t1.col2 as t1_c2, t2.col2 as t2_c2
from tab1 t1 join tab2 t2 on t1.col1 = t2.col1
)
set t1_c2 = t2_c2;
Error report -
SQL Error: ORA-01779: cannot modify a column which maps to a non key-preserved table
01779. 00000 - "cannot modify a column which maps to a non key-preserved table"
*Cause: An attempt was made to insert or update columns of a join view which
map to a non-key-preserved table.
*Action: Modify the underlying base tables directly.
Now let's add a unique constraint on the lookup column:
alter table tab2 modify (col1 unique);
Table TAB2 altered.
and try the update again (with the same update statement), plus verification:
update
( select t1.col2 as t1_c2, t2.col2 as t2_c2
from tab1 t1 join tab2 t2 on t1.col1 = t2.col1
)
set t1_c2 = t2_c2;
1 row updated.
select * from tab1;
COL1 COL2
---------- ----------
1 6
So - you can do it, if you use the correct syntax (as I have shown here) AND - very important - you have a unique or PK constraint or a unique index on column tab2.col1.

Insert the data of particular column in oracle

I have a table named as T1 with few columns. Out of these columns, one column name is INSERT_DATE and its datatype is TIMESTAMP(3). I want to update the INSERT_DATE column with data "ABC" in all rows.
CREATE TABLE T1(NAME VARCHAR2(5), INSERT_DATE TIMESTAMP(3));
INSERT INTO T1 VALUES('NAVIN',CURRENT_TIMESTAMP);
INSERT INTO T1 VALUES('KAVIN',CURRENT_TIMESTAMP);
INSERT INTO T1 VALUES('TAVIN',CURRENT_TIMESTAMP);
I have queried in oracle like
UPDATE T1
SET INSERT_DATE = 'ABC';
COMMIT;
It is not getting updated. Is there anything I need to add a code.

How to insert into hive table, partitioned by date reading from temp table? [duplicate]

This question already has answers here:
Hive dynamic partitioning
(2 answers)
Closed 2 years ago.
I have a Hive temp table without any partitions which has the data required. I want to select this data and insert into another table partitioned by date. I tried following techniques with no luck.
Source table schema
CREATE TABLE cls_staging.cls_billing_address_em_tmp
( col1 string,
col2 string,
col3 string);
Destination table :
CREATE TABLE cls_staging.cls_billing_address_em_tmp
( col1 string,
col2 string,
col3 string)
PARTITIONED BY (
curr_date string)
STORED AS ORC;
Query for inserting into destination table :
insert overwrite table cls_staging.cls_billing_address_em_tmp partition (record_date) select col1, col2, col3, FROM_UNIXTIME(UNIX_TIMESTAMP()) from myDB.mytbl;
ERROR
Dynamic partition strict mode requires at least one static partition column
2nd
insert overwrite table cls_staging.cls_billing_address_em_tmp partition (record_date = FROM_UNIXTIME(UNIX_TIMESTAMP())) select col1, col2, col3 from myDB.mytbl;
ERROR :
cannot recognize input near 'FROM_UNIXTIME' '(' 'UNIX_TIMESTAMP'
1st Switch-on dynamic partitioning and non-strict mode:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table cls_staging.cls_billing_address_em_tmp partition (record_date)
select col1, col2, col3, current_timestamp from myDB.mytbl;
2nd: Do not use unix_timestamp() for this purpose, because it will generate many different timestamps, use current_timestamp constant, read this: https://stackoverflow.com/a/58081191/2700344

Insert data listing columns with partitioning field in Hive

First of all let's setup a test environment:
CREATE TABLE IF NOT EXISTS source_table (
`col1` TIMESTAMP,
`col2` STRING
);
CREATE TABLE IF NOT EXISTS dest_table (
`col1` TIMESTAMP,
`col2` STRING,
`col3` STRING
)
PARTITIONED BY (day STRING)
STORED AS AVRO;
INSERT INTO TABLE source_table VALUES ('2018-03-21 17:08:04.401', 'test1'), ('2018-03-22 12:02:04.222', 'test2'), ('2018-03-22 07:21:04.111', 'test3');
How could I list the column names during insertion and put the partition value dynamically? The following command doesn't work:
INSERT INTO TABLE dest_table(col1, col2) PARTITION(day) SELECT col1, col2, date_format(col1, 'yyyy-MM-dd') FROM source_table;
By the way, without listing the columns of dest_table inside the INSERT INTO command, having two tables with the same columns number, everything works fine. What if my dest_table has more fields than the source_table?
Thank you for helping me.
P.S.
Ok, if I hardcode NULL this works. I leave the question opened because there might be better ways to achieve that.
INSERT INTO TABLE dest_table PARTITION(day) SELECT col1, col2, NULL, date_format(col1, 'yyyy-MM-dd') FROM source_table;
Anyway, this method is strictly bounded with columns order? In a real-life scenario, how could I handle lots of columns specifying a mapping, to avoid mistakes?
The syntax for inserting into a partitioned table when you want to list the specific columns is shown below. You don't need to put null on col3 since Hive will put a default value NULL since it is not in the column list during insert.
INSERT INTO TABLE dest_table PARTITION (day)(col1, col2, day)
SELECT col1, col2, date_format(col1, 'yyyy-MM-dd') FROM source_table;
Result:
col1 col2 col3 day
2018-03-22 12:02:04.222 test2 NULL 2018-03-22
2018-03-22 07:21:04.111 test3 NULL 2018-03-22
2018-03-21 17:08:04.401 test1 NULL 2018-03-21

Oracle Equivalent to MySQL INSERT IGNORE?

I need to update a query so that it checks that a duplicate entry does not exist before insertion. In MySQL I can just use INSERT IGNORE so that if a duplicate record is found it just skips the insert, but I can't seem to find an equivalent option for Oracle. Any suggestions?
If you're on 11g you can use the hint IGNORE_ROW_ON_DUPKEY_INDEX:
SQL> create table my_table(a number, constraint my_table_pk primary key (a));
Table created.
SQL> insert /*+ ignore_row_on_dupkey_index(my_table, my_table_pk) */
2 into my_table
3 select 1 from dual
4 union all
5 select 1 from dual;
1 row created.
Check out the MERGE statement. This should do what you want - it's the WHEN NOT MATCHED clause that will do this.
Do to Oracle's lack of support for a true VALUES() clause the syntax for a single record with fixed values is pretty clumsy though:
MERGE INTO your_table yt
USING (
SELECT 42 as the_pk_value,
'some_value' as some_column
FROM dual
) t on (yt.pk = t.the_pke_value)
WHEN NOT MATCHED THEN
INSERT (pk, the_column)
VALUES (t.the_pk_value, t.some_column);
A different approach (if you are e.g. doing bulk loading from a different table) is to use the "Error logging" facility of Oracle. The statement would look like this:
INSERT INTO your_table (col1, col2, col3)
SELECT c1, c2, c3
FROM staging_table
LOG ERRORS INTO errlog ('some comment') REJECT LIMIT UNLIMITED;
Afterwards all rows that would have thrown an error are available in the table errlog. You need to create that errlog table (or whatever name you choose) manually before running the insert using DBMS_ERRLOG.CREATE_ERROR_LOG.
See the manual for details
I don't think there is but to save time you can attempt the insert and ignore the inevitable error:
begin
insert into table_a( col1, col2, col3 )
values ( 1, 2, 3 );
exception when dup_val_on_index then
null;
end;
/
This will only ignore exceptions raised specifically by duplicate primary key or unique key constraints; everything else will be raised as normal.
If you don't want to do this then you have to select from the table first, which isn't really that efficient.
Another variant
Insert into my_table (student_id, group_id)
select distinct p.studentid, g.groupid
from person p, group g
where NOT EXISTS (select 1
from my_table a
where a.student_id = p.studentid
and a.group_id = g.groupid)
or you could do
Insert into my_table (student_id, group_id)
select distinct p.studentid, g.groupid
from person p, group g
MINUS
select student_id, group_id
from my_table
A simple solution
insert into t1
select from t2
where not exists
(select 1 from t1 where t1.id= t2.id)
This one isn't mine, but came in really handy when using sqlloader:
create a view that points to your table:
CREATE OR REPLACE VIEW test_view
AS SELECT * FROM test_tab
create the trigger:
CREATE OR REPLACE TRIGGER test_trig
INSTEAD OF INSERT ON test_view
FOR EACH ROW
BEGIN
INSERT INTO test_tab VALUES
(:NEW.id, :NEW.name);
EXCEPTION
WHEN DUP_VAL_ON_INDEX THEN NULL;
END test_trig;
and in the ctl file, insert into the view instead:
OPTIONS(ERRORS=0)
LOAD DATA
INFILE 'file_with_duplicates.csv'
INTO TABLE test_view
FIELDS TERMINATED BY ','
(id, field1)
How about simply adding an index with whatever fields you need to check for dupes on and say it must be unique? Saves a read check.
yet another "where not exists"-variant using dual...
insert into t1(id, unique_name)
select t1_seq.nextval, 'Franz-Xaver' from dual
where not exists (select 1 from t1 where unique_name = 'Franz-Xaver');

Resources