How to insert init-data into a table in hive? - hadoop

I want to insert some initial data into a Hive table, so I wrote the HQL below:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value;
but it does not work.
I also tried a similar query:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value FROM table limit 1;
But that didn't work either; the table is still empty.
How can I load the initial data into the table?
(There is a reason why I have to do a self-join.)

The first HQL fails because it has no FROM clause:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value;
For the second HQL, the table in the FROM clause must contain at least one row; otherwise the select returns nothing and no constant init values are written to your newly created table:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum', '0' FROM table limit 1;
You can select from any existing Hive table that already contains data; give it a try.
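The point about the empty source table can be checked outside Hive. The following is a minimal sketch that uses Python's sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere; plain INSERT replaces INSERT OVERWRITE, and the table names empty_src, seeded_src and target are made up). It shows that selecting constants from an empty table yields no rows, while any table with at least one row yields the constant row.

```python
import sqlite3

# SQLite stands in for Hive; only the INSERT ... SELECT behaviour is illustrated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empty_src (key TEXT, value TEXT)")
conn.execute("CREATE TABLE seeded_src (key TEXT, value TEXT)")
conn.execute("INSERT INTO seeded_src VALUES ('anything', 'x')")
conn.execute("CREATE TABLE target (key TEXT, value TEXT)")


def count_target():
    """Return the current number of rows in the target table."""
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


def seed_from(source_table):
    """Select constants from source_table (at most one row) into target.

    Returns how many rows were actually added.
    """
    before = count_target()
    conn.execute(
        f"INSERT INTO target SELECT 'key_sum', '0' FROM {source_table} LIMIT 1"
    )
    return count_target() - before


rows_from_empty = seed_from("empty_src")    # source has no rows
rows_from_seeded = seed_from("seeded_src")  # source has one row
target_rows = conn.execute("SELECT key, value FROM target").fetchall()
```

Here rows_from_empty is 0 and rows_from_seeded is 1, which matches the behaviour described in the answer: the constants are emitted once per selected source row, so an empty source produces nothing to insert.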

The following query works if the test table has already been created in Hive:
INSERT OVERWRITE TABLE test PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value FROM test;
I think the table you insert into has to be created first.

Related

How to replace NULL values in one column to 0 (of a very large table) without creating a new column of the desired results added to the table in HIVE?

I am trying to replace all of the NULL values with 0 in a column of a big table in Hive.
However, every time I run some code I end up generating a new column. The column I am trying to modify still exists and still has the NULL values, but the new, automatically generated column (i.e. _c1) contains exactly what I want the original column to look like.
I tried COALESCE, but that also generated a new column. I also tried CASE WHEN, with the same result.
Select *,
CASE WHEN columnname IS NULL THEN 0
ELSE columnname
END
from tablename;
I also tried:
SELECT coalesce(columnname, CAST(0 AS BIGINT)) FROM tablename
I would just like to update the table so that the other columns stay as they are and the column I want to modify keeps its original name, but with 0s in place of the NULL values.
I don't want to generate a new column, just modify the existing one.
How should I do that?
Use INSERT OVERWRITE and alias the COALESCE expression with the original column name:
insert overwrite table tablename
select c1,c2,...,coalesce(columnname,0) as columnname
from tablename
Note that you have to list all the other required columns in the select as well.

Hive insert overwrite truncates the table in a few cases

I was working on a solution and found that in some cases Hive INSERT OVERWRITE truncates the table, while in others it doesn't. Could someone please explain why it behaves like that?
To explain this, I am taking two tables, source and target, and trying to insert data into target from source using INSERT OVERWRITE.
When Source Table has partition
If the source table is partitioned and the WHERE condition refers to a partition that does not exist, the target table is not truncated.
create table source (name String) partitioned by (age int);
insert into source partition (age) values("gaurang", 11);
create table target (name String, age int);
insert into target values("xxx", 99);
The following query won't truncate the table, even though the select doesn't return anything.
insert overwrite table temp.target select * from temp.source where name="Ddddd" and age=99;
However, the following query will truncate the table.
insert overwrite table temp.target select * from temp.source where name="Ddddd" and age=11;
The first case makes sense to me: the partition (age=99) does not exist, so the query presumably stops executing there. However, this is my assumption; I am not sure what exactly happens.
When Source Table Doesn't have partition, but Target has
In this case the target table won't be truncated even if the select statement on the source table returns 0 rows.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String) partitioned by (age int);
insert into source1 values ("gaurang", 11);
insert into target1 partition(age) values("xxx", 99);
select * from source1;
select * from target1;
The following query won't truncate the table even if the select statement finds no data.
insert overwrite table temp.target1 partition(age) select * from temp.source1 where age=90;
When neither Source nor Target has partition
In this case, if I insert overwrite the target and the select statement doesn't return any rows, the target table is truncated.
Check the example below.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String, age int);
insert into source1 values ("gaurang", 11);
insert into target1 values("xxx", 99);
select * from source1;
select * from target1;
The following query will truncate the target table.
insert overwrite table temp.target1 select * from temp.source1 where age=90;
It is better to use the term 'overwrite' instead of 'truncate', because that is exactly what happens during INSERT OVERWRITE.
When you write overwrite table temp.target1 partition(age), you instruct Hive to overwrite partitions, not the whole target1 table: only those partitions that are returned by the select.
An empty dataset will not overwrite any partitions in dynamic partition mode, because the partition to overwrite is unknown: it has to be taken from the dataset, and since the dataset is empty there is nothing to overwrite.
For a non-partitioned table, it is already known that the whole table should be overwritten, whether the dataset is empty or not.
The partition column should be the last column in the select of an INSERT OVERWRITE statement. The list of partitions overwritten in the target equals the list of values in the partition column returned by the dataset. It does not matter how the source table is partitioned (you can select the target partition column from any source table column, calculate it, or use a constant); only what is returned matters.
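The rule described in this answer can be written down as a small toy model. The sketch below is plain Python, not Hive code, and the function insert_overwrite is a hypothetical helper invented for illustration: it treats a table as a list of rows and replaces either the whole table (no partition column) or only the partitions whose values appear in the select result (dynamic partition mode).

```python
def insert_overwrite(target, rows, partition_col=None):
    """Toy model of INSERT OVERWRITE as described above (not Hive's implementation).

    target: current table contents as a list of dict rows.
    rows: the result of the SELECT, also a list of dict rows.
    partition_col: name of the dynamic partition column, or None if the
        target table is not partitioned.
    Returns the new table contents.
    """
    if partition_col is None:
        # Unpartitioned table: the whole table is replaced, even by an empty result.
        return list(rows)
    # Dynamic partitions: only partitions present in the result set are replaced.
    touched = {row[partition_col] for row in rows}
    kept = [row for row in target if row[partition_col] not in touched]
    return kept + list(rows)


target = [{"name": "xxx", "age": 99}]
empty_result = []  # stands for a SELECT ... WHERE age=90 that matches nothing

unpartitioned_after = insert_overwrite(target, empty_result)
partitioned_after = insert_overwrite(target, empty_result, partition_col="age")
replaced = insert_overwrite(
    target, [{"name": "gaurang", "age": 99}], partition_col="age"
)
```

With an empty result, the unpartitioned call returns an empty table, while the partitioned call leaves the age=99 row in place because no partition value was returned. Only when the result contains age=99 is that partition replaced.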

Upserting in GreenPlum

How can I upsert records in Greenplum while copying data from a CSV file? The CSV file has multiple records for a given value of the primary key. If a row with that key already exists in the database, I want to update it; otherwise, a new row should be appended.
One way to do this is to copy the data to a staging table, then insert/update from that table.
Here is an example of that:
-- Duplicate the definition of your table.
CREATE TEMP TABLE my_table_stage (LIKE my_table INCLUDING DEFAULTS);
-- Your COPY statement, loading into the staging table
COPY my_table_stage FROM 'my_file.csv' ...
-- Insert any "new" records
INSERT INTO my_table (key_field, data_field1, data_field2)
SELECT
stg.key_field,
stg.data_field1,
stg.data_field2
FROM
my_table_stage stg
WHERE
NOT EXISTS (SELECT 1 FROM my_table WHERE key_field = stg.key_field);
-- Update any existing records
UPDATE my_table orig
SET
data_field1 = stg.data_field1,
data_field2 = stg.data_field2
FROM
my_table_stage stg
WHERE
orig.key_field = stg.key_field;
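To try the staging-table pattern end to end without a Greenplum cluster, here is a minimal sketch using Python's sqlite3 and csv modules. Several things are assumptions for the sake of a runnable example: SQLite stands in for Greenplum, an in-memory CSV string stands in for my_file.csv and the COPY step, the sample rows are invented, and the update is written with correlated subqueries instead of UPDATE ... FROM so that it works on older SQLite versions. The staging keys are unique in this sample.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in Greenplum this would be loaded with COPY.
CSV_TEXT = "key_field,data_field1,data_field2\n1,new-a,new-b\n3,c1,c2\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "key_field INTEGER PRIMARY KEY, data_field1 TEXT, data_field2 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "old-a", "old-b"), (2, "x", "y")],
)
conn.execute(
    "CREATE TEMP TABLE my_table_stage ("
    "key_field INTEGER, data_field1 TEXT, data_field2 TEXT)"
)

# Step 1: load the file into the staging table (stand-in for COPY).
staged = list(csv.DictReader(io.StringIO(CSV_TEXT)))
conn.executemany(
    "INSERT INTO my_table_stage VALUES (:key_field, :data_field1, :data_field2)",
    staged,
)

# Step 2: insert records whose key is not in the main table yet.
conn.execute(
    """
    INSERT INTO my_table (key_field, data_field1, data_field2)
    SELECT stg.key_field, stg.data_field1, stg.data_field2
    FROM my_table_stage stg
    WHERE NOT EXISTS (
        SELECT 1 FROM my_table WHERE my_table.key_field = stg.key_field
    )
    """
)

# Step 3: update records whose key is present in the staging table.
conn.execute(
    """
    UPDATE my_table
    SET data_field1 = (SELECT s.data_field1 FROM my_table_stage s
                       WHERE s.key_field = my_table.key_field),
        data_field2 = (SELECT s.data_field2 FROM my_table_stage s
                       WHERE s.key_field = my_table.key_field)
    WHERE EXISTS (
        SELECT 1 FROM my_table_stage s WHERE s.key_field = my_table.key_field
    )
    """
)

result = conn.execute("SELECT * FROM my_table ORDER BY key_field").fetchall()
```

After the three steps, key 1 carries the values from the file, key 2 is untouched, and key 3 has been appended.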

how to use one sql insert data to two table?

I have two tables, and they are connected by one field: B_ID of table A and id of table B.
I want to use SQL to insert data into these two tables.
How should I write the insert SQL?
1. id in table B is auto-increment.
2. As a crude approach, I can insert data into table B first, then select the id from table B, and then add that id to table A as message_id.
You cannot insert data into multiple tables in one SQL statement. Just insert into table B first and then into table A. You can use the RETURNING clause to get the ID value and avoid the additional select statement between the inserts.
See: https://oracle-base.com/articles/misc/dml-returning-into-clause
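The two-step insert suggested here can be sketched as follows. This is an illustration under stated assumptions, not Oracle code: Python's sqlite3 stands in for the database, the tables table_a and table_b and their columns body and note are made up, and cursor.lastrowid plays the role that the RETURNING clause plays in Oracle (handing back the generated key without a second select).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_b (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
)
conn.execute(
    "CREATE TABLE table_a (b_id INTEGER REFERENCES table_b(id), note TEXT)"
)


def insert_pair(body, note):
    """Insert into table_b, then into table_a using the generated id.

    Both inserts run in one transaction, so either both rows exist or neither.
    """
    with conn:  # commits on success, rolls back if either insert raises
        cur = conn.execute("INSERT INTO table_b (body) VALUES (?)", (body,))
        new_id = cur.lastrowid  # generated key, no extra SELECT needed
        conn.execute(
            "INSERT INTO table_a (b_id, note) VALUES (?, ?)", (new_id, note)
        )
    return new_id


first_id = insert_pair("hello", "first")
second_id = insert_pair("world", "second")
linked = conn.execute(
    "SELECT a.note, b.body FROM table_a a "
    "JOIN table_b b ON b.id = a.b_id ORDER BY b.id"
).fetchall()
```

Wrapping both statements in one transaction is a design choice of this sketch: it avoids leaving a row in table B without its counterpart in table A if the second insert fails.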
Have you heard about the AFTER INSERT trigger? I think it is what you are looking for.
Something like this might do what you want:
CREATE OR REPLACE TRIGGER TableB_after_insert
AFTER INSERT
ON TableB
FOR EACH ROW
DECLARE
v_id int;
BEGIN
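/*
* 1. Read the generated id from :NEW.id (querying TableB here would raise a mutating-table error)
* 2. Insert data to TableA
*/
NULL; -- placeholder so the otherwise empty body compiles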
END;
/

Inserting an empty row

This is so simple it has probably already been asked, but I couldn't find it (if that's the case I'm sorry for asking).
I would like to insert an empty row into a table so I can pick up its ID (primary key, generated by an insert trigger) through an ExecuteScalar. Data is added to it at a later time in my code.
My question is this: is there a specific insert syntax to create an empty record? Or must I go with the regular insert syntax such as "INSERT INTO table (list all the columns) values (null for every column)"?
Thanks for the answer.
UPDATE: In Oracle, ExecuteScalar on INSERT only returns 0. The final answer is a combination of what was posted below. First you need to declare a parameter, and then pick it up with RETURNING.
INSERT INTO TABLENAME (ID) VALUES (DEFAULT) RETURNING ID INTO :parameterName
Check out this link for more info.
You would not have to specify every single column, but you may not be able to create an "empty" record. Check for NOT NULL constraints on the table. If none (not including the Primary Key constraint), then you would only need to supply one column. Like this:
insert into my_table ( some_column )
values ( null );
Do you know about the RETURNING clause? You can return that PK back to your calling application when you do the INSERT.
insert into my_table ( some_column )
values ( 'blah' )
returning my_table_id into <your_variable>;
I would question the approach though. Why create an empty row? That would/could mean there are no constraints on that table, a bad thing if you want good, clean data.
Basically, to insert a row in which every column is NULL except the primary key column, you can execute a simple insert statement:
insert into your_table(PK_col_name)
values(1); -- 1 for instance, or null
The before insert trigger, which is responsible for populating the primary key column, will override the value in the values clause of the insert statement, leaving you with a record that is empty except for the PK value.
