I have a CSV file which contains 8 columns. Some of its rows contain NULL data.
To load the CSV data, I have created 2 tables with the same definition:
1) TABLE_NOT_NULL to load Not NULL data
2) TABLE_NULL to load NULL Data
I am able to load data into TABLE_NOT_NULL successfully with the WHEN condition below:
insert into table '<TABLE_NAME>' when '<COLUMN_NAME>'!=' '.
Now I want to load the NULL data into TABLE_NULL, but I am not able to filter out only the NULL values with a WHEN condition.
I tried many things, but none of them worked, for example:
a) insert into table '<TABLE_NAME>' WHEN '<COLUMN_NAME>'=BLANKS
b) insert into table '<TABLE_NAME>' WHEN '<COLUMN_NAME>'=' '
Can anyone please suggest a workaround or a solution?
Workaround?
1
Load everything into TABLE_NULL
insert into TABLE_NOT_NULL select * From TABLE_NULL where column is not null
delete from TABLE_NULL where column is not null
2
Load everything into TABLE_NOT_NULL.
Rows that contain NULL values won't be loaded (assuming the column is declared NOT NULL in TABLE_NOT_NULL); they end up in the BAD file.
Using another control file, load the BAD file into TABLE_NULL.
3 (EDIT)
Instead of SQL*Loader, create an external table. It acts as if it were an ordinary Oracle table, but is really just a pointer to the file.
You'd then write 2 INSERT statements:
insert into table_not_null
select * From external_table where column is not null;
insert into table_null
select * From external_table where column is null;
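The external-table option could look roughly like the sketch below. The directory object ext_dir, its path, the file name data.csv and the column names col1 to col8 are all placeholders (the question only says that the CSV has 8 columns), so adjust names, types and the delimiter to the real file.

```sql
-- Hypothetical directory object pointing at the folder that holds the CSV
CREATE OR REPLACE DIRECTORY ext_dir AS '/path/to/csv/folder';

-- External table: no data is stored in the database, rows are read from the file
CREATE TABLE external_table (
  col1 VARCHAR2(100),
  col2 VARCHAR2(100),
  col3 VARCHAR2(100),
  col4 VARCHAR2(100),
  col5 VARCHAR2(100),
  col6 VARCHAR2(100),
  col7 VARCHAR2(100),
  col8 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('data.csv')
)
REJECT LIMIT UNLIMITED;

-- Quick check before splitting the rows (col3 stands in for the column of interest)
SELECT COUNT(*) FROM external_table WHERE col3 IS NULL;
```

Once the check returns the expected count, the two INSERT statements above can be run against external_table with the real column name in place of "column".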
I'm not able to import data into a partitioned table in Hive.
Here is how I create the table:
CREATE TABLE IF NOT EXISTS title_ratings
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
And then I load the data into it: LOAD DATA INPATH '/title.ratings.tsv.gz' INTO TABLE eval_hive_db.title_ratings;
It works fine up to here. Now I want to create a dynamically partitioned table. First of all, I set these parameters:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I now create my partitioned table:
CREATE TABLE IF NOT EXISTS title_ratings_part
(
tconst STRING,
numVotes INT
)
PARTITIONED BY (averageRating DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE;
insert into title_ratings_part partition(title_ratings) select tconst, averageRating, numVotes from title_ratings;
(I also tried with numVotes instead, by the way.)
And I receive this error: FAILED: ValidationFailureSemanticException eval_hive_db.title_ratings_part: Partition spec {title_ratings=null} contains non-partition columns
Can someone help me, please?
Ideally, I want to partition my table by averageRating (less than 2, between 2 and 4, and greater than 4).
You can run this command to check whether the column contains NULL values:
select averageRating, count(*) from title_ratings group by averageRating;
If there are NULL values in this column, the result contains a row for NULL together with its count. Fill in those values first, then apply the partitioning again.
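The partition column is stored as the last column of the table, so the select statement must list it last. The partition clause must also name the partition column (averageRating), not the source table; that is what the "contains non-partition columns" error is complaining about.
Please change the order of the columns in the select:
insert into title_ratings_part partition(averageRating)
select
tconst,
numVotes,
averageRating -- order-wise, this should always be the last column
from title_ratings;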
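For the three rating ranges mentioned in the question, one option is to partition by a derived label instead of the raw DOUBLE, which would otherwise create one partition per distinct rating. The following is only a sketch: the table name title_ratings_by_range, the partition column rating_range, the label values and the handling of the boundaries 2 and 4 are assumptions, not something the question specifies.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table, partitioned by a label rather than by the rating itself
CREATE TABLE IF NOT EXISTS title_ratings_by_range
(
  tconst STRING,
  averageRating DOUBLE,
  numVotes INT
)
PARTITIONED BY (rating_range STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- The derived partition value is the last column of the select
INSERT INTO TABLE title_ratings_by_range PARTITION (rating_range)
SELECT
  tconst,
  averageRating,
  numVotes,
  CASE
    WHEN averageRating IS NULL THEN 'unknown'
    WHEN averageRating < 2 THEN 'lt_2'
    WHEN averageRating <= 4 THEN '2_to_4'
    ELSE 'gt_4'
  END AS rating_range
FROM title_ratings;
```

SHOW PARTITIONS title_ratings_by_range; should then list at most four partitions, one per label.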
I am trying to replace all of the NULL values with 0 in a column of a big table in Hive.
However, every time I try to implement some code, I end up generating a new column. The column I am trying to modify still exists and still has the NULL values, while the automatically generated new column (i.e. _c1) looks the way I want the original column to look.
I tried COALESCE, but that also ended up generating a new column. I also tried a CASE WHEN, with the same result.
Select *,
CASE WHEN columnname IS NULL THEN 0
ELSE columnname
END
from tablename;
I also tried:
SELECT coalesce(columnname, CAST(0 AS BIGINT)) FROM tablename
I would just like to update the table so that the other columns stay as they are and the column I want to modify keeps its original name, but has 0 in place of the NULL values.
I don't want to generate a new column but modify an existing one.
How should I do that?
Use the insert overwrite option:
insert overwrite table tablename
select c1,c2,...,coalesce(columnname,0) as columnname
from tablename
Note that you have to list all the other columns of the table in the select as well, in the table's column order.
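As a concrete illustration of the same pattern, here is a minimal sketch on a made-up table. The table name orders and its columns order_id, customer and amount are hypothetical; only the COALESCE inside an INSERT OVERWRITE comes from the answer above.

```sql
-- Hypothetical table: orders(order_id BIGINT, customer STRING, amount BIGINT)
-- Rewrite the table in place; every column is listed in table order,
-- and only amount is changed (NULL becomes 0, the column name stays the same).
INSERT OVERWRITE TABLE orders
SELECT
  order_id,
  customer,
  COALESCE(amount, CAST(0 AS BIGINT)) AS amount
FROM orders;

-- Sanity check: this should return 0 after the overwrite
SELECT COUNT(*) FROM orders WHERE amount IS NULL;
```

Because the statement replaces the table's data, it may be worth running the SELECT part on its own first and checking the output.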
I want to read a .tsv file from HBase into Hive. The file has a column family which contains 3 columns: news, social and all. The aim is to store these columns in a table in HBase which has the columns news, social and all.
CREATE EXTERNAL TABLE IF NOT EXISTS topwords_logs (key String, columnfamily String, wort String, col String, occurance int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE LOCATION '/home/hfu/Testdaten';
load data local inpath '/home/hfu/Testdaten/part-r-00000.tsv' into table topwords_logs;
CREATE TABLE newtopwords (columnall int, columnsocial int , columnnews int) PARTITIONED BY(wort STRING) STORED AS SEQUENCEFILE;
Here I created an external table which contains the data from HBase. Then I created a table with the 3 columns.
What I have tried so far is this:
insert overwrite table newtopwords partition(wort)
select occurance, '1', '1', wort from topwords_logs;
This code works fine, but I need a separate WHERE clause for each column. How can I insert data like this?
insert overwrite table newtopwords partition(wort)
values(columnall,(select occurance from topwords_logs where col =' all')),(columnnews,( select occurance from topwords_logs where col =' news')) ,(columnsocial,( select occurance from topwords_logs where col =' social')),(wort,(select wort from topwords_log));
This code isn't working (NoViableAltException).
In every example I only see code that inserts data without a WHERE clause. How can I insert data with a WHERE clause?
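One possible way to get a separate filter per target column is conditional aggregation in a single SELECT, instead of subqueries inside VALUES. The sketch below assumes that each wort has at most one row per value of col, and that the leading space in ' all', ' news' and ' social' is part of the stored data (hence the trim). Neither assumption is confirmed by the question.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- One output row per wort; each CASE acts as the per-column "where clause".
-- The select order matches newtopwords: columnall, columnsocial, columnnews, then the partition column.
INSERT OVERWRITE TABLE newtopwords PARTITION (wort)
SELECT
  MAX(CASE WHEN trim(col) = 'all'    THEN occurance END) AS columnall,
  MAX(CASE WHEN trim(col) = 'social' THEN occurance END) AS columnsocial,
  MAX(CASE WHEN trim(col) = 'news'   THEN occurance END) AS columnnews,
  wort
FROM topwords_logs
GROUP BY wort;
```

If a wort can appear several times for the same col, SUM may be a better fit than MAX.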
I am unable to append data to tables that contain an array column using INSERT INTO statements; the data type is array<varchar(200)>.
Using JDBC, I am unable to insert values into an array column with a statement like:
INSERT INTO demo.table (codes) VALUES (['a','b']);
It does not recognise the "[" or "{" signs.
Using the array function like this:
INSERT INTO demo.table (codes) VALUES (array('a','b'));
I get the following error:
Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
I tried this workaround:
INSERT into demo.table (codes) select array('a','b');
It was unsuccessful:
Failed to recognize predicate '<EOF>'. Failed rule: 'regularBody' in statement
How can I load array data into columns using JDBC?
My table has two columns: a STRING, b ARRAY<STRING>.
When I use @Kishore Kumar Suthar's method, I get this:
FAILED: ParseException line 1:33 cannot recognize input near '(' 'a' ',' in statement
But I found another way, and it works for me:
INSERT INTO test.table
SELECT "test1", ARRAY("123", "456", "789")
FROM dummy LIMIT 1;
dummy is any table which has at least one row.
Make a dummy table which has at least one row.
INSERT INTO demo.table (codes) VALUES (array('a','b')) from dummy limit 1;
hive> select codes from demo.table;
OK
["a","b"]
Time taken: 0.088 seconds, Fetched: 1 row(s)
Suppose I have a table employee containing the fields ID and Name.
I create another table, employee_address, with the fields ID and Address. Address is complex data of type array<string>.
Here is how I can insert values into it:
insert into table employee_address select 1, ARRAY('NewYork','11th avenue') from employee limit 1;
Here the table employee just acts as a dummy table. No data is copied from it. Its schema may not match employee_address. It doesn't matter.
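Put together, the approach could look like the sketch below. The column types (INT for the ID, ARRAY<STRING> for the address) are assumptions based on the description above, and employee must contain at least one row for the insert to produce anything.

```sql
-- Assumed definition of the target table
CREATE TABLE IF NOT EXISTS employee_address
(
  id INT,
  address ARRAY<STRING>
);

-- employee is only used as a row source; LIMIT 1 keeps exactly one constant row
INSERT INTO TABLE employee_address
SELECT 1, ARRAY('NewYork', '11th avenue')
FROM employee
LIMIT 1;

-- Read the array back: first element and number of elements
SELECT id, address[0] AS city, size(address) AS parts
FROM employee_address;
```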
I wanted to insert some initial data into a table in Hive, so I created the HQL below:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value;
But it does not work.
Here is another query, similar to the one above:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value FROM table limit 1;
But it didn't work either; I see that the tables are still empty.
How can I set the initial data into the table?
(There is a reason why I have to do a self-join.)
The first HQL statement needs a FROM clause; it is missing, so the statement fails:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value;
Regarding the second HQL statement, the table in the FROM clause must have at least one row, so that the constant initial values can be inserted into your newly created table:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum', '0' FROM table limit 1;
You can use any existing Hive table that has data in it; give it a try.
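If no populated table is at hand, a one-row helper table can serve as the row source. This is a sketch under assumptions: the names one_row and init_target are made up, the target's column types are guessed from the question, and INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- Hypothetical helper table holding exactly one row
CREATE TABLE IF NOT EXISTS one_row (dummy STRING);
INSERT INTO TABLE one_row VALUES ('x');

-- Hypothetical target table with the shape used in the question
CREATE TABLE IF NOT EXISTS init_target (key STRING, value STRING)
PARTITIONED BY (dt STRING);

-- Seed the partition with constant values, using one_row as the source
INSERT OVERWRITE TABLE init_target PARTITION (dt='2014-06-26')
SELECT 'key_sum', '0'
FROM one_row
LIMIT 1;
```

Selecting from a separate helper table also avoids the self-reference problem from the question, where the source table is the still-empty target.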
The following query works fine if the test table has already been created in Hive:
INSERT OVERWRITE TABLE test PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value FROM test;
I think the table that we insert into has to be created first.