Load NULL data into a different table with the help of SQL*Loader (Oracle)

I have a CSV which contains 8 columns. It has NULL data in a few of the rows.
To load the CSV data, I have created 2 tables with the same definition:
1) TABLE_NOT_NULL, to load the non-NULL data
2) TABLE_NULL, to load the NULL data
I am able to load data into TABLE_NOT_NULL successfully with the WHEN condition below:
insert into table '<TABLE_NAME>' when '<COLUMN_NAME>'!=' '
Now I want to load the NULL data into the table called TABLE_NULL, but I am not able to filter out only the NULL values with a WHEN condition.
I tried many things, but none of them worked, for example:
a) insert into table '<TABLE_NAME>' WHEN '<COLUMN_NAME>'=BLANKS
b) insert into table '<TABLE_NAME>' WHEN '<COLUMN_NAME>'=' '
Can anyone suggest a workaround or solution for this?

Workaround?
1) Load everything into TABLE_NULL, then move the non-NULL rows over:
insert into TABLE_NOT_NULL select * from TABLE_NULL where column is not null;
delete from TABLE_NULL where column is not null;
2) Load everything into TABLE_NOT_NULL; rows that contain NULL values won't be loaded, but will end up in the BAD file. Then, using another control file, load the BAD file into TABLE_NULL (see the first sketch below).
3) (EDIT) Instead of SQL*Loader, create an external table - it acts as if it were an ordinary Oracle table, but is really just a pointer to the file (see the second sketch below). You'd then write 2 INSERT statements:
insert into table_not_null
select * from external_table where column is not null;
insert into table_null
select * from external_table where column is null;
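For option 2, a minimal control-file sketch for the second load; the BAD file name and the eight column names are assumptions, not taken from the original post:

LOAD DATA
INFILE 'table_not_null.bad'   -- the BAD file produced by the first run
INTO TABLE TABLE_NULL
FIELDS TERMINATED BY ','
TRAILING NULLCOLS             -- treat missing trailing fields as NULL
(col1, col2, col3, col4, col5, col6, col7, col8)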
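For option 3, a hedged sketch of what the external table could look like; the directory object, file name, column names, and datatypes are all assumptions:

CREATE DIRECTORY csv_dir AS '/path/to/csv';   -- hypothetical directory object

CREATE TABLE external_table (
  col1 VARCHAR2(100), col2 VARCHAR2(100), col3 VARCHAR2(100), col4 VARCHAR2(100),
  col5 VARCHAR2(100), col6 VARCHAR2(100), col7 VARCHAR2(100), col8 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY csv_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL   -- empty CSV fields become NULL
  )
  LOCATION ('data.csv')
);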

Related

HIVE - Cannot partition a table: semantic exception failure

I'm not able to import data into a partitioned table in Hive.
Here is how I create the table:
CREATE TABLE IF NOT EXISTS title_ratings
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
And then I load the data into it: LOAD DATA INPATH '/title.ratings.tsv.gz' INTO TABLE eval_hive_db.title_ratings;
It works fine up to here. Now I want to create a dynamically partitioned table. First of all, I set up these params:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I now create my partitioned table:
CREATE TABLE IF NOT EXISTS title_ratings_part
(
tconst STRING,
numVotes INT
)
PARTITIONED BY (averageRating DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE;
insert into title_ratings_part partition(title_ratings) select tconst, averageRating, numVotes from title_ratings;
(I also tried with numVotes instead, by the way.)
And I receive this error: FAILED: ValidationFailureSemanticException eval_hive_db.title_ratings_part: Partition spec {title_ratings=null} contains non-partition columns
Can someone help me, please?
Ideally, I want to partition my table by averageRating (less than 2, between 2 and 4, and greater than 4)
You can run this command to check whether there are NULL values in the column:
select count(*) from title_ratings where averageRating is null;
If there are NULL values in this column, you will have to fill them and then apply the partitioning again.
The partition column is stored as the last column of a table, so while inserting you need to maintain the correct order in the select statement.
Please change the order of columns in the select, and name the actual partition column (averageRating, not the table name) in the partition spec:
insert into title_ratings_part partition(averageRating)
select
  tconst,
  numVotes,
  averageRating  -- order-wise, the partition column should always be last
from title_ratings;
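Since the goal stated in the question is to partition by rating bands (less than 2, between 2 and 4, greater than 4) rather than by each distinct DOUBLE value, here is a hedged sketch of that variant; the table name title_ratings_band, the column rating_band, and the band labels are assumptions, not from the original thread:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE IF NOT EXISTS title_ratings_band
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
PARTITIONED BY (rating_band STRING)
STORED AS TEXTFILE;

INSERT INTO title_ratings_band PARTITION(rating_band)
SELECT tconst,
       averageRating,
       numVotes,
       CASE
         WHEN averageRating < 2 THEN 'low'
         WHEN averageRating <= 4 THEN 'mid'
         ELSE 'high'
       END AS rating_band   -- derived partition column, last in the select
FROM title_ratings
WHERE averageRating IS NOT NULL;  -- avoid a NULL partition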

How to replace NULL values in one column with 0 (in a very large table) without adding a new column of the desired results to the table in Hive?

I am trying to replace all of the NULL values with 0 in a column of a big table in Hive.
However, every time I try to implement some code I end up generating a new column in the table. The column I am trying to modify still exists and still has the NULL values, while the automatically generated column (i.e. _c1) looks the way I want the original column to look.
I tried to run a COALESCE, but that also ended up generating a new column. I also tried to implement a CASE WHEN, but the same results ensued.
Select *,
CASE WHEN columnname IS NULL THEN 0
ELSE columnname
END
from tablename;
Also tried
SELECT coalesce(columnname, CAST(0 AS BIGINT)) FROM tablename
I would just like to update the table, leaving the other columns as they are, so that the column I want to modify keeps its original name but has 0's in place of the NULL values.
I don't want to generate a new column but modify an existing one.
How should I do that?
Use the insert overwrite option:
insert overwrite table tablename
select c1,c2,...,coalesce(columnname,0) as columnname
from tablename
Note that you have to specify all the other column names required in select.
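A hedged, concrete version of the same idea (the table and column names here are hypothetical); casting the 0, as in the question's own attempt, keeps the column's BIGINT type:

insert overwrite table events
select user_id,
       event_name,
       coalesce(click_count, cast(0 as bigint)) as click_count  -- NULLs become 0
from events;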

How to insert overwrite a table in Hive with different where clauses?

I want to read a .tsv file from HBase into Hive. The file has a column family, which has 3 columns inside: news, social and all. The aim is to store these columns in a table in HBase which has the columns news, social and all.
CREATE EXTERNAL TABLE IF NOT EXISTS topwords_logs (key String,
columnfamily String, wort String, col String, occurance int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE LOCATION '/home/hfu/Testdaten';
load data local inpath '/home/hfu/Testdaten/part-r-00000.tsv' into table topwords_logs;
CREATE TABLE newtopwords (columnall int, columnsocial int , columnnews int) PARTITIONED BY(wort STRING) STORED AS SEQUENCEFILE;
Here I created an external table which contains the data from HBase. Furthermore, I created a table with the 3 columns.
What I have tried so far is this:
insert overwrite table newtopwords partition(wort)
select occurance, '1', '1', wort from topwords_log;
This code works fine, but I need an extra where clause for each column. How can I insert data like this?
insert overwrite table newtopwords partition(wort)
values(columnall,(select occurance from topwords_logs where col =' all')),(columnnews,( select occurance from topwords_logs where col =' news')) ,(columnsocial,( select occurance from topwords_logs where col =' social')),(wort,(select wort from topwords_log));
This code isn't working -> NoViableAltException.
In every example I just see code where they insert data without a where clause. How can I insert data with a where clause?
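One hedged sketch, not from the original thread: pivot the rows with conditional aggregation so that a single INSERT fills all three columns without separate where clauses. It assumes one row per (wort, col) pair in topwords_logs and that dynamic partitioning is enabled (hive.exec.dynamic.partition.mode=nonstrict):

insert overwrite table newtopwords partition(wort)
select max(case when col = 'all'    then occurance end) as columnall,
       max(case when col = 'social' then occurance end) as columnsocial,
       max(case when col = 'news'   then occurance end) as columnnews,
       wort   -- partition column last
from topwords_logs
group by wort;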

Hive inserting values to an array complex type column

I am unable to append data to tables that contain an array column using insert into statements; the data type is array<varchar(200)>.
Using JDBC, I am unable to insert values into an array column by values like:
INSERT INTO demo.table (codes) VALUES (['a','b']);
It does not recognise the "[" or "{" signs.
Using the array function like ...
INSERT INTO demo.table (codes) VALUES (array('a','b'));
I get the following error using array function:
Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
Tried the workaround...
INSERT into demo.table (codes) select array('a','b');
unsuccessfully:
Failed to recognize predicate '<EOF>'. Failed rule: 'regularBody' in statement
How can I load array data into columns using JDBC?
My Table has two columns: a STRING, b ARRAY<STRING>.
When I used @Kishore Kumar Suthar's method, I got this:
FAILED: ParseException line 1:33 cannot recognize input near '(' 'a' ',' in statement
But I find another way, and it works for me:
INSERT INTO test.table
SELECT "test1", ARRAY("123", "456", "789")
FROM dummy LIMIT 1;
dummy is any table which has at least one row.
Make a dummy table which has at least one row, then select the array from it:
INSERT INTO demo.table (codes) SELECT array('a','b') FROM dummy LIMIT 1;
hive> select codes from demo.table;
OK
["a","b"]
Time taken: 0.088 seconds, Fetched: 1 row(s)
Suppose I have a table employee containing the fields ID and Name.
I create another table employee_address with fields ID and Address. Address is complex data of type array<string>.
Here is how I can insert values into it:
insert into table employee_address select 1, ARRAY('NewYork','11th avenue') from employee limit 1;
Here the table employee just acts as a dummy table. No data is copied from it. Its schema need not match employee_address; it doesn't matter.

How to insert init-data into a table in hive?

I wanted to insert some initial data into a table in Hive, so I created the HQL below:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value;
but it does not work.
There is another query like the above,
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value FROM table limit 1;
But it also didn't work, as I see that the tables are empty.
How can I set the initial data into the table?
(There is a reason why I have to do a self-join.)
Regarding the first HQL statement: it should have a FROM clause; since that is missing, the HQL fails:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value;
Regarding the second HQL statement, the FROM table should have at least one row, so that it can set the constant init values into your newly created table:
INSERT OVERWRITE TABLE table PARTITION(dt='2014-06-26') SELECT 'key_sum', '0' FROM table limit 1;
You can use any old Hive table that has data in it, and give it a try.
The following query works fine if the test table has already been created in Hive:
INSERT OVERWRITE TABLE test PARTITION(dt='2014-06-26') SELECT 'key_sum' as key, '0' as value FROM test;
The table on which we perform the insert should be created first.
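A hedged sketch of the one-row helper-table trick mentioned above; the names one_row, some_existing_table, and target_table are hypothetical, and some_existing_table must already contain data:

CREATE TABLE IF NOT EXISTS one_row (x INT);
-- seed it with a single row taken from any table that already has data
INSERT OVERWRITE TABLE one_row SELECT 1 FROM some_existing_table LIMIT 1;

-- constants can now be selected FROM it to initialise other tables
INSERT OVERWRITE TABLE target_table PARTITION(dt='2014-06-26')
SELECT 'key_sum', '0' FROM one_row;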
