Hive Insert overwrite into Dynamic partition external table from a raw external table failed with null pointer exception., - hadoop

I have a raw external table with four columns-
Table 1 :
create external table external_partitioned_rawtable (age_bucket
String,country_destination String,gender
string,population_in_thousandsyear int) row format delimited
fields terminated by '\t' lines terminated by '\n' location
'/user/HadoopUser/hive'
I want a external table with partitions from Country_destination and gender.Table -2
create external table external_partitioned (age_bucket
String,population_in_thousandsyear int) partitioned
by(country_destination String,gender String) row format delimited
fields terminated by '\t' lines terminated by '\n';
Insert Overwrite is failing with null pointer exception-
insert overwrite table external_partitioned partition(country_destination,gender) <br>
select (age_bucket,population_in_thousandsyear,country_destination,gender) <br>
from external_partitioned_rawtable;
FAILED: NullPointerException null

For dynamic partition insertion, before executing the INSERT statement you have to execute two properties of hive:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
then execute insert statement(which I have modified)
insert overwrite table external_partitioned partition(country_destination,gender)
select age_bucket,population_in_thousandsyear,country_destination,gender
from external_partitioned_rawtable;
I hope this help you!!!

Related

HIVE - Cannot partition a table: semantic exception failure

I'm not able to import data on partitioned table in Hive.
Here is how I create the table
CREATE TABLE IF NOT EXISTS title_ratings
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
And then I load the data into it : LOAD DATA INPATH '/title.ratings.tsv.gz' INTO TABLE eval_hive_db.title_ratings;
It works fine till here. Now I want to create a dynamic partitioned table. First of all, I setup theses params:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I now create my partitioned table:
CREATE TABLE IF NOT EXISTS title_ratings_part
(
tconst STRING,
numVotes INT
)
PARTITIONED BY (averageRating DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE;
insert into title_ratings_part partition(title_ratings) select tconst, averageRating, numVotes from title_ratings;
(I also tried with numVotes instead by the way)
And I receive this error: FAILED: ValidationFailureSemanticException eval_hive_db.title_ratings_part: Partition spec {title_ratings=null} contains non-partition columns
Someone can help me please?
Ideally, I want to partition my table by averageRating (less than 2, between 2 and 4, and greater than 4)
You can run this command to check if there are null values or not.
select count(averageRating) from title_ratings group by averageRating;
Now, if there are null values in this column then you will get the count, which you have to fill then apply partitioning again.
Partition column is stored as last column in a table so while inserting you need to maintain correct order in select statement.
Pls change order of columns in select.
insert into title_ratings_part partition(title_ratings)
Select
Tconst,
numVotes,
averageRating --orderwise this should always be last column
from title_ratings

How I avoid the "NULL" in the first "Field Name" of Hive table

First of all I have created the table "emp" in Hive by using below commands:
create table emp (id INT, name STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Then load the data in this "emp" table by this below command:
LOAD DATA LOCAL INPATH '\home\cloudera\Desktop\emp.txt' overwrite into table emp;
When I select the data from "emp" table: it show me first field of table Null
like this:
You have an header row in your file and the first value id cannot be converted into INT therefore being replaced by NULL.
add tblproperties ("skip.header.line.count"="1") to your table definition
For an existing table -
alter table emp set tblproperties ("skip.header.line.count"="1");

Unable to load data in Hive partitioned table

I have created a table in Hive with the following query:
create table if not exists employee(CASE_NUMBER String,
CASE_STATUS String,
CASE_RECEIVED_DATE DATE,
DECISION_DATE DATE,
EMPLOYER_NAME STRING,
PREVAILING_WAGE_PER_YEAR BIGINT,
PAID_WAGE_PER_YEAR BIGINT,
order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';
I tried loading data into the create table using below query:
LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP);
For the partitioned table, I have even set following configuration :
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
But I am getting below error while executing the load query:
Your query has the following error(s):
Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])
Please help.
If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.
LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");
Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.
Check the documentation for details on the LOAD DATA syntax.
EDITED
Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:
INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
I might be little late to reply but can try below steps:
Set below properties first :
Ø set hive.exec.dynamic.partition.mode=nonstrict;
Ø set hive.exec.dynamic.partition=true;
Create temp table first:
CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
ID STRING,
Name STRING,
Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");
Load Data in temporary table:
hive> LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee;
Create Partitioned Table:
CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
ID STRING,
Name STRING)
PARTITIONED BY (Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");
Load Data into partitioned table from intermediate / temp table:
INSERT OVERWRITE TABLE employee_part PARTITION (SALARY) SELECT * FROM employee;

How to Copy TEXT format partitioned table to ORC format Table in Hive

i have a Text Format hive table, like:
CREATE EXTERNAL TABLE op_log (
time string, debug string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
now i create a orc format table with same fields, like
CREATE TABLE op_log_orc (
time string, debug string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
STORED AS ORC tblproperties ("orc.compress" = "SNAPPY");
when i copy from op_log to op_log_orc, i have get this errors:
hive> insert into op_log_orc PARTITION(dt='2016-08-09') select * from op_log where dt='2016-08-09';
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different ''2016-08-09'': Table insclause-0 has 62 columns, but query has 63 columns.
hive>
The partition key (dt) in the source table is returned in the result set as though it were a regular field, so you have the extra column. Exclude the dt field from the field list (instead of *) if you're going to specify its value in the partition key. Alternatively, just specify dt as the name of the partition, without providing a value. See CTAS (create table as select...) in the example here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)

Hive - facing challenge's in Dynamic partition error

Can any one guide me where I am doing mistake while doing dynamic partition.
--Staging table:
create table staging_peopledata
(
firstname string,
secondname string,
salary float,
country string
state string
)
row format delimited fields terminated by ',' lines terminated by '\n';
--Data for Staging table:
John,David,30000,RUS,tnRUS
John,David,30000,RUS,tnRUS
Mary,David,5000,AUS,syAUS
Mary,David,5000,AUS,syAUS
Mary,David,5000,AUS,weAUS
Pierre,Cathey,6000,RUS,kaRUS
Pierre,Cathey,6000,RUS,kaRUS
Ahmed,Talib,11000,US,bcUS
Ahmed,Talib,11000,US,onUS
Ahmed,Talib,11000,US,onUS
kris,David,80000,UK,lnUK
kris,David,80000,UK,soUK
--Production table:
create table Production_peopledata
(
firstname string,
lastname string,
salary float)
partitioned by (country string, state string)
row format delimited fields terminated by ',' lines terminated by '\n';
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table Production_peopledata
partition(country,state)
select firstname, secondname, salary, country, state from staging_peopledata;
If i execute the above command I am getting error as below.
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode
requires atleast one static partition column. To turn this off set
hive.exec.dynamic.partition.mode=nonstrict
Can any one tell me where I am doing the mistake.
Can you please run below command on Hive Shell.
hive>set hive.exec.dynamic.partition.mode=nonstrict;
You need to set below properties:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
The column name want to partition on should not be part of the table definition. As the partition column is dynamically generated. While filling the data in the partitioned table the partitioned column should come from the source table.
Let's say we have EMP and EMP1 tables. EMP1 is the partitioned table which will get the data from the EMP table. Initially both of these tables are same. So first we need to create a partitioned column i.e. salpart. Then we will add this column in the source table which is EMP. After successful run we can see the partitioned files in user/hive/warehouse location. The above explanation is implemented as below:
load data local inpath '/home/cloudera/myemployeedata.txt' overwrite into table emp;
CREATE TABLE IF NOT EXISTS emp ( eid int, name String,
salary String, destination String,salpart string)
COMMENT "Employee details"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "\t"
LINES TERMINATED BY "\n"
STORED AS TEXTFILE;
CREATE TABLE IF NOT EXISTS emp1 ( eid int, name String,
salary String, destination String)
COMMENT "Employee details"
partitioned by (salpart string) {this column will values will come from a seperate table }
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "\t"
LINES TERMINATED BY "\n"
STORED AS TEXTFILE;
Dynamic Partition:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table emp1 partition(salpart) select eid,name,salary,destination,salpart from emp;
as per the error it seems that mode in still strict, for dynamic partitioning it need to be set to non strict
use below command
hive>set hive.exec.dynamic.partition.mode=nonstrict;
Once again try to do
Set hive.exec.dynamic.partition.mode=nonstrict
Sometimes in hive it happens even if you set this property it considers strict mode hence I suggest you to set this property once again

Resources