Getting NULL values while loading data from flat files into Hive tables

I am getting NULL values when loading data from a flat file into a Hive table.
My table structure is:
hive> create table test_hive (id int,value string);
My flat file (input.txt) looks like this:
1 a
2 b
3 c
4 d
5 e
6 F
7 G
8 j
When I run the commands below, I get NULL values:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Console output:
hive> create table test_hive (id int,value string);
OK
Time taken: 4.97 seconds
hive> show tables;
OK
test_hive
Time taken: 0.124 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/input2.txt' OVERWRITE INTO TABLE test_hive;
Copying data from file:/home/hduser/input2.txt
Copying file: file:/home/hduser/input2.txt
Loading data to table default.test_hive
Deleted hdfs://hydhtc227141d:54310/app/hive/warehouse/test_hive
OK
Time taken: 0.572 seconds
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.182 seconds

The default field terminator in Hive is ^A (Ctrl-A, '\001'). If your file uses a different field separator, you need to state it explicitly in your CREATE TABLE statement.
As Lorand Bending pointed out in the comments, use:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
You don't need to specify a location since you are creating a managed table (and not an external table).
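To see why every column comes back as NULL, here is a minimal Python sketch of what happens when a row is split on the wrong delimiter. It is only a simplified model, not Hive's actual SerDe code: `parse_row` is a hypothetical helper, and it assumes the two-column (id INT, value STRING) table from the question.

```python
from typing import List, Optional

CTRL_A = "\x01"  # Hive's default field delimiter (^A)


def parse_row(line: str, delimiter: str) -> List[Optional[object]]:
    """Simplified model of reading one text row into (id INT, value STRING)."""
    fields: List[Optional[str]] = list(line.split(delimiter))
    # Pad with None so that missing trailing fields show up as NULL.
    fields += [None] * (2 - len(fields))
    raw_id, value = fields[0], fields[1]
    try:
        row_id = int(raw_id)
    except (TypeError, ValueError):
        row_id = None  # text that is not a valid integer becomes NULL
    return [row_id, value]


line = "1 a"
print(parse_row(line, CTRL_A))  # no ^A in the line: "1 a" is not an int, value is missing
print(parse_row(line, " "))     # split on the real separator: both fields are filled
```

With the default delimiter the whole line lands in the first column, fails the integer conversion, and leaves the second column empty, which matches the NULL NULL rows in the question.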

The problem is that the fields in your data are separated by ' ', but you did not specify a field delimiter when creating the table. When no field delimiter is specified, Hive uses ^A by default.
To resolve this, recreate the table with the statement below:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

The solution is quite simple: the table wasn't created correctly.
The fix for this problem, and for similar ones later, is to tell Hive how the data is laid out when you create the table.
CREATE TABLE [IF NOT EXISTS] mytableName(id int, value string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Let me explain the code:
First line
Creates your table. The optional [IF NOT EXISTS] clause tells Hive to leave an existing table of the same name alone instead of failing. It's more of a safety measure.
Second line
Declares that the rows are stored as delimited text, so the delimiters that follow apply at the table level.
Third line
Sets the field delimiter. You can use any single character; the default is '\001'.
'\t' is for tab-separated data
'|' is for fields that sit next to each other separated by |
' ' is for fields separated by a single space, and so on.
Fourth line
Specifies the file format in which the data is stored, for example TEXTFILE, SEQUENCEFILE, or RCFILE. Alternatively, the storage can be specified with Java input and output format classes.
When loading from the local file system:
LOAD DATA LOCAL INPATH '/your/data/path.csv' [OVERWRITE] INTO TABLE myTableName;
Always check your data afterwards with a simple SELECT * statement.
Hope it helps.

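Hive’s default record and field delimiters:
\n (record delimiter, one row per line)
^A (field delimiter, '\001')
^B (collection item delimiter, '\002')
^C (map key delimiter, '\003')
In Vim, pressing ^V^A inserts a ^A character.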
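If typing control characters in an editor is awkward, a file in the default layout can also be generated by a script. The sketch below is a hedged illustration in Python: the constant names and the `to_default_format` helper are made up for this example, and the rows are sample data.

```python
# Hive's default delimiters written as Python string escapes.
FIELD_DELIM = "\x01"       # ^A, separates columns
COLLECTION_DELIM = "\x02"  # ^B, separates items inside a collection
MAP_KEY_DELIM = "\x03"     # ^C, separates a map key from its value
LINE_DELIM = "\n"          # separates records


def to_default_format(rows):
    """Join rows of values into text that uses the default delimiters."""
    return "".join(
        FIELD_DELIM.join(str(value) for value in row) + LINE_DELIM for row in rows
    )


text = to_default_format([(1, "a"), (2, "b")])
print(repr(text))
```

Text written this way should load into a table created without any ROW FORMAT clause, because it already uses the separators Hive expects by default.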

Are the elements separated by a space or a tab? Assuming it is a tab, follow these steps. If they are separated by a space, use ' ' instead of '\t'.
hive> CREATE TABLE test_hive(id INT, value STRING) row format
delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
Then enter:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
Now you should get exactly the output you expect.

Please check the date column in your dataset: it should follow the date format yyyy-mm-dd.
If the string is in the form 'yyyy-mm-dd', then a date value corresponding to that year/month/day is returned. If the string value does not match this format, then NULL is returned.
(Source: Hive official documentation)
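A quick way to find offending values before loading is to run the date column through a strict check. The Python sketch below is an assumption-laden illustration: `parse_hive_style_date` is a hypothetical helper, it only accepts zero-padded 'yyyy-mm-dd' text, and Hive's own parser may be more lenient than this in some versions.

```python
import re
from datetime import date
from typing import Optional

_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_hive_style_date(text: str) -> Optional[date]:
    """Return a date for strict 'yyyy-mm-dd' text, otherwise None (NULL)."""
    if not _DATE_PATTERN.fullmatch(text):
        return None
    year, month, day = (int(part) for part in text.split("-"))
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. month 13 or day 32


for sample in ["2015-03-21", "21/03/2015", "2015-13-01"]:
    print(sample, "->", parse_hive_style_date(sample))
```

Any value that comes back as None here is a candidate for the NULLs you see after loading.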

Related

HIVE - Cannot partition a table: semantic exception failure

I'm not able to import data on partitioned table in Hive.
Here is how I create the table
CREATE TABLE IF NOT EXISTS title_ratings
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
And then I load the data into it : LOAD DATA INPATH '/title.ratings.tsv.gz' INTO TABLE eval_hive_db.title_ratings;
It works fine up to here. Now I want to create a dynamically partitioned table. First, I set these parameters:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I now create my partitioned table:
CREATE TABLE IF NOT EXISTS title_ratings_part
(
tconst STRING,
numVotes INT
)
PARTITIONED BY (averageRating DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE;
insert into title_ratings_part partition(title_ratings) select tconst, averageRating, numVotes from title_ratings;
(I also tried with numVotes instead by the way)
And I receive this error: FAILED: ValidationFailureSemanticException eval_hive_db.title_ratings_part: Partition spec {title_ratings=null} contains non-partition columns
Can someone help me, please?
Ideally, I want to partition my table by averageRating (less than 2, between 2 and 4, and greater than 4).

You can run this query to check whether the column contains NULL values:
select count(*) from title_ratings where averageRating is null;
If the count is greater than zero, the column has NULL values. Fill those in first, then apply the partitioning again.

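The partition column is stored as the last column of the table, so the SELECT list must keep that order when inserting. The PARTITION clause must also name the partition column (averageRating), not the source table; that is what the "contains non-partition columns" error is complaining about.
Please change the order of the columns in the select:
insert into title_ratings_part partition(averageRating)
select
tconst,
numVotes,
averageRating -- the partition column must always come last
from title_ratings;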
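The ordering rule matters because a dynamic-partition insert matches the SELECT output to the table by position, not by name. Here is a small Python sketch of that idea; `route_row` is a hypothetical helper and the sample values are invented, so treat it as a mental model and not as Hive's implementation.

```python
def route_row(select_values, data_columns, partition_columns):
    """Map SELECT output to table columns by position, partition columns last."""
    expected = len(data_columns) + len(partition_columns)
    if len(select_values) != expected:
        raise ValueError(f"expected {expected} values, got {len(select_values)}")
    split = len(data_columns)
    row = dict(zip(data_columns, select_values[:split]))
    partition = dict(zip(partition_columns, select_values[split:]))
    return row, partition


data_columns = ["tconst", "numVotes"]
partition_columns = ["averageRating"]

# Correct order: tconst, numVotes, averageRating
print(route_row(("tt0000001", 1500, 5.7), data_columns, partition_columns))
# Wrong order: averageRating lands in numVotes and numVotes becomes the partition
print(route_row(("tt0000001", 5.7, 1500), data_columns, partition_columns))
```

In the second call nothing fails, the values simply end up in the wrong places, which is why the column order in the SELECT is worth double-checking.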

Empty String is not treated as null in Hive

My understanding of the following statement is that if a blank or empty string is inserted into a Hive column, it will be treated as NULL.
TBLPROPERTIES('serialization.null.format'='')
To test this, I created a table and inserted '' into field3. When I query for NULLs on field3, no rows match.
Is my understanding that a blank string becomes NULL correct?
CREATE TABLE CDR
(
field1 string,
field2 string,
field3 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
TBLPROPERTIES('serialization.null.format'='');
insert overwrite table emmtest.cdr select field1,field2,'' from emmtest.cdr_non_orc;
select * from emmtest.cdr where field3 is null;
The last statement does not return any rows, but I expect all rows to be returned since field3 contains a blank string.

TBLPROPERTIES('serialization.null.format'='') means the following:
An empty field in the data files will be treated as NULL when you query the table
When inserting rows to the table, NULL values will be written to the data files as empty fields
You are doing something else -
You are inserting an empty string to a table from a query.
It is treated "as is" - an empty string.
Demo
bash
hdfs dfs -mkdir /user/hive/warehouse/mytable
echo Hello,,World | hdfs dfs -put - /user/hive/warehouse/mytable/data.txt
hive
create table mytable (s1 string,s2 string,s3 string)
row format delimited
fields terminated by ','
;
hive> select * from mytable;
OK
s1 s2 s3
Hello World
hive> alter table mytable set tblproperties ('serialization.null.format'='');
OK
hive> select * from mytable;
OK
s1 s2 s3
Hello NULL World
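The two rules at the top of this answer can be modelled in a few lines of Python. This is a simplified sketch with hypothetical helpers (`read_fields`, `write_fields`), not the SerDe itself; it assumes a comma-delimited text table and uses \N, Hive's default null format, for comparison.

```python
from typing import List, Optional


def read_fields(line: str, delimiter: str, null_format: str) -> List[Optional[str]]:
    """Split a stored line; fields equal to null_format are read as None (NULL)."""
    return [None if field == null_format else field for field in line.split(delimiter)]


def write_fields(values: List[Optional[str]], delimiter: str, null_format: str) -> str:
    """Serialize a row; None (NULL) is written as null_format."""
    return delimiter.join(null_format if value is None else value for value in values)


line = "Hello,,World"
print(read_fields(line, ",", r"\N"))  # default null format: the empty field stays ''
print(read_fields(line, ",", ""))     # '' as null format: the empty field is NULL
print(write_fields(["Hello", None, "World"], ",", ""))
```

The first two calls mirror the demo before and after the ALTER TABLE statement; the last one shows a NULL being written out as an empty field.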

You can also use the following in the ROW FORMAT DELIMITED clause of your CREATE TABLE statement:
NULL DEFINED AS ''
or put any other character inside the quotes.

Hive INSERT OVERWRITE into dynamic partition external table from a raw external table failed with NullPointerException

I have a raw external table with four columns-
Table 1 :
create external table external_partitioned_rawtable (age_bucket
String,country_destination String,gender
string,population_in_thousandsyear int) row format delimited
fields terminated by '\t' lines terminated by '\n' location
'/user/HadoopUser/hive'
I want an external table partitioned by country_destination and gender. Table 2:
create external table external_partitioned (age_bucket
String,population_in_thousandsyear int) partitioned
by(country_destination String,gender String) row format delimited
fields terminated by '\t' lines terminated by '\n';
INSERT OVERWRITE is failing with a NullPointerException:
insert overwrite table external_partitioned partition(country_destination,gender)
select (age_bucket,population_in_thousandsyear,country_destination,gender)
from external_partitioned_rawtable;
FAILED: NullPointerException null

For a dynamic partition insert, you have to set two Hive properties before executing the INSERT statement:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Then execute the insert statement, which I have modified by removing the parentheses around the select list:
insert overwrite table external_partitioned partition(country_destination,gender)
select age_bucket,population_in_thousandsyear,country_destination,gender
from external_partitioned_rawtable;
I hope this helps!

Loading data into Hive table from Notepad

I loaded data into a Hive table from a file created in Notepad. The load reports that the data was copied, but when I run a select query it shows NULL. What could be the reason?
hive> create table test_sq(k string, v string) stored as sequencefile;
hive> load data local inpath '/tmp/input.txt' into table test_sq;
OK
hive> select * from tesst_t;
OK
NULL NULL
NULL NULL

A file created in Notepad is presumably plain text, whereas you declared the table as a sequence file.
Your create table script should be (put your file's delimiter character between the quotes):
create table test_sq(k string, v string) row format delimited fields terminated by '';

I'm not sure whether it is just a typo, but you are querying a different table (tesst_t) instead of the table you loaded (test_sq).
Can you provide a sample line from your text file?
If your file uses ^A (Ctrl-A, the Hive default) as the delimiter, then you can just use create table test_sq(k string, v string); . In other cases, as venkat has mentioned, use create table test_sq(k string, v string) row format delimited fields terminated by 'single_character_delimiter' . This also works with a tab delimiter ('\t').

Hive: Data not getting copied into Hive table from .csv file (stored on hdfs)

I am learning Hive. I created a table and tried to load data from a CSV file. No error is raised, but the loaded rows are all NULLs (not the actual data from the .csv file). There are hundreds of records in the .csv input file, which was uploaded to HDFS. Please help me out; thanks in advance.
Following is the sequence of commands executed
hive> CREATE TABLE IF NOT EXISTS CampaignDB (isano int,MemberName string,cityordist string,state string,mobile int,email string,memtype string) comment 'Doc Campaign data' row format delimited stored as textfile;
OK
Time taken: 0.323 seconds
hive> desc CampaignDB;
OK
isano int None
membername string None
cityordist string None
state string None
mobile int None
email string None
memtype string None
Time taken: 0.212 seconds, Fetched: 7 row(s)
hive> LOAD DATA INPATH '/user/hadoop/input/campaignDB-sample.csv' OVERWRITE INTO TABLE CampaignDB;
Loading data to table default.campaigndb
Deleted hdfs://localhost:9000/user/hive/warehouse/campaigndb
Table default.campaigndb stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 239, raw_data_size: 0]
OK
Time taken: 0.536 seconds
hive> select * from CampaignDB;
OK
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.161 seconds, Fetched: 3 row(s)

CREATE TABLE IF NOT EXISTS CampaignDB
(isano int,
MemberName string,
cityordist string,
state string,
mobile int,
email string,
memtype string)
comment 'Doc Campaign data'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' -- if it is a comma-separated file
STORED AS TEXTFILE;
The above creates the table metadata. No LOCATION clause is needed for a managed table. To load the data from HDFS (without LOCAL, since the file is already on HDFS):
LOAD DATA INPATH '/user/hadoop/input/campaignDB-sample.csv'
OVERWRITE INTO TABLE CampaignDB;
All-NULL rows like these can appear when the table is created without a field delimiter but the data in the file uses one.

Include a field terminator: after ROW FORMAT DELIMITED, add FIELDS TERMINATED BY '|' or whatever character separates your fields. Since this is a CSV file, that is probably a comma.
