Hive: Data not getting copied into Hive table from .csv file (stored on hdfs) - hadoop

I am learning Hive. I created a table and am trying to load data from a .csv file. No error is raised, but every value in the table is NULL instead of the actual data from the file. The input file contains hundreds of records and has been uploaded to HDFS. Please help me out, thanks in advance.
Following is the sequence of commands executed:
hive> CREATE TABLE IF NOT EXISTS CampaignDB (isano int,MemberName string,cityordist string,state string,mobile int,email string,memtype string) comment 'Doc Campaign data' row format delimited stored as textfile;
OK
Time taken: 0.323 seconds
hive> desc CampaignDB;
OK
isano int None
membername string None
cityordist string None
state string None
mobile int None
email string None
memtype string None
Time taken: 0.212 seconds, Fetched: 7 row(s)
hive> LOAD DATA INPATH '/user/hadoop/input/campaignDB-sample.csv' OVERWRITE INTO TABLE CampaignDB;
Loading data to table default.campaigndb
Deleted hdfs://localhost:9000/user/hive/warehouse/campaigndb
Table default.campaigndb stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 239, raw_data_size: 0]
OK
Time taken: 0.536 seconds
hive> select * from CampaignDB;
OK
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.161 seconds, Fetched: 3 row(s)

CREATE TABLE IF NOT EXISTS CampaignDB
(isano int,
MemberName string,
cityordist string,
state string,
mobile int,
email string,
memtype string)
comment 'Doc Campaign data'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' --if it is comma separated file
STORED AS TEXTFILE;
The above creates only the table metadata. Because IF NOT EXISTS leaves an existing table untouched, drop the old table first if it was already created without a delimiter. To load the data, which is already on HDFS, use LOAD DATA without LOCAL:
LOAD DATA INPATH '/user/hadoop/input/campaignDB-sample.csv'
OVERWRITE INTO TABLE CampaignDB;
The NULLs appear when the table does not specify a delimiter but the data in the file uses one.

Include a field terminator: after ROW FORMAT DELIMITED, add FIELDS TERMINATED BY '|', or whatever character separates your fields. Since this is a .csv file, that is probably a comma.
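To see why every column comes back NULL, here is a rough Python model of how one line of a delimited text file is mapped onto typed columns. It is only an illustrative sketch under simplifying assumptions, not Hive's actual parsing code; the helper name parse_row and the sample row are made up for this example.

```python
# Simplified model (an assumption, not Hive's real implementation) of reading
# one text line into typed columns.

def parse_row(line, column_types, delimiter="\x01"):
    """Split a line on the delimiter and cast each field; failures become None (NULL)."""
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for i, col_type in enumerate(column_types):
        if i >= len(fields):
            row.append(None)  # fewer fields than columns -> NULL
            continue
        value = fields[i]
        if col_type == "int":
            try:
                row.append(int(value))
            except ValueError:
                row.append(None)  # text that is not an integer -> NULL
        else:
            row.append(value)
    return row


types = ["int", "string", "string"]
line = "101,Asha,Pune\n"  # hypothetical sample row

# No delimiter declared: the default ^A ("\x01") never occurs in the line,
# so the whole line lands in the first (int) column and fails to parse.
print(parse_row(line, types))                 # [None, None, None]

# Comma declared: each field reaches its own column.
print(parse_row(line, types, delimiter=","))  # [101, 'Asha', 'Pune']
```

In this model the first column is NULL because the whole line is not a valid integer, and the remaining columns are NULL because no further fields were found.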

Related

Error Loading CSV data into a Hive table

I have a CSV file which has rows in the following format,
1, 11812, 15273, "2016-05-22T111647.800 US/Eastern", 82971850, 0
1, 11812, 7445, "2016-05-22T113640.200 US/Eastern", 82971928, 0
1, 11654, 322, "2016-05-22T113845.773 US/Eastern", 82971934, 0
1, 11722, 0, "2016-05-22T113929.541 US/Eastern", 82971940, 0
Then I create a Hive table with the following command,
create table event_history(status tinyint, condition smallint,
machine_id int, time timestamp, ident int, state tinyint);
Then I am trying to load the CSV file into the table with the following command,
load data local inpath "/home/ubuntu/events.csv" into table event_history;
But all I get is NULLs when I try to do a select query in the created table. What am I missing here?
The Hive version is Hive 1.2.1
My error was in the table creation. Fixed it with the following changes
create table event_history(status tinyint, condition smallint, machine_id int,
time timestamp, drqs int, state tinyint) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

How to store date value in hive timestamp?

I am trying to store date and timestamp values in a timestamp column using Hive. The source file contains values that are sometimes dates and sometimes timestamps.
Is there a way to read both dates and timestamps using the timestamp data type in Hive?
Input:
2015-01-01
2015-10-10 12:00:00.232
2016-02-01
Output which I am getting:
null
2015-10-10 12:00:00.232
null
Is it possible to read both values using the timestamp data type?
DDL:
create external table mytime(id string ,t timestamp) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://xxx/data/dev/ind/'
I was able to think of a workaround and tried it with a small set of data:
Load the data with the inconsistent date values into a Hive table, say table1, declaring the column as a string.
Then create another table, table2, with the timestamp datatype for that column, and load the data from table1 into table2 using the transformation INSERT OVERWRITE TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
This should load the data in the required format.
Code as below:
create table table1
(
id int,
tsstr string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table1.tb';
Data:
1,2015-04-15 00:00:00
2,2015-04-16 00:00:00
3,2015-04-17
LOAD DATA LOCAL INPATH '/home/cloudera/data/tsstr' INTO TABLE table1;
create table table2
(
id int,
mytimestamp timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table2.tb';
INSERT INTO TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
Result shows up as expected:
Hive is similar to any other database in terms of datatype mapping: it requires the values of a column to be uniform so that they can be stored under one datatype. The second column in your file is not uniform, i.e. some values are in date format while others are in timestamp format.
In order not to lose the dates, as suggested by @Kishore, make sure the file uses one format throughout: where a value is only a date, supply it as a timestamp such as 2016-01-01 00:00:00.
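The same normalization can also be applied to the file before it is loaded. The following Python sketch mirrors the if(length(tsstr) > 10, ...) expression from the workaround; the function name to_timestamp_string is hypothetical, and the sketch assumes every value is either a yyyy-mm-dd date or a full timestamp.

```python
def to_timestamp_string(value):
    """Pad a date-only value with midnight so it reads as a full timestamp."""
    value = value.strip()
    # A bare date such as 2015-01-01 is exactly 10 characters long.
    return value if len(value) > 10 else value + " 00:00:00"


# Values taken from the question's sample input.
for raw in ["2015-01-01", "2015-10-10 12:00:00.232", "2016-02-01"]:
    print(to_timestamp_string(raw))
```

Values that already carry a time part pass through unchanged, so running it twice is harmless.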

Hive Text format with multi line column to ORC

When a hive table in text format with multiline column gets converted to ORC format, it fails to read the columns right.
Hive table with custom record delimiter
CREATE EXTERNAL TABLE IF NOT EXISTS MULTILINE_XML_TXT
(id INT, name STRING, xml STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/practice/xml/multiline/mysql/text/in/'
TBLPROPERTIES ('textinputformat.record.delimiter'='#');
The xml column in the above table has data in multiple lines.
When I query this table, the data looks right.
Sample data (2 rows) in the above table
100 xyz <employees><employee><age>26</age>
</employee><employee><age>45</age>
</employee></employees>
200 abc <employees><employee><age>20</age>
</employee><employee>
<age>50</age></employee></employees>
I created another table with the ORC format and copied data from the text table to the ORC table, but the conversion is not correct.
CREATE TABLE IF NOT EXISTS MULTILINE_XML_ORC
(id INT, name STRING, xml STRING) STORED AS ORC;
INSERT OVERWRITE TABLE MULTILINE_XML_ORC
SELECT id, name, xml FROM MULTILINE_XML_TXT;
Executing the query select * from MULTILINE_XML_ORC gives the following result, which is incorrect.
100 xyz <employees><employee><age>26</age>
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL abc <employees><employee><age>20</age>
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
Any thoughts?

Loading data into Hive table from notepad

I loaded data into a Hive table from a file created in Notepad. The load reports that the data was copied, but when I run a select query it shows NULLs. Please let me know what the reason could be.
hive> create table test_sq(k string, v string) stored as sequencefile;
hive> load data local inpath '/tmp/input.txt' into table test_sq;
OK
hive> select * from tesst_t;
OK
NULL NULL
NULL NULL
Notepad: assuming the file is plain text, the problem is that you declared the table as a sequencefile.
Your create table script should be (with the file's single-character field delimiter between the quotes):
create table test_sq(k string, v string) row format delimited fields terminated by '';
I'm not sure if it is just a typo, but you are querying a different table (tesst_t) instead of the table that you loaded (test_sq).
Can you provide a sample line from your text file?
Note that a plain create table test_sq(k string, v string); uses Hive's default field delimiter (^A, i.e. '\001'), not tab. For any other delimiter, as venkat has mentioned, use create table test_sq(k string, v string) row format delimited fields terminated by 'single_character_delimiter'. This also works for a tab delimiter ('\t').

getting null values while loading the data from flat files into hive tables

I am getting NULL values while loading data from flat files into Hive tables.
My table structure is like this:
hive> create table test_hive (id int,value string);
and my flat file is like this:
input.txt
1 a
2 b
3 c
4 d
5 e
6 F
7 G
8 j
When I run the commands below, I get NULL values:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
screen shot:
hive> create table test_hive (id int,value string);
OK
Time taken: 4.97 seconds
hive> show tables;
OK
test_hive
Time taken: 0.124 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/input2.txt' OVERWRITE INTO TABLE test_hive;
Copying data from file:/home/hduser/input2.txt
Copying file: file:/home/hduser/input2.txt
Loading data to table default.test_hive
Deleted hdfs://hydhtc227141d:54310/app/hive/warehouse/test_hive
OK
Time taken: 0.572 seconds
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.182 seconds
The default field terminator in Hive is ^A. You need to explicitly mention in your create table statement that you are using a different field separator.
Similar to what Lorand Bending pointed out in the comment, use:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
You don't need to specify a location since you are creating a managed table (and not an external table).
The problem is that the fields in your data are separated by ' ', but you did not specify a field delimiter when creating the table. When no field delimiter is given, Hive uses ^A by default.
To resolve this, recreate the table with the syntax below:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
The solution is quite simple: the table wasn't created in the right way.
The key to this problem, and to similar ones, is knowing how to load the data.
CREATE TABLE [IF NOT EXISTS] mytableName(id int,value string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE ;
Now let me explain the code:
First line:
Creates your table. The optional [IF NOT EXISTS] tells Hive not to overwrite the table if it already exists. It's more of a safety measure.
Second line:
Specifies a delimiter at the table level for structured fields.
Third line:
You can use any single character; the default is '\001'.
'\t' is for a tab, as in your case
'|' is for fields separated by |
' ' is for a single space, and so on.
Fourth line:
Specifies the type of file in which data is to be stored. The file can be a TEXTFILE, SEQUENCEFILE, RCFILE, or BINARY SEQUENCEFILE. Or, how the data is stored can be specified as Java input and output classes.
When loading locally:
LOAD DATA LOCAL INPATH '/your/data/path.csv' [OVERWRITE] INTO TABLE myTableName;
Always check your data afterwards with a simple select * statement.
Hope it helps.
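Hive’s default record and field delimiters:
\n (record delimiter, one row per line)
^A (field delimiter)
^B (collection item delimiter)
^C (map key delimiter)
In Vim, pressing Ctrl-V and then Ctrl-A inserts a ^A character.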
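If the file is generated by a script rather than edited by hand, the ^A delimiter can be written as the byte 0x01. A minimal Python sketch, where the helper name to_hive_default_line is made up for illustration:

```python
FIELD_DELIMITER = "\x01"  # Ctrl-A, the default field delimiter described above


def to_hive_default_line(fields):
    """Join the fields with ^A and end the record with a newline."""
    return FIELD_DELIMITER.join(str(f) for f in fields) + "\n"


line = to_hive_default_line([1, "a"])
print(repr(line))  # '1\x01a\n'
```

A file made of such lines should match a table created without any ROW FORMAT clause, so no delimiter needs to be declared for it.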
Are the elements separated by a space or a tab? Assuming it is a tab, follow these steps. If they are separated by a space, use ' ' instead of '\t'.
hive> CREATE TABLE test_hive(id INT, value STRING) row format
delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
Then you have to enter
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
Now you will get exactly your expected output.
Please check the date column of the dataset; it should follow the date format yyyy-mm-dd.
If the string is in the form 'yyyy-mm-dd', then a date value corresponding to that year/month/day is returned. If the string value does not match this format, then NULL is returned.
Hive Official documentation
