loading data into HIve table from notepad - hadoop

I have loaded the data into hive table from the notepad, it is showing data is copied but when i run the select query it is showing null, please let us know what could be the reason
hive> create table test_sq(k string, v string) stored as sequencefile;
hive> load data local inpath '/tmp/input.txt' into table test_sq;
OK
hive> select * from tesst_t;
OK
NULL NULL
NULL NULL

Notepad : Assuming it is text. Whereas you have specified it as sequencefile.
Your create table script should be:
create table test_sq(k string, v string) row format delimited fields terminated by '';

I m not sure, if it is just a typo but you are trying to query on other table (tesst_t) instead of table that you loaded (test_sq)
Can you provide a sample line from your text file.
If you are using tab as delimiter then you can just use create table test_sq(k string, v string); .In other cases , as venkat has mentioned , use create table test_sq(k string, v string) row format delimited fields terminated by 'single_character_delimiter' . This will work even with tab delimiter('\t').

Related

Inserting into Hive Table error

I am looking to encode columns of a table in hive.
I tried:
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE;
Say i have a CSV file, with following row
100,'navis','010-0000-0000','Seoul Seocho'
Now i tried to use.
LOAD DATA LOCAL INPATH
'/home/path/to/csv/test.csv'
INTO TABLE encode_test;
But when doing Select * from encode_test i am getting all columns NULL
Whereas the result should have been
100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw==
Also i want to give Fields TERMINATED BY ',' IN create table encode_test query.
but i am getting error: EOF error Near Fields
I also tried creating another table sample
create table sample(id int, name STRING, phone STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
And then imported the csv file in the sample table. and it was successfully imported.
then i tried using.
insert into encode_test select * from sample;
But i am getting this new error
Permission denied: user=root, access=WRITE, inode="/user":h dfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.c heckFsPermission(DefaultAuthorizationProvider.java:279)
I'n new into hadoop
Please refer to this link from where i tried this problem
In Hive DDL, ROW FORMAT SERDE and FIELDS TERMINATED BY cannot co-exist together. Instead you can use, field.delim serde property.
create table encode_test(id int, name STRING, phone STRING, address STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'column.encode.columns'='phone,address',
'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
STORED AS TEXTFILE;
And for the PermissionDenied exception, run the hive queries as either hdfs or hive user since root user does not have WRITE access to HDFS.

Hive External table retrieve query (New to Hive )

I created below mention external table..
create external table if not exists sensor.building1 (BuildingID int,BuildingMgr string , BuildingAge string, HVACproduct string , Country string) row format delimited fields terminated by ',';
Loaded the table by using below query..
load data inpath '/user/cloudera/sensor/SensorFiles/building.csv' into table sensor.building1;
When I am trying to retrieve the buildingID column using below query, but I am getting null value..
select a.BuildingID
from sensor.building1 as a
limit 10;
Please guide me where I am doing something wrong
You are trying to load a CSV file into hive table but hive's default field delimiter is '\001'
So while you tring to load data from csv (I am assuming its ',' separated) its get failed.
You can create table like :
create external table test1(country string, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

Unable to load data in Hive partitioned table

I have created a table in Hive with the following query:
create table if not exists employee(CASE_NUMBER String,
CASE_STATUS String,
CASE_RECEIVED_DATE DATE,
DECISION_DATE DATE,
EMPLOYER_NAME STRING,
PREVAILING_WAGE_PER_YEAR BIGINT,
PAID_WAGE_PER_YEAR BIGINT,
order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';
I tried loading data into the create table using below query:
LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP);
For the partitioned table, I have even set following configuration :
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
But I am getting below error while executing the load query:
Your query has the following error(s):
Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])
Please help.
If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.
LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");
Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.
Check the documentation for details on the LOAD DATA syntax.
EDITED
Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:
INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
I might be little late to reply but can try below steps:
Set below properties first :
Ø set hive.exec.dynamic.partition.mode=nonstrict;
Ø set hive.exec.dynamic.partition=true;
Create temp table first:
CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
ID STRING,
Name STRING,
Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");
Load Data in temporary table:
hive> LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee;
Create Partitioned Table:
CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
ID STRING,
Name STRING)
PARTITIONED BY (Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");
Load Data into partitioned table from intermediate / temp table:
INSERT OVERWRITE TABLE employee_part PARTITION (SALARY) SELECT * FROM employee;

How can I do a double delimiter(||) in Hive?

I am trying to load data into hive tables which is delimited by double pipe(||). When I try this :
Sample I/P:
1405983600000||111.111.82.41||806065581||session-id
Creating table in hive:
create table test_hive(k1 string, k2 string, k3 string, k4 string,) row format delimited fields terminated by '||' stored as textfile;
Loading data from text file:
load data local inpath '/Desktop/input.txt' into table test_hive;
When I do this it is storing data in the below format:
1405983600000 tabspace-as-second-column 111.111.82.41 tabspace-as-fourth-column
Where as I am expecting the data in table to be
1405983600000 111.111.82.41 806065581 session-id
Kindly help me out I have tried different options on this but unable to resolve it
Multicharater delimiter eg. || is not supported in Hive till ver 0.13 . So fields terminated by || won't work out.There is an alter native for this.
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
SERDE serde_name WITH SERDEPROPERTIES (field.delim='||')
STORED AS TEXTFILE
LOCATION '<hdfs_location>';
The default serde can be used. Multi character delimiters can be used for fields , line , escape characters by specifying them in the serde properties.
This issue has been resolved in hive 14 with the use of multidelimiter serde. Please find documentation here.
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
You could do this if you don't want to use alternate serde or have earlier version of hive:
create external table my_table (line string) location /path/file;
Then create view on top:
create view my_view as select split(line,'\\|\\|')[0] as column_1
, split(line,'\\|\\|')[1] as column_2
, split(line,'\\|\\|')[2] as column_3
from my_table;
Query the view. Good luck.

getting null values while loading the data from flat files into hive tables

I am getting the null values while loading the data from flat files into hive tables.
my tables structure is like this:
hive> create table test_hive (id int,value string);
and my flat file is like this:
input.txt
1 a
2 b
3 c
4 d
5 e
6 F
7 G
8 j
when I am running the below commands I am getting null values:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
OK<br>
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
screen shot:
hive> create table test_hive (id int,value string);
OK
Time taken: 4.97 seconds
hive> show tables;
OK
test_hive
Time taken: 0.124 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/input2.txt' OVERWRITE INTO TABLE test_hive;
Copying data from file:/home/hduser/input2.txt
Copying file: file:/home/hduser/input2.txt
Loading data to table default.test_hive
Deleted hdfs://hydhtc227141d:54310/app/hive/warehouse/test_hive
OK
Time taken: 0.572 seconds
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.182 seconds
The default field terminator in Hive is ^A. You need to explicitly mention in your create table statement that you are using a different field separator.
Similar to what Lorand Bending pointed in the comment, use:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
You don't need to specify a location since you are creating a managed table (and not an external table).
Problem you are facing is because in your data the fields are separated by ' ' and while creating table you did not mention the field delimiter. So if you don't mention the field delimiter while creating hive table, by default hive considers ^A as delimiter.
So to resolve your problem, you can recreate the table mentioning the below syntax and it would work.
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
The solution is quite simple. The Table wan't created in the right way.
Simple solution for your problem or any further problems is knowing how to load the data.
CREATE TABLE [IF NOT EXIST] mytableName(id int,value string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '/t'
STORED AS TEXTFILE ;
Now lemme explain the code :
First Line
Creating your table. The [IF NOT EXIST] is optional that tells if the table exist don't overwrite it. Its more of safety measure.
Second line
Specifies a delimiter at the table level for structured fields.
Third Item
You can include any single character, but the default is '\001'.
'/t' is for a tab space : in your case
'|' is for data which are beside each other and separated by |
' ' for one char space. And so on...
Forth Line :
Specifies the type of file in which data is to be stored. The file can be a TEXTFILE, SEQUENCEFILE, RCFILE, or BINARY SEQUENCEFILE. Or, how the data is stored can be specified as Java input and output classes.
when loading Locally :
LOCD DATA LOCAL INPATH '/your/data/path.csv' [OVERWRITE] INTO TABLE myTableName;
Always try checking your data by a simple select* statement.
Hope it helps.
Hive’s default record and field delimiters list:
\n
^A
^B
^C
press ^V^A could insert a ^A in Vim.
The elements are separated by space or tab? Let it's tab follow these steps. If separated space use ' ' instead of '\t' Ok.
hive> CREATE TABLE test_hive(id INT, value STRING) row format
delimited fields terminated by '\t' line formated by '\n' stored as filename;
Than you have to enter
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
Now you will get exact your expected output "filename".
please check the dataset date column it should follow the date format yyyy-mm-dd
If the string is in the form 'yyyy-mm-dd', then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned.
Hive Official documentation

Resources