Unable to create table in Hive for csv file


I am using the command below to create a table in Hive, but it gives an error:
CREATE TABLE TestData ( id1 int, id2 int, id3 int, id4 String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ stored as textfile;
Error
FAILED: ParseException line 1:106 mismatched input ',' expecting StringLiteral near 'BY' in table row format's field separator
I have also tried '\054' instead of ',', but it does not work.

You didn't mention how rows (i.e. lines) are separated.

Most of the time, things that are invisible to the naked eye can be seen in the vi editor ;). There are control characters in the command:
cat -v file.txt
CREATE TABLE TestData ( id1 int, id2 int, id3 int, id4 String) ROW FORMAT DELIMITED
FIELDS TERMINATED BY M-bM-^@M-^X,M-bM-^@M-^Y stored as textfile;
Solution: don't copy and paste the command; type it, so that plain ASCII single quotes (') are used instead of the typographic quotes (‘ ’).
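To see the same thing without vi, here is a small Python sketch (an illustration, not part of the original answer) that mimics how `cat -v` prints the UTF-8 bytes of the typographic quotes, plus a hypothetical helper `fix_quotes` that swaps them for ASCII quotes before the DDL is pasted into Hive:

```python
def cat_v(data: bytes) -> str:
    """Mimic GNU `cat -v`: show non-printing bytes in ^ and M- notation."""
    out = []
    for b in data:
        if b in (0x09, 0x0A):      # cat -v leaves TAB and newline untouched
            out.append(chr(b))
            continue
        if b >= 0x80:              # high bit set: print "M-" and clear the bit
            out.append("M-")
            b -= 0x80
        if b < 0x20:               # control character: ^@, ^A, ..., ^_
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:            # DEL
            out.append("^?")
        else:
            out.append(chr(b))
    return "".join(out)


# Hypothetical helper: map typographic quotes to the ASCII quotes Hive expects.
SMART_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def fix_quotes(sql: str) -> str:
    return sql.translate(SMART_QUOTES)


ddl = "FIELDS TERMINATED BY \u2018,\u2019 stored as textfile;"
print(cat_v(ddl.encode("utf-8")))  # FIELDS TERMINATED BY M-bM-^@M-^X,M-bM-^@M-^Y stored as textfile;
print(fix_quotes(ddl))             # FIELDS TERMINATED BY ',' stored as textfile;
```

Each typographic quote is three UTF-8 bytes (E2 80 98 for ‘ and E2 80 99 for ’), which is why `cat -v` shows three `M-` sequences on each side of the comma.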

Besides STRING, Hive has two other character data types: VARCHAR and CHAR.
Instead of STRING, use either VARCHAR or CHAR as the data type; this will resolve your issue.
*Also, don't forget to specify the length when declaring a VARCHAR column.
Example: CREATE TABLE TestData ( id1 int, id2 int, id3 int, id4 varchar(65355)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;

Related

HIVE - create external tables where string itself contains commas

I am new to Hive and am creating external tables on CSV files. One issue I keep running into is values that contain commas within the string itself. For example, the CSV file contains the following:
[Image: sample CSV file]
When I create an external table in Hive, the commas within the "name" column shift the first name to the right, adding another column. This throws all of the data off when you view the table in Hive.
[Image: resulting external table in Hive]
Is there anything I can add to my script to keep the commas but also keep first and last name in the same column when the external table is created? Thank you all in advance - I am very new to Hive.
CREATE EXTERNAL TABLE database.table_name (
ID INT,
Name String,
City String,
State String
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xyz/xyz/database/directory/'
TBLPROPERTIES ("skip.header.line.count"="1");

Check this solution: replace ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' with this line: ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
https://community.cloudera.com/t5/Support-Questions/comma-in-between-data-of-csv-mapped-to-external-table-in/td-p/220193
Complete DDL example:
create table hcc(field1 string,
field2 string,
field3 string,
field4 string,
field5 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\"");
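Why the SerDe helps: a plain delimiter splits on every comma, while a quote-aware CSV parser keeps a quoted value in one field. A minimal Python sketch of the difference, using the standard `csv` module as a stand-in for OpenCSVSerde; the sample row is hypothetical because the question's CSV was only shown as an image:

```python
import csv
import io

# Hypothetical sample row with a comma inside a quoted name.
line = '1,"Smith, John",Boston,MA'

# What ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' effectively does.
naive_fields = line.split(",")

# Quote-aware parsing, analogous to separatorChar="," and quoteChar="\"".
csv_fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

print(naive_fields)  # ['1', '"Smith', ' John"', 'Boston', 'MA']
print(csv_fields)    # ['1', 'Smith, John', 'Boston', 'MA']
```

The naive split produces five fields for a four-column table, which is exactly the shift described in the question.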

remove surrounding quotes from fields while loading data into hive

I want to load input data into a Hive table. The data has the following format:
"153662";"0002241447";"0"
"153662";"000647036X";"0"
"153662";"0020434901";"0"
"153662";"0020973403";"0"
"153662";"0028604202";"0"
"153662";"0030437512";"0"
I want to load this data into a table with two VARCHAR columns and one INT column. But the surrounding double quotes are a problem. I have created the following table:
CREATE EXTERNAL TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\;'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
But the quotes around each field also become part of the field, as shown below:
"276725" "034545104X" "0"
"276726" "0155061224" "5"
I want to ignore them. I also want the third field to be read as INT. Currently it becomes NULL when I declare the third field as INT in the table definition.

You will have to use the CSV SerDe (OpenCSVSerde) for this:
CREATE TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
"separatorChar" = ";",
"quoteChar" = "\""
)
STORED AS TEXTFILE;
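On the INT part of the question: as far as I know, OpenCSVSerde treats every column as STRING (the Hive CSV SerDe documentation lists this as a limitation), so the usual route is to keep the column as a string and cast it in a query. A minimal Python sketch of the same parse, with the `csv` module standing in for the SerDe (an analogy, not Hive code):

```python
import csv
import io

# Two lines copied from the question.
data = '"153662";"0002241447";"0"\n"153662";"000647036X";"0"\n'

# separatorChar=";" and quoteChar="\"" correspond to delimiter and quotechar here.
rows = list(csv.reader(io.StringIO(data), delimiter=";", quotechar='"'))
print(rows[0])   # ['153662', '0002241447', '0']

# Every field comes back as a string, so the numeric column is converted
# explicitly, much like CAST(C AS INT) in a query over the SerDe table.
typed = [(a, b, int(c)) for a, b, c in rows]
print(typed[1])  # ('153662', '000647036X', 0)
```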

There are multiple ways to achieve this:
1. Use the CSV SerDe.
2. Use the RegexSerDe with the regex "\"(.*)\"\;\"(.*)\"\;\"(.*)\""
3. Load the data into an external table, then remove the double quotes:
CREATE EXTERNAL TABLE source(
a string,
b String,
c String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LOCATION 'xyz';
CREATE TABLE destination AS
SELECT REGEXP_REPLACE(a, '"', '') AS a,
       REGEXP_REPLACE(b, '"', '') AS b,
       CAST(REGEXP_REPLACE(c, '"', '') AS BIGINT) AS c
FROM source;
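The regex and "strip afterwards" options can be tried locally before writing the Hive DDL. A Python sketch with hypothetical function names; the pattern is the one from the answer with the Hive CLI escapes removed, and `re.fullmatch` mirrors the fact that RegexSerDe requires the whole line to match:

```python
import re

# The RegexSerDe pattern from the answer, without the \; and \" escaping.
ROW_PATTERN = re.compile(r'"(.*)";"(.*)";"(.*)"')


def parse_with_regex(line):
    """Regex option: capture groups become the columns; None if the line does not match."""
    m = ROW_PATTERN.fullmatch(line)
    return m.groups() if m else None


def split_then_strip(line):
    """External-table option: split on ';', then strip the quotes and cast the last field."""
    a, b, c = line.split(";")
    return a.replace('"', ""), b.replace('"', ""), int(c.replace('"', ""))


line = '"153662";"000647036X";"0"'
print(parse_with_regex(line))   # ('153662', '000647036X', '0')
print(split_then_strip(line))   # ('153662', '000647036X', 0)
```

The greedy `(.*)` works here only because each line has exactly three quoted fields and backtracking settles on the right split; a stricter group such as `([^"]*)` would be the safer choice if a field could contain `";"`.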

Hive query to remove double quotes around the string.
Example:
col2 value: "my name is, abc"
select col1, (regexp_replace(col2,'"','')) as col2 from table;
Output: my name is, abc

I have a map of inputs inside square brackets and I want to read it in Hive

Input File:
[Tom,123,0,jump]
[jerry,345,1,run]
I want to read the above input in Hive.
My DDL is:
CREATE EXTERNAL TABLE IF NOT EXISTS db1.tomjerrry
( name string, id
int, isGood int, activity string )
row format delimited fields terminated by ','
LOCATION '/user/myname/sample.txt'
When I try to read it with
SELECT name FROM db1.tomjerrry;
I get:
[Tom
[jerry
How do I remove the square brackets from the Hive output?

Add ESCAPED BY '[', i.e.:
CREATE EXTERNAL TABLE IF NOT EXISTS db1.tomjerrry ( name ARRAY<string>, id int, isGood int, activity string )
row format delimited fields terminated by ',' ESCAPED BY '['
LOCATION '/user/myname/sample.txt';
Or update the CSV file to remove the square brackets.
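The second suggestion (editing the file) can be done as a small preprocessing step before the file is placed in the table location. A minimal Python sketch with hypothetical helper names; it strips both the leading [ and the trailing ] from each line and splits the result into the question's four columns:

```python
def clean_line(line):
    """Drop surrounding whitespace and any leading/trailing [ or ] characters."""
    return line.strip().strip("[]")


def parse_row(line):
    """Split a cleaned line into (name, id, isGood, activity) as in the question's DDL."""
    name, id_, is_good, activity = clean_line(line).split(",")
    return name, int(id_), int(is_good), activity


raw = ["[Tom,123,0,jump]\n", "[jerry,345,1,run]\n"]
print([clean_line(r) for r in raw])  # ['Tom,123,0,jump', 'jerry,345,1,run']
print(parse_row(raw[0]))             # ('Tom', 123, 0, 'jump')
```

Writing the `clean_line` output back to a file gives plain comma-separated lines that the original `row format delimited fields terminated by ','` table can read without brackets.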

Hive: How to delimit rows using a string literal

I need help with a Hive problem.
I have a text file with a single long line, e.g.:
JASON 29\SASHA 24\CHRISTINE 15\ROBERT 20\
Now I need to create a Hive table whose rows are delimited by "\" (backslash): if I insert the data from the line above, "JASON 29\SASHA 24....", I want 4 rows to be inserted into my table.
In other words, I want my custom character to be the row delimiter instead of the default "\n".
I wrote this DDL:
CREATE TABLE newline_tab
(
name STRING,
age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\\'
STORED AS TEXTFILE;
but I am unable to create the table; I get the following error:
FAILED: SemanticException 9:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\''
Any help would be appreciated :)

CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
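Since the error message itself says LINES TERMINATED BY only supports '\n', one workaround (not suggested in this thread) is to rewrite the file before loading it so that each backslash-separated record becomes its own line. A minimal Python sketch with a hypothetical function name; it also turns the space between name and age into a tab so the output matches FIELDS TERMINATED BY '\t' in the question's newline_tab DDL:

```python
def backslash_rows_to_lines(text, row_sep="\\", field_sep="\t"):
    """Turn 'NAME AGE\\NAME AGE\\...' into newline-terminated, tab-separated rows."""
    # Split on the custom row separator and drop the empty piece after the final '\'.
    rows = [r for r in text.strip().split(row_sep) if r]
    lines = []
    for row in rows:
        name, age = row.rsplit(" ", 1)  # the sample separates name and age with a space
        lines.append(f"{name}{field_sep}{age}")
    return "\n".join(lines) + "\n"


src = "JASON 29\\SASHA 24\\CHRISTINE 15\\ROBERT 20\\"
converted = backslash_rows_to_lines(src)
print(converted, end="")  # four lines, one per person, name and age separated by a tab
```

The converted file can then be loaded into a table that keeps the default '\n' line terminator.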

How to specify custom string for NULL values in Hive table stored as text?

When storing Hive table in text format, for example this table:
CREATE EXTERNAL TABLE clustered_item_info
(
country_id int,
item_id string,
productgroup int,
category string
)
PARTITIONED BY (cluster_id int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '${hivevar:table_location}';
Fields with NULL values are written as '\N' strings, and NaN numbers are written as 'NaN' strings.
Does Hive provide a way to specify custom string to represent these special values?
I would like to use empty strings instead of '\N' and 0 instead of 'NaN'. I know this substitution can be done with streaming, but is there any way to do it cleanly in Hive instead of writing extra code?
Other info:
I'm using Hive 0.8 if that matters...

Use this table property when creating the table:
CREATE TABLE IF NOT EXISTS abc
(
  -- column definitions go here
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
TBLPROPERTIES ("serialization.null.format"="");
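serialization.null.format sets the string that a delimited text table writes for NULL (and reads back as NULL); the default is \N. A simplified Python model of that serialization step, with a hypothetical function name; it ignores escaping and does not cover the NaN part of the question:

```python
def serialize_row(values, field_sep="|", null_format="\\N"):
    """Join one row the way a delimited text table writes it; None becomes null_format."""
    return field_sep.join(null_format if v is None else str(v) for v in values)


row = [1, None, "abc"]
print(serialize_row(row))                  # 1|\N|abc  (default NULL marker)
print(serialize_row(row, null_format=""))  # 1||abc    (serialization.null.format = "")
```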

Oh, sorry, I misread your question.
If you want an empty string to be written instead of '\N', you can use the COALESCE function:
INSERT OVERWRITE DIRECTORY 's3://bucket/result/'
SELECT NULL, COALESCE(NULL,"")
FROM data_table;
