How to create a HIVE table to read semicolon separated values - hadoop

I want to create a HIVE table that will read in semicolon separated values, but my code keeps giving me errors. Does anyone have any suggestions?
CREATE TABLE test_details(Time STRING, Vital STRING, sID STRING)
PARTITIONED BY(Country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
STORED AS TEXTFILE;

For me nothing worked except this:
FIELDS TERMINATED BY '\u0059'
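Edit: After updating Hive, I had to use this instead:
FIELDS TERMINATED BY '\u003B'
(Current Hive reads the four digits after \u as hexadecimal: 0x3B is decimal 59, the semicolon, whereas U+0059 is the letter 'Y'.)
So in full:
CREATE TABLE test_details(Time STRING, Vital STRING, sID STRING)
PARTITIONED BY(Country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u003B'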
STORED AS TEXTFILE;
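To see why the two escapes behave differently, here is a small Python sketch (not Hive code) that decodes a \uXXXX delimiter literal, assuming, as in current Hive releases, that the four digits are hexadecimal. The function name decode_unicode_delimiter is hypothetical and only used for illustration.

```python
def decode_unicode_delimiter(literal: str) -> str:
    """Decode a '\\uXXXX' delimiter literal, treating XXXX as hexadecimal (assumed Hive behavior)."""
    if not (literal.startswith("\\u") and len(literal) == 6):
        raise ValueError(f"expected a \\uXXXX literal, got {literal!r}")
    return chr(int(literal[2:], 16))

print(decode_unicode_delimiter("\\u003B"))  # ;
print(decode_unicode_delimiter("\\u0059"))  # Y
print(0x3B == 59)                           # True: hex 3B is decimal 59, the ASCII code of ';'
```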

The delimiter you are using is the cause of the errors. The semicolon is the statement terminator in Hive (it marks the end of a query), so an unescaped ';' inside the DDL ends the statement early.
Use the modified DDL below:
CREATE TABLE test_details(Time STRING, Vital STRING, sID STRING)
PARTITIONED BY(Country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE;
This will work for you.
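The failure mode described above can be illustrated with a simplified Python model of a command-line client that splits its input on ';' and treats a piece ending in a backslash as an escaped, literal semicolon. This is a sketch of the idea under that assumption, not Hive's actual CLI source; split_statements is a hypothetical helper.

```python
def split_statements(script: str) -> list[str]:
    """Split a script on ';', treating a backslash-escaped '\\;' as a literal semicolon."""
    statements, pending = [], ""
    for piece in script.split(";"):
        if piece.endswith("\\"):
            # Escaped semicolon: drop the backslash, keep ';' and keep accumulating.
            pending += piece[:-1] + ";"
            continue
        pending += piece
        if pending.strip():
            statements.append(pending.strip())
        pending = ""
    return statements

plain = "CREATE TABLE t(a STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE;"
escaped = "CREATE TABLE t(a STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\;' STORED AS TEXTFILE;"

print(split_statements(plain))    # two broken fragments, cut at the quoted ';'
print(split_statements(escaped))  # one complete statement with ';' as the delimiter
```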

Is your text properly sanitized? Hive does not natively handle quotes in text well.
Try using a SerDe with a custom separator (a semicolon in this case), for example OpenCSVSerde with "separatorChar" = ";".
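The difference a quote-aware parser makes can be sketched in Python, with the standard csv module standing in for a CSV-style SerDe configured with ';' as its separator. The sample row is made up; this illustrates the parsing idea, not Hive's implementation.

```python
import csv
import io

# Hypothetical row for the Time;Vital;sID columns, with a quoted field containing ';'.
line = '12:00;"120/80; resting";S1\n'

# Plain splitting, roughly what a single-character delimiter does: 4 pieces, quotes kept.
naive = line.rstrip("\n").split(";")

# Quote-aware parsing with ';' as the separator: 3 fields, quotes removed.
quoted = next(csv.reader(io.StringIO(line), delimiter=";", quotechar='"'))

print(naive)   # ['12:00', '"120/80', ' resting"', 'S1']
print(quoted)  # ['12:00', '120/80; resting', 'S1']
```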

Related

How can I create a partitioned table that is semicolon-separated and uses commas as decimal points?

I'm having problems with this type of table:
manager; sales
charles; 100,1
ferdand; 212,6
aldalbert; 23,4
chuck; 41,6
I'm using the code below to create and define the partitioned table:
CREATE TABLE db.table
(
manager string,
sales string
)
partitioned by (file_type string)
row format delimited fields terminated by ';'
lines terminated by '\n'
tblproperties ("skip.header.line.count"="1");
Afterwards, I use a regex to replace the commas with dots and then convert the sales field to a numeric data type.
I wonder if there is a better solution than that.
Unless you use Spark or Pig to clean the data and load the Hive table, no: you'll need to replace and cast the sales column within HiveQL to get the format you want.
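The replace-and-cast approach can be checked on the sample rows with a short Python sketch: skip the header (as skip.header.line.count=1 does), split on ';', trim the space that follows each ';' in the sample, swap the decimal comma for a dot, then convert. In HiveQL the corresponding expression would be along the lines of CAST(regexp_replace(trim(sales), ',', '.') AS DECIMAL(10,2)); the precision there is an assumption. parse_sales_line is a hypothetical helper.

```python
from decimal import Decimal

def parse_sales_line(line: str) -> tuple[str, Decimal]:
    """Split one 'manager; sales' row and turn '100,1' into Decimal('100.1')."""
    manager, sales = line.split(";")
    # The sample rows have a space after ';', so trim before converting.
    return manager.strip(), Decimal(sales.strip().replace(",", "."))

raw = """manager; sales
charles; 100,1
ferdand; 212,6
aldalbert; 23,4
chuck; 41,6"""

data_lines = raw.splitlines()[1:]  # skip the header row
rows = [parse_sales_line(line) for line in data_lines]
print(rows)
print(sum(s for _, s in rows))  # 377.7
```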

How to handle new line characters in hive?

I am exporting a table from Teradata to Hive. The Teradata table has an address field that contains newline characters (\n). I first export the table from Teradata to a mounted filesystem path and then load it into Hive. The record counts don't match between the Teradata table and the Hive table, because Hive treats the embedded newlines as record breaks.
NOTE: I don't want to handle this through Sqoop; I want to handle the newline characters while loading into Hive from the local path.
I got this to work by creating an external table with the following options:
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
ESCAPED BY '\\'
STORED AS TEXTFILE;
Then I added a partition pointing to the directory that contains the data files (my table uses partitions), i.e.:
ALTER TABLE STG_HOLD_CR_LINE_FEED ADD PARTITION (part_key='part_week53') LOCATION '/ifs/test/schema.table/staging/';
NOTE: Be sure that when creating your data file you use '\' as the escape character.
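The count mismatch, and why escaping the export helps, can be shown with a small Python sketch using two made-up records. It joins fields with Hive's default '\001' delimiter and turns embedded newlines into a backslash followed by 'n', so every record occupies exactly one physical line. It only demonstrates line counts; how Hive decodes the escape on read depends on the table's SerDe settings, and to_hive_line is a hypothetical helper.

```python
FIELD_DELIM = "\x01"  # Hive's default field delimiter, written '\001' in DDL

def to_hive_line(fields):
    """Join fields with \\x01, escaping backslashes and embedded newlines."""
    escaped = [f.replace("\\", "\\\\").replace("\n", "\\n") for f in fields]
    return FIELD_DELIM.join(escaped)

# Two made-up records; the first address contains an embedded newline.
records = [
    ["1", "12 Main St\nApt 4"],
    ["2", "99 Oak Ave"],
]

naive_file = "\n".join(FIELD_DELIM.join(r) for r in records)
escaped_file = "\n".join(to_hive_line(r) for r in records)

print(len(naive_file.split("\n")))    # 3 physical lines for 2 records
print(len(escaped_file.split("\n")))  # 2 physical lines for 2 records
```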
The LOAD DATA command in Hive only copies the files into the table's HDFS location as-is; it does not parse or transform them.
Hive splits a record at a newline only because the table is stored as TEXTFILE, which by default uses newlines as record separators (not field separators).
To redefine the table, you need something like:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' ESCAPED BY 'x'
LINES TERMINATED BY 'y'
where x is the escape character placed before newlines embedded in fields, and y is the record delimiter. Note that Hive currently accepts only '\n' in LINES TERMINATED BY, so in practice y cannot be a custom character.

How to create Hive table for specially formatted data

I have text files that I want to load into a Hive table.
The data looks like this:
Id|^|SegmId|^|geographyId|^|Sequence|^|Subtracted|^|FFAction|!|
4295875876|^|3|^|110170|^|1|^|False|^|I|!|
4295876137|^|2|^|110170|^|1|^|False|^|I|!|
4295876137|^|8|^|100219|^|1|^|False|^|I|!|
I want to create a table in Hive for this kind of data.
Can you please suggest how to create table for this?
This is what I have tried, but I get NULLs (please also suggest the data types for the columns):
create table if not exists GeographicSegment
(
Id int,
SegId int,
geographyId int,
Sequence int,
Subtracted String,
FFAction String
) row format delimited fields terminated by '|!|' LINES TERMINATED BY '\n' ;
This has worked for me:
row format SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="|^|") tblproperties
It seems that your fields are terminated by '|^|' and your lines are terminated by '|!|\n'.
The default ROW FORMAT DELIMITED does not support a multi-character field delimiter; you can find a way to handle it here:
Solution
Regarding the data types, what you are doing is correct except for the first column, Id: its values exceed the range of INT (maximum 2,147,483,647), so it should be BIGINT.
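The NULLs can be reproduced with a Python sketch on the first sample row. It assumes the default delimited format keeps only the first character of a multi-character delimiter ('|' here), which leaves stray '^' values in the integer columns; splitting on the full '|^|' after removing the '|!|' terminator gives the intended six columns, and the Id value is larger than INT's maximum.

```python
INT_MAX = 2**31 - 1  # upper bound of Hive's 32-bit INT

line = "4295875876|^|3|^|110170|^|1|^|False|^|I|!|"

# Assumed behavior of a single-character delimiter: only '|' is used.
single = line.split("|")
# Split on the full '|^|' delimiter after removing the '|!|' record terminator.
multi = line.removesuffix("|!|").split("|^|")

print(single[:4])               # ['4295875876', '^', '3', '^']
print(multi)                    # ['4295875876', '3', '110170', '1', 'False', 'I']
print(int(multi[0]) > INT_MAX)  # True: Id needs BIGINT
```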

hive: external partitioned table without location

Is it possible to create external partitioned table without location? I want to add all the locations later, together with partitions.
I tried:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
PARTITIONED BY day;
but I got ParseException: missing EOF at 'PARTITIONED' near 'TEXTFILE'
I don't think so, as noted in the documentation on altering a table's location.
But in any case, I think your query has some errors; the correct script would be:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
PARTITIONED BY (day String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
;
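I think the issue is that you have not specified a data type for your partition column "day"; PARTITIONED BY also has to come before ROW FORMAT and STORED AS, which is what the "missing EOF at 'PARTITIONED'" error points to. You can create a Hive external table without a location and use ALTER TABLE later to set or change it.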
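If you add the locations later with ALTER TABLE, the partition statements can be generated in bulk. Here is a Python sketch that builds one ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION statement per day, following the ADD PARTITION ... LOCATION syntax used earlier on this page; the day values and HDFS paths are hypothetical, as is the helper name.

```python
def add_partition_statements(table, locations):
    """Build one ALTER TABLE ... ADD PARTITION statement per (day, path) pair."""
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (day='{day}') LOCATION '{path}';"
        for day, path in sorted(locations.items())
    ]

# Hypothetical partition values and HDFS paths, for illustration only.
locations = {
    "2020-01-02": "/data/a.b/day=2020-01-02",
    "2020-01-01": "/data/a.b/day=2020-01-01",
}

for stmt in add_partition_statements("a.b", locations):
    print(stmt)
```

Sorting keeps the output order deterministic. Values are interpolated without any quoting checks, so this is only suitable for trusted partition values.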

printing null values in hive while declaring the column as decimal

I declared the column as DECIMAL. The data looks like this:
4500.00
5321.00
532.00
Column: area DECIMAL(9,2)
but in Hive it shows like this:
NULL
NULL
If I declare the column as a string it works fine. But I need it in decimal only.
I believe this could be a delimiter mismatch in the external table's definition.
If you are using an external table, you have probably configured a delimiter that differs from the one actually used in the file.
Find the actual delimiter and alter the table using the command below:
ALTER TABLE <table_name> SET SERDEPROPERTIES ('field.delim' = '<actual_delimiter>');
Or set it in the CREATE TABLE statement, for example:
CREATE TABLE figure(length DOUBLE, width DOUBLE, area DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
Replace '\t' with the actual delimiter from your data file.
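A Python sketch of the mismatch, assuming (as the question shows) that a value which cannot be parsed as a decimal comes back as NULL. The row and its comma delimiter are made up, and read_row is a hypothetical helper, not a Hive API. With the table declared as tab-delimited, the whole comma-separated line lands in the first column and nothing parses.

```python
from decimal import Decimal, InvalidOperation

def to_decimal_or_null(text):
    """Return a Decimal, or None (standing in for NULL) when the text is not a valid number."""
    if text is None:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None

def read_row(line, delim, ncols):
    """Split one text line and convert each of the ncols columns to a decimal."""
    fields = line.split(delim)
    fields += [None] * (ncols - len(fields))  # missing columns become NULL
    return [to_decimal_or_null(f) for f in fields[:ncols]]

line = "45.00,100.00,4500.00"  # hypothetical row: length, width, area
print(read_row(line, "\t", 3))  # [None, None, None]: wrong delimiter
print(read_row(line, ",", 3))   # [Decimal('45.00'), Decimal('100.00'), Decimal('4500.00')]
```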
