What is the default delimiter for Hive tables? - hadoop

If we don't mention any delimiter while creating a table, is there any default delimiter hive takes?
create table logs (ts bigint, line string)
partitioned by (dt String, country String);

The default delimiter '\001' if you havn't set when create a hivetable .
you can change it to others delimiter .
for example
hive> CREATE TABLE IF NOT EXISTS student1
> (sno INT,sname STRING,age INT,sex STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;

Related

Can not create a Path from an empty string when using hive 3

I run the following hive sql on hive 3, and it throw error of Can not create a Path from an empty string, but the same sql works on hive 2, anyone know why ? Thanks
CREATE TABLE IF NOT EXISTS people_csv (id int, name string)
PARTITIONED by (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

HIVE - create external tables where string itself contains commas

I am new to Hive and am creating external tables on csv file. One of the issues I am coming across are values that contain multiple commas within string itself. For example, the csv file contains the following:
CSV File
When I create an external table in Hive, because there are columns within the "name" column, it shifts the first name to the right adding another column. This throws all of the data off when you view the table in Hive.
External Table result in Hive
Is there anything I can add to my script to keep the commas but also keep first and last name in the same column when the external table is created? Thank you all in advance - I am very new to Hive.
CREATE EXTERNAL TABLE database.table name (
ID INT,
Name String,
City String,
State String
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xyz/xyz/database/directory/'
TBLPROPERTIES ("skip.header.line.count"="1");
Check this solution - you need to add this line : ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
https://community.cloudera.com/t5/Support-Questions/comma-in-between-data-of-csv-mapped-to-external-table-in/td-p/220193
Complete DDL example:
create table hcc(field1 string,
field2 string,
field3 string,
field4 string,
field5 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\"");

How to Copy TEXT format partitioned table to ORC format Table in Hive

i have a Text Format hive table, like:
CREATE EXTERNAL TABLE op_log (
time string, debug string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
now i create a orc format table with same fields, like
CREATE TABLE op_log_orc (
time string, debug string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
STORED AS ORC tblproperties ("orc.compress" = "SNAPPY");
when i copy from op_log to op_log_orc, i have get this errors:
hive> insert into op_log_orc PARTITION(dt='2016-08-09') select * from op_log where dt='2016-08-09';
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different ''2016-08-09'': Table insclause-0 has 62 columns, but query has 63 columns.
hive>
The partition key (dt) in the source table is returned in the result set as though it were a regular field, so you have the extra column. Exclude the dt field from the field list (instead of *) if you're going to specify its value in the partition key. Alternatively, just specify dt as the name of the partition, without providing a value. See CTAS (create table as select...) in the example here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)

How can I do a double delimiter(||) in Hive?

I am trying to load data into hive tables which is delimited by double pipe(||). When I try this :
Sample I/P:
1405983600000||111.111.82.41||806065581||session-id
Creating table in hive:
create table test_hive(k1 string, k2 string, k3 string, k4 string,) row format delimited fields terminated by '||' stored as textfile;
Loading data from text file:
load data local inpath '/Desktop/input.txt' into table test_hive;
When I do this it is storing data in the below format:
1405983600000 tabspace-as-second-column 111.111.82.41 tabspace-as-fourth-column
Where as I am expecting the data in table to be
1405983600000 111.111.82.41 806065581 session-id
Kindly help me out I have tried different options on this but unable to resolve it
Multicharater delimiter eg. || is not supported in Hive till ver 0.13 . So fields terminated by || won't work out.There is an alter native for this.
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
SERDE serde_name WITH SERDEPROPERTIES (field.delim='||')
STORED AS TEXTFILE
LOCATION '<hdfs_location>';
The default serde can be used. Multi character delimiters can be used for fields , line , escape characters by specifying them in the serde properties.
This issue has been resolved in hive 14 with the use of multidelimiter serde. Please find documentation here.
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
You could do this if you don't want to use alternate serde or have earlier version of hive:
create external table my_table (line string) location /path/file;
Then create view on top:
create view my_view as select split(line,'\\|\\|')[0] as column_1
, split(line,'\\|\\|')[1] as column_2
, split(line,'\\|\\|')[2] as column_3
from my_table;
Query the view. Good luck.

Create HIVE Table with multi character delimiter

I want to create a HIVE Table with multi string character as a delimiter such as
CREATE EXTERNAL TABlE tableex(id INT, name STRING)
ROW FORMAT delimited fields terminated by ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/myusername';
I want to have delimiter as a multi string like "~*".
FILELDS TERMINATED BY does not support multi-character delimiters. The easiest way to do this is to use RegexSerDe:
CREATE EXTERNAL TABlE tableex(id INT, name STRING)
ROW FORMAT 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^(\\d+)~\\*(.*)$"
)
STORED AS TEXTFILE
LOCATION '/user/myusername';
Please use MultiDelimitSerde
CREATE EXTERNAL TABlE tableex(id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="~*")
STORED AS TEXTFILE
LOCATION '/user/myusername';

Resources