hive: external partitioned table without location - hadoop

Is it possible to create external partitioned table without location? I want to add all the locations later, together with partitions.
I tried:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
PARTITIONED BY day;
but I got: ParseException: missing EOF at 'PARTITIONED' near 'TEXTFILE'

I don't think so, as discussed under ALTER TABLE ... SET LOCATION.
But anyway, I think your query has some errors: PARTITIONED BY must come before ROW FORMAT and STORED AS, and the partition column needs a parenthesized name and type. The correct script would be:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
;

I think the issue is that you have not specified a data type for your partition column "day". And yes, you can create a Hive external table without a location and use ALTER TABLE later to set the locations.
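For example, a minimal sketch of adding the partitions with their locations afterwards, using the a.b table from above (the HDFS paths are hypothetical):
-- attach a location to each partition as it is added
ALTER TABLE a.b ADD PARTITION (day='2017-01-01')
LOCATION '/data/logs/2017-01-01';
-- the table-level location can also be changed later
ALTER TABLE a.b SET LOCATION '/data/logs';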

Related

How can I create a partitioned table that is semicolon-separated and uses commas as decimal points?

I'm having problems with this type of table:
manager; sales
charles; 100,1
ferdand; 212,6
aldalbert; 23,4
chuck; 41,6
I'm using the code below to create and define the partitioned table:
CREATE TABLE db.table
(
manager string,
sales string
)
partitioned by (file_type string)
row format delimited fields terminated by ';'
lines terminated by '\n'
tblproperties ("skip.header.line.count"="1");
Afterwards, I'm using a regex to replace the commas with dots and then convert the sales field to a numeric datatype.
I wonder if there is a better solution than that.
Other than using Spark or Pig to clean the data before loading the Hive table, no: you'll need to replace the commas and cast the sales column within HiveQL to get the format you want.
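A minimal sketch of that replace-and-cast, using the table and column names from the question (the target type DECIMAL(10,1) is an assumption based on the sample data):
SELECT
  manager,
  -- swap the decimal comma for a dot, then cast to a numeric type
  CAST(regexp_replace(sales, ',', '.') AS DECIMAL(10,1)) AS sales
FROM db.table;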

Partition column equal to current date in Hive

I am trying to load data into a Hive table using partition.
The code is as follow:
CREATE EXTERNAL TABLE URL(url STRING, clicks INT)
COMMENT 'Unique Clicks per URL'
PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/URL';
LOAD DATA INPATH '/inputpath/' INTO TABLE URL
PARTITION (dt=date_format(CURRENT_TIMESTAMP, "yyyy.MM.dd HH:mm:ss"));
I am getting the following error:
FAILED: ParseException line 4:14 cannot recognize input near
'date_format' '(' 'CURRENT_TIMESTAMP' in constant
I tried using
SET hive.exec.dynamic.partition.mode=nonstrict;
but nothing changed.
Why is it not working?
How to set the current date as partition column?
Thank you in advance.
Lorenzo
Why move the files when you can create the external table on top of them?
LOAD DATA INPATH just moves the files (HDFS metadata operation) "as is", to the table's location.
Why define the partition column as a string when it is clearly a date?
CREATE EXTERNAL TABLE URL ... PARTITIONED BY(dt DATE) ...
Why are you trying to use non-ISO formats (yyyy.MM.dd)?
ISO date format is yyyy-MM-dd
Since it seems the partition information is not part of the data, you have 3 options:
1.
Use a constant (no expressions are allowed, including functions), e.g.
LOAD DATA INPATH '/inputpath/' INTO TABLE URL PARTITION (dt=date '2017-03-04');
2.
Create an additional table, URL_STG, similar to URL but without the partition, and use it to insert the partitions dynamically (see the sketch after this list).
set hive.exec.dynamic.partition.mode=nonstrict;
insert into URL select *, current_date from URL_STG;
3.
Supply the date as a variable from the CLI
hive --hivevar dt=$(date +"%Y-%m-%d") -e \
'LOAD DATA INPATH '\''/inputpath/'\'' INTO TABLE URL PARTITION (dt=date '\''${hivevar:dt}'\'')'
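For completeness, a sketch of the staging flow from option 2. The URL_STG DDL is an assumption that simply mirrors URL minus the partition column, and the explicit PARTITION (dt) clause is the conventional dynamic-partition form:
CREATE EXTERNAL TABLE URL_STG (url STRING, clicks INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/inputpath/';

SET hive.exec.dynamic.partition.mode=nonstrict;
-- the partition column (dt) must come last in the SELECT
INSERT INTO TABLE URL PARTITION (dt)
SELECT url, clicks, current_date FROM URL_STG;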

Impala minimum DDL

I know that we can create an Impala table like
CREATE EXTERNAL TABLE SCHEMA.TableName LIKE PARQUET
'/rootDir/SecondLevelDir/RawFileThatKnowsDataTypes.parquet'
But I am not sure if Impala can create a table from a file (preferably a text file) that has no known formatting. So, in other words, if I just dump a random file into Hadoop with a put command, can I wrap an Impala DDL around it and have a table created? Can anyone tell me?
If your file is newline-separated, I believe it should work if you provide the column delimiter with the ROW FORMAT clause, since text file is the default format. Just get rid of the LIKE clause and choose names and data types for your columns, something like this:
CREATE EXTERNAL TABLE SCHEMA.TableName (col1 STRING, col2 INT, col3 FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/rootDir/SecondLevelDir/';
Note that LOCATION takes the directory containing the file, not the file itself.
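To sanity-check the wrapped file afterwards, a quick query (column names as assumed above):
SELECT col1, col2, col3 FROM SCHEMA.TableName LIMIT 5;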

Hive PARTITIONED BY, list index out of range error?

I am running Hive and Hue on Cloudera.
I have the following text file uploaded to HDFS, and I'm trying to create an external table in Hive partitioned by id. For whatever reason, it's not working.
/user/test2/test.csv
id,name,age
1,sam,10
2,john,5
1,rick,4
Hive:
CREATE EXTERNAL TABLE IF NOT EXISTS testDB (
name STRING,
age INT
)
COMMENT 'This is the test database'
PARTITIONED BY (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/test2/'
TBLPROPERTIES ("skip.header.line.count" = "1");
In Hue's Hive editor, when I try to look at sample data, it says list index out of range. Not sure what this is. The external table works correctly if I remove the PARTITIONED BY clause.
Your data at '/user/test2/test.csv' has three columns, but the schema defined for the table 'testDB' contains only two, so it is normal that you get this error.
You have to update your script by adding the id column:
CREATE EXTERNAL TABLE IF NOT EXISTS testDB (
id INT,
name STRING,
age INT
)
...
You do not have the data partitioned because you have never created the partitions. You should follow three steps:
1. Mount the raw data by creating a plain table over the .csv file.
CREATE EXTERNAL TABLE TableName (id int, name string, age int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/test2/'
TBLPROPERTIES ("skip.header.line.count"="1"); -- the file has a header row
2. Create the partitioned table.
CREATE EXTERNAL TABLE IF NOT EXISTS testDB (
name STRING,
age INT )
COMMENT 'This is the test database'
PARTITIONED BY (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/other/location'
TBLPROPERTIES ("skip.header.line.count" = "1");
3. Insert the data from the first table into the final partitioned table.
-- dynamic partition insert: the partition column (id) goes last in the SELECT
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE testDB PARTITION (id)
SELECT name, age, id FROM TableName;
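To verify that the partitions were actually created:
SHOW PARTITIONS testDB;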
Hope this helps.
So I finally solved my problem. Below is my solution.
I am using the Cloudera VM 5.4.2. When I started Hue, its Hive setting pointed to HiveServer2, but I had created the tables using the Hive CLI, so the tables effectively existed only for HiveServer1.
Solution:
Instead of using the Hive CLI, create the tables in Beeline (which connects to HiveServer2); everything in Hue should then work.

Hive - How to load data from a file with filename as a column?

I am running the following commands to create my table ABC and insert data from all files in my designated file path. Now I want to add a column with the filename, but I can't find any way to do that without looping through the files or something. Any suggestions on what the best way to do this would be?
CREATE TABLE ABC
(NAME string
,DATE string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
hive -e "LOAD DATA LOCAL INPATH '${DATA_FILE_PATH}' INTO TABLE ABC;"
Hive does have virtual columns, which include INPUT__FILE__NAME, and the virtual column can be used directly in a statement.
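For instance, a minimal sketch against the ABC table from the question (date is backticked because it is a keyword):
SELECT INPUT__FILE__NAME, name, `date` FROM ABC;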
To fill another table with the filename as a column, assuming your data is located at hdfs://hdfs.location:port/data/folder/filename1:
DROP TABLE IF EXISTS ABC2;
CREATE TABLE ABC2 (
filename STRING COMMENT 'this is the file the row was in',
name STRING,
`date` STRING);
INSERT INTO TABLE ABC2 SELECT split(INPUT__FILE__NAME,'folder/')[1], name, `date` FROM ABC;
Listing the columns explicitly avoids mixing * with another expression, which older Hive versions reject, and backticking date avoids the keyword clash.
You can alter the split to change how much of the full path you actually want to store.
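For example, a sketch that keeps only the bare filename by matching everything after the last '/' (group 0 of regexp_extract returns the whole match):
SELECT regexp_extract(INPUT__FILE__NAME, '[^/]+$', 0) AS filename, name, `date` FROM ABC;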
