failed during adding a partition in hive external tables - hadoop

I'm trying to create a simple external table in the Hive CLI, and I got an error while adding a partition. I have searched on Google but could not find a proper answer. Can you please help?
hive (sampledb)> create external table externalhive(id int,name varchar(100),age tinyint,city varchar(100),state varchar(100)) partitioned by (year string)
> row format delimited fields terminated by '/t' stored as textfile location '/user/ah12x/external';
OK
Time taken: 0.169 seconds
hive (sampledb)> show tables;
OK
externalhive
hive (sampledb)> alter table externalhive add partition (year ='2014')
> location ('/2012');
FAILED: ParseException line 2:9 extraneous input '(' expecting StringLiteral near '<EOF>'

There is no need for parentheses after LOCATION; the parser expects the path as a plain string literal. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionLocation
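Applied to the statement from the question, the fix would look roughly like this (a minimal sketch that reuses the table name, partition value, and path from the question as they are):

```sql
-- Same statement as in the question, with the parentheses around the path removed.
ALTER TABLE externalhive ADD PARTITION (year = '2014')
LOCATION '/2012';
```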

Related

Partition column equal to current date in Hive

I am trying to load data into a Hive table using a partition.
The code is as follows:
CREATE EXTERNAL TABLE URL(url STRING, clicks INT)
COMMENT 'Unique Clicks per URL'
PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/URL';
LOAD DATA INPATH '/inputpath/' INTO TABLE URL
PARTITION (dt=date_format(CURRENT_TIMESTAMP, "yyyy.MM.dd HH:mm:ss"));
I am getting the following error:
FAILED: ParseException line 4:14 cannot recognize input near
'date_format' '(' 'CURRENT_TIMESTAMP' in constant
I tried using
SET hive.exec.dynamic.partition.mode=nonstrict;
but nothing changed.
Why is it not working?
How to set the current date as partition column?
Thank you in advance.
Lorenzo
Why move the files when you can create the external table on top of them?
LOAD DATA INPATH just moves the files (HDFS metadata operation) "as is", to the table's location.
Why define the partition column as a string when it is clearly a date?
CREATE EXTERNAL TABLE URL ... PARTITIONED BY(dt DATE) ...
Why are you trying to use non-ISO formats (yyyy.MM.dd)?
ISO date format is yyyy-MM-dd
Since it seems the partition information is not part of the data, you have 3 options:
1. Use a constant (no expressions are allowed, including functions), e.g.
LOAD DATA INPATH '/inputpath/' INTO TABLE URL PARTITION (dt=date '2017-03-04');
2. Create an additional table, URL_STG, similar to URL but without the partition, and use it to insert the partitions dynamically.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into URL select *,current_date from URL_STG;
3. Supply the date as a variable from the CLI
hive --hivevar dt=$(date +"%Y-%m-%d") -e \
'LOAD DATA INPATH '\''/inputpath/'\'' INTO TABLE URL PARTITION (dt=date '\''${hivevar:dt}'\'')'
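The second option can be spelled out a little further. The sketch below is one possible shape, not the answerer's exact setup: it assumes URL_STG is an external table placed directly over '/inputpath/' (that location is an assumption), and it uses an explicit PARTITION (dt) clause, which older Hive versions may require for dynamic partition inserts.

```sql
-- Staging table over the raw files: same columns as URL, no partition column.
-- Pointing it at '/inputpath/' is an assumption; any location holding the data works.
CREATE EXTERNAL TABLE URL_STG (url STRING, clicks INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/inputpath/';

-- Allow a fully dynamic partition specification.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (dt) must be the last expression in the select list.
INSERT INTO TABLE URL PARTITION (dt)
SELECT url, clicks, current_date FROM URL_STG;
```

Because the staging table reads the files in place, no LOAD DATA step is needed with this approach.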

Add Column to Hive External Table Error

I am trying to add a column to an external table in Hive but get the error below. This table currently has a thousand partitions registered, and I want to avoid re-creating the table and then running MSCK REPAIR, which would take a very long time to complete. Also, the table uses the OpenCSVSerde format. How can I add a column?
hive> ALTER TABLE schema.Table123 ADD COLUMNS (Column1000 STRING);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. java.lang.IllegalArgumentException: Error: type expected at the position 0 of '<derived from deserializer>' but '<' is found.

hive: external partitioned table without location

Is it possible to create an external partitioned table without a location? I want to add all the locations later, together with the partitions.
I tried:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
PARTITIONED BY day;
but I got ParseException: missing EOF at 'PARTITIONED' near 'TEXTFILE'
I don't think so, as described under alter location.
In any case, I think your query has some errors, and the correct script would be:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
PARTITIONED BY (day String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
;
I think the issue is that you have not specified a data type for your partition column "day". You can create a Hive external table without a location and use the ALTER TABLE options later to change the location.
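For the "add the locations later, together with the partitions" part of the question, a partition can be registered with its own location once the table exists. The following is only a sketch: the partition value and the path are hypothetical placeholders, not values from the question.

```sql
-- Register one partition and point it at its own directory.
-- '2017-03-04' and '/hypothetical/path/day=2017-03-04' are made-up examples.
ALTER TABLE a.b ADD IF NOT EXISTS
PARTITION (day = '2017-03-04')
LOCATION '/hypothetical/path/day=2017-03-04';
```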

Hive PARTITIONED BY, list index out of range error?

I am running Hive and Hue on Cloudera.
I have the following text file uploaded to hdfs. And I'm trying to create an external table in hive partitioned by id. For whatever reason, it's not working.
/user/test2/test.csv
id,name,age
1,sam,10
2,john,5
1,rick,4
Hive:
CREATE EXTERNAL TABLE IF NOT EXISTS testDB (
name STRING,
age INT
)
COMMENT 'This is the test database'
PARTITIONED BY (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/test2/'
TBLPROPERTIES ("skip.header.line.count" = "1");
In the Hue Hive editor, when I try to look at the sample data, it says "list index out of range". I am not sure what this means. The external table works correctly if I remove the PARTITIONED BY clause.
Your data in '/user/test2/test.csv' has three columns, but the schema defined for the table 'testDB' contains only two, so this error is to be expected.
You have to update your script by adding the id column:
CREATE EXTERNAL TABLE IF NOT EXISTS testDB (
id INT,
name STRING,
age INT
)
...
Your data is not partitioned because you have not created the partitions yet. You should follow 3 steps:
1. Create a table that points to the .csv file. The header line is skipped here, since it only exists in the .csv file.
CREATE EXTERNAL TABLE TableName (id int, name string, age int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/test2/'
TBLPROPERTIES ("skip.header.line.count" = "1");
2. Create the partitioned table.
CREATE EXTERNAL TABLE IF NOT EXISTS testDB (
name STRING,
age INT )
COMMENT 'This is the test database'
PARTITIONED BY (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/other/location';
3. Insert the data from the first table into the final table. The partition column (id) must come last in the select list, and the insert needs dynamic partitioning in nonstrict mode.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table testDB
partition (id) select name, age, id from
TableName;
Hope this helps.
So I finally solved my problem. Below is my solution.
I am using the Cloudera VM 5.4.2. When I started Hue, the Hive setting in Hue was pointing to HiveServer2, but I had created the tables using the Hive CLI. So basically the table only exists in HiveServer1.
Solution:
Instead of using the Hive CLI, create the table in Beeline; everything in Hue should work then.

Malformed ORC file error

After changing a Hive external table from RC to ORC format and running MSCK REPAIR TABLE on it, I get the following error when I select everything from the table:
Failed with exception java.io.IOException:java.io.IOException: Malformed ORC file hdfs://myServer:port/my_table/prtn_date=yyyymm/part-m-00000__xxxxxxxxxxxxx Invalid postscript length 1
What is the process for migrating RC-formatted historical data to the new ORC definition of the same table, if there is one?
Hive doesn't automatically reformat the data when you add partitions. You have two choices:
Leave the old partitions as RC files and make the new partitions ORC.
Move the data to a staging table and use insert overwrite to re-write the data as ORC files.
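The second choice could be sketched as follows. This is an illustration under assumptions, not a tested migration: the table names my_table_rc and my_table_orc, the columns col1 and col2, and the target path are all hypothetical; only the partition column prtn_date comes from the question.

```sql
-- Hypothetical setup: my_table_rc is the existing RCFile-backed table,
-- my_table_orc is a new table with the same columns stored as ORC.
CREATE EXTERNAL TABLE my_table_orc (col1 STRING, col2 STRING)
PARTITIONED BY (prtn_date STRING)
STORED AS ORC
LOCATION '/hypothetical/path/my_table_orc';

-- Allow all partitions to be created dynamically from the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Read the old RC files and rewrite them as ORC, one partition per prtn_date value.
INSERT OVERWRITE TABLE my_table_orc PARTITION (prtn_date)
SELECT col1, col2, prtn_date FROM my_table_rc;
```

The rewrite goes through Hive's own readers and writers, so the resulting files are real ORC files rather than RC files sitting under an ORC table definition.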
Add the row format, input format, and output format to the CREATE statement to solve the problem:
create external table xyz
(
a string,
b string)
PARTITIONED BY (
c string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'hdfs path';
