Alter table/add columns in non-native table in Hive

I created a Hive table with a storage handler and now I want to add a column to that table, but it gives me the error below:
[Code: 10134, SQL State: 42000] Error while compiling statement: FAILED:
SemanticException [Error 10134]: ALTER TABLE can only be used for [ADDPROPS,
DROPPROPS] to a non-native table
According to the Hive documentation, any Hive table created with a storage handler is a non-native table.
Here's the link: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
There is an open JIRA enhancement request with Apache for this:
https://issues.apache.org/jira/browse/HIVE-1240
For example, I am using the Druid storage handler in my case.
I created a Hive table using:
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `metric1` int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
and then I tried to add a column:
ALTER TABLE druid_table_1 ADD COLUMNS (`dimension2` STRING);
With the above approach I get the error shown earlier.
Is there any other way to add a column to a non-native table in Hive without recreating it?

A patch is available in HDP 2.5+ from Hortonworks. Support for ADD COLUMNS has been added to the ALTER statement.
A column can be added to a Druid table using the ALTER TABLE DDL in Hive:
ALTER TABLE table_name ADD COLUMNS (col_name data_type);
There is no need to specify a partition spec, as these are Druid-backed Hive tables and partitioning/storage is maintained by Druid.

Related

Dynamic Partition on external hive table

Is there a way to dynamically partition an external Hive table? I referred to posts that suggested the following steps:
Create an external non-partitioned table
Create an external partitioned table
Use insert into table command to insert the data from non-partitioned to partitioned table
The problem with the above solution is that I always need to rerun the insert command whenever my base (non-partitioned) table is modified (addition/deletion of Parquet files).
I am looking for a solution where I do not need to run any insert command and my partitioned table is updated whenever my non-partitioned table changes.

Spark Sql 1.5 dataframe saveAsTable how to add hive table properties

I am running Spark SQL on Hive. I need to add the auto.purge table property while creating a new Hive table. I tried the code below to add options while calling the saveAsTable method:
inputDF.write.option("auto.purge", "true").saveAsTable(hiveTableName)
The above line of code added the property under WITH SERDEPROPERTIES of the table.
I need to add this property under the TBLPROPERTIES section of the Hive DDL.
I finally found a solution, though I am not sure it is the best one.
Unfortunately, the Spark 1.5 SQL saveAsTable method doesn't support table properties as input. It creates a new tableProperties map before the Hive table is created.
Check out the code below:
https://github.com/apache/spark/blob/v1.5.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
To add table properties to an existing Hive table, use the ALTER TABLE command:
ALTER TABLE table_name SET TBLPROPERTIES ('auto.purge'='true');
The above command adds the table property to the Hive metastore.
To drop an existing table inside an encryption zone, run the above command before the DROP command.

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException while inserting data into Hive partitioned table

I have employee data with 3 departments: A, B and C.
I am trying to create a partitioned table on department.
I created the table using the command below.
create external table Parti_Trail (EmployeeID Int,FirstName
String,Designation String,Salary Int) PARTITIONED BY (Department
String) row format delimited fields terminated by "," location
'/user/sree/HiveTrail';
But this did not load my table with the data in location '/user/sree/HiveTrail'.
So I tried to load my table:
LOAD DATA INPATH '/user/aibladmin/HiveTrail' OVERWRITE INTO TABLE Parti_SCDTrail PARTITION(department);
But it shows:
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: department not found in table's partition spec: {department=null}
Why is that? Am I doing anything wrong?
What happens if we SET hive.exec.dynamic.partition.mode = nonstrict;?
While creating a partitioned table, do we need to keep the data separated in different folders, or does it automatically get separated into different partitions?
For external tables with partitions in Hive, you need to run an ALTER statement to update the metastore for new partitions, because external tables are not managed by Hive.
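The ALTER statement mentioned in this answer has to be issued once per partition, so it can be convenient to generate the statements. Below is a minimal sketch in Python, assuming each department's files already sit in their own subdirectory such as /user/sree/HiveTrail/department=A. The helper name add_partition_statements and that directory layout are assumptions for illustration, not something stated in the question.

```python
def add_partition_statements(table, partition_column, values, base_location):
    """Build one ALTER TABLE ... ADD PARTITION statement per partition value.

    Assumes (hypothetically) that the data for each value lives under
    <base_location>/<partition_column>=<value>.
    """
    statements = []
    for value in values:
        location = "%s/%s=%s" % (base_location.rstrip("/"), partition_column, value)
        statements.append(
            "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s') LOCATION '%s';"
            % (table, partition_column, value, location)
        )
    return statements


if __name__ == "__main__":
    # Table name and path are taken from the question; the values are its three departments.
    for statement in add_partition_statements(
        "Parti_Trail", "department", ["A", "B", "C"], "/user/sree/HiveTrail"
    ):
        print(statement)
```

The printed statements can be saved to a file and run in Hive. If the directories really do follow the column=value naming shown here, MSCK REPAIR TABLE Parti_Trail; is another way to register them in the metastore.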

Why doesn't Hive allow creating an external table with CTAS?

In Hive, creating an external table with CTAS is a semantic error. Why?
A table created by CTAS is atomic, while an external table means the data will not be deleted when the table is dropped; the two do not seem to conflict.
In Hive, when we create a table (not external), the data is stored in /user/hive/warehouse.
But when an external Hive table is created, the files can be anywhere else; we are just pointing to that HDFS directory and exposing the data as a Hive table to run Hive queries, etc.
This SO answer explains it more precisely: Create hive table using "as select" or "like" and also specify delimiter
Am I missing something here?
Try this. You should be able to create an external table with CTAS.
CREATE TABLE ext_table LOCATION '/user/XXXXX/XXXXXX'
AS SELECT * from managed_table;
I was able to create one. I am using 0.12.
I think it's a semantic error because it is missing the most important parameter of an external table definition, namely the external location of the data file. By definition:
1. External means the data is outside Hive's control, residing outside the Hive data warehouse directory.
2. If the table is dropped, the data remains intact; only the table definition is removed from the Hive metastore.
So:
i. If CTAS is used with a managed table, the new external table will have its file in the warehouse, which will be removed on DROP TABLE, making #2 wrong.
ii. If CTAS is used with another external table, the two tables will point to the same file location.
CTAS creates a managed Hive table with the new name, using the schema and data of the source table.
You can convert it to an external table using:
ALTER TABLE <TABLE_NAME> SET TBLPROPERTIES('EXTERNAL'='TRUE');

Sqoop - Create empty hive partitioned table based on schema of oracle partitioned table

I have an Oracle table which has 80 columns and is partitioned on the state column. My requirement is to create a Hive table with a schema similar to the Oracle table, partitioned on state.
I tried using the sqoop -create-hive-table option, but I keep getting an error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key state cannot be a column to import.
I understand that in Hive the partition column should not be in the table definition, but then how do I get around the issue?
I do not want to write the CREATE TABLE commands manually, as I have 50 such tables to import and would like to use Sqoop.
Any suggestions or ideas?
Thanks
There is a workaround for this.
Below is the procedure I follow:
On Oracle, run a query to get the schema for a table and store it in a file.
Move that file to Hadoop.
On Hadoop, create a shell script which constructs an HQL file.
That HQL file contains the Hive CREATE TABLE statement along with the columns. For this we can use the above file (the Oracle schema file copied to Hadoop).
To run this script you just need to pass the Hive database name, table name, partition column name, path, etc., depending on your level of customization. At the end of the shell script add "hive -f HQL filename".
Once everything is ready, it takes just a couple of minutes to create each table.
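The generator step of the procedure above could look roughly like this. It is a minimal sketch in Python rather than shell, assuming the Oracle schema was exported as lines of the form COLUMN_NAME,DATA_TYPE with bare type names. The function name build_create_table, the type mapping and the sample table are illustrative assumptions, not part of Sqoop or Hive.

```python
# Assumed, deliberately small Oracle -> Hive type mapping; extend it as needed.
# Unknown types fall back to STRING.
ORACLE_TO_HIVE = {
    "NUMBER": "DOUBLE",
    "VARCHAR2": "STRING",
    "CHAR": "STRING",
    "DATE": "TIMESTAMP",
}


def build_create_table(database, table, schema_lines, partition_column):
    """Return a Hive CREATE TABLE statement partitioned on partition_column.

    The partition column is taken out of the regular column list, because
    Hive does not allow it to appear in both places.
    """
    columns = []
    partition_type = None
    for line in schema_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the exported file
        name, oracle_type = [part.strip() for part in line.split(",", 1)]
        hive_type = ORACLE_TO_HIVE.get(oracle_type.upper(), "STRING")
        if name.lower() == partition_column.lower():
            partition_type = hive_type
        else:
            columns.append("  `%s` %s" % (name.lower(), hive_type))
    if partition_type is None:
        raise ValueError("partition column %r not found" % partition_column)
    return "CREATE TABLE `%s`.`%s` (\n%s\n)\nPARTITIONED BY (`%s` %s);" % (
        database, table, ",\n".join(columns), partition_column.lower(), partition_type
    )


if __name__ == "__main__":
    # Hypothetical three-column export; a real file would have all 80 columns.
    sample = ["ORDER_ID,NUMBER", "CUSTOMER,VARCHAR2", "STATE,VARCHAR2"]
    print(build_create_table("sales", "orders", sample, "state"))
```

Writing the returned statement to a file and running it with hive -f, as the answer describes, would then create the empty partitioned table. Looping over the 50 schema files is a matter of calling the function once per table.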
