How to rename a Hive table without changing its location? - hadoop

Based on the Hive doc below:
Rename Table
ALTER TABLE table_name RENAME TO new_table_name;
This statement lets you change the name of a table to a different name.
As of version 0.6, a rename on a managed table moves its HDFS location as well. (Older Hive versions just renamed the table in the metastore without moving the HDFS location.)
Is there any way to rename a table without changing the location?

Yes, we can do that. You just need to run the following three commands in sequence.
Let's say you have an external table test_1 in Hive, and you want to rename it to test_2, with the renamed table pointing to the test_2 location rather than test_1. First, convert the table into a managed table using the command below.
test_1 -> pointing to the test_1 location
ALTER TABLE db_name.test_1 SET TBLPROPERTIES('EXTERNAL'='FALSE');
Rename the table.
ALTER TABLE db_name.test_1 RENAME TO db_name.test_2;
After the rename, convert the managed table back into an external table.
ALTER TABLE db_name.test_2 SET TBLPROPERTIES('EXTERNAL'='TRUE');
The db_name.test_2 table will now point to the test_2 location. If we do this without converting to a managed table first, it will still point to the test_1 location.
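To confirm which location the renamed table points to, you can inspect its metadata; a minimal check, assuming the db_name.test_2 table from above:
-- the Location field in the output shows the table's current HDFS path
DESCRIBE FORMATTED db_name.test_2;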

As of Hive 2.2.0, a managed table's HDFS location is moved only if the table was created without a LOCATION clause and under its database directory.

ALTER TABLE does not follow the databasename.tablename syntax in Hive the way CREATE or SELECT does.
Switch to the database first, then run the ALTER TABLE statement.
The syntax is as follows:
USE databasename;
ALTER TABLE old_tablename RENAME TO new_tablename;

Here is the command I executed:
ALTER TABLE old_ratings RENAME TO ratings;

Related

What happens if I move Hive table data files before moving the table?

I am trying to move the location of a table to a new directory. Let's say the original location is /data/dir. For example, I am trying something like this:
hadoop fs -mkdir /data/dir_bkp
hadoop fs -mv /data/dir/* /data/dir_bkp
I then do hive commands such as:
ALTER TABLE db.mytable RENAME TO db.mytable_bkp;
ALTER TABLE db.mytable_bkp SET LOCATION '/data/dir_bkp';
Is it fine to move the directory files before changing the location of the table? After I run these commands, will the table mytable_bkp be populated as it was before?
After you executed the mv command, your original table became empty, because mv moved the data files away.
After you renamed the table, it was still empty, because its location was empty.
After you executed ALTER TABLE SET LOCATION, the table was still empty, because the partitions remained mounted to their old locations (now empty). After a table rename, partitions stay as they were before the rename; each partition can normally have its own location outside the table location.
If the table is MANAGED, make it EXTERNAL first:
ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
Now drop the table, create it again with the new location, and run MSCK to re-create the partitions (a full sketch follows below):
MSCK REPAIR TABLE tablename;
If you are on Amazon EMR, run
ALTER TABLE tablename RECOVER PARTITIONS;
instead of MSCK.
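A minimal end-to-end sketch of that recovery path, assuming the db.mytable_bkp table from the question, with its files now in /data/dir_bkp (the column list and partition column are placeholders; use your real schema):
ALTER TABLE db.mytable_bkp SET TBLPROPERTIES('EXTERNAL'='TRUE');
DROP TABLE db.mytable_bkp; -- external: the files are kept
CREATE EXTERNAL TABLE db.mytable_bkp (id INT, name STRING)
PARTITIONED BY (dt STRING)
LOCATION '/data/dir_bkp';
MSCK REPAIR TABLE db.mytable_bkp; -- registers the partitions found under the new location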

How can we drop a HIVE table with its underlying file structure, without corrupting another table under the same path?

Assuming we have two Hive tables created under the same HDFS file path.
I want to be able to drop a table ALONG WITH its HDFS files, without corrupting the other table that's in the same shared path.
By doing the following:
drop table test;
Then:
hadoop fs -rm -r hdfs/file/path/folder/*
I delete the files of both tables, not just the one I've dropped.
In another post I found this solution:
--change the table properties to make the table internal
ALTER TABLE <table-name> SET TBLPROPERTIES('EXTERNAL'='False');
--now the table is internal; if you drop the table, the data will be dropped automatically
drop table <table-name>;
But I couldn't get past the ALTER statement, as I got a permission denied error (User does not have [ALTER] privilege on table).
Any other solution?
If you have two tables using the same location, then all files in that location belong to both tables, no matter how they were created.
Say you have table1 with location hdfs/file/path/folder and table2 with the same location hdfs/file/path/folder, and you insert some data into table1: files are created, and they are read if you select from table2, and vice-versa: if you insert into table2, the new files will be accessible from table1. This is because table data is stored in the location, no matter how the files got into that location. You can insert data into a table using SQL, put files into its location manually, etc.
Each table or partition has its own location; you cannot assign files to a table individually.
For better understanding, read also this answer with examples about multiple tables on top of the same location: https://stackoverflow.com/a/54038932/2700344
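A minimal sketch demonstrating the shared-location behavior, assuming a scratch HDFS directory /tmp/shared_loc you are free to write to (INSERT ... VALUES requires Hive 0.14+):
CREATE EXTERNAL TABLE t1 (x INT) LOCATION '/tmp/shared_loc';
CREATE EXTERNAL TABLE t2 (x INT) LOCATION '/tmp/shared_loc';
INSERT INTO t1 VALUES (1);
SELECT * FROM t2; -- returns the row written through t1, because both tables read the same files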

Is there a SQL command to delete the files on HDFS for an external table

I would like to ask if there is a SQL command in Hive to drop an external table and delete its files on HDFS.
When I use hdfs commands to delete the files, I am always afraid that I may delete other files that don't belong to this external table.
There is no SQL command to drop an external table together with its files directly, but there is an alternative: first make the table managed, then drop it.
Step 1:
ALTER TABLE <table-name> SET TBLPROPERTIES('EXTERNAL'='False');
Step 2:
drop table <table-name>; //now the table is internal; if you drop the table, the data will be dropped automatically
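As a side note, if you also want the files to bypass the HDFS trash when the now-managed table is dropped, Hive 0.14+ supports a PURGE clause; a sketch with a hypothetical table name:
ALTER TABLE my_ext_table SET TBLPROPERTIES('EXTERNAL'='False');
DROP TABLE my_ext_table PURGE; -- deletes the files immediately instead of moving them to the trash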

How to alter Hive partition column name

I have to change the partition column name (not the partition spec). I looked for commands in the Hive wiki and some Google pages, but I can only find options for altering the partition spec.
For example, in /table/country='US' I can change US to USA, but I want to change country to continent.
I feel like the only option available for changing the partition column name is dropping and re-creating the table. If there is any other option available, please help me.
Thanks in advance.
You can change the column name in the metadata by following:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment
But as the document says, it only changes the metadata. Hive partitions are implemented as directories with the naming pattern columnName=spec, so you also need to rename those directories on HDFS using the "hadoop fs" command.
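A sketch of the directory-rename step, assuming a table stored at /table with a partition value US (paths are placeholders for your actual layout; repeat for each partition value):
hadoop fs -mv /table/country=US /table/continent=US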
You can alter the partition column using a simple swap method (see the sketch after this list).
Create a new temp table with the same schema as the current table.
Move all files from the old table's location to the newly created table's location:
hadoop fs -mv <current_table_location> <temp_table_location>
Alter the schema of the original table (rename or drop the partitions).
Recopy/load the temp table data into the original table with the appropriate partition values:
hadoop fs -mv <temp_table_location> <current_table_location>
Run msck repair on the original table and drop the temp table.
NOTE: the mv command moves files from one location to another without copy time. Alternatively, we can use LOAD DATA INPATH to copy the data into the original table.
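A minimal sketch of the swap with hypothetical names (database mydb, table events, default warehouse paths; adjust to your layout):
CREATE TABLE mydb.events_tmp LIKE mydb.events;
hadoop fs -mv /user/hive/warehouse/mydb.db/events/* /user/hive/warehouse/mydb.db/events_tmp/
Then alter or re-create mydb.events with the new partition column, move the files back (renaming the partition directories to match), and finish with:
hadoop fs -mv /user/hive/warehouse/mydb.db/events_tmp/* /user/hive/warehouse/mydb.db/events/
MSCK REPAIR TABLE mydb.events;
DROP TABLE mydb.events_tmp;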
You cannot change the partition column in Hive; in fact, Hive does not support altering partitioning columns.
You can think of it this way: Hive stores the data by creating a folder in HDFS for each partition column value, so altering a partition column means changing the whole directory structure and data of the Hive table, which is not possible. For example, if you have partitioned on year, this is what the directory structure looks like:
tab1/clientdata/2009/file2
tab1/clientdata/2010/file3
If you want to change the partition column, you can perform the steps below (see the sketch after this list):
Create another Hive table with the required changes in the partition column:
CREATE TABLE new_table (A INT, B STRING, .....)
Load data from the previous table:
INSERT INTO new_table PARTITION (B) SELECT A, B FROM prev_table;
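One detail worth noting: an INSERT ... PARTITION (B) SELECT like the one above relies on dynamic partitioning, which Hive rejects in strict mode. A minimal sketch of the session settings usually needed first (these are standard Hive properties):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO new_table PARTITION (B) SELECT A, B FROM prev_table;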
As you said, renaming the value of a partition is very straightforward:
hive> ALTER TABLE test.usage PARTITION (country='US') RENAME TO PARTITION (country='USA');
I know that this is not what you are looking for. Unfortunately, given that your data is already partitioned by country, the only option you have is to drop the table, remove the data from HDFS (supposing your table is external), and reinsert the data using continent as the partition.
What I would do in your case is use multiple partition levels, so that your folder structure looks like this:
/path/to/the/data/continent='america'/country='usa'
/path/to/the/data/continent='america'/country='mexico'
/path/to/the/data/continent='europe'/country='spain'
/path/to/the/data/continent='europe'/country='italy'
...
That way you can query the data at different levels of granularity (in this case continent and country).
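A sketch of the DDL behind that layout, with hypothetical column names:
CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (continent STRING, country STRING)
LOCATION '/path/to/the/data';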
Adding a solution here for later:
Use case: change a partition column's type from STRING to INT.
SET hive.mapred.mode=nonstrict;
ALTER TABLE {table_name} PARTITION COLUMN ({column_name} {column_type});
e.g. ALTER TABLE employee PARTITION COLUMN (dept INT);

Why doesn't Hive allow creating an external table with CTAS?

In Hive, creating an external table via CTAS is a semantic error. Why?
A table created by CTAS is atomic, while an external table means the data will not be deleted when the table is dropped; the two do not seem to conflict.
In Hive, when we create a (non-external) table, the data is stored under /user/hive/warehouse.
But when creating an external Hive table, the files can be anywhere else; we are just pointing at that HDFS directory and exposing the data as a Hive table to run Hive queries, etc.
This SO answer covers it more precisely: Create hive table using "as select" or "like" and also specify delimiter
Am I missing something here?
Try this... You should be able to create an external table with CTAS.
CREATE TABLE ext_table LOCATION '/user/XXXXX/XXXXXX'
AS SELECT * FROM managed_table;
I was able to create one. I am using 0.12.
I think it's a semantic error because it misses the most important parameter of an external table definition, namely the external location of the data file. By definition: 1. external means the data resides outside Hive's control, outside the Hive data warehouse directory; 2. if the table is dropped, the data remains intact and only the table definition is removed from the Hive metastore. So:
i. if CTAS produced its file in the warehouse like a managed table, the file would be removed on drop table, violating #2;
ii. if CTAS were based on another external table, the two tables would point to the same file location.
CTAS creates a managed Hive table with the new name, using the schema and data of the source table.
You can then convert it to an external table using:
ALTER TABLE <TABLE_NAME> SET TBLPROPERTIES('EXTERNAL'='TRUE');
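Putting the two steps together, a minimal sketch (table and path names are illustrative):
CREATE TABLE new_ext_table LOCATION '/data/new_ext_table'
AS SELECT * FROM managed_table;
ALTER TABLE new_ext_table SET TBLPROPERTIES('EXTERNAL'='TRUE');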
