drop table command in hive - hadoop

I am trying to drop a table and recreate it in Hive. After dropping the table, if I run a select query on it, it shows the old rows that were in the table before the drop. How is this possible when the table has already been dropped? Why does it retain rows even after the table is dropped and recreated?
hive> select * from abc;
A 30
B 40
hive> drop table abc;
hive> create external table abc ( name string, qty int);
hive> select * from abc;
A 30
B 40

The problem is that you are dropping an external table. When an external table is dropped, the source files of that table still exist at its location, so when you create a new external table with the same name, the data is read directly from that source path. To resolve this issue, first get the path of the table using the following command:
hive> describe formatted database_name.table_name;
Then copy the entire location that appears in the description, for example:
/user/hive/warehouse/database_name.db/table_name
After this, use the following command to remove all the data files for the given table:
hive> dfs -rmr /user/hive/warehouse/database_name.db/table_name;
OR
hive> dfs -rm -r /user/hive/warehouse/database_name.db/table_name;
Then you can wipe it completely using the DROP TABLE command.
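Applied to the abc table from the question, the whole sequence might look like this (the warehouse path below is an assumption; use whatever Location describe formatted actually reports):
hive> describe formatted default.abc;          -- find the table's Location in the output
hive> dfs -rm -r /user/hive/warehouse/abc;     -- hypothetical path; copy the Location from above
hive> drop table abc;
hive> create external table abc ( name string, qty int);
hive> select * from abc;                       -- now returns no rows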

I don't know Hive, but if it is anything like Oracle (which I, kind of, know), then an external table points to a file stored on your disk.
Therefore, once you dropped it, you couldn't use it (of course). But then you created another EXTERNAL TABLE (see the 5th line in your example), and of course you were able to select from it once again.
That's because you didn't delete the FILE that is the data source for that external table.
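You can see the files surviving the drop directly (the path is hypothetical; take it from describe formatted):
hive> drop table abc;
hive> dfs -ls /user/hive/warehouse/abc;    -- the data files are still listed after the drop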

Related

Changing the Hive table type using the DDL statement does not work

Changing the Hive table type using the DDL statement does not work, for example:
hive> alter table ads.ods_ads_copy set tblproperties('EXTERNAL'='FALSE');
OK
Time taken: 1.736 seconds
hive> desc formatted ads.ods_ads_copy;
You can change the properties like this, but the actual data/folder location will not change, which can cause confusion. Hive will show whatever DDL you apply, but it won't truly convert the table from external to internal.
A cleaner and better way is to recreate the table, like below:
create table mymanagedtable as select * from mytable;
drop table mytable;
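You can then confirm the result (table name from the example above); in the desc formatted output, Table Type should now read MANAGED_TABLE:
hive> desc formatted mymanagedtable;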

How can we drop a HIVE table with its underlying file structure, without corrupting another table under the same path?

Assume we have two Hive tables created under the same HDFS file path.
I want to be able to drop one table WITH its HDFS files, without corrupting the other table in the same shared path.
By doing the following:
drop table test;
Then:
hadoop fs -rm -r hdfs/file/path/folder/*
I delete both tables' files, not just those of the one I've dropped.
In another post I found this solution:
--change the table properties to make the table internal
ALTER TABLE <table-name> SET TBLPROPERTIES('EXTERNAL'='False');
--now the table is internal; if you drop the table, the data will be dropped automatically
drop table <table-name>;
But I couldn't get past the ALTER statement, as I got a permission denied error (User does not have [ALTER] privilege on table).
Any other solution?
If you have two tables using the same location, then all files in this location belong to both tables, no matter how they were created.
Say you have table1 with location hdfs/file/path/folder and table2 with the same location hdfs/file/path/folder, and you insert some data into table1: files are created, and they are read if you select from table2, and vice versa: if you insert into table2, the new files will be accessible from table1. This is because table data is whatever is stored in the location, no matter how you put the files inside that location. You can insert data into the table using SQL, put files into the location manually, etc.
Each table or partition has its own location; you cannot assign individual files to one table or the other.
For better understanding, read also this answer with examples about multiple tables on top of the same location: https://stackoverflow.com/a/54038932/2700344
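A minimal sketch of that behaviour (table names and the location are assumptions):
hive> create external table table1 (name string, qty int) location '/tmp/shared_location';
hive> create external table table2 (name string, qty int) location '/tmp/shared_location';
hive> insert into table table1 values ('A', 30);
hive> select * from table2;    -- returns A 30: both tables read the same files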

Is there a sql command to delete the files on HDFS for an external table

I would like to ask whether there is a SQL command in Hive to drop a table and also delete its files on HDFS, for an external table.
When I use an hdfs command to delete the files, I am always afraid that I may delete other files that don't belong to this external table.
There is no single SQL command to drop an external table together with its data, but there is an alternative: first make the table managed, then drop it.
Step 1: make the table managed:
ALTER TABLE <table-name> SET TBLPROPERTIES('EXTERNAL'='False');
Step 2: drop the table:
drop table <table-name>; -- now the table is internal; if you drop it, the data will be dropped automatically
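On newer Hive releases (4.x, and some vendor distributions that backported it), there is also a table property that makes DROP on an external table delete the data; a sketch, assuming your version supports it:
ALTER TABLE <table-name> SET TBLPROPERTIES('external.table.purge'='true');
drop table <table-name>; -- with the purge property set, the underlying files are deleted too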

Hive count(*) runs indefinitely, and data gets prepopulated with values

I am trying to load data into Hive from an RDBMS, using Sqoop.
Once I populate the Hive table with data and try to run a count(*), the query runs forever. Also, if I drop the (external) Hive table, delete everything from the HDFS directory, and then create a similar one, the new table gets pre-populated with the old data (the same as in the dropped table), even though I deleted everything from my HDFS directory and, in fact, cleared the trash as well.
Still, the data gets populated, and a count(*) runs indefinitely on it.
UPDATE 1
It's a standalone Hortonworks (2.4) sandbox environment.
I dropped the table from hive and also removed related files from HDFS.
I have a script to create and load data.
drop table employee;
and then I run the following commands:
hadoop fs -rm -r /user/hive/warehouse/intermidiateTable/*
hadoop fs -rm -r .Trash/Current/user/hive/warehouse/intermidiateTable/*
and then I create the table using the same query as this:
create external table employee (id int, name string, account_no bigint, balance bigint, date_field timestamp, created_by string, created_date string,batch_id int, updated_by string, updated_date string)
row format delimited
fields terminated by ','
lines terminated by '\n'
location '/user/hive/warehouse/intermidiateTable';
and when I run a select query, the table gets populated with the older data.
Also, a select count(*) runs indefinitely.
Can somebody recommend a solution?
If you are creating the external table inside the warehouse directory itself, then what is the purpose of declaring the table as 'external'?
Aren't external tables supposed to be outside the warehouse directory, so that you have control over the data files rather than Hive itself?
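Following that point, a sketch of recreating the table outside the warehouse directory (the location below is an assumption; pick any HDFS path you control):
create external table employee (id int, name string, account_no bigint, balance bigint, date_field timestamp, created_by string, created_date string, batch_id int, updated_by string, updated_date string)
row format delimited
fields terminated by ','
lines terminated by '\n'
location '/data/staging/employee';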

How to alter Hive partition column name

I have to change a partition column name (not the partition spec). I looked for commands in the Hive wiki and some Google pages, but I can only find options for altering the partition spec.
For example, in /table/country='US' I can change US to USA, but I want to change country to continent.
I feel like the only option available for changing a partition column name is dropping and re-creating the table. If there is any other option available, please help me.
Thanks in advance.
You can change the column name in the metadata by following this:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment
But as the document says, it only changes the metadata. Hive partitions are implemented as directories with the naming pattern columnName=spec, so you also need to change the names of those directories on HDFS using the "hadoop fs" command.
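A sketch of the HDFS side of that rename (database, table, and partition names are assumptions):
hadoop fs -mv /user/hive/warehouse/mydb.db/mytable/country=US /user/hive/warehouse/mydb.db/mytable/continent=US
hive> msck repair table mytable;    -- re-sync the partition metadata after the rename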
You can alter the partition column using a simple swap method:
Create a new temp table with the same schema as the current table.
Move all files in the old table to the newly created table's location:
hadoop fs -mv <current_table_location> <temp_table_location>
Alter the schema of the original table (rename or drop the partitions).
Recopy/load the temp table data into the original table with the appropriate partition values:
hadoop fs -mv <temp_table_location> <current_table_location>
msck repair the original table & drop the temp table.
NOTE: The mv command moves files from one location to another, avoiding the copy time. Alternatively, you can use LOAD DATA INPATH to copy the data into the original table.
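A sketch of that LOAD DATA INPATH alternative (table and path names are assumptions; note that it moves the files into the target partition rather than copying them):
hive> load data inpath '/user/hive/warehouse/temp_table/continent=america' into table current_table partition (continent='america');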
You cannot change the partition column in Hive; in fact, Hive does not support altering partition columns.
You can think of it this way: Hive stores the data by creating folders in HDFS named after the partition column values. If you try to alter the partition column, you are effectively trying to change the whole directory structure and data of the Hive table, which is not possible. For example, if you have partitioned on year, this is how the directory structure looks:
tab1/clientdata/2009/file2
tab1/clientdata/2010/file3
If you want to change the partition column, you can perform the steps below.
Create another Hive table with the required change in the partition column:
create table new_table (A int, ...) partitioned by (B string);
Load the data from the previous table:
insert into new_table partition (B) select A, B from prev_table;
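Note that a dynamic-partition insert like the one above typically needs these settings first:
set hive.exec.dynamic.partition=true;              -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict;    -- allow all partition columns to be dynamic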
As you said, renaming the value of the partition is very straightforward:
hive> ALTER TABLE test.usage PARTITION (country='US') RENAME TO PARTITION (country='USA');
I know that this is not what you are looking for. Unfortunately, given that your data is already partitioned by country, the only option you have is to drop the table, remove the data from HDFS (supposing your table is external), and reinsert the data using continent as the partition.
What I would do in your case is use multiple partition levels, so that your folder structure looks like this:
/path/to/the/data/continent='america'/country='usa'
/path/to/the/data/continent='america'/country='mexico'
/path/to/the/data/continent='europe'/country='spain'
/path/to/the/data/continent='europe'/country='italy'
...
That way you can query the data at different levels of granularity (in this case, continent and country).
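A sketch of the corresponding DDL (table name and columns are assumptions):
create external table mydata (name string, qty int)
partitioned by (continent string, country string)
location '/path/to/the/data';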
Adding a solution here for later:
Use case: change a partition column from STRING to INT
set hive.mapred.mode=nonstrict;
alter table {table_name} partition column ({column_name} {column_type});
e.g. ALTER TABLE employee PARTITION COLUMN (dept INT);
