Hive error after alter table partition set location - hadoop

I have a table TEST with one partition column, Profession.
After the execution of
Alter Table TEST PARTITION(Profession='50') set location 'hdfs:/apps/hive/warehouse1/TEST/Profession=50';
The command executed without errors, but the next query failed with this exception:
cannot find dir = hdfs:/xxxxxxxx/apps/hive/wharehouse/TEST/Profession=50
This was the directory where the partition was originally located.
Even running another Alter Table to move the location back to the original does not fix it.
My goal is to move old partitions over time from a SSD hdfs volume to a HDD hdfs volume.
Any suggestion?
Thanks

Try running msck repair table TEST
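For the broader goal of moving old partitions from the SSD volume to an HDD volume, the usual pattern is to move the partition directory first and then repoint the partition; this is only a sketch, and the warehouse_hdd path below is a placeholder for the HDD-side directory:
-- move the partition directory to the HDD volume (can also be done with hdfs dfs -mv outside the CLI)
dfs -mv /apps/hive/warehouse1/TEST/Profession=50 /apps/hive/warehouse_hdd/TEST/Profession=50;
-- point the partition at the new directory (placeholder path)
ALTER TABLE TEST PARTITION (Profession='50') SET LOCATION 'hdfs:/apps/hive/warehouse_hdd/TEST/Profession=50';
-- refresh the metastore if the table still references stale directories
MSCK REPAIR TABLE TEST;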

Related

What happens if I move Hive table data files before moving the table?

I am trying to move the location of a table to a new directory. Let's say the original location is /data/dir. For example, I am trying something like this:
hadoop fs -mkdir /data/dir_bkp
hadoop fs -mv /data/dir/* /data/dir_bkp
I then do hive commands such as:
ALTER TABLE db.mytable RENAME TO db.mytable_bkp;
ALTER TABLE db.mytable_bkp SET LOCATION '/data/dir_bkp';
Is it fine to move the directory files before changing the location of the table? After I run these commands, will the table mytable_bkp be populated as it was before?
After you executed the mv command, your original table became empty, because mv moved the data files away.
After you renamed the table, it is still empty, because its location is empty.
After you executed ALTER TABLE SET LOCATION, the table is still empty because the partitions are mounted to the old locations (now empty). Sorry for misleading you in this step previously: after a table rename, the partitions remain as they were before the rename, and each partition can have its own location outside the table location.
If the table is MANAGED, make it EXTERNAL first:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
Then drop the table, re-create it with the new location, and run MSCK to re-create the partitions:
MSCK [REPAIR] TABLE tablename;
If you are on Amazon EMR, run ALTER TABLE tablename RECOVER PARTITIONS; instead of MSCK.
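Putting the whole answer together, here is a minimal sketch of the sequence for the table from the question; the column and partition lists are elided and must match the original DDL:
-- make the table external so dropping it keeps the data files
alter table db.mytable_bkp SET TBLPROPERTIES('EXTERNAL'='TRUE');
-- drop only the metadata, then re-create the table pointing at the new directory
drop table db.mytable_bkp;
create external table db.mytable_bkp (...) partitioned by (...) location '/data/dir_bkp';
-- register the partition directories that already exist under the new location
MSCK REPAIR TABLE db.mytable_bkp;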

Drop Hive Table & msck repair fails with Table stored in google cloud bucket

I am creating hive table in Google Cloud Bucket using below SQL statement.
CREATE TABLE schema_name.table_name (column1 decimal(10,0), column2 int, column3 date)
PARTITIONED BY(column7 date) STORED AS ORC
LOCATION 'gs://crazybucketstring/'
TBLPROPERTIES('ORC.COMPRESS'='SNAPPY');
Then I loaded data into this table using the distcp command. Now when I try to drop the table it fails with the error message below; even dropping the empty table fails.
hive>>DROP TABLE schema_name.table_name;
**Error:** Error while processing statement:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.MetaException
(message:java.lang.IllegalArgumentException: `hadoopPath must not be null`)
(state=08S01,code=1)
I also removed the files from the Google Cloud Storage bucket using the gsutil rm -r gs:// command, but I am still not able to delete the table; it fails with the same error.
Running msck repair table also gives the following error:
FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Any idea what could be wrong?
The problem is related to the bucket location. I will explain step by step how to recreate it and how to solve it. The same issue also prevents the msck repair command from running.
How to Recreate it:
First I created a table (T1) with its location pointing at the bucket root:
LOCATION 'gs://crazybucketstring/'
Then I created another table (T2) with its location pointing at a subfolder inside the bucket:
LOCATION 'gs://crazybucketstring/schemname/tableaname/'
Now when I try to drop the first table (T1) it throws an error, because the entire bucket is behaving as the table and Hive cannot delete the bucket itself, only the files in it.
When I try to drop table (T2) I am able to drop it, and the files inside the bucket subdirectory are deleted too, since it is a managed table. Table T1 is still a headache.
In a desperate bid to delete table T1, I emptied the bucket using the gsutil rm -r command and tried msck repair table tablename; strangely, the msck repair command failed with the error message below:
>> msck repair table tablename
Error: Error while processing statement: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
As before, the DROP command was still not working.
Solution:
Eventually I found the approach that worked.
I altered table T1 and set its location to a subdirectory inside the bucket instead of the bare bucket:
ALTER TABLE TABLENAME SET LOCATION 'gs://crazybucketstring/schemname/tableaname/';
Now when I run msck repair it does not throw any error.
I issued DROP Table command and it worked.
This issue is related to the table location, which we should handle carefully when creating more than one table in the same bucket. Best practice is to use a different subdirectory inside the bucket for each table and to avoid using the bare bucket path as a table location, especially when you have to create multiple tables in the same bucket.
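As a sketch of that layout (the table names and columns below are only placeholders), each table gets its own subdirectory, so dropping one never touches the bucket root or the other table:
CREATE EXTERNAL TABLE db.table_a (id int) PARTITIONED BY (dt date) STORED AS ORC LOCATION 'gs://crazybucketstring/db/table_a/';
CREATE EXTERNAL TABLE db.table_b (id int) PARTITIONED BY (dt date) STORED AS ORC LOCATION 'gs://crazybucketstring/db/table_b/';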

hive/hdfs moving data not working as expected

I had a table in Hive called test at a location, say, 'hdfs://location1/partition=x', and moved all the data to 'hdfs://location2/partition=x':
hdfs dfs -mv /location1 /location2
Then I did
alter table test set location 'hdfs://location2'.
On doing
hdfs dfs -ls /location2
I see all the data in the right partition
Querying to get counts i.e.
select count(*) from test
works fine.
But doing
select * from test
pulls no records.
I am unable to figure out what went wrong during the move.
You need to manually drop the existing partitions that were pointing to the original location "hdfs://location1/partition='x'".
Use the command below to drop those partitions manually:
alter table test drop partition(partition='x');
Once all the partitions are dropped run the below command to update the new partitions in hive metastore:
msck repair table test;
Why? Because the location of the table was changed, but the hive metastore was not updated with the new partitions in the new location; it is still holding the partition information from the old location. Once you drop the partitions and run the msck repair command, the hive metastore gets updated with the new partitions from the new location.
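If you want to check where the metastore points before and after, the partition metadata can be inspected directly (same table and partition names as in the question):
show partitions test;
-- the Location field in the output shows the directory registered for this partition
describe formatted test partition (partition='x');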

Spark sql queries on partitioned table with removed partitions files fails

Below is what I am trying, in order:
create partitioned table in hive based on current hour.
use spark hive context and perform msck repair table.
delete the hdfs folders of one of the added partitions manually.
use spark hive context again and perform
a> msck repair
this does not remove the partition that was already added but no longer has an hdfs folder;
this seems to be known behavior of "msck repair"
b> select * from tablexxx where (existing partition);
Fails with a FileNotFoundException pointing to the hdfs folder that was deleted manually.
Any insights on this behavior would be of great help.
Yes, MSCK REPAIR TABLE will only discover new partitions, not delete "old" ones.
Working with external hive tables where you deleted the HDFS folder, I see two solutions:
Drop the table (the files will not be deleted because the table is external), then re-create the table using the same location, and then run MSCK REPAIR TABLE. This is my preferred solution.
Drop all the partitions you deleted using ALTER TABLE <table> DROP PARTITION <partition>
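A sketch of both options in HiveQL, using placeholder column names, partition values, and paths:
-- option 1 (preferred): drop and re-create the external table on the same location, then rediscover partitions
drop table tablexxx;
create external table tablexxx (id int) partitioned by (hr string) location '/path/to/tablexxx';
msck repair table tablexxx;
-- option 2: drop only the partitions whose hdfs folders were removed
alter table tablexxx drop partition (hr='2019010105');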
What you observe in your case is maybe related to these: https://issues.apache.org/jira/browse/SPARK-15044 and
https://issues.apache.org/jira/browse/SPARK-19187

MSCK Command Throwing Error When Google Storage Set As Location In Properties

I have an external partitioned hive table whose location is set to 'gs://xxxx'. I have added some partitions manually, and to register those partitions in the hive metastore I ran the MSCK REPAIR command, which throws the following error:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask
Please let me know why this error is getting generated.
Try this:
set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;
If it doesn't work, check the DDL and the partition fields. Keep in mind that only int and string are supported as partition values.
The solution is to run Alter Table and set the location to a subdirectory in GCS, as given below:
ALTER TABLE TABLENAME SET LOCATION 'gs://crazybucketstring/schemname/tableaname/';
If you are interested in understanding why msck repair gives this error, read this answer: Drop Hive Table & msck repair fails with Table stored in google cloud bucket
This problem is related to the gs location of your table. Even though msck repair with path validation ignored, as given in the other answer, works, it fails to solve the underlying issue.
