Hive bucketing and partition for existing table - hadoop

Is it possible to add bucketing and partitioning to a table that already contains data? I have a Hive table with more than 100M records, and I want to partition it. I also need to bucket it.
Is it possible?
Thanks,
Bala

No, it's not possible to change the bucketing and partitioning of a table that already holds data. You have to create a new table with the required bucketing and partitioning properties and then load it from the old table.
set hive.enforce.bucketing = true;
FROM old_table insert into table new_bucketed_partitioned_table select * ;
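A fuller sketch of the same approach might look like the following. The column names (id, payload, event_date), the bucket count, and the ORC storage format are assumptions for illustration, not details from the question. As far as I know, hive.enforce.bucketing is only needed on Hive 1.x, since later versions always enforce bucketing.

```sql
-- Hypothetical schema: column names, bucket count and file format are assumptions.
CREATE TABLE new_bucketed_partitioned_table (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- Let Hive create the partitions from the data being inserted.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.enforce.bucketing = true;  -- only relevant on older Hive versions

-- The dynamic partition column must come last in the SELECT list.
INSERT INTO TABLE new_bucketed_partitioned_table PARTITION (event_date)
SELECT id, payload, event_date
FROM old_table;
```

Once the load is verified, the old table can be dropped and the new one renamed to take its place.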

Related

Increase the performance of INSERT OVERWRITE in a Hive managed table

I'm new to Hive and I would like to know which table properties can improve the performance of INSERT OVERWRITE on a Hive managed table.
Can someone help with that?
Some suggestions:
Switch off statistics auto-gathering:
set hive.stats.autogather=false;
Remove the partition folders or the table folder in advance if possible, or use the PURGE option: https://stackoverflow.com/a/39623927/2700344
If you are using S3 and the table is ORC, disable block padding:
ALTER TABLE your_table SET TBLPROPERTIES ("orc.block.padding"="false", "orc.block.padding.tolerance"="1.0");
Use vectorization (see the Vectorization section of the Hive Configuration Properties documentation) and Tez:
set hive.execution.engine=tez;
Optimize query.
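Put together, a session applying these suggestions could look roughly like this. The table names your_table and source_table are placeholders, and the two vectorization properties are the standard Hive ones rather than settings quoted in the answer above.

```sql
-- Session-level settings; table names below are placeholders.
SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=true;         -- vectorized map-side execution
SET hive.vectorized.execution.reduce.enabled=true;  -- vectorized reduce-side execution
SET hive.stats.autogather=false;                    -- skip statistics collection on write

INSERT OVERWRITE TABLE your_table
SELECT *
FROM source_table;
```

With statistics auto-gathering off, the optimizer has no fresh statistics for the table, so it may be worth running ANALYZE TABLE separately afterwards.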

Is it possible to automatically create a new partition in a list partitioned table?

The code below creates a new partition automatically when I insert a date that does not yet exist in my table. Is it possible to do the same thing in a list-partitioned table, where the partition is based on a VARCHAR2 column?
ALTER TABLE MY_TABLE MODIFY
PARTITION BY RANGE(DATE) INTERVAL(NUMTODSINTERVAL(1,'day'))
( partition MY_PARTITION values less than (to_date('2019-06-01', 'yyyy-mm-dd')));
Yes, it is possible starting with Oracle 12.2, which introduced automatic list partitioning.
See the details here.
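As a minimal sketch of automatic list partitioning in Oracle 12.2 and later, assuming a hypothetical table my_list_table with a VARCHAR2 column region:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE my_list_table (
  id     NUMBER,
  region VARCHAR2(30)
)
PARTITION BY LIST (region) AUTOMATIC
( PARTITION p_north VALUES ('NORTH') );

-- An existing list-partitioned table can be switched to automatic mode.
ALTER TABLE my_list_table SET PARTITIONING AUTOMATIC;
```

Inserting a row with a region value that has no partition then creates a system-named partition for it. Note that an automatic list-partitioned table cannot have a DEFAULT partition.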

How to update Hive partitioned columns to a new set of columns?

I have a dynamically partitioned managed table in Hive, partitioned by (country,state).
I want to add one more column to the partition columns, making them (country,state,city).
I am thinking of using ALTER TABLE tab_nm DROP PARTITION old_partitions and then another ALTER TABLE tab_nm ADD PARTITION.. to add the new set of columns.
One blog suggests creating a new table with the new partitioning and loading the data from the table with the old partitions. But I do not want to recreate the table, as it's a huge production table.
I have not run the ALTER TABLE.. statements yet, because I am worried that DROP PARTITION may remove all the data in those partitions.
Please help.

Add partition to hive table with no data

I am trying to create a Hive table which has the same columns as another (partitioned) table. I use the following query:
CREATE TABLE destTable STORED AS PARQUET AS select * from srcTable where 1=2;
Apparently I cannot use 'PARTITIONED BY(col_name)' because destTable must not be partitioned. But I want to mention that destTable should be partitioned by a column (same as srcTable) before I add data to it.
Is there a way to do that?
As you mentioned, destTable cannot be a partitioned table in a CREATE TABLE ... AS SELECT statement, so there is no way to do this directly. Also, destTable cannot be an external table.
In this situation you will need to create a temporary "staging_table" (an unpartitioned, Hive-managed table) to hold the data.
Step 1: Transfer everything from srcTable to the staging_table
Step 2: Create a partitioned destTable and do:
INSERT OVERWRITE TABLE destTable PARTITION(xxxx)
SELECT * FROM staging_table;
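The two steps above can be sketched as follows. The column names (id, name) and the partition column part_col are hypothetical stand-ins for whatever srcTable actually contains.

```sql
-- Step 1: unpartitioned, Hive-managed staging table.
-- The partition column of srcTable becomes an ordinary column here.
CREATE TABLE staging_table STORED AS PARQUET AS
SELECT * FROM srcTable;

-- Step 2: partitioned destination table (hypothetical columns).
CREATE TABLE destTable (
  id   BIGINT,
  name STRING
)
PARTITIONED BY (part_col STRING)
STORED AS PARQUET;

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The dynamic partition column must be the last column selected.
INSERT OVERWRITE TABLE destTable PARTITION (part_col)
SELECT id, name, part_col
FROM staging_table;
```

Listing the columns explicitly, rather than using SELECT *, avoids depending on the column order of the staging table.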
Hope this helps.

How to drop partition metadata from Hive when a partition is dropped using the ALTER ... DROP command

I have dropped all the partitions in the Hive table by using the ALTER command:
alter table emp drop partition (hiredate>'0');
After dropping the partitions I can still see the partition metadata. How do I delete this partition metadata? Can I use the same table for new partitions?
Partitioning is defined when the table is created. By running ALTER TABLE ... DROP PARTITION ... you are only deleting the data and metadata for the matching partitions, not the partitioning of the table itself.
Your best bet at this point will be to recreate the table without the partitioning. If there is some data you are trying to save, rename the current table, create the new table (without the partitioning), then run an INSERT from the old table to the new table.
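A rough sketch of that rename-and-reload sequence, assuming hypothetical columns empno and ename alongside hiredate (the real column list of emp is not given in the question):

```sql
-- Keep the old partitioned table under a new name.
ALTER TABLE emp RENAME TO emp_old;

-- Recreate emp without partitioning; hiredate is now an ordinary column.
-- Column names and types other than hiredate are assumptions.
CREATE TABLE emp (
  empno    INT,
  ename    STRING,
  hiredate STRING
);

-- Copy over whatever data is still left in the old table.
INSERT INTO TABLE emp
SELECT empno, ename, hiredate
FROM emp_old;

-- Check which partitions the old table still has before dropping it.
SHOW PARTITIONS emp_old;
```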
