How to add new partitions automatically to a partitioned table in Greenplum based on the inserted data?

I have a partitioned table in Greenplum (which is based on PostgreSQL), partitioned by specific ranges of values.
Now I have to insert more data into the same table, and the new partition ranges might overlap with existing ones. I have written an ALTER command with new start and end dates, but if there are overlaps, the command fails. So I need to create a partition for each date in order to keep the whole command from failing.
Is there a way in Greenplum to create partitions automatically based on the inserted data, as Hive does?
Thanks for your help.

Greenplum does not (currently) create additional partitions for data which does not fit into an existing partition.
If you have a default partition on the table it will receive all the records which do not fit into one of the specified partitions. You can then use ALTER TABLE ... SPLIT DEFAULT PARTITION (see the documentation if required) to create the new partitions for any new dates at the end of the load batch.
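As a rough sketch of that end-of-batch step, the helper below builds one SPLIT DEFAULT PARTITION statement per new date. It assumes daily range partitions on a date column; the table name `sales`, the `pYYYYMMDD` partition naming, and the function name are hypothetical, and the statement shape should be checked against the ALTER TABLE documentation for your Greenplum version.

```python
from datetime import date, timedelta


def split_default_statements(table, new_dates):
    """Build one ALTER TABLE ... SPLIT DEFAULT PARTITION statement per distinct day.

    `table` is assumed to be a trusted identifier; no quoting is applied.
    """
    statements = []
    for day in sorted(set(new_dates)):
        next_day = day + timedelta(days=1)
        # Hypothetical naming convention for the new partition: p + YYYYMMDD.
        name = f"p{day:%Y%m%d}"
        statements.append(
            f"ALTER TABLE {table} SPLIT DEFAULT PARTITION "
            f"START ('{day.isoformat()}') INCLUSIVE "
            f"END ('{next_day.isoformat()}') EXCLUSIVE "
            f"INTO (PARTITION {name}, DEFAULT PARTITION);"
        )
    return statements


if __name__ == "__main__":
    # Dates that landed in the default partition during the load (example values).
    for stmt in split_default_statements("sales", [date(2017, 2, 6), date(2017, 2, 7)]):
        print(stmt)
```

The list of dates would typically come from a SELECT DISTINCT over the rows that ended up in the default partition; how you obtain it depends on your load process.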

Related

How to safely append data into a partitioned Hive table?

I have a production Hive table partitioned by date. New data is generated hourly, and I need to merge it into the Hive table.
In case there are duplicate insertion requests or overlapping data among the hourly requests, I want to deduplicate each partition whenever I update it.
I reviewed the answer to "How to Append new data to already existing hive table", but I still have some questions:
How should I merge the new data into the existing partition? Should I create a temporary table for the new data, pull the existing data into it, deduplicate, and then OVERWRITE the partition of the production table?
Could a "dirty read" occur while the partition of the production Hive table is being overwritten? Is there a solution to this? I'm wondering if there is anything like an atomic RENAME.

PL/SQL INSERT INTO & Partitions

I would like to start by saying that I am still learning PL/SQL, and I would appreciate your opinion on the following questions. I want to insert data from Table X into Table Y via a package. I have already read about the insert itself, so that part is clear, but I was wondering what happens with existing partitions. Let's say that Table Y has partitions. What happens when we insert new data from Table X? Will it be distributed across the partitions or not? And if I have interval partitions, will they grow based on the incoming data from Table X? Do I need to "trigger" or call the partitions inside the package?
Thank you in advance for your help! I am just looking for tips and advice so that I can figure out how to set up my tables and the package (in case I need to call the partitions there), for example whether I should use interval partitions or not.
Thank you for your time!
If the data you're inserting fits the definition of the existing partitions then the new data will be added to the existing partitions. If the new data does not fit the definition of any existing partition, then either
a new partition will be created for the new data, if the table is set up to do so, or
the insert will fail and you will have to manually create a new partition capable of holding the new data.
Which of the above situations will happen in your case depends entirely on how your table was created, on what partitions already exist, on what the partitioning method and key is/are, and on the data you're trying to insert.
You don't have to "do" anything in your code to tell it which partition to use. A partitioned table behaves no differently than a non-partitioned table.
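The three outcomes described above can be illustrated with a small toy model. This is not Oracle code: it only mimics a range-partitioned table whose partitions are defined by exclusive upper bounds, with an optional fixed interval (in days) standing in for interval partitioning. The function name and its parameters are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta


def route_insert(upper_bounds, key, interval_days=None):
    """Toy model: decide where a row with partition key `key` would land.

    upper_bounds: sorted exclusive upper bounds, one per existing partition.
    interval_days: if set, partitions are created on demand in steps of this size.
    Returns (outcome, exclusive_upper_bound_of_the_partition).
    """
    # The first bound strictly greater than the key owns the row.
    idx = bisect_right(upper_bounds, key)
    if idx < len(upper_bounds):
        return "existing", upper_bounds[idx]
    if interval_days is None:
        # No partition accepts the key and none can be created automatically.
        raise ValueError(f"key {key} does not map to any partition")
    # Step forward from the highest existing bound in whole intervals.
    step = timedelta(days=interval_days)
    last = upper_bounds[-1]
    whole_intervals = (key - last) // step
    return "created", last + (whole_intervals + 1) * step


if __name__ == "__main__":
    bounds = [date(2017, 1, 1), date(2017, 7, 1)]
    print(route_insert(bounds, date(2017, 3, 15)))
    print(route_insert(bounds, date(2017, 7, 20), interval_days=7))
```

In every case the caller only supplies the key; the choice of partition is made by the table definition, which is the point made in the answer.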
Best of luck.

Copying Hive managed table by copying partition directories into warehouse

I have an existing bucketed table that has YEAR, MONTH, DAY partitioning, but I want to add additional partitioning by INGESTION_KEY, a column that doesn't exist in the existing table. This is to accommodate future table inserts so that I don't have to OVERWRITE a YEAR, MONTH, DAY partition every time I ingest data for that date; I can just do a simple INSERT INTO and create a new INGESTION_KEY partition.
I need a year's worth of data in my new table to start, so I want to copy a year of partitions from my existing table to a new table. Rather than doing a Hive INSERT for each partition, I thought it would be quicker to use distcp to copy files into the new table's partition directories in the Hive warehouse directory in HDFS, then ADD PARTITION to the new table.
So, this is all I'm doing:
hadoop distcp /apps/hive/warehouse/src_db.db/src_tbl/year=2017/month=02/day=06 /apps/hive/warehouse/dest_db.db/dest_tbl/year=2017/month=02/day=06/ingestion_key=123
hive -e "ALTER TABLE dest_tbl ADD PARTITION (year=2017,month=02,day=06,ingestion_key='123')"
Both are managed tables, the new table dest_tbl is clustered by the same column into the same number of buckets as the src_tbl, and the only difference in schema is the addition of INGESTION_KEY.
So far my SELECT * FROM dest_tbl shows everything in the new table looking normal. So my question is: is there anything wrong with this approach? Is it bad to INSERT to a managed, bucketed table this way, or is this an acceptable alternative to INSERT if no transformations are being done on the copied data?
Thanks!!
I prefer copying with a Hive query so that everything stays in Hive, but it is fine to copy the data files with other tools.
There is a dedicated command that adds the metadata for new partitions. You can use it in place of ALTER TABLE ... ADD PARTITION, and it can add many partitions at once:
MSCK REPAIR TABLE dest_tbl;
Keep using Hive's default partition directory format, partitionKey=partitionValue.
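For illustration, a minimal sketch of building a destination directory in that key=value layout before copying files into it. The helper name is hypothetical, and it assumes plain alphanumeric partition values, so no escaping of special characters is attempted.

```python
def partition_path(table_dir, spec):
    """Join (key, value) pairs into a key=value/key=value/... directory path.

    `spec` must list the partition columns in the table's partition order.
    """
    parts = [f"{key}={value}" for key, value in spec]
    return "/".join([table_dir.rstrip("/")] + parts)


if __name__ == "__main__":
    print(
        partition_path(
            "/apps/hive/warehouse/dest_db.db/dest_tbl",
            [("year", "2017"), ("month", "02"), ("day", "06"), ("ingestion_key", "123")],
        )
    )
```

The result matches the destination path used in the distcp command from the question.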

How to update Hive partitioned columns to a new set of columns?

I have a dynamically partitioned managed table in Hive, partitioned by (country, state).
I want to add one more column to the partition columns, so that the table is partitioned by (country, state, city).
I am thinking of using ALTER TABLE tab_nm DROP PARTITION on the old partitions and then another ALTER TABLE tab_nm ADD PARTITION to add the new set of columns.
One blog suggested creating a new table with the new partition columns and loading it from the table with the old partitions. But I do not want to recreate the table, as it is a huge production table.
I have not run the ALTER TABLE statements yet, because I am worried that DROP PARTITION may remove all the data in those partitions.
Please help.

Add sub partition on another column in oracle

I have a table which has two partitions (by range): first_half and second_half based on a column "INSERT_DAY".
I need to add subpartitions "SUCCESS" and "NONSUCCESS" based on the values of another column "STATUS" (subpartition by list) i.e. I need to transform my range partition to composite (range-list) partition.
I do not wish to drop existing tables or partitions. What is the ALTER query for this?
PS: The database is Oracle 9i
As far as I know, there is no ALTER statement for adding subpartitions to an existing partitioned table.
To get the desired result, perform the following steps:
1. Create a new table with the structure you want, including the partitions and subpartitions, using CREATE TABLE ... AS SELECT.
2. Swap the names of the two tables.
You can also explore DBMS_REDEFINITION, but if you have the luxury of a little downtime, it is not worth it.

Resources