I have a huge table containing about 500 million records. This table has 2 partitions, CURRENT_DATA and PREVIOUS_DATA.
When new data comes in, last PREVIOUS_DATA partition is truncated, and data from CURRENT_DATA partition is moved to PREVIOUS_DATA partition.
Right now, I am updating the records to move from one partition to another. However, it takes a long time. Is there any faster way to move data from one partition to another?
Don't strive to apply DML, but DROP, RENAME and MODIFY partitions as the partition previous_data is already been truncated such as
(presuming you have a list partition with VALUE = 'A' for the current, and VALUE = 'B' for the previous partitions)
ALTER TABLE tab
DROP PARTITION previous_data;
ALTER TABLE tab
RENAME PARTITION current_data TO previous_data;
ALTER TABLE tab
MODIFY PARTITION previous_data
ADD VALUES ('B');
ALTER TABLE tab
MODIFY PARTITION previous_data
DROP VALUES ('A');
ALTER TABLE tab
ADD PARTITION current_data VALUES ( 'A' );
Check out the partitions through
SELECT partition_name, high_value FROM USER_TAB_PARTITIONS WHERE table_name = 'TAB';
before and after applying those operations above(both cases should return the same results).
Be careful! Those are dangerous operation and shouldn't be tried within the production systems, but within the test databases !
If only goal is to separate current data from previous why not just to use two different tables PREVIOUS_DATA and CURRENT_DATA. You can easily get the complete picture by unioning the two together:
SELECT * FROM PREVIOUS_DATA
UNION ALL
SELECT * FROM CURRENT_DATA
When new data comes you can drop PREVIOUS_DATA table, rename CURRENT_DATA to PREVIOUS_DATA and then create new empty CURRENT_DATA table:
DROP TABLE PREVIOUS_DATA;
ALTER TABLE CURRENT_DATA RENAME TO PREVIOUS_DATA;
CREATE TABLE CURRENT_DATA AS SELECT * FROM PREVIOUS_DATA WHERE 1 = 2; /*add comments and constraints*/
Related
version 18.16.1
CREATE TABLE traffic (
`date` Date,
...
) ENGINE = MergeTree(date, (end_time), 8192);
I want to change as PARTITION BY toYYYYMMDD(date) without drop table how to do that.
Since ALTER query does not allow the partition alteration, the possible way is to create a new table
CREATE TABLE traffic_new
(
`date` Date,
...
)
ENGINE = MergeTree(date, (end_time), 8192)
PARTITION BY toYYYYMMDD(date);
and to move your data
INSERT INTO traffic_new SELECT * FROM traffic WHERE column BETWEEN x and xxxx;
Rename the final table if necessary.
And yes, this option involves deleting the old table (seems to be no way to skip this step)
I have a table with 2017 and 2018 year data. Need to create monthly partition on that table.
So I created one non partitioned table and loaded all the data from original table. now I am converting the new table to a monthly partitioned table.
When I am altering getting error as
ORA-14300: partitioning key maps to a partition outside maximum
permitted number of partitions
My Script is
ALTER TABLE ORDERHDR_PART MODIFY
PARTITION BY RANGE (LASTUPDATE) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION ORDERHDR_PART_JAN VALUES less than (TO_DATE('01-02-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_FEB VALUES less than (TO_DATE('01-03-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAR VALUES less than (TO_DATE('01-04-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_APR VALUES less than (TO_DATE('01-05-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAY VALUES less than (TO_DATE('01-06-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUN VALUES less than (TO_DATE('01-07-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUL VALUES less than (TO_DATE('01-08-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_AUG VALUES less than (TO_DATE('01-09-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_SEP VALUES less than (TO_DATE('01-10-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_OCT VALUES less than (TO_DATE('01-11-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_NOV VALUES less than (TO_DATE('01-12-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_DEC VALUES less than (TO_DATE('01-01-2019','DD-MM-YYYY'))
)ONLINE;
I think your approach is wrong.
First create a partitioned table, e.g.
CREATE TABLE ORDERHDR_PART (....)
PARTITION BY RANGE (LASTUPDATE) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION ORDERHDR_INITIAL VALUES less than (DATE '2000-01-01')
);
Then transfer existing data to the new table.
Either you use a simple INSERT INTO ORDERHDR_PART SELECT * FROM ORDERHDR_2017;
Oracle will create monthly partitions automatically based on LASTUPDATE value.
With this methods you would duplicate (temporary) your data and/or you may face a performance issue.
The other method is to use Exchanging Partitions, should be like this
ALTER TABLE ORDERHDR_PART
EXCHANGE PARTITION FOR (DATE '2017-01-01')
WITH TABLE ORDERHDR_2017
INCLUDING INDEXES;
I don't know whether "PARTITION FOR (DATE '2017-01-01')" is created automatically, perhaps you have to run INSERT INTO ORDERHDR_PART (LASTUPDATE) VALUES (DATE '2017-01-01'); ROLLBACK; in order to create it first.
You will get one partition for all months, afterwards you can split the partition with Splitting into Multiple Partitions. Should be like this:
ALTER TABLE ORDERHDR_PART SPLIT PARTITION FOR (DATE '2017-01-01') INTO (
PARTITION ORDERHDR_PART_JAN VALUES less than (TO_DATE('01-02-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_FEB VALUES less than (TO_DATE('01-03-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAR VALUES less than (TO_DATE('01-04-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_APR VALUES less than (TO_DATE('01-05-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAY VALUES less than (TO_DATE('01-06-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUN VALUES less than (TO_DATE('01-07-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUL VALUES less than (TO_DATE('01-08-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_AUG VALUES less than (TO_DATE('01-09-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_SEP VALUES less than (TO_DATE('01-10-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_OCT VALUES less than (TO_DATE('01-11-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_NOV VALUES less than (TO_DATE('01-12-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_DEC VALUES less than (TO_DATE('01-01-2019','DD-MM-YYYY'))
);
Note, by default you cannot drop the inital partition of a RANGE partitioned table. If you face this problem execute:
ALTER TABLE ORDERHDR_PART SET INTERVAL ();
ALTER TABLE ORDERHDR_PART DROP PARTITION ORDERHDR_INITIAL;
ALTER TABLE ORDERHDR_PART SET INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'));
I was working on one solution and found that in some particular cases, hive insert overwrite truncates the table however in few cases it doesn't. Would someone please explain me what it's behaving like that?
to explain this, I am table two tables, source and target and trying to insert data into master from source table using insert overwrite
When Source Table has partition
if source table has partition and if you write a condition such that partition does not exist then it won't truncate the master table.
create table source (name String) partitioned by (age int);
insert into source partition (age) values("gaurang", 11);
create table target (name String, age int);
insert into target partition (age) values("xxx", 99);
following query won't truncate the table even if select doesn't return anything.
insert overwrite table temp.test12 select * from temp.test11 where name="Ddddd" and age=99;
However, following query will truncate the table.
insert overwrite table temp.target select * from temp.test11 where name="Ddddd" and age=11;
it makes sense in the first case, as the partition(age=99) does not exist hence it should stop the execution of the query further. However this is my assumption, not sure what exactly happens.
When Source Table Doesn't have partition, but Target has
in this case target table won't be truncated even if select statement from source table returns 0 rows.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String) partitioned by (age int);
insert into source1 values ("gaurang", 11);
insert into target1 partition(age) values("xxx", 99);
select * from source1;
select * from target1;
Following query won't truncate the table even if no data found in select statement.
insert overwrite table temp.target1 partition(age) select * from temp.source1 where age=90;
When Source or Target don't have partition
In this case if I try to insert overwrite target and select statement doesn't return any row then target table will be truncated.
check the example below.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String, age int);
insert into source1 values ("gaurang", 11);
insert into target1 values("xxx", 99);
select * from source1;
select * from target1;
Following Query will truncate the target table.
insert overwrite table temp.target1 select * from temp.source1 where age=90;
Better use term 'overwrite' instead of truncate, because it is what exactly happening during insert overwrite.
When you write overwrite table temp.target1 partition(age) you instructs Hive to overwrite partitions, not all the target1 table, only those partitions which will be returned by select.
Empty dataset will not overwrite partitions in dynamic partition mode. because the partition to overwrite is unknown, partition should be taken from dataset, and the dataset is empty, nothing to overwrite then.
And in case of not partitioned table, it is already known that it should overwrite all the table, does not matter, empty dataset or not.
Partition column in insert overwrite statement should be the last. And the list of partitions to be overwritten in target = list of values in partition column, returned by dataset, does not matter how the source table is partitioned (you can select target partition column from any source table column, calculate it or use a constant), only what was returned does matter.
I have to create a procedure which searches any recently added records and if there are then move them to ARCHIVE table.
This is my statement which filters recently added records
SELECT
CL_ID,
CL_NAME,
CL_SURNAME,
CL_PHONE,
VEH_ID,
VEH_REG_NO,
VEH_MODEL,
VEH_MAKE_YEAR,
WD_ID,
WORK_DESC,
INV_ID,
INV_SERIES,
INV_NUM,
INV_DATE,
INV_PRICE
FROM
CLIENT,
INVOICE,
VEHICLE,
WORKS,
WORKS_DONE
WHERE
Client.CL_ID=Invoice.INV_CL_ID and
Invoice.INV_CL_ID = Client.CL_ID and
Client.CL_ID = Vehicle.VEH_CL_ID and
Vehicle.VEH_ID = Works_Done.WD_VEH_ID and
Works_done.WD_INV_ID = Invoice.INV_ID and
WORKS_DONE.WD_WORK_ID = Works.WORK_ID and
Works_done. Timestamp >= sysdate -1;
You may need something like this (pseudo-code):
create or replace procedure moveRecords is
vLimitDate timestamp := systimestamp -1;
begin
insert into table2
select *
from table1
where your_date >= vLimitDate;
--
delete table1
where your_date >= vLimitDate;
end;
Here are the steps I've used for this sort of task in the past.
Create a global temporary table (GTT) to hold a set of ROWIDs
Perform a multitable direct path insert, which selects the rows to be archived from the source table and inserts their ROWIDs into the GTT and the rest of the data into the archive table.
Perform a delete from the source table, where the source table ROWID is in the GTT of rowids
Issue a commit.
The business with the GTT and the ROWIDs ensures that you have 100% guaranteed stability in the set of rows that you are selecting and then deleting from the source table, regardless of any changes that might occur between the start of your select and the start of your delete (other than someone causing a partitioned table row migration or shrinking the table).
You could alternatively achieve that through changing the transaction isolation level.
O.K. may be something like this...
The downside is - it can be slow for large tables.
The upside is that there is no dependence on date and time - so you can run it anytime and synchronize your archives with live data...
create or replace procedure archive is
begin
insert into archive_table
(
select * from main_table
minus
select * from archive_table
);
end;
I have a question regarding a unified insert query against tables with different data
structures (Oracle). Let me elaborate with an example:
tb_customers (
id NUMBER(3), name VARCHAR2(40), archive_id NUMBER(3)
)
tb_suppliers (
id NUMBER(3), name VARCHAR2(40), contact VARCHAR2(40), xxx, xxx,
archive_id NUMBER(3)
)
The only column that is present in all tables is [archive_id]. The plan is to create a new archive of the dataset by copying (duplicating) all records to a different database partition and incrementing the archive_id for those records accordingly. [archive_id] is always part of the primary key.
My problem is with select statements to do the actual duplication of the data. Because the columns are variable, I am struggling to come up with a unified select statement that will copy the data and update the archive_id.
One solution (that works), is to iterate over all the tables in a stored procedure and do a:
CREATE TABLE temp as (SELECT * from ORIGINAL_TABLE);
UPDATE temp SET archive_id=something;
INSERT INTO ORIGINAL_TABLE (select * from temp);
DROP TABLE temp;
I do not like this solution very much as the DDL commands muck up all restore points.
Does anyone else have any solution?
How about creating a global temporary table for each base table?
create global temporary table tb_customers$ as select * from tb_customers;
create global temporary table tb_suppliers$ as select * from tb_suppliers;
You don't need to create and drop these each time, just leave them as-is.
You're archive process is then a single transaction...
insert into tb_customers$ as select * from tb_customers;
update tb_customers$ set archive_id = :v_new_archive_id;
insert into tb_customers select * from tb_customers$;
insert into tb_suppliers$ as select * from tb_suppliers;
update tb_suppliers$ set archive_id = :v_new_archive_id;
insert into tb_suppliers select * from tb_suppliers$;
commit; -- this will clear the global temporary tables
Hope this helps.
I would suggest not having a single sql statement for all tables and just use and insert.
insert into tb_customers_2
select id, name, 'new_archive_id' from tb_customers;
insert into tb_suppliers_2
select id, name, contact, xxx, xxx, 'new_archive_id' from tb_suppliers;
Or if you really need a single sql statement for all of them at least precreate all the temp tables (as temp tables) and leave them in place for next time. Then just use dynamic sql to refer to the temp table.
insert into ORIGINAL_TABLE_TEMP (SELECT * from ORIGINAL_TABLE);
UPDATE ORIGINAL_TABLE_TEMP SET archive_id=something;
INSERT INTO NEW_TABLE (select * from ORIGINAL_TABLE_TEMP);