In our database (Oracle 12c, Exadata) we plan to store sales data. Input text files containing sales data arrive daily (~1000 files every day, each containing ~20000 rows). The text files are read and transferred to the database as soon as possible. According to our calculations, the data will grow to about 5 terabytes in one year.
Data format:
[transaction date][category][sales_number][buyer_id][other columns]
Sales data comes in 10 different categories with the same fields. Logically, the data can be stored in a single table or divided into 10 tables (one per category).
What is the best practice for storing this kind of big data in Oracle? What kind of partitioning and indexing strategy should be applied?
Constraints: Data should be available to the marketing department for analysis within 2-3 days. Queries filter on [sales_number], on [category] and [buyer_id], or on [buyer_id] alone.
If the number of categories is known and fixed, you can use one subpartition per category.
One approach could be this one:
CREATE TABLE SALES_DATA
(
TRANSACTION_DATE TIMESTAMP(0) NOT NULL,
CATEGORY NUMBER NOT NULL,
SALES_NUMBER NUMBER,
BUYER_ID NUMBER,
[OTHER COLUMNS]
)
PARTITION BY RANGE (TRANSACTION_DATE) INTERVAL (INTERVAL '1' DAY)
SUBPARTITION BY LIST (CATEGORY)
SUBPARTITION TEMPLATE
(
SUBPARTITION CAT_1 VALUES (1),
SUBPARTITION CAT_2 VALUES (2),
SUBPARTITION CAT_3_AND_4 VALUES (3,4),
SUBPARTITION CAT_5 VALUES (5),
...
SUBPARTITION CAT_10 VALUES (10),
SUBPARTITION CAT_OTHERS VALUES (DEFAULT)
)
(
PARTITION P_INITIAL VALUES LESS THAN (TIMESTAMP '2018-01-01 00:00:00')
);
Local indexes would be needed on sales_number and buyer_id. You can put every (sub)partition into a separate tablespace if required.
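A minimal sketch of the local indexes mentioned above, assuming the SALES_DATA table from the DDL. The index names are made up for illustration.

```sql
-- LOCAL makes each index equipartitioned with SALES_DATA:
-- one index segment per (sub)partition of the table.
CREATE INDEX SALES_DATA_SALESNO_IX ON SALES_DATA (SALES_NUMBER) LOCAL;
CREATE INDEX SALES_DATA_BUYER_IX ON SALES_DATA (BUYER_ID) LOCAL;
```

Keep in mind that a lookup which does not also filter on TRANSACTION_DATE has to probe the index segment of every partition, so it helps if the queries can restrict the date range as well.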
Summary of the question: I want to create a range-partitioned table. However, records whose range value is not yet known should reside in a separate (default) partition and be moved to the correct partition once the value is filled in. The default partition would never be dropped, while the other partitions would be dropped by a script after a defined retention period.
The whole story:
I have a table where the records have to be placed in a partition based on a date field. This is a growing table, and after some time the data from these partitions can be purged. I used to create the table with something like the snippet below.
This works fine because we knew the value of the date column on which we partition (RDATE). However, in our new project we do not know this value when a record is inserted. It is eventually filled in during the course of the application processing.
My initial thought was to create a MAXPARTITION (MAXVALUE) as a catch-all partition for records which do not have the date filled in, and to enable ROW MOVEMENT so that a record moves into the appropriate partition when the date is filled in. However, I think it is not possible to combine a MAXVALUE partition with interval partitioning. Is that right?
Also, is there a better way to do this?
PARTITION BY RANGE ("RDATE") INTERVAL (NUMTODSINTERVAL (1,'DAY'))
SUBPARTITION BY HASH ("RKEY")
SUBPARTITION TEMPLATE (
SUBPARTITION "SP01",
SUBPARTITION "SP02",
SUBPARTITION "SP03",
SUBPARTITION "SP04",
SUBPARTITION "SP05",
SUBPARTITION "SP06",
SUBPARTITION "SP07",
SUBPARTITION "SP08",
SUBPARTITION "SP09",
SUBPARTITION "SP10",
SUBPARTITION "SP11",
SUBPARTITION "SP12",
SUBPARTITION "SP13",
SUBPARTITION "SP14",
SUBPARTITION "SP15",
SUBPARTITION "SP16" )
(PARTITION "INITIALPARTITION" VALUES LESS THAN (TO_DATE(' 2016-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')))
I expect a table with default and range partitions and records to move to the range partitions from the default when a column is filled.
The partition key of an interval-partitioned table cannot be NULL, but you can use a workaround like this:
CREATE TABLE ... (
...
RDATE DATE,
PARTITION_KEY DATE GENERATED ALWAYS AS (COALESCE(RDATE, DATE '1969-12-31'))
)
PARTITION BY RANGE (PARTITION_KEY) INTERVAL (NUMTODSINTERVAL (1,'DAY'))
...
(PARTITION INITIAL_PARTITION VALUES LESS THAN (DATE '1970-01-01'))
ENABLE ROW MOVEMENT;
If you insert a record with RDATE = NULL, it will be inserted into partition INITIAL_PARTITION. For the placeholder date (here 1969-12-31, just below the 1970-01-01 boundary) you must choose a value which will never occur among the "real" date values. You could also use a date in the far future, e.g.
CREATE TABLE ... (
...
RDATE DATE,
PARTITION_KEY DATE GENERATED ALWAYS AS (COALESCE(RDATE, DATE '2999-12-31'))
)
PARTITION BY RANGE (PARTITION_KEY) INTERVAL (NUMTODSINTERVAL (1,'DAY'))
...
(PARTITION INITIAL_PARTITION VALUES LESS THAN (DATE '2019-04-01'))
ENABLE ROW MOVEMENT;
-- Create DEFAULT_PARTITION
INSERT INTO ... (RDATE) VALUES (NULL);
ROLLBACK;
ALTER TABLE ... RENAME PARTITION FOR (TIMESTAMP '2999-12-31 00:00:00') TO DEFAULT_PARTITION;
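One thing to check with the far-future variant: a partitioned table is limited to 1048575 (1024K - 1) partitions, and with interval partitioning every interval between the initial boundary and the placeholder date counts towards that limit, whether or not the partition exists. A quick sanity check, using the two dates from the example above:

```sql
-- Number of daily intervals between the initial boundary and the placeholder date.
-- It has to stay below 1048575, otherwise the insert of a NULL RDATE
-- fails with ORA-14300.
SELECT DATE '2999-12-31' - DATE '2019-04-01' AS days_between
FROM dual;
```

With daily intervals this comes to roughly 358,000, so the example is within the limit. A placeholder such as 9999-12-31 would not be.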
I have a table with data for the years 2017 and 2018 and need to partition it by month.
So I created a non-partitioned table and loaded all the data from the original table into it. Now I am converting the new table to a monthly partitioned table.
When I run the ALTER statement I get this error:
ORA-14300: partitioning key maps to a partition outside maximum permitted number of partitions
My script is:
ALTER TABLE ORDERHDR_PART MODIFY
PARTITION BY RANGE (LASTUPDATE) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION ORDERHDR_PART_JAN VALUES less than (TO_DATE('01-02-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_FEB VALUES less than (TO_DATE('01-03-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAR VALUES less than (TO_DATE('01-04-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_APR VALUES less than (TO_DATE('01-05-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAY VALUES less than (TO_DATE('01-06-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUN VALUES less than (TO_DATE('01-07-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUL VALUES less than (TO_DATE('01-08-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_AUG VALUES less than (TO_DATE('01-09-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_SEP VALUES less than (TO_DATE('01-10-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_OCT VALUES less than (TO_DATE('01-11-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_NOV VALUES less than (TO_DATE('01-12-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_DEC VALUES less than (TO_DATE('01-01-2019','DD-MM-YYYY'))
)ONLINE;
I think your approach is wrong.
First create a partitioned table, e.g.
CREATE TABLE ORDERHDR_PART (....)
PARTITION BY RANGE (LASTUPDATE) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION ORDERHDR_INITIAL VALUES less than (DATE '2000-01-01')
);
Then transfer existing data to the new table.
One option is a simple INSERT INTO ORDERHDR_PART SELECT * FROM ORDERHDR_2017;
Oracle will create the monthly partitions automatically based on the LASTUPDATE value.
With this method you temporarily duplicate your data, and you may face a performance issue.
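To see which partitions the INSERT actually created, you can query the data dictionary. This is only a sketch; it assumes the table lives in your own schema, otherwise use ALL_TAB_PARTITIONS with an owner filter.

```sql
-- INTERVAL = 'YES' marks partitions that Oracle created automatically;
-- the initial range partition shows 'NO'.
SELECT partition_name, high_value, interval
FROM user_tab_partitions
WHERE table_name = 'ORDERHDR_PART'
ORDER BY partition_position;
```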
The other method is to use Exchanging Partitions; it should look like this:
ALTER TABLE ORDERHDR_PART
EXCHANGE PARTITION FOR (DATE '2017-01-01')
WITH TABLE ORDERHDR_2017
INCLUDING INDEXES;
I don't know whether the partition for DATE '2017-01-01' is created automatically; perhaps you have to run INSERT INTO ORDERHDR_PART (LASTUPDATE) VALUES (DATE '2017-01-01'); ROLLBACK; first in order to create it.
You will get one partition containing all months; afterwards you can split that partition as described under Splitting into Multiple Partitions. It should look like this:
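As an alternative to the dummy insert, Oracle's documentation on exchanging interval partitions suggests locking the partition, which creates it if it does not exist yet. A sketch, not tested against this table:

```sql
-- Locking a not-yet-existing interval partition materializes it.
-- The lock itself is released at the next COMMIT or ROLLBACK
-- (a following DDL statement commits implicitly).
LOCK TABLE ORDERHDR_PART PARTITION FOR (DATE '2017-01-01') IN SHARE MODE;
```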
ALTER TABLE ORDERHDR_PART SPLIT PARTITION FOR (DATE '2017-01-01') INTO (
PARTITION ORDERHDR_PART_JAN VALUES less than (TO_DATE('01-02-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_FEB VALUES less than (TO_DATE('01-03-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAR VALUES less than (TO_DATE('01-04-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_APR VALUES less than (TO_DATE('01-05-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_MAY VALUES less than (TO_DATE('01-06-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUN VALUES less than (TO_DATE('01-07-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_JUL VALUES less than (TO_DATE('01-08-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_AUG VALUES less than (TO_DATE('01-09-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_SEP VALUES less than (TO_DATE('01-10-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_OCT VALUES less than (TO_DATE('01-11-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_NOV VALUES less than (TO_DATE('01-12-2018','DD-MM-YYYY')),
PARTITION ORDERHDR_PART_DEC VALUES less than (TO_DATE('01-01-2019','DD-MM-YYYY'))
);
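Note that by default you cannot drop the initial partition of an interval-partitioned table. If you face this problem, temporarily switch the table to plain range partitioning, drop the partition, and re-enable the interval: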
ALTER TABLE ORDERHDR_PART SET INTERVAL ();
ALTER TABLE ORDERHDR_PART DROP PARTITION ORDERHDR_INITIAL;
ALTER TABLE ORDERHDR_PART SET INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'));
I want to create a table where I do partition by hash on one column and subpartition by list on another column. Table creation should look like below:
CREATE TABLE testt
(
Id CHAR(3),
time DATE,
month AS (EXTRACT (MONTH FROM time))
)
PARTITION BY HASH (Id)
PARTITIONS 4
STORE IN (ts1, ts2, ts3, ts4)
SUBPARTITION BY LIST (month)
SUBPARTITION template
(
SUBPARTITION JANUARY VALUES (01),
SUBPARTITION FEBRUARY VALUES (02),
...
)
I need to maintain partition by hash for legacy reasons. I can change subpartition to Range/Hash.
But Oracle simply does not let me create a table partitioned by hash and subpartitioned by list/range/hash. I searched a lot but could not find a single example. Now I am wondering whether it is supported at all. Can someone please let me know how to do it?
Your statement has invalid syntax, see http://docs.oracle.com/database/121/SQLRF/statements_7002.htm#CJABBBAI.
The specification of hash partition count and tablespaces should be after the subpartition templates.
CREATE TABLE testt
(
Id CHAR(3),
time DATE,
month AS (EXTRACT (MONTH FROM time))
)
PARTITION BY HASH (Id)
SUBPARTITION BY LIST (month)
SUBPARTITION template (
SUBPARTITION JANUARY VALUES (01),
SUBPARTITION FEBRUARY VALUES (02),
...
)
PARTITIONS 4
STORE IN (ts1, ts2, ts3, ts4)
Below is the simplified structure of a table:
create table customer(
incident_id number,
customer_id number,
customer_name varchar2(400),
sla_id number,
failure_start_date date,
failure_end_date date,
churn_flag number, -- 0 or 1
active number, -- 0 or 1
constraint pk_incident_id primary key (incident_id))
PARTITION BY LIST (active)
SUBPARTITION BY LIST (churn_flag)
SUBPARTITION TEMPLATE
( SUBPARTITION sp_churn_flag_1 VALUES (1)
, SUBPARTITION sp_churn_flag_0 VALUES (0)
)
(PARTITION sp_active_1 values (1)
, PARTITION sp_active_0 VALUES (0)
)
ENABLE ROW MOVEMENT COMPRESS FOR QUERY LOW;
Now I need to add interval-range partitioning on top of the existing composite list partitioning, in order to partition the data by month (failure_start_date, YYYYMM). The table contains data from 200701 up to now (201511). Rows with failure_start_date before 2013 should go into a single partition for older data. Every newer month should have a dedicated partition, and partitions for upcoming months should be created automatically.
How can this be integrated into the existing partitioning?
You cannot do it exactly the way you want. Partitioning strategies are limited in two relevant ways: first, composite strategies can only have two levels (you would need three), and second, interval partitioning, when used in a composite strategy, must be at the top level.
Here is the closest legal thing to what you want:
CREATE TABLE matt_customer
(
incident_id NUMBER,
customer_id NUMBER,
customer_name VARCHAR2 (400),
sla_id NUMBER,
failure_start_date DATE,
failure_end_date DATE,
churn_flag VARCHAR2 (1), -- 0 or 1
active VARCHAR2 (1), -- 0 or 1
active_churn_flags VARCHAR2 (2) GENERATED ALWAYS AS (active || churn_flag) VIRTUAL,
CONSTRAINT pk_incident_id PRIMARY KEY (incident_id)
)
PARTITION BY RANGE
(failure_start_date)
INTERVAL ( NUMTOYMINTERVAL (1, 'MONTH') )
SUBPARTITION BY LIST
(active_churn_flags)
SUBPARTITION TEMPLATE (
SUBPARTITION sp_ac_00 VALUES ('00'),
SUBPARTITION sp_ac_01 VALUES ('01'),
SUBPARTITION sp_ac_10 VALUES ('10'),
SUBPARTITION sp_ac_11 VALUES ('11'))
(PARTITION customer_old VALUES LESS THAN (TO_DATE ('01-JAN-2013', 'DD-MON-YYYY')))
ENABLE ROW MOVEMENT
--COMPRESS FOR QUERY LOW;
;
This uses interval-list partitioning, and uses a virtual column to combine your active and churn_flag columns into one (I turned those columns into VARCHAR2(1) for simplicity).
To make use of partition pruning, your queries would need to be modified to select active_churn_flags = '01' for example, instead of specifying values for active and churn_flag independently.
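For illustration, a query written that way might look like the sketch below. The selected columns and the date range are just examples; the virtual column value is active followed by churn_flag, so '10' means active = '1' and churn_flag = '0'.

```sql
-- The date predicate should prune to the November 2015 partition,
-- and the virtual-column predicate to subpartition sp_ac_10 within it.
SELECT incident_id, customer_id, failure_start_date
FROM matt_customer
WHERE active_churn_flags = '10'
AND failure_start_date >= DATE '2015-11-01'
AND failure_start_date < DATE '2015-12-01';
```

You can check the Pstart/Pstop columns of the execution plan to confirm that pruning really happens in your environment.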
I'm trying to add partitions to a table I created. I want it partitioned on PARTITION_GRP and subpartitioned by month, but I don't know how to write the INTERVAL clause inside a subpartition.
Can someone help me with this? Thanks!
PARTITION BY RANGE (PARTITION_GRP)
SUBPARTITION BY RANGE (RPTG_MTH_DATE)
INTERVAL(NUMTOYMINTERVAL(1,'MONTH'))
(
PARTITION PG_0 VALUES LESS THAN (1)
(SUBPARTITION PG_0_201401 VALUES LESS THAN (TO_DATE('1-FEB-2014', 'DD-MON-YYYY'))),
PARTITION PG_1 VALUES LESS THAN (2)
(SUBPARTITION PG_1_201401 VALUES LESS THAN (TO_DATE('1-FEB-2014', 'DD-MON-YYYY'))),
PARTITION PG_2 VALUES LESS THAN (3)
(SUBPARTITION PG_2_201401 VALUES LESS THAN (TO_DATE('1-FEB-2014', 'DD-MON-YYYY'))),
PARTITION PG_3 VALUES LESS THAN (4)
(SUBPARTITION PG_3_201401 VALUES LESS THAN (TO_DATE('1-FEB-2014', 'DD-MON-YYYY'))),
PARTITION PG_4 VALUES LESS THAN (MAXVALUE)
(SUBPARTITION PG_4_201401 VALUES LESS THAN (TO_DATE('1-FEB-2014', 'DD-MON-YYYY')))
)
From the documentation:
Restrictions on Interval Partitioning
The INTERVAL clause is subject to the following restrictions:
You can specify only one partitioning key column, and it must be of NUMBER or DATE type.
This clause is not supported for index-organized tables.
You cannot create a domain index on an interval-partitioned table.
Interval partitioning is not supported at the subpartition level.
...
So you can't have interval subpartitioning.
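What is allowed is the reverse arrangement: interval partitioning by month at the top level, with PARTITION_GRP as the subpartition key. A minimal sketch, assuming both columns are NOT NULL; the table name, column types and partition names are made up because the question does not show the full DDL:

```sql
CREATE TABLE rptg_data  -- hypothetical table name
(
  PARTITION_GRP NUMBER NOT NULL,
  RPTG_MTH_DATE DATE NOT NULL
  -- other columns
)
PARTITION BY RANGE (RPTG_MTH_DATE) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
SUBPARTITION BY RANGE (PARTITION_GRP)
SUBPARTITION TEMPLATE
(
  SUBPARTITION PG_0 VALUES LESS THAN (1),
  SUBPARTITION PG_1 VALUES LESS THAN (2),
  SUBPARTITION PG_2 VALUES LESS THAN (3),
  SUBPARTITION PG_3 VALUES LESS THAN (4),
  SUBPARTITION PG_4 VALUES LESS THAN (MAXVALUE)
)
(
  -- Only the first monthly partition is declared; later months are created on demand.
  PARTITION P_INITIAL VALUES LESS THAN (DATE '2014-02-01')
);
```

Each automatically created monthly partition then gets the five PG_n subpartitions from the template.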