Automatic Partitioning in Oracle 11g

I have a table with a huge volume of data. I need partitioning to be done on a daily basis, automatically, and I need each partition's name to be the date portion of SYSDATE. How can I go about doing this?

There is currently (11gR2) no way to specify a name for the auto-generated partitions in an interval-partitioned table. See Common Questions On Interval Partitioning [ID 1479115.1] (Oracle support account required):
What will be the names of the automatically created interval partitions?
[...] Currently it is not possible to specify a mask or template for partition names, but the system generated name can be renamed [...]
You're also restricted to a single partition key column, which must be of type DATE or NUMBER, among a few other restrictions (see that note).
You can follow the example in the Creating Partitions documentation for the syntax:
create table foo (date_created date, ...)
partition by range(date_created)
interval(numtodsinterval(1, 'DAY'))
(partition one values less than (to_date('01012013', 'DDMMYYYY')));
With the above, a new partition will be created automatically whenever you insert a row with a date_created of 1 January 2013 or later. New partitions will not be created for earlier dates; those rows go into the initial partition ONE.
To work around the partition name issue (if necessary at all), you could rename the partitions based on HIGH_VALUE in USER_TAB_PARTITIONS, although that doesn't sound very nice.
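A minimal sketch of that rename approach; it assumes the FOO table above, and the P_YYYYMMDD naming convention is purely illustrative:
DECLARE
  v_high DATE;
BEGIN
  -- HIGH_VALUE is a LONG holding an expression such as
  -- TO_DATE(' 2013-01-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ...),
  -- so evaluate it with dynamic SQL to get the exclusive upper bound.
  FOR p IN (SELECT partition_name, high_value
              FROM user_tab_partitions
             WHERE table_name = 'FOO'
               AND partition_name LIKE 'SYS\_P%' ESCAPE '\')
  LOOP
    EXECUTE IMMEDIATE 'SELECT ' || p.high_value || ' FROM dual' INTO v_high;
    -- each daily interval partition covers the day before its upper bound
    EXECUTE IMMEDIATE 'ALTER TABLE foo RENAME PARTITION ' || p.partition_name
                      || ' TO P_' || TO_CHAR(v_high - 1, 'YYYYMMDD');
  END LOOP;
END;
/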
Another option is to not rename them at all and use this syntax when you want to query a specific partition:
select *
from foo
partition for (<the day you're interested in>);
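For instance, to read the (system-named) partition covering an illustrative date:
select *
from foo
partition for (date '2013-06-05');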
See for example: Oracle Interval Partitioning Tips.

Related

Move Range Interval partition data from one table to history table in other database

We have a primary table that is range-partitioned by date with a 1-month interval. It's also list-subpartitioned with 4 distinct values, so essentially each monthly partition has 4 subpartitions.
Database: Oracle 19c
I need advice on how to effectively move the partition/sub-partition data from active schema to historical schema in another database.
Also, there are about 30 tables that are reference-partitioned on the primary table, for which the data needs to be moved as well. Overall I'm looking to move about 2,500 subpartitions.
I'm not sure if an exchange partition would be the right approach in this scenario?
TIA
You could use exchange to get the data rapidly out of your active table, but you would still then need to send that table over the wire to the remote history database to load it in.
In which case, using "exchange" probably just adds more steps to the process for little gain. (There are still potential uses here depending on how you want to handle indexing etc.)
But the simplest approach is perhaps just transferring the data over, assuming a common structure between the two tables, i.e.
insert /*+ APPEND */ into history_table@remote_db
select * from active_table partition ( myparname )
I can't remember whether partition-naming syntax is supported over a db link, but if not, the appropriate date predicates will do the same trick (see the sketch below). Then just follow up with:
alter table active_table truncate partition myparname;
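The date-predicate variant of the insert might look like the following sketch; the partition key column bill_date and the month bounds are illustrative assumptions:
insert /*+ APPEND */ into history_table@remote_db
select *
from active_table
where bill_date >= date '2023-01-01'  -- lower bound of the partition being moved
and   bill_date <  date '2023-02-01'  -- its exclusive upper bound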

Oracle {LIST} Partition by Program and {RANGE} subpartition by DATE with interval

I'm trying to figure out the best possible partitioning strategy in Oracle 12c (12.2.0.1.0).
This post
is almost identical to my requirements. However, I want to know the best possible way to implement this in Oracle 12c (12.2.0.1.0).
Here is my question:
We have four (4) distinct programs for which the bills are submitted in our system.
The approximate volume of bills submitted per year is as follows:
Program_1 ~ 3M per year
Program_2 ~ 1M per year
Program_3 ~ 500K per year
Program_4 ~ 100K per year
My initial thought is to PARTITION BY LIST (PROGRAM) and SUBPARTITION BY RANGE (BILL_SUBMISSION_DATE).
I would like to use Oracle's INTERVAL feature for the subpartitions, and would like to know if there are any limitations with this approach.
Your approach of partitioning by PROGRAM and sub-partitioning by BILL_SUBMISSION_DATE sounds good.
I have not tested performance differences (I imagine they would be negligible), but for coding the INTERVAL option makes querying and maintenance easier in my opinion.
For the following example the table partition clause I used was:
partition by range (INVOICE_MONTH) interval (numtoyminterval(1, 'MONTH'))
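For completeness, a sketch of a table using that clause; all columns other than INVOICE_MONTH are illustrative:
create table MARK_INV_HDR (
  invoice_id    number,
  invoice_month date,
  program       varchar2(20)
)
partition by range (INVOICE_MONTH) interval (numtoyminterval(1, 'MONTH'))
-- anchor partition; later months get system-generated names
(partition INV201204 values less than (to_date('2012-05-01', 'yyyy-mm-dd')));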
Example query using old-style partition names: query the partition for invoices from April 2012, assuming I created a partition named INV201204 for that month:
select * from MARK_INV_HDR
partition (INV201204);
And the same query, using INTERVAL automatically generated partitions:
select * from MARK_INV_HDR
where invoice_month = to_date('2012-04', 'yyyy-mm');
The advantage of the latter query is that I don't have to know the naming convention of the partitions.
To drop the oldest partition, one query and one DDL:
select to_char(min(invoice_month), 'dd-Mon-yyyy') as min_inv_dt from MARK_INV_HDR;
MIN_INV_DT
-----------
01-Apr-2012
alter table mark_inv_hdr
drop partition for (TO_DATE('01-Apr-2012', 'dd-Mon-yyyy'))
update global indexes;
EDIT: I forgot that you cannot use the INTERVAL clause on a sub-partition; thanks to user12283435 for the reminder. Looking more closely at the question, there is probably no need to partition on PROGRAM at all, so a single PARTITION BY RANGE on BILL_SUBMISSION_DATE with the INTERVAL clause should work fine.
When you have a small set of values like you do for PROGRAM, there is no obvious reason to partition on it. The typical example of list partitioning in the Oracle documentation is a list of regions for a global call center, so that you can run batch reports and maintenance on certain regions after business hours, etc. Instead you can put a bitmap index on PROGRAM (on a partitioned table, a bitmap index must be a LOCAL index) if you don't do many updates and your query criteria frequently include just one PROGRAM. (Updating a column with a bitmap index will briefly lock the table.)
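A minimal sketch of that recommended design, with illustrative column names and an assumed anchor date:
create table bills (
  bill_id              number,
  program              varchar2(20),
  bill_submission_date date
)
partition by range (bill_submission_date)
interval (numtoyminterval(1, 'MONTH'))
(partition p0 values less than (date '2018-01-01'));

-- bitmap indexes on partitioned tables must be local
create bitmap index bills_program_bix on bills (program) local;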

Fast data migration on the same database

I'm trying to find a way to perform a migration from two tables on the same database. This migration should be as fast as possible in order to minimize the downtime.
To put it on an example lets say I have a person table like so:
person_table -> (id, name, address)
So a person has an id, a name and an address. My system will contain millions of person records, and it was decided that the person table should be partitioned. To do so, I've created a new table:
partitioned_person_table->(id,name,address,partition_time)
Now this table will contain an extra column called partition_time, which is the partition key, since this is a range-partitioned table (one partition every hour).
Finally, I need to find a way to move all the information from the person_table to the partitioned_person_table with the best performance.
The first thing I could try is to simply create a statement like:
INSERT INTO partitioned_person_table (id, name, address, partition_time)
SELECT id, name, address, CURRENT_TIMESTAMP FROM person_table;
The problem is that with millions of records this might become very slow (and the temporary tablespace might not be able to handle all this data).
My second approach was to use the EXCHANGE PARTITION method. Unfortunately, I cannot do this because the tables have different numbers of columns.
Is there any other way I can perform this with the best performance (least downtime)?
Thank you.
If you can live with all the current records ending up in one partition (and your INSERT approach suggests you can), you need only:
1) add a new column partition_time, either as NULL or with a metadata-only default (the latter requires 12c);
2) switch the table to a partitioned table, either with online redefinition (if you have no maintenance window where the table can be offline) or with exchange partition otherwise.
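On 12.2 or later there is also a one-statement path, sketched below; the anchor date and the hourly interval are illustrative:
-- 12c: adding a column with a default is a fast, metadata-only change
alter table person_table add (
  partition_time date default sysdate not null
);

-- 12.2+: convert the existing heap table to a partitioned table online
alter table person_table modify
  partition by range (partition_time)
  interval (numtodsinterval(1, 'HOUR'))
  (partition p0 values less than (date '2024-01-01'))
  online;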

Query a table in different ways or orderings in Cassandra

I've recently started to play around with Cassandra. My understanding is that in a Cassandra table you define two keys, which can be single-column or composite:
The Partitioning Key: determines how to distribute data across nodes
The Clustering Key: determines the order in which records sharing the same partitioning key (i.e. within the same partition) are written. This is also the order in which the records will be read.
Data from a table will always be sorted in the same order, which is the order of the clustering key column(s). So a table must be designed for a specific query.
But what if I need to perform two different queries on the data from a table? What is the best way to solve this when using Cassandra?
Example Scenario
Let's say I have a simple table containing posts that users have written:
CREATE TABLE posts (
username varchar,
creation timestamp,
content varchar,
PRIMARY KEY ((username), creation)
);
This table was "designed" to perform the following query, which works very well for me:
SELECT * FROM posts WHERE username='luke' [ORDER BY creation DESC];
Queries
But what if I need to get all posts regardless of the username, in order of time:
Query (1): SELECT * FROM posts ORDER BY creation;
Or get the posts in alphabetical order of the content:
Query (2): SELECT * FROM posts WHERE username='luke' ORDER BY content;
I know that it's not possible given the table I created, but what are the alternatives and best practices to solve this?
Solution Ideas
Here are a few ideas spawned from my imagination (just to show that at least I tried):
Querying with the IN clause to select posts from many users. This could help in Query (1). When using the IN clause, you can fetch globally sorted results if you disable paging. But using the IN clause quickly leads to bad performance when the number of usernames grows.
Maintaining full copies of the table for each query, each copy using its own PRIMARY KEY adapted to the query it is trying to serve.
Having a main table with a UUID as partitioning key. Then creating smaller copies of the table for each query, which only contain the (key) columns useful for their own sort order, and the UUID for each row of the main table. The smaller tables would serve only as "sorting indexes" to query a list of UUID as result, which can then be fetched using the main table.
I'm new to NoSQL; I just want to know what the correct/durable/efficient way of doing this is.
The query SELECT * FROM posts ORDER BY creation; will result in a full cluster scan because you do not provide any partition key. And the ORDER BY clause in this query won't work anyway.
Your requirement "I need to get all posts regardless of the username, in order of time" is very hard to achieve in a distributed system, because it requires you to:
fetch all user posts and move them to a single node (coordinator)
order them by date
take top N latest posts
Point 1 requires a full table scan: as long as you don't fetch all records, the ordering cannot be achieved. The alternative is to use a Cassandra clustering column to order at insertion time, but in that case all posts are stored in the same partition, and that partition will grow forever ...
The query SELECT * FROM posts WHERE username='luke' ORDER BY content; is possible using a denormalized table or with the new materialized view feature (http://www.doanduyhai.com/blog/?p=1930).
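A minimal sketch of the materialized-view variant (Cassandra 3.0+); the view name is illustrative:
CREATE MATERIALIZED VIEW posts_by_content AS
SELECT * FROM posts
WHERE username IS NOT NULL AND content IS NOT NULL AND creation IS NOT NULL
PRIMARY KEY ((username), content, creation);

-- rows are clustered by content within each username partition,
-- so no ORDER BY is needed:
SELECT * FROM posts_by_content WHERE username = 'luke';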
Question 1:
Depending on your use case, I bet you could model this with time buckets sized for the range of times you're usually interested in.
You can do this by making the partition key a year, year-month, or year-month-day (or finer time intervals).
The basic idea is that you bucket changes into whatever granularity suits your use case. For example:
If you often need to search these posts over months in the past, then you may want to use the year as the PK.
If you usually need to search the posts over several days in the past, then you may want to use a year-month as the PK.
If you usually need to search the posts from yesterday or the last couple of days, then you may want to use a year-month-day as your PK.
I'll give a fleshed-out example with yyyy-mm-dd as the PK:
The table will now be:
CREATE TABLE posts_by_creation (
creation_year int,
creation_month int,
creation_day int,
creation timeuuid,
username text, -- using text instead of varchar, they're essentially the same
content text,
PRIMARY KEY ((creation_year,creation_month,creation_day), creation)
)
I changed creation to be a timeuuid to guarantee a unique row for each post creation event. If we used just a timestamp you could theoretically overwrite an existing post creation record in here.
Now we can insert rows, deriving the partition key (PK) columns creation_year, creation_month, creation_day from the current creation time:
INSERT INTO posts_by_creation (creation_year, creation_month, creation_day, creation, username, content) VALUES (2016, 4, 2, now(), 'fromanator', 'content update1');
INSERT INTO posts_by_creation (creation_year, creation_month, creation_day, creation, username, content) VALUES (2016, 4, 2, now(), 'fromanator', 'content update2');
now() is a CQL function that generates a timeuuid; you would probably want to generate this in the application instead, parse out the yyyy-mm-dd for the PK, and then insert the timeuuid into the clustering column.
As a usage example for this table, let's say you wanted to see all of the changes made today; your CQL would look like:
SELECT * FROM posts_by_creation WHERE creation_year = 2016 AND creation_month = 4 AND creation_day = 2;
Or if you wanted to find all of the changes today after 5pm central:
SELECT * FROM posts_by_creation WHERE creation_year = 2016 AND creation_month = 4 AND creation_day = 2 AND creation >= minTimeuuid('2016-04-02 05:00-0600');
minTimeuuid() is another CQL function; it creates the smallest possible timeuuid for the given time, which guarantees that you get all of the changes from that time onwards.
Depending on the time spans you may need to query a few different partition keys, but it shouldn't be that hard to implement. Also, you would want to change the creation column to a timeuuid in your other table as well.
Question 2:
You'll have to create another table or use materialized views to support this new query pattern, just like you thought.
Lastly, if you're not on Cassandra 3.x+ or don't want to use materialized views, you can use atomic batches to ensure data consistency across your several denormalized tables (that's what they were designed for). So in your case it would be a BATCH statement with three inserts of the same data to the three different tables that support your query patterns, as sketched below.
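In this sketch, posts_by_username_content is a hypothetical third table (one per query pattern). In practice you would generate a single timeuuid client-side and bind it to every insert, since each now() call below would otherwise yield a different value (toTimestamp() needs Cassandra 2.2+; older versions use dateOf()):
BEGIN BATCH
  INSERT INTO posts (username, creation, content)
    VALUES ('luke', toTimestamp(now()), 'hello');
  INSERT INTO posts_by_creation (creation_year, creation_month, creation_day, creation, username, content)
    VALUES (2016, 4, 2, now(), 'luke', 'hello');
  INSERT INTO posts_by_username_content (username, content, creation)
    VALUES ('luke', 'hello', toTimestamp(now()));
APPLY BATCH;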
The solution is to create additional tables to support your queries.
For SELECT * FROM posts ORDER BY creation;, you may need a bucketing column to group rows, for example by year and month: PRIMARY KEY ((year, month), creation). This way Cassandra performs better on reads, because it doesn't need to scan the whole cluster to get all the data, and it also saves data transfer between nodes.
The same goes for SELECT * FROM posts WHERE username='luke' ORDER BY content;: you must create another table for this query too. All columns may be the same as in your first table but with a different primary key, because you cannot order by a column that is not a clustering column.
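A sketch of such a table; the name is illustrative, and writes must then go to both tables:
CREATE TABLE posts_by_username_content (
  username varchar,
  content varchar,
  creation timestamp,
  PRIMARY KEY ((username), content, creation)
);

-- Query (2) then needs no ORDER BY: rows come back sorted by content
SELECT * FROM posts_by_username_content WHERE username = 'luke';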

"Dynamic" partitions in oracle 11g

I have a log table with a lot of information.
I would like to partition it in two: the first partition holds the logs from the past month, since they are commonly viewed; the second holds the logs from the rest of the year (compressed).
My problem is that all the examples of partitions use fixed dates: "up until 1/1/2013", "more recent than 1/1/2013"...
What I am looking for/expecting is a way to define a partition on the last month, so that when the day changes, logs older than 30 days are "automatically" transferred to the compressed partition.
I guess I could create another table which is completely compressed and move data into it using jobs, but I was hoping for a built-in solution.
Thank you.
I think you want interval partitions based on a date. This will automatically generate the partitions for you. For example, monthly partitions would be:
create table test_data (
created_date DATE default sysdate not null,
store_id NUMBER,
inventory_id NUMBER,
qty_sold NUMBER
)
PARTITION BY RANGE (created_date)
INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION part_01 values LESS THAN (TO_DATE('20130101','YYYYMMDD'))
)
As data is inserted, Oracle will put it into the proper partition, or create a new partition if needed. The partition names will be a bit cryptic (SYS_Pxxxx), but you can use the PARTITION FOR clause to grab only the month you want. For example:
select * from test_data partition for (to_date('20130101', 'YYYYMMDD'))
It is not possible to have data transferred to a compressed partition automatically. You can, however, schedule a simple job to compress last month's partition at the beginning of every month with this statement:
ALTER TABLE some_table
MOVE PARTITION FOR (add_months(trunc(SYSDATE), -1))
COMPRESS;
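For instance, a sketch of such a job using DBMS_SCHEDULER; the job name and schedule are illustrative, and note that MOVE invalidates local indexes unless you add UPDATE INDEXES:
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'COMPRESS_LAST_MONTH',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[BEGIN
                            EXECUTE IMMEDIATE
                              'ALTER TABLE some_table
                                 MOVE PARTITION FOR (add_months(trunc(SYSDATE), -1))
                                 COMPRESS UPDATE INDEXES';
                          END;]',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MONTHLY;BYMONTHDAY=1;BYHOUR=2',  -- 2am on the 1st
    enabled         => TRUE);
END;
/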
If you wanted to stay with only two partitions (current month, and an archive for all past transactions), you could also merge partitions with ALTER TABLE ... MERGE PARTITIONS, but as far as I know that rebuilds the whole archive partition each time, so I would discourage it and stay with storing each month in its own partition.
