I have a fact table with millions of records. The table is range partitioned on a date column.
FACT_AUM (ACCOUNT_ID VARCHAR2(30),MARKET_VALUE NUMBER(20,6), POSTING_DATE DATE);
I have another temp table
ACCOUNT_TMP (ACCOUNT_ID VARCHAR2(30), POSTING_DATE DATE);
When I run this query with the date hard-coded, partition pruning happens and the results come back quickly:
SELECT A.ACCOUNT_ID, SUM(A.MARKET_VALUE) FROM
FACT_AUM A JOIN ACCOUNT_TMP B ON A.ACCOUNT_ID = B.ACCOUNT_ID
AND A.POSTING_DATE=TO_DATE('30-DEC-2016','DD-MON-YYYY') GROUP BY
A.ACCOUNT_ID;
When I run the following, I don't see partition pruning and the query keeps running without returning:
SELECT A.ACCOUNT_ID, SUM(A.MARKET_VALUE) FROM
FACT_AUM A JOIN ACCOUNT_TMP B ON A.ACCOUNT_ID = B.ACCOUNT_ID
AND A.POSTING_DATE = B.POSTING_DATE GROUP BY
A.ACCOUNT_ID;
Any insights on this would be helpful.
Oracle pruned partitions when you hard-coded the value because the optimizer knew up front exactly which partition holds that date, so pruning was clearly beneficial.
When you join the fact table to your temporary (I would call it staging) table, Oracle cannot know in advance which partitions it will have to hit to compute the answer; that depends on the POSTING_DATE values in the staging table. Note that Oracle will assess the range of values available in the staging table.
Unless you provide the statistics of the tables involved, I can't delve into the more important topics of join order and join method. As a quick fix, try an ORDERED (or LEADING) hint or a nested-loop (USE_NL) hint.
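As a rough sketch of the quick fix mentioned above, the second query could be hinted so that the small staging table drives a nested-loop join into the fact table. Whether this actually helps depends on the statistics and indexes, which are not shown in the question, so treat it as something to try rather than a guaranteed fix. The plan output lets you check the Pstart/Pstop columns to see whether pruning takes place.

```sql
-- Hinted version of the slow query: start from ACCOUNT_TMP (B)
-- and probe FACT_AUM (A) with a nested-loop join.
EXPLAIN PLAN FOR
SELECT /*+ LEADING(B) USE_NL(A) */
       A.ACCOUNT_ID, SUM(A.MARKET_VALUE)
FROM   FACT_AUM A
JOIN   ACCOUNT_TMP B
  ON   A.ACCOUNT_ID   = B.ACCOUNT_ID
 AND   A.POSTING_DATE = B.POSTING_DATE
GROUP  BY A.ACCOUNT_ID;

-- Show the plan, including the Pstart/Pstop partition columns.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```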
I only have a superficial understanding of partitions in Oracle. I know you can create persistent partitions, for example within a CREATE TABLE statement. But what happens when a PARTITION BY clause is used within a SELECT statement? Will Oracle create a persistent partition, for caching reasons or whatever, or will the partition be "temporary" in some sense (e.g., removed at the end of the session, at the end of the query, or after some time)?
For example, for a query like
SELECT col1, first_value(col2)
over (partition by col3 order by col2 nulls last) as colx
FROM tbl
If I execute that query, will Oracle create a partition to speed up the execution if I execute it again tomorrow or three months later? I'm worried about this because I don't know whether abusing that feature could cause memory exhaustion.
PARTITION BY in a query belongs to a window (analytic) function: the function is computed over groups of rows defined by the columns listed in PARTITION BY. It behaves like GROUP BY, but it can return the grouped result on every row without collapsing the rows of the final result.
It has nothing to do with table or index partitioning.
The scope of this PARTITION BY is just this query, and it has no impact on the table structure.
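To illustrate the difference, here is a small sketch using the table and column names from the question (MIN is used only as a simple example aggregate). Neither statement creates anything persistent.

```sql
-- GROUP BY collapses the rows: one output row per distinct col3.
SELECT col3, MIN(col2) AS colx
FROM   tbl
GROUP  BY col3;

-- The window version keeps every row of tbl and attaches
-- the per-col3 value to each of them.
SELECT col1, col3,
       MIN(col2) OVER (PARTITION BY col3) AS colx
FROM   tbl;
```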
We have a partitioned table. It also has an overflow partition (max partition) which sort of acts as a catch-all for records that do not match the partition criteria. The idea was to create the partitions ahead of time so that records never end up in the max partition. However, for one table this was missed, so all the records ended up in that single partition.
Most of these records are not used anymore, so they can be deleted. Our usual approach is to drop a partition when it is too old, but that cannot be done in this case. Is there an easy way to handle the purge?
One idea is to create the partitions now, move the records into them and then drop the old partitions, but that seems likely to perform very poorly. The other option is to create a temp table, move a subset of the records into it and delete from there, but again, moving the records individually seems time consuming. This table has around 5 million records.
Which would be the best way forward, performance-wise? We could manage a little downtime, but not much.
We use Oracle 11g.
The table creation script looks something like this:
CREATE TABLE "TRANSACTIONS"
("year" number(4,0) NOT NULL ENABLE)
PARTITION BY RANGE ("year")
(PARTITION "P_OLD" VALUES LESS THAN (2010),
PARTITION "P_2011" VALUES LESS THAN (2011),
...
PARTITION "P_MAX" VALUES LESS THAN (MAXVALUE));
There is no need to drop the partition; you can truncate it:
alter table TRANSACTIONS TRUNCATE PARTITION P_MAX UPDATE INDEXES;
Or, if you prefer, you can delete the rows instead:
delete from TRANSACTIONS PARTITION (P_MAX);
You could use INTERVAL partitioning to make this simpler (although I am not sure I understand your question):
CREATE TABLE TRANSACTIONS (
...
TRANSACTION_DATE TIMESTAMP(0) NOT NULL
)
PARTITION BY RANGE (TRANSACTION_DATE) INTERVAL (INTERVAL '12' MONTH)
(PARTITION P_OLD VALUES LESS THAN (TIMESTAMP '2000-01-01 00:00:00' ) )
ENABLE ROW MOVEMENT;
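If some of the rows in P_MAX are still needed, the idea from the question (create the missing partitions now) could be sketched with SPLIT PARTITION, because a new partition cannot simply be added above a MAXVALUE partition. The boundary 2012 and the partition name P_2012 are hypothetical, and the split physically moves rows, so it should be tried on a copy first.

```sql
-- Hypothetical boundary and name: rows with "year" < 2012 move to P_2012,
-- everything else stays in P_MAX.
ALTER TABLE TRANSACTIONS
  SPLIT PARTITION P_MAX AT (2012)
  INTO (PARTITION P_2012, PARTITION P_MAX)
  UPDATE INDEXES;

-- Once the old rows sit in their own partition, they can be purged cheaply.
ALTER TABLE TRANSACTIONS DROP PARTITION P_2012 UPDATE INDEXES;
```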
I have a scenario where I have two Hive tables, and the second one is essentially an evolved schema of the first (it has 1 more column in this example).
Table_A
{
business_date String
Name String
Age Number
} partitioned by business_date
Table_B {
business_date String
Name String
Age Number
Address String
} partitioned by business_date
In order to shield downstream users from schema changes, I am creating a Hive view with the following syntax:
Create VIEW customer_info AS
select * from Table_B
UNION
select business_date, name, age, null as address from Table_A
I know the above returns all the data, but from a performance standpoint, if a query is run against the view with a valid business_date value, does it take the partition key into account? Or do I lose this benefit when working with views?
Edit: I should mention that business_date is actually a unique value across all partitions. This means that data provided in Table_A should not be provided in Table_B. Think of Table_A as being an "older version" of the data. Given this, is this the best approach to serving the data if the goal is to abstract schema changes away from the end consumers?
Edit#2: Storing this data in one table is not possible due to tons of other problems.
You are not using any partition predicates in your query, which is why there will be no partition pruning. Use the EXPLAIN command to check this; it will show the partition predicates applied. Partition pruning should work fine with a view.
UNION is the same as UNION ALL + DISTINCT.
Use UNION ALL instead if applicable; it will perform much better.
On the other hand, partitioning by something unique will create partitions with a single row each, which will probably kill your Hive metastore. I hope you mean something else by saying that
business_date is actually a unique value across all partitions
Remove partitioning in this case and the performance will be significantly better.
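A minimal HiveQL sketch of these suggestions, assuming the two tables never share a business_date (as stated in the question's edit), so UNION ALL returns the same rows as UNION. The date literal is a made-up example value.

```sql
-- View over both schema versions; UNION ALL avoids the DISTINCT step.
CREATE VIEW customer_info AS
SELECT business_date, name, age, address FROM Table_B
UNION ALL
SELECT business_date, name, age, CAST(NULL AS STRING) AS address FROM Table_A;

-- Query the view with a partition predicate and inspect the plan
-- to see whether the predicate reaches the underlying tables.
EXPLAIN
SELECT * FROM customer_info WHERE business_date = '2017-01-31';
```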
I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on StackOverflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other Where clauses in the query. In my case, it seems to me as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive Select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the Where clause as soon as it's found enough matching rows?
How about a different approach, without so much DML on the table? As a permanent solution for the future, you could go for table partitioning.
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In future, you would just need to DROP the old partitions.
CTAS (CREATE TABLE AS SELECT) is another way; however, if you want the new table to be partitioned, you would have to go for the exchange partition concept.
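The steps above could be sketched roughly as follows. The column list, partition names and boundaries are hypothetical (the question only shows referenceDate), and indexes and constraints are left out.

```sql
-- 1. New table, range partitioned on the reference date (hypothetical layout).
CREATE TABLE outdatedTable_new (
  id            NUMBER,
  referenceDate DATE NOT NULL
)
PARTITION BY RANGE (referenceDate)
( PARTITION p_2016 VALUES LESS THAN (DATE '2017-01-01'),
  PARTITION p_2017 VALUES LESS THAN (DATE '2018-01-01'),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE) );

-- 2. Copy only the rows worth keeping (direct-path insert).
INSERT /*+ APPEND */ INTO outdatedTable_new (id, referenceDate)
SELECT id, referenceDate
FROM   outdatedTable
WHERE  referenceDate >= :deletionCutoffDate;
COMMIT;

-- 3./4. Swap the tables (add constraints and indexes before going live).
DROP TABLE outdatedTable;
RENAME outdatedTable_new TO outdatedTable;

-- 5. Later purges become a quick partition drop.
ALTER TABLE outdatedTable DROP PARTITION p_2016;
```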
First of all, you should read about SQL execution plans and learn how to obtain and read them. That will help you find answers to questions like this.
Generally, one single delete is more efficient than several chunked ones. Its main disadvantage is heavy use of the undo tablespace.
If you want to delete most of the rows in a table, a much faster way is usually this trick:
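-- DDL does not accept bind variables, so the cutoff goes in as a literal (example value)
create table new_table as select * from old_table where referenceDate >= date '2017-01-01';
drop table old_table;
rename new_table to old_table;
-- ... recreate indexes and other stuff ...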
If you need to do this more than once, partitioning is a much better way. If the table is partitioned by date, you can select current data quickly, and you can drop a partition with outdated data in milliseconds.
Finally, partitioning is a way to avoid 'deleting outdated records' altogether. Sometimes we need old data, and it is sad to have deleted it with our own hands. With partitioning you can archive outdated partitions outside of the database and attach them again when you need to access the old data.
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two; one current, one old; but you could just as well make more), e.g.:
PARTITION BY LIST ( year_parity )  -- year_parity: a virtual column such as mod(extract(year from referenceDate), 2)
(
PARTITION year_odd VALUES (1),
PARTITION year_even VALUES (0)
);
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
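Oracle does not accept an expression directly as a partitioning key, so one way to get this circular layout (an assumption on my part, using a virtual column, which is available from 11g) might look like the sketch below. The column names id and year_parity are hypothetical.

```sql
CREATE TABLE mytable (
  id            NUMBER,
  referenceDate DATE NOT NULL,
  -- Virtual column: 1 for odd years, 0 for even years.
  year_parity   NUMBER GENERATED ALWAYS AS
                  (MOD(EXTRACT(YEAR FROM referenceDate), 2)) VIRTUAL
)
PARTITION BY LIST (year_parity)
( PARTITION year_odd  VALUES (1),
  PARTITION year_even VALUES (0) );

-- Purge all even-year rows in one cheap operation.
ALTER TABLE mytable TRUNCATE PARTITION year_even;
```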
delete from your_table
where PK not in
(select PK from your_table where rownum<=...); -- these records you want to leave
I have a problem with indexing in Oracle. I will try to explain it with an example.
I have a table TABLE1 with columns A,B,C,D
another table TABLE2 with columns A,B,C,E,F,H
I have created indexes on TABLE1:
IX_1 A
IX_2 A,B
IX_3 A,C
IX_4 A,B,C
I have created indexes on TABLE2:
IY_1 A,B,C
IY_2 A
When I run a query similar to this:
SELECT * FROM TABLE1 T1,TABLE2 T2
WHERE T1.A=T2.A
When I run EXPLAIN PLAN, I see that it uses neither IX_1 nor IY_2.
It uses IX_4 and IY_1 instead.
Why is it not picking the right index?
EDITED:
Can anyone help me understand the difference between INDEX RANGE SCAN, INDEX UNIQUE SCAN and INDEX SKIP SCAN?
I guess SKIP SCAN means that Oracle skips a column in a composite index.
I have no idea about the others!
The main benefit of an index is that you can select a few rows from a table without scanning the entire table.
If you ask for too many rows (say 30%, though it depends on many things), the engine will prefer to scan the entire table for those rows.
That's because reading a row through an index carries an overhead: reading some index blocks and, after that, reading the table blocks.
In your case, in order to join tables T1 and T2, Oracle needs all the rows from those tables. Reading the full index would be a useless operation, adding unnecessary cost.
UPDATE: A step forward: if you run:
SELECT T1.B, T2.B FROM TABLE1 T1,TABLE2 T2
WHERE T1.A=T2.A
Oracle will probably use the indexes (for example IX_2 and IY_1), because it does not need to read anything from the tables: the values T1.A, T1.B, T2.A and T2.B are all in those indexes.
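To check which access paths the optimizer actually picks for either query, you can generate and display the plan. The sketch below uses the query from the update; the result will depend on your data and statistics.

```sql
EXPLAIN PLAN FOR
SELECT T1.B, T2.B
FROM   TABLE1 T1, TABLE2 T2
WHERE  T1.A = T2.A;

-- Look for INDEX FAST FULL SCAN / INDEX FULL SCAN versus TABLE ACCESS FULL.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```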