Oracle `partition_by` in select clauses: does it create these partitions permanently?

I have only a superficial understanding of partitions in Oracle. I know you can create persistent partitions, for example within a CREATE TABLE statement, but what happens when a partition by clause is used within a SELECT statement? Will Oracle create a persistent partition (for caching reasons or whatever), or will the partition be "temporary" in some sense (e.g., removed at the end of the session, at the end of the query, or after some time)?
For example, for a query like
SELECT col1, first_value(col2)
over (partition by col3 order by col2 nulls last) as colx
FROM tbl
If I execute that query, will Oracle create a partition to speed up execution when I run it again, tomorrow or three months later? I'm worried about this because I don't know whether it could cause memory exhaustion if I overuse the feature.

partition by is used in a window (analytic) function to compute an aggregated result over the rows that share the same values in the columns listed in partition by. It behaves like group by, but it returns the grouped result on every row instead of collapsing the output to one row per group.
It has nothing to do with table or index partitioning.
The scope of this partition by is just this query; it has no impact on the table structure, and nothing is persisted.
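To make the contrast with group by concrete, here is a minimal sketch. The inline rows (built from DUAL in a WITH clause) and the alias sum_per_col3 are made up for illustration; only the column names come from the question. Neither query creates any object in the database.

```sql
-- Window function: one output row per input row (3 rows here),
-- each carrying the sum computed over its col3 group.
WITH tbl AS (
  SELECT 1 AS col1, 10 AS col2, 'a' AS col3 FROM dual UNION ALL
  SELECT 2, 20, 'a' FROM dual UNION ALL
  SELECT 3, 30, 'b' FROM dual
)
SELECT col1,
       col3,
       SUM(col2) OVER (PARTITION BY col3) AS sum_per_col3
FROM tbl;

-- GROUP BY: the same sums, collapsed to one row per col3 value (2 rows here).
WITH tbl AS (
  SELECT 1 AS col1, 10 AS col2, 'a' AS col3 FROM dual UNION ALL
  SELECT 2, 20, 'a' FROM dual UNION ALL
  SELECT 3, 30, 'b' FROM dual
)
SELECT col3,
       SUM(col2) AS sum_per_col3
FROM tbl
GROUP BY col3;
```

The "partitions" in the first query exist only while the rows are being processed for that statement.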

Related

Hive View Query Performance: Union tables with different schemas

I have a scenario where I have two Hive tables, and the second one is essentially an evolved schema of the first (it has 1 more column in this example).
Table_A
{
business_date String
Name String
Age Number
} partitioned by business_date
Table_B {
business_date String
Name String
Age Number
Address String
} partitioned by business_date
To shield downstream users from schema changes, I am creating a Hive view with the following syntax:
Create VIEW customer_info AS
select * from Table_B
UNION
select business_date, name, age, null as address from Table_A
I know the above returns all the data, but from a performance standpoint, if a query is run against the view with a valid business_date value, does it take the partition key into account? Or do I lose this benefit when working with views?
Edit: I should mention that business_date is actually a unique value across all partitions. This means that data provided in Table_A should not also be provided in Table_B. Think of Table_A as being an "older version" of the data. Given this, is this the best approach to serving the data if the goal is to abstract schema changes away from the end consumers?
Edit#2: Storing this data in one table is not possible due to tons of other problems.
You are not using any partition predicates in your query, which is why there will be no partition pruning. Use the EXPLAIN command to check this; it shows the partition predicates that are applied. Partition pruning should work fine with a view.
UNION is the same as UNION ALL followed by DISTINCT, so it adds a deduplication step.
Use UNION ALL instead if applicable; it will perform much better.
On the other hand, partitioning by something unique creates partitions with a single row each, which will probably overwhelm your Hive metastore. I hope you mean something else when you say
business_date is actually a unique value across all partitions
If that really is the case, remove the partitioning and performance will be significantly better.
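A sketch of the view rewritten with UNION ALL, followed by a query whose plan can be inspected for pruning. The date literal is a made-up value, and the explicit column lists and the CAST are additions for illustration rather than part of the question.

```sql
-- Drop the old definition first if the view already exists.
CREATE VIEW customer_info AS
SELECT business_date, name, age, address
FROM Table_B
UNION ALL
SELECT business_date, name, age, CAST(NULL AS STRING) AS address
FROM Table_A;

-- The filter on the partition column is what gives Hive a chance to prune.
-- '2016-12-30' is a hypothetical business_date value.
EXPLAIN
SELECT name, age, address
FROM customer_info
WHERE business_date = '2016-12-30';
```

UNION ALL is only safe here because, per the question, the same business_date never appears in both tables, so there are no duplicates to remove.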

Oracle Partition pruning not happening

I have a fact table with millions of records. The table is range partitioned on a date column.
FACT_AUM (ACCOUNT_ID VARCHAR2(30),MARKET_VALUE NUMBER(20,6), POSTING_DATE DATE);
I have another temp table
ACCOUNT_TMP (ACCOUNT_ID VARCHAR2(30), POSTING_DATE DATE);
When I run this query by hard coding the date I see partition pruning happens and the results come back quickly
SELECT A.ACCOUNT_ID, SUM(A.MARKET_VALUE) FROM
FACT_AUM A JOIN ACCOUNT_TMP B ON A.ACCOUNT_ID = B.ACCOUNT_ID
AND A.POSTING_DATE = TO_DATE('30-DEC-2016','DD-MON-YYYY') GROUP BY
A.ACCOUNT_ID;
When I run the following, I don't see partition pruning and the query keeps spinning:
SELECT A.ACCOUNT_ID, SUM(A.MARKET_VALUE) FROM
FACT_AUM A JOIN ACCOUNT_TMP B ON A.ACCOUNT_ID = B.ACCOUNT_ID
AND A.POSTING_DATE = B.POSTING_DATE GROUP BY
A.ACCOUNT_ID;
Any insights on this would be helpful.
Oracle used partition pruning when you hard-coded the value because, with a literal date, the optimizer knew up front which partition could contain matching rows and saw a benefit in pruning.
When you join the fact table to your temporary (I would call it staging) table, Oracle cannot tell in advance which partitions it will have to visit to compute the answer. Note that Oracle will assess the range of values available in the staging table.
Unless you provide the statistics of the tables involved, I can't go into the more important topics of table ordering and join methods. As a quick fix, try an ORDERED hint or a nested-loop (USE_NL) hint.
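One way to see what the optimizer decided is to look at the Pstart and Pstop columns of the execution plan. The sketch below uses the tables from the question and the standard DBMS_XPLAN package; how the plan looks on your system depends on your statistics and version.

```sql
-- Generate the plan without running the query.
EXPLAIN PLAN FOR
SELECT A.ACCOUNT_ID, SUM(A.MARKET_VALUE)
FROM FACT_AUM A
JOIN ACCOUNT_TMP B
  ON A.ACCOUNT_ID = B.ACCOUNT_ID
 AND A.POSTING_DATE = B.POSTING_DATE
GROUP BY A.ACCOUNT_ID;

-- Display it, including the Pstart / Pstop columns for partitioned objects.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Fixed partition numbers in Pstart/Pstop indicate pruning decided at parse time, KEY indicates pruning decided at run time, and a range from 1 to the last partition means every partition is scanned.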

Optimizing a delete... where query with rownum

I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this runs surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on StackOverflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other Where clauses in the query. In my case, it seems to me as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive Select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the Where clause as soon as it's found enough matching rows?
How about a different approach that avoids so much DML on the table? As a permanent solution for the future, you could go for table partitioning:
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In the future, you would just need to DROP the old partitions.
CTAS (create table as select) is another way; however, if you want the new table to be partitioned, you would have to use the exchange partition approach.
First of all, you should read about SQL execution plans and learn how to read them. That will help you find answers to questions like this.
Generally, one single delete is more efficient than several chunked ones. Its main disadvantage is heavy use of the undo tablespace.
If you wish to delete most of the rows in a table, a much faster way is usually this trick:
create table new_table as select * from old_table where referenceDate >= :date_limit;
drop table old_table;
rename new_table to old_table;
-- ... recreate indexes and other stuff ...
If you wish to do this more than once, partitioning is a much better way. If the table is partitioned by date, you can select current data quickly and you can drop a partition with outdated data in milliseconds.
Finally, partitioning is a way to avoid 'deleting outdated records' altogether. Sometimes we need old data, and it's sad if we have deleted it with our own hands. With partitioning you can archive outdated partitions outside of the database and reattach them when you need to access the old data.
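A minimal sketch of the partitioning idea, assuming a range-partitioned copy of the table. The table name outdated_table_part, the id column, the partition names, and the yearly boundaries are all hypothetical; only referenceDate comes from the question.

```sql
-- Hypothetical range-partitioned table, one partition per year.
CREATE TABLE outdated_table_part (
  id            NUMBER,
  referenceDate DATE
)
PARTITION BY RANGE (referenceDate) (
  PARTITION p2015 VALUES LESS THAN (DATE '2016-01-01'),
  PARTITION p2016 VALUES LESS THAN (DATE '2017-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- Removing a whole year of old rows is a dictionary operation,
-- not a row-by-row delete.
ALTER TABLE outdated_table_part DROP PARTITION p2015;
```

Unlike a large delete, dropping a partition does not generate undo for each row, which is why it completes so quickly.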
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two; one current, one old; but you could just as well make more), e.g.:
PARTITION BY LIST ( mod(referenceDate,2) )
(
PARTITION year_odd VALUES (1),
PARTITION year_even VALUES (0)
);
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
(select PK from your_table where rownum<=...) -- these records you want to leave

Wrong index is chosen by Oracle

I have a problem with indexing in Oracle. I'll try to explain it with the following example.
I have a table TABLE1 with columns A,B,C,D
another table TABLE2 with columns A,B,C,E,F,H
I have created Indexes for TABLE1
IX_1 A
IX_2 A,B
IX_3 A,C
IX_4 A,B,C
I have created Indexes for TABLE2
IY_1 A,B,C
IY_2 A
When I run a query similar to this
SELECT * FROM TABLE1 T1,TABLE2 T2
WHERE T1.A=T2.A
and look at the Explain Plan, I see that it uses neither IX_1 nor IY_2.
It uses IX_4 and IY_1 instead.
Why is it not picking the right index?
EDITED:
Can anyone help me understand the difference between INDEX RANGE SCAN, INDEX UNIQUE SCAN, and INDEX SKIP SCAN?
I guess SKIP SCAN means that Oracle skips a column of a composite index.
I have no idea about the others!
The main benefit of an index is that you can select a few rows from a table without scanning the entire table.
If you ask for too many rows (say 30%, though it depends on many things), the engine will prefer to scan the entire table for those rows.
That's because reading a row through an index carries an overhead: first reading some index blocks, and after that reading the table blocks.
In your case, in order to join tables T1 and T2, Oracle needs all the rows from both tables. Reading the full index first would be a useless operation that only adds unnecessary cost.
UPDATE: A step forward: if you run:
SELECT T1.B, T2.B FROM TABLE1 T1,TABLE2 T2
WHERE T1.A=T2.A
Oracle will probably use indexes that contain both columns (for example IX_2 and IY_1), because it does not need to read anything from the tables: the values T1.B and T2.B are in the indexes.
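You can check this claim on your own data by comparing the plans of the two queries. This is only a sketch using the table names from the question; whether the optimizer really chooses index-only access depends on your statistics and on whether the indexed columns can be NULL.

```sql
-- Plan for the query that needs only columns A and B from each table.
EXPLAIN PLAN FOR
SELECT T1.B, T2.B
FROM TABLE1 T1, TABLE2 T2
WHERE T1.A = T2.A;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows index scan operations with no TABLE ACCESS steps, the query was answered from the indexes alone.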

Is an index clustered or unclustered in Oracle?

How can I determine if an Oracle index is clustered or unclustered?
I've done
select FIELD from TABLE where rownum <100
where FIELD is the column on which the index is built. I get ordered tuples, but the result is wrong because the index is unclustered.
By default all indexes in Oracle are unclustered. The only clustered indexes in Oracle are the Index-Organized tables (IOT) primary key indexes.
You can determine if a table is an IOT by looking at the IOT_TYPE column in the ALL_TABLES view (its primary key could be determined by querying the ALL_CONSTRAINTS and ALL_CONS_COLUMNS views).
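The lookup described above could be written roughly as follows. 'MY_TABLE' is a placeholder for your table name; the views and columns are the standard data dictionary ones.

```sql
-- IOT_TYPE is 'IOT' for an index-organized table and NULL for a heap table.
-- The joined rows list the primary key columns in key order.
SELECT t.table_name,
       t.iot_type,
       cc.column_name,
       cc.position
FROM all_tables t
JOIN all_constraints c
  ON c.owner = t.owner
 AND c.table_name = t.table_name
 AND c.constraint_type = 'P'
JOIN all_cons_columns cc
  ON cc.owner = c.owner
 AND cc.constraint_name = c.constraint_name
WHERE t.table_name = 'MY_TABLE'   -- hypothetical table name
ORDER BY cc.position;
```

A table without a primary key returns no rows from this query, which already tells you it cannot be an IOT.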
Here are some reasons why your query might return ordered rows:
1. Your table is index-organized and FIELD is the leading part of its primary key.
2. Your table is heap-organized but the rows happen to be ordered by FIELD; this sometimes occurs with an incrementing identity column.
Case 2 will return sorted rows only by chance. The order of the inserts is not guaranteed, furthermore Oracle is free to reuse old blocks if some happen to have available space in the future, disrupting the fragile ordering.
Case 1 will most of the time return ordered rows, however you shouldn't rely on it since the order of the rows returned depends upon the algorithm of the access path which may change in the future (or if you change DB parameter, especially parallelism).
In both cases, if you want ordered rows you should supply an ORDER BY clause:
SELECT field
FROM (SELECT field
FROM TABLE
ORDER BY field)
WHERE rownum <= 100;
There is no concept of a "clustered index" in Oracle as in SQL Server and Sybase. There is an Index-Organized Table, which is similar but not the same.
"Clustered" indices, as implemented in Sybase, MS SQL Server and possibly others, where rows are physically stored in the order of the indexed column(s) don't exist as such in Oracle. "Cluster" has a different meaning in Oracle, relating, I believe, to the way blocks and tables are organized.
Oracle does have "Index Organized Tables", which are physically equivalent, but they're used much less frequently because the query optimizer works differently.
The closest I can get to an answer to the identification question is to try something like this:
SELECT IOT_TYPE FROM user_tables
WHERE table_name = '<your table name>'
My 10g instance reports IOT or null accordingly.
Index Organized Tables have to be organized on the primary key. Where the primary key is a sequence generated value this is often useless or even counter-productive (because simultaneous inserts get into conflict for the same block).
Single table clusters can be used to group data with the same column value in the same database block(s). But they are not ordered.

Resources