Create a dynamic view based on a changeable partitioned table - Oracle

I have an application that reads from a view in Oracle; that view reads from a big table, and it includes functions and joins with other tables.
The view takes a while to run because the table grows bigger every month.
I tried partitioning the table by year so it becomes faster than before.
My problem is how to create a view based on the changeable partition (by year).

Assuming that PARTITION_COL is a date column that is your partitioning key, you could do this:
create or replace view THIS_CURRENT_YEAR as
select *
from MY_PARTITIONED_TABLE
where PARTITION_COL >= trunc(sysdate, 'YYYY')               -- January 1st of the current year
and PARTITION_COL < add_months(trunc(sysdate, 'YYYY'), 12); -- January 1st of next year
In this way you'll get partition pruning where possible.
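For reference, here is a minimal sketch of how the underlying table could be partitioned by year so that the view above can prune; the column list and the use of interval partitioning (11g+) are assumptions, not part of the original question:
CREATE TABLE MY_PARTITIONED_TABLE (
  id            NUMBER,
  PARTITION_COL DATE
)
PARTITION BY RANGE (PARTITION_COL)
INTERVAL (NUMTOYMINTERVAL(1, 'YEAR'))
(
  -- one seed partition; Oracle creates a new partition per year automatically
  PARTITION p_start VALUES LESS THAN (DATE '2020-01-01')
);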

Related

Adding a sequence to a large Oracle table

I have queries that take an existing large table and build tables off of them for reporting. The problem is that the source tables are 60-80MM+ records and it takes a long time to recreate them. I'd like to be able to identify which records are new so I can just add the new records to the reporting tables.
To me, the best way to identify this is to have an identity column. Is there any significant cost to creating this and adding it to the table?
Separately, is it possible to create a materialized view that takes data from one of these tables but adds a sequence as part of the materialized view? That is, something like
create materialized view some_materialized_view as
select somesequence.nextval, source_table.*
from source_table?
You can add a sequence-based column to your table, but as Gary suggests, I wouldn't do that.
The task you are about to solve is so common that solutions have already been implemented.
The first built-in option that comes to mind is the system change number (SCN), a kind of Oracle-internal clock. By default, tables are set up to record the SCN of the whole (usually 8K) block, which typically contains many rows, but you can set up a table to keep a record of the SCN for every changed row. Then you can track the rows that are new or changed and have not yet been copied to your reporting tables.
CREATE TABLE t (c1 NUMBER) ROWDEPENDENCIES;  -- keep an SCN per row instead of per block
INSERT INTO t VALUES (1);
COMMIT;
SELECT c1, ora_rowscn FROM t;                -- each row's SCN of last change
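As a rough sketch of how that tracking could drive an incremental copy (the bind variable and the reporting table name are hypothetical):
-- remember the highest SCN already processed, then copy only newer rows
INSERT INTO reporting_t (c1)
SELECT c1 FROM t WHERE ora_rowscn > :last_copied_scn;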
Secondly, I would think of adding a date column. With 60-80 million rows I wouldn't do this with ALTER TABLE xxx ADD (d DATE DEFAULT SYSDATE), but with rename, create as select, drop:
CREATE TABLE t AS SELECT * FROM all_objects;              -- demo source table
RENAME t TO told;
CREATE TABLE t AS SELECT sysdate AS d, told.* FROM told;  -- rebuild with the new date column
ALTER TABLE t MODIFY d DATE DEFAULT SYSDATE;
DROP TABLE told;
Thirdly, I would read up on materialized views. I never had the chance to use this at work, but in theory, you should be able to set up a materialized view log on your 80M table that records changes and updates dependent materialized views.
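A rough sketch of that setup, with illustrative names and assuming the source table has a primary key:
-- record row-level changes on the source table
CREATE MATERIALIZED VIEW LOG ON source_table WITH PRIMARY KEY;

-- a fast-refreshable copy for reporting
CREATE MATERIALIZED VIEW reporting_mv
REFRESH FAST ON DEMAND
AS SELECT * FROM source_table;

-- later: apply only the changes recorded in the log ('F' = fast refresh)
EXEC DBMS_MVIEW.REFRESH('REPORTING_MV', 'F');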
And fourthly, I'd look into partitioning your large table on the (newly introduced) date column, so that identifying the new rows becomes faster. Sadly, that depends on your version and Oracle license, though.

Are materialized views virtual tables or real tables with real data?

When materialized views are created in Oracle, do they store indexes or do they store actual table values?
I am asking this because creating an index on a table and using views on that table, versus using materialized views (created with refresh complete start with (sysdate) next (sysdate+1) with rowid as) on the unindexed table, gives similar performance.
Whereas I would expect materialized views to be far faster.
Update
I slightly modified the content/title. My current concern after the discussion is whether materialized views are actual real tables or virtual tables with some optimization.
Materialized views create a copy of the data. To all intents and purposes they are actual tables. In fact, we can create a materialized view from an existing table using the ON PREBUILT TABLE clause. The only difference is how the data is mastered: a materialized view doesn't own its data, a table does.
As to your performance conundrum:
When you say "on an unindexed table", do you literally mean one table? If so, we wouldn't expect any difference in the time to query a view, a materialized view, or the actual data: they all execute a full table scan on the same volume of data.
Consider the case where views have select * from <table> where <condition>.
We would expect a SELECT against a materialised view built on that query to execute quicker than the same SELECT against the actual table, provided the WHERE clause restricts the data to a significantly smaller subset of the original data. This is simply because a full table scan over a small table (the materialised view) takes less time than a full table scan over a big table. The same applies if the materialised view's projection has fewer columns than the base table.
Indexing is a different matter. Unless the query selects a very small subset of the data, indexed access is not going to be more efficient than a full table scan and a filter.
To sum up: the only universal tuning heuristic is that it takes less time to do less work. Beyond that it is impossible to generalise. We can't discuss some vague "consider the case where views have select * from <table> where <condition>". It's all about the specifics.
Fundamentally, a materialized view is just a table with an associated query to populate it.
Given static data, one would generally expect the performance of a SELECT * from the materialized view (with no WHERE clause) to be at least as fast as running the query that underlies the materialized view, regardless of indexing.
If we add a WHERE clause to a SELECT * against the mview, however, that query could perform significantly slower than running the query that underlies the mview with the same WHERE clause. That's because the tables referenced in the query underlying the mview could have indexes to support the conditions in the WHERE clause, whereas the mview might not have such indexes.
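If that is the bottleneck, nothing stops you from indexing the materialized view itself. A minimal sketch, with hypothetical names:
-- an mview is physically a table, so it can carry its own indexes
CREATE INDEX some_mv_idx ON some_materialized_view (some_filter_col);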

Optimizing a delete... where query with rownum

I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on Stack Overflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other WHERE clauses in the query. In my case, it seems as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the WHERE clause as soon as it's found enough matching rows?
How about a different approach without so much DML on the table? As a permanent solution for the future, you could go for table partitioning.
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In the future, you would just need to DROP the old partitions.
CTAS (CREATE TABLE AS SELECT) is another way; however, if you want the new table to be partitioned, you would have to go for the exchange partition concept.
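A sketch of the steps above, with placeholder column names and an assumed yearly range partitioning on referenceDate:
-- 1. new table with the required partitions
CREATE TABLE orders_new (
  order_id      NUMBER,
  referenceDate DATE
)
PARTITION BY RANGE (referenceDate) (
  PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- 2. move ONLY the rows you want to keep
INSERT /*+ APPEND */ INTO orders_new
SELECT * FROM outdatedTable WHERE referenceDate >= :deletionCutoffDate;

-- 3. add constraints and indexes here, then swap the tables
DROP TABLE outdatedTable;
RENAME orders_new TO outdatedTable;

-- 4. future purges become metadata-only operations
ALTER TABLE outdatedTable DROP PARTITION p2013;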
First of all, you should read about SQL statements' execution plans and learn how to interpret them. It will help you find answers to questions like this.
Generally, one single delete is more effective than several chunked ones. Its main disadvantage is extreme use of undo tablespace.
If you wish to delete most rows of a table, a much faster way is usually this trick:
create table new_table as select * from old_table where referenceDate >= :date_limit;
drop table old_table;
rename new_table to old_table;
... recreate indexes and other stuff ...
If you wish to do it more than once, partitioning is a much better way. If the table is partitioned by date, you can select the current data quickly, and you can drop a partition with outdated data in milliseconds.
Finally, partitioning is a way to avoid "deleting outdated records" altogether. Sometimes we need old data, and it's sad if we delete it with our own hands. With partitioning you can archive outdated partitions outside of the database, but reconnect them when you need to access old data.
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two: one current, one old; but you could just as well make more), e.g.:
PARTITION BY LIST ( year_parity )
(
  PARTITION year_odd VALUES (1),
  PARTITION year_even VALUES (0)
);
-- year_parity would be a virtual column defined as MOD(EXTRACT(YEAR FROM referenceDate), 2),
-- since a raw DATE can't be passed to MOD and a list partition key must be a column
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
(select PK from your_table where rownum <= ...); -- these are the records you want to keep

Comparing data in two tables is taking time

I need to query table 1 to find all orders and their created dates (the key is order number and date).
In table 2 (the key is also order number and date), check if the order exists for a date.
For this I am scanning table 1 and, for each record, checking if it exists in table 2. Is there a better way to do this?
In this situation, in which your key is identical for both tables, it makes sense to have a single table in which you store the data for both Table 1 and Table 2. That way you can do a single scan over your data and know straight away whether the data exists for both criteria.
Even more so if you want to use this data in MapReduce: you would simply scan that single table. If you only want to get the relevant rows, you can define a filter on the Scan; for example, in the case where you will not be populating rows at all in Table 2, you would simply use a ColumnPrefixFilter.
If, however, you do need to keep this data in 2 separate tables, you could pre-split the tables with the same region boundaries for both tables. This will be helpful when you run the query you are aiming for (load all rows from Table 1 where a row exists in Table 2), since it essentially becomes a map-side join. You could define multiple inputs in your MapReduce job, and since the region borders are the same, the splits will be such that each mapper gets the corresponding rows from both tables. You would probably need to implement your own MultipleInput format for that (the MultiTableInputFormat class recently introduced in 0.96 does not seem to do that map-side join).

Can I create an Oracle view that automatically checks for new monthly tables?

I'm wondering if it's possible to create a view that automatically checks whether a new monthly table has been created and, if so, includes it.
We have a new table created each month and each one ends with the number of the month, like
table for January: table_1
table for February: table_2
etc...
Is it possible to create a view that takes data from all those tables and also picks up a new one when it is created?
No, a view's definition is static. You would have to replace the view each month with a new copy that included the new table; you could write a dynamic PL/SQL program to do this. Or you could create all the empty tables now and include them all in the view definition; if necessary you could postpone granting any INSERT access to the future tables until they become "live".
But really, this model is flawed - see Michael Pakhantsov's answer for a better alternative - or just have one simple table with a MONTH column.
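As a sketch of that monthly rebuild (assuming all monthly tables share the same columns and follow the table_n naming from the question; LISTAGG needs 11gR2+):
DECLARE
  v_sql VARCHAR2(32767);
BEGIN
  -- build one UNION ALL branch per existing monthly table
  SELECT 'CREATE OR REPLACE VIEW all_months AS '
      || LISTAGG('SELECT * FROM ' || table_name, ' UNION ALL ')
           WITHIN GROUP (ORDER BY table_name)
  INTO v_sql
  FROM user_tables
  WHERE table_name LIKE 'TABLE\_%' ESCAPE '\';
  EXECUTE IMMEDIATE v_sql;
END;
/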
It would be possible if, instead of creating a new table each month, you created a new partition in an existing table.
UPDATE:
If you have Oracle SE without the partitioning option, you can create two tables: LiveTable and ArchiveTable. Then each month you move the rows from LiveTable to ArchiveTable and clean out the live table. In this case you need to create the view from just two tables.
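A minimal sketch of that view and the monthly move; the table names come from the answer, while the created column is an assumption:
CREATE OR REPLACE VIEW all_rows AS
SELECT * FROM LiveTable
UNION ALL
SELECT * FROM ArchiveTable;

-- monthly archival, e.g. everything older than the current month
INSERT INTO ArchiveTable
SELECT * FROM LiveTable WHERE created < TRUNC(SYSDATE, 'MM');
DELETE FROM LiveTable WHERE created < TRUNC(SYSDATE, 'MM');
COMMIT;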
Another option is to create the tables in another schema with grants to the relevant user and create public synonyms to them.
As the monthly tables get created in the local schema, they'll "out-precedence" the public synonyms and the view will pick them up. It will still get invalidated and need recompiling, but the actual view text shouldn't need changing, which may be simpler from a code-control point of view.
You can write a procedure or function that looks at USER_TABLES or ALL_TABLES to determine whether a table exists, generates dynamic SQL, and returns a ref cursor with the data. The same can be done with a pipelined function.