I am looking to improve the performance of an SQL query (Oracle).
Based on a few examples, I compared the execution time of a simple join between two tables with that of the same query wrapped in a materialized view.
Both execution times are almost the same.
TableA JOIN TableB
versus
CREATE MATERIALIZED VIEW emp_mv
BUILD IMMEDIATE
REFRESH FORCE
ON DEMAND
AS (QUERY TableA Join TableB)
Both statements run for about 7 minutes for 1,000 records.
In total, Table A has 14K records and Table B has 50, and the final output has 14K records.
Am I missing anything about query execution performance?
Why would you expect the same query to act differently?
A materialized view's benefit comes later, when you actually start using the data it contains, because you have already pre-processed it and prepared it for future use. You could run the same query (with the join) over and over again and it would take more or less the same time on each run (disregarding caching). But if you store that query's data in a materialized view (and index it properly), data retrieval can, and should, be faster.
That is roughly the "opposite" of an ordinary view, which doesn't contain any data: it is just a stored query that retrieves the data every time you select from it, performing the same join all over again.
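A minimal sketch of that contrast, using the question's TableA and TableB. The column names (id, name, b_id, descr) and the object names ab_v, ab_mv and ab_mv_name_ix are hypothetical; adjust them to the real schema.

```sql
-- Hypothetical columns: TableA(id, name, b_id), TableB(id, descr).

-- Ordinary view: stores only the query text; each SELECT re-runs the join.
CREATE VIEW ab_v AS
SELECT a.id, a.name, b.descr
FROM   TableA a
JOIN   TableB b ON b.id = a.b_id;

-- Materialized view: runs the join once and stores the resulting rows.
CREATE MATERIALIZED VIEW ab_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT a.id, a.name, b.descr
FROM   TableA a
JOIN   TableB b ON b.id = a.b_id;

-- The MV's rows live in a real table segment, so they can be indexed.
CREATE INDEX ab_mv_name_ix ON ab_mv (name);

-- Re-populate the stored rows after the base tables have changed.
BEGIN
  DBMS_MVIEW.REFRESH('AB_MV', method => 'C');
END;
/
```

Queries against ab_mv read the stored rows; only the refresh pays the cost of the join.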
A materialized view contains data, just as if it were a table. It helps a lot when the data is stored in tables you access over database links, which can be (and usually are) slow. If you create and refresh the materialized view during the night or off hours, the data is available to you much faster. It won't help much if the data in the tables changes frequently, because you'll have to refresh the MV frequently as well (usually ON COMMIT); and if the tables are really large or the query is complex, each refresh can also take some (a lot of?) time.
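For the database-link case, a materialized view refreshed on a schedule might look like the sketch below. The link remote_db, the remote table orders, the MV name and the 02:00 schedule are assumptions for illustration.

```sql
-- Hypothetical: remote_db is an existing database link, orders a remote table.
CREATE MATERIALIZED VIEW orders_local_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE
  START WITH TRUNC(SYSDATE) + 1 + 2/24   -- first automatic refresh: 02:00 tomorrow
  NEXT       TRUNC(SYSDATE) + 1 + 2/24   -- re-evaluated after each refresh: 02:00 next day
AS
SELECT * FROM orders@remote_db;

-- Local queries now read the stored copy instead of going over the link.
SELECT COUNT(*) FROM orders_local_mv;
```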
When materialized views are created in Oracle, do they store indices or do they store actual table values?
I am asking because creating an index on a table and querying views on that table gives similar performance to using materialized views (created with refresh complete start with (sysdate) next (sysdate+1) with rowid as) on the unindexed table.
I would expect materialized views to be much faster.
Update
I slightly modified the content and title. After the discussion, my concern is whether materialized views are real tables or virtual tables with some optimization.
Materialized views create a copy of the data. To all intents and purposes they are actual tables. In fact we can create a materialized view from an existing table using the PREBUILT clause. The only difference is how the data is mastered - a materialized view doesn't own its data, a table does.
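A sketch of both points, assuming a hypothetical emp table with a deptno column: an ordinary table is created first and then registered as a materialized view with ON PREBUILT TABLE, after which the data dictionary should list a TABLE and a MATERIALIZED VIEW under the same name. The name dept_counts is made up.

```sql
-- 1) An ordinary table holding the summary rows.
CREATE TABLE dept_counts AS
SELECT deptno, COUNT(*) AS emp_count
FROM   emp
GROUP  BY deptno;

-- 2) Register that table as a materialized view; the MV name must match the
--    table name and the query's columns must match the table's columns.
CREATE MATERIALIZED VIEW dept_counts
  ON PREBUILT TABLE
  REFRESH COMPLETE ON DEMAND
AS
SELECT deptno, COUNT(*) AS emp_count
FROM   emp
GROUP  BY deptno;

-- 3) Look at what the dictionary holds under that name.
SELECT object_name, object_type
FROM   user_objects
WHERE  object_name = 'DEPT_COUNTS';
```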
As to your performance conundrum:
When you say "on unindexed table" do you literally mean one table? If so, we wouldn't expect any difference in the time to query a view, a materialized view or the actual data: they all execute a full table scan on the same volume of data.
Consider the case where views have select * from <table> where <condition>.
We would expect a SELECT against a materialised view built on that query to execute more quickly than the same SELECT against the actual table, provided the WHERE clause restricts the data to a significantly smaller subset of the original data, simply because a full table scan over a small table (the materialised view) takes less time than a full table scan over a big table. The same applies if the materialised view's projection has fewer columns than the base table.
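To illustrate the "less data to scan" argument, here is a sketch with a hypothetical big_orders table; the table, its columns, the 'OPEN' status value and the MV name are assumptions.

```sql
-- Hypothetical base table big_orders(order_id, customer_id, amount, status, ...).
CREATE MATERIALIZED VIEW open_orders_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT order_id, customer_id, amount   -- fewer columns than the base table
FROM   big_orders
WHERE  status = 'OPEN';                -- and only a subset of its rows

-- When OPEN rows are a small fraction, a full scan here reads far fewer
-- blocks than a full scan of big_orders.
SELECT SUM(amount) FROM open_orders_mv;

-- Compare the segment sizes directly.
SELECT segment_name, blocks
FROM   user_segments
WHERE  segment_name IN ('BIG_ORDERS', 'OPEN_ORDERS_MV');
```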
Indexing is a different matter. Unless the query selects a very small subset of the data it's not going to be more efficient than a full table scan and a filter.
To sum up: the only universal tuning heuristic is: it takes less time to do less work. Beyond that it is impossible to generalise. We can't discuss some vague "consider the case where views have select * from <table> where <condition>." It's all about the specifics.
Fundamentally, a materialized view is just a table with an associated query to populate it.
Given static data, one would generally expect the performance of a SELECT * from the materialized view (with no WHERE clause) to be at least as fast as running the query that underlies the materialized view, regardless of indexing.
If we add a WHERE clause to a SELECT * against the mview, however, that query could perform significantly slower than running the query that underlies the mview with the same WHERE clause. That's because the tables referenced in the query underlying the mview could have indexes to support the conditions in the WHERE clause, whereas the mview might not have such indexes.
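If queries against the mview do filter on a column, the mview can be indexed like any other table. A sketch, assuming a hypothetical emp_dept_mv that is queried by deptno; the plan output shows whether the optimizer actually picks the index.

```sql
-- Hypothetical: emp_dept_mv is an MV over emp JOIN dept, filtered on deptno.
CREATE INDEX emp_dept_mv_deptno_ix ON emp_dept_mv (deptno);

-- Check whether the optimizer now uses the index for the filtered query.
EXPLAIN PLAN FOR
SELECT * FROM emp_dept_mv WHERE deptno = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```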
I have read many posts comparing external tables with SQL*Loader, and the main advantage cited is being able to optimize the SELECT query with the many options SQL offers for an external table. But I am finding it difficult to run SELECTs on large files (1.5 GB). Even a SELECT COUNT(*) takes minutes.
My plan is to generate a report from this data by running a number of SELECT statements against it. I wonder whether this is a better idea than loading the data into an internal table.
I assume the ideal use of an external table is to SELECT from the file to perform cleanup and load the data into an internal table more efficiently, and that it is not meant for using the file as a table over a longer period (especially for large files). Please correct me if I am wrong.
If you're going to run multiple SELECTs on data from a big file, it is much better to load it into an internal staging table (either with SQL*Loader, or through the external table with INSERT ... SELECT) and then run your queries against that table.
You should probably also consider creating some indexes on the table to speed up your queries.
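A sketch of that approach, assuming a hypothetical external table sales_ext over the file and a sale_date column that the report filters on; sales_stage and the index name are also made up.

```sql
-- Read the file once and store its rows in a regular heap table.
CREATE TABLE sales_stage NOLOGGING AS
SELECT * FROM sales_ext;

-- Index the column(s) the report queries filter on.
CREATE INDEX sales_stage_date_ix ON sales_stage (sale_date);

-- Give the optimizer statistics for the new table and its index.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SALES_STAGE', cascade => TRUE);
END;
/
```

If the staging table already exists, INSERT /*+ APPEND */ INTO sales_stage SELECT * FROM sales_ext; loads it with a direct-path insert instead.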
I have a big SQL query that scans multiple tables with millions of records. When the query completes, I get 250K records. The result set is saved in a staging table before being written to files, and it may contain duplicates.
The question is, which of the following options performs better:
1. Doing a GROUP BY or DISTINCT before inserting the result set into the staging table.
2. Inserting the duplicate records into the staging table and using DISTINCT/GROUP BY when selecting records from the staging table.
There is not much difference between 1 and 2.
If you filter the duplicates before inserting then you are reducing the number of writes that you need to make into the staging table and, since those duplicate rows will not be in the staging table, then you are also going to reduce the number of reads from the staging table when you come to write it out to a file. So, logically, option 1 should give better performance.
However, if you are that concerned about the difference between the two then the answer has to be "profile both methods on your system and see which is best on your hardware/tables/indexes/etc".
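One way to profile both options in SQL*Plus or SQLcl is sketched below. The view big_result_v (standing in for the big query) and the staging table result_stage(col1, col2) are hypothetical. Run each option more than once, since caching can skew a single timing.

```sql
SET TIMING ON

-- Option 1: remove duplicates before the rows reach the staging table.
INSERT INTO result_stage (col1, col2)
SELECT DISTINCT col1, col2
FROM   big_result_v;

ROLLBACK;

-- Option 2: write every row, then remove duplicates when reading back.
INSERT INTO result_stage (col1, col2)
SELECT col1, col2
FROM   big_result_v;

-- Count the distinct rows instead of printing 250K of them.
SELECT COUNT(*)
FROM   (SELECT DISTINCT col1, col2 FROM result_stage);

ROLLBACK;
```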
I am reviewing my team's database setup, particularly focusing on Materialized Views. In most cases, we are currently doing 'Complete' refreshes, and I want to move to doing fast refreshes.
In some cases, this is straightforward: the MV is based directly on a table in our source database, and I can enable MVIEW LOGS on the table and recreate the MV.
But in a number of cases, the MVs are based on a combination of other MVs, views, etc., that go several levels deep before I get to the tables in our source database.
In these cases, if I track down the ultimate source tables, will enabling MVIEW LOGS on them allow the top MV and any intermediate MVs, to use fast refresh?
The Oracle documentation contains an example of a FAST REFRESH of a materialized view based on a UNION ALL view:
CREATE VIEW view_with_unionall AS
(SELECT c.rowid crid, c.cust_id, 2 umarker
FROM customers c WHERE c.cust_last_name = 'Smith'
UNION ALL
SELECT c.rowid crid, c.cust_id, 3 umarker
FROM customers c WHERE c.cust_last_name = 'Jones');
CREATE MATERIALIZED VIEW unionall_inside_view_mv
REFRESH FAST ON DEMAND AS
SELECT * FROM view_with_unionall;
So in principle, you can indeed fast refresh materialized views based on views.
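The documentation example relies on a materialized view log on customers. A sketch of the logs involved, including the nested case from the question; mid_mv and top_mv are hypothetical names for an intermediate MV and the MV built on it. As far as I know, each nested level that should fast refresh needs a log on the object it reads from, not only on the ultimate source tables, so verify the setup with DBMS_MVIEW.EXPLAIN_MVIEW.

```sql
-- Base table log: the UNION ALL MV selects c.rowid, so rowids are logged.
CREATE MATERIALIZED VIEW LOG ON customers WITH ROWID;

-- Nested case (hypothetical): top_mv is defined on mid_mv, which is defined
-- on customers. mid_mv needs its own log for top_mv to fast refresh; the
-- exact WITH options depend on top_mv's query.
CREATE MATERIALIZED VIEW LOG ON mid_mv WITH ROWID;

-- Refresh bottom-up so top_mv sees mid_mv's latest changes.
BEGIN
  DBMS_MVIEW.REFRESH('MID_MV', method => 'F');
  DBMS_MVIEW.REFRESH('TOP_MV', method => 'F');
END;
/
```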
Some things to note:
- There are a couple of restrictions on fast-refreshable materialized views; e.g. you cannot use ROWNUM, SYSDATE, or a HAVING clause with a subquery. See the docs for details.
- Somewhat counterintuitively, a FAST REFRESH is not always faster than a COMPLETE REFRESH. It depends on the amount of data that has changed since the last refresh; IMHO, Oracle should have used the term INCREMENTAL REFRESH instead.
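A rough way to check the second point on your own data, assuming the unionall_inside_view_mv example above exists with its materialized view log: time an incremental and a complete refresh in SQL*Plus or SQLcl. The outcome depends entirely on how many rows changed since the last refresh.

```sql
SET TIMING ON

-- Incremental ("fast") refresh: applies only the changes recorded in the MV log.
EXEC DBMS_MVIEW.REFRESH('UNIONALL_INSIDE_VIEW_MV', method => 'F')

-- Complete refresh: re-runs the defining query and repopulates every row.
EXEC DBMS_MVIEW.REFRESH('UNIONALL_INSIDE_VIEW_MV', method => 'C')
```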
Oracle provides a procedure for that: DBMS_MVIEW.EXPLAIN_MVIEW
You can use this procedure to check whether your materialized view is capable of FAST REFRESH; if it is not, it also tells you why.
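A sketch of typical EXPLAIN_MVIEW usage, here against the unionall_inside_view_mv example above. The results go into MV_CAPABILITIES_TABLE, which is created by the utlxmv.sql script shipped under $ORACLE_HOME/rdbms/admin (the @? shorthand for ORACLE_HOME works in SQL*Plus).

```sql
-- Create MV_CAPABILITIES_TABLE in the current schema (run once).
@?/rdbms/admin/utlxmv.sql

-- Analyze the MV; one row per capability is written to the table.
BEGIN
  DBMS_MVIEW.EXPLAIN_MVIEW('UNIONALL_INSIDE_VIEW_MV');
END;
/

-- Rows with POSSIBLE = 'N' carry the reason in MSGTXT.
SELECT capability_name, possible, related_text, msgtxt
FROM   mv_capabilities_table
WHERE  capability_name LIKE 'REFRESH_FAST%'
ORDER  BY seq;
```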
For me, the strangest restriction on FAST REFRESH is this: when you join several tables you have to use the (old) Oracle join syntax; ANSI join syntax does not work. Some time ago I created a case with Oracle Support about this issue, and the answer from Oracle was: "This is not a bug, it is just a lack of documentation."(!)
I don't know whether this still applies to Oracle 12c.
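For reference, a join MV written the way that reportedly works: Oracle-style comma joins, the rowid of each joined table in the select list, and rowid-based MV logs on each table. The emp and dept tables, their columns and the MV name are hypothetical.

```sql
-- Join-only fast refresh needs rowid-based logs on every table in the join.
CREATE MATERIALIZED VIEW LOG ON emp  WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON dept WITH ROWID;

-- Oracle (comma + WHERE) join syntax, selecting the rowid of each table.
CREATE MATERIALIZED VIEW emp_dept_fast_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
AS
SELECT e.rowid AS e_rid, d.rowid AS d_rid, e.empno, e.ename, d.dname
FROM   emp e, dept d
WHERE  e.deptno = d.deptno;
```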
I have a query whose SELECT returns data in 5 seconds. But when I put a CREATE MATERIALIZED VIEW command in front of it, creating the materialized view takes forever.
When you create a materialized view, you actually create a copy of the data that Oracle takes care to keep synchronized (which makes these views somewhat like indexes). If your view operates on a large amount of data, or on data from other servers, it is natural that creating it takes time.
From docs.oracle.com:
A materialized view is a replica of a target master from a single point in time.
Just for "yuks", try
create table temp_tab nologging as select ...
I've seen cases where MV creation takes a long time for some reason, probably logging.
Also, query development tools sometimes start returning data to the screen right away, but if you paged to the last row, you would find out how long it really takes to get all the data.
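In SQL*Plus, one way to measure the time to fetch every row without printing them is AUTOTRACE in TRACEONLY mode, sketched below. slow_query_v is a hypothetical view wrapping the query in question, and AUTOTRACE statistics require the PLUSTRACE role or equivalent privileges.

```sql
SET TIMING ON
-- Fetch all rows but suppress their display; print execution statistics instead.
SET AUTOTRACE TRACEONLY STATISTICS

SELECT * FROM slow_query_v;

SET AUTOTRACE OFF
```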
You should profile the SELECT statement with EXPLAIN PLAN and understand the table cardinalities, the indexes, the wait states while it runs, and so on, in order to see whether the query needs tuning.
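A sketch of that profiling step: run the query once with the GATHER_PLAN_STATISTICS hint, then ask DBMS_XPLAN for the plan of the last statement, with estimated (E-Rows) versus actual (A-Rows) row counts. slow_query_v is a hypothetical stand-in for the query being tuned.

```sql
-- Keep SQL*Plus from issuing a DBMS_OUTPUT call after the query,
-- which would otherwise become the "last" cursor.
SET SERVEROUTPUT OFF

SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM slow_query_v;

-- Plan of the last statement in this session, with actual rows and timings.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Large gaps between E-Rows and A-Rows usually point at the step worth tuning first.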