I have read quite a lot over the past few hours about refreshing materialized views (MVs) in Oracle, but I still cannot find an answer to my question. Imagine I have an MV on top of a table with change logs, and that there are three records in this MV:
COL_ID, COL1
1, "OLD"
2, "OLD"
3, "OLD"
Now let's say the value of COL1 has changed to "EDITED" for record 1 in the table the MV is built on. I want to perform a fast, in-place refresh to update the MV as quickly as possible. In a real-life example with around 50M records, this would take around 3 minutes to refresh.
Now imagine the following situation:
The refresh process is still running (there are still records that have not been amended in the MV).
Nevertheless, the record with ID = 1 has already been processed, so it has the value "EDITED" in the MV.
In another session, a query against the MV is executed to get the value of the record with ID = 1.
Will the result be the record with value "OLD" or "EDITED"?
I understand that because this is an in-place refresh, once this record has been processed by the refresh mechanism the value in the materialized view will reflect the value in the origin table ("EDITED"). But is there any mechanism (like undo logs) that would make the whole refresh atomic? By that I mean: until all amendments have been applied to the materialized view (until the refresh process is done), a user querying rows that are being modified during the refresh would still be presented with the old, cached value from before the change.
I presume this behaviour holds for an out-of-place refresh, but since the in-place refresh seems to be far more efficient in terms of time, I was curious whether it also holds there. If not by default, is there any way to force this atomic behaviour?
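For what it's worth, dbms_mview.refresh does expose an atomic_refresh parameter (which defaults to TRUE and is supposed to run the refresh in a single transaction); whether it covers the in-place fast-refresh case is exactly what I am unsure about. A minimal sketch of the call, with 'MY_MV' as a placeholder name:
begin
  -- atomic_refresh => true is the default; 'MY_MV' is a placeholder for the real MV name
  dbms_mview.refresh(
    list           => 'MY_MV',
    method         => 'F',   -- fast refresh
    atomic_refresh => true);
end;
/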
----- [EDIT]
I ran the following test to see whether, during the refresh process, the results I get from the materialized view change gradually.
-- create table
create table MV_REFRESH_ATOMICITY_TEST
(
  id    NUMBER,
  value NUMBER
);
-- populate initial data
-- delete from MV_REFRESH_ATOMICITY_TEST;
begin
  for i in 1..10000000 loop
    insert into MV_REFRESH_ATOMICITY_TEST values(i, 0);
  end loop;
  commit;
end;
/
-- check that the sum equals zero and the count equals 10M
select sum(value) from MV_REFRESH_ATOMICITY_TEST;
select to_char(count(*),'999,999,999') as COUNT from MV_REFRESH_ATOMICITY_TEST;
-- create mv logs on the table
-- drop materialized view log on MV_REFRESH_ATOMICITY_TEST;
create materialized view log on MV_REFRESH_ATOMICITY_TEST with rowid;
-- create mv on top
-- drop materialized view MV_REFRESH_ATOMICITY_TEST_MV
create materialized view MV_REFRESH_ATOMICITY_TEST_MV
refresh fast on demand with rowid
as
select
  fact.*,
  fact.ROWID "FACT_ROWID"
from
  MV_REFRESH_ATOMICITY_TEST fact;
-- check if equals zero and 10M
select sum(value) from MV_REFRESH_ATOMICITY_TEST_MV;
select to_char(count(*),'999,999,999') as COUNT from MV_REFRESH_ATOMICITY_TEST_MV;
-- change value for the first million records, 1 million in the middle, and the last million
update MV_REFRESH_ATOMICITY_TEST set value = 1 where id between 1 and 1000000;
update MV_REFRESH_ATOMICITY_TEST set value = 1 where id between 5000001 and 6000000;
update MV_REFRESH_ATOMICITY_TEST set value = 1 where id between 9000001 and 10000000;
commit;
-- check that the sum equals 3,000,000
select to_char(sum(value),'999,999,999') as "SUM" from MV_REFRESH_ATOMICITY_TEST;
-- check that the MV log contains 3,000,000 entries
select to_char(count(*),'999,999,999') from MLOG$_MV_REFRESH_ATOMICITY;
--select * from MLOG$_MV_REFRESH_ATOMICITY;
-- while the MV is refreshing, repeatedly run the query below
-- exec dbms_mview.refresh('MV_REFRESH_ATOMICITY_TEST_MV', 'F');
-- if the refresh is atomic, the sum below should stay 0 until the refresh completes
select
( select sum(value) from MV_REFRESH_ATOMICITY_TEST_MV ) "SUM",
( select count(*) from MV_REFRESH_ATOMICITY_TEST_MV ) "NUMBER OF RECORDS"
from dual
What I discovered by repeatedly executing the last select statement was that when the SUM value finally changed, it changed by 3M in one step, meaning all the records were changed in one go - atomically.
Nevertheless, I am not 100% sure I can trust this experiment, as at some point it took around 40 s to execute these select queries. The whole refresh statement took 911 s to execute.
[EDIT]
This question has been marked as a possible duplicate of this thread. The other thread does address a similar problem, but for a complete refresh, which to my understanding is performed in a very different way than the fast refresh discussed here. Therefore I am not sure whether the same explanation applies.
As far as I can see from the Oracle documentation (http://docs.oracle.com/cd/B19306_01/server.102/b14226/repmview.htm#i31171), every refresh is done in an atomic way:
A materialized view's data does not necessarily match the current data of its master table or master materialized view at all times. A materialized view is a transactionally (read) consistent reflection of its master as the data existed at a specific point in time (that is, at creation or when a refresh occurs).
Oracle provides even greater read consistency with materialized view groups:
To preserve referential integrity and transactional (read) consistency among multiple materialized views, Oracle has the ability to refresh individual materialized views as part of a refresh group. After refreshing all of the materialized views in a refresh group, the data of all materialized views in the group correspond to the same transactionally consistent point in time.
Related
I have a question about a cascading trigger.
I have 2 tables: table data, which has 3 attributes (id_data, sum, and id_tool), and table tool, which has 3 attributes (id_tool, name, sum_total). The data and tool tables are joined using id_tool.
I want to create a trigger that keeps sum_total up to date, so that when I insert into table data, sum_total in table tool (where tool.id_tool = data.id_tool) is updated too.
I created this trigger, but I get error ORA-04090.
create or replace trigger aft_ins_tool
after insert on data
for each row
declare
  v_stok number;
  v_jum  number;
begin
  select sum into v_jum
    from data
   where id_data = :new.id_data;

  select sum_total into v_stok
    from tool
   where id_tool = (select id_tool
                      from data
                     where id_data = :new.id_data);

  if inserting then
    v_stok := v_stok + v_jum;
    update tool
       set sum_total = v_stok
     where id_tool = (select id_tool
                        from data
                       where id_data = :new.id_data);
  end if;
end;
/
Please give me your opinion.
Thanks.
The ora-04090 indicates that you already have an AFTER INSERT ... FOR EACH ROW trigger on that table. Oracle doesn't like that, because the order in which the triggers fire is unpredictable, which may lead to unpredictable results, and Oracle really doesn't like those.
So, your first step is to merge the two sets of code into a single trigger. Then the real fun begins.
Presumably there is only one row in data matching the current value of id_data (if not, your data model is really messed up and there's no hope for your situation). Anyway, that means the current row already gives you access to the values of :new.sum and :new.id_tool. So you don't need those queries on the data table: removing those selects will remove the possibility of "mutating table" errors.
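To make that concrete, here is a minimal sketch of the merged trigger using the :new values directly (untested, and assuming the column names from the question):
create or replace trigger aft_ins_tool
after insert on data
for each row
begin
  -- :new already carries sum and id_tool for the inserted row,
  -- so there is no need to re-query the (mutating) DATA table
  update tool
     set sum_total = sum_total + :new.sum
   where id_tool = :new.id_tool;
end;
/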
As a general observation, maintaining aggregate or summary tables like this is generally a bad idea. Usually it is better just to query the information when it is needed. If you really have huge volumes of data then you should use a materialized view to maintain the summary, rather than hand-rolling something.
I have a Materialized view in Oracle that contains a LEFT JOIN which takes a very long time to update. When I update the underlying table it takes 63914.765 s to run (yes that is almost 17 hours).
I am using a LEFT JOIN on the same table, because I want to pivot the data from rows to columns. The pivot command is not available in this Oracle version, and using a GROUP BY + CASE is not allowed on a FAST REFRESH Materialized View.
The Materialized View Log looks like this:
CREATE MATERIALIZED VIEW LOG ON Programmes_Titles
WITH PRIMARY KEY, rowid
INCLUDING NEW Values;
The Materialized View itself looks like this (it contains 700000 rows, the Programmes_Titles table contains 900000 rows):
CREATE MATERIALIZED VIEW Mv_Web_Programmes
REFRESH FAST ON COMMIT
AS
SELECT
t1.ProgrammeId,
t1.Title as MainTitle,
t2.Title as SecondaryTitle,
--Primary key
t1.Title_Id as t1_titleId,
t2.Title_Id as t2_titleId,
t1.rowid as t1_rowid,
t2.rowid as t2_rowid
FROM
Programmes_Titles t1,
Programmes_Titles t2
WHERE
t1.Titles_Group_Type = 'mainTitle'
AND t1.Programme_Id = t2.Programme_Id(+) AND t2.Titles_Group_Type(+) = 'secondaryTitle'
The UPDATE statement I use is this:
UPDATE Programmes_Titles
SET Title = 'New title'
WHERE rowid = 'AAAL4cAAEAAAftTABB'
This UPDATE statement takes 17 hours.
When using an INNER JOIN (remove the (+)'s) it takes milliseconds.
I also tried adding indexes on the Mv_Web_Programmes materialized view, but that did not seem to help either. (It still runs for more than a minute, which is way too slow; I am not waiting 17 hours after every change, so it might have improved the UPDATE.)
So my question is: why does it take such a long time to UPDATE the underlying table? How can I improve this?
I've managed to reproduce your problem on a 10.2.0.3 instance. The self- and outer-join seems to be the major problem (although with indexes on every column of the MV it finally did update in under a minute).
At first I thought you could use an aggregate MV:
SQL> CREATE MATERIALIZED VIEW LOG ON Programmes_Titles
2 WITH PRIMARY KEY, ROWID (programmeId, Titles_Group_Type, title)
3 INCLUDING NEW Values;
Materialized view log created
SQL> CREATE MATERIALIZED VIEW Mv_Web_Programmes
2 REFRESH FAST ON COMMIT
3 AS
4 SELECT ProgrammeId,
5 MAX(decode(t1.Titles_Group_Type, 'mainTitle', t1.Title)) MainTl,
6 MAX(decode(t1.Titles_Group_Type, 'secondaryTitle', t1.Title)) SecTl
7 FROM Programmes_Titles t1
8 GROUP BY ProgrammeId;
Materialized view created
Unfortunately, as you have noticed, as of 10g an MV that contains MIN or MAX can only be fast-refreshed on commit after inserts (a so-called insert-only MV). The above solution would not work for updates/deletes (the MV would have to be refreshed manually).
You could trace your session and open the trace file to see what SQL gets executed, so that you can work out whether you can optimize it via indexes.
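For example (a sketch; run it in the session that issues the UPDATE, then format the resulting trace file with tkprof to see the recursive SQL generated by the fast refresh):
-- enable SQL trace with waits and binds for the current session
exec dbms_monitor.session_trace_enable(waits => true, binds => true);
-- ... run the slow UPDATE and the commit here ...
exec dbms_monitor.session_trace_disable;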
We too faced this issue recently on Oracle 11.2.0.3
In our case, removing the OUTER JOIN was not an option due to the functional impact.
On investigation, it was found that Oracle was adding a nasty HASH_SJ (hash semi-join) hint to the MV refresh DML.
Nothing worked, including the things mentioned in the following blog:
http://www.adellera.it/blog/2010/03/11/fast-refresh-of-join-only-mvs-_mv_refresh_use_stats-and-locking-log-stats/#comment-2975
In the end, a hidden parameter worked... (though in general this should be avoided by making a change in the application, if possible).
Oracle Doc ID 1949537.1 suggests that setting the hidden _mv_refresh_use_hash_sj parameter to FALSE should prevent it using that hint.
alter session set "_mv_refresh_use_hash_sj"=FALSE;
That stopped the CBO from using the HASH_SJ hint.
Posting it here in the interests of others.
I have a table with a VARCHAR2 primary key.
It has about 1,000,000 transactions per day.
My app wakes up every 5 minutes to generate a text file by querying only the new records.
It remembers the last point and processes only the new records.
Do you have an idea how to do this query with good performance?
I am able to add a new column if necessary.
What do you think this process should be implemented with?
PL/SQL?
Java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) CHECK IN ('Y','N') column is right, but misses how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN, based on the System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row, and changing that behavior would require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there, but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a bitmap index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful: don't just select the new rows, process them, and then mark everything as processed in a separate step. Otherwise, rows inserted while you were processing could get marked as processed even though they have not been.
The easiest way would be to use PL/SQL to open a cursor that selects unprocessed rows, processes them, and then updates each row as processed. If you have an aversion to walking cursors, you could collect the PKs or ROWIDs into a nested table, process them, and then update using the nested table.
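A rough sketch of the cursor variant, assuming the needs_processed flag suggested above and a hypothetical process_row procedure:
declare
  cursor c_unprocessed is
    select id                                -- hypothetical key column
      from my_table                          -- hypothetical table name
     where needs_processed = 'Y'
       for update of needs_processed;
begin
  for r in c_unprocessed loop
    process_row(r.id);                       -- hypothetical processing procedure
    update my_table
       set needs_processed = 'N'
     where current of c_unprocessed;
  end loop;
  commit;  -- a single commit at the end keeps the FOR UPDATE cursor valid
end;
/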
In the MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an index on PROCESSED (a bitmap index may be of some advantage, as the possible values are only 'Y' and 'N', but test it out) so that your query will use the index.
Also, if you are sure you query only every 5 minutes, check whether you can add another column of TIMESTAMP type and partition the table by it (not sure, check it out).
I would also think about writing a job or something similar that writes the file using UTL_FILE and exposes it to the front end if possible.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log without affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns: the ID column and a processed-flag column. Have an insert trigger on the original table place its ID in this new table. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
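A minimal sketch of that idea, with hypothetical table and column names:
create table to_process (
  id             number primary key,
  processed_flag char(1) default 'N'
);

create or replace trigger trg_capture_new_rows
after insert on original_table               -- hypothetical source table
for each row
begin
  insert into to_process (id) values (:new.id);
end;
/

-- the logging process then reads from TO_PROCESS, marks rows as processed,
-- and periodically deletes the processed rows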
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate default vs. setting a value of Y, updating the function-based index vs. the date index, and doing a range select on the date vs. a specific select on the single value Y. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on a date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
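As an illustration of the second variant (a sketch, assuming the needs_processed flag from the earlier answers):
-- claim a batch: flip unprocessed rows to an in-flight status
update my_table set needs_processed = 'Q' where needs_processed = 'Y';
commit;

-- process exactly the claimed rows
select * from my_table where needs_processed = 'Q';
-- ... generate the file ...

-- mark only the claimed rows as done; rows inserted in the meantime stay at 'Y'
update my_table set needs_processed = 'N' where needs_processed = 'Q';
commit;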
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records. In the first run you need to execute the following query:
insert into new_table(max_rowid) select max(rowid) from yourtable;
The next time you want to get only the newly inserted rows, you can do it by executing the following command:
select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done processing the rows from the above query, simply truncate new_table and insert the new max(rowid) from yourtable.
I think this should work and would be the fastest solution.
I have a self referencing table in Oracle 9i, and a view that gets data from it:
CREATE OR REPLACE VIEW config AS
SELECT c.node_id,
c.parent_node_id,
c.config_key,
c.config_value,
(SELECT c2.config_key
FROM vera.config_tab c2
WHERE c2.node_id = c.parent_node_id) AS parent_config_key,
sys_connect_by_path(config_key, '.') path,
sys_connect_by_path(config_key, '->') php_notation
FROM config_tab c
CONNECT BY c.parent_node_id = PRIOR c.node_id
START WITH c.parent_node_id IS NULL
ORDER BY LEVEL DESC
The table stores configuration for a PHP application. Now I need to use the same config in an Oracle view.
I would like to select some values from the view by path, but unfortunately this takes 0.15 s, which is an unacceptable cost.
SELECT * FROM some_table
WHERE some_column IN (
SELECT config_value FROM config_tab WHERE path = 'a.path.to.config'
)
At first I thought of a function-based index on sys_connect_by_path, but that is impossible, as it also requires the CONNECT BY clause.
Any suggestions on how I can emulate an index on the path column of the 'config' view?
If your data doesn't change frequently in the config_tab, you could use a materialized view with the same query as your view. You could then index the path column of your materialized view.
CREATE MATERIALIZED VIEW config
REFRESH COMPLETE ON DEMAND
AS <your_query>;
CREATE INDEX ix_config_path ON config (path);
Since this is a complex query, you would need to do a full refresh of your materialized view every time the base table is updated so that the data in the MV doesn't become stale.
Update
Your column path will be defined as a VARCHAR2(4000). You could limit the size of this column in order to index it. In your query, replace sys_connect_by_path(...) with SUBSTR(sys_connect_by_path(...), 1, 1000), for example.
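For illustration, the materialized view might end up looking roughly like this (a sketch based on the question's view, with the parent_config_key subquery and ORDER BY omitted for brevity and an arbitrary 1000-character cap):
CREATE MATERIALIZED VIEW config
REFRESH COMPLETE ON DEMAND
AS
SELECT c.node_id,
       c.parent_node_id,
       c.config_key,
       c.config_value,
       SUBSTR(sys_connect_by_path(config_key, '.'), 1, 1000) AS path,
       SUBSTR(sys_connect_by_path(config_key, '->'), 1, 1000) AS php_notation
FROM   config_tab c
CONNECT BY c.parent_node_id = PRIOR c.node_id
START WITH c.parent_node_id IS NULL;

CREATE INDEX ix_config_path ON config (path);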
You won't be able to use REFRESH ON COMMIT on a complex MV. A simple trigger won't work either. You will have to modify the code that updates your base table to include a refresh somehow; I don't know if this is practical in your environment.
You could also use a trigger that submits a job that will refresh the MV. The job will execute once you commit (this is a feature of dbms_job). This is more complex since you will have to check that you only trigger the job once per transaction (using a package variable for example). Again, this is only practical if you don't update the base table frequently.
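A rough sketch of the trigger-plus-job approach (statement-level trigger; the once-per-transaction guard mentioned above is left out for brevity):
create or replace trigger trg_config_tab_refresh
after insert or update or delete on config_tab
declare
  v_job binary_integer;
begin
  -- dbms_job jobs only become runnable when the submitting transaction commits,
  -- so the MV refresh happens after the change, not during it
  dbms_job.submit(
    job  => v_job,
    what => 'dbms_mview.refresh(''CONFIG'', ''C'');');
end;
/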
We have an Oracle database, and the customer account table has about a million rows. Over the years, we've built four different UIs (two in Oracle Forms, two in .Net), all of which remain in use. We have a number of background tasks (both persistent and scheduled) as well.
Something is occasionally holding a long lock (say, more than 30 seconds) on a row in the account table, which causes one of the persistent background tasks to fail. The background task in question restarts itself once the update times out. We find out about it a few minutes after it happens, but by then the lock has been released.
We have reason to believe that it might be a misbehaving UI, but haven't been able to find a "smoking gun".
I've found some queries that list blocking locks, but those only show something when you've got two sessions contending for a row. I want to know which rows have locks even when there's not necessarily a second job trying to get a lock.
We're on 11g, but have been experiencing the problem since 8i.
Oracle's locking concept is quite different from that of other systems.
When a row in Oracle gets locked, the record itself is updated with the new value (if any) and, in addition, a lock (which is essentially a pointer to the transaction lock that resides in the rollback segment) is placed right into the record.
This means that locking a record in Oracle means updating the record's metadata and issuing a logical page write. For instance, you cannot do SELECT FOR UPDATE on a read-only tablespace.
More than that, the records themselves are not updated after commit: instead, the rollback segment is updated.
This means that each record holds some information about the transaction that last updated it, even if the transaction itself has long since died. To find out if the transaction is alive or not (and, hence, if the record is alive or not), it is required to visit the rollback segment.
Oracle does not have a traditional lock manager, and this means that obtaining a list of all locks requires scanning all records in all objects. This would take too long.
You can obtain some special locks, like locked metadata objects (using v$locked_object), lock waits (using v$session) etc, but not the list of all locks on all objects in the database.
You can find the locked tables in Oracle by running the following query:
select
c.owner,
c.object_name,
c.object_type,
b.sid,
b.serial#,
b.status,
b.osuser,
b.machine
from
v$locked_object a ,
v$session b,
dba_objects c
where
b.sid = a.session_id
and
a.object_id = c.object_id;
Rather than locks, I suggest you look at long-running transactions, using v$transaction. From there you can join to v$session, which should give you an idea about the UI (try the program and machine columns) as well as the user.
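Something along these lines (a sketch; trim or extend the column list as needed):
select s.sid,
       s.serial#,
       s.username,
       s.program,
       s.machine,
       t.start_time,
       t.status
from   v$transaction t
       join v$session s on s.taddr = t.addr
order  by t.start_time;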
Look at the dba_blockers, dba_waiters and dba_locks for locking. The names should be self explanatory.
You could create a job that runs, say, once a minute and logged the values in the dba_blockers and the current active sql_id for that session. (via v$session and v$sqlstats).
You may also want to look at v$sql_monitor. By default this will log all SQL that takes longer than 5 seconds. It is also visible on the "SQL Monitoring" page in Enterprise Manager.
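For example (assuming the Tuning Pack licence that Real-Time SQL Monitoring requires):
-- long-running statements captured by Real-Time SQL Monitoring
select sid,
       status,
       sql_id,
       sql_exec_start,
       elapsed_time / 1e6 as elapsed_seconds
from   v$sql_monitor
order  by sql_exec_start desc;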
The PL/SQL block below finds all locked rows in a table. The other answers only find the blocking session; finding the actual locked rows requires reading and testing each row.
(However, you probably do not need to run this code. If you're having a locking problem, it's usually easier to find the culprit using GV$SESSION.BLOCKING_SESSION and other related data dictionary views. Please try another approach before you run this abysmally slow code.)
First, let's create a sample table and some data. Run this in session #1.
--Sample schema.
create table test_locking(a number);
insert into test_locking values(1);
insert into test_locking values(2);
commit;
update test_locking set a = a+1 where a = 1;
In session #2, create a table to hold the locked ROWIDs.
--Create table to hold locked ROWIDs.
create table locked_rowids(the_rowid rowid);
--Remove old rows if table is already created:
--delete from locked_rowids;
--commit;
In session #2, run this PL/SQL block to read the entire table, probe each row, and store the locked ROWIDs. Be warned, this may be ridiculously slow. In your real version of this query, change both references to TEST_LOCKING to your own table.
--Save all locked ROWIDs from a table.
--WARNING: This PL/SQL block will be slow and will temporarily lock rows.
--You probably don't need this information - it's usually good enough to know
--what other sessions are locking a statement, which you can find in
--GV$SESSION.BLOCKING_SESSION.
declare
v_resource_busy exception;
pragma exception_init(v_resource_busy, -00054);
v_throwaway number;
type rowid_nt is table of rowid;
v_rowids rowid_nt := rowid_nt();
begin
--Loop through all the rows in the table.
for all_rows in
(
select rowid
from test_locking
) loop
--Try to lock each row.
begin
select 1
into v_throwaway
from test_locking
where rowid = all_rows.rowid
for update nowait;
--If it doesn't lock, then record the ROWID.
exception when v_resource_busy then
v_rowids.extend;
v_rowids(v_rowids.count) := all_rows.rowid;
end;
rollback;
end loop;
--Display count:
dbms_output.put_line('Rows locked: '||v_rowids.count);
--Save all the ROWIDs.
--(Row-by-row because ROWID type is weird and doesn't work in types.)
for i in 1 .. v_rowids.count loop
insert into locked_rowids values(v_rowids(i));
end loop;
commit;
end;
/
Finally, we can view the locked rows by joining to the LOCKED_ROWIDS table.
--Display locked rows.
select *
from test_locking
where rowid in (select the_rowid from locked_rowids);
A
-
1
Given some table, you can find which rows are not locked with SELECT FOR UPDATE SKIP LOCKED.
For example, this query will lock (and return) every unlocked row:
SELECT * FROM mytable FOR UPDATE SKIP LOCKED
References
Ask TOM "How to get ROWID for locked rows in oracle".