I have table with "varchar2" as primary key.
It has about 1 000 000 Transactions per day.
My app wakes up every 5 minute to generate text file by querying only new record.
It will remember last point and process only new records.
Do you have idea how to query with good performance?
I am able to add new column if necessary.
What do you think this process should do by?
plsql?
java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) check in ('Y','N')column is right, but missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a BITMAP index. For low cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have the rest is pretty easy. But you've got to be careful not select the new rows, process them and mark them as processed in one step. Otherwise, rows could be inserted while you are processing that will get marked processed even though they have not been.
The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.
In MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on the PROCESSED ( a bitmap index may be of some advantage, as the possible value can be only 'y' and 'n', but test it out ) so that when you query it will use INDEX.
Also if sure, you query only for every 5 mins, check whether you can add another column with TIMESTAMP type and partition the table with it. ( not sure, check out again ).
I would also think about writing job or some thing and write using UTL_FILE and show it front end if it can be.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log withou affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do Asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns. The ID column and a processed flag column. Have an insert trigger on the original table place it's ID in this new table. Your logging process can than select records from this new table and mark them as processed. Finally delete the processed records from this table.
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate as default vs. setting a value of Y, updating the function based vs. date index, and doing a range select on the date vs. a specific select on a single value for the Y. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on a date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived when you query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work..
What you need to do following steps
For the first run, you will have to copy all records. In first run you need to execute following query
insert into new_table(max_rowid) as (Select max(rowid) from yourtable);
Now next time when you want to get only newly inserted values, you can do it by executing follwing command
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done with processing above query, simply truncate new_table and insert max(rowid) from yourtable
I think this should work and would be fastest solution;
Related
I have a table called orders in which data is loaded through a CSV file into a loader table for every three hours. I have a column last_modified set to SYSDATE which records the insert and update on the table. Recently, I have observed that the last_modified column has null values for more than 100k records when update happens. Is there any way to fix this issue?
Merge into orders d
using (select * from ods_prm_data ) s
on (d.order_id = s.order_id)
when not matched then
insert (d.order_id ,d.ID, d.last_modified)
values (s.order_id, s.ID,s.order_seq.val,SYSDATE)
when matched then
update set d.last_modified = SYSDATE;
Take a look at the WHEN NOT MATCHED branch of the MERGE statement. I'm rather surprised that this statement even compiled because the INSERT field list (d.order_id ,d.ID, d.last_modified) specifies three fields, but the VALUES list (s.order_id, s.ID,s.order_seq.val, SYSDATE)shows FOUR values. I'm not sure how this slipped past the compiler, and I have no idea how it's being executed; however, it may help to explain the issues you're having.
Best of luck.
Friend, I have question about cascade trigger.
I have 2 tables, table data that has 3 attributes (id_data, sum, and id_tool), and table tool that has 3 attributes (id_tool, name, sum_total). table data and tool are joined using id_tool.
I want create trigger for update info sum_total. So , if I inserting on table data, sum_total on table tool where tool.id_tool = data.id_tool will updating too.
I create this trigger, but error ora-04090.
create or replace trigger aft_ins_tool
after insert on data
for each row
declare
v_stok number;
v_jum number;
begin
select sum into v_jum
from data
where id_data= :new.id_data;
select sum_total into v_stok
from tool
where id_tool=
(select id_tool
from data
where id_data= :new.id_data);
if inserting then
v_stok := v_stok + v_jum;
update tool
set sum_total=v_stok
where id_tool=
(select id_tool
from data
where id_data= :new.id_data);
end if;
end;
/
please give me opinion.
Thanks.
The ora-04090 indicates that you already have an AFTER INSERT ... FOR EACH ROW trigger on that table. Oracle doesn't like that, because the order in which the triggers fire is unpredictable, which may lead to unpredictable results, and Oracle really doesn't like those.
So, your first step is to merge the two sets of code into a single trigger. Then the real fun begins.
Presumably there is only one row in data matching the current value of id_data (if not your data model is rally messed up and there's no hope for your situation). Anyway, that means the current row already gives you access to the values of :new.sum and :new.id_tool. So you don't need those queries on the data table: removing those selects will remove the possibility of "mutating table" errors.
As a general observation, maintaining aggregate or summary tables like this is generally a bad idea. Usually it is better just to query the information when it is needed. If you really have huge volumes of data then you should use a materialized view to maintain the summary, rather than hand-rolling something.
I am creating some record which have id, ts ... So firstly I call select to get ts and id:
select SEQ_table.nextval, CURRENT_TIMESTAMP from dual
and then I call insert
insert into table ...id, ts ...
this works good in 99 % but sometimes when there is a big load the order of record is bad because I need record.id < (record+1).id and record.ts < (record+1).ts but this conditional is met. How I can solve this problem ? I am using oracle database.
You should not use the result of a sequence for ordering. This might look strange but think about how sequences are cached and think about RAC. Every instance has it's own sequence cache .... For performance you need big caches. sequences had better be called random unique key generators that happen to work sequenctially most of the time.
The timestamp format has a time resolution upto microsecond level. When hardware becomes quicker and load increases it could be that you get multiple rows at the same time. There is not much you can do about that, until oracle takes the resolution a step farther again.
Use an INSERT trigger to populate the id and ts columns.
create table sotest
(
id number,
ts timestamp
);
create sequence soseq;
CREATE OR REPLACE TRIGGER SOTEST_BI_TRIG BEFORE
INSERT ON SOTEST REFERENCING NEW AS NEW FOR EACH ROW
BEGIN
:new.id := soseq.nextval;
:new.ts := CURRENT_TIMESTAMP;
END;
/
PHIL#PHILL11G2 > insert into sotest values (NULL,NULL);
1 row created.
PHIL#PHILL11G2 > select * from sotest;
ID TS
---------- ----------------------------------
1 11-MAY-12 13.29.33.771515
PHIL#PHILL11G2 >
You should also pay attention to the other answer provided. Is id meant to be a meaningless primary key (it usually is in apps - it's just a key to join on)?
We are running into performance issue where I need some suggestions ( we are on Oracle 10g R2)
The situation is sth like this
1) It is a legacy system.
2) In some of the tables it holds data for the last 10 years ( means data was never deleted since the first version was rolled out). Now in most of the OLTP tables they are having around 30,000,000 - 40,000,000 rows.
3) Search operations on these tables is taking flat 5-6 minutes of time. ( a simple query like select count(0) from xxxxx where isActive=’Y’ takes around 6 minutes of time.) When we saw the explain plan we found that index scan is happening on isActive column.
4) We have suggested archive and purge of the old data which is not needed and team is working towards it. Even if we delete 5 years of data we are left with around 15,000,000 - 20,000,000 rows in the tables which itself is very huge, so we thought of having table portioning on these tables, but we found that the user can perform search of most of the columns of these tables from UI,so which will defeat the very purpose of table partitioning.
so what are the steps which need to be taken to improve this situation.
First of all: question why you are issuing the query select count(0) from xxxxx where isactive = 'Y' in the first place. Nine out of ten times it is a lazy way to check for existence of a record. If that's the case with you, just replace it with a query that select 1 row (rownum = 1 and a first_rows hint).
The number of rows you mention are nothing to be worried about. If your application doesn't perform well when number of rows grows, then your system is not designed to scale. I'd investigate all queries that take too long using a SQL*Trace or ASH and fix it.
By the way: nothing you mentioned justifies the term legacy, IMHO.
Regards,
Rob.
Just a few observations:
I'm guessing that the "isActive" column can have two values - 'Y' and 'N' (or perhaps 'Y', 'N', and NULL - although why in the name of Fred there wouldn't be a NOT NULL constraint on such a column escapes me). If this is the case an index on this column would have very poor selectivity and you might be better off without it. Try dropping the index and re-running your query.
#RobVanWijk's comment about use of SELECT COUNT(*) is excellent. ONLY ask for a row count if you really need to have the count; if you don't need the count, I've found it's faster to do a direct probe (SELECT whatever FROM wherever WHERE somefield = somevalue) with an apprpriate exception handler than it is to do a SELECT COUNT(*). In the case you cited, I think it would be better to do something like
BEGIN
SELECT IS_ACTIVE
INTO strIsActive
FROM MY_TABLE
WHERE IS_ACTIVE = 'Y';
bActive_records_found := TRUE;
EXCEPTION
WHEN NO_DATA_FOUND THEN
bActive_records_found := FALSE;
WHEN TOO_MANY_ROWS THEN
bActive_records_found := TRUE;
END;
As to partitioning - partitioning can be effective at reducing query times IF the field on which the table is partitioned is used in all queries. For example, if a table is partitioned on the TRANSACTION_DATE variable, then for the partitioning to make a difference all queries against this table would have to have a TRANSACTION_DATE test in the WHERE clause. Otherwise the database will have to search each partition to satisfy the query, so I doubt any improvements would be noted.
Share and enjoy.
We are trying to copy the current row of a table to mirror table by using a trigger before delete / update. Below is the working query
BEFORE UPDATE OR DELETE
ON CurrentTable FOR EACH ROW
BEGIN
INSERT INTO MirrorTable
( EMPFIRSTNAME,
EMPLASTNAME,
CELLNO,
SALARY
)
VALUES
( :old.EMPFIRSTNAME,
:old.EMPLASTNAME,
:old.CELLNO,
:old.SALARY
);
END;
But the problem is we have more than 50 coulmns in the current table and dont want to mention all those column names. Is there a way to select all coulmns like
:old.*
SELECT * INTO MirrorTable FROM CurrentTable
Any suggestions would be helpful.
Thanks,
Realistically, no. You'll need to list all the columns.
You could, of course, dynamically generate the trigger code pulling the column names from DBA_TAB_COLUMNS. But that is going to be dramatically more work than simply typing in 50 column names.
If your table happens to be an object table, :new would be an instance of that object so you could insert that. But it would be rather rare to have an object table.
If your 'current' and 'mirror' tables have EXACTLY the same structure you may be able to use something like
INSERT INTO MirrorTable
SELECT *
FROM CurrentTable
WHERE CurrentTable.primary_key_column = :old.primary_key_column
Honestly, I think that this is a poor choice and wouldn't do it, but it's a more-or-less free world and you're free (more or less :-) to make your own choices.
Share and enjoy.
For what it's worth, I've been writing the same stuff and used this to generate the code:
SQL> set pagesize 0
SQL> select ':old.'||COLUMN_NAME||',' from all_tab_columns where table_name='BIGTABLE' and owner='BOB';
:old.COL1,
:old.COL2,
:old.COL3,
:old.COL4,
:old.COL5,
...
If you feed all columns, no need to mention them twice (and you may use NULL for empty columns):
INSERT INTO bigtable VALUES (
:old.COL1,
:old.COL2,
:old.COL3,
:old.COL4,
:old.COL5,
NULL,
NULL);
people writing tables with that many columns should have no desserts ;-)