I have an accounts table and a movements table in an Oracle 11g database. They work the way you would expect your bank account to work. A simplified version of them would be
CREATE TABLE accounts (
id NUMERIC(20) NOT NULL -- PK
);
CREATE TABLE movements (
id NUMERIC(20) NOT NULL, -- PK
account_id NUMERIC(20) NOT NULL, -- FK to accounts table
stamp TIMESTAMP NOT NULL, -- Movement creation timestamp
amount NUMERIC(20) NOT NULL,
balance NUMERIC(20) NOT NULL
);
You have an account, and some movements are created sequentially, each with a given amount. For example, I would expect the following data to be in the movements table:
| id | account_id | stamp | amount | balance |
-------------------------------------------------------------
| 1 | 1 | 2016-12-29 00:00:01 | 50.00 | 50.00 |
| 2 | 1 | 2016-12-29 00:00:02 | 80.00 | 130.00 |
| 3 | 1 | 2016-12-29 00:00:03 | -15.00 | 115.00 |
-------------------------------------------------------------
My problem is, how do I keep the balance column updated?
I'm doing the inserts inside a stored procedure (INSERT INTO movements ... SELECT FROM ...), so it can be done either inside the same query, in a later UPDATE, or with pure PL/SQL.
I can think of two methods:
An UPDATE after the insert, something like (an idea, not tested):
UPDATE movements um
SET balance = (um.amount + (SELECT m.balance
FROM movements m
WHERE m.account_id = um.account_id
AND rownum = 1
ORDER BY stamp DESC)) -- last balance from same account?
WHERE stamp > :someDate; -- To limit the updated records
My problem with this is: does it execute in order, from the first movement to the last? Or might Oracle run it in no specific order, producing a scenario where, for example, the third movement gets updated before the second, so the balance from the second is still outdated?
Cursors: I could define a cursor and run a loop on the ordered list of movements, reading the previous balance of the account in each iteration, and calculating the current balance, setting it with an UPDATE.
This way I would be certain that the balances are updated in order, but I've always avoided cursors because of the performance issues. This stored procedure will work with hundreds of records each time, and the movements table will store millions of records. Will the performance become an issue this way?
My final question would be, considering performance, what is the best way to generate the balance column data?
Edit - Clarification on movements creation
I think I wasn't too clear about this part. At the moment my SP executes, I'm creating several movements for several different accounts; that's why I mention that the movements creation is done with something like
-- Some actions
INSERT INTO movements (account_id, stamp, amount, balance)
SELECT ... FROM several_tables_with_joins;
-- More actions
That's why I mention that the balance could be generated either in the same query, in a later UPDATE or some other method like the Trigger mentioned in one of the comments.
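For reference, one set-based way to backfill the whole batch in a single statement is an analytic running sum. This is an untested sketch: it assumes balance is simply the cumulative sum of amount per account, and that :someDate marks the start of the batch.

```sql
-- Recompute balance as a running total of amount per account, but only
-- write it back to the rows inserted after :someDate.
MERGE INTO movements m
USING (
    SELECT id,
           SUM(amount) OVER (PARTITION BY account_id
                             ORDER BY stamp, id) AS running_balance
    FROM   movements
) calc
ON (m.id = calc.id)
WHEN MATCHED THEN
    UPDATE SET m.balance = calc.running_balance
    WHERE  m.stamp > :someDate;
```

Note that the USING subquery scans every movement; on a table with millions of rows you would want to restrict it, e.g. to only the accounts touched by the batch.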
"considering performance, what is the best way to generate the balance column data?"
Usually the ongoing maintenance of summed columns after every transaction incurs a heavier cost than simply calculating them on demand. However, account balance is a special case, because we do need to know it after every transaction, to check, say, whether the account has gone into the red or exceeded an overdraft limit.
The key insight is: before we process a new movement we already know the current balance. It's the value of BALANCE on the latest MOVEMENT record.
Ah, but how do we know which MOVEMENT record is the latest? There are various solutions to this, but the simplest would be an ugly is_latest flag. This not only provides a simple way to get the most recent MOVEMENT record, it also provides a lockable target, which is important in a multi-user environment: we need to ensure that only one transaction is manipulating the balance at any given time.
So, your stored procedure will look something like:
create or replace procedure new_movement
( p_account_id in movements.account_id%type
, p_amount in movements.amount%type )
is
cursor c_curr_bal (p_acct_id movements.account_id%type) is
select balance
from movements
where account_id = p_acct_id
and is_latest = 'Y'
for update of is_latest;
l_balance movements.balance%type;
new_rec movements%rowtype;
begin
open c_curr_bal(p_account_id);
fetch c_curr_bal into l_balance;
if c_curr_bal%notfound then
l_balance := 0; -- first movement for this account
end if;
new_rec.id := movements_seq.nextval;
new_rec.account_id := p_account_id;
new_rec.stamp := systimestamp;
new_rec.amount := p_amount;
new_rec.balance := l_balance + p_amount;
new_rec.is_latest := 'Y';
update movements
set is_latest = null
where current of c_curr_bal;
insert into movements
values new_rec;
close c_curr_bal;
commit; -- need to free the lock
end new_movement;
/
An alternative to the is_latest flag would be to maintain the current balance as a column on the ACCOUNTS table. The logic would be the same, just SELECT the ACCOUNTS table FOR UPDATE OF CURRENT_BALANCE instead.
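A minimal sketch of that alternative (untested; it assumes ACCOUNTS has a CURRENT_BALANCE column, which is not in the original DDL, and reuses the l_balance, p_account_id and p_amount names from the procedure above):

```sql
-- Inside the stored procedure: lock the account row so only one
-- transaction manipulates this balance at a time.
SELECT current_balance
INTO   l_balance
FROM   accounts
WHERE  id = p_account_id
FOR UPDATE OF current_balance;

INSERT INTO movements (id, account_id, stamp, amount, balance)
VALUES (movements_seq.nextval, p_account_id, systimestamp,
        p_amount, l_balance + p_amount);

UPDATE accounts
SET    current_balance = current_balance + p_amount
WHERE  id = p_account_id;

COMMIT; -- frees the lock
```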
I think I would keep the BALANCE in the ACCOUNTS table. Then when you insert your MOVEMENTS record, you update the corresponding ACCOUNT record.
I have a table named user_count_details. There are 3 columns in this table.
msisdn = uniquely identifies the row for one specific user
user_count = stores the count for the user
last_txn_id = stores the id of the last transfer txn the user has performed
The user_count column of this user_count_details table gets updated with every transaction performed by the user.
But here the logic of my system is that
select sum(user_count) from user_count_details
should always give us 0; that is considered the stable state, where everything is fine.
Now I want to write a trigger which, when a new request to update user_count comes in, first checks whether it would break the sum(user_count) = 0 invariant, and if it does, captures that msisdn's details in a separate update table.
Based on your last comments, check if this works. Replace the other_table_name as per your scenario.
CREATE OR REPLACE TRIGGER trgCheck_user_sum
BEFORE INSERT
ON user_count_details FOR EACH ROW
DECLARE
  l_sum NUMBER;
BEGIN
  -- A subquery cannot be used directly in a PL/SQL IF; select into a variable.
  -- Caveat: querying the table the trigger fires on raises ORA-04091
  -- (mutating table) when the triggering statement is a multi-row
  -- INSERT ... SELECT; a compound trigger avoids that.
  SELECT SUM(user_count) INTO l_sum FROM user_count_details;
  IF l_sum > 0 THEN
    INSERT INTO other_table_name (msisdn) VALUES (:new.msisdn);
  END IF;
END;
/
I'm quite new to Oracle, so I apologize in advance for the simple question.
I have this procedure in which I run a query, and I want to save the query result for further use; specifically, I want to run a FOR loop which will take my selection row by row and copy some of the values into another table. The purpose is to populate a child table (a weak entity) starting from a parent table.
For the purpose, let's imagine I have a query:
select *
from tab
where ...
now I want to save the selection with a local scope, and therefore with a lifespan confined to the procedure itself (like a local variable in a C function, basically). How can I achieve such a result?
Basically, I have a class schedule table composed like this:
Schedule
--------------------------------------------------------
subject_code | subject_name | class_starting_date | starting hour | ending hour | day_of_week
so I made a query to get all the subjects scheduled for the current academic year, and I need to use the NEXT_DAY function on each row of the result set to populate a table of the actual classes scheduled for the next week.
My thought was:
I get the classes that need to be scheduled for the next week with a query, save the result somewhere, and then, through a FOR loop using NEXT_DAY (because I need the actual date on which the class takes place), populate the "class_occurence" table. I'm not sure this is the correct way of thinking; there may be something that performs this job without saving the result first, maybe a cursor, who knows...
Global temporary tables are a nice solution. As long as you know the structure of the data to be inserted (how many columns and what datatypes), you can insert into the global temporary table. The data can only be seen by the session that inserts it, and it is either deleted on commit or kept until the session ends, depending on the ON COMMIT clause.
CREATE GLOBAL TEMPORARY TABLE my_temp_table (
column1 NUMBER,
column2 NUMBER
) ON COMMIT DELETE ROWS;
This has worked great for me where I need to have data aggregated but only for a short period of time.
Edit: the data is session-local and temporary; the table definition itself is permanent.
If you want to have the table in memory in the procedure that is another solution but somewhat more sophisticated.
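Putting that together with the scheduling question, here is an untested sketch. The schedule and class_occurence names come from the question; the temp table, procedure name, and column sizes are assumptions, and the academic-year filter is left as a placeholder.

```sql
CREATE GLOBAL TEMPORARY TABLE next_week_classes (
    subject_code VARCHAR2(20),
    day_of_week  VARCHAR2(9)
) ON COMMIT DELETE ROWS;

CREATE OR REPLACE PROCEDURE schedule_next_week IS
BEGIN
    -- Stage this year's classes in the temp table (visible to this
    -- session only).
    INSERT INTO next_week_classes (subject_code, day_of_week)
    SELECT subject_code, day_of_week
    FROM   schedule;  -- add your academic-year filter here

    -- Walk the staged rows and compute each class's actual date.
    FOR r IN (SELECT subject_code, day_of_week FROM next_week_classes) LOOP
        INSERT INTO class_occurence (subject_code, class_date)
        VALUES (r.subject_code, NEXT_DAY(TRUNC(SYSDATE), r.day_of_week));
    END LOOP;
END;
/
```

One caveat: NEXT_DAY matches day names in the session's NLS language, so the day_of_week values must match it (e.g. 'MONDAY' in English).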
I am creating some records which have id, ts ... So first I call a select to get ts and id:
select SEQ_table.nextval, CURRENT_TIMESTAMP from dual
and then I call insert
insert into table ...id, ts ...
This works fine 99% of the time, but sometimes under heavy load the order of the records is wrong: I need record.id < (record+1).id and record.ts < (record+1).ts, and this condition is violated. How can I solve this problem? I am using an Oracle database.
You should not use the result of a sequence for ordering. This might look strange, but think about how sequences are cached, and think about RAC: every instance has its own sequence cache, and for performance you need big caches. Sequences are better described as random unique key generators that happen to work sequentially most of the time.
The timestamp format has a time resolution down to the microsecond level. As hardware becomes quicker and load increases, you can get multiple rows at the same timestamp. There is not much you can do about that until Oracle takes the resolution a step further.
Use an INSERT trigger to populate the id and ts columns.
create table sotest
(
id number,
ts timestamp
);
create sequence soseq;
CREATE OR REPLACE TRIGGER SOTEST_BI_TRIG BEFORE
INSERT ON SOTEST REFERENCING NEW AS NEW FOR EACH ROW
BEGIN
:new.id := soseq.nextval;
:new.ts := CURRENT_TIMESTAMP;
END;
/
PHIL#PHILL11G2 > insert into sotest values (NULL,NULL);
1 row created.
PHIL#PHILL11G2 > select * from sotest;
ID TS
---------- ----------------------------------
1 11-MAY-12 13.29.33.771515
PHIL#PHILL11G2 >
You should also pay attention to the other answer provided. Is id meant to be a meaningless primary key (it usually is in apps - it's just a key to join on)?
I have a table with a varchar2 primary key.
It has about 1,000,000 transactions per day.
My app wakes up every 5 minutes to generate a text file by querying only the new records.
It remembers the last point and processes only the new records.
Do you have an idea how to query with good performance?
I am able to add a new column if necessary.
What do you think this process should be done in?
PL/SQL?
Java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) CHECK (PROCESSED IN ('Y','N')) column is right, but is missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row, and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right, but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a bitmap index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful not to select the new rows, process them, and mark them as processed in one step. Otherwise, rows could be inserted while you are processing that would get marked processed even though they have not been.
The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.
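A sketch of that cursor approach (untested; my_table and needs_processed follow the naming used above, and the file-writing step is left as a comment):

```sql
DECLARE
    CURSOR c_unprocessed IS
        SELECT *
        FROM   my_table
        WHERE  needs_processed = 'Y'
        FOR UPDATE OF needs_processed;  -- lock rows against concurrent runs
BEGIN
    FOR r IN c_unprocessed LOOP
        -- ... write the row to the text file here (e.g. via UTL_FILE) ...
        UPDATE my_table
        SET    needs_processed = 'N'
        WHERE  CURRENT OF c_unprocessed;
    END LOOP;
    COMMIT;  -- releases the locks
END;
/
```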
In MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (let's say PROCESSED).
You can also consider creating an index on PROCESSED (a bitmap index may be of some advantage, as the only possible values are 'Y' and 'N', but test it out) so that your query will use the index.
Also, if you are sure the query runs only every 5 minutes, check whether you can add another column of TIMESTAMP type and partition the table by it (not sure; check it out).
I would also think about writing the file from a scheduled job using UTL_FILE, and showing it on the front end if possible.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log without affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns: the ID column and a processed-flag column. Have an insert trigger on the original table place its ID in this new table. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate default vs. setting a value of 'Y', of maintaining the function-based index vs. the date index, or of doing a range select on the date vs. a select on the single flag value. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on the date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including while your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records and execute the following query:
insert into new_table (max_rowid) select max(rowid) from yourtable;
The next time you want to get only the newly inserted rows, you can do it by executing the following command:
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done processing the rows from the query above, simply truncate new_table and insert max(rowid) from yourtable again.
I think this should work and would be the fastest solution.