I have an accounts table and a movements table in an Oracle 11g database. They work the way you would expect your bank account to work. A simplified version of them would be
CREATE TABLE accounts (
id NUMERIC(20) NOT NULL -- PK
);
CREATE TABLE movements (
id NUMERIC(20) NOT NULL, -- PK
account_id NUMERIC(20) NOT NULL, -- FK to accounts table
stamp TIMESTAMP NOT NULL, -- Movement creation timestamp
amount NUMERIC(20) NOT NULL,
balance NUMERIC(20) NOT NULL
);
You have an account, and some movements are sequentially created, each with a given amount. For example, I would expect the following data to be in the movements table:
| id | account_id | stamp | amount | balance |
-------------------------------------------------------------
| 1 | 1 | 2016-12-29 00:00:01 | 50.00 | 50.00 |
| 2 | 1 | 2016-12-29 00:00:02 | 80.00 | 130.00 |
| 3 | 1 | 2016-12-29 00:00:03 | -15.00 | 115.00 |
-------------------------------------------------------------
My problem is, how do I keep the balance column updated?
I'm doing the inserts inside a Stored Procedure (INSERT INTO movements ... SELECT FROM ...), so it can be done either inside the same query, in a later UPDATE, or with pure PL/SQL.
I can think of two methods:
An UPDATE after the insert, something like (an idea, not tested):
UPDATE movements um
SET balance = (um.amount + (SELECT m.balance
FROM movements m
WHERE m.account_id = um.account_id
AND rownum = 1
ORDER BY stamp DESC)) -- last balance from same account?
WHERE stamp > :someDate; -- To limit the updated records
My problem with this is: does it execute in order, from the first movement to the last? Or might Oracle run it without any specific order, creating the scenario where, for example, the third movement gets updated before the second, so the balance it reads from the second is still outdated?
Cursors: I could define a cursor and run a loop on the ordered list of movements, reading the previous balance of the account in each iteration, and calculating the current balance, setting it with an UPDATE.
This way I would be certain that the balances are updated in order, but I've always avoided cursors because of the performance issues. This stored procedure will work with hundreds of records each time, and the movements table will store millions of records. Will the performance become an issue this way?
My final question would be, considering performance, what is the best way to generate the balance column data?
Edit - Clarification on movements creation
I think I wasn't too clear about this part. At the moment of my SP execution, I'm creating several movements for several different accounts, which is why I mentioned that the movements are created with something like
-- Some actions
INSERT INTO movements (account_id, stamp, amount, balance)
SELECT ... FROM several_tables_with_joins;
-- More actions
That's why I mentioned that the balance could be generated either in the same query, in a later UPDATE, or by some other method like the trigger mentioned in one of the comments.
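Just to illustrate the "same query" option: one set-based way would be to seed the balance with the account's last stored balance and add an analytic running sum of the new amounts. This is only a sketch, not my actual SP - new_movements stands in for the real joined source:
INSERT INTO movements (account_id, stamp, amount, balance)
SELECT n.account_id,
       n.stamp,
       n.amount,
       -- last stored balance (0 for a new account) plus a running sum of the new amounts
       NVL(prev.balance, 0)
         + SUM(n.amount) OVER (PARTITION BY n.account_id
                               ORDER BY n.stamp) AS balance
FROM   new_movements n
LEFT JOIN (SELECT account_id, balance
           FROM  (SELECT account_id, balance,
                         ROW_NUMBER() OVER (PARTITION BY account_id
                                            ORDER BY stamp DESC) rn
                  FROM   movements)
           WHERE  rn = 1) prev
  ON   prev.account_id = n.account_id;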
" considering performance, what is the best way to generate the balance column data"
Usually the ongoing maintenance of summed columns after every transaction incurs a heavier cost than simply calculating them on demand. However, account balance is a special case, because we do need to know it after every transaction, to check, say, whether the account has gone into the red or exceeded an overdraft limit.
The key insight is: before we process a new movement we already know the current balance. It's the value of BALANCE for the latest MOVEMENT record.
Ah, but how do we know which MOVEMENT record is the latest? There are various solutions to this, but the simplest would be an ugly is_latest flag. This not only provides a simple way to get the most recent MOVEMENT record, it also provides a lockable target, which is important in a multi-user environment: we need to ensure that only one transaction is manipulating the balance at any given time.
So, your stored procedure will look something like:
create or replace procedure new_movement
  ( p_account_id in movements.account_id%type
  , p_amount     in movements.amount%type )
is
  -- locks the account's latest movement so no other session can change the balance
  cursor c_curr_bal (p_acct_id movements.account_id%type) is
    select balance
    from   movements
    where  account_id = p_acct_id
    and    is_latest = 'Y'
    for update of is_latest;
  l_balance movements.balance%type;
  new_rec   movements%rowtype;
begin
  open c_curr_bal(p_account_id);
  fetch c_curr_bal into l_balance;  -- assumes an opening movement already exists for the account
  -- build the new movement with the running balance
  new_rec.id         := movements_seq.nextval;
  new_rec.account_id := p_account_id;
  new_rec.stamp      := systimestamp;
  new_rec.amount     := p_amount;
  new_rec.balance    := l_balance + p_amount;
  new_rec.is_latest  := 'Y';
  -- the previous latest row is no longer the latest
  update movements
  set    is_latest = null
  where  current of c_curr_bal;
  insert into movements
  values new_rec;
  close c_curr_bal;
  commit; -- need to free the lock
end new_movement;
/
An alternative to the is_latest flag would be to maintain the current balance as a column on the ACCOUNTS table. The logic would be the same, just SELECT the ACCOUNTS table FOR UPDATE OF CURRENT_BALANCE instead.
I think I would keep the BALANCE in the ACCOUNTS table. Then when you insert your MOVEMENTS record, you update the corresponding ACCOUNT record.
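For illustration, a rough fragment of what the procedure body could look like with the ACCOUNTS-based variant (it assumes a CURRENT_BALANCE column has been added to ACCOUNTS; the names are placeholders):
-- lock the account row so only one session changes the balance at a time
SELECT current_balance
INTO   l_balance
FROM   accounts
WHERE  id = p_account_id
FOR UPDATE OF current_balance;

INSERT INTO movements (id, account_id, stamp, amount, balance)
VALUES (movements_seq.NEXTVAL, p_account_id, systimestamp,
        p_amount, l_balance + p_amount);

UPDATE accounts
SET    current_balance = l_balance + p_amount
WHERE  id = p_account_id;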
Related
Consider the following scenario:
We have a function (let's call it service_cost) that performs some sort of computations.
In those computations we also use a variable (say current_fee) which has a certain value at a given time (we get the value of that variable from an auxiliary table, fee_table).
Now current_fee could stay the same for 4 months, then it changes and obtains a new value, and so on and so forth. Of course I would like to know the current fee, but I should also be able to find out the fee that was 'active' days, months, or years before...
So, one way of organizing the fee_table is
create table fee_table (
id number,
valid_from date,
valid_to date,
fee number
)
And then at any given time - if I want to get the current fee I would:
select fee into current_fee
from fee_table
where trunc(sysdate) between valid_from and valid_to;
What I don't like about the solution above, is that it is easy to create inconsistent entries into fee_table - like:
- overlapping time periods (valid_from-valid_to), e.g. (1/1/2012 - 1/2/2012) and (15/1/2012-5/2012)
- no entry for the current period
- holes in between the periods, e.g. ([1/1/2012-1/2/2012],[1/4/2012-1/5/2012])
etc.
Could anyone suggest a better way to handle such a scenario?
Or may be - if we stick with the above scenario - some kind of constraints, check, triggers etc upon the table to avoid the inconsistencies described?
Thanks.
Thank you for all the comments above. Based on the suggestions from #Alex Pool and #William Robertson,
I am leaning towards the following solution:
The table
create table fee_table (
id number unique,
valid_from date unique,
fee number
)
The Data:
insert into fee_table (id, valid_from, fee) values (1, to_date('1/1/2014','dd/mm/rrrr'), 30.5);
insert into fee_table (id, valid_from, fee) values (2, to_date('3/2/2014','dd/mm/rrrr'), 20.5);
insert into fee_table (id, valid_from, fee) values (3, to_date('4/4/2014','dd/mm/rrrr'), 10);
The select:
with from_to_table as (
  select id, valid_from,
         lead(valid_from, 1, null) over (order by valid_from) - 1 as valid_to,
         fee
  from   fee_table
)
select fee
from   from_to_table
where  to_date(:mydate,'dd/mm/rrrr') between valid_from
       and nvl(valid_to, to_date(:mydate,'dd/mm/rrrr') + 1)
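As a quick sanity check (dates chosen only for illustration): with the three rows above, LEAD derives the periods 1/1/2014-2/2/2014, 3/2/2014-3/4/2014 and 4/4/2014-open, so binding :mydate = '15/3/2014' should return the 20.5 row, and any date from 4/4/2014 onwards falls through to the open-ended last row via the NVL.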
I want to get the row ID or record ID for the last inserted record in a table in Trafodion.
Example:
1 | John
2 | Michael
When executing an INSERT statement, I want to return the created ID, meaning 3.
Could anyone tell me how to do that using Trafodion, or is it not possible?
Are you using a sequence generator to generate unique ids for this table? Something like this:
create table idcol (a largeint generated always as identity not null,
b int,
primary key(a desc));
Either way, with or without sequence generator, you could get the highest key with this statement:
select max(a) from idcol;
The problem is that this statement could be very inefficient. Trafodion has a built-in optimization to read the min of a key column, but it doesn't use the same optimization for the max value, because HBase didn't have a reverse scan until recently. We should make use of the reverse scan; please feel free to file a JIRA. To make this more efficient with the current code, I added a DESC to the primary key declaration. With a descending key, getting the max key will be very fast:
explain select max(a) from idcol;
However, having the data grow from higher to lower values might cause issues in HBase, I'm not sure whether this is a problem or not.
Here is yet another solution: Use the Trafodion feature that allows you to select the inserted data, showing you the inserted values right away:
select * from (insert into idcol(b) values (11),(12),(13)) t(a,b);
A B
-------------------- -----------
1 11
2 12
3 13
--- 3 row(s) selected.
I am creating some records which have id, ts, ... So first I call a select to get the ts and id:
select SEQ_table.nextval, CURRENT_TIMESTAMP from dual
and then I call insert
insert into table ...id, ts ...
This works well 99% of the time, but sometimes under heavy load the order of the records is wrong, because I need record.id < (record+1).id and record.ts < (record+1).ts, and this condition is not always met. How can I solve this problem? I am using an Oracle database.
You should not use the result of a sequence for ordering. This might look strange, but think about how sequences are cached, and think about RAC. Every instance has its own sequence cache.... For performance you need big caches. Sequences had better be called random unique key generators that happen to work sequentially most of the time.
The timestamp format has a time resolution up to the microsecond level. When hardware becomes quicker and load increases, it could be that you get multiple rows at the same time. There is not much you can do about that, until Oracle takes the resolution a step further again.
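If the sequence values really must come out in order across RAC instances, the sequence can be created with the ORDER option, at a cost in throughput. Illustrative only (the name is made up):
CREATE SEQUENCE seq_table_ordered
  START WITH 1
  INCREMENT BY 1
  CACHE 20
  ORDER;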
Use an INSERT trigger to populate the id and ts columns.
create table sotest
(
id number,
ts timestamp
);
create sequence soseq;
CREATE OR REPLACE TRIGGER SOTEST_BI_TRIG BEFORE
INSERT ON SOTEST REFERENCING NEW AS NEW FOR EACH ROW
BEGIN
:new.id := soseq.nextval;
:new.ts := CURRENT_TIMESTAMP;
END;
/
PHIL#PHILL11G2 > insert into sotest values (NULL,NULL);
1 row created.
PHIL#PHILL11G2 > select * from sotest;
ID TS
---------- ----------------------------------
1 11-MAY-12 13.29.33.771515
PHIL#PHILL11G2 >
You should also pay attention to the other answer provided. Is id meant to be a meaningless primary key (it usually is in apps - it's just a key to join on)?
I have a table that contains a history of costs by location. These are updated on a monthly basis.
For example
Location1, $500, 01-JAN-2009
Location1, $650, 01-FEB-2009
Location1, $2000, 01-APR-2009
If I query for March 1, I want to return the value for Feb 1, since March 1 does not exist.
I've written a query using an Oracle analytic function, but that takes too much time (it would be fine for a report, but we are using this to let the user see the data visually through the front end and switch dates; requerying takes too long, as the table is something like 1 million rows).
So, the next thought I had was to simply update the table with the missing data. In the case above, I'd simply add in a record identical to 01-FEB-2009 except set the date to 01-MAR-2009.
I was wondering if you all had thoughts on how to best do this.
My plan had been to simply create a cursor for a location, fetch the first record, then fetch the next, and if the next record was not for the next month, insert a record for the missing month.
A little more information:
CREATE TABLE MAXIMO.FCIHIST_BY_MONTH
(
LOCATION VARCHAR2(8 BYTE),
PARKALPHA VARCHAR2(4 BYTE),
LO2 VARCHAR2(6 BYTE),
FLO3 VARCHAR2(1 BYTE),
REGION VARCHAR2(4 BYTE),
AVG_DEFCOST NUMBER,
AVG_CRV NUMBER,
FCIDATE DATE
)
And then the query I'm using (the system will pass in the date and the parkalpha). The table is approx 1 million rows, and, again, while it takes a reasonable amount of time for a report, it takes way too long for an interactive display
select location, avg_defcost, avg_crv, fcimonth, fciyear, fcidate
from (select location, avg_defcost, avg_crv, fcimonth, fciyear, fcidate,
             max(fcidate) over (partition by location) my_max_date
      from FCIHIST_BY_MONTH
      where fcidate <= '01-DEC-2008'
      and parkalpha = 'SAAN'
     )
where fcidate = my_max_date;
The best way to do this is to create a PL/SQL stored procedure that works backwards from the present, running queries that fail to return data. Each period where it fails to return data, it inserts a row for the missing data.
create or replace PROCEDURE fill_in_missing_data IS
  cursor have_data_on_date is
    select location, trunc(date_field) have_date
    from the_table
    group by location, trunc(date_field)
    order by 2 desc   -- walk the dates from newest to oldest
    ;
  a_date date;
  n_days_to_insert number;
BEGIN
  a_date := trunc(sysdate);
  for r1 in have_data_on_date loop
    if r1.have_date < a_date then
      -- insert dates in a loop
      n_days_to_insert := a_date - r1.have_date; -- Might be off by 1, need to test.
      for day_offset in 1 .. n_days_to_insert loop
        -- insert missing day
        insert into the_table ( location, the_date, amount )
        values ( r1.location, a_date - day_offset, 0 );
      end loop;
    end if;
    a_date := r1.have_date;
    -- note: as written this walks dates for all locations together; per-location
    -- gaps would need the walk restarted for each location
    -- this is a little tricky - I am going to test this and update it in a few minutes
  end loop;
END;
Filling in the missing data will (if you are careful) make the queries much simpler and run faster.
I would also add a flag to the table to indicate that a row is filled-in missing data, so that if
you need to remove it (or create a view without it) later, you can.
I have filled in missing data, and have also filled in dummy data so that outer joins were not necessary, in order to improve query performance a number of times. It is not "clean" or "perfect", but I follow Leflar's #1 Law: "always go with what works."
You can create a job in Oracle that will automatically run at off-peak times to fill in the missing data. Take a look at: This question on stackoverflow about creating jobs.
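For instance, a DBMS_SCHEDULER job along these lines would run the fill-in procedure off-peak (the job name and schedule are made up for illustration):
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'FILL_IN_MISSING_DATA_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'FILL_IN_MISSING_DATA',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',
    enabled         => TRUE);
END;
/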
What is your precise use case underlying this request?
In every system I have worked on, if there is supposed to be a record for MARCH and there isn't a record for MARCH the users would like to know that fact. Apart from anything they might want to investigate why the MARCH record is missing.
Now, if this is basically a performance issue, then you ought to tune the query. Or if it is a presentation issue - you want to generate a matrix of twelve rows, and that is difficult if a month doesn't have a record for some reason - then that is a different matter, with a variety of possible solutions.
But seriously, I think it is a bad practice for the database to invent replacements for missing records.
edit
I see from your recent comment on your question that it did turn out to be a performance issue - indexes fixed the problem. So I feel vindicated.
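For reference, the kind of index that typically fixes the query in the question is a composite one matching its predicates; the poster's actual index is not shown, so this is only a guess:
CREATE INDEX fcihist_parkalpha_date_ix
  ON maximo.fcihist_by_month (parkalpha, fcidate);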
I enjoyed the answers and questions about hidden features in SQL Server.
What can you tell us about Oracle?
Hidden tables, inner workings of ..., secret stored procs, package that has good utils...
Since Apex is now part of every Oracle database, these Apex utility functions are useful even if you aren't using Apex:
declare
  v_array  apex_application_global.vc_arr2;
  v_string varchar2(2000);
begin
  -- Convert delimited string to array
  v_array := apex_util.string_to_table('alpha,beta,gamma,delta', ',');
  for i in 1..v_array.count
  loop
    dbms_output.put_line(v_array(i));
  end loop;

  -- Convert array to delimited string
  v_string := apex_util.table_to_string(v_array, '|');
  dbms_output.put_line(v_string);
end;
/
alpha
beta
gamma
delta
alpha|beta|gamma|delta
PL/SQL procedure successfully completed.
"Full table scans are not always bad. Indexes are not always good."
An index-based access method is less efficient at reading rows than a full scan when you measure it in terms of rows accessed per unit of work (typically per logical read). However, many tools will interpret a full table scan as a sign of inefficiency.
Take an example where you are reading a few hundred invoices from an invoice table and looking up a payment method in a small lookup table. Using an index to probe the lookup table for every invoice probably means three or four logical I/Os per invoice. However, a full scan of the lookup table in preparation for a hash join with the invoice data would probably require only a couple of logical reads, and the hash join itself would complete in memory at almost no cost at all.
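A contrived sketch of that scenario (table names invented), nudging the optimizer toward the full scan and hash join of the small lookup table:
SELECT /*+ USE_HASH(pm) FULL(pm) */
       i.invoice_id, i.amount, pm.description
FROM   invoices i
JOIN   payment_methods pm
  ON   pm.payment_method_id = i.payment_method_id
WHERE  i.invoice_date >= DATE '2009-01-01';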
However, many tools would look at this, see "full table scan", and tell you to try to use an index. If you do so, then you may have just de-tuned your code.
Incidentally, over-reliance on indexes, as in the above example, causes the "Buffer Cache Hit Ratio" to rise. This is why the BCHR is mostly nonsense as a predictor of system efficiency.
The cardinality hint is mostly undocumented.
explain plan for
select /*+ cardinality(#inner 5000) */ *
from (select /*+ qb_name(inner) */ * from dual)
/
select * from table(dbms_xplan.display)
/
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 10000 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| DUAL | 1 | 2 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
The Buffer Cache Hit Ratio is virtually meaningless as a predictor of system efficiency
You can view table data as of a previous time using Flashback Query, with certain limitations.
Select *
from my_table as of timestamp(timestamp '2008-12-01 15:21:13')
11g has a whole new feature set around preserving historical changes more robustly.
Frequent rebuilding of indexes is almost always a waste of time.
wm_concat works like the MySQL group_concat, but it is undocumented.
with data:
-car- -maker-
Corvette Chevy
Taurus Ford
Impala Chevy
Aveo Chevy
select wm_concat(car) Cars, maker from cars
group by maker
gives you:
-Cars- -maker-
Corvette, Impala, Aveo Chevy
Taurus Ford
The OVERLAPS predicate is undocumented.
http://oraclesponge.wordpress.com/2008/06/12/the-overlaps-predicate/
I just found out about the pseudo-column ORA_ROWSCN. If you don't set your table up for this, the pseudo-column gives you the SCN of the block. This could be really useful for the emergency, "Oh crap, I have no auditing on this table and wonder if someone has changed the data since yesterday."
But even better is if you create the table with ROWDEPENDENCIES. That puts the SCN of the last change on every row. This will help you avoid a "Lost Edit" problem without having to include every column in your query.
In other words, when your app grabs a row for user modification, also select the ORA_ROWSCN. Then when you post the user's edits, include Ora_rowscn = v_rscn in addition to the unique key in the WHERE clause. If someone has touched the row since you grabbed it (a lost edit), the update will match zero rows, since the ORA_ROWSCN will have changed.
So cool.
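A minimal sketch of that lost-update check (table, column and bind names are made up):
-- when reading the row for editing, also capture its SCN
SELECT t.id, t.some_col, ORA_ROWSCN AS rscn
FROM   my_table t
WHERE  t.id = :id;

-- when posting the edit, require the SCN to be unchanged;
-- zero rows updated means someone else modified the row in the meantime
UPDATE my_table
SET    some_col = :new_value
WHERE  id = :id
AND    ORA_ROWSCN = :rscn;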
If you get the value of the PASSWORD column from DBA_USERS, you can back up and restore passwords without knowing them:
ALTER USER xxx IDENTIFIED BY VALUES 'xxxx';
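For example, capturing the hash first (note that from 11g onwards DBA_USERS.PASSWORD is no longer populated and the hash has to be read from SYS.USER$ instead):
SELECT password
FROM   dba_users
WHERE  username = 'XXX';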
Bypass the buffer cache and read straight from disk using direct path reads.
alter session set "_serial_direct_read"=true;
Causes a tablespace (9i) or fast object (10g+) checkpoint, so be careful on busy OLTP systems.
More undocumented stuff at http://awads.net/wp/tag/undocumented/
Warning: Use at your own risk.
I don't know if this counts as hidden, but I was pretty happy when I saw this way of quickly seeing what happened with a SQL statement you are tuning.
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM DUAL;
SELECT * FROM TABLE(dbms_xplan.display_cursor( NULL, NULL, 'RUNSTATS_LAST'))
;
PLAN_TABLE_OUTPUT
-----------------------------------------------------
SQL_ID 5z36y0tq909a8, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM DUAL
Plan hash value: 272002086
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------
| 1 | TABLE ACCESS FULL| DUAL | 1 | 1 | 1 |00:00:00.02 | 3 | 2 |
---------------------------------------------------------------------------------------------
12 rows selected.
Where:
E-Rows is estimated rows.
A-Rows is actual rows.
A-Time is actual time.
Buffers is actual buffers.
Where the estimated plan varies from the actual execution by orders of magnitude, you know you have problems.
Not a hidden feature, but fine-grained access control (FGAC), also known as row-level security, is something I have used in the past and was impressed with the efficiency of its implementation. If you are looking for something that guarantees you can control the granularity of how rows are exposed to users with differing permissions - regardless of the application that is used to view the data (SQL*Plus as well as your web app) - then this is a gem.
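For the curious, FGAC/VPD policies are registered through DBMS_RLS. A minimal sketch (all names invented) looks like this; the policy function returns a predicate string that Oracle silently appends to every matching statement against the table:
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'ORDERS',
    policy_name     => 'ORDERS_BY_REGION',
    function_schema => 'APP',
    policy_function => 'FN_ORDERS_PREDICATE',
    statement_types => 'SELECT,INSERT,UPDATE,DELETE');
END;
/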
The built-in fulltext indexing is more widely documented, but still stands out because of its stability (just try running a full-reindexing of fulltext-indexed columns on similar data samples on MS-SQL and Oracle and you'll see the speed difference).
WITH Clause
Snapshot tables. Also found in Oracle Lite, and extremely useful for rolling your own replication mechanism.
#Peter
You can actually bind a variable of type "Cursor" in TOAD, then use it in your statement and it will display the results in the result grid.
exec open :cur for select * from dual;
Q: How to call a stored procedure with a cursor parameter from TOAD?
A: Example, change to your cursor, packagename and stored proc name
declare
  l_cursor PCK_UTILS.typ_cursor;
begin
  PCK_UTILS.spc_get_encodedstring(
    'U',
    10000002,
    null,
    'none',
    l_cursor);
end;
The Model Clause (available for Oracle 10g and up)
WM_CONCAT for string aggregation
Scalar subquery caching is one of the most surprising features in Oracle
-- my_function is NOT deterministic but it is cached!
select t.x, t.y, (select my_function(t.x) from dual)
from t
-- logically equivalent to this, uncached
select t.x, t.y, my_function(t.x) from t
The "caching" subquery above evaluates my_function(t.x) only once per unique value of t.x. If you have large partitions of the same t.x value, this will immensely speed up your queries, even if my_function is not declared DETERMINISTIC. And even if it were DETERMINISTIC, you can save yourself a possibly expensive SQL -> PL/SQL context switch.
Of course, if my_function is not a deterministic function, then this can lead to wrong results, so be careful!