Best way to store a parameter that changes over the course of time - Oracle

Consider the following scenario:
We have a function (let's call it service_cost) that performs some sort of computation.
In that computation we also use a variable (say current_fee) which has a certain value at a given time (we get the value of that variable from an auxiliary table - fee_table).
Now current_fee could stay the same for 4 months, then it changes and obtains a new value, and so on and so forth. Of course I would like to know the current fee, but I should also be able to find out the fee that was 'active' days, months, or years before...
So, one way of organizing the fee_table is
create table fee_table (
id number,
valid_from date,
valid_to date,
fee number
)
And then at any given time - if I want to get the current fee I would:
select fee into current_fee from
fee_table where trunc(sysdate) between valid_from and valid_to;
What I don't like about the solution above is that it is easy to create inconsistent entries in fee_table - like:
- overlapping time periods (valid_from-valid_to), e.g. (1/1/2012 - 1/2/2012) and (15/1/2012 - 5/2/2012)
- no entry for the current period
- holes in between the periods, e.g. ([1/1/2012 - 1/2/2012], [1/4/2012 - 1/5/2012])
etc.
Could anyone suggest a better way to handle such a scenario?
Or maybe - if we stick with the above scenario - some kind of constraints, checks, triggers etc. on the table to avoid the inconsistencies described?
Thanks.

Thank you for all the comments above. Based on the suggestions from @Alex Poole and @William Robertson,
I am leaning towards the following solution:
The table:
create table fee_table (
id number unique,
valid_from date unique,
fee number
)
The Data:
insert into fee_table (id, valid_from, fee) values (1, to_date('1/1/2014','dd/mm/rrrr'), 30.5);
insert into fee_table (id, valid_from, fee) values (2, to_date('3/2/2014','dd/mm/rrrr'), 20.5);
insert into fee_table (id, valid_from, fee) values (3, to_date('4/4/2014','dd/mm/rrrr'), 10);
The select:
with from_to_table as (
SELECT id,
       valid_from,
       LEAD(valid_from, 1, null) OVER (ORDER BY valid_from) - 1 AS valid_to,
       fee
FROM fee_table
)
select fee from from_to_table
where to_date(:mydate,'dd/mm/rrrr') between valid_from and nvl(valid_to,to_date(:mydate,'dd/mm/rrrr')+1)
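As a quick sanity check against the sample data above (a sketch only; the date literal stands in for the :mydate bind):
with from_to_table as (
  select id,
         valid_from,
         lead(valid_from) over (order by valid_from) - 1 as valid_to,
         fee
  from   fee_table
)
select fee
from   from_to_table
where  date '2014-02-15' between valid_from
                             and nvl(valid_to, date '2014-02-15' + 1);
-- expected: 20.5, because 15/2/2014 falls between 3/2/2014 and 3/4/2014
-- (the day before the next valid_from); a date on or after 4/4/2014 returns 10
Because only start dates are stored and valid_from is unique, overlaps and holes cannot occur by construction; the only thing left to check is that there is a row covering the earliest date you care about.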

Related

Best way to create tables for huge data using Oracle

Functional requirement
We work with devices. Each device, roughly speaking, has a unique identifier, an IP address, and a type.
I have a routine that pings all devices that have an IP address.
This routine is nothing more than a C# console application, which runs every 3 minutes and tries to ping the IP address of each device.
I need to store the result of each ping in the database, as well as the date of the check (regardless of the result of the ping).
Now on to the technical side.
Technical part:
Assuming my ping process and database structure are in place as of 01/06/2016, I need to do two things:
Daily extraction
Extraction in real time (last 24 hours)
Both should return the same thing:
Devices that are unavailable for more than 24 hours.
Devices that are unavailable for more than 7 days.
A device is considered unavailable if it was pinged AND did not respond.
A device is considered available if it was pinged AND responded successfully.
What I have today (and it works very badly):
A table with the following structure:
create table history (id_device number, response number, date date);
This table has a large amount of data (it currently has 60 million rows, and the trend is for it to keep growing exponentially).
Here are the questions:
How to achieve these objectives without encountering problems of slowness in queries?
How to create a table structure that is prepared to receive millions / billions of records within my corporate world?
Partition the table based on date.
For the partitioning strategy, consider performance vs maintenance.
For easy maintenance, use automatic INTERVAL partitions by month or week.
You can even do it by day, or manually pre-define 2-day intervals.
Your query only needs 2 calendar days.
select id_device,
min(case when response is null then 'N' else 'Y' end),
max(case when response is not null then date end)
from history
where date > sysdate - 1
group by id_device
having min(case when response is null then 'N' else 'Y' end) = 'N'
and sysdate - max(case when response is not null then date end) > ?;
If for missing responses you write a default value instead of NULL, you may try building it as an index-organized table.
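A minimal sketch of that idea (assumptions: the date column is renamed to ping_date because DATE is a reserved word in Oracle, and missing responses are stored as 0 rather than NULL):
create table history_iot (
  id_device number not null,
  ping_date date   not null,
  response  number default 0 not null,  -- 0 = no response, instead of NULL (assumption)
  constraint history_iot_pk primary key (id_device, ping_date)
)
organization index;
An index-organized table stores the rows in the primary key's B-tree itself, so lookups by device and date range avoid a separate table access.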
You need to read about Oracle partitioning.
This statement will create your HISTORY table partitioned by calendar day.
create table history (id_device number, response number, date date)
PARTITION BY RANGE (date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
( PARTITION p0 VALUES LESS THAN (TO_DATE('24-05-2016', 'DD-MM-YYYY')),
  PARTITION p1 VALUES LESS THAN (TO_DATE('25-05-2016', 'DD-MM-YYYY')) );
All your old data will be in the P0 partition.
Starting 25/05/2016, a new partition will be created automatically each day.
HISTORY now is a single logical object but physically it is a collection of identical tables stacked on top of each other.
Because each partition's data is stored separately, when a query asks for one day's worth of data, only a single partition needs to be scanned.
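If you want to confirm that the daily partitions are being created, the data dictionary shows them (a sketch; the table name is the one from the example above):
select partition_name, high_value
from   user_tab_partitions
where  table_name = 'HISTORY'
order  by partition_position;
Running the two-day query through EXPLAIN PLAN and DBMS_XPLAN.DISPLAY should then show a PARTITION RANGE ITERATOR (or SINGLE) step touching only one or two partitions instead of scanning them all.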

How to guarantee the order of primary key and timestamp in an Oracle database

I am creating records which have an id, a ts, and other columns. So first I run a select to get the ts and id:
select SEQ_table.nextval, CURRENT_TIMESTAMP from dual
and then I call insert
insert into table ...id, ts ...
This works fine 99% of the time, but sometimes under heavy load the order of the records is wrong: I need record.id < (record+1).id and record.ts < (record+1).ts, but this condition is not always met. How can I solve this problem? I am using an Oracle database.
You should not use the result of a sequence for ordering. This might look strange, but think about how sequences are cached, and think about RAC. Every instance has its own sequence cache.... For performance you need big caches. Sequences would be better described as random unique key generators that happen to work sequentially most of the time.
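If the sequence values themselves must come out in request order, Oracle does offer an ORDER option (mainly relevant on RAC), at a performance cost - a sketch with a made-up sequence name:
create sequence seq_ordered
  cache 20   -- caching is still allowed; ORDER coordinates the RAC instances
  order;     -- values are handed out in the order the requests are made
Even then, the sequence value says nothing about the order in which the timestamps are captured by concurrent sessions, so ORDER alone does not solve the problem described here.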
The timestamp datatype has a time resolution down to the microsecond level. As hardware becomes quicker and load increases, it could be that you get multiple rows with the same timestamp. There is not much you can do about that until Oracle takes the resolution a step further again.
Use an INSERT trigger to populate the id and ts columns.
create table sotest
(
id number,
ts timestamp
);
create sequence soseq;
CREATE OR REPLACE TRIGGER SOTEST_BI_TRIG BEFORE
INSERT ON SOTEST REFERENCING NEW AS NEW FOR EACH ROW
BEGIN
:new.id := soseq.nextval;
:new.ts := CURRENT_TIMESTAMP;
END;
/
PHIL#PHILL11G2 > insert into sotest values (NULL,NULL);
1 row created.
PHIL#PHILL11G2 > select * from sotest;
ID TS
---------- ----------------------------------
1 11-MAY-12 13.29.33.771515
PHIL#PHILL11G2 >
You should also pay attention to the other answer provided. Is id meant to be a meaningless primary key (it usually is in apps - it's just a key to join on)?

Storing weekly and monthly aggregates in Oracle

I need to dynamically update weekly and monthly sales data per product and customer. These need to be updated and checked during the sale of a product, and for various reasons I'm not able to use stored procedures or materialized views for this (I'll read everything into the application, modify everything in memory and then update and commit the results).
What is the best table structure for holding the sales during a period?
Store the period type (M, W) with start and end dates, or just the type and start date?
Use date fields and a char, or encode it into a string ('M201201' / 'W201248')?
Normalize sales and periods into two tables, or keep both sales and the period in a single table?
I will be doing two kinds of queries - select the sales of the current weekly (xor monthly) period/customer/article but not update them, and select for update weekly and monthly periods for a customer/article.
If you store both the start and end dates of the applicable period in the row, then your retrieval queries will be much easier, at least the ones that are based on a single date (like today). This is a very typical mode of access since you are probably going to be looking at things from the perspective of a business transaction (like a specific sale) which happens on a given date.
It is very direct and simple to say where #date_of_interest >= start_date and #date_of_interest <= end_date. Any other combination requires you to do date arithmetic either in code before you go to your query or within your query itself.
Keeping a type code (M, W) as well as both start and end dates entails introducing some redundancy. However, you might choose to introduce this redundancy for the sake of easing data retrieval. This: where #date_of_interest >= start_date and #date_of_interest <= end_date and range_type='M' is also very direct and simple.
As with all denormalization, you need to ensure that you have controls that will manage this redundancy.
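For instance, assuming columns named range_type, start_date and end_date (the column and table names here are invented for illustration), a check constraint can stop the redundant columns from drifting apart:
alter table period_sales add constraint period_sales_range_chk check (
     (range_type = 'W' and end_date = start_date + 6)
  or (range_type = 'M' and start_date = trunc(start_date, 'MM')
                       and end_date   = last_day(start_date))
);
This enforces that a weekly row spans exactly 7 days and a monthly row spans exactly one calendar month starting on the 1st.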
I would recommend using a normalized schema for that purpose, where you store the weekly and monthly aggregations in two different tables with the same structure. I don't know the kind of queries you're going to run, but I suppose this would make it easier to do any sort of search (when it's done the right way!).
Probably something like this sample:
product_prices (
prod_code,
price,
date_price_begin
);
sales (
prod_code,
customer_code,
sale_date
);
<aggregate-week>
select trunc(sale_date,'iw') as week,
       prod_code,
       customer_code,
       sum(price) keep (dense_rank first order by date_price_begin) as price
from   sales
natural join product_prices
where  sale_date > date_from
group by trunc(sale_date,'iw'),
       prod_code,
       customer_code
/
<aggregate-month>
select trunc(sale_date,'mm') as month,
       prod_code,
       customer_code,
       sum(price) keep (dense_rank first order by date_price_begin) as price
from   sales
natural join product_prices
where  sale_date > date_from
group by trunc(sale_date,'mm'),
       prod_code,
       customer_code
/

Interesting Oracle analytic query challenge

I'm fairly experienced with Oracle analytic functions but this one has stumped me. I'll kick myself if there's an obvious solution :)
I have a table, JOURNAL, which records Inserts, Updates and Deletes on another table.
The table that it's a journal for is BOND_PAYMENTS, which represents links between PAYMENTS and BONDS; it stores the amount of money (AMOUNT) that was allocated to a particular bond (identified by BOND_NUMBER) from a particular payment (identified by PAYMENT_ID). In addition, it records what aspect of the bond it was allocated to (BOP_DOMAIN) which might be 'BON', 'PET', or some other code. The BOND_PAYMENTS table has a surrogate key (BOP_ID).
Therefore, my journal table will typically have 1 or more records for each BOP_ID - firstly, an INSert, followed perhaps by some UPDates, followed perhaps by a DELete.
Here is the JOURNAL table:
CREATE TABLE JOURNAL
( JN_DATE_TIME DATE NOT NULL,
JN_OPERATION VARCHAR2(3) NOT NULL,
BOP_ID NUMBER(9) NOT NULL,
PAYMENT_ID NUMBER(9) NOT NULL,
BOND_NUMBER VARCHAR2(20) NOT NULL,
BOP_DOMAIN VARCHAR2(10) NOT NULL,
AMOUNT NUMBER(14,2) NOT NULL
);
Here is some sample data:
INSERT INTO JOURNAL VALUES (TO_DATE('01/01/2010','DD/MM/YYYY'),'INS',1242043,1003700,'9995/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('03/01/2010','DD/MM/YYYY'),'INS',1242046,1003700,'9998/10','BON',1700);
INSERT INTO JOURNAL VALUES (TO_DATE('04/01/2010','DD/MM/YYYY'),'INS',1242048,1003700,'9999/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('05/01/2010','DD/MM/YYYY'),'INS',1242052,1003700,'10003/10','BON',1600);
INSERT INTO JOURNAL VALUES (TO_DATE('08/01/2010','DD/MM/YYYY'),'INS',1242058,1003700,'9998/10','BON',100);
INSERT INTO JOURNAL VALUES (TO_DATE('09/01/2010','DD/MM/YYYY'),'UPD',1242058,1003700,'9998/10','PET',100);
INSERT INTO JOURNAL VALUES (TO_DATE('01/01/2010','DD/MM/YYYY'),'INS',2242043,1003701,'8995/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('02/01/2010','DD/MM/YYYY'),'INS',2242046,1003701,'8998/10','BON',1700);
INSERT INTO JOURNAL VALUES (TO_DATE('03/01/2010','DD/MM/YYYY'),'INS',2242048,1003701,'8999/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('04/01/2010','DD/MM/YYYY'),'INS',2242058,1003701,'8998/10','BON',100);
INSERT INTO JOURNAL VALUES (TO_DATE('05/01/2010','DD/MM/YYYY'),'UPD',2242046,1003701,'8998/10','BON',1500);
INSERT INTO JOURNAL VALUES (TO_DATE('06/01/2010','DD/MM/YYYY'),'INS',2242052,1003701,'9003/10','BON',1600);
INSERT INTO JOURNAL VALUES (TO_DATE('07/01/2010','DD/MM/YYYY'),'UPD',2242058,1003701,'8998/10','PET',200);
Now, I need to extract a full set of data from this journal table but in a slightly different format. The main requirement is that we don't want the journal table to record BOP_DOMAIN anymore - it's just not required.
I need to generate a history of the total amount for each BOND_PAYMENT record. I cannot use the BOND_PAYMENT table itself because it only shows the latest status of each record. I need to mine this info from the journal.
I can't just take a SUM(amount) over(partition by payment_id, bond_number) because an individual BOP_ID might be updated several times; so at any one moment in time only the latest amount recorded for that BOP_ID should be used.
Given the above sample data, here is an illustration what I'd expect to produce:
SELECT jn_date_time,
jn_operation,
bop_id,
payment_id,
bond_number,
bop_domain,
amount,
? as running_total
FROM JOURNAL
ORDER BY jn_date_time;
Here I've reproduced on the left the sample data, for two sample payments. On the right I've got "Running Total", which is the expected output. Next to it (in red) is the logic for how the running total is calculated for each row.
The "Running Total" is a snapshot, at the point in time of the journal entry, of the total amount for that combination of PAYMENT_ID and BOND_NUMBER. Remember, a particular BOP_ID might be updated several times; the total amount must consider only the most recent record for that BOP_ID.
Any solution that works will be acceptable, but I suspect an analytic function (or combination of analytic functions) will be the best way to solve this.
Try this
WITH inner AS
(SELECT jn_date_time,
jn_operation,
bop_id,
payment_id,
bond_number,
bop_domain,
amount,
amount - coalesce(lag(amount) over (partition by bop_id order by jn_date_time), 0)
as delta_bop_amount
FROM JOURNAL)
SELECT inner.*,
sum(delta_bop_amount)
over (partition by payment_id, bond_number order by jn_date_time) as running_total
FROM inner
ORDER BY bond_number, payment_id
This will return the same answer for your examples.
You need two passes - the analytic function in the inner query figures out how much each record changes the total for each BOP_ID. An INS is a straight addition, an UPD has to subtract the most recent value and add the new one.
The second pass then does a running total by bond/payment.
I'm assuming that you wanted to treat the bond/payment as a natural key for the running sum, and that there may be multiple BOP_ID's for any bond/payment combination.
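Note that the sample data contains no DELete rows. If those need handling too, one possible (untested) extension is to make the delta for a DEL row the negative of the most recent amount, by swapping the delta expression in the inner query for something like:
case
  when jn_operation = 'DEL' then
       -- a delete removes whatever amount this BOP_ID last carried
       - lag(amount) over (partition by bop_id order by jn_date_time)
  else amount - coalesce(lag(amount) over (partition by bop_id order by jn_date_time), 0)
end as delta_bop_amount
Whether that is right depends on what AMOUNT holds in a DEL journal row, so treat it as a starting point only.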
SELECT a.*,
       lag(amount,1) over (PARTITION BY bond_number ORDER BY payment_id, jn_date_time) recent_amount,
       amount + nvl(lag(amount,1) over (PARTITION BY bond_number ORDER BY payment_id, jn_date_time), 0) running_total
FROM JOURNAL a
ORDER BY payment_id,jn_date_time
This solution provides the exact answer you are expecting for the question above, and in a single table pass :).
I have just used the LAG analytic function to get the most recent amount per bond_number/payment_id combination, and then added that recent amount to the current amount to get the running total... simple, isn't it? :)

History records, missing records, filling in the blanks

I have a table that contains a history of costs by location. These are updated on a monthly basis.
For example
Location1, $500, 01-JAN-2009
Location1, $650, 01-FEB-2009
Location1, $2000, 01-APR-2009
if I query for March 1, I want to return the value for Feb 1, since March 1 does not exist.
I've written a query using an Oracle analytic function, but it takes too much time (it would be fine for a report, but we are using this to let the user see the data visually through the front end and switch dates; requerying takes too long, as the table has something like 1 million rows).
So, the next thought I had was to simply update the table with the missing data. In the case above, I'd simply add in a record identical to 01-FEB-2009 except set the date to 01-MAR-2009.
I was wondering if you all had thoughts on how to best do this.
My plan had been to simply create a cursor for a location, fetch the first record, then fetch the next, and if the next record was not for the next month, insert a record for the missing month.
A little more information:
CREATE TABLE MAXIMO.FCIHIST_BY_MONTH
(
LOCATION VARCHAR2(8 BYTE),
PARKALPHA VARCHAR2(4 BYTE),
LO2 VARCHAR2(6 BYTE),
FLO3 VARCHAR2(1 BYTE),
REGION VARCHAR2(4 BYTE),
AVG_DEFCOST NUMBER,
AVG_CRV NUMBER,
FCIDATE DATE
)
And then the query I'm using (the system will pass in the date and the parkalpha). The table is approx 1 million rows and, again, while it takes a reasonable amount of time for a report, it takes way too long for an interactive display.
select location, avg_defcost, avg_crv, fcimonth, fciyear,fcidate from
(select location, avg_defcost, avg_crv, fcimonth, fciyear, fcidate,
max(fcidate) over (partition by location) my_max_date
from FCIHIST_BY_MONTH
where fcidate <='01-DEC-2008'
and parkalpha='SAAN'
)
where fcidate=my_max_date;
The best way to do this is to create a PL/SQL stored procedure that works backwards from the present and looks for dates with no data. For each gap it finds, it inserts a row for the missing data.
create or replace PROCEDURE fill_in_missing_data IS
  cursor have_data_on_date is
    select location, trunc(date_field) have_date
    from the_table
    group by location, trunc(date_field)
    order by 1, 2 desc
  ;
  a_date date;
  n_days_to_insert number;
BEGIN
  a_date := trunc(sysdate);
  for r1 in have_data_on_date loop
    if r1.have_date < a_date then
      -- insert dates in a loop
      n_days_to_insert := a_date - r1.have_date; -- Might be off by 1, need to test.
      for day_offset in 1 .. n_days_to_insert loop
        -- insert missing day
        insert into the_table ( location, date_field, amount )
        values ( r1.location, a_date - day_offset, 0 );
      end loop;
    end if;
    a_date := r1.have_date;
    -- this is a little tricky - I am going to test this and update it in a few minutes
  end loop;
END;
Filling in the missing data will (if you are careful) make the queries much simpler and run faster.
I would also add a flag to the table to indicate that a row is filled-in missing data, so that if you need to remove it (or create a view without it) later, you can.
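Something as simple as this would do for the flag (the column name is just an example, and the_table is the generic table name used above):
alter table the_table add (filled_in char(1) default 'N' not null);
-- have the procedure set filled_in = 'Y' on the rows it inserts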
I have filled in missing data, and also filled in dummy data so that outer joins were not necessary, to improve query performance a number of times. It is not "clean" and "perfect", but I follow Leflar's #1 Law: "always go with what works."
You can create a job in Oracle that will automatically run at off-peak times to fill in the missing data. Take a look at this question on Stack Overflow about creating jobs.
What is your precise use case underlying this request?
In every system I have worked on, if there is supposed to be a record for MARCH and there isn't a record for MARCH, the users would like to know that fact. Apart from anything else, they might want to investigate why the MARCH record is missing.
Now if this is basically a performance issue, then you ought to tune the query. Or if it is a presentation issue - you want to generate a matrix of twelve rows, and that is difficult if a month doesn't have a record for some reason - then that is a different matter, with a variety of possible solutions.
But seriously, I think it is a bad practice for the database to invent replacements for missing records.
edit
I see from your recent comment on your question that it did turn out to be a performance issue - indexes fixed the problem. So I feel vindicated.
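For reference, given that the query filters on parkalpha with an upper bound on fcidate, an index along these lines is the sort of thing that typically helps (a sketch only; the thread doesn't say exactly which index was added):
create index maximo.fcihist_parkalpha_date_ix
  on maximo.fcihist_by_month (parkalpha, fcidate);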
