Storing weekly and monthly aggregates in Oracle

I need to dynamically update weekly and monthly sales data per product and customer. These need to be updated and checked during the sale of a product, and for various reasons I'm not able to use stored procedures or materialized views for this (I'll read everything into the application, modify everything in memory and then update and commit the results).
What is the best table structure for holding the sales during a period?
Store the period type (M, W) with start and end dates, or just the type and start date?
Use date fields and a char, or code it into a string ('M201201' / 'W201248')?
Normalize sales and periods into two tables, or keep both sales and the period in a single table?
I will be doing two kinds of queries: selecting, without updating, the sales of the current weekly or monthly (but not both) period for a customer/article, and selecting for update both the weekly and monthly periods for a customer/article.

If you store both the start and end dates of the applicable period in the row, then your retrieval queries will be much easier, at least the ones that are based on a single date (like today). This is a very typical mode of access since you are probably going to be looking at things from the perspective of a business transaction (like a specific sale) which happens on a given date.
It is very direct and simple to say where #date_of_interest >= start_date and #date_of_interest <= end_date. Any other combination requires you to do date arithmetic either in code before you go to your query or within your query itself.
Keeping a type code (M, W) as well as both start and end dates entails introducing some redundancy. However, you might choose to introduce this redundancy for the sake of easing data retrieval. This: where #date_of_interest >= start_date and #date_of_interest <= end_date and range_type='M' is also very direct and simple.
As with all denormalization, you need to ensure that you have controls that will manage this redundancy.
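For illustration, a minimal sketch of such a denormalized period table and the two access patterns from the question (all names are hypothetical):
create table period_sales (
  prod_code      varchar2(30) not null,
  customer_code  varchar2(30) not null,
  period_type    char(1)      not null check (period_type in ('W','M')),
  start_date     date         not null,
  end_date       date         not null,
  sales_amount   number       not null,
  constraint period_sales_pk primary key (prod_code, customer_code, period_type, start_date)
);

-- Read-only lookup of the current monthly period for one customer/article.
select sales_amount
from period_sales
where prod_code = :prod_code
and customer_code = :customer_code
and period_type = 'M'
and :date_of_interest between start_date and end_date;

-- Lock both period rows (weekly and monthly) before the application updates them.
select *
from period_sales
where prod_code = :prod_code
and customer_code = :customer_code
and :date_of_interest between start_date and end_date
for update;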

I would recommend using a normalized schema for this, storing the weekly and monthly aggregations in two different tables with the same structure. I don't know what kind of queries you're going to run, but I suppose this would make any sort of search easier (when it's done the right way!).
Probably something like this sample:
create table product_prices (
  prod_code varchar2(30), -- datatypes assumed for illustration
  price number,
  date_price_begin date
);
create table sales (
  prod_code varchar2(30),
  customer_code varchar2(30),
  sale_date date
);
<aggregate-week>
select trunc(sale_date, 'iw') as week,
       prod_code,
       customer_code,
       sum(price) keep (dense_rank first order by date_price_begin) as price
from sales
natural join product_prices
where sale_date > date_from
group by trunc(sale_date, 'iw'),
         prod_code,
         customer_code
/
<aggregate-month>
select trunc(sale_date, 'mm') as month,
       prod_code,
       customer_code,
       sum(price) keep (dense_rank first order by date_price_begin) as price
from sales
natural join product_prices
where sale_date > date_from
group by trunc(sale_date, 'mm'),
         prod_code,
         customer_code
/
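As a usage note (a sketch, not part of the answer above), the date_from boundary for the current period can be derived with the same truncations:
select trunc(sysdate, 'iw') as current_week_start,
       trunc(sysdate, 'mm') as current_month_start
from dual;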

Related

Best way to store parameter that changes over the course of time

Consider the following scenario:
We have a function (let's call it service_cost) that performs some sort of computations.
In those computations we also use a variable (say current_fee) which has a certain value at a given time (we get the value of that variable from an auxiliary table, fee_table).
Now current_fee could stay the same for 4 months, then it changes and takes a new value, and so on. Of course I would like to know the current fee, but I should also be able to find out the fee that was 'active' days, months or years before...
So, one way of organizing the fee_table is:
create table fee_table (
id number,
valid_from date,
valid_to date,
fee number
)
And then at any given time - if I want to get the current fee I would:
select fee into current_fee
from fee_table
where trunc(sysdate) between valid_from and valid_to;
What I don't like about the solution above is that it is easy to create inconsistent entries in fee_table, like:
- overlapping time periods (valid_from-valid_to) e.g. (1/1/2012 - 1/2/2012) and (15/1/2012-5/2012)
- no entry for the current period
- holes in between the periods e.g. ([1/1/2012-1/2/2012], [1/4/2012-1/5/2012])
etc.
Could anyone suggest a better way to handle such a scenario?
Or maybe, if we stick with the above scenario, some kind of constraints, checks, triggers etc. upon the table to avoid the inconsistencies described?
Thanks.
Thank you for all the comments above. So, based on @Alex Pool and @William Robertson, I am leaning towards the following solution:
The table
create table fee_table (
id number unique,
valid_from date unique,
fee number
)
The Data:
insert into fee_table(id, valid_from, fee) values (1, to_date('1/1/2014','dd/mm/rrrr'), 30.5);
insert into fee_table(id, valid_from, fee) values (2, to_date('3/2/2014','dd/mm/rrrr'), 20.5);
insert into fee_table(id, valid_from, fee) values (3, to_date('4/4/2014','dd/mm/rrrr'), 10);
The select:
with from_to_table as (
  select id, valid_from,
         lead(valid_from, 1, null) over (order by valid_from) - 1 as valid_to,
         fee
  from fee_table
)
select fee from from_to_table
where to_date(:mydate,'dd/mm/rrrr') between valid_from and nvl(valid_to, to_date(:mydate,'dd/mm/rrrr') + 1)
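For example, with the sample data above (a sketch, substituting a literal date for the :mydate bind):
with from_to_table as (
  select id, valid_from,
         lead(valid_from, 1, null) over (order by valid_from) - 1 as valid_to,
         fee
  from fee_table
)
select fee from from_to_table
where date '2014-02-15' between valid_from and nvl(valid_to, date '2014-02-15' + 1);
-- returns 20.5: 15/2/2014 falls in the derived range 3/2/2014 to 3/4/2014
Because valid_to is always derived from the next row's valid_from, overlapping ranges and holes cannot occur by construction; only a date earlier than the first valid_from can fail to match.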

Best way to create tables for huge data using oracle

Functional requirement
We work with devices. Each device, roughly speaking, has a unique identifier, an IP address, and a type.
I have a routine that pings every device that has an IP address.
This routine is nothing more than a C# console application, which runs every 3 minutes and tries to ping the IP address of each device.
I need to store the result of each ping in the database, along with the date of the check (regardless of the result of the ping).
Then we get to the technical side.
Technical part:
Assuming my ping routine and database structure are ready as of 01/06/2016, I need to do two things:
Daily extraction
Extraction in real time (last 24 hours)
Both should return the same thing:
Devices that are unavailable for more than 24 hours.
Devices that are unavailable for more than 7 days.
A device is understood to be unavailable if it was pinged AND did not respond.
A device is understood to be available if it was pinged AND answered successfully.
What I have today and works very badly:
A table with the following structure:
create table history (id_device number, response number, date date);
This table has a large amount of data (currently 60 million rows, and the trend is to keep growing exponentially).
Here are the questions:
How do I achieve these objectives without running into slow queries?
How do I create a table structure that is prepared to receive millions or billions of records in my corporate environment?
Partition the table based on date.
For the partitioning strategy, consider performance vs. maintenance.
For easy maintenance, use automatic INTERVAL partitions by month or week.
You can even do it by day, or manually pre-define 2-day intervals.
Your query only needs 2 calendar days.
select id_device,
min(case when response is null then 'N' else 'Y' end),
max(case when response is not null then date end)
from history
where date > sysdate - 1
group by id_device
having min(case when response is null then 'N' else 'Y' end) = 'N'
and sysdate - max(case when response is not null then date end) > ?;
If for missing responses you write a default value instead of NULL, you may try building it as an index-organized table.
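A minimal sketch of that idea (names are hypothetical, and the date column is renamed here because DATE is a reserved word in Oracle):
create table history_iot (
  id_device number not null,
  ping_date date not null,
  response number default 0 not null, -- default value instead of NULL for missing responses
  constraint history_iot_pk primary key (id_device, ping_date)
)
organization index;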
You need to read about Oracle partitioning.
This statement will create your HISTORY table partitioned by calendar day.
create table history (id_device number, response number, date date)
PARTITION BY RANGE (date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
( PARTITION p0 VALUES LESS THAN (TO_DATE('24-05-2016', 'DD-MM-YYYY')),
  PARTITION p1 VALUES LESS THAN (TO_DATE('25-05-2016', 'DD-MM-YYYY')) );
All your old data will be in the P0 partition.
Starting 25/05/2016, a new partition will be created automatically each day.
HISTORY is now a single logical object, but physically it is a collection of identical tables stacked on top of each other.
Because each partition's data is stored separately, when a query asks for one day's worth of data, only a single partition needs to be scanned.
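You can see the pruning in the execution plan's Pstart/Pstop columns, for example (a sketch; ping_date stands in for the date column above, since DATE itself is a reserved word):
explain plan for
select id_device, response
from history
where ping_date > trunc(sysdate);

select * from table(dbms_xplan.display);
-- a PARTITION RANGE SINGLE/ITERATOR step with narrow Pstart/Pstop shows only the needed partitions are read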

Bye KEEP DENSE_RANK?

With the data given
Id  sdate       sales
1   15.03.2015  150
2   16.03.2015  170
where id + sdate is a unique combination,
one could easily find the best date, or the best item to sell.
Select max(sdate) keep (dense_rank last order by sales) from data.
So far so good. But suppose we have data like the following:
Id  sdate       sales
1   15.03.2015  150
2   16.03.2015  170
1   15.03.2015  117
2   16.03.2015  97
… some other dates with worse sale sums than 15.03.2015 and 16.03.2015
Now I want to know the best DATES to sell:
Select max(sdate) keep(dense_rank last order by sum(sales)) from data group by sdate.
Hey! It shows only 15.03.2015. But I want to see both: 15.03.2015 and 16.03.2015.
LISTAGG doesn't help here either. Only
Select sdate from data group by sdate
Order by sum(sales) DESC FETCH FIRST ROW WITH TIES
Returns me both dates. So, bye KEEP DENSE_RANK? Meet FETCH FIRST?
What is your opinion, everyone?
They're doing different things. keep can only return one row for each group. As you want to see tied values, you can't use keep, but you could do this with an inline view:
select sdate
from (
select sdate, dense_rank() over (order by sum(sales) desc) as rnk
from data
group by sdate
)
where rnk = 1;
Which is essentially what fetch first rows with ties is doing in 12c in this example.
There are situations where keep is appropriate, and others where an inline view or fetch first rows is appropriate, and some where either would work.
Having a scenario where you can't use keep to get the result you want doesn't mean you should never use it. Your first simpler query could use either approach; if you wanted other information then keep would come into its own (like the examples in the documentation for first). There are a lot of tools available and you need to pick the best one for what you're trying to achieve.
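For instance, a sketch against the same data table: keep lets you pull other values from the best-ranked group in a single pass.
select max(sdate) keep (dense_rank last order by sum(sales)) as best_date,
       max(sum(sales)) as best_total
from data
group by sdate;
-- returns a single row: one of the tied best dates (the latest, because of max) plus the winning total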

MonetDB: Group by different parts of a timestamp

I have a timestamp column in a monetdb table which I want to occasionally group by hour and occasionally group by day or month. What is the most optimal way of doing this in MonetDB?
In say postgres you could do something like:
select date_trunc('day', order_time), count(*)
from orders
group by date_trunc('day', order_time);
Which I appreciate would not use an index, but is there any way of doing this in MonetDB without creating additional date columns holding day, month and year truncated values?
Thanks.
You could use the EXTRACT(DAY FROM order_time) possibly as part of a subquery before grouping.
It might be a little late for answer, but the following should work for truncating to day precision:
SELECT CAST(order_time AS DATE) AS order_date, count(*)
FROM orders
GROUP BY order_date;
It works by casting the timestamp value to type DATE which is a MonetDB built-in type and the cast is pretty fast.
It does not have the flexibility of date_trunc in Postgres, but if you need to go to monthly or yearly precision, you could use the somewhat slower but usable EXTRACT to get the relevant parts of the timestamp and group by them. For monthly grouping, you could do:
SELECT EXTRACT(YEAR FROM order_time) AS y,
EXTRACT(MONTH FROM order_time) AS m,
count(*)
FROM orders GROUP BY y, m;
The only disadvantage is that you will have the date split across two columns.
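For the hourly grouping mentioned in the question, the same EXTRACT approach should work (a sketch, combining the cast and extract shown above):
SELECT CAST(order_time AS DATE) AS order_date,
       EXTRACT(HOUR FROM order_time) AS h,
       count(*)
FROM orders
GROUP BY order_date, h;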

Querying a data warehouse data involving time dimension

I have two tables for time dimension
date (unique row for each day)
time of the day (unique row for each minute in a day)
Given this schema, what would a query look like if one wants to retrieve facts for the last X hours, where X can be any number greater than 0?
Things start to become tricky when the start time and end time happen to be in two different days of the year.
EDIT: My Fact table does not have a time stamp column
Fact tables do have (and should have) the original timestamp, in order to avoid awkward by-time queries that cross the boundary of a day. Awkward here means having some type of complicated date-time function in the WHERE clause.
In most DWs these types of queries are very rare, but you seem to be streaming data into your DW and using it for reporting at the same time.
So I would suggest:
Introduce the full timestamp in the fact table.
For the old records, re-create the timestamp from the Date and Time keys.
DW queries are all about not having any functions in the WHERE clause, or if a function has to be used, make sure it is SARGABLE.
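To illustrate the sargability point against the suggested fact-table timestamp (the table and column names here are hypothetical):
-- Sargable: the bare column can use an index or partition pruning.
select count(*)
from sales_fact
where fact_ts >= sysdate - numtodsinterval(4, 'hour');

-- Not sargable: wrapping the column in a function generally defeats both.
select count(*)
from sales_fact
where trunc(fact_ts, 'hh') >= trunc(sysdate - 4/24, 'hh');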
You would probably be better served by converting the Start Date and End Date columns to TIMESTAMP and populating them.
Slicing the table would require taking the appropriate interval BETWEEN Start Date AND End Date. In Oracle the interval would be something along the lines of SYSDATE - (4/24) or SYSDATE - NUMTODSINTERVAL(4, 'HOUR')
This could also be rewritten as:
Start Date <= (SYSDATE - (4/24)) AND End Date >= (SYSDATE - (4/24))
It seems to me that given the current schema you have, that you will need to retrieve the appropriate time IDs from the time dimension table which meet your search criteria, and then search for matching rows in the fact table. Depending on the granularity of your time dimension, you might want to check the performance of doing either (SQL Server examples):
A subselect:
SELECT X FROM FOO
WHERE TIMEID IN (SELECT ID FROM DIMTIME WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP))
  AND DATEID IN (SELECT ID FROM DIMDATE WHERE DATE = GETDATE())
An inner join:
SELECT X FROM FOO
INNER JOIN DIMTIME ON TIMEID = DIMTIME.ID
INNER JOIN DIMDATE ON DATEID = DIMDATE.ID
WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP)
  AND DATE = GETDATE()
Neither of these are truly attractive options.
Have you considered that you may be querying against a cube that is intended for roll-up analysis and not necessarily for "last X" analysis?
If this is not a "roll-up" cube, I would agree with the other posters in that you should re-stamp your fact tables with better keys, and if you do in fact intend to search off of hour frequently, you should probably include that in the fact table as well, as any other attempt will probably make the query non-sargable (see What makes a SQL statement sargable?).
Microsoft recommends at http://msdn.microsoft.com/en-us/library/aa902672%28v=sql.80%29.aspx that:
In contrast to surrogate keys used in other dimension tables, date and time dimension keys should be "smart." A suggested key for a date dimension is of the form "yyyymmdd". This format is easy for users to remember and incorporate into queries. It is also a recommended surrogate key format for fact tables that are partitioned into multiple tables by date.
Best of luck!
