how to model data / indexes to find timeslices fast - oracle

We have a lot of tables in our database with data that is only relevant/valid during a certain period of time. For example contracts: they have a start_date and an end_date, and the periods are not necessarily full months.
Now this is a typical type of query against this table:
SELECT
*
FROM
contracts c
WHERE
c.start_date <= :1
AND c.end_date >= :2
AND c.region_id = :3
Since we have 20 years of data in our table (~7000 days), the date is a very good filter criterion, especially when :1 and :2 are the same day. The region_id is not such a good filter criterion because there aren't that many regions (~50). In this example we have (among others) two indexes on our table:
contracts_valid_index (start_date, end_date)
contracts_region (region_id)
Unfortunately, the above query will often use the contracts_region index because the optimizer thinks it's cheaper. The reason is simple: when I pick a day in the middle of our data, the database assumes that an index on start_date is not very useful because it only filters out half the data, and the same applies to end_date. So the optimizer thinks the index can only narrow the result down to about a quarter of my data, because it does not know that start_date and end_date are usually pretty close together, which would make this index very selective.
An execution plan using the contracts_valid_index has a higher cost than one using contracts_region, but in reality the contracts_valid_index is a lot better.
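The plan costs can be compared with EXPLAIN PLAN and DBMS_XPLAN, for example:
EXPLAIN PLAN FOR
SELECT *
FROM contracts c
WHERE c.start_date <= :1
AND c.end_date >= :2
AND c.region_id = :3;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);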
I currently don't think that I can speed up my queries by making better indexes (other than deleting all but contracts_valid_index). But maybe my data model is not very good for the query optimizer. So I assume that others are also having similar needs and would love to know how they modeled their data or optimized their data tables / indexes.
Any suggestions?

Since you indicate you are using Oracle 12c, it may help to define your start_date and end_date columns as temporal validity columns, provided they match the appropriate temporal validity semantics: start_date and end_date need to be timestamps, end_date must be greater than start_date (or possibly null), and valid-time periods include the start date but exclude the end date, i.e. a half-open range, unlike the usual BETWEEN operator, which denotes a fully closed range. For example:
ALTER TABLE contracts ADD (PERIOD FOR valid_time (start_date, end_date));
You can then query the contracts table for a given validity period thusly:
SELECT
c.*
FROM
contracts VERSIONS PERIOD FOR valid_time BETWEEN :1 AND :2 c
WHERE
c.region_id = :3
This is semantically similar to:
SELECT
c.*
FROM
contracts c
WHERE
:1 < end_date
AND start_date <= :2
AND c.region_id = :3
Alternatively to query for records that are valid for a specific point in time rather than a range of time:
SELECT
c.*
FROM
contracts AS OF PERIOD FOR valid_time :1 c
WHERE
c.region_id = :2
which is semantically similar to:
SELECT
c.*
FROM
contracts c
WHERE
:1 BETWEEN start_date AND end_date
and :1 <> end_date
and c.region_id = :2
I'm not sure whether null values for start_date and end_date indicate the beginning and end of time respectively, since I don't currently have a 12c instance to test in.
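As a session-level alternative to spelling out PERIOD FOR in every statement, DBMS_FLASHBACK_ARCHIVE can restrict all queries in the session to a single valid-time instant. A minimal sketch, assuming the valid_time period defined above (the timestamp is just an example):
-- Every query in this session now sees only rows valid at the given instant
BEGIN
  DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME('ASOF', TO_TIMESTAMP('2015-06-01', 'YYYY-MM-DD'));
END;
/
SELECT * FROM contracts WHERE region_id = :1;  -- only rows valid on 2015-06-01
BEGIN
  DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME('ALL');  -- back to seeing all rows
END;
/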

I have previously come across the same problem of index usage in relation to large sets of IP addresses on MySQL databases (bear with me; it really is the same problem).
The solution I found (by much googling; I'm not taking credit for inventing it) was to use a geospatial index. This is specifically designed to find data within ranges. Most implementations (including the one in MySQL) are hard-wired to a 2-dimensional space while IP addresses and time are 1-dimensional, but it's trivial to map a 1-dimensional coordinate into a 2-dimensional space (see link for a step-by-step explanation).
Sorry, I don't know anything about Oracle's geospatial capabilities so I can't offer any example code, but it does support geospatial indexing, so it can resolve your queries efficiently.
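To give a rough idea of how that could look with Oracle Spatial, here is an untested sketch: the geometry column, index name, metadata bounds and the Julian-day encoding are all assumptions for illustration.
-- Encode each validity range as a 2-D point: x = start_date, y = end_date (as Julian day numbers)
ALTER TABLE contracts ADD (valid_geom SDO_GEOMETRY);
UPDATE contracts c
SET valid_geom = SDO_GEOMETRY(2001, NULL,
      SDO_POINT_TYPE(TO_NUMBER(TO_CHAR(c.start_date, 'J')),
                     TO_NUMBER(TO_CHAR(c.end_date, 'J')), NULL),
      NULL, NULL);
-- Register the dimensions (Julian day numbers for recent centuries fall in this range)
INSERT INTO user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
VALUES ('CONTRACTS', 'VALID_GEOM',
        SDO_DIM_ARRAY(SDO_DIM_ELEMENT('X', 2400000, 2500000, 0.5),
                      SDO_DIM_ELEMENT('Y', 2400000, 2500000, 0.5)),
        NULL);
CREATE INDEX contracts_valid_sidx ON contracts (valid_geom)
  INDEXTYPE IS MDSYS.SPATIAL_INDEX;
-- "start_date <= :1 AND end_date >= :2" becomes a rectangle window query over the points
SELECT *
FROM contracts c
WHERE SDO_FILTER(c.valid_geom,
        SDO_GEOMETRY(2003, NULL, NULL,
          SDO_ELEM_INFO_ARRAY(1, 1003, 3),                          -- optimized rectangle
          SDO_ORDINATE_ARRAY(2400000, TO_NUMBER(TO_CHAR(:2, 'J')),  -- lower-left corner
                             TO_NUMBER(TO_CHAR(:1, 'J')), 2500000)) -- upper-right corner
      ) = 'TRUE'
AND c.region_id = :3;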

You could try the following query to see if it works better:
WITH t1 AS (
SELECT *
FROM contracts c
WHERE c.start_date <= :1
AND c.end_date >= :2
)
SELECT *
FROM t1
WHERE t1.region_id = :3
Though it will likely prevent any possibility of using the contracts_region index.
Alternatively you could try hinting the query to use the desired index:
SELECT /*+ INDEX(c contracts_valid_index) */
*
FROM
contracts c
WHERE
c.start_date <= :1
AND c.end_date >= :2
AND c.region_id = :3
Or hinting it to not use the undesired index:
SELECT /*+ NO_INDEX(c contracts_region ) */
*
FROM
contracts c
WHERE
c.start_date <= :1
AND c.end_date >= :2
AND c.region_id = :3
When testing this out for myself without using hints, I found that when selecting dates near the start or end of the available date range, the optimizer used the desired index (its plan outline contained the INDEX_RS_ASC hint). Adding that hint to the query as shown below caused my tests to use the desired index even when the date range was closer to the middle of the data:
SELECT /*+ INDEX_RS_ASC(c contracts_valid_index) */
*
FROM
contracts c
WHERE
c.start_date <= :1
AND c.end_date >= :2
AND c.region_id = :3
My sample data consisted of 10,000,000 rows evenly distributed across 50 regions and 1,000 years, each row with a 30-day valid range.
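To check which index a given run actually used, you can pull the cursor's real plan from the shared pool right after executing the query in the same session (with SERVEROUTPUT off, otherwise the DBMS_OUTPUT call is reported instead):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'BASIC +PREDICATE'));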

Related

Oracle create table query taking too long to run

I am trying to create a sales table; however, after 8 hours it still has not finished. I have attempted to speed up the query by adding hints and reducing the timeframe to 2022 only, but after 3 hours it is still running. Is there a way to optimise this query?
DROP TABLE BIRTHDAY_SALES;
CREATE TABLE BIRTHDAY_SALES AS
--(
SELECT /*+ parallel(32) */
DISTINCT T.CONTACT_KEY
, S.CAMPAIGN_NAME
, S.CONTROL_GROUP_FLAG
, S.SEGMENT_NAME
, count(distinct t.ORDER_NUM) as TRANS
, count(distinct case when p.store_key = '42381' then t.ORDER_NUM else NULL end) as TRANS_ONLINE
, count(distinct case when p.store_key != '42381' then t.ORDER_NUM else NULL end) as TRANS_OFFLINE
, sum(t.ITEM_AMT) as SALES
, sum(case when p.store_key = '42381' then t.ITEM_AMT else NULL end) as SALES_ONLINE
, sum(case when p.store_key != '42381' then t.ITEM_AMT else NULL end) as SALES_OFFLINE
, sum(case when t.item_quantity_val>0 and t.item_amt<=0 then 0 else t.item_quantity_val end) QTY
, sum(case when (p.store_key = '42381' and t.ITEM_QUANTITY_VAL>0 and t.ITEM_AMT>0) then t.ITEM_QUANTITY_VAL else null end) QTY_ONLINE
, sum(case when (p.store_key != '42381' and t.ITEM_QUANTITY_VAL>0 and t.ITEM_AMT>0) then t.ITEM_QUANTITY_VAL else null end) QTY_OFFLINE
FROM CRM_TARGET.B_TRANSACTION T
JOIN BDAY_PROG S
ON T.CONTACT_KEY = S.CONTACT_KEY
JOIN CRM_TARGET.T_ORDITEM_SD P
ON T.PRODUCT_KEY = P.PRODUCT_KEY
where t.TRANSACTION_TYPE_NAME = 'Item'
and t.BU_KEY = '15'
and t.TRANSACTION_DT_KEY >= '20220101'
and t.TRANSACTION_DT_KEY <= '20221231'
and t.member_sale_flag = 'Y'
and t.bu_key = '15'
and t.CONTACT_KEY != 0
group by
T.CONTACT_KEY
, S.CAMPAIGN_NAME
, S.CONTROL_GROUP_FLAG
, S.SEGMENT_NAME
-- )
;
Performance tuning is not something we can effectively deal with on a forum like this, as there are too many factors to consider. You will have to examine the explain plan and look at ASH data (v$active_session_history) to see what the predominant waits are and on what plan step. Only then can you determine what's wrong and take steps to fix it.
However, here are some obvious things to look for:
Make sure there are no many-to-many joins. I'm guessing B_TRANSACTION probably has many rows with the same CONTACT_KEY and many rows with the same PRODUCT_KEY. That's okay, but then you must ensure that CONTACT_KEY is unique within BDAY_PROG and PRODUCT_KEY is unique within T_ORDITEM_SD. If that's not the case, you will get a partial Cartesian product from the hidden many-to-many join and will spend a huge amount of time reading from and writing to temp.
Make sure no more than one of those joins is one-to-many. Multiple one-to-manies stemming off the same parent table will effectively give you a many-to-many between the children, with the same effect.
You are asking for a full year of data (2022). In most systems you are better off doing a full table scan (with parallel query if you can) than using indexes to fetch that much transactional data; if the optimizer uses an index for this, it can really hurt you. You can fix this with hints (see below).
It might be using nested loops joins when a reporting query like this is likely better off using hash joins. Again, I'm just guessing based on the names of your tables; only knowledge of your data can determine this for sure.
Ensure that the PGA workareas are of reasonable size. Ask your DBA to query v$pgastat and report the global memory bound. It should be at its max of 1G, but probably anything over 100M is reasonable. If it's less than that, you may need to ask the DBA to increase the pga_aggregate_target, or you can manually set your own sort_area_size/hash_area_size session parameters (not the best thing to do).
You are asking for DOP 32. That's pretty high. Ensure there are that many CPU cores on the database server, that parallel_max_servers > 64 and that you aren't getting downgraded to serial by anything. Ask your DBA what a reasonable DOP would be.
Do you really need COUNT(DISTINCT ...) on ORDER_NUM? If you are just counting the number of transactions, it would be less work to simply say SUM(CASE WHEN ... THEN 1 ELSE 0 END).
Remove the DISTINCT keyword. It's not doing anything - your GROUP BY will already result in the results being distinct.
Consult ASH (v$active_session_history) to see if you are actually blocked by something, showing some kind of concurrency wait. Your CTAS might not be doing anything at all because of some library cache lock or full tablespace if the database is configured to suspend until space is added.
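For example, a quick ASH check along these lines (the SQL_ID is a placeholder for your running CTAS) shows which plan steps and wait events dominate:
SELECT sql_plan_line_id,
       NVL(event, 'ON CPU') AS event,
       COUNT(*) AS samples
FROM v$active_session_history
WHERE sql_id = :your_sql_id           -- placeholder: SQL_ID of the CTAS
AND sample_time > SYSDATE - 1/24      -- last hour of samples
GROUP BY sql_plan_line_id, NVL(event, 'ON CPU')
ORDER BY samples DESC;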
Here's something to try - again, it's a long shot without knowing your data or table structure. But I've seen enough reports like this to make at least a somewhat educated guess:
SELECT /*+ USE_HASH(t s p) FULL(t) FULL(s) FULL(p) PARALLEL(8) */ t.contact_key . . .

Passing a parameter to a WITH clause query in Oracle

I'm wondering if it's possible to pass one or more parameters to a WITH clause query; in a very simple way, doing something like this (that, obviously, is not working!):
with qq(a) as (
select a+1 as increment
from dual
)
select qq.increment
from qq(10); -- should get 11
Of course, the use I'm going to make of it is much more complicated, since the WITH clause should be in a subquery, and the parameters I'd pass are values taken from the main query... details upon request... ;-)
Thanks for any hint
OK.....here's the whole deal:
select appu.* from
(<quite a complex query here>) appu
where not exists
(select 1
from dual
where appu.ORA_APP IN
(select slot from
(select distinct slots.inizio,slots.fine from
(
with
params as (select 1900 fine from dual)
--params as (select app.ora_fine_attivita fine
-- where app.cod_agenda = appu.AGE
-- and app.ora_fine_attivita = appu.fine_fascia
--and app.data_appuntamento = appu.dataapp
--)
,
Intervals (inizio, EDM) as
( select 1700, 20 from dual
union all
select inizio+EDM, EDM from Intervals join params on
(inizio <= fine)
)
select * from Intervals join params on (inizio <= fine)
) slots
) slots
where slots.slot <= slots.fine
)
order by 1,2,3;
Without going into too much detail, the WHERE condition should remove those records where appu.ORA_APP matches one of the records that are supposed to be created in the (outer) 'slots' table.
The constants used in the example are good for a subset of records (a single appu.AGE value); that's why I should parametrize it, in order to use the commented-out 'params' table (to be replicated, then, in the 'Intervals' table).
I know that's not simple to analyze from scratch, but I tried to make it as clear as possible; feel free to ask for a numeric example if needed...
Thanks
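One common workaround, shown here only as a sketch on the toy example from the question, is to keep the parameter value in its own CTE and join it wherever it is needed:
with params as (
  select 10 as a from dual            -- or: select :bind as a from dual
),
qq as (
  select p.a + 1 as increment
  from params p
)
select qq.increment
from qq;                              -- returns 11
Inside a larger statement the params CTE can be cross joined into any subquery that needs the value, which is essentially what the commented-out params block in the full query above is doing.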

Oracle tuning for a query with a nested query

I am trying to improve a query. I have a dataset of opened tickets. Every ticket has several rows, and every row represents an update of the ticket. There is a field (dt_update) that differs on every row.
I have these indexes on st_remedy_full_light:
IDX_ASSIGNMENT (ASSIGNMENT)
IDX_REMEDY_INC_ID (REMEDY_INC_ID)
IDX_REMDULL_LIGHT_DTUPD (DT_UPDATE)
Now, the query runs in 8 seconds, which is too slow for me.
WITH last_ticket AS
( SELECT *
FROM st_remedy_full_light a
WHERE a.dt_update IN
( SELECT MAX(dt_update)
FROM st_remedy_full_light
WHERE remedy_inc_id = a.remedy_inc_id
)
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
This is the plan
How could I improve this query?
P.S. This is just a part of a big query
Additional information:
- The table st_remedy_full_light contains 529,507 rows
You could try:
WITH last_ticket AS
( SELECT remedy_inc_id, ASSIGNMENT,
rank() over (partition by remedy_inc_id order by dt_update desc) rn
FROM st_remedy_full_light a
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
where rn = 1;
The best alternative query, which is also much easier to execute, is this:
select remedy_inc_id
, max(assignment) keep (dense_rank last order by dt_update)
from st_remedy_full_light
group by remedy_inc_id
This will use only one full table scan and a (hash/sort) group by, no self joins.
Don't bother about indexed access, as you'll probably find a full table scan is most appropriate here. Unless the table is really wide and a composite index on all columns used (remedy_inc_id,dt_update,assignment) would be significantly quicker to read than the table.
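If you do want to try that covering index, it could look like this (the index name is made up):
-- Covering index: the query can then be answered from the index alone
CREATE INDEX st_remedy_full_light_cov_ix
  ON st_remedy_full_light (remedy_inc_id, dt_update, assignment);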

How to avoid expensive Cartesian product using row generator

I'm working on a query (Oracle 11g) that does a lot of date manipulation. Using a row generator, I'm examining each date within a range of dates for each record in another table. Through another query, I know that my row generator needs to generate 8500 dates, and this amount will grow by 365 days each year. Also, the table that I'm examining has about 18000 records, and this table is expected to grow by several thousand records a year.
The problem comes when joining the row generator to the other table to get the range of dates for each record. SQL Tuning Advisor says that there's an expensive Cartesian product, which makes sense given that the query currently could generate up to 8500 x 18000 records. Here's the query in its stripped-down form, without all the date logic etc.:
with n as (
select level n
from dual
connect by level <= 8500
)
select t.id, t.origdate + n origdate
from (
select id, origdate, closeddate
from my_table
) t
join n on origdate + n - 1 <= closeddate -- here's the problem join
order by t.id, t.origdate;
Is there an alternate way to join these two tables without the Cartesian product?
I need to calculate the elapsed time for each of these records, disallowing weekends and federal holidays, so that I can sort on the elapsed time. Also, the pagination for the table is done server-side, so we can't just load into the table and sort client-side.
The maximum age of a record in the system right now is 3656 days, and the average is 560, so it's not quite as bad as 8500 x 18000; but it's still bad.
I've just about resigned myself to adding a field to store the opendays, computing it once and storing the elapsed time, and creating a scheduled task to update all open records every night.
I think that you would get better performance if you rewrite the join condition slightly:
with n as (
select level n
from dual
connect by level <= 8500
)
select t.id, t.origdate + n origdate
from (
select id, origdate, closeddate
from my_table
) t
join n on n <= closeddate - origdate + 1 -- you could even create a function-based index
order by t.id, t.origdate;
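If you follow up on the function-based index idea from the comment, a sketch might look like this (the index name is made up; rows with a NULL closeddate simply will not appear in the index):
-- Index the duration expression used in the rewritten join condition
CREATE INDEX my_table_duration_fbi
  ON my_table (closeddate - origdate + 1);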

Oracle Daily count/average over a year

I'm pulling two pieces of information over a specific time period, but I would like to fetch the daily average of one tag and the daily count of another tag. I'm not sure how to do daily averages over a specific time period; can anyone provide some advice? Below were my first ideas on how to handle this, but having to change every date would be annoying. Any help is appreciated, thanks.
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18:00:00:00','yyyy-mm-dd:hh24:mi:ss')
AND chargetime<to_date('2012-07-19:00:00:00','yyyy-mm-dd:hh24:mi:ss')
group by chargetime;
The working version of the daily sum
SELECT to_char(bi.chargetime, 'mmddyyyy') as chargtime, SUM(cv.val)*0.0005
FROM Charge_Value cv, batch_index bi WHERE cv.ValueID =97
AND bi.chargetime<=to_date('2012-07-19','yyyy-mm-dd')
AND bi.chargeno = cv.chargeno AND bi.typ=1
group by to_char(bi.chargetime, 'mmddyyyy')
It seems like in the first one you want to group by the day, not the time (plus I don't think you need to specify all those zeros for the seconds):
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18','yyyy-mm-dd')
AND chargetime<to_date('2012-07-19','yyyy-mm-dd')
group by to_char(chargetime, 'mmddyyyy') ;
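For the daily average rather than the daily sum, the same shape works; a sketch reusing the tables and filters from your working version (the date bounds are taken from your first query):
SELECT to_char(bi.chargetime, 'mmddyyyy') AS chargeday,
       AVG(cv.val) * 0.0005 AS daily_avg   -- same 0.0005 scaling as the daily sum
FROM charge_value cv
JOIN batch_index bi ON bi.chargeno = cv.chargeno
WHERE cv.valueid = 97
AND bi.typ = 1
AND bi.chargetime > to_date('2012-06-18', 'yyyy-mm-dd')
AND bi.chargetime < to_date('2012-07-19', 'yyyy-mm-dd')
GROUP BY to_char(bi.chargetime, 'mmddyyyy');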
I'm not 100% sure I'm following your question, but if you just want to do aggregates (sums, averages), then do just that. I threw in the rollup just in case that is what you were looking for:
with fakeData as(
select trunc(level *.66667) nr
, trunc(2*level * .33478) lvl --these truncs just make the doubles ints
,trunc(sysdate+trunc(level*.263784123)) dte --note the trunc, this gets rid of the to_char to drop the time
from dual
connect by level < 600
) --the cte is just to create fake data
--below is just some aggregates that may help you
select sum(nr) daily_sum_of_nr
, avg(nr) daily_avg_of_nr
, count(distinct lvl) distinct_lvls_per_day
, count(lvl) count_of_nonNull_lvls_per_day
, dte days
from fakeData
group by rollup(dte)
--if you want the query to supply a total for the range, you may use rollup ( http://psoug.org/reference/rollup.html )
