How can I limit the number of results being grouped in my Group By in Oracle?

I've got a table of parameters, values, and times at which those values were recorded.
I've got a procedure which takes in a time, and needs to get the average of each parameter's values in the window of time that is -15/+5 seconds around that time. On top of that, I want to make sure that I take no more than 15 records before the passed-in time, and no more than 5 records after it.
For example, maybe I'm recording values of some parameters every second. If I passed in the time 21:30:30, I'd want to get the values between 21:30:15 and 21:30:35. But if I was recording every half second, I'd actually have more records that fit in that time frame than I want, and that's where my need to limit my results comes in.
I've read this question and this article which seem pretty related to what I'm trying to do, but unfortunately I'm dealing with Oracle and not MySQL, so I can't use "limit".
I've currently got something that looks like this:
with std_values as
(
select
V.ParameterId,
V.NumericValue
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime + 5/86400
)
select
ParameterId,
avg(NumericValue) as NumericValue
from
std_values
group by
ParameterId
pValueSource is just something that lets me filter down which value types I'm looking at, and pSummaryTime is the input time that I'm basing my time frame around. The goal here is to get the 15 records before pSummaryTime that fall within that window, and the 5 after it that fall within that window, and use those for the average. Currently I'm not limiting the number of "before" and "after" results though, so I'm ending up with the average of everything that falls into that time window. And without something like "limit", I'm not sure how to do this in Oracle.

Sounds like you want a moving window aggregate function. This is part of the Analytical functions feature of Oracle.
It's not my strong suit, and since you didn't include sample tables/data to build a test case, I'll just point you to the Oracle documentation, here:
http://docs.oracle.com/cd/B14117_01/server.101/b10736/analysis.htm#i1006709
You probably want something like:
AVG(NumericValue) over (partition by ParameterId order by Time RANGE BETWEEN 15/86400 PRECEDING AND 5/86400 FOLLOWING)
but, like I said, not my strong suit, and totally untested, but, I hope it gets the idea across.
Hope that helps.

Thanks to Mark Bobak's answer getting me on the right track, I ended up with this solution.
with
values_before as
(
select
V.ParameterId,
V.NumericValue,
row_number() over (Partition by V.ParameterId order by V.Time desc) as RowNumber
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime
),
values_after as
(
select
V.ParameterId,
V.NumericValue,
row_number() over (Partition by V.ParameterId order by V.Time asc) as RowNumber -- ascending, so rows 1-5 are the ones closest after pSummaryTime
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time <= pSummaryTime + 5/86400
and V.Time > pSummaryTime
),
values_all as
(
select * from values_before where RowNumber <= 15
union all
select * from values_after where RowNumber <= 5
)
select ParameterId, avg(NumericValue) from values_all group by ParameterId
No doubt there's a better way to do this, but it at least seems to be giving the correct result. The key was using an analytical function to set the row number and order for the 15 before and 5 after, and then filtering my results down to just those.
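For anyone who wants to try the row-number idea without the real table, here is a minimal, self-contained sketch. It is not the procedure above: the samples CTE, the SampleTime column, the Side flag and the hard-coded timestamp are all made up for illustration, and it folds the "before" and "after" numbering into a single pass instead of two CTEs.

```sql
with
samples as (
  -- hypothetical test data: one parameter sampled every half second,
  -- from 21:30:00.5 to 21:31:00, with NumericValue = 1..120
  select 1 as ParameterId,
         timestamp '2024-01-01 21:30:00' + numtodsinterval(level * 0.5, 'SECOND') as SampleTime,
         level as NumericValue
  from dual
  connect by level <= 120
),
windowed as (
  -- keep only the -15/+5 second window and flag each row as Before or After
  select ParameterId, NumericValue, SampleTime,
         case when SampleTime <= timestamp '2024-01-01 21:30:30' then 'B' else 'A' end as Side
  from samples
  where SampleTime >= timestamp '2024-01-01 21:30:30' - interval '15' second
    and SampleTime <= timestamp '2024-01-01 21:30:30' + interval '5' second
),
ranked as (
  -- number the rows on each side, starting from the row closest to the summary time
  select ParameterId, NumericValue, Side,
         row_number() over (
           partition by ParameterId, Side
           order by case when Side = 'B' then SampleTime end desc, -- before: latest first
                    SampleTime asc                                 -- after: earliest first
         ) as RowNumber
  from windowed
)
select ParameterId, avg(NumericValue) as NumericValue
from ranked
where (Side = 'B' and RowNumber <= 15)
   or (Side = 'A' and RowNumber <= 5)
group by ParameterId;
-- with this test data the rows kept are values 46..60 and 61..65, so the average is 55.5
```

The window contains 31 "before" rows and 10 "after" rows, so both limits actually trim something here.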

Related

Add indicator to top and bottom 10%

I'm trying to capture the average of FIRST_CONTACT_CAL_DAYS but what I would like to do is create an indicator for the top and bottom 10% of values so I can exclude those (outliers) from my average calculation.
Not sure how to go about doing this, any thoughts?
SELECT DISTINCT
TO_CHAR(A.FIRST_ASSGN_DT,'DAY') AS DAY_NUMBER,
A.FIRST_ASSGN_DT,
A.FIRST_CONTACT_DT,
TO_CHAR(A.FIRST_CONTACT_DT,'DAY') AS DAY_NUMBER2,
A.FIRST_CONTACT_DT AS FIRST_PHONE_CONTACT,
A.ID,
ABS(TO_DATE(A.FIRST_CONTACT_DT, 'DD/MM/YYYY') - TO_DATE(A.FIRST_ASSGN_DT, 'DD/MM/YYYY')) AS FIRST_CONTACT_CAL_DAYS
FROM HIST A
LEFT JOIN CONTACTS D ON A.ID = D.ID
WHERE 1=1
You may be looking for something like this. Please adapt to your situation.
I assume you may have more than one "group" or "partition" and you need to compute the average for each group separately, after throwing out the outliers in each partition. (An alternative, which can be easily accommodated by adapting the query below, is to throw out the outliers at the global level, and only then to group and take the average for each group.)
If you don't have any groups, and everything is one big pile of data, it's even easier - you don't need GROUP BY and PARTITION BY.
Then: the function NTILE assigns a bucket number, in this example between 1 and 10, to each row, based on where it falls (first decile, i.e. first 10%, next decile, ... all the way to the last decile). I do this in a subquery. Then, in the outer query, just filter out the first and last buckets before you group by and compute the average.
For testing purposes I create three groups with 10,000 random numbers each in a WITH clause - no need to spend any time on that portion of the code, since it is not part of the solution (the SQL code to solve your problem) - it's just a dirty trick to create test data on the fly.
with
inputs ( grp, val ) as (
select ceil(level/10000), dbms_random.value(0, 150)
from dual
connect by level <= 30000
)
select grp, avg(val) as avg_val
from (
select grp, val, ntile(10) over (partition by grp order by val) as bkt
from inputs
)
where bkt between 2 and 9
group by grp
;
GRP                  AVG_VAL
--- ------------------------
  1 75.021614866547043734458
  2 74.286117923344418598032
  3 75.437412573353736953791
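Since the question asked for an indicator column rather than just a filter, here is a small sketch of how the NTILE bucket could be turned into one. It assumes the "one big pile of data" case (no PARTITION BY), and the inputs data, the indicator labels and the planted outlier of 1000 are invented for the example.

```sql
with
inputs as (
  -- hypothetical test data: the values 1..19 plus one planted outlier (1000)
  select case when level = 20 then 1000 else level end as val
  from dual
  connect by level <= 20
),
bucketed as (
  -- 20 rows split into 10 buckets of 2 rows each, ordered by value
  select val, ntile(10) over (order by val) as bkt
  from inputs
),
flagged as (
  -- turn the bucket number into the indicator asked for in the question
  select val,
         case
           when bkt = 1  then 'BOTTOM 10%'
           when bkt = 10 then 'TOP 10%'
           else 'MIDDLE 80%'
         end as indicator
  from bucketed
)
select indicator, count(*) as cnt, avg(val) as avg_val
from flagged
group by indicator
order by indicator;
-- BOTTOM 10%   2 rows   avg 1.5
-- MIDDLE 80%  16 rows   avg 10.5
-- TOP 10%      2 rows   avg 509.5
```

Filtering on indicator = 'MIDDLE 80%' before averaging gives the same trimmed average as filtering on bkt between 2 and 9.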

Passing a parameter to a WITH clause query in Oracle

I'm wondering if it's possible to pass one or more parameters to a WITH clause query in a very simple way, doing something like this (which, obviously, does not work!):
with qq(a) as (
select a+1 as increment
from dual
)
select qq.increment
from qq(10); -- should get 11
Of course, my actual use case is much more complicated, since the WITH clause would be in a subquery, and the parameters I'd pass are values taken from the main query... details upon request... ;-)
Thanks for any hint
OK.....here's the whole deal:
select appu.* from
(<quite a complex query here>) appu
where not exists
(select 1
from dual
where appu.ORA_APP IN
(select slot from
(select distinct slots.inizio,slots.fine from
(
with
params as (select 1900 fine from dual)
--params as (select app.ora_fine_attivita fine
-- where app.cod_agenda = appu.AGE
-- and app.ora_fine_attivita = appu.fine_fascia
--and app.data_appuntamento = appu.dataapp
--)
,
Intervals (inizio, EDM) as
( select 1700, 20 from dual
union all
select inizio+EDM, EDM from Intervals join params on
(inizio <= fine)
)
select * from Intervals join params on (inizio <= fine)
) slots
) slots
where slots.slot <= slots.fine
)
order by 1,2,3;
Without going into too much detail, the where condition should remove those records where 'appu.ORA_APP' matches one of the records that are supposed to be created in the (outer) 'slots' table.
The constants used in the example are good for a subset of records (a single 'appu.AGE' value); that's why I need to parametrize it, in order to use the commented-out 'params' table (to be replicated, then, in the 'Intervals' table).
I know that's not simple to analyze from scratch, but I tried to make it as clear as possible; feel free to ask for a numeric example if needed....
Thanks
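For the simple case at the top of this question, one possible workaround is the one-row params CTE already used in the larger query: the "argument" lives in its own CTE and later CTEs read from it. This is only a sketch of that idea. It does not by itself solve the harder part, which is feeding values from the outer query into the nested WITH clause. The column is named incremented here to steer clear of INCREMENT, which Oracle lists as a reserved word.

```sql
with
params as (
  -- the "argument": change the literal here instead of passing qq(10)
  select 10 as a from dual
),
qq as (
  -- reads the value from params rather than taking a parameter
  select a + 1 as incremented
  from params
)
select qq.incremented
from qq;
-- returns 11
```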

Trying to figure out top 5 land areas of the 50 states in the U.S

I have created a table with one column named states and another column called land area. I am using Oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not return the 5 largest land areas.
I have also tried this one but no go either:
SELECT * FROM (SELECT * FROM Map order by State desc)
where rownum < 5;
Anyone have any suggestions to get me on the right track??
Here is a sample of the table:
states       land areas
michagan     15000
florida      25000
tennessee    10000
alabama      80000
new york     150000
california   20000
oregon       5000
texas        6000
utah         3000
nebraska     1000
Desired output from query:
States       land area
new york     150000
alabama      80000
florida      25000
california   20000
Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6
Link to Fiddle
Use a HAVING clause and count the number of larger states:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
See SQLFiddle
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.
Try something like this
select m.states,m.landarea
from map m
where (select count('x') from map m2 where m2.landarea > m.landarea) < 5
order by m.landarea desc
There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandArea, State
FROM ( SELECT LandArea
, State
, DENSE_RANK() OVER (ORDER BY landarea DESC) as area_dr
FROM Map )
WHERE area_dr <= 5
order by area_dr;
... and here is the SQL Fiddle to prove it. (I'm going with the statement in the question that you want the top 5 biggest states and ignoring the fact that your desired result set has only four rows. But adjust the outer filter as you will).
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.
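To see the three functions side by side, here is a small self-contained sketch. The areas data is made up so that it contains ties; it is not the table from the question.

```sql
with
areas as (
  -- hypothetical data with a two-way tie at 800 and a three-way tie at 700
  select 'A' as state, 900 as landarea from dual union all
  select 'B', 800 from dual union all
  select 'C', 800 from dual union all
  select 'D', 700 from dual union all
  select 'E', 700 from dual union all
  select 'F', 700 from dual union all
  select 'G', 600 from dual union all
  select 'H', 500 from dual
)
select state, landarea,
       row_number() over (order by landarea desc, state) as rn,  -- 1,2,3,4,5,6,7,8
       rank()       over (order by landarea desc)        as rnk, -- 1,2,2,4,4,4,7,8
       dense_rank() over (order by landarea desc)        as dr   -- 1,2,2,3,3,3,4,5
from areas
order by landarea desc, state;
```

Filtering each column on <= 5 would keep 5 rows for rn, 6 rows for rnk and all 8 rows for dr, which is the difference in row counts described above. The extra state key in the ROW_NUMBER ordering is only there to make the tie-break deterministic.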

OBIEE using the same folder/fact twice aggregating on both

I know the exact SQL I would need to write to retrieve the results I'm looking for from the Oracle BI tool, however, as I am new to Oracle BI I am struggling to find a way to reproduce the same results. I realize that the ultimate answer largely depends on the BI data model and that takes a lot more communication than a question on Stack Overflow will allow, so I'm looking for more generic how-to answers than a specific definitive answer for my scenario.
Perhaps the SQL will help for starters:
select "All"."DT", ("LessThan5Mins"."Count" / "All"."Count") * 100
from
(
select to_char(m."EndDateTime", 'YYYY-MM') "DT", count(*) "Count"
from "Measurement" m,
"DwellTimeMeasurement" dtm
where dtm."MeasurementBase_id" = m."Id"
group by to_char(m."EndDateTime", 'YYYY-MM')
) "All",
(
select to_char(m."EndDateTime", 'YYYY-MM') "DT", count(*) "Count"
from "Measurement" m,
"DwellTimeMeasurement" dtm
where dtm."MeasurementBase_id" = m."Id"
and m."MeasValue" <= 300
group by to_char(m."EndDateTime", 'YYYY-MM')
) "LessThan5Mins"
where "All"."DT" = "LessThan5Mins"."DT";
The purpose of this is to return the percentage of dwell time records that were less than or equal to 5 mins (300 seconds).
I have a fact that represents the "MeasValue" field in the above query.
All attempts I've made to reproduce the dual result set nature of the above query in BI have failed.
Is the above possible in OBIEE and if so, how might I achieve this?
I'm assuming that you have imported the Measurement (M) and DwellTimeMeasurement (DTM) tables into the physical layer of the RPD, specified the join on DTM.MeasurementBase_id = M.Id, and then brought them both through to the presentation layer.
If so, then you could start building this query in Answers on the criteria tab by dragging in M.EndDateTime and any OBIEE measure column from DTM, for example DTM.Amount. Edit the formula for the DTM.Amount column:
Filter the column by clicking the filter button.
In the following dialog box double click on M.MeasValue and then select "is less than or equal to" and type 300 in the Value text box. Click OK twice and your column formula should now look something like this:
FILTER(DTM.Amount USING (M.MeasValue <= 300))
Now wrap this with COUNT():
COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300)))
This will give the count of records with M.MeasValue <= 300. You could rename this column to be "LessThan5Mins". Click OK to save the new formula. Now drag in the DTM.Amount column again but this time only perform a COUNT():
COUNT(DTM.Amount)
This will give you the count of all dwell time records. You could rename this to "All". Finally drag in the DTM.Amount column one last time and edit its formula again. This is where you will calculate the percentage with a formula similar to the following:
COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300))) / COUNT(DTM.Amount) * 100
So ultimately you will have four columns with the following titles and formulas:
TITLE            FORMULA
-----            -------
EndDateTime      M.EndDateTime
LessThan5Mins    COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300)))
All              COUNT(DTM.Amount)
% LessThan5Mins  COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300))) / COUNT(DTM.Amount) * 100
Note that including the EndDateTime column takes care of grouping the records. Also, to match your original query you would only need the EndDateTime and % LessThan5Mins columns (you could hide or exclude the other columns) but I wanted to demonstrate for you the process of filtering column values in OBIEE.
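In plain SQL terms, the filtered count divided by the total count is roughly a conditional aggregate computed in one pass. The sketch below illustrates that idea only. It is not the SQL that OBIEE generates, the measurement rows are invented, and the join to DwellTimeMeasurement is left out to keep it short.

```sql
with
measurement as (
  -- hypothetical rows standing in for the joined Measurement / DwellTimeMeasurement data
  select 1 as id, date '2012-06-10' as enddatetime, 120 as measvalue from dual union all
  select 2, date '2012-06-20', 300 from dual union all
  select 3, date '2012-06-25', 900 from dual union all
  select 4, date '2012-07-05', 60  from dual union all
  select 5, date '2012-07-15', 450 from dual
)
select to_char(enddatetime, 'YYYY-MM') as dt,
       count(case when measvalue <= 300 then 1 end) as lessthan5mins, -- like COUNT(FILTER(... USING ...))
       count(*) as all_count,
       count(case when measvalue <= 300 then 1 end) / count(*) * 100 as pct_lessthan5mins
from measurement
group by to_char(enddatetime, 'YYYY-MM')
order by dt;
-- 2012-06: 2 of 3 rows are <= 300 (about 66.67 percent)
-- 2012-07: 1 of 2 rows are <= 300 (50 percent)
```

One difference from the two-subquery version in the question: a month with no rows at or under 300 seconds shows up here with 0 percent, whereas the inner join there would drop it.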

Oracle Daily count/average over a year

I'm pulling two pieces of information over a specific time period, but I would like to fetch the daily average of one tag and the daily count of another tag. I'm not sure how to do daily averages over a specific time period. Can anyone provide some advice? Below are my first ideas on how to handle this; however, changing every date by hand would be annoying. Any help is appreciated, thanks.
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18:00:00:00','yyyy-mm-dd:hh24:mi:ss')
AND chargetime<to_date('2012-07-19:00:00:00','yyyy-mm-dd:hh24:mi:ss')
group by chargetime;
The working version of the daily sum
SELECT to_char(bi.chargetime, 'mmddyyyy') as chargtime, SUM(cv.val)*0.0005
FROM Charge_Value cv, batch_index bi WHERE cv.ValueID =97
AND bi.chargetime<=to_date('2012-07-19','yyyy-mm-dd')
AND bi.chargeno = cv.chargeno AND bi.typ=1
group by to_char(bi.chargetime, 'mmddyyyy')
It seems like in the first one you want to group by the day, not the time (plus, I don't think you need to specify all those 0's for the time portion):
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18','yyyy-mm-dd')
AND chargetime<to_date('2012-07-19','yyyy-mm-dd')
group by to_char(chargetime, 'mmddyyyy') ;
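As a variation on the same idea, grouping on TRUNC(chargetime) keeps the day as a DATE, so the result sorts chronologically and a daily average can sit next to the daily count. This is only a sketch: the three batch_index rows and the val column are made up for the example (in the question the values live in Charge_Value).

```sql
with
batch_index as (
  -- hypothetical rows; val stands in for whatever is being averaged
  select 101 as chargeno, to_date('2012-06-18 08:15', 'yyyy-mm-dd hh24:mi') as chargetime, 10 as val from dual union all
  select 102, to_date('2012-06-18 17:40', 'yyyy-mm-dd hh24:mi'), 20 from dual union all
  select 103, to_date('2012-06-19 09:05', 'yyyy-mm-dd hh24:mi'), 30 from dual
)
select trunc(chargetime) as chargeday,      -- midnight of the day, still a DATE
       count(distinct chargeno) as charges, -- daily count
       avg(val) as avg_val                  -- daily average
from batch_index
where chargetime >= date '2012-06-18'
  and chargetime <  date '2012-07-19'
group by trunc(chargetime)
order by chargeday;
-- 18 June 2012: 2 charges, average 15
-- 19 June 2012: 1 charge,  average 30
```

Only the two bounds in the WHERE clause change when the period changes; the grouping itself needs no per-day editing.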
I'm not 100% sure I'm following your question, but if you just want to do aggregates (sums, averages), then do just that. I threw in the rollup just in case that is what you were looking for:
with fakeData as(
select trunc(level *.66667) nr
, trunc(2*level * .33478) lvl --these truncs just make the doubles ints
,trunc(sysdate+trunc(level*.263784123)) dte --note the trunc, this gets rid of the to_char to drop the time
from dual
connect by level < 600
) --the cte is just to create fake data
--below is just some aggregates that may help you
select sum(nr) daily_sum_of_nr
, avg(nr) daily_avg_of_nr
, count(distinct lvl) distinct_lvls_per_day
, count(lvl) count_of_nonNull_lvls_per_day
, dte days
from fakeData
group by rollup(dte)
--if you want the query to supply a total for the range, you may use rollup ( http://psoug.org/reference/rollup.html )
