Calculating the empirical distribution of a dataset in Oracle using the MODEL clause

I can find the empirical distribution this way:
select command_type, duration, round(percentage, 2) as percentage
from (select distinct command_type, duration,
             percent_rank() over (partition by command_type order by duration) as percentage
      from command_durations)
order by 1, 2
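Here is a small Python model of the query above, to make its semantics concrete. It is a sketch, not Oracle's implementation. It assumes the input is a list of (command_type, duration) pairs with no NULL durations, and the function name percent_rank_distinct is hypothetical. Within each partition, PERCENT_RANK is (RANK - 1) / (N - 1), or 0 for a single-row partition. The set removes duplicate rows the way SELECT DISTINCT does.

```python
from bisect import bisect_left
from itertools import groupby


def percent_rank_distinct(rows):
    """Rough model (hypothetical helper) of:
        SELECT DISTINCT command_type, duration,
               PERCENT_RANK() OVER (PARTITION BY command_type ORDER BY duration)
        FROM command_durations
    rows: iterable of (command_type, duration) tuples, no None durations.
    Returns sorted (command_type, duration, percent_rank) tuples.
    """
    result = set()
    # Sorting and grouping by command_type plays the role of PARTITION BY.
    for cmd, grp in groupby(sorted(rows), key=lambda r: r[0]):
        durations = [d for _, d in grp]  # ascending, duplicates kept
        n = len(durations)
        for d in durations:
            # RANK() - 1 equals the number of rows with a strictly smaller duration.
            smaller = bisect_left(durations, d)
            pr = 0.0 if n == 1 else smaller / (n - 1)
            # Adding to a set drops duplicate rows, like SELECT DISTINCT.
            result.add((cmd, d, pr))
    return sorted(result)
```

For example, the durations 1, 2, 2, 5 within one command_type give 0, 1/3 and 1 for the distinct durations 1, 2 and 5.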
The question is how to do the same using the Oracle MODEL clause. I have started with this:
select command_type,duration,dur_count from command_durations
model UNIQUE SINGLE REFERENCE
partition by (command_type)
dimension by ( duration)
measures(0 dur_count)
rules(
dur_count[duration]=count(1)[cv(duration)]
)
order by command_type,duration
But now I need to make the records distinct in order to proceed with finding the empirical distribution.
How can I make the records distinct within the MODEL clause?

If you want to apply DISTINCT to that query, one approach is to wrap it in a subquery in the FROM clause (an inline view) and select DISTINCT from that. For instance:
Select Distinct command_type, duration, dur_count
From (
[Your Code]
)
Let me know if that works.
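Once the distinct (duration, dur_count) pairs exist for a command_type, the remaining step is a running sum over the counts. The Python sketch below shows the arithmetic. The function name is hypothetical, and it assumes one command_type at a time and no NULLs.

Let N be the total row count and `below` the number of rows with a smaller duration. Then:
- below / (N - 1) is PERCENT_RANK, which reproduces the original query.
- (below + count) / N is the cumulative fraction that Oracle's CUME_DIST returns, i.e. the empirical distribution function itself.

```python
def distribution_from_counts(counts):
    """Hypothetical helper.
    counts: list of (duration, dur_count) for ONE command_type,
            i.e. the distinct rows produced by the MODEL query.
    Returns (duration, percent_rank, cumulative_fraction) tuples, ascending."""
    counts = sorted(counts)
    n = sum(c for _, c in counts)  # total rows in the partition
    out = []
    below = 0  # rows with a strictly smaller duration
    for duration, c in counts:
        pr = 0.0 if n == 1 else below / (n - 1)  # matches PERCENT_RANK()
        cume = (below + c) / n                   # matches CUME_DIST()
        out.append((duration, pr, cume))
        below += c
    return out
```

For the counts {1: 1, 2: 2, 5: 1}, this gives percent ranks 0, 1/3, 1 and cumulative fractions 0.25, 0.75, 1.0.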

Related

Testing a subquery in ORDER BY in SQL Developer

I have a question about subqueries in the ORDER BY clause. The query below returns an error. Does this mean that a subquery in the ORDER BY clause must be scalar?
select *
from employees
order by (select * from employees where first_name ='Steven' and last_name='King');
Error:
ORA-00913: too many values
00913. 00000 - "too many values"
Yes, it means that if you use a subquery in ORDER BY it must be scalar.
With select *, your subquery returns multiple columns, and the DBMS would not know which of them to use for sorting. And if you selected only one column, you would of course still have to make sure that only one row is returned. (The difference is that Oracle detects the too-many-columns problem immediately, but detects too many rows only when fetching the data, raising ORA-01427: single-row subquery returns more than one row.)
This would be allowed:
select * from employees
order by (select birthdate from employees where employee_id = 12345);
This is a scalar query, because it returns only one value (one column, one row). But of course this still makes as little sense as your original query, because the subquery result is independent of the main query; i.e. it returns the same value for every row in the table, so no sorting takes effect.
A last remark: a subquery in ORDER BY very rarely makes sense, because it means you order by something you don't display. The exception is looking up a sort key, e.g.:
select *
from products p
where type = 'shirt' and color = 'blue' and size in ('S', 'M', 'L', 'XL')
order by (select sortkey from sizes s where s.size = p.size);
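A correlated scalar subquery in ORDER BY behaves like a per-row lookup of a sort key. Here is a minimal Python sketch of that idea. It assumes a hypothetical SIZES mapping that stands in for the sizes table, and that every product size has an entry.

```python
# Hypothetical stand-in for the sizes table: size -> sortkey.
SIZES = {"S": 1, "M": 2, "L": 3, "XL": 4}


def order_by_sortkey(products, sizes=SIZES):
    """Model of
        ORDER BY (SELECT sortkey FROM sizes s WHERE s.size = p.size)
    The correlated scalar subquery becomes one dictionary lookup per row.
    Assumes every p["size"] is a key of `sizes`."""
    return sorted(products, key=lambda p: sizes[p["size"]])
```

A dictionary holds exactly one value per key, so the lookup is always "scalar". In SQL that guarantee has to come from the data (for example a unique key on sizes.size); otherwise the query fails at fetch time.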
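In other words, the valid options for the ORDER BY clause are:
an expression,
a column position, or
a column alias.
A subquery that returns several columns is none of these. A scalar subquery (one column, one row) does count as an expression, which is why the example above is accepted.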

Function returning the last record

I don't often use Oracle PL/SQL, but I need to understand what, if anything, is wrong with this function, which was written by someone who worked at the company before me. I've been told it is not returning the latest record. In other forum threads I found suggestions to use max(dateColumn) instead of "row_number = 1", but I'm not quite sure how or where to incorporate that.
For context:
We use Oracle 12.
CustomObjectTypeA is a custom Oracle OBJECT TYPE defined by a former employee.
V_OtherView is of type Table_Mnd, which was also defined by a former employee.
V_ABC_123 is a view created by a former employee as well.
CREATE OR REPLACE FUNCTION F_TABLE_APPROVED (NUMBER_F_UPD NUMBER, NUMBER_F_GET VARCHAR2)
RETURN Table_Mnd
IS
  V_OtherView Table_Mnd;
BEGIN
  SELECT CustomObjectTypeA (FromT.NUMBER_F,
                            FromT.OP_CODE,
                            FromT.CATG_CODE,
                            FromT.CATG_NAME,
                            FromT.CATG_SORT,
                            FromT.ORG_CODE,
                            FromT.ORG_NAME,
                            FromT.DATA_ENTRY_VALID,
                            FromT.NUMBER_RECEIVED,
                            FromT.YEAR_1,
                            FromT.YEAR_2)
  BULK COLLECT INTO V_OtherView
  FROM (SELECT NUMBER_F,
               OP_CODE,
               CATG_CODE,
               CATG_NAME,
               CATG_SORT,
               ORG_CODE,
               ORG_NAME,
               DATA_ENTRY_VALID,
               NUMBER_RECEIVED,
               YEAR_1,
               YEAR_2,
               -- ROW_NUMBER = 1 marks the "latest" row per ORG_CODE under this ordering
               ROW_NUMBER() OVER (PARTITION BY ORG_CODE
                                  ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC) AS ROW_NUMBER
        FROM V_ABC_123
        WHERE NUMBER_F = NUMBER_F_UPD
          AND DATA_ENTRY_VALID <> 'OnGoing'
          AND LOAD_DATE >= SYSDATE - 10
          -- raises ORA-01427 if more than one distinct LOAD_DATE matches NUMBER_F_GET
          AND LOAD_DATE <= (SELECT DISTINCT LOAD_DATE
                            FROM V_ABC_123
                            WHERE NUMBER_RECEIVED = NUMBER_F_GET)) FromT
  WHERE FromT.ROW_NUMBER = 1;
  RETURN V_OtherView;
END F_TABLE_APPROVED;
The important bits of the query are:
SELECT ...
FROM (select ...,
ROW_NUMBER()
OVER (PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC,
LOAD_DATE DESC) AS ROW_NUMBER
...) FromT
WHERE FromT.ROW_NUMBER = 1;
The "ROW_NUMBER" column is computed according to the following window clause:
PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC
This means that, for each ORG_CODE, it sorts all the records by NUMBER_RECEIVED, LOAD_DATE in descending order. Note that if the columns are Oracle DATEs, they only have a precision of one second; so if multiple records have date/times in the exact same 1-second interval, this sort order is not guaranteed to be unique. ROW_NUMBER will then pick one of them arbitrarily (i.e. whichever record happens to be emitted first) and assign it the value "1", and that record will be deemed the "latest". Subsequent executions of the same SQL could (in theory) return a different record.
The suspicious part is NUMBER_RECEIVED, which sounds like a number rather than a date. Because it is sorted first, the records with the highest NUMBER_RECEIVED are preferred regardless of LOAD_DATE, which only breaks ties. Was this intentional?
I'm not sure why the PARTITION BY is there; it causes the query to return one "latest" record for each ORG_CODE value it finds. I can only assume this was intentional.
The problem is that the query can only determine the "latest record" as well as it can based on the data provided to it. In this case, it's possible the data is simply not granular enough to be able to decide which record is the actual "latest" record.
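To see why the ordering matters, here is a Python model of "ROW_NUMBER() ... = 1" per ORG_CODE. It is a sketch: the function name and the sample rows are hypothetical. It compares ordering by (NUMBER_RECEIVED, LOAD_DATE), as the function does, with ordering by LOAD_DATE alone.

```python
from datetime import datetime


def latest_per_org(rows, key):
    """Model of ROW_NUMBER() OVER (PARTITION BY org_code ORDER BY <key> DESC) = 1.
    rows: list of dicts; key: function returning the sort tuple of one row.
    Returns {org_code: row}. Ties on `key` keep the first row seen here;
    Oracle makes no such guarantee."""
    best = {}
    for r in rows:
        cur = best.get(r["org_code"])
        if cur is None or key(r) > key(cur):  # DESC order: the largest key gets rn = 1
            best[r["org_code"]] = r
    return best


# Hypothetical sample data for illustration only.
SAMPLE_ROWS = [
    {"org_code": "A", "number_received": 5, "load_date": datetime(2024, 1, 1)},
    {"org_code": "A", "number_received": 3, "load_date": datetime(2024, 1, 10)},
    {"org_code": "B", "number_received": 1, "load_date": datetime(2024, 1, 5)},
]

by_number = latest_per_org(SAMPLE_ROWS, key=lambda r: (r["number_received"], r["load_date"]))
by_date = latest_per_org(SAMPLE_ROWS, key=lambda r: (r["load_date"],))
```

With this sample, ordering by NUMBER_RECEIVED first picks org A's row loaded on 2024-01-01, even though another row was loaded later. Ordering by LOAD_DATE picks the 2024-01-10 row.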

Oracle tuning for a query with a nested subquery

I am trying to improve a query. I have a dataset of opened tickets. Each ticket has several rows, and every row represents an update of the ticket. A field (dt_update) distinguishes each row.
I have these indexes on st_remedy_full_light:
IDX_ASSIGNMENT (ASSIGNMENT)
IDX_REMEDY_INC_ID (REMEDY_INC_ID)
IDX_REMDULL_LIGHT_DTUPD (DT_UPDATE)
Currently the query takes 8 seconds, which is too slow for me.
WITH last_ticket AS
( SELECT *
FROM st_remedy_full_light a
WHERE a.dt_update IN
( SELECT MAX(dt_update)
FROM st_remedy_full_light
WHERE remedy_inc_id = a.remedy_inc_id
)
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
How could I improve this query?
P.S. This is just part of a larger query.
Additional information:
- The table st_remedy_full_light contains 529,507 rows.
You could try:
WITH last_ticket AS
( SELECT remedy_inc_id, ASSIGNMENT,
rank() over (partition by remedy_inc_id order by dt_update desc) rn
FROM st_remedy_full_light a
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
where rn = 1;
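The RANK() version reads the table once and filters on the rank, instead of running a correlated MAX subquery per row. Here is a Python model of its result. The function name is hypothetical, and rows are assumed to be (remedy_inc_id, assignment, dt_update) tuples with comparable, non-NULL dt_update values.

```python
def last_updates_rank(rows):
    """Model of
        RANK() OVER (PARTITION BY remedy_inc_id ORDER BY dt_update DESC) AS rn
        ... WHERE rn = 1
    Returns (remedy_inc_id, assignment) pairs in input order."""
    latest = {}
    # First pass: the highest dt_update per ticket.
    for inc_id, _, dt in rows:
        if inc_id not in latest or dt > latest[inc_id]:
            latest[inc_id] = dt
    # Second pass: keep every row that ranks 1 (ties included, like RANK()).
    return [(inc_id, a) for inc_id, a, dt in rows if dt == latest[inc_id]]
```

As with the original IN (SELECT MAX(...)) query, a ticket with two rows sharing its latest dt_update yields two rows.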
The best alternative query, which is also much easier to execute, is this:
select remedy_inc_id
, max(assignment) keep (dense_rank last order by dt_update)
from st_remedy_full_light
group by remedy_inc_id
This will use only one full table scan and a (hash/sort) group by, no self joins.
Don't bother with indexed access; you'll probably find that a full table scan is most appropriate here. The exception is when the table is really wide and a composite index on all the columns used (remedy_inc_id, dt_update, assignment) would be significantly quicker to read than the table.
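The MAX(assignment) KEEP (DENSE_RANK LAST ORDER BY dt_update) aggregate can be read as: within each group, keep only the rows with the highest dt_update, then apply MAX(assignment) to them. Here is a single-pass Python model of that. It is a sketch with a hypothetical function name, and it assumes there are no NULLs in assignment or dt_update, since NULL ordering is not modeled.

```python
def keep_dense_rank_last(rows):
    """Model of
        SELECT remedy_inc_id,
               MAX(assignment) KEEP (DENSE_RANK LAST ORDER BY dt_update)
        FROM st_remedy_full_light GROUP BY remedy_inc_id
    rows: iterable of (remedy_inc_id, assignment, dt_update), no None values.
    Returns {remedy_inc_id: assignment}."""
    state = {}  # remedy_inc_id -> (latest dt_update, max assignment at that dt_update)
    for inc_id, assignment, dt in rows:
        cur = state.get(inc_id)
        if cur is None or dt > cur[0]:
            state[inc_id] = (dt, assignment)      # a newer dt_update starts over
        elif dt == cur[0] and assignment > cur[1]:
            state[inc_id] = (dt, assignment)      # tie on dt_update: keep the MAX
    return {inc_id: a for inc_id, (_, a) in state.items()}
```

Unlike the RANK() filter, this always returns exactly one row per remedy_inc_id, because ties on dt_update are collapsed by MAX.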

Oracle Analytic Rolling Percentile

Is it possible to use windowing with any of the percentile functions? Or do you know a workaround to get a rolling percentile value?
It is easy with a moving average:
select avg(foo) over (order by foo_date rows
between 20 preceding and 1 preceding) foo_avg_ma
from foo_tab
But I can't figure out how to get the median (50% percentile) over the same window.
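You can use the PERCENTILE_CONT or PERCENTILE_DISC function to find the median. Note, however, that when they are used as analytic functions, their OVER clause accepts only a PARTITION BY clause (no ORDER BY or windowing clause), so a rolling window has to be simulated, for example with a self-join.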
PERCENTILE_CONT is an inverse distribution function that assumes a
continuous distribution model. It takes a percentile value and a sort
specification, and returns an interpolated value that would fall into
that percentile value with respect to the sort specification. Nulls
are ignored in the calculation.
...
PERCENTILE_DISC is an inverse distribution function that assumes a
discrete distribution model. It takes a percentile value and a sort
specification and returns an element from the set. Nulls are ignored
in the calculation.
...
The following example computes the median salary in each department:
SELECT department_id,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median cont",
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median disc"
FROM employees
GROUP BY department_id
ORDER BY department_id;
...
PERCENTILE_CONT and PERCENTILE_DISC may return different results.
PERCENTILE_CONT returns a computed result after doing linear
interpolation. PERCENTILE_DISC simply returns a value from the set of
values that are aggregated over. When the percentile value is 0.5, as
in this example, PERCENTILE_CONT returns the average of the two middle
values for groups with even number of elements, whereas
PERCENTILE_DISC returns the value of the first one among the two
middle values. For aggregate groups with an odd number of elements,
both functions return the value of the middle element.
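The interpolation rules quoted above can be written out directly. The Python sketch below follows Oracle's documented definitions:
- PERCENTILE_CONT: RN = 1 + P * (N - 1), with interpolation between the rows at FLOOR(RN) and CEIL(RN).
- PERCENTILE_DISC: the first value in sort order whose cumulative distribution is at least P.

The function names are hypothetical, and NULLs are represented by None and ignored.

```python
import math


def percentile_cont(p, values, descending=False):
    """Linear interpolation between the rows at FLOOR(RN) and CEIL(RN)."""
    vals = sorted((v for v in values if v is not None), reverse=descending)
    if not vals:
        return None
    rn = 1 + p * (len(vals) - 1)
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:  # RN falls exactly on a row
        return vals[frn - 1]
    return (crn - rn) * vals[frn - 1] + (rn - frn) * vals[crn - 1]


def percentile_disc(p, values, descending=False):
    """First value (in sort order) whose cumulative distribution i/N is >= p."""
    vals = sorted((v for v in values if v is not None), reverse=descending)
    n = len(vals)
    for i, v in enumerate(vals, start=1):
        if i / n >= p:
            return v
    return None
```

Take the four salaries 4000, 3000, 2000, 1000 sorted descending. PERCENTILE_CONT(0.5) is 2500, the average of the two middle values. PERCENTILE_DISC(0.5) is 3000, the first of the two middle values. This matches the description above.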
A sample that simulates windowing through a range self-join:
with sample_data as (
select /*+materialize*/ora_hash(owner) as table_key,object_name,
row_number() over (partition by owner order by object_name) as median_order,
row_number() over (partition by owner order by dbms_random.value) as any_window_sort_criteria
from dba_objects
)
select table_key,x.any_window_sort_criteria,x.median_order,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY y.median_order DESC) as rolling_median,
listagg(to_char(y.median_order), ',' )WITHIN GROUP (ORDER BY y.median_order) as elements
from sample_data x
join sample_data y using (table_key)
where y.any_window_sort_criteria between x.any_window_sort_criteria-3 and x.any_window_sort_criteria+3
group by table_key,x.any_window_sort_criteria,x.median_order
order by table_key, any_window_sort_criteria
/
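The self-join builds each window explicitly. Outside the database, the moving window from the question (ROWS BETWEEN 20 PRECEDING AND 1 PRECEDING) can be sketched the same way in Python. This assumes the values are already sorted by date, and the function name is hypothetical.

```python
from statistics import median


def rolling_median(values, preceding=20):
    """Median over ROWS BETWEEN <preceding> PRECEDING AND 1 PRECEDING.
    `values` must already be in window order (the ORDER BY foo_date).
    The first row has an empty window and gets None, just as AVG over an
    empty window returns NULL."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding):i]  # excludes the current row
        out.append(median(window) if window else None)
    return out
```

statistics.median averages the two middle values, like PERCENTILE_CONT(0.5). statistics.median_low would mirror PERCENTILE_DISC(0.5) with ascending order. The cost grows with the number of rows times the window size, much like the self-join.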

What does PARTITION BY 1 mean?

For a pair of cursors where the total number of rows in the result set is required immediately after the first FETCH, I came up with the query below (after some trial and error):
SELECT
col_a,
col_b,
col_c,
COUNT(*) OVER( PARTITION BY 1 ) AS rows_in_result
FROM
myTable JOIN theirTable ON
myTable.col_a = theirTable.col_z
GROUP BY
col_a, col_b, col_c
ORDER BY
col_b
Now when the output of the query is X rows, rows_in_result reflects this accurately.
What does PARTITION BY 1 mean?
I think it probably tells the database to split the results into partitions of one row each.
It is an unusual use of PARTITION BY. What it does is put everything into the same partition so that if the query returns 123 rows altogether, then the value of rows_in_result on each row will be 123 (as its alias implies).
It is therefore equivalent to the more concise:
COUNT(*) OVER ()
Databases are quite free to add restrictions to the OVER() clause. Sometimes PARTITION BY [...], ORDER BY [...], or both are mandatory, depending on the aggregate function. PARTITION BY 1 may just be a dummy clause used for syntactic integrity. The following two are usually equivalent:
[aggregate function] OVER ()
[aggregate function] OVER (PARTITION BY 1)
Note, though, that Sybase SQL Anywhere and CUBRID interpret this 1 as a column index reference, similar to what is possible in the ORDER BY [...] clause. This may seem a bit surprising, as it imposes an evaluation order on the query's projection. In your case, this would mean that the following are equivalent:
COUNT(*) OVER (PARTITION BY 1)
COUNT(*) OVER (PARTITION BY col_a)
This curious deviation from other databases' interpretation allows for referencing more complex grouping expressions.
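The difference between the two readings of PARTITION BY 1 can be modeled in a few lines of Python. This is a sketch: rows are assumed to be tuples whose first element is col_a, and the function names are hypothetical.

```python
from collections import Counter


def count_over_all(rows):
    """COUNT(*) OVER () (or OVER (PARTITION BY 1) where 1 is a constant):
    every row receives the total number of rows."""
    total = len(rows)
    return [(r, total) for r in rows]


def count_over_col_a(rows):
    """COUNT(*) OVER (PARTITION BY col_a): each row receives the number of rows
    sharing its col_a. This corresponds to reading the 1 in PARTITION BY 1
    as a column position, as Sybase SQL Anywhere and CUBRID do."""
    per_a = Counter(r[0] for r in rows)
    return [(r, per_a[r[0]]) for r in rows]
```

For the rows ("a", 1), ("a", 2), ("b", 3), the first reading gives 3 on every row, while the second gives 2, 2 and 1.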
