is it uniform distribution HBC ORACLE HISTOGRAME? - oracle

We have this data : Table R(A,..) with attribute A, nbLine of R is 1000, distinct value for A are 500.
data are displayed like this : bucket -> end_point_value.
1 -> 800
2 -> 900
3 -> 1000
4 -> 1200
5 -> 1500
6 -> 2000
7 -> 2300
8 -> 2400
9 -> 2550
10 -> 2590
the question is : Does this histogram confirm or deny the hypothesis of a uniform distribution uniform?
I think I can not confirm nor deny, what do you think ?

First you must ask, which histogram type is defined on the column.
Oracle provides four different histogram types and is you want to claim about uniform distribution the frequency histogram must be defined.
The frequency histogram has one bucket for each distinct value (stored in ENDPOINT_VALUE and the frequency is (additive) stored in the column ENDPOINT_NUMBER)
So if you histogram has only 10 buckets (as you show in the data) you are ready and you can say nothing about the distribution.
Example of a Uniform Distribution
create table r as
select
1 + trunc((rownum-1)/2) A
from dual connect by level <= 1000;
select count(*), count(distinct a), min(a), max(a) from r;
COUNT(*) COUNT(DISTINCTA) MIN(A) MAX(A)
---------- ---------------- ---------- ----------
1000 500 1 500
Create FREQUENCY Histogram with 500 Buckets
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'R', method_opt=>'for all columns size 500');
select NUM_BUCKETS, HISTOGRAM from user_tab_columns where table_name = 'R';
NUM_BUCKETS HISTOGRAM
----------- ---------------
500 FREQUENCY
select ENDPOINT_VALUE, ENDPOINT_NUMBER from user_histograms where table_name = 'R' order by ENDPOINT_VALUE;
ENDPOINT_VALUE, ENDPOINT_NUMBER
1 2
2 4
3 6
4 8
...
498 996
499 998
500 1000

Related

Is it possible to add distinct to part of a sum clause in Oracle?

I have a pretty lengthy SQL query which I'm going to run on Oracle via hibernate. It consists of two nested selects. In the first select statement, a number of sums are calculated, but in one of them I want to filter the results using unique ids.
SELECT ...
SUM(NVL(CASE WHEN SECOND_STATUS= 50 OR SECOND_STATUS IS NULL THEN RECEIVE_AMOUNT END, 0) +
NVL(CASE WHEN FIRST_STATUS = 1010 THEN AMOUNT END, 0) +
NVL(CASE WHEN FIRST_STATUS = 1030 THEN AMOUNT END, 0) -
NVL(CASE WHEN FIRST_STATUS = 1010 AND (SECOND_STATUS= 50 OR SECOND_STATUS IS NULL) THEN RECEIVE_AMOUNT END, 0)) TOTAL, ...
And at the end:
... FROM (SELECT s.*, p.* FROM FIRST_TABLE s
JOIN SECOND_TABLE p ON s.ID = p.FIRST_ID
In one of the lines that start with NVL (second line actually), I want to add a distinct clause that sums the amounts only if first table ids are unique. But I don't know if this is possible or not. If yes, how would it be?
Assume such setup
select * from first;
ID AMOUNT
---------- ----------
1 10
2 20
select * from second;
SECOND_ID FIRST_ID AMOUNT2
---------- ---------- ----------
1 1 100
2 1 100
3 2 100
After the join you get the total sum of both amounts too high because the amount from the first table is duplicated.
select *
from first
join second on first.id = second.first_id;
ID AMOUNT SECOND_ID FIRST_ID AMOUNT2
---------- ---------- ---------- ---------- ----------
1 10 1 1 100
1 10 2 1 100
2 20 3 2 100
You must add a row_number that identifies the first occurence in the parent table and consider in the AMOUNT only the first row and resets it to NULL in the duplicated rows.
select ID,
case when row_number() over (partition by id order by second_id) = 1 then AMOUNT end as AMOUNT,
SECOND_ID, FIRST_ID, AMOUNT2
from first
join second on first.id = second.first_id;
ID AMOUNT SECOND_ID FIRST_ID AMOUNT2
---------- ---------- ---------- ---------- ----------
1 10 1 1 100
1 2 1 100
2 20 3 2 100
Now you can safely sum in a separate subquery
with tab as (
select ID,
case when row_number() over (partition by id order by second_id) = 1 then AMOUNT end as AMOUNT,
SECOND_ID, FIRST_ID, AMOUNT2
from first
join second on first.id = second.first_id
)
select id, sum(nvl(amount,0) + nvl(amount2,0))
from tab
group by id
;
ID SUM(NVL(AMOUNT,0)+NVL(AMOUNT2,0))
---------- ---------------------------------
1 210
2 120
Note also that this is an answer to your question. Some will argue that in your situation you should first aggregate and than join. This will also resolve your problem possible more elegantly.

How to do a Range Bucket on a Count returned by a Group By query

I was hoping if you can help me.
I am in a situation where I first need to do a Count Distinct of Funds and Group By Policies. Once I have done that, I need to put the count of policies in Range of Number of Funds.
This is the data available, where you can see the policies and different funds linked to it
**PolicyNum Fund**
1201 AB
1202 AC
1203 AB
1203 AC
1203 AD
1204 AB
1204 BC
1204 AC
1204 AD
1204 AE
1204 AF
Now I need to do a Count Distinct of Fund Grouped by Policy.
I have used this query to do that:
select fv, policy, count(distinct fv.fund)
from policy_fund fv
group by fv.policy
order by count(distinct fv.fund) desc
After using the above code, the following would come up
This is a view where you can see the number of funds linked to each policy
**Policy No. of Funds**
1201 1
1202 1
1203 3
1204 6
Now, the problem part, I want to reach to this, which is the Range of Number of Funds and how many policies fall under that range of funds:
Help required to achieve this view of Range of Number of Funds and how many policies are present in each range
**Range of Number of funds Number of policies**
0 to 1 2
2 to 3 1
4 to 5 0
5 to 6 1
You can left join your query to a derived table defining the ranges on the count being in the range and then group by the range and count the policies.
SELECT r.l || ' to ' || r.u "RANGE",
count(p) "COUNT"
FROM (SELECT 0 l,
1 u
FROM dual
UNION ALL
SELECT 2 l,
3 u
FROM dual
UNION ALL
SELECT 4 l,
5 u
FROM dual
UNION ALL
SELECT 6 l,
7 u
FROM dual) r
LEFT JOIN (SELECT fv.policy p,
count(distinct fv.fund) cof
FROM policy_fund fv
GROUP BY fv.policy) fpp
ON fpp.cof >= r.l
AND fpp.cof <= r.u
GROUP BY r.l,
r.u
ORDER BY r.l,
r.u;
db<>fiddle

Is there an algorithm that can divide a number into three parts and have their totals match the original number?

For example if you take the following example into consideration.
100.00 - Original Number
33.33 - 1st divided by 3
33.33 - 2nd divided by 3
33.33 - 3rd divided by 3
99.99 - Is the sum of the 3 division outcomes
But i want it to match the original 100.00
One way that i saw it could be done was by taking the original number minus the first two divisions and the result would be my third number. Now if i take those 3 numbers i get my original number.
100.00 - Original Number
33.33 - 1st divided by 3
33.33 - 2nd divided by 3
33.34 - 3rd number
100.00 - Which gives me my original number correctly. (33.33+33.33+33.34 = 100.00)
Is there a formula for this either in Oracle PL/SQL or a function or something that could be implemented?
Thanks in advance!
This version takes precision as a parameter as well:
with q as (select 100 as val, 3 as parts, 2 as prec from dual)
select rownum as no
,case when rownum = parts
then val - round(val / parts, prec) * (parts - 1)
else round(val / parts, prec)
end v
from q
connect by level <= parts
no v
=== =====
1 33.33
2 33.33
3 33.34
For example, if you want to split the value among the number of days in the current month, you can do this:
with q as (select 100 as val
,extract(day from last_day(sysdate)) as parts
,2 as prec from dual)
select rownum as no
,case when rownum = parts
then val - round(val / parts, prec) * (parts - 1)
else round(val / parts, prec)
end v
from q
connect by level <= parts;
1 3.33
2 3.33
3 3.33
4 3.33
...
27 3.33
28 3.33
29 3.33
30 3.43
To apportion the value amongst each month, weighted by the number of days in each month, you could do this instead (change the level <= 3 to change the number of months it is calculated for):
with q as (
select add_months(date '2013-07-01', rownum-1) the_month
,extract(day from last_day(add_months(date '2013-07-01', rownum-1)))
as days_in_month
,100 as val
,2 as prec
from dual
connect by level <= 3)
,q2 as (
select the_month, val, prec
,round(val * days_in_month
/ sum(days_in_month) over (), prec)
as apportioned
,row_number() over (order by the_month desc)
as reverse_rn
from q)
select the_month
,case when reverse_rn = 1
then val - sum(apportioned) over (order by the_month
rows between unbounded preceding and 1 preceding)
else apportioned
end as portion
from q2;
01/JUL/13 33.7
01/AUG/13 33.7
01/SEP/13 32.6
Use rational numbers. You could store the numbers as fractions rather than simple values. That's the only way to assure that the quantity is truly split in 3, and that it adds up to the original number. Sure you can do something hacky with rounding and remainders, as long as you don't care that the portions are not exactly split in 3.
The "algorithm" is simply that
100/3 + 100/3 + 100/3 == 300/3 == 100
Store both the numerator and the denominator in separate fields, then add the numerators. You can always convert to floating point when you display the values.
The Oracle docs even have a nice example of how to implement it:
CREATE TYPE rational_type AS OBJECT
( numerator INTEGER,
denominator INTEGER,
MAP MEMBER FUNCTION rat_to_real RETURN REAL,
MEMBER PROCEDURE normalize,
MEMBER FUNCTION plus (x rational_type)
RETURN rational_type);
Here is a parameterized SQL version
SELECT COUNT (*), grp
FROM (WITH input AS (SELECT 100 p_number, 3 p_buckets FROM DUAL),
data
AS ( SELECT LEVEL id, (p_number / p_buckets) group_size
FROM input
CONNECT BY LEVEL <= p_number)
SELECT id, CEIL (ROW_NUMBER () OVER (ORDER BY id) / group_size) grp
FROM data)
GROUP BY grp
output:
COUNT(*) GRP
33 1
33 2
34 3
If you edit the input parameters (p_number and p_buckets) the SQL essentially distributes p_number as evenly as possible among the # of buckets requested (p_buckets).
I've solved this problem yesterday by subtracting 2 of 3 parts from the starting number, e.g. 100 - 33.33 - 33.33 = 33.34 and the result of summing it up is still 100.

CONNECT BY for two tables with two JOINS

I have 3 tables:
two with hierarchical structures
(like "dimensions" of recursive type of hierarchy);
one with summing data (like "facts" with X column).
They are here:
DIM1 (ID1, PARENT2, NAME1)
DIM2 (ID2, PARENT2, NAME2)
FACTS (ID1, ID2, X)
Example of DIM1 table:
-- 1 0 DIM1
---- 2 1 DIM1-A
------ 3 2 DIM1-A-A
-------- 4 3 DIM1-A-A-A
-------- 5 3 DIM1-A-A-B
------ 6 2 DIM1-A-B
-------- 7 6 DIM1-A-B-A
-------- 8 6 DIM1-A-B-B
------ 9 2 DIM1-A-C
---- 10 1 DIM1-B
------ 11 10 DIM1-B-C
------ 12 10 DIM1-B-D
---- 13 1 DIM1-C
Example of DIM2 table:
-- 1 0 DIM2
---- 2 1 DIM2-A
------ 3 2 DIM2-A-A
-------- 4 3 DIM2-A-A-A
-------- 5 3 DIM2-A-A-B
-------- 6 3 DIM2-A-B-C
------ 7 2 DIM2-A-B
---- 8 1 DIM2-B
---- 9 1 DIM2-C
Example of FACTS table:
1 1 100
1 2 30
1 3 500
-- ................
13 9 200
And I would like to create the only SELECT where I will specify the parent for DIM1 (for example ID1=2 for DIM1-A) and parent for DIM2 (for example ID2=2 for DIM2-A) and SELECT will generate a report like this:
Name_of_1 Name_of_2 Sum_of_X
--------- --------- ----------
DIM1-A-A DIM2-A-A (some sum)
DIM1-A-A DIM2-A-B (some sum)
DIM1-A-B DIM2-A-A (some sum)
DIM1-A-B DIM2-A-B (some sum)
DIM1-A-C DIM2-A-A (some sum)
DIM1-A-C DIM2-A-B (some sum)
I would like to use CONNECT BY phrase, START WITH phrase, SUM phrase, GROUP BY phrase, and OUTER or INNER (?) JOIN. I need no other extensions of Oracle 10.2.
In other words: only with "classic" SQL and
only Oracle extensions for hierarchy queries.
Is it possible?
I tried some experiments with question in
Mixing together Connect by, inner join and sum with Oracle
(where is a very nice solution but only for one
dimension table ("Tasks"), but I need to JOIN two dimension tables to one facts table), but I was not successful.
"Some sum" is not very descriptive, so I don't see why do you need CONNECT BY at all.
SELECT dim1.name, dim2.name, x
FROM (
SELECT id1, id2, SUM(x) AS x
FROM facts
GROUP BY
id1, id2
) f
JOIN dim1
ON dim1.id = f.id1
JOIN dim2
ON dim2.id = f.id2
I think what you're trying to do is get the sum of the value in the facts table for all of the children of the specified rows grouped by the topmost children. This would mean that in your example above, the results for the first row would be the sum any intersections of (DIM1-A-A, DIM1-A-A-A, DIM1-A-A-B) and (DIM2-A-A, DIM2-A-A-A, DIM2-A-A-B, DIM3-A-A-C) found in the FACTS table. With that assumption, I have come to the following solution:
SELECT root_name1, root_name2, SUM(X)
FROM ( SELECT CONNECT_BY_ROOT(name1) AS root_name,
id1
FROM dim1
CONNECT BY parent1 = PRIOR id1
START WITH parent1 = 2) d1
CROSS JOIN
( SELECT CONNECT_BY_ROOT(name2) AS root_name,
id2
FROM dim2
CONNECT BY parent2 = PRIOR id2
START WITH parent2 = 2) d2
LEFT OUTER JOIN
facts
ON d1.id1 = facts.id1
AND d2.id2 = facts.id2
GROUP BY root_name1, root_name2
(This also assumes that the columns of FACTS are named ID1, ID2, and X.)

Oracle Calculation Involving Results of Another Calculation

First off, I'm a total Oracle noob although I'm very familiar with SQL. I have a single cost column. I need to calculate the total cost, the percentage of the total cost, and then a running sum of the percentages. I'm having trouble with the running sum of percentages because the only way I can think to do this uses nested SUM functions, which isn't allowed.
Here's what works:
SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
FROM my_table
ORDER BY cost DESC
Here's what I'm trying to do that doesn't work:
SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per,
SUM(cost/SUM(cost) OVER()) OVER(cost) AS per_sum
FROM my_table
ORDER BY cost DESC
Am I just going about it wrong, or is what I'm trying to do just not possible? By the way I'm using Oracle 10g. Thanks in advance for any help.
You don't need the order by inside that inline view, especially since the outer select is doing an order by the order way around. Also, cost / SUM(cost) OVER () equals RATIO_TO_REPORT(cost) OVER ().
An example:
SQL> create table my_table(cost)
2 as
3 select 10 from dual union all
4 select 20 from dual union all
5 select 5 from dual union all
6 select 50 from dual union all
7 select 60 from dual union all
8 select 40 from dual union all
9 select 15 from dual
10 /
Table created.
Your initial query:
SQL> SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
2 FROM my_table
3 ORDER BY cost DESC
4 /
COST TOTAL PER
---------- ---------- ----------
60 200 .3
50 200 .25
40 200 .2
20 200 .1
15 200 .075
10 200 .05
5 200 .025
7 rows selected.
Quassnoi's query contains a typo:
SQL> SELECT cost, total, per, SUM(running) OVER (ORDER BY cost)
2 FROM (
3 SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
4 FROM my_table
5 ORDER BY
6 cost DESC
7 )
8 /
SELECT cost, total, per, SUM(running) OVER (ORDER BY cost)
*
ERROR at line 1:
ORA-00904: "RUNNING": invalid identifier
And if I correct that typo. It gives the right results, but wrongly sorted (I guess):
SQL> SELECT cost, total, per, SUM(per) OVER (ORDER BY cost)
2 FROM (
3 SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
4 FROM my_table
5 ORDER BY
6 cost DESC
7 )
8 /
COST TOTAL PER SUM(PER)OVER(ORDERBYCOST)
---------- ---------- ---------- -------------------------
5 200 .025 .025
10 200 .05 .075
15 200 .075 .15
20 200 .1 .25
40 200 .2 .45
50 200 .25 .7
60 200 .3 1
7 rows selected.
I think this is the one you are looking for:
SQL> select cost
2 , total
3 , per
4 , sum(per) over (order by cost desc)
5 from ( select cost
6 , sum(cost) over () total
7 , ratio_to_report(cost) over () per
8 from my_table
9 )
10 order by cost desc
11 /
COST TOTAL PER SUM(PER)OVER(ORDERBYCOSTDESC)
---------- ---------- ---------- -----------------------------
60 200 .3 .3
50 200 .25 .55
40 200 .2 .75
20 200 .1 .85
15 200 .075 .925
10 200 .05 .975
5 200 .025 1
7 rows selected.
Regards,
Rob.
SELECT cost, total, per, SUM(per) OVER (ORDER BY cost)
FROM (
SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
FROM my_table
)
ORDER BY
cost DESC

Resources