MDX: Calculate the sum of distinct values

I'm wondering if it is possible to create a calculated member to obtain the sum of distinct values for a fact. I will try to explain it with the following example:
I have a fact table whose primary key is related to two dimensions (one-to-many cardinality). The fact contains a measure whose value is the same for every row of a given FACT_ID and DIM_1_ID combination. For the total, I don't want the same value counted more than once. So, with the following values, the total should be 450 and not 850 (the default Mondrian behavior).
| FACT_ID | DIM_1_ID | DIM_2_ID | MEASURE |
|---------|----------|----------|---------|
| 1 | A | D | 100 |
| 1 | A | E | 100 |
| 1 | B | F | 50 |
| 2 | A | D | 300 |
| 2 | A | E | 300 |
| TOTAL | | | 450 |
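To make the arithmetic concrete, here is a minimal Python sketch of the two totals. Python serves only as neutral pseudocode, and the names `naive_total` and `distinct_total` are illustrative, not Mondrian APIs. The sketch assumes, as the question states, that MEASURE is identical within each (FACT_ID, DIM_1_ID) combination.

```python
# Example rows from the question: (FACT_ID, DIM_1_ID, DIM_2_ID, MEASURE)
ROWS = [
    (1, "A", "D", 100),
    (1, "A", "E", 100),
    (1, "B", "F", 50),
    (2, "A", "D", 300),
    (2, "A", "E", 300),
]


def naive_total(rows):
    """Plain SUM over every fact row (the default aggregation)."""
    return sum(measure for _fact, _dim1, _dim2, measure in rows)


def distinct_total(rows):
    """Count MEASURE once per (FACT_ID, DIM_1_ID) combination."""
    per_combination = {}
    for fact_id, dim1, _dim2, measure in rows:
        # The value is the same within a combination, so keeping the first one is enough.
        per_combination.setdefault((fact_id, dim1), measure)
    return sum(per_combination.values())


print(naive_total(ROWS))     # 850
print(distinct_total(ROWS))  # 450
```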
Is it possible? How can it be done with Mondrian?
Thanks in advance
UPDATE - Current status
As described in one of the comments below, based on @whytheq's answer, I managed to calculate the right value for the total using the following MDX formula for the measure:
Sum(
Order(
[dActivity.hActivity].[lActivity].MEMBERS*[dFacility.hFacility].[lFacility].MEMBERS,
[dActivity.hActivity].[lActivity].currentmember.name
) as [m_set] ,
iif(
[m_set].currentordinal = 0
OR
not(
[m_set]
.item([m_set].currentordinal)
.item(0).NAME
=
[m_set]
.item([m_set].currentordinal-1)
.item(0).NAME
) ,
[Measures].[mBudget]
,
0
)
)
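The formula above sorts the crossjoined set and adds [mBudget] only when a tuple's first member differs from the previous tuple's. Here is a rough Python model of that "first in each run" idea. `first_in_run_sum` is a hypothetical helper. The `enumerate` position stands in for CurrentOrdinal, and this sketch does not settle whether Mondrian's CurrentOrdinal starts at 0 or 1. The (FACT_ID, DIM_1_ID) key is an assumed stand-in for the member compared through .Item(0).Name.

```python
def first_in_run_sum(tuples, key, measure):
    """Sort the tuples by `key` and add the measure only for the first tuple
    of each run of equal keys; every later tuple in the run contributes 0."""
    ordered = sorted(tuples, key=key)
    total = 0
    for position, current in enumerate(ordered):
        is_first = position == 0 or key(current) != key(ordered[position - 1])
        total += measure(current) if is_first else 0
    return total


rows = [
    (1, "A", "D", 100),
    (1, "A", "E", 100),
    (1, "B", "F", 50),
    (2, "A", "D", 300),
    (2, "A", "E", 300),
]
# Hypothetical key: the (FACT_ID, DIM_1_ID) pair identifies one combination.
print(first_in_run_sum(rows, key=lambda r: (r[0], r[1]), measure=lambda r: r[3]))  # 450
# Keying on DIM_1_ID alone merges different facts into one run.
print(first_in_run_sum(rows, key=lambda r: r[1], measure=lambda r: r[3]))  # 150
```

The second call shows why the compared name matters. If it does not uniquely identify the combination, separate runs collapse and values are silently dropped.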
However, this expression evaluates the complete set for every single row, so the result overrides the measure's real value in each fact row:
| FACT_ID | DIM_1_ID | DIM_2_ID | MEASURE |
|---------|----------|----------|---------|
| 1 | A | D | 450 |
| 1 | A | E | 450 |
| 1 | B | F | 450 |
| 2 | A | D | 450 |
| 2 | A | E | 450 |
| TOTAL | | | 450 |

Great question - really tricky to do in MDX.
If we do the following then there are 158 rows returned - a handful have duplicate values for [Measures].[Internet Sales Amount]:
SELECT
[Measures].[Internet Sales Amount] ON 0
,NON EMPTY
Order
(
[Product].[Product].[Product]
,[Measures].[Internet Sales Amount]
,bdesc
) ON 1
FROM [Adventure Works];
The next query counts a product only if its measure value differs from the value of the product just above it in the ordered set:
WITH
SET [x] AS
Order
(
NonEmpty
(
[Product].[Product].[Product]
,[Measures].[Internet Sales Amount]
)
,[Measures].[Internet Sales Amount]
,bdesc
)
SET [FILTERED] AS
Filter
(
[x]
,
(
[x].Item(
[x].CurrentOrdinal - 1)
,[Measures].[Internet Sales Amount]
)
<>
(
[x].Item(
[x].CurrentOrdinal)
,[Measures].[Internet Sales Amount]
)
)
MEMBER [Measures].[distCount] AS
Count([FILTERED])
SELECT
[Measures].[distCount] ON 0
FROM [Adventure Works];
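The Filter above relies on a simple property: once values are sorted, equal values sit next to each other, so keeping only the elements that differ from their predecessor leaves one element per distinct value. Here is a small Python sketch of that idea. `count_distinct_by_neighbour` is a hypothetical name, and the amounts are made up for illustration.

```python
def count_distinct_by_neighbour(values):
    """Sort descending, then keep a value only if it differs from the value
    just before it in the ordered list (mirrors the Filter over [x])."""
    ordered = sorted(values, reverse=True)
    kept = [v for i, v in enumerate(ordered) if i == 0 or v != ordered[i - 1]]
    return len(kept)


# Hypothetical sales amounts, some repeated across different products.
amounts = [500.0, 120.0, 500.0, 99.9, 320.5, 120.0, 120.0]
print(count_distinct_by_neighbour(amounts))  # 4
```

Note that this counts distinct values. Two different products that happen to share the same amount are counted once.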
Maybe try adding the EXISTING keyword into your calculation:
Sum
(
Order
(
EXISTING //<<<
[dActivity.hActivity].[lActivity].MEMBERS
*
[dFacility.hFacility].[lFacility].MEMBERS
,[dActivity.hActivity].[lActivity].CurrentMember.Name
) AS [m_set]
,IIF
(
[m_set].CurrentOrdinal = 0
OR
(NOT
[m_set].Item(
[m_set].CurrentOrdinal).Item(0).Name
=
[m_set].Item(
[m_set].CurrentOrdinal - 1).Item(0).Name)
,[Measures].[mBudget]
,0
)
)
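Conceptually, EXISTING should make the set depend on the cell being evaluated instead of always using the full set. Here is a Python model of that intent: filter to the current cell's coordinates first, then de-duplicate. This sketch does not check whether Mondrian honours EXISTING the same way SSAS does. `existing_distinct_sum` and `COLUMNS` are hypothetical names.

```python
ROWS = [
    (1, "A", "D", 100),
    (1, "A", "E", 100),
    (1, "B", "F", 50),
    (2, "A", "D", 300),
    (2, "A", "E", 300),
]
COLUMNS = ("FACT_ID", "DIM_1_ID", "DIM_2_ID")


def existing_distinct_sum(rows, context):
    """Keep only rows matching the current cell's coordinates (the job EXISTING
    is meant to do), then add MEASURE once per (FACT_ID, DIM_1_ID) combination."""
    in_context = [
        row for row in rows
        if all(row[COLUMNS.index(col)] == val for col, val in context.items())
    ]
    per_combination = {}
    for fact_id, dim1, _dim2, measure in in_context:
        per_combination.setdefault((fact_id, dim1), measure)
    return sum(per_combination.values())


# One cell per fact row, plus the grand total (empty context).
per_row = [
    existing_distinct_sum(ROWS, {"FACT_ID": f, "DIM_1_ID": d1, "DIM_2_ID": d2})
    for f, d1, d2, _ in ROWS
]
print(per_row)                          # [100, 100, 50, 300, 300]
print(existing_distinct_sum(ROWS, {}))  # 450
```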

You could try to obtain the average over the set. The code is a bit complex.
WITH SET SomeSet AS
{
Fact.FactID.FactID.MEMBERS
*
Fact.DimID1.DimID1.MEMBERS
*
Fact.DimID2.DimID2.MEMBERS
}
MEMBER Measures.AvgVal AS
AVG
(
{Fact.FactID.CURRENTMEMBER}
*
{Fact.DimID1.CURRENTMEMBER}
*
NonEmpty
(
Fact.DimID2.DimID2.MEMBERS,
{{Fact.FactID.CURRENTMEMBER} *
{Fact.DimID1.CURRENTMEMBER}} *
[Measures].[TheMeasure]
)
,
[Measures].[TheMeasure]
)
SELECT NON EMPTY SomeSet ON 1,
NON EMPTY {
[Measures].[TheMeasure],
Measures.AvgVal
} on 0
from [YourCube]
For the current FactID-DimID1 combination on the axis, I get the list of all DimID2 members and then, over the internally generated non-empty FactID-DimID1-DimID2 tuples, take the average value of the measure TheMeasure.
So, for example, for FactID = 1 and DimID1 = A the displayed value would be (100 + 100) / 2 = 100.
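Because every DimID2 row of a combination carries the same value, averaging over those rows simply recovers that value. Here is a Python sketch of the per-combination average, using the question's sample rows. `average_per_combination` is a hypothetical name. Summing the averages is an extra step this sketch adds to show it reaches the desired total.

```python
from collections import defaultdict
from statistics import mean

ROWS = [
    (1, "A", "D", 100),
    (1, "A", "E", 100),
    (1, "B", "F", 50),
    (2, "A", "D", 300),
    (2, "A", "E", 300),
]


def average_per_combination(rows):
    """AVG of MEASURE over the DIM_2 rows of each (FACT_ID, DIM_1_ID) pair."""
    groups = defaultdict(list)
    for fact_id, dim1, _dim2, measure in rows:
        groups[(fact_id, dim1)].append(measure)
    return {combo: mean(values) for combo, values in groups.items()}


averages = average_per_combination(ROWS)
print(averages[(1, "A")])      # 100
print(sum(averages.values()))  # 450
```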


Unexpected behaviour of rand() in MySQL

I encountered a very weird result while trying to filter my data using the RAND() function.
Suppose I have a table filled with some data:
CREATE TABLE `status_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rank` int(11) DEFAULT 50,
PRIMARY KEY (`id`)
)
Then I run the following simple select:
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50
and get a clear, expected output:
<...skip...>
| 6575476 | 50 | 34.51090244065123 |
| 6575511 | 50 | 67.84258230388404 |
| 6575589 | 50 | 35.68020727083106 |
| 6575644 | 50 | 74.87329251586766 |
| 6575723 | 50 | 67.32584384020961 |
| 6575771 | 50 | 12.009344726809621 |
| 6575863 | 50 | 58.06919518678374 |
+---------+------+-----------------------+
66169 rows in set (2.502 sec).
So I generate a random number from 0 to 100 for each row, around 66,000 rows in total.
Then I want only a (random) part of the data to be shown. This has no production purpose, by the way; it's just an artificial test.
select *
from (
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50) t
where thres>rank
order by thres;
This time I get the following:
<...skip...>
| 4396732 | 50 | 99.97966075314177 |
| 4001782 | 50 | 99.98002871869134 |
| 1788580 | 50 | 99.98064143581375 |
| 5300286 | 50 | 99.98275954274717 |
| 146401 | 50 | 99.98552389441573 |
| 4744748 | 50 | 99.98644758014609 |
+---------+------+--------------------+
16449 rows in set (2.188 sec)
Since rand()*100 is uniform on [0, 100), about half of the rows, roughly 33,000 out of 66,000, should have thres > 50. So is the distribution of rand() biased?
Let's then change > to <:
select *
from (
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50) t
where thres<rank
order by thres;
<...skip...>
| 4653786 | 50 | 49.98035016467827 |
| 6041489 | 50 | 49.980370281245904 |
| 5064204 | 50 | 49.989308742796354 |
| 1699741 | 50 | 49.991373205549436 |
| 3234039 | 50 | 49.99390454030959 |
| 806791 | 50 | 49.99575274996064 |
| 3713581 | 50 | 49.99814410693771 |
+---------+------+----------------------+
16562 rows in set (2.373 sec)
Again about 16,000! So not half, but a quarter of all rows is returned.
It seems that the output of rand() inside the subquery is somehow influenced by the expression outside it. How is this possible?
I can also combine both halves with UNION ALL:
select * from (select id,rank as rank,(rand()*100) as thres from status_log where rank = 50) t where thres<50
UNION ALL
select * from (select id,rank as rank,(rand()*100) as thres from status_log where rank = 50) t where thres>=50;
The expected number of results should be around 66,000, but it returns only about 33,000.
I observe this behavior only when rand() is non-deterministic and generated anew each time. If I use ...select id,rank as rank,(rand(id)*100)... (i.e. make the output of rand() depend on id), I start getting the expected number of results (33,000-ish). The same happens if I precalculate the value and store it in a temporary column in the table.
I also tried filtering with rank = 30, and the results were ~6,000 and ~32,000 for < and > respectively.
Version 10.5.8-MariaDB-3, InnoDB
Using a single query with HAVING instead of a subquery with WHERE in the main query seems to work around it.
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50
having thres > rank
order by thres
This appears to be this bug:
RAND() evaluated and filtered twice with subquery
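The numbers in the question fit a simple model: if the filter effectively has to pass for two independent draws of RAND() instead of one, a condition with probability p keeps p squared of the rows. That gives 0.25 for both halves at 50 (about 16,500 rows), 0.09 and 0.49 at 30 (about 6,000 and 32,400), and 0.25 + 0.25 = 0.5 for the UNION ALL (about 33,000). The Python simulation below is a sketch of that hypothetical two-draw model, not of MariaDB's actual execution path. `kept_fraction` is an illustrative name.

```python
import operator
import random


def kept_fraction(n_rows, threshold, compare, draws, seed=0):
    """Fraction of rows that survive when compare(rand() * 100, threshold)
    must hold for `draws` independent rand() calls per row.
    draws=1 is the expected semantics; draws=2 models the reported bug."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_rows):
        if all(compare(rng.random() * 100, threshold) for _ in range(draws)):
            kept += 1
    return kept / n_rows


N_ROWS = 66169  # row count reported in the question
cases = [(50, operator.gt, ">"), (50, operator.lt, "<"),
         (30, operator.lt, "<"), (30, operator.gt, ">")]
for threshold, compare, label in cases:
    once = kept_fraction(N_ROWS, threshold, compare, draws=1)
    twice = kept_fraction(N_ROWS, threshold, compare, draws=2)
    print(f"thres {label} {threshold}: one draw ~{once:.2f}, two draws ~{twice:.2f}")
```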

Lag and Lead to next month

TABLE: HIST
| CUSTOMER | MONTH | PLAN |
|----------|-------|------|
| 1 | 1 | A |
| 1 | 2 | B |
| 1 | 2 | C |
| 1 | 3 | D |
If I query:
select h.*, lead(plan) over (partition by customer order by month) np from HIST h
I get:
| CUSTOMER | MONTH | PLAN | np |
|----------|-------|------|--------|
| 1 | 1 | A | B |
| 1 | 2 | B | C |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |
But I wanted
| CUSTOMER | MONTH | PLAN | np |
|----------|-------|------|--------|
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |
The reason: the month after 2 is 3, whose plan is D. I'm guessing PARTITION BY customer ORDER BY month doesn't work the way I thought.
Is there a way to achieve this in Oracle 12c?
One way to do it is to use a RANGE windowing clause with the MIN analytic function, like this:
select h.*,
min(plan) over
(partition by customer
order by month
range between 1 following and 1 following) np
from HIST h;
+----------+-------+------+----+
| CUSTOMER | MONTH | PLAN | NP |
+----------+-------+------+----+
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | |
+----------+-------+------+----+
When you use a RANGE window, you are telling Oracle to build each window from the values of the ORDER BY column (month) rather than from physical row positions.
So, e.g.,
ROWS BETWEEN 1 following and 1 following
... will make a window containing the next row.
RANGE BETWEEN 1 following and 1 following
... will make a window containing all the rows having the next value for month.
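Here is a Python sketch of the two window definitions, applied to customer 1's HIST rows. PARTITION BY is left out because there is only one customer, and both function names are hypothetical. The ROWS version reproduces the LEAD result from the question, while the RANGE version collects every row whose month is exactly one higher.

```python
# HIST rows for customer 1 as (month, plan), already in ORDER BY month order.
HIST = [(1, "A"), (2, "B"), (2, "C"), (3, "D")]


def rows_window_next(rows):
    """ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING: the single physically next row
    (the same value LEAD(plan) returns)."""
    return [rows[i + 1][1] if i + 1 < len(rows) else None for i in range(len(rows))]


def range_window_next_min(rows):
    """MIN(plan) over RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING: all rows whose
    month equals the current month + 1, however many there are."""
    result = []
    for month, _plan in rows:
        window = [plan for m, plan in rows if m == month + 1]
        result.append(min(window) if window else None)
    return result


print(rows_window_next(HIST))       # ['B', 'C', 'D', None]
print(range_window_next_min(HIST))  # ['B', 'D', 'D', None]
```

Because the RANGE window matches month + 1 exactly, a skipped month yields an empty window. That is why a variant with UNBOUNDED FOLLOWING is needed when months can be missing.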
UPDATE
If some values of MONTH might be skipped for a given customer, you can use this variant (the output below is for sample data in which month 2 is missing):
select h.*,
first_value(plan) over
(partition by customer
order by month
range between 1 following and unbounded following) np
from HIST h;
+----------+-------+------+----+
| CUSTOMER | MONTH | PLAN | NP |
+----------+-------+------+----+
| 1 | 1 | A | B |
| 1 | 3 | B | D |
| 1 | 3 | C | D |
| 1 | 4 | D | |
+----------+-------+------+----+
You can use LAG and then LEAD. First, LAG checks for duplicate months and sets the value to NULL on every repeated row of a month; then LEAD with IGNORE NULLS gets the next month's value.
It has the additional benefit that if months are skipped then it will still find the next value.
Oracle 11g R2 Schema Setup:
CREATE TABLE HIST ( CUSTOMER, MONTH, PLAN ) AS
SELECT 1, 1, 'A' FROM DUAL UNION ALL
SELECT 1, 2, 'B' FROM DUAL UNION ALL
SELECT 1, 2, 'C' FROM DUAL UNION ALL
SELECT 1, 3, 'D' FROM DUAL UNION ALL
SELECT 2, 1, 'E' FROM DUAL UNION ALL
SELECT 2, 1, 'F' FROM DUAL UNION ALL
SELECT 2, 3, 'G' FROM DUAL UNION ALL
SELECT 2, 5, 'H' FROM DUAL;
Query 1:
SELECT CUSTOMER,
MONTH,
PLAN,
LEAD( np ) IGNORE NULLS OVER ( PARTITION BY CUSTOMER ORDER BY MONTH, PLAN, ROWNUM ) AS np
FROM (
SELECT h.*,
CASE MONTH
WHEN LAG( MONTH ) OVER ( PARTITION BY CUSTOMER ORDER BY MONTH, PLAN, ROWNUM )
THEN NULL
ELSE PLAN
END AS np
FROM hist h
)
Results:
| CUSTOMER | MONTH | PLAN | NP |
|----------|-------|------|--------|
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |
| 2 | 1 | E | G |
| 2 | 1 | F | G |
| 2 | 3 | G | H |
| 2 | 5 | H | (null) |
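Here is a Python sketch of the same two passes over the sample data, intended to reproduce the Results table above. `next_month_plan` is a hypothetical name. Sorting by (CUSTOMER, MONTH, PLAN) stands in for the ORDER BY, and ROWNUM, which only breaks any remaining ties, is not needed for this data.

```python
from itertools import groupby

# (CUSTOMER, MONTH, PLAN) rows from the schema setup above.
HIST = [
    (1, 1, "A"), (1, 2, "B"), (1, 2, "C"), (1, 3, "D"),
    (2, 1, "E"), (2, 1, "F"), (2, 3, "G"), (2, 5, "H"),
]


def next_month_plan(rows):
    result = []
    # ORDER BY MONTH, PLAN inside each CUSTOMER partition.
    for customer, group in groupby(sorted(rows), key=lambda r: r[0]):
        part = list(group)
        # Pass 1 (LAG): keep PLAN on the first row of each month, NULL on repeats.
        np = [
            plan if i == 0 or month != part[i - 1][1] else None
            for i, (_cust, month, plan) in enumerate(part)
        ]
        # Pass 2 (LEAD ... IGNORE NULLS): first non-NULL np after the current row.
        for i, (_cust, month, plan) in enumerate(part):
            following = next((v for v in np[i + 1:] if v is not None), None)
            result.append((customer, month, plan, following))
    return result


for row in next_month_plan(HIST):
    print(row)
```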
Just so that it is listed here: from Oracle 12c onward you can also use an apply operator (OUTER APPLY) for this style of problem:
select
h.customer, h.month, h.plan, oa.np
from hist h
outer apply (
select
h2.plan as np
from hist h2
where h2.customer = h.customer
and h2.month > h.month
order by h2.month
fetch first 1 rows only
) oa
order by
h.customer, h.month, h.plan
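I don't know of any public Oracle 12c fiddles, so here is an example in SQL Server: http://sqlfiddle.com/#!18/cd95e/1. Because month 2 holds two plans and the ORDER BY has no tie-breaker, the row for month 1 can get either B or C (the run below returned C); add h2.plan to the ORDER BY for a deterministic result.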
| customer | month | plan | np |
|----------|-------|------|--------|
| 1 | 1 | A | C |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |

Oracle SQL - distributing into buckets

I'm searching for a smart Oracle SQL solution to distribute data into a number of buckets. The order of x is important. I know there are a lot of algorithms, but I'm pretty sure there must be a smart SQL (analytic function) solution, e.g. with NTILE(3), but I don't get it.
x|quantity
1|7
2|4
3|9
4|2
5|10
6|3
8|7
9|7
10|4
11|9
12|2
13|10
16|3
17|7
The result should look something like this:
x_from|x_to|sum(quantity)
1|4|22
...and so on
Thanks in advance
Tim
This example divides the table into 4 buckets (ntile( 4 )):
SELECT min( "x" ) as "From",
max( "x" ) as "To",
sum("quantity")
FROM (
SELECT t.*,
ntile( 4 ) over (order by "x" ) as group_no
FROM table1 t
)
GROUP BY group_no
ORDER BY 1;
| From | To | SUM("QUANTITY") |
|------|----|-----------------|
| 1 | 4 | 22 |
| 5 | 9 | 27 |
| 10 | 12 | 15 |
| 13 | 17 | 20 |
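NTILE(n) balances row counts, not quantity sums. With 14 rows and 4 buckets, the buckets get 4, 4, 3 and 3 rows, and the remainder goes to the first buckets. Here is a Python sketch of that assignment plus the GROUP BY step, using the question's data. `ntile` and `bucket_summary` are illustrative names, not Oracle functions.

```python
def ntile(n_buckets, n_rows):
    """1-based bucket numbers in row order, as NTILE(n) assigns them: sizes differ
    by at most one, and the first (n_rows % n_buckets) buckets get the extra row."""
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        buckets.extend([bucket] * (base + (1 if bucket <= extra else 0)))
    return buckets


DATA = [(1, 7), (2, 4), (3, 9), (4, 2), (5, 10), (6, 3), (8, 7), (9, 7),
        (10, 4), (11, 9), (12, 2), (13, 10), (16, 3), (17, 7)]


def bucket_summary(data, n_buckets):
    """Equivalent of SELECT min(x), max(x), sum(quantity) ... GROUP BY group_no."""
    ordered = sorted(data)  # ORDER BY "x"
    summary = {}
    for (x, quantity), bucket in zip(ordered, ntile(n_buckets, len(ordered))):
        lo, hi, total = summary.get(bucket, (x, x, 0))
        summary[bucket] = (min(lo, x), max(hi, x), total + quantity)
    return [summary[b] for b in sorted(summary)]


print(bucket_summary(DATA, 4))  # [(1, 4, 22), (5, 9, 27), (10, 12, 15), (13, 17, 20)]
```

If the goal is buckets with roughly equal quantity sums rather than equal row counts, NTILE alone will not do that.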

Oracle Recursive Subquery Factoring convert

I'm trying to use this recursive SQL feature but can't get it to do what I want, not even close. I've coded the logic as an unrolled loop of table updates, and I'm asking whether it can be converted into a single recursive SQL query instead.
http://sqlfiddle.com/#!4/b7217/1
There are six players to be ranked. They have id, group id, score and rank.
Initial state
+----+--------+-------+--------+
| id | grp_id | score | rank |
+----+--------+-------+--------+
| 1 | 1 | 100 | (null) |
| 2 | 1 | 90 | (null) |
| 3 | 1 | 70 | (null) |
| 4 | 2 | 95 | (null) |
| 5 | 2 | 70 | (null) |
| 6 | 2 | 60 | (null) |
+----+--------+-------+--------+
I want to take the person with the highest initial score and give them rank 1. Then I apply 10 bonus points to the score of everyone who has the same group id. Take the next highest, assign rank 2, distribute bonus points and so on until there are no players left.
User id breaks ties.
The bonus points change the ranking. id=4 initially appears to be in second place with 95, behind the leader's 100, but with the 10-point bonus id=2 (90 + 10 = 100) moves up and takes that spot.
Final state
+-----+---------+--------+------+
| ID | GRP_ID | SCORE | RANK |
+-----+---------+--------+------+
| 1 | 1 | 100 | 1 |
| 2 | 1 | 100 | 2 |
| 4 | 2 | 95 | 3 |
| 3 | 1 | 90 | 4 |
| 5 | 2 | 80 | 5 |
| 6 | 2 | 80 | 6 |
+-----+---------+--------+------+
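Here is a plain imperative Python sketch of the procedure described above, meant to pin down the expected result before attempting it in SQL. It is not a recursive CTE. `rank_with_bonus` is a hypothetical name, and it assumes the bonus goes only to players who are still unranked, which matches id=1 keeping 100 in the final state.

```python
def rank_with_bonus(players, bonus=10):
    """players: list of (id, grp_id, score).
    Repeatedly give the next rank to the highest-scoring unranked player
    (the lower id wins ties), then add `bonus` to every player still unranked
    in the same group. Returns (id, grp_id, score, rank) rows."""
    score = {pid: s for pid, _grp, s in players}
    group = {pid: grp for pid, grp, _s in players}
    unranked = set(score)
    ranking = []
    while unranked:
        leader = min(unranked, key=lambda pid: (-score[pid], pid))
        unranked.remove(leader)
        ranking.append((leader, group[leader], score[leader], len(ranking) + 1))
        for pid in unranked:
            if group[pid] == group[leader]:
                score[pid] += bonus
    return ranking


PLAYERS = [(1, 1, 100), (2, 1, 90), (3, 1, 70), (4, 2, 95), (5, 2, 70), (6, 2, 60)]
for row in rank_with_bonus(PLAYERS):
    print(row)
```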
This is quite a bit late, but I'm not sure this can be done using a recursive CTE. I did, however, come up with a solution using the MODEL clause:
WITH SAMPLE (ID,GRP_ID,SCORE,RANK) AS (
SELECT 1,1,100,NULL FROM DUAL UNION
SELECT 2,1,90,NULL FROM DUAL UNION
SELECT 3,1,70,NULL FROM DUAL UNION
SELECT 4,2,95,NULL FROM DUAL UNION
SELECT 5,2,70,NULL FROM DUAL UNION
SELECT 6,2,60,NULL FROM DUAL)
SELECT ID,GRP_ID,SCORE,RANK FROM SAMPLE
MODEL
DIMENSION BY (ID,GRP_ID)
MEASURES (SCORE,0 RANK,0 LAST_RANKED_GRP,0 ITEM_COUNT,0 HAS_RANK)
RULES
ITERATE (1000) UNTIL (ITERATION_NUMBER = ITEM_COUNT[1,1]) --ITERATE ONCE FOR EACH ITEM TO BE RANKED
(
RANK[ANY,ANY] = CASE WHEN SCORE[CV(),CV()] = MAX(SCORE) OVER (PARTITION BY HAS_RANK) THEN RANK() OVER (ORDER BY SCORE DESC,ID) ELSE RANK[CV(),CV()] END, --IF THE CURRENT ITEM SCORE IS EQUAL TO THE MAX SCORE OF UNRANKED, ASSIGN A RANK
LAST_RANKED_GRP[ANY,ANY] = FIRST_VALUE(GRP_ID) OVER (ORDER BY RANK DESC),
SCORE[ANY,ANY] = CASE WHEN RANK[CV(),CV()] = 0 AND CV(GRP_ID) = LAST_RANKED_GRP[CV(),CV()] THEN SCORE[CV(),CV()]+10 ELSE SCORE[CV(),CV()] END,
ITEM_COUNT[ANY,ANY] = COUNT(*) OVER (),
HAS_RANK[ANY,ANY] = CASE WHEN RANK[CV(),CV()] <> 0 THEN 1 ELSE 0 END --TO SEPARATE RANKED/UNRANKED ITEMS
)
ORDER BY RANK;
It's not very pretty, and I suspect there is a better way to go about this, but it does give the expected output.
Caveats:
You'd have to increase the iteration count in ITERATE (1000) if you have more than 1,000 rows.
This does a full re-ranking based on the score after each iteration. So if we took your sample data, but changed the initial score of item 2 to 95 rather than 90: after ranking item 1 and giving the 10 point bonus to item 2, it now has a score of 105. So we rank it as 1st and move item 1 down to 2nd. You'd have to make a few modifications if this is not the desired behavior.

SQL Server: RAND() range

I want to create random numbers between 1 and 99,999,999.
I am using the following code:
SELECT CAST(RAND() * 100000000 AS INT) AS [RandomNumber]
However, my results always have 7 or 8 digits; I have never seen a value lower than 1,000,000.
Is there any way to generate random numbers within a defined range?
RAND returns a pseudo-random float value from 0 through 1, exclusive.
So RAND() * 100000000 does exactly what you need. However, if every number between 1 and 99,999,999 is equally likely, then about 99% of the results will have 7 or 8 digits, simply because such numbers make up about 99% of the range:
+--------+-------------------+----------+------------+
| Length | Range | Count | Percent |
+--------+-------------------+----------+------------+
| 1 | 1-9 | 9 | 0.000009 |
| 2 | 10-99 | 90 | 0.000090 |
| 3 | 100-999 | 900 | 0.000900 |
| 4 | 1000-9999 | 9000 | 0.009000 |
| 5 | 10000-99999 | 90000 | 0.090000 |
| 6 | 100000-999999 | 900000 | 0.900000 |
| 7 | 1000000-9999999 | 9000000 | 9.000000 |
| 8 | 10000000-99999999 | 90000000 | 90.000001 |
+--------+-------------------+----------+------------+
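The Count column follows from a simple rule: there are 9 × 10^(k-1) integers with exactly k digits. Here is a short Python check of the table. `digit_length_counts` is an illustrative name.

```python
def digit_length_counts(upper):
    """How many integers in 1..upper have each number of digits."""
    counts = {}
    length = 1
    while 10 ** (length - 1) <= upper:
        lo = 10 ** (length - 1)
        hi = min(10 ** length - 1, upper)
        counts[length] = hi - lo + 1
        length += 1
    return counts


counts = digit_length_counts(99_999_999)
share_7_or_8 = (counts[7] + counts[8]) / sum(counts.values())
print(counts[8])               # 90000000
print(round(share_7_or_8, 4))  # 0.99
```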
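I created a function that might help. You need to pass RAND() to it as an argument, because SQL Server does not allow RAND() to be called inside a user-defined function.
CREATE FUNCTION [dbo].[RangedRand]
(
@MyRand float
,@Lower bigint = 0
,@Upper bigint = 999
)
RETURNS bigint
AS
BEGIN
-- @MyRand is expected to be a RAND() value in [0, 1).
-- FLOOR over (Upper - Lower + 1) slots gives every integer in [Lower, Upper]
-- the same probability; ROUND would give the two endpoints only half as much.
DECLARE @Random BIGINT
SELECT @Random = FLOOR((@Upper - @Lower + 1) * @MyRand + @Lower)
RETURN @Random
END
GO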
--Here is how it works.
--Create a test table for Random values
CREATE TABLE #MySample
(
RID INT IDENTITY(1,1) Primary Key
,MyValue bigint
)
GO
-- Let's use the function to populate the value column
INSERT INTO #MySample
(MyValue)
SELECT dbo.RangedRand(RAND(), 0, 100)
GO 1000
-- Let's look at what we get.
SELECT RID, MyValue
FROM #MySample
--ORDER BY MyValue -- Use this "Order By" to see the distribution of the random values
-- Let's use the function again to get a random row from the table
DECLARE @MyMAXID int
SELECT @MyMAXID = MAX(RID)
FROM #MySample
SELECT RID, MyValue
FROM #MySample
WHERE RID = dbo.RangedRand(RAND(), 1, @MyMAXID)
DROP TABLE #MySample
--I hope this helps.
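Why FLOOR rather than ROUND when mapping RAND() onto integers: ROUND((Upper - Lower) * rand + Lower, 0) gives the two endpoints only half the probability of the interior values. FLOOR((Upper - Lower + 1) * rand + Lower) spreads [0, 1) evenly over Upper - Lower + 1 integers. The Python sketch below checks this exactly. It uses evenly spaced fractions as a deterministic stand-in for RAND() values, and it emulates T-SQL's half-away-from-zero ROUND as floor(x + 1/2), which is an assumption that only holds for non-negative x.

```python
from collections import Counter
from fractions import Fraction
from math import floor


def tsql_round(x):
    """ROUND(x, 0) for non-negative x: halves round away from zero."""
    return floor(x + Fraction(1, 2))


def ranged_round(r, lower, upper):
    # ROUND((Upper - Lower) * r + Lower, 0)
    return tsql_round((upper - lower) * r + lower)


def ranged_floor(r, lower, upper):
    # FLOOR((Upper - Lower + 1) * r + Lower)
    return floor((upper - lower + 1) * r + lower)


N = 600
grid = [Fraction(i, N) for i in range(N)]  # evenly spaced stand-ins for RAND() in [0, 1)
print(sorted(Counter(ranged_round(r, 0, 3) for r in grid).items()))
# [(0, 100), (1, 200), (2, 200), (3, 100)]
print(sorted(Counter(ranged_floor(r, 0, 3) for r in grid).items()))
# [(0, 150), (1, 150), (2, 150), (3, 150)]
```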
