Finding the percentile of a calculated measure in PowerPivot / DAX

Below is a table similar to the data set I am working on (although much simpler). I would like to calculate some measures on it and then find the percentiles of those measures.
Table Name: Data
Owner AgeRating OtherRating
A 1 2
A 4 4
A 4 6
B 3 3
B 3 9
B 7 4
C 8 8
C 4 2
First, a little background: I start by averaging the ratings by owner, and then normalize by dividing each owner's average by the highest owner average. This creates the measure I would like to take the percentile of:
NormAgeRating=
average(Data[AgeRating])/
calculate(
maxx(
SUMMARIZE(Data,[Owner],"avg",average([AgeRating]))
,[avg]
)
,all(Data[owner])
)
I have a pivot table with Rows being the owner which then looks like
Owner NormAgeRating
A .5
B .72
C 1
Now for the question:
I would like to get the .33 percentile.inc of the new NormAgeRating. I would like to use this to classify each owner into groups (<=33%ile or > 33%ile)
This is what I am trying to get to:
Owner NormAgeRating 33%ile classification
A .5 .64 bottom
B .72 .64 top
C 1 .64 top
I have tried the following, along with many other variations using different group-bys, with no success; I continually get the wrong value:
33%ile=percentilex.inc(all(data[owner]),[NormAgeRating],0.33)
Any help would be greatly appreciated
Update:
When I try sumx countx and averagex in the form:
=
averagex(
SUMMARIZE(
all(Data[Owner]),
[Owner],
"risk",[NormAgeRating]),
[risk]
)
I am getting the right values, so I am not sure why using percentilex.inc/exc would produce the wrong values...

PERCENTILEX (and all iterator functions) operates row by row on the table in the first argument. Therefore, you need that table to be at the desired granularity before you try to compute the percentile, which means you need to summarize Data[Owner] so that you have a unique row per owner rather than iterating over the raw column.
Keeping this in mind, both measures can be written similarly:
NormAgeRating =
DIVIDE (
    AVERAGE ( Data[AgeRating] ),
    MAXX (
        SUMMARIZE (
            ALL ( Data[Owner] ),
            Data[Owner],
            "Avg", AVERAGE ( Data[AgeRating] )
        ),
        [Avg]
    )
)
33%ile =
PERCENTILEX.INC (
    SUMMARIZE (
        ALL ( Data[Owner] ),
        Data[Owner],
        "Risk", [NormAgeRating]
    ),
    [Risk],
    0.33
)
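As a sanity check on the expected numbers, the same calculation can be sketched outside DAX. The Python below is only an illustration, not part of the original answer: it builds one row per owner from the sample table, normalizes by the highest average, and applies an inclusive percentile with linear interpolation (the helper name percentile_inc is hypothetical and assumes the same interpolation rule as PERCENTILE.INC).

```python
from collections import defaultdict

# Sample rows from the question: (Owner, AgeRating)
rows = [("A", 1), ("A", 4), ("A", 4),
        ("B", 3), ("B", 3), ("B", 7),
        ("C", 8), ("C", 4)]


def percentile_inc(values, k):
    """Inclusive percentile with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = k * (len(ordered) - 1)          # zero-based fractional rank
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


# Bring the data to one row per owner before taking the percentile
by_owner = defaultdict(list)
for owner, rating in rows:
    by_owner[owner].append(rating)

avg = {o: sum(r) / len(r) for o, r in by_owner.items()}
top = max(avg.values())
norm = {o: a / top for o, a in avg.items()}   # A 0.5, B ~0.72, C 1.0

cutoff = percentile_inc(norm.values(), 0.33)
labels = {o: "bottom" if v <= cutoff else "top" for o, v in norm.items()}
```

With three owners the 0.33 cutoff falls between A and B at roughly 0.65, which is consistent with the classification the question is aiming for.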

Related

How to combine 6 tables in one Matrix, show top 12 and categorize the rest as others?

I need to be able to sum availability by product, show the top 3, and categorize the rest as Others. I have two tables in a matrix connected by a product table.
I have tried many approaches.
I was able to create this measure for July (which is what I will be sorting by), and I get the correct ranking column for July.
I know I'm missing something. I tried to take that ranking measure and add an IF statement, but couldn't get it to do the ranking.
The picture would make more sense (my formulas are based on actual column names).
Partner Ranking =
VAR summry =
SUMMARIZE (
ALLSELECTED ( Latest ),
[partner_group],
"Sum", COUNT ( Latest[site_url] )
)
VAR tmp =
ADDCOLUMNS ( summry, "RNK", RANKX ( summry, [Sum],, DESC, DENSE ) )
RETURN
MAXX (
FILTER ( tmp, [partner_group] = SELECTEDVALUE ( Latest[partner_group] ) ),
[RNK]
)
I don't know what to do next. How can I do this when the product name is in a separate table that links the two tables?

Calculating payback period using DAX

I'm working on some calculations for capital budgeting, and I have the following two tables in my data model
I'm trying to build out a calculated column in DAX to determine the payback period for each project in the Project table. I've put together the calculation here, I'm just not sure exactly how to execute this in DAX.
Logical Steps for Calculating Payback Period:
1. For each Project, find the cumulative sum for each date for relevant metrics (include OpEx Savings and OpEx Implementation Cost, but not Revenue or Working Capital)
2. Find the MIN date where the cumulative sum is greater than zero (the "break-even date")
3. Find the MIN date with non-zero implementation cost (the "investment date")
4. Find the difference (in months) between #2 and #3 to determine the payback period
EDIT:
The answer for the listed project is 7 months. I've built an intermediate table in Excel to develop the answer, but I'd like to be able to do this directly in a PowerPivot table with DAX.
I've produced this as a solution:
1. Create values, making sure costs are negative and savings are positive (ValCorr)
2. Create a running sum (RunningSum)
3. Find the investment date (InvestmentDate)
4. Find the break-even date (BreakEvenDate)
5. Find the difference (Payback)
DAX:
RunningSum =
CALCULATE (
    SUM ( Impacts[ValCorr] );
    FILTER (
        ALL ( Impacts );
        Impacts[ProjectID] = EARLIER ( Impacts[ProjectID] )
            && Impacts[Date] <= EARLIER ( Impacts[Date] )
    )
)
InvestmentDate =
CALCULATE (
    FIRSTNONBLANK ( Impacts[Date]; 0 );
    FILTER ( ALL ( Impacts ); Impacts[RunningSum] <> 0 )
)
BreakEvenDate =
CALCULATE (
    FIRSTNONBLANK ( Impacts[Date]; 0 );
    FILTER ( ALL ( Impacts ); Impacts[RunningSum] > 0 )
)
Payback = DATEDIFF ( Impacts[InvestmentDate]; Impacts[BreakEvenDate]; MONTH )
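The logic of these columns can be checked on a small example. The Python sketch below is an illustration only: the monthly figures are invented (they are not the asker's data), and it covers a single project, with costs negative and savings positive as in ValCorr.

```python
from datetime import date
from itertools import accumulate

# Hypothetical signed impacts for one project: a 600 cost in January,
# followed by savings of 100 per month from February to September
impacts = [(date(2020, 1, 1), -600)]
impacts += [(date(2020, m, 1), 100) for m in range(2, 10)]
impacts.sort()

# Running sum by date (the RunningSum step)
running = list(accumulate(value for _, value in impacts))

# First date with a cost (the investment date)
investment_date = next(d for d, value in impacts if value < 0)

# First date where the running sum turns positive (the break-even date)
break_even_date = next(
    d for (d, _), total in zip(impacts, running) if total > 0
)

# Whole months between the two dates, counted by month boundaries
payback_months = (
    (break_even_date.year - investment_date.year) * 12
    + (break_even_date.month - investment_date.month)
)
```

In this invented series the running sum reaches exactly zero in July and first becomes positive in August, so the strict "greater than zero" test puts break-even seven months after the investment.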
Result:
Good luck!
After a fair amount of trial and error, I came up with a solution.
Step 1: Build out a helper metrics table. This serves 2 purposes: (a) it excludes irrelevant metrics (like revenue), and (b) it ensures costs are negative and savings are positive.
Metrics Table
Step 2: Build 2 helper measures that will go into the virtual, summarized, intermediate table.
CumulativeTotalMetric :=
CALCULATE (
    SUMX (
        Impact,
        Impact[Latest Estimate Monthly Values]
            * RELATED ( BaseMetrics[Payback Period Multiplier] )
    ),
    FILTER ( ALL ( Impact[Month] ), Impact[Month] <= MAX ( Impact[Month] ) )
)
TotalMetric :=
SUMX (
    Impact,
    Impact[Latest Estimate Monthly Values]
        * RELATED ( BaseMetrics[Payback Period Multiplier] )
)
Step 3: Create the final measure that creates the virtual table (BaseTable), and performs logical operations on it to arrive at the final payback period.
Payback Period (Years) :=
VAR BaseTable =
    ADDCOLUMNS (
        SUMMARIZE ( Impact, Impact[initiative #], Impact[snapshot], Impact[Month] ),
        "Cumulative Total Impact", CALCULATE ( [CumulativeTotalMetric] ),
        "Total Impact", CALCULATE ( [TotalMetric] )
    )
VAR LastCumulativeLossDate =
    MAXX ( FILTER ( BaseTable, [Cumulative Total Impact] < 0 ), [Month] )
VAR BreakEvenDate =
    MINX (
        FILTER (
            BaseTable,
            [Month] > LastCumulativeLossDate
                && [Cumulative Total Impact] > 0
        ),
        [Month]
    )
VAR InitialInvestmentDate =
    MINX ( FILTER ( BaseTable, [Total Impact] < 0 ), [Month] )
RETURN
    IF (
        OR ( ISBLANK ( InitialInvestmentDate ), ISBLANK ( BreakEvenDate ) ),
        BLANK (),
        ( BreakEvenDate - InitialInvestmentDate )
            / 365
    )
This last measure is pretty complicated. It uses progressive, dependent variables: it starts with the same base table and defines variables that are used in subsequent variables. I'm no DAX expert, but I suspect using these variables helps with calculation efficiency.
EDIT: I should note that I didn't use this measure as a calculated column -- I simply used it in a pivot table which is the same "shape" as the "Projects" table above -- one line per project / initiative.

DAX Average at different grain

Ok, highly simplified table of three columns, order#, product#, and quantity...
Order | Product | Qty
1 | A | 10
1 | B | 20
2 | C | 30
I want to calculate an average of quantity, so.. this is at the "default grain":
AvgQty = 60/3 = 20
Easy. However, I then also want to remove Product:
Order | Qty
1 | 30
2 | 30
and now the Qty should re-aggregate [as they would with a sum()], and now I would want AvgQty to return the average of these new lines...
AvgQty = 60/2 = 30
I tried to do this by grouping by Order explicitly, like so:
measure :=
IF (
ISFILTERED ( 'Table'[Product] ),
AVERAGEX (
SUMMARIZE (
'Table',
'Table'[Order],
'Table'[Product],
"SumQty", SUM ( 'Table'[Qty] )
),
[SumQty]
),
AVERAGEX (
SUMMARIZE (
'Table',
'Table'[Order],
"SumQty", SUM ( 'Table'[Qty] ) ),
[SumQty]
)
)
It doesn't quite work because the total row is technically not filtered by Product, so it still shows the incorrect total...
I am not certain how to override this..?
My actual calc is not just a simple average, but the main problem I am facing is ensuring I can get a 'recalculation' of the Qty at a new grain.. if I can nail this, I can fix my own problem.. the solution could well be to also load the table to the model at the order grain too!!! ;)
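To make the target numbers concrete, the re-aggregation described above can be sketched in Python. This only illustrates the desired arithmetic at each grain (the helper avg_at_grain is hypothetical); it says nothing about how a DAX measure would detect which grain a pivot table is displaying.

```python
from collections import defaultdict

# Rows from the example: (Order, Product, Qty)
rows = [(1, "A", 10), (1, "B", 20), (2, "C", 30)]


def avg_at_grain(rows, key):
    """Sum Qty per group defined by key, then average the group totals."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return sum(totals.values()) / len(totals)


# Default grain: one group per Order/Product line -> 60 / 3
avg_by_line = avg_at_grain(rows, key=lambda r: (r[0], r[1]))

# Order grain: Qty is re-aggregated per Order first -> 60 / 2
avg_by_order = avg_at_grain(rows, key=lambda r: r[0])
```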
I have also been thinking about this over the last few days, and I am afraid there is no way to solve it, for the following reasons:
- There is no function in DAX that returns the whole table that was calculated as your rows.
- There is no function that tells you what was aggregated there.
- For a single row you could find out what was filtered using complex cascading ISFILTERED functions, but this is neither really feasible nor reliable.
- The biggest problem: when you are on the total or sub-total level, there is no way to find out what was used for the detail rows, as none of the existing functions like ISFILTERED, HASONEVALUE, etc. would work.
So, from my point of view, this cannot be solved in DAX at the moment.
If you are using MDX to query your model (e.g. from a Pivot Table), you could create an MDX measure that uses the AXIS() function to return the set that was used on rows/columns and use it in a COUNT() function.

PowerPivot LOOKUPVALUE

I'd appreciate any pointers on this, I need to look up in PP a value based on a range in another PP Table.
I want to return 'BAND' based on where Revenue in the first table falls between High and Low Band Values in the Band Table.
=LOOKUPVALUE(Band[Band],Band[Low],>=[Revenue],Band[High],<=[Revenue])
The Band Table is set up as
Band 0-100 Low 0 High 100
Band 101-200 Low 101 High 200
etc
I've also tried this...
=FILTER(Band[Band],[Revenue]>=Band[Low],[Revenue]<=Band[High])
Thanks for your help.
Gav
LOOKUPVALUE doesn't support conditional evaluations. Instead, you can use the FILTER and FIRSTNONBLANK functions to get the right Band[Band].
Create a calculated column in the Combined table using this expression:
LookupBand =
CALCULATE (
    FIRSTNONBLANK ( Band[Band], 0 ),
    FILTER (
        Band,
        [Low] <= EARLIER ( Combined[Revenue] )
            && [High] >= EARLIER ( Combined[Revenue] )
    )
)
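The range test in that FILTER can be illustrated with a short Python sketch. It assumes the two bands listed in the question, and the function name lookup_band is hypothetical; the point is only that a revenue value matches a band when it lies between Low and High inclusive.

```python
# Bands from the question: (Band, Low, High)
bands = [("0-100", 0, 100), ("101-200", 101, 200)]


def lookup_band(revenue, bands):
    """Return the first band whose Low..High range contains revenue."""
    for name, low, high in bands:
        if low <= revenue <= high:
            return name
    return None  # no band covers this value


first = lookup_band(50, bands)
second = lookup_band(150, bands)
missing = lookup_band(250, bands)
```

A value outside every range returns nothing, much as the calculated column would return blank when no Band row passes the filter.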

DAX [SSAS Tabular] DISTINCTCOUNT with lastnonempty date

I am analysing product distribution data.
We want a snapshot measurement of market penetration [Depth].
Depth := DIVIDE (
[Count of ranged product/store distribution points],
[Count of audited product/store distribution points]
)
I have 2 tables:
'Distribution' (audited distribution points); and
'Calendar' (a date table).
They are related in the model.
For a snapshot at 30 June 2015, I don't want to include new product/stores from November 15, but I do want to include any data points that were audited in the prior 3 months.
Logically the numerator is a filtered subset of the denominator so I need the denominator first, which is where I am stuck.
If I just do a basic distinctcount without any fancy code I get this.
Month, Denominator
Mar-15, 1
Apr-15, 0
May-15, 0
Jun-15, 2
Jul-15, 6
Aug-15, 5
Sep-15, 1
Oct-15, 40
Nov-15, 53
Dec-15, 92
But I want something that looks like this:
Month, Denominator
Mar-15, 150
Apr-15, 150
May-15, 150
Jun-15, 170 -- add 1 new product in 20 stores
Jul-15, 170
Aug-15, 170
Sep-15, 170
Oct-15, 200 -- add 1 New product in 30 stores
Nov-15, 200
Dec-15, 200
I need to drop the filter context on Distribution[Date] and apply a filter on Distribution of Distribution[Date] <= Calendar[Date] and then do the distinct count but I get errors.
Count of audited product/store distribution points:=
CALCULATE(
COUNTROWS (
VALUES ( Distribution[ProductStoreKey])
),
NOT ( Distribution[Status On Exit] = "Ineligible" ),
FILTER (
ALL ( Distribution[Date] ),
Distribution[Date] <= Calendar[Date]
)
)
ERROR:
The value for column 'Date' in table 'Distribution' cannot be determined in
the current context. Check that all columns referenced in the calculation
expression exist, and that there are no circular dependencies. This can also
occur when the formula for a measure refers directly to a column without
performing any aggregation--such as sum, average, or count--on that column.
The column does not have a single value; it has many values, one for each
row of the table, and no row has been specified.
It might be a matter of error-proofing the filter using HASONEVALUE. I'm not sure.
Among other ideas, I've tried rewriting the filter but this doesn't work either.
FILTER (
Distribution,
Distribution[Date] <=
CALCULATE (
MAX(Distribution[Date]),
Distribution[Date]<=Calendar[Date]
)
)
Error:
The expression contains multiple columns, but only a single column can be
used in a True/False expression that is used as a table filter expression.
This code gets the distribution date of the last data point at a given Calendar[Date], but I can't figure out how to incorporate it.
Last Ever Dist Date:=
CALCULATE (
LASTDATE( Distribution[DATE] ),
DATESBETWEEN(
Calendar[date],
BLANK(),
LASTDATE(Calendar[Date])
),
ALL(Calendar)
)
How about:
Count of audited product/store distribution points:=
CALCULATE (
    DISTINCTCOUNT ( Distribution[ProductStoreKey] ),
    DATESBETWEEN (
        Calendar[date],
        BLANK (),
        LASTDATE ( Calendar[Date] )
    ),
    ALL ( Calendar )
)
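The effect of opening the date range from the beginning of time up to the last date in context can be sketched in Python. The audit rows below are invented for illustration and the function name is hypothetical; the sketch only shows why a cumulative distinct count never drops in months with few or no new audits.

```python
from datetime import date

# Hypothetical audit rows: (audit date, ProductStoreKey)
audits = [
    (date(2015, 3, 10), "P1-S1"),
    (date(2015, 3, 12), "P1-S2"),
    (date(2015, 4, 5), "P1-S1"),    # re-audit of an existing point
    (date(2015, 6, 20), "P2-S1"),   # new product/store point
]


def distinct_points_up_to(audits, cutoff):
    """Count distinct keys audited on or before the cutoff date."""
    return len({key for audit_date, key in audits if audit_date <= cutoff})


march = distinct_points_up_to(audits, date(2015, 3, 31))
may = distinct_points_up_to(audits, date(2015, 5, 31))
june = distinct_points_up_to(audits, date(2015, 6, 30))
```

Here March and May both give 2, and June rises to 3 once the new point is audited, which is the stepwise shape the desired output table shows.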
