Difference vs Previous Row & DistinctCount - powerquery

I want to calculate a Delta Weeks column in Power Query: WeekNum[current row] - WeekNum[previous row].
I found a way to do it using the [Index] column, but it is painfully slow, and my table is 100k rows.
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer", type text}, {"Product", type text}, {"WeekNum", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Delta Weeks", each try Source[WeekNum]{[Index]} - Source[WeekNum]{[Index]-1} otherwise 0)
in
#"Added Custom"
Also, after this, I need another column that counts the distinct values from the beginning up to that row.
Most of the weeks are consecutive, so basically the distinct count will increase when they are not.
(I don't know how to do this in Power Query).

I believe PQ wasn't designed for working with previous row context.
What I found works better than referencing the previous row with [Index]-1 is creating two index columns (one running 0, 1, 2, ... and the other 0, 0, 1, 2, ..., so effectively [Index]-1 floored at 0) and then joining the two tables on them, which puts the previous row's values on the same row as the current one.
However, even that was too slow for me. In the end I took a different approach: a bit of VBA code calculates the difference from the previous row, and I then import the table into PQ. I think this is a more efficient (and considerably faster) approach!
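The index-and-join idea described above can be sketched roughly as follows. This is a variant, not the exact setup from the answer: instead of a second index floored at 0, it numbers a copy of the WeekNum column from 1 so that each row joins to its predecessor. The inline sample table and the names PrevIndex and PrevWeekNum are assumptions made for illustration.

```powerquery
let
    // Illustrative sample data; replace with the real Table3 source
    Source = #table(
        type table [Customer = text, Product = text, WeekNum = Int64.Type],
        {{"A", "X", 1}, {"A", "X", 2}, {"A", "X", 5}, {"A", "X", 6}}
    ),
    // Current rows numbered 0, 1, 2, ...
    Current = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Copy of WeekNum numbered 1, 2, 3, ... so row n lines up with row n - 1
    Previous = Table.AddIndexColumn(
        Table.RenameColumns(Table.SelectColumns(Source, {"WeekNum"}), {{"WeekNum", "PrevWeekNum"}}),
        "PrevIndex", 1, 1
    ),
    Joined = Table.NestedJoin(Current, {"Index"}, Previous, {"PrevIndex"}, "Prev", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "Prev", {"PrevWeekNum"}),
    // A join does not guarantee row order, so restore it explicitly
    Sorted = Table.Sort(Expanded, {{"Index", Order.Ascending}}),
    // The first row has no predecessor, so its delta is set to 0
    WithDelta = Table.AddColumn(Sorted, "Delta Weeks",
        each if [PrevWeekNum] = null then 0 else [WeekNum] - [PrevWeekNum], Int64.Type),
    Result = Table.RemoveColumns(WithDelta, {"Index", "PrevWeekNum"})
in
    Result
```

For the sample rows, the Delta Weeks column comes out as 0, 1, 3, 1.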

try/otherwise might be pretty slow. Is it any faster if you use if [Index] > 0 then Source[WeekNum]{[Index]} - Source[WeekNum]{[Index] - 1} else 0 for the custom formula?

Instead of your try ... otherwise code, I would use something more direct, like:
[WeekNum] - #"Added Index"[WeekNum]{[Index] - 1}
Then I would add a Replace Errors step to clean up the first row.
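Another way to avoid per-row lookups altogether is to shift the whole WeekNum list down by one position and subtract the two lists pairwise. The sketch below is not taken from any of the answers above; the inline sample table and the step names are assumptions for illustration.

```powerquery
let
    // Illustrative sample data; replace with the real Table3 source
    Source = #table(
        type table [Customer = text, Product = text, WeekNum = Int64.Type],
        {{"A", "X", 1}, {"A", "X", 2}, {"A", "X", 5}, {"A", "X", 6}}
    ),
    // Buffer the column once so it is held in memory as a plain list
    Weeks = List.Buffer(Source[WeekNum]),
    // Shift down by one: the first row gets null as its "previous" value
    PrevWeeks = {null} & List.RemoveLastN(Weeks, 1),
    // Pair current and previous values; the first row's delta is set to 0
    Deltas = List.Transform(
        List.Zip({Weeks, PrevWeeks}),
        each if _{1} = null then 0 else _{0} - _{1}
    ),
    // Rebuild the table with the extra column appended
    Result = Table.FromColumns(
        Table.ToColumns(Source) & {Deltas},
        Table.ColumnNames(Source) & {"Delta Weeks"}
    )
in
    Result
```

Note that Table.FromColumns returns untyped columns when given only names, so a Table.TransformColumnTypes step would normally follow. Whether this is fast enough for 100k rows would need to be measured on the real data.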

Related

Power Query M - Group by Column Value with Custom Aggregation (Percentile)

I am trying to calculate percentiles by group (from column values ex: hours by department, sales by region, etc.) within power query. This same logic could be used for other custom group aggregation. After lots of searching, I found 2 potential approaches.
Approach 1:
An archived article looked to have the perfect answer. Nothing else I could find comes close.
The solution from there is the following custom function:
//PercentileInclusive Function
(inputSeries as list, percentile as number) =>
let
SeriesCount = List.Count(inputSeries),
PercentileRank = percentile * (SeriesCount - 1) + 1, //percentile value between 0 and 1
PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
Percentile1 = List.Max(List.MinN(inputSeries, PercentileRankRoundedDown)),
Percentile2 = List.Max(List.MinN(inputSeries, PercentileRankRoundedUp)),
PercentileInclusive = Percentile1 + (Percentile2 - Percentile1) * (PercentileRank - PercentileRankRoundedDown)
in
PercentileInclusive
Combined with a step in your table to group appropriately and use the function:
=Table.Group(TableName, {"Grouping Column"}, {{"New Column name", each
PercentileInclusive(TableName[Column to calculate Percentile of], percentile # between 0 and 1)}})
[edited to correct the typo Ron R. pointed out and remove unnecessary detail]
Example input:
Pen Type      Units Sold
Ball-Point    6,109
Ball-Point    3,085
Ball-Point    1,970
Ball-Point    8,190
Ball-Point    6,006
Ball-Point    2,671
Ball-Point    6,875
Roller        778
Roller        9,329
Roller        7,781
Roller        4,182
Roller        2,016
Roller        5,785
Roller        1,411
Desired output for a 25% inclusive percentile grouped by Pen Type:
Pen Type      0.25 Inclusive Percentile (Correct)
Ball-Point    2,878
Roller        1,714
Notes: No decimals shown above, calculated with Excel's PERCENTILE.INC function.
Approach 1 works great.
Approach 2:
Here is an alternate Power Query solution I tried. It is a single step with no custom function. It seems like it should do the trick, but I can't figure out a way to make the conditional check be row based. Something needs to go where I have //Condition// that tells it which rows belong in the current rows group, but no matter what I try it does not work. It either breaks, or gives a percentile for everything, ignoring the grouping.
=List.Percentile(Table.Column(Table.SelectRows(#"Previous Step Name", //Condition//), "Column to calculate percentile of"), percentile # 0 to 1)
Any ideas how to make approach 2 work?
It appears your Table.Group function is incorrectly specified.
Where my previous step was #"Changed Type", the following works:
#"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
{"Percentile", each fnPercentileINC([Units Sold],0.25)}})
Original Data
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Pen Type", type text}, {"Units Sold", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
{"Percentile", each fnPercentileINC([Units Sold],0.25), type number}})
in
#"Grouped Rows"
Result
Edit:
For your approach #2, without a custom function, you can simply use List.Percentile as an aggregation in the Table.Group function:
#"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
{"25th Percentile", each List.Percentile([Units Sold],0.25)}
})
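To check the grouping against the question's sample data, here is a self-contained sketch that builds the example input inline and applies List.Percentile per group. The inline #table stands in for the Excel source, and the PercentileMode option is spelled out explicitly so the Excel-compatible inclusive method is not left to a default.

```powerquery
let
    // Sample data copied from the question's example input
    Source = #table(
        type table [#"Pen Type" = text, #"Units Sold" = Int64.Type],
        {
            {"Ball-Point", 6109}, {"Ball-Point", 3085}, {"Ball-Point", 1970},
            {"Ball-Point", 8190}, {"Ball-Point", 6006}, {"Ball-Point", 2671},
            {"Ball-Point", 6875},
            {"Roller", 778}, {"Roller", 9329}, {"Roller", 7781},
            {"Roller", 4182}, {"Roller", 2016}, {"Roller", 5785},
            {"Roller", 1411}
        }
    ),
    // Inside Table.Group, [Units Sold] refers to the current group's rows only
    Grouped = Table.Group(Source, {"Pen Type"}, {
        {"25th Percentile",
            each List.Percentile([Units Sold], 0.25, [PercentileMode = PercentileMode.ExcelInc]),
            type number}
    })
in
    Grouped
```

Working the inclusive method by hand for Ball-Point: with 7 values the rank is 0.25 * (7 - 1) + 1 = 2.5, which falls halfway between the 2nd and 3rd smallest values (2,671 and 3,085), giving 2,671 + 0.5 * 414 = 2,878. The same steps for Roller give 1,411 + 0.5 * 605 = 1,713.5, which displays as 1,714 without decimals.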

Bring Value with Sumifs in Power Query language to specified row and column (location)

Next step? In Excel I have used SUMIFS, and many SUMIF formulas against another workbook, to bring information into an exact row and column of a workbook. Now I want to do the same in the Power Query language. I can bring in two values when a condition is met, but it is unclear how to bring the total sum into a single row in the Excel workbook. Can anyone show me the path? I guess I will need the Data Model...
= Table.AddColumn(#"Changed Type", "Sumif", each if [Column2] =2 or [Column2]=1 then [Column3]+[Column4] else 0)
let
Source = Folder.Files...
#"C:\Users...
#"Imported Excel" = Excel.Workbook(#"C:\...
SegPL_Chart = #"Imported Excel"{[Name="SegPL_Chart"]}[Data],
#"Removed Top Rows" = Table.Skip(SegPL_Chart,12),
#"Removed Alternate Rows" = Table.AlternateRows(#"Removed Top Rows",1,1,90),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Alternate Rows"),
#"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Col1]="1" or [Col1]="2")),
#"Table Group = Table.Group(#"Filtered Rows", {}, List.TransformMany(Table.ColumnNames(#"Filtered Rows",(x)=>{each if x = "Names" then "Totals" else List.Sum(Table.Column(_,x))},(x,y)=>{x,y})),
#"append" = Table.Combine({#"Filtered Rows",#"Table Group"})
in
#"append"
It gives an error at "in": Token comma needed..? What else do I need to do to bring in a totals row?
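The reported error is consistent with two syntax slips visible in the query above: the step name #"Table Group is missing its closing quote, and Table.ColumnNames(#"Filtered Rows" is missing its closing parenthesis. A minimal sketch of the same totals-row idea with those fixed is shown below. The inline sample table and its Jan and Feb columns are assumptions for illustration, and List.Transform is swapped in for List.TransformMany to keep the aggregation list easier to read.

```powerquery
let
    // Illustrative sample data standing in for the filtered rows
    Source = #table(
        type table [Names = text, Jan = number, Feb = number],
        {{"A", 1, 2}, {"B", 3, 4}}
    ),
    // Group on no key columns, so the whole table is one group.
    // Each column gets an aggregation: a label for Names, a sum otherwise.
    TableGroup = Table.Group(
        Source,
        {},
        List.Transform(
            Table.ColumnNames(Source),
            (x) => {x, each if x = "Names" then "Totals" else List.Sum(Table.Column(_, x))}
        )
    ),
    // Append the single totals row under the original rows
    Appended = Table.Combine({Source, TableGroup})
in
    Appended
```

For the sample rows, the appended last row is Totals, 4, 6.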
You can use several steps to create several helper columns with intermediate results of conditional sums. Then you can create a new column that sums all the intermediate results, and then delete the helper columns.
Keep in mind that unlike Excel, the calculations in Power Query always return constants and you can then delete calculated columns you no longer need. So,
Create helper column 1 with complicated IF and Sum scenario
Create helper column 2 with complicated IF and Sum scenario
Create total column to add column 1 + column 2
Delete helper columns and keep only the total column
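The four steps above can be sketched as follows. The conditions, the sample rows, and the Helper1, Helper2, and Total names are placeholders for illustration; only Column2, Column3, and Column4 come from the question's formula.

```powerquery
let
    // Illustrative sample data
    Source = #table(
        type table [Column2 = Int64.Type, Column3 = number, Column4 = number],
        {{1, 10, 100}, {2, 20, 200}, {3, 30, 300}}
    ),
    // Helper column 1: first conditional value
    Helper1 = Table.AddColumn(Source, "Helper1",
        each if [Column2] = 1 then [Column3] else 0, type number),
    // Helper column 2: second conditional value
    Helper2 = Table.AddColumn(Helper1, "Helper2",
        each if [Column2] = 2 then [Column4] else 0, type number),
    // Total column adds the two intermediate results
    Total = Table.AddColumn(Helper2, "Total", each [Helper1] + [Helper2], type number),
    // The helpers can be removed; Total keeps its computed values
    Result = Table.RemoveColumns(Total, {"Helper1", "Helper2"})
in
    Result
```

For the sample rows, Total comes out as 10, 200, 0.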
This DAX formula in Power Pivot gives me exactly the result I was looking for:
=SUMX(FILTER('TableName',[ColName] = 1),'TableName'[ColName2])
So I would be glad to convert it to a Power Query formula.
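A rough M counterpart to that DAX expression filters the rows first and then sums one column of what remains. The sample table here is hypothetical; ColName and ColName2 simply mirror the names in the DAX formula.

```powerquery
let
    // Hypothetical sample table
    Source = #table(
        type table [ColName = Int64.Type, ColName2 = number],
        {{1, 10}, {2, 20}, {1, 5}}
    ),
    // FILTER('TableName', [ColName] = 1)
    Filtered = Table.SelectRows(Source, each [ColName] = 1),
    // SUMX(..., [ColName2]): sum the column over the filtered rows
    Total = List.Sum(Filtered[ColName2])
in
    Total
```

For the sample rows this evaluates to 15. The result is a single value, not a table, so it would still need to be placed into a row, for example with the Table.Group approach.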

Power Query: Selecting multiple elements in 'value field settings' to measure a specifc field

I'm trying to create a measure that averages the 29 elements' value [Overtime/Hours_worked] into one cell, visualised by the attached image.
Cell F32 currently shows [AverageA Total Overtime/Total Hours_worked] but I want it to be an average of the 29 rows' values as displayed in cell H32, =AVERAGEA(F3:F31).
The elements' figures are based upon underlying data from Data$, currently amounting to ~150k rows. When creating a measure that's averaging the elements' values from Column E [AverageA Overtime/Hours_worked] and showing as a % of the 29 elements' aggregate %, I'm running into the problem of averaging the separate elements' underlying data taken from Data$. Worth noting is that F3:F31 is redundant in this instance; I'm looking for the average of the 29 elements' values in column E and not their respective averages shown in column F.
Am I right to use measure here or is there a better way to approach it? If measures can be used, is there a way to design the measure so that it refers to the Pivot Table's shown data instead of the underlying data taken from Data$? For instance by designing the measure to refer to column E in the pivot table?
Side note
The table needs to remain dynamic since Data$ is being updated regularly. I'm relatively new to Power Query so I'm not sure if there are other ways to solve this, i.e. through MDX, but I doubt I'll be able to sort that out myself.
Any and all help is appreciated, thanks.
I'm not sure how you are computing the individual entries in the AverageA Total Overtime/Total Hours_worked (so I left it blank), but to compute the totals and averages for the other columns, you can use the Table.Group command in a special way with an empty list for the key (so as to return the entire table for the Aggregation operations).
Given:
M Code
Read the comments in the code to understand the algorithm.
If your overtime % column is already in your original data, you can just delete the code lines that add that column.
let
//be sure to change table name in next line to your actual table name
Source = Excel.CurrentWorkbook(){[Name="wrkTbl"]}[Content],
//set data types
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Area", type text}, {"Hours_worked", Int64.Type}, {"Overtime", Int64.Type}}),
//Add the percent overtime column
#"Added Custom" = Table.AddColumn(#"Changed Type", "Overtime/Hours_worked",
each [Overtime]/[Hours_worked], Percentage.Type),
//Use Table.Group to compute total/averages for the last row
//Be sure to use the exact same names as used in the original table
#"Grouped Rows" = Table.Group(#"Added Custom", {}, {
{"Area", each "Totals",type text},
{"Hours_worked", each List.Sum([Hours_worked]), Int64.Type},
{"Overtime", each List.Sum([Overtime]), Int64.Type},
{"Overtime/Hours_worked", each List.Sum([Overtime])/List.Sum([Hours_worked]), Percentage.Type},
{"AverageA Overtime/Hours_worked", each List.Average([#"Overtime/Hours_worked"]), Percentage.Type}
}),
//Append the two tables to add the Totals row
append= Table.Combine({ #"Added Custom", #"Grouped Rows"})
in
append
results in =>

In power query, how to turn a number into a duration in seconds?

I am trying, in Power BI, to change the type of one of my columns. The column contains numbers, and I am trying to turn each number into a duration in seconds. But whenever I use the default type change, it treats the number as days.
= Table.TransformColumnTypes(#"Changed Type",{{"Duration", type duration}})
is the default. I've tried putting duration(seconds) or duration.seconds, but it didn't work.
I've looked around for a solution, but all I get are DAX solutions. I couldn't find much about power query in general.
Thanks for the help
I believe this does what you want.
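If you start with a column called Seconds, you can add a column with = Duration.From([Seconds]/86400). Duration.From interprets a number as a count of days, hence the division by 86400, the number of seconds in a day.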
Alternatively, you could use:
= Table.ReplaceValue(Source, each [Seconds],each Duration.From([Seconds]/86400),Replacer.ReplaceValue,{"Seconds"})
to convert the Seconds column in place instead of adding a new one.
Here's the M code for the two different options:
Adding a column:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Duration", each Duration.From([Seconds]/86400))
in
#"Added Custom"
Directly changing:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Replaced Value" = Table.ReplaceValue(Source, each [Seconds],each Duration.From([Seconds]/86400),Replacer.ReplaceValue,{"Seconds"})
in
#"Replaced Value"
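A further option, not part of the answer above, is to build the duration from its parts with the #duration(days, hours, minutes, seconds) constructor, which avoids the division by 86400. The sketch below assumes a numeric Seconds column and uses an inline sample table in place of the workbook source.

```powerquery
let
    // Illustrative sample data: durations expressed as seconds
    Source = #table(type table [Seconds = number], {{30}, {90}, {3600}}),
    // Convert each number to a duration of that many seconds, in place
    Converted = Table.TransformColumns(
        Source,
        {{"Seconds", each #duration(0, 0, 0, _), type duration}}
    )
in
    Converted
```

Table.TransformColumns also sets the column type to duration in the same step, so no separate type change is needed.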

Populate conditional column depending on column name criteria

I receive a weekly report which contains some repetition of columns. This is because it is drawn from a collection of web forms which ask similar questions to each other - let's say they all ask "Do you want to join our email list?" - but this question is stored in the source system as a separate field for each form (each form is effectively a separate table). The columns will always be consistently named - e.g. "Email_optin_1", "Email_optin_2" - so I can come up with rules to identify the columns which ask the email question. However, the number of columns may vary from week to week - one week the report might just contain "Email_optin_2", the next week it might include four such columns. (This depends on which web-forms have been used in that week). The possible values are the same in all these columns - let's say "Yes" and "No".
Each row should normally only have one of the "Email_optin" columns populated.
What I would like to do is create a single column in Power Query called "Email_Optin_FINAL", which would return "Yes" if ANY columns beginning with "Email_optin" contain a value of "Yes".
So, basically, instead of the criteria simply referring to the values in specific columns, what I would like it to do is first of all figure out which columns it needs to be looking at, and then look at the values in those columns.
Is this possible in PowerQuery?
Thanks in advance for any advice!
This finds all the columns containing Email_optin, merges them into a new column, and removes the original columns:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
EmailList= List.Select(Table.ColumnNames(Source), each Text.Contains(_, "Email_optin")),
#"Merged Columns" = Table.CombineColumns(Source,EmailList,Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged")
in
#"Merged Columns"
This finds all the columns containing Email_optin, merges them into a new column, and preserves the original columns:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Index= Table.AddIndexColumn(Source, "Index", 0, 1),
EmailList= List.Select(Table.ColumnNames(Index), each Text.Contains(_, "Email_optin")),
Merged = Table.CombineColumns(Index,EmailList,Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Merged Queries" = Table.NestedJoin(Index,{"Index"},Merged,{"Index"},"Merged",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Merged", {"Merged"}, {"Merged"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Table2",{"Index"})
in
#"Removed Columns"
You can then filter the merged answers for "Yes" if you want.
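If the goal is the Email_Optin_FINAL column itself, not a merged text column, a direct check per row may be simpler. The sketch below is an alternative to the merge approach, not a restatement of it. The sample table is illustrative, and returning "No" when no column holds "Yes" is an assumption, since the question only specifies the "Yes" case.

```powerquery
let
    // Illustrative sample data with a varying set of Email_optin columns
    Source = #table(
        {"Name", "Email_optin_1", "Email_optin_2"},
        {{"Ann", "Yes", null}, {"Bob", null, "No"}, {"Cy", null, null}}
    ),
    // Work out which columns to inspect from their names
    OptinColumns = List.Select(
        Table.ColumnNames(Source),
        each Text.StartsWith(_, "Email_optin")
    ),
    // For each row, look only at those columns and test for any "Yes"
    WithFinal = Table.AddColumn(Source, "Email_Optin_FINAL",
        each if List.Contains(
                Record.FieldValues(Record.SelectFields(_, OptinColumns)), "Yes")
             then "Yes" else "No",
        type text)
in
    WithFinal
```

For the sample rows, Email_Optin_FINAL is Yes, No, No. Because the column list is computed from the table on every refresh, the query keeps working when the number of Email_optin columns changes from week to week.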
