Remove duplicates in Power query editor


I have the data below:
Con                            Payment Status   Count
HUMANABRATTEN,MICOL9/20/2021   Resubmitted      15
HUMANABRATTEN,MICOL9/20/2021   In-Process       1
The two Con values are exactly the same, but when I try to remove duplicates, Power Query always removes the "Resubmitted" row, whereas I want to keep the Payment Status with the higher Count.
Normally in Excel, when we remove duplicates from any data, it always keeps the first value and removes the second. I don't know why it's not working that way in Power Query.

Power Query does not necessarily return results in the order you might expect. Even the sorts are unstable, if I recall correctly.
For your problem, one solution would be to use Table.Group and then extract the desired results. In your case that seems to be the Max Count, and the Payment Status that is in the same row as the Max Count.
eg:
let
    Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Con", type text}, {"Payment Status", type text}, {"Count", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Con"}, {
        //Return the Payment Status cell that is in the same row as Max Count
        {"Payment Status", each _[Payment Status]{List.PositionOf(_[Count],List.Max(_[Count]))}},
        //determine the Max Count
        {"Count", each List.Max([Count]), type nullable number}})
in
    #"Grouped Rows"
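As a variation on the grouping above, here is a minimal sketch that keeps the whole highest-Count row per Con with Table.Max, instead of looking up the position of the maximum. The inline #table stands in for the workbook table and the step names are only illustrative.

```powerquery
let
    // Inline stand-in for the workbook table (same two rows as in the question)
    Source = #table(
        type table [Con = text, Payment Status = text, Count = Int64.Type],
        {
            {"HUMANABRATTEN,MICOL9/20/2021", "Resubmitted", 15},
            {"HUMANABRATTEN,MICOL9/20/2021", "In-Process", 1}
        }),
    // For each Con, keep the entire row (as a record) that has the largest Count
    Grouped = Table.Group(Source, {"Con"}, {
        {"Top", each Table.Max(_, "Count"), type record}}),
    // Turn the record back into ordinary columns
    Expanded = Table.ExpandRecordColumn(Grouped, "Top", {"Payment Status", "Count"})
in
    Expanded
```

With the two sample rows this should leave a single row with Payment Status "Resubmitted" and Count 15. If several rows tie for the maximum, which one is kept is not something this sketch controls.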

Related

Power Query to filter only numbers

I've been searching everywhere to find a way to filter a column that contains both Text and Numbers, I want to filter out the numbers only from that column.
Thanks.

Add a custom column (Add Column > Custom Column), potentially with one of these:
= Text.Select([Column1],{"0".."9"})
=try Number.From([Column1]) otherwise "Text"
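To make the difference between the two formulas concrete, here is a small sketch on made-up values. The table contents and the column names DigitsOnly and AsNumber are assumptions for illustration. Text.Select only accepts text, so the sketch converts each value with Text.From first, and it returns null rather than "Text" so the new column stays numeric.

```powerquery
let
    // Hypothetical mixed column: text, a number, numeric text, and text containing a digit
    Source = #table(type table [Column1 = any], {{"abc"}, {123}, {"45"}, {"x9y"}}),
    // Keep only the digit characters of each value
    Digits = Table.AddColumn(Source, "DigitsOnly",
        each Text.Select(Text.From([Column1]), {"0".."9"}), type text),
    // Number.From raises an error for non-numeric text; try ... otherwise turns that into null
    AsNumber = Table.AddColumn(Digits, "AsNumber",
        each try Number.From([Column1]) otherwise null, type nullable number)
in
    AsNumber
```

The first formula extracts digits even from values such as "x9y", while the second only succeeds when the whole value converts to a number, so which one fits depends on what "numbers only" means for the data.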

Try this:
let
//Change next line to reflect Data source
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
//change next line to include all columns and their names
#"Changed Type" = Table.TransformColumnTypes(Source,{{"COLUMN", type any}}),
//Change next line to be testing the proper column
#"Numbers Only" = Table.SelectRows(#"Changed Type", each not (try Number.From([COLUMN]))[HasError]),
#"Changed Type1" = Table.TransformColumnTypes(#"Numbers Only",{{"COLUMN", type number}})
in
#"Changed Type1"

Power Query M - Group by Column Value with Custom Aggregation (Percentile)

I am trying to calculate percentiles by group (from column values ex: hours by department, sales by region, etc.) within power query. This same logic could be used for other custom group aggregation. After lots of searching, I found 2 potential approaches.
Approach 1:
I found this archived article, which looked to have the perfect answer. Nothing else I could find comes close.
The solution from there is the following custom function:
//PercentileInclusive Function
(inputSeries as list, percentile as number) =>
let
    SeriesCount = List.Count(inputSeries),
    PercentileRank = percentile * (SeriesCount - 1) + 1, //percentile value between 0 and 1
    PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
    PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
    Percentile1 = List.Max(List.MinN(inputSeries, PercentileRankRoundedDown)),
    Percentile2 = List.Max(List.MinN(inputSeries, PercentileRankRoundedUp)),
    PercentileInclusive = Percentile1 + (Percentile2 - Percentile1) * (PercentileRank - PercentileRankRoundedDown)
in
    PercentileInclusive
Combined with a step in your table to group appropriately and use the function:
=Table.Group(TableName, {"Grouping Column"}, {{"New Column name", each
PercentileInclusive(TableName[Column to calculate Percentile of], percentile # between 0 and 1)}})
[edited to correct the typo Ron R. pointed out and remove unnecessary detail]
Example input:
Pen Type     Units Sold
Ball-Point   6,109
Ball-Point   3,085
Ball-Point   1,970
Ball-Point   8,190
Ball-Point   6,006
Ball-Point   2,671
Ball-Point   6,875
Roller       778
Roller       9,329
Roller       7,781
Roller       4,182
Roller       2,016
Roller       5,785
Roller       1,411
Desired output for a 25% inclusive percentile grouped by Pen Type:
Pen Type     0.25 Inclusive Percentile (Correct)
Ball-Point   2,878
Roller       1,714
Notes: No decimals shown above, calculated with Excel's PERCENTILE.INC function.
Approach 1 works great.
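As a sanity check on the function, the Ball-Point figure can be reproduced by hand. This sketch repeats the function inline (with shortened step names of my own) so that it is self-contained, and applies it to the seven Ball-Point values from the example input.

```powerquery
let
    // Same logic as the PercentileInclusive function above, repeated so this sketch runs on its own
    PercentileInclusive = (inputSeries as list, percentile as number) =>
        let
            SeriesCount = List.Count(inputSeries),
            PercentileRank = percentile * (SeriesCount - 1) + 1,
            RankUp = Number.RoundUp(PercentileRank),
            RankDown = Number.RoundDown(PercentileRank),
            Lower = List.Max(List.MinN(inputSeries, RankDown)),
            Upper = List.Max(List.MinN(inputSeries, RankUp))
        in
            Lower + (Upper - Lower) * (PercentileRank - RankDown),
    BallPoint = {6109, 3085, 1970, 8190, 6006, 2671, 6875},
    // Rank = 0.25 * (7 - 1) + 1 = 2.5, halfway between the 2nd and 3rd smallest values (2671 and 3085)
    Result = PercentileInclusive(BallPoint, 0.25)
in
    Result
```

Halfway between 2671 and 3085 is 2878, which agrees with the desired output for Ball-Point.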
Approach 2:
Here is an alternate Power Query solution I tried. It is a single step with no custom function. It seems like it should do the trick, but I can't figure out a way to make the conditional check row based. Something needs to go where I have //Condition// that tells it which rows belong in the current row's group, but no matter what I try it does not work. It either breaks or gives a percentile for everything, ignoring the grouping.
=List.Percentile(Table.Column(Table.SelectRows(#"Previous Step Name", //Condition//), "Column to calculate percentile of"), percentile # 0 to 1)
Any ideas how to make approach 2 work?

It appears your Table.Group function is incorrectly specified.
Where my previous step was #"Changed Type", the following works:
#"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
{"Percentile", each fnPercentileINC([Units Sold],0.25)}})
Original Data
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Pen Type", type text}, {"Units Sold", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
{"Percentile", each fnPercentileINC([Units Sold],0.25), type number}})
in
#"Grouped Rows"
Result
Edit:
For your approach #2, without a custom function, you can merely use List.Percentile as an aggregation in the Table.Group function:
#"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
{"25th Percentile", each List.Percentile([Units Sold],0.25)}
})
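For anyone who wants to try the List.Percentile aggregation without a workbook, here is a self-contained sketch using the example input from the question. The inline #table is a stand-in for the real source, and passing PercentileMode.ExcelInc explicitly is my own choice to mirror Excel's PERCENTILE.INC rather than something the original step requires.

```powerquery
let
    // Sample data copied from the question's example input
    Source = #table(
        type table [Pen Type = text, Units Sold = Int64.Type],
        {
            {"Ball-Point", 6109}, {"Ball-Point", 3085}, {"Ball-Point", 1970}, {"Ball-Point", 8190},
            {"Ball-Point", 6006}, {"Ball-Point", 2671}, {"Ball-Point", 6875},
            {"Roller", 778}, {"Roller", 9329}, {"Roller", 7781}, {"Roller", 4182},
            {"Roller", 2016}, {"Roller", 5785}, {"Roller", 1411}
        }),
    // Inside the aggregation, [Units Sold] is the column of the current group only,
    // so no extra row condition is needed
    Grouped = Table.Group(Source, {"Pen Type"}, {
        {"25th Percentile",
         each List.Percentile([Units Sold], 0.25, [PercentileMode = PercentileMode.ExcelInc]),
         type number}})
in
    Grouped
```

Under the inclusive method the two groups should come out at 2878 and 1713.5, which round to the 2,878 and 1,714 shown in the question.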

Can I add a new column with Linear Interpolation in Power Query M?

I am working on extracting an interest rate curve from futures market prices, and I have created a table (Table 1) inside Power Query with the following columns:
- BusinessDays: Represents the number of business days from today to the expiry of each futures contract
- InterestRate: Represents the rate from today until the expiry of the futures contract
The second table (Table 2) refers to the ID of internal financial products that expire on different business days.
- InstrumentID: Unique internal ID of a financial product sold by a financial institution
- BusinessDays: Represents the number of business days from today to the expiry of each financial product
I am having some trouble with the M language, and unfortunately this specific calculation must be executed in Excel, so I am restricted to Power Query M.
The specific step I am not able to do is:
Creating a function in Power Query that adds a new column to Table 2 containing the interpolated interest rate of each financial product.
The end result I am looking for would look like this:

There are several ways to approach this but one way or another, you'll need to do some kind of lookup to determine which bracket to match your BusinessDays value with, so you can calculate the interpolated value.
I think it's simpler to just generate an all inclusive list of days vs interest rates, and then do a Join to pull out the matches.
I named this first query intRates and expanded the interest rate table:
let
    //Get the interest rate/business day table
    Source = Excel.CurrentWorkbook(){[Name="intRates"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"BusinessDays", Int64.Type}, {"InterestRate", Percentage.Type}}),
    //Add two columns which are the business day and interest rate columns offset by one row
    //It is faster to subtract this way than by adding an Index column
    offset =
        Table.FromColumns(
            Table.ToColumns(#"Changed Type")
                & {List.RemoveFirstN(#"Changed Type"[BusinessDays]) & {null}}
                & {(List.RemoveFirstN(#"Changed Type"[InterestRate])) & {null}},
            type table[BusinessDays=Int64.Type, InterestRate=Percentage.Type, shifted BusDays=Int64.Type, shifted IntRate=Percentage.Type]),
    //Add a column with a list of the interest rates for each day, interpolated between the segments
    #"Added Custom" = Table.AddColumn(offset, "IntList", each let
            sbd = [shifted BusDays],
            intRateIncrement = ([shifted IntRate]-[InterestRate])/([shifted BusDays]-[BusinessDays]),
            Lists = List.Generate(
                ()=>[d=[BusinessDays],i=[InterestRate]],
                each [d] < sbd,
                each [d=[d]+1, i = [i]+intRateIncrement],
                each [i])
        in Lists),
    //add another column with a list of days corresponding to the interest rates
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "dayList", each {[BusinessDays]..[shifted BusDays]-1}),
    //remove the last row as it will have an error
    remErrRow = Table.RemoveLastN(#"Added Custom1",1),
    //create the new table which has the rates for every duration
    intRateTable = Table.FromColumns(
        {List.Combine(remErrRow[dayList]),List.Combine(remErrRow[IntList])},
        type table[Days=Int64.Type, Interest=Percentage.Type])
in
    intRateTable
This results in a table that has every day (starting from 39), with its corresponding interest rate.
Then read in the "Instruments" table and Join it with the intRates, using a JoinKind.LeftOuter
let
Source = Excel.CurrentWorkbook(){[Name="Instruments"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"InstrumentID", type text}, {"BusinessDays", Int64.Type}}),
//add the rate column
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"BusinessDays"}, intRates, {"Days"}, "intRates", JoinKind.LeftOuter),
#"Expanded intRates" = Table.ExpandTableColumn(#"Merged Queries", "intRates", {"Interest"}, {"Interest"})
in
#"Expanded intRates"
Some of the results in the middle part of the table differ from what you've posted, but seem to be consistent with the linear interpolation formula for between two values, so I'm not sure how the discrepancy arises
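For comparison with the expand-and-join approach, here is a minimal sketch of the linear interpolation formula applied directly to a single day count. The curve values are made up for illustration, the function name Interpolate is hypothetical, and the sketch assumes the curve is sorted by BusinessDays in ascending order.

```powerquery
let
    // Hypothetical sample curve; the day counts and rates are made up for illustration
    Curve = #table(
        type table [BusinessDays = Int64.Type, InterestRate = number],
        {{10, 0.020}, {20, 0.030}, {40, 0.050}}),
    // Interpolate the rate for one day count d; returns null when d lies outside the curve
    Interpolate = (d as number) as nullable number =>
        let
            // Closest curve points at or below d and at or above d
            Below = Table.Last(Table.SelectRows(Curve, each [BusinessDays] <= d)),
            Above = Table.First(Table.SelectRows(Curve, each [BusinessDays] >= d)),
            Rate =
                if Below = null or Above = null then null
                else if Above[BusinessDays] = Below[BusinessDays] then Below[InterestRate]
                else Below[InterestRate]
                    + (Above[InterestRate] - Below[InterestRate])
                    * (d - Below[BusinessDays]) / (Above[BusinessDays] - Below[BusinessDays])
        in
            Rate,
    // 25 lies between 20 (3.0%) and 40 (5.0%), a quarter of the way along the segment
    Result = Interpolate(25)
in
    Result
```

For d = 25 this works out to roughly 0.035, that is 3.0% plus a quarter of the 2.0% difference. A function like this could be called from Table.AddColumn on the instruments table, though scanning the curve for every row will likely be slower than the join on large tables.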

Power Query: Selecting multiple elements in 'value field settings' to measure a specific field

I'm trying to create a measure that averages the 29 elements' value [Overtime/Hours_worked] into one cell, visualised by the attached image.
Cell F32 currently shows [AverageA Total Overtime/Total Hours_worked] but I want it to be an average of the 29 rows' values as displayed in cell H32, =AVERAGEA(F3:F31).
The elements' figures are based upon underlying data from Data$, currently amounting to ~150k rows. When creating a measure that averages the elements' values from Column E [AverageA Overtime/Hours_worked] and shows it as a % of the 29 elements' aggregate %, I run into the problem that it averages the separate elements' underlying data taken from Data$. Worth noting is that F3:F31 is redundant in this instance; I'm looking for the average of the 29 elements' values in column E and not their respective averages shown in column F.
Am I right to use measure here or is there a better way to approach it? If measures can be used, is there a way to design the measure so that it refers to the Pivot Table's shown data instead of the underlying data taken from Data$? For instance by designing the measure to refer to column E in the pivot table?
Side note
The table needs to remain dynamic since Data$ is being updated regularly. I'm relatively new to Power Query so I'm not sure if there are other ways to solve this, i.e. through MDX, but I doubt I'll be able to sort that out myself.
Any and all help is appreciated, thanks.

I'm not sure how you are computing the individual entries in the AverageA Total Overtime/Total Hours_worked (so I left it blank), but to compute the totals and averages for the other columns, you can use the Table.Group command in a special way with an empty list for the key (so as to return the entire table for the Aggregation operations).
Given:
M Code
Read the comments in the code to understand the algorithm.
If your overtime% column is already in your original data, you can just delete the code lines that add that column.
let
//be sure to change table name in next line to your actual table name
Source = Excel.CurrentWorkbook(){[Name="wrkTbl"]}[Content],
//set data types
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Area", type text}, {"Hours_worked", Int64.Type}, {"Overtime", Int64.Type}}),
//Add the percent overtime column
#"Added Custom" = Table.AddColumn(#"Changed Type", "Overtime/Hours_worked",
each [Overtime]/[Hours_worked], Percentage.Type),
//Use Table.Group to compute total/averages for the last row
//Be sure to use the exact same names as used in the original table
#"Grouped Rows" = Table.Group(#"Added Custom", {}, {
{"Area", each "Totals",type text},
{"Hours_worked", each List.Sum([Hours_worked]), Int64.Type},
{"Overtime", each List.Sum([Overtime]), Int64.Type},
{"Overtime/Hours_worked", each List.Sum([Overtime])/List.Sum([Hours_worked]), Percentage.Type},
{"AverageA Overtime/Hours_worked", each List.Average([#"Overtime/Hours_worked"]), Percentage.Type}
}),
//Append the two tables to add the Totals row
append= Table.Combine({ #"Added Custom", #"Grouped Rows"})
in
append
results in =>
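The empty-key grouping used above can also be tried in isolation. This is a reduced sketch on two made-up rows, with the percentage columns left out to keep the idea visible.

```powerquery
let
    // Hypothetical sample rows; the names and numbers are made up for illustration
    Source = #table(
        type table [Area = text, Hours_worked = Int64.Type, Overtime = Int64.Type],
        {{"A", 100, 10}, {"B", 200, 50}}),
    // An empty key list puts every row into a single group, so each aggregation sees the whole table
    Totals = Table.Group(Source, {}, {
        {"Area", each "Totals", type text},
        {"Hours_worked", each List.Sum([Hours_worked]), Int64.Type},
        {"Overtime", each List.Sum([Overtime]), Int64.Type}}),
    // Append the one-row totals table under the original rows
    Appended = Table.Combine({Source, Totals})
in
    Appended
```

With these sample rows the appended last row should read Totals, 300, 60. Because the totals are recomputed on every refresh, the table stays dynamic as the source data changes.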

Inserting text manually in Powerquery

I'm merging multiple Excel files into one where the user can review the rows and mark an additional Comment column as completed. Each day there are additional files, and I need to refresh the query and pull the new data in while keeping the original Comment column values.
I've attempted to do this by following Marcel Beug's video, but that uses an SQL table and I cannot seem to get it to work with the Excel files as the source.
After the Merge Queries I attempt to modify the first file to my source "InputFile"
![Modify the Merge Formula1][2]
![Changed to last query step of InputFile][3]
![InputFile Query with Source2 and Merge][4]
![M Code of InputFile Query with Merge][5]
By setting the first field in the merge formula to the last step in the InputFile query, I was able to get around the cyclic reference error. However, I find that every refresh creates duplicate rows: 4 become 8, which then become 16, etc.
let
Source = Excel.Workbook(File.Contents("S:\Fin_Aid\Operations Team\COD mpn - lec\InputFiles\8.22.18 to 8.23.18.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
Rename_RecID = Table.RenameColumns(#"Removed Columns",{{"Column3.1", "RecID"}}),
Source2 = Excel.CurrentWorkbook(){[Name="InputFile"]}[Content],
InputWithComment = Table.TransformColumnTypes(Source2,{{"RecID", Int64.Type}, {"Column1", type text}, {"Column2", type text}, {"Column4", type text}, {"Column5", type text}, {"Comment", type text}}),
#"Merged Queries" = Table.NestedJoin(Rename_RecID,{"RecID"},InputWithComment,{"RecID"},"InputWithComment",JoinKind.LeftOuter),
#"Expanded InputWithComment" = Table.ExpandTableColumn(#"Merged Queries", "InputWithComment", {"Comment"}, {"Comment"})
in
#"Expanded InputWithComment"
Regards,
Jim
