Power Query - best way to sub select?

Suppose I have a column representing object type and another column representing object color. I want to remove blue and red fruits (example of object type) but keep all other red and blue objects.
How can I achieve this in Power Query?
Thanks,

Just filter out the matching rows with a negated condition:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Filtered = Table.SelectRows(Source, each not ([ObjectType] = "Fruit" and ([ObjectColor]="Red" or [ObjectColor]="Blue")))
in
Filtered
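
If the list of colors to exclude is likely to grow, the same negated filter can be written with List.Contains instead of chained "or" tests. This is only a sketch: the inline sample table and the ExcludedColors step name are assumptions made for illustration and are not part of the original answer.

```powerquery
let
    // Hypothetical sample data standing in for Table1
    Source = #table(
        {"ObjectType", "ObjectColor"},
        {
            {"Fruit", "Red"},
            {"Fruit", "Green"},
            {"Car", "Red"},
            {"Car", "Blue"},
            {"Fruit", "Blue"}
        }
    ),
    // Colors to drop, but only for rows whose ObjectType is "Fruit"
    ExcludedColors = {"Red", "Blue"},
    Filtered = Table.SelectRows(
        Source,
        each not ([ObjectType] = "Fruit" and List.Contains(ExcludedColors, [ObjectColor]))
    )
in
    Filtered
```

With this sample, the Fruit/Green, Car/Red and Car/Blue rows are kept.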

Here's one way:
If you start with this:
You can merge the two columns together like this:
Then filter out the "Fruit,Blue" and "Fruit,Red":
Which yields this:
And you can then delete the "Merged" column to get this:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ObjectType", type text}, {"ObjectColor", type text}}),
#"Inserted Merged Column" = Table.AddColumn(#"Changed Type", "Merged", each Text.Combine({[ObjectType], [ObjectColor]}, ","), type text),
#"Filtered Rows" = Table.SelectRows(#"Inserted Merged Column", each ([Merged] <> "Fruit,Blue" and [Merged] <> "Fruit,Red")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Merged"})
in
#"Removed Columns"

Related

Filter rows where all matching rows are True

I have a table formatted as follows:
BOM      Imported
COM123   True
COM123   True
COM123   False
COM999   True
COM999   True
COM999   True
I'd like to filter the table to show only rows where all matching BOM rows are True in the Imported column. I.e., in this case, COM999 rows should show, and COM123 rows are filtered out because one entry is False.
FYI this is to produce a list of Production BOMs where all the components already exist in Business Central. The Imported column is the result from a merge of the query with an extract from BC, and sets the value to true where the BOM components exist.
Can anyone please give me a steer?
I've been farting around with this half the day, but I can't find the equivalent of an EXISTS query in SQL...
The algorithm should be clear from the code comments:
Group by BOM
Test each BOM subgroup for ALLTRUE
Filter the Grouped table
Re-expand the original
let
//change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table21"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"BOM", type text}, {"Imported", type logical}}),
//Group rows by BOM
//Then determine where all "Imported" = True
#"Grouped Rows" = Table.Group(#"Changed Type", {"BOM"}, {
{"all true", each List.AllTrue([Imported]), type logical},
{"all", each _, type table [BOM=nullable text, Imported=nullable logical]}}),
//Remove the false rows
//then delete the "all true" column
#"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([all true] = true)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"all true"}),
//re-expand
#"Expanded all" = Table.ExpandTableColumn(#"Removed Columns", "all", {"Imported"}, {"Imported"})
in
#"Expanded all"
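
For readers looking for something closer to SQL's NOT EXISTS, one possible alternative is to collect the BOMs that have at least one False row and then exclude them. This is a sketch rather than the answer's method; the inline sample table and the FailedBoms step name are assumptions, and it assumes the Imported column contains no nulls.

```powerquery
let
    // Hypothetical sample data mirroring the table in the question
    Source = #table(
        type table [BOM = text, Imported = logical],
        {
            {"COM123", true},
            {"COM123", true},
            {"COM123", false},
            {"COM999", true},
            {"COM999", true},
            {"COM999", true}
        }
    ),
    // BOMs that have at least one row where Imported is false
    FailedBoms = List.Distinct(Table.SelectRows(Source, each [Imported] = false)[BOM]),
    // Keep only rows whose BOM never appears in that list
    Result = Table.SelectRows(Source, each not List.Contains(FailedBoms, [BOM]))
in
    Result
```

With this sample, only the three COM999 rows remain.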

Powerquery: Remove next n rows after occurrence of value in column

I frequently have large datasets in Power Query where I need to remove the row in which a certain value (in this case "Page") occurs, along with the following 13 rows. The value occurs multiple times throughout the column.
I've tried referring to the next/previous rows by adding an index column and {[Index]+1} shenanigans but that either didn't work or took 15+ minutes to load.
I've tried setting up something with Table.RemoveFirstN(Text.Contains([Column], "Page"), 13) but that just errored out.
Would anyone know how I could filter the row where a value occurs, as well as the next n rows (index?) in Powerquery?
Kind regards,
This seems to work ok
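We add an index and test for "Page". In a new column, if "Page" is present, copy over the index. Fill down, then group on that column. Add a second index within each group. Expand all columns. Filter out any row whose second index is 14 or less (the "Page" row plus the next 13), while keeping leading rows that come before the first "Page". Remove the extra columns.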
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Merged Price Country", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if Text.Contains([Merged Price Country],"Page") then [Index] else null otherwise null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
mGroup = Table.Group(#"Filled Down", {"Custom"}, {{"Data", each Table.AddIndexColumn(_, "Index2", 1, 1), type table}}),
#"Removed Columns" = Table.RemoveColumns(mGroup,{"Custom"}),
// expand all columns
List = List.Union(List.Transform(#"Removed Columns"[Data], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", List,List),
#"Filtered Rows" = Table.SelectRows(#"Expanded Data", each [Custom]=null or [Index2] > 14),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Custom", "Index2"})
in #"Removed Columns1"
I avoided using Table.RemoveFirstN() on the groupings in the code above in case there are leading rows you want to keep, but you could use it instead of adding the second index and filtering, as below:
let Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Merged Price Country", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if Text.Contains([Merged Price Country],"Page") then [Index] else null otherwise null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
// remove 14 rows per group: the "Page" row plus the 13 rows that follow it
mGroup = Table.Group(#"Filled Down", {"Custom"}, {{"Data", each Table.RemoveFirstN(_, 14), type table}}),
#"Removed Columns" = Table.RemoveColumns(mGroup,{"Custom"}),
// expand all columns
List = List.Union(List.Transform(#"Removed Columns"[Data], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", List,List),
#"Removed Columns1" = Table.RemoveColumns(#"Expanded Data",{"Index", "Custom"})
in #"Removed Columns1"
Different approach. Wonder which might be faster:
Create a list of rows to be removed (by row number)
Select the rows not in that list
let
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text", type text}, {"Data", Int64.Type}}),
//Add index column
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
//create list rows to be removed
textCol = List.Transform(#"Added Index"[Text], each
if _ = null then null
else if Text.Contains(_,"Page",Comparer.OrdinalIgnoreCase) then "RemoveMe"
else _),
//create list of positions to be removed
removePos = List.Combine(List.Transform(List.PositionOf(textCol,"RemoveMe",Occurrence.All), each {_..List.Min({_+13, List.Count(textCol)})})),
//Filter the table using the "RemoveMe" list
filter = Table.SelectRows(#"Added Index", each not List.Contains(removePos,[Index])),
#"Removed Columns" = Table.RemoveColumns(filter,{"Index"})
in
#"Removed Columns"
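
To see how the position list is built, here is the same idea reduced to a plain list with a smaller window. It is a sketch under stated assumptions: the sample values, the window size n = 2 and the step names are all invented for illustration.

```powerquery
let
    // Hypothetical sample column; n is the number of rows to drop after each hit
    textCol = {"a", "Page 1", "b", "c", "d", "Page 2", "e"},
    n = 2,
    // Mark every item containing "Page" (nulls are passed through untouched)
    flagged = List.Transform(
        textCol,
        each if _ <> null and Text.Contains(_, "Page") then "RemoveMe" else _
    ),
    // Zero-based positions of all marked items
    hits = List.PositionOf(flagged, "RemoveMe", Occurrence.All),
    // For each hit, the hit position plus the next n positions, capped at the last index
    removePos = List.Combine(
        List.Transform(hits, each {_ .. List.Min({_ + n, List.Count(textCol) - 1})})
    ),
    // Keep the items whose position is not in removePos
    kept = List.Transform(
        List.RemoveItems(List.Positions(textCol), removePos),
        each textCol{_}
    )
in
    kept
```

Here hits is {1, 5}, removePos is {1, 2, 3, 5, 6}, and kept is {"a", "d"}.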

How to select certain column in power query

I would like to select certain columns in Power Query without using their names. For example, in R I can do this with the select command. I'm wondering how I can do it in Power Query. I found some information here, but not everything I need.
Any idea how to do this if I want to refer to more than one column?
It doesn't work if I write the code as below:
#"Filtered Part Desc" = Table.SelectRows (
#"Removed Columns3",
each List.Contains(
{ "ENG", "TRANS" },
Record.Field(_, Table.ColumnNames(#"Removed Columns3") { 5, 6, 7 })
)
)
Let's say I have this table and want to do a couple of things to it.
First, I want to change the column type of the second and last columns. We can use Table.ColumnNames to do this using simple indexing (which starts at zero) as follows:
Table.TransformColumnTypes(
Source,
{
{Table.ColumnNames(Source){1}, Int64.Type},
{Table.ColumnNames(Source){3}, Int64.Type}
}
)
That works but requires specifying each index separately. If we want to unpivot these columns like this
Table.Unpivot(#"Changed Type", {"Col2", "Col4"}, "Attribute", "Value")
but using the index values instead we can use the same method as above
Table.Unpivot(
#"Changed Type",
{
Table.ColumnNames(Source){1},
Table.ColumnNames(Source){3}
}, "Attribute", "Value"
)
But is there a way to do this where we can use a single list of positional index values and use Table.ColumnNames only once? I found a relatively simple though unintuitive method on this blog. For this case, it works as follows:
Table.Unpivot(
#"Changed Type",
List.Transform({1,3}, each Table.ColumnNames(Source){_}),
"Attribute", "Value"
)
This method starts with the list of positional index values and then transforms them into column names by looking up the names of the columns corresponding to those positions.
Here's the full M code for the query I was playing with:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUTIE4nIgtlSK1YlWSgKyjIC4AogtwCLJQJYxEFcCsTlYJAXIMgHiKiA2U4qNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Col1 = _t, Col2 = _t, Col3 = _t, Col4 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{Table.ColumnNames(Source){1}, Int64.Type},{Table.ColumnNames(Source){3}, Int64.Type}}),
#"Unpivoted Columns" = Table.Unpivot(#"Changed Type", List.Transform({1,3}, each Table.ColumnNames(Source){_}), "Attribute", "Value")
in
#"Unpivoted Columns"
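
The same positional lookup can also be used to select columns, which is closer to what R's select does. This is a minimal sketch: the sample table and the Names and Picked step names are assumptions, and Table.ColumnNames is evaluated only once.

```powerquery
let
    // Hypothetical four-column sample table
    Source = #table(
        {"Col1", "Col2", "Col3", "Col4"},
        {
            {"a", "1", "w", "9"},
            {"b", "2", "x", "8"}
        }
    ),
    // Look the column names up once, then index into the list (positions start at zero)
    Names = Table.ColumnNames(Source),
    Picked = List.Transform({1, 3}, each Names{_}),
    // Keep only the second and fourth columns
    Selected = Table.SelectColumns(Source, Picked)
in
    Selected
```

With this sample, Selected contains only Col2 and Col4.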

countif formula in power query

I had a formula in a table in Excel:
=IF([#STATUS]="",[KEY]&"_"&COUNTIF(INDEX([KEY],1):[#KEY],[#KEY]),"")
which showed how many times a value had appeared so far in the data, but the same formula does not work in Power Query.
I use the formula to get the position of each occurrence of a value in a long data list, and then use that result in an INDEX/MATCH formula to locate other relevant data.
I am trying to achieve:
Date        Name                   Frequency
1/10/2019   Adrian Bartholomeusz   1
1/10/2019   Aditya Tipnis          1
2/10/2019   Abdul Atef             1
2/10/2019   Aditya Tipnis          2
3/10/2019   Abdul Atef             2
In Excel I used the formula "=COUNTIF(INDEX([Name],1):[#Name],[#Name])", but when I use the same formula in Power Query I get an error.
The key steps are:
Add Index
Group Rows
Transform Columns to add a sub-index.
Expand the data back.
The rest is cosmetic.
let
Source = Excel.CurrentWorkbook(),
Table1 = Source{[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Table1, "Index", 0, 1),
#"Grouped Rows" = Table.Group(#"Added Index", {"key"}, {{"Data", each _, type table [key=number, f=text, Index=number]}}),
#"TransformColumns" = Table.TransformColumns(#"Grouped Rows",{"Data", (x) => Table.AddIndexColumn(x, "Index2", 1, 1)}),
#"Expanded Data" = Table.ExpandTableColumn(#"TransformColumns", "Data", {"excel formula", "Index", "Index2"}, {"excel formula", "Index", "Index2"}),
#"Added Custom" = Table.AddColumn(#"Expanded Data", "PQ method", each Text.From([key]) & "_" & Text.From([Index2])),
#"Sorted Rows" = Table.Sort(#"Added Custom",{{"Index", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index", "Index2"})
in
#"Removed Columns"
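
Applied to the Date/Name data from the question, the same steps might look like the sketch below. The inline sample table and the step names are assumptions, and it relies on Table.Group keeping each group's rows in their original order, which is worth checking against your own data source.

```powerquery
let
    // Hypothetical sample data copied from the question's expected output
    Source = #table(
        {"Date", "Name"},
        {
            {"1/10/2019", "Adrian Bartholomeusz"},
            {"1/10/2019", "Aditya Tipnis"},
            {"2/10/2019", "Abdul Atef"},
            {"2/10/2019", "Aditya Tipnis"},
            {"3/10/2019", "Abdul Atef"}
        }
    ),
    // Remember the original row order
    Indexed = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Within each Name group, number the rows 1, 2, 3, ... (the running count)
    Grouped = Table.Group(
        Indexed,
        {"Name"},
        {{"Data", each Table.AddIndexColumn(_, "Frequency", 1, 1), type table}}
    ),
    Expanded = Table.ExpandTableColumn(Grouped, "Data", {"Date", "Index", "Frequency"}),
    // Restore the original order and drop the helper column
    Sorted = Table.Sort(Expanded, {{"Index", Order.Ascending}}),
    Result = Table.RemoveColumns(Sorted, {"Index"})
in
    Result
```

The resulting column order is Name, Date, Frequency; Table.ReorderColumns can rearrange it if needed.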

Arranging Data in a Query Table

I'm trying to figure out a simple way (using PowerQuery) to convert this:
into this:
I've spent days trying to figure out a simple way to do it.
Everything I've tried has failed.
You can use Group By on the Transform tab, group by project and define an aggregation for each of the segment columns (e.g. Sum).
Then adjust the created code from List.Sum to List.RemoveNulls.
Then add a column with nested tables from the segment columns, using Table.FromColumns.
Remove the original segment columns and expand the nested tables.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Project", type text}, {"Segment1", type text}, {"Segment2", type text}, {"Segment3", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Project"}, {{"Segment1", each List.RemoveNulls([Segment1]), type list}, {"Segment2", each List.RemoveNulls([Segment2]), type list}, {"Segment3", each List.RemoveNulls([Segment3]), type list}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Tabled", each Table.FromColumns({[Segment1],[Segment2],[Segment3]},{"Segment1","Segment2","Segment3"})),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Segment1", "Segment2", "Segment3"}),
#"Expanded Tabled" = Table.ExpandTableColumn(#"Removed Columns", "Tabled", {"Segment1", "Segment2", "Segment3"}, {"Segment1", "Segment2", "Segment3"})
in
#"Expanded Tabled"
