Power Query - Remove text strings that contain lower case letters

Situation: I am working in Power Query with data that was imported from a pdf file, and it came in a bit messy. I have a column that contains numbers, as well as text strings. Some of the text strings are mixed-case, containing both upper and lower case characters, while others contain only upper case characters.
Goal: I want to remove all numbers and all mixed-case text strings. The end result should show only the text strings that are completely upper case.
For example, I would want my end result to include things like IRA, IRREVOCABLE TRUST, CHARITABLE TRUST, but replace things like Number of Accounts, Totals, 14 with null.
What I have tried so far:
The following gets rid of numbers and lower case characters, but it doesn't quite work since it keeps upper case characters included in mixed-case character strings.
Table.AddColumn(#"Added Custom2", "Account Type", each Text.Select([AccountType], {"A".."Z"," "}), type text)
The following code gets rid of the mixed-case text strings, but it doesn't quite work because it doesn't remove the numbers. Also, it is too specific, requiring me to remove strings that contain specific words. I'd prefer to remove all strings that contain lower case characters.
Table.AddColumn(#"Added Custom2", "Account Type", each if [AccountType]= null or Text.Contains([AccountType],"Totals") or Text.Contains([AccountType],"of") or Text.Contains([AccountType],"report") then null else [AccountType])
Your insights would be appreciated. I'm a new Power Query user, so please be specific and detailed with your responses.

Hard to know exactly what you want since you provide no examples of your data.
If you have multiple (space-separated) strings in a cell, then you can use:
#"Added Custom" = Table.AddColumn(#"Changed Type", "allCaps", each
Text.Combine(
List.Accumulate(Text.Split([Column1]," "),
{},
(state, current)=>
if List.ContainsAny(
Text.ToList(current),
{"0".."9","a".."z",",",":","?","/","\"," "})
then state
else state & {current}),", "))
If you have just a single string in each cell, you can use:
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
if List.ContainsAny(
Text.ToList([Column1]),
// the space is left out of this list so that multi-word values such as IRREVOCABLE TRUST are kept
{"0".."9","a".."z",",",":","?","/","\"})
then null
else [Column1])
Then you can filter out the unwanted rows by deselecting (null) in the added column's filter drop-down.
In each case, #"Changed Type" is the previous step. If that is not the case in your code, replace it with the actual name of your previous step.
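To try the single-string check on its own, here is a minimal self-contained query. It is a sketch that assumes a column named AccountType and uses a hypothetical inline table built from the example values in the question in place of your real previous step. The space is deliberately left out of the character list so that multi-word values such as IRREVOCABLE TRUST are kept, and the last step does the filtering instead of the manual deselect.

```powerquery
let
    // Hypothetical inline sample built from the examples in the question;
    // replace with your own previous step
    Source = #table(
        type table [AccountType = nullable text],
        {{"IRA"}, {"IRREVOCABLE TRUST"}, {"CHARITABLE TRUST"},
         {"Number of Accounts"}, {"Totals"}, {"14"}, {null}}),
    // null when the text contains any digit, lower-case letter or listed punctuation
    #"Added AllCaps" = Table.AddColumn(Source, "AllCaps", each
        if [AccountType] = null then null
        else if List.ContainsAny(
            Text.ToList([AccountType]),
            {"0".."9", "a".."z", ",", ":", "?", "/", "\"})
        then null
        else [AccountType], type nullable text),
    // keep only the rows whose AllCaps value survived
    #"Filtered Rows" = Table.SelectRows(#"Added AllCaps", each [AllCaps] <> null)
in
    #"Filtered Rows"
```

If your column holds actual numbers rather than text, change its type to text first, because Text.ToList only accepts text values.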

The first formula keeps only values that are all capitals and don't contain a number:
= if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null
The second formula removes all numbers first, THEN keeps the result if it is all capitals:
= if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom2", each if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null)
in #"Added Custom1"
~ ~ ~
If you are looking to parse words out of a cell that contains several words:
This retains words that (a) have no numbers before, after, or within them, and (b) are all in capitals
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
Text.Combine(
List.RemoveNulls(
List.Transform(Text.Split([Column1]," "), each
if _ = Text.Remove (_,{"a".."z","0".."9"}) then _ else null
))," "))
in #"Added Custom"
This removes all numbers, and after that retains words that are all in capitals
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
Text.Combine(
List.RemoveNulls(
List.Transform(Text.Split(Text.Remove ([Column1],{"0".."9"})," "), each
if _ = Text.Remove (_,{"a".."z"}) then _ else null
))," "))
in #"Added Custom"

Related

Power query grouping

Can Power Query do this?
I have a group of parent IDs. If the parent IDs are the same but the values of the corresponding attributes are different, I want PQ to let me know they can be grouped together.
For example, the rows with Parent ID 12345 have the same ID but different values, so I want the output column SDSKU to say Yes. The rows with Parent ID 333 have the same ID and the same values, so they are not a grouping and I want it to say No.
If by "values" you mean the values in the "Color" column, try the M code below:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Parent ID", Int64.Type}, {"Kitchen Sink", Int64.Type}, {"Color", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Parent ID", "Kitchen Sink"}, {{"AllData", each _, type table [Parent ID=nullable number, Kitchen Sink=nullable number, Color=nullable text]}, {"OccuID", each Table.RowCount(_), Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "NumberOfColors", each List.Count(List.Distinct([AllData][Color]))),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "SDSKU", each if [OccuID] = [NumberOfColors] then "Yes" else "No"),
#"Expanded AllData" = Table.ExpandTableColumn(#"Added Custom1", "AllData", {"Kitchen Sink", "Color"}, {"Kitchen Sink.1", "Color"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded AllData",{"OccuID", "NumberOfColors"})
in
#"Removed Columns"
If the "attributes" are the values of every column except Parent ID, try the M code below:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source , {"Parent ID"}, {
{"data", each _, type table },
{"check", each if Table.RowCount(_) = Table.RowCount(Table.Distinct(_, List.Difference(Table.ColumnNames(_),{"Parent ID"}))) then "YES" else "NO"}}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", List.Difference(Table.ColumnNames(Source),{"Parent ID"}), List.Difference(Table.ColumnNames(Source),{"Parent ID"}))
in #"Expanded data"
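A minimal sketch of the second check on a few hypothetical rows (the values below are made up, since the question's data is not included). It assumes the columns are Parent ID, Kitchen Sink and Color, as in the first answer, and pulls the list of attribute columns out into its own step. On these rows it returns one row per Parent ID, with YES for 12345 (Red and Blue differ) and NO for 333 (two identical White rows).

```powerquery
let
    // Hypothetical sample rows; replace with your own source step
    Source = #table(
        type table [Parent ID = number, Kitchen Sink = number, Color = text],
        {{12345, 1, "Red"}, {12345, 1, "Blue"}, {333, 2, "White"}, {333, 2, "White"}}),
    // every column except Parent ID counts as an "attribute"
    attributeCols = List.Difference(Table.ColumnNames(Source), {"Parent ID"}),
    Grouped = Table.Group(Source, {"Parent ID"}, {
        {"data", each _, type table},
        // YES when no two rows in the group share the same attribute values
        {"check", each
            if Table.RowCount(_) = Table.RowCount(Table.Distinct(_, attributeCols))
            then "YES" else "NO", type text}})
in
    Grouped
```

Note that with this test a Parent ID that appears in only one row also gets YES; adding a Table.RowCount(_) > 1 condition would change that if it matters for your data.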

Power Query - running total that resets when values change

I have been searching for a week now and cannot find a resolution to my problem. I have a table which lists the "event" an individual is in during a certain week. I want to add a column, via Power Query, that counts the number of weeks a person has been in that event and then resets if the event changes in the following week. For example:
| Pers1 | Date | Event | Weeks in Event |
|---|---|---|---|
| Pers1 | 12/22/2022 | Consideration | 1 |
| Pers1 | 12/26/2022 | Consideration | 2 |
| Pers1 | 1/05/2022 | Interview | 1 |
| Pers1 | 1/12/2022 | Consideration | 1 |
| Pers1 | 1/19/2022 | Consideration | 2 |
| Pers1 | 1/26/2022 | Awaiting Hire | 1 |
| Pers2 | 1/19/2022 | Awaiting Hire | 1 |
| Pers2 | 1/26/2022 | Awaiting Hire | 2 |
Note how the count resets back to starting at 1 when Pers1 has their second stint of Consideration during weeks 1/12 and 1/19. Additionally, I need the solution to be smart enough to distinguish between two different individuals and uniquely count their time in an event.
This community has always come through for me. Please help!
EDIT 1: I incorporated the code provided by Ron and am receiving the following error: Expression.Error: 5 arguments were passed to function which expects between 2 and 4.
Details:
Pattern=
Arguments=List
PQ Advanced Editor Code is below:
let
Source = Excel.Workbook(File.Contents("C:\Location"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Pers1", type text}, {"Date", type date}, {"Event", type text}}),
//add offset columns for Pers and Event (shifted up one row) for easy comparison with the next row
offsetPersonEvent=Table.FromColumns(
Table.ToColumns(#"Changed Type")
& {List.RemoveFirstN(#"Changed Type"[Pers1]) & {null}}
& {List.RemoveFirstN(#"Changed Type"[Event]) & {null}},
type table[Pers=text, Date=date,Event=text,offsetPers=text, offsetEvent=text]
),
//create "grouper" column: keep the Index on the last row of each run (where the next row's Pers or Event differs)
#"Added Index" = Table.AddIndexColumn(offsetPersonEvent, "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "groups",
each if [Pers]=[offsetPers] and [Event]=[offsetEvent] then null else [Index]),
//remove unneeded columns, fillUp the grouper, and group by "grouper"
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"offsetPers", "offsetEvent", "Index"}),
#"Filled Up" = Table.FillUp(#"Removed Columns",{"groups"}),
//Add Index column to each subtable
#"Grouped Rows" = Table.Group(#"Filled Up", {"groups"}, {
{"addedIndex", each Table.AddIndexColumn(_,"Weeks in Event",1,1,Int64.Type)
, type table}}),
//Remove unnecessary columns
// Expand the grouped tables
// reset the data types
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"groups"}),
#"Expanded addedIndex" = Table.ExpandTableColumn(#"Removed Columns1", "addedIndex", {"Pers", "Date", "Event", "Weeks in Event"}, {"Pers", "Date", "Event", "Weeks in Event"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded addedIndex",{{"Pers", type text}, {"Date", type date}, {"Event", type text}, {"Weeks in Event", Int64.Type}})
in
#"Changed Type1"
Assuming your data is sorted by Person and then by Date, as you show, you can use the following M-Code.
(If your data is not so sorted, then you could merely add steps initially to sort it appropriately, and then continue with the code shown)
Please read the code comments and examine the Applied Steps to understand the algorithm.
Open the Advanced Editor and paste in the code below:
let
//change table name in next line to your actual table name
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Pers1", type text}, {"Date", type date}, {"Event", type text}}),
//add offset columns for Pers and Event (shifted up one row) for easy comparison with the next row
offsetPersonEvent=Table.FromColumns(
Table.ToColumns(#"Changed Type")
& {List.RemoveFirstN(#"Changed Type"[Pers1]) & {null}}
& {List.RemoveFirstN(#"Changed Type"[Event]) & {null}},
type table[Pers=text, Date=date,Event=text,offsetPers=text, offsetEvent=text]
),
//create "grouper" column: keep the Index on the last row of each run (where the next row's Pers or Event differs)
#"Added Index" = Table.AddIndexColumn(offsetPersonEvent, "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "groups",
each if [Pers]=[offsetPers] and [Event]=[offsetEvent] then null else [Index]),
//remove unneeded columns, fillUp the grouper, and group by "grouper"
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"offsetPers", "offsetEvent", "Index"}),
#"Filled Up" = Table.FillUp(#"Removed Columns",{"groups"}),
//Add Index column to each subtable
#"Grouped Rows" = Table.Group(#"Filled Up", {"groups"}, {
{"addedIndex", each Table.AddIndexColumn(_,"Weeks in Event",1,1,Int64.Type)
, type table}}),
//Remove unnecessary columns
// Expand the grouped tables
// reset the data types
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"groups"}),
#"Expanded addedIndex" = Table.ExpandTableColumn(#"Removed Columns1", "addedIndex", {"Pers", "Date", "Event", "Weeks in Event"}, {"Pers", "Date", "Event", "Weeks in Event"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded addedIndex",{{"Pers", type text}, {"Date", type date}, {"Event", type text}, {"Weeks in Event", Int64.Type}})
in
#"Changed Type1"
Ron's code worked with one minor tweak. For me, the following section of code was causing an error:
//Add Index column to each subtable
#"Grouped Rows" = Table.Group(#"Filled Up", {"groups"}, {
{"addedIndex", each Table.AddIndexColumn(_,"Weeks in Event",1,1,Int64.Type)
, type table}}),
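I removed the Int64.Type parameter from Table.AddIndexColumn and everything worked. The error message suggests that this version of Power Query only accepts 2 to 4 arguments for Table.AddIndexColumn, so the optional fifth (column type) argument is not available. I've included the updated code snippet below: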
let
Source = Excel.Workbook(File.Contents("C:\Location"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Pers1", type text}, {"Date", type date}, {"Event", type text}}),
//add offset columns for Pers and Event (shifted up one row) for easy comparison with the next row
offsetPersonEvent=Table.FromColumns(
Table.ToColumns(#"Changed Type")
& {List.RemoveFirstN(#"Changed Type"[Pers1]) & {null}}
& {List.RemoveFirstN(#"Changed Type"[Event]) & {null}},
type table[Pers=text, Date=date,Event=text,offsetPers=text, offsetEvent=text]
),
//create "grouper" column: keep the Index on the last row of each run (where the next row's Pers or Event differs)
#"Added Index" = Table.AddIndexColumn(offsetPersonEvent, "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "groups",
each if [Pers]=[offsetPers] and [Event]=[offsetEvent] then null else [Index]),
//remove unneeded columns, fillUp the grouper, and group by "grouper"
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"offsetPers", "offsetEvent", "Index"}),
#"Filled Up" = Table.FillUp(#"Removed Columns",{"groups"}),
//Add Index column to each subtable
#"Grouped Rows" = Table.Group(#"Filled Up", "groups", {
{"addedIndex", each Table.AddIndexColumn(_,"Weeks in Event",1,1)
, type table}}),
//Remove unnecessary columns
// Expand the grouped tables
// reset the data types
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"groups"}),
#"Expanded addedIndex" = Table.ExpandTableColumn(#"Removed Columns1", "addedIndex", {"Pers", "Date", "Event", "Weeks in Event"}, {"Pers", "Date", "Event", "Weeks in Event"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded addedIndex",{{"Pers", type text}, {"Date", type date}, {"Event", type text}, {"Weeks in Event", Int64.Type}})
in
#"Changed Type1"
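An alternative worth trying, not taken from the answers above, is Table.Group with GroupKind.Local, which only groups consecutive rows that share the same key. A minimal sketch, assuming the person column is called Pers and the rows are already in the order shown in the question (entered inline below with the dates exactly as posted):

```powerquery
let
    // The sample rows from the question, entered inline; replace with your own source step
    Source = #table(
        type table [Pers = text, Date = date, Event = text],
        {
            {"Pers1", #date(2022, 12, 22), "Consideration"},
            {"Pers1", #date(2022, 12, 26), "Consideration"},
            {"Pers1", #date(2022, 1, 5), "Interview"},
            {"Pers1", #date(2022, 1, 12), "Consideration"},
            {"Pers1", #date(2022, 1, 19), "Consideration"},
            {"Pers1", #date(2022, 1, 26), "Awaiting Hire"},
            {"Pers2", #date(2022, 1, 19), "Awaiting Hire"},
            {"Pers2", #date(2022, 1, 26), "Awaiting Hire"}
        }),
    // GroupKind.Local starts a new group every time the (Pers, Event) pair changes
    // from one row to the next, so a second stint in the same event gets its own group
    Grouped = Table.Group(
        Source,
        {"Pers", "Event"},
        {{"Rows", each Table.AddIndexColumn(_, "Weeks in Event", 1, 1), type table}},
        GroupKind.Local),
    // the key columns also exist inside each nested table, so drop them before expanding
    Expanded = Table.ExpandTableColumn(
        Table.RemoveColumns(Grouped, {"Pers", "Event"}),
        "Rows",
        {"Pers", "Date", "Event", "Weeks in Event"}),
    Typed = Table.TransformColumnTypes(Expanded,
        {{"Pers", type text}, {"Date", type date}, {"Event", type text}, {"Weeks in Event", Int64.Type}})
in
    Typed
```

With the sample data, Weeks in Event comes out as 1, 2, 1, 1, 2, 1, 1, 2, matching the expected column in the question. Table.AddIndexColumn is called with four arguments here, so it should also run in versions that reject the fifth column-type argument.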

Powerquery: Remove next n rows after occurrence of value in column

I frequently have large datasets in Power Query where I need to remove the row in which a certain value (in this case "Page") occurs, as well as the following 13 rows. This value occurs multiple times throughout the column.
I've tried referring to the next/previous rows by adding an index column and {[Index]+1} shenanigans but that either didn't work or took 15+ minutes to load.
I've tried setting up something with Table.RemoveFirstN(Text.Contains([Column], "Page"), 13) but that just errored out.
Would anyone know how I could filter the row where a value occurs, as well as the next n rows (index?) in Powerquery?
Kind regards,
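This seems to work OK.
We add an index and test for "Page": in a new column, wherever "Page" is present, copy over the index. Fill that column down, then group on it and add a 2nd index within each group. Expand all columns, filter out every row of a "Page" group whose 2nd index is 14 or less (that is, keep rows where it is greater than 14, plus any rows before the first "Page"), and remove the extra columns.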
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Merged Price Country", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if Text.Contains([Merged Price Country],"Page") then [Index] else null otherwise null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
mGroup = Table.Group(#"Filled Down", {"Custom"}, {{"Data", each Table.AddIndexColumn(_, "Index2", 1, 1), type table}}),
#"Removed Columns" = Table.RemoveColumns(mGroup,{"Custom"}),
// expand all columns
List = List.Union(List.Transform(#"Removed Columns"[Data], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", List,List),
#"Filtered Rows" = Table.SelectRows(#"Expanded Data", each [Custom]=null or [Index2] > 14),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Custom", "Index2"})
in #"Removed Columns1"
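I avoided using Table.RemoveFirstN() on the groups in the code above in case there are leading rows (before the first "Page") that you want to keep, since it would also trim the start of that leading group. If that is not a concern, you could use it instead of adding the 2nd index and filtering, as below: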
let Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Merged Price Country", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if Text.Contains([Merged Price Country],"Page") then [Index] else null otherwise null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
// drop the "Page" row plus the 13 rows after it (14 rows per group)
mGroup = Table.Group(#"Filled Down", {"Custom"}, {{"Data", each Table.RemoveFirstN(_, 14), type table}}),
#"Removed Columns" = Table.RemoveColumns(mGroup,{"Custom"}),
// expand all columns
List = List.Union(List.Transform(#"Removed Columns"[Data], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", List,List),
#"Removed Columns1" = Table.RemoveColumns(#"Expanded Data",{"Index", "Custom"})
in #"Removed Columns1"
A different approach; I wonder which might be faster:
Create a list of rows to be removed (by row number)
Select the rows not in that list
let
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text", type text}, {"Data", Int64.Type}}),
//Add index column
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
//mark the rows to be removed with "RemoveMe"
textCol = List.Transform(#"Added Index"[Text], each
if _ = null then null
else if Text.Contains(_,"Page",Comparer.OrdinalIgnoreCase) then "RemoveMe"
else _),
//create list of positions to be removed
removePos = List.Combine(List.Transform(List.PositionOf(textCol,"RemoveMe",Occurrence.All), each {_..List.Min({_+13, List.Count(textCol)})})),
//Filter the table using the "RemoveMe" list
filter = Table.SelectRows(#"Added Index", each not List.Contains(removePos,[Index])),
#"Removed Columns" = Table.RemoveColumns(filter,{"Index"})
in
#"Removed Columns"
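The position-list idea above can also be written with the row count n pulled out into its own step and the list of positions wrapped in List.Buffer, so that it is held in memory rather than potentially re-evaluated for each row. This is a sketch on a hypothetical eight-row table with n = 2 and a case-sensitive "Page" test; whether List.Buffer actually speeds up your real query is an assumption worth testing against your data.

```powerquery
let
    // Hypothetical inline data; replace with your own source step
    Source = #table(
        type table [Text = nullable text],
        {{"a"}, {"Page 1"}, {"b"}, {"c"}, {"d"}, {"Page 2"}, {"e"}, {"f"}}),
    // rows to drop after each "Page" row (13 in the question; 2 keeps the sample small)
    n = 2,
    Indexed = Table.AddIndexColumn(Source, "Index", 0, 1),
    // true for every row whose text contains "Page"
    isPage = List.Transform(Indexed[Text], each _ <> null and Text.Contains(_, "Page")),
    // each marker position p expands to p, p+1, ..., p+n
    removePos = List.Buffer(
        List.Combine(
            List.Transform(List.PositionOf(isPage, true, Occurrence.All), each {_ .. _ + n}))),
    Filtered = Table.SelectRows(Indexed, each not List.Contains(removePos, [Index])),
    Result = Table.RemoveColumns(Filtered, {"Index"})
in
    Result
```

Here the "Page 1" and "Page 2" rows and the two rows after each are dropped, leaving a and d. Setting n to 13 gives the behaviour asked for in the question.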

Expanded column From List.Dates contracts after a subsequent merge

My data has invoiced rental with a start date and end date, which more often than not overlaps our fiscal periods. I used the function List.Dates to create records for each date between the start and end dates, which worked great. When trying to merge the data to get the fiscal periods for each new record, I lose all the listed dates except for the first one. Here is the advanced editor info:
let
Source = Covid19,
#"Removed Columns" = Table.RemoveColumns(Source,{"DTTRANS", "NOPRODUIT", "DSLIGNE", "QTEXP", "PXVENDANT", "MTLIGNE", "DTDEB", "DTFIN", "Location", "Tableau1.Nocardex"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"NoCardex", "COMLOC", "Facture", "JoursAjustés", "DateDébut", "DateFin", "ParJour"}),
#"Grouped Rows" = Table.Group(#"Reordered Columns", {"NoCardex", "COMLOC", "Facture", "JoursAjustés", "DateDébut", "DateFin"}, {{"LocationParJour", each List.Sum([ParJour]), type number}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Journee", each List.Dates([DateDébut],[JoursAjustés],#duration(1, 0, 0, 0))),
#"Expanded {0}" = Table.ExpandListColumn(#"Added Custom", "Journee"),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded {0}",{{"Journee", type date}}),
#"Removed Columns1" = Table.RemoveColumns(#"Changed Type",{"JoursAjustés", "DateDébut", "DateFin"}),
#"Merged Queries" = Table.NestedJoin(#"Removed Columns1", {"Journee"}, PériodesFiscales, {"DateTrans"}, "PériodesFiscales", JoinKind.LeftOuter),
#"Expanded {0}1" = Table.ExpandTableColumn(#"Merged Queries", "PériodesFiscales", {"Produit"}, {"PériodesFiscales.Produit"})
in
#"Expanded {0}1"
I am puzzled as to why I lose the dates. I am sure it is trivial. Hoping someone can help me figure this one out.
OK, this is a bit embarrassing. It turns out the problem had nothing to do with the expanded List.Dates: the merge changed the order of the records. I found this out after pasting 1,000 records into a spreadsheet to recreate the merge in Power Query without the expanded List.Dates, and saw that the merge had changed the sort order of the original record set. Sorry. :-)
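If the original order matters, one common pattern is to add an index column before the merge and sort on it afterwards. A minimal sketch, using hypothetical stand-ins for the expanded rental rows and the PériodesFiscales table (the column names come from the query above; the dates and period codes are made up):

```powerquery
let
    // Hypothetical stand-ins for the expanded rental rows and the fiscal-period table
    Rentals = #table(type table [Facture = text, Journee = date],
        {{"F1", #date(2020, 3, 30)}, {"F1", #date(2020, 3, 31)}, {"F1", #date(2020, 4, 1)}}),
    Periods = #table(type table [DateTrans = date, Produit = text],
        {{#date(2020, 3, 30), "P03"}, {#date(2020, 3, 31), "P03"}, {#date(2020, 4, 1), "P04"}}),
    // remember the row order before merging
    Indexed = Table.AddIndexColumn(Rentals, "RowOrder", 0, 1),
    Merged = Table.NestedJoin(Indexed, {"Journee"}, Periods, {"DateTrans"},
        "PériodesFiscales", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "PériodesFiscales",
        {"Produit"}, {"PériodesFiscales.Produit"}),
    // put the rows back in their pre-merge order, then drop the helper column
    Sorted = Table.Sort(Expanded, {{"RowOrder", Order.Ascending}}),
    Result = Table.RemoveColumns(Sorted, {"RowOrder"})
in
    Result
```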

Order by not working in MS Power Query on accented letters

I have Power Query in MS Excel 2016 and I order data by name, but I have accented letters such as š, č, ... which are currently sorted to the end of the dataset, when for example š should come after s and č after c. Is there a workaround for this? I guess maybe changing the encoding, but I can't find how.
#"Sorted Rows" = Table.Sort(#"Renamed Columns",{{"Name", Order.Ascending}})
The best way I can think of to do this is to create a calculated column where you replace those special values and then sort on that column.
#"Added Custom" = Table.AddColumn(#"Renamed Columns", "Custom", each Text.Replace(Text.Replace([Name],"š","sz"),"č","cz")),
#"Sorted Rows" = Table.Sort(#"Added Custom",{{"Custom", Order.Ascending}})
Once you've sorted, then you can delete that column.
I used your logic, Alexis, but added one step: lowercasing the column. I also replaced more values. I'm posting it in case somebody is interested. Thanks a lot, Alexis!
#"Added Custom2" = Table.AddColumn(#"Renamed Columns", "Custom", each [Name]),
#"Lowercased Text" = Table.TransformColumns(#"Added Custom2",{{"Custom", Text.Lower, type text}}),
#"Replaced Value" = List.Accumulate(
    // {accented character, replacement} pairs, applied in this order
    {{"á","az"}, {"č","cz"}, {"ď","dz"}, {"é","ez"}, {"ě","ez"}, {"í","iz"}, {"ň","nz"}, {"ó","oz"},
     {"ř","rz"}, {"š","sz"}, {"ť","tz"}, {"ú","uz"}, {"ů","uz"}, {"ý","yz"}, {"ž","zz"}},
    #"Lowercased Text",
    (state, pair) => Table.ReplaceValue(state, pair{0}, pair{1}, Replacer.ReplaceText, {"Custom"})),
#"Sorted Rows" = Table.Sort(#"Replaced Value",{{"Custom", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Custom"})
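Another option, not used in the answers above, is to compare the names with a culture-aware comparer instead of replacing characters. This sketch assumes Czech collation rules (culture "cs-CZ", guessed from the characters in the question), that Comparer.FromCulture is available in your version, and that Table.Sort accepts a two-argument comparison function; the names are hypothetical:

```powerquery
let
    // Hypothetical sample names; replace with your #"Renamed Columns" step
    Source = #table(type table [Name = text],
        {{"Tomáš"}, {"Čapek"}, {"Svoboda"}, {"Šimek"}, {"Cibulka"}}),
    // Assumption: Czech collation, case-insensitive
    czech = Comparer.FromCulture("cs-CZ", true),
    // compare two rows by their Name using the culture-aware comparer
    Sorted = Table.Sort(Source, (x, y) => czech(x[Name], y[Name]))
in
    Sorted
```

Under Czech rules Č should sort after C and Š after S, so no helper column is needed.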
