How would one split the data in a column of an unstructured format and put it back into a structured form? - powerquery

The table I am dealing with has two structured fields (SERIAL_NO and EXP_DATE) and a third field for unstructured data entry (COMMENTS). Additional serial numbers and dates are placed in the unstructured field, and I need to turn them into structured data.
OPER_KEY | TIME_STAMP | SERIAL_NO | EXP_DATE | COMMENTS
35374 | 1/12/2021 11:30 | M161001 | 10/31/2021 | M190426, 5/31/2021
35374 | 1/8/2021 13:59 | M161001 | 10/31/2021 | M190426 , 5/31/2021
35374 | 1/25/2021 15:32 | M190426 | 5/31/2021 | M161001, EXP 10/31/21
35413 | 9/13/2019 16:21 | M151144 | 11/30/2019 | EXAMPLE TEXT
35413 | 9/12/2019 11:44 | M15144 | 11/30/2019 |
35413 | 9/14/2019 12:15 | M190426 | 4/30/2020 | M151144 M190426
35058 | 1/14/2019 8:53 | M180117 | 1/31/2019 | E190426 5/31/2021 M161001 10/31/21
There are no easy delimiters. Each serial number is a single letter followed by 6 digits, and is sometimes followed by a date. These need to be merged back into the original table like so.
OPER_KEY | TIME_STAMP | SERIAL_NO | EXP_DATE
35374 | 1/12/2021 11:30 | M161001 | 10/31/2021
35374 | 1/12/2021 11:30 | M190426 | 5/31/2021
35374 | 1/8/2021 13:59 | M161001 | 10/31/2021
35374 | 1/8/2021 13:59 | M190426 | 5/31/2021
35374 | 1/25/2021 15:32 | M190426 | 5/31/2021
35374 | 1/25/2021 15:32 | M161001 | 10/31/21
35413 | 9/13/2019 16:21 | M151144 | 11/30/2019
35413 | 9/12/2019 11:44 | M15144 | 11/30/2019
35413 | 9/14/2019 12:15 | M190426 | 4/30/2020
35413 | 9/14/2019 12:15 | M151144 |
35413 | 9/14/2019 12:15 | M190426 |
35058 | 1/14/2019 8:53 | M180117 | 1/31/2019
35058 | 1/14/2019 8:53 | E190426 | 5/31/2021
35058 | 1/14/2019 8:53 | M161001 | 10/31/21
I can reference the original table and then group the section by "all rows" immediately to work with the data and later append the original table with the new data. The issue I am having is how to successfully parse out the relevant data into columns. Any recommendations on how to extract this information?

Here's one way that works on your example data. Read the code comments and explore the Applied Steps to understand the algorithm:
let
    Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OPER_KEY", Int64.Type}, {"TIME_STAMP", type datetime}, {"SERIAL_NO", type text}, {"EXP_DATE", type date}, {"COMMENTS", type text}}),
    //Create list of Serial numbers from the Comments Column
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Comment SN", each if [COMMENTS] = null then {null} else
        let
            #"Split Text" = List.RemoveItems(Text.SplitAny([COMMENTS],", #(lf)"),{""}),
            #"Serial Nos" = List.Transform(
                List.Select(
                    List.Transform(#"Split Text", (li)=>Splitter.SplitTextByCharacterTransition({"A".."Z","a".."z"}, {"0".."9"})(li)),
                    each List.Count(_)=2),
                each Text.Combine(_,""))
        in
            #"Serial Nos", type list),
    //Create list of dates from the Comments column
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Comments Dates", each if [COMMENTS] = null then {null} else
        let
            #"Split Text" = List.RemoveItems(Text.SplitAny([COMMENTS],", #(lf)"),{""}),
            Dates = List.Select(#"Split Text", (li)=> Record.Field(try Date.From(li), "HasError")=false)
        in
            Dates, type list),
    #"Removed Columns" = Table.Buffer(Table.RemoveColumns(#"Added Custom1",{"COMMENTS"})),
    //create the results table
    //First row is from the first four columns
    //For subsequent rows we extract the corresponding SN and Date from the Comments column
    #"Add Result Column" = Table.AddColumn(#"Removed Columns","Result", (tr)=>
        List.Generate(
            ()=>[r=Record.FromList({tr[OPER_KEY],tr[TIME_STAMP],tr[SERIAL_NO],tr[EXP_DATE]},{"OPER_KEY","TIME_STAMP","SERIAL_NO","EXP_DATE"}), idx=0] ,
            each [idx] <= List.Count(tr[Comment SN]),
            each [r=Record.FromList({null, null, tr[Comment SN]{[idx]}, try tr[Comments Dates]{[idx]} otherwise null},{"OPER_KEY","TIME_STAMP","SERIAL_NO","EXP_DATE"}), idx=[idx]+1],
            each [r]
        ) ),
    //Remove all the columns except Results
    //and expand
    #"Removed Columns1" = Table.RemoveColumns(#"Add Result Column",{"OPER_KEY", "TIME_STAMP", "SERIAL_NO", "EXP_DATE", "Comment SN", "Comments Dates"}),
    #"Expanded Result" = Table.ExpandListColumn(#"Removed Columns1", "Result"),
    #"Expanded Result1" = Table.ExpandRecordColumn(#"Expanded Result", "Result", {"OPER_KEY", "TIME_STAMP", "SERIAL_NO", "EXP_DATE"}, {"OPER_KEY", "TIME_STAMP", "SERIAL_NO", "EXP_DATE"}),
    //set the data types
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Result1",{{"OPER_KEY", Int64.Type}, {"TIME_STAMP", type datetime}, {"SERIAL_NO", type text}, {"EXP_DATE", type date}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type1", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Filled Down" = Table.FillDown(#"Removed Blank Rows",{"OPER_KEY", "TIME_STAMP"})
in
    #"Filled Down"
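
To see the token classification in isolation, here is a minimal sketch that applies the same splitting and filtering to a single comment string. The step names (Comment, Tokens, SplitLetterDigit, SerialNos, Dates) are made up for illustration, and the date test depends on the locale Power Query is running under, so the Dates result assumes a culture that reads month/day/year text.

```powerquery
let
    // Hypothetical single comment taken from the sample data
    Comment = "M161001, EXP 10/31/21",
    // Split on comma, space, or line feed, then drop the empty pieces
    Tokens = List.RemoveItems(Text.SplitAny(Comment, ", #(lf)"), {""}),
    // Splits a text wherever a letter is immediately followed by a digit
    SplitLetterDigit = Splitter.SplitTextByCharacterTransition({"A".."Z","a".."z"}, {"0".."9"}),
    // A token counts as a serial number if it has exactly one letter-to-digit transition
    SerialNos = List.Select(Tokens, each List.Count(SplitLetterDigit(_)) = 2),
    // A token counts as a date if Date.From can convert it without an error
    Dates = List.Select(Tokens, each not (try Date.From(_))[HasError])
in
    [SerialNos = SerialNos, Dates = Dates]
```

Note that the transition test is looser than "one letter followed by 6 digits": a token such as M15144 also passes, and "EXP" is dropped only because it contains no digits. If the format needs to be enforced strictly, an extra length check on each token would be one option.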

Related

Power Query M - Way to dynamically use Table.AddColumn using value literals to substitute for column names

Here is the script I have based on the steps generated:
let
Source = Sql.Database("BITESTDBSVR1", "DW_FINANCE"),
CORPDB_BTT_SCENARIO_DETAILS_WITH_BACKLOG = Source{[Schema="CORPDB",Item="BTT_SCENARIO_DETAILS_WITH_BACKLOG"]}[Data],
#"Filtered Rows" = Table.SelectRows(CORPDB_BTT_SCENARIO_DETAILS_WITH_BACKLOG, each [ScenarioHeaderID] = FromScenario or [ScenarioHeaderID] = ToScenario),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [ValueTypeName] <> "Backlog"),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"ID", "ContractID", "LineNumber", "Ophours", "Prime", "ScenarioHeaderID", "Division", "ORN", "BidNumber", "CustomerContractNumber", "ProjectID", "CustomerOrderID", "CurrentStatus", "CustomerBuyer", "ProgramName", "ProgramNotes", "Notes", "MajorProgram", "ProductFamily", "ProductSubFamily", "ProductGroup", "SeriesGroup", "Turret", "ProgramSeriesID", "OrderQuantity", "ContractValueSource", "SourceCurrencyID", "BookingFXRate", "ContractValueUSDollars", "PODateIn", "AcceptanceDate", "ContractAwardDate", "ContractAwardYear", "Probability", "FactoredValue", "ProductAccountManager", "ProgramManager", "InterIntraDivision", "InterIntraExternal", "Territory", "IDIQ", "Platform", "CustomerBuyerCountry", "EndUser", "LHXRegion", "LHXName", "ProcurementLocation", "CustomerType1", "CustomerType2", "ExternalCustomer", "SalesType", "BidType", "PricingType", "ExportLicenceNumber", "USExportStatus", "CANExportStatus", "ExportComments", "RevenueRecognitionCriteria", "IncoTerms", "ShipToCountry", "PaymentTerms", "EOIRMaster", "EOIRChildID", "BidDivision", "SubProduct", "ProductLine", "ServiceAccountManager", "BusinessUnit", "OffsetCapture", "EndUserBusinessPartner", "GeographicRegion", "GeographicSubRegion", "ContractDuration", "RevRecTiming", "ForecastRevenueMonths", "InvoiceNumber", "WorkOrdernumber", "SerialNumber", "OrderLine", "PKLineIdentifier", "Year1", "Year2", "Year3", "BI1", "BI2", "BI3", "RI1", "RI2", "RI3", "VMProject", "ServiceContractNo", "MainProject", "LNProject", "ProjectContract", "SalesForceRef", "SalesRegion", "FiscalDate", "ScenarioCreated", "ScenarioUpdated", "ScenarioWeek", "Scenario", "ValueType"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[ScenarioName]), "ScenarioName", "Value", List.Sum),
#"Replaced Value" = Table.ReplaceValue(#"Pivoted Column",null,0,Replacer.ReplaceValue,{ScenarioFrom, ScenarioTo}),
#"Inserted Subtraction" = Table.AddColumn(#"Replaced Value", "Subtraction", each [#"Jan 15, 2021"] - [#"Mar 5, 2021"], type number),
#"Changed Type" = Table.TransformColumnTypes(#"Inserted Subtraction",{{"Subtraction", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Subtraction", "Variance"}}),
#"Merged Queries" = Table.NestedJoin(#"Renamed Columns", {"InterIntra"}, #"CORPDB INTER_INTRA_MAPPING", {"INTER DIVISION"}, "CORPDB INTER_INTRA_MAPPING", JoinKind.LeftOuter),
#"Expanded CORPDB INTER_INTRA_MAPPING" = Table.ExpandTableColumn(#"Merged Queries", "CORPDB INTER_INTRA_MAPPING", {"INTER DIVISION NAME"}, {"INTER DIVISION NAME"}),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded CORPDB INTER_INTRA_MAPPING",{"Forecast", "ProgramGroupID", "InterIntra", "INTER DIVISION NAME", "EndUserCountry", "ValueTypeName", "Month", "Year", "Quarter", "MonthNumber", "FiscalYearMonth", "QuarterStart", "QuarterEnd", ScenarioFrom, ScenarioTo , "Variance"}),
#"Renamed Columns1" = Table.RenameColumns(#"Reordered Columns",{{"INTER DIVISION NAME", "InterIntraDivision"}}),
#"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns1",{"InterIntra"})
in
#"Removed Columns1"
The part of the code I want to update is
#"Inserted Subtraction" = Table.AddColumn(#"Replaced Value", "Subtraction", each [#"Jan 15, 2021"] - [#"Mar 5, 2021"], type number),
Where I have 2 text variables (ScenarioFrom, ScenarioTo) that I want to dynamically substitute in the definition to say
#"Inserted Subtraction" = Table.AddColumn(#"Replaced Value", "Subtraction", each [Scenariofrom] - [Scenarioto] , type number),
I get that I'm trying to force a literal into a table column which is causing the problem, but wondering if there is a function / easy work-around without transforming the data entirely.
When you are defining a custom column, writing each [ColName] is short for each _[ColName] where _ represents the current row, which is a Record type.
With this in mind, we can define the code like this
each Record.Field(_, ScenarioFrom) - Record.Field(_, ScenarioTo)
instead of
each [#"Jan 15, 2021"] - [#"Mar 5, 2021"]
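
As a self-contained sketch of the Record.Field approach, the query below builds a small stand-in table and subtracts two columns whose names are held in text variables. The sample rows and the Forecast column values are invented for illustration; only ScenarioFrom, ScenarioTo, and the two column names come from the question.

```powerquery
let
    // Hypothetical stand-in for the pivoted table from the question
    Source = Table.FromRecords({
        [Forecast = "A", #"Jan 15, 2021" = 100, #"Mar 5, 2021" = 80],
        [Forecast = "B", #"Jan 15, 2021" = 50, #"Mar 5, 2021" = 75]
    }),
    // Column names supplied as text values (parameters in the real query)
    ScenarioFrom = "Jan 15, 2021",
    ScenarioTo = "Mar 5, 2021",
    // _ is the current row as a record, so Record.Field looks the column up by name
    AddedVariance = Table.AddColumn(
        Source,
        "Variance",
        each Record.Field(_, ScenarioFrom) - Record.Field(_, ScenarioTo),
        type number
    )
in
    AddedVariance
```

With these sample rows the Variance column comes out as 20 and -25. Changing ScenarioFrom or ScenarioTo to another existing column name changes which columns are subtracted without editing the step itself.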
This question is similar but uses the value in a different column rather than parameters/variables:
PowerQuery choose values based on a key column

Power query from Microsoft Exchange zip folder attachment

I am trying to get data from my email. The data is attached to an email and compressed in a zip folder. Is there a way to extract the contents of the zip folder from my email and into Power Query?
This is what I have so far
let
    Source = Exchange.Contents("xxxxx#xxx.com"),
    Mail1 = Source{[Name="Mail"]}[Data],
    #"Filtered Rows" = Table.SelectRows(Mail1, each ([Folder Path] = "\xxxx Report\")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", let latest = List.Max(#"Filtered Rows"[DateTimeReceived]) in each [DateTimeReceived] = latest),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1",{"Attachments"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each UnzipContent([Attachments]))
in
    #"Added Custom"
This gives me an error.
Please assist,
Vava

DateKey formatted as YYYYMMDD

I have the following calendar table script:
let
StartDate = #date(2016, 1, 1),
EndDate = #date(2018, 12, 31),
CurrentDate = DateTime.Date(DateTime.FixedLocalNow()),
FiscalYearEndMonth = 6,
ListDates = List.Dates(StartDate, Number.From(EndDate - StartDate)+1, #duration(1,0,0,0)),
#"Converted to Table" = Table.FromList(ListDates, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns as Date" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Date"}}),
#"Changed Type to Date" = Table.TransformColumnTypes(#"Renamed Columns as Date",{{"Date", type date}}),
#"Added Calendar MonthNum" = Table.AddColumn(#"Changed Type to Date", "MonthNum", each Date.Month([Date]), Int64.Type),
#"Added Calendar Year" = Table.AddColumn(#"Added Calendar MonthNum", "Year", each Date.Year([Date]), Int64.Type),
#"Added MonthYearNum" = Table.AddColumn(#"Added Calendar Year", "MonthYearNum", each [Year]*100 + [MonthNum] /*e.g. Sep-2016 would become 201609*/, Int64.Type),
#"Added Day Num" = Table.AddColumn(#"Added MonthYearNum", "DayNum", each Date.Day([Date])),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Datekey",{{"DayNum", Int64.Type}}),
#"Added Datekey" = Table.AddColumn(#"Added Day Num", "DateKey", each [Year]*100 + [MonthNum] + [DayNum]),
#"Changed Type" = Table.TransformColumnTypes(#"Changed Type1",{{"DayNum", Int64.Type}, {"DateKey", Int64.Type}})
in
#"Changed Type"
It results in the following:
So the last column is causing me the headache - I think the code for that column is:
each [Year]*100 + [MonthNum] + [DayNum]
It seems like the first two elements of this expression get concatenated as expected, but then DayNum is added mathematically. For example, for the 1st of January, 2016 is combined with 1 to give 201601, and then DayNum is added to give 201602. I want it to give 20160101 (format YYYYMMDD).
Why is this happening and what is the correct M/PowerQuery to avoid this behavior?
I believe you are correct as far as what it's doing. How about this instead?
each [Year]*10000 + [MonthNum]*100 + [DayNum]
or you could do
each [MonthYearNum]*100 + [DayNum]
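
The arithmetic can be checked on a few dates outside the calendar table. In the original expression, [Year]*100 + [MonthNum] gives 201601 for 1 January 2016, and adding DayNum on top simply yields 201602; multiplying by 10000 and 100 instead shifts the year and month far enough left to leave two digits for each part. The following is a minimal sketch, with the list name Dates and the sample dates chosen only for illustration:

```powerquery
let
    // A few sample dates, including single-digit months and days
    Dates = {#date(2016, 1, 1), #date(2016, 9, 15), #date(2018, 12, 31)},
    // Year shifted four digits left, month two digits left, then the day
    DateKeys = List.Transform(Dates, each Date.Year(_) * 10000 + Date.Month(_) * 100 + Date.Day(_))
in
    DateKeys // {20160101, 20160915, 20181231}
```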

Splitting a Column into new schema

Being relatively new to Power Query and M, I am attempting to split a column using a delimiter value, but also by the prefix into a given schema.
The raw table values are like the following:
20170101T1231_Name_A#1234_C#AB DEF_D#Comment
20170203T1543_A#11111_B#COL2_C#XYZ QRSTUV_D#Comment
I can use the _ as the delimiter to split the column into multiple columns, however the desired result is to have each of the prefixed # values in their own columns.
DATE&TIME | Text | A# | B# | C# | D#
20170101T1231 | Name | 1234 | | AB DEF | Comment
20170203T1543 | | 11111 | COL2 | XYZ QRSTUV | Comment
This code should do the trick:
let
Source = Table1,
#"Split Column by Position" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByPositions({0, 13}, false), {"Column1.1", "Column1.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Position",{{"Column1.1", type datetime}, {"Column1.2", type text}}),
Splitted = Table.TransformColumns(#"Changed Type",{{"Column1.2", each Text.Split(_,"_")}}),
#"Added Index" = Table.AddIndexColumn(Splitted, "Index", 0, 1),
#"Expanded Column1.2" = Table.ExpandListColumn(#"Added Index", "Column1.2"),
#"Filtered Rows" = Table.SelectRows(#"Expanded Column1.2", each [Column1.2] <> null and [Column1.2] <> ""),
AddedTextLabel = Table.TransformColumns(#"Filtered Rows",{{"Column1.2", each if Text.Contains(_,"#") then _ else " Text#"&_}}),
#"Inserted Text After Delimiter" = Table.AddColumn(AddedTextLabel, "Text After Delimiter", each Text.AfterDelimiter([Column1.2], "#", 0), type text),
#"Trimmed Text1" = Table.TransformColumns(#"Inserted Text After Delimiter",{{"Column1.2", each Text.Start(_,1+Text.PositionOf(_,"#"))}}),
#"Pivoted Column" = Table.Pivot(#"Trimmed Text1", List.Sort(List.Distinct(#"Trimmed Text1"[Column1.2])), "Column1.2", "Text After Delimiter"),
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"Column1.1", "DATE&TIME"}, {" Text#", "Text"}}),
#"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Index"})
in
#"Removed Columns"

text contains another field (power query)

i need to filter a field which contains another field in the same query.
#"Filtered Rows" = Table.SelectRows(#"Filtered Rows1", each Text.Contains([ACIKLAMA], [SANTIYE]))
the error i got is
Expression.Error: The field 'SANTIYE' of the record wasn't found.
full code :
let
Source = Table.NestedJoin(Query1,{"Sicil No", "TARIH"},#"IK Bordro",{"Personel Kodu", "Bordro Tarihi"},"NewColumn",JoinKind.LeftOuter),
#"Expanded NewColumn" = Table.ExpandTableColumn(Source, "NewColumn", {"Santiye", "Taseron", "Turk/Yerel"}, {"Santiye", "Taseron", "Turk/Yerel"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded NewColumn",{{"TARIH", type date}}),
#"Added Conditional Column" = Table.AddColumn(#"Changed Type", "Bordro", each if [#"Turk/Yerel"] = "YEREL" then "YEREL BORDRO" else if Text.Contains([ACIKLAMA], "RUBLE") then "TURK RUBLE" else if Text.Contains([ACIKLAMA], "USD") then "TURK USD" else if Text.Contains([ACIKLAMA], "RUB") then "TURK RUBLE" else if Text.Contains([ACIKLAMA], "IZIN") then "TURK IZIN HAKKI" else if Text.Contains([ACIKLAMA], "IHBAR") then "TURK IHBAR HAKKI" else if Text.Contains([ACIKLAMA], "KIDEM") then "TURK KIDEM HAKKI" else "DIGER" ),
#"Reordered Columns" = Table.ReorderColumns(#"Added Conditional Column",{"Sicil No", "HESAP ADI", "TARIH", "ACIKLAMA", "Santiye", "Taseron", "Turk/Yerel", "Bordro", "Ruble Tahakkuk", "USD Tahakkuk"}),
#"Filtered Rows1" = Table.SelectRows(#"Reordered Columns", each [Santiye] <> null),
#"Filtered Rows" = Table.SelectRows(#"Filtered Rows1", each Text.Contains([ACIKLAMA], [Santiye] ))
in
#"Filtered Rows"
any ideas, workarounds ?
Maybe you are mixing up step names and field names?
It works fine with me as you can see here.
Code generated:
let
Source = ..... (table created, code not relevant),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ACIKLAMA", type text}, {"SANTIYE", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each Text.Contains([ACIKLAMA],[SANTIYE]))
in
#"Filtered Rows"
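
One thing worth checking in cases like this is the spelling of the field name, because field names in M are case-sensitive: the error message refers to 'SANTIYE', while the column expanded in the full code is named "Santiye". The sketch below uses invented sample rows to show the row-by-row comparison working when the name matches exactly.

```powerquery
let
    // Hypothetical sample rows; only the column names come from the question
    Source = Table.FromRecords({
        [ACIKLAMA = "MOSKOVA SANTIYE USD", Santiye = "MOSKOVA"],
        [ACIKLAMA = "IZIN HAKKI", Santiye = "KAZAN"]
    }),
    // Keeps rows whose ACIKLAMA text contains that same row's Santiye value
    Filtered = Table.SelectRows(Source, each Text.Contains([ACIKLAMA], [Santiye]))
in
    Filtered
```

Only the first sample row is kept. Writing [SANTIYE] in place of [Santiye] against this table would raise the "field of the record wasn't found" error, since no column has that exact name.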

Resources