Using the Power Query GUI to nest tables

This follow-up question relates to:
Aggregation/Summation of text and numeric fields
Ron, could you please clarify what the GUI equivalent of the "group by" statements in your code is?
I can get close, but not to the same result as yours. Or is it not possible to nest tables in the GUI?
The "group by" portion of your code delivers this,
https://i.stack.imgur.com/rHz1B.png
I would like to achieve the same via the GUI
Below is what I am ultimately trying to achieve using code as the GUI did not work out as planned.
https://i.stack.imgur.com/23naf.png
I have tried nesting with Table.Group as follows:
Site
------->Agency
-------------->Division
Site
------->Agency
------->Division
Site
------->Agency
Site
------->Division
But none of these gives quite what I want. Any assistance would be greatly appreciated.

I really don't understand your reluctance to use the Advanced Editor, but here is a method of adding a bunch of custom columns, each with their own custom formula, to get the results you show.
Paste in the M code and change the Source as per the previous instructions.
Then you can double click on any of the added custom columns in the Applied Steps to see the custom formula that was used.
I renamed the steps so they would be easier to follow.
Any step with a 'gear' icon on the right can be opened to examine the associated dialog box.
let
Source = Excel.CurrentWorkbook(){[Name="Table15"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"SiteName", type text}, {"Agency", type text}, {"Division", type text}, {"Staff Numbers", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"SiteName"},
{{"ALL", each _, type table [SiteName=nullable text, Agency=nullable text, Division=nullable text, Staff Numbers=nullable number]}}),
Agency = Table.AddColumn(#"Grouped Rows", "Agency", each Text.Combine(List.Distinct([ALL][Agency]),"/")),
#"Agency Staff Numbers Tbl" = Table.AddColumn(Agency, "Agency Staff Numbers tbl", each
Table.Group([ALL],"Agency", {"Agency Staff Numbers",each List.Sum([Staff Numbers])})),
#"Agency Staff Numbers" = Table.AddColumn(#"Agency Staff Numbers Tbl", "Agency Staff Numbers",
each Text.Combine(List.Transform([Agency Staff Numbers tbl][Agency Staff Numbers],each Text.From(_)),"/")),
#"Removed Columns" = Table.RemoveColumns(#"Agency Staff Numbers",{"Agency Staff Numbers tbl"}),
Division = Table.AddColumn(#"Removed Columns", "Division", each Text.Combine(List.Distinct([ALL][Division]),"/")),
#"Division Staff Tbl" = Table.AddColumn(Division, "Divison Staff Numbers Tbl", each
Table.Group([ALL],"Division", {"Division Staff Numbers",(t)=> List.Sum(t[Staff Numbers])})),
#"Division Staff Numbers" = Table.AddColumn(#"Division Staff Tbl", "Division Staff Numbers", each
Text.Combine(List.Transform([Divison Staff Numbers Tbl][Division Staff Numbers],each Text.From(_)),"/")),
#"Removed Columns1" = Table.RemoveColumns(#"Division Staff Numbers",{"Divison Staff Numbers Tbl", "ALL"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns1",{
{"Agency", type text}, {"Agency Staff Numbers", type text}, {"Division", type text}, {"Division Staff Numbers", type text}
})
in
#"Changed Type1"
I also added another site for debugging purposes:

Related

DAX or Power Query | Correct value of record based on related records within same table

The problem I'm trying to solve is on a multi-doctor planner database. Due to human error and bad habits, three different appointment statuses indicate to the user that the appointment actually took place. Unfortunately, there is an exception, evident only to the user, that occurs whenever a patient has more than one appointment on the same day with the same doctor.
Case in point: Fulano de Tal had a multi-stage consult with Dr. Smith on the 9th, starting at 13:30 hrs. The first 2 sessions (APP_IDs 2 and 3) are easily identified as completed, but the one at 14:30 hrs had to have taken place, or it would have been outright marked as cancelled. The reason it is known that APP_ID 4 took place is that 2 and 3 were completed. Fulano did not show up for APP_ID 5, because it was on the next day, and there was no previous engagement on that day that could be used as a reference.
On the other hand, Pedrito was supposed to have a 3-stage consult with Dr. Doe. Pedrito did not show up for APP_ID 6, but he did arrive for 7 and 8. The completion of APP_ID 7 is evident, but we only know that 8 was completed because it was scheduled on the same day at a later hour, whereas APP_ID 6 was scheduled before the one we know for certain took place.
APP_ID  Planner ID  Patient        Date         Date_Time          System Status  Completed?
1       Dr. Smith   Juan Perez     09-dec-2022  09-dec-2022 12:00  Completed      YES
2       Dr. Smith   Fulano de Tal  09-dec-2022  09-dec-2022 13:00  In Consult     YES
3       Dr. Smith   Fulano de Tal  09-dec-2022  09-dec-2022 13:30  Waiting        YES
4       Dr. Smith   Fulano de Tal  09-dec-2022  09-dec-2022 14:00  Called Upon    should be YES
5       Dr. Smith   Fulano de Tal  10-dec-2022  10-dec-2022 14:30  Called Upon    NO
6       Dr. Doe     Pedrito        09-dec-2022  09-dec-2022 09:00  Called Upon    NO
7       Dr. Doe     Pedrito        09-dec-2022  09-dec-2022 09:30  Completed      YES
8       Dr. Doe     Pedrito        09-dec-2022  09-dec-2022 10:00  Called Upon    should be YES
What I need is a calculated column that returns YES whenever:
The status is either Completed, In Consult or Waiting (this is the easy part)
The status is Called Upon AND the patient has another appointment whose status is one of the above AND that appointment is on the same day AND the Called Upon appointment is scheduled at a later time than it.
I already tried it in DAX, using a calculated COUNTROWS, like in this post, with additional conditions within the filter. But I guess that because Power BI sorts the table to optimize storage, the EARLIER() function can't properly do a sweep based on date and time. Therefore, the solution might lie in Power Query, where I can use Table.Buffer to forcefully sort the table, but what I don't know how to do is add the calculated column that makes the full sweep to check for the easy condition and the four less easy ones.
A solution in either Power Query or DAX works for me.
Please help me out.
M / Powerquery method
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Date_Time", type datetime}}),
// x[...] refers to the current row being added to; a bare [...] refers to the row of the table being scanned
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom",
(x)=>Table.RowCount(Table.SelectRows( #"Changed Type", each
[Patient]=x[Patient]
and ([System Status]="Completed" or [System Status]="In Consult" or [System Status]="Waiting")
and x[System Status]="Called Upon"
and [Date]=x[Date]
and [Date_Time]<x[Date_Time]))
),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Completed", each if [System Status]="Completed" or [System Status]="In Consult" or [System Status]="Waiting" or [Custom]>0 then "YES" else "NO"),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Custom"})
in #"Removed Columns"
EDITED VERSION
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
process=(z as table)=> let
#"ProcessTable" = Table.AddColumn(z, "Custom",
(x)=>Table.RowCount(Table.SelectRows( z, each
([System Status]="Completed" or [System Status]="In Consult" or [System Status]="Waiting")
and x[System Status]="Called Upon"
and [Date_Time]<x[Date_Time]))
) in #"ProcessTable",
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Date_Time", type datetime}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Patient", "Date"}, {{"data", each process(_), type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"APP_ID", "Planner ID", "Date_Time", "System Status", "Custom"}, {"APP_ID", "Planner ID", "Date_Time", "System Status", "Custom"}),
#"Added Custom1" = Table.AddColumn(#"Expanded data", "Completed", each if [System Status]="Completed" or [System Status]="In Consult" or [System Status]="Waiting" or [Custom]>0 then "YES" else "NO")
in #"Added Custom1"
#horseyride
In the end, I went for an auxiliary table. I made a reference query in Power Query, selecting only the statuses that guarantee the completion of each appointment, and the columns needed to relate the ambiguous consults in the main table.
Although the appointment IDs are not unique, they almost are. So, even though a calculated distinct count proved to be too much for my machine, a COUNTROWS of a filtered table (by the relatable traits) proved quite fast.
Thank you #horseyride.
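For readers who want to stay in Power Query, the auxiliary-table idea can be sketched roughly as follows. This is not the poster's exact solution (they counted rows of the filtered table in DAX); it is a minimal M variation that assumes appointments are related by Patient and Date only, as in the answer above. The sample rows and the step names Confirmed, FirstConfirmed and Aux are made up for illustration.

```powerquery
let
    // Hypothetical sample rows modelled on the question's table (Pedrito's three appointments)
    Appointments = #table(
        type table [APP_ID = Int64.Type, Patient = text, Date = date, Date_Time = datetime, #"System Status" = text],
        {
            {6, "Pedrito", #date(2022, 12, 9), #datetime(2022, 12, 9, 9, 0, 0), "Called Upon"},
            {7, "Pedrito", #date(2022, 12, 9), #datetime(2022, 12, 9, 9, 30, 0), "Completed"},
            {8, "Pedrito", #date(2022, 12, 9), #datetime(2022, 12, 9, 10, 0, 0), "Called Upon"}
        }
    ),
    // Statuses that guarantee the appointment took place
    ConfirmedStatuses = {"Completed", "In Consult", "Waiting"},
    // Auxiliary table: confirmed rows only, reduced to the earliest confirmed time per patient and day
    Confirmed = Table.SelectRows(Appointments, each List.Contains(ConfirmedStatuses, [System Status])),
    FirstConfirmed = Table.Group(
        Confirmed,
        {"Patient", "Date"},
        {{"FirstConfirmed", each List.Min([Date_Time]), type datetime}}
    ),
    // Relate every appointment to the auxiliary table (assumption: Patient + Date is the right key)
    Merged = Table.NestedJoin(Appointments, {"Patient", "Date"}, FirstConfirmed, {"Patient", "Date"}, "Aux", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Aux", {"FirstConfirmed"}),
    // YES if the status is confirmed, or if a confirmed appointment exists earlier on the same day
    Result = Table.AddColumn(Expanded, "Completed", each
        if List.Contains(ConfirmedStatuses, [System Status])
            or ([FirstConfirmed] <> null and [FirstConfirmed] < [Date_Time])
        then "YES" else "NO", type text)
in
    Result
```

With these three sample rows, APP_ID 6 comes out as NO and APP_IDs 7 and 8 as YES. Because the auxiliary table has at most one row per patient and day, the join does not multiply rows, and each appointment needs only one comparison instead of a scan of the whole table.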

CountIf Equivalent in Power Query , counts per row within self

I need help creating a custom column that shows how many models there are per modality for each account. What would I need to enter in the custom column dialog in Power Query?
It depends on how many other columns you have. I don't see an account column, but you mention one.
In general, in Power Query, select the account and Modality columns, right-click, and use Group By. Use the Count Rows operation with a new column name of your choice.
Also click [Add aggregation] and use the All Rows operation for that one.
Then expand that column using the arrows at the top of the column to restore the columns that Group By removed.
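The GUI steps above generate M along the lines of the following sketch. The sample data and the column names Account, Modality and Model are assumptions, since the question's actual column names are not shown.

```powerquery
let
    // Hypothetical sample data; replace with your own source
    Source = #table(
        type table [Account = text, Modality = text, Model = text],
        {
            {"A1", "CT", "Model X"},
            {"A1", "CT", "Model Y"},
            {"A1", "MR", "Model Z"},
            {"A2", "CT", "Model X"}
        }
    ),
    // Group By on Account and Modality with two aggregations:
    // Count Rows produces the count, All Rows keeps the original rows as a nested table
    Grouped = Table.Group(
        Source,
        {"Account", "Modality"},
        {
            {"Models per Modality", each Table.RowCount(_), Int64.Type},
            {"AllRows", each _, type table [Account = text, Modality = text, Model = text]}
        }
    ),
    // Expand only the columns that were not used as grouping keys
    Expanded = Table.ExpandTableColumn(Grouped, "AllRows", {"Model"}, {"Model"})
in
    Expanded
```

After the expand, every original row is back and carries the count of its group, so both A1/CT rows show 2 in "Models per Modality".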
Edited answer to provide all potential combinations. Try:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ChildName", type text}, {"Modality", type text}, {"Model Info", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ChildName"}, {
{"Modality per ChildName", each Table.RowCount(_), Int64.Type},
{"Unique Modality per ChildName", each List.Count(List.Distinct(_[Modality])), Int64.Type},
{"data", each _, type table}
}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Modality", "Model Info"}, {"Modality", "Model Info"}),
#"Grouped Rows1" = Table.Group(#"Expanded data", {"ChildName", "Modality"}, {
{"data", each _, type table },
{"Model Info Per Modality", each Table.RowCount(_), Int64.Type},
{"Unique Model Info Per Modality", each List.Count(List.Distinct(_[Model Info])), Int64.Type}
}),
#"Expanded data1" = Table.ExpandTableColumn(#"Grouped Rows1", "data", {"Modality per ChildName", "Unique Modality per ChildName", "Model Info"}, {"Modality per ChildName", "Unique Modality per ChildName", "Model Info"})
in #"Expanded data1"

power query subtract row below from row above using multiple conditions

I am using Power Query in Excel and I need to calculate the duration on each "Door_side" using the Time column on a daily level for each individual user.
The data comes from a card based access system and is formatted as follows:
Date Time User_No Door_side
03/12 08:59 User_05 Outside
03/12 09:00 User_33 Inside
03/12 09:01 User_10 Outside
03/12 09:01 User_04 Outside
03/12 09:02 User_26 Outside
03/12 09:03 User_19 Outside
03/12 09:03 User_15 Inside
03/12 09:04 User_31 Inside
03/12 09:05 User_31 Outside
03/12 09:06 User_15 Outside
03/12 09:06 User_06 Inside
03/12 09:06 User_06 Inside
03/12 09:06 User_06 Inside
03/12 09:08 User_32 Outside
03/12 09:09 User_10 Inside
03/12 09:09 User_13 Inside
03/12 09:10 User_10 Outside
I tried the following:
Sorted the Rows by Date, User and Time;
Added Index column;
Created Custom column named PreviousTime;
Calculated Duration (Time - PreviousTime).
The full code for the above mentioned steps is:
let
Source = Table,
#"Sorted Rows" = Table.Sort(Source,{{"Date", Order.Ascending}, {"User_No", Order.Ascending}, {"Time", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "PreviousTime", each try
if List.AllTrue(
{[User_No]=#"Added Index"[User_No]{[Index]-1},[Date]=#"Added Index"[Date]{[Index]-1}
}
)
then try #"Added Index"[Time]{[Index]-1} otherwise [Time]
else [Time]
otherwise [Time]),
Duration = Table.AddColumn(#"Added Custom", "Duration", each [Time] - [PreviousTime], type duration)
in
Duration
This works on small data sets but causes functionality issues and completely fails on a larger amount of data.
I am fairly new to Power Query and M so I just can't figure out what exactly from the custom column formula causes issues or how to approach this in another way.
I tried to keep the above code as part of my query and also to use it as a function but there is not much difference functionality wise between these two approaches.
The processed table will be sent to the Data Model but I was hoping to obtain the duration in Power Query rather than in Power Pivot.
A big thank you in advance!
To detail on the task a bit more I uploaded a reduced version of the data, for 3 users for the month of December. You can find it here: https://1drv.ms/x/s!AocQlL_KAzymgwhqiKxSL5JMZheL.
What I want to achieve is to calculate the duration between the timestamps based on user and date.
As a plus I do not have users working past midnight so all timestamps for a specific shift will be within the same date.
An example of the desired outcome can be found within the workbook as well and looks like this (calculated in Excel):
Date Time User Door_side Duration
03/12 06:54 User_1 Outside
03/12 07:26 User_1 Inside 00:32:00
03/12 07:27 User_1 Outside 00:01:00
03/12 07:44 User_1 Inside 00:17:00
03/12 07:52 User_1 Outside 00:08:00
03/12 08:35 User_1 Inside 00:43:00
03/12 08:36 User_1 Outside 00:01:00
03/12 11:50 User_1 Inside 03:14:00
03/12 12:01 User_1 Outside 00:11:00
03/12 13:27 User_1 Inside 01:26:00
03/12 13:43 User_1 Outside 00:16:00
03/12 14:57 User_1 Inside 01:14:00
03/12 15:20 User_1 Inside 00:23:00
03/12 15:26 User_1 Outside 00:06:00
03/12 15:34 User_1 Inside 00:08:00
Because the data contains all users and multiple days I am attempting to do the calculations within tables grouped by Date and User.
I spent some time testing all 3 approaches presented below (List.Min, Table.FirstN & nested tables) and on a limited data set all of them do a great job.
However, when applied to a larger dataset (I have around 20000 rows for 1 month) the nested tables approach seems to be the fastest.
Thank you Eugene and Marc for helping and, more important, for teaching me something new.
Here's a different approach. It relies on working in nested tables.
I started with your data from your spreadsheet, in a table named Table1:
In Power Query, using Table1 as the source, I split the Booking Time column, renamed the resulting date and time columns, filtered out the "-" Doorside entries, and sorted per your guidance:
Then I grouped by Booking Date and User:
Then I added an index column within each of the nested tables, in a new custom column:
Then I added a new column with the previous time within each of the nested tables, in a new custom column:
(The error you see here is because there is no previous time.)
Then I added a new column, in each of the nested tables, that corrects the errors introduced when I added the previous time. I figured I would "correct" the errors caused by the missing previous times by replacing each error with the "current" Booking Time, which results in a duration of zero:
Then I added a new column with the durations calculated in each of the nested tables, in a new custom column:
Then I removed all columns except the last one I had added, which I had called AddDuration:
Then I expanded the AddDuration column:
Here's my M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-US"), "Booking time", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Booking time.1", "Booking time.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Booking time.1", "Booking Date"}, {"Booking time.2", "Booking Time"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Booking Date", type date}, {"Booking Time", type time}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Doorside] <> "-")),
#"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Booking Date", Order.Ascending}, {"User", Order.Ascending}, {"Booking Time", Order.Ascending}}),
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"Booking Date", "User"}, {{"AllData", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "AddIndex", each Table.AddIndexColumn([AllData],"Index",0,1)),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "AddPreviousTime", each let tblName = [AddIndex] in Table.AddColumn([AddIndex],"Previous Time",each tblName{[Index]-1}[Booking Time], type time)),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "CorrectErrors", each Table.ReplaceErrorValues([AddPreviousTime], {{"Previous Time", [AddPreviousTime][Booking Time]{0}}})),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "AddDuration", each Table.AddColumn([CorrectErrors],"Duration", each [Booking Time] - [Previous Time], type duration)),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"AddDuration"}),
#"Expanded AddDuration" = Table.ExpandTableColumn(#"Removed Other Columns", "AddDuration", {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"}, {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"})
in
#"Expanded AddDuration"
If I understood your task correctly, you need the time when the next event occurred, presuming this is the time the door was closed.
In this case I strongly recommend that you avoid using an index. Instead, I suggest thinking about how to apply a row selection procedure to get what you need for each row.
Here is what I think should work if my understanding of your task is correct:
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
SplitDateTime = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-GB"), "Booking time", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Date", "Time"}),
FilteredDoorside = Table.SelectRows(SplitDateTime, each ([Doorside] <> "-")),
ChangedType = Table.Buffer(Table.TransformColumnTypes(FilteredDoorside,{{"Date", type date}, {"Time", type time}, {"User", type text}, {"Doorside", type text}})),
GetCloseTime = Table.AddColumn(ChangedType, "Duration", (row)=>List.Min(Table.SelectRows(ChangedType, each [Date]=row[Date] and [Time]>row[Time])[Time]) - row[Time]),
SetType = Table.TransformColumnTypes(GetCloseTime,{{"Duration", type duration}})
in
SetType
In the GetCloseTime step I add a function column, which selects rows from the table itself with the same date and a later time, and then takes the minimum time. This will be the next event time. You can add additional criteria if you need.
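As an illustration of adding a criterion, the sketch below also matches on User, so that each row only looks at later events of the same user. The sample rows, the year in the dates and the step names Events, Later and Next are made up; the column names follow the query above.

```powerquery
let
    // Hypothetical sample; buffered so the repeated row selections read from memory
    Events = Table.Buffer(#table(
        type table [Date = date, Time = time, User = text],
        {
            {#date(2018, 12, 3), #time(6, 54, 0), "User_1"},
            {#date(2018, 12, 3), #time(7, 0, 0), "User_2"},
            {#date(2018, 12, 3), #time(7, 26, 0), "User_1"}
        }
    )),
    // For each row, find the earliest later event of the same user on the same date
    GetNextTime = Table.AddColumn(Events, "Duration", (row) =>
        let
            Later = Table.SelectRows(Events, each
                [User] = row[User] and [Date] = row[Date] and [Time] > row[Time])[Time],
            // List.Min returns null for an empty list, i.e. when there is no later event
            Next = List.Min(Later)
        in
            if Next = null then null else Next - row[Time],
        type nullable duration)
in
    GetNextTime
```

Here the 06:54 row of User_1 gets a duration of 32 minutes (until 07:26), and the two rows without a later event for the same user get null. Note that, as in the answer's query, the duration is attached to the earlier of the two events.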
Another way, instead of using List.Min, is to make a sorted derived table and take the value of the Time column from its first row: {0}[Time]
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
SplitDateTime = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-GB"), "Booking time", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Date", "Time"}),
FilteredDoorside = Table.SelectRows(SplitDateTime, each ([Doorside] <> "-")),
ChangedType = Table.Buffer(Table.TransformColumnTypes(FilteredDoorside,{{"Date", type date}, {"Time", type time}, {"User", type text}, {"Doorside", type text}})),
GetCloseTime = Table.AddColumn(ChangedType, "Duration", (row)=>Table.FirstN(Table.Sort(Table.SelectRows(ChangedType, each [Date]=row[Date] and [Time]>row[Time]),{{"Time", Order.Ascending}}),1){0}[Time] - row[Time]),
SetType = Table.TransformColumnTypes(GetCloseTime,{{"Duration", type duration}})
in
SetType

Inserting text manually in Powerquery

I'm merging multiple Excel files into one where the user can review the data and mark an additional Comment column as completed. Each day there are additional files, and I need to refresh the query and pull the new data in while keeping the original Comment column values.
I've attempted to do this by following Marcel Beug's video, but that uses an SQL table and I cannot seem to get it to work with Excel files as the source.
After the Merge Queries step I attempt to change the first table to my source "InputFile":
![Modify the Merge Formula1][2]
![Changed to last query step of InputFile][3]
![InputFile Query with Source2 and Merge][4]
![M Code of InputFile Query with Merge][5]
By setting the first field in the Merge formula to the last step in the InputFile query I was able to get around the cyclic reference error; however, I find that every refresh creates duplicate rows: 4 become 8, which then become 16, etc.
let
Source = Excel.Workbook(File.Contents("S:\Fin_Aid\Operations Team\COD mpn - lec\InputFiles\8.22.18 to 8.23.18.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
Rename_RecID = Table.RenameColumns(#"Removed Columns",{{"Column3.1", "RecID"}}),
Source2 = Excel.CurrentWorkbook(){[Name="InputFile"]}[Content],
InputWithComment = Table.TransformColumnTypes(Source2,{{"RecID", Int64.Type}, {"Column1", type text}, {"Column2", type text}, {"Column4", type text}, {"Column5", type text}, {"Comment", type text}}),
#"Merged Queries" = Table.NestedJoin(Rename_RecID,{"RecID"},InputWithComment,{"RecID"},"InputWithComment",JoinKind.LeftOuter),
#"Expanded InputWithComment" = Table.ExpandTableColumn(#"Merged Queries", "InputWithComment", {"Comment"}, {"Comment"})
in
#"Expanded InputWithComment"
Regards,
Jim

Power Query: issue with removing duplicates function

Problem when using "Remove Duplicates" function in Power Query.
I'm running Excel 2013 with Power Query and Power Pivot. Multiple txt files in the same folder were loaded into the data model by creating a connection. The tables look like this:
CoCd Doc.Id Plant PGroup Purch.Doc. Vendor
7200 411647 7200 U36 4800311931 2000031503
7020 421245 7020 D05 4800277051 2000032922
7200 404320 1000 8 4800000000 2000032944
7200 404321 7200 T48 4800293878 2000032944
7010 425013 7010 R21 4800346743 2000036726
There are 440k rows in total. By running a pivot table, I've identified 144k unique Doc.Ids.
I then selected the Doc.Id (Whole Number) column and used the "Remove Duplicates" function in Power Query to remove the duplicated rows. However, the final table only loaded 75k rows (it should be 144k). When I changed the data type of Doc.Id to text and then removed duplicates, the final table had 163k rows, which is somewhat correct as the Doc.Ids contain both "603" and " 603". Unfortunately I really need to have 144k rows in my final table.
Why doesn't the Remove Duplicates function work in my case with Doc.Id as a whole number?
The code in the Advanced Editor looks like this:
#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{{"CreateTime", type time}, {" TotalAmoun", Currency.Type}, {"Pst Date", type date}, {"Doc. Date", type date}, {"Due Date", type date}, {"DaysToDue", Int64.Type}, {"CreateDate", type date}, {"Cycle Time", type text}, {"Doc. Id", type text}, {"Purch.Doc.", Int64.Type}, {"Vendor", type text}, {"CoCd", Int64.Type}, {"Plant", type text}}),
#"Removed Duplicates" = Table.Distinct(#"Changed Type1", {"Doc. Id"})
in
#"Removed Duplicates"
After some further digging, it appears that a chunk of Doc.Ids is missing between "398103" and "657238", plus some random ones. An example list of missing numbers is below. I can't find any reason why they should be missing.
"245233"
"261404"
...
...
"398103"
...
...
"657238"
