Power Query - Data Transformation

I'm new to Power BI and Power Query and am having some trouble transforming the data.
The data contains the processes for each sale category, with a status showing whether the manufacturing process has been completed. I need a new aggregate table with three calculated columns returning the following dates:
Start date which is defined as the first date the process enters the table
Predicted end date which is defined as the last date the process is shown in the table
Actual end date which is defined as the last instance the process status is equal to "Done"
I've managed to return the three dates, but each ends up on a separate row rather than all three on one row. Below are the original data and the required output.
Output Table
Would appreciate any assistance in transforming this data.

Most of it you can do using the Power Query UI:
Group by Month/Category/Process
Aggregations:
Start date => Min of Date
Estimated (or Predicted) end date => Max of Date
But then you need a custom aggregation that returns the max date after filtering the subtable for "Done" in the Status column.
You can do that in the Advanced Editor by editing the M code directly.
M Code
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Month", type date},
        {"Category", type text}, {"Process", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Month", "Category", "Process"}, {
        {"Start Date", each List.Min([Date]), type nullable date},
        {"Predicted End Date", each List.Max([Date]), type nullable date},
        //Custom aggregation to calculate Actual End Date
        //Note that we can Filter the table here, and then select the last date
        {"Actual End Date", each List.Max(Table.SelectRows(_, each [Status]= "Done")[Date]), type nullable date}
    })
in
    #"Grouped Rows"
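To try the grouping without a workbook, here is a minimal self-contained sketch that swaps the Excel source for a #table. The rows are hypothetical (the real data was only shown as an image), so the dates, category and process names are made up. It also illustrates that a process that never reaches "Done" gets null as its Actual End Date, because List.Max of an empty list returns null.

```powerquery-m
let
    // Hypothetical sample rows: "Cutting" is Done twice and then appears again;
    // "Welding" is never Done
    Source = #table(
        type table [Date = date, Month = date, Category = text, Process = text, Status = text],
        {
            {#date(2020, 1, 6),  #date(2020, 1, 1), "A", "Cutting", "In Progress"},
            {#date(2020, 1, 13), #date(2020, 1, 1), "A", "Cutting", "Done"},
            {#date(2020, 1, 20), #date(2020, 1, 1), "A", "Cutting", "Done"},
            {#date(2020, 1, 27), #date(2020, 1, 1), "A", "Cutting", "In Progress"},
            {#date(2020, 1, 6),  #date(2020, 1, 1), "A", "Welding", "In Progress"},
            {#date(2020, 1, 13), #date(2020, 1, 1), "A", "Welding", "In Progress"}
        }
    ),
    #"Grouped Rows" = Table.Group(Source, {"Month", "Category", "Process"}, {
        {"Start Date", each List.Min([Date]), type nullable date},
        {"Predicted End Date", each List.Max([Date]), type nullable date},
        // The inner "each" filters the group's sub-table (_) row by row;
        // an empty result makes List.Max return null
        {"Actual End Date", each List.Max(Table.SelectRows(_, each [Status] = "Done")[Date]), type nullable date}
    })
in
    #"Grouped Rows"
```

With these rows, Cutting should come out as 6 Jan 2020 / 27 Jan 2020 / 20 Jan 2020 and Welding as 6 Jan 2020 / 13 Jan 2020 / null.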
Original Data
Results

Related

PowerQuery filter on 20th of the month no matter the month or year

I am trying to filter a list on the 20th of the month, as this has been given as a significant date to identify a specific subset of records. There is no set date, just a set day, so it can be the 20th of any month in any year. Is there a way I can filter on these in Power Query?
Thanks
I assume you mean you want to filter a table so that it shows only the rows where the day is the 20th.
Let's also assume your data is loaded into Power Query and the date information is in a column named Date.
Add Column > Custom Column, with the formula
= Date.Day([Date])
(See the Power Query M function reference list.)
Click the filter drop-down at the top of that new column and select only 20.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Date.Day([Date])),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = 20))
in
    #"Filtered Rows"
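The helper column isn't strictly necessary: Table.SelectRows can test Date.Day directly. A minimal sketch, assuming the same Table1 with a Date column. The explicit type conversion step is my addition, to make sure Date.Day receives date values.

```powerquery-m
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Make sure Date holds date values before extracting the day
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Date", type date}}),
    // Keep only rows whose day of the month is 20, in any month and year
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each Date.Day([Date]) = 20)
in
    #"Filtered Rows"
```

Rows with an empty Date are dropped as well, since Date.Day(null) is null and null = 20 evaluates to false.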

Insert rows for missing dates in Power Query

The starting point is the following table (a journal) in which entries are made for events on specific days:
Entity  Event             Date           Amount
0123    acquisition       05.05.2015  10,000.00
0123    capital increase  30.11.2015   1,000.00
0123    write-off         31.12.2017  -4,000.00
0123    write-up          31.12.2019   3,000.00
This journal is loaded into Power Query to be enhanced with additional information from other sources.
The goal is a Power Pivot table in which the amounts are summarized as at 31.12. of each year (Subtotals).
Year           Entity  Event             Date           Amount
2015           0123    acquisition       05.05.2015  10,000.00
2015           0123    capital increase  30.11.2015   1,000.00
2015 Subtotal  0123                                  11,000.00
2016 Subtotal  0123                                  11,000.00
2017           0123    write-off         31.12.2017  -4,000.00
2017 Subtotal  0123                                   7,000.00
2018 Subtotal  0123                                   7,000.00
2019           0123    write-up          31.12.2019   3,000.00
2019 Subtotal  0123                                  10,000.00
2020 Subtotal  0123                                  10,000.00
The question is how to insert rows in Power Query for years where no activity (event) has occurred (no entry in the journal) so that a subtotal can be shown in Power Pivot as of 31.12. of each year.
I hope I could explain my issue in an understandable way. Thanks in advance for your help!
Kind regards,
Joerg
See if something like this works for you. There are shorter, more confusing ways to do it.
Get the minimum and maximum year across all the data and create a table of all combinations of years and entities. Check which of those combinations are used in the journal; for each one that isn't, append that year and entity to the original table with a date of 31 December.
There is a bit of self-merging, which requires pasting this into Home > Advanced Editor, since not all of it can be done in the user interface.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Entity as text keeps codes such as "0123" intact (Int64 would turn them into 123); Amount as number keeps decimals
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Entity", type text}, {"Event", type text}, {"Date", type date}, {"Amount", type number}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date])),
    // Create table of all possible Entities and Years
    DateList = {Date.Year(List.Min(#"Added Custom"[Date])) .. Date.Year(List.Max(#"Added Custom"[Date]))},
    Entities = Table.AddColumn(Table.Distinct(Table.SelectColumns(#"Added Custom",{"Entity"})),"Year", each DateList),
    #"Expanded Year" = Table.ExpandListColumn(Entities, "Year"),
    // Find Entity/Year combinations with no journal entry and append them to the original data set
    #"Merged Queries" = Table.NestedJoin(#"Expanded Year",{"Year", "Entity"},#"Added Custom",{"Year", "Entity"},"Table2",JoinKind.LeftOuter),
    #"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Table2", {"Date"}, {"Date2"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Table2", each ([Date2] = null)),
    #"Added Custom1" = Table.AddColumn(#"Filtered Rows", "Date", each #date([Year],12,31), type date),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Date2", "Year"}),
    #"Appended Query" = Table.Combine({#"Changed Type", #"Removed Columns" })
in
    #"Appended Query"
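One possibly shorter route skips the self-merge and uses List.Difference to find the missing years per entity. This is a different technique from the answer above, shown as a self-contained sketch: the journal is typed in as a hypothetical #table copy of the question's four rows, and the step names are my own.

```powerquery-m
let
    // Hypothetical inline copy of the journal; Entity is text so "0123" keeps its leading zero
    Journal = #table(
        type table [Entity = text, Event = text, Date = date, Amount = number],
        {
            {"0123", "acquisition", #date(2015, 5, 5), 10000},
            {"0123", "capital increase", #date(2015, 11, 30), 1000},
            {"0123", "write-off", #date(2017, 12, 31), -4000},
            {"0123", "write-up", #date(2019, 12, 31), 3000}
        }
    ),
    // Every year between the first and the last journal entry
    AllYears = {Date.Year(List.Min(Journal[Date])) .. Date.Year(List.Max(Journal[Date]))},
    // For each entity, one placeholder row dated 31.12. for every year without an entry
    FillerRows = Table.Combine(
        List.Transform(
            List.Distinct(Journal[Entity]),
            (entity) =>
                let
                    UsedYears = List.Transform(Table.SelectRows(Journal, each [Entity] = entity)[Date], Date.Year),
                    MissingYears = List.Difference(AllYears, UsedYears)
                in
                    Table.FromRecords(
                        List.Transform(MissingYears, (year) =>
                            [Entity = entity, Event = null, Date = #date(year, 12, 31), Amount = null])
                    )
        )
    ),
    Combined = Table.Combine({Journal, FillerRows}),
    Sorted = Table.Sort(Combined, {{"Entity", Order.Ascending}, {"Date", Order.Ascending}})
in
    Sorted
```

With the four journal rows, this should add placeholder rows dated 31.12.2016 and 31.12.2018. Like the answer above, it only fills years up to the last journal entry (2019); the 2020 subtotal in the desired output would need a later upper bound for AllYears, for example Date.Year(DateTime.LocalNow()).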

Insert current date in column name

I'm starting to use Power Query in Excel 365 (desktop install). Is there a way to change the column name to append or prepend today's date to the column name? If the column is named "Size" I'd like the column to be named "Size_2019_04_18". The exact format of the date doesn't matter.
1. Go to the Power Query Editor.
2. Open the Advanced Editor.
3. Add the code below (M is case-sensitive):
let
    ...
    NewName = "Size_"&Date.ToText(DateTime.Date(DateTime.LocalNow())),
    #"Changed Type" = Table.TransformColumnTypes(Sheet1_Table,{{"Size", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Size", NewName}})
in
    #"Renamed Columns"
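If you want exactly the "Size_2019_04_18" shape, Date.ToText also accepts a format string. A self-contained sketch; the two-row table is a hypothetical stand-in for the elided source steps (Sheet1_Table above).

```powerquery-m
let
    // Hypothetical one-column table standing in for the real source
    Source = #table({"Size"}, {{10}, {20}}),
    Today = Date.From(DateTime.LocalNow()),
    // "yyyy_MM_dd" produces text such as "2019_04_18"; underscores are copied literally
    NewName = "Size_" & Date.ToText(Today, "yyyy_MM_dd"),
    #"Renamed Columns" = Table.RenameColumns(Source, {{"Size", NewName}})
in
    #"Renamed Columns"
```

Keep in mind that the column name changes every day the query is refreshed, so any later step that refers to the column by its old name will break.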
Test Result:

PowerQuery: taking the average of each of many columns

I'm new to PowerQuery and I have a table that is essentially a matrix of dates and hours within those days: the first column holds each date and the rest of the columns are labeled 1 through 24. An example is:
Date H1 H2 H3 H4 ...
---- -- -- -- --
Jan 1
Jan 2
Jan 3
...
This is stored in an Excel file that is quite large, so I want to be able to simply query that file and pull subsets of the data. One example is the average hourly number by year. In SQL this would be "SELECT YEAR(Date), AVG(H1), AVG(H2), ... FROM SourceTable GROUP BY YEAR(Date)". However, in Power Query it seems that Group By generates one new column per aggregation, so I would have to repeat the operation 24 times in this case, or more if I had data by seconds, for example (to be fair, in the SQL query you also have to type out each column unless you use scripting). Is there a simpler approach to generate my desired table (essentially collapsing each column to its average), or do I need to manually add each column?
You can unpivot your hour columns and then you only need to group by year and the unpivoted attribute column.
I made a sample table of your data like this and loaded it into Power Query. I converted the Date column to Year only, used Unpivot Other Columns with the Date column selected, and then grouped by the Date and Hour columns. The result looks like this:
You can, of course, re-pivot the data afterwards if you want, inside or outside Power Query. This is what the code looks like in Power Query, but it was all created with the normal menu options, not written by hand.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Extracted Year" = Table.TransformColumns(Source,{{"Date", Date.Year, Int64.Type}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Extracted Year", {"Date"}, "Hour", "Value"),
    #"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Date", "Hour"}, {{"Average", each List.Average([Value]), type number}})
in
    #"Grouped Rows"
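If you would rather keep the wide layout and skip unpivoting, note that Table.Group takes a list of {column name, function, type} aggregations, and that list can be built from the column names instead of typed out 24 times. A sketch, assuming the same Table1 where every column other than Date is a numeric hour column; HourColumns and Aggregations are my own step names.

```powerquery-m
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Extracted Year" = Table.TransformColumns(Source, {{"Date", Date.Year, Int64.Type}}),
    // Every column except Date is treated as an hour column
    HourColumns = List.RemoveItems(Table.ColumnNames(#"Extracted Year"), {"Date"}),
    // One {name, aggregation, type} entry per hour column; _ is each year's sub-table
    Aggregations = List.Transform(
        HourColumns,
        (column) => {column, each List.Average(Table.Column(_, column)), type nullable number}
    ),
    #"Grouped Rows" = Table.Group(#"Extracted Year", {"Date"}, Aggregations)
in
    #"Grouped Rows"
```

This gives one row per year with one average column per hour, which is the same shape you would get by re-pivoting the unpivoted result.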

Power Query, row by row the sum of the next 3 values

I have a Power Query table with one column of integer values. In another column, I want the sum of the current row and the next two rows, calculated row by row. In plain Excel, I calculate it like this:
B1: = SUM(B1:B3)
B2: = SUM(B2:B4)
B3: = SUM(B3:B5)
...
How can I solve this with Power Query? If errors occur in the last two rows, that's negligible.
Thanks and regards
Guenther
Is this what you're looking for?
If you start with this as your Source table:
Then if you add a custom column set up like this:
You'll get this:
Here's the M code, loading the data from the current workbook, where it is in a table named Table1:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Custom", each List.Sum(List.Range(Source[Column1],[Column1]-1,3)))
in
    #"Added Custom"
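Note that this formula uses the value in Column1 ([Column1]-1) as the list offset, so it only lines up when Column1 happens to contain 1, 2, 3, ... in order. For arbitrary integers, a row index can supply the offset instead. A sketch assuming the same Table1/Column1; the Index and Sum3 column names are my own.

```powerquery-m
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Buffer the values once so each row doesn't re-read the column
    Values = List.Buffer(Source[Column1]),
    // 0-based row number, independent of what the cells contain
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Current row plus the next two; near the end List.Range simply returns fewer items
    #"Added Sum" = Table.AddColumn(#"Added Index", "Sum3", each List.Sum(List.Range(Values, [Index], 3)), type number),
    #"Removed Index" = Table.RemoveColumns(#"Added Sum", {"Index"})
in
    #"Removed Index"
```

Instead of errors, the last two rows get partial sums (two values, then one), which you can filter out if you don't want them.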
