I'm merging multiple Excel files into one where the user can review rows and mark an additional Comment column as completed. Each day there are additional files, and I need to refresh the query and pull the new data in while keeping the original Comment column values.
I've attempted to do this by referencing Marcel Beug's video, but that uses an SQL table as the source, and I cannot seem to get it to work with Excel files.
After the Merge Queries step, I attempt to change the first table of the merge to my source query, "InputFile":
(Screenshots: modifying the Merge formula; changing it to the last query step of InputFile; the InputFile query with Source2 and the Merge; the M code of the InputFile query with the Merge.)
By setting the first field in the Merge formula to the last step of the InputFile query I was able to get around the cyclic reference error, but now every refresh creates duplicate rows: 4 become 8, which then become 16, and so on.
let
    Source = Excel.Workbook(File.Contents("S:\Fin_Aid\Operations Team\COD mpn - lec\InputFiles\8.22.18 to 8.23.18.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    // (intermediate steps producing #"Removed Columns" were omitted from the post)
    Rename_RecID = Table.RenameColumns(#"Removed Columns",{{"Column3.1", "RecID"}}),
    // Self-reference: read the query's own loaded output table back in for the comments
    Source2 = Excel.CurrentWorkbook(){[Name="InputFile"]}[Content],
    InputWithComment = Table.TransformColumnTypes(Source2,{{"RecID", Int64.Type}, {"Column1", type text}, {"Column2", type text}, {"Column4", type text}, {"Column5", type text}, {"Comment", type text}}),
    #"Merged Queries" = Table.NestedJoin(Rename_RecID, {"RecID"}, InputWithComment, {"RecID"}, "InputWithComment", JoinKind.LeftOuter),
    #"Expanded InputWithComment" = Table.ExpandTableColumn(#"Merged Queries", "InputWithComment", {"Comment"}, {"Comment"})
in
    #"Expanded InputWithComment"
Regards,
Jim
I've been searching everywhere for a way to filter a column that contains both text and numbers; I want to keep only the numeric values from that column.
Thanks.
Add a column (Add Column > Custom Column), potentially with one of these:
= Text.Select([Column1], {"0".."9"})
= try Number.From([Column1]) otherwise "Text"
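As a fuller sketch, here is the second formula wrapped in a complete query via Table.AddColumn, assuming a source table named Table1 (a hypothetical name; change it to match your workbook):

let
    // "Table1" is a hypothetical name; point this at your actual table
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Numeric values convert cleanly; anything else is labelled "Text"
    Flagged = Table.AddColumn(Source, "NumberOrText", each try Number.From([Column1]) otherwise "Text")
in
    Flagged

You can then filter the NumberOrText column to keep the numbers and discard the "Text" rows.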
Try this:
let
    //Change next line to reflect Data source
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    //Change next line to include all columns and their names
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"COLUMN", type any}}),
    //Change next line to test the proper column
    #"Numbers Only" = Table.SelectRows(#"Changed Type", each not (try Number.From([COLUMN]))[HasError]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Numbers Only",{{"COLUMN", type number}})
in
    #"Changed Type1"
I have a Raw Data Table as shown in the screenshot below:
I want to group the data in the raw data table into the Output Table as shown in the screenshot below:
Basically, the output table counts the number of students at each understanding level in each intake. How should I get the output table from the raw data table? I'm still new to Power Query; any help will be greatly appreciated!
This is what I have tried:
Code:
= Table.Group(Source, {"Intake"}, {
{"Count_Little_Understand", each Table.RowCount(Table.SelectRows(_, each ([Topic 1] = "Little Understanding"))), Int64.Type},
{"Count_General_Understanding", each Table.RowCount(Table.SelectRows(_, each ([Topic 1] = "General Understanding"))), Int64.Type},
{"Count_Good_Understand", each Table.RowCount(Table.SelectRows(_, each ([Topic 1] = "Good Understanding"))), Int64.Type},
{"Count_Fully_Understand", each Table.RowCount(Table.SelectRows(_, each ([Topic 1] = "Fully Understand"))), Int64.Type}
})
I'm only able to get the table for an individual Topic. I'm not sure how to append the other Topics below it, or how to add an extra column labelling the Topic as shown in my second screenshot. I hope to get some advice on how I should modify the code. Thank you!
I've rebuilt a similar but shorter table:
First go to the Transform tab (1), select the Topic columns (2), and click Unpivot Columns (3).
Your table now looks like the following screenshot. Finally, select the Value column (1), click Pivot Column (2), and choose Employee Name as the values column (3).
Result:
You can Unpivot the Topic columns, then Pivot the Understanding column, using Count of Employee Name as the aggregate value.
Then simply reorder columns and sort rows to suit the output you need:
#"Unpivoted Topic" = Table.UnpivotOtherColumns(#"Raw Data Table", {"Employee Name", "Intake"}, "Topic", "Understanding"),
#"Pivoted Understanding" = Table.Pivot(#"Unpivoted Topic", List.Distinct(#"Unpivoted Topic"[Understanding]), "Understanding", "Employee Name", List.NonNullCount),
#"Reordered Columns" = Table.ReorderColumns(#"Pivoted Understanding",{"Intake", "Topic", "Little Understanding", "General Understanding", "Good Understanding", "Fully Understand"}),
#"Sorted Rows" = Table.Sort(#"Reordered Columns",{{"Topic", Order.Ascending}, {"Intake", Order.Ascending}})
Output:
I have the following data:

Con                            Payment Status   Count
HUMANABRATTEN,MICOL9/20/2021   Resubmitted      15
HUMANABRATTEN,MICOL9/20/2021   In-Process       1
The two Con values have exactly the same length, but when I try to remove duplicates it always removes the "Resubmitted" row, whereas I want to keep the Payment Status with the higher Count.
Normally in Excel, when we remove duplicates from any data, it always keeps the first value and removes the second. I don't know why it's not working that way in Power Query.
Power Query does not necessarily return results in the order you might expect. Even the sorts are unstable, if I recall correctly.
For your problem, one solution would be to use Table.Group and then extract the desired results. In your case that seems to be the max Count, together with the Payment Status that is in the same row as the max Count.
eg:
let
    Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Con", type text}, {"Payment Status", type text}, {"Count", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Con"}, {
        //Return the Payment Status cell that is in the same row as the max Count
        {"Payment Status", each _[Payment Status]{List.PositionOf(_[Count], List.Max(_[Count]))}},
        //Determine the max Count
        {"Count", each List.Max([Count]), type nullable number}})
in
    #"Grouped Rows"
I'm trying to create a measure that averages the 29 elements' [Overtime/Hours_worked] values into one cell, as visualised in the attached image.
Cell F32 currently shows [AverageA Total Overtime/Total Hours_worked], but I want it to be an average of the 29 rows' values, as displayed in cell H32: =AVERAGEA(F3:F31).
The elements' figures are based on underlying data from Data$, currently amounting to ~150k rows. When creating a measure that averages the elements' values from column E [AverageA Overtime/Hours_worked] and shows it as a percentage of the 29 elements' aggregate, I run into the problem of averaging each element's underlying data from Data$. Worth noting: F3:F31 is redundant in this instance; I'm looking for the average of the 29 elements' values in column E, not their respective averages shown in column F.
Am I right to use a measure here, or is there a better approach? If measures can be used, is there a way to design the measure so that it refers to the pivot table's displayed data instead of the underlying data from Data$, for instance by referring to column E in the pivot table?
Side note
The table needs to remain dynamic since Data$ is updated regularly. I'm relatively new to Power Query, so I'm not sure if there are other ways to solve this, e.g. through MDX, but I doubt I'll be able to sort that out myself.
Any and all help is appreciated, thanks.
I'm not sure how you are computing the individual entries in the AverageA Total Overtime/Total Hours_worked column (so I left it blank), but to compute the totals and averages for the other columns, you can use the Table.Group function in a special way, with an empty list for the key, so that the entire table is returned for the aggregation operations.
Given:
M Code (read the comments in the code to understand the algorithm). If your overtime % column is already in your original data, you can just delete the code lines that add it.
let
    //Be sure to change the table name in the next line to your actual table name
    Source = Excel.CurrentWorkbook(){[Name="wrkTbl"]}[Content],
    //Set data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Area", type text}, {"Hours_worked", Int64.Type}, {"Overtime", Int64.Type}}),
    //Add the percent-overtime column
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Overtime/Hours_worked",
        each [Overtime]/[Hours_worked], Percentage.Type),
    //Use Table.Group with an empty key to compute the totals/averages for the last row
    //Be sure to use exactly the same names as in the original table
    #"Grouped Rows" = Table.Group(#"Added Custom", {}, {
        {"Area", each "Totals", type text},
        {"Hours_worked", each List.Sum([Hours_worked]), Int64.Type},
        {"Overtime", each List.Sum([Overtime]), Int64.Type},
        {"Overtime/Hours_worked", each List.Sum([Overtime])/List.Sum([Hours_worked]), Percentage.Type},
        {"AverageA Overtime/Hours_worked", each List.Average([#"Overtime/Hours_worked"]), Percentage.Type}
    }),
    //Append the Totals row to the detail rows
    append = Table.Combine({#"Added Custom", #"Grouped Rows"})
in
    append
Results in:
Problem when using the "Remove Duplicates" function in Power Query.
I'm running Excel 2013 with Power Query and Power Pivot. Multiple txt files in the same folder were loaded into the data model by creating a connection. The table looks like this:
CoCd   Doc.Id   Plant   PGroup   Purch.Doc.   Vendor
7200   411647   7200    U36      4800311931   2000031503
7020   421245   7020    D05      4800277051   2000032922
7200   404320   1000    8        4800000000   2000032944
7200   404321   7200    T48      4800293878   2000032944
7010   425013   7010    R21      4800346743   2000036726
There are 440k rows in total. By running a pivot table, I've identified 144k unique Doc.Ids.
I then selected the Doc.Id (Whole Number) column and used the "Remove Duplicates" function in Power Query to remove the duplicated rows. However, the final table loaded only 75k rows (it should be 144k). When I changed the data type of Doc.Id to text and then removed duplicates, the final table came to 163k rows, which is somewhat understandable, as the Doc.Ids contain values like "603" and " 603". Unfortunately, I really need 144k rows in my final table.
Why doesn't the Remove Duplicates function work in my case with Doc.Id as a whole number?
The code in the Advanced Editor looks like this:
#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{{"CreateTime", type time}, {" TotalAmoun", Currency.Type}, {"Pst Date", type date}, {"Doc. Date", type date}, {"Due Date", type date}, {"DaysToDue", Int64.Type}, {"CreateDate", type date}, {"Cycle Time", type text}, {"Doc. Id", type text}, {"Purch.Doc.", Int64.Type}, {"Vendor", type text}, {"CoCd", Int64.Type}, {"Plant", type text}}),
#"Removed Duplicates" = Table.Distinct(#"Changed Type1", {"Doc. Id"})
in
#"Removed Duplicates"
After some further digging, it appears that a chunk of Doc.Ids between "398103" and "657238" is missing, plus some random ones. An example list of the missing numbers is below. I can't find any reason why they should be missing.
"245233"
"261404"
...
...
"398103"
...
...
"657238"