Consolidating Excel files in Power Query - powerquery

I have followed online instructions that allow me to do this if all of the files that I need are in the same folder.
Whilst all of the files that I need are indeed in my "Downloads" folder, there are other files in there as well.
Where I am getting lost is in editing my list of the files that I want.
When I press edit it takes me to Power Query (where I can filter the files that I want) but when I press Load and save it just creates a list of those files and doesn't allow me to then combine and edit.
I just end up with a list of the files that I wanted to combine!
Could anybody point me in the right direction please?

Below is some code to combine all .xlsx files in a directory. Add your own step at the comment in the third row to further filter the file names as needed, then let the rest of the steps combine the files.
let Source = Folder.Files("C:\subdirectory\directory"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
// add another filter here as desired
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Content"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "GetFileData", each Excel.Workbook([Content],true)),
#"Expanded GetFileData" = Table.ExpandTableColumn(#"Added Custom", "GetFileData", {"Data", "Hidden", "Item", "Kind", "Name"}, {"Data", "Hidden", "Item", "Kind", "Sheet"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded GetFileData",{"Content", "Hidden", "Item", "Kind"}),
// collect every column name that appears in any of the sheets
ColumnNames = List.Union(List.Transform(#"Removed Columns"[Data], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", ColumnNames, ColumnNames)
in #"Expanded Data"
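As a rough illustration of the kind of extra filter that could go at the commented step, the sketch below keeps only .xlsx files whose names begin with a given prefix. The folder path is the placeholder from the code above and the "Report" prefix is purely hypothetical; substitute whatever distinguishes the files you want.

```powerquery
let
    // Placeholder folder path; the "Report" prefix is a hypothetical example
    Source = Folder.Files("C:\subdirectory\directory"),
    // Keep only .xlsx files whose file name starts with the chosen prefix
    FilteredRows = Table.SelectRows(
        Source,
        each Text.Lower([Extension]) = ".xlsx" and Text.StartsWith([Name], "Report")
    )
in
    FilteredRows
```

Any condition on the columns returned by Folder.Files (Name, Extension, Folder Path, and so on) can be used in the same way before the combine steps run.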

This might help you work with your info a bit.
To select the files you want to use from the folder, when you go to get data from a folder, click the Transform Data button instead of the pre-selected Combine or Combine & Transform Data button. (Which of the pre-selected buttons you'll see depends upon how you navigated to get data from the folder.) After you click the Transform Data button, you'll be presented with a table of your folder's contents, with various columns of pertinent info. You can then filter based on the info in the columns and, after filtering, click the button to combine the contents of the filtered files.

Related

Excel Power Query - Remove Column if it exists, otherwise don't try

I have a calculated Column Custom = Column1 + Column2 - Column3
After the calculation, I need to delete all columns except Custom.
Problem is sometimes one of the columns [Column4] does not exist in the dataset
I can have the Custom column calculate properly with "try otherwise", as in:
#"Added Custom" = Table.AddColumn(#"Previous Step", "Custom", each [Column2]+[Column3]- (try [Column4] otherwise 0)),
#"Removed Columns7" = Table.RemoveColumns(#"Added Custom",{"Column2", "Column3", "Column4"}),
This works fine; however, the second step fails if [Column4] doesn't exist.
So I need a way to test whether [Column4] exists and remove it if it does, otherwise not try to.
How about:
#"Added Custom" = Table.AddColumn(#"Previous Step", "Custom", each [Column2]+[Column3]- (try [Column4] otherwise 0)),
#"Removed Columns" = try Table.RemoveColumns(#"Added Custom",{"Column4"}) otherwise #"Added Custom"
One way to approach this is to select the columns you want to keep rather than removing the columns you don't want. This is equivalent to removing all except the columns you specify.
Alternatively, you can intersect the table's column names with your list, so that only columns that actually exist are removed.
Table.RemoveColumns(
#"Added Custom",
List.Intersect(
{
Table.ColumnNames(#"Added Custom"),
{"Column2", "Column3", "Column4"}
}
)
)
Another way:
add = Table.AddColumn(Source, "Custom", each List.Sum({[Column2],[Column3],-[Column4]?})),
del = Table.RemoveColumns(add,{"Column2", "Column3", "Column4"}, MissingField.Ignore)
If the only column you want to retain is Custom, then just use Table.SelectColumns.
If there might be other columns you want to retain, you can select them also or you can generate a list of columns to remove.
From what you write, it seems you want to remove any columns whose name starts with Column. If that is the case, here is one method:
#"Removed Columns"= Table.RemoveColumns(#"Previous Step",
List.Select(Table.ColumnNames(#"Previous Step"), each Text.StartsWith(_,"Column")))
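As a minimal sketch of the Table.SelectColumns suggestion above, assuming Custom really is the only column to keep: the sample table here is hypothetical and deliberately has no Column4, to show that selecting the columns to keep does not depend on which of the other columns exist.

```powerquery
let
    // Hypothetical sample data standing in for #"Previous Step"; Column4 is absent on purpose
    Source = Table.FromRecords({
        [Column2 = 1, Column3 = 2],
        [Column2 = 3, Column3 = 4]
    }),
    // Same calculation as in the question: a missing Column4 counts as 0
    AddedCustom = Table.AddColumn(
        Source,
        "Custom",
        each [Column2] + [Column3] - (try [Column4] otherwise 0)
    ),
    // Keep only Custom; every other column is dropped whether or not it existed
    KeptCustom = Table.SelectColumns(AddedCustom, {"Custom"})
in
    KeptCustom
```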

Power Query: How to insert a Column using the Left(Right Function based on A4

Is there a way to Add a column, in Power Query, by referencing data in a specific cell?
I want to take the text from "A4", use a Left(Right function, and add that to a new column.
My VBA macro is:
"Latest 4 Wks - Ending " & Left(Right(.Range("A4"), 24), 23)
I guess you want to do something like this. As a first step, define a named range for A4, which I named cellA4. I then loaded it into Power Query, added an extra column with part of the text from the cell (I used Text.Middle; other text functions are possible, of course), and drilled down to the content of the cell. The M code for that is:
let
Source = Excel.CurrentWorkbook(){[Name="cellA4"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Middle([Column1],23)),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Column1"}),
Custom = #"Removed Columns"{0}[Custom]
in
Custom
Result looks like
Then I just made a table with one column, imported that into Power Query, and added an extra column which just contains the text from cell A4. The M code is:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Col1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each cellA4)
in
#"Added Custom"
Result is
Through further research, I found that by adding a blank query, I was able to add a column in Power Query that references data in a specific cell.
Insert a blank query.
Open the Advanced Editor and enter:
(YourWorkSheet as table ) as text=>
let
SheetCellA4 =YourWorkSheet[Column1]{3},
SplitByFrom = Text.Split(SheetCellA4, "to "){1},
SplitByTime = Text.Split(SplitByFrom, "`"){0}
in SplitByTime
Then bring in the worksheet data.
After the Source line, add:
#"Added Custom" = Table.AddColumn(Source, "Custom", each Query1(Source))
in
#"Added Custom"
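For comparison, a more literal translation of the VBA expression Left(Right(.Range("A4"), 24), 23) might use Text.End and Text.Start, which take characters from the end and the start of a text value. This is only a sketch: the CellA4 value below is a made-up placeholder, and in practice it would come from the named range or worksheet column as shown in the answers above.

```powerquery
let
    // Hypothetical text standing in for the contents of cell A4
    CellA4 = "Sales report for the period 01/01/2018 to 01/28/2018 ",
    // Right(text, 24) corresponds to Text.End; Left(text, 23) corresponds to Text.Start
    Label = "Latest 4 Wks - Ending " & Text.Start(Text.End(CellA4, 24), 23)
in
    Label
```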

Populate conditional column depending on column name criteria

I receive a weekly report which contains some repetition of columns. This is because it is drawn from a collection of web forms which ask similar questions to each other - let's say they all ask "Do you want to join our email list?" - but this question is stored in the source system as a separate field for each form (each form is effectively a separate table). The columns will always be consistently named - e.g. "Email_optin_1", "Email_optin_2" - so I can come up with rules to identify the columns which ask the email question. However, the number of columns may vary from week to week - one week the report might just contain "Email_optin_2", the next week it might include four such columns. (This depends on which web-forms have been used in that week). The possible values are the same in all these columns - let's say "Yes" and "No".
Each row should normally only have one of the "Email_optin" columns populated.
What I would like to do is create a single column in Power Query called "Email_Optin_FINAL", which would return "Yes" if ANY columns beginning with "Email_optin" contain a value of "Yes".
So, basically, instead of the criteria simply referring to the values in specific columns, what I would like it to do is first of all figure out which columns it needs to be looking at, and then look at the values in those columns.
Is this possible in PowerQuery?
Thanks in advance for any advice!
This finds all the columns whose names contain Email_optin, merges them into a new column, and removes the original columns:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
EmailList= List.Select(Table.ColumnNames(Source), each Text.Contains(_, "Email_optin")),
#"Merged Columns" = Table.CombineColumns(Source,EmailList,Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged")
in #"Merged Columns"
This finds all the columns whose names contain Email_optin, merges them into a new column, and preserves the original columns:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Index= Table.AddIndexColumn(Source, "Index", 0, 1),
EmailList= List.Select(Table.ColumnNames(Index), each Text.Contains(_, "Email_optin")),
Merged = Table.CombineColumns(Index,EmailList,Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Merged Queries" = Table.NestedJoin(Index,{"Index"},Merged,{"Index"},"Merged",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Merged", {"Merged"}, {"Merged"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Table2",{"Index"})
in #"Removed Columns"
You can then filter for "Yes" among the merged answers if you want.
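If the goal is a column that directly reports "Yes" when any matching column contains "Yes", something along these lines might work instead of merging the text. It is a sketch under assumptions: the sample table is hypothetical, and returning "No" when no column contains "Yes" is a choice made here that the question does not specify.

```powerquery
let
    // Hypothetical sample data; a real report may contain any number of Email_optin columns
    Source = Table.FromRecords({
        [ID = 1, Email_optin_1 = "Yes", Email_optin_2 = null],
        [ID = 2, Email_optin_1 = null, Email_optin_2 = "No"]
    }),
    // Work out which columns to look at from their names
    OptinColumns = List.Select(
        Table.ColumnNames(Source),
        each Text.StartsWith(_, "Email_optin")
    ),
    // For each row, check whether any of those columns holds "Yes"
    AddedFinal = Table.AddColumn(
        Source,
        "Email_Optin_FINAL",
        each if List.Contains(
                Record.FieldValues(Record.SelectFields(_, OptinColumns)),
                "Yes"
            )
            then "Yes"
            else "No" // assumption: anything other than a "Yes" is reported as "No"
    )
in
    AddedFinal
```

Because the list of column names is computed from the table on every refresh, the step should keep working when the number of Email_optin columns changes from week to week.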

Inserting text manually in Powerquery

I'm merging multiple Excel files into one, where the user can review the data and mark an additional Comment column as completed. Each day there are additional files, and I need to refresh the query and pull the new data in while keeping the original Comment column values.
I've attempted to do this by following Marcel Beug's video, but that uses a SQL table and I cannot seem to get it to work with Excel files as the source.
After the Merge Queries I attempt to modify the first file to my source "InputFile"
![Modify the Merge Formula1][2]
![Changed to last query step of InputFile][3]
![InputFile Query with Source2 and Merge][4]
![M Code of InputFile Query with Merge][5]
By setting the first field in the merge formula to the last step in the InputFile query, I was able to get around the Cyclic error. However, I find that every refresh creates duplicate rows: 4 become 8, which then become 16, and so on.
let
Source = Excel.Workbook(File.Contents("S:\Fin_Aid\Operations Team\COD mpn - lec\InputFiles\8.22.18 to 8.23.18.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
Rename_RecID = Table.RenameColumns(#"Removed Columns",{{"Column3.1", "RecID"}}),
Source2 = Excel.CurrentWorkbook(){[Name="InputFile"]}[Content],
InputWithComment = Table.TransformColumnTypes(Source2,{{"RecID", Int64.Type}, {"Column1", type text}, {"Column2", type text}, {"Column4", type text}, {"Column5", type text}, {"Comment", type text}}),
#"Merged Queries" = Table.NestedJoin(Rename_RecID,{"RecID"},InputWithComment,{"RecID"},"InputWithComment",JoinKind.LeftOuter),
#"Expanded InputWithComment" = Table.ExpandTableColumn(#"Merged Queries", "InputWithComment", {"Comment"}, {"Comment"})
in
#"Expanded InputWithComment"
Regards,
Jim

Power Query won't read from .xls files

I am using Office 2010. I have a query that combines data from several Excel files in a folder. ".xlsx" files load fine, but when a ".xls" file exists in the folder, the query will not run; it gives the error message "Data could not be retrieved from database". In the query editor, when I click on the row for the file with an error, I see the message here: Error Message. Resaving the files as ".xlsx" works, but I'd rather be able to use them as-is.
I have installed the MS Access Database Engine here: http://www.microsoft.com/en-us/download/details.aspx?id=13255 but it doesn't seem to help.
Any other ideas? Thanks!
Edit: Added the two queries. First is the query applied to each file, second is the query that combines them.
Query "Transform Sample File from Supplier CMRTs":
let
Source = Excel.Workbook(#"Sample File Parameter1", null, true),
#"Smelter List_Sheet" = Source{[Item="Smelter List",Kind="Sheet"]}[Data],
#"Removed Top Rows" = Table.Skip(#"Smelter List_Sheet",3),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
#"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers",{"Smelter Identification Number Input Column", "Metal (*)", "Smelter Look-up (*)", "Comments"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [#"Metal (*)"] <> null and [#"Metal (*)"] <> "")
in
#"Filtered Rows"
Query "Supplier CMRTs":
let
Source = Folder.Files("O:\Supplier CMRTs"),
#"Invoke Custom Function1" = Table.AddColumn(Source, "Transform File from Supplier CMRTs", each #"Transform File from Supplier CMRTs"([Content])),
#"Filtered Rows" = Table.SelectRows(#"Invoke Custom Function1", each [Extension] <> ".txt"),
#"Renamed Columns1" = Table.RenameColumns(#"Filtered Rows", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Supplier CMRTs"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Supplier CMRTs", Table.ColumnNames(#"Transform File from Supplier CMRTs"(#"Sample File")))
in
#"Expanded Table Column1"
I found that when I combine binaries, if I select the Sample Binary Parameter instead of a Sheet, and work my way from there, it will not balk at xls vs xlsx files. But before I could even get to the point where I could combine binaries for the folder, I had to filter only to xlsx files. Therefore, after I successfully combine the binaries, I have to go back to the Applied Steps and remove the one where I filtered only to xlsx files.
Here are the steps, with screen clips:
I started with 4 Excel Sheets in one Folder, called New Folder:
Here's what their data looks like:
Establish a new source from folder. Do not click Combine & Edit. Click the Edit button:
Filter the Extension column to only xlsx files:
Right-click on the column name for the Content column and then click Remove Other Columns, so you'll only have a Content column:
Click to combine the binaries. Then click the folder level Sample Binary Parameter and click OK:
Go to your Applied Steps and remove the Filtered Rows step, where you filtered to only xlsx files: Change...
to...
Also remove the Changed Type step from the Applied Steps, because it now won't work and isn't needed.
Now your query should work with both your xlsx and xls files.
For completeness, here's what I have at this step (all 4 of my files each have only one sheet, called Sheet1 in each, which is why you see 4 Sheet1 names):
Anyhow, the names don't matter for me, so I delete the Name column and expand the Data column to get:
You should recognize the data as the data from all 4 sheets above.
