Excel Power Query import (same file but with different month name) - powerquery

Each month I need to automate the import of reference data; however, the Excel file is named differently each month, e.g.:
Monthly Data File January 2022.xlsx
Monthly Data File February 2022.xlsx
Could you point me in the right direction please?

In Excel, use Formulas > Name Manager to pick a cell and give it a range name, like NameVariable.
Enter your file path and file name, C:\temp\Monthly Data File January 2022.xlsx, in that named cell. Change the contents of that cell whenever the filename changes later.
Load one file into Power Query, then in Home > Advanced Editor add a formula that refers to that range name, similar to this:
MVar = Excel.CurrentWorkbook(){[Name="NameVariable"]}[Content]{0}[Column1],
and change any hard coded references to the filename to use MVar instead
As an example, change
let
    Source = Excel.Workbook(File.Contents("C:\temp\Monthly Data File January 2022.xlsx"), null, true),
to be
let
    MVar = Excel.CurrentWorkbook(){[Name="NameVariable"]}[Content]{0}[Column1],
    Source = Excel.Workbook(File.Contents(MVar), null, true),
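If the filename always follows the pattern in the question, another option is to generate the path from the current date instead of editing the named cell by hand. A minimal sketch of that idea in Python (folder and filename pattern taken from the question; the helper name is made up):

```python
from datetime import date

def monthly_path(folder: str, d: date) -> str:
    # %B expands to the full English month name (e.g. "January")
    # under Python's default locale.
    return f"{folder}\\Monthly Data File {d.strftime('%B %Y')}.xlsx"

print(monthly_path(r"C:\temp", date(2022, 1, 31)))
# -> C:\temp\Monthly Data File January 2022.xlsx
```

The generated path could then be written into the named range, or the equivalent date-to-text logic could be done directly in M.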

Related

How to extract multiple values as multiple columns from a filename in Informatica PowerCenter?

I am very new to Informatica PowerCenter and have just started learning, so I'm looking for help. My requirement is: I have to extract data from a flat file (CSV) and store the data in an Oracle table. Some of the column values of the target table should come from the extracted file name.
For example:
My Target Table is like below:
USER_ID Program_Code Program_Desc Visit Date Term
EACRP00127 ER Special Visits 08/02/2015 Aug 2015
My input filename is: Aug 2015 ER Special Visits EACRP00127.csv
From this FileName I have to extract "AUG 2015" as Term, "ER Special Visits" as Program_Desc and "EACRP00127" as Program_Code along with some other fields from the CSV file.
I have found one solution using "Currently Processed Filename", but with this I am able to get only a single value from the filename. How can I extract 3 values from the filename and store them in the target table? Looking for some light on a solution. Thank you.
Using an Expression transformation you can create three output ports from the Currently Processed Filename column.
You get the file name from the Source Qualifier via the 'Currently Processed Filename' field; then you can substring the whole string to get what you want. For a name like Aug 2015 ER Special Visits EACRP00127.csv (SUBSTR positions are 1-based; lengths chosen to exclude the trailing space and the dot):
input/output = CurrentlyProcessedFileName
o_Term = SUBSTR(CurrentlyProcessedFileName, 1, 8)
o_Program_Desc = SUBSTR(CurrentlyProcessedFileName, 10, 17)
o_Program_Code = SUBSTR(CurrentlyProcessedFileName, 28, 10)
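Fixed positions only work while every part of the name keeps the same length. Since the code is always the last token and the term is always the first two, the same extraction can also be done by splitting on spaces; a sketch of that logic in Python (the filename comes from the question):

```python
def parse_filename(name: str):
    """Split 'Aug 2015 ER Special Visits EACRP00127.csv' into
    (term, program_desc, program_code)."""
    stem = name.rsplit(".", 1)[0]          # drop the .csv extension
    tokens = stem.split()
    term = " ".join(tokens[:2])            # first two tokens: 'Aug 2015'
    program_code = tokens[-1]              # last token: 'EACRP00127'
    program_desc = " ".join(tokens[2:-1])  # everything in between
    return term, program_desc, program_code

print(parse_filename("Aug 2015 ER Special Visits EACRP00127.csv"))
# -> ('Aug 2015', 'ER Special Visits', 'EACRP00127')
```

In PowerCenter the same token-based logic would need INSTR/SUBSTR combinations, which is why the fixed-position expressions above are the simpler ports.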

Power Query – File names loaded from folder become column names, causing failure if new files are later loaded

Power Query sourcing multiple Excel files from a folder.
Files are monthly transactions. The month and year are part of the file names. When the next month comes, new files (in the same format of course, but with new file names) replace the previous ones in the source folder. Having the new file names causes the query to fail on refresh in the following way.
When the files are combined and displayed to begin the transformations, the file names constitute a column of data (named Source). One of my steps in transforming the data is to “use first row as headers”; at this point the first file name in that Source column becomes its column header name.
The problem is that when files having new names replace the previous ones, that column name is no longer found, since the row promoted to be the column header is the name of a new file. PQ is looking for a column header having the original file name and doesn’t find it, so subsequent transformations using that column cause errors.
The error message is: “[Expression.Error] The column ‘[OriginalFileName]’ of the table wasn’t found.”
Basically, that original file name takes on a permanent role as a column name that is part of the query.
I successfully managed to get around the problem by manually renaming all the columns instead of promoting the first data row to be the column headers. Now files with new names are processed without complaint. But this solution is clunky and I would like to keep the step of promoting the first row to be the header.
Does anyone know how to overcome this problem?
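To make the failure mode concrete: promoting the first row bakes whatever file name happens to be in that row into the query's schema, while assigning fixed names by position keeps the schema stable from month to month. A schematic sketch in Python (the data is made up):

```python
# Combined rows as the query sees them: the first column holds the
# source file name, and the first data row carries the header labels.
rows = [
    ["Monthly Data File January 2022.xlsx", "Date", "Amount"],
    ["Monthly Data File January 2022.xlsx", "2022-01-03", "100"],
]

# "Use first row as headers": the January file name becomes a column
# header, so next month's refresh looks for a column that no longer exists.
promoted_headers = rows[0]

# Renaming by position instead keeps the schema stable across months.
fixed_headers = ["Source", "Date", "Amount"]
records = [dict(zip(fixed_headers, r)) for r in rows[1:]]
print(records[0]["Source"])
# -> Monthly Data File January 2022.xlsx
```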

Quicksight parse date into month

Maybe I missed it, but I'm attempting to create a dynamic 'Month' parameter based on a datetime field and can't seem to get just the month. Am I missing something?
Here's my source DTTM date/time field.
In Manage Data > Edit [selected] Data Set > Data source, just add a calculated field:
truncDate('MM', date)
where 'MM' truncates the date to the first day of its month (the result is still a date, not a month name).
See the manual for the truncDate function.
The only place in Quicksight that you can get just a month, e.g. "September" is on a date-based axis of a visual. To do so, click the dropdown arrow next to the field name in the fields list, select "Format: (date)" then "More Formatting Options..." then "Custom" and enter MMMM in the Custom format input box.
Quicksight menu selection as described
This will then show the full month name on the date axis in your visual. NB It will use the full month name on this visual for ALL time period "Aggregations" - e.g. if you change the visual to aggregate by Quarter, it will display the full name of the quarter's first month etc.
If you are talking about "Parameters" in the Quicksight analysis view then you can only create a "Datetime" formatted parameter and then only use the "Date picker" box format for this parameter in a control (+ filter).
If you use a calculated field in either the data preparation or analysis view, the available date functions do not allow full month names as an output; you can get the month number as an integer, or one of the supported date formats here:
https://docs.aws.amazon.com/quicksight/latest/user/data-source-limits.html#supported-date-formats
You'll need to hardcode the desired results using ifelse, min, and extract.
extract will pull out the month as an integer. QuickSight tends to start summing integers, so we wrap the expression in min to prevent that.
ifelse(
    min(extract('MM',Date)) = 1, 'January',
    min(extract('MM',Date)) = 2, 'February',
    min(extract('MM',Date)) = 3, 'March',
    min(extract('MM',Date)) = 4, 'April',
    min(extract('MM',Date)) = 5, 'May',
    min(extract('MM',Date)) = 6, 'June',
    min(extract('MM',Date)) = 7, 'July',
    min(extract('MM',Date)) = 8, 'August',
    min(extract('MM',Date)) = 9, 'September',
    min(extract('MM',Date)) = 10, 'October',
    min(extract('MM',Date)) = 11, 'November',
    min(extract('MM',Date)) = 12, 'December',
    'Error')
Also, I apologize if this misses the mark; I'm not able to see the screenshot you posted due to security controls here at the office.
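The ifelse ladder is just a month-number-to-name lookup; the mapping it implements can be sketched in Python, where the standard library already carries the English month names:

```python
import calendar

def month_name(month_number: int) -> str:
    # Mirrors the ifelse ladder: 1 -> 'January', ..., 12 -> 'December',
    # anything else -> 'Error'.
    if 1 <= month_number <= 12:
        return calendar.month_name[month_number]
    return "Error"

print(month_name(9))
# -> September
```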
You can use the extract function. It works like this: given
event_timestamp = Nov 9, 2021
extract('MM', event_timestamp)
returns 11.
You can add a calculated field using the extract function:
extract returns a specified portion of a date value. Requesting a time-related portion of a date that doesn't contain time information returns 0.
extract('MM', date_field)

Combine csv files with different structures over time

I am here to ask you a hypothetical question.
Part of my current job consists of creating and updating dashboards. Most dashboards have to be updated everyday.
I've created a PowerBI dashboard from data linked to a folder filled with csv files. I did some queries to edit some things. So, everyday, I download a csv file from a client's web application and add the said file to the linked folder, everything gets updated automatically and all the queries created are applied.
Hypothetical scenario: my client changes the csv structure (e.g. column order, a few column names). How can I deal with this so I can keep my merged csv files table updated?
My guess would be to put the files with the new structure in a different folder, apply new queries so the table structures match, then append queries so I have a single table of data.
Is there a better way?
Thanks in advance.
Say I have some CSVs (all in the same folder) that I need to append/combine into a single Excel table, but:
the column order varies in some CSVs,
and the headers in some CSVs are different (for whatever reason) and need changing/renaming.
First CSV:
a,c,e,d,b
1,1,1,1,1
2,2,2,2,2
3,3,3,3,3
Second CSV:
ALPHA,b,c,d,e
4,4,4,4,4
5,5,5,5,5
6,6,6,6,6
Third CSV:
a,b,charlie,d,e
7,7,7,7,7
8,8,8,8,8
9,9,9,9,9
10,10,10,10,10
If the parent folder (containing my CSVs) is at "C:\Users\user\Desktop\changing csvs\csvs", then this M code should help me achieve what I need:
let
    renameMap = [ALPHA = "a", charlie = "c"],
    filesInFolder = Folder.Files("C:\Users\user\Desktop\changing csvs\csvs"),
    binaryToCSV = Table.AddColumn(filesInFolder, "CSVs", each
        let
            csv = Csv.Document([Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
            promoteHeaders = Table.PromoteHeaders(csv, [PromoteAllScalars = true]),
            headers = Table.ColumnNames(promoteHeaders),
            newHeaders = List.Transform(headers, each Record.FieldOrDefault(renameMap, _, _)),
            renameHeaders = Table.RenameColumns(promoteHeaders, List.Zip({headers, newHeaders}))
        in
            renameHeaders
    ),
    append = Table.Combine(binaryToCSV[CSVs])
in
    append
You'd need to change the folder path in the code to whatever it is on your system.
Regarding the line renameMap = [ALPHA = "a", charlie = "c"],: I needed to change "ALPHA" to "a" and "charlie" to "c" in my case, but you'd need to replace these with whatever columns need renaming in your case. (Add however many headers you need to rename.)
The line append = Table.Combine(binaryToCSV[CSVs]) appends the tables to one another (to give you one table). It should automatically handle differences in column order. If there are any rogue columns (e.g. say there was a column f in one of my CSVs that I didn't notice), then the final table will contain a column f, albeit with some nulls/blanks -- which is why it's important all renaming is done before that line.
Once combined, you can obviously do whatever else needs doing to the table.
Try it to see if it works in your case.
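For comparison, the rename-then-combine logic the M code performs can be sketched with the Python standard library (sample data taken from the CSVs above):

```python
import csv
import io

rename_map = {"ALPHA": "a", "charlie": "c"}  # headers to normalize

def read_table(text: str):
    """Parse CSV text into a list of row dicts, renaming headers first."""
    reader = csv.reader(io.StringIO(text))
    headers = [rename_map.get(h, h) for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]

def combine(tables):
    """Append tables; the union of columns is kept, missing cells are None."""
    columns = []
    for table in tables:
        for col in (table[0] if table else {}):
            if col not in columns:
                columns.append(col)
    return [{c: row.get(c) for c in columns} for table in tables for row in table]

first = "a,c,e,d,b\n1,1,1,1,1"
second = "ALPHA,b,c,d,e\n4,4,4,4,4"
rows = combine([read_table(first), read_table(second)])
print(rows[1]["a"])
# -> 4  (the renamed ALPHA column lines up under a)
```

As in the M version, the rename must happen before the combine, or the differently-named columns would end up side by side with nulls.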

SSIS - Automatically create column (Derived Column)

I would like to know if there is a way to automatically create columns in Data Conversion (SSIS).
I have an Excel Source with a med19g column (which represents the year 2019).
Next step, I have my Data Conversion:
As you can see, the med19g column is in it.
So, next year a med20g column (representing the year 2020) will be added in Excel, and I'd like to find a way to add this column automatically, or a way to anticipate that column in my solution (Data Conversion).
Does anyone have an idea how I can achieve this?
I'm using Visual Studio 2015.
Thanks in advance.
You will need to use a "Flat File Source" in the data flow to get the source file. After that, use "Derived Column" under the Data Flow Transformations category and add the desired field like below:
You can edit your output column name later by using the Advanced Editor (right-click on the Derived Column).
If you need to check whether your source file already has the same column, you can add a "Script Component" between the flat-file source and the "Derived Column" components. In the script, check your column names and use a boolean value to decide whether a column name already exists. How to do this is explained clearly in this link:
https://dichotic.wordpress.com/2006/11/01/ssis-test-for-data-files-existence/
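The check that Script Component would perform is essentially an inspection of the file's header row; sketched in Python (column names taken from the question):

```python
def header_has_column(header_line: str, column: str) -> bool:
    # The Script Component would set a boolean like this so downstream
    # components know whether the new column (e.g. med20g) is present yet.
    return column in [h.strip() for h in header_line.split(",")]

print(header_has_column("id,med19g,med20g", "med20g"))  # True
print(header_has_column("id,med19g", "med20g"))         # False
```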
