Merge CSVs with PowerAutomate - power-automate

I am experimenting with the flow editor in Power Automate to merge a bunch of CSVs from OneDrive that I am syncing via rclone.
The structure is:
OneDrive(Root)/Folder/Subfolder/*.csv
I would like to merge them into a dataset (master CSV) that I can use with Power BI.
Because this dataset is updated daily, new CSVs will get added to the folder, so my triggering event is "When a file is created".
The automation looks like this:
When a file is created >
Initialize a String Variable >
Find files in folder >
Search Query: *
Folder: Same as #1
FileSearch Mode: OneDriveSearch
Apply to each >
Get File content
Append to string variable
Compose (string variable) >
Create file >
File Path: whatever/path/
File Name: whatever.csv
File Contents: Outputs
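(For reference, the result the flow is meant to produce is roughly what this local Python sketch does; the folder and output paths below are just the placeholders from the flow, and note that appending raw file content keeps each file's header row:)
```python
from pathlib import Path

# Purely illustrative local equivalent of the flow; paths are the placeholders from above.
folder = Path("OneDrive/Folder/Subfolder")       # source folder holding the daily CSVs
output = Path("whatever/path/whatever.csv")      # "Create file" target

# "Find files in folder" + "Apply to each" + "Get file content" + "Append to string variable"
parts = [p.read_text() for p in sorted(folder.glob("*.csv"))]

# "Create file" with the accumulated string; each file's header row is kept
output.parent.mkdir(parents=True, exist_ok=True)
output.write_text("".join(parts))
```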
The automation runs fine, and creates my master csv.
Except it's blank!
What's going on?
(Screenshots of each step follow: When a File is Created, Initialize Variable, Find Files in Folder, Apply to Each, Append String to Variable, Compose, Create a File.)
At the end of the day, I get a CSV with some data written to it. (Previously I thought it was blank, but it does indeed have data; it just appears truncated.)
Something to note here: it looks like it retains the headers.
It looks like it grabbed the first 31 files, but there are 229 files in that folder.
Thanks in advance

Related

Data Factory | Copy recursively from multiple subfolders into one folder with same name

Objective: Copy all files from multiple subfolders into one folder with same filenames.
E.g.
Source Root Folder
  20221110/
    AppID1/
      File1.csv
      File2.csv
    AppID2/
      File3.csv
      File4.csv
  20221114/
    AppID3/
      File5.csv
      File6.csv
  ... and so on

Destination Root Folder
  File1.csv
  File2.csv
  File3.csv
  File4.csv
  File5.csv
  File6.csv
Approach 1 (Azure Data Factory V2, all datasets selected as binary):
GET METADATA - CHILDITEMS
FOR EACH - Childitem
COPY ACTIVITY(RECURSIVE : TRUE, COPY BEHAVIOUR: FLATTEN)
This config renames the files with autogenerated names.
If I change the copy behaviour to preserve hierarchy, both the file names and the folder structure remain intact.
Approach 2
GET METADATA - CHILDITEMS
FOR EACH - Childitems
Execute PL2 (Pipeline level parameter: @item().name)
Get Metadata2 (Parameterised from dataset, invoked at pipeline level)
For EACH2- Childitems
Copy (Source: FolderName - Pipeline level, File name - ForEach2)
Neither approach gives the desired output. Any help or workaround would be appreciated.
My understanding of Approach 2:
Steps 3 & 5 are there to iterate through the folders and subfolders, correct?
Step 6: Copy (Source: FolderName - Pipeline level, File name - ForEach2)
Since in step 6 you already have the filename, add a dynamic expression on the SINK side that references @Filename, and that should do the trick.
If all of your files are at the same directory level, you can try the approach below.
First use a Get Metadata activity to get the list of all files, then use a Copy activity inside a ForEach to copy them to a target folder.
These are my source files with directory structure:
Source dataset:
Based on your directory level, use the wildcard placeholder (*/*) in the source dataset.
The above error is only a warning, and we can ignore it while debugging.
Get Metadata activity:
This will give the list of all files inside the subfolders.
Pass this array to a ForEach activity, and inside the ForEach use a Copy activity.
Copy activity source:
Here too, the */* should be the same as what we gave in Get Metadata.
For the sink dataset, create a dataset parameter and use it in the file path of the dataset.
Copy activity sink:
Files copied to target folder:
If your source files are not at the same directory level, you can try the recursive approach mentioned in this article by @Richard Swinbank.

create a CSV file in ADLS from databricks

I am creating a CSV file in an ADLS folder.
For example: sample.txt is the file name
Instead of a single file, I see a sample.txt/ directory containing part-000 files.
My question is: is there a method to create a single sample.txt file instead of a directory in PySpark?
df.write() or df.save() both create a folder with multiple files inside that directory.
Using coalesce(1) I can combine the multiple part-000 files into one file, but how do I create a single CSV file?
Unfortunately, Spark doesn't support creating a data file without a folder.
To work around this:
First, using coalesce or repartition, create a single part (partition) file.
```python
# write a single partition so only one part file is produced
df \
    .coalesce(1) \
    .write \
    .format("csv") \
    .mode("overwrite") \
    .save("/mydata.csv")
```
The above example produces a /mydata.csv directory containing a single part-000* file plus some hidden files. However, our data is contained in only one CSV file, and the name of that file is not user-friendly. We can rename this file and extract it:
```python
data_location = "/mydata.csv/"

# find the single part file Spark wrote inside the directory
files = dbutils.fs.ls(data_location)
csv_file = [x.path for x in files if x.path.endswith(".csv")][0]

# move it out under a friendlier name, then remove the directory
dbutils.fs.mv(csv_file, data_location.rstrip('/') + ".csv")
dbutils.fs.rm(data_location, recurse=True)
```
Set up the account key and configure the storage account for access, then move the file from the Databricks file system to ADLS. To copy the file, we use dbutils.fs.cp (dbutils.fs.mv works the same way if you want to move it instead):
```python
storage_account_name = "Storage account name"
storage_account_access_key = "storage account access key"

# configure access to the storage account
spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key)

# copy the renamed single CSV file to the ADLS container
dbutils.fs.cp('/mydata.csv.csv', 'abfss://demo12@pratikstorage1.dfs.core.windows.net//mydata1.csv')
```
My Execution:
Output:

How do I combine txt files from a list of file locations

I have a problem: I used "Everything" to extract the path of every txt file in a specific directory so that I can merge the files, but in EmEditor I can't find a way to merge files from a list of locations.
Here is what the Everything output file looks like:
E:\Main directory\subdirectory 1\file.txt
E:\Main directory\subdirectory 2\file.txt
E:\Main directory\subdirectory 3\file.txt
E:\Main directory\subdirectory 4\file.txt
The list has over 40k locations. Is there a way to use a program to read all the locations in the text file and combine the files?
Also, the subdirectories contain other txt files that I don't want, so I can't just merge every txt file from the main directory. Another thing is that there are variations of "file.txt", like "Files.txt" for example.
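A minimal Python sketch of that idea, assuming the Everything export is saved as file_list.txt (one path per line) and the result should go to merged.txt; both names are placeholders:
```python
# Concatenate every file listed in file_list.txt into merged.txt.
# Adjust the encoding if the files are not UTF-8.
with open("file_list.txt", encoding="utf-8") as listing, \
     open("merged.txt", "w", encoding="utf-8") as merged:
    for line in listing:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the list
        with open(path, encoding="utf-8") as src:
            merged.write(src.read())
            merged.write("\n")  # separator between files
```
Because the merge is driven by the list rather than a directory scan, the unwanted txt files and the "Files.txt" variants are not a problem: only the paths in the list get merged.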

Shell Script to Convert CSV to Text File

I need to create a shell script that reads a different folder based on today's date. The folder contains multiple files, including one tab-delimited CSV file that has a unique name every day. I want to pull this CSV file and resave it as a text file.
Example of file path:
data/model/output20190725 (folder contains multiple files; a new folder is created every day)
-logfile1
-logfile2
-part3983isis4838.csv (this csv file has a new, randomly generated name every day, and it is tab delimited)
I know how to go from a csv file to a text file, but I don't know how to add the logic for the folder name and the csv name changing every day.
I saw that I could possibly use grep, but I don't know how to navigate to today's date folder, pull the csv, and pass it to the next command to make the conversion.
grep -l .csv * |
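As a rough sketch of the logic (shown in Python here rather than shell; the same steps map onto `date +%Y%m%d` and a `*.csv` glob in a shell script), assuming the data/model/outputYYYYMMDD layout above:
```python
import glob
import shutil
from datetime import date

# Folder name follows the data/model/outputYYYYMMDD pattern from the question.
folder = f"data/model/output{date.today():%Y%m%d}"

# The folder holds exactly one CSV, but its name changes daily.
csv_files = glob.glob(f"{folder}/*.csv")
if len(csv_files) != 1:
    raise SystemExit(f"expected one CSV in {folder}, found {len(csv_files)}")

# The file is already tab-delimited, so "resaving as text" is just a copy under a .txt name.
csv_path = csv_files[0]
shutil.copy(csv_path, csv_path[:-4] + ".txt")
```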

Informatica Post command task

I am working with multiple source files through a single source instance. I created three flat files and one destination table to experiment with multiple sources. I am using the 'File list' concept, for which I created a text file that contains all the flat file names.
Example:
Filename : File_list.txt
File content : Price1.txt
Price2.txt
Price3.txt
In the above example, Price1.txt, Price2.txt and Price3.txt are the flat file names. I specified File_list.txt as the source file when running the workflow in Informatica, so it iterates through all the flat files listed in File_list.txt and inserts all the values into the destination table.
Now, once the data is inserted into the destination, I need to delete those source files from that directory location.
How can I achieve this?
You'll need to write a custom script that uses File_list.txt as input and performs the delete operations. You can then call it using the Post-Session Success Command session component, or as a separate Command Task in the workflow linked using a $YourSessionName.Status = SUCCEEDED condition.
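A rough illustration of such a cleanup script, sketched in Python (the same loop could be written in shell); the source directory path and the assumption that the flat files sit next to File_list.txt are placeholders to adjust:
```python
import os

# Assumed location of File_list.txt and the flat files - adjust to your environment.
source_dir = "/path/to/source"

with open(os.path.join(source_dir, "File_list.txt")) as file_list:
    for line in file_list:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        target = os.path.join(source_dir, name)
        if os.path.isfile(target):
            os.remove(target)  # delete Price1.txt, Price2.txt, ...
```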
