Compute Node computes data from previous flow - ibm-integration-bus

This is what my flow looks like:
File Input -> File Read -> Compute -> Mapping -> Compute -> File Output
From the File Read node I save data to ${LocalEnvironment}. I also tried ${Environment}. This is what happens at the Compute node:
Send file A - data inside the environment of A is computed (edit after the answer: it is not computed, the flow stops unexpectedly).
Send file B - data inside the environment of A is computed.
Send file C - data inside the environment of B is computed.
Send file D - data inside the environment of C is computed.
How is this offset even possible? ${LocalEnvironment} should be reset at the beginning of the flow.
EDIT:
Never use the Environment tree for this; local variables should be stored inside $LocalEnvironment/Variables (see the linked explanation).
Got that. But even now I think my variables aren't being cleared: File Read still produces $LocalEnvironment/Variables/BLOB/BLOB from the previous run.
EDIT2:
Node settings:
File Input:
Input directory: C:\Users\User1\Documents\In
File pattern: *
Action on successful processing: move to mqsiarchive
Message domain: XMLNS (I know it should be XMLNSC but it works)
Use XMLNSC compact parser...: check
File read:
Input directory: C:\Users\User1\Documents\In\mqsiarchive
File name or pattern: *
Action: Delete
Request directory property location: $LocalEnvironment/Destination/File/Directory
Request filename property location: $LocalEnvironment/Destination/File/Name
Offset property location: $LocalEnvironment/Destination/File/Offset
Length property location: $LocalEnvironment/Destination/File/Length
Result data location: $ResultRoot
Output data location: $OutputLocalEnvironment/Variables
Copy local environment: check
Record selection expression: true()
Compute nodes:
Compute mode: LocalEnvironment and Message
File Output (this one doesn't matter much, since the problem occurs even without it):
Output directory: C:\Users\User1\Documents\Out
Filename or pattern: test.txt
Stage in mqsitransit...: check
Data location: $Body
Request directory property location: $LocalEnvironment/Destination/File/Directory
Request filename property location: $LocalEnvironment/Destination/File/Name
Properties that I didn't mention are left at their defaults.

I think it works like this:
In the first transaction the File Read node has nothing to read because the archive directory is empty, so message A from the File Input node is processed.
In the second transaction the File Read node finds message A archived by the first transaction, so message B just gets archived and message A from the File Read node gets processed (again).
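To make that off-by-one concrete, here is a small, purely illustrative Python simulation of the two-stage behaviour described above; it only models the "read whatever is already archived, then archive the new file" ordering and is not IIB code:

archive = []  # files currently sitting in the mqsiarchive directory

def trigger(new_file):
    # File Read picks up (and deletes) whatever is already in the archive...
    read_from_archive = archive.pop(0) if archive else None
    # ...and File Input archives the file that triggered this invocation.
    archive.append(new_file)
    return read_from_archive

for incoming in ['A', 'B', 'C', 'D']:
    print(f'Send file {incoming} -> Compute sees data from: {trigger(incoming)}')

# Prints None (nothing to read) for A, then A for B, B for C and C for D,
# matching the offset observed above.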

Related

Data Factory | Copy recursively from multiple subfolders into one folder with the same name

Objective: Copy all files from multiple subfolders into one folder with the same filenames.
E.g.
Source Root Folder
    20221110/
        AppID1/
            File1.csv
            File2.csv
        AppID2/
            File3.csv
            File4.csv
    20221114/
        AppID3/
            File5.csv
            File6.csv
    ... and so on
Destination Root Folder
    File1.csv
    File2.csv
    File3.csv
    File4.csv
    File5.csv
    File6.csv
Approach 1 (Azure Data Factory V2, all datasets selected as binary):
Get Metadata - ChildItems
ForEach - ChildItems
    Copy activity (recursive: true, copy behaviour: Flatten hierarchy)
This configuration renames the files with autogenerated names.
If I change the copy behaviour to preserve hierarchy, both the file names and the folder structure remain intact.
Approach 2:
Get Metadata - ChildItems
ForEach - ChildItems
    Execute PL2 (pipeline-level parameter: #item.name)
        Get Metadata2 (parameterised from dataset, invoked at pipeline level)
        ForEach2 - ChildItems
            Copy (source: FolderName - pipeline level, file name - ForEach2)
Neither approach gives the desired output. Any help/workaround would be appreciated.
My understanding of Option 2:
Steps 3 and 5 are done so as to iterate through the folders and subfolders, correct?
Step 6: Copy (source: FolderName - pipeline level, file name - ForEach2)
I think that since in step 6 you already have the filename, on the sink side you can add a dynamic expression with #Filename, and that should do the trick.
If all of your files are at the same directory level, you can try the approach below.
First use a Get Metadata activity to get the list of all files, and then use a Copy activity inside a ForEach to copy them to the target folder.
These are my source files with directory structure:
Source dataset:
Based on your directory level, use the wildcard placeholder (*/*) in the source dataset.
The error shown above is only a warning, and we can ignore it while debugging.
Get meta data activity:
This will give the list of all files inside the subfolders.
Pass this array to a ForEach activity and use a Copy activity inside the ForEach.
Copy activity source:
Here as well, the */* should be the same as the one given in the Get Metadata activity.
For the sink dataset, create a dataset parameter and use it in the file path of the dataset.
Copy activity sink:
Files copied to target folder:
If your source files are not at the same directory level, then you can try the recursive approach mentioned in the article by Richard Swinbank.
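For clarity about the target behaviour itself (flatten the hierarchy but keep the original file names), here is a minimal local Python sketch of that logic; it is not an ADF artifact, and the folder names are just the ones from the example above:

import os
import shutil

def flatten_copy(source_root, destination_root):
    # Copy every file found under source_root (at any depth) directly into
    # destination_root, keeping the original file names.
    os.makedirs(destination_root, exist_ok=True)
    for current_dir, _subdirs, files in os.walk(source_root):
        for name in files:
            # Files with the same name in different subfolders would overwrite
            # each other here; ADF's "Flatten hierarchy" sidesteps this by
            # autogenerating names, which is exactly what the question wants to avoid.
            shutil.copy2(os.path.join(current_dir, name),
                         os.path.join(destination_root, name))

flatten_copy('Source Root Folder', 'Destination Root Folder')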

Merge CSVs with PowerAutomate

I am experimenting with the flow editor in PowerAutomate to merge a bunch of CSVs from OneDrive that I am syncing via rclone.
The structure is:
OneDrive(Root)/Folder/Subfolder/*.csv
I would like to merge them into a dataset (master CSV) that I can use with PowerBI.
Because this dataset is updated daily, new CSVs will get added to the folder, so my trigger event is "When a file is created".
The automation looks like this:
When a file is created >
Initialize a String Variable >
Find files in folder >
    Search Query: *
    Folder: Same as #1
    FileSearch Mode: OneDriveSearch
Apply to each >
    Get File content
    Append to string variable
Compose (string variable) >
Create file >
    File Path: whatever/path/
    File Name: whatever.csv
    File Contents: Outputs
The automation runs fine, and creates my master csv.
Except it's blank!
What's going on?
(Screenshots of each step: When a File is Created, Initialize Variable, Find Files in Folder, Apply to Each, Append String to Variable, Compose, Create a File.)
At the end of the day, I get a CSV with some data written to it. (Previously I thought it was blank, but it does indeed have data; it just appears truncated.)
Something to note here: it looks like it retains the headers of every file.
It also looks like it only grabbed the first 31 files, but there are 229 files in that folder.
Thanks in advance
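For reference, the merge this flow is trying to reproduce, written as a local Python sketch that keeps only one header row (something the flow above does not do yet); the folder and output paths are placeholders, and the sketch assumes plain CSVs with no embedded newlines:

import glob
import os

def merge_csvs(source_folder, output_path):
    # Concatenate every CSV in source_folder into one master file,
    # writing the header row only once.
    merged_rows = []
    for csv_path in sorted(glob.glob(os.path.join(source_folder, '*.csv'))):
        with open(csv_path, newline='') as part:
            rows = part.read().splitlines()
        if not rows:
            continue  # skip empty files
        if not merged_rows:
            merged_rows.append(rows[0])  # keep the header from the first file only
        merged_rows.extend(rows[1:])
    if merged_rows:
        with open(output_path, 'w', newline='') as master:
            master.write('\n'.join(merged_rows) + '\n')

merge_csvs('Folder/Subfolder', 'whatever/path/whatever.csv')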

Temp file not being deleted

I'm trying to create a temporary file in my pipeline, then use that file in another rule.
For example, I have two rules in a .smk file:
# Unzip adapter-trimmed fastq file
rule unzip_fastq:
    input:
        '{sample}.adapterTrim.round2.fastq.gz',
    output:
        temp('{sample}.adapterTrim.round2.fastq')
    conda:
        '../envs/rep_element.yaml'
    shell:
        'gunzip -c {input[0]} > {output[0]}'

# Run bowtie2 to align to rep elements and parse output
rule parse_bowtie2_output_realtime:
    input:
        '{sample}.adapterTrim.round2.fastq'
    output:
        'rep_element_pipeline/{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam'
    params:
        bt2=config["ref"]["bt2_index_path"], eid=config["ref"]["enst2id"]
    conda:
        '../envs/rep_element.yaml'
    shell:
        'perl ../scripts/parse_bowtie2_output_realtime_includemultifamily.pl '
        '{input[0]} {params.bt2} {output[0]} {params.eid}'
{sample}.adapterTrim.round2.fastq is used once and should ultimately be deleted upon completion. However, I'm finding that this file is uploaded to Amazon S3, even with the addition of temp(). I'm also finding that this file is removed locally, but still persists on S3.
Am I doing this correctly? '{sample}.adapterTrim.round2.fastq' is not currently listed in the rule all of the Snakefile.
We ultimately need to prevent this file from being uploaded to S3, so if there is a way to specify not to upload this file in the rule, that would be useful.
It seems that the snippet in the question is not consistent with actual use, since for S3 files one would need to wrap the file names in remote().
However, as a general solution, the documentation contains the following:
The remote() wrapper is mutually exclusive with the temp() and protected() wrappers.
Hence, if you intend to use a temp file, make sure it's not wrapped in remote(), or explicitly wrap the file in local(); see the sketch below.
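A minimal sketch of what that could look like for the rule in the question, assuming a default S3 remote provider is in use and that the installed Snakemake version accepts combining the local() flag with temp() (if it does not, dropping temp() and cleaning the file up in a later rule is the fallback):

rule unzip_fastq:
    input:
        '{sample}.adapterTrim.round2.fastq.gz'
    output:
        # keep the intermediate off the default remote provider so it is never
        # uploaded to S3, and mark it temporary so it is deleted after use
        temp(local('{sample}.adapterTrim.round2.fastq'))
    conda:
        '../envs/rep_element.yaml'
    shell:
        'gunzip -c {input[0]} > {output[0]}'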

clang: how can fdebug-prefix-map use a new path relative to the user home path `~`?

I am trying to rewrite the source file path to ~/src/lib by using -fdebug-prefix-map.
I can confirm that DW_AT_decl_file is rewritten to something like ~/src/lib/path.
But the result is that lldb can't find the source file. If I change it to an absolute path, it works fine.
How can I solve this?
You can use the target.source-map setting to remap location of source files. From (lldb) apropos source-map:
Source path remappings are used to track the change of location between a source file when built, and where it exists on the current system. It consists of an array of duples, the first element of each duple is some part (starting at the root) of the path to the file when it was built, and the second is where the remainder of the original build hierarchy is rooted on the local system. Each element of the array is checked in order and the first one that results in a match wins.
The usage looks something like:
(lldb) settings append target.source-map /foo /bar
Note that you use append here instead of set, because otherwise you'd overwrite the mapping every time you add an entry. You can check the mapping with:
(lldb) settings show target.source-map

Informatica Post command task

I am working with multiple source files and a single source instance. I created three flat files and one destination table to experiment with multiple sources. I am using the ‘File list’ concept; for that I created a text file which contains all the flat file names.
Example:
Filename: File_list.txt
File content:
    Price1.txt
    Price2.txt
    Price3.txt
In the above example Price1.txt, Price2.txt and Price3.txt are the flat file names. I specified File_list.txt as the source file while running the workflow in Informatica, so it iterates through all the flat files listed in File_list.txt and inserts all their values into the destination table.
Now, once the data has been inserted into the destination, I need to delete those source files from that directory location.
How can I achieve this?
You'll need to write a custom script that takes File_list.txt as input and performs the delete operations (a sketch is below). You can then call it using the Post-Session Success Command session component, or as a separate Command task in the workflow linked using a $YourSessionName.Status = SUCCEEDED condition.
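A minimal sketch of such a cleanup script in Python, assuming File_list.txt contains one flat file name per line and that the names are relative to the source file directory (the script name and paths are illustrative):

import os
import sys

def delete_listed_files(file_list_path, source_dir):
    # Delete every flat file named in the file list after a successful load.
    with open(file_list_path) as file_list:
        for line in file_list:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            target = os.path.join(source_dir, name)
            if os.path.isfile(target):
                os.remove(target)

if __name__ == '__main__':
    # e.g. python delete_sources.py /infa/SrcFiles/File_list.txt /infa/SrcFiles
    delete_listed_files(sys.argv[1], sys.argv[2])

As a post-session success command this could be invoked as something like python delete_sources.py $PMSourceFileDir/File_list.txt $PMSourceFileDir, assuming the standard $PMSourceFileDir variable points at the source file directory.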
