Multiple field names - ETL

I have a .txt file that I have to load into a database.
My problem is that some files have the header "customer_" instead of "customer".
I don't know how to fix this in Pentaho. I've tried the "Select values" step, but I have no idea how it works.
My transformation so far: Get File Names -> CSV file input -> Text file output -> Table output.

Pentaho Data Integration has Metadata Injection capabilities built in, but it won't handle just "any" file: you need some kind of logic to determine that "customer_", or whatever you get, maps to the "customer" column in the database.
Once you have the logic to map the possible column-name variations in the origin file to the columns in the table, you can inject that metadata into your transformation.

Related

Files and filegroups in SQL Server

I have a filegroup named Year2020 which contains three different .ndf files, for example Summer.ndf, Winter.ndf, and Fall.ndf.
Now I want to create a Fall table, and I want the table to be saved in the Fall.ndf file, not in Summer.ndf or Winter.ndf. Is there a way to do something like this? I am using SQL Server.
The problem is that all of them are in the same filegroup named Year2020. How can we save it exactly where we want?
When I save the Fall table, it goes into Summer.ndf, not into Fall.ndf.
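For what it's worth, SQL Server only lets you place a table on a filegroup, not on a specific file within it; data is spread proportionally across all files in the group. The usual workaround is to give each file its own filegroup. A minimal T-SQL sketch, with hypothetical database and path names:

    -- Hypothetical database name and path. A table can only be created
    -- ON a filegroup, so Fall.ndf gets a filegroup of its own.
    ALTER DATABASE SalesDb ADD FILEGROUP FallFG;
    ALTER DATABASE SalesDb
        ADD FILE (NAME = 'FallData', FILENAME = 'C:\Data\Fall.ndf')
        TO FILEGROUP FallFG;

    -- The table now lives only in Fall.ndf, the sole file of FallFG.
    CREATE TABLE dbo.Fall
    (
        Id     INT PRIMARY KEY,
        Detail NVARCHAR(100)  -- placeholder column
    ) ON FallFG;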

2 different readers to fill an item in the same step

I have this situation: I have a CSV file with some information, I have to complete this information with records from a database table and write a new file with the combined info.
I guess that I should use a multiple-reader implementation, one to read my file and another to read my database (something like this example: https://bigzidane.wordpress.com/2016/09/15/spring-batch-multiple-sources-as-input/). But I need to pass conditions to the query based on the current item being processed.
Anyway, if it is possible, I need to configure the query in my reader2 with info obtained from my reader1. How can I do this?
This is a little summary of my problem:
Input File (Reader1)
Id;data1;data2;data3
Database (Reader2)
Id|data4;data5;data6
Output File
Id;data1;data2;data3;data4;data5;data6
Sorry for my English. Any link to articles or docs is welcome.
This is a common pattern known as the "driving query pattern", and it is described in the Common patterns section of the reference documentation.
You can use a FlatFileItemReader to read your input file and an item processor to query the database for the current item and enrich it with additional data.
Another idea is to load your flat file into a staging table in the database and use a database item reader to join the data.
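For that second approach, once the flat file has been bulk-loaded into a staging table, the reader's query can do the join itself, so each item already carries the extra columns. A minimal SQL sketch, with illustrative table names:

    -- Illustrative names: staging_input holds the loaded CSV rows,
    -- lookup_data is the existing table with data4..data6.
    SELECT s.id, s.data1, s.data2, s.data3,
           t.data4, t.data5, t.data6
    FROM   staging_input s
    JOIN   lookup_data   t ON t.id = s.id
    ORDER  BY s.id;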

Informatica Reading From Metadata

I have a metadata file named CONTACTS (SOURCE.CSV|TARGET.CSV). I read this file using a reader and populate the values into a table that I created as CONTACT_TABLE (PK NUMBER, SOURCE_NAME VARCHAR2(500), TARGET_NAME VARCHAR2(500)). After that, I want to read the source.csv and target.csv files stored in my CONTACT_TABLE and populate the values into another table called SOURCE_COLUMN_TARGET_COLUMN_TABLE (PK, FK referencing the PK of CONTACT_TABLE, SOURCE_COLUMN, TARGET_COLUMN). This table should contain all the columns of source and target, with a one-to-one relationship between them, for example source.csv(fn) -> target.csv(firstName).
My objective is that whenever we add another attribute to source or target, I should not have to change the entire mapping; for example, if we add source.csv(email) and target.csv(email), it should map directly.
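A DDL sketch of the two tables as described above (Oracle types assumed from the question):

    -- CONTACT_TABLE as given: one row per source/target file pair.
    CREATE TABLE contact_table (
        pk          NUMBER PRIMARY KEY,
        source_name VARCHAR2(500),
        target_name VARCHAR2(500)
    );

    -- One row per column pairing, e.g. 'fn' -> 'firstName'.
    CREATE TABLE source_column_target_column_table (
        pk            NUMBER PRIMARY KEY,
        contact_fk    NUMBER REFERENCES contact_table(pk),
        source_column VARCHAR2(500),
        target_column VARCHAR2(500)
    );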
Thanks!
Please help!
I have to complete this task before Friday. I have searched every source I could find about dynamic mappings and parameters, but it was not very helpful; I want to do it this way.
It's not clear what you are asking, actually. The Source Analyzer uses the source files (.csv) on import itself and thereby keeps the same format in the Source Qualifier.
So, if any values get added to your existing files (source.csv, target.csv), it becomes a new file for your existing mapping. Hence, you don't need to change the whole mapping; you just need to import the file again.

How to make an Excel cell the source for a query parameter?

I have a query in the Excel Query Editor that pulls from multiple Excel files. I changed the source to be a parameter, with the parameter type set to text, and it worked. Then I checked whether I could make the parameter's source a query on an Excel cell (Power Query Parameters – How to use Named Cells as Flexible Inputs), but I ran into the error below:
Formula.Firewall: Query 'XXXX' (step 'Source') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
I checked the Excelguru solution (Power Query Errors: Please Rebuild This Data Combination).
The point is that the query I created is the product of merged tables. Do I need to create staging queries first, before the merge, or is there a better idea?

Oracle - build a dimension from a file-based data source

I'm trying to build a star schema in Oracle 12c. In my case the data source is not a relational database but a single Excel/CSV file populated via a Google Form, which means I don't have any sort of reference from a source system, such as auto-incremented keys/IDs. What would be the best approach to building a star schema under this condition?
File row sample:
<submitted timestamp>,<submitted by user>,<region>,<country>,<branch>,<branch location>,<branch area>,<branch type>,<branch name>,<branch private? yes/no value>,<the following would be all "fact" values (measurements),...,...,...
In case I wanted to build a "branch" dimension, how would I handle updates/inserts after the first load into the dimension table?
Thought-out solution so far:
I had thought of making a concatenated string "key" from the branch values, which would make it unique (an underscore would be the "glue" concatenating the values), e.g.:
<region>_<country>_<branch>_<branch location> as branch_key
I would insert all the distinct branches into a staging table, including the branch_key column for each one of them; then, when loading into the dimension, I could check which keys do not exist yet in my dimension table and insert them. As for updates, I'm a bit stuck on how to handle them; I had thought of having another mapping file indicating which branches are active, with an expiration-date column. Basically, I'm trying to simulate what I could do if the data were in a database instead of CSV files.
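A minimal Oracle sketch of that compare-and-insert step (table and column names are illustrative):

    -- Illustrative names: stg_branch holds the distinct branches loaded
    -- from the CSV, dim_branch is the dimension. New keys are inserted,
    -- existing ones are left untouched.
    MERGE INTO dim_branch d
    USING (
        SELECT DISTINCT
               region || '_' || country || '_' || branch || '_' || branch_location AS branch_key,
               region, country, branch, branch_location
        FROM   stg_branch
    ) s
    ON (d.branch_key = s.branch_key)
    WHEN NOT MATCHED THEN
        INSERT (branch_key, region, country, branch, branch_location)
        VALUES (s.branch_key, s.region, s.country, s.branch, s.branch_location);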
This is all I can think of so far; do you have any other recommendations/ideas on how to implement this? Take into consideration that the data source cannot change, i.e. I have to read these CSV files, since the data is not stored anywhere else.
Thank you.
