Spring Batch to read multiple CSV files into different database tables

I have multiple Excel/CSV files that differ from each other. One has columns like segment, country, and product, and the other has application, environment, database, and so on. I want to read them and insert them into different tables, say t1 and t2 respectively, and I am looking for an approach that solves this.
I have seen many tutorials/blogs that show how to read and write a single file into a database, or multiple files, but only where the files share the same structure, i.e. the columns are identical.

The input files are different, so you would want to create a separate step to process each of them. These steps can be executed in parallel, since the read/write operations are independent.
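A minimal sketch of that layout, assuming a Spring Batch 4-style Java configuration; the file path, column names, the SegmentRecord POJO, the injected loadT2Step bean, and the target tables t1/t2 are placeholders to replace with your own:

    import javax.sql.DataSource;

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.job.builder.FlowBuilder;
    import org.springframework.batch.core.job.flow.Flow;
    import org.springframework.batch.core.job.flow.support.SimpleFlow;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.FileSystemResource;
    import org.springframework.core.task.SimpleAsyncTaskExecutor;

    @Configuration
    @EnableBatchProcessing
    public class MultiCsvJobConfig {

        // Reader for the first CSV (segment, country, product)
        @Bean
        public FlatFileItemReader<SegmentRecord> segmentReader() {
            return new FlatFileItemReaderBuilder<SegmentRecord>()
                    .name("segmentReader")
                    .resource(new FileSystemResource("input/segments.csv")) // placeholder path
                    .linesToSkip(1)                                         // skip the header row
                    .delimited()
                    .names(new String[] {"segment", "country", "product"})
                    .targetType(SegmentRecord.class)                        // simple POJO with matching fields
                    .build();
        }

        // Writer for table t1
        @Bean
        public JdbcBatchItemWriter<SegmentRecord> segmentWriter(DataSource dataSource) {
            return new JdbcBatchItemWriterBuilder<SegmentRecord>()
                    .dataSource(dataSource)
                    .sql("INSERT INTO t1 (segment, country, product) "
                            + "VALUES (:segment, :country, :product)")
                    .beanMapped()
                    .build();
        }

        // An analogous reader/writer pair (application, environment, database -> t2)
        // would back the loadT2Step bean injected below.

        @Bean
        public Job multiCsvJob(JobBuilderFactory jobs,
                               StepBuilderFactory steps,
                               FlatFileItemReader<SegmentRecord> segmentReader,
                               JdbcBatchItemWriter<SegmentRecord> segmentWriter,
                               Step loadT2Step) {

            Step loadT1Step = steps.get("loadT1")
                    .<SegmentRecord, SegmentRecord>chunk(100)
                    .reader(segmentReader)
                    .writer(segmentWriter)
                    .build();

            // The two steps touch different files and different tables,
            // so they can safely run in parallel on separate threads.
            Flow t1Flow = new FlowBuilder<SimpleFlow>("t1Flow").start(loadT1Step).build();
            Flow t2Flow = new FlowBuilder<SimpleFlow>("t2Flow").start(loadT2Step).build();

            return jobs.get("multiCsvJob")
                    .start(new FlowBuilder<SimpleFlow>("parallelCsvFlow")
                            .split(new SimpleAsyncTaskExecutor())
                            .add(t1Flow, t2Flow)
                            .build())
                    .end()
                    .build();
        }
    }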

Related

Is there any way to handle a source flat file with a dynamic structure in Informatica PowerCenter?

I have to load a flat file using Informatica PowerCenter, but its structure is not static: the number of columns can change in future runs.
Here is the source file:
In the sample file I have 4 columns right now, but in a future run I may get only 3 columns, or I may get a new set of columns as well. I can't change the code in production every time; the same code has to handle this situation.
The expected result set is:
Is there any way to handle this scenario? PL/SQL and Unix would also work here.
I can see two ways to do it. The only requirement is that the source should settle on a future structure and stick to it; if someone later decides to change the structure, data types, or lengths, the mapping will not work properly.
Solutions:
Create extra columns at the end of the source definition. If you have 5 columns now, any extra columns after the 5th will be pulled in as blank. Create as many as you want, but note that you need to transform them as per the future structure and load them into the proper place in the target.
This is similar to the solution above, but in this case read each line as a single column in the source and source qualifier, as a large string of length 40000.
Then split that string into columns on the delimiter in an Informatica Expression transformation. This can be done by following the thread below (see also the sketch after it). It can get tricky if you have hundreds of columns.
Split Flat File String into multiple columns in Informatica
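As a plain-Java illustration of the second idea (in PowerCenter the splitting itself would live in the Expression transformation), where the target width of 4 columns and the comma delimiter are assumptions for the example:

    import java.util.Arrays;

    public class SplitAndPad {
        public static void main(String[] args) {
            String line = "seg1,India,productA";   // today's record carries only 3 of 4 columns
            int targetColumns = 4;                 // the agreed "future structure"

            String[] parts = line.split(",", -1);  // -1 keeps trailing empty fields
            String[] padded = Arrays.copyOf(parts, targetColumns); // extra slots arrive as null
            for (int i = 0; i < targetColumns; i++) {
                if (padded[i] == null) {
                    padded[i] = "";                // missing columns are loaded as blank
                }
            }
            System.out.println(Arrays.toString(padded)); // [seg1, India, productA, ]
        }
    }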

Which NiFi processor to use for an RDBMS extract

I will explain my use case to help determine which DB extract utility to use.
I need to extract data from SQL Server tables with varying frequency each day. Each extract query is a complex SQL statement involving 5-10 tables in joins, with multiple clauses; there are around 20-30 such statements overall.
All these extract queries might need to run multiple times a day, with the frequency varying from day to day. It depends on how many times we receive data from the source system, among other factors.
We are planning to use Kafka to publish a message to let the NiFi workflow know whenever an RDBMS table is updated and the flow needs to be triggered (I can't just trigger the NiFi flow based on an "incremental" column value; there might be update-only scenarios where no new rows are created in the tables).
How should I go about designing my NiFi flow? There are all sorts of processors available: ExecuteSQL, GenerateTableFetch, ExecuteSQLRecord, QueryDatabaseTable. Which one is going to fit my requirement best?
Thanks!
I suggest you use ExecuteSQL. You can set the query from an attribute, or compose it using attributes. The easiest way is to create JSON, then parse that JSON and create attributes from it. Check this example: there the SQL is created from a file, but you can adjust it to create it from the Kafka message instead.

Obtaining number of rows written to multiple targets in Informatica Power Center

I have a requirement where I need to obtain the number of rows written to multiple targets in my mapping. There are 3 targets in my mapping (T1, T2 and T3). I need the number of rows written to each target separately. These values need to be used in subsequent sessions.
I understand that there is a method where I can use separate counters and write them to a flat file and perform a lookup on this file in subsequent mappings. However, I am looking for a direct and better approach to this problem.
You can use the $PMTargetName#numAffectedRows built-in variables. In your case it would be something like
$PMT1#numAffectedRows
$PMT2#numAffectedRows
$PMT3#numAffectedRows
Please refer to An ETL Framework for Operational Metadata Logging for details.

How do I use sqlldr to load data from multiple files at once?

I need to load data into an Oracle DB using SQL*Loader (sqlldr), but I need to pull parts of my table from two different INFILEs, using different positions from those infiles. How can I do this?
You can certainly load data from multiple files into a single table and write a control file to do that, but the format of the files should be the same. Still, running two separate jobs would be a better option. A little research would help; I have done many extra things using SQL*Loader.
Sounds like two separate jobs would be simplest.
Depending on the file definitions, it may be possible to use a single job. See this for an idea (except you'd actually have the two record formats loading into the same table rather than different tables).
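A control-file sketch of that last idea (two record formats, distinguished here by an assumed type flag in column 1, both loading into the same table); the file names, positions, and columns are placeholders, so check them against the Utilities manual for your version:

    -- sketch only: adjust file names, positions and columns to your layout
    LOAD DATA
    INFILE 'feed_a.dat'
    INFILE 'feed_b.dat'
    APPEND
    INTO TABLE my_table
      WHEN (1:1) = 'A'
      (rec_type   POSITION(1:1)   CHAR,
       cust_id    POSITION(2:11)  CHAR,
       cust_name  POSITION(12:41) CHAR)
    INTO TABLE my_table
      WHEN (1:1) = 'B'
      (rec_type   POSITION(1:1)   CHAR,
       cust_id    POSITION(2:11)  CHAR,
       region     POSITION(12:31) CHAR)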

How can I use Oracle Preprocessor for External Tables to consume this type of format?

Suppose I have a custom file format, which can be analogous to N tables. Let's pick 3. I could transform the file, writing a custom load wrapper to fill 3 database tables.
But suppose that, due to space and resource constraints, I can't store all of this in the tablespace.
Can I use Oracle Preprocessor for External Tables to transform the custom file three different ways?
The usage examples I have read use gzip'd text files as an example. But that is a one-to-one file-to-table relationship, with only one transform.
I have a single file with N possible extractions of data.
Would I need to define N external tables, each referencing a different program?
If I map three tables to the same file, how will this affect performance? (Access is mostly or all reads, few or no writes).
Also, what format does the standard output of my preprocessor have to be? Must it be CSV, or are there ways to configure the external table driver?
"If I map three tables to the same
file, how will this affect
performance? (Access is mostly or all
reads, few or no writes"
There should be little or no difference between three sessions accessing the same file through one external table definition or three external table definitions.
External tables aren't cached by the database (might be by the file system or disk), so any access is purely physical reads.
Depending on the pre-processor program, there might be some level of serialization there (or you may use a pre-processor program to impose serialization).
Performance-wise, you'd be better off having a single session scan the external file/table and load it into one or more database tables. The other sessions then read from those tables, and the data is cached in the SGA. Also, you can index a database table so you don't have to read all of it.
You may be able to use multi-table inserts to load multiple database tables from a single external table definition in a single pass.
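For example, a conditional multi-table insert can fan one scan of the external table out to two database tables in a single statement (the table and column names here are placeholders):

    -- one pass over the external table populates both database tables
    INSERT ALL
      WHEN rec_type = 'A' THEN
        INTO t1 (segment, country, product)         VALUES (col1, col2, col3)
      WHEN rec_type = 'B' THEN
        INTO t2 (application, environment, db_name) VALUES (col1, col2, col3)
    SELECT rec_type, col1, col2, col3
    FROM   my_external_table;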
"what format does the standard output
of my preprocessor have to be? Must it
be CSV, or are there ways to configure
the external table driver?"
It pretty much follows SQL*Loader, and both are in the Utilities manual. You can use fixed format or other delimiters.
"Would I need to define N external tables, each referencing a different program?"
Depends on how the data is interleaved. Ignoring pre-processors, you can have different external tables pulling different columns from the same file or use the LOAD WHEN clause to determine which records to include or exclude.
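A sketch of the LOAD WHEN idea, with a hypothetical record-type flag deciding which lines of the shared file this particular external table keeps (directory, file name, and columns are placeholders):

    -- keeps only the records whose type flag is 'O'; define sibling external
    -- tables with different LOAD WHEN conditions over the same file
    CREATE TABLE orders_ext (
      rec_type VARCHAR2(1),
      order_id NUMBER,
      amount   NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        LOAD WHEN (rec_type = 'O')
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
        (rec_type, order_id, amount)
      )
      LOCATION ('mixed_feed.dat')
    )
    REJECT LIMIT UNLIMITED;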
