I need to create a package to pull data from a source to a destination. I know this is a simple package, but I want the source and destination to be configurable in terms of tables and columns, i.e. the source or destination table/columns can change. In between there can be transformations. Is this possible?
It is only possible if the column counts and types are consistent. A Data Flow works by knowing ahead of time what the data should look like. It does not support dynamically changing the types or the number of columns.
I need to identify whether a schema in a database has any changes in its metadata, such as changed table columns, changed procedure/package PL/SQL code, added/deleted triggers, etc. I've tried an expdp with content=metadata_only and calculated a checksum of the dump, but this doesn't work because the checksum changes every time even though the database is unchanged. How can I identify whether a schema's structure has changed or not? Do I have to export the plain-text metadata instead? Thanks.
If you only need to know who did what when, use database auditing.
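For example, a minimal sketch using traditional statement auditing (APP_SCHEMA is a placeholder; this assumes the AUDIT_TRAIL parameter is enabled and you hold the AUDIT SYSTEM privilege, and unified auditing policies are the newer alternative):

```sql
-- Audit DDL on the schema's main object types (add BY <user> to scope it to one user).
AUDIT TABLE, PROCEDURE, TRIGGER, SEQUENCE;

-- Who changed what, and when.
SELECT username, obj_name, action_name, timestamp
FROM   dba_audit_trail
WHERE  owner = 'APP_SCHEMA'
ORDER  BY timestamp DESC;
```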
If you only need to know that something might have changed, but don't care what, and are okay with the possibility of the change not being significant, you can use last_ddl_time from dba_objects and compare it to the maximum value you recorded on the previous check. This can be done at either the schema or object level.
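A sketch of that check (APP_SCHEMA is a placeholder; persist the returned value and compare it on the next run):

```sql
-- Schema-level: has anything changed since the value saved on the last check?
SELECT MAX(last_ddl_time) AS max_ddl_time
FROM   dba_objects
WHERE  owner = 'APP_SCHEMA';

-- Object-level: which objects changed most recently?
SELECT object_type, object_name, last_ddl_time
FROM   dba_objects
WHERE  owner = 'APP_SCHEMA'
ORDER  BY last_ddl_time DESC;
```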
If you really do need to generate a delta and know for certain that something changed, you have two choices:
Construct data dictionary queries against all application dictionary views (lots of work, because there are a lot of views: columns, tables, partitions, subpartitions, indexes, index partitions, index subpartitions, lobs, lob partitions, etc.)
(Recommended) Use dbms_metadata to extract the DDL of the entire schema. See this answer for a query that will export almost every object you would likely care about.
Either using #1 or #2, you can then compare old/new strings, or use a hash function (e.g. dbms_crypto.hash) to compute a hash value and compare that. I wrote a schema upgrade tool that does exactly this: it surgically identifies and upgrades individual objects that differ from some template source schema. I use dbms_metadata to look for diffs on the hash values. You will, however, need to set certain transforms to omit clauses you don't care about and that could change arbitrarily, or mask them with regexp_replace after the fact (e.g. a sequence's DDL contains the current value, which will always be different; you don't want to see this as a change). It can be a bit of work.
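A rough sketch of the hashing approach (not the exact tool described above): it assumes EXECUTE on DBMS_CRYPTO and SELECT on DBA_OBJECTS, uses a hypothetical APP_SCHEMA, and leaves out object types whose GET_DDL names differ from dba_objects (e.g. PACKAGE BODY maps to PACKAGE_BODY) as well as sequences, which would need regexp_replace masking as noted above.

```sql
DECLARE
  l_ddl  CLOB;
  l_hash RAW(20);
BEGIN
  -- Drop clauses that can change arbitrarily before extracting DDL.
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', FALSE);
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES', FALSE);

  FOR o IN (SELECT object_type, object_name
            FROM   dba_objects
            WHERE  owner = 'APP_SCHEMA'
            AND    object_type IN ('TABLE', 'VIEW', 'PROCEDURE', 'FUNCTION', 'PACKAGE', 'TRIGGER'))
  LOOP
    l_ddl  := DBMS_METADATA.GET_DDL(o.object_type, o.object_name, 'APP_SCHEMA');
    l_hash := DBMS_CRYPTO.HASH(l_ddl, DBMS_CRYPTO.HASH_SH1);
    -- Persist (object_type, object_name, hash) somewhere and diff it against the previous run.
    DBMS_OUTPUT.PUT_LINE(o.object_type || ' ' || o.object_name || ' ' || RAWTOHEX(l_hash));
  END LOOP;
END;
/
```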
I have to load a flat file using Informatica PowerCenter, but its structure is not static. The number of columns will change in future runs.
Here is the source file:
In the sample file I have 4 columns right now, but in the future I may get only 3 columns, or I may get a new set of columns as well. I can't change the code in production every time; I have to use the same code and handle this situation.
The expected result set is:
Is there any way to handle this scenario? PL/SQL and Unix would also work here.
I can see two ways to do it. The only requirement is that the source decides on a future structure and sticks to it. If someone later changes the structure, data types, or lengths, the mapping will not work properly.
Solutions:
Create extra columns in the source definition towards the end. If you have 5 columns now, the extra columns after the 5th will come in as blank. Create as many as you want, but note that you need to transform them as per the future structure and load them into the proper place in the target.
This is similar to the above solution, but in this case read the whole line as a single column in the source and source qualifier, as a large string of length 40000.
Then split it into columns on the delimiter in an Informatica Expression transformation. The splitting can be done by following the thread below (a SQL sketch of the same idea follows after the link). This can also be tricky if you have hundreds of columns.
Split Flat File String into multiple columns in Informatica
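Since PL/SQL is an option for you, here is a minimal SQL sketch of the same splitting idea, assuming each file line has been staged into a single large VARCHAR2 column (the table and column names are made up). REGEXP_SUBSTR with the sub-expression argument keeps empty fields and returns NULL when a line has fewer fields than expected:

```sql
-- stg_flat_file(raw_line) is a hypothetical staging table holding one file line per row.
SELECT REGEXP_SUBSTR(raw_line, '([^,]*)(,|$)', 1, 1, NULL, 1) AS col1,
       REGEXP_SUBSTR(raw_line, '([^,]*)(,|$)', 1, 2, NULL, 1) AS col2,
       REGEXP_SUBSTR(raw_line, '([^,]*)(,|$)', 1, 3, NULL, 1) AS col3,
       REGEXP_SUBSTR(raw_line, '([^,]*)(,|$)', 1, 4, NULL, 1) AS col4  -- NULL when only 3 fields arrive
FROM   stg_flat_file;
```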
I'm importing data from a text file using Bulk Insert in the Script Component of an SSIS package.
The package ran successfully and the data was imported into SQL Server.
Now how do I validate the completeness of the data?
I can get the row count from the source and destination and compare them.
But my manager wants to know how we can verify that all the data has come across as-is, without any issues.
If we were comparing two tables, we could probably join them on all fields and see if anything is missing.
I'm not sure how to compare a text file and a SQL table.
One way I could do it is to write code that reads the file line by line, queries the database for each record, and compares every field. We have millions of records, so this is not going to be a simple task.
Is there any other way to validate all of the data?
Any suggestions would be much appreciated
Thanks
Ned
Well, you could take the same file and do a lookup against the SQL table, and if any of the columns don't match, route those rows to a row count.
Here's a generic example of how you can do this.
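In addition to the lookup approach, if a SQL-side check is acceptable, one sketch (assuming the file can also be bulk-loaded into a staging table with the same column layout; the table and column names here are hypothetical) is to compare the two sets with EXCEPT plus row counts:

```sql
-- Rows in the staged file that are missing or different in the destination.
SELECT col1, col2, col3 FROM dbo.StagingFromFile
EXCEPT
SELECT col1, col2, col3 FROM dbo.DestinationTable;

-- Rows in the destination that are not in the staged file.
SELECT col1, col2, col3 FROM dbo.DestinationTable
EXCEPT
SELECT col1, col2, col3 FROM dbo.StagingFromFile;

-- Volume check: the two counts should match.
SELECT (SELECT COUNT(*) FROM dbo.StagingFromFile)  AS file_rows,
       (SELECT COUNT(*) FROM dbo.DestinationTable) AS table_rows;
```

Note that EXCEPT is set-based, so exact duplicate rows collapse to one; keep the row-count comparison alongside it to catch that case.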
I am transferring data from a text file to a database, and I have to change the column types using a Derived Column transformation. At the moment I am doing it one by one, by looking at the destination and seeing what each column is required to be, as shown below in the OLE DB Destination Editor. Is there a better way to do this with the Derived Column transformation in the Data Flow, so I can see the columns in the destination?
I am designing a job in Talend (an ETL tool). The incoming data may have columns in a different order.
How do I handle this? I want to map them to a static target (I am using tMap for this).
Also, I need to take care of the number of columns (it may be fewer or more than expected).
Check this tutorial; it works perfectly:
http://bekwam.blogspot.de/2011/06/dynamic-schemas-in-talend-open-studio.html
tFilterColumn can be used to filter which columns flow from one component to another. tMap itself is also a good way to control the number of columns in its output table.