Informatica target table populating - informatica-powercenter

I am new to Informatica. I am trying to populate my target table by joining two tables that have 5649 and 2611 rows respectively, so I expect 8260 output rows. However, the target table ends up with around 108860 rows.
Why is this happening, and how do I fix it?

It seems to me that you are confusing the join operation with union.
You need to merge two sets of rows into one, so use a Union transformation, not a Joiner.

If the table structures are not the same, select only the common columns and then perform a union in the SQ (Source Qualifier) override.
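To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Informatica itself. The table names (src_a, src_b) and the dept column are made up for illustration. A union stacks the rows, so the counts add up; a join on a key that repeats in both tables pairs every matching row with every other matching row, so the counts multiply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables; every row shares the same dept value.
    CREATE TABLE src_a (dept INTEGER, name TEXT);
    CREATE TABLE src_b (dept INTEGER, name TEXT);
    INSERT INTO src_a VALUES (1, 'a1'), (1, 'a2'), (1, 'a3');
    INSERT INTO src_b VALUES (1, 'b1'), (1, 'b2'), (1, 'b3');
""")

# Union: rows of both tables are appended, 3 + 3 rows.
union_rows = conn.execute(
    "SELECT dept, name FROM src_a UNION ALL SELECT dept, name FROM src_b"
).fetchall()

# Join on a non-unique key: each src_a row matches each src_b row, 3 * 3 rows.
join_rows = conn.execute(
    "SELECT a.name, b.name FROM src_a a JOIN src_b b ON a.dept = b.dept"
).fetchall()

print(len(union_rows), len(join_rows))
```

If your join condition matches many rows on each side, this kind of multiplication would explain a target far larger than 8260 rows.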

Related

How can I merge two tables using ROWID in Oracle?

I know that ROWID is distinct for each row in different tables. However, I have seen two tables being merged using rowid somewhere, so I tried it myself, but I am getting blank output.
I have a person table, to which I added a column scrowid to hold the rowid:
alter table ot.person
add scrowid VARCHAR2(200) PRIMARY KEY;
I populated this person table as:
insert into ot.person(id,name,age,scrowid)
select id,name, age,a.rowid from ot.per a;
After this I created another table, ot.temp_person, by the same steps. Both tables have the same structure and datatypes. I wanted to combine them using an inner join, so I tried:
select * from ot.person p inner join ot.temp_person tp ON p.scrowid=tp.scrowid
The result was an empty table.
Is there any possible way to merge two tables using rowid, or have I forgotten some steps? If there is a way to join these two tables using rowid, please suggest it.
Define scrowid with datatype ROWID or UROWID; then it may work.
However, in general the ROWID may change at any time unless you lock the record, so it would be a poor key to join your tables.
I think perhaps you misunderstood the merging of two tables via rowid, unless what you actually saw was a Union, Cross Join, or Full Outer Join. Any attempt to match on rowid, regardless of how you define it, is doomed to fail, because rowid is an internal definition. Rowid is not just a data type, it is an internal structure. (That is an older version of the description, but Oracle doesn't link documentation versions.) Its fields are basically:
- The data object number of the object
- The data block in the datafile in which the row resides
- The position of the row in the data block (first row is 0)
- The datafile in which the row resides (first file is 1). The file number is relative to the tablespace.
So while it's possible for rows in different tables to have the same rowid, it would be extremely unlikely. An inner join on them will therefore almost always return no rows.
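As a rough illustration of those fields, the sketch below splits a rowid string into its four parts in plain Python. It assumes the 18-character extended display format (6 characters for the data object number, 3 for the relative file number, 6 for the block number, 3 for the row number, encoded with the A-Z, a-z, 0-9, +, / alphabet). The sample value is made up, and split_rowid is a hypothetical helper; inside Oracle you would normally use the DBMS_ROWID package instead.

```python
import string

# Base-64 alphabet of the extended ROWID display format (assumption stated above).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def _decode(chunk: str) -> int:
    """Decode one base-64 encoded field into an integer."""
    value = 0
    for ch in chunk:
        value = value * 64 + ALPHABET.index(ch)
    return value


def split_rowid(rowid: str) -> dict:
    """Split an 18-character extended rowid into its four fields."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character extended ROWID")
    return {
        "data_object_number": _decode(rowid[0:6]),
        "relative_file_number": _decode(rowid[6:9]),
        "block_number": _decode(rowid[9:15]),
        "row_number": _decode(rowid[15:18]),
    }


# Made-up sample value, not taken from a real database.
sample = "AAAAAB" + "AAB" + "AAAAAC" + "AAD"
parts = split_rowid(sample)
print(parts)
```

Because the data object number is part of the value, rows stored in two different tables will normally differ in at least that field, which is why the join above finds no matches.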

MonetDB simple join performance on 2 tables

Let's assume I have two tables of the same row count. Both tables contain a column that allows for 1-1 join between them.
If those tables were turned into one table instead, eliminating the JOIN from the query, would there be any performance benefit?
Another example: let's assume I have a table with 10 columns. From that table I created a new table that takes only one column. If I issue a statement selecting that one column with a WHERE predicate on the same column, would there be any performance difference between executing this query on the two tables?
What I'm trying to get at is this: if performance is the same in the above cases, is it safe to say that tables are only containers wrapping a number of columns together?
I did run a couple of tests, but the results were inconclusive.
"Let's assume I have two tables of the same row count. Both tables contain a column that allows for 1-1 join between them. If those tables were turned into one table instead, eliminating the JOIN from the query, would there be any performance benefit?"
Performing that join for every query is of course more expensive than materializing the table once and then reading it. So yes, there would be a performance benefit.
"Another example: let's assume I have a table with 10 columns. From that table I created a new table that takes only one column. If I issue a statement selecting that one column with a WHERE predicate on the same column, would there be any performance difference between executing this query on the two tables?"
No, there would be no difference, since tables are represented as collections of columns, which are each stored in their own file.
"What I'm trying to get at is this: if performance is the same in the above cases, is it safe to say that tables are only containers wrapping a number of columns together?"
That is indeed safe to say.
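A toy model may help picture this. The sketch below is plain Python, not MonetDB internals; ColumnTable and its reads counter are invented for illustration. Each column lives in its own array, so a filter on one column touches only that array, whether the table has one column or ten.

```python
from array import array


class ColumnTable:
    """Toy column-oriented table: one independent array per column."""

    def __init__(self, **columns):
        self.columns = {name: array("q", values) for name, values in columns.items()}
        # Count how often each column's array is scanned.
        self.reads = {name: 0 for name in columns}

    def select_where_equals(self, column, value):
        """Return the matching values, scanning only the named column."""
        self.reads[column] += 1
        return [v for v in self.columns[column] if v == value]


# A 10-column table and a 1-column table holding the same data in c0.
wide = ColumnTable(**{f"c{i}": range(1000) for i in range(10)})
narrow = ColumnTable(c0=range(1000))

wide_result = wide.select_where_equals("c0", 42)
narrow_result = narrow.select_where_equals("c0", 42)
touched = [name for name, count in wide.reads.items() if count]
print(wide_result, narrow_result, touched)
```

In this model the query does the same work on both tables, because the other nine columns of the wide table are never read.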

comparing data in two tables taking time

I need to query table 1 to find all orders and their created dates (the key is order number and date).
In table 2 (the key is also order number and date), I need to check whether the order exists for a given date.
For this I am scanning table 1 and, for each record, checking whether it exists in table 2. Is there a better way to do this?
In this situation, where the key is identical for both tables, it makes sense to have a single table in which you store the data for both Table 1 and Table 2. That way you can do a single scan on your data and know straight away whether the data exists for both criteria.
Even more so, if you want to use this data in MapReduce, you would simply scan that single table. If you only want to get the relevant rows, you could define a filter on the Scan. For example, in the case where you will not be populating rows at all in Table 2, you would simply use a ColumnPrefixFilter.
If, however, you do need to keep this data separately in 2 tables, you could pre-split the tables with the same region boundaries for both tables. This will be helpful for the query you are aiming for: load all rows in Table 1 where the row exists in Table 2. Essentially this would be a map-side join. You could define multiple inputs in your MapReduce job, and since the region borders are the same, the splits will be such that each mapper gets the corresponding rows from both tables. You would probably need to implement your own MultipleInput format for that (the MultiTableInputFormat class recently introduced in 0.96 does not seem to do that map-side join).
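The core of such a map-side join can be sketched in plain Python, without any HBase API. It relies on both inputs arriving sorted by key, which is how HBase returns rows from a scan, and it assumes keys are unique within each table. The function name keys_in_both and the sample (order number, date) keys are made up for illustration.

```python
def keys_in_both(sorted_a, sorted_b):
    """Yield keys present in both inputs; each input must be sorted ascending."""
    it_a, it_b = iter(sorted_a), iter(sorted_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:
            # Key exists only in the first input; advance it.
            a = next(it_a, None)
        else:
            # Key exists only in the second input; advance it.
            b = next(it_b, None)


# Hypothetical (order number, date) keys, already in sorted order.
table1 = [("1001", "2014-01-02"), ("1002", "2014-01-02"), ("1003", "2014-01-05")]
table2 = [("1002", "2014-01-02"), ("1003", "2014-01-04"), ("1003", "2014-01-05")]

matches = list(keys_in_both(table1, table2))
print(matches)
```

Each table is read once from start to end, instead of issuing one lookup against table 2 for every record of table 1.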

Hive: Fast concatenate two tables into one?

I have two Hive tables of the same structure (schema). What would be an efficient SQL query to concatenate them into a single table with the same structure?
Update: this works quite fast in my case:
CREATE TABLE xy AS SELECT *
FROM (
SELECT *
FROM x
UNION ALL
SELECT *
FROM y
) tmp;
If you are trying to merge table_A and table_B into a single one, the easiest way is to use the UNION ALL operator. You can find the syntax and use cases here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union
UNION ALL is a correct solution but might be expensive in terms of resources and time. I'd recommend creating a table with two partitions, one for table A and another for table B. This way there is no need to merge (or UNION ALL): the merged table is available as soon as both partitions are populated.

Wrong index is chosen by Oracle

I have a problem with indexing in Oracle. I will try to explain it with an example.
I have a table TABLE1 with columns A,B,C,D
another table TABLE2 with columns A,B,C,E,F,H
I have created Indexes for TABLE1
IX_1 A
IX_2 A,B
IX_3 A,C
IX_4 A,B,C
I have created Indexes for TABLE2
IY_1 A,B,C
IY_2 A
When I run a query similar to this:
SELECT * FROM TABLE1 T1,TABLE2 T2
WHERE T1.A=T2.A
When I run Explain Plan, I see that it uses neither IX_1 nor IY_2.
It uses IX_4 and IY_1 instead.
Why is it not picking the right index?
EDITED:
Can anyone help me understand the difference between INDEX RANGE SCAN, INDEX UNIQUE SCAN, and INDEX SKIP SCAN?
I guess SKIP SCAN means that Oracle skips a column in a composite index.
I have no idea about the others.
The main benefit of indexes is that you can select a few rows from a table without scanning the entire table.
If you ask for too many rows (say 30%, though it depends on many things), the engine will prefer to scan the entire table for those rows.
That's because reading a row through an index adds overhead: first reading some index blocks, and then reading the table blocks.
In your case, in order to join tables T1 and T2, Oracle needs all the rows from those tables. Reading the full index would be a useless operation that adds unnecessary cost.
UPDATE: Going a step further, if you run:
SELECT T1.B, T2.B FROM TABLE1 T1,TABLE2 T2
WHERE T1.A=T2.A
Oracle will probably use the indexes (for example IX_2 and IY_1, which both contain A and B), because it does not need to read anything from the tables: the values T1.B and T2.B are in the indexes.
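The same idea can be observed in a small sketch. It uses SQLite through Python's sqlite3 module, not Oracle, so the plan text and the optimizer's choices differ from what Explain Plan shows. It also uses a single-table lookup rather than the join from the question, and the index name ix_ab is made up. When every selected column is in the index, the plan reports a covering index, meaning the table itself is not read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER, d INTEGER);
    -- Hypothetical composite index, similar in shape to IX_2 (A, B).
    CREATE INDEX ix_ab ON table1 (a, b);
""")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Only indexed columns are requested: the index alone can answer the query.
covered = plan("SELECT b FROM table1 WHERE a = 1")
# All columns are requested: c and d must be fetched from the table.
not_covered = plan("SELECT * FROM table1 WHERE a = 1")

print(covered)
print(not_covered)
```

The first plan mentions a covering index and the second does not, which mirrors the difference between SELECT * and SELECT T1.B, T2.B above.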

Resources