One-to-many instance join in Informatica - informatica-powercenter

I have two source tables in a mapping.
Table 1:
Taxid sal_cd
1000. A01
1000. B01
2000. C01
3000. D01
4000. Null
Table 2:
OrderCode amt sal_cd
201. 20. A01
202. 30. B01
202. 10. C01
203. 5. D01
The final data I want looks like this:
Target
Taxid. Ordercode. Amt
1000. 201. 20
1000. 202. 30
1000. 203. 0
2000. 201. 0
2000. 202. 10
2000. 202. 0
3000. 201. 0
3000. 202. 0
3000. 203. 5
4000. 201. 0
4000. 202. 0
4000. 203. 0
I tried a full outer join, but it's not helping.
Thanks in advance.
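One possible reading of the target, sketched in Python rather than as an Informatica mapping: every distinct Taxid is paired with every distinct OrderCode, and Amt is the sum of the Table 2 amounts whose sal_cd belongs to that Taxid, or 0 when nothing matches. The function name build_target is hypothetical, and the sketch assumes the repeated "2000 / 202 / 0" row in the target was meant to be OrderCode 203.

```python
# Source rows copied from the question; None stands for the NULL sal_cd.
table1 = [(1000, "A01"), (1000, "B01"), (2000, "C01"), (3000, "D01"), (4000, None)]
table2 = [(201, 20, "A01"), (202, 30, "B01"), (202, 10, "C01"), (203, 5, "D01")]


def build_target(table1, table2):
    """Pair every Taxid with every OrderCode; Amt is 0 when no sal_cd matches."""
    # Collect the sal_cd values that belong to each Taxid (NULL never matches).
    codes_by_tax = {}
    for taxid, sal_cd in table1:
        codes_by_tax.setdefault(taxid, set())
        if sal_cd is not None:
            codes_by_tax[taxid].add(sal_cd)

    # Distinct order codes, so 202 appears once per Taxid (an assumption).
    order_codes = sorted({order_code for order_code, _, _ in table2})

    target = []
    for taxid in sorted(codes_by_tax):
        for order_code in order_codes:
            amt = sum(
                amount
                for code, amount, sal_cd in table2
                if code == order_code and sal_cd in codes_by_tax[taxid]
            )
            target.append((taxid, order_code, amt))
    return target


for row in build_target(table1, table2):
    print(*row)
```

If this reading is right, the shape of the logic is a cross join of the distinct keys followed by a lookup on sal_cd, which is why a full outer join on sal_cd alone cannot produce the zero-amount rows.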

Oracle script to create dynamic query based on SQL held in a field

Where a user gives a set of inputs from one table, e.g. "request_table" a:
User Input | Value | Field Name in Database
Product | Deposit | product_type
Deposit Term (months) | 24 | term
Deposit Amount | 200,000 | amount
Customer Type | Charity | customer_type
Existing Customer | Y | existing_customer
I would like to use the product selection to pick out SQL scripts embedded in a "pricing_table" b, where the price is made up of components, each of which is affected by one or more of the above inputs:
Product | Grid | Measures | Value1 | Value1Min | Value1Max | Value2 | Value2Min | Value2Max | Price
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 0 | 12 |  | 0 | 100000 | 1
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 12 | 36 |  | 0 | 100000 | 2
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 36 | 9999 |  | 0 | 100000 | 3
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 0 | 12 |  | 100000 | 500000 | 1.1
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 12 | 36 |  | 100000 | 500000 | 2.1
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 36 | 9999 |  | 100000 | 500000 | 3.1
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 0 | 12 |  | 500000 | 99999999 | 1.2
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 12 | 36 |  | 500000 | 99999999 | 2.2
Deposit | Term_Amount | a.term>=b.value1min and a.term<b.value2 max and a.amount>=b.value2min and a.amount<b.value2max |  | 36 | 9999 |  | 500000 | 99999999 | 3.2
Deposit | Customer_Type | a.customer_type=b.value1 | Personal |  |  |  |  |  | 0
Deposit | Customer_Type | a.customer_type=b.value1 | Charity |  |  |  |  |  | 0.1
Deposit | Customer_Type | a.customer_type=b.value1 | Business |  |  |  |  |  | -0.1
Deposit | Existing_Customer | a.existing_customer=b.value1 | Y |  |  |  |  |  | 0.1
Deposit | Existing_Customer | a.existing_customer=b.value1 | N |  |  |  |  |  | 0
The query is: select distinct measures from pricing_table where product=(select product_type from request_table). This returns multiple rows, each holding a piece of SQL logic.
I would like to run this SQL logic in a LOOP, e.g.:
select b.* from pricing_table b where :measures
This would return all rows where the specific metrics are matched.
I'm doing it this way because the input columns can grow to hundreds, and I don't want a really wide table.
Any help appreciated, thanks.
I've created the tables but am unsure how to loop over the measures and apply the values from that field in a looped query.
In a PL/SQL pipelined function, you can build the SQL query and open a cursor on it, loop on the results and PIPE the rows.
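The build-the-query, open-a-cursor, loop-on-the-results pattern can be sketched outside Oracle as well. The following is only an illustration using Python's standard sqlite3 module, not PL/SQL; the function name matching_rows, the cut-down table layout and the sample rows are assumptions, and the Term_Amount predicate is written with b.value1max on the assumption that "b.value2 max" in the question is a typo. Because the predicate text is concatenated into the statement, this is only safe when pricing_table is fully trusted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE request_table (product_type TEXT, term INTEGER, amount INTEGER,
                            customer_type TEXT, existing_customer TEXT);
CREATE TABLE pricing_table (product TEXT, grid TEXT, measures TEXT, value1 TEXT,
                            value1min INTEGER, value1max INTEGER,
                            value2min INTEGER, value2max INTEGER, price REAL);
""")
conn.execute("INSERT INTO request_table VALUES ('Deposit', 24, 200000, 'Charity', 'Y')")

# Assumed corrected form of the predicate stored in the Measures column.
TERM_AMOUNT = ("a.term>=b.value1min and a.term<b.value1max "
               "and a.amount>=b.value2min and a.amount<b.value2max")
conn.executemany(
    "INSERT INTO pricing_table VALUES (?,?,?,?,?,?,?,?,?)",
    [
        ("Deposit", "Term_Amount", TERM_AMOUNT, None, 12, 36, 0, 100000, 2.0),
        ("Deposit", "Term_Amount", TERM_AMOUNT, None, 12, 36, 100000, 500000, 2.1),
        ("Deposit", "Customer_Type", "a.customer_type=b.value1", "Personal", None, None, None, None, 0.0),
        ("Deposit", "Customer_Type", "a.customer_type=b.value1", "Charity", None, None, None, None, 0.1),
        ("Deposit", "Existing_Customer", "a.existing_customer=b.value1", "Y", None, None, None, None, 0.1),
    ],
)


def matching_rows(conn):
    """Run each stored predicate against the rows that carry it."""
    measures = [row[0] for row in conn.execute(
        "SELECT DISTINCT measures FROM pricing_table "
        "WHERE product = (SELECT product_type FROM request_table)")]
    rows = []
    for predicate in measures:
        # The predicate comes from the table itself, so it is spliced in as text.
        sql = ("SELECT b.grid, b.price FROM pricing_table b "
               "JOIN request_table a ON b.product = a.product_type "
               "WHERE b.measures = ? AND (" + predicate + ")")
        rows.extend(conn.execute(sql, (predicate,)).fetchall())
    return rows


for grid, price in sorted(matching_rows(conn)):
    print(grid, price)
```

Restricting each dynamic query to the rows whose measures column equals the predicate being run keeps a Customer_Type predicate from being evaluated against Term_Amount rows.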

What is the syntax in Oracle to round any number to the greatest/highest place value of that number?

I have a wide variety of numbers: in the ten thousands, thousands, hundreds, etc.
I would like to round each one to its highest place value, e.g.:
Starting #: 2555.5
Correctly Rounded : 3000
——
More examples (in the same report):
Given: 255
Rounded: 300
Given: 25555
Rounded: 30000
Given: 2444
Rounded: 2000
But with the ROUND() or CEIL() functions I get the following:
Given: 2555.5
Did not want: 2556
Any ideas? Thank you in advance.
You can combine numeric functions like this:
SELECT
col,
ROUND(col / POWER(10,TRUNC(LOG(10, col)))) * POWER(10,TRUNC(LOG(10,col)))
FROM Data
See fiddle
Explanation:
LOG(10, number) gets the power you need to raise 10 to in order to get the number. E.g., LOG(10, 255) = 2.40654 and 10^2.40654 = 255.
TRUNC(LOG(10, col)) gives the number of digits not counting the leading digit (2 for 255).
POWER(10,TRUNC(LOG(10, col))) converts, e.g., 255 to 100.
Then we divide the number by this rounded number. E.g. for 255 we get 255 / 100 = 2.55.
Then we round. ROUND(2.55) = 3
Finally we multiply this rounded result again by the previous divisor: 3 * 100 = 300.
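The same steps can be sketched in Python for anyone who wants to check the arithmetic outside the database. This is an illustration only: the function name round_to_leading_digit is made up, Decimal.adjusted() stands in for TRUNC(LOG(10, col)), and ROUND_HALF_UP is used so that ties round away from zero, since Python's built-in round() rounds ties to even.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_leading_digit(value):
    """Round value to its highest place value, e.g. 2555.5 -> 3000."""
    d = Decimal(str(value))
    if d == 0:
        return d
    # Exponent of the most significant digit: 3 for 2555.5, 2 for 255.
    exponent = d.adjusted()
    divisor = Decimal(10) ** exponent          # 255 -> 100
    scaled = d / divisor                       # 255 / 100 = 2.55
    leading = scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP)  # 2.55 -> 3
    return leading * divisor                   # 3 * 100 = 300


for n in (2555.5, 255, 25555, 2444):
    print(n, round_to_leading_digit(n))
```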
Oracle's ROUND function accepts a second parameter giving the number of digits to round to; a negative value rounds to the left of the decimal point. Using it, we can simplify the select command (see fiddle):
SELECT
col,
ROUND(col, -TRUNC(LOG(10, col))) AS rounded
FROM Data
You can also use this to round by other fractions like quarters of the main number:
ROUND(4 * col, -TRUNC(LOG(10, col))) / 4 AS quarters
see fiddle
Similar to what Olivier built, you can use a combination of functions to round the numbers as you need. I built a similar method, except that instead of using LOG I used LENGTH to get the number of non-decimal digits.
WITH
nums (num)
AS
(SELECT 2555.5 FROM DUAL
UNION ALL
SELECT 255 FROM DUAL
UNION ALL
SELECT 25555 FROM DUAL
UNION ALL
SELECT 2444 FROM DUAL)
SELECT num,
ROUND (num, (LENGTH (TRUNC (num)) - 1) * -1) as rounded
FROM nums;
NUM ROUNDED
_________ __________
2555.5 3000
255 300
25555 30000
2444 2000

Replacing selective numbers with NaNs

I have eight columns of data. Columns 1, 3, 5 and 7 contain 3-digit numbers. Columns 2, 4, 6 and 8 contain ones and zeros and correspond to columns 1, 3, 5 and 7 respectively. Where there is a zero in an even column, I want to change the corresponding number to NaN. More simply, if it were
155 1 345 0
328 1 288 1
884 0 145 0
326 1 332 1
159 0 186 1
then 884 would be replaced with NaN, as would 159, 345 and 145 with the other numbers remaining the same. I need to use NaN to maintain the data in matrix form.
I know I could use
data(3,1)=NaN; data(5,1)=NaN
etc., but this is very time-consuming. Any suggestions would be very welcome.
Approach 1
a1 = [
155 1 345 0
328 1 288 1
884 0 145 0
326 1 332 1
159 0 186 1]
t1 = a1(:,[2:2:end])
data1 = a1(:,[1:2:end])
t1(t1==0)=NaN
t1(t1==1)=data1(t1==1)
a1(:,[1:2:end]) = t1
Output -
a1 =
155 1 NaN 0
328 1 288 1
NaN 0 NaN 0
326 1 332 1
NaN 0 186 1
Approach 2
[x1,y1] = find(~a1(:,[2:2:end]))
a1(sub2ind(size(a1),x1,2*y1-1)) = NaN
I would split the problem into two matrices, with one being a logical mask, the other holding your data.
data = your_mat(:,1:2:end);
valid = your_mat(:,2:2:end);
Then you can simply do:
data(~valid)=NaN;
You could then rebuild your data by doing:
your_mat(:,1:2:end) = data;
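For readers without MATLAB, the same pairing of each data column with the flag column to its right can be sketched in plain Python. This is only an illustration with nested lists and no array library; the function name mask_invalid is hypothetical.

```python
import math


def mask_invalid(rows):
    """Return a copy of rows where a value becomes NaN if the flag to its right is 0."""
    masked = []
    for row in rows:
        new_row = list(row)
        # Columns come in (value, flag) pairs: indices 0/1, 2/3, ...
        for j in range(0, len(row) - 1, 2):
            if row[j + 1] == 0:
                new_row[j] = math.nan
        masked.append(new_row)
    return masked


a1 = [
    [155, 1, 345, 0],
    [328, 1, 288, 1],
    [884, 0, 145, 0],
    [326, 1, 332, 1],
    [159, 0, 186, 1],
]
for row in mask_invalid(a1):
    print(row)
```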
Here is an interesting solution, I would expect it to perform quite well, but be aware that it is a bit tricky!
data(~data(:,2:end))=NaN
Using logical indexing:
even = a1(:,2:2:end); % even columns
odd = a1(:,1:2:end); % odd columns
odd(even == 0) = NaN; % set odd columns to NaN if corresponding col is 0
a1(:,1:2:end) = odd; % assign back to a1
a1 =
155 1 NaN 0
328 1 288 1
NaN 0 NaN 0
326 1 332 1
NaN 0 186 1
Here is an alternative solution. You can use circshift, in the following manner.
First create a mask of the even columns, of the same size as your input matrix A:
AM = false(size(A)); AM(:,2:2:end) = true;
Then circshift the mask (A==0)&AM one element to the left, to move this mask onto the odd columns.
A(circshift((A==0)&AM,[0 -1])) = nan;
NOTE: I've searched for a one-liner ... I don't think it's a good one, but here is one you can use, based on my solution:
A(circshift(bsxfun(@and, A==0, mod(0:size(A,2)-1,2)),[0 -1])) = nan;
The dirty part with bsxfun is that the mask AM is created inline. For that I apply an oddness test to a vector of indices, and bsxfun extends it over the whole matrix A. You can create this mask any other way, of course.

How to substitute a for-loop with vectorization acting several thousand times per data.frame row?

Being still quite wet behind the ears concerning R and, more importantly, vectorization, I cannot get my head around how to speed up the code below.
The for-loop calculates the number of seeds falling onto a road for several road segments with different densities of seed-generating plants by applying a random probability to every seed.
As my real data frame has ~200k rows and seed numbers are up to 300k/segment, using the example below would take several hours on my current machine.
#Example data.frame
df <- data.frame(Density=c(0,0,0,3,0,120,300,120,0,0))
#Example SeedRain vector
SeedRainDists <- c(7.72,-43.11,16.80,-9.04,1.22,0.70,16.48,75.06,42.64,-5.50)
#Calculating the number of seeds from plant densities
df$Seeds <- df$Density * 500
#Applying a probability of reaching the road for every seed
df$SeedsOnRoad <- apply(as.matrix(df$Seeds),1,function(x){
  SeedsOut <- 0
  if(x>0){
    #Summing up the number of seeds reaching a certain distance
    for(i in 1:x){
      SeedsOut <- SeedsOut +
        ifelse(sample(SeedRainDists,1,replace=T)>40,1,0)
    }
  }
  return(SeedsOut)
})
If someone might give me a hint as to how the loop could be substituted by vectorization - or maybe how the data could be organized better in the first place to improve performance - I would be very grateful!
Edit: Roland's answer showed that I may have oversimplified the question. In the for-loop I extract a random value from a distribution of distances recorded by another author (that's why I can't supply the data here). Added an exemplary vector with likely values for SeedRain distances.
This should do about the same simulation:
df$SeedsOnRoad2 <- sapply(df$Seeds,function(x){
rbinom(1,x,0.6)
})
# Density Seeds SeedsOnRoad SeedsOnRoad2
#1 0 0 0 0
#2 0 0 0 0
#3 0 0 0 0
#4 3 1500 892 877
#5 0 0 0 0
#6 120 60000 36048 36158
#7 300 150000 90031 89875
#8 120 60000 35985 35773
#9 0 0 0 0
#10 0 0 0 0
One option is to generate the sample() for all Seeds per row of df in a single go.
Using set.seed(1) before your loop-based code I get:
> df
Density Seeds SeedsOnRoad
1 0 0 0
2 0 0 0
3 0 0 0
4 3 1500 289
5 0 0 0
6 120 60000 12044
7 300 150000 29984
8 120 60000 12079
9 0 0 0
10 0 0 0
I get the same answer in a fraction of the time if I do:
set.seed(1)
tmp <- sapply(df$Seeds,
              function(x) sum(sample(SeedRainDists, x, replace = TRUE) > 40))
> tmp
[1] 0 0 0 289 0 12044 29984 12079 0 0
For comparison:
df <- transform(df, GavSeedsOnRoad = tmp)
df
> df
Density Seeds SeedsOnRoad GavSeedsOnRoad
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 3 1500 289 289
5 0 0 0 0
6 120 60000 12044 12044
7 300 150000 29984 29984
8 120 60000 12079 12079
9 0 0 0 0
10 0 0 0 0
The points to note here are:
try to avoid calling a function repeatedly in a loop if the function is vectorised or can generate the entire end result with a single call. Here you were calling sample() Seeds times for each row of df, each call returning a single sample from SeedRainDists. Here I do a single sample() call asking for sample size Seeds, for each row of df. Hence I call sample() 10 times, whereas your code called it 271500 times.
even if you have to repeatedly call a function in a loop, remove from the loop anything that is vectorised that could be done on the entire result after the loop is done. An example here is your accumulating of SeedsOut, which is calling +() a large number of times.
Better would have been to collect each SeedsOut in a vector, and then sum() that vector outside the loop. E.g.
SeedsOut <- numeric(length = x)
for(i in seq_len(x)) {
SeedsOut[i] <- ifelse(sample(SeedRainDists,1,replace=TRUE)>40,1,0)
}
sum(SeedsOut)
Note that R treats a logical as if it were numeric 0s or 1s where used in any mathematical function. Hence
sum(ifelse(sample(SeedRainDists, 100, replace=TRUE)>40,1,0))
and
sum(sample(SeedRainDists, 100, replace=TRUE)>40)
would give the same result if run with the same set.seed().
There may be a fancier way of doing the sampling that requires fewer calls to sample(). In fact there is: sample(SeedRainDists, sum(Seeds), replace = TRUE) > 40. But then you need to take care of selecting the right elements of that vector for each row of df, which is not hard, just a little cumbersome. What I show may be efficient enough.
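The single-draw idea, with the bookkeeping for picking each row's slice, might look like the following. It is a sketch in Python rather than R, using only the standard library; the function name seeds_on_road and its parameters are made up for illustration, and the random stream will not match R's set.seed() results.

```python
import random
from itertools import accumulate

seed_rain_dists = [7.72, -43.11, 16.80, -9.04, 1.22, 0.70, 16.48, 75.06, 42.64, -5.50]
density = [0, 0, 0, 3, 0, 120, 300, 120, 0, 0]
seeds = [d * 500 for d in density]


def seeds_on_road(seeds, dists, threshold=40, rng=random):
    """Count, per segment, how many sampled distances exceed the threshold."""
    # One draw covering every seed of every segment.
    hits = [dist > threshold for dist in rng.choices(dists, k=sum(seeds))]
    # Running totals give the end of each segment's slice in `hits`.
    ends = list(accumulate(seeds))
    starts = [0] + ends[:-1]
    return [sum(hits[start:end]) for start, end in zip(starts, ends)]


random.seed(1)
print(seeds_on_road(seeds, seed_rain_dists))
```

Two of the ten example distances exceed 40, so each count should land near 20% of that segment's seed number.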

GAMS, matrix direct assignment

I want to assign values to a 3-D table in GAMS, but it seems it doesn't work the way it does in Matlab. Any luck? The code is as follows, and the problem is in the last few lines:
Sets
n nodes / Sto , Lon , Par , Ber , War , Mad , Rom /
i scenarios / 1 * 4 /
k capacity level / L, N, H / ;
alias(n,m);
Table balance(n,i) traffic balance for different nodes
1 2 3 4
Sto 50 50 -50 -50
Lon -40 40 -40 40
Par 0 0 0 0
Ber 0 0 0 0
War 40 -40 40 -40
Mad 0 0 0 0
Rom -50 -50 50 50 ;
Scalar r fluctuation rate of the capacity level
/0.15/;
Parameter p(k) probability of each level
/ L 0.25
N 0.5
H 0.25 / ;
Table nor_cap(n,m) Normal capacity level from n to m
Sto Lon Par Ber War Mad Rom
Sto 0 11 14 25 30 0 0
Lon 11 0 21 0 0 14 0
Par 14 21 0 22 0 31 19
Ber 25 0 22 0 26 0 18
War 30 0 0 26 0 18 22
Mad 0 14 31 0 18 0 15
Rom 0 0 19 18 22 15 0 ;
Table max_cap(n,m,k) capacity level under each k
max_cap(n,m,'N')=nor_cap(n,m)
max_cap(n,m,'L')=nor_cap(n,m)*(1-r)
max_cap(n,m,'H')=nor_cap(n,m)*(1+r);
The final assignment to a 3-D matrix should be done with PARAMETER as opposed to TABLE. In general I would also note that TABLE is very restrictive (2 dimensional, text input inside the code). You might want to consider $GDXIN (or EXECUTE_LOAD) and some of the GAMS utilities for loading xls or csv files.
As a user of both MATLAB and GAMS I would note that GAMS depends on "indices" for every array, but otherwise they can be quite similar. In your case max_cap(n,m,k) would be something like the maximum capacity between from_city and to_city under each capacity level scenario. Your matrix needs to be declared as a PARAMETER which can be any n-dimensional (indexed) matrix, including even a SCALAR.
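To make the "indexed parameter" idea concrete for readers coming from array languages, here is a rough analogy in Python, not GAMS syntax: a parameter behaves much like a mapping keyed by a tuple of set elements. Only three links from nor_cap are copied in, and the factor dictionary is an illustrative helper that is not part of the original model.

```python
r = 0.15  # fluctuation rate of the capacity level, as in the question

# A few entries of nor_cap(n,m), keyed by (from_node, to_node).
nor_cap = {("Sto", "Lon"): 11, ("Sto", "Par"): 14, ("Lon", "Par"): 21}

# Multiplier applied to the normal capacity under each level k.
factor = {"L": 1 - r, "N": 1.0, "H": 1 + r}

# max_cap(n,m,k): one value per (n, m, k) index combination.
max_cap = {
    (n, m, k): cap * f
    for (n, m), cap in nor_cap.items()
    for k, f in factor.items()
}

for key in sorted(max_cap):
    print(key, round(max_cap[key], 2))
```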
Also, try the GAMS mailing list if you really need an answer quickly, the number of proficient GAMS users globally can't be more than a few thousand, so it might be hard to find a quick answer on StackOverflow - awesome as it is for the more common languages.
