How to create a repeated sequence in Informatica? - etl

How can I generate a repeated sequence using an Informatica mapping?
Source file:
A
B
C
D
E
F
G
H
I
J
Target file:
A 1
B 1
C 2
D 2
E 3
F 3
G 4
H 4
I 5
J 5
Thank you in advance.

You can use a Sequence Generator, and then an Expression that divides the value of NEXTVAL by 2:
OUT: ROUND(NEXTVAL / 2)
In the Sequence Generator you could set "Start Value" to 1 and check "Reset" so that the mapping always starts with 1 1 2 2 3 3 if that's what you need.

You should be able to achieve this using variable ports in an Expression transformation, as long as your input rows arrive in the correct order, e.g. (pseudocode, relying on numeric variable ports starting at 0):
v_RowCount = v_RowCount + 1
v_Seq = if v_RowCount Mod 2 = 1 then (v_Seq + 1) else v_Seq
(Output port) out_Seq = v_Seq
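Both answers aim for the same mapping: row number n should get group number ceil(n / 2). A minimal Python sketch of that arithmetic, outside Informatica, can help check the expected output before building the mapping. The function name `repeated_sequence` and its `repeat` parameter are made up for this illustration.

```python
def repeated_sequence(row_count, repeat=2):
    """Return the group number for rows 1..row_count, each number used `repeat` times."""
    # (n + repeat - 1) // repeat is the integer form of ceil(n / repeat).
    return [(n + repeat - 1) // repeat for n in range(1, row_count + 1)]


rows = list("ABCDEFGHIJ")
for letter, seq in zip(rows, repeated_sequence(len(rows))):
    print(letter, seq)
```

Changing `repeat` to 3 would give 1 1 1 2 2 2 and so on, which corresponds to dividing NEXTVAL by 3 instead of 2 and rounding up.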

Related

What is N in this given scenario

I am trying to implement this code, and the website has kindly provided its algorithm, but I cannot work out what "n" is. I understand what "i" and "m" are, but not "n". Is "n" the total number of inputs (5 in the example below, because there are 5 letters)?
Algorithm:
Combinations are generated in lexicographical order. The algorithm uses indexes of the elements of the set. Here is how it works on example: Suppose we have a set of 5 elements with indexes 1 2 3 4 5 (starting from 1), and we need to generate all combinations of size m = 3.
First, we initialize the first combination of size m - with indexes in ascending order
1 2 3
Then we check the last element (i = 3). If its value is less than n - m + i, it is incremented by 1.
1 2 4
Again we check the last element, and since it is still less than n - m + i, it is incremented by 1.
1 2 5
Now it has the maximum allowed value: n - m + i = 5 - 3 + 3 = 5, so we move on to the previous element (i = 2).
If its value is less than n - m + i, it is incremented by 1, and all following elements are set to the value of their previous neighbor plus 1
1 (2+1)3 (3+1)4 = 1 3 4
Then we again start from the last element i = 3
1 3 5
Back to i = 2
1 4 5
Now it finally equals n - m + i = 5 - 3 + 2 = 4, so we can move to the first element (i = 1)
(1+1)2 (2+1)3 (3+1)4 = 2 3 4
And then,
2 3 5
2 4 5
3 4 5
and it is the last combination since all values are set to the maximum possible value of n - m + i.
Input:
A
B
C
D
E
Output:
A B C
A B D
A B E
A C D
A C E
A D E
B C D
B C E
B D E
C D E
Take a look at the very first paragraph of the link you provided.
It states that
This combinations calculator generates all possible combinations of m elements from the set of n elements.
So yes, n is the number of elements or letters that the algorithm needs to use.
N here is the size of the set from which you generate the combinations. In the given example, "Suppose we have a set of 5 elements with indexes 1 2 3 4 5 (starting from 1)", N is 5.
Combinations are usually written nCm, or "n choose m". So n is the total set size (5 in this example) and m is the number chosen (3).
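For anyone implementing the quoted algorithm, here is a minimal Python sketch of it, with n taken as the number of input elements. It uses 0-based indexes, so the maximum value for position i is still n - m + i. The function name `combinations_lex` is made up for this illustration and is not from the website.

```python
def combinations_lex(items, m):
    """Yield every m-element combination of items in lexicographic index order."""
    n = len(items)  # n is the size of the full set
    if m < 0 or m > n:
        return
    idx = list(range(m))  # first combination: indexes 0, 1, ..., m - 1
    while True:
        yield [items[k] for k in idx]
        # Find the rightmost position that is still below its maximum, n - m + i.
        i = m - 1
        while i >= 0 and idx[i] == n - m + i:
            i -= 1
        if i < 0:
            return  # every position is at its maximum: this was the last one
        idx[i] += 1
        # Every later position becomes its left neighbour plus 1.
        for j in range(i + 1, m):
            idx[j] = idx[j - 1] + 1


for combo in combinations_lex("ABCDE", 3):
    print(" ".join(combo))
```

In everyday Python code, `itertools.combinations` from the standard library produces the same sequence.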

diagonal value in co-occurrence matrix

I am a beginner, so thank you in advance for any advice.
I want to make a co-occurrence matrix and followed the link below:
How to use R to create a word co-occurrence matrix
but I cannot understand why the value of A-A is 10 in the matrix below.
Shouldn't it be 4, because there are four A's?
library(qdapTools)  # provides mtabulate()
dat <- read.table(text='film tag1 tag2 tag3
1 A A A
2 A C F
3 B D C ', header=T)
crossprod(as.matrix(mtabulate(as.data.frame(t(dat[, -1])))))
   A C F B D
A 10 1 1 0 0
C  1 2 1 1 1
F  1 1 1 0 0
B  0 1 0 1 1
D  0 1 0 1 1
The solution you use presumes each tag appears only once per film, which jibes with the definition of a co-occurrence matrix as far as I can tell. Because film 1 has three A's, each A on the first line gets counted as co-occurring with itself and with the other two A's, giving 3 x 3 = 9. Adding 1 for the single A on the second line results in a total of ten co-occurrences.
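The arithmetic behind the 10 can be reproduced outside R. The sketch below is a plain-Python stand-in for the `mtabulate` plus `crossprod` step, not the R code itself; the names `cooccurrence`, `raw` and `binary` are made up for this illustration. It also shows a second convention, counting each tag at most once per film, as an assumption about what may be wanted.

```python
from collections import Counter

films = [["A", "A", "A"], ["A", "C", "F"], ["B", "D", "C"]]
tags = ["A", "C", "F", "B", "D"]


def cooccurrence(per_film):
    """Sum, over films, the product of the two tags' counts (like crossprod)."""
    return {(r, c): sum(f[r] * f[c] for f in per_film) for r in tags for c in tags}


# Raw counts per film: film 1 contributes 3 * 3 = 9 to A-A, film 2 contributes 1.
raw = cooccurrence([Counter(f) for f in films])

# Presence only: each tag counts at most once per film.
binary = cooccurrence([Counter(set(f)) for f in films])

print(raw[("A", "A")])     # 10
print(binary[("A", "A")])  # 2
```

With presence-only counting, the diagonal holds the number of films that contain the tag, so A-A becomes 2 rather than 10 or 4.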

sorting a dataframe by values and storing index and columns

I have a pandas DataFrame that is really a matrix. It looks like this:
   a  b  c
d  1  0  5
e  0  6  2
f  2  0  3
I need the values sorted in descending order, along with the index and column label of each one. The result should be:
index Column Value
e b 6
d c 5
f c 3
Use stack to reshape the DataFrame into a Series, then nlargest to pick the top values:
df1 = df.stack().nlargest(3).rename_axis(['idx','col']).reset_index(name='val')
print (df1)
  idx col  val
0   e   b    6
1   d   c    5
2   f   c    3
To keep the labels as a MultiIndex instead:
df2 = df.stack().nlargest(3).to_frame(name='val')
print (df2)
     val
e b    6
d c    5
f c    3
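To see what the stack and nlargest combination is doing conceptually, here is a pandas-free sketch: flatten the matrix into (index, column, value) triples, then keep the k largest by value. The nested-dict layout and the name `top_cells` are assumptions made for this illustration.

```python
import heapq

# The same matrix as a dict of rows; each row maps column label -> value.
matrix = {
    "d": {"a": 1, "b": 0, "c": 5},
    "e": {"a": 0, "b": 6, "c": 2},
    "f": {"a": 2, "b": 0, "c": 3},
}


def top_cells(table, k):
    """Return the k largest cells as (index, column, value) triples."""
    # Flattening to triples plays the role of DataFrame.stack().
    triples = [(idx, col, val) for idx, row in table.items() for col, val in row.items()]
    # heapq.nlargest plays the role of Series.nlargest().
    return heapq.nlargest(k, triples, key=lambda t: t[2])


for idx, col, val in top_cells(matrix, 3):
    print(idx, col, val)
```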

Unix / Shell Add a range of columns to file

I've been working on the same problem for the last few days, and I've hit a formatting roadblock.
I have a program that will only run if every row has the same number of columns. I know the total column count and how many columns need to be added with a filler value of 0, but I'm not sure how to do this. Is there some type of range option in awk or sed for this?
Input:
A B C D E
A B C D E 1 1 1 1
Output:
A B C D E 0 0 0 0
A B C D E 1 1 1 1
The alphabet columns are always present (with different values), but this "fill in the blank" function is eluding me. I can't use R for this due to the data file size.
One way using awk:
$ awk 'NF!=n{for(i=NF+1;i<=n;i++)$i=0}1' n=9 file
A B C D E 0 0 0 0
A B C D E 1 1 1 1
Just set n to the number of columns you want to pad up to.
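If awk is not available, the same padding can be sketched in Python. It works on one line at a time, so it could be applied while streaming a large file. The function name `pad_columns` and the `fill` parameter are made up for this illustration, and whitespace-separated columns are assumed, as in the awk one-liner.

```python
def pad_columns(line, n, fill="0"):
    """Pad a whitespace-separated line with `fill` until it has n columns."""
    fields = line.split()
    # A negative repeat count gives an empty list, so full rows are left alone.
    fields.extend([fill] * (n - len(fields)))
    return " ".join(fields)


for row in ["A B C D E", "A B C D E 1 1 1 1"]:
    print(pad_columns(row, 9))
```

Like the awk version, this joins the fields with single spaces, so any original spacing or tabs between columns is not preserved.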

Print (or output to file) table of number of steps for Euclid's algorithm

I'd like to print (or send to a file in a human-readable format like below) arbitrary size square tables where each table cell contains the number of steps required to solve Euclid's algorithm for the two integers in the row/column headings like this (table written by hand, but I think the numbers are all correct):
   1  2  3  4  5  6
1  1  1  1  1  1  1
2  1  1  2  1  2  1
3  1  2  1  2  3  1
4  1  1  2  1  2  2
5  1  2  3  2  1  2
6  1  1  1  2  2  1
The script would ideally allow me to choose the start integer (1 as above or 11 as below or something else arbitrary) and end integer (6 as above or 16 as below or something else arbitrary and larger than the start integer), so that I could do this too:
    11  12  13  14  15  16
11   1   2   3   4   4   3
12   2   1   2   2   2   2
13   3   2   1   2   3   3
14   4   2   2   1   2   2
15   4   2   3   2   1   2
16   3   2   3   2   2   1
I realize that the table is symmetric about the diagonal and so only half of the table contains unique information, and that the diagonal itself is always a 1-step algorithm.
See this and for a graphical representation of what I'm after, but I'd like to know the actual number of steps for any two integers which the image doesn't show me.
I have the algorithms (there's probably better implementations, but I think these work):
The step counter:
def gcd(a,b):
    """Step counter."""
    if b > a:
        x = a
        a = b
        b = x
    counter = 0
    while b:
        c = a % b
        a = b
        b = c
        counter += 1
    return counter
The list builder:
def gcd_steps(n):
    """List builder."""
    print("Table of size", n - 1, "x", n - 1)
    list_of_steps = []
    for i in range(1, n):
        for j in range(1, n):
            list_of_steps.append(gcd(i,j))
    print(list_of_steps)
    return list_of_steps
but I'm totally hung up on how to write the table. I thought about a double nested for loop with i and j and stuff, but I'm new to Python and haven't a clue about the best way (or any way) to go about writing the table. I don't need special formatting like something to offset the row/column heads from the table cells as I can do that by eye, but just getting everything to line up so that I can read it easily is proving too difficult for me at my current skill level, I'm afraid. I'm thinking that it probably makes sense to print/output within the two nested for loops as I'm calculating the numbers I need which is why the list builder has some print statements as well as returning the list, but I don't know how to work the print magic to do what I'm after.
Try this. The program computes the data row by row and writes each row to the file as soon as it is available, in order to limit memory usage.
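def gcd(a, b):
    # Count the division steps of Euclid's algorithm.
    k = 0
    if b > a:
        a, b = b, a
    while b > 0:
        a, b = b, a % b
        k += 1
    return k
def printgcd(name, a, b):
    # Write a tab-separated table of step counts for a..b (inclusive) to a file.
    with open(name, "wt") as f:
        # Header row: an empty corner cell followed by the column labels.
        s = ""
        for i in range(a, b + 1):
            s = "{}\t{}".format(s, i)
        f.write("{}\n".format(s))
        # One line per row label, written as soon as it is computed.
        for i in range(a, b + 1):
            s = "{}".format(i)
            for j in range(a, b + 1):
                s = "{}\t{}".format(s, gcd(i, j))
            f.write("{}\n".format(s))
printgcd("gcd-1-6.txt", 1, 6)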
The function above does not return the computed values, since each row is discarded on purpose once it has been written. Keeping them is easy, however. Here is a version that stores them in a dictionary (hash table) keyed by (i, j):
def printgcd2(name, a, b):
    h = { }
    with open(name, "wt") as f:
        s = ""
        for i in range(a, b + 1):
            s = "{}\t{}".format(s, i)
        f.write("{}\n".format(s))
        for i in range(a, b + 1):
            s = "{}".format(i)
            for j in range(a, b + 1):
                k = gcd(i, j)
                s = "{}\t{}".format(s, k)
                h[i, j] = k
            f.write("{}\n".format(s))
    return h
And here is another version that returns a list of lists, one inner list per table row:
def printgcd3(name, a, b):
    u = [ ]
    with open(name, "wt") as f:
        s = ""
        for i in range(a, b + 1):
            s = "{}\t{}".format(s, i)
        f.write("{}\n".format(s))
        for i in range(a, b + 1):
            v = [ ]
            s = "{}".format(i)
            for j in range(a, b + 1):
                k = gcd(i, j)
                s = "{}\t{}".format(s, k)
                v.append(k)
            f.write("{}\n".format(s))
            u.append(v)
    return u
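Since the question also asks about printing the table so that the columns line up, here is a minimal sketch that builds the table as a string with fixed-width, right-aligned cells instead of tabs. The names `euclid_steps` and `format_steps_table` are made up for this illustration, and the column width is assumed to be the length of the largest label plus one space.

```python
def euclid_steps(a, b):
    """Count the division steps Euclid's algorithm needs for a and b."""
    if b > a:
        a, b = b, a
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps


def format_steps_table(start, end):
    """Return an aligned text table of step counts for start..end inclusive."""
    numbers = range(start, end + 1)
    width = len(str(end)) + 1  # widest label plus one space of padding
    # Header: a blank corner cell followed by the column labels.
    lines = [" " * width + "".join(f"{j:>{width}}" for j in numbers)]
    for i in numbers:
        cells = "".join(f"{euclid_steps(i, j):>{width}}" for j in numbers)
        lines.append(f"{i:>{width}}" + cells)
    return "\n".join(lines)


print(format_steps_table(1, 6))
```

Returning a string keeps the formatting separate from the output, so the same result can be printed or written to a file with `f.write(...)`.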
