Create Rows depending on count in Informatica

I am new to the Informatica PowerCenter tool and am working on an assignment.
I have input data in a flat file.
data.csv contains
A,2
B,3
C,2
D,1
The required output in output.csv should be:
A
A
B
B
B
C
C
D
That is, I need to create output rows depending on the value in the second column. I tried it using a Java transformation and got the result.
Is there any other way to do it?
Please help.

A Java transformation is a very good approach, but if you insist on an alternative implementation, you can use a helper table and a Joiner transformation.
Create a helper table and populate it with an appropriate number of rows (you need to know the maximum value that may appear in the input file).
There is one row with COUNTER=1, two rows with COUNTER=2, three rows with COUNTER=3, etc.
Use a Joiner transformation to join data from the input file and the helper table - since the latter contains multiple rows for a single COUNTER value, the input rows will be multiplied.
COUNTER
-------------
1
2
2
3
3
3
4
4
4
4
(...)
Depending on your RDBMS, you may be able to produce the contents of the helper table using a SQL query in a source qualifier.
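For example, on Oracle you could put something like the following sketch in the Source Qualifier's SQL override (hedged: dual/CONNECT BY are Oracle-specific, and the maximum counter of 4 is an assumption taken from the sample data):
-- Emit each value n from 1..4 exactly n times, matching the listing above.
SELECT counts.n AS counter
FROM (SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 4) counts
CROSS JOIN (SELECT LEVEL AS k FROM dual CONNECT BY LEVEL <= 4) reps
WHERE reps.k <= counts.n
ORDER BY counter;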

Related

Google Sheets Query: Select misses data when there are different data types in a column

I have a table like this:
a    b    c
-------------
1    2    abc
2    3    4.00
Note that C2 is text while C3 is a number.
When I do
=QUERY(A1:C,"select *")
The result is like:
a    b    c
-------------
1    2
2    3    4.00
The "text" in C2 has been missed. You can see the live sheet here:
https://docs.google.com/spreadsheets/d/1UOiP1JILUwgyYUsmy5RzQrpGj7opvPEXE46B3xfvHoQ/edit?usp=sharing
How to deal with this issue?
QUERY is very useful, but it has one main limitation: it can only handle one kind of data per column. The other data is left blank. There are ways to try to overcome this from inside the QUERY, but I've found them unfruitful. What you can do is simply use:
={A:C}
You can work with filters on your own, but here is a step-by-step guide to reproducing the main features of QUERY. If you need to add conditions, use LAMBDA, INDEX and FILTER.
For example, to check where A is not null:
=LAMBDA(quer,FILTER(quer,INDEX(quer,,1)<>""))({A:C}) --> with INDEX(quer,,1), I've accessed the first column
Where B is greater than one cell's value and less than another's:
=LAMBDA(quer,FILTER(quer,INDEX(quer,,2)>D1,INDEX(quer,,2)<D2))({A:C})
For sorting and limiting the number of items, use SORTN. For example, to sort by the 3rd column and keep the 5 highest values in that column:
=LAMBDA(quer,SORTN(FILTER(quer,INDEX(quer,,1)<>""),5,1,3,0))({A:C})
Or, to limit to 5 rows without sorting, use ARRAY_CONSTRAIN (note that it takes both the number of rows and the number of columns to keep; 3 columns here for A:C):
=ARRAY_CONSTRAIN(LAMBDA(quer,FILTER(quer,INDEX(quer,,1)<>""))({A:C}),5,3)
There are other options too: you can use REGEXMATCH and similar functions to emulate QUERY's features without losing data. Let me know!
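For instance, a hedged sketch of a REGEXMATCH filter on the third column (the pattern "abc" is just a placeholder, and the &"" coerces mixed types to text, since REGEXMATCH expects strings):
=LAMBDA(quer,FILTER(quer,REGEXMATCH(INDEX(quer,,3)&"","abc")))({A:C})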
shenkwen,
If you are comfortable with adding a Google Apps Script to your sheet to provide a custom function, I have a QUERY replacement function that supports all standard SQL SELECT syntax. It doesn't analyze the column data to try to force everything to one type based on which data is most common in the column, so this is not an issue.
The custom function code is a single file, available at:
https://github.com/demmings/gsSQL/tree/main/dist
After you save it, you have a new function available in your sheet. In your example, the syntax would be:
=gsSQL("select a,b,c from testTable", {{"testTable", "F150:H152", 60, true}})
If your data is on a separate tab called 'testTable' (or whatever you want), the second parameter is not required.
I have typed your example data into my test sheet (see line 150):
https://docs.google.com/spreadsheets/d/1Zmyk7a7u0xvICrxen-c0CdpssrLTkHwYx6XL00Tb1ws/edit?usp=sharing

How to fetch previous values of a table in Oracle Forms

My first task is to add two new columns to a table: the first column stores the values of the M and X fields in a single column (as a single unit, with a pipe separator), and the second column stores the O and Z field values in a single column (also as a single unit, with a pipe separator).
The second task: after selecting the agency and external letter rating (shown in the image) from a drop-down and saving the form, the values from fields M and X should move to N and Y, and these values should be stored in the table columns created in the first task. If we save the form again, the values should move to the O and Z fields, and this should continue.
Can anyone help me with how to proceed? I also don't know how to split a column value into pieces and display them on the form.
It would be better if you could propose a new method that does the same work.
Adding columns:
That's a bad idea. Concatenating values is easy; storing them in a column is easy as well. But then, in the next step, you have to split those values back into two values (columns? rows?) to be joined to another value and produce a result. Can you do it? Sure. Should you? No.
What to do? If you want to store 4 values, then add 4 columns to a table.
Alternatively, see if you can create a master-detail relationship between two tables so you'd actually create a new table (with a foreign key to existing table) with two additional columns:
one that says whether the stored value relates to M or Y
the value itself
It looks like more work, but it should pay off in the future.
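To illustrate, a minimal DDL sketch of such a master-detail table (all names here are hypothetical, and I'm assuming the existing table has a numeric primary key):
CREATE TABLE rating_detail (
    id          NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    rating_id   NUMBER NOT NULL REFERENCES ratings (id), -- FK to the existing (assumed) table
    field_name  VARCHAR2(10) NOT NULL,                   -- which item the value relates to, e.g. 'M' or 'Y'
    field_value VARCHAR2(100)                            -- the value itself
);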
Layout:
That really looks like a tabular form, which only supports what I previously said. You can't "dynamically" add rows; even if you could, that's really something you should avoid, because you'd have to add (actually, display) separate items, not rows that share the same item name.

SSIS: flagging ALL the Data Quality issues in each row with Conditional Split

I have been tasked with performing Data Quality checks on data from a SQL table, whereby I export problem rows into a separate SQL table.
So far I've used a main Conditional Split that feeds into Derived Columns: one per condition. It works in that it checks for errors and, depending on which condition fails first, outputs the row with a DQ_TYPE column populated with a certain code (e.g. DQ_001 if it had an error with the Hours condition, DQ_002 if it hit an error with the Consultant Code condition, and so on).
The problem is that I need to be able to see all of the errors within each row. For example at the moment, if Patient 101 has a row in the SQL table that has errors in all 5 columns, it'll fail the first condition in Conditional Split and 1 row will get output into my results with the code DQ_001. I would instead need it to be output 5 times, once for each error that it encountered, i.e. 1 row with DQ_001, a 2nd row with DQ_002, a 3rd row with DQ_003 and so on.
The goal is that I will use the DataQualityErrors SQL table to create an SSRS report that groups on DQ_TYPE, so we can use a pie chart to show the distribution of which DQ_00X error codes are most prevalent.
Is this possible using straightforward toolbox functions? Or is this only available with complex Script tasks, etc.?
Assuming I understand your problem, I would structure this as a series of columns added to the data flow via a Derived Column transformation.
Assume I have inbound data like this:
SELECT col1, col2, col3, col4;
My business rules
col1 cannot contain nulls DQ_001
col2 must be greater than 5 DQ_002
col3 must be less than 3 DQ_003
col4 has no rules
From my source, I would add a Derived Column Component
New Column named Pass_DQ_001 as a boolean with an expression !ISNULL([col1])
New Column named Pass_DQ_002 as a boolean with an expression [col2] > 5
New Column named Pass_DQ_003 as a boolean with an expression [col3] < 3
etc
At this point, your data row could look something like
NULL, 4, 4, No Rules, False, False, False
ABC, 7, 2, Still No Rules, True, True, True
...
If you have more than 3 to 4 data quality conditions, I'd add a final Derived Column component into the mix
New column IsValid as yet another boolean with an expression like Pass_DQ_001 && Pass_DQ_002 && Pass_DQ_003 etc
The penalty for adding additional columns is trivial compared to trying to debug complex expressions in a data flow, so don't write one giant expression - especially when the extra columns are just bits.
At this point, you can put a data viewer in there and verify that yes, all my logic is correct. If it's wrong, you can zip in and figure out why DQ_036 isn't flagging correctly.
Otherwise, you're ready to connect the data flow to a Conditional Split. Use the final IsValid column: rows that match go out the Output 1 path, and the default/unmatched rows head to your "needs attention / failed validation" destination.
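If you specifically need one output row per failed rule (DQ_001, DQ_002, ... as in the question) rather than one flagged row, and the source is SQL Server, a hedged alternative is to unpivot the checks in the source query itself; the table and key names below are assumptions, and the rules are the three samples from above:
-- One row per failed data-quality rule, via CROSS APPLY over a VALUES list.
SELECT s.PatientID, dq.DQ_TYPE
FROM dbo.SourceTable AS s
CROSS APPLY (VALUES
    ('DQ_001', CASE WHEN s.col1 IS NULL THEN 1 ELSE 0 END),
    ('DQ_002', CASE WHEN s.col2 <= 5 THEN 1 ELSE 0 END),
    ('DQ_003', CASE WHEN s.col3 >= 3 THEN 1 ELSE 0 END)
) AS dq (DQ_TYPE, Failed)
WHERE dq.Failed = 1;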

Report Builder - Datediff between 2 columns

I have created a report with Report Builder 3.0 using column groups.
The columns retrieve a datetime field. I would like to calculate the date difference between two columns.
For each individual the report should display 1 row and 2 columns. One of the columns can have null values. In this case, no calculation needs to be performed.
I cannot perform any calculation at the dataset level because I am using a stored procedure that I cannot modify.
I have tried to perform a calculation, but the values are not correct. Moreover, the results also change if the sort order changes.
(screenshot: results under a random sort)
Basically, I need to calculate the difference between the 2 columns at the row level.
Any ideas?
(screenshot: desired output)

CloverETL: Compare two records

I have two files, A and B. The records in both files share the same format, and the first n characters of a record are its unique identifier. Records are in fixed-length format and consist of m fields (field1, field2, field3, ... fieldm). File B contains new records plus records from file A that have changed. How can I use CloverETL to determine which fields have changed in a record that appears in both files?
Also, how can I gather metrics on the frequency of changes for individual fields? For example, I would like to know how many records had changes in fieldm.
This is a typical example of the Slowly Changing Dimension problem. A solution with CloverETL is described on their blog: Building Data Warehouse with CloverETL: Slowly Changing Dimension Type 1 and Building Data Warehouse with CloverETL: Slowly Changing Dimension Type 2.
