How to handle a string containing a comma in CSVWriter process group in NiFi - apache-nifi

I have a CSV file with two columns, for example column A and column B. Column B contains a string value like this: I am, doing good. When I try to insert this data into a database, only the string "I am" gets inserted. I just want to know what attribute I need to add to the process group so that "I am, doing good" gets inserted into the database.
The attached image shows the attributes in the current process group.
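As an aside on the underlying CSV mechanics (not a NiFi-specific setting), a minimal Python sketch shows why the value splits at the embedded comma and how quoting the field keeps it together - the record reader/writer in the flow has to be configured with a matching quote character for the same reason:

import csv

# Unquoted value: the embedded comma splits it into two fields
print(next(csv.reader(['A1,I am, doing good'])))
# -> ['A1', 'I am', ' doing good']

# Quoted value: the whole string stays in one field
print(next(csv.reader(['A1,"I am, doing good"'])))
# -> ['A1', 'I am, doing good']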

Related

Informatica - Concatenate Max value from each column present in multiple rows for same Primary Key

I have tried the traditional approach of using an Aggregator (Group By: ID, Store Name) and Max(each Object) column separately.
Then, in the next expression, Concat(Val1 || Val2 || Val3 || Val4).
However, I'm getting the output '0100'.
But the REQUIRED OUTPUT is: 1100
Please let me know how this can be done in IICS.
IICS is similar to PowerCenter on-prem.
First use an Aggregator:
In the Group By tab, add ID and Store Name.
In the Aggregate tab, add max(object1), etc. Please note: set the data type and length correctly.
Then use an Expression transformation:
Link ID and Store Name first.
Then concatenate the max_* columns using the pipe operator:
out_max = max_col1 || max_col2 || ... Again, set the data type and length correctly.
This should generate the correct output. I think you are getting the wrong output because of the data length or data type of the object fields. Make sure you trim spaces from the object data before the Aggregator.
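A rough Python sketch of the same logic (group by the key, take the max of each object column, then concatenate) can be used to check the expected result; the column layout and sample rows here are made up for illustration:

from collections import defaultdict

# (ID, Store Name, obj1, obj2, obj3, obj4) - sample flag data, made up
rows = [
    ("1", "S1", "1", "0", "0", "0"),
    ("1", "S1", "0", "1", "0", "0"),
]

agg = defaultdict(lambda: ["0", "0", "0", "0"])
for _id, store, *objs in rows:
    key = (_id, store)
    # Max of each object column per key; trim spaces first, as noted above
    agg[key] = [max(cur, new.strip()) for cur, new in zip(agg[key], objs)]

for key, maxes in agg.items():
    print(key, "".join(maxes))  # ('1', 'S1') 1100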

Merge two bag and get all the field from first bag in pig

I am new to Pig scripting and need some help with this issue.
I have two bags in Pig. I want to keep all the fields from the first bag, but overwrite a field's value with the second bag's value whenever the second bag has data for that field.
The column list is dynamic (columns may be added or deleted at any time).
In set b we may also get data in other fields that are currently blank; if so, we need to overwrite set a with the data available in set b.
columns - uniqueid,catagory,b,c,d,e,f,region,g,h,date,direction,indicator
E.g.:
all_data = COGROUP a BY (uniqueid), b BY (uniqueid);
Output:
(1,{(1,test,,,,,,,,city,,,,,2020-06-08T18:31:09.000Z,west,,,,,,,,,,,,,A)},{(1,,,,,,,,,,,,,,2020-09-08T19:31:09.000Z,,,,,,,,,,,,,,N)})
(2,{(2,test2,,,,,,,,dist,,,,,2020-08-02T13:06:16.000Z,east,,,,,,,,,,,,A)},{(2,,,,,,,,,,,,,,2020-09-08T18:31:09.000Z,,,,,,,,,,,,,,N)})
Expected Result:
(1,test,,,,,,,,city,,,,,2020-09-08T19:31:09.000Z,west,,,,,,,,,,,,,N)
(2,test2,,,,,,,,dist,,,,,2020-09-08T18:31:09.000Z,east,,,,,,,,,,,,N)
I was able to achieve the expected output with the statement below:
final = FOREACH all_data GENERATE flatten($1),flatten($2.(region)) as region ,flatten($2.(indicator)) as indicator;
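For reference, the merge-with-override behaviour the question describes (keep every field from the first record, but take the second record's value whenever it is non-empty) can be sketched in Python like this; the field names are just a subset of the real column list:

# Record from set a and the matching record from set b (subset of fields)
a = {"uniqueid": "1", "catagory": "test", "region": "west",
     "date": "2020-06-08T18:31:09.000Z", "indicator": "A"}
b = {"uniqueid": "1", "catagory": "", "region": "",
     "date": "2020-09-08T19:31:09.000Z", "indicator": "N"}

# Take b's value when it is non-empty, otherwise keep a's value
merged = {field: (b.get(field) or value) for field, value in a.items()}
print(merged)  # date and indicator come from b, the rest from a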

How to extract multiple values as multiple column data from filename by Informatica PowerCenter?

I am very new to Informatica PowerCenter and have just started learning. Looking for help. My requirement is: I have to extract data from a flat file (CSV) and store the data in an Oracle table. Some of the column values of the target table should come from the file name.
For example:
My Target Table is like below:
USER_ID Program_Code Program_Desc Visit Date Term
EACRP00127 ER Special Visits 08/02/2015 Aug 2015
My input filename is: Aug 2015 ER Special Visits EACRP00127.csv
From this file name I have to extract "Aug 2015" as Term, "ER Special Visits" as Program_Desc and "EACRP00127" as Program_Code, along with some other fields from the CSV file.
I have found one solution using "Currently Processed Filename", but with it I am able to get only a single value from the file name. How can I extract 3 values from the file name and store them in the target table? Looking for someone to shed some light on a solution. Thank you.
Using an Expression transformation you can create three output values from the CurrentlyProcessedFileName column.
You get the file name from the Source Qualifier through the 'Currently Processed Filename' port. Then you can substring that string to get what you want (the positions below assume the file name always follows the layout of the example above):
input/output port = CurrentlyProcessedFileName
o_Term = SUBSTR(CurrentlyProcessedFileName, 1, 8)
o_Program_Desc = SUBSTR(CurrentlyProcessedFileName, 10, 17)
o_Program_Code = SUBSTR(CurrentlyProcessedFileName, 28, 10)
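As a quick sanity check of those character positions (they assume the file name always looks like 'Aug 2015 ER Special Visits EACRP00127.csv'), here is the same slicing in Python:

fname = "Aug 2015 ER Special Visits EACRP00127.csv"
term = fname[0:8]            # 'Aug 2015'          -> SUBSTR(..., 1, 8)
program_desc = fname[9:26]   # 'ER Special Visits' -> SUBSTR(..., 10, 17)
program_code = fname[27:37]  # 'EACRP00127'        -> SUBSTR(..., 28, 10)
print(term, "|", program_desc, "|", program_code)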

NiFi : Create file name using the field values from the csv flowfile

I have a CSV flowfile with a single record. I need to create its file name based on a couple of column values in the CSV file. Can you please let me know how we can do this using the column names only, not the positions of the columns, since the column positions may change? Example:
CSV File
Name , City, State, Country, Gender
John, Dallas, Texas, USA, M
File name should be John_USA.csv
I am trying the ExtractText processor, pulling the first data row using:
row = ^.*\r?\n(.*)
Then in an UpdateAttribute processor I pull the values from the columns using the expression below:
${row:getDelimitedField(1)}_${row:getDelimitedField(4)}.csv
But this uses the position of the column, not the column name. How can I build it using the column name instead of the column position?
The way I would do it (maybe not the most efficient one):
Convert the CSV to JSON.
Pass the content to attributes (so you can access the field you want like a dictionary (key-value)).
Update the attributes.
Convert it back to CSV (this way you control the schema and the position of the fields).
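The point of that flow is to key each value by its column name instead of its position; a minimal Python sketch of the same idea, using the sample record from the question:

import csv

rows = ["Name,City,State,Country,Gender", "John,Dallas,Texas,USA,M"]
record = next(csv.DictReader(rows))  # fields addressed by name, not index
filename = f"{record['Name']}_{record['Country']}.csv"
print(filename)  # John_USA.csv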

How to find the column count of a CSV (Excel) sheet in ETL?

To count the rows of a CSV file we can use the Get Files Rows Count input step in ETL. How do we find the number of columns of a CSV file?
Just read the first row of the CSV file using Text-File-Input, setting header rows to 0. Usually the first row contains the field names. If you read the whole row into a single field, you can use Split-Field-To-Rows to get one field name per row, and the number of rows then tells you the number of fields. There are other ways, but this one easily prepares for a subsequent metadata injection - if that's what you have in mind.
No need for metadata injection: in Split-Field-To-Rows, check "Include rownum in output" and give that variable a name. Then sort the rows on that variable, use Sample Rows, and you will get the number of fields present in the file.
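Outside the ETL tool, the same check on the header row is a couple of lines of Python (assuming a comma delimiter, a header in the first line, and a placeholder file path):

import csv

with open("input.csv", newline="") as f:  # "input.csv" is a placeholder path
    header = next(csv.reader(f))
print(len(header))  # number of columns in the file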
