Add serial numbers to a column in CSV in Apache NiFi

I'm reading a CSV file in Apache NiFi and I want to add a column containing a serial number for every row in the file. So if I have 10 rows, the serial numbers would run from 1 to 10. How can I achieve this through NiFi?
I've tried using getStateValue in UpdateAttribute, but that gives me a static number.

Use the QueryRecord processor, which supports the Apache Calcite SQL dialect.
Add a new dynamic property to the QueryRecord processor that uses the ROW_NUMBER() window function.
Example:
select *, ROW_NUMBER() over(<optional order-by clause>) as seq from FLOWFILE
Define RecordReader and RecordWriter controller services on the QueryRecord processor, and include the seq column in the writer's Avro schema.
The output flowfile from QueryRecord will then include the seq column.
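For instance, a dynamic property named with-seq could hold the following query (a sketch: id and name are hypothetical column names standing in for whatever your CSV schema actually defines, and an empty over() numbers the rows in their incoming order):
select ROW_NUMBER() over() as seq, id, name from FLOWFILE
QueryRecord routes the result to a relationship named after the dynamic property (with-seq here), so connect that relationship downstream.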

Related

NiFi - How can I change the ReplacementValue on ReplaceText?

I'm using ExecuteSQL, SplitAvro, ConvertAvroToJSON, EvaluateJsonPath, ReplaceText, ExecuteSQL.
I am trying to replace the content of the flowfile using the ReplaceText processor.
Right now I can replace like this: INSERT INTO x VALUES (${id},'${name}'). ReplaceText then sends ExecuteSQL statements like this:
INSERT into x values(1,'xx')
INSERT into x values(2,'yy')
INSERT into x values(3,'zz')
That is, I'm sending an INSERT query for each line.
But I want to send ExecuteSQL a single statement like this:
INSERT INTO x values (1,'xx'),(2,'yy'),(3,'zz')
I'm not sure it's the best approach, but you could do this:
SplitAvro # I guess you are splitting the records here (you should get fragment.* attributes)
ConvertAvroToJSON # convert each record to JSON
EvaluateJsonPath # extract the id and name values from the JSON
ReplaceText # replace the content with (${id}, '${name}')
MergeContent # merge the rows back into a single flowfile, with:
  Merge Format = Binary Concatenation
  Header = INSERT INTO x VALUES 
  Demarcator = ,
ExecuteSQL
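Spelled out, the two key processors might be configured like this (a sketch: it assumes EvaluateJsonPath has put id and name into attributes, and that MergeContent defragments the splits via the fragment.* attributes from SplitAvro):
ReplaceText
  Replacement Strategy = Always Replace
  Evaluation Mode = Entire text
  Replacement Value = (${id}, '${name}')
MergeContent
  Merge Strategy = Defragment
  Merge Format = Binary Concatenation
  Header = INSERT INTO x VALUES 
  Demarcator = ,
With Defragment, each merged flowfile contains exactly the rows split from one source file, so you get one multi-row INSERT per original result set.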
Although it won't generate exactly the SQL you're looking for, take a look at ConvertRecord and/or JoltTransformRecord -> PutDatabaseRecord. The former is used to get each of your Avro records into the form you want (id, name), and PutDatabaseRecord will use a PreparedStatement in batches to send the records to the database. It might not be quite as efficient as a single INSERT, but should be much more efficient than a Split -> Convert -> ExecuteSQL with separate INSERTs per FlowFile.
To truly get the SQL you want, you'll likely need a scripted processor such as InvokeScriptedProcessor with a RecordReader; I have a blog post on the subject.
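A minimal PutDatabaseRecord setup for that alternative could look like this (a sketch: the table name x matches the question's examples, and the connection pool is whatever DBCPConnectionPool you already use):
PutDatabaseRecord
  Record Reader = AvroReader (Use Embedded Avro Schema)
  Statement Type = INSERT
  Database Connection Pooling Service = <your DBCPConnectionPool>
  Table Name = x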

How to set a start and end row, or an interval of rows, for a CSV in NiFi?

I want to get a particular part of an Excel file in NiFi. My NiFi template looks like this:
GetFile
ConvertExcelToCSVProcessor
PutDatabaseRecord
I need to parse the data between steps 2 and 3.
Is there a solution for getting specific rows and columns?
Note: if there is an option for trimming rows in ConvertExcelToCSVProcessor, that will work for me.
You can use record processors between ConvertExcelToCSVProcessor and PutDatabaseRecord.
To remove or override a column, use UpdateRecord. This processor can read your data via a CSVReader and prepare the output for PutDatabaseRecord or QueryRecord; check View usage -> Additional Details...
To filter rows by column values, use QueryRecord.
Here is an example: it receives data through a CSVReader and makes some aggregations; you can do some filtering the same way, according to the docs.
This post also helped me to understand records in NiFi.
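For the start/end-row part of the question, a QueryRecord dynamic property could number the rows and keep only a range, along these lines (a sketch: support for an empty over() depends on the Calcite version bundled with your NiFi, and the helper rownum column will appear in the output unless you list the real columns instead of *):
select * from (select *, ROW_NUMBER() over() as rownum from FLOWFILE) where rownum between 10 and 20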

Passing a parameter from a different source into an insert statement using NiFi

I'm still new to NiFi. What I want to achieve is to pass in a parameter from a different source.
Scenario:
I have 2 data sources: JSON data, and a record id (from an Oracle function). I declared the record id using ExtractText as ${recid}, and the JSON string default is $1.
How do I insert into the table using a SQL statement like INSERT INTO table1 (json, recid) VALUES ('$1','${recid}')?
After I run the processor, I'm not able to get both attributes into one insert statement.
Please help.
(Screenshots: the NiFi flowfile, and the flowfile after MergeContent.)
You should merge these 2 flowfiles into one.
Use the MergeContent processor with Attribute Strategy set to Keep All Unique Attributes:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.MergeContent/index.html
Take a look at LookupAttribute with a SimpleDatabaseLookupService. You can pass your JSON flow file into that, look up the recid into an attribute, then do the ExtractText -> ReplaceText to get it into SQL form.
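Either way, the final ReplaceText step could build the statement like this (a sketch: it assumes, per the question, that the flowfile content is the JSON and that recid is already an attribute):
ReplaceText
  Replacement Strategy = Regex Replace
  Evaluation Mode = Entire text
  Search Value = (?s)(^.*$)
  Replacement Value = INSERT INTO table1 (json, recid) VALUES ('$1', '${recid}')
The $1 back-reference inserts the captured JSON content, while ${recid} is resolved from the flowfile attribute by Expression Language.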

Adding a column at the end of a pipe-delimited file in NiFi

I have this particular pipe delimited file in a SFTP server
PROPERTY_ID|START_DATE|END_DATE|CAPACITY
1|01-JAN-07|31-DEC-30|101
2|01-JAN-07|31-DEC-30|202
3|01-JAN-07|31-DEC-30|151
4|01-JAN-07|31-DEC-30|162
5|01-JAN-07|31-DEC-30|224
I need to transfer this data to an S3 bucket using NiFi. In the process I need to add another column at the end, containing today's date.
PROPERTY_ID|START_DATE|END_DATE|CAPACITY|AS_OF_DATE
1|01-JAN-07|31-DEC-30|101|20-10-2020
2|01-JAN-07|31-DEC-30|202|20-10-2020
3|01-JAN-07|31-DEC-30|151|20-10-2020
4|01-JAN-07|31-DEC-30|162|20-10-2020
5|01-JAN-07|31-DEC-30|224|20-10-2020
What is the simplest way to implement this in NiFi?
@Naga, here is a very similar post that describes ways to add a new column to a CSV:
Apache NiFi: Add column to csv using mapped values
The simplest way is ReplaceText, appending the same "|20-10-2020" to each line. Set ReplaceText to evaluate Line-by-Line, with a regex that captures each line as $1 and a Replacement Value of $1|20-10-2020. The other methods are ways to do it more dynamically, for example if the date isn't static.
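Concretely, that ReplaceText could be configured as follows (a sketch: the (.+) Search Value is an assumption, and note it appends the date to the header line too, so either prepend the AS_OF_DATE header separately or, on NiFi versions that have it, set Line-by-Line Evaluation Mode to Except-First-Line):
ReplaceText
  Evaluation Mode = Line-by-Line
  Search Value = (.+)
  Replacement Value = $1|20-10-2020
If the date shouldn't be static, Expression Language works in the Replacement Value, e.g. $1|${now():format('dd-MM-yyyy')}.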

Migrating a table with PutDatabaseRecord when a column name differs at the target table

I need to migrate the data from a DB2 table to an MSSQL table, but one column has a different name (with the same datatype).
Db2 table:
NROCTA,NUMRUT,DIASMORA2
MSSQL table:
NROCTA,NUMRUT,DIAMORAS
As you see DIAMORAS is different.
I'm using the following flow:
ExecuteSQL -> SplitAvro -> PutDatabaseRecord
In PutDatabaseRecord I have an AvroReader as the RecordReader, configured in this way:
Schema Access Strategy: Use Embedded Avro Schema.
Schema Text: ${avro.schema}
The flow only inserts the first two columns. How can I do the mapping between the DIASMORA2 and DIAMORAS columns?
Thanks in advance!
First thing, you probably don't need SplitAvro in your flow at all, unless there's some logical subset of rows that you are trying to send as individual transactions.
For the column name change, use UpdateRecord and set the field /DIASMORAS to the record path /DIASMORA2, and change the name of the field in the AvroRecordSetWriter's schema from DIASMORA2 to DIASMORAS.
That last part is a little trickier since you are using the embedded schema in your AvroReader. If the schema will always be the same, you can stop the UpdateRecord processor and put in an ExtractAvroMetadata processor to extract the avro.schema attribute. That will put the embedded schema in the flowfile's avro.schema attribute.
Then, before you start UpdateRecord, start the ExecuteSQL and ExtractAvroMetadata processors and inspect a flow file in the queue to copy the schema out of the avro.schema attribute. Then, in the AvroRecordSetWriter on UpdateRecord, instead of Inheriting the schema, choose Use Schema Text and paste in the schema from the attribute, changing DIASMORA2 to DIASMORAS. This approach puts values from the DIASMORA2 field into the DIASMORAS field, but since DIASMORA2 is not in the output schema, it is ignored, thereby effectively renaming the field (although under the hood it is a copy-and-remove).
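Putting the record-path approach together (a sketch: the Avro field types below are assumptions, since the question only says the datatypes match, and the record name is made up):
UpdateRecord
  Replacement Value Strategy = Record Path Value
  /DIASMORAS = /DIASMORA2
AvroRecordSetWriter (Schema Access Strategy = Use 'Schema Text' Property), with the field renamed:
{
  "type": "record",
  "name": "cuentas",
  "fields": [
    { "name": "NROCTA",    "type": ["null", "long"]   },
    { "name": "NUMRUT",    "type": ["null", "string"] },
    { "name": "DIASMORAS", "type": ["null", "long"]   }
  ]
}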
