I have a .csv file with a column called first_name. In my UpdateRecord processor I have created a property called "/first_name" with the value "${field.value:toUpper()}", but the processor is failing to update the record.
I'm using ExecuteSQL, SplitAvro, ConvertAvroToJSON, EvaluateJsonPath, ReplaceText, ExecuteSQL.
I am trying to replace the content of the FlowFile using the ReplaceText processor.
Right now I can replace it like this -> INSERT INTO x values (${id},'${name}'). ReplaceText then sends ExecuteSQL statements like this:
INSERT into x values(1,'xx')
INSERT into x values(2,'yy')
INSERT into x values(3,'zz')
I'm sending an INSERT query for each line.
But I want to send a single statement to ExecuteSQL, like this:
INSERT INTO x values (1,'xx'),(2,'yy'),(3,'zz')
I'm not sure it's the best approach, but you could do this:
SplitAvro # I guess you are splitting the records here (here you should get fragment.* attributes)
ConvertAvroToJSON # converting each record to JSON
EvaluateJsonPath # getting the id and name values from the JSON
ReplaceText # (${id}, '${name}')
MergeContent # merge the rows back into a single FlowFile with a header and demarcator
Merge Format = Binary Concatenation
Header = INSERT INTO x values 
Demarcator = ,
ExecuteSQL
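The MergeContent step above is essentially string concatenation with a header and demarcator; a minimal Python sketch of what it produces (table name and rows are taken from the example above):

```python
# Simulate MergeContent "Binary Concatenation" with a Header and a Demarcator.
# Each fragment is the output of ReplaceText, i.e. one "(id,'name')" tuple.
fragments = ["(1,'xx')", "(2,'yy')", "(3,'zz')"]

header = "INSERT INTO x values "
demarcator = ","

merged_sql = header + demarcator.join(fragments)
print(merged_sql)  # INSERT INTO x values (1,'xx'),(2,'yy'),(3,'zz')
```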
Although it won't generate exactly the SQL you're looking for, take a look at ConvertRecord and/or JoltTransformRecord -> PutDatabaseRecord. The former is used to get each of your Avro records into the form you want (id, name), and PutDatabaseRecord will use a PreparedStatement in batches to send the records to the database. It might not be quite as efficient as a single INSERT, but should be much more efficient than a Split -> Convert -> ExecuteSQL with separate INSERTs per FlowFile.
To truly get the SQL you want, you'll likely need a scripted processor such as InvokeScriptedProcessor with a RecordReader; I have a blog post on the subject.
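For intuition, the batched-PreparedStatement approach that PutDatabaseRecord takes can be sketched in Python with sqlite3 (the table and records here are illustrative, not from the original flow):

```python
import sqlite3

# Rough sketch of batched inserts via a single parameterized statement,
# analogous to a JDBC PreparedStatement with addBatch/executeBatch.
records = [(1, "xx"), (2, "yy"), (3, "zz")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER, name TEXT)")

# executemany reuses the compiled statement for every row in the batch
conn.executemany("INSERT INTO x VALUES (?, ?)", records)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM x").fetchone()[0])  # 3
```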
I'm still new to NiFi. What I want to achieve is to pass a parameter from a different source.
Scenario:
I have 2 data sources: JSON data and a record id (from an Oracle function). I declared the record id using ExtractText as "${recid}", and the JSON string default is "$1".
How do I insert into the table using a SQL statement like: insert into table1 (json, recid) values ('$1','${recid}')
After I run the processor, I'm not able to get both attributes into one insert statement.
Please help.
NiFi FlowFile
FlowFile after MergeContent
You should merge these two FlowFiles into one.
Use the MergeContent processor with Attribute Strategy set to Keep All Unique Attributes:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.MergeContent/index.html
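Roughly, the Keep All Unique Attributes strategy keeps any attribute that exists on only one FlowFile or has the same value on all of them, and drops conflicting ones; a small Python sketch of that merge (attribute names and values are illustrative):

```python
# Sketch of MergeContent's "Keep All Unique Attributes" strategy:
# keep non-conflicting attributes, drop any key whose values disagree.
ff1 = {"recid": "12345", "path": "./"}    # attributes of the record-id FlowFile
ff2 = {"json": '{"a": 1}', "path": "./"}  # attributes of the JSON FlowFile

merged, dropped = {}, set()
for attrs in (ff1, ff2):
    for key, value in attrs.items():
        if key in dropped:
            continue
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            merged.pop(key)   # conflicting values: drop the attribute entirely
            dropped.add(key)

print(sorted(merged))  # ['json', 'path', 'recid']
```

After the merge, both ${recid} and the JSON content are available on a single FlowFile for the ReplaceText step.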
Take a look at LookupAttribute with a SimpleDatabaseLookupService. You can pass your JSON flow file into that, look up the recid into an attribute, then do the ExtractText -> ReplaceText to get it into SQL form.
I am creating an end-to-end flow to consume data into HDFS using ConsumeKafka, for the JSON files received through a Tealium event stream. Currently I have used ConsumeKafka -> EvaluateJsonPath -> JoltTransformJSON -> MergeContent -> EvaluateJsonPath -> UpdateAttribute -> PutHDFS.
The requirement is to read the JSON data for an entire day into a single file, referring to the postdate attribute (epoch converted to a YYYYMMDDSS timestamp beforehand), merge each day's data into a single file, and finally rename the file using the timestamp from the POST_DATE field to differentiate the daily files. I have done everything except renaming the merged file according to the source timestamp attribute. Could you please help me rename the file as per the attribute _year_month_day?
If you want to parse "year" and "month" from the POST_DATE attribute, you can use the format and toDate functions.
For example:
-- year
format(toDate(${POST_DATE}, "YYYYMMDDSS"),"yyyy")
-- month
format(toDate(${POST_DATE}, "YYYYMMDDSS"),"MM")
--day
format(toDate(${POST_DATE}, "YYYYMMDDSS"),"dd")
I'm not sure what you mean by renaming the file; if it means changing the file name before putting it to HDFS, you can simply use an UpdateAttribute processor and update the filename attribute with the output file name, like ${year}_${month}_${day}.
#gogocatmario, thanks for the response.
Issue resolved after adding the following value for the filename property in UpdateAttribute:
tealium_es_${post_date:toDate("yyyy-MM-dd HH:mm:ss"):format("yyyy_MM_dd")}.json1
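For reference, the same toDate/format conversion written out in Python (the sample POST_DATE value here is made up):

```python
from datetime import datetime

# Equivalent of ${post_date:toDate("yyyy-MM-dd HH:mm:ss"):format("yyyy_MM_dd")}
post_date = "2018-08-21 12:09:10"  # illustrative value
parsed = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")

filename = "tealium_es_" + parsed.strftime("%Y_%m_%d") + ".json1"
print(filename)  # tealium_es_2018_08_21.json1
```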
I'm reading a CSV file in Apache NiFi and now I want to add a serial-number column that numbers all the rows in the CSV file. So if I have 10 rows, the serial numbers would run from 1 to 10. How can I achieve this through NiFi?
I've tried using getStateValue in the UpdateAttribute processor, but this gives me a static number. My UpdateAttribute and UpdateRecord configurations are shown below.
Use the QueryRecord processor; it supports the Apache Calcite SQL dialect.
Add a new dynamic property to the QueryRecord processor using the ROW_NUMBER() window function.
Example:
select *, ROW_NUMBER() over(<optional order by clause>) as seq from FLOWFILE
Define RecordReader and RecordWriter controller services in the QueryRecord processor so that the seq column is included in the Avro schema.
The output FlowFile from the QueryRecord processor will now include the seq column.
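Conceptually, ROW_NUMBER() just assigns a 1-based sequence to each record; the same idea in plain Python over CSV rows (the column name here is illustrative):

```python
import csv
import io

# Illustrative CSV content; in NiFi this would be the FlowFile content
raw = "first_name\nalice\nbob\ncarol\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Equivalent of: select *, ROW_NUMBER() over() as seq from FLOWFILE
for seq, row in enumerate(rows, start=1):
    row["seq"] = seq

print([(r["seq"], r["first_name"]) for r in rows])
# [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```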
I am currently getting files from FTP in NiFi, but I have to check some conditions before I fetch the file. The scenario goes something like this:
List FTP -> Check Condition -> Fetch FTP
In the check-condition part, I have to fetch some values from the DB and compare them with the file name. So can I use UpdateAttribute to fetch some records from the DB and make it like this?
List FTP -> Update Attribute (from DB) -> Route on Attribute -> Fetch FTP
I think your flow looks something like below.
Flow:
1. ListFTP // to list the files
2. ExecuteSQL // to execute a query in the DB (sample query: select max(timestamp) db_time from table)
3. ConvertAvroToJson // convert the ExecuteSQL result to JSON format
4. EvaluateJsonPath // set Destination to flowfile-attribute and add a new property db_time as $.db_time
5. RouteOnAttribute // compare the filename timestamp vs the extracted timestamp using NiFi Expression Language
6. FetchFile // if the condition is true, then fetch the file
RouteOnAttribute Configs:
I have assumed the filename is something like fn_2017-08-2012:09:10 and ExecuteSQL has returned 2017-08-2012:08:10.
Expression:
${filename:substringAfter('_'):toDate("yyyy-MM-ddHH:mm:ss"):toNumber()
:gt(${db_time:toDate("yyyy-MM-ddHH:mm:ss"):toNumber()})}
In the expression above, filename is the same value that ListFTP set, and the db_time attribute was added by the EvaluateJsonPath processor; we convert both timestamps to numbers and then compare them.
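The same comparison written out in Python for clarity (sample values are the assumed ones from above; note the format string has no space between the date and time, matching the example filenames):

```python
from datetime import datetime

# Equivalent of the NiFi expression:
# ${filename:substringAfter('_'):toDate("yyyy-MM-ddHH:mm:ss"):toNumber()
#     :gt(${db_time:toDate("yyyy-MM-ddHH:mm:ss"):toNumber()})}
filename = "fn_2017-08-2012:09:10"  # assumed ListFTP filename
db_time = "2017-08-2012:08:10"      # assumed ExecuteSQL result

fmt = "%Y-%m-%d%H:%M:%S"  # date and time are joined without a space here
file_ts = datetime.strptime(filename.split("_", 1)[1], fmt)  # substringAfter('_')
db_ts = datetime.strptime(db_time, fmt)

print(file_ts > db_ts)  # True
```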
Refer to this link for more details regarding NiFi Expression Language.
So if I understand your use case correctly, you are using the external DB only for tracking purposes, so only the latest processed timestamp should be enough. In that case, I would suggest using the DistributedMapCache processors and controller services offered by NiFi instead of relying on an external DB.
With this method, your flow would be like:
ListFile --> FetchDistributedMapCache --(success)--> RouteOnAttribute -> FetchFile
Configure FetchDistributedMapCache
Cache Entry Identifier - This is the key for your Cache. Set it to something like lastProcessedTime
Put Cache Value In Attribute - Whatever name you give here will be added as a FlowFile attribute with its value being the Cache value. Provide a name, like latestTimestamp or lastProcessedTime
Configure RouteOnAttribute
Create a new dynamic relationship by clicking the (+) button in the Properties tab. Give it a name, like success or matches. Let's assume your filenames are of the format somefile_1534824139, i.e. a name, then an _, then the epoch timestamp appended.
In that case, you can leverage NiFi Expression Language and make use of the functions it offers. So for the new dynamic relationship, you can have an expression like:
success - ${filename:substringAfter('_'):gt(${lastProcessedTimestamp})}
This is with the assumption that, in FetchDistributedMapCache, you have configured the property Put Cache Value In Attribute with the value lastProcessedTimestamp.
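As a quick sanity check, the routing logic of that expression in Python (both values here are illustrative):

```python
# Equivalent of: ${filename:substringAfter('_'):gt(${lastProcessedTimestamp})}
filename = "somefile_1534824139"       # name + '_' + epoch timestamp
last_processed_timestamp = 1534820000  # illustrative cached value

file_epoch = int(filename.split("_", 1)[1])  # substringAfter('_'), as a number
route_to_success = file_epoch > last_processed_timestamp
print(route_to_success)  # True
```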
Useful Links
https://community.hortonworks.com/questions/83118/how-to-put-data-in-putdistributedmapcache.html
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#dates