Apache NiFi - get the file attributes and construct the JSON through a custom processor

I am using a custom processor for CSV-to-JSON conversion, which converts the CSV file data into a JSON array containing JSON objects of the data.
My requirement is to get the file attributes like filename, uuid, path, etc. and construct a JSON from these.
Question:
How can I get the related attributes of the file and construct a JSON object, appending it to the same JSON that is constructed before?
I have only been working with Apache NiFi for a few days, so I am just going with the exact requirements for now via the custom processor.

I can't speak to which attributes are being written for your custom processor, but there is a set of core attributes that most/all flow files have, such as filename and uuid. If you are using GetFile or ListFile/FetchFile to read in your CSV file, you will have those and a number of other attributes available (see the doc for more info).
When you have a flow file that has the appropriate attributes set, you can use the AttributesToJSON processor to create a JSON object containing a flat list of the specified attributes, and that object can replace the flow file content or become its own attribute (named 'JSONAttributes') depending on the setting of the "Destination" property of AttributesToJSON.
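For example (the attribute values below are only illustrative placeholders), AttributesToJSON configured roughly like this:
Attributes List=filename, uuid, path
Destination=flowfile-content
Include Core Attributes=true
would replace the flow file content with a flat JSON object along the lines of {"filename": "input.csv", "path": "./", "uuid": "<flow file uuid>"}, which you can then merge with the JSON produced by your custom processor downstream.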

Related

Apache NiFi - how to write JSON to a database table column?

I have a JSON array that I need to write to a database (as text).
I have two options:
Write it as an array of objects, so the field would contain [{},{},{}]
Write each record as an object, so the field would contain {}
The problem is that NiFi does not know how to map the JSON object to a specific database field in PutDatabaseRecord.
How do I map it?
Here is my flow:
You should use a combination of
ConvertAvroToJSON >> SplitJson (if you have multiple) >> ConvertJSONToSQL >> PutSQL
In ConvertJSONToSQL you will have to set the db (catalog), schema and table for the incoming JSON payload to map to.
The config options of the ConvertJSONToSQL processor are self-explanatory.
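For example (the connection pool and the db/schema/table names below are placeholders for your own), ConvertJSONToSQL could be configured along these lines:
JDBC Connection Pool=<your DBCPConnectionPool service>
Statement Type=INSERT
Catalog Name=<db>
Schema Name=<schema>
Table Name=<table>
ConvertJSONToSQL then puts a parameterized SQL statement into the flow file content along with sql.args.N.* attributes, which PutSQL executes.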

How to do data transformation using Apache NiFi standard processors?

I have to do data transformation using Apache NiFi standard processors for the input data mentioned below. I have to add two new fields, class and year, and drop the extra price fields.
Below are my input data and transformed data.
Input data
Expected output
Disclaimer: I am assuming that your input headers are not dynamic, which means that you can maintain a predictable input schema. If that is true, you can do this with the standard processors as of 1.12.0, but it will require a little work.
Here's a blog post of mine about how to use ScriptedTransformRecord to take input from one schema, build a new data structure and mix it with another schema. It's a bit involved.
I've used that methodology recently to convert a much larger set of data into summary records, so I know it works. The summary of what's involved is this:
Create two schemas, one that matches input and one for output.
Set up ScriptedTransformRecord to use a writer that explicitly sets which schema to use since ScriptedTransformRecord doesn't support the ability to change the schema configuration internally.
Create a fat jar with Maven or Gradle that compiles your Avro schema into an object that can be used with the NiFi API to expose a static RecordSchema (NiFi API) to your script.
Write a Groovy script that generates a new MapRecord (a minimal sketch follows below).
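As an illustration of steps 2-4, here is a minimal Groovy sketch for ScriptedTransformRecord. The field names (name, price, class, year) and the hard-coded values are assumptions based on the question, and for brevity the output schema is built inline in the script rather than from a compiled fat-jar schema as in step 3; the record writer is expected to be configured with the matching output schema (e.g. via its Schema Text property).

import org.apache.nifi.serialization.SimpleRecordSchema
import org.apache.nifi.serialization.record.MapRecord
import org.apache.nifi.serialization.record.RecordField
import org.apache.nifi.serialization.record.RecordFieldType

// Output schema: keep the original name field, add class and year, omit the price fields
def outFields = [
    new RecordField('name', RecordFieldType.STRING.getDataType()),
    new RecordField('class', RecordFieldType.STRING.getDataType()),
    new RecordField('year', RecordFieldType.INT.getDataType())
]
def outSchema = new SimpleRecordSchema(outFields)

// 'record' is the incoming record that ScriptedTransformRecord passes to the script
def values = [
    'name' : record.getValue('name'),   // copied from the input record
    'class': 'XII',                     // new field, hard-coded here for illustration
    'year' : 2021                       // new field, hard-coded here for illustration
]

// The price fields are dropped simply by not copying them into the new record
return new MapRecord(outSchema, values)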

How to split an XML file using Apache NiFi?

I have a 20GB XML file on my local system. I want to split the data into multiple chunks and also remove specific attributes from that file. How can I achieve this using NiFi?
Use the SplitRecord processor and define XML Reader/Writer controller services to read the XML data and write only the required attributes into your resulting XML.
Also define the Records Per Split property value to specify how many records you need in each split.
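As an example (the reader and writer are the standard XML controller services, and the split size is arbitrary), the SplitRecord configuration would look roughly like:
Record Reader=XMLReader
Record Writer=XMLRecordSetWriter
Records Per Split=1000
Dropping the unwanted attributes is done by giving the XMLRecordSetWriter an output schema that contains only the fields you want to keep.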

Apache NiFi EvaluateXQuery

I am trying to use NiFi to break up an XML document into multiple flowfiles. The XML contains many elements from a web service. I am trying to process each event separately. I think EvaluateXQuery is the appropriate processor, but I can't figure out how to add my XQuery if the destination is a flowfile rather than an attribute. I know I have to add a property/value pair in the processor config/properties page, but I can't figure out what the property name should be. Does it matter?
If you only need to extract one element, then yes, add a dynamic property with any name and set the destination to flowfile-content.
You can add multiple dynamic properties to the processor to extract elements into attributes on the outgoing flowfile. If you want to then replace the flowfile content with the attributes, you can use a processor like ReplaceText or AttributesToJSON to combine multiple attributes into the flowfile content.
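For example (the property name and the XQuery below are placeholders for your own), extracting each event element into the flow file content would look like:
Destination=flowfile-content
event=//event
With Destination set to flowfile-content the dynamic property name is ignored and only the XQuery value matters; with flowfile-attribute the result is written to an attribute with the property's name.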
A couple things to remember:
extracting multiple large elements to attributes is an anti-pattern, as this will hurt performance on the heap
you might be better off splitting the XML file into chunks via SplitXML first in order to then extract a single element per chunk into the flowfile content (or an attribute)

Best approach to determine Oracle INSERT or UPDATE using NiFi

I have a JSON flow file and I need to determine whether I should be doing an INSERT or an UPDATE. The trick is to only update the columns that match the JSON attributes. I have an ExecuteSQL working and it returns executesql.row.count, however I've lost the original JSON flow file, which I was planning to use with RouteOnAttribute. I'm trying to get MergeContent to join the ExecuteSQL output (dumping the Avro output, I only need the executesql.row.count attribute) with the JSON flow. I've set the following before I do the ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively, could I create a MERGE, if there is a way to loop through the list of JSON attributes that match the Oracle table columns?
How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, then ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value of executesql.row.count into the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
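For the cache variant, FetchDistributedMapCache in the JSON flow might be configured roughly like this (the cache key below is a placeholder and has to match whatever key the other flow used when storing the count):
Cache Entry Identifier=${my.correlation.key}
Distributed Cache Service=<your DistributedMapCacheClientService>
Put Cache Value In Attribute=executesql.row.count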
If you only need the JSON to use RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so your conditions are already in attributes and you can replace the flow file contents.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then use DuplicateFlowFile to create 2 copies, then UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flow files; you can route on fragment.index being 0 or 1, sending one to ExecuteSQL and one through the other flow, joining back up at MergeContent.
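Concretely (processor names and values taken from the description above), the attribute setup would look something like:
UpdateAttribute (before DuplicateFlowFile):
parent.identifier=${uuid}
DuplicateFlowFile:
Number of Copies=1 (so that two flow files leave the processor)
UpdateAttribute (after DuplicateFlowFile):
fragment.identifier=${parent.identifier}
fragment.index=${nextInt():mod(2)}
fragment.count=2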
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".
