Is there any way in NiFi to take every flowfile of one flow and merge it with another flow that contains only one file?
That way, I want to apply the same attribute to all flowfiles.
Thanks in advance!
Merging flowfiles modifies the content of the flowfiles. If you want to modify an attribute of one (or more) flowfiles, use the UpdateAttribute processor. If the value of the attribute you want to apply is dynamic, you can use the LookupAttribute processor to retrieve the value from a lookup service and apply it.
Related
I have the following scenario. I am trying to:
Get the list of new files with the ListFile processor
Set a constant attribute zipFilesBundleConstant = listBundle on each flowfile
Put the list into the database
Get the complete list of files, old and new, from the database with the ExecuteSQL processor for further processing. (Here I want to make only one database call to fetch the complete list, but ExecuteSQL is being called once for every flowfile.)
I tried placing a MergeContent processor, with zipFilesBundleConstant as the Correlation Attribute Name, before ExecuteSQL to combine all the flowfiles. That is not working as expected: it merges some of them but always gives me multiple flowfiles.
Can anyone please help me with a solution for making a single call after inserting the new file list into the database?
You can run the ExecuteSQL processor as a separate workflow to fetch the existing list of old files from the database, with a scheduling strategy that fits your requirements.
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
I have a 20GB XML file on my local system. I want to split the data into multiple chunks, and I also want to remove specific attributes from the file. How can I achieve this using NiFi?
Use the SplitRecord processor and define XML Reader/Writer controller services to read the XML data and write only the required attributes into the resulting XML.
Also set the Records Per Split property to the number of records you need in each split.
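To make the record-splitting idea concrete, here is a minimal sketch in plain Python, outside NiFi. It is not how SplitRecord is implemented. The function name split_records, the wrapper element chunk, and the reading of "attributes" as both XML attributes and child elements of each record are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag, records_per_split, drop=()):
    """Yield XML strings holding up to `records_per_split` records each.

    Hypothetical helper: `drop` names XML attributes and child elements
    to strip from every record before it is written out.
    """
    batch = []
    # iterparse streams the document instead of loading it all at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        for name in drop:
            elem.attrib.pop(name, None)       # drop a matching XML attribute
        for child in list(elem):
            if child.tag in drop:
                elem.remove(child)            # drop a matching child element
        elem.tail = None                      # ignore whitespace between records
        batch.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()                          # free the record's parsed subtree
        if len(batch) == records_per_split:
            yield "<chunk>" + "".join(batch) + "</chunk>"
            batch = []
    if batch:                                 # final, possibly smaller, chunk
        yield "<chunk>" + "".join(batch) + "</chunk>"
```

Each yielded string plays the role of one split flowfile. The sketch assumes records are not nested inside each other, and the emptied record elements still stay attached to the root, so a truly huge file would need extra cleanup.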
I am trying to use NiFi to break up an XML document into multiple flowfiles. The XML contains many elements from a web service, and I am trying to process each event separately. I think EvaluateXQuery is the appropriate processor, but I can't figure out how to add my XQuery if the destination is flowfile content rather than an attribute. I know I have to add a property/value pair on the processor's configuration properties page, but I can't figure out what the property name should be. Does it matter?
If you only need to extract one element, then yes, add a dynamic property with any name and set the destination to flowfile-content.
You can add multiple dynamic properties to the processor to extract elements into attributes on the outgoing flowfile. If you then want to replace the flowfile content with the attributes, you can use a processor like ReplaceText or AttributesToJSON to combine multiple attributes into the flowfile content.
A couple things to remember:
Extracting multiple large elements to attributes is an anti-pattern: attributes are held in memory, so large values put pressure on the heap and hurt performance.
You might be better off splitting the XML file into chunks with SplitXml first, and then extracting a single element per chunk into the flowfile content (or an attribute).
I have a JSON flowfile and I need to determine whether I should be doing an INSERT or an UPDATE. The trick is to update only the columns that match the JSON attributes. I have ExecuteSQL working and it returns executesql.row.count; however, I lose the original JSON flowfile, which I was planning to use with RouteOnAttribute. I'm trying to get MergeContent to join the ExecuteSQL output (dropping the Avro output, since I only need the executesql.row.count attribute) with the JSON flow. I've set the following before ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively, I could build a MERGE statement. Is there a way to loop through the list of JSON attributes that match the Oracle table?
How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, followed by ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value of executesql.row.count in the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
If you only need the JSON to use RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so your conditions are already in attributes and you can replace the flow file contents.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then use DuplicateFlowFile to create 2 copies, then use UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flowfiles. You can route on fragment.index being 0 or 1, sending one copy to ExecuteSQL and the other through the second flow, and join them back up at MergeContent.
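The fragment bookkeeping described above can be sketched in plain Python to show why both copies need the same fragment.identifier and distinct fragment.index values. This is only a simulation, not NiFi code: the dictionary layout of a flowfile and the function names duplicate_with_fragments and defragment are assumptions, and the way attributes are combined on merge is a simplification (all attributes from all fragments are kept).

```python
from collections import defaultdict


def duplicate_with_fragments(flowfile, copies=2):
    """Return `copies` clones that share one fragment.identifier."""
    parent_id = flowfile["attributes"]["uuid"]
    clones = []
    for index in range(copies):
        attrs = dict(flowfile["attributes"])
        attrs["fragment.identifier"] = parent_id
        attrs["fragment.index"] = str(index)
        attrs["fragment.count"] = str(copies)
        clones.append({"attributes": attrs, "content": flowfile["content"]})
    return clones


def defragment(flowfiles):
    """Merge only complete fragment sets, ordered by fragment.index."""
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["attributes"]["fragment.identifier"]].append(ff)
    merged = []
    for fragments in bins.values():
        expected = int(fragments[0]["attributes"]["fragment.count"])
        if len(fragments) != expected:
            continue  # incomplete set: nothing is merged yet
        fragments.sort(key=lambda ff: int(ff["attributes"]["fragment.index"]))
        attrs = {}
        for ff in fragments:
            attrs.update(ff["attributes"])  # simplification: keep everything
        content = b"".join(ff["content"] for ff in fragments)
        merged.append({"attributes": attrs, "content": content})
    return merged


# Simulated flow: one copy keeps the JSON, the other goes through the SQL branch.
original = {"attributes": {"uuid": "abc-123"}, "content": b'{"id": 7}'}
json_copy, sql_copy = duplicate_with_fragments(original)
sql_copy["content"] = b""                              # query result discarded
sql_copy["attributes"]["executesql.row.count"] = "1"   # attribute kept
result = defragment([sql_copy, json_copy])
```

In this simulation the merged flowfile carries the original JSON as content and executesql.row.count as an attribute, which is the outcome the question is after. A lone fragment is never merged, which mirrors why mismatched identifiers or counts leave flowfiles unmerged.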
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".
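The update-first, insert-on-miss idea can be sketched outside NiFi as follows. This is a minimal illustration using Python's sqlite3 module as a stand-in for Oracle. The function name upsert_from_json and its parameters are hypothetical, and detecting a miss through the updated row count is an assumption of this sketch, not necessarily how the NiFi processors signal failure.

```python
import json
import sqlite3


def upsert_from_json(conn, table, key_column, document, table_columns):
    """Try an UPDATE limited to the JSON's columns; INSERT if no row matched.

    `table`, `key_column` and `table_columns` are trusted identifiers
    supplied by the caller. Only values are passed as bound parameters.
    """
    record = json.loads(document)
    if key_column not in record:
        raise ValueError("JSON document has no key column value")
    # Keep only JSON keys that are real, non-key columns of the table.
    columns = [c for c in record if c in table_columns and c != key_column]
    if not columns:
        raise ValueError("JSON document has no writable columns")

    assignments = ", ".join(f"{c} = ?" for c in columns)
    cursor = conn.execute(
        f"UPDATE {table} SET {assignments} WHERE {key_column} = ?",
        [record[c] for c in columns] + [record[key_column]],
    )
    if cursor.rowcount > 0:
        return "UPDATE"

    # No existing row matched the key, so fall back to INSERT.
    insert_columns = [key_column] + columns
    column_list = ", ".join(insert_columns)
    placeholders = ", ".join("?" for _ in insert_columns)
    conn.execute(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
        [record[c] for c in insert_columns],
    )
    return "INSERT"
```

Columns that are absent from the JSON are left untouched by the UPDATE, which matches the requirement to change only the columns named in the document.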
I am using Apache NiFi to process data from different sources, and I have created an independent pipeline for each data flow. I want to combine this data for further processing. Is there any way I can aggregate the data and write it to a single file? The data is present in the form of flowfile attributes in NiFi.
You should use the MergeContent processor, which accepts configuration values for min/max batch size, etc. and combines a number of flowfiles into a single flowfile according to the provided merge strategy.