I am trying to use NiFi to break up an XML document into multiple flowfiles. The XML contains many elements from a web service, and I am trying to process each event separately. I think EvaluateXQuery is the appropriate processor, but I can't figure out how to add my XQuery if the destination is a flowfile rather than an attribute. I know I have to add a property/value pair in the processor config/properties page, but I can't figure out what the property name should be. Does it matter?
If you only need to extract one element, then yes, add a dynamic property with any name and set the destination to flowfile-content.
You can add multiple dynamic properties to the processor to extract elements into attributes on the outgoing flowfile. If you then want to replace the flowfile content with the attributes, you can use a processor like ReplaceText or AttributesToJSON to combine multiple attributes into the flowfile content.
A couple of things to remember:
Extracting multiple large elements to attributes is an anti-pattern: attributes are held in heap memory, so this will hurt performance.
You might be better off splitting the XML file into chunks via SplitXml first, and then extracting a single element per chunk into the flowfile content (or an attribute).
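To make the split-then-extract idea concrete, here is a rough sketch in plain Python (this is not NiFi code, just an illustration of what the two steps do to the data). The element names events, event and name are made up for the example.

```python
import xml.etree.ElementTree as ET

def split_children(xml_text):
    """Return each direct child of the root as its own XML string (the 'split' step)."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(child, encoding="unicode") for child in root]

def extract_text(chunk, path):
    """Return the text of the first element matching path, or None (the 'extract' step)."""
    found = ET.fromstring(chunk).find(path)
    return None if found is None else found.text

# Hypothetical web-service response containing two events.
SAMPLE = (
    "<events>"
    "<event><id>1</id><name>start</name></event>"
    "<event><id>2</id><name>stop</name></event>"
    "</events>"
)

chunks = split_children(SAMPLE)                     # one chunk per <event>
names = [extract_text(c, "name") for c in chunks]   # one small value per chunk
print(len(chunks), names)
```

Each chunk is handled on its own, so only one small value per chunk ever needs to be pulled out.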
Related
I'm trying to use the GeoEnrichIP processor as part of a NiFi flow. I'm following the documentation https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-enrich-nar/1.6.0/org.apache.nifi.processors.GeoEnrichIP/ without luck.
I'm trying to attach the GeoEnrichIP processor after a ConvertRecord processor.
ConvertRecord (JSON) ---> GeoEnrichIP
In the configuration for GeoEnrichIP I've added an attribute for the IP address field. The Field is Enrich: host_address. But I'm not getting anything in my output. I don't think I'm correctly referencing the field host_address, which contains the IP address.
How do I properly reference the host_address field so that it gets enriched with geolocation data?
Thanks
For GeoEnrichIP the field you want to enrich on must be an Attribute of the FlowFile, not part of the FlowFile content (e.g. inside a record).
The IP Address Attribute property must contain the name of the Attribute.
If the IP is in the FlowFile content, you'll need to extract the IP and put the value in an Attribute.
There are a few ways to do this, depending on your use case - but there's also an alternative approach.
If every FlowFile contains only a SINGLE Record, then you can use EvaluateJsonPath to extract the IP and create an Attribute.
If every FlowFile contains MULTIPLE Records, with completely random IP addresses, you could use SplitJson to create unique FlowFiles and then EvaluateJsonPath (this is usually a pattern to avoid!)
If every FlowFile contains MULTIPLE Records, but the IP is one of a smaller set of common IP addresses, then you could use PartitionRecord to bucket Records into FlowFiles with a common IP Attribute.
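As a rough illustration of the bucketing idea in the last case, here is a plain Python sketch (not NiFi code). The dict layout standing in for a FlowFile and the sample records are assumptions made for the example; only the field name host_address comes from the question.

```python
import json
from collections import defaultdict

def partition_by_field(records, field):
    """Group records by one field; emit one flowfile-like dict per distinct value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return [
        {"attributes": {field: value}, "content": json.dumps(members)}
        for value, members in groups.items()
    ]

# Hypothetical records; the addresses are from the documentation-only 192.0.2.0/24 range.
records = [
    {"host_address": "192.0.2.10", "status": 200},
    {"host_address": "192.0.2.20", "status": 404},
    {"host_address": "192.0.2.10", "status": 500},
]

flowfiles = partition_by_field(records, "host_address")
for ff in flowfiles:
    print(ff["attributes"], ff["content"])
```

Three records become two groups here, each carrying the shared IP as an attribute, which is the shape an attribute-based processor needs.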
However, rather than using GeoEnrichIP, you could instead use LookupRecord with an IPLookupService. In this way, you can handle either SINGLE or MULTIPLE Records per FlowFile and you do not need to deal with Attributes, instead relying on data within the Record itself. This handles all 3 cases listed above.
I wrote a post about using LookupRecord here if you need more details on how to use it. It's a very powerful processor for enrichment workflows.
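Conceptually, the record-level lookup described above works something like the plain Python sketch below (not NiFi code). The in-memory table HYPOTHETICAL_GEO is a stand-in for a real lookup service, its values are placeholders, and the result field name geo is an assumption.

```python
def enrich_records(records, ip_field, lookup, result_field="geo"):
    """Return copies of the records with a lookup result added to each one."""
    enriched = []
    for record in records:
        updated = dict(record)  # leave the input record untouched
        updated[result_field] = lookup.get(record.get(ip_field))  # None when there is no match
        enriched.append(updated)
    return enriched

# Placeholder lookup data, NOT real geolocation results.
HYPOTHETICAL_GEO = {
    "192.0.2.10": {"city": "Example City", "country": "ZZ"},
}

records = [
    {"host_address": "192.0.2.10", "status": 200},
    {"host_address": "192.0.2.99", "status": 404},
]

enriched = enrich_records(records, "host_address", HYPOTHETICAL_GEO)
print(enriched)
```

The point of the design is that the IP never has to leave the record: one record or a thousand per FlowFile are handled the same way.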
Is there any way in NiFi to take every file of one flow and merge it with another flow that contains only one file?
That way, I want to apply the same attribute to all flow files.
Thanks in advance!
Merging flowfiles modifies the content of the flowfiles. If you want to modify an attribute of one (or more) flowfiles, use the UpdateAttribute processor. If the value of the attribute you want to apply is dynamic, you can use the LookupAttribute processor to retrieve the value from a lookup service and apply it.
I have a 20GB XML file on my local system. I want to split the data into multiple chunks, and I also want to remove specific attributes from that file. How can I achieve this using NiFi?
Use the SplitRecord processor and define XML Reader/Writer controller services to read the XML data and write only the required attributes into your result XML.
Also set the Records Per Split property to the number of records you want in each split.
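For intuition only, this is roughly what "read records, keep the wanted fields, write N records per split" looks like as a streaming job in plain Python (not NiFi code). The tag names people, person and ssn are made up, and treating "attributes" as both XML attributes and child elements is my assumption about what the question means.

```python
import io
import xml.etree.ElementTree as ET

def split_records(source, record_tag, records_per_split, drop_names):
    """Stream an XML source and yield documents of up to records_per_split records."""
    batch = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        for name in drop_names:
            elem.attrib.pop(name, None)        # drop an XML attribute with this name
            for child in elem.findall(name):   # drop child elements with this name
                elem.remove(child)
        elem.tail = None
        batch.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # release the record's children and text once serialized
        if len(batch) == records_per_split:
            yield "<records>" + "".join(batch) + "</records>"
            batch = []
    if batch:  # final, possibly smaller, split
        yield "<records>" + "".join(batch) + "</records>"

# Tiny hypothetical stand-in for the 20GB file.
SAMPLE = (
    '<people>'
    '<person id="1" ssn="x"><name>A</name></person>'
    '<person id="2" ssn="y"><name>B</name></person>'
    '<person id="3" ssn="z"><name>C</name></person>'
    '</people>'
)

splits = list(split_records(io.BytesIO(SAMPLE.encode()), "person", 2, ["ssn"]))
for s in splits:
    print(s)
```

With three records and two records per split, this yields two documents, the second holding the single leftover record.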
I have a JSON flow-file and I need to determine whether I should be doing an INSERT or an UPDATE. The trick is to only update the columns that match the JSON attributes. I have an ExecuteSQL working and it returns executesql.row.count, but I've lost the original JSON flow-file, which I was planning to use with RouteOnAttribute. I'm trying to get MergeContent to join the ExecuteSQL result (dumping the Avro output, since I only need the executesql.row.count attribute) with the JSON flow. I've set the following before the ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively I could create a MERGE, if there is a way to loop through the list of JSON attributes that match the Oracle table?
How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, then ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value of executesql.row.count into the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
If you only need the JSON to use RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so your conditions are already in attributes and you can replace the flow file contents.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then DuplicateFlowFile to create 2 copies, then UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flow files; you can route on fragment.index being 0 or 1, sending one to ExecuteSQL and one through the other flow, joining back up at MergeContent.
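To show why the two copies pair up again, here is a small plain Python simulation of the fragment attributes (not NiFi code). The dict layout for a flow file, the helper names, and the sample content are assumptions for illustration, and range(2) stands in for the nextInt():mod(2) expression.

```python
import uuid

def make_fragment_pair(flowfile):
    """Make two copies that share a fragment.identifier taken from the parent's uuid."""
    parent_id = flowfile["attributes"]["uuid"]
    copies = []
    for index in range(2):  # stand-in for ${nextInt():mod(2)}
        attrs = dict(flowfile["attributes"])
        attrs.update({
            "uuid": str(uuid.uuid4()),
            "fragment.identifier": parent_id,
            "fragment.index": str(index),
            "fragment.count": "2",
        })
        copies.append({"attributes": attrs, "content": flowfile["content"]})
    return copies

def defragment(flowfiles):
    """Group by fragment.identifier and return only the complete groups, ordered by index."""
    bins = {}
    for ff in flowfiles:
        bins.setdefault(ff["attributes"]["fragment.identifier"], []).append(ff)
    complete = {}
    for ident, members in bins.items():
        if len(members) == int(members[0]["attributes"]["fragment.count"]):
            complete[ident] = sorted(members, key=lambda f: int(f["attributes"]["fragment.index"]))
    return complete

original = {"attributes": {"uuid": str(uuid.uuid4())}, "content": '{"id": 1}'}
sql_branch, json_branch = make_fragment_pair(original)

# Pretend fragment 0 went through ExecuteSQL and picked up the row count.
sql_branch["attributes"]["executesql.row.count"] = "1"
sql_branch["content"] = "<avro bytes>"

merged = defragment([json_branch, sql_branch])  # arrival order does not matter
print(list(merged))
```

Because both copies carry the parent's identifier, they land in the same bin no matter which branch finishes first.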
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".
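The update-then-insert idea can be sketched outside NiFi like this. It is only an illustration: it uses Python's sqlite3 as a stand-in for Oracle, the users table and its columns are made up, and it decides on the fallback by checking the affected-row count rather than waiting for a failure.

```python
import json
import sqlite3

def upsert_from_json(conn, table, key_column, allowed_columns, json_text):
    """Try an UPDATE of only the JSON keys that are table columns; INSERT if no row matched."""
    doc = json.loads(json_text)
    # Keep only keys that are known columns; table and key_column are trusted caller input.
    columns = [c for c in doc if c in allowed_columns and c != key_column]
    if not columns:
        raise ValueError("JSON has no updatable columns")
    assignments = ", ".join(f"{c} = ?" for c in columns)
    cur = conn.execute(
        f"UPDATE {table} SET {assignments} WHERE {key_column} = ?",
        [doc[c] for c in columns] + [doc[key_column]],
    )
    if cur.rowcount > 0:
        return "UPDATE"
    insert_cols = [key_column] + columns
    placeholders = ", ".join("?" for _ in insert_cols)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(insert_cols)}) VALUES ({placeholders})",
        [doc[c] for c in insert_cols],
    )
    return "INSERT"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
ALLOWED = {"id", "name", "email"}

first = upsert_from_json(conn, "users", "id", ALLOWED,
                         '{"id": 1, "name": "Ann", "email": "ann@example.com"}')
second = upsert_from_json(conn, "users", "id", ALLOWED,
                          '{"id": 1, "email": "new@example.com", "ignored": true}')
row = conn.execute("SELECT id, name, email FROM users").fetchone()
print(first, second, row)
```

Note that the second call only touches email, because name is absent from that JSON, and the unknown key is skipped.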
I am using a custom processor for CSV to JSON conversion, which converts the CSV file data into a JSON array containing JSON objects of the data.
My requirement is to get the file attributes like filename, uuid, path etc. and construct a JSON object from these.
Question:
How can I get the related attributes of the file and construct a JSON object from them, appending it to the same JSON that was constructed before?
I have only been working with Apache NiFi for a few days, so for now I'm just implementing the exact requirements in the custom processor.
I can't speak to which attributes are being written for your custom processor, but there is a set of core attributes that most/all flow files have, such as filename and uuid. If you are using GetFile or ListFile/FetchFile to read in your CSV file, you will have those and a number of other attributes available (see the doc for more info).
When you have a flow file that has the appropriate attributes set, you can use the AttributesToJSON processor to create a JSON object containing a flat list of the specified attributes, and that object can replace the flow file content or become its own attribute (named 'JSONAttributes') depending on the setting of the "Destination" property of AttributesToJSON.
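Roughly, the two Destination behaviours and the "append to my existing JSON" step look like this in plain Python (not NiFi code). The dict layout for a flow file, the append_metadata helper, the records and file_metadata keys, and the sample values are all assumptions; missing attributes become empty strings in this sketch.

```python
import json

def attributes_to_json(flowfile, names, destination="flowfile-attribute"):
    """Build a flat JSON object from the named attributes and place it per destination."""
    selected = {n: flowfile["attributes"].get(n, "") for n in names}
    payload = json.dumps(selected)
    attrs = dict(flowfile["attributes"])
    if destination == "flowfile-content":
        return {"attributes": attrs, "content": payload}  # content is replaced
    attrs["JSONAttributes"] = payload                      # content is kept
    return {"attributes": attrs, "content": flowfile["content"]}

def append_metadata(flowfile, key="file_metadata"):
    """Hypothetical follow-up step: wrap the record array together with the attribute JSON."""
    rows = json.loads(flowfile["content"])
    meta = json.loads(flowfile["attributes"]["JSONAttributes"])
    return json.dumps({"records": rows, key: meta})

# Hypothetical flow file as it might look after the CSV-to-JSON step.
ff = {
    "attributes": {"filename": "data.csv", "uuid": "abc-123", "path": "./"},
    "content": '[{"a": "1"}, {"a": "2"}]',
}

as_attr = attributes_to_json(ff, ["filename", "uuid", "path"])
combined = json.loads(append_metadata(as_attr))
as_content = attributes_to_json(ff, ["filename"], destination="flowfile-content")
print(combined)
```

Writing the JSON to an attribute keeps the converted records in the content, so a later step can combine the two.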