I'm trying to use the GeoEnrichIP processor as part of a NiFi flow. I've been following the documentation at https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-enrich-nar/1.6.0/org.apache.nifi.processors.GeoEnrichIP/ without luck.
I've attached the GeoEnrichIP processor downstream of a ConvertRecord processor:
ConvertRecord (JSON) ---> GeoEnrichIP
In the GeoEnrichIP configuration I've added an attribute for the IP address field (host_address), but I'm not getting anything in my output. I don't think I'm correctly referencing host_address, which is the field that contains the IP address.
How do you properly reference the host_address field so that it gets enriched with geolocation data?
Thanks
For GeoEnrichIP the field you want to enrich on must be an Attribute of the FlowFile, not part of the FlowFile content (e.g. inside a record).
The IP Address Attribute property must contain the name of the Attribute.
If the IP is in the FlowFile content, you'll need to extract the IP and put the value in an Attribute.
There are a few ways to do this, depending on your use case, but there is also an alternative approach.
If every FlowFile contains only a SINGLE Record, then you can use EvaluateJsonPath to extract the IP and create an Attribute (for example, a dynamic property host_address with the value $.host_address and Destination set to flowfile-attribute).
If every FlowFile contains MULTIPLE Records, with completely random IP addresses, you could use SplitJson to create one FlowFile per Record and then EvaluateJsonPath (this is usually a pattern to avoid!)
If every FlowFile contains MULTIPLE Records, but the IP is one of a smaller set of common IP addresses, then you could use PartitionRecord to bucket Records into FlowFiles with a common IP Attribute.
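To make the PartitionRecord option concrete, here is a minimal Python simulation of its grouping behaviour (not NiFi code). Records that share a host_address end up in the same bucket, and the shared value is promoted to an attribute that GeoEnrichIP could then read. It assumes the content is a JSON array of objects that all have a host_address field. The function name partition_by_ip is hypothetical, and the IPs are documentation-range addresses.

```python
import json
from collections import defaultdict


def partition_by_ip(flowfile_content: str, field: str = "host_address"):
    """Group JSON records by `field`, roughly as PartitionRecord would.

    Returns a list of (attributes, records) pairs: one simulated FlowFile per
    distinct IP, with that IP stored as an attribute named after `field`.
    """
    records = json.loads(flowfile_content)  # assumption: a JSON array of objects
    buckets = defaultdict(list)
    for record in records:
        buckets[record[field]].append(record)
    return [({field: ip}, recs) for ip, recs in buckets.items()]


content = json.dumps([
    {"host_address": "192.0.2.10", "user": "a"},
    {"host_address": "198.51.100.7", "user": "b"},
    {"host_address": "192.0.2.10", "user": "c"},
])
for attrs, recs in partition_by_ip(content):
    print(attrs, len(recs))
```

In NiFi terms, this corresponds to a PartitionRecord dynamic property named host_address with the RecordPath /host_address. The fewer distinct IPs there are, the fewer FlowFiles this produces.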
However, rather than using GeoEnrichIP, you could instead use LookupRecord with an IPLookupService. In this way, you can handle either SINGLE or MULTIPLE Records per FlowFile and you do not need to deal with Attributes, instead relying on data within the Record itself. This handles all 3 cases listed above.
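A rough Python sketch of what LookupRecord does conceptually: for every record, take a key field, ask a lookup service for a result, and write that result back into the same record. A plain dictionary stands in for IPLookupService, which in NiFi reads a MaxMind GeoIP database. The names lookup_record and GEO_TABLE and the "geo" result field are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def lookup_record(records: List[dict], key_field: str, result_field: str,
                  lookup: Callable[[str], Optional[Dict]]) -> List[dict]:
    """Enrich each record with the lookup result for its key field."""
    enriched = []
    for record in records:
        out = dict(record)  # copy so the input records are left untouched
        result = lookup(record[key_field])
        if result is not None:  # unmatched records are passed through as-is
            out[result_field] = result
        enriched.append(out)
    return enriched


# Hypothetical stand-in for IPLookupService
GEO_TABLE = {"192.0.2.10": {"city": "Example City", "country": "ZZ"}}

records = [{"host_address": "192.0.2.10"}, {"host_address": "198.51.100.7"}]
for r in lookup_record(records, "host_address", "geo", GEO_TABLE.get):
    print(r)
```

Because the enrichment works per record, the same flow handles one record or many per FlowFile, and no attributes are involved.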
I wrote a post about using LookupRecord here if you need more details on how to use it; it's a very powerful processor for enrichment workflows.
Related
Is there any way in NiFi to take every flow file from one flow and merge it with another flow that contains only one file?
That way, I want to apply the same attribute to all flow files.
Thanks in advance!
Merging flowfiles modifies the content of the flowfiles. If you want to modify an attribute of one (or more) flowfiles, use the UpdateAttribute processor. If the value of the attribute you want to apply is dynamic, you can use the LookupAttribute processor to retrieve the value from a lookup service and apply it.
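The difference between the two processors can be sketched in Python by treating each FlowFile as a dictionary of attributes. This is a simulation, not NiFi code. UpdateAttribute sets the same static value everywhere, while LookupAttribute fetches a value per FlowFile from a lookup service, keyed here by another attribute. The attribute names (source, owner, env) and the OWNERS table are hypothetical.

```python
from typing import Callable, Dict, List, Optional

FlowFile = Dict[str, str]  # a FlowFile reduced to its attributes


def update_attribute(flowfiles: List[FlowFile], name: str, value: str) -> List[FlowFile]:
    """UpdateAttribute-style: apply the same static value to every FlowFile."""
    return [{**ff, name: value} for ff in flowfiles]


def lookup_attribute(flowfiles: List[FlowFile], name: str, key_attr: str,
                     lookup: Callable[[str], Optional[str]]) -> List[FlowFile]:
    """LookupAttribute-style: the value comes from a lookup keyed by another attribute."""
    result = []
    for ff in flowfiles:
        value = lookup(ff[key_attr])
        # In this sketch a missing key simply leaves the FlowFile unchanged.
        result.append({**ff, name: value} if value is not None else dict(ff))
    return result


OWNERS = {"east": "team-1", "west": "team-2"}  # hypothetical lookup table
flowfiles = [
    {"filename": "a.csv", "source": "east"},
    {"filename": "b.csv", "source": "west"},
    {"filename": "c.csv", "source": "north"},
]
print(update_attribute(flowfiles, "env", "prod"))
print(lookup_attribute(flowfiles, "owner", "source", OWNERS.get))
```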
I have a CSV file:
longtitude,lagtitude
34.094933,-118.30674
34.095028,-118.306625
(more rows follow)
I use the UpdateRecord processor (which supports record-oriented processing) with a CSVRecordSetWriter and RecordPath (https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html) to prepare a gis field:
longtitude,lagtitude,gis
34.094933,-118.30674,"34.094933,-118.30674"
34.095028,-118.306625,"34.095028,-118.306625"
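The UpdateRecord step above produces the following transformation, shown here as a small Python sketch using the standard csv module. In NiFi, a RecordPath such as concat(/longtitude, ',', /lagtitude) with the "Record Path Value" replacement strategy is one plausible way to express it, but that exact configuration is an assumption. The column names keep the original spelling.

```python
import csv
import io


def add_gis_column(csv_text: str) -> str:
    """Append a gis column holding "longtitude,lagtitude" for each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["gis"],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["gis"] = f"{row['longtitude']},{row['lagtitude']}"
        writer.writerow(row)  # the comma inside gis makes csv quote that field
    return out.getvalue()


source = "longtitude,lagtitude\n34.094933,-118.30674\n34.095028,-118.306625\n"
print(add_gis_column(source), end="")
```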
My next step is to pass gis as an input parameter to an HTTP API, which returns information (poi) that I would like to store:
longtitude,lagtitude,gis,poi
34.094933,-118.30674,"34.094933,-118.30674","Restaurant A"
34.095028,-118.306625,"34.095028,-118.306625","Cinema X"
It seems the InvokeHTTP processor does not work in a record-oriented way. Is there any way to produce the above without splitting the file into individual records?
When you want to enrich each record like this, it is typically handled in NiFi by using the LookupRecord processor with a LookupService. It is basically saying: for each record in the incoming flow file, pass some fields of the record to the lookup service, take the results of the lookup, and store them back in the record.
For your example it sounds like you would want a RestLookupService:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-lookup-services-nar/1.9.1/org.apache.nifi.lookup.RestLookupService/index.html
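As a rough illustration of the per-record REST lookup, here is a Python sketch (not NiFi code). Each record's gis value is turned into a request URL, and one field of the response is stored as poi. The endpoint https://poi.example.com/search, the gis query parameter, and the response key "name" are all hypothetical. The HTTP call is injected as a function, and a fake one is used so that the sketch runs offline.

```python
from typing import Callable, Dict, List
from urllib.parse import urlencode

API_BASE = "https://poi.example.com/search"  # hypothetical endpoint


def build_url(gis: str) -> str:
    """Build the request URL for one record; urlencode escapes the comma."""
    return f"{API_BASE}?{urlencode({'gis': gis})}"


def enrich_with_poi(records: List[Dict], fetch: Callable[[str], Dict]) -> List[Dict]:
    """Call the API once per record and store the returned name as poi."""
    return [{**rec, "poi": fetch(build_url(rec["gis"])).get("name")} for rec in records]


def fake_fetch(url: str) -> Dict:
    """Offline stand-in for an HTTP GET returning parsed JSON."""
    return {"name": "Restaurant A"} if "34.094933" in url else {"name": "Cinema X"}


rows = [{"gis": "34.094933,-118.30674"}, {"gis": "34.095028,-118.306625"}]
for row in enrich_with_poi(rows, fake_fetch):
    print(row)
```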
I am trying to use NiFi to break up an XML document into multiple flowfiles. The XML contains many elements from a web service, and I am trying to process each event separately. I think EvaluateXQuery is the appropriate processor, but I can't figure out how to add my XQuery if the destination is a flowfile rather than an attribute. I know I have to add a property/value pair on the processor's Properties page, but I can't figure out what the property name should be. Does it matter?
If you only need to extract one element, then yes, add a dynamic property (its name doesn't matter) and set Destination to flowfile-content.
You can add multiple dynamic properties to the processor to extract elements into attributes on the outgoing flowfile. If you want to then replace the flowfile content with the attributes, you can use a processor like ReplaceText or AttributesToJson to combine multiple attributes into the flowfile content.
A couple of things to remember:
extracting multiple large elements to attributes is an anti-pattern, because attributes are held in memory and this will hurt heap usage and performance
you might be better off splitting the XML file into chunks via SplitXml first, in order to then extract a single element per chunk into the flowfile content (or an attribute)
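The splitting idea can be sketched in Python with the standard xml.etree.ElementTree module. It emits one XML fragment per child of the root element, which is roughly what SplitXml does with its default split depth of 1. This is an illustration only. The element names events and event are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_xml(xml_text: str) -> List[str]:
    """Return one serialized XML fragment per child of the root element."""
    root = ET.fromstring(xml_text)
    chunks = []
    for child in root:
        child.tail = None  # tostring() would otherwise include trailing whitespace
        chunks.append(ET.tostring(child, encoding="unicode"))
    return chunks


doc = '<events><event id="1">a</event>\n<event id="2">b</event></events>'
for chunk in split_xml(doc):
    print(chunk)
```

Each chunk is small, so extracting a single element from it into the content or an attribute stays cheap.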
I have a Kafka topic which includes different types of messages sent from different sources.
I would like to use the ExtractGrok processor to extract the message based on the regular expression/grok pattern.
How do I configure or run the processor with multiple regular expressions?
For example, the Kafka topic contains INFO, WARNING and ERROR log entries from different applications.
I would like to separate the messages by log level and place them into HDFS.
Instead of using the ExtractGrok processor, use the PartitionRecord processor in NiFi to partition the records, as this processor "evaluates one or more RecordPaths against each record in the incoming FlowFile. Each record is then grouped with other 'like records'."
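Configure PartitionRecord and enable its controller services:
Record Reader: GrokReader
Record Writer: your desired format
A dynamic property named loglevel whose value is a RecordPath to the log-level field produced by your Grok expression (for example /level, if that is what your pattern calls it); PartitionRecord adds an attribute with that name to each outgoing flowfile
Then use the PutHDFS processor to store each flowfile based on its loglevel attribute, e.g. with a Directory value such as /logs/${loglevel}.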
Flow:
1. ConsumeKafka processor
2. PartitionRecord processor
3. PutHDFS processor
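The flow above can be simulated in a few lines of Python. A regular expression plays the role of the Grok pattern, records are grouped by level as PartitionRecord would group them, and each group is mapped to a per-level directory as PutHDFS would with an expression-language path. The log line format, the /logs base directory, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<loglevel>INFO|WARNING|ERROR) (?P<message>.*)$"
)


def partition_by_level(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Parse each line and group the parsed records by log level."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skipped here for simplicity; GrokReader has its own handling
        partitions[match.group("loglevel")].append(match.groupdict())
    return partitions


def hdfs_directory(loglevel: str) -> str:
    """Mirror a PutHDFS Directory value of /logs/${loglevel}."""
    return f"/logs/{loglevel}"


lines = [
    "2019-01-01T00:00:00 INFO started",
    "2019-01-01T00:00:01 ERROR failed",
    "2019-01-01T00:00:02 INFO done",
    "not a log line",
]
for level, records in partition_by_level(lines).items():
    print(hdfs_directory(level), len(records))
```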
Refer to this link, which describes all the steps for configuring the PartitionRecord processor.
Refer to this link, which describes how to store partitions dynamically in HDFS directories using the PutHDFS processor.
I have a JSON flow file, and I need to determine whether I should be doing an INSERT or an UPDATE. The trick is to only update the columns that match the JSON attributes. I have an ExecuteSQL working, and it returns executesql.row.count, but I've lost the original JSON flow file, which I was planning to use with RouteOnAttribute. I'm trying to get MergeContent to join the ExecuteSQL output (I can dump the Avro; I only need the executesql.row.count attribute) with the JSON flow. I've set the following before the ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively, could I create a MERGE statement, if there is a way to loop through the list of JSON attributes that match the Oracle table's columns?
How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, then ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value of executesql.row.count in the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
If you only need the JSON for RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so that your conditions are already in attributes and it doesn't matter that ExecuteSQL replaces the flow file content.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then DuplicateFlowFile to create 2 copies, then UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flow files, you can route on fragment.index being 0 or 1, sending one to ExecuteSQL and one through the other flow, joining back up at MergeContent.
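The attribute scheme above can be checked with a small Python simulation (not NiFi code). FlowFiles are dictionaries of attributes, itertools.count stands in for NiFi's global nextInt() counter, and defragment groups by fragment.identifier the way MergeContent's Defragment strategy bins fragments. The two copies always get indices 0 and 1 because consecutive integers always differ in parity, provided the two copies receive consecutive numbers. The function names are hypothetical.

```python
import uuid
from collections import defaultdict
from itertools import count
from typing import Dict, List

_next_int = count()  # stands in for NiFi's nextInt(), a global counter


def duplicate_for_merge(flowfile: Dict[str, str]) -> List[Dict[str, str]]:
    """UpdateAttribute + DuplicateFlowFile + UpdateAttribute, as described above."""
    parent = flowfile["uuid"]  # parent.identifier = ${uuid}
    copies = []
    for _ in range(2):  # the original plus one duplicate
        ff = dict(flowfile)
        ff["uuid"] = str(uuid.uuid4())  # each copy gets its own uuid
        ff["parent.identifier"] = parent
        ff["fragment.identifier"] = parent
        ff["fragment.count"] = "2"
        ff["fragment.index"] = str(next(_next_int) % 2)  # ${nextInt():mod(2)}
        copies.append(ff)
    return copies


def defragment(flowfiles: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Emit a bin once it holds fragment.count members, ordered by fragment.index."""
    bins: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    merged = []
    for ff in flowfiles:
        group = bins[ff["fragment.identifier"]]
        group.append(ff)
        if len(group) == int(ff["fragment.count"]):
            merged.append(sorted(group, key=lambda f: int(f["fragment.index"])))
    return merged


copies = duplicate_for_merge({"uuid": "abc"})
print([c["fragment.index"] for c in copies])
print(len(defragment(copies)))
```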
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".
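For comparison, the question's underlying goal (decide between INSERT and UPDATE, and only touch the columns that appear in the JSON) can be sketched outside NiFi. This Python example uses the standard sqlite3 module as a stand-in for Oracle. It is not how ConvertJSONToSQL works internally, and the customers table and its columns are hypothetical. The existence check plays the role of executesql.row.count, and JSON keys that are not table columns are ignored.

```python
import json
import sqlite3


def upsert_json(conn: sqlite3.Connection, table: str, key: str, doc_json: str) -> str:
    """INSERT or UPDATE one JSON document, touching only columns present in it."""
    doc = json.loads(doc_json)
    # Keep only JSON fields that are real columns of the table.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    fields = {k: v for k, v in doc.items() if k in columns}
    # Equivalent of routing on executesql.row.count > 0.
    exists = conn.execute(
        f"SELECT 1 FROM {table} WHERE {key} = ?", (fields[key],)
    ).fetchone() is not None
    if exists:
        updates = {k: v for k, v in fields.items() if k != key}
        if updates:
            assignments = ", ".join(f"{col} = ?" for col in updates)
            conn.execute(f"UPDATE {table} SET {assignments} WHERE {key} = ?",
                         [*updates.values(), fields[key]])
        return "UPDATE"
    cols = ", ".join(fields)  # names were validated against the table above
    placeholders = ", ".join("?" for _ in fields)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
                 list(fields.values()))
    return "INSERT"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
print(upsert_json(conn, "customers", "id", '{"id": 1, "name": "Ann", "city": "Oslo"}'))
print(upsert_json(conn, "customers", "id", '{"id": 1, "city": "Bergen", "extra": true}'))
```

The second call updates only city. The name column and the unknown extra field are left alone.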