How to take the entire flowfile content in a NiFi processor

I am using NiFi to develop a data drifting flow. My flow uses the SelectHiveQL processor, and its output (a flowfile) needs to be passed to the next processor.
Which processor is suitable for taking the flowfile content and storing it in a user-defined variable? I need to use that variable in ExecuteScript to manipulate the data.

The ExecuteScript processor has direct access to the content of the incoming flowfile via the standard API. Here is an example:
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (flowFile == null) {
    return
}
// A closure coerced to StreamCallback reads the existing content and writes the new content
flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Buffered reader over the existing flowfile content
    final BufferedReader inReader = new BufferedReader(new InputStreamReader(inputStream, 'UTF-8'))
    String line
    // Compare against null explicitly: an empty string is falsy in Groovy,
    // so a bare assignment in the condition would stop at the first blank line
    while ((line = inReader.readLine()) != null) {
        // For each line, write the reversed line to the output
        outputStream.write("${line.reverse()}\n".getBytes('UTF-8'))
    }
} as StreamCallback)
flowFile = session.putAttribute(flowFile, 'reversed_lines', 'true')
session.transfer(flowFile, REL_SUCCESS)
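It is dangerous to move the flowfile content to an attribute because attributes and content are managed differently in NiFi: attributes are held in JVM heap memory, while content is stored in the content repository on disk, so copying large content into attributes can exhaust the heap. There is a more detailed explanation of the differences in the Apache NiFi In Depth guide.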

You could use ExtractText to extract the content of your flowfile to an attribute.
In the ExtractText processor, you would create a property (the name you give this property becomes a new attribute on your flowfile), and the value of the property is the regular expression (\A.+\Z). In my experience, this regex is enough to capture the entire content of the flowfile, though mileage can vary with the content: multi-line content needs Enable DOTALL Mode set to true so that . also matches line breaks, and the captured value is cut off at Maximum Capture Group Length.
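ExtractText evaluates its properties as Java regular expressions, so the behaviour of (\A.+\Z) can be checked outside NiFi. The following is only a sketch with plain java.util.regex; the class and method names are made up for illustration, and it does not model ExtractText's buffer or capture-length limits. It shows why the regex alone captures single-line content but needs DOTALL for multi-line content.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: mimics only the regex evaluation step.
final class WholeContentRegexDemo {

    // Returns capture group 1 of (\A.+\Z), or null when the regex does not match.
    static String capture(String content, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        Matcher m = Pattern.compile("(\\A.+\\Z)", flags).matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String multiLine = "first line\nsecond line";
        // Single-line content matches without any flags.
        System.out.println(capture("single line", false));
        // Without DOTALL, '.' stops at the line break, so nothing is captured.
        System.out.println(capture(multiLine, false));
        // With DOTALL, the whole multi-line content is captured.
        System.out.println(capture(multiLine, true));
    }
}
```

In ExtractText itself the equivalent switch is the Enable DOTALL Mode property rather than a flag in code.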

Related

How can I update a flowfile attribute value's data type (String to byte) in NiFi

I am using NiFi version 1.8.0.3.3.0.0-165 and have no idea how to convert an attribute value's data type (String to byte).
Is it possible to convert the data type of a NiFi flowfile attribute?
For attributes, you can use the Apache NiFi Expression Language Guide.
If you don't find a solution there, you can use a Groovy script to load your attribute and do whatever you want:
def flowFile = session.get()
if (!flowFile) return
def val = flowFile.getAttribute('yourattribute')
// Modify val here. Attribute values are always strings, so the result must be a String again
def yourattributeout = val
flowFile = session.putAttribute(flowFile, 'yourattributeout', yourattributeout)
session.transfer(flowFile, REL_SUCCESS)
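Since flowfile attributes are always stored as strings, a "String to byte" conversion can only live inside a script or processor; to write the result back as an attribute, the bytes have to be re-encoded as text. A minimal sketch of that round trip, assuming UTF-8 and Base64 as the text encoding (both are my assumptions, and the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: converts an attribute value to bytes and back to a storable string.
final class AttributeBytesDemo {

    // Turns the attribute value into raw bytes, then into Base64 text that fits in an attribute.
    static String toBase64(String attributeValue) {
        byte[] raw = attributeValue.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(raw);
    }

    // Reverses toBase64: decodes the Base64 text and rebuilds the original string.
    static String fromBase64(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = toBase64("nifi");
        System.out.println(encoded);
        System.out.println(fromBase64(encoded));
    }
}
```

The same two calls work unchanged inside a Groovy ExecuteScript body. If Base64 is all you need, the Expression Language also has a base64Encode function, so a script may not be necessary.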

How to store json object in a variable using apache nifi?

The following flowfile is the response of an "InvokeHttp":
[
{"data1":"[{....},{...},{....}]","info":"data-from_site"},
{"data2":"[{....},{...},{....}]","info":"data-from_site"},
{"data3":"[{....},{...},{....}]","info":"data-from_site"}
]
I ran SplitJson and got each JSON record as a separate flowfile:
flowfile 1:
{"data1":"[{....},{...},{....}]","info":"data-from_site"}
flowfile 2:
{"data2":"[{....},{...},{....}]","info":"data-from_site"}
flowfile 3:
{"data3":"[{....},{...},{....}]","info":"data-from_site"}
I want to store the JSON record of each flowfile in a variable, like this:
variable1 = "{"data1":"[{....},{...},{....}]","info":"data-from_site"}"
variable2 = "{"data2":"[{....},{...},{....}]","info":"data-from_site"}"
variable3 = "{"data3":"[{....},{...},{....}]","info":"data-from_site"}"
Can someone show me how to store the JSON record in a variable?
If I understand correctly what you want to do (by "variable", do you mean what NiFi calls an "attribute"?), you can use the EvaluateJsonPath processor configured with:
Destination set to flowfile-attribute
Return Type set to json
a dynamic property whose name is the attribute to create and whose value is a JsonPath expression, for example $ to select the whole record

OpenCSV : getting the list of header names in the order it appears in csv

I am using Spring Boot + OpenCSV to parse a CSV with 120 columns (sample). I upload the file, process each row and, in case of errors, return a similar CSV (say errorCSV). This errorCSV contains only the rows that failed, with the 120 original columns plus 3 additional columns describing what went wrong (sample error file).
I have used annotation-based processing and the beans are populated fine. But I need to get the header names in the order they appear in the CSV, which is quite challenging, and then capture the exception and the original data during parsing. The two together can later be used to write the error CSV.
CSVReaderHeaderAware headerReader = new CSVReaderHeaderAware(reader);
try {
    header = headerReader.readMap().keySet();
} catch (CsvValidationException e) {
    e.printStackTrace();
}
However, the header order is jumbled and there is no way to get the header index, because CSVReaderHeaderAware internally uses a HashMap. To solve this I built a custom class. It is a replica of CSVReaderHeaderAware except that it uses a LinkedHashMap:
public class CSVReaderHeaderOrderAware extends CSVReader {
    private final Map<String, Integer> headerIndex = new LinkedHashMap<>();
}
....
// This code cannot be done with a stream and Collectors.toMap()
// because Map.merge() does not play well with null values. Some
// implementations throw a NullPointerException, others simply remove
// the key from the map.
Map<String, String> resultMap = new LinkedHashMap<>(headerIndex.size()*2);
It does the job, but I wanted to check whether this is the best approach, or whether there is a better way to get the header names and failed values back and write them to a CSV.
I referred to the following link but couldn't get much help:
How to read from particular header in opencsv?
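One lighter alternative to copying the whole reader class might be to read the header row once as an array (OpenCSV's plain CSVReader.readNext() returns a String[] in file order) and build the ordered index yourself. The sketch below uses only the JDK so it runs without OpenCSV; the class, method and column names are hypothetical, and the naive split(",") in main assumes a header without quoted commas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps header names in file order and extends them for the error CSV.
final class HeaderOrderDemo {

    // Maps each header name to its column index; LinkedHashMap preserves insertion order.
    static Map<String, Integer> indexHeaders(String[] headerRow) {
        Map<String, Integer> headerIndex = new LinkedHashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            headerIndex.put(headerRow[i], i);
        }
        return headerIndex;
    }

    // Builds the error-file header: original columns in order, then the extra error columns.
    static List<String> errorHeaders(Map<String, Integer> headerIndex, String... extraColumns) {
        List<String> result = new ArrayList<>(headerIndex.keySet());
        Collections.addAll(result, extraColumns);
        return result;
    }

    public static void main(String[] args) {
        // Assumption: simple header line with no quoted commas.
        String[] headerRow = "zeta,alpha,mid".split(",");
        Map<String, Integer> headerIndex = indexHeaders(headerRow);
        System.out.println(headerIndex.keySet());
        System.out.println(errorHeaders(headerIndex, "errorCode", "errorField", "errorMessage"));
    }
}
```

With the ordered index in hand, each failed row can be written back by iterating the same key order and appending the three error values.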

Get id from previous processor NiFi

Processors I'm referring to
Is it possible for the InvokeHTTP processor to take the "id" of the previous processor (in this case SELECT_FROM_SNOWFLAKE)?
Where I want to change
I would like the "Remote URL" to be something like:
http://${hostname()}:8080/nifi-api/processors/${previousProcessorId()}
No, you can't. But you can get the name, id or other properties of the current process group by placing an ExecuteScript or ExecuteGroovyScript processor somewhere in this flow with the following script:
def flowFile = session.get()
if(!flowFile) return
def processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown'
// Safe navigation on every step so a missing node or group falls back to 'unknown'
def processGroupName = context.procNode?.getProcessGroup()?.getName() ?: 'unknown'
flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId)
flowFile = session.putAttribute(flowFile, 'processGroupName', processGroupName)
session.transfer(flowFile, REL_SUCCESS)
After that, you can look up the id of the Snowflake processor within this process group, for example through the REST API.
The Remote URL property of the InvokeHTTP processor supports the NiFi Expression Language.
So, if a previous processor sets an attribute named hostname, you can use it as http://${hostname}:8080/...
However, a SQL select processor such as ExecuteSQL returns its result in Avro format.
Before InvokeHTTP you will probably need to convert the Avro to JSON and then use EvaluateJsonPath to extract the required values into attributes.

Transform data with NIFI

What is the best practice in NiFi for extracting a value from a flowfile and transforming it into a text format? Example:
{ "data" : "ex" } ===> My data is ex
How can I do this with NiFi without using an ExecuteScript processor?
You could use ExtractText to extract the values into attributes. If you added a property in ExtractText like foo = \{\s*"(.+)"\s*:\s*"(.+)"\s*\} (braces escaped and whitespace made optional, so that it compiles as a Java regex and matches the example above), then your flowfile would get one attribute for each of the two capture groups in the regex:
foo.1 = data
foo.2 = ex
Then you can use ReplaceText with a Replacement Value of:
My ${foo.1} is ${foo.2}
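To check the regex before putting it into ExtractText, the two steps can be imitated with plain java.util.regex. This is only a sketch: the class and method names are hypothetical, and string concatenation stands in for the ReplaceText replacement value. The pattern escapes the braces (Java regex treats an unescaped { as the start of a repetition) and tolerates whitespace around them, so it matches the sample input { "data" : "ex" }.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: imitates ExtractText (capture groups) followed by ReplaceText.
final class ExtractReplaceDemo {

    // Group 1 is the key (foo.1), group 2 is the value (foo.2).
    private static final Pattern KEY_VALUE =
            Pattern.compile("\\{\\s*\"(.+)\"\\s*:\\s*\"(.+)\"\\s*\\}");

    // Returns "My <key> is <value>", or null when the content does not match.
    static String transform(String content) {
        Matcher m = KEY_VALUE.matcher(content);
        if (!m.find()) {
            return null;
        }
        return "My " + m.group(1) + " is " + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(transform("{ \"data\" : \"ex\" }")); // prints "My data is ex"
    }
}
```

This only covers a flat object with a single key, which is all the question's example needs.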

Resources