How to combine attributes from different ExecuteScript processors and add them in UpdateAttribute in NiFi - apache-nifi

I have a use case where I run different scripts (ExecuteScript processors) based on different config sets (e.g. AB - ExecuteScriptAB, AC - ExecuteScriptAC, AL - ExecuteScriptAL, AM - ExecuteScriptAM).
Inside each ExecuteScript processor, I have used session.putAttribute() in a Python script to capture the execute status code of each script.
For AB : flowFile = session.putAttribute(flowFile,"returnCodeAB",str(exec_code));
For AC : flowFile = session.putAttribute(flowFile,"returnCodeAC",str(exec_code));
For AL : flowFile = session.putAttribute(flowFile,"returnCodeAL",str(exec_code));
For AM : flowFile = session.putAttribute(flowFile,"returnCodeAM",str(exec_code));
Now I want to add the values of these 4 attributes, i.e. returnCodeAB + returnCodeAC + returnCodeAL + returnCodeAM, and return a final status code value. As these scripts execute separately, I am unable to merge them and add the values; they act as different processors.
I tried using an UpdateAttribute processor after these scripts and creating an attribute in the Advanced tab, but it did not help. Can someone help me with an efficient solution?
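For reference, a minimal sketch of one of these ExecuteScript (Jython) script bodies as described above, with exec_code standing in for the status the real script computes:
flowFile = session.get()
if flowFile is not None:
    exec_code = 0  # placeholder: the real script sets this from its own logic
    # record this script's status as an attribute on the flowfile (AB case shown)
    flowFile = session.putAttribute(flowFile, "returnCodeAB", str(exec_code))
    session.transfer(flowFile, REL_SUCCESS)
Each of the four processors sets its own attribute only on the flowfile passing through it.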

Related

ScrollElasticsearchHttp processor count in NiFi

Hi, I am trying to take a simple count of the "ScrollElasticsearchHttp" processor output in NiFi, using QueryRecord after this processor. I have created one new variable and am using the SQL below:
"select count(1) from FLOWFILE"
I am expecting a record.count value of 10000, which is my record count, but it always shows record.count as 1.
Can someone suggest how I should take the count of this ScrollElasticsearchHttp flow?
Thanks !!
Documentation of the ScrollElasticsearchHttp processor:
Each page of results is returned, wrapped in a JSON object like so: { "hits" : [ <doc1>, <doc2>, ..., <docN> ] }.
First, use an EvaluateJsonPath processor:
Destination: flowfile-content
Return Type: auto-detect
hits (dynamic): $.hits
Then use a QueryRecord processor:
count: SELECT COUNT(1) AS COUNT FROM FLOWFILE
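For illustration, a page containing three hits would look like this at each step (hit documents elided):
{ "hits" : [ {...}, {...}, {...} ] } <- content emitted by ScrollElasticsearchHttp
[ {...}, {...}, {...} ] <- content after EvaluateJsonPath writes $.hits to the content
3 <- COUNT returned by QueryRecord for that page
Note that QueryRecord counts the records in each flowfile, i.e. per page of scroll results, so the 10000 total is spread across pages rather than appearing on a single flowfile.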

Get id from previous processor in NiFi

[screenshot: the processors I'm referring to]
Is it possible for the "InvokeHTTP" processor to take the "id" from the previous processor (in this case SELECT_FROM_SNOWFLAKE)?
[screenshot: where I want to change]
I would like the "Remote URL" to be something like:
http://${hostname()}:8080/nifi-api/processors/${previousProcessorId()}
No, you can't. But you can get the name, id, or other properties of the current process group by using an ExecuteScript or ExecuteGroovyScript processor somewhere in this flow to find this information with a script:
def flowFile = session.get()
if (!flowFile) return
// procNode exposes the processor's node, which knows its parent process group
processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown'
processGroupName = context.procNode?.getProcessGroup()?.getName() ?: 'unknown'
// store both values as flowfile attributes for use downstream
flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId)
flowFile = session.putAttribute(flowFile, 'processGroupName', processGroupName)
session.transfer(flowFile, REL_SUCCESS)
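Each flowfile that passes through the script then carries processGroupId and processGroupName attributes, which downstream processors can reference with Expression Language (e.g. ${processGroupId}).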
After that, you can look up the id of this Snowflake processor within this process group, for example via the REST API.
The Remote URL property of the InvokeHTTP processor supports NiFi Expression Language.
So, if a previous processor sets the attribute hostname, you can use it as http://${hostname}:8080/...
However, the SQL select processor (e.g. ExecuteSQL) returns its result in Avro format.
You probably need to convert the Avro to JSON before InvokeHTTP, and then use EvaluateJsonPath to extract the required values into attributes.
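A minimal sketch of that chain, assuming the query result contains a column named ID (a hypothetical name) and that hostname is set earlier in the flow:
ConvertAvroToJSON (turns the Avro result into a JSON array)
EvaluateJsonPath: Destination: flowfile-attribute, id (dynamic): $.[0].ID
InvokeHTTP: Remote URL: http://${hostname}:8080/nifi-api/processors/${id}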

Transform data with NiFi

What's the best practice with NiFi to extract an attribute from a flowfile and transform it into a text format? Example:
{ "data" : "ex" } ===> My data is ex
How can I do this with NiFi without using an ExecuteScript processor?
You could use ExtractText to extract the values into attributes. If you added a property in ExtractText like foo = {"(.+)" : "(.+)"}, then your flow file would get two attributes, one for each capture group in the regex:
foo.1 = data
foo.2 = ex
Then you can use ReplaceText with a Replacement Value of:
My ${foo.1} is ${foo.2}
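With the example content { "data" : "ex" }, foo.1 resolves to data and foo.2 to ex, so ReplaceText rewrites the content to: My data is ex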

How to join header rows to detail rows in multiple files with Apache Pig

I have several CSV files in a HDFS folder which I load to a relation with:
source = LOAD '$data' USING PigStorage(','); -- $data is passed as a parameter to the pig command
When I dump it, the structure of the source relation is as follows (note that the data is text-qualified, but I will deal with that using the REPLACE function):
("HEADER","20110118","20101218","20110118","T00002")
("0000000000000000035412","20110107","2699","D","20110107","2315.","","","","","","C")
("0000000000000000035412","20110107","2699","D","20110107","246..","162","74","","","","B")
<.... more records ....>
("HEADER","20110224","20110109","20110224","T00002")
("0000000000000000035412","20110121","2028","D","20110121","a6c3.","","","","","R","P")
("0000000000000000035412","20110217","2619","D","20110217","a6c3.","","","","","R","P")
<.... more records ....>
So each file has a header which provides some information about the data set that follows it, such as the provider of the data and the date range it covers.
Now, how can I transform the above structure into a new relation like the following?
{
(HEADER,20110118,20101218,20110118,T00002),{(0000000000000000035412,20110107,2699,D,20110107,2315.,,,,,,C),(0000000000000000035412,20110107,2699,D,20110107,246..,162,74,,,,B),..more tuples..},
(HEADER,20110224,20110109,20110224,T00002),{(0000000000000000035412,20110121,2028,D,20110121,a6c3.,,,,,R,P),(0000000000000000035412,20110217,2619,D,20110217,a6c3.,,,,,R,P),..more tuples..},..more tuples..
}
That is, each header tuple is followed by a bag of the record tuples belonging to that header.
Unfortunately there is no common key field between the header and the detail rows, so I don't think I can use any JOIN operation.
I am quite new to Pig and Hadoop, and this is one of the first concept projects I am engaging in.
I hope my question is clear, and I look forward to some guidance here.
This should get you started.
Code:
Source = LOAD '$data' USING PigStorage(',','-tagFile'); -- '-tagFile' prepends the source file name as $0
SPLIT Source INTO FileHeaders IF $1 == 'HEADER', FileData OTHERWISE; -- SPLIT takes no assignment
B = GROUP FileData BY $0; -- bag of detail rows per file
C = GROUP FileHeaders BY $0; -- bag of header rows per file
D = JOIN B BY group, C BY group; -- match each file's headers to its detail bag
...
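After the join, each row of D pairs a file's header tuple with the bag of detail rows from the same file, using the file name added by '-tagFile' as the key.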

Store pig result in a text file

Hi Stack Overflow community;
I'm totally new to Pig. I want to STORE the result in a text file and name it as I want. Is it possible to do this using the STORE function?
My code:
a = LOAD 'example.csv' USING PigStorage(';');
b = FOREACH a GENERATE $0,$1,$2,$3,$6,$7,$8,$9,$11,$12,$13,$14,$20,$24,$25;
STORE b INTO 'myoutput';
Thanks.
Yes, you will be able to store your result in myoutput.txt (note that on HDFS, STORE creates a directory with that name containing part files), and you can write the data with any delimiter you want using PigStorage.
a = LOAD 'example.csv' USING PigStorage(';');
b = FOREACH a GENERATE $0,$1,$2,$3,$6,$7,$8,$9,$11,$12,$13,$14,$20,$24,$25;
STORE b INTO 'myoutput.txt' USING PigStorage(';');
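If you need the result as a single local text file rather than a directory of part files, one common approach is to merge the output afterwards, e.g. hadoop fs -getmerge myoutput.txt local_myoutput.txt.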
Yes, it is possible. b will store, for every row, the 15 selected columns ($0 through $25).
