Hi, I am trying to take a simple count of the records coming out of the "ScrollElasticsearchHttp" processor in NiFi, and I am using QueryRecord after this processor. I have created one new variable and am using the SQL below:
"select count(1) from FLOWFILE"
I am expecting a record.count value of 10000, which is my record count, but it always shows a record.count value of 1.
Can someone suggest how I should take the count for this ScrollElasticsearchHttp flow?
Thanks!
From the documentation of the ScrollElasticsearchHttp processor:
Each page of results is returned, wrapped in a JSON object like so: { "hits" : [ <doc1>, <doc2>, <docn> ] }.
First, use an EvaluateJsonPath processor:
Destination: flowfile-content
Return Type: auto-detect
hits (dynamic): $.hits
Then use a QueryRecord processor:
count: SELECT COUNT(1) AS COUNT FROM FLOWFILE
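To see why the original count was presumably 1, it helps to look at the page outside NiFi: the wrapper object is a single JSON record, while the array under "hits" holds the actual documents. The snippet below is only a minimal Python sketch of that idea, not NiFi code; the function name count_hits and the sample page are made up for illustration.

```python
import json


def count_hits(page_json: str) -> int:
    """Return the number of documents inside one page of scroll results.

    The whole page is one JSON object, so counting top-level records
    gives 1; the documents themselves live in the "hits" array.
    """
    page = json.loads(page_json)
    return len(page.get("hits", []))


# Hypothetical page with three documents (illustrative data only).
sample_page = '{"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}'
print(count_hits(sample_page))
```

This mirrors the two-step flow above: EvaluateJsonPath with $.hits replaces the content with the array, and QueryRecord then counts one record per array element.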
Processors I'm referring to
Is it possible for the "InvokeHTTP" processor to take the "id" of the previous processor (in this case SELECT_FROM_SNOWFLAKE)?
Where I want to change
I would like the "Remote URL" to be something like:
http://${hostname()}:8080/nifi-api/processors/${previousProcessorId()}
No, you can't. But you can get the name, id, or other properties of the current process group by placing an ExecuteScript or ExecuteGroovyScript processor somewhere in this flow and looking the information up with a script:
def flowFile = session.get()
if(!flowFile) return
processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown'
processGroupName = context.procNode?.getProcessGroup().getName() ?: 'unknown'
flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId)
flowFile = session.putAttribute(flowFile, 'processGroupName', processGroupName)
session.transfer(flowFile, REL_SUCCESS)
After that, you can look up the id of the Snowflake processor inside this process group, for example through the REST API.
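As a rough illustration of that lookup, the Python sketch below builds the URL for listing the processors of a process group and searches the parsed response for a processor by name. It assumes the NiFi REST API returns a JSON object with a "processors" array whose entries carry "id" and "component.name"; the base URL, the group id and the sample listing are hypothetical, and no HTTP call is made here.

```python
def processors_url(base_url: str, process_group_id: str) -> str:
    """Build the URL that lists the processors of one process group."""
    return f"{base_url.rstrip('/')}/process-groups/{process_group_id}/processors"


def find_processor_id(listing: dict, name: str):
    """Return the id of the first processor with the given name, or None.

    `listing` is the parsed JSON response; its shape is an assumption.
    """
    for entity in listing.get("processors", []):
        if entity.get("component", {}).get("name") == name:
            return entity.get("id")
    return None


# Hypothetical response body, trimmed to the fields used above.
sample_listing = {
    "processors": [
        {"id": "abc-123", "component": {"name": "SELECT_FROM_SNOWFLAKE"}},
        {"id": "def-456", "component": {"name": "InvokeHTTP"}},
    ]
}

print(processors_url("http://localhost:8080/nifi-api", "group-1"))
print(find_processor_id(sample_listing, "SELECT_FROM_SNOWFLAKE"))
```

In the flow itself, the group id would come from the processGroupId attribute set by the script above.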
The Remote URL property of the InvokeHTTP processor supports the NiFi Expression Language.
So, if the previous processor sets an attribute named hostname, you can use it as http://${hostname}:8080/...
However, a SQL select processor (ExecuteSQL, for example) returns its result in Avro format.
So before InvokeHTTP you probably need to convert the Avro to JSON and then use EvaluateJsonPath to extract the required values into attributes.
What's the best practice in NiFi for extracting a value from a flowfile and transforming it into a text format? Example:
{ "data" : "ex" } ===> My data is ex
How can I do this in NiFi without using an ExecuteScript processor?
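You could use ExtractText to extract the values into attributes. If you added a property in ExtractText like foo = \{\s*"(.+)"\s*:\s*"(.+)"\s*\} (the braces are escaped because { is a regex metacharacter, and \s* allows for the spaces in the sample), then your flow file would get one attribute for each of the two capture groups in the regex: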
foo.1 = data
foo.2 = ex
Then you can use ReplaceText with a Replacement Value of:
My ${foo.1} is ${foo.2}
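The same extract-then-replace idea can be tried outside NiFi. The Python sketch below is only an illustration, not NiFi configuration: it uses a pattern with escaped braces and optional whitespace (an assumption made so that it matches the sample with its spaces), and the function name to_sentence is made up.

```python
import re

# Two capture groups, playing the role of foo.1 and foo.2 in ExtractText.
PATTERN = re.compile(r'\{\s*"(.+)"\s*:\s*"(.+)"\s*\}')


def to_sentence(content: str) -> str:
    """Turn '{ "key" : "value" }' into 'My key is value'.

    Content that does not match the pattern is returned unchanged.
    """
    match = PATTERN.search(content)
    if match is None:
        return content
    key, value = match.group(1), match.group(2)
    return f"My {key} is {value}"


print(to_sentence('{ "data" : "ex" }'))  # prints: My data is ex
```

A regex like this only suits flat, single-pair JSON; for nested or multi-key content, a JSON-aware processor such as EvaluateJsonPath is likely the safer choice.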
I have several CSV files in a HDFS folder which I load to a relation with:
source = LOAD '$data' USING PigStorage(','); -- $data is passed as a parameter to the pig command
When I dump it, the structure of the source relation is as follows: (note that the data is text qualified but I will deal with that using the REPLACE function)
("HEADER","20110118","20101218","20110118","T00002")
("0000000000000000035412","20110107","2699","D","20110107","2315.","","","","","","C")
("0000000000000000035412","20110107","2699","D","20110107","246..","162","74","","","","B")
<.... more records ....>
("HEADER","20110224","20110109","20110224","T00002")
("0000000000000000035412","20110121","2028","D","20110121","a6c3.","","","","","R","P")
("0000000000000000035412","20110217","2619","D","20110217","a6c3.","","","","","R","P")
<.... more records ....>
So each file has a header which provides some information about the data set that follows it such as the provider of the data and the date range it covers.
So now, how can I transform the above structure and create a new relation like the following?
{
(HEADER,20110118,20101218,20110118,T00002),{(0000000000000000035412,20110107,2699,D,20110107,2315.,,,,,,C),(0000000000000000035412,20110107,2699,D,20110107,246..,162,74,,,,B),..more tuples..},
(HEADER,20110224,20110109,20110224,T00002),{(0000000000000000035412,20110121,2028,D,20110121,a6c3.,,,,,R,P),(0000000000000000035412,20110217,2619,D,20110217,a6c3.,,,,,R,P),..more tuples..},..more tuples..
}
Here each header tuple is followed by a bag of the record tuples belonging to that header.
Unfortunately there is no common key field between the header and the detail rows, so I don't think I can use any JOIN operation.
I am quite new to Pig and Hadoop and this is one of the first concept projects that I am engaging in.
Hope my question is clear and look forward to some guidance here.
This should get you started.
Code:
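-- '-tagFile' prepends the source file name as $0, which serves as the common key.
Source = LOAD '$data' USING PigStorage(',','-tagFile');
-- SPLIT is a statement, so it is not assigned to an alias.
-- The comparison assumes the text qualifiers (double quotes) have already been stripped.
SPLIT Source INTO FileHeaders IF $1 == 'HEADER', FileData OTHERWISE;
B = GROUP FileData BY $0;
C = GROUP FileHeaders BY $0;
-- The key field produced by GROUP is named 'group' (lowercase).
D = JOIN B BY group, C BY group;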
...
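The idea behind the script is that the file name added by -tagFile acts as the missing common key between a header and its detail rows. The Python sketch below illustrates that grouping on in-memory rows; it is not Pig code, and the file names f1.csv and f2.csv, the function name group_by_file and the shortened sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_file(rows):
    """Pair each file's header tuple with the list of its detail tuples.

    `rows` is an iterable of (file_name, fields) pairs, mirroring what
    '-tagFile' yields: the file name followed by the row's fields.
    """
    headers = {}
    details = defaultdict(list)
    for file_name, fields in rows:
        if fields and fields[0] == "HEADER":
            headers[file_name] = tuple(fields)
        else:
            details[file_name].append(tuple(fields))
    # One (header, [detail, ...]) pair per file, in header order.
    return [(headers[name], details[name]) for name in headers]


# Illustrative rows, shortened from the sample data in the question.
sample_rows = [
    ("f1.csv", ["HEADER", "20110118", "20101218", "20110118", "T00002"]),
    ("f1.csv", ["0000000000000000035412", "20110107", "2699", "D"]),
    ("f1.csv", ["0000000000000000035412", "20110107", "2699", "D"]),
    ("f2.csv", ["HEADER", "20110224", "20110109", "20110224", "T00002"]),
    ("f2.csv", ["0000000000000000035412", "20110121", "2028", "D"]),
]

for header, records in group_by_file(sample_rows):
    print(header, len(records))
```

This relies on each file containing exactly one header row, as described in the question.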
Hi Stack Overflow community,
I'm totally new to Pig. I want to STORE the result in a text file and name it as I want. Is it possible to do this using the STORE function?
My code:
a = LOAD 'example.csv' USING PigStorage(';');
b = FOREACH a GENERATE $0,$1,$2,$3,$6,$7,$8,$9,$11,$12,$13,$14,$20,$24,$25;
STORE b INTO 'myoutput';
Thanks.
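Yes, you will be able to store your result under the name myoutput.txt, and you can write the data with any delimiter you want using PigStorage. Note that Pig creates myoutput.txt as an output directory containing part files rather than as a single file.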
a = LOAD 'example.csv' USING PigStorage(';');
b = FOREACH a GENERATE $0,$1,$2,$3,$6,$7,$8,$9,$11,$12,$13,$14,$20,$24,$25;
STORE b INTO 'myoutput.txt' USING PigStorage(';');
Yes, it is possible. b will hold 15 columns per row: the fields selected from positions $0 to $25.