Hi, I am trying to take a simple count of the records coming out of the ScrollElasticsearchHttp processor in NiFi, and I am using QueryRecord after this processor. I created a new dynamic property with the SQL below:
"select count(1) from FLOWFILE"
I am expecting a count of 10000, which is my record count, but the record.count attribute always shows 1.
Can someone suggest how I should take the count of this ScrollElasticsearchHttp flow?
Thanks!
Documentation of ScrollElasticsearchHttp processor:
Each page of results is returned, wrapped in a JSON object like so: { "hits" : [ <doc1>, <doc2>, <doc3> ] }.
First, use an EvaluateJsonPath processor:
Destination: flowfile-content
Return Type: auto-detect
hits (dynamic): $.hits
Then use a QueryRecord processor:
count: SELECT COUNT(1) AS COUNT FROM FLOWFILE
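To illustrate with a tiny page (the documents are made up), suppose the scroll returns:
{ "hits" : [ {"id":1}, {"id":2}, {"id":3} ] }
EvaluateJsonPath rewrites the content to the bare array [ {"id":1}, {"id":2}, {"id":3} ], and QueryRecord, configured with a JSON record reader, then sees three records, so the query returns COUNT = 3.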
I have a use case that runs different scripts (ExecuteScript processors) based on different config sets (e.g. AB - ExecuteScriptAB, AC - ExecuteScriptAC, AL - ExecuteScriptAL, AM - ExecuteScriptAM).
Inside each ExecuteScript processor, I use session.putAttribute() in a Python script to capture the exit status code of each script:
For AB : flowFile = session.putAttribute(flowFile,"returnCodeAB",str(exec_code));
For AC : flowFile = session.putAttribute(flowFile,"returnCodeAC",str(exec_code));
For AL : flowFile = session.putAttribute(flowFile,"returnCodeAL",str(exec_code));
For AM : flowFile = session.putAttribute(flowFile,"returnCodeAM",str(exec_code));
Now I want to add the values of these 4 attributes, i.e. returnCodeAB + returnCodeAC + returnCodeAL + returnCodeAM, and return a final status code value. As these scripts execute separately, I am unable to merge them and add the values; they act as different processors.
I tried putting an UpdateAttribute processor after these scripts and creating an attribute in its Advanced tab, but it did not help. Can someone help me with an efficient solution?
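One possible approach (a sketch, not a confirmed fix; the property name finalReturnCode is mine): route all four scripts into a MergeContent processor with Attribute Strategy set to Keep All Unique Attributes, so the four returnCode* attributes end up on one merged flowfile, then sum them in UpdateAttribute with expression language:
finalReturnCode: ${returnCodeAB:plus(${returnCodeAC}):plus(${returnCodeAL}):plus(${returnCodeAM})}
:plus() is NiFi Expression Language's numeric addition, so the result is the sum of the four return codes.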
This is my flow file content:
{
"a":"b",
"c":"y",
"d":"z",
"e":"w",
"f":"u",
"g":"v",
"h":"o",
"x":"t"
}
The final result should look like this in Postgres:
| test |
|-------------------------------------------------------------------|
| {"a":"b","c":"y","d":"z","e":"w","f":"u","g":"v","h":"o","x":"t"} |
The table is json_test and the column name is test.
These steps show how I tried to solve the problem:
My method was to store the JSON record as a string in an attribute with ExtractText.
The data attribute captured only some of the key-values from the JSON, not the entire record:
data = {"a":"b",
"c":"y",
"d":"z",
"e":"w",
"f":
So I have a problem with the regex expression.
Next I used PutSQL with my SQL statement, but unfortunately the result isn't the one I wanted.
I need to know the exact expression to set in ExtractText to get the entire JSON record into an attribute as a string.
The SQL statement should be:
insert into schema.table_name(column_name) values(<the variable where the flowfile data was stored>)
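One possible fix (a sketch; the attribute name data comes from the question, the rest is my assumption): in ExtractText, use a dynamic property whose regex captures the whole content, and raise the capture limit, since the default Maximum Capture Group Length of 1024 characters is exactly what truncates the attribute mid-record:
data: (?s)(^.*$)
Maximum Capture Group Length: 65536
Then, before PutSQL, use ReplaceText (Replacement Strategy: Always Replace) to set the flowfile content to the statement PutSQL should execute:
insert into json_test (test) values ('${data}')
If the JSON can ever contain single quotes, escape them first (or pass the value as a sql.args parameter) so the statement stays valid.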
The following flowfile is the response of an "InvokeHttp":
[
{"data1":"[{....},{...},{....}]","info":"data-from_site"},
{"data2":"[{....},{...},{....}]","info":"data-from_site"},
{"data3":"[{....},{...},{....}]","info":"data-from_site"}
]
I did a "SplitJson", i got each json record as a single flowfile
flowfile 1:
{"data1":"[{....},{...},{....}]","info":"data-from_site"}
flowfile 2:
{"data2":"[{....},{...},{....}]","info":"data-from_site"}
flowfile 3:
{"data3":"[{....},{...},{....}]","info":"data-from_site"}
I want to store the JSON record of each flowfile in a variable, like this:
variable1 = "{"data1":"[{....},{...},{....}]","info":"data-from_site"}"
variable2 = "{"data2":"[{....},{...},{....}]","info":"data-from_site"}"
variable3 = "{"data3":"[{....},{...},{....}]","info":"data-from_site"}"
Can someone show me how to store the JSON record in a variable?
If I understand correctly what you want to do (by "variable", do you mean what is called "attribute" in NiFi?), you can use the EvaluateJsonPath processor configured with:
flowfile-attribute as Destination
json as Return type
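For example (the attribute name record is my own choice), add a dynamic property that selects the document root:
record (dynamic): $
With flowfile-attribute as Destination, the entire JSON record of each split flowfile is copied into the record attribute, which you can then reference downstream as ${record}.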
[Screenshot: the processors I'm referring to]
Is it possible for the InvokeHTTP processor to take the "id" information from the previous processor (in this case SELECT_FROM_SNOWFLAKE)?
[Screenshot: where I want to change it]
I would like the "Remote URL" to be something like:
http://${hostname()}:8080/nifi-api/processors/${previousProcessorId()}
No, you can't. But you can get the name, id, or other properties of the current process group by using an ExecuteScript or ExecuteGroovyScript processor somewhere in the flow, with a script like this:
def flowFile = session.get()
if (!flowFile) return
// context.procNode exposes the processor node, which knows its enclosing process group
processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown'
processGroupName = context.procNode?.getProcessGroup().getName() ?: 'unknown'
// put both values on the flowfile as attributes for use downstream
flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId)
flowFile = session.putAttribute(flowFile, 'processGroupName', processGroupName)
session.transfer(flowFile, REL_SUCCESS)
After that, you can look up the id of the SELECT_FROM_SNOWFLAKE processor within that process group, for example via the REST API.
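For example (host and port assumed), listing the processors of that group through the REST API returns each one's id and name:
GET http://localhost:8080/nifi-api/process-groups/${processGroupId}/processors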
The Remote URL property of the InvokeHTTP processor supports NiFi Expression Language.
So if a previous processor sets a hostname attribute, you can use it as http://${hostname}:8080/...
However, the select processor (ExecuteSQL) returns its result in Avro format.
You probably need to convert the Avro to JSON before InvokeHTTP (for example with ConvertAvroToJSON) and then use EvaluateJsonPath to extract the required values into attributes.
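A sketch of the whole chain, where the attribute name id, the JSON path, and the target URL are all assumptions:
SELECT_FROM_SNOWFLAKE (ExecuteSQL) -> ConvertAvroToJSON -> EvaluateJsonPath (Destination: flowfile-attribute, id: $[0].ID) -> InvokeHTTP (Remote URL: http://${hostname}:8080/api/items/${id})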
I have a dataset of about 5,000 elements, obtained from a database query. I would like to divide this data into chunks and then have the 'users' (threads) each make HTTP requests.
The purpose of this is that we have a site giving real-time information on transient data, and I want to simulate multiple concurrent requests against the service.
1 - I tried to create a test plan where the DB query was run and then processed via an HTTP request in a ForEach controller. This works fine with only 1 'user'; however, if I increase the user count to 2+, the DB query is run 2+ times and each 'user' runs through the entire 5,000+ data points.
2 - I tried moving the DB query into its own Thread Group and then using BeanShell to put the data into the environment (props.add(...)). This worked in that the data was there, but again each 'user' in the HTTP request Thread Group iterated over all the data.
Ideally, I would like the HTTP Request Thread Group to divide the data so that thread 1 takes the first 2,500 and thread 2 takes the second 2,500 (or if there are 4 'users', thread 1 takes the first 1,250, thread 2 the next 1,250, and so on).
I just started looking at JMeter and I don't think it can do this 'automatically', but I wanted to ask in case I'm missing something obvious.
Put a Counter element in the test plan with:
Starting value: 1
Increment: 1
Reference name: (for example) cid
and disabled "Track counter independently ...".
Then add a JSR223 or BeanShell Sampler with this simple code:
// unique, 1-based id handed to this thread by the shared counter
Integer cid = Integer.valueOf(vars.get("cid"));
// how many rows each thread should process
Integer dataShift = 2500;
// thread 1 starts at row 0, thread 2 at row 2500, and so on
Integer startReadDataFrom = (cid - 1) * dataShift;
vars.put("startReadDataFrom", String.valueOf(startReadDataFrom));
Then you can use the variable ${startReadDataFrom} as the starting point for reading data in every thread (0, 2500, 5000, 7500, ...).
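For example, with 4 threads over the 5,000 rows you would set dataShift to 1250: thread 1 draws cid = 1 and starts at offset 0, thread 2 draws cid = 2 and starts at 1250, and each thread then reads its own block of 1,250 rows from that offset.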
The fastest and easiest way is to store the data from the database in a CSV file; once that is done, you can use a CSV Data Set Config and its Sharing mode feature to match your requirements.
The storing of the data could be done as follows:
Define a Result variable name (it must match the name used in the script below, i.e. resultSet) in your JDBC Request sampler:
Add JSR223 PostProcessor as a child of the JDBC Request sampler
Put the following code into "Script" area:
// "resultSet" must match the Result variable name set in the JDBC Request;
// it holds a List of rows, each row a Map of column name -> value
resultSet = vars.getObject("resultSet")
result = new StringBuilder()
for (Object row : resultSet) {
    iter = row.entrySet().iterator()
    while (iter.hasNext()) {
        pair = iter.next()
        result.append(pair.getValue())
        if (iter.hasNext()) {
            result.append(",") // comma-separate the columns, no trailing comma
        }
    }
    result.append(System.getProperty("line.separator"))
}
// written relative to the JMeter working directory ("bin" by default)
org.apache.commons.io.FileUtils.writeStringToFile(new File("data.csv"), result.toString(), "UTF-8")
Once the execution has finished, you should see a data.csv file in the "bin" folder of your JMeter installation containing the data from the database.
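A possible CSV Data Set Config for the split (the variable names are placeholders for your query's columns):
Filename: data.csv
Variable Names: col1,col2,col3
Recycle on EOF?: False
Stop thread on EOF?: True
Sharing mode: All threads
With Sharing mode set to All threads, each row is handed out exactly once across the whole test, so 2 threads naturally end up with roughly 2,500 distinct rows each, with no manual chunking.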