Pass a directory as an argument to ExecuteStreamCommand - apache-nifi

I have a Java program that is designed to process a directory full of data, passed as an argument to the JAR.
input_dir/
file1
file2
How can I tell NiFi to pass a directory to an ExecuteStreamCommand as an argument, instead of an individual FlowFile?
Is there a way to model a directory as a FlowFile ?
I tried to use GetFile just before ExecuteStreamCommand on input_dir's parent directory in order to get input_dir, so it would be passed to the stream command.
It didn't work, as GetFile just crawls all the directories looking for actual files when the "Recurse Subdirectories" property is set to true.
When set to false, GetFile doesn't get any files.
To summarize, I would like to find a way to pass a directory containing data to an ExecuteStreamCommand, not just a single FlowFile.
Hope it makes sense, thank you for your suggestions.

A flow file does not have to be a file from disk; it can be anything. If I am understanding you correctly, you just need a flow file to trigger your ExecuteStreamCommand. You should be able to do this with GenerateFlowFile (set the scheduling strategy appropriately). You can put the directory directly into ExecuteStreamCommand, or if you want it to be more dynamic you can add it as a flow file attribute in GenerateFlowFile, then reference it in ExecuteStreamCommand like ${my.dir} (assuming you called it my.dir in GenerateFlowFile).
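For example, a minimal configuration sketch (the Java command, JAR path, and directory value are placeholders for your own values; the dynamic property added to GenerateFlowFile becomes a flow file attribute):

GenerateFlowFile (Run Schedule: e.g. 60 sec)
    my.dir = /data/input_dir

ExecuteStreamCommand
    Command Path:      /usr/bin/java
    Command Arguments: -jar;/opt/myapp.jar;${my.dir}
    Ignore STDIN:      true

ExecuteStreamCommand splits Command Arguments on its Argument Delimiter (";" by default), so the directory arrives as an ordinary argument to the JAR.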

Related

CSV Data config - avoid creating duplicate folder names

I have created a CSV Data Set Config as attached, and the purpose of this run is to create folder names in the application without trying to create duplicate folder names.
Please advise if we need to add a post-processor with an if condition; if yes, please provide the condition to put in the post-processor.
I can only think of getting the existing folder names somehow and storing them into JMeter Variables
Once done, you can use the If Controller with the following __groovy() function as the condition:
${__groovy(!vars.entrySet().collect {entry -> entry.getValue()}.contains(vars.get('your variable from CSV Data Set Config')),)}
The If Controller's child(ren) will only be executed if no JMeter Variable holds the value of the current variable from the CSV Data Set Config.

JMeter - CSV Data Config file name - Modify at RunTime

How can I change the filename of the CSV Data Set Config at run time in the .jmx file?
We have logic in a Java class which creates a dynamic file name, and this needs to be configured as the filename in the CSV Data Set Config.
I am using JMeter 4.0.
Regards
You could use a variable / property name in the CSV Data Set Config.
Here, the filename can be the name of the file, or the complete path of the file can itself be used as a variable.
Remember that the CSV Data Set Config element gets initialized first, so the filename should be a user-defined variable or a property passed to JMeter. I would prefer a property.
Do note that you cannot keep changing the CSV Data Set Config element's filename once the test has started; one CSV Data Set Config element can be used for one CSV file only. It cannot be modified mid-test.
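As a sketch, using a property (csvFile is just a name picked for this example):

In the CSV Data Set Config:
    Filename: ${__P(csvFile,fallback.csv)}

Then pass the dynamically generated name when launching the test:
    jmeter -n -t test.jmx -JcsvFile=/path/to/generated.csv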
You cannot, as CSV Data Set Config is a Configuration Element and is therefore executed before anything else. If you need to read data from different files as the test goes by, consider using JMeter Functions instead; the most suitable ones would be:
__StringFromFile() - returns the next string from the given file each time it's called
__CSVRead() - reads a value from a CSV file. The function not only supports dynamic file names, you can even provide multiple input files.
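For instance, assuming a JMeter Variable ${myFile} holds the current file name, usage could look like:

${__CSVRead(${myFile},0)}      reads column 0 of the current row
${__CSVRead(${myFile},next)}   moves the pointer to the next row
${__StringFromFile(${myFile})} returns the next line from the file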

Write attributes to a file in Apache NiFi

Hi,
I am using the GetSNMP processor to connect to a radio. As per the NiFi documentation, this information is written to flow file attributes, not to flow file contents. So I used the AttributesToJSON processor, and after that the PutFile processor to write these attributes to a file. Files are generated, but no attributes are written there; only "{}" appears in each file. Using the LogAttribute processor I can see all the attributes in the log file, but I want them in a separate file.
Please guide.
Thanks,
SGaur,
If the incoming flow file content is empty before the PutFile processor, then it will write empty content to the local directory.
So you have to write the attributes into the flow file content using ReplaceText.
For example, say these attributes come in before PutFile:
${filename} --> input.1
${input.content.1} --> content.1
${input.content.2} --> content.2
Now you have to write those attributes into the flow file content. In ReplaceText, set the Replacement Value to:
${filename},${input.content.1},${input.content.2}
It will replace the content with:
input.1,content.1,content.2
PutFile will then write it to a local file.
Hope this is helpful.
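Concretely, a minimal ReplaceText configuration for this could look like the sketch below (the attribute names are the hypothetical ones from above):

ReplaceText
    Replacement Strategy: Always Replace
    Evaluation Mode:      Entire text
    Replacement Value:    ${filename},${input.content.1},${input.content.2}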

Apache NiFi revert flowfile attribute

In my NiFi 1.3.0 dataflow, the FetchElasticsearchHttp processor changes the filename attribute to its corresponding ID in the database. I was wondering if there is a way of changing it back using some of NiFi's in-house processors.
I have thought about simply writing my own script to correct this, but there seems to be no way of knowing which file it is, so I can't just grab its name.
If I understood you correctly, you can use UpdateAttribute to copy the filename attribute to another attribute. There's no way to stop the fetch processor from overwriting the attribute, but you can surely stash it away yourself. The trick is to copy/rename before invoking the fetch processor.
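For example, a minimal sketch with two UpdateAttribute processors (original.filename is just an arbitrary name for the stash attribute):

UpdateAttribute (before FetchElasticsearchHttp)
    original.filename = ${filename}

UpdateAttribute (after the fetch)
    filename = ${original.filename}

Each line is a dynamic property: the property name is the attribute to set, and the value is an Expression Language expression.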

Flume - Can an entire file be considered an event in Flume?

I have a use case where I need to ingest files from a directory into HDFS. As a POC, I used simple directory spooling in Flume, where I specified the source, sink and channel, and it works fine. The disadvantage is that I would have to maintain multiple directories for multiple file types that go into distinct folders in order to get greater control over file sizes and other parameters, which makes the configuration repetitive, but easy.
As an alternative, I was advised to use regex interceptors where multiple files would reside in a single directory and based on a string in the file, would be routed to the specific directory in HDFS. The kind of files I am expecting are CSV files where the first line is the header and the subsequent lines are comma separated values.
With this in mind, I have a few questions.
How do interceptors handle files?
Given that the header line in the CSV would be like ID, Name followed in the next lines by IDs and Names, and another file in the same directory would have Name, Address followed in the next line by names and address, what would the interceptor and channel configuration look like for it to route it to different HDFS directories?
How does an interceptor handle the subsequent lines that clearly do not match the regex expression?
Would an entire file even constitute one event or is it possible that one file can actually be multiple events?
Please let me know. Thanks!
For starters, Flume doesn't work on files as such, but on a thing called events. Events are Avro structures which can contain anything, usually a line, but in your case an event might be an entire file.
An interceptor gives you the ability to extract information from your event and add it to that event's headers. The headers can then be used to configure the target directory structure.
In your specific case, you would want to code a parser that analyses the content of your event and sets a header value, for instance subpath:
// Route the event by setting a "subpath" header based on its content
String line = new String(event.getBody(), StandardCharsets.UTF_8);
if (line.contains("Address")) {
    event.getHeaders().put("subpath", "address");
} else if (line.contains("ID")) {
    event.getHeaders().put("subpath", "id");
}
You can then reference that header in your hdfs-sink configuration as follows:
hdfs-a1.sinks.hdfs-sink.hdfs.path = hdfs://cluster/path/%{subpath}
As to your question whether an entire file can constitute one event: yes, that's possible, but not with the spool source. You would have to implement a client class which speaks to a configured Avro source. You would have to pipe your file into an event and send that off. You could then also set the headers there instead of using an interceptor.
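A minimal sketch of such a client using the Flume SDK (the host, port, and header value are placeholders to adapt to your agent's configuration):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;

import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FileToEventClient {
    public static void main(String[] args) throws Exception {
        // Connect to an Avro source; host/port must match the agent's config
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Read the whole file into a single event body: one file == one event
            byte[] body = Files.readAllBytes(Paths.get(args[0]));
            // Set the routing header directly instead of using an interceptor
            Event event = EventBuilder.withBody(body,
                    Collections.singletonMap("subpath", "address"));
            client.append(event);
        } finally {
            client.close();
        }
    }
}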
