The problem:
I have a CSV file. I want to read from it, and use one of the values in it based on the content of my flow file. My flow file will be XML. I want to read the key using EvaluateXPath into an attribute, then use that key to read the corresponding value from the CSV file and put that into a flow file attribute.
I tried following this:
https://community.hortonworks.com/questions/174144/lookuprecord-and-simplecsvfilelookupservice-in-nif.html
but found that requiring several controller services, including a CSV writer, is a bit more than I would think should be required to solve this.
Since you're working with attributes (and only one lookup value), you can skip the record-based stuff and just use LookupAttribute with a SimpleCsvFileLookupService.
The record-based components are for doing multiple lookups per record and/or lookups for each record in a FlowFile. In your case it looks like you have one "record" and you really just want to look up an attribute from another attribute for the entire FlowFile, so the above solution should be more straightforward and easier to configure.
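As a minimal configuration sketch, assuming a two-column CSV with hypothetical columns named key and value, and that EvaluateXPath has already placed the lookup key into an attribute called my.key:

SimpleCsvFileLookupService:
    CSV File: /path/to/lookup.csv
    Lookup Key Column: key
    Lookup Value Column: value

LookupAttribute:
    Lookup Service: SimpleCsvFileLookupService
    my.value: ${my.key}    (dynamic property: the property name is the attribute to set, the value is the key to look up)

Each flowfile then comes out of LookupAttribute with a my.value attribute holding the CSV value for its key.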
I am getting some numerical data from a URL via an API, and I am looking for a way to perform some mathematical operations on it in Apache NiFi before putting the data into a file directory. Thanks in advance.
By the way, I am using the InvokeHTTP processor to get the data and the PutFile processor to write the file somewhere. I searched some related websites but could not find a working approach.
Try using the QueryRecord processor and define Record Reader/Writer controller services to read/write the flowfile.
Add a new property to the QueryRecord processor containing an Apache Calcite SQL query with your mathematical operations on the flowfile.
The results of the SQL query will be added to the outgoing flowfile in your desired format.
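For example, a dynamic property (hypothetically named math; the property name becomes the outgoing relationship name) could hold a Calcite query like the one below, assuming the incoming records have numeric fields named price and quantity:

SELECT id,
       price * quantity AS total,
       price * 0.2      AS tax
FROM FLOWFILE

Flowfiles routed to the math relationship then carry the computed columns in whatever format the configured Record Writer produces.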
Ultimately the answer depends on whether the data you're working with is in the content of the FlowFile or in the attributes. If the data is small enough and it's only a couple operations, the suggested approach would be to work with the data as attributes and use NiFi's expression language to do the transformations.
There is a section on mathematical operations[1] in the Apache documentation[2]. The operations range from simple operators like plus/minus to exposing the java.lang.Math static methods.
[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#numbers
[2] https://nifi.apache.org/docs.html
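As a quick sketch of the attribute-based approach, an UpdateAttribute processor could derive a new attribute from a hypothetical numeric attribute named price like this:

total = ${price:toNumber():plus(10):multiply(2)}

Here toNumber, plus, and multiply are standard expression-language functions; the result lands in the total attribute of the outgoing flowfile.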
You can try ExecuteStreamCommand if you want to take in the whole file and then run operations on it. Alternatively, you can fiddle with the attributes on the flowfile, depending on how large your operation is.
For example, if you have some initial values, you can include them in the name of your file, extract them into flowfile attributes, run the operations on those attributes, and then append the results to the bottom of the original file.
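As a sketch of that filename trick, if a hypothetical file were named report_42.csv, you could pull the embedded value out of the standard filename attribute with expression language (getDelimitedField with _ as the delimiter) and then compute with it:

${filename:substringBeforeLast('.'):getDelimitedField(2, '_')}                        (yields 42)
${filename:substringBeforeLast('.'):getDelimitedField(2, '_'):toNumber():plus(8)}     (yields 50)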
My question is: if I run a test via JMeter, for example against a site which lets you book a flight, you choose your source and destination when you record it.
Is it possible to pass different values to the destination field? For instance, a txt file with some destinations could be passed to the JMeter test, so that you end up with several tests, each running with a different destination.
If yes, how can I do it?
It doesn't have to be a txt file; just some way to pass different values to one parameter.
Important: I'm using the BlazeMeter plugin for Chrome.
Thanks a lot.
You can use the CSV Data Set Config. It is very easy to use for parameterizing variables in the test plan.
Check this article on BlazeMeter to understand the CSV Data Set Config quickly.
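A minimal setup sketch, assuming a hypothetical file destinations.csv with one destination per line: add a CSV Data Set Config element under the Thread Group, configure it roughly as below, and replace the recorded destination value in the HTTP request with ${destination}:

Filename: destinations.csv
Variable Names (comma-delimited): destination
Recycle on EOF?: True
Stop thread on EOF?: False
Sharing mode: All threads

Each thread iteration then picks up the next line of the file as its destination.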
Depending on what you're trying to achieve you can go for:
HTML Link Parser. See the Poll Example, which shows how you can use it for selecting random values from inputs.
You can extract all the possible values from the previous response using a Post-Processor, most probably the CSS Selector Extractor, and configure each thread to use its own (or a random) value from the input.
And last but not least, you can use an external data source like a .txt or .csv file and utilize the __StringFromFile() function or the CSV Data Set Config, so that each thread (virtual user) reads the next value from the file instead of using the recorded hard-coded values.
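For instance, with a hypothetical destinations.txt containing one destination per line, this function call can be placed directly into the destination parameter of the recorded request; each invocation reads the next line from the file:

${__StringFromFile(destinations.txt)}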
Our requirement is to split the flow data based on a condition.
We thought to use the ExecuteStreamCommand processor for that (internally it would use a Java class), but it gives only a single flow data file. We would like to have two flow data files: one for the matched criteria and another for the unmatched.
I looked at the RouteText processor, but it has no feature for using a Java class as part of it.
Let me know if anyone has any suggestions.
I think you could use GetMongo to read those definition values and store them in a map accessed by DistributedMapCacheClientService, then use RouteOnContent to route the incoming flowfiles based on the absence/presence of the retrieved values.
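A rough sketch of the RouteOnContent side, assuming a hypothetical keyword pulled from those definition values; each dynamic property creates a relationship of the same name, and flowfiles whose content contains a match are routed there, while everything else goes to the built-in unmatched relationship:

Match Requirement: content must contain match
matched: some-keyword        (dynamic property: relationship name = matched, value = regex to find)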
If that doesn't work, you could instead route the query result from GetMongo to PutFile and then use ScanContent, which reads from a dictionary file on the file system and routes flowfiles based on the absence/presence of those keywords in the content.
Finally, if all else fails, you can use ExecuteScript to combine those steps into a single processor and route to matched/unmatched relationships. It processes Groovy code easily, so you can directly invoke your existing Java class if necessary.
I have a common process group that infers an Avro schema based on the file I supply. I want to set the Avro Record Name to a name corresponding to the filename I am supplying, so I used ${filename}. But InferAvroSchema raised an error saying the record name is empty. Note that before this, I had already set the "filename" attribute on the flowfile, and it has a value; I verified this using ReplaceText to check that ${filename} resolves to a value.
Unfortunately this looks like a bug in InferAvroSchema. Many of the properties support expression language, but then the processor doesn't evaluate them against the incoming flow file. So it ends up only being able to use a value typed directly into the property (non-EL), or a value from system or environment properties which doesn't really make sense for a lot of these properties.
I created this JIRA for the issue:
https://issues.apache.org/jira/browse/NIFI-2465
The fix is that all of the calls to evaluateAttributeExpressions() should be passing in a flow file like:
context.getProperty(CSV_HEADER_DEFINITION).evaluateAttributeExpressions(inputFlowFile).getValue()
I have a use case where I need to ingest files from a directory into HDFS. As a POC, I used simple directory spooling in Flume, where I specified the source, sink, and channel, and it works fine. The disadvantage is that I would have to maintain multiple directories for the multiple file types that go into distinct folders in order to get greater control over file sizes and other parameters, which makes the configuration repetitive, though easy.
As an alternative, I was advised to use regex interceptors where multiple files would reside in a single directory and based on a string in the file, would be routed to the specific directory in HDFS. The kind of files I am expecting are CSV files where the first line is the header and the subsequent lines are comma separated values.
With this in mind, I have a few questions.
How do interceptors handle files?
Given that the header line in the CSV would be like ID, Name followed in the next lines by IDs and Names, and another file in the same directory would have Name, Address followed in the next line by names and address, what would the interceptor and channel configuration look like for it to route it to different HDFS directories?
How does an interceptor handle the subsequent lines that clearly do not match the regex?
Would an entire file even constitute one event or is it possible that one file can actually be multiple events?
Please let me know. Thanks!
For starters, Flume doesn't work on files as such, but on things called events. Events are Avro structures which can contain anything: usually a line, but in your case it might be an entire file.
An interceptor gives you the ability to extract information from your event and add it to that event's headers. The headers can then be used to configure a target directory structure.
In your specific case, you would want to code a parser that analyses the content of your event and sets a header value, for instance a subpath:
if (line.contains("Address")) {
    event.getHeaders().put("subpath", "address");
} else if (line.contains("ID")) {
    event.getHeaders().put("subpath", "id");
}
You can then reference that in your hdfs-sink configuration as follows:
hdfs-a1.sinks.hdfs-sink.hdfs.path = hdfs://cluster/path/%{subpath}
As to your question whether an entire file can constitute one event: yes, that's possible, but not with the spooling directory source. You would have to implement a client class which speaks to a configured Avro source, pipe your file into an event, and send that off. You could then also set the headers there instead of using an interceptor.
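A minimal sketch of such a client using Flume's RpcClient API, assuming a hypothetical Avro source listening on localhost:41414; the whole file becomes the body of a single event, and the subpath header is set directly in the client:

import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FileToAvroSourceClient {
    public static void main(String[] args) throws Exception {
        // Connect to the Avro source configured in the Flume agent
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Read the whole file into the body of one event
            byte[] body = Files.readAllBytes(Paths.get(args[0]));
            Event event = EventBuilder.withBody(body);
            // Set the routing header here instead of in an interceptor
            event.getHeaders().put("subpath", "address");
            client.append(event);  // send the event to the agent
        } finally {
            client.close();
        }
    }
}

The agent's hdfs-sink can then use %{subpath} in its path exactly as shown above.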