InferAvroSchema Avro Record Name based on flow attribute - apache-nifi

I have a common process group that infers an Avro schema based on the file I supply. I want to set the Avro Record Name to a name corresponding to the filename I am supplying, so I used ${filename}. But InferAvroSchema raised an error saying the record name is empty. Note that before this, I already set the "filename" attribute on the flow file, and it has a value; I verified this using ReplaceText to check that ${filename} resolves.

Unfortunately, this looks like a bug in InferAvroSchema. Many of the properties support expression language, but the processor doesn't evaluate them against the incoming flow file. So it ends up only being able to use a value typed directly into the property (non-EL), or a value from system or environment properties, which doesn't really make sense for many of these properties.
I created this JIRA for the issue:
https://issues.apache.org/jira/browse/NIFI-2465
The fix is that all of the calls to evaluateAttributeExpressions() should be passing in a flow file like:
context.getProperty(CSV_HEADER_DEFINITION).evaluateAttributeExpressions(inputFlowFile).getValue()
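For illustration, a minimal sketch of the pattern in a processor's onTrigger (the CSV_HEADER_DEFINITION property and the inputFlowFile and header names follow the style of the processor's source; treat this as a sketch, not the exact patch):
// Broken: EL is evaluated against system/environment properties only
String header = context.getProperty(CSV_HEADER_DEFINITION).evaluateAttributeExpressions().getValue();
// Fixed: EL is evaluated against the incoming flow file's attributes
String header = context.getProperty(CSV_HEADER_DEFINITION).evaluateAttributeExpressions(inputFlowFile).getValue();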

Related

How to pass a dynamic filename to CSV Data Set Config in JMeter, when the dynamic name is generated while saving data from a previous request?

I have one HTTP request whose response comes as nested JSON, and using Groovy I am saving that data to different CSV files based on conditions.
The name of the CSV file is generated dynamically and saved in a variable using the vars.put() function:
vars.put("_cFileName",cFileName.toString());
When I try to use this variable in CSV Data Set Config, I get the following error message:
2022-01-19 16:58:39,370 ERROR o.a.j.t.JMeterThread: Test failed!
java.lang.IllegalArgumentException: File ${_cFileName} must exist and be readable
It is not converting the name to the actual file name. However, if the file name is not dynamic and the variable is defined under User Defined Variables in the Test Plan, it does resolve to the actual file name.
Is there any way to use a dynamic name created in a previous request's post-processor?
You cannot. As per JMeter Test Elements Execution Order, Configuration Elements are executed before everything else; CSV Data Set Config is a Configuration Element, hence it is initialized before any JMeter Variables are set.
The solution is to move to the __CSVRead() function. JMeter Functions are evaluated at the time they're called, so it's possible to provide a dynamic file name there. See the How to Pick Different CSV Files at JMeter Runtime guide for more details if needed.
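For illustration, a sketch of how the function calls might look in a sampler, assuming _cFileName holds the full path to a file that already exists by the time the sampler runs (the column index 0 is just an example):
${__CSVRead(${_cFileName},0)} reads column 0 of the current row
${__CSVRead(${_cFileName},next)} advances to the next row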

How to read from a CSV file

The problem:
I have a CSV file. I want to read from it, and use one of the values in it based on the content of my flow file. My flow file will be XML. I want to read the key using EvaluateXPath into an attribute, then use that key to read the corresponding value from the CSV file and put that into a flow file attribute.
I tried following this:
https://community.hortonworks.com/questions/174144/lookuprecord-and-simplecsvfilelookupservice-in-nif.html
but found that requiring several controller services, including a CSV writer, was a bit more than I would think is required to solve this.
Since you're working with attributes (and only one lookup value), you can skip the record-based stuff and just use LookupAttribute with a SimpleCsvFileLookupService.
The record-based components are for doing multiple lookups per record and/or lookups for each record in a FlowFile. In your case it looks like you have one "record" and you really just want to look up an attribute from another attribute for the entire FlowFile, so the above solution should be more straightforward and easier to configure.
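As an illustration, a minimal configuration sketch under that approach (the file path, column names, and the xpath.key/lookup.value attribute names are hypothetical; only the property names come from the standard components):
SimpleCsvFileLookupService
CSV File: /data/lookup.csv
Lookup Key Column: key
Lookup Value Column: value
LookupAttribute
Lookup Service: SimpleCsvFileLookupService
dynamic property: lookup.value = ${xpath.key} (the attribute previously set by EvaluateXPath; the looked-up value is written to lookup.value)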

merging of flow files in the specified order

I am new to NiFi (using version 1.8.0). I have a requirement to consume Kafka messages that each contain a vehicle position as lat,lon. Since each message arrives as a flow file, I need to merge all these flow files and produce a JSON file containing the complete path followed by the vehicle. I am using a ConsumeKafka processor to subscribe to messages, an UpdateAttribute processor (properties added: filename:${getStateValue("seq")} and seq:${getStateValue("seq"):plus(1)}) to add a sequence number as the filename (e.g. 1, 2, 3, etc.), and a PutFile processor to write these files to the specified directory. I have configured a FIFO priority queue on all the success relationships between the above-mentioned processors. Once I have received all the messages, I want to merge all the flow files. For this I know I have to use GetFile, EnforceOrder, MergeContent (merge strategy: Bin-Packing Algorithm, merge format: Binary Concatenation), and PutFile, respectively. Is my approach correct? How should I ensure that the files are merged in the sequence of their names, given that the filename is a sequence number? What should I put in the Order Attribute in the EnforceOrder processor? What should I put in Group Identifier? Are there more custom fields to be added to the EnforceOrder processor?
EnforceOrder processor documentation
1. Group Identifier
This property is evaluated against each flow file. For your case, use an UpdateAttribute processor to add a group_name attribute, then use that same ${group_name} attribute in the Group Identifier property value.
2. Order Attribute
Expression Language is not supported. You can use filename, or create a new attribute in the UpdateAttribute processor and use that same attribute name in your Order Attribute property value.
For reference/usage of the EnforceOrder processor, use this template and upload it to your NiFi instance.
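For illustration, one possible EnforceOrder configuration under the approach above (the group_name attribute and the use of seq as the order attribute are assumptions based on the question; Initial Order must match your first sequence number):
Group Identifier: ${group_name}
Order Attribute: seq
Initial Order: 1
Wait Timeout: 10 min
Note that the order attribute must hold an integer, which your incrementing seq value satisfies.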

How to determine the file type when a document is pulled from documentum

Which attribute can be used to pass the File Name while ingesting a document?
How to determine the file type when a document is pulled from Documentum using DFC API
Once a file is uploaded to Documentum, it "loses" its filename. A document is linked to a content object, which is in turn linked to the file itself on a filestore.
There are ways to get hints about the original filename and/or file extensions:
Find the content ID by looking at i_contents_id, and look at that object's set_file attribute. Normally, this string will contain the full path (including the filename) of the original file, but there are no guarantees.
If storage extensions are on (yes, they're on by default), you could use the following API command to get the file extension: getpath,c,<doc_id>
The document's a_content_type links to the name attribute of a dm_format object. Look at that object's dos_extension attribute to see the registered file extension for that given format (there is no guarantee that this was the original file extension, however).
As for which attribute should contain the filename, there is no clear answer. It's all up to the client. Normally, using object_name should suffice, or you could create a custom type with a custom attribute if the original filename is very important to you.
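For illustration, a sketch of the format lookup described above as two DQL queries (the r_object_id value and the 'pdf' format name are hypothetical):
select a_content_type from dm_document where r_object_id = '0900000180000001'
Supposing that returns pdf, look up the registered extension for that format:
select dos_extension from dm_format where name = 'pdf'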
Files in a Documentum repository don't need to have document names originating from the file that was uploaded from the file system.
When you export a document via the export action in a WDK application (e.g. Documentum Administrator or Webtop), the exported file will have a name based on the value placed in the object_name property of that specific object.
The file type of the content related to a specific document object in the repository is written in the a_content_type attribute. Values in this attribute are in internal Documentum notation, but the names are intuitive. Check this question for more info, or Google it.

Changing data source connection at runtime in an Informatica workflow

I have a mapping which I need to be able to run against multiple source schemas (having the same structure), one schema at a time. Given the number of schemas, I would rather not set up a session for each schema in order to specify a particular mapping connection, as that would require new sessions to be added as new schemas are added.
Is it possible to set up a workflow in such a way that the data source connection for a mapping within a session is defined (or passed in as a parameter of some sort) at run-time?
Configure the workflow or the session to use a parameter file.
In the session settings, change the 'hard-coded' source connection (option Use object) to Use Connection Variable and enter a variable name with a $DBConnection prefix (e.g. $DBConnectionSource01).
Create a parameter file in an appropriate location with the following contents:
[Global]
$DBConnectionSource01=connection_name
I believe what you are looking for is going to be solved using a parameter file and a bit of shell script (assuming your server is on some Unix flavour).
Set up your workflow to run with a parameter file. Declare a special parameter for the database connection (starting with $DBConnection) in the parameter file's global section. Change your session properties to use that parameter.
You need to create appropriate Relational connection objects for each of the source db/schemas.
Now, write a small shell script to dynamically change that parameter file, replacing the parameter's value with the new value you want.
A typical sequence of events at run time would look like this:
Whenever you want to run that workflow/session/mapping for a different source, launch the script with the appropriate parameter to effect the change in the Informatica parameter file.
The shell script runs to launch the job for the given DB source, changing the $DBConnection parameter value in the parameter file.
The script then launches the workflow through pmcmd using the parameter file.
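For illustration, a sketch of such a script (every path, name, and credential here is an assumption, and GNU sed is assumed for in-place editing; adapt to your environment):
#!/bin/sh
# usage: run_workflow.sh <connection_name>
CONN="$1"
PARAM_FILE=/opt/infa/params/wf_load.par
# Rewrite the connection parameter in the [Global] section
sed -i "s/^\$DBConnectionSource01=.*/\$DBConnectionSource01=${CONN}/" "$PARAM_FILE"
# Launch the workflow with the updated parameter file
pmcmd startworkflow -sv INT_SVC -d DOMAIN -u user -p password \
  -f MY_FOLDER -paramfile "$PARAM_FILE" wf_load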
