Which processor should I use to combine different JSON inputs into one object? - apache-nifi

I'd like to set up a dataflow that takes in multiple JSON inputs, combines them into a single JSON object with multiple properties (I'm currently using a few GenerateFlowFile processors to generate the inputs), and sends the data every 10 seconds via the PublishMQTT processor.
The inputs come in at different intervals (1-5 seconds); examples are:
{"temperature": 60}
{"pressure": 30}
I would like to compile the incoming data into one object, i.e. {"temperature": 60, "pressure": 30}, before sending it to the PublishMQTT processor.
Also, if fresh data with the same attribute comes in before the message is sent, it should update that attribute in the same object instead of being queued. I.e., if new data {"pressure": 150} arrives, the output object should be updated to {"temperature": 60, "pressure": 150} before it is sent out via MQTT.
I'm guessing that I will need a processor between the inputs and PublishMQTT, but I'm not sure which processor(s) can do what I've described.

There isn't really a provided processor that can do this because it requires some knowledge of your data. You would have to implement a custom processor, or use ExecuteScript.

You can use the Wait/Notify pattern to make flowfiles wait for one another, as in this example:
https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/comment-page-1/
Note that the link is only an example; your use case is different, and the flow must be adapted to your requirements.
Then you can merge the information into one flowfile and use Expression Language (EL) to generate the new JSON value.
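For illustration, here is a minimal Java sketch of the kind of stateful merge logic a custom processor (or an ExecuteScript script, which would be Groovy/Jython rather than Java) would perform. The class and method names are hypothetical, and it assumes Jackson for JSON handling; a real processor would also manage sessions, relationships, and timer-driven scheduling.

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical merge state: keep only the latest value per key, so a fresh
// {"pressure": 150} overwrites the old pressure instead of being queued.
public class LatestValueMerger {
    private final Map<String, Object> latest = new ConcurrentHashMap<>();
    private final ObjectMapper mapper = new ObjectMapper();

    // Called for each incoming flowfile's JSON content, e.g. {"temperature": 60}
    @SuppressWarnings("unchecked")
    public void accept(String json) throws Exception {
        latest.putAll(mapper.readValue(json, Map.class));
    }

    // Called on the 10-second trigger to produce the combined object for PublishMQTT
    public String snapshot() throws Exception {
        return mapper.writeValueAsString(latest);
    }
}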

Related

NiFi Flow for Record Enrichment

I am using NiFi 1.11.4 to build a data pipeline where an IoT device sends data in JSON format. Each time I receive data from the IoT device, I receive two JSONs:
JSON_INITIAL
{
  "devId": "abc",
  "devValue": "TWOINITIALCHARS23"
}
and
JSON_FINAL
{
  "devId": "abc",
  "devValue": "TWOINITIALCHARS45"
}
There is a time difference of a few milliseconds between receiving these two flowfiles. In my use case, I need to merge the JSONs in such a way that my resultant JSON looks like the below (please note the removal of TWOINITIALCHARS in both cases):
JSON_RESULT_AFTER_MERGE
{
  "devId": "abc",
  "devValue": "2345"
}
Is this something NiFi should be dealing with? If yes, would really appreciate an approach to design relevant flow for this use case.
Assuming the devId is static for a device and not used for the correlation (i.e. abc for all messages coming from this device, not abc for the first two and then def for the next two, etc.), you have a few options:
Use MergeContent to concatenate the flowfile contents (the two JSON blocks) and ReplaceText to modify the combined content to match the desired output. This will require tuning the MergeContent binning properties to limit the merge window to 1-2 seconds (difficult or insufficient if you're receiving multiple messages per second, for example) and using regular expressions to remove the duplicate content.
Use a custom script to interact with the device JSON output (Groovy, for example, will make the JSON interaction pretty simple; see the sketch after this list):
If you do this within the context of NiFi (via ExecuteScript or InvokeScriptedProcessor), you will have access to the NiFi framework, so you can evaluate flowfile attributes and content, making this easier (there will be attributes for initial timestamp, etc.).
If you do this outside the context of NiFi (via ExecuteProcess or ExecuteStreamCommand), you won't have access to the NiFi framework (attributes, etc.) but you may have better interaction with the device directly.
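For option 2, here is a rough Java sketch of the merge logic (a Groovy script in ExecuteScript would look very similar); the constant prefix, class, and method names are assumptions based on the example data above:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class DevValueMerger {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private static final String PREFIX = "TWOINITIALCHARS"; // assumed fixed prefix

    // Merge the initial and final JSONs for one device into the result shape:
    // {"devId": "abc", "devValue": "2345"}
    public static String merge(String initialJson, String finalJson) throws Exception {
        JsonNode first = MAPPER.readTree(initialJson);
        JsonNode second = MAPPER.readTree(finalJson);
        ObjectNode out = MAPPER.createObjectNode();
        out.put("devId", first.get("devId").asText());
        out.put("devValue", strip(first.get("devValue").asText())
                          + strip(second.get("devValue").asText()));
        return MAPPER.writeValueAsString(out);
    }

    private static String strip(String value) {
        return value.startsWith(PREFIX) ? value.substring(PREFIX.length()) : value;
    }
}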

In NiFi, does usage of the EvaluateJsonPath processor have a performance impact because of attribute creation?

I'm trying to integrate the NiFi REST APIs with my application. By mapping inputs and outputs from my application, I am calling the NiFi REST API to create flows. In most of my use cases I will extract JSON values and apply Expression Language functions to them.
So, to simplify all the use cases, I am using the EvaluateJsonPath processor to fetch all the values into attributes via JSONPath, and then applying Expression Language functions on those attributes in a downstream processor.
Is this the right approach? For JSON-to-JSON manipulation with about 30 keys it is the simplest way, and since I am integrating the NiFi REST APIs with my application, I cannot generate JOLT transformation logic dynamically from the user's mapping.
In this case, does usage of the EvaluateJsonPath processor create any performance issues across about 50 use cases with different transformation logic? The documentation suggests that heavy attribute usage causes performance (memory) issues.
Your concern about having too many attributes in memory should not be an issue here; 30 attributes per flowfile is higher than usual, but if these are all strings of up to a few hundred characters, the impact should be minimal. If you start extracting kilobytes of data from the flowfile content into attributes on each flowfile, you will see increased heap usage, but the framework should still handle this until you reach very high throughput (thousands of flowfiles per second on commodity hardware like a modern laptop).
You may want to investigate ReplaceTextWithMapping, as that processor can load from a definition file and handle many replace operations using a single processor.
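As far as I recall, the mapping file ReplaceTextWithMapping loads is a plain text file with one whitespace-separated pair (search value, then replacement) per line; the entries below are invented for illustration:

oldFieldName	newFieldName
statusCode1	SUCCESS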
It is usually a flow design "smell" to have multiple copies of the same flow process with different configuration values (with the occasional exception of database interaction). Rather, see if there is a way you can genericize the process and populate the relevant values for each flowfile using variable population (from the incoming flowfile attributes, the variable registry, environment variables, etc.).

Adding a global store for a transformer to consume

Is there a way to add a global store for a Transformer to use? The docs for Transformer say:
"Transform each record of the input stream into zero or more records in the output stream (both key and value type can be altered arbitrarily). A Transformer (provided by the given TransformerSupplier) is applied to each input record and computes zero or more output records. In order to assign a state, the state must be created and registered beforehand via stores added via addStateStore or addGlobalStore before they can be connected to the Transformer"
Yet the API for addGlobalStore takes a ProcessorSupplier:
addGlobalStore(storeBuilder: StoreBuilder[_ <: StateStore],
topic: String,
consumed: Consumed[_, _],
stateUpdateSupplier: ProcessorSupplier[_, _])
My end goal is to use the Kafka Streams DSL with a Transformer, since I need a flatMap and to transform both keys and values on the way to my output topic. I do not have a Processor in my topology, though.
I would expect something like this:
addGlobalStore(storeBuilder: StoreBuilder[_ <: StateStore], topic: String, consumed: Consumed[_, _], stateUpdateSupplier: TransformerSupplier[_, _])
The Processor that is passed into addGlobalStore() is used to maintain (i.e., write) the store. Note that it's currently expected that this Processor copies the data as-is into the store (cf. https://issues.apache.org/jira/browse/KAFKA-7663).
After you have added a global store, you can also add a Transformer, and the Transformer can access the store. Note that it's not required to connect a global store to make it available (only "regular" stores need to be connected). Also note that a Transformer only gets read access to global stores.
Whenever there is a use case for looking up data from a GlobalStateStore, use a Processor instead of a Transformer for the transformations you want to perform on the input topic. Use context.forward(key, value, childName) to send the data to downstream nodes; it may be called multiple times within process() and punctuate() to send multiple records downstream. If there is a requirement to update the GlobalStateStore, do it only in the Processor passed to addGlobalStore(...), because there is a GlobalStreamThread associated with the GlobalStateStore which keeps the state of the store consistent across all running Kafka Streams instances.
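To make both answers concrete, here is a minimal Java sketch (the Scala API mirrors it): the Processor passed to addGlobalStore() only copies records into the store, and a Transformer used via flatTransform() gets read access to it. The store name, topics, serdes, and lookup logic are assumptions.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;
import java.util.Collections;

public class GlobalStoreExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Global stores must have changelogging disabled; the Processor below
        // just copies records as-is into the store (cf. KAFKA-7663).
        builder.addGlobalStore(
            Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore("lookup-store"),   // assumed store name
                Serdes.String(), Serdes.String()).withLoggingDisabled(),
            "lookup-topic",                                     // assumed topic
            Consumed.with(Serdes.String(), Serdes.String()),
            () -> new Processor<String, String>() {
                private KeyValueStore<String, String> store;
                @Override public void init(ProcessorContext context) {
                    store = (KeyValueStore<String, String>) context.getStateStore("lookup-store");
                }
                @Override public void process(String key, String value) {
                    store.put(key, value); // maintain the store, nothing else
                }
                @Override public void close() {}
            });

        // The Transformer gets read access to the global store without
        // connecting it explicitly; flatTransform covers the flatMap need.
        builder.<String, String>stream("input-topic")
            .flatTransform(() -> new Transformer<String, String, Iterable<KeyValue<String, String>>>() {
                private KeyValueStore<String, String> store;
                @Override public void init(ProcessorContext context) {
                    store = (KeyValueStore<String, String>) context.getStateStore("lookup-store");
                }
                @Override public Iterable<KeyValue<String, String>> transform(String key, String value) {
                    String enriched = store.get(key); // read-only lookup
                    return enriched == null
                        ? Collections.emptyList()
                        : Collections.singletonList(KeyValue.pair(enriched, value));
                }
                @Override public void close() {}
            })
            .to("output-topic");
    }
}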

Separate data received from various senders in Veins/OMNeT++

In my scenario, a receiving vehicle gets BSMs from multiple senders. I need the BSM data recorded separately according to their respective senders.
Currently, I am achieving this using a custom logging system. However, since OMNeT++ has a sophisticated logging system built in, is it possible to achieve what I need using OMNeT++'s built-in tools?
OMNeT++ vectors log 2-tuples (TV: time+value) or 3-tuples (ETV: event+time+value) for each piece of data. You can use this additional information to find which values have been recorded at the same simulation time or as a consequence of the same event.
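For illustration, here is a made-up excerpt from an OMNeT++ .vec results file: the declaration line ties a vector ID to the recording module and vector name, and each ETV data line holds the event number, simulation time, and value (module and vector names are invented):

vector 1  Scenario.node[5].appl  bsmSpeed:vector  ETV
1	1024	12.300512	13.89
1	1051	12.800731	14.02

The event and time columns let you group values that were recorded at the same event or simulation time.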

NiFi FetchFile processor doesn't allow dynamic attributes

What is the reason that some NiFi processors don't allow dynamic attributes? I'm using the FetchFile processor in one of my workflows, and I need to pass some data through the flow to be able to use it in the last step. However, FetchFile breaks this by not allowing dynamic attributes. Is there another way to do it? Why would NiFi not allow dynamic attributes on certain processors?
My flow is something like
ExecuteScript -> EvaluateJson -> Custom Processor to write files -> FetchFile -> Send to S3 -> Mark workflow complete
I want to send some metadata along so that I can mark the workflow complete. I'm passing that data as attributes, but it breaks at FetchFile.
There are two separate concepts: user-defined properties on processors, and flowfile attributes.
User-defined properties let a processor take input from a user for something that couldn't be defined ahead of time. Examples of this are in EvaluateJsonPath when the JSON paths are specified in user-defined properties, or in PutSolrContentStream when all the user-defined properties get passed as query parameters to Solr.
FlowFile attributes are a map of key/value pairs that get passed around with each piece of data. These attributes are usually created when a processor produces or modifies a flow file, or can be manipulated using processors like UpdateAttribute.
It is up to each processor to decide whether it needs user-defined properties and how they would be used. UpdateAttribute happens to be a processor where the user-defined properties are added as new key/value pairs to each flow file, but it doesn't make sense for every processor to do that.
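As a concrete (hypothetical) illustration with EvaluateJsonPath:

user-defined property (processor configuration):  temperature = $.temperature
incoming flowfile content:                        {"temperature": 60}
resulting flowfile attribute:                     temperature = 60

The property is part of the processor's static configuration, while the attribute is created per flowfile and travels with the data.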
