Disclaimer: I know absolutely nothing about NiFi.
I need to receive messages from the ListenHTTP processor, and then convert each message into a timestamped json message.
So, say I receive the message hello world at 5 am. It should transform it into {"timestamp": "5 am", "message":"hello world"}.
How do I do that?
Each flowfile has attributes, which are pieces of metadata stored as key/value pairs in memory (available for rapid read/write). When any operation occurs, the NiFi framework writes pieces of metadata both to the provenance events related to the flowfile and sometimes to the flowfile itself. For example, if ListenHTTP is the first processor in the flow, any flowfile that enters the flow will have an attribute entryDate whose value is the time it originated, in the format Thu Jan 24 15:53:52 PST 2019. You can read and write these attributes with a variety of processors (e.g. UpdateAttribute, RouteOnAttribute, etc.).
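For instance (a hypothetical illustration, not part of the flow described below), an UpdateAttribute processor with a dynamic property named my.timestamp set to ${now():format('YYYY-MM-dd HH:mm:ss')} would add that attribute to every flowfile passing through, without touching the content.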
For your use case, you could use a ReplaceText processor immediately following the ListenHTTP processor, with a Search Value of (?s)(^.*$) (the entire flowfile content, or "what you received via the HTTP call") and a Replacement Value of {"timestamp_now":"${now():format('YYYY-MM-dd HH:mm:ss.SSS Z')}", "timestamp_ed": "${entryDate:format('YYYY-MM-dd HH:mm:ss.SSS Z')}", "message":"$1"}.
The example above provides two options:
The entryDate is when the flowfile came into existence via the ListenHTTP processor
The now() function gets the current timestamp in milliseconds since the epoch
Those two values can differ slightly based on performance/queuing/etc. In my simple example, they were 2 milliseconds apart. You can format them using the format() method and the normal Java time format syntax, so you could get "5 am" for example by using h a (full example: now():format('h a'):toLower()).
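For the exact shape asked about in the question, a replacement value along the lines of {"timestamp":"${now():format('h a'):toLower()}", "message":"$1"} should yield {"timestamp": "5 am", "message": "hello world"} for a message received at 5 am (an untested sketch; adjust the format pattern as needed).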
Example
ListenHTTP running on port 9999 with path contentListener
ReplaceText as above
LogAttribute with Log Payload set to true
Curl command: curl -d "helloworld" -X POST http://localhost:9999/contentListener
Example output:
2019-01-24 16:04:44,529 INFO [Timer-Driven Process Thread-6] o.a.n.processors.standard.LogAttribute LogAttribute[id=8246b0a0-0168-1000-7254-2c2e43d136a7] logging for flow file StandardFlowFileRecord[uuid=5e1c6d12-298d-4d9c-9fcb-108c208580fa,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1548374015429-1, container=default, section=1], offset=3424, length=122],offset=0,name=5e1c6d12-298d-4d9c-9fcb-108c208580fa,size=122]
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Thu Jan 24 16:04:44 PST 2019'
Key: 'lineageStartDate'
Value: 'Thu Jan 24 16:04:44 PST 2019'
Key: 'fileSize'
Value: '122'
FlowFile Attribute Map Content
Key: 'filename'
Value: '5e1c6d12-298d-4d9c-9fcb-108c208580fa'
Key: 'path'
Value: './'
Key: 'restlistener.remote.source.host'
Value: '127.0.0.1'
Key: 'restlistener.remote.user.dn'
Value: 'none'
Key: 'restlistener.request.uri'
Value: '/contentListener'
Key: 'uuid'
Value: '5e1c6d12-298d-4d9c-9fcb-108c208580fa'
--------------------------------------------------
{"timestamp_now":"2019-01-24 16:04:44.518 -0800", "timestamp_ed": "2019-01-24 16:04:44.516 -0800", "message":"helloworld"}
So, I added an ExecuteScript processor with this code:
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import java.time.LocalDateTime

flowFile = session.get()
if (!flowFile) return

def text = ''
// Cast a closure with an inputStream parameter to InputStreamCallback to read the content
session.read(flowFile, { inputStream ->
    text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
} as InputStreamCallback)

// Build the JSON payload (note: the incoming text is not JSON-escaped here)
def outputMessage = '{"timestamp":"' + LocalDateTime.now().toString() + '", "message":"' + text + '"}'

// Overwrite the flowfile content with the JSON payload
flowFile = session.write(flowFile, { inputStream, outputStream ->
    outputStream.write(outputMessage.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
and it worked.
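One caveat with the script above: because the JSON is assembled by string concatenation, a message containing quotes or newlines would produce invalid JSON. A safer sketch of the same idea (untested), using Groovy's built-in groovy.json.JsonOutput to handle the escaping:

import groovy.json.JsonOutput
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import java.time.LocalDateTime

flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Read the incoming content, then replace it with a properly escaped JSON document
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def json = JsonOutput.toJson([timestamp: LocalDateTime.now().toString(), message: text])
    outputStream.write(json.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)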
Related
Consider the following flow which authenticates via HTTP to a service. I'm seeing an HTTP status code of 201 (created) come back, which should trigger the response relationship/flow. However as you can see in the log below, only the original relationship is triggered.
The Flow (screenshot): green lines indicate the "response" flow, magenta the "original" flow.
POST /token properties (screenshot)
Log
You can see here that the original relationship is triggered, but the response is not -- even though the status code, 201, is in the "success" range.
2023-01-29 15:22:08,341 INFO [Timer-Driven Process Thread-7] o.a.n.processors.standard.LogAttribute LogAttribute[id=fe0ace38-0185-1000-376d-8737d0e020f8] logging for flow file StandardFlowFileRecord[uuid=6b9f010a-f287-449c-8bef-94840c5cfa2f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1674862641879-1, container=default, section=1], offset=13494, length=107],offset=0,name=6b9f010a-f287-449c-8bef-94840c5cfa2f,size=107]
---------------------ORIGINAL---------------------
FlowFile Properties
Key: 'entryDate'
Value: 'Sun Jan 29 15:22:07 UTC 2023'
Key: 'lineageStartDate'
Value: 'Sun Jan 29 15:22:07 UTC 2023'
Key: 'fileSize'
Value: '107'
FlowFile Attribute Map Content
Key: 'filename'
Value: '6b9f010a-f287-449c-8bef-94840c5cfa2f'
Key: 'invokehttp.request.duration'
Value: '738'
Key: 'invokehttp.request.url'
Value: '...'
Key: 'invokehttp.response.url'
Value: '...'
Key: 'invokehttp.status.code'
Value: '201'
Key: 'invokehttp.status.message'
Value: ''
Key: 'invokehttp.tx.id'
Value: 'efca13ac-16a1-4a27-a8e1-d04110d48523'
Key: 'mime.type'
Value: 'application/json'
Key: 'path'
Value: './'
Key: 'responseBody'
Value: '...'
Key: 'uuid'
Value: '6b9f010a-f287-449c-8bef-94840c5cfa2f'
---------------------ORIGINAL---------------------
The only thing I thought of which might be causing an issue is that I'm writing the response body to an attribute. I tried to test by setting this attribute name to an empty string, but that just gives me an error in the log. I assumed that without the attribute name set, the response body would be the FlowFile sent to the response relationship, but that doesn't seem to be working.
Update: I created a second InvokeHTTP processor and replaced the relationships / disabled the old one. The flow worked correctly until I set the Response Body Attribute Name, and then the response relationship stopped triggering. I need to set this attribute though, so I can extract the error message from the response in the case of failure. I think I'll have to enable the Response Generation Required option, and check the status code in the response relationship flow. This is not ideal, though.
When you use Response Body Attribute Name, only the original relationship is triggered. That is InvokeHTTP's documented behaviour; see the property description:
FlowFile attribute name used to write an HTTP response body for FlowFiles transferred to the Original relationship.
You can handle your case this way:
InvokeHTTP (original relationship) -> RouteOnAttribute with a Success property of ${invokehttp.status.code:ge(200):and(${invokehttp.status.code:le(299)})}
When you set the Response Body Attribute Name property, it means that you don't want a new flowfile; you just want to add a new attribute to the existing flowfile.
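Downstream of that RouteOnAttribute, the response body is still available as whatever attribute name you configured (in the flow above it appears as responseBody), so on the failure branch you can pull the error message out of ${responseBody}.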
I am using Benthos to read AVRO-encoded messages from Kafka which have the kafka_key metadata field set to also contain an AVRO-encoded payload. The schemas of these AVRO-encoded payloads are stored in Schema Registry and Benthos has a schema_registry_decode processor for decoding them. I'm looking to produce an output JSON message for each Kafka message containing two fields, one called content containing the decoded AVRO message and the other one called metadata containing the various metadata fields collected by Benthos including the decoded kafka_key payload.
It turns out that one can achieve this using a branch processor like so:
input:
  kafka:
    addresses:
      - localhost:9092
    consumer_group: benthos_consumer_group
    topics:
      - benthos_input
pipeline:
  processors:
    # Decode the message
    - schema_registry_decode:
        url: http://localhost:8081
    # Populate output content field
    - bloblang: |
        root.content = this
    # Decode kafka_key metadata payload and populate output metadata field
    - branch:
        request_map: |
          root = meta("kafka_key")
        processors:
          - schema_registry_decode:
              url: http://localhost:8081
        result_map: |
          root.metadata = meta()
          root.metadata.kafka_key = this
output:
  stdout: {}
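For reference (an assumed illustration rather than actual output), each message produced by this pipeline should come out shaped roughly like {"content": { ...decoded Kafka message... }, "metadata": { ...collected metadata fields..., "kafka_key": { ...decoded key payload... } }}: root.content takes the decoded document, meta() copies the metadata map into metadata, and the branch overwrites metadata.kafka_key with the decoded key.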
I am going to convert log file events (recorded by the LogAttribute processor) to JSON.
I am using ExtractGrok. The STACK pattern in my pattern file is (?m).*
Each log has this format:
2019-11-21 15:26:06,912 INFO [Timer-Driven Process Thread-4] org.apache.nifi.processors.standard.LogAttribute LogAttribute[id=143515f8-1f1d-1032-e7d2-8c07f50d1c5a] logging for flow file StandardFlowFileRecord[uuid=02eb9f21-4587-458b-8cee-ad052cb8e634,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1574339166853-1, container=default, section=1], offset=0, length=0],offset=0,name=0df20cc1-3f93-49df-81b1-dac18318ccd9,size=0]
------------- request was received----------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Thu Nov 21 15:26:06 AST 2019'
Key: 'lineageStartDate'
Value: 'Thu Nov 21 15:26:06 AST 2019'
Key: 'fileSize'
Value: '0'
FlowFile Attribute Map Content
Key: 'filename'
Value: '0df20cc1-3f93-49df-81b1-dac18318ccd9'
Key: 'http.context.identifier'
Value: '9552bd22-ec3b-4ada-93a9-a5ce9b27de25'
Key: 'path'
Value: './'
Key: 'uuid'
Value: '02eb9f21-4587-458b-8cee-ad052cb8e634'
-------------- request was received----------
I expect the rest of the message after the first line to be saved as well, but I get only the first line:
-------------- request was received----------
I checked the expression in the Grok Debugger and it works, but it doesn't work in NiFi.
How do I configure ExtractGrok to capture all the lines in the log value?
I found the solution: replacing (?m).* with (?s).* works. The (?s) flag (DOTALL) makes . match newline characters as well, so the pattern captures the whole multi-line message, whereas (?m) only changes how ^ and $ anchor.
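For context, a minimal hypothetical configuration: with the pattern file containing STACK (?s).* and a Grok Expression such as %{STACK:log} (the field name log is only an illustration), ExtractGrok should capture the entire multi-line message under that field, either as an attribute or as JSON content depending on the Destination property.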
I have YAML data like the input below and I need the output as key/value pairs.
Input
a="""
--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
code:
- '716'
- '718'
id:
- 488
- 499
"""
Output needed
{'code': ['716', '718'], 'id': [488, 499]}
The default constructor was giving me an error. I tried adding a new constructor and now it's not giving me an error, but I am not able to get key/value pairs.
FYI, if I remove the !ruby/hash:ActiveSupport::HashWithIndifferentAccess tag from my YAML then it gives me the desired output.
def new_constructor(loader, tag_suffix, node):
    if type(node.value) == 'list':
        val = ''.join(node.value)
    else:
        val = node.value
    val = node.value
    ret_val = """
    {0}
    """.format(val)
    return ret_val

yaml.add_multi_constructor('', new_constructor)
yaml.load(a)
output
"\n [(ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'code'), SequenceNode(tag=u'tag:yaml.org,2002:seq', value=[ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'716'), ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'718')])), (ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'id'), SequenceNode(tag=u'tag:yaml.org,2002:seq', value=[ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'488'), ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'499')]))]\n "
Please suggest.
This is not a solution using PyYAML, but I recommend using ruamel.yaml instead. If for no other reason, it's more actively maintained than PyYAML. A quote from the overview:
Many of the bugs filed against PyYAML, but that were never acted upon, have been fixed in ruamel.yaml
To load that string, you can do
import ruamel.yaml
parser = ruamel.yaml.YAML()
obj = parser.load(a) # as defined above.
I strongly recommend following @Andrew F's answer, but in case you wonder why your code did not get the proper result: you don't correctly process the node under the tag in your tag handling.
Although the node's value is a list (of tuples with key/value pairs), you should test for the type of the node itself (using isinstance) and then hand it over to the "normal" mapping processing routine, as the tag is on a mapping:
import yaml
from yaml.loader import SafeLoader

a = """\
--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
code:
- '716'
- '718'
id:
- 488
- 499
"""

def new_constructor(loader, tag_suffix, node):
    if isinstance(node, yaml.nodes.MappingNode):
        return loader.construct_mapping(node, deep=True)
    raise NotImplementedError

yaml.add_multi_constructor('', new_constructor, Loader=SafeLoader)
data = yaml.load(a, Loader=SafeLoader)
print(data)
which gives:
{'code': ['716', '718'], 'id': [488, 499]}
You should not use PyYAML's yaml.load(); it is documented to be potentially unsafe and, above all, it is not necessary here. Just add the new constructor to the SafeLoader.
I have spent several hours now trying to figure out the expression language to get hold of the flowfile content.
I have a simple test flow, to try and learn NiFi, where I have:
GetMongo -> LogAttribute -> PutSlack
-----------------------LOG1-----------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Wed Sep 28 23:58:36 GMT 2016'
Key: 'lineageStartDate'
Value: 'Wed Sep 28 23:58:36 GMT 2016'
Key: 'fileSize'
Value: '70'
FlowFile Attribute Map Content
Key: 'filename'
Value: '43546945658800'
Key: 'path'
Value: './'
Key: 'uuid'
Value: 'd1e10623-0e90-44af-a620-6bed9776ed62'
-----------------------LOG1-----------------------
{ "_id" : { "$oid" : "57ec27ec35a0759d54fb465d" }, "keyA" : "valueA" }
In the PutSlack expression, as a test, I have tried:
${flowfile.content}
${message}
${payload}
${msg}
${flowfile-content}
${content}
There is no expression language construct that accesses the content of the flow file. The attributes and the content are purposely stored very differently in order to facilitate moving around a flow file that could represent a large payload; expression language operates on attributes only.
The ExtractText processor can be used to extract the whole content of the flow file into an attribute; just keep in mind that this should only be done when you know the content will have no problem fitting in memory.
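A hedged sketch of that approach for the flow above: add a dynamic property to ExtractText, for example one named content with the value (?s)(^.*$) (the same "match everything" pattern used in the ReplaceText example earlier on this page), route the matched relationship to PutSlack, and reference the captured text there as ${content} (or ${content.1} for the explicit capture group). The property name content is only an illustration.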