I want to convert log file events (recorded by the LogAttribute processor) to JSON.
I am using ExtractGrok with this configuration:
The STACK pattern in the pattern file is (?m).*
Each log has this format:
2019-11-21 15:26:06,912 INFO [Timer-Driven Process Thread-4] org.apache.nifi.processors.standard.LogAttribute LogAttribute[id=143515f8-1f1d-1032-e7d2-8c07f50d1c5a] logging for flow file StandardFlowFileRecord[uuid=02eb9f21-4587-458b-8cee-ad052cb8e634,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1574339166853-1, container=default, section=1], offset=0, length=0],offset=0,name=0df20cc1-3f93-49df-81b1-dac18318ccd9,size=0]
------------- request was received----------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Thu Nov 21 15:26:06 AST 2019'
Key: 'lineageStartDate'
Value: 'Thu Nov 21 15:26:06 AST 2019'
Key: 'fileSize'
Value: '0'
FlowFile Attribute Map Content
Key: 'filename'
Value: '0df20cc1-3f93-49df-81b1-dac18318ccd9'
Key: 'http.context.identifier'
Value: '9552bd22-ec3b-4ada-93a9-a5ce9b27de25'
Key: 'path'
Value: './'
Key: 'uuid'
Value: '02eb9f21-4587-458b-8cee-ad052cb8e634'
-------------- request was received----------
I expect the rest of the message after the first line to be saved in the log value, but I get only the first line:
-------------- request was received----------
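I checked the expression in the Grok Debugger and it works there, but it doesn't work in NiFi.
How do I configure ExtractGrok to get all lines in the log value?
I found the solution: I replaced (?m).* with (?s).* and it works. The (?m) flag only changes where ^ and $ match, whereas (?s) lets . match line terminators, so .* continues past the first line.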
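The difference between the two flags can be reproduced outside NiFi. The sketch below uses Python's re module purely as an illustration (NiFi itself uses Java regular expressions, where (?m) and (?s) have the same meaning); the sample text is a shortened stand-in for the log entry above, not real ExtractGrok output.

```python
import re

# Shortened stand-in for a multi-line LogAttribute entry (assumed sample data).
log_body = (
    "-------------- request was received----------\n"
    "Standard FlowFile Attributes\n"
    "Key: 'entryDate'\n"
    "Value: 'Thu Nov 21 15:26:06 AST 2019'"
)

# (?m) only changes where ^ and $ match; "." still stops at a newline.
multiline_match = re.match(r"(?m).*", log_body).group(0)

# (?s) lets "." match newlines too, so ".*" runs to the end of the text.
dotall_match = re.match(r"(?s).*", log_body).group(0)

print(multiline_match)
print(dotall_match == log_body)
```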
Related
Consider the following flow which authenticates via HTTP to a service. I'm seeing an HTTP status code of 201 (created) come back, which should trigger the response relationship/flow. However as you can see in the log below, only the original relationship is triggered.
The Flow
Green lines indicate "response" flow. Magenta indicates "original" flow.
POST /token properties
Log
You can see here that the original relationship is triggered, but the response is not -- even though the status code, 201, is in the "success" range.
2023-01-29 15:22:08,341 INFO [Timer-Driven Process Thread-7] o.a.n.processors.standard.LogAttribute LogAttribute[id=fe0ace38-0185-1000-376d-8737d0e020f8] logging for flow file StandardFlowFileRecord[uuid=6b9f010a-f287-449c-8bef-94840c5cfa2f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1674862641879-1, container=default, section=1], offset=13494, length=107],offset=0,name=6b9f010a-f287-449c-8bef-94840c5cfa2f,size=107]
---------------------ORIGINAL---------------------
FlowFile Properties
Key: 'entryDate'
Value: 'Sun Jan 29 15:22:07 UTC 2023'
Key: 'lineageStartDate'
Value: 'Sun Jan 29 15:22:07 UTC 2023'
Key: 'fileSize'
Value: '107'
FlowFile Attribute Map Content
Key: 'filename'
Value: '6b9f010a-f287-449c-8bef-94840c5cfa2f'
Key: 'invokehttp.request.duration'
Value: '738'
Key: 'invokehttp.request.url'
Value: '...'
Key: 'invokehttp.response.url'
Value: '...'
Key: 'invokehttp.status.code'
Value: '201'
Key: 'invokehttp.status.message'
Value: ''
Key: 'invokehttp.tx.id'
Value: 'efca13ac-16a1-4a27-a8e1-d04110d48523'
Key: 'mime.type'
Value: 'application/json'
Key: 'path'
Value: './'
Key: 'responseBody'
Value: '...'
Key: 'uuid'
Value: '6b9f010a-f287-449c-8bef-94840c5cfa2f'
---------------------ORIGINAL---------------------
The only thing I thought of which might be causing an issue is that I'm writing the response body to an attribute. I tried to test this by setting the attribute name to an empty string, but that just gives me an error in the log. I assumed that without the attribute name set, the response body would be the FlowFile sent to the response relationship, but that doesn't seem to be working.
Update: I created a second InvokeHTTP processor and replaced the relationships / disabled the old one. The flow worked correctly until I set the Response Body Attribute Name, and then the response relationship stopped triggering. I need to set this attribute though, so I can extract the error message from the response in the case of failure. I think I'll have to enable the Response Generation Required option, and check the status code in the response relationship flow. This is not ideal, though.
When you use Response Body Attribute Name, only the original route is triggered. This is InvokeHTTP's documented behaviour; the documentation describes the property as:
FlowFile attribute name used to write an HTTP response body for FlowFiles transferred to the Original relationship.
You can work around this as follows:
InvokeHTTP (original route) -> RouteOnAttribute (Success: ${invokehttp.status.code:ge(200):and(${invokehttp.status.code:le(299)})})
Setting Response Body Attribute Name means that you don't want a new FlowFile; you just want a new attribute added to the existing FlowFile.
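The routing condition is just a range check on the invokehttp.status.code attribute. As a rough illustration only (plain Python, not NiFi code; the function name is_success is made up for this sketch), the same test looks like this, keeping in mind that FlowFile attribute values are strings:

```python
def is_success(attributes: dict) -> bool:
    """Return True when invokehttp.status.code is in the 2xx range."""
    raw_code = attributes.get("invokehttp.status.code", "")
    # Attribute values are strings, so validate before converting.
    if not raw_code.isdigit():
        return False
    return 200 <= int(raw_code) <= 299


# Attributes of the FlowFile routed to "original" (values taken from the log above).
original_attributes = {"invokehttp.status.code": "201", "responseBody": "..."}
print(is_success(original_attributes))
```

On the failure branch the responseBody attribute is still available, so the error message can be read from the same FlowFile.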
Is it possible to transfer a certain value of a log line to another log line inside of the Logstash config?
I need the RoundtripDuration value from the CLIENT line added to the REQUEST line.
Current state:
CLIENT : Session: 97B31BBDBB793E62107178C911683FB4, RoundtripDuration: 185ms, ClientUpdate: 3 ms
REQUEST: Session: 97B31BBDBB793E62107178C911683FB4, Total: 163ms, Invoke: 126ms (#{d.d_9.GenericDialogUI}
Target state:
CLIENT : Session: 97B31BBDBB793E62107178C911683FB4, RoundtripDuration: 185ms, ClientUpdate: 3 ms
REQUEST: Session: 97B31BBDBB793E62107178C911683FB4, Total: 163ms, Invoke: 126ms (#{d.d_9.GenericDialogUI}, RoundtripDuration: 185ms
I am trying to send an HTTP POST request, but the request is not going through. What am I doing wrong or missing?
I think it does not like this part:
colonoscopy.jpg/1-1?
Server name: ${hostName}
Path: ${virtualDirectory}/data/media/${location}/colonoscopy.jpg/1-1?&prodName=${prodName}&otherParams=&sid=${authToken}
HTTP Header Manager
Content-Type application/x-www-form-urlencoded
Getting
Thread Name: Thread Group 1-1
Sample Start: 2019-02-01 15:39:06 PST
Load time: 0
Connect Time: 0
Latency: 0
Size in bytes: 1681
Sent bytes:0
Headers size in bytes: 0
Body size in bytes: 1681
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""): text
Response code: Non HTTP response code: java.net.URISyntaxException
Response message: Non HTTP response message: Illegal character in path at index 53: http://10.188.169.185/api/v2/data/media/dc2e83cfe2054
Server:%20Microsoft-IIS/10.0
Set-Cookie:%20ASP.NET_SessionId=5cwq0inclxwvmd0qlnd01yo3;%20path=/;%20HttpOnly
X-AspNet-Version:%204.0.30319
X-Powered-By:%20ASP.NET
Access-Control-Allow-Origin:%20*
Access-Control-Allow-Headers:%20Content-Type,Authorization
Access-Control-Expose-Headers:%20Content-Location,%20Location
Access-Control-Allow-Methods:%20GET,%20POST,%20OPTIONS,%20DELETE
Date:%20Fri,%2001%20Feb%202019%2023:39:05%20GMT
Content-Length:%200
colonoscopy.jpg/1-1?&prodName=test&otherParams=&sid=ca1bc7a576a44d9b8270b7cac2dddab8
HTTPSampleResult fields:
ContentType:
DataEncoding: null
Inspect your URL along with the query string using the View Results Tree listener. The error can have only one cause: you're sending a character that requires URL encoding without encoding it.
The possible causes are:
One of your variables (${location}, ${prodName} or ${authToken}) contains a character that requires URL encoding. If this is the case, you will need to wrap it in the __urlencode() function.
One of your variables (${location}, ${prodName} or ${authToken}) is not being resolved to a value, and curly brace characters ({ and }) are not allowed in a URL string without encoding.
Use the Debug Sampler to make sure that:
all the variables have their values
all the characters which require encoding are encoded using the __urlencode() function
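To see what the encoding does to the problem characters, here is a small sketch in Python (not JMeter; it only mimics the effect of percent-encoding a single path segment). The two sample values are hypothetical: an unresolved variable reference and a value containing a space.

```python
from urllib.parse import quote


def encode_path_segment(value: str) -> str:
    # safe="" also encodes "/", which is appropriate for one path segment.
    return quote(value, safe="")


unresolved_variable = "${location}"   # what is sent if the variable is never set
value_with_space = "dc2e83cfe2054 a1"  # hypothetical value containing a space

print(encode_path_segment(unresolved_variable))
print(encode_path_segment(value_with_space))
```

If the encoded form of a variable differs from its raw form, the raw value contains a character that would be rejected when placed in the path as is.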
Disclaimer: I know absolutely nothing about nifi.
I need to receive messages from the ListenHTTP processor, and then convert each message into a timestamped json message.
So, say I receive the message hello world at 5 am. It should transform it into {"timestamp": "5 am", "message":"hello world"}.
How do I do that?
Each flowfile has attributes, which are pieces of metadata stored in key/value pairs in memory (available for rapid read/write). When any operation occurs, pieces of metadata get written by the NiFi framework, both to the provenance events related to the flowfile, and sometimes to the flowfile itself. For example, if ListenHTTP is the first processor in the flow, any flowfile that enters the flow will have an attribute entryDate with the value of the time it originated in the format Thu Jan 24 15:53:52 PST 2019. You can read and write these attributes with a variety of processors (e.g. UpdateAttribute, RouteOnAttribute, etc.).
For your use case, you could use a ReplaceText processor immediately following the ListenHTTP processor with a search value of (?s)(^.*$) (the entire flowfile content, or "what you received via the HTTP call") and a replacement value of {"timestamp_now":"${now():format('yyyy-MM-dd HH:mm:ss.SSS Z')}", "timestamp_ed": "${entryDate:format('yyyy-MM-dd HH:mm:ss.SSS Z')}", "message":"$1"}.
The example above provides two options:
The entryDate is when the flowfile came into existence via the ListenHTTP processor
The now() function gets the current timestamp in milliseconds since the epoch
Those two values can differ slightly based on performance/queuing/etc. In my simple example, they were 2 milliseconds apart. You can format them using the format() method and the normal Java time format syntax, so you could get "5 am" for example by using h a (full example: now():format('h a'):toLower()).
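One thing to watch with a plain text replacement is that $1 is inserted verbatim, so a message containing a double quote or a newline would produce invalid JSON. The sketch below is not NiFi code; it uses Python only to show the intended output shape with proper escaping. The function name and the ISO-style timestamp format are choices made for this illustration.

```python
import json
from datetime import datetime, timezone


def to_timestamped_json(message: str, received_at: datetime) -> str:
    payload = {
        # ISO 8601 with millisecond precision, e.g. 2019-01-24T16:04:44.518+00:00
        "timestamp": received_at.isoformat(timespec="milliseconds"),
        "message": message,
    }
    # json.dumps escapes quotes, backslashes and newlines inside the message.
    return json.dumps(payload)


received = datetime(2019, 1, 24, 16, 4, 44, 518000, tzinfo=timezone.utc)
print(to_timestamped_json('say "hello"\nworld', received))
```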
Example
ListenHTTP running on port 9999 with path contentListener
ReplaceText as above
LogAttribute with log payload true
Curl command: curl -d "helloworld" -X POST http://localhost:9999/contentListener
Example output:
2019-01-24 16:04:44,529 INFO [Timer-Driven Process Thread-6] o.a.n.processors.standard.LogAttribute LogAttribute[id=8246b0a0-0168-1000-7254-2c2e43d136a7] logging for flow file StandardFlowFileRecord[uuid=5e1c6d12-298d-4d9c-9fcb-108c208580fa,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1548374015429-1, container=default, section=1], offset=3424, length=122],offset=0,name=5e1c6d12-298d-4d9c-9fcb-108c208580fa,size=122]
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Thu Jan 24 16:04:44 PST 2019'
Key: 'lineageStartDate'
Value: 'Thu Jan 24 16:04:44 PST 2019'
Key: 'fileSize'
Value: '122'
FlowFile Attribute Map Content
Key: 'filename'
Value: '5e1c6d12-298d-4d9c-9fcb-108c208580fa'
Key: 'path'
Value: './'
Key: 'restlistener.remote.source.host'
Value: '127.0.0.1'
Key: 'restlistener.remote.user.dn'
Value: 'none'
Key: 'restlistener.request.uri'
Value: '/contentListener'
Key: 'uuid'
Value: '5e1c6d12-298d-4d9c-9fcb-108c208580fa'
--------------------------------------------------
{"timestamp_now":"2019-01-24 16:04:44.518 -0800", "timestamp_ed": "2019-01-24 16:04:44.516 -0800", "message":"helloworld"}
So, I added an ExecuteScript processor with this code:
import groovy.json.JsonOutput
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
import java.time.LocalDateTime

flowFile = session.get()
if (!flowFile) return

// Read the incoming content and overwrite it with the JSON message in one pass.
// Cast a closure with inputStream and outputStream parameters to StreamCallback.
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    // JsonOutput escapes quotes and newlines in the message text
    def outputMessage = JsonOutput.toJson([timestamp: LocalDateTime.now().toString(), message: text])
    outputStream.write(outputMessage.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
and it worked.
I have spent several hours now trying to figure out the expression language to get hold of the flowfile content.
I have a simple test flow for learning NiFi:
GetMongo -> LogAttribute -> PutSlack
-----------------------LOG1-----------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Wed Sep 28 23:58:36 GMT 2016'
Key: 'lineageStartDate'
Value: 'Wed Sep 28 23:58:36 GMT 2016'
Key: 'fileSize'
Value: '70'
FlowFile Attribute Map Content
Key: 'filename'
Value: '43546945658800'
Key: 'path'
Value: './'
Key: 'uuid'
Value: 'd1e10623-0e90-44af-a620-6bed9776ed62'
-----------------------LOG1-----------------------
{ "_id" : { "$oid" : "57ec27ec35a0759d54fb465d" }, "keyA" : "valueA" }
In the PutSlack expression I have tried the following as a test:
${flowfile.content}
${message}
${payload}
${msg}
${flowfile-content}
${content}
There is no Expression Language function that accesses the content of the flow file. The attributes and the content are purposely stored very differently in order to facilitate moving around a flow file that could represent a large payload. Expression Language operates on attributes only.
The ExtractText processor can be used to extract the whole content of the flow file into an attribute; just keep in mind that this should only be done when you know the content will have no problem fitting in memory.
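The split between attributes and content, and the idea of copying the content into an attribute, can be pictured with a toy model. This is a sketch in Python, not NiFi's API: the FlowFile class, the function name, the attribute name "message" and the max_bytes limit are all hypothetical and only illustrate why a size check matters.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy model: attributes are a small dict, content is a separate payload."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def extract_content_to_attribute(flowfile, attribute_name, max_bytes=1024):
    # Refuse to copy payloads that are too large to hold as an attribute.
    if len(flowfile.content) > max_bytes:
        raise ValueError("content too large to copy into an attribute")
    # (?s) lets "." match newlines, so the group captures the whole content.
    match = re.match(r"(?s)(^.*$)", flowfile.content.decode("utf-8"))
    flowfile.attributes[attribute_name] = match.group(1)
    return flowfile


flowfile = FlowFile(content=b'{ "keyA" : "valueA" }')
extract_content_to_attribute(flowfile, "message")
print(flowfile.attributes["message"])
```

Once the content has been copied into an attribute this way, an expression such as ${message} can refer to it.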