How to define a field in grok regex - Fluentd / Elasticsearch

I have the below Apache Atlas audit log:
[INFO] 2020-06-29 15:14:31,732 AUDIT logJSON - {"repoType":15,"repo":"atlas","reqUser":"varun","evtTime":"2020-06-29 15:14:29.967","access":"entity-read","resource":"AtlanColumn/[]/glue/78975568964/flights/default/flightsgdelt_100m_test_partition/c_11","resType":"entity","action":"entity-read","result":1,"agent":"atlas","policy":6,"enforcer":"ranger-acl","cliIP":"10.9.2.76","agentHost":"atlas-7d9dcdd6c5-lmfzj","logType":"RangerAudit","id":"87c9e862-910b-4ee2-86f8-cb174f4e7b76-863129","seq_num":1701441,"event_count":1,"event_dur_ms":0,"tags":[],"cluster_name":"","policy_version":54}
Right now I have the below parse config:
<parse>
@type regexp
expression ^\[(?<Level>.[^ ]*)\] (?<datetime>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?<Type>.[^ ]*) (?<Action>.[^ ]*) \- \{"repoType":(?<repoType>.[^ ]*)\,"repo":"(?<repo>.[^ ]*)\","reqUser":"(?<reqUser>.[^ ]*)\","evtTime":"(?<evtTime>.[^ ].*)\","access":"(?<access>.[^ ]*)\","resource":"(?<resource>.[^ ].*)\","resType":"(?<resType>.[^ ]*)\","action":"(?<action>.[^ ]*)\","result":(?<result>.[^ ]*)\,"agent":"(?<agent>.[^ ].*)\","policy":(?<policy>.[^ ]*)\,"enforcer":"(?<enforcer>.[^ ]*)\","cliIP":"(?<cliIP>.[^ ]*)\","agentHost":"(?<agentHost>.[^ ]*)\","logType":"(?<logType>.[^ ]*)\","id":"(?<id>.[^ ]*)\","seq_num":(?<seq_num>.[^ ]*)\,"event_count":(?<event_count>.[^ ]*)\,"event_dur_ms":(?<event_dur_ms>.[^ ]*)\,"tags":(?<tags>.[^ ].*)\,"cluster_name":(?<cluster_name>.[^ ].*),"policy_version":(?<policy_version>.[^ ]*)\}
</parse>
Now we want to further break down the resource field into multiple fields, like below:
AssetType
Tags
Integration
Database
Schema
Table
Column
The issue here is that it's not necessary for the resource field to always have the above combination. It can be AssetType/Tags/Integration, or AssetType/Tags/Integration/Database, or AssetType/Tags/Integration/Database/Schema, or AssetType/Tags/Integration/Database/Schema/Table, or AssetType/Tags/Integration/Database/Schema/Table/Column.
If any of the fields are missing, then we should send null.
Any suggestion or guidance on this would be highly appreciated.

You can use the record_reformer plugin to parse the resource key and extract the value for each of the needed keys. Below is an example of the usage:
<match pattern.**>
@type record_reformer
tag new_tag.${tag_suffix[2]}
renew_record false
enable_ruby true
<record>
AssetType ${record['resource'].scan(/^([^\/]+\/){0}(?<param>[^\/]+)/).flatten.compact[0]}
Tags ${record['resource'].scan(/^([^\/]+\/){1}(?<param>[^\/]+)/).flatten.compact[0]}
Integration ${record['resource'].scan(/^([^\/]+\/){2}(?<param>[^\/]+)/).flatten.compact[0]}
Database ${record['resource'].scan(/^([^\/]+\/){3}(?<param>[^\/]+)/).flatten.compact[0]}
Schema ${record['resource'].scan(/^([^\/]+\/){4}(?<param>[^\/]+)/).flatten.compact[0]}
Table ${record['resource'].scan(/^([^\/]+\/){5}(?<param>[^\/]+)/).flatten.compact[0]}
Column ${record['resource'].scan(/^([^\/]+\/){6}(?<param>[^\/]+)/).flatten.compact[0]}
</record>
</match>
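For reference, here is how that scan expression behaves in plain Ruby (a standalone sketch outside Fluentd; the sample resource value and the segment helper are only for illustration): the {N} quantifier skips the first N slash-separated segments, and when the requested segment does not exist, scan returns an empty array, so [0] yields nil for the missing fields.

# Standalone check of the scan pattern used in the record_reformer config above.
resource = "AtlanColumn/[]/glue/78975568964/flights/default"  # Table and Column are missing

def segment(resource, n)
  # {n} consumes the first n "<segment>/" groups, then <param> captures the next segment;
  # scan returns [] when that segment does not exist, so [0] becomes nil
  resource.scan(/^([^\/]+\/){#{n}}(?<param>[^\/]+)/).flatten.compact[0]
end

p segment(resource, 0)  # => "AtlanColumn"  (AssetType)
p segment(resource, 3)  # => "78975568964"  (Database)
p segment(resource, 6)  # => nil            (Column is absent)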

Related

Fluentd record transformer conditional statements to add key if not present

I want to add a message field to the log if it is not present in the logs.
Here's the relevant fluentd configuration:
<filter **>
@type record_transformer
enable_ruby true
<record>
message ${ if record.has_key?('message'); then record ["message"]; else record["message"] == "nomsg"; end}
</record>
</filter>
But when the message field is not present I get message=false; when it is present, I get message=actual_msg.
Not sure why it is not taking message=nomsg.
Please help and suggest.
I've tried variations of the above syntax but no luck.
Simply change the else condition as below:
<filter **>
@type record_transformer
enable_ruby true
<record>
message ${ if record.has_key?('message'); then record["message"]; else "nomsg"; end }
</record>
</filter>
I hope this will help you.
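An equivalent one-liner (a sketch under the same record_transformer / enable_ruby setup) sidesteps the if/else entirely by using Ruby's Hash#fetch with a default:

<filter **>
@type record_transformer
enable_ruby true
<record>
# fetch returns the existing value, or "nomsg" when the key is absent
message ${ record.fetch("message", "nomsg") }
</record>
</filter>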

How to let fluent-bit skip the field that cannot be parsed?

I am trying to send data from Fluent Bit to Elasticsearch.
Here is my fluent-bit parser:
[PARSER]
Name escape_utf8_log
Format json
# Command | Decoder | Field | Optional Action
# =============|=====================|=================
Decode_Field_As escaped_utf8 log
Decode_Field json log

[PARSER]
Name escape_message
Format json
# Command | Decoder | Field | Optional Action
# =============|=================|=================
Decode_Field_As escaped_utf8 message
Decode_Field json message
Here is my fluent-bit config:
[FILTER]
Name parser
Match docker_logs
Key_Name message
Parser escape_message
Reserve_Data True
In some cases, other people put log data into Fluent Bit in the wrong format, so we get a "mapper_parsing_exception" (example: failed to parse field [id] of type long in document).
I am trying to skip parsing a log and send it to ES anyway if Fluent Bit cannot parse it, so that we would not get the parser error even if someone sends the wrong format to Fluent Bit. Is it possible to do that?

Is if record.dig exists else null exclude even possible?

I'm trying to filter data in Fluentd that is null, using the record_modifier plugin with an if statement. I've seen a few examples, but none match my use case. Has anyone come across this, or can confirm it's even possible?
Human-readable action:
partOf =
if the value of record.dig("kubernetes", "labels", "app.kubernetes.io/part-of") exists
include record and value
else
remove_keys partOf
Example config with dig:
<filter **>
@type record_modifier
<record>
partOf ${record.dig("kubernetes", "labels", "app.kubernetes.io/part-of")}
</record>
</filter>
partOf is a common k8s label; for example: "app.kubernetes.io/part-of": "kube-state-metrics".
With record_modifier, it's working with embedded Ruby logic like this:
<filter **>
@type record_modifier
remove_keys __dummy__
<record>
__dummy__ ${ p = record["kubernetes"]["labels"]["app.kubernetes.io/part-of"]; p.nil? ? p : record['partOf'] = p; }
</record>
</filter>
See: https://github.com/repeatedly/fluent-plugin-record-modifier#ruby-code-trick-for-complex-logic
For running a complete test cycle, please refer to this answer.
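Note that the lookup above assumes the "kubernetes" and "labels" keys always exist; a dig-based variation of the same trick (a sketch, not tested against your pipeline) stays nil-safe when the intermediate keys are missing:

<filter **>
@type record_modifier
remove_keys __dummy__
<record>
# dig returns nil instead of raising when "kubernetes" or "labels" is absent
__dummy__ ${ p = record.dig("kubernetes", "labels", "app.kubernetes.io/part-of"); record['partOf'] = p unless p.nil?; nil }
</record>
</filter>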

Fluentd logs not sent to Elasticsearch - pattern not match

I can't get log messages to be processed by Fluentd and sent to Elasticsearch. I'm tailing the container of my service; it picks up the log but can't parse it, and it always fails with the error pattern not match. I understand that something is wrong with my parsing setup, but I can't see what.
The service writes to Stdout with Serilog, using the ElasticsearchJsonFormatter. My understanding is that it will write valid json to the console. This appears to be happening if I view the logs of the running container. When I view the Fluentd logs, it looks as if it has all been escaped.
If I view the logs of the service pod I can see the message; if I then view the logs of the Fluentd pod I can see the pattern not match error. Both of these are included below.
Any help or pointers will be greatly appreciated, as I've been stuck on this for days now.
Serilog Setup
var loggerConfig = new LoggerConfiguration()
.WriteTo.Console(new ElasticsearchJsonFormatter())
.MinimumLevel.Information()
.MinimumLevel.Override("Microsoft", LogEventLevel.Warning)
.MinimumLevel.Override("System", LogEventLevel.Warning);
Fluentd Config
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/my-service-*.log
      pos_file /var/log/app.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    <match **>
      @type elasticsearch
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
      index_name fluentd
      type_name fluentd
    </match>
Example log message
This is what I can see if I view the logs for the running container. In my case this is a Kubernetes pod.
{
"#timestamp": "2021-12-29T13:23:28.2692128+00:00",
"level": "Information",
"messageTemplate": "Get Record - User Claim Contracts: {contracts} All Claims: {allclaims}",
"message": "Get Record - User Claim Contracts: [1, 2, 3, 4] All Claims: \"Application.Claims\"",
"fields": {
"contracts": [
1,
2,
3,
4
],
"allclaims": "Application.Claims",
"ddsource": "application",
"RecordId": null,
"UserId": null,
"TransactionId": null,
"env": "UAT",
}
}
Fluentd pattern not match
This is what I see when I view the logs for the fluentd container. Again, this is a pod.
2021-12-29 13:37:48 +0000 [warn]: #0 pattern not match: "2021-12-29T13:37:47.852294372Z stdout F {\"#timestamp\":\"2021-12-29T13:37:47.8518242+00:00\",\"level\":\"Information\",\"messageTemplate\":\"Get Record - User Claim Contracts: {contracts} All Claims: {allclaims}\",\"message\":\"Get Record - User Claim Contracts: [1, 2, 3, 4] All Claims: \\\"Application.Claims\\\"\",\"fields\":{\"contracts\":[1,2,3,4],\"allclaims\":\"\",\"ddsource\":\"\",\"RecordId\":null,\"UserId\":null,\"TransactionId\":null,\"env\":\"UAT\",\"\":\"\",\"\":\"\"}}"
Your log message is not valid JSON, since it contains a trailing comma in the line "env": "UAT",. The Fluentd log accordingly writes out two more empty fields ("":"") as part of your record.
To parse time fields, you have to tell Fluentd the name of the time key, in your case time_key #timestamp. You can use %iso8601 as the time_format. For details, see the Fluentd documentation on time parameters.
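Put together, the parse block inside the tail source could look roughly like this (a minimal sketch, assuming the line that reaches the parser is the JSON object itself; the time key is quoted defensively so the leading # isn't read as a comment):

<parse>
@type json
time_key "#timestamp"
time_format %iso8601
</parse>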

How to create indices based on kubernetes metadata

I am using filebeat in Kubernetes to ship logs to elastic search.
I want to create indices based on namespaces. I'm trying to create a custom index for my different apps in a Kubernetes cluster, but this is not working.
I used the below conf:
output.elasticsearch:
index: "%{[kubernetes.labels.app]:filebeat}-%{[beat.version]}-%{+yyyy.MM.dd}"
Filebeat Kubernetes manifest link: https://github.com/anup1384/k8s-filebeat
Use the elasticsearch output as given below in the Filebeat ConfigMap:
output.elasticsearch:
index: "%{[kubernetes.namespace]:filebeat}-%{[beat.version]}-%{+yyyy.MM.dd}"
Create a custom index using Kubernetes metadata. So here I'm creating an index based on pod name metadata.
logstash_prefix ${record['kubernetes']['pod_name']}
For more details:
https://medium.com/faun/how-to-create-custom-indices-based-on-kubernetes-metadata-using-fluentd-beed062faa5d
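In Fluentd terms, that logstash_prefix line typically sits in an elasticsearch_dynamic match block, which evaluates the ${...} Ruby expression per record (a sketch, assuming the kubernetes metadata has already been added to each record, e.g. by the kubernetes_metadata filter):

<match kubernetes.**>
@type elasticsearch_dynamic
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
logstash_format true
# each pod name becomes its own index prefix, e.g. roughly my-pod-2022.01.01
logstash_prefix ${record['kubernetes']['pod_name']}
</match>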
<source>
@type tail
@id in_tail_docker_logs
read_from_head true
tag yourTag.*
path /var/log/containers/**yournamespace**.log
pos_file /var/log/file.log.pos
<parse>
@type multi_format
<pattern>
format json
time_format '%Y-%m-%dT%H:%M:%S.%N%Z'
</pattern>
<pattern>
format regexp
expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
</pattern>
</parse>
</source>
<match yourTag_**>
@type elasticsearch
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'https'}"
user "#{ENV['FLUENT_ELASTICSEARCH_MDSA_USER']}"
password "#{ENV['FLUENT_ELASTICSEARCH_MDSA_PASSWORD']}"
ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'false'}"
suppress_type_name true
# resulting index: ocp_yourTag_appName_date
index_name ocp_${tag[0]}_${$.kubernetes.labels.app}_%Y%m%d
<buffer tag, time, $.kubernetes.labels.app>
@type memory
timekey 10s
timekey_wait 0s
flush_mode immediate
flush_thread_count 4
</buffer>
</match>
This will work 100%; just replace the tag and the namespace with yours.
