How to create indices based on kubernetes metadata - elasticsearch

I am using Filebeat in Kubernetes to ship logs to Elasticsearch.
I want to create indices based on namespaces. I'm trying to create a custom index for my different apps in a Kubernetes cluster, but it is not working.
I used the configuration below:
output.elasticsearch:
  index: "%{[kubernetes.labels.app]:filebeat}-%{[beat.version]}-%{+yyyy.MM.dd}"
Filebeat Kubernetes manifest: https://github.com/anup1384/k8s-filebeat

Use the elasticsearch output as shown below in the Filebeat ConfigMap:
output.elasticsearch:
  index: "%{[kubernetes.namespace]:filebeat}-%{[beat.version]}-%{+yyyy.MM.dd}"

Create a custom index using Kubernetes metadata. Here I'm creating an index based on the pod name metadata:
logstash_prefix ${record['kubernetes']['pod_name']}
For more details:
https://medium.com/faun/how-to-create-custom-indices-based-on-kubernetes-metadata-using-fluentd-beed062faa5d
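As a minimal sketch of where that prefix goes (assuming fluent-plugin-elasticsearch, that the Kubernetes metadata filter has already added kubernetes.pod_name to each record, and the same ENV-based host/port as in the configuration below), the pod name can also be referenced as a buffer-chunk placeholder rather than via ${record[...]}:

<match yourTag.**>
  @type elasticsearch
  host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  logstash_format true
  logstash_prefix ${$.kubernetes.pod_name}
  <buffer tag, $.kubernetes.pod_name>
    flush_interval 10s
  </buffer>
</match>

The fuller configuration below takes the same idea further and builds the index name from the tag and the pod's app label.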

<source>
  @type tail
  @id in_tail_docker_logs
  read_from_head true
  tag yourTag.*
  path /var/log/containers/*your-namespace*.log
  pos_file /var/log/file.log.pos
  <parse>
    @type multi_format
    <pattern>
      format json
      time_format '%Y-%m-%dT%H:%M:%S.%N%Z'
    </pattern>
    <pattern>
      format regexp
      expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
      time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
    </pattern>
  </parse>
</source>

<match yourTag.**>
  @type elasticsearch
  host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'https'}"
  user "#{ENV['FLUENT_ELASTICSEARCH_MDSA_USER']}"
  password "#{ENV['FLUENT_ELASTICSEARCH_MDSA_PASSWORD']}"
  ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'false'}"
  suppress_type_name true
  index_name ocp_${tag[0]}_${$.kubernetes.labels.app}_%Y%m%d   ### ==> ocp_yourTag_appName_date
  <buffer tag, time, $.kubernetes.labels.app>
    @type memory
    timekey 10s
    timekey_wait 0s
    flush_mode immediate
    flush_thread_count 4
  </buffer>
</match>
This will work; just replace the yourTag and your-namespace placeholders with your own tag and namespace.
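For reference, with a hypothetical pod labelled app=myapp and a chunk written on 2024-01-01, the placeholders resolve as follows: ${tag[0]} is the first tag part (yourTag), ${$.kubernetes.labels.app} is the pod's app label, and %Y%m%d is the chunk's timekey date, giving an index name like ocp_yourTag_myapp_20240101.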

Related

Fluentd record transformer conditional statements to add key if not present

I want to add a message field to the log if it is not present in the logs.
Here's the relevant fluentd configuration:
<filter **>
@type record_transformer
enable_ruby true
<record>
message ${ if record.has_key?('message'); then record ["message"]; else record["message"] == "nomsg"; end}
</record>
</filter>
But when the message field is not present I get message=false; when it is present I get message=actual_msg.
I'm not sure why it is not producing message=nomsg. I've tried variations of the above syntax, but no luck.
Please help and suggest.
Simply change the else branch as shown below:
<filter **>
@type record_transformer
enable_ruby true
<record>
message ${ if record.has_key?('message'); then record["message"]; else "nomsg"; end}
</record>
</filter>
I hope this will help you.
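A slightly terser variant (a sketch using Hash#fetch with a default, which behaves the same as the has_key? check when the key is absent) would be:

<filter **>
@type record_transformer
enable_ruby true
<record>
message ${record.fetch('message', 'nomsg')}
</record>
</filter>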

Is if record.dig exists else null exclude even possible?

I'm trying to filter out data that is null in Fluentd using the record_modifier plugin, by adding an if statement. I've seen a few examples, but none match my use case. Has anyone come across this, or can someone confirm whether it's even possible?
Human-readable action:
partOf =
  if the value of record.dig("kubernetes", "labels", "app.kubernetes.io/part-of") exists
    include the key and its value in the record
  else
    remove_keys partOf
Example config with dig:
<filter **>
@type record_modifier
<record>
partOf ${record.dig("kubernetes", "labels", "app.kubernetes.io/part-of")}
</record>
</filter>
partOf refers to the common Kubernetes label, for example: "app.kubernetes.io/part-of": "kube-state-metrics".
With record_modifier, it's working with embedded Ruby logic like this:
<filter **>
@type record_modifier
remove_keys __dummy__
<record>
__dummy__ ${ p = record["kubernetes"]["labels"]["app.kubernetes.io/part-of"]; p.nil? ? p : record['partOf'] = p; }
</record>
</filter>
See: https://github.com/repeatedly/fluent-plugin-record-modifier#ruby-code-trick-for-complex-logic
For running a complete test cycle, please refer to this answer.
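A small variation on the same trick (only a sketch, not the answer above) uses record.dig for the lookup, so records that lack the kubernetes or labels keys don't raise an error:

<filter **>
@type record_modifier
remove_keys __dummy__
<record>
__dummy__ ${ p = record.dig("kubernetes", "labels", "app.kubernetes.io/part-of"); record['partOf'] = p unless p.nil?; nil }
</record>
</filter>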

Fluentd logs not sent to Elasticsearch - pattern not match

I can't get log messages to be processed by Fluentd and sent to Elasticsearch. I'm tailing the container of my service; Fluentd picks up the log but can't parse it, and it always fails with the error pattern not match. I understand that something is wrong with my parsing setup, but I can't see what.
The service writes to stdout with Serilog, using the ElasticsearchJsonFormatter. My understanding is that it writes valid JSON to the console, and this appears to be the case when I view the logs of the running container. When I view the Fluentd logs, however, it looks as if everything has been escaped.
If I view the logs of the service pod I can see the message; if I then view the logs of the fluentd pod I can see the pattern not match error. Both are included below.
Any help or pointers will be greatly appreciated, as I've been stuck on this for days now.
Serilog Setup
var loggerConfig = new LoggerConfiguration()
.WriteTo.Console(new ElasticsearchJsonFormatter())
.MinimumLevel.Information()
.MinimumLevel.Override("Microsoft", LogEventLevel.Warning)
.MinimumLevel.Override("System", LogEventLevel.Warning);
Fluentd Config
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/my-service-*.log
      pos_file /var/log/app.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    <match **>
      @type elasticsearch
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
      index_name fluentd
      type_name fluentd
    </match>
Example log message
This is what I can see if I view the logs for the running container. In my case this is a Kubernetes pod.
{
  "@timestamp": "2021-12-29T13:23:28.2692128+00:00",
  "level": "Information",
  "messageTemplate": "Get Record - User Claim Contracts: {contracts} All Claims: {allclaims}",
  "message": "Get Record - User Claim Contracts: [1, 2, 3, 4] All Claims: \"Application.Claims\"",
  "fields": {
    "contracts": [
      1,
      2,
      3,
      4
    ],
    "allclaims": "Application.Claims",
    "ddsource": "application",
    "RecordId": null,
    "UserId": null,
    "TransactionId": null,
    "env": "UAT",
  }
}
Fluentd pattern not match
This is what I see when I view the logs for the fluentd container. Again, this is a pod.
2021-12-29 13:37:48 +0000 [warn]: #0 pattern not match: "2021-12-29T13:37:47.852294372Z stdout F {\"@timestamp\":\"2021-12-29T13:37:47.8518242+00:00\",\"level\":\"Information\",\"messageTemplate\":\"Get Record - User Claim Contracts: {contracts} All Claims: {allclaims}\",\"message\":\"Get Record - User Claim Contracts: [1, 2, 3, 4] All Claims: \\\"Application.Claims\\\"\",\"fields\":{\"contracts\":[1,2,3,4],\"allclaims\":\"\",\"ddsource\":\"\",\"RecordId\":null,\"UserId\":null,\"TransactionId\":null,\"env\":\"UAT\",\"\":\"\",\"\":\"\"}}"
Your log message is not valid JSON, since it contains a trailing comma in the line "env": "UAT",. The Fluentd log also writes out two extra empty fields ("": "") as part of your record.
To parse the time field, you have to tell Fluentd its name via time_key, in your case time_key @timestamp. You can use %iso8601 as the time_format. For details, see the Fluentd documentation on time parameters.
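Putting those hints together, the <parse> section of the tail source would look roughly like this (a sketch based on the points above, not a verified drop-in):

<parse>
  @type json
  time_key @timestamp
  time_format %iso8601
</parse>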

how to define a field in grok regex fluentd

I have the below Apache Atlas audit log:
[INFO] 2020-06-29 15:14:31,732 AUDIT logJSON - {"repoType":15,"repo":"atlas","reqUser":"varun","evtTime":"2020-06-29 15:14:29.967","access":"entity-read","resource":"AtlanColumn/[]/glue/78975568964/flights/default/flightsgdelt_100m_test_partition/c_11","resType":"entity","action":"entity-read","result":1,"agent":"atlas","policy":6,"enforcer":"ranger-acl","cliIP":"10.9.2.76","agentHost":"atlas-7d9dcdd6c5-lmfzj","logType":"RangerAudit","id":"87c9e862-910b-4ee2-86f8-cb174f4e7b76-863129","seq_num":1701441,"event_count":1,"event_dur_ms":0,"tags":[],"cluster_name":"","policy_version":54}
Right now I have the below parse config:
<parse>
@type regexp
expression ^\[(?<Level>.[^ ]*)\] (?<datetime>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?<Type>.[^ ]*) (?<Action>.[^ ]*) \- \{"repoType":(?<repoType>.[^ ]*)\,"repo":"(?<repo>.[^ ]*)\","reqUser":"(?<reqUser>.[^ ]*)\","evtTime":"(?<evtTime>.[^ ].*)\","access":"(?<access>.[^ ]*)\","resource":"(?<resource>.[^ ].*)\","resType":"(?<resType>.[^ ]*)\","action":"(?<action>.[^ ]*)\","result":(?<result>.[^ ]*)\,"agent":"(?<agent>.[^ ].*)\","policy":(?<policy>.[^ ]*)\,"enforcer":"(?<enforcer>.[^ ]*)\","cliIP":"(?<cliIP>.[^ ]*)\","agentHost":"(?<agentHost>.[^ ]*)\","logType":"(?<logType>.[^ ]*)\","id":"(?<id>.[^ ]*)\","seq_num":(?<seq_num>.[^ ]*)\,"event_count":(?<event_count>.[^ ]*)\,"event_dur_ms":(?<event_dur_ms>.[^ ]*)\,"tags":(?<tags>.[^ ].*)\,"cluster_name":(?<cluster_name>.[^ ].*),"policy_version":(?<policy_version>.[^ ]*)\}
</parse>
Now we want to further break down the resource field into multiple fields like below:
AssetType
Tags
Integration
Database
Schema
Table
Column
The issue here is that the resource field does not necessarily always have the full combination above. It can be AssetType/Tags/Integration, or AssetType/Tags/Integration/Database, or AssetType/Tags/Integration/Database/Schema, or AssetType/Tags/Integration/Database/Schema/Table, or AssetType/Tags/Integration/Database/Schema/Table/Column.
If any of the fields are missing, we should send null.
Any suggestion or guidance on this would be highly appreciated.
You can use the record_reformer plugin to parse the resource key and extract the value for each of the needed keys. Below is an example of its usage:
<match pattern.**>
@type record_reformer
tag new_tag.${tag_suffix[2]}
renew_record false
enable_ruby true
<record>
AssetType ${record['resource'].scan(/^([^\/]+\/){0}(?<param>[^\/]+)/).flatten.compact[0]}
Tags ${record['resource'].scan(/^([^\/]+\/){1}(?<param>[^\/]+)/).flatten.compact[0]}
Integration ${record['resource'].scan(/^([^\/]+\/){2}(?<param>[^\/]+)/).flatten.compact[0]}
Database ${record['resource'].scan(/^([^\/]+\/){3}(?<param>[^\/]+)/).flatten.compact[0]}
Schema ${record['resource'].scan(/^([^\/]+\/){4}(?<param>[^\/]+)/).flatten.compact[0]}
Table ${record['resource'].scan(/^([^\/]+\/){5}(?<param>[^\/]+)/).flatten.compact[0]}
Column ${record['resource'].scan(/^([^\/]+\/){6}(?<param>[^\/]+)/).flatten.compact[0]}
</record>
</match>
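An alternative sketch of the same <record> section (assuming resource is always a /-separated path) splits the string once per field instead of repeatedly scanning it; missing trailing components behave the same way as in the scan-based version above:

<match pattern.**>
@type record_reformer
tag new_tag.${tag_suffix[2]}
renew_record false
enable_ruby true
<record>
AssetType ${record['resource'].to_s.split('/')[0]}
Tags ${record['resource'].to_s.split('/')[1]}
Integration ${record['resource'].to_s.split('/')[2]}
Database ${record['resource'].to_s.split('/')[3]}
Schema ${record['resource'].to_s.split('/')[4]}
Table ${record['resource'].to_s.split('/')[5]}
Column ${record['resource'].to_s.split('/')[6]}
</record>
</match>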

convert system time to utc and utc to system fails [duplicate]

The bash command "etcdctl get system config/log/timestamp" returns the time type, either UTC or System. Now I want to use this to convert times to the same format. How can I do that?
I tried the following, but it caused the running td-agent to fail:
<source>
@type exec
command etcdctl get system config/log/timestamp
<parse>
keys timeType
</parse>
</source>
Now I want to use that timeType to convert the time from the given log to that time type:
{"host":"sp-1","level":"INFO","log":{"classname":"common.server.hacluster.CSFHATopologyChangeHandlerMBean:93","message":"Finished processing CSF HA 'become standby' message.","stacktrace":"","threadname":"RMI TCP Connection(1784)-192.168.20.11"},"process":"becomeStandby","service":"SP","time":"2020-03-19T10:15:36.514Z","timezone":"America/Toronto","type":"log","system":"SP_IG_20_3_R1_I2002","systemid":"SP_IG_20_3_R1_I2002"}
This is where I want to use that $timeType:
<filter com.logging.tmplog>
@type record_modifier
<record>
type log
time ${record["time"]}.$timeType ## It's not working
arun ${tag}
</record>
</filter>
The issue was solved by using record_modifier's prepare_value:
<filter com.logging.tmplog>
@type record_modifier
prepare_value @timeType = `etcdctl get system config/log/timestamp`.strip
<record>
timeType ${@timeType}
time ${if @timeType == 'UTC' then Time.at(time).utc.strftime('%FT%T') else Time.at(time).to_s; end}
</record>
</filter>
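Note that prepare_value is evaluated once, during the plugin's configure phase, so the etcdctl result is read at startup and cached; if the value stored in etcd changes, td-agent needs to be restarted (or its configuration reloaded) to pick up the new setting.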
