Fluent Bit - Splitting a JSON log into structured fields in Elasticsearch

I am trying to find a way in the Fluent Bit config to tell/enforce Elasticsearch to store the plain JSON-formatted logs (the log field below, which comes from Docker stdout/stderr) in a structured way - please see the image at the bottom for a better explanation. For example, apart from (or along with) storing the log as a plain JSON entry under the log field, I would like to store each property individually, as shown in red.
The documentation for Filters and Parsers is really poor and not clear. On top of that, the forward input doesn't have a "parser" option. I tried the json/docker/regex parsers but no luck. My regex is here if I have to use regex. Currently using Elasticsearch (7.1), Fluent Bit (1.1.3) and Kibana (7.1) - not Kubernetes.
If anyone can point me to an example or provide one, it would be much appreciated.
Thanks
{
  "_index": "hello",
  "_type": "logs",
  "_id": "T631e2sBChSKEuJw-HO4",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-06-21T21:34:02.000Z",
    "tag": "php",
    "container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
    "container_name": "/hello_php_1",
    "source": "stderr",
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
  },
  "fields": {
    "@timestamp": [
      "2019-06-21T21:34:02.000Z"
    ]
  },
  "sort": [
    1561152842000
  ]
}
My Fluent Bit config:
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name            es
    Match           hello_*
    Host            elasticsearch
    Port            9200
    Index           hello
    Type            logs
    Include_Tag_Key On
    Tag_Key         tag

The solution is as follows. Note that the [PARSER] section belongs in the parsers.conf file referenced by Parsers_File, not in the main config.
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name         forward
    storage.type filesystem
    Listen       my_fluent_bit_service
    Port         24224

[FILTER]
    Name         parser
    Parser       docker
    Match        hello_*
    Key_Name     log
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    Name            es
    Host            my_elasticsearch_service
    Port            9200
    Match           hello_*
    Index           hello
    Type            logs
    Include_Tag_Key On
    Tag_Key         tag

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command          | Decoder      | Field | Optional Action
    # =================|==============|=======|=================
    Decode_Field_As      escaped_utf8   log     do_next
    Decode_Field_As      json           log
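With this filter in place, the keys from the JSON in the log field end up as top-level fields of the record sent to Elasticsearch (Reserve_Data On keeps the original Docker fields, Preserve_Key On keeps the raw log string). Roughly, and trimmed for brevity, the indexed _source then looks like this (illustrative):
{
  "container_name": "/hello_php_1",
  "source": "stderr",
  "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\", ... }",
  "time_local": "2019-06-21T21:34:02+0000",
  "remote_addr": "192.168.192.3",
  "request": "GET / HTTP/1.1",
  "status": "200",
  "request_time": "0.001"
}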

You can use the Fluent Bit Nest filter for that purpose; please refer to the following documentation:
https://docs.fluentbit.io/manual/filter/nest
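For example, if the parsed JSON ends up as a map nested under a key, a lift operation can flatten it. This is only a sketch, assuming the nested map sits under log; the log_ prefix is illustrative:
[FILTER]
    Name          nest
    Match         hello_*
    Operation     lift
    Nested_under  log
    Add_prefix    log_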

Related

Fluentd logs not sent to Elasticsearch - pattern not match

I can't get log messages to be processed by Fluentd and sent to Elasticsearch. I'm tailing the container of my service; Fluentd picks up the log, but it can't parse it and always fails with the error pattern not match. I understand that something is wrong with my parsing setup, but I can't see what.
The service writes to stdout with Serilog, using the ElasticsearchJsonFormatter. My understanding is that it writes valid JSON to the console, and this appears to be the case when I view the logs of the running container. When I view the Fluentd logs, however, it looks as if everything has been escaped.
If I view the logs of the service pod I can see the message; if I then view the logs of the Fluentd pod I can see the pattern not match error. Both are included below.
Any help or pointers will be greatly appreciated, as I've been stuck on this for days now.
Serilog Setup
var loggerConfig = new LoggerConfiguration()
    .WriteTo.Console(new ElasticsearchJsonFormatter())
    .MinimumLevel.Information()
    .MinimumLevel.Override("Microsoft", LogEventLevel.Warning)
    .MinimumLevel.Override("System", LogEventLevel.Warning);
Fluentd Config
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/my-service-*.log
      pos_file /var/log/app.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    <match **>
      @type elasticsearch
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
      index_name fluentd
      type_name fluentd
    </match>
Example log message
This is what I can see if I view the logs for the running container. In my case this is a Kubernetes pod.
{
  "@timestamp": "2021-12-29T13:23:28.2692128+00:00",
  "level": "Information",
  "messageTemplate": "Get Record - User Claim Contracts: {contracts} All Claims: {allclaims}",
  "message": "Get Record - User Claim Contracts: [1, 2, 3, 4] All Claims: \"Application.Claims\"",
  "fields": {
    "contracts": [
      1,
      2,
      3,
      4
    ],
    "allclaims": "Application.Claims",
    "ddsource": "application",
    "RecordId": null,
    "UserId": null,
    "TransactionId": null,
    "env": "UAT",
  }
}
Fluentd pattern not match
This is what I see when I view the logs for the fluentd container. Again, this is a pod.
2021-12-29 13:37:48 +0000 [warn]: #0 pattern not match: "2021-12-29T13:37:47.852294372Z stdout F {\"@timestamp\":\"2021-12-29T13:37:47.8518242+00:00\",\"level\":\"Information\",\"messageTemplate\":\"Get Record - User Claim Contracts: {contracts} All Claims: {allclaims}\",\"message\":\"Get Record - User Claim Contracts: [1, 2, 3, 4] All Claims: \\\"Application.Claims\\\"\",\"fields\":{\"contracts\":[1,2,3,4],\"allclaims\":\"\",\"ddsource\":\"\",\"RecordId\":null,\"UserId\":null,\"TransactionId\":null,\"env\":\"UAT\",\"\":\"\",\"\":\"\"}}"
Your log message is not valid JSON, since it contains a trailing comma after "env": "UAT". The Fluentd log also shows two extra empty fields ("":"") as part of your record.
To parse time fields, you have to tell Fluentd the name of the time key, in your case time_key @timestamp. You can use %iso8601 as the time_format. For details see the Fluentd documentation on time parameters.
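A sketch of the <parse> section with those two settings applied (the rest of the source block stays as in the question):
<parse>
  @type json
  time_key @timestamp
  time_format %iso8601
</parse>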

Conditional indexing not working in ingest node pipelines

I am trying to implement an index template with a data stream enabled and then use a set processor with a condition in an ingest node pipeline, so that I get metrics with the index format below:
.ds-metrics-kubernetesnamespace
I had tried this some time back, did the same things as mentioned above, and it produced metrics in that format, but now when I implement the same thing it doesn't change anything in my index. I cannot see any errors in the OpenShift cluster logs, so the ingest pipeline seems to be working fine (when I add a doc and test it, it works fine).
PUT _ingest/pipeline/metrics-index
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "metrics-{{kubernetes.namespace}}",
        "if": "ctx.kubernetes?.namespace==\"dev\""
      }
    }
  ]
}
This is the ingest node condition I have used for indexing. My Metricbeat configuration is below:
metricbeatConfig:
  metricbeat.yml: |
    metricbeat.modules:
      - module: kubernetes
        enabled: true
        metricsets:
          - state_node
          - state_daemonset
          - state_deployment
          - state_replicaset
          - state_statefulset
          - state_pod
          - state_container
          - state_job
          - state_cronjob
          - state_resourcequota
          - state_service
          - state_persistentvolume
          - state_persistentvolumeclaim
          - state_storageclass
          - event
Since you're using Metricbeat, you have another way to do this which is much better.
Simply configure your elasticsearch output like this:
output.elasticsearch:
  hosts: ["http://<host>:<port>"]
  indices:
    - index: "%{[kubernetes.namespace]}"
      mappings:
        dev: "metrics-dev"
      default: "metrics-default"
or like this:
output.elasticsearch:
  hosts: ["http://<host>:<port>"]
  indices:
    - index: "metrics-%{[kubernetes.namespace]}"
      when.equals:
        kubernetes.namespace: "dev"
      default: "metrics-default"
or simply like this, which would also work if you have plenty of different namespaces and don't want to manage different mappings:
output.elasticsearch:
  hosts: ["http://<host>:<port>"]
  index: "metrics-%{[kubernetes.namespace]}"
Steps to create data streams in the Elastic Stack:
Create an ILM policy.
Create an index template whose index pattern matches the index pattern of the metrics/logs, and set the number of primary/replica shards and the mapping in the index template (see the sketch below).
Set a condition in the ingest pipeline (make sure no index with that name already exists).
If these conditions are met, a data stream is created, the logs/metrics get an index starting with .ds-, and it is hidden in index management.
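For the index template step, a composable index template with a data stream definition could look like the following sketch (the template name, index pattern, ILM policy name and shard counts are illustrative):
PUT _index_template/metrics-template
{
  "index_patterns": ["metrics-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "metrics-ilm-policy"
    }
  }
}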
In my case the issue was that I did not have enough permissions to create a custom index. When I checked my OpenShift logs I found that Metricbeat was complaining about the missing privilege, so I granted superuser permission and then used the ingest node pipeline to set up conditional indexing:
PUT _ingest/pipeline/metrics-index
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "metrics-{{kubernetes.namespace}}",
        "if": "ctx.kubernetes?.namespace==\"dev\""
      }
    }
  ]
}
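To verify that the condition fires, the pipeline can be exercised with the simulate API; the sample document below is illustrative:
POST _ingest/pipeline/metrics-index/_simulate
{
  "docs": [
    {
      "_source": {
        "kubernetes": {
          "namespace": "dev"
        }
      }
    }
  ]
}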

Filtering Filebeat input with or without Logstash

In our current setup we use Filebeat to ship logs to an Elasticsearch instance. The application logs are in JSON format, and the application runs in AWS.
For some reason AWS decided to prefix the log lines in a new platform release, and now the log parsing doesn't work.
Apr 17 06:33:32 ip-172-31-35-113 web: {"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
Before it was simply:
{"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
The question is whether we can avoid using Logstash to convert the log lines into the old format. If not, how do I drop the prefix, and which filter is the best choice for this?
My current Filebeat configuration looks like this:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/web-1.log
    json.keys_under_root: true
    json.ignore_decoding_error: true
    json.overwrite_keys: true
    fields_under_root: true
    fields:
      environment: ${ENV_NAME:not_set}
      app: myapp

cloud.id: "${ELASTIC_CLOUD_ID:not_set}"
cloud.auth: "${ELASTIC_CLOUD_AUTH:not_set}"
I would try to leverage the dissect and decode_json_fields processors:
processors:
  # first ignore the preamble and only keep the JSON data
  - dissect:
      tokenizer: "%{?ignore} %{+ignore} %{+ignore} %{+ignore} %{+ignore}: %{json}"
      field: "message"
      target_prefix: ""
  # then parse the JSON data
  - decode_json_fields:
      fields: ["json"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true
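If the temporary json field should not end up in the indexed event, a drop_fields processor could be appended after the decoding step (a sketch; the json field name matches the tokenizer target above):
  - drop_fields:
      fields: ["json"]
      ignore_missing: true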
In Logstash there is a plugin called the json filter, which parses the raw log line contained in a field such as "message" (for instance).
filter {
  json {
    source => "message"
  }
}
If you do not want to include the beginning part of the line, use the dissect filter in Logstash. It would be something like this:
filter {
  dissect {
    mapping => {
      "message" => "%{}: %{message_without_prefix}"
    }
  }
}
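Putting the two together, the dissected part can then be fed into the json filter. This is just a sketch of how the two might be chained; the message_without_prefix field comes from the dissect mapping above:
filter {
  dissect {
    mapping => {
      "message" => "%{}: %{message_without_prefix}"
    }
  }
  json {
    source => "message_without_prefix"
  }
}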
Maybe these two features are available in Filebeat as well, but in my experience I prefer working with Logstash when parsing/manipulating logging data.

Can Filebeat convert log lines to JSON without Logstash in the pipeline?

We have standard log lines in our Spring Boot web applications (non-JSON).
We need to centralize our logging and ship the logs to Elasticsearch as JSON.
(I've heard that later versions can do some transformation.)
Can Filebeat read the log lines and wrap them as JSON? I guess it could append some metadata as well; there is no need to parse the log line.
Expected output:
{timestamp : "", beat: "", message: "the log line..."}
I have no code to show, unfortunately.
Filebeat supports several outputs, including Elasticsearch.
The config file filebeat.yml can look like this:
# filebeat options: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-reference-yml.html
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/../file.err.log
    processors:
      - drop_fields:
          # Prevent fail of Logstash (https://www.elastic.co/guide/en/beats/libbeat/current/breaking-changes-6.3.html#custom-template-non-versioned-indices)
          fields: ["host"]
      - dissect:
          # tokenizer syntax: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html
          tokenizer: "%{} %{} [%{}] {%{}} <%{level}> %{message}"
          field: "message"
          target_prefix: "spring boot"
    fields:
      log_type: spring_boot

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
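To try the config locally, Filebeat can be run in the foreground with logging to stderr (assuming the file above is saved as filebeat.yml):
filebeat -e -c filebeat.yml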
Well, it seems to do it by default. This is my result when I tried it locally reading log lines; it wraps the line exactly like I wanted.
{
  "@timestamp": "2019-06-12T11:11:49.094Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.2.4"
  },
  "message": "the log line...",
  "source": "/Users/myusername/tmp/hej.log",
  "offset": 721,
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "my-macbook.local",
    "hostname": "my-macbook.local",
    "version": "6.2.4"
  }
}
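For reference, a minimal filebeat.yml that produces this kind of event is sketched below; the input path matches the source field above, and the Elasticsearch host is an assumption:
filebeat.inputs:
  - type: log
    paths:
      - /Users/myusername/tmp/hej.log

output.elasticsearch:
  hosts: ["http://localhost:9200"]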

Search Guard 5 - ][WARN ][c.f.s.c.PrivilegesEvaluator] Can not handle composite request

I have just installed Search Guard, version 5.6.9-19.1, on Elasticsearch 5.6.9 to build a PoC, and I'm getting two types of warning messages in the Elasticsearch log.
I am using the default admin roles and permissions to make the requests. The elasticsearch.yml, sg_roles.yml and sg_roles_mapping.yml files are below.
**elasticsearch.yml**
searchguard.ssl.transport.keystore_filepath: CN=dev-keystore.jks
searchguard.ssl.transport.keystore_password:
searchguard.ssl.transport.truststore_filepath: truststore.jks
searchguard.ssl.transport.truststore_password:
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.http.enabled: true
searchguard.ssl.http.keystore_filepath: CN=dev-keystore.jks
searchguard.ssl.http.keystore_password:
searchguard.ssl.http.truststore_filepath: truststore.jks
searchguard.ssl.http.truststore_password:
searchguard.authcz.admin_dn:
- CN=sgadmin
**sg_roles.yml**
sg_all_access:
  cluster:
    - UNLIMITED
  indices:
    '*':
      '*':
        - UNLIMITED
  tenants:
    adm_tenant: RW
    test_tenant_ro: RO
**sg_roles_mapping.yml**
sg_all_access:
  users:
    - sgadmin
    - admin
The requests that I made were in the Kibana console:
GET /_msearch/template
{"index":"rt", "_type" : "rt-type"}
{"id": "getState","params": {"Key": "Issuer:9972"}}
{"index":"history", "_type" : "history-type"}
{"id": "getDaily","params": {"Key": "Issuer:9971","from": "2018-07-30T00:00:00"}}
The log message in the Elasticsearch log:
[WARN ][c.f.s.c.PrivilegesEvaluator] Can not handle composite request of type 'org.elasticsearch.script.mustache.MultiSearchTemplateRequest'for indices:data/read/msearch/template here
==========================================================================
GET rt/rt-type/_search/template
{"id": "searchKey","params": {"Key": "Issuer:9971"}}
The log message in the Elasticsearch log:
[WARN ][c.f.s.c.PrivilegesEvaluator] Can not handle composite request of type 'org.elasticsearch.script.mustache.SearchTemplateRequest'for indices:data/read/search/template here
getState, getDaily and searchKey are templates.
What does that mean? Is there any config missing? How can I avoid these messages?
Thank you!
