Set Fluentd Elasticsearch index dynamically

I'm trying to forward logs to Elasticsearch and got stuck setting the index dynamically (from a field in the input data).
My input data is JSON and always has the key "es_idx". I want to forward each record to the Elasticsearch index named by that key, with a timestamp appended to it; I use logstash_format true to get the timestamp suffix and logstash_prefix to set an index name other than "fluentd".
This is what my Fluentd config looks like:
# fluentd/conf/fluent.conf
<source>
type stdin
# Input pattern. It depends on Parser plugin
format json
# Optional. default is stdin.events
</source>
<match *.**>
@type copy
<store>
@type stdout
</store>
<store>
@type elasticsearch
host <es-host>
port <es-port>
logstash_format true
logstash_prefix ${$record["es_idx"]}
type_name fluentd
flush_interval 5s
</store>
</match>
When using the following input {"tenant_id":"test","es_idx":"blabla"}, I'm getting the following error:
2020-05-27 10:38:06 +0300 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch" location=nil tag="stdin.events" time=2020-05-27 10:37:59.498450000 +0300 record={"tenant_id"=>"test", "es_idx"=>"blabla"}
If I set logstash_prefix to a static string instead, e.g. "logstash_prefix blabla", it works fine.
Does anyone have a clue what the issue might be?

To set the Elasticsearch index dynamically, you need to use chunk keys, as described here.
In your case, you may need a config like this:
<match *.**>
@type copy
<store>
@type stdout
</store>
<store>
@type elasticsearch
host <es-host>
port <es-port>
logstash_format true
logstash_prefix ${es_idx}
logstash_dateformat %Y%m%d
type_name fluentd
flush_interval 5s
<buffer es_idx>
@type file
path /fluentd/log/elastic-buffer
flush_thread_count 8
flush_interval 1s
chunk_limit_size 32M
queue_limit_length 4
flush_mode interval
retry_max_interval 30
retry_forever true
</buffer>
</store>
</match>
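With es_idx declared as a chunk key in the buffer section, events are grouped into chunks by that field and ${es_idx} is resolved per chunk. An input event like
{"tenant_id":"test","es_idx":"blabla"}
should then end up in an index such as blabla-20200527 (logstash_prefix plus the logstash_dateformat suffix).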
Another option is to use the elasticsearch_dynamic output type:
<match my.logs.*>
@type elasticsearch_dynamic
hosts ${record['host1']}:9200,${record['host2']}:9200
index_name my_index.${Time.at(time).getutc.strftime(@logstash_dateformat)}
logstash_prefix ${tag_parts[3]}
port ${9200+rand(4)}
index_name ${tag_parts[2]}-${Time.at(time).getutc.strftime(@logstash_dateformat)}
</match>

I succeeded in getting the value from the record object like this:
<match *.**>
@type copy
<store>
@type stdout
</store>
<store>
@type elasticsearch
@log_level debug
host <host>
logstash_format true
logstash_prefix ${es_index_pattern}
type_name fluentd
flush_interval 5s
<buffer tag, es_index_pattern>
@type memory
</buffer>
</store>
</match>
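If some events might be missing that field, placeholder expansion for the chunk key will fail. A minimal sketch (the filter below and the "default-idx" fallback are illustrative assumptions, not part of the original setup) that guarantees es_index_pattern exists before the match:

# Hypothetical filter: derive es_index_pattern from es_idx, falling back to a
# default value when the field is missing (record_transformer ships with Fluentd).
<filter *.**>
  @type record_transformer
  enable_ruby true
  <record>
    es_index_pattern ${record["es_idx"] || "default-idx"}
  </record>
</filter>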

Related

Log drops when my Fluentd pod utilization is near ~1 core

I am getting log drops when my Fluentd pod utilization is near ~1 core.
I'm running Fluentd in Kubernetes (EKS) to send my application stdout/stderr logs to Elasticsearch, and logs are dropped when the ingestion rate is high and the pod's CPU utilization reaches 1 core.
Fluentd helm charts link
https://github.com/anup1384/helm-charts/tree/master/stable/fluentd-ds
Fluentd version v1.14.3
ES version- 7.15.0
Plugin version -
elasticsearch (7.15.0)
elasticsearch-api (7.15.0)
elasticsearch-transport (7.15.0)
elasticsearch-xpack (7.15.0)
fluent-plugin-elasticsearch (5.1.5, 5.1.4)
My fluentd-config
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/perf*.log
pos_file /var/log/fluentd-containers.log.pos
tag k8.*
read_from_head true
<parse>
@type json
time_key @timestamp
time_format %Y-%m-%dT%H:%M:%S.%N%z
keep_time_key true
</parse>
</source>
<filter **>
@type kubernetes_metadata
skip_container_metadata "true"
</filter>
<filter **>
@type parser
@log_level info
key_name log
reserve_data true
reserve_time true
remove_key_name_field true
emit_invalid_record_to_error false
replace_invalid_sequence true
<parse>
@type json
</parse>
</filter>
<filter **>
@type record_transformer
enable_ruby true
<record>
log_json ${record["log"]}
</record>
remove_keys $.kubernetes.labels
</filter>
<filter **>
@type elasticsearch_genid
hash_id_key _hash
</filter>
<match k8.**>
@type copy
@id k8s
<store>
@type elasticsearch
@id k8s_es
@log_level debug
scheme http
host "es-acquiring-log.abc.com"
port "80"
log_es_400_reason true
logstash_format true
logstash_prefix abc-test
reconnect_on_error true
reload_on_failure true
reload_connections false
suppress_type_name true
sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
request_timeout 2147483648
compression_level best_compression
include_timestamp true
utc_index false
time_key_format "%Y-%m-%dT%H:%M:%S.%N%z"
time_key time
id_key _hash
remove_keys _hash
<buffer tag, abc-test>
@type file
flush_mode interval
flush_thread_count 8
path /var/log/fluentd-buffers/k8s.buffer
chunk_limit_size 16m
queue_limit_length 512
flush_interval 5s
overflow_action drop_oldest_chunk
retry_max_interval 30s
retry_forever false
retry_type exponential_backoff
retry_timeout 1h
retry_wait 20s
retry_max_times 5
</buffer>
</store>
</match>
I'm unable to use multi-core Fluentd. Can someone help me with the configuration and with how to run a multi-core Fluentd pod?
Thanks
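For reference, Fluentd v1 supports multi-process workers via the <system> section. A minimal sketch (the worker count of 4 and the pinned tail source are illustrative assumptions, not the poster's config):

# Minimal multi-worker sketch; pick a worker count that fits the pod's CPU request.
<system>
  workers 4
</system>

# Sources that don't support multiple workers (in_tail is one) need to be pinned
# to a single worker with a <worker N> directive:
<worker 0>
  <source>
    @type tail
    path /var/log/containers/perf*.log
    pos_file /var/log/fluentd-containers.log.pos
    tag k8.*
    read_from_head true
    <parse>
      @type json
    </parse>
  </source>
</worker>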

fluentd: ignore_repeated_log_interval and ignore_same_log_interval

I'm trying to suppress repeated logs in Fluentd using the system directives ignore_repeated_log_interval and ignore_same_log_interval, but they don't appear to have any effect.
My fluentd.conf:
## conf file
<match fluent.**>
@type null
</match>
<system>
log_level info
ignore_repeated_log_interval 5s
ignore_same_log_interval 5s
suppress_repeated_stacktrace true
</system>
<source>
@type tail
read_from_head true
#path /home/nikhil.nayak/ngp/dlog2.txt
path /home/nikhil.nayak/ngp/log.txt
pos_file /var/log/td-agent/fluentd-docker.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
format none
tag docker.*
</source>
<match **>
@type elasticsearch
@id out_es
@log_level info
include_tag_key true
logstash_format true
host
port 9600
flush_interval 5s
type_name "_doc"
logstash_prefix logstash
logstash_format true
index_name logstash
<buffer>
flush_thread_count 8
flush_interval 5s
chunk_limit_size 2M
queue_limit_length 32
retry_max_interval 30
retry_forever true
</buffer>
</match>
#<match **>
#@type stdout
#</match>
These <system> section parameters apply to Fluentd's own logging mechanism (the messages Fluentd itself writes to its log), not to the data pipelines fed from a source, so they will not suppress duplicate events coming from your tail input.

failed to flush the buffer fluentd

I am getting these errors. Data is loaded into Elasticsearch, but some records are missing in Kibana. I am seeing this in the Fluentd logs in Kubernetes:
2021-04-26 15:58:10 +0000 [warn]: #0 failed to flush the buffer. retry_time=29 next_retry_seconds=2021-04-26 15:58:43 +0000 chunk="5c0e21cad29fc91298a9d881c6bd9873" error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="Elasticsearch returned errors, retrying. Add '@log_level debug' to your config to see the full response"
2021-04-26 15:58:10 +0000 [warn]: #0 suppressed same stacktrace
Here is my fluentd conf:
fluent.conf: |
<match fluent.**>
# this tells fluentd to not output its log on stdout
@type null
</match>
# here we read the logs from Docker's containers and parse them
<source>
@type tail
path /var/log/containers/*nginx-ingress-controller*.log,/var/log/containers/*kong*.log
pos_file /var/log/nginx-containers.log.pos
@label @NGINX
tag kubernetes.*
read_from_head true
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<source>
@type tail
path /var/log/containers/*.log
exclude_path ["/var/log/containers/*nginx-ingress-controller*.log", "/var/log/containers/*kong*.log"]
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
# we use kubernetes metadata plugin to add metadatas to the log
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<label @NGINX>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<filter kubernetes.**>
@type parser
key_name log
reserve_data true
<parse>
@type regexp
expression /^(?<remote>[^ ]*(?: [^ ]* [^ ]*)?) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?: (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?<proxy_upstream_name>[^ ]*(?: \[[^ ]*\])*) (?<upstream_addr>[^ ]*(?:, [^ ]*)*) (?<upstream_response_length>[^ ]*(?:, [^ ]*)*) (?<upstream_response_time>[^ ]*(?:, [^ ]*)*) (?<upstream_status>[^ ]*(?:, [^ ]*)*) (?<req_id>[^ ]*))?)?$/
time_format %d/%b/%Y:%H:%M:%S %z
</parse>
</filter>
<match kubernetes.**>
@type elasticsearch
include_tag_key true
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'https'}"
ssl_verify false
reload_connections false
logstash_prefix k8-nginx
logstash_format true
<buffer>
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever true
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 32
overflow_action block
</buffer>
</match>
</label>
# we send the logs to Elasticsearch
<match kubernetes.**>
@type elasticsearch
include_tag_key true
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'https'}"
ssl_verify false
reload_connections false
logstash_prefix k8-logstash
logstash_format true
<buffer>
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever true
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 32
overflow_action block
</buffer>
</match>
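As the warning itself suggests, the quickest way to see why records are rejected is to raise the output plugin's log level; a minimal sketch of the lines to add inside each elasticsearch <match> block (log_es_400_reason is a fluent-plugin-elasticsearch option):

# Added for troubleshooting only:
@log_level debug        # print the full bulk response the warning refers to
log_es_400_reason true  # log why Elasticsearch rejected a document with a 400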

Fluentd JSON logs truncated/split after 16385 characters - how to concatenate?

I have deployed the Bitnami EFK stack in a K8s environment:
repository: bitnami/fluentd
tag: 1.12.1-debian-10-r0
Currently, one of the modules/applications inside my namespace is configured to generate JSON logs, and I see the logs in Kibana in JSON format.
But the logs are split/truncated after 16385 characters, so I cannot see the full log trace.
I have tested some of the concat plugins, but they don't give the expected results so far; or maybe I implemented the plugins incorrectly.
fluentd-inputs.conf: |
# Get the logs from the containers running in the node
<source>
@type tail
path /var/log/containers/*.log
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
# enrich with kubernetes metadata
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<filter kubernetes.**>
@type parser
key_name log
reserve_data true
<parse>
@type json
</parse>
</filter>
<filter kubernetes.**>
@type concat
key log
stream_identity_key @timestamp
#multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+ .*/
multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/
flush_interval 5
</filter>
fluentd-output.conf: |
<match **>
@type forward
# Elasticsearch forward
<buffer>
@type file
path /opt/bitnami/fluentd/logs/buffers/logs.buffer
total_limit_size 1024MB
chunk_limit_size 16MB
flush_mode interval
retry_type exponential_backoff
retry_timeout 30m
retry_max_interval 30
overflow_action drop_oldest_chunk
flush_thread_count 2
flush_interval 5s
</buffer>
</match>
{{- else }}
# Send the logs to the standard output
<match **>
@type stdout
</match>
{{- end }}
I am not sure, but one reason could be that inside the Fluentd configuration some plugins are already used to parse the JSON data, and maybe the new concat plugin has to be used in a different way, or configured differently?
https://github.com/fluent-plugins-nursery/fluent-plugin-concat
Can anyone please help?
Thanks
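For context: the 16385-character split is where the Docker json-file logging driver breaks long lines, and fluent-plugin-concat can rejoin the pieces by detecting the missing trailing newline on partial lines. A minimal sketch, assuming the filter is placed before the JSON parser filter and that the installed plugin version supports these options:

# Hypothetical rejoin filter: Docker writes partial lines without a trailing
# newline, so concatenate pieces until a line ending in "\n" is seen.
<filter kubernetes.**>
  @type concat
  key log
  multiline_end_regexp /\n$/
  separator ""
</filter>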

Fluentd is working but no index is being created in Elasticsearch

I have a Java app in a Kubernetes pod (it writes logs to a file on a host volume, /var/log/java-app/java.log) and use Fluentd as a DaemonSet that tails the log file and writes to Elasticsearch. Fluentd is running, but no index is being created in Elasticsearch and no index shows up in Kibana.
Here is the Fluentd configuration:
javaapp.conf: |
<source>
@type tail
path /var/log/java-app/java.log
pos_file /var/log/java-apps.log.pos
tag java.app
read_from_head true
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
# we send the logs to Elasticsearch
<match java.**>
@type elasticsearch_dynamic
@log_level info
include_tag_key true
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"
reload_connections true
logstash_format true
logstash_prefix java-app-logs
<buffer>
@type file
path /var/log/fluentd-buffers/java-app.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever true
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 32
overflow_action block
</buffer>
</match>
Fluentd version: fluent/fluentd-kubernetes-daemonset:v1.1-debian-elasticsearch
Elasticsearch version: docker.elastic.co/elasticsearch/elasticsearch:7.3.0
It looks like Fluentd never manages to put the logs into Elasticsearch.
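A minimal troubleshooting sketch, assuming fluent-plugin-elasticsearch: raising the output's own log level and logging the underlying HTTP traffic usually shows whether requests reach Elasticsearch at all and why no index gets created. Two lines that could be added inside the existing <match java.**> block:

# Troubleshooting only (remove once the index appears):
@log_level debug          # show per-request errors from the elasticsearch output
with_transporter_log true # log the HTTP requests/responses sent to Elasticsearch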
