Fluentd seems to be working but no logs in Kibana - elasticsearch

I have a Kubernetes pod consisting of two containers - main app (writes logs to file on volume) and Fluentd sidecar that tails log file and writes to Elasticsearch.
Here is the Fluentd configuration:
<source>
type tail
format none
path /test/log/system.log
pos_file /test/log/system.log.pos
tag anm
</source>
<match **>
@id elasticsearch
@type elasticsearch
@log_level debug
time_key @timestamp
include_timestamp true
include_tag_key true
host elasticsearch-logging.kube-system.svc.cluster.local
port 9200
logstash_format true
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 8
overflow_action block
</buffer>
</match>
Everything appears to be working: the Elasticsearch host and port are correct, since the API responds on that URL. But in Kibana I only see records, every 5 seconds, about Fluentd creating a new chunk:
2018-12-03 12:15:50 +0000 [debug]: #0 [elasticsearch] Created new chunk chunk_id="57c1d1c105bcc60d2e2e671dfa5bef04" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="anm", variables=nil>
but no actual logs in Kibana (the ones the app writes to the system.log file). Kibana is configured with the "logstash-*" index pattern, which matches the one and only existing index.
Version of Fluentd image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
Version of Elasticsearch: k8s.gcr.io/elasticsearch:v6.3.0
Where can I check to find out what's wrong? Looks like Fluentd does not get to put the logs into Elasticsearch, but what can be the reason?

The answer turned out to be embarrassingly simple; maybe it will help someone in the future.
The problem was with this line in the source config:
<source>
...
format none
...
</source>
That meant that none of the usual tags (e.g. pod or container name) were added when the records were saved to Elasticsearch, and I had to search for them in Kibana in a completely different way. For instance, I used my own tag to search for those records and found them right away. The custom tag had originally been added just in case, but turned out to be very useful:
<source>
...
tag anm
...
</source>
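For what it's worth, the records can also be located from the Elasticsearch side. This is just a sketch: it assumes include_tag_key true stores the Fluentd tag in a field named tag (the plugin's default field name) and reuses the host from the config above; run it from a pod inside the cluster, since the hostname is a cluster-internal Service:
# count documents carrying the custom Fluentd tag in the logstash-* indices
curl -s "http://elasticsearch-logging.kube-system.svc.cluster.local:9200/logstash-*/_count?q=tag:anm&pretty"
# fetch a few of them to see which fields were actually written
curl -s "http://elasticsearch-logging.kube-system.svc.cluster.local:9200/logstash-*/_search?q=tag:anm&size=3&pretty"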
So the final takeaway is this: use "format none" with caution, and if the source data really is unstructured, add your own tag, and possibly enrich the records with additional fields (e.g. "hostname") using Fluentd's record_transformer, which I ended up doing as well. That makes it much easier to locate the records in Kibana.
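For reference, a minimal record_transformer filter of the kind described above might look like the following. This is a sketch: the filter pattern reuses the custom anm tag from the source, and the added fields are purely illustrative (the hostname example comes from the record_transformer documentation):
<filter anm>
  @type record_transformer
  <record>
    # add the node's hostname to every record
    hostname "#{Socket.gethostname}"
    # any other static context that is useful to search on in Kibana
    log_source system.log
  </record>
</filter>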

Related

ElasticSearch 8 errors with Action/metadata line [1] contains an unknown parameter [_type] status:400

I am trying to set up an EFK (Elasticsearch 8, Fluentd and Kibana) stack on a K8s cluster (on-premises).
I followed this link to install Elasticsearch using Helm charts, and followed this link to install Fluentd.
Output of fluentd and elasticsearch pods
[root@ctrl01 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 1/1 Running 0 136m
[root@ctrl01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
fluentd-cnb7p 1/1 Running 0 107m
fluentd-dbxjk 1/1 Running 0 107m
However, the Fluentd log kept filling up with the following warning messages:
2021-10-18 12:13:12 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2021-10-18 12:13:42 +0000 error_class="Elasticsearch::Transport::Transport::Errors::BadRequest" error="[400] {\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Action/metadata line [1] contains an unknown parameter [_type]\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Action/metadata line [1] contains an unknown parameter [_type]\"},\"status\":400}" plugin_id="out_es"
2021-10-18 12:13:12 +0000 [warn]: suppressed same stacktrace
Conf file (tailored output)
2021-10-18 12:09:10 +0000 [info]: using configuration file: <ROOT>
<match fluent.**>
@type null
</match>
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
format json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</source>
<source>
@type tail
@id in_tail_minion
path /var/log/salt/minion
pos_file /var/log/fluentd-salt.pos
tag salt
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
</source>
I am not sure which 'type' field it refers to, and I am unable to find an example of Elasticsearch 8 match and source directives to compare against.
It seems the type field is not supported from ES 8 onwards, but I am not sure about that. Kindly let me know the reason for the error.
I faced similar errors when I tried to use Elasticsearch 8.2.3 with Fluent Bit 1.9.5. I could see that logs were being sent, but no data showed up in Kibana, so I could not create indices, and I saw the above error in the fluent-bit pod logs. I followed this GitHub issue and added Suppress_Type_Name On under the outputs: section in my fluent-bit Helm chart values.yaml file, and it worked fine after that.
[OUTPUT]
Name es
Match *
Host {{ .Values.global.backend.es.host }}
Port {{ .Values.global.backend.es.port }}
Logstash_Format Off
Retry_Limit False
Type _doc
Time_Key @timestamp
Replace_Dots On
Suppress_Type_Name On
Index {{ .Values.global.backend.es.index }}
{{ .Values.extraEntries.output }}
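For the Fluentd setup from the question above, the analogous knob is the suppress_type_name option of fluent-plugin-elasticsearch. A sketch, assuming a plugin version recent enough to support that option and an elasticsearch-master Service created by the Helm chart (both are assumptions, adjust to your deployment):
<match **>
  @type elasticsearch
  @id out_es
  host elasticsearch-master   # assumed Service name from the Elasticsearch Helm chart
  port 9200
  logstash_format true
  # stop sending the deprecated _type parameter in bulk requests, which Elasticsearch 8 rejects
  suppress_type_name true
</match>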
I was working on the same issue for a few days and found a solution, but it is just a workaround, not the optimal one.
If you set TypeName to null in ElasticsearchSinkOptions, you don't face this issue.
Unfortunately, you can't set it from appsettings.json; at least I couldn't find a way.
In the background, the Serilog.Sinks.ElasticSearch library uses this property as _type in the request. But the '_type' parameter, as leandrojmp pointed out in the comment, is no longer available in version 8.2 of Elasticsearch.

Add additional field in fluentd for windows logs

We have Windows AD logs that we send to ElasticSearch (winlogbeat->fluentd->Elasticsearch).
Is it possible to add an additional field in Fluentd based on a regexp of another field?
What I want to do:
If I have field event_data.TargetUserName=PC-NAME$ -> I add field event_data.logonType=Computer
If I have field event_data.TargetUserName=Username -> I add field event_data.logonType=Human
And then send it to Elasticserach.
One trick is to match the value against a regexp containing '$', and the other trick is to add the new field.
Can anyone tell me whether this is possible?
Here is part of my fluentd conf file for windows logs (it's very simple):
<match winserver.**>
@type elasticsearch
hosts http://elasticsearch.test:9200
logstash_format true
time_key ttw
time_key_format "%Y-%m-%dT%H:%M:%S.%L%z"
remove_keys ttw
logstash_prefix winserver.test
request_timeout 15s
<buffer>
@type memory
flush_interval 10s
</buffer>
</match>
Thank you!
You can use the built-in filter_record_transformer for this purpose.
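A sketch of such a filter follows. The field names are taken from the question; whether TargetUserName arrives as a nested key or a flat dotted key depends on how the winlogbeat event is parsed, so the record access may need adjusting, and note that record_transformer adds the new key at the top level rather than nested under event_data:
<filter winserver.**>
  @type record_transformer
  enable_ruby true
  <record>
    # AD machine accounts end with "$" (e.g. PC-NAME$); everything else is treated as a human
    logonType ${record.dig("event_data", "TargetUserName").to_s.end_with?("$") ? "Computer" : "Human"}
  </record>
</filter>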

fluentd aggregator not getting logs from forwarder, is the config correct?

I am currently trying to set up a system of forwarder and aggregator instances of Fluentd.
Here is my forwarder config:
<source>
@type tail
format json
path /app/node-apps/something-fe/logs/itt-logs.log
tag td.pm2.access
</source>
## File output
## match tag=local.** and write to file
<match td.pm2.*>
@type file
path /opt/td-agent/forward/log/fluentd.log
</match>
## Forwarding
## match tag=system.** and forward to another td-agent server
<match td.pm2.*>
@type forward
host <hostname>.bc
</match>
Doing this, I can see that the forwarder is writing log files to the forwarding location: /opt/td-agent/forward/log/fluentd.log
All good so far!
But when I try to import this into the aggregator via the match/forward blocks above, I do not get anything on the aggregator machines.
Here is the aggregator config for Fluentd that I am using:
<source>
@type forward
port 24224
</source>
<match td.pm2.*>
type copy
<store>
@type file
path /opt/td-agent/log/forward.log
</store>
<store>
type elasticsearch
host <aggreatorhost>.bc
port 9200
logstash_format true
flush_interval 10s
</store>
</match>
I am trying to use copy stores to write the logs to a file and also forward them to Elasticsearch.
Forgetting Elasticsearch altogether, it seems that the logs are not even reaching the aggregator from the forwarder.
Am I doing something wrong? The aggregator logs say that it is listening on all addresses on port 24224.
On your forwarder you have two identical match patterns, and only the first match is executed (the config is evaluated top to bottom). The logs are being written to the file system (/opt/td-agent/forward/log/fluentd.log) but never forwarded to the aggregator.
You've actually used the correct copy syntax on the aggregator; copy that into your forwarder and replace the elasticsearch store with the forward configuration pointing to the aggregator:
<match td.pm2.*>
type copy
<store>
@type file
path /opt/td-agent/log/forward.log
</store>
<store>
@type forward
host <hostname>.bc
</store>
</match>
Further reading: http://docs.fluentd.org/articles/out_copy
I think there is a typo: there has to be an '@' symbol before type copy in the second line of the code above, i.e.
@type copy
At least, from my experience, my td-agent did not allow a restart without the '@' symbol, so I think this is right, though I am not 100% sure.

Fluentd High Availability Custom Index

I've set up a fluentd/elasticsearch/kibana stack very similar to what is described here. When I look at the logs in Kibana I notice that they are automatically indexed by day using the format "logstash-[YYYY].[MM].[DD]". Based on the documentation for the Fluentd Elasticsearch plugin, it seems that you can create a custom index by setting the "index_name" property.
I've tried this on both the log forwarder and the log aggregator, but I still seem to get the default index name in Elasticsearch. Is there something else required to customize this index name in an HA setup?
Here is the log forwarder config:
<source>
type tail
path /var/log/debug.json
pos_file /var/log/debug.pos
tag asdf
format json
index_name fluentd
time_key time_field
</source>
<match *>
type copy
<store>
type stdout
</store>
<store>
type forward
flush_interval 10s
<server>
host [fluentd aggregator]
</server>
</store>
</match>
And here is the log aggregator config:
<source>
type forward
port 24224
bind 0.0.0.0
</source>
<match *>
type copy
<store>
type stdout
</store>
<store>
type elasticsearch
host localhost
port 9200
index_name fluentd
type_name fluentd
logstash_format true
include_tag_key true
flush_interval 10s # for testing
</store>
</match>
I found an issue on the fluent-plugin-elasticsearch repo that explains this behavior. When setting the "logstash_format" option to true the "index_name" field is ignored.
Remove logstash_format true from the config and you will get your custom index, but you will not get a timestamp in your data. To get the timestamp, you have to update the Ruby version and then pass a time format in the Fluentd config file.
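If the goal is to keep the daily date-suffixed indices but change their name, a sketch of an alternative (assuming fluent-plugin-elasticsearch, whose logstash_prefix option controls the index prefix) is to leave logstash_format enabled and set the prefix instead of index_name:
<store>
  type elasticsearch
  host localhost
  port 9200
  logstash_format true
  # indices become fluentd-YYYY.MM.DD instead of logstash-YYYY.MM.DD
  logstash_prefix fluentd
  include_tag_key true
  flush_interval 10s
</store>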

get docker log stream in correct order

I've tried a couple of log collection services now, like logspout/papertrail and fluentd/elasticsearch, but the results don't always show up in the correct order, which can make debugging difficult. An example is a Node.js application: a console.log call that produces multiple lines, or an error with its stack trace. The lines all show up with the same timestamp, and I guess the log collection services have no way of knowing in which order to display them. Is there a way to add millisecond precision? Or some other way to make sure they are displayed in the same order as docker logs shows them?
Update: I haven't looked into it, but I saw something about fluent or elasticsearch supporting millisecond+ accuracy by default in a newer version
In my understanding, you have 2 options:
1. Increase the timestamp precision (like you did); or
2. Use log storage that can maintain the order of the data, for example MongoDB. The log collection concept is described in another StackOverflow post.
I found a workaround for fluentd in this answer, though I'd still like a real solution
Here is my modified td-agent.conf, for use in the fluentd-es-image. It adds the time_nano field, which can be sorted on:
<source>
type tail
format json
time_key time
path /varlog/containers/*.log
pos_file /varlog/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%L%Z
tag cleanup.reform.*
read_from_head true
</source>
<match cleanup.**>
type record_reformer
time_nano ${t = Time.now; ((t.to_i * 1000000000) + t.nsec).to_s}
tag ${tag_suffix[1]}
</match>
<match reform.**>
type record_reformer
enable_ruby true
tag kubernetes.${tag_suffix[3].split('-')[0..-2].join('-')}
</match>
<match kubernetes.**>
type elasticsearch
log_level info
include_tag_key true
host elasticsearch-logging.default
port 9200
logstash_format true
flush_interval 5s
# Never wait longer than 5 minutes between retries.
max_retry_wait 300
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
</match>
<source>
type tail
format none
path /varlog/kubelet.log
pos_file /varlog/es-kubelet.log.pos
tag kubelet
</source>
<match kubelet>
type elasticsearch
log_level info
include_tag_key true
host elasticsearch-logging.default
port 9200
logstash_format true
flush_interval 5s
# Never wait longer than 5 minutes between retries.
max_retry_wait 300
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
</match>
