Fluentd High Availability Custom Index - elasticsearch

I've set up a fluentd/elasticsearch/kibana stack very similar to what is described here. When I look at the logs in Kibana I notice that they are automatically indexed by day using the format "logstash-[YYYY].[MM].[DD]". Based on the documentation for the fluentd elasticsearch plugin, it seems that you can create a custom index by setting the "index_name" property.
I've tried this on both the log forwarder and the log aggregator, but I still get the default index name in elasticsearch. Is there something else required to customize the index name in an HA setup?
Here is the log forwarder config:
<source>
type tail
path /var/log/debug.json
pos_file /var/log/debug.pos
tag asdf
format json
index_name fluentd
time_key time_field
</source>
<match *>
type copy
<store>
type stdout
</store>
<store>
type forward
flush_interval 10s
<server>
host [fluentd aggregator]
</server>
</store>
</match>
And here is the log aggregator config:
<source>
type forward
port 24224
bind 0.0.0.0
</source>
<match *>
type copy
<store>
type stdout
</store>
<store>
type elasticsearch
host localhost
port 9200
index_name fluentd
type_name fluentd
logstash_format true
include_tag_key true
flush_interval 10s # for testing
</store>
</match>

I found an issue on the fluent-plugin-elasticsearch repo that explains this behavior: when the "logstash_format" option is set to true, the "index_name" field is ignored.

Remove "logstash_format true" from the elasticsearch match block and you will get your custom index, but you will not get a timestamp in your data. To get the timestamp back you have to update your version of Ruby and then pass a time format to the fluentd config file.
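Alternatively, newer versions of fluent-plugin-elasticsearch let you keep "logstash_format true" (and therefore the timestamp handling) while still customizing the index, via the "logstash_prefix" option. A sketch for the aggregator's elasticsearch store, assuming a plugin version that supports this option:

```
<store>
  type elasticsearch
  host localhost
  port 9200
  logstash_format true
  # replaces the default "logstash" prefix; the daily date suffix is kept,
  # so the index becomes fluentd-YYYY.MM.DD
  logstash_prefix fluentd
  include_tag_key true
  flush_interval 10s
</store>
```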

Fluentd seems to be working but no logs in Kibana

I have a Kubernetes pod consisting of two containers - main app (writes logs to file on volume) and Fluentd sidecar that tails log file and writes to Elasticsearch.
Here is the Fluentd configuration:
<source>
type tail
format none
path /test/log/system.log
pos_file /test/log/system.log.pos
tag anm
</source>
<match **>
@id elasticsearch
@type elasticsearch
@log_level debug
time_key @timestamp
include_timestamp true
include_tag_key true
host elasticsearch-logging.kube-system.svc.cluster.local
port 9200
logstash_format true
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 8
overflow_action block
</buffer>
</match>
Everything appears to be working: the Elasticsearch host and port are correct, since the API responds correctly on that URL. In Kibana I only see records, every 5 seconds, about Fluentd creating a new chunk:
2018-12-03 12:15:50 +0000 [debug]: #0 [elasticsearch] Created new chunk chunk_id="57c1d1c105bcc60d2e2e671dfa5bef04" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="anm", variables=nil>
but no actual logs in Kibana (the ones that are being written by the app to system.log file). Kibana is configured to the "logstash-*" index pattern that matches the one and only existing index.
Version of Fluentd image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
Version of Elasticsearch: k8s.gcr.io/elasticsearch:v6.3.0
Where can I check to find out what's wrong? Looks like Fluentd does not get to put the logs into Elasticsearch, but what can be the reason?
The answer turned out to be embarrassingly simple, maybe will help someone in the future.
I figured the problem was with this source config line:
<source>
...
format none
...
</source>
That meant that none of the usual tags were added when the records were saved to elasticsearch (e.g. pod or container name), and I had to search for these records in Kibana in a completely different way. For instance, I used my own tag to search for those records and found them. The custom tag was originally added just in case, but turned out to be very useful:
<source>
...
tag anm
...
</source>
So, the final takeaway could be the following: use "format none" with caution. If the source data really is unstructured, add your own tags, and possibly enrich the records with additional fields (e.g. "hostname") using fluentd's record_transformer, which I ended up doing as well. Then it is much easier to locate the records via Kibana.
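For reference, a minimal sketch of that record_transformer enrichment; the filter pattern matches the "anm" tag used above, and the added field name is an assumption:

```
<filter anm>
  @type record_transformer
  <record>
    # embedded Ruby here is expanded when the config is loaded,
    # stamping every record with the node's hostname
    hostname "#{Socket.gethostname}"
  </record>
</filter>
```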

Add additional field in fluentd for windows logs

We have Windows AD logs that we send to ElasticSearch (winlogbeat->fluentd->Elasticsearch).
Is it possible to add an additional field in fluentd based on a regexp of another field?
What I want to do:
If I have field event_data.TargetUserName=PC-NAME$ -> I add field event_data.logonType=Computer
If I have field event_data.TargetUserName=Username -> I add field event_data.logonType=Human
And then send it to Elasticsearch.
One trick is to match data ending with '$' with a regexp, and the other is to add the new field.
Can anyone tell me if this is possible?
Here is part of my fluentd conf file for windows logs (it's very simple):
<match winserver.**>
@type elasticsearch
hosts http://elasticsearch.test:9200
logstash_format true
time_key ttw
time_key_format "%Y-%m-%dT%H:%M:%S.%L%z"
remove_keys ttw
logstash_prefix winserver.test
request_timeout 15s
<buffer>
@type memory
flush_interval 10s
</buffer>
</match>
Thank you!
You can use the built-in filter_record_transformer for this purpose.
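A sketch of what that filter could look like. It assumes the winlogbeat fields arrive as flat dotted keys (i.e. a record key literally named "event_data.TargetUserName"); if event_data is a nested hash in your records, use record["event_data"]["TargetUserName"] instead:

```
<filter winserver.**>
  @type record_transformer
  enable_ruby true
  <record>
    # AD machine accounts end with '$' -> Computer; everything else -> Human
    event_data.logonType ${record["event_data.TargetUserName"].to_s.end_with?("$") ? "Computer" : "Human"}
  </record>
</filter>
```

Place this filter before your <match winserver.**> block so the field is added before the record is sent to Elasticsearch.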

fluentd aggregator not getting logs from forwarder, is the config correct?

I am currently trying to set up a system of forwarder and aggregator instances of fluentd.
My forwarder config is ->
<source>
@type tail
format json
path /app/node-apps/something-fe/logs/itt-logs.log
tag td.pm2.access
</source>
## File output
## match tag=local.** and write to file
<match td.pm2.*>
@type file
path /opt/td-agent/forward/log/fluentd.log
</match>
## Forwarding
## match tag=system.** and forward to another td-agent server
<match td.pm2.*>
@type forward
host <hostname>.bc
</match>
Doing this I can see that the forwarder is writing log files to the file output location: /opt/td-agent/forward/log/fluentd.log
All good so far!!!
But when I try to import this into the aggregator via the match/forward config above, I do not get anything on the aggregator machines.
Here is the aggregator config for fluentd that I am using:
<source>
@type forward
port 24224
</source>
<match td.pm2.*>
type copy
<store>
@type file
path /opt/td-agent/log/forward.log
</store>
<store>
type elasticsearch
host <aggreatorhost>.bc
port 9200
logstash_format true
flush_interval 10s
</store>
</match>
I am trying to use a store to copy the logs over there and also forward them to elasticsearch.
Forgetting elasticsearch altogether, it seems that even the logs are not getting populated from the forwarder to the aggregator.
Am I doing something wrong? The aggregator logs say that it is listening on all addresses on port 24224.
On your forwarder you have two identical match patterns, and only the first match is executed (the config is evaluated top to bottom). The logs are being written to the file system (/opt/td-agent/forward/log/fluentd.log) but never forwarded to the aggregator.
You've actually used the correct copy syntax on the aggregator; copy it into your forwarder and replace the elasticsearch store with the forward config pointing to the aggregator:
<match td.pm2.*>
type copy
<store>
@type file
path /opt/td-agent/log/forward.log
</store>
<store>
@type forward
host <hostname>.bc
</store>
</match>
Further reading: http://docs.fluentd.org/articles/out_copy
I think there is a typo: there has to be an "@"-symbol before "type copy" in the second line of the code above, i.e.
@type copy
At least, from my experience, my td-agent didn't allow a restart without the "@"-symbol, so I think it is right, though I'm not 100% sure.

fluent-plugin-elasticsearch: logstash_format true not working

I am trying to implement this:
http://www.tipstuff.org/2014/01/Postfix-log-centralize-and-analysis-in-realtime-with-fluentd-elasticsearch-and-kibana-part-4.html
I have everything working with this configuration:
<match mail.info>
type elasticsearch
log_level debug
index_name postfix_mail
type_name postfix_mail
</match>
But when I add logstash_format true, it does not work. I desperately need timestamp in my ES index to get Kibana to work as desired.
<match mail.info>
type elasticsearch
log_level debug
index_name postfix_mail
type_name postfix_mail
logstash_format true
</match>
I tried to add verbose logging in td-agent init script (-vv option), but I don't get anything of value there.
Any inputs to resolve this will be highly appreciated.
In your match block, I'm not seeing any connection details for the elasticsearch server (host, port). Maybe add that?
docs are here:
https://github.com/uken/fluent-plugin-elasticsearch
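A sketch with connection details added (host and port are placeholders for your setup). Note also, as discussed in the first question above, that index_name is ignored once logstash_format is true, so logstash_prefix is what carries the custom name:

```
<match mail.info>
  type elasticsearch
  log_level debug
  host localhost                 # placeholder: your elasticsearch host
  port 9200
  logstash_format true
  logstash_prefix postfix_mail   # index_name is ignored when logstash_format is true
  type_name postfix_mail
</match>
```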

get docker log stream in correct order

I've tried a couple of log collection services now, like logspout/papertrail and fluentd/elasticsearch, but the results don't always show up in the correct order, which can make debugging difficult. An example is a Node.js application where a console.log call produces multiple lines, or an error is logged with its stack trace. The lines all show up with the same timestamp, and I guess the log collection services have no way to know which order to display them in. Is there a way to add millisecond precision? Or some other way to make sure they are displayed in the same order as in the output of a docker logs command?
Update: I haven't looked into it, but I saw something about fluentd or elasticsearch supporting millisecond-or-better accuracy by default in a newer version.
In my understanding, you have two options:
Increase the timestamp precision (like you did); or
Use log storage which can maintain the order of data, for example MongoDB. The log collection concept is described in another Stack Overflow post.
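For option 1 on a newer stack: Fluentd v0.14+ stores event time with nanosecond resolution internally, so it should be enough to parse the subseconds at the source. A sketch, assuming your app writes JSON logs with an ISO-8601 "time" field (paths are placeholders):

```
<source>
  @type tail
  format json
  path /var/log/app.log            # placeholder path
  pos_file /var/log/app.log.pos
  tag app
  time_key time
  # %L parses milliseconds; on Fluentd v0.14+ the precision is preserved end to end
  time_format %Y-%m-%dT%H:%M:%S.%L%z
</source>
```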
I found a workaround for fluentd in this answer, though I'd still like a real solution
Here is my modified td-agent.conf, for use in the fluentd-es-image. It adds a time_nano field, which can be sorted on:
<source>
type tail
format json
time_key time
path /varlog/containers/*.log
pos_file /varlog/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%L%Z
tag cleanup.reform.*
read_from_head true
</source>
<match cleanup.**>
type record_reformer
time_nano ${t = Time.now; ((t.to_i * 1000000000) + t.nsec).to_s}
tag ${tag_suffix[1]}
</match>
<match reform.**>
type record_reformer
enable_ruby true
tag kubernetes.${tag_suffix[3].split('-')[0..-2].join('-')}
</match>
<match kubernetes.**>
type elasticsearch
log_level info
include_tag_key true
host elasticsearch-logging.default
port 9200
logstash_format true
flush_interval 5s
# Never wait longer than 5 minutes between retries.
max_retry_wait 300
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
</match>
<source>
type tail
format none
path /varlog/kubelet.log
pos_file /varlog/es-kubelet.log.pos
tag kubelet
</source>
<match kubelet>
type elasticsearch
log_level info
include_tag_key true
host elasticsearch-logging.default
port 9200
logstash_format true
flush_interval 5s
# Never wait longer than 5 minutes between retries.
max_retry_wait 300
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
</match>
