Ship filebeat logs to logstash to index with docker metadata - elasticsearch

I am trying to index into Elasticsearch with the help of Filebeat and Logstash. Here is the filebeat.yml:
filebeat.inputs:
- type: docker
  combine_partial: true
  containers:
    path: "/usr/share/dockerlogs/data"
    stream: "stdout"
    ids:
      - "*"
  exclude_files: ['\.gz$']
  ignore_older: 10m
processors:
# decode the log field (sub JSON document) if JSON encoded, then map its fields to elasticsearch fields
- decode_json_fields:
    fields: ["log", "message"]
    target: ""
    # overwrite existing target elasticsearch fields while decoding json fields
    overwrite_keys: true
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
# setup filebeat to send output to logstash
output.logstash:
  hosts: ["xxx.xx.xx.xx:5044"]
# Write Filebeat own logs only to file to avoid catching them with itself in docker log files
logging.level: info
logging.to_files: false
logging.to_syslog: false
logging.metrics.enabled: false
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
ssl.verification_mode: none
And here is the logstash.conf:
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}
output {
  stdout {
    codec => dots
  }
  elasticsearch {
    hosts => "http://xxx.xx.xx.x:9200"
    index => "%{[docker][container][labels][com][docker][swarm][service][name]}-%{+xxxx.ww}"
  }
}
I am trying to name the index after the Docker service so that it is more readable than the usual "filebeat-xxxxxx.some-date" pattern.
I tried several things:
- index => "%{[docker][container][labels][com][docker][swarm][service][name]}-%{+xxxx.ww}"
- index => "%{[docker][container][labels][com][docker][swarm][service][name]}-%{+YYYY.MM}"
- index => "%{[docker][swarm][service][name]}-%{+xxxx.ww}"
But none of them worked. What am I doing wrong? I may also be missing something in the filebeat.yml file.
Thanks for any help or any lead.

Looks like you're unsure of what docker metadata fields are being added. It might be a good idea to just get successful indexing first with the default index name (ex. "filebeat-xxxxxx.some-date" or whatever) and then view the log events to see the format of your docker metadata fields.
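One way to look at those fields, as a minimal sketch: temporarily add a stdout output with the rubydebug codec to the Logstash pipeline and read the printed events. The metadata option is included on the assumption that you also want to see [@metadata], which is hidden by default.

```
output {
  # Temporary debugging output: prints each event in full so the exact
  # nesting of the docker metadata fields can be read off.
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```

Once the field names are known, this output can be removed again or switched back to the dots codec.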
I don't have the same setup as you, but for reference, I'm on AWS ECS, so the format of my docker fields is:
"docker": {
  "container": {
    "name": "",
    "labels": {
      "com": {
        "amazonaws": {
          "ecs": {
            "cluster": "",
            "container-name": "",
            "task-definition-family": "",
            "task-arn": "",
            "task-definition-version": ""
          }
        }
      }
    },
    "image": "",
    "id": ""
  }
}
After seeing the format and fields available, I was able to add a custom "application_name" field using the above. This field is being generated in my input plugin which is redis in my case, but all input plugins should have the add_field option (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html#plugins-inputs-beats-add_field):
input {
  redis {
    host => "***"
    data_type => "list"
    key => "***"
    codec => json
    add_field => {
      "application_name" => "%{[docker][container][labels][com][amazonaws][ecs][task-definition-family]}"
    }
  }
}
After getting this new custom field, I was able to run specific filters (grok, json, kv, etc.) for different "application_name" values, as they had different log formats. The important part for you is that you can use it in your output to Elasticsearch for index names:
output {
  elasticsearch {
    user => ***
    password => ***
    hosts => [ "***" ]
    index => "logstash-%{application_name}-%{+YYYY.MM.dd}"
  }
}
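Applied to the Swarm setup in the question, the same idea might look like the sketch below. It assumes the service name really does arrive under the field path used in the question, which should be checked against a printed event first. The [@metadata][index_prefix] field name and the "filebeat" fallback are hypothetical choices for illustration.

```
filter {
  if [docker][container][labels][com][docker][swarm][service][name] {
    # Copy the service name into a metadata field (not indexed itself).
    mutate {
      add_field => { "[@metadata][index_prefix]" => "%{[docker][container][labels][com][docker][swarm][service][name]}" }
    }
    # Elasticsearch index names must be lowercase.
    mutate {
      lowercase => [ "[@metadata][index_prefix]" ]
    }
  } else {
    # Fallback for events without the label, so the index name never
    # contains an unresolved %{...} reference.
    mutate {
      add_field => { "[@metadata][index_prefix]" => "filebeat" }
    }
  }
}
output {
  elasticsearch {
    hosts => "http://xxx.xx.xx.x:9200"
    index => "%{[@metadata][index_prefix]}-%{+YYYY.MM.dd}"
  }
}
```

The two mutate blocks are kept separate so that the lowercase step runs only after the field has been added.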

Related

Filebeat - Logstash - Multiple Config Files - Duplicate data

I am new to logstash and filebeat. I am trying to set up multiple config files for my logstash instance.
I am using Filebeat to send data to Logstash. Even though I have created filters in both Logstash config files, I am getting duplicate data.
Logstash config file - 1:
input {
  beats {
    port => 5045
  }
}
filter {
  if [fields][env] == "prod" {
    grok {
      match => { "message" => "%{LOGLEVEL:loglevel}] %{GREEDYDATA:message}$" }
      overwrite => [ "message" ]
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["https://172.17.0.2:9200"]
    index => "logstash-myapp-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "password"
    ssl => true
    cacert => "/usr/share/logstash/certs/http_ca.crt"
  }
}
Logstash config file - 2:
input {
  beats {
    port => 5044
  }
}
filter {
  if [fields][env] == "dev" {
    grok {
      match => { "message" => "%{LOGLEVEL:loglevel}] %{GREEDYDATA:message}$" }
      overwrite => [ "message" ]
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["https://172.17.0.2:9200"]
    index => "logstash-myapp-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "password"
    ssl => true
    cacert => "/usr/share/logstash/certs/http_ca.crt"
  }
}
Logfile Content:
[INFO] First Line
[INFO] Second Line
[INFO] Third Line
Filebeat config:
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /root/data/logs/*.log
  fields:
    app: test
    env: dev
output.logstash:
  # The Logstash hosts
  hosts: ["172.17.0.4:5044"]
I know that even if we have multiple files for config, logstash processes each and every line of the data against all the filters present in all the config files. Hence we have put filters in each of the config files for "fields.env".
I am expecting 3 lines to be sent to Elasticsearch because "fields.env" is "dev", but it is sending 6 lines to Elasticsearch and duplicate data.
Please help.
The problem is that your two configuration files get merged into a single pipeline: not only the filters but also the outputs.
So each log line that enters the pipeline through either input goes through all filters (barring any conditions, of course) and through all outputs (yours are not guarded by any condition).
So the first log line, [INFO] First Line, coming in on port 5044, only goes through the filter guarded by [fields][env] == "dev", but then goes through both elasticsearch outputs, which is why it ends up twice in your ES.
The easy solution is to remove the output section from one of the configuration files, so that log lines only go through a single output.
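As a sketch of that easy fix, config file 2 from the question could be reduced to its input and filter, leaving the output section of config file 1 as the only one in the merged pipeline. Which of the two files keeps the output is an arbitrary choice here.

```
# Logstash config file 2, with its output section removed
input {
  beats {
    port => 5044
  }
}
filter {
  if [fields][env] == "dev" {
    grok {
      match => { "message" => "%{LOGLEVEL:loglevel}] %{GREEDYDATA:message}$" }
      overwrite => [ "message" ]
    }
  }
}
# No output block here: events from both inputs are written once,
# by the output block that remains in config file 1.
```

With this change the three dev lines should reach Elasticsearch once each, because only one elasticsearch output is left in the pipeline.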
The better solution is to create separate pipelines.

Logstash and filebeat set event.dataset value

I have a configuration in which Filebeat fetches logs from some files (using a custom format) and sends those logs to a Logstash instance.
In Logstash I apply a grok filter to split out some of the fields, and then I send the output to my Elasticsearch instance.
The pipeline works fine and the events are correctly loaded into Elasticsearch, but no event data is present (such as event.dataset or event.module). So I'm looking for the piece of configuration that adds such information to my events.
Here is my Filebeat configuration:
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false
filebeat.inputs:
- type: log
  paths:
    - /var/log/*/info.log
    - /var/log/*/warning.log
    - /var/log/*/error.log
output.logstash:
  hosts: '${ELK_HOST:logstash}:5044'
Here is my Logstash pipeline:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "MY PATTERN" }
  }
  mutate {
    add_field => { "logLevelLower" => "%{logLevel}" }
  }
  mutate {
    lowercase => [ "logLevelLower" ]
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    user => "USER"
    password => "PASSWORD"
    index => "%{[@metadata][beat]}-%{logLevelLower}-%{[@metadata][version]}"
  }
}
You can do it like this easily with a mutate/add_field filter:
filter {
  mutate {
    add_field => {
      "[ecs][version]" => "1.5.0"
      "[event][kind]" => "event"
      "[event][category]" => "host"
      "[event][type]" => ["info"]
      "[event][dataset]" => "module.dataset"
    }
  }
}
The Elastic Common Schema documentation explains how to pick values for kind, category and type.

Elasticsearch + Filebeat + Logstash

I am new to the Elastic Stack and recently tried to ship logs to it, but I observed a strange issue.
Could someone please suggest what is wrong with the configuration?
filebeat.yml
filebeat.inputs:
- type: log
  paths:
    #- /var/log/*.log
    - D:\apps\logs\RGGYSLT-0473\learnings-elasticsearch\*.log
  multiline.pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
logstash.conf
input {
  beats {
    type => "v1-elasticsearch"
    host => "127.0.0.1"
    port => "5044"
  }
}
filter {
  if [type] == "v1-elasticsearch" {
    # If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
    if [message] =~ "\tat" {
      grok {
        match => ["message", "^(\tat)"]
        add_tag => ["stacktrace"]
      }
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  # Sending properly parsed log events to elasticsearch
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "dhisco-learnings-elasticseach-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    #user => "elastic"
    #password => "changeme"
  }
}
Kibana output -
Jun 7, 2020 @ 23:58:58.067 2020-06-07 23:58:48,480 88900 [http-nio-9090-exec-2] INFO c.d.l.e.web.HotelController - Brand: RADISSON
2020-06-07 23:58:49,297 88900 [http-nio-9090-exec-3] INFO c.d.l.e.web.HotelController - Brand: RADISSON
I hit my controller twice, but unfortunately, both logs got concatenated and shown against the same timestamp.
Could someone please suggest a fix?
I did the same thing using json encoder and never faced any issues.

How to set kibana index pattern from filebeat?

I am using the ELK stack with a Node application. I send logs from the host to Logstash with Filebeat; Logstash formats the data and sends it to Elasticsearch, and Kibana reads from Elasticsearch. In Kibana I see the default index pattern, like filebeat-2019.06.16.
I want to change this to application-name-filebeat-2019.06.16, but it's not working. I am looking for a way to do it in Filebeat, since there will be multiple applications/Filebeats but one single Logstash/Elasticsearch/Kibana.
I have tried this configuration in filebeat.yml:
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
  fields:
    - app_name: myapp
output.logstash:
  index: "%{fields.app_name}-filebeat-%{[agent.version]}-%{+yyyy.MM.dd}"
  hosts: ["${ELK_ENDPOINT}"]
  ssl.enabled: true
  ssl:
    certificate_authorities:
      - /etc/pki/tls/certs/logstash-beats.crt
setup.template.name: "%{fields.app_name}-filebeat-%{[agent.version]}"
The same kind of file will be used on each Node application host with its Filebeat.
Logstash is initialized with these configs:
02-beats-input.conf
input {
  beats {
    port => 5044
    codec => "json"
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-beats.crt"
    ssl_key => "/etc/pki/tls/private/logstash-beats.key"
  }
}
30-output.conf
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
It is generating an index pattern like filebeat-2019.06.16. I want something like application-name-filebeat-2019.06.16.
You are sending your Filebeat logs to Logstash, so you need to define the index name in the Logstash pipeline, not in the Filebeat config file.
Try the following output:
output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "%{[fields][app_name]}-%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
To set the index name on filebeat you would need to send the logs directly to elasticsearch.
If you have other Beats sending data to the same port and some of them do not have the field [fields][app_name], you could use a conditional in your output or create the field in your pipeline.
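The conditional variant could be sketched as follows. It assumes that events without [fields][app_name] should simply keep the default index name; hosts and manage_template are copied from the output above.

```
output {
  if [fields][app_name] {
    # Events that carry the custom field get the application-specific index.
    elasticsearch {
      hosts => ["localhost"]
      manage_template => false
      index => "%{[fields][app_name]}-%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    }
  } else {
    # Everything else keeps the default beat-based index name.
    elasticsearch {
      hosts => ["localhost"]
      manage_template => false
      index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    }
  }
}
```

Without such a guard, an event that lacks the field would end up in an index whose name contains the literal text %{[fields][app_name]}.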

Logs in Kibana are not sorted by log timestamp

I use Filebeat and Logstash to send logs to Elasticsearch. I can see my logs in Kibana, but they are not sorted correctly according to the timestamp in the log record.
I tried to create a separate field, dateTime, for the log record timestamp, but it looks like it's not possible to sort the table in Kibana by that column.
Kibana screenshot
Could anybody explain what could be a solution in this situation?
filebeat
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app.log
  fields_under_root: true
  multiline.pattern: '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}'
  multiline.negate: true
  multiline.match: after
registry_file: /var/lib/filebeat/registry
output.logstash:
  hosts: ["host_name:5044"]
  ssl.certificate_authorities: ["..."]
logstash
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "..."
    ssl_key => "..."
  }
}
filter {
  if [type] == "filebeat" {
    grok {
      match => { "message" => "(?<dateTime>[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})" }
    }
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
This functionality is exactly what the date filter is made for. Add this after your grok expression:
date {
  match => [ "dateTime", "HH:mm:ss,SSS" ]
}
This will set the @timestamp field, which Kibana sorts by, to the value in this field. Note the comma before SSS: the format has to match the separator that your grok pattern captures.
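Placed into the filter from the question, this might look like the sketch below. The date format uses a comma before the milliseconds because the grok expression captures values such as 12:34:56,789. Setting target is optional, since @timestamp is the default, and is shown only to make the effect explicit.

```
filter {
  if [type] == "filebeat" {
    grok {
      match => { "message" => "(?<dateTime>[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})" }
    }
    # Parse the captured time and write it to @timestamp.
    # Events that fail to parse are tagged _dateparsefailure.
    date {
      match => [ "dateTime", "HH:mm:ss,SSS" ]
      target => "@timestamp"
    }
  }
}
```

Searching Kibana for the _dateparsefailure tag is a quick way to check whether the format string matches the log lines.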

Resources