Nginx module for filebeats doesn't parse access logs - elasticsearch

I am using the nginx module for Filebeat to send log data to Elasticsearch. Here is my Filebeat configuration:
output:
  logstash:
    enabled: true
    hosts:
      - logstash:5044
    timeout: 15
filebeat.modules:
  - module: nginx
    access:
      enabled: true
      var.paths: ["/var/log/nginx/access.log"]
    error:
      enabled: true
      var.paths: ["/var/log/nginx/error.log"]
The problem is that logs are not parsed. This is what I see in Kibana:
{
  "_index": "filebeat-2017.07.18",
  "_type": "log",
  "_id": "AV1VLXEbhj7uWd8Fgz6M",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2017-07-18T10:10:24.791Z",
    "offset": 65136,
    "@version": "1",
    "beat": {
      "hostname": "06d09033fb23",
      "name": "06d09033fb23",
      "version": "5.5.0"
    },
    "input_type": "log",
    "host": "06d09033fb23",
    "source": "/var/log/nginx/access.log",
    "message": "10.15.129.226 - - [18/Jul/2017:12:10:21 +0200] \"POST /orders-service/orders/v1/sessions/update/FUEL_DISPENSER?api_key=vgxt5u24uqyyyd9gmxzpu9n7 HTTP/1.1\" 200 5 \"-\" \"Mashery Proxy\"",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1500372624791
    ]
  },
  "sort": [
    1500372624791
  ]
}
I am missing parsed fields, as specified in the documentation: https://www.elastic.co/guide/en/beats/filebeat/current/exported-fields-nginx.html
Why are log lines not parsed?

When you run filebeat -v -modules=nginx -setup, it will essentially create 4 things:
- mapping template
- Kibana dashboards
- machine learning job
- filters in the ingest node
Here are the filters for parsing:
- nginx access log
- nginx error log
The filters are stored in the ingest node. You can access them on:
http://YourElasticHost:9200/_ingest/pipeline
So if you want your logs parsed, you need to send them via the ingest node. Your configuration ships events to Logstash instead, so they do not pass through these pipelines.
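To check that the module's filters were actually loaded, you can inspect the response of the `_ingest/pipeline` endpoint mentioned above. The following is only a minimal sketch: the function name `nginx_pipeline_ids` and the sample pipeline IDs are assumptions made for illustration, and the real IDs depend on your Filebeat version.

```python
import json


def nginx_pipeline_ids(pipelines):
    """Return the IDs of ingest pipelines that look like nginx module pipelines.

    `pipelines` is the parsed JSON body of GET /_ingest/pipeline, which maps
    each pipeline ID to its definition.
    """
    return sorted(pid for pid in pipelines if "nginx" in pid)


# Hypothetical response body; the pipeline IDs are illustrative only.
sample_response = json.loads("""
{
  "filebeat-5.5.0-nginx-access-default": {"description": "nginx access logs", "processors": []},
  "filebeat-5.5.0-nginx-error-pipeline": {"description": "nginx error logs", "processors": []},
  "some-other-pipeline": {"description": "unrelated", "processors": []}
}
""")

print(nginx_pipeline_ids(sample_response))
```

If the list comes back empty for your cluster, the setup step probably did not load the pipelines into that Elasticsearch instance.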

Related

elasticsearch filebeat mapper_parsing_exception when using decode_json_fields

I have an ECK setup and I'm using Filebeat to ship logs from Kubernetes to Elasticsearch.
I've recently added the decode_json_fields processor to my configuration so that I can decode the JSON that is usually in the message field.
- decode_json_fields:
    fields: ["message"]
    process_array: false
    max_depth: 10
    target: "log"
    overwrite_keys: true
    add_error_key: true
However, logs have stopped appearing since I added it.
Example log:
{
"_index": "filebeat-7.9.1-2020.10.01-000001",
"_type": "_doc",
"_id": "wF9hB3UBtUOF3QRTBcts",
"_score": 1,
"_source": {
"@timestamp": "2020-10-08T08:43:18.672Z",
"kubernetes": {
"labels": {
"controller-uid": "9f3f9d08-cfd8-454d-954d-24464172fa37",
"job-name": "stream-hatchet-cron-manual-rvd"
},
"container": {
"name": "stream-hatchet-cron",
"image": "<redacted>.dkr.ecr.us-east-2.amazonaws.com/stream-hatchet:v0.1.4"
},
"node": {
"name": "ip-172-20-32-60.us-east-2.compute.internal"
},
"pod": {
"uid": "041cb6d5-5da1-4efa-b8e9-d4120409af4b",
"name": "stream-hatchet-cron-manual-rvd-bh96h"
},
"namespace": "default"
},
"ecs": {
"version": "1.5.0"
},
"host": {
"mac": [],
"hostname": "ip-172-20-32-60",
"architecture": "x86_64",
"name": "ip-172-20-32-60",
"os": {
"codename": "Core",
"platform": "centos",
"version": "7 (Core)",
"family": "redhat",
"name": "CentOS Linux",
"kernel": "4.9.0-11-amd64"
},
"containerized": false,
"ip": []
},
"cloud": {
"instance": {
"id": "i-06c9d23210956ca5c"
},
"machine": {
"type": "m5.large"
},
"region": "us-east-2",
"availability_zone": "us-east-2a",
"account": {
"id": "<redacted>"
},
"image": {
"id": "ami-09d3627b4a09f6c4c"
},
"provider": "aws"
},
"stream": "stdout",
"message": "{\"message\":{\"log_type\":\"cron\",\"status\":\"start\"},\"level\":\"info\",\"timestamp\":\"2020-10-08T08:43:18.670Z\"}",
"input": {
"type": "container"
},
"log": {
"offset": 348,
"file": {
"path": "/var/log/containers/stream-hatchet-cron-manual-rvd-bh96h_default_stream-hatchet-cron-73069980b418e2aa5e5dcfaf1a29839a6d57e697c5072fea4d6e279da0c4e6ba.log"
}
},
"agent": {
"type": "filebeat",
"version": "7.9.1",
"hostname": "ip-172-20-32-60",
"ephemeral_id": "6b3ba0bd-af7f-4946-b9c5-74f0f3e526b1",
"id": "0f7fff14-6b51-45fc-8f41-34bd04dc0bce",
"name": "ip-172-20-32-60"
}
},
"fields": {
"@timestamp": [
"2020-10-08T08:43:18.672Z"
],
"suricata.eve.timestamp": [
"2020-10-08T08:43:18.672Z"
]
}
}
In the Filebeat logs I can see the following error:
2020-10-08T09:25:43.562Z WARN [elasticsearch] elasticsearch/client.go:407 Cannot
index event
publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x36b243a0,
ext:63737745936, loc:(*time.Location)(nil)}, Meta:null,
Fields:{"agent":{"ephemeral_id":"5f8afdba-39c3-4fb7-9502-be7ef8f2d982","hostname":"ip-172-20-32-60","id":"0f7fff14-6b51-45fc-8f41-34bd04dc0bce","name":"ip-172-20-32-60","type":"filebeat","version":"7.9.1"},"cloud":{"account":{"id":"700849607999"},"availability_zone":"us-east-2a","image":{"id":"ami-09d3627b4a09f6c4c"},"instance":{"id":"i-06c9d23210956ca5c"},"machine":{"type":"m5.large"},"provider":"aws","region":"us-east-2"},"ecs":{"version":"1.5.0"},"host":{"architecture":"x86_64","containerized":false,"hostname":"ip-172-20-32-60","ip":["172.20.32.60","fe80::af:9fff:febe:dc4","172.17.0.1","100.96.1.1","fe80::6010:94ff:fe17:fbae","fe80::d869:14ff:feb0:81b3","fe80::e4f3:b9ff:fed8:e266","fe80::1c19:bcff:feb3:ce95","fe80::fc68:21ff:fe08:7f24","fe80::1cc2:daff:fe84:2a5a","fe80::3426:78ff:fe22:269a","fe80::b871:52ff:fe15:10ab","fe80::54ff:cbff:fec0:f0f","fe80::cca6:42ff:fe82:53fd","fe80::bc85:e2ff:fe5f:a60d","fe80::e05e:b2ff:fe4d:a9a0","fe80::43a:dcff:fe6a:2307","fe80::581b:20ff:fe5f:b060","fe80::4056:29ff:fe07:edf5","fe80::c8a0:5aff:febd:a1a3","fe80::74e3:feff:fe45:d9d4","fe80::9c91:5cff:fee2:c0b9"],"mac":["02:af:9f:be:0d:c4","02:42:1b:56:ee:d3","62:10:94:17:fb:ae","da:69:14:b0:81:b3","e6:f3:b9:d8:e2:66","1e:19:bc:b3:ce:95","fe:68:21:08:7f:24","1e:c2:da:84:2a:5a","36:26:78:22:26:9a","ba:71:52:15:10:ab","56:ff:cb:c0:0f:0f","ce:a6:42:82:53:fd","be:85:e2:5f:a6:0d","e2:5e:b2:4d:a9:a0","06:3a:dc:6a:23:07","5a:1b:20:5f:b0:60","42:56:29:07:ed:f5","ca:a0:5a:bd:a1:a3","76:e3:fe:45:d9:d4","9e:91:5c:e2:c0:b9"],"name":"ip-172-20-32-60","os":{"codename":"Core","family":"redhat","kernel":"4.9.0-11-amd64","name":"CentOS
Linux","platform":"centos","version":"7
(Core)"}},"input":{"type":"container"},"kubernetes":{"container":{"image":"700849607999.dkr.ecr.us-east-2.amazonaws.com/stream-hatchet:v0.1.4","name":"stream-hatchet-cron"},"labels":{"controller-uid":"a79daeac-b159-4ba7-8cb0-48afbfc0711a","job-name":"stream-hatchet-cron-manual-c5r"},"namespace":"default","node":{"name":"ip-172-20-32-60.us-east-2.compute.internal"},"pod":{"name":"stream-hatchet-cron-manual-c5r-7cx5d","uid":"3251cc33-48a9-42b1-9359-9f6e345f75b6"}},"log":{"level":"info","message":{"log_type":"cron","status":"start"},"timestamp":"2020-10-08T09:25:36.916Z"},"message":"{"message":{"log_type":"cron","status":"start"},"level":"info","timestamp":"2020-10-08T09:25:36.916Z"}","stream":"stdout"},
Private:file.State{Id:"native::30998361-66306", PrevId:"",
Finished:false, Fileinfo:(*os.fileStat)(0xc001c14dd0),
Source:"/var/log/containers/stream-hatchet-cron-manual-c5r-7cx5d_default_stream-hatchet-cron-4278d956fff8641048efeaec23b383b41f2662773602c3a7daffe7c30f62fe5a.log",
Offset:539, Timestamp:time.Time{wall:0xbfd7d4a1e556bd72,
ext:916563812286, loc:(*time.Location)(0x607c540)}, TTL:-1,
Type:"container", Meta:map[string]string(nil),
FileStateOS:file.StateOS{Inode:0x1d8ff59, Device:0x10302},
IdentifierName:"native"}, TimeSeries:false}, Flags:0x1,
Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400):
{"type":"mapper_parsing_exception","reason":"failed to parse field
[log.message] of type [keyword] in document with id
'56aHB3UBLgYb8gz801DI'. Preview of field's value: '{log_type=cron,
status=start}'","caused_by":{"type":"illegal_state_exception","reason":"Can't
get text on a START_OBJECT at 1:113"}}
It throws an error because apparently log.message is of type "keyword"; however, this field does not exist in the index mapping.
I thought this might be an issue with "target": "log", so I've tried changing it to something arbitrary like "my_parsed_message", "m_log" or "mlog", and I get the same error for all of them.
{"type":"mapper_parsing_exception","reason":"failed to parse field
[mlog.message] of type [keyword] in document with id
'J5KlDHUB_yo5bfXcn2LE'. Preview of field's value: '{log_type=cron,
status=end}'","caused_by":{"type":"illegal_state_exception","reason":"Can't
get text on a START_OBJECT at 1:217"}}
Elastic version: 7.9.2
The problem is that some of your JSON messages contain a message field that is sometimes a simple string and other times a nested JSON object (like in the case you're showing in your question).
After this index was created, the very first message that was parsed was probably a string and hence the mapping has been modified to add the following field (line 10553):
"mlog": {
  "properties": {
    ...
    "message": {
      "type": "keyword",
      "ignore_above": 1024
    },
  }
}
You'll find the same pattern for my_parsed_message (line 10902), my_parsed_logs (line 10742), etc...
Hence the next message that comes with message being a JSON object, like
{"message":{"log_type":"cron","status":"start"}, ...
will not work because it's an object, not a string...
Looking at the fields of your custom JSON, it seems you don't really have control over either their taxonomy (i.e. naming) or what they contain.
If you're serious about wanting to search within those custom fields (which I think you are, since you're parsing the field; otherwise you'd just store the stringified JSON), then I can only suggest that you start figuring out a proper taxonomy to make sure each field always gets a consistent type.
If all you care about is logging your data, then I suggest simply disabling the indexing of that message field. Another solution is to set dynamic: false in your mapping to ignore those fields, i.e. not modify your mapping.
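To find out which fields are affected before settling on a taxonomy, one option is to scan a sample of raw log lines for keys that appear with more than one JSON type. This is only a sketch under assumptions: the helper name `conflicting_fields` is made up, and the first sample line (a plain-string message) is hypothetical, standing in for the kind of line that probably created the keyword mapping.

```python
import json
from collections import defaultdict


def conflicting_fields(lines):
    """Map each top-level key to the set of Python types seen for it,
    keeping only keys that appear with more than one type."""
    seen = defaultdict(set)
    for line in lines:
        for key, value in json.loads(line).items():
            seen[key].add(type(value).__name__)
    return {key: types for key, types in seen.items() if len(types) > 1}


sample_lines = [
    # Hypothetical line where "message" is a plain string.
    '{"message": "job started", "level": "info"}',
    # Shape taken from the question, where "message" is an object.
    '{"message": {"log_type": "cron", "status": "start"}, "level": "info"}',
]

print(conflicting_fields(sample_lines))
```

Any key reported here would trigger the same mapper_parsing_exception as soon as both shapes are indexed into the same field.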

Mapping exception while executing Rollup job

I have a Kibana instance which stores log data from our Java apps in daily indexes, like logstash-java-beats-2019.09.01. Since the number of indexes could become pretty big in the future, I want to create a rollup job so that I can archive old logs in a separate index, something like logstash-java-beats-rollup. A typical document in the logstash-java-beats-2019.09.01 index looks like this:
{
"_index": "logstash-java-beats-2019.10.01",
"_type": "_doc",
"_id": "C9mfhG0Bf_Fr5GBl6kTg",
"_version": 1,
"_score": 1,
"_source": {
"@timestamp": "2019-10-01T00:02:13.756Z",
"ecs": {
"version": "1.0.0"
},
"event_timestamp": "2019-10-01 00:02:13,756",
"log": {
"offset": 5729359,
"file": {
"path": "/var/log/application-name/application.log"
}
},
"tags": [
"service-name",
"location",
"beats_input_codec_plain_applied"
],
"loglevel": "WARN",
"java_class": "java.class.name",
"message": "Log message here",
"host": {
"name": "host-name-beat"
},
"@version": "1",
"agent": {
"hostname": "host-name",
"id": "a34af368-3359-495a-9775-63502693d148",
"ephemeral_id": "cc4afd3c-ad97-47a4-bd21-72255d450232",
"type": "filebeat",
"version": "7.2.0",
"name": "host-name-beat"
},
"input": {
"type": "log"
}
}
}
So I created a rollup job with the following config:
{
"config": {
"id": "Test 2 job",
"index_pattern": "logstash-java-beats-2*",
"rollup_index": "logstash-java-beats-rollup",
"cron": "0 0 * * * ?",
"groups": {
"date_histogram": {
"fixed_interval": "1000ms",
"field": "@timestamp",
"delay": "1d",
"time_zone": "UTC"
}
},
"metrics": [],
"timeout": "20s",
"page_size": 1000
},
"status": {
"job_state": "stopped",
"current_position": {
"@timestamp.date_histogram": 1567933199000
},
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 1840,
"documents_processed": 5322525,
"rollups_indexed": 1838383,
"trigger_count": 1,
"index_time_in_ms": 1555018,
"index_total": 1839,
"index_failures": 0,
"search_time_in_ms": 59059,
"search_total": 1840,
"search_failures": 0
}
}
but it fails to roll up the data with the following exception:
Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$GTvyIZtPhKqi-dtfVd6MXg], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[1]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$v-r89eEpLvImr0lWIrOb_Q], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[2]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$quCHwZP1iVU_Bs2fmhgSjQ], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
...
The logstash-java-beats-rollup index is empty, even though some stats are available for the rollup job.
I'm using Elasticsearch v7.2.0.
Could you please explain what is wrong with the data, or with the rollup job configuration?

logstash json filter source

I cannot get the message field to decode from my JSON log line when receiving it via Filebeat.
Here is the line in my logs:
{"levelname": "WARNING", "asctime": "2016-07-01 18:06:37", "message": "One or more gateways are offline", "name": "ep.management.commands.import", "funcName": "check_gateway_online", "lineno": 103, "process": 44551, "processName": "MainProcess", "thread": 140735198597120, "threadName": "MainThread", "server": "default"}
Here is the Logstash config. I tried with and without the codec. The only difference is that the message is escaped when I use the codec.
input {
  beats {
    port => 5044
    codec => "json"
  }
}
filter {
  json {
    source => "message"
  }
}
Here is the json as it arrives in elasticsearch:
{
"_index": "filebeat-2016.07.01",
"_type": "json",
"_id": "AVWnpK519vJkh3Ry-Q9B",
"_score": null,
"_source": {
"@timestamp": "2016-07-01T18:07:13.522Z",
"beat": {
"hostname": "59b378d40b2e",
"name": "59b378d40b2e"
},
"count": 1,
"fields": null,
"input_type": "log",
"message": "{\"levelname\": \"WARNING\", \"asctime\": \"2016-07-01 18:07:12\", \"message\": \"One or more gateways are offline on server default\", \"name\": \"ep.controllers.secure_client\", \"funcName\": \"check_gateways_online\", \"lineno\": 80, \"process\": 44675, \"processName\": \"MainProcess\", \"thread\": 140735198597120, \"threadName\": \"MainThread\"}",
"offset": 251189,
"source": "/mnt/ep_logs/ep_.json",
"type": "json"
},
"fields": {
"@timestamp": [
1467396433522
]
},
"sort": [
1467396433522
]
}
What I would like is for the contents of the message object to be decoded.
Many thanks
When that happens, it's usually because your Filebeat instance is configured to send documents directly to ES.
In your filebeat configuration file, make sure to comment out the elasticsearch output.
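Once events really do pass through Logstash, the json filter with source => "message" should expand the string into separate fields. The sketch below imitates that step in Python so the expected result is easier to picture; it is an approximation and not the filter's actual implementation, and the function name `decode_message` and the trimmed sample event are assumptions for illustration.

```python
import json


def decode_message(event):
    """Return a copy of `event` with the JSON held in its "message" field
    merged into the top level of the event."""
    decoded = json.loads(event["message"])
    merged = dict(event)
    # Decoded keys overwrite existing ones, including "message" itself.
    merged.update(decoded)
    return merged


# Trimmed-down event based on the document in the question.
event = {
    "source": "/mnt/ep_logs/ep_.json",
    "message": '{"levelname": "WARNING", "message": "One or more gateways are offline", "lineno": 103}',
}

result = decode_message(event)
print(result["levelname"], result["lineno"], result["message"])
```

If the document stored in Elasticsearch still shows the escaped string in message, the event most likely never went through the filter at all.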

How to do a query in Grafana with Elasticsearch as source datas?

I am a beginner with Grafana and Elasticsearch. I added an Elasticsearch data source in Grafana. Now I would like to write a query to show my data.
An example of one document in Elasticsearch:
{
  "_index": "shinken-2016.04.08",
  "_type": "shinken-logs",
  "_id": "AVP0GFeTmLuZ9eaw1Bjp",
  "_score": 1.0,
  "_source":
  {
    "comment": "",
    "plugin_output": "",
    "attempt": 0,
    "message": "[1460089115] SERVICE NOTIFICATION: shinken;hostname_test.com;MySQL - TCP;CRITICAL;notify-service-by-email;connect to address 10.11.12.13 and port 1234: No route to host",
    "logclass": 3,
    "options": "",
    "state_type": "CRITICAL",
    "state": 2,
    "host_name": "hostname_test.com",
    "@timestamp": "2016-04-08T04:18:35Z",
    "time": 1460089115,
    "service_description": "MySQL - TCP",
    "logobject": 2,
    "type": "SERVICE NOTIFICATION",
    "contact_name": "shinken",
    "command_name": "notify-service-by-email"
  }
},
My goal is to show in Grafana the state number of one service (here MySQL - TCP) for each day (here 2016-04-08).
My question is: how do I write a query in Grafana with Elasticsearch as the data source?

Leave out default Logstash fields in ElasticSearch

After processing data with input | filter | output > Elasticsearch, the format it gets stored in looks something like:
"_index": "logstash-2012.07.02",
"_type": "stdin",
"_id": "JdRaI5R6RT2do_WhCYM-qg",
"_score": 0.30685282,
"_source": {
  "@source": "stdin://dist/",
  "@type": "stdin",
  "@tags": [
    "tag1",
    "tag2"
  ],
  "@fields": {},
  "@timestamp": "2012-07-02T06:17:48.533000Z",
  "@source_host": "dist",
  "@source_path": "/",
  "@message": "test"
}
I filter/store most of the important information in specific fields. Is it possible to leave out default fields like @source_path and @source_host? In the near future it's going to store 8 billion logs/month, and I would like to run some performance tests with these default fields excluded (I just don't use these fields).
This removes fields from output:
filter {
  mutate {
    # remove duplicate fields
    # this leaves timestamp from message and source_path for source
    remove => ["@timestamp", "@source"]
  }
}
Some of that will depend on what web interface you are using to view your logs. I'm using Kibana and a custom logger (C#) that indexes the following:
{
  "_index": "logstash-2013.03.13",
  "_type": "logs",
  "_id": "n3GzIC68R1mcdj6Wte6jWw",
  "_version": 1,
  "_score": 1,
  "_source":
  {
    "@source": "File",
    "@message": "Shalom",
    "@fields":
    {
      "tempor": "hit"
    },
    "@tags":
    [
      "tag1"
    ],
    "level": "Info",
    "@timestamp": "2013-03-13T21:47:51.9838974Z"
  }
}
This shows up in Kibana, and the source fields are not there.
To exclude certain fields, you can use the prune filter plugin.
filter {
  prune {
    blacklist_names => [ "@timestamp", "@source" ]
  }
}
The prune filter is not a default Logstash plugin and must be installed first:
bin/logstash-plugin install logstash-filter-prune
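To get a feel for what a name blacklist does to an event before changing the pipeline, here is a small Python imitation of the idea. It is only a sketch and not the plugin's code: the function name `prune_blacklist` is made up, and treating each entry as an unanchored regular expression is an assumption of this sketch.

```python
import re


def prune_blacklist(event, patterns):
    """Return a copy of `event` without top-level fields whose names
    match any of the regular expressions in `patterns`."""
    compiled = [re.compile(p) for p in patterns]
    return {
        name: value
        for name, value in event.items()
        if not any(rx.search(name) for rx in compiled)
    }


# Field names taken from the stored document in the question.
event = {
    "@source": "stdin://dist/",
    "@source_host": "dist",
    "@source_path": "/",
    "@timestamp": "2012-07-02T06:17:48.533000Z",
    "@message": "test",
}

print(prune_blacklist(event, ["^@source_host$", "^@source_path$"]))
```

In this sketch an unanchored pattern such as "@source" would also drop @source_host and @source_path, which is why the example anchors each name with ^ and $.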

Resources