logstash json filter source - elasticsearch

I cannot get the message field to decode from my JSON log line when it is received via Filebeat.
Here is the line in my logs:
{"levelname": "WARNING", "asctime": "2016-07-01 18:06:37", "message": "One or more gateways are offline", "name": "ep.management.commands.import", "funcName": "check_gateway_online", "lineno": 103, "process": 44551, "processName": "MainProcess", "thread": 140735198597120, "threadName": "MainThread", "server": "default"}
Here is the Logstash config. I tried with and without the codec; the only difference is that the message is escaped when I use the codec.
input {
  beats {
    port => 5044
    codec => "json"
  }
}
filter {
  json {
    source => "message"
  }
}
Here is the JSON as it arrives in Elasticsearch:
{
"_index": "filebeat-2016.07.01",
"_type": "json",
"_id": "AVWnpK519vJkh3Ry-Q9B",
"_score": null,
"_source": {
"#timestamp": "2016-07-01T18:07:13.522Z",
"beat": {
"hostname": "59b378d40b2e",
"name": "59b378d40b2e"
},
"count": 1,
"fields": null,
"input_type": "log",
"message": "{\"levelname\": \"WARNING\", \"asctime\": \"2016-07-01 18:07:12\", \"message\": \"One or more gateways are offline on server default\", \"name\": \"ep.controllers.secure_client\", \"funcName\": \"check_gateways_online\", \"lineno\": 80, \"process\": 44675, \"processName\": \"MainProcess\", \"thread\": 140735198597120, \"threadName\": \"MainThread\"}",
"offset": 251189,
"source": "/mnt/ep_logs/ep_.json",
"type": "json"
},
"fields": {
"#timestamp": [
1467396433522
]
},
"sort": [
1467396433522
]
}
What I would like is for the contents of the message field to be decoded into fields of their own.
Many thanks

When that happens, it's usually because your Filebeat instance is configured to send documents directly to ES.
In your filebeat configuration file, make sure to comment out the elasticsearch output.
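A minimal filebeat.yml sketch of that setup, assuming Logstash runs on the same host (the hostname is a placeholder, not from the question): events go to the beats input on port 5044, and the elasticsearch output stays commented out so documents no longer bypass Logstash.
# filebeat.yml (sketch)
output.logstash:
  hosts: ["localhost:5044"]   # the Logstash beats input shown above

# Keep this commented out, otherwise Filebeat ships directly to Elasticsearch:
#output.elasticsearch:
#  hosts: ["localhost:9200"]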

Related

elasticsearch filebeat mapper_parsing_exception when using decode_json_fields

I have ECK set up and I'm using Filebeat to ship logs from Kubernetes to Elasticsearch.
I've recently added the decode_json_fields processor to my configuration, so that I'm able to decode the JSON that is usually in the message field.
- decode_json_fields:
    fields: ["message"]
    process_array: false
    max_depth: 10
    target: "log"
    overwrite_keys: true
    add_error_key: true
However, logs have stopped appearing since I added it.
Example log:
{
"_index": "filebeat-7.9.1-2020.10.01-000001",
"_type": "_doc",
"_id": "wF9hB3UBtUOF3QRTBcts",
"_score": 1,
"_source": {
"#timestamp": "2020-10-08T08:43:18.672Z",
"kubernetes": {
"labels": {
"controller-uid": "9f3f9d08-cfd8-454d-954d-24464172fa37",
"job-name": "stream-hatchet-cron-manual-rvd"
},
"container": {
"name": "stream-hatchet-cron",
"image": "<redacted>.dkr.ecr.us-east-2.amazonaws.com/stream-hatchet:v0.1.4"
},
"node": {
"name": "ip-172-20-32-60.us-east-2.compute.internal"
},
"pod": {
"uid": "041cb6d5-5da1-4efa-b8e9-d4120409af4b",
"name": "stream-hatchet-cron-manual-rvd-bh96h"
},
"namespace": "default"
},
"ecs": {
"version": "1.5.0"
},
"host": {
"mac": [],
"hostname": "ip-172-20-32-60",
"architecture": "x86_64",
"name": "ip-172-20-32-60",
"os": {
"codename": "Core",
"platform": "centos",
"version": "7 (Core)",
"family": "redhat",
"name": "CentOS Linux",
"kernel": "4.9.0-11-amd64"
},
"containerized": false,
"ip": []
},
"cloud": {
"instance": {
"id": "i-06c9d23210956ca5c"
},
"machine": {
"type": "m5.large"
},
"region": "us-east-2",
"availability_zone": "us-east-2a",
"account": {
"id": "<redacted>"
},
"image": {
"id": "ami-09d3627b4a09f6c4c"
},
"provider": "aws"
},
"stream": "stdout",
"message": "{\"message\":{\"log_type\":\"cron\",\"status\":\"start\"},\"level\":\"info\",\"timestamp\":\"2020-10-08T08:43:18.670Z\"}",
"input": {
"type": "container"
},
"log": {
"offset": 348,
"file": {
"path": "/var/log/containers/stream-hatchet-cron-manual-rvd-bh96h_default_stream-hatchet-cron-73069980b418e2aa5e5dcfaf1a29839a6d57e697c5072fea4d6e279da0c4e6ba.log"
}
},
"agent": {
"type": "filebeat",
"version": "7.9.1",
"hostname": "ip-172-20-32-60",
"ephemeral_id": "6b3ba0bd-af7f-4946-b9c5-74f0f3e526b1",
"id": "0f7fff14-6b51-45fc-8f41-34bd04dc0bce",
"name": "ip-172-20-32-60"
}
},
"fields": {
"#timestamp": [
"2020-10-08T08:43:18.672Z"
],
"suricata.eve.timestamp": [
"2020-10-08T08:43:18.672Z"
]
}
}
In the Filebeat logs I can see the following error:
2020-10-08T09:25:43.562Z WARN [elasticsearch] elasticsearch/client.go:407 Cannot
index event
publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x36b243a0,
ext:63737745936, loc:(*time.Location)(nil)}, Meta:null,
Fields:{"agent":{"ephemeral_id":"5f8afdba-39c3-4fb7-9502-be7ef8f2d982","hostname":"ip-172-20-32-60","id":"0f7fff14-6b51-45fc-8f41-34bd04dc0bce","name":"ip-172-20-32-60","type":"filebeat","version":"7.9.1"},"cloud":{"account":{"id":"700849607999"},"availability_zone":"us-east-2a","image":{"id":"ami-09d3627b4a09f6c4c"},"instance":{"id":"i-06c9d23210956ca5c"},"machine":{"type":"m5.large"},"provider":"aws","region":"us-east-2"},"ecs":{"version":"1.5.0"},"host":{"architecture":"x86_64","containerized":false,"hostname":"ip-172-20-32-60","ip":["172.20.32.60","fe80::af:9fff:febe:dc4","172.17.0.1","100.96.1.1","fe80::6010:94ff:fe17:fbae","fe80::d869:14ff:feb0:81b3","fe80::e4f3:b9ff:fed8:e266","fe80::1c19:bcff:feb3:ce95","fe80::fc68:21ff:fe08:7f24","fe80::1cc2:daff:fe84:2a5a","fe80::3426:78ff:fe22:269a","fe80::b871:52ff:fe15:10ab","fe80::54ff:cbff:fec0:f0f","fe80::cca6:42ff:fe82:53fd","fe80::bc85:e2ff:fe5f:a60d","fe80::e05e:b2ff:fe4d:a9a0","fe80::43a:dcff:fe6a:2307","fe80::581b:20ff:fe5f:b060","fe80::4056:29ff:fe07:edf5","fe80::c8a0:5aff:febd:a1a3","fe80::74e3:feff:fe45:d9d4","fe80::9c91:5cff:fee2:c0b9"],"mac":["02:af:9f:be:0d:c4","02:42:1b:56:ee:d3","62:10:94:17:fb:ae","da:69:14:b0:81:b3","e6:f3:b9:d8:e2:66","1e:19:bc:b3:ce:95","fe:68:21:08:7f:24","1e:c2:da:84:2a:5a","36:26:78:22:26:9a","ba:71:52:15:10:ab","56:ff:cb:c0:0f:0f","ce:a6:42:82:53:fd","be:85:e2:5f:a6:0d","e2:5e:b2:4d:a9:a0","06:3a:dc:6a:23:07","5a:1b:20:5f:b0:60","42:56:29:07:ed:f5","ca:a0:5a:bd:a1:a3","76:e3:fe:45:d9:d4","9e:91:5c:e2:c0:b9"],"name":"ip-172-20-32-60","os":{"codename":"Core","family":"redhat","kernel":"4.9.0-11-amd64","name":"CentOS
Linux","platform":"centos","version":"7
(Core)"}},"input":{"type":"container"},"kubernetes":{"container":{"image":"700849607999.dkr.ecr.us-east-2.amazonaws.com/stream-hatchet:v0.1.4","name":"stream-hatchet-cron"},"labels":{"controller-uid":"a79daeac-b159-4ba7-8cb0-48afbfc0711a","job-name":"stream-hatchet-cron-manual-c5r"},"namespace":"default","node":{"name":"ip-172-20-32-60.us-east-2.compute.internal"},"pod":{"name":"stream-hatchet-cron-manual-c5r-7cx5d","uid":"3251cc33-48a9-42b1-9359-9f6e345f75b6"}},"log":{"level":"info","message":{"log_type":"cron","status":"start"},"timestamp":"2020-10-08T09:25:36.916Z"},"message":"{"message":{"log_type":"cron","status":"start"},"level":"info","timestamp":"2020-10-08T09:25:36.916Z"}","stream":"stdout"},
Private:file.State{Id:"native::30998361-66306", PrevId:"",
Finished:false, Fileinfo:(*os.fileStat)(0xc001c14dd0),
Source:"/var/log/containers/stream-hatchet-cron-manual-c5r-7cx5d_default_stream-hatchet-cron-4278d956fff8641048efeaec23b383b41f2662773602c3a7daffe7c30f62fe5a.log",
Offset:539, Timestamp:time.Time{wall:0xbfd7d4a1e556bd72,
ext:916563812286, loc:(*time.Location)(0x607c540)}, TTL:-1,
Type:"container", Meta:map[string]string(nil),
FileStateOS:file.StateOS{Inode:0x1d8ff59, Device:0x10302},
IdentifierName:"native"}, TimeSeries:false}, Flags:0x1,
Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400):
{"type":"mapper_parsing_exception","reason":"failed to parse field
[log.message] of type [keyword] in document with id
'56aHB3UBLgYb8gz801DI'. Preview of field's value: '{log_type=cron,
status=start}'","caused_by":{"type":"illegal_state_exception","reason":"Can't
get text on a START_OBJECT at 1:113"}}
It throws an error because apparently log.message is of type "keyword"; however, this field does not exist in the index mapping.
I thought this might be an issue with "target": "log", so I've tried changing it to something arbitrary like "my_parsed_message", "m_log" or "mlog", and I get the same error for all of them.
{"type":"mapper_parsing_exception","reason":"failed to parse field
[mlog.message] of type [keyword] in document with id
'J5KlDHUB_yo5bfXcn2LE'. Preview of field's value: '{log_type=cron,
status=end}'","caused_by":{"type":"illegal_state_exception","reason":"Can't
get text on a START_OBJECT at 1:217"}}
Elastic version: 7.9.2
The problem is that some of your JSON messages contain a message field that is sometimes a simple string and other times a nested JSON object (like in the case you're showing in your question).
After this index was created, the very first message that was parsed was probably a string and hence the mapping has been modified to add the following field (line 10553):
"mlog": {
"properties": {
...
"message": {
"type": "keyword",
"ignore_above": 1024
},
}
}
You'll find the same pattern for my_parsed_message (line 10902), my_parsed_logs (line 10742), etc...
Hence the next message that comes with message being a JSON object, like
{"message":{"log_type":"cron","status":"start"}, ...
will not work because it's an object, not a string...
Looking at the fields of your custom JSON, it seems you don't really have control over either their taxonomy (i.e. naming) or what they contain...
If you're serious about searching within those custom fields (which I think you are, since you're parsing the field, otherwise you'd just store the stringified JSON), then I can only suggest figuring out a proper taxonomy in order to make sure they all get a standard type.
If all you care about is logging your data, then I suggest simply disabling indexing of that message field. Another solution is to set dynamic: false in your mapping so those fields are ignored, i.e. your mapping is not modified.
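A minimal sketch of the first option, using a legacy index template for the filebeat-* pattern (the template name is made up; mlog is the target used in the question). Setting enabled: false keeps the object in _source but stops Elasticsearch from indexing or mapping its sub-fields:
PUT _template/filebeat-mlog-noindex
{
  "index_patterns": ["filebeat-*"],
  "order": 10,
  "mappings": {
    "properties": {
      "mlog": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
This only affects indices created after the template is added; existing indices keep their current mapping.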

Mapping exception while executing Rollup job

I have a Kibana instance which stores log data from our Java apps in per-day indexes, like logstash-java-beats-2019.09.01. Since the number of indexes could get pretty big in the future, I want to create a rollup job to be able to archive old logs in a separate index, something like logstash-java-beats-rollup. A typical document in the logstash-java-beats-2019.09.01 index looks like this:
{
"_index": "logstash-java-beats-2019.10.01",
"_type": "_doc",
"_id": "C9mfhG0Bf_Fr5GBl6kTg",
"_version": 1,
"_score": 1,
"_source": {
"#timestamp": "2019-10-01T00:02:13.756Z",
"ecs": {
"version": "1.0.0"
},
"event_timestamp": "2019-10-01 00:02:13,756",
"log": {
"offset": 5729359,
"file": {
"path": "/var/log/application-name/application.log"
}
},
"tags": [
"service-name",
"location",
"beats_input_codec_plain_applied"
],
"loglevel": "WARN",
"java_class": "java.class.name",
"message": "Log message here",
"host": {
"name": "host-name-beat"
},
"#version": "1",
"agent": {
"hostname": "host-name",
"id": "a34af368-3359-495a-9775-63502693d148",
"ephemeral_id": "cc4afd3c-ad97-47a4-bd21-72255d450232",
"type": "filebeat",
"version": "7.2.0",
"name": "host-name-beat"
},
"input": {
"type": "log"
}
}
}
So I created a rollup job with the following config:
{
"config": {
"id": "Test 2 job",
"index_pattern": "logstash-java-beats-2*",
"rollup_index": "logstash-java-beats-rollup",
"cron": "0 0 * * * ?",
"groups": {
"date_histogram": {
"fixed_interval": "1000ms",
"field": "#timestamp",
"delay": "1d",
"time_zone": "UTC"
}
},
"metrics": [],
"timeout": "20s",
"page_size": 1000
},
"status": {
"job_state": "stopped",
"current_position": {
"#timestamp.date_histogram": 1567933199000
},
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 1840,
"documents_processed": 5322525,
"rollups_indexed": 1838383,
"trigger_count": 1,
"index_time_in_ms": 1555018,
"index_total": 1839,
"index_failures": 0,
"search_time_in_ms": 59059,
"search_total": 1840,
"search_failures": 0
}
}
but it fails to roll up the data with the following exception:
Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$GTvyIZtPhKqi-dtfVd6MXg], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[1]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$v-r89eEpLvImr0lWIrOb_Q], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[2]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$quCHwZP1iVU_Bs2fmhgSjQ], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
...
The logstash-java-beats-rollup index is empty, even though some stats for the rollup job are available.
I'm using Elasticsearch v7.2.0.
Could you please explain what is wrong with the data, or with the rollup job configuration?

how to store my json log file to logstash with json filter

This is my JSON log file. I'm trying to store the file in Elasticsearch through Logstash.
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":
["Action", "Adventure", "Sci-Fi"] }
After storing the data in Elasticsearch, my result is as follows:
{
"_index": "filebeat-6.2.4-2018.11.09",
"_type": "doc",
"_id": "n-J39mYB6zb53NvEugMO",
"_score": 1,
"_source": {
"#timestamp": "2018-11-09T03:15:32.262Z",
"source": "/Users/jinwoopark/Jin/json_files/testJson.log",
"offset": 106,
"message": """{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }""",
"id": "%{id}",
"#version": "1",
"host": "Jinui-MacBook-Pro.local",
"tags": [
"beats_input_codec_plain_applied"
],
"prospector": {
"type": "log"
},
"title": "%{title}",
"beat": {
"name": "Jinui-MacBook-Pro.local",
"hostname": "Jinui-MacBook-Pro.local",
"version": "6.2.4"
}
}
}
What I'm trying to do is store only the genre value in the message field, and store the other values (e.g. id, title) in extra fields (the newly created id and title fields). But the extra fields were stored with unresolved values (%{id}, %{title}). It seems like I need to modify my Logstash json filter, but I need your help here.
My current Logstash configuration is as follows:
input {
  beats {
    port => 5044
  }
}
filter {
  json {
    source => "genre"   # want to store only genre (from the JSON log) in the message field
  }
  mutate {
    add_field => {
      "id" => "%{id}"         # want to create an extra field for the id value from the log file
      "title" => "%{title}"   # want to create an extra field for the title value from the log file
    }
  }
  date {
    match => [ "timestamp", "dd/MM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
  stdout {
    codec => rubydebug
  }
}
When you tell the json filter that the source is genre, it should ignore the rest of the document, which would explain why you don't get an id or title.
It seems like you should parse the entire JSON document, and use the mutate filter's replace option to move the contents of genre into message; a sketch follows.
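A minimal sketch of that approach, assuming the whole JSON line arrives in the message field as shown in your output above (field names come from your sample document; note that replace renders the genre array as its string representation):
filter {
  # Parse the whole JSON log line; id, title, year and genre become event fields.
  json {
    source => "message"
  }
  # Overwrite message with the parsed genre value.
  mutate {
    replace => { "message" => "%{genre}" }
  }
}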

How to query logstash indexes with .net elasticsearch client?

I'm trying to use NEST to search through Elasticsearch indexes that were created with Logstash (basically logstash-*).
I have set up NEST with the following code:
Node = new Uri("http://localhost:9200");
Settings = new ConnectionSettings(Node);
Settings.DefaultIndex("logstash-*");
Client = new ElasticClient(Settings);
this is how I try to get results:
var result = Client.Search<Logstash>(s => s
.Query(p => p.Term("Message", "*")));
and I get 0 hits:
http://screencast.com/t/d2FB9I4imE
Here is an example of entry I would like to find:
{
"_index": "logstash-2016.06.20",
"_type": "logs",
"_id": "AVVtswJxpdkh1tFPP9S5",
"_score": null,
"_source": {
"timestamp": "2016-06-20 14:04:55.6650",
"logger": "xyz",
"level": "debug",
"message": "Processed command service method SearchService.SearchBy in 65 ms",
"exception": "",
"url": "",
"ip": "",
"username": "",
"user_id": "",
"role": "",
"authentication_provider": "",
"application_id": "",
"application_name": "",
"application": "ZBD",
"#version": "1",
"#timestamp": "2016-06-20T12:04:55.666Z",
"host": "0:0:0:0:0:0:0:1"
},
"fields": {
"#timestamp": [
1466424295666
]
},
"sort": [
1466424295666
]
}
I'm using version 5.0.0-alpha3, and the NEST client is alpha2 at the moment.
This is because of
...
"_type": "logs",
...
When you run a query like yours, it will hit the logstash type, not the logs type, because NEST infers the type name from the generic parameter. You have two options to solve this problem.
Tell NEST to map the Logstash type to the logs type whenever making a request to Elasticsearch, by setting this mapping in the client's settings:
var settings = new ConnectionSettings()
    .MapDefaultTypeNames(m => m.Add(typeof(Logstash), "logs"));
var client = new ElasticClient(settings);
Override the default behaviour by setting the type explicitly in the request parameters:
var result = Client.Search<Logstash>(s => s
.Type("logs")
.Query(p => p.Term("message", "*")));
Also notice you should use message, not Message, in the term descriptor, as you don't have such a field in the index. Second, as far as I know wildcards are not supported in a term query; you may want to use a query_string query instead.
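A sketch of that last suggestion (not part of the original answer), using a query_string query against the message field, which does support wildcards:
var result = Client.Search<Logstash>(s => s
    .Type("logs")
    .Query(q => q
        .QueryString(qs => qs
            .Fields(f => f.Field("message"))
            .Query("*"))));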
Hope it helps.

Leave out default Logstash fields in ElasticSearch

After processing data with input | filter | output > Elasticsearch, the format it gets stored in is somewhat like this:
"_index": "logstash-2012.07.02",
"_type": "stdin",
"_id": "JdRaI5R6RT2do_WhCYM-qg",
"_score": 0.30685282,
"_source": {
"#source": "stdin://dist/",
"#type": "stdin",
"#tags": [
"tag1",
"tag2"
],
"#fields": {},
"#timestamp": "2012-07-02T06:17:48.533000Z",
"#source_host": "dist",
"#source_path": "/",
"#message": "test"
}
I filter/store most of the important information in specific fields. Is it possible to leave out default fields like @source_path and @source_host? In the near future it's going to store 8 billion logs/month and I would like to run some performance tests with these default fields excluded (I just don't use these fields).
This removes fields from output:
filter {
  mutate {
    # remove duplicate fields
    # this leaves timestamp from message and source_path for source
    remove => ["@timestamp", "@source"]
  }
}
Some of that will depend on what web interface you are using to view your logs. I'm using Kibana, and a custom logger (C#) that indexes the following:
{
"_index": "logstash-2013.03.13",
"_type": "logs",
"_id": "n3GzIC68R1mcdj6Wte6jWw",
"_version": 1,
"_score": 1,
"_source":
{
"#source": "File",
"#message": "Shalom",
"#fields":
{
"tempor": "hit"
},
"#tags":
[
"tag1"
],
"level": "Info"
"#timestamp": "2013-03-13T21:47:51.9838974Z"
}
}
This shows up in Kibana, and the source fields are not there.
To exclude certain fields you can use the prune filter plugin.
filter {
  prune {
    blacklist_names => [ "@timestamp", "@source" ]
  }
}
The prune filter is not a default Logstash plugin and must be installed first:
bin/logstash-plugin install logstash-filter-prune
