Custom parsing with Logstash and Elasticsearch

I am new to Logstash!
I have everything configured and working fine so far.
My log lines look like this:
2014-04-27 16:24:43 DEBUG b45e66 T+561 10.31.166.155 /v1/XXX<!session> XXX requested for category_ids: only_pro: XXX_ids:14525
If I use the following config file:
input { file { path => "/logs/*_log" } }
output { elasticsearch { host => localhost } }
It places the following document in Elasticsearch:
{
_index: "logstash-2014.04.28",
_type: "logs",
_id: "WIoUbIvCQOqnz4tMZzMohg",
_score: 1,
_source: {
message: "2014-04-27 16:24:43 DEBUG b45e66 T+561 10.31.166.155 This is my log !",
@version: "1",
@timestamp: "2014-04-28T14:25:52.165Z",
host: "MYCOMPUTER",
path: "\logs\xxx_app.log"
}
}
How do I parse the log line so the entire text doesn't end up in _source.message?
For example, I'd like to parse it into something like:
{
_index: "logstash-2014.04.28",
_type: "logs",
_id: "WIoUbIvCQOqnz4tMZzMohg",
_score: 1,
_source: {
logLevel: "DEBUG",
messageId: "b45e66",
sendFrom: "10.31.166.155",
logTimestamp: "2014-04-27 16:24:43",
message: "This is my log !",
@version: "1",
@timestamp: "2014-04-28T14:25:52.165Z",
host: "MYCOMPUTER",
path: "\logs\xxx_app.log"
}
}

You need to parse it with a filter, e.g. the grok filter. This can be quite tricky, so be patient and experiment. Have a look at the predefined patterns, too.
A starting point for your message would be:
%{DATESTAMP} %{WORD:logLevel} %{WORD:messageId} %{GREEDYDATA:someString} %{IP}
The Grok Debugger is an extremely helpful tool for this.
When done, your config should look like:
input {
stdin {}
}
filter {
grok {
match => { "message" => "%{DATESTAMP} %{WORD:logLevel} %{WORD:messageId} %{GREEDYDATA:someString} %{IP}" }
}
}
output {
elasticsearch { host => localhost }
}
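To get a feel for what each token of the pattern must capture, here is a rough Python regex equivalent of the grok approach, using the field names from the desired output above (a sketch only; the real grok patterns differ, and for an ISO-style date like 2014-04-27, grok's %{TIMESTAMP_ISO8601} may be a closer fit than %{DATESTAMP}):

```python
import re

# Rough regex equivalent of the grok pattern above; grok's timestamp,
# WORD, IP and GREEDYDATA patterns are approximated with plain groups.
LOG_PATTERN = re.compile(
    r"(?P<logTimestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logLevel>\w+) "
    r"(?P<messageId>\w+) "
    r"T\+\d+ "
    r"(?P<sendFrom>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<message>.*)"
)

line = ("2014-04-27 16:24:43 DEBUG b45e66 T+561 "
        "10.31.166.155 This is my log !")
fields = LOG_PATTERN.match(line).groupdict()
print(fields["logLevel"])   # DEBUG
print(fields["sendFrom"])   # 10.31.166.155
```

Each named group here corresponds to a %{PATTERN:fieldName} capture in grok; translating the groups back one at a time is a handy way to debug the pattern piecewise.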

Related

Logstash: how to get field from path when using Filebeat?

The necessary part of the Filebeat config:
filebeat.inputs:
- type: log
paths:
- C:\Program Files\Filebeat\test_logs\*.txt
After sending to Logstash and Elasticsearch, the following field appears:
"log": {
"offset": 117,
"file": {
"path": "C:\\Program Files\\Filebeat\\test_logs\\20200804_0929_logui.txt"
}
I would like to get the folder name and file name as separate fields, but I don't know how.
I already tried something like this:
grok {
match => { 'path' => '(C:\\Program Files\\Filebeat\\test_logs\\)%{GREEDYDATA:filename}\.txt' }
}
Unfortunately, this does not work.
Please help me figure it out.
Try the dissect filter instead, it's much easier (note also that the field is [log][file][path], not path):
filter {
dissect {
mapping => {
"[log][file][path]" => "C:\\Program Files\\Filebeat\\test_logs\\%{[log][file][name]}.txt"
}
}
}
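Dissect works by splitting the string on the literal delimiters in the mapping rather than by regex matching, which is why it is simpler here. A minimal Python sketch of the same idea (illustrative only, not the actual dissect implementation):

```python
# Dissect-style extraction: everything between the fixed prefix and the
# fixed ".txt" suffix becomes the captured field value.
prefix = "C:\\Program Files\\Filebeat\\test_logs\\"
suffix = ".txt"

path = "C:\\Program Files\\Filebeat\\test_logs\\20200804_0929_logui.txt"

assert path.startswith(prefix) and path.endswith(suffix)
filename = path[len(prefix):-len(suffix)]
print(filename)  # 20200804_0929_logui
```

In the dissect mapping above, that captured value lands in the [log][file][name] field.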

Multiple Logstash outputs depending on collectd

I'm facing a configuration failure which I can't solve on my own; I tried to find the solution in the documentation, but without luck.
I have a few different hosts which send their metrics via collectd to Logstash. Inside the Logstash configuration I'd like to separate each host and pipe it into its own ES index. When I configtest my settings, Logstash throws a failure - maybe someone can help me.
The separation should be triggered by the hostname collectd delivers:
[This is an old raw JSON output, so please don't mind the wrong index]
{
"_index": "wv-metrics",
"_type": "logs",
"_id": "AVHyJunyGanLcfwDBAon",
"_score": null,
"_source": {
"host": "somefqdn.com",
"#timestamp": "2015-12-30T09:10:15.211Z",
"plugin": "disk",
"plugin_instance": "dm-5",
"collectd_type": "disk_merged",
"read": 0,
"write": 0,
"#version": "1"
},
"fields": {
"#timestamp": [
1451466615211
]
},
"sort": [
1451466615211
]
}
Please see my config:
Input Config (Working so far)
input {
udp {
port => 25826
buffer_size => 1452
codec => collectd { }
}
}
Output Config File:
filter {
if [host] == "somefqdn.com" {
output {
elasticsearch {
hosts => "someip:someport"
user => logstash
password => averystrongpassword
index => "somefqdn.com"
}
}
}
}
Error which is thrown:
root@test-collectd1:/home/username# service logstash configtest
Error: Expected one of #, => at line 21, column 17 (byte 314) after filter {
if [host] == "somefqdn.com" {
output {
elasticsearch
I understand that there's probably a character missing in my config, but I can't locate it.
Thanks in advance!
I spot two errors in a quick scan:
First, your output stanza should not be wrapped with a filter{} block.
Second, your output stanza should start with output{} (put the conditional inside):
output {
if [host] == "somefqdn.com" {
elasticsearch {
...
}
}
}

ElasticSearch query using match or term?

I use a match query to search the field "syslog5424_app":
{
"query":{
"filtered":{
"query":{"match":{"syslog5424_app":"e1c28ca3-dc7e-4425-ba14-7778f126bdd6"}}
}
}
}
Here is the query result:
{
took: 23,
timed_out: false,
_shards: {
total: 45,
successful: 29,
failed: 0
},
hits: {
total: 8340,
max_score: 17.623652,
hits: [
{
_index: "logstash-2014.12.16",
_type: "applog",
_id: "AUpTBuwKsotKslj7c27d",
_score: 17.623652,
_source: {
message: "132 <14>1 2014-12-16T12:16:09.889089+00:00 loggregator e1c28ca3-dc7e-4425-ba14-7778f126bdd6 [App/0] - - Get the platform's MBean server",
@version: "1",
@timestamp: "2014-12-16T12:16:10.127Z",
host: "9.91.32.178:33128",
type: "applog",
syslog5424_pri: "14",
syslog5424_ver: "1",
syslog5424_ts: "2014-12-16T12:16:09.889089+00:00",
syslog5424_host: "loggregator",
syslog5424_app: "e1c28ca3-dc7e-4425-ba14-7778f126bdd6",
syslog5424_proc: "[App/0]",
syslog5424_msg: "Get the platform's MBean server",
syslog_severity_code: 5,
syslog_facility_code: 1,
syslog_facility: "user-level",
syslog_severity: "notice",
@source_host: "%{syslog_hostname}",
@message: "%{syslog_message}"
}
},
...
But when I change "match" to "term", I get nothing. The content of the field syslog5424_app is exactly "e1c28ca3-dc7e-4425-ba14-7778f126bdd6", but I can't find it using "term". Any advice would be appreciated.
{
"query":{
"filtered":{
"query":{"term":{"syslog5424_app":"e1c28ca3-dc7e-4425-ba14-7778f126bdd6"}}
}
}
}
What analyser are you using on the field syslog5424_app?
If it's the standard analyser, then the data is being broken down into search terms.
e.g.
e1c28ca3-dc7e-4425-ba14-7778f126bdd6
is broken down into:
e1c28ca3
dc7e
4425
ba14
7778f126bdd6
When you use a match query, your search string is also broken down, so a match is made.
However, when you use a term query, the search string won't be analysed: you are looking for e1c28ca3-dc7e-4425-ba14-7778f126bdd6 among the five individual terms, and it's not going to match.
So my recommendation would be to update your mapping to use not_analyzed - you wouldn't normally search for part of a UUID, so turn off analysis for this field.
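You can mimic what the standard analyser does to the UUID with a simple split on non-alphanumeric characters (a rough approximation; the real tokenizer is more involved):

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyser: lowercase and split
    on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-zA-Z0-9]+", text.lower()) if t]

indexed_terms = analyze("e1c28ca3-dc7e-4425-ba14-7778f126bdd6")
print(indexed_terms)  # ['e1c28ca3', 'dc7e', '4425', 'ba14', '7778f126bdd6']

# A term query looks up the raw search string as one term -> no match:
print("e1c28ca3-dc7e-4425-ba14-7778f126bdd6" in indexed_terms)  # False

# A match query analyses the search string the same way, so its tokens
# line up with the indexed ones -> match:
print(analyze("e1c28ca3-dc7e-4425-ba14-7778f126bdd6") == indexed_terms)  # True
```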

_grokparsefailure without Filters

I have a simple Logstash configuration:
input {
syslog {
port => 5140
type => "fortigate"
}
}
output {
elasticsearch {
cluster => "logging"
node_name => "logstash-logging-03"
bind_host => "10.100.19.77"
}
}
That's it. The problem is that the documents that end up in Elasticsearch contain a _grokparsefailure tag:
{
"_index": "logstash-2014.12.19",
...
"_source": {
"message": ...",
...
"tags": [
"_grokparsefailure"
],
...
},
...
}
How come? There are no (grok) filters...
OK: the syslog input obviously makes use of grok internally. Therefore, if a log format other than syslog hits the input, a _grokparsefailure will occur.
Instead, I just used the tcp and udp inputs to achieve the required result (I was not aware of them before).
Cheers

How to update a child document in Elasticsearch using the update API?

I use parent-child documents in Elasticsearch. I can do partial updates of the parent document using the _update API. However, if I use the _update API on a child document, the content of the document is completely replaced by the content of my script. Something goes wrong, and I do not know what.
See example below:
CREATE CHILD DOCUMENT
POST to /indexName/comment/c006?parent=b003
{
"authorId": "ps101",
"authorFullName": "Lieven",
"body": "Comment text comes here",
"isApproved": false
}
GET CHILD
GET to /indexName/comment/c006?parent=b003
{
_index: "indexName"
_type: "comment"
_id: "c006"
_version: 20
found: true
_source: {
authorId: "ps101"
authorFullName: "Lieven"
body: "Comment text comes here."
isApproved: false
}
}
PARTIAL UPDATE
POST TO /indexName/comment/c006?parent=b003/_update
{
"script" : "ctx._source.isAcceptedAnswer=value",
"params" : {
"value" : true
}
}
NOW, GET THE CHILD AGAIN
GET to /indexName/comment/c006?parent=b003
{
_index: "indexName"
_type: "comment"
_id: "c006"
_version: 21
found: true
_source: {
script: "ctx._source.isAcceptedAnswer=value"
params: {
value: true
}
}
}
The _source is completely wrong ...
I hope somebody can help.
Marc
Change
POST TO /indexName/comment/c006?parent=b003/_update
to
POST TO /indexName/comment/c006/_update?parent=b003
The ? marks the beginning of the query string, and the query string goes at the end of the URL.
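This also explains why the script body ended up stored as the document: with the ? in the middle, everything after it is parsed as the query string, so Elasticsearch never sees the _update endpoint and treats the request as a plain index of the request body. Python's urllib shows how the two URLs parse:

```python
from urllib.parse import urlsplit, parse_qs

wrong = urlsplit("/indexName/comment/c006?parent=b003/_update")
right = urlsplit("/indexName/comment/c006/_update?parent=b003")

print(wrong.path)   # /indexName/comment/c006  <- _update swallowed by the query
print(wrong.query)  # parent=b003/_update
print(right.path)   # /indexName/comment/c006/_update
print(parse_qs(right.query))  # {'parent': ['b003']}
```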
