grokdebugger validates entries of a log that logstash eventually refuses - elasticsearch

Using the grok debugger, I've adapted what I found on the Internet for my first attempt at handling Logback/Spring Boot style logs.
Here is a log entry sent to grokdebugger:
2022-03-09 06:35:15,821 [http-nio-9090-exec-1] WARN org.springdoc.core.OpenAPIService - found more than one OpenAPIDefinition class. springdoc-openapi will be using the first one found.
with the grok pattern:
(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) \[(?<thread>(.*?)+)\] %{LOGLEVEL:level}\s+%{GREEDYDATA:class} - (?<logmessage>.*)
and it splits the content into fields as intended:
{
  "timestamp": [["2022-03-09 06:35:15,821"]],
  "YEAR": [["2022"]],
  "MONTHNUM": [["03"]],
  "MONTHDAY": [["09"]],
  "TIME": [["06:35:15,821"]],
  "HOUR": [["06"]],
  "MINUTE": [["35"]],
  "SECOND": [["15,821"]],
  "thread": [["http-nio-9090-exec-1"]],
  "level": [["WARN"]],
  "class": [["org.springdoc.core.OpenAPIService"]],
  "logmessage": [["found more than one OpenAPIDefinition class. springdoc-openapi will be using the first one found."]]
}
But when I try to do the same thing inside Logstash, with this input declaration in my configuration:
input {
  file {
    path => "/home/lebihan/dev/Java/comptes-france/metier-et-gestion/dev/ApplicationMetierEtGestion/sparkMetier.log"
    codec => multiline {
      pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
      negate => "true"
      what => "previous"
    }
  }
}
and this filter declaration:
filter {
  #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }
  grok {
    match => [ "message",
      "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) \[(?<thread>(.*?)+)\] %{LOGLEVEL:level}\s+%{GREEDYDATA:class} - (?<logmessage>.*)"
    ]
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
}
But it fails to parse the log, and I don't know how to get more detail about the underlying error behind the _grokparsefailure tag.

The main cause of my trouble was this:
grok {
  match => [
instead of:
grok {
  match => {
But after that, I also had to change:
the timestamp definition to %{TIMESTAMP_ISO8601:timestamp}
the date match pattern
and add a target to the date filter, to avoid a _dateparsefailure.
The resulting event in Kibana then looks like this:
@timestamp: Mar 16, 2022 @ 09:14:22.002
@version: 1
class: f.e.service.AbstractSparkDataset
host: debian
level: INFO
logmessage: Un dataset a été sauvegardé dans le fichier parquet /data/tmp/balanceComptesCommunes_2019_2019.
thread: http-nio-9090-exec-10
timestamp: 2022-03-16T06:34:09.394Z
_id: 8R_KkX8BBIYNTaMw1Jfg
_index: ecoemploimetier-2022.03.16
_score: -
_type: _doc
I eventually corrected my Logstash config file like this:
input {
  file {
    path => "/home/[...]/myLog.log"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
      negate => "true"
      what => "previous"
    }
  }
}

filter {
  #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[(?<thread>(.*?)+)\] %{LOGLEVEL:level} %{GREEDYDATA:class} - (?<logmessage>.*)" }
  }
  date {
    # 2022-03-16 07:32:24,860
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
    target => "timestamp"
  }
  # If there was no parsing error, drop the original, unparsed message
  if "_grokparsefailure" not in [tags] {
    mutate {
      remove_field => [ "message", "path" ]
    }
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "ecoemploimetier-%{+YYYY.MM.dd}"
  }
}
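To iterate on the grok pattern itself, a throwaway configuration along these lines can also help (a minimal sketch, separate from the setup above): paste sample log lines on stdin and watch the rubydebug output for _grokparsefailure tags.
input {
  stdin { }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[(?<thread>(.*?)+)\] %{LOGLEVEL:level} %{GREEDYDATA:class} - (?<logmessage>.*)" }
  }
}
output {
  stdout { codec => rubydebug }
}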

Related

Have a key value pair as logstash output, by only using grok filter

I am working on a Spring Boot project and using the ELK stack for logging and auditing. I need a logstash.conf file that processes the logs so that the output can contain dynamic key-value pairs. This output data will be used for auditing.
Adding an example for better clarity.
Sample log:
[INFO] [3b1d04f219fc43d18ccb6cb22db6cff4] 2021-10-13_13:43:09.074 Audit_ key1:value1| key2:value2| key3:value3| keyN:valueN
Required logstash output:
{
  "logLevel": [["INFO"]],
  "threadId": [["3b1d04f219fc43d18ccb6cb22db6cff4"]],
  "timeStamp": [["2021-10-13_13:43:09.074"]],
  "class": [["Audit_"]],
  "key1": [["value1"]],
  "key2": [["value2"]],
  "key3": [["value3"]],
  "keyN": [["valueN"]]
}
Note:
"key" will always be a word or string value
"value" can be a word, a number, or a sentence (a string with spaces)
":" is the separator between key and value
"|" is the separator between key-value pairs
The number of key-value pairs can vary.
Can someone suggest or help me with the match pattern to be used here? I am only allowed to use the grok filter.
Thank you for the guidance, Filip and leandrojmp!
Using only a grok filter for this would make it very complex, and it also wouldn't support dynamic key-value pairs.
So I went with a grok filter followed by a kv filter, and this approach worked for me.
Sample Log:
[INFO] [3b1d04f219fc43d18ccb6cb22db6cff4] 2021-10-13_13:43:09.074 _Audit_ key1:value1| key2:value2| key3:value3| keyN:valueN
logstash.conf file:
input {
  beats {
    port => "5044"
  }
}

filter {
  grok {
    match => { "message" => "\[%{LOGLEVEL:logLevel}\]\ \[%{WORD:traceId}\]\ (?<timestamp>[0-9\-_:\.]*)\ %{WORD:class}\ %{GREEDYDATA:message}" }
    overwrite => [ "message" ]
  }
  if [class] == "_Audit_" {
    kv {
      source => "message"
      # separators matching the sample log above: "|" between pairs, ":" between key and value
      field_split => "|"
      value_split => ":"
      # drop the space that follows each "|" in the sample
      trim_key => " "
      remove_field => ["message"]
    }
  }
}
output {
  if [class] == "_Audit_" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "audit-logs-%{+YYYY.MM.dd}"
    }
  }
  else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "normal-logs-%{+YYYY.MM.dd}"
    }
  }
}

Logstash nginx filter doesn't apply to half of rows

Using filebeat to push nginx logs to logstash and then to elasticsearch.
Logstash filter:
filter {
  if [fileset][module] == "nginx" {
    if [fileset][name] == "access" {
      grok {
        match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
        remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
      }
    }
    else if [fileset][name] == "error" {
      grok {
        match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
        remove_field => "message"
      }
      mutate {
        rename => { "@timestamp" => "read_timestamp" }
      }
      date {
        match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
        remove_field => "[nginx][error][time]"
      }
    }
  }
}
There is just one file, /var/log/nginx/access.log.
In Kibana, I see that roughly half of the rows have a parsed message and the other half do not.
All of the rows in Kibana have the tag "beats_input_codec_plain_applied".
Examples from filebeat -e
Row that works fine:
"source": "/var/log/nginx/access.log",
"offset": 5405195,
"message": "...",
"fileset": {
"module": "nginx",
"name": "access"
}
Row that doesn't work fine (no "fileset"):
"offset": 5405397,
"message": "...",
"source": "/var/log/nginx/access.log"
Any idea what could be the cause?
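Note that because the whole filter is wrapped in if [fileset][module] == "nginx", events that arrive without that field skip the grok entirely and are indexed unparsed. A small debugging sketch (not a fix, and the tag name is arbitrary) to confirm whether those are the unparsed half:
filter {
  if ![fileset][module] {
    # hypothetical debugging aid: mark events that arrive without fileset metadata
    mutate { add_tag => ["missing_fileset_module"] }
  }
}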

How to use the logstash mutate or ruby filter

I have the following JSON:
{
  "result": {
    "entities": {
      "SERVICE-CCC89FB0A922657A": "service1",
      "SERVICE-D279F46CD751424F": "service2",
      "SERVICE-7AB760E70FCDCA18": "service3"
    },
    "dataPoints": {
      "SERVICE-CCC89FB0A922657A": [
        [1489734240000, 1101.0],
        [1489734300000, null]
      ],
      "SERVICE-7AB760E70FCDCA18": [
        [1489734240000, 4080800.5470588235],
        [1489734300000, null]
      ],
      "SERVICE-D279F46CD751424F": [
        [1489734240000, 26677.695652173912],
        [1489734300000, null]
      ]
    }
  },
  "@timestamp": "2017-03-17T07:05:37.531Z",
  "data": "data",
  "@version": "1"
}
I want to transform it into the following before indexing it into Elasticsearch:
{
  "@timestamp": "2017-03-17T07:05:37.531Z",
  "data": "data",
  "@version": "1",
  "data": {
    "service1": [
      [1489734240000, 1101.0],
      [1489734300000, null]
    ],
    "service3": [
      [1489734240000, 4080800.5470588235],
      [1489734300000, null]
    ],
    "service2": [
      [1489734240000, 26677.695652173912],
      [1489734300000, null]
    ]
  }
}
These are the contents of my current Logstash conf file:
input {
  http_poller {
    urls => {
      test => {
        method => get
        url => "https://xxxx.com"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "60s" }
    codec => "plain"
  }
}

filter {
  json {
    source => "message"
    remove_field => ["[result][aggregationType]","message"]
  }
  # translate {
  # }
  # mutate {
  # }
  # ruby {
  # }
}

output {
  stdout {
    codec => rubydebug {
      #metadata => true
    }
  }
  elasticsearch {
    hosts => ["http://192.168.0.36:9200"]
  }
}
I have only just started using Elasticsearch, and I do not know which filter to use or how to implement it.
I wonder whether this can be done with the mutate filter's rename option,
or whether I should write code with a ruby filter.
The entities probably need to be iterated with a ruby filter to match the SERVICE-* keys of dataPoints.
However, I find the Ruby code difficult.
I would appreciate your help.
Thank you.
Here are a couple of the filters that are used with Logstash:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html
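For what it's worth, the renaming described in the question can be sketched with a ruby filter that walks dataPoints and looks up each SERVICE-* id in entities. This is a rough, untested sketch; the target field name data mirrors the desired output above (and would overwrite the existing data field):
filter {
  json {
    source => "message"
    remove_field => ["[result][aggregationType]", "message"]
  }
  ruby {
    code => '
      entities   = event.get("[result][entities]")   || {}
      datapoints = event.get("[result][dataPoints]") || {}
      renamed = {}
      datapoints.each do |service_id, points|
        # use the human-readable service name when one exists, otherwise keep the id
        renamed[entities[service_id] || service_id] = points
      end
      event.set("data", renamed)
      event.remove("result")
    '
  }
}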

Elasticsearch, Logstash and Kibana for pfsense logs with geo location

I want to create a tile map in Kibana to show source IPs from countries around the world.
When trying to set up a tile map, I get an error saying that "The "logstash-*" index pattern does not contain any of the following field types: geo_point".
I've googled the problem and found this link: https://github.com/elastic/logstash/issues/3137. At the end of that page it states that this is fixed in 2.x, but I am on 2.1.
Here are my configs:
1inputs.conf:
input {
  udp {
    type => "syslog"
    port => 5140
  }
}
5pfsense.conf:
filter {
  # Replace with your IP
  if [host] =~ /10\.1\.15\.200/ {
    grok {
      match => [ 'message', '.* %{WORD:program}:%{GREEDYDATA:rest}' ]
    }
    if [program] == "filterlog" {
      # Grab fields up to IP version. The rest will vary depending on IP version.
      grok {
        match => [ 'rest', '%{INT:rule_number},%{INT:sub_rule_number},,%{INT:tracker_id},%{WORD:interface},%{WORD:reason},%{WORD:action},%{WORD:direction},%{WORD:ip_version},%{GREEDYDATA:rest2}' ]
      }
    }
    mutate {
      replace => [ 'message', '%{rest2}' ]
    }
    if [ip_version] == "4" {
      # IPv4. Grab field up to dest_ip. Rest can vary.
      grok {
        match => [ 'message', '%{WORD:tos},,%{INT:ttl},%{INT:id},%{INT:offset},%{WORD:flags},%{INT:protocol_id},%{WORD:protocol},%{INT:length},%{IP:src_ip},%{IP:dest_ip},%{GREEDYDATA:rest3}' ]
      }
    }
    if [protocol_id] != 2 {
      # Non-IGMP has more fields.
      grok {
        match => [ 'rest3', '^%{INT:src_port:int},%{INT:dest_port:int}' ]
      }
    }
    else {
      # IPv6. Grab field up to dest_ip. Rest can vary.
      grok {
        match => [ 'message', '%{WORD:class},%{WORD:flow_label},%{INT:hop_limit},%{WORD:protocol},%{INT:protocol_id},%{INT:length},%{IPV6:src_ip},%{IPV6:dest_ip},%{GREEDYDATA:rest3}' ]
      }
    }
    mutate {
      replace => [ 'message', '%{rest3}' ]
      lowercase => [ 'protocol' ]
    }
    if [message] {
      # Non-ICMP has more fields
      grok {
        match => [ 'message', '^%{INT:src_port:int},%{INT:dest_port:int},%{INT:data_length}' ]
      }
    }
    mutate {
      remove_field => [ 'message' ]
      remove_field => [ 'rest' ]
      remove_field => [ 'rest2' ]
      remove_field => [ 'rest3' ]
      remove_tag => [ '_grokparsefailure' ]
      add_tag => [ 'packetfilter' ]
    }
    geoip {
      add_tag => [ "GeoIP" ]
      source => "src_ip"
    }
  }
}
Lastly, the 50outputs.conf:
output {
  elasticsearch {
    hosts => localhost
    index => "logstash-%{+YYYY.MM.dd}"
    template_overwrite => "true"
  }
  stdout { codec => rubydebug }
}

Adding fields depending on event message in Logstash not working

I have ELK installed and working on my machine, but now I want to do more complex filtering and add fields depending on the event message.
Specifically, I want to set "id_error" and "descripcio" depending on the message pattern.
I have been trying a lot of code combinations in the "logstash.conf" file, but I am not able to get the expected behavior.
Can someone tell me what I am doing wrong, what I have to do, or whether this is simply not possible? Thanks in advance.
This is my "logstash.conf" file with the last test I made, which results in no events being captured in Kibana:
input {
  file {
    path => "C:\xxx.log"
  }
}
filter {
  grok {
    patterns_dir => "C:\elk\patterns"
    match => [ "message", "%{ERROR2:error2}" ]
    add_field => [ "id_error", "2" ]
    add_field => [ "descripcio", "error2!!!" ]
  }
  grok {
    patterns_dir => "C:\elk\patterns"
    match => [ "message", "%{ERROR1:error1}" ]
    add_field => [ "id_error", "1" ]
    add_field => [ "descripcio", "error1!!!" ]
  }
  if ("_grokparsefailure" in [tags]) { drop {} }
}
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
    index => "xxx-%{+YYYY.MM.dd}"
  }
}
I have also tried the following code, which results in the fields "id_error" and "descripcio" containing both values ("[1,2]" and "[error1!!!,error2!!!]" respectively) in each matched event.
Since "break_on_match" is true by default, I expected to get only the fields attached to the matching clause, but this is not what happens.
input {
  file {
    path => "C:\xxx.log"
  }
}
filter {
  grok {
    patterns_dir => "C:\elk\patterns"
    match => [ "message", "%{ERROR1:error1}" ]
    add_field => [ "id_error", "1" ]
    add_field => [ "descripcio", "error1!!!" ]
    match => [ "message", "%{ERROR2:error2}" ]
    add_field => [ "id_error", "2" ]
    add_field => [ "descripcio", "error2!!!" ]
  }
  if ("_grokparsefailure" in [tags]) { drop {} }
}
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
    index => "xxx-%{+YYYY.MM.dd}"
  }
}
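As far as I can tell, this happens because break_on_match only stops grok from trying further patterns; the add_field decorations configured on a grok filter are all applied together once the filter succeeds, whichever pattern matched. One way to get per-pattern fields is to move them outside the grok and key off which capture exists, roughly like this (a sketch, assuming the same custom ERROR1/ERROR2 patterns):
filter {
  grok {
    patterns_dir => "C:\elk\patterns"
    # both patterns are tried; the first that matches wins (break_on_match default)
    match => [ "message", "%{ERROR1:error1}" ]
    match => [ "message", "%{ERROR2:error2}" ]
  }
  # decorate based on which capture actually exists
  if [error1] {
    mutate { add_field => { "id_error" => "1" "descripcio" => "error1!!!" } }
  } else if [error2] {
    mutate { add_field => { "id_error" => "2" "descripcio" => "error2!!!" } }
  }
}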
I have solved the problem. I get the expected results with the following code in "logstash.conf":
input {
  file {
    path => "C:\xxx.log"
  }
}
filter {
  grok {
    patterns_dir => "C:\elk\patterns"
    match => [ "message", "%{ERROR1:error1}" ]
    match => [ "message", "%{ERROR2:error2}" ]
  }
  if [message] =~ /error1_regex/ {
    grok {
      patterns_dir => "C:\elk\patterns"
      match => [ "message", "%{ERROR1:error1}" ]
    }
    mutate {
      add_field => [ "id_error", "1" ]
      add_field => [ "descripcio", "Error1!" ]
      remove_field => [ "message" ]
      remove_field => [ "error1" ]
    }
  }
  else if [message] =~ /error2_regex/ {
    grok {
      patterns_dir => "C:\elk\patterns"
      match => [ "message", "%{ERROR2:error2}" ]
    }
    mutate {
      add_field => [ "id_error", "2" ]
      add_field => [ "descripcio", "Error2!" ]
      remove_field => [ "message" ]
      remove_field => [ "error2" ]
    }
  }
  if ("_grokparsefailure" in [tags]) { drop {} }
}
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
    index => "xxx-%{+YYYY.MM.dd}"
  }
}
