Hierarchical data matching and displaying - elasticsearch

In my log files, I have data that represents the hierarchy of items, much like an HTTP log file might show the hierarchy of a website.
I may have data such as this:
41 2016-01-01 01:41:32-500 show:category:all
41 2016-01-01 04:11:20-500 show:category:animals
42 2016-01-02 01:41:32-500 show:item:wallaby
42 2016-01-02 01:41:32-500 show:home
and I would have three items in here: %{NUMBER:terminal}, %{TIMESTAMP_ISO8601:ts}, and (?<info>([^\r])*).
I parse the info data into an array using mutate and split to convert lvl1:lvl2:lvl3 into ['lvl1','lvl2','lvl3'].
I'm interested in easily aggregating the data to get counts at various levels, such as counting all records where info[0] is the same, or where both info[0] and info[1] are the same (and being able to select a time range and terminal).
Is there a way to set up kibana to visualize this kind of information?
Or should I change the way the filter is matching the data to make the data easier to access?
The depth of levels varies, but I can be pretty certain that the maximum is 5 levels, so I could parse the text into separate fields lvl1 lvl2 lvl3 lvl4 lvl5 instead of putting them in an array.

As per your question, I agree with your way of parsing the data, but I would like to add a bit more to make it directly aggregatable and visualizable using Kibana.
The approach should be:
Filter the data using %{NUMBER:terminal} %{TIMESTAMP_ISO8601:ts} and (?<info>([^\r])*) (as per the information given by you)
Mutate
Filter
Then, after using mutate and split, you will get the data as an array (as you have mentioned).
Now you can add a field for level 1 with add_field => [ "fieldname", "%{[arrayname][0]}" ]
Now you can add a field for level 2 with add_field => [ "fieldname", "%{[arrayname][1]}" ]
Now you can add a field for level 3 with add_field => [ "fieldname", "%{[arrayname][2]}" ]
Then you can directly use Kibana to visualize such information.
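For example, a minimal filter sketch along these lines (the names info and lvl_1 to lvl_3 are placeholders chosen here, not anything prescribed above):
filter {
  grok {
    # terminal id, timestamp and the colon-separated info string
    match => { "message" => "%{NUMBER:terminal} %{TIMESTAMP_ISO8601:ts} (?<info>([^\r])*)" }
  }
  mutate {
    # "show:category:all" becomes ["show", "category", "all"]
    split => { "info" => ":" }
    # expose each level as its own field so Kibana can aggregate on it
    add_field => {
      "lvl_1" => "%{[info][0]}"
      "lvl_2" => "%{[info][1]}"
      "lvl_3" => "%{[info][2]}"
    }
  }
}
If a line has fewer levels than fields, the missing levels end up holding the literal %{[info][N]} reference, which is why the fuller solution below drops those fields again afterwards.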

My solution:
input {
file {
path => "C:/Temp/zipped/*.txt"
start_position => beginning
ignore_older => 0
sincedb_path => "C:/temp/logstash_temp2.sincedb"
}
}
filter {
grok {
match => ["message","^%{NOTSPACE}\[%{NUMBER:terminal_id}\] %{NUMBER:log_level} %{NUMBER} %{TIMESTAMP_ISO8601:ts} \[(?<facility>([^\]]*))\] (?<lvl>([^$|\r])*)"]
}
mutate {
split => ["lvl", ":"]
add_field => {"lvl_1" => "%{lvl[0]}"}
add_field => {"lvl_2" => "%{lvl[1]}"}
add_field => {"lvl_3" => "%{lvl[2]}"}
add_field => {"lvl_4" => "%{lvl[3]}"}
add_field => {"lvl_5" => "%{lvl[4]}"}
add_field => {"lvl_6" => "%{lvl[5]}"}
add_field => {"lvl_7" => "%{lvl[6]}"}
add_field => {"lvl_8" => "%{lvl[7]}"}
lowercase => [ "terminal_id" ] # set to lowercase so that it can be used for index - additional filtering may be required
}
date {
match => ["ts", "YYYY-MM-DD HH:mm:ssZZ"]
}
}
filter {
if [lvl_1] =~ /%\{lvl\[0\]\}/ {mutate {remove_field => [ "lvl_1" ]}}
if [lvl_2] =~ /%\{lvl\[1\]\}/ {mutate {remove_field => [ "lvl_2" ]}}
if [lvl_3] =~ /%\{lvl\[2\]\}/ {mutate {remove_field => [ "lvl_3" ]}}
if [lvl_4] =~ /%\{lvl\[3\]\}/ {mutate {remove_field => [ "lvl_4" ]}}
if [lvl_5] =~ /%\{lvl\[4\]\}/ {mutate {remove_field => [ "lvl_5" ]}}
if [lvl_6] =~ /%\{lvl\[5\]\}/ {mutate {remove_field => [ "lvl_6" ]}}
if [lvl_7] =~ /%\{lvl\[6\]\}/ {mutate {remove_field => [ "lvl_7" ]}}
if [lvl_8] =~ /%\{lvl\[7\]\}/ {mutate {remove_field => [ "lvl_8" ]}}
mutate{
remove_field => [ "lvl","host","ts" ] # do not keep this data
}
}
output {
if [facility] == "mydata" {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-mydata-%{terminal_id}-%{+YYYY.MM.DD}"
}
} else {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-other-%{terminal_id}-%{+YYYY.MM.DD}"
}
}
# stdout { codec => rubydebug }
}

Related

Logstash giving _rubyexception while adding a field and altering its value

Logstash version 6.5.4
I want to create a jobExecutionTime field when status is COMPLETE and set its value to current_timestamp - created_timestamp.
These are a few lines from my config file:
match => { "message" => '%{DATA:current_timestamp},%{WORD:status},%{DATA:created_timestamp}}
if [status] == "COMPLETE" {
mutate {
add_field => [ "jobExecutionTime" , "null" ]
}
ruby {
code => "event.set('jobExecutionTime', event.get('current_timestamp') - event.get('created_timestamp'))"
}
}
This is my input:
"created_timestamp" => "2022-07-10 23:50:03.644"
"current_timestamp" => "2022-07-10 23:50:03.744"
"status" => "COMPLETE"
I am getting this as output
"jobExecutionTime" => "null",
"exportFrequency" => "RECURRENT",
"successfulImportMilestone" => 0,
"tags" => [
[0] "_rubyexception"
],
Here jobExecutionTime is set to null rather than the expected value.
Your [created_timestamp] and [current_timestamp] fields are strings. You cannot do math on a string; you need to convert them to an object type that you can do math on. In this case you should use date filters to convert them to LogStash::Timestamp objects.
If you add
date { match => [ "created_timestamp", "ISO8601" ] target => "created_timestamp" }
date { match => [ "current_timestamp", "ISO8601" ] target => "current_timestamp" }
to your filter section then your ruby filter will work as-is, and you will get
"created_timestamp" => 2022-07-11T03:50:03.644Z,
"current_timestamp" => 2022-07-11T03:50:03.744Z,
"jobExecutionTime" => 0.1

Removing grok matched field after using it

I use Filebeat to fetch log files into my Logstash and then filter out unnecessary fields. Everything works fine and I output these into Elasticsearch, but there is a field which I use for the Elasticsearch index name. I define this variable in my grok match, but I couldn't find a way to remove that variable once it has served its purpose. I'll share my Logstash config below.
input {
beats {
port => "5044"
}
}
filter {
grok {
match => { "[log][file][path]" => ".*(\\|\/)(?<myIndex>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
}
json {
source => message
}
mutate {
remove_field => ["agent"]
remove_field => ["input"]
remove_field => ["#metadata"]
remove_field => ["log"]
remove_field => ["tags"]
remove_field => ["host"]
remove_field => ["#version"]
remove_field => ["message"]
remove_field => ["event"]
remove_field => ["ecs"]
}
date {
match => ["t","yyyy-MM-dd HH:mm:ss.SSS"]
remove_field => ["t"]
}
mutate {
rename => ["l","log_level"]
rename => ["mt","msg_template"]
rename => ["p","log_props"]
}
}
output {
elasticsearch {
hosts => [ "localhost:9222" ]
index => "%{myIndex}"
}
stdout { codec => rubydebug { metadata => true } }
}
I just want to remove the "myIndex" field from my index. With this config file, I still see this field in Elasticsearch, and if possible I want to remove it. I've tried to remove it together with the other fields, but that gave an error. I guess it's because I removed it before Logstash could give it to Elasticsearch.
Create the field under [@metadata]. Those fields are available to use in logstash but are ignored by outputs unless they use a rubydebug codec.
Adjust your grok filter
match => { "[log][file][path]" => ".*(\\|\/)(?<[#metadata][myIndex]>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
Delete [@metadata] from the mutate+remove_field and change the output configuration to have
index => "%{[@metadata][myIndex]}"

How to write if condition inside of the logstash grok pattern?

My question is related to a Logstash grok pattern. I created the pattern below and it works fine, but the big problem is null or empty values. Sometimes "Y" and "age" can be null, so my grok pattern does not create any log in Elasticsearch. It is not working properly. I need to tell my grok pattern:
if(age is null || age is empty){
updatefield["age",0]
}
but I don't know how to do it. By the way, I checked many solutions by googling, but none is directly related to my problem.
input {
file {
path => ["C:/log/*.log"]
start_position => "beginning"
discover_interval => 10
stat_interval => 10
sincedb_write_interval => 10
close_older => 10
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601}\|"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:formattedDate}.* X: %{DATA:X} Y: %{NUMBER:Y} Z: %{DATA:Z} age: %{NUMBER:age:int} "}
}
date {
timezone => "Europe/Istanbul"
match => ["TimeStamp", "ISO8601"]
}
json{
source => "request"
target => "parsedJson"
}
mutate {
remove_field => [ "path","message","tags","#version"]
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => [ "http://localhost:9200" ]
index => "logstash-%{+YYYY.MM}"
}
}
You can check whether your fields exist or are empty using conditionals in your filter:
filter {
if ![age] or [age] == "" {
mutate {
update => { "age" => "0" }
}
}
}
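A sketch covering both fields mentioned in the question (Y and age), assuming 0 is an acceptable default for each:
filter {
  if ![age] or [age] == "" {
    # replace creates the field when it is missing and overwrites it when empty
    mutate { replace => { "age" => "0" } }
  }
  if ![Y] or [Y] == "" {
    mutate { replace => { "Y" => "0" } }
  }
}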

:reason=>"Something is wrong with your configuration." GeoIP.dat Mutate Logstash

I have the following configuration for Logstash.
There are 3 parts to this. One is a generallog which we use for all applications; they all land in here.
The second part is the application stats, where we have a specific logger configured to push the application statistics.
The third is the click stats: whenever an event occurs on the client side we may want to push it to Logstash on the UDP address.
All 3 are UDP based, and we also use log4net to send the logs to Logstash.
The base install did not have a GeoIP.dat file, so I downloaded the file from https://dev.maxmind.com/geoip/legacy/geolite/
and put the file in /opt/logstash/GeoIPDataFile with 777 permissions on the file and folder.
The second thing is I have a country name and I need a way to show how many users from each country are viewing the application in the last 24 hours,
so for that reason we also capture the country name as it is in their profile in the application.
Now I need a way to get the geo coordinates to use the tile map in Kibana.
What am I doing wrong?
If I take out the geoip { source => "country" } section, Logstash works fine.
When I check with
/opt/logstash/bin/logstash -t -f /etc/logstash/conf.d/logstash.conf
"The configuration file is ok" is what I receive. Where am I going wrong?
Any help would be great.
input {
udp {
port => 5001
type => generallog
}
udp {
port => 5003
type => applicationstats
}
udp {
port => 5002
type => clickstats
}
}
filter {
if [type] == "generallog" {
grok {
remove_field => message
match => { message => "(?m)%{TIMESTAMP_ISO8601:sourcetimestamp} \[%{NUMBER:threadid}\] %{LOGLEVEL:loglevel} +- %{IPORHOST:requesthost} - %{WORD:applicationname} - %{WORD:envname} - %{GREEDYDATA:logmessage}" }
}
if !("_grokparsefailure" in [tags]) {
mutate {
replace => [ "message" , "%{logmessage}" ]
replace => [ "host" , "%{requesthost}" ]
add_tag => "generalLog"
}
}
}
if [type] == "applicationstats" {
grok {
remove_field => message
match => { message => "(?m)%{TIMESTAMP_ISO8601:sourceTimestamp} \[%{NUMBER:threadid}\] %{LOGLEVEL:loglevel} - %{WORD:envName}\|%{IPORHOST:actualHostMachine}\|%{WORD:applicationName}\|%{NUMBER:empId}\|%{WORD:regionCode}\|%{DATA:country}\|%{DATA:applicationName}\|%{NUMBER:staffapplicationId}\|%{WORD:applicationEvent}" }
}
geoip {
source => "country"
target => "geoip"
database => "/opt/logstash/GeoIPDataFile/GeoIP.dat"
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}
mutate {
convert => [ "[geoip][coordinates]", "float"]
}
if !("_grokparsefailure" in [tags]) {
mutate {
add_tag => "applicationstats"
add_tag => [ "eventFor_%{applicationName}" ]
}
}
}
if [type] == "clickstats" {
grok {
remove_field => message
match => { message => "(?m)%{TIMESTAMP_ISO8601:sourceTimestamp} \[%{NUMBER:threadid}\] %{LOGLEVEL:loglevel} - %{IPORHOST:remoteIP}\|%{IPORHOST:fqdnHost}\|%{IPORHOST:actualHostMachine}\|%{WORD:applicationName}\|%{WORD:envName}\|(%{NUMBER:clickId})?\|(%{DATA:clickName})?\|%{DATA:clickEvent}\|%{WORD:domainName}\\%{WORD:userName}" }
}
if !("_grokparsefailure" in [tags]) {
mutate {
add_tag => "clicksStats"
add_tag => [ "eventFor_%{clickName}" ]
}
}
}
}
output {
if [type] == "applicationstats" {
elasticsearch {
hosts => "localhost:9200"
index => "applicationstats-%{+YYYY-MM-dd}"
template => "/opt/logstash/templates/udp-applicationstats.json"
template_name => "applicationstats"
template_overwrite => true
}
}
else if [type] == "clickstats" {
elasticsearch {
hosts => "localhost:9200"
index => "clickstats-%{+YYYY-MM-dd}"
template => "/opt/logstash/templates/udp-clickstats.json"
template_name => "clickstats"
template_overwrite => true
}
}
else if [type] == "generallog" {
elasticsearch {
hosts => "localhost:9200"
index => "generallog-%{+YYYY-MM-dd}"
template => "/opt/logstash/templates/udp-generallog.json"
template_name => "generallog"
template_overwrite => true
}
}
else{
elasticsearch {
hosts => "localhost:9200"
index => "logstash-%{+YYYY-MM-dd}"
}
}
}
As per the error message, the mutation which you're trying to do could be wrong. Could you please change your mutate as below:
mutate {
convert => { "geoip" => "float" }
convert => { "coordinates" => "float" }
}
I guess you've given the mutate as an array, while it is a hash type by origin. Try converting both values individually. Your database path for geoip seems fine in your filter. Is that the whole error which you've mentioned in the question? If not, update the question with the whole error if possible.
Refer here for in-depth explanations.

Not seeing any Fields for a Y-Axis aggregation in Kibana

I have a grok filter for Apache logs as follows:
if [type] == "apachelogs" {
grok {
break_on_match => false
match => { "message" => "\[%{HTTPDATE:apachetime}\]%{SPACE}%{NOTSPACE:verb}%{SPACE}/%{NOTSPACE:ApacheRequested}" }
match=> { "message" => "\*\*%{NUMBER:seconds}/%{NUMBER:microseconds}" }
add_tag => "%{apachetime}"
add_tag => "%{verb}"
add_tag => "%{ApacheRequested}"
add_tag => "%{seconds}"
add_tag => "%{microseconds}"
I want to create a visualisation in Kibana for the search type="apachelogs". I am using Filebeat, so my search query is
filebeat*type="apachelogs"
I want apachetime on the X-axis and microseconds on the Y-axis. But on the Y-axis, I am not getting any fields except the default ones (sum, count, aggregation).
Please help. I don't know what I am doing wrong.
