Logstash array split gives NilClass for one-element array - elasticsearch

I am trying to send JSON data from logs through Filebeat -> Logstash to Elasticsearch, but I seem to get a NilClass error no matter what I try.
The data sample:
{"student":[{"details":{"name":chirs,"lname":dave},"age":10,"grade":1.2,"id":"323"}],"id":"metric95"}
My Logstash configuration is:
input {
  beats {
    port => "5044"
  }
}
filter {
  json {
    source => "message"
  }
  split {
    field => "[student]"
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
  stdout { codec => rubydebug }
}
Error: split - Only String and Array types are splittable. field:[student] is of type = NilClass

Please try
split {
  field => "student"
}
and put double quotes around the string values, e.g. {"name":"chirs","lname":"dave"}. As posted, the sample is not valid JSON, so the json filter cannot parse the message and the [student] field is never created, which is why split reports NilClass.
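For reference, a minimal sketch of a corrected log line and filter block, using the field names from the question:

{"student":[{"details":{"name":"chirs","lname":"dave"},"age":10,"grade":1.2,"id":"323"}],"id":"metric95"}

filter {
  json {
    source => "message"
  }
  split {
    field => "student"
  }
}

With valid JSON in message, [student] is an array (even with a single element) and split produces one event per array element.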

Related

Logstash aggregate fields

I am trying to configure Logstash to aggregate similar syslog messages based on the message field within a specific time window.
To make my case clear, this is an example of what I would like to do.
Example: I have these junk syslog messages coming through my Logstash:
timestamp   message
13:54:24    hello
13:54:35    hello
What I would like to do is have a condition that checks whether the messages are the same and, if those messages occur within a specific timespan (for example 10 minutes), aggregate them into one row and increase the count.
The output I am expecting to see is as follows:
timestamp   message   count
13:54:35    hello     2
I know and I saw that there is the opportunity to aggregate the fields, but I was wondering if there is a chance to do this aggregation based on a specific time range.
If anyone can help me I would be extremely grateful, as I am new to Logstash and I have the problem that on my server I am receiving tons of junk syslog messages and I would like to reduce that amount.
So far I have done some cleaning with this configuration:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp","message","newfield"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
Now I just need to do the aggregation.
Thank you so much for your help guys
EDIT:
Following the documentation, I put in place this configuration:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp","message","newfield"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
  if [message] =~ "MESSAGE FROM" {
    aggregate {
      task_id => "%{message}"
      code => "map['message'] ||= 0; map['message'] += 1;"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "message"
      timeout => 60
      inactivity_timeout => 50
      timeout_tags => ['_aggregatetimeout']
      timeout_code => "event.set('count_message', event.get('message') > 1)"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
I don't get any error, but the output is not what I am expecting. The actual output is that it creates a tags field (good), containing an array with _aggregatetimeout and _aggregateexception:
{
  "message" => "<88>MESSAGE FROM\r\n",
  "tags" => [
    [0] "_aggregatetimeout",
    [1] "_aggregateexception"
  ],
  "@timestamp" => 2021-07-23T12:10:45.646Z,
  "@version" => "1"
}
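One likely cause of the _aggregateexception tag (an assumption based on how the aggregate filter builds timeout events, not verified against this exact setup): the running count is stored in map['message'], but timeout_task_id_field => "message" writes the task id string into that same field on the pushed event, so timeout_code ends up comparing a string with 1 and raises. A sketch that keeps the count in its own field:

if [message] =~ "MESSAGE FROM" {
  aggregate {
    task_id => "%{message}"
    code => "map['count'] ||= 0; map['count'] += 1;"
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "message"
    timeout => 600    # 10-minute window, as described in the question
    timeout_tags => ['_aggregatetimeout']
  }
}

The timeout-generated event then carries both the original message and a numeric count field, which is roughly the "one row with a count" output described above.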

Read a CSV at the Logstash level and filter on the basis of the extracted data

I am using Metricbeat to get process-level data and push it to Elasticsearch using Logstash.
Now, the aim is to categorize the processes into two tags, i.e. the running process is either a browser or it is something else.
I am able to do that statically using this block of code:
input {
  beats {
    port => 5044
  }
}
filter {
  if [process][name] == "firefox.exe" or [process][name] == "chrome.exe" {
    mutate {
      add_field => { "process.type" => "browsers" }
      convert => {
        "process.type" => "string"
      }
    }
  }
  else {
    mutate {
      add_field => { "process.type" => "other" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    # manage_template => false
    index => "metricbeatlogstash"
  }
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I do not get any valid results in Kibana, nor an error at the Logstash level.
The CSV config file code is as follows:
input {
  beats {
    port => 5044
  }
  file {
    path => "filePath"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["processList","IT"]
  }
  if [process][name] in [processList] {
    mutate {
      add_field => { "process.type" => "browsers" }
      convert => {
        "process.type" => "string"
      }
    }
  }
  else {
    mutate {
      add_field => { "process.type" => "other" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    # manage_template => false
    index => "metricbeatlogstash2"
  }
}
What you are trying to do does not work that way in Logstash; the events in a Logstash pipeline are independent from each other.
The events received by your beats input have no knowledge of the events received by your file input, so you can't use fields from different events in a conditional.
To do what you want, you can use the translate filter with the following config:
translate {
  field => "[process][name]"
  destination => "[process][type]"
  dictionary_path => "process.csv"
  fallback => "others"
  refresh_interval => 300
}
This filter will check the value of the field [process][name] against a dictionary loaded into memory from the file process.csv. The dictionary is a .csv file with two columns: the first is the name of the browser process, and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter gets a match, it will populate the field [process][type] (not process.type) with the value from the second column, in this case always browser. If there is no match, it will populate [process][type] with the value of the fallback option, in this case others. It will also reload the content of the process.csv file every 300 seconds (5 minutes).
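Putting it together, the beats pipeline no longer needs the file input or the csv filter; a minimal sketch (the dictionary path is an assumption, adjust it to wherever process.csv lives):

input {
  beats {
    port => 5044
  }
}
filter {
  translate {
    field => "[process][name]"
    destination => "[process][type]"
    dictionary_path => "/etc/logstash/process.csv"
    fallback => "others"
    refresh_interval => 300
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "metricbeatlogstash2"
  }
}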

Lowercase field name in Logstash for Elasticsearch index

I have a Logstash command that I'm piping a file to that will write to Elasticsearch. I want to use one field (appName) to select the index I will write to. However, the data in this field is not all lowercase, so I need to lowercase it when selecting the index, but I don't want the data in the document itself to be modified.
I have an attempt below where I first copy the original field (appName) to a new one (appNameIndex), lowercase the new field, remove it from the upload, and then use it to pick the index.
input {
  stdin { type => stdin }
}
filter {
  csv {
    separator => " "
    columns => ["appName", "field1", "field2", ...]
    convert => {
      ...
    }
  }
}
filter {
  mutate {
    copy => ["appName", "appNameIndex"]
  }
}
filter {
  mutate {
    lowercase => ["appNameIndex"]
  }
}
filter {
  mutate {
    remove_field => [
      "appNameIndex", # if I remove this it works
      ...
    ]
  }
}
output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "%{appNameIndex}"
    region => "us-east-1"
  }
}
However, I am getting errors that say
Invalid index name [%{appIndexName}]
Clearly it's not grabbing my mutation. Is it because the remove section takes it out entirely? I was hoping it just removed it from the document upload. Am I going about this incorrectly?
UPDATE: I tried taking out the remove_field part and it does in fact work, so that helps identify the source of the error. Now the question becomes how do I get around it. With that part of the config removed, I essentially have two fields with the same data, one lowercased and one not.
You can use the @metadata field, a special field which will never be included in the output: https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#metadata.
input {
  stdin { type => stdin }
}
filter {
  csv {
    separator => " "
    columns => ["appName", "field1", "field2", ...]
    convert => {
      ...
    }
  }
}
filter {
  mutate {
    copy => ["appName", "[@metadata][appNameIndex]"]
  }
}
filter {
  mutate {
    lowercase => ["[@metadata][appNameIndex]"]
  }
}
output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "%{[@metadata][appNameIndex]}"
    region => "us-east-1"
  }
}
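If you want to confirm the value while testing, the rubydebug codec can be told to print @metadata, which is hidden from outputs by default; a quick sketch:

output {
  stdout { codec => rubydebug { metadata => true } }
}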

Elasticsearch change field type from filter dissect

I use logstash-logback-encoder to send Java log files to Logstash, and then to Elasticsearch. To parse the message in the Java log, I use the following filter to dissect the message:
input {
  file {
    path => "/Users/MacBook-201965/Work/java/logs/oauth-logstash.log"
    start_position => "beginning"
    codec => "json"
  }
}
filter {
  if "EXECUTION_TIME" in [tags] {
    dissect {
      mapping => {
        "message" => "%{endpoint} timeMillis:[%{execution_time_millis}] data:%{additional_data}"
      }
    }
    mutate {
      convert => { "execution_time_millis" => "integer" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "elk-%{+YYYY}"
    document_type => "log"
  }
  stdout {
    codec => json
  }
}
It dissects the message so I can get the value of execution_time_millis. However, the data type is string. I created the index using a Kibana index pattern. How can I change the data type of execution_time_millis to long?
Here is a sample JSON message from Logback:
{
  "message":"/tests/{id} timeMillis:[142] data:2282||0:0:0:0:0:0:0:1",
  "logger_name":"com.timpamungkas.oauth.client.controller.ElkController",
  "level_value":20000,
  "endpoint":"/tests/{id}",
  "execution_time_millis":"142",
  "@version":1,
  "host":"macbook201965s-MacBook-Air.local",
  "thread_name":"http-nio-8080-exec-7",
  "path":"/Users/MacBook-201965/Work/java/logs/oauth-logstash.log",
  "@timestamp":"2018-01-04T11:20:20.100Z",
  "level":"INFO",
  "tags":[
    "EXECUTION_TIME"
  ],
  "additional_data":"2282||0:0:0:0:0:0:0:1"
}
{
  "message":"/tests/{id} timeMillis:[110] data:2280||0:0:0:0:0:0:0:1",
  "logger_name":"com.timpamungkas.oauth.client.controller.ElkController",
  "level_value":20000,
  "endpoint":"/tests/{id}",
  "execution_time_millis":"110",
  "@version":1,
  "host":"macbook201965s-MacBook-Air.local",
  "thread_name":"http-nio-8080-exec-5",
  "path":"/Users/MacBook-201965/Work/java/logs/oauth-logstash.log",
  "@timestamp":"2018-01-04T11:20:19.780Z",
  "level":"INFO",
  "tags":[
    "EXECUTION_TIME"
  ],
  "additional_data":"2280||0:0:0:0:0:0:0:1"
}
Thank you
If you have already indexed the documents, you'll have to reindex the data after changing the datatype of any field.
However, you can use something like this to change the type of the millis field from string to integer (long is not supported by this filter):
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-convert
Also, try defining an Elasticsearch index template before creating the index if you are going to add multiple indices whose names follow a pattern. Otherwise, you can define your index mapping beforehand and then start indexing.
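As an illustration of the template approach, here is a sketch of an index template that maps execution_time_millis as long for the elk-YYYY indices from the question (assuming an Elasticsearch 6.x cluster, where the mapping type log matches the document_type in the output):

PUT _template/elk
{
  "index_patterns": ["elk-*"],
  "mappings": {
    "log": {
      "properties": {
        "execution_time_millis": { "type": "long" }
      }
    }
  }
}

With this template in place, newly created elk-* indices will map the field as long; existing indices still need a reindex, as noted above.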

How to filter {"foo":"bar", "bar": "foo"} with grok to get only the foo field?

I copied
{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}
from https://github.com/trentm/node-bunyan and saved it as my logs.json. I am trying to import only two fields (name and msg) into Elasticsearch via Logstash. The problem is that I depend on a sort of filter that I am not able to accomplish. I have successfully imported such a line as a single message, but that is certainly not enough in my real case.
That said, how can I import only name and msg into Elasticsearch? I tested several alternatives using http://grokdebug.herokuapp.com/ to reach a useful filter, with no success at all.
For instance, %{GREEDYDATA:message} will bring in the entire line as a single message, but how do I split it and ignore everything other than the name and msg fields?
In the end, I am planning to use this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  grok {
    match => { "message" => "data=%{GREEDYDATA:request}" }
  }
  #### some extra lines here probably
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
I have just gone through the list of available Logstash filters. The prune filter should match your need.
Assuming you have installed the prune filter, your config file should look like this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  prune {
    whitelist_names => [
      "@timestamp",
      "type",
      "name",
      "msg"
    ]
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
Please note that you will want to keep type so Elasticsearch indexes the documents into the correct type. @timestamp is required if you want to view the data in Kibana.
