Logstash delay of log sending - elasticsearch

I'm forwarding application logs to elasticsearch, while performing some grok filters before.
The application has a timestamp field and there's the timestamp field of logstash itself.
We regularly check the difference between those timestamp, and on many cases the delay is very big, meaning the log took very long time to be shipped to elasticsearch.
I'm wondering how can I isolate the issue to know if the delay is coming from logstash or elasticsearch.
Example logstash scrape config:
input {
file {
path => "/app/app-core/_logs/app-core.log"
codec => multiline {
pattern => "(^[a-zA-Z.]+(?:Error|Exception).+)|(^\s+at .+)|(^\s+... \d+ more)|(^\t+)|(^\s*Caused by:.+)"
what => "previous"
}
}
}
filter {
if "multiline" not in [tags]{
json {
source => "message"
remove_field => ["[request][body]","[response][body][response][items]"]
}
}
else {
grok {
pattern_definitions => { APPJSON => "{.*}" }
match => { "message" => "%{APPJSON:appjson} %{GREEDYDATA:stack_trace}"}
remove_field => ["message"]
}
json {
source => "appjson"
remove_field => ["appjson"]
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch-logs.internal.app.io:9200"]
index => "logstash-core-%{+YYYY.MM.dd}"
document_type => "logs"
}
}
We tried adjusting the number of workers and batch size, no value we tried reduced the delay:
pipeline.workers: 9
pipeline.output.workers: 9
pipeline.batch.size: 600
pipeline.batch.delay: 5
Nothing was done on the elasticsearch side because I think the issue is with logstash, but I'm not sure.

Related

Logstash aggregate fields

I am trying to configure logstash to aggregate similar syslog based on a message field and in a specific timestamp.
To make my case clear, this is an example of what I would like to do.
example: I have those junk syslog coming through my logstash
timestamp. message
13:54:24. hello
13:54:35. hello
What I would like to do is have a condition that check if the message are the same and those message occurs in a specific timespan (for example 10min) I would like to aggregate them into one row, and increase the count
the output I am expecting to see is as follow
timestamp. message. count
13.54.35. hello. 2
I know and I saw that there is the opportunity to aggregate the fields, but I was wondering if there is a chance to do this aggregation based on a specific time range
If anyone can help me I would be extremely grateful as I am new to logstash and I have the problem that in my server I am receiving tons of junk syslog and I would like to reduce that amount.
So far I did some cleaning with this configuration
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield"]
}
mutate {
add_field => {"newfield" => "%{#timestamp}%{message}"}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
}
Now I just need to do the aggregation.
Thank you so much for your help guys
EDIT:
Following the documentation, I put in place this configuration:
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield"]
}
mutate {
add_field => {"newfield" => "%{#timestamp}%{message}"}
}
if [message] =~ "MESSAGE FROM" {
aggregate {
task_id => "%{message}"
code => "map['message'] ||= 0; map['message'] += 1;"
push_map_as_event_on_timeout => true
timeout_task_id_field => "message"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
timeout_code => "event.set('count_message', event.get('message') > 1)"
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
}
I don't get any error but the output is not what I am expecting.
The actual output is that it create a tag field (Good) passing an array with _aggregationtimeout and _aggregationexception
{
"message" => "<88>MESSAGE FROM\r\n",
"tags" => [
[0] "_aggregatetimeout",
[1] "_aggregateexception"
],
"#timestamp" => 2021-07-23T12:10:45.646Z,
"#version" => "1"
}

Read a CSV in Logstash level and filter on basis of the extracted data

I am using Metricbeat to get process-level data and push it to Elastic Search using Logstash.
Now, the aim is to categorize the processes into 2 tags i.e the process running is either a browser or it is something else.
I am able to do that statically using this block of code :
input {
beats {
port => 5044
}
}
filter{
if [process][name]=="firefox.exe" or [process][name]=="chrome.exe" {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash"
}
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I am not getting any valid results in Kibana, nor a error on my LogStash level.
The CSV config file code is as follows :
input {
beats {
port => 5044
}
file{
path=>"filePath"
start_position=>"beginning"
sincedb_path=>"NULL"
}
}
filter{
csv{
separator=>","
columns=>["processList","IT"]
}
if [process][name] in [processList] {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash2"
}
}
What you are trying to do does not work that way in logstash, the events in a logstash pipeline are independent from each other.
The events received by your beats input have no knowledge about the events received by your csv input, so you can't use fields from different events in a conditional.
To do what you want you can use the translate filter with the following config.
translate {
field => "[process][name]"
destination => "[process][type]"
dictionary_path => "process.csv"
fallback => "others"
refresh_interval => 300
}
This filter will check the value of the field [process][name] against a dictionary, loaded into memory from the file process.csv, the dictionary is a .csv file with two columns, the first is the name of the browser process and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter got a match, it will populate the field [process][type] (not process.type) with the value from the second column, in this case, always browser, if there is no match, it will populate the field [process][type] with the value of the fallback config, in this case, others, it will also reload the content of the process.csv file every 300 seconds (5 minutes)

How to filter data with Logstash before storing parsed data in Elasticsearch

I understand that Logstash is for aggregating and processing logs. I have NGIX logs and had Logstash config setup as:
filter {
grok {
match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
overwrite => [ "message" ]
}
mutate {
convert => ["response", "integer"]
convert => ["bytes", "integer"]
convert => ["responsetime", "float"]
}
geoip {
source => "clientip"
target => "geoip"
add_tag => [ "nginx-geoip" ]
}
date {
match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
remove_field => [ "timestamp" ]
}
useragent {
source => "agent"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "weblogs-%{+YYYY.MM}"
document_type => "nginx_logs"
}
stdout { codec => rubydebug }
}
This would parse the unstructured logs into a structured form of data, and store the data into monthly indexes.
What I discovered is that the majority of logs were contributed by robots/web-crawlers. In python I would filter them out by:
browser_names = browser_names[~browser_names.str.\
match('^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$', na=False)]
However, I would like to filter them out with Logstash so I can save a lot of disk space in Elasticsearch server. Is there a way to do that? Thanks in advance!
Thanks LeBigCat for generously giving a hint. I solved this problem by adding the following under the filter:
if [browser_names] =~ /(?i)^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$/ {
drop {}
}
the (?i) flag is for case insensitive matching.
In your filter you can ask for drop (https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html). As you already got your pattern, should be pretty fast ;)

Elasticsearch Logstash Filebeat mapping

Im having a problem with ELK Stack + Filebeat.
Filebeat is sending apache-like logs to Logstash, which should be parsing the lines. Elasticsearch should be storing the split data in fields so i can visualize them using Kibana.
Problem:
Elasticsearch recieves the logs but stores them in a single "message" field.
Desired solution:
Input:
10.0.0.1 some.hostname.at - [27/Jun/2017:23:59:59 +0200]
ES:
"ip":"10.0.0.1"
"hostname":"some.hostname.at"
"timestamp":"27/Jun/2017:23:59:59 +0200"
My logstash configuration:
input {
beats {
port => 5044
}
}
filter {
if [type] == "web-apache" {
grok {
patterns_dir => ["./patterns"]
match => { "message" => "IP: %{IPV4:client_ip}, Hostname: %{HOSTNAME:hostname}, - \[timestamp: %{HTTPDATE:timestamp}\]" }
break_on_match => false
remove_field => [ "message" ]
}
date {
locale => "en"
timezone => "Europe/Vienna"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
useragent {
source => "agent"
prefix => "browser_"
}
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["localhost:9200"]
index => "test1"
document_type => "accessAPI"
}
}
My Elasticsearch discover output:
I hope there are any ELK experts around that can help me.
Thank you in advance,
Matthias
The grok filter you stated will not work here.
Try using:
%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]
There is no need to specify desired names seperately in front of the field names (you're not trying to format the message here, but to extract seperate fields), just stating the field name in brackets after the ':' will lead to the result you want.
Also, use the overwrite-function instead of remove_field for message.
More information here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-options
It will look similar to that in the end:
filter {
grok {
match => { "message" => "%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]" }
overwrite => [ "message" ]
}
}
You can test grok filters here:
http://grokconstructor.appspot.com/do/match

Logstash not importing files due to missing index error

I am having a difficult time trying to get the combination of the Logstash, Elasticsearch & Kibana working in my Windows 7 environment.
I have set all 3 up and they all seem to be running fine, Logstash and Elasticsearch are running as Windows services and Kibana as a website in IIS.
Logstash is running from http://localhost:9200
I have a web application creating log files in .txt with the format:
Datetime=[DateTime], Value=[xxx]
The log files get created in this directory:
D:\wwwroot\Logs\Errors\
My logstash.conf file looks like this:
input {
file {
format => ["plain"]
path => ["D:\wwwroot\Logs\Errors\*.txt"]
type => "testlog"
}
}
output {
elasticsearch {
embedded => true
}
}
My Kibana config.js file looks like this:
define(['settings'],
function (Settings) {
return new Settings({
elasticsearch: "http://localhost:9200",
kibana_index: "kibana-int",
panel_names: [
'histogram',
'map',
'pie',
'table',
'filtering',
'timepicker',
'text',
'fields',
'hits',
'dashcontrol',
'column',
'derivequeries',
'trends',
'bettermap',
'query',
'terms'
]
});
});
When I view Kibana I see the error:
No index found at http://localhost:9200/_all/_mapping. Please create at least one index.If you're using a proxy ensure it is configured correctly.
I have no idea on how to create the index, so if anyone can shed some light on what I am doing wrong that would be great.
It seems like nothing is making it to elasticsearch currently.
For the current version of es (0.90.5), I had to use elasticsearch_http output. The elasticsearch output seemed to be too closely associated with 0.90.3.
e.g: here is how my config is for log4j format to elastic search
input {
file {
path => "/srv/wso2/wso2am-1.4.0/repository/logs/wso2carbon.log"
path => "/srv/wso2/wso2as-5.1.0/repository/logs/wso2carbon.log"
path => "/srv/wso2/wso2is-4.1.0/repository/logs/wso2carbon.log"
type => "log4j"
}
}
output {
stdout { debug => true debug_format => "ruby"}
elasticsearch_http {
host => "localhost"
port => 9200
}
}
For my file format, I have a grok filter as well - to parse it properly.
filter {
if [message] !~ "^[ \t\n]+$" {
# if the line is a log4j type
if [type] == "log4j" {
# parse out fields from log4j line
grok {
match => [ "message", "TID:%{SPACE}\[%{BASE10NUM:thread_name}\]%{SPACE}\[%{WORD:component}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_file}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
add_tag => ["test"]
}
if "_grokparsefailure" not in [tags] {
mutate {
replace => ["message", " "]
}
}
multiline {
pattern => "^TID|^ $"
negate => true
what => "previous"
add_field => {"additional_log" => "%{message}"}
remove_field => ["message"]
remove_tag => ["_grokparsefailure"]
}
mutate {
strip => ["additional_log"]
remove_tag => ["test"]
remove_field => ["message"]
}
}
} else {
drop {}
}
}
Also, I would get elasticsearch head plugin to monitor your content in elasticsearch- to easily verify the data and state it is in.

Resources