Filebeat -> Logstash indexing documents twice - elasticsearch

I have Nginx logs being sent from Filebeat to Logstash which is indexing them into Elasticsearch.
Every entry gets indexed twice: once with the grok filter applied correctly, and a second time with no fields parsed except the "message" field.
This is the logstash configuration.
02-beats-input.conf
input {
  beats {
    port => 5044
    ssl => false
  }
}
11-nginx-filter.conf
filter {
  if [type] == "nginx-access" {
    grok {
      patterns_dir => ['/etc/logstash/patterns']
      match => { "message" => "%{NGINXACCESS}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z", "d/MMM/YYYY:HH:mm:ss Z" ]
    }
  }
}
Nginx Patterns
NGUSERNAME [a-zA-Z\.\#\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{IPORHOST:clientip}\s+%{NGUSER:ident}\s+%{NGUSER:auth}\s+\[%{HTTPDATE:timestamp}\]\s+\"%{WORD:verb}\s+%{URIPATHPARAM:request}\s+HTTP/%{NUMBER:httpversion}\"\s+%{NUMBER:response}\s+(?:%{NUMBER:bytes}|-)\s+(?:\"(?:%{URI:referrer}|-)\"|%{QS:referrer})\s+%{QS:agent}
30-elasticsearch-output.conf
output {
  elasticsearch {
    hosts => ["elastic00:9200", "elastic01:9200", "elastic02:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Check your Filebeat configuration!
During setup I had accidentally un-commented and configured the output.elasticsearch section of filebeat.yml.
I then configured the output.logstash section as well, but forgot to comment the Elasticsearch output back out.
This caused each entry to be sent once to Logstash, where it was grok'd, and once directly to Elasticsearch.
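A minimal sketch of the relevant filebeat.yml sections (the hosts shown are placeholders, not taken from the original post): the elasticsearch output must stay commented out once output.logstash is configured, otherwise Filebeat ships every event twice.

```yaml
# Leave this commented out, or Filebeat ships a second copy of every
# event straight to Elasticsearch, bypassing the Logstash grok filter:
#output.elasticsearch:
#  hosts: ["elastic00:9200"]

output.logstash:
  hosts: ["localhost:5044"]
```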

Related

how to import an apache log into elasticsearch and create an index pattern in Kibana

I have an Apache log file which I want to import into Elasticsearch and then create an index pattern for in Kibana. I have installed the ELK stack on my CentOS 7 machine and the services are running. I configured the /etc/logstash/conf.d/apachelog.conf file, but nothing shows up in Kibana. Please suggest what is missing here or whether any other configuration is required. The elasticsearch.yml and kibana.yml have also been configured. The /etc/logstash/conf.d/apachelog.conf file is as follows:
input {
  file {
    path => "/home/vagrant/apache.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "project"
  }
}
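Not the confirmed cause here, but one common gotcha with the file input is worth checking: Logstash records how far it has read each file in a sincedb, so a file it has already seen is never re-read from the beginning, even with start_position set. For testing, the sincedb can be pointed at /dev/null (a sketch of the input block above with that one addition):

```
input {
  file {
    path => "/home/vagrant/apache.log"
    start_position => "beginning"
    # Testing only: forget read positions, so the file is re-read on each restart
    sincedb_path => "/dev/null"
  }
}
```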

Logstash delay of log sending

I'm forwarding application logs to Elasticsearch, applying some grok filters along the way.
The application log has its own timestamp field, and there is also Logstash's own timestamp field.
We regularly check the difference between those timestamps, and in many cases the delay is very large, meaning the log took a very long time to reach Elasticsearch.
I'm wondering how I can isolate the issue, to determine whether the delay comes from Logstash or from Elasticsearch.
Example logstash scrape config:
input {
  file {
    path => "/app/app-core/_logs/app-core.log"
    codec => multiline {
      pattern => "(^[a-zA-Z.]+(?:Error|Exception).+)|(^\s+at .+)|(^\s+... \d+ more)|(^\t+)|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}
filter {
  if "multiline" not in [tags] {
    json {
      source => "message"
      remove_field => ["[request][body]","[response][body][response][items]"]
    }
  }
  else {
    grok {
      pattern_definitions => { APPJSON => "{.*}" }
      match => { "message" => "%{APPJSON:appjson} %{GREEDYDATA:stack_trace}" }
      remove_field => ["message"]
    }
    json {
      source => "appjson"
      remove_field => ["appjson"]
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch-logs.internal.app.io:9200"]
    index => "logstash-core-%{+YYYY.MM.dd}"
    document_type => "logs"
  }
}
We tried adjusting the number of workers and batch size, no value we tried reduced the delay:
pipeline.workers: 9
pipeline.output.workers: 9
pipeline.batch.size: 600
pipeline.batch.delay: 5
Nothing was done on the elasticsearch side because I think the issue is with logstash, but I'm not sure.
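One way to split the delay into a before-Logstash and an after-Logstash part (a sketch, not from the original post; the field name is made up): stamp each event with the wall-clock time at which Logstash's filter stage handled it. The gap between the application timestamp and this field is shipping, queueing, and filtering time inside Filebeat/Logstash; the gap between this field and when the document becomes searchable is Elasticsearch indexing time.

```
filter {
  ruby {
    # Record when Logstash's filter stage handled this event (millisecond precision)
    code => "event.set('logstash_filtered_at', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ'))"
  }
}
```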

Match missing LogStash logs to the correct date in Kibana dashboard

TL;DR; Map missing logs from LogStash in Kibana dashboard to their correct date and time
I have an Amazon Elasticsearch domain (which includes both the Elasticsearch service and a Kibana dashboard) configured. My source of logs is a Beanstalk environment. I have installed Filebeat inside that environment, which sends the logs to an EC2 instance that I have configured with Logstash. The Logstash server then sends the logs to the ES domain endpoint.
This worked for a while without an issue, but when I checked yesterday, I saw that logs had not been sent to ES for about 4 days. I got it fixed, and current logs are being transferred fine, but the missing logs are still stacked up on the Logstash server.
I modified my logstash.conf file to include the missing log files, but they all appear as one single bar on the current date in my Kibana graph. What I want is for each missing set of logs to show up in Kibana at its respective date and time.
Example date and time part:
2021-05-20 19:44:34.700+0000
The following is my logstash.conf configuration. (Please let me know if I should post my FileBeat config too).
input {
  beats {
    port => 5044
  }
}
filter {
  date {
    match => [ "logdate", "yyyy-MM-dd HH:mm:ss.SSS+Z" ]
  }
  mutate {
    split => { "message" => "|" }
    add_field => { "messageDate" => "%{[message][0]}" }
    add_field => { "messageLevel" => "%{[message][1]}" }
    add_field => { "messageContextName" => "%{[message][2]}" }
    add_field => { "messagePID" => "%{[message][3]}" }
    add_field => { "messageThread" => "%{[message][4]}" }
    add_field => { "messageLogger" => "%{[message][5]}" }
    add_field => { "messageMessage" => "%{[message][6]}" }
  }
}
output {
  amazon_es {
    hosts => ["hostname"]
    index => "dev-%{+YYYY.MM.dd}"
    region => "region"
    aws_access_key_id => "ackey_id"
    aws_secret_access_key => "ackey"
  }
}
Use a date filter to parse the date contained in the message. This will set [@timestamp], and the event will then appear in the right bucket in Kibana.
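Applied to the sample timestamp in the question (2021-05-20 19:44:34.700+0000), a date filter along these lines should work; the source field name messageDate and the exact pattern are assumptions based on the config shown above:

```
filter {
  date {
    # Joda pattern for "2021-05-20 19:44:34.700+0000"; Z matches the +0000 offset
    match => [ "messageDate", "yyyy-MM-dd HH:mm:ss.SSSZ" ]
    target => "@timestamp"
  }
}
```

Note that the split/add_field steps must run before this filter, so that messageDate exists when the date filter fires.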

Elasticsearch Logstash Filebeat mapping

I'm having a problem with the ELK stack + Filebeat.
Filebeat is sending Apache-like logs to Logstash, which should parse the lines. Elasticsearch should store the split data in fields so I can visualize them using Kibana.
Problem:
Elasticsearch receives the logs but stores them in a single "message" field.
Desired solution:
Input:
10.0.0.1 some.hostname.at - [27/Jun/2017:23:59:59 +0200]
ES:
"ip":"10.0.0.1"
"hostname":"some.hostname.at"
"timestamp":"27/Jun/2017:23:59:59 +0200"
My logstash configuration:
input {
  beats {
    port => 5044
  }
}
filter {
  if [type] == "web-apache" {
    grok {
      patterns_dir => ["./patterns"]
      match => { "message" => "IP: %{IPV4:client_ip}, Hostname: %{HOSTNAME:hostname}, - \[timestamp: %{HTTPDATE:timestamp}\]" }
      break_on_match => false
      remove_field => [ "message" ]
    }
    date {
      locale => "en"
      timezone => "Europe/Vienna"
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    useragent {
      source => "agent"
      prefix => "browser_"
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test1"
    document_type => "accessAPI"
  }
}
My Elasticsearch discover output:
I hope there are any ELK experts around that can help me.
Thank you in advance,
Matthias
The grok filter you stated will not work here.
Try using:
%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]
There is no need to specify the desired names separately in front of the field names (you're not trying to format the message here, but to extract separate fields); just stating the field name in brackets after the ':' will give the result you want.
Also, use the overwrite option instead of remove_field for message.
More information here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-options
It will look similar to that in the end:
filter {
  grok {
    match => { "message" => "%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]" }
    overwrite => [ "message" ]
  }
}
You can test grok filters here:
http://grokconstructor.appspot.com/do/match

logstash not reading logtype field from beats

I have Logstash, Filebeat, and Elasticsearch running on one node.
I'm trying to get Logstash to identify logs labeled as "syslog" and put them in an index named "syslog", but it appears not to see the label, as they are all going into the "uncategorized" index (my catch-all default index).
Here is my beats config
/etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/messages
      fields:
        type: syslog
output:
  logstash:
    hosts: ["localhost:9901"]
Here is my logstash config file
/etc/logstash/conf.d/logstash_server_syslog.conf
input {
  beats {
    port => "9901"
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["10.0.0.167:9200", "10.0.0.168:9200"]
      index => "syslog"
    }
  } else {
    elasticsearch {
      hosts => ["10.0.0.167:9200", "10.0.0.168:9200"]
      index => "uncategorized"
    }
  }
}
Looking at the output (with a stdout{} stanza) would confirm this, but I'm guessing that you missed this part of the doc:
By default, the fields that you specify [in the 'fields' config'] will be grouped under a
fields sub-dictionary in the output document. To store the custom
fields as top-level fields, set the fields_under_root option to true.
To set a custom type field in Filebeat, use the document_type configuration option:
filebeat:
  prospectors:
    - paths:
        - /var/log/messages
      document_type: syslog
This will set the @metadata.type field for use with Logstash, whereas a custom field will not.
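Alternatively, if you want to keep the custom fields approach from the original config, setting fields_under_root (quoted from the doc above) promotes the custom fields to top-level fields, so the [type] conditional in Logstash can see them. A sketch in the same Filebeat 1.x syntax as the question:

```
filebeat:
  prospectors:
    - paths:
        - /var/log/messages
      fields:
        type: syslog
      # Put "type" at the top level of the event instead of under "fields"
      fields_under_root: true
```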
