Logstash pipeline adding an extra timestamp and %{host} at the beginning of rsyslog message - elasticsearch

I am using a Logstash pipeline to ingest data into an rsyslog server,
but the pipeline is adding an extra date stamp and %{host} at the beginning.
Example:
**Sep 22 04:47:20 %{host} 2022-09-22T04:47:20.876Z %{host}** 22-09-2022 05:47:20.875 a7bd0ebd9101-SLOT0 `TEST-AWS-ACTIVITY`#011970507 P 201059147698 `[FCH-TEST] [35.49.122.49] [TEST-251047********-******] [c713fcf9-6e73-4627-ace9-170e6c72fac5] OUT;P;201059147698;;;;/bcl/test/survey/checkSurveyEligibility.json;ErrorMsg=none;;{"body":{"eligible":false,"surveys":[]},"header":null}`**
Can anyone tell where this extra part is coming from and how to suppress it?
The data is coming from AWS CloudWatch installed on ECS containers.
The pipeline is configured as:
input {
  pipeline { address => test_syslog }
}

filter {
  if [owner] == "1638134254521" {
    mutate { add_field => { "[ec_part]" => "AWS_TEST" } }
  }
}

output {
  #TEST ACTIVITY Logs being sent via TCP to Logreceiver
  if [ec_part] == "AWS_TEST" {
    syslog {
      appname => ""
      host => "10.119.140.206"
      port => "10514"
      protocol => "ssl-tcp"
      ssl_cacert => "/etc/logstash/ca.crt"
      ssl_cert => "/etc/logstash/server.crt"
      ssl_key => "/etc/logstash/server.key"
      priority => "info"
      rfc => "rfc5424"
    }
  }
}

The default codec for an output is plain, and if the format option is not specified then that will call the .to_s method on the event. The .to_s method adds the timestamp and %{host}. You can prevent this by adding
codec => plain { format => "%{message}" }
to your syslog output.
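Combining that line with the output block from the question, the result would look something like this:

output {
  #TEST ACTIVITY Logs being sent via TCP to Logreceiver
  if [ec_part] == "AWS_TEST" {
    syslog {
      # forward only the original message body instead of the event's .to_s form
      codec => plain { format => "%{message}" }
      appname => ""
      host => "10.119.140.206"
      port => "10514"
      protocol => "ssl-tcp"
      ssl_cacert => "/etc/logstash/ca.crt"
      ssl_cert => "/etc/logstash/server.crt"
      ssl_key => "/etc/logstash/server.key"
      priority => "info"
      rfc => "rfc5424"
    }
  }
}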

Related

Logstash delay of log sending

I'm forwarding application logs to Elasticsearch, applying some grok filters along the way.
The application has its own timestamp field, and there is also Logstash's own timestamp field.
We regularly check the difference between those timestamps, and in many cases the delay is very big, meaning the log took a very long time to be shipped to Elasticsearch.
I'm wondering how I can isolate the issue to know whether the delay is coming from Logstash or Elasticsearch.
Example Logstash config:
input {
  file {
    path => "/app/app-core/_logs/app-core.log"
    codec => multiline {
      pattern => "(^[a-zA-Z.]+(?:Error|Exception).+)|(^\s+at .+)|(^\s+... \d+ more)|(^\t+)|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}

filter {
  if "multiline" not in [tags] {
    json {
      source => "message"
      remove_field => ["[request][body]","[response][body][response][items]"]
    }
  }
  else {
    grok {
      pattern_definitions => { APPJSON => "{.*}" }
      match => { "message" => "%{APPJSON:appjson} %{GREEDYDATA:stack_trace}" }
      remove_field => ["message"]
    }
    json {
      source => "appjson"
      remove_field => ["appjson"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch-logs.internal.app.io:9200"]
    index => "logstash-core-%{+YYYY.MM.dd}"
    document_type => "logs"
  }
}
We tried adjusting the number of workers and the batch size, but no value we tried reduced the delay:
pipeline.workers: 9
pipeline.output.workers: 9
pipeline.batch.size: 600
pipeline.batch.delay: 5
Nothing was changed on the Elasticsearch side because I think the issue is with Logstash, but I'm not sure.
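One way to narrow this down (a minimal sketch, not from the original thread; the field name logstash_filtered_at is hypothetical) is to stamp each event with the time it leaves the Logstash filter stage, so the application-to-Logstash lag and the Logstash-to-Elasticsearch lag can be measured separately:

filter {
  ruby {
    # Record when Logstash finished filtering this event (hypothetical field name).
    # Comparing the application timestamp, @timestamp (when Logstash created the event)
    # and this field shows where most of the delay accumulates.
    code => "event.set('logstash_filtered_at', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ'))"
  }
}

If @timestamp and logstash_filtered_at are close together but far behind the application timestamp, the delay builds up before or inside Logstash (input or queue); if documents only become searchable long after logstash_filtered_at, the delay is on the Elasticsearch side.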

Logstash aggregate fields

I am trying to configure Logstash to aggregate similar syslog events based on the message field within a specific time window.
To make my case clear, this is an example of what I would like to do.
Example: I have these junk syslog messages coming through my Logstash:
timestamp   message
13:54:24    hello
13:54:35    hello
What I would like to do is have a condition that checks whether the messages are the same and occur within a specific timespan (for example 10 minutes); if so, I would like to aggregate them into one row and increase a count.
The output I am expecting to see is as follows:
timestamp   message   count
13:54:35    hello     2
I know there is a way to aggregate fields, but I was wondering whether this aggregation can be done over a specific time range.
If anyone can help me I would be extremely grateful, as I am new to Logstash; my server is receiving tons of junk syslog and I would like to reduce that amount.
So far I have done some cleaning with this configuration:
input {
  syslog {
    port => 514
  }
}

filter {
  prune {
    whitelist_names => ["timestamp","message","newfield"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
Now I just need to do the aggregation.
Thank you so much for your help guys
EDIT:
Following the documentation, I put in place this configuration:
input {
  syslog {
    port => 514
  }
}

filter {
  prune {
    whitelist_names => ["timestamp","message","newfield"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
  if [message] =~ "MESSAGE FROM" {
    aggregate {
      task_id => "%{message}"
      code => "map['message'] ||= 0; map['message'] += 1;"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "message"
      timeout => 60
      inactivity_timeout => 50
      timeout_tags => ['_aggregatetimeout']
      timeout_code => "event.set('count_message', event.get('message') > 1)"
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
I don't get any error, but the output is not what I am expecting.
The actual output is that it creates a tags field (good), containing an array with _aggregatetimeout and _aggregateexception:
{
       "message" => "<88>MESSAGE FROM\r\n",
          "tags" => [
        [0] "_aggregatetimeout",
        [1] "_aggregateexception"
    ],
    "@timestamp" => 2021-07-23T12:10:45.646Z,
      "@version" => "1"
}
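The _aggregateexception tag is consistent with the timeout_code above: timeout_task_id_field => "message" writes the task_id string back into the message field of the pushed event, so event.get('message') is a string and comparing it with 1 raises an exception. A sketch of one possible adjustment (my rewording of the intent, not from the original post) keeps the counter in its own map key and drops the individual events so only the aggregated row is indexed:

filter {
  if [message] =~ "MESSAGE FROM" {
    aggregate {
      task_id => "%{message}"
      # Count into a dedicated map key and cancel the individual event,
      # so only the aggregated event produced on timeout reaches the output.
      code => "map['count'] ||= 0; map['count'] += 1; event.cancel()"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "message"
      timeout => 600   # the 10-minute window described in the question
      timeout_tags => ['_aggregatetimeout']
    }
  }
}

The event pushed on timeout then carries message (the task_id) and count, close to the timestamp/message/count row described above. Note that the aggregate filter requires the pipeline to run with a single worker (pipeline.workers: 1) to work reliably.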

Having issues creating conditional outputs with logstash using metadata fields

I want to send winlogbeat data to a separate index from my main index. I have configured winlogbeat to send its data to my Logstash server and I can confirm that I have received the data.
This is what I do currently:
output {
  if [@metadata][beat] == "winlogbeat" {
    elasticsearch {
      hosts => ["10.229.1.12:9200", "10.229.1.13:9200"]
      index => "%{[@metadata][beat]}-%{+YYYY-MM-dd}"
      user => logstash_internal
      password => password
      stdout { codec => rubydebug }
    }
    else {
      elasticsearch {
        hosts => ["10.229.1.12:9200", "10.229.1.13:9200"]
        index => "logstash-%{stuff}-%{+YYYY-MM-dd}"
        user => logstash_internal
        password => password
      }
    }
  }
}
However, I cannot start Logstash using this configuration. If I remove the if statements and only use one elasticsearch output, the one which handles regular Logstash data, it works.
What am I doing wrong here?
You have problems with the brackets in your configuration. To fix your code, please see below:
output {
  if [@metadata][beat] == "winlogbeat" {
    elasticsearch {
      hosts => ["10.229.1.12:9200", "10.229.1.13:9200"]
      index => "%{[@metadata][beat]}-%{+YYYY-MM-dd}"
      user => logstash_internal
      password => password
    }
    stdout { codec => rubydebug }
  } else {
    elasticsearch {
      hosts => ["10.229.1.12:9200", "10.229.1.13:9200"]
      index => "logstash-%{stuff}-%{+YYYY-MM-dd}"
      user => logstash_internal
      password => password
    }
  }
}
I hope this sorts your issue.

Logstash If configuration dropping events

I am using the ELK stack to save custom events. The events I am pushing may or may not contain a field called feed.name.
I am using this field to dynamically set the index, so if it doesn't exist I want to set it to unknown before sending it to Elastic.
Here is the full config I have:
input {
  http {
    host => "XXXXXXXXX"
    port => xxxx
    codec => "json"
  }
}

filter {
  if ![feed.name] {
    mutate { add_field => { "feed.name" => "unknown" } }
  }
  if [source.asn] {
    mutate { convert => { "source.asn" => "string" } }
  }
  if [destination.asn] {
    mutate { convert => { "destination.asn" => "string" } }
  }
}

output {
  elasticsearch {
    hosts => ["xxxxxxxxx:XXXX"]
    index => "l-%{feed.name}-%{+YYYY.MM.dd}"
  }
}
Here is my problem: When there is no feed.name set, Logstash sets it correctly and everything is fine. However, if the field exists, the event seems to be dropped.
So 2 questions arise here: How is this behaviour explained? And also, how can I make it work (or are there any workarounds)?

Convert timestamp timezone in Logstash for output index name

In my scenario, the "timestamp" of the syslog lines Logstash receives is in UTC and we use the event "timestamp" in the Elasticsearch output:
output {
  elasticsearch {
    embedded => false
    host => localhost
    port => 9200
    protocol => http
    cluster => 'elasticsearch'
    index => "syslog-%{+YYYY.MM.dd}"
  }
}
My problem is that at UTC midnight, Logstash sends logs to a different index before the end of the day in our timezone (GMT-4 => America/Montreal), so the index has no logs after 20h (8 PM) because of the "timestamp" being UTC.
We've done a workaround to convert the timezone, but we experience a significant performance degradation:
filter {
  mutate {
    add_field => {
      # Create a new field with string value of the UTC event date
      "timestamp_zoned" => "%{@timestamp}"
    }
  }
  date {
    # Parse UTC string value and convert it to my timezone into a new field
    match => [ "timestamp_zoned", "yyyy-MM-dd HH:mm:ss Z" ]
    timezone => "America/Montreal"
    locale => "en"
    remove_field => [ "timestamp_zoned" ]
    target => "timestamp_zoned_obj"
  }
  ruby {
    # Output the zoned date to a new field
    code => "event['index_day'] = event['timestamp_zoned_obj'].strftime('%Y.%m.%d')"
    remove_field => [ "timestamp_zoned_obj" ]
  }
}

output {
  elasticsearch {
    embedded => false
    host => localhost
    port => 9200
    protocol => http
    cluster => 'elasticsearch'
    # Use of the string value
    index => "syslog-%{index_day}"
  }
}
Is there a way to optimize this config?
This is the optimized config; please give it a try and test the performance.
You do not need to use the mutate and date plugins; use the ruby plugin directly.
input {
  stdin {
  }
}

filter {
  ruby {
    code => "
      event['index_day'] = event['@timestamp'].localtime.strftime('%Y.%m.%d')
    "
  }
}

output {
  stdout { codec => rubydebug }
}
Example output:
{
       "message" => "test",
      "@version" => "1",
    "@timestamp" => "2015-03-30T05:27:06.310Z",
          "host" => "BEN_LIM",
     "index_day" => "2015.03.29"
}
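Once index_day is populated, the elasticsearch output from the question can reference it directly, for example:

output {
  elasticsearch {
    host => localhost
    index => "syslog-%{index_day}"
  }
}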
In logstash version 5.0 and later, you can use this:
filter {
  ruby {
    code => "event.set('index_day', event.get('[@timestamp]').time.localtime.strftime('%Y%m%d'))"
  }
}
In version 1.5.0, we can convert the timestamp to the local timezone for the index name. Here is my configuration:
filter {
  ruby {
    code => "event['index_day'] = event.timestamp.time.localtime.strftime('%Y.%m.%d')"
  }
}

output {
  elasticsearch {
    host => localhost
    index => "thrall-%{index_day}"
  }
}
In Logstash version 5.0.2, the API was modified. We can convert the timestamp to the local timezone for the index name. Here is my configuration:
filter {
  ruby {
    code => "event['index_day'] = event.timestamp.time.localtime.strftime('%Y.%m.%d')"
  }
}
Similar use case, but using the Logstash file output plugin and writing files dated by the local time of the arrival of the event.
Verified on Logstash version 7.12.
Adapted from discuss.elastic.co, mainly zero-padding the offset hours. NB! If your offset has half hours you will need to adjust accordingly.
filter {
  ruby {
    code => "
      require 'tzinfo'
      tz = 'Europe/Oslo'
      offset = TZInfo::Timezone.get(tz).current_period.utc_total_offset / (60*60)
      event.set('[@metadata][local_date]',
        event.get('@timestamp').time.localtime(
          sprintf('+%02i:00', offset.to_s)
        ).strftime('%Y%m%d'))
    "
  }
  if ([agent][type] == "filebeat") {
    mutate {
      add_field => ["file_path", "%{[host][name]}_%{[log][file][path]}.%{[@metadata][local_date]}"]
    }
  } else {
    mutate {
      add_field => ["file_path", "%{[agent][hostname]}_%{[agent][type]}.%{[@metadata][local_date]}"]
    }
  }
}
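The snippet above only builds the file_path field; a minimal sketch of a matching file output (the /var/log/remote prefix is my assumption, not from the original post) could look like this:

output {
  file {
    # file_path already ends with the locally-dated suffix built in the filter above
    path => "/var/log/remote/%{file_path}"
  }
}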
