Thank you in advance for any help!
I'm using Netdata to collect metrics from servers and then send them through Logstash to Elasticsearch.
My need is to aggregate metrics that share the same fields and create a single event in nested format.
This is an example of the input from Netdata:
{"host":"centosdns","#version":"1","port":52212,"#timestamp":"2019-01-19T16:16:22.117Z","message":"netdata.centosdns.disk_await.centos_swap.reads 0.0000000 1547914548"}
{"host":"centosdns","#version":"1","port":52212,"#timestamp":"2019-01-19T16:16:22.117Z","message":"netdata.centosdns.disk_await.centos_swap.writes 0.0000000 1547914548"}
{"host":"centosdns","#version":"1","port":52212,"#timestamp":"2019-01-19T16:16:22.117Z","message":"netdata.centosdns.disk_await.centos_root.reads 0.0000000 1547914548"}
{"host":"centosdns","#version":"1","port":52212,"#timestamp":"2019-01-19T16:16:22.117Z","message":"netdata.centosdns.disk_await.centos_root.writes 0.0000000 1547914548"}
My Logstash config file looks like this:
input {
  tcp {
    port => 1234
  }
}
filter {
  # take the 'message' field and split it into separate fields
  grok {
    named_captures_only => "true"
    pattern_definitions => {
      "CHART" => "[a-z]\w+"
      "FAMILY" => "[_a-z0-9]+"
    }
    match => {
      "message" => "%{WORD:prefix}\.%{WORD:hostname}\.%{CHART:chart}\.%{FAMILY:family}\.%{NOTSPACE:dimension} %{NUMBER:val} %{NUMBER:timestamp}"
    }
  }
  if "_grokparsefailure" not in [tags] {
    mutate {
      remove_field => [ "@version", "host", "port", "prefix" ]
    }
    # attempt to create a nested field and then aggregate
    mutate {
      id => "chart_field"
      add_field => { "[%{chart}][%{family}][%{dimension}][value]" => "%{val}" }
    }
    aggregate {
      task_id => "[%{chart}][%{family}]"
      code => "
        # I tried many code blocks to aggregate here, but without success
        event.cancel()
      "
      push_previous_map_as_event => true
      timeout => 5
    }
    mutate {
      # remove unnecessary fields
      id => "netdata_mutate_remove"
      remove_field => [ "timestamp", "message" ]
    }
  } else {
    drop {}
  }
}
output {
  # TESTING PURPOSES
  if "_aggregateexception" in [tags] {
    file {
      path => "/var/log/logstash/netdata/aggregatefailures-%{+MM-dd}.log"
    }
  } else {
    file {
      path => "/var/log/logstash/netdata/netdata-%{+MM-dd}-aggregate.log"
    }
  }
  stdout { codec => rubydebug }
}
Taking, for example, these two metrics from the input above:
"netdata.centosdns.disk_await.centos_swap.reads 0.0000000"
"netdata.centosdns.disk_await.centos_swap.writes 0.0000000"
My objective is to make a nested field like this:
disk_await: {             # Chart
  centos_swap: {          # Family
    reads  => 0.0000000,  # Dimension => Value
    writes => 0.0000000   # Dimension => Value
  }
}
I intend to aggregate all Dimension/Value pairs that belong to the same Chart/Family. This is only four lines of metrics, but in reality we are talking about 1000 per second or even more in some cases, and all metrics are dynamic, so it is virtually impossible to know all the names in advance.
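For illustration, this is a sketch of the kind of aggregate block I imagine could build that structure (using the chart, family, dimension and val fields from my grok above; this is just an untested idea, not something I have gotten to work):
aggregate {
  task_id => "%{chart}_%{family}"
  code => "
    # sketch only: build a nested hash keyed by chart and family,
    # collecting each dimension/value pair into it
    map[event.get('chart')] ||= {}
    map[event.get('chart')][event.get('family')] ||= {}
    map[event.get('chart')][event.get('family')][event.get('dimension')] = event.get('val').to_f
    event.cancel()
  "
  push_previous_map_as_event => true
  timeout => 5
}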
At this moment I'm using:
Logstash v6.5.4 on a VirtualBox CentOS 7 minimal install
All plugins (inputs/filters/outputs) are up to date
Related
I would like to create a Logstash grok pattern to parse the Oracle audit log below and extract only the values from <AuditRecord> to </AuditRecord>.
{"messageType":"DATA_MESSAGE","owner":"656565656566","logGroup":"/aws/rds/instance/stg/audit","logStream":"STG_ora_20067_20210906120520144010741320.xml","subscriptionFilters":["All logs"],"logEvents":[{"id":"36370952585791240628335082776414249187626811417307774976","timestamp":1630929920144,"message":<AuditRecord><Audit_Type>8</Audit_Type><EntryId>1</EntryId><Extended_Timestamp>2021-08-31T13:25:20.140969Z</Extended_Timestamp><DB_User>/</DB_User><OS_User>rdsdb</OS_User><Userhost>ip-172-27-1-72</Userhost><OS_Process>6773</OS_Process><Instance_Number>0</Instance_Number><Returncode>0</Returncode><OSPrivilege>SYSDBA</OSPrivilege><DBID>918393906</DBID> <Sql_Text>CONNECT</Sql_Text> </AuditRecord>"}]}
These logs are stored in S3 in gz format. I am using the config below for Logstash, but it is not working.
input {
  s3 {
    bucket => "s3bucket"
    type => "oracle-audit-log-xml"
    region => "eu-west-1"
  }
}
filter {
  ## For Oracle audit log
  if [type] == "oracle-audit-log-xml" {
    mutate { gsub => [ "message", "[\n]", "" ] }
    grok {
      match => [ "message", "<AuditRecord>%{DATA:temp_audit_message}</AuditRecord>" ]
    }
    mutate {
      add_field => { "audit_message" => "<AuditRecord>%{temp_audit_message}</AuditRecord>" }
    }
    xml {
      store_xml => true
      source => "audit_message"
      target => "audit"
    }
    mutate {
      add_field => { "timestamp" => "%{[audit][Extended_Timestamp]}" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'", "ISO8601" ]
      target => "@timestamp"
    }
    # remove temporary fields
    mutate { remove_field => ["message", "audit_message", "temp_audit_message"] }
    if "_grokparsefailure" in [tags] {
      drop {}
    }
  }
}
output {
  amazon_es {
    hosts => ["elasticsearch url"]
    index => "rdslogs-%{+YYYY.MM.dd}"
    region => "eu-west-1"
    aws_access_key_id => ''
    aws_secret_access_key => ''
  }
}
It seems to be an issue with the line below:
{"messageType":"DATA_MESSAGE","owner":"656565656566","logGroup":"/aws/rds/instance/stg/audit","logStream":"STG_ora_20067_20210906120520144010741320.xml","subscriptionFilters":["All logs"],"logEvents":[{"id":"36370952585791240628335082776414249187626811417307774976","timestamp":1630929920144,"message":
Is there any way we can modify this to drop the above line?
Thanks
You don't need a grok pattern, as your logs are in JSON format. Install the Logstash JSON filter plugin:
$ logstash-plugin install logstash-filter-json
Then add a filter block like the one below to parse your logs:
filter {
  json {
    source => "message"
  }
}
You can check the attached screenshot from my local ELK setup, where I tried to parse the log line you provided.
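Once the envelope is parsed, the XML payload could then be handed to the xml filter along these lines (just a sketch, assuming the field layout of your sample line, where the record sits under logEvents[0].message; the cloudwatch target name is arbitrary):
filter {
  json {
    source => "message"
    target => "cloudwatch"
  }
  xml {
    # sketch: this path assumes the sample layout above (logEvents is an array)
    source => "[cloudwatch][logEvents][0][message]"
    target => "audit"
    store_xml => true
  }
}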
I am ingesting a file from an S3 bucket into Logstash. The file name contains some information that I want to split into multiple fields so I can use them as separate fields. Please help me, I am new to ELK.
input {
  s3 {
    bucket => "***********"
    access_key_id => "***********"
    secret_access_key => "*******"
    region => "*********"
    prefix => "Logs"
    interval => "1"
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}
filter {
  mutate {
    add_field => {
      "file" => "%{[@metadata][s3][key]}"   # this file name has to be split
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "indexforlogstash"
  }
}
In the filter section you can leverage the dissect filter in order to achieve what you want:
filter {
  ...
  dissect {
    mapping => {
      "file" => "Logs/%{deviceId}-%{buildId}-log.txt"
    }
  }
}
After going through this filter, your document is going to get two new fields, namely:
deviceId (1232131)
buildId (custombuildv12)
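If the key layout ever varies, a grok alternative could be used instead (a sketch assuming the same hypothetical Logs/<deviceId>-<buildId>-log.txt naming):
filter {
  grok {
    # sketch: capture the two hyphen-separated tokens from the S3 key
    match => { "file" => "Logs/%{DATA:deviceId}-%{DATA:buildId}-log\.txt" }
  }
}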
I am using the Kafka plugin to input data into logstash from kafka.
input {
  kafka {
    bootstrap_servers => ["{{ kafka_bootstrap_server }}"]
    codec => "json"
    group_id => "{{ kafka_consumer_group_id }}"
    auto_offset_reset => "earliest"
    topics_pattern => ".*"   # this ensures it reads from all Kafka topics
    decorate_events => true
    add_field => { "[@metadata][label]" => "kafka-read" }
  }
}
The Kafka topics are of the format ingest-abc and ingest-xyz.
I use the following filter to specify the ES index where each event should end up, by setting the [@metadata][index_prefix] field.
filter {
  mutate {
    add_field => {
      "[@metadata][index_prefix]" => "%{[@metadata][kafka][topic]}"
    }
    remove_field => ["[kafka][partition]", "[kafka][key]"]
  }
  if [message] {
    mutate {
      add_field => { "[pipeline_metadata][normalizer][original_raw_message]" => "%{message}" }
    }
  }
}
So my ES indexes end up being ingest-abc-YYYY-MM-DD and ingest-xyz-YYYY-MM-DD.
How do I set the index_prefix to abc-YYYY-MM-DD and xyz-YYYY-MM-DD instead, by getting rid of the common ingest- prefix?
The regex that matches it is: (?!ingest)\b(?!-)\S+
But I am not sure where it would fit in the config.
Thanks!
OK, so I figured it out, in case anyone ever stumbles on a similar problem.
I basically used a mutate gsub instead of extra filters and grok.
gsub replaces any text matching the second argument with the text passed as the third argument.
filter {
  mutate {
    rename => { "[@metadata][kafka]" => "kafka" }
    gsub => [ "[@metadata][index_prefix]", "ingest-", "" ]
  }
}
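For completeness, the stripped prefix is then referenced in the Elasticsearch output along these lines (a sketch; my real output block is not shown here, and localhost:9200 is just a placeholder):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # the index becomes e.g. abc-YYYY-MM-DD once the ingest- prefix is stripped
    index => "%{[@metadata][index_prefix]}-%{+YYYY-MM-dd}"
  }
}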
I have the following configuration for Logstash.
There are 3 parts to this. One is a generallog which we use for all applications; they all land in here.
The second part is the application stats, for which we have a specific logger configured to push the application statistics.
The third is the click stats: whenever an event occurs on the client side, we may want to push it to Logstash on the UDP address.
All 3 are UDP based, and we also use log4net to send the logs to Logstash.
The base install did not have a GeoIP.dat file, so I downloaded the file from https://dev.maxmind.com/geoip/legacy/geolite/ and put it in /opt/logstash/GeoIPDataFile with 777 permissions on the file and folder.
The second thing is that I have a country name, and I need a way to show how many users from each country are viewing the application in the last 24 hours. For that reason we also capture the country name as it is in their profile in the application.
Now I need a way to get the geo coordinates to use the tilemap in Kibana.
What am I doing wrong? If I take out the geoip { source => "country" } section, Logstash works fine.
When I check the configuration with
/opt/logstash/bin/logstash -t -f /etc/logstash/conf.d/logstash.conf
"The configuration file is ok" is what I receive. Where am I going wrong?
Any help would be great.
input {
  udp {
    port => 5001
    type => generallog
  }
  udp {
    port => 5003
    type => applicationstats
  }
  udp {
    port => 5002
    type => clickstats
  }
}
filter {
  if [type] == "generallog" {
    grok {
      remove_field => message
      match => { message => "(?m)%{TIMESTAMP_ISO8601:sourcetimestamp} \[%{NUMBER:threadid}\] %{LOGLEVEL:loglevel} +- %{IPORHOST:requesthost} - %{WORD:applicationname} - %{WORD:envname} - %{GREEDYDATA:logmessage}" }
    }
    if !("_grokparsefailure" in [tags]) {
      mutate {
        replace => [ "message" , "%{logmessage}" ]
        replace => [ "host" , "%{requesthost}" ]
        add_tag => "generalLog"
      }
    }
  }
  if [type] == "applicationstats" {
    grok {
      remove_field => message
      match => { message => "(?m)%{TIMESTAMP_ISO8601:sourceTimestamp} \[%{NUMBER:threadid}\] %{LOGLEVEL:loglevel} - %{WORD:envName}\|%{IPORHOST:actualHostMachine}\|%{WORD:applicationName}\|%{NUMBER:empId}\|%{WORD:regionCode}\|%{DATA:country}\|%{DATA:applicationName}\|%{NUMBER:staffapplicationId}\|%{WORD:applicationEvent}" }
    }
    geoip {
      source => "country"
      target => "geoip"
      database => "/opt/logstash/GeoIPDataFile/GeoIP.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
    if !("_grokparsefailure" in [tags]) {
      mutate {
        add_tag => "applicationstats"
        add_tag => [ "eventFor_%{applicationName}" ]
      }
    }
  }
  if [type] == "clickstats" {
    grok {
      remove_field => message
      match => { message => "(?m)%{TIMESTAMP_ISO8601:sourceTimestamp} \[%{NUMBER:threadid}\] %{LOGLEVEL:loglevel} - %{IPORHOST:remoteIP}\|%{IPORHOST:fqdnHost}\|%{IPORHOST:actualHostMachine}\|%{WORD:applicationName}\|%{WORD:envName}\|(%{NUMBER:clickId})?\|(%{DATA:clickName})?\|%{DATA:clickEvent}\|%{WORD:domainName}\\%{WORD:userName}" }
    }
    if !("_grokparsefailure" in [tags]) {
      mutate {
        add_tag => "clicksStats"
        add_tag => [ "eventFor_%{clickName}" ]
      }
    }
  }
}
output {
  if [type] == "applicationstats" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "applicationstats-%{+YYYY-MM-dd}"
      template => "/opt/logstash/templates/udp-applicationstats.json"
      template_name => "applicationstats"
      template_overwrite => true
    }
  }
  else if [type] == "clickstats" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "clickstats-%{+YYYY-MM-dd}"
      template => "/opt/logstash/templates/udp-clickstats.json"
      template_name => "clickstats"
      template_overwrite => true
    }
  }
  else if [type] == "generallog" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "generallog-%{+YYYY-MM-dd}"
      template => "/opt/logstash/templates/udp-generallog.json"
      template_name => "generallog"
      template_overwrite => true
    }
  }
  else {
    elasticsearch {
      hosts => "localhost:9200"
      index => "logstash-%{+YYYY-MM-dd}"
    }
  }
}
As per the error message, the mutate which you're trying to do could be wrong. Could you please change your mutate as shown below:
mutate {
  convert => { "geoip" => "float" }
  convert => { "coordinates" => "float" }
}
I guess you've given the mutate as an array, while it's a hash type by origin. Try converting both values individually. Your database path for geoip seems to be fine in your filter. Is that the whole error which you've mentioned in the question? If not, update the question with the whole error if possible.
Refer here for in-depth explanations.
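Applied to the nested field from your original filter, that would look something like this (a sketch using the hash syntax):
mutate {
  convert => { "[geoip][coordinates]" => "float" }
}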
I'm trying to input a CSV file to Elasticsearch through Logstash.
This is my configuration file:
input {
  file {
    codec => plain {
      charset => "ISO-8859-1"
    }
    path => ["PATH/*.csv"]
    sincedb_path => "PATH/.sincedb_path"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  date {
    match => [ "DATE", "yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
  csv {
    columns => ["ID","DATE",...]
    separator => ","
    source => message
    remove_field => ["message","host","path","@version","@timestamp"]
  }
}
output {
  elasticsearch {
    embedded => false
    host => "localhost"
    cluster => "elasticsearch"
    node_name => "localhost"
    index => "index"
    index_type => "type"
  }
}
Now, the mapping produced in Elasticsearch types the DATE field as a string. I would like it to be typed as a date field.
In the filter section, I tried to convert the field to a date type, but it doesn't work.
How can I fix that?
Regards,
Alexandre
You have your filter chain set up in the wrong order. The date {} block needs to come after the csv {} block, because the DATE field does not exist until the csv filter has parsed the message.
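Reordered, the filter would look roughly like this (a sketch based on your configuration, with the column list shortened):
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  csv {
    columns => ["ID","DATE"]   # plus the remaining columns
    separator => ","
    source => "message"
  }
  date {
    match => [ "DATE", "yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
}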