Logstash (Extracting parts of fields using regex) - elasticsearch

I am using the Kafka plugin to input data into logstash from kafka.
input {
  kafka {
    bootstrap_servers => ["{{ kafka_bootstrap_server }}"]
    codec => "json"
    group_id => "{{ kafka_consumer_group_id }}"
    auto_offset_reset => "earliest"
    topics_pattern => ".*"   # ensures it reads from all kafka topics
    decorate_events => true
    add_field => { "[@metadata][label]" => "kafka-read" }
  }
}
The kafka topics are of the format
ingest-abc &
ingest-xyz
I use the following filter to specify the ES index where it should end up by setting the [@metadata][index_prefix] field.
filter {
  mutate {
    add_field => {
      "[@metadata][index_prefix]" => "%{[@metadata][kafka][topic]}"
    }
    remove_field => ["[kafka][partition]", "[kafka][key]"]
  }
  if [message] {
    mutate {
      add_field => { "[pipeline_metadata][normalizer][original_raw_message]" => "%{message}" }
    }
  }
}
So my ES indexes end up being
ingest-abc-YYYY-MM-DD
ingest-xyz-YYYY-MM-DD
How do I set the index_prefix to abc-YYYY-MM-DD and xyz-YYYY-MM-DD instead, by getting rid of the common ingest- prefix?
The regex that matches it is: (?!ingest)\b(?!-)\S+
But I am not sure where it would fit in the config.
Thanks!

OK, so I figured it out, in case anyone ever stumbles on a similar problem.
I basically used mutate's gsub option instead of grok.
gsub replaces any text matching the second argument with the text passed as the third argument.
filter {
  mutate {
    rename => { "[@metadata][kafka]" => "kafka" }
    gsub => [ "[@metadata][index_prefix]", "ingest-", "" ]
  }
}
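For completeness, a minimal sketch of how the stripped prefix can then be used in the elasticsearch output (the hosts value and the exact date suffix are assumptions, not from the original config):
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed host
    index => "%{[@metadata][index_prefix]}-%{+YYYY-MM-dd}"   # e.g. abc-2021-09-06
  }
}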

Related

Removing grok matched field after using it

I use Filebeat to fetch log files into my Logstash and then filter out unnecessary fields. Everything works fine and I output these into Elasticsearch, but there is a field which I use for the Elasticsearch index name. I define this variable in my grok match, but I couldn't find a way to remove that field once it has served its purpose. I'll share my Logstash config below.
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "[log][file][path]" => ".*(\\|\/)(?<myIndex>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
  }
  json {
    source => "message"
  }
  mutate {
    remove_field => ["agent"]
    remove_field => ["input"]
    remove_field => ["@metadata"]
    remove_field => ["log"]
    remove_field => ["tags"]
    remove_field => ["host"]
    remove_field => ["@version"]
    remove_field => ["message"]
    remove_field => ["event"]
    remove_field => ["ecs"]
  }
  date {
    match => ["t","yyyy-MM-dd HH:mm:ss.SSS"]
    remove_field => ["t"]
  }
  mutate {
    rename => ["l","log_level"]
    rename => ["mt","msg_template"]
    rename => ["p","log_props"]
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9222" ]
    index => "%{myIndex}"
  }
  stdout { codec => rubydebug { metadata => true } }
}
I just want to remove the "myIndex" field from my documents. With this config, I still see the field in Elasticsearch, and if possible I want to remove it. I've tried removing it together with the other fields, but that gave an error. I guess it's because I removed it before Logstash could pass it to Elasticsearch.
Create the field under [@metadata]. Those fields are available to use within Logstash but are ignored by outputs unless they use a rubydebug codec.
Adjust your grok filter:
match => { "[log][file][path]" => ".*(\\|\/)(?<[@metadata][myIndex]>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
Delete [@metadata] from the mutate+remove_field list and change the output configuration to use
index => "%{[@metadata][myIndex]}"
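Putting the pieces together, the relevant parts of the config might look like this (a sketch that reuses the field names from the question):
filter {
  grok {
    # capture the index name into a metadata field, which outputs ignore by default
    match => { "[log][file][path]" => ".*(\\|\/)(?<[@metadata][myIndex]>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
  }
  # ... the remaining filters stay the same, minus remove_field => ["@metadata"] ...
}
output {
  elasticsearch {
    hosts => [ "localhost:9222" ]
    index => "%{[@metadata][myIndex]}"
  }
}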

Logstash Grok filter pattern for Oracle RDS XML Audit Logs

I would like to create a Logstash grok pattern to parse the below Oracle audit log and extract only the values from <AuditRecord> to </AuditRecord>.
{"messageType":"DATA_MESSAGE","owner":"656565656566","logGroup":"/aws/rds/instance/stg/audit","logStream":"STG_ora_20067_20210906120520144010741320.xml","subscriptionFilters":["All logs"],"logEvents":[{"id":"36370952585791240628335082776414249187626811417307774976","timestamp":1630929920144,"message":<AuditRecord><Audit_Type>8</Audit_Type><EntryId>1</EntryId><Extended_Timestamp>2021-08-31T13:25:20.140969Z</Extended_Timestamp><DB_User>/</DB_User><OS_User>rdsdb</OS_User><Userhost>ip-172-27-1-72</Userhost><OS_Process>6773</OS_Process><Instance_Number>0</Instance_Number><Returncode>0</Returncode><OSPrivilege>SYSDBA</OSPrivilege><DBID>918393906</DBID> <Sql_Text>CONNECT</Sql_Text> </AuditRecord>"}]}
These logs are stored in S3 in gz format. I am using the below config for Logstash but it's not working.
input {
  s3 {
    bucket => "s3bucket"
    type => "oracle-audit-log-xml"
    region => "eu-west-1"
  }
}
filter {
  ## For Oracle audit log
  if [type] == "oracle-audit-log-xml" {
    mutate { gsub => [ "message", "[\n]", "" ] }
    grok {
      match => [ "message","<AuditRecord>%{DATA:temp_audit_message}</AuditRecord>" ]
    }
    mutate {
      add_field => { "audit_message" => "<AuditRecord>%{temp_audit_message}</AuditRecord>" }
    }
    xml {
      store_xml => true
      source => "audit_message"
      target => "audit"
    }
    mutate {
      add_field => { "timestamp" => "%{[audit][Extended_Timestamp]}" }
    }
    date {
      match => [ "timestamp","yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'","ISO8601" ]
      target => "@timestamp"
    }
    # remove temporary fields
    mutate { remove_field => ["message", "audit_message", "temp_audit_message"] }
    if "_grokparsefailure" in [tags] {
      drop{}
    }
  }
}
output {
  amazon_es {
    hosts => ["elasticsearch url"]
    index => "rdslogs-%{+YYYY.MM.dd}"
    region => "eu-west-1"
    aws_access_key_id => ''
    aws_secret_access_key => ''
  }
}
It seems to be an issue with the line below:
{"messageType":"DATA_MESSAGE","owner":"656565656566","logGroup":"/aws/rds/instance/stg/audit","logStream":"STG_ora_20067_20210906120520144010741320.xml","subscriptionFilters":["All logs"],"logEvents":[{"id":"36370952585791240628335082776414249187626811417307774976","timestamp":1630929920144,"message":
Is there any way we can modify the config to drop the above line?
Thanks
You don't need a grok pattern, as your logs are in JSON format. Install the Logstash JSON filter plugin:
$ logstash-plugin install logstash-filter-json
And add a filter like the one below to parse your logs.
filter {
  json {
    source => "message"
  }
}
I verified this on my local ELK setup by parsing the log line you provided.
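If you still only need the XML between <AuditRecord> and </AuditRecord>, one possible follow-up (not part of the original answer; the [logEvents][0][message] field path is an assumption based on the sample payload) is to point the existing xml filter at the parsed JSON:
filter {
  json {
    source => "message"
  }
  xml {
    # field path assumed from the sample payload; adjust if the structure differs
    source => "[logEvents][0][message]"
    target => "audit"
    store_xml => true
  }
}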

Show Kafka topic title as a field in Kibana, logstash add_field?

I have logstash with ElasticSearch & Kibana 7.6.2
I connect logstash to Kafka as follows:
input {
  kafka {
    bootstrap_servers => "******"
    topics_pattern => [".*"]
    decorate_events => true
    add_field => { "[topic_name]" => "%{[@metadata][kafka][topic]}"}
  }
}
filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash"
    document_type => "logs"
  }
}
It's OK and works. But the field topic_name shows up as the literal string %{[@metadata][kafka][topic]}.
How can I fix it?
The syntax of the sprintf format you are using ( %{[@metadata][kafka][topic]} ) to get the value of that field is correct.
Apparently there is no such field @metadata.kafka.topic in your document at that point. Therefore the sprintf call can't obtain the field value and, as a result, the newly created field contains the sprintf call as a literal string.
However, since you set decorate_events => true, the metadata fields should be available, as stated in the documentation (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html):
Metadata is only added to the event if the decorate_events option is set to true (it defaults to false).
I can imagine that the add_field action set in the input plugin causes the issue. Since the decorate_events option is what adds the metadata fields in the first place, the add_field action should come after the input plugin, in a filter.
Your configuration would then look like this:
input {
  kafka {
    bootstrap_servers => "******"
    topics_pattern => [".*"]
    decorate_events => true
  }
}
filter {
  mutate {
    add_field => { "[topic_name]" => "%{[@metadata][kafka][topic]}"}
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash"
    document_type => "logs"
  }
}
How about
add_field => { "topic_name" => "%{[@metadata][kafka][topic]}"}
i.e. [topic_name] -> topic_name

logstash 5.0.1: setup elasticsearch multiple indexes output for multiple kafka input topics

I have a logstash input setup as
input {
  kafka {
    bootstrap_servers => "zookeper_address"
    topics => ["topic1","topic2"]
  }
}
I need to feed the topics into two different indexes in Elasticsearch. Can anyone help me with how the output should be set up for such a task? At this time I am only able to set up
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    codec => "json"
    document_id => "%{id}"
  }
}
I need two indexes on the same Elasticsearch instance, say index1 and index2, which will be fed by messages coming in on topic1 and topic2.
First, you need to add decorate_events to your kafka input in order to know which topic each message is coming from:
input {
  kafka {
    bootstrap_servers => "zookeper_address"
    topics => ["topic1","topic2"]
    decorate_events => true
  }
}
Then you have two options, both involving conditional logic. The first is to introduce a filter that adds the correct index name depending on the topic name. For this you need to add:
filter {
  if [kafka][topic] == "topic1" {
    mutate {
      add_field => {"[@metadata][index]" => "index1"}
    }
  } else {
    mutate {
      add_field => {"[@metadata][index]" => "index2"}
    }
  }
  # remove the field containing the decorations, unless you want them to land into ES
  mutate {
    remove_field => ["kafka"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][index]}"
    codec => "json"
    document_id => "%{id}"
  }
}
The second option is to do the if/else directly in the output section, like this (but the additional kafka field will land in ES):
output {
  if [@metadata][kafka][topic] == "topic1" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "index1"
      codec => "json"
      document_id => "%{id}"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "index2"
      codec => "json"
      document_id => "%{id}"
    }
  }
}
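As a side note, if the index names can simply mirror the topic names, a single output without conditionals would also work, much like the metadata-based approach in the first question above (a sketch, not part of the original answer):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # index name derived directly from the Kafka topic recorded in the event metadata
    index => "%{[@metadata][kafka][topic]}"
    codec => "json"
    document_id => "%{id}"
  }
}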

How to type data input in logstash

I'm trying to load a CSV file into Elasticsearch through Logstash.
This is my configuration file:
input {
  file {
    codec => plain {
      charset => "ISO-8859-1"
    }
    path => ["PATH/*.csv"]
    sincedb_path => "PATH/.sincedb_path"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  date {
    match => [ "DATE","yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
  csv {
    columns => ["ID","DATE",...]
    separator => ","
    source => "message"
    remove_field => ["message","host","path","@version","@timestamp"]
  }
}
output {
  elasticsearch {
    embedded => false
    host => "localhost"
    cluster => "elasticsearch"
    node_name => "localhost"
    index => "index"
    index_type => "type"
  }
}
Now, the mapping produced in Elasticsearch types the DATE field as a string. I would like it to be typed as a date field.
In the filter section, I tried to convert the field to a date but it doesn't work.
How can I fix that?
Regards,
Alexandre
You have your filter chain set up in the wrong order. The date {} block needs to come after the csv {} block, because the DATE field only exists once the csv filter has parsed the message.
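A minimal sketch of the reordered filter section, using the fields from the question (the elided column list is kept as-is):
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  csv {
    columns => ["ID","DATE",...]   # column list elided, as in the question
    separator => ","
    source => "message"
    remove_field => ["message","host","path","@version","@timestamp"]
  }
  # the DATE field now exists, so the date filter can parse it
  date {
    match => [ "DATE","yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
}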
