Removing grok matched field after using it - elasticsearch

I use filebeat to fetch log files into my logstash and then filter unnecessary fields. Everything works fine and I output these into elasticsearch, but there is a field which I use for the elasticsearch index name. I define this variable in my grok match, but I couldn't find a way to remove that variable once it has served its purpose. I'll share my logstash config below:
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "[log][file][path]" => ".*(\\|\/)(?<myIndex>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
  }
  json {
    source => "message"
  }
  mutate {
    # one remove_field list instead of repeating the setting
    remove_field => ["agent", "input", "@metadata", "log", "tags", "host", "@version", "message", "event", "ecs"]
  }
  date {
    match => ["t", "yyyy-MM-dd HH:mm:ss.SSS"]
    remove_field => ["t"]
  }
  mutate {
    rename => { "l" => "log_level" }
    rename => { "mt" => "msg_template" }
    rename => { "p" => "log_props" }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9222" ]
    index => "%{myIndex}"
  }
  stdout { codec => rubydebug { metadata => true } }
}
I just want to remove the "myIndex" field from my index. With this config file, I see this field in elasticsearch; if possible I want to remove it. I've tried to remove it together with the other fields, but it gave an error. I guess it's because I removed it before logstash could give it to elasticsearch.

Create the field under [@metadata]. Those fields are available to use in logstash but are ignored by outputs unless they use a rubydebug codec.
Adjust your grok filter:
    match => { "[log][file][path]" => ".*(\\|\/)(?<[@metadata][myIndex]>.*)(\\|\/).*.*(\\|\/).*(\\|\/).*(\\|\/).*(\\|\/)" }
Delete [@metadata] from the mutate remove_field and change the output configuration to have:
    index => "%{[@metadata][myIndex]}"
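One thing neither the question nor the answer checks is what actually ends up in the index setting: the grok pulls one directory component out of [log][file][path] and hands it straight to Elasticsearch, and when the grok does not match, Logstash leaves the literal text "%{myIndex}" in its place. The following Python is an added example, not code from the original post or from Logstash. It does the same path-component extraction for both Windows and POSIX paths, with the directory depth as a parameter (the grok hard-codes one depth), and then validates the result against the Elasticsearch index-name rules before using it, falling back to a fixed index name otherwise. The fallback name and the sample paths are assumptions made for this example.

```python
"""Derive an Elasticsearch index name from a log file path and validate it.

Added example, not part of the original question or answer. The depth
parameter, the fallback index name and the sample paths are assumptions.
"""
import re
from typing import List, Optional

# Characters Elasticsearch refuses in index names (documented rules).
_FORBIDDEN = set('\\/*?"<>| ,#:')

# Both Windows and POSIX separators, like the (\\|\/) alternation in the grok.
_SEP = re.compile(r"[\\/]+")


def index_component(path: str, levels_up: int) -> Optional[str]:
    """Return the directory name `levels_up` above the file, or None.

    levels_up=1 is the file's own directory. The grok in the question
    hard-codes one particular depth; here it is a parameter.
    """
    parts = [p for p in _SEP.split(path) if p]   # drop empty leading parts
    idx = len(parts) - 1 - levels_up              # parts[-1] is the file name
    if levels_up < 1 or idx < 0:
        return None
    return parts[idx]


def validate_index_name(name: str) -> List[str]:
    """Return the Elasticsearch index-name rules `name` violates (empty list = ok)."""
    problems = []
    if not name:
        problems.append("empty")
    if name in (".", ".."):
        problems.append("must not be . or ..")
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & _FORBIDDEN)
    if bad:
        problems.append("contains forbidden characters: " + "".join(bad))
    if name[:1] in ("-", "_", "+"):
        problems.append("must not start with -, _ or +")
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


def index_for_path(path: str, levels_up: int, fallback: str = "unrouted-logs") -> str:
    """Index name to route an event to.

    Lowercases the component (Elasticsearch requires it) and routes to
    `fallback` instead of letting an invalid name, or the literal
    "%{myIndex}" Logstash emits when the grok did not match, reach the output.
    """
    component = index_component(path, levels_up)
    if component is None:
        return fallback
    name = component.lower()
    return name if not validate_index_name(name) else fallback
```

In a real pipeline the equivalent guard is a conditional on the output: route to the derived index only when the field exists and passes these checks, and to a catch-all index otherwise, so a renamed directory cannot silently create indices with arbitrary names.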

Related

Logstash Grok filter pattern for Oracle RDS XML Audit Logs

I would like to create a logstash grok pattern to parse the below oracle audit log and extract only the values from "<AuditRecord>" to "</AuditRecord>":
{"messageType":"DATA_MESSAGE","owner":"656565656566","logGroup":"/aws/rds/instance/stg/audit","logStream":"STG_ora_20067_20210906120520144010741320.xml","subscriptionFilters":["All logs"],"logEvents":[{"id":"36370952585791240628335082776414249187626811417307774976","timestamp":1630929920144,"message":<AuditRecord><Audit_Type>8</Audit_Type><EntryId>1</EntryId><Extended_Timestamp>2021-08-31T13:25:20.140969Z</Extended_Timestamp><DB_User>/</DB_User><OS_User>rdsdb</OS_User><Userhost>ip-172-27-1-72</Userhost><OS_Process>6773</OS_Process><Instance_Number>0</Instance_Number><Returncode>0</Returncode><OSPrivilege>SYSDBA</OSPrivilege><DBID>918393906</DBID> <Sql_Text>CONNECT</Sql_Text> </AuditRecord>"}]}
These logs are stored in S3 in gz format. I am using the below config for Logstash, but it's not working.
input {
  s3 {
    bucket => "s3bucket"
    type => "oracle-audit-log-xml"
    region => "eu-west-1"
  }
}
filter {
  ## For Oracle audit log
  if [type] == "oracle-audit-log-xml" {
    mutate { gsub => [ "message", "[\n]", "" ] }
    grok {
      match => [ "message", "<AuditRecord>%{DATA:temp_audit_message}</AuditRecord>" ]
    }
    mutate {
      add_field => { "audit_message" => "<AuditRecord>%{temp_audit_message}</AuditRecord>" }
    }
    xml {
      store_xml => true
      source => "audit_message"
      target => "audit"
    }
    mutate {
      add_field => { "timestamp" => "%{[audit][Extended_Timestamp]}" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'", "ISO8601" ]
      target => "@timestamp"
    }
    # remove temporary fields
    mutate { remove_field => ["message", "audit_message", "temp_audit_message"] }
    if "_grokparsefailure" in [tags] {
      drop {}
    }
  }
}
output {
  amazon_es {
    hosts => ["elasticsearch url"]
    index => "rdslogs-%{+YYYY.MM.dd}"
    region => "eu-west-1"
    aws_access_key_id => ''
    aws_secret_access_key => ''
  }
}
It seems to be an issue with the line below:
{"messageType":"DATA_MESSAGE","owner":"656565656566","logGroup":"/aws/rds/instance/stg/audit","logStream":"STG_ora_20067_20210906120520144010741320.xml","subscriptionFilters":["All logs"],"logEvents":[{"id":"36370952585791240628335082776414249187626811417307774976","timestamp":1630929920144,"message":
Is there any way we can modify this to drop the above line?
Thanks
You don't need a grok pattern as your logs are in JSON format. Install the logstash json filter plugin:
$ logstash-plugin install logstash-filter-json
And add a filter setting like the one below to parse your logs:
filter {
  json {
    source => "message"
  }
}
You can check the attached screenshot from my local ELK setup; I tried to parse the log line provided by you.
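The sample in the question is the CloudWatch Logs subscription envelope: a JSON object whose logEvents carry the Oracle audit XML inside the message string, gzipped when delivered to S3. The following Python is an added example, not code from the question or the answer, showing the whole unwrapping: decompress, parse the envelope as JSON, pull each AuditRecord out of the message, parse the XML into a flat document and convert Extended_Timestamp into a proper timestamp. The envelope and field values are those printed in the question, with the message value quoted so the envelope is valid JSON; the gzip wrapping and the event with no XML are constructed for the example.

```python
"""Parse Oracle RDS audit records out of the CloudWatch Logs envelope.

Added example; not from the question or the answer. The envelope and the
AuditRecord values come from the sample in the question, quoted so it is
valid JSON, then gzipped here to mimic the .gz objects in S3.
"""
import gzip
import json
import re
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Constructed sample: one CloudWatch Logs subscription record, gzipped like the S3 object.
SAMPLE_ENVELOPE = {
    "messageType": "DATA_MESSAGE",
    "owner": "656565656566",
    "logGroup": "/aws/rds/instance/stg/audit",
    "logStream": "STG_ora_20067_20210906120520144010741320.xml",
    "subscriptionFilters": ["All logs"],
    "logEvents": [{
        "id": "36370952585791240628335082776414249187626811417307774976",
        "timestamp": 1630929920144,
        "message": ("<AuditRecord><Audit_Type>8</Audit_Type><EntryId>1</EntryId>"
                    "<Extended_Timestamp>2021-08-31T13:25:20.140969Z</Extended_Timestamp>"
                    "<DB_User>/</DB_User><OS_User>rdsdb</OS_User>"
                    "<Userhost>ip-172-27-1-72</Userhost><OS_Process>6773</OS_Process>"
                    "<Instance_Number>0</Instance_Number><Returncode>0</Returncode>"
                    "<OSPrivilege>SYSDBA</OSPrivilege><DBID>918393906</DBID> "
                    "<Sql_Text>CONNECT</Sql_Text> </AuditRecord>"),
    }],
}
SAMPLE_GZ = gzip.compress(json.dumps(SAMPLE_ENVELOPE).encode())
# Constructed: an event whose message carries no audit record at all.
NO_XML_GZ = gzip.compress(json.dumps(
    {"logEvents": [{"id": "x", "timestamp": 0, "message": "no audit record here"}]}).encode())

_RECORD = re.compile(r"<AuditRecord>.*?</AuditRecord>", re.DOTALL)
_INT_FIELDS = ("Audit_Type", "EntryId", "OS_Process", "Instance_Number", "Returncode", "DBID")


def parse_audit_record(xml_text: str) -> dict:
    """Flatten one <AuditRecord> into a dict and add a parsed UTC @timestamp."""
    # ElementTree does not fetch external entities; if these logs could come from
    # an untrusted writer, prefer the third-party defusedxml for entity-expansion limits.
    root = ET.fromstring(xml_text)
    record = {child.tag: (child.text or "").strip() for child in root}
    ts = record.get("Extended_Timestamp")
    if ts:
        record["@timestamp"] = datetime.strptime(
            ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    for key in _INT_FIELDS:
        if key in record and record[key].lstrip("-").isdigit():
            record[key] = int(record[key])
    return record


def extract_audit_records(gz_bytes: bytes) -> list:
    """Decompress one S3 object, walk logEvents and return the parsed audit records.

    Events without an <AuditRecord> are skipped, which is what the Logstash
    config above achieves by dropping on _grokparsefailure.
    """
    envelope = json.loads(gzip.decompress(gz_bytes))
    out = []
    for event in envelope.get("logEvents", []):
        for match in _RECORD.finditer(event.get("message", "")):
            rec = parse_audit_record(match.group(0))
            rec["cloudwatch_event_id"] = event.get("id")
            rec["logStream"] = envelope.get("logStream")
            out.append(rec)
    return out
```

The regex is applied after JSON decoding, so the surrounding envelope never needs a grok pattern; only the XML fragment is handed to the parser. Keeping Returncode and OSPrivilege as typed fields is what makes a later query for failed SYSDBA connections straightforward.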

Show Kafka topic title as a field in Kibana, logstash add_field?

I have logstash with ElasticSearch & Kibana 7.6.2.
I connect logstash to Kafka as follows:
input {
  kafka {
    bootstrap_servers => "******"
    topics_pattern => [".*"]
    decorate_events => true
    add_field => { "[topic_name]" => "%{[@metadata][kafka][topic]}" }
  }
}
filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash"
    document_type => "logs"
  }
}
It's OK and works, but the field topic_name shows as %{[@metadata][kafka][topic]}.
How can I fix it?
The syntax of the sprintf format you are using (%{[@metadata][kafka][topic]}) to get the value of that field is correct.
Allegedly there is no such field @metadata.kafka.topic in your document. Therefore the sprintf can't obtain the field value and, as a result, the newly created field contains the sprintf call as a string.
However, since you set decorate_events => true, the metadata fields should be available, as stated in the documentation (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html):
Metadata is only added to the event if the decorate_events option is set to true (it defaults to false).
I can imagine that the add_field action set in the input plugin causes the issue. Since the decorate_events option first enables the addition of the metadata fields, the add_field action should come second, after the input plugin.
Your configuration would then look like this:
input {
  kafka {
    bootstrap_servers => "******"
    topics_pattern => [".*"]
    decorate_events => true
  }
}
filter {
  mutate {
    add_field => { "[topic_name]" => "%{[@metadata][kafka][topic]}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash"
    document_type => "logs"
  }
}
How about
add_field => { "topic_name" => "%{[@metadata][kafka][topic]}" }
i.e. [topic_name] -> topic_name
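Both answers turn on how Logstash resolves a %{...} reference: a reference to a field that exists is replaced by its value, a reference to a missing field is left in place as the literal text (which is exactly the "%{[@metadata][kafka][topic]}" string that showed up in Kibana), and @metadata never reaches an output. The Python below is an added example that emulates that behaviour; it is not Logstash's own implementation. The event, the Joda-to-strftime table for the %{+...} date form (limited to the tokens used on this page) and the field names are assumptions made for the example.

```python
"""Emulate Logstash sprintf field references such as %{[@metadata][kafka][topic]}.

Added example; not Logstash's own code. The sample event, the field names
and the small Joda-to-strftime table are assumptions for this example.
"""
import copy
import re
from datetime import datetime, timezone

# Constructed event as the kafka input with decorate_events => true would build it.
SAMPLE_EVENT = {
    "message": "hello",
    "@timestamp": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    "@metadata": {"kafka": {"topic": "payments", "partition": 0}},
}

_REF = re.compile(r"%\{([^}]+)\}")      # one %{...} reference
_PATH = re.compile(r"\[([^\]]+)\]")     # the [a][b] segments of a field reference

# Joda -> strftime tokens for the %{+...} form; only the tokens seen on this page.
_JODA = [("YYYY", "%Y"), ("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
         ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]


def field_path(ref: str) -> list:
    """'[a][b]' -> ['a', 'b'];  'a' -> ['a']."""
    return _PATH.findall(ref) if ref.startswith("[") else [ref]


def get_field(event: dict, ref: str):
    """Walk the nested event; None when any segment is missing."""
    cur = event
    for key in field_path(ref):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


def sprintf(event: dict, template: str) -> str:
    """Resolve every %{...} in `template` against `event` the way Logstash does."""
    def substitute(m):
        ref = m.group(1)
        if ref.startswith("+"):                     # %{+YYYY.MM.dd}: format @timestamp
            fmt = ref[1:]
            for joda, py in _JODA:
                fmt = fmt.replace(joda, py)
            ts = event.get("@timestamp")
            if not isinstance(ts, datetime):
                ts = datetime.now(timezone.utc)
            return ts.strftime(fmt)
        val = get_field(event, ref)
        return m.group(0) if val is None else str(val)   # unresolved stays literal
    return _REF.sub(substitute, template)


def for_output(event: dict) -> dict:
    """What an elasticsearch or kafka output sees: a copy without @metadata."""
    out = copy.deepcopy(event)
    out.pop("@metadata", None)
    return out
```

Resolving the reference before the output stage is what both answers do (a mutate add_field in the filter block, or the same add_field in the input once the metadata exists); resolving it against the output-side copy is what produced the literal string.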

ELK - date, defined in logstash shows as string in kibana

I have the following config file for logstash:
input {
  file {
    path => "/home/elk/data/visits.csv"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["estado","tiempo_demora","poblacion","id_poblacion","edad_valor","cp","latitude_corregida","longitud_corregida","patologia","Fecha","id_tipo","id_personal","nasistencias","menor","Geopoint_corregido"]
  }
  date {
    # yyyy is the calendar year; YYYY is the Joda week-year and misparses around New Year
    match => ["Fecha", "dd-MM-yyyy HH:mm"]
    target => "Fecha"
  }
  mutate { convert => ["nasistencias", "integer"] }
  mutate { convert => ["id_poblacion", "integer"] }
  mutate { convert => ["id_personal", "integer"] }
  mutate { convert => ["id_tipo", "integer"] }
  mutate { convert => ["cp", "integer"] }
  mutate { convert => ["edad_valor", "integer"] }
  mutate {
    # one convert hash for both coordinates
    convert => {
      "longitud_corregida" => "float"
      "latitude_corregida" => "float"
    }
  }
  mutate {
    rename => {
      "longitud_corregida" => "[location][lon]"
      "latitude_corregida" => "[location][lat]"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "medicalvisits-%{+dd.MM.yyyy}"
  }
  stdout {
    # one codec per output; rubydebug kept, json_lines dropped
    codec => rubydebug
  }
}
From there Fecha should have been sent to elasticsearch as a date, but in kibana, when I try to set it as the timestamp, it doesn't appear, and it shows as a string:
Any idea what I am doing wrong here?
The types in your index patterns are not the same as the types in your index templates (the information actually being stored).
I would suggest that you overwrite the timestamp with the information sent by your logstash. After all, what's important to you in most cases is the timestamp of the event, not the timestamp of when the event was sent to your elasticsearch.
With this being said, why don't you just save your "Fecha" directly into your "@timestamp" by means of the "date" filter in logstash? Like this:
date {
  match => ["Fecha", "dd-MM-yyyy HH:mm"]
  target => "@timestamp"
  tag_on_failure => ["fallo_filtro_fecha"]
}
Another option, if you really need that "Fecha" alongside @timestamp (not the best idea), with Fecha being of type "date", is to modify your index mapping to change that field's type to date. Like this (adjust as necessary):
PUT /nombre_de_tu_indice/_mapping
{
  "properties": {
    "Fecha": {
      "type": "date"
    }
  }
}
Of course, this change will only affect newly indexed indices or a re-indexed one.

How to get rid of extra field and values after grok transformation of input to json?

I have a logstash configuration that has as filter like this:
filter {
  grok {
    match => { "message" => "%{GREEDYDATA:inputs}" }
  }
  json {
    source => "inputs"
    target => "parsedJson"
    remove_field => ["inputs"]
  }
  mutate {
    add_field => {
      "serviceName" => "%{[parsedJson][serviceName]}"
      "thread_name" => "%{[parsedJson][thread_name]}"
    }
  }
}
It is working and I am getting field/variable names such as serviceName and thread_name in Elastic/Kibana. However, I am also getting some unwanted additional things, which I believe are due to the mutate:
unwanted grok output
As you can see, there are additional "parsedJson.[field_name]" fields that are repeated. I've played with the json and mutate portions, but I can't seem to figure this out. Any help appreciated, thanks.
Use remove_field in the mutate filter:
mutate {
  remove_field => [ "[parsedJson][message]", "[parsedJson][serviceName]", "[parsedJson][thread_name]" ]
}

elasticsearch - import csv using logstash date is not parsed as of datetime type

I am trying to import a csv into elasticsearch using logstash.
I have tried two ways:
1) Using the csv filter
2) Using the grok filter
1) For csv, below is my logstash file:
input {
  file {
    path => "path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1", "col2_datetime"]
  }
  mutate { convert => [ "col1", "float" ] }
  date {
    locale => "en"
    match => ["col2_datetime", "ISO8601"]   # also tried: match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Kolkata"
    target => "@timestamp"                  # also tried: target => "col2_datetime"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_collection"
  }
  stdout {}
}
2) Using the grok filter:
For grok, below is my logstash file:
input {
  file {
    path => "path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "(?<col1>(?:%{BASE10NUM})),(%{TIMESTAMP_ISO8601:col2_datetime})" }
    remove_field => [ "message" ]
  }
  date {
    match => ["col2_datetime", "yyyy-MM-dd HH:mm:ss"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_collection_grok"
  }
  stdout {}
}
PROBLEM:
So when I run both files individually, I am able to import the data into elasticsearch. But my date field is not parsed as a datetime type; rather it has been saved as a string, and because of that I am not able to run date filters.
So can someone help me figure out why it's happening?
My elasticsearch version is 5.4.1.
Thanks in advance
There are 2 changes I made to your config file:
1) remove the underscore in the column name col2_datetime
2) add target
Here is how my config file looks:
vi logstash.conf
input {
  file {
    path => "/config-dir/path_to_my_csv.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1", "col2"]
  }
  mutate { convert => [ "col1", "float" ] }
  date {
    locale => "en"
    match => ["col2", "yyyy-MM-dd HH:mm:ss"]
    target => "col2"
  }
}
output {
  elasticsearch {
    hosts => "http://172.17.0.1:9200"
    index => "my_collection"
  }
  stdout {}
}
Here is the data file:
vi path_to_my_csv.csv
1234365,2016-12-02 19:00:52
1234368,2016-12-02 15:02:02
1234369,2016-12-02 15:02:07
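Both date threads on this page (the Fecha column that Kibana shows as a string, and the col2_datetime column above) come down to the same two points: the value written to Elasticsearch has to be an ISO 8601 string for dynamic mapping to detect a date at all, and relying on that detection is fragile, so the field type should be declared in the mapping up front. The Python below is an added example, not code from either answer. It takes the three CSV rows from the answer above, parses the timestamp with an explicit pattern and the Asia/Kolkata offset named in the question, writes the value as ISO 8601 UTC, builds the explicit mapping body, and derives the daily index name. The fixed UTC+05:30 offset, the float column list and the index prefix are assumptions for the example.

```python
"""Turn CSV rows into Elasticsearch documents whose dates really are dates.

Added example; not the code of either answer above. The CSV rows are the
ones from the answer; the fixed UTC+05:30 offset (Asia/Kolkata has no DST),
the float column list and the index prefix are assumptions.
"""
import csv
import io
import json
from datetime import datetime, timedelta, timezone

# Constructed input: the three rows shown in the answer above.
CSV_TEXT = """1234365,2016-12-02 19:00:52
1234368,2016-12-02 15:02:02
1234369,2016-12-02 15:02:07
"""
COLUMNS = ["col1", "col2"]
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"   # strftime form of the Joda "yyyy-MM-dd HH:mm:ss"
# Fixed offset instead of zoneinfo so the example has no tzdata dependency.
SOURCE_TZ = timezone(timedelta(hours=5, minutes=30), name="Asia/Kolkata")


def to_iso_utc(local_text: str) -> str:
    """Parse a naive local timestamp and render it as ISO 8601 UTC with milliseconds."""
    local = datetime.strptime(local_text, DATE_FORMAT).replace(tzinfo=SOURCE_TZ)
    utc = local.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def parse_rows(text: str, columns: list, date_field: str, float_fields=()) -> list:
    """One document per CSV row: floats converted, the date field and @timestamp as ISO UTC."""
    docs = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=columns):
        doc = dict(row)
        for name in float_fields:
            doc[name] = float(doc[name])
        doc[date_field] = to_iso_utc(doc[date_field])
        doc["@timestamp"] = doc[date_field]     # what the date filter with target => "@timestamp" does
        docs.append(doc)
    return docs


def mapping_for(date_fields: list, float_fields: list) -> dict:
    """Body for index creation with explicit types; dynamic date detection is not relied on."""
    props = {name: {"type": "date"} for name in date_fields}
    props.update({name: {"type": "float"} for name in float_fields})
    return {"mappings": {"properties": props}}


def daily_index(prefix: str, iso_timestamp: str) -> str:
    """Equivalent of index => "<prefix>-%{+dd.MM.yyyy}" computed from the UTC timestamp."""
    day = datetime.strptime(iso_timestamp[:10], "%Y-%m-%d")
    return f"{prefix}-{day.strftime('%d.%m.%Y')}"


def bulk_lines(prefix: str, docs: list) -> str:
    """NDJSON for the _bulk API: one action line and one document line per row."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": daily_index(prefix, doc["@timestamp"])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

Creating the index with mapping_for(...) before the first bulk request is the equivalent of the PUT _mapping call in the Fecha answer, applied up front rather than after a string mapping has already been locked in; an existing index cannot have a field's type changed in place and needs a reindex.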
