I can see that some events are missing when shipping logs to Elasticsearch. For example, when I send 5 log events, only 3 or 4 of them are reported.
I am using Logstash 7.4 to read my log messages and store the information in Elasticsearch 7.4. Below is my Logstash configuration:
input {
file {
type => "web"
path => ["/Users/a0053/Downloads/logs/**/*-web.log"]
start_position => "beginning"
sincedb_path => "/tmp/sincedb_file"
codec => multiline {
pattern => "^(%{MONTHDAY}-%{MONTHNUM}-%{YEAR} %{TIME}) "
negate => true
what => previous
}
}
}
filter {
if [type] == "web" {
grok {
match => [ "message","(?<frontendDateTime>%{MONTHDAY}-%{MONTHNUM}-%{YEAR} %{TIME})%{SPACE}(\[%{DATA:thread}\])?( )?%{LOGLEVEL:level}%{SPACE}%{USERNAME:zhost}%{SPACE}%{JAVAFILE:javaClass} %{USERNAME:orgId} (?<loginId>[\w.+=:-]+#[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:[.](?:[0-9A-Za-z][0-9A-Za-zāā-]{0,62}))*) %{GREEDYDATA:jsonstring}"]
}
json {
source => "jsonstring"
target => "parsedJson"
remove_field=>["jsonstring"]
}
mutate {
add_field => {
"actionType" => "%{[parsedJson][actionType]}"
"errorMessage" => "%{[parsedJson][errorMessage]}"
"actionName" => "%{[parsedJson][actionName]}"
"Payload" => "%{[parsedJson][Payload]}"
"pageInfo" => "%{[parsedJson][pageInfo]}"
"browserInfo" => "%{[parsedJson][browserInfo]}"
"dateTime" => "%{[parsedJson][dateTime]}"
}
}
}
}
output{
if "_grokparsefailure" in [tags]
{
elasticsearch
{
hosts => "localhost:9200"
index => "grokparsefailure-%{+YYYY.MM.dd}"
}
}
else {
elasticsearch
{
hosts => "localhost:9200"
index => "zindex"
}
}
stdout{codec => rubydebug}
}
As new logs keep being written to the log files, I see a difference between the number of log events written and the number indexed.
Any suggestions would be appreciated.
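One possible cause, offered as an assumption rather than a confirmed diagnosis: with negate => true and what => previous, the multiline codec holds the most recent event until a later line matches the pattern, so the last event of each file is not emitted until more data arrives. The codec's auto_flush_interval option flushes a pending event after a period of inactivity. A minimal sketch, where the value of 2 seconds is an arbitrary choice:

```
input {
  file {
    type => "web"
    path => ["/Users/a0053/Downloads/logs/**/*-web.log"]
    start_position => "beginning"
    sincedb_path => "/tmp/sincedb_file"
    codec => multiline {
      pattern => "^(%{MONTHDAY}-%{MONTHNUM}-%{YEAR} %{TIME}) "
      negate => true
      what => previous
      # Assumed value: emit a pending multiline event after 2 seconds without new lines
      auto_flush_interval => 2
    }
  }
}
```

Also note that events tagged _grokparsefailure go to the grokparsefailure-* index in this configuration, so they do not count towards zindex.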
I have logstash with ElasticSearch & Kibana 7.6.2
I connect logstash to Kafka as follows:
input {
kafka {
bootstrap_servers => "******"
topics_pattern => [".*"]
decorate_events => true
add_field => { "[topic_name]" => "%{[@metadata][kafka][topic]}"}
}
}
filter {
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash"
document_type => "logs"
}
}
It works, but the field topic_name shows up as the literal string %{[@metadata][kafka][topic]}.
How can I fix it?
The syntax of the sprintf format you are using ( %{[@metadata][kafka][topic]} ) to get the value of that field is correct.
Apparently there is no field [@metadata][kafka][topic] in your event at the moment the sprintf is evaluated. The sprintf therefore cannot obtain the field value, and the newly created field contains the sprintf expression as a literal string.
However, since you set decorate_events => true, the metadata fields should be available as stated in the documentation (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html):
Metadata is only added to the event if the decorate_events option is set to true (it defaults to false).
I can imagine that the add_field action set in the input plugin causes the issue. Since the decorate_events option is what adds the metadata fields in the first place, the add_field action may run before they exist. It should therefore come second, in a filter after the input plugin.
Your configuration would then look like this:
input {
kafka {
bootstrap_servers => "******"
topics_pattern => [".*"]
decorate_events => true
}
}
filter {
mutate{
add_field => { "[topic_name]" => "%{[@metadata][kafka][topic]}"}
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash"
document_type => "logs"
}
}
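To check whether the metadata fields are actually present on the event, a temporary debugging output can help. This is only a sketch: the rubydebug codec hides [@metadata] by default and shows it when metadata => true is set.

```
output {
  # Debugging only: print every event including its [@metadata] fields
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```

If [@metadata][kafka][topic] appears in that output, the sprintf reference in the mutate filter should resolve.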
How about
add_field => { "topic_name" => "%{[@metadata][kafka][topic]}"}
i.e. [topic_name] -> topic_name
I have a Tomcat access log in the following format:
10.0.6.35 - - [21/Oct/2019:00:00:04 +0000] "GET /rest/V1/productlist/category/4259/ar/final_price/asc/4/20 HTTP/1.1" 200 14970 12
I want to create fields from the last two columns, which are bytes and duration, and analyze them in Kibana. I use Filebeat and Logstash to ship the data to Elasticsearch.
I tried the Logstash configuration below, but I cannot see the fields in Kibana:
input {
beats {
port => 5044
}
}
filter {
grok {
match => ["message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}(?m) %{NUMBER:duration}" ]
#match=>{"duration"=> "%{NUMBER:duration}"}
# match => { "message" => "%{COMBINEDAPACHELOG}" }
}
# mutate {
# remove_field => ["@version", "@timestamp"]
# }
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
if [fields][log_type] == "access-log"
{
elasticsearch {
hosts => ["172.31.30.73:9200"]
index => "%{[fields][service]}-%{+YYYY.MM.dd}"
}
}
if [fields][log_type] == "application-log"
{
elasticsearch {
hosts => ["172.31.30.73:9200"]
index => "%{[fields][service]}-%{+YYYY.MM.dd}"
}
}
else
{
elasticsearch {
hosts => ["172.31.30.73:9200"]
index => "logstashhh-%{+YYYY.MM.dd}"
}
}
}
I want duration and bytes to become fields in Kibana that I can use for visualization.
Try this as your logstash configuration:
input {
beats {
port => 5044
}
}
filter {
grok {
match => ["message" => "%{NUMBER:bytes}(?m) %{NUMBER:duration}$" ]
#match=>{"duration"=> "%{NUMBER:duration}"}
# match => { "message" => "%{COMBINEDAPACHELOG}" }
}
# mutate {
# remove_field => ["@version", "@timestamp"]
# }
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
if [fields][log_type] == "access-log"
{
elasticsearch {
hosts => ["172.31.30.73:9200"]
index => "%{[fields][service]}-%{+YYYY.MM.dd}"
}
}
if [fields][log_type] == "application-log"
{
elasticsearch {
hosts => ["172.31.30.73:9200"]
index => "%{[fields][service]}-%{+YYYY.MM.dd}"
}
}
else
{
elasticsearch {
hosts => ["172.31.30.73:9200"]
index => "logstashhh-%{+YYYY.MM.dd}"
}
}
}
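As an alternative, here is a sketch of a grok pattern that parses the whole sample line and stores bytes and duration as integers, so Kibana can aggregate them. The field names ident, auth, httpversion and response are my own assumptions. The timestamp field is extracted so that the date filter has something to match. The pattern is single-quoted because it contains double quotes.

```
filter {
  grok {
    # The :int suffix stores the captured value as an integer instead of a string
    match => { "message" => '%{IPORHOST:client} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} %{NUMBER:bytes:int} %{NUMBER:duration:int}' }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```

If bytes or duration were already indexed as strings, the existing index keeps that mapping. The numeric type only applies to a newly created index, and the Kibana index pattern may need a refresh before the fields show up.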
My question is about a Logstash grok pattern. I created the pattern below and it works fine, but the big problem is the non-string values. Sometimes "Y" and "age" can be null, in which case my grok pattern does not match and no document is created in Elasticsearch. I need to tell my grok pattern:
if(age is null || age is empty){
updatefield["age",0]
}
but I don't know how to do it. I also searched for solutions, but none is directly related to my problem.
input {
file {
path => ["C:/log/*.log"]
start_position => "beginning"
discover_interval => 10
stat_interval => 10
sincedb_write_interval => 10
close_older => 10
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601}\|"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:formattedDate}.* X: %{DATA:X} Y: %{NUMBER:Y} Z: %{DATA:Z} age: %{NUMBER:age:int} "}
}
date {
timezone => "Europe/Istanbul"
match => ["TimeStamp", "ISO8601"]
}
json{
source => "request"
target => "parsedJson"
}
mutate {
remove_field => [ "path","message","tags","@version"]
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => [ "http://localhost:9200" ]
index => "logstash-%{+YYYY.MM}"
}
}
You can check whether your fields exist or are empty by using conditionals in your filter:
filter {
if ![age] or [age] == "" {
mutate {
# replace also creates the field when it is missing; update only changes an existing field
replace => { "age" => "0" }
}
}
}
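Note that the conditional only runs on events that grok has already matched, and %{NUMBER:age:int} requires a number. A sketch of making the numeric captures optional follows, under the assumption that a null value is written as nothing at all (for example "age: " followed by the trailing space). If the log writes the literal text null, the pattern needs an alternative such as (?:%{NUMBER:age:int}|null) instead.

```
filter {
  grok {
    # (?:...)? makes each numeric capture optional, so a line with an empty Y or age still matches
    match => { "message" => "%{TIMESTAMP_ISO8601:formattedDate}.* X: %{DATA:X} Y: (?:%{NUMBER:Y})? Z: %{DATA:Z} age: (?:%{NUMBER:age:int})? " }
  }
  if ![age] {
    # Two separate mutate blocks: the field must exist before it can be converted
    mutate { add_field => { "age" => "0" } }
    mutate { convert => { "age" => "integer" } }
  }
}
```

By default grok does not create a field for an empty capture, so ![age] is true whenever the optional group matched nothing.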
I noticed that the @timestamp field, which is set correctly by Filebeat, is changed automatically by Logstash: its value is replaced with the timestamp from the log line (the field is named a_timeStamp).
Here is part of logstash debug log:
[2017-07-18T11:55:03,598][DEBUG][logstash.pipeline ] filter received {"event"=>{"@timestamp"=>2017-07-18T09:54:53.507Z, "offset"=>498, "@version"=>"1", "input_type"=>"log", "beat"=>{"hostname"=>"centos-ea", "name"=>"filebeat_shipper_kp", "version"=>"5.5.0"}, "host"=>"centos-ea", "source"=>"/home/elastic/ELASTIC_NEW/log_bw/test.log", "message"=>"2017-06-05 19:02:46.779 INFO [bwEngThread:In-Memory Process Worker-4] psg.logger - a_applicationName=\"PieceProxy\", a_processName=\"piece.PieceProxy\", a_jobId=\"bw0a10ao\", a_processInstanceId=\"bw0a10ao\", a_level=\"Info\", a_phase=\"ProcessStart\", a_activityName=\"SetAndLog\", a_timeStamp=\"2017-06-05T19:02:46.779\", a_sessionId=\"\", a_sender=\"PCS\", a_cruid=\"37d7e225-bbe5-425b-8abc-f4b44a5a1560\", a_MachineCode=\"CFDM7757\", a_correlationId=\"fa10f\", a_trackingId=\"9d3b8\", a_message=\"START=piece.PieceProxy\"", "type"=>"log", "tags"=>["beats_input_codec_plain_applied"]}}
[2017-07-18T11:55:03,629][DEBUG][logstash.pipeline ] output received {"event"=>{"a_message"=>"START=piece.PieceProxy", "log"=>"INFO ", "bwthread"=>"[bwEngThread:In-Memory Process Worker-4]", "logger"=>"psg.logger ", "a_correlationId"=>"fa10f", "source"=>"/home/elastic/ELASTIC_NEW/log_bw/test.log", "a_trackingId"=>"9d3b8", "type"=>"log", "a_sessionId"=>"\"\"", "a_sender"=>"PCS", "@version"=>"1", "beat"=>{"hostname"=>"centos-ea", "name"=>"filebeat_shipper_kp", "version"=>"5.5.0"}, "host"=>"centos-ea", "a_level"=>"Info", "a_processName"=>"piece.PieceProxy", "a_cruid"=>"37d7e225-bbe5-425b-8abc-f4b44a5a1560", "a_activityName"=>"SetAndLog", "offset"=>498, "a_MachineCode"=>"CFDM7757", "input_type"=>"log", "message"=>"2017-06-05 19:02:46.779 INFO [bwEngThread:In-Memory Process Worker-4] psg.logger - a_applicationName=\"PieceProxy\", a_processName=\"piece.PieceProxy\", a_jobId=\"bw0a10ao\", a_processInstanceId=\"bw0a10ao\", a_level=\"Info\", a_phase=\"ProcessStart\", a_activityName=\"SetAndLog\", a_timeStamp=\"2017-06-05T19:02:46.779\", a_sessionId=\"\", a_sender=\"PCS\", a_cruid=\"37d7e225-bbe5-425b-8abc-f4b44a5a1560\", a_MachineCode=\"CFDM7757\", a_correlationId=\"fa10f\", a_trackingId=\"9d3b8\", a_message=\"START=piece.PieceProxy\"", "a_phase"=>"ProcessStart", "tags"=>["beats_input_codec_plain_applied", "_dateparsefailure", "kv_ok", "taskStarted"], "a_processInstanceId"=>"bw0a10ao", "@timestamp"=>2017-06-05T17:02:46.779Z, "my_index"=>"bw_logs", "a_timeStamp"=>"2017-06-05T19:02:46.779", "a_jobId"=>"bw0a10ao", "a_applicationName"=>"PieceProxy", "TMS"=>"2017-06-05 19:02:46.779"}}
NB:
I noticed that this doesn't happen with a simple pipeline (without grok, kv and the other plugins I use in my custom pipeline).
I changed Filebeat's json.overwrite_keys property to true, but with no success.
Can you explain why @timestamp changes and what is happening? I don't expect this to be done automatically (I saw many posts from people asking how to do it), because @timestamp is a system field. What's wrong here?
Here is my pipeline:
input {
beats {
port => "5043"
type => json
}
}
filter {
#date {
# match => [ "@timestamp", "ISO8601" ]
# target => "@timestamp"
#}
if "log_bw" in [source] {
grok {
patterns_dir => ["/home/elastic/ELASTIC_NEW/logstash-5.5.0/config/patterns/extrapatterns"]
match => { "message" => "%{CUSTOM_TMS:TMS}\s*%{CUSTOM_LOGLEVEL:log}\s*%{CUSTOM_THREAD:bwthread}\s*%{CUSTOM_LOGGER:logger}-%{CUSTOM_TEXT:text}" }
tag_on_failure => ["no_match"]
}
if "no_match" not in [tags] {
if "Payload for Request is" in [text] {
mutate {
add_field => { "my_index" => "json_request" }
}
grok {
patterns_dir => ["/home/elastic/ELASTIC_NEW/logstash-5.5.0/config/patterns/extrapatterns"]
match => { "text" => "%{CUSTOM_JSON:json_message}" }
}
json {
source => "json_message"
tag_on_failure => ["errore_parser_json"]
target => "json_request"
}
mutate {
remove_field => [ "json_message", "text" ]
}
}
else {
mutate {
add_field => { "my_index" => "bw_logs" }
}
kv {
source => "text"
trim_key => "\s"
field_split => ","
add_tag => [ "kv_ok" ]
}
if "kv_ok" not in [tags] {
drop { }
}
else {
mutate {
remove_field => [ "text" ]
}
if "ProcessStart" in [a_phase] {
mutate {
add_tag => [ "taskStarted" ]
}
}
if "ProcessEnd" in [a_phase] {
mutate {
add_tag => [ "taskTerminated" ]
}
}
date {
match => [ "a_timeStamp", "yyyy'-'MM'-'dd'T'HH:mm:ss.SSS" ]
}
elapsed {
start_tag => "taskStarted"
end_tag => "taskTerminated"
unique_id_field => "a_cruid"
}
}
}
}
}
else {
mutate {
add_field => { "my_index" => "other_products" }
}
}
}
output {
elasticsearch {
index => "%{my_index}"
hosts => ["localhost:9200"]
}
stdout { codec => rubydebug }
file {
path => "/tmp/loggata.tx"
codec => json
}
}
Thank you very much,
Andrea
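This was the cause (a leftover from previous tests): this date filter has no target set, so by default it writes the parsed a_timeStamp value to @timestamp.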
date {
match => [ "a_timeStamp", "yyyy'-'MM'-'dd'T'HH:mm:ss.SSS" ]
}
Thank you guys, anyway!
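For anyone who wants to keep the @timestamp set by Filebeat and still parse a_timeStamp, a minimal sketch is to give the date filter an explicit target. The field name a_timeStamp_parsed is hypothetical, so pick whatever suits your mapping.

```
filter {
  date {
    match => [ "a_timeStamp", "yyyy'-'MM'-'dd'T'HH:mm:ss.SSS" ]
    # Hypothetical field name: without target, the parsed value overwrites @timestamp
    target => "a_timeStamp_parsed"
  }
}
```

Removing the date filter entirely has the same effect on @timestamp if the parsed value is not needed.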