Logstash elapsed filter - elasticsearch

I am trying to use the elapsed.rb filter in the ELK stack and can't seem to figure it out. I am not very familiar with grok, and I believe that is where my issue lies. Can anyone help?
Example Log Files:
{
"application_name": "Application.exe",
"machine_name": "Machine1",
"user_name": "testuser",
"entry_date": "2015-03-12T18:12:23.5187552Z",
"chef_environment_name": "chefenvironment1",
"chef_logging_cookbook_version": "0.1.9",
"logging_level": "INFO",
"performance": {
"process_name": "account_search",
"process_id": "Machine1|1|635617555435187552",
"event_type": "enter"
},
"thread_name": "1",
"logger_name": "TestLogger",
"@version": "1",
"@timestamp": "2015-03-12T18:18:48.918Z",
"type": "rabbit",
"log_from": "rabbit"
}
{
"application_name": "Application.exe",
"machine_name": "Machine1",
"user_name": "testuser",
"entry_date": "2015-03-12T18:12:23.7527462Z",
"chef_environment_name": "chefenvironment1",
"chef_logging_cookbook_version": "0.1.9",
"logging_level": "INFO",
"performance": {
"process_name": "account_search",
"process_id": "Machine1|1|635617555435187552",
"event_type": "exit"
},
"thread_name": "1",
"logger_name": "TestLogger",
"@version": "1",
"@timestamp": "2015-03-12T18:18:48.920Z",
"type": "rabbit",
"log_from": "rabbit"
}
Example .conf file
input {
rabbitmq {
host => "SERVERNAME"
add_field => ["log_from", "rabbit"]
type => "rabbit"
user => "testuser"
password => "testuser"
durable => "true"
exchange => "Logging"
queue => "testqueue"
codec => "json"
exclusive => "false"
passive => "true"
}
}
filter {
grok {
match => ["message", "%{TIMESTAMP_ISO8601} START id: (?<process_id>.*)"]
add_tag => [ "taskStarted" ]
}
grok {
match => ["message", "%{TIMESTAMP_ISO8601} END id: (?<process_id>.*)"]
add_tag => [ "taskTerminated"]
}
elapsed {
start_tag => "taskStarted"
end_tag => "taskTerminated"
unique_id_field => "process_id"
timeout => 10000
new_event_on_match => false
}
}
output {
file {
codec => json { charset => "UTF-8" }
path => "test.log"
}
}

You would not need to use a grok filter because your input is already in json format. You'd need to do something like this:
if [performance][event_type] == "enter" {
mutate { add_tag => ["taskStarted"] }
} else if [performance][event_type] == "exit" {
mutate { add_tag => ["taskTerminated"] }
}
elapsed {
start_tag => "taskStarted"
end_tag => "taskTerminated"
unique_id_field => "performance.process_id"
timeout => 10000
new_event_on_match => false
}
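I'm not positive about that unique_id_field. Logstash normally references nested fields as [performance][process_id], so that form may be needed instead of performance.process_id. If neither works, you could change it to plain process_id and copy the value into a top-level field first with a mutate filter: add_field => { "process_id" => "%{[performance][process_id]}" }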
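To make the pairing logic concrete, here is a minimal Python sketch of what the conditional plus the elapsed filter are meant to do with the two sample events. It is an illustration only, not the plugin's code: the function names are hypothetical, the timeout is not modelled, and it assumes the elapsed time is the difference between the two events' @timestamp values.

```python
from datetime import datetime

# Timestamp layout used by the sample events (millisecond precision, UTC).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def tag_event(event):
    """Mirror the conditional: return the tag for an event, or None."""
    event_type = event.get("performance", {}).get("event_type")
    if event_type == "enter":
        return "taskStarted"
    if event_type == "exit":
        return "taskTerminated"
    return None


def pair_elapsed(events):
    """Match start and end events by process_id; return seconds per id."""
    started = {}
    elapsed = {}
    for event in events:
        tag = tag_event(event)
        if tag is None:
            continue  # not a performance event, nothing to pair
        process_id = event["performance"]["process_id"]
        timestamp = datetime.strptime(event["@timestamp"], TS_FORMAT)
        if tag == "taskStarted":
            started[process_id] = timestamp
        elif process_id in started:
            start = started.pop(process_id)
            elapsed[process_id] = (timestamp - start).total_seconds()
    return elapsed


# Trimmed copies of the two sample events from the question.
events = [
    {"performance": {"process_id": "Machine1|1|635617555435187552",
                     "event_type": "enter"},
     "@timestamp": "2015-03-12T18:18:48.918Z"},
    {"performance": {"process_id": "Machine1|1|635617555435187552",
                     "event_type": "exit"},
     "@timestamp": "2015-03-12T18:18:48.920Z"},
]
result = pair_elapsed(events)
print(result)
```

Note that the two sample events differ by only a couple of milliseconds in @timestamp, while their entry_date values differ by more. If the real duration lives in entry_date, a date filter that sets @timestamp from entry_date before the elapsed filter is probably needed.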

Related

Logstash nginx filter doesn't apply to half of rows

Using filebeat to push nginx logs to logstash and then to elasticsearch.
Logstash filter:
filter {
if [fileset][module] == "nginx" {
if [fileset][name] == "access" {
grok {
match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
remove_field => "message"
}
mutate {
add_field => { "read_timestamp" => "%{@timestamp}" }
}
date {
match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
remove_field => "[nginx][access][time]"
}
useragent {
source => "[nginx][access][agent]"
target => "[nginx][access][user_agent]"
remove_field => "[nginx][access][agent]"
}
geoip {
source => "[nginx][access][remote_ip]"
target => "[nginx][access][geoip]"
}
}
else if [fileset][name] == "error" {
grok {
match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
remove_field => "message"
}
mutate {
rename => { "@timestamp" => "read_timestamp" }
}
date {
match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
remove_field => "[nginx][error][time]"
}
}
}
}
There is just one file /var/log/nginx/access.log.
In Kibana, roughly half of the rows have a parsed message and the other half do not.
All of the rows in Kibana have the tag "beats_input_codec_plain_applied".
Examples from filebeat -e
Row that works fine:
"source": "/var/log/nginx/access.log",
"offset": 5405195,
"message": "...",
"fileset": {
"module": "nginx",
"name": "access"
}
Row that doesn't work fine (no "fileset"):
"offset": 5405397,
"message": "...",
"source": "/var/log/nginx/access.log"
Any idea what could be the cause?

geoip.location is defined as an object in mapping [doc] but this name is already used for a field in other types

I'm getting this error:
Could not index event to Elasticsearch. {:status=>400,
:action=>["index", {:_id=>nil, :_index=>"nginx-access-2018-06-15",
:_type=>"doc", :_routing=>nil}, #],
:response=>{"index"=>{"_index"=>"nginx-access-2018-06-15",
"_type"=>"doc", "_id"=>"jo-rfGQBDK_ao1ZhmI8B", "status"=>400,
"error"=>{"type"=>"illegal_argument_exception",
"reason"=>"[geoip.location] is defined as an object in mapping [doc]
but this name is already used for a field in other types"}}}}
I don't understand why I'm getting the above error. This is loading into a brand new ES instance with no data, and this is the first record that is inserted. Here is the config:
input {
file {
type => "nginx-access"
start_position => "beginning"
path => [ "/var/log/nginx-archived/access.log.small"]
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
if [type] == "nginx-access" {
grok {
patterns_dir => "/etc/logstash/patterns"
match => { "message" => "%{NGINX_ACCESS}" }
remove_tag => ["_grokparsefailure"]
}
geoip {
source => "visitor_ip"
}
date {
# 11/Jun/2018:06:23:45 +0000
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "#request_time"
}
if "_grokparsefailure" not in [tags] {
ruby {
code => "
thetime = event.get('#request_time').time
event.set('index_date', 'nginx-access-' + thetime.strftime('%Y-%m-%d'))
"
}
}
if "_grokparsefailure" in [tags] {
ruby {
code => "
event.set('index_date', 'nginx-access-error')
"
}
}
}
}
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "%{index_date}"
template => "/etc/logstash/templates/nginx-access.json"
template_overwrite => true
manage_template => true
template_name => "nginx-access"
}
stdout { }
}
Here's a sample record:
{
"method" => "GET",
"@version" => "1",
"geoip" => {
"continent_code" => "AS",
"latitude" => 39.9289,
"country_name" => "China",
"ip" => "220.181.108.103",
"location" => {
"lon" => 116.3883,
"lat" => 39.9289
},
"region_code" => "11",
"region_name" => "Beijing",
"longitude" => 116.3883,
"timezone" => "Asia/Shanghai",
"city_name" => "Beijing",
"country_code2" => "CN",
"country_code3" => "CN"
},
"index_date" => "nginx-access-2018-06-15",
"ignore" => "\"-\"",
"bytes" => "2723",
"request" => "/wp-login.php",
"#request_time" => 2018-06-15T06:29:40.000Z,
"message" => "220.181.108.103 - - [15/Jun/2018:06:29:40 +0000] \"GET /wp-login.php HTTP/1.1\" 200 2723 \"-\" \"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)\"",
"path" => "/var/log/nginx-archived/access.log.small",
"@timestamp" => 2018-07-09T01:32:56.952Z,
"host" => "ab1526efddec",
"visitor_ip" => "220.181.108.103",
"timestamp" => "15/Jun/2018:06:29:40 +0000",
"response" => "200",
"referrer" => "\"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)\"",
"httpversion" => "1.1",
"type" => "nginx-access"
}
Figured out the answer, based on this:
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/removal-of-types.html#_schedule_for_removal_of_mapping_types
The basic problem is that within a single Elasticsearch index, a given field name must have the same data type everywhere, even if the records belong to different mapping types.
That is, if I have a person { "status": "A" } stored as text, I cannot also have a car { "status": 23 } stored as a number in the same index. Based on the info in the link above, I'm storing one "type" per index.
My output section for Logstash looks like this:
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "%{index_date}"
# Can test loading this with:
# curl -XPUT -H 'Content-Type: application/json' -d@/docker-elk/logstash/templates/nginx-access.json http://localhost:9200/_template/nginx-access
template => "/etc/logstash/templates/nginx-access.json"
template_overwrite => true
manage_template => true
template_name => "nginx-access"
}
stdout { }
}
My template looks like this:
{
"index_patterns": ["nginx-access*"],
"settings": {
},
"mappings": {
"doc": {
"_source": {
"enabled": true
},
"properties": {
"type" : { "type": "keyword" },
"response_time": { "type": "float" },
"geoip" : {
"properties" : {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
I'm also using the one type per index pattern described in the link above.
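The constraint described in this answer (one data type per field name within an index) can be illustrated with a toy model. This is a hedged sketch, not how Elasticsearch is implemented: the ToyIndex class and its method names are hypothetical, and Python type names stand in for Elasticsearch field types.

```python
class ToyIndex:
    """Toy model of an index that allows one data type per field name."""

    def __init__(self, name):
        self.name = name
        self.field_types = {}  # field name -> type name seen first

    def add(self, doc):
        """Register a document; reject it if a field changes type."""
        for field, value in doc.items():
            kind = type(value).__name__
            known = self.field_types.setdefault(field, kind)
            if known != kind:
                raise ValueError(
                    f"[{field}] is already mapped as {known} in index "
                    f"{self.name}, cannot index it as {kind}"
                )


# Person and car records sharing one index: the second one conflicts.
shared = ToyIndex("everything")
shared.add({"status": "A"})
try:
    shared.add({"status": 23})
    conflict = False
except ValueError:
    conflict = True

# One index per record kind, as in the answer: no conflict.
people = ToyIndex("people")
cars = ToyIndex("cars")
people.add({"status": "A"})
cars.add({"status": 23})
print(conflict)
```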

How to preprocess a document before indexation?

I'm using Logstash and Elasticsearch to collect tweets with the Twitter plugin. My problem is that I receive a document from Twitter and would like to do some preprocessing before indexing it. Let's say that I get this document from Twitter:
{
"tweet": {
"tweetId": 1025,
"tweetContent": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch",
"hashtags": ["stackOverflow", "elasticsearch"],
"publishedAt": "2017 23 August",
"analytics": {
"likeNumber": 400,
"shareNumber": 100
}
},
"author":{
"authorId": 819744,
"authorAt": "the_expert",
"authorName": "John Smith",
"description": "Haha it's a fake description"
}
}
Now out of this document that twitter is sending me I would like to generate two documents:
the first one will be indexed in twitter/tweet/1025 :
# The id for this document should be the one from tweetId `"tweetId": 1025`
{
"content": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch", # this field has been renamed
"hashtags": ["stackOverflow", "elasticsearch"],
"date": "2017/08/23", # the date has been formatted
"shareNumber": 100 # This field has been flattened
}
The second one will be indexed in twitter/author/819744:
# The id for this document should be the one from authorId `"authorId": 819744 `
{
"authorAt": "the_expert",
"description": "Haha it's a fake description"
}
I have defined my output as follows:
output {
stdout { codec => dots }
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
}
}
How can I process the information from twitter?
EDIT:
So my full config file should look like:
input {
twitter {
consumer_key => "consumer_key"
consumer_secret => "consumer_secret"
oauth_token => "access_token"
oauth_token_secret => "access_token_secret"
keywords => [ "random", "word"]
full_tweet => true
type => "tweet"
}
}
filter {
clone {
clones => ["author"]
}
if([type] == "tweet") {
mutate {
remove_field => ["authorId", "authorAt"]
}
} else {
mutate {
remove_field => ["tweetId", "tweetContent"]
}
}
}
output {
stdout { codec => dots }
if [type] == "tweet" {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
document_id => "%{[tweetId]}"
}
} else {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "author"
document_id => "%{[authorId]}"
}
}
}
You could use the clone filter plugin in Logstash.
Here is a sample Logstash configuration file that takes JSON input from stdin and simply shows the result on stdout:
input {
stdin {
codec => json
type => "tweet"
}
}
filter {
mutate {
add_field => {
"tweetId" => "%{[tweet][tweetId]}"
"content" => "%{[tweet][tweetContent]}"
"date" => "%{[tweet][publishedAt]}"
"shareNumber" => "%{[tweet][analytics][shareNumber]}"
"authorId" => "%{[author][authorId]}"
"authorAt" => "%{[author][authorAt]}"
"description" => "%{[author][description]}"
}
}
date {
match => ["date", "yyyy dd MMMM"]
target => "date"
}
ruby {
code => '
event.set("hashtags", event.get("[tweet][hashtags]"))
'
}
clone {
clones => ["author"]
}
mutate {
remove_field => ["author", "tweet", "message"]
}
if([type] == "tweet") {
mutate {
remove_field => ["authorId", "authorAt", "description"]
}
} else {
mutate {
remove_field => ["tweetId", "content", "hashtags", "date", "shareNumber"]
}
}
}
output {
stdout {
codec => rubydebug
}
}
Using as input:
{"tweet": { "tweetId": 1025, "tweetContent": "Hey this is a fake document", "hashtags": ["stackOverflow", "elasticsearch"], "publishedAt": "2017 23 August","analytics": { "likeNumber": 400, "shareNumber": 100 } }, "author":{ "authorId": 819744, "authorAt": "the_expert", "authorName": "John Smith", "description": "fake description" } }
You would get these two documents:
{
"date" => 2017-08-23T00:00:00.000Z,
"hashtags" => [
[0] "stackOverflow",
[1] "elasticsearch"
],
"type" => "tweet",
"tweetId" => "1025",
"content" => "Hey this is a fake document",
"shareNumber" => "100",
"@timestamp" => 2017-08-23T20:36:53.795Z,
"@version" => "1",
"host" => "my-host"
}
{
"description" => "fake description",
"type" => "author",
"authorId" => "819744",
"@timestamp" => 2017-08-23T20:36:53.795Z,
"authorAt" => "the_expert",
"@version" => "1",
"host" => "my-host"
}
You could alternatively use a ruby script to flatten the fields, and then use rename on mutate, when necessary.
If you want elasticsearch to use authorId and tweetId, instead of default ID, you could probably configure elasticsearch output with document_id.
output {
stdout { codec => dots }
if [type] == "tweet" {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
document_id => "%{[tweetId]}"
}
} else {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "author"
document_id => "%{[authorId]}"
}
}
}
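For readers who want to check the intended result outside Logstash, the following Python sketch performs the same split on the sample document from the question: rename, flatten, reformat the date, and produce one tweet document and one author document. It is only an illustration of the target shape. The function name and the "_id" key are hypothetical, and it assumes English month names as in the sample.

```python
from datetime import datetime


def split_tweet(doc):
    """Split the raw Twitter document into a tweet doc and an author doc."""
    tweet = doc["tweet"]
    author = doc["author"]
    # "2017 23 August" is year, day, full month name.
    published = datetime.strptime(tweet["publishedAt"], "%Y %d %B")
    tweet_doc = {
        "_id": tweet["tweetId"],                           # from tweetId
        "content": tweet["tweetContent"],                  # renamed field
        "hashtags": tweet["hashtags"],
        "date": published.strftime("%Y/%m/%d"),            # reformatted date
        "shareNumber": tweet["analytics"]["shareNumber"],  # flattened
    }
    author_doc = {
        "_id": author["authorId"],                         # from authorId
        "authorAt": author["authorAt"],
        "description": author["description"],
    }
    return tweet_doc, author_doc


raw = {
    "tweet": {
        "tweetId": 1025,
        "tweetContent": "Hey this is a fake document",
        "hashtags": ["stackOverflow", "elasticsearch"],
        "publishedAt": "2017 23 August",
        "analytics": {"likeNumber": 400, "shareNumber": 100},
    },
    "author": {
        "authorId": 819744,
        "authorAt": "the_expert",
        "authorName": "John Smith",
        "description": "fake description",
    },
}
tweet_doc, author_doc = split_tweet(raw)
print(tweet_doc)
print(author_doc)
```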

Show location points in a tile map with kibi

I'm using logstash 2.3.1, elasticsearch 2.3.1 and kibi 0.3.2. I have problems visualizing locations in a map with kibi.
I have the following configuration in logstash:
input {
file {
path => "/opt/logstash-2.3.1/logTest/Dades.csv"
type => "Dades"
start_position => "beginning"
}
}
filter {
csv {
columns => ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10", "c11", "c12", "c13", "c14", "c15", "c16", "c17", "c18", "c19", "c20", "c21", "c22", "c23"]
separator => ";"
}
ruby {
code => "
temp = event['c17']
event['c17'] = temp[0..1].to_f+ (temp[2..8].to_f/60)
temp = event['c19']
event['c19'] = temp[0..2].to_f+ (temp[3..8].to_f/60)
"
}
mutate {
convert => {
"c3" => "float"
"c5" => "float"
"c7" => "float"
"c9" => "float"
"c11" => "float"
"c13" => "float"
"c15" => "float"
"c21" => "float"
"c23" => "float"
}
}
date {
match => [ "c1", "dd/MM/YYYY HH:mm:ss.SSS", "ISO8601"]
target => "ts_date"
}
mutate {
rename => [ "c17", "[location][lat]",
"c19", "[location][lon]" ]
}
}
output {
elasticsearch {
hosts => localhost
index => "tram3"
manage_template => false
template => "tram3_template.json"
template_name => "tram3"
template_overwrite => "true"
}
stdout {
codec => rubydebug
}
}
The mapping configuration file (tram3_template.json) is like this:
{
"template": "tram3",
"order": 1,
"settings": {
"number_of_shards": 1
},
"mappings": {
"tram3": {
"_all": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
When I import the CSV file into Elasticsearch, everything seems to work. The output is something like this:
{
"message" => "26/02/2016 00:00:22.984;Total;4231.143555;Trac1;26.547932;Trac2;-338.939697;AA1;-364.611511;AA2;3968.135010;Reo1;0.000000;Reo2;0.000000;Latitud;4125.1846;Longitud;00213.5219;Speed;0.000000;CVS;3873.429443;\r",
"@version" => "1",
"@timestamp" => "2016-04-25T14:02:52.901Z",
"path" => "/opt/logstash-2.3.1/logTest/Dades.csv",
"host" => "ubuntu",
"type" => "Dades",
"c1" => "26/02/2016 00:00:22.984",
"c2" => "Total",
"c3" => 4231.143555,
"c4" => "Trac1",
"c5" => 26.547932,
"c6" => "Trac2",
"c7" => -338.939697,
"c8" => "AA1",
"c9" => -364.611511,
"c10" => "AA2",
"c11" => 3968.13501,
"c12" => "Reo1",
"c13" => 0.0,
"c14" => "Reo2",
"c15" => 0.0,
"c16" => "Latitud",
"c18" => "Longitud",
"c20" => "Speed",
"c21" => 0.0,
"c22" => "CVS",
"c23" => 3873.429443,
"column24" => nil,
"ts_date" => "2016-02-25T23:00:22.984Z",
"location" => {
"lat" => 41.41974333333334,
"lon" => 2.22535
}
}
But when I try to visualize the location field on a map, it doesn't show any results.
I don't know what I'm doing wrong. Why doesn't the location point appear on the map?
In your ES mapping file, you probably need to enable the storage of the geohash sub-field (defaults to false) as the geohash aggregation cannot work without it.
{
"template": "tram3",
"order": 1,
"settings": {
"number_of_shards": 1
},
"mappings": {
"tram3": {
"_all": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point",
"geohash": true, <-- add this
"geohash_prefix": true <-- add this
}
}
}
}
}
Then you can build a geohash aggregation on the location.geohash field
Note that if you want to also index all geohash prefixes, you can also add "geohash_prefix": true to your field mapping.
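To show what a geohash and its prefixes look like, here is a small Python sketch of the public geohash encoding algorithm (interleaved longitude/latitude bisection, base32 output). It says nothing about how Elasticsearch stores these values internally. The function names are hypothetical and the default precision of 5 is an arbitrary choice for the example.

```python
# Geohash base32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if value >= mid:
            bits |= 1
            rng[0] = mid  # keep the upper half
        else:
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_prefixes(geohash):
    """Return every prefix of a geohash, shortest first."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


example = geohash_encode(42.6, -5.6)
example_prefixes = geohash_prefixes(example)
# The point from the question's output, encoded the same way.
tram_hash = geohash_encode(41.41974333333334, 2.22535, precision=7)
print(example, example_prefixes, tram_hash)
```

Each shorter prefix names a larger cell containing the point, which is why indexing the prefixes lets a map aggregate at several zoom levels.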
UPDATE
After reproducing the case, here are some more fixes to do:
You need to change the type in your file input as it will be used as the document type and your mapping specifies that the mapping type is named dades2 not Dades:
file {
path => "/opt/logstash-2.3.1/logTest/Dades.csv"
type => "dades2"
start_position => "beginning"
sincedb_path => "/dev/null"
}
Your elasticsearch output should look like the one below: manage_template should be true, and template should use the full path to your dades2_template.json file (make sure to replace /full/path/to with the actual path).
elasticsearch {
hosts => localhost
index => "dades2"
manage_template => true
template => "/full/path/to/dades2_template.json"
template_name => "dades2"
template_overwrite => "true"
}
The new dades2_template.json file should look like this
{
"template": "dades2",
"order": 1,
"settings": {
"number_of_shards": 1
},
"mappings": {
"dades2": {
"_all": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point",
"geohash": true,
"geohash_prefix": true
}
}
}
}
}
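As a side note on the ruby filter in the question: it converts coordinates from a degrees-and-minutes form (ddmm.mmmm for latitude, dddmm.mmmm for longitude) to decimal degrees by slicing the string. The slice temp[3..8] appears to drop the last digit of the 10-character longitude 00213.5219, which is why the output shows 2.22535. Below is a minimal Python sketch of the same conversion that works on the numeric value instead of fixed string positions. The function name is hypothetical, and it assumes non-negative values with no hemisphere suffix, as in the sample row.

```python
def ddmm_to_decimal(raw):
    """Convert a (d)ddmm.mmmm coordinate string to decimal degrees."""
    value = float(raw)
    degrees = int(value // 100)      # everything above the minutes part
    minutes = value - degrees * 100  # mm.mmmm
    return degrees + minutes / 60


lat = ddmm_to_decimal("4125.1846")   # Latitud column from the sample row
lon = ddmm_to_decimal("00213.5219")  # Longitud column from the sample row
print(lat, lon)
```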

OS X: logstash works for a while and then stops with "Logstash shutdown completed" message

After I upgraded to Logstash 1.5.0, I started getting strange behavior from the program.
Whenever I run it with the following command:
$ logstash agent -f /usr/local/etc/logstash/conf.d/logstash.conf
It works for a while and then stops saying "Logstash shutdown completed".
Example:
.....
......
"@version" => "1",
"@timestamp" => "2015-06-20T21:04:09.087Z",
"type" => "SuricataIDPS",
"host" => "drew-sh.server",
"path" => "/var/log/suricata/eve.json",
"geoip" => {
"ip" => "209.52.144.104",
"country_code2" => "CA",
"country_code3" => "CAN",
"country_name" => "Canada",
"continent_code" => "NA",
"region_name" => "BC",
"city_name" => "Vancouver",
"latitude" => 49.25,
"longitude" => -123.13329999999999,
"timezone" => "America/Vancouver",
"real_region_name" => "British Columbia",
"location" => [
[0] -123.13329999999999,
[1] 49.25
],
"coordinates" => [
[0] -123.13329999999999,
[1] 49.25
]
}
}
Logstash shutdown completed
This happens even after a complete reinstallation:
$ brew rm logstash
$ brew install logstash
I still have the same issue.
Here is my /usr/local/etc/logstash/conf.d/logstash.conf:
input {
file {
path => ["/var/log/suricata/eve.json"]
sincedb_path => ["/var/lib/logstash/"]
codec => json
type => "SuricataIDPS"
start_position => "beginning"
}
}
filter {
if [type] == "SuricataIDPS" {
date {
match => [ "timestamp", "ISO8601" ]
}
ruby {
code => "if event['event_type'] == 'fileinfo'; event['fileinfo']['type']=event['fileinfo']['magic'].to_s.split(',')[0]; end;"
}
}
if [src_ip] {
geoip {
source => "src_ip"
target => "geoip"
#database => "/usr/local/opt/logstash/libexec/vendor/geoip/GeoLiteCity.dat"
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}
mutate {
convert => [ "[geoip][coordinates]", "float" ]
}
if ![geoip.ip] {
if [dest_ip] {
geoip {
source => "dest_ip"
target => "geoip"
#database => "/usr/local/opt/logstash/libexec/vendor/geoip/GeoLiteCity.dat"
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}
mutate {
convert => [ "[geoip][coordinates]", "float" ]
}
}
}
}
}
output {
elasticsearch {
host => localhost
protocol => http
}
stdout {
codec => rubydebug
}
}
Why? What am I doing wrong?
Never mind - I've updated logstash and now it works fine
