Grok configuration ELK - elasticsearch

I have an original type of log to parse. The syntax is :
2013-01-05 03:29:38,842 INFO [ajp-bio-8009-exec-69] web.CustomAuthenticationSuccessHandler - doLogin : admin.ebusiness date : 2013-01-05 03:29:38
When I use the grok pattern :
if [type] in ["edai"] {
grok {
match => { "message" => ["%{YEAR:year}-%{WORD:month}-%{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second},%{DATA:millis} %{NOTSPACE:loglevel} {0,1}%{GREEDYDATA:message}"] }
overwrite => [ "message" ]
}
}
The pattern work as you can see, but when I go into Kibana, the log stay in one block in the "message" section like this:
2013-01-05 23:27:47,030 INFO [ajp-bio-8009-exec-63] web.CustomAuthenticationSuccessHandler - doLogin : admin.ebusiness date : 2013-01-05 23:27:47
I would prefer to have it like this:
{ "year": [["2013"]], "month": [["01"]], "day": [["05"]], "hour": [["04"]], "minute": [["04"]], "second": [["39"]], "millis": [["398"] ], "loglevel": [ ["INFO"]] }
Can you help me to parse it correctly please?

Just tested this configuration. I kinda copied everything from your question.
input {
stdin { type => "edai" }
}
filter {
if [type] == "edai" {
grok {
match => { "message" => ["%{YEAR:year}-%{WORD:month}-%{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second},%{DATA:millis} %{NOTSPACE:loglevel} {0,1}%{GREEDYDATA:message}"] }
overwrite => [ "message" ]
}
}
}
output {
stdout { codec => rubydebug }
}
This is the output:
{
"year" => "2013",
"message" => " [ajp-bio-8009-exec-69] web.CustomAuthenticationSuccessHandler - doLogin : admin.ebusiness date : 2013-01-05 03:29:38\r",
"type" => "edai",
"minute" => "29",
"second" => "38",
"#timestamp" => 2017-06-29T08:19:08.605Z,
"month" => "01",
"hour" => "03",
"loglevel" => "INFO",
"#version" => "1",
"host" => "host_name",
"millis" => "842",
"day" => "05"
}
Everything seems fine from my perspective.
I had issue when I compared type the way you did:
if [type] in ["eday"]
It did not work and I've replaced it with direct comparison:
if [type] == "edai"
Also this worked too:
if [type] in "edai"
And that solved the issue.

Related

MULTIPLE IF ELSE CONDITION IN LOGSTASH WITH AND OPERATOR

if i use this logic in logstash it works
if "a" in [msg] or "b" in [msg]
but what i need to use is and conditioning. if i replace or with and then it would fail. Is there any idea?
This will fail
if "a" in [msg] and "b" in [msg]
What i want to do is whenever selected string a and b is there and use the filter as defined, Any help is highly appreciated
This works for me.
filter {
grok {
match => [ "message", "%{GREEDYDATA:my_data}" ]
tag_on_failure => [ "_failure", "_grokparsefailure" ]
}
if "sandeep" in [my_data] and "kanabar" in [my_data]{
mutate {
add_field => { "status" => "Both name and surname present"}
}
}
else if "sandeep" in [my_data] or "kanabar" in [my_data]{
mutate {
add_field => { "status" => "either name/surname present"}
}
}
}
Output of test run:
Input --> name:"sandeep test"
Output:
{
"#timestamp" => 2019-10-31T11:27:33.941Z,
"my_data" => "name:\"sandeep test\"",
"#version" => "1",
"host" => "M22959216G3QD",
"message" => "name:\"sandeep test\"",
"status" => "either name/surname present"
}
Input --> :"test kanabar"
Output:
{
"#timestamp" => 2019-10-31T11:27:43.389Z,
"my_data" => "name:\"test kanabar\"",
"#version" => "1",
"host" => "my_host",
"message" => "name:\"test kanabar\"",
"status" => "either name/surname present"
}
Input --> :"sandeep kanabar"
Output:
{
"#timestamp" => 2019-10-31T11:27:50.516Z,
"my_data" => "name:\"sandeep kanabar\"",
"#version" => "1",
"host" => "M22959216G3QD",
"message" => "name:\"sandeep kanabar\"",
"status" => "Both name and surname present"
}

Logstash nginx filter doesn't apply to half of rows

Using filebeat to push nginx logs to logstash and then to elasticsearch.
Logstash filter:
filter {
if [fileset][module] == "nginx" {
if [fileset][name] == "access" {
grok {
match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
remove_field => "message"
}
mutate {
add_field => { "read_timestamp" => "%{#timestamp}" }
}
date {
match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
remove_field => "[nginx][access][time]"
}
useragent {
source => "[nginx][access][agent]"
target => "[nginx][access][user_agent]"
remove_field => "[nginx][access][agent]"
}
geoip {
source => "[nginx][access][remote_ip]"
target => "[nginx][access][geoip]"
}
}
else if [fileset][name] == "error" {
grok {
match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
remove_field => "message"
}
mutate {
rename => { "#timestamp" => "read_timestamp" }
}
date {
match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
remove_field => "[nginx][error][time]"
}
}
}
}
There is just one file /var/log/nginx/access.log.
In kibana, I see ± half of the rows with parsed message and other half - not.
All of the rows in kibana have a tag "beats_input_codec_plain_applied".
Examples from filebeat -e
Row that works fine:
"source": "/var/log/nginx/access.log",
"offset": 5405195,
"message": "...",
"fileset": {
"module": "nginx",
"name": "access"
}
Row that doesn't work fine (no "fileset"):
"offset": 5405397,
"message": "...",
"source": "/var/log/nginx/access.log"
Any idea what could be the cause?

geoip.location is defined as an object in mapping [doc] but this name is already used for a field in other types

I'm getting this error:
Could not index event to Elasticsearch. {:status=>400,
:action=>["index", {:_id=>nil, :_index=>"nginx-access-2018-06-15",
:_type=>"doc", :_routing=>nil}, #],
:response=>{"index"=>{"_index"=>"nginx-access-2018-06-15",
"_type"=>"doc", "_id"=>"jo-rfGQBDK_ao1ZhmI8B", "status"=>400,
"error"=>{"type"=>"illegal_argument_exception",
"reason"=>"[geoip.location] is defined as an object in mapping [doc]
but this name is already used for a field in other types"}}}}
I'm getting the above error but don't understand why, this is loading into a brand new ES instance with no data. This is the first record that is inserted. Why am I getting this error? Here is the config:
input {
file {
type => "nginx-access"
start_position => "beginning"
path => [ "/var/log/nginx-archived/access.log.small"]
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
if [type] == "nginx-access" {
grok {
patterns_dir => "/etc/logstash/patterns"
match => { "message" => "%{NGINX_ACCESS}" }
remove_tag => ["_grokparsefailure"]
}
geoip {
source => "visitor_ip"
}
date {
# 11/Jun/2018:06:23:45 +0000
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "#request_time"
}
if "_grokparsefailure" not in [tags] {
ruby {
code => "
thetime = event.get('#request_time').time
event.set('index_date', 'nginx-access-' + thetime.strftime('%Y-%m-%d'))
"
}
}
if "_grokparsefailure" in [tags] {
ruby {
code => "
event.set('index_date', 'nginx-access-error')
"
}
}
}
}
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "%{index_date}"
template => "/etc/logstash/templates/nginx-access.json"
template_overwrite => true
manage_template => true
template_name => "nginx-access"
}
stdout { }
}
Here's a sample record:
{
"method" => "GET",
"#version" => "1",
"geoip" => {
"continent_code" => "AS",
"latitude" => 39.9289,
"country_name" => "China",
"ip" => "220.181.108.103",
"location" => {
"lon" => 116.3883,
"lat" => 39.9289
},
"region_code" => "11",
"region_name" => "Beijing",
"longitude" => 116.3883,
"timezone" => "Asia/Shanghai",
"city_name" => "Beijing",
"country_code2" => "CN",
"country_code3" => "CN"
},
"index_date" => "nginx-access-2018-06-15",
"ignore" => "\"-\"",
"bytes" => "2723",
"request" => "/wp-login.php",
"#request_time" => 2018-06-15T06:29:40.000Z,
"message" => "220.181.108.103 - - [15/Jun/2018:06:29:40 +0000] \"GET /wp-login.php HTTP/1.1\" 200 2723 \"-\" \"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)\"",
"path" => "/var/log/nginx-archived/access.log.small",
"#timestamp" => 2018-07-09T01:32:56.952Z,
"host" => "ab1526efddec",
"visitor_ip" => "220.181.108.103",
"timestamp" => "15/Jun/2018:06:29:40 +0000",
"response" => "200",
"referrer" => "\"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)\"",
"httpversion" => "1.1",
"type" => "nginx-access"
}
Figured out the answer, based on this:
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/removal-of-types.html#_schedule_for_removal_of_mapping_types
The basic problem is that for each Elasticsearch index, each field must be the same type, even if the records are different types.
That is, if I have a person { "status": "A" } stored as text I cannot have a record for a car { "status": 23 } stored as a number in the same index. Based on the info in the link above, I'm storing one "type" per index.
My output section for Logstash looks like this:
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "%{index_date}"
# Can test loading this with:
# curl -XPUT -H 'Content-Type: application/json' -d#/docker-elk/logstash/templates/nginx-access.json http://localhost:9200/_template/nginx-access
template => "/etc/logstash/templates/nginx-access.json"
template_overwrite => true
manage_template => true
template_name => "nginx-access"
}
stdout { }
}
My template looks like this:
{
"index_patterns": ["nginx-access*"],
"settings": {
},
"mappings": {
"doc": {
"_source": {
"enabled": true
},
"properties": {
"type" : { "type": "keyword" },
"response_time": { "type": "float" },
"geoip" : {
"properties" : {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
I'm also using the one type per index pattern described in the link above.

How to preprocess a document before indexation?

I'm using logstash and elasticsearch to collect tweet using the Twitter plug in. My problem is that I receive a document from twitter and I would like to make some preprocessing before indexing my document. Let's say that I have this as a document result from twitter:
{
"tweet": {
"tweetId": 1025,
"tweetContent": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch",
"hashtags": ["stackOverflow", "elasticsearch"],
"publishedAt": "2017 23 August",
"analytics": {
"likeNumber": 400,
"shareNumber": 100,
}
},
"author":{
"authorId": 819744,
"authorAt": "the_expert",
"authorName": "John Smith",
"description": "Haha it's a fake description"
}
}
Now out of this document that twitter is sending me I would like to generate two documents:
the first one will be indexed in twitter/tweet/1025 :
# The id for this document should be the one from tweetId `"tweetId": 1025`
{
"content": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch", # this field has been renamed
"hashtags": ["stackOverflow", "elasticsearch"],
"date": "2017/08/23", # the date has been formated
"shareNumber": 100 # This field has been flattened
}
The second one will be indexed in twitter/author/819744:
# The id for this document should be the one from authorId `"authorId": 819744 `
{
"authorAt": "the_expert",
"description": "Haha it's a fake description"
}
I have defined my output as follow:
output {
stdout { codec => dots }
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
}
}
How can I process the information from twitter?
EDIT:
So my full config file should look like:
input {
twitter {
consumer_key => "consumer_key"
consumer_secret => "consumer_secret"
oauth_token => "access_token"
oauth_token_secret => "access_token_secret"
keywords => [ "random", "word"]
full_tweet => true
type => "tweet"
}
}
filter {
clone {
clones => ["author"]
}
if([type] == "tweet") {
mutate {
remove_field => ["authorId", "authorAt"]
}
} else {
mutate {
remove_field => ["tweetId", "tweetContent"]
}
}
}
output {
stdout { codec => dots }
if [type] == "tweet" {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
document_id => "%{[tweetId]}"
}
} else {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "author"
document_id => "%{[authorId]}"
}
}
}
You could use the clone filter plugin on logstash.
With a sample logstash configuration file that takes a JSON input from stdin and simply shows the output on stdout:
input {
stdin {
codec => json
type => "tweet"
}
}
filter {
mutate {
add_field => {
"tweetId" => "%{[tweet][tweetId]}"
"content" => "%{[tweet][tweetContent]}"
"date" => "%{[tweet][publishedAt]}"
"shareNumber" => "%{[tweet][analytics][shareNumber]}"
"authorId" => "%{[author][authorId]}"
"authorAt" => "%{[author][authorAt]}"
"description" => "%{[author][description]}"
}
}
date {
match => ["date", "yyyy dd MMMM"]
target => "date"
}
ruby {
code => '
event.set("hashtags", event.get("[tweet][hashtags]"))
'
}
clone {
clones => ["author"]
}
mutate {
remove_field => ["author", "tweet", "message"]
}
if([type] == "tweet") {
mutate {
remove_field => ["authorId", "authorAt", "description"]
}
} else {
mutate {
remove_field => ["tweetId", "content", "hashtags", "date", "shareNumber"]
}
}
}
output {
stdout {
codec => rubydebug
}
}
Using as input:
{"tweet": { "tweetId": 1025, "tweetContent": "Hey this is a fake document", "hashtags": ["stackOverflow", "elasticsearch"], "publishedAt": "2017 23 August","analytics": { "likeNumber": 400, "shareNumber": 100 } }, "author":{ "authorId": 819744, "authorAt": "the_expert", "authorName": "John Smith", "description": "fake description" } }
You would get these two documents:
{
"date" => 2017-08-23T00:00:00.000Z,
"hashtags" => [
[0] "stackOverflow",
[1] "elasticsearch"
],
"type" => "tweet",
"tweetId" => "1025",
"content" => "Hey this is a fake document",
"shareNumber" => "100",
"#timestamp" => 2017-08-23T20:36:53.795Z,
"#version" => "1",
"host" => "my-host"
}
{
"description" => "fake description",
"type" => "author",
"authorId" => "819744",
"#timestamp" => 2017-08-23T20:36:53.795Z,
"authorAt" => "the_expert",
"#version" => "1",
"host" => "my-host"
}
You could alternatively use a ruby script to flatten the fields, and then use rename on mutate, when necessary.
If you want elasticsearch to use authorId and tweetId, instead of default ID, you could probably configure elasticsearch output with document_id.
output {
stdout { codec => dots }
if [type] == "tweet" {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
document_id => "%{[tweetId]}"
}
} else {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
document_id => "%{[authorId]}"
}
}
}

input json to logstash - config issues?

i have the following json input that i want to dump to logstash (and eventually search/dashboard in elasticsearch/kibana).
{"vulnerabilities":[
{"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},
{"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},
{"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}
]}
i'm using the following logstash configuration
input {
file {
path => "/tmp/logdump/*"
type => "assets"
codec => "json"
}
}
output {
stdout { codec => rubydebug }
elasticsearch { host => localhost }
}
output
{
"message" => "{\"vulnerabilities\":[\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.788Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.30\",\"dns\":\"z.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.838Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.31\",\"dns\":\"y.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.870Z",
"type" => "shellshock",
"host" => "av1261wag2sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"ip" => "10.1.1.32",
"dns" => "x.acme.com",
"vid" => "12345",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.884Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
obviously logstash is treating each line as an event and it thinks {"vulnerabilities":[ is an event and i'm guessing the trailing commas on the 2 subsequent nodes mess up the parsing, and the last node appears coorrect. how do i tell logstash to parse the events inside the vulnerabilities array and to ignore the commas at the end of the line?
Updated: 2014-11-05
Following Magnus' recommendations, I added the json filter and it's working perfectly. However, it would not parse the last line of the json correctly without specifying start_position => "beginning" in the file input block. Any ideas why not? I know it parses bottom up by default but would anticipate the mutate/gsub would handle this smoothly?
file {
path => "/tmp/logdump/*"
type => "assets"
start_position => "beginning"
}
}
filter {
if [message] =~ /^\[?{"ip":/ {
mutate {
gsub => [
"message", "^\[{", "{",
"message", "},?\]?$", "}"
]
}
json {
source => "message"
remove_field => ["message"]
}
}
}
output {
stdout { codec => rubydebug }
elasticsearch { host => localhost }
}
You could skip the json codec and use a multiline filter to join the message into a single string that you can feed to the json filter.filter {
filter {
multiline {
pattern => '^{"vulnerabilities":\['
negate => true
what => "previous"
}
json {
source => "message"
}
}
However, this produces the following unwanted results:
{
"message" => "<omitted for brevity>",
"#version" => "1",
"#timestamp" => "2014-10-31T06:48:15.589Z",
"host" => "name-of-your-host",
"tags" => [
[0] "multiline"
],
"vulnerabilities" => [
[0] {
"ip" => "10.1.1.1",
"dns" => "z.acme.com",
"vid" => "12345"
},
[1] {
"ip" => "10.1.1.2",
"dns" => "y.acme.com",
"vid" => "12345"
},
[2] {
"ip" => "10.1.1.3",
"dns" => "x.acme.com",
"vid" => "12345"
}
]
}
Unless there's a fixed number of elements in the vulnerabilities array I don't think there's much we can do with this (without resorting to the ruby filter).
How about just applying the json filter to lines that look like what we want and drop the rest? Your question doesn't make it clear whether all of the log looks like this so this may not be so useful.
filter {
if [message] =~ /^\s+{"ip":/ {
# Remove trailing commas
mutate {
gsub => ["message", ",$", ""]
}
json {
source => "message"
remove_field => ["message"]
}
} else {
drop {}
}
}

Resources