Logstash - parses only one JSON event - elasticsearch

I am using ELK 5.3.0. I am trying to parse a simple JSON document. It does create the key/value pairs, but it writes only one event to Elasticsearch. And it does that randomly: sometimes it is the first event, sometimes the second or third, but it is always only one event.
File setup (created on a Mac, one line per JSON object), three events:
{"timestamp":"2012-01-01 02:00:01", "severity":"ERROR",
"messages":"Foo failed", "fieldone": "I am first entry... if the value
of a field one", "fieldtwo": "ttthis if the value of a field two"}
{"timestamp":"2013-01-01 02:04:02", "severity":"INFO", "messages":"Bar
was successful", "fieldone": "I am second entry... if the value of a
field one", "fieldtwo": "this if the value of a field two"}
{"timestamp":"2017-01-01 02:10:12", "severity":"DEBUG",
"messages":"Baz was notified", "fieldone": "I am third entry... if the
value of a field one", "fieldtwo": "this if the value of a field two"}
Filebeat setup:
- input_type: log
  paths: Downloads/elk/small/jsontest.log
  document_type: jsonindex
Logstash setup:
filter {
  if [@metadata][type] == "jsonindex" {
    json {
      source => "message"
    }
  }
}
Logstash output (shows three events):
{
"severity" => "DEBUG",
"offset" => 544,
"#uuid" => "a316bb67-98e5-4551-8243-f8538023cfd9",
"input_type" => "log",
"source" => "/Users/xxx/Downloads/elk/small/jsontest.log",
"fieldone" => "this if the value of a field one",
"type" => "jsonindex",
"tags" => [
[0] "beats_input_codec_json_applied",
[1] "_dateparsefailure"
],
"fieldtwo" => "this if the value of a field two",
"#timestamp" => 2017-05-08T11:25:41.586Z,
"#version" => "1",
"beat" => {
"hostname" => "C700893",
"name" => "C700893",
"version" => "5.3.0"
},
"host" => "C700893",
"fingerprint" => "bcb57f445084cc0e474366bf892f6b4ab9162a4e",
"messages" => "Baz was notified",
"timestamp" => "2017-01-01 02:10:12"
}
{
"severity" => "INFO",
"offset" => 361,
"#uuid" => "6d4b4401-a440-4894-b0de-84c97fc4eaf5",
"input_type" => "log",
"source" => "/Users/xxx/Downloads/elk/small/jsontest.log",
"fieldone" => "this if the value of a field one",
"type" => "jsonindex",
"tags" => [
[0] "beats_input_codec_json_applied",
[1] "_dateparsefailure"
],
"fieldtwo" => "this if the value of a field two",
"#timestamp" => 2017-05-08T11:25:41.586Z,
"#version" => "1",
"beat" => {
"hostname" => "C700893",
"name" => "C700893",
"version" => "5.3.0"
},
"host" => "C700893",
"fingerprint" => "bcb57f445084cc0e474366bf892f6b4ab9162a4e",
"messages" => "Bar was successful",
"timestamp" => "2013-01-01 02:04:02"
}
{
"severity" => "ERROR",
"offset" => 177,
"#uuid" => "d9bd0a0b-0021-48fd-8d9e-d6f82cd1e506",
"input_type" => "log",
"source" => "/Users/xxx/Downloads/elk/small/jsontest.log",
"fieldone" => "this if the value of a field one",
"type" => "jsonindex",
"tags" => [
[0] "beats_input_codec_json_applied",
[1] "_dateparsefailure"
],
"fieldtwo" => "this if the value of a field two",
"#timestamp" => 2017-05-08T11:25:41.586Z,
"#version" => "1",
"beat" => {
"hostname" => "C700893",
"name" => "C700893",
"version" => "5.3.0"
},
"host" => "C700893",
"fingerprint" => "bcb57f445084cc0e474366bf892f6b4ab9162a4e",
"messages" => "Foo failed",
"timestamp" => "2012-01-01 02:00:01"
}
Elasticsearch (document viewed as JSON):
"tags": [
"beats_input_codec_json_applied",
"_dateparsefailure"
],
There is no JSON failure. _dateparsefailure is expected.
What is going on here?
EDIT (Solution):
After some time, I figured out I was shooting myself in the foot. Since I am parsing many different logs and log types, I need to make sure I do not have duplicates, thus in my Logstash output section I have this piece of code to ensure there are no duplicate log entries:
uuid {
  target => "@uuid"
  overwrite => true
}
fingerprint {
  source => ["message"]
  target => "fingerprint"
  key => "78787878"
  method => "SHA1"
  concatenate_sources => true
}
And also in the same section I call Elasticsearch like this:
if [@metadata][type] == "jsonindex" {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "%{[@metadata][type]}"
    document_id => "%{fingerprint}"
  }
}
Since my JSON objects do not contain a message property, the fingerprint is always the same, so every event ends up with the same document_id and the documents overwrite one another:
fingerprint {
source => ["message"]
...
A small edit to the index creation (removing the document_id) fixed the problem:
if [@metadata][type] == "jsonindex" {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "%{[@metadata][type]}"
  }
}
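For completeness: if deduplication is still wanted for this index, the fingerprint could presumably be computed from fields that actually exist after the Beats JSON codec has been applied, rather than from message. A sketch along those lines (untested):
fingerprint {
  # hash the parsed JSON fields instead of the missing "message" field,
  # so each event gets a distinct fingerprint / document_id
  source => ["timestamp", "severity", "messages", "fieldone", "fieldtwo"]
  target => "fingerprint"
  key => "78787878"
  method => "SHA1"
  concatenate_sources => true
}
With a per-event fingerprint like this, the document_id => "%{fingerprint}" setting could stay in the Elasticsearch output without events overwriting each other.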

Your data needs to be separated by lines. The filter then needs to parse each line as a different event, and it will become 3 events.
For example:
{"timestamp":"2012-01-01 02:00:01", "severity":"ERROR", "messages":"Foo failed", "fieldone": "I am first entry... if the value of a field one", "fieldtwo": "ttthis if the value of a field two"}
{"timestamp":"2013-01-01 02:04:02", "severity":"INFO", "messages":"Bar was successful", "fieldone": "I am second entry... if the value of a field one", "fieldtwo": "this if the value of a field two"}
{"timestamp":"2017-01-01 02:10:12", "severity":"DEBUG", "messages":"Baz was notified", "fieldone": "I am third entry... if the value of a field one", "fieldtwo": "this if the value of a field two"}
Since your data is on one line, the result is that only the last object is parsed, which is why the timestamp is the last one:
"timestamp":"2017-01-01 02:10:12"
If you can change the data to one object per line, that is best; I think maybe you cannot, so you could use this:
- input_type: log
  paths: Downloads/elk/small/jsontest.log
  document_type: jsonindex
  multiline.pattern: '^{"timestamp":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}", '
  multiline.negate: true
  multiline.match: after
Add the multiline settings to change this, but I am afraid your data is not separated by lines.

Related

Multiple if-else condition in Logstash with AND operator

If I use this logic in Logstash, it works:
if "a" in [msg] or "b" in [msg]
But what I need is AND conditioning: if I replace or with and, it fails. Any ideas?
This will fail:
if "a" in [msg] and "b" in [msg]
What I want is to apply the filter as defined whenever both string a and string b are present. Any help is highly appreciated.
This works for me.
filter {
  grok {
    match => [ "message", "%{GREEDYDATA:my_data}" ]
    tag_on_failure => [ "_failure", "_grokparsefailure" ]
  }
  if "sandeep" in [my_data] and "kanabar" in [my_data] {
    mutate {
      add_field => { "status" => "Both name and surname present" }
    }
  }
  else if "sandeep" in [my_data] or "kanabar" in [my_data] {
    mutate {
      add_field => { "status" => "either name/surname present" }
    }
  }
}
Output of test run:
Input --> name:"sandeep test"
Output:
{
"#timestamp" => 2019-10-31T11:27:33.941Z,
"my_data" => "name:\"sandeep test\"",
"#version" => "1",
"host" => "M22959216G3QD",
"message" => "name:\"sandeep test\"",
"status" => "either name/surname present"
}
Input --> :"test kanabar"
Output:
{
"#timestamp" => 2019-10-31T11:27:43.389Z,
"my_data" => "name:\"test kanabar\"",
"#version" => "1",
"host" => "my_host",
"message" => "name:\"test kanabar\"",
"status" => "either name/surname present"
}
Input --> :"sandeep kanabar"
Output:
{
"#timestamp" => 2019-10-31T11:27:50.516Z,
"my_data" => "name:\"sandeep kanabar\"",
"#version" => "1",
"host" => "M22959216G3QD",
"message" => "name:\"sandeep kanabar\"",
"status" => "Both name and surname present"
}

logstash : create fingerprint from timestamp part

I am having a problem creating a fingerprint based on client IP and a timestamp containing date + hour.
I'm using Logstash 7.3.1. Here is the relevant part of my configuration file:
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  ...
  ruby {
    code => "
      keydate = Date.parse(event.get('timestamp'))
      event.set('keydate', keydate.strftime('%Y%m%d-%H'))
    "
  }
  fingerprint {
    key => "my_custom_secret"
    method => "SHA256"
    concatenate_sources => "true"
    source => [
      "clientip",
      "keydate"
    ]
  }
}
The problem is in the 'ruby' block. I tried multiple methods to compute keydate, but none of them works without giving me errors.
The last one (using the config above) is:
[ERROR][logstash.filters.ruby ] Ruby exception occurred: Missing Converter handling for full class name=org.jruby.ext.date.RubyDateTime, simple name=RubyDateTime
input document
{
"timestamp" => "19/Sep/2019:00:07:56 +0200",
"referrer" => "-",
"#version" => "1",
"#timestamp" => 2019-09-18T22:07:56.000Z,
...
"request" => "index.php",
"type" => "apache_access",
"clientip" => "54.157.XXX.XXX",
"verb" => "GET",
...
"tags" => [
[0] "_rubyexception" # generated by the ruby exception above
],
"response" => "200"
}
expected output
{
"timestamp" => "19/Sep/2019:00:07:56 +0200",
"referrer" => "-",
"#version" => "1",
"#timestamp" => 2019-09-18T22:07:56.000Z,
...
"request" => "index.php",
"type" => "apache_access",
"clientip" => "54.157.XXX.XXX",
"verb" => "GET",
...
"keydate" => "20190919-00", #format : YYYYMMDD-HH
"fingerprint" => "ab347766ef....1190af",
"response" => "200"
}
As always, many thanks for all your help!
I advise removing the ruby snippet and using the built-in date filter: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
What you are doing in the ruby snippet is exactly what the date filter does: extract a timestamp from a field and reconstruct it into your desired format.
Another option (a bit less recommended, but it will also work) is to use grok to extract the relevant parts of the timestamp and combine them in a different manner.
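For that second option, a rough sketch (untested, building on the grok/date filters already in the question) could pull the date and hour straight out of the Apache timestamp string and glue them together with mutate. Note that the month stays textual here ("Sep" rather than "09"), so the key is not strictly YYYYMMDD-HH, but it is still a stable per-hour value, which is all the fingerprint needs:
filter {
  # "timestamp" and "clientip" come from the COMBINEDAPACHELOG grok above
  grok {
    match => { "timestamp" => "^%{MONTHDAY:day}/%{MONTH:month}/%{YEAR:year}:%{HOUR:hour}" }
  }
  mutate {
    # e.g. "2019Sep19-00" for "19/Sep/2019:00:07:56 +0200"
    add_field => { "keydate" => "%{year}%{month}%{day}-%{hour}" }
  }
  fingerprint {
    key => "my_custom_secret"
    method => "SHA256"
    concatenate_sources => true
    source => [ "clientip", "keydate" ]
  }
}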

How to write grok filter in logstash to accept variable arguments

How do I write a grok filter rule if the message contains transactions with a variable number of arguments?
For example:
22-Jun-2015 04:45:56 Transaction for Bill 123 item1=100 item2=200 item3=300
22-Jun-2015 05:45:23 Transaction for Bill 124 item1=200
22-Jun-2015 06:23:36 Transaction for Bill 125 item4=400 item2=200 item1=100 item5=500
We can match the date, time, and bill # in the above case, but how do we handle the variable item arguments here?
Finally I was able to do that using the kv{} filter of Logstash.
For example:
item1=100&item2=200&item3=300
item1=100&item2=200&item3=300&item4=400
I created two messages and then I got the below output:
{
"message" => "item1=100&item2=200&item3=300",
"#version" => "1",
"#timestamp" => "2015-07-04T19:20:15.831Z",
"host" => "viswesn-PC",
"item1" => "100",
"item2" => "200",
"item3" => "300",
"tags" => [
[0] "true"
]
}
{
"message" => "item1=100&item2=200&item3=300&item4=400",
"#version" => "1",
"#timestamp" => "2015-07-04T19:20:25.866Z",
"host" => "viswesn-PC",
"item1" => "100",
"item2" => "200",
"item3" => "300",
"item4" => "400",
"tags" => [
[0] "true"
]
}

Logstash - how do I split an array using the split filter without a target?

I'm trying to split a JSON array into multiple events. Here's a sample input:
{"results" : [{"id": "a1", "name": "hello"}, {"id": "a2", "name": "logstash"}]}
Here's my filter and output config:
filter {
  split {
    field => "results"
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}
This produces 2 events, one for each of the JSONs in the array. And it's close to what I'm looking for:
{
"results" => {
"id" => "a1",
"name" => "hello"
},
"#version" => "1",
"#timestamp" => "2015-05-30T18:33:21.527Z",
"host" => "laptop",
}
{
"results" => {
"id" => "a2",
"name" => "logstash"
},
"#version" => "1",
"#timestamp" => "2015-05-30T18:33:21.527Z",
"host" => "laptop",
}
The problem is the nested "results" part, "results" being the default value for the target parameter.
Is there a way to use the split filter without producing the nested JSON, and get something like this:
{
"id" => "a1",
"name" => "hello"
"#version" => "1",
"#timestamp" => "2015-05-30T18:33:21.527Z",
"host" => "laptop",
}
{
"id" => "a2",
"name" => "logstash"
"#version" => "1",
"#timestamp" => "2015-05-30T18:33:21.527Z",
"host" => "laptop",
}
The purpose is to feed this to the Elasticsearch output, with each event being a document with document_id => "id". Any good solutions are welcome!
If you know what all of the fields will be (as it appears you do), you can simply rename the fields:
mutate {
  rename => [
    "[results][id]", "id",
    "[results][name]", "name"
  ]
  remove_field => "results"
}
If you didn't know what all of the fields were, you could write a ruby code filter that does an event['results'].each... and creates new fields from the sub-fields of results.
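A sketch of that ruby approach (untested, written against the newer event get/set API rather than the event['results'] style mentioned above): after the split filter, results holds a single hash per event, so its keys can simply be copied to the top level.
filter {
  split {
    field => "results"
  }
  ruby {
    code => "
      # promote every sub-field of 'results' to a top-level field,
      # then drop the now-redundant 'results' hash
      event.get('results').each { |k, v| event.set(k, v) }
      event.remove('results')
    "
  }
}
Either way, the flattened id can then be referenced in the Elasticsearch output as document_id => "%{id}".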

input json to logstash - config issues?

I have the following JSON input that I want to dump to Logstash (and eventually search/dashboard in Elasticsearch/Kibana).
{"vulnerabilities":[
{"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},
{"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},
{"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}
]}
I'm using the following Logstash configuration:
input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
Output:
{
"message" => "{\"vulnerabilities\":[\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.788Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.30\",\"dns\":\"z.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.838Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.31\",\"dns\":\"y.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.870Z",
"type" => "shellshock",
"host" => "av1261wag2sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"ip" => "10.1.1.32",
"dns" => "x.acme.com",
"vid" => "12345",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.884Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
Obviously Logstash is treating each line as an event, and it thinks {"vulnerabilities":[ is an event. I'm guessing the trailing commas on the two subsequent nodes mess up the parsing, while the last node appears correct. How do I tell Logstash to parse the events inside the vulnerabilities array and to ignore the commas at the end of the line?
Updated: 2014-11-05
Following Magnus' recommendations, I added the json filter and it's working perfectly. However, it would not parse the last line of the JSON correctly without specifying start_position => "beginning" in the file input block. Any ideas why not? I know it reads from the end by default, but I would have anticipated that the mutate/gsub would handle this smoothly.
input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^\[?{"ip":/ {
    mutate {
      gsub => [
        "message", "^\[{", "{",
        "message", "},?\]?$", "}"
      ]
    }
    json {
      source => "message"
      remove_field => ["message"]
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
You could skip the json codec and use a multiline filter to join the message into a single string that you can feed to the json filter.
filter {
  multiline {
    pattern => '^{"vulnerabilities":\['
    negate => true
    what => "previous"
  }
  json {
    source => "message"
  }
}
However, this produces the following unwanted results:
{
"message" => "<omitted for brevity>",
"#version" => "1",
"#timestamp" => "2014-10-31T06:48:15.589Z",
"host" => "name-of-your-host",
"tags" => [
[0] "multiline"
],
"vulnerabilities" => [
[0] {
"ip" => "10.1.1.1",
"dns" => "z.acme.com",
"vid" => "12345"
},
[1] {
"ip" => "10.1.1.2",
"dns" => "y.acme.com",
"vid" => "12345"
},
[2] {
"ip" => "10.1.1.3",
"dns" => "x.acme.com",
"vid" => "12345"
}
]
}
Unless there's a fixed number of elements in the vulnerabilities array I don't think there's much we can do with this (without resorting to the ruby filter).
How about just applying the json filter to lines that look like what we want and drop the rest? Your question doesn't make it clear whether all of the log looks like this so this may not be so useful.
filter {
  if [message] =~ /^\s+{"ip":/ {
    # Remove trailing commas
    mutate {
      gsub => ["message", ",$", ""]
    }
    json {
      source => "message"
      remove_field => ["message"]
    }
  } else {
    drop {}
  }
}
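As a side note, the standalone multiline filter used in the join-everything approach earlier in this answer has since been deprecated in favor of the multiline codec on the input; a roughly equivalent (untested) file input would be:
input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    start_position => "beginning"
    # join all lines that do not start the "vulnerabilities" object onto the previous event
    codec => multiline {
      pattern => '^{"vulnerabilities":\['
      negate => true
      what => "previous"
    }
  }
}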
