logstash - map to json array with transformation - elasticsearch

I have the following JSON file; each line is a different JSON object:
{"s":"some address","c":"some city"}
{"s":"some address1","c":"some city1"}
{"s":"some address2","c":"some city2"}
I have the following job:
input {
  file {
    start_position => "beginning"
    path => "/sources/someFile.txt"
  }
}
filter {
  json {
    source => "a"
    target => "addresses[0].street"
  }
  mutate {
    remove_field => ["message", "@timestamp", "host", "path", "@version"]
  }
}
output {
  elasticsearch {
    hosts => "http://elasticsearch:9200"
    index => "store"
  }
}
I want to write to the index as the following (each address goes to a different doc as the first element in an array):
{
"addresses": [{"street" : "some address", "city" : "some city"}]
}
{
"addresses": [{"street" : "some address2", "city" : "some city1"}]
}
{
"addresses": [{"street" : "some address3", "city" : "some city2"}]
}
The attached job is not working: there is no error, and it does nothing.
Thanks

You cannot use that field reference in the target option of the json filter. In any version of logstash from the last couple of years I would expect that to result in a _jsonparsefailure tag and the error
Exception caught in json filter {:exception=>"Invalid FieldReference: `addresses[0].street`"
If you change the reference to be [addresses][0] then it will run without error, but the reference will be interpreted as the "0" entry in the "addresses" hash, not the first entry in the addresses array.
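For illustration, with source => "message" and target => "[addresses][0]" the event would carry a hash keyed by the string "0" rather than an array, roughly like this (a sketch, not verified output):
"addresses" => {
    "0" => {
        "s" => "some address",
        "c" => "some city"
    }
}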
Your incoming JSON has the wrong field names, so you will have to rename the fields. I think it is easiest to do it in a ruby filter
json { source => "message" target => "[@metadata][json]" }
ruby {
  code => '
    json = event.get("[@metadata][json]")
    event.set("addresses", [ { "street" => json["s"], "city" => json["c"] } ] )
  '
}
which produces
"addresses" => [
[0] {
"city" => "some city",
"street" => "some address"
}
],
The original JSON is placed inside the [@metadata] field so that it is available but not indexed by the output.
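An equivalent sketch that skips the json filter and parses message directly inside the ruby filter (assuming every line of the file is valid JSON, otherwise the event gets a _rubyexception tag):
ruby {
  code => '
    require "json"
    parsed = JSON.parse(event.get("message"))
    event.set("addresses", [ { "street" => parsed["s"], "city" => parsed["c"] } ])
  '
}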

Related

Proper way to Parse a Payload in Ruby

I have the following payload:
[{:payload=>
"{\"user\":\"test\",\"job\":\"Test\",\"username\":\"Bob\",\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"this is the title\"}},{\"type\":\"context\",\"elements\":[{\"type\":\"mrkdwn\",\"text\":\"Test\"}]},{\"type\":\"divider\"}]}"}]
I'm trying to figure out how to extract it. I tried
JSON.parse(response)
But I get the following error
TypeError: no implicit conversion of Hash into String
How can I extract this value to something where I can do something like:
response.job == "test" ?
Let's assume that you meant to say:
response = [{:payload => "{\"user\":\"test\",\"job\":\"Test\",\"username\":\"Bob\",\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"this is the title\"}},{\"type\":\"context\",\"elements\":[{\"type\":\"mrkdwn\",\"text\":\"Test\"}]},{\"type\":\"divider\"}]}"}]
Then response is an array with one element. That one element is a hash. You would thus access the payload with:
payload = JSON.parse(response.first[:payload])
=> {
"user" => "test",
"job" => "Test",
"username" => "Bob",
"blocks" => [
[0] {
"type" => "section",
"text" => {
"type" => "mrkdwn",
"text" => "this is the title"
}
},
[1] {
"type" => "context",
"elements" => [
[0] {
"type" => "mrkdwn",
"text" => "Test"
}
]
},
[2] {
"type" => "divider"
}
]
}
The payload object is then a hash and its child elements can be accessed using the standard [] call:
job = payload['job']
=> "Test"

logstash : create fingerprint from timestamp part

I have a problem creating a fingerprint based on client IP and a timestamp containing date+hour.
I'm using Logstash 7.3.1. Here is the relevant part of my configuration file:
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  ...
  ruby {
    code => "
      keydate = Date.parse(event.get('timestamp'))
      event.set('keydate', keydate.strftime('%Y%m%d-%H'))
    "
  }
  fingerprint {
    key => "my_custom_secret"
    method => "SHA256"
    concatenate_sources => "true"
    source => [
      "clientip",
      "keydate"
    ]
  }
}
The problem is in the 'ruby' block. I tried multiple methods to compute the keydate, but none of them works without giving me errors.
The last one (using this config file) is
[ERROR][logstash.filters.ruby ] Ruby exception occurred: Missing Converter handling for full class name=org.jruby.ext.date.RubyDateTime, simple name=RubyDateTime
input document
{
"timestamp" => "19/Sep/2019:00:07:56 +0200",
"referrer" => "-",
"#version" => "1",
"#timestamp" => 2019-09-18T22:07:56.000Z,
...
"request" => "index.php",
"type" => "apache_access",
"clientip" => "54.157.XXX.XXX",
"verb" => "GET",
...
"tags" => [
[0] "_rubyexception" # generated by the ruby exception above
],
"response" => "200"
}
expected output
{
"timestamp" => "19/Sep/2019:00:07:56 +0200",
"referrer" => "-",
"#version" => "1",
"#timestamp" => 2019-09-18T22:07:56.000Z,
...
"request" => "index.php",
"type" => "apache_access",
"clientip" => "54.157.XXX.XXX",
"verb" => "GET",
...
"keydate" => "20190919-00", #format : YYYYMMDD-HH
"fingerprint" => "ab347766ef....1190af",
"response" => "200"
}
As always, many thanks for all your help!
I advise removing the ruby snippet and using the built-in date filter: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
What you are doing in the ruby snippet is exactly what the date filter does - extract a timestamp from a field and reconstruct it into your desired format.
Another option (a bit less recommended, but it will also work) is to use grok to extract the relevant parts of the timestamp and combine them in a different manner.
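If you do keep a ruby filter for keydate, one way to avoid the RubyDateTime conversion error is to build the string from @timestamp (already populated by the date filter) instead of calling Date.parse; a minimal sketch, noting that @timestamp is UTC while the expected keydate above looks like local time:
ruby {
  code => '
    # @timestamp is a LogStash::Timestamp; .time returns a plain Ruby Time (UTC)
    t = event.get("@timestamp").time
    event.set("keydate", t.strftime("%Y%m%d-%H"))
  '
}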

Json Array splitting issue Logstash configuration : Unexpected end-of-input: expected close marker for Array (start marker at [Source: (S

This is what my JSON object looks like; I have verified that the JSON I am getting is valid. I tried setting up configuration files for it, but I always get the same error:
JSON parse error, original data now in message field {:error=>#, :data=>"{\"total_rows\":15587,\"offset\":0,\"rows\":[\r"}
[2019-08-05T21:07:49,799][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[doc][serversGroups] is of type = NilClass
[2019-08-05T21:07:50,584][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[doc][serversGroups][ActiveUsers] is of type = NilClass
This is the source config file I am using for Logstash:
filter {
  json {
    source => "message"
    skip_on_invalid_json => "true"
    target => "doc"
  }
  split {
    field => "[doc][serversGroups]"
  }
  split {
    field => "[doc][serversGroups][ActiveUsers]"
  }
  date {
    match => [ "[doc][date]", "UNIX" ]
    target => "unix_time"
  }
  mutate {
    convert => {
      "[doc][serversGroups][ActiveUsers][handle]" => "integer"
      "[doc][serversGroups][list][UsedLicenses]" => "integer"
      "[doc][serversGroups][list][issuedLicenses]" => "integer"
    }
  }
  fingerprint {
    concatenate_all_fields => "true"
    method => "SHA256"
    target => "fingerprint"
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "pyyython"
    codec => "json"
    document_id => "%{[fingerprint]}"
  }
}
This is my source JSON
{
"total_rows": 156122,
"offset": 12,
"rows": [
{
"id": "12345",
"key": "12345",
"value": {
"rev": "1-12345"
},
"doc": {
"_id": "12345",
"_rev": "1-12345",
"date": "15645348122",
"HostServerName": "abc.com",
"serversGroups": [
{
"ServiceName": "--- ",
"list": {
"issuedLicenses": "123",
"UsedLicenses": "12"
},
"ActiveUsers": [
{}
]
},
{
"ServiceName": "--- ",
"list": {
"issuedLicenses": "123",
"UsedLicenses": "12"
},
"ActiveUsers": [
{}
]
},
{
"ServiceName": "--- ",
"list": {
"issuedLicenses": "123",
"UsedLicenses": "12"
},
"ActiveUsers": [
{}
]
},
{
"ServiceName": "--- ",
"list": {
"issuedLicenses": "123",
"UsedLicenses": "1"
},
"ActiveUsers": [
{
"user": "me",
"user_host": "myself",
"dispay": "andI",
"version": "v1.1",
"server_host": "testing.abc.com",
"handle": "12345",
"last_date_license_check": "7/7",
"last_time_license_check": "12:12"
}
]
}
]
}
}
]
}
I keep getting this error:
JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Array (start marker at [Source: (S"; line: 1, column: 39])87,"offset":0,"rows":[
"; line: 2, column: 41]>, :data=>"{\"total_rows\":15587,\"offset\":0,\"rows\":[\r"}
[2019-08-05T21:07:49,799][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[doc][serversGroups] is of type = NilClass
[2019-08-05T21:07:50,584][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[doc][serversGroups][ActiveUsers] is of type = NilClass
Not sure if my splitting is wrong!
The source JSON that you show is clearly invalid, since it ends with a comma. If I replace the comma with
]
}
}
]
}
then it is valid. With that change made it can be split using
split { field => "[doc][rows][0][doc][serversGroups]" }
split { field => "[doc][rows][0][doc][serversGroups][ActiveUsers]" }
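If rows can contain more than one entry, a sketch that splits the rows array first (assuming the json filter target is still doc, as in the configuration above):
split { field => "[doc][rows]" }
split { field => "[doc][rows][doc][serversGroups]" }
split { field => "[doc][rows][doc][serversGroups][ActiveUsers]" }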

How to preprocess a document before indexation?

I'm using Logstash and Elasticsearch to collect tweets using the Twitter plugin. My problem is that I receive a document from Twitter and I would like to do some preprocessing before indexing it. Let's say that I have this as a document result from Twitter:
{
"tweet": {
"tweetId": 1025,
"tweetContent": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch",
"hashtags": ["stackOverflow", "elasticsearch"],
"publishedAt": "2017 23 August",
"analytics": {
"likeNumber": 400,
"shareNumber": 100,
}
},
"author":{
"authorId": 819744,
"authorAt": "the_expert",
"authorName": "John Smith",
"description": "Haha it's a fake description"
}
}
Now out of this document that twitter is sending me I would like to generate two documents:
The first one will be indexed in twitter/tweet/1025:
# The id for this document should be the one from tweetId `"tweetId": 1025`
{
"content": "Hey this is a fake document for stackoverflow #stackOverflow #elasticsearch", # this field has been renamed
"hashtags": ["stackOverflow", "elasticsearch"],
"date": "2017/08/23", # the date has been formated
"shareNumber": 100 # This field has been flattened
}
The second one will be indexed in twitter/author/819744:
# The id for this document should be the one from authorId `"authorId": 819744 `
{
"authorAt": "the_expert",
"description": "Haha it's a fake description"
}
I have defined my output as follow:
output {
stdout { codec => dots }
elasticsearch {
hosts => [ "localhost:9200" ]
index => "twitter"
document_type => "tweet"
}
}
How can I process the information from twitter?
EDIT:
So my full config file should look like:
input {
  twitter {
    consumer_key => "consumer_key"
    consumer_secret => "consumer_secret"
    oauth_token => "access_token"
    oauth_token_secret => "access_token_secret"
    keywords => [ "random", "word"]
    full_tweet => true
    type => "tweet"
  }
}
filter {
  clone {
    clones => ["author"]
  }
  if([type] == "tweet") {
    mutate {
      remove_field => ["authorId", "authorAt"]
    }
  } else {
    mutate {
      remove_field => ["tweetId", "tweetContent"]
    }
  }
}
output {
  stdout { codec => dots }
  if [type] == "tweet" {
    elasticsearch {
      hosts => [ "localhost:9200" ]
      index => "twitter"
      document_type => "tweet"
      document_id => "%{[tweetId]}"
    }
  } else {
    elasticsearch {
      hosts => [ "localhost:9200" ]
      index => "twitter"
      document_type => "author"
      document_id => "%{[authorId]}"
    }
  }
}
You could use the clone filter plugin in Logstash.
Here is a sample Logstash configuration file that takes JSON input from stdin and simply shows the output on stdout:
input {
  stdin {
    codec => json
    type => "tweet"
  }
}
filter {
  mutate {
    add_field => {
      "tweetId" => "%{[tweet][tweetId]}"
      "content" => "%{[tweet][tweetContent]}"
      "date" => "%{[tweet][publishedAt]}"
      "shareNumber" => "%{[tweet][analytics][shareNumber]}"
      "authorId" => "%{[author][authorId]}"
      "authorAt" => "%{[author][authorAt]}"
      "description" => "%{[author][description]}"
    }
  }
  date {
    match => ["date", "yyyy dd MMMM"]
    target => "date"
  }
  ruby {
    code => '
      event.set("hashtags", event.get("[tweet][hashtags]"))
    '
  }
  clone {
    clones => ["author"]
  }
  mutate {
    remove_field => ["author", "tweet", "message"]
  }
  if([type] == "tweet") {
    mutate {
      remove_field => ["authorId", "authorAt", "description"]
    }
  } else {
    mutate {
      remove_field => ["tweetId", "content", "hashtags", "date", "shareNumber"]
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
Using as input:
{"tweet": { "tweetId": 1025, "tweetContent": "Hey this is a fake document", "hashtags": ["stackOverflow", "elasticsearch"], "publishedAt": "2017 23 August","analytics": { "likeNumber": 400, "shareNumber": 100 } }, "author":{ "authorId": 819744, "authorAt": "the_expert", "authorName": "John Smith", "description": "fake description" } }
You would get these two documents:
{
"date" => 2017-08-23T00:00:00.000Z,
"hashtags" => [
[0] "stackOverflow",
[1] "elasticsearch"
],
"type" => "tweet",
"tweetId" => "1025",
"content" => "Hey this is a fake document",
"shareNumber" => "100",
"#timestamp" => 2017-08-23T20:36:53.795Z,
"#version" => "1",
"host" => "my-host"
}
{
"description" => "fake description",
"type" => "author",
"authorId" => "819744",
"#timestamp" => 2017-08-23T20:36:53.795Z,
"authorAt" => "the_expert",
"#version" => "1",
"host" => "my-host"
}
You could alternatively use a ruby script to flatten the fields, and then use rename on mutate, when necessary.
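A sketch of that ruby alternative, using the field names from the sample tweet (which fields to copy is up to you); copying the nested values directly also keeps their original types instead of turning them into strings:
ruby {
  code => '
    tweet  = event.get("tweet")  || {}
    author = event.get("author") || {}
    event.set("content",     tweet["tweetContent"])
    event.set("shareNumber", (tweet["analytics"] || {})["shareNumber"])
    event.set("authorId",    author["authorId"])
    event.set("authorAt",    author["authorAt"])
  '
}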
If you want Elasticsearch to use authorId and tweetId instead of the default ID, you could probably configure the elasticsearch output with document_id.
output {
  stdout { codec => dots }
  if [type] == "tweet" {
    elasticsearch {
      hosts => [ "localhost:9200" ]
      index => "twitter"
      document_type => "tweet"
      document_id => "%{[tweetId]}"
    }
  } else {
    elasticsearch {
      hosts => [ "localhost:9200" ]
      index => "twitter"
      document_type => "author"
      document_id => "%{[authorId]}"
    }
  }
}

input json to logstash - config issues?

I have the following JSON input that I want to dump to Logstash (and eventually search/dashboard in Elasticsearch/Kibana).
{"vulnerabilities":[
{"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},
{"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},
{"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}
]}
I'm using the following Logstash configuration:
input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
output
{
"message" => "{\"vulnerabilities\":[\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.788Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.30\",\"dns\":\"z.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.838Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.31\",\"dns\":\"y.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.870Z",
"type" => "shellshock",
"host" => "av1261wag2sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"ip" => "10.1.1.32",
"dns" => "x.acme.com",
"vid" => "12345",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.884Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
Obviously Logstash is treating each line as an event, and it thinks {"vulnerabilities":[ is an event. I'm guessing the trailing commas on the two subsequent nodes mess up the parsing, and the last node appears correct. How do I tell Logstash to parse the events inside the vulnerabilities array and to ignore the commas at the end of the line?
Updated: 2014-11-05
Following Magnus' recommendations, I added the json filter and it's working perfectly. However, it would not parse the last line of the JSON correctly without specifying start_position => "beginning" in the file input block. Any ideas why not? I know it reads from the end of the file by default, but I would have expected the mutate/gsub to handle this smoothly.
input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^\[?{"ip":/ {
    mutate {
      gsub => [
        "message", "^\[{", "{",
        "message", "},?\]?$", "}"
      ]
    }
    json {
      source => "message"
      remove_field => ["message"]
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
You could skip the json codec and use a multiline filter to join the message into a single string that you can feed to the json filter.
filter {
  multiline {
    pattern => '^{"vulnerabilities":\['
    negate => true
    what => "previous"
  }
  json {
    source => "message"
  }
}
However, this produces the following unwanted results:
{
"message" => "<omitted for brevity>",
"#version" => "1",
"#timestamp" => "2014-10-31T06:48:15.589Z",
"host" => "name-of-your-host",
"tags" => [
[0] "multiline"
],
"vulnerabilities" => [
[0] {
"ip" => "10.1.1.1",
"dns" => "z.acme.com",
"vid" => "12345"
},
[1] {
"ip" => "10.1.1.2",
"dns" => "y.acme.com",
"vid" => "12345"
},
[2] {
"ip" => "10.1.1.3",
"dns" => "x.acme.com",
"vid" => "12345"
}
]
}
Unless there's a fixed number of elements in the vulnerabilities array I don't think there's much we can do with this (without resorting to the ruby filter).
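For completeness, a split filter can fan that array out into one event per element; a sketch continuing from the output above (the renamed field names are my own choice):
split {
  field => "vulnerabilities"
}
mutate {
  rename => {
    "[vulnerabilities][ip]"  => "ip"
    "[vulnerabilities][dns]" => "dns"
    "[vulnerabilities][vid]" => "vid"
  }
  remove_field => ["vulnerabilities", "message"]
}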
How about just applying the json filter to lines that look like what we want and drop the rest? Your question doesn't make it clear whether all of the log looks like this so this may not be so useful.
filter {
  if [message] =~ /^\s+{"ip":/ {
    # Remove trailing commas
    mutate {
      gsub => ["message", ",$", ""]
    }
    json {
      source => "message"
      remove_field => ["message"]
    }
  } else {
    drop {}
  }
}
