logstash issue with json input file - elasticsearch

I have the following JSON in a file:
{
"foo":"bar",
"spam" : "eggs"
},
{
"css":"ddq",
"eeqw": "fewq"
}
and the following conf file:
input {
file
{
path => "/opt/logstash-1.4.2/bin/sam.json"
type => "json"
codec => json_lines
start_position =>"beginning"
}
}
output { stdout { codec => json } }
But when I run
./logstash -f sample.conf
I don't get any output on stdout. However, when I don't set the json codec and instead just set type => "core2", it seems to work.
Does anyone know how I can fix it so it works for the json type?
The other issue is that it gives me the following output when it does write to stdout:
{"message":"{","#version":"1","#timestamp":"2015-07-15T02:02:02.653Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"foo\":\"bar\", ","#version":"1","#timestamp":"2015-07-15T02:02:02.654Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"spam\" : \"eggs\" ","#version":"1","#timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"},","#version":"1","#timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"{ ","#version":"1","#timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"css\":\"ddq\", ","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"eeqw\": \"fewq\"","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"}","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}
I want to know how it can be parsed the right way, with the key-value pairs from my input file.

I found this and edited it to suit your purpose. The following config should do exactly what you want:
input {
file {
codec => multiline
{
pattern => "^\}"
negate => true
what => previous
}
path => ["/absoute_path/json.json"]
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
mutate {
replace => [ "message", "%{message}}" ]
gsub => [ "message","\n",""]
gsub => [ "message","},",""]
}
if [message] =~ /^{.*}$/ {
json { source => "message" }
}
}
I tried your given JSON and it results in two events: the first with foo = bar and spam = eggs, the second with css = ddq and eeqw = fewq.
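Roughly, and leaving out the Logstash metadata fields (@version, @timestamp, host, path), the two events carry these key-value pairs:
{ "foo" => "bar", "spam" => "eggs" }
{ "css" => "ddq", "eeqw" => "fewq" }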

As far as I understand it, you need to put each complete JSON document on one line if you want to use the json_lines codec:
{"foo":"bar","spam" : "eggs"}
{"css":"ddq","eeqw": "fewq"}
In your case there is also a problem with the structure, since you have a ',' between the JSON objects, which is not easy to handle. So if possible, change the source to match my example. If that is not possible, the multiline approach might help you. Check this for reference:
input json to logstash - config issues?
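If the file can be rewritten that way, a minimal sketch of the input could look like this (the path and sincedb setting are just examples taken from the configs above). Note that the file input already splits its input into lines, so the plain json codec tends to be the safer choice there; the json_lines codec expects a newline-delimited stream, which may be why the original config produced no output at all.
input {
file {
path => "/opt/logstash-1.4.2/bin/sam.json"
codec => "json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
output { stdout { codec => rubydebug } }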


Setting variables in logstash config and referencing them

I started with ELK a week back to use it for storing multiple CSVs and getting them into Kibana for easier analysis. One case will involve multiple machines, and one machine will generate many CSVs. These CSVs have a particular naming pattern. I am taking one particular file (BrowsingHistoryView_DMZ-machine1.csv) as a reference and setting up the case as the index. To define an index, I've chosen to rename files to have a prefix of '__case_number__'. So the file name will be __1__BrowsingHistoryView_DMZ-machine1.csv
Now I want to derive three things out of it:
1. Get the case number __1__ and use 1 as the index. 1, 2, 3 etc. will be used as case numbers.
2. Get the file type (BrowsingHistoryView for example) and add a tag name to the uploaded file.
3. Get the machine name DMZ-machine1 (don't know yet where I'll use it).
I created a config file for it, which is shown below:
input {
file {
path => "/home/kriss/Documents/*.csv" # get the files from Documents
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
}
if [path] =~ "BrowsingHistory" { mutate { add_tag => ["Browsinghistory"] } # define a new tag for browser history, this worked
grok { match => ["path", "__(?<case>[0-9]+)__(?<category>\w+_)(?<machine>(.+)).csv"] # This regex pattern is to get category(browsingHistory), MachineName
}
}
if [path] =~ "Inbound_RDP_Events" { mutate { add_tag => {"Artifact" => "RDP" } } }
} # This tagging worked
output {
elasticsearch {
hosts => "localhost"
index => "%{category}" # This referencing the category variable didn't work
}
stdout {}
}
When I run this config in Logstash, the index generated is literally %{category}. I need it to capture the actual category (browser_history, for example) as the index for that file. I would also like to convert the category to lowercase, since uppercase characters don't work well in index names. I tried to follow the official documentation but didn't find the complete information I need.
There's a grok debugger in Dev Tools in Kibana you can use to work on these kinds of problems, or an online one at https://grokdebug.herokuapp.com/ - it's great.
Below is a slightly modified version of your config. I've removed your comments and inserted my own.
The changes are:
The path regex in your config doesn't match the example filename you gave. You might want to change it back, depending on how accurate your example was.
The grok pattern has been tweaked
Changed your Artifact tag to a field, because it looks like you're trying to create a field
I tried to stick to your spacing convention :)
input {
file {
path => "/home/kriss/Documents/*.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
}
# I replaced your regex with something that matches your example
# filename, but given that you said you already had this
# working, you might want to change it back.
if [path] =~ "browser_history" {
mutate { add_tag => ["Browsinghistory"] }
grok {
# I replaced custom captures with a more grokish style, and
# use GREEDYDATA to capture everything up to the last '_'
match => [ "path", "^_+%{NUMBER:case}_+%{GREEDYDATA:category}_+%{DATA:case}\.csv$" ]
}
}
# Replaced `add_tag` with `add_field` so that the syntax makes sense
if [path] =~ "Inbound_RDP_Events" { mutate { add_field => {"Artifact" => "RDP" } } }
# Added the `mutate` filter's `lowercase` function for "category"
mutate {
lowercase => "category"
}
}
output {
elasticsearch {
hosts => "localhost"
index => "%{category}"
}
stdout {}
}
Not tested, but I hope it gives you enough clues.
So, for reference, for anyone who is trying to use custom variables in a Logstash config file, below is the working config:
input {
file {
path => "/home/user1/Documents/__1__BrowsingHistoryView_DMZ-machine1.csv" # Getting the absolte path (necessary)
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
}
if [path] =~ "BrowsingHistory" { mutate { add_field => {"artifact" => "Browsinghistory"} } # if BrowsingHistory is found in path, add a tag called Browsinghistory
grok { match => ["path", "__(?<case>[0-9]+)__(?<category>\w+_)(?<machine>(.+)).csv"] # get the caseNumber, logCategory, machineName into variables
}
}
if [path] =~ "Inbound_RDP_Events" { mutate { add_field => {"artifact" => "RDP"} } } # another tag if RDP event file found in path
}
output {
elasticsearch {
hosts => "localhost"
index => "%{case}" # passing the variable value derived from regex
# index => "%{category}" # another regex variable
# index => "%{machine}" # another regex variable
}
stdout {}
}
I wasn't very sure whether to add a new tag or a new field (add_field => {"artifact" => "Browsinghistory"}) for easy identification of a file in Kibana. If someone could provide some info on how to choose between them, that would help.
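For what it's worth, here is a minimal sketch of the two options side by side (the names are just the ones used above): add_tag appends a value to the event's tags array, which works well for simple yes/no filtering, while add_field creates a named field whose value can be searched and aggregated on in Kibana.
filter {
if [path] =~ "BrowsingHistory" {
# ends up in the event's "tags" array
mutate { add_tag => [ "Browsinghistory" ] }
# ends up as its own field: artifact = Browsinghistory
mutate { add_field => { "artifact" => "Browsinghistory" } }
}
}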

how filter {"foo":"bar", "bar": "foo"} with grok to get only foo field?

I copied
{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}
from https://github.com/trentm/node-bunyan and saved it as my logs.json. I am trying to import only two fields (name and msg) into Elasticsearch via Logstash. The problem is that I depend on a kind of filter that I have not been able to put together. I have successfully imported such a line as a single message, but that is certainly not enough in my real case.
That said, how can I import only name and msg into Elasticsearch? I tested several alternatives using http://grokdebug.herokuapp.com/ to come up with a useful filter, with no success at all.
For instance, %{GREEDYDATA:message} will bring in the entire line as a single message, but how can I split it and ignore everything other than the name and msg fields?
In the end, I am planning to use it here:
input {
file {
type => "my_type"
path => [ "/home/logs/logs.log" ]
codec => "json"
}
}
filter {
grok {
match => { "message" => "data=%{GREEDYDATA:request}"}
}
#### some extra lines here probably
}
output
{
elasticsearch {
codec => json
hosts => "http://127.0.0.1:9200"
index => "indextest"
}
stdout { codec => rubydebug }
}
I have just gone through the list of available Logstash filters. The prune filter should match your need.
Assuming you have installed the prune filter, your config file should look like this:
input {
file {
type => "my_type"
path => [ "/home/logs/logs.log" ]
codec => "json"
}
}
filter {
prune {
whitelist_names => [
"#timestamp",
"type",
"name",
"msg"
]
}
}
output {
elasticsearch {
codec => json
hosts => "http://127.0.0.1:9200"
index => "indextest"
}
stdout { codec => rubydebug }
}
Please note that you will want to keep type so that Elasticsearch indexes the data into the correct type. @timestamp is required if you want to view the data in Kibana.

How to force encoding for Logstash filters? (umlauts from message not recognized)

I am trying to import historical log data into ElasticSearch (Version 5.2.2) using Logstash (Version 5.2.1) - all running under Windows 10.
Sample log file
The sample log file I am importing looks like this:
07.02.2017 14:16:42 - Critical - General - Ähnlicher Fehler mit übermäßger Ödnis
08.02.2017 14:13:52 - Critical - General - ästhetisch überfällige Fleißarbeit
Working configuration
For starters I tried the following simple Logstash configuration (it's running on Windows so don't get confused by the mixed slashes ;)):
input {
file {
path => "D:/logstash/bin/*.log"
sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
ignore_older => 999999999999
start_position => "beginning"
stat_interval => 60
type => "clientlogs"
}
}
output {
if [type] == "clientlogs" {
elasticsearch {
index => "logstash-clientlogs"
}
}
}
And this works fine: I see that the input gets read line by line into the index I specified. When I check with Kibana, for example, those two lines look as expected (I just omitted the host name).
More complex (not working) configuration
But of course this is still pretty flat data, and I really want to extract the proper timestamps and the other fields from my lines and replace @timestamp and message with them; so I inserted some filter logic involving the grok, mutate and date filters between input and output, so that the resulting configuration looks like this:
input {
file {
path => "D:/logs/*.log"
sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
ignore_older => 999999999999
start_position => "beginning"
stat_interval => 60
type => "clientlogs"
}
}
filter {
if [type] == "clientlogs" {
grok {
match => [ "message", "%{MONTHDAY:monthday}.%{MONTHNUM2:monthnum}.%{YEAR:year} %{TIME:time} - %{WORD:severity} - %{WORD:granularity} - %{GREEDYDATA:logmessage}" ]
}
mutate {
add_field => {
"timestamp" => "%{year}-%{monthnum}-%{monthday} %{time}"
}
replace => [ "message", "%{logmessage}" ]
remove_field => ["year", "monthnum", "monthday", "time", "logmessage"]
}
date {
locale => "en"
match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
}
}
output {
if [type] == "clientlogs" {
elasticsearch {
index => "logstash-clientlogs"
}
}
}
Now, when I look at those logs in Kibana, I see that the fields I wanted to add do appear and that the timestamp and message are replaced correctly, but my umlauts are all gone.
Forcing charset in input and output
I also tried setting
codec => plain {
charset => "UTF-8"
}
for input and output, but that also did not change anything for the better.
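For reference, this is roughly how that attempt looks when the codec sits inside the file input (same paths as above); whichever charset is set there has to match the encoding the log files were actually written with:
input {
file {
path => "D:/logs/*.log"
sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
ignore_older => 999999999999
start_position => "beginning"
stat_interval => 60
type => "clientlogs"
codec => plain {
# must match the real on-disk encoding of the log files
charset => "UTF-8"
}
}
}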
Different output-type
When I change the output to stdout { }, the output seems okay:
2017-02-07T13:16:42.000Z MYPC Ähnlicher Fehler mit übermäßger Ödnis
2017-02-08T13:13:52.000Z MYPC ästhetisch überfällige Fleißarbeit
Querying without Kibana
I also queried against the index using this PowerShell-command:
Invoke-WebRequest -Method POST -Uri 'http://localhost:9200/logstash-clientlogs/_search' -Body '
{
"query":
{
"regexp": {
"message" : ".*"
}
}
}
' | select -ExpandProperty Content
But it also returns the same messed up contents Kibana reveals:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8URonc
bfBgFwC","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�hnlicher Fehler mit �berm��ger �dnis\r","type":"clientlogs","path":"D:/logs/Client.log","#timestamp":"2017-02-07T13:16:42.000Z","granularity":"General","#version":"1","host":"MYPC","timestamp":"2017-02-07 14:16:42"}},{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8UR
oncbfBgFwD","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�sthetisch �berf�llige Flei�arbeit\r","type":"clientlogs","path":"D:/logs/Client.log","#timestamp":"2017-02-08T13:13:52.000Z","granularity":"General","#version":"1","host":"MYPC","timestamp":"2017-02-08 14:13:52"}}]}}
Has anyone else experienced this and has a solution for this use-case? I don't see any setting for grok to specify any encoding (the file I am passing is UTF-8 with BOM) and encoding for input itself does not seem necessary, because it gets me the correct message when I leave out the filter.

logstash parsing error (json array)

I am trying to use logstash/elasticsearch.
First, I tried to put an XML table into Logstash, but the XML seemed to be unreadable, so I converted it into a JSON array looking like this:
[
["bla","blieb"],
["things",more"],
]
my config looks like this:
input {
file {
path => "C:\Users\mipmip\Downloads\noch.json"
start_position => "beginning"
}
}
filter {
json {source => message
}
}
output {
elasticsearch{
hosts => "localhost"
index => "datensatz"
}
stdout { }
}
But it still doesn't work; all I get are a lot of _jsonparsefailure tags in Elasticsearch :(
But whyyyy D:
[
["bla","blieb"],
["things",more"],
]
This is not a JSON object.
First, you are missing a double quote near "more". Second, you have an extra comma after the second object. I recommend checking with jsonlint.com whether you have valid JSON.
You should also surround "message" with double quotes in the filter part.
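For illustration, the corrected file content (missing quote added, trailing comma removed) and a quoted source setting would look roughly like this; note that the file input still reads the file line by line, so a JSON array spread over several lines may additionally need a multiline codec:
[
["bla","blieb"],
["things","more"]
]
filter {
json { source => "message" }
}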

Data type conversion using logstash grok

Basic is a float field. The mentioned index is not present in Elasticsearch. When running the config file with logstash -f, I get no exception. Yet the data entered into Elasticsearch shows the mapping of Basic as string. How do I rectify this? And how do I do this for multiple fields?
input {
file {
path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
type => "promosms_dec15"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
grok{
match => [
"Basic", " %{NUMBER:Basic:float}"
]
}
csv {
columns => ["Generation_Date","Basic"]
separator => ","
}
ruby {
code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
index => "promosms-%{+dd.MM.YYYY}"
workers => 1
}
}
You have two problems. First, your grok filter is listed prior to the csv filter, and because filters are applied in order, there won't be a "Basic" field to convert when the grok filter is applied.
Secondly, unless you explicitly allow it, grok won't overwrite existing fields. In other words,
grok{
match => [
"Basic", " %{NUMBER:Basic:float}"
]
}
will always be a no-op. Either specify overwrite => ["Basic"] or, preferably, use mutate's type conversion feature:
mutate {
convert => ["Basic", "float"]
}
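Putting both points together, a sketch of the corrected filter section (csv first, then the type conversion), based on the config above, could look like this:
filter {
csv {
columns => ["Generation_Date","Basic"]
separator => ","
}
mutate {
# convert Basic to a float instead of grokking it
convert => ["Basic", "float"]
}
ruby {
code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
}
}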
