logstash parsing error (json array) - elasticsearch

I am trying to use logstash/elasticsearch.
First, I tried to put an XML (table) into Logstash, but it seemed that the XML was unreadable, so I converted it into a JSON array that looks like this:
[
["bla","blieb"],
["things",more"],
]
my config looks like this:
input {
file {
path => "C:\Users\mipmip\Downloads\noch.json"
start_position => "beginning"
}
}
filter {
json {source => message
}
}
output {
elasticsearch{
hosts => "localhost"
index => "datensatz"
}
stdout { }
}
But it still doesn't work; all I get is a lot of _jsonparsefailures in Elasticsearch :(
But whyyyy D:

[
["bla","blieb"],
["things",more"],
]
This is not valid JSON.
First, you are missing a double quote near "more". Second, you have an extra trailing comma after the second array. I recommend checking on jsonlint.com whether your JSON is valid.
You should also surround "message" with double quotes in the filter part.
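For reference, a minimal sketch of a corrected setup: the JSON is kept on a single line (the file input reads line by line) and "message" is quoted in the filter. This only addresses the syntax and quoting issues above; a top-level JSON array may still need further handling (for example the json filter's target option).

[["bla","blieb"],["things","more"]]

filter {
  json { source => "message" }
}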


logstash _grokparsefailure for really simple tag

I don't understand why I get a _grokparsefailure for this simple config:
input {
file {
path => "/var/log/*.log"
codec => json {
}
}
}
filter {
grok {
add_tag => ["test"]
}
}
output {
elasticsearch {
/.../
}
}
The logs are correctly sent to Elasticsearch and the JSON is correctly parsed, but the added tag doesn't work; instead I get a "_grokparsefailure" tag. What I want is to pass a static value as a tag.
I am surely missing something dumb, but I can't find what.
Your grok filter does nothing: there is no pattern to match, and the tag would only be applied after a successful match.
To add a tag in your case you can use the tags option in your input or the mutate filter.
To use the tags option, just change your input to this one:
input {
file {
path => "/var/log/*.log"
codec => json
tags => ["test"]
}
}
To use the mutate filter, put the following config inside your filter block.
mutate {
add_tag => ["test"]
}
Both configurations will add a test tag to all your messages.

Drop filter not working logstash

I have multiple log messages in a file which I am processing using logstash filter plugins. Then, the filtered logs are getting sent to elasticsearch.
There is one field called addID in a log message. I want to drop all log messages that have a particular addID present. These particular addIDs are present in an ID.yml file.
Scenario: If the addID of a log message matches with any of the addIDs present in the ID.yml file, that log message should be dropped.
Could anyone help me in achieving this?
Below is my config file.
input {
file {
path => "/Users/jshaw/logs/access_logs.logs"
ignore_older => 0
}
}
filter {
grok {
patterns_dir => ["/Users/jshaw/patterns"]
match => ["message", "%{TIMESTAMP:Timestamp}+{IP:ClientIP}+{URI:Uri}"]
}
kv{
field_split => "&?"
include_keys => [ "addID" ]
allow_duplicate_values => "false"
}
if [addID] in "/Users/jshaw/addID.yml" {
drop{}
}
}
output {
elasticsearch{
hosts => ["localhost:9200"]
}
}
You are using the in operator incorrectly. It checks whether a value is in an array, not in a file; files are usually a bit more complicated to work with.
A solution would be to use the ruby filter to read the file yourself.
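For example, a minimal sketch of that ruby filter approach, assuming ID.yml is a plain YAML list of IDs and a Logstash version with the event.get / event.cancel API (older versions use event['addID'] instead):

ruby {
  # load the ID list once at startup (path taken from the question)
  init => "require 'yaml'; @drop_ids = YAML.load_file('/Users/jshaw/addID.yml') || []"
  # drop the event if its addID is in the list
  code => "event.cancel if @drop_ids.include?(event.get('addID'))"
}

This loads the list once instead of opening the file for every event; move the load into code if the file can change while Logstash is running.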
Or put the addID value directly in your configuration file, like this:
if [addID] == "addID" {
drop{}
}

How to extract variables from log file path, test log file name for pattern in Logstash?

I have AWS ElasticBeanstalk instance logs on S3 bucket.
Path to Logs is:
resources/environments/logs/publish/e-3ykfgdfgmp8/i-cf216955/_var_log_nginx_rotated_access.log1417633261.gz
which translates to:
resources/environments/logs/publish/e-[random environment id]/i-[random instance id]/
The path contains multiple logs:
_var_log_eb-docker_containers_eb-current-app_rotated_application.log1417586461.gz
_var_log_eb-docker_containers_eb-current-app_rotated_application.log1417597261.gz
_var_log_rotated_docker1417579261.gz
_var_log_rotated_docker1417582862.gz
_var_log_rotated_docker-events.log1417579261.gz
_var_log_nginx_rotated_access.log1417633261.gz
Notice that there's some random number (timestamp?) inserted by AWS in the filename before ".gz".
Problem is that I need to set variables depending on log file name.
Here's my configuration:
input {
s3 {
debug => "true"
bucket => "elasticbeanstalk-us-east-1-something"
region => "us-east-1"
region_endpoint => "us-east-1"
credentials => ["..."]
prefix => "resources/environments/logs/publish/"
sincedb_path => "/tmp/s3.sincedb"
backup_to_dir => "/tmp/logstashed/"
tags => ["s3","elastic_beanstalk"]
type => "elastic_beanstalk"
}
}
filter {
if [type] == "elastic_beanstalk" {
grok {
match => [ "#source_path", "resources/environments/logs/publish/%{environment}/%{instance}/%{file}<unnecessary_number>.gz" ]
}
}
}
In this case I want to extract environment, instance, and file name from the path. In the file name I need to ignore that random number.
Am I doing this the right way? What would be the full, correct solution for this?
Another question is: how can I specify fields for a custom log format for a particular log file from the list above?
This could be something like: (meta-code)
filter {
if [type] == "elastic_beanstalk" {
if [file_name] BEGINS WITH "application_custom_log" {
grok {
match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
}
}
if [file_name] BEGINS WITH "some_other_custom_log" {
....
}
}
}
How do I test for file name pattern?
For your first question, and assuming that #source_path contains the full path, try:
match => [ "#source_path", "logs/publish/%{NOTSPACE:env}/%{NOTSPACE:instance}/%{NOTSPACE:file}%{NUMBER}%{NOTSPACE:suffix}" ]
This will create 4 Logstash fields for you:
env
instance
file
suffix
More information is available on the grok man page and you should test with the grok debugger.
To test fields in logstash, you use conditionals, e.g.
if [field] == "value"
if [field] =~ /regexp/
etc.
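Applied to this case, a sketch (assuming the grok above has put the log file name into the file field, and using one of the file names from the question as the prefix):

if [file] =~ /^_var_log_eb-docker_containers_eb-current-app/ {
  grok {
    match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
  }
}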
Note that it's not always necessary to do this with conditionals around grok: a single grok filter can take multiple 'match' patterns, and by default it stops after the first one that matches (see the sketch below). If your patterns are mutually exclusive, this should work for you.
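A sketch of that multiple-match variant (the patterns here are placeholders borrowed from your meta-code plus a stock pattern):

grok {
  # grok tries each pattern in order and, by default, stops at the first match
  match => [
    "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}",
    "message", "%{COMBINEDAPACHELOG}"
  ]
}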

reparsing a logstash record? fix extracts?

I'm taking a JSON message (Cloudtrail, many objects concatenated together) and by the time I'm done filtering it, Logstash doesn't seem to be parsing the message correctly. It's as if the hash was simply dumped into a string.
Anyhow, here's the input and filter.
input {
s3 {
bucket => "stanson-ops"
delete => false
#snipped unimportant bits
type => "cloudtrail"
}
}
filter {
if [type] == "cloudtrail" {
json { # http://logstash.net/docs/1.4.2/filters/json
source => "message"
}
ruby {
code => "event['RecordStr'] = event['Records'].join('~~~')"
}
split {
field => "RecordStr"
terminator => "~~~"
remove_field => [ "message", "Records" ]
}
}
}
By the time I'm done, elasticsearch entries include a RecordStr key with the following data. It doesn't have a message field, nor does it have a Records field.
{"eventVersion"=>"1.01", "userIdentity"=>{"type"=>"IAMUser", "principalId"=>"xxx"}}
Note that this is not JSON; it has been parsed (which is important for the concat->split approach to work).
So the RecordStr key doesn't look quite right as one value. Further, in Kibana, the filterable fields include RecordStr (with no subfields) and also some entries that aren't there anymore: Records.eventVersion, Records.userIdentity.type.
Why is that? How can I get the proper fields?
Edit 1: here's part of the input.
{"Records":[{"eventVersion":"1.01","userIdentity":{"type":"IAMUser",
It's unprettified JSON. It appears the body of the file (the above) ends up in the message field; json extracts it, and I end up with an array of records in the Records field. That's why I join and split it: I then end up with individual documents, each with a single RecordStr entry. However, the template(?) doesn't seem to understand the new structure.
I've worked out a method that allows for indexing the appropriate CloudTrail fields as you requested. Here are the modified input and filter configs:
input {
  s3 {
    backup_add_prefix => "processed-logs/"
    backup_to_bucket => "test-bucket"
    bucket => "test-bucket"
    delete => true
    interval => 30
    prefix => "AWSLogs/<account-id>/CloudTrail/"
    type => "cloudtrail"
  }
}
filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    ruby {
      code => "event.set('RecordStr', event.get('Records').join('~~~'))"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
    mutate {
      gsub => [
        "RecordStr", "=>", ":"
      ]
    }
    mutate {
      gsub => [
        "RecordStr", "nil", "null"
      ]
    }
    json {
      skip_on_invalid_json => true
      source => "RecordStr"
      target => "cloudtrail"
    }
    mutate {
      add_tag => ["cloudtrail"]
      remove_field => ["RecordStr", "#version"]
    }
    date {
      match => ["[cloudtrail][eventTime]", "ISO8601"]
    }
  }
}
The key observation here is that once the split is done, we no longer have valid JSON in the event and are therefore required to perform the mutate replacements ('=>' to ':' and 'nil' to 'null'). Additionally, I found it useful to get the timestamp out of the CloudTrail eventTime and to clean up some unnecessary fields.

logstash issue with json input file

I have the following JSON in a file:
{
"foo":"bar",
"spam" : "eggs"
},
{
"css":"ddq",
"eeqw": "fewq"
}
and the following conf file:
input {
file
{
path => "/opt/logstash-1.4.2/bin/sam.json"
type => "json"
codec => json_lines
start_position =>"beginning"
}
}
output { stdout { codec => json } }
But when I run
./logstash -f sample.conf
I don't get any output on stdout.
But when I don't set a json codec and use type => "core2" instead, it seems to work.
Does anyone know how I can fix it to work for the json type?
The other issue is that it gives me the following output when it does write to stdout:
{"message":"{","#version":"1","#timestamp":"2015-07-15T02:02:02.653Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"foo\":\"bar\", ","#version":"1","#timestamp":"2015-07-15T02:02:02.654Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"spam\" : \"eggs\" ","#version":"1","#timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"},","#version":"1","#timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"{ ","#version":"1","#timestamp":"2015-07-15T02:02:02.655Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"css\":\"ddq\", ","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"\"eeqw\": \"fewq\"","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"}","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}{"message":"","#version":"1","#timestamp":"2015-07-15T02:02:02.656Z","type":"core2","host":"sjagannath","path":"/opt/logstash-1.4.2/bin/sam.json"}
I want to know how it can be parsed correctly, with the key-value pairs from my input file.
I found this and edited it to suit your purpose. The following config should do exactly what you want:
input {
file {
codec => multiline
{
pattern => "^\}"
negate => true
what => previous
}
path => ["/absoute_path/json.json"]
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
mutate {
replace => [ "message", "%{message}}" ]
gsub => [ "message","\n",""]
gsub => [ "message","},",""]
}
if [message] =~ /^{.*}$/ {
json { source => "message" }
}
}
I tried your given JSON and it results in two events: the first with foo = bar and spam = eggs, the second with css = ddq and eeqw = fewq.
As I understand it, you need to put each complete JSON document on one line if you want to use the json_lines codec:
{"foo":"bar","spam" : "eggs"}
{"css":"ddq","eeqw": "fewq"}
In your case the structure is a problem, since you also have a ',' between the JSON objects; that is not the easiest thing to handle. So if possible, change the source to match my example. If that is not possible, the multiline approach might help you. Check this for reference:
input json to logstash - config issues?
