This is kind of a follow up from another one of my questions:
JSON parser in logstash ignoring data?
But this time I feel like the problem is more clear then last time and might be easier for someone to answer.
I'm using the JSON parser like this:
json #Parse all the JSON
{
source => "MFD_JSON"
target => "PARSED"
add_field => { "%{FAMILY_ID}" => "%{[PARSED][platform][family_id][1]}_%{[PARSED][platform][family_id][0]}" }
}
The part of the output for one the logs in logstash.stdout looks like this:
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "7",
"PROGRAM" => "mfd_status",
"TIMESTAMP" => "2016-01-12T11:00:44.570Z",
MORE FIELDS
There are a whole bunch of fields that like the ones above that work when I remove the JSON code. When I add the JSON filter, the whole log just disappears form elasticserach/kibana for some reason. The bit added by the JSON filter is bellow:
"PARSED" => {
"platform" => {
"boot_mode" => [
[0] 2,
[1] "NAND"
],
"boot_ver" => [
[0] 6,
[1] 1,
[2] 32576,
[3] 0
],
WHOLE LOT OF OTHER VARIABLES
"family_id" => [
[0] 14,
[1] "Hatchetfish"
],
A WHOLE LOT MORE VARIABLES
},
"flash" => [
[0] 131072,
[1] 7634944
],
"can_id" => 1700,
"version" => {
"kernel" => "3.0.35 #2 SMP PREEMPT Thu Aug 20 10:40:42 UTC 2015",
"platform" => "17.0.32576-r1",
"product" => "next",
"app" => "53.1.9",
"boot" => "2013.04 (Aug 20 2015 - 10:33:51)"
}
},
"%{FAMILY_ID}" => "Hatchetfish 14"
Lets pretend the JSON won't work, I'm okay with that now, that shouldn't mess with everything else to do with the log from elasticsearch/kibana. Also, at the end I've got FAMILY_ID as a field that I added separately using add_field. At the very least that should show up, right?
If someone's seen something like this before it would be great help.
Also sorry for spamming almost the same question twice.
SAMPLE LOG LINE:
1452470936.88 1448975468.00 1 7 mfd_status 000E91DCB5A2 load {"up":[38,1.66,0.40,0.13],"mem":[967364,584900,3596,116772],"cpu":[1299,812,1791,3157,480,144],"cpu_dvfs":[996,1589,792,871,396,1320],"cpu_op":[996,50]}
The sample line will be parsed (Everything after load is JSON), and in stdout I can see that it is parsed successfully, But I don't see it in elasticsearch.
This is my output code:
elasticsearch
{
hosts => ["localhost:9200"]
document_id => "%{fingerprint}"
}
stdout { codec => rubydebug }
A lot of my logstash filter is in the other question, but I think like all the relevant parts are in this question now.
If you want to check it out here's the link: JSON parser in logstash ignoring data?
Answering my own question here. It's not the ideal answer, but if anyone has a similar problem as me you can try this out.
json #Parse all the JSON
{
source => "MFD_JSON"
target => "PARSED"
add_field => { "%{FAMILY_ID}" => "%{[PARSED][platform][family_id][1]}_%{[PARSED][platform][family_id][0]}" }
}
That's how I parsed all the JSON before, I kept at the trial and error hoping I'd get it sometime. I was about to just use a grok filter to get bits that I wanted, which is a option if this doesn't work for you. I came back to this later, and thought "What if I removed everything after" because of some crazy reason that I've forgotten. In the end I did this:
json
{
source => "MFD_JSON"
target => "PARSED_JSON"
add_field => { "FAMILY_ID" => "%{[PARSED_JSON][platform][family_id][1]}_%{[PARSED_JSON][platform][family_id][0]}" }
remove_field => [ "PARSED_JSON" ]
}
So, extract the field/fields your interested in, and then remove the field made by the parser at the end. That's what worked for me. I don't know why, but it might work for other people too.
Related
I'm searching on internet a way to put a variable in logstash and use or modify the value if a term is corresponding to a pattern.
Here, the is an example of my data source:
2017-04-12 15:49:57,641|OK|file1|98|||
2017-04-12 15:49:58,929|OK|file2|1387|null|msg_fils|
2017-04-12 15:49:58,931|OK|file3|2|msg_pere|msg_fils|
2017-04-12 15:50:17,666|OK|file1|25|||
2017-04-12 15:50:17,929|OK|file2|1387|null|msg_fils|
I'm using this grok code to parse my source.
grok {
match => {"message" => '%{TIMESTAMP_ISO8601:msgdates:date}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|'}
}
But in fact I want to modify the first field by the previous value of the line which contains file1
Can you tell me if it's possible or not?
Thanks
I have found a solution to my issue. I'm sharing you the solution to my problem.
I'm using a plugin named logstash-filter-memorize, it can be install by the command :
logstash-plugin install logstash-filter-memorize
So my filter is like this :
grok {
match => {"message" => '%{TIMESTAMP_ISO8601:msgdates:date}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|'}
}
if [component] =~ "file1" {
mutate {
add_field => [ "msg_id", "%{msgdates}" ]
}
memorize {
fields => [ "msg_id" ]
default => { "msg_id" => "NOTFOUND" }
} }
memorize {
fields => [ "msg_id9" ]
}
I hope that it can be useful for others.
I'm trying to setup the logstash-input-mongodb plugin to read audits from my database, but all the parsing strategies seem to have issues and I don't see how to customize anything.
The "flatten" parse_method works quite nicely, but it ignores mongodb object IDs and does not output them anywhere except in the log_entry field.
The "simple" parse_method includes object IDs but outputs dates in a way that I cannot figure out how to parse with the date filter (e.g., "2017-02-12 16:30:00 UTC"). Then, in the absence of a proper timestamp, the plugin seems to generate timestamps on its own which have no relation to the current time (e.g., in 2022).
The "dig" method I haven't quite figured out yet.
So my questions:
Is there a way to parse data from the log_entry (see example below) field that the plugin outputs? I've tried the json filter but it is not json because it's been ruby-formatted.
Or, is there any way to get the "flatten" method to include object IDs?
Or, is there anyw ay to get the "simple" method to properly format mongodb ISODate fields?
Is there any way to prevent the plugin from reading data from the beginning of time (I only want to push the last day or so into logstash)?
Can be reproduced with any configuration, here's my basic one:
input {
mongodb {
uri => 'mongodb://localhost:27017/test'
placeholder_db_dir => '/elk/logstash-mongodb/'
placeholder_db_name => 'logstash_sqlite.db'
collection => 'auditcommunications'
batch_size => 1000
parse_method => "flatten"
}
}
filter {
date {
match => [ "timestamp", "ISO8601" ]
}
}
output {
stdout { codec => rubydebug }
}
Example data including log_entry:
{
"audit-id" => "58a2edc916e057270065fa74",
"created" => "2017-02-14T11:45:13Z",
"type" => "mongodb-audit",
"audit-type" => "PaymentAudit",
"mongo_id" => "58a2edc916e057270065fa74",
"expiresAt" => "2017-05-15T11:45:13Z",
"lastUpdated" => "2017-02-14T11:45:13Z",
"#timestamp" => 2017-02-14T11:45:13.000Z,
"log_entry" => "{\"_id\"=>BSON::ObjectId('58a2edc916e057270065fa74'), \"order\"=>BSON::ObjectId('a8a2f205790858970046aa59'), \"_type\"=>\"PaymentAudit\", \"lastUpdated\"=>2017-02-14 11:45:13 UTC, \"created\"=>2017-02-14 11:45:13 UTC, \"payment\"=>BSON::ObjectId('58a2edc02eafcd560101ee5f'), \"organization\"=>BSON::ObjectId('56edde0ba33e1c03ff54a5ec'), \"status\"=>\"succeeded\", \"context\"=>{\"type\"=>\"order\", \"id\"=>BSON::ObjectId('58a2e205790852270046ab59')}, \"expiresAt\"=>2017-05-15 11:45:13 UTC, \"__v\"=>0}",
"logdate" => "2017-02-14T11:45:13+00:00",
"__v" => 0,
"#version" => "1",
"context_type" => "order",
"status" => "succeeded",
"timestamp" => "2017-02-14T11:45:13Z"
}
How can I extract the organization from the log_entry field above?
I've tried the following:
filter {
ruby {
code => "event.set('organization', eval(event.get('[log_entry]')))"
}
}
but this throws a rubyexception: ERROR logstash.filters.ruby - Ruby exception occurred: (eval):1: syntax error, unexpected tINTEGER
If you use the simple parse_method then you can parse the timestamp easily with the following pattern yyyy-MM-dd HH:mm:ss ZZZ that you can add to your date filter.
filter {
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss ZZZ" ]
}
}
Regarding the last point, I suggest checking the since_* settings which allow you to keep a cursor of what's been already processed and only start from that cursor on the next logstash restart.
I have the following filter
filter {
grok {
break_on_match => false
match => { 'message' => '\[(?<log_time>\d{0,2}\/\d{0,2}\/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3})\]%{SPACE}%{BASE16NUM}%{SPACE}%{WORD:system_stat}%{GREEDYDATA}\]%{SPACE}%{LOGLEVEL}%{SPACE}(?<log_method>[a-zA-Z\.]+)%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}%{SPACE}#%{SPACE}%{IP:app_host}:%{INT:app_port};%{SPACE}%{GREEDYDATA}Host:%{IPORHOST:host_name}:%{POSINT:host_port}' }
match => { 'message' => '\[(?<log_time>\d{0,2}\/\d{0,2}\/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3})\]'}
}
kv{
field_split => "\n;"
value_split => "=:"
trimkey => "<>\[\],;\n"
trim => "<>\[\],;\n"
}
date{
match => [ "log_time","MM/dd/YY HH:mm:ss:SSS z" ]
target => "log_time"
locale => "en"
}
mutate {
convert => {
"line_number" => "integer"
"app_port" => "integer"
"host_port" => "integer"
"et" => "integer"
}
#remove_field => [ "message" ]
}
mutate {
rename => {
"et" => "execution_time"
"URI" => "uri"
"Method" => "method"
}
}
}
i can get results out of the grok and kv filters but neither of the mutate filters work. Is it because of the kv filter?
EDIT: Purpose
my problem is that my log contains heterogenous log records. For example
[9/13/16 15:01:18:301 EDT] 89798797 SystemErr jbhsdbhbdv [vjnwnvurnuvuv] INFO djsbbdyebycbe - Filter.doFilter(..) took 0 ms.
[9/13/16 15:01:18:302 EDT] 4353453443 SystemErr sdgegrebrb [dbebtrntn] INFO sverbrebtnnrb - [SECURITY AUDIT] Received request from: "null" # wrvrbtbtbtf:000222; Headers=Host:vervreertherg:1111
Connection:keep-alive
User-Agent:Mozilla/5.0
Accept:text/css,*/*;q=0.1
Referer:https:kokokfuwnvuwnev/ikvdwninirnv/inwengi
Accept-Encoding:gzip
Accept-Language:en-US,en;q=0.8
; Body=; Method=GET; URI=dasd/wgomnwiregnm/iwenviewn; et=10ms; SC=200
all i care about is capturing the timestamp at the beginning of each record and a few other fields if they are present. i want Method,et,Host,loglevel and URI. If these fields are not present, i still want to capture the event with the loglevel and the message being logged.
is it advisable to capture such events using the same logstash process? should i be running two logstash processes? The problem is that i dont know the structure of the logs beforehand, apart from the few fields that i do want to capture.
Multiline config
path => ["path to log"]
start_position => "beginning"
ignore_older => 0
sincedb_path => "/dev/null"
codec => multiline {
pattern => "^\[\d{0,2}\/\d{0,2}\/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3}\]"
negate => "true"
what => "previous"
Maybe it is because some fields (line_number, et, URI, Method) aren't being created during the initial grok. For example, I see you define "log_method" but in mutate->rename, you refer to "Method". Is there a json codec or something applied in the input block that adds these extra fields?
If you post sample logs, I can test them with your filter and help you more. :)
EDIT:
I see that the log you sent has multiple lines. Are you using a multiline filter on input? Could you share your input block as well?
You definitely don't need to run two Logstash processes. One Logstash can take care of multiple log formats. You can use conditionals, try/catch, or mark the fields as optional by adding a '?' after.
MORE EDIT:
I'm getting output that implies that your mutate filters work:
"execution_time" => 10,
"uri" => "dasd/wgomnwiregnm/iwenviewn",
"method" => "GET"
once I changed trimkey => "<>\[\],;\n" to trimkey => "<>\[\],;( )?\n". I noticed that those fields (et, Method) were being prefixed with a space.
Note: I'm using the following multiline filter for testing, if yours is different it would affect the outcome. Let me know if that helps.
codec => multiline {
pattern => "\n"
negate => true
what => previous
}
I asked another question eairler which I think might be related to this question:
JSON parser in logstash ignoring data?
The reason I think it's related is because in the previous question kibana wasn't displaying results from the JSON parser which have the "PROGRAM" field as "mfd_status". Now I'm changing the way I do things, removed the JSON parser just in case it might be interfering with stuff, but I still don't have any logs with "mfd_status" in them showing up.
csv
{
columns => ["unixTime", "unixTime2", "FACILITY_NUM", "LEVEL_NUM", "PROGRAM", "PID", "MSG_FULL"]
source => "message"
separator => " "
}
In my filter from the previous question I used two grok filters, now I've replaced them with a csv filter. I also have two date and a fingerprint filter but they're irrelevant for this question, I think.
Example log messages:
"1452564798.76\t1452496397.00\t1\t4\tkernel\t\t[ 6252.000246] sonar: sonar_write(): waiting..."
OUTPUT:
"unixTime" => "1452564798.76",
"unixTime2" => "1452496397.00",
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "4",
"PROGRAM" => "kernel",
"PID" => nil,
"MSG_FULL" => "[ 6252.000246] sonar: sonar_write(): waiting...",
"TIMESTAMP" => "2016-01-12T02:13:18.760Z",
"TIMESTAMP_second" => "2016-01-11T07:13:17.000Z"
"1452564804.57\t1452496403.00\t1\t7\tmfd_status\t\t00800F08CFB0\textra\t{\"date\":1452543203,\"host\":\"ABCD1234\",\"inet\":[\"169.254.42.207/16\",\"10.8.207.176/32\",\"172.22.42.207/16\"],\"fb0\":[\"U:1280x800p-60\",32]}"
OUTPUT:
"tags" => [
[0] "_csvparsefailure"
After it says kernel/mfd_status in the logs, there shouldn't be any more deliminators and it should all go under the MSG_FULL field.
So, to summarize, why does one of my log messages parse correctly and the other one not? Also, even if it doesn't parse correctly it should still send it to elasticsearch just with empty fields, I think, why doesn't it do that either?
You're almost good, you need to override two more parameters in your CSV filter and both lines will be parsed correctly.
The first is skip_empty_columns => true because you have one empty field in your second log line and you need to ignore it.
The second is quote_char=> "'" (or anything else than the double quote ") since your JSON contain double quotes.
csv {
columns => ["unixTime", "unixTime2", "FACILITY_NUM", "LEVEL_NUM", "PROGRAM", "PID", "MSG_FULL"]
source => "message"
separator => " "
skip_empty_columns => true
quote_char => "'"
}
Using this, your first log line parses as:
{
"message" => "1452564798.76\\t1452496397.00\\t1\\t4\\tkernel\\t\\t[ 6252.000246] sonar: sonar_write(): waiting...",
"#version" => "1",
"#timestamp" => "2016-01-12T04:21:34.051Z",
"host" => "iMac.local",
"unixTime" => "1452564798.76",
"unixTime2" => "1452496397.00",
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "4",
"PROGRAM" => "kernel",
"MSG_FULL" => "[ 6252.000246] sonar: sonar_write(): waiting..."
}
And the second log lines parses as:
{
"message" => "1452564804.57\\t1452496403.00\\t1\\t7\\tmfd_status\\t\\t00800F08CFB0\\textra\\t{\\\"date\\\":1452543203,\\\"host\\\":\\\"ABCD1234\\\",\\\"inet\\\":[\\\"169.254.42.207/16\\\",\\\"10.8.207.176/32\\\",\\\"172.22.42.207/16\\\"],\\\"fb0\\\":[\\\"U:1280x800p-60\\\",32]}",
"#version" => "1",
"#timestamp" => "2016-01-12T04:21:07.974Z",
"host" => "iMac.local",
"unixTime" => "1452564804.57",
"unixTime2" => "1452496403.00",
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "7",
"PROGRAM" => "mfd_status",
"MSG_FULL" => "00800F08CFB0",
"column8" => "extra",
"column9" => "{\\\"date\\\":1452543203,\\\"host\\\":\\\"ABCD1234\\\",\\\"inet\\\":[\\\"169.254.42.207/16\\\",\\\"10.8.207.176/32\\\",\\\"172.22.42.207/16\\\"],\\\"fb0\\\":[\\\"U:1280x800p-60\\\",32]}"
}
Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using on a Ubuntu 14.04 LTS machine Logstash 1.4.2-1-2-2c0f5a1, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text in several parts so that it is easier to read.
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
"_index": "logstash-2014.08.06",
"_type": "customType",
"_id": "PRtj-EiUTZK3HWAm5RiMwA",
"_score": null,
"_source": {
"#timestamp": "2014-08-06T08:51:21.160Z",
"#version": "1",
"tags": [
"multiline"
],
"type": "utg-su",
"host": "ubuntu-14",
"path": "/mnt/folder/thisIsTheLogFile.log",
"logTimestamp": "2014-08-05;10:21:13.618",
"logThreadId": "17",
"logLevel": "INFO",
"logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
},
"sort": [
"21",
1407315081160
]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfull tried to use the date filter in multiple ways, and it apparently did not work.
date {
locale => "en"
match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to elasticsearch, but date does apparently nothing, as #timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
As you can see below:
When sorting over #timestamp, elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather when the logstash event was read, I need (obviously) to be able to sort also over logTimestamp. This is what then is output. Obviously not that useful:
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
# You cannot write an "if" outside of the filter!
if "type-" in [type] {
grok {
# Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
patterns_dir => "./patterns"
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
}
# The timestamp may have commas instead of dots. Convert so as to store everything in the same way
mutate {
gsub => [
# replace all commas with dots
"logTimestampString", ",", "."
]
}
mutate {
gsub => [
# make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
# but somehow apparently makes things easier for the date filter
"logTimestampString", " ", ";"
]
}
date {
locale => "en"
match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "logTimestamp"
}
}
}
filter {
if "type-" in [type] {
# Remove already-parsed data
mutate {
remove_field => [ "message" ]
}
}
}
I have tested your date filter. it works on me!
Here is my configuration
input {
stdin{}
}
filter {
date {
locale => "en"
match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
}
output {
stdout {
codec => "rubydebug"
}
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
"message" => "2014-08-01;11:00:22.123",
"#version" => "1",
"#timestamp" => "2014-08-01T09:00:22.123Z",
"host" => "ABCDE",
"debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value.
It is probably other problem. Or can you provide your log event and logstash configuration for more discussion. Thank you.
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
file {
path => "/data/atlassian-bamboo.log"
start_position => "beginning"
type => "logs"
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601} "
charset => "ISO-8859-1"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
}
date {
match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
timezone => "Europe/Berlin"
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout { codec => rubydebug }
}