Logstash Filtering and Parsing Deis Output - ruby

Environment
Ubuntu 16.04
Logstash 5.2.1
ElasticSearch 5.1
I've configured our Deis platform to send logs to our Logstash node with no issues. However, I'm still new to Ruby, and regexes are not my strong suit.
Log Example:
2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n
Logstash Configuration:
input {
tcp {
port => 5000
type => syslog
codec => plain
}
udp {
port => 5000
type => syslog
codec => plain
}
}
filter {
json {
source => "syslog_message"
}
}
output {
elasticsearch { hosts => ["foo.somehost"] }
}
Elasticsearch output:
"#timestamp" => 2017-02-15T14:55:24.408Z,
"#version" => "1",
"host" => "x.x.x.x",
"message" => "2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n",
"type" => "json"
Desired outcome:
"#timestamp" => 2017-02-15T14:55:24.408Z,
"#version" => "1",
"host" => "x.x.x.x",
"type" => "json"
"container" => "deis-logspout"
"severity level" => "Info"
"message" => "routing all to udp://x.x.x.x:xxxx\n"
How can I extract the information out of the message into individual fields?

Unfortunately your assumptions about what you are trying to do are slightly off, but we can fix that!
You configured a json filter, but you are not parsing JSON. You are simply parsing a log that is bastardized syslog (see syslogStreamer in the source), but is not in fact in syslog format (neither RFC 5424 nor RFC 3164). Logstash afterwards provides JSON output.
Let's break down the message, which becomes the source that you parse. The key is that you have to parse the message from front to back.
Message:
2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n
2017-02-15T14:55:24UTC: Timestamp is a common grok pattern. This mostly follows TIMESTAMP_ISO8601, but not quite.
deis-logspout[1]: This would be your logsource, which you can name container. You can use the grok pattern URIHOST.
2017/02/15 14:55:24: Another timestamp (why?) that doesn't match common grok patterns.
routing all to udp://x.x.x.x:xxxx\n: Since the message for most logs is contained at the end of the message, you can just use the grok pattern GREEDYDATA, which is the equivalent of .* in a regular expression.
With grok filters, you can map a syntax (an abstraction over regular expressions) to a semantic (a name for the value that you extract), for example %{URIHOST:container}.
You'll see I did some hacking together of the grok filters to make the formatting work. You have to match parts of the text even if you don't intend to capture the results. If you can't change the formatting of the timestamps to match a standard, create a custom pattern.
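As a rough sketch of a custom pattern (the name DEIS_TIMESTAMP and the ./patterns directory are my own placeholders, not something from your setup), put a line like this in a file under ./patterns:
DEIS_TIMESTAMP %{TIMESTAMP_ISO8601}(?:UTC|CST|EST|PST)
and reference it from the grok filter:
grok {
  patterns_dir => ["./patterns"]
  match => { "message" => "%{DEIS_TIMESTAMP:timestamp} %{URIHOST:container}\[%{NUMBER}\]: %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME} %{GREEDYDATA:msg}" }
}
Note that, captured this way, the timestamp field keeps the trailing zone abbreviation, so a date filter would need to account for that.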
Configuration:
input {
tcp {
port => 5000
type => deis
}
udp {
port => 5000
type => deis
}
}
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}(UTC|CST|EST|PST) %{URIHOST:container}\[%{NUMBER}\]: %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME} %{GREEDYDATA:msg}" }
}
}
output {
elasticsearch { hosts => ["foo.somehost"] }
}
Output:
{
"container" => "deis-logspout",
"msg" => "routing all to udp://x.x.x.x:xxxx",
"#timestamp" => 2017-02-22T23:55:28.319Z,
"port" => 62886,
"#version" => "1",
"host" => "10.0.2.2",
"message" => "2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx",
"timestamp" => "2017-02-15T14:55:24"
"type" => "deis"
}
You can additionally use a mutate filter to drop fields such as @timestamp, host, etc., as these are provided by Logstash by default. Another suggestion is to use the date filter to convert any timestamps found into usable formats (better for searching).
Depending on the log formatting, you may have to slightly alter the pattern; I only had one example to go off of. This also maintains the original full message, because any field operations done in Logstash are destructive (they overwrite the values of fields with the same name).
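As a minimal sketch of that date filter suggestion (the field name follows the grok above; the exact formats are an assumption):
filter {
  # turn the captured string into a real timestamp (written to @timestamp by default)
  date {
    match => [ "timestamp", "ISO8601", "yyyy-MM-dd'T'HH:mm:ss" ]
  }
  # optionally drop the now-redundant string field
  mutate {
    remove_field => [ "timestamp" ]
  }
}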
Resources:
Grok
Grok Patterns
Grok Debugger

Related

Filter Nginx log in ELK by custom pattern

I have installed Elasticsearch + Logstash + Kibana 7.11.0 using Docker on an Ubuntu server. On this server I have Nginx with a custom log format, and I also installed Filebeat to tail logs and push them to ELK.
Now, in the Kibana dashboard -> Discover section, I have all the logs. On the right side, I see some filter fields. One of them is "message", which contains the exact content of each log line, like this:
10.20.30.40 - [19/Feb/2021:18:10:49 +0000] "GET /blog/post/1 HTTP/2.0" - [sts: 200] "https://google.com" "Mozilla/5.0 (Linux; Android 11; SM-N975F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Mobile Safari/537.36" "-" "dns.com" rbs=103 sn=*.dns.com rt=0.002 uadd=127.0.0.1:3000 us=200 urt=0.000 url=103 rid=b694742bf2cca075d33bada95ce2c46f pck="cachekey-1010265" ucs=-
I have a custom GROK pattern for my log file and here is my logstash.conf content:
input {
beats {
port => 5044
}
tcp {
port => 5000
}
}
filter {
grok {
match => [ "message" , '%{IPORHOST:ip} (?:-|(%{WORD})) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} %{NOTSPACE:uri}(?: HTTP/%{NUMBER:httpversion})?|-)\" - \[sts\: (?:%{WORD:response})\] \"%{NOTSPACE:referrer}\" \"%{DATA:http_user_agent}\" \"(?:-|())\" \"(?:%{NOTSPACE:hostname})\" rbs=(?:%{WORD:body_bytes_sent}) sn=(?:%{NOTSPACE:server_name}) rt=(?:%{NOTSPACE:request_time}) uadd=(?:%{IPORHOST:upstream_addr}):%{NUMBER:upstream_port} us=(?:%{NUMBER:upstream_status}) urt=(?:%{NOTSPACE:upstream_response_time}) url=(?:%{NUMBER:upstream_response_length}) rid=(?:%{WORD:request_id}) pck=(?:%{NOTSPACE:cache_key}) ucs=(?:%{NOTSPACE:upstream_cache_status})']
overwrite => [ "message" ]
}
mutate {
convert => ["response", "integer"]
convert => ["bytes", "integer"]
convert => ["responsetime", "float"]
}
geoip {
source => "clientip"
target => "geoip"
add_tag => [ "nginx-geoip" ]
}
date {
match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
remove_field => [ "timestamp" ]
}
useragent {
source => "agent"
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
user => "elastic"
password => "MY_PASS"
index => "nginx-%{+YYYY.MM.dd}"
document_type => "nginx_logs"
}
stdout { codec => rubydebug }
}
My question is: how can I use my GROK pattern on it? How can I filter my logs based on GeoIP, time duration, referrer URL, or other fields in my log? I do not understand it yet!
Here is the filters section where I selected the "message" field to show me the real log lines:
A few points based on your screenshot and the rest of the details you've provided:
The message field by default contains exactly the log line which is sent by Filebeat to Logstash and then to the index (nginx-* in your case).
Each log line gets stored in a separate event called a "document" with the above-mentioned message field as one of the fields.
For every log, you should also be able to see the rest of the fields parsed out separately, according to the grok pattern you have applied. If not, you will see the field "tags" with the value "_grokparsefailure" for that specific log event/document. This means the grok pattern you tried to use is not appropriate for that log line, because the line has a different structure than the pattern expects (a sketch for catching such failures follows these points).
Also, you might want to check whether you created the index pattern with a time field such as @timestamp (or any other time field) so that you can apply a range filter and view events based on the time they occurred.
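If you want to catch those unparsed lines rather than lose track of them, a rough sketch (the failure index name is just an example; credentials omitted) is to route on that tag in the output:
output {
  if "_grokparsefailure" in [tags] {
    # lines the grok pattern could not parse
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "nginx-unparsed-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  }
}
You can then browse the nginx-unparsed-* index to see which lines need a different pattern.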
Happy log parsing :)

Elasticsearch Logstash Filebeat mapping

I'm having a problem with the ELK Stack + Filebeat.
Filebeat is sending apache-like logs to Logstash, which should be parsing the lines. Elasticsearch should be storing the split data in fields so I can visualize them using Kibana.
Problem:
Elasticsearch receives the logs but stores them in a single "message" field.
Desired solution:
Input:
10.0.0.1 some.hostname.at - [27/Jun/2017:23:59:59 +0200]
ES:
"ip":"10.0.0.1"
"hostname":"some.hostname.at"
"timestamp":"27/Jun/2017:23:59:59 +0200"
My logstash configuration:
input {
beats {
port => 5044
}
}
filter {
if [type] == "web-apache" {
grok {
patterns_dir => ["./patterns"]
match => { "message" => "IP: %{IPV4:client_ip}, Hostname: %{HOSTNAME:hostname}, - \[timestamp: %{HTTPDATE:timestamp}\]" }
break_on_match => false
remove_field => [ "message" ]
}
date {
locale => "en"
timezone => "Europe/Vienna"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
useragent {
source => "agent"
prefix => "browser_"
}
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["localhost:9200"]
index => "test1"
document_type => "accessAPI"
}
}
My Elasticsearch discover output:
I hope there are some ELK experts around who can help me.
Thank you in advance,
Matthias
The grok filter you stated will not work here.
Try using:
%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]
There is no need to specify desired names separately in front of the field names (you're not trying to format the message here, but to extract separate fields); just stating the field name after the ':' inside the brackets will lead to the result you want.
Also, use the overwrite option instead of remove_field for message.
More information here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-options
It will look similar to that in the end:
filter {
grok {
match => { "message" => "%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]" }
overwrite => [ "message" ]
}
}
You can test grok filters here:
http://grokconstructor.appspot.com/do/match

ElasticSearch - not setting the date type

I am trying the ELK stack, and so far so good :)
I have run into a strange situation regarding parsing the date field and sending it to Elasticsearch. I manage to parse the field, and it really gets created in Elasticsearch, but it always ends up as a string.
I have tried many different combinations, and many different things that people suggested, but I still fail.
This is my setup:
The strings that come from Filebeat:
[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {"key":"securitysecured_area"} []
[2017-04-26 09:50:42] request.INFO: Matched route "home_logged_in". {"route_parameters":{"controller":"AppBundle\Controller\HomeLoggedInController::showAction","locale":"de","route":"homelogged_in"},"request_uri":"https://qa.someserver.de/de/home"} []
The logstash parsing section:
if [@metadata][type] == "feprod" or [@metadata][type] == "feqa" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
}
date {
#timezone => "Europe/Berlin"
match => [ "logdate", "yyyy-MM-dd HH:mm:ss"]
}
}
According to the documentation, my @timestamp field should be overwritten with the logdate value. But it is not happening.
In Elasticsearch I can see that the field logdate is being created and it has the value 2017-04-26 09:40:33, but its type is string.
I always create the index from scratch: I delete it first and let Logstash populate it.
I need either @timestamp overwritten with the actual date (not the date when it was indexed), or the logdate field created with the date type. Either is fine.
Unless you are explicitly adding [@metadata][type] somewhere that you aren't showing, that is your problem. It's not set by default; [type] is set by default from the 'type =>' parameter on your input.
You can validate this with a minimal complete example:
input {
stdin {
type=>'feprod'
}
}
filter {
if [@metadata][type] == "feprod" or [@metadata][type] == "feqa" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
}
date {
match => [ "logdate", "yyyy-MM-dd HH:mm:ss"]
}
}
}
output {
stdout { codec => "rubydebug" }
}
And running it:
echo '[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {"key":"securitysecured_area"} []' | bin/logstash -f test.conf
And getting the output:
{
"#timestamp" => 2017-05-02T15:15:05.875Z,
"#version" => "1",
"host" => "xxxxxxxxx",
"message" => "[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {\"key\":\"securitysecured_area\"} []",
"type" => "feprod",
"tags" => []
}
If you use just if [type] == ... instead, it will work fine:
{
"#timestamp" => 2017-04-26T14:40:33.000Z,
"logdate" => "2017-04-26 09:40:33",
"#version" => "1",
"host" => "xxxxxxxxx",
"message" => "[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {\"key\":\"securitysecured_area\"} []",
"type" => "feprod",
"tags" => []
}
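For completeness, here is a sketch of the corrected filter block (only the conditional changes; the grok and date filters are yours):
filter {
  if [type] == "feprod" or [type] == "feqa" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
    }
    date {
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}
Alternatively, if you really do want to key off [@metadata][type], you would have to set it yourself, for example with add_field => { "[@metadata][type]" => "feprod" } on the input.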

Logstash : Mutate filter does not work

I have the following filter
filter {
grok {
break_on_match => false
match => { 'message' => '\[(?<log_time>\d{0,2}\/\d{0,2}\/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3})\]%{SPACE}%{BASE16NUM}%{SPACE}%{WORD:system_stat}%{GREEDYDATA}\]%{SPACE}%{LOGLEVEL}%{SPACE}(?<log_method>[a-zA-Z\.]+)%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}%{SPACE}#%{SPACE}%{IP:app_host}:%{INT:app_port};%{SPACE}%{GREEDYDATA}Host:%{IPORHOST:host_name}:%{POSINT:host_port}' }
match => { 'message' => '\[(?<log_time>\d{0,2}\/\d{0,2}\/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3})\]'}
}
kv{
field_split => "\n;"
value_split => "=:"
trimkey => "<>\[\],;\n"
trim => "<>\[\],;\n"
}
date{
match => [ "log_time","MM/dd/YY HH:mm:ss:SSS z" ]
target => "log_time"
locale => "en"
}
mutate {
convert => {
"line_number" => "integer"
"app_port" => "integer"
"host_port" => "integer"
"et" => "integer"
}
#remove_field => [ "message" ]
}
mutate {
rename => {
"et" => "execution_time"
"URI" => "uri"
"Method" => "method"
}
}
}
I can get results out of the grok and kv filters, but neither of the mutate filters works. Is it because of the kv filter?
EDIT: Purpose
My problem is that my log contains heterogeneous log records. For example:
[9/13/16 15:01:18:301 EDT] 89798797 SystemErr jbhsdbhbdv [vjnwnvurnuvuv] INFO djsbbdyebycbe - Filter.doFilter(..) took 0 ms.
[9/13/16 15:01:18:302 EDT] 4353453443 SystemErr sdgegrebrb [dbebtrntn] INFO sverbrebtnnrb - [SECURITY AUDIT] Received request from: "null" # wrvrbtbtbtf:000222; Headers=Host:vervreertherg:1111
Connection:keep-alive
User-Agent:Mozilla/5.0
Accept:text/css,*/*;q=0.1
Referer:https:kokokfuwnvuwnev/ikvdwninirnv/inwengi
Accept-Encoding:gzip
Accept-Language:en-US,en;q=0.8
; Body=; Method=GET; URI=dasd/wgomnwiregnm/iwenviewn; et=10ms; SC=200
All I care about is capturing the timestamp at the beginning of each record and a few other fields if they are present. I want Method, et, Host, loglevel and URI. If these fields are not present, I still want to capture the event with the loglevel and the message being logged.
Is it advisable to capture such events using the same Logstash process? Should I be running two Logstash processes? The problem is that I don't know the structure of the logs beforehand, apart from the few fields that I do want to capture.
Multiline config
path => ["path to log"]
start_position => "beginning"
ignore_older => 0
sincedb_path => "/dev/null"
codec => multiline {
pattern => "^\[\d{0,2}\/\d{0,2}\/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]{3}\]"
negate => "true"
what => "previous"
Maybe it is because some fields (line_number, et, URI, Method) aren't being created during the initial grok. For example, I see you define "log_method" but in mutate->rename, you refer to "Method". Is there a json codec or something applied in the input block that adds these extra fields?
If you post sample logs, I can test them with your filter and help you more. :)
EDIT:
I see that the log you sent has multiple lines. Are you using a multiline filter on input? Could you share your input block as well?
You definitely don't need to run two Logstash processes. One Logstash can take care of multiple log formats. You can use conditionals, try/catch, or mark the fields as optional by adding a '?' after.
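For example, a rough sketch of a conditional (purely illustrative; the SECURITY AUDIT marker comes from your second sample line) that only runs the kv and mutate steps on records that actually carry the key/value section:
filter {
  # your existing grok filter stays as-is above this point
  if [message] =~ /SECURITY AUDIT/ {
    kv {
      field_split => "\n;"
      value_split => "=:"
    }
    mutate {
      rename => { "et" => "execution_time" }
    }
  }
}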
MORE EDIT:
I'm getting output that implies that your mutate filters work:
"execution_time" => 10,
"uri" => "dasd/wgomnwiregnm/iwenviewn",
"method" => "GET"
once I changed trimkey => "<>\[\],;\n" to trimkey => "<>\[\],;( )?\n". I noticed that those fields (et, Method) were being prefixed with a space.
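Applied to your kv filter, that change would look like this (only trimkey differs from your original):
kv {
  field_split => "\n;"
  value_split => "=:"
  trimkey => "<>\[\],;( )?\n"
  trim => "<>\[\],;\n"
}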
Note: I'm using the following multiline filter for testing; if yours is different, it would affect the outcome. Let me know if that helps.
codec => multiline {
pattern => "\n"
negate => true
what => previous
}

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using Logstash 1.4.2-1-2-2c0f5a1 on an Ubuntu 14.04 LTS machine, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text in several parts so that it is easier to read.
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
"_index": "logstash-2014.08.06",
"_type": "customType",
"_id": "PRtj-EiUTZK3HWAm5RiMwA",
"_score": null,
"_source": {
"#timestamp": "2014-08-06T08:51:21.160Z",
"#version": "1",
"tags": [
"multiline"
],
"type": "utg-su",
"host": "ubuntu-14",
"path": "/mnt/folder/thisIsTheLogFile.log",
"logTimestamp": "2014-08-05;10:21:13.618",
"logThreadId": "17",
"logLevel": "INFO",
"logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
},
"sort": [
"21",
1407315081160
]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfully tried to use the date filter in multiple ways, and it apparently did not work.
date {
locale => "en"
match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to Elasticsearch, but the date filter apparently does nothing, as @timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
As you can see below:
When sorting over @timestamp, Elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather when the Logstash event was read, I (obviously) need to be able to sort also over logTimestamp. This is what is then output. Obviously not that useful:
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
# You cannot write an "if" outside of the filter!
if "type-" in [type] {
grok {
# Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
patterns_dir => "./patterns"
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
}
# The timestamp may have commas instead of dots. Convert so as to store everything in the same way
mutate {
gsub => [
# replace all commas with dots
"logTimestampString", ",", "."
]
}
mutate {
gsub => [
# make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
# but somehow apparently makes things easier for the date filter
"logTimestampString", " ", ";"
]
}
date {
locale => "en"
match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "logTimestamp"
}
}
}
filter {
if "type-" in [type] {
# Remove already-parsed data
mutate {
remove_field => [ "message" ]
}
}
}
I have tested your date filter. It works for me!
Here is my configuration
input {
stdin{}
}
filter {
date {
locale => "en"
match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
}
output {
stdout {
codec => "rubydebug"
}
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
"message" => "2014-08-01;11:00:22.123",
"#version" => "1",
"#timestamp" => "2014-08-01T09:00:22.123Z",
"host" => "ABCDE",
"debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value.
It is probably some other problem. Can you provide your log event and Logstash configuration for more discussion? Thank you.
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
file {
path => "/data/atlassian-bamboo.log"
start_position => "beginning"
type => "logs"
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601} "
charset => "ISO-8859-1"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
}
date {
match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
timezone => "Europe/Berlin"
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout { codec => rubydebug }
}
