Logstash and Elasticsearch: split up values within a value

Just getting started with Logstash and Elasticsearch.
Below is my log:
2015-09-09 16:02:23 GET /NeedA/some1/some2/some3/NeedB/some4/NeedC f=json - 127.0.0.1 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_10_5)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/44.0.2403.157+Safari/537.36 http://localhost:3000/ 200 373 554 46
Using the config file below, I was able to separate out the URL:
/NeedA/some1/some2/some3/NeedB/some4/NeedC
filter {
grok {
match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:method} %{URIPATH:url} %{NOTSPACE:querystring} %{NOTSPACE:username} %{IPORHOST:ipaddress} %{NOTSPACE:useragent} %{NOTSPACE:referer} %{NUMBER:scstatus} %{NUMBER:scbytes:int} %{NUMBER:csbytes:int} %{NUMBER:timetaken:int}"]
}
date {
match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
timezone => "Etc/UCT"
}
}
Question:
How do I separate out NeedA, NeedB, and NeedC from /NeedA/some1/some2/some3/NeedB/some4/NeedC and put them into different fields in Elasticsearch?

Here is the solution:
grok {
match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:method} \/%{WORD:fieldA}\/.*\/.*\/.*\/%{WORD:fieldB}\/.*\/%{WORD:fieldC} %{NOTSPACE:querystring} %{NOTSPACE:username} %{IPORHOST:ipaddress} %{NOTSPACE:useragent} %{NOTSPACE:referer} %{NUMBER:scstatus} %{NUMBER:scbytes:int} %{NUMBER:csbytes:int} %{NUMBER:timetaken:int}"]
}
In your grok, just replace %{URIPATH:url} with \/%{WORD:fieldA}\/.*\/.*\/.*\/%{WORD:fieldB}\/.*\/%{WORD:fieldC}. (A variation that also keeps the url field is sketched after the output below.)
The output result:
{
"message" => "2015-09-09 16:02:23 GET /NeedA/some1/some2/some3/NeedB/some4/NeedC f=json - 127.0.0.1 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_10_5)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/44.0.2403.157+Safari/537.36 http://localhost:3000/ 200 373 554 46",
"#version" => "1",
"#timestamp" => "2015-09-09T16:02:23.000Z",
"host" => "MyHost.local",
"path" => "/path/of/test.log",
"log_timestamp" => "2015-09-09 16:02:23",
"method" => "GET",
"fieldA" => "NeedA",
"fieldB" => "NeedB",
"fieldC" => "NeedC",
"querystring" => "f=json",
"username" => "-",
"ipaddress" => "127.0.0.1",
"useragent" => "Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_10_5)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/44.0.2403.157+Safari/537.36",
"referer" => "http://localhost:3000/",
"scstatus" => "200",
"scbytes" => 373,
"csbytes" => 554,
"timetaken" => 46
}
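If you also want to keep the whole path around, a variation on this answer (just a sketch, not part of the original reply) is to leave %{URIPATH:url} in the first grok untouched and add a second grok pass over the extracted url field; the fieldA/fieldB/fieldC names mirror the answer above:
grok {
  # second pass over the url field only; assumes the same /NeedA/x/x/x/NeedB/x/NeedC layout
  match => ["url", "^/%{WORD:fieldA}/.*/.*/.*/%{WORD:fieldB}/.*/%{WORD:fieldC}$"]
}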

Related

logstash: create fingerprint from timestamp part

I have a problem creating a fingerprint based on the client IP and a timestamp containing date + hour.
I'm using Logstash 7.3.1. Here is the relevant part of my configuration file:
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date{
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
...
ruby{
code => "
keydate = Date.parse(event.get('timestamp'))
event.set('keydate', keydate.strftime('%Y%m%d-%H'))
"
}
fingerprint {
key => "my_custom_secret"
method => "SHA256"
concatenate_sources => "true"
source => [
"clientip",
"keydate"
]
}
}
The problem is in the 'ruby' block. I tried multiple methods to compute the keydate, but none of them works without giving me errors.
The last one (using this config file) is:
[ERROR][logstash.filters.ruby ] Ruby exception occurred: Missing Converter handling for full class name=org.jruby.ext.date.RubyDateTime, simple name=RubyDateTime
input document
{
"timestamp" => "19/Sep/2019:00:07:56 +0200",
"referrer" => "-",
"#version" => "1",
"#timestamp" => 2019-09-18T22:07:56.000Z,
...
"request" => "index.php",
"type" => "apache_access",
"clientip" => "54.157.XXX.XXX",
"verb" => "GET",
...
"tags" => [
[0] "_rubyexception" # generated by the ruby exception above
],
"response" => "200"
}
expected output
{
"timestamp" => "19/Sep/2019:00:07:56 +0200",
"referrer" => "-",
"#version" => "1",
"#timestamp" => 2019-09-18T22:07:56.000Z,
...
"request" => "index.php",
"type" => "apache_access",
"clientip" => "54.157.XXX.XXX",
"verb" => "GET",
...
"keydate" => "20190919-00", #format : YYYYMMDD-HH
"fingerprint" => "ab347766ef....1190af",
"response" => "200"
}
As always, many thanks for all your help!
I advise removing the ruby snippet and using the built-in date filter: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
What you are doing in the ruby snippet is essentially what the date filter does: extract a timestamp from a field and reconstruct it into your desired format.
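For illustration, a minimal date filter along those lines might look like the following (keydate_ts is a made-up field name here; note that it yields a timestamp-typed field rather than the literal YYYYMMDD-HH string, so it mainly replaces the Date.parse step):
date {
  # same pattern as the existing date filter, but parsed into its own target field
  match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  target => "keydate_ts"
}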
Another option (a bit less recommended, but it will also work) is to use grok to extract the relevant parts of the timestamp and combine them in a different manner, as in the sketch below.
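A sketch of that grok option (the ts_* field names are invented here; %{MONTH} captures the abbreviation such as "Sep" rather than a numeric month, so a purely numeric key would still need an extra mapping step):
grok {
  # pull the date and hour parts straight out of the raw timestamp field
  match => { "timestamp" => "%{MONTHDAY:ts_day}/%{MONTH:ts_month}/%{YEAR:ts_year}:%{HOUR:ts_hour}" }
}
mutate {
  # glue the captured parts together into a single key field
  add_field => { "keydate" => "%{ts_year}%{ts_month}%{ts_day}-%{ts_hour}" }
}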

Invalid FieldReference occurred when attempting to create index with the same name as request path value using ElasticSearch output

This is my logstash.conf file:
input {
http {
host => "127.0.0.1"
port => 31311
}
}
filter {
mutate {
split => ["%{headers.request_path}", "/"]
add_field => { "index_id" => "%{headers.request_path[0]}" }
add_field => { "document_id" => "%{headers.request_path[1]}" }
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "%{index_id}"
document_id => "%{document_id}"
}
stdout {
codec => "rubydebug"
}
}
When I send a PUT request like
C:\Users\BolverkXR\Downloads\curl-7.64.1-win64-mingw\bin> .\curl.exe
-XPUT 'http://127.0.0.1:31311/twitter'
I want a new index to be created with the name twitter, instead of using the ElasticSearch default.
However, Logstash crashes immediately with the following (truncated) error message:
Exception in pipelineworker, the pipeline stopped processing new
events, please check your filter configuration and restart Logstash.
org.logstash.FieldReference$IllegalSyntaxException: Invalid
FieldReference: headers.request_path[0]
I am sure I have made a syntax error somewhere, but I can't see where it is. How can I fix this?
EDIT:
The same error occurs when I change the filter segment to the following:
filter {
mutate {
split => ["%{[headers][request_path]}", "/"]
add_field => { "index_id" => "%{[headers][request_path][0]}" }
add_field => { "document_id" => "%{[headers][request_path][1]}" }
}
}
To split the field, the %{foo} syntax is not used. Also, you should start at position [1] of the array, because position [0] will hold an empty string (""), since there are no characters to the left of the first separator (/). Your filter section should instead look something like this:
filter {
mutate {
split => ["[headers][request_path]", "/"]
add_field => { "index_id" => "%{[headers][request_path][1]}" }
add_field => { "document_id" => "%{[headers][request_path][2]}" }
}
}
You can now use the values in %{index_id} and %{document_id}. I tested this with Logstash 6.5.3, using Postman to send a PUT request to 'http://127.0.0.1:31311/twitter/1', and the console output was as follows:
{
"message" => "",
"index_id" => "twitter",
"document_id" => "1",
"#version" => "1",
"host" => "127.0.0.1",
"#timestamp" => 2019-04-09T12:15:47.098Z,
"headers" => {
"connection" => "keep-alive",
"http_version" => "HTTP/1.1",
"http_accept" => "*/*",
"cache_control" => "no-cache",
"content_length" => "0",
"postman_token" => "cb81754f-6d1c-4e31-ac94-fde50c0fdbf8",
"accept_encoding" => "gzip, deflate",
"request_path" => [
[0] "",
[1] "twitter",
[2] "1"
],
"http_host" => "127.0.0.1:31311",
"http_user_agent" => "PostmanRuntime/7.6.1",
"request_method" => "PUT"
}
}
The output section of your configuration does not change. So, your final logstash.conf file will be something like this:
input {
http {
host => "127.0.0.1"
port => 31311
}
}
filter {
mutate {
split => ["[headers][request_path]", "/"]
add_field => { "index_id" => "%{[headers][request_path][1]}" }
add_field => { "document_id" => "%{[headers][request_path][2]}" }
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "%{index_id}"
document_id => "%{document_id}"
}
stdout {
codec => "rubydebug"
}
}
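For reference, a request equivalent to the one tested above can be sent with curl (same host, port, and path as in the question; the trailing 1 is just the example document id):
curl -XPUT 'http://127.0.0.1:31311/twitter/1'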

geoip.location is defined as an object in mapping [doc] but this name is already used for a field in other types

I'm getting this error:
Could not index event to Elasticsearch. {:status=>400,
:action=>["index", {:_id=>nil, :_index=>"nginx-access-2018-06-15",
:_type=>"doc", :_routing=>nil}, #],
:response=>{"index"=>{"_index"=>"nginx-access-2018-06-15",
"_type"=>"doc", "_id"=>"jo-rfGQBDK_ao1ZhmI8B", "status"=>400,
"error"=>{"type"=>"illegal_argument_exception",
"reason"=>"[geoip.location] is defined as an object in mapping [doc]
but this name is already used for a field in other types"}}}}
I'm getting the above error but don't understand why; this is loading into a brand-new ES instance with no data, and this is the first record being inserted. Why am I getting this error? Here is the config:
input {
file {
type => "nginx-access"
start_position => "beginning"
path => [ "/var/log/nginx-archived/access.log.small"]
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
if [type] == "nginx-access" {
grok {
patterns_dir => "/etc/logstash/patterns"
match => { "message" => "%{NGINX_ACCESS}" }
remove_tag => ["_grokparsefailure"]
}
geoip {
source => "visitor_ip"
}
date {
# 11/Jun/2018:06:23:45 +0000
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "#request_time"
}
if "_grokparsefailure" not in [tags] {
ruby {
code => "
thetime = event.get('@request_time').time
event.set('index_date', 'nginx-access-' + thetime.strftime('%Y-%m-%d'))
"
}
}
if "_grokparsefailure" in [tags] {
ruby {
code => "
event.set('index_date', 'nginx-access-error')
"
}
}
}
}
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "%{index_date}"
template => "/etc/logstash/templates/nginx-access.json"
template_overwrite => true
manage_template => true
template_name => "nginx-access"
}
stdout { }
}
Here's a sample record:
{
"method" => "GET",
"#version" => "1",
"geoip" => {
"continent_code" => "AS",
"latitude" => 39.9289,
"country_name" => "China",
"ip" => "220.181.108.103",
"location" => {
"lon" => 116.3883,
"lat" => 39.9289
},
"region_code" => "11",
"region_name" => "Beijing",
"longitude" => 116.3883,
"timezone" => "Asia/Shanghai",
"city_name" => "Beijing",
"country_code2" => "CN",
"country_code3" => "CN"
},
"index_date" => "nginx-access-2018-06-15",
"ignore" => "\"-\"",
"bytes" => "2723",
"request" => "/wp-login.php",
"#request_time" => 2018-06-15T06:29:40.000Z,
"message" => "220.181.108.103 - - [15/Jun/2018:06:29:40 +0000] \"GET /wp-login.php HTTP/1.1\" 200 2723 \"-\" \"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)\"",
"path" => "/var/log/nginx-archived/access.log.small",
"#timestamp" => 2018-07-09T01:32:56.952Z,
"host" => "ab1526efddec",
"visitor_ip" => "220.181.108.103",
"timestamp" => "15/Jun/2018:06:29:40 +0000",
"response" => "200",
"referrer" => "\"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)\"",
"httpversion" => "1.1",
"type" => "nginx-access"
}
Figured out the answer, based on this:
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/removal-of-types.html#_schedule_for_removal_of_mapping_types
The basic problem is that within a single Elasticsearch index, a given field must have the same type across all documents, even if the documents represent different kinds of records.
That is, if I have a person { "status": "A" } stored as text, I cannot have a record for a car { "status": 23 } stored as a number in the same index. Based on the info in the link above, I'm now storing one "type" per index.
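As a concrete illustration (hypothetical index name and documents, not from the original post): once a field has been mapped from a concrete value in an index, reusing the same field name for an object in that index is rejected, while putting the two kinds of records into separate indices works fine.
# first document maps "status" as a concrete (text) value
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/demo-records/doc/1' -d '{"status": "A"}'
# reusing "status" as an object in the same index fails with a mapping error
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/demo-records/doc/2' -d '{"status": {"code": 23}}'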
My output section for Logstash looks like this:
output {
elasticsearch {
hosts => "elasticsearch:9200"
index => "%{index_date}"
# Can test loading this with:
# curl -XPUT -H 'Content-Type: application/json' -d#/docker-elk/logstash/templates/nginx-access.json http://localhost:9200/_template/nginx-access
template => "/etc/logstash/templates/nginx-access.json"
template_overwrite => true
manage_template => true
template_name => "nginx-access"
}
stdout { }
}
My template looks like this:
{
"index_patterns": ["nginx-access*"],
"settings": {
},
"mappings": {
"doc": {
"_source": {
"enabled": true
},
"properties": {
"type" : { "type": "keyword" },
"response_time": { "type": "float" },
"geoip" : {
"properties" : {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
I'm also using the one type per index pattern described in the link above.
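If it helps, whether the template was actually applied to a freshly created index can be checked with a mapping request (the index name below is just the example from this post):
curl -XGET 'http://localhost:9200/nginx-access-2018-06-15/_mapping?pretty'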

Not able to use geo_ip in logstash 2.4

I'm trying to use geoip from an Apache access log with Logstash 2.4, Elasticsearch 2.4, and Kibana 4.6.
My Logstash configuration is:
input {
file {
path => "/var/log/httpd/access_log"
type => "apache"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
geoip {
source => "clientip"
target => "geoip"
database =>"/home/elk/logstash-2.4.0/GeoLiteCity.dat"
#add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}
mutate {
convert => [ "[geoip][coordinates]", "float" ]
}
}
output {
stdout { codec => rubydebug }
elasticsearch
{ hosts => ["192.168.56.200:9200"]
sniffing => true
manage_template => false
index => "apache-geoip-%{+YYYY.MM.dd}"
document_type => "%{[@metadata][type]}"
}
}
And when Logstash parses an Apache access log line, the output is:
{
"message" => "xxx.xxx.xxx.xxx [24/Oct/2016:14:46:30 +0900] HTTP/1.1 8197 /images/egovframework/com/cmm/er_logo.jpg 200",
"#version" => "1",
"#timestamp" => "2016-10-24T05:46:34.505Z",
"path" => "/NCIALOG/JBOSS/SMBA/default-host/access_log.2016-10-24",
"host" => "smba",
"type" => "jboss_access_log",
"clientip" => "xxx.xxxx.xxx.xxx",
"geoip" => {
"ip" => "xxx.xxx.xxx.xxx",
"country_code2" => "KR",
"country_code3" => "KOR",
"country_name" => "Korea, Republic of",
"continent_code" => "AS",
"region_name" => "11",
"city_name" => "Seoul",
"latitude" => xx.5985,
"longitude" => xxx.97829999999999,
"timezone" => "Asia/Seoul",
"real_region_name" => "Seoul-t'ukpyolsi",
"location" => [
[0] xxx.97829999999999,
[1] xx.5985
],
"coordinates" => [
[0] xxx.97829999999999,
[1] xx.5985
]
}
}
I am not able to see a geo_point field.
Please help me.
Thanks.
Edit: here is the error I get in the tile map.
It says "logstash-* index pattern does not contain any of the following field types: geo_point".
Hmmm... the geoip fields are already in your response!
In the "geoip" field you can find all the needed information (ip, continent, country name, ...). The added coordinates field is present too.
So, what's the problem?

input json to logstash - config issues?

I have the following JSON input that I want to dump into Logstash (and eventually search/dashboard in Elasticsearch/Kibana).
{"vulnerabilities":[
{"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},
{"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},
{"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}
]}
I'm using the following Logstash configuration:
input {
file {
path => "/tmp/logdump/*"
type => "assets"
codec => "json"
}
}
output {
stdout { codec => rubydebug }
elasticsearch { host => localhost }
}
output
{
"message" => "{\"vulnerabilities\":[\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.788Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.30\",\"dns\":\"z.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.838Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"message" => "{\"ip\":\"10.1.1.31\",\"dns\":\"y.acme.com\",\"vid\":\"12345\"},\r",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.870Z",
"type" => "shellshock",
"host" => "av1261wag2sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
{
"ip" => "10.1.1.32",
"dns" => "x.acme.com",
"vid" => "12345",
"#version" => "1",
"#timestamp" => "2014-10-30T23:41:19.884Z",
"type" => "assets",
"host" => "av12612sn00-pn9",
"path" => "/tmp/logdump/stack3.json"
}
Obviously Logstash is treating each line as an event: it thinks {"vulnerabilities":[ is an event, and I'm guessing the trailing commas on the two subsequent nodes mess up the parsing, while the last node appears correct. How do I tell Logstash to parse the events inside the vulnerabilities array and to ignore the commas at the end of the lines?
Updated: 2014-11-05
Following Magnus' recommendations, I added the json filter and it's working perfectly. However, it would not parse the last line of the JSON correctly without specifying start_position => "beginning" in the file input block. Any ideas why not? I know it reads from the end of the file by default, but I would have anticipated that the mutate/gsub would handle this smoothly.
input {
file {
path => "/tmp/logdump/*"
type => "assets"
start_position => "beginning"
}
}
filter {
if [message] =~ /^\[?{"ip":/ {
mutate {
gsub => [
"message", "^\[{", "{",
"message", "},?\]?$", "}"
]
}
json {
source => "message"
remove_field => ["message"]
}
}
}
output {
stdout { codec => rubydebug }
elasticsearch { host => localhost }
}
You could skip the json codec and use a multiline filter to join the message into a single string that you can feed to the json filter:
filter {
multiline {
pattern => '^{"vulnerabilities":\['
negate => true
what => "previous"
}
json {
source => "message"
}
}
However, this produces the following unwanted results:
{
"message" => "<omitted for brevity>",
"#version" => "1",
"#timestamp" => "2014-10-31T06:48:15.589Z",
"host" => "name-of-your-host",
"tags" => [
[0] "multiline"
],
"vulnerabilities" => [
[0] {
"ip" => "10.1.1.1",
"dns" => "z.acme.com",
"vid" => "12345"
},
[1] {
"ip" => "10.1.1.2",
"dns" => "y.acme.com",
"vid" => "12345"
},
[2] {
"ip" => "10.1.1.3",
"dns" => "x.acme.com",
"vid" => "12345"
}
]
}
Unless there's a fixed number of elements in the vulnerabilities array, I don't think there's much we can do with this (without resorting to the ruby filter).
How about just applying the json filter to lines that look like what we want and dropping the rest? Your question doesn't make it clear whether all of the log looks like this, so this may not be so useful.
filter {
if [message] =~ /^\s+{"ip":/ {
# Remove trailing commas
mutate {
gsub => ["message", ",$", ""]
}
json {
source => "message"
remove_field => ["message"]
}
} else {
drop {}
}
}
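As a side note that is not part of the original answer: if you do go the multiline route shown further up, another option (assuming a Logstash version whose split filter can split array fields) is to break the vulnerabilities array into one event per element instead of dropping lines:
filter {
  multiline {
    pattern => '^{"vulnerabilities":\['
    negate => true
    what => "previous"
  }
  json {
    source => "message"
  }
  split {
    # emits one event per element of the vulnerabilities array
    field => "vulnerabilities"
  }
}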
