Logstash split filter - elasticsearch

Recently I discovered that I am able to pull data directly into Logstash by providing URLs (via the http_poller input). Fetching the input works very well, however it downloads each file and loads it into ES as a single document.
I would like to create a new record in Elasticsearch for every line. By default the whole file is loaded into the message field, which slows down Kibana in the Discover tab, among other things.
Kibana output:
{
"_index": "blacklists",
"_type": "default",
"_id": "pf3k_2QB9sEBYW4CK4AA",
"_version": 1,
"_score": null,
"_source": {
"#timestamp": "2018-08-03T13:05:00.569Z",
"tags": [
"_jsonparsefailure",
"c2_info",
"ipaddress"
],
"#version": "1",
"message": "#############################################################\n## Master Feed of known, active and non-sinkholed C&Cs IP \n## addresses\n## \n## HIGH-CONFIDENCE FAMILIES ONLY\n## \n## Feed generated at: 2018-08-03 12:13 \n##\n## Feed Provided By: John Bambenek of Bambenek Consulting\n## jcb#bambenekconsulting.com // http://bambenekconsulting.com\n## Use of this feed is governed by the license here: \n## http://osint.bambenekconsulting.com/license.txt,
"client": "204.11.56.48",
"http_poller_metadata": {
"name": "bembenek_c2",
"host": "node1",
"request": {
"method": "get",
"url": "http://osint.bambenekconsulting.com/feeds/c2-ipmasterlist-high.txt"
},
"response_message": "OK",
"runtime_seconds": 0.27404,
"response_headers": {
"content-type": "text/plain",
"accept-ranges": "bytes",
"cf-ray": "4448fe69e02197ce-FRA",
"date": "Fri, 03 Aug 2018 13:05:05 GMT",
"connection": "keep-alive",
"last-modified": "Fri, 03 Aug 2018 12:13:44 GMT",
"server": "cloudflare",
"vary": "Accept-Encoding",
"etag": "\"4bac-57286dbe759e4-gzip\""
},
"code": 200,
"times_retried": 0
}
},
"fields": {
"#timestamp": [
"2018-08-03T13:05:00.569Z"
]
},
"sort": [
1533301500569
]
}
Logstash config:
input {
http_poller {
urls => {
bembenek_c2 => "http://osint.bambenekconsulting.com/feeds/c2-ipmasterlist-high.txt"
bembenek_c2dom => "http://osint.bambenekconsulting.com/feeds/c2-dommasterlist-high.txt"
blocklists_all => "http://lists.blocklist.de/lists/all.txt"
}
request_timeout => 30
codec => "json"
tags => c2_info
schedule => { cron => "*/10 * * * *"}
metadata_target => "http_poller_metadata"
}
}
filter {
grok {
match => { "message" => [
"%{IPV4:ipaddress}" }
add_tag => [ "ipaddress" ]
}
}
output {
stdout { codec => dots }
elasticsearch {
hosts => ["10.0.50.51:9200"]
index => "blacklists"
document_type => "default"
template_overwrite => true
}
file {
path => "/tmp/blacklists.json"
codec => json {}
}
}
Does anyone know how to split the loaded file with "\n"?
I have tried
filter {
split {
terminator => "\n"
}
}
Documentation and examples showing how to use this filter are scarce.

The missing filter was:
filter {
split {
field => "[message]"
}
}
We do not have to specify the terminator, as it defaults to "\n" according to the Logstash 6.3 documentation.
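For completeness, here is a sketch of how the whole filter block could look once split is in place, combined with the existing grok tagging; the drop conditional for the feed's comment lines is an assumption about the desired behaviour, not part of the original config:
filter {
  # Turn the downloaded file into one event per line (terminator defaults to "\n")
  split {
    field => "message"
  }
  # Optionally drop the feed's comment/header lines
  if [message] =~ /^#/ {
    drop { }
  }
  # Tag events that contain an IPv4 address and extract it
  grok {
    match => { "message" => [ "%{IPV4:ipaddress}" ] }
    add_tag => [ "ipaddress" ]
  }
}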

Related

Elastic's Logstash Mutate Split not working

I'm having trouble splitting the http.request.referrer field in Logstash. It comes from Packetbeat, and I want to use only the domain, not the full path. With the following filter, as suggested at https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html, I get this error:
[WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: `sfa[2]`"}
But if I don't try to retrieve the second element and just use the field sfa to populate sfa_ref, it works, only with the forward slashes replaced by commas.
filter {
mutate {
add_field => {"sfa" => "%{[http][request][referrer]}"}
}
mutate {
split => ["sfa", "/"]
add_field => {"sfa_ref" => "%{sfa[2]}"}
}
}
input is as follows:
{
"http": {
"request": {
"bytes": 727,
"method": "get",
"headers": {
"content-length": 0
},
"referrer": "https://example.domain.com/web/font-awesome/css/font-awesome.min.css"
},
"response": {
"bytes": 66989,
"status_code": 200,
"body": {
"bytes": 66624
},
"headers": {
"content-length": 66624,
"content-type": "application/font-woff2"
}
},
"version": "1.1"
},
"status": "OK"
}
After the split, the sfa field becomes:
"sfa": [ "https:", "", "example.domain.com", "web", "font-awesome", "css", "font-awesome.min.css" ]
The documentation I followed seems to be outdated. In newer versions of Logstash, the proper way to address an array element is %{[field_name][index]}.
So I needed square brackets around the field name as well.
mutate {
split => ["sfa", "/"]
add_field => {"sfa_ref" => "%{[sfa][2]}"}
}
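With that change, sfa_ref ends up as example.domain.com for the sample event above. If the intermediate sfa field is not needed at all, a grok filter could pull the domain straight out of the referrer; a minimal sketch, assuming the referrer always starts with a scheme (sfa_ref is just an illustrative field name):
filter {
  grok {
    # Capture only the host part of the referrer URL
    match => { "[http][request][referrer]" => "^%{WORD}://%{HOSTNAME:sfa_ref}" }
  }
}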

Using multiple config files for logstash

I am just learning Elasticsearch and I need to know how to correctly split a configuration file into multiple files. I'm using the official Logstash Docker image with ports bound on 9600 and 5044. Originally I had a working single Logstash config without conditionals, like so:
input {
beats {
port => '5044'
}
}
filter
{
grok{
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} \[(?<event_source>[\w\s]+)\]:\[(?<log_type>[\w\s]+)\]:\[(?<id>\d+)\] %{GREEDYDATA:details}"
"source" => "%{GREEDYDATA}\\%{GREEDYDATA:app}.log"
}
}
mutate{
convert => { "id" => "integer" }
}
date {
match => [ "timestamp", "ISO8601" ]
locale => en
remove_field => "timestamp"
}
}
output
{
elasticsearch {
hosts => ["http://elastic:9200"]
index => "logstash-supportworks"
}
}
When I wanted to add metricbeat I decided to split that configuration into a new file. So I ended up with 3 files:
__input.conf
input {
beats {
port => '5044'
}
}
metric.conf
# for testing I'm adding no filters just to see what the data looks like
output {
if ['@metadata']['beat'] == 'metricbeat' {
elasticsearch {
hosts => ["http://elastic:9200"]
index => "%{[#metadata][beat]}-%{[#metadata][version]}"
}
}
}
supportworks.conf
filter
{
if ["source"] =~ /Supportwork Server/ {
grok{
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} \[(?<event_source>[\w\s]+)\]:\[(?<log_type>[\w\s]+)\]:\[(?<id>\d+)\] %{GREEDYDATA:details}"
"source" => "%{GREEDYDATA}\\%{GREEDYDATA:app}.log"
}
}
mutate{
convert => { "id" => "integer" }
}
date {
match => [ "timestamp", "ISO8601" ]
locale => en
remove_field => "timestamp"
}
}
}
output
{
if ["source"] =~ /Supportwork Server/ {
elasticsearch {
hosts => ["http://elastic:9200"]
index => "logstash-supportworks"
}
}
}
Now no data is being sent to the ES instance. I have verified that Filebeat, at least, is running and publishing messages, so I'd expect to see at least that much going to ES. Here's a published message from my server running Filebeat:
2019-03-06T09:16:44.634-0800 DEBUG [publish] pipeline/processor.go:308 Publish event: {
"#timestamp": "2019-03-06T17:16:44.634Z",
"#metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.6.1"
},
"source": "C:\\Program Files (x86)\\Hornbill\\Supportworks Server\\log\\swserver.log",
"offset": 4773212,
"log": {
"file": {
"path": "C:\\Program Files (x86)\\Hornbill\\Supportworks Server\\log\\swserver.log"
}
},
"message": "2019-03-06 09:16:42 [COMMS]:[INFO ]:[4924] Helpdesk API (5005) Socket error while idle - 10053",
"prospector": {
"type": "log"
},
"input": {
"type": "log"
},
"beat": {
"name": "WIN-22VRRIEO8LM",
"hostname": "WIN-22VRRIEO8LM",
"version": "6.6.1"
},
"host": {
"name": "WIN-22VRRIEO8LM",
"architecture": "x86_64",
"os": {
"platform": "windows",
"version": "6.3",
"family": "windows",
"name": "Windows Server 2012 R2 Standard",
"build": "9600.0"
},
"id": "e5887ac2-6fbf-45ef-998d-e40437066f56"
}
}
I got this working by adding a mutate filter to __input.conf to replace backslashes with forward slashes in the source field
filter {
mutate{
gsub => [ "source", "[\\]", "/" ]
}
}
And removing the quotes from the field accessors in my conditionals. So
if ["source"] =~ /Supportwork Server/
Became
if [source] =~ /Supportwork Server/
Both changes seemed to be necessary to get this configuration working.
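Putting the fixes together, __input.conf would look something like the sketch below, with the quotes likewise removed from the conditionals in metric.conf and supportworks.conf:
input {
  beats {
    port => '5044'
  }
}
filter {
  # Normalize Windows backslashes so the regex conditionals match on forward slashes
  mutate {
    gsub => [ "source", "[\\]", "/" ]
  }
}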

Outputting document metadata from ElasticSearch using Logstash output csv plugin

I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
"_index": "data",
"_type": "default",
"_id": "vANfNGYB9XD0VZRJUFfy",
"_version": 1,
"_score": null,
"_source": {
"vulnid": "CVE-2018-1000060",
"product": [],
"year": "2018",
"month": "02",
"day": "09",
"hour": "23",
"minute": "29",
"published": "2018-02-09T18:29:02.213-05:00",
},
"sort": [
1538424651203
]
}
My logstash output filter is:
output {
csv {
fields => [ "_id", "vulnid", "published" ]
path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
}
}
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How can I output the metadata _id into the CSV file?
It does not matter whether I specify the field as "_id", "@_id" or "@id".
When we query ES we have to enable docinfo => true. By default it is false.
input {
elasticsearch {
hosts => [ your hosts ]
index => "ti"
query => '{your query}'
size => 1000
scroll => "1s"
docinfo => true
schedule => "14 * * * *"
}
}
Well, Logstash is not able to get the "_id" field from your input because you must not have set the docinfo option to true.
docinfo includes Elasticsearch document information such as the index, type and _id. Please have a look here for more info: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
Use your input plugin as follows:
input {
elasticsearch {
hosts => "hostname"
index => "yourIndex"
query => '{ "query": { "query_string": { "query": "*" } } }' # optional
size => 500 # optional
scroll => "5m" # optional
docinfo => true
}
}
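Once docinfo is enabled, the document metadata is stored under @metadata (assuming the default docinfo_target), so the csv output can reference it directly. A minimal sketch:
output {
  csv {
    # [@metadata][_id] is populated by the elasticsearch input when docinfo => true
    fields => [ "[@metadata][_id]", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}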

how to filter a simple message via LogStash to ElasticSearch dividing the message in multiple fields

This is the input file:
{"meta":"","level":"error","message":"clientErrorHandler: Erro não previsto ou mapeado durante chamada dos serviços.","timestamp":"2017-04-06T16:08:37.861Z"}
{"meta":"","level":"error","message":"clientErrorHandler: Erro não previsto ou mapeado durante chamada dos serviços.","timestamp":"2017-04-06T19:40:17.682Z"}
Basically, this log is produced by my Node.js application via the Winston module. My question is how to adjust the Logstash filter so that 4 fields are created in Elasticsearch.
My intention is to see "columns" (properties or fields may be better words in the Elasticsearch context, I guess): level (e.g. error), message_source (e.g. clientErrorHandler), message_content (e.g. Erro não ... serviços) and error_time without nanoseconds (e.g. 2017-04-06T19:40:17).
I got stuck on this point:
1 - I used this logstash.conf
input {
file {
path => "/home/demetrio/dev/testes_manuais/ELK/logs/*"
start_position => "beginning"
}
}
filter {
grok {
match => {
"message" => '%{SYSLOG5424SD:loglevel} %{TIMESTAMP_ISO8601:Date} %{GREEDYDATA:content}'
}
}
date {
match => [ "Date", "YYYY-mm-dd HH:mm:ss.SSS" ]
locale => en
}
}
output {
stdout {
codec => plain {
charset => "ISO-8859-1"
}
}
elasticsearch {
hosts => "http://127.0.0.1:9200"
index => "dmz-logs-indice"
}
}
2 - searched Elasticsearch via Kibana Dev Tools:
GET _search
{
"query": {
"match_all": {}
}
}
and I saw:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": ".kibana",
"_type": "config",
"_id": "5.3.0",
"_score": 1,
"_source": {
"buildNum": 14823
}
},
{
"_index": "dmz-logs-indice",
"_type": "logs",
"_id": "AVtJLZ5x6gscWn5fxxA_",
"_score": 1,
"_source": {
"path": "/home/demetrio/dev/testes_manuais/ELK/logs/logs.log",
"#timestamp": "2017-04-07T16:09:36.996Z",
"#version": "1",
"host": "nodejs",
"message": """{"meta":"","level":"error","message":"clientErrorHandler: Erro não previsto ou mapeado durante chamada dos serviços.","timestamp":"2017-04-06T16:08:37.861Z"}""",
"tags": [
"_grokparsefailure"
]
}
},
{
"_index": "dmz-logs-indice",
"_type": "logs",
"_id": "AVtJLZ5x6gscWn5fxxBA",
"_score": 1,
"_source": {
"path": "/home/demetrio/dev/testes_manuais/ELK/logs/logs.log",
"#timestamp": "2017-04-07T16:09:36.998Z",
"#version": "1",
"host": "nodejs",
"message": """{"meta":"","level":"error","message":"clientErrorHandler: Erro não previsto ou mapeado durante chamada dos serviços.","timestamp":"2017-04-06T19:40:17.682Z"}""",
"tags": [
"_grokparsefailure"
]
}
}
]
}
}
I guess I should use some regular expression or grok in order to divide it into four pieces:
1 - level
2 - the part of the message that comes before ":"
3 - the part of the message that comes after ":"
4 - timestamp
And, if it is possible, provide better column (field/property) labels like:
1 - level
2 - message_source
3 - message_content
4 - error_time
And finally, remove the nanoseconds from the timestamp.
PS. Just in case some future reader is interested in how I am logging in Node.js, here you are:
...
var winston = require('winston');
winston.emitErrs = true;
var logger = new winston.Logger({
transports: [
new winston.transports.File({
level: 'error',
filename: './logs/logs.log',
handleExceptions: true,
json: true,
maxsize: 5242880, //5MB
maxFiles: 5,
colorize: false,
prettyPrint: true
})
],
exitOnError: false
});
...
function clientErrorHandler(err, req, res, next) {
logger.log("error","clientErrorHandler: Erro não previsto ou mapeado durante chamada dos serviços.",err.message);
res.send(500, { error: 'Erro genérico!' });
}
app.use(clientErrorHandler);
PS2: I carefully read questions like "Filter specific Message with logstash before sending to ElasticSearch", but I am really stuck.
Since your application outputs the log as JSON strings, you can configure Logstash to parse the log as JSON. This is as simple as adding codec => "json" to the file input configuration.
Below is an example configuration for your scenario:
input {
file {
path => "/home/demetrio/dev/testes_manuais/ELK/logs/*"
start_position => "beginning"
codec => "json"
}
}
filter {
# This parses the `timestamp` field into the `@timestamp` field for Kibana to consume.
date {
match => [ "timestamp", "ISO8601" ]
remove_field => [ "timestamp" ]
}
}
output {
stdout {
# This codec gives you more details about the event.
codec => rubydebug
}
elasticsearch {
hosts => "http://127.0.0.1:9200"
index => "dmz-logs-indice"
}
}
This is the sample stdout from Logstash:
{
"path" => "/home/demetrio/dev/testes_manuais/ELK/logs/demo.log",
"#timestamp" => 2017-04-06T19:40:17.682Z,
"level" => "error",
"meta" => "",
"#version" => "1",
"host" => "dbf718c4b8e4",
"message" => "clientErrorHandler: Erro não previsto ou mapeado durante chamada dos serviços.",
}
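The json codec takes care of level, meta and the timestamp, but it does not split the message into message_source and message_content. A grok filter added after the date filter could do that; a minimal sketch, assuming the source part never contains a colon:
filter {
  grok {
    # "clientErrorHandler: Erro não previsto ..." -> message_source / message_content
    match => { "message" => "^%{DATA:message_source}: %{GREEDYDATA:message_content}" }
  }
}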

logstash splits event field values and assign to #metadata field

I have a logstash event, which has the following field
{
"_index": "logstash-2016.08.09",
"_type": "log",
"_id": "AVZvz2ix",
"_score": null,
"_source": {
"message": "function_name~execute||line_no~128||debug_message~id was not found",
"#version": "1",
"#timestamp": "2016-08-09T14:57:00.147Z",
"beat": {
"hostname": "coredev",
"name": "coredev"
},
"count": 1,
"fields": null,
"input_type": "log",
"offset": 22299196,
"source": "/project_root/project_1/log/core.log",
"type": "log",
"host": "coredev",
"tags": [
"beats_input_codec_plain_applied"
]
},
"fields": {
"#timestamp": [
1470754620147
]
},
"sort": [
1470754620147
]
}
I am wondering how to use a filter (kv maybe?) to extract core.log from "source": "/project_root/project_1/log/core.log", put it in e.g. [@metadata][log_type], and later use log_type in the output to build a unique index composed of hostname + logtype + timestamp, e.g.
output {
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "%{[#metadata][_source][host]}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}
You can leverage the mutate/gsub filter in order to achieve this:
filter {
# add the log_type metadata field
mutate {
add_field => {"[#metadata][log_type]" => "%{source}"}
}
# remove everything up to the last slash
mutate {
gsub => [ "[#metadata][log_type]", "^.*\/", "" ]
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{host}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}
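As an alternative to the two mutate filters, a single grok filter can capture the file name in one pass; this is a sketch that assumes a Logstash version whose grok filter supports nested field notation in capture names:
filter {
  grok {
    # GREEDYDATA backtracks to the last slash, leaving the file name for log_type
    match => { "source" => "%{GREEDYDATA}/%{GREEDYDATA:[@metadata][log_type]}" }
  }
}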
