Logstash: split an event field value and assign it to a @metadata field (filter)

I have a Logstash event with the following fields:
{
"_index": "logstash-2016.08.09",
"_type": "log",
"_id": "AVZvz2ix",
"_score": null,
"_source": {
"message": "function_name~execute||line_no~128||debug_message~id was not found",
"@version": "1",
"@timestamp": "2016-08-09T14:57:00.147Z",
"beat": {
"hostname": "coredev",
"name": "coredev"
},
"count": 1,
"fields": null,
"input_type": "log",
"offset": 22299196,
"source": "/project_root/project_1/log/core.log",
"type": "log",
"host": "coredev",
"tags": [
"beats_input_codec_plain_applied"
]
},
"fields": {
"@timestamp": [
1470754620147
]
},
"sort": [
1470754620147
]
}
I am wondering how to use a filter (kv, maybe?) to extract core.log from "source": "/project_root/project_1/log/core.log" and put it in, e.g., [@metadata][log_type], so that later I can use log_type in the output to create a unique index name composed of hostname + log type + timestamp, e.g.:
output {
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "%{[@metadata][_source][host]}-%{[@metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[@metadata][type]}"
}
stdout { codec => rubydebug }
}

You can leverage the mutate/gsub filter in order to achieve this:
filter {
# add the log_type metadata field
mutate {
add_field => {"[@metadata][log_type]" => "%{source}"}
}
# remove everything up to the last slash
mutate {
gsub => [ "[@metadata][log_type]", "^.*\/", "" ]
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{host}-%{[@metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[@metadata][type]}"
}
stdout { codec => rubydebug }
}
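Fields under @metadata are not sent to outputs and are hidden by the rubydebug codec by default, so it can be hard to tell whether log_type was set correctly. A minimal sketch for checking it while debugging, using the codec's metadata option:

```conf
output {
  # rubydebug omits @metadata unless metadata => true is set;
  # with it, [@metadata][log_type] shows up in the printed event
  stdout { codec => rubydebug { metadata => true } }
}
```

With the event from the question, the printed @metadata block should then contain a log_type entry derived from the source path.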

Related

How to get fields inside message array from Logstash?

I've been trying to configure a Logstash pipeline whose input is snmptrap, together with yamlmibdir. Here's the configuration:
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => ["message","^\"{","{"]
gsub => ["message","}\"$","}"]
gsub => ["message","[\\]",""]
}
json { source => "message" }
split {
field => "message"
target => "events"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
The result shown in Kibana (JSON format):
{
"_index": "logstash-2019.11.18-000001",
"_type": "_doc",
"_id": "Y_5zjG4B6M9gb7sxUJwG",
"_version": 1,
"_score": null,
"_source": {
"@version": "1",
"@timestamp": "2019-11-21T05:33:07.675Z",
"tags": [
"_jsonparsefailure"
],
"1.11.12.13.14.15": "teststring",
"message": "#<SNMP::SNMPv1_Trap:0x244bf33f @enterprise=[1.2.3.4.5.6], @timestamp=#<SNMP::TimeTicks:0x196a1590 @value=55>, @varbind_list=[#<SNMP::VarBind:0x21f5e155 @name=[1.11.12.13.14.15], @value=\"teststring\">], @specific_trap=99, @source_ip=\"xyz\", @agent_addr=#<SNMP::IpAddress:0x5a5c3c5f @value=\"xC0xC1xC2xC3\">, @generic_trap=6>",
"host": "xyz"
},
"fields": {
"@timestamp": [
"2019-11-21T05:33:07.675Z"
]
},
"sort": [
1574314387675
]
}
As you can see, the message field holds an array-like structure. How can I extract all the fields inside it, and also select those fields for display in Kibana?
PS 1: I still get the _jsonparsefailure tag when I select 'Table' in the expanded document.
PS 2: Even though I use gsub to remove '\' from the expected JSON result, why do I still get a result containing '\'?

Outputting document metadata from ElasticSearch using Logstash output csv plugin

I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
"_index": "data",
"_type": "default",
"_id": "vANfNGYB9XD0VZRJUFfy",
"_version": 1,
"_score": null,
"_source": {
"vulnid": "CVE-2018-1000060",
"product": [],
"year": "2018",
"month": "02",
"day": "09",
"hour": "23",
"minute": "29",
"published": "2018-02-09T18:29:02.213-05:00",
},
"sort": [
1538424651203
]
}
My logstash output filter is:
output {
csv {
fields => [ "_id", "vulnid", "published" ]
path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
}
}
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How can I output the _id metadata field into the CSV file?
It makes no difference whether I specify the field as "_id", "@_id" or "@id".
When we query ES we have to enable docinfo => true. By default it is false.
input {
elasticsearch {
hosts => [ your hosts ]
index => "ti"
query => '{your query}'
size => 1000
scroll => "1s"
docinfo => true
schedule => "14 * * * *"
}
}
Logstash cannot get the "_id" field from your input because the docinfo option is probably not set to true.
docinfo adds Elasticsearch document information such as the index, type and _id to the event. See https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo for more information.
Configure your input plugin like this:
input {
elasticsearch {
hosts => "hostname"
index => "yourIndex"
query => '{ "query": { "query_string": { "query": "*" } } }' # optional
size => 500 # optional
scroll => "5m" # optional
docinfo => true
}
}
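With docinfo enabled, the document information is stored under @metadata rather than as a top-level _id field, so the csv output still needs a regular field to write. A minimal sketch, assuming docinfo_target is set to "@metadata" (the default when ECS compatibility is disabled) and using a hypothetical field name doc_id:

```conf
input {
  elasticsearch {
    hosts => "hostname"
    index => "yourIndex"
    docinfo => true
    docinfo_target => "@metadata"   # assumption: pin the target explicitly
  }
}
filter {
  # copy the document id out of @metadata into a regular field;
  # "doc_id" is a hypothetical name chosen for this example
  mutate { add_field => { "doc_id" => "%{[@metadata][_id]}" } }
}
output {
  csv {
    fields => [ "doc_id", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}
```

Copying the value into a normal field keeps the csv field list simple and does not depend on how the output resolves @metadata references.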

Logstash split filter

Recently I discovered that I can poll data directly with Logstash by providing URLs. Fetching the input works very well, but it downloads and loads whole documents into ES.
I would like to create a new record in Elasticsearch for every line. By default the whole file is loaded into a single message field, which slows down Kibana in the Discover tab and elsewhere.
Kibana output:
{
"_index": "blacklists",
"_type": "default",
"_id": "pf3k_2QB9sEBYW4CK4AA",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2018-08-03T13:05:00.569Z",
"tags": [
"_jsonparsefailure",
"c2_info",
"ipaddress"
],
"@version": "1",
"message": "#############################################################\n## Master Feed of known, active and non-sinkholed C&Cs IP \n## addresses\n## \n## HIGH-CONFIDENCE FAMILIES ONLY\n## \n## Feed generated at: 2018-08-03 12:13 \n##\n## Feed Provided By: John Bambenek of Bambenek Consulting\n## jcb@bambenekconsulting.com // http://bambenekconsulting.com\n## Use of this feed is governed by the license here: \n## http://osint.bambenekconsulting.com/license.txt",
"client": "204.11.56.48",
"http_poller_metadata": {
"name": "bembenek_c2",
"host": "node1",
"request": {
"method": "get",
"url": "http://osint.bambenekconsulting.com/feeds/c2-ipmasterlist-high.txt"
},
"response_message": "OK",
"runtime_seconds": 0.27404,
"response_headers": {
"content-type": "text/plain",
"accept-ranges": "bytes",
"cf-ray": "4448fe69e02197ce-FRA",
"date": "Fri, 03 Aug 2018 13:05:05 GMT",
"connection": "keep-alive",
"last-modified": "Fri, 03 Aug 2018 12:13:44 GMT",
"server": "cloudflare",
"vary": "Accept-Encoding",
"etag": "\"4bac-57286dbe759e4-gzip\""
},
"code": 200,
"times_retried": 0
}
},
"fields": {
"@timestamp": [
"2018-08-03T13:05:00.569Z"
]
},
"sort": [
1533301500569
]
}
Logstash config:
input {
http_poller {
urls => {
bembenek_c2 => "http://osint.bambenekconsulting.com/feeds/c2-ipmasterlist-high.txt"
bembenek_c2dom => "http://osint.bambenekconsulting.com/feeds/c2-dommasterlist-high.txt"
blocklists_all => "http://lists.blocklist.de/lists/all.txt"
}
request_timeout => 30
codec => "json"
tags => c2_info
schedule => { cron => "*/10 * * * *"}
metadata_target => "http_poller_metadata"
}
}
filter {
grok {
match => { "message" => [ "%{IPV4:ipaddress}" ] }
add_tag => [ "ipaddress" ]
}
}
output {
stdout { codec => dots }
elasticsearch {
hosts => ["10.0.50.51:9200"]
index => "blacklists"
document_type => "default"
template_overwrite => true
}
file {
path => "/tmp/blacklists.json"
codec => json {}
}
}
Does anyone know how to split the loaded file with "\n"?
I have tried
filter {
split {
terminator => "\n"
}
}
Documentation and examples for this filter are hard to find.
The missing filter was:
filter {
split {
field => "[message]"
}
}
We do not have to specify the terminator, because it defaults to "\n" according to the Logstash 6.3 documentation.
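The _jsonparsefailure tag in the document above suggests that the json codec is being applied to a plain-text feed. A hedged sketch of how the pieces might fit together, assuming the feed is plain text and that header lines starting with "#" are not wanted as events (both are assumptions, not something the original post confirms):

```conf
input {
  http_poller {
    urls => {
      bembenek_c2 => "http://osint.bambenekconsulting.com/feeds/c2-ipmasterlist-high.txt"
    }
    request_timeout => 30
    codec => "plain"   # assumption: the feed is plain text, not JSON
    schedule => { cron => "*/10 * * * *" }
    metadata_target => "http_poller_metadata"
  }
}
filter {
  # one event per line; the terminator defaults to "\n"
  split { field => "message" }
  # assumption: drop the "#"-prefixed header/comment lines of the feed
  if [message] =~ /^#/ { drop { } }
  grok {
    match => { "message" => "%{IPV4:ipaddress}" }
    add_tag => [ "ipaddress" ]
  }
}
```

The split has to run before grok so that each pattern match applies to a single line rather than to the whole downloaded file.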

Elasticsearch - range query returns wrong results

I used Kibana Dev Tools to run a range query, but it returned 2 hits that I did not expect. Why does this happen?
the query:
{
"query" : {
"constant_score" : {
"filter" : {
"range" : {
"rss" : {
"gte": 3000000
}
}
}
}
}
}
the result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 69,
"successful": 69,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "BBQ",
"_type": "BBQ",
"_id": "AWGJaCYkk-tGbWgj2e6R",
"_score": 1,
"_source": {
"message": [
"nodeProcessInfo"
],
"@timestamp": "2018-02-12T09:45:59.525Z",
"rss": "92636",
"@version": "1",
"host": "192.168.213.96"
}
},
{
"_index": "BBQ",
"_type": "BBQ",
"_id": "AWGJaJxzk-tGbWgj2e-V",
"_score": 1,
"_source": {
"message": [
"nodeProcessInfo"
],
"@timestamp": "2018-02-12T09:46:29.680Z",
"rss": "85272",
"@version": "1",
"host": "192.168.213.96"
}
}
]
}
}
The result of the range query is not what I expected: why do documents with rss = 92636 appear when the filter is gte 3000000?
======================edit at 2018.2.13=========(1)
The log looks like this:
"nodeProcessInfo|auth-server-1|auth|9618|1.9|1.2|98060|2018-2-12 6:33:43 PM|"
The filter looks like this:
filter {
if "nodeProcessInfo" in [message] {
mutate {
split => ["message", "|"]
add_field => {
"serverId" => "%{[message[1]]}"
}
add_field => {
"serverType" => "%{[message[2]]}"
}
add_field => {
"pid" => "%{[message[3]]}"
}
add_field => {
"cpuAvg" => "%{[message[4]]}"
}
add_field => {
"memAvg" => "%{[message[5]]}"
}
add_field => {
"rss" => "%{[message[6]]}"
}
add_field => {
"time" => "%{[message[7]]}"
}
convert => ["rss", "integer"] # I tried to convert rss to an integer here, but it failed
add_tag => "nodeProcessInfo"
}
}
}
======================edit at 2018.2.13=========(2)
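I moved the convert into a separate mutate block, and that did turn "rss" into an integer (within a single mutate block, convert runs before split, and add_field is only applied at the end, so rss did not exist yet when convert ran). However, the range query result is still wrong. The changed code looks like this: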
if "nodeProcessInfo" in [message] {
mutate {
split => ["message", "|"]
...
...
add_field => {
"rss" => "%{[message[6]]}"
}
}
mutate {
convert => ["rss", "integer"] # add a new mutate here
}
}
======================edit at 2018.2.13=========(3)
Finally I found out why the range query is still wrong even though rss is now converted to an integer:
"You can't change existing mapping type, you need to create a new index with the correct mapping and index the data again."
So I created a new field name to use instead of rss, and the range query result is now correct.
Can you share the mapping of the index?
Judging from the search results you shared, I think the problem is that the rss field is mapped as text or string.
If so, the range query treats the values as strings and compares them lexicographically, which explains the hits you see (for example, "92636" sorts after "3000000" because "9" > "3").
A numeric range only works if you index the data with the rss field mapped as long and then run the same query.
You would then get the desired results.
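Because an existing field mapping cannot be changed, one way to get a numeric rss is to convert the value in Logstash and write to a fresh index, where dynamic mapping will see a number instead of a string. A minimal sketch, assuming a hypothetical new index name process-info-v2 and a local Elasticsearch host:

```conf
filter {
  # run convert in its own mutate so the field already exists when it runs
  mutate { convert => { "rss" => "integer" } }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]     # assumption: local cluster
    index => "process-info-v2"      # hypothetical new index, so rss is not bound to the old string mapping
  }
}
```

Documents already stored in the old index keep their string mapping, so they would need to be indexed again for the range query to cover them.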

Correlate messages in ELK by field

Related to: Combine logs and query in ELK
We are setting up ELK and would want to create a visualization in Kibana 4.
The issue here is that we want to relate between two different types of message.
To simplify:
Message type 1 fields: message_type, common_id_number, byte_count,
...
Message type 2 fields: message_type, common_id_number, hostname, ...
Both messages share the same index in elasticsearch.
As you can see we were trying to graph without taking that common_id_number into account, but it seems that we must use it. We don't know how yet, though.
Any help?
EDIT
These are the relevant field definitions in the ES template:
"URIHost" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"Type" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"SessionID" : {
"type" : "long"
},
"Bytes" : {
"type" : "long"
},
"BytesReceived" : {
"type" : "long"
},
"BytesSent" : {
"type" : "long"
},
This is a TRAFFIC type, edited document:
{
"_index": "logstash-2015.11.05",
"_type": "paloalto",
"_id": "AVDZqdBjpQiRid-uxPjE",
"_score": null,
"_source": {
"@version": "1",
"@timestamp": "2015-11-05T21:59:55.543Z",
"syslog_severity_code": 5,
"syslog_facility_code": 1,
"syslog_timestamp": "Nov 5 22:59:58",
"Type": "TRAFFIC",
"SessionID": 21713,
"Bytes": 939,
"BytesSent": 480,
"BytesReceived": 459,
},
"fields": {
"@timestamp": [
1446760795543
]
},
"sort": [
1446760795543
]
}
And this is a THREAT type document:
{
"_index": "logstash-2015.11.05",
"_type": "paloalto",
"_id": "AVDZqVNIpQiRid-uxPjC",
"_score": null,
"_source": {
"@version": "1",
"@timestamp": "2015-11-05T21:59:23.440Z",
"syslog_severity_code": 5,
"syslog_facility_code": 1,
"syslog_timestamp": "Nov 5 22:59:26",
"Type": "THREAT",
"SessionID": 21713,
"URIHost": "whatever.nevermind.com",
"URIPath": "/connectiontest.html"
},
"fields": {
"@timestamp": [
1446760763440
]
},
"sort": [
1446760763440
]
}
This is the logstash "filter" configuration:
filter {
if [type] == "paloalto" {
syslog_pri {
remove_field => [ "syslog_facility", "syslog_severity" ]
}
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{HOSTNAME:hostname} %{INT},%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME},%{INT},%{WORD:Type},%{GREEDYDATA:log}"
}
remove_field => [ "message" ]
}
if [Type] == "THREAT" {
csv {
source => "log"
columns => [ "Threat_OR_ContentType", "ConfigVersion", "GenerateTime", "SourceAddress", "DestinationAddress", "NATSourceIP", "NATDestinationIP", "Rule", "SourceUser", "DestinationUser", "Application", "VirtualSystem", "SourceZone", "DestinationZone", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "SessionID", "RepeatCount", "SourcePort", "DestinationPort", "NATSourcePort", "NATDestinationPort", "Flags", "IPProtocol", "Action", "URL", "Threat_OR_ContentName", "reportid", "Category", "Severity", "Direction", "seqno", "actionflags", "SourceCountry", "DestinationCountry", "cpadding", "contenttype", "pcap_id", "filedigest", "cloud", "url_idx", "user_agent", "filetype", "xff", "referer", "sender", "subject", "recipient" ]
remove_field => [ "log" ]
}
mutate {
convert => {
"SessionID" => "integer"
"SourcePort" => "integer"
"DestinationPort" => "integer"
"NATSourcePort" => "integer"
"NATDestinationPort" => "integer"
}
remove_field => [ "ConfigVersion", "GenerateTime", "VirtualSystem", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "RepeatCount", "Flags", "Action", "reportid", "Severity", "seqno", "actionflags", "cpadding", "pcap_id", "filedigest", "recipient" ]
}
grok {
match => {
"URL" => "%{URIHOST:URIHost}%{URIPATH:URIPath}(%{URIPARAM:URIParam})?"
}
remove_field => [ "URL" ]
}
}
else if [Type] == "TRAFFIC" {
csv {
source => "log"
columns => [ "Threat_OR_ContentType", "ConfigVersion", "GenerateTime", "SourceAddress", "DestinationAddress", "NATSourceIP", "NATDestinationIP", "Rule", "SourceUser", "DestinationUser", "Application", "VirtualSystem", "SourceZone", "DestinationZone", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "SessionID", "RepeatCount", "SourcePort", "DestinationPort", "NATSourcePort", "NATDestinationPort", "Flags", "IPProtocol", "Action", "Bytes", "BytesSent", "BytesReceived", "Packets", "StartTime", "ElapsedTimeInSecs", "Category", "Padding", "seqno", "actionflags", "SourceCountry", "DestinationCountry", "cpadding", "pkts_sent", "pkts_received", "session_end_reason" ]
remove_field => [ "log" ]
}
mutate {
convert => {
"SessionID" => "integer"
"SourcePort" => "integer"
"DestinationPort" => "integer"
"NATSourcePort" => "integer"
"NATDestinationPort" => "integer"
"Bytes" => "integer"
"BytesSent" => "integer"
"BytesReceived" => "integer"
"ElapsedTimeInSecs" => "integer"
}
remove_field => [ "ConfigVersion", "GenerateTime", "VirtualSystem", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "RepeatCount", "Flags", "Action", "Packets", "StartTime", "seqno", "actionflags", "cpadding", "pcap_id", "filedigest", "recipient" ]
}
}
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
timezone => "CET"
remove_field => [ "syslog_timestamp" ]
}
}
}
What we are trying to do is to visualize URIHost terms as X axis and Bytes, BytesSent and BytesReceived sums as Y axis.
I think you can use the aggregate filter to carry out your task. The aggregate filter provides support for aggregating several log lines into one single event based on a common field value. In your case, the common field we're going to use will be the SessionID field.
Then we need another field to detect the first event vs the second/last event that should be aggregated. In your case, this would be the Type field.
You need to change your current configuration like this:
filter {
... all other filters
if [Type] == "THREAT" {
... all other filters
aggregate {
task_id => "%{SessionID}"
code => "map['URIHost'] = event['URIHost']; map['URIPath'] = event['URIPath']"
}
}
else if [Type] == "TRAFFIC" {
... all other filters
aggregate {
task_id => "%{SessionID}"
code => "event['URIHost'] = map['URIHost']; event['URIPath'] = map['URIPath']"
end_of_task => true
timeout => 120
}
}
}
The general idea is that when Logstash encounters THREAT logs it will temporarily store the URIHost and URIPath in the in-memory event map, and then when a TRAFFIC log comes in, the URIHost and URIPath fields will be added to the event. You can copy other fields, too, if needed. You can also adapt the timeout (in seconds) depending on how long you expect a TRAFFIC event to come in after the last THREAT event.
In the end, you'll get documents with data merged from both THREAT and TRAFFIC log lines and you can easily create the visualization showing bytes count per URIHost as shown on your screenshot.
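The event['field'] syntax in the answer belongs to older Logstash releases; from Logstash 5 onward the Ruby code in a filter reads and writes fields through event.get and event.set. A hedged sketch of the same idea with the newer API, where the map_action values are an addition not present in the original answer:

```conf
filter {
  if [Type] == "THREAT" {
    aggregate {
      task_id => "%{SessionID}"
      # remember the URI fields for this session
      code => "map['URIHost'] = event.get('URIHost'); map['URIPath'] = event.get('URIPath')"
      map_action => "create_or_update"
    }
  }
  else if [Type] == "TRAFFIC" {
    aggregate {
      task_id => "%{SessionID}"
      # only runs when a THREAT event for this session was seen before
      code => "event.set('URIHost', map['URIHost']); event.set('URIPath', map['URIPath'])"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}
```

The aggregate filter relies on events arriving in order, so its documentation asks for a single pipeline worker (for example, starting Logstash with -w 1); otherwise a TRAFFIC event may be processed before its THREAT event.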

Resources