Split a message using grok - elasticsearch

I have logs in the format:
2018-09-17 15:24:34;Count of files in error folder in;C:\Scripts\FOLDER\SUBFOLDER\error;1
I want to extract the folder path and the number that follows it into separate fields.
For example:
dirTEST=C:\Scripts\FOLDER\SUBFOLDER\
count.of.error.filesTEST=1
or
dir=C:\Scripts\FOLDER\SUBFOLDER\
count.of.error.files=1
For this I use the following grok pattern in my Logstash config:
if "TestLogs" in [tags] {
grok{
match => { "message" => "%{DATE:date_in_log}%{SPACE}%{TIME:time.in.log};%{DATA:message.text.log};%{WINPATH:dir};%{INT:count.of.error.files}" }
add_field => { "dirTEST" => "%{dir}" }
add_field => { "count.of.error.filesTEST" => "%{count.of.error.files}" }
}
}
There are no errors in the Logstash logs, but in Kibana the event appears as usual, without the new fields.

A couple of notes. First, your pattern appears to do what you expect, so the most likely problem is that your index pattern has not picked up the new fields. In Kibana, go to Management -> Kibana -> Index Patterns and refresh the field list using the button in the upper right corner (next to the delete Index Pattern button).
Second, keep in mind that using dots to separate the terms in field names makes the structured data look like this:
{
"date_in_log": "18-09-17",
"count": {
"of": {
"error": {
"files": "1"
}
}
},
"time": {
"in": {
"log": "15:24:34"
}
},
"message": {
"text": {
"log": "Count of files in error folder in"
}
},
"dir": "C:\\Scripts\\FOLDER\\SUBFOLDER\\error"
}
I don't know whether this is how you want your data represented; if not, consider changing the field names in the grok pattern (for example, using underscores instead of dots).
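To see why dotted names end up nested, here is a minimal Python sketch of the expansion. It is only an illustration of the idea, not Logstash or Elasticsearch code, and `expand_dotted` is a made-up helper name.

```python
def expand_dotted(flat):
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}}."""
    nested = {}
    for name, value in flat.items():
        node = nested
        *parents, leaf = name.split(".")
        # Walk (and create) one object level per dot-separated part.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


event = {
    "date_in_log": "18-09-17",
    "time.in.log": "15:24:34",
    "count.of.error.files": "1",
}
print(expand_dotted(event))
```

A name without dots, such as count_of_error_files, passes through unchanged and stays a flat field.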

Related

Splitting A Nested JSON In Logstash

I am working on a project, ingesting CVE data from NVD into Elasticsearch.
My input data looks something like this:
{
"resultsPerPage":20,
"startIndex":0,
"totalResults":189227,
"result":{
"CVE_data_type":"CVE",
"CVE_data_format":"MITRE",
"CVE_data_version":"4.0",
"CVE_data_timestamp":"2022-11-23T08:27Z",
"CVE_Items":[
{...},
{...},
{...},
{
"cve":{
"data_type":"CVE",
"data_format":"MITRE",
"data_version":"4.0",
"CVE_data_meta":{
"ID":"CVE-2022-45060",
"ASSIGNER":"cve@mitre.org"
},
"problemtype":{...},
"references":{...},
"description":{
"description_data":[
{
"lang":"en",
"value":"An HTTP Request Forgery issue was discovered in Varnish Cache 5.x and 6.x before 6.0.11, 7.x before 7.1.2, and 7.2.x before 7.2.1. An attacker may introduce characters through HTTP/2 pseudo-headers that are invalid in the context of an HTTP/1 request line, causing the Varnish server to produce invalid HTTP/1 requests to the backend. This could, in turn, be used to exploit vulnerabilities in a server behind the Varnish server. Note: the 6.0.x LTS series (before 6.0.11) is affected."
}
]
}
},
"configurations":{
"CVE_data_version":"4.0",
"nodes":[
{
"operator":"OR",
"children":[
],
"cpe_match":[
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r1:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish_cache_project:varnish_cache:7.2.0:*:*:*:*:*:*:*",
"cpe_name":[
]
}
]
}
]
},
"impact":{...},
"publishedDate":"2022-11-09T06:15Z",
"lastModifiedDate":"2022-11-23T03:15Z"
}
]
}
}
Each API query returns a result like this, containing 20 individual CVEs, and each CVE contains multiple configurations. What I want is one document per configuration, where all other data stays the same and only the cpe23Uri field changes. I've tried using the split filter in several different ways, with no success.
My pipeline looks like this:
filter {
json {
source => "message"
}
split {
field => "[result][CVE_Items]"
}
split {
field => "[result][CVE_Items][cve][description][description_data]"
}
}
kv {
source => "message"
field_split => ","
}
mutate {
remove_field => [...]
rename =>
rename => ["[result][CVE_Items][configurations][nodes]", "Affected_Products"]
.
.
.
rename => ["[result][CVE_data_timestamp]", "CVE_Timestamp"]
}
fingerprint {
source => ["CVE", "Affected Products"]
target => "fingerprint"
concatenate_sources => "true"
method => "SHA256"
}
}
For splitting I've tried simply using:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
and also with adding [cpe23Uri] to the end,
as well as:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
target => "cpeCopy"
remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
if [result][CVE_Items][configurations][nodes][cpe_match] {
ruby {
code => "event['cpeCopy'] = event['[result][CVE_Items][configurations][nodes][cpe_match]'][0]"
remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
}
if [cpeCopy] {
mutate {
rename => { "cpeCopy" => "CPE" }
}
As I said, I've tried several approaches. I won't list them all, but unfortunately nothing has worked so far.
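For reference, the desired fan-out can be sketched in Python. In the sample above, CVE_Items, nodes and cpe_match are all arrays, so there are three levels to iterate. That suggests a Logstash split on [configurations][nodes] may be needed before one on [cpe_match], but this is an inference from the sample data and has not been tested against this pipeline. The output names CVE and CPE follow the config above; `fan_out` and the trimmed sample are illustrative.

```python
def fan_out(response):
    """Yield one document per cpe_match entry of every CVE item."""
    for item in response["result"]["CVE_Items"]:
        shared = {
            "CVE": item["cve"]["CVE_data_meta"]["ID"],
            "publishedDate": item["publishedDate"],
        }
        for node in item["configurations"]["nodes"]:
            for match in node["cpe_match"]:
                # Same shared data, only the CPE differs.
                yield {**shared, "CPE": match["cpe23Uri"]}


# Trimmed-down version of the sample response from the question.
sample = {
    "result": {
        "CVE_Items": [{
            "cve": {"CVE_data_meta": {"ID": "CVE-2022-45060"}},
            "configurations": {"nodes": [{
                "operator": "OR",
                "cpe_match": [
                    {"cpe23Uri": "cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*"},
                    {"cpe23Uri": "cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r1:*:*:*:*:*:*"},
                ],
            }]},
            "publishedDate": "2022-11-09T06:15Z",
        }]
    }
}

for doc in fan_out(sample):
    print(doc)
```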

ElasticSearch: populating ip_range type field via logstash

I'm experimenting with the ip_range field type in ElasticSearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and am struggling to find a way to load IP data into the field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out a way to do the same via logstash.
Index definition
PUT test_ip_range
{
"mappings": {
"_doc": {
"properties": {
"ip_from_to_range": {
"type": "ip_range"
},
"ip_from": {
"type": "ip"
},
"ip_to": {
"type": "ip"
}
}
}
}
}
Add sample doc:
PUT test_ip_range/_doc/3
{
"ip_from_to_range" :
{
"gte" : "<dotted_ip_from>",
"lte": "<dotted_ip_to>"
}
}
Logstash config (reading from DB)
input {
jdbc {
...
statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "<host>"
"index" => "test_ip_range"
"document_type" => "_doc"
}
}
Question:
How do I get the ip_from and ip_to DB fields into the respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the ip range in CIDR notation, but would like to be able to have both options - loading in CIDR notation and loading as a range.
After some trial and error, I finally figured out the Logstash config.
I had posted about a similar issue here, which finally got me on the right track with the syntax for this use case as well.
input { ... }
filter {
mutate {
add_field => {
"[ip_from_to_range]" =>
'{
"gte": "%{ip_from}",
"lte": "%{ip_to}"
}'
}
}
json {
source => "ip_from_to_range"
target => "ip_from_to_range"
}
}
output { ... }
Filter parts explained
mutate add_field: creates a new field [ip_from_to_range] whose value is a JSON string ('{...}'). It is important to write the field as [field_name]; otherwise the next step, which parses the string into a JSON object, doesn't work.
json: parses the string representation into a JSON object.
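The two filter steps can be mirrored in a few lines of Python. This is a rough sketch of the logic only; `build_ip_range` is a hypothetical helper and the sample addresses are made up.

```python
import json


def build_ip_range(row):
    # Step 1 (mutate add_field): interpolate both columns into a JSON string.
    as_text = '{"gte": "%s", "lte": "%s"}' % (row["ip_from"], row["ip_to"])
    # Step 2 (json filter): parse the string and store the object
    # under the same field name.
    event = dict(row)
    event["ip_from_to_range"] = json.loads(as_text)
    return event


row = {"ip_from": "10.0.0.1", "ip_to": "10.0.0.255"}  # made-up sample row
print(json.dumps(build_ip_range(row), indent=2))
```

The resulting ip_from_to_range object has the same gte/lte shape as the document indexed by hand via Dev Tools.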

Converting nginx access log bytes to number in Kibana4

I would like to create a visualization of the sum of bytes sent using the data from my nginx access logs. When trying to create a "Metric" visualization, I can't use the bytes field as a sum because it is a string type.
And I'm not able to change it under settings.
How do I go about changing this field type to a number/bytes type?
Here is my logstash config for nginx access logs
filter {
if [type] == "nginx-access" {
grok {
match => { "message" => "%{NGINXACCESS}" }
}
geoip {
source => "clientip"
}
useragent {
source => "agent"
target => "useragent"
}
}
}
Since a new Logstash index is created every day, I'm guessing I need to change it here.
I tried adding
mutate {
convert => { "bytes" => "integer" }
}
But it doesn't seem to make a difference.
Field types are configured through mappings, which are set at the index level and can hardly be changed once a field exists. Because Logstash creates a new index every day, a mapping change only takes effect in the next index, so either wait for the next day or delete the current index if you can.
By default these mappings are generated automatically by Elasticsearch depending on the syntax of the indexed JSON document and the applied Index Templates:
# Type String
{"bytes":"123"}
# Type Integer
{"bytes":123}
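The difference between the two documents is purely one of JSON serialization, which a short Python sketch can make concrete. `convert_to_integer` is a hypothetical stand-in for what a mutate/convert step is meant to achieve, not the Logstash implementation.

```python
import json


def convert_to_integer(event, field):
    """Return a copy of the event with `field` converted from text to int."""
    converted = dict(event)
    converted[field] = int(converted[field])
    return converted


event = {"bytes": "123"}
print(json.dumps(event))                               # {"bytes": "123"}
print(json.dumps(convert_to_integer(event, "bytes")))  # {"bytes": 123}
```

Only the second form, without quotes around the value, lets Elasticsearch guess a numeric type for a field it has not seen before.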
In the end there are 2 solutions:
Tune Logstash, to make it generate an integer and let Elasticsearch guess the field type → Use the mutate/convert filter
Tune Elasticsearch, to force the field bytes for the document type nginx-access to be of type integer → Use Index Template:
Index Template API:
PUT _template/logstash-nginx-access
{
"order": 1,
"template": "logstash-*",
"mappings": {
"nginx-access": {
"properties": {
"bytes": {
"type": "integer"
}
}
}
}
}
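The "template" value above is a wildcard pattern that is compared with the name of an index when that index is created, which is why existing indices keep their old mapping. A small Python sketch of that matching, with made-up index names and a hypothetical `template_applies` helper:

```python
from fnmatch import fnmatchcase

TEMPLATE_PATTERN = "logstash-*"  # the "template" value from the request above


def template_applies(index_name, pattern=TEMPLATE_PATTERN):
    """True if a newly created index with this name would get the template."""
    return fnmatchcase(index_name, pattern)


for name in ["logstash-2015.06.01", "logstash-2015.06.02", "nginx-2015.06.01"]:
    print(name, template_applies(name))
```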

logstash, syslog and grok

I am working on an ELK-stack configuration. logstash-forwarder is used as a log shipper, each type of log is tagged with a type-tag:
{
"network": {
"servers": [ "___:___" ],
"ssl ca": "___",
"timeout": 15
},
"files": [
{
"paths": [
"/var/log/secure"
],
"fields": {
"type": "syslog"
}
}
]
}
That part works fine... Now, I want logstash to split the message string in its parts; luckily, that is already implemented in the default grok patterns, so the logstash.conf remains simple so far:
input {
lumberjack {
port => 6782
ssl_certificate => "___"
ssl_key => "___"
}
}
filter {
if [type] == "syslog" {
grok {
match => [ "message", "%{SYSLOGLINE}" ]
}
}
}
output {
elasticsearch {
cluster => "___"
template => "___"
template_overwrite => true
node_name => "logstash-___"
bind_host => "___"
}
}
The issue I have is that the document received by Elasticsearch still holds the whole line (including the timestamp etc.) in the message field. Also, @timestamp still shows the time at which Logstash received the message, which makes searching awkward, since Kibana queries @timestamp to filter by date. Any idea what I'm doing wrong?
Thanks, Daniel
The reason your "message" field contains the original log line (including timestamps etc) is that the grok filter by default won't allow existing fields to be overwritten. In other words, even though the SYSLOGLINE pattern,
SYSLOGLINE %{SYSLOGBASE2} %{GREEDYDATA:message}
captures the message into a "message" field it won't overwrite the current field value. The solution is to set the grok filter's "overwrite" parameter.
grok {
match => [ "message", "%{SYSLOGLINE}" ]
overwrite => [ "message" ]
}
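To make concrete what should end up in "message" once the overwrite takes effect, here is a simplified Python approximation of the split that SYSLOGLINE performs. The regular expression is a rough stand-in for the real grok pattern (it ignores the optional facility and other variants), and the sample line is made up.

```python
import re

# Simplified approximation of SYSLOGBASE followed by GREEDYDATA:message.
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logsource>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

line = "Sep 17 15:24:34 web01 sshd[1234]: Accepted publickey for daniel"
fields = SYSLOG_RE.match(line).groupdict()

event = {"message": line}
# With overwrite => [ "message" ], the captured remainder replaces the raw line.
event.update(fields)
print(event["message"])
print(event["timestamp"], event["logsource"], event["program"], event["pid"])
```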
To populate the "@timestamp" field, use the date filter. This will probably work for you:
date {
match => [ "timestamp", "MMM dd HH:mm:ss", "MMM d HH:mm:ss" ]
}
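Syslog timestamps of this form carry no year, so whatever parses them has to take the year from somewhere else. The Python sketch below shows the same parsing idea with the year passed in explicitly; `parse_syslog_timestamp` is a hypothetical helper, and `%b` assumes an English locale for the month abbreviation.

```python
from datetime import datetime


def parse_syslog_timestamp(text, year):
    """Parse 'Sep 17 15:24:34' style timestamps, using the given year."""
    # Prefix the assumed year so the parsed value is a complete date.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


print(parse_syslog_timestamp("Sep 17 15:24:34", 2018))  # 2018-09-17 15:24:34
print(parse_syslog_timestamp("Sep  7 15:24:34", 2018))  # single-digit day, space padded
```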
It is hard to tell where the problem is without seeing an example event that triggers it. I suggest trying the grok debugger to verify that the pattern is correct and to adjust it to your needs once you see the problem.

How to stop logstash from creating a default mapping in ElasticSearch

I am using logstash to feed logs into ElasticSearch.
I am configuring logstash output as:
input {
file {
path => "/tmp/foo.log"
codec =>
plain {
format => "%{message}"
}
}
}
output {
elasticsearch {
#host => localhost
codec => json {}
manage_template => false
index => "4glogs"
}
}
I notice that as soon as I start logstash it creates a mapping ( logs ) in ES as below.
{
"4glogs": {
"mappings": {
"logs": {
"properties": {
"@timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"@version": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
}
}
How can I prevent logstash from creating this mapping ?
UPDATE:
I have now resolved this error too. "object mapping for [logs] tried to parse as object, but got EOF, has a concrete value been provided to it?"
As John Petrone has stated below, once you define a mapping, you have to ensure that your documents conform to the mapping. In my case, I had defined a mapping of "type: nested" but the output from logstash was a string.
So I removed all codecs ( whether json or plain ) from my logstash config and that allowed the json document to pass through without changes.
Here is my new logstash config ( with some additional filters for multiline logs ).
input {
kafka {
zk_connect => "localhost:2181"
group_id => "logstash_group"
topic_id => "platform-logger"
reset_beginning => false
consumer_threads => 1
queue_size => 2000
consumer_id => "logstash-1"
fetch_message_max_bytes => 1048576
}
file {
path => "/tmp/foo.log"
}
}
filter {
multiline {
pattern => "^\s"
what => "previous"
}
multiline {
pattern => "[0-9]+$"
what => "previous"
}
multiline {
pattern => "^$"
what => "previous"
}
mutate{
remove_field => ["kafka"]
remove_field => ["@version"]
remove_field => ["@timestamp"]
remove_tag => ["multiline"]
}
}
output {
elasticsearch {
manage_template => false
index => "4glogs"
}
}
You will need a mapping to store data in Elasticsearch and to search on it - that's how ES knows how to index and search those content types. You can either let logstash create it dynamically or you can prevent it from doing so and instead create it manually.
Keep in mind you cannot change existing mappings (although you can add to them). So first off you will need to delete the existing index. You would then modify your settings to prevent dynamic mapping creation. At the same time you will want to create your own mapping.
For example, this will create the mappings for the logstash data but also restrict any dynamic mapping creation via "strict":
$ curl -XPUT 'http://localhost:9200/4glogs/logs/_mapping' -d '
{
"logs" : {
"dynamic": "strict",
"properties" : {
"@timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"@version": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
'
Keep in mind that the index name "4glogs" and the type "logs" need to match what is coming from logstash.
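With "dynamic" set to "strict", a document that contains a field missing from the mapping is rejected instead of silently extending the mapping. The following Python sketch models that check for top-level fields only; `unmapped_fields` is a made-up helper, not an Elasticsearch API, and the `host` field is an invented example.

```python
# Field names taken from the mapping above.
MAPPED_FIELDS = {"@timestamp", "@version", "message"}


def unmapped_fields(document, mapped=MAPPED_FIELDS):
    """Return the field names a strict mapping would refuse (empty list = accepted)."""
    return sorted(set(document) - set(mapped))


ok = {"@timestamp": "2015-06-01T12:00:00Z", "@version": "1", "message": "hello"}
bad = {"message": "hello", "host": "web01"}  # "host" is not in the mapping

print(unmapped_fields(ok))   # []
print(unmapped_fields(bad))  # ['host']
```

This is also why the events Logstash sends have to be trimmed or renamed to fit the mapping before strict mode is switched on.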
For my production systems I generally prefer to turn off dynamic mapping as it avoids accidental mapping creation.
The following links should be useful if you want to make adjustments to your dynamic mappings:
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html
logs in this case is the index_type. If you don't want to create it as logs, specify some other index_type on your elasticsearch element. Every record in elasticsearch is required to have an index and a type. Logstash defaults to logs if you haven't specified it.
There's always an implicit mapping created when you insert records into Elasticsearch, so you can't prevent it from being created. You can create the mapping yourself before you insert anything (via say a template mapping).
The setting manage_template of false just prevents it from creating the template mapping for the index you've specified. You can delete the existing template if it's already been created by using something like curl -XDELETE http://localhost:9200/_template/logstash?pretty
Index templates can help you. Please see this jira for more details. You can create index templates with wildcard support to match an index name and put your default mappings.
