Logstash plugin kv - keys and values not getting read into Elastic - elasticsearch

Small part of my CSV log:
TAGS
contentms:Drupal;contentms.ver:7.1.8;vuln:rce;cve:CVE-2018-0111;
cve:CVE-2014-0160;vuln:Heartbleed;
contentms.ver:4.1.6;contentms:WordPress;tag:backdoor
tag:energia;
The idea is that I know nothing about the keys and values other than the format
key:value;key:value;key:value;key:value; etc
I just create a pattern with the Logstash "kv" plugin:
kv {
  source => "TAGS"
  field_split => ";"
  value_split => ":"
  target => "TAGS"
}
I've been trying to get my data into Elasticsearch for Kibana and some of it goes through. But, for example, the keys contentms: and contentms.ver: don't get read. Also, for the keys that do, only one value is searchable in Kibana. For example, the key cve: is seen on multiple lines multiple times in my log with different values, but only this value is indexed: cve:CVE-2014-0160. The same problem applies to the tag: and vuln: keys.
I've seen some similar problems and solutions with Ruby, but is there any solution with just kv? Or should I change my log format around a bit?

I can't test it right now, but notice that you have both "contentms" (a string) and "contentms.ver", which probably looks to Elasticsearch like a nested field ([contentms][ver]), but "contentms" was already defined as a string, so you can't nest beneath it.
After the csv filter, try renaming "contentms" to "[contentms][name]", which would then be a peer to "[contentms][ver]".
You'd need to start with a new index to create this new mapping.
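A minimal sketch of that rename, assuming the kv output lands at the top level (with target => "TAGS" as in the question, the paths would be [TAGS][contentms] and [TAGS][contentms][name] instead):
filter {
  # ... your existing csv / kv filters ...
  mutate {
    # move the plain string out of the way so [contentms][ver] can nest beside it
    rename => { "contentms" => "[contentms][name]" }
  }
}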

Related

How can I translate all values in an array with Logstash?

I'm indexing logs in Elasticsearch through Logstash which contain a field with an array of codes, for example:
indicator.codes : [ "3", "120", "148" ]
Is there some way in Logstash to look up these codes in a CSV and save the categories and descriptions in two new fields, such as indicator.categories and indicator.descriptions?
A subset of the csv with 3 columns:
Column 1 => indicator.code
Column 2 => indicator.category
Column 3 => indicator.description
3;Hiding;There are signs in the header
4;Hiding;This binary might try to schedule a task
34;General;This is a 7zip selfextracting file
120;General;This is a selfextracting RAR file
121;General;This binary tries to run as a service
148;Stealthiness;This binary uses tunnel traffic
I've been looking at the csv filter and the translate filter, but they do not seem to be able to lookup multiple keys.
The translate filter seems to work only with 2 columns. The csv filter seems unable to loop through the indicator.codes array.
I would suggest using a Ruby filter to loop over indicator.codes and compare them against the data you load from the CSV.
https://www.elastic.co/guide/en/logstash/8.1/plugins-filters-ruby.html
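A rough sketch of such a Ruby filter, assuming the codes live in a nested [indicator][codes] field and the lookup CSV sits at /etc/logstash/indicator_codes.csv (both the field path and the file path are assumptions):
filter {
  ruby {
    # load the semicolon-separated lookup table once, when the pipeline starts
    init => "
      require 'csv'
      @lookup = {}
      CSV.foreach('/etc/logstash/indicator_codes.csv', col_sep: ';') do |code, category, description|
        @lookup[code] = { 'category' => category, 'description' => description }
      end
    "
    # per event: look up every code and collect the matching categories/descriptions
    code => "
      codes = event.get('[indicator][codes]') || []
      hits = codes.map { |c| @lookup[c] }.compact
      event.set('[indicator][categories]', hits.map { |h| h['category'] })
      event.set('[indicator][descriptions]', hits.map { |h| h['description'] })
    "
  }
}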

Can't assign elastic search index for data stream

I am trying to create an index in Elasticsearch using the index => option of the Logstash elasticsearch output, running on Docker:
output {
  elasticsearch {
    cloud_id => "..."
    data_stream => "true"
    ssl => "true"
    api_key => "..."
    document_id => "%{_log_id}"
    index => "%{target_index}"
  }
}
If I comment out the index line, the pipeline works and data is sent to the default index. However, with the index defined (with or without it being a constant string), the following error is given on launch, before ingesting any data:
elasticsearch - Invalid data stream configuration, following parameters are not supported: {"index"=>"%{target_index}"}
Here target_index is an entry in the JSON body parsed in the filter.
It then breaks with Could not execute action: PipelineAction::Create<firmware_pipeline>, indicating that this happens before the pipeline is actually triggered.
Not sure if I'm just reading the docs wrong but this seems to be what others are doing as well.
Logstash version: 7.13.2
When you use a data stream, events are automatically routed to indexes based on values in the [data_stream] field. You cannot have automatic routing at the same time as explicit routing with the index => "%{target_index}" option. That is what the following is telling you:
following parameters are not supported: {"index"=>"%{target_index}"}
Remove the index option if you want to use a data stream. If you want explicit routing, remove the data_stream option.
If you need data to go to both destinations, use a second output.
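A minimal sketch of the two-output variant (cloud_id, api_key and the index name are placeholders):
output {
  # data stream output: no index option, routing comes from the [data_stream] fields
  elasticsearch {
    cloud_id => "..."
    api_key => "..."
    ssl => "true"
    data_stream => "true"
  }
  # classic index output: explicit routing, no data_stream option
  elasticsearch {
    cloud_id => "..."
    api_key => "..."
    ssl => "true"
    index => "%{target_index}"
    document_id => "%{_log_id}"
  }
}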

Elasticsearch index does not recognize date (Logstash pipeline)

I return with my doubts about Elasticsearch.
I've been doing more testing and I'm almost done learning about ELK.
But I have a little problem. When I create a pipeline and output to the Elasticsearch cluster as follows:
output {
  elasticsearch {
    hosts => [ "IP:9200", "IP:9200", "IP:9200" ]
    manage_template => false
    index => "example-%{+yyyy.MM.dd}"
  }
}
It does not create the date correctly, even though I have followed the documentation: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-index
The above code creates the index: example-
If instead I leave the defaults by removing the manage_template and index settings, it creates the index with the correct format: logstash-2020.08.17-000001
So I don't really know where I'm failing, whether it is a documentation issue or some other formatting error.
Another detail is that I have seen some examples from 2018 with the following format: index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}". Reviewing the same link about the elasticsearch output plugin, that is indeed about creating different indexes, but I think it refers more to separating indexes by production or development environment than to creating different indexes by date.
After a couple of days I noticed that I had changed the default name of the @timestamp field.
Now it works correctly. Thank you.
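For reference, the date in index => "example-%{+yyyy.MM.dd}" is taken from the event's @timestamp field, so that field has to keep its default name. If you need the value under another name as well, a minimal sketch is to copy it rather than rename it (the target field name here is hypothetical):
filter {
  mutate {
    # keep @timestamp intact so the index date math still resolves
    copy => { "@timestamp" => "event_time" }
  }
}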

Create a new index per day for Elasticsearch in Logstash configuration

I intend to have an ELK stack setup where daily JSON inputs get stored in log files, one created for each date. My Logstash shall listen to the input via these logs and store it in Elasticsearch at an index corresponding to the date of the log file entry.
My logstash-output.conf goes something like:
output {
  elasticsearch {
    host => localhost
    cluster => "elasticsearch_prod"
    index => "test"
  }
}
Thus, for now, all the inputs to Logstash get stored at the index test of Elasticsearch. What I want is that an entry to Logstash occurring on, say, 2015.11.19, which gets stored in the log file named logstash-2015.11.19.log, must correspondingly be stored at an index test-2015.11.19.
How should I edit my Logstash configuration file to enable this?
Answer because the comment can't be formatted and it looks awful.
Your filename (I assume you use a file input) is stored in your path field, like so:
file {
  path => "/logs/**/*my_log_file*.log"
  type => "myType"
}
This field is accessible throughout your whole configuration, so what you can do is use a regex filter to parse your date out of the path. For example, using grok, you could do something like this (look out: pseudocode):
if [type] == "myType" {
  grok {
    match => {
      "path" => "%{MY_DATE_PATTERN:myTimeStampVar}"
    }
  }
}
With this you now have your variable in "myTimeStampVar" and you can use it in your output:
elasticsearch {
  host => "127.0.0.1"
  cluster => "logstash"
  index => "events-%{myTimeStampVar}"
}
Having said all this, I am not quite sure why you need this. I think it is better to have ES do the job for you. It will know the timestamp of your log and index it accordingly, so you have easy access to it. However, the setup above should work for you; I used a very similar approach to parse out a client name and create sub-indexes on a per-client basis, for example: myIndex-%{client}-%{+YYYY.MM.dd}
Hope this helps,
Artur
Edit: I did some digging because I suspect that you are worried your logs get put in the wrong index because they are parsed at the wrong time. If this is correct, the solution is not to parse the index out of the log file name, but to parse the timestamp out of each log line.
I assume each log line for you has a timestamp. Logstash will create an @timestamp field, which defaults to the current date, so it would not match the intended index. The correct way to solve this is to overwrite the @timestamp field with the timestamp from your log line (the parsed one). That way Logstash will compute the correct index and put the event there.
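A minimal sketch of that, assuming the log line's timestamp was already grokked into a field called log_time in ISO8601 format (both the field name and the format are assumptions):
filter {
  date {
    # overwrite @timestamp with the time from the log line itself,
    # so the daily index is chosen by event time, not by ingest time
    match => [ "log_time", "ISO8601" ]
    target => "@timestamp"
  }
}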

ELK Type Conversion - Not a number but a string

I'm trying to set up an ELK dashboard to see some numbers like total bytes, average load time, etc. I'm forcing some conversions in Logstash to make sure these fields aren't strings:
convert => [ "bytes", "integer" ]
convert => [ "seconds", "float" ]
convert => [ "milliseconds", "integer" ]
Those Logstash conversions are working. See this excerpt from my logstash.log, where the status code is a string and bytes, seconds and milliseconds are numbers:
"http_statuscode" => "200",
"bytes" => 2731,
"seconds" => 0.0,
"milliseconds" => 9059,
But when I try to build my dashboard with, for instance, avg, min, max and total bytes, Elasticsearch logs this:
Facet [stats]: field [bytes] isn't a number field, but a string
Am I missing some kind of conversion or something? Has anybody already experienced this behavior?
Thanks guys and regards, Sebastian
One possible issue is that the mapping for fields in an index is set when the first document is inserted in the index. Changing the mapping will not update any old documents in the index, nor affect any new documents that are inserted into that index.
If you're in development, the easiest thing is to drop the index (thus deleting your earlier data). Any new documents would then use your new mapping.
If you can't toss the old data, you can wait for tomorrow, when you'll get a new index.
If necessary, you can also rebuild the index, but I've always felt it to be a pain.
One other possibility is that you have the same field name with different mappings in different types in the same index. [ repeat that a few times and it will make sense ]. Field [foo] must have the same mapping definition in each type of the same index.
I recently solved this problem (I mean using bytes or request time as numbers in Kibana; I use v4 beta 3, and you?). The following three points might help you:
How do you parse your log? Using a grok filter? If yes, you can try matching your logs with a pattern like %{INT:bytes:int} instead of using the convert filter (see the sketch after these points).
Did you "reload field list" (yellow button) in Kibana 4 (Settings -> Indices) after you made your changes?
If you have old indexes in your ES cluster, did you correctly remove them? If not, you might have some conflicts between old types and new ones.
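As a sketch of point 1, a grok filter with inline type casting might look like this, assuming a hypothetical log line that ends with a duration and a byte count (the pattern is only illustrative):
filter {
  grok {
    # the :float / :int suffixes cast the captured values at parse time,
    # so a separate convert step is not needed
    match => { "message" => "%{IP:client} %{NUMBER:seconds:float} %{INT:bytes:int}" }
  }
}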
Hope it will help.
