Can't assign Elasticsearch index for data stream - elasticsearch

I am trying to create an index in Elasticsearch using the index => option of the Logstash elasticsearch output, running on Docker:
output {
  elasticsearch {
    cloud_id => "..."
    data_stream => "true"
    ssl => "true"
    api_key => "..."
    document_id => "%{_log_id}"
    index => "%{target_index}"
  }
}
If I comment out the index line, the pipeline works and data is sent to the default index. However, with index defined (with or without it being a constant string), the following error is given on launch, before any data is ingested:
elasticsearch - Invalid data stream configuration, following parameters are not supported: {"index"=>"%{target_index}"}
Here target_index is an entry in the JSON body parsed in the filter.
The pipeline then breaks with Could not execute action: PipelineAction::Create<firmware_pipeline>, indicating that this happens before the pipeline is actually triggered.
Not sure if I'm just reading the docs wrong, but this seems to be what others are doing as well.
Logstash version: 7.13.2

When you use a data stream, events are automatically routed to indices based on the values of the [data_stream] fields on the event. You cannot have that automatic routing at the same time as explicit routing with the index => "%{target_index}" option. That is what the following is telling you:
following parameters are not supported: {"index"=>"%{target_index}"}
Remove the index option if you want to use a data stream. If you want explicit routing, remove the data_stream option.
If you need data to go to both destinations, use a second output.
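For reference, a minimal sketch of a data-stream-only output; the data_stream_type/dataset/namespace values here are illustrative, not taken from the question:
output {
  elasticsearch {
    cloud_id => "..."
    api_key => "..."
    ssl => "true"
    data_stream => "true"
    # Routing is driven by the data_stream fields on the event rather than index =>
    data_stream_type => "logs"
    data_stream_dataset => "firmware"
    data_stream_namespace => "default"
  }
}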

Related

Logstash puts all log files in one Elasticsearch index instead of creating a new index per log file day

This is the naming convention of my log files:
adminPortal-2021-10-10.0.log
adminPortal-2021-10-27.0.log
I need to publish them to different indices that match the log file date, but Elasticsearch publishes logs from all log files into one index.
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "admin-%{+YYYY-MM-dd}"
  }
}
A sprintf reference to a date, like %{+YYYY-MM-dd}, always uses the value of the @timestamp field. If you want it to use the date from the log entry, you will need to parse the timestamp out of the [message] field, possibly using grok, and then parse that with a date filter to overwrite the default value of the @timestamp field (which is Time.now).
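A minimal sketch of that approach, assuming each log line starts with an ISO8601 timestamp (the field name log_ts is just illustrative; adjust the grok pattern to your real format):
filter {
  grok {
    # Capture the leading timestamp from the log line
    match => { "message" => "^%{TIMESTAMP_ISO8601:log_ts}" }
  }
  date {
    # Overwrite @timestamp so the index sprintf uses the log's own date
    match => ["log_ts", "ISO8601"]
  }
}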

Logstash plugin kv - keys and values not getting read into Elastic

Small part of my CSV log:
TAGS
contentms:Drupal;contentms.ver:7.1.8;vuln:rce;cve:CVE-2018-0111;
cve:CVE-2014-0160;vuln:Heartbleed;
contentms.ver:4.1.6;contentms:WordPress;tag:backdoor
tag:energia;
The idea is that I know nothing about the keys and values other than the format
key:value;key:value;key:value;key:value; etc
I just create a pattern with the Logstash "kv" plugin:
kv {
  source => "TAGS"
  field_split => ";"
  value_split => ":"
  target => "TAGS"
}
I've been trying to get my data into Elasticsearch for Kibana and some of it goes through. But, for example, the keys contentms: and contentms.ver: don't get read. Also, for the keys that do, only one value is searchable in Kibana. For example, the key cve: appears on multiple lines, multiple times in my log, with different values, but only the value cve:CVE-2014-0160 is indexed; same problem for the tag: and vuln: keys.
I've seen some similar problems and solutions with Ruby, but is there any solution with just kv? Or should I change my log format around a bit?
I can't test it right now, but notice that you have both "contentms" (a string) and "contentms.ver", which probably looks to Elasticsearch like a nested field ([contentms][ver]); but "contentms" was already defined as a string, so you can't nest beneath it.
After the csv filter, try renaming "contentms" to "[contentms][name]", which would then be a peer of "[contentms][ver]" (see the mutate sketch below).
You'd need to start with a new index to create this new mapping.
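A sketch of that rename with a mutate filter (since the kv target above is TAGS, the parsed keys sit under [TAGS]; adjust the paths if your layout differs):
filter {
  mutate {
    # Move the plain string value into a subfield so contentms can hold an object
    rename => { "[TAGS][contentms]" => "[TAGS][contentms][name]" }
  }
}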

How to easily change a field from analyzed to non_analyzed

I have a hostname field that comes in via Filebeat to my Logstash instance and gets passed to Elasticsearch, where it's being treated as an analyzed field. That's causing issues, because the field itself needs to be reported on in its totality.
Example: Knowing how many requests come to "prd-awshst-x-01" rather than splitting those out into prd, awshst, x, 01.
Does anyone have a lightweight way of doing this that can be used with visualizations?
Thanks,
We have to update the mapping from analyzed to not_analyzed for the specific field, roughly like this (pre-5.x syntax; the index and type names are placeholders, using the hostname field from the question):
PUT /<your_index>/_mapping/<your_type>
{
  "properties": {
    "hostname": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
After updating the property, check that the change is reflected in the mapping with a GET on the same mapping URL.
Based on the title of your post, you already know that you need to change the mapping of the field to not_analyzed.
You should set up an index template so that future indices contain this mapping.
If you want to keep the existing data, you'll have to reindex it into a new index with the new mapping.
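For example, a legacy (pre-5.x) index template along these lines would pin the mapping for future indices (a sketch; the logstash-* pattern and the hostname field name are assumptions based on the question):
PUT /_template/hostname_not_analyzed
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "hostname": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}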
If you're using the default Logstash template, it may already be creating a not_analyzed ".raw" sub-field that you can use in visualizations in Kibana.
The index template that is provided with Filebeat configures the hostname field as not_analyzed.
You should manually install the index template provided with Filebeat and then configure Logstash to write data to the Filebeat index as described in the docs.
This is what the elasticsearch output would look like. If you are processing other data through Logstash, you might want to add a conditional around this output so that only Beats events are sent through it (a sketch of that conditional follows the config below).
output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
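A sketch of that conditional, assuming the events arrive through the beats input (which is what populates [@metadata][beat]):
output {
  if [@metadata][beat] {
    elasticsearch {
      hosts => "localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
      document_type => "%{[@metadata][type]}"
    }
  }
}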

logstash metadata not passed to elasticsearch

I am trying to follow the example https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-centos-7
But the index name set by 30-elasticsearch-output.conf is not being resolved. The example 30-elasticsearch-output.conf file contains:
index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
In my case, the resulting Elasticsearch index name is:
"%{[@metadata][beat]}-2016.09.07"
Only the date portion of the index name is set correctly.
What is responsible for setting the metadata value? I must have missed something in following the example.
This is related to a question asked earlier: ELK not passing metadata from filebeat into logstash
You can create the index like this instead:
index => "%{[beat][name]}-%{+YYYY.MM.dd}"
This should work, since [beat][name] is a regular field on the event rather than part of [@metadata].
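If you want to check whether [@metadata] is actually being populated by the beats input, one way is to temporarily print events with metadata included (the rubydebug codec hides [@metadata] unless asked):
output {
  stdout { codec => rubydebug { metadata => true } }
}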

Create a new index per day for Elasticsearch in Logstash configuration

I intend to have an ELK stack setup where daily JSON inputs get stored in log files, one created for each date. My Logstash instance shall listen to the input via these logs and store it in Elasticsearch at an index corresponding to the date of the log file entry.
My logstash-output.conf goes something like:
output {
  elasticsearch {
    host => localhost
    cluster => "elasticsearch_prod"
    index => "test"
  }
}
Thus, as of now, all the input to Logstash gets stored at the index test of Elasticsearch. What I want is that an entry reaching Logstash on, say, 2015.11.19, which gets stored in the log file named logstash-2015.11.19.log, is correspondingly stored at an index test-2015.11.19.
How should I edit my logstash configuration file to enable this ?
Posting this as an answer because the comment can't be formatted and it looks awful.
Your filename (I assume you use a file input) is stored in your path field, like so:
file {
  path => "/logs/**/*my_log_file*.log"
  type => "myType"
}
This field is accessible throughout your whole configuration, so what you can do is parse your date out of the path with a regex. For example, using grok, you could do something like this (look out: pseudocode):
if [type] == "myType" {
  grok {
    match => {
      "path" => "%{MY_DATE_PATTERN:myTimeStampVar}"
    }
  }
}
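To make the pseudocode concrete for file names like logstash-2015.11.19.log, the pattern could be spelled out like this (a sketch; the capture name matches the pseudocode above):
grok {
  # Pull a YYYY.MM.dd date out of the file path into myTimeStampVar
  match => {
    "path" => "(?<myTimeStampVar>%{YEAR}\.%{MONTHNUM}\.%{MONTHDAY})"
  }
}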
With this you now have your variable in "myTimeStampVar" and you can use it in your output:
elasticsearch {
  host => "127.0.0.1"
  cluster => "logstash"
  index => "events-%{myTimeStampVar}"
}
Having said all this, I am not quite sure why you need this. I think it is better to have ES do the job for you: it will know the timestamp of your log and index it accordingly, so you have easy access to it. However, the setup above should work for you; I used a very similar approach to parse out a client name and create sub-indices on a per-client basis, for example: myIndex-%{client}-%{+YYYY.MM.dd}
Hope this helps,
Artur
Edit: I did some digging because I suspect that you are worried your logs get put in the wrong index because they are parsed at the wrong time. If that is the case, the solution is not to parse the index out of the log file name, but to parse the timestamp out of each log line.
I assume each log line has a timestamp. Logstash creates an @timestamp field set to the current date, so it will not match the index you want. The correct way to solve this is to overwrite the @timestamp field with the timestamp parsed from your log line; that way Logstash will compute the correct index and put the event there (see the date filter sketch below).
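A minimal sketch of that overwrite, assuming the log line's timestamp has already been captured into a hypothetical log_ts field by grok:
date {
  # Replace the default @timestamp (processing time) with the time taken from the log line
  match => ["log_ts", "ISO8601"]
  target => "@timestamp"
}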
