How to control the number of shards in an Elasticsearch index from Logstash?

I would like to control how many shards a new index should have in my Logstash output file. For example, in 10-output.conf:
output {
  if [type] == "mytype" {
    elasticsearch {
      hosts => [ "1.1.1.1:9200" ]
      index => "logstash-mytype-%{+YYYY.ww}"
      workers => 8
      flush_size => 1000
      # <====== what option to control the number of index shards goes here?
    }
  }
}
From what I understand of the Logstash elasticsearch output options, this is not possible, and a new index will default to 5 shards?

The Logstash-Elasticsearch combination is designed to work differently from what you expect: in Elasticsearch you define an index template in which the number of shards is a configuration setting.
Whenever Logstash creates a new index by sending documents to it, Elasticsearch matches the new index name against the configured templates and uses the matching template to actually create the index.
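For example, a minimal index template (a sketch; the template name and shard counts here are illustrative) that makes every new logstash-mytype-* index get created with 2 primary shards:
PUT _template/logstash-mytype
{
  "index_patterns": ["logstash-mytype-*"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
On Elasticsearch versions before 6.0 the pattern key is "template" rather than "index_patterns". Alternatively, the Logstash elasticsearch output can install a custom template for you via its manage_template, template and template_name options.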

Related

Elasticsearch copy index mappings

We have an Elasticsearch cluster consisting of 6 nodes on version 6, and we have an index called bishkek in the cluster. Now I want to copy only the index mappings (no data) to a new index bishkek_v2.
Elasticsearch doesn't have an API that copies only mappings, so you first need to get the mapping of the bishkek index and create the new index based on it. To get the mapping you can run this GET request:
GET /bishkek/_mapping
After getting the mapping, you create your new index:
PUT /bishkek_v2
{
  "mappings": {
    [Mapping you get from your old index]
  }
}
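For illustration (the field names below are hypothetical), if GET /bishkek/_mapping returns a properties block like this, you paste it into the create request:
PUT /bishkek_v2
{
  "mappings": {
    "properties": {
      "city": { "type": "keyword" },
      "population": { "type": "integer" }
    }
  }
}
Note that on a version 6 cluster the mappings in the GET response are nested under the document type (e.g. "_doc"), so keep that extra level if it is present.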
The _clone API will also copy the index, though note that it clones the data as well, not only the mappings:
POST /my_source_index/_clone/my_target_index
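Note that _clone (available since Elasticsearch 7.4) requires the source index to be read-only first, which you can do with the settings API before cloning:
PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}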

Logstash Elasticsearch Output with specific fields to index

Is there a possibility for Logstash to ingest the whole document into one index and specific fields into another index in the Elasticsearch output?
Example: if I have 10 fields coming in from the input, can I write all the fields to one index and some specific fields to another index?
My current Logstash config looks like this:
input {
  # kafka input
}
filter {
  # fingerprint id will be generated using some fields
}
output {
  elasticsearch {
    # the whole document should go into this index
  }
  elasticsearch {
    # only the fingerprint field should go into this index
  }
}
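One way to achieve this (a sketch, not from the original thread; the field names, index names, and hosts are hypothetical) is to clone each event and prune the copy down to the fingerprint field, then route the two event types to different indices:
filter {
  fingerprint {
    # hypothetical source fields
    source => ["field1", "field2"]
    concatenate_sources => true
    target => "fingerprint"
    method => "SHA1"
  }
  # create a second copy of each event; clone sets its type to "fingerprint_only"
  clone {
    clones => ["fingerprint_only"]
  }
  # strip the copy down to the fingerprint (keep type for routing below)
  if [type] == "fingerprint_only" {
    prune {
      whitelist_names => ["^fingerprint$", "^type$", "^@timestamp$"]
    }
  }
}
output {
  if [type] == "fingerprint_only" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "fingerprints"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "fulldocs"
    }
  }
}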

logstash restrict search result to past day

I want to query Elasticsearch, from the Logstash elasticsearch input plugin, for the index from the day before the current date.
I tried the following config for Logstash:
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd-6}"
    query => '{ "query": { "query_string": { "query": "*" } } }'
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
output {
  stdout { codec => rubydebug }
}
Can someone help me on how to do it?
You can use a date math index name within your Elasticsearch query:
Date math index name resolution enables you to search a range of
time-series indices, rather than searching all of your time-series
indices and filtering the results or maintaining aliases. Limiting the
number of indices that are searched reduces the load on the cluster
and improves execution performance. For example, if you are searching
for errors in your daily logs, you can use a date math name template
to restrict the search to the past two days.
Almost all APIs that have an index parameter, support date math in the
index parameter value.
For instance, to search the index for yesterday, assuming the index uses the default Logstash index name format, logstash-YYYY.MM.dd:
GET /<logstash-{now/d-1d}>/_search
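The same expression should also work from the Logstash elasticsearch input, since the index option is passed to the search API (a sketch based on the question's config; not verified against every plugin version):
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "<logstash-{now/d-1d}>"
    query => '{ "query": { "query_string": { "query": "*" } } }'
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
When calling the endpoint directly over HTTP, the date math characters must be percent-encoded: GET /%3Clogstash-%7Bnow%2Fd-1d%7D%3E/_search.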

Logstash doc_as_upsert cross index in Elasticsearch to eliminate duplicates

I have a logstash configuration that uses the following in the output block in an attempt to mitigate duplicates.
output {
  if [type] == "usage" {
    elasticsearch {
      hosts => ["elastic4:9204"]
      index => "usage-%{+YYYY-MM-dd-HH}"
      document_id => "%{[@metadata][fingerprint]}"
      action => "update"
      doc_as_upsert => true
    }
  }
}
The fingerprint is calculated from a SHA1 hash of two unique fields.
This works when Logstash sees the same doc in the same index, but since the command that generates the input data doesn't emit documents at a reliable rate, Logstash will sometimes insert duplicate docs into a different date-stamped index.
For example, the command that Logstash runs to get the input generally returns the last two hours of data. However, since I can't definitively tell when a doc will appear/disappear, I run the command every fifteen minutes.
This is fine when the duplicates occur within the same hour. However, when the hour or day date stamp rolls over and the document still appears, Elasticsearch/Logstash thinks it's a new doc.
Is there a way to make the upsert work across indices? These would all be the same type of doc; they would simply apply to every index that matches "usage-*".
A new index is an entirely new keyspace and there's no way to tell ES to not index two documents with the same ID in two different indices.
However, you could prevent this by adding an elasticsearch filter to your pipeline which looks up the document in all indices and, if it finds one, drops the event.
Something like this would do (note that usages would be an alias spanning all usage-* indices):
filter {
  elasticsearch {
    hosts => ["elastic4:9204"]
    index => "usages"
    query => "_id:%{[@metadata][fingerprint]}"
    fields => {"_id" => "other_id"}
  }
  # if the document was found, drop this one
  if [other_id] {
    drop {}
  }
}
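To create the usages alias spanning all existing usage-* indices, a minimal sketch using the aliases API (new indices can pick the alias up automatically if you add it to your index template):
POST /_aliases
{
  "actions": [
    { "add": { "index": "usage-*", "alias": "usages" } }
  ]
}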

Logstash Indexing

I would like to create two separate indexes for two different systems that are sending data to the Logstash server set up for UDP syslog. In Elasticsearch, I created an index called CiscoASA01 and another index called CiscoASA02. How can I configure Logstash to filter all events coming from the first device into the CiscoASA01 index and the events coming from the second device into the second index? Thank you.
You can use if conditionals to separate the logs. Assume your first device is CiscoASA01 and the second is CiscoASA02.
Here is the output:
output {
  if [host] == "CiscoASA01" {
    elasticsearch {
      hosts => ["elasticsearch_server:9200"]
      # note: Elasticsearch index names must be lowercase
      index => "ciscoasa01"
    }
  }
  if [host] == "CiscoASA02" {
    elasticsearch {
      hosts => ["elasticsearch_server:9200"]
      index => "ciscoasa02"
    }
  }
}
[host] is a field in the Logstash event; you can use it to route logs to different outputs.
Hope this helps.
