Elasticsearch creates multiple indices on a daily basis

I load this index into Elasticsearch:
curl -XPUT 'localhost:9200/filebeat?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": {
          "enabled": false
        }
      },
      "dynamic_templates": [
        {
          "template1": {
            "mapping": {
              "doc_values": true,
              "ignore_above": 50000,
              "index": "not_analyzed",
              "type": "{dynamic_type}"
            },
            "match": "*"
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "message": {
          "type": "string",
          "index": "analyzed"
        },
        "offset": {
          "type": "long",
          "doc_values": "true"
        },
        "geoip": {
          "type": "object",
          "dynamic": true,
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  },
  "settings": {
    "index.refresh_interval": "2s"
  },
  "template": "filebeat-*"
}
'
The result of curl 'localhost:9200/_cat/indices?v' is filebeat-2018-02-05, and every day a new index is added to Elasticsearch, so I have to add it in Kibana if I want to search my latest log file. Why does Elasticsearch add a new index every day, and how can I solve this problem (just have my own indexes)?
Thank you.

I assume you're pushing data to Elasticsearch using filebeat.
Elasticsearch doesn't decide what index your data should be written to. It is filebeat that tells Elasticsearch where the data should be written, and the default behaviour of filebeat/logstash is to create a new index every day.
If you want to visualize data for a range of indices, you can use a wildcard in your Kibana index pattern, say filebeat-*. All visualizations created against filebeat-* will then aggregate data from all your filebeat- indices.
The reason to have a new index every day is to help with the logging use case, where new data is more valuable than old data. This gives you an easy way to retire old data, or to move old indices to a less performant Elasticsearch node, etc.
If you still need a different pattern, you should be able to modify your filebeat config file and specify a new index value; see the filebeat documentation.

Related

Avoid creating dual mappings from logstash

I notice that logstash creates an extra "keyword" field in the index mapping for every string field that it extracts from the log files and sends to Elasticsearch.
There are many fields that I've removed completely with the prune plugin, but there are other fields that I don't want to remove completely; I just don't need a *.keyword for them.
Is there a way to have logstash only create *.keyword fields for some fields and not others? Specifically, is there a way for logstash to have a whitelist of fields that it is OK to create *.keyword fields for, and not do it for anything else?
(using elasticsearch 6.x)
I think you need to change the mapping of the desired fields. The mapping-changes page describes the default text type mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_mapping_changes.html
I tried to set a field without a keyword sub-field and it worked, except that you couldn't aggregate on that field (I tried a terms aggregation) even if you set index: true in the mapping. I might have missed something, but I think this is where you should start.
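As a minimal sketch of that direction (index, type, and field names are assumptions for ES 6.x), mapping a field as plain text with no keyword sub-field looks like this; as noted above, you lose terms aggregations on it:

curl -XPUT 'localhost:9200/myindex?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "doc": {
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}
'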
The solution I'm working with for now is dynamic templates.
I can map some fields to just text, and others to text plus a keyword. For example:
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "match_my_custom_fields": {
            "match_mapping_type": "string",
            "match": "custom_prefix_*",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      ],
      "properties": {
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  }
}
This way, everything beginning with custom_prefix_ will get a text field plus a keyword sub-field, and everything else will just get a keyword field.
Of course, I somehow broke the geoip geo_point that was being emitted by the geoip logstash plugin, and now my map visualizations won't work, so I need to figure out how to restore that.
EDIT: Got geo_point working again, see the "geoip" property above.

Only allow fields that are in the index template

I have logstash pushing docs into an elasticsearch cluster.
And I apply a template to the indices with logstash:
elasticsearch {
  hosts => ["1.1.1.1", "2.2.2.2"]
  index => "logstash-myindex-%{+YYYY-MM-dd}"
  template_name => "mytemplate"
  template => "/etc/logstash/index_templates/mytemplate.json"
  template_overwrite => true
}
Is there a way I can have only the fields defined in the template get added to the docs? Sometimes the docs have a bunch of other fields I don't care about, and I don't want to manually filter out each one. I want to be able to say: if a field is not in the index template, do not add it.
Edit:
I did this in my index template, but fields not specified in the template are still getting added to docs:
{
  "template": "logstash-myindex*",
  "order": 10,
  "mappings": {
    "_default_": {
      "dynamic": "scrict",
      "_all": {
        "enabled": false
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": false
        },
        "@version": {
          "type": "keyword",
          "include_in_all": false
        },
        "bytesReceived": {
          "type": "text",
          "norms": false,
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        .... etc
I'm not familiar with logstash, but I'm assuming this is just like creating an index in Elasticsearch.
In Elasticsearch you can disable the dynamic creation of fields by adding:
"dynamic": false
to the mapping.
This would look something like this:
{
  "mappings": {
    "_default_": {
      "dynamic": false
    }
  }
}
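Note the difference between the two values: "dynamic": false silently ignores unmapped fields (they stay in _source but are not indexed or searchable), while "dynamic": "strict" rejects any document containing an unmapped field. Incidentally, the template in the question spells it "scrict", so strict mode was never actually in effect there. A minimal sketch of the strict variant:

{
  "mappings": {
    "_default_": {
      "dynamic": "strict"
    }
  }
}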

Elasticsearch first query is slow, rest of them are fast

I'm using this kind of mapping (well, it's a shortened version, in order to make the question easier) for a parent-child relationship where items is the parent and user_items is the child.
curl -XPUT 'localhost:9200/myindex?pretty=true' -d '{
  "mappings": {
    "items": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" }
      }
    },
    "user_items": {
      "dynamic": "strict",
      "_parent": { "type": "items" },
      "properties": {
        "user_id": { "type": "integer" },
        "source_id": { "type": "integer" }
      }
    }
  }
}'
And the type of query I usually make:
curl -XGET 'localhost:9200/myindex/items/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": ["title", "body"],
            "query": "mercado"
          }
        },
        {
          "has_child": {
            "type": "user_items",
            "query": {
              "term": {
                "user_id": 655
              }
            }
          }
        }
      ]
    }
  }
}'
This query searches the fields title and body for the string mercado for a given user_id, in this case 655.
I read that the reason the first query is so slow is that its data gets cached, and the following queries are fast because they work with the cached content.
I read I can make the first query faster by using eager loading to preload my data (using "loading": "eager", right?), but I don't know what I have to preload. Do I have to use eager on title and body as follows?
{
  "mappings": {
    "items": {
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string",
          "fielddata": {
            "loading": "eager"
          }
        },
        "body": {
          "type": "string",
          "fielddata": {
            "loading": "eager"
          }
        }
      }
    },
    "user_items": {
      "dynamic": "strict",
      "_parent": { "type": "items" },
      "properties": {
        "user_id": { "type": "integer" },
        "source_id": { "type": "integer" }
      }
    }
  }
}
Any other recommendation for boosting/caching the first query is welcome. Thanks in advance.
PS: I'm using ES 2.3.2 on a Linux machine and I have a total of 25,396,369 documents.
Just fixed the same issue. Field data preloading was the key: index warming was deprecated, and doc_values are on by default. My application searched a couple of fields on a large index (100G+) and was slow. I had to rebuild the index with "loading": "eager" for all of the fields that I searched on. This preloads the field data and causes a fairly long startup, but after that the initial search went from 10s to under 900ms (subsequent searches were under 400ms in both cases). Create the mapping and reimport the data:
PUT localhost:9200/newindex/
{
  "mappings": {
    "items": {
      "properties": {
        "title": {
          "type": "string",
          "fielddata": {
            "loading": "eager"
          }
        },
        "body": {
          "type": "string",
          "fielddata": {
            "loading": "eager"
          }
        }
      }
    }
  }
}
There are three things you can do.
Use field data preloading
To preload field data, use the following snippet in the mapping:
"fielddata": {
  "loading": "eager"
}
More details here
Use index warmer
Index warmers are queries that you can configure to run automatically whenever an index is refreshed.
This link contains details on how to set up a warmer.
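As a rough sketch (warmers were deprecated in 2.3 and removed in 5.0, so this only applies to older clusters), you could register the expensive has_child query from the question as a warmer:

PUT localhost:9200/myindex/_warmer/user_items_warmer
{
  "query": {
    "has_child": {
      "type": "user_items",
      "query": {
        "term": { "user_id": 655 }
      }
    }
  }
}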
Use doc_values
Doc values are an on-disk data structure, built at document index time, which makes the data access patterns needed for aggregation and sorting possible.
Find more details here
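On ES 2.x, doc values are already enabled by default for not_analyzed string and numeric fields. A minimal sketch of enabling them explicitly (field taken from the question's mapping):

{
  "mappings": {
    "user_items": {
      "properties": {
        "user_id": {
          "type": "integer",
          "doc_values": true
        }
      }
    }
  }
}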

How to define the shards number on ElasticSearch?

I've installed an Elasticsearch cluster with three nodes, which I intend to use for searching emails.
On my platform I'll have 40k emails per month. So my question is: how do I define the number of shards and replicas in Elasticsearch?
Are there best practices for sizing this?
Thanks in advance.
You can define it in the elasticsearch.yml file or when you create a template for an index.
The best approach is to have an index per timeframe, like one index per month or week. This helps because we can't increase the number of shards of an index once it's created.
Now you need to define the mapping and settings of the indices that are going to be created in the future, so you define an index template, which will in turn define these.
The best practice is to set the shard and replica counts in the template.
The following is a sample template which will be applied to any index created in the future whose name starts with te.
curl -XPUT localhost:9200/_template/te_prefix -d '{
  "template": "te*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}'
Or, if you want to create just one index with its settings, you can use the following. Here the index name is stats:
curl -X PUT "http://localhost/stats" -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"analysis": {
"analyzer": {
"flat": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"stats": {
"properties": {
"DocCount": {
"type": "long"
},
"Query": {
"type": "string",
"analyzer": "flat"
},
"ResponseTime": {
"type": "long"
}
}
}
}
}'

Changing elasticsearch index's shard-count on the next index-rotation

I have an ELK (Elasticsearch-Kibana) stack wherein the Elasticsearch node has the default shard count of 5. Logs are pushed to it in the logstash format (logstash-YYYY.MM.DD), which - correct me if I am wrong - means they are indexed date-wise.
Since I cannot change the shard count of an existing index without reindexing, I want to increase the number of shards to 8 when the next index is created. I figured that the ES API allows on-the-fly persistent changes.
How do I go about doing this?
You can use the "Template Management" features in Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/indices-templates.html
Create a new logstash template by using:
curl -XPUT localhost:9200/_template/logstash -d '
{
  "template": "logstash-*",
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 8,
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@version": {
          "type": "string",
          "index": "not_analyzed"
        },
        "geoip": {
          "type": "object",
          "dynamic": true,
          "path": "full",
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}'
The next time the index that matches your pattern is created, it will be created with your new settings.
The setting is in your Elasticsearch config. You need to change the config file config/elasticsearch.yml:
Change index.number_of_shards to 8 and restart Elasticsearch. The new configuration will take effect, and new indices will be created with 8 shards as you want.
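For reference, the relevant lines in config/elasticsearch.yml would look like the sketch below. Note that this only affects indices created after the restart, and that 5.x and later reject index-level settings in elasticsearch.yml, so prefer the template approach from the other answer there:

# config/elasticsearch.yml
index.number_of_shards: 8
index.number_of_replicas: 1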
Best would be to use templates, and to add one I would recommend the Kopf plugin, found here: https://github.com/lmenezes/elasticsearch-kopf
You can of course use the API:
curl -XPUT $ELASTICSEARCH-MASTER$:9200/_template/$TEMPLATE-NAME$ -d '$TEMPLATE-CONTENT$'
In the plugin: in the top left corner click on more -> index templates, then create a new template and make sure you have the following settings as part of your template:
{
  "order": 0,
  "template": "logstash*",
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": { ### your mapping ### },
  "aliases": {}
}
The above settings will make sure that when a new index with a name matching logstash* is created, it will have 5 shards and 1 replica.
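To double-check what will be applied to the next index, you can read the template back (template name assumed to match the examples above):

curl -XGET localhost:9200/_template/logstash?pretty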
