Elastic index rollover at inconsistent docs count - elasticsearch

I have created ILM policy with following configuration :
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_docs": 15000
}
}
},
"delete": {
"min_age": "1d",
"actions": {
"delete": {}
}
}
}
}
}
With the index_template for matching indices.
Now when I bootstrap the index with initial index lets say <index-name>-000001 then the expectation is to rollover the index once docs.count reaches to 15000.
The rollover is happening but at random docs.count and I'm not sure why it is happening.
Also docs.count is not updating I will have to manually hit refresh API, then docs count is getting updated at /_cat/indices. Please let me know what is wrong in the config. Did I missing anything?

The problem was with the refresh_interval.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
My indices were not getting search request so couldn't update the indices.
I changed the default setting, I added "refresh_interval":"10s" in my index template which inherited into my newly created index.
Also I changed the _cluster/setting poll interval to 1 minute for testing and it worked!

Related

What is the best way to update cache in elasticsearch

I'm using elasticsearch index as a cache table.
My document structure is the following:
{
"mappings": {
"dynamic": False,
"properties": {
"query_str": {"type": "text"},
"search_results": {
"type": "object",
"enabled": false
},
"query_embedding": {
"type": "dense_vector",
"dims": 768,
},
}
}
The cache search is performed via embedding vector similarity. So if the embedding of the new query is close enough to a cached one, it is considered as a cache hit, and search_results field is returned to the user.
The problem is that I need to update cached results about once an hour. I wish my service won't lose the ability to use cache efficiently while updating procedure, so I'm not sure which one of solutions is the best:
Sequentially update documents one-by-one, so the index won't be destroyed. The drawback of this solution I afraid is the fact, that every update causes index rebuilding, so the cache requests will become slow
Create entirely new index with new results and then somehow swap current cache index with the new one. The drawbacks I see are
a) I've found no elegant way to swap indexes
b) Users will get their cached resuts lately than in solution (1)
I would go with #2 as everytime you update a document the cache is flushed.
There is an elegant way to swap indices:
You have an alias that points to your current index, you fill a new index with the fresh records, and then you point this alias to the new index.
Something like this:
Current index name is items-2022-11-26-001
Create alias items pointing to items-2022-11-26-001
POST _aliases
{
"actions": [
{
"add": {
"index": "items-2022-11-26-001",
"alias": "items"
}
}
]
}
Create new index with fresh data items-2022-11-26-002
When it finishes, now point the items alias to items-2022-11-26-002
POST _aliases
{
"actions": [
{
"remove": {
"index": "items-2022-11-26-001",
"alias": "items"
}
},
{
"add": {
"index": "items-2022-11-26-002",
"alias": "items"
}
}
]
}
Delete items-2022-11-26-001
You run all your queries against "items" alias that will act as an index.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

ElasticSearch ILM not deleting indices

I have set a simple ILM policy on my fluentd.* indices to be deleted after (for testing - ) a short period of time.
ILM:
PUT _ilm/policy/fluentd
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_age": "1d",
"max_size": "1gb"
},
"set_priority": {
"priority": 100
}
}
},
"delete": {
"min_age": "4d",
"actions": {
"delete": {}
}
}
}
}
}
Index Template:
PUT _template/fluentd
{
"order": 0,
"index_patterns": [
"fluentd.*"
],
"settings": {
"index": {
"lifecycle": {
"name": "fluentd"
}
}
},
"aliases": {
"fluent": {}
}
}
With these settings, I expected ES to delete indices older than 5-6 days, but there are still indices from 3 weeks ago in ES. Currently, it says there are 108 linked indices to this ILM policy.
What is it actually doing, it seems it's not doing anything at all... how to delete indices after x days?
I tried first to use the "index template" but it's useless, it does not apply settings to each index (maybe yes but only on creation????).
Then I put the ILM on the index by hand (another bug: you can't select all index and hit "add ILM policy" - you need to add the policy one by one) which required me to click about 600 times.
Now the problem was, I had "hot" phase defined but it didn't trigger (it's buggy?) - because the hot phase didn't trigger (i set to to "rollover after 1 day after index creation") - the delete phase didn't either. When I removed the hot phase and applied the ILM to index again with only delete - it worked! but adding and removing all this is buggy, I get Ooops, something went wrong errors here and there.
I don't understand why I have to remove the ILM and reapply it to each index when I change something in the ILM policy. It's 1000% inconvenient.
ES really needs to put some work into it, it's still too beta and I got a hell lot of status code 500, although I am using most recent version directly on Elastic Cloud.
With these settings, I expected ES to delete indices older than 5-6 days, but there are still indices from 3 weeks ago in ES. Currently, it says there are 108 linked indices to this ILM policy.
With your settings, the delete phase starts at 4 day from rollover. If you want to start the delete phase at 4 day from "index creation" you need to remove the rollover action from the hot phase:
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"set_priority": {
"priority": 100
}
}
},
"delete": {
"min_age": "4d",
"actions": {
"delete": {}
}
}
}
}
}
I tried first to use the "index template" but it's useless, it does not apply settings to each index (maybe yes but only on creation????).
Yes, it works on index creation.
Then I put the ILM on the index by hand (another bug: you can't select all index and hit "add ILM policy" - you need to add the policy one by one) which required me to click about 600 times.
Kibana does not allow you to apply ILM policy to all index, but the elasticsearch API allows it!
Simply open a kibana dev tools and run the follow request:
PUT fluentd.*/_settings
{
"index": {
"lifecycle": {
"name": "fluentd"
}
}
}
Now the problem was, I had "hot" phase defined but it didn't trigger (it's buggy?) - because the hot phase didn't trigger (i set to to "rollover after 1 day after index creation") - the delete phase didn't either. When I removed the hot phase and applied the ILM to index again with only delete - it worked! but adding and removing all this is buggy, I get Ooops, something went wrong errors here and there.
If rollover phase was not triggered, the ILM could not progress.
I don't understand why I have to remove the ILM and reapply it to each index when I change something in the ILM policy. It's 1000% inconvenient.
Because the ILM definition are cached on each index.
see the doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-index-lifecycle.html#ilm-phase-execution
A little bit late, but maybe it will help somebody.
Another reason can be like that was mentioned here:
ILM is not really intended to be used on 1m lifecycle. I Do not
believe You will achieve your desired behavior. My understanding is
that ILM is an opportunistic background task it is not preemptive so
it is not going to execute on the exact time frame.
It's designed to work on the order of hours or days not minutes.
I have the same situation at my indices and I checked - indices are deleted, but later than I sat them up.

How to point elasticsearch alias to current index and removing the alias from old index from index template?

In our application , we are creating the elasticsearch index daily basis and index pattern is index-. (eg. index-17-09-2019). But our application is accessing the index through an alias which is pointing the current index. Now attaching and removing of the alias with the index is done through a cron job. Is it possible to do it through through index template as we are avoiding the cron job.
We can attach alias with the index through index template but I am not sure whether we can detach the alias with the old index and add it to the new index through index template.
That can be done with built-in index lifecycle management (ILM). Your application will be sending data to index alias and ILM will take care of the rest.
Here is the description of how it can be done, but basically you need to:
1. Create ILM job
PUT /_ilm/policy/my_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_age": "1d"
}
}
}
}
}
}
2. Create an index template with ILM policy attached
PUT _template/my_template
{
"index_patterns": ["test-*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "my_policy",
"index.lifecycle.rollover_alias": "test-alias"
}
}
3. Start the process by creating init index
PUT test-000001
{
"aliases": {
"test-alias":{
"is_write_index": true
}
}
}
That will help you with handling creation of new index every day without external CRON job. You can also extend your policy, later on to e.g. delete old indices after 7 days after rollover.
Hope that helps.

Elasticsearch - Reindex whole cluster using pattern for new index name

I have an index with thousands of indices, with 5 shards per index.
I would like to reindex them with only 1 shard per index.
Is there a build in solution in Elastic to reindex for instance all the indices by adding "-reindexed" to each index ?
Looks like you want to dynamically change the index names while reindexing.
Let's understand this with an example:
1) Add some indices:
POST sample/_doc/1
{
"test" : "sample"
}
POST sample1/_doc/1
{
"test" : "sample"
}
POST sample2/_doc/1
{
"test" : "sample"
}
2) Use Reindex API to dynamically change the index names while reindexing multiple indices:
POST _reindex
{
"source": {
"index": "sample*"
},
"dest": {
"index": ""
},
"script": {
"inline": "ctx._index = ctx._index + '-reindexed'"
}
}
The above request will reindex all the indices starting with sample and add -reindexed in their indexNames. So that means sample, sample1 and sample2 will be reindexed as sample-reindexed, sample1-reindexed and sample2-reindexed all with this one request.
In order to set up the destination indices with one shard you need to
create those indices before reindexing.
Hope that helps.
You could do a simple reindex but I'd also recommend you take a look at the Shrink Index API:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/indices-shrink-index.html
The documentation above links to v7.0, but this has been around for many iterations.
In your example, you would do something similar to the following:
First, reallocate copies of all primary or replica shards to a single node and prevent any future write-access while the shrink operations are being performed.
PUT /my_source_index/_settings
{
"settings": {
"index.routing.allocation.require._name": "shrink_node_name",
"index.blocks.write": true
}
}
Initiate the shrink operation, clear the index settings set in the previous command, and update your primary and replica settings on the target index:
POST my_source_index/_shrink/my_target_index-reindexed
{
"settings": {
"index.routing.allocation.require._name": null,
"index.blocks.write": null,
"index.number_of_replicas": 1,
"index.number_of_shards": 1,
"index.codec": "best_compression"
}
}
Note the above is also allocating a replica shard - if you don't want this, ensure you set this to 0.
You would want to set up a script of some sort to iterate through the list of source indices one by one.

Modify default number of Elasticsearch shards

If I have a 15 node cluster, do I have to change the
index.number_of_shards
value on all 15 nodes, and restart them, before the new value comes into effect for new indexes?
That is right changing index.number_of_shards defaults in config file would involve changing the setting on all nodes and then restarting the instance ideally following the guidelines for rolling restarts.
However if that is not an option and if explicitly specifying the number_of_shards in the settings while creating the new index is not ideal then the workaround would be using index templates
Example:
One can create an index_defaults default as below
PUT /_template/index_defaults
{
"template": "*",
"settings": {
"number_of_shards": 4
}
}
This applies the setting specified in index_defaults template to all new indexes.
Once you set the number of shards for an index in ElasticSearch, you cannot change them. You will need to create a new index with the desired number of shards, and depending on your use case, you may want then to transfer the data to the new index.
I say depending on the use case because, for instance, if you are storing time based data such as log events, it is perfectly reasonable to close one index and open a new one with a different number of shards, and index all data going forward to that new index, keeping the old one for searches.
However, if your use case is, for instance, storing blog documents, and your indices are by topic, then you will need to (a) create new indices as stated above with a different number of shards and (b) reindex your data. For (b) I recommend using the Scroll and Scan API to get the data out of the old index.
You need to create a template for new indices that will be created:
PUT /_template/index_defaults
{
"index_patterns": "*",
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}
}
For old indices you need to reindex.
Example: from my_old_index to my_new_index
Create the new index with appropriate mapping and settings:
PUT my_new_index
{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}
}
Reindex from old index to new one, specify type only if you desire:
POST /_reindex?slices=5
{
"size": 100000,
"source": { "index": "my_old_index" },
"dest": { "index": "my_new_index", "type": "my_type" }
}
Updated syntax to avoid some deprecation warnings in Elasticsearch 6+
per
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/indices-templates.html
PUT /_template/index_defaults
{
"index_patterns": ["*"],
"order" : 0,
"settings": {
"number_of_shards": 2
}
}
Please remember that specifying the number of shards is a static operation and should be done when creating an index. But, any change after the index is created will require complete reindexing again which will take time.
To create the number of shards when creating an index use this command.
curl -XPUT ‘localhost:9200/my_sample_index?pretty’ -H ‘Content-Type: application/json’ -d’
{
“settings:”{
“number_of_shards”:2,
“number_of_replicas”:0
}
}
you don't have to to run this on all the nodes. run them on any one node. All the nodes communicate with each other about the change to the elastic index.

Resources