How to configure Elasticsearch ILM rollover to create indexes with a date? - elasticsearch

My data source writes to an index named MyIndex-%{+YYYY.MM.dd.HH.mm}, but the amount of data per day is too big. I need rollover to create a new index whenever the data exceeds 10 GB.
For example:
MyIndex-2022.12.23-1 size 10GB
MyIndex-2022.12.23-2 size 10GB
MyIndex-2022.12.23-3 size 10GB
...
MyIndex-2022.12.24-1 size 10GB
MyIndex-2022.12.24-2 size 10GB
...
MyIndex-2022.12.25-1 size 10GB
etc.
Can someone help me? I am using Logstash to put data into Elasticsearch.

Do you have a Kibana instance?
If so see this article:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/index-lifecycle-management.html
You need to create an index template matching the pattern of your index.
Then create an ILM policy in Stack Management. There you should be able to set the shard and index sizes for rollovers. Just open the advanced options in the hot phase.
See here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/getting-started-index-lifecycle-management.html
You don't need to change anything in Logstash for that.
If you don't have Kibana, you need to use the REST APIs to add the policy.
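As a rough sketch of what those calls could look like (the policy name my-rollover-policy, the template name myindex-template, the alias myindex-write, and the lowercase index pattern are all placeholders, not values from your setup):

# ILM policy: roll over the write index once it reaches roughly 10 GB
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "10gb" }
        }
      }
    }
  }
}

# Index template: attach the policy and a rollover alias to all matching indices
PUT _index_template/myindex-template
{
  "index_patterns": ["myindex-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-rollover-policy",
      "index.lifecycle.rollover_alias": "myindex-write"
    }
  }
}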
Hope that helps!

Related

Usage exceeded flood-stage watermark index has read-only-allow-delete when trying to delete document

I have an issue with deleting a document from Elasticsearch with the command:
DELETE /indexName/_doc/1
When trying to fire the above HTTP request I get too_many_requests/12/disk usage exceeded flood-stage watermark index has read-only-allow-delete. I understand that I need to increase the disk size for my node to make it work, or disable the flood-stage watermark.
But when I saw read-only-allow-delete I thought that I could READ from the given index and DELETE documents to free some space. In reality I can only READ. Why is that?
Does ...-allow-delete mean something different, or is it not related to the REST call, so that I need to clean up my node by 'hand'?
Your understanding is correct. You can READ documents, but you cannot DELETE a single document from the index. However, the block does allow you to delete the entire index. You can read the same explanation in this documentation:
Deleting documents from an index to release resources - rather than
deleting the index itself - can increase the index size over time.
When index.blocks.read_only_allow_delete is set to true, deleting
documents is not permitted. However, deleting the index itself
releases the read-only index block and makes resources available
almost immediately.
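Once you have freed disk space (or raised the watermark thresholds), the block can be cleared by resetting the index setting; indexName below is the placeholder name from the question:

# Remove the read-only-allow-delete block after freeing disk space
PUT /indexName/_settings
{
  "index.blocks.read_only_allow_delete": null
}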

Is it possible to append (instead of restore) a snapshot of indices?

Suppose we have some indices in our cluster. I can make a snapshot of my favorite index, and I can restore that index to my cluster again if an index with the same name does not exist or is closed. But what if the index currently exists and I need to add/append extra data/documents to it?
Suppose I currently have 100000 documents in my index on my server. I create/add 100 documents to my index on my local system, which has the same name, the same mappings, the same settings, the same number of shards and so on. Now I want to add those 100 documents to the current index on my server (100000 documents). What is the best way?
In MySQL I use export to CSV or Excel etc., and it is so easy to import or append data to a currently existing index.
There is no Append API for Elasticsearch, but I suggest restoring the indices under a temporary name and using the Reindex API to index the local data into the bigger indices, then deleting the temporary indices.
You can also use Logstash for this purpose (reindex): build a pipeline which reads data from the temp indices (Elasticsearch input plugin) and writes data to the primary indices (Elasticsearch output plugin).
Note: you can't have two indices with the same name in a cluster.
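A minimal sketch of that approach, assuming a repository my_repo, a snapshot my_snapshot and an index my_index (all placeholder names):

# Restore the snapshotted index under a temporary name
POST /_snapshot/my_repo/my_snapshot/_restore
{
  "indices": "my_index",
  "rename_pattern": "my_index",
  "rename_replacement": "my_index_restored"
}

# Copy the restored documents into the existing index
POST /_reindex
{
  "source": { "index": "my_index_restored" },
  "dest": { "index": "my_index" }
}

# Drop the temporary index afterwards
DELETE /my_index_restored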
In addition to the answer by Hamid Bayat:
Is it possible to append (instead of restore) a snapshot of indices?
Snapshots by nature are incremental, i.e. append-only. See this and also this. Thus, if your index has 1000 docs and you snapshot it and later add 100 more docs, then when you trigger another snapshot, only the recently added 100 docs will be snapshotted, not all 1100. However, restore is not incremental, i.e. you cannot restore only those recently added 100 docs. If you restore an index, you restore all the docs.
From your description of the question, it seems you are looking for something like: when you add 100 docs to local ES Cluster, you also want those 100 docs to be added in the remote (other) ES Cluster as well. Am I correct?
As for exporting to CSV or Excel, there's an excellent tool called es2csv that allows you to export data from ES to CSV. You can then use Kibana to import the CSV data, or use the tool called Elasticsearch_Loader. You might also want to look at another excellent tool called elasticdump.

How to handle Elasticsearch data when it fills up dedicated volume

I am creating an EFK stack on a k8s cluster. I am using the EFK Helm chart described here. This creates two PVCs: one for es-master and one for es-data.
Let's say I allocated 50 Gi for each of these PVCs. When these eventually fill up, my desired behavior is to have new data start overwriting the old data. I then want the old data to be stored in, for example, an S3 bucket. How can I configure Elasticsearch to do this?
One easy tool that can help you do that is Elasticsearch Curator:
https://www.elastic.co/guide/en/elasticsearch/client/curator/5.5/actions.html
You can use it to:
Roll over the indices that hold the data, by size/time. This will cause each PVC to hold a few indices, based on time.
Snapshot the rolled-over indices to a backup repository in S3.
Delete old indices based on their date, i.e. delete the oldest indices in order to free up space for new indices.
Curator can help you do all of these; a rough sketch of the underlying API calls is shown below.
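For illustration only, the snapshot-to-S3 and cleanup steps map to Elasticsearch API calls like these (the repository, bucket and index names are made-up placeholders; Curator automates the equivalent on a schedule):

# Register an S3 snapshot repository (bucket name is a placeholder)
PUT _snapshot/s3_backup
{
  "type": "s3",
  "settings": { "bucket": "my-efk-backups" }
}

# Snapshot a rolled-over index before removing it
PUT _snapshot/s3_backup/logs-000001-snap
{
  "indices": "logs-000001"
}

# Delete the old index to free space on the PVC
DELETE /logs-000001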

Elasticsearch reindex store sizes vary greatly

I am running Elasticsearch 6.2.4. I have a program that will automatically create an index for me as well as the mappings necessary for my data. For this issue, I created an index called "landsat" but it needs to actually be named "landsat_8", so I chose to reindex. The original "landsat" index has 2 shards and 0 read replicas. The store size is ~13.4gb with ~6.6gb per shard and the index holds just over 515k documents.
I created a new index called "landsat_8" with 5 shards, 1 read replica, and started a reindex with no special options. On a very small Elastic Cloud cluster (4GB RAM), it finished in 8 minutes. It was interesting to see that the final store size was only 4.2gb, yet it still held all 515k documents.
After it was finished, I realized that I had failed to create my mappings before reindexing, so I blew it away and started over. I was shocked to find that after an hour, the _cat/indices endpoint showed that only 7.5gb of data and 154,800 documents had been reindexed. 4 hours later, the entire job seemed to have died at 13.1gb, but it only showed 254,000 documents had been reindexed.
On this small 4gb cluster, this reindex operation was maxing out CPU. I increased the cluster to the biggest one Elastic Cloud offered (64gb ram), 5 shards, 0 RR and started the job again. This time, I set the refresh_interval on the new index to -1 and changed the size for the reindex operation to 2000. Long story short, this job ended in somewhere between 1h10m and 1h19m. However, this time I ended up with a total store size of 25gb, where each shard held ~5gb.
I'm very confused as to why the reindex operation causes such wildly different results in store size and reindex performance. Why, when I don't explicitly define any mappings and let ES automatically create mappings, is the store size so much smaller? And why, when I use the exact same mappings as the original index, is the store so much bigger?
Any advice would be greatly appreciated. Thank you!
UPDATE 1:
Here are the only differences in mappings:
The left image is "landsat" and the right image is "landsat_8". There is a root level "type" field and a nested "properties.type" field in the original "landsat" index. I forgot one of my goals was to remove the field "properties.type" from the data during the reindex. I seem to have been successful in doing so, but at the same time, accidentally renamed the root-level "type" field mapping to "provider", thus "landsat_8" has an unused "provider" mapping and an auto-created "type" mapping.
So there are some problems here, but I wouldn't think this would nearly double my store size...
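For reference, the reindex setup described above (refresh disabled on the destination and a batch size of 2000) would look roughly like this; the index names follow the question and the final refresh_interval value is an assumption:

# Disable refresh on the destination index for the duration of the reindex
PUT /landsat_8/_settings
{ "index": { "refresh_interval": "-1" } }

# Reindex with a larger batch size
POST /_reindex
{
  "source": { "index": "landsat", "size": 2000 },
  "dest": { "index": "landsat_8" }
}

# Re-enable refresh afterwards (assumed default of 1s)
PUT /landsat_8/_settings
{ "index": { "refresh_interval": "1s" } }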

Selecting elasticsearch memory storage

I need to know which setting I have to set to select an on-heap or off-heap memory index. It seems that index.store.type=memory stores index data off-heap, but I need to store my data on-heap.
I looked at the documentation and wasn't able to find this setting.
Thanks,
Joan.
