Remove or delete old data from Elasticsearch

How can I remove old data from an Elasticsearch index? The index has a large amount of data inserted into it every day.

You can do that with the delete-by-query plugin.
Assuming you have some timestamp or creation date field in your index, your query would look something like this:
DELETE /your_index/your_type/_query
{
  "query": {
    "range": {
      "timestamp": {
        "lte": "now-10y"
      }
    }
  }
}
This will delete all records whose timestamp is older than 10 years.
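Note that on Elasticsearch 5.x and later, delete-by-query ships in core as the _delete_by_query API, so no plugin is needed. A minimal sketch, assuming the same timestamp field as above:
POST /your_index/_delete_by_query
{
  "query": {
    "range": {
      "timestamp": {
        "lte": "now-10y"
      }
    }
  }
}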
I hope this helps

Split the data into daily indices and put an alias with the old index name over them, the way Logstash does:
Daily indices: logstash-20151011, logstash-20151012, logstash-20151013
Alias spanning them all: logstash
Then each day, delete the oldest index.
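A minimal sketch of the daily rotation, assuming the index names above: one atomic _aliases call attaches the newest index and detaches the oldest, then the old index is deleted.
POST /_aliases
{
  "actions": [
    { "add":    { "index": "logstash-20151013", "alias": "logstash" } },
    { "remove": { "index": "logstash-20151011", "alias": "logstash" } }
  ]
}
DELETE /logstash-20151011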

If you are using time-based indices, that should be something like:
curl -XDELETE http://localhost:9200/test-2017-06
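If the cluster permits destructive wildcard operations (i.e. action.destructive_requires_name is not enabled), a wildcard can delete a whole series at once. A sketch, assuming the same naming scheme:
curl -XDELETE http://localhost:9200/test-2017-*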

Related

How to create a rolling index with the date as the index name?

My Elasticsearch index will ingest thousands of documents per second. The service that puts documents into the index doesn't create a new index explicitly; it just computes the current date in Node.js and indexes docs into "log-YYYY.MM.DD", relying on the index being created automatically if it is not present.
Now, my question is: can this practice of creating the index and indexing documents into it at the same time cause performance issues or failures, given that the index will be ingesting thousands of docs per second?
If the answer to the above question is yes, how can I create a rolling index with the date as the index name? Say today is 5 May, 2021; I want automatic creation of the index for 6 May, 2021 in the format log-2021.05.06.
For your first question, maybe this can help:
how many indices?
For the second question, I think you can use an index template with an alias,
like:
PUT /_index_template/logdate_template
{
  "index_patterns": [
    "log*"
  ],
  "priority": 1,
  "template": {
    "aliases": {
      "log": {}
    },
    "mappings": {
      // your mappings
    }
  }
}
Since the index_pattern here is "log*", you can have a job in your application code which creates the index every day by generating the date in the required format and calling:
PUT log-YYYY.MM.DD
The advantage of the index alias is that you can access all of those indices with "log" alone.
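As a minimal sketch of such a job, assuming a Unix host with GNU date and the log-YYYY.MM.DD naming from the question, a daily cron entry could pre-create the next day's index so the template above is applied before ingestion starts:
# pre-create tomorrow's index, e.g. log-2021.05.06 (hypothetical host/port)
curl -XPUT "http://localhost:9200/log-$(date -d tomorrow +%Y.%m.%d)"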

Elastic GET by ID query on rollover alias fails with "Alias [...] has more than one indices associated with it..."

Our new rollover indices just rolled over. Now this query...
GET http://my.elastic/system-logs/_doc/7e8017d8-0cb8-4b9e-b021-b2a4b4ac71c7
...fails with this:
"Alias [system-logs] has more than one indices associated with it [[system-logs-000002, system-logs-000001]], can't execute a single index op"
But doing the same thing with _search works fine:
GET http://my.elastic/system-logs/_search/
{
"query": {
"bool": {
"must": [{"term": {"_id": "a1906f52-3957-4f4b-9b40-531422e3a04e"}}]
}
}
}
The exception comes from this code, which suggests there is an allowAliasesToMultipleIndices setting for this, but I haven't been able to find a place to set it.
We're on Elastic 6.8.
In the first HTTP request, you are trying to fetch the doc with a particular ID through a name which is in fact an alias for more than one index.
That's the problem.
Reason:
GET <index>/_doc/<id> is a single-document API (_doc was a mapping type in Elasticsearch, used to segregate documents within the same index, and is now deprecated). Single-document operations must resolve to exactly one concrete index, so they cannot look across the several indices behind the alias.
So you need to use a GET _search request with one of the permitted queries (term, terms, match, query_string, simple_query_string), like your second example.
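As a workaround sketch: either address the concrete backing index for single-document GETs, or keep going through the alias with _search and an ids query (using the document ID from the question):
GET http://my.elastic/system-logs-000002/_doc/7e8017d8-0cb8-4b9e-b021-b2a4b4ac71c7
GET http://my.elastic/system-logs/_search
{
  "query": {
    "ids": { "values": ["7e8017d8-0cb8-4b9e-b021-b2a4b4ac71c7"] }
  }
}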

logstash restrict search result to past day

I want to query Elasticsearch for the index from the day before the current date, using the Elasticsearch input plugin in Logstash.
I tried the following Logstash config:
input {
  elasticsearch {
    hosts   => ["localhost:9200"]
    index   => "logstash-%{+YYYY.MM.dd-6}"
    query   => '{ "query": { "query_string": { "query": "*" } } }'
    size    => 500
    scroll  => "5m"
    docinfo => true
  }
}
output {
  stdout { codec => rubydebug }
}
Can someone help me with how to do this?
You can use a date math index name within your Elasticsearch query:

Date math index name resolution enables you to search a range of time-series indices, rather than searching all of your time-series indices and filtering the results or maintaining aliases. Limiting the number of indices that are searched reduces the load on the cluster and improves execution performance. For example, if you are searching for errors in your daily logs, you can use a date math name template to restrict the search to the past two days. Almost all APIs that have an index parameter support date math in the index parameter value.

For instance, to search yesterday's index, assuming the indices use the default Logstash index name format logstash-YYYY.MM.dd:
GET /<logstash-{now/d-1d}>/_search
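Note that outside Kibana's console the special characters in a date math index name must be URI-encoded, so the same request over plain HTTP looks like:
curl -XGET 'http://localhost:9200/%3Clogstash-%7Bnow%2Fd-1d%7D%3E/_search'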

Exclude results from Elasticsearch / Kibana based on aggregation value

Is it possible to exclude results based on the outcome of an aggregation?
In other words, I have aggregated on a term and a whole bunch of results appear in a data table, ordered in descending order by count. Is it possible to configure Kibana / Elasticsearch to exclude results where the count is 1 or less (where the count is an aggregation)?
I realise I can export the raw data from the data table visualization and delete those records manually in a text editor or Excel. But I am trying to convince my organization that Elasticsearch is a cool new thing, and this is one of their first requirements...
You can exclude results from the search by applying a filter. Here is a sample that may be helpful:
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "Your_term": {
            "gte": 1
          }
        }
      }
    }
  }
}
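If the goal is specifically to drop aggregation buckets with a count of 1 or less, rather than filtering documents, the terms aggregation's min_doc_count option does that directly. A minimal sketch, assuming a keyword field named your_term:
GET /your_index/_search
{
  "size": 0,
  "aggs": {
    "by_term": {
      "terms": {
        "field": "your_term",
        "min_doc_count": 2
      }
    }
  }
}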

Rebuild index with zero downtime

I'm currently working on something and need some help. I will have an Elasticsearch index populated from a SQL database. There will be an initial full reindex from the SQL database, then a nightly job which will apply updates, deletes, and inserts.
In the event of a major failure I may need to do a full reindex. Ideally I want zero downtime. I did find some articles about creating aliases etc., however this seems to be more about updates to field mappings. My situation is a full reindex of the data from my source DB. Can I just take that data, push the docs to Elasticsearch, and have it update the existing index since the IDs will be the same? Or do I need to do something else?
Regards
Ismail
For zero downtime you can create a new index, populate it from your database, and use an alias to switch from the old index to the new one. Steps:
1. Call your main index something like main_index_1 (or whatever you like).
2. Create an alias for that index called main_index:
curl -XPUT 'localhost:9200/main_index_1/_alias/main_index?pretty'
3. Set up your application to point to this alias.
4. Create a new index called main_index_2 and index it from your database.
5. Switch the alias to point to the new index:
curl -XPOST 'localhost:9200/_aliases?pretty' -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "main_index_1", "alias": "main_index" } },
    { "add":    { "index": "main_index_2", "alias": "main_index" } }
  ]
}'
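Because both actions run in a single _aliases call, the switch is atomic and clients querying main_index never see a gap. You can check which index the alias currently points to with:
curl -XGET 'localhost:9200/_alias/main_index?pretty'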
