Elasticsearch Curator - delete indices except newest

Using Elasticsearch curator, how do I delete all indices matching a pattern, except for the newest?
I tried using filtertype: age but it does not seem to do what I need.

Here is example code you can use to delete indices older than 14 days, assuming your index names contain the date. You can find more information at the link below:
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/curator.html
import boto3
import curator
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

esEndPoint = ES_HOST  # Add the Elasticsearch host.
region = REGION  # Region where the Elasticsearch domain is present.
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

def lambda_handler(event, context):
    esClient = connectES(esEndPoint)
    index_list = curator.IndexList(esClient)
    # Keep only indices whose name-embedded date is older than 14 days
    index_list.filter_by_age(source='name', direction='older', timestring='%Y.%m.%d', unit='days', unit_count=14)
    print(index_list.indices)
    if index_list.indices:
        curator.DeleteIndices(index_list).do_action()  # Delete the indices

def connectES(esEndPoint):
    # Function used to connect to ES
    try:
        es = Elasticsearch(
            hosts=[{'host': esEndPoint, 'port': 443}],
            http_auth=awsauth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection
        )
        return es
    except Exception as E:
        print("Unable to connect to {0}".format(esEndPoint))
        print(E)

You need two filters: pattern (to match the indices you want to delete) and age (to specify how old matching indices must be before they are deleted).
For instance, the Curator configuration below is set up to delete indices named example_dev_* that are older than 10 days.
Configuration:
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 10 days (based on index creation date), for
      example_dev_ prefixed indices.
    options:
      ignore_empty_list: True
      disable_action: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: example_dev_
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 10
    - filtertype: count
      count: 1
You will need to adapt both filter conditions to your needs, but that achieves what you expect: the trailing count filter additionally excludes the newest index of that selection from the actionable list, which is what gives you "everything except the newest". Note that disable_action: True makes Curator skip the action entirely; set it to False once the selection looks right.

I suggest using the count filter after the pattern filter. Be sure to play with exclude true/false and dry-runs until it does what you expect.
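If you prefer driving this from Python, as in the Lambda example at the top, the same pattern-plus-count combination is available through Curator's Python API. A minimal sketch, assuming a plain local cluster and the example_dev_ prefix from the configuration above; filter defaults vary a little between Curator versions, so verify the selection with a dry run before letting it delete anything:
import curator
from elasticsearch import Elasticsearch

# hypothetical endpoint; point this at your own cluster
client = Elasticsearch(hosts=['http://localhost:9200'])

ilo = curator.IndexList(client)
# keep only the indices matching the prefix in the actionable list
ilo.filter_by_regex(kind='prefix', value='example_dev_')
# exclude the single newest matching index from the actionable list
# (reverse=True sorts newest first; count=1 keeps one index)
ilo.filter_by_count(count=1, reverse=True)

print(ilo.indices)  # these are the indices that would be deleted
if ilo.indices:
    curator.DeleteIndices(ilo).do_action()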

Related

Elasticsearch/Kibana shows the wrong timestamp

I transfer log files with Filebeat to Elasticsearch.
The data is analyzed with Kibana.
Now to my problem:
Kibana does not show the timestamp from the logfile; it shows the time of transmission in @timestamp.
I want to show the timestamp from the logfile in Kibana, but the timestamp from the logfile is overwritten.
Where is my mistake?
Does anyone have a solution for my problem?
Here is an example from my logfile and my Filebeat config.
{"#timestamp":"2022-06-23T10:40:25.852+02:00","#version":1,"message":"Could not refresh JMS Connection]","logger_name":"org.springframework.jms.listener.DefaultMessageListenerContainer","level":"ERROR","level_value":40000}
## Filebeat configuration
## https://github.com/elastic/beats/blob/master/deploy/docker/filebeat.docker.yml
#
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false
filebeat.autodiscover:
  providers:
    # The Docker autodiscover provider automatically retrieves logs from Docker
    # containers as they start and stop.
    - type: docker
      hints.enabled: true
filebeat.inputs:
  - type: filestream
    id: pls-logs
    paths:
      - /usr/share/filebeat/logs/*.log
    parsers:
      - ndjson:
processors:
  - add_cloud_metadata: ~
output.elasticsearch:
  hosts: ['http://elasticsearch:9200']
  username: elastic
  password:
## HTTP endpoint for health checking
## https://www.elastic.co/guide/en/beats/filebeat/current/http-endpoint.html
#
http.enabled: true
http.host: 0.0.0.0
Thanks for any support!
Based on the question, one potential option would be to use Filebeat processors. You could write the initial @timestamp value to another field, like event.ingested, using the script processor below:
# Script to move the timestamp to the event.ingested field
- script:
    lang: javascript
    id: init_format
    source: >
      function process(event) {
          var fieldTest = event.Get("@timestamp");
          event.Put("event.ingested", fieldTest);
      }
And then the last processor you write could move that event.ingested field back to @timestamp using the following processor:
# Set @timestamp to the date/time when the event originated,
# which we saved in the event.ingested field above
- timestamp:
    field: event.ingested
    layouts:
      - '2006-01-02T15:04:05Z'
      - '2006-01-02T15:04:05.999Z'
      - '2006-01-02T15:04:05.999-07:00'
    test:
      - '2019-06-22T16:33:51Z'
      - '2019-11-18T04:59:51.123Z'
      - '2020-08-03T07:10:20.123456+02:00'

Get all children key values in a YAML with PyYAML

Say I have a YAML like:
Resources:
  AlarmTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
      - !If
        - ShouldAlarm
        Protocol: email
How do I get each key and value of all the children if I'm walking over each resource and I want to know if one of the values may contain a certain string? I'm using PyYAML but I'm also open to using some other library.
You can use the low-level event API if you only want to inspect scalar values:
import yaml

input = """
Resources:
  AlarmTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
      - !If
        - ShouldAlarm
        - Protocol: email
"""
for e in yaml.parse(input):
    if isinstance(e, yaml.ScalarEvent):
        print(e.value)
(I fixed your YAML because it had a syntax error.) This yields:
Resources
AlarmTopic
Type
AWS::SNS::Topic
Properties
Subscription
ShouldAlarm
Protocol
email
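And if the goal is to find out whether one of the values contains a certain string, as the question mentions, the same event stream can be filtered. A small sketch on top of the code above; the 'email' needle is just an illustrative choice:
needle = 'email'  # illustrative search string
matches = [e.value for e in yaml.parse(input)
           if isinstance(e, yaml.ScalarEvent) and needle in e.value]
print(matches)  # ['email']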

Mindmeld Elasticsearch index and QuestionAnswerer

I am using the Mindmeld blueprint application (kwik_e_mart) to understand how the Question Answerer retrieves data from the relevant knowledge base data file (I'm a newbie to Mindmeld, OOP and Elasticsearch).
See code snippet below:
from mindmeld.components import QuestionAnswerer
config = {"model_type": "keyword"}
qa = QuestionAnswerer(app_path='kwik_e_mart', config=config)
qa.load_kb(app_namespace='kwik_e_mart', index_name='stores',
           data_file='kwik_e_mart/data/stores.json', app_path='kwik_e_mart',
           config=config, clean=True)
Output - Loading Elasticsearch index stores: 100%|██████████| 25/25 [00:00<00:00, 495.28it/s]
Output -Loaded 25 documents
Although Elasticsearch is able to load all 25 documents (see output above), I am unable to retrieve any document at a list index greater than 9.
stores = qa.get(index='stores')
stores[0]
Output: - {'address': '23 Elm Street, Suite 800, Springfield, OR, 97077',
'store_name': '23 Elm Street',
'open_time': '7:00',
'location': {'lon': -123.022029, 'lat': 44.046236},
'phone_number': '541-555-1100',
'id': '1',
'close_time': '19:00',
'_score': 1.0}
However, stores[10] gives an error:
`stores[10]`
Output: - IndexError Traceback (most recent call last)
<ipython-input-12-08132a2cd460> in <module>
----> 1 stores[10]
IndexError: list index out of range
Not sure why documents at index higher than 9 are unreachable. My understanding is that the elasticsearch index is still pointing to remote blueprint data (http/middmeld/blueprint...) and not pointing to the folder locally.
Not sure how to resolve this. Any help is much appreciated.
By default, the get() method only returns 10 records per search - so only stores[0] through stores[9] will be valid.
You can add the size= option to your get() to increase the number of records it returns:
stores = qa.get(index='stores', size=25)
See the bottom of this section for more info.
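For example, a quick sketch reusing the objects from the question (the size value just needs to be at least the number of documents in the index):
stores = qa.get(index='stores', size=25)  # fetch all 25 documents in one call
print(len(stores))               # 25
print(stores[10]['store_name'])  # no longer raises IndexError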

Is it possible to take a snapshot and restore with elasticsearch-curator without losing the updates in the destination index?

I am able to run Curator to take a snapshot of the source index and restore that snapshot into the destination index.
But all the updates that I made to the destination index are lost after the next snapshot and restore action.
Is it possible to specify not to overwrite the updates in the destination index?
source index: test_index
destination index: dest_test_index
snapshot-action.yml file
actions:
  1:
    action: snapshot
    description: >-
      Snapshot selected indices to 'repository' with the snapshot name or name
      pattern in 'name'. Use all other options as assigned
    options:
      repository: esbackup
      name:
      wait_for_completion: True
      max_wait: 3600
      wait_interval: 10
    filters:
    - filtertype: pattern
      kind: regex
      value: '^(test_index)$'
      exclude:
restore-action.yml file
actions:
  1:
    action: create_index
    description: "Create the temporary index with dest_index_v2 name"
    options:
      name: dest_index_v2
  2:
    action: close
    description: >-
      Close index dest_index_v2.
    options:
      ignore_empty_list: True
      skip_flush: False
      delete_aliases: False
      ignore_sync_failures: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: dest_index_v2
  3:
    action: restore
    description: >-
      Restore test_index from the most recent snapshot in temp index dest_index_v2.
    options:
      repository: esbackup
      # If name is blank, the most recent snapshot by age will be selected
      name:
      # If indices is blank, all indices in the snapshot will be restored
      indices: ['test_index']
      rename_pattern: test_index
      rename_replacement: dest_index_v2
      wait_for_completion: True
      max_wait: 3600
      wait_interval: 10
    filters:
    - filtertype: none
  4:
    action: open
    description: >-
      Open index pattern dest_index_v2.
    options:
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: dest_index_v2
      exclude:
  5:
    description: "Reindex dest_index_v2 into dest_test_index"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          index: dest_index_v2
        dest:
          index: dest_test_index
    filters:
    - filtertype: none
  6:
    action: delete_indices
    description: >-
      Delete index dest_index_v2. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: dest_index_v2
If you are looking for an Elasticsearch setting that will merge the updated destination index with the source index you are restoring from a snapshot, the short answer is NO.
You can write custom code that performs the following operations to make sure the destination index updates are not lost.
Restore the source index (test_index) to a temporary index in the cluster; let's call this index temp_index.
Then retrieve the documents from temp_index and insert them into the destination index (dest_test_index) with op_type=create (a sketch of this step follows below).
Operation type create makes sure that the index operation fails if a document with that id already exists in the index, so the documents you updated in the destination are left as they are.
You can refer to the documentation here.
Hope this solves your purpose.
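A minimal sketch of the second step, using the Elasticsearch reindex API through the Python client and the index names from above (temp_index is the hypothetical name of the restored copy). op_type create only writes documents whose _id does not already exist, and conflicts: proceed skips the resulting version conflicts instead of aborting, so documents you updated in dest_test_index are left untouched:
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['http://localhost:9200'])  # adjust to your cluster

# Copy documents from the restored temporary index into the destination.
# Existing document IDs in dest_test_index are skipped, preserving updates.
resp = es.reindex(
    body={
        'conflicts': 'proceed',
        'source': {'index': 'temp_index'},
        'dest': {'index': 'dest_test_index', 'op_type': 'create'}
    },
    wait_for_completion=True
)
print('created:', resp.get('created'), 'version_conflicts:', resp.get('version_conflicts'))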

Curator 4.0: Unable to take snapshot or run any action, following examples from the documentation

I am trying to take a snapshot of an Elasticsearch index using Curator 4 (on a Windows machine).
I am getting the error below (the same error for all actions):
Failed to complete action: snapshot. : Not an IndexList object. Type:
Any idea when this happens?
I am following the examples provided in the documentation:
https://www.elastic.co/guide/en/elasticsearch/client/curator/current/snapshot.html
Action yaml file:
actions:
  1:
    action: snapshot
    description: >-
      Snapshot logstash- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: myrepo
      name: shan
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: creation_date
      direction: younger
      unit: days
      unit_count: 1
      field:
      stats_result:
      epoch:
      exclude:
Output:
2016-07-25 22:16:40,929 INFO Action #1: snapshot
2016-07-25 22:16:40,929 INFO Starting new HTTP connection (1): 127.0.0.1
2016-07-25 22:16:40,944 INFO GET http://127.0.0.1:9200/ [status:200 request:0.015s]
2016-07-25 22:16:40,946 INFO GET http://127.0.0.1:9200/_all/_settings?expand_wildcards=open%2Cclosed [status:200 request:0.002s]
2016-07-25 22:16:40,950 INFO GET http://127.0.0.1:9200/_cluster/state/metadata/.marvel-es-1-2016.06.27,.marvel-es-1-2016.06.28,.marvel-es-1-2016.06.29,.marvel-es-1-2016.06.30,.marvel-es-data-1,shan-claim-1 [status:200 request:0.004s]
2016-07-25 22:16:40,993 INFO GET http://127.0.0.1:9200/.marvel-es-1-2016.06.27,.marvel-es-1-2016.06.28,.marvel-es-1-2016.06.29,.marvel-es-1-2016.06.30,.marvel-es-data-1,shan-claim-1/_stats/store,docs [status:200 request:0.042s]
2016-07-25 22:16:40,993 ERROR Failed to complete action: snapshot. <class 'TypeError' at 0x000000001DFCC400>: Not an IndexList object. Type: <class 'curator.indexlist.IndexList' at 0x0000000002DB39B8>.
You need to add another filtertype so Curator knows which indices to run against. For example, if your indices are prefixed with logstash-, your filters would look like:
filters:
- filtertype: pattern
  kind: prefix
  value: logstash-
  exclude:
- filtertype: age
  source: creation_date
  direction: younger
  unit: days
  unit_count: 1
  field:
  stats_result:
  epoch:
  exclude:
There is bad indentation at the beginning of your file. The action list should be nested within the "actions" keyword, which is your root level.
