ElasticSearch: Curator concurrent snapshots - elasticsearch

We're getting this message:
[2017-08-11T04:00:02,908][WARN ][r.suppressed ] path: /_snapshot/s3_currently/curator-20170811040002, params: {repository=s3_currently, wait_for_completion=true, snapshot=curator-20170811040002}
org.elasticsearch.snapshots.ConcurrentSnapshotExecutionException: [s3_currently:curator-20170811040002]a snapshot is already running
We've configured x-pack curator with two actions:
/home/curator/actions/currently.yml
---
actions:
  1:
    action: snapshot
    description: Create snapshot every 30 minutes.
    options:
      repository: s3_currently
      wait_for_completion: true
    filters:
    - filtertype: alias
      aliases: living
  2:
    action: delete_snapshots
    description: Remove older snapshots.
    options:
      repository: s3_currently
      retry_interval: 120
      retry_count: 3
    filters:
    - filtertype: count
      count: 48
And /home/curator/actions/currently-dev.yml:
---
actions:
  1:
    action: snapshot
    description: Create snapshot every hour for development.
    options:
      repository: s3_currently_dev
      wait_for_completion: true
    filters:
    - filtertype: alias
      aliases: living
  2:
    action: delete_snapshots
    description: Remove older snapshots.
    options:
      repository: s3_currently_dev
      retry_interval: 120
      retry_count: 3
    filters:
    - filtertype: count
      count: 24
We've added two cron jobs:
0 * * * * -> currently_dev
0,30 * * * * -> currently
Any ideas? It seems that Elasticsearch doesn't allow two snapshots to run concurrently, does it?

Elasticsearch does not allow for more than one snapshot to run at a time. The reason for this is that it is compelled to freeze the Lucene segments for the selected indices for the duration of the snapshot. It would be extremely taxing to the cluster to do this for multiple concurrent snapshots, not in terms of processing, but in terms of how it has to track all segments at all times. It must allow for new data to be indexed into new segments while others are locked/frozen for snapshotting. This could create a situation where there are too many open segments, which could deprive one or more nodes of needed memory resources. As a result, it's safer for Elasticsearch to only permit a single snapshot at a time.
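Given that restriction, note that the two cron schedules above collide at the top of every hour, when both jobs try to start a snapshot at minute 0, which is exactly when the ConcurrentSnapshotExecutionException appears. One simple mitigation (the 15-minute offset here is an arbitrary choice) is to stagger the entries so the two jobs never start in the same minute:

15 * * * * -> currently_dev
0,30 * * * * -> currently

Since wait_for_completion: true only makes Curator wait for its own snapshot, the offset is a mitigation rather than a guarantee; if a snapshot runs longer than the gap between the schedules, they can still overlap.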

Related

Helmfile - "needs" keyword has no effect

I have been trying to make use of the keyword needs (following the doc) to control the order of installation of the releases.
Here is my helmfile:
helmDefaults:
  createNamespace: false
  timeout: 600
helmBinary: /usr/local/bin/helm
releases:
  - name: dev-sjs-pg
    chart: ../helm_charts/sjs-pg
  - name: dev-sjs
    chart: ../helm_charts/sjs
    needs: ['dev-sjs-pgg']
Regarding versions:
helmfile version v0.139.9
helm version.BuildInfo{Version:"v3.5.4", GitCommit:"1b5edb69df3d3a08df77c9902dc17af864ff05d1", GitTreeState:"clean", GoVersion:"go1.15.11"}
When I run helmfile sync, both releases are installed simultaneously. In particular, there is no error despite my spelling mistake (dev-sjs-pgg instead of dev-sjs-pg). It is as if needs is simply not read.
Could you help me understand what I am doing wrong, please?
I tried to reproduce this. When executing helmfile --log-level=debug sync I see in the debug log:
processing 2 groups of releases in this order:
GROUP RELEASES
1 dev-sjs-pg
2 dev-sjs
I also see these are deployed one after another (just a few seconds apart, because I am deploying a fast nginx chart).
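For reference, here is a sketch of the corrected releases section, assuming the intent is for dev-sjs to be installed after dev-sjs-pg (the needs entry has to match an existing release name, and dev-sjs-pgg does not exist):

releases:
  - name: dev-sjs-pg
    chart: ../helm_charts/sjs-pg
  - name: dev-sjs
    chart: ../helm_charts/sjs
    needs: ['dev-sjs-pg']  # corrected: matches the release name above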

Difference between curator-cli and curator action files

I'm stuck on a problem similar to the one described in this post, but can't find a solution: https://github.com/elastic/curator/issues/1513
To back up my Elasticsearch cluster (7.7.1), I use Curator (5.8) to snapshot all indices daily.
I realised today that only my indices starting with "." are being snapshotted by Curator.
If I use the curator-cli, all indices are indeed seen by Curator and snapshotted.
I tried removing all filters in my action file and replacing them with:
filters:
- filtertype: none
Nothing seems to work; my dry runs always end up listing only the indices beginning with a dot.
This is my action file:
---
actions:
  1:
    action: snapshot
    description: >-
      Snapshot all indices
    options:
      repository: backup
      name: testbackup6
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      disable_action: False
    filters:
    - filtertype: none
Curator logs (I have anonymized some results)
2021-01-08 18:34:44,021 INFO DRY-RUN: snapshot: testbackup6 in repository backup with arguments: {'ignore_unavailable': False, 'include_global_state': True, 'partial': False, 'indices': '.apm-XXX,.apm-customXXX,.async-sXXX,.kibana_1,.kibana_task_manager_1,.monitoring-alerts-7,.monitoring-es-7-2021.01.02,.monitoring-es-7-2021.01.03,.monitoring-es-7-2021.01.04
...
,.triggered_watches,.watches'}
I looked at the DEBUG logs, and index lifecycle management (ILM) seems to be the problem.
Here are some accepted/rejected indices:
2021-01-08 19:54:07,925 DEBUG curator.indexlist __not_actionable:39 Index XXXX_supervision-server_logs-2020.12.31-000014 is not actionable, removing from list.
2021-01-08 19:54:07,925 DEBUG curator.indexlist __excludify:58 **Removed** from actionable list: XXX_supervision-server_logs-2020.12.31-000014 has index.lifecycle.name XXX_supervision-server_logs-policy
2021-01-08 19:54:07,925 DEBUG curator.indexlist __actionable:35 Index .monitoring-es-7-2021.01.05 is actionable and remains in the list.
2021-01-08 19:54:07,925 DEBUG curator.indexlist __excludify:58 **Remains** in actionable list: index.lifecycle.name is not set for index .monitoring-es-7-2021.01.05
2021-01-08 19:54:07,925 DEBUG curator.indexlist __not_actionable:39 Index XXX_logs-2021.01.05-000019 is not actionable, removing from list.
Has anyone experienced this?
I can't see the link between indices having ILM policies and curator not matching them.
I can't find a workaround with regex to help me match all my indices. With the same "filtertype: none" on curator-cli, everything is OK.
Thanks a lot
I just found it ><
"allow_ilm_indices: True" must be added in the action file in order to include all indices...
The curator_cli has this option set to True by default, which is not the case for curator itself.
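For completeness, a sketch of the relevant options block with that flag added (everything else unchanged from the action file above):

options:
  repository: backup
  name: testbackup6
  ignore_unavailable: False
  include_global_state: True
  partial: False
  wait_for_completion: True
  skip_repo_fs_check: False
  disable_action: False
  allow_ilm_indices: True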

How to update loggingService of container.v1.cluster with deployment-manager

I want to set the loggingService field of an existing container.v1.cluster through deployment-manager.
I have the following config
resources:
- name: px-cluster-1
  type: container.v1.cluster
  properties:
    zone: europe-west1-b
    cluster:
      description: "dev cluster"
      initialClusterVersion: "1.13"
      nodePools:
      - name: cluster-pool
        config:
          machineType: "n1-standard-1"
          oauthScopes:
          - https://www.googleapis.com/auth/compute
          - https://www.googleapis.com/auth/devstorage.read_only
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
        management:
          autoUpgrade: true
          autoRepair: true
        initialNodeCount: 1
        autoscaling:
          enabled: true
          minNodeCount: 3
          maxNodeCount: 10
      ipAllocationPolicy:
        useIpAliases: true
      loggingService: "logging.googleapis.com/kubernetes"
      masterAuthorizedNetworksConfig:
        enabled: false
      locations:
      - "europe-west1-b"
      - "europe-west1-c"
When I try to run gcloud deployment-manager deployments update ..., I get the following error
ERROR: (gcloud.deployment-manager.deployments.update) Error in Operation [operation-1582040492957-59edb819a5f3c-7155f798-5ba37285]: errors:
- code: NO_METHOD_TO_UPDATE_FIELD
message: No method found to update field 'cluster' on resource 'px-cluster-1' of
type 'container.v1.cluster'. The resource may need to be recreated with the new
field.
The same succeeds if I remove loggingService.
Is there a way to update loggingService using deployment-manager without deleting the cluster?
The error NO_METHOD_TO_UPDATE_FIELD is due to updating "initialClusterVersion" when you issued the update call to GKE. This field is only used at cluster creation, and the type definition doesn't currently allow it to be updated later. So either leave that field at its original value (it has no effect on the deployment moving forward) or delete/comment out that line.
Even with that fixed, there is also no method to update the logging service; Deployment Manager doesn't have many update methods. Instead, try using the gcloud command to update the cluster directly. Keep in mind that you have to update the monitoring service together with the logging service, so the command would look like:
gcloud container clusters update px-cluster-1 --logging-service=logging.googleapis.com/kubernetes --monitoring-service=monitoring.googleapis.com/kubernetes --zone=europe-west1-b
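To confirm the change afterwards, something along these lines should work (cluster name and zone taken from the question; the --format expression is just one way to print the field):
gcloud container clusters describe px-cluster-1 --zone=europe-west1-b --format="value(loggingService)"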

Google Cloud Build - How to Cache Bazel?

I recently started using Cloud Build with Bazel.
So I have a basic cloudbuild.yaml
steps:
- id: 'run unit tests'
  name: gcr.io/cloud-builders/bazel
  args: ['test', '//...']
which runs all tests of my Bazel project.
But as you can see from this screenshot, every build takes around 4 minutes, although I haven't touched any code which would affect my tests.
Locally running the tests for the first time takes about 1 minute. But running the tests a second time, with the help of Bazels cache, it takes only a few seconds.
So my goal is to use the Bazel cache with Google Cloud Build.
Update
As suggested by Thierry Falvo, I've looked into those recommendations. And thus I tried to add the following to my cloudbuild.yaml:
steps:
- name: gcr.io/cloud-builders/gsutil
  args: ['cp', 'gs://cents-ideas-build-cache/bazel-bin', 'bazel-bin']
- id: 'run unit tests'
  name: gcr.io/cloud-builders/bazel
  args: ['test', '//...']
- name: gcr.io/cloud-builders/gsutil
  args: ['cp', 'bazel-bin', 'gs://cents-ideas-build-cache/bazel-bin']
Although I created the bucket and folder, I get this error:
CommandException: No URLs matched
I think that rather than caching discrete results (artifacts), you want to use GCS (Cloud Storage) as a Bazel remote cache.
- name: gcr.io/cloud-builders/bazel
  args: ['test', '--remote_cache=https://storage.googleapis.com/<bucketname>', '--google_default_credentials', '--test_output=errors', '//...']
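Wrapped into a complete cloudbuild.yaml, this might look as follows (the bucket name is a placeholder for your own cache bucket); with --remote_cache pointing at the bucket, Bazel can reuse previously uploaded action and test results instead of re-running everything, and --google_default_credentials lets the build authenticate to that bucket:

steps:
- id: 'run unit tests'
  name: gcr.io/cloud-builders/bazel
  args: ['test', '--remote_cache=https://storage.googleapis.com/<bucketname>', '--google_default_credentials', '--test_output=errors', '//...']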

Elasticsearch: get current running snapshot operation

Assume we need to automate restoring two or more snapshots to an Elasticsearch cluster.
It is necessary to detect that a restore operation has completed before making the next API call to _snapshot/<repository>/<snapshot>/_restore.
If I call it while a snapshot is still restoring, the cluster responds with a 503.
I tried to use the thread pool API to check for a running snapshot operation:
curl -XGET 'http://127.0.0.1:9200/_cat/thread_pool?h=snapshot.active'
But it returns 0 anyway.
What is the proper way to get info about the currently running restore operation?
UPDATE:
An example of how I got it to work with Ansible:
- name: shell | restore latest snapshot
  uri:
    url: "http://127.0.0.1:9200/_snapshot/{{ es_snapshot_repository }}/snapshot_name/_restore"
    method: "POST"
    body: '{"index_settings":{"index.number_of_replicas": 0}}'
    body_format: json

- name: shell | get state of active recovering operations | log indices
  uri:
    url: "http://127.0.0.1:9200/_recovery?active_only"
    method: "GET"
  register: response
  until: "response.json == {}"
  retries: 6
  delay: 10
You can monitor the status of indices being restored using the Indices Recovery API.
The easiest way of doing this is looking at the stage property:
init: Recovery has not started
index: Reading index meta-data and copying bytes from source to destination
start: Starting the engine; opening the index for use
translog: Replaying transaction log
finalize: Cleanup
done: Complete
The active_only parameter returns info only about shards that are not yet in the done state:
http://127.0.0.1:9200/_recovery?active_only
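As a minimal shell sketch of the same polling idea outside Ansible (the host, retry count, and interval are arbitrary placeholders), you can hit that endpoint until it returns an empty object:

#!/bin/bash
# Poll the recovery API until no shards are still actively recovering.
ES_HOST="http://127.0.0.1:9200"
for attempt in $(seq 1 30); do
  body=$(curl -s "${ES_HOST}/_recovery?active_only=true")
  if [ "$body" = "{}" ]; then
    echo "restore finished"
    exit 0
  fi
  echo "restore still in progress (attempt ${attempt}), retrying in 10s..."
  sleep 10
done
echo "timed out waiting for restore" >&2
exit 1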
