Elasticsearch snapshots to S3 - elasticsearch

I have an Elasticsearch 5.6.2 cluster with one master and two data nodes, and I am using Kibana for visualization. I want to enable automatic snapshots of the cluster to Amazon S3 every 30 minutes. How can I accomplish this? There is no proper documentation. I have also referred to the Curator docs, and I have a question: do I need to configure Curator on each node?
Please help guys

Curator is an external process.
You only need to install it on a single machine; it can be a node or any other machine.
It will send REST requests to elasticsearch when needed.
Put it in your crontab and that will be enough.
You can also call the snapshot endpoint from a shell script every 30 minutes and not use Curator at all (a minimal sketch of that route is below).
Elastic Cloud already takes a snapshot every 30 minutes (in case you don't want to manage the cluster yourself and want advanced features such as rolling upgrades, Kibana, security...).
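For example, here is a minimal sketch of the shell-script route. It assumes a snapshot repository named s3_backup has already been registered and that Elasticsearch answers on localhost:9200; both names are placeholders, not something taken from this thread.

#!/bin/bash
# snapshot.sh - create a snapshot named after the current UTC time
# in the pre-registered "s3_backup" repository (placeholder name)
SNAPSHOT_NAME="snap-$(date -u +%Y%m%d-%H%M)"
curl -s -XPUT "http://localhost:9200/_snapshot/s3_backup/${SNAPSHOT_NAME}?wait_for_completion=false"

# crontab entry that runs the script every 30 minutes:
*/30 * * * * /opt/scripts/snapshot.sh >> /var/log/es-snapshot.log 2>&1

Old snapshots still pile up with this approach, so they have to be cleaned out separately; that is something Curator can also handle.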

Related

When configuring Snapshot for an ElasticSearch cluster, do I do that to every node?

Sorry if this is an obvious question, but I have a 3-node Elasticsearch cluster and I want it to take a nightly snapshot that is sent to S3 for recovery. I have done this for my test cluster, which is a single node, and I was starting to do it for my 3-node production cluster when I was left wondering: do I have to configure the repository and snapshot on each node separately, or can I just do it on one node via Kibana and have it replicated across the cluster? I looked through the documentation but didn't see anything about this.
Thank you!
Yes, part of the setup needs to be done on every node.
First you need to install the repository-s3 plugin on every node; this is explained in the documentation.
After that, you also need to add the access and secret keys to the elasticsearch-keystore of every node (documentation).
The rest of the configuration, creating the repository and setting up the snapshots, is done through Kibana once.
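As a rough sketch of those steps (the repository and bucket names below are placeholders, and the keystore setting names assume the default S3 client):

# On every node: install the plugin and add the credentials, then restart the node
bin/elasticsearch-plugin install repository-s3
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key

# Once, from Kibana Dev Tools or any machine with curl: register the S3 repository
curl -XPUT "http://localhost:9200/_snapshot/my_s3_repo" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-backup-bucket"
  }
}'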

Automatically remove older zipkin entries in elasticsearch

This is specifically about Zipkin's Elasticsearch storage connector, which does not expire old entries on its own.
Is there a way to automatically remove old traces and have that be part of the Elasticsearch configuration (rather than building yet another service or cron job)? Since I am using it on a development server, I just need it wiped every hour or so.
From zipkin docs:
There is no support for TTL through this SpanStore. It is recommended instead to use Elastic Curator to remove indices older than the point you are interested in.
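If you go that route, a minimal sketch using curator_cli from an hourly cron job could look like the following. The "zipkin" index prefix and the one-hour cutoff are assumptions for a development setup, and the exact flags differ between Curator versions, so check curator_cli --help on your install.

# Delete indices whose names start with "zipkin" and whose creation date
# is more than one hour in the past
curator_cli --host localhost --port 9200 delete_indices --filter_list '[
  {"filtertype": "pattern", "kind": "prefix", "value": "zipkin"},
  {"filtertype": "age", "source": "creation_date", "direction": "older", "unit": "hours", "unit_count": 1}
]'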

Unknown source of daily clean-up of indices

I have two separate Elasticsearch clusters; each Elasticsearch node is a Docker container living in Docker Swarm. I aggregate logs from various microservices into indices, one of which is named in the format "logs-timestamp".
In one cluster I have these indices from previous days; in the other I only have the current day's.
This affects only the indices in the "logs-timestamp" format.
Do you have any idea what is going on, or where I could start looking?
Does Elasticsearch have some form of built-in garbage collector?
PS: I didn't start this project, so I have fairly limited knowledge of the whole infrastructure.
You should check the ILM (index lifecycle management) policies documentation (here); ILM is one way of automatically removing old indices.
In short, check the result of this command in Kibana:
GET _ilm/policy
It will tell you if you have some policy configured.
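For reference, a policy that does nothing but delete old indices looks roughly like this, so you will recognize one if GET _ilm/policy returns something similar; the policy name and the 7-day threshold are made-up examples:

curl -XPUT "http://localhost:9200/_ilm/policy/logs-cleanup" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "7d",
        "actions": { "delete": {} }
      }
    }
  }
}'

A policy only acts on indices that reference it, for example through index.lifecycle.name in an index template, so also check your templates and index settings.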
The other way I know of for automatic index curation is Curator (see here and here). You should check whether Curator is installed somewhere in your infrastructure and, if so, check its configuration.
Hope it helps.

Can I use a single elasticsearch/kibana for multiple k8 clusters?

Do you know of any gotchas or requirements that would prevent using a single Elasticsearch/Kibana as a target for Fluentd in multiple k8s clusters?
We are rolling out a new Kubernetes model. I have requirements to run multiple Kubernetes clusters, let's say 4-6. Even though the workload is split across multiple k8s clusters, I do not have a requirement to split the logging, and I believe it would be easier to find the logs for pods in all clusters in one centralized location. It also means less maintenance for Kibana/Elasticsearch.
Using EFK for Kubernetes, can I point Fluentd from multiple k8s clusters at a single Elasticsearch/Kibana? I don't think I'm the first one with this thought, but I haven't been able to find any discussion of doing this. I found lots of discussions of setting up EFK, but all of them only cover a single k8s cluster with its own Elasticsearch/Kibana.
Has anyone else gone down the path of using a single Elasticsearch/Kibana to serve logs from multiple Kubernetes clusters? We'll plunge ahead with testing it out, but I'm wondering whether anyone else has already gone down this road.
I don't think you need to create an Elasticsearch instance for each Kubernetes cluster; you can run one main Elasticsearch instance and index all the logs into it.
But even if you don't have an Elasticsearch instance for each Kubernetes cluster, I think you should have a DRP (disaster recovery plan): instead of shipping the logs of all pods directly to Elasticsearch, consider moving them to Kafka first and then splitting them into two Elasticsearch clusters.
It also depends a lot on the use case: if the Kubernetes clusters are in different regions and you need the pods' logs at low latency (<1s), then one Elasticsearch instance may not be the right answer.
Based on [1] we can read:
Fluentd collects logs from pods running on cluster nodes, then routes them to a centralized Elasticsearch.
Then Elasticsearch ingests these logs from Fluentd and stores them in a central location. It is also used to efficiently search text files.
Kibana is the UI; the user can visualize the collected logs and metrics and create custom dashboards based on queries.
There are several ways to solve your dilemma:
a) Create a centralized dashboard and use each cluster's Elasticsearch as its backend, so you can see all your clusters' logs in one place (one possible setup is sketched below this answer).
b) Create an Elasticsearch cluster and add each Elasticsearch into it. This is NOT the best option, since you will duplicate your data several times, you will need to manage each index's shards, and you will have to deal with the split-brain problem, but it is great for data resiliency.
c) Use another solution like an APM (New Relic, Instana, etc) to fully centralize your logs in one place.
[1] https://techbeacon.com/enterprise-it/9-top-open-source-tools-monitoring-kubernetes
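One possible way to implement option a), sketched under the assumption of a reasonably recent Elasticsearch version: configure cross-cluster search on the cluster that Kibana talks to, so it can query the other clusters' indices remotely. The cluster aliases and seed hosts below are placeholders.

curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_a": { "seeds": ["es-cluster-a.example.com:9300"] },
        "cluster_b": { "seeds": ["es-cluster-b.example.com:9300"] }
      }
    }
  }
}'

Searches and Kibana index patterns can then address the remote data as cluster_a:logstash-*, so the dashboards live in one place while each cluster keeps its own storage.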

Elasticsearch - Backup and Restore on Found cluster

I am running my cluster on Found (so no option to add plugins), and I am trying to configure an S3 bucket to be able to create snapshots of my own.
I managed to configure the bucket, but now I am trying to run the actual backup.
Is there a way to configure a scheduled backup in Elasticsearch, or do I need to manually trigger a backup by sending a request to Elasticsearch?
Thanks!
How to use a custom repository is documented here: https://www.elastic.co/guide/en/found/current/custom-repository.html
Note that we do snapshots every 30 minutes already: https://www.elastic.co/guide/en/found/current/restoring-snapshots.html
A more appropriate place for Found-related questions is this forum: https://discuss.elastic.co/c/found
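If you want to check what is already there, you can list the registered repositories and their snapshots via the REST API. The repository name "found-snapshots" below is only a guess at what Found uses; substitute whatever the first call returns.

# List the registered snapshot repositories, then the snapshots in one of them
curl -XGET "https://YOUR_CLUSTER_ENDPOINT/_snapshot"
curl -XGET "https://YOUR_CLUSTER_ENDPOINT/_snapshot/found-snapshots/_all"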
