Automatically remove older zipkin entries in elasticsearch

This is specifically about Zipkin's Elasticsearch storage connector, which does not manage index retention itself; the docs instead point to Curator.
Is there a way of automatically removing old traces and having that as part of the Elasticsearch configuration (rather than building yet another service or cron job)? Since I am using it for a development server, I just need it wiped every hour or so.

From zipkin docs:
There is no support for TTL through this SpanStore. It is recommended instead to use Elastic Curator to remove indices older than the point you are interested in.
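For reference, the approach the docs point at usually ends up as a single Curator action file run from cron. The sketch below is only illustrative: it assumes Zipkin's default daily indices with a zipkin- prefix and a %Y-%m-%d date in the name. Since the indices are daily, a true hourly wipe on a dev box would instead just delete the whole pattern on a schedule.

actions:
  1:
    action: delete_indices
    description: Delete zipkin indices older than one day
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: zipkin-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y-%m-%d'
      unit: days
      unit_count: 1

You would then run it on a schedule with something like curator --config /etc/curator/config.yml /etc/curator/delete_zipkin.yml (paths here are placeholders).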

Related

AWS Elasticsearch cluster upgrade from 6.3 to 7

Presently the AWS Elasticsearch cluster version is 6.3 and I am planning to upgrade it to 7. Reindexing also has to be done; it is required so that the indices use _doc as the type instead of our custom mapping types.
Below are my queries:
1. What is the end to end process of upgrading AWS ES cluster version.
2. What are the impacts post upgrade.
3. Any specific backup is required?
4. How to perform upgrade in AWS cluster?
5. Post upgrade, do I need to carry out any validation?
6. When to do reindexing? Post cluster upgrade?
What is the end to end process of upgrading AWS ES cluster version.
You can perform an in-place upgrade of an AWS ES cluster from the AWS console. The upgrade triggers a blue/green deployment and takes quite a while. For example, we upgraded an ES 6.8 cluster with 4 nodes (10 TB each) to OpenSearch 1.3 recently and it took almost 12 hours to complete.
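If you prefer scripting it, the same in-place upgrade can also be started from the AWS CLI. The commands below use the legacy es namespace; the domain name and target version are placeholders, so verify both against your CLI version before relying on them.

# Start the in-place (blue/green) version upgrade for a domain
aws es upgrade-elasticsearch-domain --domain-name my-domain --target-version 7.10
# Check how far the upgrade has progressed
aws es get-upgrade-status --domain-name my-domain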
What are the impacts post upgrade.
By default, AWS migrates all the data and resources (mapping templates, alerts, lifecycle policies etc) into the new upgraded cluster.
If you have scripts that use the ES APIs, expect some API paths to change in the upgraded cluster. For example, the /_template path in ES 6.8 becomes /_index_template in OpenSearch 1.3.
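As a concrete (illustrative) example of that path change, a script that reads index templates would roughly go from the first call to the second; the template name is a placeholder.

# ES 6.8 (legacy templates)
GET /_template/logs_template
# OpenSearch 1.3 (composable index templates)
GET /_index_template/logs_template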
By default, AWS routes all traffic to the new cluster and does not change the ES endpoint. So, if you have data ingestion pipelines that use the ES endpoint, they should keep working automatically. However, I would still recommend checking the logs of each of your data collectors for errors.
For example, if you are using Kinesis Firehose delivery streams, check the destination error logs from the AWS console. If you are using Logstash or Vector, check their logs too.
Any specific backup is required?
It's always a good idea to take periodic snapshots of your AWS ES domain. If something goes wrong, you can always spin up a new domain from a previous working snapshot.
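For manual snapshots on an AWS ES domain you first register an S3 repository (on AWS this request has to be signed and needs an IAM role that can write to the bucket), then take snapshots against it. The bucket, region, role, repository, and snapshot names below are placeholders.

PUT _snapshot/manual-backups
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "region": "us-east-1",
    "role_arn": "arn:aws:iam::123456789012:role/es-snapshot-role"
  }
}

PUT _snapshot/manual-backups/pre-upgrade-1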
How to perform upgrade in AWS cluster?
Not sure what you mean by this. There's actually no way to manually access the underlying nodes/machines and perform the upgrade yourself. This is because the ES cluster is fully managed by AWS.
Post upgrade, do I need to carry out any validation?
As mentioned in the answer to question 2, it's definitely a good idea to check your ingestion pipelines. Check for any warnings/errors in the logs. You can also use Kibana/OpenSearch Dashboards to visually inspect your data for anything weird.
When to do reindexing? Post cluster upgrade?
After you perform the in-place upgrade from AWS console, your existing indices and data are all copied to the newly upgraded cluster.
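If some of those copied indices still carry a custom mapping type, the usual follow-up is to create a new index with a typeless mapping and copy the documents over with the reindex API. The index names here are placeholders.

POST _reindex
{
  "source": { "index": "old-index" },
  "dest":   { "index": "old-index-reindexed" }
}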

Unknown source of daily clean-up of indices

I have two separate Elastic clusters; each Elasticsearch node is a Docker container living in Docker Swarm. I aggregate logs from various microservices into indices, and one of them is in the format "logs-timestamp".
In one of the clusters I have those indices from previous days; in the other one I only have the current day's.
This affects only the indices in the "logs-timestamp" format.
Do you have any idea, or a point from which I can start looking?
Does Elasticsearch have some form of built-in garbage collector?
PS: I didn't start this project, so I have fairly limited knowledge of the whole infrastructure.
You should check the ILM policies documentation (here), which covers one way of automatically removing old indices.
In short, check the result of this command in Kibana:
GET _ilm/policy
It will tell you if you have some policy configured.
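For example, a policy that would explain indices disappearing after a day has a delete phase that looks roughly like this (shown as the request that would create it; the policy name and age are only illustrative):

PUT _ilm/policy/logs-cleanup
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "1d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}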
The other way I know of for automatic index curation is Curator (see here and here). You should check whether Curator is installed somewhere in your infrastructure and look at its configuration.
Hope it helps.

Is there any way to restore Elasticsearch snapshots apart from using the Elasticsearch restore API?

My company wants to use an existing Elasticsearch snapshot repository (several hundred gigabytes) to obtain the original documents and store them elsewhere. I must state that the snapshots were taken using the Elasticsearch snapshot API.
My company is somewhat reluctant to use Elasticsearch to restore the snapshots, as they fear that would involve creating a new Elasticsearch cluster that would consume considerable resources. So far, I have not seen any way to restore the snapshots other than using Elasticsearch, but, given my company's insistence, I ask here: is there any other tool that I could use to restore said snapshots? Thank you in advance for any help resolving this issue.
What I would do in your shoes is to spin up a local cluster and restore the existing snapshot into it (here is the relevant Elastic documentation: Restoring to a different cluster). Then, from there, I would either export the data by using the Kibana Reporting plugin (https://www.elastic.co/what-is/kibana-reporting), or by writing a Logstash pipeline to export the data from the local cluster to - say - a CSV file.
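Concretely, on the throwaway local cluster that would look roughly like the following: register the existing repository read-only, list what it contains, and restore from it. The repository type, settings, and snapshot name are assumptions you would replace with your own.

PUT _snapshot/old_backups
{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots",
    "readonly": true
  }
}

GET _snapshot/old_backups/_all

POST _snapshot/old_backups/snapshot_1/_restore
{
  "indices": "*"
}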

ElasticSearch Upgrade 1.x to 6.x

We have been using ElasticSearch 1.x in production for some time now with millions of records.
We want to upgrade the version from 1.x to 6.x as:
There have been multiple updates by the company and the support for older versions is discontinued.
1.x does not support Kibana.
What's the best way to do it with explicit steps on data security?
Thanks!
I recently did a migration from Elasticsearch 1.5 to 6.2.
Steps that need to be performed:
Update the mappings; a lot has changed between those two versions (just as an example, the _all field is disabled starting from 6.0). The official documentation should help you here.
After you have updated the mappings you will need another cluster set up with the desired version of Elasticsearch. Also update your Logstash/Kibana if needed.
Enable the new cluster to access your old cluster by adding the old host to reindex.remote.whitelist in elasticsearch.yml: reindex.remote.whitelist: oldhost:9200
For each index that you need to migrate, you will need to manually create a new index in your new cluster with the updated mappings from step 1.
Reindex from remote to pull documents from the old index into the new 6.x index (a minimal request is sketched after this list).
Full documentation regarding this one is available here - https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html
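A minimal reindex-from-remote request, assuming the whitelist entry above and a placeholder index name, looks like this:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://oldhost:9200"
    },
    "index": "my-index"
  },
  "dest": {
    "index": "my-index"
  }
}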

Elasticsearch snapshots to S3

I have an Elasticsearch 5.6.2 cluster with one master and two data nodes, and I am using Kibana for visualization. I want to enable automatic snapshots of the Elasticsearch cluster to Amazon S3 every 30 minutes. How can I accomplish this? There is no proper documentation. I have also referred to the Curator docs, and I have a question: do I need to configure Curator on each node?
Please help guys
Curator is an external process.
You must put it on one single machine. It can be a node or any other machine.
It will send REST requests to elasticsearch when needed.
Put it in your crontab and that is going to be OK.
You can also call the snapshot endpoint from a shell script every 30 minutes (a sketch follows below) and not use Curator at all.
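A minimal sketch of that shell-script approach, assuming an S3 repository named s3_backups has already been registered on the cluster:

#!/bin/sh
# take-snapshot.sh - creates a timestamped snapshot in an existing repository
curl -s -XPUT "http://localhost:9200/_snapshot/s3_backups/snapshot-$(date +%Y%m%d%H%M)?wait_for_completion=false"

# crontab entry to run it every 30 minutes:
# */30 * * * * /opt/scripts/take-snapshot.sh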
Elastic Cloud does a backup every 30 minutes (in case you don't want to manage the cluster yourself and want that kind of advanced feature, plus things like rolling upgrades, Kibana, security...).
