How to force segment merge in Elasticsearch 1.6? - elasticsearch

I found in the official docs how to force a segment merge for Elasticsearch 2.3, but not for prior versions, in particular 1.6.2, which I am using. So I would like to ask if there is a way to force segment merging on older versions of Elasticsearch.

Prior to 2.x this was the optimize API.
But be careful with it: running it on an index that is receiving index/update requests will hit performance hard.
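For reference, a minimal sketch of calling the optimize API on a 1.x cluster, assuming a hypothetical index named my_index on a local node:

```python
import requests

# Force-merge the segments of a 1.x index down to a single segment using
# the optimize API (the predecessor of _forcemerge). The index name and
# host below are placeholders for your own setup.
resp = requests.post(
    "http://localhost:9200/my_index/_optimize",
    params={"max_num_segments": 1},
)
resp.raise_for_status()
print(resp.json())
```

As noted above, it is best run during a quiet period, since merging is I/O-heavy and competes with live indexing.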

Related

Adding TTL on edges/vertices doesn't work on a mixed index (Elasticsearch version > 6.x.x)

I'm using JanusGraph with AWS Keyspaces (Cassandra) and Elasticsearch as the storage and indexing backends, respectively. I have a requirement to delete all edges older than 30 days. Setting a TTL on an edge/vertex property doesn't work on a mixed index.
The Elasticsearch version used is 7.x.x. I think Elasticsearch stopped supporting TTLs after 6.x.x, and it is now part of index lifecycle management, which JanusGraph doesn't support as of now (v0.6.2).
Ref: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/mapping-ttl-field.html
Should we be handling this outside of JanusGraph for now? If there is any other way, please suggest it.
This confusing behavior has been an open issue since 2018, see:
https://github.com/JanusGraph/janusgraph/issues/987
Therefore, the issue predates the Elasticsearch 6.x to 7.x upgrade and does not seem related to it. Also, no alternatives are suggested other than using Solr as the indexing backend for JanusGraph.
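For completeness, this is roughly what replaced the removed TTL field on the Elasticsearch side: an ILM policy that deletes indices once they reach a given age. It works at the index level (whole time-based indices, not individual documents), which is one reason it doesn't map directly onto JanusGraph's per-edge TTL, and as noted it would have to be managed outside of JanusGraph. The policy name and host are placeholders:

```python
import requests

# Sketch of an Elasticsearch 7.x ILM policy that deletes an index 30 days
# after creation. This is managed directly in Elasticsearch, outside of
# JanusGraph, and applies to whole indices rather than individual edges.
policy = {
    "policy": {
        "phases": {
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            }
        }
    }
}
resp = requests.put(
    "http://localhost:9200/_ilm/policy/delete-after-30d",  # placeholder name
    json=policy,
)
resp.raise_for_status()
```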

How to migrate data from elasticsearch 5.6 to elasticsearch 8.3

I have an Elasticsearch cluster running 5.6. I plan to upgrade my cluster, but I plan to do it by running an ES 8.3 cluster in parallel and then moving the data over to it.
The preferred way, I think, is to do a snapshot and restore: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
But I am confused about what exactly snapshot version compatibility means:
In the above, does it mean that if I take a snapshot in Elasticsearch 5.6, I cannot restore it directly in version 8.3? (Which would mean I have to first move to 6.x, then to 7.x, and finally to 8.x?)
The index compatibility matrix below, however, says that a version in 5.x will work in 8.x?
Am I missing something? Can someone help me elaborate on this?
So, the underlying problem is that data written with Lucene version N can only be read with version N+1. For Elasticsearch 5 to 8, the Lucene version was always 1 greater than the ES version (so 6 to 9).
That means, both for an upgrade and for a restored snapshot: if your data was written with 5.x, you can only read / restore it with 6.x. For 7.x or 8.x you'll need to reindex the data. I would do a remote reindex straight from 5.x to 8.latest if possible (a sketch of such a request follows after the caveats below): https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html
There are some small caveats but they will probably not apply to you:
This doesn't apply to source-only snapshots, but those always need a reindex anyway, so that's not going to add any benefit for you.
8.3 added a feature to still read snapshots from 5.0 onwards, but it is slower, doesn't support all features, and is a commercial feature (Platinum license, if I'm not mistaken).
Depending on what kind of data it is: If it's aging out (like logs or metrics), maybe you don't have to migrate it to the new cluster?
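If you do go the remote-reindex route, here is a minimal sketch of the request, assuming the old 5.6 cluster is reachable at http://old-cluster:9200 and has been added to reindex.remote.whitelist in the 8.3 cluster's elasticsearch.yml (hosts and index names are placeholders; you would normally create the destination index with its mappings first):

```python
import requests

# Remote reindex: the new 8.3 cluster pulls documents straight from the
# old 5.6 cluster. The old host must be listed in reindex.remote.whitelist
# on the 8.3 nodes, and the destination index should already exist with
# the mappings you want.
body = {
    "source": {
        "remote": {"host": "http://old-cluster:9200"},  # placeholder host
        "index": "my-index",                            # placeholder index
    },
    "dest": {"index": "my-index"},
}
resp = requests.post(
    "http://new-cluster:9200/_reindex",
    params={"wait_for_completion": "false"},  # run as a background task
    json=body,
)
resp.raise_for_status()
print(resp.json())  # returns a task ID you can poll with the tasks API
```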

Upgrade Elasticsearch from 5 to 7

I want to upgrade my single-node Elasticsearch from version 5.30 to 7.2. What is the best possible way of doing it?
Please follow the official Elasticsearch documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html for upgrading your Elasticsearch version.
Some important points to take care of when upgrading from 5.x to 7.3:
Elasticsearch can read indices created in the previous major version. If you have indices created in 5.x or before, you must reindex or delete them before upgrading to 7.3.1; Elasticsearch nodes will fail to start if incompatible indices are present (a reindex sketch is shown below).
Take care of the things mentioned in https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html#_preparing_to_upgrade
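For the reindex requirement mentioned above, a minimal sketch of finding 5.x-era indices and reindexing one of them; the index names and host are placeholders:

```python
import requests

ES = "http://localhost:9200"

# List indices whose index.version.created starts with "5", i.e. indices
# created under Elasticsearch 5.x that must be reindexed (or deleted)
# before the cluster can move on to 7.x.
settings = requests.get(
    ES + "/_all/_settings",
    params={"filter_path": "*.settings.index.version.created"},
).json()
old_indices = [
    name for name, s in settings.items()
    if s["settings"]["index"]["version"]["created"].startswith("5")
]
print("created in 5.x:", old_indices)

# Reindex one of them into a fresh index (create it with the desired
# mappings first, then switch aliases or rename as needed).
body = {
    "source": {"index": "my-index"},          # placeholder
    "dest": {"index": "my-index-reindexed"},  # placeholder
}
requests.post(ES + "/_reindex", json=body).raise_for_status()
```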
Let me know if you need additional information.

Retiring old logs without using Elasticsearch-curator

I'm running an ELK stack and would like a strategy for automatically retiring logs older than a certain age... I have tried using elasticsearch-curator, but it requires Python 2.7 and I have Python 2.6.6, and I am anxious about upgrading Python in case I break other packages.
Is there a similar product or does the elasticsearch api cater for such a requirement?
Older versions of Curator will work with older versions of Elasticsearch (I'm using Curator version 3.1.0 with a 1.7.1 ES cluster).
We started out using the Elasticsearch S3 archiving plugin but soon discovered certain limitations when wanting to restore data. We also experienced performance issues with the plugin, which tended to slow down the entire cluster. Since then, we have migrated to a new system in which we archive the data for us and our customers using our own code before indexing the data to Elasticsearch in a clear text format. This gives us all the flexibility we and our customers require.
You may be mistaking the dependency on a given version of the elasticsearch-py module for a dependency on a particular version of Elasticsearch. Curator version 3.5.1 requires elasticsearch-py 2.3, but works with any version of Elasticsearch greater than 1.0.
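If Curator is not an option, the plain REST API is enough for time-based indices. A minimal sketch, assuming the common daily logstash-YYYY.MM.DD naming and a 30-day retention (both assumptions), run from anywhere with a reasonably recent Python:

```python
from datetime import datetime, timedelta
import requests

# Delete daily logstash indices older than 30 days using only the
# Elasticsearch REST API (no Curator). Adjust the naming pattern and
# retention to match your own indices.
ES = "http://localhost:9200"
cutoff = datetime.utcnow() - timedelta(days=30)

# _cat/indices with h=index returns one index name per line as plain text
names = requests.get(ES + "/_cat/indices/logstash-*", params={"h": "index"}).text.split()
for name in names:
    try:
        day = datetime.strptime(name, "logstash-%Y.%m.%d")
    except ValueError:
        continue  # skip anything that doesn't match the daily pattern
    if day < cutoff:
        requests.delete(ES + "/" + name).raise_for_status()
        print("deleted " + name)
```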

Document found with _search but not with GET

We have a single-machine Elasticsearch server (8 shards, all hosted on the same machine). The index contains 7 million documents. We do not specify any custom routing when indexing the documents. We are using Elasticsearch version 1.2.
The problem is that we are unable to retrieve many of our documents using GET. However, using a search on _id we are able to retrieve all of those documents.
We are also successful in retrieving a document by specifying the routing parameter (with different values: 1, 2, 3, ...) with GET.
With the previous version, i.e. Elasticsearch 1.0.3, we did not have this problem.
Any suggestions for a resolution?
Thanks in advance
There is a routing bug in Elasticsearch 1.2.0 that causes this specific behavior:
There was a routing bug in Elasticsearch 1.2.0 that could have a number of bad side effects on the cluster. Possible side effects include:
documents that were indexed prior to the upgrade to 1.2.0 may not be accessible via get. A search would find these documents, but not a direct get of the document by ID.
documents that were updated after the upgrade to 1.2.0 may be duplicated, with one copy from pre-1.2.0 and a second copy updated since the upgrade to 1.2.0.
if a document is duplicated as above, and versioning is in use, the document added after the upgrade to 1.2.0 will have its version reset.
ES is advising everyone to upgrade to 1.2.1 immediately. No word yet on how to resolve what appears to be index corruption introduced by using 1.2.0 to insert or update. Full details here:
http://www.elasticsearch.org/blog/elasticsearch-1-2-1-released/
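For anyone trying to reproduce the symptom, a rough sketch of the three lookups described in the question, against a hypothetical index, type, and document ID:

```python
import requests

ES = "http://localhost:9200"

# Direct GET by ID -- this is what fails for affected documents on 1.2.0.
get_resp = requests.get(ES + "/myindex/mytype/1")
print("GET:", get_resp.status_code, get_resp.json().get("found"))

# Search by ID -- this still finds the document, because search fans out
# to every shard instead of routing to the (wrong) single shard.
search = requests.post(
    ES + "/myindex/_search",
    json={"query": {"ids": {"values": ["1"]}}},
)
print("search hits:", search.json()["hits"]["total"])

# GET with an explicit routing value -- trying different values can hit
# the shard the document actually landed on, as the question describes.
routed = requests.get(ES + "/myindex/mytype/1", params={"routing": "2"})
print("routed GET:", routed.status_code)
```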
