Document found with _search but not with GET - elasticsearch

We have a single-machine Elasticsearch server (8 shards, all hosted on the same machine). The index contains 7 million documents. We do not specify any custom routing when indexing the documents. We are using Elasticsearch version 1.2.
The problem is that we are unable to retrieve many of our documents with a GET by ID. However, with a _search on _id we are able to retrieve all of those documents.
We can also retrieve a document with GET by specifying a routing parameter (trying different values: 1, 2, 3, ...).
With the previous version, Elasticsearch 1.0.3, we did not have this problem.
Any suggestions for a resolution?
Thanks in advance.

This specific behavior is caused by a routing bug that was introduced in Elasticsearch 1.2.0:
There was a routing bug in Elasticsearch 1.2.0 that could have a number of bad side effects on the cluster. Possible side effects include:
documents that were indexed prior to the upgrade to 1.2.0 may not be accessible via get. A search would find these documents, but not a direct get of the document by ID.
documents that were updated after the upgrade to 1.2.0 may be duplicated, with one copy from pre-1.2.0 and a second copy updated since the upgrade to 1.2.0.
if a document is duplicated as above, and versioning is in use, the document added after the upgrade to 1.2.0 will have its version reset.
ES is advising everyone to upgrade to 1.2.1 immediately. No word yet on how to resolve what appears to be index corruption introduced by using 1.2.0 to insert or update. Full details here:
http://www.elasticsearch.org/blog/elasticsearch-1-2-1-released/
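To check whether a given document is affected, the comparison from the question can be scripted. The following is a minimal sketch, not an official tool: the host, index, type, and ID values are placeholders, and it only builds the two URLs to compare. A direct GET is sent to the single shard chosen by the routing value (the document ID by default), while a search without routing asks every shard, which is why the search can still find a document that the GET misses.

```python
from urllib.parse import quote, urlencode


def get_url(host, index, doc_type, doc_id, routing=None):
    """Build the URL for a direct GET by ID, optionally with explicit routing."""
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    url = "{}/{}".format(host.rstrip("/"), path)
    if routing is not None:
        url += "?" + urlencode({"routing": routing})
    return url


def search_by_id_url(host, index, doc_id):
    """Build the URL for a URI search on _id (no routing, so all shards are queried)."""
    query = urlencode({"q": "_id:{}".format(doc_id)})
    return "{}/{}/_search?{}".format(host.rstrip("/"), quote(index, safe=""), query)


# Placeholder values; replace with your own host, index, type and ID.
print(get_url("http://localhost:9200", "myindex", "mytype", "42"))
print(get_url("http://localhost:9200", "myindex", "mytype", "42", routing=1))
print(search_by_id_url("http://localhost:9200", "myindex", "42"))
```

If the first URL returns "found": false while the search URL returns a hit, the document matches the first side effect listed above. IDs containing query-string special characters would need extra escaping in the search variant.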

Related

Adding TTL on edges/vertices doesn't work on mixed index (Elasticsearch version > 6.x.x)

I'm using JanusGraph with AWS Keyspaces (Cassandra) and Elasticsearch as the storage and indexing backends respectively. I have a requirement to delete all edges older than 30 days. Setting a TTL on an edge/vertex property doesn't work with a mixed index.
The Elasticsearch version used is 7.x.x. I think Elasticsearch stopped supporting TTLs from 6.x.x onwards, and this is now part of index lifecycle management, which JanusGraph doesn't support as of now (v0.6.2).
Ref: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/mapping-ttl-field.html
Should we be handling this outside of JanusGraph for now? If there is another way, please suggest it.
This confusing behavior has been an open issue since 2018, see:
https://github.com/JanusGraph/janusgraph/issues/987
Therefore, the issue predates the Elasticsearch 6.x to 7.x upgrade and does not seem to be related to it. Also, no alternatives are suggested there other than using Solr as the indexing backend for JanusGraph.

Upgrade Elasticsearch from 5 to 7

I want to upgrade my single-node Elasticsearch from version 5.30 to 7.2. What is the best way to do it?
Please follow the official Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html when upgrading your Elasticsearch version.
Some important points to take care of when upgrading from 5.x to 7.3:
Elasticsearch can read indices created in the previous major version. If you have indices created in 5.x or before, you must reindex or delete them before upgrading to 7.3.1. Elasticsearch nodes will fail to start if incompatible indices are present.
Work through the preparation steps listed at https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html#_preparing_to_upgrade
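To find the indices that would block the upgrade, one option is to look at the index.version.created setting returned by GET /_settings. The sketch below is only an illustration under two assumptions: that the response has the usual nested shape (index name, then settings.index.version.created), and that the version ID encodes the major version in its leading digits (for example 5030099 for 5.3.0). The index names and sample data are made up.

```python
def indices_needing_reindex(settings_response, target_major=7):
    """Return the names of indices created before the previous major version.

    settings_response is the parsed JSON body of GET /_settings.
    """
    oldest_readable_major = target_major - 1
    stale = []
    for name, body in settings_response.items():
        created = int(body["settings"]["index"]["version"]["created"])
        created_major = created // 1000000  # assumed version ID layout
        if created_major < oldest_readable_major:
            stale.append(name)
    return sorted(stale)


# Made-up sample resembling a GET /_settings response.
sample = {
    "logs-2017": {"settings": {"index": {"version": {"created": "5030099"}}}},
    "logs-2019": {"settings": {"index": {"version": {"created": "6080099"}}}},
}
print(indices_needing_reindex(sample))  # ['logs-2017']
```

Any index reported here has to be reindexed or deleted before the 7.x node is started.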
Let me know if you need additional information.

ElasticSearch Upgrade 1.x to 6.x

We have been using Elasticsearch 1.x in production for some time now, with millions of records.
We want to upgrade the version from 1.x to 6.x as:
There have been multiple updates by the company and the support for older versions is discontinued.
1.x does not support Kibana.
What's the best way to do it with explicit steps on data security?
Thanks!
I recently did a migration from Elasticsearch 1.5 to 6.2.
Steps that need to be performed:
1. Update the mappings. There are a lot of changes between those two versions (just as an example, the _all field is disabled by default starting from 6.0). The official documentation should help you here.
2. After you have updated the mappings, set up another cluster with the desired version of Elasticsearch. Also update your Logstash/Kibana if needed.
3. Enable the new cluster to access your old cluster by adding the old cluster to reindex.remote.whitelist in elasticsearch.yml: reindex.remote.whitelist: oldhost:9200
4. For each index that you need to migrate, manually create a new index in your new cluster with the updated mappings from step 1.
5. Reindex from remote to pull documents from the old index into the new 6.x index.
Full documentation on this is available here: https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html
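As a rough illustration of the reindex-from-remote step, the request body sent to POST _reindex on the new cluster could be built like this. The index names old_index and new_index are placeholders, and oldhost:9200 is the host used in the whitelist setting above; note that the host in the request body includes the scheme, unlike the whitelist entry.

```python
import json


def remote_reindex_body(old_host, source_index, dest_index):
    """Build the body for POST _reindex, pulling from a remote (old) cluster."""
    return {
        "source": {
            "remote": {"host": old_host},  # must include the scheme and port
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }


# Placeholder index names; the destination index should already exist
# with the updated mappings.
body = remote_reindex_body("http://oldhost:9200", "old_index", "new_index")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be posted to the _reindex endpoint of the new 6.x cluster, once per index to migrate.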

How to force segment merge in Elasticsearch 1.6?

I found in the official docs how to force a segment merge for Elasticsearch 2.3, but not for earlier versions, in particular 1.6.2, which I am using. Is there a way to force segment merging in older versions of Elasticsearch?
Prior to 2.x, that was the optimize API.
But be careful with it: running it on an index that is still receiving index/update requests will hit performance hard.
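For reference, a call to the optimize API on 1.x is a POST to the _optimize endpoint of the index, with max_num_segments controlling how far segments are merged. A small sketch that builds the URL (the host and index name are placeholders):

```python
from urllib.parse import urlencode


def optimize_url(host, index, max_num_segments=1):
    """Build the URL for the 1.x optimize API; send it as a POST request."""
    query = urlencode({"max_num_segments": max_num_segments})
    return "{}/{}/_optimize?{}".format(host.rstrip("/"), index, query)


# Placeholder host and index name.
print(optimize_url("http://localhost:9200", "myindex"))
```

Given the performance warning above, this is best run against indices that are no longer being written to.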

Retiring old logs without using Elasticsearch-curator

I'm running an ELK stack and would like a strategy for automatically retiring logs older than a certain age. I have tried using elasticsearch-curator, but it requires Python 2.7 and I have Python 2.6.6, and I am reluctant to upgrade Python in case I break other packages.
Is there a similar product or does the elasticsearch api cater for such a requirement?
Older versions of Curator will work with older versions of Elasticsearch (I'm using Curator version 3.1.0 with a 1.7.1 ES cluster).
We started out using the Elasticsearch S3 archiving plugin but soon discovered certain limitations when wanting to restore data. We also experienced performance issues with the plugin, which tended to slow down the entire cluster. Since then, we have migrated to a new system in which we archive the data for us and our customers using our own code before indexing the data to Elasticsearch in a clear text format. This gives us all the flexibility we and our customers require.
You may be mistaking the dependency on a given version of the elasticsearch-py module for a dependency on a version of Elasticsearch. Curator version 3.5.1 requires es-py 2.3, but works with any version of Elasticsearch greater than 1.0.
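On the question of whether the Elasticsearch API itself can do this: if the logs are stored in daily indices, retiring old logs comes down to deleting whole indices with DELETE /<index>. The sketch below assumes Logstash-style daily index names (logstash-YYYY.MM.DD), which the question does not confirm; it only selects the names to delete and leaves the actual HTTP calls out.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, max_age_days, prefix="logstash-"):
    """Return daily indices whose date is more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a daily log index
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Made-up index names for illustration.
names = ["logstash-2015.08.30", "logstash-2015.08.31",
         "logstash-2015.09.29", ".kibana", "logstash-bad"]
print(expired_indices(names, date(2015, 9, 30), 30))  # ['logstash-2015.08.30']
```

Each returned name can then be removed with a DELETE request against the cluster, for example from a daily cron job.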

Resources