Elasticsearch Shard Location - elasticsearch

I am trying to set up an Elasticsearch cluster and have a question that's bothering me. I am transitioning from MarkLogic to Elasticsearch, and I am used to storing data on a different disk from the one where the software (i.e. MarkLogic) is installed. I know how to do this in MarkLogic but cannot find anything about it for Elasticsearch. Can anyone point me to a document that would help me configure my shards on a different machine where Elasticsearch is not installed?
Thanks,
S.

You simply need to change the path.data setting in your elasticsearch.yml configuration file:
path:
  data:
    - /mnt/hda1
    - /mnt/hda2
    - /mnt/hda3
You can use a single location or several; ES will store your index data in those locations. Note that all data belonging to a given shard is always stored under a single path.
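To confirm which data paths a node actually picked up, one option is the filesystem section of the node stats API (GET _nodes/stats/fs). The following is a minimal Python sketch, assuming the response has already been fetched and parsed into a dict. The helper name data_paths_by_node and the trimmed sample response are made up for illustration and are not real cluster output.

```python
def data_paths_by_node(stats):
    """Map each node name to the list of data paths it reports."""
    result = {}
    for node in stats.get("nodes", {}).values():
        fs_entries = node.get("fs", {}).get("data", [])
        result[node.get("name", "unknown")] = [entry["path"] for entry in fs_entries]
    return result


# Trimmed, illustrative shape of a GET _nodes/stats/fs response (assumed values).
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "fs": {
                "data": [
                    {"path": "/mnt/hda1/nodes/0"},
                    {"path": "/mnt/hda2/nodes/0"},
                ]
            },
        }
    }
}

for node_name, paths in data_paths_by_node(sample).items():
    print(node_name, paths)
```

If a path configured in elasticsearch.yml is missing from this list, the node was probably started with a different configuration file or has not been restarted since the change.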

Related

Unknown source of daily cleanup of indices

I have two separate Elasticsearch clusters. Each node is a Docker container running in Docker Swarm. I aggregate logs from various microservices into indices, and one set of them is named in the format "logs-timestamp".
In one cluster I still have those indices from previous days; in the other I only have the one from the current day.
This affects only the indices in the "logs-timestamp" format.
Do you have any idea what causes this, or where I can start looking?
Does Elasticsearch have some form of built-in garbage collector?
P.S. I didn't start this project, so I have fairly little knowledge of the whole infrastructure.
You should check the ILM (index lifecycle management) policies documentation; ILM is one way of automatically removing old indices.
In short, check the result of this command in Kibana:
GET _ilm/policy
It will tell you whether you have any policies configured.
The other way I know of for automatic index curation is Curator. You should check whether Curator is installed somewhere in your infrastructure and review its configuration.
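To narrow down the GET _ilm/policy output to the policies that can actually remove indices, here is a minimal Python sketch. It assumes the response has been parsed into a dict keyed by policy name; the helper name policies_that_delete and both sample policies are hypothetical.

```python
def policies_that_delete(ilm_response):
    """Return {policy_name: min_age} for policies that define a delete phase."""
    found = {}
    for name, body in ilm_response.items():
        phases = body.get("policy", {}).get("phases", {})
        delete_phase = phases.get("delete")
        if delete_phase is not None:
            # min_age may be absent; report None rather than guessing a default.
            found[name] = delete_phase.get("min_age")
    return found


# Illustrative response shape with made-up policy names.
sample = {
    "logs-cleanup": {
        "version": 1,
        "policy": {
            "phases": {
                "hot": {"actions": {}},
                "delete": {"min_age": "1d", "actions": {"delete": {}}},
            }
        },
    },
    "keep-forever": {
        "version": 1,
        "policy": {"phases": {"hot": {"actions": {}}}},
    },
}

print(policies_that_delete(sample))
```

A policy only acts on indices it is attached to, so the next thing to compare between the two clusters is whether the "logs-timestamp" indices (or their index template) reference such a policy.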
Hope it helps.

Elasticsearch - Migrate Index from Windows machine to Linux machine

We are currently running ES on an in-house Windows Server 2012 R2 machine; it holds 20 million documents in total, with an index size of 12 GB.
We now need to migrate from the Windows server to a Linux server, so I am looking for a reliable method to move the index data from the Windows machine to the Linux machine. Can anyone please suggest the best approach?
Thanks.
Don't copy the data directory! Choose a supported path:
1. CCR (cross-cluster replication) - the easiest and fastest if both clusters have a Platinum license.
2. Snapshot via FS/S3 - a good option if you already have snapshots in place, especially with S3 as storage, since you don't need to copy the snapshot to the new nodes or mount a shared filesystem on all data nodes in both clusters. It is also fast because nothing is reindexed in the destination cluster; it's just a restore of shards, and probably the second-best approach in terms of speed.
3. Reindex from remote - comes with the overhead of reindexing the docs, but also works across different Elasticsearch versions. Try this if you want a simple way or need to move to a newer major version.
4. Logstash with elasticsearch input and output - same as 3, but with Logstash in between. An easy path if you want to modify the docs while copying.
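For option 3, the request body sent to POST _reindex on the destination (Linux) cluster could be built as in the Python sketch below. The host and index names are placeholders, and the helper name is made up for this example. It also assumes the destination cluster already lists the source host under reindex.remote.whitelist in its elasticsearch.yml, which reindex from remote requires.

```python
import json


def reindex_from_remote_body(remote_host, source_index, dest_index):
    """Build the JSON body for POST _reindex when pulling from a remote cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }


# Placeholder host and index names, for illustration only.
body = reindex_from_remote_body("http://windows-host:9200", "my-index", "my-index")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana console after a POST _reindex line, or sent with any HTTP client.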
Good luck!

What's the easiest way of moving Elastic Search data between servers

I've got Elasticsearch v6.1.0 installed on a Windows machine and a CentOS 7 machine. The goal is to migrate data from the Windows machine to the CentOS 7 machine.
Since they both run the same ES version, I simply copied the "data" folder from machine A to machine B. When I checked the cluster health, the status was red and active_primary_shards was 0, so I reverted the change.
What other methods are there? Can the snapshot/restore method be used for this purpose? I thought it was for migrating between different versions.
So the question is: what's the best/easiest method for moving data between two servers with the same ES version?
Using snapshot/restore
You can certainly use snapshot/restore for this task, as long as you have a shared file system or a single-node cluster. The shared FS must meet the following requirement:
"In order to register the shared file system repository it is necessary to mount the same shared filesystem to the same location on all master and data nodes."
So it's not a problem if you have a single-node cluster: just take a snapshot and copy it over to the other machine.
It might be a challenging task, though, if you have many nodes running.
Alternatively, you can use one of the supported repository plugins for S3, HDFS, and other cloud storage.
The advantage of this approach is that the data and the indices are snapshotted entirely.
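For the single-node case, the sequence of REST calls could look like the Python sketch below. It only builds the requests and does not send them. The repository name, location, and snapshot name are hypothetical, and the sketch assumes the location is listed under path.repo in elasticsearch.yml on both machines.

```python
def snapshot_migration_requests(repo, location, snapshot):
    """Return the REST calls to run on the source and on the target cluster."""
    register = (
        "PUT",
        f"/_snapshot/{repo}",
        {"type": "fs", "settings": {"location": location}},
    )
    create = ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None)
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", None)
    return {
        # Source: register the repository, then take the snapshot.
        "source": [register, create],
        # Target: register the same repository (after copying the files), then restore.
        "target": [register, restore],
    }


# Hypothetical names, for illustration only.
requests_by_cluster = snapshot_migration_requests(
    "my_backup", "/mnt/backups/my_backup", "snapshot_1"
)
for cluster, calls in requests_by_cluster.items():
    for method, path, body in calls:
        print(cluster, method, path, body)
```

Between the two halves, the contents of the repository directory have to be copied from the source machine to the same location on the target machine.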
Using _reindex API
It might be easier to use the _reindex API to transfer data from one ES cluster to another. There is a special Reindex from Remote mode that supports exactly this use case.
What reindex actually does is a scroll on the source index (which can be on a remote cluster) and a lot of bulk inserts into the target index.
There are a couple of issues you should take care of:
you have to set up the target index yourself (reindex copies neither mappings nor settings)
if some fields on the source index are excluded from _source then their contents won't be copied to the target index
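Because reindex does not carry mappings or settings over, the target index has to be created first. One way is to take the response of GET /<index> from the source cluster and turn it into a body for PUT /<index> on the target. The Python sketch below does that; the helper name is made up, and the set of settings it strips is an assumption based on the keys a 6.x cluster typically returns and rejects on index creation.

```python
# Settings returned by GET /<index> that are generated by the cluster
# (assumed list; adjust if index creation rejects another key).
GENERATED_SETTINGS = {"creation_date", "uuid", "version", "provided_name"}


def create_index_body(get_index_response, index_name):
    """Build a PUT /<index> body from the source cluster's GET /<index> response."""
    source = get_index_response[index_name]
    index_settings = {
        key: value
        for key, value in source["settings"]["index"].items()
        if key not in GENERATED_SETTINGS
    }
    return {"settings": {"index": index_settings}, "mappings": source["mappings"]}


# Trimmed, illustrative GET /my-index response (assumed values).
sample = {
    "my-index": {
        "aliases": {},
        "mappings": {"doc": {"properties": {"title": {"type": "text"}}}},
        "settings": {
            "index": {
                "creation_date": "1514764800000",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "placeholder-uuid",
                "version": {"created": "6010099"},
                "provided_name": "my-index",
            }
        },
    }
}

print(create_index_body(sample, "my-index"))
```

Once the target index exists with this body, the reindex call copies only the documents into it.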
Summing up
For snapshot/restore
Pros:
all data and the indices are saved/restored as they are
2 calls to the ES API are needed
Cons:
if the cluster has more than one node, you need to set up a shared FS or use some cloud storage
For _reindex
Pros:
Works for cluster of any size
Data is copied directly (no intermediate storage required)
1 call to the ES API is needed
Cons:
Data excluded from _source will be lost
Here's also a similar SO question from some three years ago.
Hope that helps!

Change cluster name in elastic search

How do I rename the current cluster in the Elasticsearch config?
I want to rename the cluster without it going down, if possible.
Edit the elasticsearch.yml file. By default the ES cluster name is elasticsearch, and the cluster.name field in the yml file is commented out. So first uncomment it, then give it a name and restart ES.
If you have a multi-node cluster, you can try updating the cluster name in the config file and the directory name (if replicas are enabled) one node at a time, similar to a rolling upgrade of Elasticsearch.
If you have a single-node cluster, you can change the cluster name in the config file, but a restart is needed for the change to take effect.
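The config edit described above (uncomment cluster.name and set a value) can be sketched in Python as a plain text transformation. The helper name and the sample file content are made up for illustration. The sketch only rewrites text and does not restart the node.

```python
import re

# Matches an active or commented-out "cluster.name: ..." line.
CLUSTER_NAME_LINE = re.compile(r"^[ \t]*#?[ \t]*cluster\.name[ \t]*:.*$", re.MULTILINE)


def set_cluster_name(yml_text, new_name):
    """Return yml_text with cluster.name uncommented and set to new_name."""
    line = f"cluster.name: {new_name}"
    if CLUSTER_NAME_LINE.search(yml_text):
        # A function replacement avoids backslash interpretation in new_name.
        return CLUSTER_NAME_LINE.sub(lambda _match: line, yml_text, count=1)
    # No existing entry: append one, making sure it starts on its own line.
    prefix = yml_text if (not yml_text or yml_text.endswith("\n")) else yml_text + "\n"
    return prefix + line + "\n"


# Illustrative file content (assumed, not copied from a real installation).
original = "# Use a descriptive name:\n#cluster.name: my-application\nnode.name: node-1\n"
print(set_cluster_name(original, "prod-search"))
```

The node still has to be restarted afterwards for the new name to take effect.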

Getting old indexed elasticsearch data

I have a lot of data indexed in my Elasticsearch.
I deleted the Elasticsearch folder, extracted a fresh copy from the zip, and started the Elasticsearch server.
To my surprise, after starting the new server I found all the old data again, and this happens every time.
Can anyone please help me? I don't want the old data to show up in Elasticsearch.
Regards
Given the cluster health response, it's not a problem of multiple nodes running in the same cluster, as suggested by Igor. I'd suggest checking the running Java processes. You may have an Elasticsearch process hanging around somewhere that keeps writing to that folder.
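One way to look for a leftover process on Linux or macOS is to filter the process list for the Elasticsearch main class. The Python sketch below parses text in the format produced by ps -eo pid,args. The class name org.elasticsearch.bootstrap.Elasticsearch is an assumption about how the server is launched, and the sample output is made up.

```python
ES_MAIN_CLASS = "org.elasticsearch.bootstrap.Elasticsearch"  # assumed main class


def find_es_processes(ps_output):
    """Return (pid, command line) for each line that runs the ES main class."""
    matches = []
    for line in ps_output.splitlines():
        pid, _, args = line.strip().partition(" ")
        if pid.isdigit() and ES_MAIN_CLASS in args:
            matches.append((int(pid), args.strip()))
    return matches


# Made-up sample in the format of `ps -eo pid,args`.
# Real output could be captured with, for example:
#   subprocess.run(["ps", "-eo", "pid,args"], capture_output=True, text=True).stdout
sample_ps = (
    "  PID COMMAND\n"
    " 1234 /usr/bin/java -Xms1g -cp /opt/es/lib/* org.elasticsearch.bootstrap.Elasticsearch\n"
    " 5678 /usr/bin/python3 app.py\n"
)

for pid, command in find_es_processes(sample_ps):
    print(pid, command)
```

If more than one process shows up, or one shows up before the fresh copy has been started, that older process is a likely source of the reappearing data.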
