How to start Elasticsearch in non-cluster mode - elasticsearch

I have two different machines running Elasticsearch server instances. They automatically form a cluster, and changes made on one instance are reflected on the other instance on the other machine. I changed the cluster.name property in the elasticsearch.yml file in the config folder and the issue was resolved. I wanted to know if I can start an Elasticsearch server instance in non-cluster mode?

You can't start the ES server in non-cluster mode.
But if you want the two servers to run independently (each in its own cluster), there are two options I can think of:
Disable multicast and don't list the other host in the unicast settings
Change the cluster.name so they have different names
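For example, a minimal elasticsearch.yml sketch combining both options (the cluster name below is just a placeholder):
# a distinct cluster name keeps this node out of the other machine's cluster
cluster.name: my-standalone-cluster
# disable multicast discovery and leave the unicast host list empty
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: []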

The easiest option is to set node.local: true.
This prevents Elasticsearch from trying to connect to other nodes.
Using a custom cluster name is also a good idea in any case, just to prevent an unintended exchange of data. Use different names for production, testing, and development.
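A minimal sketch of that approach (the dev cluster name is an illustrative placeholder):
# local transport only: this instance will never try to join other nodes
node.local: true
# a distinct name per environment avoids accidental data exchange
cluster.name: myapp-dev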

Related

How to sync up two ElasticSearch clusters

I need to set up a replicated ES clusterII in data centerII; clusterII just needs to stay in sync with ES clusterI, which is in data centerI. So far my idea is to take snapshots and restore them in clusterII in order to keep it in sync with clusterI, but this approach introduces some delay. Is there a better way?
The ability to cluster is a concept baked into ElasticSearch. It was not designed to be scaled across datacenters, because of the network latency that involves, but it can do it.
The idea behind ElasticSearch is to have a highly available cluster that replicates shards within itself (e.g. a replica level of 2 means two copies of each shard in addition to the primary, spread across your cluster). This means one cluster alone is its own backup.
First, if you don't have it configured as a cluster, do so by adding the following to your /etc/elasticsearch/elasticsearch.yml (or wherever you put your config):
/etc/elasticsearch/elasticsearch.yml:
cluster.name: thisismycluster
node.name: ${HOSTNAME}
Alternatively, you can make node.name whatever you want, but it's best to put in your hostname.
You also want to make sure you have the ElasticSearch service bound to a particular address and/or interface, where the interface is probably your best bet because you need a point-to-point link across those datacenters:
/etc/elasticsearch/elasticsearch.yml:
network.host: [_tun1_]
You will need to make sure you set a list of discovery hosts; every host whose cluster.name matches will be discovered and assigned to that cluster. ElasticSearch takes care of the rest; it's magical!
You may add the hosts by name (only if they are defined in your /etc/hosts or resolvable via DNS across your datacenters) or by IP:
/etc/elasticsearch/elasticsearch.yml:
discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "..."]
Save the config and restart ElasticSearch:
sudo systemctl restart elasticsearch
OR
sudo service elasticsearch restart
If you aren't using systemd (depending on your OS), I would highly suggest using it.
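Once both nodes are back up, you can check that they have discovered each other, for example with the cluster health API (default host and port shown; adjust for your setup):
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
The number_of_nodes field in the response should count every node that joined the cluster.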
I will tell you though that doing snapshots with ElasticSearch for this is a terrible idea, and you should avoid it at all costs, because ElasticSearch has the mentality of high availability built into the application already - this is why this application is so powerful and is being heavily adopted by the community and companies alike.

Adding a secondary node on another computer?

The running instance of elasticsearch on a server is running with all defaults, no changes.
How can I scale horizontally to another server on another network?
Where do you specify this?
I only see one elasticsearch.yml file in the config directory. Do I have to make a new config file for each cluster/node I would like to enable? The config file appears to be for one instance only. How do I tell it to act as the master, with the secondary server outside the network as a secondary instance?
On the other node, you install ES as usual and then, depending on the network characteristics and your preference, you may or may not need to change things in the elasticsearch.yml of both ES instances.
By default, ES uses multicasting on the network to discover nodes in the same cluster. A cluster is defined by the "cluster.name" property you can find in the elasticsearch.yml file. Nodes with the same "cluster.name" will join the same cluster. If using multicasting, you need to make sure, first, that multicasting is available in your network configuration, and then that you don't have firewalls or anything else that could block communication between the nodes (e.g. port 54328).
You can also use unicasting for node discovery, where the address of each node is specified in elasticsearch.yml. For more details, check the elasticsearch.yml file itself, as it has a good description of these settings. For example, disable multicasting:
discovery.zen.ping.multicast.enabled: false
and configure unicasting:
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

Elasticsearch 0.90.0: Putting existing server (with some data) into cluster

I am using Elasticsearch 0.90.0 in my production environment. Currently, I am using only a single server (along with the jetty plugin to restrict write access).
Now I want a cluster of two servers consisting of one old server (with data) and one new server. I need my data on both servers, so that in case either of them fails, the data can be fetched from the other. What should I do? Will the normal configuration work? Can I copy the data folder from one server to another and expect it to work properly when placed in a cluster? Or should I clone the elasticsearch folder itself onto my second machine?
By normal configuration, I mean this:-
cluster.name: elasticsearch
node.name: "John"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: []
Please help!
You should not have to copy data manually.
If you kept the default settings, adding one more node to the cluster should trigger:
creation of one replica for each existing shard on your new node
rebalancing of the primary shards between your two nodes
You will have to add the addresses of your nodes to the unicast hosts list, as you have disabled multicast and enabled unicast discovery.
I recommend running some tests with ElasticSearch instances on your personal computer in order to check this behavior.
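As a sketch, the unicast host lists could point the two servers at each other (hostnames are placeholders; 9300 is the default transport port):
# on the old server (with data)
discovery.zen.ping.unicast.hosts: ["new-server.example.com:9300"]
# on the new server
discovery.zen.ping.unicast.hosts: ["old-server.example.com:9300"]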

Running multiple elasticsearch instances

I need to setup 2 Elasticsearch instances:
one for kibana logs (my separate application will throw logs at it)
one for search for my production application
My plan is to create separate folders with Elasticsearch in them. They don't talk to each other, which means they are separate databases, and if one goes down, the other still runs. Is this a good solution, or should I use only one Elasticsearch folder with multiple elasticsearch.yml configuration files? What is the best practice for multiple Elasticsearch instances?
The best practice is to NOT run two Elasticsearch instances on the SAME server.
Your production search will probably need a lot of RAM to work fast and stay responsive. You don't want your logging system to interfere with that.
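If you do run the two instances on one machine anyway, a minimal way to keep them fully separate is to give each its own cluster name, HTTP port and data path (the names, ports and paths below are illustrative assumptions):
# instance for production search
cluster.name: production-search
http.port: 9200
path.data: /var/data/es-search
# instance for kibana logs
cluster.name: logging
http.port: 9210
path.data: /var/data/es-logging
Because the cluster names differ, the two instances will never join each other, so one going down does not affect the other.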

ElasticSearch replication

Can somebody provide some instructions on how to configure ElasticSearch for replication? I am running ES on Windows and understand that if I run the bat file multiple times on the same server, a separate instance of ES is started each time and they all connect to each other.
I will be moving to a production environment soon and will have a three node set up, each node being on a different server. Can someone point me at some documentation which gives me a bit more control of the replication set up.
Have a look at the discovery documentation. It works out of the box with multicast discovery, even though you could have problems with firewalls etc., but I would recommend against multicast in production. I would rather use unicast and configure the host names of the nodes belonging to the cluster in elasticsearch.yml. That way you make sure nobody joins the production cluster from their own machine.
One other thing I would do is configure a proper cluster name, specific to every environment.
Replication is set per index in Elasticsearch, not per server or node. That is, each index can have a different replica setting. The number of replicas is 1 by default.
The number of replicas is not restricted by the number of nodes you set up. If there are more shard copies than data nodes, the index health just becomes yellow because some replica shards cannot be allocated; everything else still works fine.
You can reference to the document for further information: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html
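For example, the update settings API linked above can change the replica count of an existing index at any time (the index name and host are placeholders; recent versions also require the Content-Type header shown):
curl -XPUT 'http://localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '
{
  "index": { "number_of_replicas": 2 }
}'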
