Autobalance the shards in Elasticsearch - elasticsearch

We have four Elasticsearch nodes on version 5.6.9 that, due to some earlier allocation rules, have an unbalanced number of shards per node.
We have found that we can move one shard at a time to another node, but that is incredibly slow.
Apart from writing a script that uses the Elasticsearch API to balance the shards, is there another way?

You can do this with Cluster Reroute, which allows manual changes to the allocation of individual shards in the cluster. Check out the docs: Cluster Reroute.
POST /_cluster/reroute
{
  "commands" : [
    {
      "move" : {
        "index" : "test", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    },
    {
      "allocate_replica" : {
        "index" : "test", "shard" : 1,
        "node" : "node3"
      }
    }
  ]
}

We found the issue: the cluster was not automatically rebalancing its indices because we had cluster.routing.rebalance.enable = none.
We found the information here.
The problem we had with cluster/reroute was that, according to the documentation, the cluster will try to rebalance itself again afterwards. Either way, thanks for your help.
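For reference, undoing that setting is a one-line cluster settings update. A minimal sketch of the request body (sent to PUT /_cluster/settings; whether to use persistent or transient is a judgment call):

```python
import json

# Sketch: request body to re-enable automatic shard rebalancing.
# Send it with: PUT /_cluster/settings
body = {
    "persistent": {
        "cluster.routing.rebalance.enable": "all"  # was "none" in our case
    }
}
print(json.dumps(body, indent=2))
```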

Related

external access to ElasticSearch cluster

Using this link I can easily set up a 3-node cluster on a single host with docker-compose.
This is all fine as long as I only use ES via the included Kibana container.
However, I need to access this cluster from external hosts. This becomes problematic because the nodes inside the cluster are exposed through their Docker-internal IP addresses. The application uses the API call below to get the addresses, and then of course errors out.
$ curl 172.16.0.146:9200/_nodes/http?pretty
{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "es-cluster-test",
  "nodes" : {
    "hYCGiuBLQMK4vn5I3C3pQQ" : {
      "name" : "es01",
      "transport_address" : "192.168.48.3:9300",
      "host" : "192.168.48.3",
      "ip" : "192.168.48.3",
      "version" : "8.2.2",
      .....
How can I overcome this? I have tried exposing the 9200/9300 ports of all three nodes on different ports of the Docker host, and then adding a network.publish_host=172.16.0.146 environment setting to each node, but this results in three 1-node clusters.
Someone must have faced this one in the past...
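One thing worth trying (a sketch, not verified against this compose file): publish only the HTTP layer externally and leave transport on the internal Docker network. Elasticsearch has separate http.publish_host / http.publish_port settings for this; network.publish_host also changes the transport address the nodes use to find each other, which is likely what split the cluster into three. Hypothetical fragment for one node, assuming host port 9201 is mapped to es01's 9200:

```yaml
# docker-compose fragment (hypothetical) for node es01
es01:
  environment:
    # advertise the Docker host address for HTTP clients only
    - http.publish_host=172.16.0.146
    - http.publish_port=9201
    # no network.publish_host: transport stays internal, so clustering is unaffected
  ports:
    - "9201:9200"
```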

Elastic Cloud: how to configure the [search.max_buckets] cluster-level setting?

We were using Elasticsearch 6.x deployed on our own server.
We recently migrated to the cloud, so we are now on version 7.x.
We have a huge query with aggregations that worked on 6.x, but it no longer works.
This is due to a breaking change between the versions.
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#breaking_70_aggregations_changes
search.max_buckets in the cluster setting
The dynamic cluster setting named search.max_buckets now defaults to 10,000 (instead of unlimited in the previous version). Requests that try to return more than the limit will fail with an exception.
So when we execute the query with aggregations, we get this exception:
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
  {
    "shard" : 0,
    "index" : "xxxxxxx",
    "node" : "xxxxxxxxxxxxxxxx",
    "reason" : {
      "type" : "too_many_buckets_exception",
      "reason" : "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets" : 10000
    }
  }
We don't have time to change the query, so how can we configure this parameter on Elastic Cloud?
Or can we add a parameter to the query?
Thanks for your help.
I found the answer on the Elastic discussion forum:
https://discuss.elastic.co/t/increasing-max-buckets-for-specific-visualizations/187390
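In short, the linked thread boils down to raising the limit through the cluster settings API, which Elastic Cloud also accepts. A sketch of the request body for PUT /_cluster/settings (20000 is an arbitrary example value, not a recommendation):

```python
import json

# Sketch: body for PUT /_cluster/settings to raise the bucket limit.
# 20000 is an arbitrary example value chosen for illustration.
body = {"persistent": {"search.max_buckets": 20000}}
print(json.dumps(body, indent=2))
```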

Is it possible to organize data between elasticsearch shards based on stored data?

I want to build a data store with three nodes. The first should keep all data, the second the data of the last month, the third the data of the last week. Is it possible to configure Elasticsearch shards to relocate themselves between nodes automatically so that this behaviour is achieved?
If you want to move existing documents from one node to another, you can use _cluster/reroute.
But using this with automatic allocation enabled can be dangerous: just after you move an index to the target node, the cluster will try to rebalance itself again.
Alternatively, you can disable automatic allocation; in that case only your custom allocations apply, which can be really risky to manage for a large data set.
POST /_cluster/reroute
{
  "commands" : [
    {
      "move" : {
        "index" : "test", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    },
    {
      "allocate_replica" : {
        "index" : "test", "shard" : 1,
        "node" : "node3"
      }
    }
  ]
}
Source: Elasticsearch rerouting
Also, you should read this: Customize document routing
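A technique not mentioned above that fits the "whole index pinned to a particular node" layout is index-level shard allocation filtering, which the cluster then enforces automatically. A sketch of the settings body (index and node names here are made up for illustration):

```python
import json

# Sketch: pin an index's shards to a specific node via allocation filtering.
# Send with: PUT /logs-last-week/_settings  (index and node names are hypothetical)
body = {"index.routing.allocation.require._name": "node3"}
print(json.dumps(body, indent=2))
```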

How to get creation time of indices in elastic search using Jest

I am trying to delete indexes from Elasticsearch that were created more than 24 hours ago. I cannot find a way to get the creation time of the indices on a particular node. With Spring Boot Elasticsearch this can be accomplished; however, I am using the Jest API.
You can read the settings.index.creation_date value that was stored at index creation time.
With curl you can get it easily using:
curl -XGET localhost:9200/your_index/_settings
You get:
{
  "your_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1460663685415", <--- this is what you're looking for
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1040599"
        },
        "uuid" : "dIG5GYsMTueOwONu4RGSQw"
      }
    }
  }
}
With Jest, you can get the same value using:
import io.searchbox.indices.settings.GetSettings;
GetSettings getSettings = new GetSettings.Builder().build();
JestResult result = client.execute(getSettings);
You can then inspect the JestResult to find the creation_date value.
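Since creation_date is epoch milliseconds, the "created more than 24 hours ago" check is simple arithmetic once you have the value; a Java version using Jest would do the same after parsing the JestResult. A Python sketch of the check:

```python
from datetime import datetime, timedelta, timezone

def older_than_24h(creation_date_millis: str, now: datetime) -> bool:
    """True if the index was created more than 24 hours before `now`.
    creation_date_millis is the string Elasticsearch stores in the settings."""
    created = datetime.fromtimestamp(int(creation_date_millis) / 1000, tz=timezone.utc)
    return now - created > timedelta(hours=24)

# Example with the creation_date shown in the settings output above
# (1460663685415 ms is 2016-04-14 UTC):
now = datetime(2016, 4, 20, tzinfo=timezone.utc)
print(older_than_24h("1460663685415", now))  # True
```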
If I may suggest something, Curator would be a much handier tool for achieving what you need.
Simply run this once a day:
curator delete indices --older-than 1 --time-unit days

Elasticsearch querying alias with routing giving partial results

In an effort to create a multi-tenant architecture for my project,
I've created an Elasticsearch cluster with an index 'tenant':
"tenant" : {
  "some_type" : {
    "_routing" : {
      "required" : true,
      "path" : "tenantId"
    },
Now, I've also created some aliases:
"tenant" : {
  "aliases" : {
    "tenant_1" : {
      "index_routing" : "1",
      "search_routing" : "1"
    },
    "tenant_2" : {
      "index_routing" : "2",
      "search_routing" : "2"
    },
    "tenant_3" : {
      "index_routing" : "3",
      "search_routing" : "3"
    },
    "tenant_4" : {
      "index_routing" : "4",
      "search_routing" : "4"
    }
I've added some data with tenantId = 2.
After all that, I tried to query 'tenant_2', but I only got partial results, while querying the 'tenant' index directly returns the full results.
Why is that?
I was sure that routing is supposed to query all the shards that documents with tenantId = 2 reside on.
When you have created aliases in Elasticsearch, you have to do all operations through the aliases only, be it indexing, updates, or search.
Try reindexing the data again and check, if possible (if it is a test index, I hope it is).
Remove all the indices:
curl -XDELETE 'localhost:9200/_all' # Warning!! Don't use this in production.
Use this command only if it is a test index.
Create the index again and create the aliases again. Do all indexing, search, and delete operations with the alias name. Even the import of data should be done via the alias name.
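To see why indexing without the alias gives partial results: Elasticsearch picks a document's shard as hash(routing) % number_of_shards, where routing defaults to the document _id. Documents indexed directly into 'tenant' are therefore spread over all shards by _id, while a search through tenant_2 only visits the single shard that hash("2") maps to. An illustrative sketch (crc32 stands in for Elasticsearch's actual murmur3 hash, and 5 shards is just the old default):

```python
from zlib import crc32

NUM_SHARDS = 5  # illustrative; the default in older Elasticsearch versions

def shard_for(routing: str) -> int:
    """Illustrative shard selection: hash(routing) % num_shards.
    Elasticsearch really uses murmur3; crc32 is a stand-in here."""
    return crc32(routing.encode()) % NUM_SHARDS

# Indexed without explicit routing: spread across shards by _id...
doc_shards = {shard_for(doc_id) for doc_id in ("doc1", "doc2", "doc3", "doc4")}
# ...but a search with search_routing "2" only visits one shard:
search_shard = shard_for("2")
print(sorted(doc_shards), search_shard)
```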
