I have a question about Elasticsearch creating 5 shards per index by default. For some reason this is not the case for me. Is this an error on my side (even though I didn't make any changes to the custom template), or is it no longer the case that each index gets 5 shards by default? I couldn't find anything about it in the documentation or online. I know I can change this by running:
PUT _template/default
{
"index_patterns": ["*"],
"order": -1,
"settings": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
}
However, that is not my point. I just want to learn how Elasticsearch currently behaves.
Thanks for all answers!
From version 7.x onward, the default number of primary shards in each index is 1, as mentioned here in the documentation.
Before version 7.x, the default number of primary shards for each index was 5.
You can refer to the breaking changes for Elasticsearch 7.0.0 here:
Index creation no longer defaults to five shards
Previous versions of Elasticsearch defaulted to creating five shards per index. Starting with 7.0.0, the default is now one shard per index.
I've been studying the ELK Stack and ran into a small problem.
I have been reading all the documentation I can find, and it places great emphasis on the importance of shards and replicas.
But nowhere does it say how to configure the number of each. Some sites say it is better to leave them on automatic, and others explain how to configure them in version 5.8, but that no longer works.
If someone could explain this to me, I would be very grateful.
Just a small add-on to @Val's answer, related to primary shards.
You can't change the number of primary shards because that would change how data is split between them. Documents are routed to shards by hashing, a very popular technique for scaling horizontally by splitting data, and changing the shard count would break that hash-based routing.
Replica shards are just copies, so you can increase or decrease their number without any impact on the routing.
If you want to change the number of primary shards, you have to create a new index and use the alias API and the Reindex API to do it efficiently.
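To make the routing argument concrete, here is a minimal Python sketch. It assumes a simplified rule of the form shard = hash(document ID) % number of primary shards, and it uses zlib.crc32 purely as a stand-in hash; the helper names shard_for and moved_fraction are hypothetical and not part of any Elasticsearch API.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard by hashing the routing value (here, the document ID)."""
    # zlib.crc32 is only a stand-in for illustration, not Elasticsearch's hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose target shard changes with the shard count."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


doc_ids = [f"doc-{i}" for i in range(10_000)]
print(f"moved when going from 5 to 6 shards: {moved_fraction(doc_ids, 5, 6):.0%}")
```

Under this simplified rule most documents would be looked up on the wrong shard after the count changes, which is why the data has to be reindexed into a new index instead.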
When you create an index, you can configure both values in the settings of that index:
PUT your-index
{
"settings": {
"index.number_of_shards": 3,
"index.number_of_replicas": 1
}
}
Also note that you can update the settings of an index after its creation, but you can only update the number of replicas and not the number of primary shards:
PUT your-index/_settings
{
"settings": {
"index.number_of_replicas": 2
}
}
As simple as that!
I have a newly set up Elasticsearch 7.5.2 cluster. When I create an index, only one shard is created for it by default.
My cluster strategy is as below:
Total Nodes: 5
--------------
Node 1 & Node 2 - Master Only
Node 3 - Master & Data Node
Node 4 & Node 5 - Data Only
I could not find any cluster setting that restricts the number of shards at index creation.
Is the issue with my cluster strategy, or am I missing a setting here?
Please help me find the issue.
Earlier versions of Elasticsearch defaulted to 5 primary shards per index. This changed in Elasticsearch 7.x, which you are using, so you are seeing just 1 primary shard.
Elasticsearch link for this change and more info on this SO answer.
Apart from the per-index API that @Kamal already mentioned, you can also set a default that applies to every new index until you override it in the create-index call.
Before Elasticsearch 5.0, this could go in your elasticsearch.yml:
index.number_of_shards: {your desired number of shards}
From 5.0 onward, index-level settings in elasticsearch.yml are rejected at startup, so on 7.x define the default in an index template instead.
Note: The number of primary shards can't be changed dynamically after the index is created, so choose it with care. The number of replicas, by contrast, can be changed at any time.
That is correct. Since version 7, Elasticsearch creates each index with 1 primary shard by default, as mentioned here.
You can always specify the number of shards with the settings below when creating the index.
PUT <your_index_name>
{
"settings" : {
"index" : {
"number_of_shards" : 5
}
}
}
Hope this helps!
I am new to Elasticsearch and have installed it on my local machine. When I add a document, the response shows a shards total of 2, but when I list all indices through the _cat API, primary and replica both show as 1. By default there should be 5 shards, but I only see 1. Every index gets only 1 shard by default, and I haven't changed any configuration.
So starting with Elasticsearch version 7.0 the default number of shards was reduced from 5 to 1.
You can see the difference by comparing the version 6.8 and version 7.0 of the documentation.
If you still want to have 5 shards created for your index, you have to create it like the following:
PUT my_index
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1
}
}
The shards total after indexing a document is two because the index also has one replica. Since you are running a single-node cluster, the replica shard cannot be allocated (a replica is never placed on the same node as its primary), and therefore only one shard copy succeeded.
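The arithmetic behind those numbers can be sketched in a few lines of Python. This is a simplified model for a healthy cluster, not Elasticsearch code: the function name write_shard_counts is hypothetical, and it assumes every copy that can be allocated also succeeds.

```python
def write_shard_counts(number_of_replicas: int, data_nodes: int) -> dict:
    """Estimate the _shards section of an index response for one document."""
    # A document is written to one primary shard plus each of its replicas.
    total = 1 + number_of_replicas
    # Copies of the same shard never share a node, so the number of data
    # nodes caps how many copies can exist (and therefore succeed).
    successful = min(total, data_nodes)
    return {"total": total, "successful": successful, "failed": 0}


# Single-node cluster with the default of one replica.
print(write_shard_counts(number_of_replicas=1, data_nodes=1))
# → {'total': 2, 'successful': 1, 'failed': 0}
```

With a second data node, the same call with data_nodes=2 would report both copies as successful.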
I am very new to Elasticsearch. What is the settings block in an index? Is it optional? What happens if we don't include it, and what happens if we don't include the shard count in settings?
If you're new to Elasticsearch, it's important to understand its basic terminology first.
cluster – An Elasticsearch cluster consists of one or more nodes and is identifiable by its cluster name.
node – A single Elasticsearch instance. In most environments, each node runs on a separate box or virtual machine.
index – In Elasticsearch, an index is a collection of documents, comparable to a database in MySQL.
shard – Because Elasticsearch is a distributed search engine, an index is usually split into elements known as shards that are distributed across multiple nodes. Elasticsearch automatically manages the arrangement of these shards. It also rebalances the shards as necessary, so users need not worry about the details.
replica – A copy of a primary shard. Before version 7.0, Elasticsearch created five primary shards and one replica for each index by default, so each index consisted of five primary shards, each with one copy. Since 7.0 the default is one primary shard and one replica.
Settings define the overall layout of an index, and they differ based on the requirements of the application.
They include the number of shards, the number of replicas, etc. The settings block is optional: if you omit it, or omit the shard count, the defaults apply. Setting these values lets you size the index to the needs of the application, as below:
{
"settings" : {
"index" : {
"number_of_shards" : 3,
"number_of_replicas" : 2
}
}
}
For further clarification, see the official Elastic documentation, which is very well written, here.
Setting in ElasticSearch
We've discovered some duplicate documents in one of our Elasticsearch indices and we haven't been able to work out the cause. There are two copies of each of the affected documents, and they have exactly the same _id, _type and _uid fields.
A GET request to /index-name/document-type/document-id just returns one copy, but searching for the document with a query like this returns two results, which is quite surprising:
POST /index-name/document-type/_search
{
"filter": {
"term": {
"_id": "document-id"
}
}
}
Aggregating on the _uid field also identifies the duplicate documents:
POST /index-name/_search
{
"size": 0,
"aggs": {
"duplicates": {
"terms": {
"field": "_uid",
"min_doc_count": 2
}
}
}
}
The duplicates are all on different shards. For example, a document might have one copy on primary shard 0 and one copy on primary shard 1. We've verified this by running the aggregate query above on each shard in turn using the preference parameter: it does not find any duplicates within a single shard.
Our best guess is that something has gone wrong with the routing, but we don't understand how the copies could have been routed to different shards. According to the routing documentation, the default routing is based on the document ID, and should consistently route a document to the same shard.
We are not using custom routing parameters that would override the default routing. We've double-checked this by making sure that the duplicate documents don't have a _routing field.
We also don't define any parent/child relationships which would also affect routing. (See this question in the Elasticsearch forum, for example, which has the same symptoms as our problem. We don't think the cause is the same because we're not setting any document parents).
We fixed the immediate problem by reindexing into a new index, which squashed the duplicate documents. We still have the old index around for debugging.
We haven't found a way of replicating the problem. The new index is indexing documents correctly, and we've tried rerunning an overnight processing job which also updates documents but it hasn't created any more duplicates.
The cluster has 3 nodes, 3 primary shards and 1 replica (i.e. 3 replica shards). minimum_master_nodes is set to 2, which should prevent the split-brain issue. We're running Elasticsearch 2.4 (which we know is old - we're planning to upgrade soon).
Does anyone know what might cause these duplicates? Do you have any suggestions for ways to debug it?
We found the answer! The problem was that the index had unexpectedly switched the hashing algorithm it used for routing, and this caused some updated documents to be stored on different shards to their original versions.
A GET request to /index-name/_settings revealed this:
"version": {
"created": "1070599",
"upgraded": "2040699"
},
"legacy": {
"routing": {
"use_type": "false",
"hash": {
"type": "org.elasticsearch.cluster.routing.DjbHashFunction"
}
}
}
"1070599" refers to Elasticsearch 1.7, and "2040699" is ES 2.4.
It looks like the index tried to upgrade itself from 1.7 to 2.4, despite the fact that it was already running 2.4. This is the issue described here: https://github.com/elastic/elasticsearch/issues/18459#issuecomment-220313383
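As an aside, the two version values quoted above can be decoded with a short Python helper. This is a sketch that assumes the layout major * 1,000,000 + minor * 10,000 + revision * 100 + build, which both values appear to follow; decode_version_id is a hypothetical name, not an Elasticsearch function.

```python
def decode_version_id(version_id: str) -> str:
    """Turn an index version ID such as "2040699" into "major.minor.revision"."""
    n = int(version_id)
    # Assumed layout: MMmmrrbb (major, minor, revision, build), two digits each
    # after the major version.
    major, rest = divmod(n, 1_000_000)
    minor, rest = divmod(rest, 10_000)
    revision, _build = divmod(rest, 100)
    return f"{major}.{minor}.{revision}"


print(decode_version_id("1070599"))  # → 1.7.5
print(decode_version_id("2040699"))  # → 2.4.6
```

A check like this makes it easy to spot an index whose created version is older than the cluster it was supposedly created on.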
We think this is what happened to trigger the change:
Back when we upgraded the index from ES 1.7 to 2.4, we decided not to upgrade Elasticsearch in-place, since that would cause downtime. Instead, we created a separate ES 2.4 cluster.
We loaded data into the new cluster using a tool that copied over all the index settings as well as the data, including the version setting which you should not set in ES 2.4.
While dealing with a recent issue, we happened to close and reopen the index. This normally preserves all the data, but because of the incorrect version setting, it caused Elasticsearch to think that an upgrade was in progress.
ES automatically set the legacy.routing.hash.type setting because of the false upgrade. This meant that any data indexed after this point used the old DjbHashFunction instead of the default Murmur3HashFunction which had been used to route the data originally.
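The effect of that switch can be reproduced in miniature with the Python sketch below. It is an illustration under assumptions, not the Elasticsearch implementation: djb2_hash is a plain 32-bit DJB2 hash, zlib.crc32 stands in for the Murmur3 function, and routing is simplified to hash % number of shards.

```python
import zlib

NUM_SHARDS = 3  # the index in question has 3 primary shards


def djb2_hash(value: str) -> int:
    """Plain DJB2 string hash, truncated to 32 bits."""
    h = 5381
    for ch in value:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


def stand_in_hash(value: str) -> int:
    """Stand-in for the original routing hash (assumption: any other hash)."""
    return zlib.crc32(value.encode("utf-8"))


def shard_for(doc_id: str, hash_fn, num_shards: int = NUM_SHARDS) -> int:
    """Simplified routing: hash the document ID, then take it modulo the shards."""
    return hash_fn(doc_id) % num_shards


doc_ids = [f"document-{i}" for i in range(1000)]
rerouted = [
    d for d in doc_ids
    if shard_for(d, stand_in_hash) != shard_for(d, djb2_hash)
]
print(f"{len(rerouted)} of {len(doc_ids)} IDs land on a different shard after the switch")
```

Any ID in rerouted that is updated after the switch gets written to a second shard while the original copy stays where it was, which matches the cross-shard duplicates described in the question.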
This means that reindexing the data into a new index was the right thing to do to fix the issue. The new index has the correct version setting and no legacy hash function settings:
"version": {
"created": "2040699"
}