Elasticsearch reindexing to a different index yields a different number of documents

I am trying to re-index an existing index into another index, e.g. index all documents from index A into index B, where index B has a new mapping. But when I look at the number of documents in the two indexes, the counts are very different: index B ends up with approximately 19,000 fewer documents than index A. What could be the reason for this? Here is the re-indexing request:
POST /_reindex
{
  "source": {
    "index": "A"
  },
  "dest": {
    "index": "B"
  }
}
EDIT: I needed to remove a type from an existing index and add some new types to that index. Below are the steps I performed to remove the existing type from index A and add the new types:
1. Remove all data of the old type from index A
2. Load the new data into index A
3. Create index B and update its mapping (the latest mapping for the new data)
4. Reindex from the original index A to the new index B
5. Remove the original index A
6. Recreate index A with the updated mapping
7. Reindex from index B back to index A
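One way to investigate such a mismatch is to refresh both indexes, compare their exact counts, and run the reindex as a task so its response can be inspected for failures and version conflicts. A minimal diagnostic sketch, reusing the index names A and B from above; the task id in the last request is a placeholder:
# Refresh so that _count reflects every indexed document
POST /A/_refresh
POST /B/_refresh

# Compare exact document counts
GET /A/_count
GET /B/_count

# Run the reindex as a background task...
POST /_reindex?wait_for_completion=false
{
  "source": { "index": "A" },
  "dest": { "index": "B" }
}

# ...then inspect the task response for "failures" and version conflicts
GET /_tasks/<task_id>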

Related

Does a non-indexed field update trigger reindexing in Elasticsearch 8?

My index mapping is the following:
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "query_str": {"type": "text", "index": false},
      "search_results": {
        "type": "object",
        "enabled": false
      },
      "query_embedding": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}
The field search_results is disabled. Actual search is performed only via query_embedding; the other fields are just non-searchable data.
If I update the search_results field in an existing document, will it trigger reindexing?
The docs say that "The enabled setting, which can be applied only to the top-level mapping definition and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way." So it seems logical not to re-index docs if changes took place only in the non-indexed part, but I'm not sure.
Elasticsearch documents (Lucene segments) are immutable, so every change you make to a document deletes the document and creates a new one. This is Lucene's behavior:
Lucene's index is composed of segments, each of which contains a subset of all the documents in the index, and is a complete searchable index in itself, over that subset. As documents are written to the index, new segments are created and flushed to directory storage. Segments are immutable; updates and deletions may only create new segments and do not modify existing ones. Over time, the writer merges groups of smaller segments into single larger ones in order to maintain an index that is efficient to search, and to reclaim dead space left behind by deleted (and updated) documents.
When you set enabled: false you just avoid having the field's content in the searchable structures, but the data still lives in Lucene.
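You can observe this directly: the sketch below updates only the disabled field, and the _version in the response still increments, showing that the document was rewritten. The index name my-index and document id 1 are assumptions for illustration:
# Partially update only the disabled (non-indexed) field
POST /my-index/_update/1
{
  "doc": {
    "search_results": { "hits": [] }
  }
}
# The response's "_version" increases: the old Lucene document is marked
# deleted and a new one is written, even though search_results is never
# parsed into the searchable structures.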
You can see a similar answer here:
Partial update on field that is not indexed

Reindexing more than 10k documents in Elasticsearch

Let's say I have an index A. It contains 26k documents. Now I want to change the mapping type of a field named status to keyword. As I can't change the type of A's existing status field, I will create a new index B with my desired mapping.
I followed the Reindex API:
POST _reindex
{
  "source": {
    "index": "A",
    "size": 10000
  },
  "dest": {
    "index": "B",
    "version_type": "external"
  }
}
But the problem is that only 10k docs get migrated this way. How do I copy the rest, so that all documents are copied without losing any?
Delete the "size": 10000 and the problem will be solved.
By the way, the size field inside source in the Reindex API means the batch size Elasticsearch should use to fetch and reindex docs on each pass; the default batch size is 1000. (You thought it means how many documents you want to reindex in total; that is what the top-level size parameter, later renamed max_docs, does.)
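For reference, the corrected request with the size line removed (same index names as above), which copies every document:
POST _reindex
{
  "source": {
    "index": "A"
  },
  "dest": {
    "index": "B",
    "version_type": "external"
  }
}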

Elasticsearch copy index mappings

We have an Elasticsearch cluster consisting of 6 nodes running version 6, and we have an index called bishkek in the cluster. Now I want to copy only the index mappings (no data) to a new index bishkek_v2.
Elasticsearch doesn't have an API that copies only mappings, so you would need to first get the mapping of the bishkek index and create the new index based on it. To get the mapping you can run this GET request:
GET /bishkek/_mapping
After getting the mapping, you create your new index:
PUT /bishkek_v2
{
  "mappings": {
    [mapping you get from your old index]
  }
}
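For example, if the mapping returned for bishkek contained a single keyword field (a hypothetical mapping, purely for illustration), the create request would look like this:
# Hypothetical: suppose GET /bishkek/_mapping returned
# { "bishkek": { "mappings": { "properties": { "city": { "type": "keyword" } } } } }
PUT /bishkek_v2
{
  "mappings": {
    "properties": {
      "city": { "type": "keyword" }
    }
  }
}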
The Clone Index API can also help you clone the index (note that cloning copies the data as well, not just the mappings):
POST /my_source_index/_clone/my_target_index

Moving data from one Elasticsearch index to another with a higher number of shards, or increasing the shard count of an existing index

I am new to Elasticsearch and I have been reading the documentation to find a way of increasing the number of shards my index consists of. Currently my index looks like this:
country_data 0 p STARTED 227 100.7kb 192.168.0.115 $HOSTNAME
country_data 0 r STARTED 227 100.7kb 192.168.0.116 $HOSTNAME
I wanted to increase the number of shards to 5, however I was unable to find a proper way of doing it. I learnt from another Stack Overflow question that I should be able to do it like this:
POST _reindex?slices=5
{
  "source": {
    "index": "country_data"
  },
  "dest": {
    "index": "country_data_new"
  }
}
However, when I did that I got a copy of country_data with the same number of shards and replicas (1 and 1). I tried to learn more about it in the documentation, but all I found was this: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/option_slices.html
I couldn't find anything in the documentation about increasing the number of shards of an existing index, or about how to move data to a new index that has more shards. I would be grateful for any insights into this problem, or at least a pointer to where I could learn how to do it.
This can be done in either of the ways mentioned below. (The slices parameter in your attempt only parallelizes the reindex; it does not set the shard count, and because country_data_new was auto-created by the reindex it got the default settings.)
1st Option: Use the Elasticsearch Split Index API; a sketch follows. I suggest you go through its documentation before proceeding with this method.
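A minimal sketch of the split approach, using the index names from the question. Note that the source index must be made read-only before splitting, and that with a single-shard source any target shard count is allowed:
# Make the source index read-only (required by _split)
PUT /country_data/_settings
{
  "settings": { "index.blocks.write": true }
}

# Split into a new index with 5 primary shards
POST /country_data/_split/country_data_new
{
  "settings": { "index.number_of_shards": 5 }
}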
2nd Option: Create a new index with the same mappings and the required shard count in its settings, then use the Reindex API to copy data from the source index to the destination index.
To create the new index:
PUT /<NEW_INDEX_NAME>
{
  "settings": {
    "number_of_shards": <REQUIRED_NUMBER_OF_SHARDS>
  },
  "mappings": {<MAPPINGS_OF_SOURCE_INDEX>}
}
If you don't give the number of shards in the settings while creating an index, it is created with one primary and one replica shard by default.
To reindex from the source to the newly created index:
POST _reindex
{
  "source": {
    "index": "<SOURCE_INDEX_NAME>"
  },
  "dest": {
    "index": "<NEW_INDEX_NAME>"
  }
}
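Applied to the question's index, the second option would look like the sketch below; the mappings line stands for whatever GET /country_data/_mapping returns:
# Create the target with 5 primary shards first
# (plus the "mappings" copied from GET /country_data/_mapping)
PUT /country_data_new
{
  "settings": {
    "number_of_shards": 5
  }
}

# The reindex now copies data into an index that already has 5 shards
POST _reindex?slices=5
{
  "source": { "index": "country_data" },
  "dest": { "index": "country_data_new" }
}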

How to update a document using index alias

I have created an index "index-000001" with primary shards = 5 and replicas = 1, and I have created two aliases:
alias-read -> index-000001
alias-write -> index-000001
for indexing and searching purposes. When alias-write reaches its maximum capacity and I do a rollover on it, a new index "index-000002" is created and the aliases are updated as:
alias-read -> index-000001 and index-000002
alias-write -> index-000002
How do I update/delete a document that lives in index-000001 (what if all I know is the document id, but not which index the document resides in)?
Thanks
Updating through an index alias that points at multiple indices is not directly possible. The best solution is to find the required index with a search query (by document id or by a term), and then update the document directly in that index.
GET alias-read/{type}/{doc_id} will get the required document if doc_id is known.
If doc_id is not known, find the document using a unique id reference:
GET alias-read/_search
{
  "query": {
    "term": { "field": "value" }
  }
}
In both cases, you will get a single document in the response.
Once the document is obtained, you can read its "_index" field to get the index it lives in, and then run
PUT {index_name}/{type}/{id}
{
  "required_field": "new_value"
}
to update the document. (Note that PUT replaces the whole document; use the _update endpoint for a partial update.)
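End to end, that looks like the sketch below; the term field user_id, its value, and the document id are assumptions for illustration:
# 1. Find which backing index holds the document
GET alias-read/_search
{
  "query": {
    "term": { "user_id": "42" }
  }
}
# -> the hit contains "_index": "index-000001" and "_id": "1"

# 2. Update the document in that concrete index
#    (_update applies a partial change instead of replacing the whole doc)
POST index-000001/_update/1
{
  "doc": { "required_field": "new_value" }
}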
