I have daily indices that are filled by an agent which collects logs every second of the day, and I am reindexing them (by field) to new indices using the _reindex API.
How can I tell the _reindex API to keep reindexing while the source index is still receiving new documents?
Any help would be really appreciated!
Thank you
You cannot force the reindex API to stay online and pick up newly received documents.
But there is a workaround: add a date field (index_time) to your source index, then write an hourly cron job that runs the reindex API with a query selecting the documents indexed during the last hour via index_time.
POST _reindex
{
  "source": {
    "index": "my-index-000001",
    "query": {
      "range": {
        "index_time": { "gte": "now-1h" }
      }
    }
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
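A minimal sketch of the matching cron entry, assuming Elasticsearch is reachable on localhost:9200 and the request body above is saved as /etc/elasticsearch/hourly_reindex.json (both the host and the file path are just placeholders):
# Run at minute 0 of every hour and fire the reindex request above
0 * * * * curl -s -XPOST 'http://localhost:9200/_reindex' -H 'Content-Type: application/json' -d @/etc/elasticsearch/hourly_reindex.json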
I am trying to reindex an index of 200M documents from cluster A to cluster B. I used the Reindex API with a remote source and everything worked fine. While my reindex was running, some documents were added to cluster A, so I want to add them to cluster B as well.
I launched the reindex request again, but it seems that the reindex process is taking a long time, as if it were reindexing everything again.
My question is: is the cluster reindexing all the documents from scratch, even if they didn't change?
My Elasticsearch version is 5.6.
Elasticsearch does not know whether a document has changed or not, so it simply copies every document to the destination again. If you have a field like insert_time in your data, you can use reindex with a query to limit which part of index A gets reindexed into B. This lets you keep the result of your earlier reindex and finish much faster. A reindex by query would look something like this:
POST _reindex
{
  "source": {
    "index": "A",
    "query": {
      "range": {
        "insert_time": {
          "gt": "time you want"
        }
      }
    }
  },
  "dest": {
    "index": "B"
  }
}
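Since the question is about copying between clusters, the same range query can be combined with reindex-from-remote, run against cluster B. A sketch, assuming cluster A is reachable at http://cluster-a:9200 (a placeholder host) and that this host is whitelisted via reindex.remote.whitelist on cluster B:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://cluster-a:9200"
    },
    "index": "A",
    "query": {
      "range": {
        "insert_time": {
          "gt": "time you want"
        }
      }
    }
  },
  "dest": {
    "index": "B"
  }
}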
My company is using Elasticsearch 2.3.4.
We have a cluster that contains 38 ES nodes, and we've been having a problem with reindexing some of our data lately...
We've reindexed very large indices before without problems, but recently, when trying to reindex much smaller indices (less than 10 GB), we get: "SearchContextMissingException [No search context found for id [XXX]]".
We have no idea what's causing this problem or how to fix it. We'd like some guidance.
Has anyone seen this exception before?
From GitHub comments on issues related to this, I think this can be avoided by changing the batch size.
From the documentation:
By default _reindex uses scroll batches of 1000. You can change the batch size with the size field in the source element:
POST _reindex
{
  "source": {
    "index": "source",
    "size": 100
  },
  "dest": {
    "index": "dest",
    "routing": "=cat"
  }
}
I had the same problem with an index that holds many huge documents. I had to reduce the batch size down to 10 (neither 100 nor 50 worked).
This was the request that worked in the end:
POST _reindex?slices=5&refresh
{
  "source": {
    "index": "source_index",
    "size": 10
  },
  "dest": {
    "index": "dest_index"
  }
}
You should also set slices to the number of primary shards in your index.
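To see how many primary shards the index has (and therefore what to pass as slices), a quick settings lookup is enough; a sketch assuming the source_index name from above (on recent versions you can also pass slices=auto and let Elasticsearch pick the value):
GET source_index/_settings?filter_path=*.settings.index.number_of_shards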
What are offline and online indexing in Elasticsearch? I did my research but couldn't find enough resources explaining what these terms mean. Any idea? Also, when do we need to reindex? Any examples would be great.
The terms offline and online indexing are used here.
https://spark-summit.org/2014/wp-content/uploads/2014/07/Streamlining-Search-Indexing-using-Elastic-Search-and-Spark-Holden-Karau.pdf
Reindexing
The most basic form of reindexing just copies one index to another.
I have used this form of reindexing to change a mapping.
Elasticsearch doesn't allow you to change the mapping of an existing field, so if you want to change a mapping you have to create a new index (index2) with the new mapping and then reindex. The reindex will fill that new index, with its new mapping, with the data of the old index.
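For example, index2 might first be created with the changed mapping. A sketch with a made-up date field, using Elasticsearch 7+ syntax (older versions nest the properties under a mapping type name):
curl -XPUT 'localhost:9200/index2?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "date": { "type": "date" }
    }
  }
}'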
The command below will move everything from index to index2.
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "index"
  },
  "dest": {
    "index": "index2"
  }
}'
You can also use reindexing to fill a new index with a part of the old one. You can do so by using a couple of parameters. The example below will copy the newest 1000 documents.
POST /_reindex
{
  "size": 1000,
  "source": {
    "index": "index",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "index2"
  }
}
For more examples about reindexing please have a look at the official documentation.
offline vs online indexing
In ONLINE mode the new index is built while the old index remains accessible for reads and writes; any update on the old index also gets applied to the new index.
In OFFLINE mode the table is locked up front for any read or write, and then the new index gets built from the old index. No read or write operation is permitted on the table while the index is being rebuilt. Only when the operation is done is the lock on the table released and reads and writes are allowed again.
Let's say I have movie data in my Elasticsearch and I created the documents like this:
curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -H 'Content-Type: application/json' -d'
{
  "title": "The Godfather",
  "director": "Francis Ford Coppola",
  "year": 1972
}'
And I have a bunch of movies from different years. I want to take all the movies from a particular year (say, 1972) and copy them to a new index, "70sMovies", but I couldn't see how to do that.
Since Elasticsearch 2.3 you can use the built-in _reindex API.
For example:
POST /_reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
Or reindex only a specific part by adding a filter/query:
POST /_reindex
{
  "source": {
    "index": "twitter",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}
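Applied to the question, the same pattern copies only the 1972 movies. Note that index names must be lowercase, so something like 70smovies rather than 70sMovies:
POST /_reindex
{
  "source": {
    "index": "movies",
    "query": {
      "term": {
        "year": 1972
      }
    }
  },
  "dest": {
    "index": "70smovies"
  }
}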
Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
The best approach would be to use the elasticsearch-dump tool: https://github.com/taskrabbit/elasticsearch-dump.
A real-world example I used:
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=mapping
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=data
Check out knapsack:
https://github.com/jprante/elasticsearch-knapsack
Once you have the plugin installed and working, you could export part of your index via query. For example:
curl -XPOST 'localhost:9200/test/test/_export' -d '{
  "query" : {
    "match" : {
      "myfield" : "myvalue"
    }
  },
  "fields" : [ "_parent", "_source" ]
}'
This will create a tarball with only your query results, which you can then import into another index.
To reindex a specific type from a source index into a destination index type, the syntax is:
POST _reindex/
{
  "source": {
    "index": "source_index",
    "type": "source_type",
    "query": {
      // add filter criteria
    }
  },
  "dest": {
    "index": "dest_index",
    "type": "dest_type"
  }
}
If the intent is to copy the entire data to an index with the same settings/mappings as the original index, one can use the clone API. Something like below:
POST /<index>/_clone/<target-index>
OR
PUT /<index>/_clone/<target-index>
However, if the intent is to copy only part of the data, or to copy it to a new index with different settings/mappings than the original, one can use the reindex API. Something like below:
POST _reindex/
{
  "source": {
    "index": "source_index",
    "type": "source_type",
    "query": {
      // add filter criteria
    }
  },
  "dest": {
    "index": "dest_index",
    "type": "dest_type"
  }
}
*Note: the reindex API does not copy the settings/mappings of the source index, so the destination index should be created with the desired settings/mappings before the actual API call.
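A sketch of that preparatory step with placeholder settings and a made-up field, using Elasticsearch 7+ syntax (older versions with mapping types nest the properties under the type name):
PUT dest_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "some_field": { "type": "keyword" }
    }
  }
}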
For further reading on the difference between clone and reindex, refer to What's the difference between cloning and reindexing an index in Elasticsearch?
You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps. In the following example I copy the index "thor" to "thor2"
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data
Well, the straightforward way to do this is to write code, with the API of your choice, querying for "year": 1972 and then indexing that data into a new index. You would use the Search API or the Scan and Scroll API to get all the documents and then either index them one by one or use the Bulk API (a rough sketch follows the links below):
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
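A rough sketch of the scroll part with plain REST calls, assuming a reasonably recent Elasticsearch and the movies index from the question; your code would then feed the hits of each batch to the Bulk API against the new index:
# Open a scroll over the 1972 movies, keeping the search context alive for one minute
curl -XPOST 'localhost:9200/movies/_search?scroll=1m&pretty' -H 'Content-Type: application/json' -d'
{
  "size": 500,
  "query": { "term": { "year": 1972 } }
}'
# Page through the remaining hits with the _scroll_id returned by the previous call
curl -XPOST 'localhost:9200/_search/scroll?pretty' -H 'Content-Type: application/json' -d'
{
  "scroll": "1m",
  "scroll_id": "<scroll_id from the previous response>"
}'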
Assuming you don't want to do this via code but are looking for a direct way of doing it, I suggest Elasticsearch Snapshot and Restore. Basically you would take a snapshot of your existing index, restore it into a new index, and then use the Delete By Query API to delete all documents with a year other than 1972.
Snapshot And Restore
The snapshot and restore module allows to create snapshots of
individual indices or an entire cluster into a remote repository. At
the time of the initial release only shared file system repository was
supported, but now a range of backends are available via officially
supported repository plugins.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
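A condensed sketch of that flow; the repository name, filesystem path and snapshot name are made up (an fs repository also requires path.repo to be set in elasticsearch.yml), and the restore renames the index so the original stays untouched:
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}

PUT _snapshot/my_backup/movies_snapshot?wait_for_completion=true
{
  "indices": "movies"
}

POST _snapshot/my_backup/movies_snapshot/_restore
{
  "indices": "movies",
  "rename_pattern": "movies",
  "rename_replacement": "70smovies"
}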
Delete By Query API
The delete by query API allows to delete documents from one or more
indices and one or more types based on a query. The query can either
be provided using a simple query string as a parameter, or using the
Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
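For the last step, the cleanup on the restored copy could look roughly like this; a sketch using the current _delete_by_query endpoint (older versions expose delete-by-query differently) and assuming the restored index is named 70smovies:
POST /70smovies/_delete_by_query
{
  "query": {
    "bool": {
      "must_not": {
        "term": { "year": 1972 }
      }
    }
  }
}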
Since v7.4 the _clone API is available and can easily satisfy your need (read the docs for the relevant prerequisites and the monitoring involved):
POST /<index>/_clone/<target-index>
Or:
PUT /<index>/_clone/<target-index>
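One of those prerequisites is that the source index must be blocked for writes before it can be cloned, for example (using the movies index from the question):
PUT /movies/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}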
You can use elasticdump --searchBody:
# Copy documents from movies to 70sMovies (filtering using query)
elasticdump \
--input=http://localhost:9200/movies \
--output=http://localhost:9200/70sMovies \
--type=data \
--searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}" # <--- Your query here
More on elasticdump options here.