Elasticsearch reindex api deleting document after copy - elasticsearch

I've gone through the _reindex api documentation a few times, and can't figure out if it's possible or not. Once the document is copied from the source index to the destination index, is it possible to also remove the source document?
Here is the current _reindex api call body that I'm invoking:
{
  "source": {
    "index": "srcindex",
    "type": "type",
    "query": {
      "range": {
        "date": {
          "from": <timestamp>
        }
      }
    }
  },
  "dest": {
    "index": "dstindex",
    "type": "type"
  }
}

Currently, this is not supported: _reindex cannot copy a document and then immediately delete it from the source (effectively moving the document).
There is a good discussion of this topic here.
In the end, you need to run _reindex and then _delete_by_query with the same query to achieve your goal.
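The two-step approach can be sketched in Python as follows. This only builds the two request bodies from one shared query; it does not talk to a cluster. The helper name build_move_requests and the cutoff date are made up for illustration, and the range query uses "gte" where the question uses "from".

```python
import json


def build_move_requests(src_index, dst_index, query):
    """Return the (_reindex, _delete_by_query) request bodies for one query."""
    reindex_body = {
        "source": {"index": src_index, "query": query},
        "dest": {"index": dst_index},
    }
    # _delete_by_query is called as POST /<src_index>/_delete_by_query,
    # so the index goes in the URL and only the query goes in the body.
    delete_body = {"query": query}
    return reindex_body, delete_body


# Hypothetical cutoff standing in for the <timestamp> placeholder in the question.
query = {"range": {"date": {"gte": "2020-01-01T00:00:00Z"}}}
reindex_body, delete_body = build_move_requests("srcindex", "dstindex", query)

print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
print("POST srcindex/_delete_by_query")
print(json.dumps(delete_body, indent=2))
```

Run the delete only after the reindex response reports no failures. Documents that match the query and are indexed between the two calls would be deleted without having been copied, so this fits best when the matched data is no longer being written.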
Hope this helps!

Related

How do I save the values that I applied the filter to a new index?

The picture shows only the values I want, extracted through the filter function.
I'd like to save these extracted values to a new index.
Thank you very much for your help.
GET 0503instgram_csv/_search?_source=message&filter_path=hits.hits._source
You can use the Reindex API. Create a new index with the desired mappings and settings, then copy the data from your old index into the index you just created. The source and destination can be any pre-existing index, index alias, or new index; however, the source and destination must be different. Consider the example below, which creates a new index named "new_index" with some basic mappings.
PUT /new_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "id": {
        "type": "integer"
      },
      "paid": {
        "type": "object"
      }
    }
  }
}
Finally, your Reindex API call may look like this:
POST _reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}
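The request above copies every document and every field. To keep only the filtered values from the question, the source section can also carry a _source field list and, optionally, a query. Below is a minimal Python sketch that builds such a body; the helper name build_filtered_reindex is hypothetical, and the index and field names are taken from the question's search request.

```python
import json


def build_filtered_reindex(src_index, dst_index, fields, query=None):
    """Build a _reindex body that copies only `fields` from matching documents."""
    source = {"index": src_index, "_source": list(fields)}
    if query is not None:
        # Without a query, every document in the source index is copied.
        source["query"] = query
    return {"source": source, "dest": {"index": dst_index}}


# Copy only the "message" field, as in the question's _search request.
body = build_filtered_reindex("0503instgram_csv", "new_index", ["message"])
print(json.dumps(body, indent=2))
```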

Re-Index Elasticsearch, ignore fields not in mapping

Trying to test out re-index API in elasticsearch and running into issues where existing data contains fields not present in the new index's strict mapping. Is there a way to tell elasticsearch to simply ignore those fields and carry on?
Edit: To clarify, by ignore I meant not to include those fields during the re-index process.
If you have access to the index settings before running reindex you can just do:
PUT test/_mapping
{
"dynamic": "false"
}
then change it back to strict once reindexing is done.
UPDATE based on your comment
POST _reindex
{
"source": {
"index": "src"
},
"dest": {
"index": "dst"
},
"script": {
"lang": "painless",
"source": """
ctx['_source'].remove('email');
ctx['_source'].remove('username');
ctx['_source'].remove('name');
// removing from nested:
for(item in ctx['_source'].Groups){
item.remove('GroupName');
item.remove('IsActive');
}
"""
}
}
While reindexing you can include or exclude source fields according to your destination index mapping.
To exclude some specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": {
"excludes": ["exclude_a", "exclude_b"]
}
},
"dest": {
"index": "dest-index"
}
}
To include any specific field while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": ["include_a", "include_b"]
},
"dest": {
"index": "dest-index"
}
}
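If the list of fields to drop is long, it can be derived by comparing the two mappings. The Python sketch below assumes you have already fetched the "properties" section of each index mapping (for example via GET <index>/_mapping); the sample mappings and helper names are made up, and only top-level fields are compared.

```python
import json


def fields_missing_from_dest(source_properties, dest_properties):
    """Top-level field names in the source mapping that the destination lacks."""
    return sorted(set(source_properties) - set(dest_properties))


def build_reindex_excluding(src_index, dst_index, excludes):
    """Build a _reindex body that drops the given source fields."""
    return {
        "source": {"index": src_index, "_source": {"excludes": list(excludes)}},
        "dest": {"index": dst_index},
    }


# Hypothetical "properties" sections of the two mappings.
src_props = {
    "id": {"type": "integer"},
    "name": {"type": "text"},
    "email": {"type": "keyword"},
    "username": {"type": "keyword"},
}
dst_props = {"id": {"type": "integer"}, "name": {"type": "text"}}

excludes = fields_missing_from_dest(src_props, dst_props)
body = build_reindex_excluding("source-index", "dest-index", excludes)
print(json.dumps(body, indent=2))
```

Fields that exist in documents but not in the source mapping (possible when the source index does not map fields dynamically) would not be caught by this comparison.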

Duplicate a document on elasticsearch

I need to clone the content of a document in my elasticsearch index (in the same index) by using the kibana console. I need exactly the same fields in the _source of the document (of course, the copy will have another id). I tried to:
1. GET the document
2. Create a new empty instance of the document
3. Update the new document by manually copying the properties of the result of (1):
POST /blog/post/VAv2FWoBKgnBpki61WiD/_update { "doc" : {
"content" : "..." ...
But the problem is that the fields contain very long properties. And sometimes I get an error, since the strings don't seem to be escaped when I manually copy them from the Kibana interface.
I searched the documentation but I cannot find a query to duplicate a document, and I think it is quite a common thing to do...
Any clue?
Make use of Reindex API. Here is what you can do.
Summary of steps:
1. Create a destination_index (dummy). Make sure its mapping exactly matches that of source_index.
2. Using the Reindex API, reindex that particular document from source_index to destination_index. During this operation, update the _id (I've included the script below).
3. Reindex this document back from destination_index to source_index.
Reindex Query
Step 1: Copy document from source_index to destination_index. (With the script)
POST _reindex
{
  "source": {
    "index": "source_index",
    "query": {
      "match": {
        "_id": "1"
      }
    }
  },
  "dest": {
    "index": "destination_index"
  },
  "script": {
    "source": "ctx._id = '2'",
    "lang": "painless"
  }
}
Note how I've added a script in the above query that would change the _id (_id is set as 2) of the document. Your destination_index would have all the fields with exact same values as that of source except for the _id field.
Step 2: Copy that document from destination_index to source_index
POST _reindex
{
"source": {
"index": "destination_index",
"query": {
"match": {
"_id": "2"
}
}
},
"dest": {
"index": "source_index"
}
}
Now search the source_index, it would have two documents with different _ids (_id=1 and _id=2) having exact same content.
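The two requests can also be generated for arbitrary ids. The Python sketch below only builds the request bodies and does not contact a cluster. The helper names are hypothetical, and the script uses the "source" key with a params map, which is an assumption about the Elasticsearch version in use (the original answer uses "inline").

```python
import json


def build_copy_with_new_id(src_index, dst_index, old_id, new_id):
    """Step 1: copy one document into the scratch index under a new _id."""
    return {
        "source": {"index": src_index, "query": {"match": {"_id": old_id}}},
        "dest": {"index": dst_index},
        "script": {
            "lang": "painless",
            # Passing the id as a parameter avoids quoting it inside the script.
            "source": "ctx._id = params.new_id",
            "params": {"new_id": new_id},
        },
    }


def build_copy_back(scratch_index, src_index, new_id):
    """Step 2: copy the renamed document back into the original index."""
    return {
        "source": {"index": scratch_index, "query": {"match": {"_id": new_id}}},
        "dest": {"index": src_index},
    }


step1 = build_copy_with_new_id("source_index", "destination_index", "1", "2")
step2 = build_copy_back("destination_index", "source_index", "2")
for body in (step1, step2):
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```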
Hope this helps!

How to index questions and answers in Elasticsearch

I am doing a project to index questions and answers of a website in elasticsearch (version 6) for search purpose.
I have first thought of creating two indexes as shown below, one for questions and one for answers.
questions mapping:
{"mappings": {
"question": {
"properties": {
"title":{
"type":"text"
},
"question": {
"type": "text"
},
"questionId":{
"type":"keyword"
}
}
}
}
}
answers mapping:
{"mappings": {
"answer": {
"properties": {
"answer":{
"type":"text"
},
"answerId": {
"type": "keyword"
},
"questionId":{
"type":"keyword"
}
}
}
}
}
I have used a multi_match query along with terms and top_hits aggregations to search the indexed Q&As (referred question). I used this method to remove duplicates from the search results, since both the answers and the question itself can appear in the results for the same question, and I only want one entry per question. The problem I am facing is paginating the results: there is no way to paginate an aggregation in Elasticsearch. It can only paginate hits, not aggregations.
Then I thought of saving both the question and its answers in one document, with the answers in a JSON array. The problem with this approach is that there is no clean way to add, remove, or update a specific answer in a given question document. The only way I found was using a Groovy script (referred question), which is deprecated in Elasticsearch v6 AFAIK.
Is there a better and clean way to design this ?
Thanks.
Parent-Child Relationship
Use the parent-child relationship. It is similar to the nested model and lets you associate one document type with another in a one-to-many relationship.
More information here: https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html
Child documents can be added, changed, or deleted without affecting the parent or the other children. You can paginate over the parent documents using the Scroll API.
Child documents can be retrieved using the has_parent query.
The trade-off: you do not have to take care of duplicates and pagination problems, but parent-child queries can be 5 to 10 times slower than the equivalent nested query.
Your mapping can look like the following. Note that this is the _parent syntax from Elasticsearch 5.x and earlier; indices created in 6.x allow only one mapping type and express the same relationship with a join field instead:
PUT /my-index
{
"mappings": {
"question": {
"properties": {
"title": {
"type": "text"
},
"question": {
"type": "text"
},
"questionId": {
"type": "keyword"
}
}
},
"answer": {
"_parent": {
"type": "question"
},
"properties": {
"answer": {
"type": "text"
},
"answerId": {
"type": "keyword"
},
"questionId": {
"type": "keyword"
}
}
}
}
}
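Since the question targets version 6, where an index holds a single mapping type and the join field type takes the place of _parent, here is a hedged Python sketch of how the same design might look there. It only builds the mapping and a paginated search body. The field name qa_join, the type name _doc, and both helper names are assumptions made for illustration, not part of the original answer.

```python
import json


def build_join_mapping(type_name="_doc"):
    """Single-type mapping where a join field links questions to answers."""
    properties = {
        "title": {"type": "text"},
        "question": {"type": "text"},
        "questionId": {"type": "keyword"},
        "answer": {"type": "text"},
        "answerId": {"type": "keyword"},
        # "question" is the parent relation, "answer" the child relation.
        "qa_join": {"type": "join", "relations": {"question": "answer"}},
    }
    # 6.x expects the properties under a type name; later versions drop that level.
    return {"mappings": {type_name: {"properties": properties}}}


def build_question_search(text, page=0, size=10):
    """Match questions by their own text or by the text of any child answer."""
    return {
        "from": page * size,
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": text, "fields": ["title", "question"]}},
                    {"has_child": {"type": "answer", "query": {"match": {"answer": text}}}},
                ]
            }
        },
    }


mapping = build_join_mapping()
search = build_question_search("reindex routing", page=2, size=10)
print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```

Because both clauses can only match question documents, each question appears at most once in the hits, so ordinary from/size pagination applies without aggregations.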

ElasticSearch: reindex and aliases (keep routing policy)

I'm using alias per user. For each alias I'm linking a routing and a filter:
PUT /<index>/_alias/u1#u1.com
{
  "routing": "u1#u1.com",
  "filter": {
    "term": {
      "user": "u1#u1.com"
    }
  }
}
So both indexing and searching use the routing information.
I want to reindex all documents into another index using the _reindex API. After creating the new index, I created all the aliases. So I figure that documents have to be reindexed through the aliases in order to keep the routing policy.
Is there any way to set it up on _reindex?
Example:
POST /_reindex
{
  "source": {
    "index": "old"
  },
  "dest": {
    "index": "new"
  }
}
Any ideas?
Yes, you can:
POST _reindex
{
"source": {
"index": "old",
"query": {
"term": {
"user": "u1#u1.com"
}
}
},
"dest": {
"index": "new",
"routing": "=u1#u1.com"
}
}
According to this documentation:
By default if _reindex sees a document with routing then the routing is preserved unless it’s changed by the script.
So, as far as I've been able to figure out, the routing is preserved by default for every document that already has routing information.
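If the old documents were indexed without routing, the explicit per-user request above has to be repeated for each alias. A small Python sketch that generates one request body per user is shown below; the helper name build_routed_reindex and the second user are hypothetical, and nothing is sent to a cluster.

```python
import json


def build_routed_reindex(src_index, dst_index, user):
    """Reindex one user's documents and pin them to that user's routing value."""
    return {
        "source": {"index": src_index, "query": {"term": {"user": user}}},
        # The "=" prefix sets the routing on the copied documents to the text after it.
        "dest": {"index": dst_index, "routing": "=" + user},
    }


users = ["u1#u1.com", "u2#u2.com"]  # the second user is made up for the example
bodies = [build_routed_reindex("old", "new", user) for user in users]
for body in bodies:
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```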
