How to create a duplicate index in ElasticSearch from existing index? - elasticsearch

I have an existing index with mappings and data in ElasticSearch which I need to duplicate for testing new development. Is there anyway to create a temporary/duplicate index from the already existing one?
Coming from an SQL background, I am looking at something equivalent to
SELECT *
INTO TestIndex
FROM OriginalIndex
WHERE 1 = 0
I have tried the Clone API but can't get it to work.
I'm trying to clone using:
POST /originalindex/_clone/testindex
{
}
But this results in the following exception:
{
"error": {
"root_cause": [
{
"type": "invalid_type_name_exception",
"reason": "Document mapping type name can't start with '_', found: [_clone]"
}
],
"type": "invalid_type_name_exception",
"reason": "Document mapping type name can't start with '_', found: [_clone]"
},
"status": 400
}
I know someone would guide me quickly. Thanks in advance all you wonderful folks.

First you have to set the source index to be read-only
PUT /originalindex/_settings
{
"settings": {
"index.blocks.write": true
}
}
Then you can clone
POST /originalindex/_clone/testindex
If you need to copy documents to a new index, you can use the reindex api
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type:
application/json' -d'
{
"source": {
"index": "someindex"
},
"dest": {
"index": "someindex_copy"
}
}
'
(See: https://wrossmann.medium.com/clone-an-elasticsearch-index-b3e9b295d3e9)

Shortly after posting the question, I figured out a way.
First, get the properties of original index:
GET originalindex
Copy the properties and put to a new index:
PUT /testindex
{
"aliases": {...from the above GET request},
"mappings": {...from the above GET request},
"settings": {...from the above GET request}
}
Now I have a new index for testing.

Related

How to update data type of a field in elasticsearch

I am publishing a data to elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. Index name is health_gateway
I have made some changes in python code which is generating the data so now this field Data.CPU has now become integer. But still elasticsearch is showing it as string. How can I update it data type.
I tried running below commands in kibana dev tools:
PUT health_gateway/doc/_mapping
{
"doc" : {
"properties" : {
"Data.CPU" : {"type" : "integer"}
}
}
}
But it gave me below error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
}
],
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
},
"status" : 400
}
There is also this document which says using mutate we can convert the data type but I am not able to understand it properly.
I do not want to delete the index and recreate as I have created a visualization based on this index and after deleting it will also be deleted. Can anyone please help in this.
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
Add a new multi-field
You could add a new multifield. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
"doc": {
"properties": {
"Data": {
"properties": {
"CPU": {
"type": "keyword",
"fields": {
"int": {
"type": "short"
}
}
}
}
}
}
}
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping, by indexing the same field in multiple ways i.e by using multi fields.
Using the below mapping, Data.CPU.raw will be of integer type
{
"mappings": {
"properties": {
"Data": {
"properties": {
"CPU": {
"type": "string",
"fields": {
"raw": {
"type": "integer"
}
}
}
}
}
}
}
}
OR you can create a new index with correct index mapping, and reindex the data in it using the reindex API

How to test analyzer-smartcn plugin for Elastic Search on local machine

I installed the smartcn plugin on my elastic search, restarted elasticsearch and tried to create an index with these settings:
PUT /test_chinese
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"default": {
"type": "smartcn"
}
}
}
}
}
}
However, when I run this in Marvel, I get this error back and I see a bunch of errors in Elastic search:
"error": "IndexCreationException[[test_chinese] failed to create
index]; nested: ElasticsearchIllegalArgumentException[failed to find
analyzer type [smartcn] or tokenizer for [default]]; nested:
NoClassSettingsException[Failed to load class setting [type] with
value [smartcn]]; nested:
ClassNotFoundException[org.elasticsearch.index.analysis.smartcn.SmartcnAnalyzerProvider];
", "status": 400
Any ideas what I might be missing?
I figured it out. I manually installed the plugins from the zip and it was causing issues... I reinstalled the right way but specific to 1.7 and it worked

ElasticSearch - Reindexing your data with zero downtime

https://www.elastic.co/blog/changing-mapping-with-zero-downtime/
I try to create a new index and reindexing my data with zero downtime with this guide.
Now I have an index called "photoshooter" and I follow the steps
1) Create new index "photoshooter_v1" with the new mapping... (Done)
2) Create alias...
curl -XPOST localhost:9200/_aliases -d '
{
"actions": [
{ "add": {
"alias": "photoshooter",
"index": "photoshooter_v1"
}}
]
}
and I get this error...
{
"error": "InvalidAliasNameException[[photoshooter_v1] Invalid alias name [photoshooter], an index exists with the same name as the alias]",
"status": 400
}
I think I lose something with the logic..
Lets say your current index is named as "photoshooter " if i am guessing it right ok.
Now Create a Alias for this index first - OK
{
"actions": [
{ "add": {
"alias": "photoshooter_docs",
"index": "photoshooter"
}}
]
}
test it - curl -XGET 'localhost:9200/photoshooter_docs/_search'
Note - now you will use 'photoshooter_docs' as index name to interact with your index which is actually 'photoshooter' Ok.
Now we create a new index with your new mapping let's say we name it 'photoshooter_v2' now copy your 'photoshooter' index data to new index(photoshooter_v2)
Once you have copied all your data now simply
Remove the alias from previous index to new index -
curl -XPOST localhost:9200/_aliases -d '
{
"actions": [
{ "remove": {
"alias": "photoshooter_docs",
"index": "photoshooter"
}},
{ "add": {
"alias": "photoshooter_docs",
"index": "photoshooter_v2"
}}
]
}
test it again -> curl -XGET 'localhost:9200/photoshooter_docs/_search'
Congrats you have changed your mapping without zero downtime .
And to copy data you can use tools like this
https://github.com/mallocator/Elasticsearch-Exporter
Note - this tools also copies the mapping from old index to new index which you might don't want to do. So that you have read in its documentation or edit it according to your use .
Thanks
Hope this helps
It's it very simple, you cannot create an alias with a name of an index that already exists.
You'll need to consider a new name for the new index, re-index the data in the new one and then remove the old one to be able to give it the same name.
If you want to do that on daily basis, you might consider adding per say the date to your index's name and switch upon it every day.

Creating a mapping for an existing index with a new type in elasticsearch

I found an article on elasticsearch's site describing how to 'reindex without downtime', but that's not really acceptable every time a new element is introduced that needs to have a custom mapping (http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/)
Does anyone know why I can't create a mapping for an existing index but a new type in elasticsearch? The type doesn't exist yet, so why not? Maybe I'm missing something and it IS possible? If so, how can that be achieved?
Thanks,
Vladimir
Here is a simple example to create two type mapping in a index, (one after another)
I've used i1 as index and t1 and t2 as types,
Create index
curl -XPUT "http://localhost:9200/i1"
Create type 1
curl -XPUT "http://localhost:9200/i1/t1/_mapping" -d
{
"t1": {
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "string"
}
}
}
}'
Create type 2
curl -XPUT "localhost:9200/i1/t2/_mapping" -d'
{
"t2": {
"properties": {
"field3": {
"type": "string"
},
"field4": {
"type": "string"
}
}
}
}'
Now Looking at mapping( curl -XGET "http://localhost:9200/i1/_mapping" ), It seems like it is working.
Hope this helps!! Thanks
If you're using Elasticsearch 6.0 or above, an index can have only one type.
So you have to create an index for your second type or create a custom type that would contain the two types.
For more details : Removal of multiple types in index

How to copy some ElasticSearch data to a new index

Let's say I have movie data in my ElasticSearch and I created them like this:
curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972
}'
And I have a bunch of movies from different years. I want to copy all the movies from a particular year (so, 1972) and copy them to a new index of "70sMovies", but I couldn't see how to do that.
Since ElasticSearch 2.3 you can now use the built in _reindex API
for example:
POST /_reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
Or only a specific part by adding a filter/query
POST /_reindex
{
"source": {
"index": "twitter",
"query": {
"term": {
"user": "kimchy"
}
}
},
"dest": {
"index": "new_twitter"
}
}
Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
The best approach would be to use elasticsearch-dump tool https://github.com/taskrabbit/elasticsearch-dump.
The real world example I used :
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=mapping
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=data
Check out knapsack:
https://github.com/jprante/elasticsearch-knapsack
Once you have the plugin installed and working, you could export part of your index via query. For example:
curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
"match" : {
"myfield" : "myvalue"
}
},
"fields" : [ "_parent", "_source" ]
}'
This will create a tarball with only your query results, which you can then import into another index.
To reindex specific type from source index to destination index type syntax is
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
If the intent were to copy some portion of the data or the entire data to an index with the same settings/mappings as that of the original index one could use the clone api to achieve the same. Something like below:
POST /<index>/_clone/<target-index>
OR
PUT /<index>/_clone/<target-index>
However if the intent is to copy the data to a new index with the different settings/mappings than the original index one could use the reindex api to achieve the same. Something like below:
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
*Note: In case of reindex api the target index has to be created prior to actual api call.
For further reading on difference between clone and reindex refer What's the difference between cloning and reindexing an index in Elasticsearch?
You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps. In the following example I copy the index "thor" to "thor2"
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data
Well the straightforward way to do this is to write code, with the API of your choice, querying for "year": 1972 and then indexing that data into a new index. You would use the Search api or the Scan and Scroll API to get all the documents and then either index them one by one or use the Bulk Api:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
Assuming you don't want to do this via code but are looking for a direct way of doing this, I suggest the Elasticsearch Snapshot and Restore. Basically you would take a snapshot of your existing index, restore it into a new index and then use the Delete command to delete all documents with a year other than 1972.
Snapshot And Restore
The snapshot and restore module allows to create snapshots of
individual indices or an entire cluster into a remote repository. At
the time of the initial release only shared file system repository was
supported, but now a range of backends are available via officially
supported repository plugins.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
Delete By Query API
The delete by query API allows to delete documents from one or more
indices and one or more types based on a query. The query can either
be provided using a simple query string as a parameter, or using the
Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
Since v7.4 the _clone api was introduced and can easily satisfy your need: (read for the relevant prerequisites and monitoring involved)
POST /<index>/_clone/<target-index>
Or:
PUT /<index>/_clone/<target-index>
You can use elasticdump --searchBody:
# Copy documents from movies to 70sMovies (filtering using query)
elasticdump \
--input=http://localhost:9200/movies \
--output=http://localhost:9200/70sMovies \
--type=data \
--searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}" # <--- Your query here
more on elasticdump options here.

Resources