How do I alter the schema without destroying data in elasticsearch?

This is my current schema
{
  "mappings": {
    "historical_data": {
      "properties": {
        "continent": {
          "type": "string",
          "index": "not_analyzed"
        },
        "country": {
          "type": "string",
          "index": "not_analyzed"
        },
        "description": {
          "type": "string"
        },
        "funding": {
          "type": "long"
        },
        "year": {
          "type": "integer"
        },
        "agency": {
          "type": "string"
        },
        "misc": {
          "type": "string"
        },
        "university": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
I have 700k records uploaded. Without destroying the data, how can I change the university field so that it is no longer "not_analyzed", and have the change reflected in my existing data?

The mapping for an existing field cannot be modified. However, you can achieve the desired outcome in two ways.
Create another field. Adding new fields is free, using the put mapping API:
curl -XPUT localhost:9200/YOUR_INDEX/_mapping/historical_data -d '{
  "properties": {
    "new_university": {
      "type": "string"
    }
  }
}'
Use multi-fields: add an analyzed sub-field to your not_analyzed field.
curl -XPUT localhost:9200/YOUR_INDEX/_mapping/historical_data -d '{
  "properties": {
    "university": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "university_analyzed": {
          "type": "string"
        }
      }
    }
  }
}'
Since no "index" setting is specified on university_analyzed, that sub-field is analyzed.
In both cases, you need to reindex in order to populate the new field. Note that the _reindex API cannot write into the same index it reads from, so to update documents in place use the update-by-query API instead. For the new-field option, a script copies the existing value over:
curl -XPOST localhost:9200/YOUR_INDEX/_update_by_query -d '{
  "script": {
    "inline": "ctx._source.new_university = ctx._source.university"
  }
}'
For the multi-field option, no script is needed; an empty update-by-query is enough to repopulate the sub-field:
curl -XPOST localhost:9200/YOUR_INDEX/_update_by_query
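Once the data is reindexed, you can run analyzed full-text queries against the analyzed sub-field (the university_analyzed sub-field name and the query text are illustrative):

```json
POST YOUR_INDEX/_search
{
  "query": {
    "match": {
      "university.university_analyzed": "state university"
    }
  }
}
```

The not_analyzed parent field university remains available for exact-match filters and aggregations.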

You are not exactly forced to "destroy" your data; what you can do is reindex it as described in this article (I won't copy the examples here, as they are particularly clear in the section "Reindexing your data with zero downtime").
For reindexing, you can also take a look at the reindex API, the simplest invocation being:
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
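The zero-downtime approach described in that article relies on index aliases: your application queries an alias rather than a concrete index, so once the new index is ready you can switch the alias over atomically. A minimal sketch (the alias name my_alias is a placeholder):

```json
POST /_aliases
{
  "actions": [
    { "remove": { "index": "twitter", "alias": "my_alias" } },
    { "add": { "index": "new_twitter", "alias": "my_alias" } }
  ]
}
```

Both actions are applied in a single atomic operation, so searches never see an empty or partial index.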
Of course this operation will consume some resources, so I would suggest that you review the full set of changes you want to introduce in your mapping first, and perform the operation when there is the least activity on your servers (e.g. over the weekend, or at night).

Related

Elasticsearch Field Preference for result sequence

I have created the index in elasticsearch with the following mapping:
{
  "test": {
    "mappings": {
      "documents": {
        "properties": {
          "fields": {
            "type": "nested",
            "properties": {
              "uid": {
                "type": "keyword"
              },
              "value": {
                "type": "text",
                "copy_to": [
                  "fulltext"
                ]
              }
            }
          },
          "fulltext": {
            "type": "text"
          },
          "tags": {
            "type": "text"
          },
          "title": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          },
          "url": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
While searching, I want to set a preference among fields: for example, if the search text is found in title or url, those documents should come first, before the others.
Can I set a field preference for the search result order (in my case: title, url, tags, fields)?
Please help me with this.
This is called "boosting". Prior to Elasticsearch 5.0.0, boosting could be applied at index time (as part of the field mapping) or at query time. Index-time boosting is deprecated as of 5.0, and boosts are now applied at query time. The current recommendation is therefore to use query-time boosting.
Please read the documentation for details on how to use boosting:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
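Query-time boosting for the field order asked about can be sketched with a multi_match query; the boost factors (the ^N suffixes) are illustrative placeholders to tune, and fulltext is used here because the nested fields.value values are copied into it:

```json
POST test/_search
{
  "query": {
    "multi_match": {
      "query": "search text",
      "fields": ["title^4", "url^3", "tags^2", "fulltext"]
    }
  }
}
```

Documents matching in title score highest, then url, then tags, then the copied-over field values.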

ElasticSearch create an index with dynamic properties

Is it possible to create an index, restricting indexing a parent property?
For example,
$ curl -XPOST 'http://localhost:9200/actions/action/' -d '{
  "user": "kimchy",
  "message": "trying out Elasticsearch",
  "actionHistory": [
    { "timestamp": 123456789, "action": "foo" },
    { "timestamp": 123456790, "action": "bar" },
    { "timestamp": 123456791, "action": "buz" },
    ...
  ]
}'
I don't want actionHistory to be indexed at all. How can this be done?
For the above document, I believe the index would be created as
$ curl -XPOST localhost:9200/actions -d '{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "action": {
      "properties": {
        "user": { "type": "string", "index": "analyzed" },
        "message": { "type": "string", "index": "analyzed" },
        "actionHistory": {
          "properties": {
            "timestamp": {
              "type": "date",
              "format": "strict_date_optional_time||epoch_millis"
            },
            "action": { "type": "string", "index": "analyzed" }
          }
        }
      }
    }
  }
}'
Would removing properties from actionHistory and replacing it with "index": "no" be the proper solution?
This is just an example; my actual situation involves documents with dynamic properties (i.e. actionHistory contains various custom, non-repeating properties across all documents), and my mapping definition for this particular type has over 2000 different properties, making searches extremely slow (i.e. worse than full-text search in the database).
You can probably get away with using dynamic templates: match on all actionHistory sub-fields and set "index": "no" for all of them.
PUT actions
{
  "mappings": {
    "action": {
      "dynamic_templates": [
        {
          "actionHistoryRule": {
            "path_match": "actionHistory.*",
            "mapping": {
              "type": "{dynamic_type}",
              "index": "no"
            }
          }
        }
      ]
    }
  }
}
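With the template above in place, any dynamically added actionHistory sub-field should be mapped with "index": "no"; you can verify by indexing a sample document and inspecting the generated mapping (the sample values are illustrative):

```json
POST actions/action/1
{
  "user": "kimchy",
  "actionHistory": [
    { "timestamp": 123456789, "action": "foo" }
  ]
}

GET actions/_mapping
```

The returned mapping should show actionHistory.timestamp and actionHistory.action with "index": "no", while user is indexed normally.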

not analyzed string in elasticsearch

I want to write a template in Elasticsearch that changes all strings to not_analyzed. The official documentation shows that I can do that using:
"properties": {
  "host_name": {
    "type": "string",
    "index": "not_analyzed"
  },
  "created_at": {
    "type": "date",
    "format": "EEE MMM dd HH:mm:ss Z YYYY"
  }
}
But the problem here is that I need to do this for every field, as is done here for host_name. I tried using _all and __all but it did not seem to work. How can I change all strings to not_analyzed using a custom template?
For an already existing index, you cannot change the mapping of existing fields and, even if you could, you would need to reindex all documents so that they obey the new mapping rules.
Otherwise, if you are just about to create the index, you can use an index template:
PUT /_template/not_analyzed_strings
{
  "template": "xxx-*",
  "order": 0,
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
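Any index created after this template is registered, whose name matches xxx-*, will map new string fields as not_analyzed. A quick check (the index, type, and document are illustrative):

```json
POST xxx-1/logs/1
{
  "host_name": "web-01"
}

GET xxx-1/_mapping
```

The dynamically created host_name field should come back with "type": "string" and "index": "not_analyzed", and the same applies to every other string field you index.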

How exactly do mapped fields in elastic search work?

The documentation is sparse and not entirely helpful. So say I have the following fields for my attribute:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "my_attribute": {
          "mapping": {
            "my_attribute": {
              "type": "string",
              "analyzer": "my_analyzer",
              "fields": {
                "lowercased": {
                  "type": "string"
                },
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}
my_analyzer lowercases tokens (in addition to other stuff).
So now I would like to know if the following statements are true:
my_analyzer does not get applied to raw, because the not_analyzed index does not have any analyzers, as its name implies.
my_attribute and my_attribute.lowercased are the exact same, so it is redundant to have the field my_attribute.lowercased
Your first statement is correct; however, the second is not. my_attribute and my_attribute.lowercased might not be the same, since the former uses your custom my_analyzer as its index and search analyzer, while my_attribute.lowercased uses the standard analyzer (when no analyzer is specified, the standard one kicks in).
Besides, your mapping is not correct as written; it should be like this:
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_attribute": {
          "type": "string",
          "analyzer": "my_analyzer",
          "fields": {
            "lowercased": {
              "type": "string"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
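To see the difference between the main field and its raw sub-field concretely, you can compare their token output with the _analyze API (assuming the index is called my_index and the sample text is arbitrary):

```json
GET my_index/_analyze
{
  "field": "my_attribute",
  "text": "Quick Brown Fox"
}

GET my_index/_analyze
{
  "field": "my_attribute.raw",
  "text": "Quick Brown Fox"
}
```

The first request runs the text through my_analyzer and returns its tokens; the second should return the whole string as a single, unmodified token, since the raw sub-field is not_analyzed.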

ElasticSearch - Reindex to add doc_value

What am I trying to do?
Add doc_values to a field in an existing index.
What have I tried?
Created index and document
POST /my_index-1/my_type/1
{
  "my_prop": "my_value"
}
Added a template
PUT /_template/my_template
{
  "id": "my_template",
  "template": "my_index-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "my_prop_template": {
            "match": "my_prop",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": true,
              "fielddata": {
                "format": "doc_values"
              }
            }
          }
        }
      ]
    }
  }
}
Reindexed
./stream2es es --source http://localhost:9200/my_index-1 --target http://localhost:9200/my_index-2
What went wrong?
In the new index my_index-2 the property did not receive "doc_values": true:
...
"properties": {
  "my_prop": {
    "type": "string"
  }
}
...
Just for the sanity, I have also tried adding the same document to my_index-3, and it got "doc_values": true.
My question
How can I reindex my old index with "doc_values": true?
Thanks @Val! Logstash indeed solved the problem.
Both stream2es and elasticsearch-reindex created the new mapping without "doc_values": true.
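For reference, the built-in _reindex API (available since Elasticsearch 2.3) is another option that should pick up the template: the destination index is created fresh when the reindex starts, and since its name matches my_index-* the template's doc_values mapping applies (the destination name my_index-3 is a placeholder):

```json
POST _reindex
{
  "source": {
    "index": "my_index-1"
  },
  "dest": {
    "index": "my_index-3"
  }
}
```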
