Use geo_point data type on field in _reindex api - elasticsearch

I have an index that contains two fields, longitude and latitude, stored as float. I want to create a new index and copy the data from the first one, but with different mappings. I use the reindex API with ingest processors, which can rename fields and give them different data types. When I try to convert a field to the "geo_point" type, it fails with:
"type": "parse_exception",
"reason": "[type] type [geo_point] not supported, cannot convert field.",
However, when I create a new index, I am able to create a field with the "geo_point" type.
I tried different workarounds, but the documentation says that geo queries only work with the "geo_point" type.
Is there any solution?
{
  "description": "test pipe",
  "processors": [
    {
      "convert": {
        "field": "location",
        "type": "geo_point"
      }
    }
  ]
}
(Edit: added the pipeline definition.)

OK, let's say that your current index mapping looks like this:
PUT oldindex
{
  "mappings": {
    "doc": {
      "properties": {
        "latitude": {
          "type": "float"
        },
        "longitude": {
          "type": "float"
        }
      }
    }
  }
}
You need to create a new index with the proper mapping, as follows:
PUT newindex
{
  "mappings": {
    "doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
And then, you can simply leverage the reindex API to copy the old index into the new one with some additional scripting to create the location field:
POST _reindex
{
  "source": {
    "index": "oldindex"
  },
  "dest": {
    "index": "newindex"
  },
  "script": {
    "source": "ctx._source.location = ['lat': ctx._source.latitude, 'lon': ctx._source.longitude]; ctx._source.remove('latitude'); ctx._source.remove('longitude');"
  }
}
And you're good to go with the location field in your shiny new index!
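For context, the reshaping the Painless script performs is just a field merge; a minimal Python sketch of the same transformation (the function name and sample coordinates are illustrative, not part of any ES API):

```python
def to_geo_point(doc):
    """Merge separate latitude/longitude floats into a single
    geo_point-style 'location' object, as the reindex script does."""
    doc = dict(doc)  # work on a copy so the original is untouched
    doc["location"] = {"lat": doc.pop("latitude"), "lon": doc.pop("longitude")}
    return doc

print(to_geo_point({"latitude": 48.86, "longitude": 2.35}))
# -> {'location': {'lat': 48.86, 'lon': 2.35}}
```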

Related

How to avoid index explosion in ElasticSearch

I have two docs from the same index that originally look like this (only _source value is shown here)
{
  "id": "3",
  "name": "Foo",
  "property": {
    "schemaId": "guid_of_the_RGB_schema_defined_extenally",
    "value": {
      "R": 255,
      "G": 100,
      "B": 20
    }
  }
}
{
  "id": "2",
  "name": "Bar",
  "property": {
    "schemaId": "guid_of_the_HSL_schema_defined_extenally",
    "value": {
      "H": 255,
      "S": 100,
      "L": 20
    }
  }
}
The schema (used for validation of value) is stored outside of ES, since it has nothing to do with the indexing.
If I don't define a mapping, the value field will be mapped as an object, and its subfields will keep growing as new subfields appear.
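The growth is easy to picture: every distinct sub-key that ever appears under value becomes its own field in the mapping. A small Python sketch of that accumulation (the helper and sample docs are illustrative, not ES API calls):

```python
def simulate_dynamic_mapping(docs):
    """Collect every distinct sub-key of property.value across docs,
    the way dynamic object mapping adds one field per new sub-key."""
    fields = set()
    for doc in docs:
        fields.update(f"property.value.{key}" for key in doc["property"]["value"])
    return sorted(fields)

docs = [
    {"property": {"value": {"R": 255, "G": 100, "B": 20}}},  # RGB schema
    {"property": {"value": {"H": 255, "S": 100, "L": 20}}},  # HSL schema
]
print(simulate_dynamic_mapping(docs))  # six mapped fields already, and growing
```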
Currently, Elasticsearch supports the flattened mapping type (https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html) to prevent this explosion in the index. However, it has limited support for searching inner fields due to this restriction: "As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically."
I need to be able to query the index for documents matching a given condition (e.g. B in the range [10, 30]).
So far I have come up with a solution that structures my doc like this:
{
  "id": 4,
  "name": "Boo",
  "property": {
    "guid_of_the_normalized_RGB_schema_defined_extenally": {
      "R": 0.1,
      "G": 0.2,
      "B": 0.5
    }
  }
}
Although it does not solve the mapping explosion itself, it mitigates some other issues.
My mapping for the property field now looks similar to this:
"property": {
"properties": {
"guid_of_the_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "long"
},
"G": {
"type": "long"
},
"R": {
"type": "long"
}
}
},
"guid_of_the_normalized_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
},
"guid_of_the_HSL_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
}
}
}
}
This solves the issue of fields having the same name but different data types.
Can someone suggest a solution that avoids the mapping explosion without suffering from the search limitations of the flattened type?
To avoid mapping explosion, the best solution is to normalize your data better.
You can set "dynamic": "strict" in your mapping; then a doc will be rejected if it contains a field which is not already in the mapping.
After that, you can still add new fields, but you will have to add them explicitly to the mapping first.
You can also add a pipeline to clean up and normalize your data before ingestion.
If you don't want to, or cannot, reindex:
To make your query easy even if you cannot know the "middle" part of your key, you can use a multi_match with a star:
GET myindex/_search
{
  "query": {
    "multi_match": {
      "query": 0.5,
      "fields": ["property.*.B"]
    }
  }
}
But you will still not be able to sort it as you want.
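The property.*.B pattern behaves like a glob over dotted field paths; a rough client-side analogue using Python's fnmatch (the flattening helper and sample doc are illustrative):

```python
from fnmatch import fnmatch

def matching_fields(doc, pattern):
    """Flatten a nested doc into dotted paths, then keep the paths
    matching a glob-style pattern, like multi_match's field wildcard."""
    paths = {}

    def walk(node, prefix=""):
        for key, value in node.items():
            if isinstance(value, dict):
                walk(value, f"{prefix}{key}.")
            else:
                paths[f"{prefix}{key}"] = value

    walk(doc)
    return {p: v for p, v in paths.items() if fnmatch(p, pattern)}

doc = {"property": {"guid_rgb": {"R": 255, "G": 100, "B": 20}}}
print(matching_fields(doc, "property.*.B"))  # -> {'property.guid_rgb.B': 20}
```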
For ordering on multiple 'unknown' field names without touching the data, you can use a script: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
But maybe you could simplify the whole process by adding a dynamic template to your index.
PUT test/_mapping
{
  "dynamic_templates": [
    {
      "unified_red": {
        "path_match": "property.*.R",
        "mapping": {
          "type": "float",
          "copy_to": "unified_color.R"
        }
      }
    },
    {
      "unified_green": {
        "path_match": "property.*.G",
        "mapping": {
          "type": "float",
          "copy_to": "unified_color.G"
        }
      }
    },
    {
      "unified_blue": {
        "path_match": "property.*.B",
        "mapping": {
          "type": "float",
          "copy_to": "unified_color.B"
        }
      }
    }
  ],
  "properties": {
    "unified_color": {
      "properties": {
        "R": {
          "type": "float"
        },
        "G": {
          "type": "float"
        },
        "B": {
          "type": "float"
        }
      }
    }
  }
}
Then you'll be able to query any value with the same query:
GET test/_search
{
  "query": {
    "range": {
      "unified_color.B": {
        "gte": 0.1,
        "lte": 0.6
      }
    }
  }
}
For already existing fields, you'll have to add the copy_to yourself in the mapping, and after that run an _update_by_query to populate them.
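What copy_to does at index time can be mimicked client-side; a sketch, assuming one value per channel (real copy_to appends to a multi-valued field; the names here are illustrative):

```python
def unify_colors(doc):
    """Gather property.<any schema>.{R,G,B} into one unified_color
    object, mimicking the copy_to of the dynamic templates above."""
    unified = {}
    for schema_values in doc.get("property", {}).values():
        for channel in ("R", "G", "B"):
            if channel in schema_values:
                unified[channel] = schema_values[channel]
    return unified

doc = {"property": {"guid_of_the_normalized_RGB_schema_defined_extenally":
                    {"R": 0.1, "G": 0.2, "B": 0.5}}}
print(unify_colors(doc))  # -> {'R': 0.1, 'G': 0.2, 'B': 0.5}
```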

How to rename a field in Elasticsearch?

I have an index in Elasticsearch with the following field mapping:
{
  "version_data": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      },
      "updated_at": {
        "type": "date"
      },
      "updated_by": {
        "type": "keyword"
      }
    }
  }
}
I have already created some documents in it and now want to rename the version_data field to _version_data.
Is there any way in Elasticsearch to rename a field within the mapping and in the documents?
The closest thing is the alias field type.
In your mapping, you can link the new name to the old one like this:
PUT test/_mapping
{
  "properties": {
    "_version_data": {
      "type": "alias",
      "path": "version_data"
    }
  }
}
BTW, I would generally avoid leading underscores, since those are normally used for internal fields like _id.

Retrieving Copy_to/Stored fields

I had created a copy_to field from an existing field by altering its mapping as below:
{
  "properties": {
    "ExistingField": {
      "type": "date",
      "copy_to": "CopiedField"
    },
    "CopiedField": {
      "type": "date",
      "store": true
    }
  }
}
I used "store": true since I wanted this new field's value to be retrieved when I do a search. Aggregations on "CopiedField" work fine, but when I try to search for a value in this new CopiedField, nothing is retrieved:
{
  "stored_fields": [
    "CopiedField"
  ],
  "query": {
    "match_all": {}
  }
}
How do I retrieve the value of this "CopiedField" in a simple search?
Mapping cannot be changed for already existing fields.
You will need to create a new index (with the correct mapping) and move the documents from the old index to the new one. You can then delete the old index and add an alias so that the index name does not change:
[Mapping](https://www.elastic.co/blog/changing-mapping-with-zero-downtime)
[Reindex](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html)
[Alias](https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-aliases.html)
Example: old index index35 with the mapping below:
PUT index35
{
  "mappings": {
    "properties": {
      "ExistingField": {
        "type": "date"
      }
    }
  }
}
Query: the query below will not return anything:
GET index35/_search
{
  "stored_fields": [
    "CopiedField"
  ],
  "query": {
    "match_all": {}
  }
}
Create the new index:
PUT index36
{
  "mappings": {
    "properties": {
      "ExistingField": {
        "type": "date",
        "copy_to": "CopiedField"
      },
      "CopiedField": {
        "type": "date",
        "store": true
      }
    }
  }
}
Move the documents from the old index to the new one (the destination index must be created before reindexing):
POST _reindex
{
  "source": {
    "index": "index35"
  },
  "dest": {
    "index": "index36"
  }
}
Make sure the document count is the same in both the old and the new index (to prevent data loss).
Delete the old index: DELETE index35
Create an alias for the new index (with the old name) so that search queries are not affected:
POST /_aliases
{
  "actions": [
    { "add": { "index": "index36", "alias": "index35" } }
  ]
}
Old query will now return results

Elasticsearch - how to extract date from date time field and add it to each document as a new field

I am learning Elasticsearch, and in one of the demo databases given to me there is a date-time field saved with the name time_stamp. The date data is saved as text:
"time_stamp": "13-06-2019 04:44:23"
I want to create a new field titled "date", extract only the date from each document, and store it within the same document. The current index mapping is as follows:
{
  "vp1": {
    "mappings": {
      "dynamic": "false",
      "properties": {
        "client_id": {
          "type": "text"
        },
        "encod": {
          "type": "float"
        },
        "imagename": {
          "type": "text"
        },
        "indx": {
          "type": "text"
        },
        "machid": {
          "type": "text"
        },
        "matchid": {
          "type": "float"
        },
        "sequence_id": {
          "type": "integer"
        },
        "time_stamp": {
          "type": "text"
        }
      }
    }
  }
}
I am using python3 to interact with the index.
You first need to update your mapping in order to add the new field, since the dynamic setting is set to false and the new field cannot be created automatically:
PUT vp1/_mapping
{
  "properties": {
    "date": {
      "type": "date",
      "format": "dd-MM-yyyy"
    }
  }
}
(The dd-MM-yyyy format is needed because the default date formats would not parse values like 13-06-2019.)
Then, an easy way to achieve what you want is with an _update_by_query that splits off the date part:
POST vp1/_update_by_query
{
  "script": {
    "source": "ctx._source.date = /\\s/.split(ctx._source.time_stamp)[0]"
  }
}
Note that Painless regexes must be enabled in the cluster (script.painless.regex.enabled); if they are not, ctx._source.time_stamp.substring(0, 10) achieves the same split.
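Since the question mentions python3, the same extraction can also be done client-side before indexing; a sketch (the function name is illustrative):

```python
from datetime import datetime

def extract_date(time_stamp):
    """Split off the date part of 'dd-MM-yyyy HH:mm:ss', validating it
    against the pattern the data uses (mirrors the Painless split)."""
    date_part = time_stamp.split()[0]
    datetime.strptime(date_part, "%d-%m-%Y")  # raises ValueError if malformed
    return date_part

print(extract_date("13-06-2019 04:44:23"))  # -> 13-06-2019
```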

Elasticsearch: Schema without mapping?

According to Elasticsearch's roadmap, mapping types are going to be completely removed in 7.x.
How are we going to give documents a schema structure without mapping types?
For example, how would we replace this (a mapping type with 3 fields of specific data types)?
PUT twitter
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    }
  }
}
They are going to remove custom types (user in your example) from the mapping, because there is only one type per index now; the rest stays the same:
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    }
  }
}
As you can see, there is no user type anymore. (In 7.x, the type level can be omitted entirely, with properties nested directly under mappings.)
