How to avoid index explosion in ElasticSearch

How to avoid index explosion in ElasticSearch - elasticsearch

I have two docs from the same index that originally look like this (only _source value is shown here)
{
"id" : "3",
"name": "Foo",
"property":{
"schemaId":"guid_of_the_RGB_schema_defined_extenally",
"value":{
"R":255,
"G":100,
"B":20
}
}
}
{
"id" : "2",
"name": "Bar",
"property":{
"schemaId":"guid_of_the_HSL_schema_defined_extenally",
"value":{
"H":255,
"S":100,
"L":20
}
}
}
The schema(used for validation of value) is stored outside of ES since it has nothing to do with the indexing.
If I don't define mapping, the value field will be consider Object mapping. And its subfield will grow once there is a new subfield.
Currently, ElasticSearch supports Flattened mapping https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html to prevent this explosion in the index. However it has a limited support for searching for inner field due to its restriction: As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically.
I need to be able to query the index to find the document match a given doc (e.g. B in the range [10,30])
So far I come up with a solution that structure my doc like this
{
"id":4,
"name":"Boo",
"property":
{
"guid_of_the_normalized_RGB_schema_defined_extenally":
{
"R":0.1,
"G":0.2,
"B":0.5
}
}
Although it does not solve my issue of the explosion in mapping, it mitigates some other issue.
My mapping now will look similar like this for the field property
"property": {
"properties": {
"guid_of_the_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "long"
},
"G": {
"type": "long"
},
"R": {
"type": "long"
}
}
},
"guid_of_the_normalized_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
},
"guid_of_the_HSL_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
}
}
}
}
This solve the issue with the case where the field have the same name but different data type.
Can someone suggest me a solution that could solve the explosion of indices with out suffering from the limit that the Flattened has in searching?

To avoid mapping explosion, the best solution is to normalize your data better.
You can set "dynamic": "strict", in your mapping, then a doc will be rejected if it contains a field which is not already in the mapping.
After that, you can still add new fields but you will have to add them explicitly in the mapping before.
You can add a pipeline to clean up and normalize your data before ingestion.
If you don't want, or cannot reindex:
To make your query easy even if you can not know the "middle" part of your key, you can use a multimatch with a star.
GET myindex/_search
{
"query": {
"multi_match": {
"query": 0.5,
"fields": ["property.*.B"]
}
}
}
But you will still not be able to sort it as you want.
For ordering on multiple 'unknown' field names without touching the data, you can use a script: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
But maybe you could simplify the whole process by adding a dynamic template to your index.
PUT test/_mapping
{
"dynamic_templates": [
{
"unified_red": {
"path_match": "property.*.R",
"mapping": {
"type": "float",
"copy_to": "unified_color.R"
}
}
},
{
"unified_green": {
"path_match": "property.*.G",
"mapping": {
"type": "float",
"copy_to": "unified_color.G"
}
}
},
{
"unified_blue": {
"path_match": "property.*.B",
"mapping": {
"type": "float",
"copy_to": "unified_color.B"
}
}
}
],
"properties": {
"unified_color": {
"properties": {
"R": {
"type": "float"
},
"G": {
"type": "float"
},
"B": {
"type": "float"
}
}
}
}
}
Then you'll be able to query any value with the same query :
GET test/_search
{
"query": {
"range": {
"unified_color.B": {
"gte": 0.1,
"lte": 0.6
}
}
}
}
For already existing fields, you'll have to add the copy_to by yourself on the mapping, and after that run an _update_by_query to populate them.

Related

How to rename a field in Elasticsearch?

I have an index in Elasticsearch with the following field mapping:
{
"version_data": {
"properties": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"updated_at": {
"type": "date"
},
"updated_by": {
"type": "keyword"
}
}
}
}
I have already created some documents in it and now want to rename version_data field with _version_data.
Is there any way in the Elasticsearch to rename a field within the mapping and in documents?

The closest thing is the alias data type.
In your mapping you could link it from the old to the new name like this:
PUT test/_mapping
{
"properties": {
"_version_data": {
"type": "alias",
"path": "version_data"
}
}
}
BTW I would generally avoid leading underscored since those normally used for internal fields like _id.

Return only top level fields in elasticsearch query?

I have a document that has nested fields. Example:
"mappings": {
"blogpost": {
"properties": {
"title": { "type": "text" },
"body": { "type": "text" },
"comments": {
"type": "nested",
"properties": {
"name": { "type": "text" },
"comment": { "type": "text" },
"age": { "type": "short" },
"stars": { "type": "short" },
"date": { "type": "date" }
}
}
}
}
}
}
Can the query be modified so that the response only contains non-nested fields?
In this example, the response would only contain body and title.
Using _source you can exclude/include fields
GET /blogpost/_search
{
"_source":{
"excludes":["comments"]
}
}
But you have to explicitly put the field names inside exclude, I'm searching for a way to exclude all nested fields without knowing their field name

You can achieve that but in a static way, which means you entered the field(s) name using excludes keyword, like:
GET your_index/_search
{
"_source": {
"excludes": "comments"
},
"query": {
"match_all" : {}
}
}
excludes can take an array of strings; not just one string.

nested terms aggregation on object containing a string field

I like to run a nested terms aggregation on string field which is inside an object.
Usually, I use this query
"terms": {
"field": "fieldname.keyword"
}
to enable fielddata
But I am unable to do that for a nested document like this
{
"nested": {
"path": "objectField"
},
"aggs": {
"allmyaggs": {
"terms": {
"field": "objectField.fieldName.keyword"
}
}
}
}
The above query is just returning an empty buckets array
Is there a way this can be done without enabling field-data by default during index mapping.
Since that will take a large heap memory and I have already loaded a huge data without it
document mapping
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text"
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
sample document
{
"productname": [
{
"productlineseqno": "1.58",
"iscontrolleddrug": "false",
"productlinename": "Consultations",
"productlinedescription": "Consultations",
"isprescribable": "false",
"invoiceitemname": "invoice name"
}
]
}
Fixed
By changing the mapping to enable field data

Nested query is used to access nested fields similarly nested aggregation is needed to aggregation on nested fields
{
"aggs": {
"fieldname": {
"nested": {
"path": "objectField"
},
"aggs": {
"fields": {
"terms": {
"field": "objectField.fieldname.keyword",
"size": 10
}
}
}
}
}
}
EDIT1:
If you are searching for productname.invoiceitemname.keyword then it will give empty bucket as no field exists with that name.
You need to define your mapping like below
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text",
"fields":{ --> note
"keyword":{
"type":"keyword"
}
}
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
Fields
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
When mapping is not explicitly provided, keyword fields are created by default. If you are creating your own mapping(which you need to do for nested type), you need to provide keyword fields in mapping, wherever you intend to use them

Elasticsearch: Schema without mapping?

According to Elasticsearch's roadmap, mapping types are going to be completely removed at 7.x
How are we going to give a schema structure to Documents without mapping?
For example how would we replace this (A Doc/mapping_type with 3 fields of specific data type):
PUT twitter
{
"mappings": {
"user": {
"properties": {
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" }
}
}
}

They are going to remove types (user in you example) from mapping, because there is only 1 type per index now, the rest will be the same:
PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" }
}
}
}
}
As you can see, there is no user type anymore.

Elasticsearch Mapping Custom Propery with Script

{
"mappings": {
"exam": {
"properties": {
"id": {
"type": "long"
},
"score": {
"type": "integer"
},
"custom_score": {
"type": "integer"
}
}
}
}
}
i have tihs mapping. The custom_score is calculcated with this script
if(score >= 0)
custom_score = score
else
custom_score = score-100
Is it possible elasticsearch auto index this field? I want to use this value to make some sortings to some queries. Thanks

You can use a transform but be careful that this feature is deprecated in 2.x and will be removed in ES 5. The only options remaining for ES 5 is to do the transformation in your own client code and index the value already changed accordingly.
But, for now, using transforms:
{
"mappings": {
"exam": {
"transform": {
"script": "if (ctx._source['score'].toInteger()>=0) ctx._source['custom_score'] = ctx._source['score'].toInteger(); else ctx._source['custom_score'] = ctx._source['score'].toInteger()-100"
},
"properties": {
"id": {
"type": "long"
},
"score": {
"type": "integer"
},
"custom_score": {
"type": "integer"
}
}
}
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to avoid index explosion in ElasticSearch - elasticsearch

Related

How to rename a field in Elasticsearch?

Return only top level fields in elasticsearch query?

nested terms aggregation on object containing a string field

Elasticsearch: Schema without mapping?

Elasticsearch Mapping Custom Propery with Script

Categories

Resources