Elasticsearch Mapping Custom Property with Script

{
  "mappings": {
    "exam": {
      "properties": {
        "id": {
          "type": "long"
        },
        "score": {
          "type": "integer"
        },
        "custom_score": {
          "type": "integer"
        }
      }
    }
  }
}
I have this mapping. The custom_score is calculated with this script:
if (score >= 0)
    custom_score = score
else
    custom_score = score - 100
Is it possible for Elasticsearch to index this field automatically? I want to use this value for sorting in some queries. Thanks

You can use a transform, but be careful: this feature is deprecated in 2.x and will be removed in ES 5. The only option remaining in ES 5 is to do the transformation in your own client code and index the already-transformed value.
But, for now, using transforms:
{
  "mappings": {
    "exam": {
      "transform": {
        "script": "if (ctx._source['score'].toInteger() >= 0) ctx._source['custom_score'] = ctx._source['score'].toInteger(); else ctx._source['custom_score'] = ctx._source['score'].toInteger() - 100"
      },
      "properties": {
        "id": {
          "type": "long"
        },
        "score": {
          "type": "integer"
        },
        "custom_score": {
          "type": "integer"
        }
      }
    }
  }
}
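Once the transformed value is indexed, you can sort on it like any other field. A minimal sketch (the index name exam_index is a hypothetical placeholder):
GET exam_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "custom_score": { "order": "desc" } }
  ]
}
Keep in mind that a transform only changes what gets indexed; the stored _source is left untouched, so the computed custom_score will not appear in the returned documents.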


How to avoid index explosion in ElasticSearch

I have two docs from the same index that originally look like this (only _source value is shown here)
{
  "id": "3",
  "name": "Foo",
  "property": {
    "schemaId": "guid_of_the_RGB_schema_defined_extenally",
    "value": {
      "R": 255,
      "G": 100,
      "B": 20
    }
  }
}
{
  "id": "2",
  "name": "Bar",
  "property": {
    "schemaId": "guid_of_the_HSL_schema_defined_extenally",
    "value": {
      "H": 255,
      "S": 100,
      "L": 20
    }
  }
}
The schema (used for validation of value) is stored outside of ES, since it has nothing to do with the indexing.
If I don't define a mapping, the value field will be treated as an object mapping, and its subfields will keep growing whenever a new subfield appears.
Currently, Elasticsearch supports the flattened mapping type (https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html) to prevent this explosion in the index. However, it has limited support for searching inner fields due to this restriction: "As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically."
I need to be able to query the index for documents matching a given condition (e.g. B in the range [10, 30]).
So far I have come up with a solution that structures my doc like this:
{
  "id": 4,
  "name": "Boo",
  "property": {
    "guid_of_the_normalized_RGB_schema_defined_extenally": {
      "R": 0.1,
      "G": 0.2,
      "B": 0.5
    }
  }
}
Although it does not solve the mapping explosion, it mitigates some other issues.
My mapping for the property field now looks similar to this:
"property": {
"properties": {
"guid_of_the_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "long"
},
"G": {
"type": "long"
},
"R": {
"type": "long"
}
}
},
"guid_of_the_normalized_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
},
"guid_of_the_HSL_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
}
}
}
}
This solves the case where fields have the same name but different data types.
Can someone suggest a solution that avoids the mapping explosion without suffering from the search limitations of the flattened type?
To avoid mapping explosion, the best solution is to normalize your data better.
You can set "dynamic": "strict" in your mapping; a doc will then be rejected if it contains a field that is not already in the mapping.
After that, you can still add new fields, but you will have to add them explicitly to the mapping first, as sketched below.
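A minimal sketch of both steps (the index name myindex and the field guid_of_a_new_schema are placeholders):
PUT myindex/_mapping
{
  "dynamic": "strict"
}
and later, when a new field is needed, declare it explicitly before indexing:
PUT myindex/_mapping
{
  "properties": {
    "guid_of_a_new_schema": {
      "properties": {
        "R": { "type": "float" }
      }
    }
  }
}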
You can also add an ingest pipeline to clean up and normalize your data before ingestion, for example:
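A hypothetical pipeline using the rename processor to map an external schema guid onto a canonical key (the pipeline name and target key are assumptions):
PUT _ingest/pipeline/normalize_property
{
  "description": "Normalize external schema guids to canonical keys",
  "processors": [
    {
      "rename": {
        "field": "property.guid_of_the_RGB_schema_defined_extenally",
        "target_field": "property.rgb",
        "ignore_missing": true
      }
    }
  ]
}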
If you don't want to, or cannot, reindex:
To make your query easy even if you cannot know the "middle" part of your key, you can use a multi_match with a wildcard.
GET myindex/_search
{
  "query": {
    "multi_match": {
      "query": 0.5,
      "fields": ["property.*.B"]
    }
  }
}
But you will still not be able to sort the way you want.
For ordering on multiple 'unknown' field names without touching the data, you can use a script sort: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
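A sketch of such a script sort, assuming you can enumerate candidate paths in params (the field list here is hypothetical, since the schema guids are defined externally):
GET myindex/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "order": "asc",
      "script": {
        "lang": "painless",
        "source": "for (f in params.candidates) { if (doc.containsKey(f) && doc[f].size() > 0) { return doc[f].value; } } return 0;",
        "params": {
          "candidates": [
            "property.guid_of_the_RGB_schema_defined_extenally.B",
            "property.guid_of_the_normalized_RGB_schema_defined_extenally.B"
          ]
        }
      }
    }
  }
}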
But maybe you could simplify the whole process by adding a dynamic template to your index.
PUT test/_mapping
{
  "dynamic_templates": [
    {
      "unified_red": {
        "path_match": "property.*.R",
        "mapping": {
          "type": "float",
          "copy_to": "unified_color.R"
        }
      }
    },
    {
      "unified_green": {
        "path_match": "property.*.G",
        "mapping": {
          "type": "float",
          "copy_to": "unified_color.G"
        }
      }
    },
    {
      "unified_blue": {
        "path_match": "property.*.B",
        "mapping": {
          "type": "float",
          "copy_to": "unified_color.B"
        }
      }
    }
  ],
  "properties": {
    "unified_color": {
      "properties": {
        "R": {
          "type": "float"
        },
        "G": {
          "type": "float"
        },
        "B": {
          "type": "float"
        }
      }
    }
  }
}
Then you'll be able to query any value with the same query:
GET test/_search
{
  "query": {
    "range": {
      "unified_color.B": {
        "gte": 0.1,
        "lte": 0.6
      }
    }
  }
}
For already existing fields, you'll have to add the copy_to to the mapping yourself, and then run an _update_by_query to populate them.
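Since an _update_by_query with no script simply re-indexes each document in place, it re-applies the mapping and fills in the copy_to targets. A minimal sketch:
POST test/_update_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}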

Elasticsearch remove a field from an object of an array in a dynamically generated index

I'm trying to delete fields from objects inside an array in Elasticsearch. The index was dynamically generated.
This is the mapping:
{
  "mapping": {
    "_doc": {
      "properties": {
        "age": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "result": {
          "properties": {
            "resultid": {
              "type": "long"
            },
            "resultname": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}
This is a document:
{
  "result": [
    {
      "resultid": 69,
      "resultname": "SFO"
    },
    {
      "resultid": 151,
      "resultname": "NYC"
    }
  ],
  "age": 54,
  "name": "Jorge",
  "timestamp": "2020-04-02T16:07:47.292000"
}
My goal is to remove all resultid fields inside result in every document of the index. After the update, the documents should look like this:
{
  "result": [
    {
      "resultname": "SFO"
    },
    {
      "resultname": "NYC"
    }
  ],
  "age": 54,
  "name": "Jorge",
  "timestamp": "2020-04-02T16:07:47.292000"
}
I tried the following Stack Overflow articles, but with no luck:
Remove elements/objects From Array in ElasticSearch Followed by Matching Query
remove objects from array that satisfying the condition in elastic search with javascript api
Delete nested array in elasticsearch
Removing objects from nested fields in ElasticSearch
Hopefully someone can help me find a solution.
You should reindex your index into a new one with the _reindex API, calling a script to remove the fields:
POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-reindex"
  },
  "script": {
    "source": """
      for (int i = 0; i < ctx._source.result.length; i++) {
        ctx._source.result[i].remove("resultid")
      }
    """
  }
}
Afterwards, you can delete your first index:
DELETE my-index
And reindex it back:
POST _reindex
{
  "source": {
    "index": "my-index-reindex"
  },
  "dest": {
    "index": "my-index"
  }
}
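As a quick sanity check before deleting anything, you can compare document counts between the two indices:
GET my-index/_count
GET my-index-reindex/_count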
I combined the answer from Luc E with some of my own knowledge in order to reach a solution without reindexing.
POST INDEXNAME/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed
{
  "script": {
    "source": "for (int i = 0; i < ctx._source.result.length; i++) { ctx._source.result[i].remove(\"resultid\") }"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "result.resultid"
          }
        }
      ]
    }
  }
}
Thanks again Luc!
If your array has more than one copy of the element you want to remove, use this:
ctx._source.some_array.removeIf(tag -> tag == params['c'])
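Wrapped in a full request, that would look like this (a sketch; some_array and the parameter value are hypothetical placeholders):
POST INDEXNAME/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.some_array.removeIf(tag -> tag == params.c)",
    "params": {
      "c": "value_to_remove"
    }
  },
  "query": {
    "exists": {
      "field": "some_array"
    }
  }
}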

elasticsearch reindex nested object's element to keyword

I have an index structured as below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in a new index and just save the creativeId as a new keyword (to be clear: I know I will lose the clicks data, and it is not important). The final new index schema would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each document has exactly one creativeStatistics entry, so there is no ambiguity in selecting a creativeId.
I know it is possible to reindex using Painless scripts, but I don't know how to do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
  "source": {
    "index": "my_old_index"
  },
  "dest": {
    "index": "my_new_index"
  },
  "script": {
    "source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) { ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics') }",
    "lang": "painless"
  }
}
You can also create a pipeline by defining a script processor, as follows:
PUT _ingest/pipeline/my_pipeline
{
  "description": "My pipeline",
  "processors": [
    {
      "script": {
        "source": "for (item in ctx.creativeStatistics) { if (item.creativeId != null) { ctx.creativeId = item.creativeId; } }"
      }
    },
    {
      "remove": {
        "field": "creativeStatistics"
      }
    }
  ]
}
Note that if you have multiple nested objects, it will end up with the last object's creativeId. And it only adds creativeId if the source document has one in its creativeStatistics.
Below is how you can then use the pipeline in a reindex:
POST _reindex
{
  "source": {
    "index": "creativeindex_src"
  },
  "dest": {
    "index": "creativeindex_dest",
    "pipeline": "my_pipeline"
  }
}

Nested query in ElasticSearch - two levels

I have the following mapping:
"c_index": {
"aliases": {},
"mappings": {
"an": {
"properties": {
"id": {
"type": "string"
},
"sm": {
"type": "nested",
"properties": {
"cr": {
"type": "nested",
"properties": {
"c": {
"type": "string"
},
"e": {
"type": "long"
},
"id": {
"type": "string"
},
"s": {
"type": "long"
}
}
},
"id": {
"type": "string"
}
}
}
}
}
}
And I need a query that gives me all the cr's where:
an.id == x and sm.id == y
I tried with:
{"query":{"bool":{"should":[{"terms": {"_id": ["x"]}},
{"nested":{"path": "sm","query":{
"match": {"sm.id":"y"}}}}]}}}
But it runs very slowly and returns more info than I need.
What's the most efficient way to do that? Thank you!
You don't need a nested query here. Also, use filter instead of should if you want to find documents matching all the queries. (The exception would be if you wanted a query to affect the score, like a match query, which is not the case here; then you could use should together with the minimum_should_match option.)
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "_id": "x" } },
        { "term": { "sm.id": "y" } }
      ]
    }
  }
}
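For completeness, the should + minimum_should_match variant mentioned above would look like this (a sketch; here the clauses contribute to scoring instead of acting as cached filters):
{
  "query": {
    "bool": {
      "should": [
        { "term": { "_id": "x" } },
        { "match": { "sm.id": "y" } }
      ],
      "minimum_should_match": 2
    }
  }
}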

Elastic search range not working for double indexed as long

I have mapped a field as long, but the input data is decimal (100.123).
I've tried every kind of range search and it doesn't work. I've verified that the data is in the proper index, and I can find the documents if I search with missing/exists.
Range query:
"range": {
"nr_val": {
"from": 123,
"to": 1234
}
}
Is Elasticsearch just ignoring the values, treating them as strings in a range search?
So in my situation, what can I do to make a range search from:100, to:200 work for 100.123, other than a full dump and re-import? Are there any conversion options available?
Update with detailed specs
{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1447858537098",
      "number_of_shards": "5",
      "uuid": "iiPzQXasQadvnDF1da8oMw",
      "version": {
        "created": "1070299"
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "mongo_doc": {
      "properties": {
        "parent": {
          "type": "string"
        },
        "data.current.specs.nr._nrm_val": {
          "type": "double"
        },
        "data.current.specs.nr_b._nrm_val": {
          "type": "double"
        },
        "data": {
          "properties": {
            "current": {
              "properties": {
                "specs": {
                  "properties": {
                    "nr": {
                      "properties": {
                        "_nrm_val": {
                          "type": "double"
                        }
                      }
                    },
                    "nr_b": {
                      "properties": {
                        "_nrm_val": {
                          "type": "long"
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "aliases": []
}
It seems the mapping is not quite right... I switched to the ['data']['properties']['current']['properties'](...) notation.
In your case, that field should have been double, not long. The indexed value for 100.123 is 100, and you lose the decimals.
At this point, other than re-indexing (which is the ideal fix), scripted filtering will probably do it:
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_source['nr'] >= param1 && _source['nr'] <= param2",
          "params": {
            "param1": 100,
            "param2": 200
          }
        }
      }
    }
  }
}
but it will be expensive because of the _source loading.
