Use condition in Elasticsearch mapping

I'd like to know whether it's possible to use conditions when defining the mapping of an index, because depending on the value of one field, other fields become required.
For example:
PUT /my-index
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" },
"is_external": {"type": "boolean" }
if [is_external] == true {
"adress": { "type": "text" },
"date": { "type": "date" }
}
}
}
}
If there's no way to do this, how can I deal with it?

It doesn't make sense to do this kind of check at the mapping level. Ultimately your index will contain both documents with is_external: true and documents with is_external: false, so the mapping has to contain the address and date field definitions either way; there can only be one single mapping per index.
If you want to enforce that a document with is_external: true also contains the address and date fields, then you can do this using an ingest pipeline with a drop processor:
...
{
"drop": {
"if" : "ctx.is_external == true && (ctx.date == null || ctx.address == null)"
}
}
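For completeness, here is a sketch of the full pipeline and how to wire it up; the pipeline name check_external and the use of index.default_pipeline are my own choices, not part of the answer above:
PUT _ingest/pipeline/check_external
{
"description": "Drop external documents that are missing address or date",
"processors": [
{
"drop": {
"if": "ctx.is_external == true && (ctx.date == null || ctx.address == null)"
}
}
]
}
PUT my-index/_settings
{
"index.default_pipeline": "check_external"
}
If you'd rather reject such documents with an error instead of silently dropping them, a fail processor can be used with the same condition.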

Related

Elasticsearch searchable synthetic fields

Provided that a source document (JSON) contains a couple of fields named a and b,
both of type long, I would like to construct a synthetic field (e.g. c)
by concatenating the values of those fields with an underscore and
index it as a keyword.
That is, I am looking for a feature that could be supported with an imaginary, partial mapping like this:
...
"a": { "type": "long" },
"b": { "type": "long" },
"c": {
"type": "keyword"
"expression": "${a}_${b}"
},
...
NOTE: The mapping above was made up just for the sake of the example. It is NOT valid!
So what I am looking for is a feature in Elasticsearch, or a recipe or hint, that supports
this requirement. The field need not be kept in _source; it just needs to be searchable.
There are two steps to this: a dynamic mapping and an ingest pipeline.
I'm assuming your field c is non-trivial so you may want to match that field in a dynamic template using a match and assign the keyword mapping to it:
PUT synthetic
{
"mappings": {
"dynamic_templates": [
{
"c_like_field": {
"match_mapping_type": "string",
"match": "c*",
"mapping": {
"type": "keyword"
}
}
}
],
"properties": {
"a": {
"type": "long"
},
"b": {
"type": "long"
}
}
}
}
Then you can set up a pipeline which'll concatenate your a & b:
PUT _ingest/pipeline/combined_ab
{
"description" : "Concatenates fields a & b",
"processors" : [
{
"set" : {
"field": "c",
"value": "{{_source.a}}_{{_source.b}}"
}
}
]
}
After ingesting a new doc (with the activated pipeline!)
POST synthetic/_doc?pipeline=combined_ab
{
"a": 531351351351,
"b": 251531313213
}
we're good to go:
GET synthetic/_search
yields
{
"a":531351351351,
"b":251531313213,
"c":"531351351351_251531313213"
}
Verify w/ GET synthetic/_mapping too.
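Regarding "the field need not be kept in _source": one option (my own sketch, not part of the answer above) is to exclude c from _source in the mapping; the field stays indexed and searchable, but is no longer returned with the document. The ... stand for the dynamic_templates and properties shown earlier:
PUT synthetic
{
"mappings": {
"_source": { "excludes": ["c"] },
"dynamic_templates": [ ... ],
"properties": { ... }
}
}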

Why do Elasticsearch dynamic templates create explicit fields in the mapping?

The document that I want to index is as follows
{
"Under Armour": 0.16667,
"Skechers": 0.14774,
"Nike": 0.24404,
"New Balance": 0.11905,
"SONOMA Goods for Life": 0.11236
}
Fields under this node are dynamic, meaning that as documents are added, various new fields (brands) will arrive with them.
If I create an index without specifying a mapping, ES eventually complains that the "maximum number of fields (1000) has been reached". Though this limit can be raised, doing so is not good practice.
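For reference, the limit can be raised with an index setting; a sketch, with an arbitrary value of 2000:
PUT my_index/_settings
{
"index.mapping.total_fields.limit": 2000
}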
In order to support the above document, I created a mapping as follows and created an index.
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"template1":{
"match_mapping_type": "double",
"match": "*",
"mapping": {
"type": "float"
}
}
}
]
}
}
}
When I add the above document to the index and check the mapping again, it looks like this:
{
"my_index": {
"mappings": {
"my_type": {
"dynamic_templates": [
{
"template1": {
"match": "*",
"match_mapping_type": "double",
"mapping": {
"type": "float"
}
}
}
],
"properties": {
"New Balance": {
"type": "float"
},
"Nike": {
"type": "float"
},
"SONOMA Goods for Life": {
"type": "float"
},
"Skechers": {
"type": "float"
},
"Under Armour": {
"type": "float"
}
}
}
}
}
}
Comparing the mapping I created earlier with the mapping after adding a document, they are clearly different: the fields were added explicitly to the mapping. As I keep adding documents, new fields will keep being added (which will again end up reaching the maximum number of fields (1000)).
My questions are:
Is the mapping I mentioned above correct for the document shown?
If it is correct, why are new fields added to the mapping?
According to the posts I have read, increasing the number of fields in an index is not good practice, as it may increase resource usage. In my case, there is an enormous number of brands, and new brands keep being introduced.
The proper solution for such a case is to introduce key-value pairs (probably requiring a transformation during ETL):
{
"brands": [
{
"key": "Under Armour",
"value": 0.16667
},
{
"key": "Skechers",
"value": 0.14774
},
{
"key": "Nike",
"value": 0.24404
}
]
}
When the data is formatted as above, the mapping won't change.
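A fixed mapping for that key-value shape could look like the sketch below; the nested type is only needed if key and value must be queried together, and keyword assumes ES 5 or later:
{
"mappings": {
"my_type": {
"properties": {
"brands": {
"type": "nested",
"properties": {
"key": { "type": "keyword" },
"value": { "type": "float" }
}
}
}
}
}
}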
A good read that I found:
https://www.elastic.co/blog/found-beginner-troubleshooting#keyvalue-woes
Thanks @Val for the suggestion.

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags, since ES would then look for a product/category field inside the documents, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to", but it probably increases the index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
Then you can query one of the fields, or several at once:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}
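Alternatively, and this is my own sketch rather than part of the answer above, you can scope each condition to its type with a _type filter inside a bool query, which avoids the copy_to duplication:
GET /test/product,category/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "_type": "product" } },
{ "terms": { "tags": [ "x", "y" ] } }
]
}
},
{
"bool": {
"must": [
{ "term": { "_type": "category" } },
{ "term": { "tags": "z" } }
]
}
}
]
}
}
}
The lowercase values match what the standard analyzer indexes for these analyzed string fields.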

ElasticSearch - issue with sub term aggregation with array fields

I have the two following documents:
{
"title":"The Avengers",
"year":2012,
"casting":[
{
"name":"Robert Downey Jr.",
"category":"Actor",
},
{
"name":"Chris Evans",
"category":"Actor",
}
]
}
and:
{
"title":"The Judge",
"year":2014,
"casting":[
{
"name":"Robert Downey Jr.",
"category":"Producer",
},
{
"name":"Robert Duvall",
"category":"Actor",
}
]
}
I would like to perform aggregations, based on two fields : casting.name and casting.category.
I tried a TermsAggregation based on the casting.name field, with a sub-aggregation, which is another TermsAggregation based on the casting.category field.
The problem is that for the "Chris Evans" entry, Elasticsearch creates buckets for ALL categories (Actor, Producer), whereas it should create only one bucket (Actor).
It seems that there is a cartesian product between all casting.category occurrences and all casting.name occurrences.
It behaves like this with array fields (casting), whereas I don't have the problem with simple fields (such as title or year).
I also tried to use nested aggregations, but maybe not properly, as Elasticsearch throws an error saying that casting.category is not a nested field.
Any idea here?
Elasticsearch will flatten the nested objects, so internally you will get:
{
"title":"The Judge",
"year":2014,
"casting.name": ["Robert Downey Jr.","Robert Duvall"],
"casting.category": ["Producer", "Actor"]
}
If you want to keep the relationship you'll need to use either nested objects or a parent/child relationship.
To do a nested mapping you'd need to do something like this:
"mappings": {
"movies": {
"properties": {
"title" : { "type": "string" },
"year" : { "type": "integer" },
"casting": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"category": { "type": "string" }
}
}
}
}
}
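With that mapping in place, a nested terms aggregation with a nested sub-aggregation along these lines should produce one category bucket per actor. A sketch: the index name movies is assumed, and with analyzed string fields you may additionally need not_analyzed sub-fields for the terms aggregations:
GET movies/_search
{
"size": 0,
"aggs": {
"cast": {
"nested": { "path": "casting" },
"aggs": {
"by_name": {
"terms": { "field": "casting.name" },
"aggs": {
"by_category": {
"terms": { "field": "casting.category" }
}
}
}
}
}
}
}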

Elasticsearch AND operator not working

If I run the query below, with AND, it returns nothing:
TF1:"rohan sharma" AND TF3:"gaurav"
but if I use OR, it returns results:
TF1:"rohan sharma" OR TF3:"gaurav"
Both documents are present in the index.
Here is the mapping:
{
"TEST": {
"mappings": {
"indextestField1": {
"_ttl": {
"enabled": true
},
"properties": {
"TF1": {
"type": "string"
},
"TF2": {
"type": "string"
}
}
}
}
}
}
Here is my Java code:
SearchResponse response = client.prepareSearch("TEST")
.setQuery(QueryBuilders.queryString("TF1:\"rohan sharma\" AND TF2:\"milind bagal\""))
.execute()
.actionGet();
Your mapping is probably the problem; either add your analyzer to the fields, or make them not_analyzed:
{
"TEST": {
"mappings": {
"properties": {
"TF1": {
"type": "string",
"index": "not_analyzed"
},
"TF2": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
About the query you provided: you double-quoted "rohan sharma", which makes it a phrase query, so the positions of the two words must be adjacent and in order.
My guess is that in the document you want to find, these two words are not next to each other, so the phrase condition fails and the AND query matches nothing.
So if you don't need the positional criteria, don't use double-quotes and just use:
TF1:rohan AND TF1:sharma AND TF3:gaurav
This is just my assumption; for a more detailed answer, you might need to provide the document you expect the query you provided to match.
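For reference, a sketch of that query through the REST API (index and field names taken from the question; adjust to your setup):
GET TEST/_search
{
"query": {
"query_string": {
"query": "TF1:rohan AND TF1:sharma AND TF3:gaurav"
}
}
}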
