Why do Elasticsearch dynamic templates create explicit fields in the mapping?

The document that I want to index is as follows:
{
  "Under Armour": 0.16667,
  "Skechers": 0.14774,
  "Nike": 0.24404,
  "New Balance": 0.11905,
  "SONOMA Goods for Life": 0.11236
}
The fields in this document are dynamic: as documents are added, various new fields (brands) arrive with them.
If I create the index without specifying a mapping, ES eventually complains that the maximum number of fields (1000) has been reached. Although we can increase this limit, doing so is not good practice.
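For reference, this limit is controlled by the index.mapping.total_fields.limit index setting; a minimal sketch of raising it, assuming an index named my_index (shown only for completeness, not as a recommendation):
PUT my_index/_settings
{
  "index.mapping.total_fields.limit": 2000
}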
To support the above document, I created an index with the following mapping:
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "template1": {
            "match_mapping_type": "double",
            "match": "*",
            "mapping": {
              "type": "float"
            }
          }
        }
      ]
    }
  }
}
When I added the above document to the index and checked its mapping again, it looked like this:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "dynamic_templates": [
          {
            "template1": {
              "match": "*",
              "match_mapping_type": "double",
              "mapping": {
                "type": "float"
              }
            }
          }
        ],
        "properties": {
          "New Balance": {
            "type": "float"
          },
          "Nike": {
            "type": "float"
          },
          "SONOMA Goods for Life": {
            "type": "float"
          },
          "Skechers": {
            "type": "float"
          },
          "Under Armour": {
            "type": "float"
          }
        }
      }
    }
  }
}
If you look closely, the mapping I created earlier and the mapping after adding a document are different: the matched fields were added to the mapping as explicit, static fields. As I keep adding documents, new fields keep being added to the mapping, which will again end with the maximum number of fields (1000) being reached.
My questions are:
Is the mapping above correct for the document shown?
If it is correct, why are new fields added to the mapping?

According to the posts I have read, growing the number of fields in an index is not good practice, as it may increase resource usage.
In this case, there is an enormous number of brands, and new brands keep being introduced.
The proper solution for such a case is to introduce key-value pairs (which probably means doing a transformation during ETL):
{
  "brands": [
    {
      "key": "Under Armour",
      "value": 0.16667
    },
    {
      "key": "Skechers",
      "value": 0.14774
    },
    {
      "key": "Nike",
      "value": 0.24404
    }
  ]
}
When the data is formatted as above, the mapping won't change.
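A minimal sketch of the corresponding mapping, assuming a recent, typeless Elasticsearch version; brands is declared nested so that each key/value pair is indexed and queried as a unit:
PUT my_index
{
  "mappings": {
    "properties": {
      "brands": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "value": { "type": "float" }
        }
      }
    }
  }
}
With this layout, the mapping stays fixed at two inner fields no matter how many brands show up.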
A good read that I found was:
https://www.elastic.co/blog/found-beginner-troubleshooting#keyvalue-woes
Thanks @Val for the suggestion.

Related

Can anyone help me - how to use arrays in OpenSearch?

I indexed an object with some fields, and I want to figure out how to map the index so that it handles and displays the values the way Elasticsearch does. I don't know why OpenSearch separates the values into individual fields. Both applications have the same index mappings, but the display is different for some reason.
I tried setting the object type to nested, but nothing changed:
PUT test
{
  "mappings": {
    "properties": {
      "szemelyek": {
        "type": "nested",
        "properties": {
          "szam": {
            "type": "integer"
          },
          "nev": {
            "type": "text"
          }
        }
      }
    }
  }
}
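As an aside, nested fields are matched with a nested query in both Elasticsearch and OpenSearch; a minimal sketch against the mapping above (the value "Anna" is assumed purely for illustration):
GET test/_search
{
  "query": {
    "nested": {
      "path": "szemelyek",
      "query": {
        "match": { "szemelyek.nev": "Anna" }
      }
    }
  }
}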

Changing mapping fields structure flow in Elasticsearch

I have an index with the following mappings:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "location": {
        "type": "keyword"
      }
    }
  }
}
At the moment we store the city name in the location field.
We need to change the mapping structure to also store the country and state, so the mapping will become:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "location": {
        "properties": {
          "country": {
            "type": "keyword"
          },
          "state": {
            "type": "keyword"
          },
          "city": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
What is the recommended flow for such a migration?
Elasticsearch does not allow changing the mapping definition of existing fields, only the addition of new field definitions, as you can check here.
So one of the possibilities is:
Create a new field definition, with a different name obviously, to store the new data type (see the sketch after this list).
Stop using the location field.
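A minimal sketch of that first option, assuming a typeless (7.x+) cluster and a hypothetical new field name location_details added alongside the existing field:
PUT old_index_name/_mapping
{
  "properties": {
    "location_details": {
      "properties": {
        "country": { "type": "keyword" },
        "state": { "type": "keyword" },
        "city": { "type": "keyword" }
      }
    }
  }
}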
The other, more costly possibility is:
Create a new index with the right mapping.
Reindex the data from the old index to the new index.
To reindex the data from the old index into the new index in the right format, you can use a Painless script:
POST /_reindex
{
  "source": {
    "index": "old_index_name"
  },
  "dest": {
    "index": "new_index_name"
  },
  "script": {
    "lang": "painless",
    "params": {
      "location": {
        "country": null,
        "state": null,
        "city": null
      }
    },
    "source": """
      params.location.city = ctx._source.location;
      ctx._source.location = params.location;
    """
  }
}
Afterwards, you can update the country and state fields for the old data.
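A minimal sketch of such a follow-up update, assuming the city "london" and hypothetical country/state values; both the query term and the values being set are purely illustrative:
POST new_index_name/_update_by_query
{
  "query": {
    "term": { "location.city": "london" }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.location.country = 'uk'; ctx._source.location.state = 'greater london'"
  }
}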
If you need the same index name, use the new index you created with the correct mapping just as a backup: delete the index with the old mapping, recreate it with the same name using the correct mapping, and then bring the data back from the backup index.
For more about changing the mapping, read CHANGE ELASTIC SEARCH MAPPING.
Follow these steps:
Create a new index.
Reindex the existing index to populate the new index.
Aliases can help cut over from one index to another (a sketch follows below).
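A minimal sketch of the alias cutover, assuming applications query through a hypothetical alias my_alias; both actions are applied atomically, so there is no window where the alias points to neither index:
POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index_name", "alias": "my_alias" } },
    { "add": { "index": "new_index_name", "alias": "my_alias" } }
  ]
}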

Elasticsearch searchable synthetic fields

Provided that a source document (JSON) contains a couple of fields named a and b, both of type long, I would like to construct a synthetic field (e.g. c) by concatenating their values with an underscore and index it as a keyword.
That is, I am looking for a feature that could be supported with an imaginary, partial mapping like this:
...
"a": { "type": "long" },
"b": { "type": "long" },
"c": {
  "type": "keyword",
  "expression": "${a}_${b}"
},
...
NOTE: The mapping above was made up just for the sake of the example. It is NOT valid!
So what I am looking for is whether there is a feature in Elasticsearch, or a recipe or hint, to support this requirement. The field need not be present in _source; it just needs to be searchable.
There are two steps to this: a dynamic mapping and an ingest pipeline.
I'm assuming your field c is non-trivial, so you may want to match that field in a dynamic template using match and assign the keyword mapping to it:
PUT synthetic
{
  "mappings": {
    "dynamic_templates": [
      {
        "c_like_field": {
          "match_mapping_type": "string",
          "match": "c*",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ],
    "properties": {
      "a": {
        "type": "long"
      },
      "b": {
        "type": "long"
      }
    }
  }
}
Then you can set up a pipeline which'll concatenate your a & b:
PUT _ingest/pipeline/combined_ab
{
  "description": "Concatenates fields a & b",
  "processors": [
    {
      "set": {
        "field": "c",
        "value": "{{_source.a}}_{{_source.b}}"
      }
    }
  ]
}
After ingesting a new doc (with the activated pipeline!)
POST synthetic/_doc?pipeline=combined_ab
{
  "a": 531351351351,
  "b": 251531313213
}
we're good to go:
GET synthetic/_search
yields
{
  "a": 531351351351,
  "b": 251531313213,
  "c": "531351351351_251531313213"
}
Verify w/ GET synthetic/_mapping too.

Elasticsearch indexing homogeneous objects under dynamic keys

The kind of document we want to index and query contains variable keys that are grouped under a common root key, as follows:
{
  "articles": {
    "0000000000000000000000000000000000000001": {
      "crawled_at": "2016-05-18T19:26:47Z",
      "language": "en",
      "tags": [
        "a",
        "b",
        "d"
      ]
    },
    "0000000000000000000000000000000000000002": {
      "crawled_at": "2016-05-18T19:26:47Z",
      "language": "en",
      "tags": [
        "b",
        "c",
        "d"
      ]
    }
  },
  "articles_count": 2
}
We want to be able to ask: which documents contain articles with tags "b" and "d", and with language "en"?
The reason we don't use a list for articles is that Elasticsearch can efficiently and automatically merge documents with partial updates. The challenge, however, is to index the objects that sit under the variable keys.
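For context, this is the kind of partial update the variable-key layout enables; a minimal sketch assuming document id 1 (the new article id and its contents are purely illustrative). The new entry is merged into articles without resending the existing ones:
POST /main/sources/1/_update
{
  "doc": {
    "articles": {
      "0000000000000000000000000000000000000003": {
        "crawled_at": "2016-05-19T08:00:00Z",
        "language": "en",
        "tags": [ "a", "c" ]
      }
    },
    "articles_count": 3
  }
}
One possible way we tried to index these objects is to use dynamic_templates as follows: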
{
  "sources": {
    "dynamic": "strict",
    "dynamic_templates": [
      {
        "article_template": {
          "mapping": {
            "fields": {
              "crawled_at": {
                "format": "dateOptionalTime",
                "type": "date"
              },
              "language": {
                "index": "not_analyzed",
                "type": "string"
              },
              "tags": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "path_match": "articles.*"
        }
      }
    ],
    "properties": {
      "articles": {
        "dynamic": false,
        "type": "object"
      },
      "articles_count": {
        "type": "integer"
      }
    }
  }
}
However, this dynamic template fails: when documents are inserted, the following can be found in the logs:
[2016-05-30 17:44:45,424][WARN ][index.codec] [node] [main] no index mapper found for field: [articles.0000000000000000000000000000000000000001.language] returning default postings format
The same happens for the two other fields. When I try to query for the existence of a certain article, or even of articles itself, it doesn't return any document (no error, but empty hits):
curl -LsS -XGET 'localhost:9200/main/sources/_search' -d '{"query":{"exists":{"field":"articles"}}}'
When I query for the existence of articles_count, it returns everything. Is there a minor error in what we are trying to achieve, for example in the schema: the definition of articles as a property, or in the dynamic template? What about the types and dynamic: false? The path seems correct. Maybe it is not possible to define templates for objects under variable keys, but according to the documentation it should be.
Otherwise, what alternatives are possible, ideally without changing the documents?
Notes: we have other types in the same index main that also have fields like language; I don't know whether that could have an influence. The version of ES we are using is 1.7.5 (we cannot upgrade to 2.x for now).

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but documents of type 'product' are allowed to have the tags 'X' and 'Y', while documents of type 'category' are only allowed to have the tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags, since ES will then look for a product/category field on the documents, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use copy_to, though it probably affects index size:
DELETE /test

PUT /test
{
  "mappings": {
    "product": {
      "properties": {
        "tags": { "type": "string", "copy_to": "ptags" },
        "ptags": { "type": "string" }
      }
    },
    "category": {
      "properties": {
        "tags": { "type": "string", "copy_to": "ctags" },
        "ctags": { "type": "string" }
      }
    }
  }
}
PUT /test/product/1
{ "tags": "X" }

PUT /test/product/2
{ "tags": "Y" }

PUT /test/category/1
{ "tags": "Z" }
And you can query one of the fields, or several of them:
GET /test/product,category/_search
{
  "query": {
    "term": {
      "ptags": {
        "value": "x"
      }
    }
  }
}

GET /test/product,category/_search
{
  "query": {
    "multi_match": {
      "query": "x",
      "fields": [ "ctags", "ptags" ]
    }
  }
}
