How exactly do mapped fields in Elasticsearch work?

The documentation is sparse and not entirely helpful. So say I have the following fields for my attribute:
{
"my_index": {
"mappings": {
"my_type": {
"my_attribute": {
"mapping": {
"my_attribute": {
"type": "string",
"analyzer": "my_analyzer",
"fields": {
"lowercased": {
"type": "string"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
my_analyzer lowercases tokens (in addition to other stuff).
So now I would like to know if the following statements are true:
my_analyzer does not get applied to raw, because the not_analyzed index does not have any analyzers, as its name implies.
my_attribute and my_attribute.lowercased are the exact same, so it is redundant to have the field my_attribute.lowercased

Your first statement is correct, but the second is not. my_attribute and my_attribute.lowercased are not necessarily the same. The former uses your custom my_analyzer as both its index and search analyzer. my_attribute.lowercased uses the standard analyzer, because no analyzer is specified for it and the standard one kicks in by default.
Besides, your mapping is not correct as written. It should look like this:
{
"mappings": {
"my_type": {
"properties": {
"my_attribute": {
"type": "string",
"analyzer": "my_analyzer",
"fields": {
"lowercased": {
"type": "string"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
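To make the difference between the main field and its sub-fields concrete, here is a rough Python simulation of the terms each field would receive for a single source value. It is a sketch under stated assumptions. `standard_like` only approximates the standard analyzer. `my_analyzer` is a hypothetical stand-in, since its real definition is not shown: it splits on whitespace, lowercases, and drops "the" as an example of the "other stuff".

```python
import re

# Rough approximation of the "standard" analyzer: split on runs of
# non-word characters and lowercase. The real analyzer uses Unicode
# text segmentation, so this is only illustrative.
def standard_like(text):
    return [t.lower() for t in re.findall(r"\w+", text)]

# Hypothetical stand-in for "my_analyzer": whitespace split, lowercase,
# plus dropping "the" as an example of "other stuff".
def my_analyzer(text):
    return [t.lower() for t in text.split() if t.lower() != "the"]

# not_analyzed: the whole value is indexed as one untouched term.
def not_analyzed(text):
    return [text]

# One source value fans out to three independently analyzed fields.
def index_multi_field(value):
    return {
        "my_attribute": my_analyzer(value),
        "my_attribute.lowercased": standard_like(value),
        "my_attribute.raw": not_analyzed(value),
    }

print(index_multi_field("The Quick-Fox"))
```

Even though both analyzers lowercase, the two fields end up with different terms: `['quick-fox']` versus `['the', 'quick', 'fox']`. Meanwhile, `raw` keeps the value verbatim.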

Related

How to create a custom reusable type in ElasticSearch?

My JSON for the Elasticsearch schema looks like this:
{
"mappings": {
"properties": {
"DESCRIPTION_FR": {
"type": "text",
"analyzer": "french"
},
"FEEDBACK_FR": {
"type": "text",
"analyzer": "french"
},
"SOURCE_FR": {
"type": "text",
"analyzer": "french"
}
}
}
}
There are 100 properties like this. Replicating a change across all of them with this approach is redundant and error-prone.
Is there a way in Elasticsearch 7.2 to write a custom data type and reuse it in property mappings?
{
"settings": {
//definition of custom type "text_fr"
},
"mappings": {
"properties": {
"DESCRIPTION_FR": {
"type": "text_fr"
},
"FEEDBACK_FR": {
"type": "text_fr"
},
"SOURCE_FR": {
"type": "text_fr"
}
}
}
}
Yes! What you're after is dynamic templates, and more specifically their match parameter.
Define the target field names with a leading wildcard:
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"is_french_text": {
"match_mapping_type": "*",
"match": "*_FR",
"mapping": {
"type": "text",
"analyzer": "french"
}
}
}
]
}
}
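As a quick sanity check of which field names the `*_FR` pattern would catch, here is a minimal Python model of the template. It relies on an assumption: Elasticsearch's `match` is treated as a simple, case-sensitive `*` wildcard. `fnmatchcase` also understands `?` and `[...]`, which Elasticsearch does not. The sketch ignores `match_mapping_type`, because `"*"` accepts any detected type.

```python
from fnmatch import fnmatchcase

# Simplified model of the "is_french_text" template above.
# Assumption: "match" is a simple, case-sensitive "*" wildcard.
TEMPLATE = {
    "match": "*_FR",
    "mapping": {"type": "text", "analyzer": "french"},
}

def mapping_for(field_name):
    """Return the template mapping if the name matches, else None
    (None meaning: the default dynamic mapping would apply)."""
    if fnmatchcase(field_name, TEMPLATE["match"]):
        return TEMPLATE["mapping"]
    return None

for name in ["DESCRIPTION_FR", "SOURCE_FR", "description_fr", "TITLE_EN"]:
    print(name, "->", mapping_for(name))
```

Under that assumption, a lowercase `description_fr` would not pick up the french analyzer.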
Insert a doc:
POST my_index/_doc
{
"DESCRIPTION_FR": "je",
"FEEDBACK_FR": "oui",
"SOURCE_FR": "je ne sais quoi"
}
Verify the dynamically generated mapping:
GET my_index/_mapping
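Dynamic templates only apply to fields that are not yet mapped when they first appear in a document. If you would rather keep an explicit mapping, a different technique (not part of the answer above) is to generate the repetitive JSON on the client side. A minimal sketch follows. `TEXT_FR` and `build_mapping` are hypothetical names, and the resulting body is assumed to be sent with `PUT my_index`.

```python
import json

# Hypothetical client-side "reusable type": define the shared field
# definition once and apply it to every field name.
TEXT_FR = {"type": "text", "analyzer": "french"}

def build_mapping(field_names, field_def):
    # dict(field_def) gives each property its own copy of the definition
    return {"mappings": {"properties": {n: dict(field_def) for n in field_names}}}

body = build_mapping(["DESCRIPTION_FR", "FEEDBACK_FR", "SOURCE_FR"], TEXT_FR)
print(json.dumps(body, indent=2))
```

To change the analyzer, you edit `TEXT_FR` in one place and re-create the index with the regenerated body.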

Elasticsearch fielddata vs. fields mapping

Can somebody please explain to me the difference between setting fielddata and using fields when mapping in Elasticsearch?
For example, what is the difference between these two snippets:
PUT my_index
{
"mappings": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword" // for ordering
}
}
}
}
}
}
and
PUT my_index/_mapping
{
"properties": {
"my_field": {
"type": "text",
"fielddata": true // what is the difference?
}
}
}
Or can you tell me whether this code makes any sense?
PUT my_index
{
"mappings": {
"properties": {
"my_field": {
"type": "text",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword" // for ordering
}
}
}
}
}
}
Since the main intent is sorting and aggregations, definitely use the first option, i.e. the keyword (sub-)field.
fielddata is the old-fashioned way of doing it. It is loaded into the JVM heap at query time and eats up a lot more memory. A keyword field instead relies on on-disk doc_values by default.
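The two options are also not interchangeable in what they return. fielddata on a text field exposes the individual analyzed terms. The keyword sub-field exposes the whole original value. The small Python simulation below shows what a terms aggregation would bucket in each case. It assumes a crude stand-in for the standard analyzer (lowercase plus whitespace split). The counts are documents per term, as a terms aggregation reports them.

```python
from collections import Counter

docs = ["New York", "new york city"]

# Assumption: crude stand-in for the standard analyzer. With
# fielddata=true, a terms aggregation buckets these analyzed terms.
def text_terms(value):
    return set(value.lower().split())

# A keyword (sub-)field indexes the whole value as a single term.
def keyword_terms(value):
    return {value}

# Terms-aggregation-like counts: number of documents containing each term.
fielddata_buckets = Counter(t for d in docs for t in text_terms(d))
keyword_buckets = Counter(t for d in docs for t in keyword_terms(d))

print(fielddata_buckets)
print(keyword_buckets)
```

Sorting or aggregating on `my_field` with fielddata therefore works on words like `new` and `york`. Using `my_field.keyword` works on the full strings. That is usually what "ordering" means.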

Kibana keyword occurrences across documents

I have been unable to show word occurrences in Kibana, across the documents in the index, for a full-text field mapped as "type": "keyword".
My first attempt involved using an analyzer. However, I have been unable to change the document in any way: the index mapping reflects the analyzer, but no field reflects the analysis.
This is the simplified mapping:
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"analyzed": {
"type": "text",
"analyzer": "rebuilt"
}
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"rebuilt": {
"tokenizer": "standard"
}
}
},
"index.mapping.ignore_malformed": true,
"index.mapping.total_fields.limit": 2000
}
}
but I'm still unable to see the array of words that I expect to be saved under the text.analyzed field. In fact, that field does not exist, and I'm wondering why.
It seems that setting fielddata=true, in spite of being heavily discouraged, solved my problem (at least for now). It allows me to visualize in Kibana the occurrence (or absolute frequency) of each word in the text field across documents.
The final version of the proposed simplified mapping therefore became:
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "text",
"analyzer": "rebuilt",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"rebuilt": {
"tokenizer": "standard"
}
}
},
"index.mapping.ignore_malformed": true,
"index.mapping.total_fields.limit": 2000
}
}
Getting rid of the useless analyzed field.
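For reference, a word-frequency chart like this boils down to a terms aggregation on the analyzed `text` field. This is an assumption about what Kibana issues, simplified here. Below is a minimal Python sketch that builds such a request body and reads the buckets back. The function names and the sample response are hypothetical, and the body would be sent to `POST <your_index>/_search`. Note that each bucket's `doc_count` is the number of documents containing the word, not the total number of times it occurs.

```python
import json

# Terms aggregation over the analyzed "text" field. This only works
# because "text" has fielddata enabled in the mapping above.
def word_count_request(field="text", size=20):
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {"words": {"terms": {"field": field, "size": size}}},
    }

def buckets_to_dict(response):
    # doc_count = number of documents containing the word,
    # not the total number of occurrences.
    return {b["key"]: b["doc_count"]
            for b in response["aggregations"]["words"]["buckets"]}

print(json.dumps(word_count_request()))

# Hypothetical response, trimmed to the relevant part.
sample = {"aggregations": {"words": {"buckets": [
    {"key": "hello", "doc_count": 3},
    {"key": "world", "doc_count": 1},
]}}}
print(buckets_to_dict(sample))
```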
I still have to check Kibana's performance. If someone has a performance-safe solution to this problem, please do not hesitate to share it.
Thanks.

Set fields to not_analyzed in all (future) types under an index in Elasticsearch 1.7

I have an index with quite a few types, and the creation of new types is not controlled by me. I know that the data structure is pretty solid, but I don't know the types' names in advance.
I would like to set some fields as not_analyzed, while others should be analyzed. Is there a way to achieve this?
I would also add to Val's excellent answer that you probably want to add these dynamic templates to the _default_ mapping for your index, since you mentioned you do not know the types in advance. For example:
PUT /my_index/_mapping/_default_
{
"dynamic_templates": [
{
"analyzed": {
"match_mapping_type": "string",
"match": "*_text",
"mapping": {
"type": "string"
}
}
},
{
"not_analyzed": {
"match_mapping_type": "string",
"match": "*_key",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
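To see how the two templates split incoming fields, here is a minimal Python model of template resolution. It rests on an assumption: templates are tried in order, and the first one whose `match` (a simple `*` wildcard) and `match_mapping_type` both fit wins. Fields matching neither fall back to the default dynamic mapping. The type detection is deliberately simplified. Because the templates live in `_default_`, the same resolution would apply whatever type the document is indexed into.

```python
from fnmatch import fnmatchcase

# The two templates from the _default_ mapping above, in the same order.
TEMPLATES = [
    ("analyzed", "string", "*_text", {"type": "string"}),
    ("not_analyzed", "string", "*_key",
     {"type": "string", "index": "not_analyzed"}),
]

def detected_type(value):
    # Simplified JSON type detection, for this example only.
    if isinstance(value, str):
        return "string"
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "object"

def resolve(doc):
    result = {}
    for field, value in doc.items():
        chosen = None
        for name, mtype, pattern, mapping in TEMPLATES:
            if detected_type(value) == mtype and fnmatchcase(field, pattern):
                chosen = (name, mapping)
                break  # first matching template wins
        result[field] = chosen  # None: default dynamic mapping applies
    return result

doc = {"title_text": "Hello there", "user_key": "ABC-123", "count_key": 7, "note": "x"}
print(resolve(doc))
```

Note that `count_key` is not covered, because it is a number rather than a string.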
With that in place, you can add any arbitrary type to the index and any fields in the documents added to the new type that end with "_text" will be analyzed. Any fields ending with "_key" will not be analyzed. You can read more about the default mapping in the docs.
Dynamic templates are the way to go. Since you're mentioning analyzed vs not_analyzed, I reckon you're talking about string fields.
The idea is to update your index and mapping in order to include a dynamic template for your string fields:
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [ // include this section in your existing mapping
{
"analyzed": {
"match_mapping_type": "string",
"match": "field1",
"mapping": {
"type": "string"
}
}
},
{
"not_analyzed": {
"match_mapping_type": "string",
"match": "field2",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Another way would be to make each new string field both analyzed and not_analyzed so you don't have to enumerate all your fields, simply using:
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [ // include this section in your existing mapping
{
"strings": {
"match_mapping_type": "string", // match all string fields
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
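The `ignore_above: 256` on `raw` deserves a note. Very long strings are rarely useful for exact matching or sorting, and Lucene has a maximum term length. So values longer than the limit are simply not indexed in `raw`. They remain in `_source` and in the analyzed main field. Here is a minimal Python sketch of that behavior. It assumes a crude lowercase-and-split stand-in for the standard analyzer, and `index_string` is a hypothetical helper.

```python
# Sketch of the multi-field produced by the "strings" template above.
IGNORE_ABOVE = 256

def index_string(field, value):
    # Crude stand-in for the standard analyzer on the main field.
    terms = {field: value.lower().split()}
    # "raw" keeps the whole value as one term, unless it is longer
    # than ignore_above, in which case nothing is indexed for "raw".
    terms[field + ".raw"] = [value] if len(value) <= IGNORE_ABOVE else []
    return terms

print(index_string("city", "New York"))
print(index_string("body", "x" * 300)["body.raw"])
```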

Elasticsearch: How to make all properties of an object type non-analyzed?

I need to create an Elasticsearch mapping with an Object field whose keys are not known in advance. Also, the values can be integers or strings. But I want the values to be stored as non-analyzed fields if they are strings. I tried the following mapping:
PUT /my_index/_mapping/test
{
"properties": {
"alert_text": {
"type": "object",
"index": "not_analyzed"
}
}
}
Now the index is created fine. But if I insert values like this:
POST /my_index/test
{
"alert_text": {
"1": "hello moto"
}
}
The value "hello moto" is stored as an analyzed field using the standard analyzer. I want it to be stored as a non-analyzed field. Is this possible if I don't know in advance which keys can be present?
Try dynamic templates. With this feature you can configure a set of rules for the fields that are created dynamically.
In this example I've configured the rule that I think you need, i.e., all the string fields within alert_text are not_analyzed:
PUT /my_index
{
"mappings": {
"test": {
"properties": {
"alert_text": {
"type": "object"
}
},
"dynamic_templates": [
{
"alert_text_strings": {
"path_match": "alert_text.*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
POST /my_index/test
{
"alert_text": {
"1": "hello moto"
}
}
After executing the requests above you can execute this query to show the current mapping:
GET /my_index/_mapping
And you will obtain:
{
"my_index": {
"mappings": {
"test": {
"dynamic_templates": [
{
"alert_text_strings": {
"mapping": {
"index": "not_analyzed",
"type": "string"
},
"match_mapping_type": "string",
"path_match": "alert_text.*"
}
}
],
"properties": {
"alert_text": {
"properties": {
"1": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
Where you can see that alert_text.1 is stored as not_analyzed.
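Unlike `match`, `path_match` is tested against the full dotted path of a field, so nested keys like `alert_text.1` are covered whatever their names are. Here is a minimal Python model of that. It assumes a simple `*` wildcard and simplified type checks, and the helper names are hypothetical. Integer values under `alert_text` are not strings, so they would keep the default dynamic mapping.

```python
from fnmatch import fnmatchcase

# Model of the "alert_text_strings" template: path_match is checked
# against the full dotted path, match_mapping_type against the value type.
TEMPLATE = {
    "path_match": "alert_text.*",
    "match_mapping_type": "string",
    "mapping": {"type": "string", "index": "not_analyzed"},
}

def leaf_paths(obj, prefix=""):
    # Flatten nested objects into (dotted_path, value) pairs.
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from leaf_paths(value, path + ".")
        else:
            yield path, value

def resolve(doc):
    out = {}
    for path, value in leaf_paths(doc):
        if isinstance(value, str) and fnmatchcase(path, TEMPLATE["path_match"]):
            out[path] = TEMPLATE["mapping"]
        else:
            out[path] = None  # default dynamic mapping (e.g. long for 42)
    return out

print(resolve({"alert_text": {"1": "hello moto", "2": 42}, "other": "hi"}))
```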
