Exceeding maximum length of field in Elasticsearch - error in Kibana

Discover: The length of [message] field of [-CSnZmwB_xkQcDCOrP1V] doc of [prod_logs] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!
I get the above error in Kibana. I am using ELK version 7.2.0. Answers/suggestions are most welcome.

You should change your mapping. Since the mapping of an existing field cannot be updated, create a temporary new index and add term_vector to your big text field:
"mappings": {
  "properties": {
    "sample_field": {
      "type": "text",
      "term_vector": "with_positions_offsets"
    }
  }
}
Then clone your data into the new index:
POST /_reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
Then use the "unified" type in your highlight query, like this:
"highlight": {
  "fields": {
    "textString": {
      "type": "unified"
    }
  }
}
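If reindexing is not an option, the error message itself points at a quicker (but more memory-hungry) workaround: raising the index-level limit it mentions. A sketch, assuming the prod_logs index from the error and an arbitrary new limit of 2000000 characters:

```
PUT /prod_logs/_settings
{
  "index.highlight.max_analyzed_offset": 2000000
}
```

The limit exists to protect the cluster: highlighting very large texts without offsets or term vectors is expensive, which is why the term_vector approach above is the recommended fix.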

Related

Changing mapping fields structure flow in Elasticsearch

I have an index with the following mappings:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "location": {
        "type": "keyword"
      }
    }
  }
}
At the moment we store the city name in the location field.
We need to change the mapping structure to also store the country and state, so the mapping will be:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "location": {
        "properties": {
          "country": {
            "type": "keyword"
          },
          "state": {
            "type": "keyword"
          },
          "city": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
What is the recommended flow for such migration?
Elasticsearch does not allow changing the mapping definition of existing fields; only new field definitions can be added, as the documentation explains.
So one possibility is:
create a new field definition, with a different name obviously, to store the new data type
stop using the location field
The other, more costly, possibility is:
create a new index with the right mapping
reindex the data from the old index to the new index
To reindex the data from the old index with the right format to the new index you can use a painless script:
POST /_reindex
{
  "source": {
    "index": "old_index_name"
  },
  "dest": {
    "index": "new_index_name"
  },
  "script": {
    "lang": "painless",
    "params": {
      "location": {
        "country": null,
        "state": null,
        "city": null
      }
    },
    "source": """
      params.location.city = ctx._source.location;
      ctx._source.location = params.location;
    """
  }
}
Afterwards you can update the country and state fields for the old data.
If you need the same index name, use the new index you created with the correct mapping only as a backup: delete the index with the old mapping, recreate it with the same name and the correct mapping, and bring the data back from the backup index.
For more about changing mappings, read Change Elasticsearch Mapping.
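Backfilling the country and state for the old data can then be done with _update_by_query; a sketch, with hypothetical values ("US", "CO") for documents whose city is "vail":

```
POST /new_index_name/_update_by_query
{
  "query": {
    "term": { "location.city": "vail" }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.location.country = 'US'; ctx._source.location.state = 'CO'"
  }
}
```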
Follow these steps:
Create a new index
Reindex the existing index to populate the new index
Aliases can help cut over from one index to another
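The alias cutover in the last step can be sketched as a single _aliases call (the alias name my_data is an assumption):

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "my_data" } },
    { "add": { "index": "new_index", "alias": "my_data" } }
  ]
}
```

Because both actions are applied atomically, clients querying my_data never see a moment with no index behind the alias.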

What is the correct setup for ElasticSearch 7.6.2 highlighting with FVH?

How do I properly set up highlighting of search words in huge documents using the fast vector highlighter?
I've tried the documentation and the following settings for the index (as a Python literal; commented lines are alternative settings which I also tried, with store and without):
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "members": {
      "dynamic": "strict",
      "properties": {
        "url": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          #"index_options": "offsets",
          "store": True
        },
        "title": {
          "type": "text",
          #"index_options": "offsets",
          "term_vector": "with_positions_offsets",
          "store": True
        },
        "content": {
          "type": "text",
          #"index_options": "offsets",
          "term_vector": "with_positions_offsets",
          "store": True
        }
      }
    }
  }
}
Search is done with the following query (again, commented lines were tried one by one, in some combinations):
{
  "query": {
    "multi_match": {
      "query": term,
      "fields": ["url", "title", "content"]
    },
  },
  "_source": {
    #"includes": ["url", "title", "_id"],
    #"excludes": ["content"]
  },
  "highlight": {
    "number_of_fragments": 40,
    "fragment_size": 80,
    "fields": {
      "content": {"matched_fields": ["content"]},
      #"content": {"type": "fvh", "matched_fields": ["content"]},
      #"title": {"type": "fvh", "matched_fields": ["title"]},
    }
  }
}
The problem is that when FVH is not used, Elasticsearch complains that the "content" field is too large (and I do not want to increase the allowed size). When I add the "fvh" type, ES complains that term vectors are needed, even though I've checked they are there by querying the document info (offsets, starts, etc.):
the field [content] should be indexed with term vector with position
offsets to be used with fast vector highlighter
It seems like:
When I omit "type": "fvh", it is not used, even though the documentation says it's the default when "term_vector": "with_positions_offsets".
I can see term vectors in the index, but ES does not find them (indirectly: when indexing with term vectors, the index is almost twice as large).
All the trials included removing the old index and creating it again.
It's also treacherous in that it fails only when a large document is encountered; highlights are there for queries where the documents are small.
What is the proper way to set up highlighting in Elasticsearch 7, free edition (I tried under Ubuntu with the binary deb from the vendor)?
The fvh highlighter uses the Lucene Fast Vector Highlighter. It can only be used on fields whose mapping sets term_vector to with_positions_offsets, which increases the size of the index.
You can define a mapping like the one below for your field:
"mappings": {
"properties": {
"text": {
"type": "text",
"term_vector": "with_positions_offsets"
}
}
}
When querying for highlighted fields, you need to use "type": "fvh".
The fast vector highlighter will then be used for the text field because term vectors are enabled.
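A minimal search request using that mapping might look like the following sketch (the index name and query terms are placeholders):

```
GET /my_index/_search
{
  "query": {
    "match": { "text": "search terms" }
  },
  "highlight": {
    "fields": {
      "text": { "type": "fvh" }
    }
  }
}
```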

Using both term and match query on same text field?

I have an index with a text field:
"state": {
  "type": "text"
}
Now suppose there are two documents:
"state": "vail"
and
"state": "eagle vail"
For one of my requirements:
- I need to do a term-level query, such that if I type "vail", the search results should only return states with "vail" and not "eagle vail".
But for another requirement, a different search on the same index:
- I need to do a match query for full-text search, such that if I type "vail", "eagle vail" should be returned as well.
So my question is: how do I do both term-level and full-text search on this field? For a term-level query I would have to set it as "keyword" type so that it won't be analyzed.
You can use the "multi-field" feature to achieve this. Here is a mapping:
{
  "mappings": {
    "my_type": {
      "properties": {
        "state": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
In this case state will act as a text field (tokenized), whereas state.raw will be a keyword (single token). When indexing a document you should only set state; state.raw will be populated automatically.
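With that mapping in place, the two requirements map onto the two sub-fields; a sketch (the index name my_index is an assumption):

```
GET /my_index/_search
{
  "query": {
    "term": { "state.raw": "vail" }
  }
}

GET /my_index/_search
{
  "query": {
    "match": { "state": "vail" }
  }
}
```

The first request matches only documents whose state is exactly "vail"; the second also matches "eagle vail", because the text field is tokenized.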

Why elasticsearch dynamic templates create explicit fields in the mapping?

The document that I want to index is as follows:
{
  "Under Armour": 0.16667,
  "Skechers": 0.14774,
  "Nike": 0.24404,
  "New Balance": 0.11905,
  "SONOMA Goods for Life": 0.11236
}
The fields under this node are dynamic, which means that as documents are added, various fields (brands) will arrive with them.
If I create an index without specifying a mapping, ES says "maximum number of fields (1000) have been reached". Though we can increase this value, it is not good practice.
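For reference, the limit the error refers to is index.mapping.total_fields.limit, which can be raised per index; a sketch (the index name and new value are arbitrary):

```
PUT /my_index/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```

As noted, this only postpones the problem and inflates the cluster state, so it should be a last resort.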
In order to support the above document, I created the following mapping and created an index with it:
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "template1": {
            "match_mapping_type": "double",
            "match": "*",
            "mapping": {
              "type": "float"
            }
          }
        }
      ]
    }
  }
}
When I added the above document to the created index and checked the mapping of the index again, it looked like this:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "dynamic_templates": [
          {
            "template1": {
              "match": "*",
              "match_mapping_type": "double",
              "mapping": {
                "type": "float"
              }
            }
          }
        ],
        "properties": {
          "New Balance": {
            "type": "float"
          },
          "Nike": {
            "type": "float"
          },
          "SONOMA Goods for Life": {
            "type": "float"
          },
          "Skechers": {
            "type": "float"
          },
          "Under Armour": {
            "type": "float"
          }
        }
      }
    }
  }
}
If you compare the mapping I created earlier with the mapping after adding a document, they are different: the fields were added statically to the mapping. As I keep adding more documents, new fields will keep being added to the mapping (which will eventually hit the "maximum number of fields (1000)" limit again).
My questions are:
Is the mapping I mentioned above correct for the document shown?
If it is correct, why are new fields added to the mapping?
According to the posts I have read, increasing the number of fields in an index is not good practice, as it may increase resource usage.
In this case, there is an enormous number of brands and new brands keep being introduced.
The proper solution for such a case is to introduce key-value pairs (probably via a transformation during ETL):
{
  "brands": [
    {
      "key": "Under Armour",
      "value": 0.16667
    },
    {
      "key": "Skechers",
      "value": 0.14774
    },
    {
      "key": "Nike",
      "value": 0.24404
    }
  ]
}
When the data is formatted as above, the mapping won't change.
A good read that I found was
https://www.elastic.co/blog/found-beginner-troubleshooting#keyvalue-woes
Thanks @Val for the suggestion.
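A mapping for the key-value layout above might look like the following sketch; the nested type is an assumption, used so that each key stays paired with its own value in queries:

```
"mappings": {
  "properties": {
    "brands": {
      "type": "nested",
      "properties": {
        "key": { "type": "keyword" },
        "value": { "type": "float" }
      }
    }
  }
}
```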

Kibana visualization not showing analyzed fields

I am working on a Facebook comments dashboard based on the Facebook Graph API, using Elasticsearch 5 and Kibana 5. I added some analyzed fields and they appear in the Discover part of Kibana, but when going to the visualization part I don't find those fields.
My Facebook comments index:
PUT fb_comments
{
  "settings": {
    "analysis": {},
    "mapping.ignore_malformed": true
  },
  "mappings": {
    "fb_comment": {
      "dynamic_templates": [
        {
          "created_time": {
            "match": "created_time",
            "mapping": {
              "type": "date",
              "format": "epoch_second"
            }
          }
        },
        {
          "message": {
            "match": "message",
            "mapping": {
              "type": "string",
              "analyzer": "simple"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
The analyzed message field appears in Discover.
The analyzed message field does not appear in the visualization part.
I think it might be related to a memory limitation. As per the Kibana 5 help, analyzed fields might require more memory.
I checked my memory and it is indeed used at max capacity.
I finally found the solution.
In Elasticsearch 2.x we had the string type, and you then specified an analyzer if you wished the field to be analyzed. In Elasticsearch 5.x there are two types: keyword, which can be aggregated but is not analyzed, and text, which is automatically analyzed but not aggregatable. If you want a field that is both analyzed and aggregatable, add the property "fielddata": true and it will be both.
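Applied to the index above, the fix might look like the following sketch (Elasticsearch 5.x mapping syntax; field and analyzer names taken from the question). Since an existing field's type cannot be changed in place, this would go into the mapping of a freshly created index:

```
PUT fb_comments/_mapping/fb_comment
{
  "properties": {
    "message": {
      "type": "text",
      "analyzer": "simple",
      "fielddata": true
    }
  }
}
```

Be aware that fielddata is loaded into heap memory, which ties back to the memory pressure mentioned in the other answer.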
