Elasticsearch memory usage of fielddata - elasticsearch

I created a simple autocomplete using ElasticSearch following the article http://www.bilyachat.com/2015/07/search-like-google-with-elasticsearch.html.
The below is a simple example.
https://gist.github.com/jinkyou/ac92c0d9fc53860b703ac773af03b0da
At first, I didn't set "fielddata": true to the autocomplete property.
Then ES returned an error.
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [autocomplete] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
So, I added "fielddata": true to the property of autocomplete and it works fine.
But after the documents are increased, [fielddata] Data too large, data for [autocomplete] would be larger than limit of [249416908/237.8mb]] error occurred.
Here's the my questions.
Is it a right way to implement 'autocomplete'?
I though it's a common function of search engine, but there is no suggester for it in ES. Am I right?
Is 'fielddata' is necessary for this function?
I think even it's not necessary for the function, I need it to response in short time.
How can I reduce fielddata size? or increase fielddata limit size?
I'm gonna try to decrease max_shingle_size. If there is any other good way, please let me know.
What is the real memory usage of ES
I want to figure out memory consumption following document size, but the result of GET _nodes/stats contains strange things that
{
"os": {
"mem": {
"total_in_bytes": 128922271744,
"free_in_bytes": 2966560768,
"used_in_bytes": 125955710976,
"free_percent": 2,
"used_percent": 98
}
}
}
I just turn on the ES and there is no index but it shows almost memories are in use.
Thank you so much to read this.

Related

Сount how many requests there were to a specific document

I'm using elasticsearch index as a cache table for some kind of search API.
I am currently using the following mapping:
{
"mappings": {
"dynamic": False,
"properties": {
"query_str": {"type": "text"},
"search_results": {
"type": "object",
"enabled": false
},
"query_embedding": {
"type": "dense_vector",
"dims": 768,
},
}
}
The cache search is performed via embedding vector similarity. So if the embedding of the new query is close enough to a cached one, it is considered as a cache hit, and search_results field is returned to the user.
I want to clear cached search results due to their unpopularity among users (i.e. low cache hitrate). Because of that, I need to count how many cache hits (i.e. request hits) there were to each document for a certain period of time (last month for example).
I understand, that I can explicitly add a hit_rate field and update it every time when the new query hits some cashed query, but is there a more elegant way to do this (maybe via some built-in elasticsearch statistic)?
That's not possible. Actually the App Search product has an analytics feature that records the document clicks and uses a different index to do that (also store the search query).

Elastic Beats - Changing the Field Type of Default Fields in Beats Documents?

I'm still fairly new to the Elastic Stack and I'm still not seeing the entire picture from what I'm reading on this topic.
Let's say I'm using the latest versions of Filebeat or Metricbeat for example, and pushing that data to Logstash output, (which is then configured to push to ES). I want an "out of the box" field from one of these beats to have its field type changed (example: change beat.hostname from it's current default "text" type to "keyword"), what is the best place/practice for configuring this? This kind of change is something I would want consistent across multiple hosts running the same Beat.
I wouldn't change any existing fields since Kibana is building a lot of visualizations, dashboards, SIEM,... on the exptected fields + data types.
Instead extend (add, don't change) the default mapping if needed. On top of the default index template, you can add your own and they will be merged. Adding more fields will require some more disk space (and probably memory when loading), but it should be manageable and avoids a lot of drawbacks of other approaches.
Agreed with #xeraa. It is not advised to change the default template since that field might be used in any default visualizations.
Create a new template, you can have multiple templates for the same index pattern. All the mappings will be merged.The order of the merging can be controlled using the order parameter, with lower order being applied first, and higher orders overriding them.
For your case, probably create a multi-field for any field that needs to be changed. Eg: As shown here create a new keyword multifield, then you can refer the new field as
fieldname.raw
.
"properties": {
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
The other answers are correct but I did the below in Dev console to update the message field from text to text & keyword
PUT /index_name/_mapping
{
"properties": {
"message": {
"type": "match_only_text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10000
}
}
}
}
}

Elasticsearch - Making a field aggregatable but not searchable

My elasticsearch data has a large number of fields that I don't need to search by. But I would like to get aggregations like percentiles, median, count, avg. etc. on these fields.
Is there a way to disable searchability of a field but let it still be aggregatable?
Most of the fields are indexed by default and hence make them searchable. If you want to make a field non-searchable all you have to do is set its index param as false and doc_values to true.
As per elastic documentation:
All fields which support doc values have them enabled by default.
So you need not explicitly set "doc_values": true for such fields.
For e.g.
{
"mappings": {
"_doc": {
"properties": {
"only_agg": {
"type": "keyword",
"index": false
}
}
}
}
}
If you try to search on field only_agg in above example, elastic will throw exception with reason as below:
Cannot search on field [only_agg] since it is not indexed.
yeah take a look at doc_value:
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html

Fielddata is disabled on text fields by default

I've encountered a classic problem, hovewer, no page on SO or any other Q&A or forum has helped me.
I need to extract a numerical value of parameter "wsProcessingElapsedTimeMS" out of string, like (where the parameter is contained in the message field):
2018-07-31 07:37:43,740|DEBUG|[ACTIVE] ExecuteThread: '43' for queue:
'weblogic.kernel.Default (self-tuning)'
|LoggerHandler|logMessage|sessionId=9AWTu
wsOperationName=FindBen wsProcessingEndTime=2018-07-31 07:37:43.738
wsProcessingElapsedTimeMS=6 httpStatus=200 outgoingAddress=172.xxx.xxx.xxx
and keep getting error:
"type":"illegal_argument_exception","reason":"Fielddata is disabled on text
fields by default. Set fielddata=true on [message] in order to load fielddata
in memory by uninverting the inverted index. Note that this can however use
significant memory. Alternatively use a keyword field instead."
Point is, I've already did run the query (by Dev Tools in Kibana GUI if that matters) to mark a field as fielddata in following way:
PUT my_index/_mapping/message
{
"message": {
"properties": {
"publisher": {
"type": "text",
"fielddata": true
}
}
}
}
, which returned brief information:
{
"acknowledged": true
}
After which I've tried to rebuilt the index like:
POST _reindex?wait_for_completion=false
{
"source": {
"index": "my_index"
},
"dest": {
"index": "my_index"
}
}
(the ?wait_for_completion=false flag is set because otherwise it was a timeout; theres a lot of data in the system now).
And finally, having performed above steps, I've also tried to relaunch the kibana and elasticsearch services (processes) to force a reindexing (which tool really long).
Also, using the "message.keyword" instead of "message" (as suggested in the official documentation) is not helping - it's just empty in most of cases.
I'm using the Kibana to access the ElasticSearch engine.
ElasticSearch v. 5.6.3
Kibana v. 5.6.3
Logstash v. 5.5.0
Any suggestion will be appreciated, even regarding the use of additional plugins (provided they have a release compliant with above Kibana/ElasticSearch/Logstash versions, as I can't update them to newer right now).
PUT your_index/_mapping/your_type
{
"your_type": {
"properties": {
"publisher": {
"type": "text",
"fielddata": true
}
}
}
}

FIELDDATA Data is too large

I open kibana and do a search and i get the error where shards failed. I looked in the elasticsearch.log file and I saw this error:
org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [#timestamp] would be larger than limit of [622775500/593.9mb]
Is there any way to increase that limit of 593.9mb?
You can try to increase the fielddata circuit breaker limit to 75% (default is 60%) in your elasticsearch.yml config file and restart your cluster:
indices.breaker.fielddata.limit: 75%
Or if you prefer to not restart your cluster you can change the setting dynamically using:
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"indices.breaker.fielddata.limit" : "40%"
}
}'
Give it a try.
I meet this problem,too.
Then i check the fielddata memory.
use below request:
GET /_stats/fielddata?fields=*
the output display:
"logstash-2016.04.02": {
"primaries": {
"fielddata": {
"memory_size_in_bytes": 53009116,
"evictions": 0,
"fields": {
}
}
},
"total": {
"fielddata": {
"memory_size_in_bytes": 53009116,
"evictions": 0,
"fields": {
}
}
}
},
"logstash-2016.04.29": {
"primaries": {
"fielddata": {
"memory_size_in_bytes":0,
"evictions": 0,
"fields": {
}
}
},
"total": {
"fielddata": {
"memory_size_in_bytes":0,
"evictions": 0,
"fields": {
}
}
}
},
you can see my indexes name base datetime, and evictions is all 0. Addition, 2016.04.02 memory is 53009116, but 2016.04.29 is 0, too.
so i can make conclusion, the old data have occupy all memory, so new data cant use it, and then when i make agg query new data , it raise the CircuitBreakingException
you can set config/elasticsearch.yml
indices.fielddata.cache.size: 20%
it make es can evict data when reach the memory limit.
but may be the real solution you should add you memory in furture.and monitor the fielddata memory use is good habits.
more detail: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
Alternative solution for CircuitBreakingException: [FIELDDATA] Data too large error is cleanup the old/unused FIELDDATA cache.
I found out that fielddata.limit been shared across indices, so deleting a cache of an unused indice/field can solve the problem.
curl -X POST "localhost:9200/MY_INDICE/_cache/clear?fields=foo,bar"
For more info https://www.elastic.co/guide/en/elasticsearch/reference/7.x/indices-clearcache.html
I think it is important to understand why this is happening in the first place.
In my case, I had this error because I was running aggregations on "analyzed" fields. In case you really need your string field to be analyzed, you should consider using multifields and make it analyzed for searches and not_analyzed for aggregations.
I ran into this issue the other day. In addition to checking the fielddata memory, I'd also consider checking the JVM and OS memory as well. In my case, the admin forgot to modify the ES_HEAP_SIZE and left it at 1gig.
just use:
ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch
since the default heap is 1G, if your data is big ,you should set it bigger

Resources