cluster.routing.allocation.enable: none seems not to be working - elasticsearch

I disabled shard allocation with the following snippet:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}
I double-checked with GET _cluster/settings and confirmed it has been set to none.
But when I use the following snippet to move a shard between nodes, the move succeeds.
It looks like "cluster.routing.allocation.enable": "none" doesn't take effect?
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "lib38",
        "shard": 0,
        "from_node": "node-1",
        "to_node": "node-3"
      }
    }
  ]
}

Manually rerouting a shard always takes precedence over configuration.
cluster.routing.allocation.enable only tells the cluster that automatic allocation should not take place; it does not block explicit reroute commands.
In your other question, you were concerned about automatic rebalancing, it seems.
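As a sketch (using the same setting as in the question), you can verify the current value and re-enable automatic allocation once the manual reroutes are done:

```
GET _cluster/settings

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
```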

Related

Adding filter to elasticsearch returns nothing

I have an ElasticSearch 7.9 single-node instance set up with 0 documents and I'm trying to add a filter by following the documentation example. When I try to add a filter by issuing a PUT on index my_web (an index that exists)
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stop" ]
        }
      }
    }
  }
}
I get no response from the server. If I issue a GET on _ml/filters/ it responds showing there are 0 filters.
Do I have badly formed JSON or something? It's pretty frustrating not to receive any response at all...
I had 2 issues.
My application wasn't surfacing exception errors, so Elasticsearch was in fact reporting an error. Thanks to @Val in the comments for pushing me in the right direction on this.
Once I had this resolved, I could see there was a resource_already_exists_exception, because the command was trying to add an index that already existed (I thought it would just replace it).
The solution, found in the Elasticsearch documentation: to add an analyzer, you must close the index, define the analyzer, and reopen the index:
POST /my-index-000001/_close

PUT /my-index-000001/_settings
{
  "analysis": {
    "analyzer": {
      "content": {
        "type": "custom",
        "tokenizer": "whitespace"
      }
    }
  }
}

POST /my-index-000001/_open

Issue setting up ElasticSearch Index Lifecycle policy with pipeline date index name

I'm new to setting up a proper lifecycle policy, so I'm hoping someone can please give me a hand with this. I have an existing index getting created on a weekly basis. This is a third-party integration (they provided me with the pipeline and index template for the incoming logs). Logs are being created weekly in the pattern "name-YYYY-MM-DD". I'm attempting to set up a lifecycle policy for these indexes so they transition from hot->warm->delete. So far, I have done the following:
Updated the index template to add the policy and set an alias:
{
  "index": {
    "lifecycle": {
      "name": "Cloudflare",
      "rollover_alias": "cloudflare"
    },
    "mapping": {
      "ignore_malformed": "true"
    },
    "number_of_shards": "1",
    "number_of_replicas": "1"
  }
}
On the existing indexes, set the alias and which one is the "write" index:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "cloudflare-2020-07-13",
        "alias": "cloudflare",
        "is_write_index": true
      }
    }
  ]
}

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "cloudflare-2020-07-06",
        "alias": "cloudflare",
        "is_write_index": false
      }
    }
  ]
}
Once I did that, I started seeing the following 2 errors (1 on each index):
ILM error #1
ILM error #2
I'm not sure why the "is not the write index" error is showing up on the older index. Perhaps this is because it is still "hot" and trying to move it to another phase without it being the write index?
For the second error, is this because the name of the index is wrong for rollover?
I'm also not clear if this is a good scenario for rollover. These indexes are being created weekly, which I assume is ok. I would think normally you would create a single index and let the policy split off the older ones based upon your criteria (size, age, etc). Should I change this or can I make this policy work with existing weekly files? In case you need it, here is part of the pipeline that I imported into ElasticSearch that I believe is responsible for the index naming:
{
  "date_index_name": {
    "field": "EdgeStartTimestamp",
    "index_name_prefix": "cloudflare-",
    "date_rounding": "w",
    "timezone": "UTC",
    "date_formats": [
      "uuuu-MM-dd'T'HH:mm:ssX",
      "uuuu-MM-dd'T'HH:mm:ss.SSSX",
      "yyyy-MM-dd'T'HH:mm:ssZ",
      "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    ]
  }
},
So, for me at the moment the more important error is the "number_format_exception". I'm thinking it is due to this setting I'm seeing in the index (provided_name):
{
  "settings": {
    "index": {
      "lifecycle": {
        "name": "Cloudflare",
        "rollover_alias": "cloudflare"
      },
      "mapping": {
        "ignore_malformed": "true"
      },
      "number_of_shards": "1",
      "provided_name": "<cloudflare-{2020-07-20||/w{yyyy-MM-dd|UTC}}>",
      "creation_date": "1595203589799",
      "priority": "100",
      "number_of_replicas": "1",
I've been looking for a way to create a fixed index name without the "date_index_name" processor, but I haven't found a way to do this yet. Or, if I can create an index name with a date and add a suffix that would allow the LifeCycle policy manager (ILM) to add the incremental number at the end, that might work as well. Any help here would be greatly appreciated!
The main issue is that the existing indexes do not end with a sequence number (i.e. 0001, 0002, etc), hence the ILM doesn't really know how to proceed.
The name of this index must match the template’s index pattern and end with a number
You'd be better off letting ILM manage the index creation and rollover, since that's exactly what it's supposed to do. All you need to do is to keep writing to the same cloudflare alias and that's it. No need for a date_index_name ingest processor.
So your index template is correct as it is.
Next you need to bootstrap the initial index:
PUT cloudflare-2020-08-11-000001
{
  "aliases": {
    "cloudflare": {
      "is_write_index": true
    }
  }
}
You can then either reindex your old indices into ILM-managed indices or apply lifecycle policies to your old indices.
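For the reindex route, a minimal sketch might look like the following (the source and destination names are taken from the question and the bootstrap step, and are illustrative):

```
POST _reindex
{
  "source": {
    "index": "cloudflare-2020-07-13"
  },
  "dest": {
    "index": "cloudflare-2020-08-11-000001"
  }
}
```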

Fielddata is disabled on text fields by default

I've encountered a classic problem; however, no page on SO or any other Q&A forum has helped me.
I need to extract the numerical value of the parameter wsProcessingElapsedTimeMS out of a string like the following (the parameter is contained in the message field):
2018-07-31 07:37:43,740|DEBUG|[ACTIVE] ExecuteThread: '43' for queue:
'weblogic.kernel.Default (self-tuning)'
|LoggerHandler|logMessage|sessionId=9AWTu
wsOperationName=FindBen wsProcessingEndTime=2018-07-31 07:37:43.738
wsProcessingElapsedTimeMS=6 httpStatus=200 outgoingAddress=172.xxx.xxx.xxx
and keep getting the error:
"type":"illegal_argument_exception","reason":"Fielddata is disabled on text
fields by default. Set fielddata=true on [message] in order to load fielddata
in memory by uninverting the inverted index. Note that this can however use
significant memory. Alternatively use a keyword field instead."
The point is, I already ran the query (via Dev Tools in the Kibana GUI, if that matters) to enable fielddata on the field in the following way:
PUT my_index/_mapping/message
{
  "message": {
    "properties": {
      "publisher": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
which returned a brief confirmation:
{
  "acknowledged": true
}
After that I tried to rebuild the index like this:
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "my_index"
  }
}
(the ?wait_for_completion=false flag is set because otherwise it timed out; there's a lot of data in the system now).
And finally, having performed the above steps, I also tried restarting the Kibana and Elasticsearch services (processes) to force a reindexing (which took really long).
Also, using "message.keyword" instead of "message" (as suggested in the official documentation) doesn't help - it's just empty in most cases.
I'm using the Kibana to access the ElasticSearch engine.
ElasticSearch v. 5.6.3
Kibana v. 5.6.3
Logstash v. 5.5.0
Any suggestion will be appreciated, even regarding the use of additional plugins (provided they have a release compliant with above Kibana/ElasticSearch/Logstash versions, as I can't update them to newer right now).
PUT your_index/_mapping/your_type
{
  "your_type": {
    "properties": {
      "publisher": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
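Note that in the question the fielddata flag was set on a publisher sub-property rather than on the message field itself. Assuming the goal is to aggregate on message, the mapping update would presumably need to target that field directly; a sketch in the same ES 5.x style (your_type is a placeholder for the actual mapping type):

```
PUT my_index/_mapping/your_type
{
  "your_type": {
    "properties": {
      "message": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
```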

FIELDDATA Data is too large

I open Kibana and do a search and I get an error saying shards failed. I looked in the elasticsearch.log file and saw this error:
org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [#timestamp] would be larger than limit of [622775500/593.9mb]
Is there any way to increase that limit of 593.9mb?
You can try to increase the fielddata circuit breaker limit to 75% (default is 60%) in your elasticsearch.yml config file and restart your cluster:
indices.breaker.fielddata.limit: 75%
Or if you prefer to not restart your cluster you can change the setting dynamically using:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}'
Give it a try.
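As a rough sanity check (assuming the default 60% fielddata breaker), the 593.9mb limit from the error message implies a heap of just under 1 GiB, which lines up with the default 1 GB heap mentioned in another answer below:

```python
# Figure from the error message: the breaker limit is 622775500 bytes,
# reported as 593.9mb.
limit_bytes = 622_775_500

# The default fielddata breaker is 60% of the JVM heap, so the limit
# implies the heap size it was derived from.
implied_heap_gib = limit_bytes / 0.60 / 1024**3

print(round(limit_bytes / 1024**2, 1))  # limit in MiB, matches the message
print(round(implied_heap_gib, 2))       # implied heap, close to 1 GiB
```

So raising either the breaker percentage or the heap size moves this limit.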
I ran into this problem too.
I then checked the fielddata memory, using the request below:
GET /_stats/fielddata?fields=*
The output displays:
"logstash-2016.04.02": {
"primaries": {
"fielddata": {
"memory_size_in_bytes": 53009116,
"evictions": 0,
"fields": {
}
}
},
"total": {
"fielddata": {
"memory_size_in_bytes": 53009116,
"evictions": 0,
"fields": {
}
}
}
},
"logstash-2016.04.29": {
"primaries": {
"fielddata": {
"memory_size_in_bytes":0,
"evictions": 0,
"fields": {
}
}
},
"total": {
"fielddata": {
"memory_size_in_bytes":0,
"evictions": 0,
"fields": {
}
}
}
},
You can see that my index names are date-based and that evictions are all 0. In addition, the 2016.04.02 index uses 53009116 bytes of memory while 2016.04.29 uses 0.
So I can conclude that the old data occupies all the memory, so the new data can't use it; then, when I run an aggregation query on the new data, it raises the CircuitBreakingException.
You can set, in config/elasticsearch.yml:
indices.fielddata.cache.size: 20%
This lets ES evict fielddata when the memory limit is reached.
But the real solution may be to add more memory in the future; monitoring fielddata memory usage is also a good habit.
More detail: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
An alternative solution for the CircuitBreakingException: [FIELDDATA] Data too large error is to clean up the old/unused fielddata cache.
I found out that the fielddata limit is shared across indices, so deleting the cache of an unused index/field can solve the problem.
curl -X POST "localhost:9200/MY_INDICE/_cache/clear?fields=foo,bar"
For more info https://www.elastic.co/guide/en/elasticsearch/reference/7.x/indices-clearcache.html
I think it is important to understand why this is happening in the first place.
In my case, I had this error because I was running aggregations on "analyzed" fields. In case you really need your string field to be analyzed, you should consider using multifields and make it analyzed for searches and not_analyzed for aggregations.
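As a sketch in the pre-5.x style this answer describes (field names are illustrative), the field is analyzed for search, with a not_analyzed sub-field for aggregations:

```
{
  "properties": {
    "title": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

Aggregations would then target title.raw instead of title.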
I ran into this issue the other day. In addition to checking the fielddata memory, I'd also consider checking the JVM and OS memory as well. In my case, the admin forgot to modify the ES_HEAP_SIZE and left it at 1gig.
Just use:
ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch
Since the default heap is 1 GB, if your data is big you should set it higher.

elasticsearch in memory speed

I'm trying to test how much faster the in-memory solution with Elasticsearch would be.
For this, I wrote a test in which I'm generating ~10 million records and then performing a text search. Results come in 3-20 ms, but there is no difference (at all) between searching in memory and without this setting. Is that possible? Are 10 million records too few to see any difference? I'm not even 100% sure I enabled the in-memory mode correctly. I'm loading the settings from a JSON file, in which I placed some settings I found on the internet that were supposed to improve the overall solution, but it seems like it's not working at all.
The settings regarding index looks like this:
"index": {
"store": {
"type":"memory"
},
"merge": {
"policy": {
"use_compound_file": false
}
},
"translog": {
"flush_threshold": 50000
},
"engine": {
"robin": {
"refresh_interval": 2
}
},
"cache": {
"field": {
"max_size": 500000,
"expire": "30m"
}
}
},
"indices": {
"memory": {
"index_buffer_size": 256
}
},
I don't know whether you are using in-memory storage wisely or not; you can check which type of storage you need here.
But you have to provide the storage setting while creating the index (make sure the index doesn't already exist).
Try this:
curl -XPUT "http://localhost:9200/my_index/" -d'
{
  "settings": {
    "index.store.type": "memory"
  }
}'
This will create the index, which stores its data in main memory using Lucene's RamIndexStore.
