How to monitor all Circuit Breakers in Elasticsearch

Is it possible to monitor all Circuit Breakers limits and size?
The fielddata breaker can be monitored per node using:
GET _nodes/stats/breaker,http
But how can we monitor the other breakers, like breaker.request and breaker.total?
Elasticsearch version: 1.3.5

I think those breakers are only available from 1.4.x onwards. See this PR on GitHub with details that seem to indicate this.
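Assuming that is right, on 1.4.x and later the same node-stats endpoint should report every breaker (request, fielddata, parent) without needing any extra API. A minimal check against a local node:
curl -XGET 'http://localhost:9200/_nodes/stats/breaker?pretty'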
I've also briefly tested this and I can see the additional request breaker:
"breakers": {
"request": {
"limit_size_in_bytes": 415550668,
"limit_size": "396.2mb",
"estimated_size_in_bytes": 0,
"estimated_size": "0b",
"overhead": 1,
"tripped": 0
},
"fielddata": {
"limit_size_in_bytes": 623326003,
"limit_size": "594.4mb",
"estimated_size_in_bytes": 2847496,
"estimated_size": "2.7mb",
"overhead": 1.03,
"tripped": 0
},
"parent": {
"limit_size_in_bytes": 727213670,
"limit_size": "693.5mb",
"estimated_size_in_bytes": 2847496,
"estimated_size": "2.7mb",
"overhead": 1,
"tripped": 0
}
}
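To keep an eye on all breakers over time, one option is to poll that endpoint and alert whenever any tripped counter increases. A rough sketch, assuming jq is installed and the node listens on localhost:9200:
curl -s 'http://localhost:9200/_nodes/stats/breaker' |
  jq '.nodes[].breakers | to_entries[] | {breaker: .key, tripped: .value.tripped, estimated: .value.estimated_size, limit: .value.limit_size}'
This prints one small object per breaker per node, which is easy to ship into whatever monitoring system is already in place.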

Related

_update_by_query fails to update all documents in ElasticSearch

I have over 30 million documents in Elasticsearch (version 6.3.3). I am trying to add a new field to all existing documents and set its value to 0.
For example: I want to add a start field, which did not previously exist in the Twitter document, and set its initial value to 0 in all 30 million documents.
In my case I was only able to update about 4 million. If I check the submitted task with the Tasks API (http://localhost:9200/_tasks/{taskId}), the result says something like:
{
  "completed": false,
  "task": {
    "node": "Jsecb8kBSdKLC47Q28O6Pg",
    "id": 5968304,
    "type": "transport",
    "action": "indices:data/write/update/byquery",
    "status": {
      "total": 34002005,
      "updated": 3618000,
      "created": 0,
      "deleted": 0,
      "batches": 3619,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    "description": "update-by-query [Twitter][tweet] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.Twitter.start = 0;', options={}, params={}}",
    "start_time_in_millis": 1574677050104,
    "running_time_in_nanos": 466805438290,
    "cancellable": true,
    "headers": {}
  }
}
The query I am executing against ES is something like:
curl -XPOST "http://localhost:9200/_update_by_query?wait_for_completion=false&conflicts=proceed" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "ctx._source.Twitter.start = 0;"
  },
  "query": {
    "exists": {
      "field": "Twitter"
    }
  }
}'
Any suggestions would be great, thanks.
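Not from the original thread, but one way to resume an interrupted run is to restrict the query to documents that still lack the new field, so the roughly 4 million already updated are skipped. A sketch, assuming the field is mapped (dynamically or explicitly) as Twitter.start once it has been set:
curl -XPOST "http://localhost:9200/_update_by_query?wait_for_completion=false&conflicts=proceed" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "ctx._source.Twitter.start = 0;"
  },
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "Twitter" } }
      ],
      "must_not": [
        { "exists": { "field": "Twitter.start" } }
      ]
    }
  }
}'
The status.total of the resulting task then reflects only the documents that still need updating, which makes progress easier to track across re-runs.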

Determine Hadoop cluster resources used by completed job

How can you determine the Hadoop cluster resources used by a completed job?
Our cluster resource manager is YARN. Access to certain YARN API endpoints is available over HTTP; for example:
curl -L http://my.hadoop.instance:8088/ws/v1/cluster/apps/application_1547448533998_502644
would return:
{
  "app": {
    "allocatedMB": -1,
    "allocatedVCores": -1,
    "amContainerLogs": "http://someNode.hadoop.instance:8042/node/containerlogs/container_e149_1547448533998_502644_01_000001/someUser",
    "amHostHttpAddress": "someNode.hadoop.instance:8042",
    "amNodeLabelExpression": "",
    "applicationTags": "",
    "applicationType": "SPARK",
    "clusterId": 1547448533998,
    "clusterUsagePercentage": 0.0,
    "diagnostics": "",
    "elapsedTime": 583889,
    "finalStatus": "SUCCEEDED",
    "finishedTime": 1550621490747,
    "id": "application_1547448533998_502644",
    "logAggregationStatus": "TIME_OUT",
    "memorySeconds": 15821179,
    "name": "ProjectCantor",
    "numAMContainerPreempted": 0,
    "numNonAMContainerPreempted": 0,
    "preemptedResourceMB": 0,
    "preemptedResourceVCores": 0,
    "priority": 0,
    "progress": 100.0,
    "queue": "dsg",
    "queueUsagePercentage": 0.0,
    "runningContainers": -1,
    "startedTime": 1550620906858,
    "state": "FINISHED",
    "trackingUI": "History",
    "trackingUrl": "http://my.hadoop.instance:8088/proxy/application_1547448533998_502644/",
    "unmanagedApplication": false,
    "user": "someUser",
    "vcoreSeconds": 14713
  }
}
However, several attributes, such as allocatedMB and allocatedVCores, are set to -1 rather than any meaningful value.
Thanks in advance.
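For what it's worth, the live allocation fields (allocatedMB, allocatedVCores, runningContainers) only carry meaningful values while an application is running, which is presumably why they show -1 here; the cumulative usage of a finished application is reported in memorySeconds and vcoreSeconds instead. A quick way to pull just those fields, assuming jq is available:
curl -sL 'http://my.hadoop.instance:8088/ws/v1/cluster/apps/application_1547448533998_502644' |
  jq '.app | {memorySeconds, vcoreSeconds, elapsedTime, finalStatus}'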

Elasticsearch queries consuming 100% of CPU

I'm still relatively new to Elasticsearch. I'm currently attempting to switch from Solr to Elasticsearch, and I'm seeing a huge increase in CPU usage when ES serves our production website. The site sends anywhere from 10,000 to 30,000 requests per second to ES. Solr handles that load just fine on our current hardware.
The books index mapping: https://pastebin.com/bKM9egPS
A query for a book: https://pastebin.com/AdfZ895X
ES is hosted on AWS on an m4.xlarge.elasticsearch instance.
Our cluster is set up as follows (anything not included is default):
"persistent": {
"cluster": {
"routing": {
"allocation": {
"cluster_concurrent_rebalance": "2",
"node_concurrent_recoveries": "2",
"disk": {
"watermark": {
"low": "15.0gb",
"flood_stage": "5.0gb",
"high": "10.0gb"
}
},
"node_initial_primaries_recoveries": "4"
}
}
},
"indices": {
"recovery": {
"max_bytes_per_sec": "60mb"
}
}
Our nodes have the following configuration:
"_nodes": {
"total": 2,
"successful": 2,
"failed": 0
},
"cluster_name": "cluster",
"nodes": {
"####": {
"name": "node1",
"version": "6.3.1",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "####",
"roles": [
"master",
"data",
"ingest"
]
},
"###": {
"name": "node2",
"version": "6.3.1",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "###",
"roles": [
"master",
"data",
"ingest"
]
}
}
Can someone please help me figure out what exactly is happening so I can get this deployment finished?
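Not a root-cause answer, but a useful first diagnostic on a CPU-bound cluster is the hot threads API, which shows what the busy search threads are actually doing. A minimal check, assuming the endpoint below is a placeholder for your own and that your hosting setup exposes this API:
curl -XGET 'http://your-es-endpoint/_nodes/hot_threads?threads=5&interval=500ms'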

Elasticsearch spends all its time in build_scorer

When we upgraded from ES 1.4 to ES 5.2 we ran into a performance problem with this type of query:
{
  "_source": false,
  "from": 0,
  "size": 50,
  "profile": true,
  "query": {
    "bool": {
      "filter": [
        {
          "ids": {
            "values": [<list of 400 ids>],
            "boost": 1
          }
        }
      ],
      "should": [
        {
          "terms": {
            "views": [ <list of 20 ints> ]
          }
        }
      ],
      "minimum_should_match": "0",
      "boost": 1
    }
  }
}
When profiling, we found that the problem is with build_scorer, which is called for each segment:
1 shard;
20 segments;
took: 55
{
  "type": "BooleanQuery",
  "description": "views:[9875227 TO 9875227] views:[6991599 TO 6991599] views:[6682953 TO 6682953] views:[6568587 TO 6568587] views:[10080097 TO 10080097] views:[9200174 TO 9200174] views:[9200174 TO 9200174] views:[10080097 TO 10080097] views:[9966870 TO 9966870] views:[6568587 TO 6568587] views:[6568587 TO 6568587] views:[8538669 TO 8538669] views:[8835038 TO 8835038] views:[9200174 TO 9200174] views:[7539089 TO 7539089] views:[6991599 TO 6991599] views:[8222303 TO 8222303] views:[9342166 TO 9342166] views:[7828288 TO 7828288] views:[9699294 TO 9699294] views:[9108691 TO 9108691] views:[9431297 TO 9431297] views:[7539089 TO 7539089] views:[6032694 TO 6032694] views:[9491741 TO 9491741] views:[9498225 TO 9498225] views:[8051047 TO 8051047] views:[9866955 TO 9866955] views:[8222303 TO 8222303] views:[9622214 TO 9622214]",
  "time": "39.70427700ms",
  "breakdown": {
    "score": 99757,
    "build_scorer_count": 20,
    "match_count": 0,
    "create_weight": 37150,
    "next_doc": 0,
    "match": 0,
    "create_weight_count": 1,
    "next_doc_count": 0,
    "score_count": 110,
    "build_scorer": 38648674,
    "advance": 918274,
    "advance_count": 291
  },
So 38 ms of the total 55 ms was spent in build_scorer, which seems weird.
On ES 1.5 we have about the same number of segments, but the query runs 10x faster.
Unfortunately ES 1.x doesn't have a profiler, so we can't check how many times build_scorer executes there.
So the question is: why is build_scorer_count equal to the number of segments, and how can we tackle this performance issue?
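One observation from the profile above, offered as a sketch rather than a confirmed fix: the terms clause on views is being rewritten into one numeric range per value (views:[9875227 TO 9875227] ...), and on 5.x numeric fields are indexed as points, which makes building scorers for many such clauses on every segment expensive. A commonly suggested mitigation is to also index identifier-like numeric fields as keyword (the index, type, and sub-field names below are hypothetical) and, after reindexing, run the terms lookup against that sub-field:
PUT my_index/_mapping/my_type
{
  "properties": {
    "views": {
      "type": "integer",
      "fields": {
        "as_keyword": { "type": "keyword" }
      }
    }
  }
}
The should clause would then use "views.as_keyword" with string values, which goes through the terms dictionary instead of per-value point ranges.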

What is the meaning of throttle_time_in_millis Elasticsearch stats?

I created an index in a 4-node Elasticsearch cluster and added about 3.5M documents using the Java Elasticsearch API.
When asking for the stats I get a very high number for throttle_time_in_millis, as follows:
{
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 3855540,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 1203074796,
        "throttle_time_in_millis": 980255
      },
      "indexing": {
        "index_total": 3855540,
        "index_time_in_millis": 426300,
        "index_current": 0,
        "delete_total": 0,
        "delete_time_in_millis": 0,
        "delete_current": 0
      },
What is the meaning of throttle_time_in_millis?
What could be the reason for this to increase?
Thanks in advance.
I'm not 100% sure on this, but looking at the Java source code available here and the description of store stats available here, I think this is a measure of the total time spent throttling while segments are merged. It could be an indication of poor disk I/O. An increase in throttle_time_in_millis could mean the disk is struggling to keep up, but only if you have previous benchmarks that show it used to be quicker. If this figure is consistently this high, then I would argue it's just a symptom of the disk type you're using or the number of documents you are storing. If you're using a traditional HDD, could you try switching to an SSD?
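If this is an older (1.x-era) cluster where store throttling is still an explicit setting, one thing worth trying, purely as a sketch, is raising the store throttle limit and watching whether the counter's growth slows down (later versions replaced this with automatic merge throttling, so the setting may not exist on your version):
curl -XPUT 'http://localhost:9200/_cluster/settings' -d'
{
  "transient": {
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'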
