Circuit breaker exception elasticsearch - elasticsearch

{
'error': {
'type': 'circuit_breaking_exception',
'reason': '[parent] Data too large, data for [<http_request>] would be [123848638/118.1mb], which is larger than the limit of [123273216/117.5mb], real usage: [120182112/114.6mb], new bytes reserved: [3666526/3.4mb]',
'bytes_wanted': 123848638,
'bytes_limit': 123273216,
'durability': 'TRANSIENT'
},
'status': 429
}
I am trying to understand the above circuit breaker error.
[123273216/117.5mb] - parent circuit breaker limit (95%).
new bytes reserved: [3666526/3.4mb] - the memory the new request needs
[123848638/118.1mb] - current heap + new bytes reserved
real usage: [120182112/114.6mb] - current heap status
Is my understanding correct?

Yes, that's correct. Basically:
1. the current heap usage is 120,182,112
2. the new bytes needed to carry out the request is 3,666,526
3. the bytes wanted from memory would thus be (1) + (2) = 123,848,638
4. the total heap is 129,761,280
5. the maximum reservable heap memory for the parent circuit breaker is 95% of total heap = 123,273,216
Since (3) (bytes wanted) > (5) (circuit breaker limit), the circuit breaker trips.
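The arithmetic in this answer can be replayed with a few lines of Java. This is only a sketch of the comparison described above, not Elasticsearch's own code; the class and method names are made up for illustration, and the numbers are the ones from the error message.

```java
public class ParentBreakerCheck {

    /** Limit as a percentage of the total heap (integer arithmetic). */
    public static long limitFor(long totalHeapBytes, int percent) {
        return totalHeapBytes * percent / 100;
    }

    /** True when current usage plus the new reservation would exceed the limit. */
    public static boolean wouldTrip(long realUsage, long newBytesReserved, long limit) {
        return realUsage + newBytesReserved > limit;
    }

    public static void main(String[] args) {
        long totalHeap = 129_761_280L;   // total heap from the answer
        long realUsage = 120_182_112L;   // "real usage" in the error
        long newBytes = 3_666_526L;      // "new bytes reserved" in the error

        long limit = limitFor(totalHeap, 95);
        System.out.println("limit        = " + limit);
        System.out.println("bytes wanted = " + (realUsage + newBytes));
        System.out.println("trips        = " + wouldTrip(realUsage, newBytes, limit));
    }
}
```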

Related

EHCache has too many misses on the heap tier

I am using a standalone EHcache 3.10 with heap tier, and disk tier.
The heap is configured to contain 100 entries with no expirations.
In practice the algorithm inserts only 30 entries into the cache, but it performs many "updates" and "reads" of these 30 entries.
Before I insert a new entry into the cache, there is a check whether the entry already exists.
Therefore, when I check the Ehcache statistics, I expect to see only 30 misses on the heap tier and 30 misses on the disk tier.
Instead I get 9806 misses on the heap tier and 9806 hits on the disk tier (meaning that 9806 times Ehcache did not find the entry on the heap but found it on the disk).
These numbers make no sense to me: the heap tier should hold 100 entries, so why are there so many misses?
Here is my configuration:
statisticsService = new DefaultStatisticsService();
// create temp dir
Path cacheDirectory = getCacheDirectory();
ResourcePoolsBuilder resourcePoolsBuilder =
    ResourcePoolsBuilder.newResourcePoolsBuilder()
        .heap(100, EntryUnit.ENTRIES)
        .disk(Long.MAX_VALUE, MemoryUnit.B, true);
CacheConfigurationBuilder cacheConfigurationBuilder =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
            String.class, // The cache key
            ARFileImpl.class, // The cache value
            resourcePoolsBuilder)
        .withExpiry(ExpiryPolicyBuilder.noExpiration()) // No expiration
        .withResilienceStrategy(new ThrowingResilienceStrategy<>())
        .withSizeOfMaxObjectGraph(100000);
// Create the cache manager
cacheManager =
    CacheManagerBuilder.newCacheManagerBuilder()
        // Set the persistent directory
        .with(CacheManagerBuilder.persistence(cacheDirectory.toFile()))
        .withCache(ARFILE_CACHE_NAME, cacheConfigurationBuilder)
        .using(statisticsService)
        .build(true);
Here is the result of the statistics:
Cache stats:
CacheExpirations: 0
CacheEvictions: 0
CacheGets: 6669684
CacheHits: 6669684
CacheMisses: 0
CacheHitPercentage: 100.0
CacheMissPercentage: 0.0
CachePuts: 10525
CacheRemovals: 0
Heap stats:
AllocatedByteSize: -1
Mappings: 30
Evictions: 0
Expirations: 0
Hits: 6659878
Misses: 9806
OccupiedByteSize: -1
Puts: 0
Removals: 0
Disk stats:
AllocatedByteSize: 22429696
Mappings: 30
Evictions: 0
Expirations: 0
Hits: 9806
Misses: 0
OccupiedByteSize: 9961952
Puts: 10525
Removals: 0
The reason for asking this question is that there are a lot of redundant disk reads, which result in performance degradation.
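As a sanity check on the dump above, the counters are at least internally consistent: every cache get is either a heap hit or a heap miss, and every heap miss was served by the disk tier. The sketch below only verifies that bookkeeping with the posted numbers; it does not explain why the heap misses occur, and the class and method names are hypothetical.

```java
public class TierStatsCheck {

    /** Every get should be counted as either a heap hit or a heap miss. */
    public static boolean getsAccountedFor(long cacheGets, long heapHits, long heapMisses) {
        return heapHits + heapMisses == cacheGets;
    }

    /** Every heap miss should fall through to the disk tier as a hit or a miss. */
    public static boolean heapMissesReachDisk(long heapMisses, long diskHits, long diskMisses) {
        return heapMisses == diskHits + diskMisses;
    }

    public static void main(String[] args) {
        // Values copied from the statistics in the question.
        System.out.println(getsAccountedFor(6_669_684L, 6_659_878L, 9_806L));
        System.out.println(heapMissesReachDisk(9_806L, 9_806L, 0L));
    }
}
```

Note also that the dump reports Puts: 0 on the heap tier and Puts: 10525 on the disk tier, the same count as CachePuts.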

elasticsearch bulkload performance issue

We want to increase the speed of our bulk load.
We currently use Java to bulk load documents into Elasticsearch. We plan to import 10m documents, each almost 8M in size. At the moment we can only import 400K documents per day, which is about 5 documents per second.
Our ES infrastructure is 3 master nodes with 4G ES_JAVA_OPTS (heap size), plus 2 data nodes and 2 client nodes with 2G memory. When we try to increase the bulk-load speed, we run over the heap size. The ES cluster is set up on Kubernetes.
The I/O is below.
dd if=/dev/zero of=/data/tmp/test1.img bs=1G count=10 oflag=dsync
10737418240 bytes (11 GB) copied, 50.7528 s, 212 MB/s
dd if=/dev/zero of=/data/tmp/test2.img bs=512 count=100000 oflag=dsync
51200000 bytes (51 MB) copied, 336.107 s, 152 kB/s
Any advice for the improvement?
for (int x = 0; x < 200000; x++) {
    BulkRequest bulkRequest = new BulkRequest();
    // 50 documents go into each bulk request
    for (int k = 0; k < 50; k++) {
        Order order = generateOrder();
        IndexRequest indexRequest = new IndexRequest("orderpot", "orderpot");
        Object esDataMap = objectToMap(order);
        String source = JSONObject.valueToString(esDataMap);
        indexRequest.source(source, XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    rhlclient.bulk(bulkRequest, RequestOptions.DEFAULT);
}
over heap size
It seems you need more memory on the data nodes; 10m documents of 8M each will use a lot of memory. You can reduce the memory of the master nodes and give it to the data nodes, since master nodes need less memory than data nodes. If you cannot add more nodes, you can combine the client nodes with the data nodes; more data nodes will share the pressure.
Some other advice:
1. Disable refresh by setting index.refresh_interval to -1, and set index.number_of_replicas to 0, while indexing.
2. Set an explicit mapping for your index instead of using the default mapping. For example, some fields can be integer with no need for long, some fields can be text where keyword will never be used, and some fields will only be used as text.
See the official "Tune for indexing speed" guide: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
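One thing the question's loop makes visible: with 50 documents per bulk request and almost 8M per document, each request carries roughly 400M (assuming "8M" means megabytes), which is large next to a 2G node. The answer above does not mention it, but a common alternative is to cap each bulk request by payload size rather than by document count. The sketch below shows that grouping logic with the standard library only; the class name and the 32 MB budget are illustrative assumptions, not recommended values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ByteBudgetBatcher {

    /**
     * Splits document sizes into consecutive batches whose total size stays
     * within maxBatchBytes. A single document larger than the budget still
     * gets a batch of its own.
     */
    public static List<List<Long>> batch(List<Long> docSizes, long maxBatchBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : docSizes) {
            // Close the open batch if this document would push it over budget.
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 50 documents of 8 MB each, as in the question's inner loop.
        List<Long> sizes = Collections.nCopies(50, 8 * mb);
        List<List<Long>> batches = batch(sizes, 32 * mb);
        System.out.println("batches: " + batches.size());
        System.out.println("docs in first batch: " + batches.get(0).size());
    }
}
```

In the question's code, each inner list would correspond to one BulkRequest.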

Understanding Spring Boot actuator `http.server.requests` metrics MAX attribute

Can someone explain what the MAX statistic refers to in the response below? I don't see it documented anywhere.
localhost:8081/actuator/metrics/http.server.requests?tag=uri:/myControllerMethod
Response:
{
"name":"http.server.requests",
"description":null,
"baseUnit":"milliseconds",
"measurements":[
{
"statistic":"COUNT",
"value":13
},
{
"statistic":"TOTAL_TIME",
"value":57.430899
},
{
"statistic":"MAX",
"value":0
}
],
"availableTags":[
{
"tag":"exception",
"values":[
"None"
]
},
{
"tag":"method",
"values":[
"GET"
]
},
{
"tag":"outcome",
"values":[
"SUCCESS"
]
},
{
"tag":"status",
"values":[
"200"
]
},
{
"tag":"commonTag",
"values":[
"somePrefix"
]
}
]
}
You can see the individual metrics by using ?tag=uri:{endpoint_tag}, as defined in the response of the root /actuator/metrics/http.server.requests call. The details of the measurement values are:
COUNT: Rate per second for calls.
TOTAL_TIME: The sum of the times recorded. Reported in the monitoring system's base unit of time
MAX: The maximum amount recorded. When this represents a time, it is reported in the monitoring system's base unit of time.
As given here, also here.
The discrepancy you are seeing is due to the presence of a timer: after some time, the currently recorded MAX value for any tagged metric can be reset back to 0. Can you make some new calls to /myControllerMethod and then immediately call /actuator/metrics/http.server.requests to see a non-zero MAX value for the given tag?
This follows from the idea of reporting a MAX for each smaller period. When you collect these metrics over time, you get a series of MAX values rather than a single value covering a long period.
You can see this in action in the Micrometer source code. There is a rotate() method that resets the MAX value to create the behaviour described above.
It is called on every poll() call, which is triggered periodically for metric gathering.
What does MAX represent
MAX represents the maximum time taken to execute endpoint.
Analysis for /user/asset/getAllAssets
COUNT  TOTAL_TIME  MAX  Execution time of the latest request
5      115         17
6      122         17   122 - 115 = 7
7      131         17   131 - 122 = 9
8      187         56   187 - 131 = 56 (from now on MAX is 56)
9      204         56   204 - 187 = 17
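The analysis above can be reproduced by taking the difference between successive TOTAL_TIME samples and tracking the largest value seen. The following is a small illustrative sketch, assuming one sample is taken after each request; the class and method names are hypothetical, and it ignores the periodic reset of MAX.

```java
import java.util.Arrays;

public class MaxFromTotals {

    /** Per-request durations derived from TOTAL_TIME samples taken after each request. */
    public static long[] durations(long[] totalTimes) {
        if (totalTimes.length < 2) {
            return new long[0];
        }
        long[] result = new long[totalTimes.length - 1];
        for (int i = 1; i < totalTimes.length; i++) {
            result[i - 1] = totalTimes[i] - totalTimes[i - 1];
        }
        return result;
    }

    /** Largest value among the starting max and all later durations. */
    public static long runningMax(long initialMax, long[] durations) {
        long max = initialMax;
        for (long d : durations) {
            max = Math.max(max, d);
        }
        return max;
    }

    public static void main(String[] args) {
        long[] totals = {115, 122, 131, 187, 204}; // TOTAL_TIME column from the table
        long[] perRequest = durations(totals);
        System.out.println(Arrays.toString(perRequest));
        System.out.println(runningMax(17, perRequest)); // MAX was 17 at COUNT = 5
    }
}
```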
Will MAX be 0 if we have fewer requests (or 1 request) to the particular endpoint?
No. The number of requests to a particular endpoint does not affect MAX (see the image from Spring Boot Admin).
When will MAX be 0
There is a timer that sets the value to 0. When the endpoint has not been called for some time, the timer sets MAX to 0. Here the approximate timer value is 2 minutes (120 seconds).
DistributionStatisticConfig has .expiry(Duration.ofMinutes(2)),
which sets some measurements to 0 if no request has been made within the expiry time or rotate time.
How did I determine the timer value?
I took 6 samples (executed the same endpoint 6 times) and, for each, measured the difference between the time the endpoint was called and the time MAX was set back to zero.
More Details
UPDATE
Document has been updated.
NOTE:
Max for basic DistributionSummary implementations such as CumulativeDistributionSummary, StepDistributionSummary is a time window max (TimeWindowMax).
It means that its value is the maximum value during a time window.
If the time window ends, it'll be reset to 0 and a new time window starts again.
Time window size will be the step size of the meter registry unless expiry in DistributionStatisticConfig is set to other value explicitly.
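To make the time-window behaviour concrete, here is a deliberately simplified sketch of a maximum that resets to 0 once its window has elapsed. It is not Micrometer's TimeWindowMax, whose real implementation is more involved; the class name, the injected clock, and the single-window design are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

public class SimpleWindowMax {
    private final LongSupplier clockMillis; // injected so the example is testable
    private final long windowMillis;
    private long windowStart;
    private long max;

    public SimpleWindowMax(LongSupplier clockMillis, long windowMillis) {
        this.clockMillis = clockMillis;
        this.windowMillis = windowMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    /** Records a value, keeping the largest one seen in the current window. */
    public void record(long value) {
        rotateIfExpired();
        if (value > max) {
            max = value;
        }
    }

    /** Returns the current window's max, or 0 if the window has expired. */
    public long poll() {
        rotateIfExpired();
        return max;
    }

    private void rotateIfExpired() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= windowMillis) {
            max = 0;            // the window ended: reset and start a new one
            windowStart = now;
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        SimpleWindowMax windowMax = new SimpleWindowMax(() -> now[0], 120_000);
        windowMax.record(56);
        System.out.println(windowMax.poll()); // still inside the window
        now[0] = 120_000;
        System.out.println(windowMax.poll()); // window elapsed with no new requests
    }
}
```

With a 120-second window this mirrors the observation above: a value recorded once stays visible until the window expires, then reads as 0 until the next request.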

Win dbg Dump OOM exception in IIS

Occasionally, we get an OutOfMemoryException in one of our IIS processes. I tried to analyze the dump but wasn't able to reach concrete conclusions. I also looked into MS hotfixes and found similar problems and resolutions, but I am not sure whether they are related: link
Below is the output of the !analyze -v command in WinDbg:
!analyze -v
[...]
CoInitialize failed 80010106
CoInitialize failed 80010106
CoInitialize failed 80010106
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/w3wp_exe/7_5_7601_17514/4ce7a5f8/unknown/0_0_0_0/bbbbbbb4/80000007/00000000.htm?Retriage=1
FAULTING_IP:
+75d2faf02afdbf0
00000000 ?? ???
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000007 (Wake debugger)
ExceptionFlags: 00000000
NumberParameters: 0
BUGCHECK_STR: 80000007
PROCESS_NAME: w3wp.exe
ERROR_CODE: (NTSTATUS) 0x80000007 - {Kernel Debugger Awakened} the system debugger was awakened by an interrupt.
EXCEPTION_CODE: (HRESULT) 0x80000007 (2147483655) - Operation aborted
MOD_LIST: *** ERROR: Could not build analysis XML
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x2364 (0)
Current frame:
ChildEBP RetAddr Caller, Callee
DERIVED_WAIT_CHAIN:
Dl Eid Cid WaitType
-- --- ------- --------------------------
0 370.2364 Event
WAIT_CHAIN_COMMAND: ~0s;k;;
BLOCKING_THREAD: 00002364
DEFAULT_BUCKET_ID: APPLICATION_HANG_BlockedOn_EventHandle
PRIMARY_PROBLEM_CLASS: APPLICATION_HANG_BlockedOn_EventHandle
LAST_CONTROL_TRANSFER: from 758e149d to 778df8c1
FAULTING_THREAD: 00000000
STACK_TEXT:
002efb8c 758e149d 000001d4 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
002efbf8 75c71194 000001d4 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x98
002efc10 75c71148 000001d4 ffffffff 00000000 kernel32!WaitForSingleObjectExImplementation+0x75
002efc24 7470765a 000001d4 ffffffff 747057c1 kernel32!WaitForSingleObject+0x12
002efc30 747057c1 00000000 74706f84 00a21320 w3wphost!WP_IPM::WaitForShutdown+0xb
002efc38 74706f84 00a21320 00a215d0 002efd58 w3wphost!W3WP_HOST::WaitForShutdown+0x11
002efc48 00a22bdb 002efc68 00a25708 00000001 w3wphost!AppHostInitialize+0x11e
002efd58 00a23584 0000000f 00702828 00703b48 w3wp!wmain+0x373
002efd9c 75c733aa fffde000 002efde8 778f9ed2 w3wp!_initterm_e+0x163
002efda8 778f9ed2 fffde000 71b16c75 00000000 kernel32!BaseThreadInitThunk+0xe
002efde8 778f9ea5 00a236b5 fffde000 ffffffff ntdll!__RtlUserThreadStart+0x70
002efe00 00000000 00a236b5 fffde000 00000000 ntdll!_RtlUserThreadStart+0x1b
FOLLOWUP_IP:
w3wphost!WP_IPM::WaitForShutdown+b
7470765a f60520d0707403 test byte ptr [w3wphost!g_dwDebugFlags (7470d020)],3
SYMBOL_STACK_INDEX: 4
SYMBOL_NAME: w3wphost!WP_IPM::WaitForShutdown+b
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: w3wphost
IMAGE_NAME: w3wphost.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 4ce7a5d0
STACK_COMMAND: ~0s ; kb
BUCKET_ID: 80000007_w3wphost!WP_IPM::WaitForShutdown+b
FAILURE_BUCKET_ID: APPLICATION_HANG_BlockedOn_EventHandle_80000007_w3wphost.dll!WP_IPM::WaitForShutdown
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/w3wp_exe/7_5_7601_17514/4ce7a5f8/unknown/0_0_0_0/bbbbbbb4/80000007/00000000.htm?Retriage=1
Followup: MachineOwner
Additional information as requested from comments:
0:000> !AnalyzeOOM
---------Heap 11---------
Managed OOM occured after GC #15967 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 20---------
Managed OOM occured after GC #15977 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 21---------
Managed OOM occured after GC #15979 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 22---------
Managed OOM occured after GC #15529 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 23---------
Managed OOM occured after GC #15975 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 25---------
Managed OOM occured after GC #15985 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 27---------
Managed OOM occured after GC #40008 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 30---------
Managed OOM occured after GC #40006 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
0:000> !vmstat
TYPE MINIMUM MAXIMUM AVERAGE BLK COUNT TOTAL
~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~~~ ~~~~~
Free:
Small 4K 64K 57K 4,651 266,932K
Medium 68K 1,024K 288K 97 27,967K
Large 1,088K 6,080K 2,305K 27 62,247K
Summary 4K 6,080K 74K 4,775 357,150K
Reserve:
Small 4K 64K 12K 926 11,567K
Medium 68K 1,020K 277K 390 108,263K
Large 1,148K 16,376K 12,201K 190 2,318,211K
Summary 4K 16,376K 1,618K 1,506 2,438,043K
Commit:
Small 4K 64K 10K 8,169 85,567K
Medium 68K 1,024K 322K 552 178,023K
Large 1,028K 23,300K 5,137K 221 1,135,447K
Summary 4K 23,300K 156K 8,942 1,399,038K
Private:
Small 4K 64K 11K 5,939 65,578K
Medium 68K 1,024K 311K 472 146,891K
Large 1,028K 23,300K 9,725K 316 3,073,339K
Summary 4K 23,300K 488K 6,727 3,285,811K
Mapped:
Small 4K 64K 11K 85 979K
Medium 68K 1,004K 366K 12 4,399K
Large 1,520K 2,888K 2,206K 4 8,824K
Summary 4K 2,888K 140K 101 14,203K
Image:
Small 4K 64K 9K 3,071 30,575K
Medium 68K 1,024K 294K 458 134,995K
Large 1,032K 15,480K 4,082K 91 371,495K
Summary 4K 15,480K 148K 3,620 537,064K
#############################
0:000> !eeheap -gc
Number of GC Heaps: 32
------------------------------
Heap 0 (1a616d08)
generation 0 starts at 0xa062179c
generation 1 starts at 0xa0621000
generation 2 starts at 0x1ab91000
ephemeral segment allocation context: none
segment begin allocated size
1ab90000 1ab91000 1adce1c8 0x23d1c8(2347464)
a0620000 a0621000 a0867db8 0x246db8(2387384)
Large object heap starts at 0x3ab91000
segment begin allocated size
3ab90000 3ab91000 3b343490 0x7b2490(8070288)
Heap Size: Size: 0xc36410 (12805136) bytes.
------------------------------
Heap 1 (1a619970)
generation 0 starts at 0xa965da00
generation 1 starts at 0xa9621000
generation 2 starts at 0x1bb91000
ephemeral segment allocation context: none
segment begin allocated size
1bb90000 1bb91000 1be9bbd0 0x30abd0(3189712)
a9620000 a9621000 a982dd14 0x20cd14(2149652)
Large object heap starts at 0x3b391000
segment begin allocated size
3b390000 3b391000 3bae09f0 0x74f9f0(7666160)
Heap Size: Size: 0xc672d4 (13005524) bytes.
------------------------------
Heap 2 (1a6215d8)
generation 0 starts at 0xa762370c
generation 1 starts at 0xa7621000
generation 2 starts at 0x1cb91000
ephemeral segment allocation context: none
segment begin allocated size
1cb90000 1cb91000 1d0a4604 0x513604(5322244)
a7620000 a7621000 a78a3a20 0x282a20(2632224)
Large object heap starts at 0x3bb91000
segment begin allocated size
3bb90000 3bb91000 3c384cf8 0x7f3cf8(8338680)
736b0000 736b1000 73769790 0xb8790(755600)
Heap Size: Size: 0x10424ac (17048748) bytes.
------------------------------
Heap 3 (1a624240)
generation 0 starts at 0xb56226d0
generation 1 starts at 0xb5621000
generation 2 starts at 0x1db91000
ephemeral segment allocation context: none
segment begin allocated size
1db90000 1db91000 1debd778 0x32c778(3327864)
b5620000 b5621000 b56346dc 0x136dc(79580)
Large object heap starts at 0x3c391000
segment begin allocated size
3c390000 3c391000 3c88b720 0x4fa720(5220128)
Heap Size: Size: 0x83a574 (8627572) bytes.
------------------------------
Heap 4 (1a626ea8)
generation 0 starts at 0x9762eb1c
generation 1 starts at 0x97621000
generation 2 starts at 0x1eb91000
ephemeral segment allocation context: none
segment begin allocated size
1eb90000 1eb91000 1ee6ae1c 0x2d9e1c(2989596)
97620000 97621000 97a87308 0x466308(4612872)
Large object heap starts at 0x3cb91000
segment begin allocated size
3cb90000 3cb91000 3d36c7b8 0x7db7b8(8239032)
f9e70000 f9e71000 f9e975a0 0x265a0(157088)
Heap Size: Size: 0xf41e7c (15998588) bytes.
------------------------------
Heap 5 (1a639b10)
generation 0 starts at 0x8f62107c
generation 1 starts at 0x8f621000
generation 2 starts at 0x1fb91000
ephemeral segment allocation context: none
segment begin allocated size
1fb90000 1fb91000 20b8500c 0xff400c(16728076)
8f620000 8f621000 8f777088 0x156088(1400968)
Large object heap starts at 0x3d391000
segment begin allocated size
3d390000 3d391000 3d903cb0 0x572cb0(5713072)
Heap Size: Size: 0x16bcd44 (23842116) bytes.
------------------------------
Heap 6 (1a63c778)
generation 0 starts at 0xba6611e8
generation 1 starts at 0xba621000
generation 2 starts at 0x20b91000
ephemeral segment allocation context: none
segment begin allocated size
20b90000 20b91000 20e66118 0x2d5118(2969880)
ba620000 ba621000 ba7051f4 0xe41f4(934388)
Large object heap starts at 0x3db91000
segment begin allocated size
3db90000 3db91000 3e348dd8 0x7b7dd8(8093144)
Heap Size: Size: 0xb710e4 (11997412) bytes.
------------------------------
Heap 7 (1a63f3e0)
generation 0 starts at 0xad621918
generation 1 starts at 0xad621000
generation 2 starts at 0x21b91000
ephemeral segment allocation context: none
segment begin allocated size
21b90000 21b91000 21fe7dd0 0x456dd0(4550096)
ad620000 ad621000 adad37e8 0x4b27e8(4925416)
Large object heap starts at 0x3e391000
segment begin allocated size
3e390000 3e391000 3eaea868 0x759868(7706728)
Heap Size: Size: 0x1062e20 (17182240) bytes.
------------------------------
Heap 8 (1a642048)
generation 0 starts at 0xf5e724e0
generation 1 starts at 0xf5e71000
generation 2 starts at 0x22b91000
ephemeral segment allocation context: none
segment begin allocated size
22b90000 22b91000 22ee2cc8 0x351cc8(3480776)
f5e70000 f5e71000 f5eb04ec 0x3f4ec(259308)
Large object heap starts at 0x3eb91000
segment begin allocated size
3eb90000 3eb91000 3f03b3c0 0x4aa3c0(4891584)
Heap Size: Size: 0x83b574 (8631668) bytes.
------------------------------
Heap 9 (1a648cb0)
generation 0 starts at 0x8d630bc4
generation 1 starts at 0x8d621000
generation 2 starts at 0x23b91000
ephemeral segment allocation context: none
segment begin allocated size
23b90000 23b91000 23e4d69c 0x2bc69c(2868892)
8d620000 8d621000 8daf7fb4 0x4d6fb4(5074868)
Large object heap starts at 0x3f391000
segment begin allocated size
3f390000 3f391000 3f991138 0x600138(6291768)
Heap Size: Size: 0xd93788 (14235528) bytes.
------------------------------
Heap 10 (1a64b918)
generation 0 starts at 0xa86261d0
generation 1 starts at 0xa8621000
generation 2 starts at 0x24b91000
ephemeral segment allocation context: none
segment begin allocated size
24b90000 24b91000 250b5b3c 0x524b3c(5393212)
a8620000 a8621000 a891ad34 0x2f9d34(3120436)
Large object heap starts at 0x3fb91000
segment begin allocated size
3fb90000 3fb91000 3ff89810 0x3f8810(4163600)
Heap Size: Size: 0xc17080 (12677248) bytes.
------------------------------
Heap 11 (1a64e580)
generation 0 starts at 0x916238ec
generation 1 starts at 0x91621000
generation 2 starts at 0x25b91000
ephemeral segment allocation context: none
segment begin allocated size
25b90000 25b91000 25ea5d64 0x314d64(3231076)
91620000 91621000 91930198 0x30f198(3207576)
Large object heap starts at 0x40391000
segment begin allocated size
40390000 40391000 40ac8f50 0x737f50(7569232)
Heap Size: Size: 0xd5be4c (14007884) bytes.
------------------------------
Heap 12 (1a65b850)
generation 0 starts at 0x7c52281c
generation 1 starts at 0x7c521000
generation 2 starts at 0x26b91000
ephemeral segment allocation context: none
segment begin allocated size
26b90000 26b91000 2702cad8 0x49bad8(4831960)
7c520000 7c521000 7c7b662c 0x29562c(2709036)
Large object heap starts at 0x40b91000
segment begin allocated size
40b90000 40b91000 41378c38 0x7e7c38(8289336)
e73d0000 e73d1000 e78cce00 0x4fbe00(5225984)
Heap Size: Size: 0x1414b3c (21056316) bytes.
------------------------------
Heap 13 (1a65ef20)
generation 0 starts at 0xf7e77370
generation 1 starts at 0xf7e71000
generation 2 starts at 0x27b91000
ephemeral segment allocation context: none
segment begin allocated size
27b90000 27b91000 27ee43d4 0x3533d4(3486676)
f7e70000 f7e71000 f828f6fc 0x41e6fc(4318972)
Large object heap starts at 0x41391000
segment begin allocated size
41390000 41391000 41b8edf0 0x7fddf0(8379888)
ebc80000 ebc81000 ec460740 0x7df740(8255296)
7e520000 7e521000 7e56dba8 0x4cba8(314280)
Heap Size: Size: 0x179bba8 (24755112) bytes.
------------------------------
Heap 14 (1a661458)
generation 0 starts at 0x9e65f268
generation 1 starts at 0x9e621000
generation 2 starts at 0x28b91000
ephemeral segment allocation context: none
segment begin allocated size
28b90000 28b91000 28f1aacc 0x389acc(3709644)
9e620000 9e621000 9e96f57c 0x34e57c(3466620)
Large object heap starts at 0x41b91000
segment begin allocated size
41b90000 41b91000 42268f58 0x6d7f58(7176024)
Heap Size: Size: 0xdaffa0 (14352288) bytes.
------------------------------
Heap 15 (1a663990)
generation 0 starts at 0x9faacc7c
generation 1 starts at 0x9faa8ac4
generation 2 starts at 0x29b91000
ephemeral segment allocation context: none
segment begin allocated size
29b90000 29b91000 29cde0e8 0x14d0e8(1364200)
9f620000 9f621000 9fd16c88 0x6f5c88(7298184)
Large object heap starts at 0x42391000
segment begin allocated size
42390000 42391000 42adf6a0 0x74e6a0(7661216)
Heap Size: Size: 0xf91410 (16323600) bytes.
------------------------------
Heap 16 (1a665ec8)
generation 0 starts at 0xc362a47c
generation 1 starts at 0xc3621000
generation 2 starts at 0x2ab91000
ephemeral segment allocation context: none
segment begin allocated size
2ab90000 2ab91000 2afbc464 0x42b464(4371556)
c3620000 c3621000 c3854488 0x233488(2307208)
Large object heap starts at 0x42b91000
segment begin allocated size
42b90000 42b91000 42f635f8 0x3d25f8(4007416)
Heap Size: Size: 0xa30ee4 (10686180) bytes.
------------------------------
Heap 17 (1a668418)
generation 0 starts at 0x94622638
generation 1 starts at 0x94621000
generation 2 starts at 0x2bb91000
ephemeral segment allocation context: none
segment begin allocated size
2bb90000 2bb91000 2bfd1374 0x440374(4457332)
94620000 94621000 948da24c 0x2b924c(2855500)
Large object heap starts at 0x43391000
segment begin allocated size
43390000 43391000 43b7a280 0x7e9280(8295040)
67350000 67351000 6739db20 0x4cb20(314144)
Heap Size: Size: 0xf2f360 (15922016) bytes.
------------------------------
Heap 18 (1a669d20)
generation 0 starts at 0x9a621f68
generation 1 starts at 0x9a621000
generation 2 starts at 0x2cb91000
ephemeral segment allocation context: none
segment begin allocated size
2cb90000 2cb91000 2ce5c30c 0x2cb30c(2929420)
9a620000 9a621000 9a6e597c 0xc497c(805244)
Large object heap starts at 0x43b91000
segment begin allocated size
43b90000 43b91000 43f1f520 0x38e520(3728672)
Heap Size: Size: 0x71e1a8 (7463336) bytes.
------------------------------
Heap 19 (1a66b628)
generation 0 starts at 0x83641300
generation 1 starts at 0x83621000
generation 2 starts at 0x2db91000
ephemeral segment allocation context: none
segment begin allocated size
2db90000 2db91000 2dfaecb8 0x41dcb8(4316344)
83620000 83621000 83855614 0x234614(2311700)
Large object heap starts at 0x44391000
segment begin allocated size
44390000 44391000 44a37488 0x6a6488(6972552)
Heap Size: Size: 0xcf8754 (13600596) bytes.
------------------------------
Heap 20 (1a66cf30)
generation 0 starts at 0x8b621738
generation 1 starts at 0x8b621000
generation 2 starts at 0x2eb91000
ephemeral segment allocation context: none
segment begin allocated size
2eb90000 2eb91000 2ef0c5e4 0x37b5e4(3651044)
8b620000 8b621000 8b94d484 0x32c484(3327108)
Large object heap starts at 0x44b91000
segment begin allocated size
44b90000 44b91000 450100c0 0x47f0c0(4714688)
Heap Size: Size: 0xb26b28 (11692840) bytes.
------------------------------
Heap 21 (1a66e838)
generation 0 starts at 0xf31d3830
generation 1 starts at 0xf31d1000
generation 2 starts at 0x2fb91000
ephemeral segment allocation context: none
segment begin allocated size
2fb90000 2fb91000 2fe8b854 0x2fa854(3123284)
f31d0000 f31d1000 f35a9948 0x3d8948(4032840)
Large object heap starts at 0x45391000
segment begin allocated size
45390000 45391000 458c3008 0x532008(5447688)
Heap Size: Size: 0xc051a4 (12603812) bytes.
------------------------------
Heap 22 (1a670140)
generation 0 starts at 0x9867de74
generation 1 starts at 0x98621000
generation 2 starts at 0x30b91000
ephemeral segment allocation context: none
segment begin allocated size
30b90000 30b91000 3102bbdc 0x49abdc(4828124)
98620000 98621000 988edc84 0x2ccc84(2935940)
Large object heap starts at 0x45b91000
segment begin allocated size
45b90000 45b91000 462adab8 0x71cab8(7457464)
Heap Size: Size: 0xe84318 (15221528) bytes.
------------------------------
Heap 23 (1a671a48)
generation 0 starts at 0xe8c810dc
generation 1 starts at 0xe8c81000
generation 2 starts at 0x31b91000
ephemeral segment allocation context: none
segment begin allocated size
31b90000 31b91000 31de8af0 0x257af0(2456304)
e8c80000 e8c81000 e8f756f8 0x2f46f8(3098360)
Large object heap starts at 0x46391000
segment begin allocated size
46390000 46391000 467d71b0 0x4461b0(4481456)
Heap Size: Size: 0x992398 (10036120) bytes.
------------------------------
Heap 24 (1a673350)
generation 0 starts at 0xa1621544
generation 1 starts at 0xa1621000
generation 2 starts at 0x32b91000
ephemeral segment allocation context: none
segment begin allocated size
32b90000 32b91000 32f74f04 0x3e3f04(4079364)
a1620000 a1621000 a1803858 0x1e2858(1976408)
Large object heap starts at 0x46b91000
segment begin allocated size
46b90000 46b91000 4737fc08 0x7eec08(8317960)
67b90000 67b91000 67d11100 0x180100(1573120)
Heap Size: Size: 0xf35464 (15946852) bytes.
------------------------------
Heap 25 (1a674c58)
generation 0 starts at 0x8c6222b8
generation 1 starts at 0x8c621000
generation 2 starts at 0x33b91000
ephemeral segment allocation context: none
segment begin allocated size
33b90000 33b91000 33edff20 0x34ef20(3469088)
8c620000 8c621000 8ca2c690 0x40b690(4241040)
Large object heap starts at 0x47391000
segment begin allocated size
47390000 47391000 47a011a0 0x6701a0(6750624)
Heap Size: Size: 0xdca750 (14460752) bytes.
------------------------------
Heap 26 (1a676560)
generation 0 starts at 0x9b62150c
generation 1 starts at 0x9b621000
generation 2 starts at 0x34b91000
ephemeral segment allocation context: none
segment begin allocated size
34b90000 34b91000 34fa6200 0x415200(4280832)
9b620000 9b621000 9b8b531c 0x29431c(2704156)
Large object heap starts at 0x47b91000
segment begin allocated size
47b90000 47b91000 48373ec0 0x7e2ec0(8269504)
7aa10000 7aa11000 7ab44168 0x133168(1257832)
Heap Size: Size: 0xfbf544 (16512324) bytes.
------------------------------
Heap 27 (1a677e68)
generation 0 starts at 0x92630b90
generation 1 starts at 0x92621000
generation 2 starts at 0x35b91000
ephemeral segment allocation context: none
segment begin allocated size
35b90000 35b91000 361323f0 0x5a13f0(5903344)
92620000 92621000 929fcd4c 0x3dbd4c(4046156)
Large object heap starts at 0x48391000
segment begin allocated size
48390000 48391000 48b76c48 0x7e5c48(8281160)
f0680000 f0681000 f06f4570 0x73570(472432)
Heap Size: Size: 0x11d62f4 (18703092) bytes.
------------------------------
Heap 28 (1a679770)
generation 0 starts at 0xe1c610dc
generation 1 starts at 0xe1c61000
generation 2 starts at 0x36b91000
ephemeral segment allocation context: none
segment begin allocated size
36b90000 36b91000 37076c64 0x4e5c64(5135460)
e1c60000 e1c61000 e1ed5044 0x274044(2572356)
Large object heap starts at 0x48b91000
segment begin allocated size
48b90000 48b91000 4937c3a8 0x7eb3a8(8303528)
f51d0000 f51d1000 f56afdf8 0x4dedf8(5107192)
Heap Size: Size: 0x1423e48 (21118536) bytes.
------------------------------
Heap 29 (1a67b078)
generation 0 starts at 0xa6621380
generation 1 starts at 0xa6621000
generation 2 starts at 0x37b91000
ephemeral segment allocation context: none
segment begin allocated size
37b90000 37b91000 37ecffc0 0x33efc0(3403712)
a6620000 a6621000 a6873190 0x252190(2433424)
Large object heap starts at 0x49391000
segment begin allocated size
49390000 49391000 49a365c8 0x6a55c8(6968776)
Heap Size: Size: 0xc36718 (12805912) bytes.
------------------------------
Heap 30 (1a67c980)
generation 0 starts at 0xb36238ac
generation 1 starts at 0xb3621000
generation 2 starts at 0x38b91000
ephemeral segment allocation context: none
segment begin allocated size
38b90000 38b91000 38eda4b8 0x3494b8(3445944)
b3620000 b3621000 b36978b8 0x768b8(485560)
Large object heap starts at 0x49b91000
segment begin allocated size
49b90000 49b91000 49ffd360 0x46c360(4637536)
Heap Size: Size: 0x82c0d0 (8569040) bytes.
------------------------------
Heap 31 (1a67e288)
generation 0 starts at 0x79a11784
generation 1 starts at 0x79a11000
generation 2 starts at 0x39b91000
ephemeral segment allocation context: none
segment begin allocated size
39b90000 39b91000 3a35caf0 0x7cbaf0(8174320)
79a10000 79a11000 79ec789c 0x4b689c(4941980)
Large object heap starts at 0x4a391000
segment begin allocated size
4a390000 4a391000 4a94e330 0x5bd330(6017840)
Heap Size: Size: 0x123f6bc (19134140) bytes.
------------------------------
GC Heap Size: Size: 0x1c1341b8 (471024056) bytes.
Based on the output from !vmstat, you are out of memory. There's some mild address space fragmentation, but you only have a total of ~350MB of free memory, so you're really running close to the address space limit. The largest free block is just 6MB, and the CLR allocates virtual memory segments that are at least 16MB in size.
Your total GC heap size is just 470MB (see the last line from the !eeheap -gc output), which means you have other stuff in your process using up address space. Namely, you have >500MB of images (DLLs) and >3GB of memory classified as "Private". This can be a bunch of different things; for example, it can be unmanaged heap allocations.
You can try to further zoom in on the space hog by running !heap -s -h 0 to see if you have large unmanaged heaps in your process. I suggest that once you have a direction (is it an unmanaged heap leak? something else?) to ask another question with your findings. From the information you posted so far, we can conclude it's likely unrelated to what the managed part of your application is doing. Do you have large unmanaged components in your application? There are techniques for analyzing unmanaged memory leaks, such as UMDH or ETW heap allocation tracing.
One final comment: why are you running a 32-bit app on a system with 32 processors? Looks like a server system, and I bet you have more than 4GB of physical memory. If it's at all under your control, try making the move to 64-bit.
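The core of this diagnosis is two small calculations on the !vmstat output: the largest free block is smaller than the 16777216-byte segment the GC failed to reserve, and the Free, Reserve and Commit totals together add up to roughly 4 GB, the whole 32-bit address space. The sketch below replays both, assuming "K" in !vmstat means 1024 bytes; the class and method names are hypothetical.

```java
public class SegmentReserveCheck {

    /** A reservation can only succeed if one contiguous free block is large enough. */
    public static boolean canReserve(long largestFreeBlockBytes, long requestBytes) {
        return largestFreeBlockBytes >= requestBytes;
    }

    /** Total address space accounted for, in KB. */
    public static long accountedKb(long freeKb, long reserveKb, long commitKb) {
        return freeKb + reserveKb + commitKb;
    }

    public static void main(String[] args) {
        long largestFreeBlock = 6_080L * 1024L;  // "Free: Large ... MAXIMUM 6,080K"
        long segmentRequest = 16_777_216L;       // "Failed to reserve memory (16777216 bytes)"
        System.out.println("can reserve segment: " + canReserve(largestFreeBlock, segmentRequest));

        // Summary totals for Free, Reserve and Commit from !vmstat.
        long totalKb = accountedKb(357_150L, 2_438_043L, 1_399_038L);
        System.out.println("accounted KB: " + totalKb + " of " + (4L * 1024L * 1024L));
    }
}
```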

Fielddata never gets evicted even though it uses more than indices.fielddata.cache.size

We set indices.fielddata.cache.size = '6gb' but even though the field data cache uses more than that, evictions are never happening. The circuit breaker eventually triggers:
"RemoteTransportException[[elasticsearch][inet[/0.0.0.0:9300]][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [myField] would be larger than limit of [9437184000/8.7gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [myField] would be larger than limit of [9437184000/8.7gb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [myField] would be larger than limit of [9437184000/8.7gb]];
All the field data settings:
indices.fielddata.cache.size: "6gb"
indices.breaker.fielddata.limit: "60%"
indices.breaker.request.limit: "30%"
indices.breaker.total.limit: "70%"
Here's what our field data size looks like for the cluster (call to /_stats/fielddata?fields=*&human&pretty):
"fielddata" : {
"memory_size" : "168.1gb",
"memory_size_in_bytes" : 180591558840,
"evictions" : 0
},
From my understanding the max size should be capped at 6 gb * 24 nodes = 144 gb.
