Java8 MetaspaceSize flag not working - java-8

I have simple test code that sets both -XX:MetaspaceSize and -XX:MaxMetaspaceSize to the same value. I expected that the metaspace would then not be resized dynamically. But in my testing (checking the Metaspace graph in VisualVM and the output printed by jstat), I saw the metaspace keep growing from a low value to the maximum I set. So is -XX:MetaspaceSize not working?
My testing code:
try {
    // Each iteration generates a new proxy class, which consumes metaspace.
    while (true) {
        Enhancer enhancer = new Enhancer();
        enhancer.setSuperclass(A.class);
        enhancer.setUseCache(false); // no caching, so every create() defines a fresh class
        enhancer.setCallback((MethodInterceptor) (obj, method, args1, methodProxy) -> methodProxy.invokeSuper(obj, args1));
        enhancer.create();
        Thread.sleep(50);
    }
} catch (Throwable throwable) {
    throwable.printStackTrace();
}
VM args:
-XX:MetaspaceSize=10m -XX:MaxMetaspaceSize=10m
Java version:
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
jstat result (the MC value keeps growing to around 10m):
C:\Users\dyu>jstat -gc 12336 1000 20
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
8192.0 8192.0 0.0 2207.8 49152.0 43399.2 131072.0 16.0 7168.0 6777.9 768.0 677.7 1 0.015 0 0.000 0.015
8192.0 8192.0 0.0 2207.8 49152.0 48166.1 131072.0 16.0 7168.0 6777.9 768.0 677.7 1 0.015 0 0.000 0.015
8192.0 8192.0 2592.0 0.0 49152.0 3691.9 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 7537.9 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 11378.9 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 16180.3 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 20021.3 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 24822.5 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 28663.5 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 33466.8 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 37312.8 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 42114.1 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 2592.0 0.0 49152.0 45955.1 131072.0 24.0 8832.0 8403.6 896.0 795.9 2 0.020 0 0.000 0.020
8192.0 8192.0 0.0 3488.0 49152.0 1925.1 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026
8192.0 8192.0 0.0 3488.0 49152.0 6737.6 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026
8192.0 8192.0 0.0 3488.0 49152.0 11758.5 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026
8192.0 8192.0 0.0 3488.0 49152.0 15608.7 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026
8192.0 8192.0 0.0 3488.0 49152.0 29056.4 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026
8192.0 8192.0 0.0 3488.0 49152.0 34196.6 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026
8192.0 8192.0 0.0 3488.0 49152.0 39339.7 131072.0 32.0 10752.0 10225.1 1024.0 934.2 3 0.026 0 0.000 0.026

You have misunderstood what -XX:MetaspaceSize does:
-XX:MetaspaceSize=size
Sets the size of the allocated class metadata space that will trigger a garbage collection the first time it is exceeded. This threshold for a garbage collection is increased or decreased depending on the amount of metadata used.
The name of that option might be considered misleading unless you interpret it as the “intended size”: exceeding it triggers a garbage collection, while the maximum size defines the hard limit.
There is an open bug report, JDK-8067205, calling for an option to set the initial metaspace size.
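To watch the distinction from inside the JVM, the following minimal sketch (not part of the original answer) reads the metaspace figures through the standard java.lang.management API. It assumes a HotSpot JVM, where the pool is reported under the name "Metaspace"; the class name MetaspaceProbe and its two helper methods are hypothetical. Run with the flags from the question, the committed value would be expected to start below the maximum and grow on demand, since -XX:MetaspaceSize only sets the first GC threshold.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

public class MetaspaceProbe {
    // Finds the metaspace memory pool. The pool name "Metaspace" is what
    // HotSpot reports; other JVMs may use a different name (assumption).
    public static Optional<MemoryPoolMXBean> findMetaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return Optional.of(pool);
            }
        }
        return Optional.empty();
    }

    // Formats used, committed and max in KB. MemoryUsage reports max as -1
    // when the limit is undefined.
    public static String describe(MemoryUsage usage) {
        long max = usage.getMax();
        return String.format("used=%dK committed=%dK max=%s",
                usage.getUsed() / 1024,
                usage.getCommitted() / 1024,
                max < 0 ? "unlimited" : (max / 1024) + "K");
    }

    public static void main(String[] args) {
        Optional<MemoryPoolMXBean> pool = findMetaspacePool();
        if (pool.isPresent()) {
            System.out.println(describe(pool.get().getUsage()));
        } else {
            System.out.println("No pool named Metaspace found on this JVM");
        }
    }
}
```

Calling describe() periodically from the test loop in the question should show the same growth of the committed size that jstat reports in the MC column.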


How to disable libx264 stderr output?

I'm writing a library that uses the ffmpeg library interface (libavcodec, libavformat, libavutil, etc.) to manipulate some video. Part of this involves some encoding for which I'm using libx264. Everything works great but libx264 writes output to stderr, for example:
[libx264 # 0x62cbc0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 # 0x62cbc0] profile High, level 5.0, 4:2:0, 8-bit
[libx264 # 0x62cbc0] frame I:28 Avg QP:26.78 size:144822
[libx264 # 0x62cbc0] frame P:135 Avg QP:32.21 size: 30853
[libx264 # 0x62cbc0] frame B:108 Avg QP:36.18 size: 15709
[libx264 # 0x62cbc0] consecutive B-frames: 20.3% 79.7%
[libx264 # 0x62cbc0] mb I I16..4: 9.5% 75.9% 14.6%
[libx264 # 0x62cbc0] mb P I16..4: 1.1% 1.5% 1.4% P16..4: 44.8% 9.6% 4.2% 0.0% 0.0% skip:37.5%
[libx264 # 0x62cbc0] mb B I16..4: 0.3% 0.2% 0.4% B16..8: 49.8% 4.6% 0.9% direct: 1.0% skip:42.7% L0:46.9% L1:49.8% BI: 3.2%
[libx264 # 0x62cbc0] final ratefactor: 26.71
[libx264 # 0x62cbc0] 8x8 transform intra:68.4% inter:70.4%
[libx264 # 0x62cbc0] direct mvs spatial:87.0% temporal:13.0%
[libx264 # 0x62cbc0] coded y,uvDC,uvAC intra: 76.2% 73.8% 31.6% inter: 9.7% 9.5% 4.2%
[libx264 # 0x62cbc0] i16 v,h,dc,p: 9% 57% 7% 27%
[libx264 # 0x62cbc0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 9% 21% 4% 8% 9% 7% 12% 7% 22%
[libx264 # 0x62cbc0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 28% 9% 6% 6% 6% 10% 6% 16%
[libx264 # 0x62cbc0] i8c dc,h,v,p: 20% 50% 19% 11%
[libx264 # 0x62cbc0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 # 0x62cbc0] ref P L0: 73.2% 12.6% 10.0% 2.7% 1.1% 0.3%
[libx264 # 0x62cbc0] ref B L0: 92.3% 5.6% 1.7% 0.4%
[libx264 # 0x62cbc0] kb/s:7912.01
It is not acceptable for my library to write to stdout or stderr. How can I programmatically disable this output to stderr?
I realize that I can "hijack" stderr, but this seems like a crude hack and I would strongly prefer not to do so.
The x264 command-line tool has a --quiet argument. When x264 is used as a library, the equivalent is to set:
param->i_log_level = X264_LOG_NONE;

Xcode keeps on crashing when indexing process starts

I'm on the latest beta of both Xcode and macOS:
Mac - macOS High Sierra 10.13 beta 7
Xcode - Xcode 9 beta 6
Since updating yesterday, Xcode crashes when I open my project and indexing starts. Other answers on Stack Overflow do not fix the issue.
I have attached the full log here.
Part of the crash log follows:
Process: Xcode [665]
Path: /Applications/Xcode-beta.app/Contents/MacOS/Xcode
Identifier: Xcode
Version: 9.0 (13238.4)
Code Type: X86-64 (Native)
Parent Process: ??? [1]
Responsible: Xcode [665]
User ID: 1105600005
Date/Time: 2017-08-24 07:59:26.512 +0530
OS Version: Mac OS X 10.13 (17A352a)
Report Version: 12
Bridge OS Version: 3.0 (14Y661)
Anonymous UUID: B0A9A8FC-BC57-8953-BB69-E279B3226BBF
Time Awake Since Boot: 630 seconds
System Integrity Protection: enabled
Crashed Thread: 19
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Illegal instruction: 4
Termination Reason: Namespace SIGNAL, Code 0x4
Terminating Process: exc handler [0]
Application Specific Information:
ProductBuildVersion: 9M214v
mcount overflow
Thread 0:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff796ece76 mach_msg_trap + 10
1 libsystem_kernel.dylib 0x00007fff796ec390 mach_msg + 60
2 com.apple.CoreFoundation 0x00007fff52083445 __CFRunLoopServiceMachPort + 341
3 com.apple.CoreFoundation 0x00007fff52082797 __CFRunLoopRun + 1783
4 com.apple.CoreFoundation 0x00007fff52081e13 CFRunLoopRunSpecific + 483
5 com.apple.HIToolbox 0x00007fff513a1876 RunCurrentEventLoopInMode + 286
6 com.apple.HIToolbox 0x00007fff513a15e6 ReceiveNextEventCommon + 613
7 com.apple.HIToolbox 0x00007fff513a1364 _BlockUntilNextEventMatchingListInModeWithFilter + 64
8 com.apple.AppKit 0x00007fff4f69f783 _DPSNextEvent + 2085
9 com.apple.AppKit 0x00007fff4fe34688 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044
10 com.apple.dt.DVTKit 0x000000010b45c8be -[DVTApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 390
11 com.apple.AppKit 0x00007fff4f694591 -[NSApplication run] + 764
12 com.apple.AppKit 0x00007fff4f663736 NSApplicationMain + 804
13 libdyld.dylib 0x00007fff795a6145 start + 1
Thread 1:
0 libsystem_kernel.dylib 0x00007fff796ece76 mach_msg_trap + 10
1 libsystem_kernel.dylib 0x00007fff796ec390 mach_msg + 60
2 com.apple.CoreFoundation 0x00007fff52083445 __CFRunLoopServiceMachPort + 341
3 com.apple.CoreFoundation 0x00007fff52082797 __CFRunLoopRun + 1783
4 com.apple.CoreFoundation 0x00007fff52081e13 CFRunLoopRunSpecific + 483
5 com.apple.Foundation 0x00007fff540ec3f6 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 277
6 com.apple.DTDeviceKitBase 0x000000011d2e94f8 +[DTDKRemoteDeviceConnection startServiceBrowsers] + 217
7 com.apple.Foundation 0x00007fff540fa6d8 __NSThread__start__ + 1197
8 libsystem_pthread.dylib 0x00007fff798306c1 _pthread_body + 340
9 libsystem_pthread.dylib 0x00007fff7983056d _pthread_start + 377
10 libsystem_pthread.dylib 0x00007fff7982fc5d thread_start + 13
0x7fff7983f000 - 0x7fff79846ff7 libsystem_symptoms.dylib (820.1.4) <24FD262C-9701-388A-AEDC-D675747F8CBD> /usr/lib/system/libsystem_symptoms.dylib
0x7fff79847000 - 0x7fff7985aff7 libsystem_trace.dylib (829.1.2) <10955EBB-1AC8-3085-9A2D-F3088CA2DF71> /usr/lib/system/libsystem_trace.dylib
0x7fff7985c000 - 0x7fff79861ff7 libunwind.dylib (35.3) <6D4FCD49-D2A9-3233-95C7-A7635CE265F2> /usr/lib/system/libunwind.dylib
0x7fff79862000 - 0x7fff7988dff7 libxpc.dylib (1205.1.10) <E7C5DB12-6D0E-3D1E-A743-F750DF112F5F> /usr/lib/system/libxpc.dylib
External Modification Summary:
Calls made by other processes targeting this process:
task_for_pid: 4
thread_create: 0
thread_set_state: 0
Calls made by this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by all processes on this machine:
task_for_pid: 7883
thread_create: 0
thread_set_state: 0
VM Region Summary:
ReadOnly portion of Libraries: Total=871.0M resident=0K(0%) swapped_out_or_unallocated=871.0M(100%)
Writable regions: Total=1.6G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.6G(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
Accelerate framework 384K 3
Activity Tracing 256K 2
CG backing stores 79.9M 5
CG image 1432K 57
CG raster data 104K 5
CoreAnimation 53.0M 357
CoreGraphics 8K 2
CoreImage 244K 28
CoreServices 3540K 2
CoreUI image data 4892K 30
CoreUI image file 308K 11
Dispatch continuations 16.0M 2
Foundation 348K 5
Image IO 1956K 45
JS JIT generated code 8K 3
JS JIT generated code (reserved) 1.0G 2 reserved VM address space (unallocated)
Kernel Alloc Once 8K 2
MALLOC 444.7M 110
MALLOC guard page 192K 44
MALLOC_LARGE (reserved) 7684K 3 reserved VM address space (unallocated)
Memory Tag 242 12K 2
Memory Tag 244 128K 3
Memory Tag 251 60K 3
Memory Tag 255 32K 2
SQLite page cache 4928K 11
STACK GUARD 56.1M 28
Stack 21.2M 28
VM_ALLOCATE 116K 17
WebKit Malloc 1056K 3
__DATA 79.7M 619
__FONT_DATA 4K 2
__GLSLBUILTINS 2588K 2
__LINKEDIT 264.7M 200
__TEXT 606.4M 594
__UNICODE 556K 2
libnetwork 128K 2
mapped file 228.4M 231
shared memory 704K 21
=========== ======= =======
TOTAL 2.8G 2450
TOTAL, minus reserved VM space 1.8G 2450
I would compare recent changes to your .xcodeproj on disk against the version in your repo. Xcode has crashed for me when I accidentally added a character to the xcproj file.
Use the Terminal to step back through the Git history to earlier commits until you find one where indexing works.
Let it finish indexing, and then go back to HEAD.

Titan/Cassandra read scalability, random servers going down or dropping messages

We are using Titan 1.0.0 backed by Cassandra 2.1.7. The cluster consists of 36 VMs, each with 16 GB RAM (6 GB heap) and 16 CPU cores. We use SSDs for the Cassandra data and a regular HDD for the commit logs. The RF is 3, and reads and writes are each done at CF=2. The Java version is 1.8.0_45. The cluster serves heavy read traffic. We are facing random outages: on average, two to three times a day one of the servers goes down. To take one particular instance, around 10-May-2017 04:00 AM there were too many READ/MUTATION drops and we had to restart the server. The following are some of the parameters we collected when the servers went down:
Can someone please help us with some pointers as to what is going wrong here and what can be done to stabilize the servers?
gc.log
2017-05-11T21:09:44.018+0530: 1900952.543: Total time for which application threads were stopped: 0.0156063 seconds, Stopping threads took: 0.0125466 seconds
2017-05-11T21:09:45.021+0530: 1900953.546: Total time for which application threads were stopped: 0.0031342 seconds, Stopping threads took: 0.0006697 seconds
2017-05-11T21:09:46.025+0530: 1900954.550: Total time for which application threads were stopped: 0.0037100 seconds, Stopping threads took: 0.0011525 seconds
2017-05-11T21:09:47.031+0530: 1900955.556: Total time for which application threads were stopped: 0.0057972 seconds, Stopping threads took: 0.0030861 seconds
2017-05-11T21:09:48.034+0530: 1900956.559: Total time for which application threads were stopped: 0.0029592 seconds, Stopping threads took: 0.0007129 seconds
2017-05-11T21:09:49.061+0530: 1900957.586: Total time for which application threads were stopped: 0.0268731 seconds, Stopping threads took: 0.0242117 seconds
2017-05-11T21:10:03.095+0530: 1900971.620: Total time for which application threads were stopped: 0.0045643 seconds, Stopping threads took: 0.0009709 seconds
2017-05-11T21:10:04.099+0530: 1900972.624: Total time for which application threads were stopped: 0.0033764 seconds, Stopping threads took: 0.0007673 seconds
2017-05-11T21:10:07.104+0530: 1900975.629: Total time for which application threads were stopped: 0.0056445 seconds, Stopping threads took: 0.0019495 seconds
2017-05-11T21:10:07.447+0530: 1900975.972: Total time for which application threads were stopped: 0.0045721 seconds, Stopping threads took: 0.0009695 seconds
{Heap before GC invocations=85258 (full 0):
garbage-first heap total 6291456K, used 5059586K [0x0000000640000000, 0x0000000640206000, 0x00000007c0000000)
region size 2048K, 1844 young (3776512K), 105 survivors (215040K)
Metaspace used 36253K, capacity 36654K, committed 36784K, reserved 1081344K
class space used 3844K, capacity 3965K, committed 4016K, reserved 1048576K
2017-05-11T21:10:07.453+0530: 1900975.978: [GC pause (GCLocker Initiated GC) (young)
Desired survivor size 242221056 bytes, new threshold 15 (max 15)
- age 1: 182659896 bytes, 182659896 total
- age 2: 21418160 bytes, 204078056 total
- age 3: 871016 bytes, 204949072 total
- age 4: 3600512 bytes, 208549584 total
- age 5: 1313096 bytes, 209862680 total
- age 6: 21152 bytes, 209883832 total
, 0.0972867 secs]
[Parallel Time: 83.6 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1900975980.9, Avg: 1900975986.0, Max: 1900975989.4, Diff: 8.5]
[Ext Root Scanning (ms): Min: 8.5, Avg: 11.9, Max: 17.4, Diff: 8.9, Sum: 154.3]
[Update RS (ms): Min: 26.8, Avg: 30.8, Max: 34.2, Diff: 7.4, Sum: 399.8]
[Processed Buffers: Min: 31, Avg: 87.3, Max: 144, Diff: 113, Sum: 1135]
[Scan RS (ms): Min: 0.1, Avg: 2.8, Max: 6.2, Diff: 6.1, Sum: 36.6]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.9]
[Object Copy (ms): Min: 31.5, Avg: 32.3, Max: 33.2, Diff: 1.7, Sum: 419.9]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.1]
[GC Worker Total (ms): Min: 74.7, Avg: 78.1, Max: 83.3, Diff: 8.7, Sum: 1014.7]
[GC Worker End (ms): Min: 1900976063.9, Avg: 1900976064.1, Max: 1900976064.3, Diff: 0.3]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.4 ms]
[Other: 12.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.6 ms]
[Humongous Reclaim: 0.4 ms]
[Free CSet: 4.5 ms]
[Eden: 3478.0M(3476.0M)->0.0B(252.0M) Survivors: 210.0M->54.0M Heap: 4941.5M(6144.0M)->1175.0M(6144.0M)]
Heap after GC invocations=85259 (full 0):
garbage-first heap total 6291456K, used 1203201K [0x0000000640000000, 0x0000000640206000, 0x00000007c0000000)
region size 2048K, 27 young (55296K), 27 survivors (55296K)
Metaspace used 36253K, capacity 36654K, committed 36784K, reserved 1081344K
class space used 3844K, capacity 3965K, committed 4016K, reserved 1048576K
}
[Times: user=0.58 sys=0.21, real=0.09 secs]
2017-05-11T21:10:07.551+0530: 1900976.076: Total time for which application threads were stopped: 0.1015506 seconds, Stopping threads took: 0.0004470 seconds
2017-05-11T21:10:08.557+0530: 1900977.082: Total time for which application threads were stopped: 0.0053236 seconds, Stopping threads took: 0.0010397 seconds
{Heap before GC invocations=85259 (full 0):
garbage-first heap total 6291456K, used 1461249K [0x0000000640000000, 0x0000000640206000, 0x00000007c0000000)
region size 2048K, 153 young (313344K), 27 survivors (55296K)
Metaspace used 36253K, capacity 36654K, committed 36784K, reserved 1081344K
class space used 3844K, capacity 3965K, committed 4016K, reserved 1048576K
2017-05-11T21:10:08.995+0530: 1900977.520: [GC pause (G1 Evacuation Pause) (mixed)
Desired survivor size 20971520 bytes, new threshold 3 (max 15)
- age 1: 19769440 bytes, 19769440 total
- age 2: 801568 bytes, 20571008 total
- age 3: 20679216 bytes, 41250224 total
- age 4: 738736 bytes, 41988960 total
- age 5: 3536096 bytes, 45525056 total
- age 6: 1242648 bytes, 46767704 total
- age 7: 19208 bytes, 46786912 total
, 0.1879873 secs]
[Parallel Time: 175.3 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1900977524.7, Avg: 1900977525.0, Max: 1900977525.3, Diff: 0.6]
[Ext Root Scanning (ms): Min: 14.5, Avg: 14.8, Max: 15.3, Diff: 0.8, Sum: 192.9]
[Update RS (ms): Min: 7.1, Avg: 7.9, Max: 13.3, Diff: 6.1, Sum: 103.2]
[Processed Buffers: Min: 4, Avg: 37.0, Max: 93, Diff: 89, Sum: 481]
[Scan RS (ms): Min: 55.7, Avg: 60.9, Max: 61.8, Diff: 6.1, Sum: 791.9]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.7, Diff: 0.7, Sum: 0.9]
[Object Copy (ms): Min: 90.3, Avg: 90.7, Max: 91.2, Diff: 0.9, Sum: 1179.5]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 0.8]
[GC Worker Total (ms): Min: 174.3, Avg: 174.6, Max: 174.8, Diff: 0.6, Sum: 2269.2]
[GC Worker End (ms): Min: 1900977699.6, Avg: 1900977699.6, Max: 1900977699.7, Diff: 0.1]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.9 ms]
[Other: 11.5 ms]
[Choose CSet: 3.3 ms]
[Ref Proc: 0.4 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 3.6 ms]
[Eden: 252.0M(252.0M)->0.0B(3656.0M) Survivors: 54.0M->30.0M Heap: 1427.0M(6144.0M)->890.0M(6144.0M)]
Heap after GC invocations=85260 (full 0):
garbage-first heap total 6291456K, used 911310K [0x0000000640000000, 0x0000000640206000, 0x00000007c0000000)
region size 2048K, 15 young (30720K), 15 survivors (30720K)
Metaspace used 36253K, capacity 36654K, committed 36784K, reserved 1081344K
class space used 3844K, capacity 3965K, committed 4016K, reserved 1048576K
}
[Times: user=1.59 sys=0.60, real=0.19 secs]
jstat
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 100.00 2.29 37.85 98.56 95.74 79645 6351.963 0 0.000 6351.963
0.00 100.00 4.63 37.85 98.56 95.74 79645 6351.963 0 0.000 6351.963
0.00 100.00 7.81 37.85 98.56 95.74 79645 6351.963 0 0.000 6351.963
0.00 100.00 10.46 37.85 98.56 95.74 79645 6351.963 0 0.000 6351.963
0.00 100.00 13.17 37.81 98.56 95.74 79645 6351.963 0 0.000 6351.963
0.00 100.00 15.30 38.11 98.56 95.74 79645 6351.963 0 0.000 6351.963
/var/log/cassandra/system.log
WARN [SharedPool-Worker-1] 2017-05-11 21:08:49,621 SliceQueryFilter.java:319 - Read 286 live and 12978 tombstone cells in icmsgraph_10042017.edgestore for key: f800000000104600 (see tombstone_warn_threshold). 2147483647 columns were requested, slices=[b02ea000800000350080000002-b02ea000800000350080000003]
WARN [SharedPool-Worker-1] 2017-05-11 21:08:52,647 SliceQueryFilter.java:319 - Read 286 live and 12978 tombstone cells in icmsgraph_10042017.edgestore for key: f800000000104600 (see tombstone_warn_threshold). 2147483647 columns were requested, slices=[b02ea000800000350080000002-b02ea000800000350080000003]
WARN [GossipTasks:1] 2017-05-11 21:09:05,030 Gossiper.java:714 - Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2017-05-11 21:09:16,588 FailureDetector.java:249 - Not marking nodes down due to local pause of 16886451559 > 5000000000
INFO [ScheduledTasks:1] 2017-05-11 21:09:16,954 MessagingService.java:888 - 1924 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,153 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,154 StatusLogger.java:66 - MutationStage 0 0 32568360 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,155 StatusLogger.java:66 - RequestResponseStage 1 0 12112153129 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,155 StatusLogger.java:66 - ReadRepairStage 0 0 593141839 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,155 StatusLogger.java:66 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,155 StatusLogger.java:66 - ReadStage 15 0 12168491480 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,155 StatusLogger.java:66 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,156 StatusLogger.java:66 - HintedHandoff 0 1 3531 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,156 StatusLogger.java:66 - GossipStage 0 0 5798815 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,156 StatusLogger.java:66 - CacheCleanupExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,156 StatusLogger.java:66 - InternalResponseStage 0 0 537 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,156 StatusLogger.java:66 - CommitLogArchiver 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,157 StatusLogger.java:66 - CompactionExecutor 0 0 29915673 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,157 StatusLogger.java:66 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,157 StatusLogger.java:66 - MigrationStage 0 0 3990 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,157 StatusLogger.java:66 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,158 StatusLogger.java:66 - PendingRangeCalculator 0 0 255 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,159 StatusLogger.java:66 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,159 StatusLogger.java:66 - MemtableFlushWriter 0 0 7705 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,159 StatusLogger.java:66 - MemtablePostFlush 0 0 44884 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,159 StatusLogger.java:66 - MemtableReclaimMemory 0 0 7705 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,160 StatusLogger.java:75 - CompactionManager 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,160 StatusLogger.java:87 - MessagingService n/a 57/3
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,160 StatusLogger.java:97 - Cache Type Size Capacity KeysToSave
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,160 StatusLogger.java:99 - KeyCache 104857584 104857600 all
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,160 StatusLogger.java:105 - RowCache 0 0 all
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,160 StatusLogger.java:112 - ColumnFamily Memtable ops,data
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,162 StatusLogger.java:115 - system.compaction_history 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,162 StatusLogger.java:115 - system.hints 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,162 StatusLogger.java:115 - system.IndexInfo 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,162 StatusLogger.java:115 - system.schema_columnfamilies 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,162 StatusLogger.java:115 - system.schema_triggers 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,162 StatusLogger.java:115 - system.size_estimates 762300,93404798
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.paxos 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.peer_events 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.range_xfers 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.compactions_in_progress 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.peers 283,103428
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.schema_keyspaces 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.schema_usertypes 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.local 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.sstable_activity 315,63532
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.schema_columns 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,163 StatusLogger.java:115 - system.batchlog 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.titan_ids 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.graphindex 1843,277464
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.system_properties_lock_ 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.system_properties 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.edgestore_lock_ 6,1191
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.systemlog 741,242777
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.txlog 741,242777
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.graphindex_lock_ 1332,277240
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_10042017.edgestore 75171,16624309
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_05052017.system_properties_lock_ 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,169 StatusLogger.java:115 - icmsgraph_05052017.system_properties 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.edgestore_lock_ 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.txlog 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.edgestore 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.graphindex 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.titan_ids 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.systemlog 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:17,170 StatusLogger.java:115 - icmsgraph_05052017.graphindex_lock_ 0,0
WARN [SharedPool-Worker-24] 2017-05-11 21:09:17,200 SliceQueryFilter.java:319 - Read 286 live and 12978 tombstone cells in icmsgraph_10042017.edgestore for key: f800000000104600 (see tombstone_warn_threshold). 2147483647 columns were requested, slices=[b02ea000800000350080000002-b02ea000800000350080000003]
WARN [SharedPool-Worker-18] 2017-05-11 21:09:17,201 SliceQueryFilter.java:319 - Read 286 live and 12978 tombstone cells in icmsgraph_10042017.edgestore for key: f800000000104600 (see tombstone_warn_threshold). 2147483647 columns were requested, slices=[b02ea000800000350080000002-b02ea000800000350080000003]
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,170 MessagingService.java:888 - 1 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,170 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,171 StatusLogger.java:66 - MutationStage 0 0 32568376 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,171 StatusLogger.java:66 - RequestResponseStage 0 0 12112193109 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,171 StatusLogger.java:66 - ReadRepairStage 0 0 593143878 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,171 StatusLogger.java:66 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,171 StatusLogger.java:66 - ReadStage 0 0 12168508581 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,171 StatusLogger.java:66 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - HintedHandoff 0 1 3531 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - GossipStage 0 0 5798844 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - CacheCleanupExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - InternalResponseStage 0 0 537 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - CommitLogArchiver 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - CompactionExecutor 0 0 29915673 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - MigrationStage 0 0 3990 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - PendingRangeCalculator 0 0 255 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,172 StatusLogger.java:66 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:66 - MemtableFlushWriter 0 0 7705 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:66 - MemtablePostFlush 0 0 44884 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:66 - MemtableReclaimMemory 0 0 7705 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:75 - CompactionManager 0 0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:87 - MessagingService n/a 0/0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:97 - Cache Type Size Capacity KeysToSave
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,173 StatusLogger.java:99 - KeyCache 104857584 104857600 all
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,174 StatusLogger.java:105 - RowCache 0 0 all
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,174 StatusLogger.java:112 - ColumnFamily Memtable ops,data
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.titan_ids 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.graphindex 1844,277804
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.system_properties_lock_ 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.system_properties 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.edgestore_lock_ 6,1191
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.systemlog 741,242777
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.txlog 741,242777
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.graphindex_lock_ 1332,277240
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,179 StatusLogger.java:115 - icmsgraph_10042017.edgestore 75176,16625645
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.system_properties_lock_ 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.system_properties 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.edgestore_lock_ 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.txlog 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.edgestore 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.graphindex 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.titan_ids 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.systemlog 0,0
INFO [ScheduledTasks:1] 2017-05-11 21:09:22,180 StatusLogger.java:115 - icmsgraph_05052017.graphindex_lock_ 0,0
WARN [SharedPool-Worker-34] 2017-05-11 21:09:24,177 SliceQueryFilter.java:319 - Read 286 live and 12978 tombstone cells in icmsgraph_10042017.edgestore for key: f800000000104600 (see tombstone_warn_threshold). 2147483647 columns were requested, slices=[b02ea000800000350080000002-b02ea000800000350080000003]
ThreadDump
Around 600 Thrift threads:
"Thrift:20118" #124110 daemon prio=5 os_prio=0 tid=0x00007feab9a71460 nid=0xb1c6 runnable [0x00007fea6a578000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked (a java.io.BufferedInputStream)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
72 MessagingService-Incoming threads ...
"MessagingService-Incoming-/192.168.33.67" #124106 prio=5 os_prio=0 tid=0x00007feab98436b0 nid=0xb0ca runnable [0x00007fea7503a000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
- locked (a java.lang.Object)
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:192)
- locked (a java.lang.Object)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
- locked (a sun.nio.ch.SocketAdaptor$SocketInputStream)
at net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:215)
at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:149)
at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:101)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:169)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
sar -q
09:10:01 IST 5 1396 16.02 9.24 6.31 0
I think the problem is Cassandra's tombstones.
In Cassandra, a tombstone is created whenever data is deleted, a null is explicitly inserted, or a TTL expires. Tombstones are the mechanism that lets Cassandra write fast, but they come at an operational price.
First, tombstones are themselves records. They take up space and can substantially increase the amount of storage you require.
Second, a large number of tombstones causes latency and heap pressure.
From your Cassandra system log, it seems that about 13K tombstones were read for a single vertex (vertex key ID: 0xf800000000104600). That is huge. You can see that whenever this warning appears in Cassandra's system.log, heap pressure increases, and so gc.log entries are printed at the same time.
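To see how bad a given read was, the counts can be pulled straight out of the warning line. The following is only a minimal sketch: `TombstoneWarning` is a hypothetical helper, not part of Cassandra, and the regular expression assumes the message wording shown in the system log above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Cassandra): extracts the counts from a
// SliceQueryFilter tombstone warning like the one in the system log.
final class TombstoneWarning {
    // Assumes the wording "Read <live> live and <n> tombstone cells in <table> for key: <key>".
    private static final Pattern WARNING = Pattern.compile(
            "Read (\\d+) live and (\\d+) tombstone cells in (\\S+) for key: (\\S+)");

    final long live;
    final long tombstones;
    final String table;
    final String key;

    private TombstoneWarning(long live, long tombstones, String table, String key) {
        this.live = live;
        this.tombstones = tombstones;
        this.table = table;
        this.key = key;
    }

    // Returns null when the line is not a tombstone warning.
    static TombstoneWarning parse(String logLine) {
        Matcher m = WARNING.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new TombstoneWarning(Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)), m.group(3), m.group(4));
    }

    // Fraction of the cells read that were tombstones, between 0 and 1.
    double tombstoneFraction() {
        long total = live + tombstones;
        return total == 0 ? 0.0 : (double) tombstones / total;
    }

    public static void main(String[] args) {
        String line = "WARN [SharedPool-Worker-34] 2017-05-11 21:09:24,177 SliceQueryFilter.java:319 - "
                + "Read 286 live and 12978 tombstone cells in icmsgraph_10042017.edgestore "
                + "for key: f800000000104600 (see tombstone_warn_threshold).";
        TombstoneWarning w = parse(line);
        System.out.printf("%s key=%s live=%d tombstones=%d tombstone share=%.1f%%%n",
                w.table, w.key, w.live, w.tombstones, w.tombstoneFraction() * 100);
    }
}
```

For the line in the question, roughly 98% of the cells scanned were tombstones, so almost all of the work done by that read was wasted.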
What should I do?
Cassandra fully drops those tombstones when a compaction runs, but only after local_delete_time + gc_grace_seconds, where gc_grace_seconds is defined on the table the data belongs to.
Run nodetool repair regularly, about once a week.
You can force a major compaction on one or more tables with nodetool compact.
Change the table's compaction class (see "Which compaction class to use").
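The local_delete_time + gc_grace_seconds rule above can be sketched as simple date arithmetic. This is an illustration only: `TombstoneGcWindow` and its methods are hypothetical names, not Cassandra code, and the 864000-second value is Cassandra's default gc_grace_seconds (10 days), which your tables may override.

```java
import java.time.Instant;

// Hypothetical illustration (not Cassandra code) of when a tombstone
// becomes eligible to be dropped by a compaction.
final class TombstoneGcWindow {
    // Cassandra's default gc_grace_seconds: 10 days. Tables can override it.
    static final long DEFAULT_GC_GRACE_SECONDS = 864_000L;

    // Earliest moment at which a compaction is allowed to drop the tombstone.
    static Instant earliestPurge(Instant localDeleteTime, long gcGraceSeconds) {
        return localDeleteTime.plusSeconds(gcGraceSeconds);
    }

    // True once the grace period has fully elapsed at time "now".
    static boolean isPurgeable(Instant localDeleteTime, long gcGraceSeconds, Instant now) {
        return now.isAfter(earliestPurge(localDeleteTime, gcGraceSeconds));
    }

    public static void main(String[] args) {
        // Example deletion time chosen for illustration only.
        Instant deleted = Instant.parse("2017-05-01T00:00:00Z");
        Instant now = Instant.parse("2017-05-11T21:09:24Z");
        System.out.println("earliest purge: " + earliestPurge(deleted, DEFAULT_GC_GRACE_SECONDS));
        System.out.println("purgeable now:  " + isPurgeable(deleted, DEFAULT_GC_GRACE_SECONDS, now));
    }
}
```

Being past this point only makes a tombstone eligible; it is still on disk, and still scanned by reads, until a compaction actually rewrites the SSTable that holds it.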
Read more:
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
https://opencredo.com/cassandra-tombstones-common-issues/

JVM Crashes while 25000 clients are accessing remote kaa Server resource

I run 25000 clients that each upload a log to the server every second. The server crashes during the run. From the log file, we found that the cause was a JVM crash. The error log shows:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f032964f085, pid=2043, tid=0x00007f02955cd700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_111-b14) (build 1.8.0_111-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x5c4085] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x45
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid2043.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
My JVM arguments, system details, and other information:
VM Arguments:
jvm_args: -Xms256M -Xmx16G -XX:+UseG1GC -Dfile.encoding=UTF8 -Dserver_log_dir=/var/log/kaa -Dserver_log_sufix= -Dserver_home_dir=/usr/lib/kaa-node -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
java_command: org.kaaproject.kaa.server.node.KaaNodeApplication
java_class_path (initial): /usr/lib/kaa-node/conf:/usr/lib/kaa-node/lib/spring-context-4.2.5.RELEASE.jar:/usr/lib/kaa-node/lib/curator-client-2.9.0.jar:/usr/lib/kaa-node/lib/spring-jdbc-4.2.5.RELEASE.jar:/usr/lib/kaa-node/lib/javax.annotation-api-1.2.jar:/usr/lib/kaa-node/lib/log4j-over-slf4j-1.7.7.jar:/usr/lib/kaa-node/lib/jna-4.0.0.jar:/usr/lib/kaa-node/lib/jetty-server-9.2.2.v20140723.jar:/usr/lib/kaa-node/lib/fastutil-6.5.7.jar:/usr/lib/kaa-node/lib/application-action-0.0.64.jar:/usr/lib/kaa-node/lib/commons-collections-3.2.1.jar:/usr/lib/kaa-node/lib/joda-time-2.2.jar:/usr/lib/kaa-node/lib/httpcore-4.3.2.jar:/usr/lib/kaa-node/lib/velocity-1.7.jar:/usr/lib/kaa-node/lib/spring-tx-4.2.5.RELEASE.jar:/usr/lib/kaa-node/lib/hibernate-entitymanager-4.3.11.Final.jar:/usr/lib/kaa-node/lib/jetty-http-9.2.2.v20140723.jar:/usr/lib/kaa-node/lib/commons-cli-1.2.jar:/usr/lib/kaa-node/lib/gwt-client-0.2.1.jar:/usr/lib/kaa-node/lib/swagger-annotations-1.5.9.jar:/usr/lib/kaa-node/lib/jandex-1.1.0.Final.jar:/usr/lib/kaa-node/lib/core-0.10.0.jar:/usr/lib/kaa-node/lib/jetty-security-9.2.2.v20140723.jar:/usr/lib/kaa-node/lib/commons-compress-1.8.jar:/usr/lib/kaa-node/lib/jackson-core-asl-1.9.13.jar:/usr/lib/kaa-node/lib/netty-codec-4.0.34.Final.jar:/usr/lib/kaa-node/lib/dao-0.10.0.jar:/usr/lib/kaa-node/lib/file-appender-0.10.0.jar:/usr/lib/kaa-node/lib/spring-security-web-3.2.9.RELEASE.jar:/usr/lib/kaa-node/lib/gwtquery-1.4.2.jar:/usr/lib/kaa-node/lib/facebook-verifier-0.10.0.jar:/usr/lib/kaa-node/lib/transport-0.10.0-tcp.jar:/usr/lib/kaa-node/lib/spring-data-mongodb-1.9.4.RELEASE.jar:/usr/lib/kaa-node/lib/cassandra-driver-extras-3.0.0.jar:/usr/lib/kaa-node/lib/cassandra-all-3.4.jar:/usr/lib/kaa-node/lib/jcl-over-slf4j-1.7.21.jar:/usr/lib/kaa-node/lib/ant-launcher-1.9.4.jar:/usr/lib/kaa-node/lib/hamcrest-core-1.3.jar:/usr/lib/kaa-node/lib/aspectjrt-1.7.4.jar:/usr/lib/kaa-node/lib/guava-18.0.jar:/usr/lib/kaa-node/lib/spring-beans-4.2.5.RELEASE.jar:/usr/lib/kaa-node/lib/kaa-node-0.10
Launcher Type: SUN_STANDARD
Environment Variables:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
SHELL=/bin/bash
SYSTEM:
OS:DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"
uname:Linux 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64
libc:glibc 2.19 NPTL 2.19
rlimit: STACK 8192k, CORE 0k, NPROC 32768, NOFILE 65536, AS infinity
load average:23.81 21.57 24.39
/proc/meminfo:
MemTotal: 32629180 kB
MemFree: 11245384 kB
MemAvailable: 16204112 kB
Buffers: 116504 kB
Cached: 5084432 kB
SwapCached: 0 kB
Active: 9205152 kB
Inactive: 3234744 kB
Active(anon): 7260192 kB
Inactive(anon): 1048 kB
Active(file): 1944960 kB
Inactive(file): 3233696 kB
Unevictable: 8523068 kB
Mlocked: 8523068 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 3096 kB
Writeback: 0 kB
AnonPages: 15762160 kB
Mapped: 168560 kB
Shmem: 1384 kB
Slab: 280612 kB
SReclaimable: 181816 kB
SUnreclaim: 98796 kB
KernelStack: 17856 kB
PageTables: 36468 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 16314588 kB
Committed_AS: 18242600 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 15398912 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 70872 kB
DirectMap2M: 2754560 kB
DirectMap1G: 30408704 kB
CPU:total 8 (4 cores per cpu, 2 threads per core) family 6 model 60 stepping 3, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, clmul, erms, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2
/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3800.109
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3860.859
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3799.968
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3799.968
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3893.906
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3800.109
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3799.968
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1d
cpu MHz : 3799.968
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 7183.28
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
Memory: 4k page, physical 32629180k(11245384k free), swap 0k(0k free)
vm_info: Java HotSpot(TM) 64-Bit Server VM (25.111-b14) for linux-amd64 JRE (1.8.0_111-b14), built on Sep 22 2016 16:14:03 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
time: Fri Nov 25 05:01:22 2016
elapsed time: 21112 seconds (0d 5h 51m 52s)
I have limited experience with the JVM, so I searched the net for a long time and found related errors on Oracle's site, but I didn't find a solution there. This line from my error log:
Memory: 4k page, physical 32629180k(11245384k free), swap 0k(0k free)
seems to show that too much physical memory is occupied.
A crash like this can be caused by any bug that corrupts heap memory. It could be an issue with the GC, with the compiler, or with bad native code.
If you don't use any native libraries that might have corrupted the heap, this is a bug in the JVM. You should check whether Oracle already knows about the bug and (if not) file a bug report.
The name of the problematic frame (G1ParScanThreadState::copy_to_survivor_space) strongly suggests that the crash happens in the garbage collector (GC). As a workaround until the bug is fixed, you can try any of the following:
Monitor the garbage collector and make sure that memory usage doesn't increase over time and that the garbage collector doesn't use too much CPU time.
Change the garbage collector parameters (see Java's command line parameters).
Switch to a different garbage collector (see Java's command line parameters).
As you're trying to work around a bug, it's trial and error.
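For the monitoring suggestion, one option is to read the JVM's own GC counters through the standard `java.lang.management` API. The sketch below is only an illustration: `GcOverhead` is a hypothetical class name, and it reports the GC time the collector beans expose as a share of JVM uptime, which is an approximation rather than an exact CPU figure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical monitoring helper built on the standard platform MXBeans.
final class GcOverhead {
    // Approximate accumulated GC time in milliseconds across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // GC time as a fraction of JVM uptime since start.
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC share of uptime: %.2f%%%n", gcFraction() * 100);
    }
}
```

Logging these numbers periodically inside the server shows whether GC time climbs as the clients ramp up. Since the JVM in the question is started with JMX remote enabled on port 7091, the same beans should also be visible from a JMX client such as VisualVM or JConsole.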

Perf tool not detecting TLB misses

I'm trying to measure TLB misses on my laptop, which has the following configuration:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 61
model name : Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
stepping : 4
microcode : 0x1d
cpu MHz : 1593.625
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt
bugs :
bogomips : 4389.43
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
The above is for processor 0, with similar output for processors 1, 2 and 3.
Here's my result for trying to measure TLB misses:
perf stat -B -e dTLB-load-misses sleep 2
Performance counter stats for 'sleep 2':
0 dTLB-load-misses
2.001923304 seconds time elapsed
Not sure how to interpret this. Any insights? I read somewhere that perf doesn't work well on Sandy Bridge laptops...

Resources