On a 6-node Cassandra (datastax-community-2.1) cluster running on EC2 (i2.xlarge) instances, the JVM hosting the Cassandra service randomly crashes with the following error.
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 12288 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2747), pid=26159, tid=140305605682944
#
# JRE version: Java(TM) SE Runtime Environment (7.0_60-b19) (build 1.7.0_60-b19)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.60-b09 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
--------------- T H R E A D ---------------
Current thread (0x0000000008341000): JavaThread "MemtableFlushWriter:2055" daemon [_thread_new, id=23336, stack(0x00007f9b71c56000,0x00007f9b71c97000)]
Stack: [0x00007f9b71c56000,0x00007f9b71c97000], sp=0x00007f9b71c95820, free space=254k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x99e7ca] VMError::report_and_die()+0x2ea
V [libjvm.so+0x496fbb] report_vm_out_of_memory(char const*, int, unsigned long, char const*)+0x9b
V [libjvm.so+0x81d81e] os::Linux::commit_memory_impl(char*, unsigned long, bool)+0xfe
V [libjvm.so+0x81d8dc] os::pd_commit_memory(char*, unsigned long, bool)+0xc
V [libjvm.so+0x81565a] os::commit_memory(char*, unsigned long, bool)+0x2a
V [libjvm.so+0x81bdcd] os::pd_create_stack_guard_pages(char*, unsigned long)+0x6d
V [libjvm.so+0x9522de] JavaThread::create_stack_guard_pages()+0x5e
V [libjvm.so+0x958c24] JavaThread::run()+0x34
V [libjvm.so+0x81f7f8] java_start(Thread*)+0x108
Modifications in cassandra-env.sh:
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -XX:TargetSurvivorRatio=50"
JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
JVM_OPTS="$JVM_OPTS -XX:+UseLargePages"
Here are the full JVM crash logs from two machines:
db_06
db_05
Writes are about 10K-15K/sec and there are very few reads. Cassandra 2.0.9 with the same settings never crashed. Any clues as to what might be causing the crashes?
Thanks
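The crash header points at native memory rather than Java heap, and the failing frame is create_stack_guard_pages, so the usual suspects on Linux are the per-process mapping limit and strict overcommit. A minimal diagnostic sketch, assuming a Linux host (the sysctl names are standard; the interpretation in the comments is a rule of thumb, not an official threshold):

```shell
# Sketch: check the usual native-memory culprits behind
# "unable to commit reserved memory" JVM crashes (assumes Linux).

# Max number of memory mappings per process; every Java thread adds
# mappings, and hitting this limit fails allocations with ENOMEM:
cat /proc/sys/vm/max_map_count

# Overcommit policy (2 = strict accounting, which can refuse commits
# even while RAM appears free):
cat /proc/sys/vm/overcommit_memory

# Overall memory and commit headroom at the OS level:
grep -E 'MemFree|SwapFree|CommitLimit|Committed_AS' /proc/meminfo
```

If vm.max_map_count is still at the default (65530), raising it is commonly recommended for Cassandra hosts. The crashing thread's name (MemtableFlushWriter:2055) also hints that a very large number of threads has been created over the process lifetime, so reducing thread counts or the thread stack size (-Xss) is worth trying.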
Related
When I run OMNeT++ and its IDE has finished loading, both the IDE and OMNeT++ itself terminate automatically after about 20 seconds.
What are the possible reasons for this unwanted behavior?
OMNeT++ is installed on Ubuntu 22.04 LTS inside a VirtualBox virtual machine.
Thanks in advance
Error Log :
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f5970257bd2, pid=2831, tid=3159
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.1+12 (17.0.1+12) (build 17.0.1+12)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.1+12 (17.0.1+12, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libc.so.6+0x7fbd2] fread+0x22
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/maryam/core.2831)
#
# An error report file with more information is saved as:
# /home/maryam/hs_err_pid2831.log
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
If the IDE just disappears without any interaction or error report, the most likely cause is that your VM is configured with a very low amount of memory and the OOM killer kills the IDE process because of low-memory conditions.
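To confirm that theory, the kernel log records every OOM kill. A quick check, assuming a Linux guest (run inside the VM; dmesg may require sudo on some distributions):

```shell
# Sketch: check whether the OOM killer terminated the IDE process.

# OOM-killer activity appears in the kernel log, including the victim
# process name and its memory usage at the time of the kill:
dmesg | grep -iE 'out of memory|oom-killer|killed process' || echo "no OOM events logged"

# Total RAM the VirtualBox guest actually sees:
grep MemTotal /proc/meminfo
```

If OOM kills show up, increase the VM's base memory in the VirtualBox settings rather than reinstalling anything.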
The error below is occurring in the logs. The application was running fine until last month, when this started happening all of a sudden, without any changes. From what I have found, this happens when a zip/jar file that is in use is modified, especially on Linux (Windows does not allow such modification). It is suggested to find the problematic code/scenario, which I have not been able to do so far because this happens only in production. It is also suggested to run the JDK with the -Dsun.zip.disableMemoryMapping=true system property, which disables memory mapping so that the application fails with a "file corrupted" type ZipException instead of crashing. This property also has some performance impact. Could you please help me find the problematic code? How likely is this property to resolve the issue?
Thanks in advance.
Below are the errors shown:
1.
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fe3fca8053e, pid=6120, tid=140605725304576
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libc.so.6+0x14f53e] __memmove_ssse3_back+0x23de
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
2.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f9137df6802, pid=13359, tid=140250685896448
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libzip.so+0x11802] newEntry+0x62
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
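Since the suspected cause is a jar being replaced while the JVM has it memory-mapped, one way to hunt for the problematic scenario is to compare the modification times of the jars the running process holds open against the process start time. A rough sketch, assuming a Linux host with lsof installed; the PID value here is just the one from the first crash log and would be your live process in practice:

```shell
# Sketch: list jars a running JVM has open and flag any that were
# modified after the process started (the usual cause of SIGBUS
# crashes in libzip). PID is taken from the crash log as an example.
PID=6120

# Jars the process currently has open:
lsof -p "$PID" 2>/dev/null | awk '$NF ~ /\.jar$/ {print $NF}' | sort -u

# /proc/<pid> is created at process start, so its mtime approximates
# the JVM start time; a jar newer than that was modified in use:
STARTED=$(stat -c %Y "/proc/$PID" 2>/dev/null)
for jar in $(lsof -p "$PID" 2>/dev/null | awk '$NF ~ /\.jar$/ {print $NF}' | sort -u); do
  MODIFIED=$(stat -c %Y "$jar" 2>/dev/null)
  if [ -n "$STARTED" ] && [ -n "$MODIFIED" ] && [ "$MODIFIED" -gt "$STARTED" ]; then
    echo "modified after startup: $jar"
  fi
done
```

Any jar reported as modified after startup is a candidate; deployment scripts, log-shipping jobs, and hot-fix copies are common culprits. As for -Dsun.zip.disableMemoryMapping=true: it does not fix the underlying modification, it only converts the SIGBUS crash into a catchable ZipException, which is usually an acceptable trade until the offending write is found.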
I am running a Python script which needs a file (genome.fa) as a dependency (reference) to execute. When I run this command:
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./methratio.py -file '../Test_BSMAP/genome.fa' -mapper './methratio.py -r -g ' -input /TextLab/sravisha_test/SamFiles/test_sam -output ./outfile
I am getting this Error:
15/01/30 10:48:38 INFO mapreduce.Job: map 0% reduce 0%
15/01/30 10:52:01 INFO mapreduce.Job: Task Id : attempt_1422600586708_0001_m_000009_0, Status : FAILED
Container [pid=22533,containerID=container_1422600586708_0001_01_000017] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
I am using Cloudera Manager (Free Edition). These are my configs:
yarn.app.mapreduce.am.resource.cpu-vcores = 1
ApplicationMaster Java Maximum Heap Size = 825955249 B
mapreduce.map.memory.mb = 1GB
mapreduce.reduce.memory.mb = 1 GB
mapreduce.map.java.opts = -Djava.net.preferIPv4Stack=true
mapreduce.map.java.opts.max.heap = 825955249 B
yarn.app.mapreduce.am.resource.mb = 1GB
Java Heap Size of JobHistory Server in Bytes = 397 MB
Can someone tell me why I am getting this error?
I think your python script is consuming a lot of memory during the reading of your large input file (clue: genome.fa).
Here is my reason (Ref: http://courses.coreservlets.com/Course-Materials/pdf/hadoop/04-MapRed-6-JobExecutionOnYarn.pdf, Container is running beyond memory limits, http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/)
Container’s Memory Usage = JVM Heap Size + JVM Perm Gen + Native Libraries + Memory used by spawned processes
The last variable 'Memory used by spawned processes' (the Python code) might be the culprit.
Try increasing the memory size of these two parameters: mapreduce.map.java.opts and mapreduce.reduce.java.opts.
Also try increasing the number of maps spawned at execution time: you can increase the number of mappers by decreasing the split size (mapred.max.split.size). This adds overhead but will mitigate the problem.
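As a rough sizing sketch for the java.opts suggestion above: the container size must cover the JVM heap plus everything outside it (per the formula earlier in this answer), so -Xmx should be a fraction of mapreduce.map.memory.mb. The 0.8 fraction below is a common rule of thumb, not a Hadoop default:

```python
# Sketch: derive a safe -Xmx for a YARN container from the container
# size, leaving headroom for non-heap memory (perm gen, native libs,
# and the spawned Python process). heap_fraction=0.8 is an assumption.

def safe_heap_mb(container_mb, heap_fraction=0.8):
    """Heap size in MB to pass via mapreduce.map.java.opts for a
    container of container_mb megabytes."""
    return int(container_mb * heap_fraction)

# For the 1 GB containers in the question, and a bump to 2 GB:
for container in (1024, 2048):
    print(f"mapreduce.map.memory.mb={container} "
          f"-> mapreduce.map.java.opts=-Xmx{safe_heap_mb(container)}m")
```

With 2 GB containers this would mean passing something like -D mapreduce.map.memory.mb=2048 -D mapreduce.map.java.opts=-Xmx1638m to the streaming job, leaving roughly 400 MB of the container for the spawned Python process.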
My OS is Fedora 17. Recently, the kernel tainted warning "kernel bug at kernel/auditsc.c:1772!-abrt" occurred:
This problem should not be reported (it is likely a known problem). A kernel problem occurred, but your kernel has been tainted (flags:GD). Kernel maintainers are unable to diagnose tainted reports.
Then, I get the following:
# cat /proc/sys/kernel/tainted
128
# dmesg | grep -i taint
[ 8306.955523] Pid: 4511, comm: chrome Tainted: G D 3.9.10-100.fc17.i686.PAE #1 Dell Inc.
[ 8307.366310] Pid: 4571, comm: chrome Tainted: G D 3.9.10-100.fc17.i686.PAE #1 Dell Inc.
It seems that the value "128" is quite serious:
128 – The system has died.
What does this warning mean? Since chrome is flagged as the "Tainted" source, has anyone else run into this?
To (over) simplify, 'tainted' means that the kernel is in a state other than what it would be in if it were built fresh from the open source origin and used in a way that it had been intended. It is a way of flagging a kernel to warn people (e.g., developers) that there may be unknown reasons for it to be unreliable, and that debugging it may be difficult or impossible.
In this case, 'GD' means that all modules are licensed as GPL or compatible (i.e., not proprietary), and that a crash or BUG() occurred.
The reasons are listed below:
See: oops-tracing.txt
---------------------------------------------------------------------------
Tainted kernels:
Some oops reports contain the string 'Tainted: ' after the program
counter. This indicates that the kernel has been tainted by some
mechanism. The string is followed by a series of position-sensitive
characters, each representing a particular tainted value.
1: 'G' if all modules loaded have a GPL or compatible license, 'P' if
any proprietary module has been loaded. Modules without a
MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
insmod as GPL compatible are assumed to be proprietary.
2: 'F' if any module was force loaded by "insmod -f", ' ' if all
modules were loaded normally.
3: 'S' if the oops occurred on an SMP kernel running on hardware that
hasn't been certified as safe to run multiprocessor.
Currently this occurs only on various Athlons that are not
SMP capable.
4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all
modules were unloaded normally.
5: 'M' if any processor has reported a Machine Check Exception,
' ' if no Machine Check Exceptions have occurred.
6: 'B' if a page-release function has found a bad page reference or
some unexpected page flags.
7: 'U' if a user or user application specifically requested that the
Tainted flag be set, ' ' otherwise.
8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
9: 'A' if the ACPI table has been overridden.
10: 'W' if a warning has previously been issued by the kernel.
(Though some warnings may set more specific taint flags.)
11: 'C' if a staging driver has been loaded.
12: 'I' if the kernel is working around a severe bug in the platform
firmware (BIOS or similar).
13: 'O' if an externally-built ("out-of-tree") module has been loaded.
14: 'E' if an unsigned module has been loaded in a kernel supporting
module signature.
15: 'L' if a soft lockup has previously occurred on the system.
16: 'K' if the kernel has been live patched.
The primary reason for the 'Tainted: ' string is to tell kernel
debuggers if this is a clean kernel or if anything unusual has
occurred. Tainting is permanent: even if an offending module is
unloaded, the tainted value remains to indicate that the kernel is not
trustworthy.
Also showing numbers for the content of /proc/sys/kernel/tainted file:
Non-zero if the kernel has been tainted. Numeric values, which can be
ORed together. The letters are seen in "Tainted" line of Oops reports.
1 (P): A module with a non-GPL license has been loaded, this
includes modules with no license.
Set by modutils >= 2.4.9 and module-init-tools.
2 (F): A module was force loaded by insmod -f.
Set by modutils >= 2.4.9 and module-init-tools.
4 (S): Unsafe SMP processors: SMP with CPUs not designed for SMP.
8 (R): A module was forcibly unloaded from the system by rmmod -f.
16 (M): A hardware machine check error occurred on the system.
32 (B): A bad page was discovered on the system.
64 (U): The user has asked that the system be marked "tainted". This
could be because they are running software that directly modifies
the hardware, or for other reasons.
128 (D): The system has died.
256 (A): The ACPI DSDT has been overridden with one supplied by the user
instead of using the one provided by the hardware.
512 (W): A kernel warning has occurred.
1024 (C): A module from drivers/staging was loaded.
2048 (I): The system is working around a severe firmware bug.
4096 (O): An out-of-tree module has been loaded.
8192 (E): An unsigned module has been loaded in a kernel supporting module
signature.
16384 (L): A soft lockup has previously occurred on the system.
32768 (K): The kernel has been live patched.
65536 (X): Auxiliary taint, defined for and used by distros.
131072 (T): The kernel was built with the struct randomization plugin.
Source: https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
Credit: https://askubuntu.com/questions/248470/what-does-the-kernel-taint-value-mean
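Since /proc/sys/kernel/tainted is a bitmask of the values above, it can be decoded mechanically. A small sketch using the table from this answer:

```python
# Sketch: decode the /proc/sys/kernel/tainted bitmask into its flag
# letters, using the bit-to-letter table documented above.

TAINT_FLAGS = [
    (1, 'P'), (2, 'F'), (4, 'S'), (8, 'R'), (16, 'M'), (32, 'B'),
    (64, 'U'), (128, 'D'), (256, 'A'), (512, 'W'), (1024, 'C'),
    (2048, 'I'), (4096, 'O'), (8192, 'E'), (16384, 'L'), (32768, 'K'),
    (65536, 'X'), (131072, 'T'),
]

def decode_taint(value):
    """Return the taint letters whose bits are set in value."""
    return [letter for bit, letter in TAINT_FLAGS if value & bit]

print(decode_taint(128))   # ['D'] -- the system has died
print(decode_taint(129))   # ['P', 'D'] -- proprietary module, and died
```

For the value 128 from the question this yields only 'D', which matches the "Tainted: G D" line in dmesg: 'G' simply means the P (proprietary module) bit is not set.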
I built a project similar to the oDesk Tracker, which takes snapshots at intervals. The application works fine on Windows and Linux, but on Mac, when I minimize the application, it crashes and a popup displays "Java quit unexpectedly".
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fff885a20e6, pid=1386, tid=1287
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b118) (build 1.8.0-ea-b118)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b60 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [libobjc.A.dylib+0x80e6] objc_release+0x16
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/babita/Documents/workspace/FBTracker 2/hs_err_pid1386.log