Recently we have been experiencing a strange problem on our JBoss 5. After running our app for a while, the clients who call the EJBs start throwing NoClassDefFoundError on some classes. After a restart, all is fine again for a while until other functions start returning NoClassDefFoundError. It seems totally random, and a restart of JBoss seems to cure the problem. This particular JBoss runs in a VM with 4 GB of RAM, 2 CPUs and more than enough disk space (it has never had less than 5 GB free at any time). We have increased Xmx and Xms to 2048 MB and the permgen to 512 MB (ridiculous, I know). Interestingly, the same install runs elsewhere on a VM with half the memory and half the Xmx/Xms/permgen settings with no problems whatsoever. The only difference is that the stable one is not under any major load, although the broken one only has a maximum of 8 clients connecting, which hardly constitutes "load" in my book :-). Has anybody come across this kind of problem, or have any idea what it could be?
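For context, this is roughly what such settings look like in JBoss 5's bin/run.conf on Linux (a sketch only; the CMS/class-unloading flags are an illustration of the "permgen sweeping" part, not necessarily what we actually run):
# appended at the end of bin/run.conf so run.sh picks the options up at start
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled"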
Not really an answer, but we had an RPM install on CentOS 5. We removed that and used the zip file from the JBoss site instead. That cured the problem. Looks like we had a dodgy install.
I am running Cloudera Hadoop on my laptop in an Oracle VirtualBox VM.
I have given it 5.6 GB of my 8 GB of RAM, and six of my eight cores as well.
And still I am not able to keep it up and running.
Even without load the services will not stay up, and when I try a query at least Hive will be down within 20 minutes. Sometimes they go down like dominoes: one after another.
More memory seemed to help some: with 3 GB and all services running, Hue was blinking red whenever Hue itself managed to come up at all, and after a reboot it would take 30-60 minutes before I managed to get the system up enough to even try running anything on it.
There have been two sensible notes (that I have managed to find):
- Warning of swapping.
- A crash note when the system had used 26 GB of virtual memory, which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes, but whatever the reason for that was, it has passed: now the system is running more steadily around the 5.6 GB I have given it, after shutting down a few services (see my answer to myself below).
Still, it is only more stable, not stable: right after this I got a swapping warning and Hive went down again. What could be the reason for more or less all Hadoop services going down when the VM starts to swap?
I don't have enough reputation to post a picture here, but when Hive went down again the VM was swapping 13 pages/second and using 5.9 GB of the 5.6 GB given to it. So basically my system starts crashing more or less right after it starts to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
I have used the default installation options as far as the hard drive is concerned.
The only addition is a shared folder between Windows and the VM. It works somewhat strangely, locking files all the time, so I use it just like FTP and only for passing files from one system to the other. Thus I can go days without using it, but the system still crashes, so that is not the cause either.
Now that the system is mostly up, services still crash about twice a day: Service Monitor and Hive are about even in their crash frequency. After those come Activity Monitor and Event Server, which appear to always crash together. I believe YARN crashes as well, but it comes back up on its own. Last time Hive crashed first, followed by Service Monitor, Hive (a second time), Activity Monitor and Event Server.
As swap lives on disk, perhaps the problem is with the disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So there is nothing wrong with the swap disk, and I have not noticed any disk errors anywhere else either.
Note that you could also check the file system from the Windows side. But I expect that letting Windows fix your Linux file system is a good way to destroy your Linux install, so I stuck to checks from inside the VM, which AFAIK are safe to execute.
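One caveat, though: as far as I understand badblocks, the read-write test (-w) overwrites the whole logical volume, so the swap signature is gone afterwards and the swap area has to be re-initialised before it can be turned back on (same device path as above):
# mkswap /dev/VolGroup/lv_swap
# swapon -a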
About half of the services kept going down, so giving more specifics would be a long story.
I managed to get the system more stable by shutting down Flume, HBase, Impala, ks_indexer, Oozie, Spark and Sqoop, and by giving more memory to some of the remaining services that complained they had not been given enough.
I also fixed a couple of things on the Windows side; I am not sure which of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permissions to kill it, but I decreased its priority to the lowest possible (see the sketch below).
- CcmExec.exe got into a loop on my DVD drive and kept reading it forever. This I solved by taking the DVD out of the drive. Later on I killed the process tree to keep it from bothering me for a while.
I found these using the Windows Resource Monitor.
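For reference, this is roughly how the priority can be dropped from an elevated PowerShell prompt (a sketch; it assumes the process actually lets you change its priority, which protected processes do not always do):
# set every MsMpEng process to the lowest scheduling priority
Get-Process MsMpEng | ForEach-Object { $_.PriorityClass = 'Idle' }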
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear on whether you are using the QuickStart VM, though. It is set up to run just the essential services and is tuned to conserve memory rather than to exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master processes, worker processes, monitoring processes, and so on. You don't need most of them.
You have also probably left the memory settings at defaults suitable for a server-class machine with 16+ GB of RAM. Remember that these services usually run across many machines, not all on one.
Finally, you are clearly swapping, and that makes things incredibly slow. Remember this is all running inside a VM too!
Bottom line: use the QuickStart VM if you really want a one-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.
I am working on a SuSE machine (cat /etc/issue: SUSE Linux Enterprise Server 11 SP1 (i586)) running PostgreSQL 8.1.3 and the Slony-I replication system (slon version 1.1.5). We have a working replication setup between two databases on this server, which generates log shipping files to be sent to the remote machines we are tasked with maintaining. As of this morning, we ran into a problem with this.
For a while now we have had strange memory problems on this machine: the oom-killer seems to strike even when there is plenty of free memory left. That has set the stage for our current issue. We ran a massive update on our system last night while replication was turned off. Now, as things stand, we cannot replicate the changes out: Slony is attempting to compile all the changes into a single massive log file, and after about half an hour of running it trips over the oom-killer issue, which appears to restart the replication package. Since it is constantly trying to rebuild that same package, it never gets anywhere.
My first question is this: is there a way to cap the size of Slony log shipping files, so that it writes out no more than X bytes (or KB, or MB, etc.), and after going over that size closes the current log shipping file and starts a new one? We have been able to hit about four megabytes before the oom-killer strikes with fair regularity, so if I could cap it there, I could at least start generating the smaller files and hopefully eventually get through this.
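For what it is worth, the only related knob I have found so far is slon's group-size option (-g), which limits how many SYNC events are processed as one group; it is an assumption on my part that a smaller group also means smaller individual log shipping files. A sketch of the invocation, with the cluster name, conninfo and archive directory as placeholders:
slon -g 1 -a /path/to/archive_dir my_cluster "dbname=mydb host=localhost user=slony"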
My second question, I guess, is this: does anyone have a better solution for this issue than the one I am asking about? It is quite possible I am getting tunnel vision looking at the problem, and all I really need is -a- solution, not necessarily -my- solution.
I have a project with a few dozen EJBs and a web project that I'm attempting to deploy from NetBeans 7.0.1 on my laptop directly to Glassfish 3.0.1 on a Solaris 10 server. Ignoring the transfer time of copying the ear file, the deployments seem to take a very long time (3 minutes is the fastest I've seen it). The performance of deployments seems to degrade over time, to the point where eventually I have to restart my domain. I've seen a deployment take anywhere from 12-20 minutes after I've redeployed my application a few times.
I deploy by right-clicking my main project in NetBeans and picking "Deploy". What options do I have for making this more usable? What additional information can I provide to help track down the source of the problem?
UPDATE: Letting the most recent deployment run through to completion, it ended with the following error message in my log:
[#|2011-08-20T14:05:54.494-0400|SEVERE|glassfish3.1|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=2490;_ThreadName=Thread-1;|Exception while loading the app : EJB Container initialization error
java.lang.OutOfMemoryError: Java heap space
|#]
So this does appear to be memory related. The deployment itself ran for over 10 minutes before dying in this manner.
Because of my application's requirements, I had to increase the heap space from the default 512MB allocation to a min/max of 1GB/2GB. This seems to have improved deployment slightly. My typical deployment time is ~1 minute now. It's not stellar, but it's at least tolerable.
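For anyone looking for the mechanics: the heap settings live as <jvm-options> entries under <java-config> in the domain's domain.xml (they can also be changed with asadmin create-jvm-options / delete-jvm-options or in the admin console). A sketch of the relevant lines after the change, with the default -Xmx512m entry replaced:
<jvm-options>-Xms1g</jvm-options>
<jvm-options>-Xmx2g</jvm-options>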
This is the result of a serious bug in the weld-integration module of Glassfish. Without this bug, deployment is more than 20 (!) times faster than before.
http://java.net/jira/browse/GLASSFISH-18875
Please vote to get this fixed as soon as possible!
In a nutshell: my JBoss instance is running fine, but after some days its performance slowly degrades.
In detail:
I have a setup with JBoss 5.1.0-GA and Java 1.6.0_18-b07 (x64) running on a 64-bit RHEL 4 box. The hardware is a virtual machine with an 8-core Xeon X5550 and 20 GB of RAM.
The product deployed in JBoss contains a webservice on which an endurance test is performed.
No database is involved in the process.
The tests are performed using soapUI with 4 threads and are configured to create 20% CPU usage.
Let's say the average response time is 300 ms at first. After 2 days, the response times are 600 ms, which I don't understand.
Of course I did some checks:
There are no memory leaks (confirmed with JProfiler).
Heap usage is always around 25-50%; perm space usage is 50% (a jstat sketch for checking this is below).
GC is almost never busy.
All threads are idle when inspecting a thread dump.
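For reference, a quick way to cross-check the heap and GC figures from the command line (a sketch; <pid> is the JBoss process id, sampling every 5 seconds):
jstat -gcutil <pid> 5000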
While doing some further investigation, I took a CPU profile with JProfiler at the beginning (when it is still fast) and at the (slow) end. What I see is that every single call is simply 100% slower!
Even calls to a simple Map#put() (the number of invocations and the contents of these maps are the same).
When running the profiler, there are no signs of blocked threads, just running threads.
Does anyone have a clue what is causing the performance degradation?
Thanks!
Update: I solved the performance degradation by upgrading the Java version to 1.6.0_24!
While out of options, I scanned through all the release notes of the Java VM and discovered a performance and reliability fix in 1.6.0_23; see also the 1.6.0_23 release notes.
After the jvm upgrade, the performance stays the same and does not degrade over days.
Solution found by Jan:
Solved the performance degradation by upgrading the Java version to 1.6.0_24!
While out of options, I scanned through all the release notes of the Java VM and discovered a performance and reliability fix in 1.6.0_23; see also the 1.6.0_23 release notes.
After the JVM upgrade, the performance stays the same and does not degrade over days.
I installed Eclipse Galileo, and after some trouble with the JDK it now starts up fine. But I have big performance problems: every third second or so, Eclipse hangs for a while. It does not run smoothly. I need an efficient IDE like Eclipse for work, so it would be very nice to get a fast answer :)
Both Eclipse and the JDK are 64-bit versions.
Do you have any ideas?
Update:
I can't really explain the problem from scratch, but in my case it was a conflict between Eclipse and the auto-complete function of my OSK. If I disable auto-complete, there are no more hangs. I don't know why using the OSK blocks the thread (?) of the whole editor.
Maybe one of you has an idea why?
From your description, it sounds like the garbage collector is being triggered. How much RAM have you got in the system? Depending on the plugins you are loading, Eclipse can need quite a lot of it. I think the bare minimum is 256 MB, and realistically you need at least 1 GB, more if you are doing web development.
Have you got an up-to-date JVM? Eclipse generally runs much quicker on a 1.6 JVM.
One other thing to check: do you have an aggressive virus scanner? Eclipse plugins are collections of small files in jars, and some virus scanners can really slow down performance. If you are able to, exclude the Eclipse install directory from the scanned files.
See this EclipseZone article or this question for some general performance tips.
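If memory does turn out to be the issue, the usual first step is to give Eclipse more headroom in eclipse.ini (a sketch only; the path and sizes are examples rather than recommendations, and the -vm line with its path must come before -vmargs):
-vm
C:\Program Files\Java\jdk1.6.0_20\bin\javaw.exe
-vmargs
-Xms512m
-Xmx1024m
-XX:MaxPermSize=256m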
Run Process Monitor and see what kind of system calls and/or file system calls the JVM is doing. Use filters aggressively to pinpoint a specific process. I had a similar issue where a graphics card utility triggered a flood of registry lookups for every UI update which just made Eclipse incredibly slow. (Somehow SWT was hit exceptionally hard by this bug, I'm not sure why.)
EDIT: I meant "Process Monitor", not "Process Explorer". But the link was correct.
You could try to run it from within a virtual machine set up on your computer to see if the problem is still there. If it's not, it might be faster for you to just work from within the virtual machine environment. Doesn't address the issue, but it may help avoid it altogether.
I had the same problem, so I just switched to the 32-bit version of Eclipse, and it runs fine with no performance issues.
I can't really explain the problem from scratch, but in my case it was a conflict between Eclipse and the auto-complete function of my OSK. If I disable auto-complete, there are no more hangs. I don't know why using the OSK blocks the thread (?) of the whole editor.
Maybe one of you has an idea why?
Thanks for any help!
Same problem for me.
I have Windows 7 Professional 64-bit and 8 GB of RAM.
Eclipse is extremely slow, probably 5 times slower than on the Windows Vista 32-bit machine I recently upgraded from (Europa version), and that machine was a complete dog!
Adding -Xmx1024m -XX:+UseParallelGC and -vm C:\Program Files\Java\jdk1.6.0_20\jre\bin\server\jvm.dll has made a pretty big difference.
I have the same problem with Eclipse not responding.
I searched the internet for a solution and found one: adding the lines below to the Eclipse Helios config file (eclipse.ini).
-vm
C:\Program Files\Java\jre7\bin\javaw.exe
Initially it looks OK: Eclipse starts, I can click on the different buttons and work on several files in the Eclipse project. But when I click on debug and step through the code, it shows "not responding" again.
I have a new laptop with Win7 installed.
I have the same problems with the 32-bit version, running on a 32-bit JVM.
It's more that my RCP application, which I developed with Eclipse, is slow. I've tried both -Xmx1024m and -XX:+UseParallelGC, with no noticeable effect. Has this issue been registered with eclipse.org?