Jruby Rails app on Tomcat CPU Usage spikes - jms

This might also belong on serverfault. It's kind of a combo between server config and code (I think)
Here's my setup:
Rails 2.3.5 app running on jruby 1.3.1
Service Oriented backend over JMS with activeMQ 5.3 and mule 2.2.1
Tomcat 5.5 with opts: "-Xmx1536m -Xms256m -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled"
Java jdk 1.5.0_19
Debian Etch 4.0
Running top, every time i click a link on my site, I see my java process CPU usage spike. If it's a small page, it's sometimes just 10% usage, but sometimes on a more complicated page, my CPU goes up to 44% (never above, not sure why). In this case, a request can take upwards of minutes while my server's load average steadily climbs up to 8 or greater. This is just from clicking one link that loads a few requests from some services, nothing too complicated. The java process memory hovers around 20% most of the time.
If I leave it for a bit, load average goes back down to nothing. Clicking a few more links, climbs back up.
I'm running a small amazon instance for the rails frontend and a large instance for all the services.
Now, this is obviously unacceptable. A single user can bring spike the load average to 8 and with two people using it, it maintains that load average for the duration of our using the site. I'm wondering what I can do to inspect what's going on? I'm at a complete loss as to how I can debug this. (it doesn't happen locally when I run the rails app through jruby, not inside the tomcat container)
Can someone enlighten me as to how I might inspect on my jruby app to find out how it could possibly be using up such huge resources?
Note, I noticed this a little bit before, seemingly at random, but now, after upgrading from Rails 2.2.2 to 2.3.5 I'm seeing it ALL THE TIME and it makes the site completely unusable.
Any tips on where to look are greatly appreciated. I don't even know where to start.

Make sure that there is no unexpected communication between the Tomcat and something else. I would check in the first place if:
ActiveMQ broker doesn't communicate with the other brokers in your network. By default AMQ broker start in OpenWire auto-discovery mode.
JGroups/Multicasts in general do not communicate with something in your network.
This unnecessary load may result from the processing of the messages coming from another application.

Related

Ubuntu instance gets unresponsive while performing cap deployment

I have a small ruby on rails application which i have deployed on an amazon ec-2 instance using capistrano, my instance is a t2.small instance with nginx installed on it and local postgress db installed on the server too. i have a development instance on which i do frequent deployments, recently whenever i try to do a capistrano deployment on my ec-2 instance the cpu-utilization has an enormous spike, usually is its between 20-25% but during deployment for some reason it goes upto 85% which makes my instance unresponsive and i have to do a hard restart on my server to get it back working
I dont know why is this happening and what should i do to solve this because load balancing and auto scaling makes no sense in this scenario as the issue occurs only during deployment
I have attached a screenshot of my server cpu utilization and the 2 high peaks are both when i performed cap deployment
The only solution i can think of is increasing the instance type, but i want to know what other options do i have to solve this. Any help is appreciated, thanks in advance
If this is interim spike (only during installation) and you don't need high CPU during application usage, you may try t2.unlimited approach.
If t2.unlimited couldn't support your need, I think increasing the instance type is the only option left for you.

How many users should a EC2 Micro Instance be able to handle only with a nginx server?

I have a iOS Social App.
This app talks to my server to do updates & retrieval fairly often. Mostly small text as JSON. Sometimes users will upload pictures that my web-server will then upload to a S3 Bucket. No pictures or any other type of file will be retrieved from the web-server
The EC2 Micro Ubuntu 13.04 Instance runs PHP 5.5, PHP-FPM and NGINX. Cache is handled by Elastic Cache using Redis and the database connects to a separate m1.large MongoDB server. The content can be fairly dynamic as newsfeed can be dynamic.
I am a total newbie in regards to configuring NGINX for performance and I am trying to see whether I've configured my server properly or not.
I am using Siege to test my server load but I can't find any type of statistics on how many concurrent users / page loads should my system be able to handle so that I know that I've done something right or something wrong.
What amount of concurrent users / page load should my server be able to handle?
I guess if I cant get hold on statistic from experience what should be easy, medium, and extreme for my micro instance?
I am aware that there are several other questions asking similar things. But none provide any sort of estimates for a similar system, which is what I am looking for.
I haven't tried nginx on microinstance for the reasons Jonathan pointed out. If you consume cpu burst you will be throttled very hard and your app will become unusable.
IF you want to follow that path I would recommend:
Try to cap cpu usage for nginx and php5-fpm to make sure you do not go over the thereshold of cpu penalities. I have no ideia what that thereshold is. I believe the main problem with micro instance is to maintain a consistent cpu availability. If you go over the cap you are screwed.
Try to use fastcgi_cache, if possible. You want to hit php5-fpm only if really needed.
Keep in mind that gzipping on the fly will eat alot of cpu. I mean alot of cpu (for a instance that has almost none cpu power). If you can use gzip_static, do it. But I believe you cannot.
As for statistics, you will need to do that yourself. I have statistics for m1.small but none for micro. Start by making nginx serve a static html file with very few kb. Do a siege benchmark mode with 10 concurrent users for 10 minutes and measure. Make sure you are sieging from a stronger machine.
siege -b -c10 -t600s 'http:// private-ip /test.html'
You will probably see the effects of cpu throttle by just doing that! What you want to keep an eye on is the transactions per second and how much throughput can the nginx serve. Keep in mind that m1small max is 35mb/s so m1.micro will be even less.
Then, move to a json response. Try gzipping. See how much concurrent requests per second you can get.
And dont forget to come back here and report your numbers.
Best regards.
Micro instances are unique in that they use a burstable profile. While you may get up two 2 ECU's in terms of performance for a short period of time, after it uses its burstable allotment it will be limited to around 0.1 or 0.2 ECU. Eventually the allotment resets and you can get 2 ECU's again.
Much of this is going to come down to how CPU/Memory heavy your application is. It sounds like you have it pretty well optimized already.

Web application very slow in Tomcat 7

I implemented a web application to start the Tomcat service works very quickly, but spending hours and when more users are entering is getting slow (up to 15 users approx.).
Checking RAM usage statistics (20%), CPU (25%)
Server Features:
RAM 8GB
Processor i7
Windows Server 2008 64bit
Tomcat 7
MySql 5.0
Struts2
-Xms1024m
-Xmx1024m
PermGen = 1024
MaxPernGen = 1024
I do not use Web server, we publish directly on Tomcat.
Entering midnight slowness is still maintained (only 1 user online)
The solution I have is to restart the Tomcat service and response time is again excellent.
Is there anyone who has experienced this issue? Any clue would be appreciated.
Not enough details provided. Need more information :(
Use htop or top to find memory and CPU usage per process & per thread.
CPU
A constant 25% CPU usage in a 4 cores system can indicate that a single-core application/thread is running 100% CPU on the only core it is able to use.
Which application is eating the CPU ?
Memory
20% memory is ~1.6GB. It is a bit more than I expect for an idle server running only tomcat + mysql. The -Xms1024 tells tomcat to preallocate 1GB memory so that explains it.
Change tomcat settings to -Xms512 and -Xmx2048. Watch tomcat memory usage while you throw some users at it. If it keeps growing until it reaches 2GB... then freezes, that can indicate a memory leak.
Disk
Use df -h to check disk usage. A full partition can make the issues you are experiencing.
Filesystem Size Used Avail Usage% Mounted on
/cygdrive/c 149G 149G 414M 100% /
(If you just discovered in this example that my laptop is running out of space. You're doing it right :D)
Logs
Logs are awesome. Yet they have a bad habit to fill up the disk. Check logs disk usage. Are logs being written/erased/rotated properly when new users connect ? Does erasing logs fix the issue ? (copy them somewhere for future analysis before you erase them)
If not. Logs are STILL awesome. They have the good habit to help you track bugs. Check tomcat logs. You may want to set logging level to debug. What happens last when the website die ? Any useful error message ? Do user connections are still received and accepted by tomcat ?
Application
I suppose that the 25% CPU goes to tomcat (and not mysql). Tomcat doesn't fail by itself. The application running on it must be failing. Try removing the application from tomcat (you can eventually put an hello world instead). Can tomcat keep working overnight without your application ? It probably can, in which case the fault is on the application.
Enable full debug logging in your application and try to track the issue. Run it straight from eclipse in debug mode and throw users at it. Does it fail consistently in the same way ?
If yes, hit "pause" in the eclipse debugger and check what the application is doing. Look at the piece of code each thread is currently running + its call stack. Repeat that a few times. If there is a deadlock, an infinite loop, or similar, you can find it this way.
You will have found the issue by now if you are lucky. If not, you're unfortunate and it's a tricky bug that might be deep inside the application. That can get tricky to trace. Determination will lead to success. Good luck =)
For performance related issue, we need to follow the given rules:
You can equalize and emphasize the size of xms and xmx for effectiveness.
-Xms2048m
-Xmx2048m
You can also enable the PermGen to be garbage collected.
-XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
If the page changes too frequently to make this option logical, try temporarily caching the dynamic content, so that it doesn't need to be regenerated over and over again. Any techniques you can use to cache work that's already been done instead of doing it again should be used - this is the key to achieving the best Tomcat performance.
If there any database related issue, then can follow sql query perfomance tuning
rotating the Catalina.out log file, without restarting Tomcat.
In details,There are two ways.
The first, which is more direct, is that you can rotate Catalina.out by adding a simple pipe to the log rotation tool of your choice in Catalina's startup shell script. This will look something like:
"$CATALINA_BASE"/logs/catalina.out WeaponOfChoice 2>&1 &
Simply replace "WeaponOfChoice" with your favorite log rotation tool.
The second way is less direct, but ultimately better. The best way to handle the rotation of Catalina.out is to make sure it never needs to rotate. Simply set the "swallowOutput" property to true for all Contexts in "server.xml".
This will route System.err and System.out to whatever Logging implementation you have configured, or JULI, if you haven't configured.
See more at: Tomcat Catalina Out
I experienced a very slow stock Tomcat dashboard on a clean Centos7 install and found the following cause and solution:
Slow start up times for Tomcat are often related to Java's
SecureRandom implementation. By default, it uses /dev/random as an
entropy source. This can be slow as it uses system events to gather
entropy (e.g. disk reads, key presses, etc). As the urandom manpage
states:
When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.
Source: https://www.digitalocean.com/community/questions/tomcat-8-5-9-restart-is-really-slow-on-my-centos-7-2-droplet
Fix it by adding the following configuration option to your tomcat.conf or (preferred) a custom file into /tomcat/conf/conf.d/:
JAVA_OPTS="-Djava.security.egd=file:/dev/./urandom"
We encountered a similar problem, the cause was "catalina.out". It is the standard destination log file for "System.out" and "System.err". It's size kept on increasing thus slowing things down and ultimately tomcat crashed. This problem was solved by rotating "catalina.out". We were using redhat so we made a shell script to rotate "catalina.out".
Here are some links:-
Mulesoft article on catalina (also contains two methods of rotating):
Tomcat Catalina Introduction
If "catalina.out" is not the problem then try this instead:-
Mulesoft article on optimizing tomcat:
Tuning Tomcat Performance For Optimum Speed
We had a problem, which looks similar to yours. Tomcat was slow to respond, but access log showed just milliseconds for answer. The problem was streaming responses. One of our services returned real-time data that user could subscribe to. EPOLL were becoming bloated. Network requests couldn't get to the Tomcat. And whats more interesting, CPU was mostly idle (since no one could ask server to do anything) and acceptor/poller threads were sitting in WAIT, not RUNNING or IN_NATIVE.
At the time we just limited amount of such requests and everything became normal.

GRAILS 2, memory issues

We have an application built using Grails 2.0.1 and MongoDB. And as our userbase have grown and we did some performance research, we noticed that for each typical request grails eats about 150Mb of RAM, and when RAM is about to reach maximum it performs GC.
We've put singleton mode for controllers, and non-transactional for Services. We use JRockit.
I'd like to know if it can be considered normal for grails app or no. Our website is nothing more than a usual website, no extra memory usages, just a user management system and the code itself seems to be OK.
Here are the plugins we use:
app.grails.version=2.0.1,
app.servlet.version=2.4,
app.version=0.1,
plugins.cache-headers=1.1.3,
plugins.code-coverage=1.2.5,
plugins.codenarc=0.12,
plugins.crypto=2.0,
plugins.gsp-arse=1.3
plugins.jaxrs=0.6,
plugins.mongodb=1.0.0.RC5,
plugins.navigation=1.2,
plugins.quartz=0.4.2,
plugins.redis=1.0.0.M9,
plugins.rendering=0.4.3,
plugins.selenium=0.8,
plugins.selenium-rc=1.0.2,
plugins.spring-security-core=1.2.7.2,
plugins.springcache=1.3.1,
plugins.svn=1.0.1,
plugins.tomcat=2.0.1,
plugins.ui-performance=1.2.2
On a Sun JDK, fire up jvisualvm (or the jrockit equivalent, if there is one. Otherwise get yourself a proper profiler that works with jrockit), attach it to your running server, start the profiler and analyze the output. This will give you an idea on where to look.
Maybe you are actually loading that much information from the backend storage. but that's just a guess.

Why are my basic Heroku apps taking two seconds to load?

I created two very simple Heroku apps to test out the service, but it's often taking several seconds to load the page when I first visit them:
Cropify - Basic Sinatra App (on github)
Textile2HTML - Even more basic Sinatra App (on github)
All I did was create a simple Sinatra app and deploy it. I haven't done anything to mess with or test the Heroku servers. What can I do to improve response time? It's very slow right now and I'm not sure where to start. The code for the projects are on github if that helps.
If your application is unused for a while it gets unloaded (from the server memory).
On the first hit it gets loaded and stays loaded until some time passes without anyone accessing it.
This is done to save server resources. If no one uses your app why keep resources busy and not let someone who really needs use them ?
If your app has a lot of continous traffic it will never be unloaded.
There is an official note about this.
You might also want to investigate the caching options you have on Heroku w/ Varnish and Memcached. These are persisted independent of the dynos.
For example, if you have an unchanging homepage, you can cache that for extended periods in Varnish by adding Cache-Control headers to the response. Then your users won't experience the load hit until they want to "do something" rather than when they arrive.
You should check out Tom Robinson's answer to "Scalability: How Does Heroku Work?" on Quora: http://www.quora.com/Scalability/How-does-Heroku-work
Heroku divides up server resources among many different customers/applications. Your app is allotted blocks of computing power. Heroku partitions based on resource demand. When you have a popular application that demands more power, you can pay for more 'dynos' (application containers) and then get a larger chunk of the pie in return.
In your case though, you are running a free app that few people--if any outside of you--are visiting/using. Therefore, Heroku cuts down on the resources you're getting by unloading your app--putting it in hibernation essentially--until there is a request made to your address. When that happens, and your app has been idling for a long time, it takes time to reload.
Add 1 extra dyno to keep your app from falling asleep, if that reload time is important.
I am having the same problem. I deployed a Rails 3 (1.9.2) app last night and it's slow. I know that 1.9.2/Rails 3 is in BETA on Heroku but the support ticket said it should be fine using some instructions they sent me.
I understand that the first request after a long time takes the longest. Makes sense. But simply loading pages that don't even connect to a DB taking 10 seconds sometimes is pretty bad.
Anyway, you might want to try what I'm going to do. That is profile my app and see how long it takes locally. If it's taking 400ms then something is wrong. But if I get 50ms locally and it still takes 10 seconds on Heroku then something is definitely wrong.
Obviously, caching helps but you only get 5MB for free and once again, with ONE person using the site, it shouldn't be that slow.
I had the same problem with every app I have put on via heroku's free account. Now there are options of adding dynos so that your app does not get offloaded while it is not being used for a while, you can also try using redis or memcached for caching. But I used a hacky solution for my small scale project, I basically built web scraper put it inside an infinite loop with sleep and tada the website actually had much better response times(I guess it was not getting offloaded because of the script).

Resources