How do I absolutely ensure that a Phusion Passenger instance stays alive? - ruby

I'm having a problem where no matter what I try all Passenger instances are destroyed after an idle period (5 minutes, but sometimes longer). I've read the Passenger docs and related questions/answers on Stack Overflow.
My global config looks like this:
PassengerMaxPoolSize 6
PassengerMinInstances 1
PassengerPoolIdleTime 300
And my virtual config:
PassengerMinInstances 1
The above should ensure that at least one instance is kept alive after the idle timeout. I'd like to avoid setting PassengerPoolIdleTime to 0 as I'd like to clean up all but one idle instance.
I've also added the ruby binary to my CSF ignore list to prevent the long running process from being culled.
Is there somewhere else I should be looking?

Have you tried to set the PassengerMinInstances to anything other than 1 like 3 and see that work?

Ok, I found the answer for you on this link: http://groups.google.com/group/phusion-passenger/browse_thread/thread/7557f8ef0ff000df/62f5c42aa1fe5f7e . Look at the last comment by Phusion guy.
Is there a way to ensure that I always have 10 processes up and
running, and that each process only serves 500 requests before being
shut down?
"Not at this time. But the current behavior is such that the next time
it determines that more processes need to be spawned it will make sure
L at least PassengerMinInstances processes exist."
I have to say their documentation doesn't seem to match what the current behavior.

This seems to be quite a common problem for people running Apache on WHM/cPanel:
http://techiezdesk.wordpress.com/2011/01/08/apache-graceful-restart-requested-every-two-hours/
Enabling piped logging sorted the problem out for me.

Related

Bash script that checks website every 10 seconds

The following script checks a sites content to see if any change has been done to it, every 10 seconds. It's for a very time sensitive application. If something on the site has changed, I merely have seconds to do something else. It will then start a new download and compare cycle and wait for the next change and do cycle. The do something else, has yet to be scripted and not relevant to the question.
The question: Will it be a problem for a public website to have a script downloading a single page every 10-15 seconds. If so, is there any other way to monitor a site, unmanned?
#!/bin/bash
Domain="example.com"
Ocontent=$(curl -L "$Domain")
Ncontent="$Ocontent"
until [ "$Ocontent" != "$Ncontent" ]; do
Ocontent=$(curl -L "$Domain")
#CONTENT CHANGED TRUE
#if [ "$Ocontent" == "$Ncontent ]; then
# Ocontent=$(curl -L "$Domain")
#fi
echo "$Ocontent"
sleep 10
done
The problems you're going to run into:
If the site notices and has a problem with it, you may end up on a banned IP list. Using an IP pool or other distributed resource can mitigate this.
Pinging a website precisely every x number of seconds is unlikely. Network latency is likely to cause a great deal of variance in this.
If you get a network partition, your code should know how to cope. (What if your connection goes down? What should happen?)
Note that getting the immediate response is only part of downloading a webpage. There may be changes to referenced files, such as css, javascript or images that are not immediately apparent from just the original http response.

How to Fix Read timed out in Elasticsearch

I used Elasticsearch-1.1.0 to index tweets.
The indexing process is okay.
Then I upgraded the version. Now I use Elasticsearch-1.3.2, and I get this message randomly:
Exception happened: Error raised when there was an exception while talking to ES.
ConnectionError(HTTPConnectionPool(host='127.0.0.1', port=8001): Read timed out. (read timeout=10)) caused by: ReadTimeoutError(HTTPConnectionPool(host='127.0.0.1', port=8001): Read timed out. (read timeout=10)).
Snapshot of the randomness:
Happened --33s-- Happened --27s-- Happened --22s-- Happened --10s-- Happened --39s-- Happened --25s-- Happened --36s-- Happened --38s-- Happened --19s-- Happened --09s-- Happened --33s-- Happened --16s-- Happened
--XXs-- = after XX seconds
Can someone point out on how to fix the Read timed out problem?
Thank you very much.
Its hard to give a direct answer since the error your seeing might be associated with the client you are using. However a solution might be one of the following:
1.Increase the default timeout Globally when you create the ES client by passing the timeout parameter. Example in Python
es = Elasticsearch(timeout=30)
2.Set the timeout per request made by the client. Taken from Elasticsearch Python docs below.
# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)
The above will give the cluster some extra time to respond
Try this:
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
It might won't fully avoid ReadTimeoutError, but it minimalize them.
Read timeouts can also happen when query size is large. For example, in my case of a pretty large ES index size (> 3M documents), doing a search for a query with 30 words took around 2 seconds, while doing a search for a query with 400 words took over 18 seconds. So for a sufficiently large query even timeout=30 won't save you. An easy solution is to crop the query to the size that can be answered below the timeout.
For what it's worth, I found that this seems to be related to a broken index state.
It's very difficult to reliably recreate this issue, but I've seen it several times; operations run as normal except certain ones which periodically seem to hang ES (specifically refreshing an index it seems).
Deleting an index (curl -XDELETE http://localhost:9200/foo) and reindexing from scratch fixed this for me.
I recommend periodically clearing and reindexing if you see this behaviour.
Increasing various timeout options may immediately resolve issues, but does not address the root cause.
Provided the ElasticSearch service is available and the indexes are healthy, try increasing the the Java minimum and maximum heap sizes: see https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html .
TL;DR Edit /etc/elasticsearch/jvm.options -Xms1g and -Xmx1g
You also should check if all fine with elastic. Some shard can be unavailable, here is nice doc about possible reasons of unavailable shard https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

FastRWeb performance on Ubuntu with built-in web server

I have installed FastRWeb 1.1-0 on an installation of R 2.15.2 (Trick or Treat) running on an Ubuntu 10.04 box. I hope to use the resulting system to run a web service.
I've configured the system by setting http.port to 8181 in rserve.conf and unsetting the socket destination. I've assigned .http.request to FastRWeb::.http.request. I exchange JSON blobs between the client and the server using HTTP POST (the second blob can exceed 150KB in size, and will not fit in an HTTP GET query string.)
Everything works end to end -- I have a little client-side R script which generates JSON RPC calls across the channel. I see the run function invoked, and see it returned.
I've run into a significant performance problem, however: the return path takes in excess of 12 seconds from the time run() returns (including the call to done()) and the time that the R client gets the return value. RCurl doesn't seem to be the culprit; it appears that something is taking twelve seconds to do a return.
Does anybody have any suggestions of where to look? I can easily shift over to using Apache 2.0 and CGI, but, honestly, I'd rather keep everything R centric.
Answering my own question.
I wrapped .http.request with an Rprof()/Rprof(NULL) pair and looked at the time spent in each routine. It turns out that the system spends ~11 seconds inside URLDecode in the standard implementation of .run. This looks like a scaling problem in URLDecode in the core.

Why are my delayed_job jobs re-running even though I tell them not to?

I have this in my initializer:
Delayed::Job.const_set( "MAX_ATTEMPTS", 1 )
However, my jobs are still re-running after failure, seemingly completely ignoring this setting.
What might be going on?
more info
Here's what I'm observing: jobs with a populated "last error" field and an "attempts" number of more than 1 (10+).
I've discovered I was reading the old/wrong wiki. The correct way to set this is
Delayed::Worker.max_attempts = 1
Check your dbms table "delayed_jobs" for records (jobs) that still exist after the job "fails". The job will be re-run if the record is still there. -- If it shows that the "attempts" is non-zero then you know that your constant setting isn't working right.
Another guess is that the job's "failure," for some reason, is not being caught by DelayedJob. -- In that case, the "attempts" would still be at 0.
Debug by examining the delayed_job/lib/delayed/job.rb file. Esp the self.workoff method when one of your jobs "fail"
Added #John, I don't use MAX_ATTEMPTS. To debug, look in the gem to see where it is used. Sounds like the problem is that the job is being handled in the normal way rather than limiting attempts to 1. Use the debugger or a logging stmt to ensure that your MAX_ATTEMPTS setting is getting through.
Remember that the DelayedJobs jobs runner is not a full Rails program. So it could be that your initializer setting is not being run. Look into the script you're using to run the jobs runner.

Django1.3 multiple gunicorn workers caching problems

i have weird caching problems with the 1.3 version of django. I probably have something configured wrong, but am not sure what.
A good example is django-avatar, which uses caching and many people use it. Even if I dont have a cache backend defined the avatar seems to be cached, which by itself would be ok, but it keeps switching back and forth between the last values cached. Example: I upload a new avatar, now on approximately 50% of the requests it will show me the new one, 50% the old one. If I delete the old one I still get it on the site 50% of the time. The only way to fix it is to disable the caching of the avatar by setting it to one second.
First I thought it was because i used django.core.cache.backends.locmem.LocMemCache, which I never used before, but it even happens when I dont configure a cache backend at all.
I found one similar bug:
Django caching bug .. even if caching is disabled
but my pages render just fine, its the templatetags (for now) that cause the problems in my setup.
I use django 1.3, postgres, nginx, gunicorn 0.12.0, greenlet==0.3.1, eventlet==0.9.16
I just did some more testing and realized that it only happens when I start gunicorn using the config file. If I start it with ./manage.py run_gunicorn everything is fine. Running "gunicorn_django -c deploy/gunicorn.conf.py" causes the problems.
The only explanation I can think of is that each worker gets his own cache (I wonder why, since I did not define a cache).
Update: running ./manage.py run_gunicorn -w 4 also causes the same problems. Therefore I am almost certain that the multiple workers are causing the problems and each worker caches the values seperately.
My configuration:
import os
import socket
import sys
PORT = 8000
PROC_NAME = 'myapp_gunicorn'
LOGFILE_NAME = 'gunicorn.log'
TIMEOUT = 3600
IP = '127.0.0.1'
DEPLOYMENT_ROOT = os.path.dirname(os.path.abspath(__file__))
SITE_ROOT = os.path.abspath(os.path.sep.join([DEPLOYMENT_ROOT, '..']))
CPU_CORES = os.sysconf("SC_NPROCESSORS_ONLN")
sys.path.insert(0, os.path.join(SITE_ROOT, "apps"))
bind = '%s:%s' % (IP, PORT)
logfile = os.path.sep.join([DEPLOYMENT_ROOT, 'logs', LOGFILE_NAME])
proc_name = PROC_NAME
timeout = TIMEOUT
worker_class = 'eventlet'
workers = 2 * CPU_CORES + 1
I also tried it without using 'eventlet', but got the same errors.
Thanks for any help.
It is most likely defaulting to the in-memory-cache, which means each worker has it's own version of the cache in it's own memory space. If you hit thread 1 you get a different cache then thread 3. Nginx is spreading the load between each thread most likely via a round robin distribution, so you are changing threads each hit. Which explains your wacky results.
When you do manage.py run_gunicorn it is most likely running single threaded, and thus only one cache, and that is why you don't see the same results.
Using memcached or something similar is the way to go.

Resources