My production environment has started constantly throwing this error:
Error fetching message: ERR Error running script (call to f_0ab965f5b2899a2dec38dec41fff8c97f7a33ee9): #user_script:56: #user_script: 56: -OOM command not allowed when used memory > 'maxmemory'.
I am using the Heroku Redis addon with a worker dyno running Sidekiq.
Both Redis and the Worker Dyno have plenty of memory right now and the logs don't show them running out.
What is causing this error to be thrown and how can I fix it?
I had a job that required more memory to run than I had available.
Run "config get maxmemory" on your redis server. Maybe that config is limiting the amount of memory Redis is using.
Background
We have a worker application server that runs long-running reporting export jobs. Since they are export jobs, we connected it to a (managed, not serverless) Aurora database cluster with a write master and read replicas that auto-scale under a scaling policy.
The worker uses the reader endpoint that comes out of the box with the DB cluster, which should distribute load evenly across the existing readers.
Problem
We noticed that the export jobs attempting to connect through the reader endpoint are failing with this error:
SQLSTATE[HY000]: General error: 7 SSL SYSCALL error: EOF detected
We were able to verify that this error happens exactly when auto-scaling happens, because we cross-referenced the timing of the errors with the scaling events.
To make the problem worse, all subsequent export attempts fail with the same error (even after the auto-scaling is over!). The only way we were able to fix this was by restarting the worker application servers.
Question
What can we do to let our database cluster gracefully handle incoming connections to the read replicas while it's scaling? Or how do we force the worker to establish a new connection when it finds that the current one has been terminated?
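A sketch of the second approach, assuming a Laravel-style worker; the connection name pgsql_read, the $export callable, and the retry count are hypothetical, not taken from the question:

    <?php

    use Illuminate\Support\Facades\DB;

    function runExportWithReconnect(callable $export, int $maxRetries = 2)
    {
        for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
            try {
                // Run the export against the reader-endpoint connection.
                return $export(DB::connection('pgsql_read'));
            } catch (\PDOException $e) {
                // "SSL SYSCALL error: EOF detected" means the server closed the
                // socket, e.g. because the replica we were pinned to was recycled
                // during auto-scaling.
                if ($attempt === $maxRetries || stripos($e->getMessage(), 'EOF detected') === false) {
                    throw $e;
                }
                // Drop the cached PDO handle so the next attempt opens a fresh
                // connection and re-resolves the reader endpoint's DNS.
                DB::purge('pgsql_read');
            }
        }
    }

In a long-running worker the PDO handle is reused across jobs, which would also explain why the failures persisted after scaling finished until the application servers were restarted.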
I run my application on Kubernetes.
I have one service for requests and one service for the worker processes.
If I access the Horizon UI, it often shows the Inactive status, but jobs are still being processed by the worker. I know this because the JOBS PAST HOUR counter keeps increasing.
If I scale up my worker service, jobs constantly "fail" with the exception Illuminate\Queue\MaxAttemptsExceededException.
If I connect directly to the pods and run ps aux, I can see that horizon processes are running.
If I connect to a pod where the worker is running and execute the horizon:list command, it tells me that one (or more) masters are running.
How can I further debug this?
Laravel version: 5.7.15
Horizon version: 2.0.0
Redis version: 3.2.4
The issue was that the server time was out of sync, so the "old" masters kept getting restarted.
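Separately from the clock issue, the MaxAttemptsExceededException that shows up when workers are scaled up is often caused by a retry_after value shorter than the longest job, so a second worker re-reserves a job while the first is still processing it. A config sketch worth checking (the 600-second value is an assumption, not taken from the question):

    // config/queue.php (excerpt)
    'redis' => [
        'driver'      => 'redis',
        'connection'  => 'default',
        'queue'       => 'default',
        // Must exceed the runtime of the slowest job, or another worker will
        // pick the job up again and the attempt counter will keep climbing.
        'retry_after' => 600,
    ],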
I built my app with docker-compose: one database container using the mariadb image, one PHP container running Laravel (I installed the php-memcached or php-redis extension for my app), and one cache container built on the redis Docker image.
At first everything runs well, but after 2 or 3 days I get this PHP exception: Connection timed out [tcp://redis:6379];
I monitor CPU, memory, and network with Zabbix, which I installed myself on the host server, but I still got the error. The monitoring graphs:
(screenshot: Zabbix CPU monitoring)
(screenshot: Zabbix memory monitoring)
I changed the cache container to memcached, and after 2 or 3 days the same thing happened.
The only way I have found to solve this problem is to restart the system, and then it can run another 2 or 3 days before getting the same error. As you know, restarting the system is not an option in production, so can anyone suggest where to look to solve the problem other than restarting the system?
Thanks!
I think you are facing a problem with the Redis Docker container. This type of error occurs when memory is exhausted. You need to set the maxmemory parameter of the Redis server.
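For example, a docker-compose sketch along those lines (the 256mb limit, the allkeys-lru policy, and the image tag are assumptions, not taken from the question):

    # docker-compose.yml (excerpt)
    services:
      redis:
        image: redis:alpine
        # Cap Redis memory and evict old cache keys instead of letting the
        # instance exhaust memory and stall client connections.
        command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru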
Advice: please also try another Redis image.
I deploy Docker containers on Mesos (0.21) and Marathon (0.7.6) on Google Compute Engine.
I use JMeter to test a REST service that runs on Marathon. With fewer than 10 concurrent requests it works normally, but with more than 50 concurrent requests the container is killed and Mesos starts another container. I increased RAM and CPU, but it still happens.
This is the log in /var/log/mesos/:
E0116 09:33:31.554816 19298 slave.cpp:2344] Failed to update resources for container 10e47946-4c54-4d64-9276-0ce94af31d44 of executor dev_service.2e25332d-964f-11e4-9004-42010af05efe running task dev_service.2e25332d-964f-11e4-9004-42010af05efe on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/612/cgroup: Failed to open file '/proc/612/cgroup': No such file or directory
The error message you're seeing is actually another symptom, not the root cause of the issue. There's a good explanation/discussion in this Apache Jira bug report:
https://issues.apache.org/jira/browse/MESOS-1837
Basically, your container is crashing for one reason or another and the /proc/pid#/ directory is getting cleared without Mesos being aware, so it throws the error message you found when it goes to check that /proc directory.
Try setting your allocated CPU higher in the JSON file describing your task.
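For reference, a minimal Marathon app definition showing the resource fields to raise; the values and the image name are placeholders, not taken from the question:

    {
      "id": "dev_service",
      "cpus": 1.0,
      "mem": 1024,
      "instances": 2,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "example/dev-service:latest",
          "network": "BRIDGE"
        }
      }
    }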
When I run the project locally and don't have Resque running, I get an error message when using enqueue. I totally understand that, as the Resque server is not running.
Is there a way to catch that error so that I can display it as a flash error message instead of halting execution?
What I usually do is run the jobs through Resque only when Rails.env is production or staging. In development, the jobs are done directly.
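A sketch of that pattern in the enqueueing controller action; the ExportJob class and the export_id parameter are hypothetical, and the rescued class is the error redis-rb raises when it cannot reach the server:

    # Sketch of a controller action that enqueues the job.
    def create
      if Rails.env.production? || Rails.env.staging?
        begin
          Resque.enqueue(ExportJob, params[:export_id])
        rescue Redis::CannotConnectError => e
          # Resque could not reach Redis; show a flash message instead of raising.
          flash[:error] = "Could not queue the job: #{e.message}"
        end
      else
        # In development, run the job inline without Resque.
        ExportJob.perform(params[:export_id])
      end
    end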