Supervisor stops/terminates before timeout - laravel-5

I know this might be a hard question since it concerns supervisord queueing, but I hope someone will be able to think it through with me, because it is an application-breaking problem.
The scenario: I have an application that runs huge imports, each on its own supervisor-managed worker process. Even though the queue timeout is set to either 0 or 3600, the workers keep being terminated well before the 3600-second mark, generally after about 30-40 minutes.
I've tracked RAM and CPU usage during an import, but that does not appear to be the problem: the process uses at most 2 GB of the 8 GB of RAM, and only one CPU thread at around 20% usage.
My supervisor queue looks as follows:
[program:laravel_queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/application/artisan queue:work --sleep=3 --tries=1 --queue=application --timeout=0
autostart=true
autorestart=true
user=administrator
numprocs=4
redirect_stderr=true
stdout_logfile=/var/www/application/storage/logs/queue/laravel_queue.out.log
stderr_logfile=/var/www/application/storage/logs/queue/laravel_queue.err.log
I've tried the same variant with --timeout=3600, but the queue still terminates at about 40 minutes.
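Independently of Laravel's --timeout, supervisord has its own stop behaviour worth keeping in mind: when a program is stopped or restarted, supervisord sends the configured stopsignal (TERM by default) and escalates to SIGKILL after stopwaitsecs, which defaults to 10 seconds. A sketch of an option sometimes added for long-running workers (the value here is illustrative):
[program:laravel_queue]
; ...existing options as above...
stopwaitsecs=3600   ; allow a running job up to an hour to finish after the stop signal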
I'm running the same setup in a Homestead/Laravel VM, and it runs perfectly in the Homestead VM.
Am I missing some Redis configuration setting that terminates connections after 40 minutes? I'm running a clustered setup with one separate, remote Redis server that the application(s) talk to.
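If it helps to rule Redis out, the relevant server-side settings can be inspected directly with redis-cli (the host is a placeholder; a timeout of 0 means idle connections are never closed):
redis-cli -h <redis-host> CONFIG GET timeout
redis-cli -h <redis-host> CONFIG GET tcp-keepalive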
I'd like to provide more information, but I'm not sure what else would be helpful, so if you have any thoughts, just ask and I'll share whatever I can.
Thanks in advance.
--- EDIT ---
Here is the part of the supervisor log where a termination happens:
2017-11-28 12:54:34,151 INFO waiting for application_laravel_queue_02 to stop
2017-11-28 12:54:34,151 INFO waiting for application_laravel_queue_03 to stop
2017-11-28 12:54:34,151 INFO waiting for application_laravel_queue_00 to stop
2017-11-28 12:54:34,151 INFO waiting for application_laravel_queue_01 to stop
2017-11-28 12:54:34,155 INFO stopped: application_laravel_queue_02 (terminated by SIGKILL)
2017-11-28 12:54:34,155 INFO stopped: application_laravel_queue_00 (terminated by SIGKILL)
2017-11-28 12:54:34,156 INFO stopped: application_laravel_queue_01 (terminated by SIGKILL)
2017-11-28 12:54:36,158 INFO waiting for application_laravel_queue_03 to stop
2017-11-28 12:54:38,161 INFO waiting for application_laravel_queue_03 to stop
2017-11-28 12:54:40,164 INFO waiting for application_laravel_queue_03 to stop
2017-11-28 12:54:42,170 INFO waiting for application_laravel_queue_03 to stop
2017-11-28 12:54:44,173 WARN killing 'application_laravel_queue_03' (1004) with SIGKILL
2017-11-28 12:54:44,174 INFO waiting for application_laravel_queue_03 to stop
2017-11-28 12:54:44,182 INFO stopped: application_laravel_queue_03 (terminated by SIGKILL)
2017-11-28 12:54:45,193 INFO spawned: 'application_laravel_queue_02' with pid 4235
2017-11-28 12:54:45,196 INFO spawned: 'application_laravel_queue_00' with pid 4236
2017-11-28 12:54:45,199 INFO spawned: 'application_laravel_queue_01' with pid 4237
2017-11-28 12:54:46,378 INFO success: application_laravel_queue_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2017-11-28 12:54:46,379 INFO success: application_laravel_queue_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2017-11-28 12:54:46,379 INFO success: application_laravel_queue_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
It looks like the workers are getting a termination signal (SIGKILL), but I don't know how or from where. The php-fpm log is clean for this timeframe.
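One reading of that log, for what it's worth: the simultaneous "waiting for ... to stop" lines for all four workers, followed by the respawns a few seconds later, indicate that supervisord itself was stopping and restarting the whole program group, e.g. because of a supervisorctl stop/restart/update, a SIGHUP reload after a config change, or a supervisord shutdown. It then escalates to SIGKILL after stopwaitsecs (10 seconds by default), which matches the ten-second gap before queue_03 is killed. A sketch of places to look for the trigger (log and cron paths are the usual Debian/Ubuntu defaults and may differ on this system):
grep -iE 'sighup|sigterm|reread|update|waiting for' /var/log/supervisor/supervisord.log
sudo crontab -l                      # root cron entries that might call supervisorctl
crontab -l -u administrator          # the worker user's cron entries
grep -R "supervisorctl" /etc/cron* 2>/dev/null   # cron.d/daily scripts that touch supervisor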

Related

The oozie job does not run with the message [AM container is launched, waiting for AM container to Register with RM]

I ran a shell job from the Oozie examples. However, the YARN application is not executed.
Detailed information from the YARN UI & logs:
https://docs.google.com/document/d/1N8LBXZGttY3rhRTwv8cUEfK3WkWtvWJ-YV1q_fh_kks/edit
The YARN application status is:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Queue: default
FinalStatus Reported by AM: Application has not completed yet.
Finished: N/A
Elapsed: 20mins, 30sec
Tracking URL: ApplicationMaster
Log Aggregation Status: DISABLED
Application Timeout (Remaining Time): Unlimited
Diagnostics: AM container is launched, waiting for AM container to Register with RM
The application attempt status is:
Application Attempt State: FAILED
Elapsed: 13mins, 19sec
AM Container: container_1607273090037_0001_02_000001
Node: N/A
Tracking URL: History
Diagnostics Info: ApplicationMaster for attempt appattempt_1607273090037_0001_000002 timed out
                                          Node Local Request   Rack Local Request   Off Switch Request
Num Node Local Containers (satisfied by)  0
Num Rack Local Containers (satisfied by)  0                    0
Num Off Switch Containers (satisfied by)  0                    0                    1
NodeManager log:
2020-12-07 01:45:16,237 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1607273090037_0001_01_000001]
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1607273090037_0001_01_000001 transitioned from SCHEDULED to RUNNING
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1607273090037_0001_01_000001
2020-12-07 01:45:16,272 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-oozie/nm-local-dir/usercache/oozie/appcache/application_1607273090037_0001/container_1607273090037_0001_01_000001/default_container_executor.sh]
2020-12-07 01:45:17,301 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1607273090037_0001_01_000001's ip = 127.0.0.1, and hostname = localhost.localdomain
2020-12-07 01:45:17,345 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1607273090037_0001_01_000001 since CPU usage is not yet available.
2020-12-07 01:45:48,274 INFO logs: Aliases are enabled
2020-12-07 01:54:50,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 496756, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2020-12-07 01:58:10,071 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1607273090037_0001_000001 (auth:SIMPLE)
2020-12-07 01:58:10,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1607273090037_0001_01_000001
What is the problem?

Run cron with supervisor and docker

I'm stuck on an issue with a legacy Laravel project. It uses supervisor and cron to run the scheduled tasks, but it seems that the cron jobs won't run (and apparently never have).
This is the Dockerfile:
FROM 704666026001.dkr.ecr.eu-central-1.amazonaws.com/laravel-prod
# Copy project
COPY . /var/www/html/
# Copy cronjob setup for the Laravel scheduler
COPY docker/cron/cron.txt /etc/docker/cron/cron.txt
# Copy laravel queue worker supervisor conf
COPY docker/supervisor /etc/docker/supervisor/conf
RUN mkdir -p /var/www/html/storage/framework/cache/data \
&& /usr/bin/crontab -u www-data /etc/docker/cron/cron.txt \
&& chown -R www-data:www-data /var/www/html/
In the docker/supervisor folder, there are two files.
One named queue-worker.conf with:
[group:laravel]
programs=laravel-worker
priority=30
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/artisan queue:work --sleep=3 --tries=3
user=www-data
numprocs=1
startsecs=10
autostart=true
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
And cron.conf with:
[group:cron]
programs=crond
priority=40
[program:crond]
process_name=%(program_name)s
command=crond -f
user=www-data
autostart=true
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
And the file docker/cron/cron.txt has one line:
* * * * * php /var/www/html/artisan schedule:run >> /dev/null 2>&1
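For debugging whether the scheduler entry ever fires, it can help to keep its output instead of discarding it; a sketch (the log path is illustrative):
* * * * * php /var/www/html/artisan schedule:run >> /var/www/html/storage/logs/scheduler.log 2>&1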
The Docker image builds without any errors. When I run it locally, this is the output:
2020-06-16 10:21:05,045 INFO Included extra file "/etc/docker/supervisor/conf/cron.conf" during parsing
2020-06-16 10:21:05,045 INFO Included extra file "/etc/docker/supervisor/conf/nginx.conf" during parsing
2020-06-16 10:21:05,045 INFO Included extra file "/etc/docker/supervisor/conf/php-fpm.conf" during parsing
2020-06-16 10:21:05,045 INFO Included extra file "/etc/docker/supervisor/conf/queue-worker.conf" during parsing
2020-06-16 10:21:05,062 INFO RPC interface 'supervisor' initialized
2020-06-16 10:21:05,063 INFO supervisord started with pid 1
2020-06-16 10:21:06,073 INFO spawned: 'nginxd' with pid 9
2020-06-16 10:21:06,078 INFO spawned: 'php-fpmd' with pid 10
2020-06-16 10:21:06,084 INFO spawned: 'laravel-worker_00' with pid 11
2020-06-16 10:21:06,088 INFO spawned: 'crond' with pid 12
2020/06/16 10:21:06 [notice] 9#9: using the "epoll" event method
2020/06/16 10:21:06 [notice] 9#9: nginx/1.16.1
2020/06/16 10:21:06 [notice] 9#9: OS: Linux 4.19.76-linuxkit
2020/06/16 10:21:06 [notice] 9#9: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2020/06/16 10:21:06 [notice] 9#9: start worker processes
2020-06-16 10:21:06,121 INFO success: nginxd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-06-16 10:21:06,121 INFO success: php-fpmd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020/06/16 10:21:06 [notice] 9#9: start worker process 13
2020/06/16 10:21:06 [notice] 9#9: start worker process 14
2020/06/16 10:21:06 [notice] 9#9: start worker process 15
2020/06/16 10:21:06 [notice] 9#9: start worker process 16
2020/06/16 10:21:06 [notice] 9#9: start cache manager process 17
2020/06/16 10:21:06 [notice] 9#9: start cache loader process 18
[16-Jun-2020 10:21:06] NOTICE: fpm is running, pid 10
[16-Jun-2020 10:21:06] NOTICE: ready to handle connections
2020-06-16 10:21:07,259 INFO success: crond entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-16 10:21:16,253 INFO success: laravel-worker_00 entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
It does show 'crond entered RUNNING state', but the cron job never actually runs.
Does anyone have an idea why? Is this setup even valid?
Thanks in advance for the help!
Supervisor stops processes if they are not doing anything for a certain amount of time. With cron, the task only runs intermittently, so it gets shut down.

Running Kafka-Manager inside Docker container on Windows

I am following this tutorial to run Kafka inside a Docker container on Windows.
When I try to launch Kafka-Manager by opening http://localhost:9000 in the browser as described there, I get ERR_CONNECTION_REFUSED.
Something I think might be related: the first time I ran docker-compose up, PowerShell showed an error saying I needed to run some other command first, to start a virtual machine or something like that.
I ran the command PowerShell suggested and was then able to run docker-compose up successfully. However, the tutorial didn't mention anything about this, and since then I've been able to run docker-compose up without running another command first, even after closing and reopening PowerShell.
I suspect PowerShell remembers I'm connected to a virtual machine, so docker-compose up runs Kafka inside a virtual machine and therefore I can't reach Kafka-Manager in the browser, although I see the following message:
kafkamanager | [info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
Edit:
Docker logs for the Kafka container:
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-28 08:37:37,274 CRIT Supervisor running as root (no user in config file)
2020-02-28 08:37:37,274 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-28 08:37:37,274 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-28 08:37:37,303 INFO RPC interface 'supervisor' initialized
2020-02-28 08:37:37,303 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-28 08:37:37,303 INFO supervisord started with pid 1
2020-02-28 08:37:38,306 INFO spawned: 'zookeeper' with pid 8
2020-02-28 08:37:38,308 INFO spawned: 'kafka' with pid 9
2020-02-28 08:37:39,372 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 08:37:39,372 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:16:01,095 WARN received SIGTERM indicating exit request
2020-02-28 21:16:01,095 INFO waiting for zookeeper, kafka to die
2020-02-28 21:16:02,102 INFO stopped: kafka (terminated by SIGTERM)
2020-02-28 21:16:02,442 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-28 21:17:50,843 CRIT Supervisor running as root (no user in config file)
2020-02-28 21:17:50,843 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-28 21:17:50,843 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-28 21:17:50,858 INFO RPC interface 'supervisor' initialized
2020-02-28 21:17:50,858 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-28 21:17:50,859 INFO supervisord started with pid 1
2020-02-28 21:17:51,862 INFO spawned: 'zookeeper' with pid 8
2020-02-28 21:17:51,864 INFO spawned: 'kafka' with pid 9
2020-02-28 21:17:52,926 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:17:52,927 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:17:59,672 INFO exited: kafka (exit status 1; not expected)
2020-02-28 21:18:00,675 INFO spawned: 'kafka' with pid 297
2020-02-28 21:18:01,694 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:18,487 WARN received SIGTERM indicating exit request
2020-02-29 19:42:18,487 INFO waiting for zookeeper, kafka to die
2020-02-29 19:42:18,488 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:42:18,821 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-29 19:42:26,841 CRIT Supervisor running as root (no user in config file)
2020-02-29 19:42:26,841 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-29 19:42:26,842 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-29 19:42:26,854 INFO RPC interface 'supervisor' initialized
2020-02-29 19:42:26,854 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-29 19:42:26,855 INFO supervisord started with pid 1
2020-02-29 19:42:27,857 INFO spawned: 'zookeeper' with pid 8
2020-02-29 19:42:27,859 INFO spawned: 'kafka' with pid 9
2020-02-29 19:42:28,903 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:28,903 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:34,985 INFO exited: kafka (exit status 1; not expected)
2020-02-29 19:42:35,988 INFO spawned: 'kafka' with pid 297
2020-02-29 19:42:37,014 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:43:20,590 WARN received SIGTERM indicating exit request
2020-02-29 19:43:20,590 INFO waiting for zookeeper, kafka to die
2020-02-29 19:43:20,590 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:43:20,784 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-29 19:45:38,600 CRIT Supervisor running as root (no user in config file)
2020-02-29 19:45:38,600 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-29 19:45:38,600 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-29 19:45:38,619 INFO RPC interface 'supervisor' initialized
2020-02-29 19:45:38,629 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-29 19:45:38,630 INFO supervisord started with pid 1
2020-02-29 19:45:39,632 INFO spawned: 'zookeeper' with pid 8
2020-02-29 19:45:39,634 INFO spawned: 'kafka' with pid 9
2020-02-29 19:45:40,687 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:45:40,689 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:45:47,740 INFO exited: kafka (exit status 1; not expected)
2020-02-29 19:45:48,743 INFO spawned: 'kafka' with pid 297
2020-02-29 19:45:49,763 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:46:20,659 WARN received SIGTERM indicating exit request
2020-02-29 19:46:20,659 INFO waiting for zookeeper, kafka to die
2020-02-29 19:46:20,660 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:46:20,991 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-03-13 22:16:26,128 CRIT Supervisor running as root (no user in config file)
2020-03-13 22:16:26,128 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-03-13 22:16:26,128 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-03-13 22:16:26,157 INFO RPC interface 'supervisor' initialized
2020-03-13 22:16:26,162 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-03-13 22:16:26,162 INFO supervisord started with pid 1
2020-03-13 22:16:27,164 INFO spawned: 'zookeeper' with pid 8
2020-03-13 22:16:27,167 INFO spawned: 'kafka' with pid 9
2020-03-13 22:16:28,226 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:16:28,227 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:16:36,496 INFO exited: kafka (exit status 1; not expected)
2020-03-13 22:16:37,499 INFO spawned: 'kafka' with pid 298
2020-03-13 22:16:38,511 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:17:20,939 WARN received SIGTERM indicating exit request
2020-03-13 22:17:20,940 INFO waiting for zookeeper, kafka to die
2020-03-13 22:17:20,940 INFO stopped: kafka (terminated by SIGTERM)
2020-03-13 22:17:21,268 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-03-27 21:25:59,495 CRIT Supervisor running as root (no user in config file)
2020-03-27 21:25:59,496 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-03-27 21:25:59,497 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-03-27 21:25:59,520 INFO RPC interface 'supervisor' initialized
2020-03-27 21:25:59,522 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-03-27 21:25:59,523 INFO supervisord started with pid 1
2020-03-27 21:26:00,530 INFO spawned: 'zookeeper' with pid 8
2020-03-27 21:26:00,532 INFO spawned: 'kafka' with pid 9
2020-03-27 21:26:01,620 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-27 21:26:01,620 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The Docker logs for the kafka-manager container seem fine:
[info] o.a.z.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[info] o.a.z.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[info] o.a.z.ZooKeeper - Client environment:java.compiler=<NA>
[info] o.a.z.ZooKeeper - Client environment:os.name=Linux
[info] o.a.z.ZooKeeper - Client environment:os.arch=amd64
[info] o.a.z.ZooKeeper - Client environment:os.version=4.9.93-boot2docker
[info] o.a.z.ZooKeeper - Client environment:user.name=root
[info] o.a.z.ZooKeeper - Client environment:user.home=/root
[info] o.a.z.ZooKeeper - Client environment:user.dir=/kafka-manager-1.3.3.4
[info] o.a.z.ZooKeeper - Initiating client connection, connectString=kafkaserver:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7a27a9b4
[info] o.a.z.ClientCnxn - Opening socket connection to server kafka.kafka_kafkanet/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
[info] k.m.a.KafkaManagerActor - zk=kafkaserver:2181
[info] k.m.a.KafkaManagerActor - baseZkPath=/kafka-manager
[info] o.a.z.ClientCnxn - Socket connection established to kafka.kafka_kafkanet/172.18.0.2:2181, initiating session
[info] o.a.z.ClientCnxn - Session establishment complete on server kafka.kafka_kafkanet/172.18.0.2:2181, sessionid = 0x1711de33be70001, negotiated timeout = 40000
[info] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
[info] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
[info] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
[info] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
[info] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
[info] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
[info] play.api.Play - Application started (Prod)
[info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
This log is a lot longer so I've omitted the beginning, but it seems fine.
Yes, there's a hypervisor, not a full VM. You can open the Hyper-V Manager to look at it.
Your compose file needs a port forward:
ports:
- '9000:9000'
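For context, that mapping sits under the kafka-manager service in docker-compose.yml, roughly like this (the service name is taken from the log prefix above; the image line is a placeholder for whatever the tutorial uses):
services:
  kafkamanager:
    image: <kafka-manager image from the tutorial>
    ports:
      - '9000:9000'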
If you are using Docker Toolbox on Windows, you can try to access kafka-manager at this address: http://192.168.99.100:9000
Note: 192.168.99.100 is the default IP address of the VM that Docker is running on.
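If Docker Toolbox is in use, the VM's address can be confirmed from PowerShell (assuming the default machine name):
docker-machine ip default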
The docker-compose.yaml given in the tutorial is totally fine. Can you do a docker-compose down and then bring it back up with docker-compose up?
Then try to browse http://localhost:9000 and you should be able to see it.
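In other words, from the directory containing the compose file:
docker-compose down
docker-compose up        # add -d to run it in the background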
Possible errors:
Port forwarding (already done in the docker-compose)
Instead of HTTP, you are opening HTTPS in the browser.

Change of Mesos master leader causes Marathon shutdown?

Env:
Zookeeper on computer A,
Mesos master on computer B as Leader,
Mesos master on computer C,
Marathon on computer B (singleton).
Action:
Kill the Mesos master task on computer B to force a change of the Mesos cluster leader.
Result:
The Mesos cluster leader changes to the Mesos master on computer C,
but the Marathon task on computer B automatically shuts down with the following logs.
Question:
Can somebody help me understand why Marathon went down, and how to fix it?
Logs:
I1109 12:19:10.010197 11287 detector.cpp:152] Detected a new leader: (id='9')
I1109 12:19:10.010646 11291 group.cpp:699] Trying to get '/mesos/json.info_0000000009' in ZooKeeper
I1109 12:19:10.013425 11292 zookeeper.cpp:262] A new leading master (UPID=master@10.4.23.55:5050) is detected
[2017-11-09 12:19:10,015] WARN Disconnected (mesosphere.marathon.MarathonScheduler:Thread-23)
I1109 12:19:10.018977 11292 sched.cpp:2021] Asked to stop the driver
I1109 12:19:10.019161 11292 sched.cpp:336] New master detected at master@10.4.23.55:5050
I1109 12:19:10.019892 11292 sched.cpp:1203] Stopping framework d52cbd8c-1015-4d94-8328-e418876ca5b2-0000
[2017-11-09 12:19:10,020] INFO Driver future completed with result=Success(()). (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,022] INFO Abdicating leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,022] INFO Stopping the election service (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,029] INFO backgroundOperationsLoop exiting (org.apache.curator.framework.imps.CuratorFrameworkImpl:Curator-Framework-0)
[2017-11-09 12:19:10,061] INFO Session: 0x15f710ffb010058 closed (org.apache.zookeeper.ZooKeeper:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,061] INFO EventThread shut down for session: 0x15f710ffb010058 (org.apache.zookeeper.ClientCnxn:pool-3-thread-1-EventThread)
[2017-11-09 12:19:10,063] INFO Stopping MarathonSchedulerService [RUNNING]'s leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,063] INFO Lost leadership (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,066] INFO All actors suspended:
* Actor[akka://marathon/user/offerMatcherStatistics#-1904211014]
* Actor[akka://marathon/user/reviveOffersWhenWanted#-238627718]
* Actor[akka://marathon/user/expungeOverdueLostTasks#608979053]
* Actor[akka://marathon/user/launchQueue#803590575]
* Actor[akka://marathon/user/offersWantedForReconciliation#598482724]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#813230776]
* Actor[akka://marathon/user/offerMatcherManager#1205401692]
* Actor[akka://marathon/user/instanceTracker#1055980147]
* Actor[akka://marathon/user/killOverdueStagedTasks#-40058350]
* Actor[akka://marathon/user/taskKillServiceActor#-602552505]
* Actor[akka://marathon/user/rateLimiter#-911383474]
* Actor[akka://marathon/user/deploymentManager#2013376325] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-10)
I1109 12:19:10.069551 11272 sched.cpp:2021] Asked to stop the driver
[2017-11-09 12:19:10,068] INFO Stopping driver (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,069] INFO Stopped MarathonSchedulerService [RUNNING]'s leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,070] INFO Terminating due to leadership abdication or failure (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,071] INFO Call postDriverRuns callbacks on (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,074] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-12)
[2017-11-09 12:19:10,074] INFO Suspending scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-2)
[2017-11-09 12:19:10,083] INFO Finished postDriverRuns callbacks (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,084] INFO ExpungeOverdueLostTasksActor has stopped (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-9)
[1]+ Exit 137
I think there is a wrong configuration in the ZooKeeper cluster. Use a 3-node ZooKeeper cluster with 2 Mesos masters and multiple slaves. Ref: https://www.google.co.in/amp/s/beingasysadmin.wordpress.com/2014/08/16/managing-ha-docker-cluster-using-multiple-mesos-masters/amp/
Did you set the masters reference in the Marathon conf? Can you do:
cat /etc/marathon/conf/master
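For reference, that file normally contains just the ZooKeeper URL of the Mesos masters that Marathon should use; with the setup described above it would look something like this (the hostname is a placeholder for computer A):
zk://computer-a:2181/mesos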

Docker stop exit code -1 if the default CMD is a shell script

I am building a Tomcat container in Docker with supervisord. If the default command in the Dockerfile is
CMD supervisord -c /etc/supervisord.conf
then when I issue a docker stop command, the container exits successfully with exit code 0.
But if I instead have
CMD ["/run"]
and in run.sh,
supervisord -c /etc/supervisord.conf
then docker stop gives me an exit code of -1. Looking at the logs, it seems that supervisord did not receive the SIGTERM indicating the exit request:
2014-10-06 19:48:54,420 CRIT Supervisor running as root (no user in config file)
2014-10-06 19:48:54,450 INFO RPC interface 'supervisor' initialized
2014-10-06 19:48:54,451 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2014-10-06 19:48:54,451 INFO supervisord started with pid 6
2014-10-06 19:48:55,457 INFO spawned: 'tomcat' with pid 9
2014-10-06 19:48:56,503 INFO success: tomcat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
as opposed to the previous logs, where it receives a SIGTERM and gracefully exits:
2014-10-06 20:02:59,527 CRIT Supervisor running as root (no user in config file)
2014-10-06 20:02:59,556 INFO RPC interface 'supervisor' initialized
2014-10-06 20:02:59,556 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2014-10-06 20:02:59,557 INFO supervisord started with pid 1
2014-10-06 20:03:00,561 INFO spawned: 'tomcat' with pid 9
2014-10-06 20:03:01,602 INFO success: tomcat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2014-10-06 20:05:11,690 WARN received SIGTERM indicating exit request
2014-10-06 20:05:11,690 INFO waiting for tomcat to die
2014-10-06 20:05:12,450 INFO stopped: tomcat (exit status 143)
Any help appreciated.
Thanks,
Karthik
UPDATE:
Here is the supervisord.conf file:
[supervisord]
nodaemon=true
logfile=/var/log/supervisor/supervisord.log
[program:mysql]
command=/usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe --pid-file=/var/run/mysqld/mysqld.pid
stdout_logfile=/tmp/mysql.log
stderr_logfile=/tmp/mysql_err.log
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock
[unix_http_server]
file=/tmp/supervisor.sock ; path to your socket file
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
When you run the process via run.sh, signals are only sent to that shell process. Unless you go out of your way to pass signals on to the child processes, e.g. by forwarding them with trap, by sending signals to the whole process group, or by doing exec supervisord ... in run.sh, the child process won't get the signals.
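A minimal run.sh following the exec approach, so that supervisord replaces the shell and receives the SIGTERM from docker stop directly (a sketch of the idea above, not the original file):
#!/bin/sh
# exec replaces this shell with supervisord, so supervisord becomes PID 1
# and docker stop's SIGTERM reaches it directly instead of dying with the shell.
exec supervisord -c /etc/supervisord.conf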
