Running Kafka-Manager inside a Docker container on Windows

I am following this tutorial to run Kafka inside a Docker container on Windows.
When I try to launch Kafka-Manager by opening http://localhost:9000 in the browser as described there, I get ERR_CONNECTION_REFUSED.
Something I think might be related: the first time I ran docker-compose up, PowerShell showed an error saying I first needed to run some command to start a virtual machine, or something like that.
I ran the command PowerShell suggested, and after that docker-compose up worked. However, the tutorial didn't mention anything about this, and since then docker-compose up has worked every time without any other command first, even after closing and reopening PowerShell.
I suspect PowerShell remembers that I'm connected to a virtual machine, so docker-compose up runs Kafka inside that VM, which is why I can't reach Kafka-Manager in the browser, even though I see the following message:
kafkamanager | [info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
Edit:
docker logs for kafka container:
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-28 08:37:37,274 CRIT Supervisor running as root (no user in config file)
2020-02-28 08:37:37,274 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-28 08:37:37,274 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-28 08:37:37,303 INFO RPC interface 'supervisor' initialized
2020-02-28 08:37:37,303 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-28 08:37:37,303 INFO supervisord started with pid 1
2020-02-28 08:37:38,306 INFO spawned: 'zookeeper' with pid 8
2020-02-28 08:37:38,308 INFO spawned: 'kafka' with pid 9
2020-02-28 08:37:39,372 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 08:37:39,372 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:16:01,095 WARN received SIGTERM indicating exit request
2020-02-28 21:16:01,095 INFO waiting for zookeeper, kafka to die
2020-02-28 21:16:02,102 INFO stopped: kafka (terminated by SIGTERM)
2020-02-28 21:16:02,442 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-28 21:17:50,843 CRIT Supervisor running as root (no user in config file)
2020-02-28 21:17:50,843 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-28 21:17:50,843 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-28 21:17:50,858 INFO RPC interface 'supervisor' initialized
2020-02-28 21:17:50,858 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-28 21:17:50,859 INFO supervisord started with pid 1
2020-02-28 21:17:51,862 INFO spawned: 'zookeeper' with pid 8
2020-02-28 21:17:51,864 INFO spawned: 'kafka' with pid 9
2020-02-28 21:17:52,926 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:17:52,927 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:17:59,672 INFO exited: kafka (exit status 1; not expected)
2020-02-28 21:18:00,675 INFO spawned: 'kafka' with pid 297
2020-02-28 21:18:01,694 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:18,487 WARN received SIGTERM indicating exit request
2020-02-29 19:42:18,487 INFO waiting for zookeeper, kafka to die
2020-02-29 19:42:18,488 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:42:18,821 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-29 19:42:26,841 CRIT Supervisor running as root (no user in config file)
2020-02-29 19:42:26,841 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-29 19:42:26,842 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-29 19:42:26,854 INFO RPC interface 'supervisor' initialized
2020-02-29 19:42:26,854 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-29 19:42:26,855 INFO supervisord started with pid 1
2020-02-29 19:42:27,857 INFO spawned: 'zookeeper' with pid 8
2020-02-29 19:42:27,859 INFO spawned: 'kafka' with pid 9
2020-02-29 19:42:28,903 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:28,903 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:34,985 INFO exited: kafka (exit status 1; not expected)
2020-02-29 19:42:35,988 INFO spawned: 'kafka' with pid 297
2020-02-29 19:42:37,014 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:43:20,590 WARN received SIGTERM indicating exit request
2020-02-29 19:43:20,590 INFO waiting for zookeeper, kafka to die
2020-02-29 19:43:20,590 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:43:20,784 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-29 19:45:38,600 CRIT Supervisor running as root (no user in config file)
2020-02-29 19:45:38,600 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-29 19:45:38,600 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-29 19:45:38,619 INFO RPC interface 'supervisor' initialized
2020-02-29 19:45:38,629 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-29 19:45:38,630 INFO supervisord started with pid 1
2020-02-29 19:45:39,632 INFO spawned: 'zookeeper' with pid 8
2020-02-29 19:45:39,634 INFO spawned: 'kafka' with pid 9
2020-02-29 19:45:40,687 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:45:40,689 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:45:47,740 INFO exited: kafka (exit status 1; not expected)
2020-02-29 19:45:48,743 INFO spawned: 'kafka' with pid 297
2020-02-29 19:45:49,763 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:46:20,659 WARN received SIGTERM indicating exit request
2020-02-29 19:46:20,659 INFO waiting for zookeeper, kafka to die
2020-02-29 19:46:20,660 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:46:20,991 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-03-13 22:16:26,128 CRIT Supervisor running as root (no user in config file)
2020-03-13 22:16:26,128 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-03-13 22:16:26,128 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-03-13 22:16:26,157 INFO RPC interface 'supervisor' initialized
2020-03-13 22:16:26,162 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-03-13 22:16:26,162 INFO supervisord started with pid 1
2020-03-13 22:16:27,164 INFO spawned: 'zookeeper' with pid 8
2020-03-13 22:16:27,167 INFO spawned: 'kafka' with pid 9
2020-03-13 22:16:28,226 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:16:28,227 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:16:36,496 INFO exited: kafka (exit status 1; not expected)
2020-03-13 22:16:37,499 INFO spawned: 'kafka' with pid 298
2020-03-13 22:16:38,511 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:17:20,939 WARN received SIGTERM indicating exit request
2020-03-13 22:17:20,940 INFO waiting for zookeeper, kafka to die
2020-03-13 22:17:20,940 INFO stopped: kafka (terminated by SIGTERM)
2020-03-13 22:17:21,268 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-03-27 21:25:59,495 CRIT Supervisor running as root (no user in config file)
2020-03-27 21:25:59,496 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-03-27 21:25:59,497 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-03-27 21:25:59,520 INFO RPC interface 'supervisor' initialized
2020-03-27 21:25:59,522 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-03-27 21:25:59,523 INFO supervisord started with pid 1
2020-03-27 21:26:00,530 INFO spawned: 'zookeeper' with pid 8
2020-03-27 21:26:00,532 INFO spawned: 'kafka' with pid 9
2020-03-27 21:26:01,620 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-27 21:26:01,620 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
docker logs for the kafka-manager container seem fine:
[info] o.a.z.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[info] o.a.z.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[info] o.a.z.ZooKeeper - Client environment:java.compiler=<NA>
[info] o.a.z.ZooKeeper - Client environment:os.name=Linux
[info] o.a.z.ZooKeeper - Client environment:os.arch=amd64
[info] o.a.z.ZooKeeper - Client environment:os.version=4.9.93-boot2docker
[info] o.a.z.ZooKeeper - Client environment:user.name=root
[info] o.a.z.ZooKeeper - Client environment:user.home=/root
[info] o.a.z.ZooKeeper - Client environment:user.dir=/kafka-manager-1.3.3.4
[info] o.a.z.ZooKeeper - Initiating client connection, connectString=kafkaserver:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState#7a27a9b4
[info] o.a.z.ClientCnxn - Opening socket connection to server kafka.kafka_kafkanet/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
[info] k.m.a.KafkaManagerActor - zk=kafkaserver:2181
[info] k.m.a.KafkaManagerActor - baseZkPath=/kafka-manager
[info] o.a.z.ClientCnxn - Socket connection established to kafka.kafka_kafkanet/172.18.0.2:2181, initiating session
[info] o.a.z.ClientCnxn - Session establishment complete on server kafka.kafka_kafkanet/172.18.0.2:2181, sessionid = 0x1711de33be70001, negotiated timeout = 40000
[info] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
[info] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
[info] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
[info] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
[info] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
[info] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
[info] play.api.Play - Application started (Prod)
[info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
This log is a lot longer so I've omitted the beginning, but it seems fine.

Yes, there's a hypervisor, not a full VM. You can open the Hyper-V Manager to look at it.
Your compose file needs a port forward:
ports:
  - '9000:9000'
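For reference, a minimal sketch of what the kafka-manager service could look like in the compose file; the image name is an assumption (use whatever the tutorial specifies), and ZK_HOSTS is taken from the connect string visible in the kafka-manager log above:
kafkamanager:
  image: sheepkiller/kafka-manager   # assumed image, substitute the one from the tutorial
  ports:
    - '9000:9000'                    # publish the UI port so http://localhost:9000 reaches the container
  environment:
    ZK_HOSTS: kafkaserver:2181       # matches connectString=kafkaserver:2181 seen in the logs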

If you are using Docker Toolbox on Windows, you can try to access kafka-manager at this address: http://192.168.99.100:9000
Note: 192.168.99.100 is the default IP address of the VM that Docker is running on.
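If you're not sure which IP the Toolbox VM has, docker-machine can tell you (assuming the default machine name):
docker-machine ip default    # prints the VM's address, e.g. 192.168.99.100
Then open http://<that-ip>:9000 instead of localhost.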

The docker-compose.yaml given in the tutorial is totally fine. Can you do docker-compose down and then bring it up again with docker-compose up?
Then try to browse http://localhost:9000 and you should be able to see it (example commands are sketched below).
Possible causes:
Port forwarding is missing (already handled in the docker-compose file).
You are opening HTTPS instead of HTTP in the browser.
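For example, from the directory containing the tutorial's compose file (the commands are standard docker-compose/curl; the exact service names depend on that file):
docker-compose down             # stop and remove the existing containers
docker-compose up -d            # recreate them in the background
docker-compose ps               # the kafka-manager service should show 0.0.0.0:9000->9000/tcp
curl -I http://localhost:9000   # should return an HTTP status line rather than a refused connection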

Related

The Oozie job does not run, with the message [AM container is launched, waiting for AM container to Register with RM]

I ran a shell job from the Oozie examples.
However, the YARN application is not executed.
Detailed information from the YARN UI & logs:
https://docs.google.com/document/d/1N8LBXZGttY3rhRTwv8cUEfK3WkWtvWJ-YV1q_fh_kks/edit
The YARN application status is:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Queue: default
FinalStatus Reported by AM: Application has not completed yet.
Finished: N/A
Elapsed: 20mins, 30sec
Tracking URL: ApplicationMaster
Log Aggregation Status: DISABLED
Application Timeout (Remaining Time): Unlimited
Diagnostics: AM container is launched, waiting for AM container to Register with RM
The application attempt status is:
Application Attempt State: FAILED
Elapsed: 13mins, 19sec
AM Container: container_1607273090037_0001_02_000001
Node: N/A
Tracking URL: History
Diagnostics Info: ApplicationMaster for attempt appattempt_1607273090037_0001_000002 timed out
Node Local Request Rack Local Request Off Switch Request
Num Node Local Containers (satisfied by) 0
Num Rack Local Containers (satisfied by) 0 0
Num Off Switch Containers (satisfied by) 0 0 1
NodeManager log:
2020-12-07 01:45:16,237 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1607273090037_0001_01_000001]
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1607273090037_0001_01_000001 transitioned from SCHEDULED to RUNNING
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1607273090037_0001_01_000001
2020-12-07 01:45:16,272 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-oozie/nm-local-dir/usercache/oozie/appcache/application_1607273090037_0001/container_1607273090037_0001_01_000001/default_container_executor.sh]
2020-12-07 01:45:17,301 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1607273090037_0001_01_000001's ip = 127.0.0.1, and hostname = localhost.localdomain
2020-12-07 01:45:17,345 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1607273090037_0001_01_000001 since CPU usage is not yet available.
2020-12-07 01:45:48,274 INFO logs: Aliases are enabled
2020-12-07 01:54:50,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 496756, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2020-12-07 01:58:10,071 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1607273090037_0001_000001 (auth:SIMPLE)
2020-12-07 01:58:10,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1607273090037_0001_01_000001
What is the problem?

Docker: Workers keep exiting and respawning, with 99% CPU

Supervisord can't keep my Redis queue workers running; they keep exiting with status code 12 and respawning. My Redis container is up and on the same Docker network as the app container (where supervisord runs).
I followed the Laravel docs: https://laravel.com/docs/5.8/queues#supervisor-configuration
I tried daemonizing the command and some code updates.
I updated my Debian to Stretch and updated Docker as well.
I tested all of this locally and everything works fine...
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/artisan queue:work redis --sleep=3 --tries=3
autostart=true
autorestart=true
user=www-data
numprocs=8
priority=10
redirect_stderr=true
stdout_logfile=/var/log/worker.log
api:
  image: gitlab.ladechetterieduweb.com:5555/lddw/backend:latest
  container_name: backend-lddw-develop
  restart: always
  working_dir: /var/www
  volumes:
    - ./config/api:/var/env
    - ./app/storage:/var/www/storage
    - ./logs/laravel:/var/www/storage/logs
    - ./logs/supervisord:/var/log
  depends_on:
    - db
  command: /bin/bash -c "cp /var/env/.env /var/www/.env && supervisord -c /etc/supervisord.conf --nodaemon"
  networks:
    - app-network
redis:
  image: redis:5.0.3-stretch
  restart: always
  ports:
    - 6379:6379
  container_name: redis-lddw-develop
  volumes:
    - redis_data:/data
    - ./config/redis:/usr/local/etc/redis
  command: /bin/bash -c "cp /usr/local/etc/redis/rc.local /etc/rc.local && redis-server --appendonly yes"
  networks:
    - app-network
2019-06-25 21:50:15,199 CRIT Set uid to user 0
2019-06-25 21:50:15,199 WARN No file matches via include "/etc/supervisor/conf.d/*.conf"
2019-06-25 21:50:15,295 INFO RPC interface 'supervisor' initialized
2019-06-25 21:50:15,295 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2019-06-25 21:50:15,457 INFO RPC interface 'supervisor' initialized
2019-06-25 21:50:15,457 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-06-25 21:50:15,457 INFO supervisord started with pid 1
2019-06-25 21:50:16,460 INFO spawned: 'php-fpm' with pid 10
2019-06-25 21:50:16,461 INFO spawned: 'laravel-worker_00' with pid 11
2019-06-25 21:50:16,463 INFO spawned: 'laravel-worker_01' with pid 12
2019-06-25 21:50:16,464 INFO spawned: 'laravel-worker_02' with pid 13
2019-06-25 21:50:16,467 INFO spawned: 'laravel-worker_03' with pid 14
2019-06-25 21:50:16,469 INFO spawned: 'laravel-worker_04' with pid 15
2019-06-25 21:50:16,472 INFO spawned: 'laravel-worker_05' with pid 16
2019-06-25 21:50:16,474 INFO spawned: 'laravel-worker_06' with pid 17
2019-06-25 21:50:16,476 INFO spawned: 'laravel-worker_07' with pid 18
2019-06-25 21:50:17,667 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,667 INFO success: laravel-worker_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,667 INFO success: laravel-worker_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,667 INFO success: laravel-worker_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,667 INFO success: laravel-worker_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,668 INFO success: laravel-worker_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,668 INFO success: laravel-worker_05 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,668 INFO success: laravel-worker_06 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:17,668 INFO success: laravel-worker_07 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_00 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_02 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_03 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_05 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_06 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_07 (exit status 12; not expected)
Your workers are exiting with status 12:
2019-06-25 21:50:26,205 INFO exited: laravel-worker_00 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_02 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_03 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_05 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_06 (exit status 12; not expected)
2019-06-25 21:50:26,205 INFO exited: laravel-worker_07 (exit status 12; not expected)
This exit code is triggered when your worker consumes too much memory; see:
https://github.com/laravel/framework/blob/5.8/src/Illuminate/Queue/Worker.php#L204
$this->memoryExceeded($options->memory) returns true and the worker exits.
You have two options: reduce the memory footprint of your worker, or increase the memory allowed for it. As the default is pretty low (128 MB), you can try giving it more memory.
To change the memory allowed for your workers, edit your supervisord configuration:
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/artisan queue:work redis --sleep=3 --tries=3 --memory=1024
autostart=true
autorestart=true
user=www-data
numprocs=8
priority=10
redirect_stderr=true
stdout_logfile=/var/log/worker.log
See the --memory flag I added to the command line in the conf above.
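After saving the change, supervisord has to reload it; assuming the edited supervisord.conf is the one visible inside the container, something like this (standard supervisorctl commands, container name taken from your compose file) should do it:
docker exec -it backend-lddw-develop supervisorctl reread                       # re-read config files
docker exec -it backend-lddw-develop supervisorctl update                       # apply changes to process groups
docker exec -it backend-lddw-develop supervisorctl restart 'laravel-worker:*'   # restart the worker group
Alternatively, restart the api service with docker-compose so supervisord starts fresh with the new config.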
Regards

Connection refused when creating a two-node cluster in Apache NiFi

I'm facing a problem with a refused connection on the cluster node protocol port.
I'm using the following configs to create the two-node cluster:
For the first node (manager):
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=true
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=10.129.140.22
nifi.web.http.port=3000
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=10000
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
For the second node (slave):
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=false
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=9021
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=10001
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=10.129.140.22
nifi.cluster.load.balance.port=6343
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.129.140.22:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
The log files show the following:
For the slave:
2019-05-23 10:37:07,384 INFO [main] o.a.n.c.repository.FileSystemRepository Initializing FileSystemRepository with 'Always Sync' set to false
2019-05-23 10:37:07,541 INFO [main] o.apache.nifi.controller.FlowController Not enabling RAW Socket Site-to-Site functionality because nifi.remote.input.socket.port is not set
2019-05-23 10:37:07,546 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2019-05-23 10:37:07,591 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:07,658 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:07,693 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2019-05-23 10:37:07,697 INFO [main] o.apache.nifi.controller.FlowController The Election for Cluster Coordinator has already begun (Leader is localhost:10000). Will not register to be elected for this role until after connecting to the cluster and inheriting the cluster's flow.
2019-05-23 10:37:07,699 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:07,699 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2019-05-23 10:37:07,706 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:09,587 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1a6a4595{nifi-api,/nifi-api,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-api-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.9.2.war}
2019-05-23 10:37:09,850 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=77ms
2019-05-23 10:37:09,852 INFO [main] o.e.j.s.h.C._nifi_content_viewer No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,873 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#4b1b2255{nifi-content-viewer,/nifi-content-viewer,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-content-viewer-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.9.2.war}
2019-05-23 10:37:09,895 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:09,896 WARN [main] o.e.j.webapp.StandardDescriptorProcessor Duplicate mapping from / to default
2019-05-23 10:37:09,915 INFO [main] o.e.j.s.h.ContextHandler._nifi_docs No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,917 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#4965454c{nifi-docs,/nifi-docs,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-docs-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.9.2.war}
2019-05-23 10:37:09,936 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=8ms
2019-05-23 10:37:09,955 INFO [main] o.e.j.server.handler.ContextHandler._ No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,957 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1e4a4ed5{nifi-error,/,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-error-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.9.2.war}
2019-05-23 10:37:09,967 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#4518bffd{HTTP/1.1,[http/1.1]}{0.0.0.0:9021}
2019-05-23 10:37:09,967 INFO [main] org.eclipse.jetty.server.Server Started #28769ms
2019-05-23 10:37:09,978 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2019-05-23 10:37:09,982 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10001
2019-05-23 10:37:10,026 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2019-05-23 10:37:10,071 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: localhost:9021
2019-05-23 10:37:10,073 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:10,074 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10000 due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:12,715 WARN [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Failed to determine which node is elected active Cluster Coordinator: ZooKeeper reports the address as localhost:10000, but there is no node with this address. Attempted to determine the node's information but failed to retrieve its information due to org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:12,720 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for localhost:9021 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:12,721 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for localhost:9021 -- Requesting that node connect to cluster
2019-05-23 10:37:12,721 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of localhost:9021 changed from NodeConnectionStatus[nodeId=localhost:9021, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=1] to NodeConnectionStatus[nodeId=localhost:9021, state=CONNECTING, updateId=3]
2019-05-23 10:37:15,075 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:15,076 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10000 due to: java.net.ConnectException: Connection refused (Connection refused)
For the manager:
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.io.tmpdir=/tmp
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.compiler=<NA>
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.name=Linux
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.arch=amd64
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.version=4.15.0-20-generic
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.name=root
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.home=/root
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.dir=/home/superman/nifi-1.9.2
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer tickTime set to 2000
2019-05-23 10:36:59,754 INFO [main] o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to -1
2019-05-23 10:36:59,754 INFO [main] o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to -1
2019-05-23 10:36:59,855 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2019-05-23 10:36:59,903 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:36:59,950 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /127.0.0.1:40388
2019-05-23 10:36:59,950 INFO [SyncThread:0] o.a.z.server.persistence.FileTxnLog Creating new log file: log.3c
2019-05-23 10:36:59,963 INFO [SyncThread:0] o.a.zookeeper.server.ZooKeeperServer Established session 0x16ae443f4130000 with negotiated timeout 4000 for client /127.0.0.1:40388
2019-05-23 10:36:59,975 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:36:59,998 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2019-05-23 10:37:00,003 INFO [main] o.apache.nifi.controller.FlowController The Election for Cluster Coordinator has already begun (Leader is localhost:10001). Will not register to be elected for this role until after connecting to the cluster and inheriting the cluster's flow.
2019-05-23 10:37:00,005 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:00,005 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2019-05-23 10:37:00,019 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /127.0.0.1:40390
2019-05-23 10:37:00,020 INFO [SyncThread:0] o.a.zookeeper.server.ZooKeeperServer Established session 0x16ae443f4130001 with negotiated timeout 4000 for client /127.0.0.1:40390
2019-05-23 10:37:00,020 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:02,022 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1a05ff8e{nifi-api,/nifi-api,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-api-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.9.2.war}
2019-05-23 10:37:02,373 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=165ms
2019-05-23 10:37:02,375 INFO [main] o.e.j.s.h.C._nifi_content_viewer No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,401 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#251e2f4a{nifi-content-viewer,/nifi-content-viewer,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-content-viewer-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.9.2.war}
2019-05-23 10:37:02,419 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:02,420 WARN [main] o.e.j.webapp.StandardDescriptorProcessor Duplicate mapping from / to default
2019-05-23 10:37:02,421 INFO [main] o.e.j.s.h.ContextHandler._nifi_docs No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,441 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1abea1ed{nifi-docs,/nifi-docs,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-docs-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.9.2.war}
2019-05-23 10:37:02,457 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:02,475 INFO [main] o.e.j.server.handler.ContextHandler._ No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,478 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#6f5288c5{nifi-error,/,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-error-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.9.2.war}
2019-05-23 10:37:02,488 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#167ed1cf{HTTP/1.1,[http/1.1]}{10.129.140.22:3000}
2019-05-23 10:37:02,488 INFO [main] org.eclipse.jetty.server.Server Started #26145ms
2019-05-23 10:37:02,500 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2019-05-23 10:37:02,503 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
2019-05-23 10:37:02,545 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2019-05-23 10:37:02,587 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:02,589 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10001; will use this address for sending heartbeat messages
2019-05-23 10:37:02,590 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10001 due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:04,001 INFO [SessionTracker] o.a.zookeeper.server.ZooKeeperServer Expiring session 0x16ae42f180d0003, timeout of 4000ms exceeded
2019-05-23 10:37:04,001 INFO [SessionTracker] o.a.zookeeper.server.ZooKeeperServer Expiring session 0x16ae42f180d0002, timeout of 4000ms exceeded
2019-05-23 10:37:05,026 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:05,028 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:05,028 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=0] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5]
2019-05-23 10:37:07,591 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2019-05-23 10:37:07,594 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2019-05-23 10:37:07,612 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener#1d6dcdcb This node has been elected Leader for Role 'Cluster Coordinator'
2019-05-23 10:37:07,612 INFO [Leader Election Notification Thread-1] o.apache.nifi.controller.FlowController This node elected Active Cluster Coordinator
2019-05-23 10:37:07,668 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,668 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,669 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=1] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6]
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=2] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7]
2019-05-23 10:37:07,694 INFO [Process Cluster Protocol Request-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5]
2019-05-23 10:37:07,695 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,699 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,700 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=3] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8]
2019-05-23 10:37:07,701 INFO [Process Cluster Protocol Request-5] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7]
2019-05-23 10:37:07,702 INFO [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 19834836-9bda-41b3-8fef-4a288d90c7bf (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 33 millis
2019-05-23 10:37:07,702 INFO [Process Cluster Protocol Request-5] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 85b0bb3f-c2a6-4dfd-abd6-e9df14710c4d (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 10 millis
2019-05-23 10:37:07,703 INFO [Process Cluster Protocol Request-3] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6]
2019-05-23 10:37:07,705 INFO [Process Cluster Protocol Request-3] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 80447901-4ad3-44e3-91ad-d9f075624eae (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 31 millis
2019-05-23 10:37:07,706 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,706 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,707 INFO [Process Cluster Protocol Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 22cdceee-c01f-445f-a091-38812e878d10 (type=RECONNECTION_REQUEST, length=3095 bytes) from 10.129.140.22:3000 in 34 millis
2019-05-23 10:37:07,708 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,708 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,709 INFO [Process Cluster Protocol Request-4] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8605cf39-2034-4ee2-92c4-0fbe54e97fb2 (type=RECONNECTION_REQUEST, length=3013 bytes) from 10.129.140.22:3000 in 27 millis
2019-05-23 10:37:07,712 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,712 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,725 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 0ca55348-44eb-416b-91dd-3d80da4c5ebe (type=RECONNECTION_REQUEST, length=3013 bytes) from 10.129.140.22:3000 in 29 millis
2019-05-23 10:37:07,725 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,728 INFO [Process Cluster Protocol Request-7] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8]
2019-05-23 10:37:07,728 INFO [Process Cluster Protocol Request-7] o.a.n.c.p.impl.SocketProtocolListener Finished processing request c9b647d7-67ac-4d0a-833b-8a0a8cc0ba6d (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 3 millis
2019-05-23 10:37:07,728 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,725 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,732 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 4 heartbeats in 2 seconds, 708 millis
2019-05-23 10:37:07,732 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,733 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,734 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,735 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,736 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,736 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,748 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 434daf63-1beb-4b82-9290-bb0da4e89b7f (type=RECONNECTION_REQUEST, length=2972 bytes) from 10.129.140.22:3000 in 16 millis
2019-05-23 10:37:07,749 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
Set these properties on both nodes:
nifi.web.http.host=<host>
nifi.cluster.node.address=<host>
Beware of how this value is visible across network scopes:
nifi.zookeeper.connect.string=localhost:2181
e.g. you use 'localhost' here, while on the other node you're using the real IP address.
The nodes share these addresses during replication, primary/coordinator node election and flow election, so they must be reachable from the other node.
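A hedged sketch of what that could look like in nifi.properties; 10.129.140.22 is the manager's address from the configs above, and <node2-host> is a placeholder for the second node's real hostname or IP (not shown in the question):
# node 1 (manager)
nifi.web.http.host=10.129.140.22
nifi.cluster.node.address=10.129.140.22
nifi.zookeeper.connect.string=10.129.140.22:2181
# node 2
nifi.web.http.host=<node2-host>
nifi.cluster.node.address=<node2-host>
nifi.zookeeper.connect.string=10.129.140.22:2181
With real addresses in place, ZooKeeper no longer advertises the coordinator as localhost:10000, so the other node can actually open the protocol socket.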

docker installation of openproject: Phusion passenger fails to start after installation

I am trying to install openproject using docker on centos7.6 but Phusion passenger fails to start after installation. Error is suggesting it failed to parse response.
The preloader process sent an unparseable response:. I don't know how to fix this issue.
stdout:
-----> Database setup finished.
On first installation, the default admin credentials are login: admin, password: admin
-----> Launching supervisord...
2019-05-08 08:14:46,313 CRIT Supervisor running as root (no user in config file)
2019-05-08 08:14:46,318 INFO supervisord started with pid 1
2019-05-08 08:14:47,321 INFO spawned: 'postgres' with pid 155
2019-05-08 08:14:47,325 INFO spawned: 'apache2' with pid 156
2019-05-08 08:14:47,328 INFO spawned: 'web' with pid 157
2019-05-08 08:14:47,331 INFO spawned: 'worker' with pid 158
2019-05-08 08:14:47,351 INFO spawned: 'postfix' with pid 159
2019-05-08 08:14:47,360 INFO spawned: 'memcached' with pid 160
2019-05-08 08:14:47.634 UTC [172] LOG: database system was shut down at 2019-05-08 08:14:44 UTC
2019-05-08 08:14:47,634 INFO success: postfix entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-05-08 08:14:47.649 UTC [172] LOG: MultiXact member wraparound protections are now enabled
2019-05-08 08:14:47.653 UTC [155] LOG: database system is ready to accept connections
2019-05-08 08:14:47.663 UTC [177] LOG: autovacuum launcher started
2019-05-08 08:14:48,670 INFO success: postgres entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-05-08 08:14:48,670 INFO success: apache2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-05-08 08:14:48,670 INFO success: web entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-05-08 08:14:48,670 INFO success: worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-05-08 08:14:48,670 INFO success: memcached entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
2019-05-08 08:14:50,198 INFO exited: postfix (exit status 0; expected)
--> Downloading a Phusion Passenger agent binary for your platform
--> Installing Nginx 1.15.8 engine
--------------------------
[passenger_native_support.so] trying to compile for the current user (app) and Ruby interpreter...
(set PASSENGER_COMPILE_NATIVE_SUPPORT_BINARY=0 to disable)
Compilation successful. The logs are here:
/tmp/passenger_native_support-15tsfhk.log
[passenger_native_support.so] successfully loaded.
=============== Phusion Passenger Standalone web server started ===============
PID file: /app/tmp/pids/passenger.8080.pid
Log file: /app/log/passenger.8080.log
Environment: production
Accessible via: http://0.0.0.0:8080/
You can stop Phusion Passenger Standalone by pressing Ctrl-C.
Problems? Check https://www.phusionpassenger.com/library/admin/standalone/troubleshooting/
===============================================================================
[ N 2019-05-08 08:15:01.7338 404/Tb age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
Forcefully loading the application. Use :environment to avoid eager loading.
[auth_saml] Missing settings from '/app/config/plugins/auth_saml/settings.yml', skipping omniauth registration.
hook registered
App 439 output: [auth_saml] Missing settings from '/app/config/plugins/auth_saml/settings.yml', skipping omniauth registration.
App 439 output: hook registered
Creating scope :order_by_name. Overwriting existing method Sprint.order_by_name.
App 439 output: Creating scope :order_by_name. Overwriting existing method Sprint.order_by_name.
[Worker(host:d0b3748f627a pid:158)] Starting job worker
2019-05-08T08:15:45+0000: [Worker(host:d0b3748f627a pid:158)] Starting job worker
App 439 output: /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:108:in `fork': Cannot allocate memory - fork(2) (Errno::ENOMEM)
App 439 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:108:in `handle_spawn_command'
App 439 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:78:in `accept_and_process_next_client'
App 439 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:167:in `run_main_loop'
App 439 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/helper-scripts/rack-preloader.rb:207:in `<module:App>'
App 439 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/helper-scripts/rack-preloader.rb:30:in `<module:PhusionPassenger>'
App 439 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/helper-scripts/rack-preloader.rb:29:in `<main>'
[ E 2019-05-08 08:15:46.6971 404/Tc age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /app: The preloader process sent an unparseable response:
Error ID: d7825364
Error details saved to: /tmp/passenger-error-wjSTKF.html
[ E 2019-05-08 08:15:46.7028 404/T8 age/Cor/Con/CheckoutSession.cpp:276 ]: [Client 1-1] Cannot checkout session because a spawning error occurred. The identifier of the error is d7825364. Please see earlier logs for details about the error.
[ W 2019-05-08 08:34:24.7967 404/Tk age/Cor/Spa/SmartSpawner.h:572 ]: An error occurred while spawning an application process: Cannot connect to Unix socket '/tmp/passenger.PKROzbY/apps.s/preloader.hyl9g8': No such file or directory (errno=2)
[ W 2019-05-08 08:34:24.7968 404/Tk age/Cor/Spa/SmartSpawner.h:574 ]: The application preloader seems to have crashed, restarting it and trying again...
App 543 output: [auth_saml] Missing settings from '/app/config/plugins/auth_saml/settings.yml', skipping omniauth registration.
App 543 output: hook registered
App 543 output: Creating scope :order_by_name. Overwriting existing method Sprint.order_by_name.
App 543 output: /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:108:in `fork': Cannot allocate memory - fork(2) (Errno::ENOMEM)
App 543 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:108:in `handle_spawn_command'
App 543 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:78:in `accept_and_process_next_client'
App 543 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:167:in `run_main_loop'
App 543 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/helper-scripts/rack-preloader.rb:207:in `<module:App>'
App 543 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/helper-scripts/rack-preloader.rb:30:in `<module:PhusionPassenger>'
App 543 output: from /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/helper-scripts/rack-preloader.rb:29:in `<main>'
[ E 2019-05-08 08:34:52.2521 404/Tk age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /app: The preloader process sent an unparseable response:
Error ID: c2ce0823
Error details saved to: /tmp/passenger-error-bpsfAC.html
[ E 2019-05-08 08:34:52.2570 404/T8 age/Cor/Con/CheckoutSession.cpp:276 ]: [Client 1-2] Cannot checkout session because a spawning error occurred. The identifier of the error is c2ce0823. Please see earlier logs for details about the error.
Thanks.
The important line in the log is this one:
App 439 output: /app/vendor/bundle/ruby/2.6.0/gems/passenger-6.0.1/src/ruby_supportlib/phusion_passenger/preloader_shared_helpers.rb:108:in `fork': Cannot allocate memory - fork(2) (Errno::ENOMEM)
This means your container is unable to allocate the memory it needs. It could be that your system is in an OOM state and things are being killed, or that some other restriction on the daemon prevents it from allocating additional memory.
For reference:
https://success.docker.com/article/docker-daemon-error-cannot-allocate-memory
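If you want to confirm the memory pressure before changing anything, a quick check on the Docker host might look like the sketch below (a generic illustration; the 2G swap size is an arbitrary example, not a value taken from your logs):
free -m                           # free memory and swap on the host
docker stats --no-stream          # per-container memory usage
dmesg | grep -i "out of memory"   # has the kernel OOM killer been firing?
# If the host is simply short on RAM, adding a swap file is a common stopgap:
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
It is also worth checking whether the container was started with an explicit memory limit (docker inspect shows this) and raising it if so.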

Docker stop exit code -1 if the default CMD is a shell script

I am building a Tomcat container in Docker with supervisord. If the default command in the Dockerfile is
CMD supervisord -c /etc/supervisord.conf
then when I run docker stop, the container exits successfully with exit code 0.
But if instead I have
CMD ["/run"]
and, in run.sh,
supervisord -c /etc/supervisord.conf
then docker stop gives me an exit code of -1. Looking at the logs, it seems that supervisord did not receive the SIGTERM indicating the exit request:
2014-10-06 19:48:54,420 CRIT Supervisor running as root (no user in config file)
2014-10-06 19:48:54,450 INFO RPC interface 'supervisor' initialized
2014-10-06 19:48:54,451 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2014-10-06 19:48:54,451 INFO supervisord started with pid 6
2014-10-06 19:48:55,457 INFO spawned: 'tomcat' with pid 9
2014-10-06 19:48:56,503 INFO success: tomcat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
as opposed to the previous logs, where it receives a SIGTERM and exits gracefully:
2014-10-06 20:02:59,527 CRIT Supervisor running as root (no user in config file)
2014-10-06 20:02:59,556 INFO RPC interface 'supervisor' initialized
2014-10-06 20:02:59,556 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2014-10-06 20:02:59,557 INFO supervisord started with pid 1
2014-10-06 20:03:00,561 INFO spawned: 'tomcat' with pid 9
2014-10-06 20:03:01,602 INFO success: tomcat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2014-10-06 20:05:11,690 WARN received SIGTERM indicating exit request
2014-10-06 20:05:11,690 INFO waiting for tomcat to die
2014-10-06 20:05:12,450 INFO stopped: tomcat (exit status 143)
Any help appreciated.
Thanks,
Karthik
UPDATE:
supervisord.conf file
[supervisord]
nodaemon=true
logfile=/var/log/supervisor/supervisord.log
[program:mysql]
command=/usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe --pid-file=/var/run/mysqld/mysqld.pid
stdout_logfile=/tmp/mysql.log
stderr_logfile=/tmp/mysql_err.log
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock
[unix_http_server]
file=/tmp/supervisor.sock ; path to your socket file
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
When you run the process via run.sh, signals are only sent to that shell process. Unless you are
- going out of your way to send signals to child processes, e.g. with trap,
- sending signals to the process group, or
- doing exec supervisord ... in run.sh,
the child process won't get the signals.
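A minimal sketch of the exec approach, assuming /run is the run.sh script referenced in your CMD, would be:
#!/bin/sh
# run.sh: exec replaces this shell with supervisord, so supervisord
# becomes PID 1 in the container and receives the SIGTERM that
# docker stop sends, letting it shut tomcat down gracefully.
exec supervisord -c /etc/supervisord.conf
With exec, the shell does not linger as an extra parent process, which is why the signal reaches supervisord just as it does with the plain CMD supervisord ... form.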
