Pgpool standby server's status is "down" some time after starting replication with online recovery - cluster-computing

I have set up a 2-server cluster using these configuration examples. They are in the same LAN. The master is node 0 on 192.168.1.31, the slave is node 1 on 192.168.1.32.
The problem is that approximately 10 minutes after I start streaming replication by running pcp_recovery_node, the status of the standby node changes to down; the slave continues replication, but all other PostgreSQL nodes get shut down:
It comes back up when I run "pcp_attach_node -h 192.168.1.55 -p 9898 -U pgpool -n 1", and the same process repeats.
I have found a similar issue here and read further about the unexpected EOF on the standby connection, but I was not able to come up with a solution. I'm not sure whether it is the only problem here.
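For context, the re-attach step and a basic status check look roughly like this (the pcp port 9898, user pgpool and address 192.168.1.55 are taken from the command above; port 9999 for the SQL-level check is an assumption based on the watchdog log entries below):
# how pgpool currently sees both backends
psql -h 192.168.1.55 -p 9999 -U pgpool -c "show pool_nodes"
# node 1 details as reported over pcp
pcp_node_info -h 192.168.1.55 -p 9898 -U pgpool -n 1
# re-attach the standby once streaming replication looks healthy again
pcp_attach_node -h 192.168.1.55 -p 9898 -U pgpool -n 1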
Here are the logs:
Slave pgpool:
2022-03-07 16:29:28: pid 11035: LOG: signal_user1_to_parent_with_reason(2)
2022-03-07 16:29:28: pid 11030: LOG: Pgpool-II parent process received SIGUSR1
2022-03-07 16:29:28: pid 11030: LOG: Pgpool-II parent process received sync backend signal from watchdog
2022-03-07 16:29:28: pid 11035: LOG: new IPC connection received
2022-03-07 16:29:28: pid 11030: LOG: leader watchdog has performed failover
2022-03-07 16:29:28: pid 11030: DETAIL: syncing the backend states from the LEADER watchdog node
2022-03-07 16:29:28: pid 11035: LOG: new IPC connection received
2022-03-07 16:29:28: pid 11035: LOG: received the get data request from local pgpool-II on IPC interface
2022-03-07 16:29:28: pid 11035: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node "192.168.1.31:9999 Linux localhost.localdomain"
2022-03-07 16:29:28: pid 11035: DETAIL: waiting for the reply...
2022-03-07 16:29:28: pid 11030: LOG: leader watchdog node "192.168.1.31:9999 Linux localhost.localdomain" returned status for 2 backend nodes
2022-03-07 16:29:28: pid 11030: LOG: backend:1 is set to down status
2022-03-07 16:29:28: pid 11030: DETAIL: backend:1 is DOWN on cluster leader "192.168.1.31:9999 Linux localhost.localdomain"
2022-03-07 16:29:28: pid 11030: LOG: 1 backend node(s) were detached because of backend status sync from "192.168.1.31:9999 Linux localhost.localdomain"
2022-03-07 16:29:28: pid 11030: DETAIL: restarting the children processes
2022-03-07 16:29:28: pid 11030: LOG: Node 0 is not down (status: 1)
2022-03-07 16:29:28: pid 26533: LOG: worker process received restart request
2022-03-07 16:29:28: pid 11030: LOG: worker child process with pid: 26533 exits with status 256
2022-03-07 16:29:28: pid 11030: LOG: fork a new worker child process with pid: 27025
2022-03-07 16:29:28: pid 27025: LOG: process started
Master pgpool:
2022-03-07 16:29:10: pid 330084: ERROR: unable to read data from frontend
2022-03-07 16:29:10: pid 330084: DETAIL: EOF encountered with frontend
2022-03-07 16:29:11: pid 330081: LOG: reading message length
2022-03-07 16:29:11: pid 330081: DETAIL: message length (22) in slot 1 does not match with slot 0(23)
2022-03-07 16:29:19: pid 15432: LOG: new IPC connection received
2022-03-07 16:29:27: pid 330082: LOG: received degenerate backend request for node_id: 1 from pid [330082]
2022-03-07 16:29:27: pid 15432: LOG: new IPC connection received
2022-03-07 16:29:27: pid 15432: LOG: watchdog received the failover command from local pgpool-II on IPC interface
2022-03-07 16:29:27: pid 15432: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2022-03-07 16:29:27: pid 15432: LOG: we have got the consensus to perform the failover
2022-03-07 16:29:27: pid 15432: DETAIL: 1 node(s) voted in the favor
2022-03-07 16:29:27: pid 330082: LOG: signal_user1_to_parent_with_reason(0)
2022-03-07 16:29:27: pid 15429: LOG: Pgpool-II parent process received SIGUSR1
2022-03-07 16:29:27: pid 15429: LOG: Pgpool-II parent process has received failover request
2022-03-07 16:29:27: pid 330082: WARNING: write on backend 1 failed with error :"Broken pipe"
2022-03-07 16:29:27: pid 330082: DETAIL: while trying to write data from offset: 0 wlen: 5
2022-03-07 16:29:27: pid 15432: LOG: new IPC connection received
2022-03-07 16:29:27: pid 15432: LOG: received the failover indication from Pgpool-II on IPC interface
2022-03-07 16:29:27: pid 15432: LOG: watchdog is informed of failover start by the main process
2022-03-07 16:29:27: pid 15429: LOG: starting degeneration. shutdown host 192.168.1.32(5432)
2022-03-07 16:29:27: pid 15429: LOG: Do not restart children because we are switching over node id 1 host: 192.168.1.32 port: 5432 and we are in streaming replication mode
2022-03-07 16:29:27: pid 15429: LOG: child pid 330081 needs to restart because pool 1 uses backend 1
2022-03-07 16:29:27: pid 15429: LOG: execute command: /etc/pgpool-II/failover.sh 1 192.168.1.32 5432 /var/lib/pgsql/14/data 0 192.168.1.31 0 0 5432 /var/lib/pgsql/14/data 192.168.1.31 5432
+ FAILED_NODE_ID=1
+ FAILED_NODE_HOST=192.168.1.32
+ FAILED_NODE_PORT=5432
+ FAILED_NODE_PGDATA=/var/lib/pgsql/14/data
+ NEW_MAIN_NODE_ID=0
+ NEW_MAIN_NODE_HOST=192.168.1.31
+ OLD_MAIN_NODE_ID=0
+ OLD_PRIMARY_NODE_ID=0
+ NEW_MAIN_NODE_PORT=5432
+ NEW_MAIN_NODE_PGDATA=/var/lib/pgsql/14/data
+ OLD_PRIMARY_NODE_HOST=192.168.1.31
+ OLD_PRIMARY_NODE_PORT=5432
+ PGHOME=/usr/pgsql-14
+ REPL_SLOT_NAME=192_168_1_32
+ echo failover.sh: start: failed_node_id=1 failed_host=192.168.1.32 old_primary_node_id=0 new_main_node_id=0 new_main_host=192.168.1.31
failover.sh: start: failed_node_id=1 failed_host=192.168.1.32 old_primary_node_id=0 new_main_node_id=0 new_main_host=192.168.1.31
+ '[' 0 -lt 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@192.168.1.31 -i /var/lib/pgsql/.ssh/id_rsa_pgpool ls /tmp
Warning: Permanently added '192.168.1.31' (ECDSA) to the list of known hosts.
+ '[' 0 -ne 0 ']'
+ '[' 1 -ne 0 ']'
+ /usr/pgsql-14/bin/psql -h 192.168.1.31 -p 5432 -c 'SELECT pg_drop_replication_slot('\''192_168_1_32'\'');'
+ '[' 1 -ne 0 ']'
+ echo ERROR: failover.sh: drop replication slot '"192_168_1_32"' failed. You may need to drop replication slot manually.
ERROR: failover.sh: drop replication slot "192_168_1_32" failed. You may need to drop replication slot manually.
+ echo failover.sh: end: standby node is down. Skipping failover.
failover.sh: end: standby node is down. Skipping failover.
+ exit 0
2022-03-07 16:29:28: pid 15429: LOG: failover: set new primary node: 0
2022-03-07 16:29:28: pid 15429: LOG: failover: set new main node: 0
2022-03-07 16:29:28: pid 15429: LOG: child pid 330081 needs to restart because pool 1 uses backend 1
2022-03-07 16:29:28: pid 330070: LOG: worker process received restart request
2022-03-07 16:29:28: pid 15432: LOG: new IPC connection received
2022-03-07 16:29:28: pid 15432: LOG: received the failover indication from Pgpool-II on IPC interface
2022-03-07 16:29:28: pid 15432: LOG: watchdog is informed of failover end by the main process
failover done. shutdown host 192.168.1.32(5432)
2022-03-07 16:29:28: pid 15429: LOG: failover done. shutdown host 192.168.1.32(5432)
2022-03-07 16:29:28: pid 330093: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330093: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330089: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330089: DETAIL: restarting myself
2022-03-07 16:29:28: pid 329708: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 329708: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330085: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330085: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330104: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330104: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330075: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330075: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330106: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330106: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330092: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330092: DETAIL: restarting myself
2022-03-07 16:29:28: pid 330079: LOG: failover or failback event detected
2022-03-07 16:29:28: pid 330079: DETAIL: restarting myself
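Judging from the trace above, failover.sh takes its standby-failure branch: because the failed node (id 1) is not the old primary (id 0), it only tries to drop the standby's replication slot on the primary and then exits without promoting anything. A minimal sketch of that branch, reconstructed from the trace (not necessarily the exact script shipped with the Pgpool-II examples):
if [ "$FAILED_NODE_ID" -ne "$OLD_PRIMARY_NODE_ID" ]; then
    # a standby failed: try to drop its replication slot on the current primary, then stop
    "${PGHOME}/bin/psql" -h "$OLD_PRIMARY_NODE_HOST" -p "$OLD_PRIMARY_NODE_PORT" \
        -c "SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}');"
    if [ $? -ne 0 ]; then
        echo "ERROR: failover.sh: drop replication slot \"${REPL_SLOT_NAME}\" failed. You may need to drop replication slot manually."
    fi
    echo "failover.sh: end: standby node is down. Skipping failover."
    exit 0
fi
In this run the psql call returned a non-zero exit status (the slot could not be dropped), which is why both the ERROR line and the "Skipping failover" message appear in the trace.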

Related

The oozie job does not run with the message [AM container is launched, waiting for AM container to Register with RM]

I ran a shell job from the Oozie examples.
However, the YARN application is not executed.
Detailed information (YARN UI & logs):
https://docs.google.com/document/d/1N8LBXZGttY3rhRTwv8cUEfK3WkWtvWJ-YV1q_fh_kks/edit
The YARN application status is:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Queue: default
FinalStatus Reported by AM: Application has not completed yet.
Finished: N/A
Elapsed: 20mins, 30sec
Tracking URL: ApplicationMaster
Log Aggregation Status: DISABLED
Application Timeout (Remaining Time): Unlimited
Diagnostics: AM container is launched, waiting for AM container to Register with RM
Application Attempt status is
Application Attempt State: FAILED
Elapsed: 13mins, 19sec
AM Container: container_1607273090037_0001_02_000001
Node: N/A
Tracking URL: History
Diagnostics Info: ApplicationMaster for attempt appattempt_1607273090037_0001_000002 timed out
Node Local Request Rack Local Request Off Switch Request
Num Node Local Containers (satisfied by) 0
Num Rack Local Containers (satisfied by) 0 0
Num Off Switch Containers (satisfied by) 0 0 1
nodemanager log
2020-12-07 01:45:16,237 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1607273090037_0001_01_000001]
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1607273090037_0001_01_000001 transitioned from SCHEDULED to RUNNING
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1607273090037_0001_01_000001
2020-12-07 01:45:16,272 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-oozie/nm-local-dir/usercache/oozie/appcache/application_1607273090037_0001/container_1607273090037_0001_01_000001/default_container_executor.sh]
2020-12-07 01:45:17,301 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1607273090037_0001_01_000001's ip = 127.0.0.1, and hostname = localhost.localdomain
2020-12-07 01:45:17,345 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1607273090037_0001_01_000001 since CPU usage is not yet available.
2020-12-07 01:45:48,274 INFO logs: Aliases are enabled
2020-12-07 01:54:50,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 496756, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2020-12-07 01:58:10,071 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1607273090037_0001_000001 (auth:SIMPLE)
2020-12-07 01:58:10,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1607273090037_0001_01_000001
What is the problem?
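One thing that stands out in the NodeManager log is that the AM container's IP is reported as 127.0.0.1 with hostname localhost.localdomain, while the attempt fails with an AM registration timeout. As a first check (an assumption, not a confirmed diagnosis), it may be worth verifying that the host name resolves to a routable address rather than the loopback, and pulling the ResourceManager's view of the attempt:
# does the FQDN map to a real interface, or to 127.0.0.1?
hostname -f
getent hosts "$(hostname -f)"
# the ResourceManager's report for the failed application
yarn application -status application_1607273090037_0001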

Running Kafka-Manager inside Docker container on Windows

I am following this tutorial to run Kafka inside a Docker container on Windows.
When I try to launch Kafka-Manager by opening http://localhost:9000 in the browser as described there, I get ERR_CONNECTION_REFUSED.
Something I think might be related is that the first time I ran docker-compose up, PowerShell showed an error saying I needed to run some command first, to open a virtual machine or something like that.
I then ran the command PowerShell had told me and managed to run docker-compose up successfully. However, the tutorial didn't mention anything about it, and since then I have been able to run docker-compose up without running another command first, even after closing and reopening PowerShell.
I suspect PowerShell remembers I'm connected to a virtual machine, so docker-compose up runs Kafka inside a virtual machine, and therefore I can't reach Kafka-Manager in the browser, although I see the following message:
kafkamanager | [info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
Edit:
docker logs for the kafka container:
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-28 08:37:37,274 CRIT Supervisor running as root (no user in config file)
2020-02-28 08:37:37,274 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-28 08:37:37,274 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-28 08:37:37,303 INFO RPC interface 'supervisor' initialized
2020-02-28 08:37:37,303 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-28 08:37:37,303 INFO supervisord started with pid 1
2020-02-28 08:37:38,306 INFO spawned: 'zookeeper' with pid 8
2020-02-28 08:37:38,308 INFO spawned: 'kafka' with pid 9
2020-02-28 08:37:39,372 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 08:37:39,372 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:16:01,095 WARN received SIGTERM indicating exit request
2020-02-28 21:16:01,095 INFO waiting for zookeeper, kafka to die
2020-02-28 21:16:02,102 INFO stopped: kafka (terminated by SIGTERM)
2020-02-28 21:16:02,442 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-28 21:17:50,843 CRIT Supervisor running as root (no user in config file)
2020-02-28 21:17:50,843 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-28 21:17:50,843 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-28 21:17:50,858 INFO RPC interface 'supervisor' initialized
2020-02-28 21:17:50,858 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-28 21:17:50,859 INFO supervisord started with pid 1
2020-02-28 21:17:51,862 INFO spawned: 'zookeeper' with pid 8
2020-02-28 21:17:51,864 INFO spawned: 'kafka' with pid 9
2020-02-28 21:17:52,926 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:17:52,927 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-28 21:17:59,672 INFO exited: kafka (exit status 1; not expected)
2020-02-28 21:18:00,675 INFO spawned: 'kafka' with pid 297
2020-02-28 21:18:01,694 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:18,487 WARN received SIGTERM indicating exit request
2020-02-29 19:42:18,487 INFO waiting for zookeeper, kafka to die
2020-02-29 19:42:18,488 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:42:18,821 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-29 19:42:26,841 CRIT Supervisor running as root (no user in config file)
2020-02-29 19:42:26,841 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-29 19:42:26,842 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-29 19:42:26,854 INFO RPC interface 'supervisor' initialized
2020-02-29 19:42:26,854 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-29 19:42:26,855 INFO supervisord started with pid 1
2020-02-29 19:42:27,857 INFO spawned: 'zookeeper' with pid 8
2020-02-29 19:42:27,859 INFO spawned: 'kafka' with pid 9
2020-02-29 19:42:28,903 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:28,903 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:42:34,985 INFO exited: kafka (exit status 1; not expected)
2020-02-29 19:42:35,988 INFO spawned: 'kafka' with pid 297
2020-02-29 19:42:37,014 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:43:20,590 WARN received SIGTERM indicating exit request
2020-02-29 19:43:20,590 INFO waiting for zookeeper, kafka to die
2020-02-29 19:43:20,590 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:43:20,784 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-02-29 19:45:38,600 CRIT Supervisor running as root (no user in config file)
2020-02-29 19:45:38,600 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-02-29 19:45:38,600 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-02-29 19:45:38,619 INFO RPC interface 'supervisor' initialized
2020-02-29 19:45:38,629 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-02-29 19:45:38,630 INFO supervisord started with pid 1
2020-02-29 19:45:39,632 INFO spawned: 'zookeeper' with pid 8
2020-02-29 19:45:39,634 INFO spawned: 'kafka' with pid 9
2020-02-29 19:45:40,687 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:45:40,689 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:45:47,740 INFO exited: kafka (exit status 1; not expected)
2020-02-29 19:45:48,743 INFO spawned: 'kafka' with pid 297
2020-02-29 19:45:49,763 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-29 19:46:20,659 WARN received SIGTERM indicating exit request
2020-02-29 19:46:20,659 INFO waiting for zookeeper, kafka to die
2020-02-29 19:46:20,660 INFO stopped: kafka (terminated by SIGTERM)
2020-02-29 19:46:20,991 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-03-13 22:16:26,128 CRIT Supervisor running as root (no user in config file)
2020-03-13 22:16:26,128 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-03-13 22:16:26,128 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-03-13 22:16:26,157 INFO RPC interface 'supervisor' initialized
2020-03-13 22:16:26,162 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-03-13 22:16:26,162 INFO supervisord started with pid 1
2020-03-13 22:16:27,164 INFO spawned: 'zookeeper' with pid 8
2020-03-13 22:16:27,167 INFO spawned: 'kafka' with pid 9
2020-03-13 22:16:28,226 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:16:28,227 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:16:36,496 INFO exited: kafka (exit status 1; not expected)
2020-03-13 22:16:37,499 INFO spawned: 'kafka' with pid 298
2020-03-13 22:16:38,511 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-13 22:17:20,939 WARN received SIGTERM indicating exit request
2020-03-13 22:17:20,940 INFO waiting for zookeeper, kafka to die
2020-03-13 22:17:20,940 INFO stopped: kafka (terminated by SIGTERM)
2020-03-13 22:17:21,268 INFO stopped: zookeeper (exit status 143)
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2020-03-27 21:25:59,495 CRIT Supervisor running as root (no user in config file)
2020-03-27 21:25:59,496 WARN Included extra file "/etc/supervisor/conf.d/zookeeper.conf" during parsing
2020-03-27 21:25:59,497 WARN Included extra file "/etc/supervisor/conf.d/kafka.conf" during parsing
2020-03-27 21:25:59,520 INFO RPC interface 'supervisor' initialized
2020-03-27 21:25:59,522 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2020-03-27 21:25:59,523 INFO supervisord started with pid 1
2020-03-27 21:26:00,530 INFO spawned: 'zookeeper' with pid 8
2020-03-27 21:26:00,532 INFO spawned: 'kafka' with pid 9
2020-03-27 21:26:01,620 INFO success: zookeeper entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-03-27 21:26:01,620 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The docker logs for the kafka-manager container seem fine:
[info] o.a.z.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[info] o.a.z.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[info] o.a.z.ZooKeeper - Client environment:java.compiler=<NA>
[info] o.a.z.ZooKeeper - Client environment:os.name=Linux
[info] o.a.z.ZooKeeper - Client environment:os.arch=amd64
[info] o.a.z.ZooKeeper - Client environment:os.version=4.9.93-boot2docker
[info] o.a.z.ZooKeeper - Client environment:user.name=root
[info] o.a.z.ZooKeeper - Client environment:user.home=/root
[info] o.a.z.ZooKeeper - Client environment:user.dir=/kafka-manager-1.3.3.4
[info] o.a.z.ZooKeeper - Initiating client connection, connectString=kafkaserver:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7a27a9b4
[info] o.a.z.ClientCnxn - Opening socket connection to server kafka.kafka_kafkanet/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
[info] k.m.a.KafkaManagerActor - zk=kafkaserver:2181
[info] k.m.a.KafkaManagerActor - baseZkPath=/kafka-manager
[info] o.a.z.ClientCnxn - Socket connection established to kafka.kafka_kafkanet/172.18.0.2:2181, initiating session
[info] o.a.z.ClientCnxn - Session establishment complete on server kafka.kafka_kafkanet/172.18.0.2:2181, sessionid = 0x1711de33be70001, negotiated timeout = 40000
[info] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
[info] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
[info] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
[info] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
[info] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
[info] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
[info] play.api.Play - Application started (Prod)
[info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
This log is a lot longer, so I've omitted the beginning, but it seems fine.
Yes, there's a hypervisor, not a full VM. You can open the Hyper-V Manager to look at it.
Your compose file needs a port forward:
ports:
- '9000:9000'
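A quick way to confirm that the port is actually published on the host (the exact container name will differ; docker ps shows it):
docker ps
docker port <kafka-manager-container-name> 9000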
If you are using Docker Toolbox on Windows, you can try to access kafka-manager at this address: http://192.168.99.100:9000
Note: 192.168.99.100 is the default IP address of the VM that Docker runs on.
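If you do have Docker Toolbox, you can confirm the VM's address instead of assuming the default ("default" here is the usual machine name created by Toolbox):
docker-machine ip default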
The docker-compose.yaml given in the tutorial is totally fine. Can you do docker-compose down and then bring it back up with docker-compose up?
Then try to browse http://localhost:9000 and you should be able to see it.
Possible errors:
Port forwarding (already done in the docker-compose file)
Instead of HTTP, you are opening HTTPS in the browser.

HDFS NFS startup error: “ERROR mount.MountdBase: Failed to start the TCP server...ChannelException: Failed to bind..."

I am attempting to use / start up HDFS NFS following the docs (ignoring the instructions to stop the rpcbind service, and not starting the hadoop portmap service, given that the OS is not SLES 11 or RHEL 6.2), but I am running into an error when starting the hdfs nfs3 service:
[root@HW02 ~]#
[root@HW02 ~]#
[root@HW02 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
[root@HW02 ~]#
[root@HW02 ~]#
[root@HW02 ~]# service nfs status
Redirecting to /bin/systemctl status nfs.service
Unit nfs.service could not be found.
[root@HW02 ~]#
[root@HW02 ~]#
[root@HW02 ~]# service nfs stop
Redirecting to /bin/systemctl stop nfs.service
Failed to stop nfs.service: Unit nfs.service not loaded.
[root@HW02 ~]#
[root@HW02 ~]#
[root@HW02 ~]# service rpcbind status
Redirecting to /bin/systemctl status rpcbind.service
● rpcbind.service - RPC bind service
Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2019-07-23 13:48:54 HST; 28s ago
Process: 27337 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
Main PID: 27338 (rpcbind)
CGroup: /system.slice/rpcbind.service
└─27338 /sbin/rpcbind -w
Jul 23 13:48:54 HW02.ucera.local systemd[1]: Starting RPC bind service...
Jul 23 13:48:54 HW02.ucera.local systemd[1]: Started RPC bind service.
[root@HW02 ~]#
[root@HW02 ~]#
[root@HW02 ~]# hdfs nfs3
19/07/23 13:49:33 INFO nfs3.Nfs3Base: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting Nfs3
STARTUP_MSG: host = HW02.ucera.local/172.18.4.47
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.1.1.3.1.0.0-78
STARTUP_MSG: classpath = /usr/hdp/3.1.0.0-78/hadoop/conf:/usr/hdp/3.1.0.0-78/hadoop/lib/jersey-server-1.19.jar:/usr/hdp/3.1.0.0-78/hadoop/lib/ranger-hdfs-plugin-shim-1.2.0.3.1.0.0-78.jar:
...
<a bunch of other jars>
...
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r e4f82af51faec922b4804d0232a637422ec29e64; compiled by 'jenkins' on 2018-12-06T12:26Z
STARTUP_MSG: java = 1.8.0_112
************************************************************/
19/07/23 13:49:33 INFO nfs3.Nfs3Base: registered UNIX signal handlers for [TERM, HUP, INT]
19/07/23 13:49:33 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
19/07/23 13:49:33 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
19/07/23 13:49:33 INFO impl.MetricsSystemImpl: Nfs3 metrics system started
19/07/23 13:49:33 INFO oncrpc.RpcProgram: Will accept client connections from unprivileged ports
19/07/23 13:49:33 INFO security.ShellBasedIdMapping: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
19/07/23 13:49:33 INFO nfs3.WriteManager: Stream timeout is 600000ms.
19/07/23 13:49:33 INFO nfs3.WriteManager: Maximum open streams is 256
19/07/23 13:49:33 INFO nfs3.OpenFileCtxCache: Maximum open streams is 256
19/07/23 13:49:34 INFO nfs3.DFSClientCache: Added export: / FileSystem URI: / with namenodeId: -1408097406
19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Configured HDFS superuser is
19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Delete current dump directory /tmp/.hdfs-nfs
19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Create new dump directory /tmp/.hdfs-nfs
19/07/23 13:49:34 INFO nfs3.Nfs3Base: NFS server port set to: 2049
19/07/23 13:49:34 INFO oncrpc.RpcProgram: Will accept client connections from unprivileged ports
19/07/23 13:49:34 INFO mount.RpcProgramMountd: FS:hdfs adding export Path:/ with URI: hdfs://hw01.ucera.local:8020/
19/07/23 13:49:34 INFO oncrpc.SimpleUdpServer: Started listening to UDP requests at port 4242 for Rpc program: mountd at localhost:4242 with workerCount 1
19/07/23 13:49:34 ERROR mount.MountdBase: Failed to start the TCP server.
org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:89)
at org.apache.hadoop.mount.MountdBase.startTCPServer(MountdBase.java:83)
at org.apache.hadoop.mount.MountdBase.start(MountdBase.java:98)
at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startServiceInternal(Nfs3.java:56)
at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startService(Nfs3.java:69)
at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:79)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
...
...
19/07/23 13:49:34 INFO util.ExitUtil: Exiting with status 1: org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242
19/07/23 13:49:34 INFO nfs3.Nfs3Base: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down Nfs3 at HW02.ucera.local/172.18.4.47
************************************************************/
Not sure how to interpret any of the errors seen here (and I have not installed any packages like nfs-utils, assuming that Ambari would have installed all the needed packages when the cluster was initially installed).
Any debugging suggestions or solutions for what to do about this?
** UPDATE:
After looking at the error, I can see
Caused by: java.net.BindException: Address already in use
and looking into what is already using it, we see...
[root@HW02 ~]# netstat -ltnp | grep 4242
tcp 0 0 0.0.0.0:4242 0.0.0.0:* LISTEN 98067/jsvc.exec
The process jsvc.exec appears to be related to running Java applications. Given that Hadoop runs on Java, I assume it would be bad to just kill the process. Is it not supposed to be on this port (since it interferes with the NFS gateway)? Not sure what to do about this.
TL;DR: the NFS gateway service was already running (by default, apparently), and the process that I thought was blocking the hadoop nfs3 service from starting (jsvc.exec) was, I'm assuming, part of that already-running service.
What made me suspect this was that the process also stopped when I shut down the cluster, plus the fact that it was using the port I needed for NFS. The way I confirmed it was simply by following the verification steps in the docs and seeing that my output was similar to what should be expected.
[root@HW02 ~]# rpcinfo -p hw02
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100005 1 udp 4242 mountd
100005 2 udp 4242 mountd
100005 3 udp 4242 mountd
100005 1 tcp 4242 mountd
100005 2 tcp 4242 mountd
100005 3 tcp 4242 mountd
100003 3 tcp 2049 nfs
[root@HW02 ~]# showmount -e hw02
Export list for hw02:
/ *
Another thing that could have told me the jsvc process was part of an already running HDFS NFS service would have been checking the process info...
[root@HW02 ~]# ps -feww | grep jsvc
root 61106 59083 0 14:27 pts/2 00:00:00 grep --color=auto jsvc
root 163179 1 0 12:14 ? 00:00:00 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/hadoop-hdfs-root-nfs3-HW02.ucera.local.out -errfile /var/log/hadoop/root/privileged-root-nfs3-HW02.ucera.local.err -pidfile /var/run/hadoop/root/hadoop-hdfs-root-nfs3.pid -nodetach -user hdfs -cp /usr/hdp/3.1.0.0-78/hadoop/conf:...
...
hdfs 163193 163179 0 12:14 ? 00:00:17 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/hadoop-hdfs-root-nfs3-HW02.ucera.local.out -errfile /var/log/hadoop/root/privileged-root-nfs3-HW02.ucera.local.err -pidfile /var/run/hadoop/root/hadoop-hdfs-root-nfs3.pid -nodetach -user hdfs -cp /usr/hdp/3.1.0.0-78/hadoop/conf:...
and seeing jsvc.exec -Dproc_nfs3 ... gives the hint that jsvc (which apparently is for running Java apps on Linux) was being used to run the very nfs3 service I was trying to start.
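For future reference, both checks can be chained in one step; this is just a convenience sketch using the port from the netstat output above:
# find the PID listening on the mountd port and show its full command line
pid=$(netstat -ltnp | awk '/:4242 .*LISTEN/ {split($NF, a, "/"); print a[1]}')
ps -fww -p "$pid"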
And for anyone else with this problem, note that I did not stop all the services that the docs want you to stop (since I am using CentOS 7):
[root@HW01 /]# service nfs status
Redirecting to /bin/systemctl status nfs.service
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
Active: inactive (dead)
[root@HW01 /]# service rpcbind status
Redirecting to /bin/systemctl status rpcbind.service
● rpcbind.service - RPC bind service
Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2019-07-19 15:17:02 HST; 6 days ago
Main PID: 2155 (rpcbind)
CGroup: /system.slice/rpcbind.service
└─2155 /sbin/rpcbind -w
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
Also note that I did not follow any of the config file settings recommended in the docs, and some of the properties the docs mention could not even be found in the Ambari-managed HDFS configs (so if anyone can explain why this is still working for me despite that, please do).
** Update:
After talking with some people more experienced with HDP (v3.1) than me, the docs I linked to for setting up NFS for HDFS may not be totally up to date (when setting up NFS via Ambari management, in any case)...
You can have a cluster node act as an NFS gateway by checking it off as an NFS node in the Ambari host management UI.
The needed configs can be set in the HDFS management UI.
You can confirm that the HDFS NFS gateway is running by looking at the Host > Summary > Components section in Ambari.

Sonar server does not start up

This is my first time using SonarQube. After integrating it with Eclipse, I start the Sonar server, but I am getting the below error.
Error:
wrapper | --> Wrapper Started as Console
wrapper | Launching a JVM...
jvm 1 | WrapperManager class initialized by thread: main Using classloader: sun.misc.Launcher$AppClassLoader@70dea4e
jvm 1 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
jvm 1 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
jvm 1 |
jvm 1 | Wrapper Manager: JVM #1
jvm 1 | Running a 64-bit JVM.
jvm 1 | Wrapper Manager: Registering shutdown hook
jvm 1 | Wrapper Manager: Using wrapper
jvm 1 | Load native library. One or more attempts may fail if platform specific libraries do not exist.
jvm 1 | Loading native library failed: wrapper-windows-x86-64.dll Cause: java.lang.UnsatisfiedLinkError: no wrapper-windows-x86-64 in java.library.path
jvm 1 | Loaded native library: wrapper.dll
jvm 1 | Calling native initialization method.
jvm 1 | Initializing WrapperManager native library.
jvm 1 | Java Executable: C:\ProgramData\Oracle\Java\javapath\java.exe
jvm 1 | Windows version: 6.1.7601
jvm 1 | Java Version : 1.8.0_60-b27 Java HotSpot(TM) 64-Bit Server VM
jvm 1 | Java VM Vendor : Oracle Corporation
jvm 1 |
jvm 1 | Startup runner thread started.
jvm 1 | Control event monitor thread started.
jvm 1 | WrapperManager.start(org.tanukisoftware.wrapper.WrapperSimpleApp@3f99bd52, args[]) called by thread: main
jvm 1 | Communications runner thread started.
jvm 1 | Open socket to wrapper...Wrapper-Connection
jvm 1 | Opened Socket from 31000 to 32001
jvm 1 | Send a packet KEY : 8vB3fcnsLty4gNaO
jvm 1 | handleSocket(Socket[addr=/127.0.0.1,port=32001,localport=31000])
jvm 1 | Received a packet LOW_LOG_LEVEL : 1
jvm 1 | Wrapper Manager: LowLogLevel from Wrapper is 1
jvm 1 | Received a packet PING_TIMEOUT : 0
jvm 1 | PingTimeout from Wrapper is 0
jvm 1 | Received a packet PROPERTIES : (Property Values)
jvm 1 | Received a packet START : start
jvm 1 | calling WrapperListener.start()
jvm 1 | Waiting for WrapperListener.start runner thread to complete.
jvm 1 | WrapperListener.start runner thread started.
jvm 1 | WrapperSimpleApp: start(args) Will wait up to 2 seconds for the main method to complete.
jvm 1 | WrapperSimpleApp: invoking main method
jvm 1 | 2016.03.02 11:06:16 INFO app[o.s.p.m.JavaProcessLauncher] Launch process[search]: C:\Program Files\Java\jre1.8.0_60\bin\java -Djava.awt.headless=true -Xmx1G -Xms256m -Xss256k -Djava.net.preferIPv4Stack=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=C:\Users\sri.lakshmi.kovvuri\projectworkspace\SonarQube\sonarqube-5.3\temp -cp ./lib/common/*;./lib/search/* org.sonar.search.SearchServer C:\Users\SRILAK~1.KOV\AppData\Local\Temp\sq-process1566850086740273426properties
jvm 1 | Send a packet START_PENDING : 5000
jvm 1 | Send a packet START_PENDING : 5000
jvm 1 | WrapperSimpleApp: start(args) end. Main Completed=false, exitCode=null
jvm 1 | WrapperListener.start runner thread stopped.
jvm 1 | returned from WrapperListener.start()
jvm 1 | Send a packet STARTED :
jvm 1 | Startup runner thread stopped.
jvm 1 | Received a packet PING : ping
jvm 1 | Send a packet PING : ok
jvm 1 | Wrapper Manager: ShutdownHook started
jvm 1 | WrapperManager.stop(0) called by thread: Wrapper-Shutdown-Hook
jvm 1 | Send a packet STOP : 0
jvm 1 | Received a packet STOP :
jvm 1 | Thread, Wrapper-Shutdown-Hook, handling the shutdown process.
jvm 1 | calling listener.stop()
jvm 1 | WrapperSimpleApp: stop(0)
jvm 1 | returned from listener.stop() -> 0
jvm 1 | shutdownJVM(0) Thread:Wrapper-Shutdown-Hook
jvm 1 | Send a packet STOPPED : 0
jvm 1 | Closing socket.
jvm 1 | Wrapper Manager: ShutdownHook complete
jvm 1 | Server daemon shut down
wrapper | <-- Wrapper Stopped
and in the log file the below error appears:
--> Wrapper Started as Console
Using tick timer.
server listening on port 32001.
Launching a JVM...
command: "java" -Djava.awt.headless=true -Xms3m -Xmx3m -Djava.library.path="./lib" -classpath "../../lib/jsw/wrapper-3.2.3.jar;../../lib/sonar-application-5.3.jar" -Dwrapper.key="n_2QZgO7FbHOC2G_" -Dwrapper.port=32001 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.debug="TRUE" -Dwrapper.pid=4876 -Dwrapper.version="3.2.3" -Dwrapper.native_library="wrapper" -Dwrapper.cpu.timeout="10" -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp org.sonar.application.App
JVM started (PID=7932)
WrapperManager class initialized by thread: main Using classloader: sun.misc.Launcher$AppClassLoader@70dea4e
Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
Wrapper Manager: JVM #1
Running a 64-bit JVM.
Wrapper Manager: Registering shutdown hook
Wrapper Manager: Using wrapper
Load native library. One or more attempts may fail if platform specific libraries do not exist.
Loading native library failed: wrapper-windows-x86-64.dll Cause: java.lang.UnsatisfiedLinkError: no wrapper-windows-x86-64 in java.library.path
Loaded native library: wrapper.dll
Calling native initialization method.
Initializing WrapperManager native library.
Java Executable: C:\ProgramData\Oracle\Java\javapath\java.exe
Windows version: 6.1.7601
Java Version : 1.8.0_60-b27 Java HotSpot(TM) 64-Bit Server VM
Java VM Vendor : Oracle Corporation
Startup runner thread started.
Control event monitor thread started.
WrapperManager.start(org.tanukisoftware.wrapper.WrapperSimpleApp@3f99bd52, args[]) called by thread: main
Communications runner thread started.
Open socket to wrapper...Wrapper-Connection
Opened Socket from 31000 to 32001
Send a packet KEY : n_2QZgO7FbHOC2G_
handleSocket(Socket[addr=/127.0.0.1,port=32001,localport=31000])
accepted a socket from 127.0.0.1 on port 31000
read a packet KEY : n_2QZgO7FbHOC2G_
Got key from JVM: n_2QZgO7FbHOC2G_
send a packet LOW_LOG_LEVEL : 1
send a packet PING_TIMEOUT : 0
send a packet PROPERTIES : (Property Values)
Start Application.
send a packet START : start
Received a packet LOW_LOG_LEVEL : 1
Wrapper Manager: LowLogLevel from Wrapper is 1
Received a packet PING_TIMEOUT : 0
PingTimeout from Wrapper is 0
Received a packet PROPERTIES : (Property Values)
Received a packet START : start
calling WrapperListener.start()
Waiting for WrapperListener.start runner thread to complete.
WrapperListener.start runner thread started.
WrapperSimpleApp: start(args) Will wait up to 2 seconds for the main method to complete.
WrapperSimpleApp: invoking main method
2016.03.02 14:39:07 INFO app[o.s.p.m.JavaProcessLauncher] Launch process[search]: C:\Program Files\Java\jre1.8.0_60\bin\java -Djava.awt.headless=true -Xmx1G -Xms256m -Xss256k -Djava.net.preferIPv4Stack=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=C:\Users\sri.lakshmi.kovvuri\projectworkspace\SonarQube\sonarqube-5.3\temp -cp ./lib/common/*;./lib/search/* org.sonar.search.SearchServer C:\Users\SRILAK~1.KOV\AppData\Local\Temp\sq-process637648259402434916properties
Send a packet START_PENDING : 5000
read a packet START_PENDING : 5000
JVM signalled a start pending with waitHint of 5000 millis.
2016.03.02 14:39:07 INFO es[o.s.p.ProcessEntryPoint] Starting search
2016.03.02 14:39:07 INFO es[o.s.s.SearchSettings] Elasticsearch listening on 127.0.0.1:9001
2016.03.02 14:39:08 INFO es[o.elasticsearch.node] [sonar-1456909746449] version[1.7.2], pid[6784], build[e43676b/2015-09-14T09:49:53Z]
2016.03.02 14:39:08 INFO es[o.elasticsearch.node] [sonar-1456909746449] initializing ...
2016.03.02 14:39:08 INFO es[o.e.plugins] [sonar-1456909746449] loaded [], sites []
2016.03.02 14:39:08 INFO es[o.elasticsearch.env] [sonar-1456909746449] using [1] data paths, mounts [[OSDisk (C:)]], net usable_space [32.7gb], net total_space [148.5gb], types [NTFS]
Send a packet START_PENDING : 5000
WrapperSimpleApp: start(args) end. Main Completed=false, exitCode=null
WrapperListener.start runner thread stopped.
returned from WrapperListener.start()
Send a packet STARTED :
Startup runner thread stopped.
read a packet START_PENDING : 5000
JVM signalled a start pending with waitHint of 5000 millis.
read a packet STARTED :
JVM signalled that it was started.
send a packet PING : ping
Received a packet PING : ping
Send a packet PING : ok
read a packet PING : ok
Got ping response from JVM
2016.03.02 14:39:12 WARN es[o.e.bootstrap] JNA not found. native methods will be disabled.
2016.03.02 14:39:13 INFO es[o.elasticsearch.node] [sonar-1456909746449] initialized
2016.03.02 14:39:13 INFO es[o.elasticsearch.node] [sonar-1456909746449] starting ...
2016.03.02 14:39:14 WARN es[o.s.p.ProcessEntryPoint] Fail to start search
org.elasticsearch.transport.BindTransportException: Failed to bind to [9001]
at org.elasticsearch.transport.netty.NettyTransport.bindServerBootstrap(NettyTransport.java:422) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.transport.netty.NettyTransport.doStart(NettyTransport.java:283) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.transport.TransportService.doStart(TransportService.java:153) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.node.internal.InternalNode.start(InternalNode.java:257) ~[elasticsearch-1.7.2.jar:na]
at org.sonar.search.SearchServer.start(SearchServer.java:45) [sonar-search-5.3.jar:na]
at org.sonar.process.ProcessEntryPoint.launch(ProcessEntryPoint.java:77) ~[sonar-process-5.3.jar:na]
at org.sonar.search.SearchServer.main(SearchServer.java:79) [sonar-search-5.3.jar:na]
Caused by: org.elasticsearch.common.netty.channel.ChannelException: Failed to bind to: /127.0.0.1:9001
at org.elasticsearch.common.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.transport.netty.NettyTransport$1.onPortNumber(NettyTransport.java:413) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.transport.PortsRange.iterate(PortsRange.java:58) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.transport.netty.NettyTransport.bindServerBootstrap(NettyTransport.java:409) ~[elasticsearch-1.7.2.jar:na]
... 8 common frames omitted
Caused by: java.net.BindException: Address already in use: bind
at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_60]
at sun.nio.ch.Net.bind(Unknown Source) ~[na:1.8.0_60]
at sun.nio.ch.Net.bind(Unknown Source) ~[na:1.8.0_60]
at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source) ~[na:1.8.0_60]
at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source) ~[na:1.8.0_60]
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[elasticsearch-1.7.2.jar:na]
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[elasticsearch-1.7.2.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:1.8.0_60]
at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_60]
2016.03.02 14:39:14 INFO es[o.elasticsearch.node] [sonar-1456909746449] stopping ...
2016.03.02 14:39:14 INFO es[o.elasticsearch.node] [sonar-1456909746449] stopped
2016.03.02 14:39:14 INFO es[o.elasticsearch.node] [sonar-1456909746449] closing ...
2016.03.02 14:39:14 INFO es[o.elasticsearch.node] [sonar-1456909746449] closed
send a packet PING : ping
Received a packet PING : ping
Send a packet PING : ok
read a packet PING : ok
Got ping response from JVM
Wrapper Manager: ShutdownHook started
WrapperManager.stop(0) called by thread: Wrapper-Shutdown-Hook
Send a packet STOP : 0
read a packet STOP : 0
JVM requested a shutdown. (0)
wrapperStopProcess(0) called.
Sending stop signal to JVM
send a packet STOP : NULL
Received a packet STOP :
Thread, Wrapper-Shutdown-Hook, handling the shutdown process.
calling listener.stop()
WrapperSimpleApp: stop(0)
returned from listener.stop() -> 0
shutdownJVM(0) Thread:Wrapper-Shutdown-Hook
Send a packet STOPPED : 0
read a packet STOPPED : 0
JVM signalled that it was stopped.
Closing socket. socket read no code (closed?).
server listening on port 32002.
Wrapper Manager: ShutdownHook complete
Server daemon shut down
JVM process exited with a code of 0, leaving the wrapper exit code set to 0.
JVM exited normally.
<-- Wrapper Stopped
Sometimes SonarQube doesn't launch after several start/shutdown cycles and reports that it couldn't bind to the port. For me, going to the Task Manager and closing all the Java processes solves the problem.
Somehow, after closing the SonarQube console, something from that process keeps running in the background and blocks the ports SonarQube needs.
First try to find out which process is holding that specific port and stop it, since it has not stopped gracefully:
netstat -a -o -n
taskkill /F /PID <pid>  # cmd commands to find and kill the process
If you are not sure, go to
conf/sonar.properties
and change sonar.search.port.
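On Windows the same idea can be narrowed to the port from the log above (9001 is the Elasticsearch port SonarQube tried to bind; replace <pid> with the PID that netstat reports):
netstat -ano | findstr ":9001"
taskkill /F /PID <pid>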

Docker stop exit code -1 if the default CMD is a shell script

I am building a Tomcat container in Docker with supervisord. If the default command in the Dockerfile is
CMD supervisord -c /etc/supervisord.conf
then when I run docker stop, the container exits successfully with exit code 0.
But if instead I have
CMD ["/run"]
and in run.sh,
supervisord -c /etc/supervisord.conf
then docker stop gives me an exit code of -1. Looking at the logs, it seems that supervisord did not receive the SIGTERM indicating the exit request.
2014-10-06 19:48:54,420 CRIT Supervisor running as root (no user in config file)
2014-10-06 19:48:54,450 INFO RPC interface 'supervisor' initialized
2014-10-06 19:48:54,451 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2014-10-06 19:48:54,451 INFO supervisord started with pid 6
2014-10-06 19:48:55,457 INFO spawned: 'tomcat' with pid 9
2014-10-06 19:48:56,503 INFO success: tomcat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
as opposed to the previous logs, where it receives a SIGTERM and exits gracefully:
2014-10-06 20:02:59,527 CRIT Supervisor running as root (no user in config file)
2014-10-06 20:02:59,556 INFO RPC interface 'supervisor' initialized
2014-10-06 20:02:59,556 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2014-10-06 20:02:59,557 INFO supervisord started with pid 1
2014-10-06 20:03:00,561 INFO spawned: 'tomcat' with pid 9
2014-10-06 20:03:01,602 INFO success: tomcat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2014-10-06 20:05:11,690 WARN received SIGTERM indicating exit request
2014-10-06 20:05:11,690 INFO waiting for tomcat to die
2014-10-06 20:05:12,450 INFO stopped: tomcat (exit status 143)
Any help appreciated.
Thanks,
Karthik
UPDATE:
supervisord.conf file
[supervisord]
nodaemon=true
logfile=/var/log/supervisor/supervisord.log
[program:mysql]
command=/usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe --pid-file=/var/run/mysqld/mysqld.pid
stdout_logfile=/tmp/mysql.log
stderr_logfile=/tmp/mysql_err.log
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock
[unix_http_server]
file=/tmp/supervisor.sock ; path to your socket file
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
When you run the process via run.sh, signals are only sent to that shell process. Unless you are going out of your way to get signals to the child process, e.g. with trap, by sending signals to the process group, or by doing exec supervisord ... in run.sh, the child process won't get the signals.
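A minimal run.sh along the lines of the exec option might look like this (just a sketch, assuming supervisord is on the PATH):
#!/bin/sh
# replace the shell with supervisord so it becomes the container's main process
# and receives the SIGTERM that docker stop sends
exec supervisord -c /etc/supervisord.conf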
