Can't start Cloudera Manager, site not reachable - cloudera-manager

I have a small cluster with three nodes on my home server for learning purposes.
It was working fine after it was initially set up.
I haven't used it for a month, and today when I tried to use it I found that the Cloudera Manager GUI cannot be accessed. I checked the network between the three nodes and it is fine; they can ping each other.
On the master node where CM is installed, I ran service cloudera-scm-server start and it showed [OK] in green; when I check the status, it shows the following:
[root@pocnnr1n1 ~]# service cloudera-scm-server status -l
● cloudera-scm-server.service - LSB: Cloudera SCM Server
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server; bad; vendor preset: disabled)
Active: active (exited) since Fri 2017-09-15 20:58:24 EDT; 18min ago
Docs: man:systemd-sysv-generator(8)
Process: 107428 ExecStop=/etc/rc.d/init.d/cloudera-scm-server stop (code=exited, status=1/FAILURE)
Process: 107467 ExecStart=/etc/rc.d/init.d/cloudera-scm-server start (code=exited, status=0/SUCCESS)
Sep 15 20:58:19 pocnnr1n1.raymond.com systemd[1]: Starting LSB: Cloudera SCM Server...
Sep 15 20:58:19 pocnnr1n1.raymond.com su[107494]: (to cloudera-scm) root on none
Sep 15 20:58:24 pocnnr1n1.raymond.com cloudera-scm-server[107467]: Starting cloudera-scm-server: [ OK ]
Sep 15 20:58:24 pocnnr1n1.raymond.com systemd[1]: Started LSB: Cloudera SCM Server.
So, is the Cloudera Manager service started or stopped?
When I try to access CM through the GUI, Chrome shows the following:
This site can’t be reached
192.168.211.251 refused to connect. Search Google for 192 168 211 251 7180 ERR_CONNECTION_REFUSED
Can anyone help me to fix it? Thank you very much.

This indicates that the Cloudera Manager startup ran into an error. What you should do is check your Cloudera Manager log file, which should be located in the /var/log/cloudera-scm-server directory. Since this is a POC cluster, I assume that when you set it up you did not use an external database like MySQL; instead, you probably used the embedded PostgreSQL database. If that's the case, please make sure the embedded database process is running before you start the Cloudera Manager Server. To check the status of the embedded DB, you can run
service cloudera-scm-server-db status
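If the embedded database turns out to be stopped, here is a minimal sketch of the checks described above (the exact log file name may differ on your installation):
# read the CM server log for the real startup error
sudo tail -n 100 /var/log/cloudera-scm-server/cloudera-scm-server.log
# start the embedded database first, then restart the server
sudo service cloudera-scm-server-db start
sudo service cloudera-scm-server restart
# confirm something is actually listening on the CM web port (7180)
sudo ss -lntp | grep 7180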

The error when I attempted to start MariaDB was caused by dead processes, possibly left over from a previous failed attempt. I killed those stale processes and restarted MariaDB successfully; after that, cloudera-scm-server started without a problem.
Thank you. I hope this helps later viewers.
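For later viewers, a hedged sketch of the clean-up described above (the process name pattern and the PID are illustrative, not taken from the original post):
# look for stale mysqld/mariadbd processes left over from the failed start
ps aux | grep -i maria
# terminate the stale process, using the PID found above
sudo kill <stale_pid>
sudo systemctl restart mariadb
sudo service cloudera-scm-server restart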

Related

Avoid waiting for user when checking the Apache Tomcat status

As part of a bash script I check the recently installed Apache Tomcat status with
sudo systemctl status tomcat
The output is as follows
● tomcat.service
Loaded: loaded (/etc/systemd/system/tomcat.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-01-30 16:25:48 UTC; 3min 9s ago
Process: 175439 ExecStart=/opt/tomcat/bin/startup.sh (code=exited, status=0/SUCCESS)
Main PID: 175447 (java)
Tasks: 30 (limit: 4546)
Memory: 253.0M
CPU: 9.485s
CGroup: /system.slice/tomcat.service
└─175447 /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -Djava.util.logging.config.file=/opt/tomcat/conf/logging.properties -Djava.uti>
Jan 30 16:25:48 vps-06354c04 systemd[1]: Starting tomcat.service...
Jan 30 16:25:48 vps-06354c04 startup.sh[175439]: Tomcat started.
Jan 30 16:25:48 vps-06354c04 systemd[1]: Started tomcat.service.
Jan 30 16:25:48 vps-06354c04 systemd[1]: /etc/systemd/system/tomcat.service:1: Assignment outside of section. Ignoring.
Jan 30 16:25:48 vps-06354c04 systemd[1]: /etc/systemd/system/tomcat.service:2: Assignment outside of section. Ignoring.
This is the info I expect to see, but after printing it, systemctl waits for the user to press a key, breaking the automation I'm trying to achieve.
How can I avoid this behaviour?
I'm pretty sure the --no-pager option would keep that from happening. I just confirmed that on my own system on a different service. Otherwise, it goes interactive.
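For example (a minimal sketch using the service from the question):
sudo systemctl status tomcat --no-pager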
I don't recall ever seeing systemctl status asking for input, so perhaps it's the sudo used in this command doing that, in which case you could ask your system administrator to enable passwordless sudo on the account that runs this command.
A general solution for automating user input in shell scripts is to use expect, but for a simple case where you only need to send a single value once, you can often get by with echo and a pipe (e.g., echo 'foo' | sudo systemctl status tomcat). Never do this to pass sensitive information such as passwords, though, because the value could be visible to other users on that system.

Unable to restart mysql server installed on Alibaba Cloud ECS?

I have installed MySQL server on an Alibaba Cloud ECS instance. I updated /etc/mysql/my.cnf with bind-address = 0.0.0.0, and after that I am unable to restart the MySQL service. Below is the error:
mysql.service - MySQL Community Server
Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
Active: activating (start-post) (Result: exit-code) since Tue 2018-12-25 16:56:32 IST; 7s ago
Process: 6905 ExecStart=/usr/sbin/mysqld (code=exited, status=1/FAILURE)
Process: 6896 ExecStartPre=/usr/share/mysql/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
Main PID: 6905 (code=exited, status=1/FAILURE); : 6906 (mysql-systemd-s)
CGroup: /system.slice/mysql.service
└─control
├─6906 /bin/bash /usr/share/mysql/mysql-systemd-start post
└─6925 sleep 1
Dec 25 16:56:32 iZa2dej95yv6tb65txtwfhZ systemd[1]: Starting MySQL Community Server...
Dec 25 16:56:32 iZa2dej95yv6tb65txtwfhZ mysql-systemd-start[6896]: my_print_defaults: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line
Dec 25 16:56:32 iZa2dej95yv6tb65txtwfhZ mysql-systemd-start[6896]: my_print_defaults: [ERROR] Fatal error in defaults handling. Program aborted!
Dec 25 16:56:32 iZa2dej95yv6tb65txtwfhZ systemd[1]: mysql.service: Main process exited, code=exited, status=1/FAILURE
It seems to be a syntax problem, or maybe an encoding problem: the log reports an option found without a preceding group, so please check that line in /etc/mysql/my.cnf.
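As a minimal sketch, assuming the option was added outside any group (which is exactly what the "Found option without preceding group" error complains about), the relevant part of /etc/mysql/my.cnf should look like this:
[mysqld]
bind-address = 0.0.0.0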
Setting bind-address to 0.0.0.0 tells MySQL to listen on all interfaces. It seems like you want to access MySQL remotely, if I am not wrong. If so, you can simply comment out bind-address and restart, and it will work fine.
Some packaged versions of MySQL have startup and maintenance scripts that attempt to access MySQL on the default address (127.0.0.1). That may be the cause, or you may have set another bind-address parameter unknowingly.
So it is better to comment it out and restart MySQL again.
The MySQL package layout may have changed; try editing /etc/mysql/mysql.conf.d/mysqld.cnf instead, and change or remove the bind-address entry in /etc/mysql/my.cnf.

custom systemd service can't start on Ubuntu 18.04

Thanks in advance for any assistance.
I run original QT wallets (command-line based) for various cryptocurrencies. Earlier this year, I set them up as a custom systemd service, and that has been invaluable. It starts them up and shuts them down with the system just like all the normal services. I recently discovered an issue with one in particular, blackcoin.
This service worked fine in the past (I don't know how long it was down before I found it).
If I run the ExecStart= command manually, everything works just fine. If I try to start the service (via systemctl start blackcoin), it fails with the following service status:
blackcoin.service - blackcoin wallet daemon
Loaded: loaded (/etc/systemd/system/blackcoin.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Tue 2018-11-20 10:44:01 MST; 2h 51min ago
Process: 12272 ExecStart=/usr/bin/blackcoind -datadir=/coindaemon-rundirectory/blackcoin/ -conf=/coindaemon-rundirectory/blackcoin/blackcoin.conf -daemon (code=exited, status=0/SUCCESS)
Main PID: 12283 (code=dumped, signal=ABRT)
Nov 20 10:44:01 knox systemd[1]: blackcoin.service: Service hold-off time over, scheduling restart.
Nov 20 10:44:01 knox systemd[1]: blackcoin.service: Scheduled restart job, restart counter is at 5.
Nov 20 10:44:01 knox systemd[1]: Stopped blackcoin wallet daemon.
Nov 20 10:44:01 knox systemd[1]: blackcoin.service: Start request repeated too quickly.
Nov 20 10:44:01 knox systemd[1]: blackcoin.service: Failed with result 'core-dump'.
Nov 20 10:44:01 knox systemd[1]: Failed to start blackcoin wallet daemon.
Here is the body of the systemd service:
##################################################################
## Blackcoin Systemd service ##
##################################################################
[Unit]
Description=blackcoin wallet daemon
After=network.target
[Service]
Type=forking
User=somedude
RuntimeDirectory=blackcoind
PIDFile=/run/blackcoind/blackcoind.pid
Restart=on-failure
ExecStart=/usr/bin/blackcoind \
-datadir=/home/somedude/blackcoin/ \
-conf=/home/somedude/blackcoin/blackcoin.conf \
-daemon
ExecStop=/usr/bin/blackcoind \
-datadir=/home/somedude/blackcoin/ \
-conf=/home/somedude/blackcoin/blackcoin.conf \
stop
# Recommended hardening
# Provide a private /tmp and /var/tmp.
PrivateTmp=true
# Mount /usr, /boot/ and /etc read-only for the process.
ProtectSystem=full
# Disallow the process and all of its children to gain
# new privileges through execve().
NoNewPrivileges=true
# Use a new /dev namespace only populated with API pseudo devices
# such as /dev/null, /dev/zero and /dev/random.
PrivateDevices=true
# Deny the creation of writable and executable memory mappings.
MemoryDenyWriteExecute=true
[Install]
WantedBy=multi-user.target
And this is what blackcoin.conf contains:
rpcuser=somedude
rpcpassword=12345 (please don't rob my coins!)
# Wallets
wallet=wallet-blackcoin.dat
pid=/run/blackcoind/blackcoind.pid
rpcport=56111
port=56112
I'm going to keep testing and will post anything new that I find. Thanks for looking!

start request repeated too quickly

I'm writing a bash script, but I often face this issue.
When I try to start or stop a service I often get:
start request repeated too quickly
How can I solve this problem?
It happens, for example, when I try to restart Docker or the OpenShift Origin master.
sudo service origin-master restart
● origin-master.service - Origin Master Service
Loaded: loaded (/usr/lib/systemd/system/origin-master.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Wed 2016-02-17 08:22:11 UTC; 44s ago
Docs: https://github.com/openshift/origin
Process: 2296 ExecStart=/usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
Main PID: 2296 (code=exited, status=255)
Feb 17 08:22:10 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: origin-master.service: main process exited, code=exited, status=255/n/a
Feb 17 08:22:10 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: Failed to start Origin Master Service.
Feb 17 08:22:10 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: Unit origin-master.service entered failed state.
Feb 17 08:22:10 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: origin-master.service failed.
Feb 17 08:22:11 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: origin-master.service holdoff time over, scheduling restart.
Feb 17 08:22:11 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: start request repeated too quickly for origin-master.service
Feb 17 08:22:11 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: Failed to start Origin Master Service.
Feb 17 08:22:11 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: Unit origin-master.service entered failed state.
Feb 17 08:22:11 ip-172-xx-xx-xx.eu-central-1.compute.internal systemd[1]: origin-master.service failed.
My script is just doing:
if [ "$1" = "-u" ]
then
    sudo service origin-master restart
fi
A manual restart works before I have executed the script, but after running it, the error keeps coming back.
This is a "feature" of systemctl. There is a parameter in the file that limits the restart frequency in seconds. Lower this while testing.
Edit the file
/etc/systemd/system/multi-user.target.wants/<your service here>
my example:
Restart=on-failure
StartLimitBurst=2
# Restart, but not more than once every 10 minutes
#StartLimitInterval=600
# Restart, but not more than once every 30s (for testing purposes)
StartLimitInterval=30
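After editing the unit, reload systemd and clear the start-limit state before retrying (a hedged addition, not part of the original answer):
sudo systemctl daemon-reload
sudo systemctl reset-failed origin-master.service
sudo systemctl restart origin-master.service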
I suggest you familiarize yourself with systemd; that's what you're using under the hood when you run service. As @chepner says, the service itself is failing (as you can see from the second line of the log), and it is being restarted too quickly, which triggers the error.
Try running journalctl -u origin-master.service to figure out why the error is happening.
Also, systemctl cat origin-master.service will show you the service unit file that describes your service; there might be errors in it.
I had this problem on Ubuntu 20.04, and adding execute permission to the ExecStart file solved it:
sudo chmod +x /path/to/execfile
I faced the same issue and solved it like this:
If the /var/log/mysql folder does not exist, create it:
sudo mkdir /var/log/mysql
and then give this folder the correct ownership:
sudo chown -R mysql:mysql /var/log/mysql
sudo systemctl stop mysql
sudo systemctl start mysql
In my case, there was a typing mistake in /etc/systemd/system/multi-user.target.wants/<your service here>, so if you are still facing the same error after tweaking the parameters above, don't forget to check that file.
In my case, my /etc/docker/daemon.json file had a format error; once I fixed the JSON and ran systemctl start docker, the server started successfully.
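A hedged way to catch such a formatting problem before restarting (not from the original answer) is to validate the JSON first:
python3 -m json.tool /etc/docker/daemon.json && sudo systemctl start docker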
Please try running the command:
td-agent --dry-run
This will give you the root cause.

Orphaned process when Pacemaker kills the main monitor script (LSB) due to timeout

In our Pacemaker + Corosync cluster:
Last updated: Thu Oct 22 21:16:33 2015
Last change: Thu Oct 22 17:25:13 2015 via cibadmin on aws015
Stack: corosync
Current DC: aws015 (2887647247) - partition with quorum
Version: 1.1.10-42f2063
4 Nodes configured
16 Resources configured
We have the following situation: we wrote a Python LSB script that checks the status of an application, and configured it as a resource:
primitive pm2_app_gardenscapesDynamo_lsb lsb:pm2_app_gardenscapesDynamo \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s" \
op monitor interval="30s" timeout="60s" on-fail="restart" \
meta failure-timeout="10s" migration-threshold="1"
The check is performed by a utility that can hang (the LSB script launches that utility and waits for its reply). So when Pacemaker reaches the timeout, it kills our Python script, but the hung utility stays in memory and does not die.
Is it possible to prevent this situation?
You need to upgrade to pacemaker 1.1.12 or more recent.
This happens because pacemaker starts resource agents in their own process group. When an operation times out, pacemaker (1.1.10) kills the RA only, leaving any child processes it may have started orphaned.
Version 1.1.12 instead kills the entire process group.
The relevant code is in lib/common/mainloop.c, in the function child_kill_helper.
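To illustrate the difference (a hedged sketch; the IDs are placeholders, not from the original answer):
# 1.1.10 behaviour: signal only the resource agent, so its hung child survives
kill -TERM 12345
# 1.1.12 behaviour: a negative ID signals the whole process group, children included
kill -TERM -- -12345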
