Ambari won't restart: DB check failed - hortonworks-data-platform

When I restarted my cluster, Ambari didn't start because the database consistency check failed:
sudo service ambari-server restart --skip-database-check
Using python /usr/bin/python
Restarting ambari-server
Waiting for server stop...
Ambari Server stopped
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari Server is starting with the database consistency check skipped. Do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See "/var/log/ambari-server/ambari-server-check-database.log" for more details on the consistency issues.
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.....................
DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.
ERROR: Exiting with exit code -1.
REASON: Ambari Server java process has stopped. Please check the logs for more information.
I looked in the log at "/var/log/ambari-server/ambari-server-check-database.log", and I saw:
2017-08-23 08:16:13,445 INFO - Checking Topology tables
2017-08-23 08:16:13,447 ERROR - Your topology request hierarchy is not complete for each row in topology_request should exist at least one raw in topology_logical_request, topology_host_request, topology_host_task, topology_logical_task.
I tried both the --auto-fix-database and --skip-database-check options, but neither worked.
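For reference, the tables that the check complains about can be inspected directly in Ambari's PostgreSQL database. A minimal sketch, assuming the default setup where both the database and the schema are named ambari, and that topology_logical_request references topology_request via a request_id column (check your schema before relying on this):
sudo -u postgres psql ambari -c "
  SELECT tr.id
  FROM ambari.topology_request tr
  LEFT JOIN ambari.topology_logical_request tlr ON tlr.request_id = tr.id
  WHERE tlr.id IS NULL;"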

It seems that PostgreSQL hadn't started correctly: even though the Ambari log never mentioned PostgreSQL being down or unavailable, it was strange that Ambari couldn't access the topology configuration stored in it.
sudo service postgresql restart
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]
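If you want to confirm that PostgreSQL really is up and accepting connections before touching Ambari, a quick check could look like this (ambari is the default database name; adjust if yours differs):
sudo service postgresql status
sudo -u postgres psql -l | grep ambari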
Restarting PostgreSQL did the trick:
sudo service ambari-server restart
Using python /usr/bin/python
Restarting ambari-server
Ambari Server is not running
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.........
Server started listening on 8080

Related

Hue shows a "Resource Manager not available" error but it is running fine

When I run the quickstart, I get the error message below:
Potential misconfiguration detected. Fix and restart Hue.
Resource Manager : Failed to contact an active Resource Manager: YARN RM returned a failed response: HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?user=hue (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))
Hive : Failed to access Hive warehouse: /user/hive/warehouse
HBase Browser : The application won't work without a running HBase Thrift Server v1.
Impala : No available Impalad to send queries to.
Oozie Editor/Dashboard : The app won't work without a running Oozie server
Pig Editor : The app won't work without a running Oozie server
Spark : The app won't work without a running Livy Spark Server
I don't know why Hue reports an error for the Resource Manager.
I haven't installed anything else yet.
My Resource Manager is running, and the API itself works fine: http://RMHOST:8088/ws/v1/cluster/apps?user=hue
The response is:
{
"apps": null
}
Is there any problem I missed?
I changed localhost to my IP address (e.g. 192.168.x.x) in resourcemanager_host, resourcemanager_api_url, and proxy_api_url.
I don't know why, but it works.
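For anyone looking for the exact spot: those properties live in hue.ini under the YARN cluster section, so after the change it should look roughly like this (192.168.x.x stands for your real Resource Manager address):
[hadoop]
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=192.168.x.x
      resourcemanager_api_url=http://192.168.x.x:8088
      proxy_api_url=http://192.168.x.x:8088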

Hive : The application won't work without a running HiveServer2

I am new to this field. I was trying the CDH 5.8 quickstart VM to run some basic Hive/Impala examples.
But I hit an issue: when I open Hue it shows the error below. I searched for a solution but didn't find anything that resolves my issue.
Configuration files located in /etc/hue/conf.empty
Potential misconfiguration detected. Fix and restart Hue.
Hive The application won't work without a running HiveServer2.
I checked HiveServer2 and it's up and running. I tried restarting the service and CDH, but it didn't help.
Hive Server2 is running [ OK ]
When I navigated to Hive and tried some commands, it gave me the error below.
Could not connect to quickstart.cloudera:10000 (code THRIFTTRANSPORT): TTransportException('Could not connect to quickstart.cloudera:10000',)
For Impala I am getting:
AnalysisException: This Impala daemon is not ready to accept user requests. Status: Waiting for catalog update from the StateStore.
I tried starting hive --service metastore but got an error:
[cloudera@quickstart conf.empty]$ hive --service metastore
2017-03-03 05:37:14,502 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
Starting Hive Metastore Server
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.
Not sure what is wrong or if I need to change some config. Can anyone guide me towards the solution?
Your HiveServer2 requires the Metastore to be up and running. It seems your Metastore server cannot start because port 9083 is already in use by some other service. Check it:
netstat -tulpn | grep 9083
If something is using this port, you need to either change the metastore port in your Hive configuration or stop the application that is already using it.
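For example (a rough sketch; the PID is whatever netstat reports, 9084 is an arbitrary free port, and if you do move the metastore, hive.metastore.uris has to be updated to match):
sudo netstat -tulpn | grep 9083     # last column shows the PID/program holding the port
sudo kill <pid>                     # if it is a stale metastore process, stop it
hive --service metastore -p 9084    # or start the metastore on a different port instead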

Apache Phoenix Installation not done properly

We are trying to install Phoenix 4.4.0 on HBase 1.0.0-cdh5.4.4 (CDH 5.5.5, four-node cluster) via this installation document: Phoenix installation
Following that document, we copied our phoenix-server-4.4.0-HBase-1.0.jar into the HBase lib folder on the master and on each region server, i.e. into /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/lib on the master and the three region servers.
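For reference, the copy step looked roughly like this (the host names are placeholders for our master and the three region servers):
for host in master rs1 rs2 rs3; do
  scp phoenix-server-4.4.0-HBase-1.0.jar \
      $host:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/lib/
done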
After that we restarted the HBase service via Cloudera Manager.
Everything seems to be OK, but when we try to access the Phoenix shell via the ./sqlline.py localhost command, we get a ZooKeeper error like this:
15/09/09 14:20:51 WARN client.ZooKeeperRegistry: Can't retrieve clusterId from Zookeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
So we are not sure that the installation was done properly. Is any further configuration necessary?
We are not even sure whether we are using the sqlline command properly.
Any help will be appreciated.
After reinstalling the four-node cluster on AWS, Phoenix is now working properly.
It's a pity that we don't know exactly what was really happening, but we think that after several changes to our config we broke something that made it impossible for Phoenix to work.
One thing to take into consideration is that the sqlline command has to be executed against an IP that is in the ZooKeeper quorum, and this is something we were doing wrong: we were trying to run it from the namenode, which wasn't in the ZooKeeper quorum. Once we ran sqlline.py from a datanode, everything worked fine.
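In other words, point sqlline at the ZooKeeper quorum rather than at an arbitrary host; something like this (the host names are placeholders for your quorum members):
./sqlline.py zk1.example.com,zk2.example.com,zk3.example.com:2181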
By the way, the installation guide that we finally followed is Phoenix Installation

Ambari, connection failed on datanode

I have installed Ambari with Hadoop on my cluster.
I'm behind a proxy and had no problem during the installation.
When I start all services in Ambari, I get these errors:
HDFS DataNode Process -> Connection failed: [Errno 111] Connection refused to 0.0.0.0:50010
HDFS DataNode Web UI -> Connection failed to http://testslave3.ingc.com:50075
I tried many possibilities, but I don't know where the problem is.
If you have any suggestions, thanks a lot :)
Are you using MySQL as the DB repository? If so, after upgrading to the latest mysql-connector-java and registering it with the ambari-server, I no longer had this issue:
Get the mysql-connector-java RPM:
wget ftp://rpmfind.net/linux/fedora/linux/development/rawhide/x86_64/os/Packages/m/mysql-connector-java-5.1.36-1.fc23.noarch.rpm
Install the RPM:
rpm -ivh mysql-connector-java-5.1.36-1.fc23.noarch.rpm
Confirm that the .jar is in the Java share directory:
ls /usr/share/java/mysql-connector-java.jar
Modify ambari-server to use this connector:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
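After registering the driver, a restart of the Ambari server should pick up the new connector:
sudo ambari-server restart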

Spark Standalone Mode: Worker not starting properly in cloudera

I am new to Spark. I installed Spark using the parcels available in Cloudera Manager.
I configured the files as shown in the link below from the Cloudera Enterprise documentation:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I started all the nodes by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh, but I couldn't get the worker nodes to run; I got the error below.
[root@localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root@localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run the jps command, I get:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I couldn't run the worker node properly. I actually intended to install standalone Spark with the master and worker running on a single machine. In the slaves file in the Spark conf directory I gave the address as "localhost.localdomain", which is my host name. I am not familiar with this settings file. Could anyone please help me with this installation process? I can start the master node, but I couldn't get the worker nodes running.
Thanks & Regards,
bips
Please notice the error info below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I met the same error when I installed and started Spark master/workers on CentOS 6.2 x86_64. Even after making sure that libgcj.x86_64 and libgcj.i686 were installed on my server, the error persisted, but I finally solved it. Below is my solution; I hope it helps you.
It seems as if your JAVA_HOME environment variable is not set correctly.
Maybe your JAVA_HOME points to the system's embedded Java, e.g. Java version "1.5.0" (the GCJ runtime, which is where libgcj.so.10 in the stack trace comes from).
Spark needs Java version >= 1.6.0. If you use Java 1.5.0 to start Spark, you will see this error.
Try to export JAVA_HOME="your java home path", then start Spark again.
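A minimal sketch of that, assuming an Oracle/Sun JDK is installed under /usr/java (the exact path is only an example, adjust it to your JDK location):
java -version                                 # check which Java is currently picked up
export JAVA_HOME=/usr/java/jdk1.7.0_67        # example path; must be a JDK >= 1.6
export PATH=$JAVA_HOME/bin:$PATH
/opt/cloudera/parcels/SPARK/lib/spark/sbin/stop-all.sh
/opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh
Putting the JAVA_HOME export into Spark's conf/spark-env.sh should make it persistent across restarts.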
