When I try to start HDFS, I get this error:
Service did not start successfully; not all of the required roles
started: only 0/2 roles started. Reasons : Service has only 0 NameNode
roles running instead of minimum required 1.
How can I resolve this? Because of this issue, I cannot work on this Cloudera VM.
When I restarted my cluster, Ambari did not start because the database configuration consistency check failed:
sudo service ambari-server restart --skip-database-check
Using python /usr/bin/python
Restarting ambari-server
Waiting for server stop...
Ambari Server stopped
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari Server is starting with the database consistency check skipped. Do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See "/var/log/ambari-server/ambari-server-check-database.log" for more details on the consistency issues.
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.....................
DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.
ERROR: Exiting with exit code -1.
REASON: Ambari Server java process has stopped. Please check the logs for more information.
I looked in the logs in "/var/log/ambari-server/ambari-server-check-database.log", and I saw:
2017-08-23 08:16:13,445 INFO - Checking Topology tables
2017-08-23 08:16:13,447 ERROR - Your topology request hierarchy is not complete for each row in topology_request should exist at least one raw in topology_logical_request, topology_host_request, topology_host_task, topology_logical_task.
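To pull only the failures out of the consistency-check log, a small filter can help. This is a minimal sketch that assumes the log keeps the "<date> <time> LEVEL - message" layout shown above; the function name ambari_check_errors is made up for illustration.

```shell
#!/usr/bin/env bash
# Print only the ERROR entries from an Ambari database-check log.
# ambari_check_errors is a hypothetical helper name, not an Ambari command.
# Usage: ambari_check_errors [logfile]
ambari_check_errors() {
    local log="${1:-/var/log/ambari-server/ambari-server-check-database.log}"
    # Assumes entries look like: "<date> <time> ERROR - <message>"
    grep ' ERROR - ' "$log"
}
```

Calling ambari_check_errors with no argument reads the default log path mentioned in the server output; grep returns a non-zero status when no ERROR lines are found.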
I tried both the --auto-fix-database and --skip-database-check options; neither worked.
It turned out that PostgreSQL had not started correctly. The Ambari log never mentioned that PostgreSQL was stopped or unavailable, but it was odd that Ambari could not read the topology configuration stored in it. So I restarted PostgreSQL:
sudo service postgresql restart
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]
That did the trick:
sudo service ambari-server restart
Using python /usr/bin/python
Restarting ambari-server
Ambari Server is not running
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.........
Server started listening on 8080
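The order used above (database first, then Ambari) can be wrapped in one function so that Ambari is not restarted when the PostgreSQL restart itself fails. This is only a sketch: the function name and the SERVICE_CMD variable are hypothetical, and SERVICE_CMD exists purely so the function can be tried without root.

```shell
#!/usr/bin/env bash
# Restart PostgreSQL first, then Ambari; stop if the database restart fails.
# SERVICE_CMD is a hypothetical override (not an Ambari or PostgreSQL setting);
# by default the same "sudo service" commands shown above are used.
restart_db_then_ambari() {
    local svc="${SERVICE_CMD:-sudo service}"
    # $svc is intentionally unquoted so "sudo service" splits into two words.
    $svc postgresql restart || return 1
    $svc ambari-server restart
}
```

Running restart_db_then_ambari with no override issues the two restart commands from this post in sequence.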
I am installing Hortonworks Ambari on CentOS 6 with a local repo set up. Two data nodes and one master/name node are running, and the Ambari server runs on the master. The installation fails for seemingly random reasons at the "Install, Start and Test" step: one or more nodes fail with errors such as a failure to install a service (hdp-select, Hive, Hadoop), and when I hit Retry, a different combination of nodes fails. Am I missing something?
I am trying to set up a Hadoop cluster on Google Compute Engine through the "Launch click-to-deploy software" feature. I created 1 master and 1 slave node and tried to start the cluster with the start-all.sh script from the master node, but I got the error "permission denied(publickey)".
I have generated public and private keys on both the slave and master nodes.
I am currently logged into the master with my own username. Is it mandatory to log into the master as the "hadoop" user? If so, what is the password for that user ID?
Please let me know how to overcome this problem.
The deployment creates a user hadoop, which owns Hadoop-specific SSH keys generated dynamically at deployment time. Since start-all.sh uses SSH under the hood, you must run it as that user:
sudo su hadoop
/home/hadoop/hadoop-install/bin/start-all.sh
Otherwise, your "normal" username doesn't have SSH keys properly set up so you won't be able to launch the Hadoop daemons, as you saw.
Another thing to note is that the deployment should have already started all the Hadoop daemons automatically, so you shouldn't need to manually run start-all.sh unless you're rebooting the daemons after some manual configuration updates. If the daemons weren't running after the deployment ran, you may have encountered some unexpected error during initialization.
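If you do need to rerun the script, a small wrapper can make sure the command always runs as the hadoop user. This is a sketch, not part of the deployment: run_as_hadoop is a made-up name, and it uses sudo -u hadoop rather than the interactive sudo su hadoop shown above.

```shell
#!/usr/bin/env bash
# Run a command as the "hadoop" user, which owns the deployment's SSH keys.
# run_as_hadoop is a hypothetical helper; it swaps the interactive
# "sudo su hadoop" for the non-interactive "sudo -u hadoop".
run_as_hadoop() {
    if [ "$(id -un)" = "hadoop" ]; then
        "$@"                    # already the right user, run directly
    else
        sudo -u hadoop "$@"     # switch user for this one command
    fi
}
```

For example, run_as_hadoop /home/hadoop/hadoop-install/bin/start-all.sh would start the daemons with the hadoop user's keys.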
I am new to Spark. I installed Spark using the parcels available in Cloudera Manager,
then configured the files as described on the following Cloudera Enterprise page:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I started all the Spark nodes by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh. However, the worker nodes did not start; I got the error below.
[root@localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root@localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run the jps command, I get:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I could not get the worker node running. I wanted to install standalone Spark with the master and worker on a single machine. In the slaves file of the Spark directory, I gave the address as "localhost.localdomin", which is my host name. I am not familiar with this settings file. Could anyone please help me with this installation process? The master starts, but the worker does not.
Thanks & Regards,
bips
Note the error line below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I hit the same error when I installed and started Spark master/workers on CentOS 6.2 x86_64, after making sure that libgcj.x86_64 and libgcj.i686 were installed on my server. I eventually solved it; my solution is below, and I hope it helps.
It seems that your JAVA_HOME environment variable is not set correctly.
Your JAVA_HOME may point to the Java bundled with the system, e.g. java version "1.5.0".
Spark needs Java version >= 1.6.0. If you start Spark with Java 1.5.0, you will see this error.
Try export JAVA_HOME="your java home path", then start Spark again.
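Before restarting Spark, it may help to confirm which Java version JAVA_HOME really points to. The two helpers below are a sketch with invented names; they assume the java -version banner contains a quoted version string such as "1.5.0" or "1.7.0_80".

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the Java version behind JAVA_HOME.

# Print the quoted version from `java -version` (the banner goes to stderr).
# Usage: java_version_of "$JAVA_HOME"
java_version_of() {
    "$1/bin/java" -version 2>&1 \
        | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 if a version string such as "1.6.0_45" or "11.0.2" is >= 1.6.
java_at_least_1_6() {
    local ver="$1" major minor
    major="${ver%%.*}"                      # text before the first dot
    minor="${ver#*.}"; minor="${minor%%.*}" # text between first and second dot
    if [ "$major" -gt 1 ]; then return 0; fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}
```

A possible check is java_at_least_1_6 "$(java_version_of "$JAVA_HOME")" || echo "JAVA_HOME points to a Java older than 1.6".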
I am attempting to run a single-node instance of Hadoop on Amazon Web Services using Apache Whirr. I set whirr.instance-templates to "1 jt+nn+dn+tt". The instance starts up fine. I am able to create directories, but when I try to put files, I get a "File could only be replicated to 0 nodes, instead of 1" error. When I run hadoop fsck /, I get an Exception in thread "main" java.net.ConnectException: Connection refused error. Does anyone know what is wrong with my configuration?
In my experience, Whirr does not always start all services reliably. It sounds like the namenode started (the namenode is responsible for storing directory information) but the datanode did not (the datanode stores the data).
Try running
hadoop dfsadmin -report
to see if a datanode is available.
If not, restarting the cluster often helps.
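To check the report from a script rather than by eye, the datanode count can be extracted from its summary line. This sketch assumes the Hadoop 1.x report format, which includes a line like "Datanodes available: 1 (1 total, 0 dead)"; other Hadoop versions word this line differently, and live_datanodes is a made-up name.

```shell
#!/usr/bin/env bash
# Read a dfsadmin report on stdin and print the number of available datanodes.
# Assumes a Hadoop 1.x style line: "Datanodes available: N (N total, M dead)".
# Prints nothing if no such line is present.
live_datanodes() {
    sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

For example, hadoop dfsadmin -report | live_datanodes prints 0 when the namenode is up but no datanode has registered.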