Ambari has lost all heartbeats from one node of my 4-node cluster.
http://i.stack.imgur.com/51Gie.png
I have already tried rebooting the cluster, restarting ambari-agent and ambari-server, and manually restarting some of the services such as YARN. Nothing works and I am stuck.
The Ambari version is 2.1.1.
Related
I recently installed Cloudera's CDH 6.1 distribution to create a two-node cluster. In the Cloudera Manager UI, all services are shown as running fine.
However, the NameNode (and DataNode) web UIs do not open.
The firewall is already disabled.
Any pointers on how to debug this problem?
Is anything listening on the ports on the host itself?
Did you check namenode logs? They should be in /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-your.server.hostname.com.log.out
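To check whether anything is listening, here is a minimal sketch you could run on the host itself. The port numbers are an assumption: 9870 and 9864 are the Hadoop 3 defaults for the NameNode and DataNode web UIs, and your cluster may be configured differently, so verify them in the HDFS configuration in Cloudera Manager. The function name `is_listening` is just illustrative.

```python
import socket

# Assumed Hadoop 3 default web UI ports; confirm the real values in your
# HDFS configuration before drawing conclusions from the result.
NAMENODE_HTTP_PORT = 9870
DATANODE_HTTP_PORT = 9864


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    checks = [("NameNode", NAMENODE_HTTP_PORT), ("DataNode", DATANODE_HTTP_PORT)]
    for name, port in checks:
        state = "listening" if is_listening("localhost", port) else "not listening"
        print(f"{name} web UI port {port}: {state}")
```

If a port is listening on localhost but the UI is unreachable from your browser, the bind address or the network path is the more likely culprit than the service itself.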
I'm adding Accumulo to my Cloudera cluster.
How should I assign the roles?
I have 4 servers currently running.
1 Server: HDFS Name Node, HDFS Secondary Name node, HDFS Balancer, Activity Monitor, Cloudera Management Services, Spark Gateway, Spark History Server, Yarn Job History Server, Yarn Resource Manager, Zookeeper Server
3 Servers: HDFS Data Node, Kafka Broker, Spark Gateway, Yarn Node Manager, Zookeeper Server
The Cloudera wizard asks me to assign the following Accumulo roles: Master, Tablet Server, Garbage Collector, Monitor, Tracer, Gateway.
Is it OK to assign the Tablet Server role to all HDFS Data Nodes and all other roles to the first server?
Does it make sense to assign the Accumulo Gateway to the same nodes as the Tablet Servers?
Yes, running the Accumulo Master, Garbage Collector, Monitor, and Tracer on the first server and running TabletServers on the others makes sense.
I'm not sure what the "Accumulo Gateway" is; Apache Accumulo has no such component/service called "Gateway".
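The layout described above can be written down as a small sketch. The host names are hypothetical and `plan_accumulo_roles` is only an illustrative helper, not part of Cloudera Manager or Accumulo. The Gateway role is left out on purpose because it is not an Apache Accumulo service.

```python
# Hypothetical host names standing in for the four servers in the question.
MASTER_HOST = "master1"
DATA_NODE_HOSTS = ["worker1", "worker2", "worker3"]

# Roles that the answer suggests placing on the first server.
MASTER_ROLES = ["Master", "Garbage Collector", "Monitor", "Tracer"]


def plan_accumulo_roles(master, data_nodes):
    """Return {host: [roles]} with one Tablet Server per HDFS DataNode host."""
    plan = {master: list(MASTER_ROLES)}
    for host in data_nodes:
        plan[host] = ["Tablet Server"]
    return plan


if __name__ == "__main__":
    for host, roles in plan_accumulo_roles(MASTER_HOST, DATA_NODE_HOSTS).items():
        print(f"{host}: {', '.join(roles)}")
```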
Two of my drives crashed on the Ambari Server node so I have to re-migrate my Ambari Cluster. No real data was lost (due to a different backup strategy) but the configuration files of the node, including Ambari Server configuration, are gone.
Because two drives crashed, I cannot access any files from that node anymore (RAID 5).
I am now in the process of reinstalling the Ambari Server on the same node and would like to have my agents seamlessly reconnect to the "new" Ambari Server.
Is there a way to migrate the existing Cluster settings to the Ambari Server? I am thinking of Cluster settings that were distributed to the agents or similar.
If there is no way to migrate the cluster, how should I go about installing the Ambari Server? Do I do a fresh install and set everything up again? Will the Ambari agents be able to connect to the "new" cluster without problems? Note that the Ambari Server will run on the same hostname/IP.
After restarting the 3 masters in my DC/OS cluster, the DC/OS dashboard shows 0 connected nodes. However, from the DC/OS CLI I see all 6 of my agent nodes:
$ dcos node
HOSTNAME     IP           ID
172.16.1.20  172.16.1.20  a7af5134-baa2-45f3-892e-5e578cc00b4d-S7
172.16.1.21  172.16.1.21  a7af5134-baa2-45f3-892e-5e578cc00b4d-S12
172.16.1.22  172.16.1.22  a7af5134-baa2-45f3-892e-5e578cc00b4d-S8
172.16.1.23  172.16.1.23  a7af5134-baa2-45f3-892e-5e578cc00b4d-S6
172.16.1.24  172.16.1.24  a7af5134-baa2-45f3-892e-5e578cc00b4d-S11
172.16.1.25  172.16.1.25  a7af5134-baa2-45f3-892e-5e578cc00b4d-S10
I am still able to schedule tasks in Marathon, both from the DC/OS CLI and from the Marathon GUI, and they are properly scheduled and executed on the agents. Also, in the Mesos interface on :5050 I can see all of the agents on the slaves page.
I have restarted agent nodes and master nodes. I have also rerun the DC/OS GUI installer and run preflight check, which of course fails with an "already installed" error.
Is there a way to re-register the nodes with the DC/OS GUI short of uninstalling and reinstalling each node?
For anyone who is running into this, my problem was related to our corporate proxy. In order to get the Universe working in my cluster I had to add proxy settings to /opt/mesosphere/environment. I then restarted dcos-cosmos.service and life was good. However, after a server restart, dcos-history-service.service was also running with the new environment and was unable to resolve my local names through our proxy server. To solve this, I added a NO_PROXY entry to /opt/mesosphere/environment, and the DC/OS dashboard is happy again.
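To illustrate why NO_PROXY matters here, below is a rough sketch that parses KEY=value lines like the ones in an environment file and checks whether a host would skip the proxy. Everything in `SAMPLE` (proxy URL, domain names) is hypothetical, and the suffix matching in `bypasses_proxy` is a simplification: real clients differ in how they interpret NO_PROXY, so treat this as a way to sanity-check your entries, not as what DC/OS does internally.

```python
def parse_env_file(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"')
    return env


def bypasses_proxy(host, no_proxy):
    """Simplified check: exact or domain-suffix match against NO_PROXY."""
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False


# Hypothetical file contents; the real values depend on your network.
SAMPLE = """\
http_proxy=http://proxy.example.com:3128
https_proxy=http://proxy.example.com:3128
NO_PROXY=localhost,127.0.0.1,.corp.example.com
"""

if __name__ == "__main__":
    env = parse_env_file(SAMPLE)
    for host in ["master1.corp.example.com", "universe.example.org"]:
        skipped = bypasses_proxy(host, env.get("NO_PROXY", ""))
        print(f"{host}: {'direct' if skipped else 'via proxy'}")
```

The point is that internal names such as the masters and agents should end up in the "direct" group, otherwise services started with the proxy environment will try to reach them through the proxy.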
I deployed an HDP cluster successfully on AWS EC2. After restarting the HDP cluster nodes, the Ambari server lost the heartbeats because all public and private IPs and DNS names changed.
Where in the Ambari server can we configure the new IPs or DNS names?
First, Ambari requires a fully qualified domain name (FQDN) for every node. It is best practice to assign proper hostnames to all your nodes.
A simple workaround for getting the heartbeat back on your Ambari server is to run the following on all your client nodes:
sudo ambari-agent restart your_ambari.server.hostname.com
It worked for me on Ambari 2.0 and Ubuntu 12. Good luck!
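If the agents still point at the old address, the setting to look at is the server hostname in the agent configuration. Here is a minimal sketch, assuming the agent reads it from the `hostname` key in the `[server]` section of /etc/ambari-agent/conf/ambari-agent.ini (the usual location, but verify it on your install). The function name and the sample contents are illustrative, and note that rewriting the file with `configparser` drops any comments in it.

```python
import configparser
import io


def set_server_hostname(ini_text, new_hostname):
    """Return ini_text with [server] hostname set to new_hostname."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", new_hostname)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical, shortened sample of an agent configuration.
SAMPLE_INI = """\
[server]
hostname=old-ip-or-name.example.com
url_port=8440
"""

if __name__ == "__main__":
    print(set_server_hostname(SAMPLE_INI, "your_ambari.server.hostname.com"))
```

After changing the real file you would still need to restart the agent on that node so it picks up the new server hostname.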