I am learning about Cloudera and came across the Ambari agent that resides on each host that is part of a Hadoop cluster. While configuring/creating the cluster, does the Ambari agent generate the IP addresses for the hosts and send them to DNS, or is my understanding completely wrong?
Thanks in advance :)
The agent reports the host information to the Ambari server; it doesn't manipulate anything outside of the Hadoop processes.
The IP and hostname of the nodes would already be assigned prior to installation of the agent.
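If you want to see what the agent will end up reporting, a quick check on any host (a minimal sketch assuming a typical Linux box; these commands are not part of Ambari itself) is:

hostname -f        # the fully qualified hostname the agent reports to the server
ip addr show       # the IP addresses already assigned to the host
cat /etc/hosts     # static name resolution, in case DNS is not used

The point is that these values must already resolve correctly before the agent is installed; Ambari only reads them.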
Newbie w/ etcd/zookeeper type services ...
I'm not quite sure how to handle cluster installation for etcd. Should the service be installed on each client or a group of independent servers? I ask because if I'm on a client, how would I query the cluster? Every tutorial I've read shows a curl command running against localhost.
For an etcd cluster installation, you can install the service on independent servers and form a cluster. The cluster information can be queried by logging onto one of the machines and running curl, or remotely by specifying the IP address of one of the cluster member nodes.
For more information on how to set it up, follow this article.
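As a rough example (assuming the default etcd v2 client port 2379 and a member reachable at 10.0.1.11; substitute your own addresses), you can run the same curl calls the tutorials show against localhost from any machine that can reach a member:

curl http://10.0.1.11:2379/v2/members                       # list the cluster members
curl -X PUT http://10.0.1.11:2379/v2/keys/foo -d value=bar  # write a key remotely
curl http://10.0.1.11:2379/v2/keys/foo                      # read it back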
What is the difference between Apache Ambari Server and Agent?
What are the roles/tasks of the server vs. the agent?
The Ambari server collects information from all Ambari agents and sends operations to the agents (start/stop/restart a service, change the configuration of a service, ...).
The Ambari agent sends information about the machine and the services installed on it.
You have one Ambari server for your cluster and one Ambari agent per machine in your cluster.
If you need more details, Ambari's architecture is explained here.
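Those operations are issued through the server's REST API. As a hedged sketch (assuming the default port 8080, the default admin credentials, and a cluster named c1, none of which are given above), stopping HDFS across the cluster looks roughly like:

curl -u admin:admin -H 'X-Requested-By: ambari' \
  -X PUT -d '{"RequestInfo":{"context":"Stop HDFS"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari-server:8080/api/v1/clusters/c1/services/HDFS

The server then fans the request out to the agents on each machine, which actually stop the local HDFS daemons.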
I have two machines. One machine runs HBase 0.92.2 in pseudo-distributed mode, while the other one uses the Nutch 2.x crawler. How can I configure these two machines so that the machine with HBase 0.92.2 acts as the back-end storage and the one with Nutch 2.x acts as the crawler?
I finally did it. It was easy to do.
I am sharing my experience here; maybe it can help someone.
1- Change the hbase-site.xml configuration file for pseudo-distributed mode.
2- MOST IMPORTANT THING: on the HBase machine, replace the localhost IP in /etc/hosts with your real network IP, like this:
10.11.22.189 master localhost
(the HBase machine's IP is 10.11.22.189)
(Note: if you don't change the HBase machine's localhost entry, the remote Nutch crawler won't be able to connect to it.)
3- Copy/symlink hbase-site.xml into $NUTCH_HOME/conf (see the sketch after this list).
4- Start your crawler and see it working.
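A minimal sketch of the /etc/hosts change and the config copy, assuming the paths and user below are placeholders for wherever your HBase and Nutch installs actually live:

# on the HBase machine: the /etc/hosts line should use the real network IP
#   10.11.22.189 master localhost

# on the Nutch machine: pull the HBase client config next to the Nutch config
scp user@10.11.22.189:/usr/local/hbase/conf/hbase-site.xml $NUTCH_HOME/conf/

# sanity check: the ZooKeeper quorum in the copied file must point at the HBase
# machine (master / 10.11.22.189), not at 127.0.0.1
grep -A1 hbase.zookeeper.quorum $NUTCH_HOME/conf/hbase-site.xml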
I'm trying to use the Dedoop application, which runs on Hadoop and HDFS on Amazon EC2. The Hadoop cluster is set up, and the NameNode, JobTracker, and all other daemons are running without error.
But the Dedoop.war application is not able to connect to the Hadoop NameNode after deploying it on Tomcat.
I have also checked to see if the ports are open in EC2.
Any help is appreciated.
If you're using Amazon AWS, I highly recommend using Amazon Elastic MapReduce. Amazon takes care of setting up and provisioning the Hadoop cluster for you, including things like setting up IP addresses, the NameNode, etc.
If you're setting up your own cluster on EC2, you have to be careful with public/private IP addresses. Most likely, you are pointing to the external IP addresses - can you replace them with the internal IP addresses and see if that works?
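If you are logged into the EC2 instance, one quick way to look up the right address (a sketch using the standard EC2 instance metadata endpoint) is:

curl http://169.254.169.254/latest/meta-data/local-ipv4    # private IP, what other nodes should use
curl http://169.254.169.254/latest/meta-data/public-ipv4   # public IP, usually wrong in Hadoop configs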
Can you post some lines of the stack trace from Tomcat's log files?
Dedoop must establish a SOCKS proxy server (similar to ssh -D port username@host) to pass connections to Hadoop nodes on EC2. This is mainly because Hadoop resolves public IPs to EC2-internal IPs, which breaks MR job submission and HDFS management.
To this end, Tomcat must be configured to establish SSH connections. The setup procedure is described here.
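As a rough illustration (the key file, user, host, and port below are placeholders, not values from the question), the tunnel plus the matching Hadoop client settings look like this:

# open a SOCKS proxy on local port 6666 that tunnels through the EC2 master
ssh -i mykey.pem -N -D 6666 ec2-user@ec2-master.example.com

# then point the Hadoop client at the proxy, e.g. in core-site.xml:
#   hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.SocksSocketFactory
#   hadoop.socks.server                     = localhost:6666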
I recently installed the CDH distribution of Cloudera to create a 2 node cluster. From the Cloudera Manager UI, all services are running fine.
All the command line tools (hive etc ) are also working fine and I am able to read and write data to hdfs.
However, the NameNode (and DataNode) web UI alone is not opening. Checking netstat -a | grep LISTEN, the processes are listening on the assigned ports, and there are no firewall rules blocking the connections (I already disabled iptables).
I initially thought that it could be a DNS issue, but even the IP address is not working. The Cloudera Manager installed on the same machine on another port opens fine.
Any pointers on how to debug this problem?
I had faced the same issue.
First it was because the NameNode was in safe mode,
and then because of two IP addresses (I have two NICs configured on the CDH cluster: one for internal connectivity between the servers (10.0.0.1) and the other to connect to the servers from the Internet (192.168.0.1)).
When I try to open the NameNode GUI from any server connected to the cluster on the 10.0.0.1 network, the GUI opens and works fine, but from any other machine connected to the servers via the 192.168.0.1 network it fails.
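Two quick checks along those lines (a sketch assuming the classic NameNode web port 50070; adjust for your CDH version):

sudo -u hdfs hdfs dfsadmin -safemode get   # is the NameNode still in safe mode?
netstat -tlnp | grep 50070                 # which address is the web UI bound to?

If the second command shows the UI bound only to 10.0.0.1:50070, it will never answer on the 192.168.0.1 interface; binding the NameNode HTTP address to 0.0.0.0:50070 makes it listen on both NICs.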