I installed Ganglia to monitor the HBase cluster. I'm using ganglia-3.3.0.
Hadoop version: hadoop-1.1.2
HBase version : hbase-0.94.8
My Hadoop cluster consists of 1 master node and 2 slave nodes.
The Ganglia gmetad server is configured on the master node.
I changed the hbase/conf/hadoop-metrics.properties file:
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=hostname_of_ganglia_server:8649
I started the gmond service on the master as well as on the slaves.
I get the basic metrics from the cluster (CPU, disk, load, ...),
but I'm not getting any HBase metrics from the cluster.
The mistake was in the gmond.conf file. Once I commented out the following values, the HBase metrics showed up in Ganglia:
mcast_join = 239.2.11.71
bind = 239.2.11.71
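For reference, here is a sketch of what the relevant udp_recv_channel block in gmond.conf might look like after the change (the exact layout varies between Ganglia versions); with mcast_join and bind commented out, gmond accepts the unicast UDP packets that GangliaContext31 sends straight to port 8649:

udp_recv_channel {
  /* mcast_join = 239.2.11.71 */
  port = 8649
  /* bind = 239.2.11.71 */
}

The likely underlying issue is that a receive channel bound to the multicast address drops unicast traffic, so the HBase metrics never reached gmond.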
Related
I am new to this area and have some problems with my Hadoop cluster.
I installed CDH with Cloudera Manager on a cluster of 7 servers.
Inspection of all hosts was successful.
Image of successful inspection
During the installation there were some errors, but after restarting the cluster they disappeared, and I fixed the other health issues.
Now the status of my cluster looks like the next image:
Status of my hadoop cluster
But I still can't see the cluster disk I/O statistics, because the chart shows NO DATA.
How can I solve this problem?
I have a Hadoop cluster. Now I want to install Pig and Hive on another machine as a client. The client machine will not be part of the cluster, so is this possible? If it is, how do I connect that client machine to the cluster?
First of all, if you have a Hadoop cluster then you must have a master node (NameNode) plus slave nodes (DataNodes).
The other piece is the client node.
The way a Hadoop cluster works is:
The NameNode and DataNodes form the Hadoop cluster, and the client submits jobs to the NameNode.
To achieve this, the client should have the same copy of the Hadoop distribution and configuration that is present on the NameNode.
Only then will the client know on which node the JobTracker is running, and the IP of the NameNode, so it can access HDFS data.
See Link1 and Link2 for client configuration.
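As a rough sketch of that client configuration (the hostnames and ports are placeholders, and a Hadoop 1.x style layout is assumed since the answer mentions the JobTracker), the client's core-site.xml and mapred-site.xml would point at the cluster like this:

<!-- core-site.xml on the client: where HDFS lives -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml on the client: where the JobTracker listens -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>
</configuration>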
As for your question: after the complete Hadoop cluster configuration (master + slaves + client), you need to do the following steps:
Install Hive and Pig on the master node
Install Hive and Pig on the client node
Now start coding Pig/Hive on the client node (see the sketch below).
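Once the configuration above is in place, a minimal sketch of using the client node (the paths are hypothetical; use wherever you unpacked Hadoop, Pig and Hive):

# on the client node
export HADOOP_HOME=/opt/hadoop             # hypothetical install path
export HADOOP_CONF_DIR=$HADOOP_HOME/conf   # the configs copied from the master
pig -x mapreduce                           # Pig submits its jobs to the cluster's JobTracker
hive                                       # the Hive CLI picks up the same Hadoop configuration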
Feel free to comment if you have any doubts!
I'm trying to set up and use a 4-node Hadoop cluster.
Setting up seems to go fine, as everything is running on the master and slave nodes:
Master: DataNode, ResourceManager, SecondaryNameNode, NameNode, NodeManager
Slaves: NodeManager, DataNode
Also, the logs show no errors. When I try to run my code, however, it takes roughly the same amount of time as when I run it on a single node, and there is no increased CPU activity on any of the slave nodes.
The slaves can SSH to the master node, the master node is listening on the correct port, ...
Any help on how I can track down the problem?
Thanks!
OS: Ubuntu 14.04.2
Hadoop version: 2.6.0
I have seen related questions, but they were no help to me:
hadoop cluster is using only master node or all nodes
Hadoop use only master node for processing data
Basically you have only one DataNode and two NodeManagers. That is not a much better configuration than a single-node cluster. To check what is happening, you can go to the ResourceManager UI; by default it is on port 8088.
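If the slave NodeManagers do not show up in that UI, one common cause (an assumption on my part, not something stated in the question) is that yarn-site.xml on the slaves does not point at the master's ResourceManager. A minimal sketch, with "master" as a placeholder hostname:

<!-- yarn-site.xml on every node -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

It is also worth checking that mapreduce.framework.name is set to yarn in mapred-site.xml, otherwise jobs run in the local job runner on the submitting node no matter how many slaves you have.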
I have 4 nodes in my cluster: 1 will be the master node, 1 will be the secondary master node, and 2 will be slaves. All of these nodes currently have a single-node setup running. Is there any document available for the multinode setup?
Are you using Apache Hadoop or a distribution like Cloudera or Hortonworks?
For the Apache Hadoop setup, refer to this:
http://hadoop.apache.org/docs/r0.18.3/cluster_setup.pdf
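If it is plain Apache Hadoop of that vintage, the multinode part largely comes down to the masters and slaves files in the conf directory on the master node, plus pointing every node's core-site.xml at the NameNode. A minimal sketch with placeholder hostnames:

# conf/masters  -- the host that runs the SecondaryNameNode
secondary-master

# conf/slaves   -- the hosts that run the worker daemons (DataNode/TaskTracker)
slave1
slave2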
I've set up a Hadoop 2.5 cluster with 1 master node (NameNode, Secondary NameNode and DataNode) and 2 slave nodes (DataNode). All of the machines run Linux CentOS 7, 64-bit. When I run my MapReduce program (wordcount), I can only see the master node using extra CPU and RAM; the slave nodes are not doing a thing.
I've checked the logs on all of the nodes and there is nothing wrong on the slave nodes. The ResourceManager is running and all of the slave nodes can see the ResourceManager.
The DataNodes are working in terms of distributed data storage, but I can't see any indication of distributed data processing. Do I have to configure the XML configuration files some other way so that all of the machines process data when I run my MapReduce job?
Thank you
Make sure you are mentioning the IP addresses of the DataNodes in the master node's networking files. Each node in the cluster is also supposed to contain the IP addresses of the other machines.
Besides that, check whether the includes file contains the relevant DataNode entries.
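A sketch of what that might look like (the IP addresses and hostnames are placeholders): every node's /etc/hosts maps all cluster members, and the slaves file on the master lists the worker hosts so the start scripts know about them:

# /etc/hosts on every node
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2

# $HADOOP_HOME/etc/hadoop/slaves on the master node
slave1
slave2

If an includes file is configured via dfs.hosts in hdfs-site.xml, the same slave hostnames have to be listed there as well, otherwise the NameNode will refuse connections from those DataNodes.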