Cluster disk IO - NO DATA Cloudera Hadoop

I am new to this area and have some problems with my Hadoop cluster.
I installed CDH with Cloudera Manager on a cluster of 7 servers.
The inspection of all hosts was successful.
During the installation there were some errors, but after restarting the cluster they disappeared, and I have fixed the other health issues.
Now the status of my cluster looks healthy. But I still can't see the Cluster Disk IO statistic, because it shows NO DATA.
How can I solve this problem?

Related

Multiple datanodes on a single machine in hadoop2.7.1

I am working on Hadoop HDFS 2.7.1. I have set up a single-node cluster with one datanode, but now I need to set up three datanodes on the same machine. I have tried various methods available on the internet, but I am unable to start a Hadoop cluster with three datanodes on the same machine. Please help me.
You can run a multi-node cluster on a single machine using Docker containers. The guys at SequenceIQ, a company that was recently acquired by Hortonworks, even prepared Docker images that you can download. See here:
http://blog.sequenceiq.com/blog/2014/06/19/multinode-hadoop-cluster-on-docker/
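As a non-Docker alternative (not part of the answer above, just a sketch), several DataNode processes can run on one machine if each gets its own configuration directory with distinct ports and data directories. A minimal hdfs-site.xml for a second DataNode might look like this; the ports and paths are example values you would adapt:

```xml
<!-- hdfs-site.xml for the second DataNode (ports/paths are example values) -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/dn2</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50012</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50077</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50022</value>
  </property>
</configuration>
```

Each DataNode can then be started with its own configuration directory, e.g. `hadoop-daemon.sh --config /path/to/conf2 start datanode`.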

"LOST" node in EMR Cluster

How do I troubleshoot and recover a LOST node in my long-running EMR cluster?
The node stopped reporting a few days ago. The host seems fine, and so does HDFS. I noticed the issue only in the Hadoop applications UI.
EMR nodes are ephemeral, and you cannot recover them once they are marked as LOST. You can avoid this in the first place by enabling the 'Termination Protection' feature during cluster launch.
As for finding the reason for the LOST node, you can check the YARN ResourceManager logs and/or the instance-controller logs of your cluster to learn more about the root cause.
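As a rough sketch of where to start looking (these commands assume you are on the cluster's master node, and the exact log paths can vary by EMR release):

```shell
# List all nodes known to YARN, including ones in the LOST state
yarn node -list -all

# On an EMR instance, the instance-controller log often records
# why a node stopped reporting (path may vary by EMR release)
less /emr/instance-controller/log/instance-controller.log

# ResourceManager logs, typically under the hadoop-yarn log directory
ls /var/log/hadoop-yarn/
```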

Ganglia fails to communicate with Apache HBase

I installed Ganglia to monitor the HBase cluster. I'm using ganglia-3.3.0.
Hadoop version: hadoop-1.1.2
HBase version : hbase-0.94.8
My Hadoop cluster comprises 1 master node and 2 slave nodes.
The Ganglia gmetad server is configured on the master node.
I changed the hbase/conf/hadoop-metrics.properties file:
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=hostname_of_ganglia_server:8649
I started the gmond service on the master as well as the slaves.
I get the basic metrics from the cluster (CPU, disk, load, ...), but I'm not getting any HBase metrics from the cluster.
The mistake was in the gmond.conf file. When I commented out the following values, I got the HBase metrics in Ganglia:
mcast_join = 239.2.11.71
bind = 239.2.11.71
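For context, commenting out the multicast settings usually goes together with pointing gmond at the gmetad host directly (unicast). A sketch of the relevant gmond.conf sections with the multicast lines disabled; the hostname is a placeholder:

```
udp_send_channel {
  # mcast_join = 239.2.11.71   # multicast disabled
  host = master-node-hostname  # send metrics directly to the gmetad host
  port = 8649
  ttl = 1
}

udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  # bind = 239.2.11.71
}
```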

Hadoop uses only the master node for processing data

I've set up a Hadoop 2.5 cluster with 1 master node (NameNode, Secondary NameNode, and DataNode) and 2 slave nodes (DataNodes). All of the machines run Linux CentOS 7 64-bit. When I run my MapReduce program (wordcount), I can see that only the master node uses extra CPU and RAM; the slave nodes are not doing a thing.
I've checked the logs on the NameNode and the slave nodes, and there is nothing wrong. The ResourceManager is running, and all of the slave nodes can see it.
Datanodes are working in terms of distributed data storing but I can't see any indication of distributed data processing. Do I have to configure the xml configuration files in some other way so all of the machines will process data while I'm running my MapReduce Job?
Thank you
Make sure you mention the IP addresses of the datanodes in the master node's networking files. Each node in the cluster should also contain the IP addresses of the other machines.
Besides that, check whether the includes file contains entries for the relevant datanodes.
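To illustrate the advice above (hostnames, IP addresses, and paths are placeholders), the /etc/hosts file on every node and the slaves file in the Hadoop configuration directory would look something like:

```
# /etc/hosts on every node (example addresses)
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
```

```
# $HADOOP_HOME/etc/hadoop/slaves on the master node
slave1
slave2
```

After editing these files, restart the cluster so the NameNode and ResourceManager pick up the slave list.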

How to administer Hadoop Cluster

I have a 4-node Hadoop cluster running, and I am asking whether there is any way to administer that cluster remotely,
for example, administering the cluster from my laptop for:
executing MapReduce tasks
disabling or enabling data nodes
Is there any way to do that remotely?
If you're using the Cloudera distribution, the Cloudera Manager webapp would let you do that.
Other distributions may have similar control apps. That would give you per-node control.
For executing MR tasks, you would normally submit the job from an external node anyway, pointing it at the correct JobTracker and NameNode. So I'm not sure what else you're asking for there.
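To sketch what that remote submission looks like (Hadoop 1.x style, since the answer mentions the JobTracker; the hostnames and ports are placeholders), the client configuration on the laptop would point at the cluster:

```xml
<!-- core-site.xml on the client machine -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>

<!-- mapred-site.xml on the client machine -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:8021</value>
  </property>
</configuration>
```

With those in place, running something like `hadoop jar myjob.jar MyMainClass input output` on the laptop submits the job to the remote cluster.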
