Ambari Metrics diskspacecheck issue - Hadoop

I am installing a new Hadoop cluster (5 nodes in total) using the Ambari dashboard. The deployment fails with disk space warnings and error messages like '/' needs at least 2GB of disk space for mount, even though I have allocated 50GB of disk to each node. Searching for a solution, I found that I need to set diskspacecheck=0 in the /etc/yum.conf file, as suggested in point 3.6 of the link below:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_ambari_troubleshooting/content/_resolving_ambari_installer_problems.html
But I am using an Ubuntu image on the nodes, so there is no yum.conf file, and I couldn't find any other file with a "diskspacecheck" parameter. Can anybody tell me how to solve this issue and deploy my cluster successfully?
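Since the error is about the '/' mount specifically, one thing worth checking is how much of the 50GB actually belongs to '/' on each node (part of it may sit on a separate partition). A minimal sketch using Python's standard shutil.disk_usage to report free space on a mount point; the 2GB threshold comes from the error message, reading "GB" as 2**30 bytes is an assumption, and the helper name is hypothetical.

```python
import shutil

# Threshold taken from the Ambari error message ("'/' needs at least 2GB").
# Interpreting GB as 2**30 bytes is an assumption.
REQUIRED_BYTES = 2 * 1024**3


def check_mount_free_space(mount: str = "/", required: int = REQUIRED_BYTES) -> tuple[int, bool]:
    """Hypothetical helper: return (free_bytes, enough) for the given mount point."""
    usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
    return usage.free, usage.free >= required


if __name__ == "__main__":
    free, enough = check_mount_free_space("/")
    print(f"free on '/': {free / 1024**3:.1f} GiB, enough for the 2GB check: {enough}")
```

Running it on each node shows whether '/' itself is short of space, regardless of how much disk the node has in total.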

Related

Hortonworks Data Platform: High load causes node restart

I have set up a Hadoop cluster with Hortonworks Data Platform 2.5, using 1 master and 5 slave (worker) nodes.
Every few days one (or more) of my worker nodes gets a high load and seems to restart the whole CentOS operating system automatically. After the restart, the Hadoop components no longer run and have to be restarted manually via the Ambari management UI.
Here is a screenshot of the "crashed" node (it rebooted after the high load value about 4 hours ago):
Here is a screenshot of one of the other, "healthy" worker nodes (all other workers have similar values):
The crashes alternate between the 5 worker nodes; the master node seems to run without problems.
What could cause this problem? Where are these high load values coming from?
This seems to be a kernel problem, as the log file (e.g. /var/spool/abrt/vmcore-127.0.0.1-2017-06-26-12:27:34/backtrace) says something like:
Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0
After running sudo yum update, the kernel version was:
[root#myhost ~]# uname -r
3.10.0-514.26.2.el7.x86_64
Since the operating system update, the problem hasn't occurred again. I will keep observing the issue and give feedback if necessary.
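To spot nodes that are still on the older kernel, the two release strings above can be compared programmatically. A minimal sketch, assuming the nodes run EL7-style kernels whose uname -r output looks like the examples above; the helper names are hypothetical, and the newer kernel is only correlated with the crashes stopping here, not proven to fix them.

```python
import platform
import re

# Kernel on which the NULL pointer dereference was logged (abrt backtrace).
CRASHING_KERNEL = "3.10.0-327.el7.x86_64"
# Kernel after `sudo yum update`; the crashes have not recurred on it so far.
UPDATED_KERNEL = "3.10.0-514.26.2.el7.x86_64"


def parse_el_kernel(release: str) -> tuple[int, ...]:
    """Turn an EL kernel release such as '3.10.0-514.26.2.el7.x86_64'
    into a comparable tuple of integers: (3, 10, 0, 514, 26, 2)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-([\d.]+?)\.el\d", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = [int(part) for part in match.group(1, 2, 3)]
    build = [int(part) for part in match.group(4).split(".")]
    return tuple(version + build)


def older_than_updated(release: str) -> bool:
    """True if `release` is older than the kernel installed by the update."""
    return parse_el_kernel(release) < parse_el_kernel(UPDATED_KERNEL)


if __name__ == "__main__":
    current = platform.release()  # same string as `uname -r` on Linux
    try:
        status = "older than" if older_than_updated(current) else "at or newer than"
        print(f"{current} is {status} {UPDATED_KERNEL}")
    except ValueError:
        print(f"{current} does not look like an EL kernel release")
```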

Hadoop Data Corrupted Following Power Failure

I'm new to Hadoop and am learning to use it with a small cluster where each node is an Ubuntu Server VM. The cluster consists of 1 name node and 3 data nodes with a replication factor of 3. After a power loss on the machine hosting the VMs, all files stored in the cluster were corrupted, with the blocks storing those files missing. No queries were running when the power was lost, and no files were being written to or read from the cluster.
If I shut down the VMs correctly (even without first stopping the Hadoop cluster), then the data is preserved and I don't run into any issues with missing or corrupted blocks.
The only information I've been able to find suggested setting dfs.datanode.sync.behind.writes to true, but this did not resolve the issue (killing the VMs from the host causes the same issue as a power failure). The information I found here seems to indicate this property will only have an effect when writing data to the disk.
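For reference, the property mentioned above is set in hdfs-site.xml as a property entry with a name and a value. A minimal sketch that renders such entries with Python's standard xml.etree.ElementTree (ET.indent needs Python 3.9+); the helper name is hypothetical, and, as noted, this setting alone did not prevent the corruption here.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props: dict[str, str]) -> str:
    """Hypothetical helper: render name/value pairs in the Hadoop *-site.xml layout."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(conf)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(hadoop_config_xml({"dfs.datanode.sync.behind.writes": "true"}))
```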
I also tried running hdfs namenode -recover, but this did not resolve the issue. Ultimately I had to remove the data stored in the dfs.namenode.name.dir directory, reboot each VM in the cluster to remove any Hadoop files in /tmp, and reformat the name node before copying the data back into the cluster from local file storage.
I understand that having all nodes in the cluster running on the same hardware and only 3 data nodes to go with a replication factor of 3 is not an ideal configuration, but I'd like a way to ensure that any data that is already written to disk is not corrupted by a power loss. Is there a property or other configuration I need to implement to avoid this in the future (besides separate hardware, more nodes, power backup, etc.)?
EDIT: To clarify further, the issue I'm trying to resolve is data corruption, not cluster availability. I understand I need to make changes to the overall cluster architecture to improve reliability, but I'd like a way to ensure data is not lost even in the event of a cluster-wide power failure.

Continuously shows Capacity Used 90%

I have two questions.
How do I mount a directory for Ambari disk usage?
I started running the teragen program and it does not go beyond 10% of the map tasks. Ambari continuously shows me the message: Capacity Used: [90.69%, 27.7 GB], Capacity Total: [30.5 GB], path=/usr/hdp. I restarted the cluster and restarted Ambari, but it made no difference.
What is the workaround?
After some trial and error, I found the solution.
You can change the location of the log and local directories to a place with more space.
Remove the old log files from the Ambari server.
Documented here.
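Removing old log files can be scripted. A minimal sketch that lists files under a log directory not modified within a given number of days and deletes them only when dry_run is False; which directory to point it at (for example the Ambari server's log directory) and the cutoff in days are assumptions, and the helper names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Iterable, List, Optional

SECONDS_PER_DAY = 86400


def old_log_files(log_dir: str, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Return regular files under log_dir not modified in the last max_age_days."""
    cutoff = (time.time() if now is None else now) - max_age_days * SECONDS_PER_DAY
    return sorted(p for p in Path(log_dir).rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def remove_files(paths: Iterable[Path], dry_run: bool = True) -> int:
    """Delete the given files (unless dry_run) and return the bytes they occupy."""
    freed = 0
    for path in paths:
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed


def demo() -> None:
    # Throwaway directory standing in for a real log directory.
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp, "ambari-server.log.1")
        Path(tmp, "ambari-server.log").write_text("new entries\n")
        stale.write_text("old entries\n")
        ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
        os.utime(stale, (ten_days_ago, ten_days_ago))
        candidates = old_log_files(tmp, max_age_days=7)
        freed = remove_files(candidates, dry_run=False)
        print([p.name for p in candidates], freed, "bytes freed")


if __name__ == "__main__":
    demo()
```

Running remove_files with the default dry_run=True first reports how much space would be reclaimed without deleting anything.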

Multiple datanodes on a single machine in hadoop2.7.1

I am working with Hadoop HDFS 2.7.1. I have set up a single-node cluster with one datanode, but now I need to set up three datanodes on the same machine. I tried various methods available on the internet but am unable to start a Hadoop cluster with three datanodes on the same machine. Please help me.
You can run a multi-node cluster on a single machine using Docker containers. The guys at SequenceIQ, a company that was recently acquired by Hortonworks, even prepared Docker images that you can download. See here:
http://blog.sequenceiq.com/blog/2014/06/19/multinode-hadoop-cluster-on-docker/
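If Docker is not an option, a commonly used alternative (not part of the answer above) is to run each extra DataNode with its own configuration directory that gives it a separate data directory and non-conflicting ports. A minimal sketch that computes such per-instance overrides, assuming the Hadoop 2.x default DataNode ports (50010 data transfer, 50075 HTTP, 50020 IPC) and an arbitrary offset of 100 per instance; the helper name and base path are hypothetical, and each instance would likely also need its own log and PID directories.

```python
from typing import Dict

# Hadoop 2.x default DataNode endpoints; instance 0 keeps these values.
DEFAULT_DATANODE_PORTS = {
    "dfs.datanode.address": 50010,       # block data transfer
    "dfs.datanode.http.address": 50075,  # web UI
    "dfs.datanode.ipc.address": 50020,   # IPC
}


def datanode_overrides(index: int, base_data_dir: str, port_step: int = 100) -> Dict[str, str]:
    """Hypothetical helper: hdfs-site.xml overrides for DataNode instance `index`."""
    props = {
        key: f"0.0.0.0:{port + index * port_step}"
        for key, port in DEFAULT_DATANODE_PORTS.items()
    }
    # Each instance needs its own storage directory.
    props["dfs.datanode.data.dir"] = f"{base_data_dir}/dn{index}"
    return props


if __name__ == "__main__":
    for i in range(3):
        print(i, datanode_overrides(i, "/data/hdfs"))
```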

Region server going down frequently after system start

I am running HBase on HDP on an Amazon machine.
When I reboot the system and start all HBase services, they start.
But after some time my region server goes down.
The latest error I am getting in its log file is:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /apps/hbase/data/usertable/dd5a251551619e0109349a0dce855e1b/recovered.edits/0000000000000001172.temp could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1657)
Now I am not able to start it.
Any suggestions as to why this is happening?
Thanks in advance.
Make sure your datanodes are up and running. Also, set "dfs.data.dir" (dfs.datanode.data.dir in Hadoop 2.x) to a permanent location, if you haven't done so yet. By default it lives under the "/tmp" dir, which is typically emptied at each restart. Also, make sure that your datanodes are able to talk to the namenode, that there is no network-related issue, and that the datanode machines have enough free space left.
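The /tmp point can be checked mechanically. A minimal sketch, assuming hdfs-site.xml uses the standard property/name/value layout: it parses the file's text, looks at both dfs.data.dir and its Hadoop 2.x name dfs.datanode.data.dir, and flags comma-separated entries under /tmp or ${hadoop.tmp.dir}. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Deprecated (Hadoop 1.x) name and current Hadoop 2.x name of the storage property.
DATA_DIR_KEYS = ("dfs.data.dir", "dfs.datanode.data.dir")


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse <property><name/><value/></property> entries from *-site.xml text."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name, value = prop.findtext("name"), prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def volatile_data_dirs(props: Dict[str, str]) -> List[str]:
    """Return warnings for DataNode storage that may live under /tmp."""
    values = [props[key] for key in DATA_DIR_KEYS if key in props]
    if not values:
        return ["no data dir configured: the default lives under hadoop.tmp.dir"]
    warnings = []
    for value in values:
        for entry in value.split(","):
            path = entry.strip()
            if path.startswith("file://"):
                path = path[len("file://"):]
            if path == "/tmp" or path.startswith("/tmp/") or "${hadoop.tmp.dir}" in path:
                warnings.append(f"{entry.strip()} is under a temporary directory")
    return warnings
```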
