Cloudera installation dfs.datanode.max.locked.memory issue on LXC - hadoop

I have created a VirtualBox Ubuntu 14.04 LTS environment on my Mac.
Inside that Ubuntu guest, I've created a cluster of three LXC containers: one for the master and the other two for slaves.
On the master, I started the CDH5 installation using the following link: http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
I have also made the necessary changes in /etc/hosts, including the FQDNs and hostnames, and created a passwordless user named "ubuntu".
While setting up CDH5, I keep hitting the following error on the datanodes during installation:
Exception in secureMain: java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 922746880 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1050)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2297)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2184)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2231)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2431)

Krunal,
This solution will probably be too late for you, but maybe it can help somebody else, so here it is. First, make sure your ulimit is set correctly. But in case it's a config issue:
go to:
/run/cloudera-scm-agent/process/
find the latest config dir,
in this case:
1016-hdfs-DATANODE
search for the parameter in this dir:
grep -rnw . -e "dfs.datanode.max.locked.memory"
./hdfs-site.xml:163: <name>dfs.datanode.max.locked.memory</name>
and edit the value to the one the datanode is expecting, in your case 65536.
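For reference, a minimal sketch of both directions of the fix, assuming the DataNode runs as the hdfs user (the limits.conf lines and user name are illustrative, not something stated in the original answer):
# show the limit the datanode process sees, in KB (64 KB = the 65536 bytes from the error)
ulimit -l
# option 1: raise the memlock limit for the hdfs user in /etc/security/limits.conf
# so it covers the configured 922746880 bytes, then restart the DataNode role:
#   hdfs  soft  memlock  unlimited
#   hdfs  hard  memlock  unlimited
# option 2: lower dfs.datanode.max.locked.memory in the DataNode configuration
# (via Cloudera Manager, so the change is not overwritten) to a value no larger
# than the ulimit reported above.
Note that an LXC container may also inherit a memlock cap from the host, so the limit may need to be raised there too.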

I solved it by opening a separate tab in Cloudera Manager and setting the value from there.

Related

Whenever I restart my Ubuntu system (VirtualBox) and start Hadoop, my namenode is not working

Whenever I restart my Ubuntu system (VirtualBox) and start Hadoop, my namenode is not working.
To resolve this I always have to delete the namenode and datanode folders and format Hadoop every time I restart my system.
I've been trying to resolve the issue for two days, but nothing works. I tried giving 777 permissions to the namenode and datanode folders again, and I also tried changing their paths.
My error is
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /blade/Downloads/Hadoop/data/datanode is in an inconsistent state: storage directory does not exist or is not accessible
Please help me to resolve the issue.
You cannot just shut down the VM. You need to cleanly stop the datanode and namenode processes, in that order; otherwise there's a potential for a corrupted HDFS, forcing you to reformat, assuming you don't have a backup.
I'd also suggest putting the Hadoop data for a VM on its own VM drive and mount point, not in a shared host folder under Downloads.
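As a sketch, assuming a plain Apache Hadoop install with the standard sbin scripts on the PATH, a clean shutdown before powering off the VM could look like this:
# stop HDFS cleanly before shutting down the VM; the bundled script stops
# the namenode, datanodes and secondary namenode together
stop-dfs.sh
# or stop the daemons individually, datanode first, then namenode
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop namenode
# only then power the VM off
sudo shutdown -h now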

hortonworks : start datanode failed

I have installed a new cluster, HDP 2.3, using Ambari 2.2. The problem is that the namenode service can't be started, and each time I try I get the following error. While looking into the problem I found another, more explicit error: port 50070 is already in use (I think the namenode uses this port). Has anyone solved this problem before? Thanks.
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh
su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config
/usr/hdp/current/hadoop-client/conf start namenode'' returned 1.
starting namenode, logging to
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-ip-10-8-23-175.eu-west-2.compute.internal.out
In order to install a Hortonworks cluster, Ambari tries to set the core file size limit to unlimited if it is not set initially. It seems the Linux user installing the cluster doesn't have the privileges to set ulimits.
Just set the core file size to unlimited in
/etc/security/limits.conf and it should come up:
* soft core unlimited
* hard core unlimited
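To confirm the new limit is actually picked up for the account the daemons run under (assumed here to be the hdfs user), a fresh login shell can be checked; it should print "unlimited":
# open a new login shell as hdfs so limits.conf is re-read, then print the core file size limit
sudo su - hdfs -c 'ulimit -c'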

Mesos slave node unable to restart

I've setup a Mesos cluster using the CloudFormation templates from Mesosphere. Things worked fine after cluster launch.
I recently noticed that none of the slave nodes are listed in the Mesos dashboard. The EC2 console shows the slaves are running and passing health checks. I restarted the nodes in the cluster, but that didn't help.
I SSH'ed into one of the slaves and noticed the mesos-slave service was not running. I ran sudo systemctl status dcos-mesos-slave.service, but couldn't get the service to start.
I looked in /var/log/mesos/, ran tail -f mesos-slave.xxx.invalid-user.log.ERROR.20151127-051324.31267, and saw the following...
F1127 05:13:24.242182 31270 slave.cpp:4079] CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to create temporary file: No space left on device
But the output of df -h and free shows there is plenty of disk space left.
Which leads me to wonder, why is it complaining about no disk space?
OK, I figured it out.
When running Mesos for a long time or under frequent load, the /tmp folder runs out of space, since Mesos uses /tmp/mesos as its work_dir. The filesystem can only hold a certain number of file references (inodes). In my case, the slaves were collecting a large number of file chunks from image pulls in /var/lib/docker/tmp.
To resolve this issue:
1) Remove files under /tmp
2) Set a different work_dir location
It is good practice to run
docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
This way you free space by deleting unused Docker images.
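A short sketch of how to confirm the inode exhaustion and point the agent at a different work_dir (the /etc/mesos-slave flag-file path is an assumption about the stock Mesosphere packaging):
# df -h shows free bytes, but inodes are tracked separately; check them
df -i /tmp
# clear the old Mesos work dir (only if you can afford to lose the
# checkpointed state on this agent)
sudo rm -rf /tmp/mesos
# point the agent at a dedicated work_dir on a larger filesystem; with the
# Mesosphere packages each flag is a file named after the flag
echo "/var/lib/mesos" | sudo tee /etc/mesos-slave/work_dir
# restart the agent so the new flag takes effect
sudo systemctl restart dcos-mesos-slave.service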

Installing Hadoop over 5 hard drives on a desktop

I have been working on installing Hadoop. I followed the instructions in a Udemy course and installed Hadoop in pseudo-distributed mode on my laptop. It was fairly straightforward.
After that, I started to wonder if I could set up Hadoop on a desktop computer. So I went out and bought an empty case and put in a 64-bit, 8-core AMD processor, along with a 50 GB SSD and 4 inexpensive 500 GB hard drives. I installed Ubuntu 14.04 on the SSD and put virtual machines on the other drives.
I'm envisioning using my SSD as the master and using my 4 hard drives as nodes. Again, everything is living in the same case.
Unfortunately, I've been searching everywhere and I can't find any tutorials, guides, books, etc. that describe setting up Hadoop in this manner. Most everything I've found either covers a simple pseudo-distributed setup (which I've already done) or jumps straight to large-scale commercial deployments. I'm still learning the basics, clearly, but I'd like to play in this sort of in-between place.
Has anyone done this before, and/or come across any documentation / tutorials / etc. that describe how to set Hadoop up this way? Many thanks in advance for the help.
You can run Hadoop in different VMs that live on different drives in the same system.
But you need to use consistent configurations across the master and slave nodes,
and ensure that all the VMs have different IP addresses.
You can get different IP addresses by bridging the VMs onto your LAN, or by changing the VM network settings (e.g. away from NAT) so that each VM gets its own address.
If you have done the Hadoop installation in pseudo-distributed mode, then the steps below may help you.
MULTINODE :
Configure the hosts in the network using the following settings in the hosts file. This has to be done on all machines (on the namenode too).
sudo vi /etc/hosts
add the following lines in the file:
yourip1 master
yourip2 slave01
yourip3 slave02
yourip4 slave03
yourip5 slave04
[Save and exit – type ESC then :wq ]
Change the hostname on the namenode and datanodes.
sudo vi /etc/hostname
For the master machine (namenode): master
For the other machines: slave01, slave02, slave03 and slave04
Restart the machines so the network settings take effect.
sudo shutdown -r now
Copy the keys from the master node to all datanodes, so that the machines can be accessed without asking for a password every time.
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave01
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave02
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave03
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave04
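If the master does not yet have a key pair, it can be generated first (a small sketch; the empty passphrase is what keeps the logins password-free):
# generate an RSA key pair for hduser on the master, with no passphrase,
# before running ssh-copy-id
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa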
Now we are going to adjust the Hadoop configuration, so navigate to the configuration folder.
cd ~/hadoop/etc/hadoop
Edit the slaves file within that directory.
vi ~/hadoop/etc/hadoop/slaves
And add the below :
master
slave01
slave02
slave03
slave04
Now update localhost to master in core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
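For example, after this change the fs.defaultFS property in core-site.xml would end up looking roughly like the commented snippet below (port 9000 is just the value commonly used in tutorials, not something this answer specifies), and a quick grep shows whether any file still points at localhost:
# core-site.xml should end up with something like:
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://master:9000</value>
#   </property>
# check that no config file still references localhost
grep -rn "localhost" ~/hadoop/etc/hadoop/*.xml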
Now copy the files in the hadoop/etc/hadoop folder from the master to the slave machines.
Then format the namenode (this only needs to be done on the master),
and start the Hadoop services.
That gives you some clues on how to configure a Hadoop multi-node cluster; a rough sketch of these last steps follows.
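A sketch of those final steps, assuming the hduser account and the ~/hadoop install path used above:
# push the edited configuration from the master to every slave
for host in slave01 slave02 slave03 slave04; do
  scp ~/hadoop/etc/hadoop/*.xml ~/hadoop/etc/hadoop/slaves hduser@$host:~/hadoop/etc/hadoop/
done
# format HDFS once, on the master only
~/hadoop/bin/hdfs namenode -format
# start HDFS and YARN from the master
~/hadoop/sbin/start-dfs.sh
~/hadoop/sbin/start-yarn.sh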
Never tried it, but if you type ifconfig it may show the same IP address on all the VMs on the different hard drives, so this may not be the best option to go with.
You can try creating a Hadoop cluster on Amazon EC2 for free using this step-by-step guide HERE,
or the video guide HERE.
Hope it helps!

Hadoop JobClient: Error Reading task output

I'm trying to process 40GB of Wikipedia English articles on my cluster. The problem is the following repeating error message:
13/04/27 17:11:52 INFO mapred.JobClient: Task Id : attempt_201304271659_0003_m_000046_0, Status : FAILED
Too many fetch-failures
13/04/27 17:11:52 WARN mapred.JobClient: Error reading task outputhttp://ubuntu:50060/tasklog?plaintext=true&attemptid=attempt_201304271659_0003_m_000046_0&filter=stdout
When I run the same MapReduce program on a smaller subset of the Wikipedia articles rather than the full set, it works just fine and I get all the desired results. Based on that, I figured maybe it's a memory issue. I cleared all the user logs (as suggested in a similar post) and tried again. No use.
I turned replication down to 1 and added a few more nodes. Still no use.
The cluster summary is as follows:
Configured Capacity: 205.76 GB
DFS Used: 40.39 GB
Non DFS Used: 44.66 GB
DFS Remaining: 120.7 GB
DFS Used%: 19.63%
DFS Remaining%: 58.66%
Live Nodes: 12
Dead Nodes: 0
Decommissioned Nodes: 0
Number of Under Replicated Blocks: 0
Each node runs on Ubuntu 12.04 LTS
Any help is appreciated.
EDIT
JobTracker Log: http://txtup.co/gtBaY
TaskTracker Log: http://txtup.co/wEZ5l
Fetch failures are often due to DNS problems. Check each datanode to be sure that the hostname and IP address it is configured with match what DNS resolves for that hostname.
You can do this by visiting each node in your cluster, running hostname and ifconfig, and noting the hostname and IP address returned. Let's say, for instance, this returns the following:
namenode.foo.com 10.1.1.100
datanode1.foo.com 10.1.1.1
datanode2.foo.com 10.1.1.2
datanode3.foo.com 10.1.1.3
Then revisit each node and nslookup all the hostnames returned from the other nodes. Verify that the returned IP address matches the one found with ifconfig. For instance, on datanode1.foo.com, you should do the following:
nslookup namenode.foo.com
nslookup datanode2.foo.com
nslookup datanode3.foo.com
and you should get back:
    10.1.1.100
    10.1.1.2
    10.1.1.3
When you ran your job on a subset of the data, you probably didn't have enough splits to start a task on the datanode(s) that are misconfigured.
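A small sketch that automates this check from any one node (the host list is of course specific to your cluster, and SSH access to each node is assumed):
# compare what DNS returns for each node with what the node itself reports
for host in namenode.foo.com datanode1.foo.com datanode2.foo.com datanode3.foo.com; do
  echo "== $host =="
  nslookup "$host"
  ssh "$host" 'hostname; hostname -i'
done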
I had a similar problem and was able to find a solution. The problem lies in how Hadoop deals with small files. In my case, I had about 150 text files that added up to 10 MB. Because of how the files are "divided" into blocks, the system runs out of memory pretty quickly. To solve this you have to "fill" the blocks and arrange your files so that they are spread nicely across blocks. Hadoop lets you "archive" small files so that they are correctly allocated into blocks:
hadoop archive -archiveName files.har -p /user/hadoop/data /user/hadoop/archive
In this case I created an archive called files.har from the /user/hadoop/data folder and stored it in the folder /user/hadoop/archive. After doing this, I rebalanced the cluster allocation using start-balancer.sh.
Now when I run the wordcount example against files.har, everything works perfectly.
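For reference, the archive can then be addressed through the har:// scheme, e.g. to check its contents and run the job against it (a sketch; the examples jar name and output path are just placeholders for your own):
# list the files inside the archive
hadoop fs -ls har:///user/hadoop/archive/files.har
# run wordcount against the archive instead of the raw directory
hadoop jar hadoop-examples.jar wordcount har:///user/hadoop/archive/files.har /user/hadoop/wc-out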
Hope this helps.
Best,
Enrique
I had exactly the same problem with Hadoop 1.2.1 on an 8-node cluster. The problem was in the /etc/hosts file. I removed all entries containing "127.0.0.1 localhost". Instead of "127.0.0.1 localhost" you should map your IP address to your hostname (e.g. "10.15.3.35 myhost"). Note that you should do that for all nodes in the cluster. So, in a two-node cluster, the master's /etc/hosts should contain "10.15.3.36 masters_hostname" and the slave's /etc/hosts should contain "10.15.3.37 slave1_hostname". After these changes, it would be good to restart the cluster.
Also have a look here for some basic Hadoop troubleshooting: Hadoop Troubleshooting
