Setting fs.default.name in core-site.xml Sets HDFS to Safemode - hadoop

I installed the Cloudera CDH4 distribution on a single machine in pseudo-distributed mode and successfully tested that it was working correctly (e.g. I can run MapReduce programs, insert data into the Hive server, etc.). However, if I change the core-site.xml file so that fs.default.name is set to the machine name rather than localhost and restart the NameNode service, HDFS enters safemode.
Before the change of fs.default.name, I ran the following to check the state of the HDFS:
$ hadoop dfsadmin -report
...
Configured Capacity: 18503614464 (17.23 GB)
Present Capacity: 13794557952 (12.85 GB)
DFS Remaining: 13790785536 (12.84 GB)
DFS Used: 3772416 (3.60 MB)
DFS Used%: 0.03%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Then I made the modification to core-site.xml (with the machine name being hadoop):
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop:8020</value>
</property>
I restarted the service and reran the report.
$ sudo service hadoop-hdfs-namenode restart
$ hadoop dfsadmin -report
...
Safe mode is ON
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
An interesting note is that I can still perform some HDFS commands. For example, I can run
$ hadoop fs -ls /tmp
However, if I try to read a file using hadoop fs -cat or try to place a file in the HDFS, I am told the NameNode is in safemode.
$ hadoop fs -put somefile .
put: Cannot create file/user/hadinstall/somefile._COPYING_. Name node is in safe mode.
The reason I need fs.default.name to be set to the machine name is that I need to communicate with this machine on port 8020 (the default NameNode port). If fs.default.name is left as localhost, the NameNode service will not listen for external connection requests.
I am at a loss as to why this is happening and would appreciate any help.

The issue stemmed from domain name resolution. The /etc/hosts file needed to be modified to map the IP address of the hadoop machine to both localhost and the fully qualified domain name.
192.168.0.201 hadoop.fully.qualified.domain.com localhost
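As a quick sanity check (a hedged sketch; the hostname and address mirror the entry above), you can confirm that the name now resolves to the machine's address rather than the loopback address, and that the NameNode is listening on it:
$ getent hosts hadoop.fully.qualified.domain.com   # should print 192.168.0.201
$ netstat -tlnp | grep 8020                        # the NameNode should be bound to an externally reachable address, not 127.0.0.1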

Safemode is an HDFS state in which the file system is mounted read-only; no replication is performed, nor can files be created or deleted. Filesystem operations that only access filesystem metadata, like the ls in your case, will still work.
The NameNode can be manually forced to leave safemode with:
$ hadoop dfsadmin -safemode leave
Verify the safemode status with:
$ hadoop dfsadmin -safemode get
and then run dfsadmin -report to see if it shows data. If the report still does not show any data after leaving safemode, I suspect that communication between the NameNode and the DataNodes is not happening. Check the NameNode and DataNode logs after this step.
The next step could be to try restarting the DataNode process; the last resort would be to format the NameNode, which will result in loss of data.

Related

hadoop + Blocks with corrupt replicas

We have an HDP cluster, version 2.6.4, with the Ambari platform.
From the Ambari dashboard we can see Blocks with corrupt replicas with a value of 1,
and also from:
$ hdfs dfsadmin -report
Configured Capacity: 57734285504512 (52.51 TB)
Present Capacity: 55002945909856 (50.02 TB)
DFS Remaining: 29594344477833 (26.92 TB)
DFS Used: 25408601432023 (23.11 TB)
DFS Used%: 46.19%
Under replicated blocks: 0
Blocks with corrupt replicas: 1 <-----------------
Missing blocks: 0
Missing blocks (with replication factor 1): 0
In order to find the corrupted file, we do the following:
$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://master.sys76.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
but as we can see above, no corrupt file was found.
We also did the following in order to delete the corrupted file:
hdfs fsck / -delete
but Blocks with corrupt replicas still remains at 1.
Any suggestions?
Consider that replicas and blocks are two different concepts.
Try using command:
hdfs fsck / | egrep -v '^\.+'
to find out more about unusual blocks.
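The egrep strips the dots that fsck prints for healthy files, so only problematic entries remain. As a further hedged sketch, fsck can also print the blocks and replica locations for each file, which may help pin down where the corrupt replica lives:
$ hdfs fsck / -files -blocks -locations | egrep -v '^\.+'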

Adding a new Namenode to an existing HDFS cluster

In Hadoop HDFS Federation, the last step of adding a new NameNode to an existing HDFS cluster is:
==> Refresh the DataNodes to pick up the newly added NameNode by running the following command against all the DataNodes in the cluster:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
Which is the best place to execute the following command: the NameNode or a DataNode?
If I have 1000 DataNodes, is it logical to run it 1000 times?
On the NameNode, run this command once.
$HADOOP_PREFIX/sbin/slaves.sh hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
The slaves.sh script will distribute the command to all the slave hosts listed in the slaves file (typically placed in $HADOOP_CONF_DIR).
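As a hedged illustration (the hostnames below are hypothetical), the slaves file simply lists one worker host per line, and slaves.sh runs the given command on each of them over ssh:
# $HADOOP_CONF_DIR/slaves
datanode001.example.com
datanode002.example.com
datanode003.example.com
So even with 1000 DataNodes, the refresh is issued once from the NameNode and fanned out automatically.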

Unable to load large file to HDFS on Spark cluster master node

I have fired up a Spark cluster on Amazon EC2 containing 1 master node and 2 worker nodes that have 2.7 GB of memory each.
However, when I tried to put a 3 GB file onto HDFS through the command below,
/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin
it returns the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I am able to upload files of smaller size, but not once they exceed a certain size (about 2.2 GB).
If the file exceeds the memory size of a node, wouldn't it be split by Hadoop across the other nodes?
Edit: Summary of my understanding of the issue you are facing:
1) Total HDFS free size is 5.32 GB
2) HDFS free size on each node is 2.6GB
Note: You have bad blocks (4 Blocks with corrupt replicas)
The following Q&A mentions similar issues:
Hadoop put command throws - could only be replicated to 0 nodes, instead of 1
In that case, running jps showed that the DataNode was down.
Those Q&As suggest a way to restart the DataNode:
What is best way to start and stop hadoop ecosystem, with command line?
Hadoop - Restart datanode and tasktracker
Please try to restart your data-node, and let us know if it solved the problem.
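As a minimal sketch, assuming a standard Hadoop 2.x layout (the exact script location differs by distribution; on the spark-ec2 setup the install lives under /root/ephemeral-hdfs):
$ jps                                              # check whether a DataNode process is listed
$ $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
$ $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode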
When using HDFS you have one shared file system, i.e. all nodes share the same file system.
From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put 3 GB there.
Execute the following commands to get the HDFS free size:
hdfs dfs -df -h
hdfs dfsadmin -report
or (for older versions of HDFS)
hadoop fs -df -h
hadoop dfsadmin -report

copyFromLocalFile doesn't work in CDH4

I've installed CDH4 successfully on an Ubuntu 12 LTS server in the Amazon cloud (1 server). I used the Cloudera Manager free edition to install the software and had no errors.
I have a program that uses the java API to load a file from my home computer to HDFS in the cloud. I would like to know why this program fails and how to fix it.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://node01:8020");
FileSystem fs = FileSystem.get(conf);
// copy a file from the local file system on my home computer into HDFS in the cloud
Path targetPath = new Path("/users/<username>/myfile.txt");
Path sourcePath = new Path("/home/<username>/myfile.txt");
fs.copyFromLocalFile(false, true, sourcePath, targetPath); // delSrc=false, overwrite=true
I get the following error (namenode log):
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/<username>/myfile.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
When I upload my code to the cloud and run it locally there (uploading a file from the local fs to HDFS), there aren't any errors. The problem only happens when I run the code from my personal computer...
When I use the CLI 'hadoop fs -put' command on my cloud server, I get no errors when writing to HDFS. I can also upload files using Hue. I've done some reading and found that this problem occurs when there isn't enough disk space, but I have plenty for both DFS and non-DFS (see report below). I can successfully read the Hadoop filesystem with the Java API from my home computer, and I can even connect and read/write from HBase using the API. All ports are open to my IP on this server. File permissions have been checked. After the program fails, I see the file I tried to upload in HDFS, but the contents are blank (similar to this post: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/XWA-3H0ekYY )
Here is the output from hdfs dfsadmin -report:
Configured Capacity: 95120474112 (88.59 GB)
Present Capacity: 95120474112 (88.59 GB)
DFS Remaining: 95039008768 (88.51 GB)
DFS Used: 81465344 (77.69 MB)
DFS Used%: 0.09%
Under replicated blocks: 177
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: privateip:port (node01)
Hostname: node01
Rack: /default
Decommission Status : Normal
Configured Capacity: 95120474112 (88.59 GB)
DFS Used: 81465344 (77.69 MB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 95039008768 (88.51 GB)
DFS Used%: 0.09%
DFS Remaining%: 99.91%
Last contact: Sun Jan 27 03:01:53 UTC 2013
I've resolved the problem.
I was connecting to Hadoop from my home machine, not on the Hadoop local network. Apparently when you do this, the NameNode tells my home machine to write to the DataNode using the DataNode's private IP. Not being on the same network, my home machine couldn't connect to the DataNode, causing this error.
I resolved the problem by creating a VPN connection from my home network to the Hadoop network, and now everything works.

SafeModeException : Name node is in safe mode

I tried copying files from my local disk to HDFS. At first it gave a SafeModeException. While searching for a solution, I read that the problem does not appear if the same command is executed again. So I tried again and it didn't give the exception.
hduser@saket:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg/ /user/hduser/gutenberg
copyFromLocal: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hduser/gutenberg. Name node is in safe mode.
hduser@saket:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg/ /user/hduser/gutenberg
Why is this happening? Should I keep safemode off by using this command?
hadoop dfsadmin -safemode leave
The NameNode stays in safemode until the configured percentage of blocks is reported to be online by the DataNodes. It can be configured with the parameter dfs.namenode.safemode.threshold-pct in hdfs-site.xml.
For small/development clusters, where you have very few blocks, it makes sense to set this parameter lower than its default 0.9999f value. Otherwise one missing block can cause the system to hang in safemode.
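For example, a hedged sketch of lowering the threshold in hdfs-site.xml (the value 0.99f below is only illustrative):
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.99f</value>
</property>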
Go to the Hadoop bin directory (on my system /usr/local/hadoop/bin/):
cd /usr/local/hadoop/bin/
Check that there is a file named hadoop:
hadoopuser@arul-PC:/usr/local/hadoop/bin$ ls
The output will be:
hadoop hadoop-daemons.sh start-all.sh start-jobhistoryserver.sh stop-balancer.sh stop-mapred.sh
hadoop-config.sh rcc start-balancer.sh start-mapred.sh stop-dfs.sh task-controller
hadoop-daemon.sh slaves.sh start-dfs.sh stop-all.sh stop-jobhistoryserver.sh
Then turn off safemode using the command ./hadoop dfsadmin -safemode leave:
hadoopuser@arul-PC:/usr/local/hadoop/bin$ ./hadoop dfsadmin -safemode leave
You will get the response:
Safe mode is OFF
Note: I created the Hadoop user with the name hadoopuser.
