copyFromLocalFile doesn't work in CDH4 - hadoop

I've installed CDH4 successfully on an Ubuntu 12 LTS server in the Amazon cloud (a single server). I used Cloudera Manager Free Edition to install the software and had no errors.
I have a program that uses the Java API to load a file from my home computer to HDFS in the cloud. I would like to know why this program fails and how to fix it.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Point the client at the remote NameNode, then copy the local file into HDFS
// (don't delete the source, do overwrite the target).
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://node01:8020");
FileSystem fs = FileSystem.get(conf);
Path targetPath = new Path("/users/<username>/myfile.txt");
Path sourcePath = new Path("/home/<username>/myfile.txt");
fs.copyFromLocalFile(false, true, sourcePath, targetPath);
I get the following error (namenode log):
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/<username>/myfile.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
When I upload my code to the cloud server and run it there (copying a file from the local file system to HDFS), there aren't any errors. The failure only happens when I run the code from my personal computer.
When I use the CLI 'hadoop fs -put' command on my cloud server, I get no errors writing to HDFS. I can also upload files using Hue. I've done some reading and found that this problem can occur when there isn't enough disk space, but I have plenty for both DFS and non-DFS (see the report below). I can successfully read the Hadoop file system with the Java API from my home computer, and I can even connect and read/write from HBase using the API. All ports on this server are open to my IP. File permissions have been checked. After the program fails, I see the file I tried to upload in HDFS, but its contents are blank (similar to this post: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/XWA-3H0ekYY).
Here is the output from hdfs dfsadmin -report:
Configured Capacity: 95120474112 (88.59 GB)
Present Capacity: 95120474112 (88.59 GB)
DFS Remaining: 95039008768 (88.51 GB)
DFS Used: 81465344 (77.69 MB)
DFS Used%: 0.09%
Under replicated blocks: 177
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: privateip:port (node01)
Hostname: node01
Rack: /default
Decommission Status : Normal
Configured Capacity: 95120474112 (88.59 GB)
DFS Used: 81465344 (77.69 MB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 95039008768 (88.51 GB)
DFS Used%: 0.09%
DFS Remaining%: 99.91%
Last contact: Sun Jan 27 03:01:53 UTC 2013

I've resolved the problem.
I was connecting to Hadoop from my home machine, which is not on the Hadoop cluster's local network. Apparently, when you do this, the NameNode tells my home machine to write to the DataNode using the DataNode's private IP. Not being on the same network, my home machine couldn't connect to the DataNode, which produced this error.
I resolved the problem by creating a VPN connection from my home network to the Hadoop network, and now everything works.
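If a VPN is not an option, a workaround sometimes used in this situation (my own sketch, not something verified here) is to tell the HDFS client to address DataNodes by hostname instead of the private IP the NameNode reports, and to make those hostnames resolve to public IPs on the client side:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Assumption: the DataNode hostname (e.g. node01) resolves to a public IP
// on the client side, for example via an entry in the client's hosts file.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://node01:8020");
// Ask the cluster to hand this client DataNode hostnames instead of private IPs.
conf.set("dfs.client.use.datanode.hostname", "true");
FileSystem fs = FileSystem.get(conf);

Whether this is sufficient depends on your network setup; the VPN approach above is the one that was actually verified.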

Related

hadoop + Blocks with corrupt replicas

We have an HDP cluster, version 2.6.4, managed with the Ambari platform.
From the Ambari dashboard we can see 'Blocks with corrupt replicas: 1',
and also from:
$ hdfs dfsadmin -report
Configured Capacity: 57734285504512 (52.51 TB)
Present Capacity: 55002945909856 (50.02 TB)
DFS Remaining: 29594344477833 (26.92 TB)
DFS Used: 25408601432023 (23.11 TB)
DFS Used%: 46.19%
Under replicated blocks: 0
Blocks with corrupt replicas: 1 <-----------------
Missing blocks: 0
Missing blocks (with replication factor 1): 0
To find the corrupted file, we do the following:
$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://master.sys76.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
but as we can see above, no corrupt file was found.
We also did the following in order to delete the corrupted file:
hdfs fsck / -delete
but 'Blocks with corrupt replicas' still remains at 1.
Any suggestions?
Consider that replicas and blocks are two different concepts.
Try using the command:
hdfs fsck / | egrep -v '^\.+'
to find out more about the unusual blocks.
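For reference, the check that fsck -list-corruptfileblocks performs is also available through the FileSystem API (a sketch of my own, not part of the original answer). Consistent with the output above, it is expected to return nothing here, because a corrupt replica does not make the block itself corrupt as long as a healthy replica exists:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// List files whose blocks (not just single replicas) are corrupt, starting at '/'.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
RemoteIterator<Path> corrupt = fs.listCorruptFileBlocks(new Path("/"));
while (corrupt.hasNext()) {
    System.out.println("Corrupt file: " + corrupt.next());
}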

HDFS with Talend

I am trying to put a CSV file from my local Windows machine onto HDFS using Talend 6.3.
I have a 4-node cluster (all Linux servers on Azure):
1 NameNode and 3 DataNodes.
I am getting the following error while running:
"Exception in component tHDFSPut_1
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/5/source.csv could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation."
The file is getting created in HDFS, but it is an empty file.
Note: I am using my NameNode's IP address in the NameNode URL.
I tried the same with a sandbox and got the same error.
Update:
I created a completely new cluster (1 NameNode and 3 DataNodes) and am still getting the same error.
Everything is running and shows green in Ambari, and I am able to put the file through hadoop fs -put. Might this be a problem in Talend?
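No answer is recorded here, but this is the same 'node(s) are excluded in this operation' failure as the main question above, which typically means the client can reach the NameNode but not the DataNodes directly. As a first check (my own sketch; "datanode-host" and port 50010 are placeholders for your DataNode's address and its dfs.datanode.address port), verify that the DataNode transfer port is reachable from the Windows client:

import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity probe from the client machine to one DataNode.
public class DataNodeReachability {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("datanode-host", 50010), 5000);
            System.out.println("DataNode transfer port is reachable");
        }
    }
}

If this fails, the fix is a networking one (VPN, resolvable public hostnames, or opening the DataNode ports), not a Talend setting.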

Unable to load large file to HDFS on Spark cluster master node

I have fired up a Spark cluster on Amazon EC2 containing 1 master node and 2 worker nodes that have 2.7 GB of memory each.
However, when I try to put a 3 GB file onto HDFS with the command below,
/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin
it returns the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I am able to upload smaller files, but not once they exceed a certain size (about 2.2 GB).
If the file exceeds the capacity of a single node, wouldn't Hadoop split it across the other nodes?
Edit: a summary of my understanding of the issue you are facing:
1) Total HDFS free size is 5.32 GB.
2) HDFS free size on each node is 2.6 GB.
Note: you have bad blocks (4 blocks with corrupt replicas).
The following Q&A mentions similar issues:
Hadoop put command throws - could only be replicated to 0 nodes, instead of 1
In that case, running jps showed that the DataNodes were down.
These Q&As suggest ways to restart the DataNodes:
What is best way to start and stop hadoop ecosystem, with command line?
Hadoop - Restart datanode and tasktracker
Please try to restart your DataNodes and let us know if that solves the problem.
When using HDFS you have one shared file system, i.e. all nodes share the same file system.
From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put 3 GB there.
Execute the following commands to get the HDFS free size:
hdfs dfs -df -h
hdfs dfsadmin -report
or (for older versions of HDFS):
hadoop fs -df -h
hadoop dfsadmin -report
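For completeness (my own sketch, not part of the original answer), the same capacity numbers can be read programmatically through the FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

// Print the overall HDFS capacity, used and remaining space in bytes.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
FsStatus status = fs.getStatus();
System.out.println("Capacity:  " + status.getCapacity());
System.out.println("Used:      " + status.getUsed());
System.out.println("Remaining: " + status.getRemaining());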

Setting fs.default.name in core-site.xml Sets HDFS to Safemode

I installed the Cloudera CDH4 distribution on a single machine in pseudo-distributed mode and successfully tested that it was working correctly (e.g. I can run MapReduce programs, insert data on the Hive server, etc.). However, if I change the core-site.xml file to have fs.default.name set to the machine name rather than localhost and restart the NameNode service, HDFS enters safe mode.
Before changing fs.default.name, I ran the following to check the state of HDFS:
$ hadoop dfsadmin -report
...
Configured Capacity: 18503614464 (17.23 GB)
Present Capacity: 13794557952 (12.85 GB)
DFS Remaining: 13790785536 (12.84 GB)
DFS Used: 3772416 (3.60 MB)
DFS Used%: 0.03%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Then I made the modification to core-site.xml (with the machine name being hadoop):
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop:8020</value>
</property>
I restarted the service and reran the report.
$ sudo service hadoop-hdfs-namenode restart
$ hadoop dfsadmin -report
...
Safe mode is ON
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
An interesting note is that I can still perform some HDFS commands. For example, I can run
$ hadoop fs -ls /tmp
However, if I try to read a file using hadoop fs -cat or try to place a file in the HDFS, I am told the NameNode is in safemode.
$ hadoop fs -put somefile .
put: Cannot create file/user/hadinstall/somefile._COPYING_. Name node is in safe mode.
The reason I need fs.default.name to be set to the machine name is that I need to communicate with this machine on port 8020 (the default NameNode port). If fs.default.name is left as localhost, then the NameNode service will not listen for external connection requests.
I am at a loss as to why this is happening and would appreciate any help.
The issue stemmed from domain name resolution. The /etc/hosts file needed to be modified so that both localhost and the fully qualified domain name point to the IP address of the hadoop machine:
192.168.0.201 hadoop.fully.qualified.domain.com localhost
Safemode is an HDFS state in which the file system is mounted read-only; no replication is performed, and files can be neither created nor deleted. File system operations that only access metadata, like the ls in your case, will still work.
The NameNode can be manually forced to leave safemode with hadoop dfsadmin -safemode leave. Verify the safemode status with hadoop dfsadmin -safemode get, and then run the dfsadmin report again to see if it shows data. If the report still does not show any data after leaving safemode, then I suspect communication between the NameNode and the DataNode is not happening. Check the NameNode and DataNode logs after this step.
The next step could be to restart the DataNode process; the last resort is to format the NameNode, which will result in loss of data.
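If you would rather check and clear safemode from code than from the shell (my own sketch, assuming a CDH4-era Hadoop 2.x client on the classpath), the HDFS-specific DistributedFileSystem exposes the same operations as dfsadmin -safemode:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

// Report whether the NameNode is in safemode and, if so, force it to leave.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:8020");
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
System.out.println("Safemode is " + (inSafeMode ? "ON" : "OFF"));
if (inSafeMode) {
    dfs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
}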

Hadoop: Datanodes available: 0 (0 total, 0 dead)

Each time I run:
hadoop dfsadmin -report
I get the following output:
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
There is no data directory in my dfs/ folder.
A lock file exists in this folder: in_use.lock
The master, job tracker and data nodes are running fine.
Please check the DataNode logs. They will log errors when a DataNode is unable to report to the NameNode. If you post those errors, people will be able to help.
I had exactly the same problem. When I checked the DataNode logs, there were lots of "could not connect to master:9000" errors, and when I checked the ports on the master via netstat -ntlp I had this in the output:
tcp 0 0 127.0.1.1:9000 ...
I realized that I should either change my master machine's name or change 'master' in all the configs. I decided to do the first because it seemed much easier.
So I modified /etc/hosts, changed '127.0.1.1 master' to '127.0.1.1 master-machine', and added an entry at the end of the file like this:
192.168.1.1 master
Then I changed master to master-machine in /etc/hostname and restarted the machine.
The problem was gone.
Did you check the firewall?
When I use Hadoop, I turn off the firewall (iptables -F on all nodes)
and then try again.
This has happened to us when we restarted the cluster, but after a while the DataNodes were automatically detected. It could possibly be because of the block report delay time property.
Usually these are namespace ID mismatch errors in the DataNode logs.
So delete the name dir from the master and delete the data dir from the DataNodes.
Now format the NameNode and try start-dfs.sh.
The report usually takes some time to reflect all the DataNodes.
Even I was getting 0 DataNodes, but after some time the master detected the slaves.
I had the same problem and I just solved it.
The /etc/hosts file on all nodes should look like this:
127.0.0.1 localhost
xxx.xxx.xxx.xxx master
xxx.xxx.xxx.xxx slave-1
xxx.xxx.xxx.xxx slave-2
I just resolved the issue by following the steps below:
1) Make sure the IP addresses for the master and slave nodes are correct in the /etc/hosts file.
2) Unless you really need the data, run stop-dfs.sh, delete all data directories on the master/slave nodes, then run hdfs namenode -format and start-dfs.sh. This should recreate HDFS and fix the issue.
Just formatting the NameNode didn't work for me, so I checked the logs at $HADOOP_HOME/logs. In the SecondaryNameNode log, I found this error:
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -64 namespaceID = 2095041698 cTime = 1552034190786 ; clusterId = CID-db399b3f-0a68-47bf-b798-74ed4f5be097 ; blockpoolId = BP-31586866-127.0.1.1-1552034190786.
Expecting respectively: -64; 711453560; 1550608888831; CID-db399b3f-0a68-47bf-b798-74ed4f5be097; BP-2041548842-127.0.1.1-1550608888831.
at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:143)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:550)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:360)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:325)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:321)
at java.lang.Thread.run(Thread.java:748)
So I stopped Hadoop and then formatted the NameNode with the specific cluster ID:
hdfs namenode -format -clusterId CID-db399b3f-0a68-47bf-b798-74ed4f5be097
This solved the problem.
There's another obscure reason this could happen as well: your DataNode did not start properly, but everything else was working.
In my case, when going through the log, I found that the bound port, 50010, was already in use by SideSync (on macOS). I found this through
sudo lsof -iTCP -n -P | grep 0010
but you can use similar techniques to determine what might have already taken your well-known DataNode port.
Killing that process off and restarting fixed the problem.
Additionally, if you've installed Hadoop/Yarn as root, but have data dirs in individual home directories, and then try to run it as an individual user, you'll have to make the data node directory public.
