Not able to put file on HDFS - hadoop

I have a CDH VirtualBox VM running on my Windows 10 machine. I am running a simple Talend job with a single component (tHDFSPut) that puts a file from Windows onto HDFS inside the VM. When I run the job, the file is created on HDFS but it is empty.
I am getting the following error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/cloudera/test/Input.xml could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1716)
I checked that the NameNode is healthy and has sufficient disk space available, and I can access it from outside the VM at localhost:50070.
Need help!
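The "1 node(s) are excluded" part of the message usually means the client could reach the NameNode but could not open a data connection to the DataNode itself, which matches the empty file. A minimal diagnostic sketch, assuming CDH quickstart defaults (hostname quickstart.cloudera and DataNode transfer port 50010, both assumptions to adjust for your VM):

# Run inside the VM: confirm the DataNode is registered and has free space
hdfs dfsadmin -report
# Run from the Windows side: check that the DataNode transfer port is reachable
# (PowerShell equivalent: Test-NetConnection quickstart.cloudera -Port 50010)
nc -zv quickstart.cloudera 50010
# If the port is not reachable, forwarding 50010 in VirtualBox (like 50070) is usually
# the missing piece, since the HDFS client writes blocks directly to the DataNode.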

Related

Unable to archive HDFS data

I am trying to archive HDFS data into a folder located at the path /archives using the command given below:
hadoop archive -archiveName zoo3.har -p /temp/test -r 3 /archives
But I get the following error:
File /archives/zoo3.har/_temporary/1/_temporary/attempt_1498217315045_0006_m_000000_2/part-0 could only be replicated to 0 nodes instead of minReplication (=1). There are 6 datanode(s) running and no node(s) are excluded in this operation.
Cluster Configuration:
6 Data Nodes (all active)
Each with 5 TB disk and 64 GB RAM
HDP - 2.4
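Since the error reports six live DataNodes with none excluded, it is worth confirming that the nodes actually have block space left and that the MapReduce job launched by the archiver can write to HDFS. A small check sketch, using only standard HDFS commands and the paths from the question:

hdfs dfsadmin -report        # each of the 6 DataNodes should show non-zero "DFS Remaining"
hdfs dfs -du -h /temp/test   # size of the data being archived, for comparison
# Once the archive job succeeds, its contents can be listed through the har:// scheme:
hdfs dfs -ls har:///archives/zoo3.har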

HDFS with Talend

I am trying to put a CSV file from my local Windows machine onto HDFS using Talend 6.3.
I have a 4-node cluster (all Linux servers on Azure):
1 NameNode and 3 DataNodes.
I am getting the following error while running the job:
"Exception in component tHDFSPut_1
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/5/source.csv could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation."
The file is getting created in HDFS but it is empty.
Note: I am using my NameNode's IP address in the NameNode URI.
I tried the same with a sandbox and got the same error.
Update:
I created a completely new cluster (1 NameNode and 3 DataNodes) and I am still getting the same error.
Everything is running and shows green in Ambari, and I am able to put files with hadoop fs -put. Could the problem be in Talend?
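When hadoop fs -put works from inside the cluster but a Windows Talend client fails with all DataNodes excluded, the usual cause is that the NameNode hands back DataNode addresses the external client cannot reach (for example private Azure IPs). A hedged way to check, with the property below being a common client-side workaround rather than a guaranteed fix:

# On the NameNode: see which addresses are advertised for the DataNodes
hdfs dfsadmin -report | grep -E 'Name:|Hostname:'
# If those are private IPs, the Windows client cannot open the block-transfer connection,
# which produces exactly "3 node(s) are excluded". Making the client connect by hostname
# (and making those hostnames resolvable from Windows) often helps:
#   dfs.client.use.datanode.hostname = true   (client-side hdfs-site.xml / Talend Hadoop properties)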

hadoop namenode not able to connect to datanode after restart

I have a 4-node hadoop cluster. It is running fine.
I temporarily stop the Hadoop cluster using:
stop-dfs.sh
stop-yarn.sh
When I restart it using:
start-dfs.sh
start-yarn.sh
all Hadoop processes (checked with jps) are running fine on all 4 nodes,
but I get the following error when running a MapReduce job:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hduser4/QuasiMonteCarlo_1421995096158_792628557/in/part0 could only be replicated to 0 nodes instead of minReplication (=1).
There are 0 datanode(s) running and no node(s) are excluded in this operation.
One option for the problem above is to reformat HDFS, but that would delete my existing data, which is not acceptable for a production system.
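Before resorting to a reformat, it is worth finding out why the DataNodes did not re-register. A diagnostic sketch, where the log and storage paths are assumptions to be replaced with the dfs.namenode.name.dir and dfs.datanode.data.dir values from your hdfs-site.xml:

# On a DataNode, look for the classic post-restart failure:
grep -i "Incompatible clusterIDs" $HADOOP_HOME/logs/hadoop-*-datanode-*.log
# Compare the clusterID recorded on the NameNode and on the DataNode:
cat /path/to/namenode/dir/current/VERSION
cat /path/to/datanode/dir/current/VERSION
# If the IDs differ, copying the NameNode's clusterID into the DataNode's VERSION file
# (or clearing only the DataNode storage directory) lets the nodes rejoin without a reformat.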

Can't put files into HDFS

I'm trying to set up a Hadoop multi-node cluster and I'm running into the following problem.
I have one node as master and another one as slave.
It seems that everything is all right, because when I execute jps I get these processes on the master:
29983 SecondaryNameNode
30596 Jps
29671 NameNode
30142 ResourceManager
And these on the slave:
18096 NodeManager
17847 DataNode
18197 Jps
Unfortunately, when I try the -put command, I get this error:
hduser#master:/usr/local/hadoop/bin$ ./hdfs dfs -put /home/hduser/Ejemplos/fichero /Ejemplos/
14/03/24 12:49:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/24 12:49:07 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /Ejemplos/fichero.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
When I go to the WebUI, there are 0 live nodes and I don't know why!
I can't fix this error and I would appreciate some help!
File /Ejemplos/fichero.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
The above error means the DataNodes are either down or unable to communicate properly with the NameNode. Check the configuration you specified in hdfs-site.xml and core-site.xml.
I had a similar issue and solved it like this (see the command sketch after the steps):
stop hadoop (stop-dfs.sh and stop-yarn.sh)
manually delete dfs/namenode and dfs/datanode directories
format namenode (hdfs namenode -format)
start hadoop (start-dfs.sh and start-yarn.sh)
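The same steps as a command sketch. The directory paths are placeholders for whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to, and formatting the NameNode erases all HDFS metadata, so this is only suitable when the data is disposable:

stop-yarn.sh
stop-dfs.sh
rm -rf /path/to/dfs/namenode/* /path/to/dfs/datanode/*   # storage dirs from hdfs-site.xml
hdfs namenode -format
start-dfs.sh
start-yarn.sh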
There might be other issues, such as a lack of disk space or the slaves file in $HADOOP_HOME/etc/hadoop not being configured (it contains only localhost by default).
You will want to check the log files of your DataNode (slave) for errors in your setup. If you run Cloudera CDH, you'll find these in /var/log/hadoop-hdfs; otherwise they are in the directory specified in your config.
The error "could only be replicated to 0 nodes" points to a problem there.
Also make sure that slave and master can connect via ssh with key authentication.
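As a concrete example of what checking the DataNode log looks like (the CDH path is the one mentioned above; on a plain Apache install the logs are under $HADOOP_HOME/logs instead):

tail -n 100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log
# Repeated "Retrying connect to server" or "Connection refused" lines here mean the DataNode
# cannot reach the NameNode address configured as fs.defaultFS in core-site.xml.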
Just a quick question: did you format your namenode?
You are using:
./hdfs dfs -put....
Try
hadoop fs -put LocalFile.name /username/
or
hadoop fs -copyFromLocal LocalFile.name /username/

Hadoop 2.2 Add new Datanode to an existing hadoop installation

I first installed Hadoop 2.2 on my machine (called Abhishek-PC) and everything worked fine; I am able to run the entire system (both NameNode and DataNode) successfully.
Now I have created a VM called hdclient1 and I want to add it as a DataNode.
Here are the steps I have followed:
I set up SSH successfully: I can ssh into hdclient1 without a password, and I can log in from hdclient1 to my main machine without a password.
I set up Hadoop 2.2 on this VM and modified the configuration files as per many tutorials on the web. Here are my configuration files:
Name Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXdEM1WmRqVG5uYlU/edit?usp=sharing
Data Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXRnh3YUo1X2Frams/edit?usp=sharing
Now when I run start-dfs.sh on my first machine, I can see that the DataNode starts successfully on hdclient1. Here is a screenshot from my Hadoop console:
https://drive.google.com/file/d/0B0dV2NMSGYPXOEJ3UV9SV1d5bjQ/edit?usp=sharing
As you can see, both machines appear in my cluster (the main machine and the data node),
although both are called "localhost" for some strange reason.
I can see that the logs are being created on hdclient1, and in those logs there are no exceptions.
Here are the logs from the name node:
https://drive.google.com/file/d/0B0dV2NMSGYPXM0dZTWVRUWlGaDg/edit?usp=sharing
Here are the logs from the data node
https://drive.google.com/file/d/0B0dV2NMSGYPXNV9wVmZEcUtKVXc/edit?usp=sharing
I can log in to the NameNode UI successfully at http://Abhishek-PC:50070,
but in the UI's list of live nodes it shows only 1 live node and there is no mention of hdclient1.
https://drive.google.com/file/d/0B0dV2NMSGYPXZmMwM09YQlI4RzQ/edit?usp=sharing
I can create a directory in HDFS successfully with hadoop fs -mkdir /small.
From the datanode I can see that this directory has been created, using the command hadoop fs -ls /.
Now when I try to add a file to HDFS with
hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
I get an error message:
abhishek#Abhishek-PC:~$ hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
14/01/04 20:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/04 20:07:41 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /small/war_and_peace.txt.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
So my question is: what am I doing wrong here? Why do I get this exception when I try to copy the file into HDFS?
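Given that the console shows both entries as "localhost" and the UI shows only one live node, a reasonable first suspicion is host name resolution rather than HDFS itself. A hedged checklist sketch using the Hadoop 2.2 default file locations (adjust if your layout differs):

cat $HADOOP_HOME/etc/hadoop/slaves     # should list hdclient1 (and the main machine if it runs a DataNode)
hostname -f                            # run on each node; should return the real name, not localhost
grep -n "127.0" /etc/hosts             # a real hostname mapped to 127.0.0.1/127.0.1.1 is a classic cause
grep -A1 fs.defaultFS $HADOOP_HOME/etc/hadoop/core-site.xml   # should point at Abhishek-PC, not localhost
# If hdclient1's core-site.xml points at localhost, its DataNode tries to reach a NameNode on
# localhost and never registers with Abhishek-PC, which would explain the single live node.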
We have a 3-node cluster (all physical boxes) that has been working great for a couple of months. This article helped me the most with the setup.
