Hadoop namenode not able to connect to datanode after restart

I have a 4-node Hadoop cluster that was running fine.
I temporarily stopped the cluster using:
stop-dfs.sh
stop-yarn.sh
When I restart it using:
start-dfs.sh
start-yarn.sh
jps shows all the Hadoop processes running on all 4 nodes,
but I get the following error when running a MapReduce job:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hduser4/QuasiMonteCarlo_1421995096158_792628557/in/part0 could only be replicated to 0 nodes instead of minReplication (=1).
There are 0 datanode(s) running and no node(s) are excluded in this operation.
One suggested fix is to reformat HDFS, but that would delete my existing data, which is not an acceptable solution for a production system.

Related

Adding a node to hadoop cluster without restarting master

I have created a Hadoop cluster and want to add a new node to the cluster, running as a slave, without restarting the master node.
How can this be achieved?
Datanodes and NodeManagers can be added without restarting the namenode(s) or ResourceManager(s).
More specifically, the following commands need to be run on the machines running those services:
Namenode
hdfs dfsadmin -refreshNodes
ResourceManager
yarn rmadmin -refreshNodes
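As a rough sketch (assuming a Hadoop 2.x layout; the hostname newnode and the slaves-file path are placeholders for illustration), adding a worker without touching the master could look like this:

# On the new slave: start the worker daemons against the existing cluster config
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager

# On the master: record the host so future start-dfs.sh / start-yarn.sh runs include it
echo "newnode" >> $HADOOP_HOME/etc/hadoop/slaves

# On the namenode and ResourceManager: refresh the allowed-hosts lists
# (only needed if dfs.hosts / include files are in use)
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes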

Not able to put file on HDFS

I have a CDH VirtualBox VM running on my Windows 10 machine. I am running a simple Talend job, which has only one component (tHDFSPut), to put a file from Windows onto HDFS inside the VM. But when I run the job, the file is created on HDFS but it is empty.
I am getting the following error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/cloudera/test/Input.xml could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1716)
I checked that the namenode is healthy and has sufficient disk space available, and I can also access it from outside the virtual box using localhost:50070.
Need help!

HDFS with Talend

I am trying to put a CSV file from my local Windows machine onto HDFS using Talend 6.3.
I have a 4-node cluster (all Linux servers from Azure):
1 Namenode and 3 Datanodes.
I am getting the following error while running it:
"Exception in component tHDFSPut_1
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/5/source.csv could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation."
The file gets created in HDFS but it is empty.
Note: I am using my namenode's IP address in the NameNode URL.
I tried the same with a sandbox and got the same error.
Update:
I created a completely new cluster (1 namenode and 3 datanodes) and I am still getting the same error.
Everything is running and shows green in Ambari, and I am able to put the file through hadoop fs -put. Might this be a problem in Talend?

Can't put files into HDFS

I'm trying to set up a Hadoop multi-node cluster and I'm running into the following problem.
I have one node as master and another one as slave.
It seems that everything is all right, because when I execute jps I get these processes on the master:
29983 SecondaryNameNode
30596 Jps
29671 NameNode
30142 ResourceManager
And these on the slave:
18096 NodeManager
17847 DataNode
18197 Jps
Unfortunately, when I try the -put command, I get this error:
hduser#master:/usr/local/hadoop/bin$ ./hdfs dfs -put /home/hduser/Ejemplos/fichero /Ejemplos/
14/03/24 12:49:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/24 12:49:07 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /Ejemplos/fichero.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
When I go to the WebUI, there are 0 live nodes and I don't know why!
I can't fix this error and I would appreciate some help!
File /Ejemplos/fichero.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
The above error means the datanodes are either down or unable to properly communicate with the namenode. Check the configuration you specified in hdfs-site.xml and core-site.xml.
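A minimal sketch of those checks (assuming a default Apache layout with configs under $HADOOP_HOME/etc/hadoop):

# Confirm the namenode address the cluster is configured against
grep -A1 fs.defaultFS $HADOOP_HOME/etc/hadoop/core-site.xml

# Ask the namenode how many live datanodes it can actually see;
# 0 here matches the error above
hdfs dfsadmin -report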
I had similar issue and solved it like this:
stop hadoop (stop-dfs.sh and stop-yarn.sh)
manually delete dfs/namenode and dfs/datanode directories
format namenode (hdfs namenode -format)
start hadoop (start-dfs.sh and start-yarn.sh)
There might be other issues, like lack of disk space or the slaves file in $HADOOP_HOME/etc/hadoop not being configured (it contains localhost by default).
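Spelled out as commands, that wipe-and-reformat sequence might look like the following (the dfs/namenode and dfs/datanode paths are examples; they have to match dfs.namenode.name.dir and dfs.datanode.data.dir in your hdfs-site.xml, and the whole procedure destroys all existing HDFS data):

stop-yarn.sh
stop-dfs.sh

# On every node: clear the namenode metadata and datanode block directories
# (example paths; use the ones from your hdfs-site.xml)
rm -rf /usr/local/hadoop/dfs/namenode/*
rm -rf /usr/local/hadoop/dfs/datanode/*

# Reformat the filesystem and bring everything back up
hdfs namenode -format
start-dfs.sh
start-yarn.sh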
You will want to check the log files of your datanode (slave) for errors in your setup. If you run Cloudera CDH, you'll find these in /var/log/hadoop-hdfs, otherwise in the directory specified in your config.
The error "could only be replicated to 0 nodes" points to a problem there.
Also make sure that the slave and master can connect via SSH with key-based authentication.
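As a small sketch of those checks (the log file name and the hostname slave are placeholders and vary by install):

# On the slave: look for connection or clusterID errors in the datanode log
ls $HADOOP_HOME/logs/                       # or /var/log/hadoop-hdfs on CDH
tail -n 100 $HADOOP_HOME/logs/hadoop-hduser-datanode-slave.log

# From the master: verify passwordless SSH to the slave works
ssh hduser@slave jps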
Just a quick question: you did format your namenode?
You are using:
./hdfs dfs -put....
Try
hadoop fs -put LocalFile.name /username/
or
hadoop fs -copyFromLocal LocalFile.name /username/

How to remove a hadoop node from DFS but not from Mapred?

I am fairly new to Hadoop. To run some benchmarks, I need a variety of Hadoop configurations for comparison.
I want to know a way to remove a Hadoop slave from DFS (no longer running the datanode daemon) but not from MapReduce (keep running the tasktracker), or vice versa.
AFAIK, there is a single slaves file for such Hadoop nodes, not separate slaves files for DFS and MapReduce.
Currently, I am starting both DFS and MapReduce on the slave node and then killing the datanode on the slave, but it takes a while for that node to show up under 'dead nodes' in the HDFS web UI. Can any parameter be tuned to make this timeout quicker?
Thanks
Try using dfs.hosts and dfs.hosts.exclude in hdfs-site.xml, and mapred.hosts and mapred.hosts.exclude in mapred-site.xml. These allow or exclude hosts from connecting to the NameNode and the JobTracker.
Once the list of nodes in the files has been updated appropriately, the NameNode and the JobTracker have to be refreshed using the hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes commands respectively.
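As a sketch of the HDFS-only case (the exclude-file paths and the hostname slave1 are placeholders; dfs.hosts.exclude and mapred.hosts.exclude just have to point at whichever files you choose):

# Decommission the node from HDFS only (its tasktracker keeps running)
echo "slave1" >> /etc/hadoop/conf/dfs.exclude      # file named by dfs.hosts.exclude
hadoop dfsadmin -refreshNodes

# Or remove it from MapReduce only (its datanode keeps running)
echo "slave1" >> /etc/hadoop/conf/mapred.exclude   # file named by mapred.hosts.exclude
hadoop mradmin -refreshNodes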
Instead of using the slaves file to start all processes on your cluster, you can start only the required daemons on each machine if you have only a few nodes.
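For example (a sketch using the Hadoop 1.x per-daemon scripts that match the datanode/tasktracker setup in the question):

# On a slave that should only serve HDFS
hadoop-daemon.sh start datanode

# On a slave that should only run MapReduce tasks
hadoop-daemon.sh start tasktracker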

Resources