HBase is not working in Hadoop 2.2.0

I am trying to install hbase-0.96.0-hadoop2 on Hadoop 2.2.0. When I try to start HBase, it gives the following error:
master: log4j:ERROR Could not find value for key log4j.appender.DRFAS
master: log4j:ERROR Could not instantiate appender named "DRFAS".
log4j:ERROR Could not find value for key log4j.appender.DRFAS
log4j:ERROR Could not instantiate appender named "DRFAS".
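For context, DRFAS is the security-audit appender that the stock Hadoop/HBase log4j.properties files normally define. One common fix (an assumption, not part of the original post) is to add the missing definition to HBase's conf/log4j.properties, roughly:
# Assumed security-audit appender definition; the file name and log directory are placeholders
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hbase.log.dir}/SecurityAuth.audit
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n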
When I run jps, Linux shows the following processes:
17422 JobHistoryServer
11461 NameNode
31375 Jps
12127 ResourceManager
11671 DataNode
30077 HRegionServer
12344 NodeManager
11935 SecondaryNameNode
30948 HQuorumPeer
Here is my hbase-site.xml configuration:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/master</value>
</property>
</configuration>

Try these two methods.
Stop your HBase daemon and clear the HBase log files located in the /tmp/ folder: delete every file with "hbase" in its name.
After deleting them, disconnect your machine from the internet and try to start the HBase daemon again.
HBase has this weird issue on some x64 Ubuntu machines; disconnecting from the internet helps resolve it, and after startup you can reconnect to the internet.
Now try to access HBase from the CLI:
bin/hbase
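As a minimal shell sketch of the steps above (assuming a default setup where HBase scratch files live under /tmp and HBASE_HOME points at the install directory; adjust paths to your environment):
# Stop the HBase daemons first
$HBASE_HOME/bin/stop-hbase.sh
# Remove HBase scratch files from /tmp (assumed default location)
rm -rf /tmp/hbase-*
# With the network disconnected, start HBase again
$HBASE_HOME/bin/start-hbase.sh
# Then verify from the shell; the status command should report the master and region servers
$HBASE_HOME/bin/hbase shell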

Related

Fatal error of "failed to become active master" while running hbase in cluster mode

I have 4 nodes, one master and 3 slaves.
master: *.*.*.18; slaves: *.*.*.12, *.*.*.104, *.*.*.36.
Configurations for Hadoop on Namenode:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hduser/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hduser/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
hadoop-env.sh:
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_PID_DIR=${HADOOP_PID_DIR} # defaults to /tmp
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_IDENT_STRING=$USER
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>
slaves:
10.0.3.12
10.0.3.36
10.0.3.104
yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
In the slave nodes the configurations for hadoop are:
yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>10.0.3.18:8050</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>localhost:8035</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
The rest of the files remain the same on all slave nodes as on the master node. With respect to the HBase configuration:
hbase-env.sh(in all):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
export HBASE_MANAGES_ZK=true
hbase-site.xml(in all):
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.0.3.18,10.0.3.12,10.0.3.104,10.0.3.36</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hduser/Downloads/hbase/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>1200000</value>
</property>
<property>
<name>hbase.zookeeper.property.tickTime</name>
<value>6000</value>
</property>
</configuration>
except that on the slaves, localhost is changed to 10.0.3.18 (the address of the namenode).
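For clarity, that means the slave-side hbase.rootdir presumably reads (reconstructed from the description above):
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.0.3.18:9000/hbase</value>
</property>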
regionservers:
10.0.3.12
10.0.3.104
10.0.3.36
I formatted the namenode, and when I start HDFS with the commands start-dfs.sh and start-yarn.sh, the output is as follows:
...successfully formatted namenode...
localhost: starting namenode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-namenode-saichanda-OptiPlex-9020.out
10.0.3.12: starting datanode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-datanode-aaron.out
10.0.3.36: starting datanode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-datanode-dmacs-OptiPlex-9020.out
10.0.3.104: starting datanode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-datanode-hadoop-104.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-secondarynamenode-saichanda-OptiPlex-9020.out
starting yarn daemons
starting resourcemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-resourcemanager-saichanda-OptiPlex-9020.out
10.0.3.12: starting nodemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-nodemanager-aaron.out
10.0.3.36: starting nodemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-nodemanager-dmacs-OptiPlex-9020.out
10.0.3.104: starting nodemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-nodemanager-hadoop-104.out
when I run jps command (on master):
28032 SecondaryNameNode
28481 Jps
28198 ResourceManager
27720 NameNode
when I run jps command (on slaves):
11303 DataNode
11595 Jps
11436 NodeManager
Then I started HBase with the command ./start-hbase.sh. The output is:
10.0.3.12: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-aaron.out
10.0.3.36: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-dmacs-OptiPlex-9020.out
10.0.3.104: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-hadoop-104.out
10.0.3.18: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-saichanda-OptiPlex-9020.out
running master, logging to /home/hduser/Downloads/hbase/logs/hbase-hduser-master-saichanda-OptiPlex-9020.out
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
10.0.3.12: running regionserver, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-regionserver-aaron.out
10.0.3.36: running regionserver, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-regionserver-dmacs-OptiPlex-9020.out
10.0.3.104: running regionserver, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-regionserver-hadoop-104.out
10.0.3.12: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
10.0.3.12: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
10.0.3.36: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
10.0.3.36: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
10.0.3.104: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
10.0.3.104: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
when I run jps on namenode:
28032 SecondaryNameNode
28821 HQuorumPeer
29126 Jps
28198 ResourceManager
27720 NameNode
when I run jps on slaves:
11776 HRegionServer
11669 HQuorumPeer
11303 DataNode
11899 Jps
11436 NodeManager
What I observed is that HMaster is not running on the namenode. Can anyone help me understand why HMaster is crashing? After some time, the NodeManagers on the slaves crash as well. I also observed that when I shut down HBase, the HRegionServers on the slaves do not go down; they keep running even after I issue the stop-hbase.sh command on the master node. The key warnings and errors observed in my logs are as follows.
hadoop-namenode.log: multiple times I get this Exception...
java.io.IOException: File /hbase/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
hadoop-secondary-namenode.log: multiple times I get this ERROR...
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
No error found in yarn-resourcemanager.log.
For hbase logs: in hbase-master.log:
FATAL [saichanda-OptiPlex-9020:16000.activeMasterManager] master.HMaster: Failed to become active master
File /hbase/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
In hbase-zookeeper.log I see this line; otherwise there were no errors in the log.
2019-01-29 10:09:49,431 INFO [main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
on one of the slaves, regionserver.log:
client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
on one of the slaves, hadoop-datanode.log repeatedly gives the following warning:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: localhost/127.0.0.1:9000
Among all the above warnings and errors, I feel the error in hbase-master.log is the critical one, where it says: replicated to 0 nodes instead of minReplication (=1). Please help me solve this issue.
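A quick, standard way to confirm that suspicion (a hint, not from the original post) is to ask HDFS how many DataNodes have actually registered with the NameNode:
hdfs dfsadmin -report
# "Live datanodes (0)" here would confirm that no DataNode registered, which matches
# the "Problem connecting to server: localhost/127.0.0.1:9000" warning above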
Also, when I finally run the hbase shell, I get the error:
ERROR: Can't get master address from ZooKeeper; znode data == null
Thank you.
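A hedged aside on the lingering HRegionServers: when stop-hbase.sh does not reach the slaves, each region server can usually be stopped locally with HBase's per-daemon script, run on each slave:
# Standard HBase per-daemon stop command (path taken from the install directory above)
/home/hduser/Downloads/hbase/bin/hbase-daemon.sh stop regionserver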

httpfs error Operation category READ is not supported in state standby

I am working on Apache Hadoop 2.7.1 and I have a cluster that consists of 3 nodes:
nn1
nn2
dn1
nn1 is the fs.defaultFS host, so it is the master NameNode.
I have installed HttpFS and started it, of course after restarting all the services. When nn1 is active and nn2 is standby, I can send this request
http://nn1:14000/webhdfs/v1/aloosh/oula.txt?op=open&user.name=root
from my browser, and an open-or-save dialog for the file appears. But when I kill the NameNode running on nn1 and start it again, then because of high availability nn1 becomes standby and nn2 becomes active.
HttpFS should still work here even though nn1 has become the standby, but sending the same request now
http://nn1:14000/webhdfs/v1/aloosh/oula.txt?op=open&user.name=root
gives me the error
{"RemoteException":{"message":"Operation category READ is not supported in state standby","exception":"RemoteException","javaClassName":"org.apache.hadoop.ipc.RemoteException"}}
Shouldn't HttpFS work around nn1's standby status and fetch the file? Is this caused by a wrong configuration, or is there some other reason?
My core-site.xml is
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
It looks like your HttpFS setup is not high-availability aware yet. This could be due to missing configuration that clients require in order to connect to the currently active NameNode.
Ensure the fs.defaultFS property in core-site.xml is configured with the correct nameservice ID.
If you have the below in hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
then in core-site.xml, it should be
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
Also configure the Java class that the DFS client uses to determine which NameNode is currently active and serving client requests.
Add this property to hdfs-site.xml
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Restart the NameNodes and HttpFS after adding these properties on all nodes.
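To verify failover after the restart, one standard check (assuming the nameservice ID mycluster and NameNode IDs nn1/nn2 as configured via dfs.ha.namenodes.mycluster, which is an assumption here) is:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# One should report "active" and the other "standby"; client requests against
# hdfs://mycluster should then follow whichever NameNode is active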

Namenode daemon not starting properly

I have just started learning Hadoop from the book Hadoop: The Definitive Guide.
I followed the book's tutorial for installing Hadoop in pseudo-distributed mode and enabled passwordless SSH login.
I formatted the HDFS filesystem before using it for the first time, and it started successfully the first time.
After that I copied a text file to HDFS with copyFromLocal and everything went fine. But if I restart the system and start the daemons again, the web UI shows that only YARN started successfully.
When I issue the stop-dfs.sh command I get:
Stopping namenodes on [localhost]
localhost: no namenode to stop
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
If I format the HDFS filesystem again and then start the daemons, they all start successfully.
Here are my configuration files, exactly as given in the Definitive Guide.
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
This is the error in the namenode log file
WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop/dfs/name does not exist
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
This is from mapred log
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 33 more
I visited apache hadoop : connection refused, which says:
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
I found there is such an entry in my /etc/hosts, but if I remove it, my sudo breaks with the error sudo: unable to resolve host. What should I put in /etc/hosts, if not remove the mapping of my hostname to 127.0.1.1?
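For reference, a typical Ubuntu /etc/hosts that keeps sudo working looks roughly like this (an illustration, not a prescribed fix; the point of the advice above is that the hostname Hadoop binds to should not resolve to a loopback address):
127.0.0.1 localhost
127.0.1.1 your-hostname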
I cannot understand what is the root cause of this problem.
Well, your NameNode log file says that the default storage directory for the namenode is /tmp/hadoop. On many Linux systems the /tmp directory is cleared on reboot, so that must be the problem.
You need to change the default namenode and datanode directories in your hdfs-site.xml configuration file.
Add this in your hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/"your-user-name"/hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/"your-user-name"/datanode</value>
</property>
After this, format your namenode with the hdfs namenode -format command.
I think this will solve your problem.
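A minimal sketch of those steps, assuming the paths from the snippet above (replace "your-user-name" with your actual user):
# Create the new storage directories; they must exist and be writable by the hadoop user
mkdir -p /home/your-user-name/hadoop /home/your-user-name/datanode
# Re-format the namenode so it initializes the new directory
hdfs namenode -format
# Restart HDFS
start-dfs.sh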
If the configuration files are not the problem, please try the following:
1. First, delete all contents of the temporary folder:
rm -Rf <tmp dir> (mine was /usr/local/hadoop/tmp)
2. Format the namenode:
bin/hadoop namenode -format
3. Start all processes again:
bin/start-all.sh

Hadoop cannot start the Thrift server, and Hue can't communicate with the Hadoop namenode and datanode

I installed Hadoop CDH3u6 on 3 machines, but when I start Hadoop and check the namenode log, I find:
2014-06-22 13:58:39,535 WARN org.apache.hadoop.util.PluginDispatcher: Unable to load dfs.namenode.plugins plugins
So the Hadoop Thrift server cannot start, and Hue gives an exception:
Exception communicating with HDFS Namenode HUE Plugin at x.x.x.x:50903: Could not connect to x.x.x.x:50903
My Hadoop configs are as follows:
hdfs-site.xml:
<property>
<name>dfs.namenode.plugins</name>
<value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
<description>Comma-separated list of namenode plugins to be activated.
</description>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
<description>Comma-separated list of datanode plugins to be activated.
</description>
</property>
<property>
<name>dfs.thrift.address</name>
<value>0.0.0.0:50903</value>
</property>
Is the plugin jar installed? e.g. /usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar
Could you list the installed Hadoop packages?
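A quick way to check both (a generic sketch; the exact package names depend on your distribution):
# Is the Hue plugin jar on the Hadoop classpath?
ls /usr/lib/hadoop/lib/ | grep -i hue
# Which Hadoop packages are installed? (rpm-based systems; use dpkg -l | grep hadoop on Debian/Ubuntu)
rpm -qa | grep -i hadoop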

Installing Hadoop on NFS

As a start, I've installed Hadoop (0.15.2) and set up a cluster of 3 nodes: one each for the NameNode, DataNode and JobTracker. All the daemons are up and running, but when I issue any command I get an error. For instance, when I do a copyFromLocal, I get the error shown below.
Am I missing something?
More details:
I am trying to install Hadoop on an NFS file system. I installed version 1.0.4 and tried running it, but to no avail: it doesn't start the datanode, and the datanode log files are empty. Hence I switched back to version 0.15, which at least started all the daemons.
I believe the problem is due to the underlying NFS file system, i.e. all the datanodes and masters using the same files and folders, but I am not sure that is actually the case.
But I don't see any reason why I shouldn't be able to run Hadoop on NFS (after appropriately setting the configuration parameters).
Currently I am trying to figure out whether I can set the name and data directories differently for different machines, based on the individual machine names.
Configuration file: (hadoop-site.xml)
<property>
<name>fs.default.name</name>
<value>mumble-12.cs.wisc.edu:9001</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>mumble-13.cs.wisc.edu:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.secondary.info.port</name>
<value>9002</value>
</property>
<property>
<name>dfs.info.port</name>
<value>9003</value>
</property>
<property>
<name>mapred.job.tracker.info.port</name>
<value>9004</value>
</property>
<property>
<name>tasktracker.http.port</name>
<value>9005</value>
</property>
Error using Hadoop 1.0.4 (DataNode doesn't get started):
2013-04-22 18:50:50,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001, call addBlock(/tmp/hadoop-akshar/mapred/system/jobtracker.info, DFSClient_502734479, null) from 128.105.112.13:37204: error: java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-akshar/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Error using Hadoop 0.15.2:
[akshar#mumble-12] (38)$ bin/hadoop fs -copyFromLocal lib/junit-3.8.1.LICENSE.txt input
13/04/17 03:22:11 WARN fs.DFSClient: Error while writing.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:120)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1478)
(the same WARN fs.DFSClient: Error while writing / java.net.SocketException: Connection reset stack trace repeats twice more)
copyFromLocal: Connection reset
I was able to get Hadoop to run over NFS using version 1.1.2. It might work for other versions, but I can't guarantee anything.
If you have an NFS file system, then each node should have access to it. The fs.default.name property tells Hadoop the filesystem URI to use, so it should point at the local filesystem. I'll assume that your NFS directory is mounted on each node at /nfs.
In core-site.xml you should define:
<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/nfs/tmp</value>
</property>
In mapred-site.xml you should define:
<property>
<name>mapred.job.tracker</name>
<value>node1:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred-local</value>
</property>
Since hadoop.tmp.dir points to the NFS drive, the default locations of mapred.system.dir and mapreduce.jobtracker.staging.root.dir also land on the NFS drive. It might run with the default value of mapred.local.dir, but that property is supposed to point to the local filesystem, so to be safe you can put it in /tmp.
You don't have to worry about hdfs-site.xml. This configuration file is used when you start the namenode, but with everything shared on the NFS drive you shouldn't run HDFS at all.
Now you can run start-mapred.sh on the jobtracker node and run a Hadoop job. Don't run start-all.sh or start-dfs.sh, because those will start HDFS. If you run multiple DataNodes that point to the same NFS directory, one DataNode will lock that directory and the others will shut down because they are unable to obtain a lock.
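As a hedged illustration of that locking: HDFS storage directories are guarded by an in_use.lock file, so with hadoop.tmp.dir set to /nfs/tmp the default data directory would be /nfs/tmp/dfs/data, and the first DataNode to start would create:
# The presence of this file is what blocks a second DataNode from sharing the directory
ls /nfs/tmp/dfs/data/in_use.lock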
I tested the configuration with:
bin/hadoop jar hadoop-examples-1.1.2.jar wordcount /nfs/data/test.text /nfs/out
Note that you need to specify full paths to the input and output locations.
I also tried:
bin/hadoop jar hadoop-examples-1.1.2.jar grep /nfs/data/loremIpsum.txt /nfs/out2 lorem
It gave me the same output as when I ran it in standalone mode, so I assume it is performing correctly.
Here is more information on fs.default.name:
http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem
