I have been trying to start a MapReduce job on my cluster with the following command:
bin/hadoop jar myjar.jar MainClass /user/hduser/input /user/hduser/output
But I get the following error over and over again, until the connection is finally refused:
13/08/08 00:37:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
I then checked with netstat to see whether the service was listening on the correct port:
~> sudo netstat -plten | grep java
tcp 0 0 10.1.1.4:54310 0.0.0.0:* LISTEN 10022 38365 11366/java
tcp 0 0 10.1.1.4:54311 0.0.0.0:* LISTEN 10022 32164 11829/java
Now I notice that the service is listening on 10.1.1.4:54310, which is the IP of my master, but the 'hadoop jar' command seems to connect to 127.0.0.1 (localhost, which is the same machine) and therefore doesn't find the service. Is there any way to force 'hadoop jar' to connect to 10.1.1.4 instead of 127.0.0.1?
My NameNode, DataNode, JobTracker, TaskTracker, ... are all running. I even checked the DataNode and TaskTracker on the slaves, and everything seems to be working. The web UI on the master shows the cluster is online.
I suspect the problem is DNS-related, since the 'hadoop jar' command finds the correct port but always uses the 127.0.0.1 address instead of 10.1.1.4.
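Note: if the driver class parses generic options (via ToolRunner/GenericOptionsParser), the NameNode address can in principle be overridden per job along these lines; this is only a sketch, and it only helps if the URI actually comes from the configuration rather than being hard-coded in the jar (which, as the update below shows, turned out to be the case here):
bin/hadoop jar myjar.jar MainClass -fs hdfs://10.1.1.4:54310 /user/hduser/input /user/hduser/output
# or, equivalently, override the property directly:
bin/hadoop jar myjar.jar MainClass -D fs.default.name=hdfs://10.1.1.4:54310 /user/hduser/input /user/hduser/output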
UPDATE
Configuration in core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Configuration in mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Configuration in hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Although it seemed to be a DNS issue, it was actually Hadoop resolving a reference to localhost that was hard-coded in the code. I was deploying someone else's jar and assumed it was correct. On closer inspection I found the reference to localhost, changed it to master, and that solved the issue.
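For reference, a minimal sketch of the kind of change involved (illustrative only, not the actual third-party code): rather than hard-coding the filesystem URI in the driver, let the Configuration pick it up from core-site.xml on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
// Inside the driver's main()/run():
Configuration conf = new Configuration();                  // loads core-site.xml etc. from the classpath
// conf.set("fs.default.name", "hdfs://localhost:54310");  // a hard-coded value like this was the culprit
FileSystem fs = FileSystem.get(conf);                      // now resolves hdfs://master:54310 from the config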
Related
I have configured Hadoop to use an IP address instead of localhost, but when I start it, it still reports an error that it cannot ssh to localhost.
hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///<hdfs_path>/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///<hdfs_path>/hdfs/namenode</value>
</property>
</configuration>
core-site.xml:
<name>fs.defaultFS</name>
<value>hdfs://<IP_address>/</value>
<description>NameNode URI</description>
When I started Hadoop, the below error appeared:
localhost: ssh_exchange_identification: Connection closed by remote host
When I allowed ssh to localhost on our test server, the error disappeared. The problem is that on our production server, ssh to localhost is not allowed. I tried to avoid localhost and use the IP address instead, so why does Hadoop still need to ssh to localhost?
Only the start-dfs, start-yarn, and start-all shell scripts use SSH, and those SSH connections have nothing to do with the XML files.
That's not required; you can run hadoop namenode, hadoop datanode, and the YARN daemon processes directly. However, you'll still need some way to get a shell on each machine to run those commands if they don't start at boot or ever fail to (re)start.
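For example, on each machine (script names vary by Hadoop version; the 2.x layout is assumed here, while 3.x uses "hdfs --daemon start namenode" and "yarn --daemon start nodemanager"):
# On the NameNode machine
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
# On each DataNode machine
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
# YARN daemons
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager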
Internally (i.e. on the internal network, private IP to private IP), I can access my HDFS just fine using:
hdfs dfs -ls hdfs://#.#.#.#/
However, when I try the same from a machine outside the network on which the HDFS namenode resides (obviously using the namenode machine's WAN IP instead of its LAN IP), I get:
ls: DestHost:destPort ec2-▒-▒-▒-▒.compute-1.amazonaws.com:8020 , LocalHost:localPort mymachine/127.0.0.1:0. Failed on local exception: java.io.IOException: Connection reset by peer
The namenode log reads:
INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ▒.▒.▒.▒:▒ threw exception [java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:3486)
at org.apache.hadoop.ipc.Server.access$2600(Server.java:138)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:2144)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:1389)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1245)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1216)
My core-site.xml reads:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:8020</value>
</property>
</configuration>
Note that I have also tried setting the fs.defaultFS value to hdfs://#.#.#.#:8020. I have also tried setting it to hdfs://hadoophost:8020, and adding #.#.#.# hadoophost to the top of /etc/hosts. (#.#.#.# is obviously the LAN IP of the namenode's machine in both cases.) The results have been the same.
My hdfs-site.xml reads:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
Note that I am able to telnet externally to the namenode's machine on port 8020 just fine.
What setting(s) am I missing to enable external access to my Hadoop file system?
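One direction that is sometimes suggested (untested against this setup, and it does not rule out a firewall/security-group issue or a client/server version mismatch) is to keep fs.defaultFS pointing at the externally resolvable hostname, and let the NameNode RPC server bind on all interfaces via dfs.namenode.rpc-bind-host in hdfs-site.xml, roughly:
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>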
I'm trying to run a Hadoop cluster via Docker. I have one virtual machine as the namenode and another as the datanode, but the datanode gives me this error when running start-dfs.sh:
namenode: namenode running as process 130. Stop it first.
The command jps on the datanode does not show the namenode running. Then I try to start it by hand, using:
hadoop namenode
And it fails with this error:
java.net.BindException: Problem binding to [namenode:9000] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
So far it seems that the namenode is not accessible or is not listening on port 9000. But the network setup is correct: if I execute this on the datanode:
telnet namenode 9000
It correctly connects to the namenode, and the command netstat -apn | grep 9000 from namenode shows the incoming connection. If I shut down dfs on namenode (stop-dfs.sh), the telnet command from datanode fails with "Connection closed by foreign host."
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value> <!-- I have tried with 1 and 2 too -->
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
Thanks!
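For what it's worth, "Cannot assign requested address" usually means the process is trying to bind an address that is not local to the machine it runs on, so a quick check on the datanode container might look like this (standard Linux tools, nothing specific to this setup):
getent hosts namenode   # what "namenode" resolves to from this container
hostname -i             # addresses that are actually local to this container
# "hadoop namenode" can only bind namenode:9000 on the machine that owns that resolved address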
I'm currently using Hadoop 1.2.1 (because I need to run spatial processing software that only supports this version). I'm trying to deploy in multi-node mode with one master and three slaves.
I'm sure I can ssh between the master and all slaves without a password (including to themselves), and the hostname on each node is correct.
Each node shares the same hosts file:
192.168.56.101 master
192.168.56.102 slave1
192.168.56.103 slave2
192.168.56.104 slave3
I keep having problems on the slave nodes; the error log is as follows:
2015-05-21 23:39:16,841 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:212)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:244)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:236)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:359)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:181
Configurations in core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
In mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracter</name>
<value>master:8012</value>
</property>
</configuration>
In hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
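Incidentally, the "file:///" in that DataNode error usually indicates that fs.default.name was never picked up on the slave, i.e. it fell back to the default local filesystem. A hedged first check (paths assume the standard 1.2.1 layout) is that core-site.xml is actually present and identical on every slave:
# On each slave, inspect what the datanode will read:
cat $HADOOP_HOME/conf/core-site.xml
# If it is missing or stale, copy it over from the master, e.g.:
scp master:$HADOOP_HOME/conf/core-site.xml $HADOOP_HOME/conf/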
There could be a problem with the naming convention of your node hostnames.
Make sure they do not contain symbols like "_".
Check Wikipedia for restrictions.
Try changing "master" to the actual IP address in all your config files.
Your configuration looks OK. You need to run "$HADOOP_HOME/bin/hdfs namenode -format" first, and after that run "$HADOOP_HOME/sbin/start-dfs.sh".
I have installed Hadoop on my system. The JobTracker at localhost:50030/jobtracker.jsp is working fine, but localhost:50075/ does not resolve. Can anybody help me figure out what the problem is on my Ubuntu system? Below is my code-site.xml configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
I've never seen 50075 before, but 50070 is the local NameNode web UI. I suggest you format the NameNode and try again:
rm -r /tmp/hadoop-*;
bin/hadoop namenode -format;
./bin/start-all.sh
Check your port configurations. Here is a list of the configurable Hadoop daemon parameters:
dfs.http.address: The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
The default for the name node is usually set to port 50070, so try localhost:50070.
http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
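For example, in hdfs-site.xml (property name as in the 1.x/0.20 documentation linked above; 0.0.0.0:50070 is the usual default):
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>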
I think you mean core-site.xml (not code-site.xml). The fs.default.name setting there does not determine the location of the Hadoop web UI/dashboard; it is the port that DataNodes and clients use to communicate with the NameNode.