Cannot Read a file from HDFS using Spark

I have installed Cloudera CDH 5 using Cloudera Manager.
I can easily run
hadoop fs -ls /input/war-and-peace.txt
hadoop fs -cat /input/war-and-peace.txt
The cat command prints the whole text file to the console.
Now I start the Spark shell and run
val textFile = sc.textFile("hdfs://input/war-and-peace.txt")
textFile.count
Now I get an error:
Spark context available as sc.
scala> val textFile = sc.textFile("hdfs://input/war-and-peace.txt")
2014-12-14 15:14:57,874 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177621) called with curMem=0, maxMem=278302556
2014-12-14 15:14:57,877 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory (estimated size 173.5 KB, free 265.2 MB)
textFile: org.apache.spark.rdd.RDD[String] = hdfs://input/war-and-peace.txt MappedRDD[1] at textFile at <console>:12
scala> textFile.count
2014-12-14 15:15:21,791 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 0 time(s); maxRetries=45
2014-12-14 15:15:41,905 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 1 time(s); maxRetries=45
2014-12-14 15:16:01,925 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 2 time(s); maxRetries=45
2014-12-14 15:16:21,983 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 3 time(s); maxRetries=45
2014-12-14 15:16:42,001 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 4 time(s); maxRetries=45
2014-12-14 15:17:02,062 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 5 time(s); maxRetries=45
2014-12-14 15:17:22,082 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 6 time(s); maxRetries=45
2014-12-14 15:17:42,116 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 7 time(s); maxRetries=45
2014-12-14 15:18:02,138 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 8 time(s); maxRetries=45
2014-12-14 15:18:22,298 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 9 time(s); maxRetries=45
2014-12-14 15:18:42,319 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 10 time(s); maxRetries=45
2014-12-14 15:19:02,354 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 11 time(s); maxRetries=45
2014-12-14 15:19:22,373 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 12 time(s); maxRetries=45
2014-12-14 15:19:42,424 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 13 time(s); maxRetries=45
2014-12-14 15:20:02,446 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 14 time(s); maxRetries=45
2014-12-14 15:20:22,512 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 15 time(s); maxRetries=45
2014-12-14 15:20:42,515 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 16 time(s); maxRetries=45
2014-12-14 15:21:02,550 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 17 time(s); maxRetries=45
2014-12-14 15:21:22,558 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 18 time(s); maxRetries=45
2014-12-14 15:21:42,683 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 19 time(s); maxRetries=45
2014-12-14 15:22:02,702 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 20 time(s); maxRetries=45
2014-12-14 15:22:22,832 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 21 time(s); maxRetries=45
2014-12-14 15:22:42,852 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 22 time(s); maxRetries=45
2014-12-14 15:23:02,974 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 23 time(s); maxRetries=45
2014-12-14 15:23:22,995 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 24 time(s); maxRetries=45
2014-12-14 15:23:43,109 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 25 time(s); maxRetries=45
2014-12-14 15:24:03,128 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 26 time(s); maxRetries=45
2014-12-14 15:24:23,250 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 27 time(s); maxRetries=45
java.net.ConnectException: Call From dn1home/192.168.1.21 to input:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
Why do I get this error? I am able to read the same file using hadoop commands.

Here is the solution
sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt")
How did I find out nn1home:8020?
Search for the file core-site.xml and look for the XML element fs.defaultFS.
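The core-site.xml lookup described above can also be scripted. This is only a minimal sketch using Python's standard XML parser; the embedded sample file and the helper name default_fs are made up for illustration, and a real core-site.xml lives in your Hadoop configuration directory. It also accepts fs.default.name, the older name for the same setting.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content, used only so the sketch is self-contained.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1home:8020</value>
  </property>
</configuration>"""

def default_fs(xml_text):
    """Return the value of fs.defaultFS (or the older fs.default.name), or None."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in ("fs.defaultFS", "fs.default.name"):
            found[name] = prop.findtext("value", default="").strip()
    # Prefer the current property name over the deprecated one.
    return found.get("fs.defaultFS") or found.get("fs.default.name")

print(default_fs(SAMPLE_CORE_SITE))
```

The returned value is the prefix to put in front of the absolute HDFS path.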

If you want to use sc.textFile("hdfs://...") you need to give the full (absolute) URI, including the NameNode host and port; in your example that would be "nn1home:8020/..".
If you want to keep it simple, just use sc.textFile("hdfs:/input/war-and-peace.txt").
Note that there is only one / after hdfs:.
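To see why the URI from the question fails, it can help to split it the way a generic URI parser does. The sketch below uses Python's urllib.parse rather than Hadoop's own path handling (an assumption made purely for illustration), and describe is a hypothetical helper name. In hdfs://input/war-and-peace.txt the first segment after // lands in the host slot, which matches the log lines showing connection attempts to input:8020.

```python
from urllib.parse import urlparse

def describe(uri):
    """Return (host, port, path) as a generic URI parser sees them."""
    parts = urlparse(uri)
    return parts.hostname, parts.port, parts.path

# "input" is read as the NameNode host, not as a directory.
print(describe("hdfs://input/war-and-peace.txt"))
# Host and port given explicitly; the path keeps its /input prefix.
print(describe("hdfs://nn1home:8020/input/war-and-peace.txt"))
# Single slash: no host at all, so (per the answer above) the configured default is used.
print(describe("hdfs:/input/war-and-peace.txt"))
```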

This will work:
val textFile = sc.textFile("hdfs://localhost:9000/user/input.txt")
Here, localhost:9000 comes from the fs.defaultFS parameter value in Hadoop's core-site.xml config file.

You are not passing a proper URL string. The parts are:
hdfs:// - protocol type
localhost - host name or IP address of the NameNode (may be different for you, e.g. 127.56.78.4)
54310 - port number
/input/war-and-peace.txt - complete path to the file you want to load
Finally, the URL should look like this:
hdfs://localhost:54310/input/war-and-peace.txt

If you started Spark with HADOOP_HOME set in spark-env.sh, Spark knows where to look for the HDFS configuration files.
In this case Spark already knows the location of your NameNode/DataNodes, and the following alone should work to access HDFS files:
sc.textFile("/myhdfsdirectory/myfiletoprocess.txt")
You can create myhdfsdirectory as follows:
hdfs dfs -mkdir /myhdfsdirectory
and copy myfiletoprocess.txt from your local file system to the HDFS directory with the command below:
hdfs dfs -copyFromLocal mylocalfile /myhdfsdirectory/myfiletoprocess.txt

I'm also using CDH5. For me the full path, i.e. "hdfs://nn1home:8020", does not work for some strange reason, even though most examples show the path like that.
I used a command like
val textFile=sc.textFile("hdfs:/input1/Card_History2016_3rdFloor.csv")
Output of the above command:
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:22
textFile.count
res1: Long = 58973
and this works fine for me.

This worked for me
logFile = "hdfs://localhost:9000/sampledata/sample.txt"

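import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("HDFSFileReader")
// Hadoop settings passed through SparkConf need the "spark.hadoop." prefix.
conf.set("spark.hadoop.fs.defaultFS", "hdfs://hostname:9000")
val sc = new SparkContext(conf)
// Read every file under the HDFS directory.
val data = sc.textFile("hdfs://hostname:9000/hdfspath/")
// Explicit file:// scheme so the output goes to the local disk, not HDFS.
data.saveAsTextFile("file:///C:/dummy")
The above code reads all HDFS files from the directory and saves them locally in the C:\dummy folder.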

It might be an issue with the file path or URL, or with the HDFS port.
Solution:
First open the core-site.xml file in $HADOOP_HOME/etc/hadoop and check the value of the property fs.defaultFS.
Let's say the value is hdfs://localhost:9000 and the file location in HDFS is /home/usr/abc/fileName.txt.
Then the file URL will be: hdfs://localhost:9000/home/usr/abc/fileName.txt
and the following command reads the file from HDFS:
var result= scontext.textFile("hdfs://localhost:9000/home/usr/abc/fileName.txt", 2)
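The rule in this answer (fs.defaultFS value plus absolute HDFS path) is simple enough to write down as a helper. This is just a sketch; hdfs_url is a hypothetical name, not part of Spark or Hadoop.

```python
def hdfs_url(default_fs, path):
    """Join an fs.defaultFS value and an absolute HDFS path into one URL."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path, got: " + path)
    # Drop a trailing slash on the prefix so the result has exactly one separator.
    return default_fs.rstrip("/") + path

print(hdfs_url("hdfs://localhost:9000", "/home/usr/abc/fileName.txt"))
```

The resulting string is what gets passed to textFile.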

Get the fs.defaultFS URL from core-site.xml (in /etc/hadoop/conf) and read the file as below. In my case, fs.defaultFS is hdfs://quickstart.cloudera:8020
txtfile=sc.textFile('hdfs://quickstart.cloudera:8020/user/cloudera/rddoutput')
txtfile.collect()

Related

Yarn Container wrong hostname when contacting ResourceManager

I'm trying to write a simple query in Hive (just an INSERT) but I'm having issues with how MapReduce jobs are being provisioned. Containers are getting allocated correctly, but my jobs never run.
It seems that they're contacting the ResourceManager incorrectly. I have verified (via JPS) that my ResourceManager is indeed running, and is running on hostname hadoop1.personal which all servers have a reference to in /etc/hosts. The issue looks like this:
2016-09-27 09:41:55,223 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-09-27 09:41:55,224 INFO [Socket Reader #1 for port 45744] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45744
2016-09-27 09:41:55,230 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-09-27 09:41:55,230 INFO [IPC Server listener on 45744] org.apache.hadoop.ipc.Server: IPC Server listener on 45744: starting
2016-09-27 09:41:55,299 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2016-09-27 09:41:55,300 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2016-09-27 09:41:55,300 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2016-09-27 09:41:55,375 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2016-09-27 09:41:56,414 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:57,415 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:58,415 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:59,416 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:42:00,417 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
And of course it does go on for some time before eventually dying.
Now, I know that my configurations are getting picked up in some sense. Earlier in the logs, the containers say 2016-09-27 09:41:52,783 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hadoop1.personal:8020] which is the correct NameNode to be using.
Additionally, if I go to the NodeManager configuration (i.e. http://hadoop2.personal:8042/conf) then I can see that:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1.personal</value>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<source>yarn-default.xml</source>
</property>
So the NodeManager appears to know exactly where it needs to be at.
This seems incredibly strange to me: The NodeManager and ResourceManagers are talking together just fine, but containers are contacting the wrong scheduler. How do I control the address the containers are contacting for scheduling?
As a sidenote, I have tested this both with and without IPv6 enabled as recommended in this answer. No effect.
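One possible reading of the 0.0.0.0:8030 in the log, offered as a guess rather than a diagnosis: yarn-default.xml ships yarn.resourcemanager.hostname as 0.0.0.0, so a process that does not see the yarn-site.xml override would expand the scheduler address to exactly that value. The sketch below imitates the ${...} substitution in plain Python; expand is a hypothetical helper, not Hadoop's Configuration class.

```python
import re

def expand(props, key):
    """Resolve ${name} references in a property value, Hadoop-style."""
    value = props[key]
    # Replace each ${name} with that property's (recursively expanded) value.
    return re.sub(r"\$\{([^}]+)\}", lambda m: expand(props, m.group(1)), value)

# Defaults as shipped in yarn-default.xml.
defaults = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.scheduler.address": "${yarn.resourcemanager.hostname}:8030",
}
# The same defaults with the yarn-site.xml override from the question applied.
overridden = {**defaults, "yarn.resourcemanager.hostname": "hadoop1.personal"}

print(expand(defaults, "yarn.resourcemanager.scheduler.address"))
print(expand(overridden, "yarn.resourcemanager.scheduler.address"))
```

If that reading is right, the thing to check is whether the containers actually load the yarn-site.xml that the NodeManager reports.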

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 0 time(s)

I configured an Apache Hadoop cluster with 1 NameNode and 2 DataNodes in VMware Workstation. The NameNode is working fine, and passwordless SSH login is set up too, but when I try to start a DataNode I get the following error.
The logs on both DataNodes show repeated "Retrying connect" messages for the NameNode, even though I can ping and connect to the NameNode without error.
Below is the DataNode log:
2015-11-14 19:54:22,622 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = dn2.hcluster.com/192.168.155.133
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_65
************************************************************/
2015-11-14 19:54:23,447 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-11-14 19:54:23,485 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2015-11-14 19:54:23,486 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-11-14 19:54:23,486 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-11-14 19:54:23,876 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2015-11-14 19:54:25,720 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:27,723 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:28,726 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:29,729 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:30,733 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:31,753 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:32,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:33,758 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:34,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:35,764 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:35,922 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to nn1.hcluster.com/192.168.155.131:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:414)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:374)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:453)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:335)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:300)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
at org.apache.hadoop.ipc.Client.call(Client.java:1093)
... 16 more
2015-11-14 19:54:35,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at dn2.hcluster.com/192.168.155.133
************************************************************/
From DataNodes 1 and 2, the NameNode and its GUI are reachable, and all 3 machines can communicate with each other via ping and passwordless SSH. Please help.
core-site.xml on the NameNode:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://nn01.hcluster.com:9000</value>
</property>
</configuration>

Make sure your NameNode is running fine. Otherwise, check the machine IP and host name in the /etc/hosts file.
Make sure that you have added the hostname "nn01.hcluster.com" there.
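A successful ping does not prove that the NameNode's RPC port is reachable, so it may be worth testing the TCP connection itself from each DataNode. The sketch below is one way to do that with Python's socket module; can_connect is a hypothetical helper name. Note that the log in the question shows nn1.hcluster.com while the posted core-site.xml says nn01.hcluster.com, so both names are worth trying.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers unresolved names, refused connections, no route to host, timeouts.
        return False
```

For example, running can_connect("nn1.hcluster.com", 9000) and can_connect("nn01.hcluster.com", 9000) on a DataNode shows whether either name resolves and whether port 9000 accepts connections from that machine.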

Cannot run hadoop dfs -ls from slave node

Should I be able to run the command:
hadoop dfs -ls
from a slave node?
Currently I cannot; I get the following error:
13/11/01 14:58:03 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 0 time(s).
13/11/01 14:58:04 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 1 time(s).
13/11/01 14:58:05 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 2 time(s).
13/11/01 14:58:06 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 3 time(s).
13/11/01 14:58:07 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 4 time(s).
13/11/01 14:58:08 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 5 time(s).
13/11/01 14:58:09 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 6 time(s).
13/11/01 14:58:10 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 7 time(s).
13/11/01 14:58:11 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 8 time(s).
13/11/01 14:58:12 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 9 time(s).
Bad connection to FS. command aborted.

On your slave nodes, check the property fs.default.name in core-site.xml and make sure it points to your NameNode.
Since you seem to be on EC2, it should be something like
<property>
<name>fs.default.name</name>
<value>hdfs://namenode.ec2.demdex.com:9000</value>
</property>

Slave unable to connect to master and start tasktracker or datanode in hadoop

I am working with a 2-node fully distributed Hadoop cluster. I am trying to get the TaskTracker to run on the slave node, but it is not able to connect to ports 9000/9001 on the master. Below are the config files; if anyone spots something, please holler!
Error message from the TaskTracker (started using start-all on the master):
2012-12-19 09:33:03,161 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-12-19 09:33:03,316 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-12-19 09:33:03,320 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-12-19 09:33:03,320 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2012-12-19 09:33:03,888 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-12-19 09:33:04,502 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-12-19 09:33:04,755 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-12-19 09:33:04,799 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-12-19 09:33:04,807 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2012-12-19 09:33:04,813 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-hadoop/mapred/local
2012-12-19 09:33:04,826 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-12-19 09:33:04,856 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-12-19 09:33:04,857 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-12-19 09:33:04,920 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-12-19 09:33:04,923 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort38644 registered.
2012-12-19 09:33:04,926 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort38644 registered.
2012-12-19 09:33:04,929 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-12-19 09:33:04,931 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 38644: starting
2012-12-19 09:33:04,931 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 38644: starting
2012-12-19 09:33:04,932 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 38644: starting
2012-12-19 09:33:04,932 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 38644: starting
2012-12-19 09:33:04,933 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 38644: starting
2012-12-19 09:33:04,935 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:38644
2012-12-19 09:33:04,935 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_10.77.26.116:localhost/127.0.0.1:38644
2012-12-19 09:33:05,980 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:06,982 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:07,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:08,987 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:09,989 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:10,991 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:11,994 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:12,996 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:13,998 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:15,001 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:15,004 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:17,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:18,011 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:19,013 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:20,015 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:21,018 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:22,020 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:23,022 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:24,026 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:25,033 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:26,036 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:26,039 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:28,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:29,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:30,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:31,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:32,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:33,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:34,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:35,063 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:36,071 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:37,073 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:37,083 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:39,086 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:40,094 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:41,097 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:42,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:43,104 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:44,107 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:45,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:46,118 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:47,122 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:48,131 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:48,134 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:50,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:51,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:52,143 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:53,145 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:54,148 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:55,151 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:56,154 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:57,158 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:58,161 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:59,167 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:59,169 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:34:01,173 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:34:02,175 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:34:03,178 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:34:04,181 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:34:05,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:34:06,189 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:34:07,191 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:34:08,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:34:09,195 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:34:10,196 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:34:10,199 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:34:12,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
MASTER hosts file
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#10.77.42.2 ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
#10.76.174.108 ipdiscoverreg1.cloudapp.net
ipdiscovermaster.cloudapp.net
MASTER core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipdiscovermaster.cloudapp.net:9000</value>
</property>
</configuration>
MASTER mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ipdiscovermaster.cloudapp.net:9001</value>
</property>
</configuration>
MASTER masters file
ipdiscovermaster.cloudapp.net
MASTER slaves file
ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
SLAVE hosts file
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#10.77.42.2 ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
ipdiscovermaster.cloudapp.net
#10.76.174.108 ipdiscoverreg1.cloudapp.net
SLAVE core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipdiscovermaster.cloudapp.net:9000</value>
</property>
</configuration>
SLAVE mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ipdiscovermaster.cloudapp.net:9001</value>
</property>
</configuration>
SLAVE masters file
ipdiscovermaster.cloudapp.net

Check the following possibilities.
I am assuming you have checked the log on the DataNode (192.168.135.111 slave01), which is the best way to get the exact error.
If you have formatted the NameNode:
i) delete the temp data folder
ii) recreate it
iii) give all permissions on the temp folder
iv) format the NameNode
v) start the Hadoop cluster

Add the IP and hostname of the slave to the /etc/hosts file of the master machine, and vice versa. Also add the dfs.data.dir and dfs.name.dir properties to your hdfs-site.xml file. These values default to a location under /tmp, which may be emptied at restart, so you can lose data and run into problems after a machine restart. Make sure name resolution works properly, as this is really important for Hadoop to function.

I had a similar problem. The logs just showed "Retrying connect to server XXX". Here is what I did to solve it: modify the /etc/hosts files on the master and slave nodes, in particular the entry for each node's own hostname and its corresponding IP. Do not bind the hostname to 127.0.0.1:
original hosts file in master:
127.0.0.1 master
192.168.135.111 slave01
original hosts file in slave:
192.168.135.110 master
127.0.0.1 slave01
Resolved hosts file in master:
**192.168.135.110** master
192.168.135.111 slave
Resolved hosts file in slave:
192.168.135.110 master
**192.168.135.111** slave
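
A quick way to spot the problem described above is to scan a hosts file for real hostnames that are mapped to a loopback address. The following is only a rough sketch: the function name and the rule for skipping localhost-style aliases are my own assumptions, not part of Hadoop.

```python
import ipaddress


def loopback_bound_hostnames(hosts_text: str) -> list:
    """Return hostnames (other than localhost-style aliases) mapped to a loopback IP."""
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address, so skip the line
        if addr.is_loopback:
            # Assumption: names containing "localhost" or "loopback" are harmless aliases.
            flagged += [n for n in fields[1:]
                        if "localhost" not in n and "loopback" not in n]
    return flagged


# The "original" and "resolved" master hosts files from the answer above.
original_master = "127.0.0.1 master\n192.168.135.111 slave01\n"
resolved_master = "192.168.135.110 master\n192.168.135.111 slave\n"
print(loopback_bound_hostnames(original_master))
print(loopback_bound_hostnames(resolved_master))
```

Any name this reports is one that other nodes cannot reach, because the daemon on that machine ends up listening on the loopback interface only.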

Errors while running hadoop

haduser@user-laptop:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/input /user/haduser/input
11/12/14 14:21:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
11/12/14 14:21:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
11/12/14 14:21:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
11/12/14 14:21:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
11/12/14 14:21:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
11/12/14 14:21:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
11/12/14 14:21:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s).
11/12/14 14:21:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s).
11/12/14 14:21:08 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s).
11/12/14 14:21:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
I get the above errors when I try to copy files from /tmp/input to /user/haduser/input, even though /etc/hosts contains an entry for localhost.
When I run the jps command, the TaskTracker and the NameNode are not listed.
What could be the problem? Could someone please help me with this?

I had similar issues. Hadoop was actually binding to IPv6.
I added "export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true" to $HADOOP_HOME/conf/hadoop-env.sh.
Hadoop was binding to IPv6 even though I had disabled IPv6 on my system.
Once I added this to the env file, it started working fine.
Hope this helps someone.

Try to ssh to your local system using the IP, in this case:
$ ssh 127.0.0.1
Once you can ssh in successfully, run the command below to list the open ports:
~$ lsof -i
Look for a listening connection with the name localhost:<PORTNAME> (LISTEN).
Copy this <PORTNAME> and use it to replace the existing port number in the value of the fs.default.name property in core-site.xml in the Hadoop conf folder.
Save core-site.xml; this should resolve the issue.

The NameNode (NN) maintains the namespace for HDFS and must be running for filesystem operations on HDFS. Check the logs to find out why the NN hasn't started. The TaskTracker is not required for operations on HDFS; the NN and DN are sufficient. See the http://goo.gl/8ogSk and http://goo.gl/NIWoK tutorials on how to set up Hadoop on a single node and on multiple nodes.
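
Since "Connection refused" usually means nothing is listening on that address and port, a small check like the one below can confirm whether the NameNode port is reachable at all before digging into the logs. This is just a sketch; the function name and the 2 second timeout are my own choices, and localhost:54310 is simply the address from the error output above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and name lookup failures
        return False


if __name__ == "__main__":
    # Address taken from the "Retrying connect to server" lines in the question.
    print("NameNode reachable:", can_connect("localhost", 54310))
```

If this prints False while jps shows no NameNode process, the problem is that the daemon is not running, not that the client is misconfigured.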

All the files in bin are executables. Just copy the command and paste it into the terminal. Make sure the address is right, i.e. the user part must be replaced with the appropriate value. That should do the trick.


Cannot Read a file from HDFS using Spark - hadoop

Here is the solution:
sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt")
How did I find out nn1home:8020? Just search for the file core-site.xml and look for the XML element fs.defaultFS.

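If you want to use sc.textFile("hdfs://...") you need to give the full (absolute) path, including the NameNode host and port; in your example that would be "nn1home:8020/..". Whatever follows the two slashes is read as the host, so in "hdfs://input/war-and-peace.txt" the client treats "input" as the NameNode hostname, which is why the log shows it retrying "input/92.242.140.21:8020". If you want to keep it simple, just use
sc.textFile("hdfs:/input/war-and-peace.txt")
That's only one /.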
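
To see how the slashes change the meaning, here is a small sketch that splits the three URIs from this thread with Python's standard urllib. It follows the generic URI rules; I am assuming Hadoop's own path handling agrees on which part is the host, which matches the log in the question. The helper name is made up for illustration.

```python
from urllib.parse import urlsplit


def describe_hdfs_uri(uri: str) -> dict:
    """Split a URI into the parts a filesystem client cares about."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the URI has no host part
        "port": parts.port,      # None when no port is given
        "path": parts.path,
    }


for uri in (
    "hdfs://input/war-and-peace.txt",               # "input" lands in the host slot
    "hdfs:/input/war-and-peace.txt",                # no host, the full path is kept
    "hdfs://nn1home:8020/input/war-and-peace.txt",  # explicit NameNode host and port
):
    print(uri, "->", describe_hdfs_uri(uri))
```

In the first form the path shrinks to /war-and-peace.txt and "input" becomes the host, so the client goes looking for a server called "input".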

This will work:
val textFile = sc.textFile("hdfs://localhost:9000/user/input.txt")
Here, you can take localhost:9000 from the fs.defaultFS parameter value in Hadoop's core-site.xml config file.

This worked for me:
logFile = "hdfs://localhost:9000/sampledata/sample.txt"

Get the fs.defaultFS URL from core-site.xml (in /etc/hadoop/conf) and read the file as below. In my case, fs.defaultFS is hdfs://quickstart.cloudera:8020
txtfile = sc.textFile('hdfs://quickstart.cloudera:8020/user/cloudera/rddoutput')
txtfile.collect()
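
If you would rather not look the value up by hand, the lookup can be scripted. The sketch below parses text shaped like a core-site.xml and builds the full URI. The sample XML is made up around the value quoted above, the function names are my own, and the fallback to fs.default.name is there only because older configs in this thread use that property name.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a core-site.xml, using the value quoted above.
SAMPLE_CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://quickstart.cloudera:8020</value>
  </property>
</configuration>
"""


def default_fs(core_site_xml: str):
    """Return the fs.defaultFS value (or the older fs.default.name), else None."""
    root = ET.fromstring(core_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        found[name] = value
    return found.get("fs.defaultFS") or found.get("fs.default.name")


def full_hdfs_uri(core_site_xml: str, path: str) -> str:
    """Prefix an absolute HDFS path with the configured default filesystem."""
    base = default_fs(core_site_xml)
    if base is None:
        raise ValueError("no fs.defaultFS or fs.default.name property found")
    return base.rstrip("/") + "/" + path.lstrip("/")


print(full_hdfs_uri(SAMPLE_CORE_SITE, "/user/cloudera/rddoutput"))
```

On a real machine you would read the file contents (for example from /etc/hadoop/conf/core-site.xml, as mentioned above) and pass them in instead of the sample string; the resulting URI is what goes into sc.textFile.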
