Hadoop 2 node Cluster Communication Query

I have a 2-node Hadoop cluster (master and slave). Both nodes are up and running, and I can check their health at localhost:50070.
I load a 150 MB folder of plain-text files into the master's HDFS and then run the following command:
hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /In/ /Out/
The issue is that the execution time is the same as when I run the command on a single node. It seems the nodes are not doing any work in parallel.
Checking the logs on the slave, I see the following:
2015-03-18 23:52:49,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1680309327-31.220.211.10-1426721698684:blk_1073741856_1032 src: /31.220.211.10:46035 dest: /31.220.211.35:50010
2015-03-18 23:52:51,191 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /31.220.211.10:46035, dest: /31.220.211.35:50010, bytes: 3796560, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_688133940_1, offset: 0, srvID: fbea19bb-06ee-4868-af5c-0cb9699064f3, blockid: BP-1680309327-31.220.211.10-1426721698684:blk_1073741856_1032, duration: 1734807025
2015-03-18 23:52:51,191 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1680309327-31.220.211.10-1426721698684:blk_1073741856_1032, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-03-18 23:52:59,733 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-1680309327-31.220.211.10-1426721698684:blk_1073741856_1032
And on the Master:
15/03/18 23:52:50 INFO mapred.Task: Task 'attempt_local1934686363_0001_r_000000_0' done.
15/03/18 23:52:50 INFO mapred.LocalJobRunner: Finishing task: attempt_local1934686363_0001_r_000000_0
15/03/18 23:52:50 INFO mapred.LocalJobRunner: reduce task executor complete.
15/03/18 23:52:50 INFO mapreduce.Job: map 100% reduce 100%
15/03/18 23:52:50 INFO mapreduce.Job: Job job_local1934686363_0001 completed successfully
15/03/18 23:52:51 INFO mapreduce.Job: Counters: 38
Is this normal? Why are both nodes reported as alive, yet the wordcount example does not parallelize and instead behaves as if everything runs locally?
I can't seem to find an answer to this problem, so I would be very happy if I could get some help.

The problem was that even though both my nodes were recognised as alive, the job was still running locally (note the LocalJobRunner entries in the master's log).
That was because yarn-site.xml was missing this property:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
I have also triple-checked that all the config files are identical on all the nodes. After everything was checked carefully, the job ran across the whole cluster.
Another thing to pay attention to when configuring the cluster: Hadoop 1.x and Hadoop 2.x do not share the same configuration parameters.
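For reference, a minimal sketch of the two settings that decide whether a Hadoop 2.x job is submitted to YARN at all. The mapred-site.xml entry is not part of the answer above, but it is worth checking if you still see LocalJobRunner in the job output:
In mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- the default is "local", which runs the whole job in one JVM -->
</property>
In yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>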

Related

In Hadoop, how can I find which slave node is executing attempt N?

I'm using Hadoop 1.2.1, and my Hadoop application fails during the reduce phase. The job run shows messages like the following:
15/05/22 18:14:15 INFO mapred.JobClient: map 0% reduce 0%
15/05/22 18:14:25 INFO mapred.JobClient: map 100% reduce 0%
15/05/22 18:24:25 INFO mapred.JobClient: map 0% reduce 0%
15/05/22 18:24:26 INFO mapred.JobClient: Task Id : attempt_201505221804_0013_m_000000_0, Status : FAILED
Task attempt_201505221804_0013_m_000000_0 failed to report status for 600 seconds. Killing!
15/05/22 18:24:35 INFO mapred.JobClient: map 100% reduce 0%
I'd like to see the log of attempt_201505221804_0013_m_000000_0, but it is very time-consuming to find out which slave executed it.
Someone told me to use the Hadoop web UI to find it, but there is a firewall on this cluster and I can't change that, because the cluster is not owned by our group.
Is there any way to find in where this attempt was executed?
You should be able to find this information in the jobtracker logs, which are by default under HADOOP_HOME/logs. They will contain entries looking similar to this:
INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201503262103_0001_m_000000_0' to tip task_201503262103_0001_m_000000, for tracker 'host'
You can search the file for the specific attempt id.
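For example, something like this should turn up the tracker that ran the attempt (this assumes the default log file naming under HADOOP_HOME/logs; adjust the path and file name to your installation):
grep "attempt_201505221804_0013_m_000000_0" $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log
# the matching "Adding task ... for tracker '...'" line names the tasktracker (slave) that ran the attempt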

Why is MapReduce with YARN stuck on CDH 5.3?

MapReduce with YARN fails to move past 0% map and 0% reduce. I am using Cloudera CDH on a Google Compute Engine high-memory instance (13 GB RAM), and 8 GB of free RAM is available on the machine. Can you please help me fix it?
sunny@hadoop-m:~$ hadoop jar /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hadoop-mapreduce-examples-2.5.0-cdh5.3.0.jar grep input output 'dfs[a-z.]+'
14/12/24 00:13:53 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m.c.sunny-hadoop-trial.internal/10.240.253.233:8032
14/12/24 00:13:53 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/12/24 00:13:54 INFO input.FileInputFormat: Total input paths to process : 5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: number of splits:5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1419360146634_0001
14/12/24 00:13:54 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/12/24 00:13:54 INFO impl.YarnClientImpl: Submitted application application_1419360146634_0001
14/12/24 00:13:55 INFO mapreduce.Job: The url to track the job: http://hadoop-m.c.sunny-hadoop-trial.internal:8088/proxy/application_1419360146634_0001/
14/12/24 00:13:55 INFO mapreduce.Job: Running job: job_1419360146634_0001
Resource Manager Output
Some more info about the job:
yarn-site.xml: http://pastebin.mozilla.org/8113782
mapred-site.xml: http://pastebin.mozilla.org/8113813
The server's IP address had changed because of DHCP, so the client configuration for HDFS and YARN became stale. I needed to update the client configuration; I did it with Cloudera Manager and the cluster is now running fine.
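Whether or not Cloudera Manager is in use, a rough way to verify that the client side has picked up the new addresses is to query the configuration from a client node (standard Hadoop 2.x client commands):
hdfs getconf -confKey fs.defaultFS   # NameNode address the HDFS client will use
yarn node -list                      # should reach the ResourceManager and list the registered NodeManagers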

Hadoop: slaves in service but doing nothing at all

I set up a hadoop cluster and started a MapReduce job on the cluster.
The master node is running actively but all slaves are doing nothing at all.
Running jps on a slave node produces:
20390 DataNode
20492 NodeManager
21256 Jps
Here is the screenshot; the next-to-last row corresponds to the master node.
So why are the slaves using no blocks?
Also, running top on the master node shows the Java process (hadoop jar jar-file.jar args) taking almost 100% of the CPU, while no such process exists on any slave machine.
That is why I think the slaves are idle, doing nothing at all.
Here is one example of the slave datanode log:
2014-07-24 23:28:01,302 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlockMap
2014-07-24 23:28:01,302 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2014-07-24 23:28:01,304 INFO org.apache.hadoop.util.GSet: 0.5% max memory 889 MB = 4.4 MB
2014-07-24 23:28:01,304 INFO org.apache.hadoop.util.GSet: capacity = 2^19 = 524288 entries
2014-07-24 23:28:01,304 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1752077220-193.167.138.8-1406217332464
2014-07-24 23:28:01,310 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-1752077220-193.167.138.8-1406217332464 to blockPoolScannerMap, new size=1
2014-07-24 23:31:01,116 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1752077220-193.167.138.8-1406217332464 Total blocks: 0, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
And nothing more.
However, for the master data node, the log file contains lines like the following:
2014-07-24 22:27:23,443 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1752077220-193.167.138.8-1406217332464:blk_1073742749_1925 src: /193.167.138.8:44210 dest: /193.167.138.8:50010
which I think means the node is receiving tasks and processing the data.
The following is from the YARN log file of one of the slave nodes:
2014-07-24 23:28:13,811 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:8042
2014-07-24 23:28:13,812 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2014-07-24 23:28:14,122 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-07-24 23:28:14,130 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ugluk/193.167.138.8:8031
2014-07-24 23:28:14,176 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using finished containers :[]
2014-07-24 23:28:14,366 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id 1336429163
2014-07-24 23:28:14,369 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1986181585
2014-07-24 23:28:14,370 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as shagrat.hiit.fi:48662 with total resource of <memory:8192, vCores:8>
2014-07-24 23:28:14,370 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
I am using Hadoop 2.4.0
It seems that you formatted the namenode more than once.
Block pool ID mismatches are most often caused by formatting the namenode multiple times: every time you format a namenode, the block pool ID, cluster ID and namespace ID change.
So first check these attributes on the namenode and compare them with the other datanodes and the secondary namenode.
You can check them in the VERSION file under the current directory of each of these nodes. To find that directory, look up the storage paths configured in your hdfs-site.xml, go to that path, look for the current directory, and make the necessary changes.
Please let me know if this helps.
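A quick way to compare the IDs is to print the VERSION files on each node. The paths below are placeholders; use the directories configured for dfs.namenode.name.dir and dfs.datanode.data.dir in your hdfs-site.xml:
# on the namenode
cat /path/to/dfs/name/current/VERSION
# on each datanode
cat /path/to/dfs/data/current/VERSION
# the clusterID values must match; on the datanodes the blockpoolID must also match the namenode's block pool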

Hadoop error in shuffle in fetcher: Exceeded MAX_FAILED_UNIQUE_FETCHES

I am new to Hadoop. I have a Kerberos-secured Hadoop cluster (master and 1 slave) set up on VirtualBox. I am trying to run the 'pi' job from the Hadoop examples. The job terminates with the error Exceeded MAX_FAILED_UNIQUE_FETCHES. I tried searching for this error, but the solutions given on the internet do not seem to work for me. Perhaps I am missing something obvious. I even tried removing the slave from the etc/hadoop/slaves file to see if the job could run only on the master, but that fails as well with the same error. Below is the log. I am running this on a 64-bit Ubuntu 14.04 VirtualBox VM. Any help appreciated.
montauk#montauk-vmaster:/usr/local/hadoop$ sudo -u yarn bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 10
Number of Maps = 2
Samples per Map = 10
OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/05 12:04:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/06/05 12:04:49 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.29:8040
14/06/05 12:04:50 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 17 for yarn on 192.168.0.29:54310
14/06/05 12:04:50 INFO security.TokenCache: Got dt for hdfs://192.168.0.29:54310; Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:50 INFO input.FileInputFormat: Total input paths to process : 2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: number of splits:2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401975262053_0007
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:53 INFO impl.YarnClientImpl: Submitted application application_1401975262053_0007
14/06/05 12:04:53 INFO mapreduce.Job: The url to track the job: http://montauk-vmaster:8088/proxy/application_1401975262053_0007/
14/06/05 12:04:53 INFO mapreduce.Job: Running job: job_1401975262053_0007
14/06/05 12:05:29 INFO mapreduce.Job: Job job_1401975262053_0007 running in uber mode : false
14/06/05 12:05:29 INFO mapreduce.Job: map 0% reduce 0%
14/06/05 12:06:04 INFO mapreduce.Job: map 50% reduce 0%
14/06/05 12:06:06 INFO mapreduce.Job: map 100% reduce 0%
14/06/05 12:06:34 INFO mapreduce.Job: map 100% reduce 100%
14/06/05 12:06:34 INFO mapreduce.Job: Task Id : attempt_1401975262053_0007_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#4
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
I came across the same problem as yours when I installed CDH 5.1.0 with Kerberos security from the tarball. The solutions found via Google point at insufficient memory, but I don't think that is my situation, since my input is very small (52 KB).
After digging for several days, I found the root cause in this link.
To sum up, the solutions in that link are:
Add the following property to yarn-site.xml, even though it is the default in yarn-default.xml:
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Remove the yarn.nodemanager.local-dirs property and use the default value under /tmp. Then execute the following commands:
mkdir -p /tmp/hadoop-yarn/nm-local-dir
chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir
The problem can be summed up as follows:
After setting the yarn.nodemanager.local-dirs property, the yarn.nodemanager.aux-services.mapreduce_shuffle.class default in yarn-default.xml no longer takes effect.
I still haven't found the root cause of that.
I had the same issue. I had a MapReduce job without a reducer, and I solved it using job.setNumReduceTasks(0);
Change the property below in yarn-site.xml and create the directory:
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp</value>
</property>
mkdir -p /tmp/hadoop-yarn/nm-local-dir
chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir
Tune the resource properties in mapred-site.xml:
mapreduce.reduce.shuffle.input.buffer.percent=0.50
mapreduce.reduce.shuffle.memory.limit.percent=0.2
mapreduce.reduce.shuffle.parallelcopies=4
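In mapred-site.xml form, that would look roughly like this (the values are the ones listed above; tune them to your own memory budget):
<property>
<name>mapreduce.reduce.shuffle.input.buffer.percent</name>
<value>0.50</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.memory.limit.percent</name>
<value>0.2</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>4</value>
</property>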
Restart resourcemanager and nodemanager on their respective nodes.
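On a plain tarball install, the stock scripts can be used for that restart, roughly like this (assuming the usual sbin layout; run the resourcemanager commands on the master and the nodemanager commands on each slave):
# on the master
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
# on each slave
$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager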

Hadoop Single Node installation on Windows 7

I am new to Hadoop and trying to get a single-node setup of Hadoop 0.20.2 on my Windows 7 machine.
My questions are two-fold: one with respect to the completeness of the installation itself, and the other regarding the error in the reduce stage of a sample WordCount program.
My Installation steps are as follows:
I am following http://blog.benhall.me.uk/2011/01/installing-hadoop-0210-on-windows.html for the installation procedure.
I have installed cygwin and set up password-less ssh on my localhost
My java version is:
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)
Contents of conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Contents of conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Contents of conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
I set the JAVA_HOME variable and the command "hadoop version" prints 0.20.2
hadoop namenode -format creates the DFS without any errors
start-all.sh prints that namenode, secondarynamenode, datanode, jobtracker and tasktracker have all started.
however, the command "jps" prints:
$ jps
4584 Jps
11008 JobTracker
2084 NameNode
I noticed that at times jps printed the pids of the tasktracker and secondarynamenode as well.
I am able to view the output of
http://localhost:50030 for the jobtracker,
http://localhost:50060 for the tasktracker and
http://localhost:50070 for the namenode.
I tried both put and get commands to the hdfs and they were successful:
bin/hadoop fs -mkdir In
bin/hadoop fs -put *.txt In
mkdir temp
bin/hadoop fs -get In temp
ls -l temp/In
$ ls -l temp/In/
total 365
348624 Mar 24 23:59 CHANGES.txt
13366 Mar 24 23:59 LICENSE.txt
101 Mar 24 23:59 NOTICE.txt
1366 Mar 24 23:59 README.txt
I could also view these files by browsing the DFS via the http interface for namenode
Is my installation complete?
If yes, why does the jps command not show the pids of all five components?
If not, then, what steps do i need to complete the installation?
What are other sanity checks used to test the completeness of the installation?
I initially believed my installation to be complete and ran a sample WordCount map-reduce program along the lines of http://jayant7k.blogspot.com/2010/06/writing-your-first-map-reduce-program.html
I obtain the following output:
12/03/25 00:10:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/25 00:10:26 INFO input.FileInputFormat: Total input paths to process : 1
12/03/25 00:10:27 INFO mapred.JobClient: Running job: job_201203242348_0001
12/03/25 00:10:28 INFO mapred.JobClient: map 0% reduce 0%
12/03/25 00:10:35 INFO mapred.JobClient: map 100% reduce 0%
12/03/25 00:21:29 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:32:25 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:44:02 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:55:00 INFO mapred.JobClient: Job complete: job_201203242348_0001
12/03/25 00:55:00 INFO mapred.JobClient: Counters: 12
12/03/25 00:55:00 INFO mapred.JobClient: Job Counters
12/03/25 00:55:00 INFO mapred.JobClient: Launched reduce tasks=4
12/03/25 00:55:00 INFO mapred.JobClient: Launched map tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: Data-local map tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: Failed reduce tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: FileSystemCounters
12/03/25 00:55:00 INFO mapred.JobClient: HDFS_BYTES_READ=13366
12/03/25 00:55:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=23511
12/03/25 00:55:00 INFO mapred.JobClient: Map-Reduce Framework
12/03/25 00:55:00 INFO mapred.JobClient: Combine output records=0
12/03/25 00:55:00 INFO mapred.JobClient: Map input records=244
12/03/25 00:55:00 INFO mapred.JobClient: Spilled Records=1887
12/03/25 00:55:00 INFO mapred.JobClient: Map output bytes=19699
12/03/25 00:55:00 INFO mapred.JobClient: Combine input records=0
12/03/25 00:55:00 INFO mapred.JobClient: Map output records=1887
The map task seems complete, but the reduce task shows the following error in the logs:
2012-03-25 00:10:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0: Got 1 new map-outputs
2012-03-25 00:10:40,193 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201203242348_0001_m_000000_0, compressed len: 23479, decompressed len: 23475
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23475 bytes (23479 raw bytes) into RAM from attempt_201203242348_0001_m_000000_0
2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:13:40,249 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_201203242348_0001_m_000000_0
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:239)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:680)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2959)
at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
The following are the contents of the task tracker logs:
2012-03-25 00:10:27,910 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000002_0 task's state:UNASSIGNED
2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:28,453 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_625085452
2012-03-25 00:10:28,454 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_625085452 spawned.
2012-03-25 00:10:29,217 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_625085452 given task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:29,523 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000002_0 0.0% setup
2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000002_0 is done.
2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000002_0 was 0
2012-03-25 00:10:29,526 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-03-25 00:10:29,718 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_625085452 exited. Number of tasks it ran: 1
2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000000_0 task's state:UNASSIGNED
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201203242348_0001_m_000002_0 done; removing files.
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201203242348_0001_m_000002_0 not found in cache
2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_-1399302881
2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_-1399302881 spawned.
2012-03-25 00:10:31,812 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_-1399302881 given task: attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000000_0 1.0%
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000000_0 is done.
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000000_0 was 0
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-03-25 00:10:32,822 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_-1399302881 exited. Number of tasks it ran: 1
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_r_000000_0 task's state:UNASSIGNED
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_r_625085452
2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_r_625085452 spawned.
2012-03-25 00:10:34,852 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_r_625085452 given task: attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60790, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:41,153 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:10:44,158 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475
2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60864, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0
2012-03-25 00:16:05,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:16:08,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:21:25,251 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201203242348_0001_r_000000_0 - Killed due to Shuffle Failure: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
I had opened ports 9000 and 9001 in the Windows firewall.
I checked the netstat output to verify that these ports were indeed open and listening:
C:\Windows\system32>netstat -a -n | grep -e "500[367]0"
TCP 0.0.0.0:50030 0.0.0.0:0 LISTENING
TCP 0.0.0.0:50060 0.0.0.0:0 LISTENING
TCP 0.0.0.0:50070 0.0.0.0:0 LISTENING
TCP [::]:50030 [::]:0 LISTENING
TCP [::]:50060 [::]:0 LISTENING
TCP [::]:50070 [::]:0 LISTENING
C:\Windows\system32>netstat -a -n | grep -e "900[01]"
TCP 127.0.0.1:9000 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9000 127.0.0.1:60332 ESTABLISHED
TCP 127.0.0.1:9000 127.0.0.1:60987 ESTABLISHED
TCP 127.0.0.1:9001 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9001 127.0.0.1:60410 ESTABLISHED
TCP 127.0.0.1:60332 127.0.0.1:9000 ESTABLISHED
TCP 127.0.0.1:60410 127.0.0.1:9001 ESTABLISHED
TCP 127.0.0.1:60987 127.0.0.1:9000 ESTABLISHED
Could you help with both the issues of installation and getting the reduce task to work?
I looked at http://wiki.apache.org/hadoop/SocketTimeout and a few other links and tried the suggestions, but without any success.
I appreciate your patience in reading this post and would be happy to provide additional details.
Thanks in advance.
See this line in your logs:
2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories
I am guessing that you need to check hadoop.tmp.dir and mapred.local.dir. From the configs you mentioned, the values of these two params are left at their defaults. The default values of these params are given here. Set them to some relevant location and try again.
NOTE: Before you make this change, you need to stop Hadoop, and start it again after you are done.
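A rough sketch of what that could look like on a 0.20.2 setup (the paths are placeholders, not the defaults; point them at directories the Hadoop user can write to):
In conf/core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/youruser/hadoop-tmp</value> <!-- placeholder path -->
</property>
In conf/mapred-site.xml:
<property>
<name>mapred.local.dir</name>
<value>/home/youruser/hadoop-local</value> <!-- placeholder path -->
</property>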
