Hadoop maps but not reduce - hadoop

When I start a MapReduce job on my server, the job gets to map 100% reduce 0% and then hangs.
Opening up the web console shows that all the map jobs have completed and there is 1 "NEW" reduce but 0 "RUNNING" reduces.
The console output for the job is:
15/01/22 10:26:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/01/22 10:26:01 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/01/22 10:26:02 INFO input.FileInputFormat: Total input paths to process : 1
15/01/22 10:26:02 INFO mapreduce.JobSubmitter: number of splits:1
15/01/22 10:26:02 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/01/22 10:26:02 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
15/01/22 10:26:02 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
15/01/22 10:26:02 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/01/22 10:26:02 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/01/22 10:26:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421950773318_0001
15/01/22 10:26:04 INFO impl.YarnClientImpl: Submitted application application_1421950773318_0001 to ResourceManager at /0.0.0.0:8032
15/01/22 10:26:04 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1421950773318_0001/
15/01/22 10:26:04 INFO mapreduce.Job: Running job: job_1421950773318_0001
15/01/22 10:26:15 INFO mapreduce.Job: Job job_1421950773318_0001 running in uber mode : false
15/01/22 10:26:15 INFO mapreduce.Job: map 0% reduce 0%
15/01/22 10:26:33 INFO mapreduce.Job: map 100% reduce 0%
I am running:
64 bit CentOS release 6.4
Hadoop 2.2.0-gphd-3.1.0.0

I tried the WordCount example from wiki.apache.org/hadoop/WordCount.
The example did not work, but I was able to fix my problem by allocating less memory for the reduce operation.
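To illustrate why a smaller reduce allocation can unblock a reducer stuck in "NEW", here is a minimal arithmetic sketch. It assumes the scheduler rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb and that the ApplicationMaster keeps its own container for the whole job. The function names and the 4096 MB node size are hypothetical; 1536 MB is, to my knowledge, the stock default for yarn.app.mapreduce.am.resource.mb, but check your own configuration.

```python
import math

def normalized_mb(request_mb, min_alloc_mb=1024):
    """Round a container request up to a multiple of the scheduler minimum (assumed behaviour)."""
    return max(min_alloc_mb, math.ceil(request_mb / min_alloc_mb) * min_alloc_mb)

def reducer_can_start(node_mb, am_mb, reduce_mb, min_alloc_mb=1024):
    """Return True if one reducer fits on the node next to the ApplicationMaster."""
    am_container = normalized_mb(am_mb, min_alloc_mb)
    reduce_container = normalized_mb(reduce_mb, min_alloc_mb)
    return am_container + reduce_container <= node_mb

# Hypothetical single-node cluster offering 4096 MB to containers.
print(reducer_can_start(node_mb=4096, am_mb=1536, reduce_mb=3072))  # False: 2048 + 3072 > 4096
print(reducer_can_start(node_mb=4096, am_mb=1536, reduce_mb=1024))  # True: 2048 + 1024 <= 4096
```

In the first case the reduce request can never be satisfied, so the reducer would wait indefinitely; lowering mapreduce.reduce.memory.mb (second case) lets it fit.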

Related

Pig jobs not running on hadoop 2.6

When I execute the dump command below, it stops at 0% complete and does not continue to the next step. How can I track down the issue?
grunt> a = load '/user/hduser1/file1' using PigStorage(',') as (usernames:chararray, password:chararray, price:int);
2015-12-25 10:34:26,471 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump a;
2015-12-25 10:34:33,862 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-12-25 10:34:33,904 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:33,910 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-12-25 10:34:33,951 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-12-25 10:34:34,069 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-12-25 10:34:34,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-12-25 10:34:34,158 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:34,197 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:34,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-12-25 10:34:34,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-12-25 10:34:34,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-12-25 10:34:34,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-12-25 10:34:35,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1328911758/tmp304593952/pig-0.14.0-core-h2.jar
2015-12-25 10:34:35,519 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1328911758/tmp1133010181/automaton-1.11-8.jar
2015-12-25 10:34:35,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1328911758/tmp-1601875197/antlr-runtime-3.4.jar
2015-12-25 10:34:35,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1328911758/tmp1137978977/guava-11.0.2.jar
2015-12-25 10:34:36,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1328911758/tmp325144073/joda-time-2.1.jar
2015-12-25 10:34:36,259 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-12-25 10:34:36,267 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-12-25 10:34:36,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-12-25 10:34:36,305 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-12-25 10:34:36,311 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-12-25 10:34:36,344 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-12-25 10:34:36,706 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-12-25 10:34:36,782 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-12-25 10:34:36,820 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-12-25 10:34:36,972 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-12-25 10:34:37,483 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1451017152672_0001
2015-12-25 10:34:37,664 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2015-12-25 10:34:38,078 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1451017152672_0001
2015-12-25 10:34:38,221 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://vijee-Lenovo-IdeaPad-S510p:8088/proxy/application_1451017152672_0001/
2015-12-25 10:34:38,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1451017152672_0001
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2015-12-25 10:34:38,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,4],a[-1,-1] C: R:
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-12-25 10:34:38,243 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1451017152672_0001]
See the attached image of the application and the logs.

hbase warning about deprecated native.lib

Hadoop conf file: opt/hadoop/etc/hadoop/core-site.xml
When I set
<name>hadoop.native.lib</name>
and then start the hbase shell, there are five lines of warning:
2015-02-10 11:07:46,956 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,005 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,046 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,081 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,169 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
But when I set
<name>io.native.lib.available</name>
and then start the hbase shell, there is one line of warning:
2015-02-10 11:07:46,956 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
How can I configure it so that none of these warnings is shown?
I'm on Hadoop 2.5.2 and HBase 0.98.8 on Ubuntu x64.

MapReduce in Hadoop 2.2.0 not working

After installing and configuring Hadoop 2.2.0 in pseudo-distributed mode, everything is running, as you can see in the jps output:
$ jps
2287 JobHistoryServer
1926 ResourceManager
2162 NodeManager
1834 DataNode
1756 NameNode
3013 Jps
Then I ran the wordcount example with
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hdfs/file /output
And the execution freezes (?) as follows:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hdfs/file /output
OpenJDK 64-Bit Server VM warning: You have loaded library /home/hduser/hadoop-src/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/04/22 22:17:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/22 22:17:25 INFO client.RMProxy: Connecting to ResourceManager at /192.168.33.10:8032
14/04/22 22:17:25 INFO input.FileInputFormat: Total input paths to process : 1
14/04/22 22:17:25 INFO mapreduce.JobSubmitter: number of splits:1
14/04/22 22:17:25 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/04/22 22:17:25 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/04/22 22:17:25 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/04/22 22:17:25 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/04/22 22:17:25 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/04/22 22:17:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1398204897594_0002
14/04/22 22:17:26 INFO impl.YarnClientImpl: Submitted application application_1398204897594_0002 to ResourceManager at /192.168.33.10:8032
14/04/22 22:17:26 INFO mapreduce.Job: The url to track the job: http://vagrant-ubuntu-saucy-64:8088/proxy/application_1398204897594_0002/
14/04/22 22:17:26 INFO mapreduce.Job: Running job: job_1398204897594_0002
14/04/22 22:17:36 INFO mapreduce.Job: Job job_1398204897594_0002 running in uber mode : false
14/04/22 22:17:36 INFO mapreduce.Job: map 0% reduce 0%
Any ideas?
The problem was in yarn-site.xml. The yarn.nodemanager.resource.memory-mb property must be larger than 3072 MB, and I had it set to 1024 MB, so the correct configuration is:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
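As a quick way to confirm what a yarn-site.xml actually contains, the following sketch reads the property from the file's XML and compares it with the 3072 MB figure quoted above. The helper names and the inline SAMPLE document are hypothetical; the sketch only looks at the file's text, so it does not model cluster defaults or values overridden elsewhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content used only for this example.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
</configuration>"""

def read_property(xml_text, name):
    """Return the <value> text of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None

def nodemanager_memory_ok(xml_text, required_mb=3072):
    """True if the file sets NodeManager memory above required_mb (absent means not checked here)."""
    value = read_property(xml_text, "yarn.nodemanager.resource.memory-mb")
    return value is not None and int(value) > required_mb

print(read_property(SAMPLE, "yarn.nodemanager.resource.memory-mb"))  # 4096
print(nodemanager_memory_ok(SAMPLE))                                  # True
```

To check a real file, pass its contents, for example `nodemanager_memory_ok(open(path).read())`, where `path` points at your own yarn-site.xml.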
I think the log is not detailed enough. You can first enable debug mode:
export HADOOP_ROOT_LOGGER=DEBUG,console
Then run the wordcount job again to get more log output and paste it here.

Hadoop TeraSort job failed - exited with exitCode: -1000

I'm trying to run the TeraSort benchmark on my Hadoop 2.1 cluster. After running TeraGen successfully, I saw the following error when running TeraSort. Could anyone help take a look?
13/12/16 01:18:25 INFO mapreduce.Job: Job job_1382326397507_0063 failed with state FAILED due to: Application application_1382326397507_0063 failed 2 times due to AM Container for appattempt_1382326397507_0063_000002 exited with exitCode: -1000 due to: java.io.FileNotFoundException: File /tmp/hadoop-root/nm-local-dir/filecache does not exist.Failing this attempt.. Failing the application.
More detailed output is as below:
[root@hadoop1 hadoop-testsuite]# hadoop jar /root/hadoop/hadoop-2.1.0-beta/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar terasort test_tera test_tera/out
13/12/16 01:18:18 INFO terasort.TeraSort: starting
13/12/16 01:18:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/16 01:18:20 INFO input.FileInputFormat: Total input paths to process : 2
Spent 150ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 153ms
Sampling 2 splits of 2
Making 1 from 10000 sampled records
Computing parititions took 466ms
Spent 626ms computing partitions.
13/12/16 01:18:21 INFO client.RMProxy: Connecting to ResourceManager at /10.1.57.195:54313
13/12/16 01:18:21 INFO mapreduce.JobSubmitter: number of splits:2
13/12/16 01:18:21 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/16 01:18:21 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/16 01:18:21 WARN conf.Configuration: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
13/12/16 01:18:21 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
13/12/16 01:18:21 WARN conf.Configuration: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
13/12/16 01:18:21 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/16 01:18:21 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/16 01:18:21 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/12/16 01:18:21 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/16 01:18:21 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/16 01:18:21 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/12/16 01:18:21 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/16 01:18:21 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
13/12/16 01:18:21 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/16 01:18:21 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/16 01:18:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1382326397507_0063
13/12/16 01:18:22 INFO impl.YarnClientImpl: Submitted application application_1382326397507_0063 to ResourceManager at /10.1.57.195:54313
13/12/16 01:18:22 INFO mapreduce.Job: The url to track the job: http://hadoop1:54315/proxy/application_1382326397507_0063/
13/12/16 01:18:22 INFO mapreduce.Job: Running job: job_1382326397507_0063
13/12/16 01:18:25 INFO mapreduce.Job: Job job_1382326397507_0063 running in uber mode : false
13/12/16 01:18:25 INFO mapreduce.Job: map 0% reduce 0%
13/12/16 01:18:25 INFO mapreduce.Job: Job job_1382326397507_0063 failed with state FAILED due to: Application application_1382326397507_0063 failed 2 times due to AM Container for appattempt_1382326397507_0063_000002 exited with exitCode: -1000 due to: java.io.FileNotFoundException: File /tmp/hadoop-root/nm-local-dir/filecache does not exist
.Failing this attempt.. Failing the application.
13/12/16 01:18:25 INFO mapreduce.Job: Counters: 0
13/12/16 01:18:25 INFO terasort.TeraSort: done
Make sure you have the following property defined in yarn-site.xml:
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
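To see which directory that value points at, here is a small sketch of `${...}` substitution, written in the spirit of how Hadoop expands configuration variables rather than as a copy of its implementation. The `expand` helper is hypothetical, and the `hadoop.tmp.dir` value shown is the stock default as far as I know; substitute your own settings.

```python
import re

PATTERN = re.compile(r"\$\{([^}]+)\}")

def expand(value, props, max_depth=20):
    """Repeatedly replace ${name} with props[name]; unknown names are left as they are."""
    for _ in range(max_depth):
        expanded = PATTERN.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return expanded
        value = expanded
    return value

# Assumed settings: default hadoop.tmp.dir, job run as the root user.
props = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "user.name": "root",
}
print(expand("${hadoop.tmp.dir}/nm-local-dir", props))  # /tmp/hadoop-root/nm-local-dir
```

Under these assumptions the result is the parent of the /tmp/hadoop-root/nm-local-dir/filecache path named in the error, so it is worth checking that this directory exists and is writable by the NodeManager on every node.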

Hadoop - Example MapReduce Application not running

I deployed Hadoop 2.2.0 on Ubuntu 12.04 LTS following this article: http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1
Everything is OK, except that when I try to run the Hadoop example in the last step, it pauses at the "Running job" message:
13/11/24 23:36:30 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/11/24 23:36:30 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/11/24 23:36:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1385310900376_0001
13/11/24 23:36:32 INFO impl.YarnClientImpl: Submitted application application_1385310900376_0001 to ResourceManager at master/192.168.56.1:8040
13/11/24 23:36:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1385310900376_0001/
13/11/24 23:36:32 INFO mapreduce.Job: Running job: job_1385310900376_0001
In the ResourceManager web GUI, I see "App is Pending". How can I get it to change to the Running state?
Screenshot: http://farm8.staticflickr.com/7344/11031415055_d987e937aa_o.png
Thank you! :)

Resources