Pig Latin: DUMP command not displaying results (Hadoop)

I am trying to display the result of GROUPed records using DUMP, but instead of the data I only get a lot of log output. I am experimenting with just 10 records.
The details:
grunt> DUMP grouped_records;
2016-02-21 17:34:24,338 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-02-21 17:34:24,339 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2016-02-21 17:34:24,354 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-21 17:34:24,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-21 17:34:24,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-21 17:34:24,434 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-02-21 17:34:24,440 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2016-02-21 17:34:24,527 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-21 17:34:24,530 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-02-21 17:34:24,534 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-02-21 17:34:24,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=142
2016-02-21 17:34:24,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-02-21 17:34:25,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job662989067023626482.jar
2016-02-21 17:34:31,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job662989067023626482.jar created
2016-02-21 17:34:31,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-02-21 17:34:31,338 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-02-21 17:34:31,338 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2016-02-21 17:34:31,338 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-02-21 17:34:31,549 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-02-21 17:34:31,550 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-02-21 17:34:31,556 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-02-21 17:34:31,607 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-21 17:34:31,918 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-21 17:34:31,918 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-21 17:34:31,921 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-02-21 17:34:31,979 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-02-21 17:34:32,092 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1454294818944_0034
2016-02-21 17:34:32,192 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1454294818944_0034
2016-02-21 17:34:32,198 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1454294818944_0034/
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1454294818944_0034
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases filtered_records,grouped_records,records
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R:
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1454294818944_0034
2016-02-21 17:34:32,428 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-02-21 17:35:02,623 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-02-21 17:35:23,469 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-02-21 17:35:23,470 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0-cdh5.5.0 0.12.0-cdh5.5.0 cloudera 2016-02-21 17:34:24 2016-02-21 17:35:23 GROUP_BY,FILTER
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1454294818944_0034 1 1 12 12 12 12 16 16 16 16 filtered_records,grouped_records,records GROUP_BY hdfs://quickstart.cloudera:8020/tmp/temp-1703423271/tmp-988597361,
Input(s):
Successfully read 10 records (525 bytes) from: "/user/hduser/input/maxtemppig.tsv"
Output(s):
Successfully stored 0 records in: "hdfs://quickstart.cloudera:8020/tmp/temp-1703423271/tmp-988597361"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1454294818944_0034
2016-02-21 17:35:23,646 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-02-21 17:35:23,648 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-21 17:35:23,648 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-02-21 17:35:23,649 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-21 17:35:23,660 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-21 17:35:23,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
Commands that I tried:
records = LOAD '/user/hduser/input/maxtemppig.tsv' AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature IN (-10,19) AND quality IN (0,1,4,5,9);
DUMP filtered_records;
grouped_records = GROUP filtered_records BY year;
DUMP grouped_records;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
DUMP max_temp;
My input TSV file:
1950 32 01459
1951 33 01459
1950 21 01459
1940 24 01459
1950 33 01459
2000 30 01459
2010 44 01459
2014 -10 01459
2016 -20 01459
2011 19 01459
What am I missing?

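The job statistics show "Successfully read 10 records" but "Successfully stored 0 records", so every record is being dropped before the GROUP. There is a high chance that the parsing is not working: if the fields are not split correctly, temperature and quality come back null and the FILTER removes all records.
Try specifying the delimiter explicitly: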
records = LOAD '/user/hduser/input/maxtemppig.tsv' USING PigStorage('\t') AS (year:chararray, temperature:int, quality:int);
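To see why the filter can end up empty, here is a small Python sketch that imitates the load and filter steps on the ten sample rows. It is only a simulation, not Pig itself, and it rests on assumptions: that a row with too few fields yields nulls for the missing columns, that a null never passes the filter, and that `01459` is read as the integer 1459. The helper names (`parse_line`, `to_int`, `count_kept`) are made up for illustration.

```python
# Simulation only: mimics the LOAD + FILTER from the question in plain Python.
SPACE_LINES = [
    "1950 32 01459", "1951 33 01459", "1950 21 01459", "1940 24 01459",
    "1950 33 01459", "2000 30 01459", "2010 44 01459", "2014 -10 01459",
    "2016 -20 01459", "2011 19 01459",
]
# The same rows with real tab separators between the fields.
TAB_LINES = [line.replace(" ", "\t") for line in SPACE_LINES]


def to_int(value):
    """Cast a field to int; missing or unparsable values become None (null)."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def parse_line(line, delimiter="\t"):
    """Split one line into (year, temperature, quality), padding with None."""
    parts = line.split(delimiter)
    parts += [None] * (3 - len(parts))
    return parts[0], to_int(parts[1]), to_int(parts[2])


def full_filter(record):
    """temperature IN (-10,19) AND quality IN (0,1,4,5,9); nulls never pass."""
    _, temperature, quality = record
    if temperature is None or quality is None:
        return False
    return temperature in (-10, 19) and quality in (0, 1, 4, 5, 9)


def temperature_only(record):
    """Only the temperature half of the filter."""
    _, temperature, _ = record
    return temperature in (-10, 19)


def count_kept(lines, predicate, delimiter="\t"):
    """Number of records that survive the given filter."""
    return sum(1 for line in lines if predicate(parse_line(line, delimiter)))


print("space-separated file, tab delimiter:", count_kept(SPACE_LINES, full_filter))
print("tab-separated file, full filter:", count_kept(TAB_LINES, full_filter))
print("tab-separated file, temperature only:", count_kept(TAB_LINES, temperature_only))
```

Under these assumptions, no row survives the full filter in either case. With a wrong delimiter the numeric fields are null. With a correct delimiter, only the rows for 2014 and 2011 match the temperature list, because `IN (-10,19)` tests membership in a list of values rather than a range, and their quality value 1459 is not in `(0,1,4,5,9)`. So if the delimiter fix alone does not help, the filter conditions themselves are worth a second look.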

Related

HBase Import module doesn't succeed

I have to move some HBase tables from one Hadoop cluster to another. I extracted the tables using
bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
and put the resulting files into HDFS on my new cluster.
But when I run bin/hbase org.apache.hadoop.hbase.mapreduce.Import, I get the following strange logs:
hadoop@edgenode:~$ hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Import ADCP /hbase/backup_hbase/ADCP/2022-07-04_1546/ADCP/
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase/lib/client-facing-thirdparty/slf4j-reload4j-1.7.33.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
2022-10-03 11:19:09,689 INFO [main] mapreduce.Import: writing directly to table from Mapper.
2022-10-03 11:19:09,847 INFO [main] client.RMProxy: Connecting to ResourceManager at /172.16.42.42:8032
2022-10-03 11:19:09,983 INFO [main] Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2022-10-03 11:19:10,043 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
2022-10-03 11:19:10,043 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:host.name=edgenode
2022-10-03 11:19:10,043 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_342
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:java.vendor=Private Build
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: hadoop-yarn-client-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-services-core-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-router-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-registry-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.3.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-3.3.3.jar
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop/lib/native
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:os.name=Linux
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:os.version=5.15.0-1018-kvm
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:os.memory.free=174MB
2022-10-03 11:19:10,044 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:os.memory.max=3860MB
2022-10-03 11:19:10,045 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Client environment:os.memory.total=237MB
2022-10-03 11:19:10,048 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Initiating client connection, connectString=namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$15/257950720#1124fc36
2022-10-03 11:19:10,054 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] common.X509Util: Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2022-10-03 11:19:10,061 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ClientCnxnSocket: jute.maxbuffer value is 4194304 Bytes
2022-10-03 11:19:10,069 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ClientCnxn: zookeeper.request.timeout value is 0. feature enabled=
2022-10-03 11:19:10,077 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7-SendThread(namenode:2181)] zookeeper.ClientCnxn: Opening socket connection to server namenode/172.16.42.42:2181. Will not attempt to authenticate using SASL (unknown error)
2022-10-03 11:19:10,084 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7-SendThread(namenode:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.16.42.187:48598, server: namenode/172.16.42.42:2181
2022-10-03 11:19:10,120 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7-SendThread(namenode:2181)] zookeeper.ClientCnxn: Session establishment complete on server namenode/172.16.42.42:2181, sessionid = 0x1b000002cb790005, negotiated timeout = 40000
2022-10-03 11:19:11,001 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7] zookeeper.ZooKeeper: Session: 0x1b000002cb790005 closed
2022-10-03 11:19:11,001 INFO [ReadOnlyZKClient-namenode:2181,datanode1:2181,datanode2:2181,datanode3:2181,datanode4:2181,datanode5:2181,datanode6:2181,datanode7:2181,datanode8:2181,datanode9:2181,datanode10:2181,datanode11:2181,datanode12:2181,datanode13:2181,datanode14:2181,datanode15:2181,datanode16:2181,datanode17:2181,datanode18:2181,datanode19:2181,datanode20:2181,datanode21:2181,datanode22:2181,datanode23:2181,datanode24:2181,datanode25:2181,datanode26:2181,edgenode:2181#0x05b970f7-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x1b000002cb790005
2022-10-03 11:19:15,366 INFO [main] input.FileInputFormat: Total input files to process : 32
2022-10-03 11:19:15,660 INFO [main] mapreduce.JobSubmitter: number of splits:32
2022-10-03 11:19:15,902 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1664271607293_0002
2022-10-03 11:19:16,225 INFO [main] conf.Configuration: resource-types.xml not found
2022-10-03 11:19:16,225 INFO [main] resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-10-03 11:19:16,231 INFO [main] resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
2022-10-03 11:19:16,231 INFO [main] resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
2022-10-03 11:19:16,293 INFO [main] impl.YarnClientImpl: Submitted application application_1664271607293_0002
2022-10-03 11:19:16,328 INFO [main] mapreduce.Job: The url to track the job: http://namenode:8088/proxy/application_1664271607293_0002/
2022-10-03 11:19:16,329 INFO [main] mapreduce.Job: Running job: job_1664271607293_0002
2022-10-03 11:19:31,513 INFO [main] mapreduce.Job: Job job_1664271607293_0002 running in uber mode : false
2022-10-03 11:19:31,514 INFO [main] mapreduce.Job: map 0% reduce 0%
2022-10-03 11:19:31,534 INFO [main] mapreduce.Job: [2022-10-03 11:19:31.345]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2022-10-03 11:19:31.346]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://namenode:8088/cluster/app/application_1664271607293_0002 Then click on links to logs of each attempt.
. Failing the application.
2022-10-03 11:19:31,552 INFO [main] mapreduce.Job: Counters: 0
I don't understand what the problem could be. I went to http://namenode:8088/cluster/app/application_1664271607293_0002 but didn't find anything useful. I've tried the command with different tables and get the same result. The two clusters are not on the same version, but I read that this shouldn't be a problem. Every service works well on my clusters, and I can use HBase commands in the HBase shell without any problem. MapReduce programs also work well on my new cluster. I've also tested the CopyTable and snapshot methods for the data migration, which didn't work either.
Any idea what the problem could be? Thanks! :)
Update:
I found this in a datanode syslog in the Hadoop web interface; it may be useful:
2022-10-04 14:12:39,341 INFO [main] org.apache.hadoop.security.SecurityUtil: Updating Configuration
2022-10-04 14:12:39,354 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2022-10-04 14:12:39,493 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 7 cluster_timestamp: 1664271607293 } attemptId: 2 } keyId: -896624238)
2022-10-04 14:12:39,536 INFO [main] org.apache.hadoop.conf.Configuration: resource-types.xml not found
2022-10-04 14:12:39,536 INFO [main] org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-10-04 14:12:39,636 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
at org.apache.hadoop.yarn.util.Records.newRecord(Records.java:36)
at org.apache.hadoop.mapreduce.v2.util.MRBuilderUtils.newJobId(MRBuilderUtils.java:39)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:298)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1745)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1742)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1673)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:70)
... 10 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; #36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: #36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(#19)
same_frame(#31)
same_frame(#40)
at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 15 more
2022-10-04 14:12:39,641 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
at org.apache.hadoop.yarn.util.Records.newRecord(Records.java:36)
at org.apache.hadoop.mapreduce.v2.util.MRBuilderUtils.newJobId(MRBuilderUtils.java:39)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:298)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1745)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1742)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1673)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:70)
... 10 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; #36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: #36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(#19)
same_frame(#31)
same_frame(#40)
at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 15 more
2022-10-04 14:12:39,643 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException

Hadoop 2.6.4 MR job quick freeze

Hadoop 2.6.4: 1 master + 2 slaves on AWS EC2
master: namenode, secondary namenode, resource manager
slave: datanode, node manager
When running a test MR job (wordcount), it freezes right away:
hduser@ip-172-31-4-108:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /data/shakespeare /data/out1
16/03/21 10:45:19 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-4-108/172.31.4.108:8032
16/03/21 10:45:21 INFO input.FileInputFormat: Total input paths to process : 5
16/03/21 10:45:21 INFO mapreduce.JobSubmitter: number of splits:5
16/03/21 10:45:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458556970596_0001
16/03/21 10:45:22 INFO impl.YarnClientImpl: Submitted application application_1458556970596_0001
16/03/21 10:45:22 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-108:8088/proxy/application_1458556970596_0001/
16/03/21 10:45:22 INFO mapreduce.Job: Running job: job_1458556970596_0001
When running start-dfs.sh and start-yarn.sh on the master, all daemons start successfully (checked with the jps command) on the corresponding EC2 instances.
Below is the ResourceManager log when launching the MR job:
2016-03-21 10:45:20,152 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 1
2016-03-21 10:45:22,784 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 1 submitted by user hduser
2016-03-21 10:45:22,785 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1458556970596_0001
2016-03-21 10:45:22,787 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser IP=172.31.4.108 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1458556970596_0001
2016-03-21 10:45:22,788 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from NEW to NEW_SAVING
2016-03-21 10:45:22,805 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1458556970596_0001
2016-03-21 10:45:22,807 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from NEW_SAVING to SUBMITTED
2016-03-21 10:45:22,809 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1458556970596_0001 user: hduser leaf-queue of parent: root #applications: 1
2016-03-21 10:45:22,810 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Accepted application application_1458556970596_0001 from user: hduser, in queue: default
2016-03-21 10:45:22,825 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from SUBMITTED to ACCEPTED
2016-03-21 10:45:22,866 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1458556970596_0001_000001
2016-03-21 10:45:22,867 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1458556970596_0001_000001 State change from NEW to SUBMITTED
2016-03-21 10:45:22,896 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue, it is likely set too low. skipping enforcement to allow at least one application to start
2016-03-21 10:45:22,896 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue for user, it is likely set too low. skipping enforcement to allow at least one application to start
2016-03-21 10:45:22,897 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application application_1458556970596_0001 from user: hduser activated in queue: default
2016-03-21 10:45:22,898 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application added - appId: application_1458556970596_0001 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@1d51055, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2016-03-21 10:45:22,898 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Added Application Attempt appattempt_1458556970596_0001_000001 to scheduler from user hduser in queue default
2016-03-21 10:45:22,900 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1458556970596_0001_000001 State change from SUBMITTED to SCHEDULED
Below is the NameNode log when launching the MR job:
2016-03-21 10:45:03,746 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-03-21 10:45:03,746 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:45:20,613 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 3 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 7
2016-03-21 10:45:20,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.jar. BP-1804768821-172.31.4.108-1458553823105 blk_1073741834_1010{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]}
2016-03-21 10:45:21,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1073741834_1010{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} has not reached minimal replication 1
2016-03-21 10:45:21,292 INFO org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream: Nothing to flush
2016-03-21 10:45:21,297 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741834_1010{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 270356
2016-03-21 10:45:21,297 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741834_1010 size 270356
2016-03-21 10:45:21,706 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.jar is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:21,714 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.jar
2016-03-21 10:45:21,812 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.split
2016-03-21 10:45:21,823 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.split. BP-1804768821-172.31.4.108-1458553823105 blk_1073741835_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW], ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW]]}
2016-03-21 10:45:21,849 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741835_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW], ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW]]} size 0
2016-03-21 10:45:21,853 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741835_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW], ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW]]} size 0
2016-03-21 10:45:21,855 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.split is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:21,865 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.splitmetainfo. BP-1804768821-172.31.4.108-1458553823105 blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]}
2016-03-21 10:45:21,876 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:21,877 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:21,880 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:22,277 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.xml. BP-1804768821-172.31.4.108-1458553823105 blk_1073741837_1013{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]}
2016-03-21 10:45:22,327 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741837_1013{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:22,328 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741837_1013{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:22,332 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.xml is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:33,746 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-03-21 10:45:33,747 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:46:03,748 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-03-21 10:46:03,748 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:46:33,748 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-03-21 10:46:33,749 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:47:03,749 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-03-21 10:47:03,750 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
Any ideas? Thank you in advance for your support!
Below is the content of the *-site.xml files. Note: I have applied some sizing values to the properties, but I had the EXACT SAME issue with a minimal configuration (only mandatory properties).
core-site.xml
<configuration>
<property><name>fs.defaultFS</name><value>hdfs://ip-172-31-4-108:8020</value></property>
</configuration>
hdfs-site.xml
<configuration>
<property><name>dfs.replication</name><value>2</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:///xvda1/dfs/nn</value></property>
<property><name>dfs.datanode.data.dir</name><value>file:///xvda1/dfs/dn</value></property>
</configuration>
mapred-site.xml
<configuration>
<property><name>mapreduce.jobhistory.address</name><value>ip-172-31-4-108:10020</value></property>
<property><name>mapreduce.jobhistory.webapp.address</name><value>ip-172-31-4-108:19888</value></property>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property><name>mapreduce.map.memory.mb</name><value>512</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
<property><name>mapreduce.map.java.opts</name><value>410</value></property>
<property><name>mapreduce.reduce.java.opts</name><value>820</value></property>
</configuration>
yarn-site.xml
<configuration>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
<property><name>yarn.resourcemanager.hostname</name><value>ip-172-31-4-108</value></property>
<property><name>yarn.nodemanager.local-dirs</name><value>file:///xvda1/nodemgr/local</value></property>
<property><name>yarn.nodemanager.log-dirs</name><value>/var/log/hadoop-yarn/containers</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>/var/log/hadoop-yarn/apps</value></property>
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>1024</value></property>
<property><name>yarn.app.mapreduce.am.command-opts</name><value>820</value></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>6291456</value></property>
<property><name>yarn.scheduler.minimum_allocation-mb</name><value>524288</value></property>
<property><name>yarn.scheduler.maximum_allocation-mb</name><value>6291456</value></property>
</configuration>
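For comparison, here is a hedged sketch of how the memory-related properties above might look if the large values were meant as kilobytes (6291456 KB = 6144 MB, 524288 KB = 512 MB) and the java.opts values as heap sizes in megabytes. Both readings are assumptions that the post does not confirm, and this is not claimed to be the cause of the freeze. Hadoop reads the `*-mb` and `*.mb` properties as megabytes, spells the scheduler properties `minimum-allocation-mb` and `maximum-allocation-mb` with hyphens only, and treats the `*.java.opts` and `command-opts` values as JVM command-line options.

```xml
<!-- yarn-site.xml: container sizes in MB (the 6 GB node size is an assumption) -->
<property><name>yarn.nodemanager.resource.memory-mb</name><value>6144</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>6144</value></property>
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>1024</value></property>
<property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx820m</value></property>

<!-- mapred-site.xml: heap sizes passed as JVM options, kept below the container sizes -->
<property><name>mapreduce.map.memory.mb</name><value>512</value></property>
<property><name>mapreduce.map.java.opts</name><value>-Xmx410m</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
<property><name>mapreduce.reduce.java.opts</name><value>-Xmx820m</value></property>
```

With the underscore spelling, the two scheduler properties are not recognized, so the scheduler would fall back to its default minimum and maximum allocations.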

Some tasks in map() fail when I run it on AWS

I was running PageRank on the s3://aws-publicdatasets/common-crawl/parse-output/segment/1346876860819/metadata-XXXX dataset. The program worked when I used 10 files (about 1 GB) with 2 m1.medium instances, but when I use 300 files (20 GB) with 5 m3.xlarge instances, it fails at map 39%, reduce 4%. Could you please help find the possible reason for the failure?
Here are the logs.
stderr:
AttemptID:attempt_1411372099942_0001_m_000010_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000014_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000015_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000057_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000103_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000094_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000109_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000108_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000133_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000136_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000010_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000151_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000014_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000168_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000167_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000015_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000174_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000175_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000057_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000181_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000182_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000190_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000103_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000109_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000094_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000200_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000108_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000133_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000199_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000136_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000010_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000151_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000206_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000207_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000014_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000168_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000175_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000167_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000174_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000015_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000057_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000181_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000182_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000190_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000103_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000094_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000200_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000109_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000108_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000133_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000199_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000136_2 Timed out after 600 secs
part of syslog:
08:24:24,791 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000168_1, Status : FAILED
2014-09-22 08:24:46,873 INFO org.apache.hadoop.mapreduce.Job (main): map 39% reduce 4%
2014-09-22 08:24:54,903 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000175_1, Status : FAILED
2014-09-22 08:24:54,904 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000167_1, Status : FAILED
2014-09-22 08:24:54,904 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000174_1, Status : FAILED
2014-09-22 08:24:55,908 INFO org.apache.hadoop.mapreduce.Job (main): map 38% reduce 4%
2014-09-22 08:25:13,968 INFO org.apache.hadoop.mapreduce.Job (main): map 39% reduce 4%
2014-09-22 08:25:25,007 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000015_2, Status : FAILED
2014-09-22 08:26:24,210 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000057_2, Status : FAILED
2014-09-22 08:26:54,322 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000181_1, Status : FAILED
2014-09-22 08:27:24,432 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000182_1, Status : FAILED
2014-09-22 08:27:25,435 INFO org.apache.hadoop.mapreduce.Job (main): map 38% reduce 4%
2014-09-22 08:27:54,543 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000190_1, Status : FAILED
2014-09-22 08:28:54,751 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000103_2, Status : FAILED
2014-09-22 08:29:24,851 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000094_2, Status : FAILED
2014-09-22 08:29:24,852 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000200_1, Status : FAILED
2014-09-22 08:29:24,853 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000109_2, Status : FAILED
2014-09-22 08:29:48,931 INFO org.apache.hadoop.mapreduce.Job (main): map 39% reduce 4%
2014-09-22 08:29:54,954 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000108_2, Status : FAILED
2014-09-22 08:30:24,066 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000133_2, Status : FAILED
2014-09-22 08:32:54,599 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000199_1, Status : FAILED
2014-09-22 08:32:54,600 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000136_2, Status : FAILED
2014-09-22 08:34:25,910 INFO org.apache.hadoop.mapreduce.Job (main): map 100% reduce 100%
2014-09-22 08:34:25,915 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1411372099942_0001 failed with state FAILED due to: Task failed task_1411372099942_0001_m_000010
Job failed as tasks failed. failedMaps:1 failedReduces:0
Attempts for: s-1W7C8YIFC87Y8, Job 1411372099942_0001, Task
2014-09-22 08:18:27,238 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:27,322 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:28,462 INFO main org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-09-22 08:18:28,496 INFO main org.apache.hadoop.metrics2.sink.cloudwatch.CloudWatchSink: Initializing the CloudWatchSink for metrics.
2014-09-22 08:18:28,795 INFO main org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink file started
2014-09-22 08:18:28,967 INFO main org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 300 second(s).
2014-09-22 08:18:28,967 INFO main org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-09-22 08:18:28,982 INFO main org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-09-22 08:18:28,983 INFO main org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1411372099942_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@3fc15856)
2014-09-22 08:18:29,157 INFO main org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-09-22 08:18:29,880 INFO main org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1411372099942_0001,/mnt1/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1411372099942_0001,/mnt2/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1411372099942_0001
2014-09-22 08:18:30,164 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:30,182 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:31,063 INFO main org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2014-09-22 08:18:32,100 INFO main org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2014-09-22 08:18:32,605 INFO main org.apache.hadoop.mapred.MapTask: Processing split: s3://aws-publicdatasets/common-crawl/parse-output/segment/1346876860819/metadata-00122:0+67108864
2014-09-22 08:18:32,810 INFO main amazon.emr.metrics.MetricsSaver: MetricsSaver YarnChild root:hdfs:///mnt/var/em/ period:120 instanceId:i-ec84e7c1 jobflow:j-27XODJ8WMW4VP
2014-09-22 08:18:33,205 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:33,219 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:33,221 INFO main com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as FileSystem implementation.
2014-09-22 08:18:35,024 INFO main com.amazon.ws.emr.hadoop.fs.EmrFileSystem: Using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2014-09-22 08:18:36,001 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:36,002 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:36,024 INFO main org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 52428796(209715184)
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 200
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: soft limit at 167772160
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 209715200
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: kvstart = 52428796; length = 13107200
2014-09-22 08:18:36,597 INFO main com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem: Opening 's3://aws-publicdatasets/common-crawl/parse-output/segment/1346876860819/metadata-00122' for reading
2014-09-22 08:18:36,716 INFO main org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2014-09-22 08:18:36,720 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-09-22 08:18:36,726 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2014-09-22 08:18:36,726 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2014-09-22 08:18:36,727 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
Edited by: paraxx on Sep 22, 2014 10:25 AM
task_1411372099942_0001_m_000010 has timed out. Try increasing the timeout configuration parameter.
mapreduce.task.timeout=12000000

My MapReduce job fails

I have a MapReduce program in Eclipse and I want to run it. I followed the program from the URL below:
http://www.orzota.com/step-by-step-mapreduce-programming/
I did everything the page says and ran the program, but it shows an error and my job fails. The program creates the output folder, but it is empty.
Here is my code:
package org.orzota.bookx.mappers;
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class MyHadoopMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    public void map(LongWritable _key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String st = value.toString();
        String[] bookdata = st.split("\";\"");
        output.collect(new Text(bookdata[3]), one);
    }
}
public class MyHadoopReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text _key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        Text key = _key;
        int freq = 0;
        while (values.hasNext()) {
            IntWritable value = (IntWritable) values.next();
            freq += value.get();
        }
        output.collect(key, new IntWritable(freq));
    }
}
public class MyHadoopDriver {
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(
            org.orzota.bookx.mappers.MyHadoopDriver.class);
        conf.setJobName("BookCrossing1.0");
        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // TODO: specify a mapper
        conf.setMapperClass(org.orzota.bookx.mappers.MyHadoopMapper.class);
        // TODO: specify a reducer
        conf.setReducerClass(org.orzota.bookx.mappers.MyHadoopReducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
And here are the errors:
13/09/03 12:19:11 INFO util.ProcessTree: setsid exited with exit code 0
13/09/03 12:19:11 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3c2378
13/09/03 12:19:11 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclip/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:11 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:12 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:12 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:12 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:12 INFO mapred.JobClient: map 0% reduce 0%
13/09/03 12:19:13 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:14 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:14 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000000_0 is done. And is in the process of commiting
13/09/03 12:19:14 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:14 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000000_0' done.
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000000_0
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:14 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@15dd910
13/09/03 12:19:14 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:14 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:14 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:14 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:14 INFO mapred.JobClient: map 20% reduce 0%
13/09/03 12:19:15 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:15 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:15 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000001_0 is done. And is in the process of commiting
13/09/03 12:19:15 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:15 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000001_0' done.
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000002_0
13/09/03 12:19:15 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7c3885
13/09/03 12:19:15 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Book-Ratings.csv:0+30682276
13/09/03 12:19:15 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:15 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000003_0
13/09/03 12:19:16 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11d2572
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Users.csv:0+12284157
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:16 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@164b09c
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.JobClient: map 40% reduce 0%
13/09/03 12:19:17 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:17 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:17 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000004_0 is done. And is in the process of commiting
13/09/03 12:19:17 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:17 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000004_0' done.
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Map task executor complete.
13/09/03 12:19:17 WARN mapred.LocalJobRunner: job_local1379860058_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:17)
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
13/09/03 12:19:17 INFO mapred.JobClient: map 60% reduce 0%
13/09/03 12:19:17 INFO mapred.JobClient: Job complete: job_local1379860058_0001
13/09/03 12:19:17 INFO mapred.JobClient: Counters: 16
13/09/03 12:19:17 INFO mapred.JobClient: File Input Format Counters
13/09/03 12:19:17 INFO mapred.JobClient: Bytes Read=77795631
13/09/03 12:19:17 INFO mapred.JobClient: FileSystemCounters
13/09/03 12:19:17 INFO mapred.JobClient: FILE_BYTES_READ=178484057
13/09/03 12:19:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6981917
13/09/03 12:19:17 INFO mapred.JobClient: Map-Reduce Framework
13/09/03 12:19:17 INFO mapred.JobClient: Map output materialized bytes=2971356
13/09/03 12:19:17 INFO mapred.JobClient: Map input records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Spilled Records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Map output bytes=2428578
13/09/03 12:19:17 INFO mapred.JobClient: Total committed heap usage (bytes)=883687424
13/09/03 12:19:17 INFO mapred.JobClient: CPU time spent (ms)=0
13/09/03 12:19:17 INFO mapred.JobClient: Map input bytes=77787439
13/09/03 12:19:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=306
13/09/03 12:19:17 INFO mapred.JobClient: Combine input records=0
13/09/03 12:19:17 INFO mapred.JobClient: Combine output records=0
13/09/03 12:19:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient: Map output records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Job Failed: NA java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.orzota.bookx.mappers.MyHadoopDriver.main(MyHadoopDriver.java:44)
I think the error comes from this line:
output.collect(new Text(bookdata[3]), one);
but I don't understand what it means. Can anyone help me, please? Thanks.
I checked the link you provided. I think the best thing you can do is print your input key-value pairs with System.out.println() (on a small subset of your input dataset), just to be sure. If a field in the input file contains a '\n', the CSV record may be broken into two separate records that each contain fewer than 8 substrings. The ArrayIndexOutOfBoundsException seems to point in this direction. I don't think it is a MapReduce error. You could also add the following lines to your map function:
if (bookdata.length != 8) {
    System.out.println("Warning, bad entry");
    return;
}
If the job then runs to completion, you have isolated the problem.
Most probably the input file you are reading has a row with fewer than 4 columns.
So when you split the row into an array,
String[] bookdata = st.split("\";\"");
and then try to access the 4th element,
output.collect(new Text(bookdata[3]), one);
it fails.
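A minimal sketch of such a guard, written in plain Java without the Hadoop classes so it can be run on its own. The class name SafeBookFieldDemo, the method extractField and both sample rows are made up for illustration; only the split pattern and the index 3 come from the question. Note that the log above also shows splits from BX-Book-Ratings.csv and BX-Users.csv in the same input directory, so rows with fewer fields than BX-Books.csv are plausible.

```java
public class SafeBookFieldDemo {
    // Index read by the mapper in the question (bookdata[3]).
    static final int FIELD_INDEX = 3;

    /**
     * Splits a row on the ";" separator (quotes included, as in the question)
     * and returns the field at FIELD_INDEX, or null when the row is too short.
     */
    public static String extractField(String line) {
        String[] bookdata = line.split("\";\"");
        if (bookdata.length <= FIELD_INDEX) {
            // Too few columns: the caller should skip this row instead of
            // indexing past the end of the array.
            return null;
        }
        return bookdata[FIELD_INDEX];
    }

    public static void main(String[] args) {
        // Hypothetical rows, not taken from the real data set.
        String fullRow = "\"id1\";\"title1\";\"author1\";\"2002\";\"publisher1\"";
        String shortRow = "\"id2\";\"title2";

        System.out.println(extractField(fullRow));   // prints 2002
        System.out.println(extractField(shortRow));  // prints null
    }
}
```

In the real mapper, the null case would correspond to returning from map() without calling output.collect(...).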

hadoop: reduce happened between flush map output and finish spill before maps done

I'm new to Hadoop, and I'm trying the wordcount/secondsort examples in src/examples.
wordcount test environment:
input:
file01.txt
file02.txt
secondsort test environment:
input:
sample01.txt
sample02.txt
This means both tests have 2 input paths to process.
I print some log info to try to understand the map/reduce process.
Look at what happens between Starting flush of map output and Finished spill 0:
the wordcount program runs reduce two more times before the final reduce, while
the secondsort program runs reduce only once and is done.
Since these programs are so "small", I don't think io.sort.mb/io.sort.factor would affect this.
Can anybody explain this?
Thanks for your patience with my broken English and the long log.
Here is the log info (I cut some useless info to keep it short):
wordcount log:
[hadoop@localhost ~]$ hadoop jar test.jar com.abc.example.test wordcount output
13/08/07 18:14:05 INFO mapred.FileInputFormat: Total input paths to process : 2
13/08/07 18:14:06 INFO mapred.JobClient: Running job: job_local_0001
13/08/07 18:14:06 INFO util.ProcessTree: setsid exited with exit code 0
...
13/08/07 18:14:06 INFO mapred.MapTask: numReduceTasks: 1
13/08/07 18:14:06 INFO mapred.MapTask: io.sort.mb = 100
13/08/07 18:14:06 INFO mapred.MapTask: data buffer = 79691776/99614720
13/08/07 18:14:06 INFO mapred.MapTask: record buffer = 262144/327680
Mapper: 0 | Hello Hadoop GoodBye Hadoop
13/08/07 18:14:06 INFO mapred.MapTask: **Starting flush of map output**
Reduce: GoodBye
Reduce: GoodBye | 1
Reduce: Hadoop
Reduce: Hadoop | 1
Reduce: Hadoop | 1
Reduce: Hello
Reduce: Hello | 1
13/08/07 18:14:06 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/08/07 18:14:06 INFO mapred.LocalJobRunner: hdfs://localhost:8020/user/hadoop/wordcount/file02.txt:0+28
13/08/07 18:14:06 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/08/07 18:14:06 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4d16ffed
13/08/07 18:14:06 INFO mapred.MapTask: numReduceTasks: 1
13/08/07 18:14:06 INFO mapred.MapTask: io.sort.mb = 100
13/08/07 18:14:06 INFO mapred.MapTask: data buffer = 79691776/99614720
13/08/07 18:14:06 INFO mapred.MapTask: record buffer = 262144/327680
13/08/07 18:14:06 INFO mapred.MapTask: **Starting flush of map output**
Reduce: Bye
Reduce: Bye | 1
Reduce: Hello
Reduce: Hello | 1
Reduce: world
Reduce: world | 1
Reduce: world | 1
13/08/07 18:14:06 INFO mapred.MapTask: **Finished spill 0**
13/08/07 18:14:06 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/08/07 18:14:06 INFO mapred.LocalJobRunner: hdfs://localhost:8020/user/hadoop/wordcount/file01.txt:0+22
13/08/07 18:14:06 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
13/08/07 18:14:06 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1f3c0665
13/08/07 18:14:06 INFO mapred.LocalJobRunner:
13/08/07 18:14:06 INFO mapred.Merger: Merging 2 sorted segments
13/08/07 18:14:06 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 77 bytes
13/08/07 18:14:06 INFO mapred.LocalJobRunner:
Reduce: Bye
Reduce: Bye | 1
Reduce: GoodBye
Reduce: GoodBye | 1
Reduce: Hadoop
Reduce: Hadoop | 2
Reduce: Hello
Reduce: Hello | 1
Reduce: Hello | 1
Reduce: world
Reduce: world | 2
13/08/07 18:14:06 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
...
13/08/07 18:14:07 INFO mapred.JobClient: Reduce input groups=5
13/08/07 18:14:07 INFO mapred.JobClient: Combine output records=6
13/08/07 18:14:07 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/08/07 18:14:07 INFO mapred.JobClient: Reduce output records=5
13/08/07 18:14:07 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/08/07 18:14:07 INFO mapred.JobClient: Map output records=8
secondsort log info:
[hadoop@localhost ~]$ hadoop jar example.jar com.abc.example.example secondsort output
13/08/07 17:00:11 INFO input.FileInputFormat: Total input paths to process : 2
13/08/07 17:00:11 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/07 17:00:12 INFO mapred.JobClient: Running job: job_local_0001
13/08/07 17:00:12 INFO util.ProcessTree: setsid exited with exit code 0
13/08/07 17:00:12 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@57d94c7b
13/08/07 17:00:12 INFO mapred.MapTask: io.sort.mb = 100
13/08/07 17:00:12 INFO mapred.MapTask: data buffer = 79691776/99614720
13/08/07 17:00:12 INFO mapred.MapTask: record buffer = 262144/327680
Map: 0 | 5 49
Map: 5 | 9 57
Map: 10 | 19 46
Map: 16 | 3 21
Map: 21 | 9 48
Map: 26 | 7 57
...
13/08/07 17:00:12 INFO mapred.MapTask: **Starting flush of map output**
13/08/07 17:00:12 INFO mapred.MapTask: **Finished spill 0**
13/08/07 17:00:12 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/08/07 17:00:12 INFO mapred.LocalJobRunner:
13/08/07 17:00:12 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/08/07 17:00:12 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@f3a1ea1
13/08/07 17:00:12 INFO mapred.MapTask: io.sort.mb = 100
13/08/07 17:00:12 INFO mapred.MapTask: data buffer = 79691776/99614720
13/08/07 17:00:12 INFO mapred.MapTask: record buffer = 262144/327680
Map: 0 | 20 21
Map: 6 | 50 51
Map: 12 | 50 52
Map: 18 | 50 53
Map: 24 | 50 54
...
13/08/07 17:00:12 INFO mapred.MapTask: **Starting flush of map output**
13/08/07 17:00:12 INFO mapred.MapTask: **Finished spill 0**
13/08/07 17:00:12 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/08/07 17:00:12 INFO mapred.LocalJobRunner:
13/08/07 17:00:12 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
13/08/07 17:00:12 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@cee4e92
13/08/07 17:00:12 INFO mapred.LocalJobRunner:
13/08/07 17:00:12 INFO mapred.Merger: Merging 2 sorted segments
13/08/07 17:00:12 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 1292 bytes
13/08/07 17:00:12 INFO mapred.LocalJobRunner:
Reduce: 0:35 -----------------
Reduce: 0:35 | 35
Reduce: 0:54 -----------------
...
13/08/07 17:00:12 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/08/07 17:00:12 INFO mapred.LocalJobRunner:
13/08/07 17:00:12 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/08/07 17:00:12 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
13/08/07 17:00:12 INFO mapred.LocalJobRunner: reduce > reduce
13/08/07 17:00:12 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/08/07 17:00:13 INFO mapred.JobClient: map 100% reduce 100%
13/08/07 17:00:13 INFO mapred.JobClient: Job complete: job_local_0001
13/08/07 17:00:13 INFO mapred.JobClient: Counters: 22
13/08/07 17:00:13 INFO mapred.JobClient: File Output Format Counters
13/08/07 17:00:13 INFO mapred.JobClient: Bytes Written=4787
...
13/08/07 17:00:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=236
13/08/07 17:00:13 INFO mapred.JobClient: Reduce input records=92
PS: The main()s for others to check out.
wordcount:
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(test.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
secondsort:
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException
{
Configuration conf = new Configuration();
Job job = new Job(conf, "secondarysort");
job.setJarByClass(example.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setPartitionerClass(FirstPartitioner.class);
job.setGroupingComparatorClass(GroupingComparator.class);
job.setMapOutputKeyClass(IntPair.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Combine output records=6
This says it all: the reduce function is used both as a combiner and a reducer. So what you are seeing is output from the combiner. The combiner is (sometimes) invoked when output is spilled.
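To make the effect concrete, here is a small stand-alone simulation in plain Java (no Hadoop classes). It is only a sketch of the idea, not how Hadoop implements spilling: CombinerDemo, combine and reduce are hypothetical names, and the second input line is an assumption inferred from the combiner output in the wordcount log. The same summing logic runs once per map task (the combiner role) and once more over the merged partial results (the reducer role), which is why the wordcount log shows "Reduce:" lines three times.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerDemo {
    /** Combiner role: pre-sums the (word, 1) pairs produced by one map task. */
    public static Map<String, Integer> combine(String line) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    /** Reducer role: sums the partial counts coming from all map tasks. */
    public static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // First line is the mapper input shown in the log; the second is assumed.
        Map<String, Integer> task0 = combine("Hello Hadoop GoodBye Hadoop");
        Map<String, Integer> task1 = combine("Hello world Bye world");

        System.out.println("combiner, map task 0: " + task0);
        System.out.println("combiner, map task 1: " + task1);
        System.out.println("final reduce:         " + reduce(Arrays.asList(task0, task1)));
    }
}
```

The secondsort job never sets a combiner class, so its Reduce output appears only once, after the merge.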
I think you should have added your code, at least the part in the main() to show us how your job is set up. This would make it easier to answer your questions.
I think the lines such as
Reduce: GoodBye
Reduce: GoodBye | 1
are produced by println(...) calls in your source code, so you should check the source code.
