Pig DUMP gets stuck in GROUP - hadoop

I'm a Pig beginner (using Pig 0.10.0) and I have some simple JSON like the following:
test.json:
{
  "from": "1234567890",
  .....
  "profile": {
    "email": "me@domain.com"
    .....
  }
}
on which I perform some grouping and counting in Pig:
>pig -x local
with the following Pig script:
REGISTER /pig-udfs/oink.jar;
REGISTER /pig-udfs/json-simple-1.1.jar;
REGISTER /pig-udfs/guava-12.0.jar;
REGISTER /pig-udfs/elephant-bird-2.2.3.jar;
users = LOAD 'test.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true') as (json:map[]);
domain_user = FOREACH users GENERATE oink.EmailDomainFilter(json#'profile'#'email') as email, json#'from' as user_id;
DUMP domain_user; /* Outputs: (domain.com,1234567890) */
grouped_domain_user = GROUP domain_user BY email;
DUMP grouped_domain_user; /* Outputs: =stuck here= */
Basically, when I try to DUMP grouped_domain_user, Pig gets stuck, seemingly waiting for a map output to complete:
2012-05-31 17:45:22,111 [Thread-15] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0002_m_000000_0' done.
2012-05-31 17:45:22,119 [Thread-15] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-05-31 17:45:22,123 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - ShuffleRamManager: MemoryLimit=724828160, MaxSingleShuffleLimit=181207040
2012-05-31 17:45:22,125 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,125 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,125 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,126 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,126 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,128 [Thread for merging on-disk files] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread started: Thread for merging on-disk files
2012-05-31 17:45:22,128 [Thread for merging in memory files] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread started: Thread for merging in memory files
2012-05-31 17:45:22,128 [Thread for merging on-disk files] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread waiting: Thread for merging on-disk files
2012-05-31 17:45:22,129 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2012-05-31 17:45:22,129 [Thread for polling Map Completion Events] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread started: Thread for polling Map Completion Events
2012-05-31 17:45:22,129 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-05-31 17:45:28,118 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:31,122 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:37,123 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:43,124 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:46,124 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:52,126 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:58,127 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:46:01,128 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
.... repeats ....
Suggestions would be welcome on why this is happening.
Thanks!
UPDATE
Chris solved this one for me. I was setting fs.default.name, etc. to the correct values in pig.properties; however, I also had the HADOOP_CONF_DIR environment variable pointing to my local Hadoop installation, where these same values were set with <final>true</final>.
Great find and much appreciated.
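For anyone replicating this, local-mode overrides in pig.properties look roughly like the sketch below (the exact values are assumptions for a typical local setup; as described next, they are silently ignored while the same keys are marked final in the configs under HADOOP_CONF_DIR):
# pig.properties -- hypothetical local-mode overrides
fs.default.name=file:///
mapred.job.tracker=local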

To mark this question as answered, and for those stumbling across this in the future:
When running in local mode (whether for Pig via pig -x local, or when submitting a MapReduce job to the local job runner), if you see the reduce phase 'hang', especially if you see entries in the log similar to:
2012-05-31 17:45:22,129 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Need another 1 map output(s) where 0 is already in progress
Then your job, although started in local mode, has probably switched to 'clustered' mode because the mapred.job.tracker property is marked as 'final' in your $HADOOP/conf/mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:9000</value>
  <final>true</final>
</property>
You should also check the fs.default.name property in core-site.xml, and ensure it is not marked as final.
This means that you are unable to set this value at runtime, and you may even see error messages similar to:
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/.../job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/.../job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
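The fix is to remove the final markers (or the conflicting HADOOP_CONF_DIR), after which a core-site.xml entry might look like this sketch (host and port are placeholders); the point is simply that <final>true</final> is gone, so local mode can override the value at runtime:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <!-- no <final>true</final>, so pig -x local can override it -->
</property>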

Related

Oozie workflow giving error on Hive job when underlying job completes successfully

As part of self-learning I am exploring Oozie, practicing on the Hortonworks Sandbox VM. The problem is that the Oozie workflow errors out and is killed, even though the underlying job (reached via the link in the Oozie UI) shows success.
I have looked at this question and have included
<job-xml>hive-site.xml</job-xml>
in the job description, and have copied hive-site.xml to the correct folder on HDFS, but to no avail. Additionally, I have double-checked all URLs and everything is right.
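For context, the hive action referencing that job-xml looks roughly like this sketch (the action name matches the logs below; the script name and the rest are assumptions):
<action name="define_congress_table">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>hive-site.xml</job-xml>
    <script>define_congress_table.hql</script>
  </hive>
  <ok to="end"/>
  <error to="fail"/>
</action>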
I am running the Oozie job from the command line. I have no idea where to start debugging or how to get a more detailed error. The screenshots showed the Oozie error, while the underlying Hive job indicated successful completion, and I do not see the final result as a Hive table, as I am supposed to.
Following is the log output of the Map task:
<<< Invocation of Hive command completed <<<
Hadoop Job IDs executed by Hive:
Intercepting System.exit(12)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [12]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000005-160711211729704-oozie-oozi-W/define_congress_table--hive/action-data.seq
2016-07-12 05:30:57,817 INFO [main] zlib.ZlibFactory (ZlibFactory.java:<clinit>(49)) - Successfully loaded & initialized native-zlib library
2016-07-12 05:30:57,818 INFO [main] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
Oozie Launcher ends
2016-07-12 05:30:57,836 INFO [main] mapred.Task (Task.java:done(1038)) - Task:attempt_1468271868299_0037_m_000000_0 is done. And is in the process of committing
2016-07-12 05:30:57,878 INFO [main] mapred.Task (Task.java:commit(1199)) - Task attempt_1468271868299_0037_m_000000_0 is allowed to commit now
2016-07-12 05:30:57,887 INFO [main] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(582)) - Saved output of task 'attempt_1468271868299_0037_m_000000_0' to hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000005-160711211729704-oozie-oozi-W/define_congress_table--hive/output/_temporary/1/task_1468271868299_0037_m_000000
2016-07-12 05:30:57,936 INFO [main] mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_1468271868299_0037_m_000000_0' done.
Log Type: syslog
Log Upload Time: Tue Jul 12 05:31:05 +0000 2016
Log Length: 2781
2016-07-12 05:30:48,083 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-07-12 05:30:48,151 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-12 05:30:48,152 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-07-12 05:30:48,163 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-07-12 05:30:48,163 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1468271868299_0037, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@1fbe7534)
2016-07-12 05:30:48,212 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 10.0.2.15:8050, Ident: (owner=root, renewer=oozie mr token, realUser=oozie, issueDate=1468301434802, maxDate=1468906234802, sequenceNumber=22, masterKeyId=90)
2016-07-12 05:30:48,257 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-07-12 05:30:48,496 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /hadoop/yarn/local/usercache/root/appcache/application_1468271868299_0037
2016-07-12 05:30:48,955 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-07-12 05:30:49,414 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-07-12 05:30:49,414 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-07-12 05:30:49,423 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-07-12 05:30:49,475 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 4558
2016-07-12 05:30:49,647 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit@1f16b6e6
2016-07-12 05:30:49,654 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2016-07-12 05:30:49,700 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2016-07-12 05:30:50,069 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2016-07-12 05:30:50,253 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
To find the error:
Check the "Child Job URLs" and inspect each child job.
Check stdout and stderr.
Sometimes the error is in a child job, and you will need to drill down into every one of them.
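For example, the launcher and child logs can be pulled from the command line like this (IDs taken from the logs above; the oozie commands assume OOZIE_URL is set, otherwise add -oozie <oozie-url>):
oozie job -info 0000005-160711211729704-oozie-oozi-W
oozie job -log 0000005-160711211729704-oozie-oozi-W
yarn logs -applicationId application_1468271868299_0037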

MapReduce job is failing with an error failed to write data

I'm trying to export data from Teradata to Hadoop, but my export query is failing with the error "Failed to write data". Please look at the MapReduce and application logs below:
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 4931
2016-03-08 22:47:07,414 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-03-08 22:47:07,509 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-03-08 22:47:07,510 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1457504560070_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@175b9425)
2016-03-08 22:47:07,556 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.2:8032,39.7.48.3:8032, Ident: (owner=hive, renewer=oozie mr token, realUser=oozie, issueDate=1457506410968, maxDate=1458111210968, sequenceNumber=908, masterKeyId=280)
2016-03-08 22:47:07,599 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-03-08 22:47:07,848 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data1/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data2/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data3/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data4/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data5/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data6/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data7/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data8/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data9/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data10/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data12/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004
2016-03-08 22:47:08,132 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-03-08 22:47:08,623 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-03-08 22:47:08,840 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataInputSplit@2ece4966
2016-03-08 22:47:08,844 INFO [main] com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReader: recordreader class com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReaderinitialize time is: 1457506028844
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 300417020(1201668080)
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1146
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 841167680
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1201668096
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 300417020; length = 75104256
2016-03-08 22:47:09,515 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2016-03-08 22:47:09,848 WARN [main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.metastore.local does not exist
2016-03-08 22:47:09,914 INFO [main] hive.metastore: Trying to connect to metastore with URI thrift://apus2.labs.teradata.com:9083
2016-03-08 22:47:09,951 INFO [main] hive.metastore: Connected to metastore.
2016-03-08 22:47:10,407 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2016-03-08 22:47:10,452 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2016-03-08 22:47:10,457 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
Application Master logs:
Log Type: stderr
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 240
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Log Type: stdout
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 0
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 66959
Showing 4096 bytes of 66959 total.
ILED
2016-03-08 22:59:19,325 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event JOB_FAILED
2016-03-08 22:59:19,456 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1.jhist to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,550 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,562 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1_conf.xml to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,614 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,645 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary
2016-03-08 22:59:19,654 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2016-03-08 22:59:19,671 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Task failed task_1457504560070_0004_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-03-08 22:59:19,672 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: History url is http://apus2.labs.teradata.com:19888/jobhistory/job/job_1457504560070_0004
2016-03-08 22:59:19,680 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Waiting for application to be successfully unregistered.
2016-03-08 22:59:20,682 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:7 ContRel:0 HostLocal:6 RackLocal:1
2016-03-08 22:59:20,684 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://C423A /user/hive/.staging/job_1457504560070_0004
2016-03-08 22:59:20,711 INFO [Thread-89] org.apache.hadoop.ipc.Server: Stopping server on 46067
2016-03-08 22:59:20,712 INFO [IPC Server listener on 46067] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 46067
2016-03-08 22:59:20,712 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-08 22:59:20,714 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted.
Please help me resolve the issue.
You must be using Sqoop to bring data to Hadoop. Please paste the command you are running. For "Failed to write data" there can be multiple causes: the destination parent directory is not available, the cluster is out of space, etc. Only the actual command can explain it.
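If it is Sqoop, the command would look roughly like this sketch (note that Sqoop calls the Teradata-to-HDFS direction an import; host, database, credentials, table, and target directory are all placeholders):
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --driver com.teradata.jdbc.TeraDriver \
  --username hive_user -P \
  --table SOURCE_TABLE \
  --target-dir /user/hive/teradata_export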

mapred.JobClient: Error reading task output http:... when running hadoop from Cygwin on Windows OS

I was running the "Generating vectors from documents" sample from the book "Mahout in Action" from Cygwin on Windows.
Hadoop is started only on the local machine.
Below is my running command:
$ bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow
But it fails with the java.io.IOException below. Does anyone know what causes this problem? Thanks in advance!
Running on hadoop, using HADOOP_HOME=my_hadoop_path
HADOOP_CONF_DIR=my_hadoop_conf_path
13/05/13 18:38:03 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
13/05/13 18:38:03 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
13/05/13 18:38:03 INFO common.HadoopUtil: Deleting reuters-vectors
13/05/13 18:38:04 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
13/05/13 18:38:04 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/05/13 18:38:04 INFO input.FileInputFormat: Total input paths to process : 2
13/05/13 18:38:04 INFO mapred.JobClient: Running job: job_201305131836_0001
13/05/13 18:38:05 INFO mapred.JobClient: map 0% reduce 0%
13/05/13 18:38:15 INFO mapred.JobClient: Task Id : attempt_201305131836_0001_m_000003_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
13/05/13 18:38:15 WARN mapred.JobClient: Error reading task outputhttp://namenode_address:50060/tasklog?plaintext=true&taskid=attempt_201305131836_0001_m_000003_0&filter=stdout
13/05/13 18:38:15 WARN mapred.JobClient: Error reading task outputhttp://namenode_address:50060/tasklog?plaintext=true&taskid=attempt_201305131836_0001_m_000003_0&filter=stderr
13/05/13 18:38:21 INFO mapred.JobClient: Task Id : attempt_201305131836_0001_m_000003_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Below is the TaskTracker's running log:
INFO org.apache.hadoop.mapred.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
INFO org.apache.hadoop.mapred.TaskTracker: ProcessTree implementation is missing on this system. TaskMemoryManager is disabled.
INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201305141049_0001_m_000002_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201305141049_0001_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201305141049_0001_m_000002_0
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201305141049_0001_m_1036671648
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201305141049_0001_m_1036671648 spawned.
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201305141049_0001_m_1036671648 exited. Number of tasks it ran: 0
WARN org.apache.hadoop.mapred.TaskRunner: attempt_201305141049_0001_m_000002_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201305141049_0001_m_000002_0 done; removing files.
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
Looking at the log you have posted, it seems you haven't actually set HADOOP_HOME=my_hadoop_path and HADOOP_CONF_DIR=my_hadoop_conf_path.
You need to put real directory paths there, e.g. HADOOP_HOME=/usr/lib/hadoop and HADOOP_CONF_DIR=/usr/lib/hadoop/conf.
If this is not the case, try running bin/mahout on its own and check whether seq2sparse is present in the list. This line clearly states that it's not found: driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only.
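In shell terms, that means something like the following (example paths from above; adjust to your installation):
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/usr/lib/hadoop/conf
bin/mahout | grep seq2sparse    # the bare driver prints the list of valid program names
bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow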

Mahout RecommenderJob not converging

This is my first SO post, so please let me know if I've left out anything important. I am a Mahout/Hadoop beginner trying to put together a distributed recommendation engine.
In order to simulate working on a remote cluster, I have set up Hadoop on my machine to communicate with an Ubuntu VM (running in VirtualBox), also located on my machine, which has Hadoop installed on it. This setup seems to be working fine, and I am now trying to run Mahout's RecommenderJob on a (very!) small trial dataset as a test.
The input consists of a .csv file (saved on the Hadoop DFS) containing around 50 user preferences in the format: userID, itemID, preference ... and the command I am running is:
hadoop jar /Users/MyName/src/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=/user/MyName/Recommendations/input/TestRatings.csv -Dmapred.output.dir=/user/MyName/Recommendations/output -s SIMILARITY_PEARSON_CORELLATION
where TestRatings.csv is the file containing the preferences and output is the desired output directory.
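For illustration, a few hypothetical rows in that format:
1,101,5.0
1,102,3.0
2,101,2.5
2,103,4.0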
At first the job looks like it's running fine, and I get the following output:
12/12/11 12:26:21 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --maxPrefsPerUser=[10], --maxPrefsPerUserInItemSimilarity=[1000], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --similarityClassname=[SIMILARITY_PEARSON_CORELLATION], --startPhase=[0], --tempDir=[temp]}
12/12/11 12:26:21 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[/user/Naaman/Delphi/input/TestRatings.csv], --maxPrefsPerUser=[1000], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
12/12/11 12:26:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/12/11 12:26:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/11 12:26:22 INFO input.FileInputFormat: Total input paths to process : 1
12/12/11 12:26:22 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/11 12:26:22 INFO mapred.JobClient: Running job: job_local_0001
12/12/11 12:26:22 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/11 12:26:22 INFO mapred.MapTask: io.sort.mb = 100
12/12/11 12:26:22 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/11 12:26:22 INFO mapred.MapTask: record buffer = 262144/327680
12/12/11 12:26:22 INFO mapred.MapTask: Starting flush of map output
12/12/11 12:26:22 INFO compress.CodecPool: Got brand-new compressor
12/12/11 12:26:22 INFO mapred.MapTask: Finished spill 0
12/12/11 12:26:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/11 12:26:22 INFO mapred.LocalJobRunner:
12/12/11 12:26:22 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/12/11 12:26:22 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/11 12:26:22 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=1491035776, MaxSingleShuffleLimit=372758944
12/12/11 12:26:22 INFO compress.CodecPool: Got brand-new decompressor
12/12/11 12:26:22 INFO compress.CodecPool: Got brand-new decompressor
12/12/11 12:26:22 INFO compress.CodecPool: Got brand-new decompressor
12/12/11 12:26:22 INFO compress.CodecPool: Got brand-new decompressor
12/12/11 12:26:22 INFO compress.CodecPool: Got brand-new decompressor
12/12/11 12:26:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files
12/12/11 12:26:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files
12/12/11 12:26:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
12/12/11 12:26:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 1 map output(s) where 0 is already in progress
12/12/11 12:26:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
12/12/11 12:26:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
12/12/11 12:26:23 INFO mapred.JobClient: map 100% reduce 0%
12/12/11 12:26:28 INFO mapred.LocalJobRunner: reduce > copy >
12/12/11 12:26:31 INFO mapred.LocalJobRunner: reduce > copy >
12/12/11 12:26:37 INFO mapred.LocalJobRunner: reduce > copy >
But then the last three lines repeat indefinitely (I left it overnight...), with the two lines:
12/12/11 12:27:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 1 map output(s) where 0 is already in progress
12/12/11 12:27:22 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
repeating every twelve rows.
I'm not sure whether there's something wrong with my input, or whether the tiny size of the trial data is messing things up. Any help and/or advice on the best way to go about this would be much appreciated.
p.s. I was trying to follow the instructions from https://www.box.com/s/041rdjeh7sny128r2uki
This is really a Hadoop or cluster issue: it is waiting on mapper output that is not coming. Look for earlier failures in the mapping phase.

Hadoop attempt jvm hang up

Hadoop 0.20.2
There are a couple of jobs that need to be executed one by one, and some task attempts' JVMs can't be killed. Logs below. It seems the TaskTracker can't find the JVM ID when you see "JVM Not killed jvm_201208192339_6873_m_1286217329 but just removed". I have looked at the source code, but I can't figure out why the TaskTracker can't find the JVM ID. By the way, there are 13 TaskTrackers, and only the 3 new ones have this problem. Did I forget to configure something?
Can somebody help me find the reason? Thanks. ^O^
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201208192339_6873_m_004334_0
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201208192339_6873_m_004334_0
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.JvmManager: JVM Not killed jvm_201208192339_6873_m_1286217329 but just removed
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 8
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201208192339_6873_m_004334_0 not found in cache
2012-09-20 13:52:56,962 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201208192339_6873_m_004334_0 task's state:KILLED_UNCLEAN
2012-09-20 13:52:56,962 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201208192339_6873_m_004334_0 which needs 1 slots
2012-09-20 13:52:56,962 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 8 and trying to launch attempt_201208192339_6873_m_004334_0 which needs 1 slots
2012-09-20 13:52:56,968 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201208192339_6873_m_677724590
2012-09-20 13:52:56,968 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201208192339_6873_m_677724590 spawned.
2012-09-20 13:52:56,974 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /disk10/hdfs/mapred/local/ttprivate/taskTracker/root/jobcache/job_201208192339_6873/attempt_201208192339_6873_m_004334_0.cleanup/taskjvm.sh
2012-09-20 13:52:58,017 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201208192339_6873_m_677724590 given task: attempt_201208192339_6873_m_004334_0
2012-09-20 13:52:58,557 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201208192339_6873_m_004334_0 0.0%
2012-09-20 13:52:58,564 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201208192339_6873_m_004334_0 0.0% cleanup
2012-09-20 13:52:58,566 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201208192339_6873_m_004334_0 is done.
2012-09-20 13:52:58,566 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201208192339_6873_m_004334_0 was -1
2012-09-20 13:52:58,566 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 8
Eventually it turned out that the node with this problem had another issue: its operating system didn't match the hardware. After running jobs on a new operating system for a while, the problem didn't appear again. The old operating system performed poorly on the network, lowering the network bandwidth.
