Unable to initialize any output collector in CDH5.3 - hadoop

15/05/24 06:11:40 INFO mapreduce.Job: Task Id : attempt_1432456238397_0004_m_000000_0, Status : FAILED
Error: java.io.IOException: Unable to initialize any output collector
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:439)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
I am using the CDH 5.3 Cloudera QuickStart VM, and I wrote a MapReduce program. When I run it from the shell I get the above exception.
Can anyone please help me resolve this?

The error "Unable to initialize any output collector" indicates that the job failed to start the container's, there can be multiple reasons for the same. However, one must review the container logs at hdfs to identify the cause the error.
In this specific instance, the value of mapreduce.task.io.sort.mb value was entered greater than 2047 MB, however the maximum value which it allows is 2047 MB, thus anything above its causes the jobs to fail marking the value provided as Invalid.
Solution:
Set mapreduce.task.io.sort.mb to a value below 2048 MB.
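For example, the buffer can be lowered cluster-wide in mapred-site.xml; 512 MB below is only an illustrative value, so size it to fit your map task heap:
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
It can also be set per job on the command line, assuming the driver goes through ToolRunner so that -D properties are honored (myjob.jar and MyDriver are placeholder names):
hadoop jar myjob.jar MyDriver -Dmapreduce.task.io.sort.mb=512 <input> <output>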
Reference:
https://support.pivotal.io/hc/en-us/articles/205649987-Map-Reduce-job-failed-with-Unable-to-initialize-any-output-collector-
CDH5.2: MR, Unable to initialize any output collector
https://community.cloudera.com/t5/Storage-Random-Access-HDFS/HBase-MapReduce-Job-Error-java-io-IOException-Unable-to/td-p/23786

Related

Sqoop stuck at 5% of progress

I am using Sqoop to import data from Oracle into HDFS. When the job starts, it gets stuck at 5% progress for about 1 hour, and this info is output:
INFO mapreduce.Job: Task Id : attempt_1535519556038_0015_m_000037_0, Status : FAILED
Container launch failed for container_1535519556038_0015_01_000043 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1536133107764 found 1536133094775
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:375)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
and then it continues until the job terminates successfully and all the data is imported. So my question is: what is the reason for the job hanging at 5% progress? Why is it self-correcting? Is this normal? If not, could it be related to the logged info above? How can I fix it?
The error message explains it clearly: "Unauthorized request to start container. This token is expired".
One option is to increase the lifespan of the container allocation by setting yarn.resourcemanager.rm.container-allocation.expiry-interval-ms, which defaults to 10 minutes (600000 ms).
Note: the jobs will work if you increase yarn.resourcemanager.rm.container-allocation.expiry-interval-ms in the yarn-site.xml config file:
<property>
  <name>yarn.resourcemanager.rm.container-allocation.expiry-interval-ms</name>
  <value>1000000</value>
</property>
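The exception itself also warns that "System times on machines may be out of sync", so before (or in addition to) raising the expiry interval it is worth confirming that the NodeManager clocks agree. A minimal sketch, with placeholder hostnames:
for h in nodemanager1 nodemanager2 nodemanager3; do
  ssh "$h" date +%s
done
If the printed epoch seconds differ by more than a few seconds, fixing NTP synchronization is the more direct cure.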

PIG Script Error: java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme

I am running a Pig script in MapReduce mode. The script reads an RCFile (containing Thrift-serialized data stored in GZIP-compressed format), deserializes it using a UDF, extracts certain fields from the Thrift struct, and stores them.
Some of the mappers fail with the following error:
2015-12-23 03:07:45,638 FATAL [Thread-5] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
at com.xxx.yyy.thrift.dto.LatLong.read(LatLong.java:553)
at com.twitter.elephantbird.util.ThriftUtils.readSingleFieldNoTag(ThriftUtils.java:318)
at com.twitter.elephantbird.util.ThriftUtils.readFieldNoTag(ThriftUtils.java:352)
at com.twitter.elephantbird.mapreduce.input.RCFileThriftTupleInputFormat$TupleReader.getCurrentTupleValue(RCFileThriftTupleInputFormat.java:74)
at com.twitter.elephantbird.pig.load.RCFileThriftPigLoader.getNext(RCFileThriftPigLoader.java:46)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Here's my script:
REGISTER '/user/ameya/libs/geo-analysis-1.0.0-SNAPSHOT.jar';
REGISTER '/user/ameya/libs/libthrift-0.8.0.jar';
REGISTER '/user/ameya/libs/thrift-0.8-types-1.1.29-SNAPSHOT.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-pig-4.7.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-rcfile-4.7.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-core-4.7.jar';
REGISTER '/user/ameya/libs/libs/elephant-bird-hadoop-compat-4.7.jar';
REGISTER '/user/ameya/libs/libs/hive-0.4.1.jar';
REGISTER '/user/ameya/libs/libs/libs/hive-serde-0.13.3.jar';
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
thrift = LOAD '$input' USING com.twitter.elephantbird.pig.load.RCFileThriftPigLoader('com.xxx.yyy.thrift.dto.LatLong');
final = FOREACH thrift GENERATE (requestLatLong is not null ? requestLatLong.latitude : null) AS req_ll_lat,
(requestLatLong is not null ? requestLatLong.longitude : null) AS req_ll_lng;
STORE final INTO '$output';
I am using libthrift-0.8.0.jar, in which the class TProtocol does define the getScheme() method (with public access). Interestingly, not all the mappers fail, just a few of them; but that causes my job to fail. Could this be a CLASSPATH issue?
I tried searching for this issue but could not find relevant answers. Can someone please help me get some leads to fix this?
Found the reason. The class org.apache.thrift.protocol.TProtocol was defined in two jars: libthrift-0.8.0.jar and hive-0.4.1.jar. The copy in hive-0.4.1.jar did not have the getScheme() method defined. Whenever hive-0.4.1.jar was picked up first on the classpath, the mappers could not find getScheme().
I am not sure why the behavior was not consistent across all mappers; one plausible explanation is that the ordering of jars on a task's classpath is not guaranteed, so different containers may resolve the class from different jars. Any comments to explain that would be helpful.
I replaced hive-0.4.1.jar with hive-exec-0.13.3.jar and the issue got resolved.
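A quick way to confirm a duplicate-class conflict like this is to check which of the registered jars bundle TProtocol, and whether each copy declares getScheme(). A hedged sketch using standard JDK and shell tools:
# which jars contain the class?
for jar in libthrift-0.8.0.jar hive-0.4.1.jar; do
  unzip -l "$jar" | grep -q 'org/apache/thrift/protocol/TProtocol.class' \
    && echo "$jar contains TProtocol"
done
# does this particular copy declare getScheme()?
javap -classpath hive-0.4.1.jar org.apache.thrift.protocol.TProtocol | grep getScheme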

How to solve the error 'Unable to initialize any output collector'?

I wrote a simple MapReduce program using the Hadoop API. However, I get the following error when the cluster runs the MapReduce job. How can I solve it?
P.S. I am using Hadoop 2.6.0.
{"type":"MAP_ATTEMPT_FAILED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion":{"taskid":"task_1438070830862_0003_m_000002","taskType":"MAP","attemptId":"attempt_1438070830862_0003_m_000002_0","finishTime":1438760025042,"hostname":"HADOOP-NODE7","port":56135,"rackname":"/default-rack","status":"FAILED","error":"Error: java.io.IOException: Unable to initialize any output collector\n\tat org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)\n\tat org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)\n\tat org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:695)\n\tat org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)\n\tat org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)\n\tat org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)\n\tat org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)\n","counters":{"org.apache.hadoop.mapreduce.jobhistory.JhCounters":{"name":"COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":0},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":0},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":0},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":0},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":0}]}]}},"clockSplits":[104143,11,11,11,11,11,10,11,11,11,11,11],"cpuUsages":[0,0,0,0,0,0,0,0,0,0,0,0],"vMemKbytes":[0,0,0,0,0,0,0,0,0,0,0,0],"physMemKbytes":[0,0,0,0,0,0,0,0,0,0,0,0]}}}

Spark times out, maybe due to binaryFiles() with more than 1 million files in HDFS

I am reading millions of xml files via
val xmls = sc.binaryFiles(xmlDir)
The operation runs fine locally but on yarn it fails with:
client token: N/A
diagnostics: Application application_1433491939773_0012 failed 2 times due to ApplicationMaster for attempt appattempt_1433491939773_0012_000002 timed out. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1433750951883
final status: FAILED
tracking URL: http://controller01:8088/cluster/app/application_1433491939773_0012
user: ariskk
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
In the Hadoop userlogs I am frequently getting these messages:
15/06/08 09:15:38 WARN util.AkkaUtils: Error sending message [message = Heartbeat(1,[Lscala.Tuple2;#2b4f336b,BlockManagerId(1, controller01.stratified, 58510))] in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)
I run my Spark job via spark-submit, and it works for another HDFS directory that contains only 37k files. Any ideas how to resolve this?
OK, after getting some help on the Spark mailing list, I found out there were two issues:
The src directory: if it is given as /my_dir/, Spark fails and the heartbeat issues appear. Instead it should be given as hdfs:///my_dir/*
An out-of-memory error appears in the logs after fixing #1. This is the Spark driver running on YARN running out of memory due to the number of files (apparently it keeps all file metadata in memory). So I spark-submitted the job with --conf spark.driver.memory=8g, which fixed the issue.
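Put together, the corrected invocation would look roughly like the sketch below; the class and jar names are placeholders, and the application is assumed to pass its first argument to sc.binaryFiles:
spark-submit \
  --master yarn-cluster \
  --conf spark.driver.memory=8g \
  --class com.example.XmlIngest \
  xml-ingest.jar 'hdfs:///my_dir/*'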

Pig script for frequency of books published each year

I am trying to run the Pig script by following the steps given at this link: http://www.orzota.com/pig-tutorialfor-beginners/
But I am getting the error below; it is not able to read the file loaded into HDFS.
Can you please help? The error is as follows:
Failed Jobs:
JobId Alias Feature Message Outputs
N/A BookXRecords,CountByYear,GroupByYear GROUP_BY,COMBINER Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:271)
/user/hduser/output/pig_output_bookx,
Input(s):
Failed to read data from "/user/hduser/input/BX-BooksCorrected1.txt"
Output(s):
Failed to produce result in "/user/hduser/output/pig_output_bookx"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2015-02-19 22:19:45,852 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
As I understand it, you have just downloaded the script and run it. That is why the script is not able to find the exact file you want to run it on. Please ensure the following (a sketch of the commands is shown after the list):
You have run the sed command to filter the books.csv file on your local system.
You named the filtered file as mentioned in the script (not sure what it is in your case, but it should be BX-BooksCorrected or BX-BooksCorrected1; please check).
Then move that file into HDFS and try running the script again; it will work and won't give the error.
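For reference, a hedged sketch of those preparation steps, assuming the tutorial's file names and the HDFS path from the error log; the sed expression is only a placeholder for the actual filter given in the tutorial:
sed 's/&amp;/\&/g' BX-Books.csv > BX-BooksCorrected1.txt
hdfs dfs -mkdir -p /user/hduser/input
hdfs dfs -put BX-BooksCorrected1.txt /user/hduser/input/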
P.S.: You can learn the nature of the error by carefully reading the error log. Happy Hadooping!
