Getting java.lang.NoSuchFieldError: INT_8 while running a Spark job through Oozie (Hadoop)

I am getting a java.lang.NoSuchFieldError: INT_8 error when I try to execute a Spark job using Oozie on Cloudera CDH 5.5.1.
Any help on this will be appreciated.
Please find the error stack trace below.
16/01/28 11:21:17 WARN TaskSetManager: Lost task 0.2 in stage 20.0 (TID 40, Zlab-physrv1): java.lang.NoSuchFieldError: INT_8
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:327)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:312)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convertField$1.apply(CatalystSchemaConverter.scala:517)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convertField$1.apply(CatalystSchemaConverter.scala:516)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:516)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:312)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:521)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:312)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convert$1.apply(CatalystSchemaConverter.scala:305)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convert$1.apply(CatalystSchemaConverter.scala:305)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:92)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at org.apache.spark.sql.types.StructType.map(StructType.scala:92)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convert(CatalystSchemaConverter.scala:305)
at org.apache.spark.sql.execution.datasources.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypesConverter.scala:58)
at org.apache.spark.sql.execution.datasources.parquet.RowWriteSupport.init(ParquetTableSupport.scala:55)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:277)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetRelation.scala:94)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anon$3.newInstance(ParquetRelation.scala:272)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:233)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
In my understanding, we normally get this error whenever there is a mismatch between the jars the code was compiled against and the jars actually present at runtime.
Note: when I submit the same job using the spark-submit command, it runs fine.
Regards
Nisith

Finally able to debug and fix the issue. The problem was with the installation: one of the data nodes had an older version of the Parquet jars (from the CDH 5.2 distribution). After replacing them with the current version of the jars, it worked fine.
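If you need to track down which node or jar is the odd one out, a small probe along these lines can help. This is just a diagnostic sketch; the class name on the example command line is an illustration, not something taken from the job above.

    import java.security.CodeSource;

    public class WhichJar {
        // Prints which jar each named class was loaded from. Run it on each
        // node with the same classpath the job uses, for example:
        //   java -cp "$(hadoop classpath)" WhichJar parquet.schema.OriginalType
        public static void main(String[] args) throws ClassNotFoundException {
            for (String name : args) {
                CodeSource src = Class.forName(name).getProtectionDomain().getCodeSource();
                System.out.println(name + " -> "
                        + (src == null ? "bootstrap classpath" : src.getLocation()));
            }
        }
    }

A node whose output points at a different jar path (or version) than the others is the one to fix.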

Related

Run locally installed Pig from Oozie (Hadoop)

I have installed Oozie on my system, and I have also installed Pig. Now I want Oozie to run the workflow with the Pig that is installed on my system, not the one from the Oozie sharelib. Please help, as I get the following error:
2015-08-19 17:15:25,724 WARN PigActionExecutor:523 - SERVER[edb-node1] USER[hduser] GROUP[-] TOKEN[] APP[pig-wf] JOB[0000002-150819170943510-oozie-hdus-W] ACTION[0000002-150819170943510-oozie-hdus-W#pig-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.PigMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
2015-08-19 17:15:25,728 WARN PigActionExecutor:523 - SERVER[edb-node1] USER[hduser] GROUP[-] TOKEN[] APP[pig-wf] JOB[0000002-150819170943510-oozie-hdus-W] ACTION[0000002-150819170943510-oozie-hdus-W#pig-node] Launcher exception: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 13 more
Your error messages show clearly that you have an incomplete CLASSPATH.
That's because the pig command line does a lot of things, among them silently setting up the appropriate Pig JARs in the CLASSPATH, before invoking the PigMain Java class. But Oozie calls the Java class directly; the CLASSPATH issues are supposed to be handled either...
by the Pig ShareLib, when active
or by you, as a knowledgeable Java developer -- after all, it's your
choice not to use the default way to run Pig, so you know what you
are doing, right?
Before asking your question, did you try a search on StackOverflow (and/or Google) with the following keywords? The results could prove useful.
oozie pig custom classpath
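For completeness, here is a hedged sketch of the "handled by you" route, submitting through the Oozie Java client API; the Oozie URL and HDFS paths are placeholders, not values from the question.

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;

    public class SubmitPigWorkflow {
        public static void main(String[] args) throws Exception {
            OozieClient client = new OozieClient("http://oozie-host:11000/oozie");
            Properties conf = client.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/hduser/pig-wf");
            // Option 1: let the Pig ShareLib supply the jars.
            conf.setProperty("oozie.use.system.libpath", "true");
            // Option 2: point Oozie at Pig jars you staged in HDFS yourself
            // (the "custom classpath" route the keywords above refer to):
            // conf.setProperty("oozie.libpath", "hdfs://namenode:8020/user/hduser/pig-libs");
            String jobId = client.run(conf);
            System.out.println("Submitted workflow " + jobId);
        }
    }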
In the above case, have you checked whether you can access the Pig CLI (grunt shell), or run a Pig script manually, without using Oozie?
What happened here is that the Pig jar was not accessible when you ran it via Oozie. The best way to avoid this issue is to use the Oozie ShareLib, but since you mention you don't want to use the sharelib, try one of these alternatives:
Update the Pig home path in the Hadoop classpath. This helps MapReduce fetch the list of jars from the HDFS temp location every time Oozie submits a request to MapReduce.
Update the Pig home in your bash profile (if Pig shows a "command not found" error when you try to open the grunt shell).
Hope this helps.

Job via Oozie HDP 2.1 not creating job.splitmetainfo

When I try to execute a Sqoop job that passes my Hadoop program as a jar file in the -jarFiles parameter, the execution fails with the error below. No resolution seems to be available. Other jobs run as the same Hadoop user execute successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)
So here is how I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, from the command line:
hadoop jar...
The problem was that the new hosts didn't get the so-called "yarn-gateway". Cloudera calls the pack of configs related to a service, copied to /etc/hadoop/conf, a "gateway". So I just clicked "Deploy Client Configuration" in the Cloudera Manager UI. The YARN client configuration was copied to each YARN NodeManager node, and that solved the problem.
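If you suspect a host never received the gateway configs, one rough check is to print what the client-side Configuration actually loaded; Configuration.toString() lists every resource file it read. The config path below is the usual CDH location and is an assumption about your layout.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ShowClientConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Typical gateway location on CDH; adjust for your install.
            conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
            // toString() lists every resource file the Configuration loaded.
            System.out.println(conf);
            System.out.println("RM address: " + conf.get("yarn.resourcemanager.address"));
        }
    }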

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

I just installed Pig 0.13 and am attempting to use it with Hadoop 1.1.2 (the Pig documentation states that Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH
to point at /etc/hadoop, where core-site.xml, hdfs-site.xml, and mapred-site.xml are defined. The Hadoop cluster is functional and works fine with non-Pig jobs. Based on the error descriptions below, I understand that Pig cannot find the JobContextImpl class it is looking for.
Based on the Hadoop 1.1.2 API documentation, I don't believe "task" is a sub-package of the "mapreduce" package. I have tried adding hadoop-core-1.1.2.jar directly to $PIG_CLASSPATH,
and that did not work. (After looking at the contents of hadoop-core-1.1.2.jar and the Hadoop 1.1.2 API documentation, I don't believe JobContextImpl is defined in the package Pig is attempting to load it from.) How do I get Pig 0.13 to work with Hadoop 1.1.2?
======= Error follows below =======
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
2014-08-03 14:01:06,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.localdomain:8020/
2014-08-03 14:01:06,388 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.localdomain:8021
2014-08-03 14:01:06,440 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
=== Contents of pig_1407088865958.log ===
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:68)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:79)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
Though it is unclear how well this works for everyone, it appears that the asker mentioned how he solved the problem:
In my searching for help I saw posts stating that it needs to be
recompiled with a parameter that indicates version. The parameter
values I saw were 23,24. I did not know how that parameter mapped to
the version of hadoop that I am using 1.1.2. I hacked the bin/pig
script to point to hadoop-core-1.1.2.jar. The script requires
HADOOP_HOME be set (which is deprecated).
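For what it's worth, org.apache.hadoop.mapreduce.task.JobContextImpl exists in Hadoop 2.x (and 0.23) but not in Hadoop 1.x, so a tiny probe like the sketch below tells you which API generation is on the classpath Pig sees. The ant flag in the comment reflects my understanding of the Pig build options, so treat it as an assumption.

    public class ProbeHadoopApi {
        public static void main(String[] args) {
            try {
                // Present only in Hadoop 0.23+/2.x; Hadoop 1.x has no mapreduce.task package.
                Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl");
                System.out.println("Hadoop 2.x API on classpath: use a Pig build for Hadoop 2.");
            } catch (ClassNotFoundException e) {
                System.out.println("Hadoop 1.x classpath: use a Pig build for Hadoop 1 "
                        + "(e.g. ant clean jar -Dhadoopversion=20).");
            }
        }
    }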

MapReduce job fails when submitted from a Windows machine

I am trying to submit an M/R job from a Windows machine to a Hadoop cluster running on Linux. I am using Hadoop 2.2.0 (HDP 2.0).
I am getting the following error:
2014-06-06 08:32:37,684 [main] INFO Job.monitorAndPrintJob - Job job_1399458460502_0053 running in uber mode : false
2014-06-06 08:32:37,704 [main] INFO Job.monitorAndPrintJob - map 0% reduce 0%
2014-06-06 08:32:37,717 [main] INFO Job.monitorAndPrintJob - Job job_1399458460502_0053 failed with state FAILED due to: Application application_1399458460502_0053 failed 2 times due to AM Container for appattempt_1399458460502_0053_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.
java.io.IOException: Job failed!
I am using configuration taken from the cluster, so there should be an exact match.
Any hints or clues as to what might be wrong?
Thanks
I found the solution. It is an officially reported bug,
https://issues.apache.org/jira/browse/MAPREDUCE-4052
with a workaround mentioned further down in its discussion:
conf.set("mapreduce.app-submission.cross-platform", "true");
The workaround works fine for me. It seems that bugs with a client-side workaround stay evergreen.
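A minimal sketch of where that property goes when building the job on the Windows side; the hostnames and ports are placeholders, not values from the question.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CrossPlatformSubmit {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            conf.set("mapreduce.framework.name", "yarn");
            conf.set("yarn.resourcemanager.address", "resourcemanager:8032");
            // MAPREDUCE-4052 workaround: generate Unix-style container launch
            // commands even though the submitting JVM runs on Windows.
            conf.set("mapreduce.app-submission.cross-platform", "true");
            Job job = Job.getInstance(conf, "cross-platform example");
            job.setJarByClass(CrossPlatformSubmit.class);
            // ... set mapper, reducer, and input/output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }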
I had a similar issue. Now it is working for me.
Check the link below:
Error while running Mapreduce(yarn) from windows eclipse

HBase Hadoop cluster: java.io.IOException: java.lang.NoSuchMethodException

I am trying to set up an HBase cluster running on top of a Hadoop cluster.
Both clusters are up and running, but
when I try to create a table from the HBase client, I see the following error in the logs.
compute-0-11 is the name node for the Hadoop cluster.
2012-03-18 01:18:54,696 WARN org.apache.hadoop.hbase.util.FSUtils:
Unable to create version file at hdfs://compute-0-11:9000/hbase, retrying:
java.io.IOException: java.lang.NoSuchMethodException:
org.apache.hadoop.hdfs.protocol.ClientProtocol.create(java.lang.String, org.apache.hadoop.fs.permission.FsPermission, java.lang.String, boolean, boolean, short, long)
at java.lang.Class.getMethod(Class.java:1605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
Please help..
Usually this happens when there is a mismatch between the versions of the software in your cluster and in your job / client.
Errors like "class was expected / interface found" usually happen when different versions of the Hadoop APIs are used (mapred vs mapreduce).
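One quick check, sketched below, is to print the Hadoop version each side is running with, using the stock VersionInfo utility; run it once with the HBase client's classpath and once on a cluster node and compare.

    import org.apache.hadoop.util.VersionInfo;

    public class HadoopVersionCheck {
        public static void main(String[] args) {
            // Differing version/build strings between the HBase client's
            // classpath and the cluster nodes point to exactly this mismatch.
            System.out.println("Hadoop version: " + VersionInfo.getVersion());
            System.out.println("Built: " + VersionInfo.getBuildVersion());
        }
    }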
