Run install pig from ozzie hadoop - hadoop

I have installed oozie on my system and I have also installed pig. Now I want ozzie to run workflow from pig which is installed on my system not from the ozzie sharelib. Please help, as I get the following error:
2015-08-19 17:15:25,724 WARN PigActionExecutor:523 - SERVER[edb-node1] USER[hduser] GROUP[-] TOKEN[] APP[pig-wf] JOB[0000002-150819170943510-oozie-hdus-W] ACTION[0000002-150819170943510-oozie-hdus-W#pig-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.PigMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
2015-08-19 17:15:25,728 WARN PigActionExecutor:523 - SERVER[edb-node1] USER[hduser] GROUP[-] TOKEN[] APP[pig-wf] JOB[0000002-150819170943510-oozie-hdus-W] ACTION[0000002-150819170943510-oozie-hdus-W#pig-node] Launcher exception: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 13 more

You've got error messages that show clearly that you have an incomplete CLASSPATH.
That's because the pig command line does a lot of things, among which setting up silently the appropriate Pig JARs in the CLASSPATH, before invoking the PigMain Java class. But Oozie calls the Java class directly; the CLASSPATH issues are supposed to be handled either...
by the Pig ShareLib, when active
or by you, as a knowledgeable Java developer -- after all, it's your
choice not to use the default way to run Pig, so you know what you
are doing, right?
Before asking your question, did you try a search on StackOverflow (and/or Google) with the following keywords? The results could prove useful.
oozie pig custom classpath

In the above case have you checked if you are able to access Pig CLI {grunt shell} or are you able to running Pig Script manually, without Using Oozie.
The Case what happend is Pig Jar was not accessable while You are trying to use via Oozie, the Best Case to avoide such issue is to use Oozie - -ShareLib , but as You mention you dnt want the same to work using sharedlib, then use some alternate case like:
Update Pig Home Path in Hadoop ClassPath. These will help MR to fetch the List of Jars in HDFS Temp Location every time Oozie Submit Request to MR.
Update Pig Home in BashProfile {if Pig shows error while accessing GRANT Shell with Command not Found}.
Hope these will help some.

Related

Getting java.lang.NoSuchFieldError: INT_8 error while running spark job through oozie

I am Getting java.lang.NoSuchFieldError: INT_8 error when I am trying to execute a spark job using OOzie on Cloudera 5.5.1 version.
Any help on this will be appreciated.
Please find the error stackstrace below.
16/01/28 11:21:17 WARN TaskSetManager: Lost task 0.2 in stage 20.0 (TID 40, Zlab-physrv1): java.lang.NoSuchFieldError: INT_8
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:327)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:312)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convertField$1.apply(CatalystSchemaConverter.scala:517)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convertField$1.apply(CatalystSchemaConverter.scala:516)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:516)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:312)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:521)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:312)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convert$1.apply(CatalystSchemaConverter.scala:305)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter$$anonfun$convert$1.apply(CatalystSchemaConverter.scala:305)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:92)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at org.apache.spark.sql.types.StructType.map(StructType.scala:92)
at org.apache.spark.sql.execution.datasources.parquet.CatalystSchemaConverter.convert(CatalystSchemaConverter.scala:305)
at org.apache.spark.sql.execution.datasources.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypesConverter.scala:58)
at org.apache.spark.sql.execution.datasources.parquet.RowWriteSupport.init(ParquetTableSupport.scala:55)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:277)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetRelation.scala:94)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anon$3.newInstance(ParquetRelation.scala:272)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:233)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
As per My idea normally we used to get this error when ever there is some difference on the jars you have used to generate the code and the jars you have used currently.
Note: When I am trying to submit the Same one using spark-submit command it's running fine.
Regards
Nisith
Finally Able to debug and fix the issue. The Issue was with the installation as one of the data nodes are having older version of parquet Jars(5.2 cdh distribution). After replacing the jars with the current version jars it was working fine.

Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)

I am running a hive query throwh oozie using hue..
I am creating a table through hue-oozie work flow...
My job is failing but when I check in hive the table is created.
Log shows below error:
16157 [main] INFO org.apache.hadoop.hive.ql.hooks.ATSHook - Created ATS Hook
2015-09-24 11:05:35,801 INFO [main] hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
2015-09-24 11:05:35,803 ERROR [main] ql.Driver (SessionState.java:printError(960)) - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
Not able to identify the issue....
I am usig HDP 2.3.1
Basically this error is due to missing atlas jar in oozie share lib.
In HDP the Atlas jar is available in /usr/hdp/2.3.0.0-2557/atlas/
Put all the jars related to atlas in hadoop share lib ..
hadoop fs -put /usr/hdp/2.3.0.0-2557/atlas/hook/hive/* /user/oozie/share/lib/lib200344/hive
Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh .
Copy <atlas package>/conf/application.propertiesto hive conf directory.
Restart the oozie services. This will solve this problem. If anybody face the problem please comment here so that I can help.
[Comment by Immo Huneke: when using the Hortonworks sandbox VM, I found that just putting the jar files in the share/lib folder under HDFS was enough to resolve the problem. I didn't have to update hive-env.sh or copy the application.properties file. But check the exact path of your share/lib folder by executing the command hdfs dfs -ls /user/oozie/share/lib before copying.]
hive>add jar /usr/hdp//atlas/hook/hive/hive-bridge-${VERSION}.jar
it will be ok.
hope help for u.
It Seems You CLASS is not found exception.
Have you installed Oozie Sharedlib, if Yes, please update all the hive dependent Jar in the sharedLib Location, and check if the status
Also check if Hive Client is available in all the Nodes under the cluster and same should be running
​I tried each and every possible solution mentioned in this forum and in stackoverflow, but it did not resolve my issue.
Finally, I resolved it by copying all the jars in /hook/hive to lib (create a new lib folder at job.properties level) folder of my oozie workflow

Set LD_LIBRARY_PATH or java.library.path for YARN / Hadoop2 Jobs

i have a Hadoop FileSystem which is using native libraries with JNI.
Apparently i have to include the shared object independently of the currently executed job. But i can't find a way to tell Hadoop/Yarn where it should look for the shared object.
I had partial success with the following solutions, while starting the wordcount example with yarn.
Setting export JAVA_LIBRARY_PATH=/path when starting the resource- and the nodemanager.
This helps with with the resource and the nodemanager, but the actual Job/Application fails. Printing the LD_LIBRARY_PATH and the java.library.path while executing the wordcount example yield the following result. What
/logs/userlogs/application_x/container_x_001/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x
Setting yarn.app.mapreduce.am.env="LD_LIBRARY_PATH=/path"
This did help with some of the Jobs. The actual map/reduce job did work (at least i have the correct results), but the call did fail with the Error no jni-xtreemfs in java.library.path.
Somehow the first application/job did work and shows
/logs/userlogs/application_x/container_x_001/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/path:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/path
But the second and the rest did fail with:
/logs/userlogs/application_x/container_x_002/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_002:/opt/hadoop-2.7.1/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_002/opt/hadoop-2.7.1/lib/native
The stacktrace for the later shows, that the error occured while executing YarnChild:
2015-08-03 15:24:03,851 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.UnsatisfiedLinkError: no jni-xtreemfs in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at org.xtreemfs.common.libxtreemfs.jni.NativeHelper.loadLibrary(NativeHelper.java:54)
at org.xtreemfs.common.libxtreemfs.jni.NativeClient.<clinit>(NativeClient.java:41)
at org.xtreemfs.common.libxtreemfs.ClientFactory.createClient(ClientFactory.java:72)
at org.xtreemfs.common.libxtreemfs.ClientFactory.createClient(ClientFactory.java:51)
at org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem.initialize(XtreemFSFileSystem.java:191)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Supply the libjni-xtreemfs.so via the commandline argument -files
This does work. I assume the .so is copied to the tmp directory. But this is no feasible solution, because it would require the users to supply the path to the .so on every call.
Does anybody now how i can globally set the LD_LIBRARY_PATH or the java.library.path or can suggest which configuration options i did probably miss? I'd be very thankful!
Short Answer: in your mapred-site.xml put the following
<property>
<name>mapred.child.java.opts</name>
<value>-Djava.library.path=$PATH_TO_NATIVE_LIBS</value>
</property>
Explanation:
The Job/Applications aren't executed by yarn rather than by a mapred (map/reduce) container, whoose configuration is controlled by the mapred-site.xml file. Specifying custom java parameters there causes that actual workers to spin with the correct path
Use mapreduce.map.env in your job or site configuration.
Usage is as follows:
<property>
<name>mapreduce.map.env</name>
<value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/my/libs</value>
</property>
Note:
Hadoop docs encourage the use of mapreduce.map.env for this over mapred.child.java.opts. "Usage of -Djava.library.path can cause programs to no longer function if hadoop native libraries are used."

Job via Oozie HDP 2.1 not creating job.splitmetainfo

When trying to execute a sqoop job which has my Hadoop program passed as a jar file in -jarFiles parameter, the execution blows off with below error. Any resolution seems to be not available. Other jobs with same Hadoop user is getting executed successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)
So here is the way I solved it. We are using CDH5 to run Camus to pull data from kafka. We run CamusJob which is responsible for getting data from kafka using comman line:
hadoop jar...
The problem is that new hosts didn't get so-called "yarn-gateway". Cloudera names pack of configs related to service and copied to /etc/hadoop/conf
as "gateway". So I just clicked "deploy client configuration" in CM UI. YARN client conf has been copied to each YARN NodeManager node and it solved problem.

hbase-master fails to start

I am trying to set up a Hadoop development environment. I am using CDH4 and following the installation instructions in their website https://ccp.cloudera.com/display/CDH4DOC/.
I got to the point in which I was able to install CDH4 in pseudo-distributed mode and I am following the part regarding "Components that require additional configuration".
I have installed HBase-master package, but when I try to start the service I am getting the following error:
$ sudo /sbin/service hbase-master start
starting master, logging to /var/log/hbase/hbase-hbase-master-slc01euu.out
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
I suposse that it has something to do with some env variable (i believe HADOOP_HOME). But I am not sure where to look at since all the previous processes (name node, data node, job tracker,task tracker) started with no problem.
When I search for HADOOP_HOME variable it says that it is undefined.
Do you guys have any idea about how I could solve this?
Thanks a lot in advance.
Please set HADOOP_HOME to your hadoop installation directory
eg: export HADOOP_HOME=/usr/hadoop
Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.
Add a copy of hdfs-site.xml (and core-site.xml) in under ${HBASE_HOME}/conf folder

Resources