oozie workflow intermittently fails on java action for scalding - hadoop

I am using CDH (Cloudera Hadoop) version 5.12.0 (which uses Hadoop 2.6.0 and Oozie 4.1.0) and Scalding 2.11.
I am using a shaded jar with my dependencies built in.
I can run all my jobs without any errors using a hadoop jar command like this:
hadoop jar /<path>/<to>/<my>/<project>.jar com.twitter.scalding.Tool
-libjars <comma>,<separated>,<list>,<of>,<dependencies>
-D mapreduce.framework.name=yarn
-D yarn.app.mapreduce.am.staging-dir=/<path>/<to>/<staging>/<dir>
-D mapreduce.map.output.compress=True
<my>.<scalding>.<job> --hdfs --input <my>/<input> --output <output>/<dir>
I can also run an oozie workflow with pig actions or other actions without any trouble. But when I try to run an oozie workflow with java actions that call scalding like so:
<action name="myAction">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="<output>/<dir>"/>
</prepare>
<configuration>
<property>
<name>oozie.action.sharelib.for.java</name>
<value>scalding</value>
</property>
<property>
<name>oozie.launcher.mapreduce.task.classpath.user.precedence</name>
<value>true</value>
</property>
<property>
<name>oozie.launcher.mapreduce.task.classpath.first</name>
<value>true</value>
</property>
<property>
<name>oozie.launcher.mapreduce.job.user.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.user.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapreduce.task.classpath.user.precedence</name>
<value>true</value>
</property>
<property>
<name>mapreduce.task.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.log.level</name>
<value>DEBUG</value>
</property>
<property>
<name>mapreduce.reduce.log.level</name>
<value>DEBUG</value>
</property>
<property>
<name>mapreduce.job.log4j-properties-file</name>
<value>${appLocation}/conf</value>
</property>
<property>
<name>mapreduce.task.files.preserve.failedtasks</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.jvm.numtasks</name>
<value>-1</value>
</property>
<property>
<name>oozie.mapreduce.uber.jar</name>
<value>/<path>/<to>/<my>/<class>.jar</value>
</property>
</configuration>
<main-class>com.twitter.scalding.Tool</main-class>
<java-opts>-Xms2G -Xmx3G</java-opts>
<arg><my>.<scalding>.<job></arg>
<arg>--hdfs</arg>
<arg>--input</arg>
<arg><my>/<input></arg>
<arg>--output</arg>
<arg><output>/<dir></arg>
<file>/<path>/<to>/<my>/<project>.jar</file>
</java>
<ok to="end" />
<error to="sendEmailFailure" />
I have also set:
oozie.libpath=<path>/<to>/<lib>/<directory>
oozie.use.system.libpath=true
in my properties file and all the necessary jars are within the libpath, but I get errors where the java action can't find the dependencies it needs:
Error: java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found at
org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:363) at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2108) at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:361) ... 7 more
This happens inconsistently: if I only put one action in the workflow, it sometimes completes and sometimes fails. If I put more than one action in the workflow, some action eventually fails, not always the same one, but always with the same error.
I believe the problem comes from the way the action loads its dependencies: sometimes they get loaded in the right order and it picks up the one I want first, and sometimes they don't and it is missing the dependency it needs. But I'm new to oozie, so who knows?
I think I can increase the number of max attempts for each action taken by oozie, but I feel like that is not really a solution and is sort of akin to rolling the dice against the cluster.
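For clarity, what I mean by retrying is something like the action-level user-retry attributes, assuming they are enabled for this kind of error on our Oozie version:
<action name="myAction" retry-max="3" retry-interval="1">
...
</action>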
I saw people post this issue a while back and resolve it by rolling back to an older version of CDH (4.1), but that's not an option for me.
Suggestions?

Related

Delegation Token can be issued only with kerberos or web authentication Spark action Oozie

I have a cluster with hadoop-2.7.3, hbase 1.2, zookeeper-3.4.8, phoenix (apache-phoenix-4.10.0), spark 2.2.0 and oozie 4.3.0. All components are configured with kerberos (except spark).
When I try to deploy a spark action with Oozie to the kerberized hadoop cluster, I get this error in the yarn logs (stdout):
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Delegation Token can be issued only with kerberos or web authentication
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6642)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:564)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
My workflow is like:
<credentials>
<credential name='hb' type='hbase'>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.master.kerberos.principal</name>
<value>hbase/host#REALM</value>
</property>
<property>
<name>hbase.regionserver.kerberos.principal</name>
<value>hbase/host#REALM</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>domain</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hbase.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hbase.master.keytab.file</name>
<value>/etc/security/keytabs/hbase.keytab</value>
</property>
<property>
<name>hbase.regionserver.keytab.file</name>
<value>/etc/security/keytabs/hbase.keytab</value>
</property>
</credential>
</credentials>
<action name="action_name" cred="hb">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>yarn</master>
<mode>cluster</mode>
<name>spark action name</name>
<class>com.sparkclass</class>
<jar>spark_code.jar</jar>
<spark-opts>--queue jobs --file hbase-site.xml</spark-opts>
<arg>-D</arg>
<arg>hbase.zookeeper.quorum=${zkQuorum}</arg>
<arg>-D</arg>
<arg>hbase.zookeeper.client.port=${zkClientPort}</arg>
<arg>${date}</arg>
<arg>${arg2}</arg>
<arg>${arg3}</arg>
<file>${nameNode}/user/${wf:user()}/jobs/lib/hbase-site.xml</file>
</spark>
<ok to="end"/>
<error to="fail"/>
</action>
I have a java action and a sqoop action that work perfectly; only the spark action has this problem.
I tried changing it to a java action and to a shell action, without results. Maybe I need to change my code?
Thanks

Giraph Job running in local mode always

I ran Giraph 1.1.0 on Hadoop 2.6.0.
The mapred-site.xml looks like this:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of
local, classic or yarn.</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
</configuration>
The giraph-site.xml looks like this
<configuration>
<property>
<name>giraph.SplitMasterWorker</name>
<value>true</value>
</property>
<property>
<name>giraph.logLevel</name>
<value>error</value>
</property>
</configuration>
I do not want to run the job in local mode. I have also set the environment variable MAPRED_HOME to HADOOP_HOME. This is the command to run the program:
hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 1
When I run this code that computes betweenness centrality of vertices in a graph, I get the following exception
Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!
at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)
at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
at hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation.runMain(BetweennessComputation.java:214)
at hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation.main(BetweennessComputation.java:218)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
What should I do to ensure that the job does not run in local mode?
I ran into this problem just a few days ago. Fortunately I solved it by doing the following.
Modify the configuration file mapred-site.xml: make sure the value of the property 'mapreduce.framework.name' is 'yarn', and add the property 'mapreduce.jobtracker.address' with the value 'yarn' if it is not already there.
The mapred-site.xml looks like this:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>yarn</value>
</property>
</configuration>
Restart hadoop after modifying mapred-site.xml. Then run your program, set the value after '-w' to more than 1, and set 'giraph.SplitMasterWorker' to 'true'. It will probably work.
As for the cause of the problem, I'll just quote what somebody else said:
These properties are designed for single-node executions and will have to be
changed when executing things in a cluster of nodes. In such a situation, the
jobtracker has to point to one of the machines that will be executing a
NodeManager daemon (a Hadoop slave). As for the framework, it should be
changed to 'yarn'.
We can see in the stack trace that the configuration check in LocalJobRunner fails; this is a bit misleading because it makes us assume we are running in local mode. You already found the responsible configuration option, giraph.SplitMasterWorker, but in your case you set it to true. However, on the command line, with the last parameter 1, you specify that only a single worker should be used. Hence the framework decides that you MUST be running in local mode. As a solution you have two options:
Set giraph.SplitMasterWorker to false although you are running on a cluster.
Increase the number of workers by changing the last parameter to the command-line call.
hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 4
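For option 1, if you prefer to set it in giraph-site.xml rather than on the command line, a minimal sketch (editing the same file shown in the question) would be:
<property>
<name>giraph.SplitMasterWorker</name>
<value>false</value>
</property>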
Please refer also to my other answer at SO (Apache Giraph master / worker mode) for details on the problem concerning local mode.
If you want to split the master from the workers, you can use:
-ca giraph.SplitMasterWorker=true
To specify the number of workers, you can use:
-w #
where "#" is the number of workers you want to use.

Oozie job fails to run due to impersonation error (wild card support)

I'm not able to run an oozie job on a local single-node hadoop cluster despite setting the user "kapil.sharma02" as a proxy user. Is this due to the wildcard in my user name? Can you please suggest a remedy?
kapil.sharma02$ ./oozie job -oozie http://localhost:11000/oozie -config ../examples/apps/map-reduce/job.properties -run
Error: E0501 : E0501: Could not perform authorization operation, User: kapil.sharma02 is not allowed to impersonate kapil.sharma02
Here is my core-site.xml (hadoop 2.6.4)
I have tried adding this config both with and without the escape character, but no luck.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.proxyuser.kapil\.sharma02.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.kapil\.sharma02.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
</configuration>
Can you please check your job log for the full failure information? I am not sure whether this will help you, but a similar issue was resolved here; please have a look.

Multiple Input Paths configuration in OOZIE

I am trying to configure a MapReduce job in oozie. This job has two different input formats and two input data folders. I used this post: How to configure oozie workflow for multi-input path with multiple mappers
and added these properties to my workflow.xml:
<property>
<name>mapred.input.dir.formats</name>
<value>folder/data/*;org.apache.hadoop.mapred.SequenceFileInputFormat\,data/*;org.apache.hadoop.mapred.TextInputFormat</value>
</property>
<property>
<name>mapred.input.dir.mappers</name>
<value>folder/data/*;....PublicMapper\,data/*;....PublicMapper</value>
</property>
but when the job is launched I get the following error: "No input paths specified in job".
Can anyone help me?
Thanks
You need to set some additional properties:
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingMapper</value>
</property>
I faced the same issue today, so I used the following properties.
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.DelegatingMapper</value>
</property>
<property>
<name>mapreduce.input.multipleinputs.dir.formats</name>
<value>/first/input/path;org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat,/second/input/path;org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat</value>
</property>
<property>
<name>mapreduce.input.multipleinputs.dir.mappers</name>
<value>/first/input/path;com.first.Mapper,/second/input/path;com.second.Mapper</value>
</property>
The difference is that instead of mapred.input.dir.formats and mapred.input.dir.mappers, which are part of the old map-reduce API, I used mapreduce.input.multipleinputs.dir.formats and mapreduce.input.multipleinputs.dir.mappers respectively. The code worked just fine after that. I ran it on Hadoop 1.2.1 and Oozie 3.3.2.

Getting E0902: Exception occured: [User: oozie is not allowed to impersonate oozie]

Hi, I am new to Oozie and I am getting the error E0902: Exception occured: [User: pramod is not allowed to impersonate pramod] when I run the following command:
./oozie job -oozie http://localhost:11000/oozie/ -config ~/Desktop/map-reduce/job.properties -run
My Hadoop version is 1.0.3, my Oozie version is 3.3.2, and I am running in pseudo-distributed mode.
The following is the content of my core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/pramod/hadoop-${user.name}</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.proxyuser.${user.name}.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.${user.name}.groups</name>
<value>*</value>
</property>
</configuration>
Can somebody help?
Hadoop 1.0.x does not support wildcards. http://mail-archives.apache.org/mod_mbox/oozie-user/201212.mbox/%3CCAOcnVr1TZZ5X0Mrb7fFA8JdW6rO6PgoJ9u0=2UYbfXf_o8r=DA#mail.gmail.com%3E
So try
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>localhost</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>oozie,pramod</value>
</property>
One thing missed in the discussion above:
In core-site.xml you need to use the user that Oozie is started with, i.e. the user that invoked the command "bin/oozied.sh start". For example, if you have "hadoop.proxyuser.bob.hosts" along with "hadoop.proxyuser.bob.groups", then the user 'bob' would be required to start Oozie using "bin/oozied.sh start".
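For example, if the Oozie server is started by the user 'bob', the corresponding core-site.xml entries would look roughly like this (the wildcard values are only illustrative; on Hadoop 1.0.x you would list explicit hosts and groups as noted above):
<property>
<name>hadoop.proxyuser.bob.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.bob.groups</name>
<value>*</value>
</property>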
I don't think you can use variables in the key name - you'll need to hardcode the user name rather than ${user.name}.
I assume you have an oozie user (which the oozie server is run as), so basically you want to configure as follows to allow the oozie user to impersonate anyone from any host:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
Make sure you restart your HDFS / MapReduce services for this to take effect.
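On a pseudo-distributed Hadoop 1.x setup like the one in the question, that typically means something along the lines of (the script names assume a standard tarball install):
bin/stop-all.sh
bin/start-all.sh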
