To use TableMapper, I added the hbase-server dependency to my Hadoop project. Both hbase-shaded-client and hbase-server are at version 1.1.2.
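For reference, the relevant pom.xml entries look roughly like this (just a sketch of what I described above, assuming the standard org.apache.hbase Maven coordinates):

<!-- the two dependencies described above, both at 1.1.2 -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-shaded-client</artifactId>
  <version>1.1.2</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>1.1.2</version>
</dependency>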
But when I try to run the Hadoop job, I get an error that seems related to security:
FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.server.AuthenticationFilter.constructSecretProvider(Ljavax/servlet/ServletContext;Ljava/util/Properties;Z)Lorg/apache/hadoop/security/authentication/util/SignerSecretProvider;
at org.apache.hadoop.http.HttpServer2.constructSecretProvider(HttpServer2.java:447)
at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:339)
at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:114)
at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:290)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:261)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:303)
at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:142)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1107)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
2016-08-22 11:04:29,010 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
Has anyone run into this problem before?
Thank you.
hbase-shaded-client seems to pull in an un-shaded version of the servlet API classes. This can cause issues when deploying into web servers or any other framework that expects a different version of the servlet API to be available.
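One way to attack this (a sketch, not a verified fix) is to run mvn dependency:tree on the project, see which artifact drags in the conflicting classes, and exclude it from the HBase dependency. The exclusion below assumes the culprit is javax.servlet:servlet-api; swap in whatever the dependency tree actually reports:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-shaded-client</artifactId>
  <version>1.1.2</version>
  <exclusions>
    <!-- assumed culprit; replace with the artifact mvn dependency:tree reports -->
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>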
I'm trying to use a custom Spark serializer, configured as:
conf.set("spark.serializer", CustomSparkSerializer.class.getCanonicalName());
But when I submit the application to Spark, I hit a ClassNotFoundException while the executor environment is being created, for example:
16/04/01 18:41:11 INFO util.Utils: Successfully started service 'sparkExecutor' on port 52153.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:149)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:250)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: example.CustomSparkSerializer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:266)
at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:290)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:218)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:183)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
In local standalone mode this can be solved with "spark.executor.extraClassPath=path/to/jar", but on a cluster with several nodes it does not help.
I have tried every approach I know of: --jars, the executor (and even driver) extra class path and library path, and sc.addJar, but none of it helped.
I found that Spark uses a specific classloader in org.apache.spark.util.Utils$.classForName(Utils.scala:173) to load the serializer class, but I don't understand how to make the custom serializer loadable.
The application submit flow is more complex: Oozie -> SparkSubmit -> YARN client -> Spark application.
The question is: does anybody know how to use a custom Spark serializer and how to resolve the ClassNotFoundException with it?
Thanks in advance!
The reason this happens is that I used spark.executor.extraClassPath with a path under /home/some_user. It seems Spark cannot load classes from that path because the Spark process runs as a different user; once I put the JAR somewhere like /usr/lib/, everything worked fine.
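Roughly, the workaround looked like this (the jar and application names here are hypothetical; the point is that the jar must sit somewhere the Spark process user can read, and it must be present at that path on every node):

# move the serializer jar out of the home directory to a world-readable location (hypothetical names)
sudo cp /home/some_user/custom-serializer.jar /usr/lib/custom-serializer.jar
sudo chmod 644 /usr/lib/custom-serializer.jar

# point the executors at that location when submitting (my-app.jar is a placeholder)
spark-submit \
  --master yarn-cluster \
  --conf spark.executor.extraClassPath=/usr/lib/custom-serializer.jar \
  my-app.jar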
So I had mixed up the users that own the Hadoop/Oozie/Spark processes; I just was not expecting such behavior from the ClassLoaders =)
Thank you for help!
I am trying to run the Simple Single Project YARN Application detailed here. I deployed the application as a jar file to our Hadoop cluster. When I try to run it, I get an exception; stack trace below:
[2015-06-04 14:10:45.866] boot - 13669 ERROR [main] --- SpringApplication: Application startup failed
java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.runCommandLineRunners(SpringApplication.java:680)
at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:695)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:322)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:961)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:950)
at com.aetna.ise.yarn.publish.Application.main(Application.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:53)
at java.lang.Thread.run(Thread.java:857)
Caused by: org.springframework.yarn.YarnSystemException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]; nested exception is org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
This is due to the fact that our cluster uses Kerberos authentication. Is there a way to pass the Kerberos ticket to the application in the Spring YARN code? I don't see any place to do that.
We can't currently delegate any tickets when the application is submitted, but the application itself can use Kerberos.
This is explained in this section of the reference docs: http://docs.spring.io/spring-hadoop/docs/2.1.2.RELEASE/reference/html/springandhadoop-security.html#literal-spring-hadoop-security-literal-configuration-properties
For example, something like the snippet below in application.yml (use the principals from your cluster):
spring:
  hadoop:
    fsUri: hdfs://localhost:8020
    resourceManagerHost: localhost
    security:
      userPrincipal: jvalkealahti/neo
      userKeytab: /usr/local/hadoops/jvalkealahti.keytab
      authMethod: kerberos
      namenodePrincipal: hdfs/neo@LOCALDOMAIN
      rmManagerPrincipal: yarn/neo@LOCALDOMAIN
I am trying to get a Spark/Shark cluster up but keep running into the same problem.
I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
I think that the Shark Driver is picking up another version of Hadoop jars but am unsure why.
Here are the details; any help would be great.
Spark/Shark 0.9.0
Apache Hadoop 2.3.0
Amplabs Hive 0.11
Scala 2.10.3
Java 7
I have everything installed, but I get some deprecation warnings and then an exception:
14/03/14 11:24:47 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/14 11:24:47 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
Exception:
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1072)
at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:49)
at shark.SharkCliDriver.<init>(SharkCliDriver.scala:275)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2288)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2299)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1070)
... 4 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 9 more
Caused by: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
I had this same problem, and I think it's caused by incompatible versions of hadoop/hive and spark/shark.
You need to either:
Remove hadoop-core-1.0.x.jar from shark/lib_managed/jars/org.apache.hadoop/hadoop-core/, or
When building Shark, explicitly set SHARK_HADOOP_VERSION as follows:
cd shark;
SHARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 ./sbt/sbt clean
SHARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 ./sbt/sbt package
The second method solved other issues for me as well. You can also see this topic for more details: https://groups.google.com/forum/#!msg/shark-users/lTNPcxHJiOQ/EqzyByZrzQMJ
I am using pig-0.11.0+28 with CDH4, and when I run any Pig job I get this exception. It also happens in local mode. Any ideas?
2013-07-08 13:53:44,035 [main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.execute(PigServer.java:1241)
at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
This WARN is harmless at runtime.
Pig is designed to work with many Hadoop versions.
Since CDH4 ships Apache Hadoop 2.x with an MR1 (0.20/1.x) MR framework option, Pig here gets confused about what to expect. It detects a 2.x version and tries to load an MR2-style submitter, but if you use MR1 it prints this noisy WARN, falls back to MR1, and still proceeds successfully. The error is followed by a message like org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl, which indicates this.
I recently upgraded to Cloudera CDH4b1. Before the upgrade, the MapReduce jobs were running fine, but now when I execute any MapReduce program, the following error comes up:
Command run:
hadoop jar /usr/lib/hadoop/hadoop-mapreduce-examples-0.23.0-cdh4b1.jar grep *.xml /user/out/ 'dfs'
12/04/10 19:23:15 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "dev-xxx:yyyy"
12/04/10 19:23:15 ERROR security.UserGroupInformation: PriviledgedActionException as:anchauhan (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1185)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1181)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1167)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1180)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1209)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1233)
at com.nextag.mapred.MerchantImport.doTask(MerchantImport.java:221)
at com.nextag.mapred.Main.main(Main.java:9)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
For my custom MapReduce jobs, I changed the library files to the new ones while compiling the jar files, but with no success.
We had the same exception, and the reason was that we had the wrong dependencies in our pom files. We were pointing to hadoop-client version 2.0.0-cdh4.0.0, which is for YARN, but we are using MRv1, so we should have been pointing to 2.0.0-mr1-cdh4.0.0. You should not have any YARN jar files in your dependencies.
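In pom.xml terms, the change was roughly this (a sketch only; the Cloudera repository configuration is omitted):

<!-- MRv1 client artifact, as described above, instead of the YARN variant 2.0.0-cdh4.0.0 -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.0.0-mr1-cdh4.0.0</version>
</dependency>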