Installing a Spark Cluster, problems with Hive

I am trying to get a Spark/Shark cluster up but keep running into the same problem.
I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
I think that the Shark driver is picking up a different version of the Hadoop jars, but I am unsure why.
Here are the details, any help would be great.
Spark/Shark 0.9.0
Apache Hadoop 2.3.0
Amplabs Hive 0.11
Scala 2.10.3
Java 7
I have everything installed, but I get some deprecation warnings and then an exception:
14/03/14 11:24:47 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/14 11:24:47 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
Exception:
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1072)
at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:49)
at shark.SharkCliDriver.<init>(SharkCliDriver.scala:275)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2288)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2299)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1070)
... 4 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 9 more
Caused by: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation

I had this same problem, and I think it's caused by incompatible versions of hadoop/hive and spark/shark.
You need to do one of the following:
Remove hadoop-core-1.0.x.jar from shark/lib_managed/jars/org.apache.hadoop/hadoop-core/ (a command sketch follows below), or
when building Shark, explicitly set SHARK_HADOOP_VERSION as follows:
cd shark;
SHARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 ./sbt/sbt clean
SHARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 ./sbt/sbt package
The second method solved other issues for me as well. You can also see this topic for more details: https://groups.google.com/forum/#!msg/shark-users/lTNPcxHJiOQ/EqzyByZrzQMJ
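For the first (removal) option, a minimal sketch, assuming the default lib_managed layout; match the jar version to whatever you actually find there:
cd shark
rm lib_managed/jars/org.apache.hadoop/hadoop-core/hadoop-core-1.0.*.jar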

Related

StorageBackend version is incompatible with current JanusGraph version

Unable to start gremlin-server due to version mismatch. How can I fix this issue?
Here is the full stack trace:
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:121)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:89)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:110)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:354)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:110)
... 3 more
Caused by: org.janusgraph.core.JanusGraphException: StorageBackend version is incompatible with current JanusGraph version: storage [0.2.1] vs. runtime [0.2.0]
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1427)
at org.janusgraph.core.JanusGraphFactory.lambda$open$0(JanusGraphFactory.java:152)
at org.janusgraph.graphdb.management.JanusGraphManager.openGraph(JanusGraphManager.java:210)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:151)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:101)
at org.janusgraph.graphdb.management.JanusGraphManager.lambda$new$0(JanusGraphManager.java:65)
at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
at org.janusgraph.graphdb.management.JanusGraphManager.<init>(JanusGraphManager.java:64)
... 8 more
Exception in thread "gremlin-server-shutdown" java.lang.NullPointerException
at org.apache.tinkerpop.gremlin.server.GremlinServer.stop(GremlinServer.java:264)
at org.apache.tinkerpop.gremlin.server.GremlinServer.lambda$new$0(GremlinServer.java:91)
at java.lang.Thread.run(Thread.java:748)
I am working with janusgraph-0.2.0-hadoop2.zip, downloaded from the JanusGraph website. I don't know why the error says it has JanusGraph 0.2.1.
Downloading janusgraph-0.2.1-hadoop2.zip and starting Cassandra and Gremlin Server by running
./janusgraph-0.2.1-hadoop2/bin/cassandra
./janusgraph-0.2.1-hadoop2/bin/gremlin-server.sh
resolved the issue.
Note: I also modified the /janusgraph-0.2.1-hadoop2/conf/gremlin-server/gremlin-server.yaml and gremlin-server-configuration.yaml files to use conf/janusgraph-cassandra.properties for the ConfigurationManagementGraph property.
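For reference, a sketch of the relevant fragment of those YAML files (only the graphs section is shown; other entries and the exact file layout may differ in your setup):
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cassandra.properties
}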
You need to add the following property to your configuration file. It allows the upgrade so that the database becomes compatible with the newer version.
graph.allow-upgrade=true
Name: graph.allow-upgrade
Description: Setting this to true will allow certain fixed values to be updated, such as storage-version. This should only be used for upgrading.
Datatype: Boolean
Default value: false
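For context, a sketch of where the property goes, e.g. in conf/janusgraph-cassandra.properties (the storage settings shown are illustrative, not prescriptive):
# conf/janusgraph-cassandra.properties
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
# Allow fixed values such as storage-version to be updated during the upgrade
graph.allow-upgrade=true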

java.lang.NoClassDefFoundError: org/apache/hadoop/security/authorize/RefreshAuthorizationPolicyProtocol

I am currently trying to install Hadoop 2.3.0 on my cluster. However, when I run the command "bin/hdfs namenode -format", I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/security/authorize/RefreshAuthorizationPolicyProtocol
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.security.authorize.RefreshAuthorizationPolicyProtocol
Any ideas how to solve it?
There isn't much information in the question, but this error will occur if the environment variable HADOOP_COMMON_HOME is not correctly set to $HADOOP_HOME (the top-level directory where Hadoop is installed).
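For example, a sketch for ~/.bashrc or etc/hadoop/hadoop-env.sh (the install path is hypothetical):
export HADOOP_HOME=/usr/local/hadoop-2.3.0   # hypothetical install location
export HADOOP_COMMON_HOME=$HADOOP_HOME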

Class not found in a MapReduce job

I have a MapReduce job which takes an Avro file as input. I export it, along with all the required libraries (jar files), into a jar file. I have two different clusters: one is the HDInsight simulator and the other is the HDP sandbox. The job works fine on the HDP sandbox, but on the HDInsight simulator it fails because it cannot find the AvroInputFormat class. I tried running the job with the -libjars option, but it didn't help. Here is the error message:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:686)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 10 more
This looks weird because it runs fine on one cluster! Does anyone know what the problem could be?
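For reference, a minimal sketch of shipping an extra jar with -libjars (the jar path, driver class, and arguments are hypothetical, and -libjars is only honored when the driver parses generic options, e.g. via ToolRunner):
# Put the Avro jar on the client classpath and ship it to the task JVMs
export HADOOP_CLASSPATH=/path/to/avro-mapred.jar
hadoop jar myjob.jar com.example.MyDriver -libjars /path/to/avro-mapred.jar input output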

Hadoop 2.6.0: Basic error "starting MRAppMaster" after installing

I have just started to work with Hadoop 2.
After installing with basic configs, I always fail to run any examples. Has anyone seen this problem? Please help me.
The error is something like:
Error starting MRAppMaster
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
This is the log
2015-01-06 11:56:23,194 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420510526926_0002_000001
2015-01-06 11:56:23,362 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at org.apache.hadoop.security.Groups.<init>(Groups.java:70)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
... 7 more
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method)
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.<clinit>(JniBasedUnixGroupsMapping.java:49)
at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.<init>(JniBasedUnixGroupsMappingWithFallback.java:39)
... 12 more
2015-01-06 11:56:23,366 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
Error messages: "Error starting MRAppMaster", "InvocationTargetException", "UnsatisfiedLinkError"
Root cause: failure to execute the native function "anchorNative" in the class "org.apache.hadoop.security.JniBasedUnixGroupsMapping".
Description: the function "anchorNative" calls into the library "libhadoop.so". The path of this library is specified by these environment variables:
JAVA_LIBRARY_PATH
HADOOP_COMMON_LIB_NATIVE_DIR
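These are typically set in hadoop-env.sh or your shell profile; a sketch using the install path that appears in the output below:
export HADOOP_HOME=/home/maidinh/hadoop2/build/hadoop-2.6.0-src/hadoop-dist/target/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH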
In the Hadoop source code, you can print the Java library path with
System.err.println(System.getProperty("java.library.path"));
# result
/home/maidinh/hadoop2/build/hadoop-2.6.0-src/hadoop-dist/target/hadoop-2.6.0/lib/native:
/usr/java/packages/lib/amd64:
/usr/lib64:
/lib64:
/lib:
/usr/lib
Different versions of the library "libhadoop.so" can be found in these locations, which causes a conflict.
Solution: keep the copy on the correct native-library path (hadoop-2.6.0/lib/native) and delete every "libhadoop.so" in the other directories.
Note: delete all related Hadoop native libraries:
rm -r libhadoop*
rm -r libhdfs*
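To locate stray copies before deleting them, something like this helps (adjust the search root as needed):
# List every copy of the Hadoop native libraries on the machine
find / -name 'libhadoop*' -o -name 'libhdfs*' 2>/dev/null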

Pig java.lang.NoSuchFieldException: jobsInProgress exception

I am using pig-0.11.0+28 with CDH4, and when I run any Pig job I get this exception. It also happens in local mode. Any ideas?
2013-07-08 13:53:44,035 [main] WARN org.apache.pig.backend.hadoop23.PigJobControl - falling back to default JobControl (not using hadoop 0.23 ?)
java.lang.NoSuchFieldException: jobsInProgress
at java.lang.Class.getDeclaredField(Class.java:1938)
at org.apache.pig.backend.hadoop23.PigJobControl.<clinit>(PigJobControl.java:58)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.execute(PigServer.java:1241)
at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
This WARN is harmless at runtime.
Pig is designed to work with many Hadoop versions.
Since CDH4 ships Apache Hadoop 2.x with an MR1 (0.20/1.x) MR framework option, Pig gets confused about what to expect: it detects a 2.x version and tries to load an MR2-style submitter, but if you use MR1 it prints this noisy WARN, falls back to MR1, and still proceeds successfully. The warning is followed by a message like "org.apache.pig.backend.hadoop23.PigJobControl - falling back to default", which indicates the fallback.
