Unable to load Hive-JDBC driver when accessed through MapReduce program on Amazon's Elastic MapReduce - hadoop

I have written a MapReduce program that stores part of its output data in a Hive table.
I used the Hive-JDBC driver to access the Hive table from the MapReduce code.
The program compiled successfully on my local machine.
After this, I created a JAR file and uploaded it to S3. Then I created an Elastic MapReduce cluster and started it.
However, the job fails with the errors below:
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

attempt_201407161054_0001_m_000001_0: java.lang.ClassNotFoundException: org.apache.hadoop.hive.jdbc.HiveDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at HubAndAuthority.InputHubMapper.configure(InputHubMapper.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
This appears to be an issue of a missing Hive-JDBC driver, and it should be resolved by adding the driver to the classpath. However, I am not aware of the exact steps to do this on Amazon's EMR.
Could you please let me know what is missing on my end and how to resolve it?
Thanks and Regards,
Prafulla

I'm not entirely sure, but you could try this:
"Note
If you want your custom classpath to override the original class path, you should set the environment variable, HADOOP_USER_CLASSPATH_FIRST to true so that the HADOOP_CLASSPATH value specified in hadoop-user-env.sh is first."
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
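For example, the classpath can be set from a bootstrap action. The following is only a sketch: the /home/hadoop/conf/hadoop-user-env.sh location and the /home/hadoop/lib jar directory are assumptions based on the old EMR AMI layout, so adjust them to wherever the Hive-JDBC driver actually sits on your nodes.

#!/bin/bash
# Hypothetical EMR bootstrap action: runs on each node before Hadoop starts.
# Assumes the Hive-JDBC driver and its dependencies were already copied
# (e.g. from S3 in an earlier bootstrap step) into /home/hadoop/lib.
cat >> /home/hadoop/conf/hadoop-user-env.sh <<'EOF'
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/home/hadoop/lib/*
EOF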
Regards,
revet

Related

AWS glue NoClassDefFoundError on job.init()

I am trying to debug AWS Glue scripts locally using the Glue ETL library.
I have installed aws-glue-libs and spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz.
When I run job.init(), I get the following error trace:
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.amazonaws.services.glue.util.Job.init.
: java.lang.NoClassDefFoundError: com/typesafe/config/ConfigMergeable
at com.amazonaws.services.glue.util.Job$.init(Job.scala:93)
at com.amazonaws.services.glue.util.Job.init(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigMergeable
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
This error wasn't resolved per se, but I found a workaround. Instead of running my scripts from PyCharm, I run them using the gluesparksubmit bash command. Now it doesn't throw an error at job.init(). I'm still trying to figure out how to get access to the data catalog when running Glue scripts from a local machine.
If the error is about ConfigMergeable, check whether the proper jar file (config-1.3.3.jar) exists in your /opt/spark/jars directory.
The whole idea is that the jars in /opt/spark/jars and ./aws-glue-libs/jarsv1 should match.
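A quick way to spot such mismatches is to compare the two directories and copy across anything missing; a minimal sketch, assuming the default paths above:

# List jars present under aws-glue-libs but absent from the Spark install.
comm -23 <(ls ./aws-glue-libs/jarsv1 | sort) <(ls /opt/spark/jars | sort)

# If config-1.3.3.jar shows up in that list, copy it over.
cp ./aws-glue-libs/jarsv1/config-1.3.3.jar /opt/spark/jars/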

Errors involving matrix operations in Flink programs

The Flink program runs normally locally, but after packaging it and uploading it to the server, it always fails with the following message:
java.lang.NoClassDefFoundError: org/netlib/blas/Dgemm
at com.github.fommil.netlib.F2jBLAS.dgemm(F2jBLAS.java:96)
at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:63)
at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:48)
at breeze.linalg.ImmutableNumericOps$class.$times(NumericOps.scala:135)
at breeze.linalg.DenseMatrix.$times(DenseMatrix.scala:53)
at com.zte.flink.machinelearning.robustRegressionAlgorithm$.IRLS(robustRegressionAlgorithm.scala:32)
at com.zte.flink.machinelearning.robustRegressionAlgorithm$.predict(robustRegressionAlgorithm.scala:88)
at com.zte.flink.test$.main(test.scala:21)
at com.zte.flink.test.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: java.lang.ClassNotFoundException: org.netlib.blas.Dgemm
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 26 more
It appears that org.netlib.blas.Dgemm is on your local classpath but not on the server's classpath. You'll need to either build a fat jar that includes this library and submit that fat jar to the cluster (recommended), or put it in the lib directory of all of the Flink servers.
See the documentation for more details on how to handle dependencies for your Flink applications.
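For example, you can verify the class actually made it into the fat jar before submitting. This is a sketch: the jar name is whatever your build produces, /opt/flink/lib is an assumed install path, and the arpack_combined_all jar (which, as far as I know, ships F2j's org.netlib classes) is named as an illustration.

# Confirm the missing class is inside the fat jar before submitting.
jar tf target/my-flink-job-assembly.jar | grep 'org/netlib/blas/Dgemm'

# Alternative route: drop the jar containing the class into Flink's lib
# directory on every server, then restart the cluster.
cp arpack_combined_all-0.1.jar /opt/flink/lib/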

Unable to find partitioner class - Cassandra

Can someone help me fix the issue below, which I'm facing with Cassandra when I run my application on Hadoop?
When I run the application, I get the following error relating to the partitioner class specified in the application.
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find partitioner class 'org.apache.cassandra.dht.RandomPartitioner'
at org.apache.cassandra.hadoop.ConfigHelper.getInputPartitioner(ConfigHelper.java:426)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.validateConfiguration(AbstractColumnFamilyInputFormat.java:85)
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.validateConfiguration(ColumnFamilyInputFormat.java:74)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:122)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at com.test.cassandratest.WcJob.run(WcJob.java:96)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.test.cassandratest.WcJob.main(WcJob.java:104)
... 10 more
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unable to find partitioner class 'org.apache.cassandra.dht.RandomPartitioner'
at org.apache.cassandra.utils.FBUtilities.classForName(FBUtilities.java:458)
at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:470)
at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:416)
at org.apache.cassandra.hadoop.ConfigHelper.getInputPartitioner(ConfigHelper.java:422)
... 26 more
Caused by: java.lang.NoClassDefFoundError: org/github/jamm/MemoryMeter$Guess
at org.apache.cassandra.utils.ObjectSizes.<clinit>(ObjectSizes.java:34)
at org.apache.cassandra.dht.RandomPartitioner.<clinit>(RandomPartitioner.java:45)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.cassandra.utils.FBUtilities.classForName(FBUtilities.java:450)
... 29 more
Caused by: java.lang.ClassNotFoundException: org.github.jamm.MemoryMeter$Guess
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 34 more
I met the same problem when we upgraded Cassandra to 2.1 in our system, and the root cause is as follows.
Cassandra 2.1 uses jamm 0.3.0, while older Cassandra versions used 0.2.5. So update the jamm version you use, and your problem may be fixed.
http://mvnrepository.com/artifact/com.github.jbellis/jamm/0.3.0
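If you manage jars by hand rather than through a build tool, swapping the jar on the job's classpath is enough; a minimal sketch, with an illustrative lib directory:

# Fetch jamm 0.3.0 (the artifact linked above) from Maven Central.
wget https://repo1.maven.org/maven2/com/github/jbellis/jamm/0.3.0/jamm-0.3.0.jar

# Swap it in for the old version. /path/to/app/lib is hypothetical;
# use wherever your job actually loads its jars from.
rm /path/to/app/lib/jamm-0.2.5.jar
cp jamm-0.3.0.jar /path/to/app/lib/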

Error creating index on elastic search

I am using elasticsearch stable 1.2.1 (HEAD), installed with brew.
I can start it without any problems.
However, when I create an index, I get this exception:
[2014-07-11 13:40:33,300][DEBUG][action.admin.indices.create] [N'astirh] [x_application_item_development] failed to create
org.elasticsearch.indices.IndexCreationException: [x_application_item_development] failed to create index
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:302)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:343)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:309)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/elasticsearch/ElasticSearchIllegalArgumentException
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
at java.lang.Class.getDeclaredConstructors(Class.java:1901)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider.createMethodMapping(FactoryProvider.java:214)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider.newFactory(FactoryProvider.java:151)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider.newFactory(FactoryProvider.java:146)
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:274)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:298)
... 6 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.ElasticSearchIllegalArgumentException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 21 more
This is the class path:
:/usr/local/Cellar/elasticsearch/1.2.1/libexec/elasticsearch-1.2.1.jar:/usr/local/Cellar/elasticsearch/1.2.1/libexec/*:/usr/local/Cellar/elasticsearch/1.2.1/libexec/sigar/*
I downloaded the latest stable release (1.2.2, so there is a version difference) from the Elasticsearch site and started it manually. The classpath contains the same number of items (only the path prefix is different):
:/Users/boti/Downloads/elasticsearch-1.2.2/lib/elasticsearch-1.2.2.jar:/Users/boti/Downloads/elasticsearch-1.2.2/lib/:/Users/boti/Downloads/elasticsearch-1.2.2/lib/sigar/
In the manually installed version everything works...
Is this a brew recipe problem?
Sounds like a brew recipe problem.
The error you're getting about a missing class means there's something fundamentally wrong with the way the application is started, or files are actually missing.
Either way, it's a problem with the startup script brew is using or with the files brew downloaded for you.
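One quick check is whether the class is present at all in the jar brew installed; a sketch using the jar from the classpath quoted above:

# Look for the missing class inside the brew-installed jar.
unzip -l /usr/local/Cellar/elasticsearch/1.2.1/libexec/elasticsearch-1.2.1.jar | grep ElasticSearchIllegalArgumentException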

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar on my classpath, but I don't know where it came from; I don't suppose it should have an effect.
Any ideas?
What happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? From the stack trace, it looks like the jar is needed by one of the actual tasks, though I am surprised to see that this actually kicks off an M/R job.
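Putting that together with the command from the question, the full sequence would be:

# Pull everything on HBase's classpath (including its Guava) into Hadoop's.
export HADOOP_CLASSPATH=$(hbase classpath)

# Re-run the bulk load with the augmented classpath.
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table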
