Errors involving matrix operations in Flink programs

The Flink program runs fine locally, but after I package it and upload it to the server it always fails with the following error:
java.lang.NoClassDefFoundError: org/netlib/blas/Dgemm
at com.github.fommil.netlib.F2jBLAS.dgemm(F2jBLAS.java:96)
at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:63)
at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:48)
at breeze.linalg.ImmutableNumericOps$class.$times(NumericOps.scala:135)
at breeze.linalg.DenseMatrix.$times(DenseMatrix.scala:53)
at com.zte.flink.machinelearning.robustRegressionAlgorithm$.IRLS(robustRegressionAlgorithm.scala:32)
at com.zte.flink.machinelearning.robustRegressionAlgorithm$.predict(robustRegressionAlgorithm.scala:88)
at com.zte.flink.test$.main(test.scala:21)
at com.zte.flink.test.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: java.lang.ClassNotFoundException: org.netlib.blas.Dgemm
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 26 more

It appears that org.netlib.blas.Dgemm is on your local classpath but not on the server's classpath. You'll need to either build a fat jar that includes this library and submit the fat jar to the cluster (recommended), or put the library in the lib directory of every Flink server.
See the documentation for more details on how to handle dependencies for your Flink applications.
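Before submitting, it can help to confirm that the packaged jar really contains the class named in the error. The following is a minimal Python sketch, not part of Flink: it treats the jar as a zip archive and looks for the class entry. The function name jar_contains_class and the jar path in the usage comment are hypothetical.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar at jar_path has an entry for class_name.

    class_name may be dotted (org.netlib.blas.Dgemm) or slashed
    (org/netlib/blas/Dgemm), as it appears in NoClassDefFoundError.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Hypothetical usage; replace the path with your own fat jar:
# jar_contains_class("target/my-flink-job.jar", "org.netlib.blas.Dgemm")
```

If this returns False for the fat jar, the library was probably not bundled, which would be consistent with the ClassNotFoundException above.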

Related

AWS Glue NoClassDefFoundError on job.init()

I am trying to debug AWS Glue scripts locally using the Glue ETL library.
I have installed aws-glue-libs and spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz.
When I run job.init(), I get the following error trace:
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.amazonaws.services.glue.util.Job.init.
: java.lang.NoClassDefFoundError: com/typesafe/config/ConfigMergeable
at com.amazonaws.services.glue.util.Job$.init(Job.scala:93)
at com.amazonaws.services.glue.util.Job.init(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigMergeable
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
This error wasn't resolved per se, but I found a workaround. Instead of running my scripts from PyCharm, I run them using the gluesparksubmit bash command. Now job.init() no longer throws an error. I am still trying to figure out how to access the Data Catalog when running Glue scripts from a local machine.
If the missing class is ConfigMergeable, check whether the proper jar file, config-1.3.3.jar, exists in your /opt/spark/jars directory.
The whole idea is that the jars in /opt/spark/jars and ./aws-glue-libs/jarsv1 should match.
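To see where the two directories differ, a small comparison of jar file names may be enough. This is a rough Python sketch under the assumption that matching means "same file names"; it does not inspect jar contents, and the helper names jar_names and diff_jar_dirs are made up for illustration.

```python
import os


def jar_names(directory):
    """Return the set of *.jar file names directly inside directory."""
    return {name for name in os.listdir(directory) if name.endswith(".jar")}


def diff_jar_dirs(dir_a, dir_b):
    """Return (only_in_a, only_in_b) as sorted lists of jar file names."""
    a, b = jar_names(dir_a), jar_names(dir_b)
    return sorted(a - b), sorted(b - a)


# Usage with the paths mentioned above; adjust them to your installation:
# only_spark, only_glue = diff_jar_dirs("/opt/spark/jars", "./aws-glue-libs/jarsv1")
# print("Only in /opt/spark/jars:", only_spark)
# print("Only in ./aws-glue-libs/jarsv1:", only_glue)
```

A jar such as config-1.3.3.jar showing up in only one of the two lists would point to the kind of mismatch described here.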

How to add an external JAR to a NiFi cluster?

I am trying to add an external JAR to my NiFi cluster.
I am following the code from this post: Nifi multipart form
The script body is exactly the same as the one in the post, except that it omits the Grab statement.
I have downloaded the JARs httpcore-4.3.2 and httpmime-4.5.9.jar from the Maven repository and put them on the local file system of the cluster.
Then I added the location of these JARs to the additional classpath of ExecuteGroovyScript.
However, I get the error "unable to resolve class ContentType".
It seems the JAR is never found.
Can someone help, please?
FYI: I am working on a cluster where I cannot add external JARs directly to /lib, so I cannot use Grab.
Here is the error from nifi-app.log of my local NiFi:
2021-05-17 17:38:44,825 ERROR [Timer-Driven Process Thread-10] o.a.n.p.groovyx.ExecuteGroovyScript ExecuteGroovyScript[id=7a78065f-0179-1000-2e4b-daefbdcf006a] java.lang.NoClassDefFoundError: org/apache/http/HttpEntity: java.lang.NoClassDefFoundError: org/apache/http/HttpEntity
java.lang.NoClassDefFoundError: org/apache/http/HttpEntity
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.codehaus.groovy.runtime.callsite.CallSiteArray$1.run(CallSiteArray.java:68)
at org.codehaus.groovy.runtime.callsite.CallSiteArray$1.run(CallSiteArray.java:65)
at java.security.AccessController.doPrivileged(Native Method)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallStaticSite(CallSiteArray.java:65)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallSite(CallSiteArray.java:162)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)
at Scriptffffffffea61d3da$_run_closure1.doCall(Scriptffffffffea61d3da.groovy:10)
at sun.reflect.GeneratedMethodAccessor327.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at org.apache.nifi.processors.groovyx.flow.GroovySessionFile$5.process(GroovySessionFile.java:176)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2923)
at org.apache.nifi.processors.groovyx.flow.ProcessSessionWrap.write(ProcessSessionWrap.java:823)
at org.apache.nifi.processors.groovyx.flow.SessionFile.write(SessionFile.java:93)
at org.apache.nifi.processors.groovyx.flow.GroovySessionFile.write(GroovySessionFile.java:174)
at org.apache.nifi.processors.groovyx.flow.GroovySessionFile$write.call(Unknown Source)
at Scriptffffffffea61d3da.run(Scriptffffffffea61d3da.groovy:9)
at org.apache.nifi.processors.groovyx.ExecuteGroovyScript.onTrigger(ExecuteGroovyScript.java:449)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.http.HttpEntity
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 38 common frames omitted
As far as I can see, you have to include more libraries to use httpmime-4.5.9:
org.apache.httpcomponents/httpmime/jars/httpmime-4.5.9.jar
org.apache.httpcomponents/httpclient/jars/httpclient-4.5.9.jar
org.apache.httpcomponents/httpcore/jars/httpcore-4.4.11.jar
commons-logging/commons-logging/jars/commons-logging-1.2.jar
commons-codec/commons-codec/jars/commons-codec-1.11.jar
You can run the following code in the Groovy console to determine all libraries required for one dependency:
@Grab(group='org.apache.httpcomponents', module='httpmime', version='4.5.9')
import groovy.grape.Grape
def grape = Grape.getInstance()
def r = grape.listDependencies(this.getClass().getClassLoader())
println grape.resolve(r[0]).join('\n')

Hadoop: Exception in thread "main" java.lang.ClassNotFoundException: com.bogotobogo.hadoop.WordCount

I am very new to Hadoop and I am following along with this tutorial: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Creating_Wordcount_Maven_Project_Eclipse_MapReduce.php
I have built the project with Maven and created a JAR file. Now when I run the JAR using
hadoop jar hadoop/target/wordcount-0.0.1-SNAPSHOT.jar com.bogotobogo.hadoop.WordCount input/wordcount.txt output
I get this stacktrace:
Exception in thread "main" java.lang.ClassNotFoundException: com.bogotobogo.hadoop.WordCount
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I am not sure if it has something to do with classpath or hadoop_classpath.
What could be the error?
Thanks in advance
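The exception says that no class with the exact name com.bogotobogo.hadoop.WordCount was found when the jar was run. One way to narrow this down is to check whether the jar contains a WordCount class at all, and under which package. The following Python sketch does that by reading the jar as a zip archive; the function name find_classes is hypothetical and the approach is only a diagnostic aid, not a fix.

```python
import zipfile


def find_classes(jar_path, simple_name):
    """Return fully qualified names of classes in the jar named simple_name."""
    suffix = "/" + simple_name + ".class"
    matches = []
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            if entry == simple_name + ".class" or entry.endswith(suffix):
                # Strip ".class" and turn the path into a dotted class name.
                matches.append(entry[: -len(".class")].replace("/", "."))
    return sorted(matches)


# Usage with the jar from the command above:
# print(find_classes("hadoop/target/wordcount-0.0.1-SNAPSHOT.jar", "WordCount"))
```

An empty list would suggest the class was never packaged. A name other than com.bogotobogo.hadoop.WordCount would suggest that the package declaration differs from the name passed to hadoop jar.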

Unable to load Hive-JDBC driver when accessed through MapReduce program on Amazon's Elastic MapReduce

I have written a MapReduce program that stores part of its output data in a Hive table.
I use the Hive JDBC driver to access the Hive table from the MapReduce code.
The program compiles successfully on my local machine.
I then created a JAR file and uploaded it to S3, created an Elastic MapReduce cluster, and started it.
However, the job fails with the following errors:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201407161054_0001_m_000001_0: java.lang.ClassNotFoundException: org.apache.hadoop.hive.jdbc.HiveDriver
attempt_201407161054_0001_m_000001_0: at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
attempt_201407161054_0001_m_000001_0: at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
attempt_201407161054_0001_m_000001_0: at java.security.AccessController.doPrivileged(Native Method)
attempt_201407161054_0001_m_000001_0: at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
attempt_201407161054_0001_m_000001_0: at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
attempt_201407161054_0001_m_000001_0: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
attempt_201407161054_0001_m_000001_0: at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
attempt_201407161054_0001_m_000001_0: at java.lang.Class.forName0(Native Method)
attempt_201407161054_0001_m_000001_0: at java.lang.Class.forName(Class.java:190)
attempt_201407161054_0001_m_000001_0: at HubAndAuthority.InputHubMapper.configure(InputHubMapper.java:38)
attempt_201407161054_0001_m_000001_0: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
attempt_201407161054_0001_m_000001_0: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
attempt_201407161054_0001_m_000001_0: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
attempt_201407161054_0001_m_000001_0: at java.lang.reflect.Method.invoke(Method.java:606)
The Hive JDBC driver appears to be missing, and adding it to the classpath should resolve the issue. However, I do not know the exact steps to do this on Amazon EMR.
Could you please tell me what I am missing and how to resolve it?
Thanks and Regards,
Prafulla
I'm not entirely sure, but you could try this:
"Note: If you want your custom classpath to override the original class path, you should set the environment variable, HADOOP_USER_CLASSPATH_FIRST to true so that the HADOOP_CLASSPATH value specified in hadoop-user-env.sh is first."
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
Regards,
revet

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar in my classpath, but I don't know where it came from. I don't suppose it should have an effect.
Any ideas?
What happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? From the stack trace, it looks like the jar is needed by one of the actual tasks, though I am surprised to see that this actually kicks off an M/R job.
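Since more than one Guava jar is reported to be on the classpath, it may also be worth checking which classpath entries provide the class under its exact package name. A class file with the right simple name under a different package would not satisfy the lookup. The Python sketch below is an illustration only: the function names expand_classpath and providers are made up, and wildcard handling is simplified to entries ending in "*".

```python
import glob
import os
import zipfile


def expand_classpath(classpath):
    """Split a classpath string and expand 'dir/*' wildcard entries to jars."""
    entries = []
    for item in classpath.split(os.pathsep):
        if item.endswith("*"):
            pattern = os.path.join(os.path.dirname(item), "*.jar")
            entries.extend(sorted(glob.glob(pattern)))
        elif item:
            entries.append(item)
    return entries


def providers(classpath, class_name):
    """Return classpath entries (jars or directories) that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for path in expand_classpath(classpath):
        if os.path.isdir(path):
            # Directory entry: look for the class file on disk.
            if os.path.isfile(os.path.join(path, *entry.split("/"))):
                found.append(path)
        elif zipfile.is_zipfile(path):
            # Jar entry: look for the class inside the archive.
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    found.append(path)
    return found


# Usage: pass the classpath string, for example the output of `hbase classpath`:
# print(providers(classpath_string, "com.google.common.util.concurrent.ThreadFactoryBuilder"))
```

An empty result for the full name com.google.common.util.concurrent.ThreadFactoryBuilder would mean that no entry on that classpath provides the class under that package, even if a jar contains a file called ThreadFactoryBuilder.class.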

Resources