apache zepplelin shows ava.lang.classnotfoundexception: com.mysql.jdbc.driver error - jdbc

name value
common.max_count 1000
default.driver org.mysql.jdbc.Driver
default.password ****
default.url jdbc:mysql://localhost:3306/
default.user root
zeppelin.interpreter.localRepo /usr/local/zeppelin/local-repo/2DCVRUUK8
zeppelin.interpreter.output.limit 102400
zeppelin.jdbc.auth.type
zeppelin.jdbc.concurrent.max_connection 10
zeppelin.jdbc.concurrent.use true
Dependencies
artifact exclude
/usr/local/zeppelin/interpreter/jdbc/mysql-connector-java-5.1.46-bin.jar
These are my interpreter settings. I have loaded mysql-connector-java-5.1.46-bin.jar with correct path, then I am still unable to run this.

On a side note, if anyone is trying to access mysql table using spark as below
val tempDF = spark.read.jdbc(<JdbcConnectionURL>, "table_name", <ConnectionProperties>)
tempDF.createOrReplaceTempView("tempdf")
tempDF.show(10,false)
And encounters com.mysql.jdbc.driver error due to missing dependency then we can add the mysql-connector as dependency on spark interpreter as mentioned by #cricket_007
or by calling the below deprecated command on the first paragraph of zeppelin notebook
%dep
z.load("mysql:mysql-connector-java:8.0.11")

Zeppelin can just use Maven targets
Add this
mysql:mysql-connector-java:5.1.46
Restart the interpreter

Related

Is it possible to read a file using SparkSession object of Scala language on Windows?

I've been trying to read from a .csv file on many ways, utilizing SparkContext object. I found it possible through scala.io.Source.fromFile function, but I want to use spark object. Everytime I run function textfile for org.apache.spark.SparkContext I get the same error:
scala> sparkSession.read.csv("file://C:\\Users\\184229\\Desktop\\bigdata.csv")
21/12/29 16:47:32 WARN streaming.FileStreamSink: Error while looking for metadata directory.
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
.....
As it's mentioned in the title I run the code on Windows in IntelliJ
[Edit]
In build.sbt have no redundant or overlapped dependencies. I use hadoop-tools, spark-sql and hadoop-xz.
Have you tried to run your spark-shell using local mode?
spark-shell --master=local
Also pay attention to not use both Hadoop-code and Hadoop-commons as a dependencies since you may have conflicting jars issues.
I've found the solution, precisely one of my colleague did that.
In dependencies build.sbt I changed hadoop-tools to hadoop-commons and it worked out.

org.apache.kylin.job.exception.ExecuteException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/typeinfo/TypeInfo

I find similar error on https://issues.apache.org/jira/browse/KYLIN-2511
env:
hadoop-2.7.1
hbase-1.3.2
apache-hive-2.1.1-bin
apache-kylin-1.6.0-hbase1.x-bin
I've tried copy all the hive libs to kylin, but get another ERROR.
org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/typeinfo/TypeInfo
The missing class should be in hive-exec-.jar; Check and debug the "bin/find-hive-dependency.sh" to see why it wasn't able to locate this jar from your server. You can manually add it to the "hive_exec_path" variable.
BTW, Kylin 1.6 is quite old, try to upgrade to a 2.x version.
Why you just try the method mentioned in https://issues.apache.org/jira/browse/KYLIN-2511. You'd better prepare the env according to the document of v16. It is better for using the latest version of Kylin. It has more feature and fixes some bugs.

Kafka Connect Elasticsearch - NoSuchMethodError

I am trying to run the kafka-connect-elasticsearch plugin from Confluent in order to stream topics from Kafka (V0.11.0.1) directly into Elasticsearch (without putting Logstash in between).
I build the connector using Maven -
$ cd kafka-connect-elasticsearch
$ mvn clean package
I then created the require configuration file -
name=es-cluster-lab
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=filebeats-test
topic.index.map=filebeats-test:kafka_test_index
key.ignore=true
schema-ignore=true
connection.url=http://elastic:9200
type.name=log
As per the new Kafka Classpath Isolation spec, I also added the following line to my connect-standalone.properties file -
plugin.path=/home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/
I go to run the script ...
bin/connect-standalone.sh config/connect-standalone.properties config/elasticsearch-connect.properties
... and receive the below error.
[2017-09-14 16:08:26,599] INFO Loading plugin from: /home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/slf4j-api-1.7.25.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:176)
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Sets$SetView.iterator()Lcom/google/common/collect/UnmodifiableIterator;
at org.reflections.Reflections.expandSuperTypes(Reflections.java:380)
at org.reflections.Reflections.<init>(Reflections.java:126)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:221)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:198)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:190)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:150)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:47)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:68)
I also tried to move the JAR files into the /app/kafka/libs directory (default CLASSPATH) and even tried to create a subdirectory /app/kafka/libs/connect_libs and add that manually to my CLASSPATH environment variable.
Not sure what my next step is besides putting Logstash between Kafka and Elastic.
try to change the guava version to 20 before you build it
I think you are missing the star '*' at the end of the path of the plugin path.
plugin.path=/home/kafka/kafka-connect-elasticsearch-3.3.0/target/kafka-connect-elasticsearch-3.3.0-development/share/java/kafka-connect-elasticsearch/*

Unresponsive Spark Master

I'm trying to run a simple Spark app in Standalone mode on Mac.
I manage to run ./sbin/start-master.sh to start the master server and worker.
./bin/spark-shell --master spark://MacBook-Pro.local:7077 also works and I can see it in running application list in Master WebUI
Now I'm trying to write a simple spark app.
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._
object SimpleApp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Simple Application")
.setMaster("spark://MacBook-Pro.local:7077")
val sc = new SparkContext(conf)
val logFile = "README.md"
val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
}
}
Running this simple app gives me error message that Master is unresponsive
15/02/15 09:47:47 INFO AppClient$ClientActor: Connecting to master spark://MacBook-Pro.local:7077...
15/02/15 09:47:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#MacBook-Pro.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/02/15 09:48:07 INFO AppClient$ClientActor: Connecting to master spark://MacBook-Pro.local:7077...
15/02/15 09:48:07 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#MacBook-Pro.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/02/15 09:48:27 INFO AppClient$ClientActor: Connecting to master spark://MacBook-Pro.local:7077...
15/02/15 09:48:27 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#MacBook-Pro.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/02/15 09:48:47 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/02/15 09:48:47 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/02/15 09:48:47 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
Any idea what is the problem?
Thanks
You can either set the master when calling spark-submit, or (as you've done here) by explicitly setting it via the SparkConf. Try following the example in the Spark Configuration docs, and setting the master as follows:
val conf = new SparkConf().setMaster("local[2]")
From the same page (explaining the number in brackets that follows local): "Note that we run with local[2], meaning two threads - which represents “minimal” parallelism, which can help detect bugs that only exist when we run in a distributed context."
I got the same issue and solve it finally. In my case, because I wrote the source code based on scala 2.11. But for spark, I build it with Maven following the default command:
build/mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
According to this build script, it will set the version of scala to version 2.10. Due to the different scala version between Spark Client and Master, it will raise incompatible serialization when client send message to master via remote actor. Finally "All masters are unresponsive" error message was shown in the console.
My Solution:
1. Re-build spark for scala 2.11 (Make sure your programming env to scala 2.11). Please run this command as below in SPARK_HOME:
dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
After building, the package will be located in SPARK_HOME/assembly/target/scala-2.11. If you start your spark server using start-all.sh, it will report the target package can't found.
Go to conf folder, edit spark-env.sh file. Append the code line as below:
export SPARK_SCALA_VERSION="2.11"
Please run start-all.sh, and set the correct master url in your program, and run it. It done!
Notice: The error message in the console is not enough. So that you need toggle your log feature on to inspect what happen. You can go to conf folder, and copy log4j.properties.template to log4j.properties. After the spark master was started, the log files will save on SPARK_HOME/logs folder.
I write my code in JAVA, but I got the same problem with you. Because my scala version is 2.10, my dependencies is 2.11. Then I changed spark-core_2.11 and spark-sql_2.11 to spark-core_2.10 and spark-sql_2.10 in pom.xml. Maybe you can solve your issue in similar way.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>${spark.version}</version>
</dependency>

Hadoop Hive integration INSERT query

i'am hadoop newbie,and i'am trying this tutorial:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
1.Starting hive is done successfully with the parameter:
hive --auxpath /cygdrive/c/Hadoop/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/cygdrive/c/javaHBase/hbase-0.94.6/hbase-0.94.6.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/zookeeper-3.4.3.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/guava-r09.jar -hiveconf hbase.master=localhost:60010
2.starting hbase is successuful.
3."CREATE TABLE hbase_table_1" is done successfully
4.I verified with commands list and show tables, all is ok
here is my problem
"INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;"
i get this error message searching in "htp://localhost:50060/tasklog?attemptid..."
java.lang.ClassNotFoundException: org/apache/hadoop/hive/hbase/HBaseSerDe
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat
Continuing ...
java.lang.NullPointerException
Continuing ...
java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:280)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:78)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
...
i tried to copy hive jars to hbase install and vice vesa...
note: i added necessary JARS with hive command: ADD JAR C:...\hive-hbase-handler-0.9.0.jar etc.
hbase version: 0.94.6
hive version: 0.9.0
any additional export? or configuration?
I need help please!
Thanks a lot!
Try adding these jars in ditributed cache by running following commands in hive CLI or include these lines in $HOME/.hiverc file. That should resolve the ClassNotFoundException.
ADD JAR ...../hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR ...../hbase-0.94.1/hbase-0.94.6.jar;
ADD JAR ...../hbase-0.94.1/lib/zookeeper-3.4.3.jar;
ADD JAR ...../hbase-0.94.1/lib/guava-11.0.2.jar;
ADD JAR ...../hbase-0.94.1/lib/protobuf-java-2.4.0a.jar;
This is required when running hive queries in mapred mode (not local).
Brother do following things. your problem will be solved. ( change according to your version )
copy these files in hadoop lib directory
$HIVE_HOME/lib/hive-hbase-handler-0.8.1.jar,$HIVE_HOME/lib/hbase-0.90.0.jar,$HIVE_HOME/lib/zookeeper-3.3.1.jar,$HIVE_HOME/lib/guava-r06.jar,
and hive-serde jar

Resources