I am running HDP.
I have an external JAR, example.jar, which is needed to read serialized Flume data files.
I added a new parameter in the Custom hive-site section:
name = hive.aux.jars.path
value = hdfs:///user/libs/
I saved the new configuration and restarted the Hadoop components, and later restarted the entire Hadoop cluster.
Afterwards, in the Hive client, I tried to run a select:
select * from example_serealized_table
and Hive returned this error:
FAILED: RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.ClassNotFoundException: Class com.my.bigtable.example.model.gen.TSerializedRecord not found)
How can I solve this problem?
I also tried adding the JAR in the current session:
add jar hdfs:///user/libs/example-spark-SerializedRecord.jar;
I also tried putting the *.jar in a local folder.
The problem is the same.
I did not mention that the library was written by my colleague.
It turned out that it redefines the variables that affect the logging level.
After excluding the overridden variables from the library, the problem stopped reproducing.
I'm running an SQL query on a JSON serde table. It's working in the Hive CLI, but it's failing in Hue with the error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I guess it's due to the missing jar file; any idea how to add the jar file hive-hcatalog-core-1.2.1.jar for Hue?
Place your JAR in HDFS and add it by that path using ADD JAR hdfs:///user/hive/lib/hive-hcatalog-core-1.2.1.jar;
Run ADD JAR hive-hcatalog-core-1.2.1.jar in Hue before your query; the JAR stays available for as long as your current session lasts.
For the benefit of others who might face the same issue, either for this particular JAR "hive-hcatalog-core-1.2.1.jar" or for any UDF JAR:
In the Hue Query Editor, run the following command:
add jar hdfs:/hive-hcatalog-core-1.2.1.jar;
Please note that single quotes are not required, as is the case with the Hive CLI.
The exact command Cloudera gave is ADD JAR {{lib_dir}}/hive/lib/hive-contrib.jar;
1) I am unable to find the hive/lib directory on CDH 5.
The {{lib_dir}} on CDH installed environments for Hive would either be /usr/lib/hive/ or /opt/cloudera/parcels/CDH/lib/hive/ (depending on packages or parcels being in use).
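As a rough illustration of the two locations named above, this Python sketch returns the first candidate directory that exists on the machine. The function name find_hive_lib_dir and the idea of probing both paths in order are assumptions made for illustration, not something Cloudera provides.

```python
import os

# Candidate Hive locations taken from the answer above:
# package installs first, then parcel installs.
CANDIDATE_DIRS = [
    "/usr/lib/hive/",
    "/opt/cloudera/parcels/CDH/lib/hive/",
]


def find_hive_lib_dir(candidates=CANDIDATE_DIRS):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    # Prints the detected directory, or None if neither location exists.
    print(find_hive_lib_dir())
```

The returned directory could then be substituted for {{lib_dir}} when writing the ADD JAR statement by hand.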
This is the way to add a JAR in Cloudera.
For this you have to switch to the superuser by using this command:
It will switch you to the superuser.
I am trying to use Flume with a syslog source and an HBase sink.
When I run the Flume agent I get this error:
Failed to start agent because dependencies were not found in classpath. Error follows. java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
This means (according to that question) that some HBase libraries are missing. To solve it I need to set the path to these libraries in the flume-env.sh file. That is what I did, but when I ran Flume the error persisted. Here is the command I used to run the Flume agent:
bin/flume-ng agent --conf ./conf --conf-file ./conf/flume.properties --name agent -Dflume.root.logger=INFO,console
So my question is: if the solution I used is correct (I need to add the libraries to Flume), why do I still get the same error? If it is not, how do I solve the problem?
From the docs I read: The flume-ng executable looks for and sources a file named "flume-env.sh" in the conf directory specified by the --conf/-c commandline option.
I haven't tested it yet, but I think that is the solution (I just need confirmation).
I would recommend downloading the full HBase tarball and setting environment variables such as HBASE_HOME to the right locations. Flume can then pick up the libraries from the HBase installation automatically.
So I've been using sbt with assembly to package all my dependencies into a single JAR for my Spark jobs. I've got several jobs where I was using c3p0 to set up connection pool information, broadcast that out, and then use foreachPartition on the RDD to grab a connection and insert the data into the database. In my sbt build script, I include
"mysql" % "mysql-connector-java" % "5.1.33"
This makes sure the JDBC connector is packaged up with the job. Everything works great.
So recently I started playing around with Spark SQL and realized it's much easier to simply take a DataFrame and save it to a JDBC source with the new features in 1.3.0.
I'm getting the following exception :
java.sql.SQLException: No suitable driver found for
jdbc:mysql://some.domain.com/myschema?user=user&password=password at
java.sql.DriverManager.getConnection(DriverManager.java:596) at
When I was running this locally I got around it by setting
Ultimately what I want to know is: why is the job not able to find the driver when it should be packaged up with it? My other jobs never had this problem. From what I can tell, both c3p0 and the DataFrame code make use of java.sql.DriverManager (which handles loading everything for you, as far as I can tell), so it should work just fine. If there is something that prevents the assembly method from working, what do I need to do to make this work?
This person was having similar issue: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-DataFrame-with-MySQL-td22178.html
Have you updated your connector drivers to the most recent version? Also did you specify the driver class when you called load()?
Map<String, String> options = new HashMap<String, String>();
options.put("url", "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456");
options.put("dbtable", "video");
options.put("driver", "com.mysql.cj.jdbc.Driver"); //here
DataFrame jdbcDF = sqlContext.load("jdbc", options);
In spark/conf/spark-defaults.conf, you can also set spark.driver.extraClassPath and spark.executor.extraClassPath to the path of your MySQL driver .jar.
These options are clearly mentioned in the Spark docs: --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar
The mistake I was making was specifying these options after my application's JAR.
However, the correct way is to specify these options immediately after spark-submit:
spark-submit --driver-class-path /somepath/project/mysql-connector-java-5.1.30-bin.jar --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar
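To make the ordering rule concrete, here is a small Python sketch that assembles the argument list so that every option comes before the application JAR. The helper name build_spark_submit is hypothetical; the paths and class name in the usage section are the ones from the command above.

```python
import shlex


def build_spark_submit(app_jar, main_class, driver_jar, app_args=()):
    """Return a spark-submit argv with all options placed before the application JAR."""
    return [
        "spark-submit",
        "--driver-class-path", driver_jar,
        "--jars", driver_jar,
        "--class", main_class,
        app_jar,  # anything after the application JAR is passed to the application itself
        *app_args,
    ]


if __name__ == "__main__":
    argv = build_spark_submit(
        "target/scala-2.11/project_2.11-1.0.jar",
        "com.package.MyClass",
        "/somepath/project/mysql-connector-java-5.1.30-bin.jar",
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(a) for a in argv))
```

Building the list in one place makes it harder to append an option after the JAR by accident, where spark-submit would treat it as an application argument.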
Both the Spark driver and the executors need the MySQL driver on the classpath, so specify:
spark.driver.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
With Spark 2.2.0, the problem was fixed for me by adding the extra classpath setting to the SparkSession in my Python script:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.driver.extraClassPath", "/path/to/jdbc/driver/postgresql-42.1.4.jar") \
    .getOrCreate()
See the official documentation: https://spark.apache.org/docs/latest/configuration.html
In my case, Spark is not launched from the command line but from the Django framework (https://www.djangoproject.com/).
spark.driver.extraClassPath does not work in client-mode:
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.
Env variable SPARK_CLASSPATH has been deprecated in Spark 1.0+.
You should first copy the JDBC driver JARs onto each executor under the same local filesystem path and then use the following options in your spark-submit:
--driver-class-path "driver_local_file_system_jdbc_driver1.jar:driver_local_file_system_jdbc_driver2.jar"
--conf "spark.executor.extraClassPath=executors_local_file_system_jdbc_driver1.jar:executors_local_file_system_jdbc_driver2.jar"
For example, in the case of Teradata you need both terajdbc4.jar and tdgssconfig.jar.
Alternatively, modify compute_classpath.sh on all worker nodes. The Spark documentation says:
The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
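There is a simple Java trick to solve your problem: load the driver class explicitly with Class.forName(). For example:
val customers: RDD[(Int, String)] = new JdbcRDD(sc, () => {
  Class.forName("com.mysql.jdbc.Driver") // assumed MySQL driver class; loads it explicitly
  DriverManager.getConnection(jdbcUrl) // jdbcUrl is a placeholder for your connection string
},
  "SELECT id, name from customer WHERE ? < id and id <= ?",
  0, range, partitions, r => (r.getInt(1), r.getString(2)))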
Check the docs
A simple way is to copy "mysql-connector-java-5.1.47.jar" into the "spark-2.4.3\jars\" directory.
I had the same problem running jobs over a Mesos cluster in cluster mode.
To use a JDBC driver, it is necessary to add the dependency to the system classpath, not to the framework classpath. The only way I found to do this was to add the dependency in the spark-defaults.conf file on every instance of the cluster.
The properties to add are spark.driver.extraClassPath and spark.executor.extraClassPath and the path must be in the local file system.
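As a minimal sketch of what those two entries look like, the following Python helper renders the lines to append to spark-defaults.conf. The function name classpath_properties and the example JAR path are hypothetical; the property names are the ones given above.

```python
def classpath_properties(jar_path):
    """Return the two spark-defaults.conf lines that put jar_path on both classpaths."""
    keys = ("spark.driver.extraClassPath", "spark.executor.extraClassPath")
    # spark-defaults.conf holds one "key value" pair per line, separated by whitespace.
    return [f"{key} {jar_path}" for key in keys]


if __name__ == "__main__":
    # Hypothetical local path; it must exist at the same location on every node.
    for line in classpath_properties("/opt/jdbc/mysql-connector-java.jar"):
        print(line)
```

The output can be appended to the file on each instance, which keeps the driver and executor entries pointing at the same local path.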
I added the JAR file to SPARK_CLASSPATH in spark-env.sh, and it works.
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/local/spark-1.6.3-bin-hadoop2.6/lib/mysql-connector-java-5.1.40-bin.jar
I faced the same issue when trying to run the spark-shell command from my Windows machine. The path that you pass for the driver location, as well as for the JAR that you are using, should be in double quotes; otherwise it gets misinterpreted and you will not get the result you expect.
You also have to install the JDBC driver for SQL Server from the link: JDBC Driver
I used the command below, and it works fine on my Windows machine:
spark-shell --driver-class-path "C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\jre8\sqljdbc42.jar" --jars "C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\jre8\sqljdbc42.jar"
I use Hive to create a table stored as a SequenceFile. The row format uses the SerDe class myserde.TestDeserializer in hiveserde-1.0.jar.
On the command line I use this command to add the JAR file:
hive> ADD JAR hiveserde-1.0.jar;
Then I create a table, and the file loads successfully.
But now I want to run it and create a table from the client by using MySQL JDBC.
The error is:
SerDe: myserde.TestDeserializer does not exist.
How can I run it? Thanks.
So, there are a few options. In all of them the jar needs to be present on your cluster with Hive installed. The JDBC client code, of course, can be run from anywhere within or outside of the cluster.
Option 1: You issue an HQL statement before you run any of your other HQL commands:
ADD JAR hiveserde-1.0.jar
Option 2: You can update your hive-site.xml so that the hive.aux.jars.path property is set to the complete path to your JAR, hiveserde-1.0.jar.
Go to your hive-env.sh and append to the bottom of the file:
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH:/<path-to-jar>
You can then source this file. Not ideal, but it works.
Are you saying that you'd like to create the table via JDBC rather than in the CLI? In that case, you should add the JAR to your classpath when you run your JDBC code.
Yes, this can be a little confusing; it seems that half the time Hive reads from the cluster and the other half from the local file system (of the machine where the Hive server is installed).
To overcome this, simply copy the .jar file to the Hive server machine; you can then reference it in your Hive query, for example:
add jar /tmp/json-serde.jar;
create table tweets (
name string,
address1 string,
address2 string,
address3 string,
postcode string
And then onto the next problem ;)
We're using Cloudera 3.7.5 and having a tough time configuring the Beeswax server so that Hue can access the Hive databases. I followed all the instructions in the Cloudera documentation to set up MySQL as Hive's metastore, but when I restart the Hue services and check the Beeswax server's StdErr logs, I still see the painful "javax.jdo.JDOFatalInternalException: Error creating transactional connection factory", which is caused by
org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
This is bizarre to me, because the logs also indicate that the environment variable HIVE_HOME is equal to "/usr/lib/hive", and sure enough I have copied "mysql-connector-java-5.1.15-bin.jar" into the /usr/lib/hive/lib directory, as the documents dictate.
I have also tried the instructions in the blog post http://hadoopchallenges.blogspot.com/2011/03/hue-120-upgrade-and-beeswax.html, which involved copying the mysql-connector JAR into "/usr/share/hue/apps/beeswax/hive/lib/". Unfortunately I did not have a hive/lib subdirectory in the beeswax folder, so I attempted to make one. This also did not work.
Any advice on how I can get the MySQL JDBC library onto Beeswax's classpath?
We finally decided to just bite the bullet and upgrade to CDH4. Placing the JDBC JAR in /usr/share/hive/lib allowed the Beeswax server to function perfectly without issue.
If anyone else is experiencing this issue, I recommend upgrading from CDH3 to CDH4. The UI is much cleaner and smoother, and we had far fewer installation and maintenance bugs with CDH4.
You have to copy your MySQL connector into HUE_HOME/apps/beeswax/hive/lib.
If this path doesn't exist, create hive/lib and then copy the MySQL connector into it. I hope this solves your problem.
When you start using Cloudera 4.5, everything moves into parcels, so this exact problem on my Hive metastore server was fixed by the command below. Essentially you're just re-adding modules. I'm sure you can modify the extra classpath in the Hive config file to make this oblivious to parcel updates.
cp /usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/.
So a real fix might be something like this:
for d in /opt/cloudera/parcels/*/lib/hive/lib; do cp "$(locate mysql-connector | grep jar | head -n 1)" "$d"/; done
which would copy the jar into every parcel.
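A minimal Python sketch of the same idea, copying the connector into each parcel's hive/lib directory one at a time rather than relying on how the shell expands the glob. The function name copy_jar_to_parcels is hypothetical, and the default pattern assumes the parcel layout shown above.

```python
import glob
import os
import shutil


def copy_jar_to_parcels(jar_path, pattern="/opt/cloudera/parcels/*/lib/hive/lib"):
    """Copy jar_path into every directory matching pattern; return the destination paths."""
    copied = []
    for lib_dir in sorted(glob.glob(pattern)):
        # Skip anything that matches the pattern but is not a directory.
        if os.path.isdir(lib_dir):
            copied.append(shutil.copy(jar_path, lib_dir))
    return copied


if __name__ == "__main__":
    # Hypothetical source path; replace it with the location of your connector JAR.
    for dest in copy_jar_to_parcels("/usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar"):
        print("copied to", dest)
```

The copy would need to be repeated after a parcel upgrade, since a new parcel directory starts without the JAR.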