Connect to Hive through Java JDBC - hadoop

There is an existing question here, connect from java to Hive, but mine is different.
My Hive is running on machine1, and I need to send some queries from a Java server running on machine2. As I understand it, Hive exposes a JDBC interface for the purpose of receiving remote queries. I took the code from here - HiveServer2 Clients.
I installed the dependencies listed in the article:
hive-jdbc*.jar
hive-service*.jar
libfb303-0.9.0.jar
libthrift-0.9.0.jar
log4j-1.2.16.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
commons-logging-1.0.4.jar
However, I got a java.lang.NoClassDefFoundError at compile time.
Full Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:393)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:187)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at com.bidstalk.tools.RawLogsQuerySystem.HiveJdbcClient.main(HiveJdbcClient.java:25)
Another question on StackOverflow recommended adding the Hadoop API dependencies in Maven - Hive Error.
I don't understand why I need the Hadoop API for a client to connect to Hive. Shouldn't a JDBC driver be agnostic of the underlying query system? I just need to pass some SQL queries.
Edit:
I am using Cloudera (CDH 5.3.1), and I think I need to add the CDH dependencies. The Cloudera instance is running Hadoop 2.5.0 and HiveServer2.
But the servers are on machine1. On machine2 the code should at least compile; I should only have issues at runtime!

In case you still haven't solved this, I have given it a go.
I needed the following dependencies for it to compile and run:
libthrift-0.9.0-cdh5-2.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
commons-logging-1.1.3.jar
hive-common.jar
slf4j-api-1.7.5.jar
hive-metastore.jar
hive-service.jar
hadoop-common.jar
hive-jdbc.jar
guava-11.0.2.jar
The Hive documentation was probably written against an older version/distribution.
Your exception is due to the missing hadoop-common jar, which contains org.apache.hadoop.conf.Configuration.
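If you want to confirm this before touching the classpath, you can probe for the class directly; a minimal sketch (the class name is the one from the stack trace above):

public class ClasspathCheck {
    public static void main(String[] args) {
        try {
            // The class the Hive JDBC driver needs from hadoop-common
            Class.forName("org.apache.hadoop.conf.Configuration");
            System.out.println("hadoop-common is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("hadoop-common is missing from the classpath");
        }
    }
}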
Hope this helps.

I got the same error when trying to use hive-jdbc 1.2.1 against Hive 0.13.
Compared to the long lists in the other answers, we now use just these two:
hive-jdbc-1.2.1-standalone.jar
hadoop-common-2.7.1.jar
Another side note: you might get 'Required field 'client_protocol' is unset!' when using the latest JDBC driver against an older Hive. If so, change the JDBC version to 1.1.0:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.1.0</version>
    <classifier>standalone</classifier>
</dependency>
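To see which versions are actually talking to each other, you can print the driver and server versions once a connection succeeds. A hedged sketch, assuming the Hive driver implements the standard JDBC metadata calls (host and credentials are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

public class VersionCheck {
    public static void main(String[] args) throws Exception {
        // Adjust host/port/credentials to your HiveServer2 instance
        Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/", "hive", "");
        DatabaseMetaData md = con.getMetaData();
        System.out.println("Server: " + md.getDatabaseProductVersion());
        System.out.println("Driver: " + md.getDriverVersion());
        con.close();
    }
}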

Answering my own question!
With some trial and error, I added the following dependencies to my pom file, and since then I am able to run the code against both CDH 5.3.1 and CDH 5.2.1 clusters.
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>0.13.1-cdh5.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libthrift</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libfb303</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.5.0-mr1-cdh5.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.0-cdh5.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>0.13.1-cdh5.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.0-cdh5.3.1</version>
</dependency>
Please note that some of these dependencies might not be required.

For others wondering what exactly is required to execute a Hive query remotely using Java...
Java code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
public class Runner
{
private static String driverName = "org.apache.hive.jdbc.HiveDriver";
public static void main(String[] args) throws SQLException {
try {
// Register driver and create driver instance
Class.forName(driverName);
} catch (ClassNotFoundException ex) {
ex.printStackTrace();
}
// get connection
System.out.println("before trying to connect");
Connection con = DriverManager.getConnection("jdbc:hive2://[HOST IP]:10000/", "hive", "");
System.out.println("connected");
// create statement
Statement stmt = con.createStatement();
// execute statement
stmt.executeQuery("show tables");
con.close();
}
}
Together with the pom file containing the only required dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>test-executor</groupId>
    <artifactId>test-executor</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <hadoop.version>2.5.2</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
</project>
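As a follow-up, the same setup also supports parameterized queries. A minimal sketch, assuming your Hive version supports PreparedStatement; my_table and its id column are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ParamQuery {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection("jdbc:hive2://[HOST IP]:10000/", "hive", "");
        // Bind the filter value instead of concatenating it into the SQL string
        PreparedStatement ps = con.prepareStatement("SELECT * FROM my_table WHERE id = ?");
        ps.setInt(1, 42);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        con.close();
    }
}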

I faced the same issue with CDH 5.4.1. I updated my POM file with the code below and it worked for me.
My Hadoop version is Hadoop 2.6.0-cdh5.4.1 and my Hive version is Hive 1.1.0-cdh5.4.1.
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>0.13.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>0.13.0</version>
</dependency>
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libthrift</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libfb303</artifactId>
    <version>0.9.0</version>
</dependency>
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.1.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
I resolved the issue with this POM update.

It seems like you are all working with Cloudera. I found that the repo in Maven looks old; if you go to their site, you can download their JDBC driver directly. https://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-20.html
That driver seems to support more functionality than the one in Hive. I noticed that they have addBatch implemented. I just wish they published these libraries in Maven. Maybe someone can find where to get them from using Maven.
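For what it's worth, addBatch/executeBatch are the standard JDBC batching calls, so client code against the Cloudera driver would presumably look like the following sketch (the table is hypothetical, and I'm assuming their driver follows the usual JDBC batch semantics):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection("jdbc:hive2://[HOST IP]:10000/", "hive", "");
        Statement stmt = con.createStatement();
        // Queue several statements and send them in one round trip
        stmt.addBatch("INSERT INTO my_table VALUES (1, 'a')");
        stmt.addBatch("INSERT INTO my_table VALUES (2, 'b')");
        int[] counts = stmt.executeBatch();
        System.out.println("executed " + counts.length + " statements");
        con.close();
    }
}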

Related

Apache Camel Integration with Elasticsearch

I'm working on a project using Apache Camel and Elasticsearch, and I was wondering which version of Elasticsearch Camel supports.
My pom.xml looks like this:
<dependencies>
    <dependency>
        <groupId>org.apache.camel</groupId>
        <artifactId>camel-core</artifactId>
        <version>2.18.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.camel</groupId>
        <artifactId>camel-elasticsearch</artifactId>
        <version>2.18.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.camel</groupId>
        <artifactId>camel-xmljson</artifactId>
        <version>2.18.2</version>
    </dependency>
    <dependency>
        <groupId>xom</groupId>
        <artifactId>xom</artifactId>
        <version>1.2.5</version>
    </dependency>
</dependencies>
But when I try to route a file to Elasticsearch, I get the following error:
java.lang.IllegalStateException: Received message from unsupported version: [2.0.0] minimal compatible version is: [5.0.0]
I found that this exception is due to a node or a TransportClient using an old version, so I tried adding the elasticsearch dependency:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>5.1.2</version>
</dependency>
But it gives me a new error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/action/WriteConsistencyLevel
So I'm wondering: which version of ES can I use with Apache Camel?
The code I use to send data to Elasticsearch:
XmlJsonDataFormat xmlJsonFormat = new XmlJsonDataFormat();

from("file://C:/Projects/?fileName=data.xml&charset=utf-8")
    .marshal(xmlJsonFormat)
    .to("elasticsearch://clusterES?transportAddresses=127.0.0.1:9300&operation=BULK_INDEX&indexName=xml&indexType=account");
I don't think you need to add any pom entries other than camel-elasticsearch. It seems more likely that you have a TransportClient running on an older version somewhere; you need to find it and upgrade that TransportClient.
https://www.elastic.co/guide/en/elasticsearch/guide/current/_transport_client_versus_node_client.html
https://discuss.elastic.co/t/received-message-from-unsupported-version-2-0-0-minimal-compatible-version-is-5-0-0/64708
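For comparison, constructing a TransportClient that matches a 5.x cluster looks roughly like this; a sketch against the Elasticsearch 5.1.2 transport client API, reusing the cluster name and address from the route above:

import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class EsClient {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "clusterES")
                .build();
        // A 5.x client can only talk to 5.x nodes; a 2.x client on the
        // classpath triggers the "unsupported version" error seen above
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("127.0.0.1"), 9300));
        System.out.println("connected nodes: " + client.connectedNodes());
        client.close();
    }
}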

java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback

I'm seeing the following error:
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback
when cluster.connect() is called:
String hosts = CassandraClientUtil.getHost();
String localDC = CassandraClientUtil.getLocalDC();
Cluster cluster = null;
if (StringUtils.isNotEmpty(localDC)) {
    cluster = Cluster.builder().addContactPoints(hosts.split(","))
            .withCredentials(CassandraCopsComponentLogger.USER_NAME, CassandraCopsComponentLogger.AUTH_CODE)
            .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
            .withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().withLocalDc(localDC).build()))
            .build();
} else {
    cluster = Cluster.builder().addContactPoints(hosts.split(","))
            .withCredentials(CassandraCopsComponentLogger.USER_NAME, CassandraCopsComponentLogger.AUTH_CODE)
            .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
            .build();
}
Session session = cluster.connect();
CassandraCopsComponentLogger.mappingManager = new MappingManager(session);
The pom.xml has the following dependencies:
<dependencies>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>16.0.1</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>2.1.9</version>
    </dependency>
    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty</artifactId>
        <version>3.9.0.Final</version>
    </dependency>
    <dependency>
        <groupId>com.codahale.metrics</groupId>
        <artifactId>metrics-core</artifactId>
        <version>3.0.2</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.5</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-mapping</artifactId>
        <version>2.1.9</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.3.1</version>
    </dependency>
</dependencies>
I saw a post on StackOverflow here where they recommended upgrading the Guava version to 16.0.1, but that did not solve my problem. Some direction here would be really helpful, as I'm new to Cassandra. For more background: this works as a standalone project, but when I include this project as a Maven dependency in another project, it raises this runtime error.
com.google.common.util.concurrent.FutureFallback was deprecated in Guava 19.0 and removed in Guava 20.0.
Use Guava 19.0, and do not use Guava 20.0 or greater, until you upgrade the Cassandra driver.
I updated the Cassandra driver to the latest available version, which should fix the issue:
<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.5.0</version>
</dependency>
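For context, the Guava-side replacement for FutureFallback is Futures.catching, which takes a plain Function; a minimal sketch, independent of the Cassandra driver, showing the post-19.0 pattern:

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public class FallbackDemo {
    public static void main(String[] args) throws Exception {
        ListenableFuture<String> failing =
                Futures.immediateFailedFuture(new RuntimeException("boom"));
        // Pre-20.0 code did this with a FutureFallback; that class is gone now,
        // which is why a driver compiled against it fails to load
        ListenableFuture<String> recovered = Futures.catching(
                failing, Exception.class, t -> "fallback value",
                MoreExecutors.directExecutor());
        System.out.println(recovered.get()); // prints "fallback value"
    }
}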
Don't add an external Guava version; use only whatever version cassandra-driver-core itself depends on. Otherwise you don't need it at all.
If anyone, like me, didn't know that there is a new driver version (4.x) out there with a new and different group id, take a look at the quickstart. This new version still uses Guava, but it's shaded:
The driver now requires Java 8. It does not depend on Guava anymore (we still use it internally but it's shaded).
More information can be found in the upgrade guide.
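For reference, connecting with the 4.x driver looks roughly like this (a sketch based on the 4.x quickstart; host, port, and datacenter name are placeholders):

import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;

public class Driver4Demo {
    public static void main(String[] args) {
        // Group id is com.datastax.oss, artifact java-driver-core
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .build()) {
            System.out.println(session
                    .execute("SELECT release_version FROM system.local")
                    .one().getString("release_version"));
        }
    }
}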

Error with Flink 0.10.1

With Flink 0.10.1 running locally, I can't connect to the JobManager due to the following error:
Association with remote system [akka.tcp://flink@127.0.0.1:49789] has failed, address is now gated for [5000] ms. Reason is: [scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = -114498752079829388].
And my pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>0.10.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients</artifactId>
        <version>0.10.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-compatibility</artifactId>
        <version>0.10.1-hadoop1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-jdbc</artifactId>
        <version>0.10.1-hadoop1</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-ml</artifactId>
        <version>0.10.1-hadoop1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java</artifactId>
        <version>0.10.1</version>
    </dependency>
</dependencies>
With Flink 0.9.1 it works fine... What am I missing? Thank you!
It sounds like a version mismatch, i.e., that you have old 0.9.1 binaries in your code base. Try cleaning your Maven cache via:
cd ~/.m2/repository/org/apache/flink
rm -rf *
Afterward, rebuild your project: mvn -DskipTests clean package
OK, the problem was the following:
On the Flink download page there are several links to the Flink project (binaries and source).
I had downloaded the ones without Hadoop, because this is for testing purposes on my local machine and I am not using Hadoop.
I don't know why the Hadoop build is needed, but I downloaded the one for
Hadoop 2.7.0 with Scala 2.10, and it works.

Maven embedded deploy not working with org.apache.httpcomponents.httpclient version 4.4

Within my application I have to deploy artifacts programmatically (i.e., run mvn deploy). I do this with the help of the maven-embedder artifact and some really simple code:
import org.apache.maven.cli.MavenCli;

// Run "mvn deploy" against the project in the given working directory
MavenCli client = new MavenCli();
int result = client.doMain(new String[] { "deploy" }, "C:/some/path/to/my/pom", System.out, System.out);
To be able to do this, I had to add the following dependencies to my pom:
<dependency>
    <groupId>org.apache.maven</groupId>
    <artifactId>maven-embedder</artifactId>
    <version>3.2.5</version>
</dependency>
<dependency>
    <groupId>org.eclipse.aether</groupId>
    <artifactId>aether-connector-basic</artifactId>
    <version>1.0.2.v20150114</version>
</dependency>
<dependency>
    <groupId>org.eclipse.aether</groupId>
    <artifactId>aether-transport-wagon</artifactId>
    <version>1.0.2.v20150114</version>
</dependency>
<dependency>
    <groupId>org.apache.maven.wagon</groupId>
    <artifactId>wagon-http</artifactId>
    <version>2.8</version>
</dependency>
<dependency>
    <groupId>org.apache.maven.wagon</groupId>
    <artifactId>wagon-provider-api</artifactId>
    <version>2.8</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3</version>
</dependency>
The problem is, when I change the version of httpclient to 4.4 (the most recent one), I get the following error when trying to deploy:
4840 [main] WARN Sisu - Error injecting: org.apache.maven.wagon.providers.http.HttpWagon
java.lang.NoClassDefFoundError: org/apache/http/ssl/TrustStrategy
at java.lang.ClassLoader.defineClass1(Native Method)
...
Caused by: java.lang.ClassNotFoundException: org.apache.http.ssl.TrustStrategy
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
... 72 more
Anybody got an idea why deploy works fine with version 4.3.x of org.apache.httpcomponents.httpclient and fails with version 4.4?
I suspect that the version of HttpCore, which HttpClient is based upon, still resolves to 4.3.x. Try explicitly setting it to 4.4:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.4</version>
</dependency>

error: package org.jclouds.logging.slf4j.config does not exist ( the class SLF4JLoggingModule )

The import specified in the example at http://www.jclouds.org/documentation/quickstart/openstack/ fails:
import org.jclouds.logging.slf4j.config.SLF4JLoggingModule;
...
Iterable<Module> modules = ImmutableSet.<Module> of(new SLF4JLoggingModule());
The dependencies were added following the instructions at http://www.jclouds.org/documentation/userguide/installation-guide/:
pom.xml:
...
<dependencies>
    <dependency>
        <groupId>org.jclouds</groupId>
        <artifactId>jclouds-allcompute</artifactId>
        <version>1.5.7</version>
    </dependency>
    <dependency>
        <groupId>org.jclouds</groupId>
        <artifactId>jclouds-allblobstore</artifactId>
        <version>1.5.7</version>
    </dependency>
</dependencies>
...
SOLUTION
1) Add the dependency for the jclouds-slf4j artifact (http://mvnrepository.com/artifact/org.jclouds.driver/jclouds-slf4j/1.5.4):
<dependency>
    <groupId>org.jclouds.driver</groupId>
    <artifactId>jclouds-slf4j</artifactId>
    <version>1.5.4</version>
</dependency>
2) Rebuild the project
The pom.xml may look like this:
...
<dependencies>
    <dependency>
        <groupId>org.jclouds</groupId>
        <artifactId>jclouds-allcompute</artifactId>
        <version>1.5.7</version>
    </dependency>
    <dependency>
        <groupId>org.jclouds</groupId>
        <artifactId>jclouds-allblobstore</artifactId>
        <version>1.5.7</version>
    </dependency>
    <dependency>
        <groupId>org.jclouds.driver</groupId>
        <artifactId>jclouds-slf4j</artifactId>
        <version>1.5.4</version>
    </dependency>
</dependencies>
...
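Once jclouds-slf4j is on the classpath, wiring the module in follows the quickstart pattern. A hedged sketch for the 1.5.x API; the provider, endpoint, and credentials are placeholders:

import org.jclouds.ContextBuilder;
import org.jclouds.compute.ComputeServiceContext;
import org.jclouds.logging.slf4j.config.SLF4JLoggingModule;

import com.google.common.collect.ImmutableSet;
import com.google.inject.Module;

public class JcloudsDemo {
    public static void main(String[] args) {
        Iterable<Module> modules = ImmutableSet.<Module>of(new SLF4JLoggingModule());
        // Route jclouds logging through SLF4J via the module added above
        ComputeServiceContext context = ContextBuilder.newBuilder("openstack-nova")
                .endpoint("http://keystone-host:5000/v2.0/")
                .credentials("tenant:user", "password")
                .modules(modules)
                .buildView(ComputeServiceContext.class);
        System.out.println(context.getComputeService().listNodes().size() + " nodes");
        context.close();
    }
}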
My bad. I wrote that doc.
The "Get jclouds" section on that page previously read:
Follow the instructions for Getting the binaries using Apache Ant.
but it only linked to the Installation guide at the top. It should have linked directly to the "Getting the binaries using Apache Ant" section, which includes the jclouds-slf4j driver.
I've added anchors to the Installation guide so you can link to individual sections now. Hope that helps clear it up.
