Error with Flink 0.10.1 - maven

With flink 0.10.1 in local I can't connect with jobmanager due the following error:
Association with remote system [akka.tcp://flink#127.0.0.1:49789] has failed, address is now gated for [5000] ms. Reason is: [scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = -114498752079829388].
And my pom.xml:
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>0.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients</artifactId>
<version>0.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-compatibility</artifactId>
<version>0.10.1-hadoop1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-jdbc</artifactId>
<version>0.10.1-hadoop1</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.6</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-ml</artifactId>
<version>0.10.1-hadoop1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<version>0.10.1</version>
</dependency>
</dependencies>
With flink 0.9.1 works fine ... What I'm missing? Thank you!

It sounds like a version miss match, ie, that you have old 0.9.1 binaries in your code base... Try to clean your maven cache via
cd ~/.m2/repositories/org/apache/flink
rm -rf *
Afterward, rebuild your project: mvn -DskipTests clean package

Ok, the problem was the following:
On the flink download page there are several links to Flink project(binaries and source)
I had downloaded the normal ones without hadoop because is for testing purposes in my local and I don't using hadoop.
But I dont't now why its need the hadoop ones with scala I have downloaded:
Hadoop 2.7.0 with scala 2.10 and it works.

Related

java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback

I'm seeing the following error :
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback
when the cluster.connect() is called :
String hosts = CassandraClientUtil.getHost();
String localDC = CassandraClientUtil.getLocalDC();
Cluster cluster = null;
if (StringUtils.isNotEmpty(localDC))
{
cluster = Cluster.builder().addContactPoints(hosts.split(","))
.withCredentials(CassandraCopsComponentLogger.USER_NAME, CassandraCopsComponentLogger.AUTH_CODE)
.withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
.withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().withLocalDc(localDC).build())).build();
}
else
{
cluster = Cluster.builder().addContactPoints(hosts.split(","))
.withCredentials(CassandraCopsComponentLogger.USER_NAME, CassandraCopsComponentLogger.AUTH_CODE)
.withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)).build();
}
Session session = cluster.connect();
CassandraCopsComponentLogger.mappingManager = new MappingManager(session);
The pom.xml has the following dependencies :
<dependencies>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>16.0.1</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.9</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty</artifactId>
<version>3.9.0.Final</version>
</dependency>
<dependency>
<groupId>com.codahale.metrics</groupId>
<artifactId>metrics-core</artifactId>
<version>3.0.2</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.5</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>2.1.9</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.3.1</version>
</dependency>
</dependencies>
I saw a post on stackoverflow here
where they recommended to upgrade the guava version to 16.0.1 but that did not help me solve my problem. Some directions from here will be really helpful as I'm new to cassandra. To add more background this thing works as a standalone project, when I include this project as a maven dependency to some other project it raises this runtime error.
com.google.common.util.concurrent.FutureFallback is deprecated in Guava 19.0 and removed since Guava 20.0.
Use Guava 19.0 and do not use Guava 20.0 or greater, until you upgrade the Cassandra driver.
I updated the Cassandra driver version to latest available and it should fix the issue.
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.5.0</version>
</dependency>
Don't add external guava version . whatever datastax-cassandra-core using only you can put that version . otherwise don't need of that .
If anyone like me didn't know, that there is a new version (4.x) out there with a new and different group id, take a look at the quickstart. This new version still uses Guava however it's shaded.
The driver now requires Java 8. It does not depend on Guava anymore (we still use it internally but it's shaded).
More information can be found in the upgrade guide.

Maven: 1.7.4 openejb dependancy fails

Wanted to make use of openejb on top of tomcat v7 using maven instead of installing tomee. Referring to Apache documentation, 3 dependencies have to be added to the maven project.
<dependency>
<groupId>org.apache.openejb</groupId>
<artifactId>javaee-api</artifactId>
<version>6.0-6</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.openejb</groupId>
<artifactId>openejb-core</artifactId>
<version>4.7.4</version>
</dependency>
<dependency>
<groupId>org.apache.openejb</groupId>
<artifactId>tomee</artifactId>
<version>1.7.4</version>
</dependency>
but the last depency generates following error: Missing artifact org.apache.openejb:tomee:jar:1.7.4
The correct artifactId being visible in:
http://mvnrepository.com/artifact/org.apache.openejb/apache-tomee/1.7.4
is apache-tomee and not tomee
so, replace the last depency by the following one and the problem will be solved:
<dependency>
<groupId>org.apache.openejb</groupId>
<artifactId>apache-tomee</artifactId>
<version>1.7.4</version>
</dependency>

Connect Hive through Java JDBC

There is a question here connect from java to Hive but mine is different
My hive running on machine1 and I need to pass some queries using Java server running at machine2. As I understand Hive has a JDBC interface for the purpose of receiving remote queries. I took the code from here - HiveServer2 Clients
I installed the dependencies written in the article:
hive-jdbc*.jar
hive-service*.jar
libfb303-0.9.0.jar
libthrift-0.9.0.jar
log4j-1.2.16.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
commons-logging-1.0.4.jar
However I got java.lang.NoClassDefFoundError error at compile time
Full Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:393)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:187)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at com.bidstalk.tools.RawLogsQuerySystem.HiveJdbcClient.main(HiveJdbcClient.java:25)
Another question at StackOverflow recommended to add Hadoop API dependencies in Maven - Hive Error
I don't understand why do I need hadoop API for a client to connect with Hive. Shouldn't JDBC driver be agnostic of the underlying query system? I just need to pass some SQL query?
Edit:
I am using Cloudera(5.3.1), I think I need to add CDH dependencies. Cloudera instance is running hadoop 2.5.0 and HiveServer2
But the servers are at machine 1. On machine the code should at least compile and I should have issues at runtime only!
In case if you didn't still solve this, I have given it a go.
And I needed the following dependencies for it to compile and run :
libthrift-0.9.0-cdh5-2.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
commons-logging-1.1.3.jar
hive-common.jar
slf4j-api-1.7.5.jar
hive-metastore.jar
hive-service.jar
hadoop-common.jar
hive-jdbc.jar
guava-11.0.2.jar
The hive documentation is probably written against a older version/distribution.
Your exception is due to the missing hadoop-common jar, which has the org.apache.hadoop.conf.Configuration.
Hope this helps.
Getting the same error when trying to use hive-jdbc 1.2.1 against hive 0.13.
Comparing to the long list in other answers. Now we use these two:
hive-jdbc-1.2.1-standalone.jar
hadoop-common-2.7.1.jar
Another side note: you might get 'Required field 'client_protocol' is unset!' when using the latest jdbc against an older Hive. If so, change the jdbc version to 1.1.0:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0</version>
<classifier>standalone</classifier>
</dependency>
Answering my own question!
With some hit and trial, I have added following dependencies on my pom file and since then I am able to run code on both CHD 5.3.1 and 5.2.1 cluster.
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>0.13.1-cdh5.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.5.0-mr1-cdh5.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.5.0-cdh5.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>0.13.1-cdh5.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.5.0-cdh5.3.1</version>
</dependency>
<dependency>
Please note that some of these dependencies might not be required
For others wondering around about what exactly is required to execute remotely a HIVE query using java...
Java code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
public class Runner
{
private static String driverName = "org.apache.hive.jdbc.HiveDriver";
public static void main(String[] args) throws SQLException {
try {
// Register driver and create driver instance
Class.forName(driverName);
} catch (ClassNotFoundException ex) {
ex.printStackTrace();
}
// get connection
System.out.println("before trying to connect");
Connection con = DriverManager.getConnection("jdbc:hive2://[HOST IP]:10000/", "hive", "");
System.out.println("connected");
// create statement
Statement stmt = con.createStatement();
// execute statement
stmt.executeQuery("show tables");
con.close();
}
}
Together with the pom file with the only required dependencies..
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>test-executor</groupId>
<artifactId>test-executor</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<hadoop.version>2.5.2</hadoop.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
</dependencies>
</project>
I have faced the same issue with CDH5.4.1 version. I updated my POM file with the below code and it worked for me.
My Hadoop Version is Hadoop 2.6.0-cdh5.4.1 and Hive version is Hive 1.1.0-cdh5.4.1
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>0.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>0.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.1.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
I have resolved with this POM update.
Seems like you are all working with cloudera, I found that the repo in maven looks old because if you go to their site, you can download their jdbc. https://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-20.html
The driver seems to support more functionality than the one in hive. I notice that that they have addBatch implemented. I just wish they had these libraries in maven. Maybe someone can find where to get them from using maven.

Maven embedded deploy not working with org.apache.httpcomponents.httpclient version 4.4

Within my application I have to (mvn) deploy artifacts programmatically. I do this with the help of the maven-embedder artifact and some really simple code:
MavenCli client = new MavenCli();
int result = client.doMain(new String[] { "deploy" }, "C:/some/path/to/my/pom", System.out, System.out);
To be able to do this, I had to add the following dependencies to my pom:
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-embedder</artifactId>
<version>3.2.5</version>
</dependency>
<dependency>
<groupId>org.eclipse.aether</groupId>
<artifactId>aether-connector-basic</artifactId>
<version>1.0.2.v20150114</version>
</dependency>
<dependency>
<groupId>org.eclipse.aether</groupId>
<artifactId>aether-transport-wagon</artifactId>
<version>1.0.2.v20150114</version>
</dependency>
<dependency>
<groupId>org.apache.maven.wagon</groupId>
<artifactId>wagon-http</artifactId>
<version>2.8</version>
</dependency>
<dependency>
<groupId>org.apache.maven.wagon</groupId>
<artifactId>wagon-provider-api</artifactId>
<version>2.8</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.3</version>
</dependency>
Problem is, when I change the version of httpclient to version 4.4 (the most recent one), I get the following error when trying to deploy:
4840 [main] WARN Sisu - Error injecting: org.apache.maven.wagon.providers.http.HttpWagon
java.lang.NoClassDefFoundError: org/apache/http/ssl/TrustStrategy
at java.lang.ClassLoader.defineClass1(Native Method)
...
Caused by: java.lang.ClassNotFoundException: org.apache.http.ssl.TrustStrategy
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
... 72 more
Anybody got an idea, why deploy works fine with version 4.3.x of org.apache.httpcomponents.httpclient and fails with version 4.4?
I suspect that the version of HttpCore? which HttpClient is based upon, still resolves to 4.3.x. Try explicitly setting it to 4.4
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.4</version>
</dependency>

pig-0.9.0.pom does not contain all its runtime dependencies, like pig-0.8.1-cdh3u1.pom

maven noob, be patient...
I'm upgrading from cdh3u1 to apache hadoop 0.20.203.0 and pig 0.9.0. I used to have:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>0.20.2-cdh3u1</version>
</dependency>
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<version>0.8.1-cdh3u1</version>
</dependency>
and running them from inside eclipse, with junit run configuration worked great.
Now I have:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>0.20.203.0</version>
</dependency>
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<version>0.9.0</version>
</dependency>
and I got NoClassDefFoundError: jline/ConsoleReaderInputStream on runtime.
I ended with adding all these dependencies manually until it worked:
<dependency>
<groupId>jline</groupId>
<artifactId>jline</artifactId>
<version>0.9.94</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr-runtime</artifactId>
<version> 3.2 </version> <- this is 3.0.1 in cdh3u1, but probably changed in pig 0.9.0
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>r06</version>
</dependency>
What gives? why isn't maven automatically pulling my dependencies and putting them in the classpath?
Maven has a feature called Transitive dependencies, so you don´t have to specify the libraries that your own dependencies require.
ConsoleReaderInputStream is in the Jline JAR. When you were using Pig.0.8.1-cdh3u1, you didn´t have to add the Jline dependency because it is declared in Pig.0.8.1-cdh3u1.pom. Pig 0.9.0.pom does not have Jline dependency declared anymore, that´s the reason you had to add it by yourself. As for the reason JLine was removed from Pig, you have to ask the developers of that project.

Resources