Interoperability of Pig 0.9.0 with cdh3u1?

Put differently: is a client newer than what CDH3 ships interoperable with an Apache Hadoop 0.20.xx server?
We have a Java app that runs a series of Pig scripts (it injects some variables into them, but generally it's just a driver/client for running them).
We need the macro features of Pig 0.9.0, but CDH3 ships Pig 0.8.1.
Is the following a good idea?
We try working with:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2-cdh3u1</version>
</dependency>
with
<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.9.0</version>
</dependency>
instead of
<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.8.1-cdh3u1</version>
</dependency>
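For what it's worth, a minimal sketch of that combination (not verified against cdh3u1; the exclusion is only needed if pig 0.9.0 turns out to pull in its own hadoop-core transitively):
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2-cdh3u1</version>
</dependency>
<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.9.0</version>
    <!-- hypothetical guard: keep any transitive Apache hadoop-core from shadowing the CDH client -->
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Whether the 0.9.0 client actually speaks the cdh3u1 wire protocol is the real question; an IPC version mismatch at job submission would surface it quickly.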

Related

How to use com.fasterxml.jackson 2.8.1 and 2.6.5 in the same module of a Maven project?

I have a module which uses Spark 2.1.0 and Presto 0.166.
Spark 2.1.0 requires com.fasterxml.jackson version 2.6.5, while Presto 0.166 strictly requires 2.8.1. How can I resolve this in the same pom.xml so that I can run them in the same module?
Simply specify the version of com.fasterxml.jackson in your pom file. The version declared there will override the versions requested transitively by Spark 2.1.0 and Presto 0.166:
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>com.facebook.presto</groupId>
    <artifactId>presto...</artifactId>
    <version>0.166</version>
</dependency>
Since Spark 2.1.0 can run with com.fasterxml.jackson 2.8.1, you won't need two different versions of it in your module.
Resources: Introduction to the Dependency Mechanism
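If you prefer to make the pin explicit rather than relying on a direct dependency winning, <dependencyManagement> achieves the same thing declaratively. A minimal sketch (the three core Jackson artifacts should normally be pinned together):
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.8.1</version>
        </dependency>
    </dependencies>
</dependencyManagement>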
You cannot use multiple versions of the same dependency in a single pom.xml. Exclude the com.fasterxml.jackson dependency from either Spark 2.1.0 or Presto 0.166, for example:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
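Either way, verify what actually lands on the classpath with mvn dependency:tree -Dincludes=com.fasterxml.jackson.core. Note that if you exclude jackson-core from Spark, a 2.8.1 declaration must still exist elsewhere in the pom (for example the direct dependency shown in the previous answer), or Spark will fail at runtime with a NoClassDefFoundError.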
Are you trying to write a plugin for Presto? If so, the Presto SPI explicitly depends on only jackson-annotations and not the implementation. There should be no problem with using the newer version of the annotations with an older version of Jackson within your plugin. The version of Jackson used by the Presto engine can and will be different from the one used by your plugin as plugins are loaded in a separate class loader.
The Presto plugin system is designed to have very minimal dependencies and allow you to use whatever versions of libraries you want (as that is often necessary when writing a connector to a random system that uses older versions of libraries).

Hadoop/HBase: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

I have Hadoop and HBase installed, and both work fine as far as I can tell. When trying to run the built jar with hadoop, I get a
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
error, using HBase version 0.90.2 in my Maven dependency.
I think this is quite an old version of HBase, and I am unsure whether it is compatible with Hadoop 2.7.2 or even Java 8. So I tried HBase version 0.99.2 in my Maven dependency, but then I get a
Failed to execute goal on project exercise_2: Could not resolve dependencies for project com.company.exercise_2:exercise_2:jar:1.0-SNAPSHOT: Failure to find org.apache.hbase:hbase:jar:0.99.2 in http://repo.maven.apache.org/maven2 was cached in the local repository
error from the Maven plugin. What am I doing wrong?
Here is my pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.99.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
    <scope>provided</scope>
</dependency>
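A side note that may explain both errors: starting around HBase 0.96, the monolithic org.apache.hbase:hbase artifact was split into modules, which is why org.apache.hbase:hbase:jar:0.99.2 cannot be resolved at all. HBaseConfiguration now lives in hbase-common, which hbase-client pulls in transitively. A minimal sketch of the HBase side of the pom, assuming HBase 1.1.2 on the cluster:
<!-- hbase-client brings in hbase-common (home of HBaseConfiguration) transitively -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
</dependency>
With scope provided, the jars must be on the classpath at run time (e.g., via HADOOP_CLASSPATH=$(hbase classpath)); otherwise drop the provided scope or build a fat jar.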
This looks like a jar caching issue. HBaseConfiguration is a common class regardless of which HBase version is used.
Manually delete the HBase files from your local repository and run the mvn command again.
Maven will then re-download the jars and fix the classpath.
For cross-checking, run mvn with the -X option to see which version of the jar it is trying to download.
Since the scope of this jar is
provided
also cross-check the HBase version on your cluster (via "hbase classpath") and make sure it closely matches the jar version your pom.xml resolves into the local Maven repository.
That should fix it.
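For example, assuming the default local repository location, that cleanup amounts to removing ~/.m2/repository/org/apache/hbase and rebuilding with mvn clean package -X; the -X (debug) output shows exactly which hbase jars get resolved and from where.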

Error with Flink 0.10.1

With Flink 0.10.1 running locally I can't connect to the jobmanager, due to the following error:
Association with remote system [akka.tcp://flink@127.0.0.1:49789] has failed, address is now gated for [5000] ms. Reason is: [scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = -114498752079829388].
And my pom.xml:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>0.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>0.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility</artifactId>
    <version>0.10.1-hadoop1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-jdbc</artifactId>
    <version>0.10.1-hadoop1</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-ml</artifactId>
    <version>0.10.1-hadoop1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <version>0.10.1</version>
</dependency>
</dependencies>
With Flink 0.9.1 it works fine... What am I missing? Thank you!
It sounds like a version mismatch, i.e., you still have old 0.9.1 binaries in your code base... Try cleaning your Maven cache:
cd ~/.m2/repository/org/apache/flink
rm -rf *
Afterward, rebuild your project: mvn -DskipTests clean package
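One more thing worth checking before blaming the cache: the pom above mixes plain 0.10.1 artifacts (flink-java, flink-clients, flink-streaming-java) with 0.10.1-hadoop1 ones (flink-hadoop-compatibility, flink-jdbc, flink-ml), and mixing the two builds on one classpath can produce exactly this kind of serialVersionUID mismatch. A sketch that keeps everything in lockstep via a single property (verify on Maven Central which suffix actually exists for each module you use):
<properties>
    <flink.version>0.10.1</flink.version>
</properties>
...
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
</dependency>
<!-- repeat for flink-streaming-java, flink-hadoop-compatibility, flink-jdbc, flink-ml -->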
OK, the problem was the following:
On the Flink download page there are several links to the Flink binaries and source.
I had downloaded the ones without Hadoop, because this is for testing purposes on my local machine and I am not using Hadoop.
I don't know why it needs the Hadoop build, but I downloaded the one for Hadoop 2.7.0 with Scala 2.10 and it works.

maven artifactId hadoop 2.2.0 for hadoop-core

I am migrating my application from Hadoop 1.0.3 to Hadoop 2.2.0, and my Maven build had hadoop-core marked as a dependency. Since hadoop-core is not present for Hadoop 2.2.0, I tried replacing it with hadoop-client and hadoop-common, but I am still getting this error for ant.filter. Can anybody please suggest which artifact to use?
Previous config:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
</dependency>
New config:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
Error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project event: Compilation failure: Compilation failure:
[ERROR] /opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java:[27,36] package org.apache.tools.ant.filters does not exist
[ERROR] /opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java:[27,36] package org.apache.tools.ant.filters does not exist
[ERROR] /opt/teamcity/buildAgent/work/c670ebea1992ec2f/event/src/main/java/com/intel/event/EventContext.java:[180,59] cannot find symbol
[ERROR] symbol: class StringInputStream
[ERROR] location: class com.intel.event.EventContext
We mainly depend on the HDFS API for our application. When we migrated to Hadoop 2.X, we were surprised by the changes in dependencies, so we started adding them one at a time. Today we depend on the following core libraries:
hadoop-annotations-2.2.0
hadoop-auth-2.2.0
hadoop-common-2.2.0
hadoop-hdfs-2.2.0
hadoop-mapreduce-client-core-2.2.0
In addition to these we depend on test libraries too. Based on your needs, you may want to include hadoop-hdfs and hadoop-mapreduce-client-core alongside hadoop-common, as in the sketch below.
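As a rough sketch, that set in pom form, with a property keeping the versions in lockstep:
<properties>
    <hadoop.version>2.2.0</hadoop.version>
</properties>
...
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<!-- hadoop-annotations and hadoop-auth normally arrive transitively via hadoop-common -->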
Try these artifacts; they worked fine on my sample wordcount project:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
The Maven dependencies can be found at this link.
As far as the hadoop-core dependency goes, hadoop-core was the artifact name for Hadoop 1.X, and simply changing the version to 2.X won't help. Also, using the Hadoop 1.X dependency in a Hadoop 2.X project gives an error like
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
Thus it is suggested not to use it. I have been using the following dependencies in my Hadoop projects:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.7.1</version>
</dependency>
You can try these.

pig-0.9.0.pom does not contain all its runtime dependencies, like pig-0.8.1-cdh3u1.pom

Maven noob here, be patient...
I'm upgrading from cdh3u1 to Apache Hadoop 0.20.203.0 and Pig 0.9.0. I used to have:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2-cdh3u1</version>
</dependency>
<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.8.1-cdh3u1</version>
</dependency>
and running the scripts from inside Eclipse with a JUnit run configuration worked great.
Now I have:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.203.0</version>
</dependency>
<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.9.0</version>
</dependency>
and at runtime I got NoClassDefFoundError: jline/ConsoleReaderInputStream.
I ended up adding all these dependencies manually until it worked:
<dependency>
    <groupId>jline</groupId>
    <artifactId>jline</artifactId>
    <version>0.9.94</version>
</dependency>
<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr-runtime</artifactId>
    <version>3.2</version> <!-- this is 3.0.1 in cdh3u1, but probably changed in pig 0.9.0 -->
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>r06</version>
</dependency>
What gives? Why isn't Maven automatically pulling these dependencies and putting them on the classpath?
Maven has a feature called transitive dependencies, so you don't have to specify the libraries that your own dependencies require.
ConsoleReaderInputStream is in the JLine JAR. When you were using Pig 0.8.1-cdh3u1, you didn't have to add the JLine dependency because it is declared in pig-0.8.1-cdh3u1.pom. pig-0.9.0.pom no longer declares the JLine dependency; that's why you had to add it yourself. As for why JLine was removed from Pig's pom, you would have to ask that project's developers.
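A quick way to see what any given artifact declares is mvn dependency:tree in a project that uses it, or simply opening the pom in your local repository (for example ~/.m2/repository/org/apache/pig/pig/0.9.0/pig-0.9.0.pom).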
