Install Oozie on Hadoop 2.2

I need some guidance on installing Oozie on Hadoop 2.2. The Quick Start docs page indicates that
IMPORTANT: By default it builds against Hadoop 1.1.1. It's possible to
build against Hadoop 2.x versions as well, but it is strongly
recommended to use a Bigtop distribution if using Hadoop 2.x, because
the Oozie sharelibs built from the tarball distribution will not work
with it.
I haven't been able to get Bigtop to work.
I tried following some guidance from here but it only tells me to edit the pom.xml files, not what to edit in them.
I have pig and maven installed.
Thanks in advance

This is a problem with the releases resolving shared libraries through Maven, and it has since been fixed in git master. I had this problem myself, so hopefully this solution will work for the Oozie version you are building from.
The advice here is useful. As in the blog post you linked, the following grep command will list the offending files:
$ grep -l "2.2.0-SNAPSHOT" `find . -name "pom.xml"`
./hadooplibs/hadoop-2/pom.xml
./hadooplibs/hadoop-distcp-2/pom.xml
./hadooplibs/hadoop-test-2/pom.xml
./pom.xml
Any mention of 2.2.0-SNAPSHOT in these files should be replaced with 2.2.0.
You can strip the -SNAPSHOT suffix in one pass with the following command:
$ grep -l "2.2.0-SNAPSHOT" `find . -name "pom.xml"` | xargs sed -i 's|2.2.0-SNAPSHOT|2.2.0|g'
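If you want to see what that pipeline does before pointing it at the Oozie tree, the same grep-plus-sed combination can be tried on a throwaway pom.xml (the file and directory below are made-up stand-ins, not part of the real source tree):

```shell
# Create a throwaway pom.xml containing the offending SNAPSHOT version
mkdir -p /tmp/oozie-pom-demo
cat > /tmp/oozie-pom-demo/pom.xml <<'EOF'
<project>
  <hadoop.version>2.2.0-SNAPSHOT</hadoop.version>
</project>
EOF

# Same pipeline as above, scoped to the demo directory
# (GNU sed; on BSD/macOS use: sed -i '' ...)
grep -l "2.2.0-SNAPSHOT" $(find /tmp/oozie-pom-demo -name "pom.xml") \
  | xargs sed -i 's|2.2.0-SNAPSHOT|2.2.0|g'

# Verify that no SNAPSHOT references remain
grep "hadoop.version" /tmp/oozie-pom-demo/pom.xml
```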
UPDATE: If you don't have the Hadoop JARs left over from building Hadoop itself, you will need to add the option -DincludeHadoopJars
And then build the package:
$ mvn clean package assembly:single -Dhadoop.version=2.2.0 -DskipTests
Or if you're using JDK7 and/or targeting Java 7 (as I did):
$ mvn clean package assembly:single -Dhadoop.version=2.2.0 -DjavaVersion=1.7 -DtargetJavaVersion=1.7 -DskipTests
Documentation on building Oozie (version 4 docs) is available here.
The above worked for building release-4.0.0 with Hadoop 2.2 and JDK 7.
The distro can then be found in distro/target.

Related

How do I turn Keycloak old version 4.1.0 into 'Standalone server distribution'?

I tried a few things (readme.md, blog posts, etc.), but I could not turn it into a 'standalone server distribution'.
Let me give an example of what I mean:
The following are the files of a 'standalone server distribution'. This is ready to run.
Picture-1
I need to run an old version of Keycloak (4.1.0). Its package looks like this:
Picture-2
As Picture-2 shows, this package is not ready to run.
How can I make it ready to run, like in Picture-1?
I need your suggestions. Can you help me?
Greetings,
That's the source code.
You have to build it by executing the following command from parent directory (you need Java JDK and Maven installed and configured):
mvn -Pdistribution -pl distribution/server-dist -am -Dmaven.test.skip clean install
The resulting release distribution will be in the ./distribution/server-dist/target/keycloak-4.1.0.Final.zip archive.
Compiling the sources is described here: https://github.com/keycloak/keycloak/blob/master/docs/building.md
You can download the latest release version 4.X from archive: https://www.keycloak.org/archive/downloads-4.8.3.html

What is the difference between two downloadable versions of Giraph 1.2: giraph-dist-1.2.0-hadoop2-bin.tar.gz and giraph-dist-1.2.0-bin.tar.gz?

What is the difference between
giraph-dist-1.2.0-hadoop2-bin.tar.gz and giraph-dist-1.2.0-bin.tar.gz?
Is there any documentation about that?
The only documentation that I found is the following one:
Apache Hadoop 2 (latest version: 2.5.1)
This is the latest version of Hadoop 2 (supporting YARN in addition
to MapReduce) Giraph could use. You may tell maven to use this version
with "mvn -Phadoop_2 ".

Building spark without any hadoop dependencies

I found some references to the -Phadoop-provided flag for building Spark without Hadoop libraries, but I cannot find a good example of how to use it. How can I build Spark from source and make sure it does not add any of its own Hadoop dependencies? It looks like when I built the latest Spark, it included a bunch of Hadoop 2.8.x artifacts that conflict with my cluster's Hadoop version.
Spark has download options for "pre-built with user-provided Hadoop", which are accordingly named spark-VERSION-bin-without-hadoop.tgz.
If you would really like to build it yourself, run this from the project root:
./build/mvn -Phadoop-provided -DskipTests clean package
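Note that a "Hadoop free" build expects you to supply the Hadoop classes yourself at runtime. Per Spark's documentation for Hadoop-free builds, you point SPARK_DIST_CLASSPATH at your cluster's own Hadoop, typically in conf/spark-env.sh:

```shell
# conf/spark-env.sh - make the user-provided Hadoop jars visible to the
# "without hadoop" Spark build (requires the hadoop CLI on your PATH)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```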

How do I install the hadoop-examples* and hadoop-test* jars in HDP 2.2?

How do I install the hadoop-examples* and hadoop-test* jars on Hortonworks Data Platform 2.2? The jars do not exist on any of the servers. Is there another package that I need to install?
I found a reference that says they should be located at /usr/share/hadoop, but that directory does not exist on any of the nodes in my cluster.
Most things moved under /usr/hdp in HDP 2.2, so these are probably what you are looking for:
[hdpdemo@hdp-demo-mas5 hdp]$ pwd
/usr/hdp
[hdpdemo@hdp-demo-mas5 hdp]$ ls
2.2.0.0-2041 current
[hdpdemo@hdp-demo-mas5 hdp]$ find ./2.2.0.0-2041 -name "hadoop*examples*"
./2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
./2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples.jar
./2.2.0.0-2041/knox/samples/hadoop-examples.jar
[hdpdemo@hdp-demo-mas5 hdp]$ find ./2.2.0.0-2041 -name "hadoop*test*"
./2.2.0.0-2041/hadoop/hadoop-common-tests.jar
./2.2.0.0-2041/hadoop/hadoop-common-2.6.0.2.2.0.0-2041-tests.jar
./2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.6.0.2.2.0.0-2041-tests.jar
./2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
./2.2.0.0-2041/hadoop-yarn/hadoop-yarn-server-tests.jar
./2.2.0.0-2041/hadoop-yarn/hadoop-yarn-server-tests-2.6.0.2.2.0.0-2041.jar
./2.2.0.0-2041/hadoop-hdfs/hadoop-hdfs-tests.jar
./2.2.0.0-2041/hadoop-hdfs/hadoop-hdfs-2.6.0.2.2.0.0-2041-tests.jar
[hdpdemo@hdp-demo-mas5 hdp]$
You need to install the Hadoop packages. The Hadoop libraries can be installed in two ways: via RPM or from a tarball.
RPM (binary; requires sudo access)
Step 1: Configure the remote repository.
Step 2: Install the Hadoop libraries - refer to this link.
Tarball (source + binary; no sudo access needed, just extract)
Get the tarball from the link, extract it, and use it.
Once the installation via RPM is done, you can use either the locate command or the find command to locate these jars. In the case of the tarball, you can find the Hadoop libraries within the extracted directory.
locate hadoop-test.jar hadoop-examples.jar
find /usr -iname hadoop-test.jar
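To see how those lookups behave, here is a self-contained sketch against a mock directory tree (the paths and jar names below are invented for the demo, not real HDP locations):

```shell
# Build a tiny mock layout with jars named like the HDP ones
mkdir -p /tmp/hdp-demo/hadoop-mapreduce
touch /tmp/hdp-demo/hadoop-mapreduce/hadoop-mapreduce-examples.jar
touch /tmp/hdp-demo/hadoop-mapreduce/hadoop-test.jar

# Case-insensitive name search, as in the commands above
find /tmp/hdp-demo -iname "hadoop*examples*"
find /tmp/hdp-demo -iname "hadoop-test.jar"
```

Unlike find, locate reads from a prebuilt index, so freshly installed jars only show up after the index is refreshed (usually via updatedb).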

Which Hadoop to use for Mahout 0.9

I'm using Mahout Cookbook, which shows examples for Mahout 0.8 and uses Hadoop 0.23.5.
I'm new to the whole system, so I would like to know which Hadoop version to use when running Mahout 0.9?
Thanks
When you pull Mahout 0.9 from Maven, it includes hadoop-core version 1.2.1. Mahout 0.9 does not work with Hadoop 2, according to this. It is resolved in the latest master branch on GitHub, but that requires you to recompile Mahout from source and include the Hadoop 2 libraries. Mahout 1.0 should support Hadoop 2.x versions.
If you choose to run Mahout 0.9 with Hadoop 2, you can follow these steps to make it work:
git clone https://github.com/apache/mahout.git
In the Mahout folder, type:
mvn -Dhadoop2.version=2.2.0 -DskipTests clean install
mvn -Dhadoop2.version=2.2.0 clean package
And below is a usage example for recommenditembased:
bin/mahout recommenditembased --input input/input.txt --output output --usersFile input/users.txt --similarityClassname SIMILARITY_COOCCURRENCE
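For reference, recommenditembased expects its --input file to contain comma-separated userID,itemID[,preference] lines; the file below is a minimal made-up example (IDs and ratings are invented):

```shell
# Create a tiny sample ratings file in the userID,itemID,preference format
mkdir -p /tmp/mahout-demo
cat > /tmp/mahout-demo/input.txt <<'EOF'
1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.0
EOF
cat /tmp/mahout-demo/input.txt
```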
Edit: original source is http://mahout.apache.org/developers/buildingmahout.html
This version of Mahout also runs with the Hadoop 0.20 core jar.
I am using it on a Windows machine, since from 0.20 onwards Hadoop throws permission exceptions on Windows systems.
