Is it possible to build Apache Spark against Hadoop 2.5.1 - maven

After compiling Hadoop 2.5.1 with Maven, hadoop version reports:
Hadoop 2.5.1
I then tried to compile Apache Spark using the following command:
mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.1 -Pdeb -DskipTests clean package
But apparently there is no hadoop-2.5 profile.
My question is: what should I do?
Rebuild Hadoop 2.4,
compile Spark with the hadoop-2.4 profile,
or is there some other solution?

It looks like this was answered after the poster asked on the Spark user mailing list:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-1-0-with-Hadoop-2-5-0-td15827.html
"The hadoop-2.4 profile is really intended to be "Hadoop 2.4+". It
should compile and run fine with Hadoop 2.5 as far as I know. CDH 5.2
is Hadoop 2.5 + Spark 1.1, so there is evidence it works."
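In other words, the build command from the question should work with just the profile name swapped, everything else unchanged:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.1 -Pdeb -DskipTests clean package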

Just changing the profile name to hadoop-2.4 worked for me.
Thanks for the answers.
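As an aside, rather than guessing at profile names, you can ask Maven which profiles a Spark checkout actually defines (run from the Spark source root):
mvn help:all-profiles
For Spark 1.1 the list includes hadoop-2.4 but, as noted above, no hadoop-2.5.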

Related

What is the difference between the two downloadable versions of Giraph 1.2: giraph-dist-1.2.0-hadoop2-bin.tar.gz and giraph-dist-1.2.0-bin.tar.gz?

What is the difference between
giraph-dist-1.2.0-hadoop2-bin.tar.gz and giraph-dist-1.2.0-bin.tar.gz?
Is there any documentation about that?
The only documentation that I found is the following one:
Apache Hadoop 2 (latest version: 2.5.1)
This is the latest version of Hadoop 2 (supporting YARN in addition
to MapReduce) Giraph could use. You may tell maven to use this version
with "mvn -Phadoop_2 ".

Building Spark without any Hadoop dependencies

I found some references to the -Phadoop-provided flag for building Spark without Hadoop libraries, but I cannot find a good example of how to use it. How can I build Spark from source and make sure it does not add any of its own Hadoop dependencies? It looks like when I built the latest Spark it included a bunch of Hadoop 2.8.x artifacts, which conflict with my cluster's Hadoop version.
Spark has download options for "pre-built with user-provided Hadoop", which are correspondingly named spark-VERSION-bin-without-hadoop.tgz.
If you would really like to build it yourself, then run this from the project root:
./build/mvn -Phadoop-provided -DskipTests clean package
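Note that a "without Hadoop" build still has to find your cluster's Hadoop jars at runtime. Per Spark's "Hadoop free" build docs, you point it at them in conf/spark-env.sh, typically:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)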

Which Hadoop to use for Mahout 0.9

I'm using Mahout Cookbook, which shows examples for Mahout 0.8 and uses Hadoop 0.23.5.
I'm new to the whole system, so I would like to know which Hadoop version to use when running Mahout 0.9?
Thanks
When pulling Mahout 0.9 from Maven, it includes hadoop-core version 1.2.1. Mahout 0.9 does not work with Hadoop 2, according to this. That is resolved in the latest master branch on GitHub, but it requires you to recompile Mahout from source against the Hadoop 2 libraries. Mahout 1.0 should support Hadoop 2.x versions.
If you choose to run Mahout 0.9 with Hadoop 2, you can follow these steps to make it work:
git clone https://github.com/apache/mahout.git
In the Mahout folder, type:
mvn -Dhadoop2.version=2.2.0 -DskipTests clean install
mvn -Dhadoop2.version=2.2.0 clean package
And below is a usage example for recommenditembased:
bin/mahout recommenditembased --input input/input.txt --output output --usersFile input/users.txt --similarityClassname SIMILARITY_COOCCURRENCE
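For reference, input.txt here is the usual Mahout preference format, one userID,itemID,value triple per line (the IDs below are made up for illustration), and users.txt is just a list of the user IDs you want recommendations for, one per line:
1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.0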
Edit: original source is http://mahout.apache.org/developers/buildingmahout.html
This version of Mahout also runs with the Hadoop 0.2 core jar.
I am using it on a Windows machine, since from 0.2 onwards Hadoop throws a permission exception on Windows systems.

Can't get Hadoop to see Snappy

I'm on RHEL 7, 64-bit. I managed to (apparently) build the Hadoop 2.4.1 distribution from source. Before that, I built Snappy from source and installed it. Then I built the Hadoop distribution with
mvn clean install -Pdist,native,src -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.snappy
Yet when I look at $HADOOP_HOME/lib/native I see the HDFS and Hadoop libs, but not Snappy. So when I run hadoop checknative it says that I don't have Snappy installed. Furthermore, I downloaded hadoop-snappy and compiled that, and it generated the Snappy libs. I copied those over to $HADOOP_HOME/lib/native and to $HADOOP_HOME/lib for good measure. Still, hadoop checknative doesn't see it!
I found the non-obvious solution in an obscure place: http://lucene.472066.n3.nabble.com/Issue-with-loading-the-Snappy-Codec-td3910039.html
I needed to add -Dcompile.native=true. This was not highlighted in the Apache build doc, nor in any build guide I've come across!
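So the full build invocation becomes (the original command plus the missing flag):
mvn clean install -Pdist,native,src -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.snappy -Dcompile.native=true
After that, hadoop checknative should report snappy: true.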

Sqoop Installation with hadoop 2.2.0?

I am trying to install all the Apache Hadoop components on my system. I have installed hadoop-2.2.0, hive-0.11.0, pig-0.12.0, and hbase-0.96.0; now it is time to install Sqoop. Please suggest installation steps for a Sqoop version that is compatible with hadoop-2.2.0 and HBase.
Hoping for a reply soon; thanks in advance.
@Naveen: The link that you have provided is for Sqoop2. It is not specifically for the Hadoop 2.0 branch. Basically, it tries to improve Sqoop by changing the design to a client-server model (its major promises are ease of use, ease of extension, and security). For more details, see this video on Sqoop2: https://www.youtube.com/watch?v=hg683-GOWP4
We can use the latest Sqoop (version 1.4.4, which ships a library compiled for Hadoop 2.0, or 1.4.5) from the ASF. Just download the build of Sqoop for the Hadoop 2.0 branch. For example, sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz can be downloaded and used without any issue with Hadoop 2.0+ versions.
If you cannot find a Sqoop build for Hadoop 2.0+ on the ASF site (I assume you are using a version earlier than 1.4.4), you would have to recompile the Sqoop source code for the Hadoop 2.0 branch. But that is not necessary, since you can just use the latest Sqoop version, which supports Hadoop 2.0. (Hopefully you are not looking for a production-ready Sqoop for Hadoop 2.0, since the recent Sqoop builds for Hadoop 2 are still based on alpha releases!)
I haven't tried Sqoop2 yet. It should also help, with its new enhancements, across Hadoop versions 1.0 and 2.0.
Thank you
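For the Sqoop 1.4.x route, the install is basically untar-and-set-environment. A minimal sketch, where all paths are illustrative assumptions rather than anything from the original post:
tar -xzf sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local
export SQOOP_HOME=/usr/local/sqoop-1.4.5.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin
export HADOOP_COMMON_HOME=/usr/local/hadoop    # example path to the hadoop-2.2.0 install
export HADOOP_MAPRED_HOME=/usr/local/hadoop
sqoop version    # sanity check: should print the Sqoop version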
Try these steps for installing Sqoop with hadoop 2.2.0:
https://sqoop.apache.org/docs/1.99.1/Installation.html
