Hadoop Installation - Directory structure - /etc/hadoop

I downloaded Hadoop 2.7, and every installation guide I have found mentions an /etc/hadoop/.. directory, but the distribution I downloaded doesn't have this directory.
I tried Hadoop 2.6 as well, and it doesn't have this directory either.
Should I create these directories?
Caveat: I am a complete newbie!
Thanks in advance.

It seems you have downloaded the source. Build the Hadoop source and you will get that folder.
To build the Hadoop source, refer to the BUILDING.txt file included in the Hadoop package you downloaded.
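For reference, BUILDING.txt produces the binary distribution with a Maven invocation along these lines (a sketch; the exact profiles vary by version, and a full native build needs extra toolchain dependencies):
mvn package -Pdist -DskipTests -Dtar
The built distribution, including the etc/hadoop directory, should then appear under hadoop-dist/target/.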

Try downloading hadoop-2.6.0.tar.gz instead of hadoop-2.6.0-src.tar.gz from the Hadoop Archive. As @Kumar mentioned, you probably have the source distribution.
If you don't want to compile Hadoop from source, download hadoop-2.6.0.tar.gz from the link given above.
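A quick way to confirm which tarball you have is to list its contents before extracting (a sketch, assuming the archive is in the current directory):
tar -tzf hadoop-2.6.0.tar.gz | grep 'etc/hadoop' | head
The binary distribution contains entries under etc/hadoop; if nothing matches, you are holding the source distribution.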

Try downloading the compiled Hadoop from the link below instead of the source:
http://a.mbbsindia.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
You will get the etc/hadoop/.. path in it.

Or download it from the Apache website:
Apache Hadoop link

Related

Where is the bin directory in Hadoop project dir?

I have been following the official documentation as well as DigitalOcean's tutorial, but I could not follow their lead. Both suggest editing a file and then running Hadoop:
etc/hadoop/hadoop-env.sh in the dist directory.
I am unable to find such a file anywhere in the extracted directory of the latest stable release, nor in the 2.7.7 release.
Where is etc/hadoop/hadoop-env.sh?
The paths in a guide may or may not match the actual packaging of the bundle. Run find over what you extracted from the tar and see where the file actually lives.
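For example (run from the directory where you extracted the tarball):
find . -name hadoop-env.sh
In the binary distributions the file normally sits at etc/hadoop/hadoop-env.sh relative to the extraction root; if find turns up nothing like that, you most likely extracted the source distribution rather than the binary one.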

Building spark without any hadoop dependencies

I found some references to the -Phadoop-provided flag for building Spark without Hadoop libraries, but I cannot find a good example of how to use it. How can I build Spark from source and make sure it does not add any of its own Hadoop dependencies? It looks like when I built the latest Spark, it included a bunch of Hadoop 2.8.x artifacts that conflict with my cluster's Hadoop version.
Spark offers downloads that are "pre-built with user-provided Hadoop"; these are correspondingly named spark-VERSION-bin-without-hadoop.tgz.
If you would really like to build it yourself, run this from the project root:
./build/mvn -Phadoop-provided -DskipTests clean package
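Note that a Hadoop-free build still has to find your cluster's Hadoop classes at runtime. Per Spark's documentation for "Hadoop free" builds, you point Spark at the output of hadoop classpath, for example in conf/spark-env.sh:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)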

can't get hadoop to see snappy

I'm on RHEL 7, 64-bit. I managed to build the Hadoop 2.4.1 distribution from source, apparently successfully. Before that, I built Snappy from source and installed it. Then I built the Hadoop dist with
mvn clean install -Pdist,native,src -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.snappy
Yet when I look at $HADOOP_HOME/lib/native I see the hdfs and hadoop libs but no snappy. So when I run hadoop checknative it says that I don't have snappy installed. Furthermore, I downloaded hadoop-snappy and compiled /that/, and it generated the snappy libs. I copied those over to $HADOOP_HOME/lib/native /and/ to $HADOOP_HOME/lib just for extra measure. STILL, hadoop checknative doesn't see it!
Found the non-obvious solution in an obscure place: http://lucene.472066.n3.nabble.com/Issue-with-loading-the-Snappy-Codec-td3910039.html
I needed to add -Dcompile.native=true. This was not highlighted in the Apache build doc, nor in any build guide I've come across!
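Putting the question's original command together with that flag, the full invocation that reportedly works looks like this (a sketch; only the last flag is new):
mvn clean install -Pdist,native,src -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.snappy -Dcompile.native=true
You can then verify the result with:
hadoop checknative -a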

Hadoop Installation: No such file while run hadoop format

I've checked the answers on Stack Overflow; no solutions work for my case.
Command:
bin/hadoop namenode -format
Error Message:
/bin/java: No such file or directory1.7.0_09/
/bin/java: No such file or directory1.7.0_09/
/bin/java: cannot execute: No such file or directory
Relevant change in hadoop-env.sh:
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/jdk1.7.0_09/
I created the soft link with
ln -s "c:\Program Files\java\jdk1.7.0_09" /usr/local/jdk1.7.0_09
Java HOME:
C:\Program Files\Java\jdk1.7.0_09
Path:
C:\cygwin64\bin;C:\cygwin64\usr\sbin
If anyone has clues, please feel free to point them out. Thanks.
@xhudik @s.singh Finally! There is a problem when modifying hadoop-env.sh on Windows. I've fixed the problem with the dos2unix command, which eliminates the DOS-style line endings.
If the dos2unix command can't be found in Cygwin, re-run the Cygwin setup and update it.
Please follow the link here:
https://superuser.com/questions/612435/cygwin-dos2unix-command-not-found
The command is
dos2unix hadoop-env.sh
Then everything is all set. I hope my experience helps others.
Thanks to s.singh and xhudik for their help.
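(If you want to confirm that line endings are the culprit before converting, the standard file utility reports them; this is a generic check, not Hadoop-specific:
file hadoop-env.sh
A file with DOS-style endings is reported as something like "ASCII text, with CRLF line terminators".)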
There is no java there. Are you sure that your Java binaries (./java, ./javac, ...) are in the specified directories? Maybe the ln symlink is the problem. Java also doesn't like spaces in directory names (c:\program files)...
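One way to sidestep both the quoting and the space problem under Cygwin (a sketch; /cygdrive/c is Cygwin's mount point for the C: drive) is to create the link from the Cygwin-style path:
ln -sf "/cygdrive/c/Program Files/Java/jdk1.7.0_09" /usr/local/jdk1.7.0_09
Then check that the link resolves with ls /usr/local/jdk1.7.0_09/bin/java.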
You need to place the Java distribution correctly and then define the JAVA_HOME variable. You can test it with:
$JAVA_HOME/bin/java -version
Set your Java home like this in hadoop-env.sh:
JAVA_HOME=C:/Program Files/java/jdk1.7.0_09
You also need to add Java to your PATH environment variable.
If you still get the issue, please let us know.
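For reference, the relevant lines in hadoop-env.sh would look something like this (a sketch assuming the symlinked JDK path from the question; the quotes matter if the path contains spaces):
export JAVA_HOME="/usr/local/jdk1.7.0_09"
export PATH="$JAVA_HOME/bin:$PATH"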
For learning and best practice on Hadoop, try the Cloudera or Hortonworks distribution of Hadoop; you can download their Windows versions. Please check these links:
Hortonworks
Cloudera
Or you can use IBM SmartCloud Enterprise. IBM gives students free access for learning.

Build a Hadoop Eclipse Library from CDH4 jar files

I am trying to build a Hadoop library of all the jar files that I need to build a map/reduce job in Eclipse.
Which .jar files do I need, and from which folders of the single-node install of CDH4 on Ubuntu?
Assuming you've downloaded the CDH4 tarball distro from https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs:
Unpack the tarball.
Locate the build.properties file in the unpacked directory:
hadoop-2.0.0-cdh4.0.0/src/hadoop-mapreduce-project/src/contrib/eclipse-plugin
Add a property to this file for your Eclipse installation directory:
eclipse.home=/opt/eclipse/jee-indigo-SR2
Finally, run ant from the hadoop-2.0.0-cdh4.0.0/src/hadoop-mapreduce-project directory to build the jar.
You'll now have a jar in the hadoop-2.0.0-cdh4.0.0/src/hadoop-mapreduce-project/build/contrib/eclipse-plugin/ folder.
To finally answer your question, the dependency jars are now in:
hadoop-2.0.0-cdh4.0.0/src/hadoop-mapreduce-project/build/contrib/eclipse-plugin/
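Condensed into shell form, the whole sequence looks roughly like this (a sketch of the steps above; the tarball name and the eclipse.home value are the examples used in this answer):
tar -xzf hadoop-2.0.0-cdh4.0.0.tar.gz
cd hadoop-2.0.0-cdh4.0.0/src/hadoop-mapreduce-project
echo "eclipse.home=/opt/eclipse/jee-indigo-SR2" >> src/contrib/eclipse-plugin/build.properties
ant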
And, to be really verbose, if you want the full list, see this pastebin.
