Hadoop - install process for /usr/libexec etc - macos

I'm trying to compile/install/run Hadoop as a single node cluster on a Mac OS X 10.7.5.
I've downloaded the hadoop-2.2.0-src, and am able to compile all modules with
mvn install
The install is successful, and the tests check out too.
When trying to run Hadoop (specifically, hdfs namenode -format to start off with), I see a requirement for Hadoop components to exist in directories like:
/usr/libexec
/usr/lib/conf etc.
What is the install step required to get the files into these directories? Can it be done from Maven, or is there a manual install step required?
One option, and I'm not sure if it's correct, is to set my HADOOP_HOME - is this where Hadoop finds its libexec?
Thanks guys
Pete
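
In case it helps anyone hitting the same thing: with the Hadoop 2.x Maven build, mvn install by itself does not stage a runnable layout anywhere; the dist profile does. A minimal sketch, assuming the standard 2.2.0 source tree and that the launcher scripts find libexec under HADOOP_HOME (both worth double-checking against your checkout):

# Build a full binary distribution from the source tree, skipping tests;
# the packaged layout ends up under hadoop-dist/target/hadoop-2.2.0
mvn package -Pdist -DskipTests -Dtar

# Point HADOOP_HOME at that packaged tree and put its bin directory on PATH;
# the scripts then pick up $HADOOP_HOME/libexec/hadoop-config.sh, so nothing
# needs to be copied into /usr/libexec
export HADOOP_HOME=/path/to/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
export PATH="$HADOOP_HOME/bin:$PATH"

# Format the namenode and sanity-check the install
hdfs namenode -format
hadoop version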

Related

How to install Hive For Windows?

I have installed Hadoop. I used the hadoop-2.7.0-src.tar.gz and hadoop-2.7.0.tar.gz files, and used apache-maven-3.1.1 to build the Hadoop tar file for Windows.
After many tries I got it running. It was difficult to install Hadoop without knowing what I was doing.
Now I want to install Hive. Do I have to build the Hive files with Maven as well?
If yes, which folders should I use to build them?
After that I want to install Sqoop.
Any information is appreciated.
I have not tried this on Windows, but I did it on Linux (Ubuntu) and have detailed the steps in my blog here - Hive Install.
Have a look; I think most of the steps will be needed in the same sequence as described, but the exact commands may differ on Windows.
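
On the "do I have to build Hive with Maven" part: Apache publishes pre-built binary tarballs of Hive, so building from source is usually unnecessary. A rough sketch of the Linux-style steps (file names and paths are examples; adjust for your versions, and the commands will differ on Windows):

# Extract a pre-built Hive release and put it on the PATH
tar xzf apache-hive-1.2.2-bin.tar.gz
export HIVE_HOME=/path/to/apache-hive-1.2.2-bin
export PATH="$HIVE_HOME/bin:$PATH"

# Hive needs HADOOP_HOME pointing at a working Hadoop install
export HADOOP_HOME=/path/to/hadoop-2.7.0

# Start the Hive CLI; with the default embedded Derby metastore it creates
# its metastore_db in the current directory on first use
hive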

Installing spark on hadoop

I installed Hadoop 2.7 on my Mac and now want to install Spark on it, but I can't find any documentation for this. Can anybody explain step by step how to install Spark on Hadoop?
Steps to install Apache Spark
1) Open the Apache Spark website http://spark.apache.org/
2) Click on the Downloads tab; a new page will open
3) Choose "Pre-built for Hadoop 2.7 and later"
4) Choose "Direct Download"
5) Click on "Download Spark: spark-2.0.2-bin-hadoop2.7.tgz" and save it to your desired location
6) Go to the downloaded tar file and extract it
7) Extract the resulting spark-2.0.2-bin-hadoop2.7.tar again [file name will differ as the version changes] to generate the spark-2.0.2-bin-hadoop2.7 folder
8) Now open a shell prompt and go to the bin directory of the spark-2.0.2-bin-hadoop2.7 folder [folder name will differ as the version changes]
9) Execute the command ./spark-shell
You will be in the Spark shell and can execute Spark commands
Quick start guide from Spark: https://spark.apache.org/docs/latest/quick-start.html
Hope this helps! A terminal-only sketch of the same steps follows below.
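
Rough terminal equivalent of steps 5-9 (the URL and file names are for that particular release; substitute whichever one you picked on the downloads page):

# Step 5: download the pre-built release
wget https://archive.apache.org/dist/spark/spark-2.0.2/spark-2.0.2-bin-hadoop2.7.tgz

# Steps 6-7: extract it to spark-2.0.2-bin-hadoop2.7/
tar xzf spark-2.0.2-bin-hadoop2.7.tgz

# Steps 8-9: go to the bin directory and start the Spark shell
cd spark-2.0.2-bin-hadoop2.7/bin
./spark-shell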
For running Spark on a YARN cluster there are quite a few steps to install Hadoop, Spark and everything else, so I wrote a step-by-step blog post on it; you can follow it to install everything and run the Spark shell on YARN. See the link below:
https://blog.knoldus.com/2016/01/30/spark-shell-on-yarn-resource-manager-basic-steps-to-create-hadoop-cluster-and-run-spark-on-it/
Here are the steps I took to install Apache Spark on a Linux CentOS system with Hadoop:
Install a default Java system (e.g. sudo yum install java-11-openjdk)
Download the latest release of Apache Spark from spark.apache.org
Extract the Spark tarball (tar xvf spark-2.4.5-bin-hadoop2.7.tgz)
Move the Spark folder created after extraction to the /opt/ directory (sudo mv spark-2.4.5-bin-hadoop2.7/ /opt/spark)
Run /opt/spark/bin/spark-shell if you wish to work with Scala, or /opt/spark/bin/pyspark if you want to work with Python
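
A small follow-up, assuming the /opt/spark location from the steps above: exporting SPARK_HOME (e.g. in ~/.bashrc) makes the shells available from any directory.

# Make the Spark shells available without typing the full path
export SPARK_HOME=/opt/spark
export PATH="$PATH:$SPARK_HOME/bin"

# Now these work from anywhere
spark-shell
pyspark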

Hadoop installation status

I'm running Debian and I'm new to Hadoop. Some time back I tried to install Hadoop, and I'm not sure whether the installation succeeded. When I enter this command at the terminal:
hadoop version
I see the output:
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /home/xxxxxxx/java/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
Is Hadoop installed properly? If not, what other tests do I have to do? If yes, is there some simple "get started" tutorial/exercise you're aware of that can help me get going?
Thank you!
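
In case it helps: hadoop version only proves the binaries are unpacked and on your PATH. Assuming you also configured and started a pseudo-distributed (single-node) setup, quick checks like the ones below tell you whether the daemons and HDFS actually work (the example jar path is relative to the Hadoop install directory):

# List running Hadoop daemons; a working single-node setup typically shows
# NameNode, DataNode, ResourceManager and NodeManager
jps

# Talk to HDFS; this fails if no NameNode is configured and running
hdfs dfs -ls /

# Run a stock MapReduce example end to end
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 5

For a "get started" exercise, the official "Setting up a Single Node Cluster" guide in the Hadoop documentation walks through exactly this.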

how to install Spark and Hadoop from tarball separately [Cloudera]

I want to install the Cloudera distribution of Hadoop and Spark using tarballs.
I have already set up Hadoop in pseudo-distributed mode on my local machine and successfully ran a YARN example.
I have downloaded the latest CDH 5.3.x tarballs from here.
But the folder structure of the Spark downloaded from Cloudera is different from the one on the Apache website. This may be because Cloudera provides its own version, maintained separately.
So far I have not found any documentation on installing Spark from this Cloudera tarball separately.
Could someone help me understand how to do it?
Spark can be extracted to any directory. You just need to run the ./bin/spark-submit command (available in the extracted Spark directory) with the required parameters to submit a job. To start the Spark interactive shell, use ./bin/spark-shell.
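
A hedged sketch of what that looks like in practice (the tarball name and YARN flags are illustrative; Cloudera's layout and the Spark 1.x flags shipped with CDH 5.3 may differ slightly from newer Apache releases):

# Unpack the CDH Spark tarball anywhere and run the shells in place
tar xzf spark-1.2.0-cdh5.3.0.tar.gz
cd spark-1.2.0-cdh5.3.0

# Local interactive shell
./bin/spark-shell

# To run against the existing pseudo-distributed YARN setup, point Spark at
# the Hadoop configuration; Spark 1.x uses --master yarn-client for the shell
# (newer releases use --master yarn)
export HADOOP_CONF_DIR=/path/to/hadoop/etc/hadoop
./bin/spark-shell --master yarn-client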

Hadoop showing old version despite latest version installation

I am trying to install Hadoop on my Ubuntu OS. I followed every step exactly from this link, Hadoop Install Tutorial, and everything was going as expected until I tried to run
$ start-dfs.sh and $ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5. These commands don't work as expected. After some digging I found out that I was using an older Hadoop version, 1.0.2, despite having installed the latest 2.2.0 version.
As I could not solve this, I tried to uninstall Hadoop completely. Now when I try to do that, it says:
$ sudo dpkg -r hadoop
dpkg: dependency problems prevent removal of hadoop:
hadoop-native depends on hadoop (= 1.0.2-0ubuntu1~hadoop1).
dpkg: error processing hadoop (--remove):
dependency problems - not removing
Errors were encountered while processing:
hadoop
Appreciate any help!
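
Before removing anything, it may help to see which installation the shell actually resolves; the packaged 1.0.2 and the unpacked 2.2.0 can coexist, and PATH decides which one wins. A quick diagnostic sketch (paths are illustrative):

# Which hadoop binary is first on the PATH, and what version is it?
which hadoop
hadoop version

# Inspect what the tutorial had you export
echo $PATH
echo $HADOOP_HOME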
I don't know whether this is the proper way to remove Hadoop, but it is how I removed it.
1. I first manually deleted the /usr/local/hadoop folder for all users (if any). If you cannot remove it due to lack of permissions, check the permissions of the folder and make sure every user is allowed to create and delete files in it, so that each user can delete it from their own session.
Then, from a terminal in /usr/local, $ rm -r hadoop does the job.
After this I checked $ hadoop version again in the terminal, and it still showed up. Then I did the step below.
2. Go to the terminal and run sudo apt-get purge hadoop or sudo apt-get remove hadoop; then it worked.
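
Since the dpkg error above says hadoop-native depends on hadoop, removing both packages together (and purging their configuration) is probably the cleaner form of step 2; a sketch:

# Remove the old packaged Hadoop 1.0.2 together with the package depending on it
sudo apt-get remove --purge hadoop hadoop-native

# Clear bash's cached command locations, then confirm the old binary is gone
hash -r
which hadoop || echo "no hadoop on PATH"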
