Hadoop installation status

I'm running Debian and I'm new to Hadoop. Some time back, I tried to install Hadoop, but I'm not sure whether the installation succeeded. When I enter this command at the terminal:
hadoop version
I see the output:
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /home/xxxxxxx/java/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
Is Hadoop installed properly? If not, what other tests should I do? If yes, is there a simple get-started tutorial or exercise you're aware of that can help me get going?
Thank you!
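A quick way to confirm the installation actually works, beyond hadoop version, is to run one of the bundled example jobs. A minimal smoke test, assuming your Hadoop lives at ~/java/hadoop-2.7.1 as in the output above (adjust the path to yours); the pi example ships with the 2.7.1 release:
$ cd ~/java/hadoop-2.7.1
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10   # should print an estimate of Pi
If the job completes and prints "Estimated value of Pi is ...", Hadoop is working in local (standalone) mode. The single-node setup guide linked at the end of this page is a good get-started exercise.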

Related

./bin/hadoop command does not return any usage documentation

I am getting started with Hadoop. I installed Java, set JAVA_HOME to a 1.8 JDK, and installed Hadoop 2.7.6. I cd'ed into the Hadoop installation directory to run bin/hadoop, but I do not see any output. I have also tried one of the examples using the command
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar
Appreciate your help.
It looks like there is some issue with version 2.7.6. I installed version 2.7.7 and it started to work.
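For reference, a healthy install should print a usage summary when the script is run with no arguments. A minimal check, assuming you are in the extracted installation directory:
$ cd hadoop-2.7.7          # the extracted installation directory
$ bin/hadoop               # with no arguments, this should print usage text listing the subcommands
$ bin/hadoop version       # this should print the build and version details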

Installing spark on hadoop

I installed Hadoop 2.7 on my Mac, and now I want to install Spark on it, but I can't find any document for this. Can anybody explain step by step how to install Spark on Hadoop?
Steps to Install Apache Spark
1) Open the Apache Spark website: http://spark.apache.org/
2) Click on the Downloads tab; a new page will open
3) Choose "Pre-built for Hadoop 2.7 and later"
4) Choose Direct Download
5) Click on "Download Spark: spark-2.0.2-bin-hadoop2.7.tgz" and save it to your desired location
6) Go to the downloaded tar file and extract it
7) Extract spark-2.0.2-bin-hadoop2.7.tar again [the file name will differ as the version changes] to produce the spark-2.0.2-bin-hadoop2.7 folder
8) Now open a shell prompt and go to the bin directory of the spark-2.0.2-bin-hadoop2.7 folder [the folder name will differ as the version changes]
9) Execute the command ./spark-shell (the launcher script is named spark-shell, not spark-shell.sh)
You will be in the Spark shell, where you can execute Spark commands; the same steps are condensed into shell commands below.
https://spark.apache.org/docs/latest/quick-start.html <-- quick start guide from Spark
Hope this helps!
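For reference, here are the same steps condensed into shell commands; the version in the file names is just an example and will change with newer releases, and the archive mirror URL is an assumption:
$ wget https://archive.apache.org/dist/spark/spark-2.0.2/spark-2.0.2-bin-hadoop2.7.tgz   # pre-built for Hadoop 2.7
$ tar xzf spark-2.0.2-bin-hadoop2.7.tgz      # tar handles the gzip layer in one step
$ cd spark-2.0.2-bin-hadoop2.7/bin
$ ./spark-shell                              # drops you into the interactive Spark shell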
For running Spark on a YARN cluster there are a lot of steps to install Hadoop, Spark, and everything else, so I wrote a blog post covering it step by step; you can follow it to install everything and run spark-shell on YARN. See the link below:
https://blog.knoldus.com/2016/01/30/spark-shell-on-yarn-resource-manager-basic-steps-to-create-hadoop-cluster-and-run-spark-on-it/
Here are the steps I took to install Apache Spark on a CentOS Linux system with Hadoop (condensed into commands below):
Install a default Java system (e.g. sudo yum install java-11-openjdk)
Download the latest release of Apache Spark from spark.apache.org
Extract the Spark tarball (tar xvf spark-2.4.5-bin-hadoop2.7.tgz)
Move the Spark folder created after extraction to the /opt/ directory (sudo mv spark-2.4.5-bin-hadoop2.7/ /opt/spark)
Execute /opt/spark/bin/spark-shell if you wish to work with Scala, or /opt/spark/bin/pyspark if you want to work with Python
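Condensed into commands, under the same assumptions (version numbers will differ; putting Spark's bin directory on PATH is an optional extra):
$ sudo yum install java-11-openjdk                        # default Java runtime
$ tar xvf spark-2.4.5-bin-hadoop2.7.tgz                   # extract the downloaded release
$ sudo mv spark-2.4.5-bin-hadoop2.7/ /opt/spark           # move it to /opt
$ echo 'export PATH=$PATH:/opt/spark/bin' >> ~/.bashrc    # optional: put spark-shell/pyspark on PATH
$ /opt/spark/bin/spark-shell                              # Scala shell
$ /opt/spark/bin/pyspark                                  # or the Python shell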

Failed dependencies when install pxf service

When I install the PXF service RPM in HAWQ, I get some errors:
error: Failed dependencies:
hadoop >= 2.6.0 is needed by pxf-service-0:3.0.0-root.noarch
hadoop-hdfs >= 2.6.0 is needed by pxf-service-0:3.0.0-root.noarch
What's your advice here?
Please make sure the PXF RPM's OS/architecture version matches your system. For example, if the PXF RPM is built for RHEL 6 and you are installing on RHEL 7, you may see some dependency issues.
Could you please check the version of Hadoop you are running in the cluster? I guess you might be running a lower version of Hadoop. You have to run at least Hadoop 2.6 to run the current version of PXF.
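A quick way to check which Hadoop packages (and versions) the RPM database actually knows about; the package names below are the usual Bigtop-style ones, so adjust if yours differ:
$ rpm -qa | grep -i hadoop        # list all installed Hadoop-related RPMs with their versions
$ rpm -q hadoop hadoop-hdfs       # query the two packages PXF declares as dependencies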
The wiki here uses the Bigtop Hadoop RPMs:
https://cwiki.apache.org/confluence/display/HAWQ/Build+Package+and+Install+with+RPM
That means if I install with the RPM (HAWQ 2.2.0), the other ways (using a binary Hadoop without RPM installs, e.g. from a tarball) are not supported.
If I install Hadoop from a tarball, I must build HAWQ from source code for now.
Please refer to:
https://issues.apache.org/jira/browse/HAWQ-1568

how to install Spark and Hadoop from tarball separately [Cloudera]

I want to install the Cloudera distribution of Hadoop and Spark using tarballs.
I have already set up Hadoop in pseudo-distributed mode on my local machine and successfully ran a YARN example.
I have downloaded the latest CDH 5.3.x tarballs from here.
But the folder structure of the Spark downloaded from Cloudera is different from the one on the Apache website. This may be because Cloudera provides its own version, maintained separately.
So far I have not found any documentation on installing Spark separately from this Cloudera tarball.
Could someone help me understand how to do it?
Spark can be extracted to any directory. You just need to run the ./bin/spark-submit command (available in the extracted Spark directory) with the required parameters to submit a job. To start the Spark interactive shell, use the command ./bin/spark-shell.
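For example, here is a minimal spark-submit invocation against the bundled SparkPi example. The class name is the standard one shipped with Spark, but the examples jar location varies between builds (lib/ in older CDH layouts, examples/jars/ in newer Apache ones), so treat the path as an assumption:
$ cd /path/to/extracted/spark
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[2] lib/spark-examples-*.jar 10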

Install hadoop on ubuntu

Hi, I am new to the Big Data concept. I have installed HBase 0.98-hadoop2. Does that mean Hadoop 2 has also been installed on my machine along with HBase? If yes, how can I run Hadoop?
I have no idea about the Ubuntu packages. After the installation, you can do one quick experiment. In the shell, just type the command with no arguments:
$ hadoop
If it tells you command not found: hadoop, then no, you don't have Hadoop installed.
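On Ubuntu you can also ask the system directly; a small sketch (package names vary by repository, so the grep is deliberately loose):
$ command -v hadoop               # prints the path of the hadoop command if it is on your PATH
$ hadoop version                  # prints build details if Hadoop is installed
$ dpkg -l | grep -i hadoop        # lists any Hadoop-related Debian/Ubuntu packages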
Follow this link for Hadoop installation:
http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SingleCluster.html
