How to uninstall Hadoop 1.0.0 - hadoop

I set up my Hadoop cluster with Hadoop 2.0.2. Today I wanted to test 1.0.0, so I downloaded the .deb file from the Hadoop website and installed it. That messed everything up.
Now, when I type "which -a hadoop" I get 2 results
one pointing to my old Hadoop installation folder
and the other one pointing to /usr/bin/hadoop.
So the question is: how do I get rid of Hadoop 1.0.0 completely?

Try using dpkg -r hadoop; this should remove the Hadoop package from the system, but leave the config files intact. If you want to lose the config files as well, try dpkg -P hadoop instead.
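Before removing anything, it can help to see where each hadoop on the PATH really lives, which is roughly what "which -a hadoop" shows plus symlink resolution. A minimal sketch, assuming a readlink that supports -f (GNU coreutils does); the function name list_on_path is made up for illustration:

```shell
# List every executable called "$1" on PATH, one per line, together with
# the file it finally resolves to after following symlinks.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
list_on_path() (
  name=$1
  set -f      # do not glob-expand PATH entries
  IFS=:       # split PATH on ':'
  for dir in $PATH; do
    [ -n "$dir" ] || continue
    candidate=$dir/$name
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s -> %s\n' "$candidate" "$(readlink -f "$candidate")"
    fi
  done
)
```

Running list_on_path hadoop prints one line per match. The entry under /usr/bin is presumably the one the .deb installed, so it should disappear after dpkg -r hadoop or dpkg -P hadoop.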

> $HADOOP_HOME
> /home/shiv/hadoop
> sudo rm -r /home/shiv/hadoop
And Hadoop is uninstalled!
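Since rm -r on an empty or wrong variable can do real damage, here is a slightly more careful sketch of the same idea. The function name remove_hadoop_home and the bin/hadoop check are my own assumptions about what an unpacked Hadoop tree looks like, not something from the answer above:

```shell
# Remove a manually unpacked Hadoop tree, refusing obviously unsafe targets.
# Pass the directory explicitly, e.g. remove_hadoop_home "$HADOOP_HOME".
remove_hadoop_home() {
  rh_dir=$1
  if [ -z "$rh_dir" ]; then
    echo "refusing: no directory given (is HADOOP_HOME set?)" >&2
    return 1
  fi
  if [ ! -d "$rh_dir" ]; then
    echo "refusing: $rh_dir is not a directory" >&2
    return 1
  fi
  # Heuristic (an assumption): an unpacked Hadoop tree contains bin/hadoop.
  if [ ! -e "$rh_dir/bin/hadoop" ]; then
    echo "refusing: $rh_dir does not look like a Hadoop tree" >&2
    return 1
  fi
  rm -r -- "$rh_dir"
}
```

This only covers a tarball-style install under a directory you own. A copy installed from a .deb still has to be removed with dpkg or apt.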

I struggled with this for quite a while and then decided to share it here:
The trick is basically to delete all the symlinks pointing back to the locations where the HDP components reside, since that is what causes 80% of the problem. Here is a step-by-step tutorial for that:
http://www.yourtechchick.com/hadoop/how-to-completely-remove-and-uninstall-hdp-components-hadoop-uninstall-on-linux-system/
Hope that helps!
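To give an idea of the symlink hunt without reproducing the tutorial, here is a rough sketch that only lists candidates and deletes nothing. The function name links_into is made up, /usr/hdp in the usage note is an assumption about where HDP lives on your system, and readlink -f is assumed to be available:

```shell
# Print every symlink directly inside "$1" whose resolved target
# lives somewhere under the directory "$2".
links_into() {
  li_dir=$1
  li_prefix=$(readlink -f "$2") || return 1
  for li_link in "$li_dir"/*; do
    [ -L "$li_link" ] || continue                  # only symlinks
    li_target=$(readlink -f "$li_link") || continue
    case "$li_target" in
      "$li_prefix"/*) printf '%s\n' "$li_link" ;;  # target is under the prefix
    esac
  done
}
```

For example, links_into /usr/bin /usr/hdp would list the links in /usr/bin that point into /usr/hdp. Review the list before removing anything.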

Related

How to install custom Spark version in Cloudera

I am new to Spark, Hadoop and Cloudera. We need to use a specific version (1.5.2) of Spark, and we are also required to use Cloudera for cluster management, including for Spark.
However, CDH 5.5 comes with Spark 1.5.0, and this cannot be changed very easily.
People suggest to "just download" a custom version of Spark manually. But how can this "custom" Spark version be managed by Cloudera, so that I can distribute it across the cluster? Or does it need to be operated and provisioned completely separately from Cloudera?
Thanks for any help and explanation.
Yes, it is possible to run any Apache Spark version.
Things to check before doing it:
You have YARN configured in CM. You can then run your application as a YARN application with spark-submit. Please refer to this link. It will work like any other YARN application.
It is not mandatory to install Spark; you can still run your application.
Under YARN, you can run any application with any version of Spark. After all, Spark is a bunch of libraries, so you can pack your jar with its dependencies and send it to YARN. However, there are some additional small tasks to be done.
In the following link, dlb8 provides a list of tasks to be done to run Spark 2.0 in an installation with a previous version. Just change version/paths accordingly.
Find the version of CDH and Hadoop running on your cluster using
$ hadoop version
Hadoop 2.6.0-cdh5.4.8
Download Spark and extract the sources. Pre-built Spark binaries should work out of the box with most CDH versions, unless there are custom fixes in your CDH build, in which case you can use the spark-2.0.0-bin-without-hadoop.tgz.
(Optional) You can also build Spark by opening the distribution directory in the shell and running the following command using the CDH and Hadoop version from step 1
$ ./dev/make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
Note: With Spark 2.0 the default build uses Scala version 2.11. If you need to stick to Scala 2.10, use the -Dscala-2.10 property or
$ ./dev/change-scala-version.sh 2.10
Note that -Phadoop-provided enables the profile to build the assembly without including Hadoop-ecosystem dependencies provided by Cloudera.
Extract the tgz file.
$ tar -xvzf /path/to/spark-2.0.0-bin-hadoop2.6.tgz
cd into the custom Spark distribution and configure the custom Spark distribution by copying the configuration from your current Spark version
$ cp -R /etc/spark/conf/* conf/
$ cp /etc/hive/conf/hive-site.xml conf/
Change SPARK_HOME to point to folder with the Spark 2.0 distribution
$ sed -i "s#\(.*SPARK_HOME\)=.*#\1=$(pwd)#" conf/spark-env.sh
Change spark.master to yarn from yarn-client in spark-defaults.conf
$ sed -i 's/spark.master=yarn-client/spark.master=yarn/' conf/spark-defaults.conf
Delete spark.yarn.jar from spark-defaults.conf
$ sed -i '/spark.yarn.jar/d' conf/spark-defaults.conf
Finally test your new Spark installation:
$ ./bin/run-example SparkPi 10 --master yarn
$ ./bin/spark-shell --master yarn
$ ./bin/pyspark
Update log4j.properties to suppress annoying warnings. Add the following to conf/log4j.properties
$ echo "log4j.logger.org.spark_project.jetty=ERROR" >> conf/log4j.properties
However, it can be adapted to the opposite, since the bottom line is "to use a Spark version on an installation with a different version".
It's even simpler if you don't have to deal with 1.x - 2.x version changes, because you don't need to pay attention to the change of Scala version and of the assembly approach.
I tested it in a CDH 5.4 installation to set up 1.6.3 and it worked fine. I did it with the "spark.yarn.jars" option:
#### set "spark.yarn.jars"
$ cd $SPARK_HOME
$ hadoop fs -mkdir spark-2.0.0-bin-hadoop
$ hadoop fs -copyFromLocal jars/* spark-2.0.0-bin-hadoop
$ echo "spark.yarn.jars=hdfs://nameservice1/user/<yourusername>/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
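If you would rather see what the spark-defaults.conf edits from the steps above do before running sed -i, here is a small preview sketch that writes the result to stdout and leaves the file alone. The function name preview_spark_defaults is made up, and the patterns are anchored more tightly than in the quoted steps (my own variation) so that a spark.yarn.jars line is not deleted along with spark.yarn.jar. It assumes the key=value layout used in the steps:

```shell
# Show spark-defaults.conf as it would look after the two edits:
#   1. spark.master=yarn-client  ->  spark.master=yarn
#   2. drop the spark.yarn.jar line (but keep spark.yarn.jars)
# $1 is the path to spark-defaults.conf; the file itself is not modified.
preview_spark_defaults() {
  sed -e 's/^spark\.master=yarn-client$/spark.master=yarn/' \
      -e '/^spark\.yarn\.jar=/d' "$1"
}
```

Something like preview_spark_defaults conf/spark-defaults.conf | diff conf/spark-defaults.conf - then shows exactly which lines would change.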

Hadoop installation on Ubuntu

Can anybody provide me with the commands to install Hadoop 2.2 on Ubuntu 14.04?
I have checked various sites but they all seem to have different procedures.
I successfully installed it using this step-by-step guide (in my case it was on 14.10, but I doubt there will be any difference)
Here is the important part:
wget http://apache.mirrors.pair.com/hadoop/common/stable2/hadoop-2.2.0.tar.gz
tar -xvzf hadoop-2.2.0.tar.gz
mv hadoop-2.2.0 hadoop
sudo mv hadoop /usr/local/
sudo chown -R hduser:hadoop /usr/local/hadoop
You can further configure it for your convenience, choose interface etc.
Hope this helps

apache sqoop installation missing addtowar script

I was installing Apache Sqoop for Hadoop v1. The installation guide says the script /bin/addtowar.sh should be in the Sqoop bin directory, but I can't find it.
I used this URL:
https://sqoop.apache.org/docs/1.99.1/Installation.html
Thanks!
Hi, I think the problem is this command: mv sqoop-(version)-bin-hadoop(hadoop version).tar.gz /usr/lib/sqoop
Please replace that command with this one: mv sqoop-(version)-bin-hadoop(hadoop version) /usr/lib/sqoop
I think that will solve your problem.
addToWar.sh is no longer used in the install.
I ran into the same thing, and documented the solution here:
http://brianoneill.blogspot.com/2014/10/sqoop-1993-w-hadoop-2-installation.html

Hadoop showing old version despite latest version installation

I am trying to install Hadoop on Ubuntu. I followed every step from this link, Hadoop Install Tutorial, and everything went as expected until I tried to run the
$ start-dfs.sh and $ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5 commands. These commands don't work as expected. After some digging I found out that I was using the older Hadoop 1.0.2 even though I had downloaded the latest 2.2.0 version.
As I could not solve this, I tried to uninstall Hadoop completely. Now when I try, it says
$ sudo dpkg -r hadoop
dpkg: dependency problems prevent removal of hadoop:
hadoop-native depends on hadoop (= 1.0.2-0ubuntu1~hadoop1).
dpkg: error processing hadoop (--remove):
dependency problems - not removing
Errors were encountered while processing:
hadoop
Appreciate any help!
I don't know whether this is the proper way to remove Hadoop, but I removed it using the method below.
1. I first manually deleted the /usr/local/hadoop folder for all users (if any). If you cannot remove it due to lack of permissions, check the folder's permissions: grant access through sudo and set "Creating and deleting files" so that every user can delete it.
From a terminal in /usr/local, $ rm -r hadoop does the job.
After this, I ran $ hadoop version again in the terminal, and it still showed up. So I did the step below.
2. In a terminal, run sudo apt-get purge hadoop or sudo apt-get remove hadoop. Then it worked.
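The dpkg error in the question names hadoop-native as the package blocking the removal. As a rough sketch, the blockers can be pulled out of that error text so they can be removed together with hadoop. The function name blocking_packages is made up, and it assumes the "<package> depends on <target>" wording shown in the question:

```shell
# Read dpkg's error text on stdin and print, once each, every package
# named at the start of a "<package> depends on <target>" line.
blocking_packages() {
  sed -n 's/^[[:space:]]*\([^[:space:]][^[:space:]]*\) depends on .*/\1/p' | sort -u
}
```

For example, sudo dpkg -r hadoop 2>&1 | blocking_packages should print hadoop-native for the error above. Naming both packages in one command, as in sudo apt-get purge hadoop-native hadoop, should then get past the dependency check, though I have only reasoned this through and not tried it on that exact setup.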

Hadoop - install process for /usr/libexec etc

I'm trying to compile/install/run Hadoop as a single-node cluster on Mac OS X 10.7.5.
I've downloaded the hadoop-2.2.0-src, and am able to compile all modules with
mvn install
The install is successful, and the tests check out too.
When trying to run Hadoop (specifically, hdfs namenode -format to start off with), I see a requirement for Hadoop components to exist in directories like:
/usr/libexec
/usr/lib/conf etc.
What is the install step required to get the files into this directory? Can it be done from Maven, or is there a manual install step required?
One option, and I'm not sure if it's correct, is to set my HADOOP_HOME - is this where hadoop finds its /libexecs?
Thanks guys
Pete
