I am working on a Hadoop project on Ubuntu 14.04. Whenever I run start-all.sh or start-dfs.sh, I get a "command not found" message. What should I do?
Your shell cannot find the script because the directory that contains it is not on your PATH.
start-all.sh (deprecated) and start-dfs.sh live in the bin directory of a Hadoop 1.x installation, and in the sbin directory from Hadoop 2.x onward. Find your Hadoop home directory, change into that folder, and run the command:
./start-dfs.sh
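Alternatively, add the following line to your ~/.bashrc:
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then run source ~/.bashrc. The command should now work from any directory, provided HADOOP_HOME is set to your Hadoop installation directory.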
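For what it's worth, here is a small sketch that reports which of the two usual directories actually holds a given script. The function name find_hadoop_script_dir is hypothetical (it is not part of Hadoop), and it assumes the scripts sit in sbin or bin directly under the Hadoop home directory.

```shell
# Hypothetical helper, not part of Hadoop: print the directory under a
# Hadoop install that contains an executable copy of the given script.
# Assumes scripts live in <hadoop home>/sbin or <hadoop home>/bin.
find_hadoop_script_dir() {
    hadoop_home=$1
    script=$2
    for dir in "$hadoop_home/sbin" "$hadoop_home/bin"; do
        if [ -x "$dir/$script" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1    # script not found in either directory
}

# Example call (commented out so that loading this file has no side effects):
# find_hadoop_script_dir "$HADOOP_HOME" start-dfs.sh
```

The printed directory is the one to add to PATH. If nothing is printed and the status is non-zero, the script is in neither place and HADOOP_HOME probably points somewhere else.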
This usually means that Hadoop's bin directory has not been added to the PATH environment variable.
Edit /etc/profile (for example with vi /etc/profile) and add:
export HADOOP_HOME=/usr/hadoop # the directory where Hadoop is installed
export PATH=$HADOOP_HOME/bin:$PATH
Then reload the file:
source /etc/profile
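Since profile files can end up being sourced more than once, a plain export PATH=... line may add the same entry repeatedly. One possible way around that, sketched here with a hypothetical function name path_prepend, is to add the directory only when it is not already there:

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;      # not present, put it at the front
    esac
}

# Example call (commented out so that loading this file changes nothing):
# path_prepend "$HADOOP_HOME/bin"
```

Wrapping PATH in colons before matching is what lets the pattern recognise the first and last entries as well as the ones in the middle.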
Related
I am following an OpenClassroom course and trying to learn Hadoop, but I am having problems installing it (I am fairly new to Linux):
I have installed and configured Hadoop (I changed the files etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/mapred-site.xml and etc/hadoop/yarn-site.xml as the website asks).
Afterwards, the course tells me to run the following in order to launch Hadoop:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
But when I do, I get:
hdfs : command can't be found
What am I doing wrong?
This can happen in either of the following scenarios:
1) The hdfs binaries are not installed properly.
2) The directory containing the hdfs executable script is not in the $PATH of the user running the command.
To verify, try the steps below:
A) Check whether the hdfs binaries are installed by navigating to the "/usr/hdp//hadoop-hdfs/bin" directory.
B) Check whether /usr/bin and $HADOOP_HOME/bin are present in the $PATH environment variable (echo $PATH).
C) Check the output of the command ls -ltr /usr/bin/hdfs. By default, a symlink to the hdfs script is created in the /usr/bin directory.
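The PATH check above can also be done mechanically. The following is only a sketch: dirs_providing is a hypothetical name, and it simply walks the colon-separated PATH entries and prints each directory that holds an executable file with the requested name.

```shell
# Hypothetical helper: list every PATH directory that contains an
# executable file called $1. The body runs in a subshell, written as ( ... ),
# so that changing IFS does not leak into the calling shell.
dirs_providing() (
    cmd=$1
    IFS=:
    for dir in $PATH; do
        if [ -x "$dir/$cmd" ] && [ ! -d "$dir/$cmd" ]; then
            printf '%s\n' "$dir"
        fi
    done
)

# Example call (commented out so that loading this file prints nothing):
# dirs_providing hdfs
```

If the call prints nothing, no directory on the current PATH provides the command, which matches the "command not found" symptom.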
I have installed Spark on my Mac, following the instructions in the book "Apache Spark in 24 Hours". When I am in the Spark directory, I am able to run pyspark by using the command:
./bin/pyspark
To install Spark I created the environment variable:
export SPARK_HOME=/opt/spark
and added it to the PATH:
export PATH=$SPARK_HOME/bin:$PATH
The book says that I should be able to run the "pyspark" or the "spark-shell" command from any directory, but it doesn't work:
pyspark: command not found
I followed instructions on similar questions asked by others on here:
I set my JAVA_HOME env variable:
export JAVA_HOME=$(/usr/libexec/java_home)
I also ran the following commands:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
When I run the env command this is the output:
SPARK_HOME=/opt/spark
TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/hq/z0wh5c357cbgp1dh33lfhjj40000gn/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.fJdtLqZ7dN/Render
TERM_PROGRAM_VERSION=361.1
TERM_SESSION_ID=A8BD2144-72AD-402C-A591-5C8A43DD398B
USER=richardgray
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.cQeqaF2v1z/Listeners
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
PATH=/opt/spark/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin: /Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/usr/local/heroku/bin:/Users/richardgray/.rbenv/shims:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
PWD=/Users/richardgray
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home
LANG=en_GB.UTF-8
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
SHLVL=1
HOME=/Users/richardgray
PYTHONPATH=/opt/spark/python/lib/py4j-0.9-src.zip:/opt/spark/python/:
LOGNAME=richardgray
_=/usr/bin/env
Is there something I am missing? Thanks in advance.
You wrote that
When I am in the spark directory, I am able to run pyspark by using
the command: ./bin/pyspark
You also set export SPARK_HOME=/opt/spark.
Can you please confirm that the Spark directory really is /opt/spark?
If you are running Spark from /Users/richardgray/opt/spark/bin, please set:
export SPARK_HOME=/Users/richardgray/opt/spark
followed by:
export PATH=$SPARK_HOME/bin:$PATH
Note: if this solves your problem, add those two exports to your login script (e.g. .profile) so that the path is set automatically.
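To confirm that a home variable really points at the installation, you could check that the launcher exists under its bin directory. This is just a sketch with a hypothetical function name, check_launcher; it assumes the usual layout in which launchers such as pyspark sit directly in bin under the install directory.

```shell
# Hypothetical helper: report whether <install dir>/bin/<launcher> exists
# and is executable. $1 is the install directory, $2 the launcher name.
check_launcher() {
    home_dir=$1
    launcher=$2
    if [ -x "$home_dir/bin/$launcher" ]; then
        echo "ok: $home_dir/bin/$launcher"
    else
        echo "missing: $home_dir/bin/$launcher"
        return 1
    fi
}

# Example call (commented out so that loading this file prints nothing):
# check_launcher "$SPARK_HOME" pyspark
```

A "missing" result would suggest that SPARK_HOME does not match the directory from which ./bin/pyspark was run.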
I'm trying to install Hadoop 2.6 on Ubuntu 14.04.
When I run this command:
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
I get the following:
araziz@araziz-HP-EliteBook-8440p:~$ cd hadoop
araziz@araziz-HP-EliteBook-8440p:~/hadoop$ ls
hadoop-2.6.0-src hadoop-2.6.0-src.tar.gz
araziz@araziz-HP-EliteBook-8440p:~/hadoop$ cd ha*
araziz@araziz-HP-EliteBook-8440p:~/hadoop/hadoop-2.6.0-src$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
bash: bin/hadoop: No such file or directory
In Hadoop tutorials, bin/hadoop is a path relative to the Hadoop installation directory, i.e. $HADOOP_HOME/bin/hadoop, where $HADOOP_HOME is the directory Hadoop is installed in. In my case that is /usr/local/hadoop, but it depends on the instructions you are following, so check your tutorial more closely. Note also that your listing shows hadoop-2.6.0-src, which appears to be the source distribution; it does not have a ready-to-run bin/hadoop at its top level, so you would need either to build it or to use the binary distribution instead.
Before running Hadoop commands, you need to set $HADOOP_HOME in your .bashrc file.
To help in situations like this, I've created some scripts in this repository: https://github.com/lalosam/EasyHadoop.
The hadoop.sh script downloads, unpacks and configures Hadoop, installs the required dependencies, and sets the environment variables according to the latest (Hadoop 2.7.1) official Getting Started tutorial. I developed it on Linux Mint, but it should work on Ubuntu since both use the same package manager (apt-get).
I'm new to Mahout and Hadoop.
I've successfully set up a Hadoop cluster with 3 machines, and the cluster is running fine. I just installed Mahout on the main namenode for testing purposes, followed the installation instructions, and set JAVA_HOME. But when I try to run classify-20newsgroups.sh, it downloads the dataset and then fails with the following error:
Error: JAVA_HOME is not set
I then checked .bashrc and confirmed that JAVA_HOME is set correctly, but that doesn't help.
Also, how do I verify that Mahout is configured to run on Hadoop correctly? Do you know of any example that can verify this configuration or environment?
Bash reads .bashrc only for interactive non-login shells; a login shell reads .bash_profile instead.
So you could source .bashrc from .bash_profile (see "What's the difference between .bashrc, .bash_profile, and .environment?") or just set JAVA_HOME in .bash_profile.
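Reading .bashrc from .bash_profile usually comes down to a guarded "." (source) command. The sketch below wraps that guard in a hypothetical function, source_if_readable, so it can be reused for other files as well; the commented call at the end is what would go into .bash_profile.

```shell
# Hypothetical helper: source a file into the current shell, but only if
# it exists and is readable, so a missing file does not produce an error.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# In ~/.bash_profile one would then call (commented out here on purpose):
# source_if_readable "$HOME/.bashrc"
```

With that call in place, a login shell picks up whatever .bashrc exports, including JAVA_HOME.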
There are several other ways to set JAVA_HOME:
1) reload .bashrc from the terminal
~$ source .bashrc
2) export JAVA_HOME in the open terminal before running classify-20newsgroups.sh
~$ export JAVA_HOME=/path
~$ classify-20newsgroups.sh
3) run classify-20newsgroups.sh with JAVA_HOME set for that command only, i.e.
~$ JAVA_HOME=/path classify-20newsgroups.sh
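Whether a script sees JAVA_HOME depends on whether the variable is part of the environment handed to child processes. The sketch below shows the difference using a made-up variable, DEMO_HOME, and a hypothetical helper, child_view, that prints what a child shell sees.

```shell
# Hypothetical helper: print the value a child process sees for DEMO_HOME,
# or the word "unset" if the variable is not in the child's environment.
child_view() {
    sh -c 'printf "%s\n" "${DEMO_HOME-unset}"'
}

unset DEMO_HOME
DEMO_HOME=/path          # plain assignment: visible to this shell only
child_view               # prints "unset"

export DEMO_HOME         # exported: inherited by child processes
child_view               # prints "/path"

unset DEMO_HOME
# Prefix assignment: set for this one command only
DEMO_HOME=/other sh -c 'printf "%s\n" "${DEMO_HOME-unset}"'   # prints "/other"
```

The same reasoning applies to JAVA_HOME: a script started from the terminal is a child process, so it only sees the variable if it was exported or given as a prefix on the command line.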
As for configuring Mahout to run on Hadoop: the standard classify-20newsgroups example should run on Hadoop if HADOOP_HOME is set.
You might need to set JAVA_HOME explicitly in hadoop-env.sh.
In hadoop-env.sh, look for the comment "# The java implementation to use." and modify the JAVA_HOME line under it.
It should look something like this:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Adjust the JAVA_HOME path to match your own installation, of course.
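If you are not sure which path to use, one common approach on Linux is to follow the symlinks behind the java command and strip the trailing bin/java. The sketch below assumes GNU readlink with the -f option (older macOS versions do not support it), and java_home_from is a hypothetical name.

```shell
# Hypothetical helper: given the path to a java executable (often a chain
# of symlinks), print the directory two levels above the real file, which
# is the usual candidate for JAVA_HOME. Assumes GNU readlink (-f).
java_home_from() {
    real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$real")"
}

# Example call (commented out because it needs java on the PATH):
# java_home_from "$(command -v java)"
```

On some installs the result ends in a jre subdirectory; whether to keep that part or use its parent depends on the JDK layout, so treat the output as a starting point rather than a definitive answer.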
I am trying to execute the hadoop command from Hadoop's bin folder. It doesn't work, but ./hadoop in the bin folder does. What could the problem be?
Thanks,
Madhu