Problems running Mahout and Hadoop

I'm new at Mahout and Hadoop.
I've successfully installed a Hadoop cluster with 3 machines and the cluster is running fine. I just installed Mahout on the main namenode for testing purposes, followed the installation instructions, and set JAVA_HOME. But when I try to run classify-20newsgroups.sh it downloads the dataset and then fails with the following error:
Error: JAVA_HOME is not set
I've checked .bashrc and confirmed that JAVA_HOME is set correctly, but it doesn't help.
Also, how do I verify that Mahout is configured to run on Hadoop correctly? Do you know of any example that can verify this configuration or environment?

The .bashrc is only read by interactive non-login shells; login shells read .bash_profile instead.
So you can either source .bashrc from .bash_profile (see What's the difference between .bashrc, .bash_profile, and .environment?) or just set JAVA_HOME in .bash_profile.
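For example, a common idiom (just a sketch; adapt it to your setup) is to have .bash_profile load .bashrc so that login and non-login shells see the same variables:
# ~/.bash_profile: pull in ~/.bashrc for login shells as well
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi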
There are also several other ways to set JAVA_HOME:
1) source .bashrc in the terminal
~$ source .bashrc
2) export JAVA_HOME in the open terminal before running classify-20newsgroups.sh
~$ export JAVA_HOME=/path
~$ classify-20newsgroups.sh
3) run classify-20newsgroups.sh with JAVA_HOME set inline, i.e.
~$ JAVA_HOME=/path classify-20newsgroups.sh
As for the question about configuring Mahout to run on Hadoop: the standard classify-20newsgroups example should run on Hadoop as long as HADOOP_HOME is set.
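A minimal sketch of what the environment could look like before launching the script (the paths below are placeholders for your own installation; MAHOUT_LOCAL is the switch Mahout's launcher uses to force local execution, so leave it unset for cluster runs):
# Placeholder paths - adjust to your installation
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export MAHOUT_HOME=/usr/local/mahout
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
# Leave MAHOUT_LOCAL unset so jobs are submitted to the cluster;
# any non-empty value forces local execution instead.
unset MAHOUT_LOCAL
cd $MAHOUT_HOME/examples/bin
./classify-20newsgroups.sh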

You might need to explicitly set JAVA_HOME in hadoop-env.sh
In hadoop-env.sh, look for the comment "#The java implementation to use", and modify the JAVA_HOME path under it.
It should look something like this:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Of course, adjust the JAVA_HOME path to match your own installation.
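If you are unsure which path to use, the following usually reveals the JDK directory on Linux (it assumes /usr/bin/java is a symlink chain into the JVM directory, which is the common layout but not guaranteed):
readlink -f $(which java)
# prints e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# strip the trailing /jre/bin/java (or /bin/java) and use the rest as JAVA_HOME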

Related

Why am I getting command not found in hadoop?

I am working on a Hadoop project on Ubuntu 14.04. Whenever I run start-all.sh or start-dfs.sh, I get a "command not found" message. What should I do?
You are not running the command in the right environment.
The start-all.sh (deprecated) and start-dfs.sh scripts live in Hadoop's bin or sbin directory (sbin in Hadoop 2.x and later). Find your Hadoop home directory, locate that folder inside it, and then run the command
./start-dfs.sh
Add the following to your ~/.bashrc:
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
then run source ~/.bashrc. The command should now work.
This usually means Hadoop's bin directory is not on your PATH.
Edit the /etc/profile file (e.g. with vi):
export HADOOP_HOME=/usr/hadoop   # the directory where your Hadoop is installed
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
then
source /etc/profile
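Once the PATH is updated, a quick check that the scripts now resolve (a sketch; start-dfs.sh sits under sbin in Hadoop 2.x, bin in older releases):
which start-dfs.sh    # should print the full path under your Hadoop directory
start-dfs.sh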

Cannot run the pyspark command from any directory on my Mac after installing Apache Spark

I have installed spark on my Mac, following the instructions in the book: "Apache Spark in 24 Hours". When I am in the spark directory, I am able to run pyspark by using the command:
./bin/pyspark
To install spark I created the env variable:
export SPARK_HOME=/opt/spark
Added it to the PATH:
export PATH=$SPARK_HOME/bin:$PATH
The book says that I should be able to run the "pyspark" or the "spark-shell" command from any directory, but it doesn't work:
pyspark: command not found
I followed instructions on similar questions asked by others on here:
I set my JAVA_HOME env variable:
export JAVA_HOME=$(/usr/libexec/java_home)
I also ran the following commands:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
When I run the env command this is the output:
SPARK_HOME=/opt/spark
TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/hq/z0wh5c357cbgp1dh33lfhjj40000gn/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.fJdtLqZ7dN/Render
TERM_PROGRAM_VERSION=361.1
TERM_SESSION_ID=A8BD2144-72AD-402C-A591-5C8A43DD398B
USER=richardgray
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.cQeqaF2v1z/Listeners
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
PATH=/opt/spark/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin: /Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/usr/local/heroku/bin:/Users/richardgray/.rbenv/shims:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
PWD=/Users/richardgray
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home
LANG=en_GB.UTF-8
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
SHLVL=1
HOME=/Users/richardgray
PYTHONPATH=/opt/spark/python/lib/py4j-0.9-src.zip:/opt/spark/python/:
LOGNAME=richardgray
_=/usr/bin/env
Is there something I am missing? Thanks in advance.
You wrote that
When I am in the spark directory, I am able to run pyspark by using
the command: ./bin/pyspark
You created export SPARK_HOME=/opt/spark
Can you please confirm that the Spark directory is indeed /opt/spark?
If you actually run Spark from /Users/richardgray/opt/spark/bin, please set:
export SPARK_HOME=/Users/richardgray/opt/spark
followed by:
export PATH=$SPARK_HOME/bin:$PATH
Note: if this solves your problem, you'll need to add those two exports to your login script (e.g. .profile) so the path is set automatically.
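As a hedged sketch of persisting the exports (the /Users/richardgray/opt/spark path is only an assumption taken from the answer above; confirm where Spark is actually unpacked first):
cat >> ~/.profile <<'EOF'
export SPARK_HOME=/Users/richardgray/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
EOF
source ~/.profile
which pyspark    # should now resolve from any directory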

Apache Spark can not Run on Windows

I downloaded spark-2.0.1-bin-hadoop2.7 and installed it. I installed Java and set JAVA_HOME in the system variables.
But when I run it I get this error:
How can it be fixed?
I think the problem is the whitespace in your path.
Try placing the downloaded Spark in a path without spaces, for example F:\Msc\BigData\BigDataSeminar\Spark\
Also check that SPARK_HOME, JAVA_HOME and HADOOP_HOME point to paths without whitespace.

Hadoop 2.2.0 fails running start-dfs.sh with Error: JAVA_HOME is not set and could not be found

I have a work-in-progress installation of Hadoop on Ubuntu 12.x. I already had a deploy user which I plan to use to run Hadoop on a cluster of machines. The following session demonstrates my problem: I can ssh to olympus with no problems, but start-dfs.sh fails doing exactly that:
deploy@olympus:~$ ssh olympus
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.5.0-45-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Mon Feb 3 18:22:27 2014 from olympus
deploy@olympus:~$ echo $JAVA_HOME
/opt/dev/java/1.7.0_51
deploy@olympus:~$ start-dfs.sh
Starting namenodes on [olympus]
olympus: Error: JAVA_HOME is not set and could not be found.
You can edit the hadoop-env.sh file and set JAVA_HOME for Hadoop.
Open the file and find the line below:
export JAVA_HOME=/usr/lib/j2sdk1.6-sun
Uncomment the line and update JAVA_HOME to match your environment.
This will solve the JAVA_HOME problem.
Weird out-of-the-box bug on Ubuntu. The current line
export JAVA_HOME=${JAVA_HOME}
in /etc/hadoop/hadoop-env.sh should pick up the Java home from the host, but it doesn't.
Just edit the file and hard-code the Java home for now.
Alternatively you can edit /etc/environment to include:
JAVA_HOME=/usr/lib/jvm/[YOURJAVADIRECTORY]
This makes JAVA_HOME available to all users on the system and allows start-dfs.sh to see the value. My guess is that start-dfs.sh is kicking off a process as another user (or in a non-interactive shell) somewhere that does not pick up the variable unless it is explicitly set in hadoop-env.sh.
Using hadoop-env.sh is arguably clearer -- just adding this option for completeness.
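A quick way to see the difference (a sketch using the olympus host from the question): the start scripts launch the daemons over ssh, so compare what an interactive shell and a non-interactive ssh command each see:
echo $JAVA_HOME                  # interactive login shell: prints /opt/dev/java/1.7.0_51
ssh olympus 'echo $JAVA_HOME'    # non-interactive shell: often prints nothing
# If the second command prints nothing, JAVA_HOME is only set in a startup file
# that non-interactive shells do not read, which is why hadoop-env.sh or
# /etc/environment is the reliable place for it.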
Edit the Hadoop environment script /etc/hadoop/hadoop-env.sh and set JAVA_HOME explicitly.
For example:
Instead of export JAVA_HOME=${JAVA_HOME}, do
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre
This example uses the java-1.8.0-openjdk version of Java.
I have Hadoop installed in /opt/hadoop/ and Java installed in /usr/lib/jvm/java-8-oracle.
In the end, adding the following to the bash profile files solved the problem:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_SECURITY_LOGGER=INFO,NullAppender
export HDFS_AUDIT_LOGGER=INFO,NullAppender
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
export YARN_LOG_DIR=/tmp
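After reloading the profile, a quick sanity check could look like this (standard Hadoop 2.x commands; jps ships with the JDK):
source ~/.bashrc     # or whichever profile file holds the exports above
hadoop version       # confirms HADOOP_HOME/bin is on the PATH
start-dfs.sh         # confirms JAVA_HOME is being picked up
jps                  # should list NameNode, DataNode, SecondaryNameNode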

How do you install and run Accumulo and Hadoop on OS X 10.7.4

So I'm trying to run a MapReduce word count example, but I need Hadoop running first. I tried following the instructions from here, but it doesn't seem to be working. The problem is that the environment variable is not being set. I added the line setenv HADOOP_HOME /opt/hadoop-0.20.2 to /etc/launchd.conf, but when I run echo $HADOOP_HOME it doesn't print the path.
Set the HADOOP_HOME variable directly in Accumulo's conf/accumulo-env.sh script.
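A minimal sketch of what that could look like, assuming the Hadoop 0.20.2 tarball from the question is unpacked in /opt/hadoop-0.20.2:
# conf/accumulo-env.sh
export HADOOP_HOME=/opt/hadoop-0.20.2
export JAVA_HOME=$(/usr/libexec/java_home)   # standard way to locate the JDK on OS X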
