I am trying to run Spark on YARN and I am running into this error:
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
I am not sure where the "environment" is (what specific file?). I tried using:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
in my .bash_profile, but this doesn't seem to help.

When running Spark on YARN, you need to add the following line to spark-env.sh:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Note: check that $HADOOP_HOME/etc/hadoop is the correct path in your environment, and that spark-env.sh also exports HADOOP_HOME.
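One way to act on this note is to confirm that the directory really holds the YARN client configuration before exporting it. The sketch below is only an illustration and is not part of Spark or Hadoop: the function name check_hadoop_conf_dir is made up, the fallback /usr/local/hadoop is an assumed install location, and it assumes the directory should contain core-site.xml and yarn-site.xml.

```shell
# Hypothetical helper (not shipped with Spark or Hadoop).
# Succeeds only if the given directory holds the config files
# that a YARN client is assumed to need here.
check_hadoop_conf_dir() {
    dir="$1"
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}

# Assumed default layout; adjust to your own install.
if check_hadoop_conf_dir "${HADOOP_HOME:-/usr/local/hadoop}/etc/hadoop"; then
    export HADOOP_CONF_DIR="${HADOOP_HOME:-/usr/local/hadoop}/etc/hadoop"
fi
```

If the check prints a "missing" line, the variable is left unset, which is usually easier to diagnose than a path that points at the wrong folder.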

For the Windows environment, open file load-spark-env.cmd in the Spark bin folder and add the following line:

Just an update to the answer by Shubhangi:
cd $SPARK_HOME/bin
sudo nano load-spark-env.sh
Add the lines below, then save and exit:
export SPARK_LOCAL_IP=""
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export YARN_CONF_DIR="$HADOOP_HOME/etc/hadoop"


I am new to Apache Spark. I have tested some applications in Spark standalone mode, but I want to run an application in YARN mode. I am running Apache Spark 2.1.0 on Windows. Here is my command:
c:\spark>spark-submit2 --master yarn --deploy-mode client --executor-cores 4 --jars C:\DependencyJars\spark-streaming-eventhubs_2.11-2.0.3.jar,C:\DependencyJars\scalaj-http_2.11-2.3.0.jar,C:\DependencyJars\config-1.3.1.jar,C:\DependencyJars\commons-lang3-3.3.2.jar --conf spark.driver.userClasspathFirst=true --conf spark.executor.extraClassPath=C:\DependencyJars\commons-lang3-3.3.2.jar --conf spark.executor.userClasspathFirst=true --class "GeoLogConsumerRT" C:\sbtazure\target\scala-2.11\azuregeologproject_2.11-1.0.jar
After searching the web, I created a folder named Hadoop_CONF_DIR, placed hive-site.xml in it, and pointed an environment variable at it. When I then ran spark-submit, I got:
connection refused exception
I think I have not set up YARN mode properly. Could anyone help me solve this issue? Do I need to install Hadoop and YARN separately? I want to run my application in pseudo-distributed mode. Kindly help me configure YARN mode on Windows. Thanks.
You need to export two variables, HADOOP_CONF_DIR and YARN_CONF_DIR, to make your configuration files visible to YARN. Add the lines below to your .bashrc file if you are using Linux:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
On Windows, you need to set them as environment variables.
Hope this helps!
If you are running Spark on YARN, it is better to add this to spark-env.sh:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
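Since the same export line is suggested for several files, it is easy to end up appending it more than once. Below is a small sketch of adding it only when it is missing. The helper name ensure_line and the fallback path /opt/spark are assumptions made for illustration, and the file is assumed to live at $SPARK_HOME/conf/spark-env.sh.

```shell
# Hypothetical helper: append a line to a file unless the exact
# line is already present, so repeated runs do not duplicate it.
ensure_line() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example call, commented out so that nothing is modified by accident.
# /opt/spark is an assumed fallback, not a Spark default.
# ensure_line 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' \
#     "${SPARK_HOME:-/opt/spark}/conf/spark-env.sh"
```

The single quotes keep $HADOOP_HOME unexpanded, so the file receives the same text as the answers show and the variable is resolved when Spark sources the file.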

Sqoop server started, but the sqoop command shows "command not found"

I have been learning Sqoop for a few days and have successfully installed and configured it with Hadoop.
hadoop_usr#sawai-Lenovo-G580:/usr/local/sqoop/bin$ sqoop2-server start
Setting conf dir: /usr/local/sqoop/bin/conf
Sqoop home directory: /usr/local/sqoop
The Sqoop server is already started.
hadoop_usr#sawai-Lenovo-G580:/usr/local/sqoop/bin$ sqoop
sqoop: command not found
The Sqoop server is already running, but when I try to run the sqoop command I get the error message "command not found". Sqoop home is already in the path:
export SQOOP_HOME=/usr/local/sqoop
$ echo $PATH
Please help me to resolve this issue.
Thanks in advance.
A "command not found" error usually happens because the path to the command is not set.
Set the path for Sqoop, which you have already done.
Then source the file where you set $PATH, or restart your terminal.
Put the line below in your .bashrc file:
export SQOOP_HOME=/home/pj/sqoop
and reload .bashrc:
source .bashrc
If the issue still persists, restart the terminal.
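Note that exporting SQOOP_HOME on its own does not make the shell search that directory; the bin folder also has to be on PATH. Here is a minimal sketch, assuming the scripts live in $SQOOP_HOME/bin. The helper name path_append is made up for illustration.

```shell
# Hypothetical helper: add a directory to PATH once.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

export SQOOP_HOME=/usr/local/sqoop   # value taken from the question
path_append "$SQOOP_HOME/bin"

# List what the install actually provides; the script may not be
# named plain "sqoop" (the transcript above shows sqoop2-server).
ls "$SQOOP_HOME/bin" 2>/dev/null || echo "no such directory: $SQOOP_HOME/bin"
```

If the listing contains no file called sqoop, fixing PATH alone will not help, and the command to run is whichever client script that directory actually contains.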

spark-shell throws an error in Apache Spark

I have installed Hadoop on Ubuntu in VirtualBox (host OS Windows 7). I have also installed Apache Spark, configured SPARK_HOME in .bashrc, and added HADOOP_CONF_DIR to spark-env.sh. Now when I start spark-shell, it throws an error and does not initialize the Spark context or SQL context. Am I missing something in the installation? I would also like to run it on a cluster (a 3-node Hadoop cluster is set up).
I had the same issue when trying to install Spark locally on Windows 7. Please make sure the paths below are correct, and I am sure it will work for you. I answered the same question in this link, so you can follow the steps below and it will work.
Create JAVA_HOME variable: C:\Program Files\Java\jdk1.8.0_181
Add the following part to your path: ;%JAVA_HOME%\bin
Create SPARK_HOME variable: C:\spark-2.3.0-bin-hadoop2.7
Add the following part to your path: ;%SPARK_HOME%\bin
The most important part: the Hadoop folder should contain a bin folder that holds winutils.exe, as follows: C:\Hadoop\bin. Make sure winutils.exe is located inside this path.
Create HADOOP_HOME Variable: C:\Hadoop
Add the following part to your path: ;%HADOOP_HOME%\bin
Now you can open cmd and run spark-shell; it will work.

Hadoop installation: HADOOP_HOME not set error

I'm installing the Hanborq optimized Hadoop Distribution (fully distributed mode). I followed all the steps in the following links exactly, and no errors occurred. When I reached the step that formats HDFS:
$ hadoop namenode -format
An error occurred that says "HADOOP_HOME is not set correctly
please set your hadoop_home variable to the absolute path of the directory that contains hadoop-core-VERSION.jar"
It seems you did not set HADOOP_HOME correctly in your .bashrc file. Add the lines below to your .bashrc file and apply them by running . .bashrc. Please reply if it works.
export HADOOP_HOME="/usr/local/hadoop/hadoop-2.6"
export PATH
Note: HADOOP_HOME is the location of the Hadoop directory.
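The error message itself describes the test being applied: HADOOP_HOME must be the directory that contains hadoop-core-VERSION.jar. Below is a rough sketch of the same check, which may be useful before re-running the format command. The function name has_hadoop_core_jar is hypothetical, and the glob is only an assumption about how the jar is named.

```shell
# Hypothetical helper mirroring the wording of the error message:
# succeeds if the directory holds a file named hadoop-core-*.jar.
has_hadoop_core_jar() {
    for jar in "$1"/hadoop-core-*.jar; do
        [ -f "$jar" ] && return 0
    done
    return 1
}

if has_hadoop_core_jar "${HADOOP_HOME:-}"; then
    echo "HADOOP_HOME looks right: $HADOOP_HOME"
else
    echo "no hadoop-core-*.jar directly under '${HADOOP_HOME:-}'"
fi
```

If the second message appears, HADOOP_HOME may be pointing one level too high or too low in the unpacked distribution.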

Running the FPG algorithm of Mahout on Hadoop in cluster mode

I installed mahout-0.7 and hadoop-1.2.1 on Linux (CentOS). Hadoop is configured as multi-node.
I created a user named hadoop and installed Mahout and Hadoop under /home/hadoop/opt/.
I set MAHOUT_HOME, HADOOP_HOME, MAHOUT_LOCAL, and so on in the .bashrc file of the hadoop user:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-
export HADOOP_HOME=/home/hadoop/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/conf
export MAHOUT_HOME=/home/hadoop/opt/mahout
I want to run Mahout on the Hadoop file system. When I run the following command, I get an error:
hadoop#master mahout$ bin/mahout fpg -i /home/hadoop/output.dat -o patterns -method mapreduce -k 50 -s 2
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Please help me. I tried but could not fix the error.
It seems that there are some conflicts between your configuration and your usage.
At first glance, you can check the following:
To make sure that you've set the Mahout path correctly use this command:
This should not return an empty string (when you run mahout locally)
Also, HADOOP_CONF_DIR should be set to $HADOOP_HOME/conf.
Here's a list of popular environment variables for Hadoop:
export JAVA_HOME=/path/to/jdk1.8.0/ #your jdk path
export HADOOP_HOME=/usr/local/hadoop #your hadoop path
export HADOOP_INSTALL=/usr/local/hadoop #your hadoop path
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export HADOOP_CLASSPATH=/home/hduser/lib/* #third-party libraries to be loaded with Hadoop
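The checks in this answer can be collected into a short script. This is only a sketch: the helper names report_var and conf_matches_home are invented for illustration, and the rule that HADOOP_CONF_DIR should equal $HADOOP_HOME/conf is taken from the advice above rather than from any Hadoop tool.

```shell
# Hypothetical helper: print whether an environment variable is set.
report_var() {
    eval "value=\${$1:-}"
    if [ -n "$value" ]; then
        echo "$1=$value"
    else
        echo "$1 is not set"
    fi
}

# Hypothetical helper: succeed if the conf dir is $HADOOP_HOME/conf.
# Arguments: the HADOOP_HOME value, then the HADOOP_CONF_DIR value.
conf_matches_home() {
    [ "$2" = "$1/conf" ]
}

for name in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME; do
    report_var "$name"
done

# Values from the question's .bashrc: these two do not match.
if ! conf_matches_home /home/hadoop/opt/hadoop /opt/hadoop/conf; then
    echo "HADOOP_CONF_DIR does not point inside HADOOP_HOME"
fi
```

Running it as the hadoop user in a fresh login shell shows what the mahout script will actually see, which can differ from what .bashrc appears to set.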
You also get a heap error. You should adjust your heap size so that the JVM is able to initialize; this error means the JVM could not reserve the heap it requested, so the requested size may exceed the memory available.
You can also help us solve your error by adding more information about your cluster:
How many machines are you using?
What is the hardware spec of these machines?
What distribution and version of Hadoop?
