HADOOP_HOME is not set correctly - hadoop

I downloaded the binary tarball of hadoop from here: http://hadoop.apache.org/releases.html (ver 2.8.4). I unpacked the tar.gz file and then changed the etc/hadoop-env.sh from
export JAVA_HOME={$JAVA_HOME}
to my Java JDK location:
export JAVA_HOME=C:\Program Files\Java\jdk1.8.0_131
I also added these two lines:
export HADOOP_HOME=D:/hadoop/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin
But when I try to run
$ hadoop version
from cmd, I get an error message that says
Error: HADOOP_HOME is not set correctly
What did I do wrong and how should I change the hadoop_home path for it to work?

Apart from {$JAVA_HOME} having the dollar sign in the wrong place (it belongs outside the braces: ${JAVA_HOME}), native Windows CMD does not run hadoop-env.sh, so the variables you export there are never set.
You need to set the environment variables in Windows itself, from the Control Panel. You also need a JAVA_HOME path without spaces, which "Program Files" does not satisfy.
It's not clear whether you're using Cygwin or the Windows Subsystem for Linux, but either one behaves differently from the native CMD.
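The brace placement is easy to check in any POSIX-style shell (Cygwin, Git Bash or WSL; native CMD will not run this). A minimal sketch, using a made-up variable demo_home and a made-up path so that your real JAVA_HOME is left alone:

```shell
# demo_home and /opt/jdk are placeholders for illustration only.
demo_home=/opt/jdk

wrong={$demo_home}   # the braces are kept as literal characters
right=${demo_home}   # normal parameter expansion

echo "wrong: $wrong"   # prints: wrong: {/opt/jdk}
echo "right: $right"   # prints: right: /opt/jdk
```

The first form leaves stray braces around the value, so anything that later uses it as a directory will not find it.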

Set the HADOOP_HOME environment variable as below:
export HADOOP_HOME=D:\hadoop\hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME\bin
$ hadoop version
It will work

I came across this error when I tried hadoop-3.3.1, the latest version at the time. I searched a lot for "HADOOP_HOME not correctly set" and found no useful results.
After I downgraded to hadoop-3.2.2, the error disappeared.
You could try a version other than the latest.

Related

How can I change the JAVA_HOME directory permanently on MAC?

I have a problem with my JAVA_HOME directory. Every time I want to use gradle I need to set my JAVA_HOME directory again and again.
I know how to export it to the right directory but after I close my terminal, I have to do it all over again.
What I do is
which java --> /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/bin/java
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/"
to set the correct path. Everything is then set correctly, but when I close the terminal and check the path again:
echo $JAVA_HOME -> /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Homeexport
Can you tell me how to change the directory permanently? I have googled this many times and only found solutions that didn't help me at all.
This isn't so much a Gradle question, as an environment question. You would need to set JAVA_HOME in your shell's configuration. If you can tell me what shell you're using, I can point you in the right direction.
Since you're using ZSH, you can open ~/.zshenv and paste this in:
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/"
Another solution would be to use something like Jenv to manage your Java environment, with the Gradle plugin enabled.
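If you would rather script the change than edit the file by hand, something like the following could work. It is only a sketch: persist_line is a hypothetical helper name, not a zsh or macOS built-in, and it assumes the file you pass is the startup file your shell actually reads.

```shell
# persist_line FILE LINE
# Appends LINE to FILE unless an identical line is already there,
# so running it repeatedly does not pile up duplicate exports.
persist_line() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example call (commented out so that nothing is modified by accident):
# persist_line "$HOME/.zshenv" 'export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/"'
```

After running it, open a new terminal and check echo $JAVA_HOME to confirm the value survives a restart.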

The system cannot find the path specified error while running pyspark

I just downloaded spark-2.3.0-bin-hadoop2.7.tgz. After downloading, I followed the steps mentioned here: pyspark installation for windows 10. I used the command bin\pyspark to run Spark and got the error message
The system cannot find the path specified
Attached is the screen shot of error message
Attached is the screen shot of my spark bin folder
Screen shot of my path variable looks like
I have python 3.6 & Java "1.8.0_151" in my windows 10 system
Can you suggest how to resolve this issue?
Actually, the problem was with the JAVA_HOME environment variable. JAVA_HOME was previously set to .../jdk/bin.
Stripping the trailing /bin from JAVA_HOME, while keeping .../jdk/bin in the system PATH variable (%path%), did the trick.
My problem was that the JAVA_HOME was pointing to JRE folder instead of JDK. Make sure that you take care of that
Worked hours and hours on this. My problem was with Java 10 installation. I uninstalled it and installed Java 8, and now Pyspark works.
Switching SPARK_HOME to C:\spark\spark-2.3.0-bin-hadoop2.7 and changing PATH to include %SPARK_HOME%\bin did the trick for me.
Originally my SPARK_HOME was set to C:\spark\spark-2.3.0-bin-hadoop2.7\bin and PATH was referencing it as %SPARK_HOME%.
Running a spark command directly in my SPARK_HOME dir worked but only once. After that initial success I then noticed your same error and that echo %SPARK_HOME% was showing C:\spark\spark-2.3.0-bin-hadoop2.7\bin\.. I thought perhaps spark-shell2.cmd had edited it in attempts to get itself working, which led me here.
For those who use Windows and are still trying: what solved it for me was reinstalling Python (3.9) as a local user (c:\Users\<user>\AppData\Local\Programs\Python) and setting both environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to c:\Users\<user>\AppData\Local\Programs\Python\python.exe
Fixing problems installing Pyspark (Windows)
Incorrect JAVA_HOME path
> pyspark
The system cannot find the path specified.
Open System Environment variables:
rundll32 sysdm.cpl,EditEnvironmentVariables
Set JAVA_HOME: System Variables > New:
Variable Name: JAVA_HOME
Variable Value: C:\Program Files\Java\jdk1.8.0_261
Also, check that SPARK_HOME and HADOOP_HOME are correctly set, e.g.:
SPARK_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
HADOOP_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
Important: Double-check the following
The path exists
The path does not contain the bin folder
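The two checks above can also be scripted. A rough sketch, assuming a POSIX-style shell such as Git Bash or Cygwin rather than native CMD; check_home is a hypothetical helper, not part of Spark or Hadoop:

```shell
# check_home NAME VALUE
# Reports whether VALUE looks like a usable *_HOME directory:
# it must exist and must not point at the bin subfolder.
check_home() {
  name=$1
  value=${2%/}          # drop one trailing forward slash, if any
  value=${value%\\}     # drop one trailing backslash, if any
  case $value in
    */bin|*\\bin)
      echo "$name should not end in bin: $value"
      return 1 ;;
  esac
  if [ ! -d "$value" ]; then
    echo "$name is not an existing directory: $value"
    return 1
  fi
  echo "$name looks OK: $value"
}

# Example calls:
# check_home JAVA_HOME  "$JAVA_HOME"
# check_home SPARK_HOME "$SPARK_HOME"
```

The function returns a non-zero status on failure, so it can be chained in front of the pyspark call if you want the launch to stop early.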
Incorrect Java version
> pyspark
WARN SparkContext: Another SparkContext is being constructed
UserWarning: Failed to initialize Spark session.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
Ensure that JAVA_HOME is set to Java 8 (jdk1.8.0)
winutils not installed
> pyspark
WARN Shell: Did not find winutils.exe
java.io.FileNotFoundException: Could not locate Hadoop executable
Download winutils.exe and copy it to your spark home bin folder
curl -OutFile C:\Spark\spark-3.2.0-bin-hadoop3.2\bin\winutils.exe -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-3.0.0/bin/winutils.exe
Most likely you forgot to define the Windows environment variables such that the Spark bin directory is in your PATH environment variable.
Define the following environment variables using the usual methods for Windows.
First define an environment variable called SPARK_HOME to be C:\spark\spark-2.3.0-bin-hadoop2.7
Then either add %SPARK_HOME%\bin to your existing PATH environment variable, or if none exists (unlikely) define PATH to be %SPARK_HOME%\bin
If there is no typo specifying the PATH,
echo %PATH% should give you the fully resolved path to the Spark bin directory i.e. it should look like
C:\spark\spark-2.3.0-bin-hadoop2.7\bin;
If PATH is correct, you should be able to type pyspark in any directory and it should run.
If this does not resolve the issue, perhaps the issue is the one described in "pyspark: The system cannot find the path specified", in which case this question is a duplicate.
Update: in my case it came down to wrong path for JAVA, I got it to work...
I'm having the same problem. I initially installed Spark through pip, and pyspark ran successfully. Then I started messing with Anaconda updates and it never worked again. Any help will be appreciated...
I'm assuming PATH is set correctly for the original author. A way to check that is to run spark-class from the command prompt. With a correct PATH it will return Usage: spark-class <class> [<args>] when run from an arbitrary location. The error from pyspark comes from a chain of .cmd files that I traced to the last lines in spark-class2.cmd
This may be silly, but altering the last block of code shown below changes the error message you get from pyspark from "The system cannot find the path specified" to "The syntax of the command is incorrect". Removing this whole block makes pyspark do nothing.
rem The launcher library prints the command to be executed in a single line suitable for being
rem executed by the batch interpreter. So read all the output of the launcher into a variable.
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
  set SPARK_CMD=%%i
)
del %LAUNCHER_OUTPUT%
%SPARK_CMD%
I removed "del %LAUNCHER_OUTPUT%" and saw that the text file generated remains empty. Turns out "%RUNNER%" failed to find correct directory with java.exe because I messed up the PATH to Java (not Spark).
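To test the same thing without editing spark-class2.cmd, you can check up front whether a java executable is where JAVA_HOME says it is. The sketch below is for a POSIX-style shell (Git Bash, Cygwin, WSL). find_runner is a hypothetical helper that only imitates the idea of preferring JAVA_HOME over PATH; it is not Spark's own code.

```shell
# find_runner [JAVA_HOME_VALUE]
# Prints the java executable that would be used, or fails with a message.
find_runner() {
  home=$1
  if [ -n "$home" ]; then
    # On Windows the file is java.exe, elsewhere plain java.
    for candidate in "$home/bin/java" "$home/bin/java.exe"; do
      if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
      fi
    done
    echo "no java executable under $home/bin" >&2
    return 1
  fi
  # JAVA_HOME is empty: fall back to whatever java is on PATH.
  command -v java || { echo "java not found on PATH" >&2; return 1; }
}

# Example call:
# find_runner "$JAVA_HOME"
```

If JAVA_HOME already ends in bin, the helper looks for bin/bin/java and fails, which is the same kind of mistake that leaves the launcher output file empty.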
I know this is an old post, but I am adding my finding in case it helps anyone.
The issue is mainly due to the line source "${SPARK_HOME}"/bin/load-spark-env.sh in the pyspark file. As you can see, it does not expect 'bin' in SPARK_HOME. All I had to do was remove 'bin' from my SPARK_HOME environment variable and it worked (C:\spark\spark-3.0.1-bin-hadoop2.7\bin to C:\spark\spark-3.0.1-bin-hadoop2.7\).
The error in the Windows Command Prompt made it look as if 'pyspark' wasn't recognized, while the real issue was that it could not find the file 'load-spark-env.sh'.
If you use Anaconda on Windows, the command below can save you time:
conda install -c conda-forge pyspark
After that restart anaconda and start "jupyter notebook"

Getting "cat: /release: No such file or directory" when running scala

I tried to install Scala 2.12.1 on my Mac (El Capitan) via Homebrew and also by downloading the binaries from https://www.scala-lang.org/download/.
In both cases, whenever I run scala (or scalac, scaladoc, etc) I get this printed to screen:
cat: /release: No such file or directory
The action is successful however. E.g. scala myscript.scala works just fine, but that error message gets printed first.
Does anyone have an idea of why that's happening?
Opening up bin/scala, there's a line:
java_release="$(cat $JAVA_HOME/release | grep JAVA_VERSION)"
My $JAVA_HOME wasn't set. All fixed now.
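With JAVA_HOME empty, that line expands to cat /release, which explains the message. A guarded variant could look like the sketch below; java_release_line is a hypothetical function written for illustration, not what the scala launcher actually ships.

```shell
# java_release_line JAVA_HOME_VALUE
# Prints the JAVA_VERSION line from JAVA_HOME/release, or nothing
# (without an error) when the variable is empty or the file is missing.
java_release_line() {
  home=$1
  if [ -n "$home" ] && [ -r "$home/release" ]; then
    grep JAVA_VERSION "$home/release"
  fi
}

# Example call:
# java_release="$(java_release_line "$JAVA_HOME")"
```

Quoting the path and testing for the file first means a missing JAVA_HOME or a JDK without a release file produces an empty value instead of a cat error.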
I would add this as a comment but can't yet due to my newbie status. To specifically fix your $JAVA_HOME: on macOS (where /usr/libexec/java_home is available) you can copy and paste this into your Terminal:
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
Replace the 1.8 with whatever version of Java you are currently working from.
Try setting your JAVA_HOME environment variable. In my case, JAVA_HOME was already set.
I got the error specifically
cat: /usr/lib/jvm/java-8-openjdk-amd64/release: No such file or directory
I got it fixed by creating an empty file release in it
sudo touch $JAVA_HOME/release
My JAVA_HOME looks like
$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64
That's a hack, but it just works fine for me!
There doesn't seem to be any harm in doing that, either, if you look at what happens with this release file when running scala: https://github.com/scala/scala/pull/5588/files.
Caution: This solution only applies when the release file is missing.

Installing hadoop issue

I am following Joseph Adler's instructions on how to install Hadoop on my Lubuntu (page 555 here: http://it-ebooks.info/book/1014/).
I wrote in terminal:
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u4.tar.gz
tar xvfz hadoop-0.20.2-cdh3u4.tar.gz
and everything went fine, .tar.gz file was downloaded and then it was untarred.
But when I wrote
hadoop version
in the terminal, there appeared a message saying that there is no command hadoop.
Does anybody have an idea what I should do to use the (already) installed but (still) somehow invisible Hadoop?
Thanks for help!
In Linux, invoking a command without a path prefix requires that the directory containing the command be listed in the PATH environment variable.
Here, to execute the command you have to specify either its absolute or its relative path. The following can be used; replace <EXTRACT_LOC_PATH> with the location where you extracted the archive.
<EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin/hadoop version
If your present working directory is /hadoop-0.20.2-cdh3u4/bin/ then ./hadoop version would be sufficient.
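To avoid typing the full path each time, the bin directory can be added to PATH. A small sketch, where path_append is a hypothetical helper and <EXTRACT_LOC_PATH> is still the placeholder from above:

```shell
# path_append CURRENT DIR
# Prints CURRENT with DIR added at the end, unless DIR is already listed.
path_append() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "${1:+$1:}$2" ;;
  esac
}

# Example call (put it in ~/.bashrc to make it permanent):
# PATH=$(path_append "$PATH" "<EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin")
# export PATH
```

Once the directory is on PATH, hadoop version should be found from any working directory in a new shell.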
Whenever you get a "command not found" error, the problem is usually in your .bashrc file. You might not have set the JAVA_HOME, HADOOP_HOME and PATH variables properly, so check that all three point to the correct paths.

How can I set my Cygwin PATH to find postgresql header and library path?

I am on Windows 7 and installed Cygwin and PostgreSQL 8.4 on it. I have an open-source application written in C that needs to be built, and I am using Cygwin for that.
My problem is with setting the path for PostgreSql in Cygwin. As per the instruction that came with open-source, the build requires me to export path to postgreSql header and library path as follows:
export ENV_PG_INC_PATH=/usr/include/pgsql
export ENV_PG_LIB_PATH=/usr/lib/pgsql
I tried to export the same path in Windows using Cygwin as follows:
export ENV_PG_INC_PATH=$ENV_PG_INC_PATH:"/cygdrive/C/Program Files (x86)/PostgreSQL/8.4/include"
export ENV_PG_LIB_PATH=$ENV_PG_LIB_PATH:"/cygdrive/C/Program Files (x86)/PostgreSQL/8.4/lib"
But this doesn't seem to work: when I try to access the DLLs or any EXEs inside these folders, it throws the following error:
-bash: _int.dll: command not found
I don't know what I am doing wrong, as I am new to Cygwin. Any help would be appreciated.
Thanks in advance.
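Spaces in file paths have to be either quoted or escaped, but not both. Inside double quotes a backslash before a space is kept literally, so when escaping, drop the quotes (the parentheses then need escaping too):
export ENV_PG_LIB_PATH=$ENV_PG_LIB_PATH:/cygdrive/C/Program\ Files\ \(x86\)/PostgreSQL/8.4/lib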
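How bash treats quoting and escaping here can be checked directly. A minimal sketch; the variable names quoted, escaped and both are made up for the comparison:

```shell
quoted="/cygdrive/C/Program Files (x86)/PostgreSQL/8.4/lib"
escaped=/cygdrive/C/Program\ Files\ \(x86\)/PostgreSQL/8.4/lib
both="/cygdrive/C/Program\ Files\ (x86)/PostgreSQL/8.4/lib"

# Quoting alone and escaping alone give the same string.
[ "$quoted" = "$escaped" ] && echo "quoted and escaped forms are identical"

# Backslashes inside double quotes stay in the value.
[ "$quoted" != "$both" ] && echo "quoting plus backslashes gives a different string"
printf '%s\n' "$both"   # prints the path with the backslashes still in it
```

So a value built with backslashes inside double quotes names a directory that does not exist.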
