Installing hadoop issue - hadoop

I am following Joseph Adler's instructions (page 555 here - http://it-ebooks.info/book/1014/) on how to install Hadoop on my Lubuntu machine.
I typed in the terminal:
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u4.tar.gz
tar xvfz hadoop-0.20.2-cdh3u4.tar.gz
and everything went fine: the .tar.gz file was downloaded and then extracted.
But when I typed
hadoop version
in the terminal, a message appeared saying that there is no command hadoop.
Does anybody have an idea what I should do to use the (already) installed but (still) somehow invisible Hadoop?
Thanks for the help!

On Linux, invoking a command without its path requires that the directory containing the command be listed in the PATH environment variable.
Here, to run the command you have to specify either its absolute or its relative path. The following can be used; replace <EXTRACT_LOC_PATH> with the location where you extracted the archive.
<EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin/hadoop version
If your present working directory is <EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin/ then ./hadoop version is sufficient.
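To avoid typing the full path every time, the bin directory can be appended to PATH. This is a minimal sketch that assumes the archive was extracted into your home directory; the HADOOP_HOME value is an assumption, so point it at wherever you actually extracted the tarball.

```shell
# Assumption: the archive was extracted in $HOME. Change this to your real location.
export HADOOP_HOME="$HOME/hadoop-0.20.2-cdh3u4"

# Append Hadoop's bin directory so the shell can resolve `hadoop` by name.
export PATH="$PATH:$HADOOP_HOME/bin"

# Print where the command resolves from, or say so if it still cannot be found.
command -v hadoop || echo "hadoop not found; check that $HADOOP_HOME/bin exists"
```

These exports last only for the current terminal session. Adding the two export lines to ~/.bashrc makes them apply to new shells as well.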

A "command not found" error usually points to a problem in the .bashrc file. You might not have set the JAVA_HOME, HADOOP_HOME and PATH variables properly, so check whether each of these three variables has the correct path.
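One way to check those variables quickly is a small helper that reports whether each one is set and points at an existing directory. This is only a sketch: check_env_dir is a hypothetical name and is not part of Hadoop.

```shell
# Print one status line per variable: unset, not a directory, or ok.
# check_env_dir is a hypothetical helper name, not part of Hadoop.
check_env_dir() {
  name=$1
  eval "value=\${$name:-}"   # indirect lookup that also works in plain sh
  if [ -z "$value" ]; then
    echo "$name: unset"
  elif [ ! -d "$value" ]; then
    echo "$name: not a directory ($value)"
  else
    echo "$name: ok ($value)"
  fi
}

for var in JAVA_HOME HADOOP_HOME; do
  check_env_dir "$var"
done
```

PATH is a colon-separated list rather than a single directory, so it is easier to inspect with echo "$PATH".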

Related

WSL Not able to find file or directory due to space

The error I was originally getting was that WSL was not able to find JAVA_HOME. Then I ran the command
export JAVA_HOME="/mnt/c/Program Files/JAVA/jdk-15.0.2"
and now the error it gives me is:
ERROR: JAVA_HOME is set to an invalid directory: /mnt/c/Program Files/Java/jdk-15.0.2
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation.
When I run
${JAVA_HOME}
to check the variable I get the response
bash: /mnt/c/Program: No such file or directory
I believe this is due to the space in the directory name. I read online that the space shouldn't be an issue as long as the path is enclosed in quotes, so I don't know what to do here.
Any help would be appreciated!
It looks like you are trying to use the Windows version of Java from within WSL. That should be possible, but you are currently exporting a Linux-style path, which the Windows version won't handle (as you can see).
If you have both the Windows and Linux version of Java installed, then see this answer for some related information. The question there is about npm, but the core issue is the same -- The Windows version is getting picked up in the path before the Linux version.
If you just have the Windows version, then at least modify the JAVA_HOME to be 'C:\Program Files\JAVA\jdk-15.0.2' (watch out for potential quoting issues with backslashes in the Linux-shell string, though). I'm not sure that's going to take care of all of your issues -- I've never tried running the Windows Java version through WSL myself. But it's at least the first step you're going to need to take to get past the current error.
The second error when you just execute ${JAVA_HOME} is to be expected, as you are trying to execute this directory (with a space) as a command. The shell is interpreting the portion before the space as a command, and the portion after the space as the argument. If you were to set it to a directory without a space, you'd still get an error message when trying to execute it (as you are now), just that it would be something like bash: /mnt/c: Is a directory.
If you just want to check it, use echo ${JAVA_HOME}.
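The quoting behaviour can be reproduced without touching a real Java install. The sketch below uses a throwaway directory whose name contains a space as a stand-in for "/mnt/c/Program Files/..."; DEMO_JAVA_HOME and demo_root are names made up for this example. It only illustrates quoting and does not address the Windows versus Linux Java question.

```shell
# Assumption: a throwaway directory with a space in its name stands in for
# a path such as "/mnt/c/Program Files/...".
demo_root=$(mktemp -d)
mkdir -p "$demo_root/Program Files/jdk"
DEMO_JAVA_HOME="$demo_root/Program Files/jdk"

# Quoted expansion keeps the whole path as one word.
echo "$DEMO_JAVA_HOME"

# A quoted -d test checks the directory without trying to execute it.
if [ -d "$DEMO_JAVA_HOME" ]; then
  echo "directory exists"
else
  echo "directory missing"
fi

# Clean up the throwaway directory.
rm -rf "$demo_root"
```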

HADOOP_HOME is not set correctly

I downloaded the binary tarball of hadoop from here: http://hadoop.apache.org/releases.html (ver 2.8.4). I unpacked the tar.gz file and then changed the etc/hadoop-env.sh from
export JAVA_HOME={$JAVA_HOME}
to my Java JDK location:
export JAVA_HOME=C:\Program Files\Java\jdk1.8.0_131
I also added these two lines:
export HADOOP_HOME=D:/hadoop/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin
But when I try to run
$ hadoop version
from cmd I get an error message that says
Error: HADOOP_HOME is not set correctly
What did I do wrong, and how should I change the HADOOP_HOME path for it to work?
Apart from {$JAVA_HOME} having the dollar sign in the wrong spot (it needs to be outside the braces), Windows doesn't run the shell script to set your variables.
You need to set environment variables in Windows from the Control Panel. You also need to get rid of the spaces in the "Program Files" part of the path.
It's not clear whether you're using Cygwin or the Windows Subsystem for Linux, but either one is different from the native CMD.
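The brace placement matters only in a Unix-style shell (Cygwin, WSL, Git Bash), not in CMD. A short sketch of the difference, using /opt/jdk as a stand-in value that is an assumption for this example:

```shell
# Stand-in value so the example does not depend on a real JDK install.
JAVA_HOME=/opt/jdk

wrong={$JAVA_HOME}   # braces are literal text here; only $JAVA_HOME expands
right=${JAVA_HOME}   # standard parameter expansion

echo "wrong: $wrong"   # prints: wrong: {/opt/jdk}
echo "right: $right"   # prints: right: /opt/jdk
```

With the dollar sign inside the braces, the value ends up wrapped in literal braces, which is not a valid directory path.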
Set the HADOOP_HOME environment variable as below:
export HADOOP_HOME=D:\hadoop\hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME\bin
$ hadoop version
It should then work.
I came across this error when I tried to use hadoop-3.3.1, the latest version. I searched a lot for "HADOOP_HOME not correctly set" and found no useful results.
But after I downgraded to hadoop-3.2.2, the error disappeared.
I think you can try an older version.

The system cannot find the path specified error while running pyspark

I just downloaded spark-2.3.0-bin-hadoop2.7.tgz. After downloading, I followed the steps in "pyspark installation for windows 10". I used the command bin\pyspark to run Spark and got this error message:
The system cannot find the path specified
(Screenshots of the error message, my Spark bin folder, and my PATH variable were attached.)
I have Python 3.6 and Java "1.8.0_151" on my Windows 10 system.
Can you suggest how to resolve this issue?
Actually, the problem was with the JAVA_HOME environment variable. JAVA_HOME was previously set to .../jdk/bin.
Stripping the trailing /bin from JAVA_HOME, while keeping it (/jdk/bin) in the system PATH variable (%path%), did the trick.
My problem was that JAVA_HOME was pointing to the JRE folder instead of the JDK. Make sure you check that.
Worked hours and hours on this. My problem was with Java 10 installation. I uninstalled it and installed Java 8, and now Pyspark works.
Switching SPARK_HOME to C:\spark\spark-2.3.0-bin-hadoop2.7 and changing PATH to include %SPARK_HOME%\bin did the trick for me.
Originally my SPARK_HOME was set to C:\spark\spark-2.3.0-bin-hadoop2.7\bin and PATH was referencing it as %SPARK_HOME%.
Running a spark command directly in my SPARK_HOME dir worked but only once. After that initial success I then noticed your same error and that echo %SPARK_HOME% was showing C:\spark\spark-2.3.0-bin-hadoop2.7\bin\.. I thought perhaps spark-shell2.cmd had edited it in attempts to get itself working, which led me here.
For those who use Windows and are still trying: what solved it for me was reinstalling Python (3.9) as a local user (c:\Users\<user>\AppData\Local\Programs\Python) and setting both environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to c:\Users\<user>\AppData\Local\Programs\Python\python.exe
Fixing problems installing Pyspark (Windows)
Incorrect JAVA_HOME path
> pyspark
The system cannot find the path specified.
Open System Environment variables:
rundll32 sysdm.cpl,EditEnvironmentVariables
Set JAVA_HOME: System Variables > New:
Variable Name: JAVA_HOME
Variable Value: C:\Program Files\Java\jdk1.8.0_261
Also, check that SPARK_HOME and HADOOP_HOME are correctly set, e.g.:
SPARK_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
HADOOP_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
Important: Double-check the following
The path exists
The path does not contain the bin folder
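The "no bin folder" rule can be checked mechanically. The sketch below is plain POSIX shell, so it assumes you are working from Git Bash, Cygwin or WSL rather than CMD; strip_bin_suffix is a hypothetical helper name, and it only edits the string without checking that the directory exists.

```shell
# strip_bin_suffix is a hypothetical helper: it removes one trailing "bin"
# component (with either slash style) and leaves other paths unchanged.
strip_bin_suffix() {
  path=$1
  path=${path%/}        # drop one trailing forward slash, if present
  path=${path%\\}       # drop one trailing backslash, if present
  case $path in
    */bin)   printf '%s\n' "${path%/bin}" ;;
    *\\bin)  printf '%s\n' "${path%\\bin}" ;;
    *)       printf '%s\n' "$path" ;;
  esac
}

strip_bin_suffix 'C:\spark\spark-2.3.0-bin-hadoop2.7\bin'   # prints C:\spark\spark-2.3.0-bin-hadoop2.7
strip_bin_suffix '/usr/lib/jvm/java-8-openjdk-amd64/bin'    # prints /usr/lib/jvm/java-8-openjdk-amd64
```

The printed value is what the *_HOME variable should hold; the bin directory itself belongs in PATH.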
Incorrect Java version
> pyspark
WARN SparkContext: Another SparkContext is being constructed
UserWarning: Failed to initialize Spark session.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
Ensure that JAVA_HOME is set to Java 8 (jdk1.8.0)
winutils not installed
> pyspark
WARN Shell: Did not find winutils.exe
java.io.FileNotFoundException: Could not locate Hadoop executable
Download winutils.exe and copy it to the bin folder of your Spark home (the command below uses Windows PowerShell, where curl is an alias for Invoke-WebRequest):
curl -OutFile C:\Spark\spark-3.2.0-bin-hadoop3.2\bin\winutils.exe -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-3.0.0/bin/winutils.exe
Most likely you forgot to define the Windows environment variables such that the Spark bin directory is in your PATH environment variable.
Define the following environment variables using the usual methods for Windows.
First define an environment variable called SPARK_HOME to be C:\spark\spark-2.3.0-bin-hadoop2.7
Then either add %SPARK_HOME%\bin to your existing PATH environment variable, or if none exists (unlikely) define PATH to be %SPARK_HOME%\bin
If there is no typo specifying the PATH,
echo %PATH% should give you the fully resolved path to the Spark bin directory i.e. it should look like
C:\spark\spark-2.3.0-bin-hadoop2.7\bin;
If PATH is correct, you should be able to type pyspark in any directory and it should run.
If this does not resolve the issue, perhaps the issue is as described in "pyspark: The system cannot find the path specified", in which case this question is a duplicate.
Update: in my case it came down to wrong path for JAVA, I got it to work...
I'm having the same problem. I initially installed Spark through pip, and pyspark ran successfully. Then I started messing with Anaconda updates and it never worked again. Any help will be appreciated...
I'm assuming PATH is set correctly for the original author. A way to check that is to run spark-class from the command prompt. With a correct PATH it will return Usage: spark-class <class> [<args>] when run from an arbitrary location. The error from pyspark comes from a chain of .cmd files that I traced to the last lines in spark-class2.cmd
This may be silly, but altering the last block of code shown below changes the error message you get from pyspark from "The system cannot find the path specified" to "The syntax of the command is incorrect". Removing this whole block makes pyspark do nothing.
rem The launcher library prints the command to be executed in a single line suitable for being
rem executed by the batch interpreter. So read all the output of the launcher into a variable.
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
set SPARK_CMD=%%i
)
del %LAUNCHER_OUTPUT%
%SPARK_CMD%
I removed "del %LAUNCHER_OUTPUT%" and saw that the text file generated remains empty. Turns out "%RUNNER%" failed to find correct directory with java.exe because I messed up the PATH to Java (not Spark).
I know this is an old post, but I am adding my finding in case it helps anyone.
The issue is mainly due to the line source "${SPARK_HOME}"/bin/load-spark-env.sh in pyspark file. As you can see it's not expecting 'bin' in SPARK_HOME. All I had to do was remove 'bin' from my SPARK_HOME environment variable and it worked (C:\spark\spark-3.0.1-bin-hadoop2.7\bin to C:\spark\spark-3.0.1-bin-hadoop2.7\).
The error in the Windows Command Prompt made it appear as if it wasn't recognizing 'pyspark', while the real issue was that it could not find the file 'load-spark-env.sh'.
If you use Anaconda on Windows, the command below can save you time:
conda install -c conda-forge pyspark
After that, restart Anaconda and start "jupyter notebook".

Getting "cat: /release: No such file or directory" when running scala

I tried to install Scala 2.12.1 on my Mac (El Capitan) via Homebrew and also by downloading the binaries from https://www.scala-lang.org/download/.
In both cases, whenever I run scala (or scalac, scaladoc, etc) I get this printed to screen:
cat: /release: No such file or directory
The action is successful however. E.g. scala myscript.scala works just fine, but that error message gets printed first.
Does anyone have an idea of why that's happening?
Opening up bin/scala, there's a line:
java_release="$(cat $JAVA_HOME/release | grep JAVA_VERSION)"
My $JAVA_HOME wasn't set. All fixed now.
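The "/release" in the message follows directly from the empty variable. A minimal sketch of the expansion, plus a guarded read that is an illustration only and not the code the scala launcher actually uses:

```shell
# With JAVA_HOME unset, the path collapses to "/release".
# (The :- form gives the same result as "$JAVA_HOME/release" and only
# guards against shells running with `set -u`.)
unset JAVA_HOME
release_path="${JAVA_HOME:-}/release"
echo "$release_path"   # prints: /release

# Guarded variant: read the file only when JAVA_HOME is set and the file is readable.
if [ -n "${JAVA_HOME:-}" ] && [ -r "$JAVA_HOME/release" ]; then
  grep JAVA_VERSION "$JAVA_HOME/release"
else
  echo "JAVA_HOME is unset or has no release file"
fi
```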
I would add this as a comment but can't as of yet due to my newbie status. To specifically fix your $JAVA_HOME: if you're on macOS (where /usr/libexec/java_home exists), you can copy and paste this into your Terminal:
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
Replace the 1.8 with whatever version of Java you are currently working from.
Try setting your JAVA_HOME environment variable. In my case, JAVA_HOME was already set.
Specifically, I got the error
cat: /usr/lib/jvm/java-8-openjdk-amd64/release: No such file or directory
I fixed it by creating an empty file named release in that directory:
sudo touch $JAVA_HOME/release
My JAVA_HOME looks like
$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64
That's a hack, but it works fine for me!
There doesn't seem to be any harm in doing that, as you can see from what happens with this /release file when running scala: https://github.com/scala/scala/pull/5588/files.
Caution: This solution is only applicable when you don't have release file in place.

Java error while installing Hive

I want to install Hive and Hadoop on my Ubuntu machine. I followed this article and everything seemed fine, but at the last step, when I run the command, an error about Java appears:
/home/babak/Downloads/hadoop/bin/../bin hadoop: row 258:/usr/lib/j2sdk1.5-sun/bin/java: file or Folder not found
What should I do to solve this problem?
You need to find where Java is installed on your machine:
which java
and then from there follow any symlinks or wrapper scripts to the actual location of the java executable.
An easier way to do this is to run the file indexer and then locate the file (here I use the jps executable, which is in the same folder as java):
#> sudo updatedb
#> locate jps
Whatever you get back, trim off the bin/jps suffix, and that's your JAVA_HOME value. If you can't find the executable, then you'll need to install Java.
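Following the symlinks by hand can also be scripted. This is a sketch that assumes GNU readlink with the -f option, as shipped on Ubuntu; java_home_of is a hypothetical helper name.

```shell
# java_home_of is a hypothetical helper: given the path to a java binary, it
# resolves symlinks and drops the trailing /bin/<name> to get the install root.
# Assumption: GNU readlink with -f, as shipped on Ubuntu.
java_home_of() {
  real=$(readlink -f "$1") || return 1
  dirname "$(dirname "$real")"
}

if java_bin=$(command -v java); then
  java_home_of "$java_bin"
else
  echo "java is not on PATH; it may need to be installed"
fi
```

With some packages the resolved binary sits under a jre subdirectory, so look at the printed directory before using it as JAVA_HOME.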
Hadoop requires Java version 1.6 or higher. It seems like Hadoop is looking for Java 1.5. Also, make sure the variable JAVA_HOME is set in the file /conf/hadoop-env.sh
I have a line like the following in mine:
export JAVA_HOME=/usr/lib/jvm/java-6-sun/

Resources