Spark installation failed on Cygwin (Windows)

I'm currently installing Spark from the Cygwin terminal. I followed the steps at http://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm and everything was fine until the last step.
I added the line "export PATH=$PATH:/usr/local/spark/bin" to my ~/.bashrc file.
When I run the spark-shell command, I get this error:
/usr/local/spark/bin/spark-class: line 86: C:\Program Files\Java\jdk1.7.0_75\bin/bin/java: No such file or directory
I tried searching for answers online but unfortunately nothing worked for me.
Please help. Thanks!

The problem is in:
/usr/local/spark/bin/spark-class: line 86: C:\Program Files\Java\jdk1.7.0_75\bin/bin/java: No such file or directory
The path contains "bin" twice: "\bin" comes from your Windows JAVA_HOME value, and "/bin" is appended by the spark-class script.
You can fix the JAVA_HOME environment variable by setting it to C:\Program Files\Java\jdk1.7.0_75 (or the actual path of your JDK), without the trailing \bin.
Alternatively, make a backup of /usr/local/spark/bin/spark-class and change the following line in it
RUNNER="${JAVA_HOME}/bin/java"
to:
RUNNER="${JAVA_HOME}\java"
Then run the spark-shell command again.
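To see where the doubled "bin" comes from, here is a minimal Python sketch that mimics the RUNNER="${JAVA_HOME}/bin/java" line as plain string concatenation (it does not run spark-class). The helper names are hypothetical.

```python
def spark_class_runner(java_home: str) -> str:
    """Mimic spark-class: RUNNER="${JAVA_HOME}/bin/java" (plain concatenation)."""
    return java_home + "/bin/java"


def has_doubled_bin(runner: str) -> bool:
    """True if the path ends in .../bin/bin/java, whichever slash style is used."""
    normalized = runner.replace("\\", "/").lower()
    return normalized.endswith("/bin/bin/java")


if __name__ == "__main__":
    bad = spark_class_runner(r"C:\Program Files\Java\jdk1.7.0_75\bin")
    good = spark_class_runner(r"C:\Program Files\Java\jdk1.7.0_75")
    print(bad, has_doubled_bin(bad))
    print(good, has_doubled_bin(good))
```

With JAVA_HOME ending in \bin, the result is exactly the path from the error message, which is why removing \bin from JAVA_HOME fixes it.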

Related

mvn: command not found for Git Bash but Java and Maven paths are OK using CMD

I'm trying to understand why Git Bash reports: mvn: command not found
I set PATH like this:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;%JAVA_HOME%\bin;%M2_HOME%\bin;C:\Program Files\Git\cmd;export PATH=$PATH:/c/Program\ Files/Java/jdk1.8.0_281/bin:/c/Program\ Files/apache-maven-3.8.1/bin
I checked Maven and Java with cmd and both seem to be OK.
Solution by khmarbaise
I removed M2_HOME and MAVEN_HOME and changed PATH like this:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;%JAVA_HOME%\bin;%M2_HOME%\bin;C:\Program Files\Git\cmd;C:\Program Files\apache-maven-3.8.1\bin;
I ran mvn --version again and it works correctly.
In Git Bash, run
echo $PATH
Check which of the equivalent entries from your Windows PATH are missing, probably $M2_HOME/bin or $MAVEN_HOME/bin. Add them to your .bashrc (the file that tells Git Bash how to set up its environment) with a line like the following, including whatever seems to be missing:
export PATH=${PATH}:${MAVEN_HOME}/bin:${M2_HOME}/bin
For the change to take effect, open a new Git Bash window and run your mvn command again there.
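The comparison described above can be sketched in Python. This assumes Git Bash's usual convention of showing C:\ as /c/, and that you paste an already expanded Windows PATH (for example the output of echo %PATH% in cmd, with no %VAR% left in it). Both function names are hypothetical.

```python
import re
from typing import List


def to_git_bash_path(win_path: str) -> str:
    """Convert e.g. C:\\Program Files\\Git\\cmd to /c/Program Files/Git/cmd."""
    path = win_path.strip().rstrip("\\/")
    match = re.match(r"^([A-Za-z]):[\\/]?(.*)$", path)
    if not match:
        return path.replace("\\", "/")  # not a drive path: just flip the slashes
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    return f"/{drive.lower()}/{rest}" if rest else f"/{drive.lower()}"


def missing_in_git_bash(windows_path: str, git_bash_path: str) -> List[str]:
    """Windows PATH entries (';'-separated) absent from Git Bash $PATH (':'-separated)."""
    bash_entries = {e.rstrip("/").lower() for e in git_bash_path.split(":") if e}
    missing = []
    for entry in windows_path.split(";"):
        if entry.strip() and to_git_bash_path(entry).lower() not in bash_entries:
            missing.append(entry.strip())
    return missing


if __name__ == "__main__":
    win = r"C:\Windows;C:\Program Files\apache-maven-3.8.1\bin"
    bash = "/c/Windows:/usr/bin"
    print(missing_in_git_bash(win, bash))
```

The comparison is case-insensitive because Windows paths are.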

The system cannot find the path specified error while running pyspark

I just downloaded spark-2.3.0-bin-hadoop2.7.tgz. After downloading it, I followed the steps in "pyspark installation for windows 10". I used the command bin\pyspark to run Spark and got this error message:
The system cannot find the path specified
[Screenshots were attached showing the error message, the Spark bin folder, and the PATH variable.]
I have Python 3.6 and Java 1.8.0_151 on my Windows 10 system.
Can you suggest how to resolve this issue?
Actually, the problem was the JAVA_HOME environment variable. It was previously set to .../jdk/bin.
Stripping the trailing /bin from JAVA_HOME, while keeping it (/jdk/bin) in the system PATH variable (%PATH%), did the trick.
My problem was that JAVA_HOME pointed to the JRE folder instead of the JDK folder. Make sure it points to the JDK.
I worked on this for hours. My problem was the Java 10 installation: I uninstalled it, installed Java 8, and now PySpark works.
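The JAVA_HOME answers above boil down to two rules: JAVA_HOME is the JDK root, PATH gets the root's bin folder, and the root must be a JDK rather than a JRE. Here is a minimal Python sketch of both; the function names are hypothetical, and the JDK test is only a heuristic (a JDK ships javac in bin, a Java 8 JRE does not).

```python
import os
from typing import Tuple


def split_java_home(configured: str) -> Tuple[str, str]:
    """Return (JAVA_HOME, PATH entry): the JDK root, and the root's bin folder."""
    root = configured.rstrip("\\/")
    if root.replace("\\", "/").split("/")[-1].lower() == "bin":
        root = root[: -len("bin")].rstrip("\\/")  # drop the trailing bin folder
    sep = "\\" if "\\" in root else "/"
    return root, root + sep + "bin"


def looks_like_jdk(java_home: str) -> bool:
    """Heuristic: a JDK has javac in its bin folder; a (Java 8) JRE does not."""
    bin_dir = os.path.join(java_home, "bin")
    return any(os.path.isfile(os.path.join(bin_dir, n)) for n in ("javac", "javac.exe"))


if __name__ == "__main__":
    java_home, path_entry = split_java_home(r"C:\Program Files\Java\jdk1.8.0_151\bin")
    print("JAVA_HOME =", java_home)
    print("PATH entry =", path_entry)
```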
Switching SPARK_HOME to C:\spark\spark-2.3.0-bin-hadoop2.7 and changing PATH to include %SPARK_HOME%\bin did the trick for me.
Originally my SPARK_HOME was set to C:\spark\spark-2.3.0-bin-hadoop2.7\bin and PATH was referencing it as %SPARK_HOME%.
Running a Spark command directly in my SPARK_HOME directory worked, but only once. After that initial success I got the same error as you, and echo %SPARK_HOME% showed "C:\spark\spark-2.3.0-bin-hadoop2.7\bin\..". I suspected spark-shell2.cmd had modified it while trying to get itself working, which is what led me here.
For those on Windows who are still trying: what solved it for me was reinstalling Python 3.9 as a local user (c:\Users\<user>\AppData\Local\Programs\Python) and setting both environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to c:\Users\<user>\AppData\Local\Programs\Python\python.exe.
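If you start Spark from your own Python script, a per-process alternative to the system-wide variables above is to set both of them before Spark is launched. This is a minimal sketch with a hypothetical function name; it only affects the current process and its child processes, not the Windows environment settings.

```python
import os
import sys


def pin_pyspark_python(python_exe: str = sys.executable) -> None:
    """Set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON for this process and its children."""
    os.environ["PYSPARK_PYTHON"] = python_exe
    os.environ["PYSPARK_DRIVER_PYTHON"] = python_exe


if __name__ == "__main__":
    pin_pyspark_python()  # defaults to the interpreter running this script
    print(os.environ["PYSPARK_PYTHON"])
```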
Fixing problems installing Pyspark (Windows)
Incorrect JAVA_HOME path
> pyspark
The system cannot find the path specified.
Open System Environment variables:
rundll32 sysdm.cpl,EditEnvironmentVariables
Set JAVA_HOME: System Variables > New:
Variable Name: JAVA_HOME
Variable Value: C:\Program Files\Java\jdk1.8.0_261
Also, check that SPARK_HOME and HADOOP_HOME are correctly set, e.g.:
SPARK_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
HADOOP_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
Important: Double-check the following
The path exists
The path does not contain the bin folder
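The two checks above can be automated with a small Python sketch for JAVA_HOME, SPARK_HOME and HADOOP_HOME. The function name is hypothetical; it only inspects the mapping you pass in (for example os.environ) and reports problems per variable.

```python
import os
from typing import Dict, List, Mapping


def check_home_vars(env: Mapping[str, str],
                    names=("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")) -> Dict[str, List[str]]:
    """Report variables that are unset, point nowhere, or end in a bin folder."""
    problems: Dict[str, List[str]] = {}
    for name in names:
        value = env.get(name)
        issues = []
        if not value:
            issues.append("not set")
        else:
            path = value.rstrip("\\/")
            if not os.path.isdir(path):
                issues.append("path does not exist")
            if path.replace("\\", "/").split("/")[-1].lower() == "bin":
                issues.append("path ends in the bin folder")
        if issues:
            problems[name] = issues
    return problems


if __name__ == "__main__":
    print(check_home_vars(os.environ) or "all three variables look fine")
```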
Incorrect Java version
> pyspark
WARN SparkContext: Another SparkContext is being constructed
UserWarning: Failed to initialize Spark session.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
Ensure that JAVA_HOME is set to Java 8 (jdk1.8.0)
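To confirm which Java version is actually picked up, you can parse the output of java -version. This is a minimal Python sketch with a hypothetical function name; it relies on the fact that Java 8 and older report versions as "1.8.0_151", while Java 9 and newer report "10.0.1", "11.0.2", and so on.

```python
import re
import shutil
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and older: "1.8.0_151" -> 8; Java 9+: "10.0.1" -> 10
    if first == "1" and second is not None:
        return int(second)
    return int(first)


if __name__ == "__main__":
    if shutil.which("java"):
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        major = java_major_version(result.stderr)
        print("Java major version:", major, "(OK)" if major == 8 else "(expected 8)")
```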
winutils not installed
> pyspark
WARN Shell: Did not find winutils.exe
java.io.FileNotFoundException: Could not locate Hadoop executable
Download winutils.exe and copy it to your Spark home's bin folder (from PowerShell):
Invoke-WebRequest -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-3.0.0/bin/winutils.exe -OutFile C:\Spark\spark-3.2.0-bin-hadoop3.2\bin\winutils.exe
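Since HADOOP_HOME above points at the Spark folder, winutils.exe should end up in %HADOOP_HOME%\bin. Here is a minimal Python sketch to check that, assuming Hadoop looks for the file under %HADOOP_HOME%\bin (which is what the layout in this answer relies on); the function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def expected_winutils(env: Mapping[str, str]) -> Optional[str]:
    """Where winutils.exe is expected: %HADOOP_HOME%\\bin\\winutils.exe (None if unset)."""
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    return os.path.join(hadoop_home, "bin", "winutils.exe")


def winutils_installed(env: Mapping[str, str]) -> bool:
    """True if HADOOP_HOME is set and bin\\winutils.exe exists under it."""
    path = expected_winutils(env)
    return path is not None and os.path.isfile(path)


if __name__ == "__main__":
    print(expected_winutils(os.environ), winutils_installed(os.environ))
```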
Most likely you forgot to define the Windows environment variables so that the Spark bin directory is in your PATH environment variable.
Define the following environment variables using the usual methods for Windows.
First define an environment variable called SPARK_HOME to be C:\spark\spark-2.3.0-bin-hadoop2.7
Then either add %SPARK_HOME%\bin to your existing PATH environment variable, or if none exists (unlikely) define PATH to be %SPARK_HOME%\bin
If there is no typo in PATH,
echo %PATH% should show the fully resolved path to the Spark bin directory, i.e. it should contain an entry like
C:\spark\spark-2.3.0-bin-hadoop2.7\bin;
If PATH is correct, you should be able to type pyspark in any directory and it should run.
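The same PATH check can be done from Python. This is a minimal sketch with a hypothetical function name: it compares normalized PATH entries against the Spark bin directory, and uses shutil.which, which on Windows also tries PATHEXT extensions such as .cmd, to see what pyspark would resolve to.

```python
import os
import shutil
from typing import Optional


def dir_on_path(directory: str, path_value: Optional[str] = None, sep: str = os.pathsep) -> bool:
    """True if `directory` is one of the PATH entries (case/slash-normalized for this OS)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [e.strip() for e in path_value.split(sep) if e.strip()]
    return any(os.path.normcase(os.path.normpath(e)) == wanted for e in entries)


if __name__ == "__main__":
    spark_bin = os.path.join(os.environ.get("SPARK_HOME", ""), "bin")
    print("Spark bin on PATH:", dir_on_path(spark_bin))
    print("pyspark resolves to:", shutil.which("pyspark"))
```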
If this does not resolve the issue, perhaps the cause is the one described in "pyspark: The system cannot find the path specified", in which case this question is a duplicate.
Update: in my case it came down to a wrong path for Java. I got it to work.
I'm having the same problem. I initially installed Spark through pip, and pyspark ran successfully. Then I started messing with Anaconda updates and it never worked again. Any help will be appreciated...
I'm assuming PATH is set correctly for the original author. One way to check is to run spark-class from the command prompt: with a correct PATH it returns Usage: spark-class <class> [<args>] when run from an arbitrary directory. The error from pyspark comes from a chain of .cmd files that I traced to the last lines of spark-class2.cmd.
This may be silly, but altering the last block of code shown below changes the error message you get from pyspark from "The system cannot find the path specified" to "The syntax of the command is incorrect". Removing the whole block makes pyspark do nothing.
rem The launcher library prints the command to be executed in a single line suitable for being
rem executed by the batch interpreter. So read all the output of the launcher into a variable.
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
set SPARK_CMD=%%i
)
del %LAUNCHER_OUTPUT%
%SPARK_CMD%
I removed "del %LAUNCHER_OUTPUT%" and saw that the generated text file stays empty. It turns out "%RUNNER%" failed to find the directory containing java.exe because I had messed up the path to Java (not Spark).
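An empty launcher output file means the Java runner never started. Here is a minimal Python sketch that checks this directly, assuming spark-class2.cmd sets RUNNER to %JAVA_HOME%\bin\java when JAVA_HOME is defined and to plain java otherwise (check the top of your own spark-class2.cmd to confirm). The function names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Mapping


def resolve_runner(env: Mapping[str, str]) -> str:
    """Assumed RUNNER logic: %JAVA_HOME%\\bin\\java if JAVA_HOME is set, else java."""
    java_home = env.get("JAVA_HOME")
    return os.path.join(java_home, "bin", "java") if java_home else "java"


def runner_works(runner: str) -> bool:
    """Try `<runner> -version`; False if the executable cannot be found or started."""
    exe = shutil.which(runner)  # on Windows this also tries java.exe via PATHEXT
    if exe is None:
        return False
    try:
        subprocess.run([exe, "-version"], capture_output=True, check=False, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return True


if __name__ == "__main__":
    runner = resolve_runner(os.environ)
    print(runner, "starts" if runner_works(runner) else "cannot be started")
```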
I know this is an old post, but I am adding my finding in case it helps anyone.
The issue is mainly due to the line source "${SPARK_HOME}"/bin/load-spark-env.sh in the pyspark file. As you can see, it does not expect 'bin' in SPARK_HOME. All I had to do was remove 'bin' from my SPARK_HOME environment variable and it worked (C:\spark\spark-3.0.1-bin-hadoop2.7\bin to C:\spark\spark-3.0.1-bin-hadoop2.7\).
The error in the Windows Command Prompt made it look as if 'pyspark' wasn't recognized, while the real issue was that it could not find the file 'load-spark-env.sh'.
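Following this answer's observation, a SPARK_HOME is usable only if SPARK_HOME/bin/load-spark-env.sh exists. Here is a minimal Python sketch that checks this and, if SPARK_HOME wrongly ends in bin, suggests the parent folder; the function names are hypothetical.

```python
import os


def spark_home_ok(spark_home: str) -> bool:
    """pyspark sources "${SPARK_HOME}"/bin/load-spark-env.sh, so that file must exist."""
    return os.path.isfile(os.path.join(spark_home, "bin", "load-spark-env.sh"))


def suggest_spark_home(spark_home: str) -> str:
    """If SPARK_HOME wrongly points at .../bin, return its parent when that one works."""
    candidate = spark_home.rstrip("\\/")
    if not spark_home_ok(candidate) and os.path.basename(candidate).lower() == "bin":
        parent = os.path.dirname(candidate)
        if spark_home_ok(parent):
            return parent
    return candidate


if __name__ == "__main__":
    current = os.environ.get("SPARK_HOME", "")
    print("SPARK_HOME should be:", suggest_spark_home(current))
```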
If you use Anaconda on Windows, the command below can save you time:
conda install -c conda-forge pyspark
After that, restart Anaconda and start "jupyter notebook".

How do I add a file to a directory in terminal?

I am trying to add files to a directory that I created, and I attempted to use
cvs add filename
but that did not work, as I got the error message:
-bash: cvs: command not found
How do I fix this and be able to add files to a directory?
Either cvs is not installed, or it is not in any of the directories listed in the PATH environment variable ($PATH). If you're sure cvs is installed, try running the cvs command by its absolute path.
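The shell's "command not found" lookup can be reproduced with Python's shutil.which, which searches the directories in PATH the same way. This is a minimal sketch with a hypothetical function name; a None result means the command is either not installed or not on PATH.

```python
import shutil
from typing import Optional


def locate_command(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Full path the shell would run for `name`, or None ("command not found")."""
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    found = locate_command("cvs")
    print(found if found else "cvs is not installed or its directory is not on PATH")
```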

Vagrant up throws error

Hello, I'm trying to run the command vagrant up but it throws an error. It was working the day before yesterday, and since yesterday it has been throwing this error:
The directory where plugins are installed (the Vagrant home directory)
has a space in it. On Windows, there is a bug in Ruby when compiling
plugins into directories with spaces. Please move your Vagrant home
directory to a path without spaces and try again.
I googled it, and the answers say there may be a space in the directory. They are right: some of the paths in my PATH variable contain spaces. But my first question is:
Why did it work on all the days before yesterday?
And my second question is:
Which path do I have to correct? I never installed it myself; the system I got was already configured. Here are the paths in my PATH variable:
C:\ProgramData\Oracle\Java\javapath;
C:\Windows\system32;C:\Windows;
C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;
C:\Program Files\Git\cmd;
C:\Program Files (x86)\Ampps\php;
C:\ProgramData\ComposerSetup\bin;
C:\Program Files (x86)\Skype\Phone\;
D:\Ampps\mysql\bin;D:\Ampps\php;
C:\HashiCorp\Vagrant\bin;
C:\Program Files (x86)\nodejs\;
C:\Users\TBox Solutions\AppData\Local\atom\bin;
C:\Users\TBox Solutions\AppData\Roaming\Composer\vendor\bin;
C:\Users\TBox Solutions\AppData\Local\.meteor\;
C:\adt-bundle-windows-x86_64-20140702\sdk\platform-tools;
C:\adt-bundle-windows-x86_64-20140702\sdk\tools;
C:\Program Files\Java\jdk1.8.0_77\bin;
C:\Users\TBox Solutions\AppData\Local\Spoon\Cmd;
C:\Users\TBox Solutions\AppData\Roaming\npm
The message is about the Vagrant home directory (where plugins are installed), not about PATH: it says that directory has a space in it. By default the Vagrant home directory is inside your user profile, and your profile path (C:\Users\TBox Solutions, as your PATH entries show) contains a space.
You must change your user name to one without spaces, for example TBox_Solutions.
To test it, create a new user.
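A minimal Python sketch of where the offending space comes from, assuming Vagrant's default home of .vagrant.d inside the user profile, which (to my knowledge) the VAGRANT_HOME environment variable overrides. The function names are hypothetical.

```python
import os
from typing import Mapping


def vagrant_home(env: Mapping[str, str]) -> str:
    """Assumed lookup: VAGRANT_HOME if set, else <user profile>\\.vagrant.d."""
    if env.get("VAGRANT_HOME"):
        return env["VAGRANT_HOME"]
    profile = env.get("USERPROFILE") or env.get("HOME", "")
    return os.path.join(profile, ".vagrant.d")


def has_space(path: str) -> bool:
    """The Ruby plugin-compilation bug is triggered by spaces in this path."""
    return " " in path


if __name__ == "__main__":
    home = vagrant_home(os.environ)
    print(home, "contains a space" if has_space(home) else "has no spaces")
```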

Installing hadoop issue

I am following Joseph Adler's instructions on how to install Hadoop on my Lubuntu (page 555 here: http://it-ebooks.info/book/1014/).
I typed in the terminal:
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u4.tar.gz
tar xvfz hadoop-0.20.2-cdh3u4.tar.gz
and everything went fine: the .tar.gz file was downloaded and then extracted.
But when I typed
hadoop version
in the terminal, I got a message saying that there is no hadoop command.
Does anybody have an idea what I should do to use this (already) installed but (still) somehow invisible Hadoop?
Thanks for help!
In Linux, invoking a command without prefixing its path requires the directory where the command resides to be listed in the PATH environment variable.
Here, to execute the command you have to specify either its absolute or its relative path. The following can be used; replace <EXTRACT_LOC_PATH> with the extraction location:
<EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin/hadoop version
If your current working directory is <EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin/, then ./hadoop version is sufficient.
When you get a "command not found" error, the problem is usually in your .bashrc file: you might not have set JAVA_HOME, HADOOP_HOME, and the PATH variable properly. Check that all three variables point to the correct paths.
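Both options from these answers (absolute path, or .bashrc entries) can be sketched in Python. The function names are hypothetical, the example home directory under ~ is an assumption, and JAVA_HOME would be exported the same way as HADOOP_HOME.

```python
import os
import shlex
from typing import List


def hadoop_executable(extract_loc: str, version_dir: str = "hadoop-0.20.2-cdh3u4") -> str:
    """Absolute form of the command: <EXTRACT_LOC_PATH>/hadoop-0.20.2-cdh3u4/bin/hadoop."""
    return os.path.join(extract_loc, version_dir, "bin", "hadoop")


def hadoop_bashrc_lines(hadoop_home: str) -> List[str]:
    """Lines to append to ~/.bashrc so a plain `hadoop` resolves via PATH."""
    return [
        f"export HADOOP_HOME={shlex.quote(hadoop_home)}",
        'export PATH="$PATH:$HADOOP_HOME/bin"',
    ]


if __name__ == "__main__":
    home = os.path.join(os.path.expanduser("~"), "hadoop-0.20.2-cdh3u4")  # assumed location
    print("\n".join(hadoop_bashrc_lines(home)))
```

After appending the lines, open a new terminal (or run source ~/.bashrc) so the new PATH takes effect.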
