How to set up Spark on Windows? - windows

I am trying to set up Apache Spark on Windows.
After searching a bit, I understand that standalone mode is what I want.
Which binaries do I need to download to run Apache Spark on Windows? The Spark download page lists distributions with Hadoop and CDH.
I couldn't find any references for this on the web. A step-by-step guide would be highly appreciated.

Steps to install Spark in local mode:
Install Java 7 or later.
To test that the Java installation is complete, open a command prompt, type java and hit Enter.
If you receive the message 'java' is not recognized as an internal or external command, you need to configure the environment variables JAVA_HOME and PATH to point to the path of the JDK.
Download and install Scala.
Set SCALA_HOME under Control Panel\System and Security\System: go to "Advanced system settings" and add %SCALA_HOME%\bin to the PATH variable in the environment variables.
Install Python 2.6 or later from Python Download link.
Download SBT. Install it and set SBT_HOME as an environment variable with value as <<SBT PATH>>.
Download winutils.exe from HortonWorks repo or git repo. Since we don't have a local Hadoop installation on Windows we have to download winutils.exe and place it in a bin directory under a created Hadoop home directory.
Set HADOOP_HOME = <<Hadoop home directory>> in environment variable.
We will be using a pre-built Spark package, so choose a Spark package pre-built for Hadoop from the Spark download page. Download and extract it.
Set SPARK_HOME and add %SPARK_HOME%\bin to the PATH variable in the environment variables.
Run command: spark-shell
Open http://localhost:4040/ in a browser to see the SparkContext web UI.
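Before running spark-shell, the variables from the steps above can be sanity-checked with a short script. This is only a sketch: the function name check_spark_env is made up for illustration, and the file locations it looks for (bin\java.exe, bin\winutils.exe, bin\spark-shell.cmd) are assumptions based on the layout described in the steps.

```python
import ntpath
import os


def check_spark_env(env, exists=os.path.exists):
    """Return a list of human-readable problems found in the given environment mapping."""
    problems = []
    # File each variable is expected to lead to (assumed layout, see the steps above).
    expected = {
        "JAVA_HOME": ntpath.join("bin", "java.exe"),
        "HADOOP_HOME": ntpath.join("bin", "winutils.exe"),
        "SPARK_HOME": ntpath.join("bin", "spark-shell.cmd"),
    }
    for name, relative in expected.items():
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        target = ntpath.join(home, relative)
        if not exists(target):
            problems.append(f"{name} is set, but {target} was not found")
    return problems


if __name__ == "__main__":
    # Check the environment of the current console and print what is missing.
    for problem in check_spark_env(os.environ) or ["No problems found"]:
        print(problem)
```

Run it from a newly opened command prompt so that it sees the variables you have just defined.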

I found the easiest solution on Windows is to build from source.
You can pretty much follow this guide: http://spark.apache.org/docs/latest/building-spark.html
Download and install Maven, and set MAVEN_OPTS to the value specified in the guide.
But if you're just playing around with Spark, and don't actually need it to run on Windows for any reason other than that your own machine is running Windows, I'd strongly suggest you install Spark on a Linux virtual machine. The simplest way to get started is probably to download the ready-made images made by Cloudera or Hortonworks, and either use the bundled version of Spark, or install your own from source or from the compiled binaries you can get from the Spark website.

You can download spark from here:
http://spark.apache.org/downloads.html
I recommend this version: Hadoop 2 (HDP2, CDH5)
Since version 1.0.0 there are .cmd scripts to run Spark on Windows.
Unpack it using 7zip or similar.
To start, you can execute bin\spark-shell.cmd --master local[2] from the extracted directory.
To configure your instance, you can follow this link: http://spark.apache.org/docs/latest/

You can set up Spark in either of two ways:
Building from source
Using a prebuilt release
There are several ways to build Spark from source.
I first tried building the Spark source with SBT, but that requires Hadoop. To avoid those issues, I used a prebuilt release.
Instead of the source, I downloaded the prebuilt release for Hadoop 2.x and ran it.
For this you need to install Scala as a prerequisite.
I have collated all the steps here:
How to run Apache Spark on Windows7 in standalone mode
Hope it helps!

I tried to work with spark-2.x.x, but building the Spark source code didn't work for me.
So, although I'm not going to use Hadoop, I downloaded the pre-built Spark with Hadoop embedded: spark-2.0.0-bin-hadoop2.7.tar.gz
Point SPARK_HOME to the extracted directory, then add to PATH: ;%SPARK_HOME%\bin;
Download the winutils executable from the Hortonworks repository, or from Amazon AWS platform winutils.
Create a directory in which to place winutils.exe, for example C:\SparkDev\x64\bin. Add the environment variable %HADOOP_HOME% pointing to the parent directory (C:\SparkDev\x64 in this example), then add %HADOOP_HOME%\bin to PATH.
Using command line, create the directory:
mkdir C:\tmp\hive
Using the executable that you downloaded, grant full permissions on the directory you created, using Unix-style path notation:
%HADOOP_HOME%\bin\winutils.exe chmod 777 /tmp/hive
Type the following command line:
%SPARK_HOME%\bin\spark-shell
The Scala command-line prompt should appear automatically.
Remark: You don't need to configure Scala separately. It's built in too.
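The two directory-preparation commands above can also be scripted. The sketch below only assembles and runs the same winutils call shown in the answer; the function names are hypothetical, and the default paths are the ones used in the example (C:\tmp\hive and /tmp/hive).

```python
import ntpath
import os
import subprocess


def winutils_chmod_command(hadoop_home, unix_style_path="/tmp/hive", mode="777"):
    """Build the argument list for the winutils chmod call shown above."""
    winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
    return [winutils, "chmod", mode, unix_style_path]


def prepare_hive_scratch_dir(hadoop_home, windows_dir=r"C:\tmp\hive"):
    """Create the scratch directory and open up its permissions (Windows only)."""
    os.makedirs(windows_dir, exist_ok=True)
    # check=True raises CalledProcessError if winutils reports a failure.
    subprocess.run(winutils_chmod_command(hadoop_home), check=True)
```

On a Windows machine you would call prepare_hive_scratch_dir(os.environ["HADOOP_HOME"]) once before starting spark-shell.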

Here are the fixes to get it to run on Windows without rebuilding everything, such as if you do not have a recent version of MS-VS. (You will need a Win32 C++ compiler, but you can install MS VS Community Edition for free.)
I've tried this with Spark 1.2.2 and Mahout 0.10.2, as well as with the latest versions in November 2015. There are a number of problems: the Scala code tries to run a bash script (mahout/bin/mahout), which of course does not work; the sbin scripts have not been ported to Windows; and the winutils are missing if Hadoop is not installed.
(1) Install Scala, then unzip Spark/Hadoop/Mahout into the root of C: under their respective product names.
(2) Rename \mahout\bin\mahout to mahout.sh.was (we will not need it)
(3) Compile the following Win32 C++ program and copy the executable to a file named C:\mahout\bin\mahout (that's right - no .exe suffix, like a Linux executable)
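#include "stdafx.h"
#include <windows.h>  // GetEnvironmentVariable
#include <tchar.h>    // _tmain, _TCHAR, _tprintf
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#define BUFSIZE 4096
#define VARNAME TEXT("MAHOUT_CP")
// Prints the value of the MAHOUT_CP environment variable and exits with 0 on success.
int _tmain(int argc, _TCHAR* argv[]) {
    LPTSTR pszBuffer = (LPTSTR)malloc(BUFSIZE * sizeof(TCHAR));
    if (pszBuffer == NULL) return 1;
    DWORD dwLength = GetEnvironmentVariable(VARNAME, pszBuffer, BUFSIZE);
    // 0 means the variable is missing; a value >= BUFSIZE means the buffer was too small.
    int ok = (dwLength > 0 && dwLength < BUFSIZE);
    if (ok) _tprintf(TEXT("%s\n"), pszBuffer);
    free(pszBuffer);
    return ok ? 0 : 1;
}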
(4) Create the script \mahout\bin\mahout.bat and paste in the content below, although the exact names of the jars in the _CP class paths will depend on the versions of spark and mahout. Update any paths per your installation. Use 8.3 path names without spaces in them. Note that you cannot use wildcards/asterisks in the classpaths here.
set SCALA_HOME=C:\Progra~2\scala
set SPARK_HOME=C:\spark
set HADOOP_HOME=C:\hadoop
set MAHOUT_HOME=C:\mahout
set SPARK_SCALA_VERSION=2.10
set MASTER=local[2]
set MAHOUT_LOCAL=true
set path=%SCALA_HOME%\bin;%SPARK_HOME%\bin;%PATH%
cd /D %SPARK_HOME%
set SPARK_CP=%SPARK_HOME%\conf\;%SPARK_HOME%\lib\xxx.jar;...other jars...
set MAHOUT_CP=%MAHOUT_HOME%\lib\xxx.jar;...other jars...;%MAHOUT_HOME%\xxx.jar;...other jars...;%SPARK_CP%;%MAHOUT_HOME%\lib\spark\xxx.jar;%MAHOUT_HOME%\lib\hadoop\xxx.jar;%MAHOUT_HOME%\src\conf;%JAVA_HOME%\lib\tools.jar
start "master0" "%JAVA_HOME%\bin\java" -cp "%SPARK_CP%" -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip localhost --port 7077 --webui-port 8082 >>out-master0.log 2>>out-master0.err
start "worker1" "%JAVA_HOME%\bin\java" -cp "%SPARK_CP%" -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker spark://localhost:7077 --webui-port 8083 >>out-worker1.log 2>>out-worker1.err
...you may add more workers here...
cd /D %MAHOUT_HOME%
"%JAVA_HOME%\bin\java" -Xmx4g -classpath "%MAHOUT_CP%" "org.apache.mahout.sparkbindings.shell.Main"
The name of the variable MAHOUT_CP should not be changed, as it is referenced in the C++ code.
Of course you can comment out the code that launches the Spark master and worker, because Mahout will run Spark as needed; I just put it in the batch job to show you how to launch it if you wanted to use Spark without Mahout.
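Because wildcards cannot be used in the _CP class paths of step (4), the jar lists have to be written out by hand. A small helper can generate them instead. This is a sketch with a hypothetical function name; it only lists the .jar files found directly in the directories you pass in, so check the result against your own Spark and Mahout versions before pasting it into the batch file.

```python
import glob
import os


def expand_classpath(directories, separator=";"):
    """Join every .jar found directly in the given directories into one classpath string."""
    jars = []
    for directory in directories:
        # sorted() keeps the result stable between runs.
        jars.extend(sorted(glob.glob(os.path.join(directory, "*.jar"))))
    return separator.join(jars)
```

For example, print(expand_classpath([r"C:\spark\lib"])) prints a value that could be used after set SPARK_CP=%SPARK_HOME%\conf\; in the script above.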
(5) The following tutorial is a good place to begin:
https://mahout.apache.org/users/sparkbindings/play-with-shell.html
You can bring up the Mahout Spark instance at:
"C:\Program Files (x86)\Google\Chrome\Application\chrome" --disable-web-security http://localhost:4040

The guide by Ani Menon (thx!) almost worked for me on Windows 10; I just had to get a newer winutils.exe from that git repo (currently hadoop-2.8.1): https://github.com/steveloughran/winutils

Here are seven steps to install Spark on Windows 10 and run it from Python:
Step 1: Download the Spark 2.2.0 tar (tape archive) gz file to any folder F from this link - https://spark.apache.org/downloads.html. Unzip it and copy the unzipped folder to the desired folder A. Rename the spark-2.2.0-bin-hadoop2.7 folder to spark.
Let the path to the spark folder be C:\Users\Desktop\A\spark
Step 2: Download the Hadoop 2.7.3 tar gz file to the same folder F from this link - https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz. Unzip it and copy the unzipped folder to the same folder A. Rename the folder from Hadoop-2.7.3.tar to hadoop.
Let the path to the hadoop folder be C:\Users\Desktop\A\hadoop
Step 3: Create a new Notepad text file. Save this empty file as winutils.exe (with Save as type: All files). Copy this 0 KB winutils.exe file to the bin folder in spark - C:\Users\Desktop\A\spark\bin
Step 4: Now, we have to add these folders to the System environment.
4a: Create a system variable (not user variable as user variable will inherit all the properties of the system variable) Variable name: SPARK_HOME
Variable value: C:\Users\Desktop\A\spark
Find Path system variable and click edit. You will see multiple paths. Do not delete any of the paths. Add this variable value - ;C:\Users\Desktop\A\spark\bin
4b: Create a system variable
Variable name: HADOOP_HOME
Variable value: C:\Users\Desktop\A\hadoop
Find Path system variable and click edit. Add this variable value - ;C:\Users\Desktop\A\hadoop\bin
4c: Create a system variable Variable name: JAVA_HOME
Search for Java in Windows. Right-click and choose Open file location. You will have to right-click on any one of the Java files again and choose Open file location. The folder you need for JAVA_HOME is the Java installation folder, that is, the parent of the bin folder that contains java.exe. Alternatively, you can look in C:\Program Files\Java. The Java version installed on my system is jre1.8.0_131.
Variable value: C:\Program Files\Java\jre1.8.0_131
Find Path system variable and click edit. Add this variable value - ;C:\Program Files\Java\jre1.8.0_131\bin
Step 5: Open command prompt and go to your spark bin folder (type cd C:\Users\Desktop\A\spark\bin). Type spark-shell.
C:\Users\Desktop\A\spark\bin>spark-shell
It may take time and give some warnings. Finally, it will show
welcome to spark version 2.2.0
Step 6: Type exit() or restart the command prompt and go to the spark bin folder again. Type pyspark:
C:\Users\Desktop\A\spark\bin>pyspark
It will show some warnings and errors, but you can ignore them. It works.
Step 7: Your installation is complete. If you want to run Spark directly from the Python shell, then:
go to Scripts in your Python folder and, in the command prompt, type
pip install findspark
Then, in the Python shell:
import findspark
findspark.init()
import the necessary modules
from pyspark import SparkContext
from pyspark import SparkConf
If you would like to skip the steps for importing findspark and initializing it, then please follow the procedure given in
importing pyspark in python shell

Here is a simple, minimal script to run from any Python console.
It assumes that you have extracted the Spark libraries that you downloaded into C:\Apache\spark-1.6.1.
This works on Windows without building anything and solves problems where Spark would complain about recursive pickling.
import sys
import os
# Raw string so the backslashes are not treated as escape sequences
spark_home = r'C:\Apache\spark-1.6.1'
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'pyspark.zip'))
sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.9-src.zip'))
# pyspark can only be imported after sys.path has been extended
import pyspark
# Start a spark context:
sc = pyspark.SparkContext()
# Read the README and return the first line that mentions Python
lines = sc.textFile(os.path.join(spark_home, "README.md"))
pythonLines = lines.filter(lambda line: "Python" in line)
pythonLines.first()
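The py4j archive name in the script above carries a version number, so the hard-coded file name will not match other Spark releases. As a sketch (the function name pyspark_sys_paths is hypothetical, and the python\lib layout is assumed from the script above), the entries can be discovered instead of typed in:

```python
import glob
import os


def pyspark_sys_paths(spark_home):
    """Return the entries to add to sys.path for the Spark install at spark_home."""
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    # The py4j archive name changes between Spark releases, so search for it.
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not py4j_zips:
        raise FileNotFoundError(f"No py4j-*-src.zip found in {lib_dir}")
    return [python_dir, os.path.join(lib_dir, "pyspark.zip"), py4j_zips[-1]]
```

Each returned entry would then be passed to sys.path.insert(0, ...) before importing pyspark, in the same way as in the script above.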

Cloudera and Hortonworks are the best tools to start with HDFS on Microsoft Windows. You can also use VMware or VirtualBox to start a virtual machine in which to build your HDFS and Spark, Hive, HBase, Pig, Hadoop setup with Scala, R, Java, Python.

Related

Maven isn't installing properly

I've tried everything I could find on this topic, yet I'm not able to install Maven.
I'm at the following point:
I have Java installed
I unzipped the files from the Apache website
I have set up the environment variables and added the required entries to the path (I had to use the escape character in the path because of the space in the Program Files folder name: C:\Program^ Files\apache-maven-3.6.3)
What could be the problem?
cmd
Judging from the attached image, all requirements are OK. Try one of these two solutions:
Close the CMD window and reopen it (if you haven't done this already).
Restart your computer in order to apply the environment variables you've just added.
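To see whether a freshly opened console actually picks up the new Path entry, you can search the Path value the way the shell would. This is a rough sketch: find_on_path is a made-up helper, and the list of extensions it tries is an assumption rather than the exact lookup rule cmd uses.

```python
import os


def find_on_path(name, path_value, extensions=("", ".cmd", ".bat", ".exe"), separator=os.pathsep):
    """Return the first file called name (plus one of extensions) found in a PATH-style string."""
    for directory in path_value.split(separator):
        # Entries are sometimes quoted or padded with spaces.
        directory = directory.strip().strip('"')
        if not directory:
            continue
        for extension in extensions:
            candidate = os.path.join(directory, name + extension)
            if os.path.isfile(candidate):
                return candidate
    return None
```

Calling find_on_path("mvn", os.environ["PATH"]) in a new console returns the script that would be run, or None if the Maven bin folder is still not visible to that console.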

How to Setup Spark on Windows 10, Step by Step

I was trying to set up Spark on Windows 10 and found a lot of good solutions on Stack Overflow. So I am trying to combine all the solutions and create standardized installation steps.
For the installation, you first need to download the following:
JAVA JDK - http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
SBT and Scala - https://www.scala-lang.org/download/
Winutils.exe - https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1
Spark - https://spark.apache.org/downloads.html
After the downloads are complete:
Installing and setting up Java
When the Java installation has completed:
Create the folder BigData under C:\
Copy the “Java” folder from "C:\Program Files\" --> "C:\"
Then create an environment variable named “JAVA_HOME”.
Advanced System Settings --> Environment Variables --> Click on the New button
Variable Name: JAVA_HOME
Variable Value: C:\Java\jdk1.8.0_181
Add bin to "Path": go to Advanced System Settings --> Environment Variables --> Click on Path --> Click on New --> Write
%JAVA_HOME%\bin
Installing and setting up sbt and Scala
Install sbt and Scala under the folder C:\BigData. After the installation of sbt and Scala is done:
Create an environment variable named “SCALA_HOME”.
Advanced System Settings --> Environment Variables --> Click on the New button
Variable Name: SCALA_HOME
Variable Value: C:\BigData\scala
Add bin to "Path": go to Advanced System Settings --> Environment Variables --> Click on Path --> Click on New --> Write
%SCALA_HOME%\bin
Setting up Hadoop libraries for Windows
Download the zip from the git link mentioned above, unzip it, and then copy winutils.exe from the “winutils-master\hadoop-2.7.1\bin” folder to C:\BigData\hadoop\bin
Create an environment variable named "HADOOP_HOME": Advanced System Settings --> Environment Variables --> Click on New
Variable Name: HADOOP_HOME
Variable Value: C:\BigData\hadoop
Add bin to "Path": go to Advanced System Settings --> Environment Variables --> Click on Path --> Click on New, and write
%HADOOP_HOME%\bin
Installing and setting up Spark
Extract the downloaded Spark package, copy the folder to C:\BigData\, and rename the copied folder to "spark".
Create an environment variable named "SPARK_HOME":
Advanced System Settings --> Environment Variables --> Click on New
Variable Name: SPARK_HOME
Variable Value: C:\BigData\spark
Add bin to "Path": Advanced System Settings --> Environment Variables --> Click on Path --> Click on New --> Write
%SPARK_HOME%\bin
Now create the \tmp\hive directory under C:\, and set its permissions with the following commands.
Open a cmd prompt:
mkdir c:\tmp
mkdir c:\tmp\hive
winutils chmod 777 /tmp/hive
Now the setup is complete.
Go to the cmd prompt and type "spark-shell" to run Spark.
Some issues that I ran into:
Your computer name should not contain an underscore; that was giving me an error.
The Java JDK needs to be installed, and the version should be Java 1.8.0_181.
Having multiple Java versions configured was giving me issues; only one Java version should be configured.
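Two of the issues listed above (an underscore in the computer name, and more than one Java on the Path) can be detected with a short script. This is a sketch only: both function names are hypothetical, and looking for java.exe in each Path entry is an assumed way of counting configured Java versions.

```python
import os
import socket


def hostname_has_underscore(hostname=None):
    """Return True if the computer name contains an underscore."""
    hostname = socket.gethostname() if hostname is None else hostname
    return "_" in hostname


def java_dirs_on_path(path_value, separator=os.pathsep, names=("java.exe", "java")):
    """Return every distinct PATH entry that contains a Java launcher, in PATH order."""
    found = []
    for directory in path_value.split(separator):
        if not directory or directory in found:
            continue
        if any(os.path.isfile(os.path.join(directory, n)) for n in names):
            found.append(directory)
    return found
```

If java_dirs_on_path(os.environ["PATH"]) returns more than one directory, more than one Java installation is reachable from the Path.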

Error installing WebLogic server using Console mode in windows 8.1

Hi, I have been trying to install Oracle's WebLogic server on Windows 8.1, but I get the following when I run the configure.cmd file:
ERROR: You must set MW_HOME and it must point to a directory where an
installation of WebLogic exists. Ensure you point this variable to the
extract location of the zip distribution.
How do I correct this error?
There's a readme file linked from the product download page http://www.oracle.com/technetwork/middleware/weblogic/downloads/wls-main-097127.html although your experience would suggest that defining MW_HOME isn't optional! ...
1. Extract the contents of the zip to a directory (eg: /home/myhome/mywls)
This will create a base directory named wls12130 under /home/myhome/mywls
MW_HOME will be the entire directory including the base directory.
(eg: MW_HOME will be /home/myhome/mywls/wls12130).
2. Setup JAVA_HOME and optionally, MW_HOME variables in the current shell as required
for the target platform.
Windows
> set JAVA_HOME=C:\home\myhome\myjavahome
> set MW_HOME=C:\home\myhome\mywls\wls12130
3. Run the installation configuration script in the MW_HOME directory.
This step is required to be run only once. If you move the installation to
another location/machine, you need to rerun this step.
Windows
> configure.cmd
Environment variables are not set properly.
1.- Create product directory
mkdir E:\weblogic\wls << I'm sure you did it and Weblogic binaries are already installed.
2.- set environment variables properly
set JAVA_HOME=_path_to_\jdk1.7.0
set MW_HOME=E:\weblogic\wls
(change above settings according to your installation)
3.- Run configure.cmd

hadoop 1.1.2 installation on windows

I am trying to install Hadoop 1.1.2 on a Windows machine with Cygwin.
Following online videos and tutorials, I have set up almost everything.
The problem now comes when I try to create a folder with these commands:
cd /usr/local/hadoop-1.1.2/bin --> this works properly and sets the proper path, then
./hadoop dfs -mkdir input --> when this executes I get an error
The error says that the JAVA_HOME path is not set properly, and it also shows text like /Java/jre7/bin/bin/java, which looks like the wrong path.
But I have set the JAVA_HOME path properly; it's here,
I have set the same path with /bin in the path variable.
I don't know where I have made a mistake.
EDIT
full Error
./hadoop: line 320: C:/Java/jre7/bin/bin/java: No such file or directory
./hadoop: line 390: C:/Java/jre7/bin/bin/java: No such file or directory
./hadoop: line 390: exec: C:/Java/jre7/bin/bin/java: cannot execute: No such file or directory
Problems and their solutions
1. JAVA PATH ISSUE
The first is the JAVA_PATH issue.
Note: You have to use a JDK, not a JRE.
For Hadoop, try to use a folder name without spaces.
In the environment variables:
JAVA_HOME = C:\Java\jdk1.7.0_25
In the Path variable, add the entry below to the others, separated with ;
%JAVA_HOME%\bin
In the hadoop-env.sh file (you can find this file in C:\cygwin\usr\local\hadoop-1.1.2\conf if you are using a Windows machine).
Note: remove the # from the start of the line and use a doubled backslash (\\) in the file
export JAVA_HOME=C:\\Java\\jdk1.7.0_25
If everything is OK with JAVA_PATH, you can check from the Cygwin console.
Try the command below to get the Java path that Hadoop will use:
echo $JAVA_HOME
Here you will get the Java path.
You can also set JAVA_HOME at runtime; try the command below in the Cygwin terminal:
export JAVA_HOME=C:/JAVA/Jdk1.7.0_25
2. USER ISSUE
First of all, note that when you start the Hadoop installation you should use the same user for MASTER and SLAVE.
If you have different users, you have to create one extra file named config (without extension).
Suppose your MASTER's machine name is jubin-pc with username jubinp, and the SLAVE's machine name is trainees11 with username trainees (you have to do this vice versa for both):
config file (for MASTER), location C:\cygwin\home\jubinp\.ssh\
Host trainees11
User trainees
config file (for SLAVE), location C:\cygwin\home\trainees\.ssh\
Host jubin-pc
User jubinp
Solution for hadoop-2.6.0 and earlier:
Be sure that the path to the JDK does not contain spaces.
(my variant C:\Java\jdk1.8.0_25)
Add JAVA_HOME to path
My Computer -> Properties -> Advanced -> Environment Variables -> Create
JAVA_HOME
C:\Java\jdk1.8.0_25
Add ;%JAVA_HOME%\bin to system Path
Open hadoop-env.sh (it is located in C:\hadoop-2.6.0\etc\hadoop for my hadoop-2.6.0)
and add line export JAVA_HOME=C:/Java/Jdk1.8.0_25
Quit Cygwin.
Does the path to your Java bin folder really contain another folder named bin? I don't think so.
Install a JDK (not a JRE) properly in a path with no blanks. For example: C:\jdk1.7.0_21
In Windows:
Add an environment variable JAVA_HOME set to C:\jdk1.7.0_21
Then, add JAVA_HOME/bin to your PATH.
Edit hadoop/conf/hadoop-env.sh: uncomment the JAVA_HOME export. For my example:
export JAVA_HOME=/cygdrive/c/jdk1.7.0_21/
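The bin/bin/java path in the error comes from a JAVA_HOME that already ends in bin, and the last answer writes the path in /cygdrive form. Both corrections can be sketched in a few lines of Python. The function names are hypothetical, and the conversion is a simplification that handles plain drive-letter paths only; Cygwin's own cygpath utility is the tool normally used for this.

```python
import ntpath


def strip_trailing_bin(java_home):
    """Drop a trailing 'bin' component, since the hadoop script appends bin/java itself."""
    trimmed = java_home.rstrip("\\/")
    head, tail = ntpath.split(trimmed)
    return head if tail.lower() == "bin" else trimmed


def to_cygdrive(windows_path):
    """Convert a drive-letter path such as C:\\dir\\sub to /cygdrive/c/dir/sub."""
    drive, rest = ntpath.splitdrive(windows_path)
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"Expected a drive-letter path, got {windows_path!r}")
    return "/cygdrive/" + drive[0].lower() + rest.replace("\\", "/")
```

For example, to_cygdrive(strip_trailing_bin(r"C:\Java\jre7\bin")) gives /cygdrive/c/Java/jre7, which is the form expected by the export line in hadoop-env.sh.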

Installing Zend Framework After Xampp

I am running Windows 7 and am using Xampp. I would like to install the Zend framework for PHP, but I am having difficulties understanding how to install it. I have used the Zend framework before, but it was already installed on the Linux system I was working on.
I am reading through the Zend documentation here: http://framework.zend.com/manual/en/learning.quickstart.create-project.html
I am having trouble with updating the includes_path portion. My original include path was include_path = ".;C:\xampp\php\PEAR", but I updated it to include_path = ".;C:\Zend".
I then followed the directions for creating a new project by opening the command line tool and running % C:\Zend\bin\zf.sh create project testproject in the desired directory. I get the following error message: '%' is not recognized as an internal or external command, or a batch file.
Some help with this would be greatly appreciated.
C:\Zend\bin\zf.sh is for Linux; you need C:\Zend\bin\zf.bat. I don't even know how you could run it.
To set up ZF you just need to add C:\Zend\library to your include_path
To fix PHP not being found, you need to add ;C:\xampp\php\ to the environment variable "Path":
Click Start, right-click on Computer, then click Properties
Then Advanced System settings>Advanced Tab/Environment Variables>System variables>Path/Edit...
Then append to the end ;C:\xampp\php
You should also append ;C:\Zend\bin for easy access to zf.bat [2]
Then, to create the project, do not use cd C:\Zend\bin!! because your project would be created in that directory. Use the full path C:\Zend\bin\zf create project quickstart
or, if you did step [2], simply go to your htdocs (with cd your_htdocs_path), or whatever you set in Apache as the web root, and execute zf create project quickstart
You might also need to set up a virtual host "quickstart" in Apache, and probably add a new line to the Windows hosts file: 127.0.0.1 quickstart, because ZF is designed mainly for virtual hosts
You can install the Zend Framework using PEAR
pear channel-discover zend.googlecode.com/svn
pear install zend/zend
If you don't know where your PEAR executable is, run a file search for "pear", or "pear.exe".
After that, check out the section called "Setting up the CLI tool on Windows" in http://framework.zend.com/manual/en/zend.tool.framework.clitool.html. It will help you set up easy access to the zf command line tool.
You can also go to the folder where you extracted Zend Framework and then into its bin folder (in my case “C:\xampp\htdocs\ZendFramework\bin”), where you can find zf.bat.
Edit it with any editor and go to
“SET PHP_BIN=php.exe ”
and set it to
“SET PHP_BIN=C:\xampp\php\php.exe”
and it worked for me…
Looks like you're trying to run a shell script in cmd, which is not going to work. Try this:
Start > Run > cmd [enter]
> cd C:\path\to\html\docs
> C:\Zend\bin\zf create project quickstart
and do not type the > marker, just the cd ... and zf ... commands
Add the path to the php interpreter to the %PATH% environment variable:
Control Panel => System => Advanced system settings => Environment Variables... => System Variables => Path => append ;C:\xampp\php

Resources