Running Apache Spark on Windows 7

I am trying to run Apache Spark on Windows 7. First I installed sbt via the MSI installer, then extracted the spark-1.0.0 archive into Program Files with 7-Zip. In the command line, from the Spark directory, I ran:
sbt/sbt assembly
After a few seconds of processing, I got errors like:
- server access error: connection timed out
- could not retrieve jansi 1.1
- error during sbt execution: error retrieving required libraries
- unresolved dependency: jansi 1.1 not found
Could you please give me some advice about running Spark on Windows? I am looking for the right way to do this because I am completely new to this technology. Regards.

You could use the pre-built Spark from here.
The scripts inside the bin folder work on Windows 7.
You need to set the HADOOP_HOME variable and add it to your PATH.
See spark on windows for more information.
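For example, assuming the Hadoop binaries (including winutils.exe) live under C:\hadoop (a placeholder path; point it at wherever you put them), you could set the variables from a cmd prompt like this:
rem placeholder path; adjust to your Hadoop binaries location
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
Note that set only affects the current console session; use setx or the System Properties dialog to make the change permanent.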

If you are building with the sbt approach, then you'll need Git as well.
Install Scala, sbt, and Git on your machine. Download the Spark source code and run the following command:
sbt assembly
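For example, the full sequence might look like this (a sketch assuming Git, Scala, and sbt are already on your PATH; the clone URL is the official GitHub mirror):
git clone https://github.com/apache/spark.git
cd spark
sbt assembly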
In case you use a prebuilt release, here is the step-by-step process:
How to run Apache Spark on Windows7 in standalone mode

Related

How do I clear the cache in Pentaho Data Integration

I am trying to run Pentaho Data Integration (ver. 8.3) on my Windows machine and it is not working.
These are the steps I tried to make it work:
Tried rebooting the machine, without success.
Also tried to run the Spoon.bat command directly from the directory where Pentaho is located, but it did not work.
Checked whether my Java installation had changed since the last time it worked; it had not. What can be happening?
In a support chat I read that someone was able to fix the problem by clearing the cache, but they did not explain how to do it. How do I clear the cache?
Have you installed a JDK 1.8 (or newer) environment?
You should open PowerShell or another terminal and check with: java -version
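The output should report a 1.8.x (or newer) version; for example (the exact build string will differ on your machine):
java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)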
These are the steps to clear the cache in a Windows environment:
Go to C:\Users\<your user>\.kettle
Look for the file db.cache-* (I have PDI version 8.3, so my file is named db.cache-8.3.0.0-371)
Edit the file with any editor (e.g. Notepad) and erase all of its content
It worked for me!
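Equivalently, you could simply delete the cache file from a cmd prompt (a sketch assuming the default .kettle location under your user profile; PDI should rebuild the cache on the next start):
del %USERPROFILE%\.kettle\db.cache-*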
First, you should run .\data-integration\set-pentaho-env.bat; it will set up the PDI environment. The most important settings are JAVA_HOME and the Java version: PDI 8.3 can only run on Java 1.8 and above.

How to run Nutch in Hadoop installed in pseudo-distributed mode

I have Nutch 1.13 installed on my Ubuntu machine. I can run a crawl in standalone mode; it runs successfully and produces the desired results, but I have no idea how to run it on Hadoop now. I have Hadoop installed in pseudo-distributed mode, and I want to run a Nutch crawl on Hadoop and monitor it. How can I do that? There are a lot of tutorials for running it in standalone mode, but I couldn't find any clear instructions on how to run it on Hadoop, other than that I have to use the Nutch job file after I build it with Ant.
Thanks for your help.
Make sure you have built Nutch from source, i.e. don't use the binary release, which works only in local mode. Once you've compiled it with
ant clean runtime
go to runtime/deploy/bin and run the scripts as usual.
NB: you need to modify the conf files prior to recompiling.
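A minimal sketch of a deploy-mode run, assuming your seed URLs have already been uploaded to HDFS under a urls directory (the directory names and round count are placeholders):
cd runtime/deploy
bin/crawl urls crawl 2
In deploy mode the script submits the Nutch job file to your Hadoop cluster, so urls and crawl refer to HDFS paths rather than local ones.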

build spark on windows from source code

It would be nice if I could download the Spark source code from GitHub, build it with sbt on my Windows machine, and use IntelliJ to make small modifications to the code base. I have installed Spark on Windows quite a few times before, but I have always used the packaged tarball rather than the source code. Has anyone built the source code on a Windows machine before?
You also need to account for the difference in line endings between \r\n and \n. You should use the dos2unix utility, and make sure you are using an up-to-date version of Cygwin when installing and running the Hadoop utilities.
I found the Spark developer tools page and it was very helpful. I needed build/sbt compile:
http://spark.apache.org/developer-tools.html#reducing-build-times
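In particular, launching the sbt shell once and leaving it running avoids paying the JVM startup cost on every build; inside the shell, ~compile recompiles automatically whenever a source file changes:
build/sbt
> ~compile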

Running Spark-Shell on Windows

I have downloaded Spark, sbt, Scala, and Git onto my Windows computer. When I try to run spark-shell in my command prompt, I get "Failed to find Spark assembly JAR. You need to build Spark with sbt\sbt assembly before running this program."
I tried to follow this guide: https://x86x64.wordpress.com/2015/04/29/installing-spark-on-windows/ , but I don't have a build subfolder, so I am not sure if that is the problem.
Any help would be appreciated.
That's an old guide for Spark 1.3.
Please use this guide to set up Spark on Windows:
http://www.ics.uci.edu/~shantas/Install_Spark_on_Windows10.pdf
This guide uses Maven rather than sbt, but you will nevertheless be able to execute spark-shell by following it.
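If you just want a working spark-shell without building anything yourself, the prebuilt route is roughly this (C:\spark is a placeholder for wherever you extract the prebuilt package):
set SPARK_HOME=C:\spark
set PATH=%SPARK_HOME%\bin;%PATH%
spark-shell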

Coursera - Functional Programming Principles in Scala - can't work with example project because of errors

From the course
https://class.coursera.org/progfun-004/assignment
I downloaded
http://spark-public.s3.amazonaws.com/progfun/assignments/example.zip
and imported it into IntelliJ IDEA.
But the problem is verifying the code, because in the course they run sbt in the console...
After running "sbt" in the console I get:
D:\learning\example>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading project definition from D:\learning\example\project\project
error: error while loading CharSequence, class file 'C:\Program Files\Java\jdk1.8.0_05\jre\lib\rt.jar(java/lang/CharSequence.class)' is broken
(bad constant pool tag 15 at byte 1501)
[error] Type error in expression
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q
I created a new project in IntelliJ IDEA with sbt and it works... but its sbt version is different from the one in the example project. And when I change the sbt version to the newest one, I get dependency errors... I am stuck and can't move forward... How do I solve a situation like this?
I guess I could try moving the whole project to Java 8, or forcing sbt in my console to work with Java 7. I don't know how to do either :)
I believe you're getting this issue because Scala prior to 2.10.4 doesn't support JDK 8. There is an issue on GitHub describing the problem. You have to downgrade to Java 7.
If you're running sbt on Linux you can set the -java-home parameter.
$ sbt -help
# java version (default: java from PATH, currently java version "1.7.0_60")
-java-home <path> alternate JAVA_HOME
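For example (the JDK path is a placeholder; point it at your local Java 7 installation):
$ sbt -java-home /usr/lib/jvm/java-7-openjdk-amd64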
On Windows, however, it's not that easy.
sbt uses sbt.bat to run on Windows. You can find it by typing where sbt in the Windows console.
C:\Users\lpiepiora>where sbt
C:\Program Files\sbt\bin\sbt
C:\Program Files\sbt\bin\sbt.bat
Now you can edit C:\Program Files\sbt\bin\sbt.bat, and at the beginning of the file, just after
@REM SBT launcher script
@REM
@REM Environment:
@REM JAVA_HOME - location of a JDK home dir (mandatory)
@REM SBT_OPTS - JVM options (optional)
@REM Configuration:
@REM sbtconfig.txt found in the SBT_HOME.
add
set "JAVA_HOME=C:\Program Files\...<path to your Java 7>"
(quoting the whole assignment keeps the spaces in Program Files from breaking the value)
If you're running sbt, restart it. Confirm the version you're running by typing about in the sbt command line.
I use Windows, and this command in Git Shell (http://msysgit.github.io/) works for me:
sbt -java-home "C:\Program Files\Java\jdk7"
Of course, besides JDK 8 you also have to install JDK 7 (I have it installed at the path above).
